
Provide custom script for merging MetaPhlAn tables for better sample name handling #574

@alexhbnr

Description of feature

Currently, the nf-core module mergemetaphlantables uses the script merge_metaphlan_tables.py that ships with MetaPhlAn. The script takes a set of MetaPhlAn profiles as input and merges them using the basic merge functionality of Python's pandas library.

Before merging, the script derives each profile's sample name from its filename by removing the file extension and the string _profile: https://github.com/biobakery/MetaPhlAn/blob/b7e6670831f4842afdf3b0a8531a6f676ed56c45/metaphlan/utils/merge_metaphlan_tables.py#L36
Applied to the file naming scheme used by taxprofiler, in which MetaPhlAn profiles are named <sample name>_<database name>.metaphlan_profile.txt, every sample name in the merged table becomes <sample name>_<database name>.metaphlan.
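
To illustrate the behaviour described above, here is a minimal sketch that approximates the name derivation (strip the last file extension, then remove `_profile`). It is my reading of the linked line, not a copy of the upstream code, and the function name `upstream_style_name`, the sample name `sampleA`, and the database name `mpa_db` are made up for the example.

```python
import os


def upstream_style_name(path: str) -> str:
    """Approximate the sample-name derivation described in this issue.

    Assumption: the name is the file's basename without its last
    extension and with the string "_profile" removed.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.replace("_profile", "")


# Hypothetical taxprofiler-style filename: <sample>_<database>.metaphlan_profile.txt
print(upstream_style_name("results/sampleA_mpa_db.metaphlan_profile.txt"))
# prints: sampleA_mpa_db.metaphlan
```

Only the final `.txt` counts as the extension, so the `.metaphlan` part and the database name stay in the sample name.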

While the nf-core module mergemetaphlantables does merge the tables, I as a user have to manually edit the merged table and clean up the sample names if I don't want the database name and the suffix .metaphlan in them.

Therefore, I suggest either replacing the MetaPhlAn script merge_metaphlan_tables.py with a custom script that can handle the divergent filename pattern introduced by nf-core/taxprofiler, or adding some code, e.g. sed, to remove the additional suffix.
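
As a rough sketch of the second option, the suffix could be stripped from the header line of the merged table after merging. This assumes the database name is known to the pipeline at that point, that the header is tab-separated, and that the first column holds the clade identifier; the helper names `clean_sample_name` and `clean_header` and the column label `clade_name` are hypothetical and only used for illustration.

```python
def clean_sample_name(column: str, db_name: str) -> str:
    """Remove the trailing "_<db_name>.metaphlan" from one column name."""
    suffix = f"_{db_name}.metaphlan"
    # Only strip an exact trailing match so other columns stay untouched.
    return column[: -len(suffix)] if column.endswith(suffix) else column


def clean_header(header_line: str, db_name: str, n_id_cols: int = 1) -> str:
    """Clean all sample columns of a tab-separated header line.

    Assumption: the first `n_id_cols` columns are identifier columns
    and every remaining column is a sample.
    """
    fields = header_line.rstrip("\n").split("\t")
    cleaned = fields[:n_id_cols] + [
        clean_sample_name(field, db_name) for field in fields[n_id_cols:]
    ]
    return "\t".join(cleaned)


header = "clade_name\tsampleA_mpa_db.metaphlan\tsample_B_mpa_db.metaphlan\n"
print(clean_header(header, "mpa_db"))
```

Matching on the full known suffix rather than splitting on `_` keeps sample names that themselves contain underscores intact.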


Labels: enhancement (Improvement for existing functionality)
