Description of feature
Currently, the nf-core module mergemetaphlantables uses the script merge_metaphlan_tables.py that ships with the MetaPhlAn software. This script takes a number of MetaPhlAn profiles as input and merges them using the basic merge functionality of Python's pandas module.
Prior to merging, the script derives the sample name of each profile from its filename by removing the file extension and the substring _profile: https://github.com/biobakery/MetaPhlAn/blob/b7e6670831f4842afdf3b0a8531a6f676ed56c45/metaphlan/utils/merge_metaphlan_tables.py#L36
taxprofiler names its MetaPhlAn profiles following the scheme <sample name>_<database name>.metaphlan_profile.txt. Applying the script to these filenames means that each sample name becomes <sample name>_<database name>.metaphlan.
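To make the effect concrete, here is a minimal sketch of the name derivation as described above. It paraphrases the behaviour rather than copying the upstream code, and the function name, the example path, and the database name mpa_db are made up for illustration.

```python
import os


def metaphlan_sample_name(path):
    """Derive a sample name the way the issue describes the merge script doing it.

    Hypothetical helper: drop the directory, drop the final file extension,
    then remove the substring "_profile".
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.replace("_profile", "")


# Illustrative taxprofiler-style filename (sample "sampleA", database "mpa_db").
print(metaphlan_sample_name("results/sampleA_mpa_db.metaphlan_profile.txt"))
# prints "sampleA_mpa_db.metaphlan"
```

Only the last extension (.txt) and _profile are removed, so the database name and .metaphlan stay in the column name.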
While the nf-core module mergemetaphlantables does the job of merging the tables, I as the user have to manually edit the merged table and clean up the sample names if I don't want the database name and the suffix .metaphlan in them.
Therefore, I would suggest either replacing the MetaPhlAn script merge_metaphlan_tables.py with a custom script that can handle the divergent filename pattern introduced by nf-core/taxprofiler, or adding some code, e.g. sed, to remove the additional suffix.
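As a rough sketch of the second option, the header of the merged table could be cleaned after merging. This assumes the database name is known at that point and that the header is a tab-separated line of column names. Both function names are hypothetical and not part of MetaPhlAn or the nf-core module.

```python
def clean_sample_name(column, db_name):
    """Strip a trailing "_<db_name>.metaphlan" from one column name, if present."""
    suffix = f"_{db_name}.metaphlan"
    if column.endswith(suffix):
        return column[: -len(suffix)]
    return column  # e.g. the clade column; left untouched


def clean_header(header_line, db_name):
    """Clean every column name in a tab-separated header line."""
    columns = header_line.rstrip("\n").split("\t")
    return "\t".join(clean_sample_name(c, db_name) for c in columns)


# Illustrative header with made-up sample and database names.
header = "clade_name\tsampleA_mpa_db.metaphlan\tsampleB_mpa_db.metaphlan"
print(clean_header(header, "mpa_db"))
```

Matching the full suffix, including the database name, avoids guessing where the sample name ends when the sample name itself contains underscores.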