@diegozea
Owner

No description provided.

@diegozea changed the title from "Fixes for https://github.com/diegozea/MIToS.jl/issues/172" to "Fixes for #172" on Nov 11, 2025
@github-actions
Contributor

github-actions bot commented Nov 11, 2025

Benchmark Results (Julia v1)

Time benchmarks
| Benchmark | master | 443a02e... | master / 443a02e... |
|---|---|---|---|
| Information/CorrectedMutualInformation/buslje09/msa | 0.748 ± 0.005 s | 0.747 ± 0.0061 s | 1 ± 0.011 |
| Information/CorrectedMutualInformation/buslje09/msa_large | 31.2 ± 0.15 ms | 31.1 ± 0.11 ms | 1 ± 0.0058 |
| Information/CorrectedMutualInformation/buslje09/msa_wide | 0.669 ± 0.0014 s | 0.667 ± 0.0016 s | 1 ± 0.0032 |
| Information/MIp/PF09645 | 5.87 ± 0.024 ms | 5.9 ± 0.046 ms | 0.996 ± 0.0088 |
| Information/frequencies!/1 | 0.268 ± 0.025 μs | 0.253 ± 0.003 μs | 1.06 ± 0.1 |
| Information/frequencies!/2 | 0.95 ± 0.021 μs | 0.97 ± 0.017 μs | 0.979 ± 0.028 |
| Information/highlevel/BLMI | 0.0756 ± 0.00058 s | 0.0759 ± 0.00041 s | 0.997 ± 0.0094 |
| Information/highlevel/buslje09 | 9.92 ± 0.067 ms | 9.9 ± 0.059 ms | 1 ± 0.009 |
| Information/shannon_entropy/PF09645 | 18 ± 0.84 μs | 19 ± 0.83 μs | 0.945 ± 0.06 |
| MSA/Annotations/filtercolumns/boolean mask | 10.1 ± 0.61 μs | 10.9 ± 0.8 μs | 0.931 ± 0.088 |
| MSA/Annotations/filtercolumns/index array | 7.47 ± 3.3 μs | 3.32 ± 0.2 μs | 2.25 ± 1 |
| MSA/Base.vcat/annotated | 4.7 ± 0.25 μs | 4.57 ± 0.28 μs | 1.03 ± 0.083 |
| MSA/Base.vcat/unannotated | 1.69 ± 0.16 μs | 1.68 ± 0.16 μs | 1.01 ± 0.13 |
| MSA/Residue conversions/char2res | 0.388 ± 0.65 ms | 0.396 ± 0.64 ms | 0.98 ± 2.3 |
| MSA/Residue conversions/int2res | 0.284 ± 0.57 ms | 0.285 ± 0.56 ms | 0.997 ± 2.8 |
| MSA/Residue conversions/res2char | 0.271 ± 0.083 ms | 0.278 ± 0.022 ms | 0.976 ± 0.31 |
| MSA/Residue conversions/res2int | 0.286 ± 0.57 ms | 0.294 ± 0.56 ms | 0.973 ± 2.7 |
| MSA/hobohmI/pid62 | 0.515 ± 0.099 μs | 0.524 ± 0.23 μs | 0.983 ± 0.47 |
| MSA/identity/matrix_Float64 | 16.9 ± 0.58 μs | 17 ± 0.61 μs | 0.993 ± 0.049 |
| MSA/identity/mean | 0.0866 ± 0.014 ms | 0.0874 ± 0.015 ms | 0.991 ± 0.24 |
| MSA/read/FASTA.gz | 0.0758 ± 0.0049 ms | 0.0761 ± 0.0057 ms | 0.997 ± 0.098 |
| MSA/read/Stockholm | 0.0653 ± 0.005 ms | 0.0672 ± 0.0084 ms | 0.971 ± 0.14 |
| MSA/read/Stockholm_annotated | 0.0751 ± 0.0063 ms | 0.076 ± 0.0098 ms | 0.988 ± 0.15 |
| MSA/read/Stockholm_mapping | 0.225 ± 0.012 ms | 0.207 ± 0.015 ms | 1.09 ± 0.099 |
| MSA/read/Stockholm_mapping_coords | 0.145 ± 0.0095 ms | 0.127 ± 0.023 ms | 1.14 ± 0.22 |
| MSA/write/FASTA | 0.154 ± 0.014 ms | 0.148 ± 0.018 ms | 1.04 ± 0.16 |
| PDB/_generate_interaction_keys/defaults | 28.5 ± 12 μs | 27.9 ± 12 μs | 1.02 ± 0.62 |
| PDB/_get_matched_Cαs/hemoglobin | 23.2 ± 7.3 μs | 23.3 ± 7.8 μs | 0.994 ± 0.46 |
| PDB/_pdbresidues_to_mmcifdict/2vqc | 0.616 ± 0.049 ms | 0.613 ± 0.056 ms | 1.01 ± 0.12 |
| PDB/count_alanine/1CBN | 0.293 ± 0.015 μs | 0.277 ± 0.021 μs | 1.06 ± 0.097 |
| PDB/distance/1CBN_20_30 | 0.155 ± 0.003 μs | 0.155 ± 0.002 μs | 1 ± 0.023 |
| PDB/read/MMCIFFile | 2.99 ± 0.044 ms | 2.97 ± 0.039 ms | 1.01 ± 0.02 |
| SIFTS/SIFTSResidue/18gs | 0.094 ± 0.005 μs | 0.093 ± 0.005 μs | 1.01 ± 0.076 |
| Utils/get_n_words/ascii | 0.127 ± 0.008 μs | 0.133 ± 0.01 μs | 0.955 ± 0.094 |
| Utils/get_n_words/utf8 | 0.12 ± 0.007 μs | 0.124 ± 0.008 μs | 0.968 ± 0.084 |
| time_to_load | 0.746 ± 0.0017 s | 0.744 ± 0.002 s | 1 ± 0.0035 |

Memory benchmarks
| Benchmark | master | 443a02e... | master / 443a02e... |
|---|---|---|---|
| Information/CorrectedMutualInformation/buslje09/msa | 0.766 M allocs: 0.032 GB | 0.766 M allocs: 0.032 GB | 1 |
| Information/CorrectedMutualInformation/buslje09/msa_large | 0.0901 M allocs: 5.03 MB | 0.0901 M allocs: 5.03 MB | 1 |
| Information/CorrectedMutualInformation/buslje09/msa_wide | 0.742 M allocs: 30.3 MB | 0.742 M allocs: 30.3 MB | 1 |
| Information/MIp/PF09645 | 20.3 k allocs: 0.819 MB | 20.3 k allocs: 0.819 MB | 1 |
| Information/frequencies!/1 | 0 allocs: 0 B | 0 allocs: 0 B | |
| Information/frequencies!/2 | 0 allocs: 0 B | 0 allocs: 0 B | |
| Information/highlevel/BLMI | 19.9 k allocs: 1.19 MB | 19.9 k allocs: 1.19 MB | 1 |
| Information/highlevel/buslje09 | 0.0377 M allocs: 2.3 MB | 0.0377 M allocs: 2.3 MB | 1 |
| Information/shannon_entropy/PF09645 | 0.047 k allocs: 12.2 kB | 0.047 k allocs: 12.2 kB | 1 |
| MSA/Annotations/filtercolumns/boolean mask | 18 allocs: 5.16 kB | 18 allocs: 5.22 kB | 0.99 |
| MSA/Annotations/filtercolumns/index array | 19 allocs: 17.3 kB | 16 allocs: 1.62 kB | 10.7 |
| MSA/Base.vcat/annotated | 0.143 k allocs: 6.58 kB | 0.143 k allocs: 6.58 kB | 1 |
| MSA/Base.vcat/unannotated | 0.064 k allocs: 2.7 kB | 0.064 k allocs: 2.7 kB | 1 |
| MSA/Residue conversions/char2res | 3 allocs: 4.1 MB | 3 allocs: 4.1 MB | 1 |
| MSA/Residue conversions/int2res | 3 allocs: 4.1 MB | 3 allocs: 4.1 MB | 1 |
| MSA/Residue conversions/res2char | 3 allocs: 2.05 MB | 3 allocs: 2.05 MB | 1 |
| MSA/Residue conversions/res2int | 3 allocs: 4.1 MB | 3 allocs: 4.1 MB | 1 |
| MSA/hobohmI/pid62 | 31 allocs: 1.77 kB | 31 allocs: 1.77 kB | 1 |
| MSA/identity/matrix_Float64 | 0.249 k allocs: 11.8 kB | 0.249 k allocs: 11.8 kB | 1 |
| MSA/identity/mean | 1.23 k allocs: 0.0517 MB | 1.23 k allocs: 0.0517 MB | 1 |
| MSA/read/FASTA.gz | 0.622 k allocs: 0.0871 MB | 0.622 k allocs: 0.0871 MB | 1 |
| MSA/read/Stockholm | 0.581 k allocs: 0.045 MB | 0.581 k allocs: 0.045 MB | 1 |
| MSA/read/Stockholm_annotated | 0.738 k allocs: 0.0534 MB | 0.738 k allocs: 0.0532 MB | 1 |
| MSA/read/Stockholm_mapping | 2.37 k allocs: 0.162 MB | 2.25 k allocs: 0.116 MB | 1.39 |
| MSA/read/Stockholm_mapping_coords | 1.93 k allocs: 0.138 MB | 1.81 k allocs: 0.0931 MB | 1.49 |
| MSA/write/FASTA | 0.303 k allocs: 14.1 kB | 0.303 k allocs: 14.1 kB | 1 |
| PDB/_generate_interaction_keys/defaults | 0.497 k allocs: 0.0581 MB | 0.497 k allocs: 0.0581 MB | 1 |
| PDB/_get_matched_Cαs/hemoglobin | 0.584 k allocs: 0.0438 MB | 0.584 k allocs: 0.0438 MB | 1 |
| PDB/_pdbresidues_to_mmcifdict/2vqc | 8.56 k allocs: 1.12 MB | 8.56 k allocs: 1.12 MB | 1 |
| PDB/count_alanine/1CBN | 0 allocs: 0 B | 0 allocs: 0 B | |
| PDB/distance/1CBN_20_30 | 0 allocs: 0 B | 0 allocs: 0 B | |
| PDB/read/MMCIFFile | 0.039 M allocs: 2.9 MB | 0.039 M allocs: 2.9 MB | 1 |
| SIFTS/SIFTSResidue/18gs | 4 allocs: 0.125 kB | 4 allocs: 0.125 kB | 1 |
| Utils/get_n_words/ascii | 5 allocs: 0.203 kB | 5 allocs: 0.203 kB | 1 |
| Utils/get_n_words/utf8 | 5 allocs: 0.219 kB | 5 allocs: 0.219 kB | 1 |
| time_to_load | 0.149 k allocs: 11.1 kB | 0.149 k allocs: 11.1 kB | 1 |

@codecov

codecov bot commented Nov 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.00%. Comparing base (1041446) to head (443a02e).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #175      +/-   ##
==========================================
+ Coverage   96.98%   97.00%   +0.02%     
==========================================
  Files          64       64              
  Lines        4770     4843      +73     
==========================================
+ Hits         4626     4698      +72     
- Misses        144      145       +1     

@coveralls

coveralls commented Nov 11, 2025

Coverage Status

coverage: 97.187% (+0.02%) from 97.164%
when pulling 443a02e on fix-msa-read-for-large-msas
into 1041446 on master.

@diegozea
Owner Author

This solves #172
This PR avoids the call to permutedims and replaces the repeated calls to join with an IOBuffer. With these changes, it is now possible to load the MSA for PF00069 in my notebook. However, there is still room for improvement:

using Revise
using MIToS.Pfam
using MIToS.MSA
pfamfile = downloadpfam("PF00069")
@time msa = read_file(pfamfile, Stockholm; generatemapping=true, useidcoordinates=true)

1175.590645 seconds (633.13 M allocations: 380.659 GiB, 55.95% gc time, 0.85% compilation time: 39% of which was recompilation)

---

using Revise
using MIToS.Pfam
using MIToS.MSA
pfamfile = downloadpfam("PF00069")
@time msa = read_file(pfamfile, Stockholm)

251.604457 seconds (67.10 M allocations: 68.233 GiB, 10.93% gc time, 1.53% compilation time: <1% of which was recompilation)
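
To put the two timings above side by side, the ratios can be computed directly from the reported numbers. This is only arithmetic on the values printed by `@time`; the variable names are illustrative and nothing here calls MIToS.

```julia
# Values copied from the two @time outputs above.
# with_mapping: read_file(...; generatemapping=true, useidcoordinates=true)
# plain:        read_file(pfamfile, Stockholm)
with_mapping = (seconds = 1175.590645, allocs_millions = 633.13, gib = 380.659)
plain = (seconds = 251.604457, allocs_millions = 67.10, gib = 68.233)

# Ratio of each metric: run with mapping / run without mapping.
time_ratio = with_mapping.seconds / plain.seconds
alloc_ratio = with_mapping.allocs_millions / plain.allocs_millions
memory_ratio = with_mapping.gib / plain.gib

println("time ratio:       ", round(time_ratio; digits = 2))
println("allocation ratio: ", round(alloc_ratio; digits = 2))
println("memory ratio:     ", round(memory_ratio; digits = 2))
```

The allocation count grows faster than the wall time, which is consistent with the high GC share reported for the run with mapping.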

---


@profview_allocs msa = read_file(pfamfile, Stockholm; generatemapping=true, useidcoordinates=true)

# count
# -----
# Line 41 : print(seq_ann, init)

# size
# ----
# deletefullgapcolumns!
# filtercolumns!
# _filter_mapping (Annotations.jl L98) -> split
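
As a rough illustration of the IOBuffer change mentioned above, here is a minimal sketch of the general pattern, assuming the string is built piece by piece. The function names `concat_repeated` and `concat_iobuffer` are hypothetical and are not MIToS functions; the actual code in this PR may differ.

```julia
# Hypothetical helpers for illustration only; not part of MIToS.

# Grows the result by re-concatenating inside the loop.
# Each iteration allocates a new String that copies everything built so far.
function concat_repeated(parts::AbstractVector{<:AbstractString}, delim::AbstractString)
    out = ""
    for (i, p) in enumerate(parts)
        out = i == 1 ? String(p) : string(out, delim, p)
    end
    return out
end

# Writes every piece into one IOBuffer and materializes the String once.
function concat_iobuffer(parts::AbstractVector{<:AbstractString}, delim::AbstractString)
    io = IOBuffer()
    for (i, p) in enumerate(parts)
        i > 1 && print(io, delim)
        print(io, p)
    end
    return String(take!(io))
end

parts = ["1", "2", "", "4"]
# Both versions should agree with Base.join on the same input.
@assert concat_repeated(parts, ",") == join(parts, ",")
@assert concat_iobuffer(parts, ",") == join(parts, ",")
```

The buffer version copies each piece once, while the repeated version copies the accumulated string on every step, so its cost grows quadratically with the number of pieces.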
