Skip to content

Conversation

@rly
Copy link
Contributor

@rly rly commented Jul 23, 2025

In response to a Slack thread, this PR adds mappings for some neurodata types from ndx-pose, ndx-photometry (deprecated), and ndx-fiber-photometry. I wasn't sure whether there would be issues with the same neurodata type name from two different modules being mapped. I also made up the approaches and techniques...

It also fixes the casing of "OptogeneticStimulusSIte" -> OptogeneticStimulusSite"

@rly
Copy link
Contributor Author

rly commented Jul 23, 2025

@satra pointed me to:
https://github.com/BICCN/TMN/blob/main/templates/approaches_template.csv for approaches

I did not find a generic controlled vocabulary for techniques. Are you using https://github.com/BICCN/TMN/blob/main/templates/SPARC%20Modalities.csv ?

@codecov
Copy link

codecov bot commented Jul 23, 2025

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 74.81%. Comparing base (2827711) to head (b784ba3).

Files with missing lines Patch % Lines
dandi/metadata/neurodata_typemap.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1665      +/-   ##
==========================================
- Coverage   74.82%   74.81%   -0.01%     
==========================================
  Files          84       85       +1     
  Lines       11693    11694       +1     
==========================================
  Hits         8749     8749              
- Misses       2944     2945       +1     
Flag Coverage Δ
unittests 74.81% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@satra
Copy link
Member

satra commented Jul 23, 2025

@rly - this is a lot less formal at this point. if you see the module list you will see how we were approaching the classification for that. the main thing this is being used for is the summary info in the dandiset and for the CLI to generate meaningful suffices when disambiguating. if you have suggestions for changes to the others, please make them.

for example ndx-spectrum seems off, and incorrect. the modality field should be on alphabetical only without any dashes.

i believe we did this by taking all the dandisets we had 4+ years ago and extending this list.

@CodyCBakerPhD CodyCBakerPhD marked this pull request as ready for review August 26, 2025 13:33
Comment on lines 855 to 859
"PoseTraining": {
"module": "ndx-pose",
"neurodata_type": "PoseTraining",
"technique": "pose estimation technique",
"approach": "behavioral approach",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a more general note, i'm a bit confused about why pose training is a data type, as that doesn't seem to reflect behavior per se, and also why it is a pose estimation technique.

some of these are direct measurement techniques or estimation techniques, but this one seems odd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PoseTraining is a group consisting of training frames and source videos. A training frame refers to a frame from a source video and an the locations of each node of 1 or more skeletons, often from human labeling. So PoseTraining isn't exactly a data type but indicative of the NWB file containing training data for pose estimation methods. Should I pick out a different data type to detect instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should consider data type as a broad umbrella of what data is generated from or used for (for example there is fourier analysis above). it could be associated with a "pose estimation training technique"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. that makes sense

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rly here we have 2 extensions listed to assist with importing https://github.com/dandi/dandi-cli/blob/HEAD/dandi/metadata/nwb.py#L99 -- should we also add extension information into this 'registry' and use it there too? or may be it is no longer even needed?

Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreeing with @satra on avoiding - and also overall we should aim to make them shorter

@rly
Copy link
Contributor Author

rly commented Aug 27, 2025

It's probably worthwhile also to reindex all the neurodata types observed on DANDI and add those to the map, but this PR is a start and addresses the issue raised over Slack.

@yarikoptic
Copy link
Member

there are errors from docker compose but also smth which is relevant to this PR to be addressed

FAILED dandi/tests/test_metadata.py::test_ndtypes[ndtypes34-asset_dict34] - AssertionError: assert 'calcium imag...ation imaging' == 'fiber photom...cal technique'
  
  - fiber photometry technique; optical technique
  + calcium imaging; cell population imaging

re @rly's

It's probably worthwhile also to reindex all the neurodata types observed on DANDI and add those to the map, but this PR is a start and addresses the issue raised over Slack.

do we have enough extracted in @magland's lindi's to just grep through them?

@rly
Copy link
Contributor Author

rly commented Aug 28, 2025

From https://neurosift.app/dandi > "Neurodata Types Search", you can see a dropdown menu of all of the parsed neurodata types.

That is populated from parsing https://lindi.neurosift.org/dandi/neurodata_types_index.json.gz

Generated by https://github.com/magland/neurosift-kerchunker/blob/main/workflow_scripts/create_neurodata_types_index.py

I believe these come from the first 100 assets of each dandiset, only from dandisets where at least one of the first 20 assets is NWB, and only if LINDI files have been generated for the NWB files (and other edge case handling).

@rly
Copy link
Contributor Author

rly commented Aug 28, 2025

I'll try having an AI generate the

name {
        "module": x,
        "neurodata_type": x,
        "technique": x,
        "approach" x
}

mappings tomorrow and see how it goes.

@rly
Copy link
Contributor Author

rly commented Sep 5, 2025

NWB files with sorted single units in the NWB Units table does not get a filename suffix, but if the data has raw ecephys data, it gets the "ecephys" suffix. I see that in the dandi/metadata/util.py file modified in this PR, the module is listed as "misc" and "misc" is later filtered out from appended suffixes. Is there any particular reason to not label data with a Units table with the "ecephys" suffix?

This confused a user when looking at https://dandiarchive.org/dandiset/001533/draft/files?location=sub-CSH-ZAD-001&page=1 which has NWB files with suffix "behavior" and has a Units table. They thought the data has only behavior and no ecephys data.

@yarikoptic yarikoptic added enhancement New feature or request patch Increment the patch version when merged labels Sep 5, 2025
@yarikoptic
Copy link
Member

I think it would be great to add annotation on having Units! But I would like to be more specific than misc -- what is the best way to identify having them?

Also I would prefer to be more specific with them, since they are "quite derived" in a sense... what if we dedicate for them something like eceunits whenever we can ensure that they are associated with ecephys... or just units even?

In BIDS it will likely be an analogous dedicated suffix (e.g. _units) for such files which would be stored under ecephys folder.

@rly
Copy link
Contributor Author

rly commented Sep 5, 2025

_units makes sense (to a systems neuroscientist). _single_units or _sua (single unit activity) is even more explicit, but some rows in the Units table may represent multi-unit activity. _sortedunits may be even better. @bendichter @stephprince - as you have done experimental systems neuroscience, what do you think is most intuitive?

If an NWB file has both raw ecephys data and derived units, would the suffix then be _ecephys_units?

@stephprince
Copy link

_units makes sense (to a systems neuroscientist). _single_units or _sua (single unit activity) is even more explicit, but some rows in the Units table may represent multi-unit activity. _sortedunits may be even better. @bendichter @stephprince - as you have done experimental systems neuroscience, what do you think is most intuitive?

I would lean towards _sorted_units.

I think _units on its own is also ok, but is potentially more confusing to a non systems neuroscientist. And then I agree that a Units table in an NWB file could contain neurons tagged as mua by spike sorting software, so calling it _single_units may lead to confusion.

@yarikoptic
Copy link
Member

minor note: we should avoid _ in the names there so _sortedunits. But also will we have any other _*units to tell apart?

"approach": "optogenetic approach",
},
"OptogeneticSeries": {
"module": "ogen",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are iterating on a better structure for standardizing optogenetic implant/site details, including all those mentioned by @TheChymera in https://github.com/rly/ndx-optogenetics

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great. Attn @CodyCBakerPhD , ideally we should follow up on that issue in bids-standard and align to the efforts in ndx-optogenetics, so we have metadata appropriately exposed at BIDS level too.

},
"Spectrum": {
"module": "ndx-spectrum",
"module": "spectrum",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I wonder if this is more to be described in _desc-

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: apparently this "module" is not used as far as I can see within dandi-cli :-/ so the destiny of such changes is not even clear. For the participation in suffix, we solely rely on get_neurodata_types_to_modalities_map which bases its map on path within .nwb file

here is currently produced map
print(json.dumps(get_neurodata_types_to_modalities_map(), indent=2))
{
  "NWBMixin": "core",
  "NWBContainer": "core",
  "NWBDataInterface": "core",
  "NWBData": "core",
  "ScratchData": "core",
  "MultiContainerInterface": "core",
  "ProcessingModule": "base",
  "TimeSeries": "base",
  "Image": "base",
  "ImageReferences": "base",
  "Images": "base",
  "Device": "device",
  "ElectrodeGroup": "ecephys",
  "ElectricalSeries": "ecephys",
  "SpikeEventSeries": "ecephys",
  "EventDetection": "ecephys",
  "EventWaveform": "ecephys",
  "Clustering": "ecephys",
  "ClusterWaveforms": "ecephys",
  "LFP": "ecephys",
  "FilteredEphys": "ecephys",
  "FeatureExtraction": "ecephys",
  "IntracellularElectrode": "icephys",
  "PatchClampSeries": "icephys",
  "CurrentClampSeries": "icephys",
  "IZeroClampSeries": "icephys",
  "CurrentClampStimulusSeries": "icephys",
  "VoltageClampSeries": "icephys",
  "VoltageClampStimulusSeries": "icephys",
  "ImageSeries": "image",
  "IndexSeries": "image",
  "ImageMaskSeries": "image",
  "OpticalSeries": "image",
  "GrayscaleImage": "image",
  "RGBImage": "image",
  "RGBAImage": "image",
  "OpticalChannel": "ophys",
  "ImagingPlane": "ophys",
  "OnePhotonSeries": "ophys",
  "TwoPhotonSeries": "ophys",
  "CorrectedImageStack": "ophys",
  "MotionCorrection": "ophys",
  "ImageSegmentation": "ophys",
  "RoiResponseSeries": "ophys",
  "DfOverF": "ophys",
  "Fluorescence": "ophys",
  "OptogeneticStimulusSite": "ogen",
  "OptogeneticSeries": "ogen",
  "AnnotationSeries": "misc",
  "AbstractFeatureSeries": "misc",
  "IntervalSeries": "misc",
  "DecompositionSeries": "misc",
  "LabMetaData": "file",
  "Subject": "file",
  "NWBFile": "file",
  "SpatialSeries": "behavior",
  "BehavioralEpochs": "behavior",
  "BehavioralEvents": "behavior",
  "BehavioralTimeSeries": "behavior",
  "PupilTracking": "behavior",
  "EyeTracking": "behavior",
  "CompassDirection": "behavior",
  "Position": "behavior"
}

so we really need to reapproach this map here and see how to avoid duplication and unused specifications

@yarikoptic yarikoptic added minor Increment the minor version when merged and removed patch Increment the patch version when merged labels Sep 16, 2025
@rly
Copy link
Contributor Author

rly commented Sep 17, 2025

I added a new neurodata_type_map.py file with a more complete mapping of neurodata types (as detected by neurosift, as described above). Before I integrate it, update tests, etc., please take a look and let me know what you think. It should probably be its own file because it's so big. Should it be a python dictionary? JSON? Also, what is the difference between the key and the value of "neurodata_type"?

rly added 2 commits September 17, 2025 06:42
Added new neurodata types for image series and eye tracking metadata while removing deprecated types.
Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dandi/metadata/neurodata_typemap.py seems ot be note used, then what is it for?

type checking seems to fail.

I will move it to draft for now, but I would love for us to finalize and merge this notable improvement!!

@yarikoptic yarikoptic marked this pull request as draft October 13, 2025 17:28
@yarikoptic
Copy link
Member

@rly with an interest in physio recordings and thus potentially in https://github.com/BCM-Neurosurgery/ndx-wearables I would like to bring this over the finish line. Worse comes to worse we should do that while on @bendichter's couch in the booth at SfN!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request minor Increment the minor version when merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants