question about naming when a spike sorting is included as a separate NWB file #1314

Open
magland opened this issue Aug 9, 2023 · 11 comments

@magland

magland commented Aug 9, 2023

Hello DANDI team!

I have a situation where I'd like to upload the results of spike sorting as a separate NWB file from the one that contains the raw ephys traces. The reason I would like to do this is that I'd like to put the ephys data online first, and then perform spike sorting by streaming that data down into the sorting process. I don't want to then add the result to the original file, because then I'd need to re-upload the new file, which could be very large. Another reason for using a separate file is that I might want to do this more than once, for different sorting algorithms.

So I'm going to run into a naming problem, because the auto-assigned name is going to be the same (it's based on the session, etc). In that case, I realize that a checksum string will be added to the filename to distinguish it. But that's still not ideal because the name will not indicate which one has the spike sorting result. Ideally the name would have a helpful string in it such as "sorting" or "kilosort".

Wondering what you would recommend. Should I create my own naming convention and figure out how to upload while bypassing the "organize" step?

Thanks in advance!

@CodyCBakerPhD
Contributor

> Should I create my own naming convention and figure out how to upload while bypassing the "organize" step?

dandi organize is just a helper to put contents into a fashion compatible with dandi validate

If running dandi organize does not make use of the session ID as ses-{session_id} (say, when run on a single file in isolation from the other dandiset contents), then you can just manually add ses-{session_id} to the filename; this is exactly how the automatic dandi upload helper function in NeuroConv works

I point this out because what I would do is just append -{name_of_sorter} to the session ID of the file, which will then show up on the name of the file as well.

See #1265 for a more detailed discussion of the similar topic of separating raw from processed files; we're currently experimenting with different approaches to this, with examples in https://dandiarchive.org/dandiset/000568?pos=3 and https://dandiarchive.org/dandiset/000552?pos=4
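
A minimal sketch of the suggestion above: append the sorter name to the session ID so that it shows up in the ses- part of the filename. The helper names and example values are hypothetical, and the name that dandi organize actually generates may contain additional fields.

```python
def session_id_with_sorter(session_id: str, sorter: str) -> str:
    """Append the sorter name to a session ID, separated by a dash."""
    return f"{session_id}-{sorter}"


def dandi_style_name(subject_id: str, session_id: str) -> str:
    """Build a sub-<subject>_ses-<session> filename by hand (illustrative only)."""
    return f"sub-{subject_id}_ses-{session_id}.nwb"


# Hypothetical subject and session values.
sorted_session = session_id_with_sorter("m108-191125-163508", "kilosort")
print(dandi_style_name("paired-english", sorted_session))
```

Running the sorter several times would then give one distinct session ID, and so one distinct filename, per sorter.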

@yarikoptic
Member

yarikoptic commented Aug 9, 2023

Believe it or not but I am thrilled to hear all your arguments for storing raw and processed spike sorted data in different .nwb files -- that is how I kept suggesting it should be done so "great minds think alike" ;)

  • dandi organize is just a helper; using it is not mandatory. As long as the naming of folders and files follows either the DANDI (output of dandi organize) or the BIDS convention, we should be good!
    • DANDI convention: we use only a set of fields based on the metadata we extract from nwb files - https://github.com/dandi/dandi-cli/blob/HEAD/dandi/consts.py#L189 . ATM there is indeed no "semantic" field (though there is an _obj- field) that would meaningfully distinguish raw from spike sorted files.
    • At the BIDS level, work to formalize animal ephys data is still ongoing within https://bids.neuroimaging.io/bep032 . AFAIK it has not yet gotten to "spike sorted" data. Since it is a common case, I would expect some dedicated entity or even suffix (e.g. _units) to annotate such files. I left a comment/question in that BEP032 google doc. I see a meeting coming up next Wed (right @SylvainTakerkart?), so maybe we could briefly discuss it there. But meanwhile we could introduce both a suffix (_units?) and the use of _desc- entities (so e.g., sub-mice1_ses-1_ephys.nwb and sub-mice1_ses-1_desc-kilosort1_units.nwb), and see if we could teach dandi organize to even automagically populate them. Do you have some sample files (raw + 2 different spike sorting ones)?

edit 1: fixed typos and added an example
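
The proposed naming can be sketched as a small builder that joins key-value entities with underscores and ends with a suffix. This is only an illustration of the idea in the comment above: the function bids_like_name is hypothetical, and the _desc- entity and _units suffix were proposals at the time, not an accepted convention.

```python
def bids_like_name(entities: dict[str, str], suffix: str, ext: str = ".nwb") -> str:
    """Join entities as key-value pairs with "_" and append a suffix.

    Underscores are rejected inside keys and values because they act
    as the separator between entities.
    """
    for key, value in entities.items():
        if "_" in key or "_" in value:
            raise ValueError(f"underscore not allowed in entity {key!r}={value!r}")
    parts = [f"{key}-{value}" for key, value in entities.items()]
    return "_".join(parts + [suffix]) + ext


# The two example names from the comment above.
raw_name = bids_like_name({"sub": "mice1", "ses": "1"}, "ephys")
units_name = bids_like_name({"sub": "mice1", "ses": "1", "desc": "kilosort1"}, "units")
print(raw_name)
print(units_name)
```

With this layout, a second sorter only changes the desc value, so the raw file and each sorting result stay grouped under the same subject and session.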

@CodyCBakerPhD
Contributor

Oh, that reminds me - the only caveat is that the session ID cannot contain underscores, since those are used as separator characters in the DANDI filename convention; I just replace them with dashes usually
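
A minimal sketch of that replacement, assuming the session ID is available as a plain string; the function name and example ID are hypothetical.

```python
def sanitize_session_id(session_id: str) -> str:
    """Replace underscores with dashes so the ID cannot be mistaken
    for several underscore-separated filename fields."""
    return session_id.replace("_", "-")


# Hypothetical session ID containing underscores.
print(sanitize_session_id("m108_191125_163508"))
```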

@magland
Author

magland commented Aug 9, 2023

@yarikoptic that makes sense.

I prepared a file called sub-paired-english/sub-paired-english_ses-paired-english-m108-191125-163508_desc-ms5-units_ecephys.nwb

and I tried to upload with the cli using

dandi upload

But I get an error because the name does not conform. Is there a different way I can upload?
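
To make the structure of that filename explicit, here is a rough sketch that splits it into its underscore-separated entities and trailing suffix. The parser is hypothetical and is not how dandi validate works internally; it only shows which pieces the name contains, including the desc entity.

```python
from pathlib import PurePosixPath


def parse_entities(path: str) -> tuple[dict[str, str], str]:
    """Split a filename into {key: value} entities and the final suffix.

    Fields are separated by "_"; within a field, the first "-" separates
    the key from the value, so values may themselves contain dashes.
    """
    stem = PurePosixPath(path).stem  # drops the folder and the .nwb extension
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix


entities, suffix = parse_entities(
    "sub-paired-english/"
    "sub-paired-english_ses-paired-english-m108-191125-163508_desc-ms5-units_ecephys.nwb"
)
print(entities)
print(suffix)
```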

@yarikoptic
Member

> But I get an error because the name does not conform. Is there a different way I can upload?

it would not conform until we allow for the _desc field. Just disable validation for now. What API do you use for upload and what error do you get?

@magland
Author

magland commented Aug 9, 2023

Thanks, I have disabled validation and then the command went through. I have the example data here!
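
The thread does not say exactly how validation was disabled. One possibility, sketched here as an assumption, is the --validation option of dandi upload with the value skip. The helper below only builds the command; it does not run it, and the path is just the example folder from above.

```python
import shlex


def upload_command(paths: list[str], validation: str = "skip") -> list[str]:
    """Build (but do not run) a dandi upload command line.

    Assumes `dandi upload` accepts `--validation skip`; check
    `dandi upload --help` for the options in your installed version.
    """
    return ["dandi", "upload", "--validation", validation, *paths]


cmd = upload_command(["sub-paired-english"])
print(shlex.join(cmd))
# To actually run it from inside the dandiset folder:
# import subprocess; subprocess.run(cmd, check=True)
```

Skipping validation is only a stopgap until the naming is officially supported, since it also bypasses the other checks.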

https://dandiarchive.org/dandiset/000618/draft/files?location=sub-paired-english

You can view the raster plot in neurosift.

@yarikoptic
Member

oh neurosift is nice! But it seems to me I can't see anything interesting for units -- please guide me:

image

maybe the errors in the console are relevant?

image

@yarikoptic
Member

on 2nd try, when I clicked right away on "raster plot" it worked!

@magland
Author

magland commented Aug 9, 2023

> on 2nd try, when I clicked right away on "raster plot" it worked!

Great! You can also click on autocorrelograms.

@yarikoptic
Member

support for _desc should come in #1315 . I think, as it is a very generic and useful entity in BIDS, we should adopt it too. Yet to see if it would be feasible for dandi organize to automagically figure some label though. Ideas?

@magland
Author

magland commented Aug 9, 2023

> support for _desc should come in #1315 . I think, as it is a very generic and useful entity in BIDS, we should adopt it too. Yet to see if it would be feasible for dandi organize to automagically figure some label though. Ideas?

Maybe there could be an optional dandi_desc attribute in the NWB file? But maybe it shouldn't have the word "dandi", not sure.


3 participants