
Metadata / validation not caught before attempting to upload #1270

Open
jeromelecoq opened this issue Apr 5, 2023 · 48 comments

Comments

@jeromelecoq

I am getting a sequence of errors when some metadata is missing:

(nwb) jerome.lecoq@OSXLTCYGQCV upload % nwbinspector ./to_upload --config dandi


NWBInspector Report Summary

Timestamp: 2023-04-05 13:50:51.651946-07:00
Platform: macOS-12.6.3-arm64-arm-64bit
NWBInspector version: 0.4.26

Found 17 issues over 1 files:
2 - BEST_PRACTICE_VIOLATION
15 - BEST_PRACTICE_SUGGESTION


0 BEST_PRACTICE_VIOLATION

0.0 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_index_series_points_to_image - 'IndexSeries' object at location '/stimulus/presentation/natural_movie_three_stimulus'
Message: Pointing an IndexSeries to a TimeSeries will be deprecated. Please point to an Images container instead.

0.1 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_index_series_points_to_image - 'IndexSeries' object at location '/stimulus/presentation/natural_movie_one_stimulus'
Message: Pointing an IndexSeries to a TimeSeries will be deprecated. Please point to an Images container instead.

1 BEST_PRACTICE_SUGGESTION

1.2 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Device' object at location '/general/devices/eye-tracking camera'
Message: Description is missing.

1.3 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Device' object at location '/general/devices/display monitor'
Message: Description is missing.

1.4 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Device' object at location '/general/devices/Microscope'
Message: Description is missing.

1.5 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Device' object at location '/general/devices/2-photon microscope'
Message: Description is missing.

1.6 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'Images' object with name 'SegmentationImages'
Message: Description ('no description') is a placeholder.

1.7 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'GrayscaleImage' object with name 'mean'
Message: Description is missing.

1.8 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_description - 'GrayscaleImage' object with name 'correlation'
Message: Description is missing.

1.9 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_small_dataset_compression - 'OpticalSeries' object at location '/stimulus/templates/natural_movie_three_image_stack'
Message: data is not compressed. Consider enabling compression when writing a dataset.

1.10 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_small_dataset_compression - 'OpticalSeries' object at location '/stimulus/templates/natural_movie_one_image_stack'
Message: data is not compressed. Consider enabling compression when writing a dataset.

1.11 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_experimenter_exists - 'NWBFile' object at location '/'
Message: Experimenter is missing.

1.12 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_experiment_description - 'NWBFile' object at location '/'
Message: Experiment description is missing.

1.13 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_keywords - 'NWBFile' object at location '/'
Message: Metadata /general/keywords is missing.

1.14 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_column_binary_capability - 'TimeIntervals' object with name 'trials'
Message: Column 'blank_sweep' uses 'float32' but has binary values [0. 1.]. Consider making it boolean instead and renaming the column to start with 'is_'; doing so will save 1.88KB.

1.15 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_column_binary_capability - 'PlaneSegmentation' object with name 'PlaneSegmentation'
Message: Column 'Accepted' uses 'integers' but has binary values [0 1]. Consider making it boolean instead and renaming the column to start with 'is_'; doing so will save 13.02KB.

1.16 to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb: check_column_binary_capability - 'PlaneSegmentation' object with name 'PlaneSegmentation'
Message: Column 'Rejected' uses 'integers' but has binary values [0 1]. Consider making it boolean instead and renaming the column to start with 'is_'; doing so will save 13.02KB.

(nwb) jerome.lecoq@OSXLTCYGQCV upload % cd 000459
(nwb) jerome.lecoq@OSXLTCYGQCV 000459 % dandi organize ../to_upload
2023-04-05 13:51:11,061 [ WARNING] A newer version (0.52.0) of dandi/dandi-cli is available. You are using 0.51.0
2023-04-05 13:51:11,490 [ INFO] NumExpr defaulting to 8 threads.
2023-04-05 13:51:12,251 [ INFO] Loading metadata from 1 files
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 2.6s
[Parallel(n_jobs=-1)]: Done 1 out of 1 | elapsed: 2.6s finished
2023-04-05 13:51:14,851 [ WARNING] Completely empty record for ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb
2023-04-05 13:51:14,851 [ INFO] Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405205110Z-75206.log
Error: 1 out of 1 files were found not containing all necessary metadata: ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb

@jeromelecoq
Author

It looks like nwbinspector is not catching the validation issue?

@CodyCBakerPhD
Contributor

Can you try dandi validate --ignore DANDI.NO_DANDISET_FOUND <source_folder> before dandi organize?

They've been adding more content beyond the Inspector lately. Also, could you share the log file /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405205110Z-75206.log? Maybe it has a clue as to what metadata might be missing.

@jeromelecoq
Author

jeromelecoq commented Apr 5, 2023

I am not sure that works as intended

(nwb) jerome.lecoq@OSXLTCYGQCV upload % cd 000459 
(nwb) jerome.lecoq@OSXLTCYGQCV 000459 % dandi validate --ignore DANDI.NO_DANDISET_FOUND ../to_upload 
2023-04-05 15:32:25,088 [    INFO] NumExpr defaulting to 8 threads.
2023-04-05 15:32:28,767 [    INFO] Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405223222Z-76404.log
Error: Path '../to_upload' is not inside Dandiset path '/Users/jerome.lecoq/Documents/Work documents/Allen Institute/Projects/DendriticColumns/dandiset/upload/000459'
(nwb) jerome.lecoq@OSXLTCYGQCV 000459 % dandi organize ../to_upload                                 
2023-04-05 15:35:12,506 [    INFO] NumExpr defaulting to 8 threads.
2023-04-05 15:35:13,308 [    INFO] Loading metadata from 1 files
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:    2.6s
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:    2.6s finished
2023-04-05 15:35:15,898 [ WARNING] Completely empty record for ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb
2023-04-05 15:35:15,899 [    INFO] Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405223511Z-76664.log
Error: 1 out of 1 files were found not containing all necessary metadata: ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb
(nwb) jerome.lecoq@OSXLTCYGQCV 000459 % dandi validate --ignore DANDI.NO_DANDISET_FOUND ../to_upload
2023-04-05 15:35:22,661 [    INFO] NumExpr defaulting to 8 threads.
2023-04-05 15:35:23,445 [    INFO] Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405223521Z-76685.log
Error: Path '../to_upload' is not inside Dandiset path '/Users/jerome.lecoq/Documents/Work documents/Allen Institute/Projects/DendriticColumns/dandiset/upload/000459'

@jeromelecoq
Author

Here is the content of the log:

2023-04-05T13:51:10-0700 [INFO    ] dandi 75206:4308206976 dandi v0.51.0, hdmf v3.5.2, pynwb v2.3.1, h5py v3.7.0
2023-04-05T13:51:10-0700 [INFO    ] dandi 75206:4308206976 sys.argv = ['/Users/jerome.lecoq/opt/miniconda3/envs/nwb/bin/dandi', 'organize', '../to_upload']
2023-04-05T13:51:10-0700 [INFO    ] dandi 75206:4308206976 os.getcwd() = /Users/jerome.lecoq/Documents/Work documents/Allen Institute/Projects/DendriticColumns/dandiset/upload/000459
2023-04-05T13:51:10-0700 [DEBUG   ] urllib3.connectionpool 75206:4308206976 Starting new HTTPS connection (1): rig.mit.edu:443
2023-04-05T13:51:11-0700 [DEBUG   ] urllib3.connectionpool 75206:4308206976 https://rig.mit.edu:443 "GET /et/projects/dandi/dandi-cli HTTP/1.1" 200 579
2023-04-05T13:51:11-0700 [WARNING ] dandi 75206:4308206976 A newer version (0.52.0) of dandi/dandi-cli is available. You are using 0.51.0
2023-04-05T13:51:11-0700 [DEBUG   ] h5py._conv 75206:4308206976 Creating converter from 7 to 5
2023-04-05T13:51:11-0700 [DEBUG   ] h5py._conv 75206:4308206976 Creating converter from 5 to 7
2023-04-05T13:51:11-0700 [DEBUG   ] h5py._conv 75206:4308206976 Creating converter from 7 to 5
2023-04-05T13:51:11-0700 [DEBUG   ] h5py._conv 75206:4308206976 Creating converter from 5 to 7
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'zlib'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'gzip'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'bz2'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'lzma'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'blosc'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'zstd'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'lz4'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'astype'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'delta'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'quantize'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'fixedscaleoffset'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'packbits'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'categorize'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'pickle'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'base64'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'shuffle'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'bitround'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'msgpack2'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'crc32'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'adler32'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'json2'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'vlen-utf8'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'vlen-bytes'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'vlen-array'
2023-04-05T13:51:11-0700 [DEBUG   ] numcodecs 75206:4308206976 Registering codec 'n5_wrapper'
2023-04-05T13:51:11-0700 [INFO    ] numexpr.utils 75206:4308206976 NumExpr defaulting to 8 threads.
2023-04-05T13:51:12-0700 [INFO    ] dandi 75206:4308206976 Loading metadata from 1 files
2023-04-05T13:51:14-0700 [WARNING ] dandi 75206:4308206976 Completely empty record for ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb
2023-04-05T13:51:14-0700 [DEBUG   ] dandi 75206:4308206976 Caught exception 1 out of 1 files were found not containing all necessary metadata: ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb
2023-04-05T13:51:14-0700 [INFO    ] dandi 75206:4308206976 Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405205110Z-75206.log

@CodyCBakerPhD
Contributor

Last idea of mine: 2023-04-05T13:51:11-0700 [WARNING ] dandi 75206:4308206976 A newer version (0.52.0) of dandi/dandi-cli is available. You are using 0.51.0

Try upgrading with pip install -U dandi and retrying?

Otherwise I defer to @yarikoptic on what looks to be bugs on the DANDI CLI side of things

@jeromelecoq
Author

Ah yes, I upgraded after that run. Same error.

@yarikoptic
Member

so the heart of the problem is the message(s) from dandi organize

2023-04-05 13:51:14,851 [ WARNING] Completely empty record for ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb
2023-04-05 13:51:14,851 [ INFO] Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230405205110Z-75206.log
Error: 1 out of 1 files were found not containing all necessary metadata: ../to_upload/Rorb-IRES2-Cre_590168381_590168385.nwb

correct? I guess we might improve the message there. What it means is that the file contains no fields of interest for organize, not even the object_id. There was a recent issue filed for that (#1266) -- maybe the situation described there rings a bell?

you could use dandi ls on those files to see all metadata we load, and something like this

find pynwb -iname '*.nwb' | while read p; do echo $p; python -c 'import sys,yaml; from dandi.pynwb_utils import get_object_id; print(get_object_id(sys.argv[1]));' $p; done

to go through your .nwb files and print their object_ids (we might want to add printing object_id by dandi ls).
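The shell one-liner above can also be written as a short Python script. Here is a sketch using only the standard library; the dandi helper call (`get_object_id` from `dandi.pynwb_utils`, as in the one-liner) is left as a comment since it requires dandi to be installed:

```python
from pathlib import Path


def list_nwb_files(root="."):
    """Recursively collect .nwb files under root (case-insensitive suffix)."""
    return sorted(p for p in Path(root).rglob("*") if p.suffix.lower() == ".nwb")


if __name__ == "__main__":
    for path in list_nwb_files():
        print(path)
        # Same call as in the shell one-liner above (needs dandi installed):
        # from dandi.pynwb_utils import get_object_id
        # print(get_object_id(str(path)))
```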

Then you can see which metadata fields are used by organize to construct filenames here: https://github.com/dandi/dandi-cli/blob/HEAD/dandi/consts.py#L177 . The current explanation is that among those fields there was no information with which to name that file. What did you expect it to get named?
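As a quick sanity check against that field list, one can diff a file's loaded metadata dict against the fields organize cares about. A minimal sketch; the field names shown are only an illustrative subset (the authoritative list lives in dandi/consts.py):

```python
def report_missing_fields(metadata, fields=("subject_id", "session_id")):
    """Return the organize-relevant fields that are absent or empty.

    `fields` is an illustrative subset only; see dandi/consts.py for the
    real list used to construct filenames.
    """
    return [f for f in fields if not metadata.get(f)]


# Example record, matching the values printed later in this thread:
meta = {"subject_id": "575296278", "session_id": "590168385"}
assert report_missing_fields(meta) == []
assert report_missing_fields({"session_id": "590168385"}) == ["subject_id"]
```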

@jeromelecoq
Author

I ran the script you suggested:

(nwb) jerome.lecoq@OSXLTCYGQCV to_upload % find . -iname *.nwb | while read p; do echo $p; python -c 'import sys,yaml; from dandi.pynwb_utils import get_object_id; print(get_object_id(sys.argv[1]));' $p; done
./Rorb-IRES2-Cre_590168381_590168385.nwb
86fb9251-666e-4ac6-b246-ed7e2747c238

@jeromelecoq
Author

jeromelecoq commented Apr 6, 2023

To note, these are NWB files that I created by merging the output of suite2p+NeuroConv with data from Allen Institute Visual coding NWB 1.0 files. It looks like maybe some metadata needs to move around.

@jeromelecoq
Author

I am not entirely sure what is missing.

Here is the output of pynwb on this file:

input_nwb
root pynwb.file.NWBFile at 0x5050990496
Fields:
devices: {
2-photon microscope <class 'pynwb.device.Device'>,
Microscope <class 'pynwb.device.Device'>,
display monitor <class 'pynwb.device.Device'>,
eye-tracking camera <class 'pynwb.device.Device'>
}
file_create_date: [datetime.datetime(2023, 3, 18, 16, 18, 27, 551472, tzinfo=tzutc())]
identifier: 64ae8bcc-92ea-4b0c-a207-130dd959045b_test_IDs
imaging_planes: {
ImagingPlane <class 'pynwb.ophys.ImagingPlane'>
}
institution: Allen Institute for Brain Science
intervals: {
trials <class 'pynwb.epoch.TimeIntervals'>
}
processing: {
behavior <class 'pynwb.base.ProcessingModule'>,
ophys <class 'pynwb.base.ProcessingModule'>
}
session_description: no description
session_id: 590168385
session_start_time: 2020-01-01 12:30:00-08:00
stimulus: {
natural_movie_one_stimulus <class 'pynwb.image.IndexSeries'>,
natural_movie_one_stimulus_frame_duration <class 'pynwb.image.ImageSeries'>,
natural_movie_three_stimulus <class 'pynwb.image.IndexSeries'>,
natural_movie_three_stimulus_frame_duration <class 'pynwb.image.ImageSeries'>,
spontaneous_stimulus <class 'pynwb.misc.IntervalSeries'>
}
stimulus_template: {
natural_movie_one_image_stack <class 'pynwb.image.OpticalSeries'>,
natural_movie_three_image_stack <class 'pynwb.image.OpticalSeries'>
}
subject: subject pynwb.file.Subject at 0x5050983488
Fields:
age: P113D
age__reference: birth
description: Mus musculus in vivo
genotype: Rorb-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-GCaMP6f)/wt
sex: M
species: Mus musculus
subject_id: 575296278

timestamps_reference_time: 2020-01-01 12:30:00-08:00
trials: trials <class 'pynwb.epoch.TimeIntervals'>

@jeromelecoq
Author

We are not sure which metadata is missing. Ahad and I were wondering if something else was crashing organize.

See here for an example of these files : https://www.dropbox.com/s/qwv4i2zh0un4v9d/Rorb-IRES2-Cre_590168381_590168385.nwb?dl=0

@CodyCBakerPhD
Contributor

We are not sure which metadata is missing.

From the printout of your NWB file, it looks like you ought to have everything DANDI currently requires (at least to my knowledge). Thanks for including that

Ahad and I were wondering if something else was crashing organize.

That is my best guess now as well

@jeromelecoq
Author

If that is helpful, I am comparing the content of this NWB file with another file that dandi organize actually likes and that is already on DANDI.

WORKS:

Fields:
  devices: {
    2p_microscope <class 'pynwb.device.Device'>
  }
  file_create_date: [datetime.datetime(2022, 9, 25, 4, 53, 2, 714938, tzinfo=tzoffset(None, -25200))]
  identifier: 758519303
  imaging_planes: {
    ImagingPlane <class 'pynwb.ophys.ImagingPlane'>
  }
  institution: Allen Institute for Brain Science
  intervals: {
    trials <class 'pynwb.epoch.TimeIntervals'>
  }
  processing: {
    behavior <class 'pynwb.base.ProcessingModule'>,
    ophys <class 'pynwb.base.ProcessingModule'>
  }
  session_description: Allen Institute OpenScope dataset
  session_id: 758519303
  session_start_time: 2018-09-26 17:29:17.502000-07:00
  subject: subject pynwb.file.Subject at 0x5192267424
Fields:
  age: P95D
  genotype: Cux2-CreERT2;Camk2a-tTA;Ai93
  sex: M
  species: Mus musculus
  subject_id: 408021

  timestamps_reference_time: 2018-09-26 17:29:17.502000-07:00
  trials: trials <class 'pynwb.epoch.TimeIntervals'>

DOES NOT WORK

Fields:
  devices: {
    2-photon microscope <class 'pynwb.device.Device'>,
    Microscope <class 'pynwb.device.Device'>,
    display monitor <class 'pynwb.device.Device'>,
    eye-tracking camera <class 'pynwb.device.Device'>
  }
  file_create_date: [datetime.datetime(2023, 3, 18, 16, 18, 27, 551472, tzinfo=tzutc())]
  identifier: 64ae8bcc-92ea-4b0c-a207-130dd959045b_test_IDs
  imaging_planes: {
    ImagingPlane <class 'pynwb.ophys.ImagingPlane'>
  }
  institution: Allen Institute for Brain Science
  intervals: {
    trials <class 'pynwb.epoch.TimeIntervals'>
  }
  processing: {
    behavior <class 'pynwb.base.ProcessingModule'>,
    ophys <class 'pynwb.base.ProcessingModule'>
  }
  session_description: no description
  session_id: 590168385
  session_start_time: 2020-01-01 12:30:00-08:00
  stimulus: {
    natural_movie_one_stimulus <class 'pynwb.image.IndexSeries'>,
    natural_movie_one_stimulus_frame_duration <class 'pynwb.image.ImageSeries'>,
    natural_movie_three_stimulus <class 'pynwb.image.IndexSeries'>,
    natural_movie_three_stimulus_frame_duration <class 'pynwb.image.ImageSeries'>,
    spontaneous_stimulus <class 'pynwb.misc.IntervalSeries'>
  }
  stimulus_template: {
    natural_movie_one_image_stack <class 'pynwb.image.OpticalSeries'>,
    natural_movie_three_image_stack <class 'pynwb.image.OpticalSeries'>
  }
  subject: subject pynwb.file.Subject at 0x5194424160
Fields:
  age: P113D
  age__reference: birth
  description: Mus musculus in vivo
  genotype: Rorb-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-GCaMP6f)/wt
  sex: M
  species: Mus musculus
  subject_id: 575296278

  timestamps_reference_time: 2020-01-01 12:30:00-08:00
  trials: trials <class 'pynwb.epoch.TimeIntervals'>

@jeromelecoq
Author

Could it be fields that should NOT be there?

@CodyCBakerPhD
Contributor

Could it be fields that should NOT be there?

Doubtful; when it comes to metadata, the more information that can be included the better, so I don't believe there are any 'forbidden' contents.

Something I did just notice is the underscores in identifier='64ae8bcc-92ea-4b0c-a207-130dd959045b_test_IDs'. @yarikoptic, would that cause a problem, do you think?

@jeromelecoq
Author

I just removed most of the identifier:

path = 'Rorb-IRES2-Cre_590168381_590168385.nwb'
import h5py
X = h5py.File(path, 'r+')
X.keys()
<KeysViewHDF5 ['acquisition', 'analysis', 'file_create_date', 'general', 'identifier', 'intervals', 'processing', 'session_description', 'session_start_time', 'specifications', 'stimulus', 'timestamps_reference_time']>
X['identifier']
<HDF5 dataset "identifier": shape (), type "|O">
X['identifier']
<HDF5 dataset "identifier": shape (), type "|O">
X['identifier'][()]
b'64ae8bcc-92ea-4b0c-a207-130dd959045b_test_IDs'
local_data = X['identifier'][()]
local_data[0:8]
b'64ae8bcc'
X['identifier'][()] = local_data[0:8]
X['identifier'][()]
b'64ae8bcc'
X.close()
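As plain string handling, the session above keeps only the first 8 characters of the identifier. Dropping just the `_test_IDs` suffix (the part with the underscores flagged above) while keeping the whole UUID would be the gentler edit; a sketch:

```python
identifier = "64ae8bcc-92ea-4b0c-a207-130dd959045b_test_IDs"

# What the h5py session above wrote back: the first 8 characters only.
short_id = identifier[:8]

# Alternative: strip only the trailing suffix and keep the full UUID.
uuid_only = identifier.split("_", 1)[0]
```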

@jeromelecoq
Author

same error

@satra
Member

satra commented Apr 6, 2023

@jwodder and @yarikoptic - this section of the reader is resulting in an error - perhaps that results in the issue @jeromelecoq is seeing:

using _get_pynwb_metadata("/Users/satra/Downloads/Rorb-IRES2-Cre_590168381_590168385.nwb") in pynwb_utils

File ~/software/dandi/dandi-cli/dandi/pynwb_utils.py:210, in _get_pynwb_metadata(path)
    206 out = {}
    207 with open_readable(path) as fp, h5py.File(fp) as h5, NWBHDF5IO(
    208     file=h5, load_namespaces=True
    209 ) as io:
--> 210     nwb = io.read()
    211     for key in metadata_nwb_file_fields:
    212         value = getattr(nwb, key)

results in:

ConstructError: (root/stimulus/presentation/natural_movie_one_stimulus GroupBuilder {'attributes': {'comments': 'The data stored here is a precursor for what was displayed. Please see http://help.brain-map.org/download/attachments/10616846/VisualCoding_VisualStimuli.pdf for instructions for how to convert this to actual stimulus data', 'description': 'natural_movie_one_stimulus', 'namespace': 'core', 'neurodata_type': 'IndexSeries', 'object_id': '42360a35-1bd0-4f36-b9cc-0dc461ad4438'}, 'groups': {}, 'datasets': {'data': root/stimulus/presentation/natural_movie_one_stimulus/data DatasetBuilder {'attributes': {'conversion': 1.0, 'offset': 0.0, 'resolution': -1.0, 'unit': 'N/A'}, 'data': <Closed HDF5 dataset>}, 'timestamps': root/stimulus/presentation/natural_movie_one_stimulus/timestamps DatasetBuilder {'attributes': {'interval': 1, 'unit': 'seconds'}, 'data': <Closed HDF5 dataset>}}, 'links': {}}, 'Could not construct IndexSeries object due to: Either indexed_timeseries or indexed_images must be provided when creating an IndexSeries.')

where as this works just fine:

In [17]: with NWBHDF5IO("path_to_file.nwb", load_namespaces=True) as io:
    ...:     nwb = io.read()
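The ConstructError quoted above boils down to a constructor-level check: an IndexSeries must be given either an indexed_timeseries or an indexed_images target. A stdlib sketch of that kind of one-of-two requirement (a stand-in for illustration, not pynwb's actual code):

```python
class IndexSeriesSketch:
    """Minimal stand-in for pynwb's IndexSeries argument check."""

    def __init__(self, name, indexed_timeseries=None, indexed_images=None):
        if indexed_timeseries is None and indexed_images is None:
            # Same condition that produces the ConstructError quoted above.
            raise ValueError(
                "Either indexed_timeseries or indexed_images must be "
                "provided when creating an IndexSeries."
            )
        self.name = name
        self.target = (
            indexed_timeseries if indexed_timeseries is not None else indexed_images
        )
```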

@jeromelecoq
Author

Ah that seems like it. Yes. I tested the io.read() but not the thing above. We just need to find the key it crashes on?

@yarikoptic
Member

Thanks for digging!

hm, I have tried to reproduce while incrementally building up how I open it

$> cat Rorb-IRES2-Cre_590168381_590168385.py
from pynwb import NWBHDF5IO
import h5py
from dandi.pynwb_utils import open_readable

fname = "Rorb-IRES2-Cre_590168381_590168385.nwb"

with NWBHDF5IO(fname, load_namespaces=True) as io:
      nwb = io.read()
print("way 1 worked")

with open(fname, 'rb') as fp, h5py.File(fp) as h5, NWBHDF5IO(file=h5, load_namespaces=True) as io:
      nwb = io.read()
print("way 2 worked")

with open_readable(fname) as fp, h5py.File(fp) as h5, NWBHDF5IO(file=h5, load_namespaces=True) as io:
      nwb = io.read()
print("way 3 worked")

and they all worked out

$> python Rorb-IRES2-Cre_590168381_590168385.py
way 1 worked
way 2 worked
way 3 worked

@jeromelecoq
Author

jeromelecoq commented Apr 6, 2023

So I used a variant of this code, https://github.com/rly/aibs-nwb1-to-nwb2/blob/038aff3ff09d5093d5acbffad496600a4adc607a/append_suite2p.py#L138, to port visual stimulus objects from NWB 1.0 files to newly created NWB 2.0 files.

What exactly is the sub-object that crashes?

@jeromelecoq
Author

@yarikoptic
Member

  • I can't reproduce the above crash, but can reproduce organize misbehaving
  • running $> DANDI_CACHE=ignore DANDI_DEVEL=1 dandi -l 1 organize --devel-debug ../Rorb-IRES2-Cre_590168381_590168385.nwb gives me more:
2023-04-06 19:18:32,451 [    INFO] Loading metadata from 1 files
2023-04-06 19:18:32,579 [   DEBUG] Failed to get metadata for ../Rorb-IRES2-Cre_590168381_590168385.nwb: NWB files with external 
links are not supported: /home/yoh/proj/dandi/nwb-files/Rorb-IRES2-Cre_590168381_590168385.nwb
2023-04-06 19:18:32,580 [ WARNING] Failed to load metadata for 1 out of 1 files

which is due to https://github.com/dandi/dandi-cli/blob/HEAD/dandi/metadata.py#L110 which was added in #843 to "address" #840 .

If my analysis is right, the "solution" here might be

  • improve error reporting in dandi-cli to give a clear reason why organize failed -- external links
  • prepare the NWB file and the referenced external files properly named in the DANDI convention to start with, and just copy them instead of using dandi organize.
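For the second option, here is a hedged sketch of what a DANDI-convention filename looks like. It is deliberately simplified; dandi organize (driven by the consts.py fields linked earlier) is the authority on the exact rules:

```python
def dandi_style_name(subject_id, session_id=None, suffix="ophys"):
    """Build a simplified sub-<id>[_ses-<id>]_<suffix>.nwb filename.

    This mirrors the general DANDI layout convention only loosely;
    the real naming logic is implemented by dandi organize.
    """
    parts = [f"sub-{subject_id}"]
    if session_id is not None:
        parts.append(f"ses-{session_id}")
    parts.append(suffix)
    return "_".join(parts) + ".nwb"
```

For the subject and session IDs printed earlier in this thread, that would yield something like sub-575296278_ses-590168385_ophys.nwb.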

@jeromelecoq
Author

Can you clarify how I can address the error? Should I remove external links?

@satra
Member

satra commented Apr 6, 2023

@yarikoptic - perhaps it's a version thing. in a fresh mamba environment on my m1 tin can:

mamba create -n testnwb ipython pip python=3.10
mamba activate testnwb 
pip install dandi

and then

from dandi.metadata import _get_pynwb_metadata
_get_pynwb_metadata("Rorb-IRES2-Cre_590168381_590168385.nwb")

the error (which points to the links as well i think):

ConstructError: (root/stimulus/presentation/natural_movie_one_stimulus GroupBuilder {'attributes': {'comments': 'The data stored here is a precursor for what was displayed. Please see http://help.brain-map.org/download/attachments/10616846/VisualCoding_VisualStimuli.pdf for instructions for how to convert this to actual stimulus data', 'description': 'natural_movie_one_stimulus', 'namespace': 'core', 'neurodata_type': 'IndexSeries', 'object_id': '42360a35-1bd0-4f36-b9cc-0dc461ad4438'}, 'groups': {}, 'datasets': {'data': root/stimulus/presentation/natural_movie_one_stimulus/data DatasetBuilder {'attributes': {'conversion': 1.0, 'offset': 0.0, 'resolution': -1.0, 'unit': 'N/A'}, 'data': <Closed HDF5 dataset>}, 'timestamps': root/stimulus/presentation/natural_movie_one_stimulus/timestamps DatasetBuilder {'attributes': {'interval': 1, 'unit': 'seconds'}, 'data': <Closed HDF5 dataset>}}, 'links': {}}, 'Could not construct IndexSeries object due to: Either indexed_timeseries or indexed_images must be provided when creating an IndexSeries.')

some relevant bits:

dandi                     0.52.0                   pypi_0    pypi
h5py                      3.8.0                    pypi_0    pypi
hdmf                      3.5.2                    pypi_0    pypi
pynwb                     2.3.1                    pypi_0    pypi

@satra
Member

satra commented Apr 6, 2023

@jeromelecoq - this may help: https://www.dandiarchive.org/2022/03/03/external-links-organize.html (perhaps @CodyCBakerPhD could say if its still up to date)

@jeromelecoq
Author

I am not sure why there are external links with the movies. I can access the raw data directly. It looks like the raw movie is in the template.

@jeromelecoq
Author

I can't seem to replicate it:

>>> from dandi.metadata import _get_pynwb_metadata
>>> path
'Rorb-IRES2-Cre_590168381_590168385.nwb'
>>> _get_pynwb_metadata(path)
{'experiment_description': None, 'experimenter': None, 'identifier': '64ae8bcc', 'institution': 'Allen Institute for Brain Science', 'keywords': None, 'lab': None, 'related_publications': None, 'session_description': 'no description', 'session_id': '590168385', 'session_start_time': datetime.datetime(2020, 1, 1, 12, 30, tzinfo=tzoffset(None, -28800)), 'age': 'P113D', 'date_of_birth': None, 'genotype': 'Rorb-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-GCaMP6f)/wt', 'sex': 'M', 'species': 'Mus musculus', 'subject_id': '575296278', 'number_of_electrodes': 0, 'number_of_units': 0, 'external_file_objects': []}

@jeromelecoq
Author

>>> pynwb.__version__
'2.3.1'
>>> dandi.__version__
'0.52.0'
>>> hdmf.__version__
'3.5.2'

@jeromelecoq
Author

jerome.lecoq@OSXLTCYGQCV to_upload % python
Python 3.10.8

@jeromelecoq
Author

So I am not sure how it happened, but I have an external link to a dataset in the same file ...

X.get('/stimulus/presentation/natural_movie_one_stimulus/indexed_timeseries', getlink=True)
<ExternalLink to "/stimulus/templates/natural_movie_one_image_stack" in file "Rorb-IRES2-Cre_590168381_590168385.nwb">

X.get('/stimulus/templates/natural_movie_one_image_stack', getlink=True)
<h5py._hl.group.HardLink at 0x114e22350>
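The `get(..., getlink=True)` probe above generalizes into a small scan over the whole file. A sketch (assumes h5py is installed; it uses the same `getlink=True` call so external links are detected without being dereferenced into the target file):

```python
import h5py


def find_external_links(path):
    """Return (h5_path, target_file, target_path) for every external link."""
    found = []

    def walk(group, prefix):
        for key in group:
            link = group.get(key, getlink=True)
            name = f"{prefix}/{key}"
            if isinstance(link, h5py.ExternalLink):
                found.append((name, link.filename, link.path))
                continue  # do not dereference into the external file
            obj = group.get(key)
            if isinstance(obj, h5py.Group):
                walk(obj, name)

    with h5py.File(path, "r") as f:
        walk(f, "")
    return found
```

Running this against the file above should surface the /stimulus/presentation link shown in the transcript.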

@satra
Member

satra commented Apr 7, 2023

thanks @jeromelecoq - that suggests something else is going on on my machine. still trying to get a clean read.

@jeromelecoq
Author

It does seem that this is related to ExternalLinks.

This link is between datasets in the same file. It was created by this line of code, https://github.com/rly/aibs-nwb1-to-nwb2/blob/038aff3ff09d5093d5acbffad496600a4adc607a/append_suite2p.py#L184, to connect a template with a presentation.

Is that the wrong way to do this?

@jeromelecoq
Author

@satra
Member

satra commented Apr 7, 2023

@jeromelecoq - when did ryan make that suggestion? perhaps the pynwb bug is fixed now and you can go on to addressing the best practice violation suggested in your original post?

@yarikoptic and @jeromelecoq - i can't reproduce the ConstructError on a separate linux machine, but i can on my m1 mac, both natively and using a docker container. and it's interesting that the error points to the same relevant section of code. all coincidence perhaps.

@jeromelecoq
Author

> @jeromelecoq - when did ryan make that suggestion? perhaps the pynwb bug is fixed now and you can go on to addressing the best practice violation suggested in your original post?
>
> @yarikoptic and @jeromelecoq - i can't reproduce the ConstructError on a separate linux machine, but i can on my m1 mac, both natively and using a docker container. and it's interesting that the error points to the same relevant section of code. all coincidence perhaps.

I completely changed the way the natural_movie template is added and used an Images object, per Satra's suggestion. The same error occurs, so this is ruled out. Here is the newer file:
https://www.dropbox.com/s/i708rcvel1r5lwb/Rorb-IRES2-Cre_590168381_590168385-2.nwb?dl=0

Here is a copy of the command output:

(nwb) jerome.lecoq@OSXLTCYGQCV 000459 % DANDI_CACHE=ignore DANDI_DEVEL=1 dandi -l 1 organize --devel-debug ../to_upload/Rorb-IRES2-Cre_590168381_590168385-2.nwb
2023-04-07 10:26:12,116 [   DEBUG] Starting new HTTPS connection (1): rig.mit.edu:443
2023-04-07 10:26:12,671 [   DEBUG] https://rig.mit.edu:443 "GET /et/projects/dandi/dandi-cli HTTP/1.1" 200 579
2023-04-07 10:26:12,673 [   DEBUG] No newer (than 0.52.0) version of dandi/dandi-cli found available
2023-04-07 10:26:12,940 [   DEBUG] Creating converter from 7 to 5
2023-04-07 10:26:12,940 [   DEBUG] Creating converter from 5 to 7
2023-04-07 10:26:12,940 [   DEBUG] Creating converter from 7 to 5
2023-04-07 10:26:12,940 [   DEBUG] Creating converter from 5 to 7
2023-04-07 10:26:12,986 [   DEBUG] Registering codec 'zlib'
2023-04-07 10:26:12,987 [   DEBUG] Registering codec 'gzip'
2023-04-07 10:26:12,988 [   DEBUG] Registering codec 'bz2'
2023-04-07 10:26:12,988 [   DEBUG] Registering codec 'lzma'
2023-04-07 10:26:12,993 [   DEBUG] Registering codec 'blosc'
2023-04-07 10:26:12,996 [   DEBUG] Registering codec 'zstd'
2023-04-07 10:26:12,997 [   DEBUG] Registering codec 'lz4'
2023-04-07 10:26:12,997 [   DEBUG] Registering codec 'astype'
2023-04-07 10:26:12,998 [   DEBUG] Registering codec 'delta'
2023-04-07 10:26:12,998 [   DEBUG] Registering codec 'quantize'
2023-04-07 10:26:12,998 [   DEBUG] Registering codec 'fixedscaleoffset'
2023-04-07 10:26:12,999 [   DEBUG] Registering codec 'packbits'
2023-04-07 10:26:12,999 [   DEBUG] Registering codec 'categorize'
2023-04-07 10:26:12,999 [   DEBUG] Registering codec 'pickle'
2023-04-07 10:26:13,000 [   DEBUG] Registering codec 'base64'
2023-04-07 10:26:13,001 [   DEBUG] Registering codec 'shuffle'
2023-04-07 10:26:13,001 [   DEBUG] Registering codec 'bitround'
2023-04-07 10:26:13,004 [   DEBUG] Registering codec 'msgpack2'
2023-04-07 10:26:13,004 [   DEBUG] Registering codec 'crc32'
2023-04-07 10:26:13,004 [   DEBUG] Registering codec 'adler32'
2023-04-07 10:26:13,004 [   DEBUG] Registering codec 'json2'
2023-04-07 10:26:13,006 [   DEBUG] Registering codec 'vlen-utf8'
2023-04-07 10:26:13,006 [   DEBUG] Registering codec 'vlen-bytes'
2023-04-07 10:26:13,006 [   DEBUG] Registering codec 'vlen-array'
2023-04-07 10:26:13,030 [   DEBUG] Registering codec 'n5_wrapper'
2023-04-07 10:26:13,115 [    INFO] NumExpr defaulting to 8 threads.
2023-04-07 10:26:13,873 [    INFO] Loading metadata from 1 files
2023-04-07 10:26:14,008 [   DEBUG] Failed to get metadata for ../to_upload/Rorb-IRES2-Cre_590168381_590168385-2.nwb: NWB files with external links are not supported: /Users/jerome.lecoq/Documents/Work documents/Allen Institute/Projects/DendriticColumns/dandiset/upload/to_upload/Rorb-IRES2-Cre_590168381_590168385-2.nwb
2023-04-07 10:26:14,008 [ WARNING] Failed to load metadata for 1 out of 1 files
2023-04-07 10:26:14,008 [ WARNING] Completely empty record for ../to_upload/Rorb-IRES2-Cre_590168381_590168385-2.nwb
2023-04-07 10:26:14,008 [   DEBUG] Caught exception 1 out of 1 files were found not containing all necessary metadata: ../to_upload/Rorb-IRES2-Cre_590168381_590168385-2.nwb
2023-04-07 10:26:14,008 [    INFO] Logs saved in /Users/jerome.lecoq/Library/Logs/dandi-cli/20230407172611Z-96788.log
Traceback (most recent call last):
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/bin/dandi", line 8, in <module>
    sys.exit(main())
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/click/decorators.py", line 38, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/dandi/cli/base.py", line 102, in wrapper
    return f(*args, **kwargs)
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/dandi/cli/cmd_organize.py", line 109, in organize
    organize(
  File "/Users/jerome.lecoq/opt/miniconda3/envs/nwb/lib/python3.10/site-packages/dandi/organize.py", line 842, in organize
    raise ValueError(msg)
ValueError: 1 out of 1 files were found not containing all necessary metadata: ../to_upload/Rorb-IRES2-Cre_590168381_590168385-2.nwb

I am very unclear as to what is going on. Should we loop in Ryan here?

@Ahad-Allen

Ahad-Allen commented Apr 7, 2023

Hi all, I am having the same type of error as Jerome:

Completely empty record for /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/000336/sub-621602/sub-621602_ophys.nwb
Traceback (most recent call last):
  File "dandi_uploads.py", line 117, in <module>
    automatic_dandi_upload(nwb_folder_path = r'/allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb', dandiset_id = '000336', session_id=r'1193555033', experiment_id = '1193675753', subject_id='621602')
  File "dandi_uploads.py", line 89, in automatic_dandi_upload
    dandi_organize(paths=str(directory_path), dandiset_path=str(dandi_path_set))
  File "/allen/programs/mindscope/workgroups/openscope/ahad/Conda_env/long_nwb/lib/python3.8/site-packages/dandi/organize.py", line 842, in organize
    raise ValueError(msg)
ValueError: 2 out of 2 files were found not containing all necessary metadata: /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/1193675750raw_data.nwb, /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/000336/sub-621602/sub-621602_ophys.nwb
(/allen/programs/mindscope/workgroups/openscope/ahad/Conda_env/long_nwb) [ahad.bawany@ibs-ahadb-vm1 scripts]$ python dandi_uploads.py 
PATHS:  /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033 /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033 /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/000336
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   2 | elapsed:    7.7s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   2 out of   2 | elapsed:    7.7s finished
Completely empty record for /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/1193675750raw_data.nwb
Completely empty record for /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/000336/sub-621602/sub-621602_ophys.nwb
Traceback (most recent call last):
  File "dandi_uploads.py", line 117, in <module>
    automatic_dandi_upload(nwb_folder_path = r'/allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb', dandiset_id = '000336', session_id=r'1193555033', experiment_id = '1193675753', subject_id='621602')
  File "dandi_uploads.py", line 89, in automatic_dandi_upload
    dandi_organize(paths=str(directory_path), dandiset_path=str(dandi_path_set))
  File "/allen/programs/mindscope/workgroups/openscope/ahad/Conda_env/long_nwb/lib/python3.8/site-packages/dandi/organize.py", line 842, in organize
    raise ValueError(msg)
ValueError: 2 out of 2 files were found not containing all necessary metadata: /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/1193675750raw_data.nwb, /allen/programs/mindscope/workgroups/openscope/ahad/ophys_no_behavior_nwb/1193555033/000336/sub-621602/sub-621602_ophys.nwb

with different NWB files. These files are regenerations of files that were already on DANDI and passed DANDI validation in the past; the only difference is that the subject_id in the subject field has changed.
One important thing to note is that I upgraded from dandi version 0.48.1 to the latest version before attempting these uploads.
I have attached a copy of the file here: https://drive.google.com/file/d/1WCzmOd-V3KtAiy1uN4LB-_ShT1yeoxcD/view?usp=sharing

@yarikoptic
Member

yarikoptic commented Apr 7, 2023

@Ahad-Allen following the above discussion -- do you know if the files include external links?

edit: ignore -- as I showed below, it does not

you can possibly get to the original exception and warnings (which might warn about external links) by running it as DANDI_CACHE=ignore DANDI_DEVEL=1 dandi -l 1 organize --devel-debug ....

@yarikoptic
Member

yarikoptic commented Apr 7, 2023

some relevant bits:

dandi                     0.52.0                   pypi_0    pypi
h5py                      3.8.0                    pypi_0    pypi
hdmf                      3.5.2                    pypi_0    pypi
pynwb                     2.3.1                    pypi_0    pypi
using this script -- those module versions seem to be the same:
from pynwb import NWBHDF5IO
from dandi.consts import metadata_nwb_file_fields
from dandi.pynwb_utils import open_readable
from dandi.pynwb_utils import nwb_has_external_links

import sys


def load(io):
    nwb = io.read()
    for key in metadata_nwb_file_fields:
        value = getattr(nwb, key)

import pkg_resources
import dandi, h5py, hdmf, pynwb
for m in dandi, h5py, hdmf, pynwb:
    print(pkg_resources.get_distribution(m.__name__))


for fname in sys.argv[1:]:
    print(f"{fname} has links: {nwb_has_external_links(fname)}")
    with NWBHDF5IO(fname, load_namespaces=True) as io:
          load(io)
    print("way 1 worked")

    with open(fname, 'rb') as fp, h5py.File(fp) as h5, NWBHDF5IO(file=h5, load_namespaces=True) as io:
          load(io)
    print("way 2 worked")

    with open_readable(fname) as fp, h5py.File(fp) as h5, NWBHDF5IO(file=h5, load_namespaces=True) as io:
          load(io)
    print("way 3 worked")

    from dandi.metadata import _get_pynwb_metadata
    print(_get_pynwb_metadata(fname))
$> DANDI_CACHE=ignore python test_on_nwb.py Rorb-IRES2-Cre_590168381_590168385.nwb
dandi 0.52.0
h5py 3.8.0
hdmf 3.5.2
pynwb 2.3.1
Rorb-IRES2-Cre_590168381_590168385.nwb has links: True
way 1 worked
way 2 worked
way 3 worked
{'experiment_description': None, 'experimenter': None, 'identifier': '64ae8bcc-92ea-4b0c-a207-130dd959045b_test_IDs', 'institution': 'Allen Institute for Brain Science', 'keywords': None, 'lab': None, 'related_publications': None, 'session_description': 'no description', 'session_id': '590168385', 'session_start_time': datetime.datetime(2020, 1, 1, 12, 30, tzinfo=tzoffset(None, -28800)), 'age': 'P113D', 'date_of_birth': None, 'genotype': 'Rorb-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-GCaMP6f)/wt', 'sex': 'M', 'species': 'Mus musculus', 'subject_id': '575296278', 'number_of_electrodes': 0, 'number_of_units': 0, 'external_file_objects': []}

and on file from @Ahad-Allen

$> DANDI_CACHE=ignore python test_on_nwb.py 1193675750raw_data.nwb                
dandi 0.52.0
h5py 3.8.0
hdmf 3.5.2
pynwb 2.3.1
1193675750raw_data.nwb has links: False
/home/yoh/proj/dandi/dandi-cli/venvs/dev3/lib/python3.9/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'hdmf-common' version 1.5.0 because version 1.5.1 is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
/home/yoh/proj/dandi/dandi-cli/venvs/dev3/lib/python3.9/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'core' version 2.3.0 because version 2.6.0-alpha is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
/home/yoh/proj/dandi/dandi-cli/venvs/dev3/lib/python3.9/site-packages/hdmf/spec/namespace.py:531: UserWarning: Ignoring cached namespace 'hdmf-experimental' version 0.1.0 because version 0.2.0 is already loaded.
  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."
way 1 worked
way 2 worked
way 3 worked
{'experiment_description': 'ophys session', 'experimenter': None, 'identifier': '1193675750', 'institution': 'Allen Institute for Brain Science', 'keywords': ['2-photon', 'calcium imaging', 'visual cortex', 'behavior', 'task'], 'lab': None, 'related_publications': None, 'session_description': 'Ophys Session', 'session_id': None, 'session_start_time': datetime.datetime(2022, 7, 22, 12, 7, 33, 412000, tzinfo=tzutc()), 'age': 'P161.0D', 'date_of_birth': None, 'genotype': 'Rbp4-Cre_KL100/wt;Camk2a-tTA/wt;Ai93(TITL-GCaMP6f)/wt', 'sex': 'F', 'species': 'Mus musculus', 'subject_id': '621602', 'number_of_electrodes': 0, 'number_of_units': 0, 'external_file_objects': []}
DANDI_CACHE=ignore python test_on_nwb.py 1193675750raw_data.nwb  37.75s user 0.88s system 103% cpu 37.303 total

so it also works -- I guess the difference is in some other version detail.

edit: on that box I use simple virtualenv with system wide python 3.9

@yarikoptic
Member

and running organize on the file from @Ahad-Allen worked for me
smaug:~/proj/dandi/nwb-files/000027
$> DANDI_CACHE=ignore DANDI_DEVEL=1 dandi -l 1 organize --devel-debug ../1193675750raw_data.nwb 
...
2023-04-07 15:55:45,114 [    INFO] Symlink support autodetected; setting files_mode='symlink'
2023-04-07 15:55:45,118 [   DEBUG] Assigned 1 session_id's based on the date
2023-04-07 15:55:45,119 [    INFO] Organized 1 paths. Visit /home/yoh/proj/dandi/nwb-files/000027/
2023-04-07 15:55:45,119 [    INFO] Logs saved in /home/yoh/.cache/dandi-cli/log/20230407195534Z-3648741.log
DANDI_CACHE=ignore DANDI_DEVEL=1 dandi -l 1 organize --devel-debug   11.94s user 0.83s system 108% cpu 11.736 total
(dev3) 1 10233.....................................:Fri 07 Apr 2023 03:55:45 PM EDT:.
smaug:~/proj/dandi/nwb-files/000027
$> ls -l /home/yoh/proj/dandi/nwb-files/000027/sub-621602/sub-621602_ophys.nwb 
lrwxrwxrwx 1 yoh yoh 53 Apr  7 15:55 /home/yoh/proj/dandi/nwb-files/000027/sub-621602/sub-621602_ophys.nwb -> /home/yoh/proj/dandi/nwb-files/1193675750raw_data.nwb

@bendichter
Contributor

It looks like these files were created with non-standard means. Without more detailed reporting from dandi-cli, it's going to be difficult to know how to resolve this.

@jeromelecoq
Author

Hi @bendichter, well, I am not entirely sure to what extent this is outside a normal workflow.

1/ I used suite2p to segment a movie.
2/ I used NeuroConv to make the first NWB from suite2p output.
3/ I loaded NWB 1.0 files from the Allen
4/ I added objects to the NeuroConv NWB 2.0 output by copying values from the NWB 1.0 file.

@jeromelecoq
Author

I was able to nail down the issue further. The problem is the IndexSeries object when it receives an indexed_timeseries parameter to register the associated template. This ends up creating an NWB file with an external file link.
Perhaps the problem is in pynwb upon creation. If I convert to a TimeSeries, removing the link to the template, it all works.
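
To confirm which paths in a file are actually stored as external links, one can walk the file with h5py (a minimal sketch, assuming only the h5py API; the filename in the usage comment is a placeholder). This is a manual version of the check dandi does via `dandi.pynwb_utils.nwb_has_external_links`, but it also reports where the links are:

```python
import h5py

def find_external_links(group, prefix=""):
    """Recursively collect (link_path, target_file, target_path) for
    every h5py.ExternalLink below `group`."""
    found = []
    for name in group:
        full = f"{prefix}/{name}"
        # getlink=True returns the link object itself instead of
        # dereferencing it, so even broken external links are safe to probe
        link = group.get(name, getlink=True)
        if isinstance(link, h5py.ExternalLink):
            found.append((full, link.filename, link.path))
        elif isinstance(link, h5py.HardLink) and isinstance(group[name], h5py.Group):
            found.extend(find_external_links(group[name], full))
    return found

# Usage (filename is a placeholder):
# with h5py.File("my_file.nwb", "r") as f:
#     for link_path, target_file, target_path in find_external_links(f):
#         print(f"{link_path} -> {target_file}:{target_path}")
```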

@jeromelecoq
Author

I believe the code here would not work as a result: https://pynwb.readthedocs.io/en/stable/tutorials/domain/brain_observatory.html

In particular this part:

for stimulus in stimulus_list:
    visual_stimulus_images = ImageSeries(
        name=stimulus,
        data=dataset.get_stimulus_template(stimulus),
        unit='NA',
        format='raw',
        timestamps=[0.0])
    image_index = IndexSeries(
        name=stimulus,
        data=dataset.get_stimulus_table(stimulus).frame.values,
        unit='NA',
        indexed_timeseries=visual_stimulus_images,
        timestamps=timestamps[dataset.get_stimulus_table(stimulus).start.values])
    nwbfile.add_stimulus_template(visual_stimulus_images)
    nwbfile.add_stimulus(image_index)

The problem is the indexed_timeseries link, which causes dandi to have issues.

@satra
Member

satra commented Apr 9, 2023

i'm going to bring @rly into this conversation. the summary of this issue is that certain operations lead to external links being created which are not really external links (as in, we don't think they point to outside files), and that's triggering dandi-cli to complain.

@jeromelecoq - just a random thought: is it possible that some part of the pipeline is still pointing to a data array in the nwb 1 file? i.e. it still maintains a reference and is hence treated as an external link?

@jeromelecoq
Author

jeromelecoq commented Apr 9, 2023 via email

@jeromelecoq
Author

Using TimeSeries allowed me to move forward and upload a draft of Visual Coding NWB 2.0 to DANDI. This supports the idea that the issue is related to links. I am still working on little things here and there, but I will come back to this later. Obviously my files do not have the template images, just the underlying stimulus structure.

@rly
Contributor

rly commented Apr 12, 2023

@jeromelecoq please try installing this branch of HDMF referenced in hdmf-dev/hdmf#847:

pip uninstall hdmf --yes
pip install git+https://github.com/hdmf-dev/hdmf.git@fix/export_links

And let me know if that resolves the error.
