
Try loading xenium zarr #18

Open
wants to merge 1 commit into main

Conversation

@tshauck (Collaborator) commented May 2, 2024

This is more a scratch work spot than a PR, but I may peel off things from here if we think they're useful.

I tried loading a xenium dataset and am getting an error that looks to be an issue with the chunks key in the metadata changing values between arrays: Error: Execution("infer error: InvalidMetadata(\"inconsistent chunks in metadata, got [313] but have [41945, 1]\")").
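For reference, one quick way to see where the mismatch comes from is to walk the store and print each array's shape and chunks straight from its .zarray files; a minimal sketch, assuming the dataset is unzipped to a placeholder path data.zarr:

```python
import json
from pathlib import Path

# Print shape and chunks from every .zarray in the store, to spot the arrays
# whose chunk layouts disagree (e.g. [313] vs [41945, 1]).
store = Path("data.zarr")  # placeholder for wherever the dataset was unzipped
for zarray in sorted(store.rglob(".zarray")):
    meta = json.loads(zarray.read_text())
    print(zarray.parent.relative_to(store), meta["shape"], meta["chunks"])
```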

@maximedion2 (Collaborator)

Got it, I'll take a look before I get started on anything else.

@maximedion2 (Collaborator)

To confirm, you were trying to read the zarr data that you get from running this script: https://github.com/giovp/spatialdata-sandbox/blob/main/xenium_rep1_io/to_zarr.py?

@tshauck (Collaborator, Author) commented May 3, 2024

Sorry if it didn't work, I didn't expect you to look so quickly :/ -- there's a link on the page, https://s3.embl.de/spatialdata/spatialdata-sandbox/xenium_rep1_io.zip, which is where I got the data from. I assumed the script would produce the same data, but maybe it doesn't.

@maximedion2 (Collaborator)

Oh well, I actually asked before I tried, just to make sure I was looking at the right script, but as it turns out, no, it didn't work. I'm sure I could easily fix it, but I'll just download the data from the link, it's simpler that way.

Regarding the error, I'll look into it tomorrow/over the weekend, but yes, a requirement here is that all the arrays in the store have the exact same dimensions and chunk layout, so that all the variables (i.e. "columns") can be read together; it won't work if two arrays have different sizes/dimensions. If arrays with different sizes need to be combined (e.g. with a "join"), they have to be read separately, from two different stores.
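To put that requirement in concrete terms, here's a rough sketch (plain zarr in Python, not this crate's API; data.zarr is a placeholder path) of the check that has to pass before the variables can be read together:

```python
import zarr

# Arrays that are read together as "columns" must all share one shape and
# one chunk layout; anything that differs has to come from a separate store.
group = zarr.open_group("data.zarr", mode="r")  # placeholder path
layouts = {(arr.shape, arr.chunks) for _, arr in group.arrays()}
if len(layouts) > 1:
    raise ValueError(f"arrays can't be read together: {layouts}")
```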

@maximedion2 (Collaborator) commented May 5, 2024

So I'm a bit confused by this data. For example, if you look at data.zarr/labels/cell_labels, what are those folders, 0, 1, 2, 3, 4? They're not chunks; there's no .zarray file there, you have to go into those directories to see the .zarray files... In any case, if I go into one of those directories (e.g. 0/) and copy the contents, do the same with nucleus_labels, and basically rearrange the data as

xenium_example.zarr
├── cell_labels
│   └── <2D data>
└── nucleus_labels
    └── <2D data>

I can read it (after a minor fix, see #19). I don't really know how to confirm whether the data is fine or not, but it works, and it reads the correct number of chunks.
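The rearrangement itself is just copying the level-0 directories side by side; roughly something like this (paths are illustrative, not exactly what I ran):

```python
import shutil
from pathlib import Path

# Copy the level-0 directory of each label image (which holds its own
# .zarray plus the chunk files) into a flat store, side by side.
src = Path("data.zarr/labels")
dst = Path("xenium_example.zarr")
for name in ("cell_labels", "nucleus_labels"):
    shutil.copytree(src / name / "0", dst / name)
```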

While we're looking into some real data, a few comments:

  • As I mentioned before, fill_value and missing chunks are not supported
  • Only 1D, 2D and 3D data is supported (I can extend to 4D in the future)
  • I'm ignoring "store level" metadata, like what you can get by consolidating the zarr metadata; I only rely on the individual .zarray files.
  • I'm ignoring the .zattrs files; those don't have any official specs, right? They're for specific applications of the data and don't define anything that zarr implementations are supposed to use.

Does anything in that list strike you as a problem, like something that would be a blocker for the real applications you work with?
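If it helps, a store can be checked against the first two points with something like this (again a sketch, with data.zarr as a placeholder path):

```python
import json
from pathlib import Path

# Flag arrays the reader can't handle yet: a non-null fill_value (missing
# chunks aren't supported either) or more than 3 dimensions.
store = Path("data.zarr")  # placeholder path
for zarray in sorted(store.rglob(".zarray")):
    meta = json.loads(zarray.read_text())
    problems = []
    if meta.get("fill_value") is not None:
        problems.append(f"fill_value={meta['fill_value']}")
    if len(meta["shape"]) > 3:
        problems.append(f"{len(meta['shape'])}D")
    if problems:
        print(zarray.parent.relative_to(store), ", ".join(problems))
```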

@tshauck (Collaborator, Author) commented May 8, 2024

I think the 0/ and so forth correspond to resolutions, e.g.

├── DataTree('scale0')
│       Dimensions:  (c: 1, y: 25778, x: 35416)
│       Coordinates:
│         * c        (c) int64 0
│         * y        (y) float64 0.5 1.5 2.5 3.5 ... 2.578e+04 2.578e+04 2.578e+04
│         * x        (x) float64 0.5 1.5 2.5 3.5 ... 3.541e+04 3.541e+04 3.542e+04
│       Data variables:
│           image    (c, y, x) uint16 dask.array<chunksize=(1, 4096, 4096), meta=np.ndarray>
├── DataTree('scale1')
│       Dimensions:  (c: 1, y: 12889, x: 17708)
│       Coordinates:
│         * c        (c) int64 0
│         * y        (y) float64 1.0 3.0 5.0 7.0 ... 2.577e+04 2.578e+04 2.578e+04
│         * x        (x) float64 1.0 3.0 5.0 7.0 ... 3.541e+04 3.541e+04 3.542e+04
│       Data variables:
│           image    (c, y, x) uint16 dask.array<chunksize=(1, 4096, 4096), meta=np.ndarray>
├── DataTree('scale2')
│       Dimensions:  (c: 1, y: 6444, x: 8854)
│       Coordinates:
│         * c        (c) int64 0
│         * y        (y) float64 2.0 6.0 10.0 14.0 ... 2.577e+04 2.577e+04 2.578e+04
│         * x        (x) float64 2.0 6.0 10.0 14.0 ... 3.541e+04 3.541e+04 3.541e+04
│       Data variables:
│           image    (c, y, x) uint16 dask.array<chunksize=(1, 4096, 4096), meta=np.ndarray>
├── DataTree('scale3')
│       Dimensions:  (c: 1, y: 3222, x: 4427)
│       Coordinates:
│         * c        (c) int64 0
│         * y        (y) float64 4.0 12.0 20.0 28.0 ... 2.576e+04 2.577e+04 2.577e+04
│         * x        (x) float64 4.0 12.0 20.0 28.0 ... 3.54e+04 3.54e+04 3.541e+04
│       Data variables:
│           image    (c, y, x) uint16 dask.array<chunksize=(1, 3222, 4096), meta=np.ndarray>
└── DataTree('scale4')
        Dimensions:  (c: 1, y: 1611, x: 2213)
        Coordinates:
          * c        (c) int64 0
          * y        (y) float64 8.001 24.0 40.0 56.0 ... 2.574e+04 2.575e+04 2.577e+04
          * x        (x) float64 8.002 24.01 40.01 ... 3.538e+04 3.539e+04 3.541e+04
        Data variables:
            image    (c, y, x) uint16 dask.array<chunksize=(1, 1611, 2213), meta=np.ndarray>
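The numbered folders map onto these scales; listing the arrays under one of the label images with plain zarr shows one shape/chunk layout per level (a sketch, data.zarr again being a placeholder path):

```python
import zarr

# Each numbered entry under cell_labels is one resolution level, i.e. an
# array with its own shape and chunk layout.
group = zarr.open_group("data.zarr/labels/cell_labels", mode="r")  # placeholder path
for name, arr in sorted(group.arrays(), key=lambda item: item[0]):
    print(name, arr.shape, arr.chunks)
```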

@maximedion2 (Collaborator)

Aah right, I see. So that's something that could be set up as partitions, once that's implemented! I'll get started on that soon.
