Support for categorical axis labels #1441

Open
kdheepak opened this issue Jun 2, 2023 · 15 comments
Labels
enhancement New feature or request

Comments

@kdheepak

kdheepak commented Jun 2, 2023

Is your feature request related to a problem?

I have multiple datasets in the same HDF5 file. Each of these datasets has a "dims" attribute that contains the paths to other datasets, which hold the string or integer values that make up the domain of each dimension.

For example, here's some pseudocode that would pass tests in my application:

@test attrs(h5file, "/group/dataset")["dims"] == [
  "/group/states", 
  "/group/years"
]

@test read(h5file, "/group/states") == ["California", "Colorado", "New York"]
@test read(h5file, "/group/years") == [2020, 2021, 2022, 2023]

Is it possible for the domain sliders to show the dimension information?
Is there a standard data schema for the attributes such that the visualization could be improved on?

Requested solution or feature

Add features to show custom labels and custom domains based on attributes of a dataset in an HDF5 file.

@kdheepak kdheepak added the enhancement New feature or request label Jun 2, 2023
@axelboc
Contributor

axelboc commented Jun 2, 2023

Hi @kdheepak,

The H5Web viewer supports some aspects of the NeXus Data Format, including the ability to specify axis datasets. You can take a look at the NXdata class, and more specifically the axes attribute.

In short, the following changes should give you functionality equivalent to your dims attribute:

  1. Move the dims attribute to the parent group and rename it axes.
  2. Add a signal attribute to the parent group with value "/group/dataset".
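
As a rough illustration of the two steps above, here is a minimal sketch that models the resulting group as a plain object and checks that every axis lines up with a dimension of the signal. The `DatasetSketch` and `NxDataGroupSketch` types and the `checkAxes` helper are hypothetical and are not part of H5Web; the `[3, 4]` shape of the signal is assumed from the states and years in the original example. One more assumption to flag: the sketch uses dataset names relative to the group ("dataset", "states", "years") rather than absolute paths, which is how the NXdata documentation appears to describe `signal` and `axes`, and it adds the `NX_class` attribute that marks a group as NXdata.

```typescript
// Hypothetical, simplified model of an NXdata-style group; not H5Web's real types.
interface DatasetSketch {
  shape: number[];
  values: (string | number)[]; // only filled in for the 1D axis datasets
}

interface NxDataGroupSketch {
  attributes: { NX_class: "NXdata"; signal: string; axes: string[] };
  datasets: Record<string, DatasetSketch>;
}

const group: NxDataGroupSketch = {
  attributes: {
    NX_class: "NXdata",
    signal: "dataset", // name of the dataset to plot (assumed relative name)
    axes: ["states", "years"], // the former "dims", one entry per dimension
  },
  datasets: {
    dataset: { shape: [3, 4], values: [] }, // assumed shape: 3 states x 4 years
    states: { shape: [3], values: ["California", "Colorado", "New York"] },
    years: { shape: [4], values: [2020, 2021, 2022, 2023] },
  },
};

// Returns a list of problems; an empty list means the layout is consistent.
function checkAxes(g: NxDataGroupSketch): string[] {
  const problems: string[] = [];
  const signal = g.datasets[g.attributes.signal];
  if (signal === undefined) {
    return [`signal dataset "${g.attributes.signal}" not found`];
  }
  if (g.attributes.axes.length !== signal.shape.length) {
    problems.push("axes must list one entry per signal dimension");
  }
  g.attributes.axes.forEach((name, dim) => {
    const axis = g.datasets[name];
    if (axis === undefined) {
      problems.push(`axis dataset "${name}" not found`);
    } else if (axis.values.length !== signal.shape[dim]) {
      problems.push(`axis "${name}" does not match dimension ${dim}`);
    }
  });
  return problems;
}

console.log(checkAxes(group).length); // 0
```

A check like this only validates the layout; it says nothing about whether a viewer can display string-valued axes.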

For a concrete example, inspect the /nexus_entry/image group in this demo.

In the future, we may also support HDF5 dimension scales, which provide equivalent functionality to your dims attribute without the NeXus boilerplate. We're also open to supporting data formats other than NeXus.


There's a catch with your example, however: H5Web currently supports only continuous, numeric axis datasets. We don't support "categorical" axis datasets like in your example. So this part of your feature request still stands, for sure.

Could you please give us some more information about the content of your dataset and the information you're trying to visualize? If you could attach your HDF5 file, it'd be ideal.

@kdheepak
Author

kdheepak commented Jun 2, 2023

Thanks for the quick response and the detailed explanation! I’ll look through the format in more detail (and maybe get back to you if I have questions?) tomorrow.

The quick answer is that our data is multidimensional energy system data. A large number of the datasets are time series data, for example energy demand in different areas, by fuel type, by technologies etc. And we want to make it easy for modelers to quickly explore the dataset. I personally found the line plot and heat map useful (especially with the hover tooltips) to get information about specific values.

We have users that make tweaks to input data and run the model, and then would like to visualize the data to see if the model is doing what they expect it to. I think this tool would be great for that kind of exploratory interactive debugging.

I’ll be able to attach a sample file and explain more on the details in a following comment.

@kdheepak
Author

kdheepak commented Jun 2, 2023

Is there a way for me to download the source.h5 file?

> There's a catch with your example, however: H5Web currently supports only continuous, numeric axis datasets. We don't support "categorical" axis datasets like in your example. So this part of your feature request still stands, for sure.

Is this something you'd be open to adding? I'd be up to make a PR if this would be of interest.

@kdheepak kdheepak changed the title from "Specification for attributes in a dataset for better visualization" to "Support for categorical axis labels" Jun 2, 2023
@axelboc
Contributor

axelboc commented Jun 2, 2023

We'd need to define the feature in more detail first (maybe with screenshots from other visualization solutions) and consider its impact on the codebase, but yes, I think it would be beneficial to have this in the H5Web viewer!

Did you try zipping your HDF5 file?

@kdheepak
Author

kdheepak commented Jun 2, 2023

Here's a sample.zip file:

sample.zip

This file has just one "data" dataset called Dmd for demand:

image

The remaining datasets in this sample file are all categorical data used as axis values for the dimensions.

Any name in the dims list is also an HDF5 dataset, but in our case it is either a list of strings or a list of integers:

image

image

All the datasets in the file have some attributes:

image

We use type: "variable" for data that is floating point, and type: "set" for categorical data.

@kdheepak
Author

kdheepak commented Jun 2, 2023

This is an early draft of our data format, and most of it is abstracted in our code for modelers, so I'm open to changing the schema of the attributes (perhaps even following the NeXus format) if that would make visualization easier.

@loichuder
Member

loichuder commented Jun 12, 2023

That won't be super helpful since we don't support categorical axis labels, but this is how you would structure your data in a NXData group:

nx_categorial_datasets.zip

Just to give you a taste of the NeXus format

@axelboc
Contributor

axelboc commented Jun 13, 2023

Thanks for the detailed explanation and example!

I have to be honest: showing categorical values on the axes and on the dimension sliders is gonna be very tricky — both in terms of UI (i.e. dealing with many values and long labels) and code (i.e. axis datasets used for slicing are not fetched currently).

After discussing with @t20100, we think we might be able to improve on the current state by at least showing categorical values in the tooltip of the NX line and heatmap visualizations. The axes would still show indices, but the tooltip would display the real axis value. Does that sound useful to you @kdheepak?
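
To make the tooltip idea concrete, here is a minimal sketch of the lookup it implies: the axes keep working with indices, and only the tooltip text maps an index back to its label. The `formatAxisValue` helper and the `AxisLabels` type are hypothetical, not H5Web functions, and the label arrays are taken from the example at the top of this issue.

```typescript
// Hypothetical helper, not part of H5Web: formats one axis entry of a tooltip.
type AxisLabels = readonly (string | number)[];

function formatAxisValue(index: number, labels?: AxisLabels): string {
  // Fall back to the bare index when there are no labels or the index is out of range.
  const label = labels?.[index];
  return label === undefined ? String(index) : `${label} (index ${index})`;
}

const states = ["California", "Colorado", "New York"];
const years = [2020, 2021, 2022, 2023];

console.log(formatAxisValue(1, states)); // Colorado (index 1)
console.log(formatAxisValue(3, years)); // 2023 (index 3)
console.log(formatAxisValue(2)); // 2
```

Keeping the index in the tooltip text is a design choice of this sketch: it lets readers match the tooltip against the index-based axis ticks.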

Of course, this requires you to adopt the NeXus data format. To expand on Loïc's example, note that NeXus can very well be used in addition to your existing data format: just create a separate NXdata group containing HDF5 "soft links" to your existing datasets:

- sample.hdf5
  - Area
  - Dmd
  - ...
  - nxdata (group with attributes: NX_class, axes, signal)
    - Area (soft link to /Area)
    - Dmd (soft link to /Dmd)
    - ...

@kdheepak
Author

Yes, just having the tooltip hover would solve our modelers' needs, I think!

@axelboc
Contributor

axelboc commented Jul 26, 2023

After doing a bit more investigating in the code of LineVis and HeatmapVis, I can't help but think that this brings us back to some of the problems described in issue #1278 about continuous vs discrete axes.

Currently, the visualization components use custom axis values when provided (via abscissaParams and ordinateParams) to compute the domains of the axes and to create the "value-to-index" scales used for the tooltip. There's even a bit more complexity in HeatmapVis around the fact that we support providing one extra custom axis value to mark the outer edge of the last image pixel.
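
The following is a simplified sketch of what computing a domain and a "value-to-index" lookup from numeric axis values can look like. It is not H5Web's actual code, and `axisDomain` and `nearestIndex` are hypothetical names. Both functions rely on arithmetic over the axis values, which is exactly what string categories cannot provide.

```typescript
// Simplified sketch, not H5Web's implementation. Assumes `values` is non-empty.
function axisDomain(values: number[]): [number, number] {
  return [Math.min(...values), Math.max(...values)];
}

// Index of the axis value closest to `x`, as a tooltip would need
// in order to find the data point under the cursor.
function nearestIndex(values: number[], x: number): number {
  let best = 0;
  let bestDistance = Infinity;
  values.forEach((value, index) => {
    const distance = Math.abs(value - x);
    if (distance < bestDistance) {
      best = index;
      bestDistance = distance;
    }
  });
  return best;
}

const years = [2020, 2021, 2022, 2023];
const [min, max] = axisDomain(years);
console.log(min, max); // 2020 2023
console.log(nearestIndex(years, 2021.4)); // 1
```

With labels such as "California" there is no minimum, maximum, or distance to compute, so a categorical axis would presumably have to fall back to index-based domains and lookups.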

In other words, adding string[] to the AxisParams['value'] type (which is currently NumArray | undefined) has non-trivial implications...
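
A small sketch of why widening the type ripples outward: once string[] is allowed, every consumer of the axis values has to branch before doing any numeric work. `NumArraySketch`, `AxisValue`, `isCategorical` and `domainOf` are hypothetical stand-ins; H5Web's real `NumArray` and `AxisParams` types are richer than this.

```typescript
// Hypothetical stand-ins for illustration only.
type NumArraySketch = number[];
type AxisValue = NumArraySketch | string[] | undefined;

// Type guard: true when the axis holds string categories.
function isCategorical(value: AxisValue): value is string[] {
  return value !== undefined && value.length > 0 && typeof value[0] === "string";
}

// Categories have no numeric extent, so the domain falls back to the index range.
function domainOf(value: AxisValue, length: number): [number, number] {
  if (value === undefined || value.length === 0 || isCategorical(value)) {
    return [0, length - 1];
  }
  return [Math.min(...value), Math.max(...value)];
}

const [catMin, catMax] = domainOf(["California", "Colorado", "New York"], 3);
console.log(catMin, catMax); // 0 2
const [numMin, numMax] = domainOf([2020, 2021, 2022, 2023], 4);
console.log(numMin, numMax); // 2020 2023
```

The same kind of branch would be needed wherever the axis values feed a scale, a tick formatter, or the tooltip lookup.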

Perhaps LineVis and HeatmapVis try to be too smart and handle too many cases. It might be worth splitting them into separate, more specialised components to handle the various use cases: no axis values/no domains; continuous numerical axis values/custom domains; discrete/categorical axis values.

We could also give up on having those high-level visualization components in the lib and instead focus on making VisCanvas and the other low-level building blocks easier to use directly (or on providing small, simple abstractions on top of those, like LinearVisCanvas in daiquiri).

@kdheepak
Author

kdheepak commented Nov 2, 2023

I seem to remember being able to download the source.h5 file in the past but I'm now not able to find an easy way to do that. I see links to other .h5 files in https://h5web.panosc.eu/ but not to source.h5. Am I just misremembering? How can I get access to the original source.h5 file?

@axelboc
Contributor

axelboc commented Nov 3, 2023

Oh I wasn't sure what source.h5 you were referring to, sorry. I get you now.

The mock demo does not use an HDF5 file per se. It's basically one big hard-coded JS object that mimics the shape of the metadata we might retrieve from a real provider like h5grove or h5wasm.

The source code for this is located here: https://github.com/silx-kit/h5web/blob/main/packages/shared/src/mock/metadata.ts. The values of the datasets are generated in a separate module: https://github.com/silx-kit/h5web/blob/main/packages/shared/src/mock/values.ts

@kdheepak
Author

kdheepak commented Nov 3, 2023

Ah, thanks for the clarification. I just wanted to download a .h5 file with NeXus format attributes that takes maximum advantage of the H5Web features, so that I could examine it locally. Is there a file you recommend I look into?

@axelboc
Contributor

axelboc commented Nov 3, 2023

Hmm, this one is a good basic example: https://myhdf5.hdfgroup.org/view?url=https%3A%2F%2Fgithub.com%2Foasys-esrf-kit%2Fdabam2d%2Fblob%2Ff3aed913976d5772a51e6bac3bf3c4e4e4c8b4e1%2Fdata%2Fdabam2d-0001.h5 (you can download the file from the sidebar). It has an NXdata group with a signal dataset and two axis datasets. It also makes use of the long_name and interpretation attributes (the latter of which forces the use of the "NX Image" visualization in H5Web, though it's already the default visualization for 2D+ signals).

Do note that you can inspect the metadata in H5Web, though. The mock demo may not use a real HDF5 file, but it does demonstrate all of the NeXus features that we support, like auxiliary_signals, and allows you to inspect the corresponding metadata:

image

@axelboc
Contributor

axelboc commented Nov 9, 2023

Bringing @kdheepak's sample file from #1523 (thanks!)

output-testing.zip


3 participants