Add support for distributed cholla datasets. #4702

Open
wants to merge 1 commit into
base: main

Conversation

mabruzzo
Contributor

PR Summary

This PR adds support for loading Cholla datasets that are distributed over multiple files. Previously, the frontend could only load Cholla datasets after they were concatenated into a single large dataset.

This functionality is a little inefficient right now: we need to read every hdf5 file to figure out the mapping between spatial locations and locations on disk. This seems like something we can easily improve in the future (possibly by having Cholla write out an extra attribute describing how 3D locations are mapped into 1D).
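
To illustrate the kind of mapping described above, here is a minimal sketch in plain Python. It assumes each file records where its block starts in the global grid and how large it is. The names `offset`, `dims_local`, and `dims_global`, the `block_edges` helper, and the `0.h5.N` file names are assumptions for illustration, not taken from this PR. The dictionary stands in for metadata that would really be read from each hdf5 file.

```python
def block_edges(offset, dims_local, dims_global):
    """Return (left, right) edges of one block as fractions of the domain.

    Hypothetical helper: `offset` is the block's starting cell index along
    each axis, `dims_local` its shape, `dims_global` the full grid shape.
    """
    left = tuple(o / n for o, n in zip(offset, dims_global))
    right = tuple((o + d) / n for o, d, n in zip(offset, dims_local, dims_global))
    return left, right


# Stand-in for per-file metadata (assumed layout: two blocks split along x).
dims_global = (16, 16, 16)
per_file_attrs = {
    "0.h5.0": {"offset": (0, 0, 0), "dims_local": (8, 16, 16)},
    "0.h5.1": {"offset": (8, 0, 0), "dims_local": (8, 16, 16)},
}

# Every file has to be visited once to build the spatial index.
index = {
    fname: block_edges(attrs["offset"], attrs["dims_local"], dims_global)
    for fname, attrs in per_file_attrs.items()
}

for fname, (left, right) in index.items():
    print(fname, left, right)
```

If the simulation code wrote a single attribute describing the whole decomposition, an index like this could in principle be built from one file instead of all of them.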

PR Checklist

  • Adds a test for any bugs fixed. Adds tests for new features.

For this PR, I suspect that we will need to upload a new test dataset. I just had a few questions:

  • It's been a while since I've done this. Could someone remind me of the procedure for doing this?
  • Weirdly enough, I get the following message when I run the unit-tests on the main branch. Do you have any idea why this is happening? (For context, the other 3 tests all run)

    yt/frontends/cholla/tests/test_outputs.py::test_cholla_data SKIPPED (cannot load dataset ChollaSimple/0.h5)

  • Is there any preference for unit tests vs answer-tests when it comes to frontends?

@neutrinoceros
Member

It's been a while since I've done this. Could someone remind me of the procedure for doing this?

you'll need to

Weirdly enough, I get the following message when I run the unit-tests on the main branch. Do you have any idea why this is happening? (For context, the other 3 tests all run)

Maybe that's a bug with small_patch_amr. I suggest trying to work on a simplified version of the test and refine it until it doesn't skip, to discover what's happening.

Is there any preference for unit tests vs answer-tests when it comes to frontends?

I think unit tests should be preferred whenever they suffice, for a couple of reasons:

  • answer tests are currently deeply rooted in the nose test framework (migration to pytest is still ongoing), so adding more of them makes this long-lasting migration ever so slightly harder
  • fast tests are easier to scale

That said, if what you need is some answer tests, go for it!

@neutrinoceros added the code frontends (Things related to specific frontends) and enhancement (Making something better) labels on Oct 11, 2023
from yt.geometry.api import Geometry
from yt.geometry.grid_geometry_handler import GridIndex
from yt.utilities.on_demand_imports import _h5py as h5py

from .fields import ChollaFieldInfo


def _split_fname_proc_suffix(filename: str):
Member

Could you put a short note about how this is different from os.path.splitext? Just to avoid future confusion.
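
For readers following along, the distinction being asked about can be sketched as follows. This is not the PR's implementation: `split_proc_suffix` is a hypothetical stand-in, and the assumption that distributed files carry a trailing numeric process suffix (for example `0.h5.3`) is ours. The point is only that `os.path.splitext` always splits at the last dot, whereas a process-suffix splitter would leave a name such as `0.h5` intact.

```python
import os.path


def split_proc_suffix(filename: str):
    """Hypothetical helper: split off a trailing all-digit suffix, if any.

    Returns (filename, "") when the text after the last dot is not purely
    numeric, or when there is no dot at all.
    """
    stem, dot, suffix = filename.rpartition(".")
    if dot and stem and suffix.isdecimal():
        return stem, suffix
    return filename, ""


# os.path.splitext splits at the last dot regardless of what follows it.
splitext_distributed = os.path.splitext("0.h5.3")
splitext_single = os.path.splitext("0.h5")

# The sketch only splits when the suffix looks like a process id.
suffix_distributed = split_proc_suffix("0.h5.3")
suffix_single = split_proc_suffix("0.h5")

print(splitext_distributed, splitext_single)
print(suffix_distributed, suffix_single)
```

Note that the sketch also drops the dot from the returned suffix, while `os.path.splitext` keeps it as part of the extension.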

Member

@matthewturk left a comment

only minor stuff -- looks good otherwise

Comment on lines +142 to +144
self.grid_left_edge[i] = left_frac
self.grid_right_edge[i] = right_frac
self.grid_dimensions[i] = dims_local
Member

Suggested change
- self.grid_left_edge[i] = left_frac
- self.grid_right_edge[i] = right_frac
- self.grid_dimensions[i] = dims_local
+ self.grid_left_edge[i,:] = left_frac
+ self.grid_right_edge[i,:] = right_frac
+ self.grid_dimensions[i,:] = dims_local

Just for clarity, could we make it obvious that it's setting a slice to the values?
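
As a quick check that the suggestion is purely cosmetic, here is a small NumPy sketch. The array shapes and values are made up for illustration; only the two indexing forms come from the suggestion. For a 2D array, `arr[i] = values` and `arr[i, :] = values` both assign to row `i`.

```python
import numpy as np

n_grids = 4  # illustrative number of grids

# Two identical (n_grids, 3) arrays, one per indexing style.
grid_left_edge_implicit = np.zeros((n_grids, 3))
grid_left_edge_explicit = np.zeros((n_grids, 3))

left_frac = np.array([0.0, 0.5, 0.25])  # made-up fractional left edge
i = 2

grid_left_edge_implicit[i] = left_frac     # implicit: whole row i
grid_left_edge_explicit[i, :] = left_frac  # explicit: row i, all columns

# Both forms write the same three values into the same row.
same = np.array_equal(grid_left_edge_implicit, grid_left_edge_explicit)
print(same)
```

The explicit form costs nothing at runtime and tells the reader that a three-element row, not a scalar, is being written.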

def io_iter(self, chunks, fields):
    # this is loosely inspired by the implementation used for Enzo/Enzo-E
    # - those other options use the lower-level hdf5 interface. Unclear
    # whether that affords any advantages...
Member

Good question. I think in the past it did because we avoided having to re-allocate temporary scratch space, but I am not sure that would hold up to current inquiries. I think the big advantage those have is tracking the groups within the iteration.

fh, filename = None, None
for chunk in chunks:
    for obj in chunk.objs:
        if obj.filename is None:  # unclear when this case arises...
Member

likely it will not here, unless you manually construct virtual grids
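
The loop excerpted above follows a common pattern: keep one file handle open while consecutive objects share a file, and reopen only when the filename changes. A minimal, self-contained sketch of that pattern is below. It is not the PR's code: `Grid`, `Chunk`, `FakeHandle`, and `iter_with_cached_handle` are hypothetical names, the opener is a stand-in for an hdf5 file open, and skipping objects whose `filename` is `None` is an assumption about what the truncated branch does.

```python
from collections import namedtuple

# Hypothetical stand-ins for grid objects and chunks.
Grid = namedtuple("Grid", ["id", "filename"])
Chunk = namedtuple("Chunk", ["objs"])


def iter_with_cached_handle(chunks, open_file):
    """Yield (obj, handle), reopening only when the filename changes."""
    fh, filename = None, None
    try:
        for chunk in chunks:
            for obj in chunk.objs:
                if obj.filename is None:
                    continue  # assumption: nothing on disk to read
                if obj.filename != filename:
                    if fh is not None:
                        fh.close()
                    filename = obj.filename
                    fh = open_file(filename)
                yield obj, fh
    finally:
        # Close the last handle even if the consumer stops early.
        if fh is not None:
            fh.close()


opened = []  # records every open, to show how few happen


class FakeHandle:
    """Stand-in for a real file object; only tracks opens."""

    def __init__(self, name):
        self.name = name
        opened.append(name)

    def close(self):
        pass


chunks = [
    Chunk([Grid(0, "0.h5.0"), Grid(1, "0.h5.0")]),
    Chunk([Grid(2, None), Grid(3, "0.h5.1")]),
]
visited = [obj.id for obj, _ in iter_with_cached_handle(chunks, FakeHandle)]
print(visited, opened)
```

Four objects are offered, three are visited, and only two opens occur, because the first two grids share a file.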

Contributor Author

Out of curiosity, what is a virtual grid?

I realize this may be an involved answer - so if you could just point me to a frontend (or other area of the code) using virtual grids, I can probably investigate that on my own.

@mabruzzo
Contributor Author

My apologies for taking a while to follow up on this. I plan to circle back in the next week or so.

Labels
code frontends Things related to specific frontends enhancement Making something better
3 participants