Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find output by its attribute #4717

Open
wants to merge 19 commits into
base: main
Choose a base branch
from
Open

Find output by its attribute #4717

wants to merge 19 commits into from

Conversation

cphyc
Copy link
Member

@cphyc cphyc commented Oct 27, 2023

PR Summary

This provides the ability to quickly find the dataset that has the closest value to a given one (e.g. time or redshift). Example

import yt

# Load all RAMSES dataset in a folder - we assume here we have a cosmological simulation
ts = yt.load("/path/to/simulation/folder/output*")

# Get the snapshot that's as close as possible from z=2
ds = ts.get_by_redshift(2)

# Get the snapshot that's as close as possible from t=10Gyr after Big-Bang,
t = ts[0].quan(10, "Gyr")
ds = ts.get_by_time(t)  

PR Checklist

  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

@cphyc cphyc added the new feature Something fun and new! label Oct 27, 2023
@neutrinoceros
Copy link
Member

neutrinoceros commented Oct 27, 2023

Quick input:

  • I think this is a great idea
  • I'd prefer we avoid making it overly flexible: supporting the ability to search via arbitrary attributes may create unforeseen edge cases which will be impossible to test systematically. I think supporting search by time or redshift is sufficient for now, especially if it can be done in a way that doesn't involve typing the exact attribute names (IMO current_time is somewhat badly named, because its meaning seems excessively context-dependent)

@matthewturk
Copy link
Member

Not sure I completely agree with @neutrinoceros about limiting the flexibility. The question the other day about flash datasets with particles would suggest there may be times when we don't know the attribute someone will be looking for.

Can you make it use "Price Is Right" rules? i.e., closest-without-going-over?

@cphyc
Copy link
Member Author

cphyc commented Nov 1, 2023

I am not sure I understand what you mean by "Price Is Right" rule. Anyways, in its current state, it is already possible to query a dataset by attribute, but the method is hidden (_get_by_attribute), because “with great power comes great responsibility”.

Let me know what you think! I can always strip the leading _ to make it more user-facing, with an obvious comment in the docstring warning against misuses?

@matthewturk
Copy link
Member

matthewturk commented Nov 1, 2023 via email

@cphyc
Copy link
Member Author

cphyc commented Nov 1, 2023

Oh I see, yes, I'll implement a side argument. To either get the output on the left or on the right!

@neutrinoceros
Copy link
Member

I have nothing against that "the price is right rule", but I'd like to point out that left-to-right reading order isn't culturally neutral. I'd prefer that the interface be more inclusive.

@cphyc
Copy link
Member Author

cphyc commented Nov 1, 2023

@neutrinoceros I've implemented a "smaller", "larger" or "nearest" switch so that there is no ambiguity arising from left-to-right vs. right-to-left reading order. Thanks for pointing it out!

@matthewturk
Copy link
Member

I think the new language is better. That being said, it's not necessarily true that RTL languages have different number line concepts:

https://math.stackexchange.com/questions/2618537/do-right-to-left-readers-also-reverse-mathematical-concepts-like-number-line

There's also some evidence that filmmaking conventions (such as progression from the left side of the screen to the right) are also commonly used outside of left-to-right languages.

I will say that with redshift the entire situation is odd because the numeric progression is reversed from temporal progression. And I did choose redshift as my example... So I guess this is on me!

@cphyc
Copy link
Member Author

cphyc commented Nov 2, 2023

@yt-fido test this please

@cphyc cphyc marked this pull request as ready for review November 2, 2023 09:31
Copy link
Member

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functionality is useful and the implementation is sound ! I have a couple questions and suggestions (mostly about type annotations)

# This will fail if no output is found within 100 Myr
ds = ts.get_by_time((3, "Gyr"), tolerance=(100, "Myr"))
# Get the output at the time right before and after 3 Gyr
ds_before = ts.get_by_time((3, "Gyr"), side="smaller")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think the name of the argument could be clearer than side. What about prefer="smaller" ?

yt/data_objects/time_series.py Outdated Show resolved Hide resolved
yt/data_objects/time_series.py Outdated Show resolved Hide resolved
yt/data_objects/time_series.py Outdated Show resolved Hide resolved
yt/data_objects/time_series.py Outdated Show resolved Hide resolved

Parameters
----------
key : str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is out of sync with the argument nam


# Use a binary search to find the closest value
iL = 0
iH = len(self._pre_outputs) - 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what H stands for here. I would expect iL and iR (left/right) be used instead, can you explain ?

else:
raise ValueError(
f"{dsL} and {dsH} have both {attribute}={vL}, cannot perform search. "
"Try with another key."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to me like a poor suggestion, I'd rather have none at all.

Suggested change
"Try with another key."

@@ -2,13 +2,15 @@
import glob
import inspect
import os
import typing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import typing
from typing import TYPE_CHECKING

@@ -28,6 +30,9 @@
parallel_root_only,
)

if typing.TYPE_CHECKING:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if typing.TYPE_CHECKING:
if TYPE_CHECKING:

@cphyc
Copy link
Member Author

cphyc commented Nov 27, 2023

@yt-fido test this please

1 similar comment
@cphyc
Copy link
Member Author

cphyc commented Nov 27, 2023

@yt-fido test this please

Copy link
Contributor

@chrishavlin chrishavlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

I think the case of choosing a value exactly between two datasets needs to either explicitly handled or have its behavior defined in the docstring (see comments), but otherwise this is very useful!

def test_init_fake_dataseries():
file_list = [f"fake_data_file_{str(i).zfill(4)}" for i in range(10)]
@pytest.fixture
def FakeDataset():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fixture seems potentially useful beyond the scope of just timeseries, you could consider moving it up to conftest.py.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see now that this PR is not introducing FakeDataset but converting it to a pytest fixture. So probably not worth moving this up to conftest.py... though now that I know it's here I might do so in the future if I want to use it :)


def _set_code_unit_attributes(self):
return
def test_get_by_key(FakeDataset, fake_datasets):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to test the preference kwarg as well here.

# tear down to avoid possible breakage in following tests
output_type_registry.pop("FakeDataset")
with pytest.raises(ValueError):
ts.get_by_redshift(1000, tolerance=0.1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this actually causes the test to fail. picking a point exactly between two timesteps will always return the larger regardless of the prefer value. See comment below.

Suggested change
ts.get_by_redshift(1000, tolerance=0.1)
ts.get_by_redshift(1000, tolerance=0.1)
assert sfile_list[0] == ts.get_by_redshift(1/2, prefer='smaller').filename
assert sfile_list[1] == ts.get_by_redshift(1/2, prefer='larger').filename

dsL = dsR = dsM
break

if prefer == "smaller":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm not sure if this needs to happen up in the while loop or down here... but I think you need to explicitly handle the case where the chosen value is exactly halfway between two timesteps, particularly when a value for prefer has been provided. As this is, the larger value will always be returned when the provided value is halfway, regardless of prefer (see my suggested change in test_time_series.py). If you prefer the current behavior then I think the behavior should be documented in the docstring and maybe even a log message (in which case you can ignore my suggested change to the test).

@cphyc cphyc added this to the 4.4.0 milestone Apr 12, 2024
@neutrinoceros
Copy link
Member

pre-commit.ci autofix

@cphyc
Copy link
Member Author

cphyc commented Apr 16, 2024

@yt-fido test this please

@neutrinoceros
Copy link
Member

@yt-fido Test this please

2 similar comments
@neutrinoceros
Copy link
Member

@yt-fido Test this please

@neutrinoceros
Copy link
Member

@yt-fido Test this please

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature Something fun and new!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants