Find output by its attribute #4717

cphyc · 2023-10-27T15:27:21Z

PR Summary

This adds the ability to quickly find the dataset whose value for a given attribute (e.g. time or redshift) is closest to a requested one. Example:

import yt

# Load all RAMSES datasets in a folder - we assume here we have a cosmological simulation
ts = yt.load("/path/to/simulation/folder/output*")

# Get the snapshot that's as close as possible to z=2
ds = ts.get_by_redshift(2)

# Get the snapshot that's as close as possible to t=10 Gyr after the Big Bang
t = ts[0].quan(10, "Gyr")
ds = ts.get_by_time(t)

PR Checklist

New features are documented, with docstrings and narrative docs
Adds a test for any bugs fixed. Adds tests for new features.

neutrinoceros · 2023-10-27T15:46:27Z

Quick input:

I think this is a great idea
I'd prefer we avoid making it overly flexible: supporting the ability to search via arbitrary attributes may create unforeseen edge cases which will be impossible to test systematically. I think supporting search by time or redshift is sufficient for now, especially if it can be done in a way that doesn't involve typing the exact attribute names (IMO current_time is somewhat badly named, because its meaning seems excessively context-dependent)

matthewturk · 2023-11-01T17:03:50Z

Not sure I completely agree with @neutrinoceros about limiting the flexibility. The question the other day about flash datasets with particles would suggest there may be times when we don't know the attribute someone will be looking for.

Can you make it use "Price Is Right" rules? i.e., closest-without-going-over?

cphyc · 2023-11-01T17:16:56Z

I am not sure I understand what you mean by "Price Is Right" rule. Anyways, in its current state, it is already possible to query a dataset by attribute, but the method is hidden (_get_by_attribute), because “with great power comes great responsibility”.

Let me know what you think! I can always strip the leading _ to make it more user-facing, with an obvious comment in the docstring warning against misuses?

matthewturk · 2023-11-01T17:22:50Z

I mean, if I want the closest to z=1 but definitely on one side or the other, for instance. So having 0.99 take precedence over 1.01 (or even 0.95 over 1.001).
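
As a rough illustration of the "closest-without-going-over" rule described above, here is a standalone sketch on plain floats. It is not the code from this PR, and the function name is hypothetical.

```python
def closest_without_going_over(values, target):
    """Return the largest value that does not exceed target.

    Hypothetical helper for illustration only; raises ValueError when
    every value lies above the target.
    """
    candidates = [v for v in values if v <= target]
    if not candidates:
        raise ValueError(f"no value at or below {target}")
    return max(candidates)


redshifts = [0.95, 1.001, 2.0]
# 1.001 is nearer to 1 than 0.95 is, but it "goes over", so 0.95 wins.
print(closest_without_going_over(redshifts, 1.0))
```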


cphyc · 2023-11-01T17:38:59Z

Oh I see, yes, I'll implement a side argument to get either the output on the left or the one on the right!

neutrinoceros · 2023-11-01T17:45:06Z

I have nothing against that "the price is right rule", but I'd like to point out that left-to-right reading order isn't culturally neutral. I'd prefer that the interface be more inclusive.

cphyc · 2023-11-01T17:57:09Z

@neutrinoceros I've implemented a "smaller", "larger" or "nearest" switch so that there is no ambiguity arising from left-to-right vs. right-to-left reading order. Thanks for pointing it out!
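
The three-way switch mentioned above could look roughly like the following. This is a minimal sketch on an ascending list of floats; the name `select` and its signature are assumptions for illustration, not the method added by this PR.

```python
from bisect import bisect_left


def select(sorted_values, target, prefer="nearest"):
    """Pick a value from an ascending list according to `prefer`.

    Illustrative only: `select` and its signature are assumptions.
    """
    if prefer not in ("smaller", "larger", "nearest"):
        raise ValueError(f"unknown prefer={prefer!r}")
    # Index of the first element that is >= target.
    i = bisect_left(sorted_values, target)
    if i < len(sorted_values) and sorted_values[i] == target:
        return sorted_values[i]  # exact match satisfies every mode
    below = sorted_values[i - 1] if i > 0 else None
    above = sorted_values[i] if i < len(sorted_values) else None
    if prefer == "smaller":
        choice = below
    elif prefer == "larger":
        choice = above
    else:
        candidates = [v for v in (below, above) if v is not None]
        choice = min(candidates, key=lambda v: abs(v - target)) if candidates else None
    if choice is None:
        raise ValueError(f"no suitable value for target={target}, prefer={prefer!r}")
    return choice


times = [1.0, 2.0, 4.0]
print(select(times, 2.9, "smaller"))  # 2.0
print(select(times, 2.9, "larger"))   # 4.0
print(select(times, 2.9))             # 2.0, since |2.0 - 2.9| < |4.0 - 2.9|
```

Naming the modes after the values themselves ("smaller", "larger") rather than after a direction on a number line is what removes the reading-order ambiguity raised earlier.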

matthewturk · 2023-11-01T18:49:25Z

I think the new language is better. That being said, it's not necessarily true that RTL languages have different number line concepts:

https://math.stackexchange.com/questions/2618537/do-right-to-left-readers-also-reverse-mathematical-concepts-like-number-line

There's also some evidence that filmmaking conventions (such as progression from the left side of the screen to the right) are also commonly used outside of left-to-right languages.

I will say that with redshift the entire situation is odd because the numeric progression is reversed from temporal progression. And I did choose redshift as my example... So I guess this is on me!
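
To make the reversal concrete, here is a toy sketch with made-up snapshot values (the times and redshifts are illustrative and do not come from any simulation). The snapshot with a "smaller" redshift than the target is the one that comes later in time.

```python
# Made-up (time in Gyr, redshift) pairs: time increases while redshift decreases.
snapshots = [(1.0, 5.0), (3.0, 2.0), (6.0, 1.0), (13.0, 0.1)]


def largest_below(snaps, target, index):
    """Return the snapshot whose field `index` is the largest one below target."""
    below = [s for s in snaps if s[index] < target]
    return max(below, key=lambda s: s[index])


by_time = largest_below(snapshots, 4.0, index=0)      # "smaller" in time
by_redshift = largest_below(snapshots, 1.5, index=1)  # "smaller" in redshift

print(by_time)      # (3.0, 2.0): the snapshot just before t = 4 Gyr
print(by_redshift)  # (6.0, 1.0): z = 1.5 falls between t = 3 and t = 6, so this is the later one
```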

cphyc · 2023-11-02T09:13:19Z

@yt-fido test this please

neutrinoceros

The functionality is useful and the implementation is sound! I have a couple of questions and suggestions (mostly about type annotations).

neutrinoceros · 2023-11-25T10:14:07Z

doc/source/analyzing/time_series_analysis.rst

+   # This will fail if no output is found within 100 Myr
+   ds = ts.get_by_time((3, "Gyr"), tolerance=(100, "Myr"))
+   # Get the output at the time right before and after 3 Gyr
+   ds_before = ts.get_by_time((3, "Gyr"), side="smaller")


I still think the name of the argument could be clearer than side. What about prefer="smaller"?
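
The tolerance behaviour shown in the documentation snippet above can be sketched on plain numbers as follows. This is an assumption-laden illustration: the helper name is hypothetical, and whether the real check is strict or inclusive at exactly the tolerance is not stated in this thread.

```python
def nearest_within(values, target, tolerance=None):
    """Return the value nearest to target, or raise if it is too far away.

    Hypothetical helper; here a distance equal to the tolerance is accepted.
    """
    if not values:
        raise ValueError("no values to search")
    best = min(values, key=lambda v: abs(v - target))
    if tolerance is not None and abs(best - target) > tolerance:
        raise ValueError(
            f"nearest value {best} is farther than {tolerance} from {target}"
        )
    return best


times_gyr = [1.0, 2.5, 3.2, 5.0]
print(nearest_within(times_gyr, 3.0, tolerance=0.25))  # 3.2 lies within 0.25 Gyr of 3.0
```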


neutrinoceros · 2023-11-25T10:31:22Z

yt/data_objects/time_series.py

+
+        Parameters
+        ----------
+        key : str


this is out of sync with the argument name

neutrinoceros · 2023-11-25T10:33:11Z

yt/data_objects/time_series.py

+
+        # Use a binary search to find the closest value
+        iL = 0
+        iH = len(self._pre_outputs) - 1


I don't understand what H stands for here. I would expect iL and iR (left/right) to be used instead; can you explain?
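
For readers unfamiliar with the pattern, a bracketing binary search using the iL/iR naming suggested here might look like the sketch below. It works on a plain ascending list, not on `self._pre_outputs`, and the function name `bracket` is an assumption.

```python
def bracket(values, target):
    """Binary search on a non-empty ascending list.

    Returns indices (iL, iR) with values[iL] <= target <= values[iR]
    and iR - iL <= 1. Illustrative helper, not the PR's code.
    """
    if not values:
        raise ValueError("cannot search an empty list")
    iL, iR = 0, len(values) - 1
    if not (values[iL] <= target <= values[iR]):
        raise ValueError("target is outside the searched range")
    while iR - iL > 1:
        iM = (iL + iR) // 2  # midpoint index
        if values[iM] == target:
            return iM, iM  # exact match: both ends collapse onto it
        if values[iM] < target:
            iL = iM
        else:
            iR = iM
    return iL, iR


print(bracket([1, 2, 4, 8], 3))  # (1, 2): 3 lies between values[1]=2 and values[2]=4
```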

neutrinoceros · 2023-11-25T10:34:41Z

yt/data_objects/time_series.py

+        else:
+            raise ValueError(
+                f"{dsL} and {dsH} have both {attribute}={vL}, cannot perform search. "
+                "Try with another key."


This seems to me like a poor suggestion, I'd rather have none at all.

Suggested change

"Try with another key."

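One plausible reason for such a check, stated here as an assumption rather than as a description of the PR's logic: a binary search has to know whether the attribute grows or shrinks along the series (time grows, redshift shrinks), and comparing the two end values cannot decide that when they are equal. A sketch of that idea, with a hypothetical function name:

```python
def search_direction(first_value, last_value, attribute="attribute"):
    """Return +1 for an ascending series and -1 for a descending one."""
    if first_value < last_value:
        return 1
    if first_value > last_value:
        return -1
    # Equal end values give no ordering information at all.
    raise ValueError(
        f"first and last outputs both have {attribute}={first_value}, "
        "cannot perform search."
    )


print(search_direction(0.5, 13.7, "current_time"))      # 1: time grows along the series
print(search_direction(10.0, 0.0, "current_redshift"))  # -1: redshift shrinks
```
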
neutrinoceros · 2023-11-25T10:37:57Z

yt/data_objects/time_series.py

@@ -2,13 +2,15 @@
 import glob
 import inspect
 import os
+import typing


Suggested change

import typing

from typing import TYPE_CHECKING

neutrinoceros · 2023-11-25T10:38:08Z

yt/data_objects/time_series.py

@@ -28,6 +30,9 @@
    parallel_root_only,
 )

+if typing.TYPE_CHECKING:


Suggested change

if typing.TYPE_CHECKING:

if TYPE_CHECKING:
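
The import style suggested in the two comments above follows a common standard-library pattern. A minimal, self-contained sketch is shown below; `Fraction` is used only as a stand-in for whatever type the real module imports under the guard.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by static type checkers only; this branch never runs at runtime.
    from fractions import Fraction


def halve(x: "Fraction") -> "Fraction":
    """The quoted annotation keeps the name from being looked up at runtime."""
    return x / 2


print(halve(8))  # 4.0; annotations are not enforced when the code runs
```

Importing the name directly keeps the guard short (`if TYPE_CHECKING:`) and avoids pulling the whole `typing` namespace into the module.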

cphyc · 2023-11-27T16:37:11Z

@yt-fido test this please

cphyc · 2023-11-27T16:44:05Z

@yt-fido test this please

chrishavlin

Looks great!

I think the case of choosing a value exactly between two datasets needs to either be explicitly handled or have its behavior defined in the docstring (see comments), but otherwise this is very useful!

chrishavlin · 2023-11-28T15:34:26Z

yt/data_objects/tests/test_time_series.py

-def test_init_fake_dataseries():
-    file_list = [f"fake_data_file_{str(i).zfill(4)}" for i in range(10)]
+@pytest.fixture
+def FakeDataset():


this fixture seems potentially useful beyond the scope of just timeseries, you could consider moving it up to conftest.py.

Oh, I see now that this PR is not introducing FakeDataset but converting it to a pytest fixture. So probably not worth moving this up to conftest.py... though now that I know it's here I might do so in the future if I want to use it :)

chrishavlin · 2023-11-28T15:38:40Z

yt/data_objects/tests/test_time_series.py


-            def _set_code_unit_attributes(self):
-                return
+def test_get_by_key(FakeDataset, fake_datasets):


would be good to test the preference kwarg as well here.

chrishavlin · 2023-11-28T15:59:32Z

yt/data_objects/tests/test_time_series.py

-            # tear down to avoid possible breakage in following tests
-            output_type_registry.pop("FakeDataset")
+    with pytest.raises(ValueError):
+        ts.get_by_redshift(1000, tolerance=0.1)


Note: the suggested change below actually causes the test to fail. Picking a point exactly between two timesteps always returns the larger one, regardless of the prefer value. See comment below.

Suggested change

ts.get_by_redshift(1000, tolerance=0.1)

ts.get_by_redshift(1000, tolerance=0.1)

assert sfile_list[0] == ts.get_by_redshift(1/2, prefer='smaller').filename

assert sfile_list[1] == ts.get_by_redshift(1/2, prefer='larger').filename

chrishavlin · 2023-11-28T16:06:33Z

yt/data_objects/time_series.py

+                dsL = dsR = dsM
+                break
+
+        if prefer == "smaller":


So I'm not sure if this needs to happen up in the while loop or down here... but I think you need to explicitly handle the case where the chosen value is exactly halfway between two timesteps, particularly when a value for prefer has been provided. As this is, the larger value will always be returned when the provided value is halfway, regardless of prefer (see my suggested change in test_time_series.py). If you prefer the current behavior then I think the behavior should be documented in the docstring and maybe even a log message (in which case you can ignore my suggested change to the test).
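
One way to make the halfway case explicit is sketched below on two bracketing numbers. The helper name is hypothetical, and sending an exact tie to the lower value in "nearest" mode is an arbitrary choice made for this sketch; the point is only that the choice is written down rather than left implicit.

```python
def pick_between(lower, upper, target, prefer="nearest"):
    """Choose between two bracketing values, with lower <= target <= upper.

    Hypothetical helper. In "nearest" mode an exact tie goes to `lower`.
    """
    if not lower <= target <= upper:
        raise ValueError("target must lie between lower and upper")
    if prefer not in ("smaller", "larger", "nearest"):
        raise ValueError(f"unknown prefer={prefer!r}")
    # An exact match on either end satisfies every mode.
    if target == lower:
        return lower
    if target == upper:
        return upper
    if prefer == "smaller":
        return lower
    if prefer == "larger":
        return upper
    return lower if (target - lower) <= (upper - target) else upper


# A target exactly halfway now honours `prefer` instead of always going up.
print(pick_between(0.0, 1.0, 0.5, prefer="smaller"))  # 0.0
print(pick_between(0.0, 1.0, 0.5, prefer="larger"))   # 1.0
```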


neutrinoceros · 2024-04-13T06:11:32Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

cphyc · 2024-04-16T08:12:17Z

@yt-fido test this please

neutrinoceros · 2024-04-16T09:54:11Z

@yt-fido Test this please

neutrinoceros · 2024-04-16T09:58:28Z

@yt-fido Test this please

neutrinoceros · 2024-04-16T11:51:03Z

@yt-fido Test this please

cphyc added the new feature Something fun and new! label Oct 27, 2023

cphyc force-pushed the find-by-key branch from 10e57ba to e0e7136 Compare November 1, 2023 17:55

cphyc force-pushed the find-by-key branch from e0e7136 to a41c86a Compare November 1, 2023 17:58

cphyc marked this pull request as ready for review November 2, 2023 09:31

cphyc force-pushed the find-by-key branch from bb6a687 to 8072744 Compare November 3, 2023 15:30

neutrinoceros reviewed Nov 25, 2023


cphyc force-pushed the find-by-key branch from 57f7b5f to e53668d Compare November 27, 2023 16:00

chrishavlin reviewed Nov 28, 2023


cphyc added this to the 4.4.0 milestone Apr 12, 2024

cphyc added 10 commits April 12, 2024 14:53

Add functionality to find a dataset by its closest value

2315d17

Test the ability to get output from key

cd94faf

Only 'get_by_time' and '_by_redshift' are user-facing

eb73188

Fix typos in docstring

d4c1120

Match by filename

1502aaf

Short-circuit iterations for out-of-bounds and exact matches

7d0fea8

Make sure we can pass tuples for unitfull quantities

e15fe1c

Skip time_series for nose testing

a34ac2a

Provide 'side' to pick whether we want the closest, smaller or larger value

e0d0e71

Add documentation

f263684

cphyc and others added 8 commits April 12, 2024 14:54

Update yt/data_objects/tests/test_time_series.py

edec406

Co-authored-by: Clément Robert <cr52@protonmail.com>

Replace numpy.assert_raises with pytest's version

e9cc072

Import TYPE_CHECKING from typing

37613d6

Remove outdated comment

048561f

Rename iH(igh) to the more canonical iR(ight)

6ca655c

Rename side to more explicit 'prefer' kwarg

6482639

Fix typing

bf8e078

Do not suggest very vague fix

b45d47d

cphyc force-pushed the find-by-key branch from 6c24010 to b45d47d Compare April 12, 2024 12:55

[pre-commit.ci] auto fixes from pre-commit.com hooks

57d2b2e

for more information, see https://pre-commit.ci
