Extend compare_metadata #770

Open
WenzDaniel opened this issue Oct 26, 2023 · 4 comments

@WenzDaniel
Collaborator

WenzDaniel commented Oct 26, 2023

We should add a fallback method to

def compare_metadata(self, run_id, target, old_metadata):
that compares only the lineages in case the current data type is not stored. Currently, we load the metadata via:

# new metadata for the given runid + target; fetch from context
new_metadata = self.get_metadata(run_id, target)

As a fallback, in case the data is not stored, we should use:

new_lineage = self.key_for(run_id, target).lineage

Then one only needs to cast all tuples into lists (or vice versa), or deactivate the type checking, since otherwise the lineage cannot be compared to metadata files loaded from disk.
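To illustrate why the cast is needed: assuming the metadata is written to disk as JSON (which has no tuple type), every tuple in the lineage comes back as a list when the file is read, so a plain `==` fails even though the content is identical. The lineage below is a made-up toy dict, not a real strax lineage.

```python
import json

# Toy lineage with hypothetical values: data_type -> (plugin name, version, options)
lineage = {"peaks": ("Peaks", "0.1.0", {"window": (0, 10)})}

# Simulate writing the metadata to disk as JSON and reading it back
restored = json.loads(json.dumps(lineage))

print(restored)             # {'peaks': ['Peaks', '0.1.0', {'window': [0, 10]}]}
print(lineage == restored)  # False: the tuples came back as lists
```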

WenzDaniel self-assigned this Oct 26, 2023
@KaraMelih
Contributor

I started looking into this. It looks straightforward, and I think I just now understood what you meant by typecasting. Just to confirm:

a = st.get_metadata(run_id, target)['lineage']
b = st.key_for(run_id, target).lineage
strax.utils.compare_dict(a, b)

Both a and b contain the same information, but one of them stores it in lists while the other stores it in tuples.

Then I need to check how deeply the lists and tuples are nested in order to convert them.
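One way to avoid tracking the nesting depth by hand (a swapped-in alternative, not the approach taken in this thread) is to push both sides through the same JSON round trip, which turns every tuple into a list at any depth. This sketch assumes every value in the lineage is JSON-serializable; note that non-string dict keys would be turned into strings. `normalize` and `lineages_match` are hypothetical helper names, and the data is made up.

```python
import json


def normalize(obj):
    """Return a copy of obj with every tuple, at any depth, turned into a list.

    Assumes obj is JSON-serializable (dicts with str keys, lists, tuples,
    numbers, strings, booleans, None).
    """
    return json.loads(json.dumps(obj))


def lineages_match(a, b):
    """Compare two lineages while ignoring the tuple/list distinction."""
    return normalize(a) == normalize(b)


# Toy data: the same content, once with tuples and once with lists
from_context = {"peaks": ("Peaks", "0.1.0", {"window": (0, 10)})}
from_disk = {"peaks": ["Peaks", "0.1.0", {"window": [0, 10]}]}
print(lineages_match(from_context, from_disk))  # True
```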

@KaraMelih
Contributor

Should I make a PR branching from commit 5f9051b? The version on master is different.

@KaraMelih
Contributor

I have refactored the code to compare the lineages in case the data is not stored. I also wrote the following utility function to convert tuples to lists in nested objects of arbitrary depth.

def convert_tuple_to_list(obj):
    """Return a copy of obj in which every tuple, at any depth, is a list."""
    # Tuples and lists: convert every element recursively; always return a list
    if isinstance(obj, (tuple, list)):
        return [convert_tuple_to_list(item) for item in obj]
    # Dicts: keep the keys, convert every value recursively
    if isinstance(obj, dict):
        return {key: convert_tuple_to_list(value) for key, value in obj.items()}
    # Not a container (int, float, bytes, str, ...): return it unchanged
    return obj

I tested it, and it seems to work as intended.

However, I think the lineage comparison might be a little misleading, because lineages do not differ between runs: within the same context, any run_id gives the same lineage as long as the data type is the same, e.g.:

st.key_for("000000", "event_basics").lineage == st.key_for("053675", "event_basics").lineage

is always True.
So by comparing the lineages (after reconciling tuples and lists, in case the two sides differ in that respect), we can only say that both were created with the same context and refer to the same data type. Is this the intended behavior?

@WenzDaniel
Collaborator Author

> So by comparing the lineages (after straightening out the tuples and lists, if either of the compared parts is different) we can only say that they are created with the same context and they are looking at the same datatype. Is this the intended behavior?

No, then you did not understand my use case. In the current implementation, we can compare the lineage of the current context with the metadata of some stored data only if the metadata of the current context is also already stored somewhere, because the context's metadata is loaded via self.get_metadata, which only works if the data is stored. However, I think the nominal use case is rather: "I cannot load any data with my current context; why does my lineage differ from this piece of data stored in my directory?"

You can just branch off master for your changes.
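For the "why does my lineage differ from this stored data?" use case, the comparison step only needs two plain dicts: the lineage of the current context (which, per the issue description, can come from `key_for(run_id, target).lineage` even when nothing is stored) and the lineage read from the stored metadata file. Below is a minimal sketch of that step, assuming both lineages map data type to JSON-serializable entries that may mix tuples and lists. `lineage_diff` is a hypothetical helper, not part of strax, and the data is made up.

```python
import json


def _as_lists(obj):
    # JSON round trip: every tuple, at any depth, becomes a list
    # (assumes obj is JSON-serializable)
    return json.loads(json.dumps(obj))


def lineage_diff(current, stored):
    """Return {data_type: (current_entry, stored_entry)} for every mismatch.

    A data type missing on one side is reported with None for that side.
    """
    current, stored = _as_lists(current), _as_lists(stored)
    diff = {}
    for data_type in sorted(set(current) | set(stored)):
        mine, theirs = current.get(data_type), stored.get(data_type)
        if mine != theirs:
            diff[data_type] = (mine, theirs)
    return diff


# Toy lineages: identical "records" entries, a changed option for "peaks"
current = {
    "records": ("Records", "0.0.1", {"threshold": 5}),
    "peaks": ("Peaks", "0.1.0", {"window": (0, 10)}),
}
stored = {
    "records": ["Records", "0.0.1", {"threshold": 5}],
    "peaks": ["Peaks", "0.1.0", {"window": [0, 20]}],
}
print(lineage_diff(current, stored))
# {'peaks': (['Peaks', '0.1.0', {'window': [0, 10]}], ['Peaks', '0.1.0', {'window': [0, 20]}])}
```

Reporting per data type, rather than a single True/False, points the user directly at the plugin whose version or options changed.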
