Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: added separate column validation #127

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

andyfaff
Copy link
Contributor

Added a separate validation for column properties. Eventually this validation should simplify the schema creation.

Instead of raising a jsonschema.exceptions.ValidationError I decided to raise a ValueError - the validation of the column properties is not being done by jsonschema.

Should there be extra checks?

@aglavic
Copy link
Collaborator

aglavic commented Mar 1, 2024

Do we check anywhere, that the shape of column data is equal? If not, is this mendatory?

@andyfaff
Copy link
Contributor Author

andyfaff commented Mar 1, 2024

The number of columns needs to be congruent with the data. I would say that the number of data points should be the same for all columns.

For an ORB file a column has the potential to be multidimensional. For an ORT file each column is expected to have ndim=1. So for a concatenated datapoint,column you would expect a shape of (ndata, ncol). A question is how to hold it internally, as a tuple/list of columns, or as a 2D array. I believe we were aiming for the former, as it would make holding ORB data (i.e. column ndim>=1) more easily?

But yes, easy to add the check on load.

@bmaranville
Copy link
Contributor

In the __post_init__ for OrsoDataset, we're checking that the number of columns in the data matches the number of columns defined in the header: https://github.com/reflectivity/orsopy/blob/main/orsopy/fileio/orso.py#L156
But we're not checking the first dimension of the columns, which should match.

What about this type of check?

    def __post_init__(self):
        if len(self.data) != len(self.info.columns):
            raise ValueError("Data has to have the same number of columns as header")
        column_lengths = set(len(c) for c in self.data)
        if len(column_lengths) > 1:
            raise ValueError("Columns must all have the same length in first dimension")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants