Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wavelength is a header item but should be allowed to be column? #69

Open
bmaranville opened this issue Dec 7, 2021 · 9 comments
Open

Comments

@bmaranville
Copy link
Contributor

(This is also true for incident_angle)
It seems unclear from the class layout how to structure the InstrumentSettings class if one of these is made into a column instead of putting it in the header. What do we put in InstrumentSettings.wavelength, in that case?

@arm61
Copy link
Contributor

arm61 commented Dec 7, 2021

I would make InstrumentSettings.wavelength a ValueRange type object and also have it as a column. Although this might be an edge-case that we should account for differently.

@bmaranville
Copy link
Contributor Author

The problem I see with that is discerning what to use for wavelength values: if ValueRange always means "look for a column", then that's ok - but if it sometimes means literally that it's just a range, full stop, then the logic for determining wavelength at each point in your data gets convoluted.

@bmaranville
Copy link
Contributor Author

Right now I'm trying to solve the problem: I need to use an Orso object and extract the wavelength at each point. I can look in the header, and then next I can look for a column, but if both are present which one do I use? What standard logic can be applied?

@arm61
Copy link
Contributor

arm61 commented Dec 7, 2021

I don't think wavelength/incident_angle are required by the specification though... (https://github.com/reflectivity/file_format/blob/master/specification.md). I might be mistaken (or looking at an out of date document).

@bmaranville
Copy link
Contributor Author

They are not required by the spec, but I need them in order to make sense of the data. I need them as much as I need Qz and R, so I am looking for a canonical way to specify them in the header or as a column (not a required way, mind you - just a standard way)

@andyfaff
Copy link
Contributor

andyfaff commented Dec 7, 2021

I would use a ValueRange for InstrumentSettings.wavelength and also supply a column, using the following logic in your own code:

if has_wavelength_column(header):
    # ignore the InstrumentSettings.wavelength attribute, using the column alone
    pass
elif has_angle_column(header):
    # this could be optional
    # work out wavelength from Q and angle.
else:
    # use wavelength from InstrumentSettings alone
    pass

EDIT: I changed ValueRage to ValueRange. I wonder what that says about any anger issues towards data.

@bmaranville
Copy link
Contributor Author

True - that logic will work for our use case, but will it be recommended as standard practice? Part of the purpose of the standard is so that e.g. someone wanting to extract the wavelength from an ORSO file wouldn't have to guess at the facility-specific heuristic being applied by the writer.

@andyfaff
Copy link
Contributor

andyfaff commented Dec 8, 2021

They are not required by the spec, but I need them in order to make sense of the data. I need them as much as I need Qz and R, so I am looking for a canonical way to specify them in the header or as a column

Part of the purpose of the standard is so that e.g. someone wanting to extract the wavelength from an ORSO file wouldn't have to guess at the facility-specific heuristic being applied by the writer.

The same argument applies to things like providing a resolution kernel for each data point, but there's no canonical facility-independent heuristic for doing things like that in the header either. Every facility might say, "we need this specific feature because we measure (or want to analyse) in a slightly different way", those specific features make it hard to generalise for everyone in an ORSO file.

My thought is that the header shouldn't be a place to store array-like data for information. If there is a wavelength corresponding to each of the Q points, that should belong in a column. Perhaps we should have reserved names for columns? i.e. in the spec we say what the first 4 columns have to be, additional columns are possible, but don't use "wavelength, omega, twotheta, etc", as they are reserved names for carrying that information.

True - that logic will work for our use case, but will it be recommended as standard practice?

I guess it depends, I haven't found detailed wavelength information (past a superficial wavelength range) all that useful to have in the data file. THe exceptions are for doing liquid-liquid where one might need to do some absorption corrections to the reflectivity.

@bmaranville
Copy link
Contributor Author

I agree with you that we can't support everything, and I agree it makes more sense as a column if there is an array of them. If we had a list of approved names for columns I would happily use columns with names from the list. Wavelength and wavelength resolution and incident angle and angular resolution were among the potential columns/and or header items we all agreed would be useful, we just never got around to standardising the names.

The names would get used both for columns and items in info.data_source.measurement.instrument_settings, since something like wavelength would be a column if there were N of them, but a header item if there is just 1 (and the same goes for all the "extra" columns?) The connection between those two locations is unusually strong since many items would naturally be in either one or the other, and could swap locations depending on the type or configuration of the instrument.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants