Column headers from first dataset are used for all subsequent datasets #128

Open
rbauststfc opened this issue Mar 22, 2024 · 13 comments

@rbauststfc

The library gives you the flexibility to put multiple datasets in a single file. Each of these datasets can be set up with different numbers of columns, and the column names can also be different for each dataset. However, the saved file continues to use the column headers provided for the first dataset.

This script can be used to demonstrate:

import numpy as np

from orsopy.fileio.data_source import DataSource, Person, Experiment, Sample, Measurement
from orsopy.fileio import Reduction, Software
from orsopy.fileio.orso import Orso, OrsoDataset, save_orso
from orsopy.fileio.base import Column

###################
# Set up the header data
###################

owner = Person(name=None, affiliation=None)
experiment = Experiment(
    title=None,
    instrument="Test Instrument",
    start_date=None,
    probe="neutron",
)
sample = Sample(name="Sample Name")
measurement = Measurement(instrument_settings=None, data_files=[])
creator = Person(name="Creator Name", affiliation="Affiliation")
software = Software(name="Software Name", version="v1")

data_source = DataSource(owner=owner, experiment=experiment, sample=sample, measurement=measurement)
reduction = Reduction(software=software, creator=creator)

###############################
# Create the first OrsoDataset for the file
###############################

columns_1 = [
    Column(name="Qz", unit="1/angstrom", physical_quantity="normal_wavevector_transfer"),
    Column(name="R", unit=None, physical_quantity="reflectivity"),
]

header_1 = Orso(data_source, reduction, columns_1, "dataset_1")

data_1 = np.array([
    np.full(5, 2),
    np.full(5, 3),
]).T

dataset_1 = OrsoDataset(info=header_1, data=data_1)

###################################################
# Create the second OrsoDataset for the file with one fewer column
###################################################

columns_2 = [
    Column(name="Qz", unit="1/angstrom", physical_quantity="normal_wavevector_transfer"),
]

header_2 = Orso(data_source, reduction, columns_2, "dataset_2")

data_2 = np.array([
    np.full(5, 2),
]).T

dataset_2 = OrsoDataset(info=header_2, data=data_2)

#############################################################
# Create the third OrsoDataset for the file with one column with a different name
#############################################################

columns_3 = [
    Column(name="Theta", unit="deg", physical_quantity="incident_angle"),
]

header_3 = Orso(data_source, reduction, columns_3, "dataset_3")

data_3 = np.array([
    np.full(5, 2.3),
]).T

dataset_3 = OrsoDataset(info=header_3, data=data_3)

####################
# Save the datasets to file
####################

save_orso(datasets=[dataset_1, dataset_2, dataset_3], fname="REPLACE_WITH_FILE_PATH")

The saved ORSO file gives the following column information for each dataset:

# data_set: dataset_1
# columns:
# - {name: Qz, unit: 1/angstrom, physical_quantity: normal_wavevector_transfer}
# - {name: R, physical_quantity: reflectivity}
# # Qz (1/angstrom)    R                     
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
# data_set: dataset_2
# columns:
# - name: Qz
#   unit: 1/angstrom
#   physical_quantity: normal_wavevector_transfer
# # Qz (1/angstrom)    R                     
2.0000000000000000e+00
2.0000000000000000e+00
2.0000000000000000e+00
2.0000000000000000e+00
2.0000000000000000e+00
# data_set: dataset_3
# columns:
# - name: Theta
#   unit: deg
#   physical_quantity: incident_angle
# # Qz (1/angstrom)    R                     
2.2999999999999998e+00
2.2999999999999998e+00
2.2999999999999998e+00
2.2999999999999998e+00
2.2999999999999998e+00

The R column header is present for all three datasets, but it is only relevant for the first. The final dataset should have Theta as the column header, but it remains as Qz.

For information, we have a use case for multiple datasets with different numbers of columns in the same file. We don't currently have a use case for multiple datasets with different column names; I just thought it would be worth pointing out the behaviour.
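
The mismatch described above can be detected mechanically in a saved file. The following is a minimal sketch, not part of orsopy: `find_header_mismatches` is a hypothetical helper that assumes the label line is the last comment line before each data block and that units appear in parentheses after a label, as in the output shown above.

```python
import re


def find_header_mismatches(ort_text):
    """Return (data_set, n_labels, n_values) for each dataset whose
    text header line has a different number of labels than its data rows.

    Hypothetical helper for illustration; assumes the label line is the
    last comment line before the first data row of each dataset.
    """
    problems = []
    name = None
    last_comment = None
    for line in ort_text.splitlines():
        if line.startswith("#"):
            match = re.match(r"#\s*data_set:\s*(.+)", line)
            if match:
                name = match.group(1).strip()
            last_comment = line
        elif line.strip() and last_comment is not None:
            # First data row after a comment block: compare it with the label line.
            header = last_comment.lstrip("# ")
            # Drop "(unit)" suffixes so that each label is a single token.
            labels = re.sub(r"\s*\([^)]*\)", "", header).split()
            n_values = len(line.split())
            if len(labels) != n_values:
                problems.append((name, len(labels), n_values))
            last_comment = None
    return problems


# Shortened version of the output above (values abbreviated).
sample = """\
# data_set: dataset_1
# columns:
# - {name: Qz, unit: 1/angstrom, physical_quantity: normal_wavevector_transfer}
# - {name: R, physical_quantity: reflectivity}
# # Qz (1/angstrom)    R
2.0e+00 3.0e+00
# data_set: dataset_2
# columns:
# - name: Qz
# # Qz (1/angstrom)    R
2.0e+00
"""

problems = find_header_mismatches(sample)
print(problems)
```

A check like this only counts labels, so it would flag dataset_2 above but would not notice a label with the wrong name, such as Qz where Theta was expected.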

@andyfaff
Contributor

According to the spec, the first four columns must always be Qz, R, [dR, [dQz]]. If you have a column that is not one of those, then you need to add it as a fifth column. See https://www.reflectometry.org/file_format/specification#column-description

@rbauststfc
Author

rbauststfc commented Mar 22, 2024

Thanks @andyfaff, the example script is quite artificial just to demonstrate the behaviour. Our use case is that we're including the first four columns specified by the specification in all datasets, but in some datasets we want to include a further 4 columns. We find that only the column headers that were set on the first dataset are the ones that are used throughout the file.

@bmaranville
Contributor

Can you try this?

import numpy as np

from orsopy.fileio.data_source import DataSource, Person, Experiment, Sample, Measurement
from orsopy.fileio import Reduction, Software
from orsopy.fileio.orso import Orso, OrsoDataset, save_orso
from orsopy.fileio.base import Column, ErrorColumn

FILEPATH = "test_header_output.ort"

###################
# Set up the header data
###################

owner = Person(name=None, affiliation=None)
experiment = Experiment(
    title=None,
    instrument="Test Instrument",
    start_date=None,
    probe="neutron",
)
sample = Sample(name="Sample Name")
measurement = Measurement(instrument_settings=None, data_files=[])
creator = Person(name="Creator Name", affiliation="Affiliation")
software = Software(name="Software Name", version="v1")

data_source = DataSource(owner=owner, experiment=experiment, sample=sample, measurement=measurement)
reduction = Reduction(software=software, creator=creator)

###############################
# Create the first OrsoDataset for the file
###############################

columns_1 = [
    Column(name="Qz", unit="1/angstrom", physical_quantity="normal_wavevector_transfer"),
    Column(name="R", unit=None, physical_quantity="reflectivity"),
]

header_1 = Orso(data_source, reduction, columns_1, "dataset_1")

data_1 = np.array([
    np.full(5, 2),
    np.full(5, 3),
]).T

dataset_1 = OrsoDataset(info=header_1, data=data_1)

###################################################
# Create the second OrsoDataset for the file with three more columns
###################################################

columns_2 = [
    Column(name="Qz", unit="1/angstrom", physical_quantity="normal_wavevector_transfer"),
    Column(name="R", unit=None, physical_quantity="reflectivity"),
    ErrorColumn(error_of="R"),
    ErrorColumn(error_of="Q"),
    Column(name="Theta", unit=None, physical_quantity="incident_angle"),
]

header_2 = Orso(data_source, reduction, columns_2, "dataset_2")

data_2 = np.array([
    np.full(5, 2),
    np.full(5, 3),
    np.full(5, 0.1),
    np.full(5, 0.2),
    np.full(5, 2.3),
]).T

dataset_2 = OrsoDataset(info=header_2, data=data_2)

####################
# Save the datasets to file
####################

save_orso(datasets=[dataset_1, dataset_2], fname=FILEPATH)

@bmaranville
Contributor

Note that you can also output a NeXus file by appending these two lines to the code:

from orsopy.fileio.orso import save_nexus
save_nexus(datasets=[dataset_1, dataset_2], fname=FILEPATH.replace(".ort", ".orb"))

test_header_output.orb

@rbauststfc
Author

Hi @bmaranville, thanks for the re-worked example script. When I run it, I think it demonstrates what I'm referring to: both datasets have only two column headers, Qz and R. This leaves the second, five-column dataset with three columns that don't have headers:

# data_set: dataset_1
# columns:
# - {name: Qz, unit: 1/angstrom, physical_quantity: normal_wavevector_transfer}
# - {name: R, physical_quantity: reflectivity}
# # Qz (1/angstrom)    R                     
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
# data_set: dataset_2
# columns:
# - name: Qz
#   unit: 1/angstrom
#   physical_quantity: normal_wavevector_transfer
# - name: R
#   physical_quantity: reflectivity
# - error_of: R
# - error_of: Q
# - name: Theta
#   physical_quantity: incident_angle
# # Qz (1/angstrom)    R                     
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00

Thanks for the information about the NeXus output, that's really useful to hear. It looks like the column headers print out correctly there, so only the .ort format seems to be affected.

@bmaranville
Contributor

Ah - I think I understand the issue now. I was looking at the YAML headers, which look correct, but you're referring to the text column headers on one line before the data starts, and I see what you mean about only having 2 labels there.
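
To make the distinction concrete: the YAML block and the one-line text header both describe the columns, so the text line has to be rebuilt for every dataset from that dataset's own column list. The sketch below illustrates the idea only and is not orsopy's implementation; `header_line`, the plain-dict column representation, and the 22-character width are assumptions chosen to resemble the output shown in this thread.

```python
def header_line(columns, width=22):
    """Build a one-line text header from a dataset's own column list.

    Hypothetical sketch: each column is a plain dict with either a
    "name" (and optional "unit") or an "error_of" key.
    """
    labels = []
    for col in columns:
        if "error_of" in col:
            # Error columns are labelled "s" + the column they refer to.
            label = "s" + col["error_of"]
        else:
            label = col["name"]
            if col.get("unit"):
                label += f" ({col['unit']})"
        labels.append(label)
    return "# " + " ".join(label.ljust(width) for label in labels).rstrip()


columns_1 = [{"name": "Qz", "unit": "1/angstrom"}, {"name": "R"}]
columns_2 = columns_1 + [
    {"error_of": "R"},
    {"error_of": "Q"},
    {"name": "Theta"},
]

# Each dataset gets a line derived from its own columns, not from the first dataset's.
for cols in (columns_1, columns_2):
    print(header_line(cols))
```

Calling the function once per dataset, rather than once per file, is what avoids reusing the first dataset's labels.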

@bmaranville
Contributor

Fixed by eea1bf3. With that change, the header line above each data block lists that dataset's own columns:

# # ORSO reflectivity data file | 1.1 standard | YAML encoding | https://www.reflectometry.org/
# data_source:
#   owner:
#     name: null
#     affiliation: null
#   experiment:
#     title: null
#     instrument: Test Instrument
#     start_date: null
#     probe: neutron
#   sample:
#     name: Sample Name
#   measurement:
#     instrument_settings: null
#     data_files: []
# reduction:
#   software: {name: Software Name, version: v1}
#   creator:
#     name: Creator Name
#     affiliation: Affiliation
# data_set: dataset_1
# columns:
# - {name: Qz, unit: 1/angstrom, physical_quantity: normal_wavevector_transfer}
# - {name: R, physical_quantity: reflectivity}
# # Qz (1/angstrom)    R                     
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
# data_set: dataset_2
# columns:
# - name: Qz
#   unit: 1/angstrom
#   physical_quantity: normal_wavevector_transfer
# - name: R
#   physical_quantity: reflectivity
# - error_of: R
# - error_of: Q
# - name: Theta
#   physical_quantity: incident_angle
# # Qz (1/angstrom)    R                      sR                     sQ                     Theta                 
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00

@rbauststfc
Author

Brilliant, thanks very much @bmaranville. I'll keep an eye out for the next version of orsopy so we can pull in this fix.

@arm61
Contributor

arm61 commented Mar 26, 2024

@bmaranville do we want to make a release of orsopy so that @rbauststfc can take advantage of this?

@bmaranville
Contributor

sounds good to me... but I won't be able to help much until the middle of next week.

@rbauststfc
Author

There's no significant rush from our side, whenever you next have the opportunity would be great.

@andyfaff
Contributor

We're always looking for new contributors to the project, so PRs and help advancing the cause are always welcome.

@aglavic
Collaborator

aglavic commented Apr 12, 2024

As this is a straightforward bug fix, I don't see a reason not to push a release. I can take care of that next week at the latest.
