Column headers from first dataset are used for all subsequent datasets #128

rbauststfc · 2024-03-22T11:01:44Z

The library gives you the flexibility to put multiple datasets in a single file. Each of these datasets can be set up with different numbers of columns, and the column names can also be different for each dataset. However, the saved file continues to use the column headers provided for the first dataset.

This script can be used to demonstrate:

import numpy as np

from orsopy.fileio.data_source import DataSource, Person, Experiment, Sample, Measurement
from orsopy.fileio import Reduction, Software
from orsopy.fileio.orso import Orso, OrsoDataset, save_orso
from orsopy.fileio.base import Column

###################
# Set up the header data
###################

owner = Person(name=None, affiliation=None)
experiment = Experiment(
    title=None,
    instrument="Test Instrument",
    start_date=None,
    probe="neutron",
)
sample = Sample(name="Sample Name")
measurement = Measurement(instrument_settings=None, data_files=[])
creator = Person(name="Creator Name", affiliation="Affiliation")
software = Software(name="Software Name", version="v1")

data_source = DataSource(owner=owner, experiment=experiment, sample=sample, measurement=measurement)
reduction = Reduction(software=software, creator=creator)

###############################
# Create the first OrsoDataset for the file
###############################

columns_1 = [
    Column(name="Qz", unit="1/angstrom", physical_quantity="normal_wavevector_transfer"),
    Column(name="R", unit=None, physical_quantity="reflectivity"),
]

header_1 = Orso(data_source, reduction, columns_1, "dataset_1")

data_1 = np.array([
    np.full(5, 2),
    np.full(5, 3),
]).T

dataset_1 = OrsoDataset(info=header_1, data=data_1)

###################################################
# Create the second OrsoDataset for the file with one fewer column
###################################################

columns_2 = [
    Column(name="Qz", unit="1/angstrom", physical_quantity="normal_wavevector_transfer"),
]

header_2 = Orso(data_source, reduction, columns_2, "dataset_2")

data_2 = np.array([
    np.full(5, 2),
]).T

dataset_2 = OrsoDataset(info=header_2, data=data_2)

#############################################################
# Create the third OrsoDataset for the file with one column with a different name
#############################################################

columns_3 = [
    Column(name="Theta", unit="deg", physical_quantity="incident_angle"),
]

header_3 = Orso(data_source, reduction, columns_3, "dataset_3")

data_3 = np.array([
    np.full(5, 2.3),
]).T

dataset_3 = OrsoDataset(info=header_3, data=data_3)

####################
# Save the datasets to file
####################

save_orso(datasets=[dataset_1, dataset_2, dataset_3], fname="REPLACE_WITH_FILE_PATH")

The saved ORSO file gives the following column information for each dataset:

# data_set: dataset_1
# columns:
# - {name: Qz, unit: 1/angstrom, physical_quantity: normal_wavevector_transfer}
# - {name: R, physical_quantity: reflectivity}
# # Qz (1/angstrom)    R                     
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
# data_set: dataset_2
# columns:
# - name: Qz
#   unit: 1/angstrom
#   physical_quantity: normal_wavevector_transfer
# # Qz (1/angstrom)    R                     
2.0000000000000000e+00
2.0000000000000000e+00
2.0000000000000000e+00
2.0000000000000000e+00
2.0000000000000000e+00
# data_set: dataset_3
# columns:
# - name: Theta
#   unit: deg
#   physical_quantity: incident_angle
# # Qz (1/angstrom)    R                     
2.2999999999999998e+00
2.2999999999999998e+00
2.2999999999999998e+00
2.2999999999999998e+00
2.2999999999999998e+00

The R column header is present for all three datasets, but it is only relevant for the first. The final dataset should have Theta as the column header, but it remains as Qz.
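A mismatch like this can be spotted mechanically. The following is a minimal sketch, not part of orsopy: `count_header_mismatches` is a hypothetical helper that scans the text of an .ort file and compares the number of labels on each one-line column header with the number of values in the first data row that follows it. It assumes labels are separated by at least two spaces, as in the output above; a label long enough to fill its padded field would defeat that heuristic.

```python
import re


def count_header_mismatches(ort_text):
    """Return (dataset_name, n_labels, n_data_columns) for every dataset whose
    one-line text header disagrees with the width of its first data row."""
    mismatches = []
    name = None
    n_labels = None
    for line in ort_text.splitlines():
        if line.startswith("# data_set:"):
            # Start of a new dataset header block.
            name = line.split(":", 1)[1].strip()
            n_labels = None
        elif line.startswith("# # "):
            # Candidate column-label line; the last one before the data wins.
            # Heuristic: labels are padded, so split on runs of 2+ spaces.
            labels = re.split(r"\s{2,}", line[4:].strip())
            n_labels = len(labels)
        elif line and not line.startswith("#") and n_labels is not None:
            # First data row of the dataset: compare its width to the labels.
            n_data = len(line.split())
            if n_data != n_labels:
                mismatches.append((name, n_labels, n_data))
            n_labels = None  # only check the first data row per dataset
    return mismatches
```

Passing the contents of the saved file (for example `open(path).read()`) should list `dataset_2` and `dataset_3` from the output above, since each has two labels but only one data column.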

For information, we have a use case for multiple datasets with different numbers of columns in the same file. We don't currently have a use case for multiple datasets with different column names; I just thought the behaviour was worth pointing out.


andyfaff · 2024-03-22T11:06:46Z

According to the spec, the first four columns must always be Qz, R, [dR, [dQz]]. If you have a column that is not one of those, then you need to add it as a fifth column. See https://www.reflectometry.org/file_format/specification#column-description

rbauststfc · 2024-03-22T11:21:54Z

Thanks @andyfaff. The example script is deliberately artificial, just to demonstrate the behaviour. Our use case is that we include the first four columns defined by the specification in all datasets, but in some datasets we want to include a further four columns. We find that only the column headers set on the first dataset are used throughout the file.

bmaranville · 2024-03-22T14:00:58Z

Can you try this?

import numpy as np

from orsopy.fileio.data_source import DataSource, Person, Experiment, Sample, Measurement
from orsopy.fileio import Reduction, Software
from orsopy.fileio.orso import Orso, OrsoDataset, save_orso
from orsopy.fileio.base import Column, ErrorColumn

FILEPATH = "test_header_output.ort"

###################
# Set up the header data
###################

owner = Person(name=None, affiliation=None)
experiment = Experiment(
    title=None,
    instrument="Test Instrument",
    start_date=None,
    probe="neutron",
)
sample = Sample(name="Sample Name")
measurement = Measurement(instrument_settings=None, data_files=[])
creator = Person(name="Creator Name", affiliation="Affiliation")
software = Software(name="Software Name", version="v1")

data_source = DataSource(owner=owner, experiment=experiment, sample=sample, measurement=measurement)
reduction = Reduction(software=software, creator=creator)

###############################
# Create the first OrsoDataset for the file
###############################

columns_1 = [
    Column(name="Qz", unit="1/angstrom", physical_quantity="normal_wavevector_transfer"),
    Column(name="R", unit=None, physical_quantity="reflectivity"),
]

header_1 = Orso(data_source, reduction, columns_1, "dataset_1")

data_1 = np.array([
    np.full(5, 2),
    np.full(5, 3),
]).T

dataset_1 = OrsoDataset(info=header_1, data=data_1)

###################################################
# Create the second OrsoDataset for the file with three extra columns
###################################################

columns_2 = [
    Column(name="Qz", unit="1/angstrom", physical_quantity="normal_wavevector_transfer"),
    Column(name="R", unit=None, physical_quantity="reflectivity"),
    ErrorColumn(error_of="R"),
    ErrorColumn(error_of="Q"),
    Column(name="Theta", unit=None, physical_quantity="incident_angle"),
]

header_2 = Orso(data_source, reduction, columns_2, "dataset_2")

data_2 = np.array([
    np.full(5, 2),
    np.full(5, 3),
    np.full(5, 0.1),
    np.full(5, 0.2),
    np.full(5, 2.3),
]).T

dataset_2 = OrsoDataset(info=header_2, data=data_2)

####################
# Save the datasets to file
####################

save_orso(datasets=[dataset_1, dataset_2], fname=FILEPATH)

bmaranville · 2024-03-22T18:38:27Z

Note that you can also output a NeXus file by appending these two lines to the code:

from orsopy.fileio.orso import save_nexus
save_nexus(datasets=[dataset_1, dataset_2], fname=FILEPATH.replace(".ort", ".orb"))

test_header_output.orb

rbauststfc · 2024-03-25T08:11:09Z

Hi @bmaranville, thanks for the re-worked example script. When I run it, I think it demonstrates what I'm referring to: both datasets have only two column headers, Qz and R. This leaves the second, five-column dataset with three columns that have no headers:

# data_set: dataset_1
# columns:
# - {name: Qz, unit: 1/angstrom, physical_quantity: normal_wavevector_transfer}
# - {name: R, physical_quantity: reflectivity}
# # Qz (1/angstrom)    R                     
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
# data_set: dataset_2
# columns:
# - name: Qz
#   unit: 1/angstrom
#   physical_quantity: normal_wavevector_transfer
# - name: R
#   physical_quantity: reflectivity
# - error_of: R
# - error_of: Q
# - name: Theta
#   physical_quantity: incident_angle
# # Qz (1/angstrom)    R                     
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00

Thanks for the information about the NeXus output, that's really useful to know. The column headers appear to be written correctly there, so only the .ort format seems to be affected.

bmaranville · 2024-03-25T12:24:56Z

Ah - I think I understand the issue now. I was looking at the YAML headers, which look correct, but you're referring to the text column headers on one line before the data starts, and I see what you mean about only having 2 labels there.
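To make the distinction concrete, here is a minimal sketch of deriving the one-line text labels from a dataset's own column list rather than reusing the first dataset's. It is an illustration only, not orsopy's implementation: `text_header_labels` is a hypothetical helper, the columns are plain dicts rather than `Column`/`ErrorColumn` objects, and the `s` prefix for error columns is inferred from the corrected output further down this thread.

```python
def text_header_labels(columns):
    """Build one text label per column for a single dataset.

    `columns` is a list of dicts: either {"name": ..., "unit": ...} for a
    regular column or {"error_of": ...} for an error column (assumed shape).
    """
    labels = []
    for col in columns:
        if "error_of" in col:
            # Error columns are labelled after the column they describe.
            labels.append("s" + col["error_of"])
        elif col.get("unit"):
            labels.append(f"{col['name']} ({col['unit']})")
        else:
            labels.append(col["name"])
    return labels


# Each dataset gets labels from its own columns, not from the first dataset.
columns_2 = [
    {"name": "Qz", "unit": "1/angstrom"},
    {"name": "R", "unit": None},
    {"error_of": "R"},
    {"error_of": "Q"},
    {"name": "Theta", "unit": None},
]
labels_2 = text_header_labels(columns_2)
```

Calling the helper once per dataset is the whole point of the sketch: the label count then always matches that dataset's column count.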

bmaranville · 2024-03-25T14:38:00Z

Fixed by eea1bf3

# # ORSO reflectivity data file | 1.1 standard | YAML encoding | https://www.reflectometry.org/
# data_source:
#   owner:
#     name: null
#     affiliation: null
#   experiment:
#     title: null
#     instrument: Test Instrument
#     start_date: null
#     probe: neutron
#   sample:
#     name: Sample Name
#   measurement:
#     instrument_settings: null
#     data_files: []
# reduction:
#   software: {name: Software Name, version: v1}
#   creator:
#     name: Creator Name
#     affiliation: Affiliation
# data_set: dataset_1
# columns:
# - {name: Qz, unit: 1/angstrom, physical_quantity: normal_wavevector_transfer}
# - {name: R, physical_quantity: reflectivity}
# # Qz (1/angstrom)    R                     
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
# data_set: dataset_2
# columns:
# - name: Qz
#   unit: 1/angstrom
#   physical_quantity: normal_wavevector_transfer
# - name: R
#   physical_quantity: reflectivity
# - error_of: R
# - error_of: Q
# - name: Theta
#   physical_quantity: incident_angle
# # Qz (1/angstrom)    R                      sR                     sQ                     Theta                 
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00

rbauststfc · 2024-03-26T08:31:58Z

Brilliant, thanks very much @bmaranville. I'll keep an eye out for the next version of orsopy so we can pull in this fix.

arm61 · 2024-03-26T08:39:52Z

@bmaranville do we want to make a release of orsopy so that @rbauststfc can take advantage of this?

bmaranville · 2024-03-26T10:26:27Z

sounds good to me... but I won't be able to help much until the middle of next week.

rbauststfc · 2024-03-26T10:29:34Z

There's no significant rush from our side, whenever you next have the opportunity would be great.

andyfaff · 2024-03-26T11:02:35Z

We're always looking for new contributors to the project, so PRs and help advancing the cause are always welcome.

aglavic · 2024-04-12T11:47:46Z

As this is a straightforward bug fix, I don't see a reason not to push a release. I can take care of that next week at the latest.
