Add tutorial/docs: write structure-specific SDMX-ML #93

Open
PalGal2 opened this issue Feb 10, 2022 · 8 comments
Labels
doc (Documentation, help, examples, etc.), enh (Enhancements & new features), help welcome (Issues that depend on contributions from new developers)

Comments


PalGal2 commented Feb 10, 2022

Hi,

I would like to know if there is a way to write Structure Specific Data SDMX-ML 2.1 with the sdmx package.

>>> sdmx.format.FORMATS[1]
Format(mime='application/vnd.sdmx.structurespecificdata+xml;version=2.1', base='xml', data=True, meta=False, extra=<Extra.ss: 1>)

It looks like the to_xml function writes the generic format by default. Is there a way to specify the "structure specific data" parameter?

Many thanks

PalGal2 changed the title from "Write SDMX-ML structure" to "Write SDMX-ML 2.1 Structure Specific Data" on Feb 10, 2022
khaeru added the doc and enh labels on Feb 11, 2022
Owner

khaeru commented Feb 11, 2022

Is there a way to specify the "structure specific data" parameter?

There is, but it's not very obvious, sorry.

In short:

  • The writer.xml code (i.e. to_xml()) behaves differently depending on the class of objects it receives. See

    sdmx/sdmx/writer/xml.py

    Lines 100 to 104 in 4905936

    def _dm(obj: message.DataMessage):
        struct_spec = len(obj.data) and isinstance(
            obj.data[0],
            (model.StructureSpecificDataSet, model.StructureSpecificTimeSeriesDataSet),
        )
  • If it receives a DataMessage whose first data set is a StructureSpecificDataSet (or StructureSpecificTimeSeriesDataSet), then it writes out the structure-specific XML format. (If, on the other hand, the first data set is of another kind, or the message has no data sets, it writes the generic/non-structure-specific format.)
  • So, the suggested usage: if you ensure that your code uses the StructureSpecificDataSet class when you construct a DataMessage, then to_xml() will happily write out structure-specific XML.
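The effect of the quoted check can be tried out without the library. The sketch below is only an illustration: the classes are local stand-ins (not the real sdmx.model or sdmx.message classes), and chooses_structure_specific is a hypothetical helper that mirrors the isinstance test in the quoted lines, assuming that test is the whole decision.

```python
from dataclasses import dataclass, field


# Local stand-ins for illustration only; these are NOT the sdmx classes.
class DataSet:
    pass


class GenericDataSet(DataSet):
    pass


class StructureSpecificDataSet(DataSet):
    pass


class StructureSpecificTimeSeriesDataSet(DataSet):
    pass


@dataclass
class DataMessage:
    data: list = field(default_factory=list)


def chooses_structure_specific(msg: DataMessage) -> bool:
    """Mirror the quoted check: only the class of the FIRST data set matters."""
    return bool(
        len(msg.data)
        and isinstance(
            msg.data[0],
            (StructureSpecificDataSet, StructureSpecificTimeSeriesDataSet),
        )
    )


# A message holding a structure-specific data set selects that format.
print(chooses_structure_specific(DataMessage([StructureSpecificDataSet()])))
# A message holding a generic data set does not.
print(chooses_structure_specific(DataMessage([GenericDataSet()])))
# Neither does a message with no data sets at all.
print(chooses_structure_specific(DataMessage()))
```

Because only the first element is inspected, a message that mixes data set classes would follow whichever class comes first under this reading of the snippet.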

Other thoughts:

  • I think we could improve the docs here, and maybe even provide an override option to force struct_spec=True when GenericDataSet is used. Added labels accordingly.
  • We do already test this functionality, because the test specimens include some structure-specific XML messages, and these are "round-tripped"—we read them into Python, then write them out to XML again to check the results are identical.
  • In SDMX 3.0 structure-specific is the only data format, so this will probably be relevant to #87 (Implement/ensure compat with SDMX 3.0).

Author

PalGal2 commented Feb 14, 2022

Many thanks @khaeru, this is perfect.

I have just a minor question related to your answer: changing DataSet to StructureSpecificDataSet works great, but I still have a repetition of dimensions at the level of observation, whereas I would like to have them at the series level (i.e. for a given series, the value changes only on the time period dimension). My demo is "cross-sectional data".

I wasn't able to find any documentation on that, but I am sure it is feasible with your library. Could you please give me some advice?

Thank you so much.

Owner

khaeru commented Feb 14, 2022

I still have a repetition of dimensions at the level of observation, whereas I would like to have them at the series level (i.e. for a given series, the value changes only on the time period dimension). My demo is "cross-sectional data".

Without a specimen or example code, it's really not clear what precisely you're trying to do, and what result you're getting instead.

A wild guess: taking this example https://github.com/khaeru/sdmx-test-data/blob/f040d18abbcf6b40b1640e510fbea0f91aa22d60/ECB_EXR/ng-xs-ss.xml#L14-L20 —what you are saying is that some dimension, e.g. CURRENCY_DENOM, is being written as part of the individual <Obs …/> XML elements, instead of as part of the containing <Series …></Series> elements. Is that correct?

If so, note the writer code follows the structure of the data, e.g.

  • If your observations are ungrouped in series, they get written out as such.
  • If certain dimensions appear in the Observation.dimension Key, and not in the DataSet.series SeriesKey, then that's how they will be written out.
  • Attributes (in the above example, things like DECIMALS, UNIT_MULT, etc. are all attributes, not dimensions) are written according to the data structure definition. This is somewhat complicated, but basically if they are attached to individual observations, that's how they'll be written out.
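One way to check which of these arrangements a written file actually has is to inspect the XML directly. The following is a minimal sketch using only the Python standard library, not the sdmx package. The two fragments are hand-written, omit namespaces, and use made-up values; the dimension names are borrowed from the ECB_EXR example for illustration, and attribute_levels is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written fragments shaped like structure-specific SDMX-ML data sets.
# Namespaces are omitted and all values are made up for illustration.
GROUPED = """
<DataSet>
  <Series FREQ="A" CURRENCY="CHF" CURRENCY_DENOM="EUR">
    <Obs TIME_PERIOD="2010" OBS_VALUE="1.0"/>
    <Obs TIME_PERIOD="2011" OBS_VALUE="2.0"/>
  </Series>
</DataSet>
"""

FLAT = """
<DataSet>
  <Obs FREQ="A" CURRENCY="CHF" CURRENCY_DENOM="EUR" TIME_PERIOD="2010" OBS_VALUE="1.0"/>
  <Obs FREQ="A" CURRENCY="CHF" CURRENCY_DENOM="EUR" TIME_PERIOD="2011" OBS_VALUE="2.0"/>
</DataSet>
"""


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def attribute_levels(xml_text: str) -> dict:
    """Collect the XML attribute names found on <Series> and on <Obs> elements."""
    levels = {"Series": set(), "Obs": set()}
    for elem in ET.fromstring(xml_text).iter():
        name = local_name(elem.tag)
        if name in levels:
            levels[name].update(elem.attrib)
    return levels


for label, text in (("grouped", GROUPED), ("flat", FLAT)):
    levels = attribute_levels(text)
    print(label, {key: sorted(value) for key, value in levels.items()})
```

If a dimension such as CURRENCY_DENOM shows up under "Obs" rather than "Series", that suggests the observations were not attached to a series key carrying that dimension.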

I hope that helps to investigate the data structures you are writing out.

Author

PalGal2 commented Feb 14, 2022

Thank you again for your valuable help @khaeru. The example matches exactly what I would like to have in my final SDMX-ML file. I followed this tutorial which, however, does not present the <Series ...> elements.

How can I adapt this tutorial to add the <Series ...> element to the final XML file? I would guess I have to specify the series argument of sdmx.model.StructureSpecificDataSet, but I do not know how. Is that correct? Could you please give me some guidance?

Can you give some examples for the following arguments in sdmx.model.StructureSpecificDataSet drawing on the tutorial (maybe to implement in the doc for other users?):

Thank you again very much.

Owner

khaeru commented Feb 15, 2022

maybe to implement in the doc for other users?

As you note, this is a substantial expansion of the docs, i.e. a whole new tutorial. I think that would be a nice enhancement to have, but I do not have time to do it at this moment. I also cannot write your code for you.

What I would suggest you do is:

  • In an interactive session (Python, IPython, or Jupyter), read the example/specimen XML file I linked earlier.
  • Explore the arrangement of observations and series keys in the loaded message.
  • Try adjusting the code you've written to generate a new message, i.e. adapt it to produce a similar arrangement.
  • Try writing out the results.

Author

PalGal2 commented Feb 15, 2022

Many thanks @khaeru for your time.

PalGal2 closed this as completed on Feb 15, 2022
Owner

khaeru commented Feb 15, 2022

Let's keep the issue open as a TODO item for the eventual new tutorial.

khaeru reopened this on Feb 15, 2022
khaeru changed the title from "Write SDMX-ML 2.1 Structure Specific Data" to "Add tutorial/docs: write structure-specific SDMX-ML" on Feb 15, 2022
Author

PalGal2 commented Feb 15, 2022

In case this helps someone: I found here some sample code for building SeriesKey / Observation objects, edited by @brockfanning.

khaeru added the help welcome label on Apr 3, 2023