Change internal data storage format #47

Open
khaeru opened this issue Jan 26, 2021 · 0 comments
Labels
needs-info Needs more info from the issuer to proceed


khaeru commented Jan 26, 2021

(Copied from the former doc/roadmap.rst.)

sdmx stores each Observation as an individual object instance. An alternative is to use pandas.DataFrame, xarray.Dataset, or another data structure internally.

See:

  • sdmx/experimental.py for a partial mock-up of such code, and
  • tests/test_experimental.py for tests.

The choice between the current and the experimental DataSet as the default should rest on a detailed performance evaluation (memory and time) across a variety of use cases.
The key question is: is there a performance limitation that this change could solve?
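As a starting point for such an evaluation, here is a minimal sketch that measures build time and peak memory for per-object storage versus columnar storage, using only the standard library. The names `Obs`, `build_objects`, `build_columns`, and `measure` are hypothetical and not part of sdmx; the two-field `Obs` is far simpler than the real Observation, so the numbers only illustrate the method.

```python
import time
import tracemalloc
from dataclasses import dataclass


@dataclass
class Obs:
    """Hypothetical stand-in for an Observation; not the sdmx class."""

    key: str
    value: float


def build_objects(n):
    """Store one object instance per observation."""
    return [Obs(key=f"K{i}", value=float(i)) for i in range(n)]


def build_columns(n):
    """Store the same data as two parallel columns."""
    return {
        "key": [f"K{i}" for i in range(n)],
        "value": [float(i) for i in range(n)],
    }


def measure(func, n):
    """Return (elapsed seconds, peak traced bytes) for one call of func(n).

    tracemalloc slows execution, so the timings are only comparable
    between runs measured the same way.
    """
    tracemalloc.start()
    start = time.perf_counter()
    result = func(n)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return elapsed, peak


if __name__ == "__main__":
    for n in (1_000, 10_000):
        for func in (build_objects, build_columns):
            elapsed, peak = measure(func, n)
            print(f"{func.__name__:>14} n={n:>6}: {elapsed:.4f} s, peak {peak / 1e6:.2f} MB")
```

A real evaluation would swap in the actual readers and both DataSet implementations, and would cover several message sizes and access patterns.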

To that end, note that the experimental DataSet involves three conversions:

  1. A reader parses the XML or JSON source, creates Observation instances, and adds them using DataSet.add_obs().
  2. experimental.DataSet.add_obs() populates a pd.DataFrame from these Observations, then discards them.
  3. experimental.DataSet.obs() creates new Observation instances again.
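The three conversions above can be sketched roughly as follows. This is not the code in sdmx/experimental.py: `Obs` and `FrameDataSet` are hypothetical, simplified stand-ins (the real Observation has more fields), intended only to show where objects are created, flattened, and re-created.

```python
from dataclasses import dataclass

import pandas as pd

COLUMNS = ["dimension", "value"]


@dataclass
class Obs:
    """Hypothetical stand-in for Observation."""

    dimension: str
    value: float


class FrameDataSet:
    """Hypothetical data set that stores observations in a DataFrame."""

    def __init__(self):
        self._frame = pd.DataFrame(columns=COLUMNS)

    def add_obs(self, observations):
        # Step 2: copy fields into rows; the instances are not retained.
        rows = pd.DataFrame(
            [(o.dimension, o.value) for o in observations], columns=COLUMNS
        )
        if rows.empty:
            return
        if self._frame.empty:
            self._frame = rows
        else:
            self._frame = pd.concat([self._frame, rows], ignore_index=True)

    def obs(self):
        # Step 3: build brand-new instances from the stored rows.
        return [
            Obs(dimension=row.dimension, value=float(row.value))
            for row in self._frame.itertuples(index=False)
        ]


# Step 1: a reader would create these while parsing XML or JSON.
parsed = [Obs("A", 1.0), Obs("B", 2.5)]

ds = FrameDataSet()
ds.add_obs(parsed)
restored = ds.obs()
```

Here `restored` compares equal to `parsed` but holds different object instances, so every observation is constructed twice over the round trip.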

For a fair comparison, the API between the readers and DataSet could be changed to eliminate the round trip in (1)/(2), but without sacrificing the data model consistency provided by pydantic on Observation instances.
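One possible shape for such an API, as a sketch only: the reader validates each record and yields plain tuples, and the data set ingests them in bulk, so no per-observation object is created on the way in. All names here (`validate_row`, `read_rows`, `DirectDataSet`, `add_rows`) are hypothetical, and the hand-written `validate_row` merely stands in for the validation that pydantic provides on Observation.

```python
import pandas as pd

COLUMNS = ["dimension", "value"]


def validate_row(dimension, value):
    """Hypothetical check standing in for pydantic validation."""
    if not isinstance(dimension, str) or not dimension:
        raise ValueError(f"invalid dimension: {dimension!r}")
    # float() raises ValueError or TypeError for unusable values.
    return dimension, float(value)


def read_rows(records):
    """Hypothetical reader: yield validated tuples, not Observation objects."""
    for record in records:
        yield validate_row(record["dimension"], record["value"])


class DirectDataSet:
    """Hypothetical data set that ingests validated rows in bulk."""

    def __init__(self):
        self.frame = pd.DataFrame(columns=COLUMNS)

    def add_rows(self, rows):
        new = pd.DataFrame(list(rows), columns=COLUMNS)
        if new.empty:
            return
        if self.frame.empty:
            self.frame = new
        else:
            self.frame = pd.concat([self.frame, new], ignore_index=True)


records = [{"dimension": "A", "value": "1.5"}, {"dimension": "B", "value": 2}]
direct = DirectDataSet()
direct.add_rows(read_rows(records))
```

Whether row-level checks like this can match the consistency guarantees of the current model is exactly what a comparison would need to establish.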

khaeru added the needs-info label on Jan 26, 2021