Initialize a Dataset from plain python dicts #491

Open
JWCook opened this issue May 18, 2021 · 3 comments

JWCook commented May 18, 2021

I'd like to be able to create a Dataset from plain python dicts. I read through just about everything in the docs and some of the code, and can't seem to find a way to do this directly.

What I'd like to be able to do is something like:

source_data = [
     {'column_1': 'value_1A',  'column_2':  'value_2A'},
     {'column_1': 'value_1B',  'column_2':  'value_2B'},
]
data = Dataset().load(*source_data)

Or maybe:

data = import_set(source_data, format='dict')

Current workarounds:

  • json.dumps the data first before loading
  • Register a custom format class
  • Separately add headers and row values:
    data = Dataset()
    data.headers = list(source_data[0].keys())
    data.extend([list(item.values()) for item in source_data])

So I guess my questions are:

  • Is there currently a better way to do this?
  • If not, would you be interested in a PR for this?

brandonrobertz commented Oct 7, 2021

data = Dataset()
data.headers = list(source_data[0].keys())
data.extend([list(item.values()) for item in source_data])

I'll note that this only works if every dict has all of the keys, which a lot of data doesn't. It's a waste to serialize every blank value as None, so many applications produce dicts that do not contain every possible key.

I'd like to see movement on this as well and am willing to start developing on it. My current solution is to make multiple passes over the data: once to collect all the keys and gather the rows, then again to build rows from the full set of headers and append them to the Dataset.
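The multi-pass approach described above could look roughly like the following. This is only a minimal sketch, not part of tablib: `normalize_records` and its `fill` parameter are hypothetical names chosen for illustration.

```python
def normalize_records(records, fill=None):
    """Return (headers, rows) for a list of dicts whose keys may differ.

    Hypothetical helper, not a tablib function.
    """
    # Pass 1: collect every key, preserving first-seen order.
    headers = []
    seen = set()
    for record in records:
        for key in record:
            if key not in seen:
                seen.add(key)
                headers.append(key)

    # Pass 2: build one full-width row per record, filling missing keys.
    rows = [[record.get(key, fill) for key in headers] for record in records]
    return headers, rows


records = [
    {'column_1': 'value_1A', 'column_2': 'value_2A'},
    {'column_1': 'value_1B', 'column_3': 'value_3B'},
]
headers, rows = normalize_records(records)
```

Since every row now has the same width, the result should be usable with something like `Dataset(*rows, headers=headers)`.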

It would be really nice if append could operate on a dict, automatically adding a new column whenever an unseen key is encountered and setting that column to blank for all previous rows.
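To make the requested behaviour concrete, here is a small pure-Python sketch. `GrowingTable` is a hypothetical stand-in, not a tablib class and not a proposed implementation; it only illustrates how a dict-aware append could add columns and backfill earlier rows.

```python
class GrowingTable:
    """Hypothetical stand-in for a dataset whose append accepts dicts."""

    def __init__(self, fill=None):
        self.headers = []
        self.rows = []
        self.fill = fill  # value used for cells that were never set

    def append(self, record):
        # Add a column for each unseen key and backfill all earlier rows.
        for key in record:
            if key not in self.headers:
                self.headers.append(key)
                for row in self.rows:
                    row.append(self.fill)
        # Build the new row in header order, filling keys the record lacks.
        self.rows.append([record.get(key, self.fill) for key in self.headers])


table = GrowingTable(fill='')
table.append({'column_1': 'value_1A'})
table.append({'column_1': 'value_1B', 'column_2': 'value_2B'})
```

After the second append, the first row has been widened with a blank cell for `column_2`, so all rows stay aligned with the headers.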

@brandonrobertz

An update: I've been working through this, and in the code there are references to the necessity of rows aligning, etc., but I'm not sure whether this is strictly required only by the validation checks or whether something deeper in the row-appending functionality depends on it. Some guidance from a dev would be helpful here!


daghemo commented Jan 10, 2022

Same problem/question here.
My dicts have all of the keys, just like in @JWCook's example. That's why I've been able to use:

source_data = [
     {'column_1': 'value_1A',  'column_2':  'value_2A'},
     {'column_1': 'value_1B',  'column_2':  'value_2B'},
]
data = Dataset()
data.dict = source_data

I may also override the column names with:

data.headers = ["Column 1","Column 2"]

This works only if every dict in the list has all of the keys. If not, a tablib.exceptions.InvalidDimensions exception is raised.
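One way to find the offending items before assigning to `data.dict` is a quick pre-check like the one below. This is a sketch under assumptions: `rows_with_mismatched_keys` is a hypothetical helper, not part of tablib, and it treats the first dict's keys as the expected set.

```python
def rows_with_mismatched_keys(records):
    """Return indices of dicts whose key set differs from the first dict's.

    Hypothetical helper, not a tablib function.
    """
    if not records:
        return []
    expected = set(records[0])
    return [i for i, record in enumerate(records) if set(record) != expected]


source_data = [
    {'column_1': 'value_1A', 'column_2': 'value_2A'},
    {'column_1': 'value_1B'},  # missing 'column_2'
]
bad_rows = rows_with_mismatched_keys(source_data)
```

An empty result means every dict shares the same keys, which is the case where the `data.dict` assignment above is reported to work.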
