
Allow label based indexing in Rows (incl. test updates) #268

Open
wants to merge 6 commits into
base: master


@ms8r ms8r commented Jan 1, 2017

Enables access to the items in a Dataset Row by index or by column header. For example, data[0]['first_name'] == data[0][0] if 'first_name' is the label of the first column as specified in the Dataset's headers. (Ref. issues #22, #158, #265.)

Implemented by adding a Row attribute _dset that stores a reference to the Dataset that "owns" the Row, which gives each Row access to the parent Dataset's headers. Constructors, insert methods, and item getters/setters have been updated accordingly. In addition, Dataset has a new attribute _lblidx that indicates whether label-based indexing is possible (i.e. a header with unique labels exists). _lblidx is maintained via the updated headers property.
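A minimal standalone sketch of the mechanism described above, for readers who want to see the idea without reading the diff. The Row and Dataset classes here are simplified stand-ins, not tablib's actual classes; only the attribute names _dset and _lblidx come from the description, and the helper _resolve and the exact error handling are assumptions made for illustration.

```python
class Row:
    """Simplified stand-in for a dataset row (not tablib's actual Row)."""

    def __init__(self, values, dset=None):
        self._row = list(values)
        self._dset = dset  # back-reference to the owning dataset

    def _resolve(self, key):
        # Hypothetical helper: map a column label to a position.
        # Integers and slices pass through unchanged.
        if isinstance(key, str):
            if self._dset is None or not self._dset._lblidx:
                raise KeyError(f"label based indexing unavailable: {key!r}")
            try:
                return self._dset.headers.index(key)
            except ValueError:
                raise KeyError(key) from None
        return key

    def __getitem__(self, key):
        return self._row[self._resolve(key)]

    def __setitem__(self, key, value):
        self._row[self._resolve(key)] = value


class Dataset:
    """Simplified stand-in for the owning dataset."""

    def __init__(self, *rows, headers=None):
        self.headers = headers  # goes through the property setter below
        self._data = [Row(r, dset=self) for r in rows]

    @property
    def headers(self):
        return self._headers

    @headers.setter
    def headers(self, value):
        self._headers = list(value) if value is not None else None
        # Label indexing is only possible if headers exist and are unique.
        self._lblidx = bool(self._headers) and (
            len(set(self._headers)) == len(self._headers)
        )

    def __getitem__(self, index):
        return self._data[index]


data = Dataset(
    ("John", "Adams"),
    ("George", "Washington"),
    headers=["first_name", "last_name"],
)
assert data[0]["first_name"] == data[0][0]
```

Because each Row only holds a reference to its Dataset, changing the headers later (for example to a list with duplicate labels) takes effect for all rows at once, with no per-row bookkeeping.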

To allow label-based access within a Row, the Dataset's __getitem__ now returns a Row rather than a tuple, with the Row behaving like a list externally. This could cause backwards compatibility issues if client code relies on Dataset items being returned as plain tuples. To minimize this impact, the PR adds __add__, __eq__, and __ne__ methods for Rows. Tests have been updated to use the Row.tuple property for comparisons with tuple literals (the PR would fail existing tests otherwise). Independent of label-based indexing, I'd suggest that returning Dataset items as Rows instead of plain tuples is preferable in any case, since it leaves room for adding functionality in the future.
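As a rough illustration of the compatibility methods mentioned above, here is one way __eq__, __ne__, __add__, and a tuple property could look on a list-backed Row. This is a sketch under assumptions, not the code in this PR: in particular, letting a Row compare equal to a plain tuple or list is a choice made here for illustration, and the PR's actual comparison semantics are not spelled out in the description.

```python
class Row:
    """Simplified stand-in for a dataset row (not tablib's actual Row)."""

    def __init__(self, values):
        self._row = list(values)

    @property
    def tuple(self):
        # Plain-tuple view, useful when comparing against tuple literals.
        return tuple(self._row)

    def __len__(self):
        return len(self._row)

    def __iter__(self):
        return iter(self._row)

    def __eq__(self, other):
        # Assumption: rows compare by content with rows, lists, and tuples.
        if isinstance(other, Row):
            return self._row == other._row
        if isinstance(other, (list, tuple)):
            return self._row == list(other)
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __add__(self, other):
        # Concatenation returns a new Row and leaves both operands untouched.
        if isinstance(other, Row):
            other = other._row
        if isinstance(other, (list, tuple)):
            return Row(self._row + list(other))
        return NotImplemented


row = Row(["John", "Adams"])
assert row.tuple == ("John", "Adams")
assert row == ("John", "Adams")
assert (row + (90,)).tuple == ("John", "Adams", 90)
```

Client code that needs a real tuple (for hashing, or for use as a dict key) would still have to call row.tuple explicitly, since a class that defines __eq__ without __hash__ is unhashable in Python 3.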

Other changes/additions:

  • Add a copy method for Datasets that updates the _dset references in the new object's Rows and uses copy.deepcopy instead of copy.copy. This should also fix a bug in the current version where copies (in filter and stack) are shallow and the new object's _data attribute points to the same list as the original object (filter and stack updated accordingly).
  • Add assertions to existing tests for methods that return new Dataset objects to verify that each Row's _dset points to the new object and that the new object is not a shallow copy (filter, stack, stack_col, subset, sorted, and transpose).
  • Add tests for new functionality (plus one for existing filter)
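The shallow-copy problem from the first bullet can be reproduced with a small standalone example. The classes below are simplified stand-ins rather than tablib's own, and the copy method shown is just one possible way to deep-copy the row values while rebinding _dset; it is not necessarily how the PR implements it.

```python
import copy


class Row:
    """Simplified stand-in for a dataset row (not tablib's actual Row)."""

    def __init__(self, values, dset=None):
        self._row = list(values)
        self._dset = dset  # back-reference to the owning dataset


class Dataset:
    """Simplified stand-in for the owning dataset."""

    def __init__(self, *rows):
        self._data = [Row(r, dset=self) for r in rows]

    def copy(self):
        # Build a fresh dataset, deep-copy each row's values, and point
        # every new row at the new dataset instead of the original.
        new = Dataset()
        new._data = [
            Row(copy.deepcopy(r._row), dset=new) for r in self._data
        ]
        return new


original = Dataset(("John", "Adams"), ("George", "Washington"))

# copy.copy creates a new outer object but shares the _data list,
# so the rows (and their _dset back-references) are the original's.
shallow = copy.copy(original)
assert shallow._data is original._data
assert shallow._data[0]._dset is original

# The copy method yields independent rows that reference the new dataset.
deep = original.copy()
assert deep._data is not original._data
assert all(r._dset is deep for r in deep._data)

deep._data[0]._row[0] = "Jane"
assert original._data[0]._row[0] == "John"
```

The two assertions on deep mirror the checks the second bullet describes: the rows' _dset must point to the new object, and the new object must not share row storage with the original.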

@timofurrer
Member

Can you please resolve the conflicts? Thanks 🎉

@ms8r
Author

ms8r commented Mar 17, 2019

Done ;-) This also surfaced a leftover bug in the has_tag method (incorrect unicode handling under Python 2.7)... time to move to Python 3 only...

@hugovk hugovk mentioned this pull request Oct 4, 2019

2 participants