Using homogenize() after denormalize() results in some rows without row_names #756

Closed
Kirkman opened this issue Apr 22, 2021 · 5 comments


Kirkman commented Apr 22, 2021

I noticed recently that if you use table.homogenize() to fill in missing rows after running table.denormalize(), the new filler rows do not get row_names, while the original rows keep theirs.

The problem is that this leads to errors later when you call methods such as .order_by() on the resulting table. Because some rows have row_names and others don't, you get this error:

File "/python3.8/site-packages/agate/table/order_by.py", line 46, in <listcomp>
    row_names = [self._row_names[i] for i in indices]
IndexError: tuple index out of range
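The last traceback frame is the row-name lookup in order_by(). The following stdlib-only model is a simplification, not agate's actual code; the rows and row names are hypothetical stand-ins for the homogenized table, assuming denormalize() named each row after its date. It shows how a name tuple that is shorter than the row list produces this exact IndexError:

```python
# Simplified model of the failing lookup in order_by(); not agate's code.
# The filler row for 2021-04-03 is appended at the end, but the row names
# (assumed here to be the dates) still cover only the four original rows.
rows = [
    {"date": "2021-04-01"},
    {"date": "2021-04-02"},
    {"date": "2021-04-04"},
    {"date": "2021-04-05"},
    {"date": "2021-04-03"},  # filler row added by homogenize()
]
row_names = ("2021-04-01", "2021-04-02", "2021-04-04", "2021-04-05")

# Sort row positions by the 'date' column, as order_by('date') would.
indices = sorted(range(len(rows)), key=lambda i: rows[i]["date"])

try:
    # Mirrors the traceback line: [self._row_names[i] for i in indices]
    sorted_names = [row_names[i] for i in indices]
    error = None
except IndexError as exc:
    error = exc

print(indices)  # [0, 1, 4, 2, 3]
print(error)    # tuple index out of range
```

The filler row sorts into the middle (position 4 in the row list), and looking up row_names[4] fails because the tuple has only four entries.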

This error can be reproduced with the following test code:

import agate
from decimal import Decimal

data = [
	{ 'date':'2021-04-01', 'name':'England', 'distributed':1000, 'administered':700 },
	{ 'date':'2021-04-02', 'name':'England', 'distributed':1100, 'administered':800 },
	{ 'date':'2021-04-04', 'name':'England', 'distributed':1300, 'administered':1000 },
	{ 'date':'2021-04-05', 'name':'England', 'distributed':1400, 'administered':1100 },
	{ 'date':'2021-04-01', 'name':'Mexico', 'distributed':1000, 'administered':700 },
	{ 'date':'2021-04-02', 'name':'Mexico', 'distributed':1100, 'administered':800 },
	{ 'date':'2021-04-04', 'name':'Mexico', 'distributed':1300, 'administered':1000 },
	{ 'date':'2021-04-05', 'name':'Mexico', 'distributed':1400, 'administered':1100 },
]

table = agate.Table.from_object(
	data,
	column_types=agate.TypeTester(force={
		'date': agate.Text(),
	})
)

table = table.denormalize(
	key='date',
	property_column='name',
	value_column='administered',
)

table = table.homogenize(
	'date', 
	[ '2021-04-01', '2021-04-02', '2021-04-03', '2021-04-04', '2021-04-05' ], 
	[ Decimal(0), Decimal(0) ]
)

table = table.order_by('date')

I don't use row_names myself, so I don't know why .denormalize() needs to generate them automatically. But I assume preserving them is important, so I guess the fix would be to make .homogenize() generate row_names as well?


Kirkman commented Apr 23, 2021

The kludge I'm using to get around this is to add the following immediately after invoking .homogenize():

	table = table._fork(table.rows, row_names='date')

This basically re-creates the table and forces row_names to be generated for every row, including the filler rows.
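In a simplified stdlib-only model (hypothetical data, not agate internals), the workaround amounts to deriving one name per row from the date column, so the name tuple can no longer be shorter than the row list. The helper names_from_column is hypothetical:

```python
# Simplified model of what the workaround achieves; not agate's code.
rows = [
    {"date": "2021-04-01"},
    {"date": "2021-04-02"},
    {"date": "2021-04-04"},
    {"date": "2021-04-05"},
    {"date": "2021-04-03"},  # filler row added by homogenize()
]


def names_from_column(rows, column):
    """Hypothetical helper: derive one row name per row from a column."""
    return tuple(row[column] for row in rows)


row_names = names_from_column(rows, "date")

# The same lookup that failed in the traceback now has a name for every row.
indices = sorted(range(len(rows)), key=lambda i: rows[i]["date"])
sorted_names = [row_names[i] for i in indices]
print(sorted_names)
```

Because the names are recomputed from the rows themselves, their count always matches the row count, whatever homogenize() added.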

@jpmckinney
Member

I think this can be partially solved when the key argument to homogenize is a single column name, but if it is a sequence of names, it is not clear which value in compare_values to use as the row name. So, absent a new row_names parameter for homogenize, I do not think it is possible to preserve the existing names while also correctly naming any added rows. Closing unless anyone needs this feature.
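The trade-off above can be sketched as a hypothetical helper, names_for_added_rows, which is not part of agate. With a single key column, the missing key value is an obvious name for the added row; with several key columns, the sketch raises instead of guessing, since some extra input (such as the row_names parameter mentioned above) would be needed to choose:

```python
def names_for_added_rows(key, missing_values):
    """Hypothetical helper: pick a row name for each row homogenize() would add.

    key: a single column name (str) or a sequence of column names.
    missing_values: one tuple of key values per added row.
    """
    if isinstance(key, str):
        # Single key column: the lone key value is the natural row name.
        return [values[0] for values in missing_values]
    # Several key columns: no single value is clearly "the" row name,
    # so refuse to guess rather than pick one arbitrarily.
    raise ValueError("ambiguous row names for multi-column key: %r" % (key,))


print(names_for_added_rows("date", [("2021-04-03",)]))  # ['2021-04-03']
```

Using the whole tuple of key values as the name would be another option for multi-column keys; which choice is right is exactly the open question raised above.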


Kirkman commented Jul 14, 2021

Could .homogenize() be set up to add row names set to None?

I have never explicitly used row names in my code, so I don't understand why .denormalize() needs to generate them automatically, especially if other methods, like .homogenize(), won't.

To me, it feels wrong that the sequence of commands I outlined above (denormalize, homogenize, and then order_by) results in an error. I feel like it should work just fine.
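The idea of giving added rows a row name of None can be sketched in the same simplified, stdlib-only model (hypothetical data and a hypothetical pad_row_names helper, not agate's code). Padding the existing names with None gives every row a name, so the lookup that failed in the traceback succeeds and the filler row sorts into place with a None name:

```python
# Sketch of the "row names set to None" idea; not agate's code.
rows = [
    {"date": "2021-04-01"},
    {"date": "2021-04-02"},
    {"date": "2021-04-04"},
    {"date": "2021-04-05"},
    {"date": "2021-04-03"},  # filler row added by homogenize()
]
row_names = ("2021-04-01", "2021-04-02", "2021-04-04", "2021-04-05")


def pad_row_names(row_names, row_count):
    """Hypothetical helper: extend the names with None, one per added row."""
    return tuple(row_names) + (None,) * (row_count - len(row_names))


padded = pad_row_names(row_names, len(rows))
indices = sorted(range(len(rows)), key=lambda i: rows[i]["date"])
sorted_names = [padded[i] for i in indices]
print(sorted_names)
# ['2021-04-01', '2021-04-02', None, '2021-04-04', '2021-04-05']
```

In this model the filler row cannot be addressed by name, but code that never uses row names, like the example above, no longer hits the IndexError.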

@jpmckinney
Member

Hmm, that's an idea. I'll reopen the issue.

agate lacks a maintainer, so I'm just stepping in to fix some easy issues and close feature requests that will never be resolved.

@jpmckinney
Member

I found a simple solution similar to #691.
