Using homogenize() after denormalize() results in some rows without row_names #756

Kirkman · 2021-04-22T23:35:56Z

I noticed recently that if you call table.homogenize() to fill in missing rows after running table.denormalize(), the new filler rows have no row_names, while the original rows do.

This causes errors later when you call methods such as .order_by() on the resulting table.
Because some rows have row_names and others don't, you get this error:

File "/python3.8/site-packages/agate/table/order_by.py", line 46, in <listcomp>
    row_names = [self._row_names[i] for i in indices]
IndexError: tuple index out of range
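
The traceback suggests that the row names are stored as a tuple that is shorter than the list of rows. As a minimal, agate-free sketch of that failure mode (the variable names and tuple rows here are illustrative assumptions, not agate's internals):

```python
# Simplified stand-in for the table state after denormalize() + homogenize().
# Columns: date, England, Mexico. This is an illustration, not agate's code.
rows = [
    ("2021-04-01", 700, 700),
    ("2021-04-02", 800, 800),
    ("2021-04-04", 1000, 1000),
    ("2021-04-05", 1100, 1100),
    ("2021-04-03", 0, 0),  # filler row, assumed to be appended at the end
]

# Assumption: only the four original rows carry a name.
row_names = ("2021-04-01", "2021-04-02", "2021-04-04", "2021-04-05")

# Sort row positions by the date column, as an ordering step would.
indices = sorted(range(len(rows)), key=lambda i: rows[i][0])

try:
    # Same shape as the list comprehension in the traceback.
    reordered_names = [row_names[i] for i in indices]
    error = None
except IndexError as exc:
    error = str(exc)

print(error)
```

The filler row sits at position 4, but the names tuple only has positions 0 through 3, so the lookup fails as soon as the sort moves that row into the result.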

This error can be reproduced with the following test code:

import agate
from decimal import Decimal

data = [
	{ 'date':'2021-04-01', 'name':'England', 'distributed':1000, 'administered':700 },
	{ 'date':'2021-04-02', 'name':'England', 'distributed':1100, 'administered':800 },
	{ 'date':'2021-04-04', 'name':'England', 'distributed':1300, 'administered':1000 },
	{ 'date':'2021-04-05', 'name':'England', 'distributed':1400, 'administered':1100 },
	{ 'date':'2021-04-01', 'name':'Mexico', 'distributed':1000, 'administered':700 },
	{ 'date':'2021-04-02', 'name':'Mexico', 'distributed':1100, 'administered':800 },
	{ 'date':'2021-04-04', 'name':'Mexico', 'distributed':1300, 'administered':1000 },
	{ 'date':'2021-04-05', 'name':'Mexico', 'distributed':1400, 'administered':1100 },
]

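# Force 'date' to be read as text instead of being parsed as a date.
table = agate.Table.from_object(
	data,
	column_types=agate.TypeTester(force={
		'date': agate.Text(),
	})
)

# Pivot to one row per date, with one column per country.
table = table.denormalize(
	key='date',
	property_column='name',
	value_column='administered',
)

# Add a zero-filled row for the missing date, 2021-04-03.
table = table.homogenize(
	'date',
	[ '2021-04-01', '2021-04-02', '2021-04-03', '2021-04-04', '2021-04-05' ],
	[ Decimal(0), Decimal(0) ]
)

# Raises the IndexError shown above.
table = table.order_by('date')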

I don't use row_names myself, so I don't know why it's necessary to have .denormalize() automatically generate them. But I assume preserving that is important, so I guess the fix would be to make .homogenize() also generate row_names?

Kirkman · 2021-04-23T13:24:03Z

The kludge I'm using to get around this is to add the following immediately after invoking .homogenize():

	table = table._fork(table.rows, row_names='date')

This re-creates the table and assigns a row name, taken from the date column, to every row, including the filler rows.
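
As a rough illustration of what that rebuild amounts to, here is an agate-free sketch. It assumes that passing a column name as row_names makes the table use each row's value in that column as its name; the dict rows and the names_from_column helper are hypothetical stand-ins, not part of agate.

```python
# Hypothetical stand-in rows: dicts in place of agate rows.
rows = [
    {"date": "2021-04-01", "England": 700, "Mexico": 700},
    {"date": "2021-04-02", "England": 800, "Mexico": 800},
    {"date": "2021-04-04", "England": 1000, "Mexico": 1000},
    {"date": "2021-04-05", "England": 1100, "Mexico": 1100},
    {"date": "2021-04-03", "England": 0, "Mexico": 0},  # filler row
]


def names_from_column(rows, column):
    """Build one row name per row from the values in `column`."""
    return tuple(row[column] for row in rows)


row_names = names_from_column(rows, "date")

# Every row now has a name, so positional lookups cannot run off the end.
indices = sorted(range(len(rows)), key=lambda i: rows[i]["date"])
reordered_names = [row_names[i] for i in indices]
print(reordered_names)
```

Because the names are derived from the rows themselves, the names tuple always has the same length as the row list, whatever homogenize() added.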

jpmckinney · 2021-07-14T16:52:29Z

I think this can be partially solved in the case where the key argument to homogenize is a single column name, but if it is a sequence of names, then it is not clear which value in compare_values to use as the row name. So, absent adding a row_names parameter to homogenize, I do not think it is possible to preserve the names while also correctly setting names for any added rows. So, closing unless anyone needs this feature.

Kirkman · 2021-07-14T18:31:23Z

Could .homogenize() be set up to add row names set to None?
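
A minimal sketch of that idea, independent of agate: pad the existing names with None so there is one entry per row. The pad_row_names helper is hypothetical, and this is not necessarily what the eventual fix does.

```python
def pad_row_names(row_names, row_count):
    """Extend row_names with None so there is one entry per row."""
    missing = row_count - len(row_names)
    if missing < 0:
        raise ValueError("more row names than rows")
    return tuple(row_names) + (None,) * missing


# Four named original rows plus one filler row added afterwards.
original_names = ("2021-04-01", "2021-04-02", "2021-04-04", "2021-04-05")
padded = pad_row_names(original_names, 5)

# A reordering that moves the filler row (position 4) into third place.
indices = [0, 1, 4, 2, 3]
reordered = [padded[i] for i in indices]
print(reordered)
```

With the padding in place, reordering succeeds and the filler row simply carries None as its name.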

I have never used row names for anything explicitly in my code. So I don't understand why it's necessary for .denormalize() to automatically generate them, especially if other methods like homogenize will not generate them.

To me, it feels wrong that the sequence of commands I outlined above (denormalize, homogenize, and then order_by) results in an error. I feel like it should work just fine.

jpmckinney · 2021-07-14T21:14:17Z

Hmm, that's an idea. I'll reopen the issue.

agate lacks a maintainer, so I'm just stepping in to fix some easy issues and close feature requests that will never be resolved.

jpmckinney · 2021-07-14T22:03:55Z

I found a simple solution similar to #691.

jpmckinney closed this as completed Jul 14, 2021

jpmckinney reopened this Jul 14, 2021

jpmckinney added bug priority-low labels Jul 14, 2021

jpmckinney closed this as completed in 60cdde8 Jul 14, 2021
