Issue when "where" returns nothing #703

Closed
Mr-F opened this issue Dec 5, 2017 · 3 comments

Mr-F commented Dec 5, 2017

Hi,

I've been giving this a good whirl with some large data and so far things are going well. However, I've come across something that doesn't seem quite right to me, and I wanted to check whether it is another potential issue or maybe an improvement that could be made.

A problem occurs if you select rows with a condition that returns an empty table and then try to perform further steps. For example:

import agate
data = [
    {'name':'keith', 'department': None, 'job':'programmer'},
    {'name':'nick', 'department': None, 'job':'qa'}
]

t = agate.Table.from_object(data)

t.where(lambda row: row['department'] is not None) \
    .group_by('department') \
    .aggregate([('count', agate.Count())]) \
    .print_table()

If you run this code, you get IndexError: tuple index out of range. The problem seems to relate (though I may have misunderstood this) to grouping by a column that no longer exists, because the preceding where removed all rows and thus all column information. A short-term fix is to break apart the chained calls, check the len of the result of the where operation, and then decide whether to proceed.
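A minimal sketch of that short-term fix, for illustration only. The helper name `group_and_aggregate` is made up here and is not part of agate; it only assumes the table offers `where()`, `group_by()`, `aggregate()` and a sized `rows` attribute, as used in the example above.

```python
def group_and_aggregate(table, test, key, aggregations):
    """Filter `table`, then group and aggregate only if rows remain.

    `table` is assumed to behave like an agate.Table: it needs .where(),
    .group_by(), .aggregate() and a sized .rows attribute.
    Returns None when the filter matches no rows, so the caller can
    decide what to do instead of hitting the IndexError.
    """
    filtered = table.where(test)

    # Guard: skip the group_by/aggregate chain on an empty table.
    if len(filtered.rows) == 0:
        return None

    return filtered.group_by(key).aggregate(aggregations)


# Intended use with the table `t` from the example above (needs agate):
#
#   result = group_and_aggregate(
#       t,
#       lambda row: row['department'] is not None,
#       'department',
#       [('count', agate.Count())],
#   )
#   if result is not None:
#       result.print_table()
```

Returning None rather than an empty table is a choice made for this sketch; it keeps the "decide whether to proceed" step explicit in the calling code.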

What I would have expected is for it to do nothing and return something similar to this:

| department | count |
| ---------- | ----- |

This would be similar to what you get if you call print_table directly after the initial where operation:

t.where(lambda row: row['department'] is not None) \
    .print_table()

Which yields

| department | job | name |
| ---------- | --- | ---- |

This would also be more in keeping with the SQL analogy: if you perform a GROUP BY on a query whose WHERE clause has removed all rows, the SQL command still returns an empty result set. It would also be more Pythonic, in the same way that looping over an empty list or set simply does nothing.
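The SQL behaviour can be checked with Python's built-in sqlite3 module. This is only a sketch of the analogy: the table name `staff` is invented for the example, and the rows mirror the data above.

```python
import sqlite3

# In-memory database holding the same two rows as the agate example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, department TEXT, job TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("keith", None, "programmer"), ("nick", None, "qa")],
)

# The WHERE clause removes every row, yet GROUP BY does not raise an error.
rows = conn.execute(
    "SELECT department, COUNT(*) FROM staff "
    "WHERE department IS NOT NULL "
    "GROUP BY department"
).fetchall()

print(rows)  # []

# Looping over the empty result simply does nothing.
for department, count in rows:
    print(department, count)

conn.close()
```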

Kirkman commented Apr 8, 2019

I am also having this problem. It happens when I try to aggregate after grouping:

    homicide_monthly = (the_table
        .where(lambda row: row['crime_category'] == 'homicide')
        .group_by('date_occurred')
        .aggregate([
            ('homicide', agate.Sum('count'))
        ])
    )

If I run this on a smaller-than-usual dataset, there may not be any homicides. In such a case, I get the IndexError: tuple index out of range error, similar to what @Mr-F described above.

jpmckinney commented Jul 14, 2021

Possibly related: #714

jpmckinney commented

Thank you @Mr-F for the reproducible code and clear description!
