Proposed fix for infinite loop at lrucacheextension.pyx #890

Draft
wants to merge 1 commit into
base: master

zequihg50
Contributor

Hi,

Under conditions I have not been able to pin down, I sometimes run into an infinite loop at lrucacheextension.pyx:372. When the code enters the while loop, it tries to remove a node from the cache by calling removeslot_, but if the node is None no cache space is freed (see this), so the loop condition never changes.
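To make the failure mode concrete, here is a minimal toy model of such an eviction loop. It is not the PyTables code: the function name `evict_until_fits`, the `max_iters` guard and the plain-list layout are hypothetical, chosen only to show how a slot that holds None but still reports a non-zero size can keep the loop from ever making progress.

```python
def evict_until_fits(slots, sizes, cachesize, maxcachesize, size, max_iters=1000):
    """Toy model: evict the largest slot until `size` more units fit.

    `slots` holds cached objects (or None), `sizes` their recorded sizes.
    The `max_iters` guard is an illustrative addition that turns a
    would-be infinite loop into an error.
    """
    iters = 0
    while size + cachesize > maxcachesize:
        iters += 1
        if iters > max_iters:
            raise RuntimeError("no progress: eviction freed no space")
        # Pick the slot with the largest recorded size.
        idx = max(range(len(sizes)), key=sizes.__getitem__)
        if slots[idx] is not None:
            # Normal case: evicting the node frees its space.
            cachesize -= sizes[idx]
            slots[idx] = None
            sizes[idx] = 0
        # If slots[idx] is None, nothing is freed and sizes[idx] is
        # unchanged, so the same slot is picked on every iteration.
    return cachesize


# Consistent state: the largest slot (6 units) is evicted and 5 new units fit.
print(evict_until_fits(["a", "b"], [4, 6], 10, 10, 5))
```

With an inconsistent state such as `slots=["a", None]` and `sizes=[4, 6]`, the same call raises `RuntimeError` instead of returning, which corresponds to the hang described above.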

I have added the following line before this one to avoid selecting None nodes. Although this worked for the concrete example I was testing, I'm not sure it is an adequate modification.

while size + self.cachesize > self.maxcachesize:
    not_nones = [x is not None for x in self.__list]
    largidx = self.sizes[not_nones].argsort()[-10:]
    ...

Would someone be able to review this modification?

@avalentino
Member

@zequihg50 I'm not sure how you can index the array of sizes with not_nones.
I would like to make a deeper analysis.
Are you able to provide a self-contained test code to reproduce the issue?

Another question: have you had the chance to run the --heavy test suite with your fix?
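As background for the indexing question, the sketch below shows how NumPy treats a boolean mask built this way, assuming the sizes are held in a NumPy array (an assumption; the toy values and variable names here are illustrative). The mask selects elements, but `argsort` then returns positions within the filtered array rather than indices into the original one, so they have to be mapped back before being used as slot indices. Whether this is why the proposed fix did not work is not stated in the thread.

```python
import numpy as np

sizes = np.array([5, 9, 3, 7])
slots = ["a", None, "c", "d"]  # slot 1 is empty but still records the largest size

# Boolean mask: True where the slot holds a node.
mask = np.array([x is not None for x in slots])

filtered = sizes[mask]                      # sizes of the non-None slots only
pos_in_filtered = filtered.argsort()[-1]    # position of the largest size *within filtered*

# Using pos_in_filtered directly as a slot index points at the wrong slot.
naive_slot = slots[pos_in_filtered]

# Map the filtered position back to an index into the original arrays.
orig_idx = np.flatnonzero(mask)[pos_in_filtered]

print(pos_in_filtered, naive_slot, orig_idx, slots[orig_idx])
```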

@zequihg50
Contributor Author

I wasn't able to run the tests with the --heavy flag; I only ran python setup.py test.

The following PyTables file is an example of the infinite loop:

infloop.zip

This is the code I use to read data:

import tables

f = tables.open_file('infloop.hdf', 'r')
t = f.get_node('/files')

# Collect the distinct values of the indexed column, reading the sorted index in chunks.
step = 100000
s = set()
for i in range(len(t) // step + 1):
    s.update(t.colindexes['_eva_ensemble_aggregation'].read_sorted(i * step, i * step + step))

# Query the table once per distinct value; the script freezes somewhere in this loop.
for i, agg in enumerate(sorted(s)):
    if len(agg) == 0:
        continue

    dataset = t.where('''_eva_ensemble_aggregation == {}'''.format(agg))
    print('i={}, agg={}'.format(i, agg))

f.close()

I see the following output:

i=1, agg=b'CMIP6.AerChemMIP.AS-RCEC.TaiESM1.histSST.AERmon.gn.v20201223@esgf-data1.llnl.gov'
i=2, agg=b'CMIP6.AerChemMIP.AS-RCEC.TaiESM1.histSST.AERmon.gn.v20201223@esgf.rcec.sinica.edu.tw'
i=3, agg=b'CMIP6.AerChemMIP.AS-RCEC.TaiESM1.histSST.Amon.gm.v20200309@esgf.rcec.sinica.edu.tw'
...
i=101, agg=b'CMIP6.AerChemMIP.BCC.BCC-ESM1.hist-piNTCF.SIday.gn.v20190620@cmip.bcc.cma.cn'

It should print additional elements after i=101, but it freezes in the infinite loop instead.

I'm running the current master version.

Please disregard the fix I proposed above; it does not work.

avalentino marked this pull request as draft on December 30, 2021, 09:07