Better handle IA caching #7

Open
cdrini opened this issue May 14, 2019 · 2 comments

cdrini commented May 14, 2019

We often get an error when requesting too much data; we should handle this more elegantly, because things slow WAY down when we can't cache IA metadata.

```
2019-05-14 00:09:10,175 [ERROR] Error while caching IA
Traceback (most recent call last):
  File "solr_builder_main.py", line 149, in solr_builder.solr_builder_main.LocalPostgresDataProvider.cache_ia_metadata
    for doc in self._get_lite_metadata(b, rows=batch_size)['docs']:
  File "solr_builder_main.py", line 139, in solr_builder.solr_builder_main.LocalPostgresDataProvider._get_lite_metadata
    return simplejson.loads(resp_str)['response']
  File "/usr/local/lib/python2.7/dist-packages/simplejson/__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python2.7/dist-packages/simplejson/decoder.py", line 373, in decode
    raise JSONDecodeError("Extra data", s, end, len(s))
JSONDecodeError: Extra data: line 3 column 1 - line 3 column 19998 (char 61 - 20058)
```
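
For illustration, a minimal sketch of one way the failing call could degrade gracefully: catch the decode error and retry the span with smaller requests. `_get_lite_metadata`, `batch_size`, and the `['docs']` shape come from the traceback above; `ia_cache` and the halving strategy are assumptions, not the actual fix.

```python
import simplejson

def cache_ia_metadata(self, ocaids, batch_size=1000):
    """Sketch: cache IA metadata, shrinking the request size whenever
    the response isn't valid JSON (the "Extra data" error above)."""
    i = 0
    while i < len(ocaids):
        batch = ocaids[i:i + batch_size]
        try:
            for doc in self._get_lite_metadata(batch, rows=batch_size)['docs']:
                self.ia_cache[doc['identifier']] = doc  # ia_cache is hypothetical
            i += len(batch)
        except simplejson.JSONDecodeError:
            if batch_size == 1:
                i += 1  # a single unparseable item; skip it rather than loop forever
            else:
                batch_size //= 2  # retry this span with smaller requests
```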
cdrini added this to To Do in Solr Builder on May 14, 2019
tfmorris commented

I have a fix for this on my branch. It:

  • caches all ocaids regardless of length
  • queries IA in chunks to keep the URL size manageable (see the sketch after this list)
  • doesn't retry anything that's not in the cache (because it'll just fail again, like it did the first time)
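
A minimal sketch of the chunking idea, assuming the IA search endpoint takes an OR-query of identifiers; the helper name and the length budget are illustrative, not taken from the branch:

```python
def chunk_ocaids_by_url_length(ocaids, max_query_len=4000):
    """Yield batches of ocaids whose joined OR-query stays under a
    length budget, so the request URL never grows too large.
    max_query_len is an illustrative value, not a documented IA limit."""
    batch, length = [], 0
    for ocaid in ocaids:
        cost = len(ocaid) + len(' OR ')
        if batch and length + cost > max_query_len:
            yield batch
            batch, length = [], 0
        batch.append(ocaid)
        length += cost
    if batch:
        yield batch
```

Each yielded batch would then go through `_get_lite_metadata` as a single request.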

tfmorris commented

Having said that, I don't think we should be querying an API during a bulk load operation at all. I don't think we actually need this data, but if we do, we should get it from a bulk dump from IA and read it from that file as part of the indexing process.

Requiring a low-latency network with 100% availability is too fragile.
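
To make the alternative concrete, a sketch of reading such a dump at index time, assuming a gzipped newline-delimited JSON file with one metadata record per line (the format, field names, and path are assumptions; IA's actual dump format may differ):

```python
import gzip
import simplejson

def load_ia_metadata_dump(path):
    """Load a (hypothetical) gzipped newline-delimited JSON dump of IA
    metadata into a dict keyed by identifier, so indexing never has to
    call the live API."""
    metadata = {}
    with gzip.open(path, 'rt') as f:
        for line in f:
            doc = simplejson.loads(line)
            metadata[doc['identifier']] = doc
    return metadata
```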
