projects.document_list causing Read TimeOut Errors. #156

duckduckgrayduck · 2023-08-18T20:41:49Z

Summary of the problem

Large projects (more than 10k documents) cause the wrapper to time out while loading the document list.

Steps to reproduce the bug

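from documentcloud import DocumentCloud

# `password` holds the account's DocumentCloud password; timeout is in seconds
client = DocumentCloud('invisibleinstitute', password, timeout=400)
project = client.projects.get_by_id(49649)

# Fails with a read timeout for this project (~30k documents)
docs = project.document_list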

What did you expect to happen?

It should return the project's document list so the documents can be accessed. Instead, the request fails with a read timeout. Matt Chapman originally hit this on his Invisible Institute document set of ~30k documents (1 million pages).

docs = client.documents.search("project:49649") is a workaround for now.
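The workaround presumably succeeds because search results are returned a page at a time rather than in one large response (an inference; the issue does not state the cause). A generic sketch of that lazy-paging pattern follows, assuming a hypothetical `fetch_page` callable that takes a cursor and returns one page of items plus the next cursor. It is an illustration only, not the python-documentcloud implementation.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple

# Hypothetical signature: given a cursor (None for the first page), return
# the items on that page and the cursor for the next page (None when done).
FetchPage = Callable[[Optional[str]], Tuple[List[Any], Optional[str]]]


def iter_pages(fetch_page: FetchPage) -> Iterator[Any]:
    """Lazily yield items page by page, so no single request has to
    return the whole collection at once."""
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            return


# Usage with a fake three-page "API" standing in for the real service.
FAKE_PAGES = {
    None: ([1, 2], "p2"),
    "p2": ([3, 4], "p3"),
    "p3": ([5], None),
}

ids = list(iter_pages(lambda cursor: FAKE_PAGES[cursor]))
print(ids)  # [1, 2, 3, 4, 5]
```

Because `iter_pages` is a generator, each page is requested only when the caller has consumed the previous one, so every request stays small regardless of project size.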

To resolve this issue, we have to decide whether to remove the document count for projects, which would also remove it from project embeds.
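One common way an API avoids computing a total count is cursor (keyset) pagination: instead of counting every row to know how many pages exist, it reads one extra row to decide whether a next page exists. The sketch below shows that idea over an in-memory sorted list of IDs. `cursor_page` and its parameters are hypothetical, and the issue does not say DocumentCloud would adopt this scheme.

```python
import bisect
from typing import List, Optional, Sequence, Tuple


def cursor_page(
    sorted_ids: Sequence[int], after: Optional[int], limit: int
) -> Tuple[List[int], Optional[int]]:
    """Return up to `limit` IDs greater than `after`, plus the next cursor.

    No total count is computed: one extra ID is read as a "has more" probe.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    # Position of the first ID strictly greater than the cursor.
    start = 0 if after is None else bisect.bisect_right(sorted_ids, after)
    window = list(sorted_ids[start:start + limit + 1])
    page = window[:limit]
    # Only hand out a cursor if the probe row showed more data exists.
    next_cursor = page[-1] if len(window) > limit else None
    return page, next_cursor


ids = list(range(1, 11))  # stand-in for 10 document IDs
print(cursor_page(ids, None, 4))  # ([1, 2, 3, 4], 4)
print(cursor_page(ids, 8, 4))     # ([9, 10], None)
```

The trade-off is the one the issue raises: without a count, a client (or an embed) can still walk every page but can no longer display "N documents" without a separate, potentially slow query.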

duckduckgrayduck added the bug Something isn't working label Aug 18, 2023
