Make ValidationError unpicklable #1616
Comments
Surely as simple as implementing If it is that simple, PR welcome to implement it.
I don't believe so. I did spend a few hours trying to do just that and couldn't get it to work. IIRC, the problem with that approach is that Reading the pickle docs, it seemed like
My python.org bug from earlier was a duplicate, so I've closed it. This is the older bug: https://bugs.python.org/issue27015
There's a CPython PR which I confirmed would fix at least the mandatory keyword args case. It is currently awaiting another review from a Python core developer: python/cpython#11580
Hi @abadger, I just had a look at it and I think I have something working.
I have verified that the simplified script in https://gist.github.com/abadger/bfd55741c281ccb534f7bbc8fe9b6202 and my original script are both fixed by #1630. Thanks @PrettyWood!
* fix: make pydantic errors (un)pickable closes #1616
* add typing
* refactor: rename kwargs into ctx
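For readers wondering what a fix along these lines can look like, here is a minimal sketch of one common technique: giving the exception a `__reduce__` method that tells pickle to rebuild it from keyword arguments. The `LimitError` class and the `_rebuild` helper are hypothetical names made up for this illustration. This is not taken from #1630 and may differ from what that PR actually does.

```python
import pickle


def _rebuild(cls, ctx):
    # Hypothetical module-level helper: pickle stores a reference to it
    # and calls it on load, passing the keyword arguments as a dict.
    return cls(**ctx)


class LimitError(Exception):
    """Hypothetical exception with a mandatory keyword-only argument."""

    def __init__(self, *, limit_value):
        super().__init__()
        self.limit_value = limit_value

    def __reduce__(self):
        # Rebuild via keyword arguments instead of the default cls(*self.args).
        return _rebuild, (self.__class__, {"limit_value": self.limit_value})


# Round trip: the exception now survives pickling and unpickling.
restored = pickle.loads(pickle.dumps(LimitError(limit_value=10)))
```

The helper is defined at module level because pickle stores it by name, so both the pickling and the unpickling process need to be able to import it.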
Bug
Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":
Use case
I'm going to give you two code snippets because it might not be obvious from the simplest case why I would want to do it.
A simple approximation of my use case is here: https://gist.github.com/abadger/bfd55741c281ccb534f7bbc8fe9b6202
I am trying to use pydantic to validate and normalize data from a large number of data sources. I need to run each validation separately so that I know which data sources are providing invalid data. I decided to split the work across multiple CPUs by using asyncio's run_in_executor with a ProcessPoolExecutor. However, when the pydantic.constr validation failed, I got a BrokenProcessPool error on everything that had been queued but not yet run, rather than a pydantic ValidationError on the specific task that failed.
Root cause
I was able to work around the problem by catching the pydantic exception and raising a ValueError with all of the information I needed. This led me to the root cause: pydantic errors can be pickled but cannot be unpickled. The exception raised in the worker process is pickled there and sent back to the parent process. The parent process attempts to unpickle it, hits an error, and then reports the generic, unhelpful BrokenProcessPool error and cancels the other pending tasks.
Here's a reproducer for the root cause:
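The failure mode can be sketched without pydantic at all. The snippet below is an illustration only, not the original reproducer. It assumes a hypothetical `LimitError` class whose constructor takes a mandatory keyword-only argument, which is the shape of exception discussed in the linked Python bug.

```python
import pickle


class LimitError(Exception):
    """Hypothetical stand-in for an exception with a mandatory keyword-only argument."""

    def __init__(self, *, limit_value):
        super().__init__()
        self.limit_value = limit_value


err = LimitError(limit_value=10)

# Pickling succeeds, so the worker process sees no problem.
payload = pickle.dumps(err)

# Unpickling calls LimitError(*err.args), i.e. without the keyword argument,
# so the failure only surfaces on the receiving side.
unpickle_error = None
try:
    pickle.loads(payload)
except TypeError as exc:
    unpickle_error = exc
```

With a ProcessPoolExecutor, the `pickle.loads` step happens in the parent process, which is why the error shows up there rather than in the task that raised the exception.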
Looking at the Python stdlib bug tracker, there are many open bugs involving interactions between pickle and exceptions. I didn't see this one, so I added it: https://bugs.python.org/issue40917
Some others that might cause different bugs with pydantic's exceptions:
Given so many potential bugs, I'm not sure if this is solvable in pydantic code or has to wait for pickle fixes. However, if it's not solvable, adding my workaround and an explanation of what's happening to the docs would be nice. That way searching for pydantic, ProcessPoolExecutor, pickle, multiprocessing might save the next person some time wondering why only a portion of their data was being converted.
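A minimal sketch of the workaround described above, in case it helps with such a docs note. It assumes hypothetical names throughout: `PatternError` stands in for the exception that cannot be unpickled, and `validate` and `validate_for_worker` stand in for the code run in the worker process. The pickle round trip at the end stands in for the transfer between worker and parent.

```python
import pickle


class PatternError(Exception):
    """Hypothetical stand-in for a validation error that cannot be unpickled."""

    def __init__(self, *, pattern):
        super().__init__()
        self.pattern = pattern

    def __str__(self):
        return f"string does not match regex {self.pattern!r}"


def validate(value):
    """Hypothetical validator: accepts only lowercase alphabetic strings."""
    if not (value.isalpha() and value.islower()):
        raise PatternError(pattern="^[a-z]+$")
    return value


def validate_for_worker(source_name, value):
    """Function to submit to the executor instead of calling validate directly."""
    try:
        return validate(value)
    except PatternError as exc:
        # Re-raise as a plain ValueError, which pickles and unpickles cleanly,
        # carrying the details needed to identify the failing data source.
        raise ValueError(f"{source_name}: {exc}") from exc


# Simulate what the process pool does with the raised exception.
try:
    validate_for_worker("source-a", "Not Valid")
except ValueError as exc:
    restored = pickle.loads(pickle.dumps(exc))
```

The trade-off is that the parent process receives a ValueError with a formatted message rather than the structured error, so any detail needed later has to be put into that message.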