Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ValidationError unpicklable #1616

Closed
abadger opened this issue Jun 9, 2020 · 5 comments · Fixed by #1630
Closed

Make ValidationError unpicklable #1616

abadger opened this issue Jun 9, 2020 · 5 comments · Fixed by #1630
Labels
feature request help wanted Pull Request welcome

Comments

@abadger
Copy link

abadger commented Jun 9, 2020

Bug

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.5.1
            pydantic compiled: False
                 install path: /home/badger/.cache/pypoetry/virtualenvs/antsibull-F3umzYPl-py3.8/lib/python3.8/site-packages/pydantic
               python version: 3.8.3 (default, Feb 26 2020, 00:00:00)  [GCC 9.3.1 20200408 (Red Hat 9.3.1-2)]
                     platform: Linux-5.6.15-200.fc31.x86_64-x86_64-with-glibc2.2.5
     optional deps. installed: ['typing-extensions']
...

Use case

I'm going to give you two code snippets because it might not be obvious from the simplest case why I would want to do it.

A simple approximation of my use case is here: https://gist.github.com/abadger/bfd55741c281ccb534f7bbc8fe9b6202

I am trying to use pydantic to validate and normalize data from a large number of data sources I need to run each validation separately so that I can know which data sources are providing invalid data. I decided to split it up amongst multiple CPUs by using asyncio's run_in_executor with a ProcessPoolExecutor. However, when the pydantic.constr validation failed, I would get a BrokenProcessPool error on everything that had been queued but not run rather than a pydantic ValidationError on the specific task which failed.

Root cause

I was able to workaround the problem by catching the pydantic exception and raising a ValueError with all of the information I needed. This lead me to the root cause: pydantic errors are not unpicklable. Because of that, the exception raised in the worker process is pickled there and sent back to the parent process. The parent process attempts to unpickle it, encounters the error, and then gives the generic, unhelpful BrokenProcessPool error and cancels the other pending tasks.

Here's a reproducer for the root cause:

import pickle
from pydantic.errors import StrRegexError

p = pickle.dumps(StrRegexError(pattern='test'))
print(pickle.loads(p))

# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: __init__() missing 1 required positional argument: 'pattern'

Looking at the python stdlib bugtracker there are many open bugs with interactions between pickle and exceptions. I didn't see this one so I added this: https://bugs.python.org/issue40917 Some others that might cause different bugs with pydantics exceptions:

Given so many potential bugs, I'm not sure if this is solvable in pydantic code or has to wait for pickle fixes. However, if it's not solvable, adding my workaround and an explanation of what's happening to the docs would be nice. That way searching for pydantic, ProcessPoolExecutor, pickle, multiprocessing might save the next person some time wondering why only a portion of their data was being converted.

@abadger abadger added the bug V1 Bug related to Pydantic V1.X label Jun 9, 2020
@samuelcolvin samuelcolvin added feature request help wanted Pull Request welcome and removed bug V1 Bug related to Pydantic V1.X labels Jun 11, 2020
@samuelcolvin
Copy link
Member

Surely as simple as implementing __getstate__ and __setstate__ on ValidatoinError?

If it is that simple, PR welcome to implement it.

@samuelcolvin samuelcolvin changed the title Problems using pydantic with asyncio or multiprocessing Make ValidationError picklable Jun 11, 2020
@abadger
Copy link
Author

abadger commented Jun 11, 2020

I don't believe so. I did spend a few hours trying to do just that and couldn't get it to work. Iirc, the problem with that approach is that __setstate__ isn't called until after the object's __init__ is called. So the error has already occurred and you don't get a chance to fix things.

Reading the pickle docs, it seemed like __getnewargs_ex__ might be a way to get that to work but i failed at getting that to work too. I'm not sure, though, if i just didn't/don't understand getnewargs or if it turned out that exceptions are specialcased already and that specialcasing was interfering (in one of the python bug reports, ncoghlan noted that there were specialcases in exceptions which perhaps no longer worked with the new ways that picked worked)

@abadger
Copy link
Author

abadger commented Jun 12, 2020

My python.org bug from earlier was a duplicate. I've closed it. This is the older bug: https://bugs.python.org/issue27015 There's a cpython PR which I confirmed would fix at least the mandatory keyword args. It is currently awaiting another review from a python core developer: python/cpython#11580

@abadger abadger changed the title Make ValidationError picklable Make ValidationError unpicklable Jun 12, 2020
@PrettyWood
Copy link
Member

Hi @abadger I just had a look at it and I think I have something working.
I'm opening a PR! Please tell me if it works with your whole example

PrettyWood added a commit to PrettyWood/pydantic that referenced this issue Jun 13, 2020
PrettyWood added a commit to PrettyWood/pydantic that referenced this issue Jun 13, 2020
@abadger
Copy link
Author

abadger commented Jun 25, 2020

I have verified that the simplified script in https://gist.github.com/abadger/bfd55741c281ccb534f7bbc8fe9b6202 and my original script are both fixed by #1630. Thanks @PrettyWood !

samuelcolvin pushed a commit that referenced this issue Jun 27, 2020
* fix: make pydantic errors (un)pickable

closes #1616

* add typing

* refactor: rename kwargs into ctx
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request help wanted Pull Request welcome
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants