Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compat: Import Pickler from "pickle" instead of "_pickle" #469

Merged
merged 1 commit into from May 24, 2022

Conversation

jvesely
Copy link
Contributor

@jvesely jvesely commented May 6, 2022

The latter is an implementation detail (see [0]),
and the docs instruct users to "always import the standard version"[0].
Moreover, "pickle.Pickler" is the same as "_pickle.Pickler",
for all recent CPython releases (tested 3.6->3.9).

[0] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes

Fixes: #458

Signed-off-by: Jan Vesely jano.vesely@gmail.com

@zwelz3
Copy link

zwelz3 commented May 11, 2022

I can't approve, but this LGTM.

@jvesely jvesely changed the title compat: Import Picker from "pickle" instead of "_pickle" compat: Import Pickler from "pickle" instead of "_pickle" May 18, 2022
@jvesely
Copy link
Contributor Author

jvesely commented May 18, 2022

I can't approve, but this LGTM.

thank you. Do you know who can approve?

@codecov
Copy link

codecov bot commented May 20, 2022

Codecov Report

Merging #469 (2ec0f50) into master (f758eb3) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #469      +/-   ##
==========================================
+ Coverage   92.58%   92.60%   +0.02%     
==========================================
  Files           4        4              
  Lines         715      717       +2     
  Branches      159      158       -1     
==========================================
+ Hits          662      664       +2     
  Misses         32       32              
  Partials       21       21              
Impacted Files Coverage Δ
cloudpickle/compat.py 100.00% <100.00%> (ø)
cloudpickle/cloudpickle_fast.py 96.93% <0.00%> (ø)
cloudpickle/cloudpickle.py 88.59% <0.00%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f758eb3...2ec0f50. Read the comment docs.

cloudpickle/compat.py Outdated Show resolved Hide resolved
@ogrisel
Copy link
Contributor

ogrisel commented May 20, 2022

The 3.11 tests will be fixed when merging #470.

cloudpickle/compat.py Outdated Show resolved Hide resolved
else:
import pickle # noqa: F401
from _pickle import Pickler # noqa: F401
from pickle import Pickler # noqa: F401
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change on the other hand should be fine: on recent Python versions from pickle import Pickler imports the fast C implementation from the _pickle builtin module (when available) and falls back to the C variant if Python was not built with this module.

The latter is an implementation detail (see [0]),
and the docs instruct users to "always import the standard version"[0].
Moreover, "pickle.Pickler" is the same as "_pickle.Pickler",
for all recent CPython releases (tested 3.6->3.9).

[0] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes

Fixes: cloudpipe#458

Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ogrisel
Copy link
Contributor

ogrisel commented May 20, 2022

@pierreglaser I let you merge if you agree.

@ogrisel
Copy link
Contributor

ogrisel commented May 24, 2022

Let's merge, I don't think it's likely to have any impact in practice but makes the code cleaner and possibly more compatible for alternative Python implementations such as PyPy.

@ogrisel ogrisel merged commit 8bbea3e into cloudpipe:master May 24, 2022
dongjoon-hyun added a commit to apache/spark that referenced this pull request May 3, 2023
### What changes were proposed in this pull request?

This PR aims two goals.
1. Make PySpark support Python 3.8+ with PyPy3
2. Upgrade PyPy3 to Python 3.8 in our GitHub Action Infra Image to enable test coverage

Note that there was one failure at `test_create_dataframe_from_pandas_with_day_time_interval` test case. This PR skips the test case and SPARK-43354 will recover it after further investigation.

### Why are the changes needed?

Previously, PySpark fails at PyPy3 `Python 3.8` environment.
```
pypy3 version is: Python 3.8.16 (a9dbdca6fc3286b0addd2240f11d97d8e8de187a, Dec 29 2022, 11:45:13)
[PyPy 7.3.11 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)]
Starting test(pypy3): pyspark.sql.tests.pandas.test_pandas_cogrouped_map (temp output: /__w/spark/spark/python/target/f1cacde7-d369-48cf-a8ea-724c42872020/pypy3__pyspark.sql.tests.pandas.test_pandas_cogrouped_map__rxih6dqu.log)
Traceback (most recent call last):
  File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/__w/spark/spark/python/pyspark/__init__.py", line 59, in <module>
    from pyspark.rdd import RDD, RDDBarrier
  File "/__w/spark/spark/python/pyspark/rdd.py", line 54, in <module>
    from pyspark.java_gateway import local_connect_and_auth
  File "/__w/spark/spark/python/pyspark/java_gateway.py", line 32, in <module>
    from pyspark.serializers import read_int, write_with_length, UTF8Deserializer
  File "/__w/spark/spark/python/pyspark/serializers.py", line 69, in <module>
    from pyspark import cloudpickle
  File "/__w/spark/spark/python/pyspark/cloudpickle/__init__.py", line 1, in <module>
    from pyspark.cloudpickle.cloudpickle import *  # noqa
  File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 56, in <module>
    from .compat import pickle
  File "/__w/spark/spark/python/pyspark/cloudpickle/compat.py", line 13, in <module>
    from _pickle import Pickler  # noqa: F401
ModuleNotFoundError: No module named '_pickle'
```

To support Python 3.8 in PyPy3.
- From PyPy3.8, `_pickle` is removed.
  - cloudpipe/cloudpickle#458
- We need this change.
  - cloudpipe/cloudpickle#469

### Does this PR introduce _any_ user-facing change?

This is an additional support.

### How was this patch tested?

Pass the CIs.

Closes #41024 from dongjoon-hyun/SPARK-43348.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request May 10, 2023
### What changes were proposed in this pull request?

This PR aims two goals.
1. Make PySpark support Python 3.8+ with PyPy3
2. Upgrade PyPy3 to Python 3.8 in our GitHub Action Infra Image to enable test coverage

Note that there was one failure at `test_create_dataframe_from_pandas_with_day_time_interval` test case. This PR skips the test case and SPARK-43354 will recover it after further investigation.

### Why are the changes needed?

Previously, PySpark fails at PyPy3 `Python 3.8` environment.
```
pypy3 version is: Python 3.8.16 (a9dbdca6fc3286b0addd2240f11d97d8e8de187a, Dec 29 2022, 11:45:13)
[PyPy 7.3.11 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)]
Starting test(pypy3): pyspark.sql.tests.pandas.test_pandas_cogrouped_map (temp output: /__w/spark/spark/python/target/f1cacde7-d369-48cf-a8ea-724c42872020/pypy3__pyspark.sql.tests.pandas.test_pandas_cogrouped_map__rxih6dqu.log)
Traceback (most recent call last):
  File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/__w/spark/spark/python/pyspark/__init__.py", line 59, in <module>
    from pyspark.rdd import RDD, RDDBarrier
  File "/__w/spark/spark/python/pyspark/rdd.py", line 54, in <module>
    from pyspark.java_gateway import local_connect_and_auth
  File "/__w/spark/spark/python/pyspark/java_gateway.py", line 32, in <module>
    from pyspark.serializers import read_int, write_with_length, UTF8Deserializer
  File "/__w/spark/spark/python/pyspark/serializers.py", line 69, in <module>
    from pyspark import cloudpickle
  File "/__w/spark/spark/python/pyspark/cloudpickle/__init__.py", line 1, in <module>
    from pyspark.cloudpickle.cloudpickle import *  # noqa
  File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 56, in <module>
    from .compat import pickle
  File "/__w/spark/spark/python/pyspark/cloudpickle/compat.py", line 13, in <module>
    from _pickle import Pickler  # noqa: F401
ModuleNotFoundError: No module named '_pickle'
```

To support Python 3.8 in PyPy3.
- From PyPy3.8, `_pickle` is removed.
  - cloudpipe/cloudpickle#458
- We need this change.
  - cloudpipe/cloudpickle#469

### Does this PR introduce _any_ user-facing change?

This is an additional support.

### How was this patch tested?

Pass the CIs.

Closes apache#41024 from dongjoon-hyun/SPARK-43348.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

import Pickler class directly from the pickle module
3 participants