compat: Import Pickler from "pickle" instead of "_pickle" #469
Conversation
I can't approve, but this LGTM.

Thank you. Do you know who can approve?
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #469      +/-   ##
==========================================
+ Coverage   92.58%   92.60%   +0.02%
==========================================
  Files           4        4
  Lines         715      717       +2
  Branches      159      158       -1
==========================================
+ Hits          662      664       +2
  Misses         32       32
  Partials      21       21
```

Continue to review full report at Codecov.
The 3.11 tests will be fixed when merging #470.
```diff
 else:
     import pickle  # noqa: F401
-    from _pickle import Pickler  # noqa: F401
+    from pickle import Pickler  # noqa: F401
```
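On CPython the two spellings resolve to the same object, which an illustrative check (not part of the PR) can confirm:

```python
# On CPython, pickle re-exports the C-accelerated Pickler from the
# _pickle builtin module when it is available, so both names are one
# and the same object.
import pickle

try:
    import _pickle
    # Holds on recent CPython releases where _pickle exists.
    assert pickle.Pickler is _pickle.Pickler
except ImportError:
    # Interpreters without _pickle (e.g. PyPy 3.8+) fall back to the
    # pure-Python Pickler; importing via `pickle` still works there.
    pass
```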
This change on the other hand should be fine: on recent Python versions, `from pickle import Pickler` imports the fast C implementation from the `_pickle` builtin module (when available) and falls back to the pure-Python variant if Python was not built with this module.

The latter is an implementation detail (see [0]), and the docs instruct users to "always import the standard version" [0]. Moreover, `pickle.Pickler` is the same as `_pickle.Pickler` for all recent CPython releases (tested 3.6 through 3.9).

[0] https://docs.python.org/3.1/whatsnew/3.0.html#library-changes

Fixes: cloudpipe#458

Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
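Either way, the public import path behaves identically; a minimal round-trip sketch using only the documented `pickle` API:

```python
# Round-trip a small object through the public pickle API. This runs the
# same way on CPython (C-accelerated Pickler) and on PyPy, where no
# _pickle module exists and the pure-Python Pickler is used.
import io
import pickle

buf = io.BytesIO()
pickle.Pickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump({"answer": 42})
buf.seek(0)
assert pickle.Unpickler(buf).load() == {"answer": 42}
```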
LGTM
@pierreglaser I'll let you merge if you agree.

Let's merge. I don't think it's likely to have any impact in practice, but it makes the code cleaner and possibly more compatible with alternative Python implementations such as PyPy.
### What changes were proposed in this pull request?

This PR aims at two goals:

1. Make PySpark support Python 3.8+ with PyPy3.
2. Upgrade PyPy3 to Python 3.8 in our GitHub Action Infra Image to enable test coverage.

Note that there was one failure in the `test_create_dataframe_from_pandas_with_day_time_interval` test case. This PR skips that test case and SPARK-43354 will recover it after further investigation.

### Why are the changes needed?

Previously, PySpark failed in a PyPy3 `Python 3.8` environment:

```
pypy3 version is: Python 3.8.16 (a9dbdca6fc3286b0addd2240f11d97d8e8de187a, Dec 29 2022, 11:45:13)
[PyPy 7.3.11 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)]
Starting test(pypy3): pyspark.sql.tests.pandas.test_pandas_cogrouped_map (temp output: /__w/spark/spark/python/target/f1cacde7-d369-48cf-a8ea-724c42872020/pypy3__pyspark.sql.tests.pandas.test_pandas_cogrouped_map__rxih6dqu.log)
Traceback (most recent call last):
  File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/__w/spark/spark/python/pyspark/__init__.py", line 59, in <module>
    from pyspark.rdd import RDD, RDDBarrier
  File "/__w/spark/spark/python/pyspark/rdd.py", line 54, in <module>
    from pyspark.java_gateway import local_connect_and_auth
  File "/__w/spark/spark/python/pyspark/java_gateway.py", line 32, in <module>
    from pyspark.serializers import read_int, write_with_length, UTF8Deserializer
  File "/__w/spark/spark/python/pyspark/serializers.py", line 69, in <module>
    from pyspark import cloudpickle
  File "/__w/spark/spark/python/pyspark/cloudpickle/__init__.py", line 1, in <module>
    from pyspark.cloudpickle.cloudpickle import *  # noqa
  File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 56, in <module>
    from .compat import pickle
  File "/__w/spark/spark/python/pyspark/cloudpickle/compat.py", line 13, in <module>
    from _pickle import Pickler  # noqa: F401
ModuleNotFoundError: No module named '_pickle'
```

To support Python 3.8 in PyPy3:

- From PyPy3.8, `_pickle` is removed.
  - cloudpipe/cloudpickle#458
- We need this change.
  - cloudpipe/cloudpickle#469

### Does this PR introduce _any_ user-facing change?

No. This adds support for an additional environment.

### How was this patch tested?

Pass the CIs.

Closes #41024 from dongjoon-hyun/SPARK-43348.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
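The failure above comes down to interpreter portability: `_pickle` is a CPython implementation detail and may simply not exist elsewhere. An illustrative snippet (not part of either PR) that probes this without raising:

```python
# _pickle is a CPython-only accelerator module; probing for it with
# importlib avoids the ModuleNotFoundError seen in the traceback above.
import importlib.util
from pickle import Pickler  # safe on every interpreter

has_c_pickle = importlib.util.find_spec("_pickle") is not None
print("_pickle available:", has_c_pickle)  # False on PyPy 3.8+
print("Pickler resolved to:", Pickler)
```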