[SPARK-43348][PYTHON] Support Python 3.8 in PyPy3 #41024

Closed

Conversation

@dongjoon-hyun (Member) commented May 2, 2023

What changes were proposed in this pull request?

This PR has two goals.

  1. Make PySpark support Python 3.8+ with PyPy3
  2. Upgrade PyPy3 to Python 3.8 in our GitHub Action Infra Image to enable test coverage

Note that one test case, test_create_dataframe_from_pandas_with_day_time_interval, failed. This PR skips that test case, and SPARK-43354 will re-enable it after further investigation.

Why are the changes needed?

Previously, PySpark failed in the PyPy3 Python 3.8 environment:

pypy3 version is: Python 3.8.16 (a9dbdca6fc3286b0addd2240f11d97d8e8de187a, Dec 29 2022, 11:45:13)
[PyPy 7.3.11 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)]
Starting test(pypy3): pyspark.sql.tests.pandas.test_pandas_cogrouped_map (temp output: /__w/spark/spark/python/target/f1cacde7-d369-48cf-a8ea-724c42872020/pypy3__pyspark.sql.tests.pandas.test_pandas_cogrouped_map__rxih6dqu.log)
Traceback (most recent call last):
  File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/__w/spark/spark/python/pyspark/__init__.py", line 59, in <module>
    from pyspark.rdd import RDD, RDDBarrier
  File "/__w/spark/spark/python/pyspark/rdd.py", line 54, in <module>
    from pyspark.java_gateway import local_connect_and_auth
  File "/__w/spark/spark/python/pyspark/java_gateway.py", line 32, in <module>
    from pyspark.serializers import read_int, write_with_length, UTF8Deserializer
  File "/__w/spark/spark/python/pyspark/serializers.py", line 69, in <module>
    from pyspark import cloudpickle
  File "/__w/spark/spark/python/pyspark/cloudpickle/__init__.py", line 1, in <module>
    from pyspark.cloudpickle.cloudpickle import *  # noqa
  File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 56, in <module>
    from .compat import pickle
  File "/__w/spark/spark/python/pyspark/cloudpickle/compat.py", line 13, in <module>
    from _pickle import Pickler  # noqa: F401
ModuleNotFoundError: No module named '_pickle'

This change is needed to support Python 3.8 in PyPy3.

Does this PR introduce any user-facing change?

This adds support for Python 3.8 in PyPy3.

How was this patch tested?

Pass the CIs.

@github-actions bot added the BUILD label May 2, 2023
@dongjoon-hyun changed the title from "[SPARK-43348][INFRA] Upgrade PyPy to Python 3.8" to "[SPARK-43348][INFRA] Upgrade PyPy to Python 3.8 in GitHub Action Infra Image" May 2, 2023
@HyukjinKwon requested a review from Yikun May 3, 2023 00:57
@dongjoon-hyun (Member, Author)

Thank you, @HyukjinKwon.

@dongjoon-hyun (Member, Author)

Thank you, @Yikun.

        from pickle import _Pickler as Pickler  # noqa: F401
else:
    import pickle  # noqa: F401
    from _pickle import Pickler  # noqa: F401

import pickle # noqa: F401
from _pickle import Pickler # noqa: F401
import pickle # noqa: F401
from pickle import Pickler # noqa: F401

@dongjoon-hyun changed the title from "[SPARK-43348][INFRA] Upgrade PyPy to Python 3.8 in GitHub Action Infra Image" to "[SPARK-43348][INFRA] Support Python 3.8 in PyPy3 and test in GitHub Action Infra Image" May 3, 2023
@dongjoon-hyun changed the title from "[SPARK-43348][INFRA] Support Python 3.8 in PyPy3 and test in GitHub Action Infra Image" to "[SPARK-43348][INFRA] Support Python 3.8 in PyPy3 and test in GitHub Action" May 3, 2023
@dongjoon-hyun changed the title from "[SPARK-43348][INFRA] Support Python 3.8 in PyPy3 and test in GitHub Action" to "[SPARK-43348][PYTHOIN] Support Python 3.8 in PyPy3 and test in GitHub Action" May 3, 2023
@dongjoon-hyun changed the title from "[SPARK-43348][PYTHOIN] Support Python 3.8 in PyPy3 and test in GitHub Action" to "[SPARK-43348][PYTHOIN] Support Python 3.8 in PyPy3" May 3, 2023
@dongjoon-hyun changed the title from "[SPARK-43348][PYTHOIN] Support Python 3.8 in PyPy3" to "[SPARK-43348][PYTHON] Support Python 3.8 in PyPy3" May 3, 2023
@dongjoon-hyun (Member, Author) commented May 3, 2023

Could you review once more, @HyukjinKwon and @Yikun?
I found that we didn't support Python 3.8+ in the PyPy3 environment.
I converted this PR from the INFRA module to the PYTHON module, both in the JIRA and here.

@dongjoon-hyun (Member, Author)

Thank you, @HyukjinKwon!

@dongjoon-hyun (Member, Author)

Merged to master. Thank you all!
The last commit disabled a single test case.

@dongjoon-hyun deleted the SPARK-43348 branch May 3, 2023 14:01
HyukjinKwon added a commit that referenced this pull request May 8, 2023
…st only with PyPy 3.8

### What changes were proposed in this pull request?

This PR is a followup of #41024 that skips the test only with PyPy 3.8.

### Why are the changes needed?

To narrow the scope of the skipped test.
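
A minimal sketch of how a skip can be limited to PyPy 3.8, assuming the standard `unittest.skipIf` decorator. The flag `IS_PYPY_38`, the class `IntervalTests`, and the test body are hypothetical and are not taken from the actual follow-up change:

```python
import platform
import sys
import unittest

# Hypothetical flag for this sketch: True only on PyPy implementing Python 3.8.
IS_PYPY_38 = (
    platform.python_implementation() == "PyPy"
    and sys.version_info[:2] == (3, 8)
)


class IntervalTests(unittest.TestCase):
    # Skipped only on PyPy 3.8; every other interpreter still runs the test.
    @unittest.skipIf(IS_PYPY_38, "Fails on PyPy 3.8; see SPARK-43354")
    def test_day_time_interval(self):
        # Placeholder body; the real test lives in PySpark's test suite.
        self.assertEqual(1 + 1, 2)
```

With a condition this narrow, the test keeps running on CPython and on other PyPy versions, so coverage is lost only where the failure was observed.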

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

CI in this PR should verify the change.

Closes #41085 from HyukjinKwon/SPARK-43354-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request May 10, 2023
### What changes were proposed in this pull request?

This PR has two goals.
1. Make PySpark support Python 3.8+ with PyPy3
2. Upgrade PyPy3 to Python 3.8 in our GitHub Action Infra Image to enable test coverage

Note that one test case, `test_create_dataframe_from_pandas_with_day_time_interval`, failed. This PR skips that test case, and SPARK-43354 will re-enable it after further investigation.

### Why are the changes needed?

Previously, PySpark failed in the PyPy3 `Python 3.8` environment:
```
pypy3 version is: Python 3.8.16 (a9dbdca6fc3286b0addd2240f11d97d8e8de187a, Dec 29 2022, 11:45:13)
[PyPy 7.3.11 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)]
Starting test(pypy3): pyspark.sql.tests.pandas.test_pandas_cogrouped_map (temp output: /__w/spark/spark/python/target/f1cacde7-d369-48cf-a8ea-724c42872020/pypy3__pyspark.sql.tests.pandas.test_pandas_cogrouped_map__rxih6dqu.log)
Traceback (most recent call last):
  File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/__w/spark/spark/python/pyspark/__init__.py", line 59, in <module>
    from pyspark.rdd import RDD, RDDBarrier
  File "/__w/spark/spark/python/pyspark/rdd.py", line 54, in <module>
    from pyspark.java_gateway import local_connect_and_auth
  File "/__w/spark/spark/python/pyspark/java_gateway.py", line 32, in <module>
    from pyspark.serializers import read_int, write_with_length, UTF8Deserializer
  File "/__w/spark/spark/python/pyspark/serializers.py", line 69, in <module>
    from pyspark import cloudpickle
  File "/__w/spark/spark/python/pyspark/cloudpickle/__init__.py", line 1, in <module>
    from pyspark.cloudpickle.cloudpickle import *  # noqa
  File "/__w/spark/spark/python/pyspark/cloudpickle/cloudpickle.py", line 56, in <module>
    from .compat import pickle
  File "/__w/spark/spark/python/pyspark/cloudpickle/compat.py", line 13, in <module>
    from _pickle import Pickler  # noqa: F401
ModuleNotFoundError: No module named '_pickle'
```

This change is needed to support Python 3.8 in PyPy3.
- Starting with PyPy3.8, `_pickle` is removed.
  - cloudpipe/cloudpickle#458
- We need the following cloudpickle change.
  - cloudpipe/cloudpickle#469
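
A minimal sketch of the import pattern behind this fix, not the exact contents of `compat.py`: importing `Pickler` through the public `pickle` module avoids a direct dependency on the private `_pickle` module named in the traceback. The helper `dumps_with_pickler` is hypothetical and only shows that the public name is sufficient:

```python
import io
import pickle

# Public name: no direct import of the private _pickle module, so this
# line does not raise ModuleNotFoundError when _pickle is unavailable.
from pickle import Pickler


def dumps_with_pickler(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Serialize obj to bytes using the Pickler class imported above."""
    buf = io.BytesIO()
    Pickler(buf, protocol).dump(obj)
    return buf.getvalue()


payload = dumps_with_pickler({"a": [1, 2, 3]})
restored = pickle.loads(payload)
```

On CPython, `pickle.Pickler` is the C-accelerated implementation when it is available, so the public import should not cost performance there.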

### Does this PR introduce _any_ user-facing change?

This adds support for Python 3.8 in PyPy3.

### How was this patch tested?

Pass the CIs.

Closes apache#41024 from dongjoon-hyun/SPARK-43348.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request May 10, 2023
…st only with PyPy 3.8
