Incompatible Architecture #34

Closed
swetepete opened this issue Jul 6, 2022 · 4 comments

Comments

@swetepete

swetepete commented Jul 6, 2022

I am using a 2021 iMac with the Apple M1 chip and macOS Monterey 12.4.

So far, to set up PySpark, I have installed pyspark with pip3, cloned this repo and installed the packages from its requirements.txt file, and downloaded Java from its homepage. I'm using Python 3.8.9.

I added the path to the pip3 installation of pyspark to SPARK_HOME in my .zshrc and sourced it:

% echo $SPARK_HOME
/Users/julius/Library/Python/3.8/lib/python/site-packages/pyspark

I then executed the following command:

$SPARK_HOME/bin/spark-submit ./server_count.py \
	--num_output_partitions 1 --log_level WARN \
	./input/test_warc.txt servernames

I had to execute this from inside the cc-pyspark repo, otherwise the script could not find the program server_count.py.

It returns this error message:

julius@Juliuss-iMac cc-pyspark % $SPARK_HOME/bin/spark-submit ./server_count.py \
        --num_output_partitions 1 --log_level WARN \
        ./input/test_warc.txt servernames
Traceback (most recent call last):
  File "/Users/julius/cc-pyspark/server_count.py", line 1, in <module>
    import ujson as json
ImportError: dlopen(/Users/julius/Library/Python/3.8/lib/python/site-packages/ujson.cpython-38-darwin.so, 0x0002): tried: '/Users/julius/Library/Python/3.8/lib/python/site-packages/ujson.cpython-38-darwin.so' (mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64'))
22/07/06 15:04:13 INFO ShutdownHookManager: Shutdown hook called
22/07/06 15:04:13 INFO ShutdownHookManager: Deleting directory /private/var/folders/xv/yzpjb77s2qg14px8dc7g4m_80000gn/T/spark-80c476e9-b5ba-4710-b292-e367dd387ece

Is there something wrong with my installation of "ujson"? It seems to be built for ARM, while PySpark is designed for x86. Is that correct?

What is the simplest way to fix this issue? Should I try to run PySpark in some kind of x86 emulation like Rosetta? Has PySpark not been designed for the M1 Chip?

Is there a chance this is the fault of my Java installation? I took the first one offered; it seemed to say x86, but when I tested running PySpark on its own, it seemed to work fine.

Thanks very much

@sebastian-nagel
Contributor

Hi @swetepete,

PySpark also runs on ARM. We use it in production on Hadoop 3.2 / 3.3 clusters based on Apache Bigtop, sometimes even on a mixed cluster (ARM and AMD64 machines).

The issue with ujson seems to be known; see Stack Overflow or ultrajson#456.

Since ujson is an API-compatible but more performant replacement for the json module, you might work around the issue with a fallback import:

# Prefer ujson; fall back to the standard library if it cannot be loaded
try:
    import ujson as json
except ImportError:
    import json
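
As a rough illustration of the fallback above, the sketch below also records which module ended up being loaded and runs a small round trip through `dumps` and `loads`, the two calls that both modules provide. The name `JSON_BACKEND` and the sample record are hypothetical and not part of cc-pyspark.

```python
# Sketch: fallback import that also records which JSON backend was loaded.
# JSON_BACKEND is a hypothetical name used only for this illustration.
try:
    import ujson as json
    JSON_BACKEND = "ujson"
except ImportError:
    # The dlopen "incompatible architecture" failure from the traceback
    # is raised as ImportError, so it is caught here as well.
    import json
    JSON_BACKEND = "json"

# Hypothetical sample record; both backends accept the same calls.
record = {"server": "nginx", "count": 3}
encoded = json.dumps(record)
decoded = json.loads(encoded)

print(f"backend: {JSON_BACKEND}, round trip ok: {decoded == record}")
```

The exact whitespace of the encoded string may differ between the two backends, so comparing decoded objects is safer than comparing JSON strings.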

@swetepete
Author

Thank you. That, as well as pip installing psutil, has allowed the command to execute successfully.

The linked bug is tagged as "completed" - should I open a new bug with ujson's developers, seeing as there may be some new compatibility issue they don't know about, or is this something PySpark might be able to address?

Thank you

@sebastian-nagel
Contributor

> open a new bug with ujson's developers

I can't answer that. After a closer look: the issue was fixed in ujson 5.0 and later. First, make sure that the latest ujson version is installed and that the issue is still reproducible.
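
One possible way to gather the facts for such a check is sketched below: it reports the architecture the Python process runs as, the installed ujson version (if any), and whether the import succeeds. The function name `describe_environment` is hypothetical; the calls used (`platform.machine`, `importlib.metadata.version`, `importlib.import_module`) are from the standard library, with `importlib.metadata` assuming Python 3.8 or later.

```python
import importlib
import platform
import sys
from importlib import metadata  # standard library since Python 3.8


def describe_environment(package="ujson"):
    """Collect facts relevant to an 'incompatible architecture' import error."""
    info = {
        "python": sys.version.split()[0],
        # Architecture reported for the running process, e.g. 'arm64' or 'x86_64'
        "machine": platform.machine(),
    }
    try:
        info["version"] = metadata.version(package)
    except metadata.PackageNotFoundError:
        info["version"] = None  # package is not installed for this interpreter
    try:
        importlib.import_module(package)
        info["import_error"] = None
    except ImportError as exc:
        info["import_error"] = str(exc)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this with the same interpreter that spark-submit uses matters, since the traceback shows a mismatch between the architecture of the installed binary ('arm64') and the one the process needs ('x86_64').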

> is this something PySpark might be able to address?

If you mean "cc-pyspark": yes, we could add the workaround that uses the json module as a fallback. But that's not a nice fix: it makes the code less readable and less performant.
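
To get a feel for the performance difference on a given machine, a small timing sketch along these lines could be used. It times the standard json module and, only if it can be imported, ujson. The function name `time_backends` and the sample payload are hypothetical, and no particular result is implied.

```python
import json
import timeit


def time_backends(payload, repeats=1000):
    """Return seconds spent on dumps+loads round trips per available backend."""
    backends = {"json": json}
    try:
        import ujson
        backends["ujson"] = ujson
    except ImportError:
        pass  # ujson missing or not loadable: only the standard library is timed
    return {
        name: timeit.timeit(lambda m=mod: m.loads(m.dumps(payload)), number=repeats)
        for name, mod in backends.items()
    }


if __name__ == "__main__":
    # Hypothetical payload; real records will have a different shape and size.
    sample = {"servers": ["nginx", "apache"] * 50, "count": 100}
    for name, seconds in time_backends(sample).items():
        print(f"{name}: {seconds:.4f}s")
```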

@sebastian-nagel
Contributor

Closing - a work-around exists and the underlying issue in ujson is resolved.
