[BUG] significant slow down with VectorUDT and ParquetCachedBatchSerializer #8474
Comments
I have not run this, but I am just guessing. The GPU does not support UserDefinedTypes right now, and `VectorUDT` is a user defined type, so we are going to fall back to the CPU to serialize and deserialize them. This is not ideal because in many cases the CPU is much slower at writing Parquet than the GPU, though 600x better is hard to believe, so I need to do some testing.
@eordentlich do you have any instructions on how to get an environment set up to do this? I tried to use conda to set up an environment following the instructions at https://xgboost.readthedocs.io/en/stable/tutorials/spark_estimator.html but it didn't work, and I had to change the last command to …

Then when I tried to import xgboost in the notebook, I got an error about no … and now I am getting what appears to be a CUDA version mismatch of some kind.
Indeed, it looks like those instructions could use some work. I think a conda cudatoolkit package needs to be added to your conda environment create command, e.g. cudatoolkit=11.5, with a version (>= 11.2, <= 11.8) that matches the CUDA toolkit installed on your host. If you are running on a single node, you can activate the conda environment and run in either local mode or standalone mode, with the master and worker started in that environment. That said, I think you can replicate the key issue by running the following in a pyspark shell started as …

And then in the shell, paste: …

The first …
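The pasted snippet did not survive in this thread, but a minimal sketch of a reproduction along the lines described in this issue might look like the following, assuming a pyspark shell where `spark` already exists and `ParquetCachedBatchSerializer` is enabled (the column name and row count here are made up):

```python
# Hypothetical reproduction sketch -- not the original paste from this comment.
import time
from pyspark.sql import functions as F
from pyspark.ml.functions import array_to_vector

# Build a DataFrame with a VectorUDT column.
df = spark.range(10_000_000).withColumn(
    "features", array_to_vector(F.array(F.rand(), F.rand(), F.rand()))
)

df.cache()
df.count()  # first count materializes the cache (write path)

start = time.time()
df.count()  # second count reads back through PCBS (the slow path)
print(f"cached read took {time.time() - start:.1f} s")
```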
Thanks for the simplified setup. I was able to reproduce the caching issue. At least I was able to get Spark to crash with a timeout when using the parquet cached batch serializer for your initial request (I made the data bigger because I was using more cores, but I guess I made it too big!!!).
I found at least one really bad problem where we were doing code generation for each row in a specific code path. I need to do some more profiling to see what else might be bad about it.
@eordentlich do you have the ability to try out #8495? It is not going to solve all of your problems, but it would be good to know if it is good enough for now or if we have to start looking at some of the other optimizations too.
Thanks. I'll have to build the jar (unless it is already in CI/CD somewhere) and give it a try on that notebook.
@revans2 I tested the PR and it is a huge improvement. It is still 4x slower than mapping vector to array type and back with PCBS, and about 6x slower than regular non-PCBS caching, for the notebook example eval stage.
@eordentlich glad to hear that it is helping. I'll see if we can get some help in improving the performance even more. @sameerz it looks like we should spend some time on the other issues I filed, especially #8496; I think we can make it work without too much difficulty.
Describe the bug
The evaluation step in this example notebook: https://github.com/NVIDIA/spark-rapids-examples/blob/main/examples/XGBoost-Examples/mortgage/notebooks/python/MortgageETL%2BXGBoost.ipynb?short_path=f801328#L1035 is about 600x slower when `ParquetCachedBatchSerializer` is enabled. The `result` DataFrame being processed here has several columns of type `VectorUDT`. If these columns are either dropped or converted to array type using `pyspark.ml.functions.vector_to_array` before caching and then converted back upon read using `array_to_vector`, the slowdown can be avoided. This indicates that `ParquetCachedBatchSerializer` has an issue with processing `VectorUDT` columns on the read side. A sketch of the conversion workaround is shown below.
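The workaround looks roughly like the following; this is a sketch, where `result` and the column name `features` stand in for the actual DataFrame and `VectorUDT` columns in the notebook:

```python
# Sketch of the workaround (hypothetical column name).
from pyspark.ml.functions import vector_to_array, array_to_vector

# Convert VectorUDT columns to array<double> before caching...
to_cache = result.withColumn("features", vector_to_array("features"))
to_cache.cache()
to_cache.count()  # materialize the cache

# ...and convert back to VectorUDT after reading from the cache.
restored = to_cache.withColumn("features", array_to_vector("features"))
```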
Steps/Code to reproduce bug
Run the above example notebook with `ParquetCachedBatchSerializer` enabled; a sketch of the relevant configuration is shown below.
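Enabling PCBS is done through the spark-rapids plugin configuration. A minimal sketch, assuming the spark-rapids jar is already on the classpath:

```python
from pyspark.sql import SparkSession

# spark.sql.cache.serializer is a static conf, so it must be set
# before the session is created.
spark = (
    SparkSession.builder
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .config("spark.sql.cache.serializer",
            "com.nvidia.spark.ParquetCachedBatchSerializer")
    .getOrCreate()
)
```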
Expected behavior
No 600x slowdown.
Environment details (please complete the following information)
Standalone.
Additional context
Related issue, with likely same underlying problem: #5975