
OP_REQUIRES failed at xla_ops : UNIMPLEMENTED: Could not find compiler for platform CUDA: NOT_FOUND #2214

Open
kmkolasinski opened this issue Apr 7, 2024 · 7 comments

Comments

@kmkolasinski

Bug Report

Does TensorFlow Serving support XLA-compiled SavedModels, or am I doing something wrong?

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Dockerfile
  • TensorFlow Serving installed from (source or binary): No
  • TensorFlow Serving version: 2.13.1-gpu

Describe the problem

Hi, I'm trying to run XLA-compiled models via TensorFlow Serving; however, it does not seem to work for me.

Here is the notebook I used to create XLA/AMP-compiled SavedModels of very simple classifiers like ResNet50:
https://github.com/kmkolasinski/triton-saved-model/blob/main/notebooks/export-classifier.ipynb
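
For context, the export roughly follows this pattern (a minimal sketch, assuming a Keras ResNet50 wrapped in a tf.function with jit_compile=True plus mixed precision; the notebook's exact code may differ):

import tensorflow as tf
from tensorflow.keras import mixed_precision

# AMP: compute in float16 where possible (assumption: the notebook enables this policy)
mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.applications.ResNet50(weights="imagenet")

class Classifier(tf.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    # jit_compile=True embeds XLA-compiled clusters into the exported graph
    @tf.function(
        jit_compile=True,
        input_signature=[tf.TensorSpec([None, 224, 224, 3], tf.uint8, name="images")],
    )
    def predict_images(self, images):
        x = tf.cast(images, tf.float32)
        x = tf.keras.applications.resnet50.preprocess_input(x)
        return {"probabilities": tf.cast(self.model(x, training=False), tf.float32)}

module = Classifier(model)
tf.saved_model.save(
    module,
    "models/resnet50-xla-amp/1",
    signatures={"predict_images": module.predict_images},
)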

When running the TF Serving server, I see the following warning in the console:

tf_serving_server-1  | 2024-04-07 07:57:23.842168: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at xla_ops.cc:503 : UNIMPLEMENTED: Could not find compiler for platform CUDA: NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that platform linked in?

I get a similar message on the client side:

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNIMPLEMENTED
	details = "2 root error(s) found.
  (0) UNIMPLEMENTED: Could not find compiler for platform CUDA: NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that platform linked in?
	 [[{{function_node __inference_predict_images_90382}}{{node StatefulPartitionedCall}}]]
	 [[cluster_6_1/merge_oidx_0/_655]]
  (1) UNIMPLEMENTED: Could not find compiler for platform CUDA: NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that platform linked in?
	 [[{{function_node __inference_predict_images_90382}}{{node StatefulPartitionedCall}}]]
0 successful operations.
0 derived errors ignored."
	debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:8500 {grpc_message:"2 root error(s) found.\n  (0) UNIMPLEMENTED: Could not find compiler for platform CUDA: NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that platform linked in?\n\t [[{{function_node __inference_predict_images_90382}}{{node StatefulPartitionedCall}}]]\n\t [[cluster_6_1/merge_oidx_0/_655]]\n  (1) UNIMPLEMENTED: Could not find compiler for platform CUDA: NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that platform linked in?\n\t [[{{function_node __inference_predict_images_90382}}{{node StatefulPartitionedCall}}]]\n0 successful operations.\n0 derived errors ignored.", grpc_status:12, created_time:"2024-04-07T09:57:23.84265796+02:00"}"
>

Exact Steps to Reproduce

You can find my repo, in which I compare Triton Server (Python backend) with TF Serving, here: https://github.com/kmkolasinski/triton-saved-model. In the notebooks directory you will find:

  • a notebook which creates the SavedModels (with and without XLA)
  • another notebook which runs predictions, using a gRPC client like the one below:
from dataclasses import dataclass

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc


@dataclass(frozen=True)
class TFServingGRPCClient:
    model_name: str
    url: str = "localhost:8500"

    def predict(
        self, signature: str, *, images: np.ndarray, timeout: int = 60
    ) -> predict_pb2.PredictResponse:
        # Allow large responses (up to 1 GiB) over the gRPC channel
        options = [("grpc.max_receive_message_length", 1 << 30)]
        channel = grpc.insecure_channel(self.url, options=options)
        stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

        grpc_request = predict_pb2.PredictRequest()
        grpc_request.model_spec.name = self.model_name
        grpc_request.model_spec.signature_name = signature

        # Serialize the input batch into the request proto
        grpc_request.inputs["images"].CopyFrom(tf.make_tensor_proto(images))

        predict_response = stub.Predict(grpc_request, timeout)
        channel.close()
        return predict_response

images = tf.random.uniform((100, 224, 224, 3), minval=0, maxval=255, dtype=tf.int32)
images = tf.cast(images, tf.uint8)

client = TFServingGRPCClient("resnet50-no-opt")
result = client.predict("predict_images", images=images)
# works correctly

client = TFServingGRPCClient("resnet50-xla-amp")
result = client.predict("predict_images", images=images)
# fails with UNIMPLEMENTED: Could not find compiler for platform CUDA: NOT_FOUND:

Is this expected behavior? I am aware of this flag:

 --xla_cpu_compilation_enabled=false     bool    EXPERIMENTAL; CAN BE REMOVED ANYTIME! Enable XLA:CPU JIT (default is disabled). With XLA:CPU JIT disabled, models utilizing this feature will return bad Status on first compilation request.

However, I was not able to find any useful resources on how to use it to test my case.
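
For reference, this is roughly how I would expect the flag to be passed to the server binary, e.g. in the Dockerfile CMD (a minimal sketch; ports, model name, and paths are illustrative, not the exact values from my repo):

tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=resnet50-xla-amp \
  --model_base_path=/models/resnet50-xla-amp \
  --xla_cpu_compilation_enabled=true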

@singhniraj08 singhniraj08 self-assigned this Apr 22, 2024
@singhniraj08

@kmkolasinski, the --xla_cpu_compilation_enabled=true parameter should be passed as an additional argument to enable XLA:CPU JIT (it is disabled by default).
Can you try creating a TF Serving Docker container with this additional parameter, as shown in the example, and see if model inference works? Thank you!

@kmkolasinski
Author

Hi @singhniraj08, yes, I tried this approach; please check the Dockerfile (see the CMD command) which I used for testing: https://github.com/kmkolasinski/triton-saved-model/blob/main/tf_serving/Dockerfile

Here is the docker-compose file which I used to run the server: https://github.com/kmkolasinski/triton-saved-model/blob/main/docker-compose.yml

# First, use https://github.com/kmkolasinski/triton-saved-model/blob/main/notebooks%2Fexport-classifier.ipynb to export the various classifiers
docker compose up tf_serving_server

I prepared this repository, which reproduces the issue: https://github.com/kmkolasinski/triton-saved-model/tree/main

@kmkolasinski
Author

Hi @YanghuaHuang, did you have time to take a look at this issue?

@YanghuaHuang
Member

Sorry for the late reply. Assigning to @guanxinq to triage, who has better knowledge of this.

@kmkolasinski
Author

Hi @YanghuaHuang, thanks. I just wonder whether we can use XLA-compiled models in TF Serving or not. If yes, how can we achieve it? I couldn't find any information about this.

@YanghuaHuang
Member

I think TF Serving does support XLA on CPU but not on GPU. But I could be wrong.

@gharibian Hey Dero, can you help on this? Thanks!

@kmkolasinski
Author

Thanks for the answer. If this is true, the message

Could not find compiler for platform CUDA: NOT_FOUND

makes perfect sense to me now, and that's a pity. I assumed that TF Serving uses the same C++ backend to run the SavedModel graph as the TF libraries, so any SavedModel which I can run via Python code I could also run via TF Serving. Let's wait for confirmation from @gharibian.
