bug: "request body is too large" when passing numpy array #521

Closed
decadance-dance opened this issue May 1, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@decadance-dance

Describe the bug

I want to test my mosec server in a tensor-in, tensor-out manner, but when I try to send an array as the request content, I get 413 b'request body is too large'.
To serialize the numpy array I use pickle.

To Reproduce

Client

import pickle
from http import HTTPStatus

import httpx
import msgpack  # type: ignore
import numpy as np
import numpy.typing as npt


def pickle_serialize_numpy(arr: npt.NDArray) -> list[bytes]:
    """Pickle with protocol 5, collecting out-of-band buffers separately."""
    bufs = []

    def callback(buf):
        bufs.append(buf)

    pickled = pickle.dumps(arr, 5, buffer_callback=callback)
    return [pickled] + [bytes(buf) for buf in bufs]


def pickle_deserialize_numpy(data: list[bytes]) -> npt.NDArray:
    return pickle.loads(data[0], buffers=data[1:])


inputs = np.zeros((1, 3, 1024, 1024))  # float64 by default, i.e. 24 MiB of raw data
inputs = pickle_serialize_numpy(inputs)

prediction = httpx.post(
    "http://127.0.0.1:8000/inference",
    content=msgpack.packb({"batch": inputs}),
)
if prediction.status_code == HTTPStatus.OK:
    print(msgpack.unpackb(prediction.content))
else:
    print(prediction.status_code, prediction.content)

Server

import numpy as np  # type: ignore
import onnxruntime as ort

from mosec import Server, Worker, get_logger
from mosec.mixin import MsgpackMixin

logger = get_logger()

INFERENCE_BATCH_SIZE = 16


class Model(MsgpackMixin, Worker):
    """Sample inference worker."""

    def __init__(self):
        # You can specify any model here because I can't even reach the server side.
        super().__init__()
        self.model = ort.InferenceSession("...")

    def forward(self, batch: np.ndarray) -> np.ndarray:
        logits = self.model.run(None, {"input": batch})[0]
        prob_maps = np.transpose(1 / (1 + np.exp(-logits)), (0, 2, 3, 1))  # sigmoid
        return prob_maps


if __name__ == "__main__":
    server = Server()
    server.append_worker(Model, num=2, max_batch_size=INFERENCE_BATCH_SIZE)
    server.run()

Expected behavior

No response

The mosec version

Name: mosec
Version: 0.8.4
Summary: Model Serving made Efficient in the Cloud
Home-page: https://github.com/mosecorg/mosec
Author: Keming Yang
Author-email: Keming kemingy94@gmail.com, Zichen lkevinzc@gmail.com
License: Apache-2.0
Location: /home/dmytrodronov/miniconda3/envs/ocr2/lib/python3.10/site-packages
Requires:
Required-by:

Additional context

No response

@decadance-dance decadance-dance added the bug Something isn't working label May 1, 2024
@kemingy
Member

kemingy commented May 2, 2024

I think it's related to the default max body size: https://docs.rs/axum/latest/axum/extract/struct.DefaultBodyLimit.html#method.max

By default it's 2 MB, which is usually enough for a normal web application but may not be sufficient for an ML image-processing service. I think we can increase it to 5 MB.

@kemingy
Member

kemingy commented May 2, 2024

I re-checked the code and found that the size limit is 10 MiB, which should be sufficient.

const DEFAULT_MAX_REQUEST_SIZE: usize = 10 * 1024 * 1024; // 10MB

As for your test code, you should specify the dtype since numpy will use float64 by default. Image data can use np.uint8.
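
For reference, the raw payload sizes in your repro (a quick sketch; the 10 MiB figure is the limit above):

import numpy as np

# float64 (the default): 1 * 3 * 1024 * 1024 * 8 bytes = 24 MiB, over the 10 MiB limit
print(np.zeros((1, 3, 1024, 1024)).nbytes / 2**20)            # 24.0
# the same shape as uint8: 3 MiB, which fits
print(np.zeros((1, 3, 1024, 1024), np.uint8).nbytes / 2**20)  # 3.0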

Feel free to re-open this issue if you still have any questions.

@kemingy kemingy closed this as completed May 2, 2024
@decadance-dance
Author

Indeed, the dtype affects the size, but it doesn't solve my issue at scale, because when I want to send more than one image / numpy array I still hit this limitation.
For example inputs = np.zeros((10, 3, 1024, 1024), np.uint8), which is already 30 MiB.
It'd be great if you added the ability to change that parameter when running a server.

@decadance-dance
Author

BTW, how can I re-open the issue? I don't see a "Reopen" button under the comment field.

@kemingy
Member

kemingy commented May 2, 2024

Indeed, the dtype affects the size, but it doesn't solve my issue at scale, because when I want to send more than one image / numpy array I still hit this limitation. For example inputs = np.zeros((10, 3, 1024, 1024), np.uint8) It'd be great if you added the ability to change that parameter when running a server.

You should send multiple images concurrently (async or multi-threaded) instead of one large batch of images, to fully utilize dynamic batching. 10 MiB is actually very large for most use cases.
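
For example (a minimal sketch; the endpoint and the raw-pickle payload are illustrative, following the repro above):

import pickle
from concurrent.futures import ThreadPoolExecutor

import httpx
import numpy as np

# One request per image: mosec's dynamic batching groups concurrent
# requests into server-side batches of up to max_batch_size.
images = [np.zeros((3, 1024, 1024), np.uint8) for _ in range(10)]

def send(img: np.ndarray) -> httpx.Response:
    return httpx.post(
        "http://127.0.0.1:8000/inference",
        content=pickle.dumps(img, protocol=pickle.HIGHEST_PROTOCOL),
        timeout=None,
    )

with ThreadPoolExecutor(max_workers=8) as pool:
    responses = list(pool.map(send, images))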

@kemingy kemingy reopened this May 2, 2024
@decadance-dance
Author

@kemingy got you, thanks

@decadance-dance
Author

I noticed that I get the "request body is too large" error even when the actual file size is smaller than 10 MB.
It's kind of a blocker for me now, because I can't work with 50% of my images. I hope we can figure out why this happens.
Please try to reproduce my issue using my code and data.

server.py

import pickle
from typing import Any

import numpy as np

from mosec import Server, Worker, Runtime
from mosec.errors import DecodingError, EncodingError
from predictor import Predictor, BboxPredictor


BATCH_SIZE = 2


class PickleMixin:
    """Use raw pickle bytes for the request and response bodies."""

    def serialize(self, data: np.ndarray) -> bytes:
        try:
            data_bytes = pickle.dumps(data, fix_imports=False, protocol=pickle.HIGHEST_PROTOCOL)
        except Exception as err:
            raise EncodingError from err
        return data_bytes  # type: ignore

    def deserialize(self, data: bytearray) -> Any:
        try:
            data_msg = pickle.loads(data, fix_imports=False)
        except Exception as err:
            raise DecodingError from err
        return data_msg


class HeatmapWorker(Predictor, PickleMixin, Worker):
    def __init__(self) -> None:
        super().__init__(
            model_path="model.onnx",
            batch_size=BATCH_SIZE,
        )


class BboxWorker(BboxPredictor, PickleMixin, Worker):
    def __init__(self) -> None:
        super().__init__()


if __name__ == "__main__":
    server = Server()
    heatmap_runtime = Runtime(
        HeatmapWorker, 
        num=4, 
        max_batch_size=BATCH_SIZE, 
        timeout=10
    )
    bboxes_runtime = Runtime(
        BboxWorker,
        num=4, 
        max_batch_size=BATCH_SIZE, 
        timeout=10
    )
    server.register_runtime(
        {
            "/heatmap": [heatmap_runtime],
            "/bboxes": [bboxes_runtime],
        }
    )
    server.run()

client.py

import os
import pickle
from http import HTTPStatus

import cv2
import httpx


def pickle_serialize(data) -> bytes:
    data_bytes = pickle.dumps(data, fix_imports=False, protocol=pickle.HIGHEST_PROTOCOL)
    return data_bytes  # type: ignore


def pickle_deserialize(data):
    return pickle.loads(data, fix_imports=False)


img_path = "path/to/my/image"
print(f"Image size: {os.path.getsize(img_path) / (1024 * 1024):.4}MB")

img = cv2.imread(img_path)

heatmap_prediction = httpx.post(
    "http://127.0.0.1:8000/heatmap",
    content=pickle_serialize(img),
    timeout=None,
)
if heatmap_prediction.status_code != HTTPStatus.OK:
    raise Exception(heatmap_prediction.content)

heatmap = pickle_deserialize(heatmap_prediction.content)

model

google drive

images

request_body_too_large_images.zip

@kemingy
Member

kemingy commented May 10, 2024

This is because the images are in JPEG format, which means they are compressed. The compression ratio is about 10:1 to 20:1, so if you read an image into a numpy array it's roughly 10x larger than the JPEG file.
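
You can check this directly (a quick sketch, reusing img_path, img, and pickle_serialize from your client.py):

import os

print(f"file on disk:  {os.path.getsize(img_path) / 2**20:.2f} MiB")   # compressed JPEG
print(f"decoded array: {img.nbytes / 2**20:.2f} MiB")                  # height * width * 3 bytes
print(f"pickled body:  {len(pickle_serialize(img)) / 2**20:.2f} MiB")  # what actually gets POSTed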

I suggest you send the bytes of the JPEG and decompress it on the server side.
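
Concretely, something like this (a minimal sketch; JpegMixin is a hypothetical helper mirroring your PickleMixin, and the endpoint follows your server.py):

import cv2
import httpx
import numpy as np


class JpegMixin:
    """Deserialize incoming JPEG bytes into a numpy array on the server side."""

    def deserialize(self, data: bytearray) -> np.ndarray:
        buf = np.frombuffer(data, dtype=np.uint8)
        return cv2.imdecode(buf, cv2.IMREAD_COLOR)  # HxWx3 BGR array


# client side: POST the compressed JPEG bytes unchanged
with open("image.jpg", "rb") as f:
    resp = httpx.post("http://127.0.0.1:8000/heatmap", content=f.read(), timeout=None)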

@decadance-dance
Author

@kemingy, unfortunately, your suggestion doesn't fit my real application, because several preceding modules preprocess the image, so I cannot operate on raw JPEG bytes. Of course, I could save the processed images to disk and load them back as JPEG bytes, but that looks like a workaround and I don't want to go that way.
I really ask you to add the ability to specify a --max_req_size parameter, like you did for --timeout, when starting a server.

@kemingy
Member

kemingy commented May 10, 2024 via email

@decadance-dance
Author

@kemingy I am just asking you to add such a parameter as an optional one, for people (like me) who need to work with larger objects. In that case, the responsibility for changing it from the default (10 MB) rests with the user. So I'm not asking you to increase the default max body size for everyone, but to give users the ability to change it at their own risk.

I would make these changes myself, but I'm not a Rust specialist, so it might take me a long time.
Hope for understanding.
