bug: "request body is too large" when passing numpy array #521

Closed
decadance-dance opened this issue May 1, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@decadance-dance

Describe the bug

I want to test my mosec server in a tensor-in, tensor-out manner, but when I try to send an array as the request content, I get 413 b'request body is too large'.
To serialize the numpy array I use pickle.

To Reproduce

Client

import pickle
from http import HTTPStatus

import httpx
import msgpack  # type: ignore
import numpy as np
import numpy.typing as npt


def pickle_serialize_numpy(arr: npt.NDArray) -> list[bytes]:
    """Pickle with protocol 5, collecting out-of-band buffers separately."""
    bufs = []

    def callback(buf):
        bufs.append(buf)

    pickled = pickle.dumps(arr, 5, buffer_callback=callback)
    return [pickled] + [bytes(buf) for buf in bufs]


def pickle_deserialize_numpy(data: list[bytes]) -> npt.NDArray:
    return pickle.loads(data[0], buffers=data[1:])


inputs = np.zeros((1, 3, 1024, 1024))  # float64 by default, i.e. 24 MiB of raw data
inputs = pickle_serialize_numpy(inputs)

prediction = httpx.post(
    "http://127.0.0.1:8000/inference",
    content=msgpack.packb({"batch": inputs}),
)
if prediction.status_code == HTTPStatus.OK:
    print(msgpack.unpackb(prediction.content))
else:
    print(prediction.status_code, prediction.content)

Server

import numpy as np  # type: ignore
import onnxruntime as ort

from mosec import Server, Worker, get_logger
from mosec.mixin import MsgpackMixin

logger = get_logger()

INFERENCE_BATCH_SIZE = 16


class Model(MsgpackMixin, Worker):
    """Sample inference worker."""

    def __init__(self):
        # You can specify any model here because I can't even reach the server side.
        super().__init__()
        self.model = ort.InferenceSession("...")

    def forward(self, batch: np.ndarray) -> np.ndarray:
        logits = self.model.run(None, {"input": batch})[0]
        prob_maps = np.transpose(1 / (1 + np.exp(-logits)), (0, 2, 3, 1))  # sigmoid
        return prob_maps


if __name__ == "__main__":
    server = Server()
    server.append_worker(Model, num=2, max_batch_size=INFERENCE_BATCH_SIZE)
    server.run()

Expected behavior

No response

The mosec version

Name: mosec
Version: 0.8.4
Summary: Model Serving made Efficient in the Cloud
Home-page: https://github.com/mosecorg/mosec
Author: Keming Yang
Author-email: Keming kemingy94@gmail.com, Zichen lkevinzc@gmail.com
License: Apache-2.0
Location: /home/dmytrodronov/miniconda3/envs/ocr2/lib/python3.10/site-packages
Requires:
Required-by:

Additional context

No response

@decadance-dance decadance-dance added the bug Something isn't working label May 1, 2024
@kemingy
Member

kemingy commented May 2, 2024

I think it's related to the default max body size: https://docs.rs/axum/latest/axum/extract/struct.DefaultBodyLimit.html#method.max

By default it's 2 MB, which is usually enough for a normal web application but may not be sufficient for an ML image-processing service. I think we can increase it to 5 MB.

@kemingy
Member

kemingy commented May 2, 2024

I re-checked the code and found that the size limit is 10 MiB, which should be sufficient.

const DEFAULT_MAX_REQUEST_SIZE: usize = 10 * 1024 * 1024; // 10MB

As for your test code, you should specify the dtype since numpy will use float64 by default. Image data can use np.uint8.
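
For reference, the raw payload sizes in your repro (a quick sketch; the 10 MiB figure is the limit above):

import numpy as np

# float64 (the default): 1 * 3 * 1024 * 1024 * 8 bytes = 24 MiB, over the 10 MiB limit
print(np.zeros((1, 3, 1024, 1024)).nbytes / 2**20)            # 24.0
# the same shape as uint8: 3 MiB, which fits
print(np.zeros((1, 3, 1024, 1024), np.uint8).nbytes / 2**20)  # 3.0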

Feel free to re-open this issue if you still have any questions.

@kemingy kemingy closed this as completed May 2, 2024
@decadance-dance
Author

Indeed, the dtype affects the size, but it doesn't solve my issue at scale, because when I want to send more than one image / numpy array I still hit this limitation.
For example inputs = np.zeros((10, 3, 1024, 1024), np.uint8), which is already 30 MiB.
It'd be great if you added the ability to change that parameter when running a server.

@decadance-dance
Author

BTW, how can I re-open the issue? I don't see a "Reopen" button under the comment field.

@kemingy
Member

kemingy commented May 2, 2024

Indeed, the dtype affects the size, but it doesn't solve my issue at scale, because when I want to send more than one image / numpy array I still hit this limitation. For example inputs = np.zeros((10, 3, 1024, 1024), np.uint8) It'd be great if you added the ability to change that parameter when running a server.

You should send multiple images concurrently (async or multi-threaded) instead of one large batch of images, to fully utilize dynamic batching. 10 MiB is actually very large for most use cases.
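
For example (a minimal sketch; the endpoint and the raw-pickle payload are illustrative, following the repro above):

import pickle
from concurrent.futures import ThreadPoolExecutor

import httpx
import numpy as np

# One request per image: mosec's dynamic batching groups concurrent
# requests into server-side batches of up to max_batch_size.
images = [np.zeros((3, 1024, 1024), np.uint8) for _ in range(10)]

def send(img: np.ndarray) -> httpx.Response:
    return httpx.post(
        "http://127.0.0.1:8000/inference",
        content=pickle.dumps(img, protocol=pickle.HIGHEST_PROTOCOL),
        timeout=None,
    )

with ThreadPoolExecutor(max_workers=8) as pool:
    responses = list(pool.map(send, images))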

@kemingy kemingy reopened this May 2, 2024
@decadance-dance
Author

@kemingy got you, thanks

@decadance-dance
Author

I noticed that I get the "request body is too large" error even when the actual file size is smaller than 10 MB.
It's kind of a blocker for me now, because I can't work with 50% of my images. I hope we can figure out why this happens.
Please try to reproduce my issue using my code and data.

server.py

import pickle
from typing import Any

import numpy as np

from mosec import Server, Worker, Runtime
from mosec.errors import DecodingError, EncodingError
from predictor import Predictor, BboxPredictor


BATCH_SIZE = 2


class PickleMixin:
    """Use raw pickle bytes for the request and response bodies."""

    def serialize(self, data: np.ndarray) -> bytes:
        try:
            data_bytes = pickle.dumps(data, fix_imports=False, protocol=pickle.HIGHEST_PROTOCOL)
        except Exception as err:
            raise EncodingError from err
        return data_bytes  # type: ignore

    def deserialize(self, data: bytearray) -> Any:
        try:
            data_msg = pickle.loads(data, fix_imports=False)
        except Exception as err:
            raise DecodingError from err
        return data_msg


class HeatmapWorker(Predictor, PickleMixin, Worker):
    def __init__(self) -> None:
        super().__init__(
            model_path="model.onnx",
            batch_size=BATCH_SIZE,
        )


class BboxWorker(BboxPredictor, PickleMixin, Worker):
    def __init__(self) -> None:
        super().__init__()


if __name__ == "__main__":
    server = Server()
    heatmap_runtime = Runtime(
        HeatmapWorker, 
        num=4, 
        max_batch_size=BATCH_SIZE, 
        timeout=10
    )
    bboxes_runtime = Runtime(
        BboxWorker,
        num=4, 
        max_batch_size=BATCH_SIZE, 
        timeout=10
    )
    server.register_runtime(
        {
            "/heatmap": [heatmap_runtime],
            "/bboxes": [bboxes_runtime],
        }
    )
    server.run()

client.py

import os
import pickle
from http import HTTPStatus

import cv2
import httpx


def pickle_serialize(data) -> bytes:
    data_bytes = pickle.dumps(data, fix_imports=False, protocol=pickle.HIGHEST_PROTOCOL)
    return data_bytes  # type: ignore


def pickle_deserialize(data):
    return pickle.loads(data, fix_imports=False)


img_path = "path/to/my/image"
print(f"Image size: {os.path.getsize(img_path) / (1024 * 1024):.4}MB")

img = cv2.imread(img_path)

heatmap_prediction = httpx.post(
    "http://127.0.0.1:8000/heatmap",
    content=pickle_serialize(img),
    timeout=None,
)
if heatmap_prediction.status_code != HTTPStatus.OK:
    raise Exception(heatmap_prediction.content)

heatmap = pickle_deserialize(heatmap_prediction.content)

model

google drive

images

request_body_too_large_images.zip

@kemingy
Member

kemingy commented May 10, 2024

This is because the images are in JPEG format, which means they are compressed. The compression ratio is about 10:1 to 20:1, so if you read an image into a numpy array it's roughly 10x larger than the JPEG file.
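
You can check this directly (a quick sketch, reusing img_path, img, and pickle_serialize from your client.py):

import os

print(f"file on disk:  {os.path.getsize(img_path) / 2**20:.2f} MiB")   # compressed JPEG
print(f"decoded array: {img.nbytes / 2**20:.2f} MiB")                  # height * width * 3 bytes
print(f"pickled body:  {len(pickle_serialize(img)) / 2**20:.2f} MiB")  # what actually gets POSTed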

I suggest you send the bytes of the JPEG and decompress it on the server side.
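
Concretely, something like this (a minimal sketch; JpegMixin is a hypothetical helper mirroring your PickleMixin, and the endpoint follows your server.py):

import cv2
import httpx
import numpy as np


class JpegMixin:
    """Deserialize incoming JPEG bytes into a numpy array on the server side."""

    def deserialize(self, data: bytearray) -> np.ndarray:
        buf = np.frombuffer(data, dtype=np.uint8)
        return cv2.imdecode(buf, cv2.IMREAD_COLOR)  # HxWx3 BGR array


# client side: POST the compressed JPEG bytes unchanged
with open("image.jpg", "rb") as f:
    resp = httpx.post("http://127.0.0.1:8000/heatmap", content=f.read(), timeout=None)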

@decadance-dance
Author

@kemingy, unfortunately, your suggestion doesn't fit my real application, because several preceding modules preprocess the image, so I cannot operate on raw JPEG bytes. Of course, I could save the processed images to disk and load them back as JPEG bytes, but that looks like a workaround and I don't want to go that way.
I really ask you to add the ability to specify a --max_req_size parameter, like you did for --timeout, when starting a server.

@kemingy
Member

kemingy commented May 10, 2024 via email

@decadance-dance
Author

@kemingy I am just asking you to add such a parameter as an optional one, for people (like me) who need to work with larger objects. In that case, the responsibility for changing it from the default (10 MB) rests with the user. So I'm not asking you to increase the default max body size for everyone, but to give users the ability to change it at their own risk.

I would make these changes myself, but I'm not a Rust specialist, so it might take me a long time.
Hope for understanding.
