
VLLM is stuck on Outlines 0.0.34 and this sample requires 0.0.37 #28

Open
ProVega opened this issue Apr 10, 2024 · 1 comment

Comments

ProVega commented Apr 10, 2024

I am relatively new to vLLM and BentoML, and trying to get this sample to work fails with a range of issues.

INFO: pip is looking at multiple versions of vllm to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install outlines==0.0.37 and vllm==0.4.0.post1 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested outlines==0.0.37
vllm 0.4.0.post1 depends on outlines==0.0.34

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

I then tried to adapt the sample to use 0.0.34, updating service.py as follows (the changed lines are marked with comments):

@bentoml.api
async def adapted(
    self,
    prompt: str = DEFAULT_USER_PROMPT,
    max_tokens: Annotated[int, Ge(128), Le(MAX_TOKENS)] = MAX_TOKENS,
    json_schema: t.Optional[str] = DEFAULT_SCHEMA,
) -> AsyncGenerator[str, None]:
    from vllm import SamplingParams
    # changed: import JSONLogitsProcessor from vLLM instead of outlines
    from vllm.model_executor.guided_logits_processors import JSONLogitsProcessor

    SAMPLING_PARAM = SamplingParams(
        max_tokens=max_tokens,
        # changed: build the JSON logits processor from the engine
        logits_processors=[JSONLogitsProcessor(json_schema, self.engine.engine)],
    )

    prompt = PROMPT_TEMPLATE.format(user_prompt=prompt)
    stream = await self.engine.add_request(uuid.uuid4().hex, prompt, SAMPLING_PARAM)

    # Standard streaming loop: yield only the newly generated text
    cursor = 0
    async for request_output in stream:
        text = request_output.outputs[0].text
        yield text[cursor:]
        cursor = len(text)

But then I get this error:
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/home/aaron/.local/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
| await func()
| File "/home/aaron/.local/lib/python3.10/site-packages/starlette/responses.py", line 250, in stream_response
| async for chunk in self.body_iterator:
| File "/home/aaron/.local/lib/python3.10/site-packages/_bentoml_sdk/io_models.py", line 183, in async_stream
| async for item in obj:
| File "/home/aaron/BentoVLLM/mistral-7b-instruct/service.py", line 96, in competitors
| logits_processors=[JSONLogitsProcessor(json_schema, self.engine.engine)]
| File "/home/aaron/.local/lib/python3.10/site-packages/vllm/model_executor/guided_logits_processors.py", line 154, in init
| super().init(regex_string, tokenizer)
| File "/home/aaron/.local/lib/python3.10/site-packages/vllm/model_executor/guided_logits_processors.py", line 116, in init
| tokenizer = self.adapt_tokenizer(tokenizer)
| File "/home/aaron/.local/lib/python3.10/site-packages/vllm/model_executor/guided_logits_processors.py", line 44, in adapt_tokenizer
| tokenizer.vocabulary = tokenizer.get_vocab()
| AttributeError: '_AsyncLLMEngine' object has no attribute 'get_vocab'

Any help is appreciated.

larme (Member) commented Apr 11, 2024

Hi @ProVega, thanks for the feedback.

Unfortunately, vLLM 0.4 currently pins outlines==0.0.34, while our outlines integration example depends on new features in outlines==0.0.37.

The good news is that vLLM 0.4 implements its own guided_json support in its OpenAI-compatible endpoints. So if you just want the guided_json format through the OpenAI-compatible endpoints, you can run examples like mistral-7b-instruct/ directly and use the OpenAI client to query the endpoints. An example is shown in this PR.
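For reference, here is a minimal sketch of that kind of query. The base_url, model name, and schema below are placeholders rather than values from this repo; vLLM's OpenAI-compatible server accepts guided_json as an extension parameter, which the OpenAI client can pass through extra_body:

# Sketch only: query the OpenAI-compatible endpoint with a guided_json schema.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")  # placeholder URL/key

person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model id
    messages=[{"role": "user", "content": "Describe one person as JSON."}],
    extra_body={"guided_json": person_schema},  # vLLM-specific extension field
)
print(completion.choices[0].message.content)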

However, if you want to customize BentoML's /generate endpoint the way outlines-integration/ does, you need to stick with vllm==0.3.3 for now. vLLM may release a newer version that is compatible with outlines==0.0.37 later, and we will update our examples to cover both features then.
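As a rough sketch, that pin could look like the line below, assuming (as the comment above implies) that vllm 0.3.3 does not itself lock outlines to a conflicting release:

# Sketch only: pin the older vLLM alongside the outlines version this sample needs.
pip install "vllm==0.3.3" "outlines==0.0.37"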
