
VLLM is stuck on Outlines 0.0.34 and this sample requires 0.0.37 #28

Open
ProVega opened this issue Apr 10, 2024 · 1 comment

Comments

ProVega commented Apr 10, 2024

I am relatively new to vLLM and BentoML, and trying to get this sample to work fails with a range of issues.

INFO: pip is looking at multiple versions of vllm to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install outlines==0.0.37 and vllm==0.4.0.post1 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested outlines==0.0.37
vllm 0.4.0.post1 depends on outlines==0.0.34

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

I then tried to adapt the sample to use 0.0.34, updating service.py as follows (the changed lines are marked with comments):

@bentoml.api
async def adapted(
    self,
    prompt: str = DEFAULT_USER_PROMPT,
    max_tokens: Annotated[int, Ge(128), Le(MAX_TOKENS)] = MAX_TOKENS,
    json_schema: t.Optional[str] = DEFAULT_SCHEMA,
) -> AsyncGenerator[str, None]:
    from vllm import SamplingParams
    # changed: import JSONLogitsProcessor from vLLM instead of outlines
    from vllm.model_executor.guided_logits_processors import JSONLogitsProcessor

    SAMPLING_PARAM = SamplingParams(
        max_tokens=max_tokens,
        # changed: build the JSON logits processor from the engine
        logits_processors=[JSONLogitsProcessor(json_schema, self.engine.engine)],
    )

    prompt = PROMPT_TEMPLATE.format(user_prompt=prompt)
    stream = await self.engine.add_request(uuid.uuid4().hex, prompt, SAMPLING_PARAM)

    # Standard streaming loop: yield only the newly generated text
    cursor = 0
    async for request_output in stream:
        text = request_output.outputs[0].text
        yield text[cursor:]
        cursor = len(text)

But then I get this error:
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/home/aaron/.local/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
| await func()
| File "/home/aaron/.local/lib/python3.10/site-packages/starlette/responses.py", line 250, in stream_response
| async for chunk in self.body_iterator:
| File "/home/aaron/.local/lib/python3.10/site-packages/_bentoml_sdk/io_models.py", line 183, in async_stream
| async for item in obj:
| File "/home/aaron/BentoVLLM/mistral-7b-instruct/service.py", line 96, in competitors
| logits_processors=[JSONLogitsProcessor(json_schema, self.engine.engine)]
| File "/home/aaron/.local/lib/python3.10/site-packages/vllm/model_executor/guided_logits_processors.py", line 154, in init
| super().init(regex_string, tokenizer)
| File "/home/aaron/.local/lib/python3.10/site-packages/vllm/model_executor/guided_logits_processors.py", line 116, in init
| tokenizer = self.adapt_tokenizer(tokenizer)
| File "/home/aaron/.local/lib/python3.10/site-packages/vllm/model_executor/guided_logits_processors.py", line 44, in adapt_tokenizer
| tokenizer.vocabulary = tokenizer.get_vocab()
| AttributeError: '_AsyncLLMEngine' object has no attribute 'get_vocab'

Any help is appreciated.

larme (Member) commented Apr 11, 2024

Hi @ProVega, thanks for the feedback.

Unfortunately, vLLM 0.4 currently pins outlines==0.0.34, while our outlines integration example depends on new features in outlines==0.0.37.

The good news is that vLLM 0.4 implements its own guided_json support in its OpenAI-compatible endpoints. So if you just want the guided_json format through the OpenAI-compatible endpoints, you can run examples like mistral-7b-instruct/ directly and use the OpenAI client to query the endpoints. An example is shown in this PR.
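For reference, here is a minimal sketch of that kind of query. The base_url, model name, and schema below are placeholders rather than values from this repo; vLLM's OpenAI-compatible server accepts guided_json as an extension parameter, which the OpenAI client can pass through extra_body:

# Sketch only: query the OpenAI-compatible endpoint with a guided_json schema.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")  # placeholder URL/key

person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model id
    messages=[{"role": "user", "content": "Describe one person as JSON."}],
    extra_body={"guided_json": person_schema},  # vLLM-specific extension field
)
print(completion.choices[0].message.content)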

However, if you want to customize BentoML's /generate endpoint the way outlines-integration/ does, you need to stick with vllm==0.3.3 for now. vLLM may release a newer version that is compatible with outlines==0.0.37 later, and we will update our examples to cover both features then.
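As a rough sketch, that pin could look like the line below, assuming (as the comment above implies) that vllm 0.3.3 does not itself lock outlines to a conflicting release:

# Sketch only: pin the older vLLM alongside the outlines version this sample needs.
pip install "vllm==0.3.3" "outlines==0.0.37"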
