I am relatively new to vLLM and BentoML, and trying to get this example to work fails with a range of issues.
```
INFO: pip is looking at multiple versions of vllm to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install outlines==0.0.37 and vllm==0.4.0.post1 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested outlines==0.0.37
    vllm 0.4.0.post1 depends on outlines==0.0.34

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
```
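To get past the resolver I pinned the version vLLM itself requires (a sketch of the requirements change; the exact file contents in the repo may differ):

```
vllm==0.4.0.post1
outlines==0.0.34
```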
I then tried adapting the sample to use outlines==0.0.34, updating service.py as follows.
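The relevant call, reconstructed here from the traceback below (surrounding details are approximate), is the logits processor passed to SamplingParams:

```python
# Reconstruction from the traceback; not the exact original code.
# json_schema is assumed to be a JSON-schema string for the desired output.
from vllm import SamplingParams
from vllm.model_executor.guided_logits_processors import JSONLogitsProcessor

params = SamplingParams(
    max_tokens=MAX_TOKENS,
    # vLLM 0.4's JSONLogitsProcessor expects a tokenizer as its second
    # argument; the engine object is passed here instead, which is what
    # triggers the AttributeError in the traceback below.
    logits_processors=[JSONLogitsProcessor(json_schema, self.engine.engine)],
)
```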
But then I get this error:
```
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/aaron/.local/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
    |     await func()
    |   File "/home/aaron/.local/lib/python3.10/site-packages/starlette/responses.py", line 250, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/home/aaron/.local/lib/python3.10/site-packages/_bentoml_sdk/io_models.py", line 183, in async_stream
    |     async for item in obj:
    |   File "/home/aaron/BentoVLLM/mistral-7b-instruct/service.py", line 96, in competitors
    |     logits_processors=[JSONLogitsProcessor(json_schema, self.engine.engine)]
    |   File "/home/aaron/.local/lib/python3.10/site-packages/vllm/model_executor/guided_logits_processors.py", line 154, in __init__
    |     super().__init__(regex_string, tokenizer)
    |   File "/home/aaron/.local/lib/python3.10/site-packages/vllm/model_executor/guided_logits_processors.py", line 116, in __init__
    |     tokenizer = self.adapt_tokenizer(tokenizer)
    |   File "/home/aaron/.local/lib/python3.10/site-packages/vllm/model_executor/guided_logits_processors.py", line 44, in adapt_tokenizer
    |     tokenizer.vocabulary = tokenizer.get_vocab()
    | AttributeError: '_AsyncLLMEngine' object has no attribute 'get_vocab'
```
Any help is appreciated.
Unfortunately, vLLM 0.4 currently pins outlines==0.0.34, while our outlines integration example depends on new features in outlines==0.0.37.
The good news is that vLLM 0.4 implements its own guided_json support in its OpenAI-compatible endpoints. So if you just want guided JSON output through the OpenAI-compatible endpoints, you can run examples like mistral-7b-instruct/ directly and use the OpenAI client to query them. An example is shown in this PR.
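For example, something along these lines should work against the running service (a sketch; the base_url, port, and model name are assumptions that depend on your deployment):

```python
from openai import OpenAI

# Adjust base_url and model to match your deployment.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="n/a")

schema = {
    "type": "object",
    "properties": {"competitors": {"type": "array", "items": {"type": "string"}}},
    "required": ["competitors"],
}

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[{"role": "user", "content": "List three competitors of MySQL."}],
    # vLLM extension to the OpenAI API: constrain output to a JSON schema.
    extra_body={"guided_json": schema},
)
print(response.choices[0].message.content)
```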
However, if you want to customize BentoML's /generate endpoint the way outlines-integration/ does, you need to stick with vllm==0.3.3 for now. vLLM may release a newer version compatible with outlines==0.0.37 later; we will update our examples to cover both features then.
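If you take that route, pinning both packages explicitly avoids the resolver conflict above:

```
vllm==0.3.3
outlines==0.0.37
```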