Replies: 21 comments
-
@fat-tire Thanks for trying this out! I tried this basic local-chat example with mistral after doing …, and it runs fine. That script also has instructions on how to set …; I haven't yet tried …
-
Okay, let me have a play with that example to see if I can make anything improve. I tried several Mixtral 8x7B local models, including GGUF and bpw-quantized versions. Also, I don't run it on a Mac-- it's running oobabooga's API (which I think is now OpenAI-compatible by default-- at one point there were two APIs, a native one and an OpenAI one, but it's one API now). It's not running in the same container, but it is running on the same machine and it's accessed via a …
-
If the script works with some models but not others, it's an indication that the langroid "pipes" are fine, and the problem lies in the LLM setup, e.g. the chat-prompt formatting could be an issue.
-
The code looks for "local/", not just "local", so this shouldn't have an effect. Also, if your model is listening at …
-
Yeah, the only difference that I had considered is maybe there is something wrong with the template formatting such that the prompt wasn't being delivered properly. The weird thing is that this works fine:
It's only with the introduction of the agent that it responds as if I hadn't asked it anything at all, with a totally random response. So I thought maybe somewhere along the line something wasn't parsed right-- I just don't know if that's by langroid or on the server. It's weird because my testing worked fine with a regular 7B model. I thought maybe there was a difference in the fine-tuning that has to do with an unexpected or different templating/formatting, so that the prompt gets lost somewhere. Note that with the regular ooba API docs I am able to specify a couple of things like …

Ah, okay-- let me try changing the "http://" to "local/". I'm not sure if it will make a difference, but who knows... back in a few.
-
Sorry, if I was unclear-- I was referencing this bit:

```python
if chat_model.startswith("litellm") or chat_model.startswith("local"):
    local_model = True
```

not here:

```python
elif self.config.chat_model.startswith("local/"):
```

Anyway, let me give this a shot with local/192.168.etcetc
-
Ah yes, those need to be changed to have "/" at the end, in the next PR. So just set …

The onus is generally on whichever library is creating a chat endpoint for the LLM to automatically insert the requisite dialog-turn delimiters between system, assistant, user, etc. I would assume ooba is doing it, but maybe they haven't done it well with this model. Langroid itself has a general …
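To make the "dialog-turn delimiters" point concrete, here's a rough sketch (not langroid's or ooba's actual code) of how an OpenAI-style message list gets flattened into a single prompt string. A ChatML-tuned model expects the `<|im_start|>`/`<|im_end|>` markers, while a Mistral-instruct model expects `[INST]...[/INST]` wrapping instead, so a server applying the wrong template can effectively lose the prompt:

```python
from typing import Dict, List

def to_chatml(messages: List[Dict[str, str]]) -> str:
    """Flatten OpenAI-style messages into a single ChatML prompt string."""
    turns = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # Leave the final assistant turn open so the model continues from here
    turns.append("<|im_start|>assistant\n")
    return "\n".join(turns)

print(to_chatml([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Is New York in America?"},
]))
```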
-
Okay, update! As a test, I'm using this model: https://huggingface.co/TheBloke/Starling-LM-alpha-8x7B-MoE-GGUF -- it's based on Mistral's MoE model. Here's my simplified llmconfig, which uses the "local/#.#.#.#:5000/v2" formulation as you recommended, and is assigned this time to …

```python
my_llm_config = MyLLMConfig(
    chat_context_length=2048,  # adjust based on model
    api_key=api_key,
    litellm=False,  # use litellm api?
    max_output_tokens=2048,
    min_output_tokens=64,
    chat_model=llm_url,
    timeout=60,
    seed=random.randint(0, 9999999),
    cache_config=RedisCacheConfig(fake=True),  # get rid of annoying warning
)
```

So now the agent responds correctly with:

```python
agent = ChatAgent(agent_config)
response = agent.llm_response("Is New York in America?")
```

It responds correctly that yes, New York is a state. Unfortunately, when I try the two-agent chat (adding the numbers together), I'm getting some weird timeout issues, but it does appear to work eventually, and I see some communication between agents now. It still isn't following the prompts perfectly. But at least it sees them! 😄 Thanks for the help! A couple quick thoughts/suggestions: …
Again, I just want to stress how absolutely cool and fun this project is-- I can easily see a future of pre-written agents and tasks that you can download and snap together to do all kinds of cool tasks. A modular node-based graphical system a la Blender or invokeai or comfyui to follow? heh.
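For anyone following along, the glue between `my_llm_config` and `agent_config` above is roughly the following-- a sketch assuming langroid's `ChatAgentConfig` field names, not the exact code I ran:

```python
from langroid.agent.chat_agent import ChatAgent, ChatAgentConfig

# Assumed wiring: the LLM config shown above plugs into the agent config's `llm` field.
agent_config = ChatAgentConfig(llm=my_llm_config)
agent = ChatAgent(agent_config)
print(agent.llm_response("Is New York in America?"))
```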
-
Just to chime in, I too would like to express how fun it feels using this project to tinker with agents. I'd also appreciate a better explanation of …
-
Cool 👍
Yeah, when connecting to the text-generation-webui server I guess you need to specify the ChatCompletionRequestParams like instruction_template and mode (which is usually "chat" or "instruct"). I'm not sure if langroid would need to somehow set that to make sure it's in the right mode, but I haven't looked too carefully at the API.
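Something like this is what I have in mind-- a hedged sketch of passing those extra fields alongside the standard OpenAI ones (the address is hypothetical, and exact field handling may vary by text-generation-webui version):

```python
import requests

# Hypothetical LAN address; ooba's OpenAI-compatible API listens on port 5000
resp = requests.post(
    "http://192.168.1.10:5000/v1/chat/completions",
    headers={"Authorization": "Bearer sk-111111111111111111111111111111111111111111111111"},
    json={
        "messages": [{"role": "user", "content": "Is New York in America?"}],
        "max_tokens": 256,
        # Extra params text-generation-webui understands (its ChatCompletionRequestParams):
        "mode": "instruct",                 # "chat" or "instruct"
        "instruction_template": "Mistral",  # template name as configured in ooba
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```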
-
@fat-tire @tozimaru Thank you for all the feedback. I will take all of this into account, rationalize some of the local-model setups, and write an updated doc page on that. Meanwhile, I will point to a couple of places that may be helpful, specifically for multi-agent task workflow design: …

This arg globally overrides the …
-
Langroid doesn't have these; it simply assumes the endpoint is OpenAI-compatible and that the chat-formatting is handled by the endpoint.
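In other words, langroid just needs the server to speak the standard OpenAI chat-completions protocol-- conceptually the same as doing this directly (hypothetical local address, shown with the plain openai client purely for illustration):

```python
from openai import OpenAI

# Hypothetical local OpenAI-compatible server (e.g. ooba's API on port 5000)
client = OpenAI(base_url="http://192.168.1.10:5000/v1", api_key="sk-not-checked")

resp = client.chat.completions.create(
    model="local-model",  # many local servers ignore this field
    messages=[{"role": "user", "content": "Is New York in America?"}],
)
print(resp.choices[0].message.content)
```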
-
Great information, thank you! I'll be looking forward to the updated docs! Maybe a page with some kind of flowchart or lifecycle or whatever you call it showing how the "hot potato" gets passed from one agent to another in a task workflow-- like when an agent passes to another agent, who does the agent think it's talking to? (E.g. in the two-agent example, the Student agent thinks it's talking to the User, who then actually passes its output to the Adder agent instead, who replies as a proxy for the User-- the Student is unaware that the Adder exists at all.) And explain things like under what circumstances a "DO-NOT-KNOW" is sent, how it's handled, etc. Oh, and how "DONE" is a trigger word, which, I've discovered, will insta-end the task if it's said accidentally (Agent: "So, to summarize your instructions, I will say "DONE" when I'm finished.").

Re the OobaBooga endpoint configuration-- I guess that will have to either be pre-set on the command line when starting the server, or maybe via the OobaBooga API, completely separate from langroid.

Unrelated question-- does a Task always run synchronously? Could a "delegate" agent theoretically fire off a bunch of agents to do various things simultaneously, then either wait for them to report back, or, if ten of them were attempting different methods to achieve a single goal, maybe wait only for the first one that succeeds to return, then abort the other 9 and continue along? (I know this would put a big load on the LLM, so you probably wouldn't want to do it on your PC, and there'd be notions of LLM "thread safety" on Tasks that would have to be considered, but anyhoo-- just curious if this is a thing.)

Again, thank you so much for the pioneering effort here! All this stuff-- these concepts, terms, and workflows-- will one day be obvious, clear, standardized, and easily accessible to everyone, so it's really fun to see it develop. Terrific stuff.
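To sketch the pattern I mean in plain asyncio (nothing langroid-specific-- `attempt` is just a hypothetical stand-in for an agent/task run):

```python
import asyncio

async def attempt(name: str, delay: float) -> str:
    """Hypothetical stand-in for one agent/task trying a method to reach the goal."""
    await asyncio.sleep(delay)  # pretend this is the agent doing its work
    return f"{name} succeeded"

async def first_success() -> str:
    # Fire off several attempts concurrently, keep the first one that finishes,
    # and cancel the rest.
    tasks = [asyncio.create_task(attempt(f"agent-{i}", 0.1 * (i + 1))) for i in range(10)]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()
    return next(iter(done)).result()

print(asyncio.run(first_success()))
```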
-
Ah yes, @nilspalumbo is working on async task spawning, glad to see interest in that.
Thank you for the interest! I'm thinking of putting down a definitive "Laws of Langroid" doc, stay tuned -- it will address what is a step, what is a valid response, when is a task done, what is the result of a task, when is a responder eligible to respond, etc. All of these are in the code, but there's a real need to bring them out conceptually, and also show diagrammatically how each step evolves.
-
In case you didn't see it, there are logs generated by every task run, lightly documented here: …
-
Yes, I did look at the logs, thank you-- the .log file was blank, and the .tsv file looked similar to the regular colored output as far as I could tell. The formatting of the .tsv was especially nice though, but a deep-dive explanation of the fields would be great. I didn't mention it, but I have been running everything in a (regular, non-Colab) Jupyter notebook-- and it works nicely, including the real-time streaming responses, the color output, etc. Thanks!
-
Nice to know it shows nicely in notebooks... I generally avoid notebooks so haven't extensively tested on them.
-
@fat-tire I realized I could migrate the issue into Discussions, so I moved it here instead of closing it. It's nice to have it here since there is a bunch of great feedback. Thank you for taking the time to write it all down.
-
My pleasure. Let me know if I can be helpful in reviewing docs or whatever. Happy to help if/when I'm able. Cheers!
-
Quick update-- unlike the Mixtral models I tried previously, this fine-tune of the Mixtral 8x7B Mixture-of-Experts model supports the ChatML/OpenAI instruct template and system prompts. It's still not following the prompt 100% as I'd hoped yet, but the GGUF-quantized versions are available.
-
Thanks for this update. I will see if I can run it on my M1 Max Pro 64GB with ollama.
-
Hey there!
So I'm playing with the example scripts from the docs, specifically the two-agent collaboration example, and have run into a problem with Mixtral-instruct-v1-based models using the Oobabooga text-generation-webui server.
The problem is that when the agents are set up, for whatever reason, the prompt doesn't seem to make it to the LLM.
Here's how I set up the LLM: `llm_url` is set to an http link to the /v1 endpoint at port 5000, and `api_key` is set to "sk-111111111111111111111111111111111111111111111111", which is how Ooba likes it. Then, per the example, I did this: …

At this point, the following works fine: …
RESPONSE: Yes, New York is a state in the United States of America.
Great. So let's try it with multiple messages: …
RESPONSE: Yes, New York is a state in the United States of America.
However, setting it up with an agent, like this: …

…results in a very long ramble on random topics (how to use Python, some long paragraph in French, etc.) that is completely unrelated to the prompt and appears to be what happens when no prompt makes it to the LLM. It's processing a blank prompt, I suspect, and just spewing randomness.
Similarly, trying it with a Task: …
This also results in total garbage out.
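(The Task setup was roughly of this shape-- a hedged reconstruction loosely following the docs' two-agent example, not my exact code, and parameter names may differ by langroid version; `my_llm_config` is the config described above:)

```python
from langroid.agent.chat_agent import ChatAgent, ChatAgentConfig
from langroid.agent.task import Task

# Two agents sharing the same local-LLM config; the Student delegates addition to the Adder.
student = ChatAgent(ChatAgentConfig(llm=my_llm_config, name="Student"))
adder = ChatAgent(ChatAgentConfig(llm=my_llm_config, name="Adder"))

student_task = Task(
    student,
    system_message="Ask me to add pairs of numbers, one pair at a time; say DONE when finished.",
    llm_delegate=True,
    single_round=False,
)
adder_task = Task(
    adder,
    system_message="You add the pair of numbers sent to you and reply with just the sum.",
    single_round=True,
)
student_task.add_sub_task(adder_task)
student_task.run("Add these numbers: (3, 4), (10, 20)")
```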
Again, using a non-MoE Mistral appears to work (although it didn't quite follow the prompts very well, which is why I was hoping Mixtral would work better), but Mixtral doesn't seem to receive the prompt through an agent. With Mixtral alone it's prompt in, garbage out.
Anyone else experiencing this?
Without examining the code in too much detail, I wonder why the prompt would make it to the LLM directly but not via an agent? Does this maybe have something to do with the instruction-template setting or something?
I tried playing with various settings in the MyLLMConfig, some of which you can see above, but nothing seemed to work. Also tried changing instruction templates on Oobabooga itself, but no dice. I also tried moving the prompts from system_message to user_message, from the task to the agent... but it wouldn't "take".
Any thoughts? Why would using an agent "block" the prompt? 🤔
Using langroid v0.1.157 w/litellm FWIW.
Thanks - this looks like a fun and interesting project!