Lit-LLM: High-Level API for LLMs

Lit-LLM provides an accessible, high-level API for working with LLMs.

The design principle is to introduce the thinnest possible abstraction while keeping things simple and hackable.

The first implementation focuses on lit-gpt, but adding support for other backends is straightforward.

Features

Current features include:

  • loading/downloading/converting models by specifying a string identifier (e.g. microsoft/phi-1_5)
  • preparing datasets with awareness of the target model (tokenizer, etc.)
  • finetuning with a single command
  • chatting with context
  • exposing OpenAI-compatible HTTP endpoints
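Putting these together, an end-to-end run looks roughly like the sketch below, using only the calls shown in the walkthrough that follows (the top-level llm module name is inferred from that usage):

import llm

model = llm.LLM("microsoft/phi-1_5")
alpaca = model.prepare_dataset("alpaca")
finetuned = model.finetune(dataset=alpaca, max_iter=100)
finetuned.serve(port=8000)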

Usage

Take a look at main.py for an example of finetuning and generation. The steps are as follows.

Load the base model

Create an instance of the model, passing the model name as an argument:

import llm  # top-level module name assumed from the usage in main.py

model = llm.LLM("microsoft/phi-1_5")

Chat with the base model

Start a chat and send a prompt to see how the base model behaves:

with model.chat(temperature=0.2) as chat:
    response = chat.generate(prompt="What do you think about pineapple pizza?")
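Since the feature list above mentions chatting with context, a follow-up prompt inside the same with block should see the earlier exchange. A minimal sketch, assuming the chat object carries conversation state across generate calls:

with model.chat(temperature=0.2) as chat:
    first = chat.generate(prompt="What do you think about pineapple pizza?")
    # Follow-up that presumably reuses the conversation so far.
    short = chat.generate(prompt="Now answer in one sentence.")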

Prepare the dataset

Download and prepare the instruction-tuning dataset. To prepare the Alpaca dataset, call the prepare_dataset method of the model:

alpaca = model.prepare_dataset("alpaca")

Once the dataset has been downloaded and prepared, you can load it directly:

alpaca = model.get_dataset("alpaca")

You can also prepare the Dolly dataset:

dolly = model.prepare_dataset("dolly")

You can also bring your own data as a CSV file:

mydataset = model.prepare_csv_dataset("mydataset", csv_path="<path_to_csv>")

The file passed as csv_path must contain the following three columns:

instruction, input, output
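For reference, a minimal compatible CSV can be produced with the standard library. The file name and rows below are made-up examples; the column names are the required ones listed above:

import csv

rows = [
    {
        "instruction": "Summarize the text.",
        "input": "Pineapple pizza divides opinion among pizza lovers.",
        "output": "Opinions on pineapple pizza are split.",
    },
]
with open("mydataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["instruction", "input", "output"])
    writer.writeheader()
    writer.writerows(rows)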

Fine-tune the base model on the dataset

You can now fine-tune your model on the data. Finetuning will automatically run across all available GPUs.

To finetune, call the finetune method on the model, and pass the dataset that you prepared previously.

finetuned = model.finetune(dataset=alpaca, max_iter=100)

You can pass a number of hyperparameters to finetune, such as max_iter above, to control the training run.
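Relatedly, since finetuning runs across all visible GPUs automatically, you can restrict which devices it uses in the standard CUDA way when launching your script; this is ordinary CUDA/PyTorch behavior, not a Lit-LLM option:

CUDA_VISIBLE_DEVICES=0,1 python main.py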

Chat with the model

You can chat with the resulting model just as before, this time creating the chat context from finetuned:

with finetuned.chat(temperature=0.2) as chat:
    response = chat.generate(prompt="What do you think about pineapple pizza?")

Start an API inference server

You can serve any model, base or finetuned, through an OpenAI-compatible API server:

finetuned.serve(port=8000)

In a separate terminal, you can send a request to the server using the provided client:

python client.py "What do you think about pineapple pizza?"

or, equivalently, make a cURL request:

curl http://127.0.0.1:8000/v1/chat/completions -H "Content-Type: application/json" -H "X-API-KEY: 1234567890" -d '{
     "messages": [{"role": "user", "content": "What do you think about pineapple pizza?"}],
     "temperature": 0.7
   }'
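Because the endpoint follows the OpenAI chat-completions schema, you can also make the same request from Python. A sketch using requests, mirroring the cURL call above (including its example API key):

import requests

response = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    headers={"X-API-KEY": "1234567890"},  # json= sets Content-Type automatically
    json={
        "messages": [{"role": "user", "content": "What do you think about pineapple pizza?"}],
        "temperature": 0.7,
    },
)
print(response.json())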
