Softlandia-Ltd/tiny-serverless-llms

Serverless Tiny Language Models

A 4-bit quantized TinyLlama 1.1B chat model (trained on 3 trillion tokens) and an 8-bit quantized Qwen 2 beta 0.5B chat model, running on the Azure Functions Consumption plan.

This project is not intended for production use. It is a technology demonstration to show that it is possible to run large language models on a cheap and scalable serverless platform.

A demo of the app is available at https://tiny-serverless-llms.azurewebsites.net. In the demo, you can enter a prompt and the model will generate a completion.

Any abuse of the service will result in the service being taken down.

Running locally

To run the serverless function locally, please refer to Azure Functions documentation.

You will need the following settings in your local.settings.json file:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureSignalRBase": "https://{signal_r_service_name}.service.signalr.net",
    "AzureSignalRAccessKey": "{signal_r_service_key}",
    "MODEL_BASE": "{absolute_path_to_model_directory_including_trailing_slash}",
    "LLAMA_BASE": "{absolute_path_to_llama_binary_directory_including_trailing_slash}"
  }
}

Copy your llama.cpp binary to the LLAMA_BASE directory and the models to the MODEL_BASE directory.
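With those paths in place, the function can shell out to the llama.cpp binary. A minimal sketch of assembling that command in Python — the binary name main, the model filename, and the flags are illustrative assumptions, not taken from this repo:

```python
import os

def build_llama_command(prompt: str,
                        model_file: str = "tinyllama-1.1b-chat-q4.gguf",
                        n_predict: int = 128) -> list[str]:
    """Assemble a llama.cpp CLI invocation from the app settings.

    MODEL_BASE and LLAMA_BASE include a trailing slash (see the settings
    above), so plain string concatenation yields valid paths.
    """
    llama_base = os.environ["LLAMA_BASE"]
    model_base = os.environ["MODEL_BASE"]
    return [
        llama_base + "main",            # assumed llama.cpp CLI binary name
        "-m", model_base + model_file,  # path to the quantized GGUF model
        "-p", prompt,                   # the user's prompt
        "-n", str(n_predict),           # max tokens to generate
    ]
```

The function would then run this with something like subprocess.run(build_llama_command(prompt), capture_output=True, text=True) and parse the completion from stdout.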

Deploying to Azure

You'll need the following resources in Azure:

  • An Azure Functions app on the Consumption plan
  • An Azure SignalR Service instance in serverless mode

You'll need to set the following application settings in Azure Functions:

POST_BUILD_SCRIPT_PATH=post_build.sh
MODEL_BASE=/home/site/wwwroot/
LLAMA_BASE=/home/site/wwwroot/llama.cpp/
AzureSignalRBase=https://{signal_r_service_name}.service.signalr.net
AzureSignalRAccessKey={signal_r_service_key}

The post_build.sh script is executed during the Oryx build. It builds the llama.cpp binary and adds the models to the deployment package.
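The AzureSignalRBase and AzureSignalRAccessKey settings suggest the function talks to SignalR through its serverless REST API, which authenticates with an HS256 JWT signed with the access key, where the audience claim must equal the connection URL. A stdlib-only sketch of minting a client token — the hub name and TTL are illustrative assumptions, not taken from this repo:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def signalr_client_token(base_url: str, hub: str, access_key: str,
                         ttl: int = 3600) -> str:
    """Build an HS256 JWT for the Azure SignalR serverless client endpoint.

    The `aud` claim must match the URL the client connects to; the token
    is signed with the service's access key (AzureSignalRAccessKey).
    """
    audience = f"{base_url}/client/?hub={hub}"
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(
        {"aud": audience, "exp": int(time.time()) + ttl}).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = _b64url(
        hmac.new(access_key.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"
```

A client would pass this token as a bearer token when opening its SignalR connection; the server side signs REST calls the same way, with the audience set to the API URL being called.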

Softlandia is a software consultancy based in Finland. We specialize in AI (especially generative AI), cloud architecture, and IoT, and we build software in general, from web applications to embedded systems.

Check out our private GenAI product YOKOT.AI and our blog.
