What is the Label Studio ML backend?

The Label Studio ML backend is an SDK that lets you wrap your machine learning code and turn it into a web server. The web server can be connected to a running Label Studio instance to automate labeling tasks.

If you only need to load static pre-annotated data into Label Studio, running an ML backend might be overkill. Instead, you can import the pre-annotated data directly.

Quickstart

To start using the models, run the ML backend server with docker-compose.

Use the following commands to start serving the ML backend at http://localhost:9090:

git clone https://github.com/HumanSignal/label-studio-ml-backend.git
cd label-studio-ml-backend/label_studio_ml/examples/{MODEL_NAME}
docker-compose up

Replace {MODEL_NAME} with the name of the model you want to use, as listed under Models below.

Models

The following models are available in the repository. Some of them work without any additional setup, while others require additional parameters to be set. Check the Required parameters column to see whether a model needs any.

MODEL_NAME	Description	Required parameters
segment_anything_model	General-purpose interactive image segmentation from Meta	None
llm_interactive	Prompt engineering, data collection and model evaluation workflows for LLM (OpenAI, Azure)	OPENAI_API_KEY
grounding_dino	Object detection with text prompts	None
tesseract	Optical Character Recognition (OCR) by drawing bounding boxes	None
easyocr	Another OCR tool from EasyOCR	None
spacy	Named entity recognition model from SpaCy	None
flair	NLP models by flair	None
huggingface	NLP models by Hugging Face	HF_TOKEN
nemo	Speech transcription models by NVIDIA NeMo	None
mmetection	Object detection models by OpenMMLab	None
simple_text_classifier	Simple trainable text classification model powered by scikit-learn	None
substring_matching	Select keyword to highlight all occurrences of the keyword in the text	None

(Advanced usage) Develop your model

To start developing your own ML backend, follow the instructions below.

1. Installation

Download and install label-studio-ml from the repository:

```bash
git clone https://github.com/HumanSignal/label-studio-ml-backend.git
cd label-studio-ml-backend/
pip install -e .
```

2. Create an empty ML backend

label-studio-ml create my_ml_backend

You can go to the my_ml_backend directory and modify the code to implement your own inference logic. The directory structure should look like this:

my_ml_backend/
├── Dockerfile
├── docker-compose.yml
├── model.py
├── _wsgi.py
├── README.md
└── requirements.txt

The files serve the following purposes:

Dockerfile and docker-compose.yml: used to run the ML backend with Docker.
model.py: the main file, where you implement your own training and inference logic.
_wsgi.py: a helper file used to run the ML backend with Docker (you don't need to modify it).
README.md: instructions on how to run the ML backend.
requirements.txt: the Python dependencies.

3. Implement prediction logic

In your model directory, locate the model.py file (for example, my_ml_backend/model.py).

The model.py file contains a class declaration inherited from LabelStudioMLBase. This class provides wrappers for the API methods that are used by Label Studio to communicate with the ML backend. You can override the methods to implement your own logic:

def predict(self, tasks, context, **kwargs):
    """Make predictions for the tasks."""
    predictions = []  # placeholder: append one prediction per task here
    return predictions

The predict method makes predictions for the tasks. It takes the following parameters and returns the following value:

tasks: Label Studio tasks in JSON format
context: Label Studio context in JSON format, used in interactive labeling scenarios
predictions: the returned predictions array in JSON format

Once you implement the predict method, you can see predictions from the connected ML backend in Label Studio.
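
As an illustration, the sketch below builds a predictions array for a text classification project. The control name `sentiment`, the object name and data key `text`, the label set, and the keyword rule are all hypothetical placeholders. The class is a plain stand-in so the snippet runs on its own; in a real backend the method would live in the class that inherits from LabelStudioMLBase, and the names would have to match your labeling config.

```python
from typing import Any, Dict, List, Optional

# Hypothetical names: they must match the labeling config of your project.
FROM_NAME = "sentiment"  # name of the <Choices> control tag (assumed)
TO_NAME = "text"         # name of the <Text> object tag (assumed)
DATA_KEY = "text"        # key in task["data"] that holds the text (assumed)


class KeywordSentimentModel:
    """Stand-in for a class that would inherit from LabelStudioMLBase."""

    model_version = "keyword-v0"  # assumed version string

    def predict(self, tasks: List[Dict[str, Any]],
                context: Optional[Dict[str, Any]] = None,
                **kwargs) -> List[Dict[str, Any]]:
        """Return one prediction per task, in the same order as the tasks."""
        predictions = []
        for task in tasks:
            text = task["data"].get(DATA_KEY, "")
            # Toy rule standing in for real model inference.
            label = "Positive" if "good" in text.lower() else "Negative"
            predictions.append({
                "model_version": self.model_version,
                "score": 0.5,  # placeholder confidence
                "result": [{
                    "from_name": FROM_NAME,
                    "to_name": TO_NAME,
                    "type": "choices",
                    "value": {"choices": [label]},
                }],
            })
        return predictions


tasks = [{"id": 1, "data": {"text": "A good movie"}},
         {"id": 2, "data": {"text": "Too long"}}]
predictions = KeywordSentimentModel().predict(tasks)
```

Each entry in `result` names the control tag it fills (`from_name`) and the object it refers to (`to_name`), which is how Label Studio maps a prediction onto the labeling interface.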

4. Implement training logic (optional)

You can also implement the fit method to train your model. The fit method is typically used to train the model on the labeled data, although it can be used for any operation that requires data persistence (for example, storing labeled data in a database, saving model weights, or keeping a history of LLM prompts). By default, the fit method is called on any data action in Label Studio, such as creating a new task or updating annotations. You can modify this behavior in Label Studio > Settings > Webhooks.

To implement training, override the fit method in your model.py file:

def fit(self, event, data, **kwargs):
    """Train the model on the labeled data."""
    old_model = self.get('old_model')
    # write your logic to update the model; this placeholder keeps it unchanged
    new_model = old_model
    self.set('new_model', new_model)

where

event: the event type, for example 'ANNOTATION_CREATED' or 'ANNOTATION_UPDATED'
data: the payload received from the event (see the Webhook event reference)

Additionally, there are two helper methods that you can use to store and retrieve data from the ML backend:

self.set(key, value) - store data in the ML backend
self.get(key) - retrieve data from the ML backend

Both methods can be used elsewhere in the ML backend code, for example, in the predict method to get the new model weights.
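
The interplay between fit, self.set, and self.get can be sketched with a small in-memory stand-in. `InMemoryBackend` and its dictionary store are hypothetical and only mimic the two helpers; the real ones are provided by LabelStudioMLBase, and how they persist values is not shown here. The payloads passed to fit and the key names are simplified assumptions as well.

```python
from typing import Any, Dict, List, Optional


class InMemoryBackend:
    """Hypothetical stand-in that mimics self.get / self.set with a dict."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[Any]:
        return self._store.get(key)

    def fit(self, event: str, data: Dict[str, Any], **kwargs) -> None:
        """Count annotation events instead of training a real model."""
        if event not in ("ANNOTATION_CREATED", "ANNOTATION_UPDATED"):
            return  # ignore events this sketch does not handle
        seen = (self.get("annotations_seen") or 0) + 1
        self.set("annotations_seen", seen)
        self.set("model_tag", f"trained-on-{seen}")

    def predict(self, tasks: List[Dict[str, Any]],
                context: Optional[Dict[str, Any]] = None,
                **kwargs) -> List[Dict[str, Any]]:
        """Read back what fit stored, as predict could do with weights."""
        tag = self.get("model_tag") or "untrained"
        return [{"model_version": tag, "result": []} for _ in tasks]


backend = InMemoryBackend()
backend.fit("ANNOTATION_CREATED", {"annotation": {"id": 1}})
backend.fit("ANNOTATION_UPDATED", {"annotation": {"id": 1}})
backend.fit("SOME_OTHER_EVENT", {})  # made-up event name, ignored above
predictions = backend.predict([{"id": 7, "data": {}}])
```

Filtering on the event type at the top of fit keeps unrelated webhook events from triggering an update.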

Other methods and parameters

Other methods and parameters are available within the LabelStudioMLBase class:

self.label_config - returns the Label Studio labeling config as XML string.
self.parsed_label_config - returns the Label Studio labeling config as JSON.
self.model_version - returns the current model version.
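
To make these attributes more concrete, the sketch below reads a labeling config with Python's standard xml.etree.ElementTree and extracts the control name, target name, and labels that a predict method typically needs. The XML string is a made-up example config and the helper `describe_choices` is hypothetical; inside a real backend the string would come from self.label_config.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

# Made-up labeling config for a text classification project.
LABEL_CONFIG = """
<View>
  <Text name="text" value="$text"/>
  <Choices name="sentiment" toName="text">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
  </Choices>
</View>
"""


def describe_choices(label_config: str) -> Dict[str, Any]:
    """Extract names and labels of the first <Choices> control tag."""
    root = ET.fromstring(label_config)
    choices = root.find(".//Choices")
    if choices is None:
        raise ValueError("no <Choices> tag in the labeling config")
    return {
        "from_name": choices.get("name"),
        "to_name": choices.get("toName"),
        "labels": [c.get("value") for c in choices.findall("Choice")],
    }


info = describe_choices(LABEL_CONFIG)
```

Reading the names from the config rather than hard-coding them keeps the backend usable when a project renames its tags.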

Run without Docker

To run the ML backend without Docker (for example, for debugging), use the following command:

label-studio-ml start my_ml_backend

Test your ML backend

Modify my_ml_backend/test_api.py to check that your ML backend works as expected.

Modify the port

To modify the port, use the -p parameter:

label-studio-ml start my_ml_backend -p 9091
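
Once the server is running, a quick reachability check can confirm that the port change took effect. The sketch below uses only the Python standard library. The /health path is an assumption about the routes the server exposes, and `health_url` and `is_up` are hypothetical helpers; adjust the path if your version serves a different endpoint.

```python
import urllib.error
import urllib.request


def health_url(host: str = "localhost", port: int = 9090) -> str:
    """Build the URL of the assumed health-check route."""
    return f"http://{host}:{port}/health"


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx HTTP status.
        return False


url = health_url(port=9091)
```

Calling `is_up(url)` after starting the backend on port 9091 gives a simple yes or no answer before you connect it to Label Studio.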

Deploy your ML backend to GCP

Before you start:

Install gcloud
Enable billing for your account if it is not already active
Initialize gcloud by running the following command and logging in through the browser:

gcloud auth login

Enable the Cloud Build API for your project
Find your GCP project ID
(Optional) Add GCP_REGION with your default region to your environment variables

To start deployment:

Create your own ML backend
Start deployment to GCP:

label-studio-ml deploy gcp {ml-backend-local-dir} \
--from={model-python-script} \
--gcp-project-id {gcp-project-id} \
--label-studio-host {https://app.heartex.com} \
--label-studio-api-key {YOUR-LABEL-STUDIO-API-KEY}

After Label Studio deploys the model, the model endpoint is printed in the console.
