Plugin that applies sentence transformers to message bodies in Hansken

NetherlandsForensicInstitute/bert-embeddings

BERT embeddings

This repository contains code that wraps the sentence_transformers PyPI package as a Hansken extraction plugin; see the extraction plugin SDK documentation for details on how such plugins work. The plugin can apply any of the available sentence transformers to the chatMessage.message field.

The plugin can be adapted to use a different model. It can also run on a different field: change the matcher (which selects the traces to process) and the getter (which reads the text to embed).
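As a rough illustration of the embedding step described above, the sketch below encodes a list of message bodies with sentence_transformers. It is not the plugin's actual code and uses no Hansken API: the function names `load_encoder` and `embed_messages` are hypothetical, and the model name `all-MiniLM-L6-v2` is only an assumed example, since the model the plugin ships with is not stated here.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[Sequence[str]], List[Vector]]


def load_encoder(model_name: str = "all-MiniLM-L6-v2") -> Encoder:
    """Return a function that maps texts to embedding vectors.

    The model name is an assumed example; check its model card before use.
    """
    # Imported lazily so this module can be imported without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # encode() returns a NumPy array by default; convert to plain lists.
    return lambda texts: model.encode(list(texts)).tolist()


def embed_messages(
    messages: Sequence[Optional[str]], encode: Encoder
) -> List[Tuple[str, Vector]]:
    """Pair each non-empty message body with its embedding."""
    # Skip missing or whitespace-only bodies; there is nothing to embed.
    non_empty = [m for m in messages if m and m.strip()]
    if not non_empty:
        return []
    return list(zip(non_empty, encode(non_empty)))
```

Swapping models then amounts to passing a different name to `load_encoder`, for example `embed_messages(bodies, load_encoder("some-other-model"))`, where `bodies` and the model name are placeholders.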

Always check a model's model card or README before using it.

The .sb files are Starboard Notebook files that can be imported into the Code Notebooks included in the Hansken Expert UI. The notebooks show an example of how to sort all chat messages in a case by similarity using hansken.py.
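The similarity sort that the notebooks demonstrate can be sketched independently of Hansken. The example below assumes the embeddings have already been retrieved as plain lists of floats and ranks messages by cosine similarity to a query vector; it does not use hansken.py, and the names `cosine_similarity` and `rank_by_similarity` are hypothetical. Cosine similarity is an assumed choice of metric here, not necessarily the one the notebooks use.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Vector, embedded: Sequence[Tuple[str, Vector]]
) -> List[Tuple[str, Vector]]:
    """Sort (message, embedding) pairs from most to least similar to query."""
    return sorted(
        embedded,
        key=lambda pair: cosine_similarity(query, pair[1]),
        reverse=True,
    )
```

In practice the query vector would be the embedding of a message of interest, so that the most similar messages in the case appear first.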

Building and running it

To run the plugin, pull the latest image to the location where Hansken looks for plugins:

docker pull ghcr.io/netherlandsforensicinstitute/bert-embeddings:latest

Alternatively, clone (and optionally modify) this repository and build your own image:

git clone https://github.com/netherlandsforensicinstitute/bert-embeddings
cd bert-embeddings
build_plugin bert_embeddings.py . bert-embeddings
