
Releases: LLukas22/llm-rs-python

Custom RoPE support & small LangChain bugfixes

19 Aug 14:24
3bc82ba

Adds the ability to extend the context length of models via the RoPE_scaling parameter.
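A minimal sketch of what this enables; the idea that the scaling factor is passed through SessionConfig, and the exact parameter spelling, are assumptions based on this note, not verified API:

```python
from llm_rs import AutoModel, SessionConfig

# Assumption: the RoPE scaling factor lives on the session config.
# A factor of 2.0 stretches a model trained on 2048 tokens of context
# toward a usable window of roughly 4096 tokens.
session_config = SessionConfig(
    context_length=4096,
    rope_scaling=2.0,  # assumed parameter name, taken from this release note
)

model = AutoModel.from_pretrained(
    "path/to/llama-model.bin",
    session_config=session_config,
)
```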

Better HuggingfaceHub Integration

19 Jul 15:30
547efaa

Simplified the interaction with third-party GGML repositories such as TheBloke/Llama-2-7B-GGML.
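As a sketch, a quantized GGML file can now be pulled straight from the Hub; the model_file selector below is an assumption about how one of the several quantized files in such a repo is chosen:

```python
from llm_rs import AutoModel

# Load a GGML model directly from a Hugging Face Hub repo.
# model_file (assumed parameter) picks one quantized file out of
# the several that TheBloke ships in each repo.
model = AutoModel.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",
    model_file="llama-2-7b.ggmlv3.q4_0.bin",
)

result = model.generate("The meaning of life is")
print(result.text)
```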

Stable GPU Support

17 Jul 15:33
b5eaae5

Fixed many GPU-acceleration bugs in rustformers/llm and improved performance to match native GGML.
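A minimal sketch of turning the acceleration on, assuming the toggle is a use_gpu flag on SessionConfig:

```python
from llm_rs import AutoModel, SessionConfig

# Assumption: GPU offloading is enabled via the session config.
session_config = SessionConfig(use_gpu=True)

model = AutoModel.from_pretrained(
    "path/to/llama-model.bin",
    session_config=session_config,
)
```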

Experimental GPU support

27 Jun 13:31

Adds support for Metal, CUDA, and OpenCL acceleration for LLaMA-based models.

Adds CI for the different acceleration backends to create prebuilt binaries.

Added 🌾🔱 Haystack Support + BigCode-Models

21 Jun 12:09
  • Added support for the Haystack library (see the sketch below)
  • Added support for "BigCode"-style models (e.g. WizardCoder) via the gpt2 architecture
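A sketch of wiring a model into a Haystack PromptNode; the llm_rs.haystack import path and the invocation_layer_class hook are assumptions, and the MPT repo is only an example:

```python
from haystack.nodes import PromptModel, PromptNode
from llm_rs.haystack import RustformersInvocationLayer  # assumed import path

# Wrap a GGML model as a Haystack PromptModel via a custom invocation layer.
model = PromptModel(
    "rustformers/mpt-7b-ggml",  # example repo
    max_length=1024,
    invocation_layer_class=RustformersInvocationLayer,
    model_kwargs={"model_file": "mpt-7b-q4_0-ggjt.bin"},
)

prompt_node = PromptNode(model)
prompt_node("Explain how a transformer block works, briefly.")
```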

Added 🦜️🔗 LangChain support

06 Jun 14:45
e2a6d45
Merge pull request #21 from LLukas22/feat/langchain

Add LangChain support
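A sketch of plugging a model into a LangChain chain; the RustformersLLM class and its model_path_or_repo_id argument are assumptions about the integration's surface:

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from llm_rs.langchain import RustformersLLM  # assumed import path

llm = RustformersLLM(model_path_or_repo_id="path/to/llama-model.bin")

prompt = PromptTemplate(
    input_variables=["question"],
    template="Q: {question}\nA:",
)

chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(question="What is GGML?"))
```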

Added Huggingface Tokenizer Support

04 Jun 14:25
e2925c4

AutoModel-compatible models now use the official tokenizers library, which improves decoding accuracy, especially for non-LLaMA models.

To specify a tokenizer manually, set it via the tokenizer_path_or_repo_id parameter. To use the default GGML tokenizer instead, disable Hugging Face tokenizer support via use_hf_tokenizer.
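Both parameter names come from this release; their placement on from_pretrained is an assumption, and the tokenizer repo id is only an example:

```python
from llm_rs import AutoModel

# Pick a specific tokenizers-compatible tokenizer from the Hub (or a local path).
model = AutoModel.from_pretrained(
    "path/to/gpt-neox-model.bin",
    tokenizer_path_or_repo_id="EleutherAI/gpt-neox-20b",  # example repo id
)

# Or keep the vocabulary embedded in the GGML file.
model = AutoModel.from_pretrained(
    "path/to/gpt-neox-model.bin",
    use_hf_tokenizer=False,
)
```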

Fixed GPT-J quantization

29 May 10:03
0.2.8

GPT-J quantization bugfix

Added other quantization formats

28 May 08:27
f893129

Added support for the q5_0, q5_1 and q8_0 quantization formats.
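A sketch of selecting one of the new formats; the QuantizationType enum and the quantization argument on from_pretrained are assumptions, not verified API:

```python
from llm_rs import AutoModel, QuantizationType  # QuantizationType assumed

# Assumption: the desired format is picked when loading/converting the model.
model = AutoModel.from_pretrained(
    "path/to/llama-model.bin",
    quantization=QuantizationType.Q5_1,  # newly supported: Q5_0, Q5_1, Q8_0
)
```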

Streaming support

27 May 15:31
c7e3efc

Added the stream method to each model, which returns a generator that can be consumed to generate a response.
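The stream method and its generator return type come from this release; the rest of the snippet is a plausible consumption pattern:

```python
from llm_rs import AutoModel

model = AutoModel.from_pretrained("path/to/llama-model.bin")

# stream() returns a generator; iterate it to emit tokens as they arrive.
for token in model.stream("The meaning of life is"):
    print(token, end="", flush=True)
```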