Tutorial: Classifying Documents & Queries by Language - DocumentStore per language #7569

Hi @greghobby, in that case the DocumentLanguageClassifier is still the component to use: https://docs.haystack.deepset.ai/docs/documentlanguageclassifier
It uses langdetect under the hood, which supports 55 languages. You can initialize the DocumentLanguageClassifier with as many of these languages as you need:
document_classifier = DocumentLanguageClassifier(languages=["en", "de", ...])
The classifier stores the detected language in each document's metadata, so you can route documents to a separate DocumentStore per language based on that value.
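
To make the "one store per language" idea concrete, here is a minimal sketch of the routing step. It does not call Haystack: plain dicts stand in for Document objects and plain lists stand in for DocumentStores. The function name route_by_language, the "language" metadata key, and the "unmatched" fallback label are assumptions for illustration, so check the docs linked above for the exact field names your Haystack version uses.

```python
from collections import defaultdict


def route_by_language(documents, languages, fallback="unmatched"):
    """Group already-classified documents into one bucket per language.

    Hypothetical helper: each document is a dict whose meta["language"]
    was filled in by a language classifier. Documents with a missing or
    unexpected language land in the fallback bucket.
    """
    stores = defaultdict(list)
    for doc in documents:
        lang = doc.get("meta", {}).get("language", fallback)
        if lang not in languages:
            lang = fallback
        stores[lang].append(doc)
    return dict(stores)


# Example input: documents as they might look after classification.
classified = [
    {"content": "This is an English sentence.", "meta": {"language": "en"}},
    {"content": "Das ist ein deutscher Satz.", "meta": {"language": "de"}},
    {"content": "Ceci est une phrase en français.", "meta": {"language": "fr"}},
    {"content": "No language recorded for this one.", "meta": {}},
]

# One bucket per configured language, plus the fallback bucket.
stores = route_by_language(classified, languages=["en", "de"])
for lang, docs in stores.items():
    print(lang, len(docs))
```

In a real pipeline each bucket would be a separate DocumentStore, and the grouping would be done by a routing component that reads the same metadata field rather than by a hand-written loop.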

Answer selected by greghobby