Error using Clickhouse to create a vectorStore: "Annoy index second argument must be String." #21808

Open
5 tasks done
viniciuscr opened this issue May 17, 2024 · 1 comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: vector store Related to vector store module

viniciuscr commented May 17, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_text_splitters import CharacterTextSplitter
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import Clickhouse, ClickhouseSettings

file = "some_file.pdf"
loader = PyMuPDFLoader(file)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = FastEmbedEmbeddings()

settings = ClickhouseSettings(table="some_table")
docsearch = Clickhouse.from_documents(docs, embeddings, config=settings)

I am using the ClickHouse server image suggested in the docs:

! docker run -d -p 8123:8123 -p 9005:9000 --name langchain-clickhouse-server --ulimit nofile=262144:262144 clickhouse/clickhouse-server:23.4.2.11

Error Message and Stack Trace (if applicable)


DatabaseError Traceback (most recent call last)
Cell In[28], line 7
6 settings = ClickhouseSettings(table="some_table")
----> 7 docsearch = Clickhouse.from_documents(docs, embeddings, config=settings)

File ~/.local/lib/python3.12/site-packages/langchain_core/vectorstores.py:550, in VectorStore.from_documents(cls, documents, embedding, **kwargs)
548 texts = [d.page_content for d in documents]
549 metadatas = [d.metadata for d in documents]
--> 550 return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)

File ~/micromamba/envs/langchain/lib/python3.12/site-packages/langchain_community/vectorstores/clickhouse.py:403, in Clickhouse.from_texts(cls, texts, embedding, metadatas, config, text_ids, batch_size, **kwargs)
376 @classmethod
377 def from_texts(
378 cls,
(...)
385 **kwargs: Any,
386 ) -> Clickhouse:
387 """Create ClickHouse wrapper with existing texts
388
389 Args:
(...)
401 ClickHouse Index
402 """
--> 403 ctx = cls(embedding, config, **kwargs)
404 ctx.add_texts(texts, ids=text_ids, batch_size=batch_size, metadatas=metadatas)
405 return ctx

File ~/micromamba/envs/langchain/lib/python3.12/site-packages/langchain_community/vectorstores/clickhouse.py:205, in Clickhouse.__init__(self, embedding, config, **kwargs)
200 if self.config.index_type:
201 # Enable index
202 self.client.command(
203 f"SET allow_experimental_{self.config.index_type}_index=1"
204 )
--> 205 self.client.command(self.schema)

File ~/micromamba/envs/langchain/lib/python3.12/site-packages/clickhouse_connect/driver/httpclient.py:336, in HttpClient.command(self, cmd, parameters, data, settings, use_database, external_data)
333 params.update(self._validate_settings(settings or {}))
335 method = 'POST' if payload or fields else 'GET'
--> 336 response = self._raw_request(payload, params, headers, method, fields=fields)
337 if response.data:
338 try:

File ~/micromamba/envs/langchain/lib/python3.12/site-packages/clickhouse_connect/driver/httpclient.py:438, in HttpClient._raw_request(self, data, params, headers, method, retries, stream, server_wait, fields, error_handler)
436 error_handler(response)
437 else:
--> 438 self._error_handler(response)

File ~/micromamba/envs/langchain/lib/python3.12/site-packages/clickhouse_connect/driver/httpclient.py:362, in HttpClient._error_handler(self, response, retried)
360 err_msg = common.format_error(err_content.decode(errors='backslashreplace'))
361 err_str = f':{err_str}\n {err_msg}'
--> 362 raise OperationalError(err_str) if retried else DatabaseError(err_str) from None

DatabaseError: :HTTPDriver for http://localhost:8123/ returned response code 500)
Code: 80. DB::Exception: Annoy index second argument must be String. (INCORRECT_QUERY) (version 23.4.2.11 (official build))
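
One possible reading of this error, not confirmed anywhere in this thread: the server rejects the index definition because the second argument of `annoy(...)` in the generated schema is a number rather than a quoted string. The sketch below is a self-contained illustration of that idea only. The helper names (`annoy_clause`, `second_arg_is_string_literal`) are hypothetical and are not part of LangChain or ClickHouse, and the two parameter lists are assumed example values, not values read from the reporter's environment.

```python
def annoy_clause(params):
    """Render a hypothetical `TYPE annoy(...)` clause from a list of parameters.

    Assumption: parameters are joined with commas in the order given.
    """
    return f"TYPE annoy({','.join(str(p) for p in params)})"


def second_arg_is_string_literal(params):
    """Mimic the check named in the error message: if a second argument
    is present, it has to be a single-quoted string literal."""
    if len(params) < 2:
        return True
    second = str(params[1])
    return len(second) >= 2 and second[0] == "'" and second[-1] == "'"


# Two assumed orderings of the same parameters.
name_first = ["'L2Distance'", 100]    # distance name, then tree count
trees_first = [100, "'L2Distance'"]   # tree count, then distance name

for params in (name_first, trees_first):
    print(annoy_clause(params), "->", second_arg_is_string_literal(params))
```

If this reading is right, two things may be worth trying: running a different ClickHouse server version than 23.4.2.11, or passing the index parameters in the other order through `ClickhouseSettings` (assuming it exposes an `index_param` option, which the code in this issue does not show). Neither has been verified here.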

Description

I'm trying to load a PDF and search it using ClickHouse as the vector store. Creating the store with Clickhouse.from_documents fails with the error above.

System Info

System Information

OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC Thu May 2 18:59:06 UTC 2024
Python Version: 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]

Package Information

langchain_core: 0.1.52
langchain: 0.1.20
langchain_community: 0.0.38
langsmith: 0.1.59
langchain_chroma: 0.1.1
langchain_openai: 0.1.7
langchain_text_splitters: 0.0.2

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

@dosubot dosubot bot added Ɑ: vector store Related to vector store module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels May 17, 2024
@MRzhanghaoran

Longchain 962 sex erro to [sudo place->ls]
utf-8 cud:99023;
That was long chains exit the postation will be erro.
