Cannot append fields of type "dense_vector" to an existing index #659

Open
walkingmug opened this issue Feb 1, 2024 · 1 comment

@walkingmug

Description:
When appending a pandas DataFrame to an existing Elasticsearch index, with a column overridden to type "dense_vector" and the index already mapping that field as "dense_vector", a TypeError is raised.

Reproduction:

  1. Install requirements:
    pip install elasticsearch eland pandas numpy
  2. Imports:
from elasticsearch import Elasticsearch
import eland as ed
import pandas as pd
import numpy as np
  3. Connect to Elasticsearch:
client = Elasticsearch(HOST, timeout=120)  # HOST is a placeholder for your Elasticsearch URL
  4. Create vector dataframes:
vector1 = np.random.rand(512)
vector2 = np.random.rand(512)
df_1 = pd.DataFrame({
    'vector_column': [vector1, vector2]
})

vector3 = np.random.rand(512)
vector4 = np.random.rand(512)
df_2 = pd.DataFrame({
    'vector_column': [vector3, vector4]
})
  5. ✅ Upload first dataframe:
# upload df_1 to elasticsearch
ed.pandas_to_eland(
  pd_df=df_1,
  es_client=client,
  es_dest_index='test-upload',
  es_if_exists="append",
  es_refresh=True,
  es_type_overrides={
      "vector_column": {
          "type": "dense_vector",
          "dims": 512,
          "index": True,
          "similarity": "cosine"
      },
  },
  chunksize=100
)
  6. ❌ Append second dataframe to the same index:
# upload df_2 to elasticsearch
ed.pandas_to_eland(
  pd_df=df_2,
  es_client=client,
  es_dest_index='test-upload',
  es_if_exists="append",
  es_refresh=True,
  es_type_overrides={
      "vector_column": {
          "type": "dense_vector",
          "dims": 512,
          "index": True,
          "similarity": "cosine"
      },
  },
  chunksize=100
)

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-b0e5aa8d561e> in <cell line: 2>()
      1 # upload df_2 to elasticsearch
----> 2 ed.pandas_to_eland(
      3   pd_df=df_2,
      4   es_client=client,
      5   es_dest_index='test-upload',

1 frames
/usr/local/lib/python3.10/dist-packages/eland/field_mappings.py in verify_mapping_compatibility(ed_mapping, es_mapping, es_type_overrides)
    919         key_type = es_type_overrides.get(key, key_def["type"])
    920         es_key_type = es_props[key]["type"]
--> 921         if key_type != es_key_type and es_key_type not in ES_COMPATIBLE_TYPES.get(
    922             key_type, ()
    923         ):

TypeError: unhashable type: 'dict'
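
The failing line in the traceback can be reproduced without Elasticsearch. The sketch below is only an illustration: `check_compatible` is a hypothetical helper that mirrors the comparison on line 921, and the contents of `ES_COMPATIBLE_TYPES` here are made up, not eland's real table. It suggests that the error comes from using the dict-valued override as a dictionary key.

```python
# Illustrative stand-in for eland's compatibility table; the real contents may differ.
ES_COMPATIBLE_TYPES = {"long": ("long", "integer")}


def check_compatible(key_type, es_key_type):
    """Hypothetical helper mirroring the comparison on line 921 of the traceback."""
    return not (
        key_type != es_key_type
        and es_key_type not in ES_COMPATIBLE_TYPES.get(key_type, ())
    )


# The same override dict that is passed as es_type_overrides["vector_column"].
override = {
    "type": "dense_vector",
    "dims": 512,
    "index": True,
    "similarity": "cosine",
}

# A plain string type is hashable, so the lookup is fine.
print(check_compatible("dense_vector", "dense_vector"))

# A dict is never equal to the string "dense_vector", so the second half of the
# condition runs and dict.get() tries to hash the dict, which raises TypeError.
try:
    check_compatible(override, "dense_vector")
except TypeError as exc:
    print(f"TypeError: {exc}")
```

This is consistent with the first upload succeeding: when the index does not exist yet, there is no existing mapping to compare against, so this comparison is presumably never reached.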
@pquentin
Member

Thanks for your bug report! I liked how it was minimal and easy to reproduce locally, which allowed me to confirm the issue.

What happens is that the first append simply uploads data to a new index, while the second has to check the existing mappings, which hits a different code path. While we should not fail with a TypeError, Eland does not currently support dense_vector, which is the crux of the issue.
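
Until Eland handles this case, one possible workaround is to append the rows with the bulk helper from the `elasticsearch` package instead of `pandas_to_eland`. This is a minimal sketch, not something suggested in this thread: `vectors_to_actions` and `append_vectors` are hypothetical names, and it assumes the index already exists with the `dense_vector` mapping created by the first upload. It skips Eland's mapping check entirely, so any type mismatch would only surface as an Elasticsearch indexing error.

```python
def vectors_to_actions(index, field, vectors):
    """Yield one bulk 'index' action per vector (hypothetical helper).

    Each vector is converted to a plain list of floats so that NumPy arrays
    serialize cleanly to JSON.
    """
    for vector in vectors:
        yield {"_index": index, "_source": {field: [float(x) for x in vector]}}


def append_vectors(client, index, field, vectors):
    """Append vectors to an existing index using elasticsearch.helpers.bulk."""
    # Imported here so the action builder above works without the package installed.
    from elasticsearch import helpers

    return helpers.bulk(client, vectors_to_actions(index, field, vectors))


# Example with the names from the reproduction (requires a live cluster):
# append_vectors(client, "test-upload", "vector_column", df_2["vector_column"])
```

Unlike the `es_refresh=True` call in the report, this sketch does not refresh the index, so newly appended documents may not be searchable immediately.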
