feat(taps): Support declaring arbitrary SQLAlchemy type instances in SQL tap schemas #2102

edgarrmondragon · 2023-12-12T05:15:07Z

Feature scope

Tap/target metadata.

Description

Related to the original Singer spec's sql-datatype (see #1323, #1903), this new metadata object would work analogously to the way Python's logging configuration imports and instantiates arbitrary callables and classes.
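For readers unfamiliar with the logging convention referred to here, the following is a small illustration using the standard library's `logging.config.dictConfig`, where the special `()` key names a factory to import and the remaining keys are passed to it as keyword arguments. The logger name `demo` and the format string are arbitrary choices for this example, not part of the proposal.

```python
import logging
import logging.config

config = {
    "version": 1,
    "formatters": {
        "custom": {
            # "()" names the callable to import; other keys become kwargs,
            # so this resolves to logging.Formatter(fmt="...").
            "()": "logging.Formatter",
            "fmt": "%(levelname)s:%(message)s",
        },
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "custom"},
    },
    "loggers": {
        "demo": {"handlers": ["console"], "level": "INFO"},
    },
}

logging.config.dictConfig(config)

# The formatter attached to the handler was built from the "()" entry.
formatter = logging.getLogger("demo").handlers[0].formatter
```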

The story, specifically for vector/embedding fields for LLMs, would go something like:

As a user, I know that one or more fields in a stream represent a vector. For example, {"id": 1, "my_vector": [1, 2, 3]}.
I would like to use MeltanoLabs/target-postgres alongside pgvector/pgvector-python to declare the SQLAlchemy type of my_vector: pgvector.sqlalchemy.Vector
I would like to declare not only the type but also any arbitrary parameters for it, like length, dimension, etc.:
```
# catalog metadata in Meltano syntax
schema:
  my_stream:
    my_vector:
      sqlalchemy_type:
        (): "pgvector.sqlalchemy.Vector"
        dim: 3
```
This would trigger the SDK to import pgvector.sqlalchemy.Vector and instantiate it as Vector(dim=3).
The other requirement is that the user installs pgvector-python in the same virtual environment as target-postgres, which could be achieved with package extras, e.g. target-postgres[vector] or by documenting known postgres SQLAlchemy extensions in the target's readme.
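A minimal sketch of the import-and-instantiate step described above, assuming the metadata arrives as a plain dict. The helper name `instantiate_from_spec` is hypothetical and not an existing SDK function; a stdlib class stands in for the pgvector type so the snippet runs without extra packages.

```python
import importlib
from typing import Any


def instantiate_from_spec(spec: dict[str, Any]) -> Any:
    """Import the dotted path stored under "()" and call it with the other keys.

    Hypothetical helper for illustration; not part of the Singer SDK.
    """
    params = dict(spec)  # copy so the caller's metadata is not mutated
    dotted_path = params.pop("()")
    module_name, _, attr_name = dotted_path.rpartition(".")
    factory = getattr(importlib.import_module(module_name), attr_name)
    return factory(**params)


# Stand-in demo with a stdlib class so the snippet runs without pgvector.
delta = instantiate_from_spec({"()": "datetime.timedelta", "days": 3})

# With pgvector-python installed, the metadata from this issue would resolve
# to Vector(dim=3):
# instantiate_from_spec({"()": "pgvector.sqlalchemy.Vector", "dim": 3})
```

Because the dotted path is imported at runtime, a missing optional dependency would surface here as an `ImportError`, which is where a hint about installing the relevant extra could be raised.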

Potential issues

The biggest problem I can think of is figuring out the priority with which this type override is considered. For example, in target-postgres:

a few JSON schema types are mapped first to postgres-specific column types, e.g. `int -> BIGINT`
then the SDK defaults are used

But by introducing this feature, we'd expect targets to resolve in the following order:

sqlalchemy_type overrides
custom target implementation overrides
SDK defaults
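The resolution order above could be sketched roughly as follows. Everything here is a hypothetical stand-in: the function name, the two mapping tables, and the use of strings in place of real SQLAlchemy type objects are assumptions made so the example stays self-contained.

```python
from typing import Any, Optional

# Hypothetical tables; strings stand in for SQLAlchemy types.
TARGET_OVERRIDES = {"integer": "BIGINT"}  # target-specific mappings
SDK_DEFAULTS = {"integer": "INTEGER", "string": "VARCHAR", "array": "JSON"}


def resolve_column_type(
    json_type: str,
    metadata: Optional[dict[str, Any]] = None,
) -> str:
    """Pick a column type using the three-level priority from the issue."""
    metadata = metadata or {}

    # 1. An explicit sqlalchemy_type override always wins.
    override = metadata.get("sqlalchemy_type")
    if override is not None:
        kwargs = ", ".join(
            f"{key}={value!r}" for key, value in override.items() if key != "()"
        )
        return f"{override['()']}({kwargs})"

    # 2. Then the target's own custom mappings.
    if json_type in TARGET_OVERRIDES:
        return TARGET_OVERRIDES[json_type]

    # 3. Finally, fall back to the SDK defaults.
    return SDK_DEFAULTS[json_type]


vector_meta = {"sqlalchemy_type": {"()": "pgvector.sqlalchemy.Vector", "dim": 3}}
vector_type = resolve_column_type("array", vector_meta)
integer_type = resolve_column_type("integer")
string_type = resolve_column_type("string")
```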


amotl · 2023-12-12T20:59:43Z

Dear Edgar,

thank you for converging this from MeltanoLabs/target-pinecone#20 so quickly. Is GH-1872 actually already setting the stage for your proposal, or is it something different?

With kind regards,
Andreas.

edgarrmondragon added the kind/Feature and valuestream/SDK labels Dec 12, 2023

edgarrmondragon mentioned this issue Dec 12, 2023

Using the Pinecone adapter as blueprint for other vector store databases? MeltanoLabs/target-pinecone#20

Closed

amotl mentioned this issue Dec 12, 2023

Add support for vector store features crate-workbench/meltano-target-cratedb#5

Open

This was referenced Dec 13, 2023

test: Add test cases for arrays and objects, and introduce verify_schema MeltanoLabs/target-postgres#250

Open

feat: Add support for pgvector [NAIVE] MeltanoLabs/target-postgres#251

Draft
