Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: sanitize foreign schemas #1058

Merged
merged 3 commits into from
Mar 4, 2024

Conversation

westonpace
Copy link
Contributor

Arrow-js uses brittle instanceof checks throughout the code base. These fail unless the library instance that produced the object matches exactly the same instance the vectordb is using. At a minimum, this means that a user using arrow version 15 (or any version that doesn't match exactly the version that vectordb is using) will get strange errors when they try and use vectordb.

However, there are even cases where the versions can be perfectly identical, and the instanceof check still fails. One such example is when using vite (e.g. vitejs/vite#3910)

This PR solves the problem in a rather brute force, but workable, fashion. If we encounter a schema that does not pass the instanceof check then we will attempt to sanitize that schema by traversing the object and, if it has all the correct properties, constructing an appropriate Schema instance via deep cloning.

@westonpace
Copy link
Contributor Author

Note: I disabled some lint checks because we are (hopefully) moving to use prettier in nodejs in #1042 and these checks disagreed with prettier's formatting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of putting the sanitize* functions in their own module / file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good idea. I've put these into a sanitize.ts file.

@westonpace westonpace force-pushed the fix/sanitize-foreign-schemas branch from bb1a773 to f7527d3 Compare March 4, 2024 20:54
@westonpace westonpace merged commit ea33b68 into lancedb:main Mar 4, 2024
8 checks passed
raghavdixit99 pushed a commit to raghavdixit99/lancedb that referenced this pull request Apr 5, 2024
Arrow-js uses brittle `instanceof` checks throughout the code base.
These fail unless the library instance that produced the object matches
exactly the same instance the vectordb is using. At a minimum, this
means that a user using arrow version 15 (or any version that doesn't
match exactly the version that vectordb is using) will get strange
errors when they try and use vectordb.

However, there are even cases where the versions can be perfectly
identical, and the instanceof check still fails. One such example is
when using `vite` (e.g. vitejs/vite#3910)

This PR solves the problem in a rather brute force, but workable,
fashion. If we encounter a schema that does not pass the `instanceof`
check then we will attempt to sanitize that schema by traversing the
object and, if it has all the correct properties, constructing an
appropriate `Schema` instance via deep cloning.
raghavdixit99 pushed a commit to raghavdixit99/lancedb that referenced this pull request Apr 5, 2024
Arrow-js uses brittle `instanceof` checks throughout the code base.
These fail unless the library instance that produced the object matches
exactly the same instance the vectordb is using. At a minimum, this
means that a user using arrow version 15 (or any version that doesn't
match exactly the version that vectordb is using) will get strange
errors when they try and use vectordb.

However, there are even cases where the versions can be perfectly
identical, and the instanceof check still fails. One such example is
when using `vite` (e.g. vitejs/vite#3910)

This PR solves the problem in a rather brute force, but workable,
fashion. If we encounter a schema that does not pass the `instanceof`
check then we will attempt to sanitize that schema by traversing the
object and, if it has all the correct properties, constructing an
appropriate `Schema` instance via deep cloning.
westonpace added a commit that referenced this pull request Apr 5, 2024
Arrow-js uses brittle `instanceof` checks throughout the code base.
These fail unless the library instance that produced the object matches
exactly the same instance the vectordb is using. At a minimum, this
means that a user using arrow version 15 (or any version that doesn't
match exactly the version that vectordb is using) will get strange
errors when they try and use vectordb.

However, there are even cases where the versions can be perfectly
identical, and the instanceof check still fails. One such example is
when using `vite` (e.g. vitejs/vite#3910)

This PR solves the problem in a rather brute force, but workable,
fashion. If we encounter a schema that does not pass the `instanceof`
check then we will attempt to sanitize that schema by traversing the
object and, if it has all the correct properties, constructing an
appropriate `Schema` instance via deep cloning.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants