pgvector-java

pgvector support for Java, Kotlin, Groovy, and Scala

Supports JDBC, Spring JDBC, Groovy SQL, and Slick

Getting Started

For Maven, add to pom.xml under <dependencies>:

<dependency>
    <groupId>com.pgvector</groupId>
    <artifactId>pgvector</artifactId>
    <version>0.1.4</version>
</dependency>

For sbt, add to build.sbt:

libraryDependencies += "com.pgvector" % "pgvector" % "0.1.4"

For other build tools, see this page.

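Then follow the instructions for your database library: JDBC (Java), Spring JDBC, Hibernate, R2DBC, JDBC (Kotlin), JDBC (Groovy), Groovy SQL, JDBC (Scala), or Slick.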

JDBC (Java)

Import the PGvector class

import com.pgvector.PGvector;

Enable the extension

Statement setupStmt = conn.createStatement();
setupStmt.executeUpdate("CREATE EXTENSION IF NOT EXISTS vector");

Register the vector type with your connection

PGvector.addVectorType(conn);

Create a table

Statement createStmt = conn.createStatement();
createStmt.executeUpdate("CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))");

Insert a vector

PreparedStatement insertStmt = conn.prepareStatement("INSERT INTO items (embedding) VALUES (?)");
insertStmt.setObject(1, new PGvector(new float[] {1, 1, 1}));
insertStmt.executeUpdate();
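
Because the column is declared as vector(3), an insert with a different number of dimensions is expected to fail on the server, and pgvector also does not accept NaN or infinite components. A minimal sketch of a client-side check that could run before the insert; `EmbeddingChecks` and `requireDimensions` are hypothetical names and not part of pgvector-java:

```java
// Hypothetical helper, not part of pgvector-java: validates an embedding
// against the dimension declared on the column, e.g. vector(3).
final class EmbeddingChecks {
    // Returns the same array if it has the expected length and only finite
    // values; otherwise throws IllegalArgumentException.
    static float[] requireDimensions(float[] embedding, int expected) {
        if (embedding.length != expected) {
            throw new IllegalArgumentException(
                "expected " + expected + " dimensions, got " + embedding.length);
        }
        for (float value : embedding) {
            if (Float.isNaN(value) || Float.isInfinite(value)) {
                throw new IllegalArgumentException("NaN and infinite values are not allowed");
            }
        }
        return embedding;
    }
}
```

It could wrap the array before it is handed to the driver, for example `new PGvector(EmbeddingChecks.requireDimensions(values, 3))`.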

Get the nearest neighbors

PreparedStatement neighborStmt = conn.prepareStatement("SELECT * FROM items ORDER BY embedding <-> ? LIMIT 5");
neighborStmt.setObject(1, new PGvector(new float[] {1, 1, 1}));
ResultSet rs = neighborStmt.executeQuery();
while (rs.next()) {
    System.out.println((PGvector) rs.getObject("embedding"));
}
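
The `<->` operator in the query above orders rows by Euclidean (L2) distance to the query vector. As an illustration of what the query computes, not of how pgvector implements it, this plain-Java sketch ranks an in-memory list the same way; `L2NearestSketch` and its methods are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of pgvector-java: mimics
// "ORDER BY embedding <-> ? LIMIT k" on an in-memory list.
final class L2NearestSketch {
    // Euclidean (L2) distance between two vectors of equal length.
    static double l2Distance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Returns up to k vectors from items, closest to query first.
    static List<float[]> nearest(List<float[]> items, float[] query, int k) {
        List<float[]> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingDouble(v -> l2Distance(v, query)));
        return sorted.subList(0, Math.min(k, sorted.size()));
    }
}
```

This is an exact scan over every item, which corresponds to running the query without an approximate index.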

Add an approximate index

Statement indexStmt = conn.createStatement();
indexStmt.executeUpdate("CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)");
// or
indexStmt.executeUpdate("CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)");
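
The value lists = 100 is only an example. The pgvector documentation suggests starting with rows / 1000 lists for up to 1M rows and sqrt(rows) above that. A small sketch of that rule of thumb; `IvfflatTuning` and `suggestedLists` are hypothetical names, and the result is a starting point to tune rather than a guarantee:

```java
// Hypothetical helper, not part of pgvector-java: applies the starting-point
// rule of thumb for the ivfflat "lists" parameter.
final class IvfflatTuning {
    // rows / 1000 for up to 1M rows, sqrt(rows) above that, never below 1.
    static int suggestedLists(long rowCount) {
        if (rowCount <= 1_000_000L) {
            return (int) Math.max(1, rowCount / 1000);
        }
        return (int) Math.round(Math.sqrt((double) rowCount));
    }
}
```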

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
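
An index is only used when the operator in the query matches the operator class of the index: `<->` with vector_l2_ops, `<#>` with vector_ip_ops, and `<=>` with vector_cosine_ops. The following sketch keeps each pair together; the `DistanceMetric` enum and its methods are hypothetical and not part of pgvector-java:

```java
// Hypothetical helper, not part of pgvector-java: pairs each pgvector
// operator class with the query operator that can use such an index.
enum DistanceMetric {
    L2("vector_l2_ops", "<->"),
    INNER_PRODUCT("vector_ip_ops", "<#>"), // <#> returns the negative inner product
    COSINE("vector_cosine_ops", "<=>");

    final String opClass;
    final String operator;

    DistanceMetric(String opClass, String operator) {
        this.opClass = opClass;
        this.operator = operator;
    }

    // Builds an HNSW index statement. Table and column names are inserted
    // as-is, so pass trusted identifiers only.
    String hnswIndexSql(String table, String column) {
        return "CREATE INDEX ON " + table + " USING hnsw (" + column + " " + opClass + ")";
    }

    // Builds the matching nearest-neighbor query with one "?" placeholder
    // for the query vector.
    String nearestSql(String table, String column, int limit) {
        return "SELECT * FROM " + table + " ORDER BY " + column + " " + operator
            + " ? LIMIT " + limit;
    }
}
```

For example, `DistanceMetric.COSINE.nearestSql("items", "embedding", 5)` yields a query string that can be passed to `conn.prepareStatement`.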

See a full example

Spring JDBC

Import the PGvector class

import com.pgvector.PGvector;

Enable the extension

jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS vector");

Create a table

jdbcTemplate.execute("CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))");

Insert a vector

Object[] insertParams = new Object[] { new PGvector(new float[] {1, 1, 1}) };
jdbcTemplate.update("INSERT INTO items (embedding) VALUES (?)", insertParams);

Get the nearest neighbors

Object[] neighborParams = new Object[] { new PGvector(new float[] {1, 1, 1}) };
List<Map<String, Object>> rows = jdbcTemplate.queryForList("SELECT * FROM items ORDER BY embedding <-> ? LIMIT 5", neighborParams);
for (Map<String, Object> row : rows) {
    System.out.println(row.get("embedding"));
}

Add an approximate index

jdbcTemplate.execute("CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)");
// or
jdbcTemplate.execute("CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)");

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

See a full example

Hibernate

Hibernate 6.4+ has a vector module (use this instead of the com.pgvector:pgvector dependency).

For Maven, add to pom.xml under <dependencies>:

<dependency>
    <groupId>org.hibernate.orm</groupId>
    <artifactId>hibernate-vector</artifactId>
    <version>6.4.0.Final</version>
</dependency>

Define an entity

import jakarta.persistence.*;
import org.hibernate.annotations.Array;
import org.hibernate.annotations.JdbcTypeCode;
import org.hibernate.type.SqlTypes;

@Entity
class Item {
    @Id
    @GeneratedValue
    private Long id;

    @Column
    @JdbcTypeCode(SqlTypes.VECTOR)
    @Array(length = 3) // dimensions
    private float[] embedding;

    public void setEmbedding(float[] embedding) {
        this.embedding = embedding;
    }
}

Insert a vector

Item item = new Item();
item.setEmbedding(new float[] {1, 1, 1});
entityManager.persist(item);

Get the nearest neighbors

List<Item> items = entityManager
    .createQuery("FROM Item ORDER BY l2_distance(embedding, :embedding) LIMIT 5", Item.class)
    .setParameter("embedding", new float[] {1, 1, 1})
    .getResultList();

See a full example

R2DBC

R2DBC PostgreSQL 1.0.3+ supports the vector type (use this instead of the com.pgvector:pgvector dependency).

For Maven, add to pom.xml under <dependencies>:

<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>r2dbc-postgresql</artifactId>
    <version>1.0.3.RELEASE</version>
</dependency>

JDBC (Kotlin)

Import the PGvector class

import com.pgvector.PGvector

Enable the extension

val setupStmt = conn.createStatement()
setupStmt.executeUpdate("CREATE EXTENSION IF NOT EXISTS vector")

Register the vector type with your connection

PGvector.addVectorType(conn)

Create a table

val createStmt = conn.createStatement()
createStmt.executeUpdate("CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))")

Insert a vector

val insertStmt = conn.prepareStatement("INSERT INTO items (embedding) VALUES (?)")
insertStmt.setObject(1, PGvector(floatArrayOf(1.0f, 1.0f, 1.0f)))
insertStmt.executeUpdate()

Get the nearest neighbors

val neighborStmt = conn.prepareStatement("SELECT * FROM items ORDER BY embedding <-> ? LIMIT 5")
neighborStmt.setObject(1, PGvector(floatArrayOf(1.0f, 1.0f, 1.0f)))
val rs = neighborStmt.executeQuery()
while (rs.next()) {
  println(rs.getObject("embedding") as PGvector?)
}

Add an approximate index

val indexStmt = conn.createStatement()
indexStmt.executeUpdate("CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)")
// or
indexStmt.executeUpdate("CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)")

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

See a full example

JDBC (Groovy)

Import the PGvector class

import com.pgvector.PGvector

Enable the extension

def setupStmt = conn.createStatement()
setupStmt.executeUpdate("CREATE EXTENSION IF NOT EXISTS vector")

Register the vector type with your connection

PGvector.addVectorType(conn)

Create a table

def createStmt = conn.createStatement()
createStmt.executeUpdate("CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))")

Insert a vector

def insertStmt = conn.prepareStatement("INSERT INTO items (embedding) VALUES (?)")
insertStmt.setObject(1, new PGvector([1, 1, 1] as float[]))
insertStmt.executeUpdate()

Get the nearest neighbors

def neighborStmt = conn.prepareStatement("SELECT * FROM items ORDER BY embedding <-> ? LIMIT 5")
neighborStmt.setObject(1, new PGvector([1, 1, 1] as float[]))
def rs = neighborStmt.executeQuery()
while (rs.next()) {
    println((PGvector) rs.getObject("embedding"))
}

Add an approximate index

def indexStmt = conn.createStatement()
indexStmt.executeUpdate("CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)")
// or
indexStmt.executeUpdate("CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)")

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

See a full example

Groovy SQL

Import the PGvector class

import com.pgvector.PGvector

Enable the extension

sql.execute "CREATE EXTENSION IF NOT EXISTS vector"

Create a table

sql.execute "CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))"

Insert a vector

def insertParams = [new PGvector([1, 1, 1] as float[])]
sql.executeInsert "INSERT INTO items (embedding) VALUES (?)", insertParams

Get the nearest neighbors

def neighborParams = [new PGvector([1, 1, 1] as float[])]
sql.eachRow("SELECT * FROM items ORDER BY embedding <-> ? LIMIT 5", neighborParams) { row ->
    println row.embedding
}

Add an approximate index

sql.execute "CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)"
// or
sql.execute "CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)"

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

See a full example

JDBC (Scala)

Import the PGvector class

import com.pgvector.PGvector

Enable the extension

val setupStmt = conn.createStatement()
setupStmt.executeUpdate("CREATE EXTENSION IF NOT EXISTS vector")

Register the vector type with your connection

PGvector.addVectorType(conn)

Create a table

val createStmt = conn.createStatement()
createStmt.executeUpdate("CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))")

Insert a vector

val insertStmt = conn.prepareStatement("INSERT INTO items (embedding) VALUES (?)")
insertStmt.setObject(1, new PGvector(Array[Float](1, 1, 1)))
insertStmt.executeUpdate()

Get the nearest neighbors

val neighborStmt = conn.prepareStatement("SELECT * FROM items ORDER BY embedding <-> ? LIMIT 5")
neighborStmt.setObject(1, new PGvector(Array[Float](1, 1, 1)))
val rs = neighborStmt.executeQuery()
while (rs.next()) {
  println(rs.getObject("embedding").asInstanceOf[PGvector])
}

Add an approximate index

val indexStmt = conn.createStatement()
indexStmt.executeUpdate("CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)")
// or
indexStmt.executeUpdate("CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)")

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

See a full example

Slick

Import the PGvector class

import com.pgvector.PGvector

Enable the extension

db.run(sqlu"CREATE EXTENSION IF NOT EXISTS vector")

Add a vector column

class Items(tag: Tag) extends Table[(String)](tag, "items") {
  def embedding = column[String]("embedding", O.SqlType("vector(3)"))
  def * = (embedding)
}

Insert a vector

val insertEmbedding = new PGvector(Array[Float](1, 1, 1)).toString
db.run(sqlu"INSERT INTO items (embedding) VALUES ($insertEmbedding::vector)")

Get the nearest neighbors

val neighborEmbedding = new PGvector(Array[Float](1, 1, 1)).toString
db.run(sql"SELECT * FROM items ORDER BY embedding <-> $neighborEmbedding::vector LIMIT 5".as[(String)])
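
In the Slick snippets the vector travels as a string and is cast with ::vector, and rows come back as strings too. pgvector's text form is a bracketed, comma-separated list such as [1,1,1]. A minimal sketch of converting between that text and a float array, written in Java so it is callable from Scala as well; `VectorTextSketch` is a hypothetical name and not part of pgvector-java:

```java
// Hypothetical helper, not part of pgvector-java: converts between float[]
// and a bracketed text form such as "[1.0,1.0,1.0]".
final class VectorTextSketch {
    // Formats the values as "[v1,v2,...]" using Java's float-to-string rules.
    static String format(float[] values) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values[i]);
        }
        return sb.append(']').toString();
    }

    // Parses "[v1,v2,...]" back into a float array; whitespace around
    // the brackets and the elements is ignored.
    static float[] parse(String text) {
        String trimmed = text.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("expected a bracketed list: " + text);
        }
        String inner = trimmed.substring(1, trimmed.length() - 1).trim();
        if (inner.isEmpty()) {
            return new float[0];
        }
        String[] parts = inner.split(",");
        float[] values = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Float.parseFloat(parts[i].trim());
        }
        return values;
    }
}
```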

Add an approximate index

db.run(sqlu"CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)")
// or
db.run(sqlu"CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)")

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

See a full example

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/pgvector/pgvector-java.git
cd pgvector-java
createdb pgvector_java_test
mvn test