How will the rewrite impact the compatibility with Jina? #876
Replies: 2 comments
Hey @JoanFM, indeed this new version of DocArray will have an impact on Jina and the way we work with Executors. There will be breaking changes, but we will avoid them as much as possible, and we believe this will only enforce better usage of Executors in general if we have an input schema for each Executor. Here is the current vision for this work on the Jina side:
Executor interoperability
The API above is much more flexible than the current Document implementation. This buys us better multimodal support as well as more natural vector DB integration. On the other hand, the Executor has less structure to rely on.
class MyExecSchema(Document):
    text: str
    embedding: Embedding

class MyExec(Executor):
    @requests
    def foo(self, docs: DocumentArray[MyExecSchema], *args, **kwargs):
        ...
class ClientDoc(Document):
    text: str
    first_embedding: Embedding
    second_embedding: Embedding

doc = ClientDoc(...)
# map `text` to `text` and `first_embedding` to `embedding`
client.post(doc, schema_map={'MyExec': 'text:text,first_embedding:embedding'})
# the case of a nested schema can be handled with dunder notation
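To make the `schema_map` string format concrete, here is a minimal sketch of how such a string could be parsed and applied. It is not the Jina implementation: `parse_schema_map`, `get_path` and `apply_schema_map` are hypothetical helpers, documents are modelled as plain dicts, and the reading of dunder notation as `outer__inner` on the client side is an assumption.

```python
from typing import Any, Dict


def parse_schema_map(spec: str) -> Dict[str, str]:
    """Turn 'a:b,c:d' into {'a': 'b', 'c': 'd'} (client field -> Executor field)."""
    mapping: Dict[str, str] = {}
    for pair in spec.split(","):
        source, target = pair.strip().split(":")
        mapping[source.strip()] = target.strip()
    return mapping


def get_path(data: Dict[str, Any], path: str) -> Any:
    """Resolve a dunder path such as 'image__embedding' in nested dicts (assumed syntax)."""
    value: Any = data
    for key in path.split("__"):
        value = value[key]
    return value


def apply_schema_map(data: Dict[str, Any], spec: str) -> Dict[str, Any]:
    """Build the Executor-side document from the client-side one."""
    return {
        target: get_path(data, source)
        for source, target in parse_schema_map(spec).items()
    }


client_doc = {
    "text": "hello",
    "first_embedding": [0.1, 0.2],
    "second_embedding": [0.3, 0.4],
}
executor_doc = apply_schema_map(client_doc, "text:text,first_embedding:embedding")
print(executor_doc)  # {'text': 'hello', 'embedding': [0.1, 0.2]}
```

Fields that are not named in the map (here `second_embedding`) are simply not forwarded to the Executor in this sketch.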
Automatic translation details:
Let's say my input data follows this schema:

class ImageTextDocument(Document):
    text: Text
    image: Image
    embedding: Embedding

and that my Executor follows this one:

class MyPhoto(Document):
    vector: Embedding
    photo: Image
    description: Text

class PhotoEmbeddingExecutor(Executor):
    @requests
    def encode(self, docs: DocumentArray[MyPhoto], **kwargs):
        for doc in docs:
            doc.vector = self.image_model(doc.photo)

They actually define the same underlying schema, but with different field names. So the explicit way would be:

client.post(doc, schema_map={'MyExec': 'image:photo,embedding:vector,text:description'})

But this is too verbose for something that just translates the same schema, so we will do that automatically. How? We look at the field types and match the fields one to one by type. What if the matching is not exact, i.e. what if we have the two following schemas?

class ClientDoc(Document):
    text: Text
    embedding1: Embedding
    embedding2: Embedding

class ExecutorDoc(Document):
    text: Text
    embedding: Embedding

If we have a collision on a field type (here two embeddings), we will take the first field that corresponds (in this case embedding1). This algorithm is deterministic because fields are ordered in pydantic.
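The type-based matching described above could look roughly like the following. This is a sketch only, not the actual Jina or DocArray code: `infer_schema_map` is a hypothetical helper, the schemas are plain annotated classes instead of pydantic-backed `Document` subclasses, and `Text`, `Image` and `Embedding` are stand-in types defined here for illustration.

```python
from typing import Dict, get_type_hints


# Stand-in field types (assumption: the real ones come from DocArray).
class Text(str): ...
class Image(bytes): ...
class Embedding(list): ...


class ImageTextDocument:
    text: Text
    image: Image
    embedding: Embedding


class MyPhoto:
    vector: Embedding
    photo: Image
    description: Text


class ClientDoc:
    text: Text
    embedding1: Embedding
    embedding2: Embedding


class ExecutorDoc:
    text: Text
    embedding: Embedding


def infer_schema_map(client_schema: type, executor_schema: type) -> Dict[str, str]:
    """Map client fields to Executor fields that have the same type.

    Fields are visited in declaration order, so on a collision the first
    unused client field of the right type wins.
    """
    client_fields = get_type_hints(client_schema)
    executor_fields = get_type_hints(executor_schema)
    used = set()
    mapping: Dict[str, str] = {}
    for target, target_type in executor_fields.items():
        for source, source_type in client_fields.items():
            if source_type is target_type and source not in used:
                mapping[source] = target
                used.add(source)
                break
    return mapping


print(infer_schema_map(ImageTextDocument, MyPhoto))
print(infer_schema_map(ClientDoc, ExecutorDoc))
```

In the second call `embedding2` is left unmapped, which matches the "take the first field that corresponds" rule from the post.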
Jina has added compatibility with the newer docarray versions.
How will the rewrite affect the capacity of DocumentArrays to be streamed through Jina Flows and Executors?