Better support for scalar lists #1275

sorenbs · 2017-11-14T15:05:29Z

The current data structure for scalar lists is not able to provide the advanced filters and mutations we want to support.

New Features

Given the schema

type Student @model {
  id: ID! @isUnique
  scores: [Int!]!
}

Filters

Example

allStudents(filter: {scores_every: { lt: 42}})

Specification

A scalar list field generate the same 3 top-level filter arguments as normal relations:

field_every
field_some
field_none

The nested filter arguments are almost the same as for normal relations, except for scalar lists the value is contained directly in the list, instead of being in a node in the list. So we don't need a field prefix:

eq
not
in
not_in
lt
lte
gt
gte

This list is for integers. Scalar lists for other types will support all the normal filters such as contains and starts_with for strings

Mutations

Create

The create mutation will work as it currently does:

mutation { createStudent(data: {scores: { set: [90, 67, 83] }} ) }

Update

The update mutation supports the following new list operations

push

The push operation adds one or more elements to the end of the list

mutation { updateStudent(by: {id: "...", data: {scores: { push: [97, 58] }}}) }

mutation { updateStudent(by: {id: "...", data: {scores: { push: 58 }}}) }

The position argument can be used to specify the location of the new elements. 0 means the elements are pushed at the beginning, 1 means they are pushed behind the first element etc. If the position argument is greater than the length of the list, the new elements are pushed at the end of the list

mutation { updateStudent(by: {id: "...", data: {scores: { push: [42, 1337], location: 3 }}}) }

before: [1,2,3,4,5,6,7,8,9,10]
after: [1,2,3,4,42,1337,5,6,7,8,9,10]

pop

Removes one or more elements from the beginning or end of the list. Pass a positive value to remove from the end of the list, pass a negative value to remove from the beginning of the list:

mutation { updateStudent(by: {id: "...", data: {scores: { pop: 1 }}}) }

before: [1,2,3,4,5,6,7,8,9,10]
after: [1,2,3,4,5,6,7,8,9]

mutation { updateStudent(by: {id: "...", data: {scores: { pop: 5 }}}) }

before: [1,2,3,4,5,6,7,8,9,10]
after: [1,2,3,4,5]

mutation { updateStudent(by: {id: "...", data: {scores: { pop: -5 }}}) }

before: [1,2,3,4,5,6,7,8,9,10]
after: [6,7,8,9,10]

remove

Removes all elements from the list that match a filter

mutation { updateStudent(by: {id: "...", data: {scores: { remove: { lte: 4 }}}}) }

before: [1,2,3,4,5,6,7,8,9,10]
after: [5,6,7,8,9,10]

set

Removes existing elements and add new elements

mutation { updateStudent(by: {id: "...", data: {scores: { set: [1,3,5]}}}) }

before: [1,2,3,4,5,6,7,8,9,10]
after: [1,3,5]

Data Structure

Requirements

The default data structure for scalar lists represents a compromise of the following requirements:

Fast ordered lookup get element 50.000 through 50.010
Fast insert at arbitrary position insert element at position 50.000
Fast filtering get nodes where list contain the element "Abba"
Easily portable to different sql storage engines

Table Structure

A scalar list is stored in a separate table with the structure:

nodeId	position	value

nodeId references the id column in the model table and has on cascade: delete
position is a signed INT
value is the normal type for the scalar type

Algorithm

Inserting

The first element is inserted at position 1.000, second element at position 2.000 and so on.

When inserting an element at a position that is not at the beginning or end, the position is calculated like this: pos(x) + floor((pos(x+1) - pos(x)) / 2).

Given these positions: [1000, 2000]
Inserting a new element at index 1 (between the two existing elements) would result in the following positions:

[1000, 1500, 2000]

This is a fast operation.

We can keep inserting elements "in between" like this; 1250, 1125, 1062, 1031, 1015, 1007, 1003, 1001

When we run out of space between two positions we have to increment all positions after the one where the new element will be inserted. This is an expensive operation:

[1000, 1001, 1003, ...] =>
[1000, 2001, 2003, ...] =>
[1000, 1500, 2001, 2003, ...]

The sql for this operation is of the form:

UPDATE `scalarList` SET position = position + 1000 WHERE position > 1000 ORDER BY position DESC

Querying

Getting a range of elements:

SELECT value FROM `Array` limit 100, 10

Filtering

Join the model table and the scalar list table and use the normal filter logic

Limitations

Insert performance

Inserting an element at beginning or end is generally fast.

Inserting an element in the middle of the list is fast as long as there is enough space. If we need to add space to the position column, this becomes a slow operation. For a list with 100.000 elements this operation takes on the order of 1 second. For 500.000 elements this operation takes 5-20 seconds.

Number of elements

As performing the space injection operation is expensive on large lists, we recommend keeping scalar lists below 50.000 elements.

The position is stored in a signed INT which requires 4 bytes and can store values in the -2147483648 to 2147483647 range. This enables us to store 2+ billion values in the positive space and negative space.

If values are continuously inserted and deleted in a way that requires us to perform the space injection operation, it is possible to exhaust the available position space. The first implementation will not try to reclaim space, but this can be implemented in the future

The text was updated successfully, but these errors were encountered:

kbrandwijk · 2017-11-14T15:22:27Z

Loving this. Just an idea: can the same things also be applied to 'normal relations' (pop, push, remove, set)? E.g. nested mutations

marktani · 2017-11-14T15:33:43Z

Relations don't have a defined order, so pop and push are undefined.

kbrandwijk · 2017-11-14T15:42:56Z

Well, push should stil be useful (just add to the existing children).
pop won't work, I agree.
set would be the current behavior
remove could work by any unique field (to be consistent with the new query/update by any unique field proposal.

kbrandwijk · 2017-11-14T15:52:17Z

Regarding the insert algorithm. If you use a string instead of a number to specify the ordering, you have a lot more room to navigate without having to 'reorder'. Given 3 items with a,b,c, adding a new item at position 2 (between b and c) would give it order ba. Next, inserting at position 3 (between 'ba' and 'c') would give 'bb'. Now, inserting between 'ba' and 'bb' would give 'baa'.

A total of 5 characters already gives you space to store 12 million ordered elements.

A certain 'branch' of the table can become 'heavy' if a lot of elements are inserted close to eachother (e.g. you can end up with a lot of letters). You could always have a database maintenance task to 'smooth out' the ordering, basically doing a reordering that is 'flatter' or leaves more gaps.

For example, you have 1000 items in the list. You need three letters to order them (262626), so you can skip each odd letter (a-c-e-....), so you don't have to reorder soon.

The big benefit of this that you never have to reorder when you are inserting (causing a delay), because there's always room between any 2 records. A maintenance task can be scheduled.

sorenbs · 2017-11-14T16:22:52Z

@kbrandwijk I like the idea of using a varchar for the position column. This essentially offloads the work of injecting space at the node creation time to the database. We will perform some benchmarks before implementing this feature to better understand performance impact on query and insert as well as the storage requirements.

Supporting some of the same mutation functionality for ordinary relations is a great idea that we should explore!

kbrandwijk · 2017-11-14T18:16:32Z

@sorenbs Another option might be using a double-linked list, with id and previous columns. That way, you don't need any maintenance or renumbering, but ordering might be a little heavier, and inserting a value always requires you to update two records... Anyways, just throwing it out there so you have something to compare to.

aurnik · 2017-11-15T00:50:53Z

Would there be any additional functionality for unique scalar lists? For example, with this schema:
type User @model { id: ID! @isUnique emails: [String!]! @isUnique }
And a direct query like this:
{ User (email: "john@doe.com") { id } }
Which would return the correct user if that e-mail was contained in the scalar list.

marktani · 2017-11-15T09:21:07Z

What does emails: [String!]! @isUnique mean in your suggestion?

the list (set of elements + order) is unique. Meaning, there are no two users with emails ["a", "b"], but a user with ["a", "b"] and anothe with ["b", "a"] is possible
or, the elements in the list are unique to the user. Meaning, there are no two users where any of their emails match.

I'm not convinced that covering this use case with unique scalar lists is a good idea.

kbrandwijk · 2017-11-15T10:08:14Z

Maybe something like @distinct could be used to differ between 'this array contains distinct values', 'and this entire array is unique across all nodes of this type' (@isUnique)

sorenbs · 2017-11-15T10:19:16Z

If we do add support for explicitly declaring that all elements must be distinct, I like the idea of adding a separate directive.

It might be instructive to see how MongoDB handles this https://docs.mongodb.com/manual/reference/operator/update-array/

They have an addToSet operator that only add an element if it is not present yet. It is still possible to add duplicates to the array using the normal push operation though.

--

Regarding @isUnique - we can't easily support the unique guarantee on scalar lists. There is no way to enforce this on the database level, so even if we did add support, it would add a lot of overhead

kbrandwijk · 2017-11-15T10:23:37Z

Unfortunately, there's no consensus yet on graphql/graphql-spec#190, otherwise introducing something like emails: Set<String> would be intuitive to indicate distinct elements. Maybe we can add this use case there too.

aurnik · 2017-11-16T18:32:26Z

Yes I believe the idea of something like @distinct would solve the use case I was describing.

sorenbs · 2017-11-20T17:14:45Z

We should consider supporting the native JSON type in SQL databases for implementing this feature. MySQL has had JSON support since 5.7, but only added support for retrieving a slice of an array in 8.0.2: https://bugs.mysql.com/bug.php?id=79052

As MySQL 8 does not have a stable release yet, no major cloud provider support it yet.

Another complicating with using native JSON support is that the implementation will be different between databases.

Another issue is that we would be limited to a simple filter with LIKE semantic. It would not be possible to do a query like this:

allUsers(filter: {grade_any{gt: 7}})

Kisepro · 2017-11-22T15:04:09Z

Another thing that could be added in more than the existing nested filter is something like "field_in_exclusive".

For instance the "field_in : ["A", "B" ]" will use a OR to match the condition and in a lot of use cases we can need a AND.

And I don't see any proper way to implement for myself yet ( so if you have any clue :) )

johannpinson · 2018-01-16T16:55:47Z

Hi guys,

After talking with @marktani, I would like to know what is the status about length filters ?
I see some references about it here: https://github.com/graphcool/prisma/blob/9d9dd4c76a9519ee72a8a2e5b7be829ec3c24cd3/server/backend-shared/src/main/scala/cool/graph/client/database/FilterArguments.scala#L83-L92

philcockfield · 2018-01-18T04:58:30Z

This is checked off in beta-3 of Primsa which has now been released:

https://github.com/graphcool/prisma/pull/1318

Should we expect to be able to run filters within arrays in Primsa? Or not yet? (cc @marktani )

johannpinson · 2018-01-18T09:49:02Z

@philcockfield

Strangely, it's also referenced into the post 1.0 release list...

marktani · 2018-01-18T09:59:39Z

Hey @philcockfield and @johannpinson, thanks for bringing this up!
One important step to allow powerful scalar list has been implemented in beta3, which was about the right data structure for scalar lists, as discussed in this proposal.

This paves the way for adding scalar list filters into the Prisma API, without introducing a breaking change. We'll tackle this feature shortly, but there is no concrete timeline yet.

philcockfield · 2018-01-20T07:13:05Z

Cool - thanks @marktani....as you know ( 😄 ) a lot clean design choice rest on being able to filter into lists/arrays. Really looking forward to having this!!

marktani · 2018-03-12T12:57:06Z

Filters for enum lists should also be considered: #1858.

m4rrc0 · 2018-03-24T14:01:42Z

Relations don't have a defined order.

@marktani I see plenty of use cases for an ordered list of relations. Any time a list of entities can be sorted by the user we would need to filter and mutate the positions in the list.
Hence I am glad to read @sorenbs 's comment:

Supporting some of the same mutation functionality for ordinary relations is a great idea that we should explore!

Can you tell us your current position on this?

I would also really appreciate some guidance. I have been reading a lot about the subject these past couple of days but there seems to be no trivial solution. I don't understand how such a basic need can be so complex to implement in an efficient manner... =/

codepunkt · 2018-07-03T09:38:26Z

See #2723 for another feature request for scalar lists - the default directive!

oxy88 · 2018-08-28T07:19:08Z

Any progress?
I really want this feature for my service.

patrickdevivo · 2018-09-04T15:53:42Z

Checking in as well - this feature would be immensely useful. It would be nice to understand the underlying issue/road block preventing an implementation from being merged in, is there anyone with details on that?

PedramMarandi · 2018-10-30T04:00:42Z

Hi every one
I was looking into my GraphQL docs and I found, eh there is no filter on my [String!].
It will be really useful, any idea when this feature's going to be realized?

TheMrugraj · 2018-11-26T11:18:37Z

Is there any progress on this? We really need this to filter by the list of scalar values.

FluorescentHallucinogen · 2018-12-13T11:57:29Z

Any progress on adding push, pop and remove?

schickling · 2019-01-02T12:13:42Z

Update: See the scalarList directive in this new datamodel proposal.

realAlexBarge · 2019-07-12T08:55:44Z

@schickling Could you elaborate on the current status? Even with the new scalarList datamodel syntax filtering of scalarLists is still not possible, correct? What is the ETA? Are there any best practice workarounds? Thanks in advance.

ahrbil · 2020-01-03T11:05:03Z

Looking forward for this.
@realAlexBarge Prisma team is now focusing on Prisma 2 fo it might be supported there.
also, have you found any workarounds?

KayBeSee · 2020-03-28T03:34:30Z

Is there any progress on this issue?

I am trying to figure out the best way to insert an item into an array at a specific location and this seems to be exactly what I need. Does anyone know of a work around in the meantime?

giovannefeitosa · 2020-03-30T00:25:03Z

@KayBeSee The only workaround I can think is to create another type and use Relation instead of [String!]! or so...

If somesone has a better idea, please tell us.

sorenbs added area/api rfc/1-draft labels Nov 14, 2017

sorenbs added this to the 1.0-beta4 milestone Nov 14, 2017

sorenbs self-assigned this Nov 14, 2017

This was referenced Nov 14, 2017

Add value to Array field in mutation #267

Closed

Filters should support scalar lists #60

Closed

nikolasburk mentioned this issue Nov 14, 2017

[Prisma 1.0] Specifications #353

Closed

32 tasks

sorenbs mentioned this issue Nov 14, 2017

Support filtering on aggregated values #1279

Closed

1 task

sorenbs added rfc/2-accepted and removed rfc/1-draft labels Nov 28, 2017

sorenbs added rfc/1-draft and removed rfc/2-accepted labels Dec 9, 2017

mavilein mentioned this issue Dec 14, 2017

[WIP] Prisma #1318

Merged

42 tasks

marktani modified the milestones: 1.0-beta4, 1.0-beta2 Dec 15, 2017

marktani removed this from the 1.0-beta2 milestone Dec 30, 2017

marktani removed this from the 1.0-beta3 milestone Jan 22, 2018

marktani mentioned this issue Jan 23, 2018

Better support for scalar lists Graphcool/graphcool-framework#404

Closed

marktani closed this as completed Jan 23, 2018

marktani reopened this Jan 24, 2018

marktani mentioned this issue Jan 25, 2018

Add documentation about scalar lists #1732

Closed

marktani mentioned this issue Mar 5, 2018

Add filters for enum lists #1858

Closed

marktani mentioned this issue Aug 17, 2018

Query filters for enum lists are not generated #2930

Closed

queq1890 mentioned this issue Nov 20, 2018

Change Role enum into Role Type && create mutation for new Project emily0604/reporting_system_server#30

Merged

pantharshit00 added area/api rfc/1-draft labels Jan 10, 2019

MrBlenny mentioned this issue Mar 21, 2019

[RFC] Datamodel v1.1 - Field Behaviour #3403

Closed

janpio closed this as completed Sep 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better support for scalar lists #1275

Better support for scalar lists #1275

sorenbs commented Nov 14, 2017 •

edited by do4gr

kbrandwijk commented Nov 14, 2017 •

edited

marktani commented Nov 14, 2017

kbrandwijk commented Nov 14, 2017 •

edited

kbrandwijk commented Nov 14, 2017 •

edited

sorenbs commented Nov 14, 2017

kbrandwijk commented Nov 14, 2017 •

edited

aurnik commented Nov 15, 2017 •

edited

marktani commented Nov 15, 2017

kbrandwijk commented Nov 15, 2017

sorenbs commented Nov 15, 2017

kbrandwijk commented Nov 15, 2017

aurnik commented Nov 16, 2017

sorenbs commented Nov 20, 2017

Kisepro commented Nov 22, 2017

johannpinson commented Jan 16, 2018

philcockfield commented Jan 18, 2018 •

edited

johannpinson commented Jan 18, 2018

marktani commented Jan 18, 2018

philcockfield commented Jan 20, 2018

marktani commented Mar 12, 2018

m4rrc0 commented Mar 24, 2018

codepunkt commented Jul 3, 2018

oxy88 commented Aug 28, 2018

patrickdevivo commented Sep 4, 2018

PedramMarandi commented Oct 30, 2018

TheMrugraj commented Nov 26, 2018

FluorescentHallucinogen commented Dec 13, 2018

schickling commented Jan 2, 2019

realAlexBarge commented Jul 12, 2019

ahrbil commented Jan 3, 2020

KayBeSee commented Mar 28, 2020

giovannefeitosa commented Mar 30, 2020

Better support for scalar lists #1275

Better support for scalar lists #1275

Comments

sorenbs commented Nov 14, 2017 • edited by do4gr

New Features

Filters

Example

Specification

Mutations

Create

Update

push

pop

remove

set

Data Structure

Requirements

Table Structure

Algorithm

Inserting

Querying

Filtering

Limitations

Insert performance

Number of elements

kbrandwijk commented Nov 14, 2017 • edited

marktani commented Nov 14, 2017

kbrandwijk commented Nov 14, 2017 • edited

kbrandwijk commented Nov 14, 2017 • edited

sorenbs commented Nov 14, 2017

kbrandwijk commented Nov 14, 2017 • edited

aurnik commented Nov 15, 2017 • edited

marktani commented Nov 15, 2017

kbrandwijk commented Nov 15, 2017

sorenbs commented Nov 15, 2017

kbrandwijk commented Nov 15, 2017

aurnik commented Nov 16, 2017

sorenbs commented Nov 20, 2017

Kisepro commented Nov 22, 2017

johannpinson commented Jan 16, 2018

philcockfield commented Jan 18, 2018 • edited

johannpinson commented Jan 18, 2018

marktani commented Jan 18, 2018

philcockfield commented Jan 20, 2018

marktani commented Mar 12, 2018

m4rrc0 commented Mar 24, 2018

codepunkt commented Jul 3, 2018

oxy88 commented Aug 28, 2018

patrickdevivo commented Sep 4, 2018

PedramMarandi commented Oct 30, 2018

TheMrugraj commented Nov 26, 2018

FluorescentHallucinogen commented Dec 13, 2018

schickling commented Jan 2, 2019

realAlexBarge commented Jul 12, 2019

ahrbil commented Jan 3, 2020

KayBeSee commented Mar 28, 2020

giovannefeitosa commented Mar 30, 2020

sorenbs commented Nov 14, 2017 •

edited by do4gr

kbrandwijk commented Nov 14, 2017 •

edited

kbrandwijk commented Nov 14, 2017 •

edited

kbrandwijk commented Nov 14, 2017 •

edited

kbrandwijk commented Nov 14, 2017 •

edited

aurnik commented Nov 15, 2017 •

edited

philcockfield commented Jan 18, 2018 •

edited