-
-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update docs for DataLoader GetOrAdd*Loader
#3942
Update docs for DataLoader GetOrAdd*Loader
#3942
Conversation
This should clear up some of the confusion between `GetOrAddBatchLoader` and `GetOrAddCollectionBatchLoader`. Fixes graphql-dotnet#1375
docs2/site/docs/guides/dataloader.md
Outdated
- use `GetOrAddBatchLoader` for one-to-one relationships, i.e. when each key links to a single value; think for example of resolving users by their ids: each user id maps to one (and only one) user; | ||
- use `GetOrAddCollectionBatchLoader` for one-to-many or many-to-one relationships, i.e. when each key links to multiple values; think for example of resolving orders by user id: each user id may map to (zero or) more orders. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the terms 'one-to-one relationship' and 'one-to-many or many-to-one relationships' may be a bit misleading here, when considered from a database perspective.
If we consider typical database diagrams, such as Products and Categories, where each product maps to exactly one category, this is a typical many-to-one relationship yet it is GetOrAddBatchLoader
which would be applicable for the field named category
from within the Product type. And GetOrAddCollectionBatchLoader
may be used for one-to-many relationships (e.g. 'products' field under the Category type), or many-to-many relationships (like if each product could link to multiple categories).
One-to-one relationships could certainly be represented by GetOrAddBatchLoader
, or it is likely enough that they are an implementation detail, not exposed through the GraphQL schema. I honestly have not used one-to-one relationships in any of my databases so I am not sure how to formulate an example.
I suggest also that the example below be expanded, perhaps with three fields (many-to-one, one-to-many, and many-to-many) with applicable comments so people can better visualize the differences.
Here's an example schema, but perhaps you can think of something better:
type Product {
# other fields
brand: Brand! # Many-to-One relationship: Many products belong to one brand
categories: [Category!]! # Many-to-Many relationship: Many products belong to many categories
reviews: [Review!]! # One-to-Many relationship: One product can have many reviews
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean, but I think that it would make it less confusing if we just focussed on the use case for the graphql API rather than trying to map this back to some relational theory. After all, GraphQL is not tied to any particular storage. The way I see it, the N+1 problem arises when you have a query which involves a list of entities, which you also want to resolve some field of, regardless of whether that field is a list or a single type. In the example below:
customers
will resolve to a list. For each customer,defaultAddress
will resolve to a single entity.orders
will resolve to a list. For each orderproducts
will resolve to a list.
query NPlusOne {
customers {
# one-to-one, one customer -> one address
defaultAddress {
street
number
postCode
}
# one-to-many, one customer -> many orders
orders {
# many-to-many, one order can link to multiple products and one product can be in multiple orders
products {
qty
price
total
}
}
}
}
When resolving a field like customers.defaultAddress
, we run into the N+1 problem as GraphQL tries to resolve each address separately by looping over the items in the customers
list. A data loader simply informs the resolution strategy to wait until all customers are known before resolving their address via some special resolver.
Since each customer can have (in our simplified model) just one address, we would use GetOrAddBatchLoader
and have some resolver in the CustomerType
class like
Field<AddressType, Address>()
.Name("defaultAddress")
.ResolveAsync(context =>
{
var loader = accessor.Context.GetOrAddBatchLoader<int, Address>("GetDefaultAddressForCustomers", customers.GetDefaultAddressForCustomers);
// Add this UserId to the pending keys to fetch
return loader.LoadAsync(context.Source.CustomerId);
});
// in the `Customer` class
public Task<IDictionary<int, Address>> GetDefaultAddressForCustomers(IEnumerable<int> customerIds)
=> _db.Addresses.Where(a => customerIds.Contains(a.CustomerId) && a.IsDefault).ToDictionaryAsync(a => a.CustomerId, a => a);
For customers.orders
the problem is the same, but this time each customer can have multiple orders on their name. The same goes for orders.products
. So in both cases we would use GetOrAddCollectionBatchLoader
instead (the example below is for customers.orders
:
Field<OrderType, Order>()
.Name("orders")
.ResolveAsync(context =>
{
var loader = accessor.Context.GetOrAddCollectionBatchLoader<int, Order>("GetOrdersForCustomers", customers.GetOrdersForCustomers);
// Add this UserId to the pending keys to fetch
return loader.LoadAsync(context.Source.CustomerId);
});
// in the `Customer` class
public async Task<ILookup<int, Order>> GetOrdersForCustomers(IEnumerable<int> customerIds)
=> (await _db.Orders.Where(o => customerIds.Contains(o.CustomerId)).ToLookup(o => o.CustomerId, o => o);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the docs with some of the above reasoning. Hopefully it is now more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would make it less confusing if we just focussed on the use case for the graphql API rather than trying to map this back to some relational theory
Well, GraphQL is designed specifically for relational data. That's what the "graph" in GraphQL is; each type is linked to other types (relationships) and the protocol allows for requesting data across those defined relationships. While it's possible to use GraphQL to communicate with a non-relational backend, such as a simple IOT device or data logger, a simple REST API would suffice and GraphQL loses most of its benefit.
And specifically here the topic is data loaders, which explicitly relevant to relationships. As you stated, we should "focus on the use case for the GraphQL API" which in this case most decidedly is "relational theory". One-to-many and many-to-one relationships are critical to having a good understanding of when designing a relational system, be it a database or a GraphQL schema.
On a side note, an address for a customer is likely a one-to-one relationship - at least from the point of view of the defined relationships within GraphQL. Likely enough on the database side it's a many-to-one relationship, so the address entity can be linked to multiple orders without duplication, but assuming that there is no orders
field on the Address
type, it would be perceived by the GraphQL client as a one-to-one relationship.
docs2/site/docs/guides/dataloader.md
Outdated
- use `GetOrAddCollectionBatchLoader` for one-to-many or many-to-many relationships, i.e. when each key links to multiple values; think for example of resolving orders for a user: each user may map to (zero or) more orders. | ||
|
||
Note that, from a resolver's perspective, one-to-many and many-to-many relationships are exactly the same, since a resolver only sees one side of the relation. Think of orders and products. In theory, a product can appear in multiple orders and an order can contain multiple products, so this is a many-to-many relationship. However, a resolver for `Product.Orders` (or alternatively, `Order.Products`) will only need to care for the one-to-many side of the relation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like these comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New proposal for the paragraph above which hopefully makes it clear that we are talking about what the resolver sees:
The decision about whether to use the GetOrAddBatchLoader
or GetOrAddCollectionBatchLoader
method should be based on whether each key can link to only one or to multiple values:
- use
GetOrAddBatchLoader
for one-to-one relationships, i.e. when each key links to a single value; think for example of resolving the user who made an order: each order maps to one (and only one) user; - use
GetOrAddCollectionBatchLoader
for one-to-many or many-to-many relationships, i.e. when each key links to multiple values; think for example of resolving orders for a user: each user may map to (zero or) more orders.
Note that what matters here is the cardinality of the relation as seen from the resolver's perspective. Think for example of orders and products. In theory, a product can appear in multiple orders and an order can contain multiple products, so this is a many-to-many relationship. However, a resolver for Product.Orders
(or alternatively, Order.Products
) will only need to care for the one-to-many side of the relation (and thus use the GetOrAddBatchCollectionLoader
).
The same applies to one-to-many (or many-to-one) relationships. A resolver on the "one" side of the relation will use GetOrAddBatchCollectionLoader
, while a resolver on the "many" side will use GetOrAddBatchLoader
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok just reviewed. I would just add "or many-to-one" here:
use GetOrAddBatchLoader for one-to-one or many-to-one relationships, i.e. when each key links to a single value; think for example of resolving the user who made an order: each order maps to one (and only one) user;
The rest looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated by integrating the suggestion 👍
docs2/site/docs/guides/dataloader.md
Outdated
|
||
The decision about whether to use the `GetOrAddBatchLoader` or `GetOrAddCollectionBatchLoader` method should be based on whether each key can link to only one or to multiple values: | ||
|
||
- use `GetOrAddBatchLoader` for one-to-one relationships, i.e. when each key links to a single value; think for example of resolving the user who made an order: each order maps to one (and only one) user; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs work. It says "each order maps to one (and only one) user", but the next sentence below says "...[and] each user may map to (zero or) more orders". Those are both correct, and so it's clear that this here is a many-to-one relationship, not a one-to-one relationship, with the GetOrAddCollectionBatchLoader
being used for the opposite end of the same relationship.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are basically saying the same thing, just with different wordings 😄
I see this as a one-to-one relationship from the the order's side (see also the comments above), thus the "each order". Of course, more than one order will likely map to a user, as a user is able to make multiple orders. Yet, I would not personally try to be extremely precise here. What is imho important for a user of the library is what they need to know when resolving a field via a data loader, not what the underlying logical model is.
docs2/site/docs/guides/dataloader.md
Outdated
|
||
```graphql | ||
type Order { | ||
User: User! # one-to-one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User: User! # one-to-one | |
User: User! # many-to-one |
Again, this is the same relationship as shown below, just in reverse. Many orders connect to one user. Many-to-one.
One-to-one relationships exist but are typically few and far between. In one of my GraphQL schemas, I have something like this:
type Product {
pricingInformation: ProductPricingInfo
}
In my schema/database, this is a one-to-one relationship. Each 'Product' connects to exactly one 'ProductPricingInfo' and vice versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I think we just need to agree on whether we want to describe this from a logical viewpoint or simply from the resolver's "perspective". In the former case, this is indeed a many-to-one relationship, in the latter it can be seen as a one-to-one. My wording is due to my understanding that what matters for the correct usage of the data loader pattern is the relationship as seen from the resolver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By reading further about data loaders outside of this lib, it seems to me that a data loader is not bound to the relationships present in the logical model. One may very well register a generic batch-loading method for orders and use it when resolving a single order from a user (by calling the GetOrAddBatchLoader
method) and resolve the following query via a data loader:
query {
user {
order(id: 1) { # batched
amount
}
order(id: 2) { # batched
amount
}
}
}
In this case the "true" relationship between user and order is a one-to-many (or many-to-one as seen from the order side), but the call is to the GetOrAddBatchLoader
as the resolver here is only concerned with a single order (i.e. a one-to-one relationship).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m on the road and have not read your revised paragraphs yet, so bear with me.
I think while it makes sense from a “resolver perspective” to say it’s one-to-one, GraphQL is a relational schema and when we are providing examples that clearly are a many-to-one relationship within the GraphQL context (meaning, the opposite GraphQL type has a field corresponding to the same relationship), then we should use “many-to-one” in our examples. Or at least put it in parentheses like “(typically this is a many-to-one relationship, etc etc)”.
We have a number of new GraphQL developers that have a good understanding of database design, but need help when figuring out how to configure a GraphQL schema. Stating “one-to-one” for a many-to-one relationship would just add to their confusion. The goal is to make it as simple as possible for new developers to leverage their existing knowledge base and get up to speed with GraphQL quickly.
I’ll review your updated text as soon as I get a chance. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case the "true" relationship between user and order is a one-to-many (or many-to-one as seen from the order side)
That's true.
but the call is to the
GetOrAddBatchLoader
as the resolver here is only concerned with a single order (i.e. a one-to-one relationship).
Only as "seen by the resolver". Not as the general relationship between users and orders.
Perhaps we should drop the emphasis on the "one-to-many" relationship, moving it to more of a comment, so something like this:
GetOrAddBatchLoader
is used when the resolver is expected to return a single record (or null). For instance, for acustomer
field for anOrder
type. Typically these are used for many-to-one relationships, but in some circumstances may represent other relationships. In this example, the primary key of the customer table is fed to the data loader, which retrieves all customers at a single time from the underlying data source.GetOrAddCollectionBatchLoader
is used when the resolver is expected to return a list of records. For instance, for aorders
field on aCustomer
type. Typically these are used for one-to-many or many-to-many relationships. In this example, the primary key of the customer table is fed to the data loader, which retrieves all orders for all matching customer ids at a single time from the underlying data source.
That needs work, but just an idea. Sorta like some of the text you had I think.
I do think that mentioning the relationship is valuable, as for the typical configuration it is very helpful for developers to leverage their knowledge of their existing database design to use when configuring a GraphQL schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I think we just need to agree on whether we want to describe this from a logical viewpoint or simply from the resolver's "perspective". In the former case, this is indeed a many-to-one relationship, in the latter it can be seen as a one-to-one. My wording is due to my understanding that what matters for the correct usage of the data loader pattern is the relationship as seen from the resolver.
I think it's going to be very confusing to developers if we describe it "from the resolver's perspective", using terms that people readily associate with database design. I'm okay if we want to describe it from the resolver's perspective, but not with terms like "one-to-one" or "many-to-one" to describe it. Basically what my comment above was saying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your concern, though I want to mention that, to me, that was exactly the confusing part, as I was reasoning in terms of the relation as defined in the DB, but the API required me to reason in terms of the resolver's point of view :D
In any case, I'll update this so we use the "true" relation and then explain what that means from the resolver's perspective.
Use standard DB relationships types between the entities instead of looking at them from the resolver's perspective.
This should clear up some of the confusion between
GetOrAddBatchLoader
andGetOrAddCollectionBatchLoader
.Fixes #1375