Update docs for DataLoader `GetOrAdd*Loader` #3942

macco3k · 2024-05-05T09:35:09Z

This should clear up some of the confusion between GetOrAddBatchLoader and GetOrAddCollectionBatchLoader.

This should clear up some of the confusion between `GetOrAddBatchLoader` and `GetOrAddCollectionBatchLoader`. Fixes graphql-dotnet#1375

Shane32 · 2024-05-06T05:57:10Z

docs2/site/docs/guides/dataloader.md

+- use `GetOrAddBatchLoader` for one-to-one relationships, i.e. when each key links to a single value; think for example of resolving users by their ids: each user id maps to one (and only one) user;
+- use `GetOrAddCollectionBatchLoader` for one-to-many or many-to-one relationships, i.e. when each key links to multiple values; think for example of resolving orders by user id: each user id may map to (zero or) more orders.


I think the terms 'one-to-one relationship' and 'one-to-many or many-to-one relationships' may be a bit misleading here, when considered from a database perspective.

If we consider typical database diagrams, such as Products and Categories, where each product maps to exactly one category, this is a typical many-to-one relationship yet it is GetOrAddBatchLoader which would be applicable for the field named category from within the Product type. And GetOrAddCollectionBatchLoader may be used for one-to-many relationships (e.g. 'products' field under the Category type), or many-to-many relationships (like if each product could link to multiple categories).

One-to-one relationships could certainly be represented by GetOrAddBatchLoader, or it is likely enough that they are an implementation detail, not exposed through the GraphQL schema. I honestly have not used one-to-one relationships in any of my databases so I am not sure how to formulate an example.

I suggest also that the example below be expanded, perhaps with three fields (many-to-one, one-to-many, and many-to-many) with applicable comments so people can better visualize the differences.

Here's an example schema, but perhaps you can think of something better:

type Product { # other fields brand: Brand! # Many-to-One relationship: Many products belong to one brand categories: [Category!]! # Many-to-Many relationship: Many products belong to many categories reviews: [Review!]! # One-to-Many relationship: One product can have many reviews }

I see what you mean, but I think that it would make it less confusing if we just focussed on the use case for the graphql API rather than trying to map this back to some relational theory. After all, GraphQL is not tied to any particular storage. The way I see it, the N+1 problem arises when you have a query which involves a list of entities, which you also want to resolve some field of, regardless of whether that field is a list or a single type. In the example below:

customers will resolve to a list. For each customer,

defaultAddress will resolve to a single entity.

orders will resolve to a list. For each order

products will resolve to a list.

query NPlusOne { customers { # one-to-one, one customer -> one address defaultAddress { street number postCode } # one-to-many, one customer -> many orders orders { # many-to-many, one order can link to multiple products and one product can be in multiple orders products { qty price total } } } }

When resolving a field like customers.defaultAddress, we run into the N+1 problem as GraphQL tries to resolve each address separately by looping over the items in the customers list. A data loader simply informs the resolution strategy to wait until all customers are known before resolving their address via some special resolver.

Since each customer can have (in our simplified model) just one address, we would use GetOrAddBatchLoader and have some resolver in the CustomerType class like

Field<AddressType, Address>() .Name("defaultAddress") .ResolveAsync(context => { var loader = accessor.Context.GetOrAddBatchLoader<int, Address>("GetDefaultAddressForCustomers", customers.GetDefaultAddressForCustomers); // Add this UserId to the pending keys to fetch return loader.LoadAsync(context.Source.CustomerId); }); // in the `Customer` class public Task<IDictionary<int, Address>> GetDefaultAddressForCustomers(IEnumerable<int> customerIds) => _db.Addresses.Where(a => customerIds.Contains(a.CustomerId) && a.IsDefault).ToDictionaryAsync(a => a.CustomerId, a => a);

For customers.orders the problem is the same, but this time each customer can have multiple orders on their name. The same goes for orders.products. So in both cases we would use GetOrAddCollectionBatchLoader instead (the example below is for customers.orders:

Field<OrderType, Order>() .Name("orders") .ResolveAsync(context => { var loader = accessor.Context.GetOrAddCollectionBatchLoader<int, Order>("GetOrdersForCustomers", customers.GetOrdersForCustomers); // Add this UserId to the pending keys to fetch return loader.LoadAsync(context.Source.CustomerId); }); // in the `Customer` class public async Task<ILookup<int, Order>> GetOrdersForCustomers(IEnumerable<int> customerIds) => (await _db.Orders.Where(o => customerIds.Contains(o.CustomerId)).ToLookup(o => o.CustomerId, o => o);

I've updated the docs with some of the above reasoning. Hopefully it is now more clear?

it would make it less confusing if we just focussed on the use case for the graphql API rather than trying to map this back to some relational theory

Well, GraphQL is designed specifically for relational data. That's what the "graph" in GraphQL is; each type is linked to other types (relationships) and the protocol allows for requesting data across those defined relationships. While it's possible to use GraphQL to communicate with a non-relational backend, such as a simple IOT device or data logger, a simple REST API would suffice and GraphQL loses most of its benefit.

And specifically here the topic is data loaders, which explicitly relevant to relationships. As you stated, we should "focus on the use case for the GraphQL API" which in this case most decidedly is "relational theory". One-to-many and many-to-one relationships are critical to having a good understanding of when designing a relational system, be it a database or a GraphQL schema.

On a side note, an address for a customer is likely a one-to-one relationship - at least from the point of view of the defined relationships within GraphQL. Likely enough on the database side it's a many-to-one relationship, so the address entity can be linked to multiple orders without duplication, but assuming that there is no orders field on the Address type, it would be perceived by the GraphQL client as a one-to-one relationship.

Shane32 · 2024-05-09T15:14:21Z

docs2/site/docs/guides/dataloader.md

+- use `GetOrAddCollectionBatchLoader` for one-to-many or many-to-many relationships, i.e. when each key links to multiple values; think for example of resolving orders for a user: each user may map to (zero or) more orders.
+
+Note that, from a resolver's perspective, one-to-many and many-to-many relationships are exactly the same, since a resolver only sees one side of the relation. Think of orders and products. In theory, a product can appear in multiple orders and an order can contain multiple products, so this is a many-to-many relationship. However, a resolver for `Product.Orders` (or alternatively, `Order.Products`) will only need to care for the one-to-many side of the relation.


I like these comments.

New proposal for the paragraph above which hopefully makes it clear that we are talking about what the resolver sees:

The decision about whether to use the GetOrAddBatchLoader or GetOrAddCollectionBatchLoader method should be based on whether each key can link to only one or to multiple values:

use GetOrAddBatchLoader for one-to-one relationships, i.e. when each key links to a single value; think for example of resolving the user who made an order: each order maps to one (and only one) user;

use GetOrAddCollectionBatchLoader for one-to-many or many-to-many relationships, i.e. when each key links to multiple values; think for example of resolving orders for a user: each user may map to (zero or) more orders.

Note that what matters here is the cardinality of the relation as seen from the resolver's perspective. Think for example of orders and products. In theory, a product can appear in multiple orders and an order can contain multiple products, so this is a many-to-many relationship. However, a resolver for Product.Orders (or alternatively, Order.Products) will only need to care for the one-to-many side of the relation (and thus use the GetOrAddBatchCollectionLoader).

The same applies to one-to-many (or many-to-one) relationships. A resolver on the "one" side of the relation will use GetOrAddBatchCollectionLoader, while a resolver on the "many" side will use GetOrAddBatchLoader.

Ok just reviewed. I would just add "or many-to-one" here:

use GetOrAddBatchLoader for one-to-one or many-to-one relationships, i.e. when each key links to a single value; think for example of resolving the user who made an order: each order maps to one (and only one) user;

The rest looks good.

Updated by integrating the suggestion 👍

Shane32 · 2024-05-09T15:17:28Z

docs2/site/docs/guides/dataloader.md

+
+The decision about whether to use the `GetOrAddBatchLoader` or `GetOrAddCollectionBatchLoader` method should be based on whether each key can link to only one or to multiple values:
+
+- use `GetOrAddBatchLoader` for one-to-one relationships, i.e. when each key links to a single value; think for example of resolving the user who made an order: each order maps to one (and only one) user;


This needs work. It says "each order maps to one (and only one) user", but the next sentence below says "...[and] each user may map to (zero or) more orders". Those are both correct, and so it's clear that this here is a many-to-one relationship, not a one-to-one relationship, with the GetOrAddCollectionBatchLoader being used for the opposite end of the same relationship.

I think we are basically saying the same thing, just with different wordings 😄

I see this as a one-to-one relationship from the the order's side (see also the comments above), thus the "each order". Of course, more than one order will likely map to a user, as a user is able to make multiple orders. Yet, I would not personally try to be extremely precise here. What is imho important for a user of the library is what they need to know when resolving a field via a data loader, not what the underlying logical model is.

Shane32 · 2024-05-09T15:21:41Z

docs2/site/docs/guides/dataloader.md

+
+```graphql
+type Order {
+    User: User! # one-to-one


Suggested change

User: User! # one-to-one

User: User! # many-to-one

Again, this is the same relationship as shown below, just in reverse. Many orders connect to one user. Many-to-one.

One-to-one relationships exist but are typically few and far between. In one of my GraphQL schemas, I have something like this:

type Product { pricingInformation: ProductPricingInfo }

In my schema/database, this is a one-to-one relationship. Each 'Product' connects to exactly one 'ProductPricingInfo' and vice versa.

Fair enough. I think we just need to agree on whether we want to describe this from a logical viewpoint or simply from the resolver's "perspective". In the former case, this is indeed a many-to-one relationship, in the latter it can be seen as a one-to-one. My wording is due to my understanding that what matters for the correct usage of the data loader pattern is the relationship as seen from the resolver.

By reading further about data loaders outside of this lib, it seems to me that a data loader is not bound to the relationships present in the logical model. One may very well register a generic batch-loading method for orders and use it when resolving a single order from a user (by calling the GetOrAddBatchLoader method) and resolve the following query via a data loader:

query { user { order(id: 1) { # batched amount } order(id: 2) { # batched amount } } }

In this case the "true" relationship between user and order is a one-to-many (or many-to-one as seen from the order side), but the call is to the GetOrAddBatchLoader as the resolver here is only concerned with a single order (i.e. a one-to-one relationship).

I’m on the road and have not read your revised paragraphs yet, so bear with me.

I think while it makes sense from a “resolver perspective” to say it’s one-to-one, GraphQL is a relational schema and when we are providing examples that clearly are a many-to-one relationship within the GraphQL context (meaning, the opposite GraphQL type has a field corresponding to the same relationship), then we should use “many-to-one” in our examples. Or at least put it in parentheses like “(typically this is a many-to-one relationship, etc etc)”.

We have a number of new GraphQL developers that have a good understanding of database design, but need help when figuring out how to configure a GraphQL schema. Stating “one-to-one” for a many-to-one relationship would just add to their confusion. The goal is to make it as simple as possible for new developers to leverage their existing knowledge base and get up to speed with GraphQL quickly.

I’ll review your updated text as soon as I get a chance. Thanks!

In this case the "true" relationship between user and order is a one-to-many (or many-to-one as seen from the order side)

That's true.

but the call is to the GetOrAddBatchLoader as the resolver here is only concerned with a single order (i.e. a one-to-one relationship).

Only as "seen by the resolver". Not as the general relationship between users and orders.

Perhaps we should drop the emphasis on the "one-to-many" relationship, moving it to more of a comment, so something like this:

GetOrAddBatchLoader is used when the resolver is expected to return a single record (or null). For instance, for a customer field for an Order type. Typically these are used for many-to-one relationships, but in some circumstances may represent other relationships. In this example, the primary key of the customer table is fed to the data loader, which retrieves all customers at a single time from the underlying data source.

GetOrAddCollectionBatchLoader is used when the resolver is expected to return a list of records. For instance, for a orders field on a Customer type. Typically these are used for one-to-many or many-to-many relationships. In this example, the primary key of the customer table is fed to the data loader, which retrieves all orders for all matching customer ids at a single time from the underlying data source.

That needs work, but just an idea. Sorta like some of the text you had I think.

I do think that mentioning the relationship is valuable, as for the typical configuration it is very helpful for developers to leverage their knowledge of their existing database design to use when configuring a GraphQL schema.

Fair enough. I think we just need to agree on whether we want to describe this from a logical viewpoint or simply from the resolver's "perspective". In the former case, this is indeed a many-to-one relationship, in the latter it can be seen as a one-to-one. My wording is due to my understanding that what matters for the correct usage of the data loader pattern is the relationship as seen from the resolver.

I think it's going to be very confusing to developers if we describe it "from the resolver's perspective", using terms that people readily associate with database design. I'm okay if we want to describe it from the resolver's perspective, but not with terms like "one-to-one" or "many-to-one" to describe it. Basically what my comment above was saying.

I see your concern, though I want to mention that, to me, that was exactly the confusing part, as I was reasoning in terms of the relation as defined in the DB, but the API required me to reason in terms of the resolver's point of view :D

In any case, I'll update this so we use the "true" relation and then explain what that means from the resolver's perspective.

Use standard DB relationships types between the entities instead of looking at them from the resolver's perspective.

Update docs for DataLoader GetOrAdd*Loader

18a429c

This should clear up some of the confusion between `GetOrAddBatchLoader` and `GetOrAddCollectionBatchLoader`. Fixes graphql-dotnet#1375

Shane32 reviewed May 6, 2024

View reviewed changes

Clarify GetOrAdd*Loader usage and expand examples

a1c9ff2

Shane32 reviewed May 9, 2024

View reviewed changes

Use the "true" relationship type throughout

15d0573

Use standard DB relationships types between the entities instead of looking at them from the resolver's perspective.

Shane32 approved these changes May 13, 2024

View reviewed changes

Shane32 merged commit 5910f9c into graphql-dotnet:master May 13, 2024
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update docs for DataLoader `GetOrAdd*Loader` #3942

Update docs for DataLoader `GetOrAdd*Loader` #3942

macco3k commented May 5, 2024

Shane32 May 6, 2024 •

edited

macco3k May 9, 2024 •

edited

macco3k May 9, 2024

Shane32 May 9, 2024 •

edited

Shane32 May 9, 2024

macco3k May 9, 2024

Shane32 May 10, 2024

macco3k May 12, 2024

Shane32 May 9, 2024

macco3k May 9, 2024

Shane32 May 9, 2024 •

edited

macco3k May 9, 2024

macco3k May 9, 2024 •

edited

Shane32 May 9, 2024 •

edited

Shane32 May 10, 2024

Shane32 May 10, 2024

macco3k May 12, 2024 •

edited

		- use `GetOrAddBatchLoader` for one-to-one relationships, i.e. when each key links to a single value; think for example of resolving users by their ids: each user id maps to one (and only one) user;
		- use `GetOrAddCollectionBatchLoader` for one-to-many or many-to-one relationships, i.e. when each key links to multiple values; think for example of resolving orders by user id: each user id may map to (zero or) more orders.

		- use `GetOrAddCollectionBatchLoader` for one-to-many or many-to-many relationships, i.e. when each key links to multiple values; think for example of resolving orders for a user: each user may map to (zero or) more orders.

		Note that, from a resolver's perspective, one-to-many and many-to-many relationships are exactly the same, since a resolver only sees one side of the relation. Think of orders and products. In theory, a product can appear in multiple orders and an order can contain multiple products, so this is a many-to-many relationship. However, a resolver for `Product.Orders` (or alternatively, `Order.Products`) will only need to care for the one-to-many side of the relation.


		The decision about whether to use the `GetOrAddBatchLoader` or `GetOrAddCollectionBatchLoader` method should be based on whether each key can link to only one or to multiple values:

		- use `GetOrAddBatchLoader` for one-to-one relationships, i.e. when each key links to a single value; think for example of resolving the user who made an order: each order maps to one (and only one) user;

Update docs for DataLoader GetOrAdd*Loader #3942

Update docs for DataLoader GetOrAdd*Loader #3942

Conversation

macco3k commented May 5, 2024

Shane32 May 6, 2024 • edited

Choose a reason for hiding this comment

macco3k May 9, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Shane32 May 9, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Shane32 May 9, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

macco3k May 9, 2024 • edited

Choose a reason for hiding this comment

Shane32 May 9, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

macco3k May 12, 2024 • edited

Choose a reason for hiding this comment

Update docs for DataLoader `GetOrAdd*Loader` #3942

Update docs for DataLoader `GetOrAdd*Loader` #3942

Shane32 May 6, 2024 •

edited

macco3k May 9, 2024 •

edited

Shane32 May 9, 2024 •

edited

Shane32 May 9, 2024 •

edited

macco3k May 9, 2024 •

edited

Shane32 May 9, 2024 •

edited

macco3k May 12, 2024 •

edited