Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update docs for DataLoader GetOrAdd*Loader #3942

Merged

Conversation

macco3k
Copy link
Contributor

@macco3k macco3k commented May 5, 2024

This should clear up some of the confusion between GetOrAddBatchLoader and GetOrAddCollectionBatchLoader.

Fixes #1375

This should clear up some of the confusion between `GetOrAddBatchLoader` and `GetOrAddCollectionBatchLoader`.

Fixes graphql-dotnet#1375
Comment on lines 78 to 79
- use `GetOrAddBatchLoader` for one-to-one relationships, i.e. when each key links to a single value; think for example of resolving users by their ids: each user id maps to one (and only one) user;
- use `GetOrAddCollectionBatchLoader` for one-to-many or many-to-one relationships, i.e. when each key links to multiple values; think for example of resolving orders by user id: each user id may map to (zero or) more orders.
Copy link
Member

@Shane32 Shane32 May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the terms 'one-to-one relationship' and 'one-to-many or many-to-one relationships' may be a bit misleading here, when considered from a database perspective.

If we consider typical database diagrams, such as Products and Categories, where each product maps to exactly one category, this is a typical many-to-one relationship yet it is GetOrAddBatchLoader which would be applicable for the field named category from within the Product type. And GetOrAddCollectionBatchLoader may be used for one-to-many relationships (e.g. 'products' field under the Category type), or many-to-many relationships (like if each product could link to multiple categories).

One-to-one relationships could certainly be represented by GetOrAddBatchLoader, or it is likely enough that they are an implementation detail, not exposed through the GraphQL schema. I honestly have not used one-to-one relationships in any of my databases so I am not sure how to formulate an example.

I suggest also that the example below be expanded, perhaps with three fields (many-to-one, one-to-many, and many-to-many) with applicable comments so people can better visualize the differences.

Here's an example schema, but perhaps you can think of something better:

type Product {
  # other fields
  brand: Brand!             # Many-to-One relationship: Many products belong to one brand
  categories: [Category!]!  # Many-to-Many relationship: Many products belong to many categories
  reviews: [Review!]!       # One-to-Many relationship: One product can have many reviews
}

Copy link
Contributor Author

@macco3k macco3k May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean, but I think that it would make it less confusing if we just focussed on the use case for the graphql API rather than trying to map this back to some relational theory. After all, GraphQL is not tied to any particular storage. The way I see it, the N+1 problem arises when you have a query which involves a list of entities, which you also want to resolve some field of, regardless of whether that field is a list or a single type. In the example below:

  • customers will resolve to a list. For each customer,
    • defaultAddress will resolve to a single entity.
    • orders will resolve to a list. For each order
      • products will resolve to a list.
query NPlusOne {
    customers {
        # one-to-one, one customer -> one address
        defaultAddress {
             street
             number
             postCode
        }
        # one-to-many, one customer -> many orders
        orders {
            # many-to-many, one order can link to multiple products and one product can be in multiple orders
            products { 
              qty
              price
              total
            }
        }
    }
}

When resolving a field like customers.defaultAddress, we run into the N+1 problem as GraphQL tries to resolve each address separately by looping over the items in the customers list. A data loader simply informs the resolution strategy to wait until all customers are known before resolving their address via some special resolver.

Since each customer can have (in our simplified model) just one address, we would use GetOrAddBatchLoader and have some resolver in the CustomerType class like

Field<AddressType, Address>()
    .Name("defaultAddress")
    .ResolveAsync(context =>
    {
        var loader = accessor.Context.GetOrAddBatchLoader<int, Address>("GetDefaultAddressForCustomers", customers.GetDefaultAddressForCustomers);
    
        // Add this UserId to the pending keys to fetch
        return loader.LoadAsync(context.Source.CustomerId);
    });

// in the `Customer` class
public Task<IDictionary<int, Address>> GetDefaultAddressForCustomers(IEnumerable<int> customerIds) 
    => _db.Addresses.Where(a => customerIds.Contains(a.CustomerId) && a.IsDefault).ToDictionaryAsync(a => a.CustomerId, a => a);

For customers.orders the problem is the same, but this time each customer can have multiple orders on their name. The same goes for orders.products. So in both cases we would use GetOrAddCollectionBatchLoader instead (the example below is for customers.orders:

Field<OrderType, Order>()
    .Name("orders")
    .ResolveAsync(context =>
    {
        var loader = accessor.Context.GetOrAddCollectionBatchLoader<int, Order>("GetOrdersForCustomers", customers.GetOrdersForCustomers);
    
        // Add this UserId to the pending keys to fetch
        return loader.LoadAsync(context.Source.CustomerId);
    });

// in the `Customer` class
public async Task<ILookup<int, Order>> GetOrdersForCustomers(IEnumerable<int> customerIds) 
    => (await _db.Orders.Where(o => customerIds.Contains(o.CustomerId)).ToLookup(o => o.CustomerId, o => o);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the docs with some of the above reasoning. Hopefully it is now more clear?

Copy link
Member

@Shane32 Shane32 May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would make it less confusing if we just focussed on the use case for the graphql API rather than trying to map this back to some relational theory

Well, GraphQL is designed specifically for relational data. That's what the "graph" in GraphQL is; each type is linked to other types (relationships) and the protocol allows for requesting data across those defined relationships. While it's possible to use GraphQL to communicate with a non-relational backend, such as a simple IOT device or data logger, a simple REST API would suffice and GraphQL loses most of its benefit.

And specifically here the topic is data loaders, which explicitly relevant to relationships. As you stated, we should "focus on the use case for the GraphQL API" which in this case most decidedly is "relational theory". One-to-many and many-to-one relationships are critical to having a good understanding of when designing a relational system, be it a database or a GraphQL schema.

On a side note, an address for a customer is likely a one-to-one relationship - at least from the point of view of the defined relationships within GraphQL. Likely enough on the database side it's a many-to-one relationship, so the address entity can be linked to multiple orders without duplication, but assuming that there is no orders field on the Address type, it would be perceived by the GraphQL client as a one-to-one relationship.

Comment on lines 97 to 99
- use `GetOrAddCollectionBatchLoader` for one-to-many or many-to-many relationships, i.e. when each key links to multiple values; think for example of resolving orders for a user: each user may map to (zero or) more orders.

Note that, from a resolver's perspective, one-to-many and many-to-many relationships are exactly the same, since a resolver only sees one side of the relation. Think of orders and products. In theory, a product can appear in multiple orders and an order can contain multiple products, so this is a many-to-many relationship. However, a resolver for `Product.Orders` (or alternatively, `Order.Products`) will only need to care for the one-to-many side of the relation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like these comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New proposal for the paragraph above which hopefully makes it clear that we are talking about what the resolver sees:

The decision about whether to use the GetOrAddBatchLoader or GetOrAddCollectionBatchLoader method should be based on whether each key can link to only one or to multiple values:

  • use GetOrAddBatchLoader for one-to-one relationships, i.e. when each key links to a single value; think for example of resolving the user who made an order: each order maps to one (and only one) user;
  • use GetOrAddCollectionBatchLoader for one-to-many or many-to-many relationships, i.e. when each key links to multiple values; think for example of resolving orders for a user: each user may map to (zero or) more orders.

Note that what matters here is the cardinality of the relation as seen from the resolver's perspective. Think for example of orders and products. In theory, a product can appear in multiple orders and an order can contain multiple products, so this is a many-to-many relationship. However, a resolver for Product.Orders (or alternatively, Order.Products) will only need to care for the one-to-many side of the relation (and thus use the GetOrAddBatchCollectionLoader).

The same applies to one-to-many (or many-to-one) relationships. A resolver on the "one" side of the relation will use GetOrAddBatchCollectionLoader, while a resolver on the "many" side will use GetOrAddBatchLoader.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok just reviewed. I would just add "or many-to-one" here:

use GetOrAddBatchLoader for one-to-one or many-to-one relationships, i.e. when each key links to a single value; think for example of resolving the user who made an order: each order maps to one (and only one) user;

The rest looks good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated by integrating the suggestion 👍


The decision about whether to use the `GetOrAddBatchLoader` or `GetOrAddCollectionBatchLoader` method should be based on whether each key can link to only one or to multiple values:

- use `GetOrAddBatchLoader` for one-to-one relationships, i.e. when each key links to a single value; think for example of resolving the user who made an order: each order maps to one (and only one) user;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs work. It says "each order maps to one (and only one) user", but the next sentence below says "...[and] each user may map to (zero or) more orders". Those are both correct, and so it's clear that this here is a many-to-one relationship, not a one-to-one relationship, with the GetOrAddCollectionBatchLoader being used for the opposite end of the same relationship.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are basically saying the same thing, just with different wordings 😄

I see this as a one-to-one relationship from the the order's side (see also the comments above), thus the "each order". Of course, more than one order will likely map to a user, as a user is able to make multiple orders. Yet, I would not personally try to be extremely precise here. What is imho important for a user of the library is what they need to know when resolving a field via a data loader, not what the underlying logical model is.


```graphql
type Order {
User: User! # one-to-one
Copy link
Member

@Shane32 Shane32 May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
User: User! # one-to-one
User: User! # many-to-one

Again, this is the same relationship as shown below, just in reverse. Many orders connect to one user. Many-to-one.

One-to-one relationships exist but are typically few and far between. In one of my GraphQL schemas, I have something like this:

type Product {
    pricingInformation: ProductPricingInfo
}

In my schema/database, this is a one-to-one relationship. Each 'Product' connects to exactly one 'ProductPricingInfo' and vice versa.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I think we just need to agree on whether we want to describe this from a logical viewpoint or simply from the resolver's "perspective". In the former case, this is indeed a many-to-one relationship, in the latter it can be seen as a one-to-one. My wording is due to my understanding that what matters for the correct usage of the data loader pattern is the relationship as seen from the resolver.

Copy link
Contributor Author

@macco3k macco3k May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By reading further about data loaders outside of this lib, it seems to me that a data loader is not bound to the relationships present in the logical model. One may very well register a generic batch-loading method for orders and use it when resolving a single order from a user (by calling the GetOrAddBatchLoader method) and resolve the following query via a data loader:

query {
    user {
        order(id: 1) { # batched
            amount
        }
        order(id: 2) { # batched
            amount
        }
    }
}

In this case the "true" relationship between user and order is a one-to-many (or many-to-one as seen from the order side), but the call is to the GetOrAddBatchLoader as the resolver here is only concerned with a single order (i.e. a one-to-one relationship).

Copy link
Member

@Shane32 Shane32 May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m on the road and have not read your revised paragraphs yet, so bear with me.

I think while it makes sense from a “resolver perspective” to say it’s one-to-one, GraphQL is a relational schema and when we are providing examples that clearly are a many-to-one relationship within the GraphQL context (meaning, the opposite GraphQL type has a field corresponding to the same relationship), then we should use “many-to-one” in our examples. Or at least put it in parentheses like “(typically this is a many-to-one relationship, etc etc)”.

We have a number of new GraphQL developers that have a good understanding of database design, but need help when figuring out how to configure a GraphQL schema. Stating “one-to-one” for a many-to-one relationship would just add to their confusion. The goal is to make it as simple as possible for new developers to leverage their existing knowledge base and get up to speed with GraphQL quickly.

I’ll review your updated text as soon as I get a chance. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the "true" relationship between user and order is a one-to-many (or many-to-one as seen from the order side)

That's true.

but the call is to the GetOrAddBatchLoader as the resolver here is only concerned with a single order (i.e. a one-to-one relationship).

Only as "seen by the resolver". Not as the general relationship between users and orders.

Perhaps we should drop the emphasis on the "one-to-many" relationship, moving it to more of a comment, so something like this:

  • GetOrAddBatchLoader is used when the resolver is expected to return a single record (or null). For instance, for a customer field for an Order type. Typically these are used for many-to-one relationships, but in some circumstances may represent other relationships. In this example, the primary key of the customer table is fed to the data loader, which retrieves all customers at a single time from the underlying data source.
  • GetOrAddCollectionBatchLoader is used when the resolver is expected to return a list of records. For instance, for a orders field on a Customer type. Typically these are used for one-to-many or many-to-many relationships. In this example, the primary key of the customer table is fed to the data loader, which retrieves all orders for all matching customer ids at a single time from the underlying data source.

That needs work, but just an idea. Sorta like some of the text you had I think.

I do think that mentioning the relationship is valuable, as for the typical configuration it is very helpful for developers to leverage their knowledge of their existing database design to use when configuring a GraphQL schema.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I think we just need to agree on whether we want to describe this from a logical viewpoint or simply from the resolver's "perspective". In the former case, this is indeed a many-to-one relationship, in the latter it can be seen as a one-to-one. My wording is due to my understanding that what matters for the correct usage of the data loader pattern is the relationship as seen from the resolver.

I think it's going to be very confusing to developers if we describe it "from the resolver's perspective", using terms that people readily associate with database design. I'm okay if we want to describe it from the resolver's perspective, but not with terms like "one-to-one" or "many-to-one" to describe it. Basically what my comment above was saying.

Copy link
Contributor Author

@macco3k macco3k May 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your concern, though I want to mention that, to me, that was exactly the confusing part, as I was reasoning in terms of the relation as defined in the DB, but the API required me to reason in terms of the resolver's point of view :D

In any case, I'll update this so we use the "true" relation and then explain what that means from the resolver's perspective.

Use standard DB relationships types between the entities instead of looking at them from the resolver's perspective.
@Shane32 Shane32 merged commit 5910f9c into graphql-dotnet:master May 13, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Few questions about DataLoader
2 participants