
Inconsistent execution results #12

Closed
IcanDivideBy0 opened this issue Feb 27, 2020 · 19 comments

IcanDivideBy0 commented Feb 27, 2020

I've noticed that sometimes batching works and sometimes it doesn't... I'm using a very simple schema at the moment, and sending the same request multiple times can give different results in how the loader is executed:

query foo {
  orders {
    id
    user { id }
  }
}

I'm using a loader on the orders.user field:

#[juniper::graphql_object(Context = Context)]
impl Order {
    fn id(&self) -> juniper::ID {
        self.id.to_string().into()
    }

    async fn user(&self, ctx: &Context) -> types::User {
        ctx.loaders.user.load(self.user.clone()).await.unwrap()
    }
}

and the UserLoader is basically a copy-paste of the example in the juniper docs.
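
For reference, the loader hangs off the per-request context roughly like this (a sketch; the Loaders struct and the UserLoader alias are assumptions inferred from the snippets in this issue):

pub struct Loaders {
    // Built around the BatchFn impl from the juniper docs example.
    pub user: UserLoader,
}

pub struct Context {
    pub loaders: Loaders,
}

impl juniper::Context for Context {}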

The DB contains 2 orders, both having the same user. Here are logs from the same request sent multiple times:

 DEBUG hyper::proto::h1::io                > read 643 bytes
 DEBUG hyper::proto::h1::io                > parsed 14 headers
 DEBUG hyper::proto::h1::conn              > incoming body is content-length (109 bytes)
 DEBUG hyper::proto::h1::conn              > incoming body completed
 DEBUG syos::graphql::utils::loaders::user > load batch [ObjectId(593fdd2ba9c0edf74ff0b38c), ObjectId(593fdd2ba9c0edf74ff0b38c)]
 INFO  GraphQL                             > 127.0.0.1:45080 "POST /graphql HTTP/1.1" 200 "http://127.0.0.1:4444/graphiql" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36" 9.229761ms
 DEBUG hyper::proto::h1::io                > flushed 372 bytes


 DEBUG hyper::proto::h1::io                > read 643 bytes
 DEBUG hyper::proto::h1::io                > parsed 14 headers
 DEBUG hyper::proto::h1::conn              > incoming body is content-length (109 bytes)
 DEBUG hyper::proto::h1::conn              > incoming body completed
 DEBUG syos::graphql::utils::loaders::user > load batch [ObjectId(593fdd2ba9c0edf74ff0b38c)]
 DEBUG syos::graphql::utils::loaders::user > load batch [ObjectId(593fdd2ba9c0edf74ff0b38c)]
 INFO  GraphQL                             > 127.0.0.1:45080 "POST /graphql HTTP/1.1" 200 "http://127.0.0.1:4444/graphiql" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36" 12.157163ms
 DEBUG hyper::proto::h1::io                > flushed 372 bytes


 DEBUG hyper::proto::h1::io                > read 643 bytes
 DEBUG hyper::proto::h1::io                > parsed 14 headers
 DEBUG hyper::proto::h1::conn              > incoming body is content-length (109 bytes)
 DEBUG hyper::proto::h1::conn              > incoming body completed
 DEBUG syos::graphql::utils::loaders::user > load batch [ObjectId(593fdd2ba9c0edf74ff0b38c), ObjectId(593fdd2ba9c0edf74ff0b38c)]
 INFO  GraphQL                             > 127.0.0.1:45080 "POST /graphql HTTP/1.1" 200 "http://127.0.0.1:4444/graphiql" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36" 12.120952ms
 DEBUG hyper::proto::h1::io                > flushed 372 bytes


 DEBUG hyper::proto::h1::io                > read 643 bytes
 DEBUG hyper::proto::h1::io                > parsed 14 headers
 DEBUG hyper::proto::h1::conn              > incoming body is content-length (109 bytes)
 DEBUG hyper::proto::h1::conn              > incoming body completed
 DEBUG syos::graphql::utils::loaders::user > load batch [ObjectId(593fdd2ba9c0edf74ff0b38c), ObjectId(593fdd2ba9c0edf74ff0b38c)]
 INFO  GraphQL                             > 127.0.0.1:45080 "POST /graphql HTTP/1.1" 200 "http://127.0.0.1:4444/graphiql" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36" 10.009887ms
 DEBUG hyper::proto::h1::io                > flushed 372 bytes

We can see that sometimes load batch is called once with both ids, and sometimes it is called twice with the same id.

More info:

  • tokio runtime (tried both threaded_scheduler and basic_scheduler with the same result)
  • warp http server
  • juniper master branch
  • mongodb database (for which the driver is not async yet)
  • context is created per request (and so are loaders)

I can provide more code if necessary, or maybe even a simple repo to reproduce the issue.

IcanDivideBy0 changed the title from "Inconsistent results" to "Inconsistent execution results" on Feb 27, 2020
IcanDivideBy0 (Author)

OK, playing around with this a bit more: it feels like this might be caused by the database answering too fast, before the juniper executor gets to resolve the second order, so this might be a non-issue on a real production server.

IcanDivideBy0 (Author)

After some more thought, this issue can't be related to the database or to anything in my BatchFn impl; the BatchFn::load calls should be batched regardless.

When calling the loader from a non-async resolver, things are batched properly 100% of the time, e.g.:

fn user(&self, ctx: &Context) -> Option<types::User> {
    let fut = ctx.loaders.user.load(self.user.clone());
    // Resolve the load future on a separate task instead of awaiting
    // it in the resolver (just a test, so the result is discarded).
    tokio::task::spawn(async {
        let _user = fut.await.unwrap();
    });

    None
}

So now I'm very suspicious of this line.

Maybe rescheduling at the event loop is not enough; juniper might not have called the resolver of the neighbor field yet?

IcanDivideBy0 (Author)

Adding a delay after the LoadFuture creation gives correct results.

    async fn user(&self, ctx: &Context) -> types::User {
        let fut = ctx.loaders.user.load(self.user.clone());
        // Even a 1 ns delay re-schedules this task at the back of the
        // event loop, giving sibling resolvers time to enqueue their keys.
        tokio::time::delay_for(std::time::Duration::from_nanos(1)).await;
        fut.await.unwrap()
    }

The whole delay logic inside LoadFuture::poll can then even be removed entirely:

pub struct LoadFuture<K, V, E, F> {
    id: usize,
    stage: Stage,
    state: Arc<Mutex<State<K, Result<V, LoadError<E>>, F, BatchFuture<V, E>>>>,
}

impl<K, V, E, F> Future for LoadFuture<K, V, E, F>
where
    E: Clone,
    F: BatchFn<K, V, Error = E>,
{
    type Output = Result<V, LoadError<E>>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        let state = self.state.clone();
        let mut st = state.lock().unwrap();

        // Value already loaded by a completed batch: hand it out.
        if st.loaded_vals.contains_key(&self.id) {
            self.stage = Stage::Finished;
            return Poll::Ready(st.loaded_vals.remove(&self.id).unwrap());
        }

        // Key already queued in a pending batch: drive that batch.
        if let Some(batch_id) = st.loading_ids.get(&self.id) {
            let batch_id = *batch_id;
            ready!(st.poll_batch(cx, batch_id));
            self.stage = Stage::Finished;
            return Poll::Ready(st.loaded_vals.remove(&self.id).unwrap());
        }

        // Otherwise dispatch a new batch with this key and drive it.
        let batch_id = st.dispatch_new_batch(self.id);
        ready!(st.poll_batch(cx, batch_id));
        self.stage = Stage::Finished;
        Poll::Ready(st.loaded_vals.remove(&self.id).unwrap())
    }
}

IcanDivideBy0 (Author) commented Feb 29, 2020

From what I understand, the best place to put a delay to push back on the execution stack is in juniper itself, here, right before the resolve_field_async call, while keeping the delay here in LoadFuture::poll. But it doesn't make much sense to put that in juniper just for the purposes of this crate.

Does somebody have any thoughts on this?

kolloch commented Mar 17, 2020

Instead of the delay, I would propose yield_now:

https://docs.rs/tokio/0.2.13/tokio/task/fn.yield_now.html
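
Applied to the resolver shown earlier in this thread, that would look something like this (same shape as the delay_for workaround, just yielding instead; a sketch, untested):

async fn user(&self, ctx: &Context) -> types::User {
    let fut = ctx.loaders.user.load(self.user.clone());
    // Yield once so the executor can poll sibling resolvers, letting
    // them enqueue their keys before this batch future runs.
    tokio::task::yield_now().await;
    fut.await.unwrap()
}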

I literally just had a look at this crate 10 minutes ago, so I'm not ready to comment in more detail, but I'd also be interested in consistent behavior with async. I don't see a conceptual need for non-determinism there (but I might be wrong).

cksac (Owner) commented Mar 23, 2020

I released a new version, 0.8, using async/await, with some breaking changes. See if you want to try it out.

IcanDivideBy0 (Author)

OK, I tested with 0.8.
From what I see, the dataloader is now consistently broken :/ I can't manage to get a single batch functioning.

Also, the new API using async_trait makes it really hard to work with generics; the only way to get it to compile without the parameter type `T` may not live long enough error is:

pub struct ModelBatcher;

impl<T> BatchFn<ObjectId, Option<T>> for ModelBatcher
where
    T: Model,
{
    type Error = ();

    fn load<'a, 'b, 'c>(
        &'a self,
        keys: &'b [ObjectId],
    ) -> Pin<Box<dyn Future<Output = HashMap<ObjectId, Result<Option<T>, Self::Error>>> + Send + 'c>>
    where
        'a: 'c,
        'b: 'c,
        Self: 'c,
    {
        log::debug!("load batch {:?}", keys);

        Box::pin(async move { T::load_many(&keys).await })
    }
}

which, you'll agree, feels really messy.

cksac (Owner) commented Mar 24, 2020

Thanks for testing.
It seems there are some limitations with async_trait.
If requiring T: 'static is OK, then the impl becomes much cleaner:

#[async_trait]
impl<T> BatchFn<ObjectId, Option<T>> for ModelBatcher
where
    T: Model + 'static
{
    type Error = ();

    async fn load(&self, keys: &[ObjectId]) -> HashMap<ObjectId, Result<Option<T>, Self::Error>> {
        println!("load batch {:?}", keys);
        T::load_many(&keys).await
    }
}

For the loader part, could you share your test code?

IcanDivideBy0 (Author)

For the async_trait part, following this issue, maybe something can be done in the crate directly by adding where T: 'async_trait to the BatchFn trait.
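
A rough sketch of what that could look like on the trait side (the bound placement here is a guess, not the crate's actual definition):

#[async_trait]
pub trait BatchFn<K, V> {
    type Error;

    // Bounds tied to the 'async_trait lifetime would let impls add
    // matching bounds on their own type parameters.
    async fn load(&self, keys: &[K]) -> HashMap<K, Result<V, Self::Error>>
    where
        K: 'async_trait,
        V: 'async_trait;
}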

Here is a reproduction of the bug: https://github.com/IcanDivideBy0/dataloader-bug
The first loader call batch loads properly, but when getting to the 2nd level (the friend field) nothing is batched anymore.

IcanDivideBy0 (Author)

I really think the issue is more related to how juniper resolves its tree. It looks like, once a future is resolved, juniper doesn't wait for all futures at the same level to complete before descending further into the tree.

cksac (Owner) commented Mar 24, 2020

@IcanDivideBy0 thanks for the demo project to reproduce.

For the lifetime issue: I updated BatchFn on the master branch, so you can now add T: 'async_trait:

#[async_trait]
impl<T> BatchFn<ObjectId, Option<T>> for ModelBatcher
where
    T: Model,
{
    type Error = ();

    async fn load(&self, keys: &[ObjectId]) -> HashMap<ObjectId, Result<Option<T>, Self::Error>>
    where
        T: 'async_trait,
    {
        println!("load batch {:?}", keys);
        T::load_many(&keys).await
    }
}

IcanDivideBy0 (Author)

@cksac thanks! BatchFn is way more pleasant to use with generics now :) 👍

cksac (Owner) commented Mar 24, 2020

I updated the examples to call the loader at the same level.
juniper is able to batch loads at the same level, but async-graphql is not able to batch at all.
I think how the fields get resolved really matters, and I'm not sure what can be done in this crate.

cksac (Owner) commented Mar 24, 2020

I updated the example to use dataloader version 0.6.0 / 0.7.0, and it works as expected.
https://github.com/cksac/dataloader-bug

IcanDivideBy0 (Author)

That's good news! It looks like this is something that can finally be fixed here.

IcanDivideBy0 (Author)

When spamming the request with v0.6.0, the issue can still occur from time to time.

Try adding something like this in BatchFn::load:

if keys.len() < 2 {
    // A single-key batch means batching failed for this request.
    log::error!("not batched");
    std::process::exit(1);
}

then keep Ctrl+Enter pressed in GraphiQL; within about 20 seconds I get a crash with:

 DEBUG dataloader_bug::graphql > load batch ["1", "3", "2"]
 DEBUG dataloader_bug::graphql > load batch ["friend of 2", "friend of 1"]
 DEBUG dataloader_bug::graphql > load batch ["friend of 3"]
 ERROR dataloader_bug::graphql > not batched

IcanDivideBy0 (Author)

Sorry, I can't find a more consistent way to reproduce this...

cksac (Owner) commented Mar 24, 2020

I managed to fix the juniper batch load issue by calling yield_now() more times.
I updated https://github.com/IcanDivideBy0/dataloader-bug to use version 0.8.1
with user_loader: Loader::with_yield_count(UserBatch, 100) (see the sketch below); a larger yield count allows more requests to be appended to the batch, but waits longer before the actual load.
However, async-graphql is still not able to do batch loading.
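
For reference, the relevant construction looks roughly like this (UserBatch comes from the reproduction repo; the surrounding setup is assumed):

// Built once per request, alongside the GraphQL context. Each poll of
// a pending batch yields up to 100 times before the load actually
// runs, so sibling resolvers get a chance to enqueue their keys.
let user_loader = Loader::with_yield_count(UserBatch, 100);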

As for the batch size: I think a load batch size anywhere between 1 and max_batch_size is expected, since the loader performs the load as soon as a caller demands the value.

IcanDivideBy0 (Author)

@cksac great, the multiple-yields trick did the job! Although it feels incredibly hacky, I don't see any other way.
Thanks

cksac closed this as completed on Apr 4, 2020