Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: push graphsync requests #346

Open
hannahhoward opened this issue Feb 4, 2022 · 6 comments
Open

Discussion: push graphsync requests #346

hannahhoward opened this issue Feb 4, 2022 · 6 comments
Assignees
Labels
need/triage Needs initial labeling and prioritization

Comments

@hannahhoward
Copy link
Collaborator

One the major reasons for the data transfer libp2p protocol is to support push data transfers, as currently graphsync is only a request/response protocol. We'd like to build this support into the library ideally, even if indirectly.

I'd like to lay out some possible paths:

  1. Add a completely abstract 4th type of entity (currently types are Request, Response, Control) to a graphsync message:
type GraphSyncControlMessage struct {
    id              GraphSyncRequestID  (rename "ID")   # unique id set on the requester side 
    extensions  GraphSyncExtensions         (rename "Ext")  # side channel information
}

This message would do very little other than allow people to register hooks to inspect these messages. Essentially, this just becomes a channel via which we can send data transfer messages and replace the libp2p protocol. The data transfer protocol would send a control message at the beginning of a push transfer, and the other side would decode it and assuming it passed all validation, initiate the actual graphsync request. With UUIDs now in use, it wouldn't be that hard to setup.

This has the benefit of being a side channel we can use for practically any messages in the data transfer libp2p protocol. The downside is it feel super weird and just throwing a random thing in graphsync to support this.

  1. Add a push request entity to the graphsync message:
type GraphSyncPushRequest struct {
  id                GraphSyncRequestID  (rename "ID")   # unique id set on the requester side
  root     optional Link                (rename "Root") # a CID for the root node in the query
  selector optional Any                 (rename "Sel")  # see https://github.com/ipld/specs/blob/master/selectors/selectors.md
  extensions        GraphSyncExtensions (rename "Ext")  # side channel information
}

This message would be handled directly by GraphSync, with some kind of OnPushRequestReceivedHook that could look at the request and extensions and validate it. Assuming all is well, the request begins, initiated by graphsync itself and then sent as a normal request.

An upside is that this feels way better. A downside is that we have to figure out other data transfer messages independently.

It also begs the question: is there a GraphSyncPushResponse too? How else do we communicate a rejection?

which brings us to our final possibility

  1. Just use our existing request and response structs, add a new request type based on Convert Cancel/Update on graphsync request to single RequestType #345

Similar to option 2, we'd have an OnPushRequestReceivedHook -- if validation succeeds, we generate a regular request back. If it fails, we generate a response message -- with a requestrejected code? Do we need to have a type on the other side? Does this neccesitate a ResponseType? I dunno.

The downside here is you get a sequence of:
request -> request on success (and now request/response have flipped)
and request -> response on fail

The upside is it's by far the smallest change to the message format.

Final question:
Data Transfer currently supports checkpointing with pause/resume for push requests, on the theory the person receiving data controls accepting it as its pushed. It's unused. It's ridiculously hacky and probably doesn't work. I'm excited to delete it soon. BUT, is this a use case we need to at least plan for? Perhaps someone wants to charge you just to accept your data (see content provider putting stuff on the retrieval network).

@hannahhoward hannahhoward added the need/triage Needs initial labeling and prioritization label Feb 4, 2022
@hannahhoward hannahhoward assigned dirkmc, willscott and rvagg and unassigned dirkmc Feb 4, 2022
@hannahhoward
Copy link
Collaborator Author

BTW my current preference is 2 or 3.

@rvagg
Copy link
Member

rvagg commented Feb 4, 2022

Option 1 is the most flexible but it's kind of opening the gates to arbitrary communications channels, are we OK with that?

Option 2 and 3 are almost the same, with just more work involved in option 2 I think, but the if you squint then a GraphSyncPushRequest is going to look like a GraphSyncRequest with a special "type" and that question about whether a rejection needs a new type again—it probably doesn't, a GraphSyncResponse is probably enough, but again we're back looking a lot like option 3.

So option 3, can you explain why you see the flow of request->request (success) and request->response (fail) to be a downside? It seems like this is a bind created by "push" requests regardless and it's just a matter of the names of things in code.

@willscott
Copy link
Collaborator

do we have a concrete application scenario that we would like to involve a push of data? being able to sketch these out within such a context a layer up would i think be useful in understanding more clearly what's triggering it, and how it fits in to the surrounding concerns of dos mitigation and efficiency.

@hannahhoward
Copy link
Collaborator Author

hannahhoward commented Feb 4, 2022

do we have a concrete application scenario that we would like to involve a push of data?

currently every storage transfer in go-data-transfer is a push. It seems useful for "I'd like to send you some data" -- currently, data transfer handles the authentication. If all is well, it gets queued into the outgoing requests -- note there's still the total number of outgoing requests to get actually initiating transfer.

@hannahhoward
Copy link
Collaborator Author

So option 3, can you explain why you see the flow of request->request (success) and request->response (fail) to be a downside? It seems like this is a bind created by "push" requests regardless and it's just a matter of the names of things in code.

Yea, it's just a weird naming and my background as a web programmer wants me to make everything request -> response.

I am leaning towards Option 3.

@dirkmc
Copy link
Collaborator

dirkmc commented Feb 7, 2022

In Boost we're not planning to have a push protocol, so as to simplify the transport implementation. When a client makes a storage deal, the server starts downloading from the client as soon as the deal is accepted.
So I guess my first question would be whether graphsync absolutely needs push?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/triage Needs initial labeling and prioritization
Projects
None yet
Development

No branches or pull requests

4 participants