Support HTTP/HEAD data requests #2331

Closed
guidiaz opened this issue Dec 28, 2022 · 7 comments · May be fixed by #2402

Comments

@guidiaz
Contributor

guidiaz commented Dec 28, 2022

With an HTTP/HEAD request, only the metadata of a web resource is fetched, in the form of HTTP response headers.

For instance, these HTTP response headers would be quite handy for implementing content-based use cases with Witnet:

$ curl --head https://witnet.io/_nuxt/img/dragon_reading.a37f8cb.png
HTTP/1.1 200 OK
Content-Type: image/png
Content-Length: 498219
etag: "632067ee-79a2b"
...
$ curl --head https://ipfs.io/ipfs/QmQqzMTavQgT4f4T5v6PWBp7XNKtoPmC9jvn12WPT3gkSE
HTTP/1.1 200 OK
Content-Type: image/png
Content-Length: 38376
Etag: "QmQqzMTavQgT4f4T5v6PWBp7XNKtoPmC9jvn12WPT3gkSE"
X-Ipfs-Path: /ipfs/QmQqzMTavQgT4f4T5v6PWBp7XNKtoPmC9jvn12WPT3gkSE
X-Ipfs-Roots: QmQqzMTavQgT4f4T5v6PWBp7XNKtoPmC9jvn12WPT3gkSE
...

The implementation of HTTP/HEAD requests should:

  • Support addition of HTTP request headers.
  • Transform response header lines into a key/value map
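A minimal sketch of the two requirements above, using only the Rust standard library. The helper names `build_head_request` and `parse_response_headers` are hypothetical and not part of witnet-rust; a real node would delegate the request to its HTTP client. The sketch lowercases header names (the samples above show both `etag` and `Etag`) and assumes that the last value wins when a name is repeated.

```rust
use std::collections::BTreeMap;

/// Builds the text of an HTTP/1.1 HEAD request, appending caller-supplied headers.
/// Hypothetical helper: shown only to illustrate "addition of HTTP request headers".
fn build_head_request(host: &str, path: &str, extra_headers: &[(&str, &str)]) -> String {
    let mut request = format!("HEAD {} HTTP/1.1\r\nHost: {}\r\n", path, host);
    for (name, value) in extra_headers {
        request.push_str(&format!("{}: {}\r\n", name, value));
    }
    // An empty line terminates the header section; a HEAD request carries no body.
    request.push_str("\r\n");
    request
}

/// Turns raw response header lines into a key/value map.
/// Header names are lowercased because HTTP header names are case-insensitive.
/// Assumption: when a name is repeated, the last value wins.
fn parse_response_headers(raw: &str) -> BTreeMap<String, String> {
    let mut headers = BTreeMap::new();
    for line in raw.lines() {
        // Lines without a colon (the status line, the blank terminator) are skipped.
        if let Some((name, value)) = line.split_once(':') {
            headers.insert(name.trim().to_ascii_lowercase(), value.trim().to_string());
        }
    }
    headers
}
```

With a map keyed by lowercased names, a script can look up `etag` or `content-length` without knowing how the server capitalized them or in which order they arrived.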

Also, it would be nice to support a new String operator in Radon:

  • decodeHash(<encoding_type>), to transform etag literals into a properly decoded array of bytes, supporting at least SHA-256 as a possible encoding type.
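One possible reading of the proposed operator, sketched in plain Rust. It assumes the etag literal carries a hex-encoded digest, which the proposal does not specify; neither sample etag above would decode under that assumption (the first is not a 64-character hex string, and the IPFS one is a base58 CID). The names `decode_hash` and `SHA256_LEN` are hypothetical.

```rust
/// Length in bytes of a SHA-256 digest.
const SHA256_LEN: usize = 32;

/// Decodes an ETag literal into raw bytes.
/// Assumption: the literal is a hex-encoded digest of `expected_len` bytes.
/// Returns None when the literal is not hex or has a different length.
fn decode_hash(etag: &str, expected_len: usize) -> Option<Vec<u8>> {
    // Strip the optional weak-validator prefix and the surrounding double quotes.
    let literal = etag.trim();
    let literal = literal.strip_prefix("W/").unwrap_or(literal);
    let literal = literal.trim_matches('"');
    // Checking for hex digits first also guarantees ASCII, so byte slicing is safe.
    if literal.len() != expected_len * 2 || !literal.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..literal.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&literal[i..i + 2], 16).ok())
        .collect()
}
```

Returning `None` rather than a partial result lets the caller treat any etag that is not a digest in the expected form as an error.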

Possible new use-cases for Witnet:

  • Verifying that binary web content (e.g. an image, a PDF document, ...) is available at the moment the data request is executed, without the nodes needing to download the whole file.
  • Verifying the content type, length and etag of any given URL.

JavaScript DSL usage example:

const token_image_digest = new Witnet.HttpHeadSource(
  "https://api.game.art/images/1.png",  {
    "Transfer-Encoding": "identity"
  }
)
  .parseJSONMap() // perhaps not necessary, as response to HttpHeadSource should always be a key/value map
  .mapGetString("Etag")
  .stringDecode(Witnet.HASHES.SHA256)

@tmpolaczyk
Contributor

It would be nice to be able to access the response headers in HTTP GET/POST requests as well. Perhaps using some flag at the source level to indicate whether we want the response body, the headers, or both. That could also be extended to support binary sources as requested in #2274, by being able to specify the body encoding.

@tmpolaczyk
Contributor

This is the header format we can get from the http library surf:

response_headers: BTreeMap<String, Vec<String>>

The value is Vec<String> because header names can be repeated:

Cache-Control: no-cache
Cache-Control: no-store

And the header order is important. Is it possible to handle each header as an array using Radon? For example, we would need ArrayContains to check if a header value exists, and also some operator to enforce the size of the array to be 1 and get the value.

Or shall we always return only one header (the first or last instance)? Our HTTP POST headers implementation ignores repeated headers, so should we fix that as well? In that case, though, the surf library does not allow us to maintain the order, so if order is important we may need to change the HTTP library.
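To make the two operations mentioned above concrete, here is a sketch over the `BTreeMap<String, Vec<String>>` shape in plain Rust. The function names are hypothetical stand-ins for what an ArrayContains-style check and an "exactly one value" accessor might do; they are not existing Radon operators.

```rust
use std::collections::BTreeMap;

type ResponseHeaders = BTreeMap<String, Vec<String>>;

/// Groups (name, value) pairs by lowercased name, keeping the order of values per name.
fn group_headers(pairs: &[(&str, &str)]) -> ResponseHeaders {
    let mut headers = ResponseHeaders::new();
    for (name, value) in pairs {
        headers
            .entry(name.to_ascii_lowercase())
            .or_default()
            .push(value.to_string());
    }
    headers
}

/// ArrayContains-like check: does any instance of `name` carry exactly `value`?
fn header_contains(headers: &ResponseHeaders, name: &str, value: &str) -> bool {
    headers
        .get(&name.to_ascii_lowercase())
        .map_or(false, |values| values.iter().any(|v| v.as_str() == value))
}

/// Returns the value only if the header appears exactly once.
fn single_header<'a>(headers: &'a ResponseHeaders, name: &str) -> Option<&'a str> {
    match headers.get(&name.to_ascii_lowercase())?.as_slice() {
        [only] => Some(only.as_str()),
        _ => None,
    }
}
```

Note that this shape keeps the relative order of values under one name, but not the order of different header names relative to each other.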

In the sample HTTP HEAD response:

HEAD /_nuxt/img/dragon_reading.a37f8cb.png

HTTP/1.1 200 OK
Content-Type: image/png
Content-Length: 498219
etag: "632067ee-79a2b"

With the current proposal we will be unable to do anything with the first line (HTTP/1.1 200 OK), so we cannot check the HTTP version or the status code. We also do not get any way to see the order of the headers, but I guess that won't be an important use case.

However we could get access to all that information if we use an array instead of a map, and then iterate over the values, or if we keep the headers as a string, and add a new StringParseHttpHeaders operator. Then if we had a StringSplit(\n) operator, we would be able to read the headers one at a time, as well as the http version and status code. We would need to change the http library to implement that, but it should be doable.

So, in summary, the input of the radon script can be one of:

* Map<String, Vec<String>>
* Map<String, String>
* Array<Array<String>>, [[k1, v1], [k2, v2]]
* String, and manually use radon operators to turn it into a Map
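As an illustration of the string-based and array-based options, the sketch below parses a raw response head into the HTTP version, the status code, and an ordered list of name/value pairs. It uses only the Rust standard library, and the `ResponseHead` type and `parse_response_head` function are hypothetical names, not part of any existing library.

```rust
/// Parsed form of a raw HTTP response head (status line plus headers).
#[derive(Debug, PartialEq)]
struct ResponseHead {
    version: String,
    status_code: u16,
    /// Headers in wire order, repeated names included.
    headers: Vec<(String, String)>,
}

/// Parses the status line and the header lines that follow it.
/// Returns None if the status line is missing or malformed.
fn parse_response_head(raw: &str) -> Option<ResponseHead> {
    let mut lines = raw.lines();
    // Status line: "<version> <status code> <reason phrase>".
    let mut status_parts = lines.next()?.splitn(3, ' ');
    let version = status_parts.next()?.to_string();
    let status_code = status_parts.next()?.parse().ok()?;
    let headers = lines
        // The header section ends at the first empty line.
        .take_while(|line| !line.is_empty())
        .filter_map(|line| line.split_once(':'))
        .map(|(name, value)| (name.trim().to_string(), value.trim().to_string()))
        .collect();
    Some(ResponseHead { version, status_code, headers })
}
```

The ordered list corresponds to the `[[k1, v1], [k2, v2]]` option; folding it into either map shape afterwards is straightforward, whereas the reverse loses the original order.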

@tmpolaczyk
Contributor

Also it looks like the surf client in its current configuration doesn't support HEAD requests because of a bug, see this issue: http-rs/surf#218 (comment)

So most probably we will need to use another HTTP client. We are already using isahc, which is a low-level wrapper around libcurl and is also used by surf, so I will create an issue to stop using surf and use isahc directly.

@guidiaz
Contributor Author

guidiaz commented Jan 11, 2023

> With the current proposal we will be unable to do anything with the first line (HTTP/1.1 200 OK),

I don't think we'd need to include the first line (transport level) within the data request response (app level). Of course, it has to be interpreted by the node in order to know whether the HTTP/HEAD request was successful or not.

@guidiaz
Contributor Author

guidiaz commented Jan 11, 2023

> Then if we had a StringSplit(\n) operator, we would be able to read the headers one at a time

The radon script needs to access headers as a map one way or another, as it cannot assume the headers will be returned in any specific pre-known order.

@aesedepece
Member

aesedepece commented Oct 9, 2023

@guidiaz please ping me before you go about this to discuss some challenges.

@guidiaz
Contributor Author

guidiaz commented Oct 11, 2023

By directly using http::Response, we can get the response to an HTTP/HEAD request embedded into an http::HeaderMap<HeaderValue>, which can then be serialized as a JSON string. This way, responses to HTTP/HEAD requests can be assumed to be a RadonString value, parseable to a RadonMap via the StringParseJSONMap Radon operator.
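The serialization step could look roughly like the following. This is a std-only sketch that takes an already flattened `BTreeMap<String, String>` rather than an `http::HeaderMap`, and it hand-writes the JSON escaping so that the example stays self-contained; an actual implementation would more likely rely on a JSON library. Both function names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Escapes a string as a JSON string literal (quotes, backslashes, control characters).
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Serializes a header map as a flat JSON object of string values.
/// Assumption: repeated headers were already collapsed to one value per name.
fn headers_to_json(headers: &BTreeMap<String, String>) -> String {
    let fields: Vec<String> = headers
        .iter()
        .map(|(name, value)| format!("{}:{}", json_string(name), json_string(value)))
        .collect();
    format!("{{{}}}", fields.join(","))
}
```

Because etag values include their own double quotes, they come out escaped inside the JSON string, so a script reading the map still sees the quoted literal.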


3 participants