This repository has been archived by the owner on Apr 14, 2022. It is now read-only.

Add API for Request, SyncRequestData, and AsyncRequestData #151

Closed
wants to merge 2 commits into from
1 change: 1 addition & 0 deletions src/hip/__init__.pyi
@@ -1 +1,2 @@
from .models import Request, SyncRequestData, AsyncRequestData
from .status_codes import StatusCode
81 changes: 81 additions & 0 deletions src/hip/models.pyi
@@ -0,0 +1,81 @@
import typing

HeadersType = typing.Union[
    typing.Mapping[str, str],
    typing.Mapping[bytes, bytes],
    typing.Iterable[typing.Tuple[str, str]],
    typing.Iterable[typing.Tuple[bytes, bytes]],
]

class SyncRequestData:
    """Represents a synchronous request data object. Basically a wrapper around whatever
    a user passes in via 'data', 'files', 'json', etc. parameters. We can take bytes /
    strings, iterators of bytes or strings, and files. We can also sub-class / implement
    a file-like interface for things like multipart/form-data so it's possible to
    rewind (unlike the current urllib3 interface).

    Maybe in the future we can expose details about when an object can be sent via
    'socket.sendfile()', etc. This would have to be communicated somehow to the
    low-level backend streams.

    When we're handed a file-like object we should take down the starting point
    in the file via .tell() so we can rewind and put the pointer back after
    the seek() call in '.content_length'

    Ref: https://github.com/python-trio/urllib3/issues/135
    """

    def read(self, nbytes: int) -> bytes:
Contributor Author

Regarding there being only one interface: I think we should have something like SyncRequestData.sync_read() and AsyncRequestData.async_read(). Alternatively, we could switch to using async for chunk in request.data, which unasync would transform automatically, but we'd forfeit the ability to grab a specific amount of data. I guess that interface is easier to implement anyway?

The reason I think the split would be good is that you'd only have to subclass AsyncRequestData and could technically use that class in both async and synchronous requests. We would just need good error reporting for the case where the input data is, for example, an async generator and it is then used in a sync session. I'm guessing that wouldn't happen often, and an error message would clear up any ambiguity.
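
A minimal sketch of the error reporting described above, assuming a sync session asks a helper for an iterator over the body. The names RequestDataError and iter_sync_chunks are hypothetical and not part of this PR; the check relies only on whether the object offers __aiter__ without __iter__.

```python
import typing


class RequestDataError(TypeError):
    """Hypothetical error type, used here only for illustration."""


def iter_sync_chunks(data: typing.Any) -> typing.Iterator[bytes]:
    """Return a synchronous iterator over 'data', or fail with a clear message."""
    if isinstance(data, bytes):
        # A bytes body is sent as a single chunk.
        return iter((data,))
    if hasattr(data, "__aiter__") and not hasattr(data, "__iter__"):
        # Async generators and async-only iterables land here.
        raise RequestDataError(
            f"{type(data).__name__} only supports async iteration and "
            "cannot be used as request data in a synchronous session"
        )
    # Objects that implement __iter__ (possibly alongside __aiter__) work as-is.
    return iter(data)
```

An object that implements both __iter__ and __aiter__ passes through untouched, which is the case the split is meant to enable.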

Contributor Author

Here's an example of my plan in action:

import typing

StrOrInt = typing.Union[str, int]
FormType = typing.Union[
    typing.Sequence[typing.Tuple[str, StrOrInt]],
    typing.Mapping[str, typing.Union[StrOrInt, typing.Sequence[StrOrInt]]],
]


def _int_to_urlenc():
    """Map every byte value to its x-www-form-urlencoded representation."""
    values = {}
    special = {0x2A, 0x2D, 0x2E, 0x5F}  # '*', '-', '.', '_'
    for byte in range(256):
        if (
            (0x61 <= byte <= 0x7A)  # a-z
            or (0x41 <= byte <= 0x5A)  # A-Z
            or (0x30 <= byte <= 0x39)  # 0-9
            or (byte in special)
        ):  # Keep the ASCII
            values[byte] = bytes((byte,))
        elif byte == 0x20:  # Space -> '+'
            values[byte] = b"+"
        else:  # Percent-encoded, always two hex digits (e.g. b"%0A")
            values[byte] = b"%%%02X" % byte
    return values


INT_TO_URLENC = _int_to_urlenc()


class URLEncoded(AsyncRequestData):
    """Implements x-www-form-urlencoded as a RequestData object"""
    def __init__(self, form: FormType):
        self._form = form
        self._data = None

    @property
    def content_type(self) -> str:
        return "application/x-www-form-urlencoded"

    @property
    def content_length(self) -> int:
        return len(self._encode_form())

    def is_rewindable(self) -> bool:
        return True

    def __iter__(self) -> typing.Iterator[bytes]:
        yield self._encode_form()

    async def __aiter__(self) -> typing.AsyncIterator[bytes]:
        yield self._encode_form()

    def _encode_form(self) -> bytes:
        if self._data is None:

            def serialize(x: StrOrInt) -> bytes:
                # Values may be ints (see StrOrInt), so convert to str first.
                return b"".join(
                    [INT_TO_URLENC[byte] for byte in str(x).encode("utf-8")]
                )

            output: typing.List[bytes] = []
            for k, vs in (
                self._form.items() if hasattr(self._form, "items") else self._form
            ):
                if isinstance(vs, str) or not hasattr(vs, "__iter__"):
                    vs = (vs,)
                for v in vs:
                    output.append(serialize(k) + b"=" + serialize(v))

            self._data = b"&".join(output)

        return self._data

With this, one class can be used for both synchronous and asynchronous requests. We keep is_rewindable (or maybe rename it to is_replayable, or something else), and we'd signal a "rewind" by calling __aiter__ or __iter__ again.
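
To illustrate the "rewind by iterating again" idea on its own, here is a small sketch assuming a body class that implements both protocols. StaticBody, send_sync, and send_async are hypothetical names for this illustration, not part of the proposed API.

```python
import asyncio
import typing


class StaticBody:
    """Hypothetical request data class that supports sync and async iteration."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def __iter__(self) -> typing.Iterator[bytes]:
        # Each call creates a fresh generator, so the body can be replayed.
        yield self._payload

    async def __aiter__(self) -> typing.AsyncIterator[bytes]:
        # Same data, exposed through the async protocol.
        yield self._payload


def send_sync(body: StaticBody) -> bytes:
    """Stand-in for a sync session writing the body to a socket."""
    return b"".join(chunk for chunk in body)


async def send_async(body: StaticBody) -> bytes:
    """Stand-in for an async session writing the body to a socket."""
    return b"".join([chunk async for chunk in body])


body = StaticBody(b"a=1&b=2")
first_attempt = send_sync(body)
retry = send_sync(body)  # "rewind" is just a second call to __iter__
async_attempt = asyncio.run(send_async(body))
```

No explicit rewind() call is needed in this shape; a retry or redirect simply starts a new iteration.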

        """Don't know what to call this? Grab some data from the pool of data
        to be sent across the wire.
        """
    @property
    def content_length(self) -> typing.Optional[int]:
        """We can get a proper content-length for bytes, strings, file-like objects
        that are opened in binary mode (is that detectable somehow?). If we hand
        back 'None' from this property it means that the request should use
        'Transfer-Encoding: chunked'
        """
    def rewind(self) -> None:
        """This function rewinds the request data so that it can be retransmitted
        in the case of a timeout/socket error on an idempotent request, a redirect,
        etc so that the new request can be sent. This works for file-like objects
        and bytes / strings (where it's a no-op).
        """
Contributor Author

We should also add a content_type property so that it can be passed to the request as a default if none is set.
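
A rough sketch of how that default could be applied, assuming headers are held in a plain dict of str to str and the data object may or may not expose content_type. The helper name apply_default_content_type is hypothetical.

```python
import typing


def apply_default_content_type(
    headers: typing.Mapping[str, str], data: typing.Any
) -> typing.Dict[str, str]:
    """Return a copy of 'headers' with Content-Type filled in from 'data' if absent."""
    merged = dict(headers)
    # Header names are case-insensitive, so compare them lowercased.
    already_set = any(name.lower() == "content-type" for name in merged)
    content_type = getattr(data, "content_type", None)
    if not already_set and content_type is not None:
        merged["Content-Type"] = content_type
    return merged
```

A user-supplied Content-Type always wins here; the property only acts as a fallback.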

    @property
    def is_rewindable(self) -> bool:
        """We should return a bool whether .rewind() will explode or not."""

class AsyncRequestData(SyncRequestData):
    """The same as above except also accepts async files (we'll have to
    handle all the possible APIs here?) and async iterators.
    """

    async def read(self, nbytes: int) -> bytes: ...
    async def rewind(self) -> None: ...

class Request:
    """Requests aren't painted async or sync, only their data is.
    By the time the request has been sent on the network and we
    get a response back, the request will be attached to the response
    via 'SyncResponse.request'. At that point we can remove the 'data'
    parameter from the Request and only have the metadata left so
    users can't muck around with a spent Request body.

    The 'url' type is just a string for now but will be a full-featured
    type in the future. Requests has 'Request.url' as a string but
    we'll want to expose the whole URL object to do things like
    'request.url.origin' downstream.

    Also no reason to store the HTTP version here as the final version
    of the request will be determined after the connection has
    been established.
    """

    def __init__(
        self,
        method: str,
        url: str,
        headers: HeadersType,
        data: typing.Optional[typing.Union[SyncRequestData, AsyncRequestData]] = None,
    ): ...
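
The content_length and rewind docstrings in SyncRequestData describe recording the file position with .tell() and falling back to 'Transfer-Encoding: chunked' when the length is unknown. A minimal sketch of that behaviour for a binary file-like object follows; FileBody and framing_headers are hypothetical names and this is not the implementation from the PR.

```python
import io
import os
import typing


class FileBody:
    """Hypothetical wrapper around a binary file-like object."""

    def __init__(self, fileobj: typing.BinaryIO) -> None:
        self._file = fileobj
        # Remember where the body starts so we can rewind to it later.
        self._start = fileobj.tell() if fileobj.seekable() else None

    @property
    def is_rewindable(self) -> bool:
        return self._start is not None

    @property
    def content_length(self) -> typing.Optional[int]:
        if self._start is None:
            return None  # Unknown length: caller should use chunked encoding.
        current = self._file.tell()
        end = self._file.seek(0, os.SEEK_END)
        self._file.seek(current)  # Put the pointer back after measuring.
        return end - self._start

    def read(self, nbytes: int) -> bytes:
        return self._file.read(nbytes)

    def rewind(self) -> None:
        if self._start is None:
            raise RuntimeError("request data cannot be rewound")
        self._file.seek(self._start)


def framing_headers(body: typing.Any) -> typing.Dict[str, str]:
    """Pick Content-Length when the size is known, chunked encoding otherwise."""
    length = body.content_length
    if length is None:
        return {"Transfer-Encoding": "chunked"}
    return {"Content-Length": str(length)}


stream = io.BytesIO(b"0123456789")
stream.read(4)  # The caller already consumed a prefix before handing it over.
body = FileBody(stream)
```

Because the starting offset is captured at construction, the reported length covers only the bytes from that offset onward, and rewind() returns to the same offset rather than to the beginning of the file.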