Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ETag support #5298

Merged
merged 9 commits into from
Feb 14, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGES/4594.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
FileResponse now supports ETag.
3 changes: 2 additions & 1 deletion aiohttp/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@
)
from .cookiejar import CookieJar as CookieJar, DummyCookieJar as DummyCookieJar
from .formdata import FormData as FormData
from .helpers import BasicAuth as BasicAuth, ChainMapProxy as ChainMapProxy
from .helpers import BasicAuth, ChainMapProxy, ETag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the imports changed here?

I'm assuming the style in this file follows the no implicit imports approach from Mypy:
https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-no-implicit-reexport

Although a little odd that it's not enforced in the Mypy tests if that is the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those classes are already presented in the __all__ variable.
I agree about inconsistency with mypy style flags.

This change was previously discussed, I'll tag you there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great example of why style/formatting changes should be separated from the functional ones: a tiny disagreement can block a whole lot of legitimate changes, plus this makes it harder to notice important behavior changes.

I'll just leave this here: https://mtlynch.io/code-review-love/.

from .http import (
HttpVersion as HttpVersion,
HttpVersion10 as HttpVersion10,
Expand Down Expand Up @@ -145,6 +145,7 @@
# helpers
"BasicAuth",
"ChainMapProxy",
"ETag",
# http
"HttpVersion",
"HttpVersion10",
Expand Down
25 changes: 24 additions & 1 deletion aiohttp/helpers.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@
from .log import client_logger
from .typedefs import PathLike # noqa

__all__ = ("BasicAuth", "ChainMapProxy")
__all__ = ("BasicAuth", "ChainMapProxy", "ETag")

PY_38 = sys.version_info >= (3, 8)

Expand Down Expand Up @@ -887,3 +887,26 @@ def populate_with_cookies(
for cookie in cookies.values():
value = cookie.output(header="")[1:]
headers.add(hdrs.SET_COOKIE, value)


# https://tools.ietf.org/html/rfc7232#section-2.3
_ETAGC = r"[!#-}\x80-\xff]+"
_ETAGC_RE = re.compile(_ETAGC)
_QUOTED_ETAG = fr'(W/)?"({_ETAGC})"'
QUOTED_ETAG_RE = re.compile(_QUOTED_ETAG)
LIST_QUOTED_ETAG_RE = re.compile(fr"({_QUOTED_ETAG})(?:\s*,\s*|$)|(.)")

ETAG_ANY = "*"


@dataclasses.dataclass(frozen=True)
class ETag:
value: str
is_weak: bool = False


def validate_etag_value(value: str) -> None:
if value != ETAG_ANY and not _ETAGC_RE.fullmatch(value):
raise ValueError(
f"Value {value!r} is not a valid etag. Maybe it contains '\"'?"
)
67 changes: 56 additions & 11 deletions aiohttp/web_fileresponse.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,10 @@
Any,
Awaitable,
Callable,
Iterator,
List,
Optional,
Tuple,
Union,
cast,
)
Expand All @@ -19,6 +21,7 @@

from . import hdrs
from .abc import AbstractStreamWriter
from .helpers import ETAG_ANY, ETag
from .typedefs import LooseHeaders
from .web_exceptions import (
HTTPNotModified,
Expand Down Expand Up @@ -102,6 +105,31 @@ async def _sendfile(
await super().write_eof()
return writer

@staticmethod
def _strong_etag_match(etag_value: str, etags: Tuple[ETag, ...]) -> bool:
if len(etags) == 1 and etags[0].value == ETAG_ANY:
return True
else:
return any(etag.value == etag_value for etag in etags if not etag.is_weak)
Comment on lines +112 to +113
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, this could be less nested:

Suggested change
else:
return any(etag.value == etag_value for etag in etags if not etag.is_weak)
return any(etag.value == etag_value for etag in etags if not etag.is_weak)


async def _not_modified(
self, request: "BaseRequest", etag_value: str, last_modified: float
) -> Optional[AbstractStreamWriter]:
self.set_status(HTTPNotModified.status_code)
self._length_check = False
self.etag = etag_value # type: ignore
self.last_modified = last_modified # type: ignore
Comment on lines +120 to +121
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was agreed in other PRs to add specific error codes in brackets.

# Delete any Content-Length headers provided by user. HTTP 304
# should always have empty response body
return await super().prepare(request)

async def _precondition_failed(
self, request: "BaseRequest"
) -> Optional[AbstractStreamWriter]:
self.set_status(HTTPPreconditionFailed.status_code)
self.content_length = 0
return await super().prepare(request)

async def prepare(self, request: "BaseRequest") -> Optional[AbstractStreamWriter]:
filepath = self._path

Expand All @@ -114,20 +142,35 @@ async def prepare(self, request: "BaseRequest") -> Optional[AbstractStreamWriter
gzip = True

loop = asyncio.get_event_loop()
st = await loop.run_in_executor(None, filepath.stat)
st: os.stat_result = await loop.run_in_executor(None, filepath.stat)

modsince = request.if_modified_since
if modsince is not None and st.st_mtime <= modsince.timestamp():
self.set_status(HTTPNotModified.status_code)
self._length_check = False
# Delete any Content-Length headers provided by user. HTTP 304
# should always have empty response body
return await super().prepare(request)
etag_value = f"{st.st_mtime_ns:x}-{st.st_size:x}"
last_modified = st.st_mtime

# https://tools.ietf.org/html/rfc7232#section-6
ifmatch = request.if_match
if ifmatch is not None and not self._strong_etag_match(etag_value, ifmatch):
return await self._precondition_failed(request)

unmodsince = request.if_unmodified_since
if unmodsince is not None and st.st_mtime > unmodsince.timestamp():
self.set_status(HTTPPreconditionFailed.status_code)
return await super().prepare(request)
if (
unmodsince is not None
and ifmatch is None
and st.st_mtime > unmodsince.timestamp()
):
return await self._precondition_failed(request)

ifnonematch = request.if_none_match
if ifnonematch is not None and self._strong_etag_match(etag_value, ifnonematch):
return await self._not_modified(request, etag_value, last_modified)

modsince = request.if_modified_since
if (
modsince is not None
and ifnonematch is None
and st.st_mtime <= modsince.timestamp()
):
return await self._not_modified(request, etag_value, last_modified)

if hdrs.CONTENT_TYPE not in self.headers:
ct, encoding = mimetypes.guess_type(str(filepath))
Expand Down Expand Up @@ -218,6 +261,8 @@ async def prepare(self, request: "BaseRequest") -> Optional[AbstractStreamWriter
self.headers[hdrs.CONTENT_ENCODING] = encoding
if gzip:
self.headers[hdrs.VARY] = hdrs.ACCEPT_ENCODING

self.etag = etag_value # type: ignore[assignment]
self.last_modified = st.st_mtime # type: ignore[assignment]
self.content_length = count

Expand Down
49 changes: 49 additions & 0 deletions aiohttp/web_request.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,10 @@
from .abc import AbstractStreamWriter
from .helpers import (
_SENTINEL,
ETAG_ANY,
LIST_QUOTED_ETAG_RE,
ChainMapProxy,
ETag,
HeadersMixin,
is_expected_content_type,
reify,
Expand Down Expand Up @@ -500,6 +503,52 @@ def if_unmodified_since(self) -> Optional[datetime.datetime]:
"""
return self._http_date(self.headers.get(hdrs.IF_UNMODIFIED_SINCE))

@staticmethod
def _etag_values(etag_header: str) -> Iterator[ETag]:
"""Extract `ETag` objects from raw header."""
if etag_header == ETAG_ANY:
yield ETag(
is_weak=False,
value=ETAG_ANY,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some inconsistency here. If ETag is "*", the value attribute will contain quotes. In all other cases they will be stripped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite. * is a kind of special value, which is used without quotes. So if someone somehow will use "*" instead of *, it will be considered as regular etag.

Unfortunatly ETag(value='*') produced from "*" is indistinguishable from one produced from *.
So, for example:

"*", "tag", "other-tag" -> (ETag(value='*'), ETag(value='other-tag'))
* -> (ETag(value='*'), )

Maybe it's better to create separate type ETAG_ANY and change Optional[Tuple[ETag, ...]] to Optional[Union[ETAG_ANY, Tuple[ETag, ...]]]

"*", "tag", "other-tag" -> (ETag(value='*'), ETag(value='other-tag'))
* -> ETAG_ANY

@asvetlov what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right.

)
else:
for match in LIST_QUOTED_ETAG_RE.finditer(etag_header):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It accepts also garbage like "1",spam"2".

Copy link
Contributor Author

@greshilov greshilov Dec 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right.

To solve this problem I rejected the LIST_QUOTED_ETAG_RE approach, and switched to split + re.fullmatch.
Now only "1" would be matched from proposed garbage string. I've checked nginx behaviour and it's consistent with suggested solution. nginx matches etag values from header until first invalid is found. New approach is also 10x faster than old.

Main disadvantage is that memory consumption will probably increase though, but I don't think that really long If-Match, If-None-Match headers are so widespreaded.

is_weak, value, garbage = match.group(2, 3, 4)
# Any symbol captured by 4th group means
# that the following sequence is invalid.
if garbage:
break

yield ETag(
is_weak=bool(is_weak),
value=value,
)

@classmethod
def _if_match_or_none_impl(
cls, header_value: Optional[str]
) -> Optional[Tuple[ETag, ...]]:
if not header_value:
return None

return tuple(cls._etag_values(header_value))

@reify
def if_match(self) -> Optional[Tuple[ETag, ...]]:
"""The value of If-Match HTTP header, or None.

This header is represented as a `tuple` of `ETag` objects.
"""
return self._if_match_or_none_impl(self.headers.get(hdrs.IF_MATCH))

@reify
def if_none_match(self) -> Optional[Tuple[ETag, ...]]:
"""The value of If-None-Match HTTP header, or None.

This header is represented as a `tuple` of `ETag` objects.
"""
return self._if_match_or_none_impl(self.headers.get(hdrs.IF_NONE_MATCH))

@reify
def if_range(self) -> Optional[datetime.datetime]:
"""The value of If-Range HTTP header, or None.
Expand Down
43 changes: 42 additions & 1 deletion aiohttp/web_response.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,12 +28,16 @@
from . import hdrs, payload
from .abc import AbstractStreamWriter
from .helpers import (
ETAG_ANY,
PY_38,
QUOTED_ETAG_RE,
CookieMixin,
ETag,
HeadersMixin,
populate_with_cookies,
rfc822_formatted_time,
sentinel,
validate_etag_value,
)
from .http import RESPONSES, SERVER_SOFTWARE, HttpVersion10, HttpVersion11
from .payload import Payload
Expand Down Expand Up @@ -271,6 +275,43 @@ def last_modified(
elif isinstance(value, str):
self._headers[hdrs.LAST_MODIFIED] = value

@property
def etag(self) -> Optional[ETag]:
quoted_value = self._headers.get(hdrs.ETAG)
if not quoted_value:
return None
elif quoted_value == ETAG_ANY:
return ETag(value=ETAG_ANY)
match = QUOTED_ETAG_RE.fullmatch(quoted_value)
if not match:
return None
is_weak, value = match.group(1, 2)
return ETag(
is_weak=bool(is_weak),
value=value,
)

@etag.setter
def etag(self, value: Optional[Union[ETag, str]]) -> None:
if value is None:
self._headers.pop(hdrs.ETAG, None)
elif (isinstance(value, str) and value == ETAG_ANY) or (
isinstance(value, ETag) and value.value == ETAG_ANY
):
self._headers[hdrs.ETAG] = ETAG_ANY
elif isinstance(value, str):
validate_etag_value(value)
self._headers[hdrs.ETAG] = f'"{value}"'
greshilov marked this conversation as resolved.
Show resolved Hide resolved
elif isinstance(value, ETag) and isinstance(value.value, str):
validate_etag_value(value.value)
hdr_value = f'W/"{value.value}"' if value.is_weak else f'"{value.value}"'
self._headers[hdrs.ETAG] = hdr_value
else:
raise ValueError(
f"Unsupported etag type: {type(value)}. "
f"etag must be str, ETag or None"
)

def _generate_content_type_header(
self, CONTENT_TYPE: istr = hdrs.CONTENT_TYPE
) -> None:
Expand Down Expand Up @@ -363,7 +404,7 @@ async def _prepare_headers(self) -> None:
elif version >= HttpVersion11 and self.status in (100, 101, 102, 103, 204):
del headers[hdrs.CONTENT_LENGTH]

if self.status != 204:
if self.status not in (204, 304):
headers.setdefault(hdrs.CONTENT_TYPE, "application/octet-stream")
headers.setdefault(hdrs.DATE, rfc822_formatted_time())
headers.setdefault(hdrs.SERVER, SERVER_SOFTWARE)
Expand Down
17 changes: 17 additions & 0 deletions docs/client_reference.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1688,6 +1688,23 @@ ClientTimeout

.. versionadded:: 3.3

ETag
^^^^

.. class:: ETag(name, is_weak=False)

Represents `ETag` identifier.

.. attribute:: value

Value of corresponding etag without quotes.

.. attribute:: is_weak

Flag indicates that etag is weak (has `W/` prefix).

.. versionadded:: 3.8

RequestInfo
^^^^^^^^^^^

Expand Down
2 changes: 2 additions & 0 deletions docs/spelling_wordlist.txt
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ Dict
Discord
Django
Dup
ETag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm almost sure that the spellchecker is case-insensitive.

Facebook
HTTPException
HttpProcessingError
Expand Down Expand Up @@ -155,6 +156,7 @@ env
environ
eof
epoll
etag
facto
fallback
fallbacks
Expand Down
34 changes: 34 additions & 0 deletions docs/web_reference.rst
Original file line number Diff line number Diff line change
Expand Up @@ -318,6 +318,26 @@ and :ref:`aiohttp-web-signals` handlers.

.. versionadded:: 3.1

.. attribute:: if_match

Read-only property that returns :class:`ETag` objects specified
in the *If-Match* header.

Returns :class:`tuple` of :class:`ETag` or ``None`` if
*If-Match* header is absent.

.. versionadded:: 3.8

.. attribute:: if_none_match

Read-only property that returns :class:`ETag` objects specified
*If-None-Match* header.

Returns :class:`tuple` of :class:`ETag` or ``None`` if
*If-None-Match* header is absent.

.. versionadded:: 3.8

.. attribute:: if_range

Read-only property that returns the date specified in the
Expand Down Expand Up @@ -774,6 +794,20 @@ StreamResponse
as an :class:`int` or a :class:`float` object, and the
value ``None`` to unset the header.

.. attribute:: etag

*ETag* header for outgoing response.

This property accepts raw :class:`str` values, :class:`ETag`
objects and the value ``None`` to unset the header.

In case of :class:`str` input, etag is considered as strong by default.

**Do not** use double quotes ``"`` in the etag value,
they will be added automatically.

.. versionadded:: 3.8

.. comethod:: prepare(request)

:param aiohttp.web.Request request: HTTP request object, that the
Expand Down