New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement cookie session #180
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #180 +/- ##
==========================================
+ Coverage 76.16% 78.47% +2.30%
==========================================
Files 8 10 +2
Lines 2618 3047 +429
==========================================
+ Hits 1994 2391 +397
- Misses 624 656 +32
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- tests are required
Thanks for your work, great job.
Great review. @temoto I have completed a new cookie handler according to your suggestion. I will update the PR later.
|
tests/test_cookie.py python{2,3}/httplib2test.py is original, now deprecated test suite, and
|
b2172ca
to
190f31b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super, thanks. It's going to be the best cookie implementation.
There is only few more things to change and you can leave to me if you think it's too much work. All credit is yours if I do this too.
- it would probably be good idea to extract all cookie related code to separate file cookie.py
- CookieJar should only contain public API for cookie storage, same as other (null/database/etc) jar implementation; common helpers like normalize_path should go to
Cookie
or top level functions - CookieJar
_load
_save
probably should be private - your call - please create Dummy/Null/Noop-CookieJar that always returns empty cookies and ignores
set
, then you can forceHttp
object to always have a cookie manager (jar) present; also then you can remove Http.get/set/clear_cookie shortcuts because it's accessible withHttp.cookie_jar.get/set/clear
- modify headers once in _request, before any _conn_request call, that way cookie related changes are less spread across code
- please run
black python?/httplib2/cookie.py
to fix double quotes and long lines
It's alright, I will take it.
Good idea, Will do.
No problem. Will do.
I think it's better to keep them as public API. So user still can loads / saves cookies manually when the cookie persistence function is disabled(didn't provide cookie
It's fine to implement a NullCookieJar to elimiate the
instead of
Will do. |
190f31b
to
73ae9b9
Compare
Nice. Maybe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, you are very fast. Marked important changes. You can skip minor if you like.
|
||
if self.filename: | ||
self.save() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(important)
all CookieJar methods below are not related to file storage, should go to Cookie or module level functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_match_path
and _match_domain
are used by get_header
to help search cookies when walking through the <domain>/<path>/<name>
tree. They're not related to cookie directly, since at that moment no cookie is get. And they are only used in CookieJar. I think it makes sense to place them in CookieJar as private methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_header is out of scope file storage (CookieJar) too. Imagine SqliteCookieJar, it needs all these methods but it definitely must not couple with private methods of file storage (CookieJar).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I still didn't get your point about this change.
The httplib2.CookieJar is not a file storage. It's a cookie jar in memory with functions to load cookies from/or save to file. Just like get()/set()/clear()
, the get_header()
is not special, it means getting the cookies in the format of HTTP header. Similarly, extract_header
means setting the cookies from HTTP header.
Maybe it will be much clear if separating the FileCookieJar from the CookieJar. For example:
class CookieJarBase(CookieJar):
<null operations>
NullCookieJar = CookieJarBase
class CookieJar(CookieJarBase):
1. Virtual interfaces to persist the cookies: load()/save()
2. Implement all other operations(also includes `get_header`/`extract_header`) on self._cookies
class FileCookieJar(CookieJar):
Re-implement save()/load() interface with file storage
class SqliteCookieJar(CookieJar): <-- your example
Re-implement save()/load() interface with sqlite database
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The word "jar" means storage for cookies. So CookieJar should contain code that is related to storing cookies: in memory, file or somewhere else. get_header
, extract_header
are helper functions that are not related to storage.
It's very fine and clear for some functions to be at module level. There is no need to implement Java style class hierarchy. That virtual/implement/override nightmare is part of what made Python stdlib HTTP client so hard to use in the first place. Because you can't clearly read code: you start reading a method, it calls something in parent, look there, which calls something not in parent, look back, it's very easy for machine, but we should care about people.
If you want to separate cookie access and storage, it's a great idea but additional work. Because right now, each set()
writing to disk is a performance problem.
class CookieJar(object):
"""Cookie in-memory storage"""
def __init__(self):
self._items = {}
def get/set/clear(): # the cookie access methods only
def __hash__(): # to help storage know when contents were modified and should be written to disk/db/etc
# not related to any storage or cookie access
def get_header():
def extract_header():
# and then separate storage, named as json and all other serialization APIs
def load(fileobj) -> CookieJar:
def dump(fileobj, CookieJar):
# this way, new database backend would only implement load/dump
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variant of your code with base jar is also fine:
class NullCookieJar(object):
def get/set/clear(): ...
# no load/save/get_header
class MemoryCookieJar(object):
def get/set/clear(): ...
# no load/save/get_header
# inherit MemoryCookieJar only as API convenience, no virtual/override methods
class FileCookieJar(MemoryCookieJar):
def load/save(): ...
# no get/set/clear/get_header
class SqliteCookieJar(object):
def get() -> db.SELECT
def set() -> db.INSERT/UPDATE
def clear() -> db.DELETE
# no load/save/get_header
class BatchSqliteCookieJar(MemoryCookieJar):
def load() -> db.SELECT
def save() -> db.INSERT/UPDATE
# no get/set/clear/get_header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please let me try to explain the issue here from another point.
CookieJar
is public API, in sense that new storage backend must implement some methods for everything to work. So we must document what is the API (or at least provide refence implementation that other people may copy/modify). So what's the CookieJar API? You say it includes get_header/extract_header. It's even visible in that NullCookieJar implements get_header
(which is wrong). Now somebody needs to store cookies in database. It completely makes sense that they must create get/set/clear
, right?
class DatabaseCookieJar(object):
def get/set/clear(): ... # makes sense
but it doesn't work, because Http._set_cookie_header
calls your jar get_header
. And they think, what my database cookie jar implementation must do differently in get_header
? Ah there is (and could be) no new behavior (mind alarm 1). So what are their options?
# option 1
class DatabaseCookieJar(object):
def get/set/clear(): ...
@staticmethod
def get_header(*a, **kw): return CookieJar.get_header(*a, **kw)
# option 2
class DatabaseCookieJar(CookieJar): # why? only to make _set_cookie_header happy
def get/set/clear(): ...
And they think, why my database implementation must couple with file cookie storage? There is no good reason. Because _set_cookie_header
calls some common helper function via .cookie_jar
property.
And yet another point: one may want to construct/parse cookie headers without knowing about file/memory/database cookie storage. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variant of your code with base jar is also fine:
class NullCookieJar(object): def get/set/clear(): ... # no load/save/get_header class MemoryCookieJar(object): def get/set/clear(): ... # no load/save/get_header # inherit MemoryCookieJar only as API convenience, no virtual/override methods class FileCookieJar(MemoryCookieJar): def load/save(): ... # no get/set/clear/get_header class SqliteCookieJar(object): def get() -> db.SELECT def set() -> db.INSERT/UPDATE def clear() -> db.DELETE # no load/save/get_header class BatchSqliteCookieJar(MemoryCookieJar): def load() -> db.SELECT def save() -> db.INSERT/UPDATE # no get/set/clear/get_header
Without the load/save virtual interface in MemoryCookieJar
, you cannot keep updating the file/sqldatabse on each set/clear operation in the FileCookieJar
/BatchSqliteCookieJar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you can do like getattr(http.cookie_jar, 'save', lambda: None)()
but that is ugly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing _match_path
, _match_domain
, get_header
from CookieJar is the last important change to be made. Sorry I understand your position but disagree. You have done incredible work. 👍
@temoto I would appreciate it if you can continue the improvement work on this part. I will be busy in the next few weeks. Thank you, for your help and patience. |
Yes, no problem, I will. Thanks for your big contribution. |
Track and set the cookie header automatically.
Cookie session can be enabled by the new parameter
cookie_jar
:For cookie persistence, just providing a filename. All the cookie changes will be updated to file in the disk.
A new attribute
cookies
of response object to get parsed cookies:To set custom cookies to the
request
, you can add them toHttp.request()
method as a parameter directly, the cookies only serve this request.Or you can add them to the cookie jar, these cookies will be managed by cookie jar in subsequent requests.
Fixes #129