Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement cookie session #180

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

weyou
Copy link

@weyou weyou commented Nov 2, 2020

Track and set the cookie header automatically.
Cookie session can be enabled by the new parameter cookie_jar:

cj = httplib2.CookieJar()
http = httplib2.Http(cookie_jar=cj)

For cookie persistence, just providing a filename. All the cookie changes will be updated to file in the disk.

cj = httplib2.CookieJar(filename='cookie.txt')
http = httplib2.Http(cookie_jar=cj)

A new attribute cookies of response object to get parsed cookies:

response = http.request(...)
print(response.cookies)

To set custom cookies to the request, you can add them to Http.request() method as a parameter directly, the cookies only serve this request.

http = httplib2.Http()
http.request(..., cookies={'AAA': '123'})

Or you can add them to the cookie jar, these cookies will be managed by cookie jar in subsequent requests.

cj = httplib2.CookieJar()
http = httplib2.Http(cookie_jar=cj)
http.cookie_jar.set('AAA', '123', '.example.com', '/dir_a', max_age=0)

Fixes #129

@codecov
Copy link

codecov bot commented Nov 2, 2020

Codecov Report

Merging #180 (9b72dc9) into master (595e248) will increase coverage by 2.30%.
The diff coverage is 92.54%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #180      +/-   ##
==========================================
+ Coverage   76.16%   78.47%   +2.30%     
==========================================
  Files           8       10       +2     
  Lines        2618     3047     +429     
==========================================
+ Hits         1994     2391     +397     
- Misses        624      656      +32     
Impacted Files Coverage Δ
python3/httplib2/cookie.py 92.30% <92.30%> (ø)
python2/httplib2/cookie.py 92.42% <92.42%> (ø)
python2/httplib2/__init__.py 83.77% <94.44%> (+0.18%) ⬆️
python3/httplib2/__init__.py 84.67% <94.44%> (+0.18%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 595e248...9b72dc9. Read the comment docs.

Copy link
Member

@temoto temoto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • tests are required

Thanks for your work, great job.

python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
@weyou
Copy link
Author

weyou commented Nov 4, 2020

Great review. @temoto

I have completed a new cookie handler according to your suggestion. I will update the PR later.
BTW, where should I add the test? All the following locations have test files:

  • python2/httplib2test.py
  • python3/httplib2test.py
  • tests
  • test

@temoto
Copy link
Member

temoto commented Nov 4, 2020

@weyou

tests/test_cookie.py

python{2,3}/httplib2test.py is original, now deprecated test suite, and test directory contains data for it. It's still not removed because I didn't finish porting all tests to new test suite.

tests contains new test suite, designed to be one for py2/3 code base and it's run by CI.

@weyou weyou force-pushed the cookie-session branch 2 times, most recently from b2172ca to 190f31b Compare November 16, 2020 06:20
@weyou weyou requested a review from temoto November 17, 2020 06:09
Copy link
Member

@temoto temoto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super, thanks. It's going to be the best cookie implementation.

There is only few more things to change and you can leave to me if you think it's too much work. All credit is yours if I do this too.

  • it would probably be good idea to extract all cookie related code to separate file cookie.py
  • CookieJar should only contain public API for cookie storage, same as other (null/database/etc) jar implementation; common helpers like normalize_path should go to Cookie or top level functions
  • CookieJar _load _save probably should be private - your call
  • please create Dummy/Null/Noop-CookieJar that always returns empty cookies and ignores set, then you can force Http object to always have a cookie manager (jar) present; also then you can remove Http.get/set/clear_cookie shortcuts because it's accessible with Http.cookie_jar.get/set/clear
  • modify headers once in _request, before any _conn_request call, that way cookie related changes are less spread across code
  • please run black python?/httplib2/cookie.py to fix double quotes and long lines

python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved
@weyou
Copy link
Author

weyou commented Nov 18, 2020

Super, thanks. It's going to be the best cookie implementation.

There is only few more things to change and you can leave to me if you think it's too much work. All credit is yours if I do this too.

It's alright, I will take it.

  • it would probably be good idea to extract all cookie related code to separate file cookie.py

Good idea, Will do.

  • CookieJar should only contain public API for cookie storage, same as other (null/database/etc) jar implementation; common helpers like normalize_path should go to Cookie or top level functions

No problem. Will do.

  • CookieJar _load _save probably should be private - your call

I think it's better to keep them as public API. So user still can loads / saves cookies manually when the cookie persistence function is disabled(didn't provide cookie filename on initialization). It will provide greater flexibility.

  • please create Dummy/Null/Noop-CookieJar that always returns empty cookies and ignores set, then you can force Http object to always have a cookie manager (jar) present; also then you can remove Http.get/set/clear_cookie shortcuts because it's accessible with Http.cookie_jar.get/set/clear

It's fine to implement a NullCookieJar to elimiate the if self.cookie_jar:. I will let the CookieJar.set() also accept the parameters of Cookie.create() to add new cookie. So user can create new cookie without importing the Cookie class:

http.cookie_jar.set('a', '1')

instead of

from httplib2.cookie import Cookie
http.cookie_jar.set(Cookie.create('a', '1'))
  • modify headers once in _request, before any _conn_request call, that way cookie related changes are less spread across code
  • please run black python?/httplib2/cookie.py to fix double quotes and long lines

Will do.

@temoto
Copy link
Member

temoto commented Nov 18, 2020

It's fine to implement a NullCookieJar to elimiate the if self.cookie_jar:. I will let the CookieJar.set() also accept the parameters of Cookie.create() to add new cookie. So user can create new cookie without importing the Cookie class:

Nice. Maybe .set(name, value, expires=N, domain=N, path=N) = return .set_cookie(Cookie(...)) or you know better.

@weyou weyou requested a review from temoto November 18, 2020 13:01
Copy link
Member

@temoto temoto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, you are very fast. Marked important changes. You can skip minor if you like.

python2/httplib2/__init__.py Outdated Show resolved Hide resolved
python2/httplib2/__init__.py Outdated Show resolved Hide resolved

if self.filename:
self.save()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(important)

all CookieJar methods below are not related to file storage, should go to Cookie or module level functions

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_match_path and _match_domain are used by get_header to help search cookies when walking through the <domain>/<path>/<name> tree. They're not related to cookie directly, since at that moment no cookie is get. And they are only used in CookieJar. I think it makes sense to place them in CookieJar as private methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_header is out of scope file storage (CookieJar) too. Imagine SqliteCookieJar, it needs all these methods but it definitely must not couple with private methods of file storage (CookieJar).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I still didn't get your point about this change.

The httplib2.CookieJar is not a file storage. It's a cookie jar in memory with functions to load cookies from/or save to file. Just like get()/set()/clear(), the get_header() is not special, it means getting the cookies in the format of HTTP header. Similarly, extract_header means setting the cookies from HTTP header.

Maybe it will be much clear if separating the FileCookieJar from the CookieJar. For example:

class CookieJarBase(CookieJar):
       <null operations>
 
NullCookieJar = CookieJarBase

class CookieJar(CookieJarBase):
        1. Virtual interfaces to persist the cookies: load()/save() 
        2. Implement all other operations(also includes `get_header`/`extract_header`) on self._cookies

class FileCookieJar(CookieJar):
        Re-implement save()/load() interface with file storage

class SqliteCookieJar(CookieJar):  <-- your example
        Re-implement save()/load() interface with sqlite database

Copy link
Member

@temoto temoto Nov 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word "jar" means storage for cookies. So CookieJar should contain code that is related to storing cookies: in memory, file or somewhere else. get_header, extract_header are helper functions that are not related to storage.

It's very fine and clear for some functions to be at module level. There is no need to implement Java style class hierarchy. That virtual/implement/override nightmare is part of what made Python stdlib HTTP client so hard to use in the first place. Because you can't clearly read code: you start reading a method, it calls something in parent, look there, which calls something not in parent, look back, it's very easy for machine, but we should care about people.

If you want to separate cookie access and storage, it's a great idea but additional work. Because right now, each set() writing to disk is a performance problem.

class CookieJar(object):
  """Cookie in-memory storage"""
  def __init__(self):
    self._items = {}
  def get/set/clear():  # the cookie access methods only
  def __hash__():  # to help storage know when contents were modified and should be written to disk/db/etc

# not related to any storage or cookie access
def get_header():
def extract_header():

# and then separate storage, named as json and all other serialization APIs
def load(fileobj) -> CookieJar:
def dump(fileobj, CookieJar):

# this way, new database backend would only implement load/dump

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variant of your code with base jar is also fine:

class NullCookieJar(object):
  def get/set/clear(): ...
  # no load/save/get_header


class MemoryCookieJar(object):
  def get/set/clear(): ...
  # no load/save/get_header


# inherit MemoryCookieJar only as API convenience, no virtual/override methods
class FileCookieJar(MemoryCookieJar):
  def load/save(): ...
  # no get/set/clear/get_header


class SqliteCookieJar(object):
  def get() -> db.SELECT
  def set() -> db.INSERT/UPDATE
  def clear() -> db.DELETE
  # no load/save/get_header


class BatchSqliteCookieJar(MemoryCookieJar):
  def load() -> db.SELECT
  def save() -> db.INSERT/UPDATE
  # no get/set/clear/get_header

Copy link
Member

@temoto temoto Nov 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me try to explain the issue here from another point.

CookieJar is public API, in sense that new storage backend must implement some methods for everything to work. So we must document what is the API (or at least provide refence implementation that other people may copy/modify). So what's the CookieJar API? You say it includes get_header/extract_header. It's even visible in that NullCookieJar implements get_header (which is wrong). Now somebody needs to store cookies in database. It completely makes sense that they must create get/set/clear, right?

class DatabaseCookieJar(object):
  def get/set/clear(): ... # makes sense

but it doesn't work, because Http._set_cookie_header calls your jar get_header. And they think, what my database cookie jar implementation must do differently in get_header? Ah there is (and could be) no new behavior (mind alarm 1). So what are their options?

# option 1
class DatabaseCookieJar(object):
  def get/set/clear(): ...

  @staticmethod
  def get_header(*a, **kw): return CookieJar.get_header(*a, **kw)

# option 2
class DatabaseCookieJar(CookieJar):  # why? only to make _set_cookie_header happy
  def get/set/clear(): ...

And they think, why my database implementation must couple with file cookie storage? There is no good reason. Because _set_cookie_header calls some common helper function via .cookie_jar property.

And yet another point: one may want to construct/parse cookie headers without knowing about file/memory/database cookie storage. Right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variant of your code with base jar is also fine:

class NullCookieJar(object):
  def get/set/clear(): ...
  # no load/save/get_header


class MemoryCookieJar(object):
  def get/set/clear(): ...
  # no load/save/get_header


# inherit MemoryCookieJar only as API convenience, no virtual/override methods
class FileCookieJar(MemoryCookieJar):
  def load/save(): ...
  # no get/set/clear/get_header


class SqliteCookieJar(object):
  def get() -> db.SELECT
  def set() -> db.INSERT/UPDATE
  def clear() -> db.DELETE
  # no load/save/get_header


class BatchSqliteCookieJar(MemoryCookieJar):
  def load() -> db.SELECT
  def save() -> db.INSERT/UPDATE
  # no get/set/clear/get_header

Without the load/save virtual interface in MemoryCookieJar, you cannot keep updating the file/sqldatabse on each set/clear operation in the FileCookieJar/BatchSqliteCookieJar

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you can do like getattr(http.cookie_jar, 'save', lambda: None)() but that is ugly.

python2/httplib2/cookie.py Outdated Show resolved Hide resolved
python2/httplib2/cookie.py Outdated Show resolved Hide resolved
python2/httplib2/cookie.py Outdated Show resolved Hide resolved
python2/httplib2/cookie.py Outdated Show resolved Hide resolved
tests/test_cookies.py Outdated Show resolved Hide resolved
tests/test_cookies.py Show resolved Hide resolved
tests/test_cookies.py Outdated Show resolved Hide resolved
@weyou weyou requested a review from temoto November 19, 2020 08:09
python2/httplib2/cookie.py Outdated Show resolved Hide resolved
python2/httplib2/cookie.py Outdated Show resolved Hide resolved
tests/test_cookies.py Outdated Show resolved Hide resolved
tests/test_cookies.py Outdated Show resolved Hide resolved
tests/test_cookies.py Outdated Show resolved Hide resolved
tests/test_cookies.py Outdated Show resolved Hide resolved
tests/test_cookies.py Outdated Show resolved Hide resolved
tests/test_cookies.py Outdated Show resolved Hide resolved
tests/test_cookies.py Outdated Show resolved Hide resolved
Copy link
Member

@temoto temoto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing _match_path, _match_domain, get_header from CookieJar is the last important change to be made. Sorry I understand your position but disagree. You have done incredible work. 👍

@weyou
Copy link
Author

weyou commented Nov 23, 2020

Removing _match_path, _match_domain, get_header from CookieJar is the last important change to be made. Sorry I understand your position but disagree. You have done incredible work. 👍

@temoto I would appreciate it if you can continue the improvement work on this part. I will be busy in the next few weeks. Thank you, for your help and patience.

@temoto
Copy link
Member

temoto commented Nov 23, 2020

Removing _match_path, _match_domain, get_header from CookieJar is the last important change to be made. Sorry I understand your position but disagree. You have done incredible work. 👍

@temoto I would appreciate it if you can continue the improvement work on this part. I will be busy in the next few weeks. Thank you, for your help and patience.

Yes, no problem, I will. Thanks for your big contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cookies on redirect response ignored
2 participants