New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to use custom hash functions in hashing.hash
and Memory
#1232
Open
judahrand
wants to merge
5
commits into
joblib:main
Choose a base branch
from
judahrand:custom-hasher
base: main
Could not load branches
Branch not found: {{ refName }}
Could not load tags
Nothing to show
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
16f5f89
Add ability to register custom hash function
judahrand c715da3
Register `xxh32` if `xxhash` is installed
judahrand f2a7277
Allow for selection of hash function in `Memory`
judahrand f82cebb
Add tests
judahrand babb2da
Use `xxh3_64` instead of `xxh32`
judahrand File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -402,6 +402,11 @@ class MemorizedFunc(Logger): | |||||
of compression. Note that compressed arrays cannot be | ||||||
read by memmapping. | ||||||
|
||||||
hash_name: {'md5', 'sha1'}, optional | ||||||
The name of the hash function to use to hash arguments. | ||||||
Defaults to 'md5'. Additionally, if `xxhash` is installed | ||||||
'xxh3_64' is valid. | ||||||
|
||||||
verbose: int, optional | ||||||
The verbosity flag, controls messages that are issued as | ||||||
the function is evaluated. | ||||||
|
@@ -411,10 +416,16 @@ class MemorizedFunc(Logger): | |||||
# ------------------------------------------------------------------------ | ||||||
|
||||||
def __init__(self, func, location, backend='local', ignore=None, | ||||||
mmap_mode=None, compress=False, verbose=1, timestamp=None): | ||||||
mmap_mode=None, compress=False, hash_name='md5', | ||||||
verbose=1, timestamp=None): | ||||||
Logger.__init__(self) | ||||||
self.mmap_mode = mmap_mode | ||||||
self.compress = compress | ||||||
if hash_name not in hashing._HASHES: | ||||||
raise ValueError("Valid options for 'hash_name' are {}. " | ||||||
"Got hash_name={!r} instead." | ||||||
.format(hash_name, hash_name)) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
self.hash_name = hash_name | ||||||
self.func = func | ||||||
|
||||||
if ignore is None: | ||||||
|
@@ -630,7 +641,8 @@ def check_call_in_cache(self, *args, **kwargs): | |||||
|
||||||
def _get_argument_hash(self, *args, **kwargs): | ||||||
return hashing.hash(filter_args(self.func, self.ignore, args, kwargs), | ||||||
coerce_mmap=(self.mmap_mode is not None)) | ||||||
coerce_mmap=(self.mmap_mode is not None), | ||||||
hash_name=self.hash_name) | ||||||
|
||||||
def _get_output_identifiers(self, *args, **kwargs): | ||||||
"""Return the func identifier and input parameter hash of a result.""" | ||||||
|
@@ -893,6 +905,11 @@ class Memory(Logger): | |||||
of compression. Note that compressed arrays cannot be | ||||||
read by memmapping. | ||||||
|
||||||
hash_name: {'md5', 'sha1'}, optional | ||||||
The name of the hash function to use to hash arguments. | ||||||
Defaults to 'md5'. Additionally, if `xxhash` is installed | ||||||
'xxh3_64' is valid. | ||||||
|
||||||
verbose: int, optional | ||||||
Verbosity flag, controls the debug messages that are issued | ||||||
as functions are evaluated. | ||||||
|
@@ -914,8 +931,8 @@ class Memory(Logger): | |||||
# ------------------------------------------------------------------------ | ||||||
|
||||||
def __init__(self, location=None, backend='local', cachedir=None, | ||||||
mmap_mode=None, compress=False, verbose=1, bytes_limit=None, | ||||||
backend_options=None): | ||||||
mmap_mode=None, compress=False, hash_name='md5', verbose=1, | ||||||
bytes_limit=None, backend_options=None): | ||||||
# XXX: Bad explanation of the None value of cachedir | ||||||
Logger.__init__(self) | ||||||
self._verbose = verbose | ||||||
|
@@ -924,6 +941,11 @@ def __init__(self, location=None, backend='local', cachedir=None, | |||||
self.bytes_limit = bytes_limit | ||||||
self.backend = backend | ||||||
self.compress = compress | ||||||
if hash_name not in hashing._HASHES: | ||||||
raise ValueError("Valid options for 'hash_name' are {}. " | ||||||
"Got hash_name={!r} instead." | ||||||
.format(hash_name, hash_name)) | ||||||
self.hash_name = hash_name | ||||||
if backend_options is None: | ||||||
backend_options = {} | ||||||
self.backend_options = backend_options | ||||||
|
@@ -1011,6 +1033,7 @@ def cache(self, func=None, ignore=None, verbose=None, mmap_mode=False): | |||||
backend=self.backend, | ||||||
ignore=ignore, mmap_mode=mmap_mode, | ||||||
compress=self.compress, | ||||||
hash_name=self.hash_name, | ||||||
verbose=verbose, timestamp=self.timestamp) | ||||||
|
||||||
def clear(self, warn=True): | ||||||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -20,7 +20,7 @@ | |||||
from decimal import Decimal | ||||||
import pytest | ||||||
|
||||||
from joblib.hashing import hash | ||||||
from joblib.hashing import hash, register_hash, _HASHES | ||||||
from joblib.func_inspect import filter_args | ||||||
from joblib.memory import Memory | ||||||
from joblib.testing import raises, skipif, fixture, parametrize | ||||||
|
@@ -495,3 +495,23 @@ def test_wrong_hash_name(): | |||||
with raises(ValueError, match=msg): | ||||||
data = {'foo': 'bar'} | ||||||
hash(data, hash_name='invalid') | ||||||
|
||||||
|
||||||
def test_right_regist_hash(): | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
hash_name = 'my_hash' | ||||||
assert hash_name not in _HASHES | ||||||
register_hash(hash_name, hashlib.sha256) | ||||||
assert _HASHES[hash_name] == hashlib.sha256 | ||||||
|
||||||
|
||||||
def test_wrong_register_hash(): | ||||||
with raises(ValueError, match="Hash name should be a string"): | ||||||
register_hash(0, hashlib.md5) | ||||||
|
||||||
with raises( | ||||||
ValueError, match="Hash function instance must implement"): | ||||||
register_hash('test_hash', int) | ||||||
|
||||||
with raises( | ||||||
ValueError, match="Hash function 'md5' already registered."): | ||||||
register_hash('md5', hashlib.md5) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably remove this part. The rational is that as this is not the default, users would mainly discover this through the doc, and we could explain how to register this when they create a
Memory
object?Else, this results in an extra lib import on all processes, which might be costly ( not sure how heavy this lib is?)