Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use max(uint64) for OOV lexeme rank #5303

Merged
merged 6 commits into from Apr 15, 2020

Conversation

adrianeboyd
Copy link
Contributor

Description

Use max(uint64) for OOV lexeme rank

Types of change

Bug fix.

Checklist

  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@adrianeboyd adrianeboyd added bug Bugs and behaviour differing from documentation feat / vectors Feature: Word vectors and similarity labels Apr 14, 2020
@adrianeboyd
Copy link
Contributor Author

adrianeboyd commented Apr 14, 2020

The tests will fail because the version of thinc isn't available.

Is there a better way to define OOV_RANK in a way that's fast in Lexeme and elsewhere?

I also couldn't think of a good way to write tests for this?

Requiring the updated version of thinc was unnecessary.
Define OOV_RANK in one place in `util`.
@honnibal
Copy link
Member

honnibal commented Apr 14, 2020

I think we can do:

from libc.stdint cimport UINT64_MAX

OOV_RANK = UINT64_MAX

In the test, I think we can test whether OOV_RANK is 2**64-1`? That's what it should be, in theory, and we probably want to know about it if it's not.

Switch to external defintions of max(uint64) and confirm that they are
equal.
@adrianeboyd
Copy link
Contributor Author

util isn't a great place to put it but nothing else seemed better and I was trying not to call a function if possible. If you want to call a function in the python parts, I think you may as well use numpy.iinfo(numpy.uint64).max. How about this version with no local math?

@honnibal
Copy link
Member

Okay, looks good!

@honnibal honnibal merged commit 98c5902 into explosion:master Apr 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bugs and behaviour differing from documentation feat / vectors Feature: Word vectors and similarity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants