Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: UPS tracking numbers #228

Merged
merged 3 commits into from Nov 7, 2021
Merged

Conversation

P403n1x87
Copy link
Contributor

@P403n1x87 P403n1x87 commented Nov 4, 2021

⚠ Pull Requests not made with this template will be automatically closed 🔥

Prerequisites

Why do we need this pull request?

  • This PR adds the regex for UPS tracking numbers

What GitHub issues does this fix?

N. A.

Copy / paste of output

pywhat 1Z123CAB9912345678    
Matched on: 1Z123CAB9912345678
Name: UPS Tracking Number
Link:  https://www.ups.com/track?tracknum=1Z123CAB9912345678

tests/test_regex_identifier.py Outdated Show resolved Hide resolved
"Regex": "^(1Z[0-9A-Z]{6}[0-9]{2}[0-9]{8})$",
"plural_name": false,
"Description": null,
"Rarity": 1,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that rarity should be lowered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions? 🙂

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something around 0.3

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 0.3? I'd say higher like 0.5 or 0.6 because:

  1. The string has to start with 1Z
  2. It needs 7 chars 0-9A-Z
  3. It has exactly 2 numbers
  4. It has 8 numbers

Also, can we make it:

- ^(1Z[0-9A-Z]{6}[0-9]{2}[0-9]{8})$
- + ^(1Z[0-9A-Z]{6}[0-9]{10})$

?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 0.4 or 0.5. And yes, regex should be changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of the 2+8 split is because the first 2 digits in this group represent a service indicator code and perhaps it could be captured and handled in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: I wonder if the "rarity" could be estimated more reliably through some entropy-based measure 🤔

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

service indicator code

We have precedence for this called sub-categories. See the Mastercard / Phone Numbers regex. I am not sure it'll work on data in the middle of the regex, we may need to change the code for that :)

Aside: I wonder if the "rarity" could be estimated more reliably through some entropy-based measure 🤔

Probably! Currently I am estimating it based on what I see when people post this:
Screenshot 2021-11-04 at 16 09 24

And also whether we have any false positives.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@P403n1x87 You can use subcategories with regex method for that.

pywhat/Data/regex.json Outdated Show resolved Hide resolved
pywhat/Data/regex.json Outdated Show resolved Hide resolved
pywhat/Data/regex.json Outdated Show resolved Hide resolved
@P403n1x87 P403n1x87 force-pushed the feat/ups-tracking branch 3 times, most recently from 14b9fcc to 45d37fe Compare November 7, 2021 11:02
@P403n1x87 P403n1x87 requested review from bee-san and a user November 7, 2021 11:03
@codecov-commenter
Copy link

codecov-commenter commented Nov 7, 2021

Codecov Report

Merging #228 (62b3df8) into main (a5a4a3b) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #228   +/-   ##
=======================================
  Coverage   92.60%   92.60%           
=======================================
  Files          15       15           
  Lines        1217     1217           
=======================================
  Hits         1127     1127           
  Misses         90       90           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a5a4a3b...62b3df8. Read the comment docs.

pywhat/Data/regex.json Outdated Show resolved Hide resolved
Co-authored-by: piatrashkakanstantinass <74979584+piatrashkakanstantinass@users.noreply.github.com>
@P403n1x87 P403n1x87 requested a review from a user November 7, 2021 11:27
@bee-san bee-san merged commit 559a89c into bee-san:main Nov 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants