
Add support for list of regex in regex.json #244

Open: wants to merge 1 commit into base: main


GauriBodke

⚠ Pull Requests not made with this template will be automatically closed 🔥

Prerequisites

Why do we need this pull request?

  • The main idea is to support multiple regex patterns for the same entry. Entries in regex.json that share the same Name but have different regex patterns are combined dynamically, which keeps each individual regex easy to understand. For example, phone numbers come in several formats (10 digits, 7 digits, or 11 digits), so there can be multiple entries with the same Name and different regex patterns. The added code dynamically combines such entries with a vertical pipe (|).
  • Fixes #227: Add support for list of regex in regex.json
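The PR's combining code is not shown in this conversation, so here is only a minimal sketch of the idea described above, assuming each regex.json entry is a dict with "Name" and "Regex" keys and that the first entry for a Name supplies the shared metadata. The function name combine_by_name and the three phone-number patterns are hypothetical, not pyWhat's real code or database.

```python
import re
from collections import defaultdict


def combine_by_name(entries):
    """Merge entries that share a Name into one entry per Name.

    The first entry for each Name supplies the metadata (Description,
    Rarity, Tags, ...); the Regex values are OR-ed together with '|'.
    """
    merged = {}                   # Name -> merged entry, in first-seen order
    patterns = defaultdict(list)  # Name -> every Regex seen for that Name
    for entry in entries:
        name = entry["Name"]
        merged.setdefault(name, dict(entry))  # shallow copy: inputs stay untouched
        patterns[name].append(entry["Regex"])
    for name, entry in merged.items():
        # Non-capturing groups keep each alternative self-contained.
        entry["Regex"] = "|".join(f"(?:{p})" for p in patterns[name])
    return list(merged.values())


# Hypothetical phone-number entries: 10 digits, 7 digits, "3 digits, space, 4 digits".
entries = [
    {"Name": "Phone Number", "Regex": r"^\d{10}$"},
    {"Name": "Phone Number", "Regex": r"^\d{7}$"},
    {"Name": "Phone Number", "Regex": r"^\d{3} \d{4}$"},
]
phone = combine_by_name(entries)[0]
print(phone["Regex"])                               # (?:^\d{10}$)|(?:^\d{7}$)|(?:^\d{3} \d{4}$)
print(bool(re.search(phone["Regex"], "654 2345")))  # True
```

One caveat with joining strings this way: if the individual patterns carry their own inline global flags such as "(?i)", those flags end up in the middle of the combined pattern, which Python 3.11+ rejects with re.error ("global flags not at the start of the expression").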

What GitHub issues does this fix?

Copy / paste of output

(pyWhat) C:\Users\WIN10\OneDrive\Documents\GitHub\pyWhat>python pywhat\what.py 1234567
Matched on: 1234567
Name: Phone Number
(pyWhat) C:\Users\WIN10\OneDrive\Documents\GitHub\pyWhat>python pywhat\what.py "654 2345"
Matched on: 654 2345
Name: Phone Number
(pyWhat) C:\Users\WIN10\OneDrive\Documents\GitHub\pyWhat>python pywhat\what.py "+13528880000"
Matched on: +13528880000
Name: Phone Number
Description: Location(s): United States

Matched on: 135288800
Name: American Social Security Number
Description: An American Identification Number

Matched on: 13528880000
Name: Turkish Identification Number

@nodtem66
Contributor

@GauriBodke Good work.
However, the author has become a ghost, so please wait for some reviews.

My review

The author proposes a new way to improve the readability of regex.json.

This PR merges entries that share the same Name by joining their Regex values with '|'. As a consequence, existing Regex values that already combine sub-patterns with '|' need to be split into multiple entries.
Old regex.json

[
   {
      "Name": "TryHackMe Flag Format",
      "Regex": "(?i)^thm{.*}|tryhackme{.*}$",
      "plural_name": false,
      "Description": "Used for Capture The Flags at https://tryhackme.com",
      "Rarity": 1,
      "URL": null,
      "Tags": [
         "CTF Flag"
      ],
      "Examples": {
         "Valid": [
            "thm{hello}"
         ],
         "Invalid": []
      }
   }
]

New regex.json

[
   {
      "Name": "TryHackMe Flag Format",
      "Regex": "(?i)^thm{.*}$",
      "plural_name": false,
      "Description": "Used for Capture The Flags at https://tryhackme.com",
      "Rarity": 1,
      "URL": null,
      "Tags": [
         "CTF Flag"
      ],
      "Examples": {
         "Valid": [
            "thm{hello}"
         ],
         "Invalid": []
      }
   },
   {
      "Name": "TryHackMe Flag Format",
      "Regex": "(?i)^tryhackme{.*}$",
      "plural_name": false,
      "Description": "Used for Capture The Flags at https://tryhackme.com",
      "Rarity": 1,
      "URL": null,
      "Tags": [
         "CTF Flag"
      ],
      "Examples": {
         "Valid": [
            "tryhackme{hello}"
         ],
         "Invalid": []
      }
   }
]
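Splitting is not purely cosmetic. Because '|' has the lowest precedence in a regex, the old pattern anchors '^' only to the thm alternative and '$' only to the tryhackme alternative, while each split pattern is anchored at both ends. A small comparison, assuming plain re.search (pyWhat's own matching logic is not reproduced here):

```python
import re

# Old single pattern: '^' applies only to the first alternative, '$' only to the second.
OLD = r"(?i)^thm{.*}|tryhackme{.*}$"

# The two patterns from the split entries, each anchored at both ends.
NEW = [r"(?i)^thm{.*}$", r"(?i)^tryhackme{.*}$"]


def matches_old(text):
    return re.search(OLD, text) is not None


def matches_new(text):
    # An input matches the split version if any one of its patterns matches.
    return any(re.search(p, text) for p in NEW)


for sample in ["thm{hello}", "tryhackme{hello}", "see tryhackme{x}", "thm{a} trailing"]:
    print(f"{sample!r:22} old={matches_old(sample)} new={matches_new(sample)}")
```

The first two samples match both versions; the last two match only the old pattern, so the split entries are slightly stricter than the original one.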

This differs from the approach proposed by @amadejpapez, which uses a list as the value of the "Regex" key:

   {
      "Name": "TryHackMe Flag Format",
      "Regex": ["(?i)^thm{.*}$", "(?i)^tryhackme{.*}$"],
      "plural_name": false,
      "Description": "Used for Capture The Flags at https://tryhackme.com",
      "Rarity": 1,
      "URL": null,
      "Tags": [
         "CTF Flag"
      ],
      "Examples": {
         "Valid": [
            "thm{hello}"
         ],
         "Invalid": []
      }
   }
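Supporting the list form mostly comes down to normalising "Regex" to a list when loading and reporting an entry if any of its patterns matches. A minimal sketch, assuming the loader reads the JSON directly and matches with re.search; load_entries, identify and the "Single-Regex Entry" record are hypothetical names, not pyWhat's actual API:

```python
import json
import re


def load_entries(json_text):
    """Parse regex.json-style text into (Name, [compiled regexes]) pairs.

    "Regex" may be a single string (current format) or a list of strings
    (the proposed format); both are normalised to a list.
    """
    loaded = []
    for entry in json.loads(json_text):
        regexes = entry["Regex"]
        if isinstance(regexes, str):
            regexes = [regexes]
        # Each pattern is compiled on its own, so its "(?i)" stays at the start.
        loaded.append((entry["Name"], [re.compile(r) for r in regexes]))
    return loaded


def identify(text, loaded):
    """Return the Names of all entries with at least one regex matching text."""
    return [name for name, compiled in loaded if any(c.search(text) for c in compiled)]


SAMPLE = """[
  {"Name": "TryHackMe Flag Format",
   "Regex": ["(?i)^thm{.*}$", "(?i)^tryhackme{.*}$"]},
  {"Name": "Single-Regex Entry", "Regex": "^abc$"}
]"""

loaded = load_entries(SAMPLE)
print(identify("TryHackMe{hello}", loaded))  # ['TryHackMe Flag Format']
print(identify("abc", loaded))               # ['Single-Regex Entry']
```

Compiling each list element separately also sidesteps the inline-flag problem of joining patterns with '|': every element can keep its own leading "(?i)" without Python 3.11+ raising re.error.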

My opinion

This PR does address the readability issue. However, the main drawback is that regex.json would contain many duplicated entries that differ only in their Regex value, with Description, Rarity, Tags and the other fields repeated verbatim. This also increases the size of regex.json unnecessarily. For that reason, I prefer the approach proposed by @amadejpapez.

@amadejpapez
Collaborator

amadejpapez commented Dec 15, 2021

I agree. Making the Regex key a list, with each regex format as a separate list element, is what we want.

@GauriBodke Sorry if this wasn't clear enough from the start and if my comment with the example was too late. If you could change your PR to that, it would be amazing :)

You can take the THM regex list from my example and update PyWhat to work with it. Then I can go through the database and separate more regexes into lists.
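As a possible first pass over the database, existing Regex strings could be split mechanically on their top-level '|'. The helper below is hypothetical (not part of pyWhat) and only understands groups, character classes and backslash escapes; anything more exotic, such as verbose-mode comments, should be checked by hand. Leading global flags like "(?i)" are copied to every element so each one still compiles on its own.

```python
import re

# Leading global inline flags such as "(?i)" or "(?im)".
_GLOBAL_FLAGS = re.compile(r"\(\?[aiLmsux]+\)")


def split_alternatives(pattern):
    """Hypothetical migration helper: split a regex on its top-level '|'.

    A '|' inside a group, inside a character class, or escaped with a
    backslash is left alone.
    """
    parts, buf = [], []
    depth, in_class, i = 0, False, 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":  # copy an escape sequence verbatim
            buf.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
            buf.append(ch)
            i += 1
            # A ']' straight after '[' or '[^' is a literal, not the end of the class.
            if pattern.startswith("^", i):
                buf.append("^")
                i += 1
            if pattern.startswith("]", i):
                buf.append("]")
                i += 1
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            parts.append("".join(buf))
            buf = []
            i += 1
            continue
        buf.append(ch)
        i += 1
    parts.append("".join(buf))

    # Copy leading global flags from the first element to the others.
    flags = _GLOBAL_FLAGS.match(parts[0])
    if flags:
        prefix = flags.group()
        parts = [parts[0]] + [p if p.startswith(prefix) else prefix + p for p in parts[1:]]
    return parts


print(split_alternatives(r"(?i)^thm{.*}|tryhackme{.*}$"))
```

A mechanical split like this keeps the old matching behaviour, so the THM pattern becomes "(?i)^thm{.*}" and "(?i)tryhackme{.*}$". Adding the missing "^" and "$", as in the list example above, is a separate, deliberate change that still needs a manual review.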

Successfully merging this pull request may close these issues.

Add support for list of regex in regex.json
3 participants