piatrashkakanstantinass edited this page Aug 23, 2021 · 23 revisions

PyWhat has its own API, which returns a JSON-like Python dictionary:

{
    "File Signatures": None,
    "Regexes": {
        "text": [
            {
                "Matched": "127.0.0.1",
                "Regex Pattern": {
                    "Name": "Internet Protocol (IP) Address Version 4",
                    "Regex": "^((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?::[0-9]{1,5})?)$",
                    "plural_name": False,
                    "Description": None,
                    "Rarity": 0.7,
                    "URL": "https://www.shodan.io/host/",
                    "Tags": [
                        "Identifiers",
                        "Networking",
                        "IPv4"
                    ],
                    "Boundaryless Regex": "((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?::[0-9]{1,5})?)"
                }
            }
        ]
    }
}
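Because the result is an ordinary nested dictionary, it can be walked with plain Python. The sketch below hard-codes a trimmed copy of the output above instead of calling PyWhat, and `summarize_matches` is a hypothetical helper name, not part of the library.

```python
# Trimmed copy of the result shown above; a real identify() call would
# return this dictionary instead of it being hard-coded here.
result = {
    "File Signatures": None,
    "Regexes": {
        "text": [
            {
                "Matched": "127.0.0.1",
                "Regex Pattern": {
                    "Name": "Internet Protocol (IP) Address Version 4",
                    "Rarity": 0.7,
                    "Tags": ["Identifiers", "Networking", "IPv4"],
                },
            }
        ]
    },
}


def summarize_matches(result):
    """Return (source, matched string, pattern name) for every match."""
    rows = []
    # Assumption: "Regexes" may be missing or None when nothing matched.
    for source, matches in (result.get("Regexes") or {}).items():
        for match in matches:
            rows.append((source, match["Matched"], match["Regex Pattern"]["Name"]))
    return rows


for source, matched, name in summarize_matches(result):
    print(f"{source}: {matched} -> {name}")
```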

To use this API, run this code:

from pywhat import Identifier
id = Identifier()
id.identify(text)

Identifier.identify() parameters

All parameters to identify() are keyword-only except the text itself.

id.identify(text,
            only_text=True,        # If this is True, PyWhat will not read data from the file
            dist=None,             # Distribution to use (see below for more info regarding Distributions)
            key=None,              # Key used for sorting, defaults to Keys.NONE (see below for more info regarding sorting)
            reverse=False,         # If this is True, the output is sorted in descending order
            boundaryless=None,     # Filter that defines what regexes should be boundaryless (see below for more info regarding boundaryless mode)
            search_filenames=False # If this is True, PyWhat will search the name of a file for identifiable info
)

Filters & Distributions

PyWhat has its own filtration system. At its core is the Filter class.

To control which regexes are used or shown, we can use distributions. A distribution is a filter together with a list of regexes.

A nice use case is WannaCry. Using distributions, you can get only the domains from the malware (no crypto addresses) and use them to auto-buy those domains if possible, potentially stopping the malware if it has a built-in kill switch!

We start by importing the necessary libraries:

from pywhat import pywhat_tags, Distribution, Filter

Now we can make a filter and a distribution:

filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]})
dist = Distribution(filter1)

We only support:

  • MinRarity. Rarity measures how unlikely a match is to be a false positive. A rarity of 1 means the match cannot be a false positive; a rarity of 0.1 means it is very likely to be one. MinRarity is the lowest rarity you want to see. Raise it to avoid false positives!

  • MaxRarity. The highest rarity you want to see.

  • Tags. Every regex is tagged. To use only the AWS-specific regexes, pass AWS as the tag.

To see all tags, run what --tags 😄 or:

from pywhat import pywhat_tags
print(pywhat_tags)

  • ExcludeTags. The tags you do not want to see.
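To make the four keys concrete, here is a toy check written in plain Python. It is one plausible reading of the descriptions above, not PyWhat's implementation: `passes_filter`, the default bounds of 0.0 and 1.0, and the sample patterns are all assumptions made for illustration.

```python
def passes_filter(pattern, spec):
    """Toy check of one pattern against a filter-style dictionary."""
    rarity = pattern["Rarity"]
    tags = set(pattern["Tags"])
    # Placeholder defaults (assumption): accept any rarity from 0.0 to 1.0.
    if rarity < spec.get("MinRarity", 0.0) or rarity > spec.get("MaxRarity", 1.0):
        return False
    # When "Tags" is given, the pattern needs at least one of them.
    if "Tags" in spec and not tags & set(spec["Tags"]):
        return False
    # Any overlap with "ExcludeTags" rejects the pattern.
    return not tags & set(spec.get("ExcludeTags", []))


# Made-up patterns for illustration only.
patterns = [
    {"Name": "IPv4", "Rarity": 0.7, "Tags": ["Identifiers", "Networking"]},
    {"Name": "URL", "Rarity": 0.5, "Tags": ["Networking"]},
    {"Name": "Noisy", "Rarity": 0.1, "Tags": ["Networking"]},
]
spec = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}

kept = [p["Name"] for p in patterns if passes_filter(p, spec)]
print(kept)
```

In this toy model, "IPv4" is dropped because it carries an excluded tag and "Noisy" is dropped because its rarity is below MinRarity, so only "URL" is kept.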

Let's make another filter:

from pywhat import pywhat_tags, Distribution, Filter

filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]})
filter2 = Filter({"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]})

Logical Operators

Distributions and Filters support logical operators! Want every tag that's in both filter1 and filter2?

from pywhat import Distribution, Filter, Identifier

filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]})
filter2 = Filter({"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]})

# Combine the two filters with a logical AND before building the distribution
dist = Distribution(filter1 & filter2)

r = Identifier(dist=dist)
r.identify(text)

Or:

from pywhat import Distribution, Filter, Identifier

filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]})
filter2 = Filter({"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]})

# Build one distribution, then AND it in place with a second one
dist = Distribution(filter1)
dist &= Distribution(filter2)

r = Identifier(dist=dist)
r.identify(text)

We also support logical OR! Get all the items that are in either distribution:

from pywhat import Distribution, Filter, Identifier

filter1 = Filter({"MinRarity": 0.3, "Tags": ["Networking", "AWS"], "ExcludeTags": ["Identifiers"]})
filter2 = Filter({"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]})
filter3 = Filter({"ExcludeTags": ["AWS"]})

# OR two distributions together, then OR a third one in place
dist = Distribution(filter1) | Distribution(filter2)
dist |= Distribution(filter3)

r = Identifier(dist=dist)
r.identify(text)
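One way to picture these operators is to treat each distribution as a set of selected patterns, so that `&` behaves like an intersection and `|` like a union. This is only a mental model under that assumption, not a description of PyWhat's internals, and the pattern names below are made up.

```python
# Hypothetical pattern names selected by two distributions.
selected_by_dist1 = {"IPv4", "URL", "MAC Address"}
selected_by_dist2 = {"URL", "Email Address"}

# Analogous to dist1 & dist2: only what both selections contain.
both = selected_by_dist1 & selected_by_dist2

# Analogous to dist1 | dist2: everything either selection contains.
either = selected_by_dist1 | selected_by_dist2

print(sorted(both))
print(sorted(either))
```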

Using Distributions and Identifier

There are two ways to use distributions with identifiers.

You can assign one per object:

r = Identifier(dist=dist)
r.identify(text)

Or you can pass one for a single identify() call:

no_networking_tags = Distribution(filter2)
r.identify(text, dist=no_networking_tags)

To get more information use:

from pywhat import *
help(Filter)
help(Distribution)

Sorting

PyWhat supports sorting. You can get sorted output this way:

from pywhat import *
r = Identifier()
r.identify(text, key=Keys.RARITY) # returns matches sorted by rarity in ascending order
r2 = Identifier(key=Keys.MATCHED, reverse=True)
r2.identify(text) # returns matches sorted alphabetically in descending order

Available keys

Keys.NAME # Sort by the name of regex pattern
Keys.RARITY # Sort by rarity
Keys.MATCHED # Sort by a matched string
Keys.NONE # No sorting is done (the default)
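For intuition, the same orderings can be reproduced by hand on a list of matches shaped like the entries under "Regexes". The sketch below does not call PyWhat: the `sort_keys` dictionary is a hypothetical stand-in for the keys listed above, and the matches are made up.

```python
# Made-up matches in the same shape as the entries under "Regexes".
matches = [
    {"Matched": "b@example.com", "Regex Pattern": {"Name": "Email Address", "Rarity": 0.5}},
    {"Matched": "127.0.0.1", "Regex Pattern": {"Name": "IPv4", "Rarity": 0.7}},
    {"Matched": "a.example.com", "Regex Pattern": {"Name": "URL", "Rarity": 0.3}},
]

# Hypothetical key functions mirroring the key names listed above.
sort_keys = {
    "NAME": lambda m: m["Regex Pattern"]["Name"],
    "RARITY": lambda m: m["Regex Pattern"]["Rarity"],
    "MATCHED": lambda m: m["Matched"],
}

# Ascending by rarity, then descending by matched string.
by_rarity = sorted(matches, key=sort_keys["RARITY"])
by_matched_desc = sorted(matches, key=sort_keys["MATCHED"], reverse=True)

print([m["Matched"] for m in by_rarity])
print([m["Matched"] for m in by_matched_desc])
```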

Searching within files and folders

PyWhat can check whether the input is a valid file or folder name, or a path to a file. If the input is a folder, PyWhat searches it recursively and returns the matches for each file, keyed by filename. When PyWhat searches only text, the key is text. This behaviour is disabled in the API. To search within files and folders, pass only_text=False.

out = r.identify("/Desktop/file.txt", only_text=False)

File searching is enabled in the CLI. To disable it, pass the -o or --only-text option.
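When a folder is searched, the "Regexes" part of the result has one key per file rather than the single text key. The sketch below groups matched strings by file using a hard-coded result. The file names and matches are made up, and the exact form of the keys (bare filename or path) is an assumption.

```python
# Hypothetical result of a folder search; file names and matches are made up.
result = {
    "File Signatures": None,
    "Regexes": {
        "notes.txt": [{"Matched": "127.0.0.1"}],
        "logs/app.log": [{"Matched": "10.0.0.1"}, {"Matched": "10.0.0.2"}],
    },
}

# Collect the matched strings for each file.
matches_per_file = {
    filename: [m["Matched"] for m in matches]
    for filename, matches in result["Regexes"].items()
}

for filename, matched in matches_per_file.items():
    print(f"{filename}: {matched}")
```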

Boundaryless mode

The API does not match inputs like "abcthm{kgh}jk" because boundaryless mode is disabled by default. Boundaryless mode allows regexes to search within strings (for "abcthm{kgh}jk", PyWhat can find the "thm{kgh}" match). To enable it, create a filter denoting which regexes should be in boundaryless mode (see above for more on the filtration system).

from pywhat import *

# All regexes that have 'Identifiers' or 'Cyber Security' tags and a rarity of 0.6 or higher will be in boundaryless mode.
boundaryless = Filter({"Tags": ["Identifiers", "Cyber Security"], "MinRarity": 0.6}) 

id = Identifier()
id.identify("abcthm{kgh}jk", boundaryless=boundaryless)
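The difference between the two modes can be seen with Python's `re` module alone. The sketch below rebuilds the "Regex" and "Boundaryless Regex" patterns from the IPv4 entry in the sample output at the top of this page and runs both against a string with surrounding characters. How PyWhat applies these patterns internally is not shown here, and the sample string is made up.

```python
import re

# One octet of an IPv4 address, as it appears in the sample output.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# "Boundaryless Regex": no anchors, so it can match inside a longer string.
boundaryless = "((?:" + OCTET + r"\.){3}" + OCTET + r"(?::[0-9]{1,5})?)"

# "Regex": the same pattern anchored to the start and end of the input.
anchored = "^" + boundaryless + "$"

embedded = "ip=127.0.0.1;"

anchored_hit = re.search(anchored, embedded)
boundaryless_hit = re.search(boundaryless, embedded)

print(anchored_hit)                # None: extra characters surround the address
print(boundaryless_hit.group(1))   # the address found inside the string
```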