Extracts tld, domain, subdomains and query from URL for Python

nacholibre/url_extract

URL Extract

This module extracts the TLD, domain, subdomains and query from URLs. It also validates URLs.

Documentation: https://url-extract.readthedocs.io/en/latest/

Installation

pip install url_extract

Usage

>>> from url_extract import UrlExtract
>>> extract = UrlExtract()
Downloading list...
>>> extracted = extract.extract('http://dir.bg')
>>> extracted.getDomain()
'dir'
>>> extracted.getTld()
'bg'
>>> extracted.valid()
True
>>> extracted = extract.extract('https://sireninfo.com')
>>> extracted.getDomain()
'sireninfo'
>>> extracted = extract.extract('http://police.uk')
>>> extracted.valid()
False

Documentation

class UrlExtract(datFileMaxAge=86400*31, datFileSaveDir=None, alwaysPuny=None)

  • datFileMaxAge - Maximum age of the downloaded public suffix list, presumably in seconds (the default 86400*31 would then be 31 days).
  • datFileSaveDir - Directory where the public suffix list (tlds.dat) is saved after download.
  • alwaysPuny - If set to True, Unicode domains are Punycode-encoded after extraction.
  • extract(url) - Parses the URL and returns a Result object.
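
The two ideas behind datFileMaxAge and alwaysPuny can be illustrated with the standard library alone. This is a minimal sketch, not the module's own code: the helper names is_stale and to_punycode are hypothetical, the age is assumed to be in seconds, and the built-in idna codec is assumed to be an acceptable stand-in for whatever encoding url_extract applies.

```python
import os
import time


def is_stale(path, max_age=86400 * 31):
    """Return True if the file is missing or older than max_age seconds.

    Hypothetical helper: mirrors the idea of datFileMaxAge, not its
    actual implementation.
    """
    if not os.path.exists(path):
        return True
    return (time.time() - os.path.getmtime(path)) > max_age


def to_punycode(hostname):
    """Encode a Unicode hostname to its ASCII (Punycode) form.

    Uses Python's built-in "idna" codec; ASCII hostnames pass through
    unchanged.
    """
    return hostname.encode("idna").decode("ascii")
```

For example, to_punycode("münchen.de") gives "xn--mnchen-3ya.de", and is_stale("tlds.dat") is True whenever the file has not been downloaded yet.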

class Result()

  • getDomain() - Returns the domain name without subdomains and TLD.
  • getTld() - Returns the TLD of the domain.
  • valid() - Validates the domain and returns True or False.
  • getFoundSubdomains() - Returns the extracted subdomains as a list.
  • getHostname() - Returns the hostname of the URL.
  • getUrlQuery() - Returns the URL query (the part after the first / in the URL).
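
To make the split into subdomains, domain and TLD concrete, here is a rough sketch of suffix-based splitting. It is not the module's algorithm: split_host and SAMPLE_SUFFIXES are hypothetical names, the suffix set is a tiny hand-picked sample rather than the real public suffix list, and wildcard and exception rules are ignored. It assumes police.uk is listed as a public suffix, which would explain why valid() returns False for it in the usage session.

```python
from urllib.parse import urlsplit

# Tiny hand-picked sample (assumption); the real list has thousands of entries.
SAMPLE_SUFFIXES = {"bg", "com", "uk", "co.uk", "police.uk"}


def split_host(url, suffixes=SAMPLE_SUFFIXES):
    """Return (subdomains, domain, tld), or None if no registrable domain."""
    host = urlsplit(url).hostname
    if not host:
        return None
    labels = host.lower().split(".")
    # Try the longest candidate suffix first (i == 0 is the whole hostname).
    for i in range(len(labels)):
        tld = ".".join(labels[i:])
        if tld in suffixes:
            if i == 0:
                # The hostname is itself a public suffix: nothing to register.
                return None
            return labels[:i - 1], labels[i - 1], tld
    return None  # no known suffix matched
```

With this sample set, split_host("http://dir.bg") yields ([], "dir", "bg"), split_host("https://www.example.co.uk/path") yields (["www"], "example", "co.uk"), and split_host("http://police.uk") yields None.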
