Creating new publicsuffix file with filled suffix_list_urls #331

Open
AnnaSummer opened this issue May 8, 2024 · 4 comments

Comments

@AnnaSummer

AnnaSummer commented May 8, 2024

Hello! I create a TLDExtract object:

import tldextract

extractor = tldextract.TLDExtract(
    suffix_list_urls=["file://" + "/absolute/path/to/json/with/custom/public/suffixes"],
    cache_dir='/absolute/path/to/cache/dir',
    fallback_to_snapshot=False
)

When I call print(extractor("google.ac")), new empty (!) files (lock and json) with public suffixes are created (identified by a new hash in their names), and the new json file is then used as the public suffix file.

In the source code, in cache.py (line 108):

cache_filepath = self._key_to_cachefile_path(namespace, key)

the method _key_to_cachefile_path creates a new hash, which is used as the filename of the new public suffix file.

Is this the correct behaviour? I just need to avoid HTTP requests that update the cache of TLDs (at any time, including the first run).
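To illustrate why a new file name can appear when suffix_list_urls changes, here is a minimal sketch of a key-to-path mapping. It is not tldextract's actual code: the function name key_to_cachefile_path, the "suffixes" namespace, the choice of SHA-256, and the shape of the key are all assumptions made for illustration. The only point is that a cache keyed on its inputs produces a different file name for a different URL list.

```python
import hashlib
from pathlib import Path


def key_to_cachefile_path(cache_dir: str, namespace: str, key: dict) -> Path:
    """Hypothetical stand-in for a key-to-path mapping; not tldextract's code."""
    # Serialize the key deterministically, then hash it to get a stable file name.
    serialized = repr(sorted(key.items())).encode("utf-8")
    digest = hashlib.sha256(serialized).hexdigest()
    return Path(cache_dir) / namespace / f"{digest}.json"


# Two illustrative keys that differ only in the URL list.
default_key = {"urls": ("https://publicsuffix.org/list/public_suffix_list.dat",)}
custom_key = {"urls": ("file:///absolute/path/to/json/with/custom/public/suffixes",)}

cache_dir = "/absolute/path/to/cache/dir"
default_path = key_to_cachefile_path(cache_dir, "suffixes", default_key)
custom_path = key_to_cachefile_path(cache_dir, "suffixes", custom_key)

# Different inputs map to different cache files; the same input maps to the same file.
print(default_path.name != custom_path.name)
```

Under this assumption, a new hash in the file name is what you would expect for a new URL list; the surprising part of the report is that the file is empty, not that it is new.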

@john-kurkowski
Owner

When I call print(extractor("google.ac")), new empty (!) files (lock and json) with public suffixes are created

Hello! Are you asking if creating empty cache files is expected behavior? I would say no, that's not expected. Only the files ending in *.lock should be empty, 0 bytes.

Even if this library caches 0 public suffixes, the cache file should be a few bytes, containing the JSON [[], []]. Then calling extractor("google.ac") would raise "ValueError: No tlds set. Cannot proceed without tlds."
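One way to check which of these cases applies is to inspect the cache files directly. The sketch below uses only the standard library; the helper name describe_cache_file is hypothetical, and the assumption that a valid cache file holds a JSON array of two lists is taken from the [[], []] example above rather than from the library's documentation.

```python
import json
from pathlib import Path


def describe_cache_file(path: Path) -> str:
    """Classify a cache file as empty, malformed, or holding two suffix lists."""
    if path.stat().st_size == 0:
        return "empty (0 bytes)"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return "not valid JSON"
    # Assumed layout: a JSON array containing exactly two lists.
    if isinstance(data, list) and len(data) == 2 and all(isinstance(p, list) for p in data):
        return f"two lists with {len(data[0])} and {len(data[1])} entries"
    return "valid JSON, unexpected structure"
```

Running it over every *.json file under the cache directory (for example with Path(cache_dir).rglob("*.json")) would show whether the json file is truly 0 bytes or contains [[], []].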

@john-kurkowski
Owner

So I'm not sure what's going on in your case. As an aside, if your file is local anyway, you could try skipping the cache step. The cache won't save a ton of processing for a local file.

tldextract.TLDExtract(
    suffix_list_urls=["file://" + "/absolute/path/to/json/with/custom/public/suffixes"],
    cache_dir=None,
    fallback_to_snapshot=False
)

@AnnaSummer
Author

Thank you for the answer!
I need to avoid loading TLDs from the Internet (including on the first run of the library's methods) and load them only from a local file.
I tried creating TLDExtract with your parameters, but I get "ValueError: No tlds set. Cannot proceed without tlds."

My local file with public suffixes is not empty (there are tlds in that json).

@john-kurkowski
Owner

there are tlds in that json

That could be the problem. suffix_list_urls expects URLs that point to plaintext files formatted like the Public Suffix List. Note that the linked Public Suffix List file is not JSON.
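If the custom suffixes only exist as JSON, one option is to convert them to the plaintext layout first. The sketch below assumes the JSON file is a flat array of suffix strings such as ["com", "ac", "co.uk"]; the actual structure of the file in this issue is not shown, so adjust accordingly. The function name json_suffixes_to_psl_text is hypothetical. The output follows the Public Suffix List conventions of one rule per line and // comment lines.

```python
import json


def json_suffixes_to_psl_text(json_text: str) -> str:
    """Convert a JSON array of suffix strings to Public Suffix List style text."""
    # Assumption: the input is a flat JSON array of strings.
    suffixes = json.loads(json_text)
    lines = ["// ===BEGIN ICANN DOMAINS==="]
    # One rule per line, skipping blank entries.
    lines += [s.strip() for s in suffixes if s.strip()]
    lines.append("// ===END ICANN DOMAINS===")
    return "\n".join(lines) + "\n"


psl_text = json_suffixes_to_psl_text('["com", "ac", "co.uk"]')
print(psl_text)
```

The resulting text could be written to a local file, and that file's file:// URL passed in suffix_list_urls in place of the JSON file.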
