Worry the checker is actually downloading the content at the checked URL #106

markcmiller86 · 2024-02-01T05:56:32Z

For some reason, this goes pretty slowly. I am working from this document and it takes quite a while to complete a check. Next, I notice that on .pdf files, it stalls for longer, especially the one at the ftp link.

So, this has me worried that it is actually fully getting the content to check the link. I've seen similar in the Sphinx URL checking feature too. It should really just be get a header at each URL and not the full content.

Is this something you've looked into?

The text was updated successfully, but these errors were encountered:

vsoch · 2024-02-01T06:00:25Z

We should do head instead of get, I agree. We haven't looked into it but can.

vsoch · 2024-02-01T06:00:49Z

Would you like me to update the branch we are working on to try it out?

markcmiller86 · 2024-02-01T06:55:22Z

ChatGPT suggests something along the lines of

import requests

def is_url_working(url):
    try:
        response = requests.head(url)
        # You might also want to check for redirects (response.status_code == 302)
        if response.status_code == 200:
            return True
        elif response.status_code == 405: # head requests disallowed
            retval = False
            response = requests.get(url, stream=True)
            if response.status_code == 200:
                retval = True
            response.close()  # Make sure to close the response
            return retval
    except requests.RequestException as e:
        print(f"An error occurred: {e}")
        return False

# Example usage
url = "http://example.com"
if is_url_working(url):
    print(f"The URL {url} is working.")
else:
    print(f"The URL {url} is not working.")

vsoch · 2024-02-01T07:30:52Z

Oh geez chatGPT? 🙃 The first lines I see issues:

follow redirects is a parameter on head. https://requests.readthedocs.io/en/latest/user/quickstart/#redirection-and-history

The retval and response.close don’t make sense.

I appreciate the suggestion but I don’t think the quality of code from AI tools is very good. It’s mostly copy pasting some poor souls code from somewhere else on GitHub. I’m happy to write this with my own knowledge and careful inspection of core docs and library code to get the functionality I want.

vsoch · 2024-02-01T07:40:40Z

But I have to take it back - it does look like response.close() is useful for requests.get() ! Geez, I've been writing in Python a long time and I just don't see it very often. So I learned something from ChatGPT! I appreciate the post, and I'll try to be more open minded about it (even if I don't use it)!

markcmiller86 · 2024-02-01T17:54:06Z

I'll try to be more open minded about it (even if I don't use it)!

So, I just happened to see this Q&A with Linus Torvalds about AI tools in coding...

vsoch · 2024-02-01T18:32:40Z

haha I totally watched that! I'll watch again tonight with new context.

markcmiller86 · 2024-02-03T18:04:30Z

So, the more I think about this, the more I think HEAD requests are the most likely to be useful for links to non .html content (e.g. .pdf or other binary content a link references. I think for most .html content, the cost to download is likely minimal.

vsoch · 2024-02-03T18:07:06Z

Agree! Let me work hard today (just presented at FOSDEM) and maybe I can do some work on this later if I'm productive!

markcmiller86 mentioned this issue Feb 2, 2024

URL testing in wikize_refs betterscientificsoftware/bssw.io#1990

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Worry the checker is actually downloading the content at the checked URL #106

Worry the checker is actually downloading the content at the checked URL #106

markcmiller86 commented Feb 1, 2024

vsoch commented Feb 1, 2024

vsoch commented Feb 1, 2024

markcmiller86 commented Feb 1, 2024

vsoch commented Feb 1, 2024 •

edited

vsoch commented Feb 1, 2024

markcmiller86 commented Feb 1, 2024

vsoch commented Feb 1, 2024

markcmiller86 commented Feb 3, 2024

vsoch commented Feb 3, 2024

Worry the checker is actually downloading the content at the checked URL #106

Worry the checker is actually downloading the content at the checked URL #106

Comments

markcmiller86 commented Feb 1, 2024

vsoch commented Feb 1, 2024

vsoch commented Feb 1, 2024

markcmiller86 commented Feb 1, 2024

vsoch commented Feb 1, 2024 • edited

vsoch commented Feb 1, 2024

markcmiller86 commented Feb 1, 2024

vsoch commented Feb 1, 2024

markcmiller86 commented Feb 3, 2024

vsoch commented Feb 3, 2024

vsoch commented Feb 1, 2024 •

edited