Module do not work anymore #155

Closed
FloRRenn opened this issue May 1, 2022 · 7 comments
Labels
bug Something isn't working Server Issues caused by the backend or server-related downtime.

Comments

@FloRRenn
FloRRenn commented May 1, 2022

I got an error like this when I tried to execute my code:

Exception has occurred: RetryError
HTTPSConnectionPool(host='nhentai.net', port=443): Max retries exceeded with url: /random (Caused by ResponseError('too many 503 error responses'))

This is my code:

from hentai import Hentai, Format

doujin = Hentai(177013)
print(doujin.url)
@FloRRenn FloRRenn added the bug Something isn't working label May 1, 2022
@hentai-chan
Owner

hentai-chan commented May 1, 2022

Hi, thanks for filing this issue. Can you still access the website through the browser?

Edit: I have now received a couple of reports by mail describing something similar, so this is a confirmed bug. Instead of responding to each and every one of them, I will try to make sense of it here, but at the end of the day there's only so much I can do since I'm not in control of the backend. If we're lucky, this may only be a temporary issue. Also, if anyone knows how to reach the site admin, that would be great. Ideally, I want to find a solution that works for everyone.

@hentai-chan hentai-chan pinned this issue May 1, 2022
@hentai-chan hentai-chan added the Server Issues caused by the backend or server-related downtime. label May 1, 2022
@hentai-chan
Owner


tl;dr nothing I can fix


After digging into the issue I found out that nhentai added Cloudflare protection to their website. This is not limited to their API: when you open https://nhentai.net for the first time, you will see the Cloudflare check for a few seconds. Pretending to be someone else with a fake UA string or a full set of request headers won't do much here either, since that's not the only thing Cloudflare checks. By the way, we are not the only people facing this issue; pretty much every service that consumes this API will experience something similar [1] [2]. After some more research I tried a few libraries that promise to circumvent Cloudflare protection,

#!/usr/bin/env python3

import cloudscraper

settings = { "browser": "chrome", "platform": "android", "desktop": False }
scraper = cloudscraper.create_scraper(browser = settings)
response = scraper.get("https://nhentai.net/api/gallery/177013").text

print(response) # nope

but even if it had worked, I would not have been fond of this solution. In my opinion, playing this game of tricking Cloudflare is a lost cause because the site admin can decide to shut down the API at any time. According to other developers, this protection may be turned off again after a few days. I think the best solution would be to add OAuth2 to the API in combination with a rate limit in order to combat malicious behavior such as scraping the entire catalog, but this is a process the site admin would have to kick off. I think they mentioned a few years ago that they had plans to add authentication to their API, but they never got around to implementing it. In the past, the API has also been shut down occasionally.

Long story short, while there probably are short-term solutions to this problem [3], I don't think it's worth the time and effort if they don't address the underlying issue. Most if not all language bindings have the same problem, but I will keep an eye out to see whether the protection gets turned off again or somebody else finds a good solution.

[1] DiamondMiner88/nhentai#14
[2] SylveonDeko/NHentaiAPI#12
[3] https://github.com/FlareSolverr/FlareSolverr

PS: I don't think many users of this library are aware of this, but there is an alternative way to instantiate a Hentai object, namely by passing the JSON data to the constructor:

from hentai import Hentai

response: dict = None # find a way to obtain the JSON data from the /api/gallery/id endpoint
doujin = Hentai(json=response)

This feature was originally implemented as a form of caching: if you stored the JSON with the export method using options=[Option.Raw], you could regain access to the properties without making unnecessary requests, which in turn makes parsing this data easy again.
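To illustrate the caching idea, here is a minimal sketch that reads gallery JSON from a local file and hands it to the constructor. The helper names load_gallery_json and hentai_from_file are hypothetical and not part of the library. The code also assumes that a raw export may wrap the gallery in a list, so it unwraps the first element if it finds one. Adjust this to whatever your saved file actually looks like.

```python
import json
from pathlib import Path


def load_gallery_json(path):
    """Return a single gallery dict read from a previously saved JSON file.

    Hypothetical helper, not part of the hentai library.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: a raw export may store a list of galleries; use the first one.
    if isinstance(data, list):
        if not data:
            raise ValueError(f"{path} contains no gallery data")
        data = data[0]
    if not isinstance(data, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return data


def hentai_from_file(path):
    """Build a Hentai object from cached JSON instead of a gallery ID."""
    # Imported lazily so the loader above works without the package installed.
    from hentai import Hentai

    return Hentai(json=load_gallery_json(path))
```

With a file saved earlier (for example while the API was still reachable), hentai_from_file("177013.json") should give you back an object whose properties are read from the cached data rather than from the API.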

@FloRRenn
Author

FloRRenn commented May 1, 2022

Thanks for your support :))

@FloRRenn FloRRenn closed this as completed May 1, 2022
@Zack-Bloodshot
Zack-Bloodshot commented May 3, 2022

Now it's working, everyone! Enjoy!
Again...

@defnotanalt

Assuming we're still broken and just waiting on a turn-around basically?

@Zack-Bloodshot
Zack-Bloodshot commented May 30, 2022

Yep, I don't know if they are rate limiting or blocking the requests. Once in a blue moon it works, and other modules are using cookies to make it work.

@hentai-chan
Owner

hentai-chan commented May 30, 2022 via email
