Getting 403 while trying to fetch html data. #1237

Closed
AlenToma opened this issue Aug 6, 2021 · 6 comments · Fixed by #1222

@AlenToma

AlenToma commented Aug 6, 2021

Reproduction

Steps to reproduce the behavior:

  1. Send a GET request to https://www.novelupdates.com/wp-admin/admin-ajax.php?action=nd_ajaxsearchmain&strType=desktop&strOne=Against%20the%20Gods&strSearchType=series
  2. The response is "please enable cookies" with status 403.
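When scripting a reproduction, it can help to tell the cookie page from step 2 apart from other failures. A minimal TypeScript sketch, assuming the blocked response is recognizable by its 403 status plus the phrase "enable cookies" in the body. `looksLikeCookieChallenge` is a hypothetical helper (not part of node-fetch), and the sample body is made up for illustration:

```typescript
// Hypothetical helper: flags a response that looks like the
// "please enable cookies" 403 page described in step 2.
// Assumption: the page text contains "enable cookies" (any casing).
function looksLikeCookieChallenge(status: number, body: string): boolean {
  return status === 403 && /enable cookies/i.test(body);
}

// Made-up sample body, used only to show the check.
const sampleBody = "<html><body>Please enable cookies.</body></html>";
console.log(looksLikeCookieChallenge(403, sampleBody)); // true
console.log(looksLikeCookieChallenge(200, sampleBody)); // false
```

In a real script the arguments would be the response's `status` and the result of `await response.text()`.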

Expected behavior
The request should work, as it does not need any cookies.
When I use the regular fetch, everything works really well and I get a response, but when I build an API and use node-fetch, the request above does not work.

Your Environment

| software | version |
| --- | --- |
| node-fetch | (not specified) |
| node | latest |
| npm | latest |
| Operating System | Windows, Visual Studio Code |
AlenToma added the bug label Aug 6, 2021
@jimmywarting
Collaborator

This started happening in v3.0.0-beta.10; beta.9 works.

There is a fix, though: #1222 solves this premature-close issue. It is just waiting for at least one more approving review so it can be merged (it is more or less the last PR needed before releasing v3 as stable).

Closing as a duplicate.

@AlenToma
Author

AlenToma commented Aug 6, 2021

Hi, I installed 3.0.0-beta.9 and the problem still exists.
Here is my package.json:

  "dependencies": {
    "@babel/core": "^7.14.8",
    "axios": "^0.21.1",
    "body-parser": "^1.19.0",
    "concurrently": "^6.2.0",
    "cors": "^2.8.5",
    "dotenv": "^10.0.0",
    "express": "^4.17.1",
    "form-data": "^4.0.0",
    "helmet": "^4.6.0",
    "node-fetch": "3.0.0-beta.9",
    "node-fetch-cookies": "^2.0.3",
    "node-html-parser": "^4.1.2"
  },

Here is my code:

    public async search(name: string) {
        try {
            const html = await HttpClient.getHtml("https://www.novelupdates.com/wp-admin/admin-ajax.php", {
                action: "nd_ajaxsearchmain",
                strType: "desktop",
                strOne: name,
                strSearchType: "series",
            });
            const container = HttpClient.parseHtml(html).querySelectorAll("a");
            // Pick the link whose highlighted label has the same trimmed length as `name`
            // and is contained in `name` (case-insensitive), then resolve its href.
            return container.find(x => {
                const label = x.querySelector(".search_hl") ?? x.querySelector("span");
                return name.trim().length == label.innerHTML.htmlText(false).trim().length &&
                    name.toLowerCase().indexOf(label.innerHTML.htmlText().toLowerCase()) != -1;
            })?.getAttribute("href")?.uri(this.root);
        } catch (error) {
            console.log(error);
            return undefined;
        }
    }
    static async getHtml(url: string, item?: Record<string, string>) {
        let container = "";
        try {
            // Append each entry of `item` to the URL as an encoded query parameter.
            if (item) {
                for (const [key, value] of Object.entries(item)) {
                    const v = encodeURIComponent(value);
                    url += url.includes("?") ? `&${key}=${v}` : `?${key}=${v}`;
                }
            }
            console.log(`Sending html request to ${url}`);
            const headers = {
                Accept: '*/*',
                'User-Agent':
                    'Mozilla/5.0 (Windows NT 10.03; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36',
            };

            const data = await httpClient.fetchWithTimeout(url, httpClient.staticOption({
                timeout: 30000,
                headers: headers,
            }));

            if (!data.ok || data.status === 1020) {
                console.log(`An error has occurred - status: ${data.status}`);
            } else {
                console.log('Data is ok. proceed to parse it');
                let html = await data.text();
                // Brackets escaped so the literal "[[class]]" and "[[id]]" markers are removed
                // (presumably the intent). Unescaped, /[[class]]/ is a character class
                // matching one of "[", "c", "l", "a", "s" followed by "]".
                html = html.replace(/<!DOCTYPE html>/g, "").replace(/\[\[class\]\]/g, "").replace(/\[\[id\]\]/g, "");
                container = html;
                console.log("Data has been parsed");
            }
        } catch (e) {
            console.log(e);
        }
        return httpClient.parseHtml(container);
    }
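The hand-written query-string loop in `getHtml` could alternatively use the WHATWG `URL` API (a global in Node), which chooses between `?` and `&` and encodes values itself. A minimal sketch; `buildUrl` is a hypothetical helper, not part of node-fetch or the original code:

```typescript
// Hypothetical helper: builds a URL with query parameters via the URL API
// instead of appending "?"/"&" by hand. `params` mirrors the `item` object
// that `search` passes to `getHtml`.
function buildUrl(base: string, params: Record<string, string>): string {
  const url = new URL(base);
  for (const [key, value] of Object.entries(params)) {
    // append() percent-encodes the value and keeps any existing query intact.
    url.searchParams.append(key, value);
  }
  return url.toString();
}

const href = buildUrl("https://www.novelupdates.com/wp-admin/admin-ajax.php", {
  action: "nd_ajaxsearchmain",
  strType: "desktop",
  strOne: "Against the Gods",
  strSearchType: "series",
});
console.log(href);
// https://www.novelupdates.com/wp-admin/admin-ajax.php?action=nd_ajaxsearchmain&strType=desktop&strOne=Against+the+Gods&strSearchType=series
```

Note that `URLSearchParams` encodes spaces as `+` rather than `%20`. PHP decodes both as a space in query strings, so this endpoint is assumed to accept either form.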

Please take some time to test it and verify whether I am wrong.

@jimmywarting
Collaborator

Hmm, I will look more closely at it over the weekend; it's quite late here now...
FYI, I tried requesting your URL with beta.10 and it didn't work. I then tried d19fdac from #1222 and it did work.

Maybe the one that worked was an earlier beta version, from before we accidentally introduced the 'premature close' problem.

@AlenToma
Author

AlenToma commented Aug 7, 2021

I have tested it with 3.0.0-beta.7, 3.0.0-beta.8, and 2.6.0. All of them show the same result.

I should point out that this is in an API project using Express, in case that behaves differently.

Could you please reopen this issue so I will know when it's fixed?

@AlenToma
Author

Cool, I will test this out as soon as a package is deployed to npm.

@AlenToma
Author

I have tested this and it's resolved in the latest version. Thanks for your hard work!
