Add functions to load buffers, streams & URLs #2051

Closed
@fb55

Description

@fb55
Member

Most users will use Cheerio with documents loaded from the web, which can lead to decoding issues; see #1785. Cheerio should provide a way to load a buffer that handles encodings properly. jsdom uses https://github.com/jsdom/whatwg-encoding for this. I have started working on a solution at https://github.com/fb55/encoding-sniffer, which will also support streams.
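
To make "properly handles encodings" concrete, here is a rough sketch of what sniffing a buffer could involve: check for a byte order mark, then look for a `<meta charset>` declaration near the top of the document, then fall back to a default. The names `sniffEncoding` and `decodeBuffer` are hypothetical and are not the API of Cheerio, whatwg-encoding, or encoding-sniffer. The 1024-byte window and the windows-1252 fallback are assumptions, and the full WHATWG algorithm has more cases than this.

```typescript
// Hypothetical helpers for illustration only; not part of Cheerio.

// Guess the encoding of raw HTML bytes (simplified, assumption-laden sketch).
function sniffEncoding(bytes: Uint8Array, fallback = "windows-1252"): string {
  // 1. A byte order mark takes precedence over any in-document declaration.
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) return "utf-16be";
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) return "utf-16le";

  // 2. Otherwise, scan the first 1024 bytes for a charset declaration.
  //    A single-byte decoder is used so that ASCII markup is preserved
  //    regardless of the real encoding.
  const head = new TextDecoder("latin1").decode(bytes.subarray(0, 1024));
  const match = /<meta[^>]*?charset\s*=\s*["']?\s*([\w-]+)/i.exec(head);

  // 3. Fall back to a default when nothing was declared.
  return match?.[1]?.toLowerCase() ?? fallback;
}

// Decode the bytes with the sniffed encoding.
function decodeBuffer(bytes: Uint8Array): string {
  let decoder: TextDecoder;
  try {
    decoder = new TextDecoder(sniffEncoding(bytes));
  } catch {
    // TextDecoder throws a RangeError for labels it does not recognise.
    decoder = new TextDecoder("windows-1252");
  }
  return decoder.decode(bytes);
}
```

The decoded string could then be passed to the existing `load` function; a `load(buffer, options)` overload would presumably do both steps in one call.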

One current user-land implementation of this is https://github.com/ktty1220/cheerio-httpcli (in Japanese).

We should add three functions for Node.js users:

  • load(buffer, options): sniffs the encoding of the passed buffer and returns the loaded document; an overload of the existing load function.
  • stream(cb, options) (see .stream(cb) method #99): returns a writable stream that will (1) sniff the encoding, (2) parse the document as chunks arrive, and (3) call the callback with a loaded Cheerio instance once the stream has ended.
    • It would be nice for the return value of stream to be both a writable stream and a promise, so that users can await the result.
    • An alternative interface might be stream(readableStream, options), which returns a promise and automatically consumes the readable stream. Note that this goes against Node.js conventions.
  • request(url, options): fetches the document at url and pipes it into stream. Returns a promise for the loaded document.
    • Not named fetch, to avoid a name collision with the official fetch API.
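
As a rough illustration of the stream(cb, options) idea, the sketch below pairs a writable stream with a promise and an optional callback. Everything here is hypothetical: `ParseFn`, `DocumentSink`, and `createDocumentSink` are made-up names, the parser is injected as a plain function so the example does not depend on Cheerio, and the encoding is passed in rather than sniffed. It also buffers the decoded text and parses once at the end, whereas the proposal is to parse as chunks arrive.

```typescript
import { Writable } from "node:stream";

// Hypothetical types and names for illustration; none of these are Cheerio APIs.
type ParseFn<T> = (html: string) => T;

interface DocumentSink<T> {
  stream: Writable; // write or pipe the raw bytes in here
  done: Promise<T>; // settles once the stream has ended
}

function createDocumentSink<T>(
  parse: ParseFn<T>,
  cb?: (err: Error | null, doc?: T) => void,
  encoding = "utf-8", // assumption: a real implementation would sniff this
): DocumentSink<T> {
  const decoder = new TextDecoder(encoding);
  let html = "";

  let resolve!: (doc: T) => void;
  let reject!: (err: Error) => void;
  const done = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  // Callback-only users never touch `done`; avoid an unhandled rejection.
  done.catch(() => {});

  const stream = new Writable({
    write(chunk: Buffer, _encoding, next) {
      // `stream: true` keeps multi-byte sequences split across chunks intact.
      html += decoder.decode(chunk, { stream: true });
      next();
    },
    final(next) {
      let doc: T;
      try {
        // Flush any bytes still held by the decoder, then parse once.
        doc = parse(html + decoder.decode());
      } catch (err) {
        next(err instanceof Error ? err : new Error(String(err)));
        return;
      }
      resolve(doc);
      cb?.(null, doc);
      next();
    },
  });

  // Stream errors (including parse failures) reach both interfaces.
  stream.on("error", (err) => {
    reject(err);
    cb?.(err);
  });

  return { stream, done };
}
```

Returning an object with both `stream` and `done` is one alternative to making the return value itself both a stream and a promise; with a real parser, usage might look like `readable.pipe(sink.stream)` followed by `await sink.done`.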

For me, the big open question is how much of this we can bring to other platforms as well. E.g., Deno users will no doubt have similar requirements.
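
One possible direction for portability is to build on web-standard APIs only (`fetch`, `Response`, `TextDecoder`), which are available in recent Node.js versions, Deno, and browsers. The sketch below is an assumption-heavy illustration and not a proposed implementation: `requestHtml`, `FetchLike`, and `charsetFromContentType` are hypothetical names, it only consults the Content-Type header (no in-document sniffing), it falls back to UTF-8 by assumption, and it returns the decoded string rather than a loaded document.

```typescript
// Hypothetical helpers for illustration; not Cheerio APIs.

// `fetchImpl` is injectable so the sketch can be exercised without network access.
type FetchLike = (url: string) => Promise<Response>;

// Extract the charset parameter from a Content-Type header, if present.
function charsetFromContentType(header: string | null): string | undefined {
  const match = /charset\s*=\s*"?([\w-]+)/i.exec(header ?? "");
  return match?.[1]?.toLowerCase();
}

// Fetch a URL and decode the body using the charset named by the server.
async function requestHtml(url: string, fetchImpl: FetchLike = fetch): Promise<string> {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }

  // Read raw bytes instead of res.text(), which always decodes as UTF-8.
  const bytes = new Uint8Array(await res.arrayBuffer());

  // Assumption: default to UTF-8 when the header names no charset.
  const encoding = charsetFromContentType(res.headers.get("content-type")) ?? "utf-8";
  return new TextDecoder(encoding).decode(bytes);
}
```

Reading the body as bytes is the important part: it leaves room to plug in proper encoding sniffing before the text ever reaches the parser.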

Activity

changed the title [-]Provide method to read buffers with unknown encodings[/-] [+]Add methods to load buffers & URLs[/+] on May 11, 2022
changed the title [-]Add methods to load buffers & URLs[/-] [+]Add methods to load buffers, streams & URLs[/+] on May 11, 2022
changed the title [-]Add methods to load buffers, streams & URLs[/-] [+]Add functions to load buffers, streams & URLs[/+] on May 11, 2022
