SanitizeReaderToWriter is REALLY slow #134

Closed
natefinch opened this issue Nov 3, 2021 · 2 comments

@natefinch

I haven't looked into this too deeply, but I went straight for this method because I assumed it would be better than reading the whole HTML body into memory first. All I did was this:

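package main

import (
    "os"

    "github.com/microcosm-cc/bluemonday"
)

func main() {
    // Open the HTML file to sanitize.
    in, err := os.Open("input.html")
    if err != nil {
        panic(err)
    }
    defer in.Close()

    // Create the file that receives the sanitized HTML.
    out, err := os.Create("output.html")
    if err != nil {
        panic(err)
    }
    defer out.Close()

    // Stream straight from the input file to the output file.
    p := bluemonday.UGCPolicy()
    err = p.SanitizeReaderToWriter(in, out)
    if err != nil {
        panic(err)
    }
}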

The difference between the above and reading the input into memory, sanitizing into a buffer, and then writing the buffer to disk is HUGE. For a large (6 MB) HTML file on my 2021 MacBook, SanitizeReaderToWriter took 4.5s, and Sanitize took 0.15s.

I haven't dug into the cause, and I get that there could be some I/O buffering issue with reading and writing directly to disk, but even then, the fact that it's 30x slower seems really weird.
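
As a rough illustration of the buffering guess above (this is not bluemonday's code, and it is not a confirmed cause of the slowdown), the sketch below counts how many Write calls reach the destination when many small pieces are written directly versus through a bufio.Writer. The names countingWriter, emitTokens, and underlyingWrites are hypothetical, and countingWriter only stands in for a file, where each Write would be a separate system call.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter is a hypothetical stand-in for an *os.File. It records how
// many Write calls it receives and how many bytes they carried.
type countingWriter struct {
	calls int
	bytes int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	c.bytes += len(p)
	return len(p), nil
}

// emitTokens imitates (as an assumption) a token-by-token sanitizer that
// writes n small fragments to w.
func emitTokens(w io.Writer, n int) error {
	for i := 0; i < n; i++ {
		if _, err := io.WriteString(w, "<p>x</p>"); err != nil {
			return err
		}
	}
	return nil
}

// underlyingWrites reports how many Write calls and bytes reach the
// destination, with or without a bufio.Writer in front of it.
func underlyingWrites(n int, buffered bool) (calls, nbytes int, err error) {
	dst := &countingWriter{}
	if !buffered {
		err = emitTokens(dst, n)
		return dst.calls, dst.bytes, err
	}
	bw := bufio.NewWriter(dst)
	if err = emitTokens(bw, n); err != nil {
		return dst.calls, dst.bytes, err
	}
	// Flush pushes whatever is still sitting in the buffer to dst.
	err = bw.Flush()
	return dst.calls, dst.bytes, err
}

func main() {
	for _, buffered := range []bool{false, true} {
		calls, nbytes, err := underlyingWrites(1000, buffered)
		if err != nil {
			panic(err)
		}
		fmt.Printf("buffered=%v: %d bytes in %d Write calls\n", buffered, nbytes, calls)
	}
}
```

If that turns out to be the cause here, wrapping the files with bufio.NewReader(in) and bufio.NewWriter(out) before calling SanitizeReaderToWriter, and calling Flush on the writer afterwards, would be a cheap thing to try.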

@PaperPrototype

yeah...

@buro9 closed this as completed in 9ef01f7 on Jul 1, 2022
@buro9
Member

buro9 commented Jul 1, 2022

I'm confused by this because it's all the same under the hood:

func (p *Policy) sanitize(r io.Reader, w io.Writer) error {

SanitizeReaderToWriter is in fact the only one that goes directly: https://github.com/microcosm-cc/bluemonday/blob/main/sanitize.go#L68-L96

I found this puzzling, so I've added a test:

: go test -run TestIssue134 -v
=== RUN   TestIssue134
=== RUN   TestIssue134/Sanitize
=== RUN   TestIssue134/SanitizeReader
=== RUN   TestIssue134/SanitizeBytes
=== RUN   TestIssue134/SanitizeReaderToWriter
--- PASS: TestIssue134 (0.00s)
    --- PASS: TestIssue134/Sanitize (0.00s)
    --- PASS: TestIssue134/SanitizeReader (0.00s)
    --- PASS: TestIssue134/SanitizeBytes (0.00s)
    --- PASS: TestIssue134/SanitizeReaderToWriter (0.00s)
PASS
ok      github.com/microcosm-cc/bluemonday      0.002s

Please re-open if you can reproduce this with a test showing that the method is slower than the others.
