Profile memory usage from processing HTML contents #2088
Conversation
Code Climate has analyzed commit 2213ba1 and detected 0 issues on this pull request. The test coverage on the diff in this pull request is 100.0% (80% is the threshold). This pull request will bring the total coverage in the repository to 94.3% (0.0% change). View more on Code Climate.
✅ Build nokogiri 1.0.685 completed (commit 794609a419 by @ashmaroli)
Please ignore the failed CI status -- that was due to my config mistake related to #2089. This PR was green and should be considered green.
Force-pushed from 8e510b7 to 2213ba1
✅ Build nokogiri 1.0.711 completed (commit cb122b5a75 by @ashmaroli)
@ashmaroli How should we evaluate the memory profiling results? I mean, you wouldn't go checking CI logs on every commit and manually comparing memory usage between commits.
@ilyazub Honestly, I was planning on doing just that — compare manually between commits of interest — say, the current … Theoretically, if there were a way to get the log for an action that ran on …
@ashmaroli Thanks for taking the time to put this together, and sorry for not commenting before now. I think there's value in being able to look at data like this, but (as @ilyazub is implying) I worry that flagging memory leaks requires more rigor than just emitting a log file. I'd ideally like to embed this approach in a CI test that emits a pass/fail status. Can you think of a way to integrate this data better into the testing cycle? I'm also curious about combining this approach with the long-defunct memory test suite (see #1603) and also incorporating testing with compaction and either valgrind or ASAN. Any thoughts about that?
For the CI integration …
Regarding #1603, we can store historical data for each test in …
@flavorjones @ashmaroli What do you think?
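One possible shape for such a pass/fail check in CI (a sketch only; the baseline source, the 10% tolerance, and both numbers are placeholders, not values from this PR):

```ruby
# Sketch of a CI memory-regression gate. Both values are placeholders:
# in CI, `baseline` would be read from stored historical data and
# `current` from the profiling step's output.
TOLERANCE = 1.10 # fail only on a >10% increase over baseline

baseline = 1_000_000 # bytes, hypothetical stored value
current  = 1_020_000 # bytes, hypothetical measured value

if current > baseline * TOLERANCE
  warn "memory regression: #{current} exceeds #{(baseline * TOLERANCE).to_i}"
  exit 1
else
  puts "memory usage within tolerance of baseline"
end
```

The key point is that the script exits non-zero on a regression, which is all a GitHub Actions step needs in order to fail the build.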
Thanks for the feedback on this idea, @flavorjones and @ilyazub. Regarding increasing the rigor of the memory profiling, we may need to look into other options.
What problem is this PR intended to solve?
This pull request adds a GitHub Actions workflow that profiles memory usage when using Nokogiri to parse and serialize HTML content.
The core of the workflow is split into two steps:
- Parsing the entire document with `Nokogiri::HTML::Document` and dumping it back into an HTML string.
- Parsing the `<body>` tag contents with `Nokogiri::HTML::DocumentFragment` and dumping it back into an HTML string.

(The profiling involves parsing and serializing the same input 1,000 times.)
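In spirit, each step reduces to a measured parse-and-serialize loop. The sketch below counts object allocations with `GC.stat` as a simple proxy for memory cost; the `html.dup` call is a hypothetical stand-in for the actual Nokogiri parse/serialize calls, used here only so the snippet runs without the gem installed:

```ruby
# Count objects allocated across N iterations of a block -- a rough
# proxy for the memory cost of the work done inside it.
def profile_allocations(iterations: 1000)
  GC.start
  before = GC.stat(:total_allocated_objects)
  iterations.times { yield }
  GC.stat(:total_allocated_objects) - before
end

html = "<html><body><p>sample</p></body></html>"

# Stand-in for: Nokogiri::HTML::Document.parse(html).to_html
allocs = profile_allocations { html.dup }
puts "objects allocated: #{allocs}"
```

Running the same input many times, as the workflow does, smooths out per-iteration noise and makes a run-to-run comparison more meaningful.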
The source material for both steps is a static HTML file (in turn sourced from the current state of https://nokogiri.org/).
Having a static fixture ensures a consistent sample across multiple runs of the workflow.
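For reference, the overall shape of such a workflow might look like the following (a hypothetical sketch; the workflow name, step names, and script paths are illustrative assumptions, not the actual files added by this PR):

```yaml
# Hypothetical sketch of a memory-profiling workflow; names and paths
# are illustrative only.
name: Profile Memory Usage
on:
  push:
    branches: [main]
jobs:
  memprof:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true
      - name: Profile HTML::Document parse + serialize
        run: bundle exec ruby scripts/memprof-document.rb
      - name: Profile HTML::DocumentFragment parse + serialize
        run: bundle exec ruby scripts/memprof-fragment.rb
```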