Added an optional callback OnToken #1153

Merged: 1 commit merged on Mar 18, 2024

schaakverslaafd
Contributor

Types of Changes

Prerequisites

Please make sure you can check the following two boxes:

  • I have read the CONTRIBUTING document
  • My code follows the code style of this project

Contribution Type

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue, please reference the issue id)
  • New feature (non-breaking change which adds functionality, make sure to open an associated issue first)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • My change requires a change to the documentation
  • I have updated the documentation accordingly (NOT YET)
  • I have added tests to cover my changes
  • All new and existing tests passed

Description

Added an OnToken event to the HtmlParser that is raised whenever a new HTML token is consumed. The event passes the HTML token and the range in the source text that corresponds to that token.
I haven't updated the documentation yet. I'm providing this pull request to discuss issue #754.
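
As a rough illustration of the idea only (this is not the code in this pull request, and none of the names below are AngleSharp's API), an optional token callback can be sketched like this. `SketchTokenizer`, `TokenRange`, and the naive tag/text splitting are all hypothetical stand-ins; the point is that the callback receives the token together with its range in the source, and that an unset callback costs only a null check.

```csharp
using System;

// Hypothetical type for illustration only; not part of AngleSharp.
public readonly struct TokenRange
{
    public TokenRange(int start, int end) { Start = start; End = end; }
    public int Start { get; }   // inclusive offset into the source text
    public int End { get; }     // exclusive offset into the source text
}

// Hypothetical tokenizer; a stand-in for a real HTML tokenizer.
public sealed class SketchTokenizer
{
    // Optional callback: stays null unless a consumer assigns it.
    public Action<string, TokenRange> OnToken { get; set; }

    public void Tokenize(string source)
    {
        var start = 0;
        while (start < source.Length)
        {
            // Naive split: a token is either a "<...>" tag or the text up to the next '<'.
            int end;
            if (source[start] == '<')
            {
                var close = source.IndexOf('>', start);
                end = close < 0 ? source.Length : close + 1;
            }
            else
            {
                var open = source.IndexOf('<', start);
                end = open < 0 ? source.Length : open;
            }

            // With no subscriber, the null-conditional call skips the
            // argument evaluation as well, so only the null check remains.
            OnToken?.Invoke(source.Substring(start, end - start), new TokenRange(start, end));
            start = end;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var tokenizer = new SketchTokenizer();
        tokenizer.OnToken = (text, range) =>
            Console.WriteLine($"[{range.Start}, {range.End}) {text}");
        tokenizer.Tokenize("<p>Hello</p>");
    }
}
```

Run against `<p>Hello</p>`, this sketch reports three tokens (`<p>`, `Hello`, `</p>`) with the half-open ranges [0, 3), [3, 8), and [8, 12).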

@FlorianRappl
Contributor

Alright, we definitely need to see two benchmarks here (on a large HTML document):

  1. Old code (no modification)
  2. With OnToken (no callback needs to be provided; performance should remain roughly the same when no OnToken callback is supplied)

Could this be done? Thanks for your efforts!

@FlorianRappl added the "pending" label (the issue is still pending and waiting for OP response) on Dec 22, 2023
@schaakverslaafd
Contributor Author

Hi,

Sorry for the late response.

Here are two benchmarks from parsing page.html from the benchmark project.

Benchmark for old code
[Image: Benchmark-OldCode]

Benchmark with OnToken
[Image: BenchMark-OnToken]

using System.IO;
using AngleSharp.Html.Parser;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;

namespace AngleSharp.Benchmarks
{
    [MemoryDiagnoser, GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByParams), ShortRunJob]
    public class OnTokenBenchmark
    {
        private static readonly HtmlParser angleSharpParser = new HtmlParser();
        private string pageContent = "";

        [GlobalSetup]
        public void GlobalSetup()
        {
            pageContent = File.ReadAllText("page.html");
        }

        [Benchmark]
        public void AngleSharp()
        {
            angleSharpParser.ParseDocument(pageContent);
        }
    }
}

@FlorianRappl
Contributor

Great, thanks for the efforts @schaakverslaafd - can you update your branch (sync with devel)? Then we are ready to merge!

@FlorianRappl added the "enhancement" and "api" labels and removed the "pending" label (the issue is still pending and waiting for OP response) on Mar 16, 2024
@FlorianRappl added this to the 1.2.0 milestone on Mar 16, 2024
@schaakverslaafd force-pushed the feature/#754 branch 5 times, most recently from 93d9e39 to 3192102, on March 17, 2024 08:42
@schaakverslaafd
Contributor Author

Great!
I merged the branches. My editor messed up the spacing a bit. Let me know if that's an issue.

(The original solution had to be changed a bit because StructHtmlToken is passed by ref. New benchmarks did not show a significant difference.)

@schaakverslaafd
Contributor Author

schaakverslaafd commented Mar 18, 2024

Alright! The whitespace changes have been removed.

Contributor

@FlorianRappl FlorianRappl left a comment

LGTM - thanks for all your efforts 🚀 !

@FlorianRappl merged commit aa35a48 into AngleSharp:devel on Mar 18, 2024
5 checks passed

2 participants