Discard search results if no nodes are found #1311

bno1 · 2023-05-17T22:12:34Z

The docs are not very clear on this, but I think the search results should be discarded if getSearchResults is not going to be called in order to free up the result in Chrome.

kenshaw · 2023-05-17T22:24:59Z

@ZekeLu your thoughts on this one?

bno1 · 2023-05-17T23:01:52Z

I did a small test and it does seem that discarding the results leads to slightly less memory usage after many (100K) searches:

Test code

package main

import (
	"context"
	"log"
	"os"
	"time"

	"github.com/chromedp/cdproto/dom"
	"github.com/chromedp/chromedp"
)

func main() {
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()

	doDiscard := false

	if len(os.Args) > 1 {
		doDiscard = true
	}

	const STEPS = 100000

	log.Printf("Running %v steps with doDiscard=%v", STEPS, doDiscard)

	err := chromedp.Run(
		ctx,
		chromedp.Navigate("https://google.com"),
		chromedp.ActionFunc(func(ctx context.Context) error {
			for i := 0; i < STEPS; i++ {
				id, count, err := dom.PerformSearch("#foobarbaz").Do(ctx)
				if err != nil {
					return err
				}

				if count < 1 {
					if doDiscard {
						err = dom.DiscardSearchResults(id).Do(ctx)
						if err != nil {
							return err
						}
					}

					continue
				}

				nodes, err := dom.GetSearchResults(id, 0, count).Do(ctx)
				if err != nil {
					return err
				}

				log.Println(nodes)
			}

			return nil
		}),
	)
	if err != nil {
		panic(err)
	}

	log.Println("Done, sleeping...")

	time.Sleep(1 * time.Hour)
}

No discard:

With discard:

ZekeLu · 2023-05-17T23:07:13Z

This is a good catch!

I just checked how the DevTools handle this command. It turns out that it always calls Dom.discardSearchResults. I think we should do the same thing here (that means we should call it when count > 0 too). And we have better add a test case for it.

kenshaw · 2023-05-17T23:14:59Z

When I originally wrote the query selectors, I purposefully did not discard the nodes after use; this is partially an issue with being able to maintain a somewhat "linear" logic. I'm not sure how this could be reworked without something like a chromedp.Discard() action that the user would have to add at the end of every action chain. The issue there is then that you couldn't break up actions. If anyone has thoughts on an API design that would work, I'm all for it. I realize chromedp is now used in much more complex projects, typically how I designed (and use) it for is to run a browser, extract some data, close the browser.

When I wrote the original code, it was with that mechanism in mind, so discarding the nodes seemed to not matter. Closing the search results when there are no results seems logically fine.

ZekeLu · 2023-05-17T23:15:21Z

Regarding the unit test, I'm thinking sending a DOM.getSearchResults command after the search is discarded, and it should result in an error (I'm not sure yet). But the searchId is not exported, it seems there is no way to test it.

ZekeLu · 2023-05-17T23:22:48Z

Ken, it seems that the search result is an internal state of the BySearch func, it's not exported, and the user code can not access it, so I think it's ok for BySearch to discard the search before it returns (means we can add a defer call to discard the search). Have I missed something?

kenshaw · 2023-05-17T23:32:31Z

I don't know if the logic has changed within Chrome, but originally the reason these needed to be kept around is that you would not get the right nodes back if you discarded it prior to using the node IDs later for performing actions on.

ZekeLu · 2023-05-17T23:35:23Z

Ah, that makes sense! @bno1 Can you please have a test to see if the node IDs can be used after discarding the search?

bno1 · 2023-05-17T23:40:17Z

Updated to always discard using defer. The _ = dom.DiscardSearchResults syntax is to silence golangci-lint warning about unused err result.

@ZekeLu In my small test app everything seems to work fine with discarding the result. But I can't rule out the possibility of chrome reusing node IDs from discarded results. Do you think this is possible?

kenshaw · 2023-05-17T23:45:44Z

@bno1 my original version of the BySearch discarded the nodes (I have versions of the original chromedp that was never pushed, when I was still building out the first version, and I just went back and checked that). I changed it to not do a discard, because yes it ended up being incorrect.

@ZekeLu my thinking here is that we can by default at the end of an action run discard this, and collect the value in the passed action context. We can then create a context option that disables this, if for some reason someone wants to keep the search results for multiple action runs. By default, we would enable the discard, and then users who wish to change this, would be able to modify it. It'd keep the API the same, while possibly gaining a bit of less memory usage.

ZekeLu · 2023-05-18T00:44:19Z

@kenshaw That would be a safe choice. Let me briefly summarize what will be changed in case I missed something.

Add these fields to chromedp.Context:

muSearch sync.Mutex
// searchIDs collects search id from BySearch so that the search can be
// discarded later in (Tasks).Do.
searchIDs []string
// keepSearchResult indicates whether the user wants to keep the search
// result. If it's true, chromedp won't collect search ids. Its value can
// be modified by the option WithKeepSearchResult.
keepSearchResult bool

chromedp/chromedp.go

Line 34 in 6074207

type Context struct {

Collect search id in BySearch:

@@ -330,10 +330,12 @@ func BySearch(s *Selector) {
        }

        if count < 1 {
+           // TODO: call dom.DiscardSearchResults()
            return []cdp.NodeID{}, nil
        }

        nodes, err := dom.GetSearchResults(id, 0, count).Do(ctx)
+       // TODO: collect search id
        if err != nil {
            return nil, err
        }

Discard collected searches on each (Tasks).Do (And I think we should document that calling action.Do directly won't be covered):
```
@@ -731,6 +740,7 @@ func (t Tasks) Do(ctx context.Context) error {
            return err
        }
    }
+   // TODO: discard collected searches.
    return nil
 }
```
chromedp/chromedp.go

Line 728 in 6074207

func (t Tasks) Do(ctx context.Context) error {

@bno1 Sorry that this requires quite some work. Please let us know if you encounter any issues during the implementation. Thank you very much!

bno1 · 2023-05-18T11:11:34Z

Sounds good to me. I think I will do some tests first to try to reproduce Ken's original issue, mostly out of curiosity.

bno1 · 2023-05-24T20:47:51Z

Update code according to @ZekeLu's suggestions. Also added a test for discard.

About trying to reproduce @kenshaw's original issue I couldn't achieve that on a static page. I think it can happen on a dynamic page (e.g. performSearch returns a node, the result is discarded immediately, the node is removed/replaced by page script, chrome reuses the node ID for some other node, chromedp tries to read the node and gets the new node instead of the one it found originally) but I didn't have the time to look into that

ZekeLu · 2023-05-25T11:36:05Z

@bno1 Thank you very much! Just want to let you know that I need some time to review the changes.

And regarding the original issue mentioned by ken, I need some time to look into it (hopefully ken can shed some light on it). I want to apology in advance if it finally turns out that we can discard the search result early.

Thank you!

ZekeLu · 2023-06-27T09:17:45Z

@bno1 Sorry for the soooo late reply! I was super busy recently.

I have checked the current source code of chromium, it turns out that discardSearchResults only removes the search result from search_results_:

protocol::Response InspectorDOMAgent::discardSearchResults(
    const String& search_id) {
  search_results_.erase(search_id);
  return protocol::Response::Success();
}

I'm not 100 percent sure that this won't affect the node ids. But comparing with DiscardFrontendBindings which will invalidate the node ids:

void InspectorDOMAgent::DiscardFrontendBindings() {
  if (history_)
    history_->Reset();
  search_results_.clear();
  document_node_to_id_map_->clear();
  id_to_node_.clear();
  id_to_nodes_map_.clear();
  ReleaseDanglingNodes();
  children_requested_.clear();
  cached_child_count_.clear();
  if (revalidate_task_)
    revalidate_task_->Reset();
}

I believe that the node ids are tracked by document_node_to_id_map_ (and/or id_to_node_, id_to_nodes_map_).

And based on your testing:

About trying to reproduce @kenshaw's original issue I couldn't achieve that on a static page.

In my opinion, always discarding the search result in the defer call is enough (that is, we only need this commit: d736122).

@kenshaw @bno1 What do you think?

bno1 · 2023-08-02T14:01:38Z

@ZekeLu thanks for pointing out the sources. I tried to look for those but I'm not familiar with chromium's source code. In addition to what you said, looking at the Bind method, it doesn't do anything smart like reusing ids, it's just incrementing a counter without checking for overflow or conflict.

So I agree with your conclusion. Discarding always seems to be the right thing to do.

ZekeLu · 2023-08-02T14:23:22Z

Then let's always discard the search result. I will tag a new release after this PR is merged. In the case that it causes any error, we can revert the change. Thank you!

bno1 · 2023-08-04T18:46:18Z

@ZekeLu I updated the branch

ZekeLu · 2023-08-05T03:29:59Z

Thank you @bno1 !

bno1 force-pushed the discard-search-results branch from 8091710 to d736122 Compare May 17, 2023 23:26

bno1 force-pushed the discard-search-results branch from d736122 to da070d9 Compare May 24, 2023 20:41

Always discard DOM search results

7a7b71d

bno1 force-pushed the discard-search-results branch from da070d9 to 7a7b71d Compare August 4, 2023 18:46

ZekeLu merged commit d218b16 into chromedp:master Aug 5, 2023
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discard search results if no nodes are found #1311

Discard search results if no nodes are found #1311

bno1 commented May 17, 2023

kenshaw commented May 17, 2023

bno1 commented May 17, 2023

ZekeLu commented May 17, 2023

kenshaw commented May 17, 2023

ZekeLu commented May 17, 2023

ZekeLu commented May 17, 2023 •

edited

kenshaw commented May 17, 2023

ZekeLu commented May 17, 2023

bno1 commented May 17, 2023

kenshaw commented May 17, 2023

ZekeLu commented May 18, 2023 •

edited

bno1 commented May 18, 2023

bno1 commented May 24, 2023 •

edited

ZekeLu commented May 25, 2023

ZekeLu commented Jun 27, 2023

bno1 commented Aug 2, 2023

ZekeLu commented Aug 2, 2023

bno1 commented Aug 4, 2023 •

edited

ZekeLu commented Aug 5, 2023

Discard search results if no nodes are found #1311

Discard search results if no nodes are found #1311

Conversation

bno1 commented May 17, 2023

kenshaw commented May 17, 2023

bno1 commented May 17, 2023

ZekeLu commented May 17, 2023

kenshaw commented May 17, 2023

ZekeLu commented May 17, 2023

ZekeLu commented May 17, 2023 • edited

kenshaw commented May 17, 2023

ZekeLu commented May 17, 2023

bno1 commented May 17, 2023

kenshaw commented May 17, 2023

ZekeLu commented May 18, 2023 • edited

bno1 commented May 18, 2023

bno1 commented May 24, 2023 • edited

ZekeLu commented May 25, 2023

ZekeLu commented Jun 27, 2023

bno1 commented Aug 2, 2023

ZekeLu commented Aug 2, 2023

bno1 commented Aug 4, 2023 • edited

ZekeLu commented Aug 5, 2023

ZekeLu commented May 17, 2023 •

edited

ZekeLu commented May 18, 2023 •

edited

bno1 commented May 24, 2023 •

edited

bno1 commented Aug 4, 2023 •

edited