Discard search results if no nodes are found #1311
@ZekeLu your thoughts on this one? |
I did a small test, and discarding the results does seem to lead to slightly lower memory usage after many (100K) searches.
Test code:
package main

import (
	"context"
	"log"
	"os"
	"time"

	"github.com/chromedp/cdproto/dom"
	"github.com/chromedp/chromedp"
)

func main() {
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()

	// Passing any command-line argument enables discarding.
	doDiscard := false
	if len(os.Args) > 1 {
		doDiscard = true
	}

	const STEPS = 100000
	log.Printf("Running %v steps with doDiscard=%v", STEPS, doDiscard)

	err := chromedp.Run(
		ctx,
		chromedp.Navigate("https://google.com"),
		chromedp.ActionFunc(func(ctx context.Context) error {
			for i := 0; i < STEPS; i++ {
				id, count, err := dom.PerformSearch("#foobarbaz").Do(ctx)
				if err != nil {
					return err
				}
				if count < 1 {
					// No nodes found: optionally release the search in Chrome.
					if doDiscard {
						err = dom.DiscardSearchResults(id).Do(ctx)
						if err != nil {
							return err
						}
					}
					continue
				}
				nodes, err := dom.GetSearchResults(id, 0, count).Do(ctx)
				if err != nil {
					return err
				}
				log.Println(nodes)
			}
			return nil
		}),
	)
	if err != nil {
		panic(err)
	}

	// Keep the process alive so memory usage can be inspected.
	log.Println("Done, sleeping...")
	time.Sleep(1 * time.Hour)
} |
When I originally wrote the query selectors, I purposefully did not discard the nodes after use; this is partially an issue with being able to maintain a somewhat "linear" logic. I'm not sure how this could be reworked without something like a When I wrote the original code, it was with that mechanism in mind, so discarding the nodes seemed to not matter. Closing the search results when there are no results seems logically fine. |
Regarding the unit test, I'm thinking sending a |
Ken, it seems that the search result is an internal state of the |
8091710 to d736122
I don't know if the logic has changed within Chrome, but originally these needed to be kept around because you would not get the right nodes back if you discarded the results before later using the node IDs to perform actions. |
Ah, that makes sense! @bno1 Can you please run a test to see if the node IDs can still be used after discarding the search? |
Updated to always discard using defer. The @ZekeLu In my small test app everything seems to work fine with discarding the result. But I can't rule out the possibility of chrome reusing node IDs from discarded results. Do you think this is possible? |
@bno1 my original version of the @ZekeLu my thinking here is that, by default, we discard this at the end of an action run and collect the value in the passed action context. We can then create a context option that disables this, in case someone wants to keep the search results across multiple action runs. Discarding would be enabled by default, and users who want different behavior could change it. That would keep the API the same while possibly reducing memory usage a bit. |
@kenshaw That would be a safe choice. Let me briefly summarize what will be changed in case I missed something.
@bno1 Sorry that this requires quite some work. Please let us know if you encounter any issues during the implementation. Thank you very much! |
Sounds good to me. I think I will do some tests first to try to reproduce Ken's original issue, mostly out of curiosity. |
d736122 to da070d9
Updated the code according to @ZekeLu's suggestions and added a test for discard. As for reproducing @kenshaw's original issue, I couldn't do it on a static page. I think it can happen on a dynamic page (e.g. performSearch returns a node, the result is discarded immediately, the node is removed/replaced by page script, Chrome reuses the node ID for some other node, chromedp tries to read the node and gets the new node instead of the one it found originally), but I didn't have time to look into that.
@bno1 Thank you very much! Just want to let you know that I need some time to review the changes. Regarding the original issue mentioned by Ken, I also need some time to look into it (hopefully Ken can shed some light on it). I want to apologize in advance if it turns out in the end that we can discard the search result early. Thank you! |
@bno1 Sorry for the soooo late reply! I was super busy recently. I have checked the current source code of
protocol::Response InspectorDOMAgent::discardSearchResults(
    const String& search_id) {
  search_results_.erase(search_id);
  return protocol::Response::Success();
}
I'm not 100 percent sure that this won't affect the node ids. But comparing with DiscardFrontendBindings which will invalidate the node ids:
void InspectorDOMAgent::DiscardFrontendBindings() {
  if (history_)
    history_->Reset();
  search_results_.clear();
  document_node_to_id_map_->clear();
  id_to_node_.clear();
  id_to_nodes_map_.clear();
  ReleaseDanglingNodes();
  children_requested_.clear();
  cached_child_count_.clear();
  if (revalidate_task_)
    revalidate_task_->Reset();
}
I believe that the node ids are tracked by And based on your testing:
In my opinion, always discarding the search result in the defer call is enough (that is, we only need this commit: d736122). |
@ZekeLu thanks for pointing out the sources. I tried to look for those but I'm not familiar with chromium's source code. In addition to what you said, looking at the Bind method, it doesn't do anything smart like reusing ids, it's just incrementing a counter without checking for overflow or conflict. So I agree with your conclusion. Discarding always seems to be the right thing to do. |
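The reasoning in the last two comments can be illustrated with a toy model. The sketch below is not Chromium code: the `agent` type and its fields are simplified, hypothetical stand-ins for the maps named in the quoted C++ (`search_results_`, `id_to_node_`) plus a counter that hands out IDs by incrementing, as described for Bind. It only demonstrates why erasing a search entry leaves node ID bookkeeping untouched in such a design, while clearing the bindings does not.

```go
package main

import "fmt"

// agent is a toy model of the bookkeeping discussed above (an assumption,
// not a port of InspectorDOMAgent).
type agent struct {
	lastID        int                 // monotonically increasing ID counter
	idToNode      map[int]string      // node ID -> node
	nodeToID      map[string]int      // node -> node ID
	searchResults map[string][]string // search ID -> matched nodes
}

func newAgent() *agent {
	return &agent{
		idToNode:      map[int]string{},
		nodeToID:      map[string]int{},
		searchResults: map[string][]string{},
	}
}

// bind returns the node's existing ID or assigns the next counter value.
// IDs are never recycled in this model.
func (a *agent) bind(node string) int {
	if id, ok := a.nodeToID[node]; ok {
		return id
	}
	a.lastID++
	a.idToNode[a.lastID] = node
	a.nodeToID[node] = a.lastID
	return a.lastID
}

// discardSearchResults only erases the search entry.
func (a *agent) discardSearchResults(searchID string) {
	delete(a.searchResults, searchID)
}

// discardFrontendBindings clears the search entries and the ID maps.
func (a *agent) discardFrontendBindings() {
	a.searchResults = map[string][]string{}
	a.idToNode = map[int]string{}
	a.nodeToID = map[string]int{}
}

func main() {
	a := newAgent()
	a.searchResults["s1"] = []string{"div#a"}
	id := a.bind("div#a")

	a.discardSearchResults("s1")
	_, stillBound := a.idToNode[id]
	fmt.Println("bound after discardSearchResults:", stillBound)

	other := a.bind("div#b")
	fmt.Println("new node gets a fresh ID:", other != id)

	a.discardFrontendBindings()
	_, stillBound = a.idToNode[id]
	fmt.Println("bound after discardFrontendBindings:", stillBound)
}
```

In this model a node ID stays valid after the search is discarded and is never handed to a different node, which matches the conclusion reached in the thread; whether real Chrome behaves the same in every case is not established here.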
Then let's always discard the search result. I will tag a new release after this PR is merged. In the case that it causes any error, we can revert the change. Thank you! |
da070d9 to 7a7b71d
@ZekeLu I updated the branch |
Thank you @bno1 ! |
The docs are not very clear on this, but I think the search results should be discarded if getSearchResults is not going to be called, in order to free up the result in Chrome.