Drop anemone and use Spidr for Repo discovery #10947

sjha4 · 2024-03-22T03:26:33Z

To Do:

Support basic auth
Support proxy
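
For reference, the two To Do items amount to passing upstream credentials and a proxy endpoint through to whatever issues the HTTP requests. The sketch below shows those inputs using only Ruby's standard `Net::HTTP`, not Spidr. The helper name `build_discovery_request` and its keyword arguments are hypothetical and not part of this PR.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a GET request that carries
# optional basic auth credentials and goes through an optional HTTP proxy.
def build_discovery_request(url, username: nil, password: nil, proxy: nil)
  uri = URI(url)
  proxy_uri = proxy && URI(proxy)

  # Passing nil as the proxy address disables proxying for this connection.
  http = Net::HTTP.new(uri.host, uri.port,
                       proxy_uri&.host, proxy_uri&.port,
                       proxy_uri&.user, proxy_uri&.password)
  http.use_ssl = (uri.scheme == 'https')

  request = Net::HTTP::Get.new(uri)
  # Only attach an Authorization header when both values are present.
  request.basic_auth(username, password) if username && password

  [http, request]
end
```

Nothing is sent over the network here; the caller would run `http.request(request)` to perform the fetch.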

What are the changes introduced in this pull request?

Considerations taken when implementing this change?

What are the testing steps for this pull request?

bundle install
Go to Content > Product > Repo discovery
Run repo discovery.

sjha4 · 2024-03-22T03:40:52Z

@evgeni: Thoughts on the spidr gem as a replacement for anemone?

evgeni · 2024-03-22T06:09:21Z

Doesn't look crazy? Only dep is nokogiri, which we have anyway, tested on modern rubies. Why not.

app/lib/katello/repo_discovery.rb

sjha4 · 2024-05-21T15:40:45Z

I am seeing a significant performance difference between what's on the PR and the existing workflow. Looking at ways to speed this up; will push updates when I get the performance sorted.

Update: Should be good to go with the latest commit.
I am able to see chunked output on the repo discovery page, and the task finishes in about the same time as before.

app/lib/katello/repo_discovery.rb

katello.gemspec

ekohl

Implementation wise I think you should separate the crawler and Docker search into separate classes. Perhaps even the file crawl as well. Right now it's confusing.

app/lib/katello/repo_discovery.rb

katello.gemspec

ekohl

I think this heads in the right direction.

Review wise I wonder if it makes sense to split it up: 1 to create the 3 classes and a follow up that replaces anemone with spidr.

ekohl · 2024-06-03T15:24:11Z

app/lib/katello/resources/discovery/file_discovery.rb

+      @upstream_username = upstream_username.empty? ? nil : upstream_username
+      @upstream_password = upstream_password.empty? ? nil : upstream_password
+      @search = search


These parameters are unused. Perhaps drop them? If not, then should the constructor be part of the base class?

sjha4 · 2024-06-03

I will drop these here.
I was trying to make the constructor part of the base class, but the Foreman task output, which is read in chunks, doesn't work when the constructor is inherited, for some reason. I had to go with constructors in the child classes and the class_for construct.

sjha4 · 2024-06-03

Updated to make credentials and search optional additional params for the container and yum classes.
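
The arrangement described in this thread (a `class_for` lookup on the base class, a constructor per child class, and optional credentials and search only where they apply) might look roughly like the following. This is a sketch, not the code in the PR: the module and class names, the content type strings, and the `blank_to_nil` helper are all assumptions made for illustration.

```ruby
# Hypothetical names throughout; not the actual Katello classes.
module Discovery
  class Base
    attr_reader :url

    # Map a content type string to the class that handles it.
    def self.class_for(content_type)
      case content_type.to_s
      when 'yum' then Yum
      when 'docker' then Container
      when 'file' then FileTree
      else raise ArgumentError, "unsupported content type: #{content_type}"
      end
    end

    private

    # Treat nil and empty strings alike as "no value supplied".
    def blank_to_nil(value)
      value.nil? || value.empty? ? nil : value
    end
  end

  # File discovery needs only a URL, so it takes no credentials or search.
  class FileTree < Base
    def initialize(url)
      @url = url
    end
  end

  class Yum < Base
    attr_reader :upstream_username, :upstream_password

    def initialize(url, upstream_username: nil, upstream_password: nil)
      @url = url
      @upstream_username = blank_to_nil(upstream_username)
      @upstream_password = blank_to_nil(upstream_password)
    end
  end

  class Container < Base
    attr_reader :upstream_username, :upstream_password, :search

    def initialize(url, upstream_username: nil, upstream_password: nil, search: nil)
      @url = url
      @upstream_username = blank_to_nil(upstream_username)
      @upstream_password = blank_to_nil(upstream_password)
      @search = blank_to_nil(search)
    end
  end
end
```

A caller would pick the class first and then pass only the arguments that class accepts, for example `Discovery::Base.class_for('yum').new(url, upstream_username: user, upstream_password: pass)`.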

ekohl · 2024-06-03T15:29:01Z

app/lib/katello/resources/discovery/file_discovery.rb

+
+    private
+
+    def file_crawl(resume_point)


Since there is no recursion in the method, perhaps just make this run(resume_point)?

sjha4 · 2024-06-03

Updated to remove this extra method.
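
As an illustration of the suggestion above, a file discovery class whose public `run(resume_point)` does the whole crawl directly, with no private helper, could be sketched as follows. The class name, the `found` and `crawled` accessors, and the choice of `repodata` directories as the marker for a repository are assumptions for this example, not the PR's code.

```ruby
require 'uri'

# Hypothetical sketch: the whole crawl lives in #run, so no separate
# private file_crawl method is needed.
class FileTreeDiscovery
  attr_reader :found, :crawled

  def initialize(url)
    @url = url
    @found = []
    @crawled = []
  end

  # resume_point is a file:// URL; every directory below it that contains
  # a "repodata" directory is recorded as a discovered repository.
  def run(resume_point)
    root = URI(resume_point).path
    Dir.glob(File.join(root, '**', 'repodata')).sort.each do |path|
      next unless File.directory?(path)
      repo_url = "file://#{File.dirname(path)}/"
      @found << repo_url unless @found.include?(repo_url)
    end
    @crawled << resume_point
    @found
  end
end
```

Because a single glob covers the whole tree, one call to `run` finishes the crawl and nothing is left to follow afterwards.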

sjha4 · 2024-06-03T17:28:22Z

> Review wise I wonder if it makes sense to split it up: 1 to create the 3 classes and a follow up that replaces anemone with spidr.

The anemone -> spidr change is localized to yum_discovery only right now as far as changes go. My 2 cents is that it's a small enough change to be in one PR.

pr-processor bot added the "Not yet reviewed" and "Waiting on contributor" labels Mar 22, 2024

sjha4 force-pushed the anemone branch 2 times, most recently from cc6dee6 to 934292d March 22, 2024 16:48

pr-processor bot removed the Waiting on contributor label Mar 22, 2024

jeremylenz reviewed May 14, 2024

app/lib/katello/repo_discovery.rb (outdated, resolved)

ekohl reviewed May 16, 2024

app/lib/katello/repo_discovery.rb (outdated, resolved)

Fixes #37159 - Drop anemone and use Spidr for repo discovery

84094c5

sjha4 force-pushed the anemone branch from 934292d to 759e66d May 21, 2024 16:57

github-actions bot added the Packaging Change label May 21, 2024

sjha4 changed the title from "Early Draft - Drop anemone and use Spidr" to "Drop anemone and use Spidr for Repo discovery" May 21, 2024

sjha4 force-pushed the anemone branch from 759e66d to 5834448 May 21, 2024 16:58

sjha4 marked this pull request as ready for review May 21, 2024 16:59

evgeni reviewed May 21, 2024

app/lib/katello/repo_discovery.rb (outdated, resolved)

katello.gemspec (outdated, resolved)

sjha4 force-pushed the anemone branch from 5834448 to d96f3f0 May 21, 2024 17:20

Refs #37159 - Improve discovery performance and add gem dependency

1d7766a

sjha4 force-pushed the anemone branch from d96f3f0 to 1d7766a May 21, 2024 17:56

ekohl reviewed May 21, 2024

sjha4 force-pushed the anemone branch from 2c86dc5 to 7abf885 May 30, 2024 19:24

ekohl reviewed Jun 3, 2024

Refs #37159 - Refactor content specific discoveries

4c7c721

sjha4 force-pushed the anemone branch from 7abf885 to 4c7c721 June 3, 2024 17:24
