Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop anemone and use Spidr for Repo discovery #10947

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

sjha4
Copy link
Member

@sjha4 sjha4 commented Mar 22, 2024

To Do:

  • Support basic auth
  • Support proxy

What are the changes introduced in this pull request?

Considerations taken when implementing this change?

What are the testing steps for this pull request?

bundle install
Go to Content > Product > Repo discovery
Run repo discovery.

@sjha4
Copy link
Member Author

sjha4 commented Mar 22, 2024

@evgeni : Thoughts on spidr gem as replacement for anemone?

@evgeni
Copy link
Member

evgeni commented Mar 22, 2024

Doesn't look crazy? Only dep is nokogiri, which we have anyway, tested on modern rubies. Why not.

@sjha4
Copy link
Member Author

sjha4 commented May 21, 2024

I am seeing significant performance difference with what's on the PR vs the existing workflow..Looking at ways to speed this up..Will push updates when I get the performance sorted.

Update: Should be good to go with latest commit.
Able to see chunked output in repo discovery page and the task finishes in about the same time as earlier.

@sjha4 sjha4 changed the title Early Draft - Drop anemone and use Spidr Drop anemone and use Spidr for Repo discovery May 21, 2024
@sjha4 sjha4 marked this pull request as ready for review May 21, 2024 16:59
app/lib/katello/repo_discovery.rb Outdated Show resolved Hide resolved
katello.gemspec Outdated Show resolved Hide resolved
Copy link
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation wise I think you should separate the crawler and Docker search into separate classes. Perhaps even the file crawl as well. Right now it's confusing.

app/lib/katello/repo_discovery.rb Outdated Show resolved Hide resolved
app/lib/katello/repo_discovery.rb Outdated Show resolved Hide resolved
app/lib/katello/repo_discovery.rb Outdated Show resolved Hide resolved
katello.gemspec Outdated Show resolved Hide resolved
katello.gemspec Outdated Show resolved Hide resolved
Copy link
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this heads in the right direction.

Review wise I wonder if it makes sense to split it up: 1 to create the 3 classes and a follow up that replaces anemone with spidr.

Comment on lines 8 to 10
@upstream_username = upstream_username.empty? ? nil : upstream_username
@upstream_password = upstream_password.empty? ? nil : upstream_password
@search = search
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These parameters are unused. Perhaps drop them? If not, then should the constructor be part of the base class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will drop these here..
I was trying to get the constructor as part of base class but the foreman task output which is read in chunks doesn't work when inherited for some reason. Had to go with constructors in child classes and the class_for construct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to make credentials and search optional additional params for container and yum classes.


private

def file_crawl(resume_point)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is no recursion in the method, perhaps just make this run(resume_point)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to remove this extra method.

@sjha4
Copy link
Member Author

sjha4 commented Jun 3, 2024

Review wise I wonder if it makes sense to split it up: 1 to create the 3 classes and a follow up that replaces anemone with spidr.

The anemone -> spidr change is localized to yum_discovery only right now as far a changes go. My 2 cents is it's a small enough change to be in one PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants