Get all supported OLM operators from the service for test_olm_operator #793

Closed

Conversation

YuviGold
Contributor

No description provided.

@YuviGold
Contributor Author

Now that OCS is fixed in #772, we need to bring it back into the test.
/test system-test-olm

@openshift-ci

openshift-ci bot commented May 12, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: YuviGold

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested review from gamli75 and RazRegev May 12, 2021 14:02
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2021
@YuviGold
Contributor Author

/tesr system-test-olm

@YuviGold
Contributor Author

Different tests were collected between gw1 and gw0. The difference is:
--- gw1
+++ gw0
@@ -1,3 +1,3 @@
-test_e2e_install.py::TestInstall::test_olm_operator[lso]
 test_e2e_install.py::TestInstall::test_olm_operator[ocs]
 test_e2e_install.py::TestInstall::test_olm_operator[cnv]
+test_e2e_install.py::TestInstall::test_olm_operator[lso]

Looks like a known issue: pytest-dev/pytest-xdist#432 (comment).
pytest-xdist does not tolerate parameters being collected in a random order across workers, so this should be resolved by sorting them.
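
A minimal sketch of that fix (the names below are illustrative, not the actual test-infra code): sorting the parameter list gives every pytest-xdist worker the same, deterministic collection order.

```python
import pytest

# Hypothetical stand-in for however the operator names are produced; a set has
# no stable iteration order, so different xdist workers can collect the
# parametrized tests in different orders and fail with
# "Different tests were collected between gw1 and gw0".
SUPPORTED_OLM_OPERATORS = {"lso", "ocs", "cnv"}

# sorted() makes the parameter order deterministic on every worker.
@pytest.mark.parametrize("operator", sorted(SUPPORTED_OLM_OPERATORS))
def test_olm_operator(operator):
    ...
```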

@YuviGold
Contributor Author

/test system-test-olm

@openshift-ci

openshift-ci bot commented May 16, 2021

@YuviGold: The following test failed, say /retest to rerun all failed tests:

Test name: ci/prow/system-test-olm
Commit: ce8926b
Details: link
Rerun command: /test system-test-olm

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@YuviGold
Contributor Author

@pkliczewski Could you take a look at why this error occurs?

        "status_info": "Host cannot be installed due to following failing validation(s): All host roles must be assigned to enable OCS in Standard or Minimal Mode.",

@pkliczewski
Contributor

pkliczewski commented May 18, 2021

All host roles must be assigned to enable OCS in Standard or Minimal Mode."

@YuviGold This message occurs when a node role is not yet assigned

@YuviGold
Contributor Author

All host roles must be assigned to enable OCS in Standard or Minimal Mode."

@YuviGold This message occurs when a node role is not yet assigned

@pkliczewski How do you suggest we solve it?

@pkliczewski
Contributor

@pkliczewski How do you suggest we solve it?

I used 4 workers and after some time all nodes got their role assigned and installation continued.

@YuviGold
Contributor Author

@pkliczewski How do you suggest we solve it?

I used 4 workers and after some time all nodes got their role assigned and installation continued.

Okay, a strange thing happened. I don't know why 9 hosts were generated:

  • 3 workers
  • 2 masters
  • 4 auto-assign
  1. @pkliczewski Do we need that amount of hosts? If not, something is not working well with the OLM operators' resources.
  2. @filanov Is it expected that auto-assign roles would remain? Shouldn't all of them become workers?
  3. How is it possible that there are only 2 masters?

@filanov
Contributor

filanov commented May 18, 2021

@pkliczewski How do you suggest we solve it?

I used 4 workers and after some time all nodes got their role assigned and installation continued.

Okay, a strange thing happened. I don't know why 9 hosts were generated:

  • 3 workers
  • 2 masters
  • 4 auto-assign
  1. @pkliczewski Do we need that amount of hosts? If not, something is not working well with the OLM operators' resources.
  2. @filanov Is it expected that auto-assign roles would remain? Shouldn't all of them become workers?
  3. How is it possible that there are only 2 masters?

I'm a bit out of context, but auto-assign works well without operators. I'm not sure if it will work with OLM enabled, because it changes the requirements, so there may be a bug there.

@YuviGold
Contributor Author

@pkliczewski How do you suggest we solve it?

I used 4 workers and after some time all nodes got their role assigned and installation continued.

Okay, a strange thing happened. I don't know why 9 hosts were generated:

  • 3 workers
  • 2 masters
  • 4 auto-assign
  1. @pkliczewski Do we need that amount of hosts? If not, something is not working well with the OLM operators' resources.
  2. @filanov Is it expected that auto-assign roles would remain? Shouldn't all of them become workers?
  3. How is it possible that there are only 2 masters?

I'm a bit out of context, but auto-assign works well without operators. I'm not sure if it will work with OLM enabled, because it changes the requirements, so there may be a bug there.

@filanov Actually, I looked at the logs. It looks like 2 masters and 3 workers were chosen for this cluster.
Afterwards, 4 more hosts were registered and they weren't assigned any role.

time="2021-05-16T11:07:17Z" level=info msg="Update host 73993c49-6d04-4e02-89c8-d526220b2275 to role: master" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).updateHostRoles" file="/go/src/github.com/openshift/origin/internal/bminventory/inventory.go:2089" cluster_id=1853cfd8-c267-4229-8b0f-39b5426af277 go-id=1093 pkg=Inventory request_id=c16d9409-1500-4351-a72c-469a54900070
time="2021-05-16T11:07:17Z" level=info msg="Update host 7c523069-ce96-4f0e-9883-72e9e66c9507 to role: master" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).updateHostRoles" file="/go/src/github.com/openshift/origin/internal/bminventory/inventory.go:2089" cluster_id=1853cfd8-c267-4229-8b0f-39b5426af277 go-id=1093 pkg=Inventory request_id=c16d9409-1500-4351-a72c-469a54900070
time="2021-05-16T11:07:17Z" level=info msg="Update host 02bf6606-42af-4065-a90f-3100c96ee171 to role: worker" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).updateHostRoles" file="/go/src/github.com/openshift/origin/internal/bminventory/inventory.go:2089" cluster_id=1853cfd8-c267-4229-8b0f-39b5426af277 go-id=1093 pkg=Inventory request_id=c16d9409-1500-4351-a72c-469a54900070
time="2021-05-16T11:07:17Z" level=info msg="Update host 840c5941-abcf-42e2-83d2-9bcb8c1491c8 to role: worker" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).updateHostRoles" file="/go/src/github.com/openshift/origin/internal/bminventory/inventory.go:2089" cluster_id=1853cfd8-c267-4229-8b0f-39b5426af277 go-id=1093 pkg=Inventory request_id=c16d9409-1500-4351-a72c-469a54900070
time="2021-05-16T11:07:17Z" level=info msg="Update host f3f57fb3-f9b4-49bc-9fd1-3e0e101de390 to role: worker" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).updateHostRoles" file="/go/src/github.com/openshift/origin/internal/bminventory/inventory.go:2089" cluster_id=1853cfd8-c267-4229-8b0f-39b5426af277 go-id=1093 pkg=Inventory request_id=c16d9409-1500-4351-a72c-469a54900070

@YuviGold
Contributor Author

Actually, this might be a bug in test-infra. @tsorya any idea?

@pkliczewski
Contributor

@YuviGold I tested with 3 masters and 4 workers.

@tsorya
Contributor

tsorya commented May 27, 2021

@pkliczewski the problem is that, for some reason, operator host counts are provided as defaults inside the terraform controller. Leaving aside the mess it creates and how much harder it makes debugging failures, it also creates bugs like this one.
https://github.com/openshift/assisted-test-infra/blob/master/discovery-infra/tests/test_e2e_install.py#L22 - this test should set all the node params itself rather than rely on defaults applied inside the terraform controller.
https://github.com/openshift/assisted-test-infra/blob/master/discovery-infra/test_infra/controllers/node_controllers/terraform_controller.py#L70

What is happening is that for one of the operators the default is 4 workers, which are added to the 5 nodes that we provide through the environment. This causes 9 hosts to be created, but all the other functions still get the default of 5.
I will create a ticket to remove all operator defaults from the controllers. These params should be handled by the test code or the operator_controller, and certainly not by terraform or anything else that is not related to operators.
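
A minimal sketch of that idea, assuming a hypothetical cluster_factory fixture (all names and parameters below are placeholders, not the actual assisted-test-infra API): the test pins its own node counts instead of inheriting per-operator defaults from the terraform controller.

```python
import pytest

@pytest.mark.parametrize("olm_operator", ["lso", "ocs", "cnv"])
def test_olm_operator(cluster_factory, olm_operator):
    # cluster_factory is an illustrative placeholder fixture.
    cluster = cluster_factory(
        masters_count=3,               # pinned by the test itself
        workers_count=4,               # whatever the operator under test needs
        olm_operators=[olm_operator],  # enable only the operator being tested
    )
    cluster.prepare_for_installation()
    cluster.start_install_and_wait_for_installed()
```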

@openshift-ci

openshift-ci bot commented May 27, 2021

@YuviGold: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tsorya
Contributor

tsorya commented May 27, 2021

/hold

@openshift-ci openshift-ci bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels May 27, 2021
@YuviGold
Contributor Author

@pkliczewski the problem is that, for some reason, operator host counts are provided as defaults inside the terraform controller. Leaving aside the mess it creates and how much harder it makes debugging failures, it also creates bugs like this one.
https://github.com/openshift/assisted-test-infra/blob/master/discovery-infra/tests/test_e2e_install.py#L22 - this test should set all the node params itself rather than rely on defaults applied inside the terraform controller.
https://github.com/openshift/assisted-test-infra/blob/master/discovery-infra/test_infra/controllers/node_controllers/terraform_controller.py#L70

What is happening is that for one of the operators the default is 4 workers, which are added to the 5 nodes that we provide through the environment. This causes 9 hosts to be created, but all the other functions still get the default of 5.
I will create a ticket to remove all operator defaults from the controllers. These params should be handled by the test code or the operator_controller, and certainly not by terraform or anything else that is not related to operators.

@tsorya
On the contrary. We want to make this work as part of the dynamic resources, so a user would be able to say, for example, "run e2e with OLM=cnv" and it would work.
An issue was opened on how to solve it: https://issues.redhat.com/browse/MGMT-6545
We don't want to move backwards.
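
A rough sketch of what the PR title describes: querying the service for its supported operators and letting an environment variable narrow the run. The endpoint path, port, response shape, and the OLM variable are assumptions for illustration, not confirmed API details.

```python
import os
import requests

# Assumed service URL and endpoint; the real assisted-service API location
# and response format may differ.
SERVICE_URL = os.environ.get("SERVICE_URL", "http://localhost:8090")

def get_supported_olm_operators():
    """Ask the service which OLM operators it supports."""
    resp = requests.get(f"{SERVICE_URL}/api/assisted-install/v2/supported-operators")
    resp.raise_for_status()
    # Sorted so pytest-xdist workers collect parameters in the same order.
    return sorted(resp.json())

# Allow e.g. OLM=cnv to run e2e against a single operator.
requested = os.environ.get("OLM")
OPERATORS = [requested] if requested else get_supported_olm_operators()
```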

@YuviGold
Contributor Author

Resolved in #886.

@YuviGold YuviGold closed this Jun 24, 2021