Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add clarity to MVI timeouts #2410

Merged
merged 7 commits into from Nov 12, 2018
Merged

Add clarity to MVI timeouts #2410

merged 7 commits into from Nov 12, 2018

Conversation

annaswims
Copy link
Contributor

@annaswims annaswims commented Oct 31, 2018

https://github.com/department-of-veterans-affairs/vets.gov-team/issues/14735

Description of change

There was code that looked like it intended to set the open_timeout to 2 and the read_timeout to 10 for MVI, but it didn't actually do that. The default of 15 was used all along for both. Update the code to reflect that reality. I don't believe that we want to reduce the timeout because we regularly see timeout errors.

tl;dr MVI:: Configuration:: OPEN_TIMEOUT is not the same as MVI::Configuration.instance.open_timeout and we already set default settings in config/settings.yml

Testing done

verified that request_options now contains the timeouts from settings.yml

Testing planned

no additional testing planned

Applies to all PRs

  • Appropriate logging
  • Swagger docs have been updated, if applicable
  • Provide link to originating GitHub issue, or connected to it via ZenHub
  • Does not contain any sensitive information (i.e. PII/credentials/internal URLs/etc., in logging, hardcoded, or in specs)
  • Provide which alerts would indicate a problem with this functionality (if applicable)

There was code that looked like it intended to set the open_timeout to 2 and the read_timeout to 10 for MVI,
but it didn't actually do that.  The default of 15 was used all along for both. Update the code to reflect that
reality.  I don't believe that we want to reduce the timeout because we regularly see timeout errors.
@annaswims
Copy link
Contributor Author

note to self: rerun tests after #2409 is merged

@va-bot va-bot temporarily deployed to mvi_timeout_refactor/master October 31, 2018 15:35 Inactive
Copy link
Contributor

@dhall dhall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- just a thought, the minimalist in me likes the idea of stripping out the misleading settings like you did but relying on the defaults and not overriding them with the same values (unless I misunderstood what's being done)

Copy link
Contributor

@U-DON U-DON left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Are we redefining self.open_timeout to leave ourselves open to the possibility of changing (extending) the timeouts in settings?

@saneshark
Copy link
Contributor

@annaswims could you link to devops settings for these in Prod/Staging/Dev? I think we will need a corresponding devops PR to change these.

Copy link
Contributor

@saneshark saneshark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only doing the request changes because there is a risk of accidentally merging this and breaking things with the other approvals.

end

def self.default_mvi_timeout
Rails.logger.warn 'Settings.mvi.timeout not set, using default'
10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind removing default like these, provided you can link to the devops tickets here in the PR so we can verify they are properly set. If they are not set for each environment there is a risk of blowing up one of those environments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're not set in ansible/deployment/config/vets-api/prod-settings.local.yml.j2 which becomes config/settings.local.yml. We're good because they're set in config/settings.yml in this repo, which doesn't get overridden.

expect(MVI::Configuration::TIMEOUT).to eq(10)
end
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i support removing these default ones, but can we instead replace them to ensure they have whatever is specified in Settings.yml... Ideally, be explicit in the spec.

ie. expect(MVI::Configuration.new.timeout).to eq(15) NOT expect(MVI::Configuration.new.timeout).to eq(Settings.mvi.readtimeout)

@va-bot va-bot temporarily deployed to mvi_timeout_refactor/master November 2, 2018 15:29 Inactive
@annaswims
Copy link
Contributor Author

@U-DON Correct. There was already code that made it look like Settings.mvi.open_timeout would change the timeout and now it really does.

@saneshark
Copy link
Contributor

@kreek do you see any problem with us changing these timeouts? It's in an effort to better determine failure rates and come up with better SLI around MVI related issues.

@saneshark saneshark requested a review from kreek November 5, 2018 18:21
@kreek
Copy link
Contributor

kreek commented Nov 5, 2018

@saneshark 15 seconds seems long to connect, but if it's been using that setting anyway then 👍
Is MVI having mainly read timeouts or connection timeouts?

Copy link
Contributor

@saneshark saneshark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@saneshark
Copy link
Contributor

@kreek probably connection timeouts if I had to guess, but its tough to tell for sure as I'm pretty sure the error returned for a timeout is the same for both.

@saneshark
Copy link
Contributor

I'm thinking we might want to sync with IAM on how they have things implemented on their end before changing any of these. The timeouts went up markedly beginning about 2.5 weeks ago when they introduced DMDC fallback in querying using attributes.

Depending on how they have their application configured we can probably set the open_timeout to something like 1 or 2 seconds, and increase the read_timeout to 15 seconds. But opening a connection shouldn't take long at all -- its the dependency on DMDC that might be hanging that is causing issues, so we should find out what their internal timeout is for DMDC, whether or not they cache that result, and most importantly what timeout they are using and either use something a little bit longer or sensible based on their recommendation.

@annaswims
Copy link
Contributor Author

annaswims commented Nov 7, 2018

This PR doesn't change current behavior, it just makes it possible to set Settings.mvi.open_timeout Settings.mvi.open_timeout and have it be effective. In master the settings do nothing and the default are {:open_timeout=>15, :timeout=>15}.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants