Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fixes #86] Close the HTTP pool when service is stopped. #140

Merged
merged 18 commits into from Feb 14, 2020

Conversation

adiroiban
Copy link
Member

@adiroiban adiroiban commented Oct 30, 2018

Scope

This tries to fixes #86

When the service is stopped, it should automatically clean the resource and not left files opened or connections openend.

As a drive by, I added diff_cover reporting in coverage-report as is easier to see the branch coverage on my local system rather than rely on codecov.

Changes

For now I have hacked it the ugly way based and the code used in the tests.

As a drive by, I have documented the dev tools and created a GitHub template for future PRs.

How to review

testools has a strange taste and is hard to chew :) but in the end I manage to understand do something with this.

still, this does not makes sense and it does not check the exception type.

            self.assertThat(
                fixture.service.stopService(),
                failed(AfterPreprocessing(
                    lambda f: f.value.args[0], Equals('Failing at "stop".'))))

The integration tests are not executed on my computer.
I am running some tests, but the public port 80 is forwareded to my computer on port 8081 so that I don't have to deal extra config to listen on port below 1024.

So I think that besides $ACME_HOST we should also have an internal $ACME_CLIENT_PORT on which the test is listening.

I might need to work on #81 before merging this as otherwise is hard to test.


This change is Reviewable

@codecov
Copy link

codecov bot commented Oct 30, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@f75f441). Click here to learn what that means.
The diff coverage is 90%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #140   +/-   ##
=========================================
  Coverage          ?   99.26%           
=========================================
  Files             ?       27           
  Lines             ?     2044           
  Branches          ?      179           
=========================================
  Hits              ?     2029           
  Misses            ?       11           
  Partials          ?        4
Impacted Files Coverage Δ
src/txacme/client.py 98.14% <ø> (ø)
src/txacme/test/test_util.py 100% <ø> (ø)
src/txacme/test/test_service.py 100% <ø> (ø)
src/txacme/testing.py 96.12% <0%> (ø)
src/txacme/test/test_client.py 100% <100%> (ø)
src/txacme/endpoint.py 100% <100%> (ø)
src/txacme/test/strategies.py 100% <100%> (ø)
src/txacme/test/test_endpoint.py 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f75f441...b025cde. Read the comment docs.

@adiroiban
Copy link
Member Author

Instead of passing an jws_client to txacme.client.Client(jws_client) maybe the API can be changed to pass an Agent.

I do want to run txacme with a different agent, but I don't see why I would want a different JWSClient or a different Treq client.

Also, in tests in better to mock the lowest I/O level so Twisted Agent is much lower than JWSClient.

Right now I am using something like this in an HTTP factory where I am explicitly creating the agent and stoping the agent.

My PersistentAgent is a normal agent with automatically creates a pool, and has closePersistentConnections which is forwarded to the pool.

In this way, I am not touching the private stuff.

    def startFactory(self):
        """
        See: `Factory`.
        """
        super(LetsEncryptHTTPFactory, self).startFactory()

        acme_key = JWKRSA(key=serialization.load_pem_private_key(
            self.service_configuration._account_key.encode('ascii'),
            password=None,
            backend=default_backend(),
            ))

        self._agent = PersistentAgent(event_emitter=self.emitEvent)
        jws_client = JWSClient(
            HTTPClient(agent=self._agent), key=acme_key, alg=RS256)

        def client_creator():
            """
            Called before each request.
            """
            return Client.from_url(
                self._scheduler,
                URL.fromText(self.service_configuration.acme_url),
                key=acme_key,
                alg=RS256,
                jws_client=jws_client,
                )

        store =  CertificateStore(self.service_configuration, self._onUpdate)
        store.manage([d[0] for d in self.service_configuration.domains])

        self._acme_service = AcmeIssuingService(
            store,
            client_creator,
            self._scheduler,
            responders=[self._acme_responder],
            )
        self._acme_service.startService()

    @defer.inlineCallbacks
    def stopFactory(self):
        """
        Called before service is stopped.
        """
        if self._agent:
            yield self._agent.closePersistentConnections()
            self._agent = None

        if self._acme_service:
            yield self._acme_service.stopService()
            self._acme_service = None

        result = yield super(LetsEncryptHTTPFactory, self).stopFactory()
        defer.returnValue(result)

@mithrandi
Copy link
Contributor

Thanks for the PR! I hope to review this soon.

A quick note: to control what port the integration tests listen on, use ACME_ENDPOINT; for example, ACME_ENDPOINT=tcp:8081 would listen on port 8081. However tls-sni-01 is not available for new issuances due to https://community.letsencrypt.org/t/important-what-you-need-to-know-about-tls-sni-validation-issues/50811 so these tests will fail anyway I think.

@adiroiban
Copy link
Member Author

Thanks for the PR! I hope to review this soon.

No hurry. This actual code change in this PR was just to get going with txacme dev and see how good the current tests are.

This can be on hold.

I will need to start fixing #81 and then come back here.

I will try to document ACME_ENDPOINT as part of that PR.


I was expecting the tests to fail...so they are not that good.

Note that from the service I am calling txacme.Client.stop(), but I have implemented it only in the mocks and not in the read deal.

So it looks like, for now, there are no tests for the real deal.


I have pushed a new change which I think that is better as it touches less private things, but it breaks the tests as the tests are using treq.testing._SynchronousProducer

For my tests I am using a this code https://gist.github.com/adiroiban/a3dae64d14a56e8c0dbfea4bc6b4b827

In which for unit tests I use InMemoryPersistentAgent to rig the responses from the server without touching the net.

For functional tests I use HTTPServerContext which starts a real HTTP server executed in a separate thread and replays to certain HTTP requests.
Using HTTPServerContext I can tests the connection tool and connections lifetime.

@adiroiban
Copy link
Member Author

I have pushed a new API proposal.
In this one, the AcmeIssuingService already received a client which it can use to generate/update certs.

I think that this might be a better API, once txacme will also support account registration.

You can handle the account registration via the Client outside of the AcmeIssuingService and then AcmeIssuingService don't have to handle the registration.

I feel that once registration is added, there will be too much logic in AcmeIssuingService so extracting _ensure_registered and _register will simplify the AcmeIssuingService and the account registration and account login can be done somewhere else.

@adiroiban
Copy link
Member Author

Any update on this?
No hurry from my side, but it makes it hard to submit other PRs as I am working on a forked txacme

Copy link
Contributor

@mithrandi mithrandi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :) A few minor things to fix.

src/txacme/service.py Show resolved Hide resolved
src/txacme/service.py Outdated Show resolved Hide resolved
src/txacme/client.py Outdated Show resolved Hide resolved
src/txacme/client.py Show resolved Hide resolved
@mithrandi
Copy link
Contributor

"soon" :/

Copy link
Contributor

@mithrandi mithrandi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some failing tests; mostly looks like wrong argument passing.

self._current_request.cancel()
self._current_request = None

agent_pool = getattr(self._agent, 'pool', None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is supposed to be '_pool'


agent_pool = getattr(self._agent, 'pool', None)
if agent_pool:
return self._agent._pool.closeCachedConnections()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…and this agent_pool.

@adiroiban
Copy link
Member Author

thanks @mithrandi for the review.

I will work at fixing the tests and will send it for another round of reviews.

this is not ready for review or merge

@adiroiban
Copy link
Member Author

It looks like there is a leaking tests...

If I run just $ tox -e py27-twlatest txacme.test.test_service all tests are green.
If I run $ tox -e py27-twlatest txacme.test.test_service.AcmeIssuingServiceTests.test_default_panic fails... but if I am removing that test, there is another test failing...and so on.

It looks like there is a bug in testtools and it is not correctly checking that reactor is clean after each test.

I am not familiar with testtools.
For my code, I am using a custom xUnit test case in which reactor is stopped after each tests and checked that there are no pending delayed calls or readers/writers https://github.com/chevah/compat/blob/master/chevah/compat/testing/testcase.py#L758

@adiroiban
Copy link
Member Author

@mithrandi if you have time, please take a look at this PR and maybe you will know how to fix the failing tests.

The failure of txacme.test.test_service.AcmeIssuingServiceTests.test_default_panic is a false lead.
That test is ok. I have removed it and then txacme.test.test_service.AcmeIssuingServiceTests.test_errors is the one that fails.

Thanks!

@adiroiban
Copy link
Member Author

In terms of https://twistedmatrix.com/documents/current/api/twisted.web.iweb.IAgent.html API, I don't think that an Agent.shutdown() API is ok.

I think that twisted should provide the barebone low level API, and maybe have something like treq.client.HTTPClient.shutdown() high level API.

@adiroiban
Copy link
Member Author

I have enabled the debugging for DelayedCall and I found the issue... it was src/txacme/test/test_endpoint.py:test_parser

will try to see how to fix it.

@adiroiban
Copy link
Member Author

Ok... so the issue is in Treq... when treq makes a request with timeout, it will not use the agent's reactor to schedule the timeout, but will use its own default reactor ...

Looks like this is an already know issue twisted/treq#257

I have fixed it by updating the API to take a "timeot" and set it to 0 in the test.

I hope that this time the tests are green and we can merge it.

With these green tests I am more confident in txacme and can start working on ACMEv2... I already got a pebble docker container on my local system and I am looking into v2 protocol

@adiroiban
Copy link
Member Author

Tests are failing due to txacme and hypotesis dependencies.

v1 support was removed form txacme v1.0.0 and newer.... so I have updated setup.py to use pre 1.0.0 versions.

For hypotesis... I don't know.. I have pinned it as I think that we should pin dev dependencies.... but that might need

For me using things like eliot or hypothesis makes txacme more complex.. and I don't see how this makes txacme simpler or easier to use

@mithrandi
Copy link
Contributor

Working on some fixes in #152

@adiroiban
Copy link
Member Author

Thanks @mithrandi . While working on #152 maybe you can also include 793ea76 (with the exception of that "return")

And maybe use TXACME test case for all txacme test.. In this way is easier to catch tests with side effects

I am now happy with this branch.

I start working on ACME v2

@mithrandi
Copy link
Contributor

#152 is now merged although there are unfortunately conflicts now.

@adiroiban
Copy link
Member Author

I think that I can resolve them...
Please double check the change in endpoint.py

@adiroiban
Copy link
Member Author

if we migrate to acmev2, in theory we could downoad pebble binary on Traci-CI and run some integration tests against the local pebble

Using the staging ACME is complicated due to setting up port 80

@mithrandi
Copy link
Contributor

So I think this is ready to land once the tests are green. There seem to be two failure causes:

  1. twlowest builds failing; not sure why but we can just bump the Twisted dep version probably?

  2. docs issue.

@adiroiban
Copy link
Member Author

I have fixed the docs... I have no idea why it was not finding the Deferred reference
I have update the tox to have sphinx show all errors and warnings in a single build.


The twisted lowers failure looks like an error from a different test.

I have executed just the integration tests in the twlowest env and they pass

$ tox -e py27-twlowest-alldeps src/integration

So I think it is just a matter of luck why those tests don't fail with latest Twisted

The offending test is txacme.test.test_endpoint.PluginTests.test_parser

$ tox -e py27-twlowest-alldeps txacme.test.test_endpoint.PluginTests.test_parser integration

The problem is that I have fixed the timeout for JWS client, but that test is using the reactor to connect

I will try to fix the test

@mithrandi mithrandi merged commit 9f60aae into master Feb 14, 2020
@mithrandi mithrandi deleted the 86-service-stop branch February 14, 2020 08:28
@adiroiban
Copy link
Member Author

Thanks for the merge :) This was hard but we made good progress.

@mithrandi
Copy link
Contributor

Thanks for the PR :)

@glyph
Copy link
Member

glyph commented Feb 24, 2020

In addition to introducing a nasty regression (#154) this failed to update the documentation ("client_creator" is still specified as the parameter despite the change to client). I'm inclined to roll this back?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stopService() leaves pooled connections open
3 participants