
OpenSSL 3.0 performance issue: SSLContext.set_default_verify_paths / load_verify_locations about 5x slower #95031

Open
fcfangcc opened this issue Jul 20, 2022 · 16 comments
Labels
3.12 bugs and security fixes extension-modules C modules in the Modules dir performance Performance or resource usage type-bug An unexpected behavior, bug, or error

Comments

@fcfangcc

Bug report
The example code below runs much faster on Ubuntu 20.04 (OpenSSL 1.1) than on Ubuntu 22.04 (OpenSSL 3.x).
It is not just speed: CPU usage on Ubuntu 22.04 (OpenSSL 3.x) is many times that of Ubuntu 20.04 (OpenSSL 1.1).
I'm not sure whether this is an OpenSSL problem or a problem in Python's adaptation to it.

import socket
import ssl
import time

import certifi
hostname = 'www.python.org'  # any HTTPS-enabled hostname
times = 100
pem_where = certifi.where()
context = ssl.create_default_context()
verify_total_time = 0

for i in range(times):
    with socket.create_connection((hostname, 443)) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as ssock:
            verify_start_time = time.time()
            context.load_verify_locations(pem_where)
            verify_total_time += time.time() - verify_start_time
            ssock.version()
            
print(f"total {verify_total_time:.4f}, avg {verify_total_time/times:.4f}")

in my environment with docker:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 3321 root      20   0  304140  81148  12792 S  42.0   0.8   0:29.69 ipython    (ubuntu22.04)
 3850 root      20   0  203348  52632  11576 S  16.7   0.5   0:06.34 ipython    (ubuntu20.04)
total 5.8634, avg 0.0586  (ubuntu22.04)
total 0.6753, avg 0.0068  (ubuntu20.04)

Your environment

  • CPython versions tested on: 3.10.5
  • Operating system and architecture: Ubuntu 20.04 (OpenSSL 1.1) and Ubuntu 22.04 (OpenSSL 3.0.2), built from source
  • certifi==2022.6.15
@fcfangcc fcfangcc added the type-bug An unexpected behavior, bug, or error label Jul 20, 2022
@tiran
Member

tiran commented Jul 20, 2022

It is a problem in OpenSSL 3.0. Python upstream does not support OpenSSL 3.0 for good reasons: it has performance and backwards-compatibility issues. On my system, load_verify_locations is about 5 times slower when using system certificates.

3.12.0a0 (heads/main-dirty:88e4eeba25d, Jul 20 2022, 08:22:18) [GCC 12.1.1 20220507 (Red Hat 12.1.1-1)]
OpenSSL 3.0.5 5 Jul 2022
100 loops of 'load_verify_locations' in 3.965sec
3.12.0a0 (heads/main-dirty:88e4eeba25d, Jul 20 2022, 08:22:18) [GCC 12.1.1 20220507 (Red Hat 12.1.1-1)]
OpenSSL 1.1.1n  15 Mar 2022
100 loops of 'load_verify_locations' in 0.871sec

By the way you should not combine ssl.create_default_context() with certifi. A default context already loads the system cert store.
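That point can be made concrete with a minimal sketch (the commented-out certifi lines are illustrative, not part of the original report):

```python
import ssl

# create_default_context() already calls set_default_verify_paths(),
# so the system trust store is loaded right here.
ctx = ssl.create_default_context()

# A default context therefore needs no extra load_verify_locations()
# call. If you specifically want certifi's bundle instead, pass it at
# creation time (illustrative; requires the certifi package):
#   import certifi
#   ctx = ssl.create_default_context(cafile=certifi.where())

# Defaults of a client context created this way:
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname
```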

@tiran
Member

tiran commented Jul 20, 2022

import ssl
import sys
import time

LOOPS = 100

print(sys.version)
print(ssl.OPENSSL_VERSION)

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)

start = time.monotonic()
for i in range(LOOPS):
    ctx.load_verify_locations('/etc/pki/tls/cert.pem')
dur = time.monotonic() - start
print(f"{LOOPS} loops of 'load_verify_locations' in {dur:0.3f}sec")

@fcfangcc
Author

Thanks. The example is distilled from requests, httpx, and similar libraries.
It seems that using Ubuntu 22.04 with Python is not a good idea; its default OpenSSL is 3.x.

@tiran
Member

tiran commented Jul 20, 2022

I recommend that you raise a bug with OpenSSL. Their SSL_CTX_set_default_verify_paths and SSL_CTX_load_verify_locations functions are much slower in 3.0 than in 1.1.1.

@tiran
Member

tiran commented Jul 20, 2022

According to `perf`, OpenSSL 3.0 spends a lot of time in pthread_rwlock lock/unlock, followed by sa_doall, getrn, and several libcrypto string functions (ossl_lh_strcasehash, ossl_tolower, OPENSSL_strcasecmp, OPENSSL_sk_value).

@arhadthedev
Member

a lot of time in pthread_rwlock lock/unlock

Fortunately, the issue is already known (so no need to report it again): openssl/openssl#16791 is the initial report, and openssl/openssl#18814 points to the root cause.

@tiran tiran changed the title ssl module(load_verify_locations) with openssl3 there are performance problems OpenSSL 3.0 performance issue: SSLContext.set_default_verify_paths / load_verify_locations about 5x slower Jul 20, 2022
@tiran tiran added the performance Performance or resource usage label Jul 20, 2022
@iritkatriel iritkatriel added the 3.12 bugs and security fixes label Sep 12, 2022
@iritkatriel
Member

Should we close this as a third party issue?

@risicle

risicle commented Sep 28, 2023

As an outsider, it appears to me that there will always be a risk of this becoming a severe performance bottleneck as long as Python's ssl API doesn't expose the ability to reuse an X509_STORE, forcing the system's CA bundle to be re-parsed for every new SSLContext.
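A rough sketch of that cost structure (timings are illustrative and vary by platform and OpenSSL version):

```python
import ssl
import time

N = 20

# No public ssl API hands an already-parsed X509_STORE to a new
# context, so every fresh SSLContext re-parses the CA bundle.
start = time.monotonic()
for _ in range(N):
    ssl.create_default_context()  # re-parses the bundle each time
per_context = (time.monotonic() - start) / N

# Reusing one context pays that parse cost exactly once.
start = time.monotonic()
shared = ssl.create_default_context()
once = time.monotonic() - start

print(f"per fresh context: {per_context * 1000:.2f} ms, "
      f"shared context (paid once): {once * 1000:.2f} ms")
```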

@ThomasChr

This one bit me today. And it bit quite hard!

Beginning with CPython 3.11.5 (on Windows), OpenSSL 3 is shipped instead of 1.1.
So when I installed 3.12.0 on a customer system, it took only a few minutes until I got a call: the system was blocked, one Python process was taking all of the CPU, and no one could work anymore.
It turned out that my Python process, which uses 32 threads and does nothing more than send some simple numbers over the Internet, took all of the CPU, instead of 20% as with Python 3.11. That's a very bad performance regression.

One of my other processes has some more accurate numbers - before:

15 seconds total, 1 second on CPU

and after:

30 seconds total, 15 seconds on CPU

Pretty sure you could have brewed coffee on the CPU after that.
(With cProfile, most of the CPU time shows up in {method 'load_verify_locations' of '_ssl._SSLContext' objects}.)

At the top level I'm using the requests library in my code, and there were two solutions to the problem:

  1. Add verify=False to all of the requests
  2. Install Python 3.11.4

I used option 2, but now I need to stay on Python 3.11.4 for all eternity. Hopefully this will be fixed some day soon.

Hope this post helps other people with the same problem. I will see if I'll add some info to the linked OpenSSL Issues also.

@ThomasChr

ThomasChr commented Oct 5, 2023

As a quick solution:
It seems that the time is spent when OpenSSL verifies a certificate; can't we try to cache this verification? In my code I'm making thousands of requests to the exact same host, and I don't need to verify it every time.
The user could add verify=False himself starting with the second request to the same host, but that means he first needs to find out the root cause, which is not an easy task.

I'm a little bit sad that people will say that Python is dog-slow when really it isn't our fault. But saying "not our fault" won't help much here. Having a plan B would be great.
(I'm not entirely convinced that OpenSSL will fix this problem soon...)

@risicle

risicle commented Oct 5, 2023

(Another workaround - if you're only connecting to one or a few hosts you may find that you're able to put together a custom, extremely minimal ca bundle with only the root cert(s) you need. This will be much faster to parse, though will have the same caveats as pinning certificates)
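That workaround amounts to splitting the big bundle into per-certificate blocks and keeping only the roots your hosts chain to. A sketch (BUNDLE below is a fake placeholder, not real certificates; point split_pem at your actual cacert.pem):

```python
BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"

# Placeholder bundle standing in for a real cacert.pem.
BUNDLE = f"""{BEGIN}
...root-A...
{END}
{BEGIN}
...root-B...
{END}
"""

def split_pem(text: str) -> list[str]:
    # Split a concatenated PEM bundle into individual certificate blocks.
    blocks = []
    for chunk in text.split(BEGIN)[1:]:
        body = chunk.split(END)[0]
        blocks.append(f"{BEGIN}{body}{END}\n")
    return blocks

certs = split_pem(BUNDLE)
# Write only the block(s) for the roots you actually need to a small
# file; loading a one-cert file is far cheaper than parsing the bundle.
minimal = certs[0]
```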

@ThomasChr

ThomasChr commented Oct 5, 2023

Also, one can use verify=True only when sending important data like passwords or user data; otherwise verify=False will do.
This is not perfect, but really checking the certificate on every connection won't be needed most of the time.

@tiran
Member

tiran commented Oct 5, 2023

Python has a cache for certificate verification: the SSLContext. The simplest solution for your performance problem is a single SSLContext object for all your TLS client connections. Most client applications only need a single SSLContext object during their lifetime. You configure the SSLContext according to your security profile, load the trust anchors, and then pass the object to your connection function. That way you pay the price for CA loading just once. SSLContext is thread- and async-safe.

If you are using requests or httpx, then you want to make use of requests.Session or httpx.Client, too. They enable HTTP connection pooling, which speeds up multiple requests to the same host a lot.
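The advice above can be sketched as a module-level context reused by every request (the helper name is illustrative; no network call is made here):

```python
import ssl
import urllib.request

# One SSLContext for the whole process: the CA bundle is parsed here,
# once, rather than on every connection.
SHARED_CTX = ssl.create_default_context()

def fetch(url: str) -> bytes:
    # Illustrative helper: each call reuses SHARED_CTX, so no further
    # CA-loading cost is paid per request.
    with urllib.request.urlopen(url, context=SHARED_CTX) as resp:
        return resp.read()
```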

@ThomasChr

@tiran Using a requests Session was great advice, and it speeds things up considerably. But we're still cooking the CPU quite a bit.

This is Python 3.11 without a Requests Session:

Sent Article 1/1 105038 (took: 17.86s real time / 1.47s cpu time)

And Python 3.11 with a Requests Session:

Sent Article 1/1 105038 (took: 15.56s real time / 0.28s cpu time)

This is Python 3.12 without a Requests Session:

Sent Article 1/1 105038 (took: 36.96s real time / 29.86s cpu time)

And Python 3.12 with a Requests Session:

Sent Article 1/1 105038 (took: 17.84s real time / 7.64s cpu time)

I could live with that, so Python 3.12 is not a no-go anymore - thanks a lot for your advice!

@mm-matthias

@tiran You've said here that

SSLContext is thread and async-safe.

Is this official? If yes, can it be added to the documentation?

I am asking because the slowness of load_verify_locations pops up in downstream projects such as requests and botocore.
Using a session to perform requests only helps so much; some issues persist in all of these libraries, and the only way to solve them seems to be to load the SSLContext just once and then share it across threads, urllib3's HTTPConnectionPools, and other places.
So it is important to know whether the SSLContext can be freely shared between threads; see, e.g., this question in a requests PR to tackle the problem.
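One common sharing pattern is a cached process-wide context; a minimal sketch (the function name is illustrative, and the cache is primed before spawning threads so a first-call race cannot build two contexts):

```python
import functools
import ssl
import threading

@functools.lru_cache(maxsize=None)
def shared_context() -> ssl.SSLContext:
    # Built once per process and cached; every later call returns
    # the same object, so the CA bundle is parsed only once.
    return ssl.create_default_context()

# Prime the cache on the main thread.
shared_context()

seen = []

def worker():
    seen.append(shared_context())

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All threads received the identical cached context.
assert all(c is seen[0] for c in seen)
```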

@fireattack
Contributor

fireattack commented May 21, 2024

import ssl
import sys
import time

LOOPS = 100

print(sys.version)
print(ssl.OPENSSL_VERSION)

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)

start = time.monotonic()
for i in range(LOOPS):
    ctx.load_verify_locations('/etc/pki/tls/cert.pem')
dur = time.monotonic() - start
print(f"{LOOPS} loops of 'load_verify_locations' in {dur:0.3f}sec")

Others have mentioned how bad it is on Windows, but just to demonstrate it more clearly here.

Using slightly modified code based on the above, loading site-packages/certifi/cacert.pem (284 KB), which is what urllib3 loads every time it starts a new SSL context, is 44x slower on my Windows 10 computer.

C:\sync\code\python\_gists>py -3.10 perf_ssl.py ssl
Python version:   3.10.5 (tags/v3.10.5:f377153, Jun  6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)]
OpenSSL version:  OpenSSL 1.1.1n  15 Mar 2022
Run "load_verify_locations()" on "site-packages/certifi/cacert.pem" 100 times...
100 loops of 'load_verify_locations' in 1.047sec

C:\sync\code\python\_gists>py -3.12 perf_ssl.py ssl
Python version:   3.12.1 (tags/v3.12.1:2305ca5, Dec  7 2023, 22:03:25) [MSC v.1937 64 bit (AMD64)]
OpenSSL version:  OpenSSL 3.0.11 19 Sep 2023
Run "load_verify_locations()" on "site-packages/certifi/cacert.pem" 100 times...
100 loops of 'load_verify_locations' in 46.735sec
