Performance regressions in Python in version 3.18.0 #9014

Closed
BenRKarl opened this issue Sep 23, 2021 · 19 comments

@BenRKarl

BenRKarl commented Sep 23, 2021

What version of protobuf and what language are you using?
Version: 3.18.0
Language: Python (version 3.7.0)

What operating system (Linux, Windows, ...) and version?
gLinux

What did you do?
Run the script below using protobuf version 3.17.3, then run it again using protobuf version 3.18.0 and compare the results. I ran this 30 times per version, each in a fresh Python interpreter, and compared the averages.

import proto
from time import time
 
 
class ProtoPlusFixture(proto.Message):
    name = proto.Field(proto.STRING, number=1)
 
init_start = time()
pplus = ProtoPlusFixture()
init = time() - init_start
 
set_start = time()
pplus.name = "Test"
set_time = time() - set_start
 
print(f"{init * 1000}, {set_time * 1000}")
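For reference, the 30-run harness amounts to something like this sketch (hypothetical: `benchmark.py` stands in for the script above, and the averages are computed from the millisecond values it prints):

```python
import statistics
import subprocess
import sys

# Hypothetical driver: run the benchmark script in a fresh interpreter N
# times, collect the two comma-separated millisecond values it prints, and
# report the averages. "benchmark.py" is a stand-in filename.
def average_runs(script="benchmark.py", runs=30):
    inits, sets = [], []
    for _ in range(runs):
        out = subprocess.run([sys.executable, script],
                             capture_output=True, text=True, check=True)
        init_ms, set_ms = (float(x) for x in out.stdout.split(","))
        inits.append(init_ms)
        sets.append(set_ms)
    return statistics.mean(inits), statistics.mean(sets)
```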

What did you expect to see?
I expected the performance to be similar for each version.

What did you see instead?

  • 3.18.0 class init avg. = 0.005571047465 ms
  • 3.17.3 class init avg. = 0.004593531291 ms
  • 3.18.0 field set avg. = 0.02173582713 ms
  • 3.17.3 field set avg. = 0.01894632975 ms

That is a 21% regression in class initialization and a 14% regression when setting fields.

Anything else we should know about your project / environment

We have an internal performance framework showing that these regressions are significant at a larger scale as well (e.g. reading and writing several fields, processing larger requests with nested fields), and we began seeing degradations around the time that 3.18.0 was published.

@BenRKarl
Author

If you run a similar test with the below generated _pb2 you can see an even more pronounced regression:

  • 3.18.0 protobuf class init avg. = 0.004307428996 ms
  • 3.17.3 protobuf class init avg. = 0.002670288086 ms
  • 3.18.0 protobuf field set avg. = 0.003298123678 ms
  • 3.17.3 protobuf field set avg. = 0.001279513041 ms

That is a 61% regression in class initialization and a 157% regression when setting fields.

# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: protobuf_fixture.proto

from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database

# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()


DESCRIPTOR = _descriptor.FileDescriptor(
    name="protobuf_fixture.proto",
    package="",
    syntax="proto3",
    serialized_options=None,
    create_key=_descriptor._internal_create_key,
    serialized_pb=b'\n\x16protobuf_fixture.proto"\x1f\n\x0fProtobufFixture\x12\x0c\n\x04name\x18\x01 \x01(\tb\x06proto3',
)


_PROTOBUFFIXTURE = _descriptor.Descriptor(
    name="ProtobufFixture",
    full_name="ProtobufFixture",
    filename=None,
    file=DESCRIPTOR,
    containing_type=None,
    create_key=_descriptor._internal_create_key,
    fields=[
        _descriptor.FieldDescriptor(
            name="name",
            full_name="ProtobufFixture.name",
            index=0,
            number=1,
            type=9,
            cpp_type=9,
            label=1,
            has_default_value=False,
            default_value=b"".decode("utf-8"),
            message_type=None,
            enum_type=None,
            containing_type=None,
            is_extension=False,
            extension_scope=None,
            serialized_options=None,
            file=DESCRIPTOR,
            create_key=_descriptor._internal_create_key,
        ),
    ],
    extensions=[],
    nested_types=[],
    enum_types=[],
    serialized_options=None,
    is_extendable=False,
    syntax="proto3",
    extension_ranges=[],
    oneofs=[],
    serialized_start=26,
    serialized_end=57,
)

DESCRIPTOR.message_types_by_name["ProtobufFixture"] = _PROTOBUFFIXTURE
_sym_db.RegisterFileDescriptor(DESCRIPTOR)

ProtobufFixture = _reflection.GeneratedProtocolMessageType(
    "ProtobufFixture",
    (_message.Message,),
    {
        "DESCRIPTOR": _PROTOBUFFIXTURE,
        "__module__": "protobuf_fixture_pb2"
        # @@protoc_insertion_point(class_scope:ProtobufFixture)
    },
)
_sym_db.RegisterMessage(ProtobufFixture)


# @@protoc_insertion_point(module_scope)

@BenRKarl
Author

cc @haberman

@anandolee
Contributor

Ben, are you using pure Python or the C++ extension? You can print api_implementation.Type() to see which implementation you're running.
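For example, a minimal check (wrapped in try/except as a hedge so it also runs in an environment without protobuf installed):

```python
# Print which protobuf backend is active: "cpp" when the compiled extension
# is in use, "python" for the pure-Python fallback (newer releases may also
# report "upb").
try:
    from google.protobuf.internal import api_implementation
    impl = api_implementation.Type()
except ImportError:
    impl = None  # protobuf is not installed in this environment

print("protobuf implementation:", impl)
```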

@BenRKarl
Author

@anandolee this is the "cpp" extension.

@haberman
Member

@BenRKarl could you include the code that is using ProtobufFixture in your second example, so we have a full repro of the regression that uses protobuf only?

@BenRKarl
Author

@haberman sure thing, assuming you have the above generated _pb2 saved in a separate file called protobuf_fixture_pb2.py

from protobuf_fixture_pb2 import ProtobufFixture
from time import time

init_start = time()
pplus = ProtobufFixture()
init = time() - init_start

set_start = time()
pplus.name = "Test"
set_time = time() - set_start

print(f"{init * 1000}, {set_time * 1000}")

@elharo elharo added the python label Sep 23, 2021
@haberman
Member

haberman commented Sep 23, 2021

@anandolee and I have been looking into this and we've had trouble reproducing this regression.

I modified your script a bit to use timeit() to run the operation repeatedly, to get a number with fewer decimal places, and to reduce noise/variance and measurement overhead in the result.

from protobuf_fixture_pb2 import ProtobufFixture
import google.protobuf as pb
import timeit

def BM(func):
  timer = timeit.Timer(func, globals=globals())
  iters, time = timer.autorange()
  ns_per_iter = time / iters * (10 ** 9)
  return f"{int(ns_per_iter)} ns/iter"

msg = ProtobufFixture()
create = BM('msg = ProtobufFixture()')
assign = BM('msg.name = "Test"')
print(f"Version: {pb.__version__}, create: {create}, assign: {assign}")

I tried this with 3.18.0 and 3.17.3, with both the pure-Python and Python/C++ implementations. My results were:

# 3.18.0
$ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp python test.py 
Version: 3.18.0, create: 141 ns/iter, assign: 121 ns/iter
$ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python python test.py 
Version: 3.18.0, create: 962 ns/iter, assign: 408 ns/iter

# 3.17.3
$ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp python test.py 
Version: 3.17.3, create: 137 ns/iter, assign: 114 ns/iter
$ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python python test.py 
Version: 3.17.3, create: 921 ns/iter, assign: 434 ns/iter

The differences I'm seeing are minor, nothing approaching the 61%/157% you saw.

I'm not sure why we would be getting different results from you. What results do you get from my script?

@jradcliff

@haberman , @anandolee , I'm seeing similar results to yours, but I was able to create a test case that demonstrates the differences between versions more definitively. This test requires that you:

  1. pip install googleads==20.0.0.
  2. Have the report_stream_response_M.pb file, which is a 61MB SearchGoogleAdsStreamResponse.

from google.ads.googleads.v8 import SearchGoogleAdsStreamResponse
import google.protobuf as pb
import timeit

def BM(func):
  timer = timeit.Timer(func, globals=globals())
  iters, time = timer.autorange()
  ns_per_iter = time / iters * (10 ** 9)
  return f"{int(ns_per_iter)} ns/iter"

def Stream():
  with open('report_stream_response_M.pb', 'rb') as f:
    resp = SearchGoogleAdsStreamResponse()._pb
    resp.ParseFromString(f.read())

parse = BM(Stream)
print(f"Version: {pb.__version__}, parse: {parse}")

Results:

Version: 3.17.3, parse: 2461624174 ns/iter
Version: 3.17.3, parse: 2387845156 ns/iter
Version: 3.17.3, parse: 2170255983 ns/iter
Version: 3.17.3, parse: 2198701215 ns/iter
Version: 3.17.3, parse: 2151988237 ns/iter
Version: 3.18.0, parse: 2920992690 ns/iter
Version: 3.18.0, parse: 2864872776 ns/iter
Version: 3.18.0, parse: 2883335483 ns/iter
Version: 3.18.0, parse: 3020910380 ns/iter
Version: 3.18.0, parse: 2945395928 ns/iter

Since report_stream_response_M.pb is too large to attach here, I'll share it with you two directly.

@jradcliff

With the larger input file report_stream_response_L.pb that I mentioned elsewhere, the differences are more pronounced:

Version: 3.17.3, parse: 7917699648 ns/iter
Version: 3.17.3, parse: 7747212633 ns/iter
Version: 3.17.3, parse: 7861812523 ns/iter
Version: 3.17.3, parse: 7952817828 ns/iter
Version: 3.17.3, parse: 7829660125 ns/iter
Version: 3.18.0, parse: 9494568636 ns/iter
Version: 3.18.0, parse: 9405172989 ns/iter
Version: 3.18.0, parse: 9526017846 ns/iter
Version: 3.18.0, parse: 9433147154 ns/iter
Version: 3.18.0, parse: 9515509203 ns/iter

@haberman
Member

I was able to reproduce this, thanks.

Here are profiles before and after. In 3.18.0 we have a costly DynamicMessage::MutableRaw() call showing up that we didn't see before. This accounts for upwards of 17% of the CPU, which closely matches the ~20% regression we are seeing.

3.17.3:

  21.69%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::DynamicMessage::SharedCtor                                                                              
  18.13%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::DynamicMessage::~DynamicMessage                                                                         
   3.85%  python   libc-2.31.so                             [.] malloc_consolidate                                                                                                        
   2.99%  python   libc-2.31.so                             [.] _int_malloc                                                                                                               
   2.49%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::internal::ArenaStringPtr::Destroy                                                                       
   2.23%  python   [kernel.kallsyms]                        [k] clear_page_erms                                                                                                           
   2.02%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] 0x00000000000f6681                                                                                                        
   1.86%  python   libc-2.31.so                             [.] _int_free                                                                                                                 
   1.80%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::internal::WireFormat::_InternalParseAndMergeField  

3.18.0:

  17.57%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::DynamicMessage::SharedCtor                                                                              
  16.99%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::DynamicMessage::MutableRaw                                                                              
  15.36%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::DynamicMessage::~DynamicMessage                                                                         
   2.71%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::internal::ArenaStringPtr::Destroy                                                                       
   2.62%  python   libc-2.31.so                             [.] _int_malloc                                                                                                               
   2.26%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::Descriptor::FindFieldByNumber                                                                           
   1.97%  python   [kernel.kallsyms]                        [k] clear_page_erms                                                                                                           
   1.87%  python   libc-2.31.so                             [.] _int_free

@BenRKarl
Author

@haberman I'm actually continuing to see regressions in our benchmark framework with protobuf 3.18.1 installed, and I see a pretty significant difference when running the script @jradcliff shared earlier, using the report_stream_response_L.pb file:

Version: 3.17.3, parse: 7848545372 ns/iter (7.84s)
Version: 3.17.3, parse: 7870161324 ns/iter (7.87s)
Version: 3.17.3, parse: 7793300010 ns/iter (7.79s)
Version: 3.17.3, parse: 7804926133 ns/iter (7.80s)
Version: 3.17.3, parse: 7732892461 ns/iter (7.73s)
Version: 3.17.3, parse: 7771179428 ns/iter (7.77s)
Version: 3.18.1, parse: 59520486019 ns/iter (59.5s)
Version: 3.18.1, parse: 59328161227 ns/iter (59.3s)
Version: 3.18.1, parse: 59072955409 ns/iter (59.1s)
Version: 3.18.1, parse: 59848669674 ns/iter (59.8s)
Version: 3.18.1, parse: 59406787236 ns/iter (59.4s)
Version: 3.18.1, parse: 59375098546 ns/iter (59.4s)

I'm running this script on a gLinux machine, using Python 3.7.0 and a new virtual environment where I switch between protobuf versions using python -m pip install protobuf==3.18.1. Is there perhaps something additional I need to do to activate the change merged here?

@BenRKarl
Author

Seems like the problem might be that 3.18.1 defaults to the Python implementation instead of the cpp implementation. When I specify cpp explicitly with export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp, I hit an error in descriptor.py:

ImportError: cannot import name '_message' from 'google.protobuf.pyext'

If we can get this back to defaulting to cpp, then the regression may be resolved.

@haberman
Member

Yes, I had also suspected that you might be getting pure-Python parsing.

If _message cannot be loaded, that would explain why it is defaulting to pure-Python. The library tests at startup whether the C++ extension can be loaded, and will fall back to pure-Python if loading it is not possible.

I wonder if the 3.18.1 release did not properly build and upload the C++ extension for some reason. I'll look into this.

@BenRKarl
Author

If it helps, in my local .../site-packages/google/protobuf/pyext dir there is a _message.cpython-37m-x86_64-linux-gnu.so file when I have 3.17.3 installed but not when I have 3.18.1 installed.

@haberman
Member

haberman commented Oct 18, 2021

Yes that would definitely explain it. It's odd though, because I just tried in a bare venv and it worked for me:

$ uname -o -m
x86_64 GNU/Linux
$ python -mvenv /tmp/venv
$ . /tmp/venv/bin/activate
(venv) $ pip install protobuf
Collecting protobuf
  Downloading protobuf-3.18.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
     |████████████████████████████████| 1.1 MB 8.2 MB/s 
Installing collected packages: protobuf
Successfully installed protobuf-3.18.1
(venv) $ pip show -f protobuf | grep so$
  google/protobuf/internal/_api_implementation.cpython-39-x86_64-linux-gnu.so
  google/protobuf/pyext/_message.cpython-39-x86_64-linux-gnu.so
(venv) $ python
Python 3.9.7 (default, Sep  3 2021, 06:18:44) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from google.protobuf.internal import api_implementation
>>> api_implementation.Type()
'cpp'

What do you get when running these commands?

@BenRKarl
Author

BenRKarl commented Oct 18, 2021

Using the same virtual env I was using for testing earlier:

$ uname -o -m
x86_64 GNU/Linux
$ python -mvenv /tmp/venv
$ . /tmp/venv/bin/activate
(venv) $ pip install protobuf
Collecting protobuf
  Using cached https://files.pythonhosted.org/packages/ec/2e/309682e1fe979b17d6fec004e8107a953d1300b93c32c397a9c3ca60322e/protobuf-3.18.1-py2.py3-none-any.whl
Installing collected packages: protobuf
Successfully installed protobuf-3.18.1
You are using pip version 10.0.1, however version 21.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
(venv)  $ python
Python 3.7.0 (default, Aug 31 2020, 13:17:04) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from google.protobuf.internal import api_implementation
>>> api_implementation.Type()
'cpp'

So similar behavior here. I think the main difference is that I'm using python -m pip install protobuf==3.18.1 instead of just pip install protobuf, but I'm not sure how that would make a difference. In our performance framework we run python -m pip install . against a setup.py file, and I assume we hit the same issue of defaulting to the Python implementation, since we see regressions there.
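A stdlib-only way to check whether a given install actually shipped the compiled extension (a sketch; importlib.util.find_spec locates the module without executing it):

```python
import importlib.util

# Locate the compiled _message extension, if any. find_spec returns None
# when the module isn't present; it raises ModuleNotFoundError when the
# parent package (protobuf itself) is missing, so guard for that too.
try:
    spec = importlib.util.find_spec("google.protobuf.pyext._message")
except ModuleNotFoundError:
    spec = None

print("compiled extension:", spec.origin if spec else "not found")
```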

@BenRKarl
Author

I realized I had PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp set in my env when I ran those steps. If I unset PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION and try again, the result is:

>>> from google.protobuf.internal import api_implementation
>>> api_implementation.Type()
'python'

@BenRKarl
Author

@haberman I ran some additional tests with 3.19.0 and think this performance issue is probably due solely to inconsistency in whether the library defaults to the Python or C++ implementation. Here are some numbers, which show good performance with protobuf 3.19.0 under the cpp implementation and poor performance with 3.17.3 under the python implementation:

Pb implementation: python, Protobuf version: 3.17.3, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 60504116065 ns/iter
Pb implementation: python, Protobuf version: 3.17.3, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 60594600953 ns/iter
Pb implementation: python, Protobuf version: 3.17.3, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 61385578474 ns/iter
Pb implementation: python, Protobuf version: 3.17.3, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 61258974252 ns/iter
Pb implementation: python, Protobuf version: 3.17.3, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 61688387331 ns/iter

(test-pb-3.10) benkarl@benkarl:~/google/ads/client-libs/python/googleads/pb-performance$ for i in {1..5}; do python measure.py; done
Pb implementation: cpp, Protobuf version: 3.19.0, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 7957506182 ns/iter
Pb implementation: cpp, Protobuf version: 3.19.0, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 7925806072 ns/iter
Pb implementation: cpp, Protobuf version: 3.19.0, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 7903559552 ns/iter
Pb implementation: cpp, Protobuf version: 3.19.0, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 7903983959 ns/iter
Pb implementation: cpp, Protobuf version: 3.19.0, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 7910140319 ns/iter

The problem is that I can't pin down the pattern that causes the library to default to one implementation or the other. I've tested different Python versions, different installation patterns (i.e. python -m pip install protobuf==3.19.0 vs. python -m pip install protobuf vs. pip install protobuf), and setting/unsetting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION, and nothing yet points to a cause.

@BenRKarl
Author

One thing to add: manually specifying PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp, which should be the default, now often results in the error:

ImportError: cannot import name '_message' from 'google.protobuf.pyext'

So currently there's no reliable workaround.
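Until then, a fail-fast guard at program startup at least makes the silent fallback visible. This is a hypothetical sketch, not an official workaround; note the environment variable must be set before the first protobuf import to have any effect:

```python
import os

# Request the C++ backend explicitly; must happen before protobuf is imported.
os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "cpp")

# Probe the compiled extension directly rather than trusting Type() alone,
# since the ImportError above only surfaces once _message is actually needed.
try:
    from google.protobuf.pyext import _message  # noqa: F401
    impl = "cpp"
except ImportError:
    impl = None  # protobuf missing, or the compiled extension is absent

if impl != "cpp":
    print("WARNING: protobuf C++ extension unavailable; parsing will be slow")
```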
