Performance regressions in Python in version 3.18.0 #9014

Closed
BenRKarl opened this issue Sep 23, 2021 · 19 comments

@BenRKarl

BenRKarl commented Sep 23, 2021

What version of protobuf and what language are you using?
Version: 3.18.0
Language: Python (version 3.7.0)

What operating system (Linux, Windows, ...) and version?
gLinux

What did you do?
Run the script below using protobuf version 3.17.3, then run it again using protobuf version 3.18.0 and compare the results. I ran this 30 times per version, each in a fresh Python interpreter, and compared the averages.

import proto
from time import time
 
 
class ProtoPlusFixture(proto.Message):
    name = proto.Field(proto.STRING, number=1)
 
init_start = time()
pplus = ProtoPlusFixture()
init = time() - init_start
 
set_start = time()
pplus.name = "Test"
set_time = time() - set_start
 
print(f"{init * 1000}, {set_time * 1000}")
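For reference, the 30-run harness amounts to something like this sketch (hypothetical: `benchmark.py` stands in for the script above, and the averages are computed from the millisecond values it prints):

```python
import statistics
import subprocess
import sys

# Hypothetical driver: run the benchmark script in a fresh interpreter N
# times, collect the two comma-separated millisecond values it prints, and
# report the averages. "benchmark.py" is a stand-in filename.
def average_runs(script="benchmark.py", runs=30):
    inits, sets = [], []
    for _ in range(runs):
        out = subprocess.run([sys.executable, script],
                             capture_output=True, text=True, check=True)
        init_ms, set_ms = (float(x) for x in out.stdout.split(","))
        inits.append(init_ms)
        sets.append(set_ms)
    return statistics.mean(inits), statistics.mean(sets)
```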

What did you expect to see?
I expected the performance to be similar for each version.

What did you see instead?

  • 3.18.0 class init avg. = 0.005571047465 ms
  • 3.17.3 class init avg. = 0.004593531291 ms
  • 3.18.0 field set avg. = 0.02173582713 ms
  • 3.17.3 field set avg. = 0.01894632975 ms

That is a 21% regression in class initialization and a 14% regression when setting fields.

Anything else we should know about your project / environment

We have an internal performance framework showing that these regressions are significant at a larger scale as well (e.g. reading and writing several fields, processing larger requests with nested fields), and we began seeing degradations around the time that 3.18.0 was published.

@BenRKarl
Author

If you run a similar test with the below generated _pb2 you can see an even more pronounced regression:

  • 3.18.0 protobuf class init avg. = 0.004307428996 ms
  • 3.17.3 protobuf class init avg. = 0.002670288086 ms
  • 3.18.0 protobuf field set avg. = 0.003298123678 ms
  • 3.17.3 protobuf field set avg. = 0.001279513041 ms

That is a 61% regression in class initialization and a 157% regression when setting fields.

# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: protobuf_fixture.proto

from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database

# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()


DESCRIPTOR = _descriptor.FileDescriptor(
    name="protobuf_fixture.proto",
    package="",
    syntax="proto3",
    serialized_options=None,
    create_key=_descriptor._internal_create_key,
    serialized_pb=b'\n\x16protobuf_fixture.proto"\x1f\n\x0fProtobufFixture\x12\x0c\n\x04name\x18\x01 \x01(\tb\x06proto3',
)


_PROTOBUFFIXTURE = _descriptor.Descriptor(
    name="ProtobufFixture",
    full_name="ProtobufFixture",
    filename=None,
    file=DESCRIPTOR,
    containing_type=None,
    create_key=_descriptor._internal_create_key,
    fields=[
        _descriptor.FieldDescriptor(
            name="name",
            full_name="ProtobufFixture.name",
            index=0,
            number=1,
            type=9,
            cpp_type=9,
            label=1,
            has_default_value=False,
            default_value=b"".decode("utf-8"),
            message_type=None,
            enum_type=None,
            containing_type=None,
            is_extension=False,
            extension_scope=None,
            serialized_options=None,
            file=DESCRIPTOR,
            create_key=_descriptor._internal_create_key,
        ),
    ],
    extensions=[],
    nested_types=[],
    enum_types=[],
    serialized_options=None,
    is_extendable=False,
    syntax="proto3",
    extension_ranges=[],
    oneofs=[],
    serialized_start=26,
    serialized_end=57,
)

DESCRIPTOR.message_types_by_name["ProtobufFixture"] = _PROTOBUFFIXTURE
_sym_db.RegisterFileDescriptor(DESCRIPTOR)

ProtobufFixture = _reflection.GeneratedProtocolMessageType(
    "ProtobufFixture",
    (_message.Message,),
    {
        "DESCRIPTOR": _PROTOBUFFIXTURE,
        "__module__": "protobuf_fixture_pb2"
        # @@protoc_insertion_point(class_scope:ProtobufFixture)
    },
)
_sym_db.RegisterMessage(ProtobufFixture)


# @@protoc_insertion_point(module_scope)

@BenRKarl
Author

cc @haberman

@anandolee
Contributor

Ben, are you using pure Python or the C++ extension? You can print api_implementation.Type() to see which implementation you're running.
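For example, a minimal check (wrapped in try/except as a hedge so it also runs in an environment without protobuf installed):

```python
# Print which protobuf backend is active: "cpp" when the compiled extension
# is in use, "python" for the pure-Python fallback (newer releases may also
# report "upb").
try:
    from google.protobuf.internal import api_implementation
    impl = api_implementation.Type()
except ImportError:
    impl = None  # protobuf is not installed in this environment

print("protobuf implementation:", impl)
```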

@BenRKarl
Author

@anandolee this is the "cpp" extension.

@haberman
Member

@BenRKarl could you include the code that is using ProtobufFixture in your second example, so we have a full repro of the regression that uses protobuf only?

@BenRKarl
Author

@haberman sure thing, assuming you have the above generated _pb2 saved in a separate file called protobuf_fixture_pb2.py

from protobuf_fixture_pb2 import ProtobufFixture
from time import time

init_start = time()
pplus = ProtobufFixture()
init = time() - init_start

set_start = time()
pplus.name = "Test"
set_time = time() - set_start

print(f"{init * 1000}, {set_time * 1000}")

@elharo elharo added the python label Sep 23, 2021
@haberman
Member

haberman commented Sep 23, 2021

@anandolee and I have been looking into this and we've had trouble reproducing this regression.

I modified your script a bit to use timeit() to run the operation repeatedly, to get a number with fewer decimal places, and to reduce noise/variance and measurement overhead in the result.

from protobuf_fixture_pb2 import ProtobufFixture
import google.protobuf as pb
import timeit

def BM(func):
  timer = timeit.Timer(func, globals=globals())
  iters, time = timer.autorange()
  ns_per_iter = time / iters * (10 ** 9)
  return f"{int(ns_per_iter)} ns/iter"

msg = ProtobufFixture()
create = BM('msg = ProtobufFixture()')
assign = BM('msg.name = "Test"')
print(f"Version: {pb.__version__}, create: {create}, assign: {assign}")

I tried this with 3.18.0 and 3.17.3, with both the pure-Python and Python/C++ implementations. My results were:

# 3.18.0
$ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp python test.py 
Version: 3.18.0, create: 141 ns/iter, assign: 121 ns/iter
$ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python python test.py 
Version: 3.18.0, create: 962 ns/iter, assign: 408 ns/iter

# 3.17.3
$ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp python test.py 
Version: 3.17.3, create: 137 ns/iter, assign: 114 ns/iter
$ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python python test.py 
Version: 3.17.3, create: 921 ns/iter, assign: 434 ns/iter

The differences I'm seeing are minor, nothing approaching the 61%/157% you saw.

I'm not sure why we would be getting different results from you. What results do you get from my script?

@jradcliff

@haberman , @anandolee , I'm seeing similar results to yours, but I was able to create a test case that demonstrates the differences between versions more definitively. This test requires that you:

  1. pip install googleads==20.0.0.
  2. Have the report_stream_response_M.pb file, which is a 61MB SearchGoogleAdsStreamResponse.

from google.ads.googleads.v8 import SearchGoogleAdsStreamResponse
import google.protobuf as pb
import timeit

def BM(func):
  timer = timeit.Timer(func, globals=globals())
  iters, time = timer.autorange()
  ns_per_iter = time / iters * (10 ** 9)
  return f"{int(ns_per_iter)} ns/iter"

def Stream():
  with open('report_stream_response_M.pb', 'rb') as f:
    resp = SearchGoogleAdsStreamResponse()._pb
    resp.ParseFromString(f.read())

parse = BM(Stream)
print(f"Version: {pb.__version__}, parse: {parse}")

Results:

Version: 3.17.3, parse: 2461624174 ns/iter
Version: 3.17.3, parse: 2387845156 ns/iter
Version: 3.17.3, parse: 2170255983 ns/iter
Version: 3.17.3, parse: 2198701215 ns/iter
Version: 3.17.3, parse: 2151988237 ns/iter
Version: 3.18.0, parse: 2920992690 ns/iter
Version: 3.18.0, parse: 2864872776 ns/iter
Version: 3.18.0, parse: 2883335483 ns/iter
Version: 3.18.0, parse: 3020910380 ns/iter
Version: 3.18.0, parse: 2945395928 ns/iter

Since report_stream_response_M.pb is too large to attach here, I'll share it with you two directly.

@jradcliff

With the larger input file report_stream_response_L.pb that I mentioned elsewhere, the differences are more pronounced:

Version: 3.17.3, parse: 7917699648 ns/iter
Version: 3.17.3, parse: 7747212633 ns/iter
Version: 3.17.3, parse: 7861812523 ns/iter
Version: 3.17.3, parse: 7952817828 ns/iter
Version: 3.17.3, parse: 7829660125 ns/iter
Version: 3.18.0, parse: 9494568636 ns/iter
Version: 3.18.0, parse: 9405172989 ns/iter
Version: 3.18.0, parse: 9526017846 ns/iter
Version: 3.18.0, parse: 9433147154 ns/iter
Version: 3.18.0, parse: 9515509203 ns/iter

@haberman
Member

I was able to reproduce this, thanks.

Here are profiles before and after. In 3.18.0 we have a costly DynamicMessage::MutableRaw() call showing up that we didn't see before. This accounts for upwards of 17% of the CPU, which closely matches the ~20% regression we are seeing.

3.17.3:

  21.69%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::DynamicMessage::SharedCtor                                                                              
  18.13%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::DynamicMessage::~DynamicMessage                                                                         
   3.85%  python   libc-2.31.so                             [.] malloc_consolidate                                                                                                        
   2.99%  python   libc-2.31.so                             [.] _int_malloc                                                                                                               
   2.49%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::internal::ArenaStringPtr::Destroy                                                                       
   2.23%  python   [kernel.kallsyms]                        [k] clear_page_erms                                                                                                           
   2.02%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] 0x00000000000f6681                                                                                                        
   1.86%  python   libc-2.31.so                             [.] _int_free                                                                                                                 
   1.80%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::internal::WireFormat::_InternalParseAndMergeField  

3.18.0:

  17.57%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::DynamicMessage::SharedCtor                                                                              
  16.99%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::DynamicMessage::MutableRaw                                                                              
  15.36%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::DynamicMessage::~DynamicMessage                                                                         
   2.71%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::internal::ArenaStringPtr::Destroy                                                                       
   2.62%  python   libc-2.31.so                             [.] _int_malloc                                                                                                               
   2.26%  python   _message.cpython-39-x86_64-linux-gnu.so  [.] google::protobuf::Descriptor::FindFieldByNumber                                                                           
   1.97%  python   [kernel.kallsyms]                        [k] clear_page_erms                                                                                                           
   1.87%  python   libc-2.31.so                             [.] _int_free

@BenRKarl
Author

@haberman I'm actually continuing to see regressions in our benchmark framework with protobuf 3.18.1 installed, and I see a pretty significant difference when running the script @jradcliff shared earlier, using the report_stream_response_L.pb file:

Version: 3.17.3, parse: 7848545372 ns/iter (7.84s)
Version: 3.17.3, parse: 7870161324 ns/iter (7.87s)
Version: 3.17.3, parse: 7793300010 ns/iter (7.79s)
Version: 3.17.3, parse: 7804926133 ns/iter (7.80s)
Version: 3.17.3, parse: 7732892461 ns/iter (7.73s)
Version: 3.17.3, parse: 7771179428 ns/iter (7.77s)
Version: 3.18.1, parse: 59520486019 ns/iter (59.5s)
Version: 3.18.1, parse: 59328161227 ns/iter (59.3s)
Version: 3.18.1, parse: 59072955409 ns/iter (59.1s)
Version: 3.18.1, parse: 59848669674 ns/iter (59.8s)
Version: 3.18.1, parse: 59406787236 ns/iter (59.4s)
Version: 3.18.1, parse: 59375098546 ns/iter (59.4s)

I'm running this script on a gLinux machine, using Python 3.7.0 and a new virtual environment where I switch between protobuf versions using python -m pip install protobuf==3.18.1. Is there perhaps something additional I need to do to activate the change merged here?

@BenRKarl
Author

Seems like the problem might be that 3.18.1 defaults to the Python implementation instead of the cpp implementation. When I specify cpp explicitly with export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp, I hit an error in descriptor.py:

ImportError: cannot import name '_message' from 'google.protobuf.pyext'

If we can get this back to defaulting to cpp, then the regression may be resolved.

@haberman
Member

Yes, I had also suspected that you might be getting pure-Python parsing.

If _message cannot be loaded, that would explain why it is defaulting to pure-Python. The library tests at startup whether the C++ extension can be loaded, and will fall back to pure-Python if loading it is not possible.

I wonder if the 3.18.1 release did not properly build and upload the C++ extension for some reason. I'll look into this.

@BenRKarl
Author

If it helps, in my local .../site-packages/google/protobuf/pyext dir there is a _message.cpython-37m-x86_64-linux-gnu.so file when I have 3.17.3 installed but not when I have 3.18.1 installed.

@haberman
Member

haberman commented Oct 18, 2021

Yes that would definitely explain it. It's odd though, because I just tried in a bare venv and it worked for me:

$ uname -o -m
x86_64 GNU/Linux
$ python -mvenv /tmp/venv
$ . /tmp/venv/bin/activate
(venv) $ pip install protobuf
Collecting protobuf
  Downloading protobuf-3.18.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
     |████████████████████████████████| 1.1 MB 8.2 MB/s 
Installing collected packages: protobuf
Successfully installed protobuf-3.18.1
(venv) $ pip show -f protobuf | grep so$
  google/protobuf/internal/_api_implementation.cpython-39-x86_64-linux-gnu.so
  google/protobuf/pyext/_message.cpython-39-x86_64-linux-gnu.so
(venv) $ python
Python 3.9.7 (default, Sep  3 2021, 06:18:44) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from google.protobuf.internal import api_implementation
>>> api_implementation.Type()
'cpp'

What do you get when running these commands?

@BenRKarl
Author

BenRKarl commented Oct 18, 2021

Using the same virtual env I was using for testing earlier:

$ uname -o -m
x86_64 GNU/Linux
$ python -mvenv /tmp/venv
$ . /tmp/venv/bin/activate
(venv) $ pip install protobuf
Collecting protobuf
  Using cached https://files.pythonhosted.org/packages/ec/2e/309682e1fe979b17d6fec004e8107a953d1300b93c32c397a9c3ca60322e/protobuf-3.18.1-py2.py3-none-any.whl
Installing collected packages: protobuf
Successfully installed protobuf-3.18.1
You are using pip version 10.0.1, however version 21.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
(venv)  $ python
Python 3.7.0 (default, Aug 31 2020, 13:17:04) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from google.protobuf.internal import api_implementation
>>> api_implementation.Type()
'cpp'

So similar behavior here. I think the main difference is that I'm using python -m pip install protobuf==3.18.1 instead of just pip install protobuf, but I'm not sure how that would make a difference. In our performance framework we run python -m pip install . against a setup.py file, and I assume we hit the same issue of defaulting to the Python implementation, since we see regressions there.
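A stdlib-only way to check whether a given install actually shipped the compiled extension (a sketch; importlib.util.find_spec locates the module without executing it):

```python
import importlib.util

# Locate the compiled _message extension, if any. find_spec returns None
# when the module isn't present; it raises ModuleNotFoundError when the
# parent package (protobuf itself) is missing, so guard for that too.
try:
    spec = importlib.util.find_spec("google.protobuf.pyext._message")
except ModuleNotFoundError:
    spec = None

print("compiled extension:", spec.origin if spec else "not found")
```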

@BenRKarl
Author

I realized I had PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp set in my env when I ran those steps. If I unset PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION and try again, the result is:

>>> from google.protobuf.internal import api_implementation
>>> api_implementation.Type()
'python'

@BenRKarl
Author

@haberman I ran some additional tests with 3.19.0 and think this performance issue is probably due solely to inconsistency in whether the library defaults to the Python or C++ implementation. Here are some numbers, which show good performance with protobuf 3.19.0 under the cpp implementation and poor performance with 3.17.3 under the python implementation:

Pb implementation: python, Protobuf version: 3.17.3, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 60504116065 ns/iter
Pb implementation: python, Protobuf version: 3.17.3, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 60594600953 ns/iter
Pb implementation: python, Protobuf version: 3.17.3, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 61385578474 ns/iter
Pb implementation: python, Protobuf version: 3.17.3, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 61258974252 ns/iter
Pb implementation: python, Protobuf version: 3.17.3, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 61688387331 ns/iter

(test-pb-3.10) benkarl@benkarl:~/google/ads/client-libs/python/googleads/pb-performance$ for i in {1..5}; do python measure.py; done
Pb implementation: cpp, Protobuf version: 3.19.0, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 7957506182 ns/iter
Pb implementation: cpp, Protobuf version: 3.19.0, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 7925806072 ns/iter
Pb implementation: cpp, Protobuf version: 3.19.0, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 7903559552 ns/iter
Pb implementation: cpp, Protobuf version: 3.19.0, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 7903983959 ns/iter
Pb implementation: cpp, Protobuf version: 3.19.0, Python version: 3.10.0 (default, Oct 18 2021, 14:28:16) [GCC 10.3.0] Parse time: 7910140319 ns/iter

The problem is that I can't pin down the pattern that causes the library to default to one implementation or the other. I've tested different Python versions, different installation patterns (i.e. python -m pip install protobuf==3.19.0 vs. python -m pip install protobuf vs. pip install protobuf), and setting/unsetting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION, and nothing yet points to a cause.

@BenRKarl
Author

One thing to add: manually specifying PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp, which should be the default, now often results in the error:

ImportError: cannot import name '_message' from 'google.protobuf.pyext'

So currently there's no reliable workaround.
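Until then, a fail-fast guard at program startup at least makes the silent fallback visible. This is a hypothetical sketch, not an official workaround; note the environment variable must be set before the first protobuf import to have any effect:

```python
import os

# Request the C++ backend explicitly; must happen before protobuf is imported.
os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "cpp")

# Probe the compiled extension directly rather than trusting Type() alone,
# since the ImportError above only surfaces once _message is actually needed.
try:
    from google.protobuf.pyext import _message  # noqa: F401
    impl = "cpp"
except ImportError:
    impl = None  # protobuf missing, or the compiled extension is absent

if impl != "cpp":
    print("WARNING: protobuf C++ extension unavailable; parsing will be slow")
```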
