Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to serialize protos on re-run when proto2 syntax file has doubles with default values #16597

Open
john-altitude opened this issue Apr 22, 2024 · 4 comments

Comments

@john-altitude
Copy link

What version of protobuf and what language are you using?
Version: libprotoc 26.1, protobuf==5.26.1
Language: Python
Operating System: Ubuntu 20.04.6
Python version 3.8

What did you do?

  1. Write a .proto file with an optional double field that has a double default value using proto2 syntax
  2. Cause the generated _pb2 file to be imported more than once, for example using importlib to reload the _pb2 module, or import the same module with different import paths, or re-run in the context of a streamlit application, etc.

What did you expect to see
No error should happen, protobuf should recognize that the file being imported and serialized is already in the descriptor pool and should continue without error.

What did you see instead?
The following error message occurs when the AddSerializedFile function is run in the _pb2 file:

  File "streamlit_script.py", line 5, in <module>
    importlib.reload(class_a_pb2)
  File "/usr/lib/python3.8/importlib/__init__.py", line 169, in reload
    _bootstrap._exec(spec, module)
  File "<frozen importlib._bootstrap>", line 604, in _exec
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/john/one/py/experimental/john/protobuf_testing/test_protogen/class_a_pb2.py", line 17, in <module>
    DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\rclass_a.proto\x12\tjohn_test\"\x18\n\x06Sample\x12\x0e\n\x01\x61\x18\x01 \x01(\x01:\x03\x31.1')
TypeError: Couldn't build proto file into descriptor pool: duplicate file name class_a.proto

I've attached a .proto and the corresponding _pb2.py file and python script to reproduce this bug. If the default value for Sample.a is set to 1.1, for example, the script will fail, while setting it to 1 causes no errors.

This bug is especially difficult to deal with because I am running an application through streamlit, which causes re-imports to trigger frequently.

.proto file

syntax = "proto2";

package test;

message Sample {
  optional double a = 1 [default = 1.1];
}

class_a_pb2.py

# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: class_a.proto
# Protobuf Python Version: 5.26.1
"""Generated protocol buffer code."""
from google.protobuf import descriptor as _descriptor
from google.protobuf import descriptor_pool as _descriptor_pool
from google.protobuf import symbol_database as _symbol_database
from google.protobuf.internal import builder as _builder
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()


DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\rclass_a.proto\x12\x04test\"\x18\n\x06Sample\x12\x0e\n\x01\x61\x18\x01 \x01(\x01:\x03\x31.1')

_globals = globals()
_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, _globals)
_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'class_a_pb2', _globals)
if not _descriptor._USE_C_DESCRIPTORS:
  DESCRIPTOR._loaded_options = None
  _globals['_SAMPLE']._serialized_start=23
  _globals['_SAMPLE']._serialized_end=47
# @@protoc_insertion_point(module_scope)

Example Script

import sys
import importlib
sys.path.append('/home/john/one/py/experimental/john/protobuf_testing/test_protogen')
import class_a_pb2
importlib.reload(class_a_pb2)
@john-altitude john-altitude added the untriaged auto added to all issues by default when created. label Apr 22, 2024
@zhangskz zhangskz added python investigation 26.x and removed untriaged auto added to all issues by default when created. labels Apr 24, 2024
@zhangskz
Copy link
Member

@john-altitude Is this only in v26.1, or is this error present in prior releases well?

@anandolee Jie, can you take a look and let me know if this needs a fix / patch?

@john-altitude
Copy link
Author

I'm not certain about prior releases, but this error was NOT present with libprotoc 3.6.1 and protobuf package 3.19.3. I only discovered this when updating last week but I haven't investigated other prior releases.

@zhangskz
Copy link
Member

Its likely this is due to the change in default python implementation to Python UPB in v21.x. Can you verify if you still hit this error using the Python C++ implementation by setting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp per https://protobuf.dev/reference/python/python-generated/#sharing-messages?

We should have handling for duplicate adds of the same file in descriptor_pool.c, but it looks like this is being treated as not a match in def_pool.c.

@john-altitude
Copy link
Author

john-altitude commented Apr 24, 2024

Can you verify if you still hit this error using the Python C++ implementation by setting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp per https://protobuf.dev/reference/python/python-generated/#sharing-messages?

Sorry but can you verify, I would enable this by building protobuf from source with the --cpp_implementation flag then just setting the env variable when I actually run protoc, right? Or is the cpp implementation included with the default PyPI version when pip installed now? I'm having a hard time finding any recent install instructions that mention the cpp implementation and I'm a bit out of my depth.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants