Fast str and bytes builders #1036

JukkaL · 2023-11-26T13:14:14Z

Right now there is no particularly fast way to construct a str or bytes object from component items, such as code points or characters. Using a list with join(), or StringIO, is probably not fast enough. Building objects this way is common in libraries and low-level code.
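
For reference, the existing idioms mentioned above look roughly like this. The function names are illustrative only, and the bytearray variant is an extra common idiom that the text above does not name.

```python
import io


def build_with_join(code_points):
    # Collect one-character strings in a list, then join them once at the end.
    parts = []
    for cp in code_points:
        parts.append(chr(cp))
    return "".join(parts)


def build_with_stringio(code_points):
    # Write each character into an in-memory text buffer.
    buf = io.StringIO()
    for cp in code_points:
        buf.write(chr(cp))
    return buf.getvalue()


def build_bytes_with_bytearray(values):
    # bytearray is the closest existing mutable container for bytes.
    buf = bytearray()
    for v in values:
        buf.append(v)
    return bytes(buf)
```

Each of these goes through generic method calls and intermediate objects per item, which is the overhead a native builder would aim to avoid.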

We could add native str and bytes builder classes that could be quite fast. Hypothetical example with bytes:

b = BytesBuilder()
b.append(97)  # or ord('a')
b.append(98)
b.extend(b'cd')  # Can also take other iterables
bb = b.bytes()  # b'abcd'
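
To pin down the intended semantics, here is a minimal pure-Python model of the hypothetical API above. It is only a sketch: it wraps bytearray and makes no attempt at the performance a native implementation would have. The error behavior (ValueError for out-of-range values) is an assumption borrowed from bytearray.

```python
from typing import Iterable


class BytesBuilder:
    """Pure-Python model of the hypothetical native builder."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def append(self, value: int) -> None:
        # bytearray.append raises ValueError unless 0 <= value <= 255.
        self._buf.append(value)

    def extend(self, values: Iterable[int]) -> None:
        # Accepts bytes, bytearray, or any iterable of ints in range(256).
        self._buf.extend(values)

    def bytes(self):
        # Return an immutable copy of everything appended so far.
        return bytes(self._buf)


b = BytesBuilder()
b.append(97)
b.append(ord("b"))
b.extend(b"cd")
b.extend([101, 102])  # any iterable of ints works
result = b.bytes()
```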

Here are some ideas about how to make this fast:

Maintain a freelist of BytesBuilder objects, so we usually wouldn't need to allocate it from the heap (or somehow stack allocate it).
Maintain a short fixed-size internal buffer in the builder, so that we don't need to allocate a separate temporary buffer when building small bytes objects (which is likely very common). Allocate a larger buffer only when needed.
Inline append() and extend() calls, since we can assume these to be performance-critical.
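
The first two ideas can be modeled in Python as follows. This is a sketch of the bookkeeping only: a real implementation would live in C, and the names SMALL_SIZE, SmallBufferBuilder, acquire() and finish(), as well as the doubling growth policy, are assumptions made for illustration. Inlining cannot be shown at this level.

```python
SMALL_SIZE = 16  # hypothetical capacity of the fixed inline buffer


class SmallBufferBuilder:
    _freelist = []  # recycled builder instances, reused by acquire()

    def __init__(self):
        self._small = bytearray(SMALL_SIZE)  # stands in for inline storage
        self._buf = self._small
        self._len = 0

    @classmethod
    def acquire(cls):
        # Reuse a recycled builder when one is available.
        return cls._freelist.pop() if cls._freelist else cls()

    def append(self, value):
        if not 0 <= value <= 255:
            raise ValueError("byte must be in range(0, 256)")
        if self._len == len(self._buf):
            # Capacity exhausted: move to a separate buffer twice as large.
            bigger = bytearray(2 * len(self._buf))
            bigger[: self._len] = self._buf
            self._buf = bigger
        self._buf[self._len] = value
        self._len += 1

    def finish(self):
        result = bytes(self._buf[: self._len])
        # Reset to the inline buffer and return this builder to the freelist.
        self._buf = self._small
        self._len = 0
        type(self)._freelist.append(self)
        return result
```

Small results never leave the inline buffer, and a builder released by finish() is handed out again by the next acquire() call.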

We can have a similar builder class for str objects, but it needs to also keep track of how many bytes per character we need. Possibly it would support giving a hint about the maximum code point value at construction. This might resemble _PyUnicodeWriter, which is used in CPython.
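
A rough Python model of the width tracking described above, assuming the 1, 2, or 4 bytes per character representation that CPython uses for str objects. The class name, the max_code_point_hint parameter, and the build() method are hypothetical, and this does not mirror the actual _PyUnicodeWriter interface.

```python
class StrBuilder:
    """Pure-Python model of a str builder that tracks the needed width."""

    def __init__(self, max_code_point_hint=0x7F):
        # The hint lets the builder pick a wide enough representation upfront.
        self._max = max_code_point_hint
        self._code_points = []

    def append(self, code_point):
        if not 0 <= code_point <= 0x10FFFF:
            raise ValueError("code point out of range")
        if code_point > self._max:
            # A native builder would widen its buffer at this point.
            self._max = code_point
        self._code_points.append(code_point)

    def bytes_per_char(self):
        # 1 byte up to U+00FF, 2 bytes up to U+FFFF, 4 bytes otherwise.
        if self._max <= 0xFF:
            return 1
        if self._max <= 0xFFFF:
            return 2
        return 4

    def build(self):
        return "".join(map(chr, self._code_points))
```

With an accurate hint, a native builder could allocate the final representation once and skip widening the buffer later.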


JukkaL mentioned this issue Nov 26, 2023

Fast access to item data data of str and bytes objects #1037

Open

JukkaL added the speed label Nov 26, 2023
