Right now there is no particularly fast way to construct a str or bytes object from component items, such as code points or characters. Using a list plus join(), or StringIO, is probably not fast enough. These are common things to do in libraries and low-level code.
We could add native str and bytes builder classes that could be quite fast. Hypothetical example with bytes:
b = BytesBuilder()
b.append(97)  # or ord('a')
b.append(98)
b.extend(b'cd')  # Can also take other iterables
bb = b.bytes()  # b'abcd'
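To pin down the intended semantics, here is a rough pure-Python model of the example above. It only illustrates behavior, not performance: the class and method names come from the proposal, while backing the builder with a bytearray and rejecting values outside 0..255 are assumptions of this sketch.

```python
from typing import Iterable


class BytesBuilder:
    """Pure-Python model of the proposed builder (semantics only, not speed)."""

    def __init__(self) -> None:
        # Assumption: a bytearray stands in for the native growable buffer.
        self._buf = bytearray()

    def append(self, value: int) -> None:
        # bytearray.append raises ValueError unless 0 <= value <= 255.
        self._buf.append(value)

    def extend(self, values: Iterable[int]) -> None:
        # Accepts bytes, bytearray, or any iterable of ints in range(256).
        self._buf.extend(values)

    def bytes(self):
        # Copy the accumulated content into an immutable bytes object.
        # Inside the method body, `bytes` still refers to the builtin.
        return bytes(self._buf)


b = BytesBuilder()
b.append(97)
b.append(98)
b.extend(b"cd")
result = b.bytes()
```

A native implementation would presumably keep the same observable behavior while avoiding the generic method-call and boxing overhead.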
Here are some ideas about how to make this fast:
Maintain a freelist of BytesBuilder objects, so we usually wouldn't need to allocate it from the heap (or somehow stack allocate it).
Maintain a short fixed-size internal buffer in the builder, so that we don't need to allocate a separate temporary buffer when building small bytes objects (which is likely very common). Allocate a larger buffer only when needed.
Inline append() and extend() calls, since we can assume these to be performance-critical.
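The first two ideas can be modeled in plain Python to show the control flow. This is only a sketch: the real thing would live in C, and INLINE_CAPACITY, SmallBufferBuilder, acquire() and release() are hypothetical names made up for illustration.

```python
INLINE_CAPACITY = 16  # hypothetical size of the built-in small buffer


class SmallBufferBuilder:
    """Models the buffer strategy only; a native version would do this in C."""

    def __init__(self) -> None:
        self._buf = bytearray(INLINE_CAPACITY)  # fixed-size inline storage
        self._len = 0  # number of bytes actually written
        self.spilled = False  # True once a larger buffer was needed

    def append(self, value: int) -> None:
        if self._len == len(self._buf):
            # Buffer is full: double the capacity (amortized O(1) appends).
            self._buf.extend(bytes(len(self._buf)))
            self.spilled = True
        self._buf[self._len] = value
        self._len += 1

    def build(self) -> bytes:
        return bytes(self._buf[:self._len])

    def reset(self) -> None:
        # Shrink back to the inline capacity so the object can be reused.
        del self._buf[INLINE_CAPACITY:]
        self._len = 0
        self.spilled = False


_freelist = []  # recycled builders, so most uses skip a fresh allocation


def acquire() -> SmallBufferBuilder:
    return _freelist.pop() if _freelist else SmallBufferBuilder()


def release(builder: SmallBufferBuilder) -> None:
    builder.reset()
    _freelist.append(builder)


b = acquire()
for value in b"abc":
    b.append(value)
small = b.build()  # fits in the inline buffer, no growth needed
release(b)
```

In compiled code the acquire/release pair would be implicit, and the inline buffer would be part of the builder struct rather than a separate object.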
We can have a similar builder class for str objects, but it needs to also keep track of how many bytes per character we need. Possibly it would support giving a hint about the maximum code point value at construction. This might resemble _PyUnicodeWriter, which is used in CPython.
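A minimal sketch of the bookkeeping this implies, assuming the same 1/2/4-byte storage widths that CPython uses for str objects. StrBuilder, its max_code_point hint, and char_width() are hypothetical names, and the join() in build() is only there to keep the model short; it is exactly the slow path the native class would avoid.

```python
class StrBuilder:
    """Pure-Python model: tracks the widest code point seen so far."""

    def __init__(self, max_code_point: int = 0x7F) -> None:
        # Hypothetical hint: the largest code point the caller expects.
        self._max = max_code_point
        self._code_points = []

    def append(self, code_point: int) -> None:
        if not 0 <= code_point <= 0x10FFFF:
            raise ValueError("code point out of range")
        if code_point > self._max:
            # A native builder would widen its buffer at this point.
            self._max = code_point
        self._code_points.append(code_point)

    def char_width(self) -> int:
        # Bytes per character needed for everything appended so far.
        if self._max <= 0xFF:
            return 1
        if self._max <= 0xFFFF:
            return 2
        return 4

    def build(self) -> str:
        return "".join(map(chr, self._code_points))


sb = StrBuilder()
sb.append(ord("a"))
width_ascii = sb.char_width()
sb.append(0x20AC)  # EURO SIGN, needs 2 bytes per character
width_bmp = sb.char_width()
sb.append(0x1F600)  # outside the BMP, needs 4 bytes per character
width_astral = sb.char_width()
text = sb.build()
```

Passing an accurate hint up front would let a native builder pick the final width immediately instead of re-copying the buffer each time a wider character shows up.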