jinja2.FileSystemBytecodeCache race conditions #1654

dvitek · 2022-04-10T18:33:52Z

There a few race conditions in jinja2.FileSystemBytecodeCache, if multiple processes are using the same cache directory:

A reader can read a partially-written file.
There is a TOCTOU problem that can arise if the cache is cleared just after a reader checks for existence.
Two writers can write to the same file at once, which might end badly if they are writing different payloads, and the writes interleave poorly. E.g., this might happen if there are any unordered data structures being serialized.

At the end of this message, I provide a test case that demonstrates the issue, fixes the issue, and then tests the fix. The fixed code is unfortunately more complex than the current code.

More details:

A reader can read a partially-written cache file. Once in a while, with certain templates, this can cause marshal.load to fail in an undocumented manner:

Traceback (most recent call last):
  File "/u1/dvitek/tmp/fsbcc_concurrency_test.py", line 177, in exercise_cache_repeatedly
    exercise_cache(cache_impl)
  File "/u1/dvitek/tmp/fsbcc_concurrency_test.py", line 169, in exercise_cache
    template = env.get_template('test.txt') # This call interacts with the cache
  File "/usr/local/lib/python3.6/dist-packages/jinja2/environment.py", line 997, in get_template
    return self._load_template(name, globals)
  File "/usr/local/lib/python3.6/dist-packages/jinja2/environment.py", line 958, in _load_template
    template = self.loader.load(self, name, self.make_globals(globals))
  File "/usr/local/lib/python3.6/dist-packages/jinja2/loaders.py", line 131, in load
    bucket = bcc.get_bucket(environment, name, filename, source)
  File "/usr/local/lib/python3.6/dist-packages/jinja2/bccache.py", line 178, in get_bucket
    self.load_bytecode(bucket)
  File "/usr/local/lib/python3.6/dist-packages/jinja2/bccache.py", line 267, in load_bytecode
    bucket.load_bytecode(f)
  File "/usr/local/lib/python3.6/dist-packages/jinja2/bccache.py", line 77, in load_bytecode
    self.code = marshal.load(f)
SystemError: ../Objects/codeobject.c:131: bad argument to internal function

I think someone else may have run into this sort of thing before and they suppressed the documented failure modes using the except block here:

        # if marshal_load fails then we need to reload
        try:
            self.code = marshal.load(f)
        except (EOFError, ValueError, TypeError):
            self.reset()
            return

While one could adjust that to also catch SystemError, or even Exception, this code is starting to feel awfully fragile. Who is to say it will always detect a problem when the file ends prematurely? It could simply load the wrong value. There could be some remote chance a of a security issue here.

There is a TOCTOU problem in load_bytecode where it checks whether the file exists before opening it:

    def load_bytecode(self, bucket: Bucket) -> None:
        filename = self._get_cache_filename(bucket)

        if os.path.exists(filename):
            with open(filename, "rb") as f:
                bucket.load_bytecode(f)

If another process invokes the clear() method, the file can disappear after the existence check and before the open. Here is the exception that arises:

Traceback (most recent call last):
  File "/u1/dvitek/tmp/fsbcc_concurrency_test.py", line 177, in exercise_cache_repeatedly
    exercise_cache(cache_impl)
  File "/u1/dvitek/tmp/fsbcc_concurrency_test.py", line 169, in exercise_cache
    template = env.get_template('test.txt') # This call interacts with the cache
  File "/usr/local/lib/python3.6/dist-packages/jinja2/environment.py", line 997, in get_template
    return self._load_template(name, globals)
  File "/usr/local/lib/python3.6/dist-packages/jinja2/environment.py", line 958, in _load_template
    template = self.loader.load(self, name, self.make_globals(globals))
  File "/usr/local/lib/python3.6/dist-packages/jinja2/loaders.py", line 131, in load
    bucket = bcc.get_bucket(environment, name, filename, source)
  File "/usr/local/lib/python3.6/dist-packages/jinja2/bccache.py", line 178, in get_bucket
    self.load_bytecode(bucket)
  File "/usr/local/lib/python3.6/dist-packages/jinja2/bccache.py", line 266, in load_bytecode
    with open(filename, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'cache/__jinja2_124c221e97d76fb961a8a6bc364b938f3082f29c.cache'

Two writers can write to the same file at once, which might end badly if they are writing different payloads, and the writes interleave poorly. E.g., this might happen if there are any unordered data structures being serialized.

It's possible that the symptom in (1) is really arising from this.

Environment:

Python version: Reproduces with 3.10.1 and 3.6.8
Jinja version: 3.0.3

Here is a minimal test case that reproduces the problem and then tests a fix for the problem.

# Run me with:
#  python3 fsbcc_concurrency_test.py

# This is a torture test that forks off a large number of processes,
# all of which try to use the very same template in a FileSystemBytecodeCache.

# As of April 2022, jinja2.FileSystemBytecodeCache can raise these two exceptions:
#
# SystemError: ../Objects/codeobject.c:131: bad argument to internal function
#   File "/usr/local/lib/python3.6/dist-packages/jinja2/bccache.py", line 77, in load_bytecode
#     self.code = marshal.load(f)
#
# FileNotFoundError: [Errno 2] No such file or directory: 'cache/__jinja2_124c221e97d76fb961a8a6bc364b938f3082f29c.cache'
#   File "/usr/local/lib/python3.6/dist-packages/jinja2/bccache.py", line 266, in load_bytecode
#     with open(filename, "rb") as f:
#
#
# This test verifies that exceptions occur and then tests a fixed
# version to verify the fix.


# Imports for code being tested
import jinja2, tempfile, os

# Imports for testing infrastructure
import shutil, traceback, sys

# How many processes to fork()
JOBS = 100

# How many cache fetch/insert + clear iterations each process should do
ITERATIONS = 100

# The template that gets cached.  Not just any template will do.  I
# don't know exactly what the requirements are for causing the bug to
# occur, but this example seems sufficient:
TEST_TEMPLATE = '''{% extends "xxxxxxxxxxxxxx" %}

{% import "xxxxxxxxxxxxxxxxx" as xxxxxxxxxxx with context %}
{% import "xxxxxxxxxxxxxxxxxxxxxxxx" as xx with context %}
{% import "xxxxxxxxxxxxxxxxxxxxx" as xxxxxxxxxxxxxxx with context %}
{% import "xxxxxxxxxxxxxxxxxxxxxxxxxxx" as xxxxxxxxxxxxxxxxxxxxx with context %}
{% import "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" as xxx with context%}
{% import "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy" as yyy with context %}

{%- block xxxxx -%}
    {{ xxxxxxxx.xxxxxxxx|escape }} : {{ xxxxxxx.xxxxxxxx|escape }}
{%- endblock -%}

{% block xxxxxxxxxxx %}
{{- xx.xxxxxxxxxxxxxxx(xxxxx,
                xxxxxxx,
                None,
                None,
                xxxxxxxx,
                'xxxxxxxxxxxxxxxxxxxxx'
                )
-}}
{% endblock %}

{% block xxxxxxxxxx %}
{% include "xxxxxxxxxxxxxxxxxxxx" %}
{% include "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" %}
{% include "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" %}
{% include "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" %}
{% include "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" %}
{% include "xxxxxxxxxxxxxxxxxxxxxxxxxx" %}
{% include "xxxxxxxxxxxxxxxxxxxxxxxx" %}
{% endblock %}

{% block xxxxxxx %}
{% endblock %}

{% block xxxxxxxxxxxx %} 
{% endblock %}
'''

# Perform an insert/fetch and then clear the cache
def exercise_cache(cache_impl):
    bcc = cache_impl('cache')
    env = jinja2.Environment(
        loader=jinja2.FileSystemLoader('test'),
        bytecode_cache=bcc,
        )
    try:
        template = env.get_template('test.txt') # This call interacts with the cache
    finally:
        bcc.clear()

def exercise_cache_repeatedly(cache_impl):
    code = 0
    for _ in range(ITERATIONS):
        try:
            exercise_cache(cache_impl)
        except Exception:
            code = 1
            traceback.print_exc()
    return code

def test(cache_impl):
    try:
        shutil.rmtree('test')
    except FileNotFoundError:
        pass
    try:
        shutil.rmtree('cache')
    except FileNotFoundError:
        pass
    os.mkdir('test')
    os.mkdir('cache')
    with open('test/test.txt', 'w') as f:
        f.write(TEST_TEMPLATE)

    # I was unable to repro the problem using threads (probably due to
    # GIL making the race less likely or impossible) but with multiple
    # processes we can hit it fast.
    for _ in range(JOBS):
        if os.fork() == 0:
            rc = 0
            rc = exercise_cache_repeatedly(cache_impl)
            sys.exit(rc)
    code = 0
    for _ in range(JOBS):
        if os.wait()[1]:
            code = 1
    return code

# A subclass that fixes the bug
class SafeFileSystemBytecodeCache(jinja2.FileSystemBytecodeCache):
    def load_bytecode(self, bucket):
        filename = self._get_cache_filename(bucket)

        # Don't test for existence before opening the file, since
        # doing so is racy: The file could appear after the test and
        # before the open.  Since we don't want to catch FileNotFound
        # in the event that load_bytecode raises it, we do not use
        # 'with'.
        try:
            f = open(filename, "rb")
        except FileNotFoundError:
            return
        try:
            bucket.load_bytecode(f)
        finally:
            f.close()

    def dump_bytecode(self, bucket):
        # Write the bytecode out to a temporary file, and then rename
        # the file once it is fully written, in order to avoid race
        # conditions where a reader could come along and read the file
        # when it is half-written, or multiple writers could
        # interleave writes to the same file.
        fn = self._get_cache_filename(bucket)

        # Name of the temporary file
        tempfn = None

        def clean():
            if tempfn:
                try:
                    os.remove(tempfn)
                except OSError:
                    # This can occur if another process called
                    # clear().  On Windows, this might occur for
                    # reasons beyond our control.  For example,
                    # another process such as the Windows Search
                    # indexer or an antivirus product may be holding
                    # the file open, preventing deletion.
                    pass


        # Another approach one might consider instead of
        # NamedTemporaryFile is to use open(..., 'xb') with a
        # predictable filename (that would be the same for different
        # processes).  However, there are a couple issues with that
        # approach.  One problem is that you could have an
        # interleaving where process A writes out the file and then
        # closes it.  Process B opens the file with the same name,
        # starts writing it, and then A renames the file part way
        # through B's write.  Process C comes along and reads the
        # renamed file before B's write finishes.  NamedTemporaryFile
        # uses O_CREAT along with an unpredictable filename to avoid
        # these issues.
        try:
            # FWIW, strace suggests there might be a bug in
            # NamedTemporaryFile where it uses O_RDWR instead of
            # O_WRONLY even though we have asked for mode='wb'
            with tempfile.NamedTemporaryFile(mode='wb', dir=os.path.dirname(fn),
                                             prefix=os.path.basename(fn), suffix='.tmp',
                                             delete=False) as f:
                tempfn = f.name
                bucket.write_bytecode(f)
        except:
            clean()
            raise


        try:
            # This rename operation might raise for a number of reasons:
            #  1) Another process has beaten us to the punch and
            #     created the file while we were writing our temporary
            #     file.
            #  2) On Windows, another (non-jinja-related) process
            #     might be holding either file open, blocking renames.
            #     For example, an antivirus product or search indexer
            #     process.
            #  3) Another process might clear() just before this rename.
            os.rename(tempfn, fn)
        except OSError:
            clean()
        except:
            clean()
            raise



if __name__ == '__main__':
    if test(jinja2.FileSystemBytecodeCache) == 0:
        raise Exception('jinja2.FileSystemBytecodeCache did not experience the bug')

    if test(SafeFileSystemBytecodeCache) != 0:
        raise Exception('SafeFileSystemBytecodeCache test failed')

The text was updated successfully, but these errors were encountered:

davidism · 2022-04-10T19:24:05Z

Happy to review a PR.

dvitek pushed a commit to dvitek/jinja that referenced this issue Apr 10, 2022

Fix pallets#1654 : race conditions in FileSystemBytecodeCache

61b2162

dvitek pushed a commit to dvitek/jinja that referenced this issue Apr 10, 2022

Fix pallets#1654 : race conditions in FileSystemBytecodeCache

2af85cb

dvitek pushed a commit to dvitek/jinja that referenced this issue Apr 10, 2022

Fix pallets#1654 : race conditions in FileSystemBytecodeCache

d8cd5fb

dvitek pushed a commit to dvitek/jinja that referenced this issue Apr 10, 2022

Fix race conditions in FileSystemBytecodeCache (pallets#1654)

5f8030d

dvitek mentioned this issue Apr 10, 2022

Fix Race conditions in FileSystemBytecodeCache #1655

Merged

5 tasks

davidism added this to the 3.1.2 milestone Apr 25, 2022

davidism linked a pull request Apr 25, 2022 that will close this issue

Fix Race conditions in FileSystemBytecodeCache #1655

Merged

5 tasks

davidism closed this as completed Apr 25, 2022

github-actions bot locked as resolved and limited conversation to collaborators May 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

jinja2.FileSystemBytecodeCache race conditions #1654

jinja2.FileSystemBytecodeCache race conditions #1654

dvitek commented Apr 10, 2022

davidism commented Apr 10, 2022

jinja2.FileSystemBytecodeCache race conditions #1654

jinja2.FileSystemBytecodeCache race conditions #1654

Comments

dvitek commented Apr 10, 2022

davidism commented Apr 10, 2022