Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify Nanny environment variable precedence #5204

Merged
merged 1 commit into from
Aug 12, 2021

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Aug 12, 2021

Using keyword arguments for the nanny environ would overwrite the entire dask.config. I don't think this is intended and this fixes it by creating a soft copy.

I added a test and documentation about the precedence of configuration options

@quasiben
Copy link
Member

Thanks @fjetter for the updates and doc changes as well

@quasiben quasiben merged commit 2298861 into dask:main Aug 12, 2021
@jrbourbeau
Copy link
Member

Thanks for the added test and documentation @fjetter. Both look totally reasonable to me.

Using keyword arguments for the nanny environ would overwrite the entire dask.config

I'm sure I'm just missing something, but I don't see how this was happening before. Could you elaborate a bit on how the soft copy helps avoid this?

@fjetter
Copy link
Member Author

fjetter commented Aug 12, 2021

@jrbourbeau This test asserts the "wrong" behaviour and was green on main before this merge.

# git checkout 0528d8dc56dd42ef0049ee8293d240acdf909bd7

@gen_cluster(nthreads=[], client=True)
async def test_environment_variable_dont_overwrite(c, s):

    with dask.config.set({"distributed.nanny.environ": {"FOO": "456"}}):

        a = Nanny(s.address, loop=s.loop, memory_limit=0, env={"FOO": "123"})
        b = Nanny(s.address, loop=s.loop, memory_limit=0)

        await asyncio.gather(a, b)
        results = await c.run(lambda: os.environ["FOO"])
        assert results == {
            a.worker_address: "123",
            b.worker_address: "123",  # OOPS
            # b.worker_address: "456", # This should be correct
        }

        # The env kwarg should *never* overwrite the dask config
        assert dask.config.get("distributed.nanny.environ") == {"FOO": "123"}  # Ouch
        await asyncio.gather(a.close(), b.close())

The soft copy helps because the value in distributed.nanny.environ is actually just a dict and is passed on by reference, i.e. the following lines would modify the dict in dask.config

self.env[k] = os.environ[k]
if env:
self.env.update(env)

by assigning a soft copy of the dict to self.env we're protected

@jrbourbeau
Copy link
Member

Ah, I see. I missed the self.env.update(env) code path. Thanks for the clarification @fjetter!

@fjetter fjetter deleted the nanny_envs_precedence branch August 12, 2021 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants