Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Disable storing of task arguments #142

Closed
davocarli opened this issue Apr 13, 2020 · 16 comments
Closed

Question: Disable storing of task arguments #142

davocarli opened this issue Apr 13, 2020 · 16 comments

Comments

@davocarli
Copy link

davocarli commented Apr 13, 2020

Hi, apologies if this is documented somewhere (I can't find it), but is there a quick way not to store the arguments that were passed to the celery task? Some of my tasks take in potentially sensitive information, and I would prefer not to have a permanent record of them inside of my database (I'm also looking at encrypting this information, but would prefer not to store it regardless).

@carnil
Copy link

carnil commented Aug 12, 2020

It appears that this issue is associated with the CVE-2020-17495 CVE id.

@MartinFalatic
Copy link

Furthermore this is now being flagged by Pyup Safety (https://github.com/pyupio/safety-db):

╞══════════════════════════════════════════════════════════════════════════════╡
│ REPORT                                                                       │
│ checked 178 packages, using pyup.io's DB                                     │
╞════════════════════════════╤═══════════╤══════════════════════════╤══════════╡
│ package                    │ installed │ affected                 │ ID       │
╞════════════════════════════╧═══════════╧══════════════════════════╧══════════╡
│ django-celery-results      │ 1.2.1     │ <=1.2.1                  │ 38678    │
╞══════════════════════════════════════════════════════════════════════════════╡
│ Django-celery-results through 1.2.1 stores task results in the database.     │
│ Among the data it stores are the variables passed into the tasks. The        │
│ variables may contain sensitive cleartext information that does not belong   │
│ unencrypted in the database. See CVE-2020-17495.                             │
╘══════════════════════════════════════════════════════════════════════════════╛

@caleb15
Copy link

caleb15 commented Apr 16, 2021

@auvipy @thedrow is this fixed? If do you know if this will be fixed in the next version?

@auvipy
Copy link
Member

auvipy commented Oct 11, 2021

can you try the latest versions?

@auvipy auvipy closed this as completed Oct 11, 2021
@peterfarrell
Copy link
Contributor

Just to close the loop on this for those coming to this ticket, use the argsrepr and kwargsrepr support in Celery:

#154 (comment)

Django Celery Results uses these values when store results in the DB.

@auvipy auvipy pinned this issue Feb 23, 2022
@G-Rath
Copy link

G-Rath commented Mar 22, 2022

@auvipy I understand where people are coming from in terms of this ultimately being a thing for consumers to decide, but I think it's a reasonable ask for the default behaviour to be "more secure" meaning the "fix" for this CVE would be what's outlined in this comment.

Ultimately what I'm seeking is to be able to stick a patched version on the CVE as otherwise we have to handle this anytime we want to use this package in our applications.

I'm happy to help out with this where I can, though I'm not a Python developer so might require some help myself 😅

@auvipy
Copy link
Member

auvipy commented Mar 23, 2022

current version is not affected by CVE, version belo 1.2 were effected :)

@auvipy
Copy link
Member

auvipy commented Mar 23, 2022

and #154 (comment)

@G-Rath
Copy link

G-Rath commented Mar 23, 2022

@auvipy the comment we've both linked to says that the setting currently isn't respected - are you saying that's now been changed?

Would you be able help me find the commit(s) that fixed this CVE?

@G-Rath
Copy link

G-Rath commented Mar 23, 2022

@auvipy based on this comment, this CVE isn't fixed, it's just not accurately reflected in the safetydb.

The reason this is an issue for us is that it is present in the databases osv-detector pulls from, and it is a genuine security issue to have this enabled by default.

@auvipy
Copy link
Member

auvipy commented Mar 23, 2022

I am OK with any help

@G-Rath
Copy link

G-Rath commented Mar 23, 2022

Ill see what I can do, but as I said I'm very much not a Python Dev by trade so will likely need help.

It could be good to reopen this issue incase other Python developers are interested in helping.

@peterfarrell
Copy link
Contributor

@G-Rath You can set a repr on what is stored by celery results by setting the values in kwargsrepr. What you see in the celery results for the password is **** not the hashed_password.

app.send_task(
    "user.tasks.update_password_in_external_service",
     kwargs={"user_id": self.id, "password": hashed_password},
     kwargsrepr=repr({"user_id": self.id, "password": "****"}),
)

@G-Rath
Copy link

G-Rath commented Mar 23, 2022

@peterfarrell that's good to know, but not really relevant here since it has to be applied manually to known arguments so doesn't resolve the general security issue.


Having a brief look over code, it seems that celery handles this by using a TaskExtended class when result_extended is true, which captures name, rgs, kwargs, worker, retries, and queue (source)

At this point I assume these are the only "sensitive" fields being stored (well, really args and kwargs) and that they're not needed by this library in anyway (i.e. they're being stored purely for logging/debugging purposes), as otherwise kwargsrepr wouldn't be usable.

Looking over the code in the DatabaseBackend, it seems args and kwargs are set here:

# Get input arguments
if getattr(request, 'argsrepr', None) is not None:
# task protocol 2
task_args = request.argsrepr
else:
# task protocol 1
task_args = getattr(request, 'args', None)
if getattr(request, 'kwargsrepr', None) is not None:
# task protocol 2
task_kwargs = request.kwargsrepr
else:
# task protocol 1
task_kwargs = getattr(request, 'kwargs', None)
# Encode input arguments
if task_args is not None:
_, _, task_args = self.encode_content(task_args)
if task_kwargs is not None:
_, _, task_kwargs = self.encode_content(task_kwargs)

So hopefully that should mean fixing this should be a matter of having those assignments only occur if result_extended is True - otherwise just set them to None.

The next thing to figure out is how to know what result_extended is set to within that method - I'm hoping it'll be exposed somewhere on self, or maybe in __init__ 🤔

@peterfarrell
Copy link
Contributor

If the keys with sensitive data are not known, it wouldn't be hard to mask all values:

kwargs = {
    "random_a": "my_password",
    "random_b": "000-00-0000",  # social security number
}

app.send_task(
    "user.tasks.update_password_in_external_service",
     kwargs=kwargs,
    # Dynamically build masked dictionary
     kwargsrepr=repr({ k: "****" for k, v in kwargs.items() }),
)

@G-Rath
Copy link

G-Rath commented Jul 9, 2022

I think this has been resolved by #142 since the data is not stored by default anymore 🎉

I'll get the GHSA updated shortly - it's also probably worth making sure the documentation mentions this for users that are considering enabling the option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants