
Only allow webserver to request from the worker log server #16754

Merged
merged 2 commits into apache:main from protect-logs-server Jul 1, 2021

Conversation

ashb
Member

@ashb ashb commented Jul 1, 2021

Logs shouldn't contain any sensitive info, but they often do by
mistake. As an extra level of protection, we shouldn't allow anything
other than the webserver to access the logs.

(We can't change the bind IP from 0.0.0.0 as, for it to be useful, it
needs to be accessible from different hosts -- i.e. the webserver will
almost always be on a different node.)
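The mechanism behind "only the webserver may request logs" can be sketched as a signed request: the webserver signs each log request with the shared `[webserver] secret_key`, and the worker's log server rejects anything whose signature doesn't verify. Airflow's actual implementation signs tokens via the `itsdangerous` library; the stdlib HMAC version below is only a simplified illustration of the idea, and the key and filenames are made up.

```python
# Simplified sketch of the signed-request scheme (hypothetical key/filenames).
# Airflow's real code uses `itsdangerous` with the webserver secret_key;
# here plain HMAC-SHA256 stands in for that signer.
import hashlib
import hmac

SECRET_KEY = b"shared-secret"  # [webserver] secret_key, same on all components


def sign(filename: str) -> str:
    """Webserver side: sign the log filename being requested."""
    return hmac.new(SECRET_KEY, filename.encode(), hashlib.sha256).hexdigest()


def verify(filename: str, signature: str) -> bool:
    """Worker side: recompute and compare in constant time."""
    return hmac.compare_digest(sign(filename), signature)


token = sign("dag/task/1.log")
assert verify("dag/task/1.log", token)        # webserver's request is accepted
assert not verify("other/task/1.log", token)  # token can't be reused for another file
```

Anything that doesn't hold the secret (i.e. anything other than the webserver) cannot mint a valid signature, which is why the bind address can stay 0.0.0.0.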


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@ashb ashb added this to the Airflow 2.1.2 milestone Jul 1, 2021
@ashb ashb requested review from potiuk and mik-laj July 1, 2021 14:20
Member

@potiuk potiuk left a comment


Nice!

@github-actions

github-actions bot commented Jul 1, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jul 1, 2021
@ashb
Member Author

ashb commented Jul 1, 2021

All passed bar upload coverage -- merging.

@ashb ashb merged commit 2285ee9 into apache:main Jul 1, 2021
@ashb ashb deleted the protect-logs-server branch July 1, 2021 15:20
jhtimmins pushed a commit that referenced this pull request Jul 6, 2021
Logs _shouldn't_ contain any sensitive info, but they often do by
mistake. As an extra level of protection we shouldn't allow anything
other than the webserver to access the logs.

(We can't change the bind IP from 0.0.0.0 as for it to be useful it
needs to be accessed from different hosts -- i.e. the webserver will
almost always be on a different node)

(cherry picked from commit 2285ee9)
kaxil pushed a commit to astronomer/airflow that referenced this pull request Jul 6, 2021
ashb added a commit to astronomer/airflow that referenced this pull request Jul 6, 2021
ashb added a commit to astronomer/airflow that referenced this pull request Jul 6, 2021
ashb added a commit to astronomer/airflow that referenced this pull request Jul 6, 2021
ashb added a commit to astronomer/airflow that referenced this pull request Jul 6, 2021
ashb added a commit to astronomer/airflow that referenced this pull request Jul 6, 2021
ashb added a commit that referenced this pull request Jul 6, 2021
kaxil pushed a commit to astronomer/airflow that referenced this pull request Jul 7, 2021
@ashb ashb modified the milestones: Airflow 2.1.2, Airflow 2.1.3 Jul 7, 2021
@wanderijames

wanderijames commented Jul 15, 2021

@ashb this may have broken worker log access from the web UI. A pod running the web UI cannot access logs from a worker in another pod. This can be reproduced when Airflow is deployed on Kubernetes. Getting the following:

*** Fetching from: https://worker.worker-svc.default.svc.cluster.local:8793/log/<dag>/<task>/2021-07-15T11:51:59.190528+00:00/1.log
*** Failed to fetch log file from worker. 403 Client Error: FORBIDDEN for url: https://worker.worker-svc.default.svc.cluster.local:8793/log/<dag>/<task>/2021-07-15T11:51:59.190528+00:00/1.log
For more information check: https://httpstatuses.com/403

@shankarcb

This change introduced another bug: a 403 client error when DAG logs are fetched from the Airflow web UI. The bug appears to impact Airflow when it is set up in cluster mode. We have set up Airflow in cluster mode with Celery -- one master and multiple worker nodes (not CeleryKubernetes).
The error message said the secret_key did not match, which was not the case, since identical copies of airflow.cfg were used on the master and worker nodes.

After increasing the default fallback value to 120 seconds in the code snippet below, the 403 client error went away.
We added "log_request_clock_grace" under the [webserver] section of airflow.cfg to override this value, which did not work. We had to manually change the code in the Airflow installation to fix the issue. However, this is not a recommended or permanent solution. Please fix this in future releases.

```python
def create_app():
    flask_app = Flask(__name__, static_folder=None)
    max_request_age = conf.getint('webserver', 'log_request_clock_grace', fallback=30)
```

Code: https://github.com/apache/airflow/blob/main/airflow/utils/serve_logs.py
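The `log_request_clock_grace` check described above can be sketched as follows: the request token carries the time it was issued, and the serving side rejects it once its apparent age exceeds the grace window. Clock drift between webserver and worker eats into that window, which is why drift surfaces as 403 errors. This is a hypothetical simplification of the real check in `serve_logs.py`, not Airflow's exact code.

```python
# Hypothetical sketch of a clock-grace freshness check (not Airflow's exact code).
# `issued_at` comes from the signed token; `now` is the worker's clock.

def token_is_fresh(issued_at: float, now: float, grace_seconds: int = 30) -> bool:
    """Reject tokens whose apparent age exceeds the grace window.

    Drift between the two clocks shifts `age`, so real drift of N seconds
    effectively shrinks the usable window by N seconds.
    """
    age = now - issued_at  # negative if the worker's clock runs behind
    return abs(age) <= grace_seconds


t0 = 1000.0
assert token_is_fresh(t0, t0 + 2)                       # 2 s drift: accepted
assert not token_is_fresh(t0, t0 + 45)                  # too old: rejected (403)
assert token_is_fresh(t0, t0 + 45, grace_seconds=120)   # larger grace accepts it
```

Raising the fallback from 30 to 120 seconds, as described above, widens the window and masks moderate drift without fixing its cause.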

@potiuk
Copy link
Member

potiuk commented Nov 8, 2021

This is not a real problem in Airflow but a deployment issue on your side. Your webserver and workers simply do not have their time properly synchronized. Please make sure they do (for example with NTP).

@shankarcb
Copy link

> This is not a real problem in Airflow but a deployment issue on your side. Your webserver and workers simply do not have their time properly synchronized. Please make sure they do (for example with NTP).

Initially I did think the same and made sure they are synced, which did not help. There is only a 2 second difference between the nodes.

@potiuk
Member

potiuk commented Nov 8, 2021

I actually think it's not the change you applied to the code; more likely redeploying the code or restarting the machine fixed the issue. I am almost 100% sure that you had time drift.

The fact that you see 2 seconds now means that something is wrong, and the difference will probably grow. When you have NTP synchronized, both machines should have exactly the same time, down to the millisecond.

You can check it by changing the value back to 30 -- as long as the time drift is smaller, it should still work.

@shankarcb

Airflow is installed in a separate Python virtual environment with the required Airflow Python libraries. We updated serve_logs.py in the site-packages directory of that virtual environment, which is picked up by the Airflow process, to resolve the issue. No machine reboot was performed.

All the machines are synced with the same RHEL NTP servers.

This impacted multiple deployments of Airflow; the other deployment did not have any time lag/drift.

What is the significance of the log_request_clock_grace parameter in the code? Can it be configured in airflow.cfg?
We are fine with this solution as long as we don't need to manually change the Airflow utils code after each upgrade of Airflow.

@potiuk
Member

potiuk commented Nov 8, 2021

So you likely had another issue -- possibly a not-restarted Airflow process that used a different configuration. Yes, this parameter changes the grace period, and yes, it should work if you restart Airflow. So you should look at your process/deployment, not at Airflow.

leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Mar 10, 2022
Some users seem to be using different configs for different components. This doc makes it clear on what we recommend.

Context: apache/airflow#16754 (comment)

>I'm new to Airflow and I haven't seen an explicit part within the docs that says "all configuration must be shared between airflow components". In fact, all configuration keys are prefixed with some naming scheme which led me to think that some are common, some are for webserver etc... I personally don't like to have common configuration because changing a single setting requires a change in all components (hence restart) but I get that this is intentional for Airflow. I think this can be better communicated throughout the docs.

GitOrigin-RevId: 3a263ab5b0b8b8b796d96f060e6ccd4a0e379bf3
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Mar 10, 2022
After apache/airflow#16754 -- it is important that both Webserver and Worker have the same config value for `[webserver] secret_key` or else you will see the following error:

```
*** Fetching from: https://worker.worker-svc.default.svc.cluster.local:8793/log/<dag>/<task>/2021-07-15T11:51:59.190528+00:00/1.log
*** Failed to fetch log file from worker. 403 Client Error: FORBIDDEN for url: https://worker.worker-svc.default.svc.cluster.local:8793/log/<dag>/<task>/2021-07-15T11:51:59.190528+00:00/1.log
For more information check: https://httpstatuses.com/403
```

This happens because Airflow generates a random value for the key if one isn't provided, so a different random string is generated on the webserver and on the worker. Since they don't match, the request fails with this error.

This PR creates a K8s Secret object with a key for that setting and passes it as an env var, similar to what we do with the Fernet key.

GitOrigin-RevId: 7842de0ff124dd6a2696ec82bf6423455164df5b
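The failure mode the commit above describes can be sketched directly: the worker recomputes the signature with *its* key, so a token minted with a different key never verifies, and the log request is rejected with 403. Plain HMAC stands in for Airflow's actual signer here, and the key values are made up.

```python
# Sketch of why mismatched secret_key values break log fetching
# (hypothetical keys; HMAC stands in for Airflow's actual token signer).
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> str:
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


webserver_key = b"random-key-A"  # auto-generated on the webserver
worker_key = b"random-key-B"     # independently auto-generated on the worker

token = sign(webserver_key, b"dag/task/1.log")

# Worker verification with its own key fails -> 403 FORBIDDEN
assert not hmac.compare_digest(token, sign(worker_key, b"dag/task/1.log"))

# With a single shared secret_key both sides agree and the request succeeds
assert hmac.compare_digest(token, sign(webserver_key, b"dag/task/1.log"))
```

Distributing one explicit `secret_key` to every component, as the K8s Secret in this PR does, removes the per-component random generation that causes the mismatch.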
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 4, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 4, 2022
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jul 10, 2022
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jul 10, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Aug 27, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Aug 27, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 4, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 4, 2022
aglipska pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 7, 2022
aglipska pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 7, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Dec 7, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Dec 7, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jan 27, 2023
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jan 27, 2023
Labels
area:logging full tests needed We need to run full set of tests for this PR to merge