Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support remote logging in elasticsearch with filebeat 7 #14625

Merged
merged 4 commits into from Jun 11, 2021
Merged

Support remote logging in elasticsearch with filebeat 7 #14625

merged 4 commits into from Jun 11, 2021

Conversation

jedcunningham
Copy link
Member

closes: #13755

This adds support to choose different fields for host and offset. Filebeat 7 changed the defaults, and while one could probably move them back, making it configurable makes more sense.

https://www.elastic.co/guide/en/beats/libbeat/current/breaking-changes-7.0.html

offset -> log.offset
host.name -> host.hostname (And based on #13755, something also uses a str host, not sure what though)

This still needs test coverage.

@mik-laj
Copy link
Member

mik-laj commented Mar 5, 2021

I think this is a very internal details and the user should not configure it explicitly. If possible, we should check which version of the service we are connecting to and act accordingly. When this is not possible, we recommend introducing a more descriptive configuration option, e.g. use_filebeat_7_headers or something similar.

Also, we should update the documentation and describe these configuration options.
http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/apache-airflow-providers-elasticsearch/latest/logging.html

@jedcunningham
Copy link
Member Author

If anything I think we may want to go the other way and make it more configurable as filebeats is just one way to get logs into elasticsearch. I feel like we should try and be agnostic especially since Airflow isn't directly involved with getting the logs there in the first place.

Thanks for the feedback though @mik-laj, I'll chew on this a bit more.

@kaxil kaxil added this to the Airflow 2.1 milestone Mar 18, 2021
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM -- I think given there are so many ways of getting logs in to ES (versions of filebeat, logstash, graylog, vector.dev etc) that we should improve the docs and say what we required, and leave it flexible.

@TRManderson
Copy link

Code LGTM too, seems like a nice minimal way to get exactly what's needed for filebeat 7

@ashb ashb modified the milestones: Airflow 2.1, Airflow 2.1.1 May 18, 2021
@jedcunningham jedcunningham marked this pull request as ready for review June 9, 2021 18:15
@jedcunningham jedcunningham requested a review from kaxil June 9, 2021 18:15
@github-actions
Copy link

github-actions bot commented Jun 9, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jun 9, 2021
Filebeat 7 renamed some fields (offset->log.offset and host->host.name),
so allow the field names Airflow uses to be configured.

Airflow isn't directly involved with getting the logs _to_
elasticsearch, so we should allow easy configuration to accomodate
whatever tools are used in that process.
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
@kaxil kaxil merged commit 5cd0bf7 into apache:main Jun 11, 2021
ashb pushed a commit that referenced this pull request Jun 22, 2021
Filebeat 7 renamed some fields (offset->log.offset and host->host.name),
so allow the field names Airflow uses to be configured.

Airflow isn't directly involved with getting the logs _to_
elasticsearch, so we should allow easy configuration to accomodate
whatever tools are used in that process.

(cherry picked from commit 5cd0bf7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logging full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Elasticsearch log retrieval fails when "host" field is not a string
5 participants