Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add special exception for "host field is not hashable" #23136

Closed
wants to merge 2 commits into from

Conversation

NiklasBeierl
Copy link

I am adding a special exception to the es_task_handler for cases in which the "hosts" field in an elastic record is not hashable. (i.e. a dict) It seems to be a common problem, but its not really a problem with airflow itself, rather "incomaptible defaults" of airflow and filebeat. See my comment here: #15613 (comment)

I considered making _group_logs_by_host "smart" and let it handle "common" host objects. (Like the ones add_host_metadata or filebeat itself are producing)
But it struck me as risky in case someone does something else with the host field. Making an assumption like taking the host.hostname if it is available might have unintended side-effects if a worker node changes hostname / ip / Mac. Turning the entire host into a string might also cause issues if the host object happens to contain volatile values like "uptime" or "load".

closes: #15613

@boring-cyborg
Copy link

boring-cyborg bot commented Apr 20, 2022

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

if not isinstance(key, Hashable):
raise ValueError("The host field in all log records needs to be hashable. "
"If you are using filebeat, read here: "
"https://github.com/apache/airflow/issues/15613#issuecomment-1104487752") from e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of linking to GitHub, we should have documentation for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually better solution will be to copy the explanation to our ElasticSearch documentation (at airflow.apache.org) and link from it to there. The error message should explain the reason and link to the detailed discussion/explanation why - but linking to an issue is only fine only in a source comment, rather than in a user message. Theree we should only link to a documentation we control.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get your point, just got set up with breeze to write some proper documentation.

I have a question: airflow.providers.elasticsearch.log.es_task_handler.ElasticsearchTaskHandler has offset_field and host_field paramaters in its constructor. I have a hard time figuring out where these are being set / come from. Are they configurable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea about details of it to be hones. - I would have to - similarly to you dive deep in the code to understand it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually - I know that - (Re-read this) - I believe each logging handler can be configured with parameters - you can read it in "logging" configugration in our docs./

@eladkal
Copy link
Contributor

eladkal commented May 29, 2022

@NiklasBeierl can you fix the static checks?

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jul 27, 2022
@github-actions github-actions bot closed this Aug 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logging area:providers stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Airflow2.0.2 --- TypeError: unhashable type: 'AttrDict' while trying to read logs from elasticsearch
4 participants