Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support prepend SQL comment for PG instrumentation #690

Merged
merged 6 commits into from
May 10, 2024

Conversation

keiko713
Copy link
Contributor

Currently, extract_operation method is simply extracting the first word before the space as an operation (as commented, this was borrowed from opentelemetry-js-contrib's pg instrument).

This works okay for most of queries, however, when Active Record Query Logs (known as marginalia) is used with the option of ActiveRecord::QueryLogs.prepend_comment = true, the query will start with a query comment and this extracting logic fails, making a span name as simply database name and leaving the operation empty.

This PR adds a simple logic to ignore any prepend query comment (with using /* comment */ way, not supporting -- comment type comment).
As Active Record Query Logs became part of Rails, we may see more and more folks use this likely with prepend_comment = true option, so it would be nice to support this with some minimal change.

Fixes #452

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Oct 12, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@keiko713
Copy link
Contributor Author

I'm reading the specification for what db.operation should be, and wondering if this regex could be considered as "client-side parsing".

https://opentelemetry.io/docs/specs/otel/trace/semantic_conventions/database/#call-level-attributes

When setting this to an SQL keyword, it is not recommended to attempt any client-side parsing of db.statement just to get this property, but it should be set if the operation name is provided by the library being instrumented. If the SQL statement has an ambiguous operation, or performs more than one operation, this value may be omitted.

Given that it's a small simple one, I hope that it's okay to tweak like this (this really helps readability of the tracing so I'd love to have this personally), but I understand if that's not the case. I'd appreciate any inputs!

@arielvalentin
Copy link
Collaborator

Thanks for your contribution.

We've got work already underway to address this problem in a separate PR #529

Would you be interested in helping us do some exploratory testing there?

@keiko713
Copy link
Contributor Author

Ariel, thank you for sharing that PR. I wasn't aware of it, and I'm excited to see that there is already a PR for this! I'm more than happy to assist in any way I can.

It appears that this PR is quite big and has a lot of history. You mentioned the need for some exploratory testing, but could you please provide more details about how I can help for this? Thanks!

@fbogsany
Copy link
Contributor

Hi @keiko713. The main outstanding issue with that PR is exactly what you've addressed here. See #529 (comment). I'd really appreciate it if you could port your fix to that PR. 🙏

@keiko713
Copy link
Contributor Author

Thanks for the pointer, @fbogsany ! I took a look that PR deeper and now have several questions but let me see if I can keep it short.

First, let's see if I understand that PR properly:

  • The main motivation of that PR is to move shared sql behavior to helper gems (like the title said), mainly with the obfuscation feature. In that PR, it's extracted as opentelemetry-helpers-sql-obfuscation and both MySQL and Postgres (potentially more moving forward?) are using it to obfuscate SQL statements db.statement
    • This is a good refactoring, though I think this part is unrelated to my PR
  • In addition to the opentelemetry-helpers-sql-obfuscation helper gem, that PR also adds opentelemetry-helpers-mysql, which is used by two MySQL clients: mysql2 and trilogy
    • This helper contains the method somewhat related to my PR, OpenTelemetry::Helpers::MySQL.extract_statement_type
    • This extract_statement_type method takes a query and returns the statement type such as select or update
      • @fbogsany has feedback for the logic of this method (more accurately, the regexp the method is using) as described in this and this PR comments

Now, coming back to my PR and the current implementation of Postgres client pg instrumentation:

  • With pg instrumentation, currently it's using sql.to_s.split[0].to_s.upcase to get a statement type (aka db.operation)
    • This is different from what MySQL is doing, where searching a query against the list of "query names"
    • Postgres also has the list of query names-ish, SQL_COMMANDS, and the statement type (operation) will only be used as a part of span name when it's in SQL_COMMANDS (otherwise ignored)

From here, my questions would be:

I'm sorry for many questions and a long comment! 🙈

Copy link
Contributor

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

@github-actions github-actions bot added the stale Marks an issue/PR stale label Nov 18, 2023
@kaylareopelle
Copy link
Contributor

Hi @keiko713, I'm so sorry about the delay in responding to this PR. Thanks for your patience, your great questions, and this awesome fix.

Though the work is thematically similar to #529, as you correctly stated, the OpenTelemetry::Helpers::MySQL.extract_statement_type only applies to MySQL adapters. In my opinion, this PR should not be blocked by #529 and can be merged straightaway.

The MySQL helpers gem created in 529 is intended to reduce duplication between the trilogy and mysql2 instrumentation. Since we only instrument one Postgres gem, pg, we don't need to abstract this code out into a separate gem, nor should we try to combine it with the MySQL helpers gem.

I've also added functionality to #529 to make MySQL statement extraction respect Active Record Query Logs, so both MySQL and Postgres instrumentation via OpenTelemetry should now be compatible with this feature.

@simi
Copy link
Contributor

simi commented Nov 20, 2023

I don't really understand this need. Why would you enable this comment in app to remove it later during instrumentation? 🤔

@github-actions github-actions bot removed the stale Marks an issue/PR stale label Nov 21, 2023
Copy link
Contributor

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

@github-actions github-actions bot added the stale Marks an issue/PR stale label Dec 22, 2023
@keiko713 keiko713 requested a review from a team as a code owner January 2, 2024 20:12
@kaylareopelle
Copy link
Contributor

I don't really understand this need. Why would you enable this comment in app to remove it later during instrumentation? 🤔

@simi, the way I understand it, the comment isn't exactly being removed by the instrumentation.

Instead, the comment is ignored when trying to identify what statement type is in the query (ex. SELECT). If the comment is not ignored, the wrong statement type will be chosen, making the span name less precise.

The prepended comment will still be included in the statement if the user configured it to be included by using the default :db_statement or configuring :db_statement to :include.

@github-actions github-actions bot removed the stale Marks an issue/PR stale label Jan 3, 2024
@simi
Copy link
Contributor

simi commented Jan 11, 2024

@kaylareopelle ahh, thanks for explanation. Now it makes sense. I'm ok to move this forward. Once there will be universal SQL helpers gem, this can be moved in there.

@@ -101,7 +101,9 @@ def span_attrs(kind, *args)

def extract_operation(sql)
# From: https://github.com/open-telemetry/opentelemetry-js-contrib/blob/9244a08a8d014afe26b82b91cf86e407c2599d73/plugins/node/opentelemetry-instrumentation-pg/src/utils.ts#L35
sql.to_s.split[0].to_s.upcase
# Ignores prepend comment
comment_regex = %r{\A\/\*.*?\*\/}m
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to move this into constant?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur with @simi

@keiko713 Please extract this regex into a constant to avoid allocating it for every query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL Ruby freezes Regexes like it freezes strings.

Running a file named foo.rb with the following ruby code:

# frozen_string_literal: true

def foo
  %r{\A\/\*.*?\*\/}m.object_id
end

def bar
  'hello'.object_id
end

puts foo
puts foo
puts foo

puts bar
puts bar
puts bar

has this result:

➜  ~ ruby foo.rb
60
60
60
80
80
80

So I'm going to merge this in without moving it to a constant.

sql.to_s.split[0].to_s.upcase
# Ignores prepend comment
comment_regex = %r{\A\/\*.*?\*\/}m
sql.to_s.sub(comment_regex, '').split[0].to_s.upcase
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm, we are only interested in removing the first set of long form comments (without newlines) like those generated in SQL Commenter format.

We are also not concerned about inline comments -- this is a comment

Is that correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my understanding since the goal of this method is to grab the operation (ex. SELECT), so any comments after the operation won't apply. I'm not sure what would happen if there were two comments back-to-back before the operation, but that seems pretty unlikely.

Copy link
Contributor

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

@github-actions github-actions bot added the stale Marks an issue/PR stale label Mar 10, 2024
@kaylareopelle
Copy link
Contributor

@keiko713 - I think the only thing holding this PR back is to move the comment_regex to a constant.

Once that's fixed, we should be clear to merge. Thanks for your contribution and for your patience with the review process!

@github-actions github-actions bot removed the stale Marks an issue/PR stale label Mar 13, 2024
Copy link
Contributor

github-actions bot commented May 2, 2024

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

@github-actions github-actions bot added the stale Marks an issue/PR stale label May 2, 2024
@kaylareopelle kaylareopelle merged commit 081a3b0 into open-telemetry:main May 10, 2024
52 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Marks an issue/PR stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PG name incorrect when prefixing annotations
5 participants