Support reading additional files with references #156

lmolkova · 2023-03-09T08:16:21Z

Fixes #155

Adds --references-only flag to semconvgen which allows to specify additional sources for referenced (extended, included) semconv.

E.g. browser resource semconv references general user_agent.original attribute. When code for resources is generated, user_agent.original needs to be resolved, even though user_agent semconv is not generated.

Usage (tested on Java):

All semconv except resources

docker run --rm \
  -v ${SCRIPT_DIR}/opentelemetry-specification/semantic_conventions:/source \
  -v ${SCRIPT_DIR}/templates:/templates \
  -v ${ROOT_DIR}/semconv/src/main/java/io/opentelemetry/semconv/trace/attributes/:/output \
  otel/semconvgen:$GENERATOR_VERSION \
-  --exclude resource/** \
+  --reference-only resource/** \
  -f /source code \
 ...

New flag excludes semconvs from the output, but allows references to them.

Resources:

docker run --rm \
- -v ${SCRIPT_DIR}/opentelemetry-specification/semantic_conventions/resource/:/source \
+ -v ${SCRIPT_DIR}/opentelemetry-specification/semantic_conventions:/source \
  -v ${SCRIPT_DIR}/templates:/templates \
  -v ${ROOT_DIR}/semconv/src/main/java/io/opentelemetry/semconv/resource/attributes/:/output \
  semconvgen:$GENERATOR_VERSION \
+ --only resources \
+ --reference-only *.yaml \
  -f /source code \
....

I.e. from all semconv output resource conventions, and use yaml files from root folder semantic_conventions to resolve references.

lmolkova · 2023-03-09T08:17:46Z

@Oberon00 @arminru can you please take a look? Apparently, codegen is broken for 1.19.0 release (open-telemetry/opentelemetry-specification#3299).

semantic-conventions/src/opentelemetry/semconv/templating/code.py

Oberon00 · 2023-03-09T08:48:45Z

semantic-conventions/src/opentelemetry/semconv/main.py

+    parser.add_argument(
+        "--reference-only",
+        "-r",
+        help="Additional files to resolve references only (and extends) using GLOB syntax",


I think this feature warrants a bit more description. This suggestion is also a check if I understood this PR correctly 😃

Suggested change

help="Additional files to resolve references only (and extends) using GLOB syntax",

help="""Additional files that will be loaded and available for referencing, but will not be part of the output.

As semantic conventions can, in various ways, e.g. using `ref` or `include`, reference other conventions you

sometimes need to add files/groups to your input but you don't want the groups in there be processed by

the code generator template or markdown generator. In this case, add only the subset of files you want to

process with your template / markdown generator to `--yaml-root`, and add any additional required

(transitively depended-upon) files with `--reference-only`.

""",

Oberon00 · 2023-03-09T08:49:34Z

semantic-conventions/src/opentelemetry/semconv/main.py

@@ -223,6 +244,12 @@ def setup_parser():
    parser.add_argument(
        "--exclude", "-e", help="Exclude the matching files using GLOB syntax", type=str
    )
+    parser.add_argument(
+        "--reference-only",
+        "-r",


I wonder if we actually want a short-name for this flag.

Oberon00 · 2023-03-09T08:51:01Z

semantic-conventions/src/opentelemetry/semconv/model/semantic_convention.py

    errors: bool = False

-    def parse(self, file):
+    def parse(self, file, ref_only=False):


Maybe instead add parse_into(self, file, models)?

If you prefer keeping the boolean argument, please add * before to enforce using named arguments for it:

Suggested change

def parse(self, file, ref_only=False):

def parse(self, file, *, ref_only=False):

Oberon00 · 2023-03-09T08:51:48Z

semantic-conventions/src/tests/semconv/model/test_correct_parse.py

+        semconv = SemanticConventionSet(debug=False)
+        semconv.parse(self.load_file("yaml/http.yaml"))
+        self.assertEqual(len(semconv.models), 3)
+        semconv.parse(self.load_file("yaml/general.yaml"), True)


Suggested change

semconv.parse(self.load_file("yaml/general.yaml"), True)

semconv.parse(self.load_file("yaml/general.yaml"), ref_only=True)

etc
(or see comment before for maybe using an API that does not rely on a boolean)

tsloughter · 2023-03-09T12:05:07Z

I hit this with Go and Python so was also looking at making this patch. My 2 cents, I expected it to work without an additional argument. As in, if I give a path to where the yaml schema files are to -f and that includes all the necessary references within it then there should be no need to add additional files through a new option.

If the user doesn't want to include all the schema files in the path they give to -f there would still be --exclude.

--

And --only would still used for limiting the generated code to specific types.

Oberon00 · 2023-03-09T12:48:46Z

As in, if I give a path to where the yaml schema files are to -f and that includes all the necessary references within it

I think this is already the case. IIUC, this PR is actually about adding a way to exclude files from output generation but not from internal model construction (see #156 (comment)).

--exclude I think completely excludes the files from being loaded. --only should be applied after parsing I believe.

I wonder if this PR is maybe going into the wrong direction, as it adds more reliance on the directory structure. Maybe globbing on the group ID would be better. Or maybe --only would already solve the problem, or if not, could be adjusted a bit?

Noticed some room for improvement

Oberon00 · 2023-03-09T12:54:21Z

Actually I now looked at how --only is implemented, see https://github.com/open-telemetry/build-tools/blob/main/semantic-conventions/src/opentelemetry/semconv/main.py#L68 and called function. I think that approach would also work for this PR and would be much simpler. At the moment, the source file path of a convention is not part of the model, but it could be added, or alternatively, the approach could be changed to filter by group ID (regex/glob) instead of filepath.

If going for the ID-based filtering, this could mean that some group IDs in the semantic conventions might have to be renamed in order to improve "filterability". This would not change the actual attribute keys, as the "prefix" is independent of the group ID.

If going for adding the filepath to the model, the existing --exclude could be changed to filter after a complete parse of the YAML root. I think this would be a mostly backwards-compatible change as it would not change behavior in cases that already worked (except for making parsing a bit slower, but it's quite fast anyway) but allow additional cases where references are going into the excluded code. The only problem would be if you have any template/example/duplicate files in your YAML root that would actually break parsing

Oberon00 · 2023-03-09T13:10:36Z

Another option would be: instead of adding additional paths, you would add the actual root as --yaml-root, and you would add another option --only-paths that then restricts the output. So instead of --yaml-root some-subdirectory-I-want-to-generate-output-for --reference-only common-directory (which breaks as soon as there is one than more common directory, or would become cumbersome), you would do --yaml-root actual-toplevel-yaml-root --only-paths some-subdirectory-I-want-to-generate-output-for

I think all these options work, including the one currently in the PR, it's more of an UX decision.

marcalff · 2023-03-09T14:33:16Z

Thanks for looking into this.

I don't have specific comments about the implementation itself, and which options to pass to generate code.

From an end-user point of view, and based on my (very limited) understanding of semconv code generation, expectations are as follows:

(1) Example documentation that works, see issue Semconv code generator example does not work for opentelemetry-specifications 1.14.0 #122
(2) Code generation separate for traces / resources / metrics (when available), so that semconv per signal can go into different files, which contains only the applicable constants (for C++, trace semconv in trace.h, resource semconv in resource.h, etc), hence the use of -only span, ...
(3) A stable docker run command, that does not change every spec release.

A docker command like:

docker run --rm \
  -v ${SCRIPT_DIR}/opentelemetry-specification/semantic_conventions:/source \
  --only span \
 ...

seems to meet these expectations, assuming it can be supported.

If there is a need to:

add more -v mount options to add some *.yaml files
add more --exclude options to hide some other *.yaml files

every time the spec releases a different set of *.yaml files, due to changes in internal dependencies, this becomes more difficult to deal with, and is more subject to errors.

Changing the docker run command once this is resolved is not an issue, but having to change it all the time (3) is a concern here.

If somehow these expectations are not realistic, please clarify what we (users in languages) are supposed to do then (see (1)).

Regards.

Oberon00 · 2023-03-09T14:54:25Z

@marcalff

Thanks for your comment. I was of the opinion that the docker command you mention (with some syntax changes such as ~~moving the -v before run~~ adding the image name, but that's not the point here) should work even today (even before this PR). If not, I'd like to know what the problem is.

Based on #122 (comment), it might be enough to add support for multiple types to --only, e.g. support --only span,event to include both span+event. CC @jkwatson

lmolkova · 2023-03-09T16:50:32Z

to address @marcalff concern, I think

we need to add a sample codegen dry run to -spec repo workflow so that authors of PR should see if the change breaks codegen
we should reasonably unify how codegen is used (I see significant differences in Java and c++)

It's also the first time for me looking into codegen and here I'd really like to reasonably unblock ourselves for 1.19.0 release and take steps to prevent it. I wonder if these steps are good enough.

Regarding reusing current functionality without a new flag and allowing multiple --only specs:
I don't think it's a good idea to allow referring to metrics attributes from spans and vice versa. We can solve it in several ways:

explicit inclusion of files I do here gives the most control
hardcoding what's allowed in build-tools (i.e. traces can refer to resource and attribute_group attributes, but not metric ones)

The opt2 is somewhat controversial and I'm not ready to drive it forward.

Oberon00 · 2023-03-09T17:01:24Z

Regarding reusing current functionality without a new flag and allowing multiple --only specs: I don't think it's a good idea to allow referring to metrics attributes from spans and vice versa.

I think that has very little to do with the issue here. I think this should be either enforced by the tool (option 2) or by spec reviewers. Option 1 would be "enforcement by difficult UX".

In general, I believe adding reliance on the directory structure instead of explicit group kinds/types is going into the wrong direction.

marcalff · 2023-03-09T17:07:06Z

Thanks @lmolkova

to address @marcalff concern, I think

* we need to add a sample codegen dry run to -spec repo workflow so that authors of PR should see if the change breaks codegen

* we should reasonably unify how codegen is used (I see significant differences in Java and c++)

+1 on both points.

lmolkova · 2023-03-09T17:17:49Z

I think that has very little to do with the issue here. I think this should be either enforced by the tool (option 2) or by spec reviewers.

makes sense. Options 2 can happen any time in the future as well and not blocking.

In general, I believe adding reliance on the directory structure instead of explicit group kinds/types is going into the wrong direction.

ok, so I can do the following in this PR instead of reference-only:

allow multiple values for --only
maybe print a warning if attributes from metrics are reused on traces and vice versa

sounds good @Oberon00 ?

lmolkova · 2023-03-09T17:31:30Z

Thanks for your comment. I was of the opinion that the docker command you mention (with some syntax changes such as moving the -v before run adding the image name, but that's not the point here) should work even today (even before this PR). If not, I'd like to know what the problem is.

great point that only should already work (it did, but didn't produce a lot of stuff)! It should with --only span,attribute_group,event once I implement it here.

tsloughter · 2023-03-09T17:46:31Z

At the moment, the source file path of a convention is not part of the model

Right, I think that is what I was trying to suggest.

I'm trying to catch up. I don't think having multiple values to --only will work because of how the script is used to generate a specific file. Like in Go:

$(SEMCONVGEN) -i "$(OTEL_SPEC_REPO)/semantic_conventions/." --only=span -p conventionType=trace -f trace.go -t "$(SEMCONVPKG)/template.j2" -s "$(TAG)"

And Python:

docker run --rm \
  -v ${SCRIPT_DIR}/opentelemetry-specification/semantic_conventions/trace:/source \
  -v ${SCRIPT_DIR}/templates:/templates \
  -v ${ROOT_DIR}/opentelemetry-semantic-conventions/src/opentelemetry/semconv/trace/:/output \
  otel/semconvgen:$OTEL_SEMCONV_GEN_IMG_VERSION \
  -f /source code \
  --template /templates/semantic_attributes.j2 \
  --output /output/__init__.py \
  -Dclass=SpanAttributes

lmolkova · 2023-03-09T17:55:30Z

I don't think having multiple values to --only will work because of how the script is used to generate a specific file. Like in Go:

we'll have to update generate scripts anyway and the common pattern can be:

-v ${SCRIPT_DIR}/opentelemetry-specification/semantic_conventions:/source \
--only  span,attribute_group,event

this will preserve current attribute locations in files.

an alternative could be to run script per semconv type and move attributes into individual files

-v ${SCRIPT_DIR}/opentelemetry-specification/semantic_conventions:/source \
--only  span

-v ${SCRIPT_DIR}/opentelemetry-specification/semantic_conventions:/source \
--only  attribute_group

-v ${SCRIPT_DIR}/opentelemetry-specification/semantic_conventions:/source \
--only  event

lmolkova · 2023-03-10T05:09:12Z

Did multiple --only here - #157, tried on Java, works just fine and much simpler.

Oberon00 · 2023-03-10T09:47:52Z

Should attribute_group even be required to be mentioned? I think for referencing, you don't need to include it in --only

lmolkova · 2023-03-10T19:08:01Z

Should attribute_group even be required to be mentioned? I think for referencing, you don't need to include it in --only

good point. thought about it and I think it's totally valid to generate md or template for an attribute group.
E.g. span-general attributes are an attribute_group.

lmolkova · 2023-03-13T19:17:45Z

closing in favor of #157

Support reading additional files with references

df4e77e

lmolkova requested review from a team and Oberon00 as code owners March 9, 2023 08:16

Oberon00 reviewed Mar 9, 2023

View reviewed changes

semantic-conventions/src/opentelemetry/semconv/templating/code.py Outdated Show resolved Hide resolved

lint

018e874

lmolkova force-pushed the fix-codegen-119 branch from 713e66b to 018e874 Compare March 9, 2023 08:34

Oberon00 previously approved these changes Mar 9, 2023

View reviewed changes

lmolkova mentioned this pull request Mar 10, 2023

Allow multiple semconv in --only flag #157

Merged

lmolkova closed this Mar 13, 2023

marcalff mentioned this pull request Mar 22, 2023

Code generation fails for SEMCONV 1.19.0 open-telemetry/opentelemetry-specification#3299

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support reading additional files with references #156

Support reading additional files with references #156

lmolkova commented Mar 9, 2023

lmolkova commented Mar 9, 2023 •

edited

Oberon00 Mar 9, 2023

Oberon00 Mar 9, 2023

Oberon00 Mar 9, 2023

Oberon00 Mar 9, 2023

tsloughter commented Mar 9, 2023 •

edited

Oberon00 commented Mar 9, 2023

Oberon00 commented Mar 9, 2023 •

edited

Oberon00 commented Mar 9, 2023 •

edited

marcalff commented Mar 9, 2023

Oberon00 commented Mar 9, 2023 •

edited

lmolkova commented Mar 9, 2023 •

edited

Oberon00 commented Mar 9, 2023 •

edited

marcalff commented Mar 9, 2023

lmolkova commented Mar 9, 2023

lmolkova commented Mar 9, 2023 •

edited

tsloughter commented Mar 9, 2023

lmolkova commented Mar 9, 2023

lmolkova commented Mar 10, 2023

Oberon00 commented Mar 10, 2023

lmolkova commented Mar 10, 2023

lmolkova commented Mar 13, 2023

-        help="Additional files to resolve references only (and extends) using GLOB syntax",
+        help="""Additional files that will be loaded and available for referencing, but will not be part of the output.
+        As semantic conventions can, in various ways, e.g. using `ref` or `include`, reference other conventions you
+        sometimes need to add files/groups to your input but you don't want the groups in there be processed by
+        the code generator template or markdown generator. In this case, add only the subset of files you want to
+        process with your template / markdown generator to `--yaml-root`, and add any additional required
+        (transitively depended-upon) files with `--reference-only`.
+        """,

	def parse(self, file, ref_only=False):
	def parse(self, file, *, ref_only=False):

	semconv.parse(self.load_file("yaml/general.yaml"), True)
	semconv.parse(self.load_file("yaml/general.yaml"), ref_only=True)

Support reading additional files with references #156

Support reading additional files with references #156

Conversation

lmolkova commented Mar 9, 2023

lmolkova commented Mar 9, 2023 • edited

Oberon00 Mar 9, 2023

Choose a reason for hiding this comment

Oberon00 Mar 9, 2023

Choose a reason for hiding this comment

Oberon00 Mar 9, 2023

Choose a reason for hiding this comment

Oberon00 Mar 9, 2023

Choose a reason for hiding this comment

tsloughter commented Mar 9, 2023 • edited

Oberon00 commented Mar 9, 2023

Oberon00 commented Mar 9, 2023 • edited

Oberon00 commented Mar 9, 2023 • edited

marcalff commented Mar 9, 2023

Oberon00 commented Mar 9, 2023 • edited

lmolkova commented Mar 9, 2023 • edited

Oberon00 commented Mar 9, 2023 • edited

marcalff commented Mar 9, 2023

lmolkova commented Mar 9, 2023

lmolkova commented Mar 9, 2023 • edited

tsloughter commented Mar 9, 2023

lmolkova commented Mar 9, 2023

lmolkova commented Mar 10, 2023

Oberon00 commented Mar 10, 2023

lmolkova commented Mar 10, 2023

lmolkova commented Mar 13, 2023

lmolkova commented Mar 9, 2023 •

edited

tsloughter commented Mar 9, 2023 •

edited

Oberon00 commented Mar 9, 2023 •

edited

Oberon00 commented Mar 9, 2023 •

edited

Oberon00 commented Mar 9, 2023 •

edited

lmolkova commented Mar 9, 2023 •

edited

Oberon00 commented Mar 9, 2023 •

edited

lmolkova commented Mar 9, 2023 •

edited