Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can dotted keys insert into already-defined [tables]? #846

Closed
marzer opened this issue Sep 4, 2021 · 16 comments · Fixed by #859
Closed

Can dotted keys insert into already-defined [tables]? #846

marzer opened this issue Sep 4, 2021 · 16 comments · Fixed by #859

Comments

@marzer
Copy link
Contributor

marzer commented Sep 4, 2021

Had a user file a bug report for the way my parser handles the following two snippets:

Snippet A

[a.b.c]
  z = 9
[a]
  b.c.t = 9

Snippet B

[a.b.c.d]
  z = 9
[a]
  b.c.d.k.t = 8

And honestly after reading and re-reading the table and dotted key sections of the spec I'm not really any closer to deciding if these are legal. Currently my implementation treats them both as being illegal.

I can understand how a case might be made for Snippet B to be legal, since it's effectively just an alternate way of expressing this:

[a.b.c.d]
  z = 9
[a.b.c.d.k]
t = 8

Which is pretty harmless. Snippet A, on the other hand, just feels wrong to me. If you can inject values into already-defined [tables] (and thus spread their actual definition all throughout the document), what then is the point of table headers?

@hukkin
Copy link
Contributor

hukkin commented Sep 5, 2021

My interpretation has been that dotted keys are free to assign values to anything that hasn't already been directly referenced by either table or AoT syntax (e.g. [table-key] or [[aot-key]]) or key syntax (e.g. some-key = ... or some.dotted.key = ...). The spec is pretty clear regarding this restriction:

As long as a key hasn't been directly defined, you may still write to it and to names within it.

I haven't found a restriction in the spec that is obvious at prohibiting either of the two snippets so I currently do allow them in Tomli.

@eksortso
Copy link
Contributor

eksortso commented Sep 5, 2021

Our intention was that a table's key/value pairs are explicitly defined in only one place in a document.

Snippet A is definitely invalid. Since there is a table [a.b.c] defined, you must define t in the same section where you defined z. Either that, or define both keys in a.b.c as dotted keys within [a].

Snippet B, to me, seems to be valid. Within [a], you are defining a subtable, a.b.c.d.k, which is a subtable that could be more suitably defined within [a.b.c.d] or within its own section [a.b.c.d.k] of the document.

That's just my own take, though. I think that, elsewhere within these pages, there was a stricter interpretation that may not have been fully expressed in toml.md. I can't recall who made that argument though.

At any rate, a clarification seems to be in order.

EDIT: Snippet B is invalid, in fact. Read @ChristianSi's comments below.

@marzer
Copy link
Contributor Author

marzer commented Sep 5, 2021

Our intention was that a table's key/value pairs are explicitly defined in only one place in a document.

This has always been my understanding, so I'm happy to see it confirmed. In fact the only reason I doubted it to begin with was the lack of an explicit counter-example in toml.md, so I guess there's an opportunity there.

Snippet B, to me, seems to be valid.

Yah, I initially rejected it out-of-hand before giving it more thought. (Potential) Legality aside, it is a very strange way to structure that data. I suspect there would be a lot of variation throughout the TOML parser-verse with this particular case (cc: @arp242).

@eksortso
Copy link
Contributor

eksortso commented Sep 5, 2021

@hukkin Let me draw your attention to the following section of the spec. In particular, read the second sentence. This is what makes Snippet A invalid.

Since tables cannot be defined more than once, redefining such tables using a [table] header is not allowed. Likewise, using dotted keys to redefine tables already defined in [table] form is not allowed. {Emph mine.} The [table] form can, however, be used to define sub-tables within tables defined via dotted keys.

Now, the third sentence permits the [table] form to define subtables of tables that were previously defined by dotted keys. I would extend that (since nothing says otherwise) to allow dotted keys in parent tables to define previously undefined subtables of their child tables previously defined by [table]. In short, the stack of tables defined by Snippet B look like this:

EDIT: My analysis was incorrect, and has been removed. See the followup by @ChristianSi below.

@ChristianSi
Copy link
Contributor

ChristianSi commented Sep 5, 2021

@marzer's parser is correct: both these snippets are illegal. Two sentences from the spec are the key here:

Dotted keys create and define a table for each key part before the last one, provided that such tables were not previously created.

Likewise, using dotted keys to redefine tables already defined in [table] form is not allowed.

So,

[a.b.c]
  z = 9
[a]
  b.c.t = 9

is illegal, since the last line defines a.b and a.b.c, but a.b.c has already been defined in the first line, leading to a violation of the rule that "tables cannot be defined more than once".

[a.b.c.d]
  z = 9
[a]
  b.c.d.k.t = 8

is illegal for exactly the same reason: The last line defines, among other tables, a.b.c.d which, however, has already been defined in the first line.

@eksortso Didn't you more or less write that part of the spec? You should know!

The deeper reason behind these rules is, of course, to prevent the "magical key injection" which could otherwise occur. Dotted keys were never meant to be usable to magically inject keys, or subtables, into tables defined elsewhere.

A rule of thumb to remember is that, among all the tables defined in block form, a dotted key must go into the table where it requires the least number of dots. (In Snippet A, that's [a.b.c], in B it's [a.b.c.d].)

@hukkin
Copy link
Contributor

hukkin commented Sep 5, 2021

@eksortso what I find confusing is that the sentence you refer to is in a chapter that describes table definitions, and dotted keys used in table definitions, not dotted key to value assignments. Below the text there is a corresponding example for each of the three sentences, and the example also only deals with normal dotted keys in table definitons, not in key-value assignment.

I also feel like an assumption is being made here, that perhaps is not written out in the spec, that adding subtables in a table does not "redefine" the table, but assigning any other value to a key does so. The table type seems special.

I'd like to follow that up with: is inline table also special?

This is invalid

[a.b.c]
  z = 9
[a]
  b.c.t = 9

but what if we assign an inline table?

[a.b.c]
  z = 9
[a]
  b.c.t = {}

We are adding a subtable, so are we then not redefining the table? Is this valid?

@ChristianSi
Copy link
Contributor

ChristianSi commented Sep 5, 2021

@hukkin No, value types don't matter. Both your examples are invalid. You can define subtables elsewhere, but only if you do it in block form. (I.e., the table name must be enclosed in brackets on a line of its own.)

@eksortso
Copy link
Contributor

eksortso commented Sep 5, 2021

Snippet B, to me, seems to be valid.

Yah, I initially rejected it out-of-hand before giving it more thought. Legality aside, it is a very strange way to structure that data. I suspect there would be a lot of variation throughout the TOML parser-verse with this particular case (cc: @arp242).

@marzer My apologies, but Snippet B actually is invalid. I need to respond to @ChristianSi about this in detail, but he's right.

TOML has always allowed [table] blocks to be arranged in any order, hence the optimistic statement:

You don't need to specify all the super-tables if you don't want to. TOML knows how to do it for you.

This can lead to some serious confusion, and dotted keys originally made the whole thing a lot more complicated. But those issues were resolved before TOML v1.0.0 was released. Sorry for my confusion.

@marzer
Copy link
Contributor Author

marzer commented Sep 5, 2021

@eksortso No skin off my nose- if both snippets are indeed illegal I don't have to write any code 🎉

@hukkin
Copy link
Contributor

hukkin commented Sep 5, 2021

Thanks @ChristianSi and @eksortso 🙇 I do understand the intended spec now (I think 😄 ). However I do still find it a bit hard to interpret TOML v1.0.0 in the intended way.

Could we

  • Add a few examples to clarify some of the edge cases

  • Maybe change this sentence in the spec:

    Dotted keys create and define a table for each key part before the last one, provided that such tables were not previously created [emph. mine].

    @ChristianSi mentioned that

    [an example] is illegal, since the last line defines a.b and a.b.c, but a.b.c has already been defined in the first line, leading to a violation of the rule that "tables cannot be defined more than once".

    Given that dotted keys only create tables not previously created, I don't think there is a violation?

@eksortso
Copy link
Contributor

eksortso commented Sep 5, 2021

@hukkin I 100% agree with the change that you mentioned.

Hang on, folks, this is going to get complicated. But it'll get simpler eventually.

As the spec language stands, @ChristianSi and I hold conflicting conceptions of how dotted keys create and define tables. My looser interpretation was that dotted keys do define their parent tables, but not if they were already created. His stricter interpretation is that dotted keys create and define the parent tables that they specify, so that they cannot re-define a previously defined subtable, or even define its sub-subtables.

I wrote the original PR that included this clause:

Since tables cannot be defined more than once, redefining such tables using a
[table] header is not allowed. Likewise, using dotted keys to redefine tables
already defined in [table] form is not allowed. The [table] form can,
however, be used to define sub-tables within tables defined via dotted keys.

But that same PR included this clause as well:

Dotted keys create and define a table for each key part before the last one,
provided that such tables were not previously created.

But I originally wrote just "create," not "create and define." I still had the looser interpretation in mind. When @ChristianSi suggested that change to add "and define," we should have removed the "provided that" clause as well, to lock in the stricter conception.

The "provided that" clause allows for the looser interpretation. Looking at Snippet B, the "[a.b.c.d]" block defines the table a.b.c.d, but none of its parents. The "[a]" block defines the table a likewise. But the dotted key "b.c.d.t" in [a] does not define a.b.c.d a second time, because a.b.c.d already exists. If the strong conception is what we want, then the current language doesn't give us that.

I now hold the same position that @ChristianSi holds, but a PR needs to set this conception in writing. We must remove that "provided that" clause. I will submit a PR later today to do so.

@pradyunsg How will this change affect the versioning of the specification? The proposed change could invalidate TOML documents that have dotted keys crossing into subtables of existing table blocks, but how likely are such constructions to exist?

@eksortso
Copy link
Contributor

eksortso commented Sep 6, 2021

The replacement language that I propose is as follows:

Dotted keys create and define a table for each key part before the last one. Any
such table must have all its key/value pairs defined under the current [table]
header, or in the root table if defined before all headers.

I mention the "current table" in the changelog, although we don't state exactly what a "current" table is, or what a table "section" is. I may recommend (if necessary) that we rearrange the spec language so that it's clear that every key/value pair belongs to one table section or one current table, whichever we think would work better. There are still whole swaths of the spec that were written before inline tables and dotted keys were introduced, so introducing such language would help to clarify table-type objects defined around the headers using the more specialized syntaxes.

Thoughts?

@marzer
Copy link
Contributor Author

marzer commented Sep 6, 2021

@eksortso

but how likely are such constructions to exist?

Anecdotally? Very unlikely. toml++ is approaching two years old and has hundreds of clones per day, and the usage that sparked this discussion is the first user report I've had relating to it. It's a super weird way to structure data. So weird that I'd not be surprised to learn it was actually generated by a fuzzer or code generator, rather than a human being.

@python36
Copy link

python36 commented Sep 6, 2021

@marzer No. I came up with this when I wrote a parser for TCL :)

@marzer
Copy link
Contributor Author

marzer commented Sep 6, 2021

@marzer No. I came up with this when I wrote a parser for TCL :)

Ah, I see, so it was deliberately designed to break stuff! i knew it :P

@ChristianSi
Copy link
Contributor

I'll continue the discussion in PR #848.

pelletier added a commit to pelletier/go-toml that referenced this issue Oct 23, 2021
The TOML spec is being clarified to say that dotted keys "define" their
intermediate tables. Therefore the seen tracker needs to verify that none of
them reference an explicit table.

Also added a missing seen expression check for key-values parsed as part of a
table section.

See toml-lang/toml#846
pelletier added a commit to pelletier/go-toml that referenced this issue Oct 23, 2021
The TOML spec is being clarified to say that dotted keys "define" their
intermediate tables. Therefore the seen tracker needs to verify that none of
them reference an explicit table.

Also added a missing seen expression check for key-values parsed as part of a
table section.

See toml-lang/toml#846
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants