Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support round tripping with comments #213

Open
rogpeppe opened this issue Feb 8, 2018 · 3 comments · May be fixed by #328
Open

support round tripping with comments #213

rogpeppe opened this issue Feb 8, 2018 · 3 comments · May be fixed by #328

Comments

@rogpeppe
Copy link

rogpeppe commented Feb 8, 2018

This is related to #75 but the solution might look quite different.

TOML is being used for the dep tool, and a major obstacle to making that tool easier to use is the fact that the Gopkg.toml file cannot be programmatically manipulated without loss of user-edited content.

If the toml package supported round-tripping the TOML while retaining comments, I believe that it would help a great deal.

I'm afraid I'm sure how the API would look though. Some degree of loss is probably inevitable, but it may be possible to a job that's good enough for all practical cases.

@BurntSushi
Copy link
Owner

BurntSushi commented Feb 8, 2018

I don't really disagree with the feature request---it does sound quite useful---but it is a significant request IMO. It is definitely not something I have the bandwidth to do myself or even mentor. Here are a couple choices from my perspective:

  1. Perhaps I am wrong about this being significant. If there is a simple way to augment this library with this support, then I'd be willing to try and merge it. It would need to be well tested.
  2. If it does require a significant change, then I suspect someone else will need to step up as maintainer of this library, or just fork it. If the latter, depending on how that goes, I'd be willing to deprecate this project and point folks toward the fork.

@rogpeppe
Copy link
Author

rogpeppe commented Feb 8, 2018

I'm aware that it is potentially quite a lot of work. Just raising the issue so someone might get the idea that they want to do it :-)
I may have a look at some point if enthusiasm outweighs rational queue priority... I'd be very happy contributing any changes to this repo assuming you'd be happy with them.

arp242 added a commit that referenced this issue Nov 16, 2021
This allows encoding comments and setting some flags to control the
format. While toml.Marshaler added in #327 gives full control over how
you want to format something, I don't think it's especially
user-friendly to tell everyone to create a new type with all the
appropriate formatting, escaping, etc. The vast majority of of use cases
probably just call in to a few simple categories such as "use `"""` or
"encode this number as a hex".

I grouped both features together as they're closely related.

What I want is something that:

1. works well with other encoders/decoders;
2. allows setting attributes programmatically;
3. supports round-tripping by default on a standard struct;
4. has a reasonable uncumbersome API.

Most options (custom types, struct tags) fail at least one of these.

---

This adds two new methods to Encoder: the Comment() method just sets a
comment for a key, and the FormatAs() method sets the format. A simple
example:

	err := toml.NewEncoder(os.Stdout).
		FormatAs("multi_line", toml.AsMultilineString).
		FormatAs("single_line_raw", toml.AsLiteralString).
		Comment("comment_me", "Well, hello there!").
		Encode(someStruct)

This way, pretty much any flag can be added programmatically without
getting in the way of JSON/YAML/whatnot encoding/decoding.

The idea behind the naming here is that you can have one `As*` hint,
with one or more `With*` hints.

---

I don't especially care how you need to pass the keys as strings, but
there isn't really any good way to do it otherwise.

I'm not necessarily opposed to also adding struct tags for most of these
things, although I'm not a huge fan of them. Since struct tags can't be
set programmatically it's not really suitable for many use cases (e.g.
setting comments dynamically, using multiline strings only if the string
contains newlines, etc.) It's something that could maybe be added in a
future PR, if a lot of people ask for it.

Not entirely sold on the API; you have a fairly large list of variables
you can add to FormatAs() and many are invalid (e.g. they apply only for
some types). Maybe it's better to split it out to FormatString(),
FormatNumber(), etc. and add `type formatString`, `type formatNumber`?

Another thing not covered here is proper round-tripping; e.g. decoding a
TOML file and encoding it again will make it lose the comments. We
already have a `MetaData` object when decoding, so it's probably better
to add all the information there, and allow passing that to the encoder
(or something?) Need to think about comments especially, since not every
comment is necessarily associated with a key

Fixes #64
Fixes #75
Fixes #160
Fixes #192
Fixes #213
Fixes #269
arp242 added a commit that referenced this issue Nov 19, 2021
This allows encoding comments and setting some flags to control the
format. While toml.Marshaler added in #327 gives full control over how
you want to format something, I don't think it's especially
user-friendly to tell everyone to create a new type with all the
appropriate formatting, escaping, etc. The vast majority of of use cases
probably just call in to a few simple categories such as "use `"""` or
"encode this number as a hex".

I grouped both features together as they're closely related.

What I want is something that:

1. works well with other encoders/decoders;
2. allows setting attributes programmatically;
3. supports round-tripping by default on a standard struct;
4. has a reasonable uncumbersome API.

Most options (custom types, struct tags) fail at least one of these.

---

This adds two new methods to Encoder: the Comment() method just sets a
comment for a key, and the FormatAs() method sets the format. A simple
example:

	err := toml.NewEncoder(os.Stdout).
		FormatAs("multi_line", toml.AsMultilineString).
		FormatAs("single_line_raw", toml.AsLiteralString).
		Comment("comment_me", "Well, hello there!").
		Encode(someStruct)

This way, pretty much any flag can be added programmatically without
getting in the way of JSON/YAML/whatnot encoding/decoding.

The idea behind the naming here is that you can have one `As*` hint,
with one or more `With*` hints.

---

I don't especially care how you need to pass the keys as strings, but
there isn't really any good way to do it otherwise.

I'm not necessarily opposed to also adding struct tags for most of these
things, although I'm not a huge fan of them. Since struct tags can't be
set programmatically it's not really suitable for many use cases (e.g.
setting comments dynamically, using multiline strings only if the string
contains newlines, etc.) It's something that could maybe be added in a
future PR, if a lot of people ask for it.

Not entirely sold on the API; you have a fairly large list of variables
you can add to FormatAs() and many are invalid (e.g. they apply only for
some types). Maybe it's better to split it out to FormatString(),
FormatNumber(), etc. and add `type formatString`, `type formatNumber`?

Another thing not covered here is proper round-tripping; e.g. decoding a
TOML file and encoding it again will make it lose the comments. We
already have a `MetaData` object when decoding, so it's probably better
to add all the information there, and allow passing that to the encoder
(or something?) Need to think about comments especially, since not every
comment is necessarily associated with a key

Fixes #64
Fixes #75
Fixes #160
Fixes #192
Fixes #213
Fixes #269
arp242 added a commit that referenced this issue Nov 24, 2021
This allows encoding comments and setting some flags to control the
format. While toml.Marshaler added in #327 gives full control over how
you want to format something, I don't think it's especially
user-friendly to tell everyone to create a new type with all the
appropriate formatting, escaping, etc. The vast majority of of use cases
probably just call in to a few simple categories such as "use `"""` or
"encode this number as a hex".

I grouped both features together as they're closely related.

What I want is something that:

1. works well with other encoders/decoders;
2. allows setting attributes programmatically;
3. supports round-tripping by default on a standard struct;
4. has a reasonable uncumbersome API.

Most options (custom types, struct tags) fail at least one of these.

---

This adds two new methods to Encoder: the Comment() method just sets a
comment for a key, and the FormatAs() method sets the format. A simple
example:

	err := toml.NewEncoder(os.Stdout).
		FormatAs("multi_line", toml.AsMultilineString).
		FormatAs("single_line_raw", toml.AsLiteralString).
		Comment("comment_me", "Well, hello there!").
		Encode(someStruct)

This way, pretty much any flag can be added programmatically without
getting in the way of JSON/YAML/whatnot encoding/decoding.

The idea behind the naming here is that you can have one `As*` hint,
with one or more `With*` hints.

---

I don't especially care how you need to pass the keys as strings, but
there isn't really any good way to do it otherwise.

I'm not necessarily opposed to also adding struct tags for most of these
things, although I'm not a huge fan of them. Since struct tags can't be
set programmatically it's not really suitable for many use cases (e.g.
setting comments dynamically, using multiline strings only if the string
contains newlines, etc.) It's something that could maybe be added in a
future PR, if a lot of people ask for it.

Not entirely sold on the API; you have a fairly large list of variables
you can add to FormatAs() and many are invalid (e.g. they apply only for
some types). Maybe it's better to split it out to FormatString(),
FormatNumber(), etc. and add `type formatString`, `type formatNumber`?

Another thing not covered here is proper round-tripping; e.g. decoding a
TOML file and encoding it again will make it lose the comments. We
already have a `MetaData` object when decoding, so it's probably better
to add all the information there, and allow passing that to the encoder
(or something?) Need to think about comments especially, since not every
comment is necessarily associated with a key

Fixes #64
Fixes #75
Fixes #160
Fixes #192
Fixes #213
Fixes #269
arp242 added a commit that referenced this issue Nov 24, 2021
This allows encoding comments and setting some flags to control the
format. While toml.Marshaler added in #327 gives full control over how
you want to format something, I don't think it's especially
user-friendly to tell everyone to create a new type with all the
appropriate formatting, escaping, etc. The vast majority of of use cases
probably just call in to a few simple categories such as "use `"""` or
"encode this number as a hex".

I grouped both features together as they're closely related: both set
additional information on how to write keys.

What I want is something that:

1. allows setting attributes programmatically;
2. supports round-tripping by default on a standard struct;
3. plays well with other encoders/decoders;
4. has a reasonable uncumbersome API.

Most options (custom types, struct tags) fail at least one of these;
there were some PRs for struct tags, but they fail at 1, 2, and
(arguably) 4. Custom types fail at 2, 3, and probably 4.

---

This adds SetMeta() to the Encoder type; this is already what we have
when decoding, and all additional information will be set on it. On
MetaData we add the following types:

    SetType()       Set TOML type info.
    TypeInfo()      Get TOML type info.
    Doc()           Set "doc comment" above the key.
    Comment()       Set "inline comment" after the key.

Every TOML type has a type in this package, which support different
formatting options (see type_toml.go):

    Bool
    String
    Int
    Float
    Datetime
    Table
    Array
    ArrayTable

For example:

    meta := toml.NewMetaData().
            SetType("key", toml.Int{Width: 4, Base: 16}).
            Doc("key", "A codepoint").
    Comment("key", "ë")
    toml.NewEncoder(os.Stdout).SetMeta(meta).Encode(struct {
            Key string `toml:"key"`
    }{"ë")

Would write:

    # A codepoint.
    key = 0x00eb  # ë

It also has Key() to set both:

    toml.NewMetaData().
        Key("key", toml.Int{Width: 4, Base: 16}, toml.Doc("A codepoint"), toml.Comment("ë")).
        Key("other", toml.Comment("..."))

The advantage of this is that it reduces the number of times you have to
type the key string to 1, but it uses interface{}. Not yet decided which
one I'll stick with, and also not a huge fan of Doc() and Comment(), but
I can't really think of anything clearer at the moment (these are the
names the Go ast uses).

---

The Decode() sets all this information on the MetaData, so this:

    meta, _ := toml.Decode(..)
    toml.NewEncoder(os.Stdout).SetMeta(meta).Encode(..)

Will write it out as "key = 0x00eb" again, rather than "key = 235".

This way, pretty much any flag can be added programmatically without
getting in the way of JSON/YAML/whatnot encoding/decoding.

---

I don't especially care how you need to pass the keys as strings, but
there isn't really any good way to do it otherwise. There is also the
problem that the "key" as found in the parser may be different than the
"key" the user is expecting if you don't use toml struct tags:

    type X struct { Key int }

Will read "key = 2" in to "Key", but when encoding it will write as
"Key" rather than "key". The type information will be set to "key", but
when encoding it will look for "Key", so round-tripping won't work
correct and has the potential for confusion if the wrong key is set.

This is not so easy to fix since we don't have access to the struct in
the parser. I think it's fine to just document this as a caveat and tell
people to use struct tags, which is a good idea in any case.

---

I'm not necessarily opposed to also adding struct tags for most of these
things, although I'm not a huge fan of them. Since struct tags can't be
set programmatically it's not really suitable for many use cases (e.g.
setting comments dynamically, using multiline strings only if the string
contains newlines, etc.) It's something that could maybe be added in a
future PR, if a lot of people ask for it.

Fixes #64
Fixes #75
Fixes #160
Fixes #192
Fixes #213
Fixes #269
arp242 added a commit that referenced this issue Nov 24, 2021
This allows encoding comments and setting some flags to control the
format. While toml.Marshaler added in #327 gives full control over how
you want to format something, I don't think it's especially
user-friendly to tell everyone to create a new type with all the
appropriate formatting, escaping, etc. The vast majority of of use cases
probably just call in to a few simple categories such as "use `"""` or
"encode this number as a hex".

I grouped both features together as they're closely related: both set
additional information on how to write keys.

What I want is something that:

1. allows setting attributes programmatically;
2. supports round-tripping by default on a standard struct;
3. plays well with other encoders/decoders;
4. has a reasonable uncumbersome API.

Most options (custom types, struct tags) fail at least one of these;
there were some PRs for struct tags, but they fail at 1, 2, and
(arguably) 4. Custom types fail at 2, 3, and probably 4.

---

This adds SetMeta() to the Encoder type; this is already what we have
when decoding, and all additional information will be set on it. On
MetaData we add the following types:

    SetType()       Set TOML type info.
    TypeInfo()      Get TOML type info.
    Doc()           Set "doc comment" above the key.
    Comment()       Set "inline comment" after the key.

Every TOML type has a type in this package, which support different
formatting options (see type_toml.go):

    Bool
    String
    Int
    Float
    Datetime
    Table
    Array
    ArrayTable

For example:

    meta := toml.NewMetaData().
            SetType("key", toml.Int{Width: 4, Base: 16}).
            Doc("key", "A codepoint").
    Comment("key", "ë")
    toml.NewEncoder(os.Stdout).SetMeta(meta).Encode(struct {
            Key string `toml:"key"`
    }{"ë")

Would write:

    # A codepoint.
    key = 0x00eb  # ë

It also has Key() to set both:

    toml.NewMetaData().
        Key("key", toml.Int{Width: 4, Base: 16}, toml.Doc("A codepoint"), toml.Comment("ë")).
        Key("other", toml.Comment("..."))

The advantage of this is that it reduces the number of times you have to
type the key string to 1, but it uses interface{}. Not yet decided which
one I'll stick with, and also not a huge fan of Doc() and Comment(), but
I can't really think of anything clearer at the moment (these are the
names the Go ast uses).

---

The Decode() sets all this information on the MetaData, so this:

    meta, _ := toml.Decode(..)
    toml.NewEncoder(os.Stdout).SetMeta(meta).Encode(..)

Will write it out as "key = 0x00eb" again, rather than "key = 235".

This way, pretty much any flag can be added programmatically without
getting in the way of JSON/YAML/whatnot encoding/decoding.

---

I don't especially care how you need to pass the keys as strings, but
there isn't really any good way to do it otherwise. There is also the
problem that the "key" as found in the parser may be different than the
"key" the user is expecting if you don't use toml struct tags:

    type X struct { Key int }

Will read "key = 2" in to "Key", but when encoding it will write as
"Key" rather than "key". The type information will be set to "key", but
when encoding it will look for "Key", so round-tripping won't work
correct and has the potential for confusion if the wrong key is set.

This is not so easy to fix since we don't have access to the struct in
the parser. I think it's fine to just document this as a caveat and tell
people to use struct tags, which is a good idea in any case.

---

I'm not necessarily opposed to also adding struct tags for most of these
things, although I'm not a huge fan of them. Since struct tags can't be
set programmatically it's not really suitable for many use cases (e.g.
setting comments dynamically, using multiline strings only if the string
contains newlines, etc.) It's something that could maybe be added in a
future PR, if a lot of people ask for it.

Fixes #64
Fixes #75
Fixes #160
Fixes #192
Fixes #213
Fixes #269
@arp242 arp242 added the v2 label May 27, 2022
@tjayrush
Copy link

Couldn't this be easily done without any real underlying changes? Simply add two new functions: ReadWithComments and WriteWithComments.

On read, store a map of keys --> comments (including an empty key at the top level). When writing, write the file as one normally would (ignoring the comments), then read it back in as plain text, scan each line, and if the key is present in the line (either [key] or key =) then write the comment prior to writing the line.

It's not idiomatic Go, but it would add a significant (very useful) feature for minimal effort.

Just a thought.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants