Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add preserveFormatting option for comments/whitespace #38

Closed
wants to merge 5 commits into from

Conversation

arvindth
Copy link

This pull request addresses a use case for reading and updating properties files that have:

  • commented out sample values for key/value entries that need to be preserved. eg:
# Uncomment the following to override the default value
# serverHostname = host.com
  • sectional comments with whitespace formatting. eg:
# 
# Section 1 
#

# comment1
key = value

Changes:

  • Add a preserveFormatting bool option to the loader to allow scanning and retaining whitespace as part of comments.
  • Add a preserveFormatting bool option when writing comments to pass through the whitespace formatting to the output.
  • Add a virtual key/value for the end of the input to allow trailing comments to be retained and emitted.
  • Update lexer to not discard whitespace if preserveFormatting is set.
  • Comments are now structs containing 2 elements: the original prefixes from the input, and the comment value itself.

Notes:

  • I chose not to add a preserveFormatting option to each of the Load* methods. I felt doing this would explode the number of combinations of potential parameters.
    • Instead, the preserveFormatting option for loading is only available through the GetLoader method followed by explicitly setting the loader options before invoking the underlying Load methods.

@arvindth
Copy link
Author

arvindth commented Aug 2, 2019

@magiconair could you take a look at this PR and let me know if you'll consider accepting it?

Copy link
Owner

@magiconair magiconair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started reviewing this but stopped since this solution seems too complex. If he goal is to preserve whitespace for comments and the lexer should handle that transparently. In this approach this permeates everywhere. Doesn't look right.

load.go Outdated
@@ -178,6 +183,10 @@ func LoadMap(m map[string]string) *Properties {
return p
}

func GetLoader() (*Loader, error) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this and it is not idiomatic. It would need to be called NewLoader() but since it just returns an empty Loader we can drop this method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this method. Callers can just instantiate the struct themselves.

load.go Outdated
}

func (l *Loader) loadBytes(buf []byte, enc Encoding) (*Properties, error) {
p, err := parse(convert(buf, enc))
func (l *Loader) loadBytes(buf []byte, enc Encoding, preserveFormatting bool) (*Properties, error) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all loadXXX functions eventually call loadBytes you don't need pass the parameter through. you can just use l.PreserveFormatting

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the passed parameters in load.go.

lex.go Outdated
@@ -162,8 +163,9 @@ func (l *lexer) nextItem() item {
}

// lex creates a new scanner for the input string.
func lex(input string) *lexer {
func lex(input string, preserveFormatting bool) *lexer {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since #37 wants to add another config parameter to the lexer I think we should move the go l.run() call to the parse function and refactor this function as follows:

func lex(input string) *lexer {
	return &lexer{
		input: input,
		items: make(chan item),
		runes: make([]rune, 0, 32),
	}
}

Then parse looks like this:

func parse(input string) (properties *Properties, err error) {
	l := lex(input)
	go l.run()
	p := &parser{lex: l}
...
}

This way it is possible to pass additional configuration to the lexer without adding them as function arguments.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

lex.go Outdated
@@ -60,6 +60,7 @@ type stateFn func(*lexer) stateFn

// lexer holds the state of the scanner.
type lexer struct {
preserveFormatting bool // whether to scan EOLs/whitespace as part of comments
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets rename to

keepWS // keepWS retains whitespace in comments

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

lex.go Outdated
@@ -189,14 +191,36 @@ func lexBeforeKey(l *lexer) stateFn {
return nil

case isEOL(r):
l.ignore()
return lexBeforeKey
if l.preserveFormatting {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make these two separate switch cases (also for the other ones):

case isEOL(r) && l.keepWS:
    l.appendRune(r)
    l.backup()
    return lexComment

case isEOL(r):
    l.ignore()
    return lexBeforeKey

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

properties.go Outdated
@@ -48,6 +48,11 @@ func PanicHandler(err error) {

// -----------------------------------------------------------------------------

type Comment struct {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be exported and does not look properly formatted but I also don't understand why we need this in the first place. The only difference is whether the comments have whitespace or not. Shouldn't the lexer handle that transparently for the parser?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unexported this struct. This is mainly needed to preserve the formatting of the prefix. Since java allows either # or ! as a prefix, with optional leading whitespace before the prefix, I wanted to be able to preserve that. eg:

#############################
    !!                 !!
    !!    Section 1    !!
    !!                 !!
#############################

# comment 1
key = value

parser.go Outdated

for {
token := p.expectOneOf(itemComment, itemKey, itemEOF)
switch token.typ {
case itemEOF:
if preserveFormatting && (len(comments) > 0 || token.val != "") {
// There are comments at the end of the input that are not tied to a particular key
// Save these off against a special empty key when preserving formatting
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a sentinel key I suggest using a separate field for heading and trailing comments, e.g. trailComment comment in the Properties.

Also, please revert the logic of the if statement to avoid the indent.

if !preserveFormatting || len(comments) == 0 {
    goto done
}
if token.val == "" {
    p.trailingComments = comments
    goto done
}
...
goto done

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

continue
case itemKey:
key = token.val
key = strings.TrimSpace(token.val)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look right since whitespace parsing should only affect comments.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary to maintain previous behavior. Previously, whitespace around the key was being ignored in lexBeforeKey. However, now the lexer doesn't know whether the whitespace in lexBeforeKey is part of a comment and needs to be preserved or whitespace before a key and needs to be ignored. So I do it here.

@arvindth
Copy link
Author

arvindth commented Aug 6, 2019

I've put up an additional commit addressing your comments so far, and also removed some of the extra parameters, like properties.PreserveFormatting. In addition, I think at least some of the complexity comes from having to support writeFormattedCommentWithoutFormattingTests, i.e. being able to take formatted comments and write them out as before by stripping out the whitespace. I'm considering removing or modifying this capability to see if it reduces the complexity, especially in the WriteComment method.

@arvindth
Copy link
Author

arvindth commented Aug 7, 2019

I've refactored the WriteComment method and moved writing formatted comments out into a new method. The individual commit diff makes it look odd, but looking at the FilesChanged view shows the diff better. This in combination with the previous commit's changes does simplify the original change a bit.

However, in order to maintain backward compatibility in the lexer, the parser and in properties, I don't think that this can be achieved transparently in the lexer. I think that the parser needs to understand preserveFormatting and behave differently when it's specified. Either that, or the list of lexer states would need to grow much larger, since the lexer would need to understand the underlying state quite a bit more, and even then, the parser would still need enhancements to understand and consume those new emitted items.

Let me know if you agree, or have any further suggestions to modify this PR.

However, I also understand if you think it's still not the right approach. If you believe this is not the right direction for this feature, I'm ok with closing this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants