Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EachKey #45

Merged
merged 3 commits into from Jun 10, 2016
Merged

Add EachKey #45

merged 3 commits into from Jun 10, 2016

Conversation

buger
Copy link
Owner

@buger buger commented Apr 17, 2016

Description:
Fetch multiple keys at once. Instead of reading whole payload for each key, it does read only 1 time.

Using new API you can get nearly 2x improvement when fetching multiple keys, and the more keys you fetch, the bigger difference.

Example:

paths := [][]string{
    []string{"uuid"},
    []string{"tz"},
    []string{"ua"},
    []string{"st"},
}

jsonparser.EachKey(smallFixture, func(idx int, value []byte, vt jsonparser.ValueType, err error){
    switch idx {
    case 0: // uuid
        // jsonparser.ParseString(value)
    case 1: // tz
        jsonparser.ParseInt(value)
    case 2: // ua
        // jsonparser.ParseString(value)
    case 3: //sa
        jsonparser.ParseInt(value)
    }
}, paths...)

Benchmark before change:

BenchmarkJsonParserLarge      100000         84621 ns/op           0 B/op          0 allocs/op
BenchmarkJsonParserMedium     500000         15803 ns/op           0 B/op          0 allocs/op
BenchmarkJsonParserSmall     5000000          1378 ns/op           0 B/op          0 allocs/op

Added 2 new benchmarks:

BenchmarkJsonParserEachKeyManualMedium   1000000          8312 ns/op           0 B/op          0 allocs/op
BenchmarkJsonParserEachKeyManualSmall   10000000           798 ns/op           0 B/op          0 allocs/op

@buger
Copy link
Owner Author

buger commented Apr 17, 2016

Initially this change was part of struct unmarshaling PR #29, but it became too big, and there were too many optimizations/fixes in master. Will be merging this functionality separately.

@buger
Copy link
Owner Author

buger commented Apr 17, 2016

/cc @daboyuka @ikruglov

@FZambia
Copy link
Contributor

FZambia commented Jun 8, 2016

Hello Leonid, any news when this will be merged? In my case I have lots of different JSON requests with several fields on top level so this helper will allow to prevent parsing the whole content several times.

@buger
Copy link
Owner Author

buger commented Jun 9, 2016

Well, initially I did not want to merge it because thought I will find better API. I do not have too much time right now as I fully focused on http://github.com/buger/gor right now, but if you think this API make sense, I may consider merging it as it is.

@FZambia
Copy link
Contributor

FZambia commented Jun 9, 2016

Ah, ok, then let me first experiment with this patch first to understand if this is really comfortable to work with - maybe some ideas arise during work. I'll provide a feedback as soon as I get my hands dirty into this.

@FZambia
Copy link
Contributor

FZambia commented Jun 10, 2016

I played a bit with this. It was pretty comfortable to use. Here is struct I have:

type publishAPICommand struct {
    Channel Channel         `json:"channel"`
    Client  ConnID          `json:"client"`
    Data    json.RawMessage `json:"data"`
}

Here is benchmark code:

var publishKeys = [][]string{
    []string{"channel"},
    []string{"data"},
    []string{"client"},
}

var publishData = []byte(`{"channel": "1", "data": {"input": "test"}, "client": "xxx"}`)

func BenchmarkPublishUnmarshal(b *testing.B) {
    for i := 0; i < b.N; i++ {
        var cmd publishAPICommand
        err := json.Unmarshal(publishData, &cmd)
        if err != nil {
            panic(err)
        }
    }
}

func BenchmarkPublishUnmarshalJsonparser(b *testing.B) {
    for i := 0; i < b.N; i++ {
        var cmd publishAPICommand
        var parseError error

        jsonparser.EachKey(publishData, func(idx int, value []byte, vType jsonparser.ValueType, err error) {
            switch idx {
            case 0: // channel
                if err != nil {
                    parseError = err
                    return
                }
                if vType != jsonparser.String {
                    parseError = errors.New("channel must be string")
                    return
                }
                cmd.Channel = Channel(value)
            case 1: // data
                if err != nil {
                    parseError = err
                    return
                }
                if vType == jsonparser.Null {
                    cmd.Data = []byte("null")
                } else if vType == jsonparser.String {
                    cmd.Data = []byte(fmt.Sprintf("\"%s\"", string(value)))
                } else {
                    cmd.Data = value
                }
            case 2: // client
                if err != nil {
                    parseError = err
                    return
                }
                if vType != jsonparser.String {
                    parseError = errors.New("client must be string")
                    return
                }
                cmd.Client = ConnID(value)
            }
        }, publishKeys...)
        if parseError != nil {
            panic(parseError)
        }
    }
}

Benchmark results:

BenchmarkPublishUnmarshal-2                   300000          4551 ns/op         384 B/op          8 allocs/op
BenchmarkPublishUnmarshalJsonparser-2        3000000           506 ns/op           4 B/op          2 allocs/op

Pretty fast! As I said It was comfortable to use, I can handle unmarshal errors too, maybe the only thing is case 0: // channel - maybe using map for this could be better? To write sth like this:

var publishKeys = map[string][]string{
    "channel": []string{"channel"},
    "data": []string{"data"},
    "client": []string{"client"},
}

jsonparser.EachKey(publishData, func(idx string, value []byte, vType jsonparser.ValueType, err error) {
        switch idx {
        case "channel":
...
}

Not sure about this though - i.e. less typing in one place but more typing in another. But allows to extend keys description without worrying about indexes - just give a proper name.

Also a question - look how I handle data key - I need to check value type and construct proper json.RawMessage value depending on value type - am I doing this correctly?

# Conflicts:
#	parser_test.go
@buger
Copy link
Owner Author

buger commented Jun 10, 2016

It uses indexes instead of keys because you can query nested keys: []string{"nested", "nested3", "b"}, and in this case you can't use a switch with array anyway. Well, everything comes with its price, this is low-level and very fast API, but looks ugly for sure, people who will use it should be ready for this :)

Regarding RawMessage, this is indeed the custom case, and also you don't know the key type, so I think your way of handling it is entirely ok.

Merging it. Thank you for the testing!

@buger buger merged commit 5048f35 into master Jun 10, 2016
@buger buger deleted the key-each branch June 10, 2016 10:05
@FZambia
Copy link
Contributor

FZambia commented Jun 10, 2016

Thanks! Btw I also tried to unmarshal data containing crazy JSON payload as RawMessage (the craziest I could find in internet) - here it is https://github.com/FZambia/easyerror, sample.json file, don't try to open it in browser:) - worked without problems, while easyjson can not unmarshal it yet for example.

@buger
Copy link
Owner Author

buger commented Jun 10, 2016

Well, this is probably because jsonparser a bit simpler and does not really fully validate JSON, which is both good and bad, depending on your case :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants