Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON parser plugin can sometimes output records as strings rather than Hash #4100

Open
jwreford99 opened this issue Mar 16, 2023 · 12 comments · Fixed by #4474
Open

JSON parser plugin can sometimes output records as strings rather than Hash #4100

jwreford99 opened this issue Mar 16, 2023 · 12 comments · Fixed by #4474
Assignees
Labels

Comments

@jwreford99
Copy link

jwreford99 commented Mar 16, 2023

Describe the bug

The JSON parser plugin can sometimes emit records as a string rather than as a Hash. Similar to #1279, the expectation of plugins further down the line is that the record it receives is always a Hash, but this contract is broken in some circumstances by the JSON plugin leading to some very difficult to diagnose bugs.

This happens because a string is considered valid JSON (SO) and the JSON parser (Oj at least) treats it as such and parses it successfully. But the thing it outputs from it's parsing is not a Hash type, it is a string, and at no point in the JSON parser does it throw if the record it is returning isn't a Hash object.

To Reproduce

If you setup a fluentd configmap which a super simple JSON filter setup:

    <filter gearset.*>
      @type parser
      <parse>
        @type json
        json_parser oj
      </parse>
      replace_invalid_sequence true
      key_name message
      emit_invalid_record_to_error false
    </filter>

When presented with a record which has {"message": "\"HELLO\""} it will parse it so that the resulting record is "HELLO" (i.e. not a Hash type, but just a string. This then causes errors in plugins further down the line (for example in my example this was followed by the Prometheus plugin which threw an error undefined method map' for #String:0x00007fdbb6b32308\nDid you mean? tap` which was rather hard to debug).

Expected behavior

I would of expected the parser plugin to call error out if the thing it was parsing is a string (i.e. the same behaviour you see if the parser failed to parse the JSON).
Potentially I would just expect us to check if values is a Hash on

But that might change behaviour in an unexpected way for people, so it might need to be a configuration value. It is this question what led me to doing an Issue rather than a PR

Your Environment

- Fluentd version: `fluentd 1.14.6`
- Operating system: `PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"`
- Kernel version: `5.10.102.1-microsoft-standard-WSL2`

Your Configuration

<source>
      @type tail
      read_from_head true
      path /var/log/containers/__.log
      pos_file /fluentd/pos/__.log.pos
      tag gearset.x
      <parse>
        @type cri
      </parse>
    </source>

    <filter gearset.*>
      @type parser
      <parse>
        @type json
        json_parser oj
      </parse>
      replace_invalid_sequence true
      key_name message
      emit_invalid_record_to_error false
    </filter>

    <filter gearset.*>
      @type prometheus
      <metric>
        name na
        type counter
        desc NA
        <labels>
          tag ${tag}
        </labels>
      </metric>
    </filter>

Note the plugin which causes the error is the JSON plugin because it is emitting records of string type, but it is the Prometheus plugin which is throwing the error because it is the next thing handling the records (and is not designed to be able to)

Your Error Log

dump an error event: error_class=NoMethodError error="undefined method `map' for #<String:0x00007fdbb6b32308>\nDid you mean?  tap" location="/usr/local/bundle/gems/fluent-plugin-prometheus-2.0.3/lib/fluent/plugin/prometheus.rb:90:in `stringify_keys'" tag="gearset.x" time=2023-03-14 12:48:13.381801632 +0000 record="HELLO"

Additional context

No response

@daipom daipom self-assigned this Mar 17, 2023
@daipom
Copy link
Contributor

daipom commented Mar 17, 2023

Thanks for reporting!
We can reproduce this issue as follows.

<source>
  @type sample
  tag test.hoge
  sample {"message": "{\"k\":\"v\"}"}
</source>

<source>
  @type sample
  tag test.fuga
  sample {"message": "\"HELLO\""}
</source>

<filter test.**>
  @type parser
  key_name message
  <parse>
    @type json
  </parse>
</filter>

<filter test.**>
  @type record_transformer
  enable_ruby true
  <record>
    class ${record.class}
  </record>
</filter>

<match test.**>
  @type stdout
</match>

Output

2023-03-17 11:18:09 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for \"HELLO\":String" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.fuga" time=2023-03-17 11:18:09.063985763 +0900 record="HELLO"
2023-03-17 11:18:09.064692451 +0900 test.hoge: {"k":"v","class":"Hash"}
2023-03-17 11:18:10 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for \"HELLO\":String" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.fuga" time=2023-03-17 11:18:10.066951425 +0900 record="HELLO"
2023-03-17 11:18:10.067684144 +0900 test.hoge: {"k":"v","class":"Hash"}

@daipom
Copy link
Contributor

daipom commented Mar 17, 2023

I think this is probably a bug.

Potentially I would just expect us to check if values is a Hash on

But that might change behaviour in an unexpected way for people, so it might need to be a configuration value. It is this question what led me to doing an Issue rather than a PR

Thanks for your consideration on this point!
I will figure out how we can fix this.

@daipom daipom added the bug label Mar 17, 2023
@jwreford99
Copy link
Author

Thanks for looking at this so quickly! Sounds like you can reproduce it alright, but let me know if there is anything I can do to help 😄 (also sorry, hadn't realised you could use the sample source to be able to output messages like that, would of made the config easier to show as an example!)

@daipom
Copy link
Contributor

daipom commented Mar 20, 2023

Sorry for the delay!

Potentially I would just expect us to check if values is a Hash on

But that might change behaviour in an unexpected way for people, so it might need to be a configuration value. It is this question what led me to doing an Issue rather than a PR

Thanks for your consideration on this point! I will figure out how we can fix this.

This is not only an issue for filter_parser, but for all plugins that can use parser_json.

So we should fix parser_json.

def parse(text)
record = @load_proc.call(text)
time = parse_time(record)
if @execute_convert_values
time, record = convert_values(time, record)
end
yield time, record
rescue @error_class, EncodingError # EncodingError is for oj 3.x or later
yield nil, nil
end

This should return nil when the parsed result is not a Hash.

The root cause of this is that sometimes the Ruby library does not parse a given string as Hash.

require "json"
JSON.load('"hoge"')
=> "hoge"

I don't understand the reason for this Ruby specification.
Anyway, we would have expected this to be a JSON::ParserError.
The same can be said for oj and yajl.

@daipom
Copy link
Contributor

daipom commented Mar 21, 2023

I have fixed this in

@daipom
Copy link
Contributor

daipom commented Mar 21, 2023

The root cause of this is that sometimes the Ruby library does not parse a given string as Hash.

require "json"
JSON.load('"hoge"')
=> "hoge"

I don't understand the reason for this Ruby specification.

This is a more complicated issue than I initially thought...

It is also necessary to take into account that the value is parsed as an Array and that the JSON specification allows the value to be neither a Hash nor an Array.

The current implementation of JSONParser does not appear to have sufficiently considered this.

@daipom
Copy link
Contributor

daipom commented Mar 21, 2023

We need to consider carefully the specification of the case where raw data can not be parsed as Hash.

I think the currently intended usecases are Hash or Array of Hash.

I think we should consider only these 2 patterns as parsable for now.

@daipom
Copy link
Contributor

daipom commented Mar 21, 2023

<source>
  @type sample
  tag test.hash
  sample {"message": "{\"k\":\"v\"}"}
</source>

<source>
  @type sample
  tag test.string
  sample {"message": "\"HELLO\""}
</source>

<source>
  @type sample
  tag test.array
  sample {"message": "[{\"k\":\"v\"}, {\"k2\":\"v2\"}]"}
</source>

<filter test.**>
  @type parser
  key_name message
  <parse>
    @type json
  </parse>
</filter>

<filter test.**>
  @type record_transformer
  enable_ruby true
  <record>
    class ${record.class}
  </record>
</filter>

<match test.**>
  @type stdout
</match>
2023-03-21 19:12:34.005516812 +0900 test.hash: {"k":"v","class":"Hash"}
2023-03-21 19:12:35 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for [{\"k\"=>\"v\"}, {\"k2\"=>\"v2\"}]:Array" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.array" time=2023-03-21 19:12:35.006256894 +0900 record=[{"k"=>"v"}, {"k2"=>"v2"}]
2023-03-21 19:12:35 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for \"HELLO\":String" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.string" time=2023-03-21 19:12:35.007100208 +0900 record="HELLO"

@stevehipwell
Copy link

Has there been any progress on this?

@daipom
Copy link
Contributor

daipom commented Jun 12, 2023

This will be fixed in #4106.
There are still some points that need to be done in #4106.
I will fix it so that it can be included in the next or the next after the next release.

@ashie
Copy link
Member

ashie commented Apr 30, 2024

#4474 partially fixes this issue but not yet complete for filter_parser.

@jwreford99
Copy link
Author

Thanks folks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants