Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure JSON parser returns Hash #4106

Closed

Conversation

daipom
Copy link
Contributor

@daipom daipom commented Mar 21, 2023

Which issue(s) this PR fixes:
Fixes #4100

What this PR does / why we need it:
Make sure JSON parser returns Hash.
It is wrong for Parser to return a record that is not Hash.
Subsequent processing may result in errors.

The result of parsing JSON can be a value other than Hash.
Originally, the data represented by JSON was not limited to Hash.

The current implementation of JSONParser does not appear to have sufficiently considered this.

We need to consider carefully the specification of the case where raw data can not be parsed as Hash.

The currently intended usecases appears to be Hash or Array of Hash.
We should consider only these 2 patterns as parsable for now.

in_http assumed that Parser could return Array.
Therefore, it needed to be fixed along with this.

MessagePackParser appears to have the same issue. (Not fixed in this PR.)

Docs Changes:
Not needed.

Release Note:
Same as the title.

Difference

Case: No subsequent processing

Config

<source>
  @type tcp
  tag test.tcp
  <parse>
    @type json
  </parse>
</source>

<match test.**>
  @type stdout
</match>

Operation

$ netcat 0.0.0.0 5170
[{"k":"v"}, {"k2":"v2"}]

Result before this fix

2023-03-21 23:30:30.623063804 +0900 test.tcp: [{"k":"v"},{"k2":"v2"}]

Result after this fix

2023-03-21 23:29:20.147728222 +0900 test.tcp: {"k":"v"}
2023-03-21 23:29:20.147754433 +0900 test.tcp: {"k2":"v2"}

Case: With subsequent filter processing

Config

<source>
  @type tcp
  tag test.tcp
  <parse>
    @type json
  </parse>
</source>

<filter test.**>
  @type record_transformer
  enable_ruby true
  <record>
    class ${record.class}
  </record>
</filter>

<match test.**>
  @type stdout
</match>

Operation

$ netcat 0.0.0.0 5170
[{"k":"v"}, {"k2":"v2"}]

Result before this fix

2023-03-21 23:30:51 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for [{\"k\"=>\"v\"}, {\"k2\"=>\"v2\"}]:Array" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.tcp" time=2023-03-21 23:30:51.040692881 +0900 record=[{"k"=>"v"}, {"k2"=>"v2"}]

Result after this fix

2023-03-21 23:29:55.783183244 +0900 test.tcp: {"k":"v","class":"Hash"}
2023-03-21 23:29:55.783215076 +0900 test.tcp: {"k2":"v2","class":"Hash"}

@daipom
Copy link
Contributor Author

daipom commented Mar 21, 2023

Additional confirming of the difference of the behavior,

<source>
  @type sample
  tag test.hash
  sample {"message": "{\"k\":\"v\"}"}
</source>

<source>
  @type sample
  tag test.string
  sample {"message": "\"HELLO\""}
</source>

<source>
  @type sample
  tag test.invalid
  sample {"message": "HELLO"}
</source>

<source>
  @type sample
  tag test.array
  sample {"message": "[{\"k\":\"v\"}, {\"k2\":\"v2\"}]"}
</source>

<filter test.**>
  @type parser
  key_name message
  <parse>
    @type json
  </parse>
</filter>

<filter test.**>
  @type record_transformer
  enable_ruby true
  <record>
    class ${record.class}
  </record>
</filter>

<match test.**>
  @type stdout
</match>

Before this fix

Some patterns result in errors in subsequent processing.

2023-03-21 23:50:24.068859931 +0900 test.hash: {"k":"v","class":"Hash"}
2023-03-21 23:50:25 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for [{\"k\"=>\"v\"}, {\"k2\"=>\"v2\"}]:Array" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.array" time=2023-03-21 23:50:25.070141860 +0900 record=[{"k"=>"v"}, {"k2"=>"v2"}]
2023-03-21 23:50:25 +0900 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'HELLO'" location=nil tag="test.invalid" time=2023-03-21 23:50:25.071323231 +0900 record={"message"=>"HELLO"}
2023-03-21 23:50:25 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for \"HELLO\":String" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.string" time=2023-03-21 23:50:25.072303009 +0900 record="HELLO"

After this fix

Correctly, ParserError occurs in filter_parser.

I left a comment in the code about filter_parser not being able to parse multiple records from one raw record.
(This is a limitation of the current specification.)

2023-03-21 23:43:25.061681273 +0900 test.array: {"k":"v","class":"Hash"}
2023-03-21 23:43:25 +0900 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'HELLO'" location=nil tag="test.invalid" time=2023-03-21 23:43:25.062907066 +0900 record={"message"=>"HELLO"}
2023-03-21 23:43:25.063493027 +0900 test.hash: {"k":"v","class":"Hash"}
2023-03-21 23:43:25 +0900 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data '\"HELLO\"'" location=nil tag="test.string" time=2023-03-21 23:43:25.063894687 +0900 record={"message"=>"\"HELLO\""}

@daipom daipom marked this pull request as ready for review March 21, 2023 14:58
@daipom
Copy link
Contributor Author

daipom commented Mar 21, 2023

Sorry for the timing of the v1.16.0 release.

I understand that we would not be able to put this fix in v1.16.0.

else
raise "'json', 'ndjson' or 'msgpack' parameter is required"
end
end

def parse_params_with_parser(params)
if content = params[EVENT_RECORD_PARAMETER]
@custom_parser.parse(content) { |time, record|
raise "Received event is not #{@format_name}: #{content}" if record.nil?
Copy link
Contributor Author

@daipom daipom Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explanation of the diff

There did not seem to be any particular reason for raise only here.
We should align with parse_params_default.

@daipom daipom force-pushed the make-sure-json-parser-returns-hash branch from f0591f0 to 39b72c6 Compare March 24, 2023 07:11
@daipom
Copy link
Contributor Author

daipom commented Mar 24, 2023

I confirmed the failing tests. They seem to be unstable tests and not related to this fix.

if record.is_a?(Array)
# TODO: Temporarily supporting this case for compatibility.
# We should not consider this case here.
# We should fix MessagePackParser so that it doesn't return Array.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you do this in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it should be summarized as fixing the JSON parser.
Since there is no time before the next release, I also wanted to keep the scope of impact as small as possible.
About in_http, I had no choice but to fix it in this PR.

If it looks like I should fix it in this PR, I will try today or tomorrow.
Or I can try to fix it in another PR after this PR is merged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be done in this PR instead of adding such workaround, since it's considered as a same problem and sometimes such workaround lives long unexpectedly.

I’ll postpone merging this to v1.17.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review. I Understood.
My understanding was that the existing workaround was minimized here, not adding a new workaround.
But certainly, if it is not fixed entirely, it may lead to unnecessary problems.
Especially for MessagePackParser as @custom_parser.

@ashie ashie added this to the v1.17.0 milestone Mar 28, 2023
@daipom daipom marked this pull request as draft April 10, 2023 09:56
@daipom
Copy link
Contributor Author

daipom commented Apr 10, 2023

TODO: Fix other parsers.

@daipom
Copy link
Contributor Author

daipom commented Sep 12, 2023

Sorry for the delay. I'm going to fix this for the next release.

It is wrong for Parser to return a record that is not Hash.
Subsequent processing may result in errors.

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
in_http didn't support yield of Parser.
The specification assumed that Parser could return Array.
However, this is wrong. Parser shouldn't return Array.

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
Config to reproduce:

    <source>
      @type sample
      tag test.array
      sample {"message": "[{\"k\":\"v\"}, {\"k2\":\"v2\"}]"}
    </source>

    <filter test.**>
      @type parser
      key_name message
      <parse>
        @type json
      </parse>
    </filter>

    <match test.**>
      @type stdout
    </match>

Result:

    2023-03-21 23:24:52.004470792 +0900 test.array: {"k":"v"}
    2023-03-21 23:24:52.004470792 +0900 test.array: {"k":"v"}
    ...

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
@kenhys kenhys force-pushed the make-sure-json-parser-returns-hash branch from 39b72c6 to 940624e Compare April 4, 2024 07:33
@daipom
Copy link
Contributor Author

daipom commented Apr 26, 2024

I have add new PR for this

@daipom daipom closed this Apr 26, 2024
@daipom daipom deleted the make-sure-json-parser-returns-hash branch May 7, 2024 02:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JSON parser plugin can sometimes output records as strings rather than Hash
3 participants