Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove use of json gem #2479

Merged
merged 39 commits into from Nov 10, 2020
Merged

Conversation

cjlarose
Copy link
Member

@cjlarose cjlarose commented Nov 3, 2020

Description

Fixes #2471. The changes proposed in this PR in particular are described in a comment on the same issue. This PR introduces a new Puma::JSON module, which is meant to handle the specific places where Puma needs to perform JSON serialization.

Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added an entry to History.md if this PR fixes a bug or adds a feature. If it doesn't need an entry to HISTORY.md, I have added [changelog skip] or [ci skip] to the pull request title.
  • I have added appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

For now, it only handles Arrays of Integers, but we'll extend it to
support all of the common types
Fixes a bug where requesting `/stats` from the status server would cause
subsequent phased restarts to fail when upgrading/downgrading the json
gem.
These were disabled at some point on JRuby, but they seem to run fine.
Importantly, this test ensures that a call to `/gc-stats` returns
well-formed JSON on JRuby, where the value of `GC.stat` contains nested
structures.
Fixes a bug where requesting `/gc-stats` from the status server would cause
subsequent phased restarts to fail when upgrading/downgrading the json
gem.
Fixes a bug where requesting `/thread-backtraces` from the status server
would cause subsequent phased restarts to fail when
upgrading/downgrading the json gem.
Fixes a bug where accessing `Puma.stats` would cause subsequent phased
restarts to fail when upgrading/downgrading the json gem.
@@ -0,0 +1,85 @@
# frozen_string_literal: true

module Puma
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You really went above and beyond here. Better than what we had in 3.x by a mile!

@nateberkopec
Copy link
Member

100% for the approach, will do a line by line review soon

Copy link
Contributor

@rmacklin rmacklin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

The integration tests you added should prevent the specific scenarios that currently make it impossible to upgrade/downgrade json during a phased restart. If I understand correctly, though, the test suite could still pass even if someone were to introduce a new usage of json in some other code path, right?

I'm not sure there's much we can do about that (unless we want to ban the usage of require 'json' in general...) but just wanted to confirm that we'd still want to be on the lookout for new usages of json that could cause problems with phased restarts.

@@ -40,6 +40,8 @@ def serialize_value(output, value)
serialize_string output, value
when Integer
output << value.to_s
else
raise SerializationError, "Unexpected value of type #{value.class}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test for this case? (We already have tests for the other two SerializationError cases.)

Copy link
Member Author

@cjlarose cjlarose Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! 806f5c4

test/test_json.rb Outdated Show resolved Hide resolved
History.md Outdated
@@ -13,6 +13,7 @@
* Ignore illegal (by Rack spec) response header ([#2439])
* Close idle connections immediately on shutdown ([#2460])
* Fix some instances of phased restart errors related to the `json` gem (#2473)
* Remove use of `json` gem to fixed phased restart errors (#2479)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo here:

Suggested change
* Remove use of `json` gem to fixed phased restart errors (#2479)
* Remove use of `json` gem to fix phased restart errors (#2479)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed up in 0bd5531

cjlarose and others added 2 commits November 3, 2020 10:42
Co-authored-by: rmacklin <1863540+rmacklin@users.noreply.github.com>
lib/puma/json.rb Outdated
Comment on lines 24 to 25
case value
when Array, Hash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? serialize_value already handles all types or raises a SerializationError if needed.

Copy link
Member Author

@cjlarose cjlarose Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my implementation was based on an understanding of JSON according to RFC 4627, which requires that a valid JSON text is either an object or any array, but it looks like that requirement was dropped in RFC 7158. I'll update the implementation since now there's no need to handle Hash and Array in any special way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8d05a2f

lib/puma/json.rb Outdated
def generate(value)
case value
when Array, Hash
parts = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than an array, should it use StringIO? Not sure if there is a (performance) benefit for one or the other.

Copy link
Member Author

@cjlarose cjlarose Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered using stringio at first since it seems like a nice way to construct a string incrementally without an explosion in extra space in that same way that the Array#<< and Array#join strategy does. I went with just a normal Array because even stringio isn't in Ruby Core and, just like json, users can technically change the version of stringio they're using in a phased restart, which would cause workers to fail to boot. That said puma already uses stringio internally, so this is already a bug, so I could go either way on this.

I can't imagine the performance improvement would be too significant, especially since again, this module isn't really on a hot path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I'd switch to stringio is if it would improve readability/maintainability. I'll play with it and see what it looks like.

Copy link
Member Author

@cjlarose cjlarose Nov 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to StringIO in f818d65.

I ended up doing this after all because I think using the Array approach I was doing before stood out as unusual after thinking about it. If someone was trying to build a String incrementally while keep allocations to a minimum, a reader of the code would expect the author to have used stringio. Deviating from that expectation would suggest that the distinction was significant, but in this case, it isn't.

class SerializationError < StandardError; end

class << self
def generate(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to attempt using the native JSON module for those who don't use phased restarts? It'd be faster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the cost of doing that might not be worth the benefit. Use of JSON serialization in puma isn't on a "hot" path, and there would have to be additional complexity to pass around the information such that Puma::JSON (or its callers) knew when it was appropriate to use the built-in ::JSON or the new Puma::JSON. If we get it wrong somewhere, that's an opportunity to re-introduce a regression.

lib/puma/json.rb Outdated
when Symbol
serialize_string output, value.to_s
when String
serialize_string output, value
Copy link
Member

@MSP-Greg MSP-Greg Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past I used inspect for strings and symbols. Maybe replace serialize_string with inspect:

when Symbol, String
  output << value.to_s.inspect

or

when Symbol
  output << value.to_s.inspect
when String
  output << value.inspect

I think it does all the needed processing?

EDIT: Just noticed the call ro serialize_string in serialize_value, that needs to be replaced also. Just tried the test locally, and inspect seems fine.

Copy link
Member Author

@cjlarose cjlarose Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll dig deeper into this. I realized I'm not handling all of the different escape sequences correctly. Newline characters, for example, are written literally instead of as \n.

I'll have to double check that everything that .inspect emits is valid JSON. If it is, I'll use it, otherwise, I'll handle the special characters in a different way.

Copy link
Member

@MSP-Greg MSP-Greg Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Your comment said

# The implementation of JSON serialization in this module is not designed to
# be particularly full-featured or fast. It just has to handle the few places
# where Puma relies on JSON serialization internally.

I suspect you know better than any of us what Puma sends to JSON.generate (having just reviewed it), so I'll trust your judgement as to what edge cases it should handle.

EDIT: I just checked (not exhaustively) arrays composed of strings and/or numbers, and inspect is equivalent to JSON.generate with a space between members...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 98b3408 to handle control characters correctly and added 28e9d8f to simplify the handling of String and Symbol.

AFAICT, there's no guarantee that .inspect produces valid JSON (or will always produce valid JSON into the future), and adding support for control characters explicitly ended up not being so bad. I think the cost of adding the cases for the different characters is small, and it makes the implementation more closely match the language of the specification. I prefer that to having to worry about some edge case down the road where .inspect produces something that JSON parsers don't like.

@nateberkopec
Copy link
Member

Thank you so much for the reviews @MSP-Greg @ekohl and @rmacklin 🙇

require_relative "helper"
require "puma/json"

class TestJSON < Minitest::Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all of these, does it make sense to assert JSON.parse(Puma::JSON.generate(value)) == value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensuring that Puma::JSON.generate returns strings that can by parsed by ::JSON.parse seems useful, for sure.

Updated in b905c34


private

def assert_puma_json_generates_string(expected_output, value_to_serialize, expected_roundtrip: value_to_serialize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never seen this before. Does value_to_serialize refer to the value that was passed in as the second argument?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to work that way, but I couldn't find that behavior being explicitly supported in any official Ruby documentation, so I made it a little more clear in 0bb2df3

@cjlarose
Copy link
Member Author

cjlarose commented Nov 8, 2020

I think I've addressed all of the feedback so far. Lemme know if there's anything else we should do!

Copy link
Contributor

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a complex PR so I may have missed something, but overall I think it's a good approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug waiting-for-review Waiting on review from anyone
Projects
None yet
Development

Successfully merging this pull request may close these issues.

phased-restart errors after upgrading from Puma 3.12.6 to 5.0.4
5 participants