
Worse performance when response schema specified #3498

Closed
dgg opened this issue Nov 29, 2021 · 19 comments · Fixed by fastify/fast-json-stringify#402
dgg commented Nov 29, 2021

Prerequisites

  • I have written a descriptive issue title
  • I have searched existing issues to ensure the bug has not already been reported

Fastify version

3.20.2

Plugin version

No response

Node.js version

14.17.1

Operating system

macOS

Operating system version (e.g. 20.04, 11.3, 10)

Mojave/Monterey

Description

We have a service written in Fastify that queries a time-series database and generates a sizeable report (around 7 MB of JSON).
The endpoint has a JSON schema set up as the successful (200) response schema.

We saw some performance problems with it (requests taking a long time to execute) and, while digging, we found signs pointing to the serialization of the response.

We then removed the response schema and saw a performance improvement (despite what the documentation suggests: that response schemas improve performance).

Steps to Reproduce

The response schema:

{
	"definitions": {},
	"$schema": "http://json-schema.org/draft-07/schema#",
	"$id": "response",
	"type": "object",
	"additionalProperties": false,
	"required": [
		"series"
	],
	"properties": {
		"series": {
			"$id": "#/properties/series",
			"type": "array",
			"items": {
				"$id": "#/properties/series/items",
				"type": "object",
				"additionalProperties": false,
				"required": [
					"name",
					"columns",
					"points"
				],
				"properties": {
					"name": {
						"$id": "#/properties/series/items/properties/name",
						"type": "string",
						"minLength": 1,
						"examples": [
							"223926"
						]
					},
					"columns": {
						"$id": "#/properties/series/items/properties/columns",
						"type": "array",
						"minItems": 2,
						"items": {
							"$id": "#/properties/series/items/properties/columns/items",
							"type": "string",
							"minLength": 1
						}
					},
					"points": {
						"$id": "#/properties/series/items/properties/points",
						"type": "array",
						"items": {
							"$id": "#/properties/series/items/properties/points/items",
							"type": "array",
							"minItems": 2,
							"items": {
								"$id": "#/properties/series/items/properties/points/items/items",
								"type": [
									"number",
									"boolean",
									"string",
									"null"
								],
								"minLength": 1
							}
						}
					}
				}
			}
		}
	}
}

The data file can be retrieved from this gist: https://gist.github.com/dgg/b1ef44e0ad88097134e716bfdd6155a5

The service with two endpoints (one that uses the schema and one that does not):

const { fastify } = require("fastify")

const data = require("./data.json")
const schema = require("./schema.json")

const options = {
	schema: {
		response: {
			200: schema
		}
	}
}

const instance = fastify({ logger: true })
	.post("/with", options, (request, reply) => {
		reply.status(200).send(data)
	})
	.post("/without", (request, reply) => {
		reply.status(200).send(data)
	})

// Run the server!
const start = async () => {
	try {
		await instance.listen(3000)
	} catch (err) {
		instance.log.error(err)
		process.exit(1)
	}
}
start()

Results when executing a single request against the endpoint that does not use the response schema:

{"level":30,"time":1638179955533,"pid":8471,"hostname":"macbook-pro.lan","reqId":"req-1","req":{"method":"POST","url":"/without","hostname":"localhost:3000","remoteAddress":"127.0.0.1","remotePort":50433},"msg":"incoming request"}
{"level":30,"time":1638179955609,"pid":8471,"hostname":"macbook-pro.lan","reqId":"req-1","res":{"statusCode":200},"responseTime":75.19169599562883,"msg":"request completed"}

Results when executing a single request against the endpoint that does use the response schema:

{"level":30,"time":1638179973829,"pid":8471,"hostname":"macbook-pro.lan","reqId":"req-2","req":{"method":"POST","url":"/with","hostname":"localhost:3000","remoteAddress":"127.0.0.1","remotePort":50434},"msg":"incoming request"}
{"level":30,"time":1638179974192,"pid":8471,"hostname":"macbook-pro.lan","reqId":"req-2","res":{"statusCode":200},"responseTime":362.3208159953356,"msg":"request completed"}

As one can see, the response time is significantly worse.

We have tried with newer versions of Fastify and the problem remains.
We have also tried getting the JSON data from an asynchronous source (reading a file asynchronously and then parsing the string) and it did not make a difference.
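The gap can also be reproduced without Fastify. The following sketch (not from the issue report; payload sizes and the naive serializer are illustrative) builds a payload shaped like the report above and compares JSON.stringify against a hand-rolled per-element serializer, loosely mimicking what a schema-compiled stringifier does for arrays of mixed primitives:

```javascript
// Standalone repro sketch: compare JSON.stringify with a naive
// per-element serializer on a payload shaped like the report above.
// The per-element approach loosely mimics schema-driven stringification.
const series = [];
for (let s = 0; s < 10; s++) {
  const points = [];
  for (let p = 0; p < 20000; p++) {
    points.push([1638179955533 + p, Math.random(), p % 2 === 0]);
  }
  series.push({ name: String(s), columns: ["time", "value", "flag"], points });
}
const payload = { series };

// Naive per-element serialization of the nested arrays.
function serializeNaive(obj) {
  const parts = obj.series.map((s) =>
    `{"name":${JSON.stringify(s.name)},"columns":${JSON.stringify(s.columns)},"points":[${
      s.points.map((pt) => `[${pt.map((v) => JSON.stringify(v)).join(",")}]`).join(",")
    }]}`
  );
  return `{"series":[${parts.join(",")}]}`;
}

let t = process.hrtime.bigint();
const a = JSON.stringify(payload);
const tStringify = Number(process.hrtime.bigint() - t) / 1e6;

t = process.hrtime.bigint();
const b = serializeNaive(payload);
const tNaive = Number(process.hrtime.bigint() - t) / 1e6;

console.log(`JSON.stringify: ${tStringify.toFixed(1)} ms, naive: ${tNaive.toFixed(1)} ms`);
console.log("outputs equal:", a === b);
```

Exact timings vary by machine, but the point is that per-element string building over hundreds of thousands of primitives can easily lose to the native serializer.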

Expected Behavior

Better (or at least not worse) response times when response schemas are used.

@zekth
Member

zekth commented Nov 29, 2021

Interesting. That might be an issue in https://github.com/fastify/fast-json-stringify

Our benchmarks do not include huge JSON payloads like yours. Worth creating more benchmarks.

@Eomm
Member

Eomm commented Nov 29, 2021

From docs:

fast-json-stringify is significantly faster than JSON.stringify() for small payloads. Its performance advantage shrinks as your payload grows.

We should be clearer about what "small" means in KB, but 7 MB is certainly not small.

It should be possible to write circuit-breaker logic where we skip the JSON-schema serialization and run JSON.stringify instead 🤔
I think it is useful to be able to set up a schema (for documentation at least) without using it for serialization
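That circuit-breaker idea could look roughly like the sketch below. This is not Fastify's API: `makeCircuitBreakerSerializer`, the `maxItems` threshold, and the array-length heuristic are all made up here for illustration. (Fastify does expose `reply.serializer(fn)` to swap the serializer per reply, which could serve as the escape hatch in practice.)

```javascript
// Sketch of a circuit breaker around a schema-compiled serializer.
// "compiled" stands in for a fast-json-stringify-style function;
// the heuristic and threshold are illustrative only.
function makeCircuitBreakerSerializer(compiled, { maxItems = 10000 } = {}) {
  return function serialize(payload) {
    // Cheap heuristic: if any top-level array is huge, skip the
    // compiled serializer and fall back to plain JSON.stringify.
    for (const value of Object.values(payload)) {
      if (Array.isArray(value) && value.length > maxItems) {
        return JSON.stringify(payload);
      }
    }
    return compiled(payload);
  };
}

// Usage with a stub "compiled" serializer that marks its path:
const compiled = (p) => "[compiled] " + JSON.stringify(p); // stub for demo
const serialize = makeCircuitBreakerSerializer(compiled, { maxItems: 3 });
console.log(serialize({ series: [1, 2] }));       // small payload: compiled path
console.log(serialize({ series: [1, 2, 3, 4] })); // large payload: JSON.stringify path
```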

@dgg
Author

dgg commented Nov 29, 2021

Well, that depends: if you are generating your docs (say, your OpenAPI spec) from Fastify metadata, you are not going to get the schema information if you don't use it.

I was not aware of that piece of documentation.
As I see it, the advantage does not merely shrink: for larger payloads, using fast-json-stringify turns into no advantage at all.

@mcollina
Member

Good spot. Could you upload a reproducible example so that we can see if there is some unoptimized path causing trouble?

@dgg
Author

dgg commented Nov 29, 2021

@mcollina do you mean a repo with the files I included in the bug description (the schema, the server, and the data that is returned)?
That would be easy to provide.

@mcollina
Member

Something that we can benchmark.

@dgg
Author

dgg commented Nov 29, 2021

Here it goes: https://github.com/dgg/fastify-large-serialization
Let me know if there is anything missing.

@mcollina
Member

mcollina commented Dec 4, 2021

I've done some investigation into this case and come to a few conclusions.

The first (maybe not obvious) one is that these routes are slow in both cases. On my machine the one without the schema can do about 14 req/sec vs 4 req/sec for the one with the schema. From an architecture perspective, these kinds of data dumps are better generated by a queue and downloaded asynchronously.

The second is that we have a bug in fast-json-stringify, and it's going to be very hard to solve. The gist of it is that we spend about 25% of the time in __pthread_cond_wait. In fact, the bug might even be inside Node.js or V8. This will require significantly more time to look into than I have available.

I would conclude that we should develop and document a way to skip fjs when the developer thinks they'll hit this case. Moreover, we should identify the "breaking point" at which fjs starts being counterproductive. @Eomm could you look into it?

@mcollina
Member

mcollina commented Dec 4, 2021

A few more interesting pieces.

Flamegraph without schema:
[screenshot: Screenshot 2021-12-04 at 16 24 28]

Flamegraph with schema:
[screenshot: Screenshot 2021-12-04 at 16 24 37]

@mcollina
Member

mcollina commented Dec 4, 2021

On second thought, it might also be related to: https://github.com/fastify/fast-json-stringify/blob/master/index.js#L1075-L1110 and how we handle huge arrays of many primitives.

@zekth
Member

zekth commented Dec 4, 2021

On second thought, it might also be related to: https://github.com/fastify/fast-json-stringify/blob/master/index.js#L1075-L1110 and how we handle huge arrays of many primitives.

This is worth creating an issue for, and adding huge arrays to the benchmarks too. I wonder if it's the [].map that causes this issue.

@mcollina
Member

mcollina commented Dec 4, 2021

The solution to this case is to have this block: https://github.com/fastify/fast-json-stringify/blob/a46425478630325478646d7e131cd131e016a8fd/index.js#L1055-L1068

replaced with a JSON.stringify(obj) when the types specified are all natives (like in this example). Note that this loses any removal of extra properties or checking that the data is the right type (which fjs partially does), but it might be the right call.
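A rough illustration of that idea follows. This is not the actual fast-json-stringify code generation; `allItemTypesNative` and `serializeArray` are hypothetical names used only to sketch the decision: when every `type` allowed for the array items is a JSON-native scalar, delegate the whole array to JSON.stringify instead of emitting per-element code, accepting the loss of the checks fjs partially performs.

```javascript
// Sketch: decide per schema whether an array can be serialized
// wholesale with JSON.stringify. Illustrative only.
const NATIVE_TYPES = new Set(["number", "integer", "boolean", "string", "null"]);

function allItemTypesNative(itemsSchema) {
  const types = Array.isArray(itemsSchema.type) ? itemsSchema.type : [itemsSchema.type];
  return types.every((t) => NATIVE_TYPES.has(t));
}

function serializeArray(arr, itemsSchema, serializeItem) {
  if (allItemTypesNative(itemsSchema)) {
    // Fast path: hand the whole array to the native serializer.
    return JSON.stringify(arr);
  }
  // Slow path: per-element generated serialization.
  return "[" + arr.map(serializeItem).join(",") + "]";
}

// The "points" items in the schema above allow number|boolean|string|null,
// so a row like this takes the fast path:
const itemsSchema = { type: ["number", "boolean", "string", "null"] };
const row = [1638179955533, 0.5, true, null, "x"];
console.log(serializeArray(row, itemsSchema, JSON.stringify));
```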

@wilkmaia
Contributor

wilkmaia commented Apr 6, 2022

I'm looking into this issue. I'll update with some notes and a PR ASAP.

@wilkmaia
Contributor

wilkmaia commented Apr 6, 2022

I've submitted fastify/fast-json-stringify#402 and added extensive notes on what I found in the PR's description. I'm out of ideas for what else we could do to improve it, especially without compromising on the spec validation.

I've proposed a mixed solution, which would give the user the ability to pick a specific mechanism to handle such edge cases. The idea of allowing the user to submit their own array-handling function has come to mind as well, but I'm not sure whether that would still be in the scope of this issue, or whether it would be a good feature for the lib overall.

Please check the PR and comment on the proposed solution and the investigation. It's possible I've overlooked something.

@simplenotezy

@wilkmaia thanks for your work! When dealing with large payloads in Fastify, is there something I need to do to configure the behaviour from your PR? Or does it handle things out of the box, automatically judging payload size and using the most appropriate method?

@wilkmaia
Contributor

wilkmaia commented Mar 9, 2023

Hi @simplenotezy. If you're experiencing performance issues you might want to tweak the settings outlined in https://github.com/fastify/fast-json-stringify#large-arrays. But bear in mind this only relates to large arrays, not large payloads in general (i.e. a large JSON document with no arrays wouldn't benefit from it).
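For reference, those knobs can be passed through Fastify's `serializerOpts` server option, which is forwarded to fast-json-stringify. A sketch, assuming a recent Fastify and the option names from the README section linked above (check your versions):

```javascript
// Options forwarded by Fastify to fast-json-stringify via serializerOpts.
// Option names assume a recent fast-json-stringify; see its README.
const serializerOpts = {
  largeArrayMechanism: "json-stringify", // serialize big arrays with JSON.stringify
  largeArraySize: 20000,                 // item-count threshold for "large"
};

const app = require("fastify")({ serializerOpts });
```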

@simplenotezy

@wilkmaia thanks for your prompt reply. I have a Nest.js application that uses GraphQL. We have a single (backoffice) endpoint that serves a rather large payload (~3-10 MB), and I'd like to optimise it.

I've been looking around for a fast-json-stringify option in the FastifyAdapter, but couldn't find any:

  const fastify = new FastifyAdapter({
    logger: false,
    bodyLimit: MAX_BODY_SIZE,
  });

  fastify.enableCors({
    origin: '*',
    methods: ['GET', 'POST', 'PUT', 'DELETE', 'OPTIONS'],
  });

  const app = await NestFactory.create<NestFastifyApplication>(AppModule, fastify, {

@wilkmaia
Contributor

wilkmaia commented Mar 9, 2023

@simplenotezy judging by FastifyAdapter's constructor, you can simply pass FastifyServer's options and it will forward them. The param you're looking for is serializerOpts.
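Concretely, the adapter snippet from the earlier comment might become something like the following sketch (whether `serializerOpts` is forwarded depends on your Fastify and @nestjs/platform-fastify versions):

```javascript
// Sketch: forwarding fast-json-stringify's large-array options through
// Nest's FastifyAdapter (its constructor options are passed to Fastify).
const { FastifyAdapter } = require("@nestjs/platform-fastify");

const adapter = new FastifyAdapter({
  logger: false,
  serializerOpts: {
    largeArrayMechanism: "json-stringify",
    largeArraySize: 20000,
  },
});
```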

@simplenotezy

aha, thanks for that pointer @wilkmaia!

@fastify fastify locked as resolved and limited conversation to collaborators Mar 9, 2023