Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of type map creation #376

Merged
merged 3 commits into from
Jul 29, 2021
Merged

Conversation

amarshall
Copy link
Contributor

Please see individual commits for extensive details. I’m happy to separate into multiple PRs if some work needs more vetting or discussion, as each commit can stand on its own (but separating will require resolving conflicts as commits 2 & 3 both depend on 1).

This is the result of investigation after encountering some performance issues with type map initialization being unexpectedly slow (after implementing a more time-sensitive use case) when the number of rows queries from pg_type is quite large (~131k in production). The enclosed performance testing was done on in an ideal (and faster) development environment with similar schema and slightly large pg_type set (~189k). In our production environment we were seeing type map creation take 4,102±550 ms, and since we create two type maps, that time doubles currently. Needless to say, that is quite slow and these changes aim to improve that.

Finally, I am curious if it is possible to reduce the number of rows which are actually used to generate the coder maps, but I’m not certain if all of there are truly needed. It appears that every table with an array column gets two rows (one array_in, one record_in); since we have many tables this ends up summing to quite a lot.

Usage of the range data was previously commented-out but was removed
entirely in 365008e.
Testing methodology:

- 189k rows returned from pg_type query in `build_coder_maps`.
- Timing calling `PG::BasicTypeMapForQueries.new` only, figures are
  average of 16 iterations.

                 runtime        objects        allocations
Baseline :  2,027±350 ms          909 k            609 MiB
Optimized:    874± 70 ms (-56%)   113 k (-88%)     113 MiB (-79%)

Unfortunately performing the actual SQL query cannot easily be removed,
so there is a fair bit of variance in the runtimes, however, results
still show substantial improvement even when considering the worse case.
If I had more time I would time the query and remove it, but that
requires a bit more hacking than I would prefer, and, as said, the
improvement stands either way.
Building coder maps can be quite expensive, building them ahead-of-time
allows them to be reused for multiple type map creations, e.g. if
creating both BasicTypeMapForQueries and BasicTypeMapForResults the cost
need only be paid once.

Example usage:

    conn = PG.connect(…)
    coder_maps = BasicTypeRegistry.build_coder_maps(conn)
    tmq = BasicTypeMapForQueries.new(conn, coder_maps: coder_maps)
    tmr = BasicTypeMapForResults.new(conn, coder_maps: coder_maps)

Testing methodology:

- 189k rows returned from pg_type query in `build_coder_maps`.
- Cached and injected results from query to remove query cost and
  jitter from baseline.
- Timing calling `PG::BasicTypeMapForQueries.new` only, figures are
  average of 32 iterations.
- “Cached” calls `build_coder_maps` ahead-of-time, outside the timing
  loop, and passes-in.

          runtime
Baseline:  374±16 ms
Cached  :   <1± 0 ms (-99%)

This confirms that the majority of the cost in creating a new type map
is, when ignoring SQL query and results materialization, enclosed within
`build_coder_maps`. As such, for *n* type maps, this reduces the
(sometimes non-small) time complexity effectively from O(n) → O(1).
@amarshall
Copy link
Contributor Author

@larskanis Any thoughts on this?

@larskanis larskanis merged commit 7fe1eba into ged:master Jul 29, 2021
larskanis added a commit that referenced this pull request Jul 30, 2021
This way one CoderMapsBundle can be used to build several BasicTypeMap instances.
So only one database query on pg_types is executed then.

Follow up to #376
@amarshall amarshall deleted the perf branch November 19, 2022 01:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants