Skip to content

joelonsql/pg-timeit

Repository files navigation

⏱️🐘timeit

  1. About
  2. Dependencies
  3. Installation
  4. Usage
  5. API
    1. timeit.s()
    2. timeit.h()
    3. timeit.async()
    4. timeit.work()
  6. Internal types
    1. test_state
  7. Internal functions
    1. timeit.round_to_sig_figs()
    2. timeit.measure()
    3. timeit.overhead()
    4. timeit.eval()
  8. Examples

1. About

timeit is a PostgreSQL extension to measure the execution time of built-in internal C-functions with high resolution. The high accurancy is achived by also adjusting for the overhead of the measurement itself. The number of necessary test iterations/executions are auto-detected and increased until the final result has the desired number of signifiant figures.

To minimize noise, the executions and measurements are performed in C, as measuring using PL/pgSQL would be too noisy.

timeit.s() and timeit.h() immediately measures the execution time for given internal function. These are suitable when you simply want to do a single measurement.

timeit.s() returns the execution time in seconds as a numeric value. timeit.h() returns the execution time as a human-readable text value, e.g. "100 ms".

The below example measures the execution time to compute the square root for the numeric value 2, and returns the result in nanoseconds.

CREATE EXTENSION timeit;

SELECT timeit.h('clock_timestamp');
   h
-------
 30 ns
(1 row)

SELECT timeit.h('numeric_sqrt','{2}');
   h
--------
 300 ns
(1 row)

SELECT timeit.h('numeric_sqrt','{2e131071}');
   h
-------
 30 ms
(1 row)

SELECT timeit.s('numeric_sqrt','{2e131071}');
  s
------
 0.03
(1 row)

Another simple example where we measure pg_sleep(1) with three significant_figures.

CREATE EXTENSION timeit;

SELECT timeit.h('pg_sleep','{1}', 3);
   h
--------
 1.00 s
(1 row)

timeit.async() defers the measurement and just returns an id, that the caller can keep to allow the final_result to be joined in later when the execution time has been computed by the timeit.work() PROCEDURE.

timeit.async() is meant to be used when you have lots of functions/argument combinations you want to measure, and have some query or program to generates those combinations for you, and you want to request all those values to be measured, without having to do the actual measurement immediately.

SELECT timeit.async('numeric_sqrt',ARRAY[format('2e%s',unnest)])
FROM unnest(ARRAY[0,10,100,1000,10000,100000,131071]);

 async
-------
     1
     2
     3
     4
     5
     6
     7
(7 rows)

--
-- timeit.work() can be executed manually like in this example,
-- or, run in the background by adding pg-timeit-worker.sh
-- to e.g. launchd or systemd, see Installation section.
--
CALL timeit.work(return_when_idle := true);
NOTICE:  2023-01-27 22:05:17+01 working
NOTICE:  2023-01-27 22:05:17+01 7 in queue
NOTICE:  2023-01-27 22:05:17+01 7 in queue
NOTICE:  2023-01-27 22:05:17+01 7 in queue
NOTICE:  2023-01-27 22:05:17+01 3 in queue
NOTICE:  2023-01-27 22:05:17+01 3 in queue
NOTICE:  2023-01-27 22:05:17+01 3 in queue
NOTICE:  2023-01-27 22:05:17+01 3 in queue
NOTICE:  2023-01-27 22:05:17+01 3 in queue
NOTICE:  2023-01-27 22:05:17+01 3 in queue
NOTICE:  2023-01-27 22:05:17+01 3 in queue
NOTICE:  2023-01-27 22:05:17+01 3 in queue
NOTICE:  2023-01-27 22:05:17+01 2 in queue
NOTICE:  2023-01-27 22:05:17+01 2 in queue
NOTICE:  2023-01-27 22:05:17+01 2 in queue
NOTICE:  2023-01-27 22:05:17+01 2 in queue
NOTICE:  2023-01-27 22:05:17+01 2 in queue
NOTICE:  2023-01-27 22:05:17+01 2 in queue
NOTICE:  2023-01-27 22:05:17+01 idle

SELECT
    tests.id,
    test_params.function_name,
    test_params.input_values,
    tests.executions,
    tests.final_result
FROM timeit.tests
JOIN timeit.test_params USING (id)
ORDER BY id;

 id | function_name | input_values | executions | final_result
----+---------------+--------------+------------+--------------
  1 | numeric_sqrt  | {2e0}        |        128 |   0.00000009
  2 | numeric_sqrt  | {2e10}       |        128 |   0.00000009
  3 | numeric_sqrt  | {2e100}      |         16 |    0.0000007
  4 | numeric_sqrt  | {2e1000}     |          1 |      0.00001
  5 | numeric_sqrt  | {2e10000}    |          1 |       0.0002
  6 | numeric_sqrt  | {2e100000}   |          1 |         0.02
  7 | numeric_sqrt  | {2e131071}   |          1 |         0.03
(7 rows)

Another nice thing with timeit.async() and timeit.work() is that they spread out the actual measurements over time, by first doing one measurement, and then instead of immediately proceeding an doing a second measurement of the same thing, timeit.work() will instead store the first measurement, and then proceed to do other measurements, and only in the next cycle proceed and do the second measurement, after which both measurements are compared to see if a result can be produced, or if the number of executions needs to be increased further.

By separating the first and second measurement from each other in time, the risk is reduced that both would be similarily unusually slow, due to the CPU being similarily busy both times.

2. Dependencies

None.

3. Installation

Install the timeit extension with:

$ git clone https://github.com/joelonsql/pg-timeit.git
$ cd pg-timeit
$ make
$ sudo make install
$ make installcheck

Optionally, if you need timeit.async(), you also need to schedule the timeit.work() procedure to run in the background.

Here is how to do that on Ubuntu Linux:

$ sudo cp timeit-worker.sh /usr/local/bin/
$ sudo cp timeit.service /etc/systemd/system/
$ sudo systemctl enable timeit
$ sudo systemctl start timeit

Here is how to do that on Mac OS X:

$ ./add-timeit-worker-to-launchd.sh

4. Usage

Use with:

$ psql
# CREATE EXTENSION timeit;
CREATE EXTENSION;

5. API

timeit.s(function_name [, input_values ] [, significant_figures] [, timeout] [, attempts]) → numeric

Input Parameter Type Default
function_name text
input_values text[] ARRAY[]::text[]
significant_figures integer 1
timeout interval NULL
attempts integer 1

Immediately measure the execution run time of the built-in internal function named function_name.

Returns the measured execution time in seconds.

Optionally, arguments can be passed by specifying input_values.

The desired precision of the returned final result can be specified via significant_figures, which defaults to 1.

A maximum timeout interval per attempt can be specified via the timeout input parameter, which defaults to NULL, which means no timeout.

After attempts timeouts, the significant_figures will be decreased by one and attempts new attempts will be made at that precision level.

When there are no more attempts and when sig. figures. can't be decreased further, it will give up.

timeit.h(function_name [, input_values ] [, significant_figures] [, timeout] [, attempts]) → text

Like timeit.s(), but returns result a time unit pretty formatted text string, e.g. "100 ms".

timeit.async(function_name [, input_values ] [, significant_figures] [, timeout] [, attempts]) → bigint

Input Parameter Type Default
function_name text
input_values text[] ARRAY[]::text[]
significant_figures integer 1
timeout interval NULL
attempts integer 1

Request measurement of the execution run time of function_name.

Returns a bigint id value that is the primary key in the timeit.tests table where the requested test is stored to.

timeit.work()

This procedure performs the measurements requested by timeit.async() and is supposed to be called in a loop from a script or cronjob.

See the Installation section for how to daemonize it.

6. Internal types

test_state

timeit.test_state is an ENUM with the following elements:

  • init: Test has just been initiated.
  • run_test_1: First test should be executed.
  • run_test_2: Second test should be executed.
  • final: The final result has been determined.

7. Internal functions

timeit.round_to_sig_figs(numeric, integer) → numeric

Round numeric value to integer number of significant figures.

SELECT timeit.round_to_sig_figs(1234,2);
 round_to_sig_figs
-------------------
              1200
(1 row)

SELECT timeit.round_to_sig_figs(12.456,3);
 round_to_sig_figs
-------------------
              12.5
(1 row)

SELECT timeit.round_to_sig_figs(0.00012456,3);
 round_to_sig_figs
-------------------
          0.000125
(1 row)

timeit.measure(function_name text, input_values text[], executions bigint) -> numeric

Performs executions number of executions of function_name with arguments passed via input_values.

timeit.eval(function_name text, input_values text[]) -> text

Performs a single execution of function_name with arguments passed via input_values.

Returns the result casted to text.

8. Examples

Let's say we want to measure the execution time of

1.5 + 2.5

Under the hood, the arithmetic add operation is computed using

numeric_add(1.5,2.5)

so the following two statements do the same thing:

SELECT 1.5 + 2.5;
 ?column?
----------
      4.0
(1 row)

SELECT numeric_add(1.5, 2.5);
 numeric_add
-------------
         4.0
(1 row)

To measure it:

SELECT timeit.h('numeric_add', ARRAY['1.5','2.5']);
   h
-------
 60 ns
(1 row)

By default, a result with one significant figure is produced.

If we instead want two significant figures:

SELECT timeit.h('numeric_add', ARRAY['1.5','2.5'], 2);
   h
-------
 24 ns
(1 row)

We could never measure such a short duration simply by using psql's \timing, or EXPLAIN ANALYZE.

=# \timing
Timing is on.
=# EXPLAIN ANALYZE SELECT 1.5 + 2.5;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.002..0.003 rows=1 loops=1)
 Planning Time: 0.088 ms
 Execution Time: 0.024 ms
(3 rows)

Time: 0.909 ms

It's noteworthy both 0.024 ms and 0.909 ms are wrong with several orders of magnitude, which is expected, and not a critisism, since they report the time for the entire statement, not just the expression that we're interested in.

To understand how off these values are from each other, let's convert them to seconds to allow visual comparison:

0.000024    <-- EXPLAIN ANALYZE "Exeuction Time"
0.000909    <-- \timing
0.000000024 <-- timeit.h()

But how do we know the claimed exeuction time by timeit.h() is reasonable?

Let's demonstrate how we can verify it manually, only standard PostgreSQL and its existing system catalog functions.

If it's true that the exeuction time is 24 ns, it should take about 2.4 s to do 1e8 executions. We will use generate_series() for the executions, but we first have to solve two problems. We have to account for the overhead of generate_series(). We must also avoid caching of the immutable expression, and create a volatile version of numeric_add.

CREATE FUNCTION numeric_add_volatile(numeric,numeric)
RETURNS numeric
VOLATILE
LANGUAGE internal
AS 'numeric_add';

Let's measure the overhead time of generate_series:

SELECT count(null) FROM generate_series(1,1e8);
 count
-------
     0
(1 row)

Time: 11344.158 ms (00:11.344)

The execution time is not notably affected by an immutable expression, since it's only computed once, and that time is minuscule in relation to the time to generate the 1e8 long series:

SELECT count(1.5 + 2.5) FROM generate_series(1,1e8);
   count
-----------
 100000000
(1 row)

Time: 11688.894 ms (00:11.689)

If we now instead invoke our volatile version of it, we will notice how it takes about 3 seconds longer time, which nicely matches the timeit.h() returned result:

SELECT count(numeric_add_volatile(1.5, 2.5)) FROM generate_series(1,1e8);
   count
-----------
 100000000
(1 row)

Time: 14524.570 ms (00:14.525)

SELECT 14.525-11.344;
 ?column?
----------
    3.181
(1 row)

Note how 3.181 s is quite close to the predicted time 2.4 s.

Under the hood, timeit measures the execution time by executing the function using C, and doesn't use generate_series() at all. This was only meant to demonstrate how users could verify the correctness of the timeit measurements, using tools available in standard PostgreSQL.

About

PostgreSQL extension that provides high-accuracy timing of SQL expressions.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published