Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add one more Engine for PostgreSQL #986

Open
chandr-andr opened this issue Apr 24, 2024 · 7 comments
Open

Add one more Engine for PostgreSQL #986

chandr-andr opened this issue Apr 24, 2024 · 7 comments

Comments

@chandr-andr
Copy link

Hello! Thank you for your awesome ORM.

Recently, I've created a new driver for PostgreSQL - https://github.com/qaspen-python/psqlpy. It shows a significant performance upgrade in comparison to asyncpg.
It has already been tested in some production services and I can say it is production-ready.

Do you mind if I create a big PR that adds a new engine based on PSQLPy?

@sinisaos
Copy link
Member

@chandr-andr Thank you for your interest in Piccolo. Your work with psqlpy and qaspen (I didn't go into details and had a quick look at the source code and I like that it has a lot of similarities with Piccolo :) ) is also great.
I have nothing against the alternative driver, but I have one question regarding the performance that you stated have a significant performance upgrade in comparison to asyncpg.
Apologies in advance if I'm doing something wrong as I'm not a benchmark expert and I'm using the http bencmark test as the ORM or driver will most likely be used for a web application. I've tried the FastAPI example from the docs and built a similar example with asyncpg like this

# users table only have 2 columns (id and name)
import asyncpg
import uvicorn

from contextlib import asynccontextmanager
from typing import AsyncGenerator

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse


@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    """Startup database connection pool and close it on shutdown."""
    db_pool = await asyncpg.create_pool(
        dsn="postgres://postgres:postgres@localhost:5432/psqlpydb",
        max_size=10,
    )
    app.state.db_pool = db_pool
    yield
    await db_pool.close()


app = FastAPI(lifespan=lifespan)


@app.get("/asyncpg")
async def pg_pool_example(request: Request):
    query_result = await request.app.state.db_pool.fetch(
        "SELECT * FROM users",
    )
    return JSONResponse(content=[dict(item) for item in query_result])


if __name__ == "__main__":
    uvicorn.run("app:app" )

and the http benchmark results are pretty similar and asyncpg does a bit better? Here are the results.

rkl@mint21:~$ bombardier -c 500 -d 10s -l http://localhost:8000/asyncpg
Bombarding http://localhost:8000/asyncpg for 10s using 500 connection(s)
[============================================================================================]10s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec       707.47     109.03    1021.59
  Latency      678.21ms    70.37ms      1.54s
  Latency Distribution
     50%   687.24ms
     75%   698.83ms
     90%   709.14ms
     95%   718.07ms
     99%      0.88s
  HTTP codes:
    1xx - 0, 2xx - 7556, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   218.29KB/s
rkl@mint21:~$ bombardier -c 500 -d 10s -l http://localhost:8000/psqlpy
Bombarding http://localhost:8000/psqlpy for 10s using 500 connection(s)
[============================================================================================] 10s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec       639.18      90.12     975.61
  Latency      745.92ms    72.92ms      1.08s
  Latency Distribution
     50%   755.51ms
     75%   762.43ms
     90%   784.68ms
     95%   797.32ms
     99%      0.97s
  HTTP codes:
    1xx - 0, 2xx - 6860, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   197.62KB/s

Sorry again if I missed the point about driver performance.

@chandr-andr
Copy link
Author

@sinisaos Thanks for the quick feedback!
I found your benchmarks interesting and decided to find out what's going on.
So, in my example in the docs, I set only 2 connections for the connection pool, so if you used it without any change it has to lose to the asyncpg.

I created the FastAPI application (the same as yours) but set 10 connections for the connection pool in psqlpy.
The results are above:
PSQLPy

> ./bombardier -c 500 -d 10s -l http://127.0.0.1:8000/psqlpy
Bombarding http://127.0.0.1:8000/psqlpy for 10s using 500 connection(s)
[================================================================================================================================================================================================================] 10s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec      4487.78     575.16    6509.45
  Latency      110.76ms     8.40ms   272.14ms
  Latency Distribution
     50%   109.78ms
     75%   113.55ms
     90%   117.31ms
     95%   121.84ms
     99%   146.59ms
  HTTP codes:
    1xx - 0, 2xx - 45289, 3xx - 0, 4xx - 0, 5xx - 0
    others - 3
  Errors:
    dial tcp 127.0.0.1:8000: connect: connection reset by peer - 3
  Throughput:     1.14MB/s

And AsyncPG

> ./bombardier -c 500 -d 10s -l http://127.0.0.1:8000/asyncpg
Bombarding http://127.0.0.1:8000/asyncpg for 10s using 500 connection(s)
[================================================================================================================================================================================================================] 10s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec      4465.72     362.02    6188.94
  Latency      111.26ms   108.25ms      1.31s
  Latency Distribution
     50%   108.62ms
     75%   124.19ms
     90%   314.84ms
     95%   336.71ms
     99%   549.38ms
  HTTP codes:
    1xx - 0, 2xx - 45064, 3xx - 0, 4xx - 0, 5xx - 0
    others - 17
  Errors:
    dial tcp 127.0.0.1:8000: connect: connection reset by peer - 17
  Throughput:     1.14MB/s

In this test, PSQLPy beats asyncpg.
The difference, of course, is not so significant, but the query is as simple as possible too.
PSQLPy can give significant improvement with high-load big queries with a lot of output data.

In this example, there are not a lot of places for speeding up.

@chandr-andr
Copy link
Author

chandr-andr commented Apr 25, 2024

BTW, one guy made a database performance repo - https://github.com/ymezencev/db_perf with PSQLPy already.

@sinisaos
Copy link
Member

BTW, one guy made a database performance repo - https://github.com/ymezencev/db_perf with PSQLPy already.

@chandr-andr Thank you for your reply. According to the tests from that repo, the difference is significant for larger amounts of data, as you mentioned before, although I doubt anyone will load 50.000 rows from a single web api call without some sort of pagination. For example, PiccoloCRUD (which is used by Piccolo Admin), has a max_page_size parameter that limits the maximum number of rows returned per single API call. I have nothing against an alternative PostgreSQL engine (another option is always good), but you'll have to wait for the Piccolo ORM author to say what he thinks about all this. Cheers.

@dantownsend
Copy link
Member

It looks like a cool project.

Since we've only supported asyncpg up until now, I'm not sure how much work is involved in adding a new Postgres engine.

If we're lucky, it's just a case of creating a new Engine subclass, with no other changes to Piccolo, in which case it could even be a third party library at first.

@chandr-andr
Copy link
Author

@dantownsend Hello!
Thank you for the positive assessment of the PSQLPy.
I can do MVP engine based on the driver myself, I've had some experience with piccolo at work, so have knowledge about the internals.
I think I can do it in a new repo near PSQLPy.

@dantownsend
Copy link
Member

@chandr-andr That would be great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants