Huge RSS usage retained (suspect JSONB fields) #21471
Comments
Thanks for opening an issue about this observation. Let's dig in:
Can you clarify a bit exactly how you are loading, processing and saving the data? I see you mentioned […].
Same question here: what exactly are you reading, and how?
Are you actively using metrics and tracing in this setup? Is the increased RSS usage causing any concrete problems? Would additional activity that needs more memory fail because of the currently high retained memory, or would the memory be freed on demand when needed?
We have an event re-running operation based on inserting a new event in a timeline.

Load: […]
Process (no/very minimal Prisma involved): […]
Save: […]

We were saving everything with […]. We were fetching the entire […]. We are using […]. After running enough of the operations, the pod will OOM. Increased RSS usage is fine, but the fact that it is not freed after the operation has ended is the problem. In the pictured screenshot the operation begins and ends at around 1/4 of the timeline, where you see the small heap increase. The rest of the plateau is retained after the operation has ended (though with the same […]). We are currently not in a position to easily try a raw query, so we can't confirm that reducing all the […] would help.
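For illustration only, here is a minimal sketch of what such a load/process/save cycle might look like, assuming a hypothetical event model with a large Json payload column (the actual model names, processing logic, and exact Prisma calls were elided above):

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// Placeholder for the application-specific processing step.
function transform(payload: any): any {
  return payload;
}

async function rerunTimeline(timelineId: number) {
  // Load: fetch all events for the timeline, including the large JSONB payloads.
  const events = await prisma.event.findMany({ where: { timelineId } });

  // Process: derive new payloads (little or no Prisma involved).
  const updated = events.map((e) => ({ id: e.id, payload: transform(e.payload) }));

  // Save: upsert everything back, large JSONB values included.
  for (const e of updated) {
    await prisma.event.upsert({
      where: { id: e.id },
      create: { id: e.id, timelineId, payload: e.payload },
      update: { payload: e.payload },
    });
  }
}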
If you could go to that reproduction again and add the suspected field type, that would be super useful. We won't be able to do anything here before we see this ourselves, so we need a reproduction that shows the problem, which is why I was asking for all these details. But it still seems like this would take a lot of time with unclear results as to whether we can actually reproduce the problem. You are in a much better position to do that without missing the one tiny detail that might be relevant.
The synthetic example uses a lot of application code, so it's not something we're in a position to share currently.
Good you found a workaround for now. (We are happy to sign an NDA if that would allow you to share the reproduction; email me at my first name @prisma.io and we can make that happen. If not, we'll have to find another reproduction somehow.)
Hi @janpio, I met a similar memory leak, but with a PostgreSQL bytea field. Here is how I reproduce the leak:

// schema.prisma: create a table with a bytea field
model Data {
  id        Int      @id @default(autoincrement())
  data      Bytes    @db.ByteA
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt

  @@map("data")
}

// detection script
import { PrismaClient } from '@prisma/client';
async function main() {
  const prisma = new PrismaClient();
  await prisma.$connect();

  // feed the table a row with 1MB of data
  // (seed step made explicit here so the script is self-contained)
  await prisma.data.create({ data: { data: Buffer.alloc(1024 * 1024) } });

  // do not read the bytea field
  async function readNoBin() {
    for (let i = 0; i < 1000; i++) {
      const _ = await prisma.data.findFirst({
        select: {
          id: true,
          createdAt: true,
          updatedAt: true,
        },
        where: {
          id: 1,
        },
      });
    }
  }
  await readNoBin();
  console.log(process.memoryUsage());

  // read the bytea field
  async function read() {
    for (let i = 0; i < 1000; i++) {
      const _ = await prisma.data.findFirst({
        where: {
          id: 1,
        },
      });
    }
  }
  await read();
  await prisma.$disconnect();
  console.log(process.memoryUsage());

  // hold the process so memory can be inspected
  while (true) {}
}

main();

The result shows me that after reading the bytea field 1,000 times, the RSS increased from 60 MB (which is taken up by program initialization) to 200+ MB, and it never goes down. At the same time, the 1,000 reads without the bytea field don't change the RSS usage at all.
@leighman I attempted a reproduction and didn't find any memory leak so far on macOS or Ubuntu. Here is the setup I used: […]
Could you provide more information?
@forehalo I attempted a separate reproduction for your case and didn't find any memory leak so far. Please create a new issue and provide more information. Let me know if you need help in the new issue.
@janpio Thanks. Yes, I will take a look at reproducing it with that repository, but unfortunately I just haven't had a chance.
Hello, we use Postgres and had a few tables with […]. I hope you figure out the cause of the problem, but there definitely seems to be something wrong with JSON support.
@demurgos We're definitely interested in more information here 👀 For example, could you share with us some example data that you are storing as JSON? For the following fields: […]
Basically, I would use this to try a reproduction based on this template https://github.com/Jolg42/repro-21471, and it would help to have some sample data to attempt a minimal reproduction. Also, please let us know which versions of Prisma & Node.js you were running.
@demurgos Just checking, let us know if you want to provide more information, based on my previous comment. It would be helpful for us. We are especially curious about the Node.js version you used, because Node.js v16 might have some issues and upgrading to v18+ should fix that.
@forehalo Did you see my previous message? @leighman 🤞🏼 that you get a chance to provide more info related to my previous message.
@Jolg42 The issue really showed up when he added a table to keep track of background jobs in the DB, with the following model: […]
We initially used Node 16 and noticed the memory issue. We then updated to Node 21 and the issue was still present. Since the commit I linked above (5 days ago, switching the json columns to text), the issue is gone. I'll open a new issue once we have a reproduction.
Unfortunately we're extremely busy and I'm unlikely to look at a reproduction any time soon, since we were able to work around the issue by casting the field to text.
High RSS and low V8 heap usage hint that the problem is likely on the engine side and not on the JavaScript side. The fact that changing json columns to text columns and parsing JSON on the client side fixed the issue for @demurgos and @leighman hints at that as well. The sharp stair pattern on the graph looks like heap fragmentation to me and not a memory leak.

While a runnable reproduction would certainly be ideal, I believe there may be almost enough information for us to attempt reproducing it ourselves again. What would really be helpful is if you could give us an idea of what your JSON objects look like (their shape, complexity, levels of nesting, and if they are complex, whether the individual objects they are comprised of are rather big or small), and maybe an example of such an object if possible.

What we could also try is to build the query engine with the jemalloc allocator and have you check if it fixes or alleviates the issue for you, since jemalloc is designed to reduce fragmentation.
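As a rough way to observe this heap-vs-RSS divergence yourself, you can sample both around an operation. This is a minimal sketch (not from the thread); it assumes Node is started with --expose-gc so a full GC can be forced before sampling:

function logMemory(label: string) {
  // Force a full GC (requires `node --expose-gc`) so heapUsed reflects live objects.
  (globalThis as any).gc?.();
  const mb = (n: number) => (n / 1024 / 1024).toFixed(1) + ' MB';
  const { rss, heapUsed, external } = process.memoryUsage();
  console.log(label, 'rss:', mb(rss), 'heapUsed:', mb(heapUsed), 'external:', mb(external));
}

// A heap that returns to baseline while RSS stays high after GC points at
// native (engine-side) memory or allocator fragmentation rather than a JS leak.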
Yeah, my suspicion would be the string allocations in the engine somehow. The JSON values are about 20–500 kB in size. In the example above this object is a top-level array containing 10–10000 objects. The objects represent a whole entity, so they can vary in complexity depending on the type. The bulk of the object is a […].
Individual objects are probably more likely to be small and numerous, though some entities can be quite large. Generally, the more numerous the entity, the smaller the […]. As described, we've since worked around this issue by casting the field, so it wouldn't be trivial to check with a different allocator.
There are some architectural reasons why the engine currently deserializes the JSON values internally, but that shouldn't be necessary and we should fix this. Until then, using text columns (or casting the json column to text) and parsing on the client is a reasonable workaround.
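For anyone looking for that interim workaround, the idea is to keep the raw JSON in a text column and do all (de)serialization in JavaScript so the engine never touches it as JSON. A sketch with a hypothetical Job model (declared as data String @db.Text instead of data Json @db.JsonB):

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

async function bumpAttempts(id: number) {
  // Hypothetical model: model Job { id Int @id ... data String @db.Text }
  const row = await prisma.job.findUniqueOrThrow({ where: { id } });
  const payload = JSON.parse(row.data); // parse on the client, not in the engine
  payload.attempts += 1;
  await prisma.job.update({
    where: { id },
    data: { data: JSON.stringify(payload) }, // serialize on the client
  });
}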
Understood. On second thought, I'm also not sure that linking to a different allocator is a good idea anyway, given that the engine is a shared library which would be loaded into the Node.js process, which uses the system allocator. Good to know that the workaround works well for you; we'll keep you informed in this thread when we have any news.
Hi, I've encountered a similar issue to the one described in the thread. While using Prisma in conjunction with PostgreSQL and a JSON column, the RSS memory usage is huge (reaching 1 GB). Below is a picture for a single operation and for multiple operations using autocannon (https://www.npmjs.com/package/autocannon). As a benchmark, I tested the same data with Kysely (https://kysely.dev/), and the memory usage does not exceed 300 MB. The use of TEXT columns mitigates the issue, but this solution is not acceptable in the long term. I've documented the described issue here: https://github.com/pzgorzaly/prisma-issue-minimal-reproducible-repo
@pzgorzaly Thanks for the minimal reproduction, we'll try this out soon to confirm if we get the same results on our end.
Hey, I've been looking into this per the repo you linked, @pzgorzaly. Here are some clinic snapshots from my testing:

Prisma w/ JSON
Node v20.12.2
Node v18.20.2
I didn't see anything massively different on Node 18 in comparison, other than higher event loop delay (1500 as opposed to 1200).

Prisma w/o JSON
Node v20.12.2
I can note a pretty significant memory usage decrease here, both compared to […]
Node v18.20.2

I used the repro you provided to write a test in our memory test suite, which I think further signals that the issue lies in engines as well 👍 Next step is to investigate engines directly :)
Bug description
When using Prisma for some queries, we see RSS memory usage vastly exceeding the size of our heap usage, and this memory is not freed.
The two spikes you can see at the start of the cliff in the graph represent the read and write steps of our operation (a worst-case example). As you can see, the RSS memory is retained until pod restart, whereas heap usage frees correctly.
We don't have a minimal test case yet, but it seems to relate to JSONB columns.
Our tables have some JSONB fields that can be quite large (up to 0.5 MB), and we may load hundreds or thousands of records, process, and update them.
We have tested various versions of Node and Prisma (back to 4.8), as well as splitting create and update rather than using upsert.
Adding a select to the read operation (since we don't need to read the larger fields most of the time) seemed to eliminate the first spike, leading me to suspect something to do with JSON marshalling or the size of the data being fetched.
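For reference, the kind of select that avoids pulling the large JSONB columns looks something like the sketch below (the event model and field names are hypothetical, carried over from the discussion above):

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

async function loadLightweight(timelineId: number) {
  // Only fetch the small scalar columns; the large JSONB payload is not selected.
  return prisma.event.findMany({
    where: { timelineId },
    select: {
      id: true,
      createdAt: true,
      updatedAt: true,
      // payload (Json @db.JsonB) intentionally omitted
    },
  });
}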
How to reproduce
Suspected: load, process, and save records containing large JSONB fields, as described above. You should see heap usage reduce afterwards, suggesting no memory leak in the application, while RSS vastly exceeds it and is retained.
Expected behavior
RSS memory should also return to baseline after the operation.
Prisma information
// Add your schema.prisma
// Add your code using Prisma Client
Environment & setup
Prisma Version
locally