org.h2.mvstore.MVStoreException: Chunk not found [2.2.224/9] #4012
Comments
Your database is corrupt. You will need to recover it from backups. |
Well, it happens all the time, even if I drop the DB and start fresh. |
If you can build a standalone test case (i.e. just Java, no dependencies), we can probably have a bash at fixing it. |
You might also want to try doing a build from current HEAD and seeing if that fixes your issue; a few things have been fixed since the last release. |
Mmm I can try. |
I spoke too soon.
The ResultSet is consumed as a Stream with:

import org.h2.util.IOUtils.closeSilently
import org.slf4j.LoggerFactory
import java.sql.ResultSet
import java.sql.SQLException
import java.util.Spliterator.DISTINCT
import java.util.Spliterator.IMMUTABLE
import java.util.Spliterator.NONNULL
import java.util.Spliterator.ORDERED
import java.util.Spliterator.SORTED
import java.util.Spliterators.AbstractSpliterator
import java.util.function.Consumer
import java.util.stream.Stream
import java.util.stream.StreamSupport.stream
import kotlin.Long.Companion.MAX_VALUE

/**
 * @author Cosimo Damiano Prete
 * @since 09/02/2024
 **/
class ResultSetStream<E>(
    private val resultSet: ResultSet,
    private val isClosed: () -> Boolean,
    private val rowMapper: (ResultSet) -> E
) : Stream<E> by stream(object : AbstractSpliterator<E>(MAX_VALUE, ORDERED or DISTINCT or SORTED or NONNULL or IMMUTABLE) {
    override fun tryAdvance(action: Consumer<in E>) = try {
        // Advance only while the connection is still open and there is another row.
        val hasNext = !isClosed() && resultSet.next()
        if (hasNext) action.accept(rowMapper(resultSet))
        hasNext
    } catch (ex: SQLException) {
        logger.error(ex.toString(), ex)
        false
    }

    // SORTED is claimed with a null comparator, i.e. natural ordering of E.
    override fun getComparator() = null
}, false) {
    init {
        resultSet.fetchSize = 1
        // Close the ResultSet, Statement and Connection when the Stream is closed.
        onClose {
            val statement = resultSet.statement
            val connection = statement.connection
            closeSilently(resultSet)
            closeSilently(statement)
            closeSilently(connection)
        }
    }

    private companion object {
        @JvmStatic
        private val logger = LoggerFactory.getLogger(ResultSetStream::class.java)
    }
} |
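For readers following along, here is a minimal usage sketch of the class above; the function name, query, table and column names are hypothetical and not from the original report. The stream is built from a statement's ResultSet, and the onClose callback above releases the ResultSet, Statement and Connection when the stream is closed.

// Hypothetical usage sketch; "some_table" and "name" are illustrative only.
fun streamNames(connection: java.sql.Connection): java.util.stream.Stream<String> {
    val statement = connection.prepareStatement("SELECT name FROM some_table ORDER BY name")
    val resultSet = statement.executeQuery()
    return ResultSetStream(
        resultSet = resultSet,
        isClosed = { connection.isClosed },
        rowMapper = { rs -> rs.getString("name") }
    )
}

The caller is expected to close the returned stream (e.g. with Kotlin's use), which triggers the onClose cleanup.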
We're seeing the same issue; it seems to be exacerbated by running on slow storage media such as spinning disks, or VMs with virtual storage like Amazon Workspaces, when working with large data sets (10M+ rows across several tables, 5+ GB database size). In our case we use H2 as a temporary database that is recreated with a new temp file name every time the application is started, so it shouldn't be a database corruption issue. One of the instances we've seen recur is the stack trace you have above from running a basic query, though that usually triggers after loading those 10M+ rows, updating them all a couple of times, and then querying immediately afterwards. The updates are done in parallel streams, and the query that triggers the error happens immediately after all updates are complete (see the rough sketch after this comment). Another variation of this error occurs for us when doing a large
Our code base is quite large and I've tried to create a simplified example to repro the issue but so far haven't had much success. Reproducing it in our code base is still intermittent and takes hours of processing to trigger. The most success I've had is plugging in an old spinning disk and moving the DB to that drive for testing. |
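To make the described workload easier to follow, here is a rough sketch of the pattern: bulk-load, parallel updates, then an immediate query. The table name, column names and the DataSource are hypothetical and not taken from the actual application.

// Rough sketch of the workload described above; "big_table" and "counter" are placeholders.
fun updateThenQuery(dataSource: javax.sql.DataSource, rowCount: Long) {
    // Update every row (previously bulk-loaded) using a parallel stream.
    java.util.stream.LongStream.rangeClosed(1, rowCount).parallel().forEach { id ->
        dataSource.connection.use { conn ->
            conn.prepareStatement("UPDATE big_table SET counter = counter + 1 WHERE id = ?").use { ps ->
                ps.setLong(1, id)
                ps.executeUpdate()
            }
        }
    }
    // Query immediately after all updates complete; this is the point where
    // the "Chunk not found" exception was reported to surface.
    dataSource.connection.use { conn ->
        conn.createStatement().use { st ->
            st.executeQuery("SELECT COUNT(*) FROM big_table").use { rs -> rs.next() }
        }
    }
}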
Too bad that on my side I have the DB file on an SSD :-/ |
Yeah, it doesn't happen only on slower drives; I just think slower drives reproduce the issue more easily. |
@ejwilburn but, indeed, I think we agree that for both of us it happens with large data sets. |
So I've continued looking into this issue and managed to create a test that repros reliably in our main code base. I haven't finished creating a simplified version I can share with the H2 team yet, but in essence here's what the code does:
The query itself is somewhat more complex than that, including a window function, but that's the gist. I tried a variety of H2 options to see if anything would help, turning off lazy querying, changing page sizes, trying Currently up to 9M total records without issue and trying 15M next, though running these tests on a spinning disk to make reproduction easier takes a very long time. The last run of 15M took ~10 hours before it failed. However, the runs with 7M total records in the output table were failing reliably in about 90 minutes. My guess is that increasing the write delay just raises the point at which this error starts to occur reliably rather than fixing the underlying cause, and obviously increasing the write delay that much has reliability concerns in situations where you care about recovery after a power/system failure. |
@ejwilburn Thank you for posting a status update. Why would you try to turn every knob possible, when it is hard to reproduce the failure even for a given set of configuration parameters? I would focus on that first case of yours, where you were able to reproduce it. |
@andreitokar Because I was able to get a case I could reproduce consistently and I was looking for options to resolve this issue in our application as soon as possible. Upping the |
This is definitely related to So to summarize: |
Hello. If I am not mistaken, Note that there were some changes in the latest versions (2.2.222 / 2.2.224) to trigger the chunk rewrites more often (cf. #3848). I am wondering if you can reproduce the error using H2 version 2.1.214, for instance. In any case, I believe that to find the root cause and a fix, a simplified test case that reproduces the error will be needed. And I am looking forward to having @andreitokar's opinion on this. Kr, |
In theory I should be able to repro with just the standalone server and a basic SQL script using |
Finally got a standalone app that reproduces the issue using JDBC: https://github.com/ejwilburn/h2-chunk-not-found Details are in the readme. I did try creating a SQL script to do this in just a standalone H2 server, but it didn't repro the issue, possibly due to the difference between inserting the individual records via JDBC vs. generating them using SELECT FROM SYSTEM_RANGE(), since that would be a single transaction instead of many (see the sketch below). Also the script was waaay slower than inserting the records one at a time via JDBC for some reason. 🤷 Going to try downgrading to 2.1.214 and see if that has any effect. |
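For context, the two load strategies being contrasted look roughly like this. The table and column names are placeholders, and this is only a sketch of the idea, not code from the linked repro repository.

// Illustrative contrast between the two load strategies mentioned above.
fun loadViaSystemRange(conn: java.sql.Connection, rows: Long) {
    // One statement, one transaction: H2's SYSTEM_RANGE generates the rows server-side.
    conn.createStatement().use { st ->
        st.execute("INSERT INTO big_table (id, num) SELECT x, x FROM SYSTEM_RANGE(1, $rows)")
    }
}

fun loadViaIndividualInserts(conn: java.sql.Connection, rows: Long) {
    // Many small statements, each its own transaction in auto-commit mode,
    // as when inserting the records one at a time over JDBC.
    conn.prepareStatement("INSERT INTO big_table (id, num) VALUES (?, ?)").use { ps ->
        for (i in 1..rows) {
            ps.setLong(1, i)
            ps.setLong(2, i)
            ps.executeUpdate()
        }
    }
}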
The first few runs of 2.1.214 aren't reproducing the issue, trying larger datasets to see if that has any impact. |
2.1.214 did reproduce when I upped the amount of data in the test by five times, so it looks like the changes in 2.2.222/2.2.224 exacerbated the issue but aren't the root cause. |
FYI, did more testing and raising the write delay to just 1000 ms, or double the default, was enough to prevent the issue. I tested in a DB with about 65M total records. Watching the Windows performance monitor while heavy writes were going on, the response time from the drive was frequently in the high 400 ms range and was likely peaking over 500 ms at times, which again makes me think this is an issue when the time to write to the drive is longer than the write delay. |
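For anyone wanting to try the same workaround: H2's write delay (500 ms by default) can be changed at runtime with the SET WRITE_DELAY command. A minimal sketch, with the function name and value purely illustrative:

// Minimal sketch: raise H2's write delay from the 500 ms default to 1000 ms.
fun raiseWriteDelay(conn: java.sql.Connection) {
    conn.createStatement().use { st -> st.execute("SET WRITE_DELAY 1000") }
}

As noted above, a longer write delay widens the window of unflushed changes lost on a power or system failure, so it trades durability for stability here.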
@ejwilburn Thank you very much for creating that test case. |
On master now. |
Initial test looks good; running some larger tests and will report back. FYI, we ran into a similar issue with large queries when lazy query execution is enabled, and I think it's somewhat related: the versions being used by that lazy query aren't being marked as in use and are being GC'd before the next chunk of the query is returned, resulting in the chunk-not-found error. We worked around that by just disabling lazy query execution, which isn't ideal for us. I know that used to be marked as an experimental feature in the docs but doesn't appear to be anymore. Would you like me to open another issue for that? Not sure when I'd have time to create a simple repro for it. |
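For reference, the workaround mentioned here amounts to turning the feature off via H2's LAZY_QUERY_EXECUTION setting. A minimal sketch, with the function name illustrative; whether this fully avoids the problem in other setups is not confirmed here:

// Minimal sketch: disable H2's lazy query execution (1 = enabled, 0 = disabled, the default).
fun disableLazyQueryExecution(conn: java.sql.Connection) {
    conn.createStatement().use { st -> st.execute("SET LAZY_QUERY_EXECUTION 0") }
}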
Longer test finished with no problems, looks like #4047 fixed it. |
That's odd, because in my case I'm always sending a single prepared statement with a single instruction in it every time. |
Hi @andreitokar. Would it be possible to get it released so that I can also test it from my side? |
Hi.
While executing a query and "streaming" the content of the ResultSet, I get a "Chunk XYZ not found" exception. Can you please provide some help on why this is happening and how to solve it?
Kind regards and thanks.