Skip to content

Allow increasing SQLite limits #568

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 27, 2021
Merged

Allow increasing SQLite limits #568

merged 1 commit into from
Jun 27, 2021

Conversation

jehugaleahsa
Copy link
Contributor

This change is related to issues #523 and #550. @BastouP411 submitted a PR that made it possible to call sqlite3_limit using SQLiteConfig pragmas. This PR included most of the heavy lifting needed to support overriding limits at runtime; however, several of the limits imposed by SQLite are compile-time constants, meaning you can only decrease limits at runtime and not increase them. Specifically for my case, this preventing me from increasing the maximum ATTACH limit, which would allow me to ATTACH more than 10 SQLite files at a time.

The solution I am proposing in this PR is to update the MAKE file to compile SQLite with the documented maximum values supported. However, just defaulting these to the maximums could pose a security risk and the SQLite defaults should be set whenever the connection is created, by default. So you will see in the SQLiteConfig.java file that I define constants for the defaults SQLite recommends. Rather than pass -1 to the setLimit method from now on, I pass these defaults. One consequent of this is that I will call setLimit every time a connection is created, not only when the pragma is configured. I did some testing locally and this has almost not performance impact.

I also refactored the code calling setLimit slightly. The call to sqlite3_limit is a no-op if the integer value is negative, simply returning the current limit value. However, there's no reason to call this in setLimit, so I explicitly skip the call in that case. Additionally, I moved the check for whether a configuration setting was provided for the pragma into the parseLimitPragma method (formally called getInteger). This avoids unnecessarily parsing a string with a know integer value and saves us from repeating the pragrmaTable.containsKey check several times.

Just for safety, I added a NumberFormatException handler and use the default if the given value isn't a number -- I could also see letting the exception propagate so developers know they have an invalid configurating setting -- just let me know if you want this code to throw instead.

Overall, these two changes make it possible to pass the SQLite integer defaults every time a connection is created. The checks for whether the pragma was provided and the "negative integer" check should make most of these calls to sqlite3_limit no-ops and so should have the same behavior as the current commit with the difference that the defaults get set when no configuration setting is provided.

I also noticed that the "maximum page count" setting was not available, so I added this while I was at it. I think I updated everywhere this needed to be updated.

To test this, I ran mvn install locally and used the -SNAPSHOT version in a pom.xml file. I first tested it without configuring the SQLiteConfig object. I confirmed if I tried to exceed 10 ATTACH-ed files the expected exception was thrown. However, if I then updated the SQLiteConfig object to allow the maximum of 125 ATTACH-ed files, my unit tests passed, so the correct behavior is being exhibited.

@jehugaleahsa
Copy link
Contributor Author

@xerial I wasn't sure if this was on your radar. I didn't see anything in the README_BUILD about doing anything other than submit a PR, so let me know if there are additional steps you'd like me to take. I also wasn't sure if you tried to coordinate changes on a release schedule or just whenever you get free time.

I wanted to highlight three things to get your feedback on:

  1. Should I allow the NumberFormatException to be thrown if the pragma isn't a parseable number? I think allowing it to be thrown would be more backward compatible with what's currently in production, so I would need to change this in my PR. However, I wanted to get your feedback first.
  2. I noticed for some of these settings, using a large integer value causes some additional warning output to get generated on travis-ci. It looks like on some platforms it is warning such large numbers will force the code to use long long as the type. Maybe we don't want to use the absolute maximum value possible, but just some arbitrarily large value... or we just leave these settings alone for now until someone actually needs those values bumped.
  3. Most importantly, these changes have a security impact. Personally, the only value I really care about is the maximum attach limit. I would be fine reducing the impact of this PR to just touch that one setting, if that seems less risky. Just let me know!

Thanks!

@xerial
Copy link
Owner

xerial commented Jan 15, 2021

This issue is a bit complicated. Give me some time to comprehend the change.

@jehugaleahsa
Copy link
Contributor Author

Just wanted to poke you on this. Let me know if there's anything I can do to help.

@xerial
Copy link
Owner

xerial commented Jun 27, 2021

@jehugaleahsa Sorry for the long delay. The change looks good to me. I'll merge this.

@xerial xerial merged commit 741dfce into xerial:master Jun 27, 2021
@jehugaleahsa
Copy link
Contributor Author

Thank you very much. I have spent a lot of my time maintaining one of my old open source projects this weekend, too, so believe me, I totally get it. No worries! I already gave the latest JAR a spin and everything looks great.

By the way, one thing I might investigate in the upcoming months is whether we can avoid the overhead of converting Java strings into UTF-8 (specifically parameters and result sets) and then having SQLite convert them back into UTF-16, when PRAGRMA encoding = utf-16be is requested. Based on what I read, we can probably get a similar speed boost if they request UTF-16, UTF-16le, or UTF-16be because SQLite will convert those byte arrays in place (in C). The only caveat is someone might call PRAGMA encoding = utf-16 on an existing UTF-8 database, but I can always just call PRAGMA encoding to get the actual encoding. Based on my profiling, this is the only real performance killer; however, just by merging this PR, I've already nearly cut the runtime of some of my processes in half. It's amazing!

Thanks again! Deeply appreciated.

@xerial
Copy link
Owner

xerial commented Jun 29, 2021

Good to hear that. Introducing UTF8String object for avoiding Java's UTF-16 conversion might improve the performance, but at the same time, Java also has a lot of non-trivial optimizations for managing Java Strings.

I also found it's impossible to mimic Java's String class that points to a raw native memory read at SQLite, so anyway at least one byte array copy from SQLite native to Java's world will be involved.

@jehugaleahsa
Copy link
Contributor Author

@xerial I spent an embarrassing amount of time detecting the pragma encoding and switching to sqlite3_bind_text64 (turns out sqlite3_bind_text16 switches to native encoding if a BOM isn't provided and the 64 variant lets you be explicit).

As you guessed, there's not a dramatic boost in performance when converting a String to a byte[] when it's UTF-16 vs UTF-8. In fact, my guess is you are right and the conversion for ASCII characters is highly optimized and uses less memory, when means less getting passed to C and less copying.

The real performance killer, from what I can tell, though, is that queries seem to be slower. My data is mostly ASCII, so I'm guessing processing queries involving 16-bit chars is more intense than 8-bit characters. I can see this increase even without any of my changes.

So, I guess if anyone ever brings it up in the future, you have this for reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants