Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sqlite3.Open thread safe #790

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

pzbitskiy
Copy link

@pzbitskiy pzbitskiy commented Feb 28, 2020

  • sqlite3_open_v2 implicitly calls sqlite3_initialize
  • sqlite3_initialize is not thread safe in the prologue
  • concurrent calls of sqlite3.Open even on different files
    may trigger mutex initialization race and crash
  • Fix: use go channel with a single message in it as a barrier
    letting only single-threaded sqlite3_open_v2 for the first time
  • Reproducibility: very low, couple hours of concurrent sqlite3.Open calls
    trigger a single crash

Sample crash dump:

Thread 1 (Thread 0x7fffee7fc700 (LWP 12250)):
#0  0x0000000000000000 in ?? ()
#1  0x0000000000ba9a55 in sqlite3_initialize () at sqlite3-binding.c:151936
#2  0x0000000000bebd49 in openDatabase (zFilename=0x7fffd8000b20 "file:gen/rootkey?_busy_timeout=1000&_synchronous=full&_journal_mode=wal", 
    ppDb=0xc0000b0488, flags=<optimized out>, zVfs=0x0) at sqlite3-binding.c:154720
#3  0x0000000000bec6f5 in sqlite3_open_v2 (filename=<optimized out>, ppDb=<optimized out>, flags=<optimized out>, zVfs=<optimized out>)
    at sqlite3-binding.c:155075
#4  0x0000000000b8cb35 in _sqlite3_open_v2 (zVfs=<optimized out>, flags=<optimized out>, ppDb=<optimized out>, filename=<optimized out>)
    at /home/ubuntu/go/pkg/mod/github.com/mattn/go-sqlite3@v1.10.0/sqlite3.go:53
#5  _cgo_210ed2050b13_Cfunc__sqlite3_open_v2 (v=0xc0000d2bc0) at cgo-gcc-prolog:151
#6  0x0000000000460298 in runtime.asmcgocall () at /usr/local/go/src/runtime/asm_amd64.s:635

Relevant code for sqlite3-bindings.c

24769	#ifdef SQLITE_ENABLE_MULTITHREADED_CHECKS
24770	      pFrom = multiThreadedCheckMutex();
24771	#else
24772	      pFrom = sqlite3DefaultMutex();
24773	#endif
24774	    }else{
24775	      pFrom = sqlite3NoopMutex();
24776	    }
24777	    pTo->xMutexInit = pFrom->xMutexInit;
   0x0000000000b8d61e <+222>:	mov    %rax,0xfc40cb(%rip)        # 0x1b516f0 <sqlite3Config+112>

24787	  }
24788	  assert( sqlite3GlobalConfig.mutex.xMutexInit );
24789	  rc = sqlite3GlobalConfig.mutex.xMutexInit();
   0x0000000000b8d625 <+229>:	jmpq   *%rax

* sqlite3_open_v2 implicitly calls sqlite3_initialize
* sqlite3_initialize is not thread safe in the prologue
* concurrent calls of sqlite3.Open even on different files
  may trigger mutex initialization race and crash
* Fix: use go channel with a single message in it as a barrier
  letting only single-threaded sqlite3_open_v2 for the first time
* Reproducibility: very low, couple hours of concurrent sqlite3.Open calls
  trigger a single crash
@rittneje
Copy link
Collaborator

This repo is a Go wrapper around the SQLite C library. If you believe you have found a bug in the C library, you should report it on their mailing list and let one of the SQLite maintainers make the proper fix. https://www.sqlite.org/support.html

@pzbitskiy
Copy link
Author

pzbitskiy commented Feb 28, 2020 via email

@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 51.678% when pulling 28b2391 on pzbitskiy:initialize-fix into 67986a7 on mattn:master.

@rittneje
Copy link
Collaborator

Comments above ‘sqlite3_initialize’ clearly state that it is a caller’s responsibility to init SQLite in a thread safe manner.

The docs say otherwise. https://www.sqlite.org/c3ref/initialize.html

The sqlite3_initialize() interface is threadsafe

@pzbitskiy
Copy link
Author

It might be thread safe in master - some memory barriers are added to sync up caches and memory.

In the same time the comment above the function:

** The first thread to call this routine runs the initialization to
** completion.  If subsequent threads call this routine before the first
** thread has finished the initialization process, then the subsequent
** threads must block until the first thread finishes with the initialization.

Feel free to close, your call. The only thing go-sqlite3 1.10 (with sqlite 3.25.2) crashes on concurrent Open calls.

@rittneje
Copy link
Collaborator

I believe that comment is describing what the implementation of sqlite3_initialize does. If they were intended to be preconditions on the caller, I imagine they would have been in the public documentation. Like I said earlier, if you believe you have found a bug in SQLite, you should post your issue on their mailing list. I don't believe that this repo is the right place for a fix to the kind of problem you are experiencing. But I'm also not the owner of this repo, just giving my two cents.

@rittneje
Copy link
Collaborator

@tsachiherman Care to elaborate on the downvote? I still contend this change should not be merged, given that the claim is that there is a bug in the SQLite library itself.

@rittneje
Copy link
Collaborator

rittneje commented Jul 11, 2020

@pzbitskiy Why do you believe those lines you posted from sqlite3-bindings.c to be relevant? They aren't the line numbers from your crash dump. Did you come to those lines from some deeper investigating you did? Also, can you post how you are calling sql.Open?

rv := C._sqlite3_open_v2(name, &db,
mutex|C.SQLITE_OPEN_READWRITE|C.SQLITE_OPEN_CREATE,
nil)
if ok {
close(initCh)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means you will close the channel even if sqlite3_open_v2 were to fail, which means your init barrier will not work as expected on the next invocation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, good catch

@pzbitskiy
Copy link
Author

pzbitskiy commented Jul 11, 2020

Unfortunately I can't remember all the details after 5 months. But here is what I recalled.

They aren't the line numbers from your crash dump.

They are:

  1. sqllite3_initialize:
 151936   rc = sqlite3MutexInit();
 151937   if( rc ) return rc;
  1. sqlite3MutexInit:
  24757 SQLITE_PRIVATE int sqlite3MutexInit(void){ 
  24758   int rc = SQLITE_OK;
...
  24768     if( sqlite3GlobalConfig.bCoreMutex ){
  1. Zero address on top of the stack tells us about a jump to NULL address.
  2. Previous item in the call stack points to sqlite3MutexInit.
  3. There is only one call by a pointer there:
24789	  rc = sqlite3GlobalConfig.mutex.xMutexInit();
   0x0000000000b8d625 <+229>:	jmpq   *%rax
  1. And yes, RAX is zero and assert is stripped out in release build.

Now follow the path:

  1. Two threads enter sqllite3_initialize and hit sqlite3MutexInit
  2. Next, inside sqlite3MutexInit they execute these assignments concurrently.
  3. I understand how one thread could see NULL in xMutexInit in case of no memory barriers but there is one:
  24784     pTo->xMutexNotheld = pFrom->xMutexNotheld;
  24785     sqlite3MemoryBarrier();
  24786     pTo->xMutexAlloc = pFrom->xMutexAlloc;

I think I had explanation about it but can't figure it out.

@tsachiherman
Copy link

tsachiherman commented Jul 11, 2020

@tsachiherman Care to elaborate on the downvote? I still contend this change should not be merged, given that the claim is that there is a bug in the SQLite library itself.

@rittneje, try to look on this issue from an application perspective - the applicative stack would look like this:
[Application] -> [Go SQL library] -> [mattn sqlite3 driver] -> [sqlite3 library]

So, from workaround perspective:

  1. At this point in time, if an application developer want to release a product that doesn't have this issue, he/she would need to patch it in the application itself.
  2. Google aren't going to do anything about this issue in their SQL library.
  3. The mattn sqlite3 driver isn't the source of the issue, according to your claim.
  4. sqlite3 library (according to you) need to be fixed.

I'm not claiming that it shouldn't be fixed in the sqlite3 library itself. I think it should. However, from practical perspective, even if the sqlite3 library would get updated, it will take a long time before the fix reaches everyone.

That's why I think that in the case of such a trivial fix, it should be patched in the driver right away. With that, the bug would no longer be applicable, and the driver would be safe for usage. This workaround can be removed from the driver once the sqlite3 library would confirm to its own documentation.

Another note - even if you were to report that to the sqlite3 library team, they might fix it by updating their documentation.. which would leave you at the same place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants