Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for reading SQLite databases embedded in Go executables #968

Closed
GeekStocks opened this issue Aug 12, 2021 · 8 comments
Closed

Proposal for reading SQLite databases embedded in Go executables #968

GeekStocks opened this issue Aug 12, 2021 · 8 comments

Comments

@GeekStocks
Copy link

Proposal for reading SQLite databases embedded in Go executables

Purpose

Numerous forums in multiple programming languages are littered with questions similar to this:

"How can I read my preexisting SQLite database in memory mode? Why can't I just pass a memory pointer to my driver's open command?"

The disappointing answer to these questions is simply that the authors of SQLite have not exposed a method to do so. Driver authors / maintainers like @mattn for this package cannot interface to something that doesn't exist. Given this, the "fallback" answer usually wins the day: open SQLite using ":memory:" mode and then Exec a SQL script containing a backup "dump" of the database. Once the overhead of this loading process has occurred, the user has the environment they want.

But is there a better way that could avoid this loading overhead? The purpose of this proposal is to explore and solicit comments for a proposed custom SQLite VFS module that would work in tandem with mattn/go-sqlite3 and Go's new embed package. You are encouraged to comment if you can add to the knowledge presented here and/or see another way to accomplish the goal.

Trying a different approach

Most observed approaches to this problem start with the desired :memory: mode and quickly run into trouble with no pointer parameter available. This proposal avoids SQLite's memory mode altogether and seeks to use the existing SQLite URI mode to open an embedded database. By virtue of being embedded the database is already in memory and accessable via a variable, so opening it is all that remains.

As of Go version 1.16 released in February, 2021 embedding a SQLite database into a Go executable is now trivial:

package main

import "embed"

//go:embed specimen.db
var embeddedDatabase embed.FS

Go's embed package offers string, []byte, and the new embed.FS data types as embedded possibilities. Users of the "fallback" solution can now at least embed their SQL script backup "dumps" as a string. Since this author knows no method to load SQLite databases from []byte, the focus of this proposal is on the new embed.FS option shown above.

It is important to note that embedding digital assets into Go executables is a read-only process:

An FS is a read-only collection of files, usually initialized with a //go:embed directive

This limits the usefulness of this proposal; if a project must have write ability on the SQLite database then the "fallback" method remains the only choice. But as the use cases for read-only databases are legion, we soldier on.

Under the hood SQLite uses a standard Virtual File System (VFS) to interpret the open string and locate files. The VFS chosen depends on the OS of course:

Unix builds come with multiple built-in VFSes. The default VFS for unix is called "unix" and is used in most applications. Windows builds also come with multiple built-in VFSes. The default Windows VFS is called "win32" and is used in most applications.

We are not limited to these bundled VFSes however; we can subclass SQLite's objects to create our own. If appropriate, the majority of the work can even be delegated to the standard VFS to create a shim:

A new VFS is implemented by subclassing three objects:

  • sqlite3_vfs
  • sqlite3_io_methods
  • sqlite3_file

For Your Consideration

The proposed goal is to enable opening an embedded SQLite database from Go's new embed.FS type using a URI and a custom VFS. According to RFC 3986, a URI consists of "...a scheme, an authority, a path, a query string, and a fragment."

Consider the following code which extends the previous example:

db, err := sql.Open("sqlite3", "file://embeddedDatabase/specimen.db")

In this line a standard VFS is being used since no VFS pragma was specified. Accordingly the "embeddedDatabase" authority string above is interpreted as a folder name, and not as our embed.FS variable of the same name. The proposed solution would require the authority string to be interpreted as a Go embed.FS variable name, and the path string to be the path to follow within the embed.FS instance. Thus, the proposed Open statement would conceivably look like:

db, err := sql.Open("sqlite3", "file://embeddedDatabase/specimen.db?vfs=go-embed-unix&mode=ro")
// or
db, err := sql.Open("sqlite3", "file://embeddedDatabase/specimen.db?vfs=go-embed-win32&mode=ro")

If you are knowledgeable and so inclined, please answer or comment on the following questions as you see fit:

  • Does this approach appear sound? Why or why not?
  • Does a method already exist to read embedded SQLite that has been overlooked here?
  • Do you have any custom VFS experience you can share? Are there pitfalls to be aware of?
  • Do you foresee this proposal as a stand-alone module or something to include in mattn/go-sqlite3?

Sources

Go

SQLite

@rittneje
Copy link
Collaborator

Perhaps I'm missing something, but can't you just open the original database file normally in read-only mode? What is driving the desire for using an in-memory database specifically?

Also, just FYI, rather than using Exec, you can use the backup API to copy a physical database to an in-memory database, or vice versa. https://www.sqlite.org/backup.html

@GeekStocks
Copy link
Author

Just to document, this custom VFS proposal is now possible thanks to @mattn and @guenhter on 876 / 877

@GeekStocks
Copy link
Author

I will assume by "normally" that @rittneje meant not using ":memory:" but just opening the database "from a file" using the standard VFS provided with the build. The problem is that the standard VFS looks to disk sources and not memory sources for the []byte. We'll have to be careful with terminology here but I consider opening and working with an embedded database to be "in memory" just like opening using ":memory:" with the understanding that there are many differences of course. The executable is in memory thus anything embedded in it is in memory, we just need to be able to open it.

FS implements the io/fs package's FS interface, so it can be used with any package that understands file systems, including net/http, text/template, and html/template.

As flexible as embed.FS is (above quote) I still think we have to "teach" SQLite how to work with it by registering a new VFS and then using that in our connection strings ( thanks to #877 ). I'll be happy to be wrong if something is being overlooked in the proposal.

@rittneje
Copy link
Collaborator

My question is why you are trying to open a database from []byte in the first place. What is the specific use case?

@GeekStocks
Copy link
Author

@rittneje - As the proposal states, the goal is to avoid the overhead of opening in ":memory:" mode and then consuming CPU to populate that new, empty database when a copy of an already populated database has been lifted into memory via embedding.

Your question seems to explore why the developer made the design choice to embed, and for that I'd say it is beyond the purview of the proposal; we should trust that the correct decision was made. Some common use cases would be embedded website servers with hybrid static/dynamic content and backends for report API's. I think cloud FaaS would specifically benefit from not having the CPU cycles spent on redundant database building on every call. Hope that helps.

@GeekStocks
Copy link
Author

Closing issue. Implementation of this proposal is moving to modernc.org/sqlite

@F21
Copy link

F21 commented May 18, 2022

@GeekStocks did you get anywhere with this either with go-sqlite3 or modernc.org/sqlite? Quick grep on both repos didn't surface much for reading from a sqlite db embedded using go:embed.

@mattn
Copy link
Owner

mattn commented May 18, 2022

As long as the SQLite implementation references the original C language (in go-sqlite or modernc.org/sqlite both), I don't think it will achieve your goal. Because original SQLite implementation does not handle Go's I/O layer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants