Datasette metadata gives users an easy way to store extra descriptions, URLs, and styling for their Datasette instances, databases, and tables. Traditionally, users can bring in their metadata in one of two ways:
With the -m metadata.json CLI option, where metadata.json is a nested JSON file of all metadata (YAML also supported)
Using the get_metadata() hook
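For reference, a minimal metadata.json might look like the following. This is an illustrative sketch of the documented nested format (title/description at the top level, then databases and tables), not an exhaustive schema; the database and table names are made up:

{
    "title": "My Datasette instance",
    "description": "Extra context shown on the index page",
    "databases": {
        "covid": {
            "description": "COVID-19 case data",
            "tables": {
                "cases": {
                    "description": "One row per reported case"
                }
            }
        }
    }
}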
Internally, Datasette stores metadata in internal Python dictionaries, which are accessed through the (publicly undocumented) .metadata() method. The logic is quite complex: it handles "recursive" updates that combine metadata.json metadata with plugin-hook metadata, fallback logic, and confusing database/table/key arguments.
Proposal: New datasette_metadata_* tables inside internal.db
We added a new --internal internal.db option to Datasette in a recent Datasette 1.0a release. This is a persistent instance-wide database that plugins can use to store data. I propose that Datasette core uses this database to store metadata, as a "single-source" of truth for metadata resolution.
Datasette core will use these new datasette_metadata_* tables to source metadata for instances/database/tables/columns. Plugins can write directly to these tables to store metadata, removing the need for the get_metadata() hook.
The metadata.json pattern can still be supported by just writing the contents of metadata.json to the datasette_metadata_* tables on startup.
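As a rough sketch of what that startup import could look like, here is a hypothetical helper that copies a nested metadata.json into the tables described in the next section. It assumes the internal database is reachable via datasette.get_internal_database() (as in the plugin example further down), stores values as JSON strings, and ignores columns and canned queries for brevity:

import json

async def import_metadata_json(datasette, path="metadata.json"):
    # Hypothetical helper: copy a nested metadata.json into the datasette_metadata_* tables
    with open(path) as fp:
        metadata = json.load(fp)
    internal_db = datasette.get_internal_database()

    # Top-level keys (title, description, license, ...) become instance entries
    for key, value in metadata.items():
        if key == "databases":
            continue
        await internal_db.execute_write(
            "INSERT OR REPLACE INTO datasette_metadata_instance_entries (key, value) VALUES (?, ?)",
            [key, json.dumps(value)],
        )

    # Database-level and table-level keys go into their own tables
    for database_name, db_meta in metadata.get("databases", {}).items():
        for key, value in db_meta.items():
            if key == "tables":
                continue
            await internal_db.execute_write(
                "INSERT OR REPLACE INTO datasette_metadata_database_entries (database_name, key, value) VALUES (?, ?, ?)",
                [database_name, key, json.dumps(value)],
            )
        for table_name, table_meta in db_meta.get("tables", {}).items():
            for key, value in table_meta.items():
                await internal_db.execute_write(
                    "INSERT OR REPLACE INTO datasette_metadata_resource_entries (database_name, resource_name, key, value) VALUES (?, ?, ?, ?)",
                    [database_name, table_name, key, json.dumps(value)],
                )

The INSERT OR REPLACE statements lean on the unique(...) constraints defined below, so re-running the import simply overwrites existing keys.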
Proposed SQL + Python API
The "internal tables" that Datasette uses for metadata can be described as follows:
-- Metadata key/values for the entire Datasette instance
CREATE TABLE datasette_metadata_instance_entries (
  key text,
  value text,
  unique(key)
);

-- Metadata key/values for specific databases
CREATE TABLE datasette_metadata_database_entries (
  database_name text,
  key text,
  value text,
  unique(database_name, key)
);

-- Metadata key/values for specific "resources" (tables, views, canned_queries)
CREATE TABLE datasette_metadata_resource_entries (
  database_name text,
  resource_name text,
  key text,
  value text,
  unique(database_name, resource_name, key)
);

-- Metadata key/values for specific columns
CREATE TABLE datasette_metadata_column_entries (
  database_name text,
  resource_name text,
  column_name text,
  key text,
  value text,
  unique(database_name, resource_name, column_name, key)
);
In Python, Datasette core will add the following methods on the Datasette class:
These will be used internally by Datasette to wrap the SQL queries to the datasette_metadata_* tables. Though maybe plugins can use them as well?
We could also add set_* methods, mainly for plugin authors, so they could avoid writing SQL.
class Datasette:
    # ...
    async def set_instance_metadata(self, key: str, value: str):
        pass

    async def set_database_metadata(self, database_name: str, key: str, value: str):
        pass

    # etc.
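For illustration, one of these set_* methods might just wrap an upsert against the corresponding table, roughly like this (a sketch under the assumption that the internal database is reached via get_internal_database(), not a final implementation):

# Inside the Datasette class (sketch)
async def set_database_metadata(self, database_name: str, key: str, value: str):
    # The upsert relies on the unique(database_name, key) constraint defined above
    await self.get_internal_database().execute_write(
        """
        INSERT INTO datasette_metadata_database_entries (database_name, key, value)
        VALUES (?, ?, ?)
        ON CONFLICT (database_name, key) DO UPDATE SET value = excluded.value
        """,
        [database_name, key, value],
    )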
Consequences
The get_metadata() hook will be deprecated. Instead, plugins can write directly to the datasette_metadata_* tables on startup, and update them as they wish (on user request, on a scheduled basis, etc.)
"Cascading metadata", aka the fallback option will be deprecated. It only really makes sense in narrow use-cases (ie licensing an entire database), and plugins could define their own cascading logic if needed.
Metadata fetching becomes an async operation (see the sketch below).
metadata.json can still be supported - it'll just overwrite the datasette_metadata_* entries on startup, meaning users only need to run it once and can then delete their metadata.json (provided they include a persistent --internal database). Though "overwriting" may have unintended consequences...
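To make the async-fetch consequence concrete, reading a table description would become an awaited query against the internal database, something along these lines (the function name here is illustrative, not settled API):

# Sketch: resolving a table description inside an async view or plugin
async def get_table_description(datasette, database_name, table_name):
    result = await datasette.get_internal_database().execute(
        """
        SELECT value FROM datasette_metadata_resource_entries
        WHERE database_name = ? AND resource_name = ? AND key = 'description'
        """,
        [database_name, table_name],
    )
    row = result.first()
    return row["value"] if row else None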
How 3rd party plugins currently use the get_metadata() hook
There aren't many open-source usages of the get_metadata() hook, at least from what I could find with GitHub search. The ones I found:
datasette-metadata-editable: Handles an in-memory cache (Python dictionary) that gets populated on startup(), and updated when edits are made
datasette-remote-metadata: Handles an in-memory cache (Python dictionary) that gets updated on a recurring basis
datasette-updated: Reads from an on-disk file at request time
datasette-scraper: Provides metadata for tables that the plugin creates/manages (kind of like shadow tables?)
datasette-live-config
I think all of these use-cases can easily be supported with this new approach: writing to the datasette_metadata_* tables on startup and updating them whenever needed. I'd also say it would simplify much of the code we see here, but only time will tell...
Example of what this looks like for plugin authors:
@hookimpl
async def startup(datasette):
    # Update a single key with the Python API
    await datasette.set_instance_metadata("title", "My cool Datasette project")

    # Bulk updates if you want more control
    await datasette.get_internal_database().execute_write(
        """
        UPDATE datasette_metadata_database_entries
        SET value = 'database description for the covid database'
        WHERE database_name = 'covid' AND key = 'description'
        """
    )
We talked about this in detail this morning; I'm on board with this plan.
I really like the symmetric set_x methods idea - makes it very clear how plugins should integrate with the metadata system, without needing any new plugin hooks.