A postgresql function for selecting a subset of a fields in a json (jsonb
) document.
(Is "projection" a better term than "selection"?)
Tested with postgres 9, 10, 11, 12, 13, 14.
Warning: Never used or tested in production!
The function is recursive - it calls itself to handle nested json object structures. It can also handle json documents with lists.
Usage example:
select jsonb_select_keys(
-- doc:
'{
"id": 457,
"a": "hello",
"b": {
"c": "there",
"d": "how"
},
"e": [
{
"f": "are",
"g": "you"
},
{
"f": "this",
"g": "evening"
}
]
}',
-- selection:
'{
"id": 1,
"b": {
"c": 1
},
"e": {
"g": 1
}
}'
);
This would result in:
{
"b": {
"c": "there"
},
"e": [
{
"g": "you"
},
{
"g": "evening"
}
],
"id": 457
}
As shown here, the function handles lists and nested sub-structures.
-
By "doc" we mean the jsonb document that we want to process. You might have many such documents stored in some DB-column.
-
By "selection" we mean the json snippet that specifies the fields in the doc that we are interested in.
The work horse in the function is the inner join: The properties of the selection json object ("extracted" using jsonb_each
) are joined with the doc json object (again, "extracted" using jsonb_each
).
The join condition is field name ("key") of each property.
So the result will contain only the properties in the doc that are also present in the selection.
To illustrate this, a simplified variant of the function could be something like:
create or replace function simple(doc jsonb, selection jsonb)
returns jsonb
language sql
immutable
parallel safe
as
$$
with selected_subset
as
(select doc_property.key, doc_property.value
from jsonb_each(doc) doc_property(key, value)
inner join jsonb_each(selection) selection_property(key, value)
on doc_property.key = selection_property.key)
select jsonb_object_agg(ss.key, ss.value)
from selected_subset ss;
$$;
which we could use like this:
select simple(
-- doc:
'{
"a": "hello",
"b": "there",
"c": "how are you",
"d": "today"
}'::jsonb,
-- selection:
'{
"a": 1,
"c": 1,
"e": 1
}'::jsonb);
with the result:
{
"a": "hello",
"c": "how are you"
}
However, this simplified version does not handle lists and nested objects et.c., which jsonb_select_key does by adding some cruft.
-
The selection must be a json object (it can’t be a scalar or a list)
-
The selection object can only contain values that are objects or the scalar number "1". (The same rule applies recursively to the sub-objects in the selection)
-
There is no way to say "just select the whole document".
-
We could add that, by allowing the caller to just pass the number "1" as a selection, and let that mean "select the whole document"
-
-
This whole concept is similar to (but simpler variant of) what MongoDB calls a "projection"
-
Use case: save network traffic (and application CPU?) by only pulling subsets of json that are actually needed by the application. Might be useful when having legacy DB with big legacy json-documents, and there is a slow query that really only needs a small sub-set, a deep sub-selection, of the stored json-documents.
-
The cost is significant CPU usage (queries get slower) on the database server
-
It is somewhat quirky that the resulting document can contain lots of empty objects or lists (MongoDB behaves similarly)
-
We could fix that and make sure to filter out empty objects and lists (but I am not sure if this could be done without a performance penalty)
-
-
In MongoDB "sub-projection" (a sub-object of the selection) must not be empty, i.e no '{}'. If such an object is found, an error is emitted. We do it differently (to make the function simpler and maybe faster) - an empty object in the selection means "no fields are of interest". We thus don’t have to do such a check or maintain logic for raising an error.
-
In MongoDB, an object in the selection does NOT match a scalar in the document. We do it differently (but only because it is easier)
-
The root level of the document in mongo is always an object (I think). But
jsonb_select_keys
supports root level lists too. -
Somewhat related:
Start a postgres server using docker-compose and docker-compose.yml.
The bash script test/run_tests.bash runs the tests (and verifies the results), and auto-generates the asciidoc example files (test/examples.adoc, test/examples_many_small.adoc) from the test results.
So, a hint on how to work would be:
cd test
docker-compose up -d
bash run_tests.bash