Skip to content

hugoheden/postgresql-jsonb-select-keys

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

jsonb_select_keys - a PostgreSQL function

A postgresql function for selecting a subset of a fields in a json (jsonb) document. (Is "projection" a better term than "selection"?)

Tested with postgres 9, 10, 11, 12, 13, 14.

Warning: Never used or tested in production!

The function is recursive - it calls itself to handle nested json object structures. It can also handle json documents with lists.

Usage example:

select jsonb_select_keys(
 -- doc:
 '{
    "id": 457,
    "a": "hello",
    "b": {
        "c": "there",
        "d": "how"
    },
    "e": [
        {
            "f": "are",
            "g": "you"
        },
        {
            "f": "this",
            "g": "evening"
        }
    ]
 }',
 -- selection:
 '{
    "id": 1,
    "b": {
        "c": 1
    },
    "e": {
        "g": 1
    }
 }'
);

This would result in:

{
    "b": {
        "c": "there"
    },
    "e": [
        {
            "g": "you"
        },
        {
            "g": "evening"
        }
    ],
    "id": 457
}

As shown here, the function handles lists and nested sub-structures.

Terminology:

  • By "doc" we mean the jsonb document that we want to process. You might have many such documents stored in some DB-column.

  • By "selection" we mean the json snippet that specifies the fields in the doc that we are interested in.

The code

Discussion

The work horse in the function is the inner join: The properties of the selection json object ("extracted" using jsonb_each) are joined with the doc json object (again, "extracted" using jsonb_each). The join condition is field name ("key") of each property. So the result will contain only the properties in the doc that are also present in the selection.

To illustrate this, a simplified variant of the function could be something like:

create or replace function simple(doc jsonb, selection jsonb)
    returns jsonb
    language sql
    immutable
    parallel safe
as
$$
with selected_subset
         as
         (select doc_property.key, doc_property.value
          from jsonb_each(doc) doc_property(key, value)
                   inner join jsonb_each(selection) selection_property(key, value)
                              on doc_property.key = selection_property.key)
select jsonb_object_agg(ss.key, ss.value)
from selected_subset ss;
$$;

which we could use like this:

select simple(
           -- doc:
               '{
                 "a": "hello",
                 "b": "there",
                 "c": "how are you",
                 "d": "today"
               }'::jsonb,
           -- selection:
               '{
                 "a": 1,
                 "c": 1,
                 "e": 1
               }'::jsonb);

with the result:

{
  "a": "hello",
  "c": "how are you"
}

However, this simplified version does not handle lists and nested objects et.c., which jsonb_select_key does by adding some cruft.

A few examples

Some notes

  • The selection must be a json object (it can’t be a scalar or a list)

  • The selection object can only contain values that are objects or the scalar number "1". (The same rule applies recursively to the sub-objects in the selection)

  • There is no way to say "just select the whole document".

    • We could add that, by allowing the caller to just pass the number "1" as a selection, and let that mean "select the whole document"

  • This whole concept is similar to (but simpler variant of) what MongoDB calls a "projection"

  • Use case: save network traffic (and application CPU?) by only pulling subsets of json that are actually needed by the application. Might be useful when having legacy DB with big legacy json-documents, and there is a slow query that really only needs a small sub-set, a deep sub-selection, of the stored json-documents.

  • The cost is significant CPU usage (queries get slower) on the database server

  • It is somewhat quirky that the resulting document can contain lots of empty objects or lists (MongoDB behaves similarly)

    • We could fix that and make sure to filter out empty objects and lists (but I am not sure if this could be done without a performance penalty)

  • In MongoDB "sub-projection" (a sub-object of the selection) must not be empty, i.e no '{}'. If such an object is found, an error is emitted. We do it differently (to make the function simpler and maybe faster) - an empty object in the selection means "no fields are of interest". We thus don’t have to do such a check or maintain logic for raising an error.

  • In MongoDB, an object in the selection does NOT match a scalar in the document. We do it differently (but only because it is easier)

  • The root level of the document in mongo is always an object (I think). But jsonb_select_keys supports root level lists too.

  • Somewhat related:

A bunch of small examples

Developing

Start a postgres server using docker-compose and docker-compose.yml.

The bash script test/run_tests.bash runs the tests (and verifies the results), and auto-generates the asciidoc example files (test/examples.adoc, test/examples_many_small.adoc) from the test results.

So, a hint on how to work would be:

cd test
docker-compose up -d
bash run_tests.bash

About

A postgresql function for selecting a subset of jsonb fields

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published