Debconf config and template databases #95

Open
yosifkit opened this issue Sep 11, 2020 · 5 comments

@yosifkit

yosifkit commented Sep 11, 2020

Two related issues to consider; together they add ~1.5MB to the image size each time they are triggered in a Docker image.

  1. Whenever a newly installed package has debconf config or templates, the respective database file gets updated: /var/cache/debconf/{config,templates}.dat
  2. Both files have automatic -old backups (config.dat-old, templates.dat-old) that are also updated whenever the regular file changes

For the first, it might make sense to switch debconf to the PackageDir driver (https://manpages.debian.org/buster/debconf-doc/debconf.conf.5.en.html#DRIVERS) so that each package only touches its own debconf template/config files rather than the two global database files. Changes across Docker layers would then be limited to new files.

The second issue can be controlled by adding Backup: false to the relevant debconf.conf stanzas, so that no backup of the debconf template/config database is created before modification. This should be safe/sane for containers. Turning off the backups may not be necessary when using PackageDir, since most packages shouldn't be changing other packages' debconf templates/config.
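
To see how much the four files described above weigh in a given image, something like the following could be run inside a container. This is only a rough sketch: `debconf_cache_bytes` is a hypothetical helper name, and it assumes the stock `/var/cache/debconf` location unless another directory is passed.

```shell
# Hypothetical helper: total bytes used by the two debconf databases and
# their -old backups. The directory defaults to the stock location.
debconf_cache_bytes() {
    dir="${1:-/var/cache/debconf}"
    total=0
    for f in config.dat config.dat-old templates.dat templates.dat-old; do
        [ -f "$dir/$f" ] || continue          # skip files that do not exist
        size=$(wc -c < "$dir/$f")
        total=$((total + size))
    done
    echo "$total"
}
```

Since a layer stores a full copy of every file it modifies, the number printed after an install that touches debconf approximates what that layer pays for these files.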

@yosifkit
Author

> This database driver allows debconf to store data in a hierarchical directory structure. The names of the various debconf templates and questions are used as-is to form directories with files in them.

DirTree is also an option, but doesn't have a Mode setting like PackageDir so shouldn't be used for the "password" config.

(could maybe use PackageDir for password and DirTree for the rest?)

Current debconf.conf (the important parts):

# Debconf will use this database to store the data you enter into it,
# and some other dynamic data.
Config: configdb
# Debconf will use this database to store static template data.
Templates: templatedb

# World-readable, and accepts everything but passwords.
Name: config
Driver: File
Mode: 644
Reject-Type: password
Filename: /var/cache/debconf/config.dat

# Not world readable (the default), and accepts only passwords.
Name: passwords
Driver: File
Mode: 600
Backup: false
Required: false
Accept-Type: password
Filename: /var/cache/debconf/passwords.dat

# Set up the configdb database. By default, it consists of a stack of two
# databases, one to hold passwords and one for everything else.
Name: configdb
Driver: Stack
Stack: config, passwords

# Set up the templatedb database, which is a single flat text file
# by default.
Name: templatedb
Driver: File
Mode: 644
Filename: /var/cache/debconf/templates.dat

@tianon
Collaborator

tianon commented Sep 14, 2020

I really, really like this idea, because it seems perfect for containers to have one file per package config, but my biggest concern is that the current default has been in place for at least 13 years, and in my experience (and research) it is not at all common for those defaults to change.

Along with that, the database files themselves are trivial text files, so I expect many applications are reading/parsing them directly, especially since there's no obvious way to fetch a single value from a shell script while adhering to the configuration (which matches my basic searching in https://codesearch.debian.net/search?q=%2Fvar%2Fcache%2Fdebconf%2Fconfig.dat&literal=1&perpkg=1). The debconf-get-selections helper script is even part of the non-default debconf-utils package (even though debconf-set-selections is part of the core debconf package), and even then its output isn't very useful for parsing programmatically (compared to the simple grep or awk that could parse the database file directly).
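
For illustration, the kind of direct parsing described above might look like the sketch below. `debconf_get_value` is a hypothetical helper, not something debconf ships, and it assumes the flat-file layout of blank-line-separated stanzas with `Name:` and `Value:` fields. It bypasses debconf.conf entirely, which is exactly why it would stop working if the data moved to another driver.

```shell
# Hypothetical helper: print the stored value of one debconf question by
# reading the flat config database directly (ignores debconf.conf).
debconf_get_value() {
    awk -v question="$1" '
        $1 == "Name:" { name = $2 }               # start of a question stanza
        /^$/          { name = "" }               # blank line ends the stanza
        $1 == "Value:" && name == question {
            sub(/^Value: ?/, "")                  # keep only the value text
            print
            exit
        }
    ' "${2:-/var/cache/debconf/config.dat}"
}
```

A call such as `debconf_get_value tzdata/Areas` would read the default database; a second argument points it at a different file.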

So, TL;DR, I think disabling the backup files is a no-brainer, but I'm afraid going further (however interesting it may be for container use cases) is something we can't reasonably do by default. 😞

(Maybe @paultag would be willing to chime in with more opinions? 🙏 ❤️)

@paultag

paultag commented Sep 15, 2020

I also really like this idea (we should at least consider setting Backup: false, perhaps?), but I think you're right @tianon - this smells like something that can break.

I'm not sure how this file interacts with debconf at install-time (if install-time debconf prompts populate or are populated by this file) -- and if it's missing, how those defaults change & may cause a massive change in behavior between full mutable installs and the container environment. I think if this goes "one way" (e.g. package install triggering an append to the debconf database), it may be safe, otherwise this may be a source of heisenbugs.

FWIW, the codesearch looking for the file glob appears to be mostly developer tools that need to dig into the debconf db in a way most users won't hit (or at least, the users who are using things like cdebootstrap or lintian would hopefully be able to work out what's going on here :) ), and it doesn't freak me out as much. I would likely err on the side of conservative changes here myself too.

@guillaume-d

For those who want less image layer waste for their subsequent installations now, the following has worked fine for me so far.

It uses the independently-discovered 😜 Backup: false trick explained above, combined with stacking to merge with the previous monolithic databases:

root@ce25fa0fc542:/app# diff -u <(grep -v "^#" /etc/debconf.conf) ~/.debconfrc
--- /dev/fd/63	2020-09-27 16:47:49.237302697 +0000
+++ /root/.debconfrc	2020-09-27 16:35:31.000000000 +0000
@@ -1,12 +1,13 @@
 
-Config: configdb
-Templates: templatedb
+Config: configdb1
+Templates: templatedb1
 
 Name: config
 Driver: File
 Mode: 644
 Reject-Type: password
 Filename: /var/cache/debconf/config.dat
+Readonly: true
 
 Name: passwords
 Driver: File
@@ -24,4 +25,28 @@
 Driver: File
 Mode: 644
 Filename: /var/cache/debconf/templates.dat
+Readonly: true
+
+
+Name: config1
+Driver: PackageDir
+Mode: 644
+Reject-Type: password
+Directory: /var/cache/debconf/config1.d
+Backup: false
+
+Name: configdb1
+Driver: Stack
+Stack: config1, config, passwords
+
+Name: templates1
+Driver: PackageDir
+Mode: 644
+Directory: /var/cache/debconf/templates1.d
+Backup: false
+
+Name: templatedb1
+Driver: Stack
+Stack: templates1, templatedb
+
 
root@ce25fa0fc542:/app# 
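
One way to check that a setup like this keeps later installs away from the monolithic files is to checksum them before and after a package installation. A rough sketch, where `debconf_db_checksums` is a hypothetical helper and the file names are the ones from the diff above:

```shell
# Hypothetical helper: print one cksum line for each monolithic database
# file that exists under the given directory.
debconf_db_checksums() {
    dir="${1:-/var/cache/debconf}"
    for f in config.dat templates.dat; do
        if [ -f "$dir/$f" ]; then
            cksum "$dir/$f"
        fi
    done
}
```

Capturing the output with `before=$(debconf_db_checksums)`, running an install, and comparing against a second call should show identical output if new entries only land in config1.d and templates1.d.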

@tianon
Collaborator

tianon commented Sep 3, 2021

Hmm, I went to work on an implementation of this, and came up with the following:

awk '
	tolower($1) == "driver:" { driver = tolower($2) }
	tolower($1) == "backup:" { backup = 1 }
	/^$/ {
		if (driver == "file" && !backup) {
			print "# DOCKER: https://github.com/debuerreotype/debuerreotype/issues/95"
			print "Backup: false"
		}
		driver = ""
		backup = 0
	}
	{ print }
' /etc/debconf.conf

This works pretty well, but helped me realize a major downside of even Backup: false -- pretty much all our other "non-default" customizations are trivial to revert by removing/ignoring/excluding files (/etc/dpkg/dpkg.cfg.d/docker*, /etc/apt/apt.conf.d/docker*, /usr/sbin/policy-rc.d, etc), whereas this will be one that will require careful modification of the file instead (for users who want to, for example, take a Docker image and use it in a non-Docker context, which is common enough to consider). 😕 😞
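
To make the "careful modification" concrete, a revert could key off the marker comment that the awk above inserts. This is a minimal sketch, not part of any implementation: `strip_docker_backup` is a hypothetical name, and it assumes nothing else in the file starts a line with "# DOCKER: ".

```shell
# Hypothetical helper: print the given debconf.conf with every
# "# DOCKER: ..." marker and the Backup line directly after it removed.
strip_docker_backup() {
    awk '
        /^# DOCKER: / { marked = 1; next }                       # drop the marker comment
        marked && tolower($1) == "backup:" { marked = 0; next }  # drop the line it introduced
        { marked = 0; print }
    ' "$1"
}
```

Backup lines that were already present in the stock file (such as the one in the passwords stanza) have no marker in front of them and are left alone.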

I was already planning to only implement it for bookworm+ (to err on the side of extreme caution), but I think maybe we could implement it only in the case of our "slim" variants, where it's already somewhat expected that we're getting more aggressive about removing things deemed "unnecessary" (removing files there is not quite enough to "revert to stock" so to speak -- packages have to be explicitly reinstalled if things like man pages are desired).
