Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

useradd: Do not reset non-existent data in {last,fail}log #558

Merged
merged 1 commit into from
Aug 31, 2022

Conversation

DonKult
Copy link
Contributor

@DonKult DonKult commented Aug 24, 2022

useradd does not create the files if they don't exist, but if they exist
it will try to reset data even if the data did not exist before
resulting in a lot of zeros ending up in tarballs.

An example is basically every chroot containing a Debian (or derivative) system which is packed into a tarball later on e.g. for build servers. The chroot will have an _apt system user created in the bootstrap which fills the {last,fail}log files with zeros as they tend to be already created with the right permissions.

Of course, the files aren't large and zeros compress rather well (if you compress your tarballs), but what doesn't exist takes up even less space.

Rational for "ignoring" LOG_INIT: The documentation for --no-log-init says that it "do[es] not add the user to the lastlog and faillog databases", but it also says that its propose is a "reset to avoid reusing the entry from a previously deleted user". If the focus were on "add the user" the files should be created if they don't exist, but from the code as well as the later sentence in the docs I get the impression that the focus is on not (re)using incorrect entries and that is achieved just fine by the implicit zeros without making them explicit. So the init is still happening, it just doesn't write to disk for it if it doesn't need to.

@ikerexxe
Copy link
Collaborator

Let me see if I understand the problem correctly. useradd resets the data for a user in {last,fail}log if the file exists and the data for this user is found. This is a problem in chroot environments because a user called _apt created at bootstrap fills these files with zeros, thus making the files look as if a given user already existed. So when a user is created its data is reset even this data is zero. Right?

@DonKult
Copy link
Contributor Author

DonKult commented Aug 25, 2022

Close, but the data for the user is not found as is doesn't exist: The situation is that the files exist (and have owner and permissions set) but are empty. That is no problem for the lseek as it will happily seek past the end of the file creating a "hole" which if you read it is full of null bytes as if you had written them explicitly there. The write performs the reset – of data which did not exist before – so the previously empty file is now a sparse file consisting of a hole from the start up to the explicit zero'd user data following the hole. This MR "just" tries to keep the file empty instead of filling them with zeros needlessly.

(I mentioned _apt explicitly as it tends to be the first user being dynamically created with useradd (through wrapping, but lets ignore that) on a Debian (derivative) system. Anything is possible though: Someone could have already created, logged in and deleted a user by the time the _apt user is created, so we can't "just" use --no-log-init and be done even if that is extremely unlikely in practice. This leads to those empty files being filled with zeros up to uid 100 (= the one given to _apt usually) in the normal case, which I would like to avoid. This isn't really related to _apt, Debian or so specifically, that is just my background and my motivation.)

@josch
Copy link

josch commented Aug 28, 2022

The size of {last,fail}log created depends on the uid. With a high uid, {last,fail}log can become extremely large, so it's easy to find many places where people just started calling useradd with --no-log-init and called it a day. Here is somebody with a 300 GB file:

Tecnativa/doodba#251 (comment)

Or 272 GB here: kartoza/docker-geoserver#410 (uid is 1001470000)

Or 32 GB here: moby/moby#5419 (uid is 99900000)

The problem is so prevalent that using --no-log-init even made it into docker best practices: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user

This pull request would avoid this problem because all these issues referenced above do not have a "previously deleted user" so there is nothing to reset.

@ikerexxe
Copy link
Collaborator

Close, but the data for the user is not found as is doesn't exist: The situation is that the files exist (and have owner and permissions set) but are empty. That is no problem for the lseek as it will happily seek past the end of the file creating a "hole" which if you read it is full of null bytes as if you had written them explicitly there. The write performs the reset – of data which did not exist before – so the previously empty file is now a sparse file consisting of a hole from the start up to the explicit zero'd user data following the hole. This MR "just" tries to keep the file empty instead of filling them with zeros needlessly.

Good explanation. Thanks for it.

However before merging the code I'd like to check how stat() and lseek() report the data for different type of files.

Copy link
Collaborator

@ikerexxe ikerexxe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had the opportunity to check the return values for stat() and lseek() depending on the type of file (empty, 1KB of info and sparse data at 1KB). I think that these changes are very neat and will solve the problem.

Still, there are two minor issues that I'd like to tackle before merging the code.

src/useradd.c Show resolved Hide resolved
src/useradd.c Show resolved Hide resolved
useradd does not create the files if they don't exist, but if they exist
it will reset user data even if the data did not exist before creating
a hole and an explicitly zero'd data point resulting (especially for
high UIDs) in a lot of zeros ending up in containers and tarballs.
@DonKult
Copy link
Contributor Author

DonKult commented Aug 31, 2022

force-push summary: Swapped empty and variable declaration line(s) as requested and as a bonus reworded commit message slightly.

@DonKult DonKult requested a review from ikerexxe August 31, 2022 11:51
Copy link
Collaborator

@ikerexxe ikerexxe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the patch!

@ikerexxe ikerexxe merged commit ebf9b23 into shadow-maint:master Aug 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants