Frequent problems with pkgin upgrade or pkgin full-upgrade and a few packages, especially sqlite3 #363

Open
drboone opened this issue Mar 17, 2023 · 10 comments

drboone commented Mar 17, 2023

Per brief IRC discussion:

it's a real issue that, the same as the "pkg conflicts with ", I've not had a situation where I can reproduce and fix it so yeh, please raise an issue and include any information you can, e.g. a tarball of pkgdb would be handy

raising this issue about conflicts of a package with itself in the hope that I have data that might help track down the problem.

Extracts related to sqlite3 from pkg_install-err.log:

---Jan 09 14:41:37: upgrading sqlite3-3.40.0nb1...
---Feb 04 08:14:27: upgrading sqlite3-3.40.1...
pkg_delete: couldn't entirely delete package `sqlite3-3.40.0nb1'
---Mar 01 08:14:32: upgrading sqlite3-3.41.0...
---Mar 12 08:16:32: refreshing sqlite3-3.41.0...
pkg_add: Conflicting PLIST with sqlite3-3.39.0: bin/sqlite3
---Mar 13 08:12:27: upgrading sqlite3-3.41.1...
pkg_add: Conflicting PLIST with sqlite3-3.39.0: bin/sqlite3
---Mar 14 08:09:57: upgrading sqlite3-3.41.1...
pkg_add: Conflicting PLIST with sqlite3-3.39.0: bin/sqlite3
---Mar 15 08:11:58: upgrading sqlite3-3.41.1...
pkg_add: Conflicting PLIST with sqlite3-3.39.0: bin/sqlite3
---Mar 17 08:12:08: upgrading sqlite3-3.41.1...
pkg_add: Conflicting PLIST with sqlite3-3.39.0: bin/sqlite3

pkgdb.byfile.db is attached, gzipped because %^&*( github.

pkgdb.byfile.db.gz

jperkin self-assigned this Mar 17, 2023

drboone commented Mar 29, 2023

Here's a longer log extract for a machine that's currently exhibiting the sqlite3 issue.

bigriver.txt


drboone commented Oct 18, 2023

I have several machines today that are having trouble upgrading openssl. It's an easy workaround -- pkg_delete -f, pkg_add. Log from one:

---Oct 18 14:33:05: [3/4] upgrading openssl-1.1.1w...
pkg_add: Conflicting PLIST with openssl-1.1.1pnb1: bin/c_rehash
pkg_add: 1 package addition failed

There's further weirdness - some packages claim to install, but another full-upgrade will do it over and over. And I've seen e.g. mozilla-rootcerts listed twice in the upgrade list.


drboone commented Mar 5, 2024

Seeing this today on routine pkgin full-upgrade:

---Mar 05 14:52:07: [1/1] upgrading pkg_install-20240126...
pkg_add: Conflicting PLIST with pkg_install-20210410: man/man1/pkg_add.1.gz
pkg_add: 1 package addition failed


jperkin commented Mar 5, 2024

Ugh, yeh, lemme see if I can carve out time tomorrow to try and nail this down once and for all.


drboone commented Mar 5, 2024

If it'd help to have ssh access to an affected machine, I can arrange.


drboone commented Apr 4, 2024

The corrupted database warning you mentioned in IRC the other day has appeared on several of our systems:

beautiful 5 $ pkg_admin rebuild
pkg_admin: corrupt pkgdb, duplicate PKGBASE entries:
        pkgsrc-gnupg-keys-20190423
        pkgsrc-gnupg-keys-20201014

So what's the proper cleanup process here? I'm pretty sure I've removed specific package versions in the past, possibly using pkg_add to get key packages back.


drboone commented Apr 4, 2024

Digging deeper, I'll add that the most recent gz where I've had pkgin full-upgrade problems does not exhibit the corrupt pkgdb errors, but does still have conflicting file problems this morning:

---Apr 04 12:13:29: [1/19] refreshing ncurses-6.4...
---Apr 04 12:13:29: [2/19] refreshing readline-8.2nb2...
---Apr 04 12:13:29: [3/19] refreshing sqlite3-3.45.2...
---Apr 04 12:13:29: [4/19] refreshing xz-5.4.6...
---Apr 04 12:13:30: [5/19] refreshing ncursesw-6.4...
---Apr 04 12:13:30: [6/19] upgrading pkg_install-20240307...
pkg_add: Conflicting PLIST with pkg_install-20211115: man/man1/pkg_add.1.gz
pkg_add: 1 package addition failed
---Apr 04 12:13:30: [7/19] refreshing python311-3.11.8...
---Apr 04 12:13:32: [8/19] refreshing python312-3.12.2...
---Apr 04 12:13:35: [9/19] upgrading pkg_install-20240307...
pkg_add: Conflicting PLIST with pkg_install-20211115: man/man1/pkg_add.1.gz
pkg_add: 1 package addition failed
---Apr 04 12:13:35: [10/19] refreshing libarchive-3.7.2...
---Apr 04 12:13:35: [11/19] refreshing pkgsrc-gnupg-keys-20231210...
pkg_add: Conflicting PLIST with pkgsrc-gnupg-keys-20201014: share/gnupg/pkgsrc-security.gpg
pkg_add: 1 package addition failed
---Apr 04 12:13:35: [12/19] upgrading pkgin-23.8.1nb3...
---Apr 04 12:13:35: [13/19] refreshing pkgin-23.8.1nb3...
---Apr 04 12:13:36: [14/19] refreshing py312-pip-24.0...
---Apr 04 12:13:36: [15/19] upgrading bsdinstall-20160108nb1...
---Apr 04 12:13:36: [16/19] refreshing py312-wheel-0.43.0...
---Apr 04 12:13:36: [17/19] refreshing py312-setuptools-69.2.0...
---Apr 04 12:13:36: [18/19] refreshing py311-pip-24.0...
---Apr 04 12:13:37: [19/19] refreshing py311-wheel-0.43.0...

This is a machine I'm quite convinced has never had an improper tools or bootstrap kit applied -- it got the tools during the new-machine install, and hasn't been messed with.


drboone commented May 13, 2024

Another round of this, still with no errors from pkg_admin rebuild:

---May 13 12:14:09: [1/5] upgrading pkg_install-20240307...
pkg_add: Conflicting PLIST with pkg_install-20211115: man/man1/pkg_add.1.gz
pkg_add: 1 package addition failed
---May 13 12:14:10: [2/5] upgrading pkg_install-20240307...
pkg_add: Conflicting PLIST with pkg_install-20211115: man/man1/pkg_add.1.gz
pkg_add: 1 package addition failed
---May 13 12:14:10: [3/5] refreshing pkgsrc-gnupg-keys-20231210...
pkg_add: Conflicting PLIST with pkgsrc-gnupg-keys-20201014: share/gnupg/pkgsrc-security.gpg
pkg_add: 1 package addition failed
---May 13 12:14:10: [4/5] upgrading pkgin-23.8.1nb3...
---May 13 12:14:10: [5/5] upgrading bsdinstall-20160108nb1...
avenueq 6 $ pkg_admin rebuild

Stored 27252 files and 1 explicit directory from 45 packages in /opt/tools/var/db/pkg/pkgdb.byfile.db.
Done.


jperkin commented May 14, 2024

Some of the discussion for this ticket has been done on IRC, so I'll just try to summarise everything here so that it's all in one place.

The core problem here is that something is corrupting the pkgdb, specifically by extracting at least one package, usually more, over the top of an existing install, so that there end up being duplicate directory entries for the same PKGBASE in the pkgdb directory.

The pkgdb directories are:

  • /opt/tools/var/db/pkg (GZ tools set)
  • /opt/local/pkg (standard zone set)

Each directory entry inside them refers to an individual installed package, and critically there must only ever be one entry for each package name (the PKGBASE, i.e. the name minus the version number). There must never be both e.g. foo-1.0 and foo-1.1. For example, taking one of the failures from the output in the comment above:

---May 13 12:14:10: [3/5] refreshing pkgsrc-gnupg-keys-20231210...
pkg_add: Conflicting PLIST with pkgsrc-gnupg-keys-20201014: share/gnupg/pkgsrc-security.gpg

This shows that there are both pkgsrc-gnupg-keys-20231210 and pkgsrc-gnupg-keys-20201014 entries inside the pkgdb, and this then results in the cascading failures.

The various pkgin upgrade problems here are merely symptoms, not the cause. The pkgdb was already corrupted prior to pkgin being executed.

The question is, how? Going back to the bigriver.txt log is interesting, specifically when tracing sqlite3 entries.

---Mar 01 08:14:32: upgrading sqlite3-3.41.0...

The upgrade on Mar 01 worked fine, sqlite3 was apparently upgraded to 3.41.0 with no issues.

---Mar 03 08:09:27: upgrading sudo-1.9.13p2...
---Mar 03 08:09:28: refreshing npm-8.15.1...
---Mar 03 08:09:28: refreshing nodejs-19.7.0...

These are the only entries from this date. This looks like a regular upgrade that worked fine, and only needed to touch these three packages.

---Mar 12 08:16:32: refreshing sqlite3-3.41.0...
pkg_add: Conflicting PLIST with sqlite3-3.39.0: bin/sqlite3
pkg_add: 1 package addition failed

This is where things go sideways. Pretty much every package has been selected for either refresh or upgrade. This can be normal, especially if there was a bump in a core package that resulted in a rebuild of every package.

However, where did sqlite3-3.39.0 come from? The sqlite3 package was upgraded to 3.41.0 eleven days earlier with no errors, and there were no errors on Mar 03 either, when any 3.39.0 package lying around would have been selected for upgrade.
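
The tracing above can be repeated for any package with a small filter over the log. This is only a sketch: `trace_pkg` is a hypothetical helper name, and it assumes the log format shown in the extracts in this ticket, where each action starts with `---<date>:` and any pkg_add or pkg_delete errors follow on the next lines.

```sh
# trace_pkg LOG PKGBASE  (hypothetical helper, not part of pkgin or pkg_install)
# Print every "---<date>: ..." entry that mentions PKGBASE-<version>,
# together with the error lines that follow it.
trace_pkg() {
    awk -v base="$2" '
        # A line starting with "---" opens a new entry. Keep the entry only
        # if it names the package being traced. This is a plain substring
        # match, so a PKGBASE that is a prefix of another name (py312 vs
        # py312-pip) will also match.
        /^---/ { show = (index($0, " " base "-") > 0) }
        show
    ' "$1"
}
```

For example, `trace_pkg pkg_install-err.log sqlite3` would list only the sqlite3 entries and their errors, which makes it easier to spot the first date on which an unexpected old version shows up.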

Looking over all of the logs, the packages that so far have exhibited this issue are:

openssl
pkg_install
pkgsrc-gnupg-keys
sqlite3

These packages all have one thing in common, in that they are (or at least were) all bootstrap packages that are distributed as part of the bootstrap kit tarball. I am almost certain that the underlying cause of all these problems is that a bootstrap kit is being unpacked over the top of an existing install. To my knowledge I've not yet seen any examples of this issue where the package causing the problems is outside of the bootstrap kit, which would further rule out issues with e.g. pkg_install not upgrading packages correctly.

To be more specific, here are the packages including versions that have exhibited the problems:

pkg_add: Conflicting PLIST with openssl-1.1.1pnb1: bin/c_rehash
pkg_add: Conflicting PLIST with pkg_install-20211115: man/man1/pkg_add.1.gz
pkg_add: Conflicting PLIST with pkgsrc-gnupg-keys-20201014: share/gnupg/pkgsrc-security.gpg
pkg_add: Conflicting PLIST with sqlite3-3.39.0: bin/sqlite3

These correspond exactly to the versions that were distributed as part of the bootstrap-trunk-tools-20220706.tar.gz bootstrap kit:

$ tar ztf bootstrap-trunk-tools-20220706.tar.gz | grep CONTENTS | egrep 'openssl|pkg_install-2|pkgsrc-gnupg-keys|sqlite3' | sort
./opt/tools/var/db/pkg/openssl-1.1.1pnb1/+CONTENTS
./opt/tools/var/db/pkg/pkg_install-20211115/+CONTENTS
./opt/tools/var/db/pkg/pkgsrc-gnupg-keys-20201014/+CONTENTS
./opt/tools/var/db/pkg/sqlite3-3.39.0/+CONTENTS
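
The same comparison can be made against a live pkgdb. The following is a minimal sketch, not an existing tool: `kit_leftovers` is a hypothetical name, and it assumes the kit tarball stores its package database as `<pkgdb>/<name>-<version>/+CONTENTS`, as in the listing above. It prints each kit package whose exact version is present in the pkgdb next to another version of the same PKGBASE.

```sh
# kit_leftovers KIT PKGDB  (hypothetical helper)
# KIT is a bootstrap kit tarball, PKGDB is the pkgdb directory to inspect.
kit_leftovers() {
    kit=$1
    pkgdb=$2
    # Take the directory name that holds each +CONTENTS file in the kit.
    tar ztf "$kit" |
    awk -F/ '$NF == "+CONTENTS" { print $(NF-1) }' |
    sort -u |
    while read -r pkg; do
        base=${pkg%-*}                      # strip the trailing -VERSION
        [ -d "$pkgdb/$pkg" ] || continue    # kit version not installed
        for other in "$pkgdb/$base"-*; do
            [ -d "$other" ] || continue
            name=${other##*/}
            # Skip different packages that merely share a prefix.
            [ "${name%-*}" = "$base" ] || continue
            [ "$name" = "$pkg" ] || echo "$pkg (from kit) coexists with $name"
        done
    done
}
```

Any output suggests that the kit's version of a package is sitting alongside a newer one, which is the state described in this ticket. No output only means this particular kit's versions are not duplicated.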

One other thing to mention is that in cases where pkg_install is not upgraded, you won't see any of the new corrupt pkgdb warnings that I've added, as you'll still be running an older version that doesn't have them.

I think what I'd suggest at this point is having something like this handy (swap the pkgdb directory for normal zones as required):

$ ls /opt/tools/var/db/pkg | awk '/-/ { sub("-[^-]*$", ""); if (seen[$0]) { print "ERROR: " $0; exit 1; } else { seen[$0] = 1 }}'

If you're able to run this one-liner both before and after pkgin upgrade, it will help catch pkgdb corruption before pkgin runs and stop the attempted upgrade. (I believe you've mentioned using ansible in the past? If so, add it as a prerequisite task that must exit 0 before pkgin is run, and then again after.) That may help narrow down the point at which a bootstrap kit is unpacked over the top, especially if a previous run passed the check after a pkgin upgrade, confirming that the upgrade was clean.
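
For anyone not using ansible, the before-and-after check could be wrapped in plain shell along these lines. This is a sketch under assumptions: `check_pkgdb` and `guarded_run` are hypothetical names, the awk logic is adapted from the one-liner above, and unlike the one-liner it reports every duplicate rather than stopping at the first.

```sh
# check_pkgdb DIR  (hypothetical helper)
# Return 0 if every PKGBASE in the pkgdb directory DIR is unique,
# 1 if duplicates were found, 2 if DIR is not a directory.
check_pkgdb() {
    [ -d "$1" ] || { echo "no such pkgdb directory: $1" >&2; return 2; }
    ls "$1" | awk '
        /-/ {
            full = $0
            sub("-[^-]*$", "")      # strip the trailing -VERSION
            if ($0 in seen) {
                # Print the first entry of a duplicate group only once.
                if (!reported[$0]++) print "DUPLICATE: " seen[$0]
                print "DUPLICATE: " full
                bad = 1
            } else {
                seen[$0] = full
            }
        }
        END { exit bad }'
}

# guarded_run DIR COMMAND [ARGS...]  (hypothetical helper)
# Refuse to run COMMAND if the pkgdb is already corrupt, and check it
# again once COMMAND has finished.
guarded_run() {
    dir=$1; shift
    if ! check_pkgdb "$dir"; then
        echo "pkgdb already corrupt, not running: $*" >&2
        return 1
    fi
    "$@"; rc=$?
    if ! check_pkgdb "$dir"; then
        echo "pkgdb corrupt after: $*" >&2
        return 1
    fi
    return "$rc"
}
```

A typical call in a GZ would be `guarded_run /opt/tools/var/db/pkg pkgin -y full-upgrade`. If the first check fails, pkgin never ran, so the corruption must have come from something else.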

In terms of cleaning up installs that are broken, wherever possible I'd strongly recommend a wipe and reinstall of the pkgsrc areas (/opt/tools in a GZ, /opt/local and /var/db/pkgin in a zone), just to make sure there are no leftovers of corruption. Tools such as pkgin export / pkgin import can help with that. Otherwise, it's a case of manually looking in the pkgdb at the duplicate directory entries, and removing the directory entries that do not correspond to the installed binaries. After doing this, running pkg_admin rebuild; pkg_admin rebuild-tree may get things back to a consistent state, but there is always the chance that some on-disk binaries are not correct.


drboone commented May 15, 2024

Thanks for the detailed analysis.

I've done a bit of digging into the installer tooling. This one gz where I have conflicts has pkg_admin 20240126 (which explains the lack of sanity check) and was installed with the 20231113 platform image. Its name is avenueq, and it's the one I refer to in the April 4 and May 13 notes above. (I've been focusing on that one for a while because I'm quite sure that it had its tooling installed properly, as opposed to other older gz or guest systems where I may have done something stoopid.)
During install, it appears that platform used bootstrap-trunk-tools-20220706.tar.gz to set up pkgsrc. This seems to track with your comment above regarding versions. So I'm still puzzled about how this one machine got here.

I'll do the export/wipe/reinstall/import thing on this one machine and see how it goes.
