Upgrade from Debian 7 Wheezy (Puppet 3) to Debian 11 Bullseye (Puppet 7) #8

Open
25 of 26 tasks
Krinkle opened this issue Mar 17, 2021 · 22 comments
Member

Krinkle commented Mar 17, 2021

List of hosts

https://github.com/jquery/infrastructure/blob/puppet-stage/manifests/site.pp

  • wp-01, jquery.com
    • jquery.com
    • api.jquery.com
    • learn.jquery.com
    • plugins.jquery.com
  • wp-02, most other sites (incl *.jquery.org, jqueryui.com, etc)
  • wp-03, codeorigin.jquery.com, releases.jquery.com, and recipient of Git assets
  • wp-01.stage, WordPress doc sites, staging, all domains (stage.api.jquery.com, etc)
  • builder-01
  • builder-03.stage
  • jq03.stage.jquery.com (stage.demos.jquerymobile.com, stage.themeroller.jquerymobile.com)
  • jenkins-01
  • cla-01.ops.jquery.net
  • cla-01.stage.jquery.net
  • gruntjs.ops.jquery.net
  • gruntjs.stage.jquery.net
  • origin-01.ops.jquery.net, contentorigin (content.jquery.com, static.jquery.com)
  • swarm-01.ops.jquery.net, TestSwarm
  • view-01.ops.jquery.net, View, git assets
  • trac.ops.jquery.net, Trac (bugs.jquery.com, bugs.jqueryui.com)

Dedicated tickets:

Overview

To move off these very outdated Debian versions, we also need to move to a newer Puppet version.

We are currently using numerous Puppet 2 features that were deprecated in Puppet 3 and removed in Puppet 4. The main change that I think affects us is the change from "environment configs" to "environment directories".

Some relevant links:

Status quo: Puppet 3

The puppet server runs at puppet.ops.jquery.net (in legacy docs: puppet-master). The config for the server is at /etc/puppet/puppet.conf. There are two Git clones that we care about on this server:

  • /etc/puppet - This is a clone of jquery/infrastructure.git at branch puppet-master. This currently replaces the entire /etc/puppet directory.
  • /etc/puppet-stage – This is a directory we made up, containing another clone of jquery/infrastructure.git at branch puppet-stage.

In /etc/puppet/puppet.conf (the only config file the Puppet server actually reads) we have the following:

[main]
# …
templatedir=$confdir/templates
manifest=/etc/puppet/manifests/site.pp

[stage]
manifest=/etc/puppet-stage/manifests/site.pp
modulepath=/etc/puppet-stage/modules
# …

[master]
# …

By default, when one of our droplets running a Puppet agent asks for provisioning, it gets provisioned by the main config, which simply points at the subdirectories within /etc/puppet. On staging hosts, the agent's own /etc/puppet/puppet.conf file may contain environment = stage, which the agent passes on to the Puppet server. The server then uses the manifest and modulepath from the [stage] section instead (and also compiles the catalog with $::environment = "stage").
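To make the section override concrete, here is a rough sketch that mimics the lookup against a Puppet 3-style config file. The conf_get helper is hypothetical (it is not a Puppet command), and the fall-back-to-[main] rule is my reading of the behaviour described above rather than a reimplementation of Puppet's settings loader.

```shell
# Hypothetical helper, not part of Puppet: print KEY as seen from [SECTION]
# of an ini-style file, falling back to [main] when the section omits it.
conf_get() {
  awk -v want="$2" -v key="$3" '
    /^[ \t]*#/ { next }                      # skip comment lines
    /^\[/ {                                  # section header, e.g. [stage]
      section = substr($0, 2, index($0, "]") - 2)
      next
    }
    index($0, "=") > 0 {                     # key=value line
      name = substr($0, 1, index($0, "=") - 1)
      value = substr($0, index($0, "=") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (name == key) found[section] = value
    }
    END {
      if (want in found) print found[want]
      else if ("main" in found) print found["main"]
    }
  ' "$1"
}

# Sample file with the settings quoted in this issue.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[main]
templatedir=$confdir/templates
manifest=/etc/puppet/manifests/site.pp

[stage]
manifest=/etc/puppet-stage/manifests/site.pp
modulepath=/etc/puppet-stage/modules
EOF

conf_get "$conf" main manifest     # what a default agent is compiled from
conf_get "$conf" stage manifest    # what an agent with environment = stage gets
rm -f "$conf"
```

The first call prints the /etc/puppet path and the second the /etc/puppet-stage path, which is the whole trick behind our current staging setup.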

Beyond this, the only other thing worth knowing is that we use jquery::postreceive instances (similar to those for the content sites) to automatically update these Git checkouts after commits to them. Applying the changes is passive, however: it relies on Puppet agents checking in with the server every 30 minutes (the default Puppet agent behaviour).

Puppet 4

Under Puppet 4, things are a little bit different. There is no longer support for the templatedir, manifest, and modulepath parameters, and there is no longer support for per-environment configuration section overrides.

Instead, modules are read from a directory like /etc/puppet/code/environments/:environment/modules and manifests are read from a directory like /etc/puppet/code/environments/:environment/manifests. For example: /etc/puppet/code/environments/production/modules.

I think global templates are no longer supported, or at least not varying by environment. But that's okay, we only have one file in /templates and that'll either just not support staging or maybe we can even get rid of it (do we still use Zabbix?).

The new directory layout seems feasible: we just create two more clones and keep both for a little while.
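A minimal sketch of what creating those two extra clones could look like. The environment names (production, stage), the branch-to-environment mapping, and REPO_URL are assumptions for illustration only. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
# Assumptions: REPO_URL is a placeholder, and the environment names and the
# branch-to-environment mapping below are illustrative, not decided.
REPO_URL=${REPO_URL:-https://example.org/infrastructure.git}
ENV_ROOT=${ENV_ROOT:-/etc/puppet/code/environments}

# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Clone one branch into the directory for one environment.
clone_environment() {
  env_name=$1
  branch=$2
  run git clone --branch "$branch" "$REPO_URL" "$ENV_ROOT/$env_name"
}

clone_environment production puppet-master
clone_environment stage puppet-stage
```

The old /etc/puppet and /etc/puppet-stage checkouts would stay untouched by this, so both layouts can coexist during the transition.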

Transition

I noticed just now that, apart from the few minor tweaks needed for deprecated features, connecting Puppet 4 clients to a Puppet 3 server is not supported at all. The other way around is supported, however. So the puppet master will have to go first, which means switching masters, which in turn means setting up a new master first.

The good news is, a Puppet server is relatively easy to configure and gradually switch to...

@Krinkle Krinkle self-assigned this Mar 17, 2021
Member Author

Krinkle commented Mar 18, 2021

Things I think we are not using, and I will omit initially in the Puppet 4 branch:

  • Zabbix
  • New Relic
  • Postfix
  • etckeeper

I'll mention this during this Friday's infra meeting (tomorrow) in case we know of any of these definitely still being used.

/cc @mgol @brianwarner

Member

mgol commented Mar 19, 2021

I don't know much about our usage of any of these services with the exception of the fact that running:

sudo puppet agent --test

on jenkins-01 now results in the following output:

Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for jenkins-01.ops.jquery.net
Info: Applying configuration version '1616105347'
Notice: /Stage[main]/Main/Node[default]/Apt::Source[dotdeb]/Apt::Key[Add key: 3D624A3B from Apt::Source dotdeb]/Apt_key[Add key: 3D624A3B from Apt::Source dotdeb]/ensure: created
Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install libasound2 ' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package libasound2
Error: /Stage[main]/Main/Node[jenkins]/Package[libasound2 ]/ensure: change from purged to present failed: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install libasound2 ' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package libasound2
Notice: /Stage[main]/Jquery::Newrelic/Exec[install newrelic license]/returns: executed successfully
Error: Could not start Service[newrelic-sysmond]: Execution of '/usr/sbin/service newrelic-sysmond start' returned 1: Job for newrelic-sysmond.service failed. See 'systemctl status newrelic-sysmond.service' and 'journalctl -xn' for details.
Wrapped exception:
Execution of '/usr/sbin/service newrelic-sysmond start' returned 1: Job for newrelic-sysmond.service failed. See 'systemctl status newrelic-sysmond.service' and 'journalctl -xn' for details.
Error: /Stage[main]/Jquery::Newrelic/Service[newrelic-sysmond]/ensure: change from stopped to running failed: Could not start Service[newrelic-sysmond]: Execution of '/usr/sbin/service newrelic-sysmond start' returned 1: Job for newrelic-sysmond.service failed. See 'systemctl status newrelic-sysmond.service' and 'journalctl -xn' for details.
Notice: /Stage[main]/Main/Node[jenkins-01.ops.jquery.net]/Jquery::Ssh::Host[jenkins]/Exec[chmod 0600 /etc/ssh/ssh_host*]/returns: executed successfully
Notice: /Stage[main]/Main/Node[jenkins-01.ops.jquery.net]/Jquery::Ssh::Host[jenkins]/Exec[chmod 0644 /etc/ssh/*.pub]/returns: executed successfully
Notice: Finished catalog run in 8.30 seconds

so it looks like New Relic is somehow interfering with Puppet runs. I'm not sure if those errors are blocking anything.

Member Author

Krinkle commented Mar 19, 2021

Aye, that looks familiar. I'll continue the jenkins-01 issue at https://github.com/jquery/infrastructure/issues/433.

Krinkle changed the title from "Upgrade from Puppet 3 to Puppet 4" to "Upgrade from Puppet 3 (Debian 7 Wheezy) to Puppet 4 (Debian 9 Stretch)" on Mar 21, 2021
Member Author

Krinkle commented Aug 30, 2021

This cleanup was useful and benefitted our current droplets as well.

But I'm closing the issue as originally written for now, per https://github.com/jquery/infrastructure/issues/482#issuecomment-907890935. I might pick it up again, depending on if and when we're going to have droplets running newer Debian versions.

@Krinkle Krinkle closed this as completed Aug 30, 2021
@Krinkle Krinkle reopened this Oct 27, 2021
Member Author

Krinkle commented Oct 29, 2021

@atdt I've created the puppet-02.ops.jquery.net instance (IP: 104.131.63.112, DNS not yet assigned), with a Debian 11 image, a small 2-CPU / 4GB RAM plan, and both of our SSH keys attached for initial bootstrapping.

Empty repo for Puppet manifests: https://github.com/jquery/infrastructure-puppet.
This is a new repo rather than a branch, so that we can manage most server configuration in public going forward. I suppose we can keep issue tracking and wiki pages in this repo for now. To be discussed at the infra meeting.

Krinkle changed the title from "Upgrade from Puppet 3 (Debian 7 Wheezy) to Puppet 4 (Debian 9 Stretch)" to "Upgrade from Puppet 3 (Debian 7 Wheezy) to Debian 11 Bullseye" on Oct 29, 2021
@brianwarner

This is in place as well, also with proxying enabled until you tell me not to.

Member Author

Krinkle commented Nov 1, 2021

@brianwarner Aye, yeah, this one should be without proxying as it's for internal use such as shell access and receiving webhooks.

Contributor

atdt commented Dec 28, 2021

What version of Puppet should we target?

Puppet <= 5 has already reached EOL, and Puppet 6 is projected to reach EOL in less than a year. (See: Puppet platform lifecycle.)

OTOH, Puppet 7 has not yet been packaged for Debian 11 Bullseye. Puppet Labs estimates packages for Debian 11 will be available within the next month.

Member Author

Krinkle commented Dec 31, 2021

@atdt I see. I suppose we could wait another month.

Alternatively, we could go with Puppet 7 now if we use Debian 10 Buster, I think?
https://puppet.com/docs/puppet/7/server/install_from_packages.html

I don't see Puppet 6 for Debian 11 Bullseye, but I'm not sure if I'm looking in the right place. Is there one? I can't tell from the raw index at https://apt.puppet.com/. Either way, I imagine moving from one major Debian or Puppet version to the next should be relatively simple, perhaps with a few inline conditionals.

Contributor

atdt commented Jan 14, 2022

@Krinkle Puppet intends to release puppetserver 7.6.0 next week, with packages for Bullseye. Since it's so close, let's just wait.

Contributor

atdt commented Jan 23, 2022

OK, I installed puppetserver 7.6.0 on puppet-02. Here's what I ran:

#!/usr/bin/env bash
set -eux

# Enable the Puppet platform repository
# https://puppet.com/docs/puppet/7/install_puppet.html#enable_the_puppet_platform_repository
wget https://apt.puppet.com/puppet7-release-bullseye.deb
sudo dpkg -i puppet7-release-bullseye.deb

# Install Puppet server
apt install -y puppetserver
systemctl start puppetserver

# Install Puppet agent
apt install -y puppet-agent

# Start the Puppet service
/opt/puppetlabs/bin/puppet resource service puppet ensure=running enable=true

echo 'source /etc/profile.d/puppet-agent.sh' >> ~/.bashrc

/opt/puppetlabs/bin/puppet config set server puppet-02.ops.jquery.net --section main
/opt/puppetlabs/bin/puppet ssl bootstrap

Member Author

Krinkle commented Apr 28, 2022

I've created the following, in line with wiki: DNS and wiki: Provisioning:

  • puppet-02.stage.ops.jquery.net
  • codeorigin-01.stage.ops.jquery.net

Both nyc3, 1 vCPU and 2 GB RAM, with Debian 11, and ori-2021 and krinkle-2020 for initial root.
Also named as such in DNS via Cloudflare.

(Prod ones later expected as 2 vCPU / 4 GB RAM.)

Member Author

Krinkle commented Apr 28, 2022

@atdt I've followed your steps on both of the droplets (puppetserver only for puppet-02, puppet agent on both), with one minor change. The apt install -y puppetserver failed.

root@puppet-02:/tmp/provision# apt install -y puppetserver
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package puppetserver

root@puppet-02:/tmp/provision# apt install -y puppet<tab>
puppet                                      puppet-module-icann-quagga                  puppet-module-puppetlabs-mount-core
puppet-beaker                               puppet-module-icann-tea                     puppet-module-puppetlabs-mysql
puppet-lint                                 puppet-module-ironic                        puppet-module-puppetlabs-ntp
puppet-master                               puppet-module-joshuabaird-ipaclient         puppet-module-puppetlabs-postgresql
...
puppet-module-heat                          puppet-module-puppetlabs-host-core          puppet-strings
puppet-module-heini-wait-for                puppet-module-puppetlabs-inifile            puppet7-release
puppet-module-horizon                       puppet-module-puppetlabs-mongodb            

After an apt-get update it worked fine however, and indeed among its output is:

...
Get:13 http://apt.puppetlabs.com bullseye/puppet7 amd64 puppet-agent amd64 7.16.0-1bullseye [20.1 MB]
Get:14 http://deb.debian.org/debian bullseye/main amd64 fontconfig-config all 2.13.1-4.2 [281 kB]
...
Get:22 http://apt.puppetlabs.com bullseye/puppet7 amd64 puppetserver all 7.7.0-1bullseye [78.3 MB]
...

But then the service didn't want to start, because:

java.lang.Error: Not enough available RAM (1,982MB) to safely accommodate the configured JVM heap size of 1,979MB. Puppet Server requires at least 2,177MB of available RAM given this heap size,

So I've re-created it with 4GB instead of 2GB.
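For future droplet sizing, the check behind that error can be approximated before installing anything. This is a sketch under one assumption: the factor of roughly 1.1 is inferred purely from the two numbers in the message above (1,979MB heap, 2,177MB required), not taken from Puppet Server documentation. Both function names are made up for illustration.

```shell
# Assumption: required RAM is about heap size x 1.1, rounded up
# (inferred from 1,979MB -> 2,177MB in the error message).
required_ram_mb() {
  heap_mb=$1
  echo $(( (heap_mb * 11 + 9) / 10 ))
}

# Succeeds when the available RAM (MB) covers the given heap size (MB).
ram_ok_for_heap() {
  avail_mb=$1
  heap_mb=$2
  [ "$avail_mb" -ge "$(required_ram_mb "$heap_mb")" ]
}

# The numbers from the failed 2 GB droplet.
if ram_ok_for_heap 1982 1979; then
  echo "enough RAM"
else
  echo "too little RAM: need $(required_ram_mb 1979)MB"
fi
```

With the numbers from the 2 GB droplet this reports that 2177MB is needed, matching the message from the service.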

In addition, the codeorigin-01 droplet needed one extra step: on regular droplets (those that are not the puppetserver), puppet ssl bootstrap pauses until the agent's certificate is signed on the puppetserver:

Couldn't fetch certificate from CA server; you might still need to sign this agent's certificate (codeorigin-01.stage.ops.jquery.net).
Info: Will try again in 120 seconds.
...
...
...
Info: csr_attributes file loading from /etc/puppetlabs/puppet/csr_attributes.yaml
Info: Creating a new SSL certificate request for codeorigin-01.stage.ops.jquery.net
Info: Certificate Request fingerprint (SHA256):  ....
Info: Downloaded certificate for codeorigin-01.stage.ops.jquery.net from https://puppet-02.stage.ops.jquery.net:8140/puppet-ca/v1
Notice: Completed SSL initialization

So I ran puppetserver ca list and then puppetserver ca sign --all. Mentioning it here for a future wiki page.
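For the wiki page, a slightly more cautious variant might sign one named request rather than everything pending. Note that this swaps in --certname for the --all that was actually run here; the sign_agent wrapper and the dry-run default are illustrative only.

```shell
# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# List pending requests, then sign only the one we expect.
# Meant to be run on the puppetserver itself.
sign_agent() {
  certname=$1
  run puppetserver ca list
  run puppetserver ca sign --certname "$certname"
}

sign_agent codeorigin-01.stage.ops.jquery.net
```

Signing by name means an unexpected pending request is left alone instead of being trusted along with the rest.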

Member Author

Krinkle commented May 13, 2022

I've set up a basic skeleton at https://github.com/jquery/infrastructure-puppet for the puppet server, and provisioned as follows:

ssh root@puppet-02.stage.ops.jquery.net
$ cd /etc/puppetlabs/code/environments
$ rm -rf production/

$ apt-get install git
$ git clone https://github.com/jquery/infrastructure-puppet.git production/

This does not require a deployment SSH key; it can be an unauthenticated clone over HTTPS since this is the public puppet repository.

@atdt I originally wanted to set it up such that the public repo reflects /etc/puppetlabs/code rather than /etc/puppetlabs/code/environments/production/, but I couldn't get use of the modules directory to work. I had the following on the puppet server at /etc/puppetlabs/puppet/puppet.conf

[main]
disable_per_environment_manifest = true
default_manifest = /etc/puppetlabs/code/manifests
basemodulepath = /etc/puppetlabs/code/modules

But alas, it wasn't applying anything. There was no error, though: it ran cleanly on codeorigin-01 but just didn't apply any roles. So the modulepath may have worked, and it was site.pp that was being ignored. In ec2631b I moved it all down a level, and that's working now.

root@codeorigin-01:~# puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for codeorigin-01.stage.ops.jquery.net
Info: Applying configuration version '(491cd5b) Timo Tijhof - get_config: Follows-up ec2631b5b9'
Notice: /Stage[main]/Role::Codeorigin/Package[nginx]/ensure: created
Notice: Applied catalog in 9.08 seconds

Let me know if anything is off here, or could be better. Otherwise, next steps:

  • Settle on how to integrate private data.

I think it'd be neat if the public repo is standalone and self-sufficient for staging and local use, i.e. it should not hard-require a third repo with pseudo-secrets as a substitute for the real secret repo. Instead, we might be able to get away with only having private data come from Hieradata YAML files, which have a straightforward inheritance chain that we can configure in production to include one extra layer from a checkout of the private repo.

  • Write a puppet role for the puppetserver itself:

In particular, the following non-trivial things seem best to provision via puppet instead of statically once:

  • set up sysadmin user accounts and ssh keys.
  • prune initial root keys (ref https://github.com/jquery/infrastructure/issues/560).
  • provision a secret key that allows it to clone the private infra-puppet-secret repo. Also: this presents a bootstrapping problem, so we'll probably have to write a plain shell version as well that we document.
  • install node-notifier and provision it with a secret webhook to keep the checkouts of infra-puppet and infra-puppet-secret up-to-date.

Contributor

atdt commented May 16, 2022

I think it'd be neat if the public repo is standalone and self-sufficient for staging and local use, i.e. it should not hard-require a third repo with pseudo-secrets as a substitute for the real secret repo. Instead, we might be able to get away with only having private data come from Hieradata YAML files, which have a straightforward inheritance chain that we can configure in production to include one extra layer from a checkout of the private repo.

What would happen if a secret were to get accidentally deleted from the private repo? IIUC, with the setup that you're suggesting, we wouldn't get an error; the secret would simply and quietly get the dummy value in production. I don't think we want that; we'd want Puppet to fail loudly in that case, which we'd get if we had separate real and fake private Puppet repos.

Member Author

Krinkle commented May 16, 2022

@atdt Thanks, I hadn't thought of that!

I'd still like to try once more if we can avoid a third repo for fake-secrets, however. Rather than place all the dummy values in a "common" file in the puppet repo as I described before, what if we instead placed (most of) them in a "dummy" file that is still in the same puppet repo but only optionally included, similar to how we'd optionally include the private files? I believe that would effectively achieve the same, but with the file present in the repo rather than being brought in or symlinked from a separate repo, right?
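To check that this really keeps the fail-loudly property, here is a toy model of the layered lookup. It is deliberately not Hiera: the lookup function, the file names, and the keys are all made up, and it only understands flat "key: value" lines. The point is just the ordering: production lists the private layer and never the dummy layer, so a deleted secret has nothing to fall back on.

```shell
# Simplified model of a layered lookup (not Hiera itself): print the value
# of KEY from the first layer file that defines it, fail if none does.
lookup() {
  key=$1
  shift
  for layer in "$@"; do
    [ -f "$layer" ] || continue            # optional layers may be absent
    value=$(sed -n "s/^$key:[[:space:]]*//p" "$layer" | head -n 1)
    if [ -n "$value" ]; then
      echo "$value"
      return 0
    fi
  done
  echo "lookup: no value for '$key'" >&2
  return 1
}

dir=$(mktemp -d)
printf 'ntp_server: pool.example.org\n' > "$dir/common.yaml"
printf 'db_password: not-a-real-secret\n' > "$dir/dummy.yaml"
printf 'db_password: value-from-private-repo\n' > "$dir/private.yaml"

# Staging and local use: dummy layer plus common.
lookup db_password "$dir/dummy.yaml" "$dir/common.yaml"

# Production: private layer plus common, dummy layer not listed.
lookup db_password "$dir/private.yaml" "$dir/common.yaml"

# Secret accidentally deleted from the private repo: production fails.
: > "$dir/private.yaml"
lookup db_password "$dir/private.yaml" "$dir/common.yaml" || echo "failed loudly"

rm -rf "$dir"
```

Had the dummy values lived in common.yaml instead, the last lookup would have quietly returned the dummy value, which is exactly the failure mode raised above.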

Contributor

atdt commented May 22, 2022

@Krinkle I think that works, yes. Let's give it a shot.

Krinkle changed the title from "Upgrade from Puppet 3 (Debian 7 Wheezy) to Debian 11 Bullseye" to "Upgrade from Puppet 3 (Debian 7 Wheezy) to Puppet 7 (Debian 11 Bullseye)" on May 28, 2022
Member Author

Krinkle commented May 29, 2022

While running puppet agent -tv works on clients and uses the correct server (puppet-02), the systemd service that was started originally kept failing as seen in syslog and via systemctl status puppet. Running systemctl restart puppet fixed that.

I'm gonna assume this is expected and simply because we ran puppet config set ... during the provisioning without restarting after that. Noting this here to be documented later as part of the provisioning steps.
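As a starting point for those provisioning docs, here is an untested consolidation of the agent-side steps from this thread, with the two lessons folded in: an apt-get update after adding the Puppet repository, and a restart of the puppet service after changing the server setting. The provision_agent wrapper and the dry-run default are illustrative; the commands themselves are the ones quoted earlier in this issue.

```shell
# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

provision_agent() {
  server=$1
  run wget https://apt.puppet.com/puppet7-release-bullseye.deb
  run dpkg -i puppet7-release-bullseye.deb
  run apt-get update                   # without this, apt cannot see the new repo
  run apt-get install -y puppet-agent
  run /opt/puppetlabs/bin/puppet resource service puppet ensure=running enable=true
  run /opt/puppetlabs/bin/puppet config set server "$server" --section main
  run /opt/puppetlabs/bin/puppet ssl bootstrap
  run systemctl restart puppet         # pick up the server setting changed above
}

provision_agent puppet-02.stage.ops.jquery.net
```

On droplets other than the puppetserver, the ssl bootstrap step will still wait until the certificate has been signed on the server.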

Krinkle referenced this issue May 29, 2022
Also switch away from shared common.yaml to a fake and secret one, per
discussion at https://github.com/jquery/infrastructure/issues/484.
Krinkle referenced this issue May 30, 2022
Also switch away from shared common.yaml to a fake and secret one, per
discussion at https://github.com/jquery/infrastructure/issues/484.
Member Author

Krinkle commented May 30, 2022

I've provisioned myself and Ori on the new system, and also resolved https://github.com/jquery/infrastructure/issues/560 at the same time (Automatically remove unpuppetized root keys).

Member Author

Krinkle commented Aug 4, 2022

Notes from meeting with @atdt and myself:

  • We'll use ensure => present instead of ensure => latest for packages, same as current infra. Given how small we are, the risk of an automatic upgrade causing issues while we're away seems to outweigh the benefit of always having the exact latest versions. The slight downside of this is that if we have to re-create a droplet from scratch, it might end up with a slightly newer version as part of that process (e.g. a minor update within the same Debian stable channel).
  • At the same time, we'll use https://wiki.debian.org/UnattendedUpgrades, limited to security updates only, to cover ourselves from that angle, putting our trust in the Debian ecosystem for this.

Krinkle changed the title from "Upgrade from Puppet 3 (Debian 7 Wheezy) to Puppet 7 (Debian 11 Bullseye)" to "Upgrade from Debian 7 Wheezy (Puppet 3) to Debian 11 Bullseye (Puppet 7)" on Oct 23, 2022
@supertassu supertassu self-assigned this Nov 28, 2022
@Krinkle Krinkle transferred this issue from another repository Aug 30, 2023
Member Author

Krinkle commented Oct 18, 2023

Last remaining work:

  • Decom wp-01.ops.jquery.net (still serves plugins.jquery.com).
  • Decom jenkins-01.ops.jquery.net.
  • Decom puppet.ops.jquery.net (manages jenkins-01 and wp-01).

The first one is blocked on #29

Member Author

Krinkle commented Apr 24, 2024

I've deleted the tarsnap backups of wp-01 using the command at #19 (comment), and turned off the droplet. I'll delete it next week if nothing comes up by then.

Screenshot 2024-04-24 at 20 14 38

Just shy of its 10 year anniversary. Pretty good uptime!


5 participants