This repository has been archived by the owner on Jul 27, 2023. It is now read-only.

Add new partitioner script, which can do job on first boot #1239

Merged
merged 34 commits into from Apr 28, 2016

Conversation

avnik
Contributor

@avnik avnik commented Mar 7, 2016

  • Installs cleanly on a fresh build of most recent master branch
  • Upgrades cleanly from the most recent release
  • Updates documentation relevant to the changes
  • Rebases cleanly onto the latest master

Fixes #1240

@avnik avnik force-pushed the feature/custom-partitioner branch from 27d5b9e to e2d302b March 10, 2016 18:18
@avnik avnik changed the title DON'T MERGE: Add new partitioner script, which can do job on first boot Add new partitioner script, which can do job on first boot Mar 10, 2016
@avnik avnik force-pushed the feature/custom-partitioner branch from e2d302b to 03075d0 March 11, 2016 14:09
@stevendborrelli stevendborrelli added this to the 1.1 milestone Mar 14, 2016
@avnik avnik force-pushed the feature/custom-partitioner branch from e91aefa to 8c380f3 March 31, 2016 13:50
@avnik
Contributor Author

avnik commented Mar 31, 2016

Add docker-io's docker support from @stevendborrelli

@@ -0,0 +1,6 @@
[dockerrepo]
Contributor

There are 2 docker repos in the PR.

Contributor Author

Correct, I'll remove mine.

Contributor

This has been done.

@stevendborrelli
Contributor

Documentation needs to be updated before this can go in.

@langston-barrett
Contributor

We should update the Packer image after this is merged.

Order of execution
------------------

All configuration rules reside in ``/etc/mantl/filesystems.d`` and are processed
Contributor

Maybe we could include an example configuration? Or mention that roles/lvm/templates/mantl-volume-group.conf.j2 is an example?

@langston-barrett
Contributor

langston-barrett commented Apr 6, 2016

@avnik outlined these testing criteria:

  • clean install
  • fresh install in both overlay+xfs and devicemapper
  • upgrade from 1.0.3 with devicemapper
  • downgrade docker-io's docker to the RH one
  • upgrade from current master in both overlay+xfs and devicemapper

@ryane
Contributor

ryane commented Apr 12, 2016

On a clean install on AWS, I get the following error during the Ansible run:

TASK: [docker | enable docker] ************************************************
failed: [resching-aws-worker-04] => {"failed": true}
msg: Job for docker.service failed because a configured resource limit was exceeded. See "systemctl status docker.service" and "journalctl -xe" for details.

failed: [resching-aws-worker-03] => {"failed": true}
msg: Job for docker.service failed because a configured resource limit was exceeded. See "systemctl status docker.service" and "journalctl -xe" for details.

failed: [resching-aws-worker-02] => {"failed": true}
msg: Job for docker.service failed because a configured resource limit was exceeded. See "systemctl status docker.service" and "journalctl -xe" for details.

failed: [resching-aws-worker-01] => {"failed": true}
msg: Job for docker.service failed because a configured resource limit was exceeded. See "systemctl status docker.service" and "journalctl -xe" for details.

failed: [resching-aws-control-01] => {"failed": true}
msg: Job for docker.service failed because a configured resource limit was exceeded. See "systemctl status docker.service" and "journalctl -xe" for details.

failed: [resching-aws-edge-01] => {"failed": true}
msg: Job for docker.service failed because a configured resource limit was exceeded. See "systemctl status docker.service" and "journalctl -xe" for details.

failed: [resching-aws-control-02] => {"failed": true}
msg: Job for docker.service failed because a configured resource limit was exceeded. See "systemctl status docker.service" and "journalctl -xe" for details.

failed: [resching-aws-edge-02] => {"failed": true}
msg: Job for docker.service failed because a configured resource limit was exceeded. See "systemctl status docker.service" and "journalctl -xe" for details.

failed: [resching-aws-control-03] => {"failed": true}
msg: Job for docker.service failed because a configured resource limit was exceeded. See "systemctl status docker.service" and "journalctl -xe" for details.


FATAL: all hosts have already failed -- aborting

systemctl status docker.service:

[centos@ip-10-1-1-9 ~]$ sudo systemctl status docker.service
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─10-options.conf, 12-network-options.conf, 14-storage-options.conf, 20-ExecStart.conf
   Active: failed (Result: resources)
     Docs: https://docs.docker.com

Apr 12 12:05:27 ip-10-1-1-9.ec2.internal systemd[1]: Failed to load environment files: No such file or directory
Apr 12 12:05:27 ip-10-1-1-9.ec2.internal systemd[1]: docker.service failed to run 'start' task: No such file or directory
Apr 12 12:05:27 ip-10-1-1-9.ec2.internal systemd[1]: Failed to start Docker Application Container Engine.
Apr 12 12:05:27 ip-10-1-1-9.ec2.internal systemd[1]: Unit docker.service entered failed state.
Apr 12 12:05:27 ip-10-1-1-9.ec2.internal systemd[1]: docker.service failed.
Apr 12 12:05:27 ip-10-1-1-9.ec2.internal systemd[1]: Starting Docker Application Container Engine...

Any idea? @avnik @stevendborrelli

@ryane
Contributor

ryane commented Apr 12, 2016

Same thing on an upgrade from 1.0.3.

This package cleans up docker containers, images and volumes by
adding systemd timers that run commands like docker rm
@avnik
Contributor Author

avnik commented Apr 18, 2016

@ryane I guess it's because we haven't updated the package with mantl-storage-setup; I'll do this now (I think the partitioner code is stable enough).
Passing debug_storage_setup=True to ansible-playbook is another option.
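The systemd journal above says "Failed to load environment files: No such file or directory", which means one of the docker.service drop-ins references an `EnvironmentFile=` that is not on disk. A quick way to see which one is to scan the drop-in directory shown in the status output. This is a minimal sketch, not part of mantl-storage-setup; the helper name `missing_env_files` is hypothetical. It relies on the systemd rule that an `EnvironmentFile=` path prefixed with `-` is optional, so only unprefixed paths can trigger this failure; wildcard patterns are not expanded.

```python
import os
from pathlib import Path

# Drop-in directory taken from the `systemctl status docker.service` output.
DROPIN_DIR = "/etc/systemd/system/docker.service.d"


def missing_env_files(dropin_dir):
    """Return (drop-in name, path) pairs for required EnvironmentFile= paths that do not exist."""
    missing = []
    for conf in sorted(Path(dropin_dir).glob("*.conf")):
        for raw in conf.read_text().splitlines():
            line = raw.strip()
            if not line.startswith("EnvironmentFile="):
                continue  # skips comments, section headers and other directives
            path = line.split("=", 1)[1].strip()
            # An empty value resets the list, and a leading "-" marks the file
            # as optional, so neither can cause "Failed to load environment files".
            if not path or path.startswith("-"):
                continue
            # Wildcard patterns are not expanded in this sketch.
            if not os.path.exists(path):
                missing.append((conf.name, path))
    return missing


if __name__ == "__main__":
    for name, path in missing_env_files(DROPIN_DIR):
        print(f"{name}: EnvironmentFile={path} does not exist")
```

Run it on an affected node; any reported path is a file the storage-setup step was expected to generate but did not.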

- "--dns {{ private_ipv4 }}"
- "--log-driver={{ docker_log_driver }}"
- "{% if docker_selinux_enabled %}--selinux-enabled {% endif %}"
- "{% if kube_build is defined %}--dns-search {{ cluster_name }}{% endif %}"
Contributor

is kube_build ever set anywhere?
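To see what the question implies, here is a small Python sketch that mimics how the two conditionals in the options list above evaluate. It is not Jinja2 itself: the function name and the example values are hypothetical, and only the variable names come from the template. The key point is that `{% if kube_build is defined %}` checks only that the variable exists, so setting `kube_build: false` would still add `--dns-search`, while `{% if docker_selinux_enabled %}` is a plain truthiness test.

```python
def render_docker_options(v):
    """Mimic the Jinja conditionals of the docker options list (a sketch, not Jinja itself)."""
    return [
        f"--dns {v['private_ipv4']}",
        f"--log-driver={v['docker_log_driver']}",
        # {% if docker_selinux_enabled %}: plain truthiness test
        "--selinux-enabled " if v.get("docker_selinux_enabled") else "",
        # {% if kube_build is defined %}: only checks that the variable exists,
        # so even kube_build: false would add the flag
        f"--dns-search {v['cluster_name']}" if "kube_build" in v else "",
    ]


if __name__ == "__main__":
    # Hypothetical inventory values, for illustration only.
    hostvars = {
        "private_ipv4": "10.0.0.5",
        "docker_log_driver": "journald",
        "docker_selinux_enabled": False,
        "cluster_name": "mantl",
    }
    print(render_docker_options(hostvars))
    print(render_docker_options({**hostvars, "kube_build": False}))
```

With `kube_build` unset, the last two entries render as empty strings; whether those empty items reach the docker command line depends on how the role joins the list, which is not shown here.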

@ryane
Contributor

ryane commented Apr 27, 2016

Successful clean build on GCE. However, I am seeing some unexpected docker errors. I restarted the nginx-consul service on a node and received the following:

Apr 27 22:54:11 resching-gce-control-03 systemd[1]: Stopping nginx-consul...
Apr 27 22:54:11 resching-gce-control-03 docker[2012]: nginx-consul
Apr 27 22:54:11 resching-gce-control-03 docker[7983]: Error response from daemon: Unable to remove filesystem for 566be3da7914736c38e32db59f2a04599c8fe76191fee610d7a67e71908f9847: remove /var/lib/docker/containers/566be3da7914736c38e32db59f2a04599c8fe76191fee610d7a67e71908f9847/shm: device or resource busy
Apr 27 22:54:11 resching-gce-control-03 systemd[1]: nginx-consul.service: main process exited, code=exited, status=137/n/a
Apr 27 22:54:11 resching-gce-control-03 systemd[1]: Unit nginx-consul.service entered failed state.
Apr 27 22:54:11 resching-gce-control-03 systemd[1]: nginx-consul.service failed.
Apr 27 22:54:11 resching-gce-control-03 systemd[1]: Starting nginx-consul...
Apr 27 22:54:11 resching-gce-control-03 docker[2024]: Error response from daemon: Unable to remove filesystem for 566be3da7914736c38e32db59f2a04599c8fe76191fee610d7a67e71908f9847: remove /var/lib/docker/containers/566be3da7914736c38e32db59f2a04599c8fe76191fee610d7a67e71908f9847/shm: device or resource busy
Apr 27 22:54:11 resching-gce-control-03 docker[2028]: 1.2: Pulling from ciscocloud/nginx-consul
...
Apr 27 22:55:33 resching-gce-control-03 docker[2875]: Status: Image is up to date for ciscocloud/nginx-consul:1.2
Apr 27 22:55:33 resching-gce-control-03 systemd[1]: Started nginx-consul.
Apr 27 22:55:33 resching-gce-control-03 docker[2880]: /usr/bin/docker: Error response from daemon: Conflict. The name "/nginx-consul" is already in use by container 566be3da7914736c38e32db59f2a04599c8fe76191fee610d7a67e71908f9847. You have to remove (or rename) that container to be able to reuse that name..
Apr 27 22:55:33 resching-gce-control-03 docker[2880]: See '/usr/bin/docker run --help'.

The container is marked as Dead and cannot be removed:

566be3da7914   ciscocloud/nginx-consul:1.2   "/scripts/launch.sh"   31 minutes ago   Dead   nginx-consul

$ sudo docker rm 566be3da7914
Error response from daemon: Unable to remove filesystem for 566be3da7914736c38e32db59f2a04599c8fe76191fee610d7a67e71908f9847: remove /var/lib/docker/containers/566be3da7914736c38e32db59f2a04599c8fe76191fee610d7a67e71908f9847/shm: device or resource busy

@ryane
Contributor

ryane commented Apr 28, 2016

The issue doesn't seem to be directly related to the code in this PR; it may just be a bug in the current version of Docker. There are some similar issues open:

After discussing, we will merge this and work around the immediate problem in #1390. We will open new issues for any others that may crop up.

@ryane ryane merged commit 0fd30d7 into master Apr 28, 2016
@ryane ryane deleted the feature/custom-partitioner branch April 28, 2016 15:01
4 participants