We strive to modernize our infrastructure and to make rebuilding the services reliable and reproducible. After lengthy deliberation, Ansible was chosen to maintain it.
These playbooks are made to run with Ansible >=2.7, <2.8, preferably with Python 3.
You might need this for our custom plugins:
You should love YAML, as the rules and most of the configuration are written in this format. It is not difficult to learn, though.
An ansible.cfg configuration file is provided with all the needed settings, so that we all use the same configuration and get the same results. The only controversial setting is hash_behaviour = merge, which is very practical because it allows partial variable overrides using groups of hosts (if you disagree, try to convince Duck, good luck).
The list of hosts and groups (the inventory) is held in hosts.yml. All hosts should be listed in the all group, and then in various groups as appropriate; failing to do so will most probably drop hosts from the inventory when they are removed from groups. We do not maintain variables in this file, as this is not very practical.
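As an illustration of the layout described above, a minimal hosts.yml sketch might look like this. The host and group names are hypothetical and are not taken from the actual inventory.

```yaml
# hosts.yml (sketch; host and group names are hypothetical)
all:
  hosts:
    # every host is listed here first, so it stays in the inventory
    # even if it is later removed from all the groups below
    alpha.example.org:
    beta.example.org:
  children:
    # group of hosts holding the same service
    webservers:
      hosts:
        alpha.example.org:
    # group of hosts in the same geographic zone
    zone_paris:
      hosts:
        alpha.example.org:
        beta.example.org:
```

No variables appear in this file; they belong in the group_vars files.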
The files in group_vars/<group>/ for hosts and groups hold specific variables. We use groups for hosts holding the same service, or located in the same geographic zone… That's where the magical hash_behaviour = merge shines.
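To illustrate how hash_behaviour = merge allows partial overrides, here is a sketch with two variable files. The file names, the zone_paris group, the dns_settings variable and all values are made up for the example.

```yaml
# group_vars/all/dns.yml (hypothetical file and variable names)
dns_settings:
  domain: example.org
  resolvers:
    - 192.0.2.1
  search_timeout: 2
---
# group_vars/zone_paris/dns.yml (hypothetical group)
# only the key that differs for this group is redefined
dns_settings:
  resolvers:
    - 192.0.2.53
```

With merge, a host in zone_paris keeps domain and search_timeout from the all group and only gets a different resolvers list. Dictionaries are merged key by key, while lists are replaced as a whole. With Ansible's default replace behaviour, the group definition would replace the whole dns_settings dictionary and the other two keys would be lost.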
Most of the infrastructure parameters are common information stored in group_vars/all: package repositories, DNS settings, users, entities…
Some files are also stored in
- DuckCorp-specific files which are used by a role but not included in it to keep it generic
- isolated tasks in a play manipulating data (copy/template/…); creating a higher level role may be a cleaner solution though
Here is a list of the playbooks and their goal:
- dc_check: run various tests on machines to check for problems; this is WIP and currently only checks for unapplied upgrades, package diff and obsolete packages
- dc: (partially, WIP) deploy the DuckCorp infrastructure
- regen_ldap_content: generate the whole LDAP database content
- regen_ssh_keys: on shell boxes, create missing user home directories and add their SSH keys to their authorized_keys file (preserving local changes)
Please don't use roles sections in plays; use include_role tasks instead. They are more powerful, and you can order them as you wish (like any other task).
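A minimal sketch of a play written this way is shown below. The host group, the httpd role name and the deploy_httpd variable are hypothetical and only serve the illustration.

```yaml
# playbook sketch; group, role and variable names are hypothetical
- hosts: webservers
  tasks:
    - name: Install prerequisites before the role runs
      package:
        name: ca-certificates
        state: present

    # the role is called like any other task, so it can be placed
    # anywhere in the task list and combined with task keywords
    - name: Deploy the web server through a role
      include_role:
        name: httpd
      when: deploy_httpd | default(true)
```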
We should try our best to modularize the rules to ease readability and maintenance.
Except for basic system settings (dc-base role), which should be kept small, and DuckCorp-specific needs, all roles are maintained in separate repositories (WIP for legacy roles). They should be kept generic (without trying to address each and every possible need in the world) or clearly state the limits of their scope. They should all be documented and carry meta information.
Roles should present a clear API. We also increasingly use multiple entrypoints through the include_role action and its tasks_from parameter. This allows us to factor out various functions based on the same logic and variables.
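The multiple-entrypoint pattern could be sketched as follows. The postgresql role, its create_db and create_user entrypoints and the variables are hypothetical; tasks_from simply selects which file under the role's tasks/ directory is loaded.

```yaml
# tasks sketch; the role name, entrypoints and variables are hypothetical
- name: Create a database through a dedicated entrypoint of the role
  include_role:
    name: postgresql
    tasks_from: create_db   # loads tasks/create_db.yml in the role
  vars:
    db_name: wiki

- name: Create the matching user through another entrypoint
  include_role:
    name: postgresql
    tasks_from: create_user   # loads tasks/create_user.yml in the role
  vars:
    db_user: wiki
```

Both entrypoints can share the defaults and variables of the same role, which is what keeps the logic in one place.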
It is possible to limit the scope of a run, either to preview it on a particular machine or to shorten it when you're sure the changes affect only specific hosts or tasks.
To limit the scope of the machines, use the
To run only specific groups of tasks, use
--skip-tags. A partial (due to dynamic includes) but sufficient list of tags can be found using:
ansible-playbook --list-tags playbooks/*.yml
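Because include_role is dynamic, a tag set on the include task is not inherited by the tasks inside the role. Here is a minimal sketch of one way to tag such an include, assuming the apply keyword available since Ansible 2.7. The role and tag names are hypothetical.

```yaml
# tasks sketch; role and tag names are hypothetical
- name: Deploy the web server
  include_role:
    name: httpd
    apply:
      tags:
        - web        # applied to every task inside the role
  tags:
    - web            # lets the include itself be selected by the tag
```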
The validate_playbooks.sh script should be run before every commit to avoid mistakes.
This script depends on:
When we switch to a newer Ansible version, dependencies such as ansible-lint, which are tied to Ansible, should be upgraded too. The reports might change after the migration, and we should strive to fix any new issues quickly. A new commit is not responsible for the previous state of the rules and should not mix topic changes with unrelated fixes; those should be handled in separate commits.
The --check option is available, but no effort has been made yet to handle problematic tasks (command/shell/…) better.
Improving Ansible Speed
Ansible is slow, but the Mitogen project can improve its performance. It still has glitches, so it is not enabled by default, but it is easy to enable.
First install the library (it is not yet packaged):
pip install mitogen
Then you just need to run playbooks this way:
ANSIBLE_STRATEGY=mitogen_linear ansible-playbook …