Recently, I started a new position as a "DevOps Engineer" (don't laugh too hard; I promise to write more on why that's a silly title, and why I took the job anyway, in a future post) with a great established company expanding its technology capabilities.

One of the first things I noticed on the job was how many tasks, like installing a new client, updating an SSL certificate, or provisioning a new virtual machine, were routinely performed by a human following a set of instructions in a Confluence document.

You probably have at least a few of these kinds of documents on hand at work: a checklist, document, or wiki post that walks you through the steps of deploying a new release or configuring a new staging server. These kinds of documents often have the following format:

  1. SSH to a special snowflake server.
  2. Become root.
  3. Run some ugly looking command.
  4. Wait, get distracted, go waste time reading comments on Hacker News.
  5. Come back to your terminal window and examine the output of the command.
  6. If everything looks good, go on to the next step.
  7. Repeat steps 3–6 a dozen more times.
  8. Two hours later, you’ve completed a task that a computer could have done in two minutes.

While I am thankful that we have these processes documented rather than relying on the memory and skill of engineers to perform them, the lack of automation is a serious problem for several reasons.

  • Over time as server configuration drifts, the documentation becomes stale and no longer reflects the actual process.
  • Engineers waste time and become frustrated performing routine tasks instead of working on interesting, high-value projects.
  • The organization incurs the risk of unnecessary downtime caused by human error every time a process is performed manually.

Next time you are tasked with following a process defined in a document, consider taking the extra time to translate the document into an executable Ansible playbook. The extra time spent automating your IT processes is an investment that starts paying dividends immediately.

Turning a once time-consuming, error-prone, and mundane task into a single command that runs in a fraction of the time frees you up for more interesting, valuable, and important work: the kind of work you would prefer to be doing, like architecting improvements to your continuous delivery pipeline, or rebuilding your once special snowflake servers with configuration management tools.

Tips and tricks for turning your IT docs into executable Ansible playbooks

OK, all of that sounds great in theory, but I can hear you complaining that it's not practical or that it takes too much time. So now I will show you some of the techniques I use to rapidly translate manual IT processes into Ansible playbooks. Keep in mind that what we are creating is only the first iteration in your journey of automating your daily IT operations. It is only a starting point to free up your time and reduce risk so that you can focus more time and attention on architecting and implementing more comprehensive solutions.

Iteration 1: Make liberal use of the shell, debug, and pause modules.

The first thing I do when translating IT docs to Ansible playbooks is to create a task for each command in the document using the shell module. Then I name my tasks to describe their intended function or purpose, and finally I insert invocations of the pause module between tasks, prompting me to inspect the shell output before continuing. I do this all at once because if the docs are good, it will work the first time, and the pause module gives me guidance and a safety net in case something isn’t documented well or goes wrong in the middle of the process.

For example, let's pretend your instructions start with the following steps.

Step 1. Verify that the NFS images share is mounted

df -h | grep nfs.example.com:/srv/images

Step 2. Mount the NFS images share if it is not currently mounted

mount nfs.example.com:/srv/images /images

Using the shell, pause, and debug modules, we can trivially translate these steps into the following playbook.

---
- name: Install a new client (https://confluence.example.com/foo/bar/biz-baz)
  hosts: app.example.com
  gather_facts: false

  tasks:
    - name: Verify that the images NFS share is mounted
      shell: "df -h | grep nfs.example.com:/srv/images"
      register: nfs_check
      ignore_errors: yes

    # display the output of the previous command in a readable format
    - debug: var=nfs_check.stdout_lines

    - pause: |
        prompt="
        Examine the above output to verify that the /srv/images share is mounted.

        Press Enter to continue or Ctrl-C to abort"

    - name: Mount the NFS images share if it's not currently mounted
      shell: "mount nfs.example.com:/srv/images /images"

This is a good first iteration because the combination of debug and pause allows us to examine the behavior of each task before deciding whether or not to run the next task.

TASK [debug] *******************************************************************
ok: [192.168.33.10] => {
  "nfs_check.stdout_lines": [
    "nfs.example.com:/srv/images 40G 1.2G 37G 4% /images"
  ]
}

TASK [pause] *******************************************************************
[pause]

Examine the above output to verify that the /srv/images share is mounted.

Press Enter to continue or Ctrl-C to abort:

While this is great when we are unfamiliar with the steps involved and gives us the chance to uncover mistakes during the playbook development process, it still requires us to manually examine the output of each command. This is a tedious, time-consuming, and error-prone task when you have dozens of steps in your procedure.

Iteration 2: Leverage the when, failed_when, and changed_when clauses to replace manual verification with automated verification.

The next iteration improves upon the first by leveraging Ansible to examine the result of the mount check command, and only running the mount command if necessary.

---
- name: Install a new client (https://confluence.example.com/foo/bar/biz-baz)
  hosts: app.example.com
  gather_facts: false

  tasks:
    - name: Verify that the NFS images share is mounted
      shell: "df -h | grep nfs.example.com:/srv/images"
      register: nfs_check
      changed_when: false
      failed_when: false

    - name: Mount the NFS images share if it's not currently mounted
      shell: "mount nfs.example.com:/srv/images /images"
      register: nfs_mount
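      # grep exits non-zero when the share is missing from the df output
      when: nfs_check.rc != 0
      # report "changed" only when the mount command actually succeeded
      changed_when: nfs_mount.rc == 0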

Notice that we are taking advantage of Ansible's changed_when and failed_when features to make these two tasks idempotent and to reflect the true result of each task in the playbook output. For the first task, both failed_when and changed_when are set to false because we are not changing the state of the system; we are only recording the current state. The use of when and changed_when in the second task allows us to skip the mount command when it is not required and to report a changed status in the playbook output when it runs.

TASK [Verify that the NFS images share is mounted] *****************************
ok: [192.168.33.10]

TASK [Mount the NFS images share if it's not currently mounted] ****************
changed: [192.168.33.10]
 [WARNING]: Consider using mount module rather than running mount

That's much better! Now the task of ensuring an NFS mount exists is entirely automated. You should repeat this process for all steps in your document.
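To give a rough idea of what that repetition might look like, here is a sketch of the same check-then-act pattern applied to a made-up next step. Everything specific in it is an assumption for illustration only: the /opt/example/bin/client-status and /opt/example/bin/client-install commands, the client name acme, and the exit code convention (0 means installed, 1 means not installed) are all hypothetical, so substitute whatever your own document calls for.

```yaml
---
- name: Install a new client (hypothetical follow-on step)
  hosts: app.example.com
  gather_facts: false

  tasks:
    # Read-only check, so it never reports a change. It fails only on an
    # unexpected exit code (assumed: 0 = installed, 1 = not installed).
    - name: Check whether the client is already installed
      shell: "/opt/example/bin/client-status acme"
      register: client_check
      changed_when: false
      failed_when: client_check.rc not in [0, 1]

    # Runs only when the check above reported "not installed".
    - name: Install the client if it is not already installed
      shell: "/opt/example/bin/client-install acme"
      when: client_check.rc == 1
```

Unlike the NFS check, this sketch gives failed_when a real condition instead of false, so a genuinely broken check command still stops the play rather than being silently ignored.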

Iteration 3 and beyond: Use the time you would have spent performing the process manually to improve upon your existing automation.

If you've been paying attention or are a seasoned Ansible veteran, you will notice that we can reduce these steps to a single task with the mount module.

tasks:
  - name: Verify that the NFS images share is mounted
    mount: >
      name=/images
      src=nfs.example.com:/srv/images
      state=mounted
      fstype=nfs

It won't always be this easy, but I wanted to use a simple example to get my point across. Oftentimes, your playbook will consist of a sequence of calls to hand-crafted shell scripts that each perform multiple steps. If you dive into these scripts, you will often find that they spend a lot of time checking for errors, producing debug output, and ensuring idempotency, which is exactly the functionality Ansible modules provide for you with a clean, declarative syntax. Both future you and your coworkers will thank you for tackling the harrowing task of poring over complicated shell scripts and reducing them to a handful of easy-to-understand Ansible tasks.
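As a hedged illustration of that kind of reduction, suppose one of those scripts did nothing more than create a per-client directory if it was missing and append the client's name to a config file if it wasn't already listed. The paths /images/acme and /etc/example/clients.conf and the client name acme are invented for this sketch, and the shell one-liners in the comments are only a guess at what such a script might contain.

```yaml
---
- name: Replace a hand-crafted setup script with modules (hypothetical)
  hosts: app.example.com
  gather_facts: false

  tasks:
    # Stands in for: [ -d /images/acme ] || mkdir -p /images/acme
    - name: Ensure the client image directory exists
      file:
        path: /images/acme
        state: directory
        mode: "0755"

    # Stands in for: grep -qx acme clients.conf || echo acme >> clients.conf
    - name: Ensure the client is listed in the clients file
      lineinfile:
        path: /etc/example/clients.conf
        line: "acme"
        create: true
```

Both modules only report a change when they actually modify something, so the existence checks and error handling the script did by hand come for free.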

Conclusion

Hopefully, you now feel equipped to dive into your IT procedure documents and begin automating them with Ansible playbooks. As with any kind of investment in automation, the extra time spent automating a routine task will pay for itself many times over by giving you time and freedom to do real work and solve real problems in the future.