=============================================================
reclass — recursive external node classification
=============================================================
reclass is © 2007–2013 martin f. krafft <madduck@madduck.net>
and available under the terms of the Artistic Licence 2.0
=============================================================

reclass is an "external node classifier" (ENC) as can be used with automation
tools, such as Puppet, Salt, and Ansible.

The purpose of an ENC is to allow a system administrator to maintain an
inventory of nodes to be managed, completely separately from the configuration
of the automation tool. Usually, the external node classifier completely
replaces the tool-specific inventory (such as site.pp for Puppet, or
/etc/ansible/hosts).

In general, the ENC fulfills two jobs:

  - it provides information about groups of nodes and group memberships
  - it gives access to node-specific information, such as variables

While reclass was born into a Puppet environment and has also been used with
Salt, the version you have in front of you is a rewrite from scratch, which
was targeted at Ansible. However, care was taken to make the code flexible
enough to allow it to be used from Salt, Puppet, and maybe even other tools as
well.

In this document, you will find an overview of the concepts of reclass, the
way it works, and how it can be tied in with Ansible.

Quick start — Ansible
~~~~~~~~~~~~~~~~~~~~~
The following steps should get you up and running quickly. Generally, we will
be working in /etc/ansible. However, if you are using a source-code checkout
of Ansible, you might want to work inside the ./hacking directory instead.

Alternatively, look into ./examples/ansible of your reclass checkout, where
the following steps have already been prepared.

/…/reclass refers to the location of your reclass checkout.

  1. Symlink /…/reclass/adapters/ansible to /etc/ansible/hosts (or
     ./hacking/hosts)

  2. Copy the two directories 'nodes' and 'classes' from the example
     subdirectory in the reclass checkout to /etc/ansible

     If you prefer to put those directories elsewhere, you can create
     /etc/ansible/reclass-config.yml with contents such as

       storage_type: yaml_fs
       nodes_uri: /srv/reclass/nodes
       classes_uri: /srv/reclass/classes

     Note that yaml_fs is currently the only supported storage_type, and it's
     the default if you don't set it.

  3. Check out your inventory by invoking

       ./hosts --list

     which should return 5 groups in JSON format, each with exactly one
     member, 'localhost'.

  4. See the node information for 'localhost':

       ./hosts --host localhost

     This should print a set of keys and values, including a greeting,
     a colour, and a sub-class called 'RECLASS'.

  5. Execute some ansible commands, e.g.

       ansible -i hosts \* --list-hosts
       ansible -i hosts \* -m ping
       ansible -i hosts \* -m debug -a 'msg="${greeting}"'
       ansible -i hosts \* -m setup
       ansible-playbook -i hosts test.yml

  6. You can also invoke reclass directly, which gives a slightly different
     view of the same data, i.e. before it has been adapted for Ansible:

       /…/reclass.py --pretty-print --inventory
       /…/reclass.py --pretty-print --nodeinfo localhost

reclass concepts
~~~~~~~~~~~~~~~~
reclass assumes a node-centric perspective of your inventory. This is
obvious when you query reclass for node-specific information, but it might not
be clear when you ask reclass to provide you with a list of groups. In that
case, reclass loops over all nodes it can find in its database, reads all
information it can find about the nodes, and finally reorders the result to
provide a list of groups with the nodes they contain.
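
The reordering step described above amounts to inverting a mapping. Here is
a minimal Python sketch of the idea, not the actual reclass code: the node
names, the contents of node_classes, and the function name groups_from_nodes
are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical result of reading every node: node name -> classes in its ancestry.
node_classes = {
    "quantum.example.org": ["unixnodes", "debiannodes", "hosted@munich"],
    "photon.example.org": ["unixnodes", "debiannodes"],
}


def groups_from_nodes(node_classes):
    """Invert a node -> classes mapping into a class -> nodes mapping."""
    groups = defaultdict(list)
    for node in sorted(node_classes):
        for cls in node_classes[node]:
            groups[cls].append(node)
    return dict(groups)


inventory = groups_from_nodes(node_classes)
```

Every group in the result exists only because at least one node mentions the
class somewhere in its ancestry, which is what "node-centric" means here.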

Since the term 'groups' is somewhat ambiguous, it helps to start off with
a short glossary of reclass-specific terminology:

  node:         A node, usually a computer in your infrastructure
  class:        A category, tag, feature, or role that applies to a node.
                Classes may be nested, i.e. there can be a class hierarchy
  application:  A specific set of behaviour to apply to members of a class
  parameter:    Node-specific variables, with inheritance throughout the class
                hierarchy.

A class consists of zero or more parent classes, zero or more applications,
and any number of parameters.

A node is almost equivalent to a class, except that it usually does not (but
can) specify applications.

When reclass parses a node (or class) definition and encounters a parent
class, it recurses to this parent class first before reading any data of the
node (or class). When reclass returns from the recursive, depth-first walk, it
then merges all information of the current node (or class) into the
information it obtained during the recursion.

In this context, information is essentially either a list of applications or
a list of parameters.

The interaction between the depth-first walk and the delayed merging of data
means that the node (and any class) may override any of the data defined by
any of the parent classes (ancestors). This is in line with the assumption
that more specific definitions ("this specific host") should have a higher
precedence than more general definitions ("all webservers", which includes all
webservers in Munich, which includes "this specific host", for example).

Here's a quick example, showing how parameters accumulate and can get
replaced.

  All unixnodes (i.e. nodes that have the 'unixnodes' class in their ancestry)
  have /etc/motd centrally-managed (through the 'motd' application), and the
  unixnodes class definition provides a generic message-of-the-day to be put
  into this file.

  All debiannodes, which are descendants of unixnodes, should include the
  Debian codename in this message, so the message-of-the-day is overwritten in
  the debiannodes class.

  The node 'quantum.example.org' will have a scheduled downtime this weekend,
  so until Monday, an appropriate message-of-the-day is added to the node
  definition.

  When the 'motd' application runs, it retrieves the appropriate
  message-of-the-day and writes it into /etc/motd.
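
The motd example can be played through in a few lines of Python. This is
only a sketch of the depth-first walk with delayed merging, under loud
assumptions: the CLASSES and NODES dictionaries stand in for the definition
files, the message texts and the function name resolve_parameters are
invented, and only plain (scalar) values are handled.

```python
# Hypothetical in-memory stand-in for the class and node definition files.
CLASSES = {
    "unixnodes": {
        "applications": ["motd"],
        "parameters": {"motd": "Welcome to a Unix node."},
    },
    "debiannodes": {
        "classes": ["unixnodes"],
        "parameters": {"motd": "Welcome to Debian wheezy."},
    },
}
NODES = {
    "quantum.example.org": {
        "classes": ["debiannodes"],
        "parameters": {"motd": "Scheduled downtime this weekend."},
    },
}


def resolve_parameters(definition, classes):
    """Depth-first walk: merge the parents first, then this definition on top."""
    merged = {}
    for parent in definition.get("classes", []):
        merged.update(resolve_parameters(classes[parent], classes))
    # Delayed merge: the more specific definition overrides its ancestors.
    merged.update(definition.get("parameters", {}))
    return merged


params = resolve_parameters(NODES["quantum.example.org"], CLASSES)
```

Once the downtime message is removed from the node definition again, the
same call falls back to the value set by 'debiannodes'.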

At this point it should be noted that parameters whose values are lists or
key-value pairs don't get overwritten by child classes or node definitions,
but the information gets merged (recursively) instead.
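
A recursive merge of this kind could look roughly like the following Python
sketch. The function name merge, the 'ports' parameter, and the choice to
extend lists by concatenation are assumptions made for illustration; how
reclass treats lists in detail is not spelled out here.

```python
import copy


def merge(base, override):
    """Recursively merge `override` into a copy of `base`."""
    if isinstance(base, dict) and isinstance(override, dict):
        result = copy.deepcopy(base)
        for key, value in override.items():
            if key in result:
                result[key] = merge(result[key], value)
            else:
                result[key] = copy.deepcopy(value)
        return result
    if isinstance(base, list) and isinstance(override, list):
        # Assumption: lists are extended rather than replaced.
        return copy.deepcopy(base) + copy.deepcopy(override)
    # Scalars (or mismatched types): the more specific value wins.
    return copy.deepcopy(override)


parent = {"ssh.server": {"permit_root_login": "no", "ports": [22]}}
child = {"ssh.server": {"ports": [2222]}}
merged = merge(parent, child)
```

The child only mentions 'ports', yet 'permit_root_login' survives in the
result, because the nested key-value pairs are merged instead of replaced.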

Similarly to parameters, applications also accumulate during the recursive
walk through the class ancestry. It is possible for a node or child class to
_remove_ an application added by a parent class, by prefixing the application
with '~'.
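
As a rough Python sketch of how such an application list might be built up
(the function name merge_applications is invented, and skipping duplicates is
an assumption of this sketch, not documented behaviour):

```python
def merge_applications(inherited, new):
    """Append `new` applications to `inherited`; '~name' removes `name`."""
    result = list(inherited)
    for app in new:
        if app.startswith("~"):
            name = app[1:]
            # Remove only if already present; a later addition is still possible.
            if name in result:
                result.remove(name)
        elif app not in result:
            # Assumption: an application that is already listed is not added twice.
            result.append(app)
    return result


apps = merge_applications(["motd", "firewalled"], ["~firewalled", "ssh.server"])
```

Here the parent contributed 'motd' and 'firewalled', and the child drops
'firewalled' again while adding 'ssh.server'.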

Finally, reclass happily lets you use multiple inheritance, and ensures that
the resolution of parameters is still well-defined. Here's another example
building upon the one about /etc/motd above:

  'quantum.example.org' (which is back up and therefore its node definition no
  longer contains a message-of-the-day) is at a site in Munich. Therefore, it
  is a child of the class 'hosted@munich'. This class is independent of the
  'unixnodes' hierarchy; 'quantum.example.org' derives from both.

  In this example infrastructure, 'hosted@munich' is more specific than
  'debiannodes' because there are plenty of Debian nodes at other sites (and
  some non-Debian nodes in Munich). Therefore, 'quantum.example.org' derives
  from 'hosted@munich' _after_ 'debiannodes'.

  When an electricity outage is expected over the weekend in Munich, the admin
  can change the message-of-the-day in the 'hosted@munich' class, and it will
  apply to all hosts in Munich.

  However, not all hosts in Munich have /etc/motd, because some of them are
  'windowsnodes'. Since the 'windowsnodes' ancestry does not specify the
  'motd' application, those hosts have access to the message-of-the-day in the
  node variables, but the message won't get used…

  … unless, of course, 'windowsnodes' specified a Windows-specific application
  to bring such notices to the attention of the user.
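
To see why the order of parent classes matters, it can help to list the
definitions in the order in which their data would be merged. The Python
sketch below does that for the example above. The PARENTS table and the
function name merge_order are made up, and merging a definition only once
when it is reachable via several paths is an assumption of the sketch.

```python
# Hypothetical class graph for the example: name -> list of parent classes.
PARENTS = {
    "unixnodes": [],
    "debiannodes": ["unixnodes"],
    "hosted@munich": [],
    "quantum.example.org": ["debiannodes", "hosted@munich"],
}


def merge_order(name, parents, seen=None):
    """Return names in the order their data would be merged (least specific first)."""
    seen = [] if seen is None else seen
    for parent in parents[name]:
        merge_order(parent, parents, seen)
    if name not in seen:  # assumption: a definition reached twice is merged once
        seen.append(name)
    return seen


order = merge_order("quantum.example.org", PARENTS)
```

Since later entries override earlier ones, listing 'hosted@munich' after
'debiannodes' is what lets the Munich outage notice win over the Debian
message.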

reclass operations
~~~~~~~~~~~~~~~~~~
While reclass has been built to support different storage backends through
plugins, currently only the 'yaml_fs' storage backend exists. This is a very
simple, yet powerful, YAML-based backend, using flat files on the filesystem
(as suggested by the _fs suffix).

yaml_fs works with two directories, one for node definitions, and another for
class definitions. It is possible to use a single directory for both, but that
could get messy and is therefore not recommended.

Files in those directories are YAML files, specifying key-value pairs. The
following three keys are read by reclass:

  classes:      a list of parent classes
  applications: a list of applications to append to the applications defined
                by ancestors. If an application name starts with '~', it
                removes this application from the list if it had already
                been added; it does not, however, prevent a future addition.
                E.g. '~firewalled'
  parameters:   key-value pairs to set defaults in class definitions, override
                existing data, or provide node-specific information in node
                specifications.
                By convention, parameters corresponding to an application
                should be provided as subkey-value pairs, keyed by the name of
                the application, e.g.

                  applications:
                  - ssh.server
                  parameters:
                    ssh.server:
                      permit_root_login: no

reclass starts out reading a node definition file, obtains the list of
classes, then reads the files corresponding to these classes, recursively
reading parent classes, and finally merges the applications list (appending
entries unless they are prefixed with '~', in which case the named
application is removed) and the parameters (merging recursively, with more
specific definitions overriding more general ones).

Usage
~~~~~
For information on how to use reclass directly, invoke reclass.py with --help
and study the output.

More commonly, however, use of reclass will happen indirectly, and through
so-called adapters, e.g. /…/reclass/adapters/ansible. The job of an adapter is
to translate between different invocation paradigms, provide a sane set of
default options, and massage the data from reclass into the format expected by
the automation tool in use.

Configuration file
~~~~~~~~~~~~~~~~~~
reclass can read some of its configuration from a file. The file is
a YAML file and simply defines key-value pairs.

The configuration file can be used to set defaults for all the options that
are otherwise configurable via the command-line interface, so please use the
--help output of reclass for reference. The command-line option '--nodes-uri'
corresponds to the key 'nodes_uri' in the configuration file. For example:

  storage_type: yaml_fs
  pretty_print: True
  output: json
  nodes_uri: ../nodes
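
The relation between configuration keys and command-line options can be
illustrated with Python's argparse. This is a sketch of the general idea
only, and not how reclass itself is implemented: the program name, the
function build_parser, and the option names '--storage-type' and '--output'
(derived from the keys above) are assumptions.

```python
import argparse


def build_parser(config):
    """Build a CLI parser whose defaults come from configuration-file keys."""
    parser = argparse.ArgumentParser(prog="reclass-sketch")
    parser.add_argument("--storage-type")
    parser.add_argument("--nodes-uri")
    parser.add_argument("--output")
    parser.add_argument("--pretty-print", action="store_true")
    # argparse stores '--nodes-uri' as 'nodes_uri', matching the config key.
    parser.set_defaults(**config)
    return parser


# The example configuration from above, already parsed into a dictionary.
config = {
    "storage_type": "yaml_fs",
    "pretty_print": True,
    "output": "json",
    "nodes_uri": "../nodes",
}
args = build_parser(config).parse_args(["--nodes-uri", "/srv/reclass/nodes"])
```

Values given on the command line take precedence, and everything else falls
back to what the file provides.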

reclass first looks in the current directory for a file called
'reclass-config.yml' and if no such file is found, it looks "next to" the
reclass script itself. Adapters implement their own lookup logic.
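
The lookup order just described might be sketched in Python as follows. The
function name find_config and its two parameters are hypothetical, and the
temporary directories merely stand in for the current directory and the
directory containing the reclass script.

```python
import tempfile
from pathlib import Path

CONFIG_NAME = "reclass-config.yml"


def find_config(cwd, script_dir):
    """Return the first existing config file: current directory, then script directory."""
    for directory in (cwd, script_dir):
        candidate = Path(directory) / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None


# Demonstration with throwaway directories instead of real locations.
with tempfile.TemporaryDirectory() as cwd, tempfile.TemporaryDirectory() as script_dir:
    (Path(script_dir) / CONFIG_NAME).write_text("storage_type: yaml_fs\n")
    found = find_config(cwd, script_dir)
    found_next_to_script = found is not None and found.parent == Path(script_dir)
```

In a real script, the two arguments would typically be Path.cwd() and the
directory of the script file.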

Integration with Ansible
~~~~~~~~~~~~~~~~~~~~~~~~
The integration between reclass and Ansible is performed through an adapter,
and need not concern us too much.

However, Ansible has no concept of "nodes", "applications", "parameters", and
"classes". Therefore it is necessary to explain how those correspond to
Ansible. Crudely, the following mapping exists:

  nodes         hosts
  classes       groups
  applications  playbooks
  parameters    host_vars

reclass does not provide any group_vars because of its node-centric
perspective. While class definitions include parameters, those are inherited
by the node definitions and hence become host_vars.

reclass also does not provide playbooks, nor does it deal with any of the
related Ansible concepts, i.e. vars_files, vars, tasks, handlers, roles, etc.

Let it be said at this point that you'll probably want to stop using
host_vars, group_vars and vars_files altogether, if only because you should
no longer need them, but also because the variable precedence rules of
Ansible are full of surprises, at least to me.

reclass' Ansible adapter massages the reclass output into Ansible-usable data,
namely:

  - Every class in the ancestry of a node becomes a group to Ansible. This is
    mainly useful to be able to target nodes during interactive use of
    Ansible, e.g.

      ansible debiannode@wheezy -m command -a 'apt-get upgrade'
        → upgrade all Debian nodes running wheezy

      ansible ssh.server -m command -a 'invoke-rc.d ssh restart'
        → restart all SSH server processes

      ansible mailserver -m command -a 'tail -n1000 /var/log/mail.err'
        → obtain the last 1,000 lines of all mailserver error log files

    The attentive reader might stumble over the use of singular words, whereas
    it might make more sense to address all 'mailserver*s*' with this tool.
    This is convention and up to you. I prefer to think of my node as
    a (singular) mailserver when I add 'mailserver' to its parent classes.

  - Every entry in the list of a host's applications might well correspond to
    an Ansible playbook. Therefore, reclass creates an (Ansible) group for
    every application, and appends '_hosts' to the name.

    For instance, the ssh.server class adds the ssh.server application to
    a node's application list. Now the admin might create an Ansible playbook
    like so:

      - name: SSH server management
        hosts: ssh.server_hosts            ← SEE HERE
        tasks:
          - name: install SSH package
            action: …
          …

    There's a bit of redundancy in this, but unfortunately Ansible playbooks
    hardcode the nodes to which a playbook applies.

    The suggested way to use Ansible site-wide is then to create a 'site'
    playbook that includes all the other playbooks (which will hopefully be
    based on Ansible roles), and then to invoke Ansible like this:

      ansible-playbook site.yml

    or, if you prefer only to reconfigure a subset of nodes, e.g. all
    webservers:

      ansible-playbook site.yml --limit webserver

    Again, if the singular word 'webserver' puts you off, change the
    convention as you wish.

  - Parameters corresponding to a node become host_vars for that host.
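
The three rules above can be condensed into a short Python sketch. It is not
the adapter's actual code: the shape of the input dictionary, the function
name to_ansible, and the sample values are assumptions made for illustration.

```python
# Hypothetical resolved data per node, as reclass might hand it to an adapter.
nodes = {
    "quantum.example.org": {
        "classes": ["debiannodes", "ssh.server"],
        "applications": ["motd", "ssh.server"],
        "parameters": {"motd": "Welcome to Debian wheezy."},
    },
}


def to_ansible(nodes):
    """Build group -> hosts and host -> vars mappings from resolved node data."""
    groups, host_vars = {}, {}
    for host, data in sorted(nodes.items()):
        # Rule 1: every class in the ancestry becomes a group.
        for cls in data["classes"]:
            groups.setdefault(cls, []).append(host)
        # Rule 2: every application becomes a group named '<application>_hosts'.
        for app in data["applications"]:
            groups.setdefault(app + "_hosts", []).append(host)
        # Rule 3: the node's parameters become the host's variables.
        host_vars[host] = data["parameters"]
    return groups, host_vars


groups, host_vars = to_ansible(nodes)
```

Note how 'ssh.server' shows up twice, once as the class group 'ssh.server'
and once as the application group 'ssh.server_hosts'. That is the redundancy
mentioned above.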

Contributing to reclass
~~~~~~~~~~~~~~~~~~~~~~~
Contributions to reclass are very welcome. Since I prefer to keep a somewhat
clean history, I will not merge pull requests. Please send your patches using
git-format-patch and git-send-email to reclass@pobox.madduck.net.

I have added rudimentary unit tests, and it would be nice if you could submit
your changes with appropriate changes to the tests. To run tests, invoke
./run_tests.py in the top-level checkout directory.

If you have larger ideas, I look forward to discussing them with you.

 -- martin f. krafft <madduck@madduck.net>  Fri, 14 Jun 2013 19:30:19 +0200