=============================================================
      reclass – recursive external node classification
=============================================================
reclass is © 2007–2013 martin f. krafft <madduck@madduck.net>
and available under the terms of the Artistic Licence 2.0
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

reclass is an "external node classifier" (ENC) as can be used with automation
tools, such as Puppet, Salt, and Ansible.

The purpose of an ENC is to allow a system administrator to maintain an
inventory of nodes to be managed, completely separately from the configuration
of the automation tool. Usually, the external node classifier completely
replaces the tool-specific inventory (such as site.pp for Puppet, or
/etc/ansible/hosts).

reclass allows you to define your nodes through class inheritance, while
always being able to override details of classes further up the tree. Think of
classes as feature sets, as commonalities between nodes, or as tags. Add to
that the ability to nest classes (multiple inheritance is allowed,
well-defined, and encouraged), and you can piece together your infrastructure
from smaller bits, eliminating redundancy and exposing all important
parameters in a single location, logically organised.

In general, the ENC fulfills two jobs:

  - it provides information about groups of nodes and group memberships
  - it gives access to node-specific information, such as variables

While reclass was born into a Puppet environment and has also been used with
Salt, the version you have in front of you is a rewrite from scratch, which
was targeted at Ansible. However, care was taken to make the code flexible
enough to allow it to be used from Salt, Puppet, and maybe even other tools as
well.

In this document, you will find an overview of the concepts of reclass, the
way it works, and how it can be tied in with Ansible.

Quick start: Ansible
~~~~~~~~~~~~~~~~~~~~
The following steps should get you up and running quickly. Generally, we will
be working in /etc/ansible. However, if you are using a source-code checkout
of Ansible, you might also want to work inside the ./hacking directory
instead.

Or you can also just look into ./examples/ansible of your reclass checkout,
where the following steps have already been prepared.

/…/reclass refers to the location of your reclass checkout.

  1. Symlink /…/reclass/adapters/ansible to /etc/ansible/hosts (or
     ./hacking/hosts)

  2. Copy the two directories 'nodes' and 'classes' from the example
     subdirectory in the reclass checkout to /etc/ansible

     If you prefer to put those directories elsewhere, you can create
     /etc/ansible/reclass-config.yml with contents such as

       storage_type: yaml_fs
       nodes_uri: /srv/reclass/nodes
       classes_uri: /srv/reclass/classes

     Note that yaml_fs is currently the only supported storage_type, and it's
     the default if you don't set it.

  3. Check out your inventory by invoking

       ./hosts --list

     which should return 5 groups in JSON-format, and each group has exactly
     one member 'localhost'.

  4. See the node information for 'localhost':

       ./hosts --host localhost

     This should print a set of keys and values, including a greeting,
     a colour, and a sub-class called 'RECLASS'.

  5. Execute some ansible commands, e.g.

       ansible -i hosts \* --list-hosts
       ansible -i hosts \* -m ping
       ansible -i hosts \* -m debug -a 'msg="${greeting}"'
       ansible -i hosts \* -m setup
       ansible-playbook -i hosts test.yml

  6. You can also invoke reclass directly, which gives a slightly different
     view onto the same data, i.e. before it has been adapted for Ansible:

       /…/reclass.py --pretty-print --inventory
       /…/reclass.py --pretty-print --nodeinfo localhost

reclass concepts
~~~~~~~~~~~~~~~~
reclass assumes a node-centric perspective into your inventory. This is
obvious when you query reclass for node-specific information, but it might not
be clear when you ask reclass to provide you with a list of groups. In that
case, reclass loops over all nodes it can find in its database, reads all
information it can find about the nodes, and finally reorders the result to
provide a list of groups with the nodes they contain.
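
The reordering step can be pictured with a few lines of Python. This is only
a sketch and not the actual reclass code: the function name invert_to_groups,
the shape of the input, and the node 'photon.example.org' are made up for the
sake of illustration.

```python
from collections import defaultdict


def invert_to_groups(node_classes):
    """Turn a {node: [classes]} mapping into a {class: [nodes]} mapping.

    Hypothetical helper: it only illustrates the node-centric loop
    followed by the reordering into groups.
    """
    groups = defaultdict(list)
    for node, classes in node_classes.items():
        for cls in classes:
            groups[cls].append(node)
    # Sort the members so that the result does not depend on input order.
    return {cls: sorted(members) for cls, members in groups.items()}


# Assumed sample data; 'photon.example.org' is an invented node name.
nodes = {
    "quantum.example.org": ["unixnodes", "debiannodes"],
    "photon.example.org": ["unixnodes"],
}
groups = invert_to_groups(nodes)
```

The point is that groups are never stored anywhere; they fall out of the
per-node data.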

Since the term 'groups' is somewhat ambiguous, it helps to start off with
a short glossary of reclass-specific terminology:

  node:         A node, usually a computer in your infrastructure
  class:        A category, tag, feature, or role that applies to a node
                Classes may be nested, i.e. there can be a class hierarchy
  application:  A specific set of behaviour to apply to members of a class
  parameter:    Node-specific variables, with inheritance throughout the class
                hierarchy.

A class consists of zero or more parent classes, zero or more applications,
and any number of parameters.

A node is almost equivalent to a class, except that it usually does not (but
can) specify applications.

When reclass parses a node (or class) definition and encounters a parent
class, it recurses to this parent class first before reading any data of the
node (or class). When reclass returns from the recursive, depth-first walk, it
then merges all information of the current node (or class) into the
information it obtained during the recursion.

Furthermore, a node (or class) may define a list of classes it derives from,
in which case classes defined further down the list will be able to override
classes further up the list.

Information in this context means either a list of applications or a set of
parameters.

The interaction between the depth-first walk and the delayed merging of data
means that the node (and any class) may override any of the data defined by
any of the parent classes (ancestors). This is in line with the assumption
that more specific definitions ("this specific host") should have a higher
precedence than more general definitions ("all webservers", which includes all
webservers in Munich, which includes "this specific host", for example).

Here's a quick example, showing how parameters accumulate and can get
replaced.

  All unixnodes (i.e. nodes that have the 'unixnodes' class in their ancestry)
  have /etc/motd centrally-managed (through the 'motd' application), and the
  unixnodes class definition provides a generic message-of-the-day to be put
  into this file.

  All debiannodes, which are descendants of unixnodes, should include the
  Debian codename in this message, so the message-of-the-day is overwritten in
  the debiannodes class.

  The node 'quantum.example.org' will have a scheduled downtime this weekend,
  so until Monday, an appropriate message-of-the-day is added to the node
  definition.

  When the 'motd' application runs, it receives the appropriate
  message-of-the-day (from 'quantum.example.org' when run on that host) and
  writes it into /etc/motd.
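
To make the precedence concrete, here is a minimal Python sketch of the
depth-first walk for this example. It is not the reclass implementation: the
in-memory CLASSES table, the resolve function, and the message texts are
assumptions made for illustration, and it only handles plain (scalar)
parameter values.

```python
# Hypothetical in-memory stand-in for the class definitions.
CLASSES = {
    "unixnodes": {
        "classes": [],
        "parameters": {"motd": "Welcome!"},
    },
    "debiannodes": {
        "classes": ["unixnodes"],
        "parameters": {"motd": "Welcome to Debian wheezy!"},
    },
}

# Hypothetical node definition for 'quantum.example.org'.
NODE = {
    "classes": ["debiannodes"],
    "parameters": {"motd": "Scheduled downtime this weekend"},
}


def resolve(entity, classes):
    """Return the parameters of a node or class after a depth-first walk."""
    merged = {}
    # Recurse into the parents first, in the order they are listed, so
    # that later parents override earlier ones.
    for parent in entity.get("classes", []):
        merged.update(resolve(classes[parent], classes))
    # Only then merge the entity's own data, which therefore wins.
    merged.update(entity.get("parameters", {}))
    return merged


node_params = resolve(NODE, CLASSES)
class_params = resolve(CLASSES["debiannodes"], CLASSES)
```

Once the node definition drops its own message on Monday, the same walk yields
the message from 'debiannodes' again.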

At this point it should be noted that parameters whose values are lists or
key-value pairs don't get overwritten by child classes or node definitions,
but the information gets merged (recursively) instead.
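
A recursive merge of this kind could look like the following Python sketch.
The function name deep_merge is hypothetical, and the choice to concatenate
lists (rather than, say, deduplicate them) is an assumption of this example,
not a statement about what reclass does internally.

```python
import copy


def deep_merge(base, override):
    """Merge 'override' into 'base' without modifying either argument.

    Assumed rules for this sketch: dictionaries are merged key by key,
    lists are concatenated, and any other value is simply replaced.
    """
    if isinstance(base, dict) and isinstance(override, dict):
        result = copy.deepcopy(base)
        for key, value in override.items():
            if key in base:
                result[key] = deep_merge(base[key], value)
            else:
                result[key] = copy.deepcopy(value)
        return result
    if isinstance(base, list) and isinstance(override, list):
        return copy.deepcopy(base) + copy.deepcopy(override)
    return copy.deepcopy(override)


# Invented sample data: a parent class and a more specific definition.
parent = {"motd": {"greeting": "Hello", "tags": ["unix"]}, "colour": "blue"}
child = {"motd": {"closing": "Bye", "tags": ["debian"]}, "colour": "red"}
merged = deep_merge(parent, child)
```

Here the scalar 'colour' is replaced, while the 'motd' dictionary keeps the
parent's greeting and gains the child's closing.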

Similarly to parameters, applications also accumulate during the recursive
walk through the class ancestry. It is possible for a node or child class to
_remove_ an application added by a parent class, by prefixing the application
with '~'.
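
As a rough Python sketch of this accumulation (the function name
merge_applications is made up, and skipping duplicates is an assumption of
this example rather than documented behaviour):

```python
def merge_applications(inherited, new):
    """Append 'new' applications to 'inherited', honouring the '~' prefix."""
    result = list(inherited)
    for app in new:
        if app.startswith("~"):
            # '~name' removes 'name' if an ancestor has already added it.
            name = app[1:]
            if name in result:
                result.remove(name)
        elif app not in result:
            # Assumption: an application is only listed once.
            result.append(app)
    return result


from_parents = ["motd", "firewalled"]
after_child = merge_applications(from_parents, ["~firewalled", "ssh.server"])
# A removal only affects what has been added so far; a later definition
# can add the application again.
after_node = merge_applications(after_child, ["firewalled"])
```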

Finally, reclass happily lets you use multiple inheritance, and ensures that
the resolution of parameters is still well-defined. Here's another example
building upon the one about /etc/motd above:

  'quantum.example.org' (which is back up and therefore its node definition no
  longer contains a message-of-the-day) is at a site in Munich. Therefore, it
  is a child of the class 'hosted@munich'. This class is independent of the
  'unixnodes' hierarchy; 'quantum.example.org' derives from both.

  In this example infrastructure, 'hosted@munich' is more specific than
  'debiannodes' because there are plenty of Debian nodes at other sites (and
  some non-Debian nodes in Munich). Therefore, 'quantum.example.org' derives
  from 'hosted@munich' _after_ 'debiannodes'.

  When an electricity outage is expected over the weekend in Munich, the admin
  can change the message-of-the-day in the 'hosted@munich' class, and it will
  apply to all hosts in Munich.

  However, not all hosts in Munich have /etc/motd, because some of them are
  'windowsnodes'. Since the 'windowsnodes' ancestry does not specify the
  'motd' application, those hosts have access to the message-of-the-day in the
  node variables, but the message won't get used…

  … unless, of course, 'windowsnodes' specified a Windows-specific application
  to bring such notices to the attention of the user.

It's also trivial to ensure a certain order of class evaluation. Here's
another example:

  The 'ssh.server' class defines the 'permit_root_login' parameter to 'no'.

  The 'backuppc.client' class defines the parameter to 'without-password',
  because the BackupPC server might need to log in to the host as root.

  Now, what happens if the admin accidentally provides the following two
  classes?

    - backuppc.client
    - ssh.server

  Theoretically, this would mean 'permit_root_login' gets set to 'no'.

  However, since all 'backuppc.client' need 'ssh.server' (at least in most
  setups), the class 'backuppc.client' itself derives from 'ssh.server',
  ensuring that 'ssh.server' gets parsed before 'backuppc.client'.

  When reclass returns to the node and encounters the 'ssh.server' class
  defined there, it simply skips over it.
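
The skipping of already-parsed classes can be sketched in Python as follows.
Again, this is an illustration under assumptions: the CLASSES table and the
resolve function are hypothetical, and the parameter is kept at the top level
for brevity instead of being keyed by application name.

```python
# Hypothetical class definitions for the example above.
CLASSES = {
    "ssh.server": {
        "classes": [],
        "parameters": {"permit_root_login": "no"},
    },
    "backuppc.client": {
        "classes": ["ssh.server"],
        "parameters": {"permit_root_login": "without-password"},
    },
}


def resolve(entity, classes, seen=None, order=None):
    """Depth-first walk that parses every class at most once.

    Returns the merged parameters and the order in which the classes
    were merged.
    """
    seen = set() if seen is None else seen
    order = [] if order is None else order
    params = {}
    for name in entity.get("classes", []):
        if name in seen:
            continue  # already handled via another ancestor: skip it
        seen.add(name)
        parent_params, _ = resolve(classes[name], classes, seen, order)
        params.update(parent_params)
        order.append(name)
    params.update(entity.get("parameters", {}))
    return params, order


# The "accidental" ordering from the example.
node = {"classes": ["backuppc.client", "ssh.server"]}
params, order = resolve(node, CLASSES)
```

Because 'ssh.server' has already been merged as a parent of
'backuppc.client', its second mention in the node is ignored and
'without-password' survives.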

reclass operations
~~~~~~~~~~~~~~~~~~
While reclass has been built to support different storage backends through
plugins, currently only the 'yaml_fs' storage backend exists. This is a very
simple, yet powerful, YAML-based backend, using flat files on the filesystem
(as suggested by the _fs postfix).

yaml_fs works with two directories, one for node definitions, and another for
class definitions. It is possible to use a single directory for both, but that
could get messy and is therefore not recommended.

Files in those directories are YAML-files, specifying key-value pairs. The
following three keys are read by reclass:

  classes:       a list of parent classes
  applications:  a list of applications to append to the applications defined
                 by ancestors. If an application name starts with '~', it
                 removes this application from the list, if it had already
                 been added, but it does not prevent a future addition.
                 E.g. '~firewalled'
  parameters:    key-value pairs to set defaults in class definitions, override
                 existing data, or provide node-specific information in node
                 specifications.
                 By convention, parameters corresponding to an application
                 should be provided as subkey-value pairs, keyed by the name of
                 the application, e.g.

                 applications:
                 - ssh.server
                 parameters:
                   ssh.server:
                     permit_root_login: no

reclass starts out reading a node definition file, obtains the list of
classes, then reads the files corresponding to these classes, recursively
reading parent classes, and finally merges the applications list (appending
each entry unless it is prefixed with '~') and the parameters.

Version control
~~~~~~~~~~~~~~~
I recommend you maintain your reclass inventory database in Git, right from
the start.

Usage
~~~~~
For information on how to use reclass directly, invoke reclass.py with --help
and study the output.

More commonly, however, use of reclass will happen indirectly, and through
so-called adapters, e.g. /…/reclass/adapters/ansible. The job of an adapter is
to translate between different invocation paradigms, provide a sane set of
default options, and massage the data from reclass into the format expected by
the automation tool in use.

Configuration file
~~~~~~~~~~~~~~~~~~
reclass can read some of its configuration from a file. The file is
a YAML-file and simply defines key-value pairs.

The configuration file can be used to set defaults for all the options that
are otherwise configurable via the command-line interface, so please use the
--help output of reclass for reference. The command-line option '--nodes-uri'
corresponds to the key 'nodes_uri' in the configuration file. For example:

  storage_type: yaml_fs
  pretty_print: True
  output: json
  nodes_uri: ../nodes

reclass first looks in the current directory for the file called
'reclass-config.yml' and if no such file is found, it looks "next to" the
reclass script itself. Adapters implement their own lookup logic.
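
The lookup order can be illustrated with a short Python sketch. The function
find_config and its arguments are hypothetical; the sketch assumes that "next
to" means the directory containing the reclass script, and it does not cover
the adapters, which use their own logic.

```python
import os

CONFIG_NAME = "reclass-config.yml"


def find_config(script_path, cwd=None):
    """Return the first configuration file found, or None.

    Assumed order: the current directory first, then the directory
    that contains the script given by 'script_path'.
    """
    cwd = os.getcwd() if cwd is None else cwd
    script_dir = os.path.dirname(os.path.abspath(script_path))
    candidates = [
        os.path.join(cwd, CONFIG_NAME),
        os.path.join(script_dir, CONFIG_NAME),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

A caller would pass the location of the script, e.g. find_config(sys.argv[0]),
and fall back to built-in defaults when None is returned.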

Integration with Ansible
~~~~~~~~~~~~~~~~~~~~~~~~
The integration between reclass and Ansible is performed through an adapter,
and need not concern us too much.

However, Ansible has no concept of "nodes", "applications", "parameters", and
"classes". Therefore it is necessary to explain how those correspond to
Ansible. Crudely, the following mapping exists:

  nodes         hosts
  classes       groups
  applications  playbooks
  parameters    host_vars

reclass does not provide any group_vars because of its node-centric
perspective. While class definitions include parameters, those are inherited
by the node definitions and hence become host_vars.

reclass also does not provide playbooks, nor does it deal with any of the
related Ansible concepts, i.e. vars_files, vars, tasks, handlers, roles, etc.

  Let it be said at this point that you'll probably want to stop using
  host_vars, group_vars and vars_files altogether: not only because you
  should no longer need them, but also because the variable precedence rules
  of Ansible are full of surprises, at least to me.

reclass' Ansible adapter massages the reclass output into Ansible-usable data,
namely:

  - Every class in the ancestry of a node becomes a group to Ansible. This is
    mainly useful to be able to target nodes during interactive use of
    Ansible, e.g.

      ansible debiannode@wheezy -m command -a 'apt-get upgrade'
        → upgrade all Debian nodes running wheezy

      ansible ssh.server -m command -a 'invoke-rc.d ssh restart'
        → restart all SSH server processes

      ansible mailserver -m command -a 'tail -n1000 /var/log/mail.err'
        → obtain the last 1,000 lines of all mailserver error log files

    The attentive reader might stumble over the use of singular words, whereas
    it might make more sense to address all 'mailserver*s*' with this tool.
    This is convention and up to you. I prefer to think of my node as
    a (singular) mailserver when I add 'mailserver' to its parent classes.

  - Every entry in the list of a host's applications might well correspond to
    an Ansible playbook. Therefore, reclass creates an (Ansible-)group for
    every application, and adds '_hosts' to the name.

    For instance, the ssh.server class adds the ssh.server application to
    a node's application list. Now the admin might create an Ansible playbook
    like so:

      - name: SSH server management
        hosts: ssh.server_hosts              ← SEE HERE
        tasks:
          - name: install SSH package
            action: …

    There's a bit of redundancy in this, but unfortunately Ansible playbooks
    hardcode the nodes to which a playbook applies.

    It's now trivial to apply this playbook across your infrastructure:

      ansible-playbook ssh.server.yml

    My suggested way to use Ansible site-wide is then to create a 'site'
    playbook that includes all the other playbooks (which shall hopefully be
    based on Ansible roles), and then to invoke Ansible like this:

      ansible-playbook site.yml

    or, if you prefer only to reconfigure a subset of nodes, e.g. all
    webservers:

      ansible-playbook site.yml --limit webserver

    Again, if the singular word 'webserver' puts you off, change the
    convention as you wish.

    And if anyone comes up with a way to directly connect groups in the
    inventory with roles, thereby making it unnecessary to write playbook
    files (containing redundant information), please tell me!

  - Parameters corresponding to a node become host_vars for that host.
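
The three rules above can be summarised in a small Python sketch. It is not
the adapter's code: the functions ansible_groups and ansible_host_vars, the
shape of the resolved node data, and the sample parameter are assumptions made
for illustration.

```python
import json


def ansible_groups(nodes):
    """Build a {group: [hosts]} mapping from resolved node data.

    'nodes' is an assumed structure: {host: {"classes": [...],
    "applications": [...], "parameters": {...}}}.
    """
    groups = {}
    for host, info in sorted(nodes.items()):
        # Every class in the ancestry becomes a group.
        for cls in info.get("classes", []):
            groups.setdefault(cls, []).append(host)
        # Every application becomes a group with '_hosts' appended.
        for app in info.get("applications", []):
            groups.setdefault(app + "_hosts", []).append(host)
    return groups


def ansible_host_vars(nodes, host):
    """Parameters of a node become the variables of that host."""
    return nodes[host].get("parameters", {})


# Invented sample data for a single node.
nodes = {
    "localhost": {
        "classes": ["ssh.server"],
        "applications": ["ssh.server"],
        "parameters": {"ssh.server": {"permit_root_login": "no"}},
    },
}
inventory = ansible_groups(nodes)
print(json.dumps(inventory, indent=2, sort_keys=True))
```

The first function corresponds to what an inventory script returns for
--list, the second to what it returns for --host.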

It is possible to include Jinja2-style variables in parameter values, like
you would in Ansible. This is especially powerful in combination with the
recursive merging, e.g.

  parameters:
    motd:
      greeting: Welcome to {{ ansible_fqdn }}!
      closing: This system is part of {{ realm }}

Now you just need to specify realm somewhere. The reference can reside in
a parent class, while the variable is defined e.g. in the node.

Contributing to reclass
~~~~~~~~~~~~~~~~~~~~~~~
Contributions to reclass are very welcome. Since I prefer to keep a somewhat
clean history, I will not merge pull requests. Please send your patches using
git-format-patch and git-send-email to reclass@pobox.madduck.net.

I have added rudimentary unit tests, and it would be nice if you could submit
your changes with appropriate changes to the tests. To run tests, invoke
./run_tests.py in the top-level checkout directory.

If you have larger ideas, I'll be looking forward to discussing them with you.

 -- martin f. krafft <madduck@madduck.net>  Fri, 14 Jun 2013 22:12:05 +0200