README - packaging/sources/reclass - Gitiles

 =============================================================
       reclass — recursive external node classification
 =============================================================
 reclass is © 2007–2013 martin f. krafft <madduck@madduck.net>
 and available under the terms of the Artistic Licence 2.0
 '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

 reclass is an "external node classifier" (ENC) as can be used with automation
 tools, such as Puppet, Salt, and Ansible.

 The purpose of an ENC is to allow a system administrator to maintain an
 inventory of nodes to be managed, completely separately from the configuration
 of the automation tool. Usually, the external node classifier completely
 replaces the tool-specific inventory (such as site.pp for Puppet,
 ext_pillar/master_tops for Salt, or /etc/ansible/hosts).

 reclass allows you to define your nodes through class inheritance, while
 always able to override details of classes further up the tree. Think of
 classes as feature sets, as commonalities between nodes, or as tags. Add to
 that the ability to nest classes (multiple inheritance is allowed,
 well-defined, and encouraged), and piece together your infrastructure from
 smaller bits, eliminating redundancy and exposing all important parameters to
 a single location, logically organised.

 In general, the ENC fulfills two jobs:

   - it provides information about groups of nodes and group memberships
   - it gives access to node-specific information, such as variables

 In this document, you will find an overview of the concepts of reclass and the
 way it works. Have a look at README.Salt and README.Ansible for information
 about integration of reclass with these tools.

 Installation
 ~~~~~~~~~~~~
 Before you can use reclass, you need to install it into a place where Python
 can find it. Unless you installed a package from your distribution, the
 following step:

   python setup.py install

 This will install the package to /usr/local, which is likely in your Python
 path. You can check this using

   python -c 'import sys; print sys.path'

 If you want to install to a different location, use --prefix like so:

   python setup.py install --prefix=/opt/local

 More options can be found in the output of

   python setup.py install --help
   python setup.py --help
   python setup.py --help-commands
   python setup.py --help [cmd]

 If you just want to run reclass from source, e.g. because you are going to be
 making and testing changes, install it in "development mode":

   python setup.py develop

 To uninstall:

   python setup.py develop --uninstall

 Uninstallation currently isn't possible for packages installed to /usr/local
 as per the above method, unfortunately: http://bugs.python.org/issue4673.
 The following should do:

   rm -r /usr/local/lib/python*/dist-packages/reclass* /usr/local/bin/reclass

 reclass concepts
 ~~~~~~~~~~~~~~~~
 reclass assumes a node-centric perspective into your inventory. This is
 obvious when you query reclass for node-specific information, but it might not
 be clear when you ask reclass to provide you with a list of groups. In that
 case, reclass loops over all nodes it can find in its database, reads all
 information it can find about the nodes, and finally reorders the result to
 provide a list of groups with the nodes they contain.

 Since the term 'groups' is somewhat ambiguous, it helps to start off with
 a short glossary of reclass-specific terminology:

   node:         A node, usually a computer in your infrastructure
   class:        A category, tag, feature, or role that applies to a node
                 Classes may be nested, i.e. there can be a class hierarchy
   application:  A specific set of behaviour to apply to members of a class
   parameter:    Node-specific variables, with inheritance throughout the class
                 hierarchy.

 A class consists of zero or more parent classes, zero or more applications,
 and any number of parameters.

 A node is almost equivalent to a class, except that it usually does not (but
 can) specify applications.

 When reclass parses a node (or class) definition and encounters a parent
 class, it recurses to this parent class first before reading any data of the
 node (or class). When reclass returns from the recursive, depth first walk, it
 then merges all information of the current node (or class) into the
 information it obtained during the recursion.

 Furthermore, a node (or class) may define a list of classes it derives from,
 in which case classes defined further down the list will be able to override
 classes further up the list.

 Information in this context is essentially one of a list of applications or
 a list of parameters.

 The interaction between the depth-first walk and the delayed merging of data
 means that the node (and any class) may override any of the data defined by
 any of the parent classes (ancestors). This is in line with the assumption
 that more specific definitions ("this specific host") should have a higher
 precedence than more general definitions ("all webservers", which includes all
 webservers in Munich, which includes "this specific host", for example).

 Here's a quick example, showing how parameters accumulate and can get
 replaced.

   All unixnodes (i.e. nodes who have the 'unixnodes' class in their ancestry)
   have /etc/motd centrally-managed (through the 'motd' application), and the
   unixnodes class definition provides a generic message-of-the-day to be put
   into this file.

   All debiannodes, which are descendants of unixnodes, should include the
   Debian codename in this message, so the message-of-the-day is overwritten in
   the debiannodes class.

   The node 'quantum.example.org' will have a scheduled downtime this weekend,
   so until Monday, an appropriate message-of-the-day is added to the node
   definition.

   When the 'motd' application runs, it receives the appropriate
   message-of-the-day (from 'quantum.example.org' when run on that node) and
   writes it into /etc/motd.

 At this point it should be noted that parameters whose values are lists or
 key-value pairs don't get overwritten by children classes or node definitions,
 but the information gets merged (recursively) instead.

 Similarly to parameters, applications also accumulate during the recursive
 walk through the class ancestry. It is possible for a node or child class to
 _remove_ an application added by a parent class, by prefixing the application
 with '~'.

 Finally, reclass happily lets you use multiple inheritance, and ensures that
 the resolution of parameters is still well-defined. Here's another example
 building upon the one about /etc/motd above:

   'quantum.example.org' (which is back up and therefore its node definition no
   longer contains a message-of-the-day) is at a site in Munich. Therefore, it
   is a child of the class 'hosted@munich'. This class is independent of the
   'unixnode' hierarchy, 'quantum.example.org' derives from both.

   In this example infrastructure, 'hosted@munich' is more specific than
   'debiannodes' because there are plenty of Debian nodes at other sites (and
   some non-Debian nodes in Munich). Therefore, 'quantum.example.org' derives
   from 'hosted@munich' _after_ 'debiannodes'.

   When an electricity outage is expected over the weekend in Munich, the admin
   can change the message-of-the-day in the 'hosted@munich' class, and it will
   apply to all hosts in Munich.

   However, not all hosts in Munich have /etc/motd, because some of them are
   'windowsnodes'. Since the 'windowsnodes' ancestry does not specify the
   'motd' application, those hosts have access to the message-of-the-day in the
   node variables, but the message won't get used…

   … unless, of course, 'windowsnodes' specified a Windows-specific application
   to bring such notices to the attention of the user.

 It's also trivial to ensure a certain order of class evaluation. Here's
 another example:

   The 'ssh.server' class defines the 'permit_root_login' parameter to 'no'.

   The 'backuppc.client' class defines the parameter to 'without-password',
   because the BackupPC server might need to log in to the host as root.

   Now, what happens if the admin accidentally provides the following two
   classes?

     - backuppc.client
     - ssh.server

   Theoretically, this would mean 'permit_root_login' gets set to 'no'.

   However, since all 'backuppc.client' nodes need 'ssh.server' (at least in
   most setups), the class 'backuppc.client' itself derives from 'ssh.server',
   ensuring that it gets parsed before 'backuppc.client'.

   When reclass returns to the node and encounters the 'ssh.server' class
   defined there, it simply skips it, as it's already been processed.

 reclass operations
 ~~~~~~~~~~~~~~~~~~
 While reclass has been built to support different storage backends through
 plugins, currently only the 'yaml_fs' storage backend exists. This is a very
 simple, yet powerful, YAML-based backend, using flat files on the filesystem
 (as suggested by the _fs postfix).

 yaml_fs works with two directories, one for node definitions, and another for
 class definitions. It is possible to use a single directory for both, but that
 could get messy and is therefore not recommended.

 Files in those directories are YAML-files, specifying key-value pairs. The
 following three keys are read by reclass:

   classes:     a list of parent classes
   appliations: a list of applications to append to the applications defined by
                ancestors. If an application name starts with '~', it would
                remove this application from the list, if it had already been
                added — but it does not prevent a future addition.
                E.g. '~firewalled'
   parameters:  key-value pairs to set defaults in class definitions, override
                existing data, or provide node-specific information in node
                specifications.
                By convention, parameters corresponding to an application
                should be provided as subkey-value pairs, keyed by the name of
                the application, e.g.

                  applications:
                    - ssh.server
                  parameters:
                    ssh.server:
                      permit_root_login: no

 reclass starts out reading a node definition file, obtains the list of
 classes, then reads the files corresponding to these classes, recursively
 reading parent classes, and finally merges the applications list (append
 unless

 Version control
 ~~~~~~~~~~~~~~~
 I recommend you maintain your reclass inventory database in Git, right from
 the start.

 Usage
 ~~~~~
 For information on how to use reclass directly, invoke reclass.py with --help
 and study the output.

 The three options --inventory-base-uri, --nodes-uri, and --classes-uri
 together specify the location of the inventory. If the base URI is specified,
 then it is prepended to the other two URIs, unless they are absolute URIs. If
 these two URIs are not specified, they default to 'nodes' and 'classes'.
 Therefore, if your inventory is in '/etc/reclass/nodes' and
 '/etc/reclass/classes', all you need to specify is the base URI as
 '/etc/reclass'.

 More commonly, however, use of reclass will happen indirectly, and through
 so-called adapters, e.g. /…/reclass/adapters/salt. The job of an adapter is to
 translate between different invocation paradigms, provide a sane set of
 default options, and massage the data from reclass into the format expected by
 the automation tool in use.

 Configuration file
 ~~~~~~~~~~~~~~~~~~
 reclass can read some of its configuration from a file. The file is
 a YAML-file and simply defines key-value pairs.

 The configuration file can be used to set defaults for all the options that
 are otherwise configurable via the command-line interface, so please use the
 --help output of reclass for reference. The command-line option '--nodes-uri'
 corresponds to the key 'nodes_uri' in the configuration file. For example:

   storage_type: yaml_fs
   pretty_print: True
   output: json
   inventory_base_uri: /etc/reclass
   nodes_uri: ../nodes

 reclass first looks in the current directory for the file called
 'reclass-config.yml' and if no such file is found, it looks "next to" the
 reclass script itself. Adapters implement their own lookup logic.

 Contributing to reclass
 ~~~~~~~~~~~~~~~~~~~~~~~
 Conttributions to reclass are very welcome. Since I prefer to keep a somewhat
 clean history, I will not just merge pull request.

 You can submit pull requests, of course, and I'll rebase them onto HEAD before
 merging. Or send your patches using git-format-patch and git-send-e-mail to
 reclass@pobox.madduck.net.

 I have added rudimentary unit tests, and it would be nice if you could submit
 your changes with appropriate changes to the tests. To run tests, invoke
 ./run_tests.py in the top-level checkout directory.

 If you have larger ideas, I'll be looking forward to discuss them with you.

  -- martin f. krafft <madduck@madduck.net>  Fri, 14 Jun 2013 22:12:05 +0200
	=============================================================
	reclass — recursive external node classification
	=============================================================
	reclass is © 2007–2013 martin f. krafft <madduck@madduck.net>
	and available under the terms of the Artistic Licence 2.0
	'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

	reclass is an "external node classifier" (ENC) as can be used with automation
	tools, such as Puppet, Salt, and Ansible.

	The purpose of an ENC is to allow a system administrator to maintain an
	inventory of nodes to be managed, completely separately from the configuration
	of the automation tool. Usually, the external node classifier completely
	replaces the tool-specific inventory (such as site.pp for Puppet,
	ext_pillar/master_tops for Salt, or /etc/ansible/hosts).

	reclass allows you to define your nodes through class inheritance, while
	always able to override details of classes further up the tree. Think of
	classes as feature sets, as commonalities between nodes, or as tags. Add to
	that the ability to nest classes (multiple inheritance is allowed,
	well-defined, and encouraged), and piece together your infrastructure from
	smaller bits, eliminating redundancy and exposing all important parameters to
	a single location, logically organised.

	In general, the ENC fulfills two jobs:

	- it provides information about groups of nodes and group memberships
	- it gives access to node-specific information, such as variables

	In this document, you will find an overview of the concepts of reclass and the
	way it works. Have a look at README.Salt and README.Ansible for information
	about integration of reclass with these tools.

	Installation
	~~~~~~~~~~~~
	Before you can use reclass, you need to install it into a place where Python
	can find it. Unless you installed a package from your distribution, the
	following step:

	python setup.py install

	This will install the package to /usr/local, which is likely in your Python
	path. You can check this using

	python -c 'import sys; print sys.path'

	If you want to install to a different location, use --prefix like so:

	python setup.py install --prefix=/opt/local

	More options can be found in the output of

	python setup.py install --help
	python setup.py --help
	python setup.py --help-commands
	python setup.py --help [cmd]

	If you just want to run reclass from source, e.g. because you are going to be
	making and testing changes, install it in "development mode":

	python setup.py develop

	To uninstall:

	python setup.py develop --uninstall

	Uninstallation currently isn't possible for packages installed to /usr/local
	as per the above method, unfortunately: http://bugs.python.org/issue4673.
	The following should do:

	rm -r /usr/local/lib/python/dist-packages/reclass /usr/local/bin/reclass

	reclass concepts
	~~~~~~~~~~~~~~~~
	reclass assumes a node-centric perspective into your inventory. This is
	obvious when you query reclass for node-specific information, but it might not
	be clear when you ask reclass to provide you with a list of groups. In that
	case, reclass loops over all nodes it can find in its database, reads all
	information it can find about the nodes, and finally reorders the result to
	provide a list of groups with the nodes they contain.

	Since the term 'groups' is somewhat ambiguous, it helps to start off with
	a short glossary of reclass-specific terminology:

	node: A node, usually a computer in your infrastructure
	class: A category, tag, feature, or role that applies to a node
	Classes may be nested, i.e. there can be a class hierarchy
	application: A specific set of behaviour to apply to members of a class
	parameter: Node-specific variables, with inheritance throughout the class
	hierarchy.

	A class consists of zero or more parent classes, zero or more applications,
	and any number of parameters.

	A node is almost equivalent to a class, except that it usually does not (but
	can) specify applications.

	When reclass parses a node (or class) definition and encounters a parent
	class, it recurses to this parent class first before reading any data of the
	node (or class). When reclass returns from the recursive, depth first walk, it
	then merges all information of the current node (or class) into the
	information it obtained during the recursion.

	Furthermore, a node (or class) may define a list of classes it derives from,
	in which case classes defined further down the list will be able to override
	classes further up the list.

	Information in this context is essentially one of a list of applications or
	a list of parameters.

	The interaction between the depth-first walk and the delayed merging of data
	means that the node (and any class) may override any of the data defined by
	any of the parent classes (ancestors). This is in line with the assumption
	that more specific definitions ("this specific host") should have a higher
	precedence than more general definitions ("all webservers", which includes all
	webservers in Munich, which includes "this specific host", for example).

	Here's a quick example, showing how parameters accumulate and can get
	replaced.

	All unixnodes (i.e. nodes who have the 'unixnodes' class in their ancestry)
	have /etc/motd centrally-managed (through the 'motd' application), and the
	unixnodes class definition provides a generic message-of-the-day to be put
	into this file.

	All debiannodes, which are descendants of unixnodes, should include the
	Debian codename in this message, so the message-of-the-day is overwritten in
	the debiannodes class.

	The node 'quantum.example.org' will have a scheduled downtime this weekend,
	so until Monday, an appropriate message-of-the-day is added to the node
	definition.

	When the 'motd' application runs, it receives the appropriate
	message-of-the-day (from 'quantum.example.org' when run on that node) and
	writes it into /etc/motd.

	At this point it should be noted that parameters whose values are lists or
	key-value pairs don't get overwritten by children classes or node definitions,
	but the information gets merged (recursively) instead.

	Similarly to parameters, applications also accumulate during the recursive
	walk through the class ancestry. It is possible for a node or child class to
	_remove_ an application added by a parent class, by prefixing the application
	with '~'.

	Finally, reclass happily lets you use multiple inheritance, and ensures that
	the resolution of parameters is still well-defined. Here's another example
	building upon the one about /etc/motd above:

	'quantum.example.org' (which is back up and therefore its node definition no
	longer contains a message-of-the-day) is at a site in Munich. Therefore, it
	is a child of the class 'hosted@munich'. This class is independent of the
	'unixnode' hierarchy, 'quantum.example.org' derives from both.

	In this example infrastructure, 'hosted@munich' is more specific than
	'debiannodes' because there are plenty of Debian nodes at other sites (and
	some non-Debian nodes in Munich). Therefore, 'quantum.example.org' derives
	from 'hosted@munich' _after_ 'debiannodes'.

	When an electricity outage is expected over the weekend in Munich, the admin
	can change the message-of-the-day in the 'hosted@munich' class, and it will
	apply to all hosts in Munich.

	However, not all hosts in Munich have /etc/motd, because some of them are
	'windowsnodes'. Since the 'windowsnodes' ancestry does not specify the
	'motd' application, those hosts have access to the message-of-the-day in the
	node variables, but the message won't get used…

	… unless, of course, 'windowsnodes' specified a Windows-specific application
	to bring such notices to the attention of the user.

	It's also trivial to ensure a certain order of class evaluation. Here's
	another example:

	The 'ssh.server' class defines the 'permit_root_login' parameter to 'no'.

	The 'backuppc.client' class defines the parameter to 'without-password',
	because the BackupPC server might need to log in to the host as root.

	Now, what happens if the admin accidentally provides the following two
	classes?

	- backuppc.client
	- ssh.server

	Theoretically, this would mean 'permit_root_login' gets set to 'no'.

	However, since all 'backuppc.client' nodes need 'ssh.server' (at least in
	most setups), the class 'backuppc.client' itself derives from 'ssh.server',
	ensuring that it gets parsed before 'backuppc.client'.

	When reclass returns to the node and encounters the 'ssh.server' class
	defined there, it simply skips it, as it's already been processed.

	reclass operations
	~~~~~~~~~~~~~~~~~~
	While reclass has been built to support different storage backends through
	plugins, currently only the 'yaml_fs' storage backend exists. This is a very
	simple, yet powerful, YAML-based backend, using flat files on the filesystem
	(as suggested by the _fs postfix).

	yaml_fs works with two directories, one for node definitions, and another for
	class definitions. It is possible to use a single directory for both, but that
	could get messy and is therefore not recommended.

	Files in those directories are YAML-files, specifying key-value pairs. The
	following three keys are read by reclass:

	classes: a list of parent classes
	appliations: a list of applications to append to the applications defined by
	ancestors. If an application name starts with '~', it would
	remove this application from the list, if it had already been
	added — but it does not prevent a future addition.
	E.g. '~firewalled'
	parameters: key-value pairs to set defaults in class definitions, override
	existing data, or provide node-specific information in node
	specifications.
	By convention, parameters corresponding to an application
	should be provided as subkey-value pairs, keyed by the name of
	the application, e.g.

	applications:
	- ssh.server
	parameters:
	ssh.server:
	permit_root_login: no

	reclass starts out reading a node definition file, obtains the list of
	classes, then reads the files corresponding to these classes, recursively
	reading parent classes, and finally merges the applications list (append
	unless

	Version control
	~~~~~~~~~~~~~~~
	I recommend you maintain your reclass inventory database in Git, right from
	the start.

	Usage
	~~~~~
	For information on how to use reclass directly, invoke reclass.py with --help
	and study the output.

	The three options --inventory-base-uri, --nodes-uri, and --classes-uri
	together specify the location of the inventory. If the base URI is specified,
	then it is prepended to the other two URIs, unless they are absolute URIs. If
	these two URIs are not specified, they default to 'nodes' and 'classes'.
	Therefore, if your inventory is in '/etc/reclass/nodes' and
	'/etc/reclass/classes', all you need to specify is the base URI as
	'/etc/reclass'.

	More commonly, however, use of reclass will happen indirectly, and through
	so-called adapters, e.g. /…/reclass/adapters/salt. The job of an adapter is to
	translate between different invocation paradigms, provide a sane set of
	default options, and massage the data from reclass into the format expected by
	the automation tool in use.

	Configuration file
	~~~~~~~~~~~~~~~~~~
	reclass can read some of its configuration from a file. The file is
	a YAML-file and simply defines key-value pairs.

	The configuration file can be used to set defaults for all the options that
	are otherwise configurable via the command-line interface, so please use the
	--help output of reclass for reference. The command-line option '--nodes-uri'
	corresponds to the key 'nodes_uri' in the configuration file. For example:

	storage_type: yaml_fs
	pretty_print: True
	output: json
	inventory_base_uri: /etc/reclass
	nodes_uri: ../nodes

	reclass first looks in the current directory for the file called
	'reclass-config.yml' and if no such file is found, it looks "next to" the
	reclass script itself. Adapters implement their own lookup logic.

	Contributing to reclass
	~~~~~~~~~~~~~~~~~~~~~~~
	Conttributions to reclass are very welcome. Since I prefer to keep a somewhat
	clean history, I will not just merge pull request.

	You can submit pull requests, of course, and I'll rebase them onto HEAD before
	merging. Or send your patches using git-format-patch and git-send-e-mail to
	reclass@pobox.madduck.net.

	I have added rudimentary unit tests, and it would be nice if you could submit
	your changes with appropriate changes to the tests. To run tests, invoke
	./run_tests.py in the top-level checkout directory.

	If you have larger ideas, I'll be looking forward to discuss them with you.

	-- martin f. krafft <madduck@madduck.net> Fri, 14 Jun 2013 22:12:05 +0200