Swiss army knife for verifying MCP cluster health

Features:

 * Verify offline minions
 * Verify time diff on your minions
 * Produce JSON output for ntpq command
 * Verify NTP peers state on your minions
 * Verify contrail nodes contrail-status output
 * Verify galera cluster status
 * Verify rabbitmq cluster status
 * Produce JSON output for rabbitmqctl commands
 * Verify haproxy upstream status
 * Produce haproxy JSON stats output
 * Verify disk space usage
 * Verify disk inodes usage
 * Verify load average
 * Verify rx/tx drops on network interfaces
 * Verify memory usage

Related-Prod: PROD-29236

Change-Id: Id7423665e8d45baee4b96751d9df29112dfa10e5
diff --git a/README.rst b/README.rst
index 4414544..8a31001 100644
--- a/README.rst
+++ b/README.rst
@@ -620,6 +620,141 @@
    {{- item }}
    %- endfor
 
+MCP Cluster health checks
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Swiss army knife toolset for verifying MCP cluster health.
+
+.. note:: Health checks are tested with salt modules >= 2017.7.
+
+Install the health_checks module:
+
+.. code-block:: bash
+
+  cp health_checks.py /usr/share/salt-formulas/env/_modules/health_checks.py
+  salt -C '*' saltutil.sync_all
+
+By default, salt-call does not propagate a module's return code and
+exits with 0 even when the output reports errors.
+To get a meaningful exit code for scripting, pass
+**--retcode-passthrough** to each salt-call invocation:
+
+.. code-block:: bash
+
+  salt-call health_checks.minions_check --retcode-passthrough
+
+Verify that minions are online.
+Use this check to determine which minions are offline:
+
+.. code-block:: bash
+
+  salt-call health_checks.minions_check
+
+Verify time diff on your minions:
+
+.. code-block:: bash
+
+  salt-call health_checks.time_diff_check
+
+If the check fails, dump the time differences as JSON:
+
+.. code-block:: bash
+
+  salt-call health_checks.time_diff_check debug=True --out=json
+
+Get JSON stats from ntpq:
+
+.. code-block:: bash
+
+  salt-call health_checks.ntp_status
+
+Verify NTP peers status on the environment:
+
+.. code-block:: bash
+
+  salt-call health_checks.ntp_check
+  salt-call health_checks.ntp_check min_peers=2 max_stratum=2
+
+Verify contrail-status output on contrail nodes:
+
+.. code-block:: bash
+
+  salt-call health_checks.contrail_check debug=True
+
+Verify galera cluster status:
+
+.. code-block:: bash
+
+  salt-call health_checks.galera_check debug=True
+  salt-call health_checks.galera_check cluster_size=3 debug=True
+
+Verify rabbitmq cluster status:
+
+.. code-block:: bash
+
+  salt-call health_checks.rabbitmq_check debug=True
+
+Get the output of rabbitmqctl commands as JSON objects.
+
+.. warning:: This code is experimental. It is a hack that converts Erlang objects to JSON and may fail.
+
+.. code-block:: bash
+
+  salt-call health_checks.rabbitmq_cmd status
+  salt-call health_checks.rabbitmq_cmd cluster_status
+  salt-call health_checks.rabbitmq_cmd list_hashes
+  salt-call health_checks.rabbitmq_cmd list_ciphers
+
+Verify haproxy upstream status:
+
+.. code-block:: bash
+
+  salt-call health_checks.haproxy_check debug=True
+  salt-call health_checks.haproxy_check ignore_no_upstream=True
+
+Get haproxy JSON stats (native python calls to socket):
+
+.. code-block:: bash
+
+  salt-call health_checks.haproxy_status
+  salt-call health_checks.haproxy_status socket_path='/var/run/haproxy/admin.sock' stats_filter=['status']
+
+Verify disk space usage:
+
+.. code-block:: bash
+
+  salt-call health_checks.df_check
+  salt-call health_checks.df_check verify=space space_limit=90 ignore_partitions=['/']
+
+Verify disk inodes usage:
+
+.. code-block:: bash
+
+  salt-call health_checks.df_check verify=inodes
+  salt-call health_checks.df_check verify=inodes inode_limit=10
+
+Verify load average on the environment:
+
+.. code-block:: bash
+
+  salt-call health_checks.load_check
+  salt-call health_checks.load_check la1=4 la5=1 la15=1
+
+Verify rx/tx drops on network interfaces:
+
+.. code-block:: bash
+
+  salt-call health_checks.netdev_check
+  salt-call health_checks.netdev_check rx_drop_limit=0 tx_drop_limit=0
+
+Verify memory usage:
+
+.. code-block:: bash
+
+  salt-call health_checks.mem_check
+  salt-call health_checks.mem_check used_limit=50
+
+
 Encrypted pillars
 ~~~~~~~~~~~~~~~~~