describe architecture Change-Id: Ifc2cdddda95d0bfead3b666dd3625c6bbcc41fce

commit: d2b829749f7909d4d45066726da9e8c6d16a06a0
author: Tomáš Kukrál <tkukral@mirantis.com> Tue Aug 29 12:45:45 2017 +0200
committer: Tomáš Kukrál <tkukral@mirantis.com> Wed Sep 20 11:27:20 2017 +0200
tree: 6db9861f043c52368a3b0458043dc58dabf2d304
parent: 91c8316269b324107293668c75a5f249f1792c4e
diff --git a/README.rst b/README.rst
index c7db5ea..d5e1c98 100644
--- a/README.rst
+++ b/README.rst

@@ -12,6 +12,167 @@
 Use salt-formula-linux for initial disk partitioning.
 
 
+Daemons
+--------
+
+Ceph uses several daemons to handle data and cluster state. Each daemon type requires different computing capacity and hardware optimization.
+
+The formula currently supports these daemons:
+
+* MON (`ceph.mon`)
+* OSD (`ceph.osd`)
+* RGW (`ceph.radosgw`)
+
+
+Architecture decisions
+-----------------------
+
+Please refer to the upstream architecture documents before designing your cluster. A solid understanding of Ceph principles is essential for making the architecture decisions described below.
+http://docs.ceph.com/docs/master/architecture/
+
+* Ceph version
+
+There are 3 or 4 stable releases every year and many nightly/dev releases. Decide up front which version to use, since only stable releases are recommended for production. Some releases are marked LTS (Long Term Stable) and receive bug fixes for a longer period, usually until the next LTS version is released.
+
+* Number of MON daemons
+
+Use 1 MON daemon for testing, 3 MONs for smaller production clusters and 5 MONs for very large production clusters. There is no significant benefit in running more than 5 MONs in a normal environment. Ceph requires the MONs to form a quorum, so more than 50% of the MONs must be up and running for the cluster to be fully operational. I/O operations will stop once half or more of the MONs are unavailable, because the remaining MONs can't form a quorum.
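+
+The majority rule above can be sketched in shell. This is only an illustration; the helper names :code:`quorum_size` and :code:`tolerated_failures` are made up for this example and are not part of Ceph or of this formula.
+
+.. code-block:: bash
+
+  # Smallest number of MONs that still forms a quorum (strict majority).
+  quorum_size() { echo $(( $1 / 2 + 1 )); }
+
+  # Number of MONs that can be down while a quorum can still be formed.
+  tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }
+
+  for mons in 1 3 4 5; do
+    echo "$mons MONs: quorum needs $(quorum_size "$mons"), tolerates $(tolerated_failures "$mons") down"
+  done
+
+Under this rule 4 MONs tolerate only one failure, the same as 3, which is why odd MON counts are the usual choice.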
+
+* Number of PGs
+
+Placement groups provide the mapping between stored data and OSDs. The number of PGs has to be calculated so that each OSD stores a decent number of PGs. Please keep in mind that *decreasing the number of PGs* isn't possible and *increasing* it can affect cluster performance.
+
+http://docs.ceph.com/docs/master/rados/operations/placement-groups/
+http://ceph.com/pgcalc/
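+
+As a rough starting point, the rule of thumb from the upstream placement group documentation (total PGs = OSDs * 100 / pool size, rounded up to a power of two) can be sketched in shell. The target of 100 PGs per OSD and the helper name :code:`suggest_pg_num` are assumptions of this sketch; use the PG calculator linked above for real clusters with several pools.
+
+.. code-block:: bash
+
+  # Usage: suggest_pg_num <number of OSDs> <pool size (replica count)>
+  suggest_pg_num() {
+    local osds=$1 size=$2
+    local target=$(( osds * 100 / size ))
+    local pg_num=1
+    # Round up to the next power of two.
+    while [ "$pg_num" -lt "$target" ]; do
+      pg_num=$(( pg_num * 2 ))
+    done
+    echo "$pg_num"
+  }
+
+  suggest_pg_num 6 3    # 6 OSDs, 3 replicas: 200 rounds up to 256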
+
+* Daemon colocation
+
+It is recommended to dedicate nodes to MONs and RGW since colocation can have an influence on cluster operations. However, small clusters can run MONs on OSD nodes, but it is critical to have enough resources for the MON daemons because they are the most important part of the cluster.
+
+Installing RGW on a node with other daemons isn't recommended because the RGW daemon usually requires a lot of bandwidth and this can harm cluster health.
+
+* Journal location
+
+There are two ways to set up the journal:
+
+  * **Colocated** journal is located (usually at the beginning) on the same disk as the data partition. This setup is easier to install and doesn't require any additional disk. However, a colocated setup is significantly slower than a dedicated one.
+  * **Dedicated** journal is placed on a different disk than the data. This setup can deliver much higher performance than a colocated one, but it requires more disks in the servers. Journal drives should be selected carefully because high I/O performance and durability are required.
+
+* Store type (Bluestore/Filestore)
+
+Recent versions of Ceph support BlueStore as a storage backend, and this backend should be used if available.
+
+http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/
+
+* Cluster and public network
+
+A Ceph cluster is accessed over the network, so you need decent capacity to handle all the clients. Two networks are required for the cluster: the **public** network and the **cluster** network. The public network is used for client connections; MONs and OSDs listen on this network. The cluster network is used for communication between OSDs.
+
+Both networks should have dedicated interfaces; bonding interfaces and separating the networks with VLANs on the bonded interfaces isn't allowed. It is good practice to dedicate more throughput to the cluster network because cluster traffic is more important than client traffic.
+
+* Pool parameters (size, min_size, type)
+
+Set up each pool according to its expected usage; at least `min_size`, `size` and the pool type should be considered.
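+
+Replica settings of an existing pool can be changed with :code:`ceph osd pool set`. The sketch below only prints the commands so they can be reviewed first; the helper name :code:`pool_replica_cmds`, the pool name :code:`test` and the values 3 and 2 are example assumptions, not recommendations of this formula.
+
+.. code-block:: bash
+
+  # Print the commands that set size and min_size on a pool.
+  # Usage: pool_replica_cmds <pool> <size> <min_size>
+  pool_replica_cmds() {
+    echo "ceph osd pool set $1 size $2"
+    echo "ceph osd pool set $1 min_size $3"
+  }
+
+  pool_replica_cmds test 3 2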
+
+* Cluster monitoring
+
+* Hardware
+
+Please refer to upstream hardware recommendation guide for general information about hardware.
+
+Ceph servers have to fulfil special requirements because the load generated by Ceph can be diametrically opposed to common load.
+
+http://docs.ceph.com/docs/master/start/hardware-recommendations/
+
+
+Basic management commands
+------------------------------
+
+Cluster
+********
+
+- :code:`ceph health` - check if cluster is healthy (:code:`ceph health detail` can provide more information)
+
+
+.. code-block:: bash
+
+  root@c-01:~# ceph health
+  HEALTH_OK
+
+- :code:`ceph status` - shows basic information about cluster
+
+
+.. code-block:: bash
+
+  root@c-01:~# ceph status
+      cluster e2dc51ae-c5e4-48f0-afc1-9e9e97dfd650
+       health HEALTH_OK
+       monmap e1: 3 mons at {1=192.168.31.201:6789/0,2=192.168.31.202:6789/0,3=192.168.31.203:6789/0}
+              election epoch 38, quorum 0,1,2 1,2,3
+       osdmap e226: 6 osds: 6 up, 6 in
+        pgmap v27916: 400 pgs, 2 pools, 21233 MB data, 5315 objects
+              121 GB used, 10924 GB / 11058 GB avail
+                   400 active+clean
+    client io 481 kB/s rd, 132 kB/s wr, 185 op/
+
+MON
+****
+
+http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/
+
+OSD
+****
+
+http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
+
+- :code:`ceph osd tree` - show all OSDs and their state
+
+.. code-block:: bash
+
+  root@c-01:~# ceph osd tree
+  ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
+  -4        0 host c-04
+  -1 10.79993 root default
+  -2  3.59998     host c-01
+   0  1.79999         osd.0      up  1.00000          1.00000
+   1  1.79999         osd.1      up  1.00000          1.00000
+  -3  3.59998     host c-02
+   2  1.79999         osd.2      up  1.00000          1.00000
+   3  1.79999         osd.3      up  1.00000          1.00000
+  -5  3.59998     host c-03
+   4  1.79999         osd.4      up  1.00000          1.00000
+   5  1.79999         osd.5      up  1.00000          1.00000
+
+- :code:`ceph osd lspools` - list pools
+
+.. code-block:: bash
+
+  root@c-01:~# ceph osd lspools
+  0 rbd,1 test
+
+PG
+***
+
+http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg
+
+- :code:`ceph pg ls` - list placement groups
+
+.. code-block:: bash
+
+  root@c-01:~# ceph pg ls | head -n 4
+  pg_stat	objects	mip	degr	misp	unf	bytes	log	disklog	state	state_stamp	v	reported	up	up_primary	acting	acting_primary	last_scrub	scrub_stamp	last_deep_scrub	deep_scrub_stamp
+  0.0	11	0	0	0	0	46137344	3044	3044	active+clean	2015-07-02 10:12:40.603692	226'10652	226:1798	[4,2,0]	4	[4,2,0]	4	0'0	2015-07-01 18:38:33.126953	0'0	2015-07-01 18:17:01.904194
+  0.1	7	0	0	0	0	25165936	3026	3026	active+clean	2015-07-02 10:12:40.585833	226'5808	226:1070	[2,4,1]	2	[2,4,1]	2	0'0	2015-07-01 18:38:32.352721	0'0	2015-07-01 18:17:01.904198
+  0.2	18	0	0	0	0	75497472	3039	3039	active+clean	2015-07-02 10:12:39.569630	226'17447	226:3213	[3,1,5]	3	[3,1,5]	3	0'0	2015-07-01 18:38:34.308228	0'0	2015-07-01 18:17:01.904199
+
+- :code:`ceph pg map 1.1` - show mapping between PG and OSD
+
+.. code-block:: bash
+
+  root@c-01:~# ceph pg map 1.1
+  osdmap e226 pg 1.1 (1.1) -> up [5,1,2] acting [5,1,2]
+
+
+
 Sample pillars
 ==============
 
@@ -271,7 +432,7 @@
             crush_ruleset_name: 0
 
 Generate CRUSH map
-+++++++++++++++++++
+--------------------
 
 The `type` of CRUSH buckets must be defined, and the types must start with `root` (top) and end with `host`. OSD daemons will be assigned to hosts according to their hostnames. The weight of each bucket will be calculated from the weights of its children.