============
Ceph formula
============

Ceph provides extraordinary data storage scalability: thousands of client
hosts or KVMs can access petabytes to exabytes of data. Each of your
applications can use the object, block or file system interfaces to the same
RADOS cluster simultaneously, which means your Ceph storage system serves as a
flexible foundation for all of your data storage needs.

Use salt-formula-linux for initial disk partitioning.


Daemons
--------

Ceph uses several daemons to handle data and cluster state. Each daemon type requires different computing capacity and hardware optimization.

The formula currently supports these daemons:

* MON (`ceph.mon`)
* OSD (`ceph.osd`)
* RGW (`ceph.radosgw`)


Architecture decisions
-----------------------

Please refer to the upstream architecture documents before designing your cluster. A solid understanding of Ceph principles is essential for making the architecture decisions described below.
http://docs.ceph.com/docs/master/architecture/

* Ceph version

There are 3 or 4 stable releases every year and many nightly/dev releases. You should decide which version to use, since only stable releases are recommended for production. Some releases are marked LTS (Long Term Stable); these receive bugfixes for a longer period, usually until the next LTS version is released.

* Number of MON daemons

Use 1 MON daemon for testing, 3 MONs for smaller production clusters and 5 MONs for very large production clusters. There isn't any significant benefit in running more than 5 MONs in a normal environment. Ceph requires the MONs to form a quorum, so more than 50% of the MONs must be up and running for the cluster to be fully operational. Every I/O operation will stop once a majority of the MONs is no longer available, because the remaining MONs can't form a quorum.

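The quorum arithmetic can be sketched in a few lines of Python. This is only an illustration of the "more than 50%" rule; the function names are made up for this example and are not part of Ceph or of this formula.

.. code-block:: python

    def quorum_size(mon_count):
        """Smallest number of MONs that is strictly more than half."""
        return mon_count // 2 + 1


    def tolerated_failures(mon_count):
        """How many MONs can be down while a quorum is still possible."""
        return mon_count - quorum_size(mon_count)


    for mons in (1, 3, 4, 5):
        print(mons, quorum_size(mons), tolerated_failures(mons))

Note that 4 MONs tolerate the same single failure as 3 MONs, which is one reason odd MON counts are usually chosen.
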
* Number of PGs

Placement groups provide the mapping between stored data and OSDs. It is necessary to calculate the number of PGs because each OSD should hold a reasonable number of PGs. Please keep in mind that decreasing the number of PGs isn't possible and increasing it can affect cluster performance.

http://docs.ceph.com/docs/master/rados/operations/placement-groups/
http://ceph.com/pgcalc/

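As a rough starting point, a commonly cited rule of thumb is to aim for about 100 PGs per OSD, divide by the pool size and round up to a power of two. The sketch below implements only that rule of thumb. The function name and the default of 100 are assumptions for this example; use the calculator linked above for real sizing.

.. code-block:: python

    def suggested_pg_count(osd_count, pool_size, target_pgs_per_osd=100):
        """Rule-of-thumb PG total, rounded up to the next power of two."""
        raw = osd_count * target_pgs_per_osd / pool_size
        power = 1
        while power < raw:
            power *= 2
        return power


    # 6 OSDs with 3 replicas: 6 * 100 / 3 = 200, rounded up to 256
    print(suggested_pg_count(6, 3))

The result is a total for the cluster, so it has to be shared between the pools according to how much data each one is expected to hold.
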
* Daemon colocation

It is recommended to dedicate nodes to MONs and RGW since colocation can have an influence on cluster operations. However, small clusters can run MONs on OSD nodes, but it is critical to have enough resources for the MON daemons because they are the most important part of the cluster.

Installing RGW on a node with other daemons isn't recommended because the RGW daemon usually requires a lot of bandwidth and this can harm cluster health.

* Store type (Bluestore/Filestore)

Recent versions of Ceph support Bluestore as a storage backend, and this backend should be used if available.

http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/

* Block.db location for Bluestore

There are two ways to set up block.db:

* Colocated: the block.db partition is created on the same disk as the data partition. This setup is easier to install and doesn't require any additional disk. However, a colocated setup is significantly slower than a dedicated one.
* Dedicated: block.db is placed on a different disk than the data (or on a partition of such a disk). This setup can deliver much higher performance than a colocated one, but it requires more disks in the servers. Block.db drives should be selected carefully because high I/O performance and durability are required.

* Block.wal location for Bluestore

There are two ways to set up block.wal, which stores just the internal journal (write-ahead log):

* Colocated: block.wal uses free space of the block.db device.
* Dedicated: block.wal is placed on a different disk than the data (preferably on a partition, as its size can be small) and possibly also on a different disk than the block.db device. This setup can deliver much higher performance than a colocated one, but it requires more disks in the servers. Block.wal drives should be selected carefully because high I/O performance and durability are required.

* Journal location for Filestore

There are two ways to set up the journal:

* Colocated: the journal is created on the same disk as the data partition. This setup is easier to install and doesn't require any additional disk. However, a colocated setup is significantly slower than a dedicated one.
* Dedicated: the journal is placed on a different disk than the data (or on a partition of such a disk). This setup can deliver much higher performance than a colocated one, but it requires more disks in the servers. Journal drives should be selected carefully because high I/O performance and durability are required.

* Cluster and public network

A Ceph cluster is accessed over the network, so you need decent capacity to handle all the clients. Two networks are required for a cluster: the public network and the cluster network. The public network is used for client connections; MONs and OSDs listen on this network. The second network is called the cluster network and is used for communication between OSDs.

Both networks should have dedicated interfaces; bonding interfaces and dedicating VLANs on bonded interfaces isn't allowed. Good practice is to dedicate more throughput to the cluster network because cluster traffic is more important than client traffic.

* Pool parameters (size, min_size, type)

You should set up each pool according to its expected usage; at least `min_size`, `size` and the pool type should be considered.

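For a replicated pool, `size` is the number of copies kept of each object and `min_size` is the number of copies that must be available for the pool to keep serving I/O. The following sketch only illustrates how the two values relate. The helper names are hypothetical and nothing here talks to a real cluster.

.. code-block:: python

    def pool_accepts_io(available_replicas, min_size):
        """I/O continues only while at least min_size replicas are available."""
        return available_replicas >= min_size


    def replicas_that_may_fail(size, min_size):
        """Number of replicas that can be lost before I/O is blocked."""
        return size - min_size


    # A pool with size=3 and min_size=2 keeps serving I/O with one replica down.
    print(replicas_that_may_fail(3, 2), pool_accepts_io(2, 2), pool_accepts_io(1, 2))
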
* Cluster monitoring

* Hardware

Please refer to the upstream hardware recommendation guide for general information about hardware.

Ceph servers have to fulfil special requirements because the load generated by Ceph can be diametrically opposed to common load.

http://docs.ceph.com/docs/master/start/hardware-recommendations/


Basic management commands
------------------------------

Cluster
********

- :code:`ceph health` - check if cluster is healthy (:code:`ceph health detail` can provide more information)


.. code-block:: bash

    root@c-01:~# ceph health
    HEALTH_OK

- :code:`ceph status` - shows basic information about cluster


.. code-block:: bash

    root@c-01:~# ceph status
        cluster e2dc51ae-c5e4-48f0-afc1-9e9e97dfd650
         health HEALTH_OK
         monmap e1: 3 mons at {1=192.168.31.201:6789/0,2=192.168.31.202:6789/0,3=192.168.31.203:6789/0}
                election epoch 38, quorum 0,1,2 1,2,3
         osdmap e226: 6 osds: 6 up, 6 in
          pgmap v27916: 400 pgs, 2 pools, 21233 MB data, 5315 objects
                121 GB used, 10924 GB / 11058 GB avail
                     400 active+clean
      client io 481 kB/s rd, 132 kB/s wr, 185 op/

MON
****

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/

OSD
****

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/

- :code:`ceph osd tree` - show all OSDs and their state

.. code-block:: bash

    root@c-01:~# ceph osd tree
    ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -4        0 host c-04
    -1 10.79993 root default
    -2  3.59998     host c-01
     0  1.79999         osd.0      up  1.00000          1.00000
     1  1.79999         osd.1      up  1.00000          1.00000
    -3  3.59998     host c-02
     2  1.79999         osd.2      up  1.00000          1.00000
     3  1.79999         osd.3      up  1.00000          1.00000
    -5  3.59998     host c-03
     4  1.79999         osd.4      up  1.00000          1.00000
     5  1.79999         osd.5      up  1.00000          1.00000

- :code:`ceph osd lspools` - list pools

.. code-block:: bash

    root@c-01:~# ceph osd lspools
    0 rbd,1 test

PG
***

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg

- :code:`ceph pg ls` - list placement groups

.. code-block:: bash

    root@c-01:~# ceph pg ls | head -n 4
    pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
    0.0 11 0 0 0 0 46137344 3044 3044 active+clean 2015-07-02 10:12:40.603692 226'10652 226:1798 [4,2,0] 4 [4,2,0] 4 0'0 2015-07-01 18:38:33.126953 0'0 2015-07-01 18:17:01.904194
    0.1 7 0 0 0 0 25165936 3026 3026 active+clean 2015-07-02 10:12:40.585833 226'5808 226:1070 [2,4,1] 2 [2,4,1] 2 0'0 2015-07-01 18:38:32.352721 0'0 2015-07-01 18:17:01.904198
    0.2 18 0 0 0 0 75497472 3039 3039 active+clean 2015-07-02 10:12:39.569630 226'17447 226:3213 [3,1,5] 3 [3,1,5] 3 0'0 2015-07-01 18:38:34.308228 0'0 2015-07-01 18:17:01.904199

- :code:`ceph pg map 1.1` - show mapping between PG and OSD

.. code-block:: bash

    root@c-01:~# ceph pg map 1.1
    osdmap e226 pg 1.1 (1.1) -> up [5,1,2] acting [5,1,2]

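When such output has to be processed by a script, the `ceph pg map` line can be split into its parts. The sketch below is based solely on the sample line shown above; the output format may differ between Ceph releases, and the function name and the returned dictionary keys are choices made for this example.

.. code-block:: python

    import re

    # Pattern derived from the sample output above (an assumption, not a spec).
    PG_MAP_RE = re.compile(
        r"osdmap e(?P<epoch>\d+) pg (?P<pgid>\S+) \(\S+\) -> "
        r"up \[(?P<up>[\d,]*)\] acting \[(?P<acting>[\d,]*)\]"
    )


    def parse_pg_map(line):
        """Return epoch, PG id and the up/acting OSD lists from one output line."""
        match = PG_MAP_RE.search(line)
        if match is None:
            raise ValueError("unrecognised pg map output: %r" % line)

        def to_ids(text):
            return [int(item) for item in text.split(",") if item]

        return {
            "epoch": int(match.group("epoch")),
            "pgid": match.group("pgid"),
            "up": to_ids(match.group("up")),
            "acting": to_ids(match.group("acting")),
        }


    print(parse_pg_map("osdmap e226 pg 1.1 (1.1) -> up [5,1,2] acting [5,1,2]"))
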


Sample pillars
==============

Common metadata for all nodes/roles

.. code-block:: yaml

    ceph:
      common:
        version: luminous
        config:
          global:
            param1: value1
            param2: value1
            param3: value1
          pool_section:
            param1: value2
            param2: value2
            param3: value2
        fsid: a619c5fc-c4ed-4f22-9ed2-66cf2feca23d
        members:
        - name: cmn01
          host: 10.0.0.1
        - name: cmn02
          host: 10.0.0.2
        - name: cmn03
          host: 10.0.0.3
        keyring:
          admin:
            caps:
              mds: "allow *"
              mgr: "allow *"
              mon: "allow *"
              osd: "allow *"
          bootstrap-osd:
            caps:
              mon: "allow profile bootstrap-osd"


Optional definition for cluster and public networks. The cluster network is
used for replication, the public network for front-end communication.

.. code-block:: yaml

    ceph:
      common:
        version: luminous
        fsid: a619c5fc-c4ed-4f22-9ed2-66cf2feca23d
        ....
        public_network: 10.0.0.0/24, 10.1.0.0/24
        cluster_network: 10.10.0.0/24, 10.11.0.0/24


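Both settings accept a comma-separated list of subnets. To double-check which of the two networks a given address belongs to, a short script using Python's standard `ipaddress` module can help. This is a minimal sketch: the helper names are hypothetical and the values are copied from the sample pillar above.

.. code-block:: python

    import ipaddress

    PUBLIC_NETWORK = "10.0.0.0/24, 10.1.0.0/24"
    CLUSTER_NETWORK = "10.10.0.0/24, 10.11.0.0/24"


    def parse_networks(value):
        """Turn a comma-separated subnet list into ip_network objects."""
        return [ipaddress.ip_network(part.strip()) for part in value.split(",")]


    def network_role(address, public_network, cluster_network):
        """Return 'public', 'cluster' or None for the given IP address."""
        ip = ipaddress.ip_address(address)
        if any(ip in net for net in parse_networks(public_network)):
            return "public"
        if any(ip in net for net in parse_networks(cluster_network)):
            return "cluster"
        return None


    print(network_role("10.1.0.15", PUBLIC_NETWORK, CLUSTER_NETWORK))
    print(network_role("10.11.0.7", PUBLIC_NETWORK, CLUSTER_NETWORK))
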
Ceph mon (control) roles
------------------------

Monitors: A Ceph Monitor maintains maps of the cluster state, including the
monitor map, the OSD map, the Placement Group (PG) map, and the CRUSH map.
Ceph maintains a history (called an “epoch”) of each state change in the Ceph
Monitors, Ceph OSD Daemons, and PGs.

.. code-block:: yaml

    ceph:
      common:
        config:
          mon:
            key: value
      mon:
        enabled: true
        keyring:
          mon:
            caps:
              mon: "allow *"
          admin:
            caps:
              mds: "allow *"
              mgr: "allow *"
              mon: "allow *"
              osd: "allow *"

Ceph mgr roles
------------------------

The Ceph Manager daemon (ceph-mgr) runs alongside monitor daemons, to provide additional monitoring and interfaces to external monitoring and management systems. Since the 12.x (luminous) Ceph release, the ceph-mgr daemon is required for normal operations. The ceph-mgr daemon is an optional component in the 11.x (kraken) Ceph release.

By default, the manager daemon requires no additional configuration, beyond ensuring it is running. If there is no mgr daemon running, you will see a health warning to that effect, and some of the other information in the output of ceph status will be missing or stale until a mgr is started.


.. code-block:: yaml

    ceph:
      mgr:
        enabled: true
        dashboard:
          enabled: true
          host: 10.103.255.252
          port: 7000


Ceph OSD (storage) roles
------------------------

.. code-block:: yaml

    ceph:
      common:
        version: luminous
        fsid: a619c5fc-c4ed-4f22-9ed2-66cf2feca23d
        public_network: 10.0.0.0/24, 10.1.0.0/24
        cluster_network: 10.10.0.0/24, 10.11.0.0/24
        keyring:
          bootstrap-osd:
            caps:
              mon: "allow profile bootstrap-osd"
        ....
      osd:
        enabled: true
        crush_parent: rack01
        journal_size: 20480                      # 20G
        bluestore_block_db_size: 10073741824     # 10G
        bluestore_block_wal_size: 10073741824    # 10G
        bluestore_block_size: 807374182400       # 800G
        backend:
          filestore:
            disks:
            - dev: /dev/sdm
              enabled: false
              journal: /dev/ssd
              class: bestssd
              weight: 1.5
              dmcrypt: true
            - dev: /dev/sdl
              journal: /dev/ssd
              class: bestssd
              weight: 1.5
          bluestore:
            disks:
            - dev: /dev/sdb
            - dev: /dev/sdc
              block_db: /dev/ssd
              block_wal: /dev/ssd
              class: ssd
              weight: 1.666
              dmcrypt: true
            - dev: /dev/sdd
              enabled: false


Ceph client roles - Deprecated - use ceph:common instead
--------------------------------------------------------

Simple ceph client service

.. code-block:: yaml

    ceph:
      client:
        config:
          global:
            mon initial members: ceph1,ceph2,ceph3
            mon host: 10.103.255.252:6789,10.103.255.253:6789,10.103.255.254:6789
        keyring:
          monitoring:
            key: 00000000000000000000000000000000000000==

In OpenStack, the control settings are usually located at the cinder-volume or
glance-registry services.

.. code-block:: yaml

    ceph:
      client:
        config:
          global:
            fsid: 00000000-0000-0000-0000-000000000000
            mon initial members: ceph1,ceph2,ceph3
            mon host: 10.103.255.252:6789,10.103.255.253:6789,10.103.255.254:6789
            osd_fs_mkfs_arguments_xfs:
            osd_fs_mount_options_xfs: rw,noatime
            network public: 10.0.0.0/24
            network cluster: 10.0.0.0/24
            osd_fs_type: xfs
          osd:
            osd journal size: 7500
            filestore xattr use omap: true
          mon:
            mon debug dump transactions: false
        keyring:
          cinder:
            key: 00000000000000000000000000000000000000==
          glance:
            key: 00000000000000000000000000000000000000==


Ceph gateway
------------

Rados gateway with keystone v2 auth backend

.. code-block:: yaml

    ceph:
      radosgw:
        enabled: true
        hostname: gw.ceph.lab
        bind:
          address: 10.10.10.1
          port: 8080
        identity:
          engine: keystone
          api_version: 2
          host: 10.10.10.100
          port: 5000
          user: admin
          password: password
          tenant: admin

Rados gateway with keystone v3 auth backend

.. code-block:: yaml

    ceph:
      radosgw:
        enabled: true
        hostname: gw.ceph.lab
        bind:
          address: 10.10.10.1
          port: 8080
        identity:
          engine: keystone
          api_version: 3
          host: 10.10.10.100
          port: 5000
          user: admin
          password: password
          project: admin
          domain: default


Ceph setup role
---------------

Replicated ceph storage pool

.. code-block:: yaml

    ceph:
      setup:
        pool:
          replicated_pool:
            pg_num: 256
            pgp_num: 256
            type: replicated
            crush_rule: sata
            application: rbd

.. note:: For Kraken and earlier releases please specify crush_rule as a ruleset number.
          For Kraken and earlier releases application param is not needed.

Erasure ceph storage pool

.. code-block:: yaml

    ceph:
      setup:
        pool:
          erasure_pool:
            pg_num: 256
            pgp_num: 256
            type: erasure
            crush_rule: ssd
            application: rbd


Inline compression for Bluestore backend

.. code-block:: yaml

    ceph:
      setup:
        pool:
          volumes:
            pg_num: 256
            pgp_num: 256
            type: replicated
            crush_rule: hdd
            application: rbd
            compression_algorithm: snappy
            compression_mode: aggressive
            compression_required_ratio: .875
      ...


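The `compression_required_ratio` value in the last pool example decides whether a compressed chunk is worth keeping: the compressed size must be no larger than the given fraction of the original size, otherwise the data is stored uncompressed. The check can be sketched as follows. This is a simplified model that ignores how Bluestore rounds to allocation units, and the function name is made up for this example.

.. code-block:: python

    def stored_compressed(original_size, compressed_size, required_ratio=0.875):
        """True if the compressed chunk is small enough to be kept."""
        return compressed_size <= original_size * required_ratio


    # A 4096-byte chunk must shrink to 3584 bytes or less with a ratio of .875.
    print(stored_compressed(4096, 3584), stored_compressed(4096, 3600))
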
Ceph manage keyring keys
------------------------

Keyrings are dynamically generated unless specified by the following pillar.

.. code-block:: yaml

    ceph:
      common:
        manage_keyring: true
        keyring:
          glance:
            name: images
            key: AACf3ulZFFPNDxAAd2DWds3aEkHh4IklZVgIaQ==
            caps:
              mon: "allow r"
              osd: "allow class-read object_prefix rbd_children, allow rwx pool=images"


Generate CRUSH map - Recommended way
------------------------------------

It is required to define the `type` for crush buckets, and these types must start with `root` (top) and end with `host`. OSD daemons will be assigned to hosts according to their hostname. The weight of each bucket will be calculated from the weights of its children.

If the pools in use have a size of 3, it is best to have 3 children of a specific type in the root CRUSH tree to replicate objects across (specified in rule steps by 'type region').

.. code-block:: yaml

    ceph:
      setup:
        crush:
          enabled: True
          tunables:
            choose_total_tries: 50
            choose_local_tries: 0
            choose_local_fallback_tries: 0
            chooseleaf_descend_once: 1
            chooseleaf_vary_r: 1
            chooseleaf_stable: 1
            straw_calc_version: 1
            allowed_bucket_algs: 54
          type:
            - root
            - region
            - rack
            - host
            - osd
          root:
            - name: root-ssd
            - name: root-sata
          region:
            - name: eu-1
              parent: root-sata
            - name: eu-2
              parent: root-sata
            - name: eu-3
              parent: root-ssd
            - name: us-1
              parent: root-sata
          rack:
            - name: rack01
              parent: eu-1
            - name: rack02
              parent: eu-2
            - name: rack03
              parent: us-1
          rule:
            sata:
              ruleset: 0
              type: replicated
              min_size: 1
              max_size: 10
              steps:
                - take root-ssd
                - chooseleaf firstn 0 type region
                - emit
            ssd:
              ruleset: 1
              type: replicated
              min_size: 1
              max_size: 10
              steps:
                - take root-sata
                - chooseleaf firstn 0 type region
                - emit


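The way a bucket weight follows from the weights of its children can be sketched as a small recursive sum. The tree layout below (nested dictionaries with `children` and `weight` keys) is a hypothetical structure chosen for this example, not the format used by the formula or by Ceph; the OSD weights are borrowed from the OSD sample pillar.

.. code-block:: python

    def bucket_weight(node):
        """Weight of an OSD, or the summed weight of everything below a bucket."""
        children = node.get("children")
        if not children:
            return node.get("weight", 0.0)
        return sum(bucket_weight(child) for child in children)


    tree = {
        "name": "rack01",
        "children": [
            {"name": "osd001", "children": [
                {"name": "osd.0", "weight": 1.5},
                {"name": "osd.1", "weight": 1.5},
            ]},
            {"name": "osd002", "children": [
                {"name": "osd.2", "weight": 1.666},
            ]},
        ],
    }

    print(round(bucket_weight(tree), 3))
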
Generate CRUSH map - Alternative way
------------------------------------

It is necessary to create a pillar for each OSD.

.. code-block:: yaml

    ceph:
      osd:
        crush:
          - type: root
            name: root1
          - type: region
            name: eu-1
          - type: rack
            name: rack01
          - type: host
            name: osd001


Apply CRUSH map
---------------

Before you apply the CRUSH map, please make sure that the settings in the generated file /etc/ceph/crushmap are correct.

.. code-block:: yaml

    ceph:
      setup:
        crush:
          enforce: true
        pool:
          images:
            crush_rule: sata
            application: rbd
          volumes:
            crush_rule: sata
            application: rbd
          vms:
            crush_rule: ssd
            application: rbd

.. note:: For Kraken and earlier releases please specify crush_rule as a ruleset number.
          For Kraken and earlier releases application param is not needed.


Persist CRUSH map
--------------------

After the CRUSH map is applied to Ceph it's recommended to persist the same settings even after OSD reboots.

.. code-block:: yaml

    ceph:
      osd:
        crush_update: false


Ceph monitoring
---------------

By default, monitoring is set up to collect information from MON and OSD nodes. To change the default values, add the following pillar to the MON nodes.

.. code-block:: yaml

    ceph:
      monitoring:
        space_used_warning_threshold: 0.75
        space_used_critical_threshold: 0.85
        apply_latency_threshold: 0.007
        commit_latency_threshold: 0.7
        pool_space_used_utilization_warning_threshold: 0.75
        pool_space_used_critical_threshold: 0.85
        pool_write_ops_threshold: 200
        pool_write_bytes_threshold: 70000000
        pool_read_bytes_threshold: 70000000
        pool_read_ops_threshold: 1000

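To get a feeling for how the two space thresholds interact, the following sketch classifies a used-space ratio. It is only an illustration: the function name is hypothetical, and whether the real alerts compare with "greater than" or "greater than or equal" is an assumption here.

.. code-block:: python

    def space_used_severity(used_ratio, warning=0.75, critical=0.85):
        """Classify a used-space ratio (0.0 to 1.0) against the two thresholds."""
        if used_ratio >= critical:
            return "critical"
        if used_ratio >= warning:
            return "warning"
        return "ok"


    for ratio in (0.5, 0.8, 0.9):
        print(ratio, space_used_severity(ratio))
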
Ceph monitor backups
--------------------

Backup client with ssh/rsync remote host

.. code-block:: yaml

    ceph:
      backup:
        client:
          enabled: true
          full_backups_to_keep: 3
          hours_before_full: 24
          target:
            host: cfg01


Backup client with local backup only

.. code-block:: yaml

    ceph:
      backup:
        client:
          enabled: true
          full_backups_to_keep: 3
          hours_before_full: 24

Backup server rsync

.. code-block:: yaml

    ceph:
      backup:
        server:
          enabled: true
          hours_before_full: 24
          full_backups_to_keep: 5
          key:
            ceph_pub_key:
              enabled: true
              key: ssh_rsa



More information
================

* https://github.com/cloud-ee/ceph-salt-formula
* http://ceph.com/ceph-storage/
* http://ceph.com/docs/master/start/intro/


Documentation and bugs
======================

To learn how to install and update salt-formulas, consult the documentation
available online at:

    http://salt-formulas.readthedocs.io/

In the unfortunate event that bugs are discovered, they should be reported to
the appropriate issue tracker. Use the GitHub issue tracker for a specific salt
formula:

    https://github.com/salt-formulas/salt-formula-ceph/issues

For feature requests, bug reports or blueprints affecting the entire ecosystem,
use the Launchpad salt-formulas project:

    https://launchpad.net/salt-formulas

You can also join the salt-formulas-users team and subscribe to the mailing list:

    https://launchpad.net/~salt-formulas-users

Developers wishing to work on the salt-formulas projects should always base
their work on the master branch and submit pull requests against the specific
formula.

    https://github.com/salt-formulas/salt-formula-ceph

Questions and feedback are always welcome, so feel free to join our IRC
channel:

    #salt-formulas @ irc.freenode.net