============
Ceph formula
============

Ceph provides extraordinary data storage scalability: thousands of client
hosts or KVMs can access petabytes to exabytes of data. Each of your
applications can use the object, block or file system interfaces to the same
RADOS cluster simultaneously, which means your Ceph storage system serves as a
flexible foundation for all of your data storage needs.

Use salt-formula-linux for initial disk partitioning.

Daemons
-------

Ceph uses several daemons to handle data and cluster state. Each daemon type requires different computing capacity and hardware optimization.

These daemons are currently supported by the formula:

* MON (`ceph.mon`)
* OSD (`ceph.osd`)
* RGW (`ceph.radosgw`)

Architecture decisions
----------------------

Please refer to the upstream architecture documents before designing your cluster. A solid understanding of Ceph principles is essential for making the architecture decisions described below.
http://docs.ceph.com/docs/master/architecture/

* Ceph version

There are three or four stable releases every year and many nightly/dev releases. You should decide which version will be used, since only stable releases are recommended for production. Some releases are marked LTS (Long Term Stable) and receive bugfixes for a longer period - usually until the next LTS version is released.

* Number of MON daemons

Use 1 MON daemon for testing, 3 MONs for smaller production clusters and 5 MONs for very large production clusters. There is no need to have more than 5 MONs in a normal environment, because there isn't any significant benefit in running more than 5 MONs. Ceph requires MONs to form a quorum, so you need to have more than 50% of the MONs up and running to have a fully operational cluster. Every I/O operation will stop once less than 50% of the MONs are available, because they can't form a quorum.
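
For a smaller production cluster this translates into three monitor members in the common pillar, for example (a minimal sketch; node names and addresses are placeholders taken from the sample pillars shown later in this document):

.. code-block:: yaml

    ceph:
      common:
        members:
        - name: cmn01
          host: 10.0.0.1
        - name: cmn02
          host: 10.0.0.2
        - name: cmn03
          host: 10.0.0.3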

* Number of PGs

Placement groups provide the mapping between stored data and OSDs. It is necessary to calculate the number of PGs, because a decent number of PGs should be stored on each OSD. Please keep in mind that *decreasing* the number of PGs isn't possible and *increasing* it can affect cluster performance.

http://docs.ceph.com/docs/master/rados/operations/placement-groups/
http://ceph.com/pgcalc/
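
A commonly used rule of thumb is ``(number of OSDs * 100) / replica count``, rounded up to the nearest power of two; the pgcalc tool above gives more precise results. A minimal sketch of a pool definition, assuming 15 OSDs and a replicated size of 3 (pool name and values are illustrative):

.. code-block:: yaml

    ceph:
      setup:
        pool:
          volumes:
            # 15 OSDs * 100 / 3 replicas = 500, rounded up to 512
            pg_num: 512
            pgp_num: 512
            type: replicated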

* Daemon colocation

It is recommended to dedicate nodes for MONs and RGW, since colocation can have an influence on cluster operations. However, small clusters can run MONs on OSD nodes, but it is critical to have enough resources for the MON daemons because they are the most important part of the cluster.

Installing RGW on a node with other daemons isn't recommended, because the RGW daemon usually requires a lot of bandwidth and it can harm cluster health.

* Store type (Bluestore/Filestore)

Recent versions of Ceph support Bluestore as a storage backend, and this backend should be used if available.

http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/

* Block.db location for Bluestore

There are two ways to set up block.db (see the sketch after the block.wal list below):
  * **Colocated** - the block.db partition is created on the same disk as the data partition. This setup is easier to install and doesn't require any other disk. However, a colocated setup is significantly slower than a dedicated one.
  * **Dedicated** - block.db is placed on a different disk than the data (or into a partition). This setup can deliver much higher performance than a colocated one, but it requires more disks in the servers. Block.db drives should be carefully selected because high I/O and durability are required.

* Block.wal location for Bluestore

There are two ways to set up block.wal, which stores just the internal journal (write-ahead log):
  * **Colocated** - block.wal uses the free space of the block.db device.
  * **Dedicated** - block.wal is placed on a different disk than the data (preferably into a partition, as the size can be small) and possibly the block.db device. This setup can deliver much higher performance than a colocated one, but it requires more disks in the servers. Block.wal drives should be carefully selected because high I/O and durability are required.
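
A minimal sketch of a Bluestore OSD pillar with a dedicated block.db and block.wal device, following the backend schema shown in the OSD role example later in this document (device paths are illustrative):

.. code-block:: yaml

    ceph:
      osd:
        backend:
          bluestore:
            disks:
            # dedicated block.db and block.wal on a faster device
            - dev: /dev/sdb
              block_db: /dev/nvme0n1
              block_wal: /dev/nvme0n1
            # colocated - block.db and block.wal stay on the data disk
            - dev: /dev/sdc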

* Journal location for Filestore

There are two ways to set up the journal (see the sketch below):
  * **Colocated** - the journal is created on the same disk as the data partition. This setup is easier to install and doesn't require any other disk. However, a colocated setup is significantly slower than a dedicated one.
  * **Dedicated** - the journal is placed on a different disk than the data (or into a partition). This setup can deliver much higher performance than a colocated one, but it requires more disks in the servers. Journal drives should be carefully selected because high I/O and durability are required.
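
A minimal sketch of a Filestore OSD pillar with a dedicated journal device (device paths are illustrative; see the full OSD role example later in this document):

.. code-block:: yaml

    ceph:
      osd:
        backend:
          filestore:
            disks:
            # dedicated journal on a faster device
            - dev: /dev/sdd
              journal: /dev/nvme0n1
            # colocated - journal stays on the data disk
            - dev: /dev/sde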

* Cluster and public network

A Ceph cluster is accessed over the network, so you need decent capacity to handle all the clients. There are two networks required for a cluster: the **public** network and the cluster network. The public network is used for client connections, and MONs and OSDs listen on this network. The second network is called the **cluster** network, and it is used for communication between OSDs.

Both networks should have dedicated interfaces; bonding interfaces and dedicating VLANs on bonded interfaces aren't allowed. Good practice is to dedicate more throughput to the cluster network, because cluster traffic is more important than client traffic.

* Pool parameters (size, min_size, type)

You should set up each pool according to its expected usage; at least ``min_size``, ``size`` and the pool type should be considered (see the sketch below).
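
For example, a replicated pool keeping three copies and staying writable with two (a minimal sketch, assuming the ``setup.pool`` section passes ``size`` and ``min_size`` through as pool options in the same way as the parameters shown in the setup examples later in this document):

.. code-block:: yaml

    ceph:
      setup:
        pool:
          volumes:
            pg_num: 256
            pgp_num: 256
            type: replicated
            size: 3
            min_size: 2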

* Cluster monitoring

* Hardware

Please refer to the upstream hardware recommendation guide for general information about hardware.

Ceph servers are required to fulfil special requirements, because the load generated by Ceph can be very different from common workloads.

http://docs.ceph.com/docs/master/start/hardware-recommendations/

Basic management commands
-------------------------

Cluster
*******

- :code:`ceph health` - check if the cluster is healthy (:code:`ceph health detail` can provide more information)

.. code-block:: bash

    root@c-01:~# ceph health
    HEALTH_OK

- :code:`ceph status` - show basic information about the cluster

.. code-block:: bash

    root@c-01:~# ceph status
        cluster e2dc51ae-c5e4-48f0-afc1-9e9e97dfd650
         health HEALTH_OK
         monmap e1: 3 mons at {1=192.168.31.201:6789/0,2=192.168.31.202:6789/0,3=192.168.31.203:6789/0}
                election epoch 38, quorum 0,1,2 1,2,3
         osdmap e226: 6 osds: 6 up, 6 in
          pgmap v27916: 400 pgs, 2 pools, 21233 MB data, 5315 objects
                121 GB used, 10924 GB / 11058 GB avail
                     400 active+clean
      client io 481 kB/s rd, 132 kB/s wr, 185 op/s

MON
***

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/

OSD
***

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/

- :code:`ceph osd tree` - show all OSDs and their state

.. code-block:: bash

    root@c-01:~# ceph osd tree
    ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -4        0 host c-04
    -1 10.79993 root default
    -2  3.59998     host c-01
     0  1.79999         osd.0      up  1.00000          1.00000
     1  1.79999         osd.1      up  1.00000          1.00000
    -3  3.59998     host c-02
     2  1.79999         osd.2      up  1.00000          1.00000
     3  1.79999         osd.3      up  1.00000          1.00000
    -5  3.59998     host c-03
     4  1.79999         osd.4      up  1.00000          1.00000
     5  1.79999         osd.5      up  1.00000          1.00000

- :code:`ceph osd lspools` - list pools

.. code-block:: bash

    root@c-01:~# ceph osd lspools
    0 rbd,1 test

PG
**

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg

- :code:`ceph pg ls` - list placement groups

.. code-block:: bash

    root@c-01:~# ceph pg ls | head -n 4
    pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
    0.0 11 0 0 0 0 46137344 3044 3044 active+clean 2015-07-02 10:12:40.603692 226'10652 226:1798 [4,2,0] 4 [4,2,0] 4 0'0 2015-07-01 18:38:33.126953 0'0 2015-07-01 18:17:01.904194
    0.1 7 0 0 0 0 25165936 3026 3026 active+clean 2015-07-02 10:12:40.585833 226'5808 226:1070 [2,4,1] 2 [2,4,1] 2 0'0 2015-07-01 18:38:32.352721 0'0 2015-07-01 18:17:01.904198
    0.2 18 0 0 0 0 75497472 3039 3039 active+clean 2015-07-02 10:12:39.569630 226'17447 226:3213 [3,1,5] 3 [3,1,5] 3 0'0 2015-07-01 18:38:34.308228 0'0 2015-07-01 18:17:01.904199

- :code:`ceph pg map 1.1` - show mapping between PG and OSD

.. code-block:: bash

    root@c-01:~# ceph pg map 1.1
    osdmap e226 pg 1.1 (1.1) -> up [5,1,2] acting [5,1,2]

Sample pillars
==============

Common metadata for all nodes/roles

.. code-block:: yaml

    ceph:
      common:
        version: luminous
        cluster_name: ceph
        config:
          global:
            param1: value1
            param2: value1
            param3: value1
          pool_section:
            param1: value2
            param2: value2
            param3: value2
        fsid: a619c5fc-c4ed-4f22-9ed2-66cf2feca23d
        members:
        - name: cmn01
          host: 10.0.0.1
        - name: cmn02
          host: 10.0.0.2
        - name: cmn03
          host: 10.0.0.3
        keyring:
          admin:
            caps:
              mds: "allow *"
              mgr: "allow *"
              mon: "allow *"
              osd: "allow *"
          bootstrap-osd:
            caps:
              mon: "allow profile bootstrap-osd"

Optional definition for cluster and public networks. The cluster network is used
for replication, the public network for front-end communication.

.. code-block:: yaml

    ceph:
      common:
        version: luminous
        fsid: a619c5fc-c4ed-4f22-9ed2-66cf2feca23d
        ....
        public_network: 10.0.0.0/24, 10.1.0.0/24
        cluster_network: 10.10.0.0/24, 10.11.0.0/24

Ceph mon (control) roles
------------------------

Monitors: A Ceph Monitor maintains maps of the cluster state, including the
monitor map, the OSD map, the Placement Group (PG) map, and the CRUSH map.
Ceph maintains a history (called an "epoch") of each state change in the Ceph
Monitors, Ceph OSD Daemons, and PGs.

.. code-block:: yaml

    ceph:
      common:
        config:
          mon:
            key: value
      mon:
        enabled: true
        keyring:
          mon:
            caps:
              mon: "allow *"
          admin:
            caps:
              mds: "allow *"
              mgr: "allow *"
              mon: "allow *"
              osd: "allow *"

Ceph mgr roles
--------------

The Ceph Manager daemon (ceph-mgr) runs alongside monitor daemons, to provide additional monitoring and interfaces to external monitoring and management systems. Since the 12.x (luminous) Ceph release, the ceph-mgr daemon is required for normal operations. The ceph-mgr daemon is an optional component in the 11.x (kraken) Ceph release.

By default, the manager daemon requires no additional configuration, beyond ensuring it is running. If there is no mgr daemon running, you will see a health warning to that effect, and some of the other information in the output of ceph status will be missing or stale until a mgr is started.

.. code-block:: yaml

    ceph:
      mgr:
        enabled: true
        dashboard:
          enabled: true
          host: 10.103.255.252
          port: 7000

Ceph OSD (storage) roles
------------------------

.. code-block:: yaml

    ceph:
      common:
        version: luminous
        fsid: a619c5fc-c4ed-4f22-9ed2-66cf2feca23d
        public_network: 10.0.0.0/24, 10.1.0.0/24
        cluster_network: 10.10.0.0/24, 10.11.0.0/24
        keyring:
          bootstrap-osd:
            caps:
              mon: "allow profile bootstrap-osd"
        ....
      osd:
        enabled: true
        crush_parent: rack01
        journal_size: 20480 (20G)
        bluestore_block_db_size: 10073741824 (10G)
        bluestore_block_wal_size: 10073741824 (10G)
        bluestore_block_size: 807374182400 (800G)
        backend:
          filestore:
            disks:
            - dev: /dev/sdm
              enabled: false
              journal: /dev/ssd
              journal_partition: 5
              data_partition: 6
              lockbox_partition: 7
              data_partition_size: 12000 (MB)
              class: bestssd
              weight: 1.666
              dmcrypt: true
              journal_dmcrypt: false
            - dev: /dev/sdf
              journal: /dev/ssd
              journal_dmcrypt: true
              class: bestssd
              weight: 1.666
            - dev: /dev/sdl
              journal: /dev/ssd
              class: bestssd
              weight: 1.666
          bluestore:
            disks:
            - dev: /dev/sdb
            - dev: /dev/sdf
              block_db: /dev/ssd
              block_wal: /dev/ssd
              block_db_dmcrypt: true
              block_wal_dmcrypt: true
            - dev: /dev/sdc
              block_db: /dev/ssd
              block_wal: /dev/ssd
              data_partition: 1
              block_partition: 2
              lockbox_partition: 5
              block_db_partition: 3
              block_wal_partition: 4
              class: ssd
              weight: 1.666
              dmcrypt: true
              block_db_dmcrypt: false
              block_wal_dmcrypt: false
            - dev: /dev/sdd
              enabled: false


In case some custom block devices should be used (like loop devices for testing purposes),
you need to indicate the proper partition prefix.

.. code-block:: yaml

    ceph:
      osd:
        backend:
          bluestore:
            disks:
            - dev: /dev/loop20
              block_db: /dev/loop21
              data_partition_prefix: 'p'

Ceph client roles - Deprecated - use ceph:common instead
---------------------------------------------------------

Simple ceph client service

.. code-block:: yaml

    ceph:
      client:
        config:
          global:
            mon initial members: ceph1,ceph2,ceph3
            mon host: 10.103.255.252:6789,10.103.255.253:6789,10.103.255.254:6789
        keyring:
          monitoring:
            key: 00000000000000000000000000000000000000==

On OpenStack control nodes, the settings are usually located in the cinder-volume or
glance-registry services.

.. code-block:: yaml

    ceph:
      client:
        config:
          global:
            fsid: 00000000-0000-0000-0000-000000000000
            mon initial members: ceph1,ceph2,ceph3
            mon host: 10.103.255.252:6789,10.103.255.253:6789,10.103.255.254:6789
            osd_fs_mkfs_arguments_xfs:
            osd_fs_mount_options_xfs: rw,noatime
            network public: 10.0.0.0/24
            network cluster: 10.0.0.0/24
            osd_fs_type: xfs
          osd:
            osd journal size: 7500
            filestore xattr use omap: true
          mon:
            mon debug dump transactions: false
        keyring:
          cinder:
            key: 00000000000000000000000000000000000000==
          glance:
            key: 00000000000000000000000000000000000000==

Ceph gateway
------------

Rados gateway with keystone v2 auth backend

.. code-block:: yaml

    ceph:
      radosgw:
        enabled: true
        hostname: gw.ceph.lab
        bind:
          address: 10.10.10.1
          port: 8080
        identity:
          engine: keystone
          api_version: 2
          host: 10.10.10.100
          port: 5000
          user: admin
          password: password
          tenant: admin

Rados gateway with keystone v3 auth backend

.. code-block:: yaml

    ceph:
      common:
        config:
          rgw:
            key: value
      radosgw:
        enabled: true
        hostname: gw.ceph.lab
        bind:
          address: 10.10.10.1
          port: 8080
        identity:
          engine: keystone
          api_version: 3
          host: 10.10.10.100
          port: 5000
          user: admin
          password: password
          project: admin
          domain: default
        swift:
          versioning:
            enabled: true
          enforce_content_length: true

Ceph setup role
---------------

Replicated ceph storage pool

.. code-block:: yaml

    ceph:
      setup:
        pool:
          replicated_pool:
            pg_num: 256
            pgp_num: 256
            type: replicated
            crush_rule: sata
            application: rbd

.. note:: For Kraken and earlier releases please specify crush_rule as a ruleset number.
          For Kraken and earlier releases the application param is not needed.

Erasure ceph storage pool

.. code-block:: yaml

    ceph:
      setup:
        pool:
          erasure_pool:
            pg_num: 256
            pgp_num: 256
            type: erasure
            crush_rule: ssd
            application: rbd


Inline compression for Bluestore backend

.. code-block:: yaml

    ceph:
      setup:
        pool:
          volumes:
            pg_num: 256
            pgp_num: 256
            type: replicated
            crush_rule: hdd
            application: rbd
            compression_algorithm: snappy
            compression_mode: aggressive
            compression_required_ratio: .875
            ...

Ceph manage clients keyring keys
--------------------------------

Keyrings are dynamically generated unless specified by the manage_keyring pillar.
This setting has an effect on the admin keyring only if caps are not defined.

.. code-block:: yaml

    ceph:
      setup:
        enabled: true
      common:
        manage_keyring: true
        keyring:
          admin:
            key: AACf3ulZFFPNDxAAd2DWds3aEkHh4IklZVgIaQ==
            mode: 600
          glance:
            name: images
            key: AACf3ulZFFPNDxAAd2DWds3aEkHh4IklZVgIaQ==
            user: glance
            caps:
              mon: "allow r"
              osd: "allow class-read object_prefix rdb_children, allow rwx pool=images"

Ceph manage admin keyring
-------------------------

To use a pre-defined admin key, add manage_admin_keyring and the admin keyring definition to
the ceph mon nodes in cluster_model/ceph/mon.yml

.. code-block:: yaml

    ceph:
      common:
        manage_admin_keyring: true
        keyring:
          admin:
            caps:
              mds: "allow *"
              mgr: "allow *"
              mon: "allow *"
              osd: "allow *"
            key: AACf3ulZFFPNDxAAd2DWds3aEkHh4IklZVgIaQ==

Specify alternative keyring path and username

.. code-block:: yaml

    ceph:
      radosgw:
        keyring_user: radosgw.gateway
        keyring_path: /etc/ceph/keyring.radosgw.gateway

Generate CRUSH map - Recommended way
------------------------------------

It is required to define the `type` for crush buckets, and these types must start with `root` (top) and end with `host`. OSD daemons will be assigned to hosts according to their hostnames. The weight of the buckets will be calculated according to the weight of their children.

If the pools that are in use have a size of 3, it is best to have 3 children of a specific type in the root CRUSH tree to replicate objects across (specified in the rule steps by 'type region').

.. code-block:: yaml

    ceph:
      setup:
        crush:
          enabled: True
          tunables:
            choose_total_tries: 50
            choose_local_tries: 0
            choose_local_fallback_tries: 0
            chooseleaf_descend_once: 1
            chooseleaf_vary_r: 1
            chooseleaf_stable: 1
            straw_calc_version: 1
            allowed_bucket_algs: 54
          type:
            - root
            - region
            - rack
            - host
            - osd
          root:
            - name: root-ssd
            - name: root-sata
          region:
            - name: eu-1
              parent: root-sata
            - name: eu-2
              parent: root-sata
            - name: eu-3
              parent: root-ssd
            - name: us-1
              parent: root-sata
          rack:
            - name: rack01
              parent: eu-1
            - name: rack02
              parent: eu-2
            - name: rack03
              parent: us-1
          rule:
            sata:
              ruleset: 0
              type: replicated
              min_size: 1
              max_size: 10
              steps:
                - take take root-sata
                - chooseleaf firstn 0 type region
                - emit
            ssd:
              ruleset: 1
              type: replicated
              min_size: 1
              max_size: 10
              steps:
                - take take root-ssd
                - chooseleaf firstn 0 type region
                - emit

Generate CRUSH map - Alternative way
------------------------------------

It's necessary to create a per-OSD pillar.

.. code-block:: yaml

    ceph:
      osd:
        crush:
          - type: root
            name: root1
          - type: region
            name: eu-1
          - type: rack
            name: rack01
          - type: host
            name: osd001

Add OSDs with specific weight
-----------------------------

Add OSD device(s) with the initial weight set specifically to a certain value.

.. code-block:: yaml

    ceph:
      osd:
        crush_initial_weight: 0

Apply CRUSH map
---------------

Before you apply the CRUSH map, please make sure that the settings in the generated file /etc/ceph/crushmap are correct.

.. code-block:: yaml

    ceph:
      setup:
        crush:
          enforce: true
        pool:
          images:
            crush_rule: sata
            application: rbd
          volumes:
            crush_rule: sata
            application: rbd
          vms:
            crush_rule: ssd
            application: rbd

.. note:: For Kraken and earlier releases please specify crush_rule as a ruleset number.
          For Kraken and earlier releases the application param is not needed.


Persist CRUSH map
-----------------

After the CRUSH map is applied to Ceph, it's recommended to persist the same settings even after OSD reboots.

.. code-block:: yaml

    ceph:
      osd:
        crush_update: false


Ceph monitoring
---------------

By default, monitoring is set up to collect information from the MON and OSD nodes. To change the default values, add the following pillar to MON nodes.

.. code-block:: yaml

    ceph:
      monitoring:
        space_used_warning_threshold: 0.75
        space_used_critical_threshold: 0.85
        apply_latency_threshold: 0.007
        commit_latency_threshold: 0.7
        pool:
          vms:
            pool_space_used_utilization_warning_threshold: 0.75
            pool_space_used_critical_threshold: 0.85
            pool_write_ops_threshold: 200
            pool_write_bytes_threshold: 70000000
            pool_read_bytes_threshold: 70000000
            pool_read_ops_threshold: 1000
          images:
            pool_space_used_utilization_warning_threshold: 0.50
            pool_space_used_critical_threshold: 0.95
            pool_write_ops_threshold: 100
            pool_write_bytes_threshold: 50000000
            pool_read_bytes_threshold: 50000000
            pool_read_ops_threshold: 500

Ceph monitor backups
--------------------

Backup client with ssh/rsync remote host

.. code-block:: yaml

    ceph:
      backup:
        client:
          enabled: true
          full_backups_to_keep: 3
          hours_before_full: 24
          target:
            host: cfg01
            backup_dir: server-backup-dir

Backup client with local backup only

.. code-block:: yaml

    ceph:
      backup:
        client:
          enabled: true
          full_backups_to_keep: 3
          hours_before_full: 24


Backup client at exact times:

.. code-block:: yaml

    ceph:
      backup:
        client:
          enabled: true
          full_backups_to_keep: 3
          incr_before_full: 3
          backup_times:
            day_of_week: 0
            hour: 4
            minute: 52
          compression: true
          compression_threads: 2
          database:
            user: user
            password: password
          target:
            host: host01

.. note:: Parameters in the ``backup_times`` section can be used to set up the exact
    time the cron job should be executed. In this example, the backup job
    would be executed every Sunday at 4:52 AM. If any of the individual
    ``backup_times`` parameters is not defined, the default ``*`` value will be
    used. For example, if the minute parameter is ``*``, it will run the backup every minute,
    which is usually not desired.
    Available parameters are ``day_of_week``, ``day_of_month``, ``month``, ``hour`` and ``minute``.
    Please see the crontab reference for further info on how to set these parameters.

.. note:: Please be aware that only the ``backup_times`` section OR
    ``hours_before_full(incr)`` can be defined. If both are defined,
    the ``backup_times`` section will be preferred.

.. note:: The new parameter ``incr_before_full`` needs to be defined. This
    number sets the number of incremental backups to be run before a full backup
    is performed.

Backup server rsync

.. code-block:: yaml

    ceph:
      backup:
        server:
          enabled: true
          hours_before_full: 24
          full_backups_to_keep: 5
          key:
            ceph_pub_key:
              enabled: true
              key: ssh_rsa

Backup server without strict client restriction

.. code-block:: yaml

    ceph:
      backup:
        restrict_clients: false

Backup server at exact times:

.. code-block:: yaml

    ceph:
      backup:
        server:
          enabled: true
          full_backups_to_keep: 3
          incr_before_full: 3
          backup_dir: /srv/backup
          backup_times:
            day_of_week: 0
            hour: 4
            minute: 52
          key:
            ceph_pub_key:
              enabled: true
              key: key

.. note:: Parameters in the ``backup_times`` section can be used to set up the exact
    time the cron job should be executed. In this example, the backup job
    would be executed every Sunday at 4:52 AM. If any of the individual
    ``backup_times`` parameters is not defined, the default ``*`` value will be
    used. For example, if the minute parameter is ``*``, it will run the backup every minute,
    which is usually not desired.
    Available parameters are ``day_of_week``, ``day_of_month``, ``month``, ``hour`` and ``minute``.
    Please see the crontab reference for further info on how to set these parameters.

.. note:: Please be aware that only the ``backup_times`` section OR
    ``hours_before_full(incr)`` can be defined. If both are defined, the
    ``backup_times`` section will be preferred.

.. note:: The new parameter ``incr_before_full`` needs to be defined. This
    number sets the number of incremental backups to be run before a full backup
    is performed.

Enabling ceph-volume
--------------------

This feature is a tech preview.

You can set the formula to use the ceph-volume utility.
The ceph-volume utility is a replacement for ceph-disk, which is deprecated since the Ceph Mimic release.

To enable ceph-volume, set ``osd.lvm_enabled`` to ``True``:

.. code-block:: yaml

    ceph:
      osd:
        lvm_enabled: True

Partitioning for block_db and WAL
---------------------------------

You can allow the formula to automatically partition a disk used as block DB or WAL.

.. code-block:: yaml

    ceph:
      osd:
        backend:
          bluestore:
            create_partitions: true

A partition on the DB and WAL devices will be created based on the Ceph OSD definition from the cluster model, with sizes defined in
``bluestore_block_db_size`` and ``bluestore_block_wal_size``. For example, the following configuration will create three
partitions on the /dev/vdd device, each with a size of 10 GB:

.. code-block:: yaml

    ceph:
      osd:
        backend:
          bluestore:
            bluestore_block_db_size: 10737418240
            disks:
            - dev: /dev/vdc1
              block_db: /dev/vdd
            - dev: /dev/vdc2
              block_db: /dev/vdd
            - dev: /dev/vdc3
              block_db: /dev/vdd

Migration from Decapod to salt-formula-ceph
-------------------------------------------

The following configuration will run a python script which will generate the ceph config and OSD disk mappings to be put in the cluster model.

.. code-block:: yaml

    ceph:
      decapod:
        ip: 192.168.1.10
        user: user
        password: psswd
        deploy_config_name: ceph


More information
================

* https://github.com/cloud-ee/ceph-salt-formula
* http://ceph.com/ceph-storage/
* http://ceph.com/docs/master/start/intro/