============
Ceph formula
============

Ceph provides extraordinary data storage scalability. Thousands of client
hosts or KVMs can access petabytes to exabytes of data. Each one of your
applications can use the object, block or file system interfaces to the same
RADOS cluster simultaneously, which means your Ceph storage system serves as a
flexible foundation for all of your data storage needs.

Use salt-formula-linux for initial disk partitioning.


Daemons
-------

Ceph uses several daemons to handle data and cluster state. Each daemon type
requires different computing capacity and hardware optimization.

These daemons are currently supported by the formula:

* MON (`ceph.mon`)
* OSD (`ceph.osd`)
* RGW (`ceph.radosgw`)


Architecture decisions
----------------------

Please refer to the upstream architecture documents before designing your
cluster. A solid understanding of Ceph principles is essential for making the
architecture decisions described below.

http://docs.ceph.com/docs/master/architecture/


* Ceph version

There are 3 or 4 stable releases every year and many nightly/dev releases. You
should decide which version will be used, since only stable releases are
recommended for production. Some of the releases are marked LTS (Long Term
Stable) and these releases receive bugfixes for a longer period - usually
until the next LTS version is released. The running version can be checked
with the commands below.

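
The package version and the versions reported by running daemons can be
checked directly (a minimal sketch; ``ceph tell osd.* version`` requires a
working connection to the cluster):

.. code-block:: bash

    # Version of the locally installed ceph packages
    ceph --version

    # Versions reported by every OSD in the cluster
    ceph tell osd.* version
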

* Number of MON daemons

Use 1 MON daemon for testing, 3 MONs for smaller production clusters and 5
MONs for very large production clusters. There is no need to have more than 5
MONs in a normal environment because there isn't any significant benefit in
running more than 5 MONs. Ceph requires the MONs to form a quorum, so you need
to have more than 50% of the MONs up and running to have a fully operational
cluster. Every I/O operation will stop once less than 50% of the MONs are
available, because they can't form a quorum.


* Number of PGs

Placement groups provide the mapping between stored data and OSDs. It is
necessary to calculate the number of PGs because a reasonable number of PGs
should end up on each OSD. Please keep in mind that *decreasing the number of
PGs* isn't possible, while *increasing* it can affect cluster performance. See
the sketch below for a rough estimate.

http://docs.ceph.com/docs/master/rados/operations/placement-groups/
http://ceph.com/pgcalc/

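
A commonly used rule of thumb (the pgcalc page above is the authoritative
source) targets roughly 100 PGs per OSD, divided by the pool replica size and
rounded up to the nearest power of two. A minimal sketch of that arithmetic,
assuming 6 OSDs and a replica size of 3:

.. code-block:: bash

    OSDS=6
    SIZE=3
    # (6 * 100) / 3 = 200 -> next power of two -> 256
    python -c "import math; raw = ${OSDS} * 100 / float(${SIZE}); print(2 ** int(math.ceil(math.log(raw, 2))))"
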

* Daemon colocation

It is recommended to dedicate nodes for MONs and RGW since colocation can have
an influence on cluster operations. However, small clusters can run MONs on
the OSD nodes, but it is critical to have enough resources for the MON daemons
because they are the most important part of the cluster.

Installing RGW on a node with other daemons isn't recommended because the RGW
daemon usually requires a lot of bandwidth and it can harm cluster health.


* Journal location

There are two ways to set up the journal (a quick check for an existing OSD
follows below):

 * **Colocated** journal is located (usually at the beginning) on the same disk as the partition for the data. This setup is easier to install and it doesn't require any other disk to be used. However, a colocated setup is significantly slower than a dedicated one.
 * **Dedicated** journal is placed on a different disk than the data. This setup can deliver much higher performance than colocated, but it requires more disks in the servers. Journal drives should be carefully selected because high I/O and durability are required.

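
For an already deployed filestore OSD, the journal location can be checked via
the journal symlink (a minimal check, assuming OSD id 0 and the default
cluster name ``ceph``):

.. code-block:: bash

    # Points to a partition on the data disk when colocated,
    # or to a partition on a separate device when dedicated.
    ls -l /var/lib/ceph/osd/ceph-0/journal
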

* Store type (BlueStore/FileStore)

Recent versions of Ceph support BlueStore as a storage backend, and this
backend should be used if available.

http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/

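
Which backend a given OSD actually runs can be checked from its metadata
(a minimal check; recent releases report the backend as ``osd_objectstore``):

.. code-block:: bash

    ceph osd metadata 0 | grep osd_objectstore
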

* Cluster and public network

A Ceph cluster is accessed over the network, so you need decent capacity to
handle all the clients. Two networks are required for the cluster: the
**public** network and the cluster network. The public network is used for
client connections; MONs and OSDs listen on this network. The second network
is called the **cluster** network and is used for communication between OSDs.

Both networks should have dedicated interfaces; bonding interfaces and
dedicating VLANs on bonded interfaces isn't allowed. Good practice is to
dedicate more throughput to the cluster network because cluster traffic is
more important than client traffic.

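
The networks a running daemon actually uses can be verified through its admin
socket (a minimal check, run locally on the node and assuming the default
admin socket location):

.. code-block:: bash

    ceph daemon osd.0 config show | grep -E 'cluster_network|public_network'
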

* Pool parameters (size, min_size, type)

You should set up each pool according to its expected usage; at least
`min_size`, `size` and the pool type should be considered.

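
Existing pools can be inspected and adjusted at runtime (a minimal sketch; the
pool name ``test`` is just an example):

.. code-block:: bash

    ceph osd pool get test size
    ceph osd pool set test size 3
    ceph osd pool set test min_size 2
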

* Cluster monitoring

See the Ceph monitoring section below for the metrics collection options
provided by this formula.

* Hardware

Please refer to the upstream hardware recommendation guide for general
information about hardware.

Ceph servers are required to fulfil special requirements, because the load
generated by Ceph can be diametrically opposed to common workloads.

http://docs.ceph.com/docs/master/start/hardware-recommendations/


Basic management commands
-------------------------

Cluster
*******

- :code:`ceph health` - check if the cluster is healthy (:code:`ceph health detail` can provide more information)

.. code-block:: bash

    root@c-01:~# ceph health
    HEALTH_OK


- :code:`ceph status` - show basic information about the cluster

.. code-block:: bash

    root@c-01:~# ceph status
        cluster e2dc51ae-c5e4-48f0-afc1-9e9e97dfd650
         health HEALTH_OK
         monmap e1: 3 mons at {1=192.168.31.201:6789/0,2=192.168.31.202:6789/0,3=192.168.31.203:6789/0}
                election epoch 38, quorum 0,1,2 1,2,3
         osdmap e226: 6 osds: 6 up, 6 in
          pgmap v27916: 400 pgs, 2 pools, 21233 MB data, 5315 objects
                121 GB used, 10924 GB / 11058 GB avail
                     400 active+clean
      client io 481 kB/s rd, 132 kB/s wr, 185 op/s


MON
***

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/

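
A few commands that are useful when troubleshooting monitors (a minimal
selection; see the link above for the full guide):

.. code-block:: bash

    # Monitor membership and quorum state
    ceph mon stat
    ceph quorum_status --format json-pretty
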

OSD
***

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/


- :code:`ceph osd tree` - show all OSDs and their state

.. code-block:: bash

    root@c-01:~# ceph osd tree
    ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -4        0 host c-04
    -1 10.79993 root default
    -2  3.59998     host c-01
     0  1.79999         osd.0      up  1.00000          1.00000
     1  1.79999         osd.1      up  1.00000          1.00000
    -3  3.59998     host c-02
     2  1.79999         osd.2      up  1.00000          1.00000
     3  1.79999         osd.3      up  1.00000          1.00000
    -5  3.59998     host c-03
     4  1.79999         osd.4      up  1.00000          1.00000
     5  1.79999         osd.5      up  1.00000          1.00000


- :code:`ceph osd lspools` - list pools

.. code-block:: bash

    root@c-01:~# ceph osd lspools
    0 rbd,1 test

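
- :code:`ceph osd df` - show disk usage per OSD; a useful complement to :code:`ceph osd tree` (available on Hammer and newer releases)

.. code-block:: bash

    ceph osd df
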

PG
**

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg


- :code:`ceph pg ls` - list placement groups

.. code-block:: bash

    root@c-01:~# ceph pg ls | head -n 4
    pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
    0.0 11 0 0 0 0 46137344 3044 3044 active+clean 2015-07-02 10:12:40.603692 226'10652 226:1798 [4,2,0] 4 [4,2,0] 4 0'0 2015-07-01 18:38:33.126953 0'0 2015-07-01 18:17:01.904194
    0.1 7 0 0 0 0 25165936 3026 3026 active+clean 2015-07-02 10:12:40.585833 226'5808 226:1070 [2,4,1] 2 [2,4,1] 2 0'0 2015-07-01 18:38:32.352721 0'0 2015-07-01 18:17:01.904198
    0.2 18 0 0 0 0 75497472 3039 3039 active+clean 2015-07-02 10:12:39.569630 226'17447 226:3213 [3,1,5] 3 [3,1,5] 3 0'0 2015-07-01 18:38:34.308228 0'0 2015-07-01 18:17:01.904199


- :code:`ceph pg map 1.1` - show mapping between PG and OSD

.. code-block:: bash

    root@c-01:~# ceph pg map 1.1
    osdmap e226 pg 1.1 (1.1) -> up [5,1,2] acting [5,1,2]


Sample pillars
==============

Common metadata for all nodes/roles

.. code-block:: yaml

    ceph:
      common:
        version: kraken
        config:
          global:
            param1: value1
            param2: value1
            param3: value1
          pool_section:
            param1: value2
            param2: value2
            param3: value2
        fsid: a619c5fc-c4ed-4f22-9ed2-66cf2feca23d
        members:
        - name: cmn01
          host: 10.0.0.1
        - name: cmn02
          host: 10.0.0.2
        - name: cmn03
          host: 10.0.0.3
        keyring:
          admin:
            key: AQBHPYhZv5mYDBAAvisaSzCTQkC5gywGUp/voA==
            caps:
              mds: "allow *"
              mgr: "allow *"
              mon: "allow *"
              osd: "allow *"


Optional definition for cluster and public networks. The cluster network is
used for replication, the public network for front-end communication.

.. code-block:: yaml

    ceph:
      common:
        version: kraken
        fsid: a619c5fc-c4ed-4f22-9ed2-66cf2feca23d
        ....
        public_network: 10.0.0.0/24, 10.1.0.0/24
        cluster_network: 10.10.0.0/24, 10.11.0.0/24


Ceph mon (control) roles
------------------------

Monitors: A Ceph Monitor maintains maps of the cluster state, including the
monitor map, the OSD map, the Placement Group (PG) map, and the CRUSH map.
Ceph maintains a history (called an “epoch”) of each state change in the Ceph
Monitors, Ceph OSD Daemons, and PGs.

.. code-block:: yaml

    ceph:
      common:
        config:
          mon:
            key: value
      mon:
        enabled: true
        keyring:
          mon:
            key: AQAnQIhZ6in5KxAAdf467upoRMWFcVg5pbh1yg==
            caps:
              mon: "allow *"
          admin:
            key: AQBHPYhZv5mYDBAAvisaSzCTQkC5gywGUp/voA==
            caps:
              mds: "allow *"
              mgr: "allow *"
              mon: "allow *"
              osd: "allow *"


Ceph mgr roles
--------------

The Ceph Manager daemon (ceph-mgr) runs alongside monitor daemons, to provide
additional monitoring and interfaces to external monitoring and management
systems. Since the 12.x (luminous) Ceph release, the ceph-mgr daemon is
required for normal operations. The ceph-mgr daemon is an optional component
in the 11.x (kraken) Ceph release.

By default, the manager daemon requires no additional configuration, beyond
ensuring it is running. If there is no mgr daemon running, you will see a
health warning to that effect, and some of the other information in the output
of ceph status will be missing or stale until a mgr is started.

.. code-block:: yaml

    ceph:
      mgr:
        enabled: true
        dashboard:
          enabled: true
          host: 10.103.255.252
          port: 7000


Ceph OSD (storage) roles
------------------------

.. code-block:: yaml

    ceph:
      common:
        config:
          osd:
            key: value
      osd:
        enabled: true
        host_id: 10
        copy_admin_key: true
        journal_type: raw
        dmcrypt: disable
        osd_scenario: raw_journal_devices
        fs_type: xfs
        disk:
          '00':
            rule: hdd
            dev: /dev/vdb2
            journal: /dev/vdb1
            class: besthdd
            weight: 1.5
          '01':
            rule: hdd
            dev: /dev/vdc2
            journal: /dev/vdc1
            class: besthdd
            weight: 1.5
          '02':
            rule: hdd
            dev: /dev/vdd2
            journal: /dev/vdd1
            class: besthdd
            weight: 1.5

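
Once the pillar is assigned to the storage node, the role can be applied with
Salt (a minimal sketch; run from the Salt master, the target pattern is just
an example):

.. code-block:: bash

    salt 'osd*' state.sls ceph.osd
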

Ceph client roles
-----------------

Simple ceph client service

.. code-block:: yaml

    ceph:
      client:
        config:
          global:
            mon initial members: ceph1,ceph2,ceph3
            mon host: 10.103.255.252:6789,10.103.255.253:6789,10.103.255.254:6789
        keyring:
          monitoring:
            key: 00000000000000000000000000000000000000==


On OpenStack control nodes, these settings are usually consumed by the
cinder-volume or glance-registry services.

.. code-block:: yaml

    ceph:
      client:
        config:
          global:
            fsid: 00000000-0000-0000-0000-000000000000
            mon initial members: ceph1,ceph2,ceph3
            mon host: 10.103.255.252:6789,10.103.255.253:6789,10.103.255.254:6789
            osd_fs_mkfs_arguments_xfs:
            osd_fs_mount_options_xfs: rw,noatime
            network public: 10.0.0.0/24
            network cluster: 10.0.0.0/24
            osd_fs_type: xfs
          osd:
            osd journal size: 7500
            filestore xattr use omap: true
          mon:
            mon debug dump transactions: false
        keyring:
          cinder:
            key: 00000000000000000000000000000000000000==
          glance:
            key: 00000000000000000000000000000000000000==


Ceph gateway
------------

Rados gateway with keystone v2 auth backend

.. code-block:: yaml

    ceph:
      radosgw:
        enabled: true
        hostname: gw.ceph.lab
        bind:
          address: 10.10.10.1
          port: 8080
        identity:
          engine: keystone
          api_version: 2
          host: 10.10.10.100
          port: 5000
          user: admin
          password: password
          tenant: admin


Rados gateway with keystone v3 auth backend

.. code-block:: yaml

    ceph:
      radosgw:
        enabled: true
        hostname: gw.ceph.lab
        bind:
          address: 10.10.10.1
          port: 8080
        identity:
          engine: keystone
          api_version: 3
          host: 10.10.10.100
          port: 5000
          user: admin
          password: password
          project: admin
          domain: default

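
A basic smoke test of the gateway (a minimal sketch; the address and port
follow the bind settings above):

.. code-block:: bash

    # The gateway should answer on its bind address and port
    curl -i http://10.10.10.1:8080/

    # RGW users can be managed directly on the gateway node if needed
    radosgw-admin user list
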

Ceph setup role
---------------

Replicated ceph storage pool

.. code-block:: yaml

    ceph:
      setup:
        pool:
          replicated_pool:
            pg_num: 256
            pgp_num: 256
            type: replicated
            crush_ruleset_name: 0

Erasure ceph storage pool

.. code-block:: yaml

    ceph:
      setup:
        pool:
          erasure_pool:
            pg_num: 256
            pgp_num: 256
            type: erasure
            crush_ruleset_name: 0

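
The resulting pools and their parameters can be verified on any node with an
admin keyring (a minimal check):

.. code-block:: bash

    ceph osd pool ls detail
    # or, on older releases:
    ceph osd dump | grep pool
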

Generate CRUSH map
------------------

It is required to define the `type` for crush buckets, and these types must
start with `root` (top) and end with `host`. OSD daemons will be assigned to
hosts according to their hostname. The weight of the buckets will be
calculated according to the weight of their children.

.. code-block:: yaml

    ceph:
      setup:
        crush:
          enabled: True
          tunables:
            choose_total_tries: 50
          type:
            - root
            - region
            - rack
            - host
          root:
            - name: root1
            - name: root2
          region:
            - name: eu-1
              parent: root1
            - name: eu-2
              parent: root1
            - name: us-1
              parent: root2
          rack:
            - name: rack01
              parent: eu-1
            - name: rack02
              parent: eu-2
            - name: rack03
              parent: us-1
          rule:
            sata:
              ruleset: 0
              type: replicated
              min_size: 1
              max_size: 10
              steps:
                - take crushroot.performanceblock.satahss.1
                - choseleaf firstn 0 type failure_domain
                - emit

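
After the state is applied, the generated hierarchy and rules can be reviewed
from the cluster (a minimal check):

.. code-block:: bash

    # Dump and decompile the installed CRUSH map
    ceph osd getcrushmap -o /tmp/crushmap.bin
    crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt

    # List the defined CRUSH rules
    ceph osd crush rule ls
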

Ceph monitoring
---------------

Collect general cluster metrics

.. code-block:: yaml

    ceph:
      monitoring:
        cluster_stats:
          enabled: true
          ceph_user: monitoring

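
The ``ceph_user`` above refers to a client keyring (``client.monitoring`` in
the common pillar example earlier). Whether that user can actually query the
cluster can be checked manually (a minimal sketch, assuming the keyring is
deployed on the node):

.. code-block:: bash

    ceph --id monitoring status
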

Collect metrics from monitor and OSD services

.. code-block:: yaml

    ceph:
      monitoring:
        node_stats:
          enabled: true


More information
================

* https://github.com/cloud-ee/ceph-salt-formula
* http://ceph.com/ceph-storage/
* http://ceph.com/docs/master/start/intro/


Documentation and bugs
======================

To learn how to install and update salt-formulas, consult the documentation
available online at:

    http://salt-formulas.readthedocs.io/

In the unfortunate event that bugs are discovered, they should be reported to
the appropriate issue tracker. Use the GitHub issue tracker for a specific
salt formula:

    https://github.com/salt-formulas/salt-formula-ceph/issues

For feature requests, bug reports or blueprints affecting the entire
ecosystem, use the Launchpad salt-formulas project:

    https://launchpad.net/salt-formulas

You can also join the salt-formulas-users team and subscribe to the mailing
list:

    https://launchpad.net/~salt-formulas-users

Developers wishing to work on the salt-formulas projects should always base
their work on the master branch and submit pull requests against the specific
formula:

    https://github.com/salt-formulas/salt-formula-ceph

Any questions or feedback are always welcome, so feel free to join our IRC
channel:

    #salt-formulas @ irc.freenode.net