============
Ceph formula
============

Ceph provides extraordinary data storage scalability: thousands of client
hosts or KVMs can access petabytes to exabytes of data. Each of your
applications can use the object, block or file system interfaces to the same
RADOS cluster simultaneously, which means your Ceph storage system serves as a
flexible foundation for all of your data storage needs.

Use salt-formula-linux for initial disk partitioning.

Daemons
-------

Ceph uses several daemons to handle data and cluster state. Each daemon type requires different computing capacity and hardware optimization.

These daemons are currently supported by the formula:

* MON (`ceph.mon`)
* OSD (`ceph.osd`)
* RGW (`ceph.radosgw`)

Architecture decisions
----------------------

Please refer to the upstream architecture documents before designing your cluster. A solid understanding of Ceph principles is essential for making the architecture decisions described below.
http://docs.ceph.com/docs/master/architecture/

* Ceph version

There are three or four stable releases every year and many nightly/dev releases. You should decide which version will be used, since only stable releases are recommended for production. Some releases are marked LTS (Long Term Stable) and receive bugfixes for a longer period - usually until the next LTS version is released.

* Number of MON daemons

Use 1 MON daemon for testing, 3 MONs for smaller production clusters and 5 MONs for very large production clusters. There is no need to have more than 5 MONs in a normal environment, because there isn't any significant benefit in running more than 5 MONs. Ceph requires MONs to form a quorum, so you need to have more than 50% of the MONs up and running to have a fully operational cluster. Every I/O operation will stop once less than 50% of the MONs are available, because they can't form a quorum.
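
For a smaller production cluster this translates into three monitor members in the common pillar, for example (a minimal sketch; node names and addresses are placeholders taken from the sample pillars shown later in this document):

.. code-block:: yaml

    ceph:
      common:
        members:
        - name: cmn01
          host: 10.0.0.1
        - name: cmn02
          host: 10.0.0.2
        - name: cmn03
          host: 10.0.0.3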

* Number of PGs

Placement groups provide the mapping between stored data and OSDs. It is necessary to calculate the number of PGs, because a decent number of PGs should be stored on each OSD. Please keep in mind that *decreasing* the number of PGs isn't possible and *increasing* it can affect cluster performance.

http://docs.ceph.com/docs/master/rados/operations/placement-groups/
http://ceph.com/pgcalc/
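
A commonly used rule of thumb is ``(number of OSDs * 100) / replica count``, rounded up to the nearest power of two; the pgcalc tool above gives more precise results. A minimal sketch of a pool definition, assuming 15 OSDs and a replicated size of 3 (pool name and values are illustrative):

.. code-block:: yaml

    ceph:
      setup:
        pool:
          volumes:
            # 15 OSDs * 100 / 3 replicas = 500, rounded up to 512
            pg_num: 512
            pgp_num: 512
            type: replicated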

* Daemon colocation

It is recommended to dedicate nodes for MONs and RGW, since colocation can have an influence on cluster operations. However, small clusters can run MONs on OSD nodes, but it is critical to have enough resources for the MON daemons because they are the most important part of the cluster.

Installing RGW on a node with other daemons isn't recommended, because the RGW daemon usually requires a lot of bandwidth and it can harm cluster health.

* Store type (Bluestore/Filestore)

Recent versions of Ceph support Bluestore as a storage backend, and this backend should be used if available.

http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/

* Block.db location for Bluestore

There are two ways to set up block.db (see the sketch after the block.wal list below):
  * **Colocated** - the block.db partition is created on the same disk as the data partition. This setup is easier to install and doesn't require any other disk. However, a colocated setup is significantly slower than a dedicated one.
  * **Dedicated** - block.db is placed on a different disk than the data (or into a partition). This setup can deliver much higher performance than a colocated one, but it requires more disks in the servers. Block.db drives should be carefully selected because high I/O and durability are required.

* Block.wal location for Bluestore

There are two ways to set up block.wal, which stores just the internal journal (write-ahead log):
  * **Colocated** - block.wal uses the free space of the block.db device.
  * **Dedicated** - block.wal is placed on a different disk than the data (preferably into a partition, as the size can be small) and possibly the block.db device. This setup can deliver much higher performance than a colocated one, but it requires more disks in the servers. Block.wal drives should be carefully selected because high I/O and durability are required.
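
A minimal sketch of a Bluestore OSD pillar with a dedicated block.db and block.wal device, following the backend schema shown in the OSD role example later in this document (device paths are illustrative):

.. code-block:: yaml

    ceph:
      osd:
        backend:
          bluestore:
            disks:
            # dedicated block.db and block.wal on a faster device
            - dev: /dev/sdb
              block_db: /dev/nvme0n1
              block_wal: /dev/nvme0n1
            # colocated - block.db and block.wal stay on the data disk
            - dev: /dev/sdc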

* Journal location for Filestore

There are two ways to set up the journal (see the sketch below):
  * **Colocated** - the journal is created on the same disk as the data partition. This setup is easier to install and doesn't require any other disk. However, a colocated setup is significantly slower than a dedicated one.
  * **Dedicated** - the journal is placed on a different disk than the data (or into a partition). This setup can deliver much higher performance than a colocated one, but it requires more disks in the servers. Journal drives should be carefully selected because high I/O and durability are required.
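
A minimal sketch of a Filestore OSD pillar with a dedicated journal device (device paths are illustrative; see the full OSD role example later in this document):

.. code-block:: yaml

    ceph:
      osd:
        backend:
          filestore:
            disks:
            # dedicated journal on a faster device
            - dev: /dev/sdd
              journal: /dev/nvme0n1
            # colocated - journal stays on the data disk
            - dev: /dev/sde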

* Cluster and public network

A Ceph cluster is accessed over the network, so you need decent capacity to handle all the clients. There are two networks required for a cluster: the **public** network and the cluster network. The public network is used for client connections, and MONs and OSDs listen on this network. The second network is called the **cluster** network, and it is used for communication between OSDs.

Both networks should have dedicated interfaces; bonding interfaces and dedicating VLANs on bonded interfaces aren't allowed. Good practice is to dedicate more throughput to the cluster network, because cluster traffic is more important than client traffic.

* Pool parameters (size, min_size, type)

You should set up each pool according to its expected usage; at least ``min_size``, ``size`` and the pool type should be considered (see the sketch below).
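
For example, a replicated pool keeping three copies and staying writable with two (a minimal sketch, assuming the ``setup.pool`` section passes ``size`` and ``min_size`` through as pool options in the same way as the parameters shown in the setup examples later in this document):

.. code-block:: yaml

    ceph:
      setup:
        pool:
          volumes:
            pg_num: 256
            pgp_num: 256
            type: replicated
            size: 3
            min_size: 2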

* Cluster monitoring

* Hardware

Please refer to the upstream hardware recommendation guide for general information about hardware.

Ceph servers are required to fulfil special requirements, because the load generated by Ceph can be very different from common workloads.

http://docs.ceph.com/docs/master/start/hardware-recommendations/

Basic management commands
-------------------------

Cluster
*******

- :code:`ceph health` - check if the cluster is healthy (:code:`ceph health detail` can provide more information)

.. code-block:: bash

    root@c-01:~# ceph health
    HEALTH_OK

- :code:`ceph status` - show basic information about the cluster

.. code-block:: bash

    root@c-01:~# ceph status
        cluster e2dc51ae-c5e4-48f0-afc1-9e9e97dfd650
         health HEALTH_OK
         monmap e1: 3 mons at {1=192.168.31.201:6789/0,2=192.168.31.202:6789/0,3=192.168.31.203:6789/0}
                election epoch 38, quorum 0,1,2 1,2,3
         osdmap e226: 6 osds: 6 up, 6 in
          pgmap v27916: 400 pgs, 2 pools, 21233 MB data, 5315 objects
                121 GB used, 10924 GB / 11058 GB avail
                     400 active+clean
      client io 481 kB/s rd, 132 kB/s wr, 185 op/s

MON
***

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/

OSD
***

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/

- :code:`ceph osd tree` - show all OSDs and their state

.. code-block:: bash

    root@c-01:~# ceph osd tree
    ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -4        0 host c-04
    -1 10.79993 root default
    -2  3.59998     host c-01
     0  1.79999         osd.0      up  1.00000          1.00000
     1  1.79999         osd.1      up  1.00000          1.00000
    -3  3.59998     host c-02
     2  1.79999         osd.2      up  1.00000          1.00000
     3  1.79999         osd.3      up  1.00000          1.00000
    -5  3.59998     host c-03
     4  1.79999         osd.4      up  1.00000          1.00000
     5  1.79999         osd.5      up  1.00000          1.00000

- :code:`ceph osd lspools` - list pools

.. code-block:: bash

    root@c-01:~# ceph osd lspools
    0 rbd,1 test

PG
**

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg

- :code:`ceph pg ls` - list placement groups

.. code-block:: bash

    root@c-01:~# ceph pg ls | head -n 4
    pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
    0.0 11 0 0 0 0 46137344 3044 3044 active+clean 2015-07-02 10:12:40.603692 226'10652 226:1798 [4,2,0] 4 [4,2,0] 4 0'0 2015-07-01 18:38:33.126953 0'0 2015-07-01 18:17:01.904194
    0.1 7 0 0 0 0 25165936 3026 3026 active+clean 2015-07-02 10:12:40.585833 226'5808 226:1070 [2,4,1] 2 [2,4,1] 2 0'0 2015-07-01 18:38:32.352721 0'0 2015-07-01 18:17:01.904198
    0.2 18 0 0 0 0 75497472 3039 3039 active+clean 2015-07-02 10:12:39.569630 226'17447 226:3213 [3,1,5] 3 [3,1,5] 3 0'0 2015-07-01 18:38:34.308228 0'0 2015-07-01 18:17:01.904199

- :code:`ceph pg map 1.1` - show mapping between PG and OSD

.. code-block:: bash

    root@c-01:~# ceph pg map 1.1
    osdmap e226 pg 1.1 (1.1) -> up [5,1,2] acting [5,1,2]

Sample pillars
==============

Common metadata for all nodes/roles

.. code-block:: yaml

    ceph:
      common:
        version: luminous
        cluster_name: ceph
        config:
          global:
            param1: value1
            param2: value1
            param3: value1
          pool_section:
            param1: value2
            param2: value2
            param3: value2
        fsid: a619c5fc-c4ed-4f22-9ed2-66cf2feca23d
        members:
        - name: cmn01
          host: 10.0.0.1
        - name: cmn02
          host: 10.0.0.2
        - name: cmn03
          host: 10.0.0.3
        keyring:
          admin:
            caps:
              mds: "allow *"
              mgr: "allow *"
              mon: "allow *"
              osd: "allow *"
          bootstrap-osd:
            caps:
              mon: "allow profile bootstrap-osd"

Optional definition for cluster and public networks. The cluster network is used
for replication, the public network for front-end communication.

.. code-block:: yaml

    ceph:
      common:
        version: luminous
        fsid: a619c5fc-c4ed-4f22-9ed2-66cf2feca23d
        ....
        public_network: 10.0.0.0/24, 10.1.0.0/24
        cluster_network: 10.10.0.0/24, 10.11.0.0/24

Ceph mon (control) roles
------------------------

Monitors: A Ceph Monitor maintains maps of the cluster state, including the
monitor map, the OSD map, the Placement Group (PG) map, and the CRUSH map.
Ceph maintains a history (called an "epoch") of each state change in the Ceph
Monitors, Ceph OSD Daemons, and PGs.

.. code-block:: yaml

    ceph:
      common:
        config:
          mon:
            key: value
      mon:
        enabled: true
        keyring:
          mon:
            caps:
              mon: "allow *"
          admin:
            caps:
              mds: "allow *"
              mgr: "allow *"
              mon: "allow *"
              osd: "allow *"

Ceph mgr roles
--------------

The Ceph Manager daemon (ceph-mgr) runs alongside monitor daemons, to provide additional monitoring and interfaces to external monitoring and management systems. Since the 12.x (luminous) Ceph release, the ceph-mgr daemon is required for normal operations. The ceph-mgr daemon is an optional component in the 11.x (kraken) Ceph release.

By default, the manager daemon requires no additional configuration, beyond ensuring it is running. If there is no mgr daemon running, you will see a health warning to that effect, and some of the other information in the output of ceph status will be missing or stale until a mgr is started.

.. code-block:: yaml

    ceph:
      mgr:
        enabled: true
        dashboard:
          enabled: true
          host: 10.103.255.252
          port: 7000

Ceph OSD (storage) roles
------------------------

.. code-block:: yaml

    ceph:
      common:
        version: luminous
        fsid: a619c5fc-c4ed-4f22-9ed2-66cf2feca23d
        public_network: 10.0.0.0/24, 10.1.0.0/24
        cluster_network: 10.10.0.0/24, 10.11.0.0/24
        keyring:
          bootstrap-osd:
            caps:
              mon: "allow profile bootstrap-osd"
        ....
      osd:
        enabled: true
        crush_parent: rack01
        journal_size: 20480 (20G)
        bluestore_block_db_size: 10073741824 (10G)
        bluestore_block_wal_size: 10073741824 (10G)
        bluestore_block_size: 807374182400 (800G)
        backend:
          filestore:
            disks:
            - dev: /dev/sdm
              enabled: false
              journal: /dev/ssd
              journal_partition: 5
              data_partition: 6
              lockbox_partition: 7
              data_partition_size: 12000 (MB)
              class: bestssd
              weight: 1.666
              dmcrypt: true
              journal_dmcrypt: false
            - dev: /dev/sdf
              journal: /dev/ssd
              journal_dmcrypt: true
              class: bestssd
              weight: 1.666
            - dev: /dev/sdl
              journal: /dev/ssd
              class: bestssd
              weight: 1.666
          bluestore:
            disks:
            - dev: /dev/sdb
            - dev: /dev/sdf
              block_db: /dev/ssd
              block_wal: /dev/ssd
              block_db_dmcrypt: true
              block_wal_dmcrypt: true
            - dev: /dev/sdc
              block_db: /dev/ssd
              block_wal: /dev/ssd
              data_partition: 1
              block_partition: 2
              lockbox_partition: 5
              block_db_partition: 3
              block_wal_partition: 4
              class: ssd
              weight: 1.666
              dmcrypt: true
              block_db_dmcrypt: false
              block_wal_dmcrypt: false
            - dev: /dev/sdd
              enabled: false


In case some custom block devices should be used (like loop devices for testing purposes),
you need to indicate the proper partition prefix.

.. code-block:: yaml

    ceph:
      osd:
        backend:
          bluestore:
            disks:
            - dev: /dev/loop20
              block_db: /dev/loop21
              data_partition_prefix: 'p'

Ceph client roles - Deprecated - use ceph:common instead
---------------------------------------------------------

Simple ceph client service

.. code-block:: yaml

    ceph:
      client:
        config:
          global:
            mon initial members: ceph1,ceph2,ceph3
            mon host: 10.103.255.252:6789,10.103.255.253:6789,10.103.255.254:6789
        keyring:
          monitoring:
            key: 00000000000000000000000000000000000000==

On OpenStack control nodes, the settings are usually located in the cinder-volume or
glance-registry services.

.. code-block:: yaml

    ceph:
      client:
        config:
          global:
            fsid: 00000000-0000-0000-0000-000000000000
            mon initial members: ceph1,ceph2,ceph3
            mon host: 10.103.255.252:6789,10.103.255.253:6789,10.103.255.254:6789
            osd_fs_mkfs_arguments_xfs:
            osd_fs_mount_options_xfs: rw,noatime
            network public: 10.0.0.0/24
            network cluster: 10.0.0.0/24
            osd_fs_type: xfs
          osd:
            osd journal size: 7500
            filestore xattr use omap: true
          mon:
            mon debug dump transactions: false
        keyring:
          cinder:
            key: 00000000000000000000000000000000000000==
          glance:
            key: 00000000000000000000000000000000000000==

Ceph gateway
------------

Rados gateway with keystone v2 auth backend

.. code-block:: yaml

    ceph:
      radosgw:
        enabled: true
        hostname: gw.ceph.lab
        bind:
          address: 10.10.10.1
          port: 8080
        identity:
          engine: keystone
          api_version: 2
          host: 10.10.10.100
          port: 5000
          user: admin
          password: password
          tenant: admin

Rados gateway with keystone v3 auth backend

.. code-block:: yaml

    ceph:
      common:
        config:
          rgw:
            key: value
      radosgw:
        enabled: true
        hostname: gw.ceph.lab
        bind:
          address: 10.10.10.1
          port: 8080
        identity:
          engine: keystone
          api_version: 3
          host: 10.10.10.100
          port: 5000
          user: admin
          password: password
          project: admin
          domain: default
        swift:
          versioning:
            enabled: true
          enforce_content_length: true

Ceph setup role
---------------

Replicated ceph storage pool

.. code-block:: yaml

    ceph:
      setup:
        pool:
          replicated_pool:
            pg_num: 256
            pgp_num: 256
            type: replicated
            crush_rule: sata
            application: rbd

.. note:: For Kraken and earlier releases please specify crush_rule as a ruleset number.
          For Kraken and earlier releases the application param is not needed.

Erasure ceph storage pool

.. code-block:: yaml

    ceph:
      setup:
        pool:
          erasure_pool:
            pg_num: 256
            pgp_num: 256
            type: erasure
            crush_rule: ssd
            application: rbd


Inline compression for Bluestore backend

.. code-block:: yaml

    ceph:
      setup:
        pool:
          volumes:
            pg_num: 256
            pgp_num: 256
            type: replicated
            crush_rule: hdd
            application: rbd
            compression_algorithm: snappy
            compression_mode: aggressive
            compression_required_ratio: .875
            ...

Ceph manage clients keyring keys
--------------------------------

Keyrings are dynamically generated unless specified by the manage_keyring pillar.
This setting has an effect on the admin keyring only if caps are not defined.

.. code-block:: yaml

    ceph:
      setup:
        enabled: true
      common:
        manage_keyring: true
        keyring:
          admin:
            key: AACf3ulZFFPNDxAAd2DWds3aEkHh4IklZVgIaQ==
            mode: 600
          glance:
            name: images
            key: AACf3ulZFFPNDxAAd2DWds3aEkHh4IklZVgIaQ==
            user: glance
            caps:
              mon: "allow r"
              osd: "allow class-read object_prefix rdb_children, allow rwx pool=images"

Ceph manage admin keyring
-------------------------

To use a pre-defined admin key, add manage_admin_keyring and the admin keyring definition to
the ceph mon nodes in cluster_model/ceph/mon.yml

.. code-block:: yaml

    ceph:
      common:
        manage_admin_keyring: true
        keyring:
          admin:
            caps:
              mds: "allow *"
              mgr: "allow *"
              mon: "allow *"
              osd: "allow *"
            key: AACf3ulZFFPNDxAAd2DWds3aEkHh4IklZVgIaQ==

Specify alternative keyring path and username

.. code-block:: yaml

    ceph:
      radosgw:
        keyring_user: radosgw.gateway
        keyring_path: /etc/ceph/keyring.radosgw.gateway

Generate CRUSH map - Recommended way
------------------------------------

It is required to define the `type` for crush buckets, and these types must start with `root` (top) and end with `host`. OSD daemons will be assigned to hosts according to their hostnames. The weight of the buckets will be calculated according to the weight of their children.

If the pools that are in use have a size of 3, it is best to have 3 children of a specific type in the root CRUSH tree to replicate objects across (specified in the rule steps by 'type region').

.. code-block:: yaml

    ceph:
      setup:
        crush:
          enabled: True
          tunables:
            choose_total_tries: 50
            choose_local_tries: 0
            choose_local_fallback_tries: 0
            chooseleaf_descend_once: 1
            chooseleaf_vary_r: 1
            chooseleaf_stable: 1
            straw_calc_version: 1
            allowed_bucket_algs: 54
          type:
            - root
            - region
            - rack
            - host
            - osd
          root:
            - name: root-ssd
            - name: root-sata
          region:
            - name: eu-1
              parent: root-sata
            - name: eu-2
              parent: root-sata
            - name: eu-3
              parent: root-ssd
            - name: us-1
              parent: root-sata
          rack:
            - name: rack01
              parent: eu-1
            - name: rack02
              parent: eu-2
            - name: rack03
              parent: us-1
          rule:
            sata:
              ruleset: 0
              type: replicated
              min_size: 1
              max_size: 10
              steps:
                - take take root-sata
                - chooseleaf firstn 0 type region
                - emit
            ssd:
              ruleset: 1
              type: replicated
              min_size: 1
              max_size: 10
              steps:
                - take take root-ssd
                - chooseleaf firstn 0 type region
                - emit

Generate CRUSH map - Alternative way
------------------------------------

It's necessary to create a per-OSD pillar.

.. code-block:: yaml

    ceph:
      osd:
        crush:
          - type: root
            name: root1
          - type: region
            name: eu-1
          - type: rack
            name: rack01
          - type: host
            name: osd001

Add OSDs with specific weight
-----------------------------

Add OSD device(s) with the initial weight set specifically to a certain value.

.. code-block:: yaml

    ceph:
      osd:
        crush_initial_weight: 0

Apply CRUSH map
---------------

Before you apply the CRUSH map, please make sure that the settings in the generated file /etc/ceph/crushmap are correct.

.. code-block:: yaml

    ceph:
      setup:
        crush:
          enforce: true
        pool:
          images:
            crush_rule: sata
            application: rbd
          volumes:
            crush_rule: sata
            application: rbd
          vms:
            crush_rule: ssd
            application: rbd

.. note:: For Kraken and earlier releases please specify crush_rule as a ruleset number.
          For Kraken and earlier releases the application param is not needed.


Persist CRUSH map
-----------------

After the CRUSH map is applied to Ceph, it's recommended to persist the same settings even after OSD reboots.

.. code-block:: yaml

    ceph:
      osd:
        crush_update: false


Ceph monitoring
---------------

By default, monitoring is set up to collect information from the MON and OSD nodes. To change the default values, add the following pillar to MON nodes.

.. code-block:: yaml

    ceph:
      monitoring:
        space_used_warning_threshold: 0.75
        space_used_critical_threshold: 0.85
        apply_latency_threshold: 0.007
        commit_latency_threshold: 0.7
        pool:
          vms:
            pool_space_used_utilization_warning_threshold: 0.75
            pool_space_used_critical_threshold: 0.85
            pool_write_ops_threshold: 200
            pool_write_bytes_threshold: 70000000
            pool_read_bytes_threshold: 70000000
            pool_read_ops_threshold: 1000
          images:
            pool_space_used_utilization_warning_threshold: 0.50
            pool_space_used_critical_threshold: 0.95
            pool_write_ops_threshold: 100
            pool_write_bytes_threshold: 50000000
            pool_read_bytes_threshold: 50000000
            pool_read_ops_threshold: 500

Ceph monitor backups
--------------------

Backup client with ssh/rsync remote host

.. code-block:: yaml

    ceph:
      backup:
        client:
          enabled: true
          full_backups_to_keep: 3
          hours_before_full: 24
          target:
            host: cfg01
            backup_dir: server-backup-dir

Backup client with local backup only

.. code-block:: yaml

    ceph:
      backup:
        client:
          enabled: true
          full_backups_to_keep: 3
          hours_before_full: 24


Backup client at exact times:

.. code-block:: yaml

    ceph:
      backup:
        client:
          enabled: true
          full_backups_to_keep: 3
          incr_before_full: 3
          backup_times:
            day_of_week: 0
            hour: 4
            minute: 52
          compression: true
          compression_threads: 2
          database:
            user: user
            password: password
          target:
            host: host01

.. note:: Parameters in the ``backup_times`` section can be used to set up the exact
    time the cron job should be executed. In this example, the backup job
    would be executed every Sunday at 4:52 AM. If any of the individual
    ``backup_times`` parameters is not defined, the default ``*`` value will be
    used. For example, if the minute parameter is ``*``, it will run the backup every minute,
    which is usually not desired.
    Available parameters are ``day_of_week``, ``day_of_month``, ``month``, ``hour`` and ``minute``.
    Please see the crontab reference for further info on how to set these parameters.

.. note:: Please be aware that only the ``backup_times`` section OR
    ``hours_before_full(incr)`` can be defined. If both are defined,
    the ``backup_times`` section will be preferred.

.. note:: The new parameter ``incr_before_full`` needs to be defined. This
    number sets the number of incremental backups to be run before a full backup
    is performed.

Backup server rsync

.. code-block:: yaml

    ceph:
      backup:
        server:
          enabled: true
          hours_before_full: 24
          full_backups_to_keep: 5
          key:
            ceph_pub_key:
              enabled: true
              key: ssh_rsa

Backup server without strict client restriction

.. code-block:: yaml

    ceph:
      backup:
        restrict_clients: false

Backup server at exact times:

.. code-block:: yaml

    ceph:
      backup:
        server:
          enabled: true
          full_backups_to_keep: 3
          incr_before_full: 3
          backup_dir: /srv/backup
          backup_times:
            day_of_week: 0
            hour: 4
            minute: 52
          key:
            ceph_pub_key:
              enabled: true
              key: key

.. note:: Parameters in the ``backup_times`` section can be used to set up the exact
    time the cron job should be executed. In this example, the backup job
    would be executed every Sunday at 4:52 AM. If any of the individual
    ``backup_times`` parameters is not defined, the default ``*`` value will be
    used. For example, if the minute parameter is ``*``, it will run the backup every minute,
    which is usually not desired.
    Available parameters are ``day_of_week``, ``day_of_month``, ``month``, ``hour`` and ``minute``.
    Please see the crontab reference for further info on how to set these parameters.

.. note:: Please be aware that only the ``backup_times`` section OR
    ``hours_before_full(incr)`` can be defined. If both are defined, the
    ``backup_times`` section will be preferred.

.. note:: The new parameter ``incr_before_full`` needs to be defined. This
    number sets the number of incremental backups to be run before a full backup
    is performed.

Enabling ceph-volume
--------------------

This feature is a tech preview.

You can set the formula to use the ceph-volume utility.
The ceph-volume utility is a replacement for ceph-disk, which is deprecated since the Ceph Mimic release.

To enable ceph-volume, set ``osd.lvm_enabled`` to ``True``:

.. code-block:: yaml

    ceph:
      osd:
        lvm_enabled: True

Partitioning for block_db and WAL
---------------------------------

You can allow the formula to automatically partition a disk used as block DB or WAL.

.. code-block:: yaml

    ceph:
      osd:
        backend:
          bluestore:
            create_partitions: true

A partition on the DB and WAL devices will be created based on the Ceph OSD definition from the cluster model, with sizes defined in
``bluestore_block_db_size`` and ``bluestore_block_wal_size``. For example, the following configuration will create three
partitions on the /dev/vdd device, each with a size of 10 GB:

.. code-block:: yaml

    ceph:
      osd:
        backend:
          bluestore:
            bluestore_block_db_size: 10737418240
            disks:
            - dev: /dev/vdc1
              block_db: /dev/vdd
            - dev: /dev/vdc2
              block_db: /dev/vdd
            - dev: /dev/vdc3
              block_db: /dev/vdd

Migration from Decapod to salt-formula-ceph
-------------------------------------------

The following configuration will run a python script which will generate the ceph config and OSD disk mappings to be put in the cluster model.

.. code-block:: yaml

    ceph:
      decapod:
        ip: 192.168.1.10
        user: user
        password: psswd
        deploy_config_name: ceph


More information
================

* https://github.com/cloud-ee/ceph-salt-formula
* http://ceph.com/ceph-storage/
* http://ceph.com/docs/master/start/intro/