============
Heka Formula
============

Heka is an open source stream processing system developed by Mozilla. It is a "Swiss Army knife" type of tool for data processing.

Sample pillars
==============

Metric collector service
------------------------

Local alarm definition for nova compute role, excerpt from `nova/meta/heka.yml`.

.. code-block:: yaml

    heka:
      metric_collector:
        trigger:
          nova_compute_filesystem_warning:
            enabled: True # implicit
            description: "The nova instance filesystem's root free space is low."
            severity: warning
            logical_operator: or # implicit
            rules:
            - metric: fs_space_percent_free
              relational_operator: '<'
              threshold: 10
              window: 60
              periods: 0
              function: min
              dimension:
                fs: '/var/lib/nova'
          nova_compute_filesystem_critical:
            description: "The nova instance filesystem's root free space is low."
            severity: critical
            rules:
            - metric: fs_space_percent_free
              relational_operator: '<'
              threshold: 5
              window: 60
              periods: 0
              function: min
              dimension:
                fs: '/var/lib/nova'
        alarm:
          nova_compute_filesystem:
            notifications: False
            alerting: True
            dimension:
              node_role: controller
            triggers:
            - nova_compute_filesystem_warning
            - nova_compute_filesystem_critical
      aggregator:
        alarm_cluster:
          nova_compute_service: # the service_role format
            policy: highest_severity
            group_by: member
            match:
              node_role: compute
            dimension:
              cluster: nova-compute-plane
            members:
            - nova_compute_logs
            - nova_compute_filesystem
            - nova_compute_instances
            - nova_compute_libvirt
            - nova_compute_free_cpu
            - nova_compute_free_mem
            hints:
            - neutron_compute # or contrail_vrouter for contrail nodes
          nova_compute_plane: # the service_role format
            engine: gse
            policy: highest_severity
            group_by: member
            match:
              cluster: nova-compute-plane

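The same trigger and alarm keys can be reused for other filesystems. The following is a minimal sketch, not part of the formula: the names `linux_log_filesystem_warning` and `linux_log_filesystem`, the `/var/log` mount point, and the threshold are hypothetical, and only keys that already appear in the excerpt above are used.

.. code-block:: yaml

    heka:
      metric_collector:
        trigger:
          # Hypothetical trigger name, chosen for illustration only
          linux_log_filesystem_warning:
            description: "The log filesystem's free space is low."
            severity: warning
            rules:
            - metric: fs_space_percent_free
              relational_operator: '<'
              threshold: 10 # assumed value, tune per deployment
              window: 60
              periods: 0
              function: min
              dimension:
                fs: '/var/log' # assumed mount point
        alarm:
          # Hypothetical alarm name referencing the trigger above
          linux_log_filesystem:
            notifications: False
            alerting: True
            triggers:
            - linux_log_filesystem_warning

Quoting the value of `relational_operator` keeps the YAML unambiguous, which matters in particular for operators that start with `>`.
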
Default CPU usage alarms, excerpt from `linux/meta/heka.yml`.

.. code-block:: yaml

    metric_collector:
      trigger:
        linux_system_cpu_critical:
          description: 'The CPU usage is too high.'
          severity: critical
          rules:
          - metric: cpu_wait
            relational_operator: '>='
            threshold: 35
            window: 120
            periods: 0
            function: avg
          - metric: cpu_idle
            relational_operator: '<='
            threshold: 5
            window: 120
            function: avg
        linux_system_cpu_warning:
          description: 'The CPU wait times are high.'
          severity: warning
          rules:
          - metric: cpu_wait
            relational_operator: '>='
            threshold: 15
            window: 120
            periods: 0
            function: avg
      alarm:
        linux_system_cpu:
          notifications: False
          alerting: True
          triggers:
          - linux_system_cpu_warning # will not render if referenced trigger is disabled
          - linux_system_cpu_critical
          dimension:
            node_role: controller

CPU usage override for compute node, excerpt from `nova/meta/heka.yml`.

.. code-block:: yaml

    metric_collector:
      trigger:
        nova_compute_cpu_critical:
          description: 'The CPU wait times are too high.'
          severity: critical
          rules:
          - metric: cpu_wait
            relational_operator: '>='
            threshold: 35
            window: 120
            periods: 0
            function: avg

Alarm override option 1 - override:

.. code-block:: yaml

    metric_collector:
      trigger:
        # Trigger can be disabled
        linux_system_cpu_critical:
          enabled: False
      alarm:
        # Alarm can be overridden
        linux_system_cpu:
          triggers:
          - nova_compute_cpu_critical

Alarm override option 2 - reinitialize:

.. code-block:: yaml

    metric_collector:
      alarm:
        ...
        # Alarm is disabled
        linux_system_cpu:
          enabled: False
        # new alarm is created
        nova_compute_cpu:
          engine: afd
          notifications: False
          alerting: True
          triggers:
          - linux_system_cpu_warning # will not render if referenced trigger is disabled
          - nova_compute_cpu_critical
          dimension:
            node_role: controller

Remote collector service
------------------------

Remote API check example, excerpt from `nova/meta/heka.yml`.

.. code-block:: yaml

    heka:
      remote_collector:
        trigger:
          nova_control_api_fail:
            description: 'Endpoint check for nova-api failed.'
            severity: critical
            rules:
            - metric: openstack_check_api
              relational_operator: '=='
              threshold: 0
              window: 60
              periods: 0
              function: last
              dimension:
                service: 'nova-api'
        alarm:
          nova_control_api:
            notifications: False
            alerting: True
            dimension:
              node_role: controller
            triggers:
            - nova_control_api_fail

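An endpoint check for another API could plausibly follow the same shape. This sketch is an assumption-laden illustration: the trigger and alarm names and the `keystone-public-api` service value are hypothetical and would need to match whatever the `openstack_check_api` metric actually reports in a given deployment.

.. code-block:: yaml

    heka:
      remote_collector:
        trigger:
          # Hypothetical trigger name
          keystone_control_api_fail:
            description: 'Endpoint check for keystone-api failed.'
            severity: critical
            rules:
            - metric: openstack_check_api
              relational_operator: '=='
              threshold: 0
              window: 60
              periods: 0
              function: last
              dimension:
                service: 'keystone-public-api' # assumed dimension value
        alarm:
          # Hypothetical alarm name
          keystone_control_api:
            notifications: False
            alerting: True
            dimension:
              node_role: controller
            triggers:
            - keystone_control_api_fail
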
Corresponding clusters and alarms, excerpt from `nova/meta/heka.yml`.

.. code-block:: yaml

    heka:
      aggregator:
        alarm_cluster:
          nova_control_service:
            policy: highest_severity
            group_by: member
            match:
              node_role: control
            dimension:
              cluster: openstack-control-plane
            members:
            - nova_control_api
            - nova_control_endpoint
            hints:
            - neutron_control # or contrail_vrouter for contrail nodes
            - keystone_control
          openstack_control_plane:
            engine: gse
            policy: highest_severity
            group_by: member
            match:
              cluster: openstack-control-plane

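A further service cluster can presumably feed the same `openstack-control-plane` cluster by setting the same `cluster` dimension. The sketch below only reuses keys from the excerpt above. The cluster name `keystone_control_service` and the member alarm `keystone_control_api` are hypothetical and assumed to be defined elsewhere in the pillar.

.. code-block:: yaml

    heka:
      aggregator:
        alarm_cluster:
          # Hypothetical service cluster
          keystone_control_service:
            policy: highest_severity
            group_by: member
            match:
              node_role: control
            dimension:
              cluster: openstack-control-plane
            members:
            - keystone_control_api # assumed alarm name
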
Read more
=========

* https://hekad.readthedocs.org/en/latest/index.html