============
Heka Formula
============

Heka is an open source stream processing system developed by Mozilla. It is a general-purpose, "Swiss Army knife" tool for data processing.

Sample pillars
==============

Metric collector service
------------------------

Local alarm definition for nova compute role, excerpt from `nova/meta/heka.yml`.

.. code-block:: yaml

    heka:
      metric_collector:
        trigger:
          nova_compute_filesystem_warning:
            enabled: True  # implicit
            description: "The nova instance filesystem's root free space is low."
            severity: warning
            logical_operator: or  # implicit
            rules:
            - metric: fs_space_percent_free
              relational_operator: '<'
              threshold: 10
              window: 60
              periods: 0
              function: min
              dimension:
                fs: '/var/lib/nova'
          nova_compute_filesystem_critical:
            description: "The nova instance filesystem's root free space is low."
            severity: critical
            rules:
            - metric: fs_space_percent_free
              relational_operator: '<'
              threshold: 5
              window: 60
              periods: 0
              function: min
              dimension:
                fs: '/var/lib/nova'
        filter:
          nova_compute_service:
            engine: afd
            notifications: False
            alerting: True
            dimension:
              hostname: '$match_by.hostname'
              node_role: compute
            match_by:
            - hostname
            triggers:
            - nova_compute_filesystem_warning
            - nova_compute_filesystem_critical
      aggregator:
        filter:
          nova_compute_service:  # the service_role format
            engine: gse
            policy: highest_severity
            group_by: member
            match:
              node_role: compute
            dimension:
              cluster: nova-compute-plane
            members:
            - nova_compute_logs
            - nova_compute_service
            - nova_compute_instances
            - nova_compute_libvirt
            - nova_compute_free_cpu
            - nova_compute_free_mem
            hints:
            - neutron_compute  # or contrail_vrouter for contrail nodes
          nova_compute_plane:  # the service_role format
            engine: gse
            policy: highest_severity
            group_by: member
            match:
              cluster: nova-compute-plane

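To make the rule fields concrete, here is a minimal Python sketch of how a single rule such as ``fs_space_percent_free < 10`` could be evaluated. It is not the implementation used by Heka or by this formula. The ``evaluate_rule`` helper, the ``(timestamp, value)`` sample format, the reading of ``window`` as a look-back period in seconds, and the reading of ``function`` as an aggregation over the samples in that window are all assumptions made for illustration. The ``periods`` key is ignored.

```python
import operator

# Hypothetical lookup tables. They only cover the operators and
# functions that appear in the pillar samples of this document.
OPERATORS = {
    '<': operator.lt,
    '<=': operator.le,
    '>=': operator.ge,
    '==': operator.eq,
}
FUNCTIONS = {
    'min': min,
    'avg': lambda values: sum(values) / len(values),
    'last': lambda values: values[-1],
}


def evaluate_rule(rule, samples, now):
    """Return True if the rule fires for the given (timestamp, value) samples.

    Assumes 'window' is a look-back period in seconds that ends at 'now'.
    Returns None when no sample falls inside the window.
    """
    start = now - rule['window']
    values = [value for ts, value in samples if start < ts <= now]
    if not values:
        return None
    aggregated = FUNCTIONS[rule['function']](values)
    return OPERATORS[rule['relational_operator']](aggregated, rule['threshold'])


rule = {
    'metric': 'fs_space_percent_free',
    'relational_operator': '<',
    'threshold': 10,
    'window': 60,
    'function': 'min',
}
# Made-up samples: (time in seconds, percent of free space)
samples = [(100, 12.5), (130, 9.0), (150, 11.0)]
fired = evaluate_rule(rule, samples, now=160)
```

With these made-up samples the minimum inside the window is 9.0, which is below the threshold of 10, so the rule fires under the stated assumptions.
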
Default CPU usage alarms, excerpt from `linux/meta/heka.yml`.

.. code-block:: yaml

    metric_collector:
      trigger:
        linux_system_cpu_critical:
          description: 'The CPU usage is too high.'
          severity: critical
          rules:
          - metric: cpu_wait
            relational_operator: '>='
            threshold: 35
            window: 120
            periods: 0
            function: avg
          - metric: cpu_idle
            relational_operator: '<='
            threshold: 5
            window: 120
            function: avg
        linux_system_cpu_warning:
          description: 'The CPU wait times are high.'
          severity: warning
          rules:
          - metric: cpu_wait
            relational_operator: '>='
            threshold: 15
            window: 120
            periods: 0
            function: avg
      filter:
        linux_system_cpu:
          engine: afd
          notifications: False
          alerting: True
          triggers:
          - linux_system_cpu_warning  # will not render if referenced trigger is disabled
          - linux_system_cpu_critical
          dimension:
            hostname: '$match_by.hostname'
            node_role: controller
          match_by: ['hostname']

CPU usage override for compute node, excerpt from `nova/meta/heka.yml`.

.. code-block:: yaml

    metric_collector:
      trigger:
        nova_compute_cpu_critical:
          description: 'The CPU wait times are too high.'
          severity: critical
          rules:
          - metric: cpu_wait
            relational_operator: '>='
            threshold: 35
            window: 120
            periods: 0
            function: avg

Alarm override option 1 - override:

.. code-block:: yaml

    metric_collector:
      trigger:
        # Trigger can be disabled
        linux_system_cpu_critical:
          enabled: False
      filter:
        # Alarm can be overridden
        linux_system_cpu:
          triggers:
          - nova_compute_cpu_critical

Alarm override option 2 - reinitialize:

.. code-block:: yaml

    metric_collector:
      filter:
        ...
        # Alarm is disabled
        linux_system_cpu:
          enabled: False
        # new alarm is created
        nova_compute_cpu:
          engine: afd
          notifications: False
          alerting: True
          triggers:
          - linux_system_cpu_warning  # will not render if referenced trigger is disabled
          - nova_compute_cpu_critical
          dimension:
            hostname: '$match_by.hostname'
            node_role: controller
          match_by: ['hostname']

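The comment "will not render if referenced trigger is disabled" can be illustrated with a short Python sketch. It assumes the pillar has already been merged into a plain dictionary. The ``active_triggers`` helper is hypothetical and is not part of the formula; how the formula itself performs this filtering is not shown in this document.

```python
def active_triggers(metric_collector, filter_name):
    """List the triggers of an alarm filter, skipping disabled ones.

    A trigger or filter counts as enabled unless it sets 'enabled: False',
    matching the '# implicit' default in the pillar samples. A trigger
    that is referenced but not defined is treated as enabled here, which
    is an assumption of this sketch.
    """
    triggers = metric_collector.get('trigger', {})
    alarm = metric_collector['filter'][filter_name]
    if not alarm.get('enabled', True):
        return []
    return [
        name for name in alarm.get('triggers', [])
        if triggers.get(name, {}).get('enabled', True)
    ]


# Made-up merged pillar: the critical trigger has been disabled.
metric_collector = {
    'trigger': {
        'linux_system_cpu_warning': {'severity': 'warning'},
        'linux_system_cpu_critical': {'enabled': False},
    },
    'filter': {
        'linux_system_cpu': {
            'triggers': [
                'linux_system_cpu_warning',
                'linux_system_cpu_critical',
            ],
        },
    },
}
rendered = active_triggers(metric_collector, 'linux_system_cpu')
```

In this sketch only ``linux_system_cpu_warning`` remains in ``rendered``, because the critical trigger is disabled.
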
Remote collector service
------------------------

Remote API check example, excerpt from `nova/meta/heka.yml`.

.. code-block:: yaml

    heka:
      remote_collector:
        trigger:
          nova_control_api_fail:
            description: 'Endpoint check for nova-api failed.'
            severity: critical
            rules:
            - metric: openstack_check_api
              relational_operator: '=='
              threshold: 0
              window: 60
              periods: 0
              function: last
              dimension:
                service: 'nova-api'
        filter:
          nova_control_api:
            engine: afd
            notifications: False
            alerting: True
            dimension:
              hostname: '$match_by.hostname'
              node_role: controller
            match_by: ['hostname']
            triggers:
            - nova_control_api_fail

Corresponding clusters and alarms, excerpt from `nova/meta/heka.yml`.

.. code-block:: yaml

    heka:
      aggregator:
        filter:
          nova_control_service:
            engine: gse
            policy: highest_severity
            group_by: member
            match:
              node_role: controller
            dimension:
              cluster: openstack-control-plane
            members:
            - nova_control_api
            - nova_control_endpoint
            hints:
            - neutron_control  # or contrail_vrouter for contrail nodes
            - keystone_control
          openstack_control_plane:
            engine: gse
            policy: highest_severity
            group_by: member
            match:
              cluster: openstack-control-plane

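As a rough illustration of ``policy: highest_severity``, the Python sketch below assumes the policy means "the cluster takes the worst status reported by any of its members". That reading, the ``highest_severity`` helper, and the ``okay`` status are assumptions; only ``warning`` and ``critical`` appear in the pillar samples, and the real policy may define more states.

```python
# Assumed ordering, from least to most severe. 'okay' is a placeholder
# status added for this sketch.
SEVERITY_ORDER = ['okay', 'warning', 'critical']


def highest_severity(member_status):
    """Return the most severe status among the cluster members.

    'member_status' maps a member name to its current status.
    Returns None when the cluster has no members reporting.
    """
    if not member_status:
        return None
    return max(member_status.values(), key=SEVERITY_ORDER.index)


# Made-up statuses for the members of nova_control_service.
status = highest_severity({
    'nova_control_api': 'critical',
    'nova_control_endpoint': 'okay',
})
```

Here one member is critical, so the cluster as a whole is reported as critical under the assumed policy.
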
Read more
=========

* https://hekad.readthedocs.org/en/latest/index.html