============
Heka Formula
============

Heka is an open source stream processing system developed by Mozilla. It is a "Swiss Army knife" type of tool for data processing.

Sample pillars
==============

Metric collector service
------------------------

Local alarm definitions for the nova compute role, excerpt from ``nova/meta/heka.yml``.

.. code-block:: yaml

    heka:
      metric_collector:
        trigger:
          nova_compute_filesystem_warning:
            enabled: True # implicit
            description: "The nova instance filesystem's root free space is low."
            severity: warning
            logical_operator: or # implicit
            rules:
            - metric: fs_space_percent_free
              relational_operator: '<'
              threshold: 10
              window: 60
              periods: 0
              function: min
              dimension:
                fs: '/var/lib/nova'
          nova_compute_filesystem_critical:
            enabled: True # implicit
            description: "The nova instance filesystem's root free space is too low."
            severity: critical
            logical_operator: or # implicit
            rules:
            - metric: fs_space_percent_free
              relational_operator: '<'
              threshold: 5
              window: 60
              periods: 0
              function: min
              dimension:
                fs: '/var/lib/nova'
        filter:
          nova_compute_service:
            engine: afd
            notifications: False
            alerting: True
            dimension:
              hostname: '$match_by.hostname'
              node_role: compute
            match_by:
            - hostname
            triggers:
            - nova_compute_filesystem_warning
            - nova_compute_filesystem_critical
      aggregator:
        filter:
          nova_compute_service: # the service_role format
            engine: gse
            policy: highest_severity
            group_by: member
            match:
              node_role: compute
            dimension:
              cluster: nova-compute-plane
            members:
            - nova_compute_logs
            - nova_compute_service
            - nova_compute_instances
            - nova_compute_libvirt
            - nova_compute_free_cpu
            - nova_compute_free_mem
            hints:
            - neutron_compute # or contrail_vrouter for contrail nodes
          nova_compute_plane: # the service_role format
            engine: gse
            policy: highest_severity
            group_by: member
            match:
              cluster: nova-compute-plane

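To make the trigger fields above more concrete, the following Python sketch models one plausible reading of how a trigger is evaluated: each rule applies ``function`` to the samples of ``metric`` from the last ``window`` seconds, compares the result with ``threshold`` using ``relational_operator``, and ``logical_operator`` combines the rule results. It is an illustration under stated assumptions, not the actual AFD filter that runs inside Heka: ``periods`` and ``dimension`` are ignored, and the helpers ``rule_matches`` and ``trigger_fires`` are hypothetical.

.. code-block:: python

    import operator
    import statistics

    # Hypothetical mapping of relational_operator strings to comparisons.
    OPERATORS = {
        '<': operator.lt, '<=': operator.le,
        '>': operator.gt, '>=': operator.ge,
        '==': operator.eq, '!=': operator.ne,
    }

    # Hypothetical mapping of function names to aggregations over a window.
    FUNCTIONS = {
        'min': min,
        'max': max,
        'avg': statistics.mean,
        'last': lambda values: values[-1],
    }


    def rule_matches(rule, samples):
        """Return True if the rule fires for ``samples``.

        ``samples`` is a list of (timestamp_in_seconds, value) tuples, oldest
        first. Only samples newer than ``window`` seconds before the latest
        sample are aggregated.
        """
        if not samples:
            return False
        latest = samples[-1][0]
        values = [value for ts, value in samples if ts > latest - rule['window']]
        result = FUNCTIONS[rule['function']](values)
        return OPERATORS[rule['relational_operator']](result, rule['threshold'])


    def trigger_fires(trigger, samples_by_metric):
        """Evaluate every rule of a trigger and combine the results."""
        if not trigger.get('enabled', True):  # 'enabled' is implicitly True
            return False
        results = [rule_matches(rule, samples_by_metric.get(rule['metric'], []))
                   for rule in trigger['rules']]
        # 'or' is the implicit default; 'and' requires every rule to match.
        combine = all if trigger.get('logical_operator', 'or') == 'and' else any
        return combine(results)


    warning = {
        'severity': 'warning',
        'rules': [{'metric': 'fs_space_percent_free', 'relational_operator': '<',
                   'threshold': 10, 'window': 60, 'function': 'min'}],
    }
    samples = {'fs_space_percent_free': [(0, 12.0), (30, 9.5), (60, 11.0)]}
    print(trigger_fires(warning, samples))  # True: min(9.5, 11.0) < 10

The sample at timestamp 0 falls outside the 60 second window, so only 9.5 and 11.0 are aggregated.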

Default CPU usage alarms, excerpt from ``linux/meta/heka.yml``.

.. code-block:: yaml

    metric_collector:
      trigger:
        linux_system_cpu_critical:
          enabled: True # implicit
          description: 'The CPU usage is too high.'
          severity: critical
          rules:
          - metric: cpu_wait
            relational_operator: '>='
            threshold: 35
            window: 120
            periods: 0
            function: avg
          - metric: cpu_idle
            relational_operator: '<='
            threshold: 5
            window: 120
            function: avg
        linux_system_cpu_warning:
          enabled: True # implicit
          description: 'The CPU wait times are high.'
          severity: warning
          rules:
          - metric: cpu_wait
            relational_operator: '>='
            threshold: 15
            window: 120
            periods: 0
            function: avg
      filter:
        linux_system_cpu:
          engine: afd
          notifications: False
          alerting: True
          triggers:
          - linux_system_cpu_warning # will not render if referenced trigger is disabled
          - linux_system_cpu_critical
          dimension:
            hostname: '$match_by.hostname'
            node_role: controller
          match_by: ['hostname']

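The comment on ``linux_system_cpu_warning`` says that a disabled trigger is not rendered into the alarm. A minimal Python sketch of that filtering step follows. The function name ``render_alarm_triggers`` is hypothetical, and treating an alarm with ``enabled: False`` as rendering no triggers at all is an assumption based on the override examples in this README.

.. code-block:: python

    def render_alarm_triggers(alarm, triggers):
        """Return the trigger names an alarm would actually reference.

        ``alarm`` is a metric_collector filter entry and ``triggers`` is the
        metric_collector ``trigger`` mapping. Both default to enabled.
        """
        if not alarm.get('enabled', True):  # assumption: disabled alarm renders nothing
            return []
        rendered = []
        for name in alarm.get('triggers', []):
            # An undefined trigger raises KeyError so that typos surface early
            # (an assumption, not documented formula behavior).
            if triggers[name].get('enabled', True):
                rendered.append(name)
        return rendered


    triggers = {
        'linux_system_cpu_warning': {'severity': 'warning'},
        'linux_system_cpu_critical': {'severity': 'critical', 'enabled': False},
    }
    alarm = {'triggers': ['linux_system_cpu_warning', 'linux_system_cpu_critical']}
    print(render_alarm_triggers(alarm, triggers))  # ['linux_system_cpu_warning']

Keeping the reference in the alarm while disabling the trigger means a node can re-enable it later without touching the alarm definition.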

CPU usage override for compute nodes, excerpt from ``nova/meta/heka.yml``.

.. code-block:: yaml

    metric_collector:
      trigger:
        nova_compute_cpu_critical:
          enabled: True # implicit
          description: 'The CPU wait times are too high.'
          severity: critical
          rules:
          - metric: cpu_wait
            relational_operator: '>='
            threshold: 35
            window: 120
            periods: 0
            function: avg

Alarm override, option 1: override the existing definitions.

.. code-block:: yaml

    metric_collector:
      trigger:
        # A trigger can be disabled
        linux_system_cpu_critical:
          enabled: False
      filter:
        # An alarm can be overridden
        linux_system_cpu:
          triggers:
          - nova_compute_cpu_critical

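Option 1 relies on the node-specific values being merged over the defaults from ``linux/meta/heka.yml``. The exact merge is performed by the formula's tooling; the sketch below assumes a plain recursive dictionary merge in which the more specific value wins. Real merge rules (reclass or Salt) differ in details, list handling in particular, and ``deep_merge`` is a hypothetical helper.

.. code-block:: python

    import copy


    def deep_merge(base, override):
        """Return a new dict with ``override`` merged recursively into ``base``.

        Nested dicts are merged key by key; any other value in ``override``
        replaces the one in ``base``. Neither input is modified.
        """
        merged = copy.deepcopy(base)
        for key, value in override.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = copy.deepcopy(value)
        return merged


    # Defaults as in linux/meta/heka.yml, trimmed to one trigger.
    defaults = {'metric_collector': {'trigger': {'linux_system_cpu_critical': {
        'severity': 'critical', 'description': 'The CPU usage is too high.'}}}}
    # Compute node override as in option 1.
    compute = {'metric_collector': {'trigger': {'linux_system_cpu_critical': {
        'enabled': False}}}}

    result = deep_merge(defaults, compute)
    print(result['metric_collector']['trigger']['linux_system_cpu_critical'])

The merged trigger keeps its severity and description but gains ``enabled: False``, so alarms that reference it no longer render it.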

Alarm override, option 2: disable the existing alarm and define a new one.

.. code-block:: yaml

    metric_collector:
      filter:
        ...
        # The existing alarm is disabled
        linux_system_cpu:
          enabled: False
        # A new alarm is created
        nova_compute_cpu:
          engine: afd
          notifications: False
          alerting: True
          triggers:
          - linux_system_cpu_warning # will not render if referenced trigger is disabled
          - nova_compute_cpu_critical
          dimension:
            hostname: '$match_by.hostname'
            node_role: controller
          match_by: ['hostname']


Remote collector service
------------------------

Remote API check example, excerpt from ``nova/meta/heka.yml``.

.. code-block:: yaml

    heka:
      remote_collector:
        trigger:
          nova_control_api_fail:
            description: 'Endpoint check for nova-api failed.'
            severity: critical
            rules:
            - metric: openstack_check_api
              relational_operator: '=='
              threshold: 0
              window: 60
              periods: 0
              function: last
              service: 'nova-api'
        filter:
          nova_control_api:
            engine: afd
            notifications: False
            alerting: True
            dimension:
              hostname: '$match_by.hostname'
              node_role: controller
            match_by: ['hostname']
            triggers:
            - nova_control_api_fail

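With ``match_by: ['hostname']`` the alarm is presumably evaluated separately for each hostname, and ``'$match_by.hostname'`` in ``dimension`` resolves to the hostname of the series being evaluated. The Python sketch below illustrates that reading for the ``nova_control_api_fail`` trigger (``function: last``, ``relational_operator: '=='``, ``threshold: 0``). The sample layout, the meaning of the values (1 for up, 0 for down) and the helpers ``group_by_match`` and ``resolve_dimensions`` are all hypothetical.

.. code-block:: python

    from collections import defaultdict

    PREFIX = '$match_by.'


    def group_by_match(samples, match_by):
        """Group samples by the tuple of their ``match_by`` field values."""
        groups = defaultdict(list)
        for sample in samples:
            key = tuple(sample['fields'][name] for name in match_by)
            groups[key].append(sample)
        return groups


    def resolve_dimensions(dimension, match_by, key):
        """Replace '$match_by.<field>' placeholders with the group's values."""
        values = dict(zip(match_by, key))
        resolved = {}
        for name, value in dimension.items():
            if isinstance(value, str) and value.startswith(PREFIX):
                resolved[name] = values[value[len(PREFIX):]]
            else:
                resolved[name] = value
        return resolved


    # Hypothetical openstack_check_api samples, oldest first (1 = up, 0 = down).
    samples = [
        {'fields': {'hostname': 'ctl01'}, 'value': 1},
        {'fields': {'hostname': 'ctl02'}, 'value': 1},
        {'fields': {'hostname': 'ctl02'}, 'value': 0},
    ]
    dimension = {'hostname': '$match_by.hostname', 'node_role': 'controller'}
    match_by = ['hostname']

    for key, group in sorted(group_by_match(samples, match_by).items()):
        failed = group[-1]['value'] == 0  # function: last, '==', threshold: 0
        print(resolve_dimensions(dimension, match_by, key), failed)

This prints one line per hostname, and only ``ctl02`` reports a failed check, because its last sample is 0.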

Corresponding aggregator filters for the nova control services, excerpt from ``nova/meta/heka.yml``.

.. code-block:: yaml

    heka:
      aggregator:
        filter:
          nova_control_service: # the service_role format
            engine: gse
            policy: highest_severity
            group_by: member
            match:
              node_role: control
            dimension:
              cluster: nova-control-plane
            members:
            - nova_control_api
            - nova_control_endpoint
            hints:
            - neutron_control # or contrail_vrouter for contrail nodes
            - keystone_control
          nova_control_plane: # the service_role format
            engine: gse
            policy: highest_severity
            group_by: member
            match:
              cluster: nova-control-plane

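The ``highest_severity`` policy with ``group_by: member`` suggests that a cluster takes the most severe status reported by its members, and that the plane-level filter (``nova_control_plane``) repeats the same step over the service clusters tagged with ``cluster: nova-control-plane``. The sketch below models that reading in Python. The status names and their ordering are assumptions rather than values taken from the formula, ``hints`` are not modelled, and ``cluster_status`` is a hypothetical helper.

.. code-block:: python

    # Hypothetical statuses, ordered from least to most severe.
    SEVERITY_ORDER = ['okay', 'unknown', 'warning', 'critical', 'down']


    def cluster_status(member_statuses):
        """Return the most severe status among the members (highest_severity)."""
        if not member_statuses:
            return 'unknown'  # assumption: no data means unknown
        return max(member_statuses.values(), key=SEVERITY_ORDER.index)


    # Service level: nova_control_service groups its members.
    service = cluster_status({
        'nova_control_api': 'okay',
        'nova_control_endpoint': 'warning',
    })
    # Plane level: nova_control_plane aggregates the service clusters.
    plane = cluster_status({'nova_control_service': service})
    print(service, plane)  # warning warning

A single degraded member is therefore enough to raise the severity of both the service cluster and the plane.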

Read more
=========

* https://hekad.readthedocs.org/en/latest/index.html