
============
Heka Formula
============

Heka is an open-source stream processing system developed by Mozilla. It is a Swiss Army knife type of tool for data processing.

Sample pillars
==============

Metric collector service
------------------------

Local alarm definition for the nova compute role, excerpt from `nova/meta/heka.yml`.

.. code-block:: yaml

  heka:
    metric_collector:
      trigger:
        nova_compute_filesystem_warning:
          engine: afd
          enabled: True  # implicit
          description: "The nova instance filesystem's root free space is low."
          severity: warning
          logical_operator: or  # implicit
          rules:
          - metric: fs_space_percent_free
            relational_operator: '<'
            threshold: 10
            window: 60
            periods: 0
            function: min
            fs: '/var/lib/nova'
        nova_compute_filesystem_critical:
          engine: afd
          enabled: True  # implicit
          description: "The nova instance filesystem's root free space is low."
          severity: critical
          logical_operator: or  # implicit
          rules:
          - metric: fs_space_percent_free
            relational_operator: '<'
            threshold: 5
            window: 60
            periods: 0
            function: min
            fs: '/var/lib/nova'
      filter:
        nova_compute_service:
          engine: afd
          notifications: False
          alerting: True
          trigger:
            vip:
            - nova_compute_filesystem_warning
            - nova_compute_filesystem_critical
    aggregator:
      filter:
        nova_compute:  # the service_role format
          engine: gse
          policy: highest_severity
          group_by: member
          members:
          - nova_compute_logs
          - nova_compute_service
          - nova_compute_instances
          - nova_compute_libvirt
          - nova_compute_free_cpu
          - nova_compute_free_mem
          hints:
          - neutron_compute  # or contrail_vrouter for contrail nodes

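The rule fields used above (``window``, ``function``, ``relational_operator``, ``threshold``) are not explained in this README. The Python sketch below shows one plausible reading, assuming ``window`` is a time span in seconds and ``function`` aggregates the samples that fall inside it. It is an illustration only, not the code Heka or this formula runs. The names ``rule_fires``, ``OPERATORS`` and ``AGGREGATES`` and the sample data are hypothetical, and ``periods`` is ignored.

.. code-block:: python

  import operator

  # Hypothetical lookup tables for illustration; not part of Heka or this formula.
  OPERATORS = {
      '<': operator.lt,
      '<=': operator.le,
      '>': operator.gt,
      '>=': operator.ge,
      '==': operator.eq,
  }

  AGGREGATES = {
      'min': min,
      'max': max,
      'avg': lambda values: sum(values) / len(values),
      'last': lambda values: values[-1],
  }


  def rule_fires(rule, samples, now):
      """Return True if the rule matches the samples inside its window.

      samples is a list of (timestamp, value) pairs ordered by timestamp.
      """
      start = now - rule['window']
      values = [value for ts, value in samples if start < ts <= now]
      if not values:
          return False  # assumption: no data means the rule does not fire
      aggregated = AGGREGATES[rule['function']](values)
      compare = OPERATORS[rule['relational_operator']]
      return compare(aggregated, rule['threshold'])


  rule = {
      'metric': 'fs_space_percent_free',
      'relational_operator': '<',
      'threshold': 10,
      'window': 60,
      'function': 'min',
  }
  samples = [(0, 40.0), (30, 12.5), (50, 9.0), (70, 8.0)]
  print(rule_fires(rule, samples, now=60))

With these samples, the values inside the 60-second window are 12.5 and 9.0, so the minimum is below the threshold of 10 and the sketch reports a match.
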
Default CPU usage alarms, excerpt from `linux/meta/heka.yml`.

.. code-block:: yaml

  metric_collector:
    trigger:
      linux_system_cpu_critical:
        engine: afd
        enabled: True  # implicit
        description: 'The CPU usage is too high.'
        severity: critical
        label:
          hostname: '$match_by.hostname'
          node_role: controller
        match_by: ['hostname']
        rules:
        - metric: cpu_wait
          relational_operator: '>='
          threshold: 35
          window: 120
          periods: 0
          function: avg
        - metric: cpu_idle
          relational_operator: '<='
          threshold: 5
          window: 120
          function: avg
      linux_system_cpu_warning:
        engine: afd
        enabled: True  # implicit
        description: 'The CPU wait times are high.'
        severity: warning
        label:
          hostname: '$match_by.hostname'
          node_role: controller
        match_by: ['hostname']
        rules:
        - metric: cpu_wait
          relational_operator: '>='
          threshold: 15
          window: 120
          periods: 0
          function: avg
    filter:
      linux_system_cpu:
        engine: afd
        notifications: False
        alerting: True
        trigger:
          vip:
          - linux_system_cpu_warning  # will not render if referenced trigger is disabled
          - linux_system_cpu_critical

CPU usage override for a compute node, excerpt from `nova/meta/heka.yml`.

.. code-block:: yaml

  metric_collector:
    trigger:
      nova_compute_cpu_critical:
        engine: afd
        enabled: True  # implicit
        description: 'The CPU wait times are too high.'
        severity: critical
        label:
          hostname: '$match_by.hostname'
          node_role: controller
        match_by: ['hostname']
        rules:
        - metric: cpu_wait
          relational_operator: '>='
          threshold: 35
          window: 120
          periods: 0
          function: avg

Alarm override option 1 - override:

.. code-block:: yaml

  metric_collector:
    trigger:
      # Trigger can be disabled
      linux_system_cpu_critical:
        enabled: False
    filter:
      # Alarm can be overridden
      linux_system_cpu:
        trigger:
          vip:
          - nova_compute_cpu_critical

Alarm override option 2 - reinitialize:

.. code-block:: yaml

  metric_collector:
    filter:
      ...
      # Alarm is disabled
      linux_system_cpu:
        enabled: False
      # new alarm is created
      nova_compute_cpu:
        engine: afd_alarm
        notifications: False
        alerting: True
        trigger:
          vip:
          - linux_system_cpu_warning  # will not render if referenced trigger is disabled
          - nova_compute_cpu_critical

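Option 1 only works if the override is merged over the default definitions rather than replacing them wholesale. The Python sketch below illustrates such a merge under a stated assumption: nested mappings are merged key by key, while lists and scalars from the override replace the default values. How this formula actually merges its metadata is not described here, so treat the sketch as an illustration only. ``deep_merge`` is a hypothetical helper, and the dictionaries are trimmed-down versions of the examples above.

.. code-block:: python

  import copy


  def deep_merge(defaults, override):
      """Return defaults updated with override (hypothetical helper).

      Nested dicts are merged key by key; any other value, including a
      list, is replaced by the value from override.
      """
      merged = copy.deepcopy(defaults)
      for key, value in override.items():
          if isinstance(value, dict) and isinstance(merged.get(key), dict):
              merged[key] = deep_merge(merged[key], value)
          else:
              merged[key] = copy.deepcopy(value)
      return merged


  defaults = {
      'trigger': {
          'linux_system_cpu_critical': {'enabled': True, 'severity': 'critical'},
      },
      'filter': {
          'linux_system_cpu': {
              'engine': 'afd',
              'trigger': {'vip': ['linux_system_cpu_warning',
                                  'linux_system_cpu_critical']},
          },
      },
  }
  override = {
      'trigger': {'linux_system_cpu_critical': {'enabled': False}},
      'filter': {
          'linux_system_cpu': {'trigger': {'vip': ['nova_compute_cpu_critical']}},
      },
  }

  merged = deep_merge(defaults, override)
  print(merged['trigger']['linux_system_cpu_critical'])
  print(merged['filter']['linux_system_cpu'])

Under this assumption the trigger keeps its other settings and is only switched off, while the alarm keeps its ``engine`` and gets the new ``vip`` trigger list in place of the default one.
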
Remote collector service
------------------------

Remote API check example, excerpt from `nova/meta/heka.yml`.

.. code-block:: yaml

  heka:
    remote_collector:
      trigger:
        nova_control_api_fail:
          engine: afd
          description: 'Endpoint check for nova-api failed.'
          severity: critical
          alerting: True
          label:
            hostname: '$match_by.hostname'
            node_role: controller
          match_by: ['hostname']
          rules:
          - metric: openstack_check_api
            relational_operator: '=='
            threshold: 0
            window: 60
            periods: 0
            function: last
            service: 'nova-api'
      filter:
        nova_control_api:
          engine: afd
          notifications: False
          alerting: True
          trigger:
            vip:
            - nova_control_api_fail

Corresponding clusters and alarms, excerpt from `nova/meta/heka.yml`.

.. code-block:: yaml

  heka:
    aggregator:
      filter:
        nova_compute:  # the service_role format
          engine: gse
          policy: highest_severity
          group_by: member
          members:
          - nova_control_api
          - nova_control_endpoint
          hints:
          - neutron_control  # or contrail_vrouter for contrail nodes
          - keystone_control


Read more
=========

* https://hekad.readthedocs.org/en/latest/index.html