============
Heka Formula
============

Heka is an open source stream processing system developed by Mozilla. It is a general-purpose, Swiss Army knife style tool for data processing.

Sample pillars
==============

Metric collector service
------------------------

Local alarm definition for the nova compute role, excerpt from ``nova/meta/heka.yml``.

.. code-block:: yaml

    heka:
      metric_collector:
        trigger:
          nova_compute_filesystem_warning:
            enabled: True  # implicit
            description: "The nova instance filesystem's root free space is low."
            severity: warning
            logical_operator: or  # implicit
            rules:
            - metric: fs_space_percent_free
              relational_operator: '<'
              threshold: 10
              window: 60
              periods: 0
              function: min
              dimension:
                fs: '/var/lib/nova'
          nova_compute_filesystem_critical:
            description: "The nova instance filesystem's root free space is low."
            severity: critical
            rules:
            - metric: fs_space_percent_free
              relational_operator: '<'
              threshold: 5
              window: 60
              periods: 0
              function: min
              dimension:
                fs: '/var/lib/nova'
        alarm:
          nova_compute_filesystem:
            notifications: False
            alerting: True
            dimension:
              node_role: control
            triggers:
            - nova_compute_filesystem_warning
            - nova_compute_filesystem_critical
      aggregator:
        alarm_cluster:
          nova_compute_service:  # the service_role format
            policy: highest_severity
            group_by: member
            match:
              node_role: compute
            dimension:
              cluster: nova-compute-plane
            members:
            - nova_compute_logs
            - nova_compute_filesystem
            - nova_compute_instances
            - nova_compute_libvirt
            - nova_compute_free_cpu
            - nova_compute_free_mem
            hints:
            - neutron_compute  # or contrail_vrouter for contrail nodes
          nova_compute_plane:  # the service_role format
            engine: gse
            policy: highest_severity
            group_by: member
            match:
              cluster: nova-compute-plane

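The rule fields above can be read as follows: take the samples of ``metric`` seen during the last ``window`` seconds, reduce them with ``function``, and compare the result with ``threshold`` using ``relational_operator``. The Python sketch below illustrates that reading only. It is not the formula's or Heka's implementation; ``evaluate_rule``, ``OPERATORS`` and ``FUNCTIONS`` are hypothetical names, the sample data is made up, and ``periods`` and ``dimension`` filtering are left out.

.. code-block:: python

    import operator
    from statistics import mean

    # Hypothetical lookup tables, introduced for illustration only.
    OPERATORS = {
        '<': operator.lt, '<=': operator.le,
        '>': operator.gt, '>=': operator.ge,
        '==': operator.eq, '!=': operator.ne,
    }
    FUNCTIONS = {
        'min': min, 'max': max, 'avg': mean,
        'last': lambda values: values[-1],
    }

    def evaluate_rule(rule, samples, now):
        """Return True when the rule matches the (timestamp, value) samples.

        Only samples from the last rule['window'] seconds are considered.
        """
        recent = [value for ts, value in samples if now - ts <= rule['window']]
        if not recent:
            return False  # assumption: no data means the rule does not match
        aggregated = FUNCTIONS[rule['function']](recent)
        compare = OPERATORS[rule['relational_operator']]
        return compare(aggregated, rule['threshold'])

    rule = {
        'metric': 'fs_space_percent_free',
        'relational_operator': '<',
        'threshold': 10,
        'window': 60,
        'function': 'min',
    }
    # Made-up (timestamp in seconds, percent free) samples.
    samples = [(0, 40.0), (70, 12.5), (100, 9.0), (110, 11.0)]
    print(evaluate_rule(rule, samples, now=120))

With these samples the minimum inside the 60 second window is 9.0, which is below the warning threshold of 10 but above the critical threshold of 5.
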
Default CPU usage alarms, excerpt from ``linux/meta/heka.yml``.

.. code-block:: yaml

    heka:
      metric_collector:
        trigger:
          linux_system_cpu_critical:
            description: 'The CPU usage is too high.'
            severity: critical
            rules:
            - metric: cpu_wait
              relational_operator: '>='
              threshold: 35
              window: 120
              periods: 0
              function: avg
            - metric: cpu_idle
              relational_operator: '<='
              threshold: 5
              window: 120
              function: avg
          linux_system_cpu_warning:
            description: 'The CPU wait times are high.'
            severity: warning
            rules:
            - metric: cpu_wait
              relational_operator: '>='
              threshold: 15
              window: 120
              periods: 0
              function: avg
        alarm:
          linux_system_cpu:
            notifications: False
            alerting: True
            triggers:
            - linux_system_cpu_warning  # will not render if referenced trigger is disabled
            - linux_system_cpu_critical
            dimension:
              node_role: control


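A trigger such as ``linux_system_cpu_critical`` lists more than one rule. With the implicit ``logical_operator: or`` shown in the nova example, one plausible reading is that the trigger fires as soon as any rule matches, while ``and`` would require every rule to match. The sketch below shows only that combination step under this assumption; ``trigger_fires`` is a hypothetical helper, not part of the formula.

.. code-block:: python

    def trigger_fires(rule_results, logical_operator='or'):
        """Combine per-rule verdicts (booleans) into one trigger verdict."""
        if logical_operator == 'or':
            return any(rule_results)
        if logical_operator == 'and':
            return all(rule_results)
        raise ValueError('unsupported logical_operator: %r' % logical_operator)

    # Example: the cpu_wait rule matched, the cpu_idle rule did not.
    print(trigger_fires([True, False]))
    print(trigger_fires([True, False], 'and'))
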
Remote collector service
------------------------

Remote API check example, excerpt from ``nova/meta/heka.yml``.

.. code-block:: yaml

    heka:
      remote_collector:
        trigger:
          nova_control_api_fail:
            description: 'Endpoint check for nova-api failed.'
            severity: critical
            rules:
            - metric: openstack_check_api
              relational_operator: '=='
              threshold: 0
              window: 60
              periods: 0
              function: last
              dimension:
                service: 'nova-api'
        alarm:
          nova_control_api:
            notifications: False
            alerting: True
            dimension:
              service: nova-control
            triggers:
            - nova_control_api_fail

Corresponding clusters and alarms, excerpt from ``nova/meta/heka.yml``.

.. code-block:: yaml

    heka:
      aggregator:
        alarm_cluster:
          nova_control_service:
            policy: highest_severity
            group_by: member
            match:
              service: nova-control
            dimension:
              cluster: openstack-control-plane
            members:
            - nova_control_api
            - nova_control_endpoint
            hints:
            - neutron_control  # or contrail_vrouter for contrail nodes
            - keystone_control
          openstack_control_plane:
            engine: gse
            policy: highest_severity
            group_by: member
            match:
              cluster: openstack-control-plane

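The ``policy: highest_severity`` setting suggests that a cluster reports the worst status found among its members. The sketch below illustrates that idea only; it is not the aggregator's implementation. ``SEVERITY_ORDER`` and ``highest_severity`` are hypothetical names, and the ``okay`` status and the ordering are assumptions, since only ``warning`` and ``critical`` appear in the examples above.

.. code-block:: python

    # Assumed ordering from least to most severe.
    SEVERITY_ORDER = ['okay', 'warning', 'critical']

    def highest_severity(member_statuses):
        """Return the worst status among the members of a cluster."""
        if not member_statuses:
            return 'okay'  # assumption: a cluster without members is healthy
        return max(member_statuses.values(), key=SEVERITY_ORDER.index)

    # Made-up member statuses for the nova_control_service cluster.
    members = {
        'nova_control_api': 'critical',
        'nova_control_endpoint': 'okay',
    }
    print(highest_severity(members))
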
Read more
=========

* https://hekad.readthedocs.org/en/latest/index.html