Alerts Configuration

❗️
This is a legacy Apache Ignite documentation
The new documentation is hosted here: https://ignite.apache.org/docs/latest/

Alert Command Specification

Register alert: alert -r {-t=<sec>} {-<metric>=<condition><value>} ... {-<metric>=<condition><value>}
Unregister: alert -u {-id=<alert-id>|-a}

Alert options:

-n Alert name
-u Unregisters alert(s). Either '-a' flag or '-id' parameter is required. Note that only one of the '-u' or '-r' is allowed. If neither '-u' or '-r' provided - all alerts will be printed.
-a When provided with '-u' - all alerts will be unregistered.
-id=<alert-id> When provided with '-u' - alert with matching ID will be unregistered.
-r Register new alert with mnemonic predicate(s).
Note only one of the '-u' or '-r' is allowed. If neither '-u' or '-r' provided - all alerts will be printed.
-t Defines notification frequency in seconds. Default is 60 seconds.
Note This parameter can only appear with -r
-s Define script for execution when alert triggered.
For configuration of throttle period see -i argument. Script will receive following arguments: alert name or alert ID when name is not defined, condition as string, values of alert conditions ordered as in alert command.
-i Configure alert notification minimal throttling interval in seconds. Default is 60 seconds.
-<metric> This defines a mnemonic for the metric that will be measured.
Grid-wide metrics (not node specific):
- cc - Total number of available CPUs in the grid.
- nc - Total number of nodes in the grid.
- hc - Total number of physical hosts in the grid.
- cl - Current average CPU load (in %) in the grid.
Per-node current metrics:
- aj - Active jobs on the node.
- cj - Cancelled jobs on the node.
- tc - Thread count on the node.
- ut - Up time on the node. By default (no suffix provided) value is assumed to be in milliseconds.
  Note can have 's', 'm', or 'h' suffix indicating seconds, minutes, and hours.
- je - Job execute time on the node.
- jw - Job wait time on the node.
- wj - Waiting jobs count on the node.
- rj - Rejected jobs count on the node.
- hu - Heap memory used (in MB) on the node.
- cd - Current CPU load on the node.
- hm - Heap memory maximum (in MB) on the node.
<condition> This defines a condition for metric.
Comparison part of the mnemonic predicate:
- eq - Equal '=' to '' number.
- neq - Not equal '!=' to '' number.
- gt - Greater than '>' to '' number.
- gte - Greater than or equal '>=' to '' number.
- lt - Less than '<' to 'NN' number.
- lte - Less than or equal '<=' to '' number.

Examples

alert
Prints all currently registered alerts.

alert -u -a
Unregisters all currently registered alerts.

alert -u -id=12345678
Unregisters alert with provided ID.

alert -r -t=900 -cc=gte4 -cl=gt50
Register alert that will notify every 15 min if grid has >= 4 CPUs and > 50% CPU load.

alert -r -n=Nodes -t=15 -nc=gte3 -s=/home/user/scripts/alert.sh -i=300
Register alert that will notify every 15 second if grid has >= 3 nodes and execute script "/home/user/scripts/alert.sh" with repeat interval not less than 5 min.

Custom Script

Register alert that will execute script: /home/user/myScript.sh every 15 second if grid has >= 2 nodes and cpu count <= 16 with repeat interval not less than 5 min.

alert -r -t=5 -n=MyAlert -nc=gte2 -cc=lte16 -i=15 -s=/home/user/myScript.sh

Alert handle script:

echo ALERT [$1] CONDITION [$2] alarmed with node count [$3] and cpu count [$4]

Will generate output in terminal like this:

ALERT [MyAlert] CONDITION [-nc=gte2 -cc=lte16] alarmed with node count [2] and cpu count [8]

📘
Please note, that $1 points to alert name, $2 points to alert conditions and $3, $4,.... point to value of each sub-condition.

❗️This is a legacy Apache Ignite documentation

Alert Command Specification

Examples

Custom Script

📘

❗️
This is a legacy Apache Ignite documentation