6.4.4 Threshold configuration
As previously mentioned, thresholds can be configured from the Thresholds => Configuration menu or from the "Site and equipment creation" menu. The configuration page shows the configured thresholds, along with a filtering table if the user chooses to use the Thresholds => Configuration menu. There is another manual threshold configuration option: when viewing a chart, a user can create a threshold more easily by using the [menu] on the chart and then selecting the Threshold configuration element. For more detail see section 11.8 Threshold configuration using charts.
The columns of the table containing the thresholds are:
· First column: The thresholds can be configured with the help of the links available here:
o [add_new]: adding a new normal threshold
o [from_template]: creating a new normal threshold from a template
o [add_group]: adding a new group threshold
o [edit]: modifying the threshold in the given line
o [
· Description: Optional description field. If specified, PVSR shows its value when the user places the cursor above the name of an alarm on the Alarms page (Alarm window view mode).
· Name: The name of the threshold, which is sometimes displayed at the individual measurements and in the threshold violation list. It has to be unique.
· Level: The level of the threshold. The possible values, in order of decreasing severity, are:
o Critical
o Major
o Warning
o Minor
o Unknown
· Type: The type of the threshold; the list can be modified in the Types menu.
· E-mail: The e-mail addresses to which the application should send a message when there is a threshold violation. Zero or more e-mail addresses can be selected from the list of e-mail addresses provided earlier. A user (except the user with the admin name) cannot delete other users' e-mail addresses from the list.
When a threshold violation occurs, the system sends, depending on the configuration, either separate e-mail messages or a single e-mail message containing the violations of the same cycle generated at the same time.
· Quick eval: Whether the threshold uses the quick evaluation feature or not (see below). It cannot be set for group thresholds, and it is also not supported if a customer implements a custom data collector using the Python API instead of the Perl API. It also cannot be used together with the exception list or with dynamic (for example trend) values.
· Expression: The expression belonging to the threshold.
The measurement data flow depends on whether the threshold uses the quick evaluation feature or not. If it does not, then the measurement values are first collected by the collector processes, then written into temporary text files, then transported to the central server, where they are loaded into the database; after that the threshold engine is notified and the threshold evaluation begins. If it does, then the measurement values are collected by the collector processes and sent directly and immediately to the threshold engine, and the evaluation begins. This means that the thresholds are evaluated as soon as possible, even before the measured values arrive in the database.
However, the option has some drawbacks as well. Threshold evaluation happens only for those data collection times when the threshold engine is running, whereas thresholds without this setting are also evaluated “backwards in time” when the threshold engine starts. In addition, the measured values are always present on the charts with threshold information when the threshold is in alarm state or when the alarm state ends, but otherwise they might still be midway in the normal data flow.
It is important to note that irregular data collectors (i.e. data collectors without a fixed list of collection intervals, like the MQTT collector) always use the quick evaluation feature.
The Unknown threshold level is special in the system. Users cannot create thresholds with this level; PVSR creates them automatically when needed, but only for normal thresholds and not for group thresholds.
PVSR can raise threshold alarms based on several different conditions, including the case when the measurement is not successful. For example, for a non-negative measurement the expression “value < 10” is true not only when the value is between 0 and 9, but also when the measurement is not successful. Normally we do not want this to happen, in which case the expression “value < 10 and value >=0” is applied. If the threshold is created in this way, then PVSR detects that, although the measurement is important (since there is a threshold for it), no alarm would be raised if the measurement became unsuccessful. For this case PVSR automatically creates a new threshold with the level Unknown and with the same additional attributes as the non-Unknown threshold (e-mail, trap, type, …). If the measurement has multiple thresholds (for example “value < 20 and >=0”, “value < 30 and >=0”), then PVSR still creates just one Unknown threshold and takes the additional attributes from the threshold with the highest level. Obviously, if a non-Unknown threshold is raised when the measurement is not successful, then PVSR does not create an Unknown threshold.
The system tries to minimize the raising of Unknown alarms in two ways: the threshold processing does not raise a new Unknown alarm in the first 15 minutes of its processing time, and it also does not raise a new Unknown alarm if any of its measurements was created or modified in the last hour. Obviously, already raised Unknown alarms will not end in either case.
Several measurement types can be configured as “availability” measurements.
If a piece of equipment has such a measurement with the same data collection cycle as an Unknown threshold, and that measurement is not successful, or is successful but its value is zero (depending on the configuration), then the application assumes that the equipment is not available, and in this case these Unknown thresholds will not be violated. In this case users can configure only the number of data collection cycles or the number of minutes after which the unsuccessful measurement raises an alarm, and the measurement types that are to be interpreted as “availability” measurements (see section 6.4.7).
Thresholds can be created using a threshold template, and templates might be automatically applied to measurements by the application. Users can modify such thresholds as well; however, these modifications might be lost if someone changes the template or reapplies it. It is important to note that the application does not create a threshold based on a template if this would result in a threshold with the same name or expression as a currently existing threshold. Thus, if someone wants to modify a threshold based on a template, there are different options:
· Simply modify the threshold, but the modification might be lost later on, as described above.
· Unlink the threshold from the template. In this case further template modifications will not affect the threshold, unless the template is manually reapplied (meaning that the thresholds will be recreated using the template) and the original threshold's name and expression were changed, so that a new threshold can be created using the template, resulting in two thresholds (the original and the newly created one).
· Disable the threshold. In this case the threshold is not evaluated and it does not appear under the normal thresholds or alarms, but reapplying the template will not recreate or re-enable it.
· Disable the threshold and create a new one with the same settings.
This is similar to the previous case, but a new threshold is created as well and the user can freely make any kind of modification to it.
If we want to change both the name and the expression of a threshold based on a template, then it is recommended to use the disable and copy option; otherwise it is recommended at least to unlink the threshold.
Other threshold attributes, which are not listed in the table (these attributes are shown on separate subtabs when modifying a threshold):
· Mobile notification: Which users or mobile notification groups must be notified when an alarm is raised.
· SNMP Trap: Where the application should send an SNMP trap if there is a threshold violation. Zero or more addresses can be given, separated by commas. The format of the individual addresses is [community@]computer_name[:port]. If no explicit community is set, then public is used by default. If no explicit port is set, then 162 is used by default. The system also sends the variable causing the violation in the SNMP trap, but only if there is only one variable in the threshold definition and its current value can be stored in 32 bits.
· Command: What further command should be executed in the case of a threshold violation. The command is executed by PVSR from its bin directory. The following parameters are passed to the command via the command line:
o Threshold id
o Threshold name
o Threshold level: 1=Minor, 2=Warning, 3=Major, 4=Critical, 0=
o Time, in the format YYYY.MM.DD. HH:MM:SS
o Measurement data (in case of a normal threshold): this appears as many times as the number of measurements that took part in the violation (it is missing when the command is executed because the alarm has expired).
§ Site ID
§ Site name
§ Equipment ID
§ Equipment name
§ Measurement type
§ Measurement ID
§ Measurement name
§ IN value
§ OUT value
· Continuous alarm: If it is set, then at each violation the appropriate e-mail and/or SNMP trap is sent and/or the appropriate command is executed. If it is not set, then the application sends an e-mail only if there was no violation during the previous measurement. In both cases it sends an e-mail and/or SNMP trap and/or executes the command when the threshold violation ends. If the value of the CLEARED_TRAP_COUNT configuration variable is larger than 1, then the system sends not just one trap but the given number of traps when the threshold violation ends.
· When: Which time period template is used for the threshold. It is important to note that alarms will not be raised even during these time periods if one of the following is true:
When thresholds are added or modified, the fields other than the expression are displayed by the application as in the query view, but the editing of the expression is different for normal and group thresholds.
6.4.4.1 The expression of normal thresholds
In the case of normal thresholds, the editing of the expression consists of two parts: a multi-line condition expression, which determines when the expression is true and thus when the system has to send an alarm, and an optional exception list.
The exception list
List of thresholds: when all of them are satisfied, this alarm should not be generated (a kind of masking of the edited threshold). This option is not available for irregular measurements. For example, if we have three thresholds A, B and C, and the B and C thresholds are set as exceptions (so-called parents) at the A threshold, then even if the expression set at A is true, the alarm will not be generated if the expressions at B and C are both true. If in our example B and C examine the availability of two routers and A evaluates a device which can be reached through router B or router C, then no alarm (which would be a kind of false alarm) is generated for device A when neither router is available. The exception list cannot be used if the quick evaluation feature is used.
Condition expression
Figure 49. Normal threshold configuration
The following documentation details the different expression modes. All the options are available only in the case of periodic measurements. For irregular measurements these restrictions apply:
· The Match in sample and Sample size must be 1 and 1 (i.e. the threshold is violated immediately the first time the expression is true)
· Only Static values can be used, Dynamic values cannot
· Only Values can be used, trend calculations cannot
The expression can be edited using a Simple mode and an Advanced mode.
In the Simple mode the user only has to select the different items from the drop-down select boxes and fill in static measurement values, while in the Advanced mode the user can edit the expression freely.
Static values
In a normal threshold expression two types of condition can be used, either static or dynamic value based, and these two types can even be mixed in a single expression. It is important to note that quick evaluation thresholds cannot use dynamic values. For a static condition the measurement value is always compared to the same value, for example, whether the value of a traffic measurement is greater than 10 Mbps. The advantage of this condition is that it is easy to define; its disadvantage, however, is that it does not take into account long-term changes in the measurement value, and therefore the static values must be overwritten manually at times.
Dynamic values
For a dynamic value, on the other hand, the system takes the values of the past periods according to the specified conditions, calculates the average and the standard deviation of these values, and compares the actual measurement value to the average plus or minus a multiple of the standard deviation. The advantage of this method is that if the measurement value is slowly but continuously rising (e.g., it is increasing from week to week or month to month), then it is not necessary to overwrite a static comparison value, since the average is going to increase too. The disadvantage of the method is that it can be used successfully only if the measured values do not fluctuate much and show some statistical pattern. For example, the traffic on one computer used by a single person does not meet this condition, whereas the combined traffic of many people or the traffic of a continuously running application could.
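As a rough illustration of the dynamic comparison described above, the following Python sketch computes the average and standard deviation of a list of past values and checks the current value against the average plus a multiple of the deviation. The function names, the multiplier k and the use of the population standard deviation are assumptions made for this example; the manual does not state how PVSR performs the calculation internally.

```python
from statistics import mean, pstdev


def dynamic_bounds(history, k):
    """Return (lower, upper) = average -/+ k * standard deviation.

    history: past measurement values selected by the Cycle/Resolution rules.
    k: hypothetical multiplier of the standard deviation.
    """
    avg = mean(history)
    # Assumption: population standard deviation; PVSR may use another variant.
    dev = pstdev(history)
    return avg - k * dev, avg + k * dev


def exceeds_upper(current, history, k=2.0):
    """True if the current value is above average + k * deviation."""
    _, upper = dynamic_bounds(history, k)
    return current > upper


if __name__ == "__main__":
    past_values = [10, 12, 11, 13, 14]
    print(dynamic_bounds(past_values, 2.0))
    print(exceeds_upper(15, past_values))
```

Because the bounds are recomputed from recent history, a slowly rising measurement shifts the bounds with it, which is the advantage over a static value noted above.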
The frequency and the resolution of calculating this dynamic value can be specified as follows:
· Cycle: If the specified cycle is daily, then the average and deviation values are calculated based on the last X days. In this case the system does not take into account the day of the week. So, for example, the basis of the measurement on Monday is the average and standard deviation values of the previous Sunday. If the cycle is weekly, then the basis for the measurement on Mondays is the values measured on the Mondays of the last X weeks, on Tuesdays the values measured on the Tuesdays of the last X weeks, etc.
· Resolution: If the resolution is hourly, then at 3:30 p.m. the system takes the values measured between 3 p.m. and 4 p.m. on the days defined by the above cycle specification. If, on the other hand, the resolution is four-hourly, then the data measured between 12 p.m. and 4 p.m. are considered, and if it is daily, then the data of the whole days corresponding to the Cycle conditions are used.
Editing the expression in Simple mode
Since each expression can contain multiple conditions, the editing basically consists of setting the condition parameters one by one and then adding the condition to the expression. The condition parameter setting is divided into five sections. If the expression is not empty, then the user first has to select either the "and" or the "or" operator. After that the user has to select the measurement. The next step is to select what to compare and how to compare it. The user can choose from the actual measurement values (like Input value and Output value) or can choose a trend based value. This could be, for example, a linear trend: PVSR calculates a linear trend line based on the specified samples and calculates a forward prediction based on the current trend. The trend can also be a Difference trend, which is basically the difference between the current value and the previous value.
Other possible trends are the “Moving window” trends: in these cases the application calculates an average or a sum based on the last specified number of values. The comparison types can be >=, >, … and even Successful or Unsuccessful if the user has not chosen a trend calculation. The next step (unless the Successful or Unsuccessful operator was chosen) is to select what to compare to. This can be either a static or a dynamic value. When using a static value the user can use the K, M and G postfixes, meaning 1 000, 1 000 000 and 1 000 000 000. The last step is to specify how often the condition must be true in order to generate an alarm. The application initially sets the sample size to the measurement interval.
After the last step is done, the user has to click on the "Add to" link and the application adds the condition to the expression. To delete a condition, the user only has to click on the bin icon next to it. For most subexpressions the values can be modified by clicking on the modification icon next to them.
If the current expression does not specify a condition for the case when the measurements in it are not successful, then there is an additional item next to the expression called “Keep alarm when measurements are unsuccessful”. Clicking on it modifies the expression so that if the alarm is active and the measurements become unsuccessful, the alarm will not be closed.
Editing the expression in Advanced mode
In the Advanced mode, the user is able to edit the inner representation of the threshold. In this mode some of the fields are text fields instead of drop-down selections, so the user can specify the value exactly. For example, in Simple mode the "Sample size" is a select item, so the user cannot specify "123 times of collection cycle". The other main difference is that the user can use other operators and parentheses as well.
The third difference is that since the user can edit the expression freely, invalid expressions can be created as well. Such an expression cannot be saved, and there is an additional link called Validation below the "Add to" link: when it is clicked, the application examines the current expression and either shows its textual presentation in a separate row called Validation or displays an error message in the same row.
There are two additional elements which can be used in an expression but which cannot be specified using the Simple mode:
· #PREV_ALARM_STATE#: its value is 1 if the threshold was violated in the previous collection cycle
· #PREV_ALARM_STATE.<ID>#: its value is 1 if the threshold with the database id <ID> was violated in the previous collection cycle
For users with Perl programming language experience: the DBID… parts are replaced with the actual measurement values; then, for each #E#X#Y# part, E is evaluated as a Perl expression and the whole #...# part is replaced with 1 if E was > 0 at least X times in Y samples, otherwise it is replaced with 0. Finally the whole expression is evaluated as a Perl expression again, and the alarm occurs if the value of the whole expression is > 0.
6.4.4.2 The expression of group thresholds
Group thresholds can only be based on thresholds of periodic measurements. The editing of group thresholds is done on a different panel; thus, contrary to the editing of normal threshold expressions, there is no need for complex expressions or for knowledge of a special syntax.
Description of fields:
· Object type and object: The group alarm always summarizes the alarms of a normal or virtual site or a device. The required objects have to be specified here. If this editing form was opened from the device configuration panel, then these two fields are missing.
· Filter parameters: Under the specified object there may be more than one alarm, so they can be filtered and the group alarm then does not contain all of them. The filter parameters are:
1. Interval: The interval of the thresholds to which the group alarm should apply. This is a mandatory filter parameter; that is, unlike the Level and Type parameters, it does not offer an "all values" choice.
2. Name: the % and _ characters can be used.
3. Level
4. Type
· Examination of existing alarms: The group alarm can be fired under two conditions: based on the number of all alarms or on that of the new alarms only. In the former case the group alarm occurs when there are more alarms for the thresholds belonging to the group than the given limit. The alarm is stopped when the number of alarms goes below this limit. In the case of newly created alarms the alarm is fired when there are more newly generated alarms than the limit, and it stops when the number of alarms falls below this limit.
· Number of alarms: In both cases the user may choose a numerical or a percent value. In the latter case 100% means all thresholds belonging to the given group.
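The limit check described under "Examination of existing alarms" and "Number of alarms" can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a percent limit is rounded up to a whole number of alarms, and the group alarm is treated as active when the alarm count reaches the limit. The text says the alarm is raised with more alarms than the limit and stopped below it, but does not specify the behaviour at exactly the limit; the sketch assumes "active at the limit" so that a 100% limit remains reachable.

```python
import math


def absolute_limit(limit, group_size, percent=False):
    """Convert a numeric or percent limit into a number of alarms."""
    if percent:
        # 100% means all thresholds belonging to the group.
        # Assumption: fractional results are rounded up.
        return math.ceil(group_size * limit / 100)
    return limit


def group_alarm_active(alarm_count, group_size, limit, percent=False):
    """True if the group alarm is active for the given number of alarms.

    alarm_count is either the number of all alarms or of the new alarms
    only, depending on the chosen examination mode.
    Assumption: the alarm is active when the count reaches the limit.
    """
    return alarm_count >= absolute_limit(limit, group_size, percent)


if __name__ == "__main__":
    # A group of 8 thresholds with a limit of 50 percent.
    print(absolute_limit(50, 8, percent=True))
    print(group_alarm_active(5, 8, 50, percent=True))
```

The same function serves both examination modes; only the way alarm_count is obtained differs.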