MINIMUM MONITORING REQUIREMENTS @CTC
Recommended For Application Performance Monitoring
METRIC (TO BE MONITORED) | WHAT WILL IT MONITOR? | Aggregation | Alert Threshold |
---|---|---|---|
User requests seen by the service | Rate – input traffic volume in different time intervals | AVG | N/A |
Business transactions completed successfully | Rate – output traffic volume in different time intervals (aka application throughput) | AVG | 100 (requests per min) |
Error counts by HTTP response code | Error – requests that ended in errors | Count (5) | 5 |
Response latency | Latency – in milliseconds, observed in service responses. Should be measured in percentiles: P99 is the latency within which 99% of requests complete | AVG | 500 ms (based on historical data from the DOM app) |
Request time spent in wait queues | Saturation – indicated by long wait times in service input queues | AVG | 200 ms |
Resource (CPU, memory, disk) utilization percentages | Saturation – indicated by persistent spikes in resource usage | AVG | High – 90%; Warning – 80% |
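The latency row recommends percentiles, so a short sketch may help show how a P99 check differs from an average. This is only an illustration: the nearest-rank percentile method, the function names, the "alert when the percentile exceeds the threshold" rule, and the sample data are all assumptions, not part of the CTC requirements.

```python
import math
from statistics import mean

LATENCY_THRESHOLD_MS = 500  # alert threshold taken from the table above


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_alert(samples_ms, pct=99, threshold_ms=LATENCY_THRESHOLD_MS):
    """Assumed rule: alert when the chosen percentile exceeds the threshold."""
    return percentile(samples_ms, pct) > threshold_ms


# Hypothetical window of 100 requests: 98 fast ones and two slow outliers.
latencies_ms = [120] * 98 + [900, 1500]
average_ms = mean(latencies_ms)
p99_ms = percentile(latencies_ms, 99)
```

In this made-up window the average is 141.6 ms, well under the threshold, while the P99 is 900 ms and would raise an alert. An AVG aggregation alone can therefore hide slow requests in the tail.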
METRIC CORRELATION (TO BE MONITORED) | WHAT WILL IT MONITOR? | Alert Threshold |
---|---|---|
Number of concurrent users versus request latency | How long it takes to start processing a request after the user has sent it, under different load conditions | N/A |
Number of concurrent users versus average response time | How long it takes to complete a request after processing has started, under different load conditions | N/A |
Volume of requests versus number of processing errors | How well the system maintains processing accuracy as the request load increases | N/A |
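One simple way to quantify the correlations listed above is a Pearson correlation coefficient over paired samples. The sketch below is an assumption about how such a check could be done, not a prescribed method. The function name and the sample figures are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical samples: concurrent users and average response time (ms) per interval.
concurrent_users = [10, 50, 100, 200]
avg_response_ms = [100, 150, 220, 400]
r = pearson(concurrent_users, avg_response_ms)
```

A value of `r` close to 1 suggests response time grows with load, while a value near 0 suggests little linear relationship. The same function can be applied to request volume versus processing errors.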
Recommended For Infrastructure Monitoring
Server | Resources on each Server | Alert Threshold |
---|---|---|
Webserver host | CPUs, storage devices | High – 90%; Warning – 80% |
Load Balancer host | Same as above | Same as above |
Database Server host | Same as above | Same as above |
Time Server host | Same as above | Same as above |
DHCP Server host | Same as above | Same as above |
MQ Server host | Same as above | Same as above |
Name Resolution Server host | Same as above | Same as above |
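The High and Warning thresholds above can be expressed as a small classification step. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and treating the thresholds as inclusive (80% already counts as Warning) is an assumption, since the table does not say.

```python
import shutil


def severity(percent, warning=80.0, high=90.0):
    """Map a utilization percentage to a severity (thresholds assumed inclusive)."""
    if percent >= high:
        return "High"
    if percent >= warning:
        return "Warning"
    return "OK"


def disk_usage_percent(path="/"):
    """Percentage of the filesystem containing `path` that is currently used."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


# Example: classify the storage device that holds the current directory.
current_severity = severity(disk_usage_percent("."))
```

CPU utilization is not covered here because the standard library has no portable call for it. A monitoring agent would normally supply that figure.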
Recommended For Database Monitoring
Database (TO BE MONITORED) | WHAT WILL IT MONITOR? | Alert Threshold |
---|---|---|
Oracle DB Status | Oracle Status – notifies whether the database is running or not | Down |
A job's execution can fail for any of several reasons, including incorrect input, storage or memory quota issues, timeouts, or database connection problems | Failed Jobs – number of jobs that failed, either by throwing an error or by terminating abnormally | 5 |
The maximum number of operating system (OS) user processes that can connect to the Oracle database simultaneously | Process Usage Percentage – saturation of Oracle DB processes | 70% |
Checks the number of failed logins on the target database | Failed Login Count – number of failed logon attempts | 3 |
The time spent in database operations per transaction | Response Time – response time (ms) | 200 ms |
The maximum number of sessions used by the database | Sessions Usage – in percentage (%) | 30% |
The metrics active_sessions and inactive_sessions denote the number of active and inactive sessions respectively | Active and Inactive Sessions | 5 |
tablespace_status is one of Read-Write, Read-Only or Offline. tablespace_usage_percent tracks how data grows in the database so that appropriate capacity can be provisioned | Tablespace Status and its Usage (in %) | High – 90%; Warning – 80% |
Reads and writes are the number of physical reads and writes respectively. Their total gives the I/O activity for a specific disk | Number of Reads & Writes in a Tablespace | 10 |
rman_failed_backup_count gives the number of failed backups in the RMAN repository | RMAN Failed Backup Count – backup status | 5 |
High disk write utilization [the disk mount point will change, but physical writes to the DB need to be included] | Percentage of Physical Write Utilization | 85% |
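As an illustration of how the numeric thresholds above might be evaluated together, the sketch below compares a set of collected values against them. The metric key names, the "alert when the value reaches the threshold" rule, and the sample values are assumptions made for the example. How the values are collected from Oracle is outside its scope.

```python
# Thresholds copied from the table above; the key names are hypothetical.
THRESHOLDS = {
    "failed_jobs": 5,
    "process_usage_pct": 70,
    "failed_login_count": 3,
    "response_time_ms": 200,
    "sessions_usage_pct": 30,
    "rman_failed_backup_count": 5,
    "physical_write_utilization_pct": 85,
}


def usage_percent(current, limit):
    """Express a current count as a percentage of its configured limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * current / limit


def breached(metrics, thresholds=THRESHOLDS):
    """Names of metrics whose value has reached the threshold (assumed rule: value >= threshold)."""
    return sorted(
        name for name, value in metrics.items()
        if name in thresholds and value >= thresholds[name]
    )


# Hypothetical snapshot: 240 of 300 allowed processes in use, slow responses, few failed jobs.
snapshot = {
    "failed_jobs": 2,
    "process_usage_pct": usage_percent(240, 300),
    "response_time_ms": 350,
}
alerts = breached(snapshot)
```

With this snapshot, process usage works out to 80% and the response time is 350 ms, so both are reported, while the failed job count stays below its threshold.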
Recommended For Synthetic Monitoring Setup
WHAT WILL IT MONITOR? | Alert Threshold |
---|---|
Service Availability – checks whether the application is online or not | Down |
SSL-Cert Expiry – checks the number of days remaining until the certificate expires | High – fewer than 5 days left; Warning – fewer than 10 days left |
Page load behavior – emulates a 'page visit' using a Google Chrome agent | 200 ms |
Any broken link on a page – reports the individual non-successful links that caused a failure | Step failure |
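The SSL-Cert Expiry check can be sketched with the Python standard library. This is a minimal illustration, not the configuration of any particular synthetic monitoring tool. The function names are hypothetical, and the host passed to `fetch_not_after` would be whatever endpoint the application exposes.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter time, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def expiry_severity(days_left):
    """Apply the thresholds from the table: under 5 days is High, under 10 is Warning."""
    if days_left < 5:
        return "High"
    if days_left < 10:
        return "Warning"
    return "OK"


def fetch_not_after(host, port=443, timeout=5.0):
    """Read the notAfter field of the certificate presented by host:port.

    An already expired certificate fails verification, so the handshake
    raises ssl.SSLError instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A check would then call `expiry_severity(days_until_expiry(fetch_not_after(host)))` on a schedule. Keeping the date arithmetic separate from the network call makes the threshold logic testable without a live endpoint.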
Recommended For Log Monitoring Setup
TYPE OF LOGS | Examples |
---|---|
OS Logs | Linux – log file path: /var/log/* |
Application Logs | CTC-App specific log – <check the file path for log> |
Web Server Logs | Apache: /var/log/httpd/*.log; Nginx: /var/log/nginx/*.log |
DB Server Logs | Oracle log files |
TIBCO EMS Logs | Log files for TIBCO |
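For the file-based log sources above, a basic scan can be sketched as follows. It assumes that lines of interest contain a keyword such as `ERROR`. The keyword and the function name are illustrative, and a real log monitoring setup would normally use a dedicated agent rather than a script like this.

```python
import glob


def count_matching_lines(pattern, keyword="ERROR"):
    """Map each file matching the glob pattern to its number of lines containing keyword."""
    counts = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, "r", errors="replace") as handle:
                counts[path] = sum(1 for line in handle if keyword in line)
        except OSError:
            # Directories and unreadable files (permissions, rotation races) are skipped.
            continue
    return counts
```

For example, `count_matching_lines("/var/log/nginx/*.log")` returns a dictionary keyed by file path, which can be empty if no file matches or none is readable.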
Note: The thresholds above are for reference only; they can be adjusted to match application behaviour and historical data.