Recommended MMR

Published date: May 28, 2024, Version: 1.0

MINIMUM MONITORING REQUIREMENTS @CTC

Recommended For Application Performance Monitoring

METRIC (TO BE MONITORED)	WHAT WILL IT MONITOR?	Aggregation	Alert Threshold
User requests seen by the service	Rate – input traffic volume over different time intervals	AVG	N/A
Business transactions completed successfully	Rate – output traffic volume over different time intervals (also known as application throughput)	AVG	100 (requests per min)
Error counts by HTTP response code	Error – requests that ended in errors	Count (5)	5
Response latency	Latency – time in milliseconds taken by service responses. Should be measured in percentiles; P99 is the latency that 99% of requests stay at or below	AVG	500ms (based on historical data from the DOM app)
Request time spent in wait queues	Saturation – indicated by long wait times in service input queues	AVG	200ms
Resource (CPU, memory, disk) utilization percentages¹	Saturation – indicated by persistent spikes in resource usage	AVG	High – 90% Warning – 80%
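
The latency row asks for percentiles, and the difference from a plain average is easy to see on a small sample. The following is a minimal sketch, assuming the nearest-rank percentile definition; the sample values and the names `percentile` and `LATENCY_THRESHOLD_MS` are illustrative and not part of any CTC tooling.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

LATENCY_THRESHOLD_MS = 500  # alert threshold from the table above

# Made-up response times in milliseconds; one slow outlier.
latencies_ms = [120, 130, 110, 140, 125, 135, 115, 128, 122, 900]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f} ms, p99={p99_ms} ms")
print("alert on mean:", mean_ms > LATENCY_THRESHOLD_MS)
print("alert on p99:", p99_ms > LATENCY_THRESHOLD_MS)
```

In this sample the average stays under the 500ms threshold while P99 exceeds it, which is why the percentile is the more sensitive signal for slow requests.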

METRIC CORRELATION (TO BE MONITORED)	WHAT WILL IT MONITOR?	Alert Threshold
Number of concurrent users versus request latency	How long it takes to start processing a request after the user has sent it, under different load conditions	N/A
Number of concurrent users versus average response time	How long it takes to complete a request after processing has started, under different load conditions	N/A
Volume of requests versus number of processing errors	How well the system maintains processing accuracy as request load increases	N/A
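
One way to quantify such a correlation is the Pearson coefficient between the two series, sampled over the same time intervals. This is a sketch only: the document does not prescribe a correlation method, and the sample numbers and the `pearson` helper are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Made-up samples taken at the same intervals.
concurrent_users = [10, 50, 100, 200, 400]
latency_ms = [110, 120, 150, 240, 520]

r = pearson(concurrent_users, latency_ms)
print(f"users vs latency: r={r:.2f}")
```

A coefficient close to 1 suggests latency grows with load; a value near 0 suggests latency is driven by something other than the number of concurrent users.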

Recommended For Infrastructure Monitoring

Server	Resources on each Server	Alert Threshold
Webserver host	• CPUs • Memory • Network interfaces • Storage devices • Controllers • Interconnects • File system	High – 90% Warning – 80%
Load Balancer host	Same as above	Same as above
Database Server host	Same as above	Same as above
Time Server host	Same as above	Same as above
DHCP Server host	Same as above	Same as above
MQ Server host	Same as above	Same as above
Name Resolution Server host	Same as above	Same as above
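
The High and Warning levels above can be applied to a stream of utilization samples roughly as follows. This is a sketch under two assumptions that the document does not state: the thresholds are inclusive, and a spike counts as persistent when the last three samples all stay at or above a level. The function names are illustrative.

```python
def classify_utilization(percent, warning=80.0, high=90.0):
    """Map a utilization percentage to OK / WARNING / HIGH (thresholds assumed inclusive)."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"utilization out of range: {percent}")
    if percent >= high:
        return "HIGH"
    if percent >= warning:
        return "WARNING"
    return "OK"

def sustained_level(samples, window=3):
    """Level that the last `window` samples all reach, so a single spike does not alert."""
    if len(samples) < window:
        return "OK"  # not enough data to call a spike persistent
    return classify_utilization(min(samples[-window:]))

# Made-up utilization samples in percent, oldest first.
cpu_percent = [72.0, 95.0, 91.0, 93.0, 96.0]
memory_percent = [60.0, 97.0, 62.0, 61.0, 63.0]

print("cpu:", sustained_level(cpu_percent))
print("memory:", sustained_level(memory_percent))
```

Taking the minimum over the window means an alert fires only when every recent sample is above the level, which matches the idea of persistent rather than momentary saturation.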

Recommended For Database Monitoring

DATABASE METRIC (TO BE MONITORED)	WHAT WILL IT MONITOR?	Alert Threshold
Oracle DB Status	Whether the database is running or not	Down
Failed Jobs	Number of jobs that failed, either by throwing an error or by terminating abnormally. A job can fail for several reasons, including incorrect input, storage or memory quota issues, timeouts, or database connection problems	5
Process Usage Percentage	Saturation of Oracle DB processes, relative to the maximum number of operating system (OS) user processes that can connect to the Oracle database simultaneously	70%
Failed Login Count	Number of failed logon attempts on the target database	3
Response Time	Time spent in database operations per transaction, in milliseconds	200ms
Sessions Usage	Sessions in use as a percentage (%) of the maximum number of sessions the database allows	30%
Active and Inactive Sessions	The active_sessions and inactive_sessions metrics give the number of active and inactive sessions respectively	5
Tablespace Status and Usage (%)	tablespace_status is Read-Write, Read-Only or Offline. tablespace_usage_percent tracks how data grows in the database, so that appropriate storage can be provisioned	High – 90% Warning – 80%
Number of Reads & Writes in a Tablespace	Number of physical reads and physical writes. The total of reads and writes gives the I/O activity for a specific disk	10
RMAN Failed Backup Count	Backup status: rman_failed_backup_count gives the number of failed backups in the RMAN repository	5
Percentage of Physical Write Utilization	High disk write utilization [the disk mount point will change, but physical writes to the DB need to be included]	85%
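
The percentage-based rows (process usage, sessions usage) compare a current count against a configured limit. A minimal sketch of that arithmetic follows; the counts are made up, the helper names are hypothetical, and treating the threshold as inclusive is an assumption. In practice the current and limit values would come from the database itself, for example from a resource-limit view.

```python
def usage_percent(current, limit):
    """Current count as a percentage of the configured limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * current / limit

def breaches(value, threshold):
    """True when the value reaches the alert threshold (assumed inclusive)."""
    return value >= threshold

PROCESS_USAGE_THRESHOLD = 70.0   # percent, from the table above
SESSIONS_USAGE_THRESHOLD = 30.0  # percent, from the table above

# Made-up readings: 350 of 500 processes, 120 of 600 sessions.
process_pct = usage_percent(current=350, limit=500)
sessions_pct = usage_percent(current=120, limit=600)

print(f"process usage {process_pct:.1f}% alert={breaches(process_pct, PROCESS_USAGE_THRESHOLD)}")
print(f"sessions usage {sessions_pct:.1f}% alert={breaches(sessions_pct, SESSIONS_USAGE_THRESHOLD)}")
```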

Recommended For Synthetic Monitoring Setup

WHAT WILL IT MONITOR?	Alert Threshold
Service Availability – checks whether the application is online	Down
SSL-Cert Expiry – checks the number of days remaining until the certificate expires	High – Less than 5 days left Warning – Less than 10 days left
Page load behavior – emulates a ‘page visit’ using a Google Chrome agent	200 ms
Broken links on a page – reports each individual non-successful link that caused a failure	Step failure
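
The SSL-Cert Expiry levels can be expressed as a small function over the certificate's expiry timestamp. This sketch assumes the expiry time has already been obtained from the certificate and counts only whole days remaining; `cert_expiry_level` is an illustrative name, and the dates are made up.

```python
from datetime import datetime, timezone

def cert_expiry_level(not_after, now):
    """Classify a certificate by whole days left: HIGH under 5, WARNING under 10, else OK."""
    days_left = (not_after - now).days  # negative once the certificate has expired
    if days_left < 5:
        return "HIGH"
    if days_left < 10:
        return "WARNING"
    return "OK"

now = datetime(2024, 5, 28, tzinfo=timezone.utc)
print(cert_expiry_level(datetime(2024, 6, 1, tzinfo=timezone.utc), now))  # 4 days left
print(cert_expiry_level(datetime(2024, 6, 5, tzinfo=timezone.utc), now))  # 8 days left
print(cert_expiry_level(datetime(2024, 7, 1, tzinfo=timezone.utc), now))  # 34 days left
```

Passing `now` explicitly keeps the check deterministic and easy to test; an already expired certificate falls into the High level.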

Recommended For Log Monitoring Setup

TYPE OF LOGS	Examples
OS Logs	Linux - log file path: /var/log/*
Application Logs	CTC-App specific Log- <check the file path for log>
Web Server Logs	Apache: /var/log/httpd/.log Nginx: /var/log/nginx/.log
DB Server Logs	Oracle Log files
TIBCO EMS Logs	Log files for Tibco
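
Web server logs are also a practical source for the error counts by HTTP response code listed under application monitoring. The following sketch assumes access-log lines in the Common Log Format, where the status code follows the quoted request line; the sample lines are made up and `count_status_codes` is an illustrative helper.

```python
import re
from collections import Counter

# In Common Log Format the 3-digit status code follows the quoted request line.
STATUS_RE = re.compile(r'"\s(\d{3})(?:\s|$)')

def count_status_codes(lines):
    """Count HTTP status codes found in access-log lines; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts

sample_lines = [
    '10.0.0.1 - - [28/May/2024:10:00:01 +0000] "GET /health HTTP/1.1" 200 12',
    '10.0.0.2 - - [28/May/2024:10:00:02 +0000] "POST /orders HTTP/1.1" 500 87',
    '10.0.0.3 - - [28/May/2024:10:00:03 +0000] "GET /orders/7 HTTP/1.1" 404 0',
    '10.0.0.2 - - [28/May/2024:10:00:04 +0000] "POST /orders HTTP/1.1" 503 0',
    'not an access log line',
]

counts = count_status_codes(sample_lines)
error_total = sum(n for code, n in counts.items() if code >= 400)
print(dict(counts))
print("errors:", error_total)
```

The resulting error total can then be compared with the error-count threshold defined for application monitoring.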