Recommended MMR

Published date: May 28, 2024, Version: 1.0

MINIMUM MONITORING REQUIREMENTS @CTC

Recommended For Application Performance Monitoring

| METRIC (TO BE MONITORED) | WHAT WILL IT MONITOR? | Aggregation | Alert Threshold |
| --- | --- | --- | --- |
| User requests seen by the service | Rate: input traffic volume over different time intervals | AVG | N/A |
| Business transactions completed successfully | Rate: output traffic volume over different time intervals (also known as application throughput) | AVG | 100 (requests per min) |
| Error counts by HTTP response code | Errors: requests that ended in errors | Count (5) | 5 |
| Response latency | Latency: time in milliseconds taken by service responses. Should be measured in percentiles; P99 is the latency within which 99% of requests complete | AVG | 500 ms (based on historical data from DOM app) |
| Request time spent in wait queues | Saturation: indicated by long wait times in service input queues | AVG | 200 ms |
| Resource (CPU, memory, disk) utilization percentages | Saturation: indicated by persistent spikes in resource usage | AVG | High: 90%; Warning: 80% |
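The percentile guidance for response latency can be illustrated with a minimal sketch. It assumes the nearest-rank definition of a percentile and reuses the 500 ms threshold from the table; the function names `percentile` and `latency_breaches` are hypothetical and do not come from any monitoring tool.

```python
import math

# Alert threshold for response latency, taken from the table above.
LATENCY_THRESHOLD_MS = 500


def percentile(samples, pct):
    """Return the pct-th percentile of samples (nearest-rank method)."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_breaches(samples_ms, pct=99, threshold_ms=LATENCY_THRESHOLD_MS):
    """Return True when the chosen latency percentile exceeds the threshold."""
    return percentile(samples_ms, pct) > threshold_ms
```

A percentile is used here rather than an average because a small number of very slow requests can hide behind a healthy mean; the choice of nearest-rank over interpolation is an assumption of this sketch.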

| METRIC CORRELATION (TO BE MONITORED) | WHAT WILL IT MONITOR? | Alert Threshold |
| --- | --- | --- |
| Number of concurrent users versus request latency times | How long it takes to start processing a request after the user has sent it, in different load situations | NA |
| Number of concurrent users versus average response time | How long it takes to complete a request after it has started processing, in different load situations | NA |
| Volume of requests versus number of processing errors | How well the system maintains processing accuracy as the request load increases | NA |
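One way to quantify the metric pairs above is a Pearson correlation coefficient over paired samples. The sketch below is only an illustration: the helper name `pearson` is hypothetical, and the document does not prescribe this or any other correlation method.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up sample data: concurrent users and average response time (ms)
# observed in the same time intervals.
concurrent_users = [10, 50, 100, 200, 400]
avg_response_ms = [120, 130, 160, 240, 420]
coefficient = pearson(concurrent_users, avg_response_ms)
```

A coefficient near 1 would suggest that response time rises with load, while a value near 0 would suggest load is not the dominant factor.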

Recommended For Infrastructure Monitoring

Server Resources on each Server

| Host | Resources | Alert Threshold |
| --- | --- | --- |
| Webserver host | CPUs; memory; network interfaces; storage devices; controllers; interconnects; file system | High: 90%; Warning: 80% |
| Load Balancer host | Same as above | Same as above |
| Database Server host | Same as above | Same as above |
| Time Server host | Same as above | Same as above |
| DHCP Server host | Same as above | Same as above |
| MQ Server host | Same as above | Same as above |
| Name Resolution Server host | Same as above | Same as above |
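The two-level threshold above could be applied to per-resource readings as in the following sketch. It assumes the thresholds are inclusive (a reading of exactly 90% counts as High), which the document does not state; the "OK" label and both function names are hypothetical.

```python
# Thresholds from the table above, in percent utilization.
HIGH_PCT = 90
WARNING_PCT = 80


def classify_utilization(percent):
    """Map a utilization percentage to an alert level."""
    if percent >= HIGH_PCT:
        return "High"
    if percent >= WARNING_PCT:
        return "Warning"
    return "OK"  # assumed label for readings below both thresholds


def evaluate_host(readings):
    """Return only the resources on a host that are at Warning or High.

    readings maps a resource name (for example "cpu") to its utilization in percent.
    """
    levels = {name: classify_utilization(value) for name, value in readings.items()}
    return {name: level for name, level in levels.items() if level != "OK"}
```

For example, `evaluate_host({"cpu": 95, "memory": 82, "disk": 40})` would report the CPU as High and memory as Warning, and omit the disk.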

Recommended For Database Monitoring

| Database (TO BE MONITORED) | WHAT WILL IT MONITOR? | Alert Threshold |
| --- | --- | --- |
| Oracle DB status | Oracle Status: indicates whether the database is running | Down |
| Job executions, which can fail for several reasons, including incorrect input, storage or memory quota issues, timeouts, or database connection problems | Failed Jobs: number of jobs that failed, either by throwing an error or by terminating abnormally | 5 |
| The maximum number of operating system (OS) user processes that can connect to the Oracle database simultaneously | Process Usage Percentage: saturation of Oracle DB processes | 70% |
| The number of failed logins on the target database | Failed Login Count: number of failed logon attempts | 3 |
| The time spent in database operations per transaction | Response Time: response time (ms) | 200 ms |
| The maximum number of sessions used by the database | Sessions Usage: in percentage (%) | 30% |
| The metrics active_sessions and inactive_sessions denote the number of active and inactive sessions respectively | Active and Inactive Sessions | 5 |
| tablespace_status is one of Read-Write, Read-Only, or Offline. tablespace_usage_percent tracks how data grows in the database so that appropriate storage can be provisioned | Tablespace Status and Usage (in %) | High: 90%; Warning: 80% |
| Reads and writes are the number of physical reads and writes respectively. Their total gives the I/O activity for a specific disk | Number of Reads & Writes in a Tablespace | 10 |
| rman_failed_backup_count gives the number of failed backups in the RMAN repository | RMAN Failed Backup Count: backup status | 5 |
| High disk write utilization [the disk mount point will change, but physical writes to the DB need to be included] | Percentage of Physical Write Utilization | 85% |
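The percentage-based database metrics above can be derived from a current count and a configured limit. The sketch below assumes that usage is computed as current divided by limit and that thresholds are inclusive; the function names are hypothetical, and the current and limit values would have to come from the database or monitoring tool in use.

```python
# Thresholds from the table above, in percent.
PROCESS_USAGE_THRESHOLD_PCT = 70
SESSIONS_USAGE_THRESHOLD_PCT = 30


def usage_percent(current, limit):
    """Return current as a percentage of limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * current / limit


def breached(current, limit, threshold_pct):
    """Return True when usage reaches or exceeds the threshold (assumed inclusive)."""
    return usage_percent(current, limit) >= threshold_pct


def total_io(physical_reads, physical_writes):
    """Total I/O activity: the sum of physical reads and physical writes."""
    return physical_reads + physical_writes
```

For instance, 210 connected processes against a limit of 300 gives 70% usage, so `breached(210, 300, PROCESS_USAGE_THRESHOLD_PCT)` would raise the process usage alert.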

     

Recommended For Synthetic Monitoring Setup

| WHAT WILL IT MONITOR? | Alert Threshold |
| --- | --- |
| Service Availability: checks whether the application is online | Down |
| SSL-Cert Expiry: checks the number of days remaining until the certificate expires | High: less than 5 days left; Warning: less than 10 days left |
| Page load behavior: emulates a 'page visit' using a Google Chrome agent | 200 ms |
| Broken links on a page: reports the individual non-successful links that caused a failure | Step failure |
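The SSL certificate expiry thresholds above can be expressed as a small classification function. This is a sketch under stated assumptions: the certificate's expiry time is already available as a timezone-aware `datetime` (retrieving it from the server is out of scope here), and the "OK" label and function names are hypothetical.

```python
from datetime import datetime, timezone

# Thresholds from the table above, in days remaining.
HIGH_DAYS = 5
WARNING_DAYS = 10


def days_remaining(not_after, now):
    """Return the (possibly fractional) number of days until not_after."""
    return (not_after - now).total_seconds() / 86400


def classify_cert_expiry(not_after, now=None):
    """Map a certificate expiry time to an alert level."""
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = days_remaining(not_after, now)
    if remaining < HIGH_DAYS:
        return "High"
    if remaining < WARNING_DAYS:
        return "Warning"
    return "OK"  # assumed label when 10 or more days remain
```

The comparisons are strict to match the "less than" wording of the thresholds, so a certificate with exactly 5 days left is a Warning rather than High.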

Recommended For Log Monitoring Setup

| TYPE OF LOGS | Examples |
| --- | --- |
| OS Logs | Linux log file path: /var/log/* |
| Application Logs | CTC-App specific log: <check the file path for log> |
| Web Server Logs | Apache: /var/log/httpd/*.log; Nginx: /var/log/nginx/*.log |
| DB Server Logs | Oracle log files |
| TIBCO EMS Logs | Log files for TIBCO |

Note: The thresholds mentioned above are for reference only. They can be modified to suit application behaviour and historical data.