Incident Management

Published date: April 15, 2024, Version: 1.0

 

What We Do

Enterprise Incident Management (EIM) is responsible for the overall management of Critical IT Incidents. Using timeboxing and ITIL best practices, we apply our Incident Management process throughout the entire lifecycle of Critical Incidents while providing regular updates to Business and IT stakeholders.

Our Goal

The primary goal of EIM is to restore normal service operation as quickly as possible and to minimize the impact on business operations, thus ensuring that the best possible levels of service quality and availability are maintained. To further support this goal, we aim to reduce the number of critical incidents through the Escalated Priority 3 (EP3) incident process by addressing, managing, and resolving issues before they become critical.

Major Incident Timeboxing

Timeboxing allocates a fixed time period, called a timebox, within which a planned activity takes place. It is used in several project management, software development and disaster recovery models, including:

  • Agile
  • Scrum
  • Lean
  • Rapid application development

The goal of timeboxing is to define and limit the amount of time dedicated to an activity. This helps manage the risk of overextended tasks and ensures teams focus on the task at hand. It also includes several escalation points to allow for additional management of risk within a major incident.

With target SLAs of 4 hours for P1 incidents and 8 hours for P2 incidents, major incident management applies timeboxes to each activity during the lifecycle of a major incident.
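The SLA targets above can be sketched as a simple deadline calculation. This is a minimal illustration only: the 4-hour and 8-hour values come from the text, while the function names and the choice to measure from the time the incident was opened are assumptions. The per-activity timeboxes are not listed here, so the sketch covers only the overall SLA.

```python
from datetime import datetime, timedelta

# SLA targets stated in the text: 4 hours for P1, 8 hours for P2.
SLA_HOURS = {"P1": 4, "P2": 8}


def sla_deadline(priority: str, opened_at: datetime) -> datetime:
    """Return the target restoration time for a major incident.

    Assumption: the SLA clock starts when the incident is opened.
    """
    if priority not in SLA_HOURS:
        raise ValueError(f"No SLA target defined for priority {priority!r}")
    return opened_at + timedelta(hours=SLA_HOURS[priority])


def time_remaining(priority: str, opened_at: datetime, now: datetime) -> timedelta:
    """Return the time left before the SLA target (negative if breached)."""
    return sla_deadline(priority, opened_at) - now
```

For example, a P1 opened at 09:00 would have a target restoration time of 13:00 on the same day under these assumptions.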

Escalated Priority 3

An Escalated P3 (EP3) is an incident that either affects a non-critical service used by a large subset of users who have no viable workaround, or affects a critical system or service for which an acceptable workaround exists.

Criteria of an EP3 include:

  • No redundancy for a Critical Service
  • Non-Critical Service unavailable with no viable workaround and with business impact
  • Critical Service degraded with a viable workaround
  • Non-Critical Incident, with business impact, that will lead to a Critical Incident if not resolved
  • Escalated issues from Business Users and/or AVP
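The criteria above could be expressed as a checklist function. This is a rough sketch, not an official classification tool: the Incident fields and function names are hypothetical, and each flag is assumed to be set by whoever triages the incident.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    # All field names are hypothetical and exist only for this illustration.
    critical_service: bool
    redundancy_lost: bool = False
    unavailable: bool = False
    degraded: bool = False
    viable_workaround: bool = False
    business_impact: bool = False
    likely_to_become_critical: bool = False
    escalated_by_business_or_avp: bool = False


def ep3_reasons(inc: Incident) -> List[str]:
    """Return the EP3 criteria from the list above that the incident meets."""
    reasons = []
    if inc.critical_service and inc.redundancy_lost:
        reasons.append("No redundancy for a Critical Service")
    if (not inc.critical_service and inc.unavailable
            and not inc.viable_workaround and inc.business_impact):
        reasons.append("Non-Critical Service unavailable with no viable workaround")
    if inc.critical_service and inc.degraded and inc.viable_workaround:
        reasons.append("Critical Service degraded with a viable workaround")
    if (not inc.critical_service and inc.business_impact
            and inc.likely_to_become_critical):
        reasons.append("Non-Critical Incident likely to become Critical")
    if inc.escalated_by_business_or_avp:
        reasons.append("Escalated by Business Users and/or AVP")
    return reasons


def is_ep3(inc: Incident) -> bool:
    """An incident qualifies as an EP3 if it meets at least one criterion."""
    return bool(ep3_reasons(inc))
```

Returning the matched criteria, rather than only a yes/no answer, keeps the reason for escalation visible to whoever reviews the incident.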

An EP3 is integrated with the break-fix change management process. This allows technical staff to quickly rectify any issues that have been identified. Some integration points include:

  • A Break-Fix change will be available for EP3 Incidents. All Break-Fix changes MUST have manager approval before proceeding. The Break-Fix process is not meant for the following, each of which will be treated as a Normal / Fast Track change:
    • Project changes
    • Unplanned changes and/or changes made to meet deadlines
    • Changes causing an increase in impact
  • Break-Fix changes will be discussed with IT teams and Incident Management to determine the best method of implementation during operational hours. Outside operational hours, IT management will be the only reviewer.
  • The Change Management team will audit all Break-Fix changes to ensure the change process is being followed correctly.
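The Break-Fix rules above can be summarized in a short routing sketch. It assumes the change has been raised against an EP3 incident. The ChangeRequest fields and function names are hypothetical, chosen only to mirror the bullets above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChangeRequest:
    # Hypothetical fields; the change is assumed to relate to an EP3 incident.
    manager_approved: bool
    is_project_change: bool = False
    is_unplanned_or_deadline_driven: bool = False
    increases_impact: bool = False


def change_track(cr: ChangeRequest) -> str:
    """Route a change: excluded categories go to Normal / Fast Track."""
    excluded = (cr.is_project_change
                or cr.is_unplanned_or_deadline_driven
                or cr.increases_impact)
    return "Normal / Fast Track" if excluded else "Break-Fix"


def may_proceed_as_break_fix(cr: ChangeRequest) -> bool:
    """A Break-Fix change must have manager approval before proceeding."""
    return change_track(cr) == "Break-Fix" and cr.manager_approved


def reviewers(during_operational_hours: bool) -> List[str]:
    """Return who reviews the implementation method for a Break-Fix change."""
    if during_operational_hours:
        return ["IT teams", "Incident Management"]
    return ["IT management"]
```

The audit by the Change Management team is not modeled here, since it happens after the change rather than as part of routing.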