This is the third of five articles providing insights into the importance of building and maintaining digital systems that are resilient to adverse events. The previous article in our series dealt with designing resilience into your environment. Once resilience features have been incorporated into the systems, it is up to operations teams to monitor those systems, identify anomalies, and make the appropriate decisions to avoid or mitigate the impact of an event.
Preparation, protection, and practice
We will discuss how preparation, protection and practice apply to monitoring and incident response, allowing your business to stay operationally resilient. As a reminder, the “Three Ps” are:
Assessment and identification of gaps begin with a solid strategic response plan that covers both the monitoring of internal systems and the gathering of external information on emerging threats. This allows you to adjust to those threats and to spot operational anomalies with enough time to take preventive action. The plan for monitoring internal systems should include both technical and business indicators, so that you can identify technical failures, such as storage constraints, as well as logical inconsistencies, such as control discrepancies. The monitoring systems must be able to eliminate noise and support correlation analysis to help pinpoint the actual triggering cause.
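To make "eliminating noise" and "correlation analysis" more concrete, here is a minimal sketch in Python. It assumes alerts arrive as simple records with a timestamp, a source and a message. The `Alert` type, the time windows and the sample component names are all hypothetical and chosen only for illustration; real monitoring platforms use richer models. The sketch also assumes that the earliest alert in a cluster is a reasonable first candidate for the triggering cause, which will not always hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since an arbitrary start (hypothetical field)
    source: str       # component that raised the alert (hypothetical names)
    kind: str         # "technical" or "business" indicator
    message: str


def suppress_duplicates(alerts, window=60.0):
    """Noise reduction: drop an alert if the same (source, message)
    was already seen less than or exactly `window` seconds earlier."""
    last_seen = {}
    kept = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.source, alert.message)
        previous = last_seen.get(key)
        if previous is None or alert.timestamp - previous > window:
            kept.append(alert)
        # Always record the latest sighting, so a continuously
        # repeating alert stays suppressed.
        last_seen[key] = alert.timestamp
    return kept


def correlate(alerts, window=120.0):
    """Correlation: group alerts that occur within `window` seconds
    of the previous alert in the same group."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if groups and alert.timestamp - groups[-1][-1].timestamp <= window:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups


def probable_trigger(group):
    """Assumption: the earliest alert in a group is the first
    candidate to investigate as the triggering cause."""
    return min(group, key=lambda a: a.timestamp)


alerts = [
    Alert(0, "storage-01", "technical", "disk usage above 95%"),
    Alert(10, "storage-01", "technical", "disk usage above 95%"),  # repeat
    Alert(30, "ledger", "business", "control total mismatch"),
    Alert(1000, "web-02", "technical", "high latency"),
]

kept = suppress_duplicates(alerts)
groups = correlate(kept)
for group in groups:
    first = probable_trigger(group)
    print(f"{len(group)} alert(s), investigate first: {first.source}")
```

In this example the business indicator (a control total mismatch) lands in the same group as the storage alert that preceded it, which is the kind of link between technical and business signals that correlation is meant to surface.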
Configure to initiate automated actions
Where possible, monitoring systems should be configured to initiate automated actions that provide continued service when events do take place. This is especially true in cases when:
Where monitoring systems are supported by automated corrective actions, a notification should be produced for each action so that knowledgeable parties can review it and escalate if needed.
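The pairing of an automated corrective action with a notification for later review can be sketched as follows. This is an illustrative outline only: the `Rule` and `Responder` names, the metric key `disk_used_fraction`, the 0.9 threshold and the "rotate logs" action are all hypothetical, and the action here only returns a description rather than touching a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # decides whether the rule fires
    action: Callable[[dict], str]      # corrective step; returns an outcome note


@dataclass
class Responder:
    rules: List[Rule]
    notifications: List[dict] = field(default_factory=list)

    def handle(self, metrics: dict) -> None:
        """Run every matching rule and record a notification for each,
        so that people can review what the automation did."""
        for rule in self.rules:
            if rule.condition(metrics):
                outcome = rule.action(metrics)
                self.notifications.append({
                    "rule": rule.name,
                    "metrics": dict(metrics),  # snapshot for later analysis
                    "outcome": outcome,
                    "needs_review": True,
                })


# Hypothetical rule: free up space when a disk is more than 90% full.
free_space = Rule(
    name="free-disk-space",
    condition=lambda m: m.get("disk_used_fraction", 0.0) > 0.9,
    action=lambda m: "rotated and compressed old logs (simulated)",
)

responder = Responder(rules=[free_space])
responder.handle({"disk_used_fraction": 0.95})  # rule fires
responder.handle({"disk_used_fraction": 0.50})  # rule does not fire

for note in responder.notifications:
    print(note["rule"], "->", note["outcome"])
```

Keeping a snapshot of the metrics alongside each action means the review has the evidence it needs, even if the automated step has already restored service.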
Response plan notification and escalation
The notification and escalation section of the response plan should clearly specify the steps that ensure the appropriate management levels are quickly made aware of the incident. This helps ensure that decisions on next steps are made, and consistent communications are issued, in a timely fashion.
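One way to make escalation steps unambiguous is to write them down as a time-based policy. The sketch below assumes escalation is driven by how long an incident has gone unacknowledged. The roles and the minute thresholds are hypothetical placeholders, not recommended values; each organisation would set its own.

```python
# Hypothetical policy: (minutes without acknowledgement, role to notify).
ESCALATION_POLICY = [
    (0, "on-call engineer"),
    (15, "service owner"),
    (30, "head of operations"),
    (60, "executive crisis team"),
]


def who_to_notify(minutes_unacknowledged, policy=ESCALATION_POLICY):
    """Return every role that should have been notified by now,
    in escalation order."""
    return [
        role
        for threshold, role in policy
        if minutes_unacknowledged >= threshold
    ]


print(who_to_notify(0))
print(who_to_notify(20))
print(who_to_notify(90))
```

Expressing the policy as data rather than as prose makes it easier to review, rehearse and change without ambiguity about who should be told and when.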
While managing the incident, it’s key to balance swift restoration of service with capturing sufficient information to support an accurate root cause analysis. This helps create effective corrections that prevent similar incidents from recurring. At some point in the management of the incident, a recovery may be the appropriate decision. We will address that topic in the next article.
Commission independent reviews
Another element of “watching” is to commission independent reviews of systems and monitoring. This includes having experts test your cyber protection and monitoring provisions, as well as having independent resilience reviews of your IT service delivery setup. It’s a good idea to avoid working in a vacuum, as an objective outside point of view can illuminate blind spots. These reviews can identify important improvement opportunities that help avert future incidents.
“Include expert testing of your cyber protection and monitoring provisions – as well as objective, independent resilience reviews of your IT service delivery setup.”
Practice your incident response
Regularly rehearsing your incident response is important. Without frequent rehearsal, decision-making in the heat of the moment can be difficult. Rehearsal scenarios should represent realistic situations and include unannounced changes in conditions.
The final step in effective response is a continuous improvement program that helps you improve on your incident response capabilities. Working with a professional consulting firm to develop and periodically test your incident response plan can also be a wise investment.
Consulting firms with deep subject matter expertise bring a wealth of knowledge about the newest and best security and resiliency practices in IT, industry-specific trends, and specialised skillsets that help accelerate progress.
At Kyndryl Security and Resiliency, we specialise in operational resilience. We have the skills and experience to conduct independent reviews and probes of cyber-protection and IT operational resilience environments.
Working with us helps customers reduce the time it takes to develop a plan, and – through periodic rehearsals using our experienced consultants and partners – to execute successfully during a live event. A proactive approach to monitoring and incident response can help you develop effective capabilities and plans to avoid cyber-geddon and costly damage from unplanned outages.
By Bob Pitcole - Executive Consultant, Kyndryl Security & Resiliency Services
© 2024, Lyonsdown Limited. teiss® is a registered trademark of Lyonsdown Ltd. VAT registration number: 830519543