Problem management is the practice of identifying and managing the causes of disruptions in an IT service. A known bug, for example, is a problem: a cause of incidents that has been identified but not yet fixed.
Problem Management in ITIL 4: Practice and Benefits
ITIL defines a problem as a cause, or potential cause, of one or more incidents. A known error is a problem that has been analyzed but not resolved.
Every service has errors, flaws, or vulnerabilities that may cause incidents, which can originate from any of the four dimensions of service management. Some errors remain unidentified or unresolved during service design, development and deployment, and may be a risk to live services. So, how do we manage these problems and known errors before they cause more serious issues?
What is Problem Management?
Problem management is the practice of identifying and managing the causes of incidents in an IT service. Its purpose is to reduce the likelihood and impact of incidents by identifying actual and potential causes of incidents, and by managing workarounds and known errors.
It is a core component of ITSM frameworks.
Problem management is not just about finding and fixing incidents. It is also about:
- identifying and understanding the underlying causes of an incident
- identifying the best method to eliminate that root cause.
Moreover, pinpointing the cause has little value to an organization if it is an isolated exercise carried out by a siloed team. Problem management should therefore be continuous and practiced across multiple teams, including IT, security, and software development.
An incident may be over once the service is up and running again, but until the underlying causes and contributing factors are addressed, the problem remains.
Problem Management vs Incident Management
The behaviors behind effective incident management and effective problem management are often similar and overlapping, but there are still key differences.
- Incidents have an impact on users or business practices, and must be resolved so that normal business activity can take place. Incident management is about restoring services as quickly as possible, often by applying temporary solutions.
- Problems are the causes of incidents, so they require investigation and analysis to identify those causes, develop workarounds, and recommend longer-term resolutions. Problem management is tasked with analyzing root causes and preventing incidents from recurring, which reduces the number and impact of future incidents.
Problem management and incident management practices are becoming increasingly intertwined. During the times between incidents, IT teams can focus their efforts on problem investigations that lead to improvements and better service quality. This is how problem management becomes the most valuable to the organization.
The Problem Management Practice
The main steps that contribute to a problem management practice are the following:
- Problem Identification
- Problem Control
- Error Control
1. Problem Identification
A proactive approach to problem management identifies problems by:
- Analyzing incident trends, leveraging network monitoring systems, and utilizing other diagnostic software.
- Detecting risks from incidents that might recur.
- Evaluating information received from partners and suppliers.
- Evaluating information from internal software developers, engineers, and test teams.
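As an illustration of the first point, incident trend analysis can be as simple as counting how often the same kind of incident hits the same service. The following is a minimal Python sketch, not a reference to any particular ITSM tool: the incident records, the field names (`service`, `category`) and the threshold of two occurrences are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical incident records; the field names are assumptions for illustration.
incidents = [
    {"id": "INC-1", "service": "email", "category": "login failure"},
    {"id": "INC-2", "service": "email", "category": "login failure"},
    {"id": "INC-3", "service": "vpn", "category": "timeout"},
    {"id": "INC-4", "service": "email", "category": "login failure"},
    {"id": "INC-5", "service": "vpn", "category": "timeout"},
    {"id": "INC-6", "service": "crm", "category": "slow response"},
]


def problem_candidates(records, threshold=2):
    """Return ((service, category), count) pairs seen at least `threshold` times."""
    counts = Counter((r["service"], r["category"]) for r in records)
    # most_common() orders the pairs from most to least frequent.
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


candidates = problem_candidates(incidents)
for (service, category), n in candidates:
    print(f"{service} / {category}: {n} incidents")
```

Recurring combinations are only candidates: whether one of them is worth logging as a problem still depends on its impact and on the judgment of the teams involved.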
2. Problem Control
Problem management is a collaborative effort, so for results to be effective, multiple departments and stakeholders should be involved in the problem control phase.
Problem control includes activities like:
- Prioritization
- Investigation
- Analysis
- Documenting known errors
- Workarounds
Numerous techniques can help with prioritizing and analyzing problems. A good rule of thumb is to tackle first the problems whose resolution would most reduce service disruption in the organization.
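The rule of thumb above can be sketched as a simple ranking. This Python example is only one possible approach, under stated assumptions: the `Problem` class, its fields, the sample backlog and the scoring formula (incidents per month multiplied by average downtime) are hypothetical and are not prescribed by ITIL.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    name: str
    incidents_per_month: int   # how often the problem causes incidents
    avg_downtime_minutes: int  # typical disruption per incident

    @property
    def monthly_disruption(self) -> int:
        """Estimated minutes of disruption per month (assumed scoring formula)."""
        return self.incidents_per_month * self.avg_downtime_minutes


def prioritize(problems):
    """Order problems so that the most disruptive one comes first."""
    return sorted(problems, key=lambda p: p.monthly_disruption, reverse=True)


# Hypothetical backlog, for illustration only.
backlog = [
    Problem("Mail server memory leak", 4, 30),
    Problem("Flaky VPN gateway", 10, 5),
    Problem("Nightly batch job overrun", 1, 240),
]

ranked = prioritize(backlog)
for p in ranked:
    print(f"{p.name}: {p.monthly_disruption} minutes/month")
```

In practice the score would likely also weigh business impact, risk and the cost of resolution, but even a rough estimate like this makes the ordering of the backlog explicit and open to discussion.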
Problem analysis should take a holistic approach and consider all contributory causes: those that caused the incident, those that made it worse, and those that prolonged it.
When a problem cannot be resolved quickly, it is often useful to find and document a workaround for future incidents, based on an understanding of the problem. A workaround is defined as a solution that reduces or eliminates the impact or probability of an incident or problem for which a full resolution is not yet available. An effective incident workaround can become a permanent way of dealing with some problems, for example when a full resolution is not viable or cost-effective.
3. Error Control
Once a problem has been analyzed, it is documented as a known error.
Error control regularly reassesses the status of known errors that have not been resolved, taking into account the overall impact on customers and service availability, the cost of permanent resolutions, and the effectiveness of workarounds. The effectiveness of a workaround should be evaluated each time it is used, as the workaround may be improved based on that assessment.
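One way to make this evaluation routine is to record the outcome every time a workaround is applied. The Python sketch below is an assumption-laden illustration rather than a feature of any specific tool: the `KnownError` class, its methods, the sample data and the 80% review threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KnownError:
    title: str
    workaround: str
    # One entry per use of the workaround: True if it restored service.
    outcomes: list = field(default_factory=list)

    def record_use(self, succeeded: bool) -> None:
        """Log whether the workaround restored service on this occasion."""
        self.outcomes.append(succeeded)

    def success_rate(self):
        """Share of uses in which the workaround succeeded, or None if never used."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self, minimum_rate: float = 0.8) -> bool:
        """Flag the workaround when its success rate drops below the assumed threshold."""
        rate = self.success_rate()
        return rate is not None and rate < minimum_rate


# Hypothetical known error, for illustration only.
ke = KnownError("Mail server memory leak", "Restart the mail service")
for result in (True, True, False, True):
    ke.record_use(result)

print(ke.success_rate(), ke.needs_review())
```

A flag like `needs_review` does not replace the periodic reassessment described above; it only helps surface the known errors whose workarounds deserve attention first.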
Why should I implement Problem Management?
The benefits of taking a formal approach to Problem Management include the following:
- Preventing service disruptions
- Maintaining service levels
- Meeting service availability requirements
- Improving the productivity of support staff by ensuring problems are dealt with according to their priority and impact on the business
- Improving user satisfaction
- Resolving problems effectively and in a timely manner
- Identifying and resolving problems and known errors proactively, so that the number of incidents is reduced
- Providing management information about problems and their resolution
- Learning from past experience, which provides historical data for identifying trends, preventing incidents, and reducing their impact so that users can be more productive
Organizations that are new to problem management should focus their efforts on implementing a reactive problem management practice.
As an organization’s service delivery matures, it should transition to a proactive problem management practice. This transition should be carried out by a team with a good analytical skill set that’s highly proficient in IT infrastructure and the tools and technology that support the organization.
Whether you are introducing a formal problem management policy in your IT department or across the whole organization, training and consulting are essential.
ITIL Training in Problem Management
IT professionals involved in problem management can continue their ITIL journey after the ITIL Foundation training with the Monitor, Support & Fulfil course.
This is a combined course consisting of practical modules that allow for shorter and more flexible training. The course covers five practices:
- Problem Management
- Incident Management
- Service Desk
- Service Request Management
- Monitoring and Event Management
Want to know more about the course? Visit our ITIL Monitor, Support & Fulfil page or contact us!