You are off to a great start: Your organization has a network security monitoring solution deployed and configured.
Traffic to and from your critical assets is on your radar, and events and alerts are streaming in as expected. You have all of these events, alerts, and the resulting data at your fingertips.
What do you do now?
Gaining visibility and insight into your environment is a crucial first step. The traditional approach to network security and performance monitoring provides a plethora of events and data. But this can be overwhelming, even for large and mature teams.
This approach also does not take into account your organization’s most knowledgeable experts: your OT team of operators, OT engineers, and OT architects. Nor does this kind of visibility account for operational analytics and metrics such as safety, resiliency, and efficiency.
Many OT network security monitoring solutions offer limited customization, while others are complicated, IT-focused, or both. As a result, you could be notified of nearly every anomaly regardless of its impact on operations, even when the event is a normal part of operations at a specific site.
This quickly leads to noisy dashboards and alert fatigue, making it difficult to see what really needs to be addressed in your environment.
What’s the key to better monitoring and alerting? Utilizing your most knowledgeable experts and codifying their knowledge of your environment into your solution.
OT Monitoring and Visibility Isn’t One-Size-Fits-All
Every OT environment is unique. Each is purpose-built to meet specific needs and requirements, and very few are alike. Because of this, there is no “one configuration to rule them all.”
For a visibility and security monitoring solution to be effective, it needs to be customized to meet the needs of each industrial environment. Relying on threat intelligence for detection, while potentially beneficial strategically, doesn’t always add the value that IT and OT teams need in day-to-day operations.
ICS environments are so specific that the indicators of compromise (IOCs) monitoring software relies on to detect threats might not be relevant, or might never appear in quite the same way. This is why operator intelligence is key.
Operators know their systems and environments better than anyone. They know how the process works, which devices are critical to operations, which sites raise specific concerns, how the safety systems fit in, when preventative maintenance is scheduled, and the list goes on.
These nuances of an industrial environment are a critical part of configuring monitoring and detection software so that the resulting alerts are meaningful and actionable.
Bringing Operator Knowledge and IT Solutions Together
Effective monitoring and alerts begin with a strong foundation of communication across IT and OT teams. A shared vocabulary for assets, environments, and systems reduces the chance of miscommunication and helps get everyone on the same page.
The best place to start is a digital asset inventory. This could be as simple as a spreadsheet with each asset’s operational name and/or hostname, location (geographic region, site name, building, rack/cabinet), IP address, and MAC address.
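As a rough illustration, a spreadsheet like this could be loaded and sanity-checked with a few lines of Python. The column names (`name`, `site`, `building`, `rack`, `ip`, `mac`) and the sample rows are assumptions made for this sketch, not a required schema.

```python
import csv
import io
import ipaddress
import re
from dataclasses import dataclass

# Hypothetical export of the inventory spreadsheet; column names are assumed.
SAMPLE_CSV = """name,site,building,rack,ip,mac
Pump Station 3 PLC,North Plant,B2,R14,10.20.3.15,00:1D:9C:AA:10:01
Chlorine Feed HMI,North Plant,B2,R15,10.20.3.22,00:1D:9C:AA:10:02
"""

MAC_PATTERN = re.compile(r"^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$")


@dataclass(frozen=True)
class Asset:
    name: str      # operational name the operators use
    site: str
    building: str
    rack: str
    ip: str
    mac: str


def load_inventory(csv_text: str) -> dict:
    """Parse inventory rows and index them by IP address."""
    inventory = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ipaddress.ip_address(row["ip"])  # raises ValueError on a bad address
        if not MAC_PATTERN.match(row["mac"]):
            raise ValueError(f"bad MAC address for {row['name']}: {row['mac']}")
        inventory[row["ip"]] = Asset(**row)
    return inventory


inventory = load_inventory(SAMPLE_CSV)
print(inventory["10.20.3.15"].name)  # Pump Station 3 PLC
```

Indexing by IP address is only one option. It is chosen here because network events usually arrive with an address rather than an operational name, so the lookup lets an alert be relabeled in the operators’ own terms.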
From there, it’s a matter of getting operator knowledge down on paper. IT teams and analysts can work with operators and OT architects to identify expected or normal behavior that would otherwise generate events or alerts and add to alert fatigue.
Operators can explain any existing policy around known issues that can’t or won’t be addressed until the next maintenance window or hardware refresh.
For example, an industrial device could implement a protocol in a way that deviates from its specification, resulting in what appears to be a “malformed” packet that would typically generate an event. But this might be expected behavior rather than a cause for an alert that requires attention.
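One way to capture that kind of operator knowledge is a small table of documented known issues that the monitoring workflow consults before raising an alert. The sketch below is illustrative only: the `KnownIssue` structure, the device names, and the `malformed_packet` event type are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownIssue:
    device: str        # operational name from the asset inventory
    event_type: str    # event the device is known to trigger
    reason: str        # operator's explanation, kept for auditability


# Hypothetical entry captured from an operator interview.
KNOWN_ISSUES = [
    KnownIssue(
        device="Pump Station 3 PLC",
        event_type="malformed_packet",
        reason="Firmware deviates from the protocol spec; fix planned for next refresh",
    ),
]


def triage(device: str, event_type: str) -> str:
    """Return 'log_only' for documented expected behavior, else 'alert'."""
    for issue in KNOWN_ISSUES:
        if issue.device == device and issue.event_type == event_type:
            return "log_only"
    return "alert"


print(triage("Pump Station 3 PLC", "malformed_packet"))  # log_only
print(triage("Chlorine Feed HMI", "malformed_packet"))   # alert
```

The event is still recorded in both cases. Only the decision to demand someone’s attention changes, and the `reason` field keeps a record of why.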
Questions you should ask when gathering operator-informed rules for configuring your monitoring and detection platform:
- How do operators identify assets and their locations?
- How does the Operational Technology team define criticality?
- When does an event need to be addressed?
- What does normal behavior look like during regular operations?
- What does normal look like during a maintenance period?
- When are regular maintenance windows? How often do they occur, and how long do they last?
- How do operators respond to alerts from specific devices?
- Are there any long-standing known issues that are being, or will be, addressed by policy or planned maintenance?
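Answers to questions like these can be written down as data rather than left in someone’s head. As a minimal sketch, the maintenance-window answers might be recorded and checked as follows. The `MaintenanceWindow` structure, the site name, and the schedule are assumptions for illustration, and the sketch assumes a window does not cross midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class MaintenanceWindow:
    site: str
    weekday: int   # 0 = Monday ... 6 = Sunday, as in datetime.weekday()
    start: time
    end: time      # assumed to fall on the same day as start


# Hypothetical answer to "When are regular maintenance windows?"
WINDOWS = [MaintenanceWindow("North Plant", 6, time(2, 0), time(6, 0))]


def in_maintenance(site: str, when: datetime) -> bool:
    """True if `when` falls inside a recorded window for `site`."""
    return any(
        w.site == site
        and w.weekday == when.weekday()
        and w.start <= when.time() < w.end
        for w in WINDOWS
    )


# 2024-01-07 was a Sunday; 2024-01-08 was a Monday.
print(in_maintenance("North Plant", datetime(2024, 1, 7, 3, 30)))  # True
print(in_maintenance("North Plant", datetime(2024, 1, 8, 3, 30)))  # False
```

An event that arrives while `in_maintenance` is true could then be judged against the maintenance-period definition of normal instead of the regular one.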
Monitoring & Detection Technology + Operator Knowledge = Actionable insights that will help to secure and improve the overall safety and efficiency of an operation.
Alarm Management Style Monitoring for OT Environments
Monitoring and detection can and should be tailored to the specific needs of each OT environment to reduce alert fatigue.
Normalized, curated, and classified events on a dashboard allow smaller teams to prioritize high-impact, actionable alerts rather than trying to dig through and decipher what is or isn’t relevant to their operations.
Every alert that comes up in an environment will be collected and logged, but it won’t be flagged for attention unless it’s actively impacting operations or deemed critical to other systems.
Operator knowledge can also be used to correlate related events and alerts to escalate issues where necessary. Three discrete alerts may not be cause for action individually, but the presence of all three may warrant attention to address the cause or fix the underlying issue.
In operational terms, one alarm might indicate that pressure exceeded its limit, and another that the pressure dipped. Eventually, the pressure in the system normalizes. In this case, the events are logged but not immediately flagged for attention since the system is still operating within acceptable parameters.
If another alarm then indicates that a valve isn’t working properly, which may have caused these fluctuations, the issue is escalated for attention.
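The pressure and valve scenario can be sketched as a simple correlation rule. The alarm type names, the 30-minute window, and the asset labels below are assumptions chosen for illustration. A real rule would come from the operators who know the process.

```python
from datetime import datetime, timedelta

# Assumed rule: pressure swings alone are only logged; a valve fault on the
# same asset, together with both swings inside the window, escalates.
ESCALATION_SET = {"pressure_high", "pressure_low", "valve_fault"}
WINDOW = timedelta(minutes=30)


def assets_to_escalate(events, now):
    """events: iterable of (timestamp, asset, alarm_type) tuples.

    Returns the set of assets that raised all three alarm types
    within WINDOW before `now`.
    """
    seen = {}
    for timestamp, asset, alarm_type in events:
        recent = timedelta(0) <= now - timestamp <= WINDOW
        if recent and alarm_type in ESCALATION_SET:
            seen.setdefault(asset, set()).add(alarm_type)
    return {asset for asset, types in seen.items() if types == ESCALATION_SET}


now = datetime(2024, 1, 8, 12, 0)
events = [
    (now - timedelta(minutes=20), "Line A", "pressure_high"),
    (now - timedelta(minutes=12), "Line A", "pressure_low"),
    (now - timedelta(minutes=5), "Line A", "valve_fault"),
    (now - timedelta(minutes=10), "Line B", "pressure_high"),
    (now - timedelta(minutes=8), "Line B", "pressure_low"),
]
print(assets_to_escalate(events, now))  # {'Line A'}
```

Line B shows the same pressure swings but no valve fault, so its events stay in the log without being flagged.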
In cybersecurity, this could look like a device whose latency has slowly been increasing relative to the average latency of the network. Since it doesn’t affect operations, it doesn’t need to be addressed immediately.
But if that device is suddenly generating unexpected or even new traffic, this could be indicative of a misconfiguration or something malicious.
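The same idea can be sketched for the cybersecurity case: latency drift alone puts a device on a watch list, while traffic outside its known baseline escalates it. The function name, the 1.5x drift ratio, and the flow tuples of (destination IP, destination port, protocol) are hypothetical choices for this example.

```python
from statistics import mean


def assess_device(latency_ms, network_latency_ms, observed_flows,
                  baseline_flows, drift_ratio=1.5):
    """Classify a device as 'ok', 'watch', or 'escalate'.

    New flows outweigh latency drift: any flow that is not in the
    baseline escalates, whatever the latency looks like.
    """
    new_flows = set(observed_flows) - set(baseline_flows)
    if new_flows:
        return "escalate", new_flows
    if mean(latency_ms) > drift_ratio * mean(network_latency_ms):
        return "watch", set()
    return "ok", set()


baseline = {("10.20.3.22", 502, "tcp")}          # assumed known-good flow
device_latency = [30, 34, 38]                    # mean 34 ms
network_latency = [19, 20, 21]                   # mean 20 ms

status, _ = assess_device(device_latency, network_latency, baseline, baseline)
print(status)  # watch

observed = baseline | {("203.0.113.9", 443, "tcp")}
status, new = assess_device(device_latency, network_latency, observed, baseline)
print(status)  # escalate
```

The baseline itself is where operator knowledge comes in, since operators can say which conversations a device is expected to have.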
Actionable Alarms Start with Visibility
Creating an effective alarm management-style monitoring workflow starts with visibility across your environment. You can’t address alerts for devices and sites you can’t see.
If you’re looking for a monitoring solution to capture data throughout your environment all the way to remote edge sites, SynSaber offers an ultra-small, flexible software-based sensor that can deploy rapidly without the need for another piece of equipment. Our software is vendor-agnostic, so you can get event information from everywhere within your network and send that data wherever you need it, such as an existing SIEM, SOAR, or data lake.