Ignition Community Live with OTORIO
ORM - Looking Beyond Information-centric Cybersecurity
28 min video / 20 minute read
Operational Resilience Management (ORM) is a holistic approach to industrial cybersecurity, spanning everything from identifying potential risks, through evaluating their possible impact, to implementing mitigation controls. By incorporating both digital and physical risks, ORM better ensures operational resilience and business continuity.
Yair: Hi, everyone. Thank you for joining, and thank you for the opportunity. Today we're gonna talk about ORM, Operational Resilience Management, a new term that you're probably gonna hear more and more, and about what it really means to manage digital and cyber risk in industrial environments. My name is Yair Attar, I'm the CTO and Co-Founder of OTORIO, and with me I have Greg Ryan from Andritz. Greg, maybe you could introduce yourself?
Greg: Sure, yeah. Good morning, everyone. I'm the Manager of Instrumentation and Controls, working out of our Vancouver, British Columbia, Canada office. And yeah, we're really pleased to have Yair Attar presenting for us today. Andritz and OTORIO partnered a couple of years ago now, and we're really pleased to have added their offerings to our platform of digital products, which Andritz calls Metris. The Metris technologies are aimed at digitalizing and networking machines and plants, as well as developing new customer-specific solutions. Metris products are state-of-the-art digital products, and they can be customized to our customers' specific needs.
Greg: And they make a substantial contribution toward helping our customers achieve the best possible productivity and efficiency in their facilities. Metris is essentially Andritz's proprietary app store for process control and process performance optimization. And while the Metris name has only been around for a handful of years, some of the tools available on it have been in development, and in use, for decades, such as the BrainWave Model Predictive Controller, which far outperforms PID loops on processes with long lag times and multiple interacting variables. So we're very fortunate to now have OTORIO's cybersecurity offerings as part of this digital platform. So without further ado, I'll hand it back to you, Yair.
Yair: Thank you, Greg. So, briefly, before we start: we at OTORIO are quite new to the market, almost three years now. As Greg mentioned, we partnered with Andritz to establish OTORIO, an industrial cybersecurity solutions provider focusing on operational environment use cases and on how to manage the digital and cyber risks we face today. Today we're gonna talk about the operational teams and how their role is actually changing throughout what we call Industry 4.0, or the digitalization process. How do we maintain production resiliency in this changing environment? What does it really mean to manage risk, how do we avoid that risk, and how do we maintain cybersecurity procedures? What are the industrial, cyber, and digital ecosystems all about? And what risk factors do we need to take into consideration when we're talking about digital risk?
Yair: And it's not just about managing risk from an IT perspective, or a technical perspective. We need to manage risk from a business and production perspective. Whether it's a refinery process, a valve, an oven, or a boiler, we need to understand the possible impact and consequences of digital factors. And the last part is how we integrate with industrial systems like Ignition and Metris, and how we make it all happen. As for the operational teams, their role hasn't changed that much. Even throughout Industry 4.0 and digitalization, the main objectives are still to produce safely and reliably and to make sure production keeps running. But the ecosystem is changing. We have many more of what we call cyber-physical systems, which are digital systems connected to the physical world, whether it's a controller that controls a valve or something else with a physical effect.
Yair: Now everything is more connected. We have many more automation systems automating a lot of processes, and of course, if you wanna be competitive, you need to introduce automation and connectivity to your production floor. The supply chain is also spread across many more vendors. There isn't just one solution provider who provides everything; we have different vendors for different areas. And beyond that, whether for warranty, maintenance, or support, those third-party vendors need connections to our production floor, which are not always managed. This creates new challenges, such as an increased attack surface as we open up our production floor. What used to be a closed island is starting to have more bridges to the outside world, and this introduces new potential failure points.
Yair: So now we have more connectivity and more systems connected to one another. This creates more risk from a visibility perspective. Not everyone knows what they really have on their production floor: Which systems, from which decades? What's their status? What's their security status? And it's not just that. These operational environments were developed over the years without the involvement or joint effort of the IT teams. Now those two worlds are coming together, and we also have physical limitations. Because of COVID-19, for example, it's harder today to put people on production floors and to reach some sites. There are also limitations from an IT perspective; for example, many IT solutions are not a good fit for operational environments.
Yair: Most of them use active scanners, which don't fit these environments, or they provide patching suggestions that aren't feasible. So all of this is becoming much more challenging. And of course, add to that the skills gap and lack of resources, which is always an issue. Gartner even predicts that by 2024, 75% of CEOs will be personally liable for cyber-physical security incidents, meaning more and more top management teams now understand that this is their risk to manage. This is something major. These days we see, every other week or even more often, a ransomware attack somewhere that hits the production floor and stops or holds production. Those incidents damage production, and let's hope we won't get there, but it's just a matter of time until one causes safety issues.
Yair: With that understanding, it's important to clarify that the operational teams are the ones responsible for business continuity, and they are the ones who understand which actions to take and the consequences of those actions. They're the only ones who can take them. That means we need to equip the operational teams, the people closest to the problem, with solutions that help them resolve it. With that in mind comes the ORM approach. The ORM approach says that we need to manage risk, and we need to do it in an agile way, because digital risk changes rapidly and we need to adapt. Operational resilience management is the science and art of ensuring our business runs as it should despite these rapidly changing challenges, whether they come from new threat vectors or from someone opening up the production floor, by mistake or not, for maintenance or not. These things change rapidly, and we need to manage them.
Yair: Now, some of you are probably familiar with the maintenance workflow we see here in front of us; this is the usual workflow. We've just adapted it to the digital perspective, to what we have today on our production floor and how to avoid risk. The process is very straightforward. First, we need to collect data, whether it's performance, operations, threat intelligence, security, IT, whatever we can. Then we need to determine whether we have a risk and evaluate it: if we find a risk, what is its severity, its probability, and its consequences? Ultimately, risk management is a matter of prioritization. We can't really reduce risk to zero, so we need to rank the risks and provide the context of each one: where it could cause a malfunction, and whether it's related to a misconfiguration or a cyber attack. There are many ways to rank risks in order to prioritize what should be dealt with first. Once the risks are evaluated and I understand which one I should start working on, I look at the mitigation measures: what needs to be done to reduce or avoid the risk. And of course, once that's done, we need to reassess the new posture and understand whether the risk still exists or whether another risk has appeared.
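The evaluate, prioritize, and reassess steps described above can be sketched in a few lines. This is a minimal illustration, not RAM Squared's actual scoring: the `Risk` class, the 1 to 5 scales, and the multiplicative score (probability × severity × consequence) are all assumptions chosen for clarity.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # estimated likelihood, 0.0 to 1.0
    severity: int       # technical severity, 1 (low) to 5 (critical)
    consequence: int    # operational consequence, 1 (minor) to 5 (production stop)

    @property
    def score(self) -> float:
        # Hypothetical multiplicative score; real tools weight factors differently.
        return self.probability * self.severity * self.consequence


def prioritize(risks):
    """Rank risks so the highest-scoring one is worked on first."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


def reassess(risks, mitigated):
    """After mitigation, drop the risks that were closed and re-rank the rest."""
    return prioritize([r for r in risks if r.name not in mitigated])


risks = [
    Risk("Misconfigured firewall rule", 0.6, 4, 5),
    Risk("Unpatched HMI workstation", 0.3, 3, 3),
    Risk("Shared engineering account", 0.8, 2, 4),
]
for r in prioritize(risks):
    print(f"{r.score:5.1f}  {r.name}")
remaining = reassess(risks, mitigated={"Misconfigured firewall rule"})
print("Next up:", remaining[0].name)
```

In a real deployment the reassessment step would also pull in freshly collected data, since mitigation can expose new risks rather than only removing old ones.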
Yair: Now, this process takes time, and when we're talking about digital aspects, we don't have a lot of time. This is why it needs to be automated; the teams managing it need a solution that supports them. This is the approach that led us to build a solution we call RAM Squared, a risk assessment, monitoring, and management solution. I'll only touch on it briefly, but in a nutshell: for RAM Squared, as a risk management solution for industrial use cases, we understood that we need to take many risk factors into consideration. That's the collection part. We need to collect data from endpoints; from operational systems like OPC, DCS, and project files; from Ignition, which we'll talk about in a second; and from other security and IT systems. Once we have that, we start building an understanding of the potential gaps and exposures within our production floor.
Yair: And to give a better overview of how this looks from a Purdue Model, or site, perspective: we're able to collect relevant information, not only on risk but also on business context, from Ignition systems, which usually sit at Level 2 of the architecture. Once we have that, as we've discussed, we need to aggregate risk by business production process, because ultimately we wanna understand whether a risk is in, say, painting or welding in an automotive assembly shop, or in pulp and paper, or in a refinery. We need to know exactly where, because the impact analysis matters and we need to prioritize accordingly. In the end, I'll see risk aggregated across those processes.
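Aggregating asset-level risk up to the production process can be illustrated with a simple group-by. This is a sketch under stated assumptions: the asset inventory, the process names, and the choice of `max` (worst asset) or `sum` (cumulative exposure) as the aggregation are all hypothetical, not how RAM Squared necessarily rolls risk up.

```python
from collections import defaultdict

# Hypothetical asset inventory: asset -> (production process, current risk score)
assets = {
    "PLC-101": ("Painting", 7.5),
    "HMI-102": ("Painting", 3.0),
    "PLC-201": ("Welding", 9.0),
    "Historian-01": ("Assembly", 2.0),
}


def risk_by_process(assets, aggregate=max):
    """Roll asset risk scores up to the production process each asset serves."""
    grouped = defaultdict(list)
    for process, score in assets.values():
        grouped[process].append(score)
    return {process: aggregate(scores) for process, scores in grouped.items()}


print(risk_by_process(assets))       # worst single asset per process
print(risk_by_process(assets, sum))  # cumulative exposure per process
```

Using the worst asset highlights a single critical weak point, while summing favors processes with many moderate issues; which view fits depends on how the site defines impact.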
Yair: When we're talking about integrating with industrial sources, there are two main aspects to take into consideration. The first is collection: we need to collect the digital and cyber risk posture of those systems. Whether it's a DCS, an HMI, a historian, or project files, each one is a source of information about what's happening on the production floor, from an operational perspective but also from a cyber perspective, and we need to understand how they might influence one another. The second part is integrating into the day-to-day operational procedures of the operations teams. Nobody has extra staff who can sit in front of yet more solutions; teams still need to work the way they do on a daily or weekly basis for risk avoidance. Usually that means asset performance management (APM) and CMMS systems, the maintenance systems that support those risk avoidance processes. Until now, they have mainly supported what we call physical risk avoidance, based on temperature, speed, and similar parameters, and now we're integrating digital and cyber risk as well. Let's see how it really works.
Yair: Take the integration with industrial systems like asset performance management solutions. Until now, they have mainly dealt with physical risk and physical equipment, whether a pump or a valve, and those types of parameters. Now we also take into consideration digital risk, which comes from RAM Squared as the engine that assesses and manages cyber risk, and performance as well. That's why we also collect data like CPU, RAM, and storage usage, which can indicate that something isn't working properly. What we see here in front of us is Metris RBM, a use case we've integrated with Andritz's Metris solution. Before, it was typically used for physical risk, such as whether a specific part needs to be changed, but now we also see risks that indicate a digital operational problem or a cyber exposure, whether that's CPU, memory, or storage, or gaps specifically in the network infrastructure. The second part is getting information from systems like Ignition. I'll now talk about a specific use case: how we integrate information from Ignition and understand its security posture. Again, from two main perspectives: one is proactive, identifying risks and exposures, and the other is detection, identifying whether something is suspicious.
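Turning CPU, memory, and storage readings into findings that can sit next to physical-risk alerts in an APM view could look roughly like this. The metric names and threshold values are hypothetical placeholders; the talk does not specify which thresholds RAM Squared or Metris use.

```python
# Hypothetical thresholds (percent utilization); tune per asset type.
THRESHOLDS = {"cpu_pct": 90.0, "memory_pct": 85.0, "disk_pct": 95.0}


def health_findings(asset: str, metrics: dict) -> list:
    """Return one finding for every metric at or above its threshold."""
    findings = []
    for metric, limit in THRESHOLDS.items():
        value = metrics.get(metric)
        if value is not None and value >= limit:
            findings.append(f"{asset}: {metric} at {value:.0f}% (limit {limit:.0f}%)")
    return findings


print(health_findings("HMI-01", {"cpu_pct": 97, "memory_pct": 40, "disk_pct": 96}))
```

Missing metrics are skipped rather than flagged, on the assumption that not every asset reports every counter.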
Yair: Some of you may be familiar with the Ignition Security Hardening Guide. It contains very good recommendations, and I really recommend that everyone read it and make sure they're working according to those guidelines. In general, we look at the best practices each supplier recommends, as well as our own best practices, for how these systems can be better secured. And it's not just about looking at specific events, because one of the key challenges in cybersecurity is managing all the noise. That's why the process needs to be automated, and why we correlate events from different systems. We build a bigger story of what's really happening, whether the events indicate suspicious activity or an exposure. Once we correlate them, we can identify a pattern, something that indicates a more serious issue than any single event would. Here we see changes in project files, failed login requests, and a specific tag change event. Once you correlate these, whether by timestamp, user, asset, or process, you first of all get much more information about a specific case, and you reduce the noise coming from different sources into one place.
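The correlation idea above (join events by timestamp and by asset or user, then escalate only when different kinds of evidence line up) can be sketched as a simple time-window grouping. The `Event` fields, the event-type labels, the 10-minute window, and the "at least two distinct event types" rule are illustrative assumptions, not OTORIO's correlation logic.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    time: datetime
    asset: str
    user: str
    kind: str  # hypothetical labels, e.g. "failed_login", "project_change", "tag_change"


def correlate(events, window=timedelta(minutes=10), min_kinds=2, key=lambda e: e.asset):
    """Group events sharing a key (asset by default) that occur within `window`
    of the previous event in the group; keep groups with at least `min_kinds`
    distinct event types."""
    by_key = defaultdict(list)
    for e in sorted(events, key=lambda e: e.time):
        by_key[key(e)].append(e)

    groups = []
    for keyed_events in by_key.values():
        group = []
        for e in keyed_events:
            # A gap longer than the window closes the current group.
            if group and e.time - group[-1].time > window:
                groups.append(group)
                group = []
            group.append(e)
        groups.append(group)

    # A single noisy event type is not escalated; mixed evidence is.
    return [g for g in groups if len({e.kind for e in g}) >= min_kinds]


t0 = datetime(2021, 6, 1, 8, 0)
events = [
    Event(t0, "gateway-a", "op1", "failed_login"),
    Event(t0 + timedelta(minutes=2), "gateway-a", "op1", "failed_login"),
    Event(t0 + timedelta(minutes=5), "gateway-a", "op1", "project_change"),
    Event(t0 + timedelta(minutes=7), "gateway-a", "op1", "tag_change"),
    Event(t0 + timedelta(hours=3), "gateway-b", "eng2", "tag_change"),
]
for incident in correlate(events):
    print(incident[0].asset, [e.kind for e in incident])
```

Passing `key=lambda e: e.user` correlates by user instead of by asset, which mirrors the different correlation perspectives mentioned in the talk.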
Yair: Beyond that, we need to monitor continuously for potential threats because, again, digital and cyber risk changes rapidly. We need to account for all of these changes through ongoing monitoring, and it's not just about collecting information. We talked about impact analysis: we need to prioritize what should be done first. That prioritization follows the impact analysis from our best practices, but also the business production context of the site where we're installed. Because there's a big difference between a risk on an asset like an oven, which could cost $250 million, and a risk on a small valve that costs, I don't know, $10,000. We need to take this into consideration too when we calculate risk and prioritize what should be done first. Now, looking specifically at Ignition, let's take two use cases and see how it works. What we see in front of us is RAM Squared, showing how we correlated different events and enrichments gathered from different systems. One of those systems is Ignition, and the information held within the Ignition system is why we built the integration.
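The oven-versus-valve example can be made concrete by weighting each finding by the business value at stake. The dollar figures come from the talk; the probabilities and the expected-loss formula (probability × asset value) are hypothetical and only one of many possible weighting schemes.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    asset: str
    probability: float  # hypothetical likelihood that the risk materializes
    asset_value: float  # business value at stake, in USD


def expected_loss(f: Finding) -> float:
    return f.probability * f.asset_value


findings = [
    Finding("small valve", 0.60, 10_000),
    Finding("oven", 0.05, 250_000_000),
]
for f in sorted(findings, key=expected_loss, reverse=True):
    print(f"{f.asset:12s} expected loss ${expected_loss(f):,.0f}")
```

Even with a much lower probability, the finding on the oven ranks first, which is the point of bringing business context into prioritization.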
Yair: The integration has provided a lot of value. Here we see unencrypted communication events and redirected communications, and we see other events, for example one coming from the EDR management system. At first you might see these as separate events, and you'd need to go through each one, ask what happened, and start investigating. But once we connect the dots automatically and present them to the customer as a single insight, they have one place to manage it, much faster and more simply.
Yair: The second use case is more of a detection use case. Here we see suspicious behavior, again from different systems, that could indicate something risky: failed login attempts, or communications that don't match a specific baseline, or that flow between zones that aren't supposed to communicate. These are the types of events we can collect through our sensors, or from firewall logs if the customer has them. I may have forgotten to mention that not every solution needs to be in place. At a customer's site we'll typically see one, two, or three different solutions in place, and we're able to connect the dots from whatever systems the customer has. That's why we always think we should amplify what's already in place, and then identify all the gaps that need to be taken into consideration.
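Checking traffic against a zone-to-zone baseline can be sketched with Python's standard `ipaddress` module. The zone names, subnets, allowed pairs, and the assumption that flows have already been parsed from sensor or firewall logs into `(source IP, destination IP, port)` tuples are all hypothetical.

```python
from ipaddress import ip_address, ip_network

# Hypothetical zone layout loosely following Purdue levels.
ZONES = {
    "L3-operations": ip_network("10.30.0.0/16"),
    "L2-supervisory": ip_network("10.20.0.0/16"),
    "L1-control": ip_network("10.10.0.0/16"),
}
# Baseline of zone-to-zone flows that are expected to happen.
ALLOWED = {("L3-operations", "L2-supervisory"), ("L2-supervisory", "L1-control")}


def zone_of(ip: str) -> str:
    addr = ip_address(ip)
    for name, net in ZONES.items():
        if addr in net:
            return name
    return "unknown"


def baseline_violations(flows):
    """Flag (src_ip, dst_ip, port) flows that cross zones outside the baseline."""
    flagged = []
    for src, dst, port in flows:
        pair = (zone_of(src), zone_of(dst))
        # Traffic that stays within one zone is not checked here.
        if pair[0] != pair[1] and pair not in ALLOWED:
            flagged.append((src, dst, port, pair))
    return flagged


flows = [
    ("10.30.1.5", "10.20.1.7", 443),    # operations -> supervisory: in baseline
    ("10.20.1.7", "10.20.1.8", 8088),   # same zone: not checked
    ("10.30.1.5", "10.10.1.9", 502),    # operations -> control: skips a level
    ("192.168.1.2", "10.10.1.9", 502),  # unknown source -> control
]
for hit in baseline_violations(flows):
    print(hit)
```

A production baseline would usually be learned from observed traffic and include ports and protocols, not just zone pairs; this sketch only shows the comparison step.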
Yair: Before we go to questions about the presentation, and maybe about the solution, since I did see some questions, I'll try to summarize and give a high-level overview of what we've seen today. First, if you haven't done it before, go and read the Ignition Security Hardening Guide. It's a really good guide, and we think you can benefit from it right away. Second, look beyond information-centric cybersecurity, meaning it's not just about looking at specific events from a technical perspective. You really need to prioritize from the operational process perspective and look at the whole picture; otherwise you'll just get lost. To maintain the resiliency of the production floor, this is how we need to start talking about cyber risk in production and in operational environments.
Yair: For cyber-physical systems, operational resilience management teams, personnel, whoever is involved, we need to work with solutions that encompass all aspects of risk management, and we need something to help simplify the process. First, from a prevention perspective: most solutions today focus on a reactive approach, meaning identifying suspicious activity, and in operational environments that's usually a bit too late. We talked about ransomware today; at that point it's just too late. We need to identify the gaps, exposures, and vulnerabilities before they become a breach. Once we've done that, we shouldn't neglect response capability: we need early detection, and we need to minimize possible impact, which also covers things like faster recovery and backups, so that if something does happen we'll be able to recover fast. And maybe one last thing: it's not just about handling an exposure, a risk, or a breach; it's also about making sure we learn from it, that our systems and not just our people learn from it, so it won't happen again.
Yair: I'd say this is the overall approach I've tried to convey in the last thirty minutes or so: a bit about ourselves, about our solution and approach, and about our integration with Ignition and what we've done together. We now have an ongoing joint solution whenever we see Ignition on the production floor. That was the overall presentation. I know I've seen some questions; I'm gonna answer some of them and see whether anything else is still missing.
Yair: First, I've been asked whether I think operational teams can really manage this. It's a good question. I truly believe there is no other option. Today we usually have cyber analysts sitting in a SOC somewhere globally, looking at anomaly detection alerts that could indicate a breach, which, as we just discussed, is a bit too late. They usually don't understand the context, which means they'll probably never take action by themselves, actions like closing a firewall port, installing something, patching, or even quarantining a file that might be part of the automation process. They're gonna need someone on the production floor to assist along the way and make sure they don't do any damage.
Yair: That means there must be someone on the shop floor who can manage it to some extent. I'm not saying the operations person is supposed to replace a cyber analyst, because those are different skill sets. But usually we see automation engineers with an ICS security background. In bigger plants specifically, we see IT managers, for example, and some teams that are more capable. In smaller plants, you'll probably see someone who's just been given the role on top of what they do on a daily basis. That's why we think the risk avoidance process is, and should be, managed by the teams who really understand the consequences and who are responsible for operational resiliency. And once there's an indication of a potential breach and a cyber analyst is needed, of course there's a need for better collaboration with the people sitting in the SOC.
Yair: Another question is about the systems in place: what if I don't have all those systems in place? Maybe I didn't explain myself well enough: in most cases, we don't see everything in place. Usually we see some of them, such as firewalls, HMIs, or SCADA systems; something will probably be there. That's why we have various ways to collect the data: our network sensors, querying capabilities, WMI tools, OPC connectors, et cetera. Production floors differ a lot; there's a big difference between energy production and distribution, a soda company, and food and beverage. Each sector differs, first in mindset and operational perspective, and also in routines and processes. That's why we have various ways to collect data from different systems. So if there are no further questions, I'd like to take the opportunity to thank everyone again for coming. I really hope it was interesting. And if anyone has other questions, you can reach out via LinkedIn or other platforms.
Greg: Thanks very much, Yair. Greatly appreciate the presentation today.
Yair: Thank you, Greg. Thank you, everyone. Have a great day.
Greg: Take care...