Artificial Intelligence and Machine Learning - Contextualizing security risks by Shafia Zubair
Understanding Artificial Intelligence and Machine Learning within the framework of Security
Hello, I'm Shafia Zubair, and today I'll be sharing insights on artificial intelligence (AI) and machine learning (ML) and how we look at them from the perspective of security: specifically, the risks that organizations face and the ways we are reducing them.
We will focus on what AI and ML actually are, how organizations are already using them, how adversaries use them, and how to put what we do with these systems into the right security context.
The Perception of Artificial Intelligence
What comes to mind when we hear about artificial intelligence? People often have either a dystopian or utopian view of what AI systems imply for us. Either we fear machines will take over the world, or we anticipate a future where we rely on machines to live more comfortably. However, what we are discussing does not represent this speculative type of AI, which we refer to as Artificial General Intelligence. Rather, we focus on current, practical applications.
Artificial Intelligence Today
Today, AI takes practical forms, such as self-driving cars, the remarkable autonomous helicopter utilized in the Mars mission, or the autonomous ship aiming to cross the Atlantic Ocean. These machines primarily aid us in our daily lives, facilitating more effective processes and life solutions. Therefore, our approach and perspective today align with these realistic, practical applications implemented in our organizations.
Defining Artificial Intelligence and Machine Learning
First and foremost, we need to define artificial intelligence. AI is an area of computer science that concentrates on developing intelligent machines that work and react like humans. These machines are autonomous: they can triage inputs and quickly arrive at outcomes without external direction.
The backbone of these AI systems is machine learning. It comprises algorithms and statistical models that let computer systems perform tasks without explicit instructions, relying instead on patterns and inference.
The Journey to Artificial Intelligence
Our journey to AI begins with automating individual processes. Once we connect these automated processes to deliver a cohesive end-to-end outcome, we have robotic process automation (RPA). Collecting the right data and providing cohesive end-to-end instructions for this automation is crucial for its successful adoption in firms.
Beyond RPA, we aim to develop cognitive insights: learning from data collected over time and understanding its patterns and behavior to derive useful insights. This can range from basic machine learning to more complex forms such as deep learning, which involves neural networks. The final step is the truly autonomous system that adapts to the environment it faces.
Security Risks in Artificial Intelligence
AI and ML systems face multiple risks, including privacy concerns, fairness issues, and a lack of transparency. When it comes to security, we consider the CIA triad: confidentiality, integrity, and availability. That means ensuring that only authorized individuals can access the system, that the data accessed is trustworthy, and that the system is available when needed. For machine learning systems, confidentiality and the integrity of both the data and the model are the key concerns, and maintaining them is usually the hardest part for data scientists from a security standpoint.
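One small, concrete way to think about model integrity is to record a cryptographic digest of a model artifact when it is released and to compare it again before the model is loaded. The sketch below is only an illustration under assumptions: the file name `model.bin` and the helper names are hypothetical, and a digest check covers tampering with the stored file, not every integrity risk discussed in the talk.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifact_is_intact(path: Path, expected_digest: str) -> bool:
    """Compare the current digest with the one recorded at release time."""
    return hmac.compare_digest(sha256_of_file(path), expected_digest)


def demo() -> tuple:
    """Record a digest, tamper with the file, and check it again."""
    with tempfile.TemporaryDirectory() as tmp:
        model_path = Path(tmp) / "model.bin"  # hypothetical artifact name
        model_path.write_bytes(b"weights-v1")
        recorded = sha256_of_file(model_path)
        before = artifact_is_intact(model_path, recorded)
        model_path.write_bytes(b"weights-v1-tampered")  # simulated tampering
        after = artifact_is_intact(model_path, recorded)
    return before, after


if __name__ == "__main__":
    print(demo())
```

In practice the recorded digest would need to live somewhere an attacker with access to the model file cannot also rewrite.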
Utilizing AI and ML in Different Sectors
AI and ML have been embraced across sectors and organizations, and security operations use them extensively to defend organizations. Examples include email monitoring to filter spam, establishing granular patterns of user and system behavior for identity and access management, and virus monitoring as part of endpoint detection and response.
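To make the idea of a behavior pattern a little more tangible, here is a deliberately simple sketch that flags a login hour far outside a user's usual range. It is an assumption-laden toy: the talk does not say which technique any product uses, the z-score rule and the threshold of 3 are illustrative choices, and the sample data is invented.

```python
from statistics import mean, pstdev


def is_unusual(history: list, observed: float, threshold: float = 3.0) -> bool:
    """Flag a value more than `threshold` standard deviations from the baseline."""
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # With no variation in the history, anything different is unusual.
        return observed != baseline
    return abs(observed - baseline) / spread > threshold


# Hypothetical login hours (24-hour clock) for one user across ten sessions.
login_hours = [9, 9, 10, 8, 9, 10, 9, 8, 9, 10]

if __name__ == "__main__":
    print(is_unusual(login_hours, 10))  # within the usual range
    print(is_unusual(login_hours, 3))   # a 3 a.m. login
```

A real system would look at many signals together and would have to handle details this sketch ignores, such as hours wrapping around midnight.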
AI in Product Development
AI is also used to build products. Examples include hyper-personalized products such as recommendation engines, products that recognize our voices and features, chatbots, and robo-advisors. AI also forms the basis for autonomous systems, which are still emerging, and all of these give organizations a competitive edge.
Challenges in AI and ML
Despite their wide application, AI and ML also face challenges: the availability, cleansing, and storage of data, the technology available within the organization, and the unpredictable outcomes a model may produce once it has learned over time.
Going forward
Before diving into AI and ML, assessing your assets and threats and understanding the attack vectors is important. Tailoring your approach to managing risk based on the specific context of your product is the best way forward.
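A risk register is one simple way to put this prioritization into practice. The sketch below is an assumption, not the speaker's method: it uses the common likelihood-times-impact score on a 1 to 5 scale, and the systems, threats, and numbers are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    system: str
    threat: str
    likelihood: int  # 1 (rare) to 5 (expected); illustrative scale
    impact: int      # 1 (minor) to 5 (severe); illustrative scale

    @property
    def score(self) -> int:
        """Simple likelihood x impact score."""
        return self.likelihood * self.impact


def prioritize(risks: list) -> list:
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


# Hypothetical entries; the numbers are placeholders, not assessments.
register = [
    Risk("recommendation engine", "data poisoning", likelihood=3, impact=2),
    Risk("imaging system", "model tampering", likelihood=2, impact=5),
    Risk("spam filter", "evasion through feedback", likelihood=4, impact=3),
]

if __name__ == "__main__":
    for risk in prioritize(register):
        print(f"{risk.score:>2}  {risk.system}: {risk.threat}")
```

The point of the ordering is the one made in the talk: spend effort where the business risk is highest rather than treating every model the same way.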
Remember, cybersecurity in the age of artificial intelligence is not just a compliance requirement, but also a brand protector and revenue accelerator. The right approach doesn't just defend organizations but also brings competitive advantages.
If you have any questions, feel free to reach out to me via LinkedIn.
Video Transcription
Hello everyone. I'm Shafia Zubair, and I'm joining the first session from the suburbs of Chicago. I look forward to talking to all of you today. Today I would like to talk to you all about artificial intelligence and machine learning, how exactly we are going to look at them from the perspective of security, and how we ensure that we are reducing the risk that organizations face. So, simply put, we're going to discuss what exactly artificial intelligence and machine learning are. We are also looking at how we're going to use them in our organizations, which is already happening right now, how the adversaries are using them, and how we make sure we are contextualizing what exactly we do with this. So what comes to mind when I talk about artificial intelligence? Typically, people have either a very dystopian view or a very utopian view of what any of these artificial intelligence systems can do for us. Either the systems are going to take over the world and we're going to be working for machines, or, on the other hand, we will have them serving us on an ongoing basis and you pretty much will have a very relaxed lifestyle. And quite frankly, it's neither one of those. What this view represents is a very specific artificial general intelligence, which we are, quite frankly, many, many years away from. It is going to happen at some point in time, but maybe not in either the utopian or the dystopian form.
What do we mean by that? What exactly do we think we will be looking at when we talk about AI or machine learning? This is a more realistic picture of what artificial intelligence looks like today and potentially in the near future. We're looking at self-driving cars. We're looking at this autonomous helicopter that was used in the Mars mission. We are looking at an autonomous ship that is looking to cross the Atlantic Ocean, the Pacific Ocean, from the US to the UK, fully autonomous, without any sailors on board. So we are looking at machines that will help us in our day-to-day life and potentially in scenarios where a human cannot be positioned, such as on Mars. These are very practical applications. None of these look either very threatening or very supportive. So my presentation today is going to be in line with these more practical applications that we will be applying in our organizations. So what exactly is artificial intelligence? Let's nail it down before we go any further. Artificial intelligence is an area of computer science which emphasizes the creation of intelligent machines that work and react like humans. When we say work and react like humans, they're fully autonomous; they have the ability to triage inputs and come up with outcomes very quickly.
They do not need external stimulus or external direction to establish what the reaction should look like. On the other hand, what drives these machines and AI systems is machine learning, which truly is nothing but algorithms and statistical models that the computer systems use to perform the task without any instructions provided by you. There is no explicit programming; they're relying on patterns and inference.
So keep this in mind as we go through this presentation today. The journey to artificial intelligence starts from actually automating some of the processes. A lot of you, depending upon which part of technology you're working in, have automated a process or written some scripts to automate processes. Now, that's a very individual process. When you start connecting automated processes to have an outcome that is end to end, that becomes robotic process automation. To do so, you will need to collect the data, and you need to make sure the instructions you're providing to automate that process are cohesive end to end. A lot of companies have adopted this or are on the journey to adopt it successfully. That comes after you have done this end-to-end automation of processes and have that single cohesive input to output, right? You get a resume and you have a yay or nay on the resume; that's an automated process.
But the next step is developing cognitive insights: truly learning from the data you have collected over a period of time, understanding the patterns, understanding the behavior of the data itself or the data entities, to land at some kind of an insight. It could be something as basic as machine learning, or it could be very exhaustive, like deep learning, which is neural networks. That's a very technical subject we're not going to go into today, but I just wanted to highlight that cognitive insights are what you will find from, say, a robo-advisor or a system that is used by healthcare professionals to understand what kind of disease you might have.
The next step in this whole journey is the truly autonomous system that has intelligence and can adapt to scenarios depending upon what it faces. It triages multiple sensor inputs to adapt to the environment itself. A self-driving car is a classic example of that.
Now, do we have all of those autonomous systems in every environment? That is not the case today. There are some of those, but not quite. What we have is a single-domain autonomous system, so it can only do one single action. In this case, if it's a self-driving car, it's only driving the car. It cannot do anything else, such as stop at a certain spot and change a tire like a human would if it faces an issue there. So it is a single-domain-centric autonomous system. Why are we looking at it purely from a single domain? Because there are certain challenges that we encounter when we're developing the system. The challenges have to do with the data: the data we collect, the data that is available, and how we cleanse and sanitize it. Then there is the technology: do we have that available in our organization? What does it take to collect that much data to train a model? How do you sanitize it? How do you store it? How do you make sure you stay on top of it? And of course there is the model itself: you have programmed it to do something, but at some point in time it could do something completely different, because it has learned or has gone through some iterative steps that land at an uncertain outcome.
Those are the challenges that you as a technologist will have to address. On the other hand, every model, every algorithm has a specific outcome; you are using it for a business outcome. When the business outcome is not being met and you have some negative outcomes, whether it is a biased output, whether the results are unstable and you cannot explain them, whether the results are such that people stop trusting your models, or you're non-compliant with some of the regulations out there, that becomes a negative outcome for your organization, for the business itself, and that will have a business impact.
So you need to keep all of these in mind when you're creating systems that have artificial intelligence or machine learning in the back end. When we're doing all of this, the key here today was to talk about risk. There are multiple risks these artificially intelligent systems will encounter. They have privacy concerns. They have a lot of concerns about fairness; there are multiple examples in the literature of how unfair these systems have been, whether in processing loans or in being utilized in prison systems. There's the element of transparency and explainability as well, for those that are self-learning models.
At some point, the developer loses the transparency to explain how exactly the model landed at a certain decision, which is a critical requirement of a lot of regulations, especially in the finance sector or even in the healthcare sector. So all of these are risks that we are going to encounter as we go through this journey. The key one we're going to be looking at today is security risk, so let's look at that. Those of you on this call who are security professionals will know that we focus on what is called the CIA triad: confidentiality, integrity and availability. At any point in time, if you're using a system, you want to make sure that the only people who can access the system are those you expect to access it, that the data you access has integrity, that it's trustworthy and dependable, and that it's available when you need it to be available. In the case of machine learning systems, you are looking at confidentiality and integrity as the key elements of security risk. You want to make sure that only those who are expected to access the system can access it, and that the data integrity and the model integrity are always maintained. And that is usually the hardest part for a data scientist to ensure from a security perspective.
But we are still using this, right? We are using artificial intelligence and machine learning in our environment. Let's look at some of the areas where we use it. For example, in security operations, we use machine learning quite a bit to defend our organizations.
A key example of this is email monitoring. Google has a statistic out that says about 70 to 80% of the email it processes is spam. So that is 70 to 80% that Google already filters before it hits your inbox, where you also mark things as spam. So on top of the spam you do receive, a great deal is already being filtered out automatically. Imagine how many millions and billions of emails are being sent as spam. We as humans could not have filtered that out if we did not have these machine learning models, and the scale and the speed that they bring to our organizations to execute efficiently on this. So that's your classic example of where we are using machine learning. We are also using machine learning to establish granular patterns of both user and system behavior. Typically, this would be in identity and access management, whether you're using biometrics or constantly analyzing how a user accesses our systems, the network, the perimeter. What kind of behavior do they exhibit when they are accessing the systems? We establish a pattern on that. Some organizations go very deep into it; some don't, depending upon the kind of risk they have when it comes to access to systems.
You would also see this in, say, virus monitoring. You have endpoint detection and response; you have multiple agents and antivirus systems on your endpoints, laptops and mobile devices. All of those are using machine learning in the back end to constantly analyze and detect things that could harm our systems.
So from an operations perspective, you're using it extensively. On the other hand, our product team members are also using AI to create multiple products. Whether you use a smartphone, or something like a recommendation engine, whether it is a shopping engine or an entertainment engine, you are looking at hyper-personalized products. You are looking at products that recognize our voices and our features.
There are chatbots and human bots that you interact with on different systems. You are also looking at something like a robo-advisor, which is a goal-based system: you set a goal for yourself, and the robo-advisor is there to advise you on the financial steps you need to take. And then of course there are fully autonomous systems that take these decisions. We are not there yet, but we're going to get to that stage. So all of these are different products that are being created by different organizations. AI and machine learning are being embraced quite massively in different companies and organizations, which raises a very fundamental question: what kind of risks are we adding to our organization? Just like we said, in operations and product development we're using AI and machine learning. Adversaries, on the other hand, are also using this for intruding into environments, for evading any kind of controls we have in place, and also to attack the models that have been created. So it's not just that we are gaining an advantage from using this emerging technology. Our adversaries, the threat actors, are also using this at a massive scale to gain that advantage themselves and breach companies.
So what are some examples where something like this has happened? Keep in mind that it is very hard to detect the cause of a breach. Did they use AI or machine learning? We do not know; we can only infer based upon what we are encountering. Some of these examples are listed over here. Proofpoint is an email filtering system used by a lot of enterprises. What they do is score an email: is it spam or is it not? A score of 100 means it's a valid email; a lower score would mean it is spam. Now, Proofpoint actually published a vulnerability wherein a threat actor can keep sending emails and, depending upon the response received, fine-tune the spam to make it look like a legitimate email. So there you go.
Your Proofpoint, your email monitoring system, has been breached already. Now it just takes one spam email, one link that someone clicks, for you to get compromised as an organization. Another example is the Tay chatbot. Microsoft rolled out a chatbot, and all they wanted was to learn how users interact with the chatbot. Within 16 hours they had to bring it down, because the chatbot learned very negative comments from the tweets that were being replied to, to the extent that it became very sexist and very narcissistic. That tells us that people can also teach self-learning models negative behavior, and very quickly at that. The one that is the scariest, and this was done by researchers, so it's not something we know of in the real world at least, comes from the Collaborations Pharmaceuticals company. They had a drug discovery model: basically, they have a data set of compounds to pull together something that would be a very favorable chemical that can be utilized. They score it based upon toxicity, so how much it benefits a human would get a positive score, and toxicity would reduce the score.
They tampered with the model enough that they could flip the toxicity factor, so the model started rewarding higher toxicity, and the benefit to humans was being marked as a negative score. Within six hours, that model created 40,000 toxins, including a chemical similar to VX, which is a nerve agent used in warfare. So, as you can see from these examples, it is about ensuring the security of your model, ensuring how your model learns, and understanding where the kill switch is. At what point in time do we figure out that the model needs to shut down? What is the negative outcome? That is all part of what I would call the security triad, because the confidentiality and the integrity of your outcome, as well as of the data set and the model, are being compromised.
So that raises a very fundamental question: how exactly could an adversary impact us? From an operations perspective, they're doing multiple things: password guessing, automating timing attacks, collecting all the data, hoovering up data from social media, triaging it, and establishing those deep learning patterns.
And then on the product side, they can poison the data. Any data they have access to that you use to train your model, they can poison by data injection or manipulation. They can reverse the logic, like the people from the pharmaceutical company did to create that nerve agent.
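The logic reversal described here can be illustrated with a toy scoring function. This sketch is purely hypothetical: the candidates, the numbers, and the linear score are invented for illustration and say nothing about how the researchers' actual model worked. It only shows how flipping the sign of a single penalty changes which option a system prefers.

```python
def score(benefit: float, toxicity: float, toxicity_weight: float = -1.0) -> float:
    """Toy score: benefit counts for a candidate, toxicity counts against it."""
    return benefit + toxicity_weight * toxicity


def best_candidate(candidates: dict, toxicity_weight: float = -1.0) -> str:
    """Return the name of the highest-scoring candidate."""
    return max(
        candidates,
        key=lambda name: score(*candidates[name], toxicity_weight),
    )


# Hypothetical candidates as (benefit, toxicity) pairs on a 0-to-1 scale.
candidates = {
    "A": (0.8, 0.1),
    "B": (0.6, 0.9),
    "C": (0.3, 0.2),
}

if __name__ == "__main__":
    print(best_candidate(candidates))                       # intended behavior
    print(best_candidate(candidates, toxicity_weight=1.0))  # penalty flipped
```

With the intended weight the low-toxicity candidate wins; with the sign flipped, the most toxic one does. That is why the integrity of the model's parameters matters as much as the integrity of its data.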
And from an IP perspective, right? This is information that is very confidential to your company and could be your competitive advantage, and they could potentially steal how your model behaves. Say you're an insurance company and you have a model that learns people's behavior to advise on how you're going to price them. Some competitor comes in and tries to see how your model behaves by providing some kind of negative inputs, or just dummy inputs, to understand how your model does that. That model stealing also takes place. The logic and the structure are something they're after.
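Why dummy inputs are enough to leak a model's logic is easiest to see with a toy case. The sketch below assumes a purely linear pricing function, which real models rarely are, and every name and number in it is hypothetical. It shows that a handful of chosen queries can recover the exact parameters of such a black box, which is the intuition behind limiting and monitoring query access.

```python
def make_black_box(weights: list, bias: float):
    """Return a prediction function that hides its parameters from the caller."""
    def predict(features: list) -> float:
        return bias + sum(w * x for w, x in zip(weights, features))
    return predict


def extract_linear_model(predict, n_features: int) -> tuple:
    """Recover weights and bias of a linear black box using n_features + 1 queries."""
    zero = [0.0] * n_features
    bias = predict(zero)  # an all-zero input reveals the bias
    weights = []
    for index in range(n_features):
        probe = zero.copy()
        probe[index] = 1.0  # turn on one feature at a time
        weights.append(predict(probe) - bias)
    return weights, bias


# Hypothetical pricing model with three input features.
pricing_model = make_black_box([2.0, -1.0, 0.5], bias=10.0)

if __name__ == "__main__":
    print(extract_linear_model(pricing_model, n_features=3))
```

Non-linear models take far more queries and yield only approximations, but the principle is the same: every answer the model gives away is information about its structure.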
So you have to, as security professionals, safeguard all of those. At this point you could say: these are products, we have good controls in place, we are limiting access, we have MFA, we are good, we are fine. You can choose to do so, but the reality is different. If you do, you land in a position like this organization did. They have a robot security guard which is positioned in multiple places. At one point in time it just flipped and fell into a fountain, and Twitter had a lot of fun with that. At another point in time it also bumped into a toddler and rolled over it. So these had to be pulled out of usage in those areas. You do not want to be the company that gets into the news for these negative products that have been compromised. We do not know; the company never published what compromised these robots, right? But the question is, how are you safeguarding your organization and ensuring the risks are minimized? So look at it from a risk assessment perspective. Assess your assets and your threats. What are the attack vectors? How can a threat actor attack you? Analyze all those risks.
That is typically what you would do in a risk analysis, but you're looking at it through the framework of artificial intelligence and machine learning models, very specific to how they can be compromised, and compromised very quickly, because the threat actors can use the exact same scale and precision that we are using to create all these products. Prioritize your risk.
Do not boil the ocean. If it's just a recommendation system, maybe the risk prioritization is low for that. Whereas if you're using, say, a computer imaging system, you want to make sure the risk prioritization is higher for it. It all depends upon your business risk and your business objectives. Always define your outcomes. Look at all the outcomes, and make sure your technology team members or product team members are very well aware of what could be considered a negative outcome. It could be as outlandish as you could imagine, but that is truly going to be the case here. You have to think like a threat actor, and you have to make sure you're doing closed-loop testing on an ongoing basis, with ongoing audits to mitigate those risks. So always look at it in the context in which this product will be used, whether it is for internal operations purposes or being utilized by your customers. With that, I'm going to leave you with one single statement. Remember that cybersecurity in the age of artificial intelligence is not just a security or compliance requirement; it is a brand protector. It is also your revenue accelerator. We need to ensure that the confidentiality, the integrity and the availability of all these systems are being maintained, and maintained in a manner that is adequate for the outcomes that are expected from that model.
So don't try to address every single machine learning product out there. Try to go at it based upon the context in which it is going to be utilized. With that, if you have any questions, please feel free to reach out to me. You can put them in the chat right now, or you can reach out to me via LinkedIn. Any questions? Let me do a quick check. OK, we do not have any questions. Thank you, folks. I see we have a lot of participants: London, Canada, the US, Ireland, Scotland. So thank you all for participating. It was fun. I hope to see you all in other sessions in the conference.