Design Like a Pro: Machine Learning Basics
What It Is, Why It Matters, and How to Get Started
58 min video / 52 minute read
Machine learning (ML) is one of the most interesting technologies to enter the popular consciousness lately. While it's exciting to ponder the nearly limitless industrial applications of ML and the massive benefits it can bring, it can also be daunting to think about exactly how to start using it in your own organization.
In this webinar, Travis Cox, Kathy Applebaum, and Kevin McClusky from Inductive Automation discuss key concepts and best practices, show demos, and answer questions from the audience, to help you start integrating ML into your day-to-day processes.
Learn more about:
- Practical ways to use ML in your factory or facility
- What you'll need to get started
- Existing ML tools and platforms
- Making goals and plans for ML
- And more
- ICC session: Introduction to Machine Learning
- Blog post: Why People are the Real Key to Machine Learning
(The moderator, Travis Cox, briefly introduces the topic, Inductive Automation and Ignition software, and then introduces the presenters, Kathy Applebaum and Kevin McClusky.)
Travis: First, let's talk about machine learning. That's what we're all here for, and I want to give you a little bit of an introduction before we get into the meat of the topic. Machine learning has existed for a long time, but it has recently come to the forefront, especially with the introduction of IIoT and the fact that companies are creating teams to do more with the data they have. We see many companies that want to use machine learning but aren't really sure what it is or what it can be used for. It's true that machine learning has tremendous potential, but it won't give you instant results. There are a lot of steps you need to take before you can implement a full solution.
Travis: First, you'll need to ask a lot of important questions and then plan out your project carefully. You'll need to make sure you have the right data, the right people, and the right processes in place, but it's definitely worth it. If it's done the right way, a machine learning project can save a huge amount of money and greatly increase your speed after just a few months of development. So today, we'd like to help you get there by discussing what machine learning is, some of the ways you can apply it in your industrial organization, and which steps you can take to get started. We'll also show you a demo of Microsoft Azure Machine Learning Studio (AML Studio) and answer some of your questions. We won't have time to answer them all, but if we don't get to yours, we'll make sure we get in touch with you afterwards. So let's start by answering the question: what is machine learning? For that, I'm going to turn it over to Kathy.
Kathy: Thanks, Travis. You'll hear terms like analytics, machine learning, and artificial intelligence. They get thrown around a lot, but they're rarely defined, so let's talk a little bit about what they mean. Analytics is discovering the knowledge that's in your data, and there are four main types of analytics. The first is descriptive analytics, and this is the most common type, because it just describes what's already going on in your data. Every time you run a report, you're doing descriptive analytics. Diagnostic analytics adds the why. In other words, descriptive analytics might tell you that your equipment has been down for four hours; diagnostic analytics will tell you why that equipment went down. Predictive analytics is very popular today, because it looks at what might happen. For example, it might tell you that your equipment is likely to go down on second shift on Tuesdays when you use a certain raw material. And finally, prescriptive analytics is not very commonly used, but it has huge potential because it recommends next steps: it might recommend changing a process or changing the supplier of your raw materials.
Kathy: Machine learning is learning and improving from experience. There's a lot of overlap with analytics, especially with prescriptive analytics. The experience a machine learning model learns from can be entirely in the past, or we can continually refine the learning through techniques like lazy learning or retraining. Artificial intelligence refers to tasks that simulate human intelligence, and again, there's a huge overlap between machine learning and artificial intelligence; a lot of experts consider machine learning to be a subset of artificial intelligence. For today, we're going to use "machine learning" to talk about all three of these things, and we're going to focus on things that let us take an action. Now, I'll turn it over to Kevin to talk about software options for machine learning.
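To make "lazy learning" a bit more concrete: a lazy learner such as k-nearest neighbors does no up-front training. It stores observations and does its work at prediction time, so new data refines later predictions without a separate retraining step. The sketch below is a hypothetical minimal k-nearest-neighbors regressor in plain Python; the class name and numbers are illustrative only, not from the webinar.

```python
import math

class LazyKNNRegressor:
    """Minimal k-nearest-neighbors regressor, a 'lazy' learner.

    Nothing is fitted up front: observations are stored, and the work
    happens when a prediction is requested.
    """

    def __init__(self, k=3):
        self.k = k
        self.points = []  # list of (features, target) pairs

    def add(self, features, target):
        # Learning from new experience is just remembering it.
        self.points.append((tuple(features), target))

    def predict(self, features):
        query = tuple(features)
        # Rank stored observations by Euclidean distance to the query.
        nearest = sorted(self.points, key=lambda p: math.dist(p[0], query))[: self.k]
        # Average the targets of the k closest neighbors.
        return sum(target for _, target in nearest) / len(nearest)


model = LazyKNNRegressor(k=2)
model.add([10.0], 100.0)
model.add([20.0], 200.0)
model.add([30.0], 300.0)
print(model.predict([12.0]))  # neighbors at 10 and 20 -> 150.0
model.add([12.0], 120.0)      # new observation arrives; no retraining step
print(model.predict([12.0]))  # neighbors at 12 and 10 -> 110.0
```

The trade-off is that every prediction scans the stored data, which is why lazy learners are usually paired with modest data volumes or an index.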
Kevin: Thanks, Kathy. If you take a look at the platforms that are available today for machine learning, they range all the way from left to right on this graph. Starting from the left, we're looking at things that give you more control. For those of you on the webinar today who are programmers, who do direct coding or are interested in really low-level things, the items on the left are more interesting. These are libraries: you write your own code, whether that's Python, Java, R, or whatever it might be, build out your machine learning models, and code against them directly. These low-level software libraries are great for a certain set of folks. A lot of folks want to go a more automated way, where maybe you don't want to touch code. If you look on the right-hand side here, there are examples of platforms where you can just send in a bunch of data and get back some sort of result. These systems are a bit more of a black box.
Kevin: And then the center section here is right in the middle: you're still choosing your own algorithms and doing the configuration yourself, but you don't necessarily need to get in there and write code. There are visual design tools and techniques for this. Microsoft Azure Machine Learning Studio, right up there at the top, is the one I'll be focusing on today; the middle item there is from AWS, and the bottom one is from Google. So each of the three major cloud providers offers these types of tools, and there are a number of other toolkits out there as well. I get the joy of doing the actual demo for everyone on the webinar, so I'm going to switch over and take a look at an Ignition Gateway that I have installed. Most of you have probably heard of Ignition, or maybe you're using it today or exploring it.
Kevin: If you aren't familiar with this, this is just the Ignition launch page. I'm going to click Launch right up here to launch a client, and it looks like it's up and running, which is great. This demo shows off a solar farm. We loaded a bunch of data that one of our integrators sent over, and that data is about power generation for a specific solar farm. If we go into the history for the solar farm (we've got a little bit of this set up already), I'll go back in time to where we have detailed data. So if I go back to June and pull that back... and maybe I'll want to zoom in here to see a little more detail. I'll go all the way to that date range. We can take a look at the different points along the graph: we've got temperature and barometric pressure, a lot of information coming from a weather station associated with the farm, a number of other sensors, and then real power here along the bottom.
Kevin: This real power is the most interesting item to the company, because it tells us how much power we've generated. What we want to do is generate predictions of how much real power we will have. Right now, we can see what the real power was in the past, but how do I know what it's going to be in the future? What we've done is take the forecast information from the National Weather Service and load that in. This is the forecast over the next several days, from today going forward, and if I hit Predict Power Output right here, it's getting predictions. These are live predictions coming back from the Azure Machine Learning Studio setup that we have. It sent the forecast out to the cloud, and we're getting information back that takes into account the sky cover: the sky cover is 53, then 42, then it goes down to 24, and you can see the predicted power output, based on that sky cover, changing over the course of that day.
Kevin: I'm a little bit lucky, because normally there's no sky cover in this forecast, so I can't show this. But this is live data and a live demo; we didn't plan this beforehand, so it's pretty nice to be able to show you. This is a real example of this working. I'm going to dig into more of the details behind the scenes in just a couple of minutes, so we'll switch back over to the presentation. Kathy, back over to you.
Kathy: Thanks, Kevin. There are two main types of machine learning models. (A model is, for example, what Kevin trained in Microsoft Azure Machine Learning Studio.) One type is a classifier. A classifier predicts a category, such as good versus defective. The other type is a regression model, which predicts a value, such as your power output, your defect rate, or the equipment setting you need to use to optimize your output.
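The difference between the two model types shows up in what they return. Below is a hypothetical sketch: a nearest-centroid classifier that returns a category ("good" or "defective") and a one-variable least-squares regression that returns a number. The features and data are invented for illustration.

```python
def fit_nearest_centroid(rows, labels):
    """Classifier training: remember the mean feature vector of each category."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def classify(centroids, row):
    """Return the category whose centroid is closest (squared distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )

def fit_line(xs, ys):
    """Regression training: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

# Classification: predict a category from invented (vibration, temperature) readings.
centroids = fit_nearest_centroid(
    [[0.1, 60], [0.2, 62], [0.9, 80], [1.0, 85]],
    ["good", "good", "defective", "defective"],
)
print(classify(centroids, [0.85, 78]))  # -> defective

# Regression: predict a value (power output) from an invented input such as irradiance.
slope, intercept = fit_line([100, 200, 300, 400], [12, 22, 32, 42])
print(slope * 250 + intercept)  # -> 27.0
```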
Kathy: One of the big applications of machine learning in automation is predictive maintenance. It's very expensive to bring lines down for maintenance before they need it, but it's even more expensive to wait too long, have equipment break, and suddenly have to do emergency repairs. It's even worse with very expensive equipment: you might have a motor that costs hundreds of thousands or even a million dollars. You don't want spare motors sitting around when you don't need them, but you don't want to be ordering them on an emergency basis either. So if you can predict when that motor is going to fail, have the replacement arrive just in time, and bring the line down when you need to, there's going to be a huge return on investment. The same goes for delivery trucks or any other expensive piece of equipment that needs regular maintenance.
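One very simple way to frame "predict when that motor is going to fail" is to fit a trend to a degradation signal, such as vibration amplitude, and estimate when it will cross a failure threshold. This is a deliberately naive sketch with invented readings and a hypothetical threshold; real predictive-maintenance models are usually trained on labeled failure history rather than a single straight line.

```python
def days_until_threshold(days, readings, threshold):
    """Estimate days remaining until a degradation signal crosses a threshold.

    Fits a least-squares line to (day, reading) pairs and extrapolates it.
    Returns None if the signal is not trending upward.
    """
    n = len(days)
    mean_d = sum(days) / n
    mean_r = sum(readings) / n
    slope = sum((d - mean_d) * (r - mean_r) for d, r in zip(days, readings)) / sum(
        (d - mean_d) ** 2 for d in days
    )
    if slope <= 0:
        return None  # no upward trend, so no failure date to extrapolate
    intercept = mean_r - slope * mean_d
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])

# Hypothetical daily vibration readings (mm/s) and an assumed alarm level.
days = [0, 1, 2, 3, 4]
vibration = [2.0, 2.2, 2.4, 2.6, 2.8]
print(days_until_threshold(days, vibration, threshold=4.0))  # roughly 6 days
```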
Kathy: You want to predict when that maintenance is needed so you're not doing it too early or too late. Beyond predictive maintenance, there are many other industrial machine learning applications. One is predicting machine settings to optimize your output or your quality. Another is quality control: you don't want to be shipping defective products, so you'd like to pull those out of the line. You want to forecast your demand, so your equipment and your people aren't tied up making something there's not going to be any demand for; you want your output to be based on something that's really going to sell. You want to forecast the price of your raw materials, so you can stockpile them when they're cheap and have them available when the price might otherwise be high. There's even training industrial robots: it's becoming more and more common to have an industrial robot simply watch the process it needs to do and be trained just like a human, and then when that task is done, it can be retrained for something else. So, Travis.
Travis: Yeah. So now we've got a much better idea of what machine learning is and what we can do with it. We saw the power forecast example that Kevin showed, and now we need to start digging into the steps to actually accomplish a machine learning solution, and ask ourselves what we need to do to get it all started. For starters, we need data: lots and lots of data. The more data you already have, the better a candidate you are for machine learning, and exactly how much data you need depends on what you want to accomplish. If you're not collecting data yet, you need to start, so you can start getting those models created. You should also have data from various sources: not just process data from devices, but ERP information, maintenance management information, and information from all the other systems you have, because the more context that data has, the more we can label it, and the better it's going to be for us. And the data collection should of course be automated, not manual; we don't want to rely on operators entering information that could be faulty, guessed, or estimated.
Travis: We want to make sure we get the information directly from the sources, automatically. And while quantity of data is important, quality is, I think, much more important; we'll show more of why we say that in a minute. Another thing you need to get started is a dedicated person who has both statistics knowledge and domain knowledge. Statistics knowledge means they understand linear algebra and statistics, can make sure the data sample is representative, can distinguish correlation from causation, can figure out your accuracy requirements, and more. Domain knowledge means they have in-depth knowledge about the process. They'll have to understand which types of data are the most promising, and they'll have to know when results don't make sense. Again, one of the most important parts of machine learning is labeling the data, and if they don't understand the process, they won't be able to label that data correctly. That fundamental is very important. You can't just get an intern or someone from IT to do that; you have to have somebody who has experience running those lines, who knows the trials and tribulations and when these issues have occurred.
Travis: You need someone to be able to sort through all that information and really label it as good or bad. As I mentioned, labeled data makes machine learning much easier, whereas unlabeled data is pretty much useless. Labeling data requires manual intervention. Computers are fast, but they're dumb. Labeling is one of the big tasks people don't realize they'll face, but if someone can go in and label that information, we're going to have a much better solution at the end of the day. So Kathy, what are the steps we need to know here?
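Labeling often means attaching a domain expert's judgment to raw process data. A minimal sketch, assuming the expert records time windows during which the line was producing bad product (the data structures and names here are hypothetical), is to tag every timestamped reading that falls inside one of those windows:

```python
from datetime import datetime

def label_readings(readings, bad_windows):
    """Attach a 'good'/'bad' label to each (timestamp, value) reading.

    bad_windows is a list of (start, end) datetimes, recorded by someone
    who knows the process, during which output was known to be bad.
    Windows are treated as half-open: start <= ts < end.
    """
    labeled = []
    for ts, value in readings:
        is_bad = any(start <= ts < end for start, end in bad_windows)
        labeled.append((ts, value, "bad" if is_bad else "good"))
    return labeled

# Invented hourly readings and one expert-recorded bad window.
readings = [
    (datetime(2019, 6, 1, 8, 0), 71.2),
    (datetime(2019, 6, 1, 9, 0), 88.5),
    (datetime(2019, 6, 1, 10, 0), 70.9),
]
bad_windows = [(datetime(2019, 6, 1, 8, 30), datetime(2019, 6, 1, 9, 30))]
for row in label_readings(readings, bad_windows):
    print(row)
```

The code is trivial on purpose: the hard, valuable part is the expert knowing which windows were bad.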
Kathy: One of the most important steps is to pick the question you're going to answer. A lot of times I see people start with machine learning by picking something they think is easy to answer. The problem is that you still need to go through a lot of work to get that easy answer: you need to collect your data, label it, and try several different algorithms, and after all that work you end up with something that's not particularly valuable to you. So start with something valuable that you really want to know, because the amount of work you'll go through is not that much more, and you get a bigger return on the investment of your work. The other thing to think about is your cost function. A cost function tells the machine learning algorithm how far its prediction was from the actual answer. If you're trying to predict a price, the cost function is very easy: it's just the difference between your prediction and the actual price. But sometimes developing a cost function is not that easy. You need to think about whether being close is good enough, or whether some types of errors are more expensive than others. If you're developing a self-driving car, driving into something is a very costly error, and you want to minimize that. So how you design your cost function is crucial to the success of your machine learning project.
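To illustrate the cost-function point: for a price, absolute error works, but when some errors are more expensive than others you can weight them differently. The sketch below compares a plain absolute-error cost with a hypothetical asymmetric one in which under-predicting (say, forecasting too little demand) costs three times as much as over-predicting. The factor of 3 is an arbitrary assumption.

```python
def absolute_error(predicted, actual):
    """Symmetric cost: how far off the prediction was, in either direction."""
    return abs(predicted - actual)

def asymmetric_cost(predicted, actual, under_penalty=3.0, over_penalty=1.0):
    """Penalize under-prediction more heavily than over-prediction."""
    error = actual - predicted
    return under_penalty * error if error > 0 else over_penalty * -error

def average_cost(cost_fn, predictions, actuals):
    return sum(cost_fn(p, a) for p, a in zip(predictions, actuals)) / len(actuals)

actuals = [100, 100]
low_model = [90, 90]     # consistently 10 too low
high_model = [110, 110]  # consistently 10 too high
print(average_cost(absolute_error, low_model, actuals))    # 10.0
print(average_cost(absolute_error, high_model, actuals))   # 10.0
print(average_cost(asymmetric_cost, low_model, actuals))   # 30.0
print(average_cost(asymmetric_cost, high_model, actuals))  # 10.0
```

Under the symmetric cost the two models look equally good; under the asymmetric cost a training algorithm would be pushed toward the model that errs on the high side.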
Kathy: As Travis was saying, you need to use domain knowledge. Your domain knowledge is going to tell you what types of data might answer your question. You may know from experience with your process that some types of data are very important and some are not. This is going to be really crucial for getting good answers out of your machine learning project. Can you acquire missing data? There's always going to be missing data. Maybe you changed your process, so some data simply wasn't acquired historically. Maybe a sensor was malfunctioning and didn't record for several days. How do you acquire that missing data, or how can you develop your process so that particular data isn't crucial? You also need to think about the quality of your data. Perhaps a particular sensor just isn't reliable and is giving you very bad data. Do you need to ignore that data? Can you estimate it? Can you do something to improve its quality? Machine learning is only going to be as accurate as the data coming in, so if your data is poor quality, your answers are going to be poor quality. And you need to think about dependent variables.
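A minimal sketch of two of the options mentioned, assuming readings arrive at a fixed interval with None marking gaps: treat values outside a plausible range as bad (the range below is a made-up assumption for an unreliable temperature sensor), then estimate short gaps by linear interpolation between neighboring good values. Gaps at the start or end, or longer than `max_gap`, are left as None for a person to decide on.

```python
def clean_series(values, low, high, max_gap=2):
    """Mask implausible readings, then linearly interpolate short interior gaps."""
    # Step 1: out-of-range readings from an unreliable sensor become missing.
    cleaned = [v if v is not None and low <= v <= high else None for v in values]

    # Step 2: fill interior runs of None that are no longer than max_gap.
    i = 0
    while i < len(cleaned):
        if cleaned[i] is not None:
            i += 1
            continue
        start = i
        while i < len(cleaned) and cleaned[i] is None:
            i += 1
        gap = i - start
        if start > 0 and i < len(cleaned) and gap <= max_gap:
            before, after = cleaned[start - 1], cleaned[i]
            step = (after - before) / (gap + 1)
            for j in range(gap):
                cleaned[start + j] = before + step * (j + 1)
    return cleaned

# Invented temperature readings: one dropout and one spike of 999.
print(clean_series([20.0, None, 22.0, 999.0, 24.0], low=-40, high=60))
# -> [20.0, 21.0, 22.0, 23.0, 24.0]
```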
Kathy: In the solar farm example Kevin showed us, time of day and temperature depend on each other. However, they were both important for his project, because the time of day tells us about the angle of the sun and the temperature tells us about the weather, so they're not as dependent as they might seem. But if you're trying to dry fruit, the correlation between time of day and temperature isn't really going to add anything; it's just going to make your answer more complex and take more computing power, so you would probably want to eliminate one of those two variables. Next is ETL: Extract, Transform, and Load. This is where a lot of your machine learning time is going to go. You need to extract the data from where you have it, which is probably multiple sources. You need to transform it into something you can use, and you need to load it into someplace where you have good access to it. Ask yourself, "Can I automate each step?" You're going to have a lot of data, and if you automate those steps, it's going to go much smoother and you'll get a lot more buy-in than if someone has to keep typing into an Excel spreadsheet.
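Checking whether two input variables largely duplicate each other can start with a correlation coefficient. Here is a sketch that computes Pearson's r in plain Python; the readings are invented, and the 0.95 cutoff is an arbitrary assumption, not a rule. A high correlation alone doesn't settle the question: domain knowledge decides whether both variables still carry distinct information, as time of day and temperature do for the solar farm.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented dryer readings: temperature rises steadily with time of day.
hour = [8, 9, 10, 11, 12, 13]
temperature = [50, 52, 54, 56, 58, 60]
r = pearson_r(hour, temperature)
if abs(r) > 0.95:  # arbitrary cutoff for "nearly redundant"
    print(f"r = {r:.2f}: consider dropping one of these inputs")
```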
Kathy: Can you acquire new data automatically? Going forward, try to find ways for that data to be extracted, transformed, and loaded automatically as you save it. Perhaps you need to save it in two different places to do that, and that's fine; anything you can do to make data acquisition easier, faster, and more accurate is going to be worthwhile. How much cleanup does your data need? Things like dates and numbers often need some cleanup, and anything that's in a string probably needs to be cleaned up, because we know that CA and California are the same thing, but the computer has no idea. You'll need to get those into a consistent format. And how are you going to handle missing values? There are always going to be missing values. Some algorithms handle them really well, and some don't. So are you going to choose an algorithm that handles them? Are you going to throw out the rows where values are missing? Are you going to estimate them? Are you going to find some way to get them? Think about this in advance.
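The "CA versus California" problem usually comes down to mapping every variant to one canonical form. A hypothetical sketch, where the alias table and record fields are made up for illustration: normalize free-text values, then apply one missing-value policy (here, dropping rows that lack a required field and counting how many were dropped).

```python
# Hypothetical alias table; in practice someone who knows the data maintains it.
STATE_ALIASES = {"ca": "CA", "calif.": "CA", "california": "CA"}

def normalize_state(raw):
    """Map messy free-text state names to one canonical code."""
    if raw is None:
        return None
    key = raw.strip().lower()
    return STATE_ALIASES.get(key, raw.strip().upper())

def clean_records(records, required=("state", "units")):
    """Normalize strings, then drop rows missing any required field."""
    cleaned, dropped = [], 0
    for rec in records:
        rec = {**rec, "state": normalize_state(rec.get("state"))}
        if any(rec.get(field) is None for field in required):
            dropped += 1
            continue
        cleaned.append(rec)
    return cleaned, dropped

rows = [
    {"state": "California", "units": 120},
    {"state": " ca ", "units": 95},
    {"state": "CA", "units": None},  # missing value
]
cleaned, dropped = clean_records(rows)
print([r["state"] for r in cleaned], dropped)  # -> ['CA', 'CA'] 1
```

Reporting the dropped count matters: if it is large, estimating the values or choosing a missing-tolerant algorithm may be the better policy.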
Kathy: I can't emphasize enough the importance of visualizing your data. It's so easy to visualize data nowadays that there's no excuse not to. When you visualize your data, look for problem data: things that won't even load into your visualization tell you that you didn't clean up your data enough. You're also going to see obvious trends. If the data looks very much like a straight line, that tells you linear regression is a good choice. If your data is scattered all over the place, that tells you that you may need to do a little more work. That leads into finding the obvious algorithm: sometimes our brains just see the data and say, "Oh yeah, I know what this looks like, I know what to try." And that brings us to determining which algorithm to use.
Kathy: We won't go deep into algorithm specifics today, but here are some of the questions you'll need to ask yourself. Is this a classification problem or a regression problem? You can rarely use the same algorithm for both types of problems. Travis talked about the importance of labeled data, and that's really important in automation. We can rarely use unlabeled data, and whether your data is labeled is going to determine what type of algorithm you use: some work well with labeled data, and some don't. How tolerant is your algorithm of missing data, and how much missing data do you have? You need to think about this in advance. Do you want to train your model once, or can you do something like lazy learning or retraining? How important is it to change the model over time? Then there's black box versus human-readable. Some algorithms, like neural networks, are considered black boxes: data goes in and answers come out, but you can't really understand how it got to those answers. Sometimes it's very important for your process to understand exactly how an answer was reached, and then you need a human-readable model.
Kathy: What kind of computing resources do you have available? Can you go to the cloud and take advantage of that, or do you need to do everything on site? And how tolerant do you need to be of outliers? Some processes are very clean and you'll have very few outliers; some have a lot, and you need to think about how that's going to impact your results. You also need to determine what platform you're going to use, and by platform, I mean the combination of hardware and software. Do you have the computing and software resources on site to do this, or do you need a cloud-based solution? How flexible does your solution need to be? How easy is it to get data into and out of it? You need to think about your whole system: you're gathering data in Ignition, you're going to store it somewhere, you need to get it into your machine learning platform, and then you need to get the answers back out into something you can use. How can you get those results into a usable form that can affect your process? And again, can it be automated? You could show the results on a screen and have someone type a value in somewhere else, but that's going to be inefficient.
Kathy: And you need to test your model. The model Kevin developed in Microsoft Azure has ways to test it, and that's really great. You're generally going to want to save about 30% of your data for testing. You need to think about how accurate you need to be. I've seen a lot of people say, "Oh, my model is really accurate because it predicts the right value 90% of the time," but 95% of their data was in category A and 5% in category B. So if they had just always predicted category A, they would have been more accurate than their model. This is where your statistics knowledge comes in, to find out whether your results really are good. And be prepared to back up a few steps. This is an iterative process: you may find that you need to collect more data, clean up your data better, or try a different algorithm. This is not something you're going to do in an hour; it's a process, but the results are worth it. So now, Kevin, can you take us back into Azure and show us how the data was used to develop your models?
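Kathy's accuracy trap is easy to check in a few lines: always compare a classifier against the trivial baseline of predicting the most common category. The numbers below reproduce her hypothetical 95%/5% example; the model's predictions are invented so that it scores 90%.

```python
from collections import Counter

def accuracy(predictions, actuals):
    """Fraction of predictions that match the actual category."""
    return sum(p == a for p, a in zip(predictions, actuals)) / len(actuals)

def majority_baseline(actuals):
    """Accuracy of always predicting the most common category."""
    most_common, _ = Counter(actuals).most_common(1)[0]
    return accuracy([most_common] * len(actuals), actuals)

# 100 test rows: 95 in category A, 5 in category B.
actuals = ["A"] * 95 + ["B"] * 5
# Invented model output: gets every B row right but mislabels 10 A rows as B.
predictions = ["A"] * 85 + ["B"] * 10 + ["B"] * 5

print(f"model accuracy: {accuracy(predictions, actuals):.0%}")     # 90%
print(f"always-A baseline: {majority_baseline(actuals):.0%}")     # 95%
```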
Kevin: Sure thing, Kathy. This is the fun part. As I mentioned before, this is the Azure Machine Learning Studio setup I used to create those outputs and predictions. I'll jump back over to it. Inside Azure Machine Learning Studio, I'll walk you through setting up a new experiment. As mentioned, you could use other platforms too; this is absolutely not the only thing you can use with Ignition. You can use pretty much anything you want to. But Azure Machine Learning Studio has a really nice interface, which I'm walking through right here, and I'll just walk you through setting up a machine learning example. I know some of you on the phone probably have a lot of experience with some of this, and some of you have probably never seen a machine learning experiment set up before in your life.
Kevin: So I'll try to describe each of the sections as we go, to make it a little clearer what I'm doing. I'm starting out with this solar training data. I can visualize this data, and what you'll see here is that we have the date, the day and the hour of the day split out, and then temperature, relative humidity, pressure, sky cover, wind speed, and real power. This is all historical data. There are a lot of rows here; I think it covers about a month.
Kevin: What I want to get out of the experiment is this: when I send in a forecast, saying here's my day, hour, temperature, relative humidity, pressure, sky cover, and wind speed for the next month, the next week, or the next day, what do I expect my real power to be? That's what I got out and visualized just a minute ago, so we'll walk through that process. Machine learning algorithms aren't really going to understand what a date is, and the date isn't really going to help us, so the first thing we're going to do is pull the date out of the experiment. I'll come over here to Data Transformation, then Manipulation, and select columns in this dataset: pull this down and pick everything in here except the date. The hour of the day is going to be a much better predictor. The date would only be useful if there were other things I wanted to predict that were happening on that exact same day, and in this case there aren't. So I've got that in here now, and this is the resulting dataset. I could even run just this step to apply the selection.
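Outside of a visual designer, the "select every column except the date" step is short. Here is a sketch using Python's `csv` module; the column names follow the ones Kevin lists, but the exact header spelling, file layout, and sample values are assumptions.

```python
import csv
import io

def drop_columns(csv_text, unwanted):
    """Return rows as dicts with the unwanted columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{k: v for k, v in row.items() if k not in unwanted} for row in reader]

# Hypothetical sample shaped like the solar training data (values invented).
sample = """Date,Day,Hour,Temp,RelativeHumidity,Pressure,SkyCover,WindSpeed,RealPower
2018-06-01,1,13,31.5,22,1012,10,4.1,19
2018-06-01,1,14,32.0,20,1011,25,5.0,17
"""
rows = drop_columns(sample, {"Date"})
print(list(rows[0]))
# -> ['Day', 'Hour', 'Temp', 'RelativeHumidity', 'Pressure', 'SkyCover', 'WindSpeed', 'RealPower']
```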
Kevin: When I run this, you can see that it's processing, and now it's complete. I can take a look at the dataset now and see that it has everything except the date. This is a good dataset that I want to use for everything. Now I'm going to feed this dataset into my next step, which is a split. The reason for the split is that I want to use part of this data for training my machine learning model. Training is a process where you take the data you have and say, "Apply this algorithm to this data and give me back a model that's going to predict things for me." So I'm going to split this data, take 70% of it, and use that for training. I'm taking 70% because Kathy said to take 70%, and she's very smart. Now that I have this split, with 70% coming out this side and 30% coming out that side, I'm going to apply machine learning training: I'll send the 70% to this training step, and then I'll pick my algorithm.
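The split step boils down to shuffling the rows and cutting them at a fraction. A minimal sketch in plain Python; the fixed seed is an added assumption so the split is repeatable, and this does not try to reproduce the options of Azure ML Studio's own split module.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split them into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> repeatable split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(range(100))
print(len(train), len(test))  # -> 70 30
```

Shuffling before cutting matters: if the rows are in time order, taking the first 70% would train only on earlier weeks. For time-series work that may actually be what you want, which is a judgment call for the person with domain knowledge.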
Kevin: As Kathy mentioned, there are a lot of different categories of algorithms. In this case, I'm going to use regression, and I'll use the simplest algorithm possible: linear regression. It's normally not the best choice, but it'll give us a baseline. So this linear regression will come in to train the model. When I train the model, I have to tell it what to predict, so I'll launch the column selector here and pick the column I want it to predict: real power, going into the future. I'll pick that one and hit OK. As soon as I've trained that model, I want to use it on the 30% of the data coming out of this side. Using it inside an experiment like this is called scoring the model: I apply it and then compare the results to the real results I already have in this data, to see how well it did. That's it. I've now built out my machine learning experiment, and I'm going to hit Run at the bottom.
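For readers who prefer code to a visual designer, the train-then-score pattern looks like this. A hypothetical sketch: a one-feature least-squares line (sky cover predicting real power) is fitted on training rows, then applied to held-out rows so each scored label sits next to the actual value. The data is invented, and the linear regression used in the demo handles many input columns at once, which this toy does not.

```python
def train_linear(xs, ys):
    """'Train' step: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def score(model, xs):
    """'Score' step: apply the trained model to rows it has not seen."""
    return [model["slope"] * x + model["intercept"] for x in xs]

# Invented rows of (sky cover, real power).
train_rows = [(0, 20.0), (20, 16.0), (40, 12.0), (60, 8.0), (80, 4.0)]
test_rows = [(10, 18.5), (50, 9.0)]

model = train_linear([x for x, _ in train_rows], [y for _, y in train_rows])
scored = score(model, [x for x, _ in test_rows])
for (x, actual), predicted in zip(test_rows, scored):
    print(f"sky cover {x}: actual {actual}, scored label {predicted:.1f}")
```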
Kevin: When I run this, it goes through each of the steps. It's doing this on Microsoft's servers, not locally on my machine, and it's already finished, so great. Instead of looking at each step, I'll come down to Score Model, pick the scored dataset, and hit Visualize right here. Now I can see what this looks like. Here's the real power: this is what actually happened, what came in my data. These are the scored labels, which are the predictions. For this first row, the real power was actually 19. If I didn't know the real power and passed these values to the model we just created (the predictor), it would predict 11. Not great, but not completely awful. This one is okay; that's pretty close. This one is pretty far off. So I'm not getting a great prediction back. Just feeding data in doesn't necessarily give me what I want; in fact, it's probably not going to give me anywhere near what I want. So I'll come over here and say, "Let's compare these scored labels to real power."
Kevin: What I'd expect here is that they'd be about the same, and you can see that they're not. If real power was zero, we'd want the scored label to be zero, and if real power is 10, we'd want it to be 10. So we'd want to see a scatter chart going up in this direction, close to a straight line along the way, but in this case, we're all over the place. This didn't do well. I don't want to take this model, feed it into my process, and use it for anything real, because it's not going to give me what I want. So I'll swap out this linear regression for something else. I'll take a look at the regression algorithms on this side. I happened to try these before the webinar, so I had an idea of how each one would perform, and I know that decision forest regression is pretty good for this dataset. As part of your process, if you understand these algorithms, you'll have an idea beforehand which ones might do well, and it's often a good idea to test the different algorithms and find the one that gives you the best score.
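Testing several algorithms and keeping the best one only works if "best" is measured the same way for each. A hypothetical sketch: compute mean absolute error (MAE) and root-mean-squared error (RMSE) for each candidate's scored labels on the same test rows, then pick the lowest. The model names echo the demo, but the predictions are invented, not Azure output.

```python
import math

def mae(predicted, actual):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root-mean-squared error; punishes large misses more than MAE does."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

actual = [19.0, 12.0, 3.0, 0.0]
# Invented scored labels from two candidate models on the same test rows.
candidates = {
    "linear regression": [11.0, 15.0, 9.0, 4.0],
    "decision forest": [18.0, 11.5, 3.5, 0.5],
}

for name, predicted in candidates.items():
    print(f"{name}: MAE {mae(predicted, actual):.2f}, RMSE {rmse(predicted, actual):.2f}")

best = min(candidates, key=lambda name: mae(candidates[name], actual))
print("best by MAE:", best)  # -> decision forest
```

Which metric to minimize is itself a cost-function decision: RMSE favors models that avoid big misses, MAE treats all misses proportionally.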
Kevin: So I hit Run on this model again; it runs through and it's finished. Now I can score this model, visualize the scored data set, and I get numbers that look a lot better. These are the actual values, these are the predictions based on them, and I'm much happier with this. Inside Azure Machine Learning Studio, some algorithms add a few more outputs. If I use this decision forest regression, it gives me some additional information on this side: a scored label standard deviation, which gives me an idea of how much the prediction might deviate. And if I look at the same comparison chart as before, scored labels against real power, you can see this is much better. This is much more along the lines of what you want to see. There's still a range inside that prediction where it will be off by a little bit as the values go up, but nothing is really far from what actually happened.
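One way to picture where a per-row standard deviation can come from: a decision forest combines many trees, and the spread of the individual trees' outputs shows how much they disagree. We are assuming that this is roughly what the "scored label standard deviation" reflects; the sketch below only illustrates the idea with hand-made tree outputs and is not Azure's implementation.

```python
import statistics

def forest_predict(tree_predictions):
    """Combine one row's per-tree predictions into (mean, standard deviation)."""
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    mean = statistics.mean(tree_predictions)
    # Population standard deviation of the trees' outputs: 0 when they all agree.
    spread = statistics.pstdev(tree_predictions)
    return mean, spread


# Hypothetical outputs from four trees for two rows of weather data.
rows = [
    [10.0, 12.0, 11.0, 13.0],  # trees roughly agree
    [3.0, 9.0, 15.0, 1.0],     # trees disagree a lot
]
for trees in rows:
    mean, spread = forest_predict(trees)
    print(f"prediction={mean:.2f}  std={spread:.2f}")
```

A large spread on a particular row is a hint to trust that prediction less, even when the average looks reasonable.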
Kevin: Now, if I want to set this up as a web service, I can do that right here. This is how we connect to most of these machine learning platforms from Ignition: you have a web service connection from Ignition out to the platform. If I hit Set Up Web Service, I can set up a predictive web service and a retraining web service; I'll circle back to what retraining means in just a second. I'll set up this predictive web service, and it automatically drops in the pieces so that each of these steps is wired up. I'll hit Run right here. It has to run through once to make sure the model is solid. I could also adjust the model at this point if I wanted to, but this is using this experiment, the one I generated, as the model. It's going through, scoring, and sending the result to the web service output. Inside Ignition, basically, we call this web service input; this is where Ignition feeds in. It passes in whatever data set I want predictions for, runs it through the model I've already generated, and then sends the output back to Ignition.
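To make "Ignition feeds into the web service input" concrete, here is a minimal sketch of packaging a data set as a JSON request body. The Inputs / ColumnNames / Values layout is our assumption based on the sample requests that classic Azure Machine Learning Studio generated for request-response services; check it against your own service's API help page. The column names are hypothetical stand-ins for the weather features in the demo.

```python
import json

def build_payload(column_names, rows, input_name="input1"):
    """Package feature rows for a request-response scoring call.

    The Inputs/ColumnNames/Values layout is an assumption modeled on classic
    Azure ML Studio sample requests; confirm it for your own web service."""
    for row in rows:
        if len(row) != len(column_names):
            raise ValueError(f"row {row!r} does not match {len(column_names)} columns")
    return {
        "Inputs": {
            input_name: {
                "ColumnNames": list(column_names),
                "Values": [list(row) for row in rows],
            }
        },
        "GlobalParameters": {},
    }


# Hypothetical column names for the forecast features described in the demo.
columns = ["Temperature", "RelativeHumidity", "SkyCover", "WindSpeed", "Pressure"]
forecast_rows = [
    [21.5, 40.0, 10.0, 3.2, 1015.0],
    [18.0, 65.0, 90.0, 6.1, 1009.0],
]
body = json.dumps(build_payload(columns, forecast_rows))
print(body)
```

Checking the row width before sending catches a mismatched schema locally instead of as an error from the service.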
Kevin: Now that you've seen a bit more of what's behind this, I'll walk you through the demo one more time. Up here at the top are the temperature, relative humidity, sky cover, wind speed, and pressure that we get back from the National Weather Service. Down below is the predicted power coming back out of that web service, which uses each of these inputs to produce the predicted output. When I click this button, it takes this whole data set up here and sends it to the web service input, where the model generates a new column: the real power. We don't send in any real power, because we don't have it; it's in the future. The web service output gives us back that predicted power, and then, behind this button, we take that data and publish it to this graph down here. Now, it's really easy to get these things wrong. Even if you visualize results inside the experiments, it's easy to set this up in a way that won't give you good results. So I ran through a number of different setups to give you an idea of what bad results might look like. For example, I did a neural network regression. If you look at this, this is the predicted output.
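On the way back, the predictions have to be pulled out of the response before they can be published to a chart. The sketch below assumes the classic Azure Machine Learning Studio response layout (Results, then the output name, then a table of ColumnNames and Values) and a prediction column named "Scored Labels"; both are assumptions to verify against a real response from your service. The example response is hand-written, not captured from a real service.

```python
def scored_values(response, output_name="output1", column="Scored Labels"):
    """Pull one column of predictions out of a scoring response.

    Assumes the layout Results -> <output name> -> value -> ColumnNames / Values."""
    table = response["Results"][output_name]["value"]
    index = table["ColumnNames"].index(column)  # ValueError if the column is absent
    # Values may arrive as strings, so convert them before charting.
    return [float(row[index]) for row in table["Values"]]


# A hand-written response in that assumed shape (not captured from a real service).
example_response = {
    "Results": {
        "output1": {
            "type": "table",
            "value": {
                "ColumnNames": ["Temperature", "Scored Labels"],
                "ColumnTypes": ["Double", "Double"],
                "Values": [["21.5", "0"], ["18.0", "3.25"]],
            },
        }
    }
}
predicted_power = scored_values(example_response)
print(predicted_power)
```

Looking the column up by name rather than by position keeps this working if the service starts returning extra columns.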
Kevin: This isn't realistic, because solar power output is going to sit at zero for a while every night. Day by day, it should go down to the bottom, stay there, jump up, come back down, and jump back up. If you see something like this coming back in your visualization, even though the output might rise at about the right times, there's a problem down here. Travis mentioned that domain knowledge is important for doing a good job with this, and this is a really good example. If I didn't know that this should be flat for a while, I might think this is kind of okay. But if I know what this should look like, if I understand the process, what I'm predicting, and how things should be running, I can take a quick look at this and say, "No, that's terrible; this isn't working at all."
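Domain knowledge like "output should be flat at zero overnight" can also be written down as an automatic check that runs on every forecast. The sketch below is hypothetical: the dark-hour window and the tolerance are made-up values for one imaginary site, and a real window shifts with season and location.

```python
# Hypothetical "dark" hours for one site; the real window shifts with season and location.
DARK_HOURS = frozenset(list(range(0, 6)) + list(range(21, 24)))

def implausible_points(hours, predictions, dark_hours=DARK_HOURS, tolerance=0.5):
    """Return (hour, prediction) pairs that break basic solar-power domain rules."""
    flagged = []
    for hour, power in zip(hours, predictions):
        if power < -tolerance:
            flagged.append((hour, power))  # a solar farm should not produce negative power
        elif hour in dark_hours and power > tolerance:
            flagged.append((hour, power))  # output predicted while the sun is down
    return flagged


hours = [2, 3, 9, 13, 22, 23]
predicted = [0.1, 4.0, 6.5, 15.0, 0.0, -3.0]
print(implausible_points(hours, predicted))
```

A check like this won't tell you whether a model is good, but it quickly rules out models that, like the neural network result above, violate what you already know about the process.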
Kevin: If I look at some of these other predictions, you can see that they're not necessarily very good either. This is the decision forest based on some test data, and it's the best result we got. I also made some bad predictions along the way, and I decided to put them up here so you can see what those look like too. They come back from the same endpoints. What happened was that the National Weather Service data came back time-zone adjusted, and one of the items was in UTC instead of Pacific, Eastern, or Central time. That made a big difference to the predictors: some of these values up top were in local time and some were not; some were essentially on England's time. Since these didn't line up with each other, the prediction algorithm couldn't do a very good job, and you can see spikes like this all over the place, which is very wrong. Take a look at the neural network, the Bayesian linear regression, and the decision forest.
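A common fix for this kind of mismatch is to convert every timestamp to UTC before joining or feeding data to a model. The sketch below uses Python's standard datetime module. Timestamps that carry an offset are converted directly; naive timestamps require the caller to state their offset. Using a fixed offset is a simplification that ignores daylight-saving changes; production code would look the offset up from a time zone database.

```python
from datetime import datetime, timedelta, timezone

def to_utc(timestamp, assumed_offset_hours=None):
    """Parse an ISO 8601 timestamp and return it as an aware datetime in UTC.

    Timestamps that carry an offset are converted directly. Naive timestamps are
    ambiguous, so the caller must state the UTC offset they were recorded in."""
    dt = datetime.fromisoformat(timestamp)
    if dt.tzinfo is None:
        if assumed_offset_hours is None:
            raise ValueError(f"naive timestamp {timestamp!r}: supply assumed_offset_hours")
        dt = dt.replace(tzinfo=timezone(timedelta(hours=assumed_offset_hours)))
    return dt.astimezone(timezone.utc)


# The same instant reported three ways (Pacific daylight time is UTC-7).
local_with_offset = to_utc("2018-04-12T13:00:00-07:00")
already_utc = to_utc("2018-04-12T20:00:00+00:00")
naive_local = to_utc("2018-04-12T13:00:00", assumed_offset_hours=-7)
print(local_with_offset, already_utc, naive_local)
```

Refusing to guess on naive timestamps is deliberate: silently treating them as local time is exactly how mixed local and UTC values slip through.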
Kevin: None of them could get it right, because we were sending in bad data. So sanitization, which both Kathy and Travis talked about, is really important. Getting your data into the right format and validating it before you send it through will let you make much better predictions. What you see in this final version is a pretty good way to do predictions. For those of you who have used Ignition before, I want to take about a minute to show what this looks like inside the Ignition Designer. The Designer is where we built everything you see in the client. If you've never seen it, it's a design environment for building screens and configuring alarming, historians, and all sorts of other things in Ignition. It's pretty much the place you go to configure everything, since Ignition is one integrated platform. So inside the Designer, I'm going to hop over to the predictions we were just looking at.
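Validation can start as simply as checking that every expected field is present, numeric, and inside a plausible range. The field names and ranges below are hypothetical placeholders for the weather features in the demo; set them from your own knowledge of the site and the sensors.

```python
import math

# Hypothetical plausible ranges for the weather features; tune them to your site.
EXPECTED_RANGES = {
    "temperature_c": (-40.0, 55.0),
    "relative_humidity": (0.0, 100.0),
    "sky_cover": (0.0, 100.0),
    "wind_speed": (0.0, 75.0),
    "pressure_hpa": (870.0, 1085.0),
}

def validate_row(row, ranges=EXPECTED_RANGES):
    """Return a list of problems found in one record (an empty list means it passed)."""
    problems = []
    for name, (low, high) in ranges.items():
        value = row.get(name)
        if value is None:
            problems.append(f"{name}: missing")
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"{name}: not numeric ({value!r})")
            continue
        if math.isnan(number):
            problems.append(f"{name}: NaN")
        elif not low <= number <= high:
            problems.append(f"{name}: {number} outside [{low}, {high}]")
    return problems


good = {"temperature_c": 21.5, "relative_humidity": 40, "sky_cover": 10,
        "wind_speed": 3.2, "pressure_hpa": 1015}
bad = {"temperature_c": "n/a", "relative_humidity": 140, "sky_cover": 10, "wind_speed": 3.2}
print(validate_row(good))
print(validate_row(bad))
```

Returning a list of problems rather than raising on the first one makes it easy to log everything wrong with a record and then keep only clean rows, for example [r for r in records if not validate_row(r)].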
Kevin: I'll go under Windows, then Main Windows, open Predictions right here, and run through these again to predict the power output. If you look behind each of these, it's sending information out. We've created scripts that make it easy to call out to Azure; this one is a run-machine-learning setup. For folks who already use Ignition, if you were doing this in scripting, you could call project.Azure.runmachinelearning and pass in your URL, your API key, and your data set, and that gives you back the results from Azure Machine Learning Studio. If you're an expert with Ignition, you'll notice that all we're really doing is sending a POST to that RESTful web service endpoint. I don't want to get too far into the technical details behind the scenes, but if you'd like to discuss any of this further with us at Inductive Automation, we're happy to do that as a follow-up to this webinar, help you get started on your project, and give you some guidance. With that, I think it goes back over to you, Travis, right?
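For readers curious what "sending a POST to that RESTful endpoint" involves, here is a hypothetical stand-in for a helper like the one described, written in CPython 3 with urllib. It is not the project script from the demo (Ignition 7.9 scripting runs on Jython 2.x, so the real script would differ), and the bearer-token Authorization header is our assumption about how the API key is passed; confirm it in your service's documentation. The URL and key are placeholders.

```python
import json
import urllib.request

def build_request(url, api_key, payload):
    """Build a JSON POST request that authenticates with a bearer-token API key."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def run_machine_learning(url, api_key, payload, timeout=30):
    """Hypothetical helper: send the data set, return the decoded JSON response."""
    request = build_request(url, api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholders only; use the real URL and key from your web service's settings.
SCORING_URL = "https://example.invalid/score"
API_KEY = "your-api-key"
req = build_request(SCORING_URL, API_KEY, {"Inputs": {}, "GlobalParameters": {}})
print(req.get_method(), req.full_url)
```

Separating request construction from the network call makes the request easy to inspect and test without contacting the service.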
Travis: Yeah. As you probably noticed, Kevin used Ignition in his demo to show how to get information from Ignition to Azure and back so that we can see the results. One thing to note is that his data set was fairly small. The larger the data set, the more information we can feed into the system, and as long as that data is good quality and sanitized, the better the results we'll get from these kinds of systems. But the process really isn't that complicated from end to end. Understanding the right algorithms takes some detail, but there is a process by which you can make this happen. Ignition isn't a machine learning platform itself, but it can be a great asset in creating an effective overall solution: it has a lot of the data you need, and it can connect to those platforms very easily, as Kevin showed.
Travis: Also, a lot of these machine learning platforms, Azure as well as AWS, now let us use the cloud to create models and then run those models locally on a PC. That way we can feed live information through them and do predictions and anomaly detection locally, without constantly streaming a lot of data to the cloud, while still using the cloud to work through the potentially gigabytes or terabytes of data you might have. So there are a lot of ways to get information into those systems easily so we can use it. Another important point is that the upcoming Ignition 7.9.8 release will make some of these machine learning algorithms easier to access through our scripting language. Kathy, you were a big part of this; can you explain more about it?
Kathy: Yeah, this is really exciting. It came from a suggestion on our user forum. We've added 19 different scripting functions to help with the statistics portion, things like finding the standard deviation, the mode, and the median. In adding those scripting functions, we've also given you access to the entire Apache Commons Math library, which means it's going to be very easy to pull its neural networks, classifiers, and genetic algorithms into your scripts. So this gives you some very powerful machine learning tools that you can use in any of your scripts starting with 7.9.8.
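To show what these summary statistics compute, here they are with Python's standard statistics module. This is plain CPython for illustration, not Ignition's new scripting functions, whose names and signatures are not reproduced here; in particular, check the Ignition documentation for whether its standard deviation is the sample or the population version. The readings are made up.

```python
import statistics

def describe(values):
    """Summary statistics of the kind the new scripting functions cover."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# Made-up sensor readings for illustration.
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(describe(readings))
```

The sample standard deviation divides by n - 1 and is the usual choice when the readings are a sample of a longer process; statistics.pstdev gives the population version.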
Travis: Perfect. To recap the main points: machine learning is related to analytics and artificial intelligence. There are two main machine learning modes, as Kathy mentioned: classification and regression, and it's important to understand both so you know which one you'll be using. There are many ways to apply machine learning in industry, but the number one area, of course, is predictive analytics and predictive maintenance, and we see a lot of customers going down that path. Again, to get started, you need a lot of data. If you don't have a lot of data, start collecting it today; Ignition can help you do that. You also really need a dedicated person who is qualified to understand the process, and a team that can extract, transform, and load the data and get it into the right shape so that we can sort through it and use it with these algorithms.
Travis: Other steps include picking a question to answer: what are you trying to use machine learning for? Is it predictive analytics, where you're trying to predict whether a motor is failing? Are you trying to forecast your power consumption? Whatever the question is, you have to use domain knowledge to understand how to answer it correctly. Again, use the right tools to extract, transform, and load; Ignition can certainly help with that, and there are other tools out there as well. We have to make sure the data is consistent and has all its context, because we don't want to feed bad data to the algorithms, especially in the cloud. We also want to visualize the data; it's very important. As Kathy mentioned, there's no excuse not to visualize data, because there are a lot of tools for it. Ignition can certainly do that, and these cloud platforms make it easy too. Visualize as you go.
Travis: As Kevin showed, it's very important to look at the results after the algorithm has run to determine whether it's actually effective, and to try a number of the different algorithms out there. Choose the right platform, or the platform you want to try and work with, and definitely test those models. Keep coming back to them, retrain, and look at them again and again until you're confident you're getting the right results. And again, Ignition has the interoperability and capabilities to be a good asset in all of this. Before we get to the end, Kathy and Kevin, do you have any closing advice about successfully implementing machine learning? Kathy?
Kathy: I would say this: we've hit on domain knowledge a lot, and I really want to hit it again. You need to think carefully about what incoming data will be useful to your project and what data you want to get out, and the only way you can do that is by really understanding the problem.
Kevin: Sure, and I'll just add this: in my own journey exploring machine learning, I tried to present it here in a pretty simple way, but what I found is that the more you know, the better things will go. Really build that knowledge about machine learning, about the algorithms, and about how things work behind the scenes, rather than just throwing data at an algorithm and hoping it works. If you really understand it, that will set you up so that this is as effective for you as possible. So much is moving in this direction. It's been part of the keynotes at Google, Amazon, Apple, and others for a really long time, because it's so powerful. But you need to treat it that way: with respect and with curiosity. Really go for understanding how it works, rather than seeing it as just another tool. See it as an amazing technology that we'll be able to use in manufacturing, in industrial settings, on IoT platforms, and in everything going forward.
Travis: And to finish with that, I'd say don't be afraid to fail. You need experience and knowledge of machine learning, but you also need to learn how to use the tools effectively. So try them. In an afternoon you can get some data out there, get some models created, and get information coming back to your system, and who cares if it's bad or good? You'll have learned the process, and once you understand the process, start learning the algorithms, and see how it all fits together, you'll be able to approach it in a more sophisticated way. With that, thanks for watching today. We're going to move to the Q&A section in a moment, but first: if you'd like more Design Like a Pro tips, we have many webinars, white papers, and other resources on topics such as alarming and HMI optimization. You'll find much more in the resources section of our website, inductiveautomation.com. I also invite you to download Ignition and try it for yourself. It takes about three minutes to install, and it's absolutely free to download; you can run it in trial mode to fully evaluate it and get a proof of concept running, both collecting data and interfacing with these machine learning systems.
Travis: We also have a lot of training videos available at Inductive University. It's a free learning platform with over 21 courses that can guide you through the steps to earn your Ignition credential. There are specific courses and videos on talking to web services, such as open REST-based APIs, and of course ones about extracting data from the historian and using it. So there's a lot there to help you get information from Ignition into these systems. In addition, there's a wealth of information on our website and in our documentation, and we urge you to take a look. Kevin and I are happy to help as you go forward.
Travis: Also, if you're interested in a personalized demo of Ignition, please contact us. As you can see, Kevin's and my contact information is on the slide, along with that of all our account executives here at Inductive Automation, so feel free to reach out. With that said, it's time for Q&A. I'm sure there will be lots of questions, but to get us started: Malcom asks which machine learning engines are compatible with Ignition. Kevin, over to you.
Kevin: Sure. Pretty much everything that's out there. Most platforms that exist today are available online. In addition, for the libraries and lower-level code, you can normally interface with those from external systems, or through Ignition's Python scripting and Ignition's Java support through the SDK. You can normally make connections to just about everything out there.
Travis: And Kathy, one question for you. Samuel asks: Will machine learning be used to monitor the security of networks within the plant?
Kathy: Absolutely. This has actually already been implemented. A number of my fellow students in the grad program were working on using machine learning to detect anomalies in network usage and to detect intruders in networks. It's a very exciting area in network security, and I'm sure it will transfer over to plants.
Travis: There are a lot of questions about whether the session will be recorded. Absolutely; it'll be available online, along with the PowerPoint. We also got a bunch of questions asking, Kevin, whether that project will be available to share.
Kevin: Yes, absolutely, we'll make that available.
Travis: But you'll need to contact us so we can have a session to go through it a little bit.
Kevin: Yeah, it needs a little bit of context, so we'd like to be able to give you an orientation.
Travis: Here's a question from Rod: Do you have guidelines for the time required for a machine learning project that uses time series data? How much data is needed to produce a prediction? Kathy?
Kathy: It's really going to depend on your project. Some projects need a huge amount of data, and some don't. Unfortunately, this is a case of starting with the data you have and seeing whether you can get good predictions; if you can't, try to acquire more data.
Travis: This next question goes back to the algorithms we talked about quite a bit today. There are over 100 different algorithms out there that can be used. Is there a process, or are there examples we can go through, to select the best-fit algorithm?
Kathy: Right. Well, all those algorithms generally fall into broad categories. We talked about the general categories of classification versus regression. Within those there are more specific categories, and as you learn them, you'll find out which ones are better for certain types of problems. For example, my master's project was on image classification. For image classification, the categories that work really well are neural networks and support vector machines, so you can narrow it down very quickly and then try a few algorithms within those.
Travis: Thank you. This one is from a records management viewpoint: for historical, time series data, how long do we keep the data? And does deleting past data have a detrimental effect on machine learning? I think what he's getting at is that we're going to train our model with a lot of this historical data. Once we have that model, do we just keep using it forever, or what do we do with new data?
Kathy: Right. Once you've trained your model, you don't necessarily need your historical data, because the model is the encapsulation of that data. But if your processes change over time, or you accumulate more experience that could be helpful, you may want to retrain your model, and in that case having the past data available can be helpful. This is part of why you might move that past data into another database or other storage, so that it's available for your machine learning project.
Travis: Okay, here's a question from Marcello: Is it possible to change any of the model parameters, like the training algorithms, and so on?
Kevin: Yeah, absolutely. Azure Machine Learning Studio, and pretty much every other platform, has tuning parameters. I didn't show any of them, but you can tune just about every algorithm.
Travis: We have time for a few more questions. There's a question from Sam: "Is there a function in Azure to run through the available machine learning algorithms and find the one that scores best? Or does it have to be trial and error, like you were talking about?"
Kevin: It's been trial and error in the past. Microsoft is always adding new things to Azure Machine Learning Studio, so I don't know if anything's been added in the last few months, but it used to be trial and error. Normally, though, you have a bit of experience with these algorithms, so you can pick the ones you think will be best and don't have to go through all 100 or 30 that are in there.
Travis: Again, it's important to understand the machine learning algorithms. One thing about Kevin's demo: the algorithm ran very quickly because there wasn't much data. If you have terabytes of data, an algorithm could take hours or days to run, so you don't really want to run all 100 of them to figure out which one is best. It's better to make an educated guess about which ones to use, because running through those models can take a while. Alright, here's a question from Ryan: How many positive data points are needed for a good model? In other words, can machine learning be used to correctly identify rare conditions? Kevin? Kathy?
Kathy: Machine learning can do a couple of different things here. First, some algorithms are very good at detecting outliers, so maybe your rare occurrences are outliers and you want to find those. Otherwise, if it's something that only happens rarely, you need to think about how to make it less rare in your training data. Maybe you oversample that rare occurrence so that the machine learning algorithm can really figure out the common characteristics of that rare occurrence.
Travis: Okay, another question from Ryan: Is there a good algorithm for time series data? Or a way to transform time series data into something that algorithms can really use?
Kevin: Yeah. For the solar farm, I transformed the time series data into something that had the hour of the day, so 1 to 24, or 0 to 23. That allowed data from one day to be compared with the next in a way that machine learning algorithms understood. That's a really good way to handle time series data, depending on how you're using it. Or, if you're working from the start of an event, you could use the time since the event started: one minute, two minutes, three minutes. Those kinds of transformations can be really useful for predicting things in the future.
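Both transformations described here take only a few lines with Python's datetime module. The sketch below derives an hour-of-day feature (0 to 23) and, when an event start is given, the minutes elapsed since that event. The timestamps are invented, and the feature names are hypothetical.

```python
from datetime import datetime

def time_features(timestamps, event_start=None):
    """Turn raw timestamps into features that line up from one day to the next."""
    features = []
    for ts in timestamps:
        row = {"hour_of_day": ts.hour}  # 0-23, comparable across days
        if event_start is not None:
            # Minutes elapsed since the event began, for event-driven processes.
            row["minutes_since_event"] = (ts - event_start).total_seconds() / 60.0
        features.append(row)
    return features


readings = [
    datetime(2018, 4, 12, 0, 0),
    datetime(2018, 4, 12, 13, 30),
    datetime(2018, 4, 13, 13, 0),
]
for row in time_features(readings, event_start=datetime(2018, 4, 12, 0, 0)):
    print(row)
```

Note that 13:30 on one day and 13:00 on the next both map to hour 13, which is what lets the model compare days with each other; convert timestamps to a single time zone first, or the hours will not line up.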
Travis: I think this is the last question, and then we have to wrap up. For self-training on machine learning, can I use downloaded data in Excel or CSV format stored on my PC, or does it have to be live data? That's a question from Robbie.
Kevin: Sure, you can use CSV format. The training data set I showed was in CSV format. You could stream data from Ignition or somewhere else, or you can use pretty much any format with most machine learning algorithms.
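As a starting point for self-training, a CSV file (for example, a spreadsheet saved from Excel as CSV) can be loaded with Python's standard csv module. The sketch below assumes a header row, one label column, and all-numeric features; the column names and file name are hypothetical. Blank or non-numeric cells raise a ValueError, so validate or clean the file first.

```python
import csv
import io

def load_training_csv(source, label_column):
    """Read a CSV with a header row into (feature dicts, label list).

    source is any file-like object; every non-label column is treated as numeric."""
    features, labels = [], []
    for record in csv.DictReader(source):
        labels.append(float(record.pop(label_column)))
        features.append({name: float(value) for name, value in record.items()})
    return features, labels


# With a real file (hypothetical name), you would write:
#     with open("training_data.csv", newline="") as f:
#         X, y = load_training_csv(f, "real_power")
sample = io.StringIO(
    "temperature,wind_speed,real_power\n"
    "20,3.5,12\n"
    "15,1.0,0\n"
)
X, y = load_training_csv(sample, label_column="real_power")
print(X, y)
```

Keeping features as dictionaries keyed by column name makes it easy to check that a file exported later still has the same columns before you train on it.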
Travis: Alright, that wraps up our session today. Thank you for attending. We'll be doing another one of these webinars on May 30th, and registration will be available soon. Stay informed about upcoming webinars and events by following us on Twitter, LinkedIn, and Facebook, and please sign up for our weekly News Feed on our website. Thanks for watching, everybody, and have a great rest of your day.