Ignition Diagnostics and Troubleshooting Basics

46 min video  /  39 minute read
 

Speakers

Cosmo Stevens

Software Support Engineer I

Inductive Automation

Corbin Harrell

Support Applications Supervisor

Inductive Automation

Ignition offers numerous built-in tools for gathering diagnostic information about the health of your system. This session offers an overview of these tools and explains how our Support Division leverages this information during the troubleshooting process. By the end of this session, fixing problems will feel like shooting code in a barrel.

Transcript:

00:10
Cosmo Stevens: Thank you. So today we are going to talk about Ignition diagnostics and troubleshooting. So, first of all, it's our Support Department's goal to provide industry-defining live support to our integrators and end users, or you guys. However, support doesn't always start with the support ticket. It starts with the resources available. That's why today we're going to provide an overview of the resources available in Ignition and how you can use that information to troubleshoot your own problems.

00:42
Cosmo Stevens: So, I'm Cosmo Stevens and that guy over there is Corbin Harrell and we've both worked in the Support Department for quite some time and we are very excited to share some tips and tricks on how we solve problems in Ignition. All right. So first let's talk about the troubleshooting structure. So, in the support department we like to break down our structure into four stages. So, the first stage is discovery. In discovery, we try to collect as much relevant information to the issue as possible.

01:11
Cosmo Stevens: So, this can include things like gateway network architecture, history of recent changes, tag structure, etc. You get the point. Next we work on identification. In identification, we take this information and we try to better define the root cause of the issue. Then in isolation we take the information from discovery and identification and we work to isolate that root cause. And then finally, once we've defined what the root cause is, we work to provide resolution, which can come in the form of an actual solve or sometimes just a workaround. Now, it's important to understand that these four stages aren't always purely sequential.

01:55
Cosmo Stevens: Sometimes you need to revert to a previous stage to push the process forward. You know, sometimes, I mean, you might think that you've found what the problem is only to realize that it's actually a small part of a much larger problem. All right. So let's take a look at the diagnostic resources available in Ignition. So, you're probably all familiar with the logs. Basically if you're not, the logs are a list of messages, errors, and warnings that tell the story of what's going on under the hood inside of Ignition. And then you have the thread dumps. So thread dumps are a snapshot of the current state of all executing threads inside of Ignition. And then you have the metrics dashboard.

02:38
Cosmo Stevens: So this is actually pretty cool. With the metrics dashboard, you're able to create and customize your own dashboards that reflect metrics within Ignition. So you can track things like CPU utilization, memory usage, database traffic, etc. And then you have the running scripts page. So, the running scripts page tracks all currently running threads that are executing user-written Jython scripts. So, these resources are all directly available inside of the gateway and most of them have an option for you to download. All right. So, today we're going to focus mainly on the logs and thread dumps, because they're the most commonly used resources in Ignition.

03:20
Cosmo Stevens: So let's talk about the logs first. All right. So, in Ignition there are two varieties of logs. You have the wrapper logs and then you have the IDB logs. The wrapper logs are plain text files that the wrapper service that runs Ignition creates. So the wrapper service sits outside the Ignition JVM, or Java Virtual Machine, so it can capture some information that the IDB logs don't have access to. So, let's say the Ignition JVM fails to start. You'd probably want to look in the wrapper logs for information on that. The IDB logs on the other hand, are generated by the Ignition JVM.

03:57
Cosmo Stevens: They're a SQLite database that contains the logs themselves as well as some additional contextual information about each log. So while the IDB logs don't have all the information that the wrapper logs might catch, they have some pretty useful pieces of information that you might want to be aware of. So for example, mapped diagnostic context keys, or MDC keys. MDC keys are basically key-value pairs that give you information about the context of the log itself. So for example, a common one that you might run into is project name. So if your log has an MDC key of project name, it will tell you which project that log is associated with. Another benefit of using the IDB logs: thread identification.

04:47
Cosmo Stevens: So, IDB logs can tell you which thread is associated with the log that you're troubleshooting, which means that if you're troubleshooting a problem you know is associated with a particular thread, you'll be able to identify that on the log itself. All right. So, both the wrapper logs and IDB logs are pretty easy to get a hold of. So, the wrapper logs are in the logs directory of the Ignition file structure. The IDB logs can be found on the gateway status logs page. Or you can just grab them both in the diagnostic bundle, which is in the gateway status overview page.

05:32
Cosmo Stevens: So, logs can be useful in all stages of troubleshooting. However, they're particularly useful in the identification and isolation stages. Ideally, logs will state the exact error that occurred. But even if they don't, the logger that logged the message can help you narrow down which subsystem is being affected by the error. All right. So, logs can have five different levels. So basically, the first one, we have error, which means that a system component isn't working and is likely interfering with other functionality.

06:09
Cosmo Stevens: Warn means that something unexpected happened and things might go wrong. Info just means that it's something of note. It's purely informational, hence the name info. And then we have debug, which marks information that's useful during software debugging. And trace is as close as you can get to a step-by-step walkthrough of everything that's going on under the hood. As you can probably tell, error is the most severe level and trace is the least. By default, Ignition stores logs with a minimum level of info. So info, warn, and error. However, you can change that minimum level in general or for specific loggers.
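For reference, here's a minimal sketch of what those five levels look like when logged from a script using system.util.getLogger. The logger name is arbitrary, and with the default minimum level of info, the debug and trace calls below are simply dropped until that logger's level is raised.

```python
# Minimal sketch: one message at each of the five levels, from a gateway-scoped script.
# The logger name is arbitrary; it just controls how the messages are grouped on the Logs page.
logger = system.util.getLogger("MyTroubleshootingLogger")

logger.error("A system component isn't working and is likely interfering with other functionality")
logger.warn("Something unexpected happened and things might go wrong")
logger.info("Something of note, purely informational")
logger.debug("Detail that's useful during software debugging")          # dropped at the default minimum level
logger.trace("Step-by-step detail of what's happening under the hood")  # dropped at the default minimum level
```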

06:50
Cosmo Stevens: So, let's say you're troubleshooting something involving missed tag value changes. You might want to set your tags execution actors logger to trace. This could give you the information that you need to solve the problem. All right. The next most common resource is the thread dump. A thread dump is a snapshot of the current state of all the threads that are executing in Ignition. Thread dumps contain name, state, estimated CPU, the ID, and the stack. So this information can be useful in identifying performance problems or what threads might be blocking other threads from executing.

07:31
Cosmo Stevens: You can obtain thread dumps through the thread dump page in the gateway, through a system function in a script, or using the jstack command provided by the JDK. So collecting a thread dump is simple enough. But once you open it up, you're faced with a real issue: making sense of a massive amount of cryptic data. So, to help understand what we're looking for in the thread dump, let's identify the different components of the thread itself. So, the first element is the thread name. So, the thread name is generated, or logically assigned, when the thread is created, and it usually contains information about the system that created it or the thread's purpose itself.

08:15
Cosmo Stevens: Then we have the ID, which is unique and will basically let you track that thread's movement through the different thread dumps. All right. Then we have the state. So, the possible states we have are new, which means that it hasn't yet started to execute, runnable, which means that it's either currently running or it's able to run. You have blocked, which means that it wants to run, but it can't because the lock required for it to run is being used by something else. Waiting means that it's called a wait or join method and is waiting indefinitely for another thread, and then timed waiting means that the wait or join call included a timeout parameter, so it's only waiting for a set amount of time. And then terminated just means it's completed executing.

09:04
Cosmo Stevens: All right. And then we have CPU usage, which is basically just an estimated percentage based on that thread's recent runtime. This can be pretty useful if you're trying to find out what threads are resource intensive. And then we have the stack. So the thread stack contains the stack trace of the process that's being executed by the thread, and this is probably the most useful part of the thread object itself because it provides a window into what systems are involved with this thread. So it's important to remember that thread stacks are read from the bottom up. So the bottom line is the original or top-level function, and every line above that is a function that was called by one of the functions below in sequence. Most threads have smaller stack sizes because they're actually just waiting to do something.

09:58
Cosmo Stevens: They haven't really done anything yet. And then threads with larger stack traces are usually doing something meaningful. So if we're troubleshooting a performance issue, we usually want to collect three or four thread dumps and then look for larger stack traces because this will tell us that that thread is doing something that's taking a good amount of time. And then once we've actually identified that thread, the function calls and classes present in the stack trace can point us towards the different systems involved in that thread. All right. So most of the resources available in Ignition can be opened with simple tools like Notepad. However, if you're troubleshooting complex issues that involve multiple files that all require deep analysis, you're going to lose your mind if you're using Notepad.

10:47
Cosmo Stevens: So probably don't do that. That's why Senior Software Developer and Support Department alumnus Paul Griffith created Kindling. Kindling is an open-source tool specifically designed to make analyzing diagnostic resources easier. And we love it. So it offers special views for both the IDB and wrapper logs, thread dump viewers, store-and-forward caches, gateway backups, generic IDBs and the metrics IDB. So we don't have enough time to cover all the different tools involved in there, but we're going to cover some of the more commonly used ones. So as mentioned before, the IDB logs contain a lot of information that can't be found in the wrapper logs. Now, IDB logs can't be opened in Notepad.

11:36
Cosmo Stevens: I mean, you can try, but you're just going to get a bunch of gibberish. You can open it in something like DB Browser, but then you have to make your own queries and nobody here wants to do that. So, Kindling provides a clean interface for viewing IDB log files. Additionally, it supports filtering based on time, logger name, logging level, MDC key values and even the associated thread. If for some crazy reason you still want to query it, it has the generic IDB viewer that you can use. Another useful tool inside of Kindling is the thread dump viewer. So typically when you analyze thread dumps, you want to take three or four thread dumps and compare them.

12:19
Cosmo Stevens: And so each thread dump contains, you know, about 100 threads and each thread has a stack size ranging from 10 to 100 lines. So we're talking about thousands of lines of data across multiple files. Please don't do this in Notepad. I've done it before and I seriously hope I never have to do it again. It's terrible. On the other hand, Kindling has a thread viewer tool that sorts all the threads into a table and you can sort and filter based on the different elements of the thread. So if you prefer Notepad, that's fine. But let me make this analogy. Let's say you're working on a car. So, there are hundreds of tools that you can choose to use to work on your car.

13:00
Cosmo Stevens: You can use hand tools and a jack or you can use power tools and a hydraulic lift. Now, I mean, depending on what the problem is, you can probably do it with hand tools and a jack. But if you have hydraulic lift available and some power tools, it would probably be a lot easier. So the pros use the best tools possible for the job and so should you. Troubleshooting software problems is very much the same. If you want to solve the problems like a pro, use the tools that the pros use. Kindling is both free and it's available on the Inductive Automation GitHub page and both Paul Griffith and our Support Department's applications team continue to maintain and improve the tools used in it.

13:43
Cosmo Stevens: Okay. So we've taken a look at the troubleshooting stages, the diagnostic resources, and the tools we use to read them. I think it's time we see them all in action. So I'm going to hand it off to Corbin here and he is going to troubleshoot a real-life problem right here in front of you all. Thank you.

14:03
Corbin Harrell: Thanks, Cosmo. All right. So in this example issue, the customer has emailed in with some issues that they've noticed with their tag history trends. They've noticed that during the busiest parts of the day, their trends go completely blank for a few seconds at a time up to a minute. So before we dive into this issue, we need to discover enough information about the customer's gateway that we don't make any dangerous assumptions while troubleshooting the issue.

14:31
Corbin Harrell: So the first piece of information that I'm going to ask of the customer is their gateway architecture. Their architecture can have a huge impact on how various systems work. For instance, tag history. Whether the gateway is distributed, hub and spoke, or just a single gateway is going to have a huge impact on how we would expect tag history to work. So this customer lets us know that they just have a single gateway that's connected to a MySQL database that's hosted on another server but in the same network.

15:01
Corbin Harrell: All right. Now that we know the architecture, we can get information about the resources available to these servers. So, in Ignition and pretty much all applications, it's important to make sure that the servers running them have enough resources that the servers can perform as expected. An underallocated gateway can cause a whole host of issues. So, we want to make sure that they have the resources to perform as expected. This customer lets us know that they have a server with 12 gigabytes of RAM and the gateway has eight gigabytes allocated to it. Taking a look at the metrics dashboard, or, sorry, status page, we can see that the memory trend shows that usage never gets above four gigabytes.

15:55
Corbin Harrell: So, since memory usage stays well below what's been allocated, we know that memory doesn't seem to be an issue with this gateway. Similarly, the CPU usage doesn't seem to spike at all and there are no clock drift warnings in the logs. So, it seems like this gateway has the appropriate amount of resources to function as expected. So, now that we know that the gateway should be able to perform as expected, we need to understand what we're trying to do with this gateway. So, now we're going to ask the customer what subsystems and modules they're using. Whether a gateway is using Perspective or Vision or both, or Ignition's OPC UA server or a third-party server, makes a huge impact on how we expect these systems to interact with each other.

16:35
Corbin Harrell: So, this customer lets us know that they're using Perspective for visualization. They're using Ignition's built-in OPC UA server connected to five devices. They have tag history being stored for 10,000 tags. They're using Perspective to visualize that tag history as well as interact with their devices. All right. So now that we have a baseline understanding of how these systems are expected to behave, we can move on to identifying the issue that's causing these tag history trends to go blank. So, some important questions we need to answer during this stage are what exactly are these graphs actually querying for? Are there errors in the gateway logs that correspond to when these graphs go blank?

17:22
Corbin Harrell: And if not, are there loggers we could set to trace or debug to gain more context as to why they're blank? So, to answer these questions, we asked the customer to send over a project export as well as an export of their gateway logs. The first thing we're going to look at is their project export. So, taking a look at the project export, we were able to find the view that they were having issues with. The charts that are displaying their tag history are XY charts with their data property bound to a tag history binding. So, this is a pretty common way of handling tag history dashboards. So, nothing about this setup looks particularly problematic so far.

18:01
Corbin Harrell: Looks like they have a polling rate of five seconds, so they're getting live data, and they're querying over a period of three hours. Pretty reasonable. So, let's go ahead and take a look at the logs and see if there are any errors that might explain the behavior. So, this is Kindling's IDB log view. As you can see, there's just a ton of information here. Not sure if you can make it out, but there's 54,000 events in these logs. That's just way too much to look through. Additionally, it looks like the customer has some sort of script logging some custom values. It's probably useful to them, but not to us.

18:40
Corbin Harrell: So, we need to figure out a way to sort through these logs to get at the information we're actually interested in. Let's use some filters. So, the first filter I'm going to apply is the time filter. So, the customer's screenshot included the last time that this issue occurred. So, we're going to set the start time of this time filter to just a few minutes before the issue occurred. That way, we get a few events leading up to the issue, and we can also see the time that the issue occurred and a few minutes afterwards as well.

19:13
Corbin Harrell: That's dropped down the logs to just around 5,000 from 54,000. That's pretty good, but we can get better. So, the next filter I'm going to apply is the level filter. So for this initial run through of the logs, we're really just interested in error messages. This is because error messages generally contain the most information per message compared to all the other logging levels. So, for this initial run-through, we're just looking for those errors. But on subsequent look-throughs, if we don't find enough information in the errors, we might look at info and warn as well. For now, just going to apply that error filter. And we can see it's dropped down to just 3,000 events. Now there's one more filter I'm going to apply here. And we're going to take advantage of MDC keys. So, MDC keys are extremely useful. So, this customer sent us a project export so we know exactly what view encountered the issue of the tag history trends going blank. This means we can use the MDC key filters to apply a filter for only logs that are related to that view.

20:20
Corbin Harrell: So, the way we do that is first we select the first drop-down. And we select the view key. So, this key lets us know that it's just going to be for Perspective views. Then we select the second drop-down, which are all possible associated values for that key. We're interested in the dashboard value, which is the view that the customer is having issues with. So after applying that filter, we can see that we're just down to 251 events. That's way more reasonable than the original 54,000. And what's more, it looks like the events are all pretty much the same. So we're just going to take a look at these top two errors and figure out why these trends went blank. So, this first error is a database connection faulted error. The second error is an error executing historical tag reads. It stands to reason if the database that this tag history is stored in has a faulted connection, we're not going to be able to get the tag history. But why is this database connection becoming faulted to begin with? Well, to answer that question, we can look at the exception that was thrown that generated this error message.

21:29
Corbin Harrell: That's in this stack right underneath the message itself. So we can see that this exception says, "Cannot get a connection, pool error timeout waiting for idle object." So that sounds pretty cryptic. But let's back up and explain how database connections in Ignition actually work. So, when we have a defined connection in Ignition, that connection has a pool of available connections that can all run queries in parallel to the database. This is really useful so we don't have to wait for one query to finish before we run the next. But that pool has a max size by default. This is to prevent potential resource issues in both the gateway and database. So, we seem to be running into an issue where all of these connections are being taken up. Once that happens, new queries don't just immediately fail, they have to wait to see if one of those connections becomes available. If they wait for too long, then they throw this error. So, both of those settings, the max size of the connection pool and the time that it waits before it throws an error, are configurable in the database configuration settings.

22:40
Corbin Harrell: So, we ask the customer if they've changed these, and they have not. So, we know that there's a max of eight connections by default with a max timeout of five seconds. So, now we know that this error is being caused by the database becoming faulted. We know that the database is faulted because all of the connections are taken up. But we don't know what queries are actually taking up that connection. So, we're going to have to isolate what is causing the database to fault. So, in order to do this, we ask the customer to take a screenshot of their gateway status page under the databases section, where they can click on the database that's faulting and see all of the connections and all of the queries running on those connections. They've done that and sent that over. And we can see that, indeed, all eight connections are being taken up. And it looks like all of the connections are running this stored procedure, this call sensor stats. What's worse is that if we look down in the longest recent queries, we can see that stored procedure has taken around 15 seconds to run in the past.

23:49
Corbin Harrell: That's a pretty long query, and we're running it an awful lot. So we can see why all of these connections are getting taken up. They're waiting for these queries to finish. New queries are coming in, and they're failing because they're waiting for the max timeout. But what is triggering these stored procedures? And how can we find out? Well, the first and most important resource is just asking the customer. They are going to have the best idea of their project structure. So they should probably know where this stored procedure is being called from. The problem, though, is that projects are big. We don't expect you to actually know where all of your queries are coming from. Because in Ignition, they can come from anywhere. Unfortunately, this customer also doesn't remember where this stored procedure is being triggered from. So we're going to have to use some other diagnostic resources to narrow down what we're looking at. So, for this instance, we're going to ask the customer to take some thread dumps. The thread dumps should allow us to look at each thread and see if it's busy running a query.
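As a rough illustration of the scripted option mentioned earlier, a hedged sketch like the one below captures a few dumps spaced several seconds apart, assuming an Ignition 8.1 gateway where system.util.threadDump() is available and a hypothetical output folder. Running it from the Designer's Script Console keeps the sleeps from tying up a gateway thread.

```python
# Hedged sketch: capture four thread dumps, ten seconds apart, and write each one to disk.
# The output path is hypothetical; adjust it for your environment.
import time

for i in range(4):
    dump = system.util.threadDump()  # returns the current thread dump as a JSON string
    system.file.writeFile("C:/temp/threaddump_%d.json" % i, dump)
    time.sleep(10)  # space the captures out so changes between dumps are visible
```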

24:53
Corbin Harrell: So, the customer has done that the next time the issue occurred and sent us over a series of thread dumps. And we can now take a look at them in Kindling. All right, so this is what we're going to see in Kindling. So the customer sent over a zip archive. Now, Kindling actually allows us to open up a zip archive without having to unzip it first. This is really useful if you're taking a look through gateway backups, which are also in the archive format, or the diagnostic bundle that you can get from the gateway overview page. So, opening up this zip file, we can browse the contents in this file browser to the left. Clicking one of those files opens up the corresponding tool for that file, in this case, a thread dump view tool. We can also select all of the files in this zip archive that are of the same format and right-click and select "open in aggregate view." This gives us a really useful multi-thread view tool. So the multi-thread view tool has a few advantages over the normal view. It has this state column, which shows how the thread state has changed across all of the thread dumps. It also gives us a max CPU and max depth columns.

26:11
Corbin Harrell: These tell us the max values found across all of the thread dumps. These tools are really useful when looking at a large series of thread dumps to narrow it down to just one or two thread dumps that are actually encountering the issue. So let's take a closer look at these threads. Now, we don't want to have to look through every single thread, because there's hundreds of them and they have big stack traces. So we need to figure out a way to filter it down to just the threads that are related with our database query. So the way we're going to do this is by using the search bar at the top right. We're going to search for a key term.

26:48
Corbin Harrell: So, this is a key term that we're going to expect to be in the stack trace or the thread name. In this case, we're going to search for MySQL. This is because the database connection is MySQL, and therefore, Ignition is going to be using the MySQL JDBC driver to make queries. So somewhere in that stack trace, we're going to expect a class or function name mentioning MySQL. After applying that filter, we do indeed see that it's narrowed it down quite a bit. But there's definitely more than eight threads in this list of threads.

27:19
Corbin Harrell: So we're going to want to narrow it down just a little bit more. So the next filter we're going to apply is a state filter. The threads we're actually interested in should actively be running queries to the database. So that means they should be in the runnable state. I'm going to apply that filter, and now we're just down to 12. But 12 is still more than eight. At least, I think it is. So why are we seeing more than eight threads here?

27:47
Corbin Harrell: So if we look at the state column, we can actually see that although there are more than eight threads, only eight of those threads are actually in the runnable state across each thread dump. So that confirms that all eight of our database connections are being taken up by these Perspective worker threads. So we've narrowed this down to a Perspective issue. But we need to figure out what resource is actually causing these queries. So by selecting one of these threads, we can take a closer look at each thread across each thread dump. So remember, this is the same thread, just captured at four different times. So we can see that the stack trace looks like it's basically the same across all of the thread dumps.

28:32
Corbin Harrell: So let's take a closer look at that stack trace to see what's going on. So wow, that's a bunch of garbage, right? We have these Java function names, which I don't know if you've ever programmed in Java, but they are long. So it's important not to get intimidated here. When reading through a thread stack, you need to start at the bottom. So the bottom's always going to start with this super generic function, something like java.lang.Thread.run. This is just the function that's responsible for generating the thread to begin with.

29:07
Corbin Harrell: So as we move our way up, we can get to more specific class and function names. So up here, we see inductiveautomation.perspective.gateway.binding.transform.script. Okay, yada, yada, yada. What this is telling us is that this is a script transform thread being run. So somewhere in our customer's project, there's a script transform that's running and generating these stored procedure calls. So now we have to work on isolating where that binding is. So I guess it's time to grab an export of the customer's project, look through every single Perspective project they have, open every single view in that Perspective project, open every single component with a binding in that Perspective project.

29:53
Corbin Harrell: But that's going to take forever. We don't have time for that. So let's take a step back and try to think of a way to better isolate what resources are triggering these queries. We're going to look for some loggers that we could potentially set to trace or debug to get more information about these queries. So after browsing the Inductive Automation forums, which is a great resource by the way, we find this post by, well, once again, Paul Griffith. So he's telling us that the gateway database updates and gateway database selects loggers, when set to trace, will log every single query that comes out of Ignition. So that's going to be a lot of info, but it should have the information that we need about these queries.
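One hedged way to flip those two loggers from a script is system.util.setLoggingLevel. The logger names below are written the way they were spoken in the session, so confirm the exact registered names on the gateway's Logs page before relying on them.

```python
# Hedged sketch: raise the query loggers to trace while reproducing the issue, then restore them.
query_loggers = ["gateway.database.selects", "gateway.database.updates"]  # confirm exact names on the Logs page

for name in query_loggers:
    system.util.setLoggingLevel(name, "trace")

# ...reproduce the issue and export the logs...

for name in query_loggers:
    system.util.setLoggingLevel(name, "info")  # back to the default minimum so the logs don't churn
```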

30:38
Corbin Harrell: So we ask the customer to set them to trace. They grab a capture the next time they encounter the issue. We're going to take a look at that now in Kindling once again. All right. So as we can see, there's a whole bunch of data again. But now we know specifically which loggers we're actually interested in. So we're going to use a logger filter to just turn off all the loggers except for the gateway database selects and updates loggers.

31:04
Corbin Harrell: So those are the two loggers we set to trace. We can see there's a whole bunch of queries going on. And right there in the middle, we see that call sensor stats query. Clicking on this query, we get some additional information down at the bottom. We get the thread that logged that statement. And then we also have this magnifying glass, which when we hover over it, gives us a list of the MDC keys and values associated with this log event. So, these MDC keys not only have the Perspective project, but also the view, the component, and even the property that has this binding. So that's just made our life a whole lot easier. All right.

31:43
Corbin Harrell: Now that we've isolated the issue, we can move on to finding a resolution. So there's a couple ways we could approach this. First, we could simply increase the maximum number of connections that Ignition can make to the database. Right now it's set to eight. But those eight are getting filled up, and we're getting this error. So maybe we should just increase that number. So this is probably a good solution to consider. But we have to rule out two other elements here.

32:10
Corbin Harrell: So first, is this query actually supposed to take this long? 15 seconds is pretty long for a query. Nowhere near the longest we've seen, but pretty long. The second question we have to answer is, why is this query being triggered so often? So we might come back to increasing the maximum number of connections, but we want to rule out other factors here. So for reducing the query time, this is something we'd ask of the customer's database administrator or architect. They're the ones that know their database best. They're the ones that are experts in constructing queries that are fast. We work with databases pretty much on a daily basis, but we're not the experts. We're experts in Ignition. So we might ask the customer to take a look at this on their own but in the meantime, let's take a look at what's triggering these queries and why they're running so often. So as we found earlier, the query is being triggered from a script transform. This script transform is on an expression binding. Expression structure binding, sorry. So expression structure bindings are awesome.

33:15
Corbin Harrell: They allow you to track multiple properties and reevaluate when any of those properties change. Now, they're a super useful tool if you need to live update values whenever any of those properties change. But for a customer that's trying to generate a report, is that really what they're looking for? Normally, when I'm going to generate a report, I have some parameters in mind that I'm actually interested in seeing for the report. The way that this is set up, every time any of those parameters changes, we have to reevaluate the report.
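To make that failure mode concrete, here is a hypothetical reconstruction of such a script transform. Every name in it (the stored procedure, the database connection, the structure keys) is invented for illustration; the shape is what matters: the procedure runs on every single re-evaluation of the binding, holding a pooled connection for the full duration of the call.

```python
def transform(self, value, quality, timestamp):
    # Hypothetical script transform on an expression structure binding.
    # 'value' holds the structure's members; these keys are invented for illustration.
    call = system.db.createSProcCall("sensor_stats", "MySQL_DB")
    call.registerInParam(1, system.db.INTEGER, value["lineId"])
    call.registerInParam(2, system.db.VARCHAR, value["shift"])

    # This fires every time ANY property referenced by the structure changes,
    # and each call occupies a pooled connection for as long as the procedure runs.
    system.db.execSProcCall(call)
    return call.getResultSet()
```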

33:47
Corbin Harrell: So after talking with the customer, they agree that they don't really need to see the report reevaluate anytime any of these parameters change. They're only interested in seeing the report once all of the parameters are set to what they actually want. So we restructure this view so that the stored procedure is triggered by a button press instead of this expression structure binding. In addition, we're going to ask that the customer increase the max number of connections open to this database. This is because if they're running into really expensive stored procedure calls, they probably also have a bunch of other expensive queries.
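A hypothetical version of that reworked approach might move the call into a button's onActionPerformed event script, so the report only runs when the operator explicitly asks for it. Component and procedure names are again invented, and input parameters are omitted for brevity.

```python
def runAction(self, event):
    # Hypothetical button event script: run the expensive report only on demand.
    call = system.db.createSProcCall("sensor_stats", "MySQL_DB")
    system.db.execSProcCall(call)

    # Push the results onto a sibling chart's data property (names invented).
    self.getSibling("SensorStatsChart").props.data = call.getResultSet()
```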

34:25
Corbin Harrell: Increasing the pool also makes sure that they have enough buffer room in that connection pool so that they can run those expensive queries and also run their tag history queries without potentially running into any issues. So after implementing those changes, the customer hasn't noticed any more issues with their tag history. All right. So to solve this problem, we had to go through all four stages of troubleshooting.

34:57
Corbin Harrell: So we first discovered how this project was supposed to behave. Then we were able to identify that the tag history trends were going blank because the database connection was faulting. We were able to isolate the cause of that fault to a stored procedure being triggered by a script transform. Finally, we were able to resolve the issue by redesigning the view so that the stored procedure wasn't called too many times. So by applying these troubleshooting stages, as well as understanding the diagnostic resources available to you and the tools you have to use those diagnostic resources effectively, you can elevate your troubleshooting abilities to the next level. Thank you for coming. And we'll now be taking questions.

35:53
Audience Member 1: This was a real troubleshooting drill. Can you give us an idea of the time frame start to finish? It seemed like you went back to the customer a few times, asked for logs, asked for screen dumps, things like that.

36:05
Corbin Harrell: Yeah, so this example issue was actually recreated. So it wasn't a real-life customer. But we have seen this specific issue encountered by customers pretty often. So we will typically find that these connection pools will get filled up if we're running a lot of expensive queries. And oftentimes, those expensive queries are running a lot more often than they should be. But yeah, you bring up a good point. And that's that we had to kind of bounce back and forth with the customer. Now, we really like live session support, getting on the phone, because we can normally figure out these questions and collect all the resources we need pretty quickly. But sometimes, the email format just leads to a little bit of back and forth. Now, there are other ways that we could have probably approached this issue to come to a solution faster but we really wanted to demonstrate, just baseline, with very little understanding of a customer system, going in and applying those stages and coming to a solution. Thanks for the question. That was really good. Is there documentation on Kindling? That's a great question.

37:11
Corbin Harrell: So we have a public Inductive Automation GitHub account. And Kindling is one of the projects on that account. There is some brief documentation on some of the tools in that repo. But we're working on fleshing out that documentation. The tool is really useful. And there's a lot of features that we probably need to document a little bit better. But yeah, it's coming.

37:35
Audience Member 2: Maybe even [an Inductive] University video or something?

37:38
Corbin Harrell: That's a great idea. Yeah. Paul? We'll probably look into creating an IU video on that as well. Thank you for the suggestion. Any other questions?

37:50
Audience Member 3: So I think we're seeing here a good use case for how this software works. What are some of the other scenarios that you most commonly get outside of queries in a database? And I know we don't have a ton of time to go through a whole other example. But can you talk us through, do you still follow the same procedures? What kind of tips and tricks do you have for using that?

38:17
Cosmo Stevens: So each problem is different and has to be approached differently. This is more of highlighting the different stages that you would go through. And you get to certain points where you have to go back and keep on looking. But generally, for performance-related issues, you want to check the thread dumps, because that can give you a lot of information. And I mean, it's not terribly wrong to always just start in the logs, because if you're troubleshooting something and then you've been looking at it for an hour, and then you go back and check the logs, and it's right there, then...

38:52
Corbin Harrell: Yeah, I'd agree with that. The logs are always the first place to look. We kind of crafted this issue so that we could use a broad spectrum of resources. But I also say a really good resource is that running scripts page we mentioned at the beginning. Had we looked at that page while the issue was occurring, we would have actually seen the exact threads that we saw in the thread dump running those queries. We probably even would have had a reference to the resource running those queries. So that running scripts page is really good if you expect it's something like a Jython script that you wrote that's responsible for the issue. You can track it down, especially scripts that are running longer than they should be can be viewed in that page.

39:36
Audience Member 4: Hi. I have a question up here. I want to know if you have any recommendations regarding using external tools for debugging and troubleshooting. So for example, I know we use Wireshark a lot for network-related issues. So are there any other tools that you would recommend, that you actually use internally, to troubleshoot? That'd be great.

40:00
Cosmo Stevens: I mean, for networking situations, Wireshark is definitely the best way to go. I mean, usually checking netstat is useful, but that's usually something you would do before Wireshark.

40:14
Corbin Harrell: For connection issues to devices, we'll often try to use a third-party OPC UA server, whether it's UA Expert or Kepware or something like that, just to confirm whether it's just Ignition that's having this issue or whether no server can reach that device. There's also a tool, I think it's OPC UA Expert, or Security Expert, something like that, for troubleshooting OPC DA connection issues. Because nine times out of 10, it's a security issue, and those tools can tell you exactly what to do to fix that security issue.

40:54
Cosmo Stevens: For Modbus, Simply Modbus is also a good solution. You're able to test whether you're able to read from the device right there, and that way you can rule Ignition in or out as part of the problem.

41:09
Audience Member 5: Outside of GitHub, what was the other location where Kindling can be found? I think you mentioned two.

41:16
Cosmo Stevens: There's a forum post. I mean, if you literally just Google Kindling Ignition, it'll either take you to the GitHub page, or it'll take you to the forum post, which will take you to the GitHub page. But if you want to save time, just go to the GitHub page.

41:29
Corbin Harrell: Right now, it's only hosted on the GitHub page. That might change in the future. We might eventually host it somewhere on the IA page, but it's still being decided.

41:42
Audience Member 6: Yeah, so recently we had a CPU issue, and we called tech support, fantastic response. Kind of went through some of this. You know, you guys mentioned things like, "Oh, your pool size is eight by default." We don't necessarily know that. I don't know if there's, I think the solution was that there was too many scripts in memory, and so you were able to tune. And instead of 900 scripts, we limited that to 50. Boom, fixed everything. Is there details on some of those tuning parameters?

42:23
Cosmo Stevens: There is documentation for a lot of this stuff, but a lot of those different parameters are controlled by values inside of the ignition.conf file. So take some time. Just go through there. Like, there's a whole bunch of different things in there. But if you just start looking through and asking questions, "What does that do? What does this do?" A lot of that will kind of open that up.

42:43
Corbin Harrell: Yeah, so there is a user manual page that goes over every single custom parameter you can set in the Ignition configuration file that Cosmo mentioned. That does include the tag change script pool, which kind of sounded like what maybe the issue was. So part of the reason we're doing this presentation is because Support really wants to, moving forward, put a lot of emphasis on making sure that everyone has the same resources we do. So when troubleshooting issues, you guys know exactly where to check. So that's a big push that we're going for. And so we're working a lot on improving our documentation to make that process easier.

43:23
Audience Member 7: And if I may, when troubleshooting and dealing with loggers specifically, how do you typically determine setting trace or debug? Is there a general rule of thumb or generalities?

43:40
Cosmo Stevens: I mean, debug or trace just means more info. I usually set it straight to trace because you're going to get both debug and trace when you set it to that. So I go straight for that. For certain things, the information can be found in debug, so you could have avoided trace. But I'll just do it because it's a minimum-level kind of thing, so it'll still include that. And then as far as choosing the different loggers that you set, I mean, we've been doing this for a bit. And honestly, some of them I remember. But there's thousands of them. There's no way you could remember them all. So that's when you really start turning towards the forums and describing your problem. And you would be truly surprised at all the different things that turn up when you start doing those searches.

44:20
Corbin Harrell: Yeah. All right, we have time for one more.

44:23
Audience Member 8: So are there plans in the future to include Kindling with the Ignition install? Because getting approval for an additional EXE from the corporate people is not easy.

44:39
Corbin Harrell: Right. It's a good question. I don't think it's something that's come up yet. But it's definitely a discussion worth having. I think for the most part, there is a push, especially in 8.3, to include a lot more diagnostic analysis straight from Ignition. This was kind of a tool that we wanted to build to make sure that troubleshooting 8.1, 7.9, 7.8, we have a resource for everything. But that's a really good point. And it's probably a conversation we'll definitely want to have.

45:10
Cosmo Stevens: Yeah.

45:11
Corbin Harrell: All right.

45:11
Cosmo Stevens: Thank you. So our big red clock is at zero. So we have to go. But if any of you have any additional questions, we'll go up to the SCADA Arcade area. And please come by and ask us some questions. Thank you so much.

45:22
Corbin Harrell: Thank you all so much for coming.

Posted on November 13, 2023