SQL: The Midas Touch

Turning Time-Series Data Into Enterprise Gold

16 minute read Download PDF

Old King Midas could turn anything he touched into gold with the simple touch of his hand, or so the old story goes. The tale dates back to Greek mythology, and reflects a human aspiration that dates back even further – the desire to change the value of one’s fortune. 

Although Midas’ golden touch might only be the makings of a legend, the ability to transform the valueless into the valuable is very real. One such example of modern day alchemy exists in the field of industrial automation. It is the ability to transform mundane time-series data into an indispensable asset through the use of SQL relational databases.

To succeed in business it is important to pursue every advantage possible. This includes acquiring more assets as well as fully capitalizing on untapped assets that a company already possesses. In the manufacturing industry one of the most undervalued and underutilized assets that companies possess is the time-series data residing in their SCADA (supervisory control and data acquisition) systems.

Many manufacturing companies view time-series data as a simple byproduct of using a SCADA system. They see time-series data as being useful for telling the status of specific data points at specific times, tracking simple trends ... and little else. Most companies don’t see any value for it beyond that point.

The usefulness of time-series data explodes exponentially when it is put into context with data from the rest of the enterprise. Time-series data that is easily accessible and relatable comes alive, offering deep insights into interrelationships between the plant floor and the rest of the enterprise that can potentially change the fortunes of the entire company for the better.

Time-series data is the lifeblood of any SCADA system. It is a sequence of data points, measured at successive time instants, and spaced out at uniform intervals. Time-series data is created when a data point from a PLC is read and time-stamped by a SCADA system.

Time-series data is used to monitor machines and processes connected to a SCADA system, usually through a PLC. An example of time-series data would be the weight data coming from a scale on a production line. As product is weighed on the scale each weight reading receives a timestamp from the SCADA system. In the pairing of a piece of data with a date and time, a time-series data point is created.

The Evolution of Controls Data

PLC SCADA Database

Time-series data can be displayed in real time, as well as logged as historical data. To be logged as historical data, time-series data needs to be saved in some kind of repository. This is where SQL databases come into play.

Traditionally, time-series data has been stored in a process historian, but relational SQL databases offer a compelling alternative. SQL databases compare favorably to process historians and even offer the major advantage of making historical data more accessible to other enterprise systems.

Before getting into a comparison of how to choose between process historians and SQL relational databases for storing your data, it is important to understand some basics about SQL databases and historians.

SQL Databases: Simple and Accessible

SQL-compatible databases are the most popular databases in the world, used to store information of all types in every industry you can imagine. SQL is not a type or brand of database; SQL is a standardized structured query language for databases. SQL databases are relational databases, which are structured like a large spreadsheet with rows, columns and cells, but are much more robust and powerful.

SQL databases are popular for their simplicity, ease of connectivity, flexibility and most of all their ability to quickly query related data. SQL was created with the specific intent to make it easy to ask questions of data.

The primary function of SQL is to create a query (or question) and run it against your data to retrieve an answer. Simply put, SQL was built to quickly answer complex questions about large amounts of data.

A SQL database makes a great repository for time-series data, but despite this, SQL databases are often ignored. One reason for this is a perception in the manufacturing industry that time-series data is not relational data, so therefore it doesn’t belong in a SQL relational database. Regardless of how pervasive this idea is, it’s not true.

Relational data is data that can be related to other data, and in order to relate things together there needs to be something in common. In this sense time-series data is actually inherently relatable because all time-series data has something in common – a timestamp. The timestamp on all time-series data means that it can be easily related to any piece of data that is also associated with a time.

This makes the notion that “time-series data is not relational data” simply false. In spite of this, many companies opt to use a much more expensive method of storing time-series data than SQL; they use process historians.

Process Historians: Proprietary and Detached

A process historian is an application specifically created to deal with time-series data. A historian is made for the storage and analysis of time-series data and as such is designed with an emphasis on the compression and speedy retrieval of large amounts of data.

Most historians use proprietary technology to compress and store data, which can make it difficult for other systems to easily communicate with them. As a result, time-series data is often kept totally separate and detached from the rest of the enterprise data; not because it has to be, but because historians make it difficult for other systems to work with it.

The outcome of this has been a misperception in the manufacturing industry that somehow time-series data is special, and can only be handled by a process historian. In truth, there is nothing special about time-series data, it’s just data. A SQL database will more than suffice for the logging of time-series data.

To explore how SQL can be a golden alternative to process historians, let’s take a look at the selling points of process historians and how SQL relational databases stack up to them.

 

SQL Databases vs. Process Historians

There are advantages and disadvantages for using process historians and SQL relational databases. Understanding the strengths and weaknesses of process historians will help you make an informed decision about using SQL to help your company make the most of its time-series data. SQL databases coupled with modern SQL-friendly SCADA systems can measure up and even outdo traditional process historians.

Speed

Historians can log data very quickly, but this comes at the cost of putting data into a proprietary, flat file format that other enterprise systems will have trouble accessing.

On the other hand SQL has gotten a bad rap for not being fast enough to log time-series data effectively. While at one point this may have been the case, modern SQL databases coupled with state-of-the-art SCADA systems have no trouble logging 100,000 tags per second and more. This is more than enough speed to handle most cases.

Size

In the area of file compression, process historians have a clear advantage over SQL databases. Historians do a good job of compressing data to maximize storage space because that is what they were designed to do. However this compression once again comes at the cost of having to save data in a proprietary, difficult-to-read file format.

While SQL can’t achieve the same compression ratios that historians can, a SQL database is easily scalable to accommodate the largest storage needs. Also, with the cost of memory getting cheaper and cheaper the space that data takes up is less and less important.

Support

This one is no contest. SQL is clearly more widely used and well supported than even the most popular process historians. SQL is used in virtually every industry on the planet, and as a result, IT professionals are very comfortable supporting SQL on a daily basis.

All you have to do is go to the bookstore to confirm this; there are hundreds of books about using SQL in all kinds of ways. You will be hard pressed to find even a few – if any – books that support of the proprietary languages that process historians use.

Cost

The advantages of cost go squarely to SQL. The price for process historians can be outrageously priced at 10 to 30 times the cost of a SQL relational database. It can cost upwards of $60,000 to get everything working. Getting a SCADA system set up to log historical data to a SQL database is a fraction of the cost, and offers more advantages.

Analysis

SQL was designed to make it easy to ask questions of data in whichever way you want by constructing simple SQL queries. SQL puts the user in the driver’s seat when asking questions.

On the other hand historians were designed to store data and return quick analysis of pre-built questions. Historians offer answers to questions quickly, but the questions are the ones the historians were built to answer, which are not always the questions that need to be answered.

Interoperability

This last one is where SQL really shines. SQL was made to be easy to connect to, and because SQL relational databases are the most widely used in the world, most enterprise systems are already using them. If your time-series data is logged in a SQL database, it is a breeze to connect it to other systems, which means easy interoperability, enterprise-wide.

Since historians save data in a proprietary format it can be a real challenge to achieve true interoperability with the rest of the data in the enterprise. It requires the use of a proprietary decoder to convert data into a palatable format, which costs more money. Standardized interfaces have been devised using OPC HDA (historical data access), but support and functionality is limited when compared to SQL.

If your time-series data is logged in a SQL database, it is a breeze to connect it to other systems, which means easy interoperability enterprise-wide.

A Penny Saved is a Penny Earned

After a side-by-side comparison of SQL and process historians, the advantages of using a SQL relational database to log historical data are pretty clear. SQL databases are designed to empower users to quickly get answers to questions while historians are designed to efficiently store highly-compressed data.

Both are good at what they do, but when you factor in the much lower cost of using a SQL database, the value of using SQL is easy to see. The money saved using SQL could be put toward other more pressing needs, which can increase the value of the enterprise overall.

One of the biggest ways SQL can add value to a company is making information available in real-time. It accomplishes this by taking out the man-in-the-middle and putting him to work somewhere else. The-man-in-the-middle is in reference to the old way of achieving some semblance of interoperability with time-series data and the data from other systems in the enterprise.

To relate real-time time-series data with data from other enterprise systems there are few options for users who have SCADA systems that don’t take advantage of SQL. They have to export data into a flat file that must be decoded, and buy an expensive decoder to make sense of the proprietary file format, or they can try to hack a connection, which offers uncertain results. Because of these limited options, many companies are still doing things the old-fashioned way.

In order to get time-series to work with other data, they have someone manually export and decode the data and then import it into another system – this unlucky fellow is called the man-in-the-middle. SQL databases remove this step, making your data available in real time.

Out of the Middle, Into the Lead

Not only is the man-in-the-middle approach really slow, but the ineffectiveness of the approach is compounded when data has to be shared across multiple systems. This is because the data has to be separately imported into each system, and this has to be done every time the company wants to compare real-time data – although after all the exporting and importing, calling it “real-time” data is a stretch. All this can be avoided simply by using SQL for time-series data.

The other systems in your enterprise understand SQL, so just give them time-series data in a way they can understand it. Using SQL for time-series data means that it’s possible to achieve easy interoperability with the systems in the rest of the enterprise – in real time.

SQL totally eliminates the need for the-man-in-the-middle, which can free up that employee to deal with other work that requires attention. It also means that you can get answers to important questions about real-time data in milliseconds, not hours or days. More effective use of manpower means a greater profit margin, and a less stressed-out, former man-in-the-middle who can be in the lead driving the data.

The Man-in-the-Middle VS The Man-in-the-Lead

Total Enterprise Connection

There is no question that as technology has progressed the world has become more and more connected; and that connectivity has become ever more important to the profitability of today’s businesses. Not that long ago having a company website was seen as a luxury, whereas today it is virtually a requirement to doing business.

As the business world has become more connected, companies have also increasingly grown interdependent on one another. This has been mirrored in the manufacturing industry, where companies have become more specialized and processes have grown more and more complex.

With so much going on at a modern manufacturing company every day, it has become imperative to stay connected with the entire enterprise at all times.

Each system in the enterprise could be affecting the others for better or worse, and the only way to know is to get everything connected. Using a SQL database for time-series data facilitates easy connection to other enterprise systems because most of them have used SQL for years.

The time-series data from your SCADA system may have volumes to say about your company. It may hold the key to unlocking the true potential of your company’s productivity; it may even be that competitive advantage your company has been seeking. It may be a lot of things, but if your time-series data isn’t connected to the rest of your enterprise it is definitely underutilized.

In order to tap the potential gold mine of information contained within your SCADA system, you have to give the data context. The only way you can put time-series data into context is to make it relatable to the data from the rest of your enterprise. Here are a few examples to help illustrate the potential profit-increasing benefits of putting time-series data into context with the rest of your enterprise.

Example 1: Inventory + OEE + SCADA

After cross-referencing the time-series data in the SCADA system with OEE (overall equipment effectiveness) software – via a SQL database connection – it is apparent that line 1 is producing more product than line 2. The OEE software points to the fact that more downtime is occurring on line 2, which is resulting in decreased production. The two lines appear to be functioning properly, yet line 2 is clearly not as efficient as line 1.

However, since the inventory system is also connected to the SQL database it is discovered that line 2 uses raw materials from a different vendor than line 1. After making a switch of raw material vendors the productivity of line 2 is increased and the problem is solved. Solving the problem swiftly – because of enterprise-wide connectivity – results in decreased downtime and increased productivity, which means increased value for the company on the whole.

Example 2: Quality + SCADA

All products pass through a quality checkpoint before moving on to packaging; in this process the quality system is alerted to the fact that numerous products are falling below acceptable control limits. Thanks to the SQL connection that the quality and SCADA systems share, the offending products can be traced back through the process to their source.

After tracking several low-quality products back to the source, it is discovered that the raw materials all originated from lot 1B. Once the bad lot is discovered and shut down, the quality of all the products is again within acceptable levels. Thanks to cross-enterprise data accessibility, the quality problem was identified and fixed before any damage was done, saving the company time, money and bad publicity.

Example 3: Invoicing + WMS + SCADA

An abnormal amount of complaints are coming in from a localized area about the freshness of the product. By using the invoice system, the shipment made to the area in question is identified and traced back to originating from warehouse 2. After searching through the data of the quality and SCADA systems, no problem is discovered.

However the WMS (warehouse management system) reveals an unusually large gap in time between when the product was packaged and shipped – the product has begun to spoil by sitting in the warehouse too long. By discovering the problem quickly, it can be addressed before more bad product goes out, minimizing the damage done.

Using SQL relational databases to deal with time-series data makes it accessible to the entire enterprise. The bottom line is that combining the power of SQL with your SCADA system facilitates more people to ask more important questions about your data. The answers to those questions in real time can result in immediate, impactful, potentially revolutionary insights into what’s happening in your company – right now.

SQL can also help shed new light on past data. Time-series data that has been collecting dust can be given new life by putting it into a SQL database. Putting the data into a relational format will make old data easy to put into context with the data from across the enterprise. It’s possible that in the light of this new context you may discover trends that you have never seen before, trends that could hold the key to increasing the company’s productivity.

In the Information Age in which we live, the value of accurate, real-time information cannot be overstated. As we move into the future, the speed of business will continue to accelerate and only the companies that stay quick and agile will be able to keep up. Using SQL puts a company in a good position to keep pace.

Using SQL allows companies to answer important questions about their data in milliseconds, and in a competitive industry where every second counts, that is a game-changer. Pairing SQL and SCADA can help you turn your time-series data into one of your company’s more useful and valuable assets.

Posted on July 12, 2012