Researchers design a user-friendly interface that helps non-experts make forecasts using data collected over time.
Whether someone is trying to forecast tomorrow’s weather, predict future stock prices, identify missed sales opportunities in retail, or estimate a patient’s risk of developing a disease, they will likely need to interpret time-series data, which are a collection of observations recorded over time.
Making predictions using time-series data typically requires several data-processing steps and complex machine-learning algorithms, which have such a steep learning curve that they are not readily accessible to non-experts.
To make these powerful tools more user-friendly, MIT researchers developed a system that directly integrates prediction functionality with existing time-series databases. Their simplified interface, which they call TSPDB (Time Series Prediction Database), does all the complex modeling behind the scenes so that a non-expert can generate a prediction in just seconds.
The new system is more accurate and efficient than state-of-the-art deep learning methods when performing two tasks: predicting future values and filling in missing data points.
One reason TSPDB is so successful is that it incorporates a novel time-series-prediction algorithm, explains Abdullah Alomar, an electrical engineering and computer science (EECS) graduate student and an author of a recent research paper in which he and his co-authors describe the algorithm. This algorithm is particularly effective at making predictions on multivariate time-series data, which are data that contain more than one time-dependent variable. In a weather database, for example, temperature, dew point, and cloud cover each depend on their previous values.
The algorithm also estimates the volatility of a multivariate time series to provide the user with a confidence level for its predictions.
“Even as time-series data become more and more complex, this algorithm can effectively capture any time-series structure. It seems that we have found the right lens to look at the model complexity of time-series data,” says senior author Devavrat Shah, the Andrew and Erna Viterbi Professor in EECS and a member of the Institute for Data, Systems, and Society and of the Laboratory for Information and Decision Systems.
Joining Alomar and Shah on the paper is lead author Anish Agarwal, a former EECS graduate student who is currently a postdoc at the Simons Institute at the University of California at Berkeley. The research will be presented at the ACM SIGMETRICS conference.
Adapting a new algorithm
Shah and his colleagues have been working for years on the problem of interpreting time-series data, adapting different algorithms and integrating them into TSPDB as they built the interface.
About four years ago, they learned of a robust classical algorithm called singular spectrum analysis (SSA), which imputes and forecasts single time series. Imputation is the process of replacing missing values or correcting past values. Although this algorithm required manual parameter selection, the researchers suspected that it might enable their interface to make effective predictions using time-series data. In earlier work, they removed the need to manually intervene in the algorithm's implementation.
For a single time series, the algorithm transforms the series into a matrix and applies matrix estimation procedures. The major intellectual challenge was how to adapt it to use multiple time series. After a few years of struggle, they realized that the answer was straightforward: “stack” the matrices for the individual time series, treat the result as one large matrix, and then apply the single-time-series algorithm to it.
This approach naturally uses information across multiple time series, both across the time series and across time, which they describe in their new paper.
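The stacking idea can be illustrated with a minimal NumPy sketch. This is not the researchers' implementation: the function names, the window length `L`, the `rank` parameter, and the choice to zero-fill gaps and rescale by the observed fraction are all assumptions made for illustration, and the sketch assumes every series has the same length.

```python
import numpy as np

def page_matrix(series, L):
    """Reshape a length-T series into an L x (T // L) matrix of non-overlapping columns."""
    series = np.asarray(series, dtype=float)
    n_cols = len(series) // L
    return series[: n_cols * L].reshape(n_cols, L).T

def low_rank_approx(M, rank):
    """Keep only the top `rank` singular values (hard thresholding)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def impute_stacked(series_list, L, rank):
    """Stack the page matrices side by side, zero-fill gaps, and denoise (illustrative)."""
    stacked = np.hstack([page_matrix(s, L) for s in series_list])
    observed = ~np.isnan(stacked)
    filled = np.where(observed, stacked, 0.0)
    # Assumed convention: rescale by the observed fraction so that
    # zero-filling does not shrink the estimate.
    estimate = low_rank_approx(filled, rank) / observed.mean()
    n_cols = stacked.shape[1] // len(series_list)
    # Undo the stacking: one reconstructed series per input series.
    return [estimate[:, i * n_cols:(i + 1) * n_cols].T.ravel()
            for i in range(len(series_list))]

# Two related series that share one oscillation; a few values are missing in the first.
t = np.arange(200)
temperature = np.sin(0.3 * t)
dew_point = np.cos(0.3 * t + 1.0)
temperature_with_gaps = temperature.copy()
temperature_with_gaps[[5, 50, 120]] = np.nan

filled_temperature, filled_dew_point = impute_stacked(
    [temperature_with_gaps, dew_point], L=20, rank=2
)
```

Because both example series are built from the same oscillation, their stacked matrix has rank two, so a rank-two approximation can borrow structure from the complete series when estimating the gaps in the other.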
This recent publication also discusses interesting alternatives in which, instead of transforming the multivariate time series into one large matrix, it is viewed as a three-dimensional tensor. A tensor is a multidimensional array, or grid, of numbers. This establishes a promising connection between the classical field of time-series analysis and the growing field of tensor estimation, Alomar says.
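To make the relationship between the two views concrete, the following hedged sketch builds a three-dimensional array with one matrix per series and shows that flattening it recovers the side-by-side stacked matrix. The helper names and the axis ordering are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def page_matrix(series, L):
    """Reshape a length-T series into an L x (T // L) matrix of non-overlapping columns."""
    series = np.asarray(series, dtype=float)
    n_cols = len(series) // L
    return series[: n_cols * L].reshape(n_cols, L).T

def page_tensor(series_list, L):
    """Stack one page matrix per series along a new first axis: shape (N, L, T // L)."""
    return np.stack([page_matrix(s, L) for s in series_list], axis=0)

def unfold_to_stacked(tensor):
    """Flatten the tensor into the side-by-side stacked matrix: shape (L, N * (T // L))."""
    N, L, n_cols = tensor.shape
    return tensor.transpose(1, 0, 2).reshape(L, N * n_cols)

t = np.arange(60)
series_list = [np.sin(0.2 * t), np.cos(0.2 * t), 0.5 * np.sin(0.2 * t + 0.7)]

tensor = page_tensor(series_list, L=10)   # shape (3, 10, 6)
stacked = unfold_to_stacked(tensor)       # shape (10, 18)
```

In this sketch the stacked matrix is simply one particular unfolding of the tensor, which is one way to see why matrix and tensor methods can be applied to the same multivariate data.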
“The version of MSSA that we introduced really captures all of that beautifully. Therefore, it not only provides the most likely estimate, but also a time-varying confidence interval,” says Shah, referring to the team's multivariate variant of SSA, known as MSSA.
They tested the adapted MSSA against other state-of-the-art algorithms, including deep-learning methods, on real-world time-series datasets with inputs drawn from power grids, traffic patterns, and financial markets.
Their algorithm outperformed all the others on imputation, and it outperformed all but one of the other algorithms at forecasting future values. The researchers also demonstrated that their modified version of MSSA can be applied to any kind of time-series data.
“One of the reasons I think it works so well is that the model captures a lot of time-series dynamics, but it’s still a simple model at the end of the day. When you’re working with something simple like this, instead of a neural network that can easily overfit the data, you can actually get better performance,” Alomar says.
The impressive performance of MSSA is what makes TSPDB so effective, explains Shah. Now, they aim to make this algorithm accessible to all.
Once a user installs TSPDB on top of an existing database, they can run a predictive query with just a few keystrokes in about 0.9 milliseconds, compared to 0.5 milliseconds for a standard search query. Confidence intervals are also designed to help non-experts make more informed decisions by incorporating the degree of uncertainty of the predictions into their decision making.
For example, the system may enable a non-expert to predict future stock prices with high accuracy in a matter of minutes, even if there are missing values in the time-series dataset.
Now that the researchers have shown why MSSA works so well, they are targeting new algorithms that could be incorporated into TSPDB. One of these algorithms uses the same model to automatically enable change-point detection, so if the user believes their time series will change its behavior at some point, the system will automatically detect that change and incorporate it into its forecasts.
Shah says they want to continue collecting feedback from current TSPDB users to see how they can improve the system’s functionality and user-friendliness.
“Our interest at the highest level is to make TSPDB successful as a widely usable, open-source system. Time-series data are important, and it’s really a beautiful concept to build prediction functions directly into the database. It’s never been done before, and so we want to make sure the world uses it,” he says.
“This work is exciting for several reasons. It provides a practical variant of MSSA that requires no hand-tuning, they provide the first known analysis of MSSA, and the authors demonstrate the real-world value of their algorithm by being competitive with or outperforming many known algorithms for imputation and prediction in (multivariate) time series on multiple real-world data sets,” says computer science professor Vishal Mishra, who was not involved in the research.
“At the heart of all this is the beautiful modeling work where they cleverly exploit correlations across time (within a time series) and space (across time series) to create a low-rank spatiotemporal factor representation of a multivariate time series. Importantly, this model ties the field of time-series analysis to the rapidly evolving topic of tensor completion, and I expect this paper to lead to a lot of follow-up research.”
Written by Adam Zewe
Source: Massachusetts Institute of Technology