According to IBM, attention is not all you need when forecasting certain outcomes with generative AI. You also need time. Earlier this year, IBM made its open-source TinyTimeMixer (TTM) model available on Hugging Face under an Apache license. Released as part of IBM's Granite family of foundation models, TTM is a lightweight pre-trained time series foundation model (TSFM) that uses a patch-mixer architecture to learn context and correlations across time and across multiple variables.
Unlike language and vision foundation models, such as ChatGPT and Llama, where each word or token carries semantic meaning, TSFMs learn temporal patterns from values grouped into local temporal patches – contiguous sets of points in time. Additionally, while language and vision foundation models derive associations best when trained on a single context, such as a given language or topic where grammatical structures and vernacular are common across the dataset, TSFMs can derive further context and associations by looking at long historical time windows and at correlations with other multivariate time series. That data can vary by industry, time resolution, sampling rate, numerical scale and other characteristics typical of time series data.
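As a rough illustration of the patching idea (not IBM's exact preprocessing), the sketch below splits a multivariate series into contiguous, fixed-length temporal patches, each of which plays a role loosely analogous to a token in a language model:

```python
import numpy as np

def make_patches(series: np.ndarray, patch_len: int, stride: int) -> np.ndarray:
    """Split a (time, channels) series into contiguous temporal patches.

    Returns an array of shape (num_patches, patch_len, channels).
    """
    t, _ = series.shape
    starts = range(0, t - patch_len + 1, stride)
    return np.stack([series[s:s + patch_len] for s in starts])

# Example: 512 hourly observations of 3 correlated variables,
# split into non-overlapping patches of 64 time steps each.
series = np.random.randn(512, 3)
patches = make_patches(series, patch_len=64, stride=64)
print(patches.shape)  # (8, 64, 3)
```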
One factor the different types of models have in common is the need for massive amounts of data to train them properly. Language and vision foundation models have essentially the entire internet at their disposal. TSFMs, however, require very specific time-stamped data that is typically not publicly available; by some estimates, as much as 95% of this type of data remains proprietary. Fortunately, researchers from Monash University and the University of Sydney have compiled the Monash Time Series Forecasting Repository, which provides sufficient data across multiple domains and time resolutions to train TSFMs properly.
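For readers who want to experiment, the Monash archive has also been mirrored on the Hugging Face Hub. The snippet below is a hedged sketch assuming the Hugging Face `datasets` library and a `monash_tsf` mirror with a `tourism_monthly` configuration; the exact repository name, configuration names and field layout should be checked against the Hub, since the original repository distributes its own .tsf file format:

```python
from datasets import load_dataset

# Load one Monash forecasting dataset from its Hugging Face mirror.
# The "monash_tsf" repo name and "tourism_monthly" config are assumptions.
ds = load_dataset("monash_tsf", "tourism_monthly")

example = ds["train"][0]
print(example["start"])        # timestamp of the first observation
print(len(example["target"]))  # length of the historical series
```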
The ability of TSFMs to handle the multivariate nature of time series data is essential for taking into account the context of what the data represents during the training window (e.g., when analyzing stock prices, whether an earnings call or a critical announcement coincided with an inflection point in the data). To take full advantage of this, rather than using a transformer architecture as language models do, IBM created a new architecture called Time Series Mixer, or TSMixer. According to IBM, the TSMixer architecture reduced model size by a factor of 10 compared with transformer-based models while maintaining similar accuracy.
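The mixer idea itself is simple to sketch: instead of attention, small MLPs alternately mix information across the patch (time) dimension and across the feature dimension. The block below is a generic MLP-mixer-style layer in PyTorch, shown for intuition only; IBM's actual TSMixer and TTM layers add gating, channel handling and other refinements not reproduced here:

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Generic MLP-mixer-style block: mix across patches, then across features."""

    def __init__(self, num_patches: int, d_model: int, expansion: int = 2):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.patch_mlp = nn.Sequential(          # mixes along the time/patch axis
            nn.Linear(num_patches, expansion * num_patches),
            nn.GELU(),
            nn.Linear(expansion * num_patches, num_patches),
        )
        self.norm2 = nn.LayerNorm(d_model)
        self.feature_mlp = nn.Sequential(        # mixes along the feature axis
            nn.Linear(d_model, expansion * d_model),
            nn.GELU(),
            nn.Linear(expansion * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, d_model)
        x = x + self.patch_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.feature_mlp(self.norm2(x))
        return x

block = MixerBlock(num_patches=8, d_model=64)
out = block(torch.randn(4, 8, 64))
print(out.shape)  # torch.Size([4, 8, 64])
```

Because each mixing step is just a pair of linear layers rather than quadratic-cost attention, stacks of such blocks stay small, which is the intuition behind the order-of-magnitude size reduction IBM reports.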
Since its release in April 2024, TTM has had over one million downloads from Hugging Face, which raises the question: what time series applications are developers using IBM's Granite TTM for? According to IBM, TTM is being used for a variety of value-added, multivariate use cases. One is forecasting flash storage device performance across more than 350 key performance indicators. Another is providing directional forecasts for stock movements using both temporal patterns and the impact of other variables. It has also been used to produce 28-day sales forecasts (demonstrated against the M5 retail dataset) for inventory and revenue planning, with the added capability of factoring in the effects of sale events and other variables that influence retail sales. TTM is also used for forecasting-based optimization (model predictive control), such as building temperature control or modeling complex manufacturing processes.
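A common starting point for such use cases is a zero-shot forecast produced directly from the published checkpoint. The sketch below assumes IBM's granite-tsfm package (imported as `tsfm_public`) and the `ibm-granite/granite-timeseries-ttm-r1` checkpoint with its default 512-step context and 96-step horizon; the exact import path, checkpoint name and output fields should be verified against the current Hugging Face model card:

```python
import torch
# Assumed import path from IBM's granite-tsfm package.
from tsfm_public import TinyTimeMixerForPrediction

# Checkpoint name and context/forecast lengths follow the model card at the
# time of writing and may differ between revisions.
model = TinyTimeMixerForPrediction.from_pretrained("ibm-granite/granite-timeseries-ttm-r1")
model.eval()

# Zero-shot forecast: 512 historical steps of 3 channels -> 96 future steps.
history = torch.randn(1, 512, 3)  # (batch, context_length, num_channels)
with torch.no_grad():
    output = model(past_values=history)
print(output.prediction_outputs.shape)  # expected: (1, 96, 3)
```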
As we continue to see, there is no one-size-fits-all AI solution. As new AI technologies and models are introduced, selecting the best solution for the application is important. Transformer-based large language models clearly provide world-changing results when predicting outcomes based on language and vision. For forecasting time series-based outcomes, however, IBM has developed a new tool to put in our collective toolboxes. Its Granite TTM is not the only TSFM available, but hopefully, given the innovations IBM has introduced and the model's open-source availability, it will be the one that helps drive TSFMs to the same scale of development and utility as their language-based counterparts.