Gabriel Appau Abeyie

Assistant Professor of Economics

Beyond News Headlines and TF-IDF: Enhancing Text-Based Forecasting Models with Validated Collocations and Improved Attention


This paper proposes a method for improving text-based forecasting models, focusing on forecasting crude oil prices. Using pattern validation and attention mechanisms, the study demonstrates notable improvements in predictive power over traditional approaches. One key finding is that considering the full text of news articles, rather than limiting the analysis to news headlines, leads to significant gains in forecasting accuracy. Furthermore, the model featuring verb-noun and noun-verb collocation pattern validation consistently outperforms benchmarks and models based solely on news headlines across various forecasting horizons. The results suggest that the presence of collocations such as ‘price fell,’ ‘prices tumbled,’ and ‘price dropped’ in crude-oil-related news articles is associated with a decrease in oil price returns. Additionally, combining structured macroeconomic indicators with text-based features improves forecasting accuracy.
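
To make the idea of collocation-based text features concrete, the following is a minimal sketch, not the paper's implementation. It assumes a fixed list of collocations (the three examples quoted in the abstract) and simply counts their occurrences in an article's full text; the function name `collocation_counts` and the sample article are hypothetical, and the paper's pattern validation and attention steps are not reproduced here.

```python
import re
from collections import Counter

# Assumption: a hand-picked list using the examples quoted in the abstract.
COLLOCATIONS = ["price fell", "prices tumbled", "price dropped"]

def collocation_counts(text, collocations=COLLOCATIONS):
    """Count case-insensitive, whole-word matches of each collocation."""
    counts = Counter()
    for phrase in collocations:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Hypothetical article text, used only for illustration.
article = "Oil price fell on Monday. Analysts said prices tumbled after the report."
features = collocation_counts(article)
```

Counts like these could serve as one input column per collocation alongside macroeconomic indicators; how the paper actually weights or validates them is not specified in the abstract.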