Abstract: Stock index prediction is an important topic in finance. With advances in computing power and related technologies, it has become possible to improve stock index prediction by identifying and quantifying valuable information in online news. To extend the econometric literature on stock index prediction to high-dimensional textual data, this paper proposes a stock index prediction framework based on generative language models. The framework consists of two steps. First, a supervised generative language model quickly filters out noisy words and aggregates the remaining text into a news index that can fully explain stock index changes. Second, the news index and historical stock index data are used jointly as predictors in a time-varying parameter predictive model to forecast future stock index values. The framework not only enriches the set of factors used in stock index prediction but also reveals the time-varying relationship between these factors and stock index values. Empirical research demonstrates the explanatory and out-of-sample predictive power of the proposed framework. Across the six industrial stock indices predicted, the mean squared error of the proposed framework is generally lower than that of traditional time series and machine learning methods. The proposed framework also outperforms a time-varying parameter predictive model and a long short-term memory model that do not use news information.
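
To make the two-step structure concrete, the following is a minimal sketch and not the model proposed here. It assumes that correlation screening can stand in for the supervised generative language model, and that a regression with random-walk coefficients, estimated by a Kalman filter, can stand in for the time-varying parameter predictive model. The function names `build_news_index` and `tvp_forecast`, the noise settings `q` and `r`, and the synthetic data are all hypothetical.

```python
import numpy as np


def build_news_index(counts, returns, keep=10):
    """Step 1 (stand-in): screen words, then aggregate them into one index.

    counts:  (T, V) array of daily word counts.
    returns: (T,) array of same-day index changes used as supervision.
    """
    c = counts - counts.mean(axis=0)
    r = returns - returns.mean()
    denom = np.sqrt((c ** 2).sum(axis=0) * (r ** 2).sum())
    # Correlation of each word with returns; words with no variation get 0.
    corr = np.divide(c.T @ r, denom, out=np.zeros(counts.shape[1]), where=denom > 0)
    # Keep the words most strongly related to returns; the rest is noise.
    kept = np.argsort(-np.abs(corr))[:keep]
    # Aggregate kept word frequencies, weighted by their signed correlation.
    totals = np.maximum(counts.sum(axis=1), 1)
    freq = counts[:, kept] / totals[:, None]
    return freq @ corr[kept], kept


def tvp_forecast(X, y, q=1e-4, r=1.0):
    """Step 2 (stand-in): y_t = x_t' b_t + e_t with b_t = b_{t-1} + u_t.

    Returns one-step-ahead forecasts and the filtered coefficient path.
    q and r are assumed state and observation noise variances.
    """
    T, k = X.shape
    b = np.zeros(k)
    P = np.eye(k) * 10.0  # diffuse-ish prior on the coefficients
    preds = np.empty(T)
    betas = np.empty((T, k))
    for t in range(T):
        P = P + q * np.eye(k)      # coefficients drift as a random walk
        x = X[t]
        preds[t] = x @ b           # forecast made before observing y[t]
        S = x @ P @ x + r          # forecast variance
        K = P @ x / S              # Kalman gain
        b = b + K * (y[t] - preds[t])
        P = P - np.outer(K, x) @ P
        betas[t] = b
    return preds, betas


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, V = 300, 50
    counts = rng.poisson(3, size=(T, V))
    # Synthetic supervision: only words 0 and 1 carry information.
    returns = 0.5 * (counts[:, 0] - counts[:, 1]) + 0.1 * rng.normal(size=T)
    level = 100 + np.cumsum(returns)
    news, kept = build_news_index(counts, returns, keep=2)
    # Predict level[t] from a constant, level[t-1], and news[t-1].
    X = np.column_stack([np.ones(T - 1), level[:-1], news[:-1]])
    preds, betas = tvp_forecast(X, level[1:])
    print("kept words:", sorted(kept.tolist()))
    print("MSE:", np.mean((preds - level[1:]) ** 2))
```

In this sketch the word screening uses the full sample, which is acceptable only for illustration; a genuine out-of-sample evaluation would refit the screening on each training window. The rows of `betas` give the coefficient path, which is the quantity that makes the time-varying relationship between the predictors and the index visible.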