Behind the Markets Podcast: Using ChatGPT to Forecast Equity Price Movements

Global Head of Research

07/28/2023

We recently had the pleasure of speaking with Alejandro Lopez Lira, assistant professor at the University of Florida’s Warrington College of Business. It was great to join him at the University of Pennsylvania campus, where he earned his Ph.D.

The focus of the discussion was asset pricing, where the bulk of Alejandro’s research and work has been. Alejandro took us back to the line of thinking he had during his research at UPenn, where he was motivated to see what drives different assets to have different expected returns. Machine learning was an interesting tool that could be used for text analysis, and it was clear that individual companies were disclosing lots of risks in their 10-K documentation. The concept was to capture these risks from within the unstructured text data and figure out whether the market was treating them in a way that allowed investors to be compensated for bearing them.

At WisdomTree, we spend a lot of time looking at various factors to help explain the market returns we are seeing and build new strategies for investors. A massive amount of attention is paid to the group of so-called academic factors. Book-to-market is a useful example: it is the ratio of a company’s book value of equity to its market value of equity, with a higher figure indicating that the company leans toward being a value stock.
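To make the book-to-market definition concrete, here is a minimal Python sketch. The function name and the two companies are hypothetical, and the dollar figures are invented purely for illustration.

```python
def book_to_market(book_equity: float, market_equity: float) -> float:
    """Book value of equity divided by market value of equity."""
    if market_equity <= 0:
        raise ValueError("market value of equity must be positive")
    return book_equity / market_equity


# Hypothetical companies, figures in billions of dollars (invented for illustration).
value_like = book_to_market(book_equity=80.0, market_equity=100.0)
growth_like = book_to_market(book_equity=10.0, market_equity=100.0)

# The company with the higher ratio leans more toward being a value stock.
print(f"value-like: {value_like:.2f}, growth-like: {growth_like:.2f}")
```

Both companies have the same market value here, so the difference in the ratio comes entirely from how much book equity sits behind that market value.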

Alejandro, along with Andrew Y. Chen and Tom Zimmermann, wrote a paper titled Peer-Reviewed Theory Does Not Help Predict the Cross-Section of Stock Returns. We certainly did not cover the full paper in the conversation, but we alluded to how, for every published factor, numerous other factors have similar explanatory power over the observed variation in returns. The gist is that published academic factors are not the whole picture, and there seems to be a high degree of correlation between published and unpublished factors. The explanatory power in general appears to be declining, which suggests that information may be getting incorporated into asset prices more quickly.

We talked a bit about the ongoing debate over whether companies should capitalize their research and development expenditures. The logic is that when a company buys a physical asset, it holds that asset on its balance sheet. Over time, depreciation expenses take the carrying value lower and lower until the useful life of that asset has ended. Think of the example of Bard, the large language model developed by Alphabet’s Google. It is clear that the company had to invest significantly to build this model, and we should assume the company is worth more with the model than it would be without it. Yet, Bard does not live anywhere directly on Alphabet’s balance sheet, so researchers get to debate whether it ends up being captured in the company’s book value of equity or somewhere else.
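The accounting contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spend of 100, the five-year useful life, and the straight-line schedule are all hypothetical choices, not figures from the conversation or from any company's accounts.

```python
from typing import List


def straight_line_schedule(cost: float, useful_life_years: int) -> List[float]:
    """Carrying value at the end of each year under straight-line write-downs."""
    annual_charge = cost / useful_life_years
    return [cost - annual_charge * year for year in range(1, useful_life_years + 1)]


rd_spend = 100.0   # hypothetical R&D outlay
life = 5           # hypothetical useful life in years

# If the spend were capitalized, an asset would sit on the balance sheet and shrink each year.
capitalized = straight_line_schedule(rd_spend, life)

# If the spend is expensed immediately, nothing is carried on the balance sheet.
expensed = [0.0] * life

for year, (cap, exp) in enumerate(zip(capitalized, expensed), start=1):
    print(f"year {year}: capitalized={cap:.1f}, expensed={exp:.1f}")
```

Under the capitalized treatment, book value of equity is higher in the early years, which is why the choice matters for any factor built on book value.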

Jeremy asked Alejandro whether, after all his research, he had a favorite factor. Alejandro noted that he prefers to avoid aggregate market prediction and to combine different factors rather than restrict himself to a single signal. He appreciates the capability that machine learning brings to his research because it uncovers patterns and signals within large datasets.

An important part of the discussion covered some of the limitations of machine learning. Simply put, it works better with more data. Additionally, we have to recognize that ChatGPT is designed to predict text well. Importantly, it is not designed to perform any specific finance-oriented functions. Alejandro noted that if someone tried to have ChatGPT multiply two large numbers together, there is a high likelihood that it would come to an incorrect answer. Without proper plug-ins, techniques like linear regression are beyond its capabilities. But if the remit is instead to ingest massive amounts of company news and headlines that arrive very quickly, it could be the ideal tool.

Anyone seeking to repeat these results or to backtest strategies using large language models, whether ChatGPT, GPT-4, Bard, Llama or models that are yet to come, should keep the following in mind:

When models are trained essentially on the entire public internet, it may not be possible to truly see the predictive power of a given approach to equity price forecasting. It could be that the systems are simply regurgitating the actual price movements, which could very well have been included in the training data.

Alejandro explained that, since ChatGPT’s training data cuts off in September 2021, he has only been able to truly test the system’s predictive power on data after that point. This is why the work focuses on daily returns: the daily frequency yields the most observations over such a short window. He indicated that it might be possible to look at some weekly studies, but monthly return forecasting is not yet possible with only about 18 months of data. Therefore, any strategy seeking to employ these models would be based on higher-frequency, higher-turnover approaches rather than anything long-term or buy-and-hold.
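The sample-size point can be illustrated with a rough Python count of how many observations each frequency offers after the training cutoff. The exact dates below are assumptions: the article gives only "September 2021" and "about 18 months," so the end-of-September cutoff and the end-of-March 2023 sample end are placeholders. Weekdays stand in for trading days, and market holidays are ignored.

```python
from datetime import date, timedelta

TRAINING_CUTOFF = date(2021, 9, 30)  # assumed: last day of September 2021
SAMPLE_END = date(2023, 3, 31)       # assumed: roughly 18 months later


def out_of_sample_weekdays(cutoff: date, end: date) -> list:
    """Weekdays strictly after the cutoff, up to and including the end date."""
    days = []
    current = cutoff + timedelta(days=1)
    while current <= end:
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days.append(current)
        current += timedelta(days=1)
    return days


days = out_of_sample_weekdays(TRAINING_CUTOFF, SAMPLE_END)

n_daily = len(days)
n_weekly = len({tuple(d.isocalendar()[:2]) for d in days})  # distinct (ISO year, ISO week)
n_monthly = len({(d.year, d.month) for d in days})          # distinct calendar months

print(f"daily: {n_daily}, weekly: {n_weekly}, monthly: {n_monthly}")
```

The monthly count is tiny next to the daily count, which is the practical reason a test confined to post-cutoff data gravitates toward daily returns.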

Notably, the new toolkit of large language models can be applied to a better, more comprehensive form of sentiment analysis. Alejandro hypothesized that a key reason it could be working well is that, on a daily horizon, it is difficult for market participants to trade small stocks, particularly on the short side, since acting on a negative headline requires a short position. Transaction costs could also be higher, particularly for investors with more assets under management, making it impossible for them to exploit these opportunities without moving the market.
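A small Python sketch shows how trading frictions could leave such a signal unexploited. Everything here is hypothetical: the +1/-1/0 scoring convention, the function name, and all return and cost figures are invented for illustration and are not taken from Alejandro's work.

```python
def net_return(score: int, next_day_return: float,
               trade_cost: float, short_cost: float = 0.0) -> float:
    """Return after costs for one headline-driven trade.

    score: +1 (good news, go long), -1 (bad news, go short), 0 (no trade).
    short_cost: extra cost that applies only to short positions.
    """
    if score == 0:
        return 0.0
    gross = score * next_day_return
    cost = trade_cost + (short_cost if score < 0 else 0.0)
    return gross - cost


# Good headline, stock rises 1% the next day, modest trading cost.
long_trade = net_return(+1, 0.010, trade_cost=0.002)

# Bad headline, stock falls 1% the next day, but shorting a small stock is expensive.
short_trade = net_return(-1, -0.010, trade_cost=0.002, short_cost=0.009)

print(f"long: {long_trade:+.3f}, short: {short_trade:+.3f}")
```

In this invented example both headlines predict the price move correctly, yet only the long trade is profitable after costs, which is one way a pattern in small stocks could persist.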

Listen to the full discussion below.