Behind the hype: Machine learning in investment management
In a recent article, I discussed some of the significant progress being made in machine learning–enabled artificial intelligence, some of its potential drawbacks, and the challenges it poses for regulators. Now, I want to draw your attention to an interesting Barclays report that looks at how quantitative fund strategies are deployed and, in particular, at the role of machine learning in investment management.
You can read more articles on technology’s role in finance by Sviatoslav Rosov, PhD, CFA, on the Market Integrity Insights blog.
Big data: Costly, but is it useful?
Although big data is usually associated directly with machine learning, there is still debate about whether new data sources, such as news and social media gathered by web crawling, credit card data, and geolocation data, are helpful in the investment process. Examples of trading strategies based on such data include using Twitter sentiment to bet on the equity market as a whole or on individual stocks, and using geolocation data to estimate retail activity relevant to individual stocks (e.g., footfall at retail stores).
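To make the first example more concrete, here is a minimal Python sketch of how a sentiment-based rule might turn daily average sentiment scores into positions. It assumes the scores have already been computed from social media posts (that scoring step is not shown). The function name `sentiment_positions`, the 20-day lookback, and the one-standard-deviation entry threshold are hypothetical choices for illustration, not a strategy described in the Barclays report.

```python
import statistics


def sentiment_positions(daily_scores, lookback=20, entry_z=1.0):
    """Map daily average sentiment scores to positions: +1 long, -1 short, 0 flat.

    Hypothetical illustration: a day's score is compared with the trailing
    `lookback` days of scores, and the position is assumed to be held over
    the following day, once that day's score is known.
    """
    positions = []
    for t, score in enumerate(daily_scores):
        history = daily_scores[max(0, t - lookback):t]  # strictly past scores
        if len(history) < 2:
            positions.append(0)  # not enough history to estimate a baseline
            continue
        mu = statistics.mean(history)
        sd = statistics.stdev(history)
        if sd == 0:
            positions.append(0)  # flat history: no meaningful z-score
            continue
        z = (score - mu) / sd
        # Go long on unusually positive sentiment, short on unusually negative.
        positions.append(1 if z > entry_z else -1 if z < -entry_z else 0)
    return positions


scores = [0.0, 0.1, -0.1, 0.0, 0.1, -0.1, 1.0, -1.0]
print(sentiment_positions(scores))
```

Whether such a signal has any predictive value is exactly the open question the article raises; the sketch shows only the mechanics of turning the data into bets.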
The Barclays report states that 54% of surveyed investment managers use alternative data, such as social media data gathered by web crawling, satellite data, or credit card data. That makes alternative data less prevalent than tick data (100% usage) or fundamental data (62% usage), such as balance sheet or income statement data, but more prevalent than economic data (38% usage), such as employment or inflation figures, or sell-side data (31% usage), such as analyst reports or broker recommendations.
Despite this widespread use of alternative data, 80% of surveyed investment managers in the Barclays report said that their biggest challenge was assessing the usefulness of the data. Managers also worry that the price of big data typically exceeds its value and that it is difficult to clean and process for analysis. The key issue is that the cost of a dataset is not merely its upfront price but also the opportunity cost of time spent cleaning, filtering, and analysing a dataset that may ultimately not yield any actionable recommendations.
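The cost argument can be written down as simple arithmetic: total cost is the upfront price plus the analyst time spent on the data, set against the chance that the data yields a usable signal. The Python sketch below is a hypothetical back-of-the-envelope model; the class name, fields, and all the numbers are illustrative assumptions, not figures from the Barclays report.

```python
from dataclasses import dataclass


@dataclass
class DatasetEvaluation:
    """Hypothetical back-of-the-envelope model for valuing an alternative dataset."""
    upfront_cost: float         # purchase or licence price of the dataset
    analyst_hours: float        # time spent cleaning, filtering, and analysing it
    hourly_cost: float          # fully loaded cost of one analyst hour
    p_actionable: float         # subjective probability the data yields a usable signal
    value_if_actionable: float  # estimated value of that signal if it exists

    def total_cost(self) -> float:
        # Upfront price plus the opportunity cost of analyst time.
        return self.upfront_cost + self.analyst_hours * self.hourly_cost

    def expected_net_value(self) -> float:
        # Expected payoff minus total cost; negative means "not worth it".
        return self.p_actionable * self.value_if_actionable - self.total_cost()


ev = DatasetEvaluation(
    upfront_cost=100_000,
    analyst_hours=500,
    hourly_cost=150,
    p_actionable=0.25,
    value_if_actionable=500_000,
)
print(ev.total_cost())          # 175000
print(ev.expected_net_value())  # -50000.0
```

With these made-up numbers, the headline price (100,000) is below the expected payoff (125,000), yet the 75,000 of analyst time tips the dataset into negative expected value, which is the opportunity-cost point made above.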
Machine learning does the dirty work
Machine learning may help reduce this opportunity cost by improving and automating the gathering, processing, and cleaning of alternative data. The same improvements can also make existing data sources cheaper and more effective to use.
The Barclays report supports this view: the most popular use of machine learning among respondents is cleaning traditional data sources, such as tick data. Of the managers who use machine learning in the investment process, 88% use it as a data processing tool, whereas only 25% currently use it in investment decisions or in portfolio construction and execution.
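The report does not say which techniques managers use to clean tick data. As a deliberately simple stand-in, the Python sketch below flags "bad ticks" (isolated prints far from their neighbours) using a rolling median and the median absolute deviation (MAD). This is a hand-set statistical rule, not a machine learning model; an ML-based cleaner would, in effect, learn such a rule from data. The function name `flag_bad_ticks`, the window, the threshold, and the one-cent `min_scale` floor are hypothetical.

```python
import statistics


def flag_bad_ticks(prices, window=5, threshold=5.0, min_scale=0.01):
    """Return indices of prices that sit far from their local median.

    Hypothetical rule: a tick is flagged if it deviates from the median of
    its neighbours (excluding itself) by more than `threshold` robust
    standard deviations, estimated from the neighbours' MAD.
    """
    flagged = []
    half = window // 2
    for i, p in enumerate(prices):
        lo = max(0, i - half)
        hi = min(len(prices), i + half + 1)
        neighbours = prices[lo:i] + prices[i + 1:hi]  # exclude the tick itself
        if len(neighbours) < 2:
            continue  # too little context to judge this tick
        med = statistics.median(neighbours)
        mad = statistics.median([abs(x - med) for x in neighbours])
        # 1.4826 * MAD approximates a standard deviation for normal data;
        # the floor (e.g., one tick size) avoids flagging tiny moves in flat runs.
        scale = max(1.4826 * mad, min_scale)
        if abs(p - med) / scale > threshold:
            flagged.append(i)
    return flagged


ticks = [100.00, 100.01, 100.02, 100.01, 1000.0, 100.03, 100.02, 100.04, 100.03]
print(flag_bad_ticks(ticks))  # [4]
```

The fat-finger print at index 4 is flagged while ordinary one- and two-cent moves are not; in practice the window and threshold would need tuning to each instrument's tick size and volatility.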
Artificial data miner
One potential concern with machine learning is overfitting and data mining, an issue the respondents to the Barclays survey also highlight. Machine learning is very good at finding correlations, but it does nothing to develop testable hypotheses of causation. In the big data/cloud computing approach to machine learning, the algorithm tests many hundreds of hypotheses (that is, essentially, what machine “learning” is) and ultimately highlights the significant correlations. But first-year statistics tells us that if we test 100 hypotheses at the 5% significance level (i.e., 95% confidence) and none of the effects is real, we should expect about 5 to appear significant through random chance alone. This problem of spurious correlation is routinely ignored in finance research, and machine learning is quite likely to be no different.
Addressing these concerns, the report notes that although managers are aware of the issue, they are not worried by it because they are simply seeking to exploit these correlations, spurious or otherwise. To the extent that the correlations represent short-term market inefficiencies, managers “prefer to take advantage of the signal while it is still useful rather than waste time trying to find hypotheses for these signals.”
In summary
The Barclays report offers an interesting window into the role of machine learning in investment management. There is a lot of hype about artificial intelligence and robo-advisers, but, as always, reality tends to intrude at some point, and what we are seeing is machine learning being used mostly for data processing (for now, anyway).