Across a range of index-tracking funds, we can see the impact of high fees on performance. The rationale for paying high fees to human advisers is the expectation of earning greater net returns than one’s peers over time. Yet research shows that the opposite is true. The only meaningful variation in gross performance, most of it on the downside and by a large amount, is associated with high-fee funds. Once the fees themselves are taken into account, the odds of coming out ahead are too low to ever justify them. This article takes advantage of modern, advanced analytics and “big data” tools to further substantiate the academic understanding of fees versus performance.
As we have seen in personal finance, the bold pursuit of outsized returns generally worsens an investor’s situation. An investor must contend with the difficulty of skillfully selecting the outperforming securities,1 together with the high chance of market mistiming2 and other tracking errors.3 Last but not least is the drive toward higher-expense funds, which, as we will demonstrate, exist even among funds that track passive index benchmarks.
In this quantitative article, we compare the returns of funds that charge higher fees with those of funds that charge lower fees, taking care to isolate the expenses charged within the fund as the primary factor. Using probability theory and the advanced analytics of “big data,” we explain not only how fees take away from gross returns but also why there is no justification for owning high-fee funds to begin with. Although investors are drawn to higher-fee funds in the hope of attaining higher returns, our study shows, in both probabilistic and financial terms, that this hope is not fulfilled.
In this article, we explore the funds listed in the Lipper Fund Database and focus on the simplest of funds—those whose objective is to match the returns of the largest 500 US securities (S&P 500 Index). Much of the academic literature supporting the common-sense conclusion that fees take away from performance is based on regression and factor analysis.4 Here, we will use a modern application of advanced analytics (including machine learning) to better understand the nuanced empirical patterns of clusters within the funds’ data (n = 136). We can see from Figure 1 that the very cloudy pattern makes regression analysis impractical.
Let’s begin with some comments on the dataset because we will be using it throughout this article. First, roughly six data points were truncated because they are outliers with an erroneous 0% net performance, which implies some data quality issues with the Lipper performance database. Second, for our analysis, it does not matter whether we use gross performance (or, later, net performance) because the fee ratio is very small in relation to the level of performance in this one-year period. More importantly, the fee ratio is not small in relation to the variation in performance. Third, the results are similar regardless of the length and starting point of the time frame chosen for analysis.
We often get unintuitive clusters when using advanced analytics, which, when applied to our dataset, reveal the three different clusters in Figure 2.
As we can see in Figure 2, the squares (LL, n = 20) are clearly an issue and will remain so throughout the analysis. The triangles (L for loser funds) and the diamonds (W for winner funds) have sample sizes of 27 and 83, respectively. The main point at this stage of the analysis is not to dig further into the numbers but, rather, to see that the advanced analytics assign some of the funds at a fee ratio of about 0.5% to both LL and L. It is good to see a cut of about 0.5% in fees, but we cannot further determine which funds will be winners and losers solely on the basis of fees at this point. In other words, if you select only funds with 0.5% fees, you have as good a chance of selecting a winner as you do of selecting a loser (LL). In this analysis, we aim to improve on our selection probability.
We rely on a hybrid approach, using linear partitioning to find the breakpoint in fees that provides the strongest contrast in fund performance.
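The breakpoint search can be sketched in a few lines of Python. This is only an illustration, not the procedure used for the figures: the split criterion (maximizing the between-group sum of squares, as a regression tree would), the function name best_fee_breakpoint, and the ten fee and return values are all assumptions made up for the example.

```python
from statistics import mean

def best_fee_breakpoint(fees, returns, min_group=2):
    """Return (threshold, mean_low, mean_high) for the fee cut that gives
    the largest between-group sum of squares in gross return.
    Funds with fee >= threshold form the high-fee group."""
    pairs = sorted(zip(fees, returns))
    n = len(pairs)
    best_score, best_cut = -1.0, None
    for i in range(min_group, n - min_group + 1):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # never split funds that share the same fee
        low = [r for _, r in pairs[:i]]
        high = [r for _, r in pairs[i:]]
        # Assumed criterion: reduction in squared error from splitting here.
        score = len(low) * len(high) / n * (mean(low) - mean(high)) ** 2
        if score > best_score:
            best_score = score
            best_cut = (pairs[i][0], mean(low), mean(high))
    return best_cut

# Hypothetical fee ratios (%) and gross returns (%), for illustration only.
fees = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.90, 1.20, 1.50]
returns = [17.4, 17.3, 17.2, 17.3, 17.3, 15.0, 16.2, 14.0, 15.5, 13.0]

threshold, m_low, m_high = best_fee_breakpoint(fees, returns)
print(f"high fee means fee >= {threshold:.2f}%")
print(f"low-fee mean {m_low:.2f}%, high-fee mean {m_high:.2f}%")
```

Weighting the squared gap by the group sizes keeps the search from favoring a cut that isolates only a handful of the worst funds, which a raw difference in means would tend to do.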
Figure 3 shows the individual histograms of the variables (gross performance on the left and fee ratio on the right). Neither has a normal distribution, making a linear regression analysis difficult.
Next, looking at the smoothed contour plot and the 90% confidence interval circling the data (see Figure 4), we can see the overall fit and how a theoretical joint normal distribution does an acceptable job, although the fit is not very tight. We know from an article on abnormal risks5 that the assumption about random variables, which can be mathematically manipulated, allows for a clearer understanding of the ultimate parameters and model distribution.
So, our analysis has taken us from the cluster that we saw at the beginning to the optimal linear partitions shown in Figure 5.
The gross performance of the 63 funds at <0.45% fee ratio (low fee) is 17.3% (σ of only 0.2%); for the 67 funds at ≥0.45% fee ratio (high fee), the gross performance is 15.7% (high σ of 2.2%). So, we can see that the linear partition does a nice job of taking on slightly fewer funds (63 versus 83) but achieves the same performance as the winners’ (W) cluster. We have grouped together the L and LL (originally 27 + 20 = 47, but now 67) and reduced the allowed fee ratio for selection. Additionally, we see a strong partition in means and both a large and roughly equal sample size for the two samples.
Our analysis leaves us wanting to solve for the ultimate understanding of the probability of selecting a winner fund or a loser fund solely on the basis of selecting a high-fee ratio versus a low-fee ratio. It uses Bayesian probabilities and is a critical complement even with a big-data analysis. More exotic probability tests could be performed in closed form, but we are in violation of the normal distribution parameters, as we have already seen several times in this article and as Figure 6 shows.
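Because the distributions are not normal, one simple alternative is to estimate such conditional probabilities directly from the observed frequencies in each fee group. The sketch below shows the idea; the function name prob_at_least and the eight net returns are hypothetical values chosen only to illustrate the calculation, not data from the Lipper sample.

```python
def prob_at_least(net_returns, target):
    """Empirical probability that a fund's net return reaches the target."""
    return sum(r >= target for r in net_returns) / len(net_returns)

# Hypothetical net returns (%) per fee group, for illustration only.
low_fee_net = [17.2, 17.1, 17.0, 16.5]
high_fee_net = [16.0, 15.0, 14.0, 12.0]
target = 16.9  # the "high net return" level used as an example in the text

p_low = prob_at_least(low_fee_net, target)    # P(net >= target | low fee)
p_high = prob_at_least(high_fee_net, target)  # P(net >= target | high fee)
print(f"P(high net return | low fee)  = {p_low:.0%}")
print(f"P(high net return | high fee) = {p_high:.0%}")
```

Counting frequencies this way makes no assumption about the shape of either distribution.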
So, the probability of high net returns (e.g., 16.9%) is roughly 75% if choosing a low-fee fund and 0% if choosing a high-fee fund. We can see the cost of all of the poor LL performers weighing on gross returns, in addition to the higher fees themselves. For example, the low-fee funds saw a net performance average of 17.0% (down from 17.3% gross). Yet the high-fee funds saw their net performance plummet to an average of 14.7% (down from 15.7% gross). So, the fees ate up an additional 0.7 percentage point (the fee itself) versus the losing differential at the gross performance level. We learn from investment performance standards:
(17.3% – 17.0%) – (15.7% – 14.7%) ≈ 0.3 percentage points – 1.0 percentage points
≈ –0.7 percentage points
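The same arithmetic can be checked in a few lines of Python, using only the four averages quoted above. The variable names are ours, and the split of the net gap into a gross gap plus the extra fee drag is simply a restatement of those numbers.

```python
# Average gross and net returns (%) quoted in the text.
low_gross, low_net = 17.3, 17.0
high_gross, high_net = 15.7, 14.7

low_drag = low_gross - low_net      # fee drag on low-fee funds
high_drag = high_gross - high_net   # fee drag on high-fee funds
extra_drag = high_drag - low_drag   # additional cost of the higher fees
gross_gap = low_gross - high_gross  # low-fee advantage before fees
net_gap = low_net - high_net        # low-fee advantage after fees

print(f"fee drag: low {low_drag:.1f}, high {high_drag:.1f}, extra {extra_drag:.1f}")
print(f"gross gap {gross_gap:.1f} + extra drag {extra_drag:.1f} = net gap {net_gap:.1f}")
```

Laid out this way, the low-fee advantage after fees is the gap that already exists at the gross level plus the additional fee drag.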
Even looking at the previous net performance histograms, we can see that if an investor selects a random high-fee fund, there is barely a 25% chance that it will outperform a random low-fee fund. This is a cautionary tale that summarizes the strength of the cost discipline a robo-adviser can have and the probabilistic difficulty of justifying higher-fee funds. What is hidden in more traditional regression and factor analysis is that the variation, which is only in the downside, is also only in the high-fee space (about 2%). This implies a 2.3 percentage point advantage for low-fee funds (17% – 14.7%), which suggests that everyone should embrace only these funds. Under modern portfolio theory, an investor is supposed to be paid for the extra risk (including fees). Here, an investor is sometimes earning less—most of the time a lot less.
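The chance that a randomly chosen high-fee fund beats a randomly chosen low-fee fund can be estimated by comparing every possible pairing of the two groups. The following is a minimal sketch of that calculation; the function name prob_outperform and the net returns are hypothetical and stand in for the actual fund data.

```python
from itertools import product

def prob_outperform(a, b):
    """Share of all (x, y) pairings, x from a and y from b, in which x > y."""
    wins = sum(1 for x, y in product(a, b) if x > y)
    return wins / (len(a) * len(b))

# Hypothetical net returns (%), for illustration only.
low_fee_net = [17.2, 17.1, 17.0, 16.9]
high_fee_net = [16.95, 15.0, 14.0, 12.0]

p_high_beats_low = prob_outperform(high_fee_net, low_fee_net)
p_low_beats_high = prob_outperform(low_fee_net, high_fee_net)
print(f"P(random high-fee fund beats random low-fee fund) = {p_high_beats_low:.1%}")
print(f"P(random low-fee fund beats random high-fee fund) = {p_low_beats_high:.1%}")
```

Like the histograms, this pairwise count needs no normality assumption, so it remains usable when the high-fee returns are skewed toward the downside.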
1 Jeff Sommer, “The Oracle of Omaha, Looking a Bit Ordinary,” New York Times (5 April 2014): www.nytimes.com/2014/04/06/business/the-oracle-of-omaha-lately-looking-a-bit-ordinary.html.
2 Salil Mehta, “The Puzzle in Active Investing,” Pensions & Investments (24 April 2015): www.pionline.com/article/20150424/ONLINE/150429908/the-puzzle-in-active-investing.
3 See http://statisticalideas.blogspot.com/2015/04/the-indomitable-benchmarks.html.
4 Mark Carhart, “On Persistence in Mutual Fund Performance,” Journal of Finance, vol. 52, no. 1 (March 1997); Erik R. Sirri and Peter Tufano, “Costly Search and Mutual Fund Flows,” Journal of Finance, vol. 53, no. 5 (October 1998).
5 See http://statisticalideas.blogspot.com/2015/06/abnormal-risks.html.