Performance Estimation with the Sharpe Ratio

Steven E. Pav

Opendoor

June 2, 2022

The Sharpe Ratio and Signal-Noise Ratio

The Sharpe ratio (SR) is a sample statistic used to measure investment performance, defined as the sample mean of returns divided by the sample volatility of returns:

\frac{\hat{\mu}}{\sqrt{\hat{\sigma}^2}}.

  • The Signal-Noise ratio (SNR) is the unobservable population parameter, \mu / \sigma.
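A minimal base-R illustration of the difference, using simulated data rather than anything from the talk: the SNR is fixed by the (assumed) parameters used to generate returns, while the Sharpe ratio is computed from one sample and would change with another draw.

```r
# Illustrative only: simulated monthly returns with known (assumed) parameters.
set.seed(1234)
mu <- 0.01      # assumed monthly mean return
sigma <- 0.04   # assumed monthly volatility
snr <- mu / sigma                       # population SNR: fixed, unobservable in practice

x <- rnorm(120, mean = mu, sd = sigma)  # ten years of monthly returns
sr <- mean(x) / sd(x)                   # sample Sharpe ratio: changes with every sample

# Both are in units of 1/sqrt(month); multiply by sqrt(12) to annualize.
print(c(snr_annual = snr * sqrt(12), sr_annual = sr * sqrt(12)))
```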

The Sharpe ratio and Signal-Noise ratio are connected to three questions:

  • Las Vegas: How would you invest if you knew the parameters of returns?
    • Roy (1952) proposed maximizing SNR, to minimize the Chebyshev upper bound on the probability of a loss.
  • Ivory Tower: How do you estimate population parameters from observed data?
    • Questions like: Is the SNR of my strategy positive? Is the SNR of strategy A higher than that of B?
    • Steal results from classical statistics, because SR is like a t statistic.
  • Wall Street: How should you invest, given the observed data?
    • Combine Las Vegas and Ivory Tower? That is, decide to maximize the SNR, then perform inference on SNR.
    • Alternatively, use a heuristic? Without any theory, it is hard to understand the performance of this approach.
    • Sharpe (1965) gave no theoretical backing, other than that investors like returns and hate risk.
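To see why Roy's criterion leads to the SNR, take a loss threshold of zero and suppose \mu > 0. Chebyshev's Inequality then bounds the probability of a loss by a decreasing function of \mu/\sigma (a short derivation added for illustration):

```latex
\Pr\left(x \le 0\right) = \Pr\left(x - \mu \le -\mu\right)
  \le \Pr\left(\left|x - \mu\right| \ge \mu\right)
  \le \frac{\sigma^2}{\mu^2} = \frac{1}{\left(\mu/\sigma\right)^2}.
```

So maximizing \mu/\sigma minimizes this upper bound on the probability of a loss, whatever the distribution of returns.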

Example: “Sharpe is useless because returns are not normal.” Arguable for Las Vegas; there are work-arounds in Ivory Tower.

Sharpe Ratio as Currency

The Sharpe ratio has escaped academia and has a kind of currency among allocators and investors.

  • Investors never discuss their mythical utility function, but routinely ask about your Sharpe.

  • If a high achieved Sharpe is the investment objective, you should seek to maximize SNR.

Basic Computation of Sharpe Ratio

  • Compute Sharpe via SharpeR::as.sr:
# load SharpeR (as.sr, sr_test, sric, as.sropt, ...) and the monthly Fama French 4 Factors
library(SharpeR)
library(tsrsa)
data(mff4)
head(mff4,n=2)
          Mkt   SMB  HML   UMD   RF
Jan 1927 0.19 -0.56 4.83  0.44 0.25
Feb 1927 4.44 -0.10 3.17 -1.32 0.26
# compute SR on all 4:
as.sr(mff4[,c('Mkt','UMD','SMB','HML')]) 
    SR/sqrt(yr) Std. Error t value  Pr(>t)    
Mkt        0.61       0.10     6.0 1.8e-09 ***
UMD        0.48       0.10     4.6 2.1e-06 ***
SMB        0.23       0.10     2.2   0.014 *  
HML        0.32       0.10     3.1   0.001 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • as.sr detects the observation frequency of the returns and reports Sharpe in annualized units:
# load the daily Fama French 4 Factors
library(tsrsa)
data(dff4)
head(dff4,n=2)
             Mkt   SMB   HML   UMD    RF
1926-11-03 0.213 -0.24 -0.28  0.57 0.013
1926-11-04 0.603 -0.15  0.69 -0.52 0.013
# compute SR on all 4:
as.sr(dff4[,c('Mkt','UMD','SMB','HML')]) 
    SR/sqrt(yr) Std. Error t value  Pr(>t)    
Mkt        0.64       0.10     6.2 2.7e-10 ***
UMD        0.55       0.10     5.3 5.0e-08 ***
SMB        0.14       0.10     1.4 0.08771 .  
HML        0.37       0.10     3.6 0.00017 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
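One plausible way to infer the observation frequency from a time index is to count observations per calendar year. The helper `guess_ope` below is hypothetical and written only for illustration; it is not SharpeR's internal code, which may use a different rule.

```r
# Hypothetical helper (not SharpeR's code): estimate observations per year ("ope")
# as the number of gaps between observations divided by the calendar span in years.
guess_ope <- function(dates) {
  span_years <- as.numeric(max(dates) - min(dates)) / 365.25
  (length(dates) - 1) / span_years
}

monthly <- seq(as.Date("2011-01-01"), by = "month", length.out = 24)
days <- seq(as.Date("2011-01-03"), by = "day", length.out = 60)
weekdays_only <- days[!(as.POSIXlt(days)$wday %in% c(0, 6))]  # drop Sundays (0) and Saturdays (6)

guess_ope(monthly)        # close to 12
guess_ope(weekdays_only)  # about 266 for this short weekday sample

# A per-period Sharpe ratio is annualized by multiplying by sqrt(ope):
sr_monthly <- 0.18
sr_monthly * sqrt(guess_ope(monthly))
```

Real trading calendars have holidays, so daily equity data typically gives an ope nearer 252 than the 261 weekdays per year.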

Distribution of Sharpe

If returns were normal, the Sharpe ratio would follow a (non-central) t distribution, up to scaling:

\mathrm{Sharpe\,\,ratio} = \frac{\hat{\mu}}{\hat{\sigma}},\,\,\,\quad \mathrm{t\,\,statistic} = \sqrt{n}\frac{\hat{\mu}}{\hat{\sigma}}.
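A base-R check of this scaling on simulated normal returns (the parameters are assumptions). It also sketches exact inference under normality: \sqrt{n} times the Sharpe ratio is non-central t with n-1 degrees of freedom and non-centrality \sqrt{n} times the SNR, so a confidence interval for the SNR follows from inverting `pt` in its `ncp` argument. SharpeR's own interval methods may be computed differently.

```r
# Simulated monthly returns with assumed parameters.
set.seed(42)
n <- 120
x <- rnorm(n, mean = 0.01, sd = 0.04)
sr <- mean(x) / sd(x)                    # per-month Sharpe ratio
t_stat <- unname(t.test(x)$statistic)    # classical one-sample t statistic
all.equal(t_stat, sqrt(n) * sr)          # TRUE: t = sqrt(n) * SR

# Exact 95% CI for the SNR: pt(t_obs, df, ncp) decreases in ncp, so find the
# non-centrality values that put t_obs at the 97.5% and 2.5% quantiles.
ci_ncp <- sapply(c(0.975, 0.025), function(p) {
  uniroot(function(ncp) pt(t_stat, df = n - 1, ncp = ncp) - p,
          lower = t_stat - 10, upper = t_stat + 10)$root
})
snr_ci <- ci_ncp / sqrt(n)   # per-month SNR units
snr_ci * sqrt(12)            # annualized
```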

  • When returns are nearly normal, inference can use the connection to the t distribution.
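  • Otherwise, use asymptotic formulae for the standard error, where \gamma_1 is the skewness and \gamma_2 the excess kurtosis of returns: s.e. \approx \sqrt{\frac{1 - \zeta \gamma_1 + \frac{\gamma_2 + 2}{4}\zeta^2}{n}} \approx \sqrt{\frac{1 + \zeta^2/2}{n}}, the second form holding for normal returns. (Johnson and Welch (1940), Mertens (2002), Bao (2009))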
# using the simple asymptotic standard error:
zeta <- as.sr(mff4['2011-01-01::2020-12-31','Mkt'])
confint(zeta,type='t')
     2.5 % 97.5 %
Mkt 0.3729  1.639
  • as.sr(...,higher_order=TRUE) computes and stores the moments needed by se, confint and predint.
# using Mertens form:
zeta <- as.sr(mff4['2011-01-01::2020-12-31','Mkt'],higher_order=TRUE)
confint(zeta,type='Mertens')
    2.5 % 97.5 %
Mkt 0.337  1.674
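The Mertens-form standard error from the bullet above can also be computed by hand. This is a sketch with simple plug-in moment estimators, which may differ slightly from the estimators SharpeR uses; the fat-tailed simulated returns are an assumption for illustration.

```r
# Mertens-form asymptotic standard error of the Sharpe ratio.
# gamma1 = sample skewness, gamma2 = sample excess kurtosis (simple plug-in versions).
sr_se_mertens <- function(x) {
  n <- length(x)
  zeta <- mean(x) / sd(x)
  z <- (x - mean(x)) / sd(x)
  gamma1 <- mean(z^3)
  gamma2 <- mean(z^4) - 3
  sqrt((1 - zeta * gamma1 + (gamma2 + 2) / 4 * zeta^2) / n)
}

set.seed(101)
x <- 0.01 + 0.04 * rt(240, df = 5) / sqrt(5 / 3)  # fat-tailed monthly returns, sd near 0.04
zeta_hat <- mean(x) / sd(x)
se_m <- sr_se_mertens(x)                          # allows for skew and kurtosis
se_normal <- sqrt((1 + zeta_hat^2 / 2) / length(x))  # normal-returns approximation

# approximate 95% confidence interval, annualized with 12 observations per year
ci <- (zeta_hat + c(-1, 1) * qnorm(0.975) * se_m) * sqrt(12)
print(c(se_mertens = se_m, se_normal = se_normal))
print(ci)
```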

Hypothesis Testing

One- and two-sample hypothesis testing via SharpeR::sr_test.

  • Hypothesis testing on one investment uses either the t approximation or an approximate standard error:
# higher order approximate standard error
print(sr_test(mff4[,'Mkt'],alternative='greater',zeta=0.3,ope=12,conf.level=0.95,type='Mertens'))

    One Sample sr test, Mertens method

data:  mff4[, "Mkt"]
t = 6, df = 1127, p-value = 0.001
alternative hypothesis: true signal-noise ratio is greater than 0.3
sample estimates:
      [,1]
Mkt 0.6138
attr(,"names")
[1] "Sharpe ratio of mff4[, \"Mkt\"]"
  • Two sample tests, paired and unpaired, via approximate standard errors:
# let's compare Mkt and Value
as.sr(mff4[,c('Mkt','HML')])
    SR/sqrt(yr) Std. Error t value  Pr(>t)    
Mkt        0.61       0.10     6.0 1.8e-09 ***
HML        0.32       0.10     3.1   0.001 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(sr_test(x=mff4[,'Mkt'],y=mff4[,'HML'],ope=12,alternative='two.sided',paired=TRUE,conf.level=0.95,type='Mertens'))

    Paired sr-test

data:  mff4[, "Mkt"] and mff4[, "HML"]
t = 2.3, df = 1127, p-value = 0.02
alternative hypothesis: true difference in signal-noise ratios is not equal to 0
sample estimates:
difference in Sharpe ratios 
                     0.2959 
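For intuition about how a paired test accounts for correlation, here is a sketch of one classical approach, the Jobson-Korkie test with Memmel's correction, which assumes i.i.d. bivariate normal returns. This is a swapped-in technique for illustration; SharpeR's Mertens-type paired test uses its own standard errors and will not match it exactly. The simulated series are assumptions.

```r
# Paired test of equal SNR, assuming i.i.d. bivariate normal returns
# (Jobson-Korkie statistic with Memmel's correction).
paired_sr_diff_test <- function(x, y) {
  n <- length(x)
  sr_x <- mean(x) / sd(x)
  sr_y <- mean(y) / sd(y)
  rho <- cor(x, y)
  # asymptotic variance of sr_x - sr_y under normality
  v <- (2 - 2 * rho + 0.5 * (sr_x^2 + sr_y^2 - 2 * sr_x * sr_y * rho^2)) / n
  z <- (sr_x - sr_y) / sqrt(v)
  list(diff = sr_x - sr_y, z = z, p.value = 2 * pnorm(-abs(z)))
}

set.seed(7)
n <- 600
common <- rnorm(n)                                  # shared factor induces correlation
x <- 0.010 + 0.04 * (0.6 * common + 0.8 * rnorm(n)) # corr(x, y) is 0.36 by construction
y <- 0.004 + 0.03 * (0.6 * common + 0.8 * rnorm(n))
res <- paired_sr_diff_test(x, y)
print(res)
```

The higher the correlation between the two series, the smaller the variance of the difference, which is why paired tests can detect differences that unpaired tests miss.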

Sharpe Optimization

We don’t always face binary buy/no-buy decisions; instead, we can hold a portfolio of investments.

  • Las Vegas problem: suppose \vec{\mu}, \Sigma are the known mean and covariance of returns. Maximize the signal-noise ratio of a portfolio of the assets.
    • A portfolio \vec{w} has expected return \vec{w}^{\top}\vec{\mu} and variance of returns \vec{w}^{\top}\Sigma\vec{w}.
    • SNR maximization is solved by the Markowitz portfolio \vec{w}_{*} = c \Sigma^{-1}\vec{\mu}, for any c>0.
    • The squared SNR of the Markowitz portfolio is \zeta^2_* = \vec{\mu}^{\top}\Sigma^{-1}\vec{\mu}.
  • As a Wall Street problem, we typically use simple estimators of mean and covariance, and plug them in: \hat{\mu} = \frac{1}{n}\sum_{t=1}^{n}\vec{x}_t,\quad \hat{\Sigma} = \frac{1}{n-1}\sum_{t=1}^{n}\left(\vec{x}_t\vec{x}_t^{\top} - \hat{\mu}\hat{\mu}^{\top}\right). Then the sample Markowitz portfolio and squared Sharpe are: \hat{w}_{*} = \hat{\Sigma}^{-1}\hat{\mu},\quad \hat{\zeta}_{*}^2 = \hat{\mu}^{\top}\hat{\Sigma}^{-1}\hat{\mu}.
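A base-R sketch of the plug-in procedure on simulated returns (the means and covariance are assumptions). It also checks a useful identity: in sample, the Sharpe ratio of \hat{w}_* equals \hat{\zeta}_*, because both its sample mean and its sample variance equal \hat{\mu}^{\top}\hat{\Sigma}^{-1}\hat{\mu}.

```r
# Plug-in Markowitz portfolio from an n x k matrix of simulated monthly returns.
set.seed(2022)
k <- 4; n <- 600
mu <- c(0.008, 0.004, 0.002, 0.006)              # assumed monthly means
Sigma <- 0.04^2 * (0.7 * diag(k) + 0.3)          # assumed covariance: equal vol, corr 0.3
X <- matrix(rnorm(n * k), n, k) %*% chol(Sigma)  # rows have covariance Sigma
X <- sweep(X, 2, mu, "+")                        # add the means

mu_hat <- colMeans(X)
Sigma_hat <- cov(X)                              # the 1/(n-1) estimator
w_hat <- solve(Sigma_hat, mu_hat)                # sample Markowitz portfolio, c = 1
zeta2_hat <- drop(crossprod(mu_hat, w_hat))      # squared sample Sharpe of w_hat

port <- drop(X %*% w_hat)                        # in-sample portfolio returns
print(c(sr_of_port = mean(port) / sd(port), zeta_hat = sqrt(zeta2_hat)))
```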

Portfolio Inference

Just as \sqrt{n}\frac{\hat{\mu}}{\hat{\sigma}} = t in the univariate case, n \hat{\mu}^{\top}\hat{\Sigma}^{-1}\hat{\mu} = T^2, the Hotelling statistic, in the multivariate case.

response       classical statistics    quantitative investing
univariate     t statistic             Sharpe Ratio
multivariate   T^2 statistic           Squared Optimal Sharpe Ratio

(There’s another dimension to this table if you consider conditioning information!)

  • T^2 is the test for the hypothesis H_0: \vec{\mu} = \vec{0}, or equivalently H_0: \zeta^2_* = 0.
  • If \vec{\mu} = \vec{0} then every portfolio on the set of assets has zero SNR.
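A base-R sketch of the one-sample Hotelling test built from the squared sample Sharpe of the Markowitz portfolio. It uses the standard F transformation, valid for i.i.d. normal returns; the zero-mean simulated data are an assumption. With a single asset it reduces to the squared t statistic.

```r
# One-sample Hotelling T^2 test of H0: mu = 0 via T^2 = n * zeta2_hat.
# Under normal returns and H0, (n - k) / (k (n - 1)) * T^2 ~ F(k, n - k).
hotelling_t2 <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X); k <- ncol(X)
  mu_hat <- colMeans(X)
  zeta2_hat <- drop(crossprod(mu_hat, solve(cov(X), mu_hat)))
  T2 <- n * zeta2_hat
  F_stat <- (n - k) / (k * (n - 1)) * T2
  list(T2 = T2, F = F_stat, p.value = pf(F_stat, k, n - k, lower.tail = FALSE))
}

set.seed(99)
X0 <- matrix(rnorm(500 * 3, sd = 0.04), 500, 3)  # simulated returns, true mean zero
res <- hotelling_t2(X0)
print(res)
```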

Major divergence between Ivory Tower and Wall Street in this case:

  • We might correctly reject H_0, but the SNR of \hat{w}_{*} could still be zero or negative!
    The Markowitz Portfolio is an Error Maximizing Portfolio. (Michaud (1989))
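A small simulation, with assumed parameters, of the gap between the optimal SNR \zeta_* and the SNR actually achieved by the sample Markowitz portfolio when evaluated at the true \vec{\mu}, \Sigma. By the Cauchy-Schwarz inequality the achieved SNR can never exceed \zeta_*; with few observations per asset it is often well below it.

```r
# SNR of a portfolio w evaluated at the true parameters.
achieved_snr <- function(w, mu, Sigma) {
  drop(crossprod(w, mu)) / sqrt(drop(t(w) %*% Sigma %*% w))
}

set.seed(1989)
k <- 6; n <- 120
mu <- rep(0.005, k)                           # assumed monthly means
Sigma <- 0.04^2 * (0.5 * diag(k) + 0.5)       # assumed covariance: equal vol, corr 0.5
zeta_star <- sqrt(drop(crossprod(mu, solve(Sigma, mu))))  # optimal SNR

X <- sweep(matrix(rnorm(n * k), n, k) %*% chol(Sigma), 2, mu, "+")
w_hat <- solve(cov(X), colMeans(X))           # sample Markowitz portfolio
print(c(optimal = zeta_star, achieved = achieved_snr(w_hat, mu, Sigma)) * sqrt(12))
```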

Also, the distribution of T^2 is less robust to violations of its assumptions than that of t.
That aside, we can compute \hat{\zeta}_*^2 and perform inference via SharpeR::as.sropt:

rets <- mff4[,c('Mkt','HML','SMB','UMD')]
print(SharpeR::as.sropt(rets))
       SR/sqrt(yr) SRIC/sqrt(yr) 2.5 % 97.5 % T^2 value Pr(>T^2)    
Sharpe        1.07          1.04  0.84   1.26       107   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Again, inference is on \zeta^2_*, not the SNR of the sample Markowitz portfolio.

Portfolio Inference Example

  • Can also construct \hat{w}_* and perform inference on \vec{w}_* via MarkowitzR::mp_vcov:
library(MarkowitzR)
mp <- MarkowitzR::mp_vcov(rets)
knitr::kable(rbind(round(t(mp$W),4),paste0('(',round(sqrt(diag(mp$What)),4),')')),
             caption="Sample Markowitz portfolio weights and std.errs.")
Sample Markowitz portfolio weights and std.errs.

            Mkt       HML       SMB       UMD
Intercept   0.0435    0.043     0.0049    0.06
            (0.0075)  (0.0109)  (0.0098)  (0.0098)

  • Portfolio weights are unitless, and likely should be rescaled for volatility. (Britten-Jones (1999), Pav (2013))
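Since any positive multiple of the Markowitz portfolio has the same SNR, one simple rescaling targets an annualized volatility. This sketch assumes a 10% annual target and hypothetical monthly parameters; in practice \Sigma would be an estimate, so the achieved volatility would only approximate the target.

```r
# Rescale weights so that their (estimated) annualized volatility hits a target.
# Scaling by a positive constant leaves the SNR unchanged.
rescale_to_vol <- function(w, Sigma, target_annual_vol, ope = 12) {
  current_vol <- sqrt(drop(t(w) %*% Sigma %*% w) * ope)  # annualized volatility of w
  w * target_annual_vol / current_vol
}

Sigma <- matrix(c(0.0020, 0.0004,
                  0.0004, 0.0010), 2, 2)  # hypothetical monthly covariance
mu <- c(0.008, 0.003)                     # hypothetical monthly means
w <- solve(Sigma, mu)                     # Markowitz portfolio, c = 1
w10 <- rescale_to_vol(w, Sigma, target_annual_vol = 0.10)
print(w10)
```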

Conditional Portfolios

Freshman Quant Question: (Why) Would an investor pay for the sample Markowitz portfolio?

Markowitz's result is a Las Vegas result: it does not specify how \vec{\mu}_t, \Sigma_t are estimated. Use features \vec{f}_{t-1} to predict them?

  • Flattening: Explicitly look for a portfolio linear in \vec{f}_{t-1}. (Brandt and Santa-Clara (2006))

    • Returns of portfolio \mathrm{W}\vec{f}_{t-1} can be expressed in terms of the “flattened” \mathrm{W}: \vec{x}_t^{\top}\left(\mathrm{W}\vec{f}_{t-1}\right) = \operatorname{trace}\left(\mathrm{W}\vec{f}_{t-1}\vec{x}_t^{\top}\right) = \operatorname{vec}\left(\mathrm{W}^{\top}\right)^{\top} \operatorname{vec}\left(\vec{f}_{t-1}\vec{x}_t^{\top}\right). Example, with two assets and three features:
      \begin{align*}
      \left[\begin{array}{c}{x_1}\\{x_2}\end{array}\right]^{\top} \left( \left[\begin{array}{ccc}{w_{11}}&{w_{12}}&{w_{13}}\\{w_{21}}&{w_{22}}&{w_{23}}\end{array}\right] \left[\begin{array}{c}{f_1}\\{f_2}\\{f_3}\end{array}\right] \right) &= x_1f_1 w_{11} + x_1f_2 w_{12} + x_1f_3 w_{13} + x_2f_1 w_{21} + x_2f_2 w_{22} + x_2f_3 w_{23}\\
      &= \left[\begin{array}{c} {w_{11}}\\ {w_{12}}\\ {w_{13}}\\ {w_{21}}\\ {w_{22}}\\ {w_{23}} \end{array}\right]^{\top} \left[\begin{array}{c} {x_1f_1}\\{x_1f_2}\\{x_1f_3}\\{x_2f_1}\\{x_2f_2}\\{x_2f_3} \end{array}\right].
      \end{align*}
    • So pretend that the returns are \operatorname{vec}\left(\vec{f}_{t-1}\vec{x}_t^{\top}\right) and use unconditional methods.
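    • Very flexible and simple to use; it is easy to force some \mathrm{W}_{ij}=0 by dropping the corresponding flattened columns. The variance of the portfolio returns depends on \vec{f}_{t-1}.
  • Conditional Expectation Model: yields another connection to classical statistics, via the Multivariate General Linear Hypothesis (MGLH). (Pav (2021))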
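A base-R check of the flattening identity on random numbers, plus a hypothetical helper `flatten_returns` that builds the flattened "returns" for a whole sample. R stores matrices column-major, so `as.vector()` acts as the vec operator. The column ordering here (grouped by asset) differs from the feature-grouped ordering in the example that follows; any fixed ordering works if used consistently.

```r
# Check: x' (W f) = vec(W')' vec(f x')
set.seed(3)
x <- rnorm(2)                  # asset returns at time t (two assets)
f <- rnorm(3)                  # features known at time t-1 (three features)
W <- matrix(rnorm(6), 2, 3)    # 2 x 3 conditional portfolio coefficients

lhs <- drop(crossprod(x, W %*% f))
rhs <- sum(as.vector(t(W)) * as.vector(outer(f, x)))  # outer(f, x) is f x'
print(c(lhs = lhs, rhs = rhs))

# Hypothetical helper: row t of the result is vec(f_{t-1} x_t').
# Xret: T x p returns; Feat: T x q features, already lagged so row t holds f_{t-1}.
flatten_returns <- function(Xret, Feat) {
  t(sapply(seq_len(nrow(Xret)), function(i) as.vector(outer(Feat[i, ], Xret[i, ]))))
}
```

Unconditional tools (Sharpe of the Markowitz portfolio, T^2, weight standard errors) can then be applied to the output of `flatten_returns` as if it were a matrix of asset returns.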

Conditional Portfolios, Flattening Example

  • Flattening on Fama French 4 factor returns, features: rescaled 6 mo. momentum, 12 mo. volatility of Mkt.
rets <- mff4[,c('Mkt','HML','SMB','UMD')]
library(fromo)
momentum <- 0.1*dplyr::lag(fromo::running_mean(rets[,'Mkt'],window=6),1)   # don't forget the lag!!
vola <- log(dplyr::lag(fromo::running_sd(rets[,'Mkt'],window=12),1))
vola <- vola / median(vola,na.rm=TRUE)
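# scale each asset's returns by each feature; drop the first two rows, where the lagged features are not yet available
flattened <- cbind(setNames(rets,paste0('intercept_',colnames(rets))),
                   setNames(momentum*rets,paste0('momentum_',colnames(rets))),
                   setNames(as.numeric(vola)*rets,paste0('vola_',colnames(rets))))[-c(1,2),] 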
print(SharpeR::as.sropt(flattened))
       SR/sqrt(yr) SRIC/sqrt(yr) 2.5 % 97.5 % T^2 value Pr(>T^2)    
Sharpe         1.3           1.2   1.0    1.4       153   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Can also construct \operatorname{vec}\left(\hat{W}_*\right) and perform inference:
mp <- MarkowitzR::mp_vcov(flattened)
# tabulate each flattened weight with its standard error
mpdf <- tibble::tibble(asset=rownames(mp$W),weight=as.numeric(mp$W),se=sqrt(diag(mp$What))) 

Hedging

Use case: Suppose you think an asset’s returns cannot be forecast, or you want no exposure to some assets.

  • Ideally, hedging would produce returns that are statistically independent of the hedged asset.
  • Correlation is easier to work with, so instead produce returns which are uncorrelated with those of the hedged asset.
  • Sometimes this means you have to invest in the hedged asset; you cannot hedge intangibles (e.g. sunspots).

Signal-noise optimization of portfolio with hedge constraint: \max_{\vec{w} : \mathrm{G}\Sigma\vec{w} = \vec{0}} \frac{\vec{w}^{\top}\vec{\mu}}{\sqrt{\vec{w}^{\top}\Sigma\vec{w}}}.

Rows of \mathrm{G} are “hedged out”. For any c>0, this problem is solved by c\left(\vec{w}_{*,\mathrm{I}} - \vec{w}_{*,\mathrm{G}}\right), where \mathrm{I} is the identity and, for a matrix \mathrm{A}, we define the “projected Markowitz” portfolio \vec{w}_{*,\mathrm{A}} = \mathrm{A}^{\top}\left(\mathrm{A}\Sigma\mathrm{A}^{\top}\right)^{-1}\mathrm{A}\vec{\mu}.

  • This portfolio has squared signal-noise ratio \Delta = \zeta^2_{*,\mathrm{I}} - \zeta^2_{*,\mathrm{G}}, where \zeta^2_{*,\mathrm{A}} = \vec{\mu}^{\top}\mathrm{A}^{\top}\left(\mathrm{A}\Sigma\mathrm{A}^{\top}\right)^{-1}\mathrm{A}\vec{\mu}. “Projected squared SNR”.

  • If \Delta = 0, then all of the SNR of the assets is captured by the rows of \mathrm{G}. (Kan and Zhou (2012))
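A base-R check of these formulas with known (assumed) parameters, hedging out the last of four assets. It confirms numerically that the projected Markowitz portfolio satisfies the hedge constraint \mathrm{G}\Sigma\vec{w} = \vec{0} and that its squared SNR equals \Delta.

```r
# Projected Markowitz portfolio w_{*,A} = A' (A Sigma A')^{-1} A mu
proj_markowitz <- function(A, mu, Sigma) {
  drop(t(A) %*% solve(A %*% Sigma %*% t(A), A %*% mu))
}
# Projected squared SNR zeta^2_{*,A} = mu' A' (A Sigma A')^{-1} A mu
proj_zeta2 <- function(A, mu, Sigma) drop(crossprod(mu, proj_markowitz(A, mu, Sigma)))

mu <- c(0.006, 0.003, 0.002, 0.005)          # assumed monthly means
Sigma <- 0.04^2 * (0.8 * diag(4) + 0.2)      # assumed covariance: equal vol, corr 0.2
I4 <- diag(4)
G <- I4[4, , drop = FALSE]                   # 1 x 4: hedge out asset 4

w <- proj_markowitz(I4, mu, Sigma) - proj_markowitz(G, mu, Sigma)
Delta <- proj_zeta2(I4, mu, Sigma) - proj_zeta2(G, mu, Sigma)
hedge_check <- drop(G %*% Sigma %*% w)       # covariance with the hedged asset
snr2 <- drop(crossprod(w, mu))^2 / drop(t(w) %*% Sigma %*% w)
print(c(hedge_check = hedge_check, snr2 = snr2, Delta = Delta))
```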

Hedging Example

  • Hedge out Momentum (UMD) from Fama French 4 factor returns using SharpeR::as.del_sropt:
rets <- mff4[,c('Mkt','HML','SMB','UMD')]
# first the unhedged:
print(SharpeR::as.sropt(rets))
       SR/sqrt(yr) SRIC/sqrt(yr) 2.5 % 97.5 % T^2 value Pr(>T^2)    
Sharpe        1.07          1.04  0.84   1.26       107   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# now hedge against UMD:
G <- diag(ncol(rets))[colnames(rets)=='UMD',,drop=FALSE]   # 1 x 4 matrix selecting UMD
print(SharpeR::as.del_sropt(rets,G=G))
       SR/sqrt(yr) 2.5 % 97.5 % F value Pr(>F)    
Sharpe        0.95  0.73    1.2      28 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Hedge out static positions from the flattening example:
# flattened data defined above
print(SharpeR::as.sropt(flattened))
       SR/sqrt(yr) SRIC/sqrt(yr) 2.5 % 97.5 % T^2 value Pr(>T^2)    
Sharpe         1.3           1.2   1.0    1.4       153   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# now hedge against intercept features:
G <- diag(ncol(flattened))[grepl('^intercept_',colnames(flattened)),]
print(dim(G))
[1]  4 12
print(SharpeR::as.del_sropt(flattened,G=G))
       SR/sqrt(yr) 2.5 % 97.5 % F value  Pr(>F)    
Sharpe         0.7  0.41   0.86     5.3 1.7e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Inference on Achieved Sharpe

Recall that inference on \zeta^2_{*} may not be sufficient for success because of mis-estimation of \vec{w}_*.

Two tools:

  • Sharpe Ratio Information Criterion (SRIC) is an approximately unbiased estimator of the SNR of \hat{w}_*, defined as SRIC = \hat{\zeta}_* - \frac{k-1}{n\hat{\zeta}_*}, where k is the number of assets and n the number of observations. (Paulsen and Söhl (2020))
    Caution: estimating this via cross-validation can be biased!

  • There are similarly defined approximate confidence bounds for the SNR of \hat{w}_* of the form b_{\alpha} = \hat{\zeta}_* - \frac{f\left(k,\alpha;...\right)}{n\hat{\zeta}_*}. The function f\left(k,\alpha;\cdot\right) depends on the unknown {\zeta}_* but can be estimated from \hat{\zeta}_*. (Pav (2020))
    The probability that the SNR of \hat{w}_* is below b_{\alpha} is approximately \alpha.
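The SRIC formula is simple enough to compute by hand. In this sketch `sric_by_hand` is a hypothetical helper, not SharpeR's `sric`; the point it illustrates is that \hat{\zeta}_* and n must be in matching units, and that annualizing before or after the correction gives the same answer. The input values are arbitrary.

```r
# SRIC = zeta_hat - (k - 1) / (n * zeta_hat), with zeta_hat and n in matching units
# (per-period Sharpe with n in periods, or annualized Sharpe with n in years).
sric_by_hand <- function(zeta_hat, k, n) zeta_hat - (k - 1) / (n * zeta_hat)

sric_by_hand(1, k = 4, n = 100)   # 0.97

# Units check: correct in monthly units then annualize, or annualize first.
z_month <- 0.3; n_month <- 1128; k <- 4
a <- sric_by_hand(z_month, k, n_month) * sqrt(12)
b <- sric_by_hand(z_month * sqrt(12), k, n_month / 12)
print(c(a = a, b = b))
```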

Inference on Achieved Sharpe Example

  • SRIC is available via SharpeR::sric (also in the print method):
rets <- mff4[,c('Mkt','HML','SMB','UMD')]
print(SharpeR::as.sropt(rets))
       SR/sqrt(yr) SRIC/sqrt(yr) 2.5 % 97.5 % T^2 value Pr(>T^2)    
Sharpe        1.07          1.04  0.84   1.26       107   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(sric(SharpeR::as.sropt(rets)))
      [,1]
[1,] 1.037
  • Confidence bounds on achieved SNR via SharpeR::asnr_confint:
rets <- mff4[,c('Mkt','HML','SMB','UMD')]
zs <- (SharpeR::as.sropt(rets))
print(asnr_confint(zs,level.lo=0.025,level.hi=0.975))  # currently in dev branch!
      2.5 % 97.5 %
[1,] 0.7954  1.203
  • Can also compute bounds on hedged portfolios (cannot compute SRIC in this case yet):
print(as.del_sropt(flattened,G=G))
       SR/sqrt(yr) 2.5 % 97.5 % F value  Pr(>F)    
Sharpe         0.7  0.41   0.86     5.3 1.7e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(asnr_confint(as.del_sropt(flattened,G=G)))   # currently in dev branch!
        5 % 100 %
[1,] 0.3809   Inf

Takeaways

  • Think in terms of the three problems.
  • Use connections to classical statistics.
  • Make unconditional portfolios conditional via flattening.
  • Perform inference on SNR of \hat{w}_*.
  • Use SharpeR and MarkowitzR.
  • Learn more about the Sharpe ratio: buy my book (Pav (2021)), or download “A Short Sharpe Course” (Pav (2017)).

References

Bao, Yong. 2009. “Estimation Risk-Adjusted Sharpe Ratio and Fund Performance Ranking Under a General Return Distribution.” Journal of Financial Econometrics 7 (2): 152–73. https://doi.org/10.1093/jjfinec/nbn022.
Brandt, Michael W., and Pedro Santa-Clara. 2006. “Dynamic Portfolio Selection by Augmenting the Asset Space.” The Journal of Finance 61 (5): 2187–217. https://doi.org/10.1111/j.1540-6261.2006.01055.x.
Britten-Jones, Mark. 1999. “The Sampling Error in Estimates of Mean-Variance Efficient Portfolio Weights.” The Journal of Finance 54 (2): 655–71. https://doi.org/10.1111/0022-1082.00120.
Hall, Peter, and Qiying Wang. 2004. “Exact Convergence Rate and Leading Term in Central Limit Theorem for Student’s t Statistic.” https://doi.org/10.1214/009117904000000252.
Johnson, N. L., and B. L. Welch. 1940. “Applications of the Non-Central t-Distribution.” Biometrika 31 (3-4): 362–89. https://doi.org/10.1093/biomet/31.3-4.362.
Kan, Raymond, and Guofu Zhou. 2012. “Tests of Mean-Variance Spanning.” Annals of Economics and Finance 13 (1). https://doi.org/10.2139/ssrn.231522.
Mertens, Elmar. 2002. “Comments on Variance of the IID Estimator in Lo (2002).” Working Paper University of Basel, Wirtschaftswissenschaftliches Zentrum, Department of Finance. http://www.elmarmertens.com/research/discussion/soprano01.pdf.
Michaud, Richard O. 1989. “The Markowitz Optimization Enigma: Is ‘Optimized’ Optimal?” Financial Analysts Journal, 31–42. http://newfrontieradvisors.com/Research/Articles/documents/markowitz-optimization-enigma-010189.pdf.
Paulsen, Dirk, and Jakob Söhl. 2020. “Noise Fit, Estimation Error and a Sharpe Information Criterion.” Quantitative Finance 0 (0): 1–17. https://doi.org/10.1080/14697688.2020.1718746.
Pav, Steven E. 2014-2019. SharpeR: Statistical Significance of Sharpe Ratio. https://github.com/shabbychef/SharpeR.
———. 2013. “Asymptotic Distribution of the Markowitz Portfolio.” Privately Published. http://arxiv.org/abs/1312.0557.
———. 2017. “A Short Sharpe Course.” Privately Published. https://doi.org/10.2139/ssrn.3036276.
———. 2018. MarkowitzR: Statistical Significance of the Markowitz Portfolio. https://github.com/shabbychef/MarkowitzR.
———. 2020. “Inference on Achieved Signal Noise Ratio.” http://arxiv.org/abs/2005.06171.
———. 2021. The Sharpe Ratio: Statistics and Applications. CRC Press.
Roy, A. D. 1952. “Safety First and the Holding of Assets.” Econometrica 20 (3): 431–49. https://doi.org/10.2307/1907413.
Sharpe, William F. 1965. “Mutual Fund Performance.” Journal of Business 39: 119. https://doi.org/10.1086/294846.