Introduction

In a previous post (Random Walks in the Stock Market with R), we have explored how volatility greatly affects the range of possible returns in the future and therefore is widely accepted as a strong indictor of risk in the financial market.

Using this concept of risk, let us analyse the historical risk vs. return profiles of several large-cap stocks. The following table shows the top 10 stocks by market cap currently trading in the US market, taken from Yahoo Finance as of 26th April 2020.

Symbol Name Market Cap (USD)
MSFT Microsoft Corporation 1.3T
AAPL Apple Inc. 1.2T
BABA Alibaba Group Holding Limited 563B
FB Facebook, Inc. 541B
BRK-B Berkshire Hathaway Inc. 453B
JNJ Johnson & Johnson 408B
V Visa Inc. 370B
JPM JPMorgan Chase & Co. 276B
PG The Proctor & Gamble Company 294B
TSM Taiwan Semiconductor Manufacturing Company Limited 254B

Analysis in R

As usual, let's set up the environment in R by loading required libraries.

# Load required packages, and install missing ones
if (!require(tidyverse)) install.packages("tidyverse")
if (!require(tidyquant)) install.packages("tidyquant")
if (!require(slider))    install.packages("slider")
if (!require(ggrepel))   install.packages("ggrepel")

# Set default ggplot2 theme
ggplot2::theme_set(tidyquant::theme_tq())

Then we define a vector of symbols we want to extract prices for. Apart from the top 10 market cap stocks, let's also throw in a few others to serve as basis for comparison:

  • VTI (Vanguard Total Stock Market ETF): Proxy for the entire US stock market as a whole
  • IEF (iShares 7-10 Year Treasury Bond ETF): Proxy for long-term US Treasury bonds
  • GLD (SPDR Gold Trust): Proxy for Gold
  • USO (United States Oil Fund LP): Proxy for Oil
  • GBTC (Grayscale Bitcoin Trust): Proxy for Bitcoin
  • TSLA (Tesla Inc.): Popular stock (market cap 133B) with notoriously high volatility
# Define tickers to extract
tickers <- c(
    "MSFT", "AAPL", "BABA", "FB", "BRK-B",
    "JNJ", "V", "JPM", "PG", "TSM",
    "VTI", "IEF", "GLD", "USO",
    "TSLA", "GBTC"
)

Let's use the handy tq_get() function provided by the tidyquant library to extract these quotes (from Yahoo Finance by default). While there are many different ways to extract stock prices in R, tidyquant is one of my favorites as it is fully compliant with the tidy method.

# Extract prices from Yahoo Finance
prices <- tickers %>%
    tidyquant::tq_get(
        get = "stock.prices",
        from = "2000-01-01"
    )

Next comes the heavy lifting part, where we calculate the daily returns and annualised volatility.

Daily returns are simply calculated as the percentile (or logarithmic) difference between each given day's closing price and the previous day's closing price: \[Returns = {{P_0 - P_{-1}} \over P_{-1}}\]

Volatility is calculated as the standard deviation of returns over a given period (in this case, 90 days), and then multiplied with \(\sqrt{252}\) to make it an annualised value. Why \(\sqrt{252}\)? It has to do with how volatility is proportional to the square root of time, and this website has a good explanation for the maths behind it.

As as for the number 252, it's simply the average number of trading days for the US market in a year. This is the value you'll get if you take the total number of days in a year (365) and subtract the number of Saturdays/Sundays (104) and public holidays (9).

# Calculate returns and volatility
tbl_analysis <- prices %>%
    dplyr::group_by(symbol) %>%
    # Calculate daily returns
    dplyr::mutate(
        return = (adjusted - dplyr::lag(adjusted)) / dplyr::lag(adjusted)
    ) %>%
    # Calculate annualised 90-day volatility
    dplyr::mutate(
        vol_90 = slider::slide_dbl(
            return, sd,
            .before = 90, .complete = TRUE
        ) * sqrt(252)
    )

Why a 90-day volatility and not any other number of days (7, 30, 180, 365, etc.)? Well, there's no magic behind this particular number other that it's a reasonable time frame short enough to react to relatively recent changes in volatility, while at the same time being long enough to smoothen out extremely short-term spikes.

Risk vs. Return Analysis (2019)

Finally, let's filter the results for the period we want to look at (year of 2019), summarise the average daily returns and average 90-day annualised volatility for each symbol, and plot them.

# Plot returns vs. volatility
tbl_analysis %>%
    dplyr::filter(date >= "2019-01-01" & date < "2020-01-01") %>%
    dplyr::group_by(symbol) %>%
    dplyr::summarise(
        date_min = min(date),
        avg_ret = mean(return, na.rm = TRUE) * 100.0,
        avg_vol = mean(vol_90, na.rm = TRUE) * 100.0
    ) %>%
    ggplot2::ggplot(ggplot2::aes(
        x = avg_ret, y = avg_vol,
        colour = symbol, label = symbol
    )) +
    ggplot2::geom_vline(xintercept = 0, color = "firebrick", linetype = "dotted") +
    ggplot2::geom_point() +
    ggrepel::geom_label_repel() +
    ggplot2::labs(
        title = "Analysis: Returns vs. Volatility",
        subtitle = "Over Period: Jan - Dec 2019",
        x = "Average Daily Returns (%)",
        y = "Average 90-Day Annualised Volatility (%)"
    ) +
    ggplot2::theme(
        legend.position = "none"
    )

As expected, US Treasury Bonds (IEF) shows up at the bottom left indicating low-risk and low-return, while Bitcoin (GBTC) breaks away from all the rest at the upper right indicating high-risk high-return.

Gold (GLD) has better returns than Bonds but also higher risk. Equity market as a whole (VTI) has slightly better returns and higher risk than that.

The plot also offers some interesting insights, such as:

  • Berkshire Hathaway (BRK-B), Johnson & Johnson (JNJ) and Proctor & Gamble (PG) are in a similar risk band (~20% volatility) but PG performed much better than the rest, at least in 2019
  • Apple (AAPL) compared to Facebook (FB) had not only higher returns, but also slightly less risk

While this sort of analysis is useful, it is by definition backward-facing and therefore can be dangerous when taken to have predictive properties.

Risk vs. Return Analysis (Q1 2020)

To illustrate this point, let's see what happened in Q1 2020 with the most unprecedented market dislocation in history.

# Plot returns vs. volatility (Q1 2020)
tbl_analysis %>%
    dplyr::filter(date >= "2020-01-01" & date < "2020-04-01") %>%
    dplyr::group_by(symbol) %>%
    dplyr::summarise(
        date_min = min(date),
        avg_ret = mean(return, na.rm = TRUE) * 100.0,
        avg_vol = mean(vol_90, na.rm = TRUE) * 100.0
    ) %>%
    ggplot2::ggplot(ggplot2::aes(
        x = avg_ret, y = avg_vol,
        colour = symbol, label = symbol
    )) +
    ggplot2::geom_vline(xintercept = 0, color = "firebrick", linetype = "dotted") +
    ggplot2::geom_point() +
    ggrepel::geom_label_repel() +
    ggplot2::labs(
        title = "Analysis: Returns vs. Volatility",
        subtitle = "Over Period: Q1 2020",
        x = "Average Daily Returns (%)",
        y = "Average 90-Day Annualised Volatility (%)"
    ) +
    ggplot2::theme(
        legend.position = "none"
    )

Just look at how almost everything is negative, but more importantly how the 90-day volatility has also increased significantly.

Apple (AAPL) for example was an excellent investment for the whole of 2019 (in fact, for the past several years) with ~25% volatility and ~0.27% average returns. But with the advent of COVID-19 its volatility has increased to ~30% and average returns became negative at -0.2%. But interestingly, its relative position with Facebook (FB) remained relatively unchanged.

Tesla (TSLA) and Bitcoin (GBTC) are interesting cases of what might happen with high-volatility assets. While this particular market dislocation proved fortunate for Tesla, Bitcoin's average returns dwindled from its 2019 average of 0.5% to almost zero.

Volatility Over 20 Years

Another interesting analysis would be to observe how the volatility of these assets changed over time. Although Yahoo Finance data goes back to around the 1980's, many of the tickers we're using in this analysis did not exist that early. So let's filter for only the prices in the past 20-year period, from 2000-2020.

# Plot 90-day volatility over time
tbl_analysis %>%
    dplyr::filter(date >= "2000-01-01") %>%
    tidyr::drop_na() %>%
    ggplot2::ggplot(ggplot2::aes(x = date, y = vol_90, colour = symbol)) +
    ggplot2::geom_line(alpha = 0.5) +
    ggplot2::labs(
        title = "Analysis: Volatility Over Time",
        subtitle = "Over Period: Jan 2000 - Apr 2020",
        x = "",
        y = "Average 90-Day Annualised Volatility (%)"
    )

The most obvious observation has to be that volatility itself can be very volatile!

But if you look carefully at the spikes of volatility, they tend to be highly consistent with the financial crises in the past:

  • 2001: Dot-com Bubble Burst
  • 2007: Global Financial Crisis
  • 2020: COVID-19 Crisis

You can also observe that outside of crisis periods, the volatility of each asset tends to be much more stable. Even more interesting is the fact that the relative volatility of different assets tend to be generally stable.

US Treasury Bonds (IEF) in particular should stand out in how it kept an extremely low volatility even through the most turbulent markets. This is why even with extremely low expected returns, treasury bonds tend to form the reliable base of every mutual fund (and majority of hedge funds too) out there. It can be counted upon to provide a component of small but generally positive returns to the portfolio even in the worst of times.

And on the other extreme, we have Bitcoin as the rogue YOLO asset with crisis-level volatility as a matter of course.

Conclusion

While the observations in this exercise are by no means investment advice, I hope it has given you a better appreciation of how to view a particular asset's attractiveness tempered by its inherent (historical) risk.