# Introduction

Monte Carlo simulations were first formally designed in the 1940s during the development of nuclear weapons, and have since been used heavily in various fields to harness randomness to solve problems that are potentially deterministic in nature.

In finance, Monte Carlo simulations can be a useful tool to give a sense of how assets with certain characteristics might behave in the future.

There are more complex and sophisticated financial forecasting methods, such as ARIMA (Auto-Regressive Integrated Moving Average) and GARCH (Generalised Auto-Regressive Conditional Heteroskedasticity), which attempt to model not only the randomness but also underlying factors such as seasonality and volatility clustering. Even so, Monte Carlo random walks work surprisingly well at illustrating market volatility, as long as the results are not taken too seriously.

There are two key assumptions made in this exercise:

1.  The future returns cannot be predicted and are therefore effectively random

2.  Following the Central Limit Theorem, the distribution of returns over the long term approximates a normal distribution
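As a quick illustration of the second assumption, here is a small sketch of my own (not part of the original analysis): even if daily returns are drawn from a decidedly non-normal uniform distribution, their sums over many days start to look like a bell curve.

```r
# Illustration of the Central Limit Theorem: sums of uniformly
# distributed daily returns take on a roughly normal shape
set.seed(1)  # fix the seed so the simulation is reproducible
daily <- matrix(runif(10000 * 30, min = -0.05, max = 0.05), ncol = 30)
monthly <- rowSums(daily)  # 10,000 simulated 30-day total returns
hist(monthly)              # roughly bell-shaped despite uniform inputs
```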

# Normal Distribution of Returns

First, let’s explore the distribution of 10,000 random returns with the following characteristics:

•  On average, the return is 0%

•  The standard deviation of returns is 10%

We can plot such a distribution in R using the following line of code:

```r
hist(rnorm(10000, 0, 0.1))
```

The `rnorm` function generates normally distributed random numbers and takes three arguments: the number of values to generate, the mean, and the standard deviation, in that order. The `hist` function plots a histogram.
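As a quick sanity check (a small sketch of my own, with a fixed seed for reproducibility), we can confirm that the generated numbers exhibit the requested mean and standard deviation:

```r
# Verify that rnorm honours the requested mean and standard deviation
set.seed(42)  # make the random draw reproducible
returns <- rnorm(10000, 0, 0.1)
mean(returns)  # close to 0
sd(returns)    # close to 0.1
```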

This gives us:

Being a normal distribution, it exhibits the following characteristics:

•  ~68% of observations fall within 1 standard deviation of the mean

•  ~95% of observations fall within 2 standard deviations of the mean

•  ~99.7% of observations fall within 3 standard deviations of the mean
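These proportions can be checked empirically against a simulated sample (a quick sketch of my own):

```r
# Empirical check of the 68-95-99.7 rule on 10,000 simulated returns
set.seed(42)
r <- rnorm(10000, 0, 0.1)
mean(abs(r) <= 0.1)  # ~0.68  (within 1 standard deviation)
mean(abs(r) <= 0.2)  # ~0.95  (within 2 standard deviations)
mean(abs(r) <= 0.3)  # ~0.997 (within 3 standard deviations)
```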

Here we are using 0 as the mean, representing a view that the expected return for the future is 0%. We will revisit this later with a different view, but zero expected return is a reasonable, risk-averse starting point.

```r
plot(
  cumprod(c(1000, rnorm(100, 1.0, 0.1))),
  type = "l", col = "red",
  main = "Random Walk: $1,000 over 100 Days",
  xlab = "Days", ylab = "Value"
)
```

This plot shows how the value of $1,000 invested in an asset with these characteristics (0% expected return, 10% volatility) might have played out in one randomly selected possible future. In this particular alternate reality, we've lost quite a bit of money!

You might have noticed that this R code is significantly longer than the previous one-liner, but most of it is optional code used to "prettify" the plot.

A relatively simple `plot(cumprod(c(1000, rnorm(100, 1.0, 0.1))), type = "l")` is sufficient to produce a basic, unadorned plot, but I'm a sucker for pretty plots and couldn't resist.

The `cumprod` function calculates the cumulative product of the vector passed to it. For example, `cumprod(c(1, 2, 3))` returns the vector `c(1, 2, 6)`, i.e. $(1,\ 1 \times 2,\ 1 \times 2 \times 3)$. Notice that I've changed the second argument of `rnorm` from 0 to 1; this is so that the resulting vector of random numbers will be centred around 1 instead of 0.

While the previous `rnorm` call returned percentage returns (e.g. 1%, -3%, etc.), this one returns the day's value multiplier implied by those returns (e.g. 101%, 97%). So when `cumprod` is applied to the result, it compounds the value day by day, multiplying each previous day's value by that day's return.
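To make the compounding concrete, here is a tiny worked example of my own with hand-picked returns (written in the `1 + returns` form, which is equivalent to drawing the multipliers directly):

```r
# Compound $1,000 through three daily returns of +1%, -3% and +2%
returns <- c(0.01, -0.03, 0.02)
cumprod(c(1000, 1 + returns))
# day-by-day values: 1000, 1010, 979.7, 999.294
```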

The `c(1000,` before the `rnorm` prepends 1,000 to the vector, so the cumulative product starts from the initial capital of $1,000.

# Simulating Multiple Random Walks

Keeping in mind that the random walk above represents merely one out of countless possible futures, let's try simulating a larger number of them.

First, we will need to load a couple of libraries. The plotting commands used so far (`hist` and `plot`) are part of the base R distribution, but they are limited in flexibility. The following code uses the more powerful ggplot2 library for plotting, and dplyr for data wrangling. Both of these are part of the tidyverse package. I'm also loading tidyquant, using it only for aesthetics here, but it is a very useful library for financial data analysis which I will probably cover in a separate post.

```r
# Load required libraries and install them if missing
if (!require(tidyverse)) install.packages("tidyverse")
if (!require(tidyquant)) install.packages("tidyquant")
```

Next, we define a function for simulating these random walks:

```r
# Function for simulating random walks
func_random_walk <- function(
  # Mean percentage return
  mean = 0.0,
  # Standard deviation of returns
  sd = 0.10,
  # Number of random possibilities
  runs = 100,
  # Number of days
  days = 100,
  # Capital
  capital = 1000
) {
  if (!require(tidyverse)) install.packages("tidyverse")

  # Build data frame of first random walk
  x <- seq(1, days, 1)
  y <- rnorm(days, mean, sd)
  df <- data.frame(run = 1, x = x, y = y)

  # Add on additional random walks
  for (i in 2:runs) {
    x <- seq(1, days, 1)
    y <- rnorm(days, mean, sd)
    tmp <- data.frame(run = i, x = x, y = y)
    df <- rbind(df, tmp)
  }

  # Convert data frame to tibble format
  df <- dplyr::as_tibble(
    df %>%
      # Group each random walk so cumprod is applied to each one separately
      dplyr::group_by(run) %>%
      # Calculate cumulative product of each random walk
      dplyr::mutate(cum_y = capital * cumprod(1 + y)) %>%
      # Remove grouping to improve future performance
      dplyr::ungroup()
  )

  # Attach descriptive attributes to tibble
  attr(df, "mean") <- mean
  attr(df, "sd") <- sd
  attr(df, "runs") <- runs
  attr(df, "days") <- days
  attr(df, "capital") <- capital

  # Return final result as function output
  return(df)
}
```

Defining a custom function like this wasn't strictly necessary, but I prefer doing it this way so that it becomes much easier to run multiple random walks with various different characteristics later.

Let's use that function to run one simulation of 100 possible futures:

```r
# Generate random walks
df <- func_random_walk(
  mean = 0.00,
  sd = 0.10,
  runs = 100,
  days = 100
)
```

This gives us a data frame with the following fields (first 6 rows shown as a sample):

```
> head(df)
# A tibble: 6 x 4
    run     x       y cum_y
  <dbl> <dbl>   <dbl> <dbl>
1     1     1  0.0293 1029.
2     1     2 -0.0507  977.
3     1     3  0.0216  998.
4     1     4  0.0416 1040.
5     1     5 -0.0392  999.
6     1     6  0.0159 1015.
```

The `run` field indicates which alternate future the data is simulating, while `x` is the day number within each run. `y` is the percentage return for each day, while `cum_y` is the resulting value of the asset on that day.

Next, we will pre-define a set of ggplot layers so that they can be applied in common to subsequent plots:

```r
# Function for obtaining ggplot layers to commonly apply to subsequent plots
get_gg_layers <- function(df) {
  gg_layers <- list(
    ggplot2::geom_hline(
      yintercept = attr(df, "capital"),
      color = "black", linetype = "dotted"
    ),
    ggplot2::geom_hline(
      yintercept = max(subset(df, x == max(x))$cum_y),
      color = "steelblue", linetype = "dashed"
    ),
    ggplot2::geom_hline(
      yintercept = min(subset(df, x == max(x))$cum_y),
      color = "firebrick", linetype = "dashed"
    ),
    ggplot2::annotate(
      geom = "label",
      x = max(df$x),
      y = max(subset(df, x == max(x))$cum_y),
      label = prettyNum(round(max(subset(df, x == max(x))$cum_y), 0), big.mark = ","),
      size = 3, hjust = 1, color = "white", fill = "steelblue"
    ),
    ggplot2::annotate(
      geom = "label",
      x = max(df$x),
      y = min(subset(df, x == max(x))$cum_y),
      label = prettyNum(round(min(subset(df, x == max(x))$cum_y), 0), big.mark = ","),
      size = 3, hjust = 1, color = "white", fill = "firebrick"
    ),
    ggplot2::labs(
      title = "Random Walk Simulation",
      subtitle = paste(
        attr(df, "runs"), "random walks with",
        paste0("$", prettyNum(attr(df, "capital"), big.mark = ",")),
        "and", paste0(round(attr(df, "sd") * 100, 0), "%"),
        "volatility"
      ),
      x = "Days into future", y = "Value"
    ),
    tidyquant::scale_colour_tq(),
    tidyquant::scale_fill_tq(),
    tidyquant::theme_tq(),
    ggplot2::theme(legend.position = "none")
  )
  return(gg_layers)
}
```
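The guide-line positions in `get_gg_layers()` rely on one small trick worth unpacking: `subset(df, x == max(x))` keeps only the final day of every run, so the max/min of its `cum_y` are the best and worst ending values. A toy example (data of my own invention) shows the idea:

```r
# subset(df, x == max(x)) keeps only the last day of each run;
# max()/min() of cum_y then give the best/worst final values
toy <- data.frame(
  run   = c(1, 1, 2, 2),
  x     = c(1, 2, 1, 2),
  cum_y = c(1010, 1050, 990, 940)
)
max(subset(toy, x == max(x))$cum_y)  # 1050
min(subset(toy, x == max(x))$cum_y)  # 940
```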

Again, none of these lines are strictly necessary; they are there to prettify the plot. Is all this overhead worth it? I'll show you in a bit.

Finally, it's time to see the result of the 100 random walks. Now that the prettifying layers are pre-defined, applying them is as simple as adding `+ get_gg_layers(df)` at the end of the plotting commands:

```r
# Plot each individual run as a separate line
df %>%
  ggplot2::ggplot(ggplot2::aes(
    x = x, y = cum_y,
    color = factor(run), group = factor(run)
  )) +
  ggplot2::geom_line(alpha = 0.8) +
  get_gg_layers(df)
```

This gives us the following plot:

Compare this with a bare-minimum version of the exact same plot:

And you can decide for yourself if the additional pre-defined gg_layers are worth it.

Coming back to the topic, notice how differently each alternate future can play out. Keep in mind that each of these 100 runs has exactly the same expected return (0%) and standard deviation (10%). In the most fortunate future, the original $1,000 increased in value to more than $8,000 before settling down to $4,809 at the end of 100 days. But in the least fortunate future, the $1,000 is down to a mere $69. You can also observe how the vast majority of possible futures converge quite closely around the mean (expected return).

Now, while this mess of squiggly lines is effective at showing how chaotically random the alternate futures might be, it is difficult to appreciate how widely the potential returns diverge over time. For this purpose, we will simplify the figure by plotting only the maximum and minimum values for each day, and removing all the lines in between.

```r
# Plot range between minimum and maximum values
df %>%
  dplyr::group_by(x) %>%
  dplyr::summarise(
    min_y = min(cum_y),
    max_y = max(cum_y)
  ) %>%
  ggplot2::ggplot(
    ggplot2::aes(x = x)
  ) +
  ggplot2::geom_line(ggplot2::aes(y = max_y), color = "steelblue") +
  ggplot2::geom_line(ggplot2::aes(y = min_y), color = "firebrick") +
  ggplot2::geom_ribbon(ggplot2::aes(ymin = min_y, ymax = max_y), alpha = 0.2) +
  get_gg_layers(df)
```

Notice how the same set of graphical options (plot title, axis titles, guiding min/max lines and labels) can be applied to this plot simply by adding `+ get_gg_layers(df)` again. Let's see what this gives us:

As you can see, the range of possible values over the 100-day period is much more visible in this version, although it abstracts away the details. One visualisation is not inherently better than the other; each has its own strengths and weaknesses depending on the story you need to tell with the plot.

# Volatility of Returns

So far, we've plotted random walks with 10% volatility. But what does that mean?
Since we are assuming the returns to be normally distributed, the volatility of returns is the standard deviation around the expected return (0% in our current assumption). So if we are simulating random walks with 10% volatility, it means that the actual returns will tend to be within 10% of the expected return ~68% of the time. It also means that the returns will be within twice that range (20%) ~95% of the time, and thrice that range (30%) ~99.7% of the time.

But take a moment to think about what the opposite of these statements implies: even though the expected return is 0%, the actual return can vary by more than 30% with a 0.3% probability. This is in line with what we observed in the previous random walks.

Let's see what happens at different levels of volatility. Notice how multiple sets of random walks can now be conveniently generated by calling the previously defined `func_random_walk()` function.

```r
# Random walks for volatility range 1-15%
df1 <- func_random_walk(mean = 0.0, sd = 0.01, runs = 100, days = 100)
df2 <- func_random_walk(mean = 0.0, sd = 0.05, runs = 100, days = 100)
df3 <- func_random_walk(mean = 0.0, sd = 0.10, runs = 100, days = 100)
df4 <- func_random_walk(mean = 0.0, sd = 0.15, runs = 100, days = 100)

# Merge data frames into one
df_merged <- dplyr::bind_rows(
  df1 %>% mutate(ver = "A) Vol 1%"),
  df2 %>% mutate(ver = "B) Vol 5%"),
  df3 %>% mutate(ver = "C) Vol 10%"),
  df4 %>% mutate(ver = "D) Vol 15%")
)

# Plot each run, faceted by volatility level
df_merged %>%
  ggplot2::ggplot(ggplot2::aes(
    x = x, y = cum_y,
    color = factor(run), group = factor(run)
  )) +
  ggplot2::geom_line(alpha = 0.8) +
  ggplot2::labs(title = "", x = "", y = "") +
  ggplot2::facet_wrap(~ver, scales = "free") +
  ggplot2::scale_y_continuous(labels = scales::comma) +
  tidyquant::scale_colour_tq() +
  tidyquant::theme_tq() +
  ggplot2::theme(legend.position = "none")
```

These graphs make you truly appreciate the axiom of "high risk, high returns".
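The 0.3% tail probability quoted above can be verified directly from the normal distribution (a quick check of my own using `pnorm`):

```r
# Probability that a single day's return falls outside 3 standard
# deviations (i.e. beyond a 30% move at 10% daily volatility)
2 * pnorm(-0.30, mean = 0, sd = 0.10)
# ~0.0027, i.e. roughly 0.3%
```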
While I'm sure the immediately apparent observation from these plots is the phenomenally high return that high volatility can produce in best-case scenarios (e.g. a tenfold increase in capital within 100 days at 15% volatility), let's take a closer look at the worst-case scenarios too.

```
> min(subset(df1, x == 100)$cum_y)
[1] 792.5748
> min(subset(df2, x == 100)$cum_y)
[1] 239.96
> min(subset(df3, x == 100)$cum_y)
[1] 70.06201
> min(subset(df4, x == 100)$cum_y)
[1] 9.789624
```

At 15% volatility, the $1,000 capital can almost completely evaporate (down to $9.79) within 100 days in a worst-case scenario. On the other hand, at 1% volatility the best and worst case scenarios are ~$1,300 and ~$800 respectively.

# Conclusion

As I mentioned at the start of this post, random walks based on the mean and standard deviation of returns are an unsophisticated and naive method of forecasting the future.

But as the statistician George E. P. Box famously said,

> All models are wrong, but some are useful.

While such a simple method can be dangerous if trusted to make accurate predictions about the future, it is nevertheless a very useful tool to instill an intuitive appreciation of what different volatility levels can feel like.

With this I conclude my post. It ended up being much longer than I originally intended, but hopefully it has been helpful.