Brownian Motion as a Limit of Random Walks

Published

March 09, 2026

Overview

The formal definition of standard Brownian motion — the one you’ll find in section 8.1 — is a continuous-time stochastic process \(\{B_t\}_{t \geq 0}\) satisfying:

  1. \(B_0 = 0\) a.s.
  2. Sample paths \(t \mapsto B_t\) are continuous a.s.
  3. Stationary increments: \(B_{s+h} - B_s \overset{d}{=} B_h\) for all \(s, h \geq 0\).
  4. Independent increments: for non-overlapping intervals \([s_1, t_1]\) and \([s_2, t_2]\), the increments \(B_{t_1} - B_{s_1}\) and \(B_{t_2} - B_{s_2}\) are independent.
  5. Gaussian marginals: \(B_t \sim N(0, t)\) for all \(t > 0\).
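
A useful consequence of properties (3)-(5), worth keeping in mind for the simulations below: for \(0 \leq s \leq t\), independence of increments gives

\[ \text{Cov}(B_s, B_t) = \text{Cov}\big(B_s,\ B_s + (B_t - B_s)\big) = \text{Var}(B_s) = s, \]

so in general \(\text{Cov}(B_s, B_t) = \min(s, t)\).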

Informally, however, one can think of Brownian motion as a random walk on a continuous state space in continuous time. In fact, one can construct a process satisfying (1)-(5) as a limit of a simple (discrete) random walk. The goal of this activity is to see the Brownian motion process emerge in exactly this way.

Work through this notebook section by section, run the code, and complete the exercises. To obtain a working copy of the notebook, find the “</> Code” button at the top of this page and copy the source code into a blank .qmd file.

library(tidyverse)

Limiting construction

The Simple Random Walk

A simple random walk can be expressed as a sum of increments \(S_n = \sum_{k=1}^n \xi_k\) where \(\xi_1, \xi_2, \ldots\) are i.i.d. uniform on \(\{-1, +1\}\) and \(S_0 = 0\).

To see this, first simulate a path sequentially, one step at a time:

# length of walk
n <- 50

# simulate sequentially
set.seed(1)
rw_seq <- numeric(n + 1)
rw_seq[1] <- 0
for (k in 1:n) {
  current <- rw_seq[k]
  increment <- sample(c(-1, 1), size = 1)   # draw increment at step k
  rw_seq[k + 1] <- current + increment
}

# plot the path
plot(0:n, rw_seq, type = 'l',
     xlab = 'n', ylab = expression(X[n]))

We can simulate the exact same path by drawing all the increments at once and computing a cumulative sum:

# draw increments
set.seed(1)
inc <- sample(c(-1, 1), size = n, replace = TRUE)

# cumulative sum
rw_csum <- c(0, cumsum(inc))

# plot the path
plot(0:n, rw_csum, type = 'l',
     xlab = 'n', ylab = expression(X[n]))

We’ll use the cumulative sum approach throughout. Here it is wrapped in a function:

rw_sim <- function(n, seed = NULL){
  if(!is.null(seed)){set.seed(seed)}
  inc <- sample(c(-1, 1), size = n, replace = TRUE)
  rw <- c(0, cumsum(inc))
  return(rw)
} 

Time Rescaling

A random walk with \(n\) steps lives on the integer time axis \(\{0, 1, \ldots, n\}\). Let’s rescale time to the unit interval via \(t = k/n\), so that a single “unit” of time always corresponds to \(n\) steps regardless of \(n\).

# rescale time axis to [0, 1]
n <- 50
rw_sim(n, seed = 1) %>% 
  plot((0:n)/n, ., type = 'l', 
       xlab = 't', ylab = expression(X[t]))

As \(n\) grows, the rescaled time steps \(1/n\) shrink and the path begins to look like a continuous function:

# try increasing n
n <- 500
rw_sim(n, seed = 1) %>% 
  plot((0:n)/n, ., type = 'l', 
       xlab = 't', ylab = expression(X[t]))

Run several realizations together to get a sense of the whole distribution of paths:

# Q: what happens to the mean and variance of Xt as t -> 1?
n <- 500
replicate(20, rw_sim(n)) %>%
  matplot((0:n)/n, ., col = 1, lty = 1, lwd = 0.2, type = 'l',
          xlab = 't', ylab = expression(X[t]))

If we take slices at fixed values of \(t\), we can inspect how the marginal distributions change with time:

# simulate many paths
n <- 500
sims <- replicate(1000, rw_sim(n))

# histograms (across paths) at several time slices
tvals <- c(0.1, 0.4, 0.7, 1)
ix <- floor(tvals*n) + 1
par(mfrow = c(2, 2))
for (j in seq_along(tvals)){
  hist(sims[ix[j], ], breaks = 15, xlim = c(-100, 100), 
       main = paste("t =", tvals[j]), xlab = expression(X[t]))
}
par(mfrow = c(1, 1))

Before proceeding, work through the following exercise.

Exercise

Note that the time-rescaled process is \(S_t = \sum_{k=1}^{\lfloor nt \rfloor} \xi_k\) where \(\xi_1, \xi_2, \ldots\) are i.i.d. uniform on \(\{-1, +1\}\) and \(S_0 = 0\).

a. Find the mean and variance of \(\xi_i\).

b. Use the CLT to find a Gaussian approximation for the distribution of \(S_t\).

c. Overlay the approximation on the histogram for the distribution of \(S_{0.5}\) from the simulated paths in sims below. (Note you need to run the whole cell for the curve to appear on the same plot.)

# marginal distribution at t = 0.5
hist(sims[n/2 + 1, ], freq = FALSE, main = 't = 0.5', xlab = expression(X[t]))

# adjust here
mu <- 0
sig <- 30
curve(dnorm(x, mean = mu, sd = sig), add = TRUE)

State Rescaling

Time rescaling alone does not produce a stable limit: as \(n\) increases, the paths spread out without bound. To see this, compare the spread of paths for a few values of \(n\):

n <- c(100, 500, 1000)
par(mfrow = c(1, 3))
for (j in seq_along(n)){
  replicate(100, rw_sim(n[j])) %>%
    matplot((0:n[j])/n[j], ., col = 1, lty = 1, lwd = 0.1, type = 'l',
            xlab = 't', ylab = expression(X[t]), ylim = c(-100, 100))
}
par(mfrow = c(1, 1))

Consider your CLT approximation from the exercise above. The code below compares the empirical CDFs of the rescaled state \(S_n / c_n\) under two values of \(n\). Find the right constant \(c_n\) so the CDFs overlap:

# GOAL: get the empirical CDFs to overlap
# HINT: try a function of n
n <- c(500, 5000)
const <- c(1, 1) # adjust here (should be a vector)
sims1 <- replicate(1000, rw_sim(n[1]))/const[1]
sims2 <- replicate(1000, rw_sim(n[2]))/const[2]
F1 <- ecdf(sims1[n[1] + 1, ])
F2 <- ecdf(sims2[n[2] + 1, ])
plot(F1, do.points = FALSE, verticals = TRUE, main = NULL)
lines(F2, do.points = FALSE, verticals = TRUE, col = 'blue')

General Step Distributions

The \(\pm 1\) increments are not special. Any sequence of i.i.d. increments with mean zero and finite variance leads to the same limiting process. This result is a functional generalization of the CLT known as Donsker’s theorem, which states that the normalized walk \[X_t^{(n)} = \frac{1}{\sqrt{n}} \sum_{k=1}^{\lfloor tn \rfloor} \xi_k\]

converges in distribution (as a process) to Brownian motion whenever \(E[\xi_k] = 0\) and \(\text{Var}(\xi_k) = 1\).

We’ll use a new simulation function that accepts an arbitrary increment generator and applies the \(1/\sqrt{n}\) normalization:

rw_sim_iid <- function(n, rstep, seed = NULL){
  if(!is.null(seed)){set.seed(seed)}
  inc <- rstep(n)
  rw <- c(0, cumsum(inc))/sqrt(n)
  return(rw)
}

Four increment distributions with mean 0 and variance 1:

# simple random walk from before
r_srw <- function(n) sample(c(-1, 1), size = n, replace = TRUE)

# some options with mean 0 and unit variance
r_uniform <- function(n) runif(n, min = -sqrt(3), max = sqrt(3))
r_poisson <- function(n) rpois(n, lambda = 1) - 1
r_exp     <- function(n) rexp(n, rate = 1) - 1
# Q: do the scaled paths look qualitatively similar?
n <- 5000
par(mfrow = c(2, 2))
replicate(50, rw_sim_iid(n, r_srw)) %>%
  matplot((0:n)/n, ., col = 1, lty = 1, lwd = 0.2, type = 'l', ylim = c(-4, 4),
          xlab = 't', ylab = expression(X[t]), main = "SRW")
replicate(50, rw_sim_iid(n, r_poisson)) %>%
  matplot((0:n)/n, ., col = 1, lty = 1, lwd = 0.2, type = 'l', ylim = c(-4, 4),
          xlab = 't', ylab = expression(X[t]), main = "Poisson steps")
replicate(50, rw_sim_iid(n, r_uniform)) %>%
  matplot((0:n)/n, ., col = 1, lty = 1, lwd = 0.2, type = 'l', ylim = c(-4, 4),
          xlab = 't', ylab = expression(X[t]), main = "Uniform steps")
replicate(50, rw_sim_iid(n, r_exp)) %>%
  matplot((0:n)/n, ., col = 1, lty = 1, lwd = 0.2, type = 'l', ylim = c(-4, 4),
          xlab = 't', ylab = expression(X[t]), main = "Exponential steps")
par(mfrow = c(1, 1))

The finite-variance condition is essential. The Cauchy distribution has no finite variance (nor even a finite mean), and the \(1/\sqrt{n}\) scaling fails to stabilize the paths:

# Q: why doesn't the 1/sqrt(n) scaling work for Cauchy steps?
r_cauchy <- function(n) rcauchy(n)
replicate(50, rw_sim_iid(n, r_cauchy)) %>%
  matplot((0:n)/n, ., col = 1, lty = 1, lwd = 0.2, type = 'l',
          xlab = 't', ylab = expression(X[t]), main = "Cauchy steps")
Note

The Cauchy distribution belongs to the stable family — the correct normalization for a Cauchy random walk is \(n^{-1}\) rather than \(n^{-1/2}\), and the limit is a Cauchy process, not Brownian motion.
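A quick sketch of this claim (the particular n values, replication count, and seed below are arbitrary choices): dividing an n-step Cauchy walk by \(n\) rather than \(\sqrt{n}\) yields endpoint distributions that do not depend on \(n\) at all.

```r
# sketch: with the stable 1/n normalization, the endpoint of an n-step
# Cauchy walk is itself standard Cauchy, whatever the value of n
set.seed(206)
n_vals <- c(100, 1000, 10000)
ends <- sapply(n_vals, function(n) replicate(2000, sum(rcauchy(n)) / n))

# each column's quartiles should match the standard Cauchy quartiles (-1, 0, 1)
apply(ends, 2, quantile, probs = c(0.25, 0.5, 0.75))
```

The agreement is exact in distribution, not just asymptotic: stability means a sum of \(n\) i.i.d. standard Cauchy variables divided by \(n\) is again standard Cauchy.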


Marginal Distributions

We can now verify the Gaussian marginals property directly. Fix a large \(n\), simulate many paths under each increment distribution, and extract the value at a fixed time \(t\):

# fix n large, simulate many paths, and extract the value at time t for each path
n <- 5000
R <- 3000
t_fixed <- 1 # tinker here
ix <- floor(t_fixed * n) + 1
set.seed(7341)
sims_srw  <- replicate(R, rw_sim_iid(n, r_srw)[ix])
sims_pois <- replicate(R, rw_sim_iid(n, r_poisson)[ix])
sims_unif <- replicate(R, rw_sim_iid(n, r_uniform)[ix])
sims_exp  <- replicate(R, rw_sim_iid(n, r_exp)[ix])

# arrange in list
step_names <- c("SRW", "Poisson", "Uniform", "Exponential")
step_sims  <- list(sims_srw, sims_pois, sims_unif, sims_exp)

The dashed curve in each panel below is the \(N(0, t)\) density predicted by the CLT:

# theoretical distribution
xgrid       <- seq(-5, 5, length.out = 300)
ref_density <- dnorm(xgrid, mean = 0, sd = sqrt(t_fixed))

# compare with histograms
par(mfrow = c(2, 2))
for (j in seq_along(step_sims)) {
  hist(step_sims[[j]], breaks = 40, freq = FALSE,
       xlim = c(-5, 5), ylim = c(0, 0.5),
       main = step_names[j], xlab = expression(X[t]))
  lines(xgrid, ref_density, lty = 2, lwd = 1.5)
}
par(mfrow = c(1, 1))

ECDFs make deviations from the limit easier to see — any mismatch shows up as a gap between the empirical and reference curves:

# color scheme
ecdf_cols <- c("black", "blue", "red", "forestgreen")

# compare eCDFs at time t
xgrid <- seq(-5, 5, length.out = 500)
plot(xgrid, pnorm(xgrid, sd = sqrt(t_fixed)),
     type = 'l', lty = 2, lwd = 1.5,
     xlab = expression(X[t]), ylab = "CDF",
     main = paste("Marginal CDFs at t =", t_fixed))
for (j in seq_along(step_sims)) {
  lines(ecdf(step_sims[[j]]), do.points = FALSE, verticals = TRUE, col = ecdf_cols[j])
}
legend("topleft",
       legend = c("N(0,t)", step_names),
       lty = c(2, 1, 1, 1, 1),
       col = c("black", ecdf_cols),
       bty = "n")

Try changing t_fixed to see how the marginals look earlier in the process.
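The visual gap can also be summarized in one number. Below is a standalone sketch (sample sizes and seed are arbitrary) using the Kolmogorov-Smirnov statistic \(\sup_x |\hat{F}(x) - \Phi(x)|\) for uniform-step endpoints at \(t = 1\):

```r
# sketch: summarize the worst-case distance between the empirical CDF of
# the normalized endpoints and the standard normal CDF
set.seed(99)
ends <- replicate(2000, sum(runif(2000, min = -sqrt(3), max = sqrt(3))) / sqrt(2000))
ks.test(ends, pnorm)$statistic   # values near zero indicate a close match
```

Continuous steps are used here because ks.test warns about ties for lattice-valued walks such as the SRW.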


Stationary Increments

The simulations above build intuition for the marginal distributions of a Brownian motion process \((B_t)_{t \geq 0}\). Let’s now examine the properties of the process increments \(B_{s + h} - B_s\).

We’ll work with a large matrix of simulated paths:

n <- 5000
R <- 3000
set.seed(2847)
paths <- replicate(R, rw_sim_iid(n, r_srw))
cols  <- c("black", "blue", "red")

The increment \(B_{s+h} - B_s\) has the same distribution for any starting time \(s\) — the distribution depends only on the length \(h\) of the interval, not on where it starts. Fix \(h = 0.3\) and compare the increment distributions starting from three different times:

# increment length
h <- 0.3

# increment locations
s_vals <- c(0, 0.2, 0.5)

# identify indices corresponding to setting above
ix_starts <- floor(s_vals * n) + 1
ix_ends   <- floor((s_vals + h) * n) + 1
inc_list  <- lapply(seq_along(s_vals), function(j) paths[ix_ends[j], ] - paths[ix_starts[j], ])

# plot eCDFs
for (j in seq_along(s_vals)) {
  if(j == 1){
    plot(ecdf(inc_list[[j]]), do.points = FALSE, verticals = TRUE, col = cols[j],
         main = paste0('Increments of length h = ', h))
  }else{
    lines(ecdf(inc_list[[j]]), do.points = FALSE, verticals = TRUE, col = cols[j])
  }
}
legend("topleft",
       legend = paste0("s = ", s_vals),
       lty = c(1, 1, 1), col = cols, bty = "n")

Vary \(h\) across a few values to see how the scale shifts:

# Q: how does the increment distribution change as h grows?
h_vals <- c(0.1, 0.3, 0.5) # modify here

for (j in seq_along(h_vals)) {
  inc <- paths[floor(h_vals[j] * n) + 1, ]
  if(j == 1){
    plot(ecdf(inc), do.points = FALSE, verticals = TRUE,
         main = 'Increments of length h', col = cols[j])
  }else{
    lines(ecdf(inc), do.points = FALSE, verticals = TRUE, col = cols[j])
  }
}
legend("topleft",
       legend = paste0("h = ", h_vals),
       lty = c(1, 1, 1), col = cols, bty = "n")
Exercise

Consider that the process increments can be expressed as the limit of random walk increments

\[ S_{s + h} - S_s = \sum_{k \in J} \xi_k \]

for an appropriate set of indices \(J\) and where \(\xi_1, \xi_2, \ldots\) are i.i.d. with zero mean and unit variance.

a. Determine the index set \(J\).

b. Use the CLT to form a large-sample approximation for the distribution of an increment of length \(h\).

c. Choose a value of \(h\) and compare your approximation to the simulated distribution based on paths.

# choose your h
h <- 0.5

# simulated distribution
inc <- paths[floor(h * n) + 1, ]
plot(ecdf(inc), do.points = FALSE, verticals = TRUE,
     main = paste0('Increments of length h = ', h), 
     col = cols[1])

# add your approximation here

Independent Increments

Increments over non-overlapping time intervals are independent. We can check this empirically with a scatterplot and correlation:

# increment lengths and locations
s1 <- 0;   t1 <- 0.3  # modify here
s2 <- 0.5; t2 <- 0.8  # modify here

# retrieve simulated values
inc1 <- paths[floor(t1*n)+1, ] - paths[floor(s1*n)+1, ]
inc2 <- paths[floor(t2*n)+1, ] - paths[floor(s2*n)+1, ]

# plot
plot(inc1, inc2,
     pch = '.', col = adjustcolor("black", alpha.f = 0.4),
     xlab = bquote(X[.(t1)] - X[.(s1)]),
     ylab = bquote(X[.(t2)] - X[.(s2)]),
     main = "Non-overlapping increments")
abline(h = 0, v = 0, lty = 2, col = "grey60")
legend("topleft",
       legend = paste("cor =", round(cor(inc1, inc2), 3)),
       bty = "n")

The near-zero correlation and unstructured scatterplot are consistent with independence (though zero correlation alone does not imply it).
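Zero correlation rules out only linear dependence, so a further (still informal) check is to correlate the squared increments, which should also be near zero under independence. The sketch below re-simulates SRW-based paths so it is self-contained; the interval endpoints are arbitrary:

```r
# sketch: under independence, squared increments over disjoint intervals
# are also uncorrelated, a check that goes beyond linear correlation
set.seed(512)
n <- 2000; R <- 3000
p <- replicate(R, c(0, cumsum(sample(c(-1, 1), n, replace = TRUE))) / sqrt(n))
inc_a <- p[floor(0.3 * n) + 1, ] - p[1, ]                   # increment over [0, 0.3]
inc_b <- p[floor(0.8 * n) + 1, ] - p[floor(0.5 * n) + 1, ]  # increment over [0.5, 0.8]
c(linear = cor(inc_a, inc_b), squared = cor(inc_a^2, inc_b^2))
```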

Exercise

Now modify the interval endpoints so that \([s_1, t_1]\) and \([s_2, t_2]\) overlap. What happens to the correlation, and why does it have the sign it does?


Two-Dimensional Brownian Motion

A two-dimensional Brownian motion \(\mathbf{B}_t = (B_t^{(1)}, B_t^{(2)})\) is simply a pair of independent one-dimensional Brownian motions — one for each coordinate. The construction carries over directly: run two independent scaled random walks in parallel. The simulator below returns a \(2 \times (n+1)\) matrix, with rows giving the \(x\) and \(y\) coordinates at each time step:

rw2d_sim <- function(n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  rbind(
    c(0, cumsum(sample(c(-1, 1), n, replace = TRUE))),
    c(0, cumsum(sample(c(-1, 1), n, replace = TRUE)))
  ) / sqrt(n)
}

Paths in the Plane

Rather than plotting each coordinate against time, we can plot the path as a curve in the \((x, y)\) plane. The dot marks the origin.

# Q: how do the paths compare qualitatively to what you saw in 1D?
n <- 5000
set.seed(4921)
par(mfrow = c(1, 2))
for (j in 1:2) {
  path <- rw2d_sim(n)
  plot(path[1, ], path[2, ], type = 'l', asp = 1, lwd = 0.3,
       xlab = 'x', ylab = 'y')
  points(0, 0, pch = 19, cex = 0.8)
}
par(mfrow = c(1, 1))

Marginal Distributions

At a fixed time \(t\), the position \(\mathbf{B}_t\) has a bivariate normal distribution with mean \(\mathbf{0}\) and covariance \(tI\) — the distribution is circularly symmetric with spread growing in \(t\). Simulate many paths and scatter the positions at time \(t\); the red circle has radius \(\sqrt{t}\), the 1-SD contour of \(N(\mathbf{0}, tI)\):

n <- 5000
R <- 2000
t_fixed <- 1   # modify here
ix <- floor(t_fixed * n) + 1
set.seed(3814)
paths2d <- replicate(R, rw2d_sim(n))   # 2 x (n+1) x R array
pos <- paths2d[, ix, ]                 # 2 x R matrix of positions at time t

theta <- seq(0, 2*pi, length.out = 300)
plot(pos[1, ], pos[2, ], pch = '.', asp = 1,
     col = adjustcolor("black", alpha.f = 0.3),
     xlab = 'x', ylab = 'y',
     main = paste('Positions at t =', t_fixed))
lines(sqrt(t_fixed) * cos(theta), sqrt(t_fixed) * sin(theta), col = 'red', lwd = 1.5)

Try changing t_fixed. The cloud should expand, always staying circularly symmetric.
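The expanding-cloud claim has a simple quantitative version: each coordinate has variance \(t\), so \(E\|\mathbf{B}_t\|^2 = 2t\). A standalone sketch (the n, R, and time points are arbitrary choices) checking that the mean squared radius tracks \(2t\):

```r
# sketch: in two dimensions E|B_t|^2 = t + t = 2t; compare the simulated
# mean squared distance from the origin against 2t at a few times
set.seed(101)
n <- 2000; R <- 500
sq_radius <- replicate(R, {
  x <- c(0, cumsum(sample(c(-1, 1), n, replace = TRUE))) / sqrt(n)
  y <- c(0, cumsum(sample(c(-1, 1), n, replace = TRUE))) / sqrt(n)
  x^2 + y^2
})                                   # (n + 1) x R matrix of squared radii
for (t in c(0.25, 0.5, 1)) {
  cat("t =", t, " mean squared radius =",
      round(mean(sq_radius[floor(t * n) + 1, ]), 3), " vs 2t =", 2 * t, "\n")
}
```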

Stationary Increments

The 2D increment \(\mathbf{B}_{s+h} - \mathbf{B}_s\) is a 2D vector whose distribution depends only on \(h\), not \(s\). The three panels below show this increment cloud at different starting times \(s\) for the same \(h\); the red circle is the 1-SD contour of the expected \(N(\mathbf{0}, hI)\) limit:

# Q: do the three clouds look the same?
h      <- 0.3           # modify here
s_vals <- c(0, 0.2, 0.5)

par(mfrow = c(1, 3), pty = "s")
for (j in seq_along(s_vals)) {
  i0 <- floor(s_vals[j] * n) + 1
  i1 <- floor((s_vals[j] + h) * n) + 1
  dx <- paths2d[1, i1, ] - paths2d[1, i0, ]
  dy <- paths2d[2, i1, ] - paths2d[2, i0, ]
  plot(dx, dy, pch = '.',
       col = adjustcolor("black", alpha.f = 0.3),
       xlim = c(-2, 2), ylim = c(-2, 2),
       xlab = 'dx', ylab = 'dy',
       main = paste0('s = ', s_vals[j]))
  lines(sqrt(h) * cos(theta), sqrt(h) * sin(theta), col = 'red', lwd = 1.5)
}
par(mfrow = c(1, 1), pty = "m")

Independent Increments

Independence of non-overlapping increments holds coordinate-wise (and therefore jointly). As before, a scatterplot and correlation of the \(x\)-coordinates of two non-overlapping increments should show no structure:

# Q: what do you expect if the intervals overlap?
s1 <- 0;   t1 <- 0.3   # modify here
s2 <- 0.5; t2 <- 0.8   # modify here

inc1_x <- paths2d[1, floor(t1*n)+1, ] - paths2d[1, floor(s1*n)+1, ]
inc2_x <- paths2d[1, floor(t2*n)+1, ] - paths2d[1, floor(s2*n)+1, ]

plot(inc1_x, inc2_x,
     pch = '.', col = adjustcolor("black", alpha.f = 0.4),
     xlab = bquote(B[.(t1)]^(1) - B[.(s1)]^(1)),
     ylab = bquote(B[.(t2)]^(1) - B[.(s2)]^(1)),
     main = "Non-overlapping increments (x-coordinate)")
abline(h = 0, v = 0, lty = 2, col = "grey60")
legend("topleft", legend = paste("cor =", round(cor(inc1_x, inc2_x), 3)), bty = "n")