Mini-project 1

Topics in discrete-time Markov chains

Mini-projects should explore an application, illustration, implementation, or extension of a topic discussed in class. The expectation is that students will work in pairs to produce: (a) a 10-15 minute in-class presentation; and (b) a 1-2 page summary written for an audience of peers. This first project should relate to topics in discrete-time Markov chains. In-class presentations will be held on Monday, February 1; the written deliverable is due by Friday, February 6, 11:59pm.

Projects may be more applied (e.g., a data application, a worked example from the text or elsewhere, or a software tutorial/vignette) or more methodological (e.g., simulation experiments, an extension not covered in lecture, or an exposition of a particular model, estimation technique, or algorithm). You may choose one of the options below, or craft your own. Please confirm your topic with me (a quick email will suffice) before going ahead.

GeoLife trajectories

Explore human mobility as a discrete-time Markov chain under one or several spatial resolutions of your choosing.

Tasks:

Choose a discretization (e.g., grid cells or clustered locations) and define states.
Build one or more trajectories and estimate a transition matrix.
Either:
- Apply the fitted model to infer long-run behavior, make predictions, etc.
- Explore how results change with varying spatial resolution (i.e., redefine the state space) or with different processing of the input data.
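
The first two tasks can be sketched as follows. This is a minimal illustration rather than a required approach: the grid size `cell_deg`, the helper names `to_cell` and `transition_matrix`, and the toy coordinates are all assumptions made for the example, not part of the GeoLife data or the course scripts.

```python
import math
from collections import Counter, defaultdict

def to_cell(lat, lon, cell_deg=0.01):
    """Map a (lat, lon) point to a grid-cell index pair (hypothetical 0.01-degree grid)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def transition_matrix(states):
    """Estimate P(next = j | current = i) from one state sequence by row-normalizing counts."""
    counts = defaultdict(Counter)
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    return {
        a: {b: n / sum(row.values()) for b, n in row.items()}
        for a, row in counts.items()
    }

# Made-up points for illustration only (not real GeoLife records).
points = [
    (39.985, 116.305), (39.985, 116.315), (39.995, 116.315),
    (39.985, 116.315), (39.985, 116.305), (39.985, 116.315),
]
states = [to_cell(lat, lon) for lat, lon in points]
P = transition_matrix(states)

for origin, row in P.items():
    print(origin, row)
```

Changing `cell_deg` redefines the state space, which is one way to approach the spatial-resolution comparison. With real GPS fixes, consecutive points often fall in the same cell, so you will need to decide whether to keep or collapse self-transitions.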

Results should include…

A map/plot of states (grid or clusters) plus at least one example trajectory
A transition summary (e.g., top destinations from a chosen state, most visited states, long-run behavior, etc.)
At least one model diagnostic

Resources: [sampling script] [fitting random walks] [data]

More NYC taxis

Explore non-homogeneity by time of day, day of week, month, season, etc., to identify shifts in transition behavior. Use additional data beyond the single month modeled in class.

Tasks:

Define state as taxi zone and estimate PU → DO transition probabilities from multiple subsets of the TLC data for different time periods of your choosing.
Decide on an estimation strategy with an appropriate amount of smoothing.
Compare/contrast estimates across periods.
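
One possible shape for these tasks is sketched below, assuming additive (Laplace-style) smoothing and a total variation distance for comparing rows. Both choices, the pseudo-count `alpha`, the period labels, and the toy trips are assumptions for illustration; they are not the estimator from the linked notes and not real TLC records.

```python
from collections import Counter, defaultdict

def smoothed_rows(counts, zones, alpha=1.0):
    """Additive smoothing: p_ij = (n_ij + alpha) / (n_i. + alpha * K) over K zones."""
    K = len(zones)
    P = {}
    for i in zones:
        row = counts.get(i, {})
        total = sum(row.values()) + alpha * K
        P[i] = {j: (row.get(j, 0) + alpha) / total for j in zones}
    return P

def tv_distance(p, q):
    """Total variation distance between two rows defined on the same zones."""
    return 0.5 * sum(abs(p[j] - q[j]) for j in p)

# Toy (period, pickup zone, dropoff zone) records, made up for illustration.
trips = [
    ("am", "A", "B"), ("am", "A", "B"), ("am", "A", "C"), ("am", "B", "A"),
    ("pm", "A", "C"), ("pm", "A", "C"), ("pm", "B", "A"), ("pm", "C", "A"),
]
zones = ["A", "B", "C"]

# counts[period][pickup][dropoff] = number of trips
counts = defaultdict(lambda: defaultdict(Counter))
for period, pu, do in trips:
    counts[period][pu][do] += 1

P = {period: smoothed_rows(c, zones, alpha=0.5) for period, c in counts.items()}
tv_A = tv_distance(P["am"]["A"], P["pm"]["A"])
print("TV distance between am and pm rows for zone A:", round(tv_A, 4))
```

Note that a zone with no pickups in a period (zone C in the toy "am" data) gets a uniform row under this scheme, which is one reason the amount of smoothing deserves some thought.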

Results should include…

At least one map-based visualization
A visual comparison of estimates between time periods
At least one model diagnostic

Resources: [estimation with smoothing] [data]

Branching processes

Explain what branching processes are and illustrate with a simple simulation. (Note this is the subject of Ch. 4 in the text.)

Tasks:

Write a short explanation of the Galton–Watson model: offspring distribution, generations, mean.
Simulate at least two regimes (e.g., subcritical vs supercritical; or different variances).
Demonstrate extinction vs survival behavior empirically via simulation.
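
A minimal simulation sketch for these tasks follows. It assumes an offspring distribution on {0, 1, 2}, chosen because the extinction probability then has a simple closed form: the smallest root of p0 + p1 s + p2 s^2 = s in [0, 1], which is min(1, p0/p2). The specific probabilities, the population cap used to stop exploding paths early, and all function names are assumptions for illustration.

```python
import random

def offspring_mean(probs):
    """Mean of an offspring distribution given as probs[k] = P(k offspring)."""
    return sum(k * p for k, p in enumerate(probs))

def simulate_gw(probs, n_gen=50, cap=200, rng=random):
    """One Galton-Watson path Z_0, Z_1, ... starting from Z_0 = 1.

    Stops early at extinction, or once the population exceeds `cap`
    (such paths are treated as surviving, which is an approximation).
    """
    values = list(range(len(probs)))
    z, path = 1, [1]
    for _ in range(n_gen):
        if z == 0 or z > cap:
            break
        z = sum(rng.choices(values, weights=probs, k=z))
        path.append(z)
    return path

def extinction_fraction(probs, n_paths=2000, seed=1, n_gen=50, cap=200):
    """Fraction of simulated paths that end at population size 0."""
    rng = random.Random(seed)
    extinct = sum(
        simulate_gw(probs, n_gen=n_gen, cap=cap, rng=rng)[-1] == 0
        for _ in range(n_paths)
    )
    return extinct / n_paths

subcritical = (0.5, 0.3, 0.2)    # mean 0.7, extinction is certain
supercritical = (0.2, 0.3, 0.5)  # mean 1.3, extinction probability 0.2 / 0.5 = 0.4

q_sub = extinction_fraction(subcritical)
q_sup = extinction_fraction(supercritical)
print("subcritical:", offspring_mean(subcritical), q_sub)
print("supercritical:", offspring_mean(supercritical), q_sup)
```

The paths returned by `simulate_gw` can be plotted directly as population size against generation, and `extinction_fraction` can be evaluated over a range of parameter choices to build the extinction table.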

Results should include…

Explanation of the model and key concepts (offspring distribution, mean, extinction).
One figure showing example sample paths (population size over generations).
One figure/table showing estimated extinction probability by parameter choice.

Estimation from multiple trajectories

Explain how likelihood estimation works when you have many independent sequences.

Tasks:

Describe the likelihood for multiple trajectories and explain why counts add.
Implement pooled estimation using aggregated counts:
\[ N_{ij} = \sum_m N_{ij}^{(m)}, \qquad \hat{p}_{ij} = \frac{N_{ij}}{\sum_k N_{ik}}. \]
Compare pooled and per-trajectory estimates for a few states or rows.
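
A small sketch of pooled versus per-trajectory estimation is given below, assuming two states and three short made-up sequences. Transition counts are tallied within each trajectory (never across the boundary between two trajectories), summed over trajectories, and then row-normalized. The helper names and the toy data are assumptions for illustration.

```python
from collections import Counter, defaultdict

def count_transitions(seq):
    """Transition counts n_ij within a single trajectory."""
    counts = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return counts

def normalize(counts):
    """Row-normalize counts into estimated transition probabilities."""
    return {
        i: {j: n / sum(row.values()) for j, n in row.items()}
        for i, row in counts.items()
    }

# Toy independent trajectories on states {"A", "B"}, made up for illustration.
trajectories = [list("AABAB"), list("ABBBA"), list("BAAAB")]

per_traj_counts = [count_transitions(t) for t in trajectories]

# Pooled counts: add the per-trajectory counts cell by cell.
pooled = defaultdict(Counter)
for c in per_traj_counts:
    for i, row in c.items():
        pooled[i].update(row)

p_hat = normalize(pooled)
per_traj_hat = [normalize(c) for c in per_traj_counts]

print("pooled A -> B:", p_hat["A"]["B"])
print("per-trajectory A -> B:", [p["A"].get("B", 0.0) for p in per_traj_hat])
```

With short trajectories the per-trajectory estimates vary widely around the pooled value, which is the kind of variability the comparison plot or table is meant to show.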

Results should include…

A clear explanation of the pooled estimator and why it is the MLE
One comparison plot/table showing variability across trajectories
An exploration of which trajectories differ most and why

More NOAA data

Extend the NOAA data example from class and explore weather Markov chains beyond one station and one state definition.

Tasks:

Use multiple locations or compare multiple discretizations of weather data
Estimate transition matrices and compare persistence across discretizations/locations
Assess and/or account for seasonality/homogeneity
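
The discretization and persistence comparison could look roughly like the sketch below. It assumes a two-state wet/dry definition based on a daily precipitation threshold. The threshold values, the function names, and the toy precipitation series are assumptions for illustration and do not come from the NOAA data or the class example.

```python
from itertools import groupby

def discretize(precip, threshold=0.0):
    """Label each day 'wet' if precipitation exceeds the threshold, else 'dry'."""
    return ["wet" if x > threshold else "dry" for x in precip]

def spell_lengths(states, target="dry"):
    """Lengths of consecutive runs of the target state."""
    return [len(list(run)) for state, run in groupby(states) if state == target]

def self_transition_rate(states, target):
    """Estimated P(tomorrow = target | today = target)."""
    nexts = [b for a, b in zip(states, states[1:]) if a == target]
    return sum(b == target for b in nexts) / len(nexts)

# Made-up daily precipitation (mm) for illustration only.
precip = [0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 3.4, 0.5, 0.0, 0.0]

for threshold in (0.0, 1.0):
    states = discretize(precip, threshold)
    print(
        "threshold", threshold,
        "dry spells", spell_lengths(states, "dry"),
        "dry persistence", round(self_transition_rate(states, "dry"), 3),
    )

states = discretize(precip)  # default threshold, reused below
dry_spells = spell_lengths(states, "dry")
```

Running the same functions per station, or per season within a station, gives a direct way to compare persistence across locations and to check whether a single homogeneous chain is reasonable.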

Results should include…

A comparison across stations or discretizations (table/plot of key transition probabilities)
One persistence diagnostic (dry spell lengths, self-transition rates, etc.)
Short interpretation connecting differences to geography/seasonality

Resources: [simple model via MLE] [data]

Find your own topic

Please consult with me briefly if you want to focus on your own topic. I will probably approve, but we should agree on some tasks and minimum requirements similar to those above.