Mapping Disease Spread: Graph Forecasters with ST-GCN Surrogates

Your Neural Network for Spatiotemporal Epidemic Intelligence

🗺️ Introduction: When Geography Meets Machine Learning

Imagine trying to predict flu cases in New York City without considering what’s happening in New Jersey, or forecasting dengue in Bangkok while ignoring neighboring provinces. Traditional epidemic models often treat locations as isolated islands, missing the fundamental truth that diseases spread through networks of human movement and contact.

Enter Graph Forecasters based on Spatiotemporal Graph Convolutional Networks (ST-GCNs)—machine learning models that treat geographic regions as nodes in a network, where disease dynamics in one location directly influence its neighbors. These models don’t just ask “What happened here yesterday?” but “What happened here and everywhere connected to here over the past few weeks?”

Originally developed for traffic prediction and social network analysis [1-2], ST-GCNs have been brilliantly adapted to epidemiological forecasting, capturing how diseases ripple through populations along the invisible threads of human mobility, transportation networks, and social connections [3-4]. Unlike traditional spatial models that assume smooth geographic gradients, graph forecasters respect the actual connectivity structure of real-world populations.

🧮 Model Description: The Mathematics of Networked Prediction

Graph forecasters treat epidemic surveillance as a spatiotemporal graph signal processing problem, where disease incidence is a signal that evolves over both time and space according to the underlying network structure.

Graph Structure Definition

Let G = (V, E, A) represent the geographic network:

V: Set of N regions (nodes), e.g., counties, cities, or countries
E: Set of edges representing connections between regions
A: N × N adjacency matrix where Aᵢⱼ represents connection strength between region i and j

The reference equation you provided represents a linear graph forecaster:

ŷᵢ,ₜ = β₀ + β₁ · Iᵢ,ₜ₋₁ + β₂ · ⟨I⟩_{Neigh(i), t−1} + β₃ · seasonₜ

Where:

ŷᵢ,ₜ: Predicted incidence in region i at time t
Iᵢ,ₜ₋₁: Incidence in region i at time t−1
⟨I⟩_{Neigh(i), t−1}: Average incidence among neighbors of region i at time t−1
seasonₜ: Seasonal adjustment term (e.g., sin(2πt/52) for weekly data)
β₀, β₁, β₂, β₃: Regression coefficients

The neighbor average is computed as:

⟨I⟩_{Neigh(i), t−1} = (∑₍j=1₎ᴺ Aᵢⱼ · Iⱼ,ₜ₋₁) / (∑₍j=1₎ᴺ Aᵢⱼ)

Spatiotemporal Graph Convolutional Network (ST-GCN)

Modern ST-GCNs replace the simple linear aggregation with learnable graph convolutions:

H⁽ˡ⁺¹⁾ = σ(Â · H⁽ˡ⁾ · Θ⁽ˡ⁾)

Where:

H⁽ˡ⁾: Node feature matrix at layer l (N × F dimensions)
Â: Normalized adjacency matrix Â = D⁻¹ᐧ² · A · D⁻¹ᐧ²
D: Degree matrix where Dᵢᵢ = ∑₍j₎ Aᵢⱼ
Θ⁽ˡ⁾: Learnable weight matrix for layer l
σ(·): Nonlinear activation function (e.g., ReLU)

For temporal dynamics, ST-GCNs stack temporal convolutions:

Z = TemporalConv1D(H; kernel_size, dilation)

Combining spatial and temporal convolutions creates a powerful architecture that captures both local spatial dependencies and temporal evolution patterns.

Full ST-GCN Architecture

Input: X ∈ ℝᴺˣᵀˣᶠ (N regions, T time steps, f features)
Spatial GCN: Xˢ = GCN(X)
Temporal Conv: Xˢᵀ = TemporalConv1D(Xˢ)
Output: ŷ = Linear(Xˢᵀ[:, -1, :]) ∈ ℝᴺ

The model is trained by minimizing mean squared error:

Loss = (1/(N·T)) · ∑₍i=1₎ᴺ ∑₍t=1₎ᵀ (ŷᵢ,ₜ − Iᵢ,ₜ)²

📊 Key Parameter Definitions and Typical Values

Understanding these parameters helps you configure and interpret graph forecasters effectively.


N	Number of regions (nodes)	10 – 1000	Larger N = more complex spatial patterns
T	Training time period	100 – 1000 days	Longer T = better temporal pattern learning
H	Forecast horizon	1 – 14 days	Shorter H = more accurate predictions
ρ	Graph diffusion strength	0.1 – 0.9	Higher ρ = stronger spatial coupling
Aᵢⱼ	Adjacency weights	0 – 1 (normalized)	Often based on mobility, distance, or connectivity
β coefficients	Linear model parameters	-2 to 2	β₂ > 0 indicates positive spatial spillover

Graph Construction Methods

The adjacency matrix A can be constructed using various real-world data sources:

Distance-based: Aᵢⱼ = exp(−dᵢⱼ / σ) where dᵢⱼ is distance between regions
Mobility-based: Aᵢⱼ = mobile_flowsᵢⱼ / total_flowsᵢ (from cell phone or transportation data)
Administrative: Aᵢⱼ = 1 if regions share a border, 0 otherwise
Learned: Aᵢⱼ optimized during training as learnable parameters

Typical Graph Structures

Ring graph: Each region connected only to immediate neighbors (ρ controls connection strength)
Fully connected: All regions influence each other (rarely realistic)
Scale-free: Few highly connected “hub” regions (common in transportation networks)
Small-world: High local clustering with short path lengths (typical of real human networks)

⚠️ Assumptions and Applicability: When Graph Forecasters Shine

Graph forecasters are powerful but work best under specific epidemiological and data conditions.

✅ Ideal Applications

Multi-region surveillance data: Cases reported for multiple connected geographic units
Known connectivity patterns: Mobility data, transportation networks, or social connections available
Spatiotemporal clustering: Diseases that spread geographically over time (flu, dengue, COVID-19)
Moderate to high case counts: Sufficient signal in each region for learning
Stable network structure: Connectivity patterns don’t change dramatically during observation period

❌ Limitations and Challenges

Sparse regions: Areas with very few cases provide little signal for learning
Changing connectivity: Pandemic restrictions altering mobility patterns
Boundary effects: Edge regions with fewer neighbors may be poorly predicted
Computational complexity: Large N (>1000 regions) requires significant computational resources
Data requirements: Need both spatial and temporal dimensions well-populated

💡 Pro Tip: Always validate graph structure assumptions—plot actual case correlations against your adjacency matrix to ensure your graph reflects real disease spread patterns [5].

🚀 Model Extensions and Variants: Advanced Graph Forecasting

The basic ST-GCN framework has inspired numerous sophisticated extensions for real-world epidemiological challenges.

1. Attention-Based Graph Networks

Replace fixed adjacency with learnable attention weights:

Aᵢⱼ = softmax(LeakyReLU(aᵀ · [W · hᵢ || W · hⱼ]))

Where a is a learnable attention vector and || denotes concatenation. This allows the model to discover important connections automatically [6].

2. Dynamic Graph Convolutional Networks

Handle time-varying connectivity (e.g., changing mobility during interventions):

Âₜ = f(mobilityₜ)
Hₜ = Âₜ · Hₜ₋₁ · Θ

Where the adjacency matrix evolves over time based on external data [7].

3. Multiscale Graph Networks

Capture both local and global spatial patterns:

Hˡᵒᶜᵃˡ = GCNₗₒcₐₗ(X)
Hᵍˡᵒᵇᵃˡ = GCNᵍₗₒbₐₗ(X)
H = concatenate(Hˡᵒᶜᵃˡ, Hᵍˡᵒᵇᵃˡ)

Using different graph structures for different spatial scales [8].

4. Probabilistic ST-GCNs

Output prediction intervals instead of point estimates:

ŷᵢ,ₜ ~ Normal(μᵢ,ₜ, σᵢ,ₜ)
μᵢ,ₜ, σᵢ,ₜ = ST-GCN(X; θ)

Using quantile regression or distributional outputs for uncertainty quantification [9].

5. Heterogeneous Graph Networks

Handle multiple node types (hospitals, schools, neighborhoods) and edge types (commuting, social, healthcare):

G = (V₁ ∪ V₂ ∪ …, E₁ ∪ E₂ ∪ …)

With different convolution operations for different relationship types [10].

6. Physics-Informed Graph Networks

Incorporate mechanistic epidemic constraints:

Loss = MSE + λ · ||dI/dt − β · S · I / N||²

Where the neural network predictions are constrained to approximately satisfy compartmental model equations [11].

🎯 Conclusion: The Networked Future of Epidemic Forecasting

Graph forecasters based on ST-GCNs represent a paradigm shift in how we model disease spread—moving from isolated geographic units to interconnected networks that mirror real human connectivity. By explicitly modeling how diseases propagate through the web of human movement and contact, these models capture the fundamental spatial dynamics that drive epidemic waves.

What makes this approach particularly valuable is its flexibility and realism. Unlike traditional spatial models that assume smooth geographic gradients, graph forecasters respect the actual topology of human networks—whether that’s the hub-and-spoke structure of air travel, the local clustering of neighborhoods, or the long-distance connections of social media.

However, this power requires careful implementation. The quality of graph forecasters depends critically on the accuracy of the underlying network structure. A poorly constructed adjacency matrix will lead to poor predictions, regardless of model sophistication. The most successful applications combine rich connectivity data (from mobility, transportation, or social networks) with robust machine learning architectures.

Whether you’re tracking seasonal influenza across states, monitoring dengue in urban neighborhoods, or forecasting emerging pathogens across international borders, graph forecasters provide your ML Epidemics Toolbox with a powerful framework for understanding how diseases truly spread—not just through space, but through the networks that connect us all.

📚 References

[1] Yu, B., Yin, H., & Zhu, Z. (2018). Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. Proceedings of the 27th International Joint Conference on Artificial Intelligence, 3634–3640. https://doi.org/10.24963/ijcai.2018/505

[2] Kipf, T. N., & Welling, M. (2017). Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations. https://arxiv.org/abs/1609.02907

[3] Shah, S., & Rodriguez, A. (2021). Spatiotemporal graph neural networks for epidemic forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, 35(1), 542–549. https://doi.org/10.1609/aaai.v35i1.16123

[4] Li, Y., Yu, R., Shahabi, C., & Liu, Y. (2018). Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. International Conference on Learning Representations. https://arxiv.org/abs/1707.08908

[5] Wesolowski, A., Buckee, C. O., Engø-Monsen, K., & Metcalf, C. J. E. (2016). Connecting mobility to infectious diseases: The promise and limits of mobile phone data. Journal of Infectious Diseases, 214(suppl_4), S414–S420. https://doi.org/10.1093/infdis/jiw413

[6] Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., & Bengio, Y. (2018). Graph attention networks. International Conference on Learning Representations. https://arxiv.org/abs/1710.10903

[7] Sankar, A., Wang, Y., & Chang, C. C. (2020). Dynamic graph representation learning for video domain adaptation. International Conference on Learning Representations. https://arxiv.org/abs/1910.11400

[8] Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., & Yu, P. S. (2021). A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(1), 4–24. https://doi.org/10.1109/TNNLS.2020.2978386

[9] Salinas, D., Flunkert, V., Gasthaus, J., & Januschowski, T. (2020). DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3), 1181–1191. https://doi.org/10.1016/j.ijforecast.2019.07.001

[10] Zhang, D., Yin, J., Zhu, X., & Zhang, C. (2020). Heterogeneous graph neural network. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 793–803. https://doi.org/10.1145/3292500.3330961

[11] Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686–707. https://doi.org/10.1016/j.jcp.2018.10.045