ETC3550

Concise notes for forecasting exams, covering key concepts, methods, and models based on "Forecasting: Principles and Practice (3rd ed)".

Fundamentals & Time Series Decomposition

Forecasting Principles

Forecast Error: Difference between observed and predicted values.

Residuals: One-step forecast errors on the training data: e_t = y_t - \hat{y}_t.

Goals: Minimize forecast error, avoid bias, and capture patterns.

Bias: Systematic difference between forecasts and actual values. Indicated by non-zero mean error.

Forecast Distribution: The probability distribution of future values; it can be developed by simulating many possible future sample paths.

Overfitting: Fitting a model too closely to the training data, resulting in poor performance on new data.

Underfitting: The model is too simple to capture the patterns in the data.

Time Series Components

Additive Decomposition:
Data = Trend + Seasonal + Random

Multiplicative Decomposition:
Data = Trend * Seasonal * Random

Trend: Long-term direction of the series.

Seasonal: Regular, predictable variations that recur over a fixed period.

Cyclic: Rises and falls that are not of a fixed period; cycles are usually longer and more variable than seasonal patterns.

Random: Irregular, unpredictable variations.

Classical Decomposition: Method for decomposing a time series into its components (trend, seasonal, and irregular).

STL Decomposition: Versatile and robust method (Seasonal and Trend decomposition using Loess). Handles any type of seasonality, allows the seasonal component to change over time, and is robust to outliers. STL is additive; a multiplicative decomposition can be obtained by log-transforming the data first.
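
A minimal decomposition sketch in Python, assuming statsmodels is available; the synthetic monthly series and period = 12 are illustrative choices, not from the notes:

    import numpy as np
    from statsmodels.tsa.seasonal import STL, seasonal_decompose

    rng = np.random.default_rng(1)
    t = np.arange(120)
    y = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, 120)

    # Classical additive decomposition: Data = Trend + Seasonal + Random
    classical = seasonal_decompose(y, model="additive", period=12)

    # STL: robust to outliers; the seasonal component may change over time
    stl = STL(y, period=12, robust=True).fit()
    print(stl.trend[:3], stl.seasonal[:3], stl.resid[:3])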

Time Series Plots

Seasonal Plot: Data are grouped by season (e.g., months or quarters) and plotted to highlight seasonal patterns, showing how the series varies within each season.

Time Plot: Time series data are plotted against time, revealing trends, seasonality, and cyclical patterns over time.

Scatter Plot: Data points are plotted as individual points to visualize the relationship between two variables, such as the series and its lagged values, helping to identify autocorrelation and patterns.

Autocorrelation Function (ACF): Measures the correlation between a time series and its lagged values, revealing the strength and significance of autocorrelation at different lags.

Partial Autocorrelation Function (PACF): Measures the correlation between a time series and its lagged values after removing the effects of intermediate lags, isolating the direct relationship between the series and each lag.
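
A short sketch computing sample ACF and PACF values with statsmodels (assumed available); the AR(1) series below is synthetic:

    import numpy as np
    from statsmodels.tsa.stattools import acf, pacf

    rng = np.random.default_rng(2)
    y = np.zeros(200)
    for t in range(1, 200):          # AR(1) with phi = 0.7, for illustration
        y[t] = 0.7 * y[t - 1] + rng.normal()

    print(acf(y, nlags=10))          # decays roughly geometrically for an AR(1)
    print(pacf(y, nlags=10))         # roughly cuts off after lag 1 for an AR(1)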

Simple Forecasting Methods

Basic Methods

Average Method: Forecast all future values using the average of historical data.

Formula: \hat{y}_{T+h|T} = \bar{y} = (y_1 + y_2 + ... + y_T) / T

Naive Method: Forecast equals the last observed value.

Formula: \hat{y}_{T+h|T} = y_T

Seasonal Naive Method: Forecast equals the last observed value from the same season.

Formula: \hat{y}_{T+h|T} = y_{T+h-m(k+1)}, where m is the seasonal period and k is the integer part of (h-1)/m

Drift Method: Forecast is the last value plus an average change over time.

Formula: \hat{y}_{T+h|T} = y_T + h \frac{y_T - y_1}{T-1}
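
All four benchmarks are short enough to write out directly. A minimal sketch, assuming y is a 1-d numpy array and m is the seasonal period (both illustrative choices):

    import numpy as np

    def benchmarks(y, h, m):
        T = len(y)
        avg = np.full(h, y.mean())            # average method
        naive = np.full(h, y[-1])             # naive method
        # seasonal naive: 0-based index of y_{T+i-m(k+1)}, k = (i-1)//m
        snaive = np.array([y[T + i - m * ((i - 1) // m + 1) - 1]
                           for i in range(1, h + 1)])
        drift = y[-1] + np.arange(1, h + 1) * (y[-1] - y[0]) / (T - 1)
        return avg, naive, snaive, drift

    y = np.arange(24, dtype=float)            # toy data
    print(benchmarks(y, h=4, m=12))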

Residual Diagnostics

Assumptions about residuals: Uncorrelated, mean zero, constant variance, normally distributed.

Plots: Time series plot, histogram, ACF plot.

Tests: Ljung-Box test.

Ljung-Box Test: Tests whether a group of autocorrelations of a time series is jointly different from zero.

Q^* = T(T+2) \sum_{k=1}^h \frac{r_k^2}{T-k}, where r_k is the autocorrelation of the residuals at lag k. A large Q^* (small p-value) suggests the residuals are autocorrelated.
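
In practice the statistic and its p-value come straight from statsmodels (assumed available); the white-noise residuals below are synthetic:

    import numpy as np
    from statsmodels.stats.diagnostic import acorr_ljungbox

    rng = np.random.default_rng(3)
    resid = rng.normal(size=200)                 # white-noise residuals

    lb = acorr_ljungbox(resid, lags=[10], return_df=True)
    print(lb)  # large p-value -> no evidence the residuals are autocorrelated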

Exponential Smoothing Methods

Simple Exponential Smoothing (SES)

Suitable for: Data with no trend or seasonality.

Formula: \hat{y}_{t+1|t} = \alpha y_t + (1 - \alpha) \hat{y}_{t|t-1}

\alpha: Smoothing constant (0 < \alpha < 1). Higher values give more weight to recent observations.

Initialization: \hat{y}_{1|0} can be set to y_1 or the average of the first few observations.
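
The recursion is short enough to write out by hand. A minimal sketch, using the initialization \hat{y}_{1|0} = y_1:

    import numpy as np

    def ses_forecast(y, alpha):
        level = y[0]                      # \hat{y}_{1|0} = y_1
        for obs in y[1:]:
            level = alpha * obs + (1 - alpha) * level
        return level                      # flat forecast for every horizon h

    y = np.array([3., 5., 4., 6., 5., 7.])
    print(ses_forecast(y, alpha=0.3))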

Holt's Linear Trend Method

Suitable for: Data with a trend but no seasonality.

Equations:
Level: \ell_t = \alpha y_t + (1 - \alpha) (\ell_{t-1} + b_{t-1})
Trend: b_t = \beta^* (\ell_t - \ell_{t-1}) + (1 - \beta^*) b_{t-1}
Forecast: \hat{y}_{t+h|t} = \ell_t + h b_t

\alpha: Smoothing constant for the level.
\beta^*: Smoothing constant for the trend.

Initialization: \ell_0 and b_0 can be estimated using linear regression on the historical data.
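
A hand-rolled sketch of the recursions, using the simple initialization \ell_0 = y_1, b_0 = y_2 - y_1 (an illustrative choice, not the regression-based initialization above):

    import numpy as np

    def holt_forecast(y, h, alpha, beta):
        level, trend = y[0], y[1] - y[0]
        for obs in y[1:]:
            prev = level
            level = alpha * obs + (1 - alpha) * (level + trend)
            trend = beta * (level - prev) + (1 - beta) * trend
        return level + np.arange(1, h + 1) * trend

    y = np.array([10., 12., 13., 15., 16., 18.])
    print(holt_forecast(y, h=3, alpha=0.8, beta=0.2))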

Damped Trend Methods

Damped Trend Methods: Similar to Holt’s method, but the trend is damped over time.

Formula: \hat{y}_{t+h|t} = \ell_t + (\phi + \phi^2 + ... + \phi^h)b_t
Level: \ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + \phi b_{t-1})
Trend: b_t = \beta^*(\ell_t - \ell_{t-1}) + (1 - \beta^*)\phi b_{t-1}

\phi: Damping parameter (0 < \phi < 1). As h increases, the forecast approaches \ell_t + \frac{\phi}{1-\phi} b_t.
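
A quick numerical check of the damped forecast path and its limit, with illustrative values of \ell_t, b_t, and \phi:

    import numpy as np

    ell, b, phi = 100.0, 2.0, 0.9
    h = np.arange(1, 51)
    damped_sum = phi * (1 - phi**h) / (1 - phi)   # phi + phi^2 + ... + phi^h
    forecasts = ell + damped_sum * b
    print(forecasts[-1], ell + phi / (1 - phi) * b)   # both close to 118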

Holt-Winters' Seasonal Method

Suitable for: Data with both trend and seasonality. Can be additive or multiplicative.

Additive: \hat{y}_{t+h|t} = \ell_t + hb_t + s_{t+h-m(k+1)}
Level: \ell_t = \alpha(y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})
Trend: b_t = \beta^*(\ell_t - \ell_{t-1}) + (1 - \beta^*)b_{t-1}
Seasonal: s_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)s_{t-m}

Multiplicative: \hat{y}_{t+h|t} = (\ell_t + hb_t)s_{t+h-m(k+1)}
Level: \ell_t = \alpha(y_t / s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})
Trend: b_t = \beta^*(\ell_t - \ell_{t-1}) + (1 - \beta^*)b_{t-1}
Seasonal: s_t = \gamma (y_t / (\ell_{t-1} + b_{t-1})) + (1-\gamma)s_{t-m}

Parameters: \alpha, \beta^*, \gamma are smoothing constants (0 < \alpha, \beta^*, \gamma < 1).

Initialization: Requires initial estimates for level, trend, and seasonal components.
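
A Holt-Winters sketch via statsmodels (assumed available), which handles the initialization internally; the synthetic monthly series and the additive specification are illustrative:

    import numpy as np
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    rng = np.random.default_rng(4)
    t = np.arange(48)
    y = 50 + 0.3 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 48)

    fit = ExponentialSmoothing(y, trend="add", seasonal="add",
                               seasonal_periods=12).fit()
    print(fit.forecast(12))           # forecasts for the next 12 months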

ARIMA Models

ARIMA Model Components

AR(p): Autoregressive model of order p. Uses past values to predict future values.

MA(q): Moving average model of order q. Uses past forecast errors to predict future values.

I(d): Integrated component of order d. Represents the number of differences required to make the time series stationary.

Stationarity: A time series is stationary if its statistical properties (mean, variance, autocorrelation) do not change over time.

Differencing: Used to make a time series stationary. First difference: y'_t = y_t - y_{t-1}. Seasonal difference: y'_t = y_t - y_{t-m}
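
Both differences are one line of numpy each; m = 4 below is an illustrative quarterly period:

    import numpy as np

    y = np.array([10., 12., 15., 11., 13., 15., 18., 14., 16., 18., 21., 17.])
    first_diff = np.diff(y)          # y'_t = y_t - y_{t-1}
    m = 4                            # seasonal period (illustrative)
    seasonal_diff = y[m:] - y[:-m]   # y'_t = y_t - y_{t-m}
    print(first_diff)
    print(seasonal_diff)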

Model Selection

ACF and PACF Plots: Used to identify the order of AR and MA components.

Information Criteria: AIC, AICc, BIC. Lower values indicate a better trade-off between goodness of fit and model complexity.

AIC (Akaike Information Criterion): AIC = -2 \log(L) + 2k, where L is the likelihood and k is the number of parameters.

AICc (Corrected AIC): AICc = AIC + \frac{2k(k+1)}{T-k-1}, where T is the number of observations.

BIC (Bayesian Information Criterion): BIC = -2 \log(L) + k \log(T)
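
A worked example computing all three criteria from a given log-likelihood; the values of L, k, and T are illustrative, not from the notes:

    import math

    log_L, k, T = -512.3, 4, 200
    aic = -2 * log_L + 2 * k                      # 1032.6
    aicc = aic + 2 * k * (k + 1) / (T - k - 1)    # ~1032.81
    bic = -2 * log_L + k * math.log(T)            # ~1045.79
    print(aic, aicc, bic)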

ARIMA Model Equations

AR(p) Model: y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + ... + \phi_p y_{t-p} + e_t

MA(q) Model: y_t = c + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + ... + \theta_q e_{t-q}

ARIMA(p,d,q) Model: Combines AR(p), I(d), and MA(q) components. Requires differencing the series d times to achieve stationarity.
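
An end-to-end ARIMA sketch via statsmodels (assumed available); the simulated series and the order (1,1,1) are illustrative choices, not a recommendation from the notes:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(5)
    y = np.cumsum(rng.normal(0.2, 1.0, 200))   # random walk with drift

    res = ARIMA(y, order=(1, 1, 1)).fit()
    print(res.aic, res.bic)                     # compare candidate orders
    print(res.forecast(steps=10))               # point forecasts, h = 1..10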