State-space representation of a rotating panel's error structure
Introduction
The following concepts are required to define a rotating panel:
Concept | Definition | Symbol |
---|---|---|
Base frequency | Frequency of the model latent variables (e.g. months, quarters) | $t$ |
Survey period | Number of periods (in the base frequency) during which the survey continuously takes place | $nlags$ |
Stint | Division of the survey period (e.g. a week). Each individual is always surveyed in the same stint of the survey period | $t_j$ |
Wave | The $i$th wave comprises a group of individuals that have been surveyed for the $i$th time | waves $1,\ldots,i,\ldots, W $ |
Persistence | The total number of waves or, equivalently, the number of consecutive survey periods that a first wave individual will remain in the panel | $W $ |
Wave-specific error | For a given wave, the difference between the average response and the actual perception (subset of the wave corresponding to base period $t$) | $e^{(i)}_t$ |
Bias term | Part of the wave-specific error that accumulates over time (it is assumed to be zero on average across waves) | $b^{(i)}_t$ |
Correlation term | Part of the wave-specific error at time $t$ that is independent of the bias component and may be correlated with the error at time $t-nlags$ | $\varepsilon^{(i)}_t$ |
The model presented here is compatible with the measurement error structure arising from rotating panels such as the UK Labour Force Survey.
We assume that the aggregate response of the individuals belonging to wave $i$ at time $t$, i.e. $y^{(i)}_{t}$, has an errors-in-variables structure:
\(y^{(i)}_{t} = Y_{t} + e^{(i)}_{t}\),
where \(Y_{t}\) is the true value of the economic concept on which the individuals in wave $i$ have been surveyed, and \(e^{(i)}_{t}\), orthogonal to \(Y_{t}\), is a measurement error. This error arises because only a small fraction of the population answers the survey. We further assume that \(e^{(i)}_{t}\) has a particular structure determined by the rotating survey design:
\[e^{(i)}_{t} = b^{(i)}_{t} + \varepsilon^{(i)}_{t}\]
Bias component
The first term represents bias and follows a random walk, although the bias terms sum to zero across waves:
\(b^{(i)}_{t} = b^{(i)}_{t-1} + w^{(i)}_{t}\),
where \(w^{(i)}_{t} \sim N\left(0, \sigma^2_{i} \right)\).
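The bias dynamics can be illustrated with a short simulation. The following is a minimal sketch in Python with NumPy, assuming that every walk starts at $b^{(i)}_0 = 0$ and that the innovations $w^{(i)}_t$ are independent across waves and periods; the function name `simulate_bias` and the parameter values are hypothetical. It does not impose the identification restriction discussed next.

```python
import numpy as np

def simulate_bias(sigmas, n_periods, seed=0):
    """Simulate b^{(i)}_t = b^{(i)}_{t-1} + w^{(i)}_t for each wave i.

    sigmas: innovation standard deviations sigma_i, one per wave (hypothetical values).
    Returns an array of shape (n_periods, W); assumes b^{(i)}_0 = 0.
    """
    rng = np.random.default_rng(seed)
    sigmas = np.asarray(sigmas, dtype=float)
    # w^{(i)}_t ~ N(0, sigma_i^2), drawn independently for every period and wave
    w = rng.normal(0.0, 1.0, size=(n_periods, sigmas.size)) * sigmas
    # A random walk started at zero is the cumulative sum of its innovations
    return np.cumsum(w, axis=0)

b = simulate_bias([0.1, 0.2, 0.2, 0.3, 0.3], n_periods=120)
print(b.shape)  # (120, 5)
```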
In most cases, this component cannot be identified together with additional trend components. One identification restriction is therefore to assume that the bias terms across all waves sum to zero (cite example):
\[b^{(1)}_{t} = -\sum_{i=2}^{W} b^{(i)}_{t}\]
Alternatively, one could assume that there is no bias in the first wave (cite example).
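In a state-space model, the zero-sum restriction can be imposed by keeping only $b^{(2)}_t,\ldots,b^{(W)}_t$ as random-walk states and mapping them onto the $W$ wave biases through a loading matrix whose first row is $-1$ everywhere. The sketch below (Python with NumPy; the function names and the numeric values are hypothetical) builds that matrix for both restrictions. Under the zero-sum restriction, $b^{(1)}_t$ is no longer a separate random walk with its own $\sigma^2_1$: its innovation is $-\sum_{i=2}^{W} w^{(i)}_t$.

```python
import numpy as np

def bias_loading(W):
    """Map the W-1 free bias states (b^(2),...,b^(W)) onto the W wave biases,
    imposing b^(1) = -sum_{i>=2} b^(i)."""
    Z = np.zeros((W, W - 1))
    Z[0, :] = -1.0             # wave 1: minus the sum of the other biases
    Z[1:, :] = np.eye(W - 1)   # waves 2..W: their own bias state
    return Z

def bias_loading_first_wave_unbiased(W):
    """Alternative restriction: b^(1) = 0, waves 2..W keep their own state."""
    Z = np.zeros((W, W - 1))
    Z[1:, :] = np.eye(W - 1)
    return Z

Z = bias_loading(5)
free = np.array([0.3, -0.1, 0.2, 0.05])  # hypothetical values of b^(2..5)_t
print(Z @ free)  # wave biases b^(1..5)_t, which sum to zero
```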
Autocorrelation component
The second term represents an autocorrelated wave-specific survey error. Each wave $i$ at time $t$ comprises a group of individuals surveyed for the $i$th time, and the same individuals responded during the previous survey period (i.e. at $t-nlags$) for the $(i-1)$th time. If responses are not updated efficiently, a correlation pattern may therefore arise: the error in wave $i$ at time $t$ may be correlated with the error in wave $i-1$ at time $t-nlags$, for $i>1$. There is no such term for $i=1$: the first response of the individuals may be biased, but it cannot be correlated with previous responses, simply because it is the very first time they are asked to respond:
\(\varepsilon^{(i)}_{t} = \phi^{(i)}_{1} \varepsilon^{(i-1)}_{t-nlags} + \epsilon^{(i)}_{t}\),
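where \(\epsilon^{(i)}_{t} \sim N\left(0, (1-(\phi^{(i)}_{1})^{2})(k^{(i)})^{2} \right)\) and \(k^{(i)}\) is a scale parameter. If the innovations are independent of the lagged errors and \(k^{(i)}=k\) for all waves, this choice of innovation variance gives every \(\varepsilon^{(i)}_{t}\) the same variance \(k^{2}\). Note that $\phi^{(i)}_{1}=0$ for $i=1$.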
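A minimal simulation sketch of this recursion (Python with NumPy; `simulate_wave_errors` is a hypothetical name and the coefficient values are illustrative) can be used to check the variance scaling: with $k^{(i)}=k$ for every wave, each $\varepsilon^{(i)}_t$ ends up with variance $k^2$. It assumes $nlags \ge 1$, $|\phi^{(i)}_1| \le 1$ and innovations independent across waves and periods, and it discards $(W-1)\times nlags$ burn-in periods so that every lagged term $\varepsilon^{(i-1)}_{t-nlags}$ is itself fully formed.

```python
import numpy as np

def simulate_wave_errors(phi, k, nlags, n_periods, seed=0):
    """Simulate varepsilon^{(i)}_t = phi_i * varepsilon^{(i-1)}_{t-nlags} + epsilon^{(i)}_t.

    phi: length-W sequence of phi^{(i)}_1 (phi[0] is ignored: wave 1 has no predecessor).
    k:   length-W sequence of scale parameters k^{(i)}.
    Returns an array of shape (n_periods, W).
    """
    phi = np.asarray(phi, dtype=float)
    k = np.asarray(k, dtype=float)
    W = k.size
    burn = (W - 1) * nlags          # pre-sample periods needed to fill every lag chain
    total = n_periods + burn
    rng = np.random.default_rng(seed)
    eps = np.zeros((total, W))
    # Wave 1 is never correlated with earlier responses (phi^{(1)}_1 = 0)
    eps[:, 0] = rng.normal(0.0, k[0], size=total)
    for i in range(1, W):
        # innovation standard deviation sqrt(1 - phi^2) * k^{(i)}
        sd = np.sqrt(1.0 - phi[i] ** 2) * k[i]
        eps[:, i] = rng.normal(0.0, sd, size=total)
        # the same individuals answered as wave i-1 exactly nlags periods earlier
        eps[nlags:, i] += phi[i] * eps[:-nlags, i - 1]
    return eps[burn:]

eps = simulate_wave_errors([0.0, 0.6, 0.5, 0.4, 0.3], [1.0] * 5,
                           nlags=3, n_periods=200_000, seed=1)
print(eps.var(axis=0))  # close to 1 for every wave
```

Because the chain of dependence stops at the first wave, a finite burn-in is enough; unlike a standard AR(1) process, no initialization from an infinite past is required.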
The autocorrelated wave-specific errors are driven by a proportion of individuals who do not update their responses on the basis of new information, so in principle the responses of first-wave individuals could persist until the last wave \(W\). However, it is typically assumed that the error in the first response is correlated with the error in the second response, but not with the error in the third response.
The following table shows all possible correlation patterns in a panel with $W=5$ and $nlags=3$:
Lag ($W=5$, $nlags=3$) | $\varepsilon^{(1)}_{t}$ | $\varepsilon^{(2)}_{t}$ | $\varepsilon^{(3)}_{t}$ | $\varepsilon^{(4)}_{t}$ | $\varepsilon^{(5)}_{t}$ |
---|---|---|---|---|---|
$t-3$ | 0 | $\phi^{(2)}_{1}\varepsilon^{(1)}_{t-3}$ | $\phi^{(3)}_{1}\varepsilon^{(2)}_{t-3}$ | $\phi^{(4)}_{1}\varepsilon^{(3)}_{t-3}$ | $\phi^{(5)}_{1}\varepsilon^{(4)}_{t-3}$ |
$t-3\times 2$ | 0 | 0 | $\phi^{(3)}_{2}\varepsilon^{(1)}_{t-3\times 2}$ | $\phi^{(4)}_{2}\varepsilon^{(2)}_{t-3\times 2}$ | $\phi^{(5)}_{2}\varepsilon^{(3)}_{t-3\times 2}$ |
$t-3\times 3$ | 0 | 0 | 0 | $\phi^{(4)}_{3}\varepsilon^{(1)}_{t-3\times 3}$ | $\phi^{(5)}_{3}\varepsilon^{(2)}_{t-3\times 3}$ |
$t-3\times 4$ | 0 | 0 | 0 | 0 | $\phi^{(5)}_{4}\varepsilon^{(1)}_{t-3\times 4}$ |
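The pattern in the table generalizes to any $W$ and $nlags$: coefficient $\phi^{(i)}_j$ links wave $i$ at time $t$ to wave $i-j$ at time $t-j\times nlags$, and it exists only for $i>j$. The sketch below (Python; `correlation_pattern`, `print_table` and the `max_order` argument are hypothetical) enumerates these links. Capping `max_order` drops the higher lag orders, which is one way to express restrictions such as keeping only $\phi^{(i)}_1$ and $\phi^{(i)}_2$.

```python
def correlation_pattern(W, nlags, max_order=None):
    """Enumerate the non-zero cells of the correlation table.

    Returns {(i, j): (i - j, j * nlags)}: coefficient phi^{(i)}_j multiplies
    varepsilon^{(i-j)}_{t - j*nlags}. Waves with i <= j have no such term (zero cell).
    max_order caps the lag order j; None keeps all W - 1 orders.
    """
    J = W - 1 if max_order is None else min(max_order, W - 1)
    pattern = {}
    for j in range(1, J + 1):            # lag order = table row
        for i in range(j + 1, W + 1):    # waves with at least j earlier responses
            pattern[(i, j)] = (i - j, j * nlags)
    return pattern


def print_table(W, nlags, max_order=None):
    """Print one line per lag order and one cell per wave (0 where no link exists)."""
    pattern = correlation_pattern(W, nlags, max_order)
    J = max((j for _, j in pattern), default=0)
    for j in range(1, J + 1):
        cells = []
        for i in range(1, W + 1):
            if (i, j) in pattern:
                src, lag = pattern[(i, j)]
                cells.append(f"phi({i},{j})*eps({src}, t-{lag})")
            else:
                cells.append("0")
        print(f"t-{j * nlags}: " + " | ".join(cells))


print_table(5, 3)
```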
If we assume that the error in the first response is correlated with the errors in the second and third responses only, the coefficients $\phi^{(i)}_{3}$ and $\phi^{(i)}_{4}$ are zero. The resulting representation then has the following form:
\[\varepsilon^{(i)}_{t} = \phi^{(i)}_{1} \varepsilon^{(i-1)}_{t-nlags} + {\color{red}\phi^{(i)}_{2}} \varepsilon^{(i-2)}_{t-nlags\times 2} + \epsilon^{(i)}_{t},\]
with \(\phi^{(i)}_{1}=0\) for \(i=1\) and \(\phi^{(i)}_{2}=0\) for \(i=1,2\).
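One way to cast this specification in state-space form is a companion-form transition. Stack the wave errors as $\varepsilon_t = (\varepsilon^{(1)}_t,\ldots,\varepsilon^{(W)}_t)^\top$ and the state as $\alpha_t = (\varepsilon_t^\top, \varepsilon_{t-1}^\top, \ldots, \varepsilon_{t-L+1}^\top)^\top$ with $L = J\times nlags$ for $J$ lag orders ($J=2$ here), so that $\alpha_t = T\alpha_{t-1} + R\,\epsilon_t$. The sketch below (Python with NumPy; `error_transition` is a hypothetical name, and this is an illustrative arrangement, not the one used by any particular package) builds $T$ and $R$. The innovation variances are left to the caller, since only the one-lag variance is specified above.

```python
import numpy as np

def error_transition(phi, nlags):
    """Companion-form transition for the wave-specific errors.

    phi:   array of shape (W, J); phi[i-1, j-1] holds phi^{(i)}_j.
           Entries with i <= j are ignored (those waves have no j-th earlier response).
    nlags: survey period length in base-frequency units (nlags >= 1).

    The state stacks the W-vectors varepsilon_t, ..., varepsilon_{t-L+1}, L = J*nlags.
    Returns (T, R) such that alpha_t = T @ alpha_{t-1} + R @ innovation_t.
    """
    phi = np.asarray(phi, dtype=float)
    W, J = phi.shape
    n = W * J * nlags
    T = np.zeros((n, n))
    for j in range(1, J + 1):
        # varepsilon_{t - j*nlags} sits in block (j*nlags - 1) of alpha_{t-1}
        block = j * nlags - 1
        for i in range(j + 1, W + 1):
            # row of wave i in the current block; column of wave i-j in the lagged block
            T[i - 1, block * W + (i - j - 1)] = phi[i - 1, j - 1]
    # Blocks 1..L-1 of alpha_t are blocks 0..L-2 of alpha_{t-1} (shift by one period)
    T[W:, :n - W] = np.eye(n - W)
    # Innovations enter only the current-period block
    R = np.zeros((n, W))
    R[:W, :] = np.eye(W)
    return T, R

# Hypothetical coefficients for W = 5 waves: column 0 holds phi_1, column 1 holds phi_2
phi = np.array([[0.0, 0.0],
                [0.6, 0.0],
                [0.5, 0.2],
                [0.4, 0.1],
                [0.3, 0.05]])
T, R = error_transition(phi, nlags=3)
print(T.shape, R.shape)  # (30, 30) (30, 5)
```

Because each wave only loads on earlier waves, a finite power of $T$ is zero, so the errors are a finite moving average of the innovations. The state vector is larger than strictly necessary, since it stores every wave at every lag; a more compact version would keep only the lags that the coefficients actually reference.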
References
- HARVEY, A. and C.H. CHUNG (2000), Estimating the underlying change in unemployment in the UK, Journal of the Royal Statistical Society: Series A, Volume 163, pp. 303-339
- van den BRAKEL, J. and S. KRIEG (2009), Structural time series modelling of the monthly unemployment rate in a rotating panel, Discussion Paper 09031
- ELLIOTT, D. (2017), Increasing frequency and improving timeliness of key variables in the UK Labour Force Survey, University of Southampton