Theoretical Background

Assume that the function of interest \(y\) is real-valued and twice differentiable at every point \(x\). Then \(y\) can be approximated locally by a polynomial around \(x\). The second-degree Taylor expansion of \(y\) centered at \(x\), evaluated at a point \(z\) in a neighborhood of \(x\), is given by:

\[\begin{equation} y(z) \approx y(x) + \frac{\partial y(x)}{\partial X} (z-x) + \frac{1}{2} \frac{\partial^2 y(x)}{\partial X^2} (z-x)^2 \end{equation}\]
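As a quick numerical illustration of the expansion, the sketch below compares a function with its second-degree Taylor polynomial as \(z\) approaches \(x\). The choice of \(y = \exp\) and the helper name `taylor2` are assumptions made for this example only.

```python
import math

def taylor2(y, dy, d2y, x, z):
    """Second-degree Taylor polynomial of y centered at x, evaluated at z."""
    return y(x) + dy(x) * (z - x) + 0.5 * d2y(x) * (z - x) ** 2

# Illustrative choice: y = exp, whose first and second derivatives are exp too.
x = 1.0
steps = (0.5, 0.1, 0.01)
errors = []
for step in steps:
    z = x + step
    approx = taylor2(math.exp, math.exp, math.exp, x, z)
    errors.append(abs(math.exp(z) - approx))

# The approximation error shrinks quickly as z moves closer to x.
for step, err in zip(steps, errors):
    print(f"z - x = {step:<5} error = {err:.2e}")
```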

This representation is linear in the unknown quantities (the function value and its derivatives at \(x\)), so it can be reformulated as a linear model and solved by least squares regression. Translating the Taylor expansion into a linear model leads to:

\[\begin{split}\begin{align*} y_i &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i\;\;\;\;\; \rightarrow \;\; y = \mathbf{X} \beta + \varepsilon \;\;\;\;\; \text{(matrix notation)}\\ \text{where}\\ \mathbf{X} &= \begin{pmatrix} 1 & (X_1 - x) & (X_1 - x)^2 \\ 1 & (X_2 - x) & (X_2 - x)^2 \\ \vdots & \vdots & \vdots \\ 1 & (X_n - x) & (X_n - x)^2 \\ \end{pmatrix} , \;\; y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \\ \end{pmatrix} , \;\; \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \end{pmatrix} = \begin{pmatrix} y(x)\\ \frac{\partial y(x)}{\partial X}\\ \frac{1}{2} \frac{\partial^2 y(x)}{\partial X^2}\\ \end{pmatrix}\\ \end{align*}\end{split}\]

Here \(y\) is the target (observed values \(y_i, i = 1, \ldots, n\)), the \((z-x)\)-terms, evaluated at the observations \(z = X_i\), are the regressors (collected in the \(n \times 3\) matrix \(\mathbf{X}\), which is built from the explanatory variable \(X_i\)), and the approximated fit and its derivatives are contained in the coefficient vector \(\beta\). The ordinary least squares estimator of \(\beta\) is:

\[\begin{equation} \widetilde{\beta}(x) = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top y \end{equation}\]
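The unweighted estimator can be sketched in a few lines of NumPy. The function names `design_matrix` and `ols_coefficients` and the test data are hypothetical and chosen only for illustration. Note that with the regressor \((X_i - x)^2\), the third coefficient corresponds to half the second derivative in the Taylor expansion.

```python
import numpy as np

def design_matrix(X, x):
    """Build the n x 3 matrix with columns 1, (X_i - x), (X_i - x)^2."""
    d = np.asarray(X, dtype=float) - x
    return np.column_stack([np.ones_like(d), d, d**2])

def ols_coefficients(X, y, x):
    """Unweighted least squares estimate of beta at the point x."""
    A = design_matrix(X, x)
    # lstsq solves the same problem as (A^T A)^{-1} A^T y,
    # but avoids forming the inverse explicitly.
    beta, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return beta

# Illustrative data: an exact quadratic, so the fit recovers it exactly.
X = np.linspace(-1.0, 1.0, 21)
y = 1.0 + 2.0 * X + 3.0 * X**2
beta = ols_coefficients(X, y, x=0.5)

# beta[0] estimates y(0.5), beta[1] estimates y'(0.5),
# and 2 * beta[2] estimates y''(0.5).
print(beta)
```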
To make the estimate depend on the neighborhood of \(x\), a kernel is added to the least squares minimization problem, so that each observation is weighted according to its distance from \(x\).
Unlike in ordinary least squares regression, the weight matrix is therefore no longer the identity matrix \(\mathbb{1}\), but a diagonal matrix of kernel values:
\[\begin{equation} \mathbf{W} = diag\{Ker(X_i-x)\} \end{equation}\]
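Written out in the notation used so far, the kernel-weighted least squares problem at the point \(x\) can be stated as follows (a restatement under the assumption of a second-degree local polynomial, as in the design matrix \(\mathbf{X}\)):

\[\begin{equation} \min_{\beta} \sum_{i=1}^{n} Ker(X_i - x) \left( y_i - \beta_0 - \beta_1 (X_i - x) - \beta_2 (X_i - x)^2 \right)^2 = \min_{\beta} \; (y - \mathbf{X}\beta)^\top \mathbf{W} (y - \mathbf{X}\beta) \end{equation}\]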

The minimization problem is solved by the weighted least squares estimator \(\hat{\beta}\):

\[\begin{equation} \hat{\beta}(x) = (\mathbf{X}^\top \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{W}y \end{equation}\]
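A minimal sketch of the weighted estimator at a single point \(x\) is shown below. The Gaussian kernel, the `bandwidth` parameter, and the names `gaussian_kernel` and `local_quadratic_fit` are assumptions made for this example; the text above does not prescribe a particular kernel or bandwidth.

```python
import numpy as np

def gaussian_kernel(u, bandwidth):
    """Assumed kernel: unnormalized Gaussian weights for distances u."""
    return np.exp(-0.5 * (u / bandwidth) ** 2)

def local_quadratic_fit(X, y, x, bandwidth):
    """Weighted least squares estimate of beta at the point x."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = X - x
    A = np.column_stack([np.ones_like(d), d, d**2])  # design matrix
    w = gaussian_kernel(d, bandwidth)                # diagonal of W
    # A.T * w equals A^T W without building the n x n diagonal matrix.
    AtW = A.T * w
    # Solve (A^T W A) beta = A^T W y instead of inverting explicitly.
    return np.linalg.solve(AtW @ A, AtW @ y)

# Illustrative data: a noisy sine curve.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 2.0 * np.pi, 400))
y = np.sin(X) + 0.05 * rng.standard_normal(X.size)

beta_hat = local_quadratic_fit(X, y, x=np.pi / 2, bandwidth=0.4)
# beta_hat[0] estimates sin(pi/2), beta_hat[1] the first derivative,
# and 2 * beta_hat[2] the second derivative at that point.
print(beta_hat)
```

Repeating the fit over a grid of points \(x\) yields the full estimated curve. A smaller bandwidth makes the fit more local but also noisier.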

The combination of these three key components leads to the term Local Polynomial Regression (local: weights/kernel; polynomial: Taylor expansion, in which the function is represented by polynomials; regression: least squares regression).

Resources:

1. Nonparametric and Semiparametric Models - Chapter: Nonparametric Regression - W. K. Haerdle - Springer Berlin Heidelberg - 2004