Theoretical Background
Assume that the function of interest \(y\) is real-valued and twice differentiable at every point \(x\).
Then \(y\) can be approximated locally by a polynomial. The second-degree Taylor expansion of \(y\)
centered at \(x\), evaluated at a point \(z\) in a neighborhood of \(x\), is given by:
\[\begin{equation}
y(z) \approx y(x) + \frac{\partial y(x)}{\partial X} (z-x)
+ \frac{1}{2} \frac{\partial^2 y(x)}{\partial X^2} (z-x)^2
\end{equation}\]
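As a quick numerical check of the expansion, the following minimal sketch evaluates the second-degree Taylor polynomial, with the first-order term multiplied by \((z-x)\) and the second-order term by \(\frac{1}{2}(z-x)^2\). The test function (`np.sin`), the points `x` and `z`, and the helper name `taylor2` are illustrative assumptions and not part of the original text.

```python
import numpy as np

def taylor2(f, df, d2f, x, z):
    """Second-degree Taylor polynomial of f centered at x, evaluated at z.

    f, df and d2f are callables for the function and its first two
    derivatives (hypothetical helper, for illustration only).
    """
    h = z - x
    return f(x) + df(x) * h + 0.5 * d2f(x) * h**2

# Illustrative choice: y = sin, expanded around x = 1.0, evaluated at z = 1.1
x, z = 1.0, 1.1
approx = taylor2(np.sin, np.cos, lambda t: -np.sin(t), x, z)
exact = np.sin(z)

# First-degree (linear) approximation for comparison
linear = np.sin(x) + np.cos(x) * (z - x)

error_quadratic = abs(exact - approx)
error_linear = abs(exact - linear)
print(error_quadratic, error_linear)
```

Close to \(x\), the quadratic approximation should be noticeably more accurate than the linear one, because its remainder is of third order in \((z-x)\).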
This representation is linear in the function value and its derivatives, so it can be reformulated as a linear model
and solved with least squares regression.
Translating the Taylor expansion into the linear model leads to:
\[\begin{split}\begin{align*}
y_i &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i\;\;\;\;\;
\rightarrow \;\; y = \mathbf{X} \beta + \varepsilon \;\;\;\;\; \text{(matrix notation)}\\
\text{where}\\
\mathbf{X} &= \begin{pmatrix}
1 & (X_1 - x) & (X_1 - x)^2 \\
1 & (X_2 - x) & (X_2 - x)^2 \\
\vdots & \vdots & \vdots \\
1 & (X_n - x) & (X_n - x)^2 \\
\end{pmatrix} , \;\;
y = \begin{pmatrix}
y_1 \\
y_2 \\
\vdots \\
y_n \\
\end{pmatrix} , \;\;
\beta = \begin{pmatrix}
\beta_0 \\
\beta_1 \\
\beta_2 \\
\end{pmatrix}
= \begin{pmatrix}
y(x)\\
\frac{\partial y(x)}{\partial X}\\
\frac{1}{2} \frac{\partial^2 y(x)}{\partial X^2}\\
\end{pmatrix}\\
\end{align*}\end{split}\]
Here \(y\) is the target (observed values \(y_i, i = 1, \ldots, n\)),
the powers of \((X_i - x)\) are the regressors (the \(n \times 3\) matrix \(\mathbf{X}\), built from the explanatory variable \(X_i\)), and the
approximated fit and its derivatives can be read off the vector of coefficients \(\beta\). The ordinary least squares estimator is:
\[\begin{equation}
\widetilde{\beta}(x) = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top y
\end{equation}\]
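The unweighted estimator can be sketched in a few lines of NumPy. The code below builds the \(n \times 3\) design matrix from the powers of \((X_i - x)\) and solves the least squares problem. The function names `design_matrix` and `ols_taylor_fit` and the noise-free quadratic test data are assumptions made for illustration. `np.linalg.lstsq` is used instead of forming \((\mathbf{X}^\top \mathbf{X})^{-1}\) explicitly, which gives the same solution when \(\mathbf{X}\) has full column rank.

```python
import numpy as np

def design_matrix(X, x):
    """Columns 1, (X_i - x), (X_i - x)^2 for the evaluation point x."""
    d = X - x
    return np.column_stack([np.ones_like(d), d, d**2])

def ols_taylor_fit(X, y, x):
    """Ordinary least squares estimate of beta at the point x."""
    A = design_matrix(X, x)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

# Illustrative data: an exact quadratic, so the fit recovers it exactly
X = np.linspace(-1.0, 1.0, 21)
y = 1.0 + 2.0 * X + 3.0 * X**2

beta = ols_taylor_fit(X, y, x=0.5)
print(beta)
```

For this test function, \(y(0.5) = 2.75\), \(y'(0.5) = 5\) and \(y''(0.5) = 6\). With this design matrix the third coefficient is the factor multiplying \((X_i - x)^2\), so it equals half the second derivative, matching the factor \(\frac{1}{2}\) in the Taylor expansion.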
To make the estimate depend on the neighborhood of \(x\), a kernel is added to the least squares minimization problem.
Unlike in ordinary least squares regression, the observations are therefore no longer weighted equally; the identity matrix \(\mathbb{1}\) is replaced by a diagonal matrix of kernel weights:
\[\begin{equation}
\mathbf{W} = \operatorname{diag}\{\operatorname{Ker}(X_i-x)\}
\end{equation}\]
The minimization problem is solved by the weighted least squares estimator \(\hat{\beta}\):
\[\begin{equation}
\hat{\beta}(x) = (\mathbf{X}^\top \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{W}y
\end{equation}\]
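The weighted estimator can be sketched as follows. The text does not fix a particular kernel or bandwidth, so the Gaussian kernel, the `bandwidth` parameter, the function names and the noisy sine test data are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(u, bandwidth):
    """Unnormalized Gaussian kernel weights (an assumed kernel choice)."""
    return np.exp(-0.5 * (u / bandwidth) ** 2)

def local_quadratic_fit(X, y, x, bandwidth):
    """Weighted least squares estimate of beta at the point x."""
    d = X - x
    A = np.column_stack([np.ones_like(d), d, d**2])  # design matrix
    w = gaussian_kernel(d, bandwidth)                # diagonal of W
    AtW = A.T * w                                    # same as A.T @ diag(w)
    return np.linalg.solve(AtW @ A, AtW @ y)

# Illustrative data: noisy samples of sin on [0, 2*pi]
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 400)
y = np.sin(X) + rng.normal(scale=0.05, size=X.size)

beta_hat = local_quadratic_fit(X, y, x=1.0, bandwidth=0.3)
fit_value = beta_hat[0]          # estimate of y(1.0)
first_derivative = beta_hat[1]   # estimate of y'(1.0)
print(fit_value, first_derivative)
```

Multiplying `A.T` by the weight vector avoids building the full \(n \times n\) matrix \(\mathbf{W}\). The constant normalization of the kernel is omitted because it cancels in the estimator. Repeating the fit over a grid of evaluation points \(x\) yields the fitted curve and its derivative estimates.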
The combination of these three key components leads to the term Local Polynomial Regression
(local: weights/kernel; polynomial: Taylor expansion, in which the function is represented by polynomials; regression: least squares regression).
Resources: