localpoly package

Submodules

localpoly.base module

class localpoly.base.LocalPolynomialRegression(X, y, h, kernel='gaussian', gridsize=100)[source]

Bases: object

Local polynomial regression.

LocalPolynomialRegression fits a polynomial of degree 3 to the neighborhood of each point. The neighborhood is defined by a kernel with bandwidth h. The regression returns the fit as well as its first and second derivatives.

Parameters
  • X – X-values of data that is to be fitted (explanatory variable)

  • y – y-values of data that is to be fitted (observations)

  • h – bandwidth for the kernel

  • gridsize – number of grid points at which the fit is evaluated (granularity)

  • kernel – name of the kernel as a string, e.g. “gaussian”

fit(prediction_interval)[source]

Fit the Local Polynomial Regression model for the prediction interval.

Parameters

prediction_interval (tuple) – interval for which the prediction is calculated

Returns

Results of the fit: the estimated function (fit) on the prediction interval (X), together with its first and second derivatives:

{
    'X' : X_domain,    # prediction interval of fit
    'fit': fit,        # fit of the function at point x
    'first': first,    # first derivative at point x
    'second': second,  # second derivative at point x
}

Return type

dict
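The following is a minimal sketch of what evaluating the fit on a grid could look like, not the package's actual implementation. The helper names local_cubic and fit_on_grid are hypothetical, the Gaussian weights and the Taylor-scaled design matrix are assumptions, and only the shape of the returned dict follows the description above.

```python
import numpy as np

def local_cubic(x, X, y, h):
    """Hypothetical helper: weighted cubic least squares centred at x."""
    d = X - x
    # Columns are (X - x)^j / j!, so beta[j] estimates the j-th derivative.
    D = np.column_stack([np.ones_like(d), d, d**2 / 2.0, d**3 / 6.0])
    w = np.exp(-0.5 * (d / h) ** 2)          # Gaussian kernel weights
    DtW = D.T * w
    return np.linalg.solve(DtW @ D, DtW @ y)

def fit_on_grid(X, y, h, prediction_interval, gridsize=100):
    """Hypothetical helper: evaluate the local fit on an even grid."""
    X_domain = np.linspace(prediction_interval[0], prediction_interval[1], gridsize)
    betas = np.array([local_cubic(x, X, y, h) for x in X_domain])
    return {
        "X": X_domain,
        "fit": betas[:, 0],
        "first": betas[:, 1],
        "second": betas[:, 2],
    }

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 2.0 * np.pi, 300))
y = np.sin(X) + rng.normal(scale=0.1, size=X.size)
res = fit_on_grid(X, y, h=0.5, prediction_interval=(X.min(), X.max()), gridsize=50)
```

Each entry of res["fit"], res["first"] and res["second"] corresponds to the grid point at the same index in res["X"].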

localpoly(x)[source]

Calculates estimate for position x via Local Polynomial Regression.

Local Polynomial Regression yields not only the estimate at this point but also its first and second derivatives. The data (X, y) and the regression settings (kernel, h) are stored on self.

Parameters

x (float) – Position for which to calculate the estimated value.

Returns

Results of the regression: the estimated value at point x, its first and second derivatives at this point, and the weight vector describing the influence of the surrounding points:

{"fit": beta[0], "first": beta[1], "second": beta[2], "weight": W_hi}

Return type

dict
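As an illustration of the point estimate described above, here is a hedged sketch of a kernel-weighted cubic least-squares fit at a single position. The function names gaussian_kernel and local_cubic_at are hypothetical and the package may scale the design matrix differently. The sketch assumes Taylor scaling (column j is (X - x)^j / j!), so that beta[1] and beta[2] can be read directly as the first and second derivative. The "weight" entry is omitted because its exact definition is not given in this reference.

```python
import numpy as np
from math import factorial

def gaussian_kernel(u):
    """Standard normal density, used as the kernel weight function."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def local_cubic_at(x, X, y, h):
    """Hypothetical helper: weighted least-squares cubic fit centred at x."""
    d = X - x
    # Taylor-scaled design matrix (an assumption of this sketch).
    D = np.column_stack([d**j / factorial(j) for j in range(4)])
    w = gaussian_kernel(d / h) / h
    # Solve the weighted normal equations (D' W D) beta = D' W y.
    DtW = D.T * w
    beta = np.linalg.solve(DtW @ D, DtW @ y)
    return {"fit": beta[0], "first": beta[1], "second": beta[2]}

# Noise-free cubic: the local cubic fit reproduces it up to rounding.
X = np.linspace(-1.0, 1.0, 201)
y = X**3 + 2.0 * X**2 - X + 0.5
est = local_cubic_at(0.2, X, y, h=0.3)
```

On this noise-free cubic, est["fit"], est["first"] and est["second"] agree with the analytic values 0.388, -0.08 and 5.2 at x = 0.2 up to floating-point error.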

class localpoly.base.LocalPolynomialRegressionCV(X, y, kernel='gaussian', gridsize=100, n_sections=10, loss='MSE', sampling='random')[source]

Bases: localpoly.base.LocalPolynomialRegression

Bandwidth Selection via Cross Validation for Local Polynomial Regression.

LocalPolynomialRegressionCV performs the bandwidth optimization for LocalPolynomialRegression. The optimal bandwidth depends strongly on the data (X, y) and the kernel.

Parameters
  • X (np.array) – X-values of data that is to be fitted (explanatory variable)

  • y (np.array) – y-values of data that is to be fitted (observations)

  • kernel (str, optional) – Name of the kernel. Defaults to “gaussian”.

  • gridsize (int, optional) – Desired size of the fit - granularity. Defaults to 100.

  • n_sections (int, optional) – Number of sections into which the dataset is divided for cross validation (k folds). Defaults to 10.

  • loss (str, optional) – Loss function for optimization. Defaults to “MSE”.

  • sampling (str, optional) – How the dataset is partitioned into sections: “random” or “slicing”. Defaults to “random”.

prediction_interval

Interval in which to calculate the estimates, automatically set to (X.min(), X.max())

bandwidth_cv(coarse_list_of_bandwidths)[source]

Cross Validation for Bandwidth optimization.

The CV routine is performed twice: first on coarse_list_of_bandwidths, then on a finer grid, fine_list_of_bandwidths, which spans the neighborhood of the first optimal value.

Parameters

coarse_list_of_bandwidths (list) – coarse list of bandwidths; values around the Silverman bandwidth are suggested
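One possible way to build such a list is sketched below, using Silverman's rule of thumb, 0.9 * min(std, IQR / 1.34) * n^(-1/5). The function name silverman_bandwidth and the multipliers around the rule-of-thumb value are assumptions of this sketch and are not part of the package. The rule originates in kernel density estimation, so for regression it serves only as a starting point.

```python
import numpy as np

def silverman_bandwidth(X):
    """Hypothetical helper: Silverman's rule-of-thumb bandwidth for X."""
    X = np.asarray(X, dtype=float)
    n = X.size
    std = X.std(ddof=1)
    q75, q25 = np.percentile(X, [75, 25])
    spread = min(std, (q75 - q25) / 1.34)
    return 0.9 * spread * n ** (-0.2)

X = np.linspace(0.0, 10.0, 101)   # illustrative explanatory variable
h0 = silverman_bandwidth(X)

# Coarse candidates spread around h0; the multipliers are arbitrary.
coarse_list_of_bandwidths = list(h0 * np.array([0.25, 0.5, 1.0, 2.0, 4.0]))
```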

Returns

Fine and coarse results of the bandwidth search:

{
    "fine results": {
        "bandwidths": fine_list_of_bandwidths,
        "MSE": ...,  # mean squared errors for the bandwidths
        "h": ...,    # optimal bandwidth within fine_list_of_bandwidths
    },
    "coarse results": {
        # ... same as above but with coarse_list_of_bandwidths
    },
}

Return type

dict
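The two-stage search can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the helper names predict, cv_mse, search and bandwidth_cv_sketch are hypothetical, folds are drawn by random permutation (corresponding to the “random” sampling option), and the fine grid is assumed to span the two coarse neighbors of the coarse optimum with seven points.

```python
import numpy as np

def predict(x, X, y, h):
    """Local cubic estimate at x with Gaussian weights (illustrative)."""
    d = X - x
    D = np.column_stack([np.ones_like(d), d, d**2, d**3])
    w = np.exp(-0.5 * (d / h) ** 2)
    DtW = D.T * w
    # lstsq tolerates the near-singular systems small bandwidths can produce.
    return np.linalg.lstsq(DtW @ D, DtW @ y, rcond=None)[0][0]

def cv_mse(X, y, h, folds):
    """Mean over folds of the held-out mean squared error."""
    errors = []
    for test_idx in folds:
        train = np.ones(X.size, dtype=bool)
        train[test_idx] = False
        pred = np.array([predict(x, X[train], y[train], h) for x in X[test_idx]])
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

def search(X, y, bandwidths, folds):
    """Evaluate every candidate bandwidth and pick the one with lowest MSE."""
    mse = np.array([cv_mse(X, y, h, folds) for h in bandwidths])
    return {"bandwidths": bandwidths, "MSE": mse, "h": float(bandwidths[np.argmin(mse)])}

def bandwidth_cv_sketch(X, y, coarse, n_sections=5, seed=0):
    coarse = np.asarray(coarse, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.size), n_sections)
    coarse_res = search(X, y, coarse, folds)
    i = int(np.argmin(coarse_res["MSE"]))
    # Fine grid between the coarse neighbours of the optimum (assumption).
    lo, hi = coarse[max(i - 1, 0)], coarse[min(i + 1, coarse.size - 1)]
    fine_res = search(X, y, np.linspace(lo, hi, 7), folds)
    return {"fine results": fine_res, "coarse results": coarse_res}

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0.0, 2.0 * np.pi, 120))
y = np.sin(X) + rng.normal(scale=0.2, size=X.size)
out = bandwidth_cv_sketch(X, y, coarse=[0.2, 0.4, 0.8, 1.6, 3.2])
```

Reusing the same folds for the coarse and the fine pass keeps the two sets of MSE values comparable.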

Module contents