Cross-Validation / Fitting Kernel Functions to Data

Example model

[Image: Cross-validated data fit]

Description: When fitting a function to data, if you have too many free parameters relative to the number of points in your data set, you may "overfit" the data. When this happens, the fit to your training data may be very good, but the fit to new data points (beyond those used for training) may be very poor.

Cross-validation is a common technique for dealing with this problem. We set aside a fraction of the available data as a cross-validation set. We then fit a series of functions to the training data, starting with very simple ones (few free parameters) and successively increasing the number of free parameters, measuring predictive performance on the cross-validation set at each step. Typically, performance on the cross-validation set improves for a while, then deteriorates once overfitting sets in.
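The procedure above can be sketched in a few lines. This is not the Analytica model itself, just an illustrative Python sketch: it assumes synthetic noisy data and uses polynomial degree as the stand-in for "number of free parameters", sweeping the degree and recording the error on a held-out cross-validation set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: noisy samples of a smooth function (assumed for illustration).
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + rng.normal(0, 0.2, x.size)

# Set aside a fraction of the data as a cross-validation set.
train = np.arange(x.size) < 30
x_tr, y_tr = x[train], y[train]
x_cv, y_cv = x[~train], y[~train]

# Fit polynomials of increasing degree (more free parameters)
# and record the root-mean-square error on the held-out set.
cv_error = {}
for degree in range(1, 11):
    coefs = np.polyfit(x_tr, y_tr, degree)
    resid = y_cv - np.polyval(coefs, x_cv)
    cv_error[degree] = float(np.sqrt(np.mean(resid ** 2)))

# Pick the complexity that predicts best on the cross-validation set.
best = min(cv_error, key=cv_error.get)
print(best, cv_error[best])
```

Plotting `cv_error` against degree typically shows the characteristic U-shape: error falls as the model gains flexibility, then rises once the extra parameters start fitting noise.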

This example model successively fits non-linear kernel functions to the residual error, and uses cross-validation to determine how many kernel functions should be used.
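The residual-fitting idea can be sketched as follows. This is a hedged Python approximation, not the model's Kern_Fit/NlpDefine implementation: it assumes a Gaussian kernel form and replaces the NLP search with a crude grid search over kernel center and width, adding one kernel at a time to explain the current residual and tracking error on the cross-validation set.

```python
import numpy as np

def gaussian(x, center, width):
    # Assumed kernel shape for this sketch.
    return np.exp(-((x - center) / width) ** 2)

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 60)
y = np.sin(2 * x) + rng.normal(0, 0.1, x.size)

train = np.arange(x.size) < 45
x_tr, y_tr = x[train], y[train]
x_cv, y_cv = x[~train], y[~train]

pred_tr = np.zeros(x_tr.size)
pred_cv = np.zeros(x_cv.size)
cv_error = []

for k in range(10):
    resid = y_tr - pred_tr
    # Stand-in for the model's NLP search: grid-search the kernel
    # center/width that best explains the current residual, with the
    # amplitude solved in closed form by least squares.
    best = None
    for c in np.linspace(-2, 2, 41):
        for w in (0.3, 0.6, 1.2):
            basis = gaussian(x_tr, c, w)
            amp = basis @ resid / (basis @ basis)
            err = np.sum((resid - amp * basis) ** 2)
            if best is None or err < best[0]:
                best = (err, c, w, amp)
    _, c, w, amp = best
    pred_tr += amp * gaussian(x_tr, c, w)
    pred_cv += amp * gaussian(x_cv, c, w)
    cv_error.append(float(np.sqrt(np.mean((y_cv - pred_cv) ** 2))))

# Keep the number of kernels that minimizes cross-validation error.
n_kernels = int(np.argmin(cv_error)) + 1
print(n_kernels, cv_error[n_kernels - 1])
```

Each added kernel reduces training error by construction; the cross-validation error is what tells us when to stop adding them.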

Requires Analytica Optimizer: The kernel fitting function (Kern_Fit) uses NlpDefine.

Keywords: Cross-validation, overfitting, non-linear kernel functions

Author: Lonnie Chrisman

Download: Cross-validation example.ana
