Tuning
- class nimble.Tuning(selection='consecutive', validation='cross validation', performanceFunction=None, loops=1, order=None, maxIterations=100, timeout=None, threshold=None, learnerArgsFunc=None, initRandom=5, randomizeAxisFactor=0.75, folds=5, foldFeature=None, validateX=None, validateY=None, proportion=0.2)
Define the method to identify the best values to train the learner.

This object is passed to the tuning parameter in Nimble’s training functions to provide a protocol for evaluating multiple arguments to determine the best argument set for the model. To make this determination, the training function must know: 1) the sets of arguments to test (amongst all possible combinations), and 2) how to evaluate the performance of each set. Multiple arguments are specified by providing Tune objects as learner arguments for the training function.
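For orientation, a minimal sketch of that workflow; it assumes the scikit-learn interface is available as “skl”, that nimble.data builds an object from a nested list, and that Tune accepts a list of candidate values. The data here is hypothetical.

    import nimble

    # Hypothetical toy data: two features and a binary label.
    trainX = nimble.data([[1, 2], [2, 1], [8, 9], [9, 8],
                          [1, 1], [2, 2], [9, 9], [8, 8]])
    trainY = nimble.data([[0], [0], [1], [1], [0], [0], [1], [1]])

    # Each Tune object marks a learner argument to optimize; the Tuning
    # object controls how candidate sets are selected and validated.
    tuning = nimble.Tuning(selection='consecutive',
                           validation='cross validation', folds=2)
    tl = nimble.train('skl.KNeighborsClassifier', trainX, trainY,
                      performanceFunction=nimble.calculate.fractionIncorrect,
                      tuning=tuning,
                      n_neighbors=nimble.Tune([1, 3]),
                      p=nimble.Tune([1, 2]))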
The selection parameter identifies which argument sets to try and accepts the following:

- “brute force” : Try every possible combination.
- “consecutive” : Optimize one argument at a time, holding the others constant. Optionally, multiple loops can occur, and the order in which the parameters are tuned can be defined.
- “bayesian” : Apply a Bayesian algorithm to the argument space. Note: when there is a correlation between an argument value and loss, the Tune objects should provide a linear or exponential range of values. This allows all values in that space to be sampled; otherwise, only the provided values will be sampled and assumed to have no correlation with loss.
- “iterative” : Beginning with the middle value of the sorted arguments, try the higher and lower values (holding the others constant), then apply the best (higher, lower, or same) argument on the next iteration. Note: this requires arguments to be numeric and assumes there is a correlation between the values and the performance.
- “storm” : Apply a stochastic random mutator to the argument space. Randomly selects argument sets to begin, then starts to optimize the best performing set, while selecting random values at some given probability to avoid local optima. Note: for ordered numeric values, this assumes there is a correlation between the values and the performance.
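For instance, the “consecutive” and “bayesian” modes might be configured as below (a sketch; the argument names C and gamma are hypothetical learner arguments):

    import nimble

    # Tune C first, then gamma, and make two passes so that C is re-tuned
    # with the best gamma from the first loop held constant.
    consecutive = nimble.Tuning(selection='consecutive', loops=2,
                                order=['C', 'gamma'])

    # Bayesian selection samples within the span of the provided values,
    # so Tune values should cover a linear or exponential range.
    bayesian = nimble.Tuning(selection='bayesian', maxIterations=50,
                             threshold=0.05)  # stop once performance
                                              # reaches 0.05 or better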
The validation parameter identifies how the performance will be evaluated and accepts the following:

- Cross Validations:
  - “cross validation” : Perform k-fold cross-validation with the training data. The number of folds can be set using the folds parameter.
  - “leave one out” : A k-fold cross-validation where the number of folds is equal to the number of points in the training data.
  - “leave one group out” : The folds are determined by a feature in the data. This requires the foldFeature parameter.
- Holdout Validations:
  - “proportion” : A random proportion of the training data is held out. Requires the proportion parameter. As a shortcut, validation can be set directly to a float value to trigger this validation.
  - “data” : Provide the data to use for validation. These are passed as the validateX and validateY parameters.
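To make these options concrete, a sketch of each form; the feature names 'label' and 'site' are hypothetical:

    import nimble

    # Hold out a random 20% of the training data; a bare float is a
    # shortcut for validation='proportion' with proportion=0.2.
    holdout = nimble.Tuning(validation=0.2)

    # Validate against a preassembled set; validateY names the label
    # feature inside validateX.
    valX = nimble.data([[1, 2, 0], [8, 9, 1]],
                       featureNames=['f1', 'f2', 'label'])
    dataVal = nimble.Tuning(validation='data', validateX=valX,
                            validateY='label')

    # Leave one group out: folds are defined by a grouping feature.
    logo = nimble.Tuning(validation='leave one group out',
                         foldFeature='site')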
- Parameters:
  - selection (str) – How the next argument set will be chosen. Accepts “brute force”, “consecutive”, “bayesian”, “iterative”, and “storm”.
  - validation (str, float) – How each argument set will be validated. Accepts “cross validation”, “leave one out”, “leave one group out”, “data”, and “proportion” as strings; a float between 0 and 1 will also trigger “proportion” validation. See above for descriptions of each validation.
  - performanceFunction (function, None) – The function that will be used to validate the performance of each argument set. If None, the performance function provided to the training function will be applied.
  - loops (int) – Applies when selection is “consecutive”. For more than one loop, the values for the arguments not being optimized will be set to the optimal values from the previous loop.
  - order (list) – Applies when selection is “consecutive”. A list of argument names defining the order to use when tuning.
  - maxIterations (int) – Applies when selection is “bayesian”, “iterative”, or “storm”. The maximum number of times to iterate through the argument selection process. Default is 100.
  - timeout (int, None) – Applies when selection is “bayesian”, “iterative”, or “storm”. The maximum number of seconds to perform the argument selection process.
  - threshold (float, None) – Applies when selection is “bayesian”, “iterative”, or “storm”. Stop the argument selection process if the performance is better than or equal to the threshold.
  - learnerArgsFunc (function, None) – Applies when selection is “storm”. A function defining how to build the model with variable hyperparameters, as sketched after this list. Takes the form learnerArgsFunc(hyperparameters), where hyperparameters will be a HyperParameters instance from storm_tuner; the function must return a dictionary to use as the arguments parameter for nimble.train.
  - initRandom (int) – Applies when selection is “storm”. The number of initial iterations to perform a random search. Recommended values are between 3 and 8.
  - randomizeAxisFactor (float) – Applies when selection is “storm”. Controls the tradeoff between explorative and exploitative selection. Values closer to 1 are likely to generate more mutations, while values closer to 0 are more likely to perform only a single mutation during each step.
  - folds (int) – Applies when validation is “cross validation”. Default is 5.
  - foldFeature (identifier) – Applies when validation is “leave one group out”. The folds for cross-validation will be created by grouping the data by this feature.
  - validateX (nimble data object) – Applies when validation is “data”. The validation set to use. Can contain the validateY data.
  - validateY (nimble data object, identifier) – Applies when validation is “data”. Either an object of labels for the validation set, or the name or index of the labels in validateX.
  - proportion (float) – Applies when validation is “proportion”. A value between 0 and 1 indicating the random proportion of the training data to hold out for validation. A float value can also be passed directly to validation to trigger this same validation.
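A sketch of a learnerArgsFunc for “storm” selection. The hp.Param call follows the pattern documented by the storm_tuner package, but the argument names used here (n_estimators, max_depth) are hypothetical, and the exact HyperParameters API should be checked against that package:

    import nimble

    def makeArgs(hp):
        # hp is a storm_tuner HyperParameters instance; return the
        # dictionary that nimble.train will receive as its arguments.
        return {'n_estimators': hp.Param('n_estimators', [50, 100, 200],
                                         ordered=True),
                'max_depth': hp.Param('max_depth', [3, 5, 7],
                                      ordered=True)}

    storm = nimble.Tuning(selection='storm', learnerArgsFunc=makeArgs,
                          initRandom=5, randomizeAxisFactor=0.75,
                          maxIterations=100)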
- Attributes:
  - arguments – Get the argument set for each validation that has been run.
  - results – Get the results of each validation that has been run.
  - bestArguments – The arguments that provided the best performance.
  - bestResult – The score of the best performance.
  - deepResults – If a cross-validation was used, get the fold-by-fold results.
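Once a training function has run with a Tuning instance, the results can be read back off that instance; a short sketch using the attribute names above, continuing the earlier example:

    # After nimble.train(..., tuning=tuning) has completed:
    print(tuning.bestArguments)   # e.g. {'n_neighbors': 3, 'p': 2}
    print(tuning.bestResult)      # score under the performance function
    for args, result in zip(tuning.arguments, tuning.results):
        print(args, result)       # every argument set that was validated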
- Methods:
  - copy() – A new Tuner with attributes based on the latest tuning.
  - tune(learnerName, trainX, trainY, arguments, ...) – Run validation on each argument set.