Tuning

class nimble.Tuning(selection='consecutive', validation='cross validation', performanceFunction=None, loops=1, order=None, maxIterations=100, timeout=None, threshold=None, learnerArgsFunc=None, initRandom=5, randomizeAxisFactor=0.75, folds=5, foldFeature=None, validateX=None, validateY=None, proportion=0.2)

Define the method used to identify the best argument values for training the learner.

This object is passed to the tuning parameter in Nimble’s training functions to provide a protocol for evaluating multiple arguments and determining the best argument set for the model. To make this determination, the training function must know: 1) the sets of arguments to test (amongst all possible combinations), and 2) how to evaluate the performance of each set. Multiple arguments are specified by providing Tune objects as learner arguments for the training function, as in the sketch below.
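
As a minimal sketch of this protocol (the learner name, the “alpha” argument, the performance function choice, and the data are illustrative; trainX and trainY can be any nimble data objects):

    import nimble

    trainX = nimble.data([[1, 2], [3, 4], [5, 6], [7, 8]])
    trainY = nimble.data([[0], [1], [0], [1]])

    # Tune marks "alpha" as having three candidate values; the Tuning
    # object defines how those candidates are searched and scored.
    tl = nimble.train(
        'sklearn.RidgeClassifier', trainX, trainY,
        performanceFunction=nimble.calculate.fractionIncorrect,
        arguments={'alpha': nimble.Tune([0.1, 1, 10])},
        tuning=nimble.Tuning(selection='brute force',
                             validation='cross validation'))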

The selection parameter identifies which argument sets to try and accepts the following (a constructor sketch for each follows the list):

  • “brute force” : Try every possible combination

  • “consecutive” : Optimize one argument at a time, holding the others constant. Optionally, multiple loops can occur and the order that the parameters are tuned can be defined.

  • “bayesian” : Apply a Bayesian algorithm to the argument space. Note: When there is a correlation between an argument value and loss, the Tune objects should provide a linear or exponential range of values so that all values in that space can be sampled; otherwise, only the provided values will be sampled and they will be assumed to have no correlation with loss.

  • “iterative” : Beginning with the middle value of the sorted arguments, try the higher and lower values (holding the others constant), then apply the best (higher, lower, or same) value on the next iteration. Note: This requires arguments to be numeric and assumes there is a correlation between the values and the performance.

  • “storm” : Apply a stochastic random mutator to the argument space. Randomly selects argument sets to begin, then optimizes the best-performing set, selecting random values with some given probability to avoid local optima. Note: For ordered numeric values, this assumes there is a correlation between the values and the performance.
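
As a sketch, each strategy corresponds to a selection value in the constructor (the argument names in order are illustrative):

    import nimble

    # Exhaustive search over every combination of Tune values.
    brute = nimble.Tuning(selection='brute force')

    # One argument at a time; two passes, tuning "alpha" before "solver".
    consec = nimble.Tuning(selection='consecutive', loops=2,
                           order=['alpha', 'solver'])

    # Model-based search, capped at 50 evaluated argument sets.
    bayes = nimble.Tuning(selection='bayesian', maxIterations=50)

    # Hill-climbing outward from the middle of each sorted value range.
    iterate = nimble.Tuning(selection='iterative', maxIterations=50)

    # Stochastic mutation; 5 random sets first, then mutate the best.
    storm = nimble.Tuning(selection='storm', initRandom=5,
                          randomizeAxisFactor=0.75)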

The validation parameter identifies how the performance will be evaluated and accepts the following (a sketch of each follows the list):

  • Cross Validations:
    • “cross validation” : Perform k-fold cross-validation with the training data. The number of folds can be set using the folds parameter.

    • “leave one out” : A k-fold cross-validation where the number of folds is equal to the number of points in the training data.

    • “leave one group out” : The folds are determined by a feature in the data. This requires a foldFeature.

  • Holdout Validations:
    • “proportion” : A random proportion of the training data is held out. Requires the proportion parameter. As a shortcut, validation can be set directly to a float value to trigger this validation.

    • “data” : Provide the data to use for validation. These are passed as the validateX and validateY parameters.
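
A sketch of each validation option (the “site” feature name and the validation data are illustrative):

    import nimble

    # k-fold cross-validation on the training data (here, 10 folds).
    cv = nimble.Tuning(validation='cross validation', folds=10)

    # One fold per point in the training data.
    loo = nimble.Tuning(validation='leave one out')

    # Folds grouped by the values of an existing feature.
    logo = nimble.Tuning(validation='leave one group out',
                         foldFeature='site')

    # Hold out a random 20% of the training data;
    # nimble.Tuning(validation=0.2) is an equivalent shortcut.
    prop = nimble.Tuning(validation='proportion', proportion=0.2)

    # Validate against a dedicated held-out dataset.
    valX = nimble.data([[2, 3], [6, 7]])
    valY = nimble.data([[0], [1]])
    data = nimble.Tuning(validation='data', validateX=valX,
                         validateY=valY)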

Parameters:
  • selection (str) – How the next argument set will be chosen. Accepts “brute force”, “consecutive”, “bayesian”, “iterative”, and “storm”.

  • validation (str, float) – How each argument set will be validated. Accepts “cross validation”, “leave one out”, “leave one group out”, “data”, and “proportion” as strings; a float between 0 and 1 will also trigger “proportion” validation. See above for descriptions of each validation.

  • performanceFunction (function, None) – The function that will be used to validate the performance of each argument set. If None, the performance function provided to the training function will be applied.

  • loops (int) – Applies when selection is “consecutive”. For more than one loop, the values for the arguments not being optimized will be set to the optimal values from the previous loop. Default is 1.

  • order (list) – Applies when selection is “consecutive”. A list of argument names defining the order to use when tuning.

  • maxIterations (int) – Applies when selection is “bayesian”, “iterative”, or “storm”. The maximum number of times to iterate through the argument selection process. Default is 100.

  • timeout (int, None) – Applies when selection is “bayesian”, “iterative”, or “storm”. The maximum number of seconds to perform the argument selection process.

  • threshold (float, None) – Applies when selection is “bayesian”, “iterative”, or “storm”. Stop the argument selection process if the performance is better than or equal to the threshold.

  • learnerArgsFunc (function, None) – Applies when the selection is “storm”. A function defining how to build the model with variable hyperparameters; see the sketch after this parameter list. Takes the form: learnerArgsFunc(hyperparameters), where hyperparameters will be a HyperParameters instance from storm_tuner, and it must return a dictionary to use as the arguments parameter for nimble.train.

  • initRandom (int) – Applies when the selection is “storm”. The number of initial iterations to perform a random search. Recommended value is between 3 and 8.

  • randomizeAxisFactor (float) – Applies when the selection is “storm”. Controls the tradeoff between explorative and exploitative selection. Values closer to 1 are likely to generate more mutations, while values closer to 0 are more likely to only perform a single mutation during each step.

  • folds (int) – Applies when validation is “cross validation”. Default is 5.

  • foldFeature (identifier) – Applies when validation is “leave one group out”. The folds for cross validation will be created by grouping the data by this feature.

  • validateX (nimble data object) – Applies when validation is “data”. The validation set to use. Can contain the validateY data.

  • validateY (nimble data object, identifier) – Applies when validation is “data”. Either an object of labels for the validation set, or the name or index of the labels in validateX.

  • proportion (float) – Applies when validation is “proportion”. A value between 0 and 1 indicating the random proportion of the training data to hold out for validation. A float value can also be passed directly to validation to trigger this same validation.
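
Because learnerArgsFunc is the least self-explanatory parameter, here is a hedged sketch; hp.Param and the hyperparameter names are assumptions based on storm_tuner’s interface, and the only requirement stated above is that the function accept a HyperParameters instance and return an arguments dictionary:

    import nimble

    def buildArgs(hp):
        # hp is a storm_tuner HyperParameters instance; Param (assumed
        # here) registers a named hyperparameter and its candidates.
        return {'n_estimators': hp.Param('n_estimators',
                                         [50, 100, 200], ordered=True),
                'max_depth': hp.Param('max_depth', [2, 4, 8],
                                      ordered=True)}

    tuning = nimble.Tuning(selection='storm', learnerArgsFunc=buildArgs,
                           maxIterations=25, initRandom=5)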

See also

Tune, nimble.train

Attributes

allArguments

Get the argument set for each validation that has been run.

allResults

Get the results of each validation that has been run.

bestArguments

The arguments that provided the best performance.

bestResult

The score of the best performance.

deepResults

If a cross-validation was used, get the fold-by-fold results.
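
For instance, after a Tuning instance has been used by a training function, these attributes record what was evaluated (a sketch continuing the earlier example; the printed values are hypothetical):

    tuning = nimble.Tuning(selection='brute force')
    tl = nimble.train('sklearn.RidgeClassifier', trainX, trainY,
                      performanceFunction=nimble.calculate.fractionIncorrect,
                      arguments={'alpha': nimble.Tune([0.1, 1, 10])},
                      tuning=tuning)
    print(tuning.bestArguments)   # e.g. {'alpha': 1}
    print(tuning.bestResult)      # score of the best argument set
    for args, result in zip(tuning.allArguments, tuning.allResults):
        print(args, result)       # every argument set and its score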

Methods

copy()

A new Tuning instance with attributes based on the latest tuning.

tune(learnerName, trainX, trainY, arguments, ...)

Run validation on each argument set.