nimble.trainAndTestOnTrainingData

nimble.trainAndTestOnTrainingData(learnerName, performanceFunction, trainX, trainY, crossValidationFolds=None, arguments=None, multiClassStrategy=None, randomSeed=None, tuning=None, *, useLog=None, **kwarguments)

Train a model using the train data and get the performance results.

trainAndTestOnTrainingData is the function for doing learner creation and evaluation in a single step with only a single data set (no withheld testing set). By default, this will calculate training error for the learner trained on that data set. However, cross-validation error can instead be calculated by setting the parameter crossValidationFolds to an integer. In that case, the training set is partitioned into that number of folds, and each fold is iteratively withheld to be used as the testing set for a learner trained on the rest of the data.

Parameters:
  • learnerName (str) – The learner to be called. This can be a string in the form ‘package.learner’ or the learner class object.

  • performanceFunction (function) – The function used to determine the performance of the learner. Pre-made functions are available in nimble.calculate. If hyperparameter tuning is triggered and the Tuning instance does not have a set performanceFunction, this function will be used for tuning as well.

  • trainX (nimble Base object) – Data to be used for training.

  • trainY (identifier, nimble Base object) – A name or index of the feature in trainX containing the labels, or another nimble Base object containing the labels that correspond to trainX.

  • crossValidationFolds (int, None) – When None, the learner is trained on the training data and then tested using the same data. When an integer, the training data is partitioned into that number of folds. Each fold is iteratively withheld and used as the testing set for a learner trained on the combination of all of the non-withheld data. The performance results for each of those tests are then averaged together to act as the return value.

  • arguments (dict) – Mapping argument names (strings) to their values, to be used during training and application (e.g., {'dimensions': 5, 'k': 5}). Multiple values for arguments can be provided by using a Tune object (e.g., {'k': Tune([3, 5, 7])}) to initiate hyperparameter tuning and return the learner trained on the best set of arguments. To provide an argument that is an object from the same package as the learner, use a nimble.Init object with the object name and its instantiation arguments (e.g., {'optimizer': nimble.Init('SGD', learning_rate=0.01)}). Note: learner arguments can also be passed as kwarguments, so this dictionary will be merged with any keyword arguments.

  • multiClassStrategy (str, None) – May be ‘OneVsAll’ or ‘OneVsOne’ to train the learner using that multiclass strategy. When None, the learner is trained on the data as provided.

  • randomSeed (int) – Set a random seed for the operation. When None, the randomness is controlled by Nimble’s random seed. Ignored if learner does not depend on randomness.

  • tuning (nimble.Tuning, performanceFunction, None) – Applies when hyperparameter tuning is initiated by Tune objects in the arguments. A Tuning instance details how the argument sets will be selected and validated. For convenience, a performanceFunction may be provided instead, or None may be given to use this function's performanceFunction; in either case a Tuning instance is constructed using the default consecutive selection method with 5-fold cross-validation.

  • useLog (bool, None) – Local control for whether to send results/timing to the logger. If None (default), use the value as specified in the “logger” “enabledByDefault” configuration option. If True, send to the logger regardless of the global option. If False, do NOT send to the logger, regardless of the global option.

  • kwarguments – Keyword arguments specifying variables that are passed to the learner. These are combined with the arguments parameter. Multiple values for arguments can be provided by using a Tune object (e.g., k=Tune([3, 5, 7])) to initiate hyperparameter tuning and return the learner trained on the best set of arguments. To provide an argument that is an object from the same package as the learner, use a nimble.Init object with the object name and its instantiation arguments (e.g., optimizer=nimble.Init('SGD', learning_rate=0.01)).

Returns:

performance – The results of the test.

Examples

Train and test on a single dataset which contains the labels.

>>> lstTrain = [[1, 0, 0, 1],
...             [0, 1, 0, 2],
...             [0, 0, 1, 3],
...             [1, 0, 0, 1],
...             [0, 1, 0, 2],
...             [0, 0, 1, 3]]
>>> ftNames = ['a', 'b', 'c', 'label']
>>> trainData = nimble.data(lstTrain,
...                         featureNames=ftNames)
>>> perform = nimble.trainAndTestOnTrainingData(
...     'nimble.KNNClassifier', nimble.calculate.fractionIncorrect,
...     trainX=trainData, trainY='label')
>>> perform
0.0
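
Cross-validation error can be estimated instead by setting crossValidationFolds. The sketch below reuses trainData from the example above; k=1 is passed explicitly because each training partition is very small, and the resulting value depends on how the folds are drawn, so it is not shown.

>>> # illustrative only: 3-fold cross-validation on a tiny dataset
>>> cvPerform = nimble.trainAndTestOnTrainingData(
...     'nimble.KNNClassifier', nimble.calculate.fractionIncorrect,
...     trainX=trainData, trainY='label', crossValidationFolds=3,
...     k=1)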

Passing arguments to the learner. Both the arguments parameter and kwarguments can be utilized; they will be merged. Below, C and kernel are parameters for scikit-learn’s SVC learner.

>>> lstTrainX = [[1, 0, 0],
...              [0, 1, 0],
...              [0, 0, 1],
...              [1, 0, 0],
...              [0, 1, 0],
...              [0, 0, 1]]
>>> lstTrainY = [[1], [2], [3], [1], [2], [3]]
>>> trainX = nimble.data(lstTrainX)
>>> trainY = nimble.data(lstTrainY)
>>> perform = nimble.trainAndTestOnTrainingData(
...     'sciKitLearn.SVC', nimble.calculate.fractionIncorrect,
...     trainX=trainX, trainY=trainY, arguments={'C': 0.1},
...     kernel='linear')
>>> perform
0.0
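
Hyperparameter tuning is initiated by wrapping candidate argument values in a Tune object. The sketch below reuses trainX and trainY from the example above and relies on the default Tuning behavior described for the tuning parameter; on data this small the validation folds are purely illustrative, so the resulting value is not shown.

>>> # illustrative only: tune C over three candidate values
>>> perform = nimble.trainAndTestOnTrainingData(
...     'sciKitLearn.SVC', nimble.calculate.fractionIncorrect,
...     trainX=trainX, trainY=trainY,
...     C=nimble.Tune([0.1, 1, 10]), kernel='linear')

A multiclass strategy can be requested with multiClassStrategy; this minimal sketch again reuses trainX and trainY.

>>> # illustrative only: wrap SVC in a one-vs-all strategy
>>> perform = nimble.trainAndTestOnTrainingData(
...     'sciKitLearn.SVC', nimble.calculate.fractionIncorrect,
...     trainX=trainX, trainY=trainY, kernel='linear',
...     multiClassStrategy='OneVsAll')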

Keywords: performance, in-sample, in sample, supervised, score, model, training, machine learning, predict, error, measure, accuracy