Base.trainAndTestSets

Base.trainAndTestSets(testFraction, labels=None, randomOrder=True, *, useLog=None)

Divide the data into training and testing sets.

Returns a tuple of length 2 or length 4. If labels is None, the tuple contains the training object followed by the testing object (trainX, testX). If labels is not None, the tuple contains the training data object, the training labels object, the testing data object, and the testing labels object, in that order (trainX, trainY, testX, testY).

Parameters:
  • testFraction (int or float) – The fraction of the data to be placed in the testing sets. If randomOrder is False, the testing points are taken from the end of this object.

  • labels (Base object, identifier, list of identifiers, or None) – Either a separate Base object containing the labels for this data, or the feature name(s) or index(es) of the labels within this object. A value of None implies this data does not contain labels. This parameter determines the length of the returned tuple.

  • randomOrder (bool) – Control whether the order of the points in the returned sets matches that of the original object or is randomized.

  • useLog (bool, None) – Local control for whether to send object creation to the logger. If None (default), use the value as specified in the “logger” “enabledByDefault” configuration option. If True, send to the logger regardless of the global option. If False, do NOT send to the logger, regardless of the global option.

Returns:

tuple – If labels is None, a length 2 tuple containing the training and testing objects (trainX, testX). If labels is not None, a length 4 tuple containing the training and testing data objects and the training and testing labels objects (trainX, trainY, testX, testY).

Examples

Returning a 2-tuple.

>>> nimble.random.setSeed(42)
>>> lst = [[1, 0, 0],
...        [0, 1, 0],
...        [0, 0, 1],
...        [1, 0, 0],
...        [0, 1, 0],
...        [0, 0, 1]]
>>> ptNames = ['a', 'b', 'c', 'd', 'e', 'f']
>>> X = nimble.data(lst, pointNames=ptNames)
>>> trainData, testData = X.trainAndTestSets(.34)
>>> trainData
<Matrix "train" 4pt x 3ft
     0  1  2
   ┌────────
 a │ 1  0  0
 b │ 0  1  0
 f │ 0  0  1
 c │ 0  0  1
>
>>> testData
<Matrix "test" 2pt x 3ft
     0  1  2
   ┌────────
 e │ 0  1  0
 d │ 1  0  0
>

Returning a 4-tuple.

>>> nimble.random.setSeed(42)
>>> lst = [[1, 0, 0, 1],
...        [0, 1, 0, 2],
...        [0, 0, 1, 3],
...        [1, 0, 0, 1],
...        [0, 1, 0, 2],
...        [0, 0, 1, 3]]
>>> ptNames = ['a', 'b', 'c', 'd', 'e', 'f']
>>> X = nimble.data(lst, pointNames=ptNames)
>>> fourTuple = X.trainAndTestSets(.34, labels=3)
>>> trainX, trainY = fourTuple[0], fourTuple[1]
>>> testX, testY = fourTuple[2], fourTuple[3]
>>> trainX
<Matrix "trainX" 4pt x 3ft
     0  1  2
   ┌────────
 a │ 1  0  0
 b │ 0  1  0
 f │ 0  0  1
 c │ 0  0  1
>
>>> trainY
<Matrix "trainY" 4pt x 1ft
     0
   ┌──
 a │ 1
 b │ 2
 f │ 3
 c │ 3
>
>>> testX
<Matrix "testX" 2pt x 3ft
     0  1  2
   ┌────────
 e │ 0  1  0
 d │ 1  0  0
>
>>> testY
<Matrix "testY" 2pt x 1ft
     0
   ┌──
 e │ 2
 d │ 1
>
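
Sketching the split semantics in plain Python.

The following is a minimal sketch of the behavior described above, not Nimble's implementation. The function name split_sketch and its parameters are hypothetical, labels are passed as a separate list (analogous to a separate Base object), and the test count is assumed to be the rounded product of testFraction and the number of points; the library's exact rounding rule is not stated here. It shows how labels changes the tuple length and how randomOrder=False takes the testing points from the end.

```python
import random


def split_sketch(points, test_fraction, labels=None, random_order=True, seed=None):
    """Hypothetical stand-in that mimics the documented return shapes."""
    n = len(points)
    # Assumption: test count is the rounded product; Nimble's rule may differ.
    num_test = int(round(test_fraction * n))
    order = list(range(n))
    if random_order:
        # Shuffle the point order before splitting.
        random.Random(seed).shuffle(order)
    # Testing points come from the end of the (possibly shuffled) order.
    split_at = n - num_test
    train_idx, test_idx = order[:split_at], order[split_at:]
    train_x = [points[i] for i in train_idx]
    test_x = [points[i] for i in test_idx]
    if labels is None:
        # No labels: length 2 tuple.
        return train_x, test_x
    train_y = [labels[i] for i in train_idx]
    test_y = [labels[i] for i in test_idx]
    # Labels given: length 4 tuple in the documented order.
    return train_x, train_y, test_x, test_y


points = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
labels = [1, 2, 3, 1, 2, 3]
train_x, test_x = split_sketch(points, 0.34, random_order=False)
four_tuple = split_sketch(points, 0.34, labels=labels, random_order=False)
```

With six points and a fraction of .34, the sketch places two points in the testing set, matching the 4pt and 2pt objects in the examples above.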

Keywords: split, data, prepare, training, testing, data points, divide, validation, splitting, train_test_split