nimble.normalizeData

nimble.normalizeData(learnerName, trainX, trainY=None, testX=None, arguments=None, randomSeed=None, *, useLog=None, **kwarguments)

Modify data according to a produced model.

Uses a learner from a package to train on the provided data, then returns trainX and testX (if provided) modified according to the trained model. If only trainX is provided, the normalized trainX is returned. If testX is also provided, a tuple (normalizedTrain, normalizedTest) is returned. The name of the learner is added to each normalized object’s name attribute to indicate the normalization that has been applied. Point and feature names are preserved when possible.

Parameters:
  • learnerName (str) – The learner to be called. This can be a string in the form ‘package.learner’ or the learner class object.

  • trainX (nimble Base object) – Data to be used for training.

  • trainY (identifier, nimble Base object) – The name or index of the feature in trainX containing the labels, or a separate nimble Base object containing the labels that correspond to trainX.

  • testX (nimble Base object) – Data to be used for testing.

  • arguments (dict) – Mapping of argument names (strings) to their values, to be used during training and application (e.g., {'dimensions': 5, 'k': 5}). To provide an argument that is an object from the same package as the learner, use a nimble.Init object with the object name and its instantiation arguments (e.g., {'optimizer': nimble.Init('SGD', learning_rate=0.01)}). Note: learner arguments can also be passed as kwarguments, so this dictionary will be merged with any keyword arguments.

  • randomSeed (int) – Set a random seed for the operation. When None, the randomness is controlled by Nimble’s random seed. Ignored if the learner does not depend on randomness.

  • useLog (bool, None) – Local control for whether to send object creation to the logger. If None (default), use the value as specified in the “logger” “enabledByDefault” configuration option. If True, send to the logger regardless of the global option. If False, do NOT send to the logger, regardless of the global option.

  • kwarguments – Learner arguments specified as keyword arguments. These are combined with the arguments parameter. To provide an argument that is an object from the same package as the learner, use a nimble.Init object with the object name and its instantiation arguments (e.g., optimizer=nimble.Init('SGD', learning_rate=0.01)).
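
The way the arguments dictionary and kwarguments combine can be pictured with a small standalone sketch. The helper name merge_arguments is hypothetical and is not part of Nimble. The source does not say what happens when the same name appears in both places with different values, so rejecting such conflicts here is an assumption, not documented behavior.

```python
def merge_arguments(arguments=None, **kwarguments):
    """Hypothetical helper: combine a dict of learner arguments with keyword arguments."""
    merged = dict(arguments or {})
    for name, value in kwarguments.items():
        # Assumption: a name given in both places with different values is
        # rejected rather than silently overridden.
        if name in merged and merged[name] != value:
            raise ValueError(f"conflicting values for argument {name!r}")
        merged[name] = value
    return merged


# Both spellings end up in one mapping of learner arguments.
print(merge_arguments({'dimensions': 5}, k=5))  # {'dimensions': 5, 'k': 5}
```

Under this reading, passing {'n_components': 2} through arguments and passing n_components=2 as a keyword would configure the learner identically.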

Examples

Normalize a single data set.

>>> lst = [[20, 1.97, 89], [28, 1.87, 75], [24, 1.91, 81]]
>>> trainX = nimble.data(lst, pointNames=['a', 'b', 'c'],
...                      featureNames=['age', 'height', 'weight'],
...                      returnType="Matrix")
>>> normTrainX = nimble.normalizeData('scikitlearn.StandardScaler',
...                                   trainX)
>>> normTrainX
<Matrix 3pt x 3ft
      age    height  weight
   ┌───────────────────────
 a │ -1.225   1.298   1.279
 b │  1.225  -1.136  -1.162
 c │  0.000  -0.162  -0.116
>

Normalize training and testing data.

>>> lst1 = [[0, 1, 3], [-1, 1, 2], [1, 2, 2]]
>>> trainX = nimble.data(lst1)
>>> lst2 = [[-1, 0, 5]]
>>> testX = nimble.data(lst2)
>>> pcaTrain, pcaTest = nimble.normalizeData('scikitlearn.PCA',
...                                          trainX, testX=testX,
...                                          n_components=2)
>>> pcaTrain
<Matrix 3pt x 2ft
       0       1
   ┌───────────────
 0 │ -0.216   0.713
 1 │ -1.005  -0.461
 2 │  1.221  -0.253
>
>>> pcaTest
<Matrix 1pt x 2ft
       0       1
   ┌──────────────
 0 │ -1.739  2.588
>
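
The train-then-apply pattern and the return convention (a single object for trainX alone, a tuple when testX is given) can be sketched in plain Python. This is only an illustration using z-score standardization on lists of rows, not Nimble's implementation. The names fit_standardizer, apply_standardizer, and normalize_sketch are hypothetical, and the sketch assumes no feature is constant (a zero standard deviation would divide by zero).

```python
from statistics import fmean, pstdev


def fit_standardizer(train_rows):
    """Learn each feature's mean and population standard deviation."""
    columns = list(zip(*train_rows))
    return [(fmean(col), pstdev(col)) for col in columns]


def apply_standardizer(params, rows):
    """Apply previously learned (mean, std) pairs to rows with the same features."""
    return [[(value - mean) / std for value, (mean, std) in zip(row, params)]
            for row in rows]


def normalize_sketch(train_rows, test_rows=None):
    """Hypothetical stand-in: fit on the training rows, then transform."""
    params = fit_standardizer(train_rows)
    norm_train = apply_standardizer(params, train_rows)
    if test_rows is None:
        return norm_train
    # Test rows are transformed with parameters learned from training rows only.
    return norm_train, apply_standardizer(params, test_rows)


# Same values as the first example above (age, height, weight).
train = [[20, 1.97, 89], [28, 1.87, 75], [24, 1.91, 81]]
norm_train = normalize_sketch(train)
print([round(v, 3) for v in norm_train[0]])  # [-1.225, 1.298, 1.279]
```

The printed row agrees with point 'a' in the StandardScaler output of the first example. Passing a second list of rows as test_rows returns a pair, mirroring the (normalizedTrain, normalizedTest) tuple.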

Keywords: modify, apply, standardize, scale, rescale, encode, center, mean, standard deviation, z-scores, z scores