Features.normalize

Features.normalize(function, applyResultTo=None, features=None, *, useLog=None)

Modify all features in this object using the given function.

Normalize the data by applying the provided function to each feature. If the function supports it, the same normalization can also be applied to a second object. Normalizations provided in Nimble include meanNormalize, percentileNormalize, and those used in the examples below (range0to1Normalize and meanStandardDeviationNormalize). See nimble.calculate.normalize for all of the provided normalizations.

Parameters:
  • function – The function applying the normalization. Functions must accept a feature view and output the normalized feature data. When applyResultTo is not None, the function must accept a second feature view and return a two-tuple (normalized feature from the calling object, normalized feature from applyResultTo). Common normalizations can be found in nimble.calculate.normalize.

  • applyResultTo (nimble Base object, None) – The secondary object to apply the normalization to. Must have the same number of features as the calling object.

  • features (identifier, list of identifiers, None) – Select specific features to apply the normalization to. If features is None, the normalization will be applied to all features.

  • useLog (bool, None) – Local control for whether to send this operation to the logger. If None (default), use the value as specified in the “logger” “enabledByDefault” configuration option. If True, send to the logger regardless of the global option. If False, do NOT send to the logger, regardless of the global option.
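The two-argument contract described for applyResultTo can be sketched without Nimble. This is only an illustration: plain Python lists stand in for feature views, the data is made up, and centerOnTrainMean is a hypothetical name, not a Nimble function.

```python
from statistics import mean

def centerOnTrainMean(trainFt, testFt):
    """Hypothetical two-argument normalization.

    Statistics are computed from the first (calling object's) feature
    only, then applied to both features. Returns a two-tuple:
    (normalized calling feature, normalized applyResultTo feature).
    """
    center = mean(trainFt)
    normTrain = [v - center for v in trainFt]
    normTest = [v - center for v in testFt]
    return normTrain, normTest

# Lists used here as stand-ins for feature views (an assumption).
trainCentered, testCentered = centerOnTrainMean([5.0, 3.0, 1.0], [4.0, 2.0])
print(trainCentered)  # [2.0, 0.0, -2.0]
print(testCentered)   # [1.0, -1.0]
```

Computing the statistic from the first argument alone keeps the second object from influencing the normalization, which is the usual reason to pass training and test data separately.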

Examples

Calling object only.

>>> import nimble
>>> from nimble.calculate import range0to1Normalize
>>> lstTrain = [[5, 9.8, 92],
...             [3, 6.2, 58],
...             [2, 3.0, 29]]
>>> pts = ['movie1', 'movie2', 'movie3']
>>> fts = ['review1', 'review2', 'review3']
>>> train = nimble.data(lstTrain, pts, fts)
>>> train.features.normalize(range0to1Normalize)
>>> train
<Matrix 3pt x 3ft
          review1  review2  review3
        ┌──────────────────────────
 movie1 │  1.000    1.000    1.000
 movie2 │  0.333    0.471    0.460
 movie3 │  0.000    0.000    0.000
>
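The values above are consistent with min-max scaling, (x - min) / (max - min), applied per feature. The following sketch reproduces the review2 column in plain Python under that assumption. The helper range0to1 is hypothetical and is not Nimble's implementation.

```python
def range0to1(values):
    """Hypothetical min-max scaling of one feature to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

review2 = [9.8, 6.2, 3.0]  # the review2 feature from the example above
scaled = [round(v, 3) for v in range0to1(review2)]
print(scaled)  # [1.0, 0.471, 0.0]
```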

With applyResultTo.

>>> from nimble.calculate import meanStandardDeviationNormalize
>>> lstTrain = [[5, 9.8, 92],
...             [3, 6.2, 58],
...             [2, 3.0, 10]]
>>> lstTest = [[4, 9.1, 43],
...            [3, 5.1, 88]]
>>> fts = ['review1', 'review2', 'review3']
>>> trainPts = ['movie1', 'movie2', 'movie3']
>>> train = nimble.data(lstTrain, trainPts, fts)
>>> testPts = ['movie4', 'movie5']
>>> test = nimble.data(lstTest, testPts, fts)
>>> train.features.normalize(meanStandardDeviationNormalize,
...                          applyResultTo=test)
>>> train
<Matrix 3pt x 3ft
          review1  review2  review3
        ┌──────────────────────────
 movie1 │   1.336    1.248    1.149
 movie2 │  -0.267   -0.048    0.139
 movie3 │  -1.069   -1.200   -1.288
>
>>> test
<Matrix 2pt x 3ft
          review1  review2  review3
        ┌──────────────────────────
 movie4 │   0.535    0.996   -0.307
 movie5 │  -0.267   -0.444    1.031
>
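The output above is consistent with each test feature being normalized using the mean and population standard deviation of the corresponding training feature, not its own statistics. The plain-Python sketch below checks this for review1. It is an arithmetic check under that assumption, not Nimble's implementation.

```python
from statistics import mean, pstdev

trainReview1 = [5, 3, 2]  # review1 in train
testReview1 = [4, 3]      # review1 in test

# Statistics come from the training feature only.
mu = mean(trainReview1)
sigma = pstdev(trainReview1)  # population standard deviation

normTrain = [round((v - mu) / sigma, 3) for v in trainReview1]
normTest = [round((v - mu) / sigma, 3) for v in testReview1]
print(normTrain)  # [1.336, -0.267, -1.069]
print(normTest)   # [0.535, -0.267]
```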

With a user-defined normalization function.

>>> import numpy as np
>>> lstTrain = [[482], [30000], [7900], [35], [600]]
>>> pts = ['user1', 'user2', 'user3', 'user4', 'user5']
>>> fts = ['miles']
>>> train = nimble.data(lstTrain, pts, fts)
>>> train
<Matrix 5pt x 1ft
         miles
       ┌──────
 user1 │   482
 user2 │ 30000
 user3 │  7900
 user4 │    35
 user5 │   600
>
>>> def logNormalize(ft):
...     return np.log(ft)
>>> train.features.normalize(logNormalize)
>>> train
<Matrix 5pt x 1ft
         miles
       ┌───────
 user1 │  6.178
 user2 │ 10.309
 user3 │  8.975
 user4 │  3.555
 user5 │  6.397
>

Keywords: standardize, scale, rescale, divide, length