Points.combineByExpandingFeatures

Points.combineByExpandingFeatures(featureWithFeatureNames, featuresWithValues, modifyDuplicateFeatureNames=False, *, useLog=None)

Combine similar points based on a differentiating feature.

Combine points that share common features but are currently separate because one feature, featureWithFeatureNames, categorizes the values in the remaining unshared features, featuresWithValues. Such points can be combined into a single point by representing each category as its own feature of that point. The corresponding featureName/value pairs from featureWithFeatureNames and featuresWithValues in each point become the values of the expanded features in the combined point. If a combined point lacks a featureName/value pair for any expanded feature, np.nan is assigned as the value of that feature.

The resulting featureNames depend on the number of features with values. For a single feature with values, the new feature names are the unique values in featureWithFeatureNames. For two or more features with values, the unique values in featureWithFeatureNames alone no longer identify every combination, so each value is joined to the name of each feature with values using an underscore.

If point names are present, the combined point takes the point name of the first instance of that point.
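The combining logic described above can be sketched in plain Python, without Nimble. This is an illustrative approximation only, not the library's implementation: the helper name combine_by_expanding is hypothetical, and both the placement of the expanded features after the shared ones and the value-then-name order of the underscore join are assumptions.

```python
import math

def combine_by_expanding(rows, names, name_ft, value_fts):
    """Pure-Python sketch of the combining logic (hypothetical helper)."""
    if isinstance(value_fts, str):
        value_fts = [value_fts]
    name_idx = names.index(name_ft)
    value_idxs = [names.index(v) for v in value_fts]
    shared_idxs = [i for i in range(len(names))
                   if i != name_idx and i not in value_idxs]
    single = len(value_fts) == 1

    # Unique categories, in order of first appearance, name the new features.
    categories = list(dict.fromkeys(row[name_idx] for row in rows))
    if single:
        new_names = [str(c) for c in categories]
    else:
        # Assumption: category first, then the value feature's name.
        new_names = [f"{c}_{v}" for c in categories for v in value_fts]

    combined = {}  # shared values -> {new feature name: value}
    for row in rows:
        key = tuple(row[i] for i in shared_idxs)
        slot = combined.setdefault(key, {})
        for v_name, v_idx in zip(value_fts, value_idxs):
            col = str(row[name_idx]) if single else f"{row[name_idx]}_{v_name}"
            slot[col] = row[v_idx]

    out_names = [names[i] for i in shared_idxs] + new_names
    # Missing featureName/value pairs are filled with nan.
    out_rows = [list(key) + [slot.get(n, math.nan) for n in new_names]
                for key, slot in combined.items()]
    return out_names, out_rows

rows = [['Bolt', '100m', 9.81],
        ['Bolt', '200m', 19.78],
        ['Gatlin', '100m', 9.89],
        ['de Grasse', '200m', 20.02],
        ['de Grasse', '100m', 9.91]]
new_names, new_rows = combine_by_expanding(
    rows, ['athlete', 'dist', 'time'], 'dist', 'time')
print(new_names)
for row in new_rows:
    print(row)
```

With the sprinters data, the five input rows collapse to three, and Gatlin's missing 200m time is filled with nan.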

An object containing m features and n points with k unique point combinations amongst shared features, i unique values in featureWithFeatureNames and j featuresWithValues will result in an object with k points and (m - (1 + j) + (i * j)) features.
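As a quick check of the shape formula, the following sketch evaluates it for the sprinters data used in the Examples section. The function name expanded_shape is hypothetical and exists only for illustration.

```python
def expanded_shape(m, k, i, j):
    """Return (points, features) of the result, per the formula above.

    m: features in the original object
    k: unique point combinations amongst shared features
    i: unique values in featureWithFeatureNames
    j: number of featuresWithValues
    """
    return k, m - (1 + j) + (i * j)

# sprinters: 3 features, 3 unique athletes, 2 distances, 1 value feature
print(expanded_shape(m=3, k=3, i=2, j=1))  # (3, 3)
```

The original point count n does not appear in the formula; only the number of unique combinations k determines how many points remain.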

Parameters:
  • featureWithFeatureNames (identifier) – The name or index of the feature containing the values that will become the names of the features in the combined points.

  • featuresWithValues (identifier, list of identifiers) – The names and/or indices of the features containing the values that correspond to the values in featureWithFeatureNames.

  • modifyDuplicateFeatureNames (bool) – Allow modification of featureName strings if two or more unique values in featureWithFeatureNames return the same string. Duplicate strings will have the type of the value appended to the string, wrapped in parentheses. For example, if 1 and ‘1’ are both in featureWithFeatureNames, the featureNames will become ‘1(int)’ and ‘1(str)’, respectively.

  • useLog (bool, None) – Local control for whether to send object creation to the logger. If None (default), use the value as specified in the “logger” “enabledByDefault” configuration option. If True, send to the logger regardless of the global option. If False, do NOT send to the logger, regardless of the global option.
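The renaming described for modifyDuplicateFeatureNames can be illustrated with a small stand-alone sketch. The helper disambiguate is hypothetical and only mirrors the documented '1(int)' / '1(str)' behavior; it is not Nimble's code.

```python
def disambiguate(values):
    """Append the type name to values whose string forms collide."""
    strings = [str(v) for v in values]
    result = []
    for value, string in zip(values, strings):
        if strings.count(string) > 1:
            # Colliding strings get the value's type in parentheses.
            result.append(f"{string}({type(value).__name__})")
        else:
            result.append(string)
    return result

print(disambiguate([1, '1', 2]))  # ['1(int)', '1(str)', '2']
```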

Notes

A visual representation of the Example:

sprinters.points.combineByExpandingFeatures('dist', 'time')

     sprinters (before)                 sprinters (after)
+-----------+------+-------+    +-----------+------+-------+
| athlete   | dist | time  |    | athlete   | 100m | 200m  |
+-----------+------+-------+    +-----------+------+-------+
| Bolt      | 100m | 9.81  |    | Bolt      | 9.81 | 19.78 |
+-----------+------+-------+ -> +-----------+------+-------+
| Bolt      | 200m | 19.78 |    | Gatlin    | 9.89 |       |
+-----------+------+-------+    +-----------+------+-------+
| Gatlin    | 100m | 9.89  |    | de Grasse | 9.91 | 20.02 |
+-----------+------+-------+    +-----------+------+-------+
| de Grasse | 200m | 20.02 |
+-----------+------+-------+
| de Grasse | 100m | 9.91  |
+-----------+------+-------+

This function was inspired by the spread function from the tidyr library created by Hadley Wickham [1] in the R programming language.

References

Examples

>>> lst = [['Bolt', '100m', 9.81],
...        ['Bolt', '200m', 19.78],
...        ['Gatlin', '100m', 9.89],
...        ['de Grasse', '200m', 20.02],
...        ['de Grasse', '100m', 9.91]]
>>> fts = ['athlete', 'dist', 'time']
>>> sprinters = nimble.data(lst, featureNames=fts)
>>> sprinters.points.combineByExpandingFeatures('dist', 'time')
>>> sprinters
<DataFrame 3pt x 3ft
      athlete    100m   200m
   ┌─────────────────────────
 0 │      Bolt  9.810  19.780
 1 │    Gatlin  9.890
 2 │ de Grasse  9.910  20.020
>

Keywords: spread, cast, pivot, pivot_wider, unfold, tidy, tidyr