Points.combineByExpandingFeatures
- Points.combineByExpandingFeatures(featureWithFeatureNames, featuresWithValues, modifyDuplicateFeatureNames=False, *, useLog=None)
Combine similar points based on a differentiating feature.
Combine points that share common features but are currently separate points due to a feature, featureWithFeatureNames, that categorizes the values in the remaining unshared features, featuresWithValues. The points can be combined into a single point by instead representing the categorization of the unshared values as different features of the same point. The corresponding featureName/value pairs from featureWithFeatureNames and featuresWithValues in each point will become the values for the expanded features of the combined points. If a combined point lacks a featureName/value pair for any given feature(s), np.nan will be assigned as the value(s) at that feature(s).

The resulting featureNames depend on the number of features with values. For a single feature with values, the new feature names are the unique values in featureWithFeatureNames. However, for two or more features with values, the unique values in featureWithFeatureNames no longer cover all combinations, so those values are combined with the names of the features with values using an underscore. The combined point name will be assigned the point name of the first instance of that point, if point names are present.

An object containing m features and n points with k unique point combinations amongst shared features, i unique values in featureWithFeatureNames, and j features in featuresWithValues will result in an object with k points and (m - (1 + j) + (i * j)) features.

- Parameters:
  featureWithFeatureNames (identifier) – The name or index of the feature containing the values that will become the names of the features in the combined points.
  featuresWithValues (identifier, list of identifiers) – The names and/or indices of the features of values that correspond to the values in featureWithFeatureNames.
  modifyDuplicateFeatureNames (bool) – Allow modifications to featureName strings if two or more unique values in featureWithFeatureNames return the same string. Duplicate strings will have the type of the value appended to the string, wrapped in parentheses. For example, if 1 and ‘1’ are both in featureWithFeatureNames, the featureNames will become ‘1(int)’ and ‘1(str)’, respectively.
  useLog (bool, None) – Local control for whether to send object creation to the logger. If None (default), use the value as specified in the “logger” “enabledByDefault” configuration option. If True, send to the logger regardless of the global option. If False, do NOT send to the logger, regardless of the global option.
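The renaming described for modifyDuplicateFeatureNames can be illustrated with a small plain-Python sketch. This is not nimble's implementation: expandedNames is a hypothetical helper, and raising a ValueError when duplicates exist and the flag is False is an assumption made only for illustration.

```python
def expandedNames(values, modifyDuplicateFeatureNames=False):
    """Return string feature names for the unique values of a feature.

    Hypothetical helper for illustration; not part of nimble.
    """
    strings = [str(v) for v in values]
    # Strings produced by more than one distinct value, e.g. 1 and '1'.
    duplicated = {s for s in strings if strings.count(s) > 1}
    if duplicated and not modifyDuplicateFeatureNames:
        # Assumption: the source does not state what happens in this case.
        raise ValueError("duplicate feature names: %s" % sorted(duplicated))
    # Append the value's type in parentheses only where names collide.
    return ["%s(%s)" % (s, type(v).__name__) if s in duplicated else s
            for v, s in zip(values, strings)]


names = expandedNames([1, '1', 'a'], modifyDuplicateFeatureNames=True)
print(names)
```

Only the colliding names are modified; values whose strings are already unique keep their plain string form.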
Notes
A visual representation of the Example:

    sprinters.points.combineByExpandingFeatures('dist', 'time')

          sprinters (before)                  sprinters (after)
    +-----------+------+-------+      +-----------+------+-------+
    | athlete   | dist | time  |      | athlete   | 100m | 200m  |
    +-----------+------+-------+      +-----------+------+-------+
    | Bolt      | 100m | 9.81  |      | Bolt      | 9.81 | 19.78 |
    +-----------+------+-------+  ->  +-----------+------+-------+
    | Bolt      | 200m | 19.78 |      | Gatlin    | 9.89 |       |
    +-----------+------+-------+      +-----------+------+-------+
    | Gatlin    | 100m | 9.89  |      | de Grasse | 9.91 | 20.02 |
    +-----------+------+-------+      +-----------+------+-------+
    | de Grasse | 200m | 20.02 |
    +-----------+------+-------+
    | de Grasse | 100m | 9.91  |
    +-----------+------+-------+

This function was inspired by the spread function from the tidyr library created by Hadley Wickham [1] in the R programming language.
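The transformation on the sprinters data can also be sketched in plain Python, which may help clarify how points are grouped and where missing values arise. This is a minimal sketch for a single feature with values, not nimble's implementation: spread is a hypothetical helper, and placing the new features after the shared ones is a simplification.

```python
import math


def spread(rows, names, nameFeature, valueFeature):
    """Combine rows that agree on all shared features.

    Hypothetical helper for illustration; handles one value feature only.
    """
    nameIdx = names.index(nameFeature)
    valueIdx = names.index(valueFeature)
    sharedIdx = [i for i in range(len(names)) if i not in (nameIdx, valueIdx)]

    # New feature names: unique values of nameFeature, in first-seen order.
    newNames = []
    for row in rows:
        if row[nameIdx] not in newNames:
            newNames.append(row[nameIdx])

    # Group rows by their shared values; dicts keep first-seen order.
    combined = {}
    for row in rows:
        key = tuple(row[i] for i in sharedIdx)
        combined.setdefault(key, {})[row[nameIdx]] = row[valueIdx]

    outNames = [names[i] for i in sharedIdx] + newNames
    # A missing name/value pair becomes NaN, as the description states.
    outRows = [list(key) + [slots.get(n, math.nan) for n in newNames]
               for key, slots in combined.items()]
    return outNames, outRows


fts = ['athlete', 'dist', 'time']
lst = [['Bolt', '100m', 9.81],
       ['Bolt', '200m', 19.78],
       ['Gatlin', '100m', 9.89],
       ['de Grasse', '200m', 20.02],
       ['de Grasse', '100m', 9.91]]
outNames, outRows = spread(lst, fts, 'dist', 'time')
print(outNames)
for row in outRows:
    print(row)
```

Gatlin has no '200m' entry, so that position holds NaN in the combined point.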
References
Examples
>>> import nimble
>>> lst = [['Bolt', '100m', 9.81],
...        ['Bolt', '200m', 19.78],
...        ['Gatlin', '100m', 9.89],
...        ['de Grasse', '200m', 20.02],
...        ['de Grasse', '100m', 9.91]]
>>> fts = ['athlete', 'dist', 'time']
>>> sprinters = nimble.data(lst, featureNames=fts)
>>> sprinters.points.combineByExpandingFeatures('dist', 'time')
>>> sprinters
<DataFrame 3pt x 3ft
     athlete   100m   200m
   ┌─────────────────────────
 0 │ Bolt      9.810  19.780
 1 │ Gatlin    9.890
 2 │ de Grasse 9.910  20.020
>
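The shape formula from the description, k points and (m - (1 + j) + (i * j)) features, can be checked against this example with a little arithmetic. The helper expandedShape below is hypothetical and only evaluates the formula; it does not call nimble.

```python
def expandedShape(m, k, i, j):
    """Evaluate the documented result shape.

    m: features in the original object
    k: unique point combinations amongst the shared features
    i: unique values in featureWithFeatureNames
    j: number of features in featuresWithValues
    """
    return k, m - (1 + j) + (i * j)


# Sprinters: 3 features, 3 distinct athletes, 2 distances, 1 value feature.
sprintersShape = expandedShape(m=3, k=3, i=2, j=1)
print(sprintersShape)

# Hypothetical variant with one extra value feature (m=4, j=2).
twoValueShape = expandedShape(m=4, k=3, i=2, j=2)
print(twoValueShape)
```

The first result agrees with the 3pt x 3ft object shown in the example. In the second case, each of the two value features is expanded once per unique value, which gives four expanded features plus the one shared feature.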
Keywords: spread, cast, pivot, pivot_longer, unfold, tidy, tidyr