Features.splitByParsing
Features.splitByParsing(feature, rule, resultingNames, *, useLog=None)
Split a feature into multiple features.
Parse an existing feature and divide it into separate parts. Each value must split into a number of values equal to the length of resultingNames.
Parameters:
feature (identifier) – The name or index of the feature to parse and split.
rule (str, int, list, function) –
string - split the value at every occurrence of the string. This works in the same way as Python's built-in str.split() method: the separator string is removed from the results.
integer - the index position where the split will occur. Unlike a string rule, an integer rule removes no characters. All characters before the index are split from the characters at and after the index.
list - may contain integer and/or string values
function - any function accepting a value as input, splitting the value and returning a list of the split values.
resultingNames (list) – Strings defining the names of the split features.
useLog (bool, None) – Local control for whether to send object creation to the logger. If None (default), use the value as specified in the “logger” “enabledByDefault” configuration option. If True, send to the logger regardless of the global option. If False, do NOT send to the logger, regardless of the global option.
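The string, integer, and function rules described above can be sketched for a single value in plain Python. This is only an illustration of the documented behavior, not nimble's implementation; split_value is a hypothetical helper name, and the list rule is left out because its exact semantics are not spelled out here.

```python
def split_value(value, rule):
    """Hypothetical helper: split one string value according to `rule`."""
    if isinstance(rule, str):
        # String rule: behaves like str.split, so the separator is dropped.
        return value.split(rule)
    if isinstance(rule, int):
        # Integer rule: cut at the index and keep every character.
        return [value[:rule], value[rule:]]
    if callable(rule):
        # Function rule: the function performs the split itself.
        return list(rule(value))
    raise TypeError("unsupported rule type in this sketch")


print(split_value('Cape Town, South Africa', ', '))  # ['Cape Town', 'South Africa']
print(split_value('AGG932', 3))                      # ['AGG', '932']
print(split_value('a-b', lambda v: v.split('-')))    # ['a', 'b']
```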
Notes
Visual representations of the Examples:
locations.features.splitByParsing('location', ', ', ['city', 'country'])

    locations (before)                    locations (after)
+-------------------------+        +-----------+--------------+
| location                |        | city      | country      |
+-------------------------+        +-----------+--------------+
| Cape Town, South Africa |        | Cape Town | South Africa |
+-------------------------+   ->   +-----------+--------------+
| Lima, Peru              |        | Lima      | Peru         |
+-------------------------+        +-----------+--------------+
| Moscow, Russia          |        | Moscow    | Russia       |
+-------------------------+        +-----------+--------------+

inventory.features.splitByParsing(0, 3, ['category', 'id'])

  inventory (before)                 inventory (after)
+---------+----------+        +----------+-----+----------+
| product | quantity |        | category | id  | quantity |
+---------+----------+        +----------+-----+----------+
| AGG932  | 44       |        | AGG      | 932 | 44       |
+---------+----------+        +----------+-----+----------+
| AGG734  | 11       |   ->   | AGG      | 734 | 11       |
+---------+----------+        +----------+-----+----------+
| HEQ892  | 1        |        | HEQ      | 892 | 1        |
+---------+----------+        +----------+-----+----------+
| LEQ331  | 2        |        | LEQ      | 331 | 2        |
+---------+----------+        +----------+-----+----------+
This function was inspired by the separate function from the tidyr library created by Hadley Wickham [1] in the R programming language.
References
Examples
Split with a string for rule.
>>> import nimble
>>> lst = [['Cape Town, South Africa'],
...        ['Lima, Peru'],
...        ['Moscow, Russia']]
>>> fts = ['location']
>>> locations = nimble.data(lst, featureNames=fts)
>>> locations.features.splitByParsing('location', ', ',
...                                   ['city', 'country'])
>>> locations
<DataFrame 3pt x 2ft
     city      country
   ┌────────────────────────
 0 │ Cape Town South Africa
 1 │ Lima      Peru
 2 │ Moscow    Russia
>
Split with an index for rule.
>>> lst = [['AGG932', 44],
...        ['AGG734', 11],
...        ['HEQ892', 1],
...        ['LEQ331', 2]]
>>> fts = ['product', 'quantity']
>>> inventory = nimble.data(lst, featureNames=fts)
>>> inventory.features.splitByParsing(0, 3, ['category', 'id'])
>>> inventory
<DataFrame 4pt x 3ft
     category id  quantity
   ┌────────────────────────
 0 │ AGG      932 44
 1 │ AGG      734 11
 2 │ HEQ      892 1
 3 │ LEQ      331 2
>
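The examples above cover string and index rules. As a rough sketch of a function rule, the plain-Python snippet below defines a candidate function and checks it against the product codes from the index example before it is handed to splitByParsing. The name split_product and the regular expression are assumptions made for illustration; the length check mirrors the requirement that each value split into one piece per entry in resultingNames.

```python
import re


def split_product(value):
    """Hypothetical function rule: split leading letters from trailing digits."""
    match = re.fullmatch(r'([A-Za-z]+)(\d+)', value)
    if match is None:
        raise ValueError(f"cannot parse {value!r}")
    return list(match.groups())


resulting_names = ['category', 'id']
products = ['AGG932', 'AGG734', 'HEQ892', 'LEQ331']

parts = [split_product(p) for p in products]
# Each value must yield exactly one piece per resulting name.
assert all(len(p) == len(resulting_names) for p in parts)
print(parts)
# [['AGG', '932'], ['AGG', '734'], ['HEQ', '892'], ['LEQ', '331']]
```

A function like this could then be passed as rule in place of the index 3. Unlike the fixed index, it would also handle codes whose letter prefix is not exactly three characters long.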
Keywords: parse, separate, pattern, break, apart, detect, tidy, tidyr