Features.splitByParsing

Features.splitByParsing(feature, rule, resultingNames, *, useLog=None)

Split a feature into multiple features.

Parse an existing feature and divide it into separate parts. Each value must split into a number of values equal to the length of resultingNames.

Parameters:
  • feature (identifier) – The name or index of the feature to parse and split.

  • rule (str, int, list, function) –

    • string - split the value at every occurrence of this string. This works in the same way as Python’s built-in str.split() method: the string itself is removed from the results.

    • integer - the index position where the split will occur. Unlike a string, no characters are removed when using an integer. All characters before the index are split from the characters at and after the index.

    • list - may contain integer and/or string values

    • function - any function accepting a value as input, splitting the value and returning a list of the split values.

  • resultingNames (list) – Strings defining the names of the split features.

  • useLog (bool, None) – Local control for whether to send object creation to the logger. If None (default), use the value as specified in the “logger” “enabledByDefault” configuration option. If True, send to the logger regardless of the global option. If False, do NOT send to the logger, regardless of the global option.
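The string and integer rules can be illustrated on a single value with plain Python. This is a minimal sketch of the behavior described above, not nimble's implementation. The helper name split_value is hypothetical, and the list rule is left out because its exact semantics are not spelled out here.

```python
def split_value(value, rule):
    """Split one string value by a str or int rule (illustrative only)."""
    if isinstance(rule, str):
        # String rule: behaves like str.split, so the separator is removed.
        return value.split(rule)
    if isinstance(rule, int):
        # Integer rule: cut at the index; no characters are removed.
        return [value[:rule], value[rule:]]
    raise TypeError("this sketch only handles str and int rules")


names = ['city', 'country']
parts = split_value('Lima, Peru', ', ')
# Each value must yield exactly one part per resulting name.
assert len(parts) == len(names)
print(parts)                      # ['Lima', 'Peru']
print(split_value('AGG932', 3))   # ['AGG', '932']
```

Note that the integer rule always yields two parts whose concatenation is the original value, while the string rule yields one more part than the number of separator occurrences.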

Notes

Visual representations of the Examples:

locations.features.splitByParsing('location', ', ',
                                  ['city', 'country'])

    locations (before)                locations (after)
+-------------------------+    +-----------+--------------+
| location                |    | city      | country      |
+-------------------------+    +-----------+--------------+
| Cape Town, South Africa |    | Cape Town | South Africa |
+-------------------------+ -> +-----------+--------------+
| Lima, Peru              |    | Lima      | Peru         |
+-------------------------+    +-----------+--------------+
| Moscow, Russia          |    | Moscow    | Russia       |
+-------------------------+    +-----------+--------------+

inventory.features.splitByParsing(0, 3, ['category', 'id'])

  inventory (before)                  inventory (after)
+---------+----------+        +----------+-----+----------+
| product | quantity |        | category | id  | quantity |
+---------+----------+        +----------+-----+----------+
| AGG932  | 44       |        | AGG      | 932 | 44       |
+---------+----------+        +----------+-----+----------+
| AGG734  | 11       |   ->   | AGG      | 734 | 11       |
+---------+----------+        +----------+-----+----------+
| HEQ892  | 1        |        | HEQ      | 892 | 1        |
+---------+----------+        +----------+-----+----------+
| LEQ331  | 2        |        | LEQ      | 331 | 2        |
+---------+----------+        +----------+-----+----------+

This function was inspired by the separate function in tidyr, a library for the R programming language created by Hadley Wickham [1].

References

Examples

Split with a string for rule.

>>> lst = [['Cape Town, South Africa'],
...        ['Lima, Peru'],
...        ['Moscow, Russia']]
>>> fts = ['location']
>>> locations = nimble.data(lst, featureNames=fts)
>>> locations.features.splitByParsing('location', ', ',
...                                   ['city', 'country'])
>>> locations
<DataFrame 3pt x 2ft
        city      country
   ┌────────────────────────
 0 │ Cape Town  South Africa
 1 │      Lima          Peru
 2 │    Moscow        Russia
>

Split with an index for rule.

>>> lst = [['AGG932', 44],
...        ['AGG734', 11],
...        ['HEQ892', 1],
...        ['LEQ331', 2]]
>>> fts = ['product', 'quantity']
>>> inventory = nimble.data(lst, featureNames=fts)
>>> inventory.features.splitByParsing(0, 3, ['category', 'id'])
>>> inventory
<DataFrame 4pt x 3ft
     category   id  quantity
   ┌────────────────────────
 0 │   AGG     932     44
 1 │   AGG     734     11
 2 │   HEQ     892      1
 3 │   LEQ     331      2
>
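Split with a function for rule.

A function can serve as rule when neither a fixed separator nor a fixed index fits. The sketch below defines a hypothetical helper, split_at_first_digit, that accepts one value and returns the list of split values, which is the contract described for function rules. It is shown on plain strings rather than on a nimble object.

```python
import re


def split_at_first_digit(value):
    """Return [prefix, rest], cutting just before the first digit."""
    match = re.search(r'\d', value)
    if match is None:
        # Without a digit there is no split point, so fail loudly.
        raise ValueError(f'no digit found in {value!r}')
    return [value[:match.start()], value[match.start():]]


print(split_at_first_digit('AGG932'))   # ['AGG', '932']
print(split_at_first_digit('LEQX31'))   # ['LEQX', '31']
```

Assuming an unsplit inventory object like the one built in the index example, this helper would be passed as inventory.features.splitByParsing('product', split_at_first_digit, ['category', 'id']). It returns two parts, matching the two names in resultingNames.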

Keywords: parse, separate, pattern, break, apart, detect, tidy, tidyr