SPLIT stage v1.2

Pipelines v2.1

 

Purpose, Operands, Streams, Usage, Examples, Related

 

 

Syntax

 
           ┌─AT───────────────────────────────────────────────┐
>>──SPLIT──┼──────────────────────────────────────────────────┼──><
           │ ┌─AT─────┐                                       │
           └─┼────────┼──┬─────────┬──┬─────────────────────┬─┘
             ├─BEFORE─┤  └─ANYCase─┘  ├─charrange───────────┤
             └─AFTER──┘               └─┬─STRing─┬──string──┘
                                        ├─REGexp─┤
                                        └─ANYof──┘

 

Purpose

 

Use the SPLIT stage to divide input records into multiple output records. SPLIT reads records from its primary input stream, splits them, and writes the resulting records to its primary output stream. If no operands are specified, SPLIT divides records at whitespace characters and discards those characters; both space (X'20') and tab (X'09') are considered whitespace. If you specify only AT, BEFORE, or AFTER, records are split at, before, or after whitespace characters, respectively. With additional operands, records are split relative to occurrences of a specified target.
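The default behaviour described above can be sketched in Python. This is an illustrative model only, not the stage's implementation: the function name split_whitespace and its mode parameter are hypothetical, and the handling of null records follows the Usage notes on this page.

```python
WHITESPACE = " \t"  # space (X'20') and tab (X'09'), as listed in the Purpose text


def split_whitespace(record, mode="AT"):
    """Model SPLIT with no target operand; mode is "AT", "BEFORE", or "AFTER"."""
    if mode not in ("AT", "BEFORE", "AFTER"):
        raise ValueError("mode must be AT, BEFORE, or AFTER")
    if record == "":
        return [record]  # a null input record is copied unchanged
    pieces, current = [], ""
    for ch in record:
        if ch not in WHITESPACE:
            current += ch
        elif mode == "AT":
            pieces.append(current)       # the whitespace character is discarded
            current = ""
        elif mode == "BEFORE":
            pieces.append(current)       # the whitespace starts the next record
            current = ch
        else:  # AFTER
            pieces.append(current + ch)  # the whitespace ends the current record
            current = ""
    pieces.append(current)
    return [p for p in pieces if p != ""]  # null output records are not generated


print(split_whitespace("a b\tc"))            # ['a', 'b', 'c']
print(split_whitespace("a b\tc", "BEFORE"))  # ['a', ' b', '\tc']
print(split_whitespace("a b\tc", "AFTER"))   # ['a ', 'b\t', 'c']
```

Whether a record consisting only of whitespace produces no output at all under AT is an assumption of this sketch, derived from the rule that null output records are not generated.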

 

Operands

 

AT

causes input records to be split at the specified target. The target characters are discarded. AT is the default.

 

BEFORE

causes input records to be split before the specified target. The target characters are retained.

 

AFTER

causes input records to be split after the specified target. The target characters are retained.

If you specify AFTER with STRING, the records are split after any columns that contain the last character of the string.

 

ANYCase

specifies that charrange or string is compared with the input record in uppercase. In effect this means that a non-case-sensitive comparison is made when selecting characters that cause the records to be split.

 

charrange

is a character range. A split occurs when any one of the characters in the range is matched.

 

STRing

specifies that the string operand is a literal string of characters to locate. A split occurs only when the entire string is matched.

 

REGexp

specifies that the string operand is a regular expression to match. A split occurs wherever the expression is matched.

 

ANYof

specifies that the string operand is a list of characters to locate. A split occurs when any one of the characters in the list is matched.

 

 

string

is a string to locate.
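The difference between the STRing, ANYof, and REGexp targets, and the effect of ANYCase, can be illustrated with a small Python model. The names target_pattern and split_on are hypothetical. The sketch assumes Python's regular expression syntax for REGexp, which may differ from the syntax the stage accepts. It does not model charrange or null input records.

```python
import re


def target_pattern(kind, string, anycase=False):
    """Build a compiled pattern for a STRING, ANYOF, or REGEXP target."""
    if string == "":
        raise ValueError("target string must not be empty")
    flags = re.IGNORECASE if anycase else 0  # ANYCase: non-case-sensitive comparison
    if kind == "STRING":   # the entire literal string must match
        return re.compile(re.escape(string), flags)
    if kind == "ANYOF":    # any one of the listed characters matches
        return re.compile("|".join(re.escape(c) for c in string), flags)
    if kind == "REGEXP":   # assumption: Python regex syntax
        return re.compile(string, flags)
    raise ValueError("kind must be STRING, ANYOF, or REGEXP")


def split_on(record, pattern, mode="AT"):
    """Split one record at, before, or after each match of pattern."""
    pieces, start = [], 0
    for m in pattern.finditer(record):
        if m.end() == m.start():
            continue                          # ignore empty regex matches
        if mode == "AFTER":
            cut = resume = m.end()            # target stays at the end of the piece
        elif mode == "BEFORE":
            cut = resume = m.start()          # target starts the next piece
        else:                                 # AT: target is discarded
            cut, resume = m.start(), m.end()
        pieces.append(record[start:cut])
        start = resume
    pieces.append(record[start:])
    return [p for p in pieces if p != ""]     # drop empty pieces


print(split_on("a,b;c", target_pattern("ANYOF", ",;")))               # ['a', 'b', 'c']
print(split_on("oneXXtwoxxthree", target_pattern("STRING", "xx")))    # ['oneXXtwo', 'three']
print(split_on("oneXXtwoxxthree", target_pattern("STRING", "xx", anycase=True)))
```

With ANYOF each listed character is a target on its own, whereas STRING requires the whole string to be present before a split occurs.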

 

Streams

 

The following streams are used by the SPLIT stage:

 

Stream                  Action

Primary input stream    SPLIT reads records from its primary input stream.

Primary output stream   After splitting the input records into multiple records, SPLIT writes the resulting records to its primary output stream.

 

Usage

 

1. SPLIT does not delay the records.

2. If the SPLIT stage discovers that its primary input or output streams are not connected, the SPLIT stage ends.

3. SPLIT copies null input records to its primary output stream. It does not generate null output records.

4. SPLIT verifies that its secondary input and output streams are not connected and then begins execution.
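The first and third usage notes can be pictured with a Python generator. This is a loose analogy rather than a description of the pipeline dispatcher: split_stage and traced are hypothetical names, and str.split stands in for the real splitting rule.

```python
def split_stage(records, splitter):
    """Yield the pieces of each input record before the next record is read."""
    for record in records:
        if record == "":
            yield record          # a null input record is copied to the output
            continue
        for piece in splitter(record):
            if piece != "":       # null output records are never generated
                yield piece


def traced(records, log):
    """Hypothetical source that notes when each record is read."""
    for record in records:
        log.append(("read", record))
        yield record


log = []
for piece in split_stage(traced(["a b", "", "c"], log), str.split):
    log.append(("write", piece))

for event in log:
    print(event)
```

In the printed trace every "write" for a record appears before the next "read", which is the sense in which the records are not delayed.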

 

Examples

 

Given the input file input.txt, shown below, the following four examples demonstrate the BEFORE and AFTER operands of the SPLIT stage, each combined with a numeric column offset.

 

input.txt (input)

 

1234512345
5432154321

 

1. To split input records before the column that is 1 column to the left of each occurrence of the character 4, use the following:

'pipe < input.txt | split 1 before string /4/ | console'

output:

12
34512
345
54321
54321

 

2. To split input records after the column that is 2 columns to the right of the last character of each occurrence of the string 12, use the following:

'pipe < input.txt | split 2 after string /12/ | console'

output:

1234
51234
5
5432154321

 

3. To split input records between the characters 5 and 4 in occurrences of the string 54, use the following:

'pipe < input.txt | split -1 before string /54/ | console'

output:

1234512345
5
43215
4321

 

4. To split input records between the characters 2 and 3 in occurrences of the string 123, use the following:

'pipe < input.txt | split -1 after string /123/ | console'

output:

12
34512
345
5432154321
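The offset arithmetic in the four examples above can be checked with a short Python sketch. The rule it encodes is inferred from those examples only, because the numeric offset is not described in the Operands section: with BEFORE the cut falls offset columns to the left of the first character of the target, and with AFTER it falls offset columns to the right of the last character. The function name split_offset is hypothetical.

```python
def split_offset(record, offset, mode, target):
    """Split record relative to each non-overlapping occurrence of target."""
    if target == "":
        raise ValueError("target must not be empty")
    cuts = set()
    pos = record.find(target)
    while pos != -1:
        if mode == "BEFORE":
            cut = pos - offset                  # measured from the target's first character
        else:  # AFTER
            cut = pos + len(target) + offset    # measured from the target's last character
        if 0 < cut < len(record):               # cuts at either end would create null records
            cuts.add(cut)
        pos = record.find(target, pos + len(target))
    pieces, start = [], 0
    for cut in sorted(cuts):
        pieces.append(record[start:cut])
        start = cut
    pieces.append(record[start:])
    return pieces


records = ["1234512345", "5432154321"]
for offset, mode, target in [(1, "BEFORE", "4"), (2, "AFTER", "12"),
                             (-1, "BEFORE", "54"), (-1, "AFTER", "123")]:
    output = [p for r in records for p in split_offset(r, offset, mode, target)]
    print(offset, mode, target, output)
```

Under these assumptions the sketch reproduces the output listed for each of the four examples. A negative offset moves the cut into the target itself, which is how examples 3 and 4 split between two characters of the matched string.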

 

 

Other miscellaneous examples:

 

1. The following example uses the SPLIT stage in a pipeline that determines the number of bytes that could be saved by removing trailing whitespace.

 

Address Rxpipe

'pipe < myfile.txt ',
   '| locate',                         /* Discard blank-lines. */
   '| xlate w-1;* x20 @ x09 @',        /* Change spaces/tabs to at(@) chars. */
   '| split before str /@/',           /* Split at each at(@), start new record. */
   '| strip trailing anyof /@/',       /* Reduce records to length zero. */
   '| nlocate 1',                      /* Select only null/empty records. */
   '| count',                          /* Count the records. */
   '| specs /The number of bytes which could be saved is:/ 1 1-* nw',
   '| cons'                            /* Display the result. */

Exit 0
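As a cross-check on the figure that the pipeline reports, the same quantity can be computed directly in Python. This sketch does not mirror the individual stages. The function name trailing_whitespace_bytes is hypothetical, and the sketch assumes one byte per character and counts every character of a line that consists only of whitespace.

```python
def trailing_whitespace_bytes(lines):
    """Count the trailing space and tab characters over all non-empty lines."""
    total = 0
    for line in lines:
        if line == "":
            continue  # empty lines are skipped, as the locate stage discards them
        total += len(line) - len(line.rstrip(" \t"))
    return total


sample = ["abc  ", "", "x\t", "y"]
print("The number of bytes which could be saved is:", trailing_whitespace_bytes(sample))
```

For the four sample lines the result is 3: two spaces after abc and one tab after x.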

 

Related

 

CHOP, JOIN, PAD, STRIP
 

History

Version  Date        Action   Pipelines  Description

1.2      27.12.2021  changed  2.1        Application-wide rewrite.

1.1      22.03.2008  added    1.4        Support for the REGEXP operand, which specifies that the string operand is interpreted as a regular expression.

1.0      06.09.2007  created  1.0        First version.