UNIQUE stage v1.0

Pipelines v2.0

 


 

Syntax

 
                        _NOPAD____                 _1-*___    _LAST_____
>>__UNIQue__ _______ __|__________|__ _________ __|_______|__|__________|______________><
            |_COUNT_|  |_PAD_char_|  |_ANYCase_|  | Range |  |_FIRST____|
                                                             |_COLLAPSE_|
 
Range:
 
|__ __inputrange_______________________ ________________________________________________>
   |   <__________________________     |
   |_(__inputrange__ __________ __|__)_|
                    |_NOPAD____|
                    |_PAD_char_|
 

Purpose

 

Use the UNIQUE stage to select unique or duplicate records.

 

By default, UNIQUE reads records from its primary input stream and compares each one with the following record to determine whether they are the same. When the records are the same, UNIQUE continues reading records until it reaches a record that is different. The comparison is based on the contents of the entire input record. When a contiguous set of duplicates is read, UNIQUE selects only the last record in the set and discards the others. UNIQUE writes the selected records to its primary output stream. The discarded records are written to the secondary output stream, if it is connected. Two matching records that are not contiguous are not considered to be duplicates. Therefore, the input stream must be in sorted order for UNIQUE to identify all the unique and duplicate records in it.
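The default behavior described above can be modeled outside the pipeline. The following Python sketch is only an illustration of the comparison rule, not the stage's implementation; the function name unique_last and the use of lists in place of streams are assumptions made for the example.

```python
def unique_last(records):
    """Model of the default (LAST) behavior on whole records.

    Returns (selected, discarded): 'selected' stands in for the primary
    output stream, 'discarded' for the secondary output stream.
    """
    selected, discarded = [], []
    previous = None
    have_previous = False
    for record in records:
        if have_previous:
            if record == previous:
                # Same as the following record: the earlier copy is discarded.
                discarded.append(previous)
            else:
                # The run ended: the last record of the run is selected.
                selected.append(previous)
        previous = record
        have_previous = True
    if have_previous:
        selected.append(previous)  # the final record always ends a run
    return selected, discarded


unsorted = ["b", "a", "a", "b"]
print(unique_last(unsorted))          # (['b', 'a', 'b'], ['a'])
print(unique_last(sorted(unsorted)))  # (['a', 'b'], ['a', 'b'])
```

In the unsorted case the two "b" records are not contiguous, so both are selected. After sorting, each value appears once in the selected list.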

 

Optionally, you can perform a non-case-sensitive record comparison, and you can base the comparison on one or more key fields, each of which is a specific range of words, fields, or columns.

 

Operands

 

    

COUNT

When used in conjunction with the FIRST or LAST operand, COUNT prefixes each record in the primary output stream with a 10-character field that represents the record’s position in a set of duplicate records. The number is right-justified with leading spaces. Consecutive records that have the same key fields are considered a set of duplicate records. The count is 1 when a record is unique. For example, when COUNT is combined with the default operand LAST and the first three records of an input stream are duplicates, the third record is written to the primary output stream prefixed by the number 3. Because 3 is the position of the last record in the set, it is also the total number of records in that set. When used in conjunction with the COLLAPSE operand, COUNT counts the number of records that lie between the first and last records in a set of duplicates.
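As an illustration of COUNT combined with the default LAST operand, the following Python sketch builds the prefixed records for whole-record comparison. The function name unique_count_last is hypothetical, and the sketch covers only the LAST case, for which the description above gives a worked example.

```python
from itertools import groupby


def unique_count_last(records):
    """Model of COUNT with LAST: one output record per run of duplicates."""
    output = []
    for _, run in groupby(records):
        run = list(run)
        # The position of the last record in the run equals the run length.
        # The count is a 10-character field, right-justified with spaces.
        output.append(f"{len(run):>10}{run[-1]}")
    return output


for line in unique_count_last(["x", "x", "x", "y"]):
    print(repr(line))
```

Each output record starts with a 10-character count field: 3 for the run of three "x" records and 1 for the unique "y" record.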

 

    

NOPAD

specifies that shorter key fields are not extended with a pad character before they are compared with longer key fields of other records. The NOPAD operand can be specified in two positions on the UNIQUE stage:

 

 

If NOPAD is specified before the inputrange operands or if inputrange is not specified, NOPAD applies to the entire record. This is the default.

If you specify NOPAD after inputrange, NOPAD applies only to that particular key field.

 

    

PAD

specifies that shorter key fields are extended with a pad character before they are compared with longer key fields of other records. The PAD operand can be specified in two positions:

 

 

If PAD is specified before the inputrange operands or if inputrange is not specified, PAD applies to the entire record.

If you specify PAD after inputrange, PAD applies only to that particular key field.

 

char

is the pad character.
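The difference between NOPAD and PAD can be sketched as a comparison function. This is a minimal Python model under two assumptions that the text above does not state: the shorter key is extended on the right, and the helper name keys_equal is invented for the example.

```python
def keys_equal(a, b, pad=None):
    """Compare two key fields.

    pad=None models NOPAD: the keys are compared as they are.
    pad=<one character> models PAD char: the shorter key is extended
    (on the right, by assumption) to the length of the longer key.
    """
    if pad is not None:
        width = max(len(a), len(b))
        a = a.ljust(width, pad)
        b = b.ljust(width, pad)
    return a == b


print(keys_equal("abc", "abc  "))           # False: NOPAD, lengths differ
print(keys_equal("abc", "abc  ", pad=" "))  # True: padded with blanks
print(keys_equal("abc", "abc**", pad="*"))  # True: padded with '*'
```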

 

    

ANYCase

specifies that key fields are compared in uppercase. In effect, the comparison is not case-sensitive.

 

    

inputrange

is an integer column, word, or field range that defines a key field. If you do not specify inputrange, the key field is the entire record. When you specify more than one inputrange, you must enclose the set of inputrange operands within parentheses.

 

    

LAST

writes all unique records and the last record of each set of duplicate records to the primary output stream. All duplicate records that are not written to the primary output stream are written to the secondary output stream if it is connected; otherwise, they are discarded. This is the default.

 

    

FIRST

writes all unique records and the first record of each set of duplicate records to the primary output stream. All duplicate records that are not written to the primary output stream are written to the secondary output stream if it is connected; otherwise, they are discarded.

 

    

COLLAPSE

writes all unique records and the first and last record of each set of duplicate records to the primary output stream. When COLLAPSE is used on its own, all duplicate records that are not written to the primary output stream are written to the secondary output stream if it is connected; otherwise, they are discarded. When COLLAPSE is used in conjunction with the COUNT operand, UNIQUE counts the duplicate records that lie between the first and last record in the set of duplicates. Once the last record in the set has been determined, a single record containing the number of duplicates (excluding the first and last in the set) is written to the secondary output stream. The number is a 10-character field, right-justified with leading spaces.
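The COLLAPSE behavior can be modeled with a short Python sketch. It is an illustration only. The function name unique_collapse is hypothetical, and the sketch assumes that with COUNT a count record is written for every set of two or more records, including a count of 0 when nothing lies between the first and last record; the description above does not say what happens in that case.

```python
from itertools import groupby


def unique_collapse(records, count=False):
    """Model of COLLAPSE on whole records.

    Returns (primary, secondary) as lists standing in for the streams.
    """
    primary, secondary = [], []
    for _, run in groupby(records):
        run = list(run)
        primary.append(run[0])          # first record of the set
        if len(run) > 1:
            primary.append(run[-1])     # last record of the set
        middle = run[1:-1]              # records between first and last
        if count:
            if len(run) > 1:
                # Assumption: one count record per set of duplicates.
                secondary.append(f"{len(middle):>10}")
        else:
            secondary.extend(middle)
    return primary, secondary


data = ["a", "a", "a", "a", "b", "c", "c"]
print(unique_collapse(data))
print(unique_collapse(data, count=True))
```

Without COUNT, the two middle "a" records go to the secondary list. With COUNT, the secondary list instead holds 10-character count fields.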

 

Streams used

 

The following streams are used by the UNIQUE stage:

 

Stream

Action

 

 

Primary input stream

UNIQUE reads records from its primary input stream.

Primary output stream

After selecting the specified records from its primary input stream, UNIQUE writes the selected records to its primary output stream.

Secondary output stream

UNIQUE writes the unselected input records to its secondary output stream, if it is connected.

 

Usage notes

 

     1)

UNIQUE FIRST does not delay the records. UNIQUE LAST and UNIQUE COLLAPSE delay one record.

 

     2)

If the UNIQUE stage discovers that none of its output streams is connected, the UNIQUE stage ends.

 

     3)

UNIQUE holds a record until it has compared it with the next record in its input stream, and only then writes it to its output stream. However, if you specify the FIRST operand, UNIQUE writes the record without waiting for that comparison.

 

     4)

Use the SORT stage with the UNIQUE operand instead of separate SORT and UNIQUE stages when the input stream has many duplicate records and you do not wish to process the duplicate records further.

 

     5)

UNIQUE verifies that its secondary input stream is not connected and then begins execution.
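The delay behavior in usage notes 1 and 3 can be made visible with Python generators. This is a sketch for illustration, not the stage's implementation; the names first_stream, last_stream, traced, and run are invented for the example, and whole records are compared.

```python
_NOTHING = object()  # marker meaning "no record seen yet"


def first_stream(records):
    """Model of FIRST: a record is written as soon as it is read."""
    previous = _NOTHING
    for record in records:
        if previous is _NOTHING or record != previous:
            yield record
        previous = record


def last_stream(records):
    """Model of LAST: one record is held until the next one is read."""
    held = _NOTHING
    for record in records:
        if held is not _NOTHING and record != held:
            yield held
        held = record
    if held is not _NOTHING:
        yield held


def traced(records, log):
    """Yield records while logging each read."""
    for record in records:
        log.append(f"read {record}")
        yield record


def run(stage, records):
    """Run a stage and return the interleaved read/write log."""
    log = []
    for out in stage(traced(records, log)):
        log.append(f"write {out}")
    return log


print(run(first_stream, ["a", "a", "b"]))
# ['read a', 'write a', 'read a', 'read b', 'write b']
print(run(last_stream, ["a", "a", "b"]))
# ['read a', 'read a', 'read b', 'write a', 'write b']
```

In the FIRST log, each write follows its own read directly. In the LAST log, each write happens only after a later record has been read or the input has ended.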

 

Examples

 

Collapsing duplicate record sets

 

See also

 

See the following topic for additional information:

 

SORT

 

History of change

 

None.