Design notes |
Pipelines v2.1 |
● |
This 32-bit version of Pipelines is designed to work with the 32-bit version
of ooRexx, version 4 or later. It installs on any platform that hosts, as a
minimum requirement, the Microsoft Windows .NET Framework 2.0. |
● |
Multiple Pipelines instances may execute concurrently. |
● |
Pipelines dispatches the stages in the order in which they
appear in the pipeline; however, any stage may be the first to begin
processing records. The relative order of the records flowing through a
pipeline can be predicted, as long as the stage path comprises only stages
that do not delay the records. |
● |
Unless the pipeline includes
one or more stages that accumulate records (for example, the SORT stage), and
provided that the input records are not excessively long, Pipelines requires only a
small amount of memory to process input files of any size, as only a handful
of records will be in the pipeline at any one time. |
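The difference between a stage that passes records straight through and one that accumulates them can be illustrated with a minimal Python sketch. This is not Pipelines code; the function names `numbered_source`, `upper` and `sort_stage` are hypothetical, and Python generators merely stand in for stages.

```python
from typing import Iterable, Iterator


def numbered_source(count: int) -> Iterator[str]:
    """Hypothetical input stage: produces records one at a time."""
    for i in range(count):
        yield f"record {i}"


def upper(records: Iterable[str]) -> Iterator[str]:
    """Non-delaying stage: writes each record as soon as it has read it."""
    for record in records:
        yield record.upper()


def sort_stage(records: Iterable[str]) -> Iterator[str]:
    """Accumulating stage: must hold every input record in memory
    before it can write the first output record."""
    yield from sorted(records)


# Only one record exists at a time, however large the input is.
first = next(upper(numbered_source(10**12)))

# The sort must read all of its input before producing anything.
ordered = list(sort_stage(["b", "c", "a"]))
```

In the first case memory use is independent of the input size; in the second it grows with the number of records, which is the exception described above.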
● |
Pipelines is not pre-emptive. When a stage reports an initialisation or runtime error,
Pipelines begins terminating the pipeline by instructing all active stages to
quiesce. When all active stages in the pipeline chain have responded to the
quiesce command and have terminated, Pipelines (the StageManager) terminates. |
● |
Pipelines is designed to execute on a single processor, where each stage/process vies for
service by the StageManager; the specific design of a stage controls how it
interoperates within a multi-stream pipeline configuration. |
● |
Pipelines does not verify that a pipeline is semantically correct, only that it is syntactically
correct. This means that you may construct a pipeline that does not execute
in the way that you expect it to. It may produce output records in a format
or an order that you did not intend, or it may not produce any output records
at all. In view of this, when developing a pipeline that replaces the
contents of a disk-file, it is particularly prudent to test the pipeline
against a copy of that file. Pipelines does not issue "are you sure?"
messages! |
● |
Pipelines does not
work with records containing MBCS or Unicode data; only the single-byte ASCII
character set is supported. (This will be addressed in a future version of
Pipelines, although it will require a massive rework of the application, and
that will take time.) As a consequence, you should ensure that only ASCII input
files are selected for modification.
Pipelines cannot determine the format of an input file; it simply executes
the pipeline that you specify. |
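Because Pipelines does not check the input format itself, a file can be screened before it is handed to a pipeline. The following Python sketch is one possible pre-check and is not part of Pipelines; the function name `is_ascii_file` is hypothetical. It only confirms that every byte is in the 7-bit ASCII range, so a file that passes may still be in some other encoding.

```python
def is_ascii_file(path: str, chunk_size: int = 64 * 1024) -> bool:
    """Return True if every byte in the file is 7-bit ASCII (0-127).

    The file is read in chunks so that large files are not loaded
    into memory in one piece.
    """
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return True   # end of file reached, nothing non-ASCII seen
            if not chunk.isascii():
                return False  # found a byte above 127
```

A script could call `is_ascii_file` on each candidate and skip any file for which it returns `False`.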
● |
Pipelines includes a stall detection mechanism
that determines when a pipeline is stalled. A stall occurs when Pipelines
determines that every stage is waiting either to read a record or to write a
record. That is, no stage is currently processing a record; all
stages are either read-pending or write-pending. When a stall occurs, Pipelines
writes the current status of each stage in the pipeline to a dump-file,
which can be inspected to determine the combination of stream connections
that caused the stall. |
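The stall condition described above can be modelled in a few lines. This Python sketch is only an illustration of the rule "every stage is read-pending or write-pending"; it does not reflect how the StageManager is implemented, and the names `StageState`, `is_stalled` and `dump_status`, as well as the dump layout, are assumptions.

```python
from enum import Enum
from typing import Dict


class StageState(Enum):
    PROCESSING = "processing"
    READ_PENDING = "read-pending"
    WRITE_PENDING = "write-pending"


def is_stalled(states: Dict[str, StageState]) -> bool:
    """A pipeline is stalled when it has stages and none of them is
    processing a record, i.e. all are waiting to read or to write."""
    return bool(states) and all(
        state is not StageState.PROCESSING for state in states.values()
    )


def dump_status(states: Dict[str, StageState]) -> str:
    """Format one line per stage, as a stand-in for a dump-file."""
    return "\n".join(f"{name}: {state.value}" for name, state in states.items())


stages = {
    "fanin": StageState.READ_PENDING,
    ">": StageState.WRITE_PENDING,
}
if is_stalled(stages):
    report = dump_status(stages)
```

As soon as any one stage is in the `PROCESSING` state, `is_stalled` returns `False`, because that stage may still release a record that unblocks the others.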
● |
When a stage does not
specifically limit the number of input and/or output streams, the stage may
process up to 4096 input streams and up to _MAX_INT_ (an unsigned integer value)
output streams. However, a pipeline configuration that
connects more than a handful of input or output streams to any one stage
should be considered badly designed. Consider the following
ooRexx script, which concatenates three input files:

    **** Top of file ****
    Address Rxpipe

    'pipe (endchar ?)',
       '< myfile1.txt',
       '| a: fanin',
       '| > myjoinedfiles.txt',
       '?',
       '< myfile2.txt',
       '| a:',
       '?',
       '< myfile3.txt',
       '| a:'

    Exit 0
    **** End of file ****

The pipeline above is
limited and not easily extensible; a better approach might be:

    **** Top of file ****
    Address Rxpipe

    'pipe filelist file=myfile* ext=txt',
       '| > myjoinedfiles.txt'

    Exit 0
    **** End of file ****

This pipeline is
extensible by design. The FILELIST stage selects all files that match the
pattern mask myfile*.txt. |
● |
Pipelines itself is extensible;
it includes an MS VC++ stage command API library, which contains all the
stage initialisation parsing functions and runtime extraction routines that
support the current set of built-in
stage filters. The API allows you to create new stage DLLs that augment the
current built-in set. The API
addresses most of the needs that a stage might reasonably have: console
locking and synchronisation, multi-stream connectivity, multiple column, word
and field isolation, pre-process functionality, character range expansions,
input and output record availability, and more. Pipelines ships with DEBUG
and RELEASE versions of the API library. The Pipelines stage
command API uses the Microsoft Foundation Class (MFC) CString
class extensively, and other MFC-specific classes under the covers as and
when required. |
● |
Pipelines supports third-party non-API WIN32 console applications/modules through
the SHELLEXECUTE stage command. SHELLEXECUTE will load and service any WIN32 application,
reading input records from that process's STDOUT and STDERR I/O streams
and writing those records to the SHELLEXECUTE stage's primary and secondary output
streams, respectively. |
● |
Since Pipelines
version 1.6, the application documentation has been available online. That
involved separating the package documentation from the install package and
allowing Pipelines to be installed on a disk-drive and in a directory of
your choice. As the location of the input-files for the example pipelines cannot be
determined prior to installation, I chose not to set static example input-file
locations programmatically during the install process. Instead, I
replaced the input-file path in each example pipeline with a 'place-holder'
or 'macro'. Those new definitions allow you to save or relocate an example
pipeline to another directory. (As I may introduce new versions of example
pipelines which illustrate new or extended functionality, you may want to
retain older example versions for future reference.) An example pipeline
provided by Pipelines version 1.6 or by any future version of Pipelines will
always reference the currently installed input-file directory. |
● |
The pipeline is not
interpreted; Pipelines performs a single-pass parse of the pipeline, allocating
the resources required by each stage, and then begins dispatching the stages. |
● |
Pipelines is an ALLUSERS application: every profile on the machine will have access to
Pipelines. |
● |
Pipelines supports the sub-commands
PEEKTO, READTO and OUTPUT, which provide functionality similar to their CMS
Pipelines counterparts. They work across the IPC divide between a calling
pipeline and a called pipeline (subroutine), maintaining the relative record
order. |
● |
The IN and OUT stage commands are
designed to be dual purpose. A pipeline that uses the IN and/or OUT
stage command, and that is launched through the CALLPIPE stage command, will service
input and output records read from and written to the calling pipeline's CALLPIPE
stage. Similarly, the IN and OUT stage commands specified in a pipeline
that is connected (piped) from and to another WIN32 process will
happily service their respective STDIN and STDOUT streams. Rather than being
limited in the way that traditional stored procedures are (by embedding a
called routine within the calling script), a subroutine pipeline operates as
an autonomous unit. For example, by specifying the CALLPIPE QUIET option,
you might use a pipeline as a back-end utility in an application that
searches, replaces, sorts, translates or collates data. |
● |
Pipelines provides a
convenient and easy way to create a new ooRexx script: simply right-click
anywhere on your desktop or within a folder to access the 'New->Pipelines
file' option. Selecting this option creates a very simple skeleton ooRexx
file with the extension .REX. File associations under Windows can be troublesome,
especially when you try to rename a file by extension; using this method,
you can create a new ooRexx file with the minimum of effort. |
● |
Pipelines comprises two
distinct processing phases: the initialisation phase and the runtime phase.
The first involves the parsing and validation of the pipeline source, the allocation
of resources needed to support the pipeline and the dispatch of the stage
command DLLs. The second involves the actual execution (servicing input and
output record requests), monitoring the record throughput and the
de-allocation of acquired resources. The following two paragraphs provide a
brief overview, in a little more detail.
● |
Pipelines is offered freely and without evaluation caveats; you may use it as you please. If
you have any comments, suggestions or requests, please contact me via the
link below. |
● |
If you use Pipelines, you use it at your own risk! I do not take any responsibility, implied
or otherwise, for any damage caused through its use. |