Pipelining is a technique of decomposing a sequential process into suboperations, with each subprocess being executed in a special dedicated segment that operates concurrently with all other segments.
The overlapping of computation is made possible by associating a register with each segment in the pipeline.
The registers isolate the segments from one another so that each can operate on distinct data simultaneously.
Perhaps the simplest way of viewing the pipeline structure is to imagine that each segment consists of an input register followed by a combinational circuit.
- The register holds the data.
- The combinational circuit performs the suboperation in the particular segment.
A clock is applied to all registers after enough time has elapsed to perform all segment activity.
The pipeline organization will be demonstrated by means of a simple example:
- To perform the combined multiply and add operations on a stream of numbers:
Ai * Bi + Ci for i = 1, 2, 3, …, 7
Each suboperation is to be implemented in a segment within a pipeline.
R1 ← Ai, R2 ← Bi          Input Ai and Bi
R3 ← R1 * R2, R4 ← Ci     Multiply and input Ci
R5 ← R3 + R4              Add Ci to product
Each segment has one or two registers and a combinational circuit, as shown in Fig 4-1.
The five registers are loaded with new data on every clock pulse. The effect of each clock pulse is shown in Table 4-1.
Fig 4-1: Example of pipeline processing
Table 4-1: Content of Registers in Pipeline Example
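The register transfers listed above can be traced with a short simulation. The following is a minimal sketch, not a reproduction of Table 4-1: the function name simulate_pipeline and the sample operand values are assumptions chosen only for illustration. All five registers are updated from their old contents on each clock pulse, which is how the segments stay isolated from one another.

```python
def simulate_pipeline(a, b, c):
    """Trace R1..R5 of the three-segment multiply-add pipeline, one row per clock pulse."""
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None  # None marks a register with no valid data yet
    rows = []
    # n operand sets need n + 2 clock pulses to pass through three segments
    for pulse in range(1, n + 3):
        i = pulse - 1  # index of the operand pair entering segment 1 on this pulse
        # Every register loads on the same clock edge, so the new values
        # are computed from the OLD register contents before any is overwritten.
        new_r5 = r3 + r4 if r3 is not None else None   # R5 <- R3 + R4
        new_r3 = r1 * r2 if r1 is not None else None   # R3 <- R1 * R2
        new_r4 = c[i - 1] if 1 <= i <= n else None     # R4 <- Ci (one pulse after Ai, Bi)
        new_r1 = a[i] if i < n else None               # R1 <- Ai
        new_r2 = b[i] if i < n else None               # R2 <- Bi
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        rows.append((pulse, r1, r2, r3, r4, r5))
    return rows


if __name__ == "__main__":
    # Sample data (assumed values, seven operand sets as in the example)
    A = [1, 2, 3, 4, 5, 6, 7]
    B = [2, 2, 2, 2, 2, 2, 2]
    C = [10, 20, 30, 40, 50, 60, 70]
    print("pulse  R1  R2  R3  R4  R5")
    for row in simulate_pipeline(A, B, C):
        print("  ".join("-" if v is None else str(v) for v in row))
```

With seven operand sets the trace has nine rows: the first result appears in R5 on the third clock pulse, and one further result appears on every pulse after that.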
General Considerations
- Any operation that can be decomposed into a sequence of suboperations of about the same complexity can be implemented by a pipeline processor.
- The general structure of a four-segment pipeline is illustrated in Fig 4-2.
- We define a task as the total operation performed going through all the segments in the pipeline.
- The behavior of a pipeline can be illustrated with a space-time diagram.
- It shows the segment utilization as a function of time.
Fig 4-2: Four Segment Pipeline
- The space-time diagram of a four-segment pipeline is shown in Fig 4-3.
- Consider a k-segment pipeline with a clock cycle time tp that is used to execute n tasks.
- The first task T1 requires a time equal to ktp to complete its operation.
- The remaining n-1 tasks emerge from the pipeline at the rate of one task per clock cycle, so they are completed after a further time equal to (n-1)tp.
- Therefore, to complete n tasks using a k-segment pipeline requires k+(n-1) clock cycles.
- Consider a nonpipeline unit that performs the same operation and takes a time equal to tn to complete each task.
- The total time required for n tasks is ntn.
Fig 4-3: Space-time diagram for pipeline
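A space-time diagram is easy to generate programmatically, which also confirms the k+(n-1) cycle count. The sketch below is illustrative only; the function names space_time_diagram and format_diagram, and the choice of six tasks, are assumptions and do not come from the figure.

```python
def space_time_diagram(k, n):
    """Return a k x (k + n - 1) grid.

    grid[s][c] holds the number of the task occupying segment s+1
    during clock cycle c+1, or 0 if that segment is idle.
    """
    cycles = k + n - 1
    grid = [[0] * cycles for _ in range(k)]
    for task in range(1, n + 1):
        for seg in range(k):
            # Task t enters segment 1 in cycle t and moves one segment per cycle
            grid[seg][task - 1 + seg] = task
    return grid


def format_diagram(grid):
    """Render the grid as text, one line per segment."""
    lines = []
    for seg, row in enumerate(grid, start=1):
        cells = " ".join((f"T{t}" if t else "-").ljust(3) for t in row)
        lines.append(f"Segment {seg}: {cells}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    # Four segments, six tasks (assumed example size): 4 + 6 - 1 = 9 cycles
    print(format_diagram(space_time_diagram(4, 6)))
```

Each row is the previous row shifted right by one cycle. Once the pipeline is full, every segment is busy and one task completes per clock cycle.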
- The speedup of pipeline processing over equivalent nonpipeline processing is defined by the ratio S = (n tn) / ((k + n - 1) tp).
- As the number of tasks increases, n becomes much larger than k-1, the term k+n-1 approaches n, and the speedup approaches S = tn/tp.
- If we assume that the time it takes to process a task is the same in the pipeline and nonpipeline circuits, i.e., tn = ktp, the speedup reduces to S = ktp/tp = k.
- This shows that the theoretical maximum speedup that a pipeline can provide is k, where k is the number of segments in the pipeline.
- To duplicate the theoretical speed advantage of a pipeline process by means of multiple functional units, it is necessary to construct k identical units that will be operating in parallel.
- This is illustrated in Fig 4-4, where four identical circuits are connected in parallel.
- Instead of operating with the input data in sequence as in a pipeline, the parallel circuits accept four input data items simultaneously and perform four tasks at the same time.
Fig 4-4: Multiple functional units in parallel
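The speedup ratio defined above can be checked numerically. The following is a small sketch under assumed values (k = 4 segments, tp = 20 ns, tn = k tp = 80 ns); the function names and the numbers are illustrative and do not come from the text.

```python
def pipeline_time(k, n, tp):
    """Time for n tasks on a k-segment pipeline: (k + n - 1) clock cycles."""
    return (k + n - 1) * tp


def nonpipeline_time(n, tn):
    """Time for n tasks on a nonpipeline unit that needs tn per task."""
    return n * tn


def speedup(k, n, tp, tn):
    """S = n*tn / ((k + n - 1) * tp)."""
    return nonpipeline_time(n, tn) / pipeline_time(k, n, tp)


if __name__ == "__main__":
    k, tp = 4, 20          # assumed: four segments, 20 ns clock cycle
    tn = k * tp            # assumed: nonpipeline delay equals k * tp
    for n in (1, 10, 100, 10_000):
        print(n, round(speedup(k, n, tp, tn), 3))
```

For a single task the speedup is 1, since the pipeline offers no overlap. For 100 tasks the pipeline needs 2060 ns against 8000 ns, a speedup of about 3.88, and as n grows the ratio approaches the limit k = 4.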
- There are various reasons why the pipeline cannot operate at its maximum theoretical rate.
- Different segments may take different times to complete their suboperations, and the clock cycle must be long enough for the slowest segment.
- It is not always correct to assume that a nonpipeline circuit has the same time delay as that of an equivalent pipeline circuit.
- There are two areas of computer design where the pipeline organization is applicable:
- Arithmetic pipeline
- Instruction pipeline
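The effect of unequal segment delays on the speedup, noted in the list above, can be illustrated with a short calculation. This is a sketch under stated assumptions: the segment delays, the register delay, and the function names are hypothetical, and the nonpipeline circuit is assumed to chain the same logic with a single output register.

```python
def clock_cycle(segment_delays, register_delay):
    """The clock period must cover the slowest segment plus the register load time."""
    return max(segment_delays) + register_delay


def nonpipeline_delay(segment_delays, register_delay):
    """Assumed model: the same logic in series, followed by one output register."""
    return sum(segment_delays) + register_delay


if __name__ == "__main__":
    delays = [60, 70, 100, 80]   # hypothetical segment delays in ns
    tr = 10                      # hypothetical register delay in ns
    tp = clock_cycle(delays, tr)
    tn = nonpipeline_delay(delays, tr)
    # Limiting speedup for a long stream of tasks is tn / tp
    print(tp, tn, round(tn / tp, 2))
```

With these assumed delays tp is 110 ns and tn is 320 ns, so the limiting speedup tn/tp is about 2.9 rather than the theoretical maximum of 4 for a four-segment pipeline.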