• Pipeline processing can occur not only in the data stream but in the instruction stream as well.
  • Consider a computer with an instruction fetch unit and an instruction execution unit designed to provide a two-segment pipeline.
  • Computers with complex instructions require additional phases beyond instruction fetch and execute to process an instruction completely.
  • In the most general case, the computer needs to process each instruction with the following sequence of steps.
    • Fetch the instruction from memory.
    • Decode the instruction.
    • Calculate the effective address.
    • Fetch the operands from memory.
    • Execute the instruction.
    • Store the result in the proper place.
  • There are certain difficulties that will prevent the instruction pipeline from operating at its maximum rate.
    • Different segments may take different times to operate on the incoming information.
    • Some segments are skipped for certain operations.
    • Two or more segments may require memory access at the same time, causing one segment to wait until another is finished with the memory.
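
  • The Python sketch below illustrates the first difficulty with hypothetical segment delays: the common pipeline clock must accommodate the slowest segment, so the faster segments sit partly idle in every cycle. The delay values are illustrative only.

      # Hypothetical segment delays in nanoseconds (illustrative values).
      segment_delays = {"fetch": 40, "decode": 30, "operand fetch": 45, "execute": 50}

      # The slowest segment sets the clock period for the whole pipeline.
      clock_period = max(segment_delays.values())
      print(f"clock period = {clock_period} ns (set by the slowest segment)")
      for segment, delay in segment_delays.items():
          print(f"{segment:>13}: idle {clock_period - delay} ns in every cycle")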

Example: Four-Segment Instruction Pipeline

  • Assume that:
    • The decoding of the instruction can be combined with the calculation of the effective address into one segment.
    • The instruction execution and storing of the result can be combined into one segment.
  • Fig 4-7 shows how the instruction cycle in the CPU can be processed with a four-segment pipeline.
    • Thus up to four suboperations in the instruction cycle can overlap and up to four different instructions can be in progress at the same time.
  • An instruction in the sequence may cause a branch out of the normal sequence.
    • In that case the pending operations in the last two segments are completed and all information stored in the instruction buffer is deleted.
    • Similarly, an interrupt request will cause the pipeline to empty and start again from a new address value.

                       

                                     Fig 4-7: Four-segment CPU pipeline

  • Fig 4-8 shows the operation of the instruction pipeline.

                     

                                                                Fig 4-8: Timing of Instruction Pipeline

                              

  • FI: the segment that fetches an instruction
  • DA: the segment that decodes the instruction and calculates the effective address
  • FO: the segment that fetches the operand
  • EX: the segment that executes the instruction
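
  • A minimal Python sketch of the Fig 4-8 style timing diagram is given below. It assumes ideal conditions (every segment takes one clock cycle, no conflicts); the instruction count is illustrative.

      # Space-time diagram for a four-segment pipeline under ideal conditions:
      # instruction i enters FI in cycle i + 1 and advances one segment per cycle,
      # so up to four instructions are in progress at the same time.
      SEGMENTS = ["FI", "DA", "FO", "EX"]
      NUM_INSTRUCTIONS = 6

      total_cycles = NUM_INSTRUCTIONS + len(SEGMENTS) - 1   # k + n - 1 cycles in total
      print("cycle:   " + " ".join(f"{c:>3}" for c in range(1, total_cycles + 1)))
      for i in range(NUM_INSTRUCTIONS):
          row = []
          for c in range(1, total_cycles + 1):
              stage = c - 1 - i          # segment occupied by instruction i in cycle c
              row.append(f"{SEGMENTS[stage]:>3}" if 0 <= stage < len(SEGMENTS) else "  .")
          print(f"instr {i + 1}: " + " ".join(row))
  • A branch or an interrupt would discard the instructions fetched behind it and restart this diagonal pattern from the new address, as described for Fig 4-7 above.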

Pipeline Conflicts

  • In general, there are three major difficulties that cause the instruction pipeline to deviate from its normal operation.
    • Resource conflicts caused by access to memory by two segments at the same time.
      • Can be resolved by using separate instruction and data memories.
    • Data dependency conflicts arise when an instruction depends on the result of a previous instruction, but this result is not yet available.
    • Branch difficulties arise from branch and other instructions that change the value of PC.
  • A difficulty that may cause a degradation of performance in an instruction pipeline is due to a possible collision of data or addresses.
    • A data dependency occurs when an instruction needs data that are not yet available.
    • An address dependency may occur when an operand address cannot be calculated because the information needed by the addressing mode is not available.
  • Pipelined computers deal with such data dependency conflicts in a variety of ways.
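
  • A small Python sketch of a data dependency is shown below; the (operation, destination, sources) instruction format is hypothetical and used only for illustration.

      # SUB reads R1 in its operand-fetch segment while ADD, one segment ahead,
      # has not yet produced R1 in its execute segment: a data dependency.
      program = [
          ("ADD", "R1", ["R2", "R3"]),   # R1 <- R2 + R3
          ("SUB", "R4", ["R1", "R5"]),   # R4 <- R1 - R5, needs R1 one cycle too early
      ]

      def has_data_dependency(prev_instr, next_instr):
          _, prev_dest, _ = prev_instr
          _, _, next_sources = next_instr
          return prev_dest in next_sources

      print(has_data_dependency(program[0], program[1]))    # True: stall, forward, or delay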

Data Dependency Solutions

  • Hardware interlocks: an interlock is a circuit that detects instructions whose source operands are destinations of instructions farther up in the pipeline.
    • This approach maintains the program sequence by using hardware to insert the required delays.
  • Operand forwarding: uses special hardware to detect a conflict and then avoid it by routing the data through special paths between pipeline segments.
    • This method requires additional hardware paths through multiplexers as well as the circuit that detects the conflicts.
  • Delayed load: the compiler for such computers is designed to detect a data conflict and reorder the instructions as necessary to delay the loading of the conflicting data by inserting no-operation instructions.
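
  • Below is a minimal compiler-side sketch of the delayed-load idea, assuming a hypothetical (operation, destination, sources) instruction format and a one-cycle load delay. A real compiler would first try to move a useful instruction into the delay slot and fall back to a no-operation only when none is available; this sketch shows only the no-operation fallback.

      NOP = ("NOP", None, [])

      def insert_delayed_load_nops(program):
          # Insert a NOP whenever an instruction reads the destination of the
          # load placed immediately before it (the simplest delayed-load fix).
          scheduled = []
          for instr in program:
              if scheduled:
                  prev_op, prev_dest, _ = scheduled[-1]
                  _, _, sources = instr
                  if prev_op == "LOAD" and prev_dest in sources:
                      scheduled.append(NOP)       # fill the load delay slot
              scheduled.append(instr)
          return scheduled

      program = [
          ("LOAD", "R1", ["A"]),          # R1 <- M[A]
          ("ADD",  "R2", ["R1", "R3"]),   # uses R1 immediately after the load
      ]
      for instr in insert_delayed_load_nops(program):
          print(instr)                    # LOAD, then an inserted NOP, then ADD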

Handling of Branch Instructions

  • One of the major problems in operating an instruction pipeline is the occurrence of branch instructions.
    • An unconditional branch always alters the sequential program flow by loading the program counter with the target address.
    • In a conditional branch, the control selects the target instruction if the condition is satisfied or the next sequential instruction if the condition is not satisfied.
  • Pipelined computers employ various hardware techniques to minimize the performance degradation caused by instruction branching.
  • Prefetch target instruction: prefetch the target instruction in addition to the instruction following the branch. Both are saved until the branch is executed.
  • Branch target buffer (BTB): The BTB is an associative memory included in the fetch segment of the pipeline.
    • Each entry in the BTB consists of the address of a previously executed branch instruction and the target instruction for that branch.
    • It also stores the next few instructions after the branch target instruction.
  • Loop buffer: This is a small, very high-speed register file maintained by the instruction fetch segment of the pipeline.
    • When a program loop is detected, its instructions are stored in the loop buffer, and the loop can then be fetched from the buffer without further memory accesses.
  • Branch prediction: A pipeline with branch prediction uses some additional logic to guess the outcome of a conditional branch instruction before it is executed.
  • Delayed branch: in this procedure, the compiler detects the branch instructions and rearranges the machine language code sequence by inserting useful instructions that keep the pipeline operating without interruptions.
    • A procedure employed in most RISC processors.
    • e.g., a no-operation instruction is inserted when no useful instruction can be found.
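
  • A minimal Python sketch of the branch prediction idea is given below, using one two-bit saturating counter per branch address; the counter table, thresholds, and the branch address are illustrative rather than a description of any particular hardware.

      # Counter values 0-1 predict "not taken", 2-3 predict "taken".
      counters = {}

      def predict(branch_addr):
          return counters.get(branch_addr, 1) >= 2          # True means "predict taken"

      def update(branch_addr, taken):
          c = counters.get(branch_addr, 1)
          counters[branch_addr] = min(c + 1, 3) if taken else max(c - 1, 0)

      # A loop branch that is almost always taken is mispredicted only a few
      # times before the counter saturates, so the fetch segment can usually
      # continue from the correct target without emptying the pipeline.
      for outcome in [True, True, True, True, False]:
          guess = predict(0x100)
          update(0x100, outcome)
          print(f"predicted {'taken' if guess else 'not taken'}, "
                f"actually {'taken' if outcome else 'not taken'}")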