  • In many science and engineering applications, problems can be formulated in terms of vectors and matrices that lend themselves to vector processing.
  • Computers with vector processing capabilities are in demand in specialized applications such as:
    • Long-range weather forecasting
    • Petroleum explorations
    • Seismic data analysis
    • Medical diagnosis
    • Artificial intelligence and expert systems
    • Image processing
    • Mapping the human genome
  • To achieve the required level of high performance, it is necessary to use the fastest and most reliable hardware and to apply innovative procedures from vector and parallel processing techniques.

Vector Operations


  • Many scientific problems require arithmetic operations on large arrays of numbers.
  • A vector is an ordered, one-dimensional array of data items.
  • A vector V of length n is represented as a row vector by V = [v1, v2, …, vn].
  • To examine the difference between a conventional scalar processor and a vector processor, consider the following Fortran DO loop:

                 DO 20 I = 1, 100
              20 C(I) = B(I) + A(I)

  • This is implemented in machine language by the following sequence of operations.

              Initialize I = 0
           20 Read A(I)
              Read B(I)
              Store C(I) = A(I) + B(I)
              Increment I = I + 1
              If I <= 100 go to 20
              Continue

  • A computer capable of vector processing eliminates the overhead associated with the time it takes to fetch and execute the instructions in the program loop, allowing the operation to be specified by a single vector instruction of the form:

                                   C(1:100) = A(1:100) + B(1:100)
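  • As a sketch only (array names and sample operand values are illustrative), the following Fortran program shows both forms of the computation: the explicit DO loop, which incurs the per-iteration fetch, increment, and branch overhead described above, and the single array assignment that a vector processor can execute as one vector instruction.

        program vector_add
          implicit none
          integer, parameter :: n = 100
          real :: a(n), b(n), c(n)
          integer :: i

          a = 1.0                          ! illustrative operand values
          b = 2.0

          ! Scalar form: one element per pass through the loop,
          ! plus the loop-control overhead on every iteration
          do i = 1, n
             c(i) = b(i) + a(i)
          end do

          ! Vector form: the whole operation as one array statement,
          ! corresponding to C(1:100) = A(1:100) + B(1:100)
          c(1:n) = b(1:n) + a(1:n)

          print *, c(1), c(n)
        end program vector_add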

  • A possible instruction format for a vector instruction is shown in Fig 4-11.
    • This assumes that the vector operands reside in memory.
  • It is also possible to design the processor with a large number of registers and store all operands in registers prior to the addition operation.
    • In that case, the base address and length in the vector instruction specify a group of CPU registers.


                                                    Fig 4-11: Instruction format for vector processor
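  • As a rough sketch only (the field names and example values are assumptions rather than the exact layout of Fig 4-11), a memory-to-memory vector instruction can be modeled as a record holding an operation code, the base addresses of the two source vectors and the destination, and the vector length:

        program vector_instruction_demo
          implicit none

          ! Hypothetical encoding of a vector instruction whose operands
          ! reside in memory; the fields follow the description above
          ! (base addresses plus a length), but names and widths are assumed.
          type :: vector_instruction
             integer :: opcode      ! operation to perform (e.g. 1 = add)
             integer :: base_src1   ! base address of first source vector
             integer :: base_src2   ! base address of second source vector
             integer :: base_dst    ! base address of destination vector
             integer :: length      ! number of elements to process
          end type vector_instruction

          type(vector_instruction) :: vadd

          ! One instruction describing C(1:100) = A(1:100) + B(1:100)
          vadd = vector_instruction(1, 1000, 2000, 3000, 100)
          print *, 'opcode =', vadd%opcode, ' length =', vadd%length
        end program vector_instruction_demo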

Matrix Multiplication


  • The multiplication of two n × n matrices consists of n² inner products or n³ multiply-add operations.
    • Consider, for example, the multiplication of two 3 x 3 matrices A and B.
    • c11 = a11b11 + a12b21 + a13b31
    • This requires three multiplications and (after initializing c11 to 0) three additions.
  • In general, the inner product consists of the sum of k product terms of the form C = A1B1 + A2B2 + A3B3 + … + AkBk.
    • In a typical application k may be equal to 100 or even 1000.
  • The inner product calculation on a pipeline vector processor is shown in Fig 4-12.

C = A1B1 + A5B5 + A9B9   + A13B13 + ...
  + A2B2 + A6B6 + A10B10 + A14B14 + ...
  + A3B3 + A7B7 + A11B11 + A15B15 + ...
  + A4B4 + A8B8 + A12B12 + A16B16 + ...

                                Fig 4-12: Pipeline for calculating an inner product
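  • The same idea can be mimicked in software; a minimal Fortran sketch, assuming a four-segment pipeline as in Fig 4-12, keeps four partial sums (one per segment) so that successive product terms go to different accumulators, and the four partial sums are added at the end. Each element of a matrix product is one such inner product with k equal to the matrix dimension.

        program inner_product_demo
          implicit none
          integer, parameter :: k = 100      ! number of product terms
          real :: a(k), b(k), s(4), c
          integer :: i

          a = 1.0                            ! illustrative operand values
          b = 2.0

          ! Four partial sums, mirroring the four interleaved groups of
          ! terms in Fig 4-12: s(j) accumulates terms j, j+4, j+8, ...
          s = 0.0
          do i = 1, k
             s(mod(i - 1, 4) + 1) = s(mod(i - 1, 4) + 1) + a(i) * b(i)
          end do

          ! The four partial sums are then combined into the inner product
          c = s(1) + s(2) + s(3) + s(4)
          print *, 'inner product =', c
        end program inner_product_demo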

Memory Interleaving


  • Pipeline and vector processors often require simultaneous access to memory from two or more sources.
    • An instruction pipeline may require the fetching of an instruction and an operand at the same time from two different segments.
    • An arithmetic pipeline usually requires two or more operands to enter the pipeline at the same time.
  • Instead of using two memory buses for simultaneous access, the memory can be partitioned into a number of modules connected to common address and data buses.
    • A memory module is a memory array together with its own address and data registers.
  • Fig 4-13 shows a memory unit with four modules.


                                                       Fig 4-13: Multiple module memory organization

  • The advantage of a modular memory is that it allows the use of a technique called interleaving.
  • In an interleaved memory, different sets of addresses are assigned to different memory modules.
  • By staggering the memory access, the effective memory cycle time can be reduced by a factor close to the number of modules.
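  • A small Fortran sketch of one common mapping (low-order interleaving, which is an assumption here; the figure only shows the modular organization): the low-order address bits select the module, so consecutive addresses fall in different modules and their accesses can be staggered and overlapped.

        program interleave_demo
          implicit none
          integer, parameter :: modules = 4
          integer :: addr, module_id, word_in_module

          ! Low-order interleaving across four modules: consecutive
          ! addresses map to consecutive modules, spreading a sequential
          ! access stream over all of them.
          do addr = 0, 7
             module_id      = mod(addr, modules)   ! which module holds the word
             word_in_module = addr / modules       ! position of the word inside its module
             print '(a,i2,a,i1,a,i1)', 'address ', addr, ' -> module ', &
                  module_id, ', word ', word_in_module
          end do
        end program interleave_demo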

Supercomputers


  • A commercial computer with vector instructions and pipelined floating-point arithmetic operations is referred to as a supercomputer.
    • To speed up the operation, the components are packed tightly together to minimize the distance that the electronic signals have to travel.
  • The standard scalar instruction set is augmented by instructions that process vectors and combinations of scalars and vectors.
  • A supercomputer is a computer system best known for its high computational speed, fast and large memory systems, and the extensive use of parallel processing.
    • It is equipped with multiple functional units and each unit has its own pipeline configuration.
  • It is specifically optimized for the type of numerical calculations involving vectors and matrices of floating-point numbers.
  • Its use is limited to a number of scientific applications, such as numerical weather forecasting, seismic wave analysis, and space research.
  • A measure used to evaluate a computer's ability to perform floating-point operations per second is referred to as flops.
  • A typical supercomputer has a basic cycle time of 4 to 20 ns.
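    • For example, a pipelined unit that completes one floating-point result every 10-ns cycle delivers 100 million floating-point operations per second (100 megaflops) from that unit alone; multiple functional units operating in parallel raise the total further.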
  • Examples of supercomputers:
    • Cray-1: uses vector processing with 12 distinct functional units operating in parallel and a large number of registers (over 150); later multiprocessor configurations include the Cray X-MP and Cray Y-MP.
    • Fujitsu VP-200: 83 vector instructions and 195 scalar instructions; a peak rate of 300 megaflops.