Vector Processor

A vector processor, or array processor, is a central processing unit (CPU) design that can run mathematical operations on multiple data elements simultaneously. This contrasts with a scalar processor, which handles one element at a time. The vast majority of CPUs are scalar (or close to it). Vector processors were common in scientific computing, where they formed the basis of most supercomputers through the 1980s and into the 1990s, but general improvements in processor performance and design saw the near disappearance of the vector processor as a general-purpose CPU.

Today most commodity CPU designs include some vector processing instructions, typically known as SIMD (Single Instruction, Multiple Data). Common examples include Streaming SIMD Extensions (SSE) and AltiVec. Modern video game consoles and graphics cards rely heavily on vector processing in their architecture. In 2000, IBM, Toshiba and Sony collaborated to create the Cell microprocessor, consisting of one scalar processor and eight vector processors, for the Sony PlayStation 3.

History

Vector processing was first worked on in the early 1960s at Westinghouse Electric Corporation in their Solomon project. Solomon's goal was to dramatically increase math performance by using a large number of simple coprocessors (arithmetic logic units, or ALUs) under the control of a single master central processing unit (CPU). The CPU fed a single common instruction to all of the ALUs, one per "cycle", but with a different data point for each one to work on. This allowed the Solomon machine to apply a single algorithm to a large data set, fed in the form of an array. In 1962 Westinghouse cancelled the project, but the effort was restarted at the University of Illinois at Urbana-Champaign as the ILLIAC IV. Their version of the design originally called for a 1 GFLOPS machine with 256 ALUs, but when it was finally delivered in 1972 it had only 64 ALUs and could reach only 100 to 150 MFLOPS. Nevertheless it showed that the basic concept was sound, and when used on data-intensive applications, such as computational fluid dynamics, the "failed" ILLIAC was the fastest machine in the world. The ILLIAC approach of using a separate ALU for each data element is not common to later designs, and is often placed in a separate category, massively parallel computing.

The first successful implementations of vector processing appear to be the CDC STAR-100 and the Texas Instruments Advanced Scientific Computer (ASC). The basic ASC (i.e., "one pipe") ALU used a pipeline architecture that supported both scalar and vector computations, with peak performance reaching approximately 20 MFLOPS, readily achieved when processing long vectors. Expanded ALU configurations supported "two pipes" or "four pipes" with a corresponding 2X or 4X performance gain. Memory bandwidth was sufficient to support these expanded modes. The STAR was otherwise slower than Control Data Corporation's own supercomputers such as the CDC 7600, but at data-related tasks it could keep up while being much smaller and less expensive. However, the machine also took considerable time decoding the vector instructions and getting ready to run the process, so it required very specific data sets to work on before it actually sped anything up.

The vector technique was first fully exploited in the famous Cray-1. Instead of leaving the data in memory like the STAR and ASC, the Cray design had eight vector registers, each of which held sixty-four 64-bit words. The vector instructions were applied between registers, which is much faster than talking to main memory. In addition, the design had completely separate pipelines for different instructions; for example, addition and subtraction were implemented in different hardware than multiplication. This allowed a batch of vector instructions themselves to be pipelined, a technique Cray called vector chaining. The Cray-1 normally had a performance of about 80 MFLOPS, but with up to three chains running it could peak at 240 MFLOPS, a respectable number even today.

Other examples followed. Control Data Corporation tried to re-enter the high-end market with its ETA-10 machine, but it sold poorly and the company took that as an opportunity to leave the supercomputing field entirely. Various Japanese companies (Fujitsu, Hitachi, Ltd. and Nippon Electric Corporation) introduced register-based vector machines similar to the Cray-1, typically slightly faster and much smaller. Oregon-based Floating Point Systems (FPS) built add-on array processors for minicomputers, and later built its own minisupercomputers. However, Cray continued to be the performance leader, continually beating the competition with a series of machines that led to the Cray-2, Cray X-MP and Cray Y-MP. Since then the supercomputer market has focused much more on massively parallel processing than on better implementations of vector processors. However, recognizing the benefits of vector processing, IBM developed ViVA for use in supercomputers, coupling several scalar processors to act as a vector processor.

Today the average computer at home crunches as much data watching a short QuickTime video as did all of the supercomputers in the 1970s. Vector processor elements have since been added to almost all modern CPU designs, although they are typically referred to as SIMD. In these implementations the vector processor runs beside the main scalar (computing) CPU, and is fed data from programs that know it is there.

Description

In general terms, CPUs are able to manipulate one or two pieces of data at a time. For instance, many CPUs have an instruction that essentially says "add A to B and put the result in C," while others such as the MOS 6502 require two or three instructions to perform these types of operations.

The data for A, B and C could, in theory at least, be encoded directly into the instruction. However, things are rarely that simple. In general the data is rarely sent in raw form, and is instead "pointed to" by passing in an address to a memory location that holds the data. Decoding this address and getting the data out of the memory takes some time. As CPU speeds have increased, this memory latency has historically become a large impediment to performance.

To reduce the amount of time this takes, most modern CPUs use a technique known as instruction pipelining, in which the instructions pass through several sub-units in turn. The first sub-unit reads the address and decodes it, the next "fetches" the values at those addresses, and the next does the math itself. With pipelining the "trick" is to start decoding the next instruction even before the first has left the CPU, in the fashion of an assembly line, so the address decoder is constantly in use. Any particular instruction takes the same amount of time to complete, a time known as the latency, but the CPU can process an entire batch of operations much faster than if it did so one at a time.
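The assembly-line effect can be sketched with a small cycle-count model. This is only an idealized illustration: it assumes every stage takes exactly one cycle and that the pipeline never stalls, and the function name `pipeline_cycles` is made up for this example.

```python
def pipeline_cycles(n_instructions, n_stages):
    """Return (unpipelined, pipelined) cycle counts under an idealized model.

    Assumptions (not from any real CPU): one cycle per stage, no stalls.
    """
    if n_instructions <= 0 or n_stages <= 0:
        raise ValueError("both arguments must be positive")
    # Without pipelining, each instruction passes through every stage
    # before the next instruction is allowed to start.
    unpipelined = n_instructions * n_stages
    # With pipelining, the first instruction needs n_stages cycles to
    # finish; every later instruction finishes one cycle after the
    # instruction ahead of it.
    pipelined = n_stages + (n_instructions - 1)
    return unpipelined, pipelined


if __name__ == "__main__":
    # Ten instructions through the three sub-units described above
    # (decode, fetch, math).
    print(pipeline_cycles(10, 3))
```

In this model the latency of a single instruction is still three cycles in both cases; only the time for the whole batch shrinks.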

Vector processors take this concept one step further. Instead of pipelining just the instructions, they also pipeline the data itself. They are fed instructions that say not just to add A to B, but to add all of the numbers "from here to here" to all of the numbers "from there to there". Instead of constantly having to decode instructions and then fetch the data needed to complete them, the processor reads a single instruction from memory and "knows" that the next address will be one larger than the last. This allows for significant savings in decoding time.

To illustrate what a difference this can make, consider the simple task of adding two groups of 10 numbers together. In a normal programming language you would write a "loop" that picked up each of the pairs of numbers in turn, and then added them. To the CPU, this would look something like this:

    read the next instruction and decode it
    fetch this number
    fetch that number
    add them
    put the result here
    read the next instruction and decode it
    fetch this number
    fetch that number
    add them
    put the result there

and so on, repeating the base command 10 times over.

But to a vector processor, this task looks considerably different:

    read instruction and decode it
    fetch these 10 numbers
    fetch those 10 numbers
    add them
    put the results here

There are several savings inherent in this approach. For one, only two address translations are needed. Depending on the architecture, this can represent a significant saving in and of itself. Another saving is in fetching and decoding the instruction itself, which only has to be done one time instead of ten. The code itself is also smaller, which can lead to more efficient memory use.
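The difference in instruction decodes can be made concrete with a toy model. The sketch below does not simulate any real machine; it simply counts how many times an instruction would be fetched and decoded in each style, and the names `scalar_add` and `vector_add` are invented for the illustration.

```python
def scalar_add(a, b):
    """Add element pairs one at a time, as a scalar loop would."""
    result = []
    decodes = 0
    for x, y in zip(a, b):
        decodes += 1              # one instruction fetched and decoded per pair
        result.append(x + y)
    return result, decodes


def vector_add(a, b):
    """Add all pairs under a single (modelled) vector instruction."""
    decodes = 1                   # the instruction is fetched and decoded once
    result = [x + y for x, y in zip(a, b)]
    return result, decodes


if __name__ == "__main__":
    a = list(range(10))
    b = list(range(10, 20))
    print(scalar_add(a, b)[1], "decodes in the scalar loop")
    print(vector_add(a, b)[1], "decode for the vector instruction")
```

Both functions produce the same sums; only the modelled decode count differs, ten for the loop against one for the vector form.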

But more than that, the vector processor typically has some form of superscalar implementation, meaning there is not one part of the CPU adding up those 10 numbers, but perhaps two or four of them. Since the output of a vector command does not rely on the input from any other, those two (for instance) parts can each add five of the numbers, thereby completing the whole operation in half the time.
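A rough way to see the effect of several adders working side by side is to count time steps when the elements are shared out between them. The sketch below is an assumption-laden model, not a description of real hardware: `lanes` stands for the number of adders, each step is assumed to cost the same, and the elements are handed out in groups rather than as two blocks of five, which gives the same step count.

```python
def vector_add_lanes(a, b, lanes):
    """Add two equal-length lists, modelling `lanes` adders working in step.

    Returns (result, steps), where steps is the number of modelled time
    steps. Each step processes up to `lanes` independent elements.
    """
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    result = [None] * len(a)
    steps = 0
    for start in range(0, len(a), lanes):
        # The elements in this group do not depend on one another, so
        # separate adders could handle them during the same step.
        for i in range(start, min(start + lanes, len(a))):
            result[i] = a[i] + b[i]
        steps += 1
    return result, steps


if __name__ == "__main__":
    a = list(range(10))
    b = list(range(10, 20))
    for lanes in (1, 2, 4):
        print(lanes, "lane(s):", vector_add_lanes(a, b, lanes)[1], "steps")
```

With ten elements, two lanes halve the step count from ten to five, matching the example in the text; four lanes need three steps because the last group is only partly filled.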

As mentioned earlier, the Cray implementations took this a step further, allowing several different types of operations to be carried out at the same time. Consider code that adds two numbers and then multiplies by a third; in the Cray these would all be fetched at once, and both added and multiplied in a single operation. Using the pseudocode above, the Cray essentially did:

    read instruction and decode it
    fetch these 10 numbers
    fetch those 10 numbers
    fetch another 10 numbers
    add and multiply them
    put the results here

The math operations thus completed much faster, the limiting factor being the memory accesses.
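The idea of feeding each sum straight into the multiplier can be loosely imitated in software with lazy generators. This is an analogy only, not how the Cray hardware was built: the function name `chained_add_multiply` is hypothetical, and Python generators merely mimic the way each result is passed on as soon as it is produced, without storing an intermediate vector.

```python
def chained_add_multiply(a, b, c):
    """Compute (a[i] + b[i]) * c[i] for every i, passing each sum on at once."""
    # Stage 1: the "adder" yields sums lazily, one element at a time.
    sums = (x + y for x, y in zip(a, b))
    # Stage 2: the "multiplier" consumes each sum as soon as it exists,
    # so no complete intermediate list of sums is ever built.
    products = (s * z for s, z in zip(sums, c))
    return list(products)


if __name__ == "__main__":
    print(chained_add_multiply([1, 2, 3], [4, 5, 6], [2, 2, 2]))
```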

Not all problems can be attacked with this sort of solution. Adding these sorts of instructions adds complexity to the core CPU. That complexity typically makes other instructions slower, i.e., whenever the CPU is not adding up ten numbers in a row. The more complex instructions also add to the complexity of the decoders, which might slow down the decoding of more common instructions such as normal adding.

In fact, vector instructions work best only when there are large amounts of data to work on. This is why these sorts of CPUs were found primarily in supercomputers, as the supercomputers themselves were found in places like weather prediction centers and physics labs, where huge amounts of data exactly like this are "crunched".
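Why large data sets matter can be sketched with a simple cost model in which a vector instruction pays a fixed startup cost before it begins producing results. All of the numbers and names below (`startup`, `per_element`, `break_even_length`) are hypothetical and chosen only to show the shape of the trade-off, not to describe any particular machine.

```python
import math


def vector_time(n, startup, lanes=1):
    """Modelled time for a vector operation: fixed startup plus one step per group."""
    return startup + math.ceil(n / lanes)


def scalar_time(n, per_element):
    """Modelled time for a scalar loop: a fixed cost for every element."""
    return n * per_element


def break_even_length(startup, per_element, lanes=1, limit=1_000_000):
    """Return the shortest vector length at which the vector form is faster.

    Returns None if no such length is found up to `limit`.
    """
    for n in range(1, limit + 1):
        if vector_time(n, startup, lanes) < scalar_time(n, per_element):
            return n
    return None


if __name__ == "__main__":
    # Hypothetical costs: 50 time units of startup, 5 units per scalar element.
    print(break_even_length(startup=50, per_element=5))
```

Under these made-up costs, short vectors are slower than the plain loop, and the vector form only wins once the data set is long enough to amortize the startup cost.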

Copyright © 2008 opini8.com - All rights reserved.
All trademarks belong to their respective owners.
Many aspects of this page are used under
commercial commons license from Yahoo!