Evaluation of MIMD Vs SIMD Architecture

Paper Type: Free Essay Subject: Computer Science
Wordcount: 4235 words Published: 8th Feb 2020



     In this paper, I discuss the advantages and disadvantages of the SIMD and MIMD architectures. Single Instruction, Multiple Data (SIMD) and Multiple Instruction, Multiple Data (MIMD) each have distinctive features, which are examined in detail below. Flynn’s taxonomy classifies parallel computer architectures by the number of concurrent instruction streams and data streams. The taxonomy has four categories, but this paper covers only SIMD and MIMD. The goal is to evaluate both and decide which is the better architecture. I have created a set of criteria for that decision: the features of each design, the architecture of each design, and how well each design works as a parallel component.


     “Michael J. Flynn created one of the earliest classifications for parallel computers and programs. Flynn’s taxonomy classified programs and computers whether they are operating using a single set or multiple sets of instructions.” These instructions operate on a single set of data or on multiple sets of data. Flynn’s taxonomy defines four classifications: SISD, SIMD, MISD, and MIMD. They are based on the number of concurrent instruction streams and data streams available in the architecture. This evaluation focuses on SIMD and MIMD. SIMD stands for single instruction, multiple data. It applies the same operation, such as retrieving, calculating, or storing information, to many pieces of data. Signal processing is the most common application of SIMD. An example of a SIMD operation is retrieving multiple files at the same time. MIMD stands for multiple instruction, multiple data. It performs different operations simultaneously on different pieces of data, and it is the most common type of parallel program. An example of MIMD is carrying out different mathematical calculations, such as an addition and a multiplication, at the same time.
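The difference between the two classes can be sketched in a few lines of Python. This is only a model of the programming pattern, not of real hardware: the function names `simd_style` and `mimd_style` are hypothetical, and the list comprehension stands in for lanes that a real SIMD machine would execute in lockstep.

```python
from concurrent.futures import ThreadPoolExecutor


def simd_style(data, op):
    """One instruction stream: the same operation is applied to every element."""
    return [op(x) for x in data]


def mimd_style(tasks):
    """Multiple instruction streams: each worker runs its own operation on its own data."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(op, *args) for op, args in tasks]
        return [f.result() for f in futures]


# Same operation (doubling) on many data elements.
doubled = simd_style([1, 2, 3, 4], lambda x: x * 2)

# Different operations (an addition and a multiplication) on different data.
mixed = mimd_style([
    (lambda a, b: a + b, (2, 3)),
    (lambda a, b: a * b, (4, 5)),
])
```

Here `doubled` holds the result of one operation over four values, while `mixed` holds the results of two unrelated operations that ran as separate tasks.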

     To evaluate the two parallel architectures, we need to look at where the SIMD and MIMD designs overlap and where they differ, and we need to evaluate the features of each. One part of the evaluation is weighing the risks and non-risks of each system, which range from security to performance issues. The architecture with the fewest risks is the more secure architecture. We must also measure the latency of each system. Here, latency means the execution time of a program run. An architecture whose instructions run faster, and still accurately, has better latency. Latency relates to throughput, which is the number of tasks completed per unit time. Cost matters as well: a more affordable and efficient architecture lets us run programs more efficiently for less money. The three main qualities used to evaluate an architecture are usability, reliability, and modifiability. Usability is the elegance and clarity with which a user interacts with a computer program. Reliability is the assurance that the program will consistently perform according to its specifications. Modifiability is whether the design can be modified or changed and still be implemented in the system.
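As a rough illustration of the two timing measures defined above, the sketch below computes throughput as tasks per unit time and measures the mean latency of a task. The helper names `throughput` and `measure` are assumptions made for this example, not part of any benchmark suite.

```python
import time


def throughput(tasks_completed, elapsed_seconds):
    """Throughput: number of tasks completed per unit time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return tasks_completed / elapsed_seconds


def measure(task, inputs):
    """Run task on every input; return (mean latency, total elapsed), both in seconds."""
    latencies = []
    start = time.perf_counter()
    for item in inputs:
        t0 = time.perf_counter()
        task(item)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return sum(latencies) / len(latencies), elapsed
```

For instance, 100 tasks finished in 4 seconds give a throughput of 25 tasks per second, regardless of how long any single task took.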


     SIMD and MIMD architectures differ in many features. For starters, a SIMD architecture runs a single program whose processing elements operate synchronously. Processing elements improve performance and reduce the number of memory ports: ports are not dedicated to specific functional units, and data paths forward results from functional unit outputs directly to other functional unit inputs. Regarding program memory requirements, only one copy of the program is stored. SIMD also has a lower instruction decoding cost because there is only one decoder, located in the control unit. The architecture is simple and its cost is low. It is scalable in size and in performance. Conditional statements depend on data local to each processor: all instructions of the then block must be broadcast, followed by all instructions of the else block. Synchronization is implicit in the program, so all “send” and “receive” operations are synchronized automatically. Lastly, the total execution time equals the sum of the maximal execution times across all processors.
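The way a SIMD machine handles a conditional can be sketched as two broadcast passes controlled by a per-element mask. This is a simplified model, assuming one data element per processing element; the function name `simd_if_else` is hypothetical.

```python
def simd_if_else(data, predicate, then_op, else_op):
    """Model of SIMD conditional execution with a per-lane mask."""
    # Each lane evaluates the condition on its own local data.
    mask = [predicate(x) for x in data]
    out = list(data)

    # Pass 1: the then block is broadcast; lanes with a false mask sit idle.
    for lane, active in enumerate(mask):
        if active:
            out[lane] = then_op(data[lane])

    # Pass 2: the else block is broadcast; lanes with a true mask sit idle.
    for lane, active in enumerate(mask):
        if not active:
            out[lane] = else_op(data[lane])

    return out


# Double the non-negative values, negate the negative ones.
result = simd_if_else([1, -2, 3, -4], lambda x: x >= 0, lambda x: x * 2, lambda x: -x)
```

Both passes always run, so the cost of the conditional is the cost of the then block plus the cost of the else block, even if only one lane takes a given branch.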

     In a MIMD architecture, multiple communicating programs and processing elements operate asynchronously. Each processing element stores its own program and has its own decoder. MIMD is more complex and more costly than SIMD. A MIMD machine is also larger, but it delivers much better performance. Because MIMD uses multiple instruction streams, it executes conditional statements more efficiently: each processor can independently follow either decision path. MIMD requires explicit data structures and explicit synchronization and identification protocols. Lastly, the total execution time equals the maximum execution time on any single processor.
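The two execution-time rules can be compared with a small worked example. The sketch assumes a table `times[p][i]` giving the time processor `p` needs for instruction `i`; the numbers are made up for illustration.

```python
def simd_total_time(times):
    """SIMD: every instruction waits for the slowest processor, so sum the per-instruction maxima."""
    num_instructions = len(times[0])
    return sum(max(row[i] for row in times) for i in range(num_instructions))


def mimd_total_time(times):
    """MIMD: processors run independently, so the total is the slowest processor's own sum."""
    return max(sum(row) for row in times)


# Rows are processors, columns are instructions (illustrative values only).
times = [
    [1, 4, 1],
    [3, 1, 1],
    [1, 1, 5],
]

simd_time = simd_total_time(times)  # 3 + 4 + 5
mimd_time = mimd_total_time(times)  # max(6, 5, 7)
```

With these values the SIMD total is 12 and the MIMD total is 7. The gap comes entirely from instructions whose duration varies between processors; if every entry in a column were equal, the two totals would match.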


     The architecture of a SIMD design has several parts. To start, a loop controller generates the loop control signals needed to complete long vector operations. Here the loop controller is the hardware and control logic that steps through a long vector, issuing the operation repeatedly until every element has been processed. SIMD also uses functional units to perform vector operations. A functional unit lets a vector processor or an array processor work with the CPU and implement an instruction set whose instructions operate on one-dimensional arrays. In an array processor, an instruction operates on multiple data elements at the same time. In a vector processor, an instruction operates on multiple data elements in consecutive time steps. Vector processors work only on one-dimensional arrays of numbers. Each instruction performs its operation on successive elements in consecutive cycles, so the functional units are pipelined and each stage works on a different data element. One advantage of vector processing in a SIMD architecture is that the pipeline can be very deep. Each instruction generates a lot of work, which reduces the instruction fetch bandwidth required. Vector processors also have highly regular memory access patterns, so memory can be interleaved across multiple banks for higher bandwidth. A disadvantage is that vector processors exploit parallelism only in regular vector operations and become very inefficient when the parallelism is irregular.

     A smart compiler is used to vectorize these instructions. A compiler is a special program that processes statements written in a programming language and turns them into machine language that the processor can execute. The SIMD design is efficient at executing arithmetic-intensive programs. However, SIMD architectures suffer from data alignment problems, and the extra time overhead hinders automatic vectorization. A SIMD architecture may use a single-bank, multi-bank, or multi-port memory system. A single-bank memory system supports only aligned accesses. A multi-bank memory system enables unaligned and strided accesses, subject to bank conflicts. Finally, a multi-port memory system supports both unaligned and strided accesses without these limitations.
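To make the alignment and bank-conflict terms concrete, the following sketch models a multi-bank memory. It assumes low-order interleaving, where the bank of an element is its address modulo the number of banks; real memory systems may map addresses differently, and all function names here are hypothetical.

```python
def is_aligned(start, vector_width):
    """An access is aligned when it starts on a multiple of the vector width."""
    return start % vector_width == 0


def banks_touched(start, stride, count, num_banks):
    """Bank index of each element, assuming bank = address mod num_banks."""
    return [(start + i * stride) % num_banks for i in range(count)]


def has_bank_conflict(start, stride, count, num_banks):
    """A conflict occurs when two elements of one access fall in the same bank."""
    banks = banks_touched(start, stride, count, num_banks)
    return len(set(banks)) < len(banks)


aligned_ok = has_bank_conflict(0, 1, 4, 4)    # banks 0, 1, 2, 3
unaligned_ok = has_bank_conflict(1, 1, 4, 4)  # banks 1, 2, 3, 0
strided = has_bank_conflict(0, 2, 4, 4)       # banks 0, 2, 0, 2
```

Under this model a unit-stride access is conflict-free whether or not it is aligned, while a stride of 2 over 4 banks maps pairs of elements to the same bank, which is the bank-conflict limitation mentioned above.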

     In a MIMD architecture, a set of memory modules and a set of processors are grouped together. A memory module is a circuit board containing DRAM integrated circuits that is installed in a memory slot on a computer motherboard. Each memory module is accessed directly by means of an interconnection network. Interconnection networks are a class of high-speed computer networks that link processing elements with memory: the processing elements sit at one end of the network, the memory elements sit at the other, and switching elements connect the two ends. The group of memory modules defines a universal address space that is shared among the processors. Each processing element can communicate with the others by sending messages. Instructions can then operate on any accessible data rather than being forced to operate on a single, shared data stream.

MIMD Architecture in Parallel Computing

     SIMD and MIMD both rely on parallel processing. A MIMD architecture mostly exploits thread-level and process-level parallelism, so multiple threads can be executed in parallel on many computer systems. A program splits itself into two or more simultaneously running tasks. For each process, an address space is allocated and the program is loaded into that space. This form of parallel computing is widely used in integrated circuit technology.

     A breakthrough in MIMD parallel computers was the transputer. The transputer is a parallel microprocessor with built-in scheduling and communication support, which reduces its reliance on a separate operating system. The goal of the transputer was to produce low-cost, low-power chips that each form a complete processor. Most transputers use 32-bit addresses, which gives a 4-gigabyte address space.

     In its simplest form, the MIMD parallel computing design resembles the von Neumann machine: a single processor connected to a single memory module. There are two ways to extend this design to multiple processors and memory modules. The first option is to replicate the processor/memory pair and connect the copies via an interconnection network. Each processor/memory pair is known as a processing element. The processing elements communicate by sending messages to each other across the network. No processing element can directly access the memory module of another processing element; the elements are independent and interact only through messages. The second option is to create a set of processors and a separate set of memory modules. Each processor can then access any memory module via the interconnection network. The set of memory modules defines a global address space that is shared among the processors. This differs from the first option because the processors communicate through the shared memory space rather than through messages. This type of shared memory system is known as a dance-hall system. Shared memory MIMD machines are multiprocessors. Access to local memory is much quicker than access to data on a remote processor. Furthermore, the greater the physical distance to the remote processor, the longer access to the remote data takes.

Types of MIMD Parallel Computing

     The two most prominent types of MIMD parallel computing are distributed memory MIMD and shared memory MIMD. Distributed memory refers to a multiprocessor computer system in which each processor has its own private memory. Such a system needs processors, memory, and some form of interconnection that allows the programs on each processor to interact with each other. A computation task can operate only on data in local memory. Each processor has a local memory module, which is a circuit board containing DRAM integrated circuits. Distributed memory is highly scalable and can be used to build massively parallel computers: the system can process a large amount of work and can accommodate growth. Message passing handles communication and synchronization well. Synchronization lets the design work properly across different machines and processors. Because each processor has local memory and communicates by message passing, there is no need for monitors. However, distributed memory demands special attention to load balancing, because the user is responsible for any problems that arise from it. The user is also responsible for avoiding deadlock and for partitioning code and data among the processing elements. Synchronization can lead to deadlock. In a deadlock, two or more processes each wait for another to release a resource; this can also involve more than two processes waiting for resources in a circular chain. In distributed memory, the user must make sure that processes do not wait for long periods. Because the user is responsible for partitioning the code, the user must make a large code base manageable by breaking it into smaller chunks that can be handled easily. If the user does not do this, the large body of code can have many points of failure and take up a large amount of disk space. Distributed memory also requires that data structures be physically copied among processes.
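The partition-and-message pattern described above can be modeled in Python. This is a sketch under clear assumptions: threads stand in for processing elements, a `queue.Queue` stands in for the interconnection network, and the names `worker` and `distributed_sum` are hypothetical. Python threads do share memory, so the isolation here is by convention only.

```python
import queue
import threading


def worker(local_chunk, outbox):
    """A processing element: computes only on its private chunk, then sends one message."""
    outbox.put(sum(local_chunk))


def distributed_sum(data, num_workers):
    """Partition data among workers and combine their partial results via messages."""
    # The user is responsible for partitioning; a round-robin split balances the load.
    chunks = [data[i::num_workers] for i in range(num_workers)]
    inbox = queue.Queue()

    threads = [threading.Thread(target=worker, args=(chunk, inbox)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Exactly one message per worker arrives in the inbox.
    return sum(inbox.get() for _ in threads)


total = distributed_sum(list(range(1, 101)), 4)
```

Each worker sends exactly one message and never waits on another worker, so no circular wait can form. Designs in which workers also wait for messages from each other need more care to avoid the deadlock described above.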

     In shared memory MIMD parallel computing, there is no need to partition either the code or the data. The shared memory system itself takes care of large programs by splitting them into smaller portions. There is also no need to physically move data when two or more processors communicate: all processors share one memory system, so the information is already in one location. This also means that data can be accessed in the same place where the user composed it. Communication in shared memory is easier to understand because all messages and information are in the same place. However, shared memory lacks scalability, so it is poorly suited to very large programs, and processors have to wait for access rights to memory.
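Waiting for access rights to shared memory can be illustrated with a lock around a shared variable. The sketch below uses Python threads as stand-ins for processors; the function name `shared_counter` is an assumption made for this example.

```python
import threading


def shared_counter(num_threads, increments):
    """Several threads update one shared variable; a lock grants access to one at a time."""
    total = 0
    lock = threading.Lock()

    def work():
        nonlocal total
        for _ in range(increments):
            # Each thread must wait here until it holds the access right.
            with lock:
                total += 1

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


count = shared_counter(4, 1000)
```

No data is copied or sent between threads, since all of them read and write the same location. The cost is contention: the more threads compete for the lock, the longer each one waits, which is one reason shared memory scales poorly.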

SIMD Architecture in Parallel Computing

     In SIMD parallel computing, the main approach is vector processing. A vector processor is a CPU that implements an instruction set containing instructions that operate on one-dimensional arrays. The CPU contains the control unit, the arithmetic logic unit, and registers. The control unit directs the operation of the processor: it tells the computer’s memory, arithmetic logic unit, and input and output devices how to respond to instructions. The arithmetic logic unit handles comparisons and addition, subtraction, multiplication, and division. A register is a small storage location that holds an instruction, a storage address, or any other kind of data sent to the processor. Vectors contain multiple data elements, and the number of data elements per vector is referred to as the vector length. Both the instructions and the data are pipelined to reduce decoding time. The vector processor described here uses a memory-to-memory architecture. For every vector operation, the operands are fetched directly from main memory and routed to the functional unit, and the results are written back to main memory.
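The role of the vector length can be shown with a small model of a memory-to-memory vector add. The sketch assumes that one vector instruction processes at most `vector_length` elements, so a longer array is handled in successive chunks; the function name `vector_add` is hypothetical.

```python
def vector_add(a, b, vector_length):
    """Add two arrays chunk by chunk; return (result, number of vector instructions issued)."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")

    result = []
    instructions = 0
    for start in range(0, len(a), vector_length):
        # Fetch the operands for this chunk directly from "main memory".
        va = a[start:start + vector_length]
        vb = b[start:start + vector_length]
        # One vector instruction adds every element pair in the chunk.
        result.extend(x + y for x, y in zip(va, vb))
        instructions += 1
    return result, instructions


sums, issued = vector_add(list(range(10)), [1] * 10, 4)
```

Ten elements with a vector length of 4 need three vector instructions (4, 4, and 2 elements), compared with ten scalar additions.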

     Because each result in a vector operation is independent of the previous results, vector processing achieves a high clock rate. A single vector instruction performs a great deal of work, which means fewer instruction fetches and fewer branches and, as a result, fewer branch mispredictions. Vector instructions access memory a block at a time, which results in very low memory latency: fewer memory accesses mean a faster processing time. Because a vector program issues fewer instructions than its scalar counterpart, it has a lower cost. However, vector processing works well only with data that can be processed in a highly or completely parallel manner. A vector processor needs large blocks of data to operate on in order to be efficient, even with recent advances in memory access speed. As a result, vector operations perform poorly on scalar data. Individual chips are also expensive because of the limitations of on-chip memory. The need to vectorize the data increases code complexity. Lastly, because individual chips are expensive, vector processors have a high design cost and low returns compared with superscalar microprocessors.

     For each type of memory system described above (distributed, shared, and vector), we must go over the design considerations. To determine the best design, we must decide which memory system reduces message traffic the most and which reduces memory latency the most. One way to show memory latency is with a communication graph of the interconnection network, which may be static or dynamic. A static network uses fixed switching units and point-to-point connections. A dynamic network uses a set of active switching units to set up its connections.


     To evaluate the MIMD and SIMD designs, we must identify the architecture criteria that lead to better performance. The first is the evaluation of the risks and non-risks of each system. Each architecture must also be measured by its latency, so I compare the execution time of the MIMD and SIMD designs. Execution time also relates to throughput, the number of tasks per unit time. Another major factor is the benefits and costs of each architecture. The last three criteria are the usability, reliability, and modifiability of each design.

     In general, a SIMD design is faster, cheaper, smaller, and simpler. SIMD is cheaper because it needs only a single instruction decoder. A SIMD design executes one instruction at a time within a single program: only one copy of the program is stored, and the control unit contains the only decoder. SIMD is typically used for problems that perform the same operation on many data elements in parallel. A MIMD design is generally slower, more expensive, larger, and more complex than SIMD. However, MIMD can execute multiple instructions at the same time and can run multiple programs, so a MIMD design supports multitasking and is capable of far more complex operations. The MIMD architecture is based on duplicating the control unit for each individual processor. MIMD is frequently used for problems whose algorithms break down into separate, independent parts; each part is assigned to a different processor so that all parts are solved simultaneously. MIMD processors can perform complex operations concurrently, while SIMD processors must perform them sequentially. The single instruction stream and implicit synchronization of SIMD make programs easier to create, understand, and debug, since the programmer follows one instruction at a time. Unlike in MIMD, the user does not need to be concerned with the relative timings among the processors, an advantage that becomes prominent in large-scale systems. MIMD is extremely flexible: different operations may be performed on different processors, which means there are multiple threads of control. As a result, MIMD is effective for a much wider range of algorithms than SIMD.

     Regarding synchronization, SIMD processors are synchronized at the instruction level, whereas MIMD requires explicit synchronization primitives such as semaphores. Because MIMD is asynchronous, it achieves a higher effective execution rate for instructions that take a variable amount of time to complete. In SIMD mode, a processor must wait until all other processors have completed an instruction before continuing; MIMD requires no such wait. Also, although a SIMD component by itself may be cheaper, a MIMD machine does not have the added cost of a separate control unit. However, because SIMD has a control unit, control flow instructions and many scalar operations can be overlapped on the control unit, which gives SIMD a performance advantage. SIMD allows easier programming because it has a simple architecture and a synchronous control structure. For this reason, SIMD works best in highly data-parallel applications.
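Explicit synchronization with a semaphore, as required in MIMD programs, might look like the sketch below. Threads stand in for independent processors, and the function name `run_with_semaphore` and the value 42 are illustrative assumptions.

```python
import threading


def run_with_semaphore():
    """A consumer waits on a semaphore until the producer signals that data is ready."""
    ready = threading.Semaphore(0)  # starts at 0, so acquire() blocks until a release()
    shared = {}
    log = []

    def producer():
        shared["value"] = 42        # write the data first
        ready.release()             # then signal that it is available

    def consumer():
        ready.acquire()             # blocks until the producer has signalled
        log.append(shared["value"] + 1)

    c = threading.Thread(target=consumer)
    p = threading.Thread(target=producer)
    c.start()                       # the consumer may start first; it still waits correctly
    p.start()
    c.join()
    p.join()
    return log


outcome = run_with_semaphore()
```

Without the semaphore, the consumer could read `shared["value"]` before it exists, because the relative timing of the two threads is not fixed. A SIMD program has no equivalent hazard, since every processing element reaches each instruction together.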

Which Is the Better Architecture


     MIMD has the advantage in features because it can handle multiple communicating processes and processing elements. Each process can run independently while the processors still communicate with one another. Even though a MIMD architecture is more expensive, it has much better performance, which is crucial for long, complex programs. MIMD also uses multiple instruction streams, which allows more efficient execution of conditional statements because each processor can independently follow either decision path. In a SIMD architecture, by contrast, the processors do not communicate independently, and each processor depends on the others. Even though SIMD is faster, it cannot handle multiple instruction streams or complex instructions.


     The advantage of the MIMD architecture is that multiple threads can be executed in parallel on many computer systems. Each program is loaded into a separate memory space: an address space is allocated, and the program is loaded into it. MIMD is widely used in integrated circuit technology. In its simplest form, this type of architecture resembles the von Neumann machine, with a single processor connected to a single memory module. The SIMD design is efficient at executing arithmetic-intensive programs. However, SIMD architectures suffer from data alignment problems, and the resulting time overhead hinders automatic vectorization. Based on the preceding paragraphs and the points presented here, the conclusion is that MIMD has the better architecture.

     In conclusion, MIMD is the better mode compared with SIMD. MIMD is more expensive but can run much more complex programs. MIMD can also multitask and perform multiple processes at the same time. MIMD is the most basic and most familiar type of parallel processor. A MIMD architecture includes a set of N individual processors. Each processor has its own memory, which can also be made common to all processors. Each processor operates independently and asynchronously, and at any time the processors may be carrying out different instructions on different pieces of data. The two most prominent types of parallel computing both belong to the MIMD architecture: shared memory MIMD and distributed memory MIMD. Shared memory MIMD creates a group of memory modules, while distributed memory MIMD replicates the memory/processor pairs. By giving every processor its own memory, the distributed memory MIMD architecture bypasses the downsides of SIMD; in that design, a processor may access only the memory that is directly connected to it.


  • Berg, Thomas B. “Instruction Execution Trade Off of SIMD vs MIMD.” 12 Nov. 2014, pp. 1–8.
  • Kaur, Mandeep, and Rajkeep Kaur. “A Comparative Analysis of SIMD and MIMD Architectures.” International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 9, Sept. 2013, pp. 1152–1156, ijarcsse.com/Before_August_2017/docs/papers/Volume_3/9_September2013/V3I9-0332.pdf.
  • Pandey, Siddharth. “Computer Architecture | Flynn’s Taxonomy.” GeeksforGeeks, 8 Feb. 2018, www.geeksforgeeks.org/computer-architecture-flynns-taxonomy/.
  • Quinn, Michael J., and Phillip J. Hatcher. “Compiling SIMD Programs for MIMD Architectures.” 3 May 2015, pp. 1–6, www.computer.org/csdl/proceedings/iccl/1990/2036/00/00063785.pdf.

