

A Simple Implementation of DLX using VHDL

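DLX provides a good architectural model for study, not only because of the recent popularity of this type of machine, but also because it is easy to understand. Like most recent load/store machines, DLX emphasizes:
     - a simple load/store instruction set
     - design for pipelining efficiency, including a fixed instruction encoding
     - efficiency as a compiler target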

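Registers for DLX
     - 32 general-purpose registers (GPRs), R0 to R31, each 32 bits wide; the value of R0 is always 0
     - 32 floating-point registers (FPRs), F0 to F31, usable as 32 single-precision registers or as even-odd pairs (F0, F2, ..., F30) holding double-precision values

Data types for DLX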
for integer data
     - 8-bit bytes
     - 16-bit half words
     - 32-bit words
for floating point
     - 32-bit single precision
     - 64-bit double precision
The DLX operations work on 32-bit integers and 32- or 64-bit floating point. Bytes and half words are loaded into registers with either zeros or the sign bit replicated to fill the 32 bits of the registers.
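The two fill rules for loading bytes and half words can be sketched in Python. This is a minimal illustration, assuming register contents are modeled as unsigned 32-bit integers; `zero_extend` and `sign_extend` are illustrative names, not part of any DLX toolchain. In DLX the opcode selects the rule (for example LB versus LBU, or LH versus LHU).

```python
def zero_extend(value: int, bits: int) -> int:
    """Keep the low `bits` bits and fill the rest of the 32-bit register with zeros."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Replicate bit (bits - 1) into the upper bits of a 32-bit register."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    # Convert to a signed Python int, then mask back to the 32-bit register width.
    extended = (value ^ sign_bit) - sign_bit
    return extended & 0xFFFFFFFF


# Loading the byte 0x80 (-128 when read as a signed byte)
print(hex(zero_extend(0x80, 8)))     # 0x80
print(hex(sign_extend(0x80, 8)))     # 0xffffff80
print(hex(sign_extend(0x7FFF, 16)))  # 0x7fff (sign bit clear, upper bits stay 0)
```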

Instruction format
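A sketch of field extraction, assuming the three standard DLX formats as described by Hennessy and Patterson: I-type (6-bit opcode, 5-bit rs1, 5-bit rd, 16-bit immediate), R-type (6-bit opcode, 5-bit rs1, 5-bit rs2, 5-bit rd, 11-bit function field), and J-type (6-bit opcode, 26-bit signed offset added to the PC). The function names `fields` and `to_signed` are hypothetical, and the caller passes the format explicitly because this sketch has no opcode table.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign_bit) - sign_bit


def fields(instr: int, fmt: str) -> dict:
    """Split a 32-bit DLX instruction word into its fields.

    fmt is "I", "R", or "J"; the opcode always occupies the top 6 bits.
    """
    opcode = (instr >> 26) & 0x3F
    if fmt == "I":
        return {"opcode": opcode,
                "rs1": (instr >> 21) & 0x1F,
                "rd": (instr >> 16) & 0x1F,
                "immediate": to_signed(instr & 0xFFFF, 16)}
    if fmt == "R":
        return {"opcode": opcode,
                "rs1": (instr >> 21) & 0x1F,
                "rs2": (instr >> 16) & 0x1F,
                "rd": (instr >> 11) & 0x1F,
                "func": instr & 0x7FF}
    if fmt == "J":
        return {"opcode": opcode,
                "offset": to_signed(instr & 0x3FFFFFF, 26)}
    raise ValueError(f"unknown format {fmt!r}")


# I-type word with an arbitrary opcode 8, rs1 = R2, rd = R1, immediate = -4
word = (8 << 26) | (2 << 21) | (1 << 16) | (-4 & 0xFFFF)
print(fields(word, "I"))  # {'opcode': 8, 'rs1': 2, 'rd': 1, 'immediate': -4}
```

Because every format keeps the opcode in the same position and the register specifiers in fixed slots, the fields can be extracted in parallel before the format is known, which is part of what makes the encoding easy to decode.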


Operations
There are four classes of instructions:
1. Load/Store
     Any of the GPRs or FPRs may be loaded and stored except that loading R0 has no effect.
2. ALU Operations
     All ALU instructions are register-register instructions; immediate forms with a 16-bit sign-extended immediate are also provided.
     The operations are:
     - add
     - subtract
     - AND
     - OR
     - XOR
     - shifts
     Compare instructions compare two registers (=, !=, <, >, <=, >=).
     If the condition is true, these instructions place a 1 in the destination register; otherwise they place a 0.
3. Branches/Jumps
     All branches are conditional. The branch condition is specified by the instruction, which tests the source register for zero or nonzero; the register may hold a data value or the result of a compare. Jumps are unconditional.
4. Floating-Point Operations
     - add
     - subtract
     - multiply
     - divide
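The integer operations listed above can be modeled behaviorally in Python. This is a hedged sketch, not the VHDL design this page describes: it assumes 32-bit two's-complement wraparound, signed compares, shift amounts taken from the low 5 bits of the second operand, and R0 hardwired to zero. The class and function names are hypothetical; the mnemonics (ADD, SLT, BEQZ, and so on) are DLX names used here only as dictionary keys.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit register value as two's complement."""
    return x - (1 << 32) if x & 0x80000000 else x


class DlxRegs:
    """32 general-purpose registers; R0 always reads as 0."""

    def __init__(self) -> None:
        self.r = [0] * 32

    def read(self, i: int) -> int:
        return 0 if i == 0 else self.r[i]

    def write(self, i: int, value: int) -> None:
        if i != 0:                      # writes to R0 are discarded
            self.r[i] = value & MASK32  # wrap to 32 bits


# Register-register ALU operations (results are wrapped by write())
ALU = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "SLL": lambda a, b: a << (b & 31),
    "SRL": lambda a, b: a >> (b & 31),             # logical: zero fill
    "SRA": lambda a, b: to_signed(a) >> (b & 31),  # arithmetic: sign fill
}

# Compare instructions: destination gets 1 if the condition holds, else 0
COMPARE = {
    "SEQ": lambda a, b: a == b,
    "SNE": lambda a, b: a != b,
    "SLT": lambda a, b: a < b,
    "SGT": lambda a, b: a > b,
    "SLE": lambda a, b: a <= b,
    "SGE": lambda a, b: a >= b,
}


def execute(regs: DlxRegs, op: str, rd: int, rs1: int, rs2: int) -> None:
    a, b = regs.read(rs1), regs.read(rs2)
    if op in ALU:
        regs.write(rd, ALU[op](a, b))
    else:
        holds = COMPARE[op](to_signed(a), to_signed(b))
        regs.write(rd, 1 if holds else 0)


def branch_taken(regs: DlxRegs, op: str, rs1: int) -> bool:
    """BEQZ/BNEZ: test one source register for zero or nonzero."""
    value = regs.read(rs1)
    return value == 0 if op == "BEQZ" else value != 0


regs = DlxRegs()
regs.write(1, 7)
regs.write(2, 5)
execute(regs, "SUB", 3, 2, 1)  # R3 = 5 - 7, wraps to 0xFFFFFFFE
execute(regs, "SLT", 4, 3, 1)  # -2 < 7, so R4 = 1
execute(regs, "ADD", 0, 1, 2)  # result discarded: R0 stays 0
print(hex(regs.read(3)), regs.read(4), regs.read(0))  # 0xfffffffe 1 0
print(branch_taken(regs, "BNEZ", 4))                  # True
```

Pairing a compare with BEQZ or BNEZ is how a conditional such as "branch if R1 < R2" is built when branches can only test a single register against zero.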

Performance analysis:

Compare the pipelined DLX with a nonpipelined (multicycle) machine, assuming no hazards or cache misses.

Assume 40% of instructions are ALU operations requiring 4 cycles, 20% are branches requiring 4 cycles, and 40% are loads and stores requiring 5 cycles. Assume the nonpipelined machine has a clock cycle time of 10 ns.

Both machines execute the same instruction count (IC), so in comparing them we can disregard IC and look only at CPI and clock cycle time.

Average instruction time (nonpipelined, multicycle) = clock time x CPI
                                = (10 ns) x ((0.4 + 0.2)(4) + (0.4)(5))
                                = (10 ns) x 4.4
                                = 44 ns

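Assume pipelining adds 1 ns of overhead to the clock cycle (for example, from pipeline register setup time and clock skew), giving an 11 ns clock. With no hazards, the pipeline ideally completes one instruction per cycle (CPI = 1).

Average instruction time (pipelined) = (11 ns)(1) = 11 ns

So the speedup is 44 ns / 11 ns = 4.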
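The multicycle-versus-pipelined arithmetic can be reproduced in a few lines. A minimal Python sketch; the 1 ns pipelining overhead is an assumption chosen to match the 11 ns pipelined clock used in the text, and the variable names are illustrative.

```python
# Instruction mix for the nonpipelined machine: (fraction, cycles per instruction)
mix = {
    "ALU": (0.40, 4),
    "branch": (0.20, 4),
    "load/store": (0.40, 5),
}

clock_ns = 10.0             # nonpipelined clock cycle
pipeline_overhead_ns = 1.0  # assumed extra clock time added by pipelining

# Weighted average CPI over the mix: 0.4*4 + 0.2*4 + 0.4*5
cpi = sum(frac * cycles for frac, cycles in mix.values())
unpipelined_ns = clock_ns * cpi
pipelined_ns = (clock_ns + pipeline_overhead_ns) * 1  # ideal CPI = 1

print(f"CPI = {cpi:.1f}")                               # CPI = 4.4
print(f"nonpipelined: {unpipelined_ns:.0f} ns")         # nonpipelined: 44 ns
print(f"pipelined: {pipelined_ns:.0f} ns")              # pipelined: 11 ns
print(f"speedup = {unpipelined_ns / pipelined_ns:.1f}")  # speedup = 4.0
```

Changing `pipeline_overhead_ns` shows how quickly clock overhead erodes the ideal speedup: with 2 ns of overhead the same mix gives 44 / 12, about 3.7.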

Consider a single-cycle implementation of DLX. Assume the five stages require 10 ns, 8 ns, 10 ns, 10 ns, and 7 ns. Then one instruction completes in a single clock cycle equal to the sum of the stage times, 10 + 8 + 10 + 10 + 7 = 45 ns.

Average instruction time (nonpipelined, single cycle) = (45 ns)(1) = 45 ns

The speedup of the pipelined machine over the single-cycle machine is 45 ns / 11 ns ≈ 4.1.
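The single-cycle comparison follows from the stage times alone. This sketch assumes the pipelined clock is set by the slowest stage plus a 1 ns overhead, which reproduces the 11 ns pipelined clock stated in the text; `overhead_ns` is an assumed parameter, not a measured value.

```python
stage_ns = [10, 8, 10, 10, 7]  # the five stage times given in the text
overhead_ns = 1                # assumed pipelining overhead (gives the 11 ns clock)

single_cycle_clock = sum(stage_ns)             # all stages fit in one long cycle
pipelined_clock = max(stage_ns) + overhead_ns  # slowest stage sets the pipeline clock

speedup = single_cycle_clock / pipelined_clock  # both machines have CPI = 1
print(single_cycle_clock, pipelined_clock, round(speedup, 1))  # 45 11 4.1
```

Because the pipelined clock depends on the slowest stage, unbalanced stages (here the 8 ns and 7 ns stages idle part of each cycle) keep the speedup below the stage count of 5.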