DESIGN OF LOW POWER COMPLEX MULTIPLIER USING COMPRESSORS THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR DEGREE OF MASTER OF TECHNOLOGY IN VLSI DESIGN
BY Nilay Chandrakant Ghumre UNDER GUIDANCE OF PROF. DR. R. B. Deshmukh Department of Electronics and Computer Science Engineering Visvesvaraya National Institute of Technology Nagpur, May 2010
DEPARTMENT OF ELECTRONICS AND COMPUTER SCIENCE VISVESVARAYA NATIONAL INSTITUTE OF TECHNOLOGY NAGPUR
Date:
CERTIFICATE
This is to certify that the thesis entitled “Design of Low Power Complex Multiplier using Compressors” is bonafide work done at Visvesvaraya National Institute of Technology, Nagpur, India by Nilay Chandrakant Ghumre and is submitted to Visvesvaraya National Institute of Technology, Nagpur, India in partial fulfillment of degree of Master of Technology in VLSI Design.
(Dr. R. B. Deshmukh) Guide
(Dr. R. M. Patrikar) Head of Department
Department of Electronics and Computer Science Engineering VNIT, Nagpur, India 440011. May 2010
DECLARATION
I herewith submit the thesis “Design of Low Power Complex Multiplier using Compressors” to Visvesvaraya National Institute of Technology, Nagpur for degree of Master of Technology in VLSI Design. I carried it out under the guidance of Prof. R. B. Deshmukh (Department of Electronics and Computer Science Engineering). This thesis has not been submitted to any other University/Institute for award of any degree or diploma.
Date:
Nilay Chandrakant Ghumre
M. Tech, VLSI Design VNIT, Nagpur, India.
Acknowledgement
I express my sincere gratitude to the many people who helped and supported me during the project work. Without them I could not have completed the project on time. I am thankful to my guide, Prof. Dr. R. B. Deshmukh, for his encouragement, patience and valuable guidance throughout the entire project, to Prof. Dr. R. M. Patrikar for his valuable suggestions, and to all the members of the VLSI design lab for their cooperation and coordination.
I also want to thank my colleagues and friends for their encouragement while completing this project work. I want to thank my parents; without their emotional and moral support nothing would have been possible. Their love and support always encouraged me. Last but not least, I am very thankful to God, who provided me good health and good people around me.
Nilay Chandrakant Ghumre
ABSTRACT
In high-performance VLSI circuits, on-chip power densities play a dominant role in both static and dynamic conditions due to shrinking device features. The consumed power is usually dissipated as heat, affecting the performance and reliability of the chip. A complex multiplier is an arithmetic circuit that is extensively used in DSP and communication applications such as FFTs and digital filters. For fast circuit implementation, a parallel multiplier is preferred. For large bit-width multiplications, a large number of adders are required to perform the partial product addition. Compressors are used to compress the partial product addition stages. Higher order compressors permit the reduction of the vertical critical paths in a parallel multiplier, resulting in a better speed-power product for the multiplier circuit. This thesis presents a novel scheme for a 16*16 bit multiplier using thirteen different types of compressors. The scheme is optimized for low power as well as high speed implementation over reported schemes. It presents a low power multiplier design methodology which counts only the number of 1’s in the partial products.
CONTENTS
1. INTRODUCTION
1.1 Introduction
1.2 Complex Number
1.2.1 Operations of Complex Numbers
1.3 Organization of Thesis
2. SURVEY OF COMPLEX MULTIPLICATION
2.1 General rule of Complex Multiplication
2.2 Cases of Multiplication
2.3 Types of Complex Multiplication
2.3.1 Complex Multiplication for Area Efficient
2.3.2 Multiplication of Complex Number using a low power parallel multiplier
2.4 Related Research
2.4.1 Braun Multiplier
2.4.2 Baugh-Wooley Multiplier
2.4.3 Multiplier using Bypassing circuitry
2.4.4 Multiplier using Adder-Subtractor Unit (ASU)
2.5 Signed Number Multiplication
2.5.1 Representation of Negative Numbers
2.5.2 Booth’s Recoding Algorithm
2.5.3 Basic Technique of Booth’s Recoding Algorithm for Radix-2 and Radix-4
3. MULTIPLIER UNIT
3.1 Partial Product Generator
3.2 Different Order Compressors
3.2.1 Adder as Counter
3.2.2 Compressor Logic
3.3 Parallel Adders
3.4 Architecture of Multiplier using Compressors
4. PROPOSED COMPLEX MULTIPLIER
4.1 Unsigned Multiplication
4.2 Signed Multiplication
4.2.1 Modified Technique Recoding Algorithm for Radix-2 and Radix-4
4.2.2 Modified Booth’s Recoding Unit
4.3 Compressors and Adders
5. RESULTS AND DISCUSSION
5.1 Behavioral Simulation
5.2 Synthesis Report
5.3 Power Calculation
5.4 Layout
6. CONCLUSION AND FUTURE WORK
6.1 Conclusion
6.2 Future work
7. REFERENCES
LIST OF FIGURES
Figure 2.1. OBC-DA based Complex Multiplier structure
Figure 2.2. 4x4 Braun Multiplier
Figure 2.3. 4*4 Bypass Multiplier
Figure 2.4. 4*4 ASU Multiplier
Figure 2.5. Adder Subtractor Unit
Figure 2.6. Smart Adder (SA)
Figure 3.1. Internal Block Diagram of 16*16 Basic Multiplier
Figure 3.2. Partial Product Generator (4 Bit)
Figure 3.3. Half Adder
Figure 3.4. Full Adder
Figure 3.5. Block Diagram of 4:3 Compressor
Figure 3.6. Block Diagram of 5:3 Compressor
Figure 3.7. Block Diagram of 6:3 Compressor
Figure 3.8. Block Diagram of 7:3 Compressor
Figure 3.9. Block Diagram of 8:4 Compressor
Figure 3.10. Block Diagram of 9:4 Compressor
Figure 3.11. Block Diagram of 10:4 Compressor
Figure 3.12. Block Diagram of 11:4 Compressor
Figure 3.13. Block Diagram of 12:4 Compressor
Figure 3.14. Block Diagram of 13:4 Compressor
Figure 3.15. Block Diagram of 14:4 Compressor
Figure 3.16. Block Diagram of 15:4 Compressor
Figure 3.17. Block Diagram of 16:5 Compressor
Figure 3.18. Block Diagram of Parallel Adder
Figure 3.19. Architecture of 8*8 Multiplier using Compressors
Figure 4.1. Block Diagram of Unsigned Complex Multiplier
Figure 4.2. Combinational Logic for intermediate sign
Figure 4.3. Combinational Circuit for output sign Real Part and Imaginary Part
Figure 4.4. Modified Complex Multiplier Block Diagram
Figure 4.5. Block Diagram of Modified Booth’s Recoding unit Multiplier
Figure 4.6. Addition scheme for Radix-2
Figure 4.7. Architecture of 8*8 Signed Multiplier for Radix-2
Figure 4.8. Addition scheme for Radix-4
Figure 4.9. Architecture of 8*8 Signed Multiplier for Radix-4
LIST OF TABLES
Table 3.1. Half Adder as a Counter
Table 3.2. Full Adder as a Counter
Table 4.1. Booth’s Recoding algorithm Radix-2
Table 4.2. Booth’s Recoding algorithm Radix-4
Table 4.3. Modified Booth’s Recoding Algorithm Radix-2
Table 4.4. Modified Booth’s Recoding Algorithm Radix-4
Chapter 1.
Introduction
The electronics industry has achieved phenomenal growth over the last two decades, mainly due to rapid advances in integration technologies and large-scale systems design; in short, due to the advent of VLSI. The number of applications of integrated circuits in high-performance computing, telecommunications, and consumer electronics has been rising steadily, and at a very fast pace. Increasing demand for portable electronics for computing and communication, as well as other applications, has necessitated longer battery life, lower weight, and lower power consumption. In order to satisfy these requirements, research activities focusing on low power/low voltage design techniques are underway. Since power is now one of the design decision variables, the expanded design space required for low power has further increased the complexity of an already non-trivial task.
Low power design basically involves two concomitant tasks: power estimation and analysis, and power minimization. These tasks need to be carried out at each of the levels in the design hierarchy, namely the behavioral, architectural, logic, circuit and physical levels [1]. In the survey of the current state of the field, many of the salient power estimation and minimization techniques proposed for low power VLSI design are reviewed. For each of the design levels, we provide an overview of several power estimation and minimization approaches and the CAD tools that support them. Finally, future research issues are discussed that will be necessary in order to make the low power design endeavor a successful one.
In the majority of digital signal processing (DSP) applications the critical operations are multiplication and accumulation. Real-time signal processing requires a high speed, high throughput multiplier unit that consumes low power, which is always a key to achieving a high performance digital signal processing system.
The purpose of this work is the design and implementation of a low power multiplier unit with a block enabling technique to save power [2].
1.1 Introduction
Device sizes are scaling down in accordance with Moore's Law. The sources of energy consumption on a CMOS chip can be classified as static and dynamic power dissipation. The dominant component of energy consumption in CMOS is dynamic power consumption caused by the actual effort of the circuit to switch. A first order approximation of the dynamic power consumption of CMOS circuitry is given by the formula:
$P = C V^{2} f$
where P is the power, C is the effective switched capacitance, V is the supply voltage, and f is the frequency of operation. The power dissipation arises from the charging and discharging of the circuit node capacitances found on the output of every logic gate. Power management is the careful planning of the power budget for every subsystem of a VLSI chip. This is an especially important issue for today’s complex systems. The most important and successful use of power management is to deactivate a portion of the circuit when its computation is not required [3]. Every low-to-high logic transition in a digital circuit incurs a change of voltage, drawing energy from the power supply. A designer at the technological and architectural level can try to minimize the variables in this equation to minimize the overall energy consumption. However, power minimization is often a complex process of trade-offs between speed, area, and power consumption. The current work proposes reduction of dynamic switching power in a 16*16 complex multiplier by using higher order compressors to reduce the switching activity as well as the gate count.
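As a rough numerical illustration of the first order dynamic power formula above, the following Python sketch evaluates it for a few operating points. The function name and all numeric values are hypothetical and chosen only for illustration; they are not measurements from this work.

```python
def dynamic_power(c_eff_farads: float, vdd_volts: float, freq_hz: float) -> float:
    """First-order dynamic power estimate P = C * V^2 * f, in watts."""
    return c_eff_farads * vdd_volts ** 2 * freq_hz


# Hypothetical operating point (10 pF effective capacitance, 1.8 V, 100 MHz).
baseline = dynamic_power(10e-12, 1.8, 100e6)

# Halving the effective switched capacitance (for example, fewer toggling
# gates) halves the estimate.
half_cap = dynamic_power(5e-12, 1.8, 100e6)

# Halving the supply voltage reduces the estimate by a factor of four.
half_vdd = dynamic_power(10e-12, 0.9, 100e6)

print(baseline, half_cap / baseline, half_vdd / baseline)
```

The linear dependence on C is the lever used in this work: reducing switching activity and gate count lowers the effective switched capacitance without touching supply voltage or clock frequency.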
Multipliers incur high power consumption and delay during the partial product addition. At this stage, most multipliers are designed with different kinds of adders that are capable of adding two or three bits, or at most four bits by using 4-2 compressors. For higher order multiplications, a huge number of adders or compressors are used to perform the partial product addition. The binary counter property has been merged with the compressor property to develop higher order compressors [3] [5].
1.2 Complex Number:-
A complex number is a number comprising a real and an imaginary part. It can be written in the form a + bi, where a and b are real numbers, and i is the standard imaginary unit with the property $i^2 = -1$. To construct a complex number, we associate with each real number a second real number. A complex number is then an ordered pair of real numbers (a, b).
Complex numbers were first conceived and defined to find solutions to cubic equations. The solution of a general cubic equation in radicals (without trigonometric functions) may require intermediate calculations containing the square roots of negative numbers, even when the final solutions are real numbers. This ultimately led to the fundamental theorem of algebra, which shows that with complex numbers, a solution exists to every polynomial equation of degree one or higher. Complex numbers thus form an algebraically closed field, where any polynomial equation has a root.
In the form a + bi, the real number a is called the real part of the complex number, and the real number b is the imaginary part. For example, 3 + 2i is a complex number with real part 3 and imaginary part 2. If Z = a + bi, the real part a is denoted by Re(Z) and the imaginary part b is denoted by Im(Z). The complex numbers (C) are regarded as an extension of the real numbers (R) by considering every real number as a complex number with an imaginary part of zero. The real number a is identified with the complex number a + 0i. Complex numbers with a real part of zero (Re(Z) = 0) are called imaginary numbers. Instead of writing 0 + bi, that imaginary number is usually denoted as just bi. If b equals 1, instead of using 0 + 1i or 1i, the number is denoted as i.
Two complex numbers are said to be equal if and only if their real parts are equal and their imaginary parts are equal. In other words, if the two complex numbers are written as a + bi and c + di with a, b, c, and d real, then they are equal if and only if a = c and b = d. [4] [5]
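1.2.1 Operations of Complex Numbers:-
Complex numbers are added, subtracted, multiplied, and divided by formally applying the associative, commutative and distributive laws of algebra, together with the equation $i^2 = -1$. Here, i is the abbreviation of $\sqrt{-1}$ (the square root of -1). In other words, i is something whose square is -1. For two complex numbers a + bi and c + di:
i) Addition :- $(a + bi) + (c + di) = (a + c) + (b + d)i$
ii) Subtraction :- $(a + bi) - (c + di) = (a - c) + (b - d)i$
iii) Multiplication :- $(a + bi)(c + di) = (ac - bd) + (ad + bc)i$
iv) Division :- $\dfrac{a + bi}{c + di} = \dfrac{(ac + bd) + (bc - ad)i}{c^2 + d^2}$, for $c + di \neq 0$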
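The four operations can be sketched on ordered pairs (real part, imaginary part), which mirrors how a hardware datapath carries the two components separately. This is a minimal illustration in Python using the standard textbook identities; the function names `c_add`, `c_sub`, `c_mul` and `c_div` are hypothetical and not part of the thesis design.

```python
from typing import Tuple

Pair = Tuple[float, float]  # (real part, imaginary part)


def c_add(x: Pair, y: Pair) -> Pair:
    (a, b), (c, d) = x, y
    return (a + c, b + d)


def c_sub(x: Pair, y: Pair) -> Pair:
    (a, b), (c, d) = x, y
    return (a - c, b - d)


def c_mul(x: Pair, y: Pair) -> Pair:
    # (a + bi)(c + di) = (ac - bd) + (ad + bc)i, using i*i = -1
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c)


def c_div(x: Pair, y: Pair) -> Pair:
    # Multiply numerator and denominator by the conjugate c - di.
    (a, b), (c, d) = x, y
    den = c * c + d * d
    if den == 0:
        raise ZeroDivisionError("division by the complex number 0 + 0i")
    return ((a * c + b * d) / den, (b * c - a * d) / den)


x, y = (3, 2), (1, -4)     # 3 + 2i and 1 - 4i
print(c_add(x, y))         # (4, -2)
print(c_sub(x, y))         # (2, 6)
print(c_mul(x, y))         # (11, -10)
print(c_div(x, y))         # approximately (-5/17, 14/17)
```

Only multiplication mixes the real and imaginary parts through products, which is why it dominates the hardware cost and is the focus of the following chapters.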
1.3 Organization of Thesis:-
Chapter 2, “Survey of Complex Multiplication”, explains the general rules, cases and types of complex multiplication.
Chapter 3 explains the basic “Multiplier Unit” using the compressor technique: how the partial products are generated, the compressor technique, and the parallel adder used to produce the product.
Chapter 4 explains the “Types of Multiplication”, covering both unsigned and signed number multiplication.
Chapter 5, “Results and Discussion”, presents the behavioral simulation, synthesis and power calculation results for every multiplier.
Chapter 6, “Conclusion and Future Work”, gives the conclusion of the thesis and possible future work.
-:References:-
[1] Virage Logic Corporation, “Power Reduction Techniques for Ultra-Low-Power Solutions”.
[2] R. M. Badghare, S. K. Mangal, R. B. Deshmukh, R. M. Patrikar (2009), “Design of Low Power Parallel Multiplier”, Journal of Low Power Electronics, Volume 5, Number 1, April 2009, 31-39.
[3] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay (2009), “A 1.2-ns 16×16-Bit Binary Multiplier Using High Speed Compressors”, International Journal of Electrical, Computer, and Systems Engineering, 2009, 234-239.
[4] Conway, John B. (1986), Functions of One Complex Variable I, Springer, ISBN 0-387-90328-3.
[5] K. Z. Pekmestzi, “Complex Number Multipliers”, IEE Proceedings (Computers and Digital Technology), Vol. 136, No. 1, 1989, pp. 70-75.
Chapter 2.
Survey of Complex Multiplication
In many real-time DSP applications, high performance is a prime target. However, this may be achieved at the expense of area, power dissipation and accuracy. Attempts have been made to use alternative number systems to optimize the realization of arithmetic blocks, maintaining high performance without incurring prohibitive area and power increases [1]. Fourier transforms play an important role in many digital signal processing applications including speech, signal and image processing. However, direct computation of the Discrete Fourier Transform (DFT) requires on the order of $N^2$ operations, where N is the transform size. Parallel-pipelined FFTs are preferred for both high throughput and low power consumption.
2.1 General rule of Complex Multiplication:-
Consider two complex numbers (a+bi) and (c+di); then
(a+bi).(c+di) = (ac-bd) + (ad+bc)i
(ac-bd) is the Real Part of the Complex Multiplication and (ad+bc) is the Imaginary Part of the Complex Multiplication. Remember that (ac-bd), the real part of the product, is the product of the real parts minus the product of the imaginary parts, but (ad+bc), the imaginary part of the product, is the sum of the two products of one real part and the other imaginary part.
The positive value $\sqrt{a^2 + b^2}$ is called the modulus of Z and is denoted as |Z|: if Z = a+bi, then $|Z| = \sqrt{a^2 + b^2}$.
2.2 Cases of Multiplication:-
i) Multiplication of Complex Number with Real Number:-
In the above formula for multiplication, if d is zero, we get a formula for multiplying a complex number a+bi and a real number c together:
(a+bi).c = ac + bci
In other words, we just multiply both parts of the complex number by the real number. For example, multiplying the two numbers (1+2i) and 3 gives:
(1+2i).3 = 3+6i
Geometrically, when you double a complex number, you just double its distance from the origin, 0. Similarly, when you multiply a complex number z by 1/2, the result will be half way between 0 and z. You can think of multiplication by 2 as a transformation which stretches the complex plane C by a factor of 2 away from 0, and multiplication by 1/2 as a transformation which squeezes C toward 0.
ii) Multiplication of Complex Number with Imaginary Number:-
In the above formula for multiplication, if c is zero, we get a formula for multiplying a complex number a+bi and an imaginary number di together:
(a+bi).di = -bd + adi
In other words, we just multiply both parts of the complex number by the imaginary number. For example, multiplying the two numbers (1+2i) and 3i gives:
(1+2i).3i = -6+3i
2.3 Types of Complex Multiplication
2.3.1 Complex Multiplication for Area Efficient:-
i) Complex Multiplication using LNS [2]:-
Complex multiplication for lower area, i.e. reducing the hardware cost of realizing a complex multiplier, is explained below using the Logarithmic Number System (LNS). An LNS based complex multiplier employs a correction algorithm. The complex multiplier is composed of four real multipliers, one adder and one subtractor. Attempts have been made to optimize the realization of the complex multiplier by reducing the number of multipliers and accumulating the partial products; however, the wider the input, the more partial product layers must be added in order to compute the result. To solve this problem, one can consider the LNS to realize the multiplication as shown in the equations:
$X_o = AC - BD = \log^{-1}(\log A + \log C) - \log^{-1}(\log B + \log D)$
$Y_o = BC + AD = \log^{-1}(\log B + \log C) + \log^{-1}(\log A + \log D)$
The complex multiplier block diagram is composed of logarithmic and anti-logarithmic converters and N-bit adders. This method can significantly reduce the hardware needed to build a multiplier. LNS provides a simple technique to compute multiplication at the cost of reduced precision, so this approach has limited accuracy.
ii) Complex Multiplier using OBC and DA [3] :-
A well known area-efficient method to implement a complex multiplier is Offset Binary Coding (OBC) with Distributed Arithmetic (DA). The structure of a complex multiplier using OBC-DA is shown below:-
Figure 2.1. OBC-DA based Complex Multiplier structure[3]
It is formed by the following modules:
a) Two registers that each store a W-bit word (-(cR-cI) and -(cR+cI)), whose outputs are connected to two multiplexers that are controlled by an XOR of the input bits.
b) Two shift-accumulators (SA) to add and shift the multiplexer output.
In this structure a subtraction can happen in each cycle of the computation, in contrast to the previous one, where it only happens during the last cycle. The extrabit slide is a bit-serial adder which is needed to complete the two’s complement in any cycle. Another difference is that SA2 includes hardware for loading the offset value (Ao) in carry registers.
2.3.2 Multiplication of Complex Number using a low power parallel multiplier:-
The conventional technique of complex multiplication is given as
(A + Bj).(C + Dj) = (AC - BD) + (AD + BC)j
It requires four multiplications and two adders. This technique offers a different way of realizing complex multiplication that reduces the complexity of the circuit. The canonical form of the obtained circuits makes them well suited for VLSI realizations.
Besides circuit reduction, the hardware or software for the control in the realization of the algorithms is simplified, especially when either of these includes only complex operations, as in an FFT. Each complex bit takes four possible values. Consequently, it must be represented by two bits. This representation allows the development of algorithms for operations with complex numbers and the ability to describe these algorithms at the bit level. It is natural that these algorithms and the corresponding circuits have great similarities to those for real numbers in two’s complement form. Complex parallel multiplication is the most critical for realization. The parallel multiplier includes specialized hardware circuitry designed to perform complex multiplication operations at high speeds. The parallel multiplier requires significantly less die area than conventionally required, which results in reduced manufacturing costs and reduced power consumption. [4]
2.4 Related Research:-
In FPGA designs, power reduction is possible only through reduced switching activity, that is, through reduced dynamic power. In general, dynamic power consumption is defined as the power consumed while the clock is running and the external inputs are switching. In general design practice, switching activity can be reduced at various levels of the design flow. Architectural decisions in the early design phases have the greatest impact. For high switching signals, delay balancing and reduction of the number of logic levels are among the most efficient techniques to tackle the power penalty. An obvious method to reduce the switching activity is to shut down the idle part of the circuit, which is not in operating condition. A general M x N parallel multiplier operates by computing the partial products in parallel and by shifting and accumulating the partial products. In such a multiplier, switching activity is poorly correlated with the input coefficient. In principle, reducing the switching activity of the components used in the design can minimize the power dissipation: if the kth bit of the coefficient is zero, the kth row of adders need not be activated. However, this type of multiplier does not help to reduce switching, since the adders switch unnecessarily even if the kth bit is zero.
2.4.1 Braun Multiplier[4][5] :-
Figure 2.2 4x4 Braun Multiplier (array of adders combining the partial product bits a0b0 through a3b3 into the product bits P0 through P7)
The above figure shows the structure of a 4*4 Braun Multiplier. An n*n bit Braun Multiplier requires n(n-1) adders and $n^2$ AND gates. In this technique each partial product can be added to the previous sum of partial products by using a row of adders. The carry-out signals are shifted one bit to the left and then added to the sum of the first adder, which is the addition of the partial product bits. The shifting of the carry-out bits to the left is done by the carry-save adder. As the carry bits are passed diagonally downward to the next adder stage, there is no horizontal carry propagation for the first four rows. Instead, the respective carry bit is “saved” for the subsequent adder stage. The drawback of the Braun Multiplier is that the number of components required to build it increases quadratically with the number of bits, which makes it inefficient. The delay of the Braun Multiplier depends on the full adder cell and also on the final adder in the last row. In this multiplier array, a full adder with balanced carry and sum delays is desirable because both sum and carry are in the critical path.
2.4.2 Baugh-Wooley Multiplier[6]:-
The Baugh-Wooley Multiplier is used for both unsigned and signed number multiplication. Signed number operands are represented in 2’s complement form. The partial products are adjusted such that the negative sign moves to the last step, which in turn maximizes the regularity of the multiplication array. The Baugh-Wooley Multiplier operates on signed operands in 2’s complement representation to make sure that the signs of all partial products are positive. To reiterate, the numerical value of the product of two 2’s complement numbers, say X and Y, can be obtained from the following product terms, each made of one AND gate.
Variables with bars denote prior inversions. Inverters are connected before the inputs of the full adders or the AND gates as required by the algorithm. Each column represents the addition in accordance with the respective weight of the product term.
2.4.3 Multiplier using Bypassing circuitry:-
The main idea of this technique is based on the observation that most modern multipliers produce a large number of signal transitions while adding zero partial products. If any bit of the multiplier is zero, that row of adders need not be activated, since the corresponding partial product is zero. The adders of these multipliers, however, perform summation of the zero partial products and, as a result, exhibit redundant signal switching. The increased activity of the internal nodes results in unnecessary power dissipation [7] [8]. To disable such an adder row, the partial sum of the previous adder row has to be bypassed to the next adder row. This avoids the unnecessary transitions by bypassing the inputs to the outputs when the corresponding partial product is zero. Multiplexers are used at the output of each full adder to pass the partial sum directly to the next stage when the partial product is zero.
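The row-bypassing idea can be illustrated with a small behavioural model. The Python sketch below is an assumption-laden illustration, not the circuit of Figure 2.3: it models an unsigned shift-and-add multiplier, skips every row whose multiplier bit is zero, and counts how many rows would actually switch. The function name `bypass_multiply` and the `active_rows` counter are hypothetical, and multiplexers, tri-state buffers and carry routing are not modelled.

```python
def bypass_multiply(a: int, b: int, n: int = 4):
    """Unsigned n-bit shift-and-add multiply that skips rows whose
    multiplier bit is 0.

    Returns (product, active_rows), where active_rows counts the adder
    rows that would switch in a row-bypassing array.
    """
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must fit in n unsigned bits")
    acc = 0
    active_rows = 0
    for k in range(n):
        if (b >> k) & 1:
            acc += a << k       # row k adds the shifted multiplicand
            active_rows += 1
        # otherwise the running sum is passed unchanged to the next row
    return acc, active_rows


print(bypass_multiply(13, 9))   # 9 = 1001b, so only rows 0 and 3 are active
```

In this model the number of active rows equals the number of 1's in the multiplier operand, which is why techniques that reduce the count of 1's (such as recoding) combine naturally with bypassing.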
Figure 2.3 4*4 Bypass Multiplier
The tri-state buffers, placed at the inputs of the adder cell, disable signal transitions in those adder cells which are bypassed. The output carry bits c are passed downwards, instead of to the right [9].
2.4.4 Multiplier using Adder-Subtractor Unit (ASU)[4] :-
In this technique, higher power reduction can be achieved if the operand contains more 0’s than 1’s. The approach proposes a Binary / Booth Recoding Unit which forces the operand to have a larger number of zeros. The advantage here is that if the operand contains many successive ones, the Binary / Booth Recoding Unit converts these ones into zeros. The Adder-Subtractor Unit also removes the extra 2’s complement addition circuitry that would otherwise be needed. The use of a look-up table is a further advantage of this design. The switching activity of the components used in the design depends on the input bit coefficient: if the input bit coefficient is zero, the corresponding row or column of adders need not be activated.
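How recoding turns a run of ones into zeros can be sketched with the standard textbook radix-2 Booth recoding, in which digit i equals b[i-1] - b[i] and takes a value of -1, 0 or +1. This is only an illustration of the general idea; whether the Binary / Booth Recoding Unit of [4] uses exactly this rule is an assumption, and the function names below are hypothetical.

```python
def booth_radix2_digits(value: int, n: int):
    """Radix-2 Booth recoding of an n-bit two's complement bit pattern.

    Digit i is b[i-1] - b[i] with b[-1] = 0, so each digit is -1, 0 or +1.
    Digits are returned least significant first.
    """
    bits = [(value >> i) & 1 for i in range(n)]
    prev = 0
    digits = []
    for b in bits:
        digits.append(prev - b)
        prev = b
    return digits


def digits_value(digits):
    """Signed value represented by a list of radix-2 signed digits."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


d = booth_radix2_digits(0b0111, 4)   # 7 has three successive ones
print(d)                              # [-1, 0, 0, 1], i.e. 8 - 1
print(sum(1 for x in d if x != 0))    # 2 nonzero digits instead of 3 ones
```

The benefit depends on the operand: long runs of ones collapse to two nonzero digits, whereas an alternating pattern such as 0101 recodes to four nonzero digits, so this plain rule helps mainly in the "successive ones" case the text describes.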
Figure 2.4 4*4 ASU Multiplier [4] (array of XOR gates, multiplexers, AND gates, adder/subtractor (+/-) cells and smart adders (SA) producing the product bits)
The figure shows the 4x4 low power ASU multiplier structure. This technique is very useful for higher widths of the multiplicand, especially when there are successive ones. Each ASU works as an adder or a subtractor depending upon the sign bit in the sign register. For multiplication with -1 the ASU works as a subtractor, and for multiplication with 0 and 1 it works as an adder. The great advantage of this technique is that no extra addition circuitry is needed to add sign extension bits when the multiplicand bit is -1. In the upper row of the architecture the sign bits need to be ANDed with b0, since otherwise the outputs would be wrong when sj=1 and b0=0. At the bottom, the ASU works as a half adder or subtractor depending upon the sign bits. For higher widths of the multiplicand the smart adder chain continues.
Figure 2.5 Adder Subtractor Unit[1] (ASU block; signals labelled bi, aj, Sj, S(i-1)j+1, C(i-1)j, Sij and Cij)
Figure 2.6 Smart Adder (SA) (adder/subtractor, adder and XOR stages; signals labelled aibj, C(i-1)j, C(i-2)j, S(i-1)j, Ci+j+1 and Si+j)
The Modified Full Adder-Subtractor Unit is constructed as shown in Figure 2.5. If aj is zero, the FA is disabled. Here sj is a sign bit of the operand. The structure of the smart adder is shown in Figure 2.6.
2.5 Signed Number Multiplication:-
As we have seen for unsigned multiplication, the user has to input the number as well as its sign, so the complete operation of that multiplier requires more hardware and more switching, and hence the switching power, i.e. the dynamic power, is higher for unsigned multiplication. In signed multiplication, the user directly enters a signed number, so there is no need to enter a separate sign bit for each of the four numbers. The only difference between a signed number and an unsigned number is the range of the number. As stated in section 3.1, the range of an unsigned number is from 0 to $2^{n} - 1$. The range of a signed number is from $-2^{n-1}$ to $+(2^{n-1} - 1)$.
2.5.1 Representation of Negative Numbers:-
For a fixed-point number in a radix r system, we have to determine how negative numbers are to be represented. Two different forms are commonly used:
1. Sign and Magnitude Representation.
2. Complement Representation.
1. Sign and Magnitude Representation:-
In this form of representation the sign and the magnitude are represented separately. The first digit is the sign bit and the remaining (n-1) bits are the magnitude. In the binary case, ‘0’ represents positive and ‘1’ represents negative. In the non-binary case, the values 0 and (r-1) are assigned to the sign digit of positive and negative numbers, respectively. In the binary case all $2^{n}$ sequences are utilized. The $2^{n-1}$ sequences from 00----0 to 01----1 represent positive numbers, while the remaining $2^{n-1}$ sequences from 10----0 to 11----1 represent negative numbers. A major disadvantage of the signed-magnitude representation is that the operation to be performed may depend on the signs of the operands. For example, when adding a positive number X and a negative number -Y, we need to perform the calculation X+(-Y). If Y>X, then the final result should be -(Y-X). For that we have to perform (Y-X), i.e., switch the order of the operands and perform subtraction rather than addition, and then attach a minus sign to the result.
Example:- +7 would be 111 with a 0 in front, so 00000111 in an 8-bit representation. -9 would be 1001 (+9) with a 1 in front, so 10001001 in an 8-bit representation.
2. Complement Representation:-
In complement representation, numbers are represented in two’s complement form in the binary case. It is the most widely used method of representation. A positive number is represented in the same way as in the signed-magnitude method, simply as a binary number with ‘0’ as the sign bit. To get a negative number, convert all 0’s to 1’s and all 1’s to 0’s, and then add ‘1’ to the result. Suppose a number is given in 2’s complement form and we have to find its value: if the number starts with ‘0’ it is a positive number, and if it starts with ‘1’ it is a negative number. If the number is negative, taking its 2’s complement gives the magnitude in ordinary binary. Let us take 1101. Taking the 2’s complement gives 0011. As the number starts with ‘1’ it is a negative number, and 0011 is the binary representation of positive 3. So, the number is -3. Other negative numbers are represented in 2’s complement form in the same way.
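The encoding and decoding rules just described can be checked with a short Python sketch. The helper names `to_twos_complement`, `from_twos_complement` and `negate_bits` are hypothetical and used only for illustration; bit patterns are handled as strings with the sign bit first.

```python
def to_twos_complement(value: int, n: int) -> str:
    """Encode a signed integer as an n-bit two's complement bit string."""
    if not -(1 << (n - 1)) <= value <= (1 << (n - 1)) - 1:
        raise ValueError("value does not fit in n bits")
    return format(value & ((1 << n) - 1), f"0{n}b")


def from_twos_complement(bits: str) -> int:
    """Decode an n-bit two's complement bit string to a signed integer."""
    raw = int(bits, 2)
    # A leading '1' marks a negative number with weight -2^(n-1).
    return raw - (1 << len(bits)) if bits[0] == "1" else raw


def negate_bits(bits: str) -> str:
    """Invert every bit and add one, keeping only n bits."""
    n = len(bits)
    mask = (1 << n) - 1
    return format(((int(bits, 2) ^ mask) + 1) & mask, f"0{n}b")


print(negate_bits("1101"))            # 0011, the magnitude 3
print(from_twos_complement("1101"))   # -3
print(to_twos_complement(-3, 4))      # 1101
```

Because addition of such bit patterns modulo 2^n gives the correct signed result, the same adder hardware serves both positive and negative operands, unlike the sign and magnitude form.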
Suppose we add +5 and -5; in decimal the result is 0. In 4-bit 2's complement form, +5 is 0101 and -5 is 1011. Adding these two numbers gives 10000. Discarding the carry leaves 0000, i.e., 0.
For signed multiplication we have modified the Complex Multiplication strategy. Normally it requires four Multipliers and three adder/subtractor blocks, whereas the modified strategy requires three Multipliers and five Adders. For Complex Multiplication of two numbers (a+jb).(c+jd) we get:-
Real Part:- (c-d).b + c.(a-b)
Imaginary Part:- (c+d).a - c.(a-b)
Expanding these expressions gives (ac-bd) and (ad+bc) respectively. Only three multiplication terms are required, as c.(a-b) is a common term in both results. Hence, this method consumes less power than the previous method of Complex Multiplication.
2.5.2 Booth's Recoding Algorithm:-
Parallel Multiplication using the basic Booth's Recoding technique is based on the fact that a partial product can be generated for a group of consecutive 0's and 1's, which is called Booth's Recoding. The algorithm is used to generate partial products efficiently. These partial products always have more bits than the inputs; their width depends on the radix scheme used for recoding. The generated partial products are added by compressors as explained in section 3.2. Because this scheme produces fewer partial products, it requires less power and area. There are two algorithms, Radix-2 and Radix-4, to generate efficient partial products for multiplication. First we explain the basic technique of Booth's Recoding algorithm and then the Modified Booth's Recoding technique, for both the Radix-2 and Radix-4 algorithms.
2.5.3 Basic Technique of Booth's Recoding Algorithm for Radix-2 and Radix-4:-
Booth has proposed a Radix algorithm for high speed multiplication which reduces the partial products for multiplication. Booth's algorithm for multiplication is based on this observation. To do a multiplication A*B, where
A = an, an-1, ..., a0 is the multiplier and
B = bn, bn-1, ..., b0 is the multiplicand,
we check every two consecutive bits in A at a time:-

Ai   Ai-1   Y      Comments           Explanation
0    0      0      Middle of 0's      String of 0's, shift only
0    1      1.B    End of 1's         Add and Shift
1    0      -1.B   Beginning of 1's   Add and Shift
1    1      0      Middle of 1's      String of 1's, shift only

Table 2.1. Booth's Recoding algorithm Radix-2

Ai+1   Ai   Ai-1   Y      Comments           Explanation
0      0    0      0      String of zeros    Two bit shift only
0      0    1      1.B    End of 1's         Add and two bit shift
0      1    0      1.B    A single 1         Add and two bit shift
0      1    1      2.B    End of 1's         Add and two bit shift
1      0    0      -2.B   Beginning of 1's   Add and two bit shift
1      0    1      -1.B   A single 0         Add and two bit shift
1      1    0      -1.B   Beginning of 1's   Add and two bit shift
1      1    1      0      String of 1's      Two bit shift only

Table 2.2. Booth's Recoding algorithm Radix-4
Let us take an example:-
Radix-2:-
Suppose A is the Multiplier with value -5 and B is the Multiplicand with value +2. Then,
B => 0010 (+2)
A => 1011 (-5)
Using the above table, a '0' is assumed to the right of the LSB of A. We first look at the LSB together with this '0', and then at each adjacent pair of bits in A, moving towards the MSB. We get the partial products as:-
i) For 10 we have to perform -1.B, i.e., the 2's complement of B, 1110.
ii) For 11 we have to put all 0's, i.e., 0000.
iii) For 01 we have to perform 1.B, i.e., the value of B, 0010.
iv) For 10 we again perform -1.B, i.e., 1110.
Here, some extra bits, called correction bits, are appended to match the width of the partial products.
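The Radix-2 example above can be reproduced with a small Python sketch. It models only the recoding rule of Table 2.1 on integers, not the hardware; the function names are assumptions made for illustration.

```python
def booth_radix2_digits(a: int, bits: int) -> list[int]:
    """Recode the two's complement multiplier `a` into digits in {-1, 0, +1}.

    Digit i is a[i-1] - a[i], with an implicit a[-1] = 0 (Table 2.1).
    The list is ordered from the LSB pair to the MSB pair.
    """
    pattern = a & ((1 << bits) - 1)
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(bits):
        cur = (pattern >> i) & 1
        digits.append(prev - cur)  # pair 01 -> +1, 10 -> -1, 00 and 11 -> 0
        prev = cur
    return digits


def booth_radix2_multiply(a: int, b: int, bits: int) -> int:
    """Sum the recoded partial products digit * B, each shifted by its position."""
    return sum((d * b) << i for i, d in enumerate(booth_radix2_digits(a, bits)))


print(booth_radix2_digits(-5, 4))       # steps i) to iv) of the example
print(booth_radix2_multiply(-5, 2, 4))  # (-5) * (+2)
```

For A = -5 the digits come out as -1, 0, +1, -1, matching steps i) to iv), and the shifted partial products sum to -10.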
Radix-4:-
A => -5 => 1 1 1 1 1 0 1 1
B => 46 => 0 0 1 0 1 1 1 0
then the following Partial Products are generated:-
In the above technique of Booth's Algorithm the vertical length of the partial products is large, hence more adders are required, so power and area will be more.
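As a hedged illustration of Table 2.2, the following Python sketch recodes the Radix-4 example (A = -5, B = 46) into digits and sums the resulting partial products. The function names are hypothetical, and the sketch assumes an even bit width.

```python
def booth_radix4_digits(a: int, bits: int) -> list[int]:
    """Recode an even-width two's complement multiplier into digits in {-2, ..., +2}.

    Digit k is a[2k-1] + a[2k] - 2*a[2k+1], with an implicit a[-1] = 0 (Table 2.2).
    """
    if bits % 2:
        raise ValueError("this sketch assumes an even bit width")
    pattern = a & ((1 << bits) - 1)
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for k in range(0, bits, 2):
        low = (pattern >> k) & 1
        high = (pattern >> (k + 1)) & 1
        digits.append(prev + low - 2 * high)
        prev = high
    return digits


def booth_radix4_multiply(a: int, b: int, bits: int) -> int:
    """Each digit selects 0, +-B or +-2B; row k is shifted by two bits per step."""
    return sum((d * b) << (2 * k) for k, d in enumerate(booth_radix4_digits(a, bits)))


print(booth_radix4_digits(-5, 8))
print(booth_radix4_multiply(-5, 46, 8))
```

Only four rows are produced for the 8-bit multiplier, two of which are zero for A = -5; the rows sum to -230.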
-:References:-
[1] Solomentsev, E.D. (2001), "Complex number", in Hazewinkel, Michiel, Encyclopaedia of Mathematics, Springer, ISBN 978-1556080104.
[2] Man Yan Kong; Langlois, J.M.P.; Al-Khalili, D. (2008), "Efficient FPGA implementation of complex multipliers using the logarithmic number system", Circuits and Systems, 2008. ISCAS 2008. IEEE International Symposium on, Page(s): 3154-3157.
[3] Pascual, A.P.; Valls, J.; Peiro, M.M. (1999), "Efficient complex-number multipliers mapped on FPGA", Electronics, Circuits and Systems, 1999. Proceedings of ICECS '99. The 6th IEEE International Conference on.
[4] R.M.Badghare, S.K.Mangal, R.B.Deshmukh, R.M.Patrikar (2009), "Design of Low Power Parallel Multiplier", Journal of Low Power Electronics, Volume 5, Number 1, April 2009, 31-39.
[5] Jones, C.M.; Dlay, S.S.; Naguib, R.G. (Oct 1996), "Berger check prediction for concurrent error detection in the Braun array multiplier", Electronics, Circuits, and Systems, 1996. ICECS '96., Proceedings of the Third IEEE International Conference, Pages 81-84, vol. 1.
[6] C. R. Baugh and B. A. Wooley, "A two's complement parallel array multiplication algorithm", IEEE Trans. Comput., Dec. 1973, vol. C-22, pp. 1045-1047.
[7] Ko-Chi Kuo; Chi-Wen Chou (2006), "Low Power Multiplier with Bypassing and Tree Structure", Circuits and Systems, 2006. APCCAS 2006. IEEE Asia Pacific Conference, 4-7 Dec. 2006, 602-605.
[8] J. Ohban, V.G. Moshnyaga, and K. Inoue, "Multiplier energy reduction through bypassing of partial products", Asia-Pacific Conf. on Circuits and Systems, 2002, vol. 2, pp. 13-17.
[9] Ming-Chen Wen, Sying-Jyan Wang, and Yen-Nan Lin, "Low Power Parallel Multiplier with Column Bypassing", Electronics Letters, 12 May 2005, Volume 41, Issue 10, Page(s): 581-583.
Chapter 3.
Multiplier Unit
As explained in the previous chapters on the various Complex Multiplier techniques, a Complex Multiplier is implemented using more than one Basic Multiplier; the normal way of implementing Complex Multiplication requires four Basic Multipliers. To make the Complex Multiplier a low power unit, these Basic Multipliers are designed using the Compressor technique. If the Basic Multiplier is designed for low power, the Complex Multiplier also becomes a low power unit.
Figure 3.1 Internal Block Diagram of 16*16 Basic Multiplier[2]
The above figure shows the Internal Block Diagram of the Basic Multiplier. It consists of three stages:-
i) Partial Product Generator
ii) Different Order Compressors
iii) Parallel Adder
Below is the description of all three blocks that are used for multiplication.
3.1 Partial Product Generator:-
In an Unsigned Multiplier, we normally generate partial products and add them to produce the result. Let 'A' and 'B' be two n-bit unsigned numbers whose product 'Z' is 2n bits wide. First we generate the partial products using the 'AND' operation. For an n-bit multiplication, n*n partial products are generated.
Let us take two 16-bit numbers, A15-A0 called the Multiplicand and B15-B0 called the Multiplier, as inputs of the multiplier. Partial products are generated by ANDing each bit of 'A' with each bit of 'B', so 16*16=256 partial products are generated. Each bit of the multiplicand is ANDed with every bit of the multiplier. a0 is ANDed with b0-b15, producing the sixteen partial products m(0)(0)-m(0)(15) of the first row. Similarly, for the other 15 rows, a1-a15 are ANDed with b0-b15 to produce the remaining 240 partial products, i.e. m(1)(0)-m(15)(15).
Figure 3.2. Partial Product Generator(4 Bit)
The above diagram explains the Partial Product Generator. Each multiplicand bit a0-a3 is ANDed with the multiplier bits b0-b3, producing sixteen partial products m00-m33. These Partial Products go to the inputs of the Compressors, which reduce the partial product stages to only two stages.
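The AND-based generation of partial products can be sketched in Python as follows. This is a behavioural illustration only; the names `partial_products` and `sum_partial_products` are assumptions, and the weighting 2^(i+j) simply reflects the column in which m(i)(j) is placed.

```python
def partial_products(a: int, b: int, bits: int) -> list[list[int]]:
    """Return the bits x bits matrix m[i][j] = a_i AND b_j for unsigned operands."""
    return [[((a >> i) & 1) & ((b >> j) & 1) for j in range(bits)]
            for i in range(bits)]


def sum_partial_products(m: list[list[int]]) -> int:
    """Add every m[i][j] with its weight 2**(i + j) to recover the product."""
    return sum(bit << (i + j)
               for i, row in enumerate(m)
               for j, bit in enumerate(row))


m = partial_products(13, 11, 4)   # 4-bit case as in Figure 3.2
print(sum(len(row) for row in m))  # number of partial products
print(sum_partial_products(m))     # 13 * 11
```

For the 4-bit case the matrix holds 16 partial products, and for 16-bit operands it holds 256, as stated above.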
3.2 Different Order Compressors[1][3][4]:-
After generation, the partial products go to the inputs of the compressors. Compressors are used to reduce the partial product stages of the multiplier. The main operation of a compressor is to count the number of 1's. After generating the partial products we have to make vertical groups. Each vertical group counts its number of 1's, and the count value of that group is passed on to the second stage.
3.2.1 Adder as Counter:-
An adder circuit, whether it is a full adder or a half adder, can be used as a counter which counts the number of 1's.
Figure 3.3. Half Adder
Figure 3.4. Full Adder

A   B   Carry   Sum
0   0   0       0
0   1   0       1
1   0   0       1
1   1   1       0

Table 3.1. Half Adder as a Counter

A   B   C   Carry   Sum
0   0   0   0       0
0   0   1   0       1
0   1   0   0       1
0   1   1   1       0
1   0   0   0       1
1   0   1   1       0
1   1   0   1       0
1   1   1   1       1

Table 3.2. Full Adder as a Counter[2]
The above tables show the half adder and the full adder as counters of the number of 1's. If the inputs are A, B and C, then the carry and sum outputs together give the number of 1's in binary form. Carry is the Most Significant Bit and Sum is the Least Significant Bit. The full adder takes three inputs and generates two outputs, i.e., it compresses three bits into two bits, so it is called a 3:2 compressor. On the basis of this logic we can make other types of compressors having more inputs, called higher order compressors. These compressors count the number of 1's among a larger number of inputs. So, as the vertical length of the partial products increases, we can use these higher order compressors.
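The counting behaviour in Tables 3.1 and 3.2 can be verified with a few lines of Python. This is a minimal sketch using the standard half adder and full adder equations; the function names are chosen here for illustration.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (carry, sum) for two input bits."""
    return a & b, a ^ b


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (carry, sum) for three input bits: a 3:2 compressor."""
    carry = (a & b) | (b & c) | (a & c)
    return carry, a ^ b ^ c


# Print the rows of Table 3.2 together with the count they encode.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            carry, s = full_adder(a, b, c)
            print(a, b, c, carry, s, "count =", 2 * carry + s)
```

In every row, 2*Carry + Sum equals the number of 1's at the inputs, which is the property the higher order compressors extend to more inputs.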
3.2.2 Compressor Logic:-
The different Compressor logics are based on the concept of the full adder as a counter. A compressor can be defined as a single-bit adder circuit that has more inputs than the three of a full adder and fewer outputs than inputs. A full adder has two outputs, so it counts up to three (11). Similarly, a three-bit output counts up to a maximum value of seven (111). Compressors having four, five, six and seven inputs produce three outputs, which count up to a maximum value of seven (111). Compressors having eight to fifteen inputs produce four outputs, which count up to a maximum value of fifteen (1111). So, these compressors are built depending on the number of inputs they have and the count value they have to generate. Following is the description of the different compressor logics with their block diagrams:-
1) 4:3 Compressor:-
Figure 3.5. Block Diagram of 4:3 Compressor
Above figure shows block diagram of 4:3 Compressor. It consists of four inputs and three outputs. The 4:3 Compressor has two Half Adders and one Parallel Adder. If all four inputs are 1, it gives the maximum count value of 100. Consider the output bits represented as j, (j+1), and (j+2). (j+2)th bit is MSB and jth bit is LSB.
2) 5:3 Compressor:-
Figure 3.6. Block Diagram of 5:3 Compressor
Above figure shows block diagram of 5:3 compressor. It consists of five inputs and three outputs. The 5:3 Compressor has one Half adder, one Full adder and a Parallel Adder. So, the maximum count value will be 101. Consider the output bits represented as j, (j+1), and (j+2). (j+2)th bit is MSB and jth bit is LSB.
3) 6:3 Compressor:-
Figure 3.7. Block Diagram of 6:3 Compressor
Above figure shows block diagram of 6:3 compressor. It consists of six inputs and three outputs. The 6:3 Compressor has two Full adders and one parallel adder. So, the maximum count value of 6:3 compressor will be 110. Consider the output bits represented as j, (j+1), and (j+2). (j+2)th bit is MSB and jth bit is LSB.
4) 7:3 Compressor:-
Figure 3.8. Block Diagram of 7:3 Compressor
Above figure shows block diagram of 7:3 compressor. It consists of seven inputs and three outputs. The 7:3 Compressor has one 4:3 Compressor, one Full adder and one parallel adder. So, the maximum count value of 7:3 compressor is 111. Consider the output bits represented as j, (j+1), and (j+2). (j+2)th bit is MSB and jth bit is LSB.
5) 8:4 Compressor:-
Figure 3.9. Block Diagram of 8:4 Compressor
Above figure shows block diagram of 8:4 compressor. It consists of eight inputs and four outputs. The 8:4 Compressor has one 5:3 Compressor, one Full Adder and one Parallel Adder. The maximum count value of 8:4 compressor is 1000. Consider the output bits represented as j, (j+1), (j+2), (j+3). (j+3)th bit is MSB and jth bit is LSB.
6) 9:4 Compressor:-
Figure 3.10. Block Diagram of 9:4 Compressor
Above figure shows block diagram of 9:4 Compressor. It consists of nine inputs and four outputs. 9:4 Compressor has one 6:3 Compressor, one Full Adder and one parallel adder. The maximum count value of 9:4 compressor is 1001. Consider the output bits represented as j, (j+1), (j+2), (j+3). (j+3)th bit is MSB and jth bit is LSB.
7) 10:4 Compressor:-
Figure 3.11. Block Diagram of 10:4 Compressor
Above Figure shows block diagram of 10:4 Compressor. It consists of ten inputs and four outputs. The 10:4 Compressor has one 7:3 Compressor, one Full Adder and one Parallel Adder. The maximum count value of 10:4 compressor is 1010. Consider the output bits represented as j, (j+1), (j+2), (j+3). (j+3)th bit is MSB and jth bit is LSB.
8) 11:4 Compressor:-
Figure 3.12. Block Diagram of 11:4 Compressor
Above Figure shows Block Diagram of 11:4 Compressor. It consists of eleven inputs and four outputs. The 11:4 Compressor has one 7:3 Compressor, one 4:3 Compressor and one Parallel Adder. The maximum count value of 11:4 compressor is 1011. Consider the output bits represented as j, (j+1), (j+2) and (j+3). (j+3)th bit is MSB and jth bit is LSB.
9) 12:4 Compressor:-
Figure 3.13. Block Diagram of 12:4 Compressor
Above Figure shows Block Diagram of 12:4 Compressor. It consists of twelve inputs and four outputs. The 12:4 Compressor has one 7:3 Compressor, one 5:3 Compressor and one three-bit Parallel adder. The maximum count value of 12:4 compressor is 1100. Consider the output bits represented as j, (j+1), (j+2) and (j+3). (j+3)th bit is MSB and jth bit is LSB.
10) 13:4 Compressor:-
Figure 3.14. Block Diagram of 13:4 Compressor
Above Figure shows Block Diagram of 13:4 Compressor. It consists of thirteen inputs and four outputs. The 13:4 Compressor has one 7:3 Compressor, one 6:3 Compressor and one three-bit parallel adder. The maximum count value of 13:4 compressor is 1101. Consider the output bits represented as j, (j+1), (j+2) and (j+3). (j+3)th bit is MSB and jth bit is LSB.
11) 14:4 Compressor:-
Figure 3.15. Block Diagram of 14:4 Compressor
Above Figure shows Block Diagram of 14:4 Compressor. It consists of fourteen inputs and four outputs. The 14:4 Compressor has two 7:3 Compressors and one three-bit parallel adder. The maximum count value of 14:4 compressor is 1110. Consider the output bits represented as j, (j+1), (j+2) and (j+3). (j+3)th bit is MSB and jth bit is LSB.
12) 15:4 Compressor:-
Figure 3.16. Block Diagram of 15:4 Compressor
Above Figure shows Block Diagram of 15:4 Compressor. It consists of fifteen inputs and four outputs. The 15:4 Compressor has one 8:4 Compressor, one 7:3 Compressor and one three-bit parallel adder. The maximum count value of 15:4 compressor is 1111. Consider the output bits represented as j, (j+1), (j+2) and (j+3). (j+3)th bit is MSB and jth bit is LSB.
13) 16:5 Compressor:-
Figure 3.17. Block Diagram of 16:5 Compressor
Above Figure shows Block Diagram of 16:5 Compressor. It consists of sixteen inputs and five outputs. The 16:5 Compressor has two 8:4 Compressors and one four-bit parallel adder. The maximum count value of 16:5 compressor is 10000. Consider the output bits represented as j, (j+1), (j+2), (j+3) and (j+4). (j+4)th bit is MSB and jth bit is LSB.
These different order Compressors are used to reduce the partial product stages. Compressors also reduce the switching operations, since they only count the number of 1's. The generated partial products are divided vertically among the different order compressors.
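The way a higher order compressor is assembled from smaller blocks can be sketched in Python. The sketch below builds a 4:3 compressor from two half adders and a parallel adder, and a 7:3 compressor from a 4:3 compressor, a full adder and a parallel adder, as described above. The internal wiring (which inputs feed which block) is an assumption, since only the block counts are given in the text; all names are illustrative.

```python
def half_adder(a, b):
    return a & b, a ^ b                       # (carry, sum)


def full_adder(a, b, c):
    return (a & b) | (b & c) | (a & c), a ^ b ^ c


def parallel_add(x, y, width):
    """Add two LSB-first bit lists with cascaded full adders; return `width` bits."""
    out, carry = [], 0
    for i in range(width):
        xi = x[i] if i < len(x) else 0
        yi = y[i] if i < len(y) else 0
        carry, s = full_adder(xi, yi, carry)
        out.append(s)
    return out


def compressor_4_3(bits):
    """Outputs [j, j+1, j+2]: the number of 1's among four inputs."""
    c0, s0 = half_adder(bits[0], bits[1])
    c1, s1 = half_adder(bits[2], bits[3])
    return parallel_add([s0, c0], [s1, c1], 3)


def compressor_7_3(bits):
    """Outputs [j, j+1, j+2]: the number of 1's among seven inputs."""
    low = compressor_4_3(bits[:4])
    c, s = full_adder(bits[4], bits[5], bits[6])
    return parallel_add(low, [s, c], 3)


print(compressor_4_3([1, 1, 1, 1]))           # LSB first: count 4 -> 100
print(compressor_7_3([1, 1, 1, 1, 1, 1, 1]))  # LSB first: count 7 -> 111
```

The outputs are listed LSB first (j, j+1, j+2), so the all-ones cases correspond to the maximum count values 100 and 111 quoted for the 4:3 and 7:3 compressors.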
3.3 Parallel Adders:-
Figure 3.18. Block Diagram of Parallel Adder
Above figure shows Block Diagram of Parallel Adder. It consists of cascaded Full Adders. The number of adders used depends on the length of the output. For N*N multiplication, 2N full adders are used. Here, Cout of the first full adder is connected to Cin of the next adjacent full adder. The main concept of this parallel adder comes from the Carry Look-ahead Adder. The output of the Parallel Adder is the final output of the Multiplier.
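A behavioural Python sketch of the cascaded connection described above is given here. It models only the Cout-to-Cin chaining of full adders; it does not model carry look-ahead logic or timing, and the function names are assumptions.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, cout) for one bit position."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def parallel_adder(x: int, y: int, width: int) -> int:
    """Add two unsigned numbers with `width` cascaded full adders."""
    carry = 0
    result = 0
    for i in range(width):
        # Cout of stage i becomes Cin of stage i + 1.
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result  # the final carry-out is dropped (result is modulo 2**width)


# Final addition of two rows for an 8*8 multiplier: 2N = 16 full adders.
print(parallel_adder(0x1234, 0x0FF1, 16))
```

With a width of 2N the two remaining rows of an N*N multiplier can be added without overflow, because their sum is the product and therefore fits in 2N bits.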
3.4 Architecture of Multiplier Using Compressor:-
Following figure shows the Architecture of 8*8 Multiplier using different order Compressors.
Figure 3.19. Architecture of 8*8 Multiplier using Compressors[2]
As shown in the above figure, the Partial Products are added in four stages. Adders and different compressors are used to minimize the stage operations. Compressors are chosen carefully so that the minimum number of outputs is generated. Consider column number eight, where eight bits are added at the first stage. These eight bits are added using an 8:4 Compressor, which generates four outputs and so decreases the number of bits for the next stage. The outputs of each compressor from 4:3 to 7:3 have bit positions jth, (j+1)th and (j+2)th, where the jth bit is the LSB and the (j+2)th bit is the MSB. Compressors from 8:4 to 15:4 have bit positions jth, (j+1)th, (j+2)th and (j+3)th, where the jth bit is the LSB and the (j+3)th is the MSB. The 16:5 Compressor has bit positions jth, (j+1)th, (j+2)th, (j+3)th and (j+4)th, where the jth bit is the LSB and the (j+4)th is the MSB.
Suppose the compressor in column number four is a 4:3 Compressor. Its jth output goes to column number four, the next adjacent output, i.e., the (j+1)th output, goes to column number five, and the (j+2)th output goes to column number six. Similarly, for column eight, i.e. for the 8:4 compressor, the jth output goes to column number eight, the (j+1)th output goes to column number nine, the (j+2)th output goes to column number ten and the last output, the (j+3)th, goes to column number eleven. Thus, these compressors reduce the vertical critical path more rapidly. Similarly, in the next stage, if a vertical path has more than two bits, we use a compressor of that many bits to reduce the vertical critical path again. Finally, we use compressors up to the stage where only two bits remain vertically, and those two rows are added in parallel as explained in section 3.3.
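The column-wise reduction described above can be simulated in Python. This is a behavioural sketch under stated assumptions: every column with more than two bits is handed to one k-input compressor (modelled simply as a count of 1's), columns with one or two bits are passed through unchanged, and the stage count it reports is that of this simplified model, not necessarily that of Figure 3.19. The function name is hypothetical.

```python
def compress_multiply(a: int, b: int, bits: int = 8) -> tuple[int, int]:
    """Multiply unsigned a and b; return (product, number of compression stages)."""
    # Stage 0: place each partial product bit a_i AND b_j in column i + j.
    columns: dict[int, list[int]] = {}
    for i in range(bits):
        for j in range(bits):
            columns.setdefault(i + j, []).append(((a >> i) & 1) & ((b >> j) & 1))

    stages = 0
    while any(len(col) > 2 for col in columns.values()):
        reduced: dict[int, list[int]] = {}
        for j, col in columns.items():
            if len(col) <= 2:
                reduced.setdefault(j, []).extend(col)  # nothing to compress
                continue
            count = sum(col)                           # the compressor counts the 1's
            # 3 inputs -> 2 outputs, 4..7 -> 3, 8..15 -> 4, 16 -> 5
            for k in range(len(col).bit_length()):
                reduced.setdefault(j + k, []).append((count >> k) & 1)
        columns = reduced
        stages += 1

    # Final stage: at most two bits per column remain; add the two rows.
    row0 = sum(col[0] << j for j, col in columns.items() if len(col) > 0)
    row1 = sum(col[1] << j for j, col in columns.items() if len(col) > 1)
    return row0 + row1, stages


product, stages = compress_multiply(200, 123)
print(product, stages)
```

Each stage replaces k bits in a column by fewer output bits spread over columns j, j+1, ..., so the total number of bits strictly decreases until only two rows are left for the parallel adder.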
-:References:-
[1] C.H.Chang, J.Gu, M.Zhang (2004), "Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits", Circuits and Systems Regular Papers, IEEE Transactions, page(s): 1985-1997, Volume: 51, Issue: 10, Oct. 2004.
[2] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay (2009), "A 1.2ns 16×16-Bit Binary Multiplier Using High Speed Compressors", International Journal of Electrical, Computer, and Systems Engineering, 2009, 234-239.
[3] J. Gu, C.H.Chang (2003), "Ultra low voltage low power 4-2 compressor for high speed multiplications", Circuits and Systems, 2003. ISCAS '03. Proceedings of the International Symposium, vol. 5, May 2003, 321-324.
[4] K. Prasad and K. K. Parhi (2001), "Low power 4-2 and 5-2 compressor", Proc. of the 35th Asilomar Conf. on Signals, Systems and Computers, vol. 1, 2001, 129-133.
Chapter 4.
Proposed Complex Multiplier
In this Chapter we propose a new Complex Multiplier for both unsigned and signed Complex Multiplication.
4.1 Unsigned Multiplication:-
As we saw in the general rule of Complex Multiplication, multiplying two complex numbers requires four multipliers and three adders/subtractors. The range of an n-bit unsigned number is 0 to 2ⁿ-1. Because the numbers are unsigned, the sign of each of the four real numbers has to be entered separately. The Real and Imaginary parts of the result are computed from the unsigned numbers, and combinational logic on the sign inputs generates the sign outputs of the Real and Imaginary parts.
Figure 4.1. Block Diagram of Unsigned Complex Multiplier
As shown in Figure 4.1, we enter four real numbers 'a', 'b', 'c' and 'd' and the sign of each number as 'sa', 'sb', 'sc' and 'sd'. After multiplying the real numbers using four Multipliers and passing the products through the 32-bit Add/Sub Block, we get the output "rr", which is the Real part, and "ri", which is the Imaginary part of the result of the Complex Multiplication. Similarly, to get the sign of the result for both the Real and Imaginary parts, we apply combinational logic to the sign inputs and get the output sign "ssr" for the Real part and "ssi" for the Imaginary part. As explained in Chapter 2, the multiplication of two Complex Numbers is
(a+bi).(c+di) = (ac-bd) + (ad+bc)i
As we enter the sign of each number separately, we have to use a combinational circuit to produce the sign of the result for the Real part (sr) as well as the Imaginary part (si). Let the first term "ac" be represented as 'e', "bd" as 'f', "ad" as 'g' and "bc" as 'h'. The signs of these results are represented as se, sf, sg and sh. These sign results are generated using XORing operations.
se = sa xor sc.
sf = sb xnor sd.
sg = sa xor sd.
sh = sb xor sc.
Figure 4.2. Combinational Logic for intermediate sign
Now, by applying conditions on se, sf, sg and sh, we generate the final sign result, i.e. "sr" for the real part and "si" for the imaginary part. A 2:1 Mux is used to generate the output sign value. '0' represents a Positive Value and '1' represents a Negative Value.
Figure 4.3. Combinational Circuit for output sign Real Part and Imaginary Part
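The sign handling of the unsigned Complex Multiplier can be illustrated with a Python sketch. The intermediate signs follow the product definitions e = ac, f = bd, g = ad and h = bc, with XNOR used for the subtracted term. The condition that drives the 2:1 Mux is not spelled out in the text, so the magnitude comparison used below to select the output sign is an assumption, and all function names are hypothetical.

```python
def add_sign_magnitude(s1: int, m1: int, s2: int, m2: int) -> tuple[int, int]:
    """Add two sign-magnitude numbers; return (sign, magnitude)."""
    if s1 == s2:
        return s1, m1 + m2            # same sign: add the magnitudes
    if m1 == m2:
        return 0, 0                   # equal magnitudes cancel
    # Assumed mux condition: the larger magnitude decides the output sign.
    return (s1, m1 - m2) if m1 > m2 else (s2, m2 - m1)


def unsigned_complex_multiply(a, sa, b, sb, c, sc, d, sd):
    """Return ((sr, rr), (si, ri)) for (a + jb)(c + jd) with separate sign bits."""
    e, f, g, h = a * c, b * d, a * d, b * c   # four unsigned multipliers
    se = sa ^ sc                              # sign of a*c
    sf = 1 ^ (sb ^ sd)                        # XNOR: sign of the subtracted term -(b*d)
    sg = sa ^ sd                              # sign of a*d
    sh = sb ^ sc                              # sign of b*c
    return add_sign_magnitude(se, e, sf, f), add_sign_magnitude(sg, g, sh, h)


def signed_value(sign: int, magnitude: int) -> int:
    return -magnitude if sign else magnitude


# (3 - 4j)(-2 + 5j) = 14 + 23j
real, imag = unsigned_complex_multiply(3, 0, 4, 1, 2, 1, 5, 0)
print(signed_value(*real), signed_value(*imag))
```

Because the subtraction of bd is folded into its sign bit, both outputs can be formed by the same sign-magnitude add/subtract step.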
4.2 Signed Multiplication:-
Figure 4.4. Modified Complex Multiplier Block Diagram
The above Block Diagram shows the Modified Complex Multiplier, which consists of three multipliers and three adder/subtractor units. It requires one multiplier less than the previous technique, so it consumes less power. To perform signed multiplication we use Booth's Radix algorithm. Booth's Radix algorithm reduces the number of partial products compared to the normal multiplier algorithm. So, it reduces the switching operations of the multiplier and hence reduces power. It is based on the fact that a partial product can be generated for a group of consecutive zeros and ones, which is called Booth's recoding.
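The three-multiplier formulation from Chapter 2, Real Part = (c-d).b + c.(a-b) and Imaginary Part = (c+d).a - c.(a-b), can be checked against the four-multiplier form with a short Python sketch. The function name is an assumption made for illustration.

```python
def complex_multiply_3mult(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Return (real, imag) of (a + jb)(c + jd) using three multiplications."""
    shared = c * (a - b)         # common term, computed once
    real = (c - d) * b + shared  # expands to a*c - b*d
    imag = (c + d) * a - shared  # expands to a*d + b*c
    return real, imag


print(complex_multiply_3mult(3, -4, -2, 5))  # (3 - 4j)(-2 + 5j)
```

The multiplier saved is paid for with the extra additions (a-b), (c-d) and (c+d), which are much cheaper than a multiplier in hardware.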
4.2.1 Modified Technique of Recoding Algorithm for Radix-2 and Radix-4[1][2]:-
Parallel Multiplication using the basic Booth Recoding Technique is explained in the previous section. Since that technique requires a lot of adders, it requires more power and area. In the proposed multiplier design, we have reduced the number of adders required in the partial product addition, and hence the vertical length of the Partial Products. In this technique mainly the correction bits are reduced. This is done without compromising the correctness of multiplication of 2's complement numbers. We have used a Multiplexer based Booth Recoding scheme to reduce the length and width of the partial products. With this change, the partial products after recoding are always greater than the input bit length by one bit in the Radix-2 scheme. Similarly, in the Radix-4 scheme the partial products after recoding are always greater than the input bit length by two bits. These additional bit/bits act as correction bit/bits to get the correct value of the multiplication. Also, in the hardware realization of Booth's recoding scheme, we can remove the extra select line which is used at the time of recoding. Because of this extra select line the multiplexer size becomes large. We have observed that if we do not consider this extra bit at the time of hardware realization, we can reduce the size of one multiplexer. So, in radix 2 the LSB decides the first partial product, and in radix 4 the first two LSB bits decide the first partial product. These partial products are then added using the proposed array of adders to achieve the correct multiplication output. The working of this novel design is explained in the following sections.
Figure 4.5 Block Diagram of Modified Booth’s Recoding unit Multiplier[1]
In order to achieve signed number multiplication, Partial Products are generated using the Modified Booth's Recoding Unit Multiplication block. After generation, these new Partial Products are added using Compressors and a Parallel adder. Below is the explanation of the Modified Booth's Recoding Unit for the Multiplier.
4.2.2 Modified Booth's Recoding Unit[3]:-
Partial Products are generated using the Modified Booth's Recoding Unit block. Using the same concept as the generation of Partial Products for the basic Booth's Recoding algorithm in the previous section, we generate partial products for the Modified Booth's Recoding Algorithm. Their length exceeds the input bit sequence by one bit for the Radix-2 scheme and by two bits for the Radix-4 scheme.
This modified technique is explained below:-
Radix-2 Method:-
As we saw in Table 2.1, the output partial products are added and shifted according to the input sequence. Here, we use multiplexers to build the recoding unit. The select lines of the multiplexers are the input bits of the multiplier, and the outputs follow the modified table shown below (~ denotes the complement of a bit):-

Ai   Ai-1   Y      Explanation
0    0      0      All 0's
0    1      1.B    [ B(n-1), B ]
1    0      -1.B   [ ~B(n-1), (-B) ]
1    1      0      All 0's

Table 4.3 Modified Booth's Recoding Algorithm Radix 2
This can be explained with a simple example:-
Suppose
B => 1100 (-4)
A => 1010 (-6)
So, according to the table shown above, we obtain the recoded bits as partial products:-
PP0 => 0 0 0 0 0
PP1 => 0 0 1 0 0
PP2 => 1 1 1 0 0
PP3 => 0 0 1 0 0
Here, in the Modified Booth's Recoding algorithm, one extra bit is added at the MSB of the input bit sequence, as shown in the Table. The hardware realization of this recoding unit is based on multiplexers and includes a 2's complement unit. At the time of recoding we assume one extra bit '0' before the LSB of the input bit sequence, and this extra bit '0' decides the Partial Product according to the sequence explained in the Table above. We have observed that at the time of hardware realization the LSB alone is sufficient to get the first partial product. Because of this, that multiplexer becomes 2x1 rather than 4x1, while the other multiplexers remain the same as per their input select lines, depending upon the recoding scheme. So, multiplexers are the important hardware for the Booth's Recoding unit.
Radix-4 Method:-
The Radix-4 scheme works in the same way as the above Radix-2 scheme and is also used to reduce the partial products, so it is very useful for fast multiplication of long input bit sequences. Here, the partial products obtained from the recoding unit are always 2 bits longer than the input. So, if the input has n bits then the partial product length will be (n+2) bits (~ denotes the complement of a bit).

Ai+1   Ai   Ai-1   Y      Explanation
0      0    0      0      All 0's
0      0    1      1.A    [A(n), A(n), A]
0      1    0      1.A    [A(n), A(n), A]
0      1    1      2.A    [A(n), A, 0]
1      0    0      -2.A   [~A(n-1), -A, 0]
1      0    1      -1.A   [~A(n-1), ~A(n-1), -A]
1      1    0      -1.A   [~A(n-1), ~A(n-1), -A]
1      1    1      0      All 0's

Table 4.4 Modified Booth's Recoding Algorithm Radix-4
The above Table shows how the partial products are generated according to the input bit sequence. Here, we generate two extra bits according to the input bits. These two bits are correction bits that give the correct multiplication output. The MSBs of the partial products need to be added carefully, and for that a new structure of the adder array is introduced. This modification removes the problem of a large number of correction bits, which would require more adders and hence more higher order compressors.
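The Radix-2 example of Section 4.2.2 (B = -4, A = -6) can be reproduced with the Python sketch below. It is a behavioural illustration, not the multiplexer structure itself: each row is computed as the (n+1)-bit two's complement value of Y.B, which is an assumption made here for simplicity and matches the bracketed patterns of Table 4.3 for non-zero B. The function names are hypothetical. The Radix-4 case is analogous with (n+2)-bit rows.

```python
def modified_radix2_rows(a: int, b: int, bits: int) -> list[str]:
    """Return the (bits + 1)-wide partial product rows PP0, PP1, ... as bit strings."""
    width = bits + 1
    mask = (1 << width) - 1
    rows = []
    prev = 0                                  # implicit '0' to the right of the LSB
    for i in range(bits):
        cur = (a >> i) & 1
        y = prev - cur                        # 0, +1 or -1 as in Table 4.3
        rows.append(format((y * b) & mask, f"0{width}b"))
        prev = cur
    return rows


def sum_rows(rows: list[str]) -> int:
    """Interpret each row as two's complement, shift it by its index and add."""
    total = 0
    for i, row in enumerate(rows):
        value = int(row, 2) - (1 << len(row)) if row[0] == "1" else int(row, 2)
        total += value << i
    return total


rows = modified_radix2_rows(-6, -4, 4)
print(rows)            # PP0 .. PP3 of the example
print(sum_rows(rows))  # (-6) * (-4)
```

The extra MSB of each row carries the sign of that row, which is why the rows can be added directly to give 24 for this example.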
4.3 Compressors and Adders:-
Recoding and Addition scheme for Radix-2 and Radix-4 for four bit input sequence [4] [5]:-
Figure 4.6 Addition scheme for Radix-2
The above figure shows the addition scheme for Radix-2, which has five-bit partial products. These partial products are added using the compressor scheme explained previously. Here, the value of m(0)(4) is added diagonally, i.e., it is added to the diagonal bit, which is the MSB of the second partial product and also a correction bit. So, we add m(0)(4) to m(1)(4) and put the result in place of m(1)(4). Similarly, the new value of the MSB of the second partial product row is added to the old MSB of the third partial product to get the new value of the MSB of the third partial product, as shown in the above figure. After getting the new values of the correction bits, we add these bits using compressors.
Figure 4.7 Architecture of 8*8 Signed Multiplier for Radix-2 [5]
The above figure shows the Architecture of the 8*8 Signed Multiplier for the Radix-2 scheme, where partial products are generated using the Modified Booth's Recoding Unit. Here, each row of partial products has 9 bits. In the first stage, the partial products are divided into vertical blocks; these vertical blocks are half adders, full adders and different order compressors. Vertical blocks of 2 bits are half adders and vertical blocks of 3 bits are full adders. The outputs of these adders and compressors are arranged as explained in chapter 3. Horizontal blocks are parallel adders, which perform the addition that generates the final multiplication result.
Figure 4.8 Addition scheme for Radix-4
The above figure shows the addition scheme for Radix-4, which has six bits per partial product: the four LSBs come from the input sequence and the two MSBs are correction bits. Here, the MSB of the first row of partial products is added to both MSBs of the second row. In the Modified Radix-4 scheme the total number of partial product rows is half that of the normal partial product scheme. For example, if the multiplier is 4*4 bit, the total number of partial product rows including correction bits is two, i.e. half of the rows of the original scheme, as shown in the above figure. Similarly, for wider multipliers using the radix-4 scheme the total number of partial product rows is half of the original, which results in fewer switching operations and hence less power.
Figure 4.9 Architecture of 8*8 Signed Multiplier for Radix-4
The above figure shows the Architecture of the 8*8 Signed Multiplier for the Radix-4 scheme, where Partial Products are generated using the Modified Booth's Recoding Unit. In this scheme we generate partial products of 10 bits each, i.e. two extra bits for each row, as explained in the table of the Radix-4 scheme. The main advantage of the Radix-4 scheme is that the number of partial product rows is half that of the Radix-2 method. Here, in the 8*8 multiplier, the number of partial product rows becomes four, so fewer compressors are required and there are fewer switching operations, which results in low power.
-:References:-
[1] D. A. Pucknell, K. Eshraghian, Basic VLSI Design, Prentice-Hall, ISBN 81-203-0986-3.
[2] Israel Koren, Computer Arithmetic Algorithms, A.K.Peters Ltd., ISBN 1568811608.
[3] A.D.Booth, "A signed binary multiplication technique", Quarterly Journal of Mechanics and Applied Mathematics, vol. IV, pt. 2, 1951.
[4] C.H.Chang, J.Gu, M.Zhang (2004), "Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits", Circuits and Systems Regular Papers, IEEE Transactions, page(s): 1985-1997, Volume: 51, Issue: 10, Oct. 2004.
[5] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay (2009), "A 1.2ns 16×16-Bit Binary Multiplier Using High Speed Compressors", International Journal of Electrical, Computer, and Systems Engineering, 2009, 234-239.
Chapter 5.
Results and Discussion
5.1 Behavioral Simulation 5.2 Synthesis Report 5.3 Power Calculation 5.4 Layout
This chapter presents the results of the different blocks used in the implementation of the Complex Multiplier. It consists of the Simulation Results, the Synthesis Report and the Power Calculation of the different blocks. The power of the design is calculated by applying 100 Random Inputs. The Test Bench is written in VHDL. The textio format is used, where the input is given in an input file called infile and the output is written to an output file called outfile. All of the designs below are simulated using ModelSim XE III 6.2g and synthesized using Xilinx ISE Project Navigator 9.1i, with power calculated using the Xilinx XPower tool. Power is also calculated using the ASIC Encounter synthesis tool.
5.1 Behavioral Simulation:-
i) Unsigned Basic Multiplier 16*16:-
Figure 5.1 Behavioral Simulation of Unsigned 16*16 Basic Multiplier
The above figure shows the simulation of the 16*16 unsigned multiplier. The inputs ‘a’ and ‘b’ are 16 bits each, while ‘z’ is the 32-bit output. As this is an unsigned multiplier, each input ranges from 0 to 65535; no negative numbers are considered. As shown in the simulation diagram, if both inputs ‘a’ and ‘b’ are set to unsigned 7, i.e. “0000000000000111” in binary, the output ‘z’ is 49 in unsigned format. Now consider the maximum input value 65535, the highest value in 16-bit unsigned format, which consists of all 1’s, i.e. “1111111111111111” in binary. The output ‘z’ is then 4294836225, which is the maximum value for the 16*16 unsigned multiplier.
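The two cases above can be reproduced with a few lines of Python. This is only a software check of the arithmetic, not the VHDL design; the function name `unsigned_multiply` is hypothetical.

```python
WIDTH = 16  # input width; the product needs 2 * WIDTH = 32 bits


def unsigned_multiply(a_bits, b_bits):
    """Multiply two 16-bit unsigned binary strings.

    Returns the product as an integer and as a 32-bit binary string.
    """
    assert len(a_bits) == WIDTH and len(b_bits) == WIDTH
    z = int(a_bits, 2) * int(b_bits, 2)
    return z, format(z, "032b")


small, small_bits = unsigned_multiply("0000000000000111", "0000000000000111")
largest, largest_bits = unsigned_multiply("1" * WIDTH, "1" * WIDTH)
print(small)    # 49
print(largest)  # 4294836225
```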
ii) Unsigned Complex Multiplier 16*16:-
Figure 5.2 Behavioral Simulation of 16*16 Unsigned Complex Multiplier.
The above figure shows the waveform of the 16*16 Complex Multiplier for unsigned numbers. There are four inputs, ‘a’, ‘b’, ‘c’ and ‘d’, of 16 bits each. As the inputs are unsigned numbers, the sign of each number has to be entered separately. The sign bits are therefore entered as ‘sa’ for input ‘a’, ‘sb’ for input ‘b’, ‘sc’ for input ‘c’ and ‘sd’ for input ‘d’. The output of the complex multiplier, obtained as explained with the block diagram of unsigned complex multiplication in section 4.1, is shown in the above figure. The simulation waveform illustrates the operation of the Complex Multiplier.
iii) Signed Multiplier 16*16:-
a) Radix-2:-
Figure 5.3 Behavioral Simulation of 16*16 Basic Signed Multiplier
The above figure shows the behavioral simulation of the 16*16 Basic Signed Multiplier. In this scheme the inputs ‘a’ and ‘b’ are entered as signed values. The inputs are 16 bits each, while the output ‘x’ is 32 bits. Each input ranges from -32768 to +32767. As this is a signed multiplier, both positive and negative numbers are considered, and the sign of each input does not have to be entered separately as it does in the unsigned scheme. Negative numbers are entered in 2’s complement form. Suppose ‘a’ and ‘b’ are 7 and -7 respectively. As ‘a’ is a positive number, it is entered as “0000000000000111”; as ‘b’ is a negative number, it is entered as “1111111111111001”, which is -7 in 2’s complement form. The result in this case is -49, which in 32-bit 2’s complement form is “11111111111111111111111111001111”.
b) Radix-4:-
In the Radix-4 design the simulation result is the same as in the Radix-2 scheme. The only difference between the two schemes is in the synthesis report.
iv) Signed 16*16 Complex Multiplier:-
i) Radix-2:-
Figure 5.4 Behavioral Simulation of 16*16 Complex Signed Multiplier
The above figure shows the behavioral simulation of the 16*16 Complex Signed Multiplier. In this scheme the inputs ‘a’, ‘b’, ‘c’ and ‘d’ are entered as signed values, positive or negative, so there is no need to enter separate sign bits for the inputs. The range and format of the numbers are as discussed in the previous section. Consider the first example, where a=1, b=2, c=3 and d=4. All these numbers are positive, so they are entered as ordinary binary weighted values. Computing (1+2i).(3+4i) gives (1.3 - 2.4) + (1.4 + 2.3)i = -5+10i. The real part is -5 and the imaginary part is +10. In binary format, -5 is written as “11111111111111111111111111111011”, which is in 2’s complement form, and +10 is written as “00000000000000000000000000001010”.
ii) Radix-4:-
The behavioral simulation of the Radix-4 Complex Multiplier is the same as that of the Radix-2 scheme.
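The signed examples of this section can be cross-checked with a small Python model of 2’s complement encoding and complex multiplication. This is a sketch under stated assumptions, not the VHDL design: the helper names `to_signed`, `to_bits` and `complex_multiply` are hypothetical, and the product is formed with the textbook four-multiplication expression, which may differ from the internal arrangement of the hardware.

```python
def to_signed(bits):
    """Interpret a binary string as a 2's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":              # MSB set means the number is negative
        value -= 1 << len(bits)
    return value


def to_bits(value, width):
    """Encode an integer as a `width`-bit 2's complement binary string."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def complex_multiply(a, b, c, d):
    """Return (real, imag) of (a + bi)(c + di) using four real products."""
    return a * c - b * d, a * d + b * c


# Signed basic multiplier example: 7 * (-7)
a = to_signed("0000000000000111")   # 7
b = to_signed("1111111111111001")   # -7
product = a * b                      # -49
product_bits = to_bits(product, 32)  # 32-bit 2's complement output

# Signed complex multiplier example: (1 + 2i)(3 + 4i)
real, imag = complex_multiply(1, 2, 3, 4)
print(product, real, imag)           # -49 -5 10
print(to_bits(real, 32), to_bits(imag, 32))
```

Converting the printed bit strings back with `to_signed` returns the original values, which is a convenient way to read signed outputs off a binary waveform.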
5.2 Synthesis Report:-
i) Unsigned Basic Multiplier 16*16:-
Design Summary:-
a) Xilinx FPGA xc2vp20-5ff1152:-
Logic Utilization:
Number of 4 input LUTs: 714 out of 18,560 3%
Logic Distribution:
Number of occupied Slices: 405 out of 9,280 4%
Number of Slices containing only related logic: 405 out of 405 100%
Number of Slices containing unrelated logic: 0 out of 405 0%
Total Number of 4 input LUTs: 714 out of 18,560 3%
Number of bonded IOBs: 64 out of 564 11%
Total equivalent gate count for design: 4,287
Combinational Path Delay:- 34.009 ns
ii) Unsigned Complex Multiplier 16*16:-
Design Summary:-
Logic Utilization:
Number of Slice Latches: 2 out of 18,560 1%
Number of 4 input LUTs: 3,422 out of 18,560 18%
Logic Distribution:
Number of occupied Slices: 1,891 out of 9,280 20%
Number of Slices containing only related logic: 1,891 out of 1,891 100%
Number of Slices containing unrelated logic: 0 out of 1,891 0%
Total Number of 4 input LUTs: 3,422 out of 18,560 18%
Number of bonded IOBs: 136 out of 564 24%
  IOB Latches: 66
Total equivalent gate count for design: 21,760
Combinational Path Delay:- 41.271 ns
iii) Signed Basic Multiplier 16*16 radix-2:-
Design Summary:-
Xilinx FPGA xc2vp20-5ff1152:-
Logic Utilization:
Number of 4 input LUTs: 811 out of 18,560 4%
Logic Distribution:
Number of occupied Slices: 468 out of 9,280 5%
Number of Slices containing only related logic: 468 out of 468 100%
Number of Slices containing unrelated logic: 0 out of 468 0%
Total Number of 4 input LUTs: 812 out of 18,560 4%
  Number used as logic: 811
  Number used as a route-thru: 1
Number of bonded IOBs: 64 out of 564 11%
Total equivalent gate count for design: 4,980
Combinational Path Delay:- 35.432 ns
iv) Signed Basic Multiplier 16*16 radix-4:-
Design Summary:-
Xilinx FPGA xc2vp20-5ff1152:-
Logic Utilization:
Number of 4 input LUTs: 705 out of 18,560 3%
Logic Distribution:
Number of occupied Slices: 392 out of 9,280 4%
Number of Slices containing only related logic: 392 out of 392 100%
Number of Slices containing unrelated logic: 0 out of 392 0%
Total Number of 4 input LUTs: 707 out of 18,560 3%
  Number used as logic: 705
  Number used as a route-thru: 2
Number of bonded IOBs: 63 out of 564 11%
Total equivalent gate count for design: 4,422
Combinational Path Delay:- 35.858 ns
v) Signed 16*16 Complex Multiplier Radix-2:-
Design Summary:-
Xilinx FPGA xc2vp20-5ff1152:-
Logic Utilization:
Number of 4 input LUTs: 3,903 out of 18,560 21%
Logic Distribution:
Number of occupied Slices: 2,238 out of 9,280 24%
Number of Slices containing only related logic: 2,238 out of 2,238 100%
Number of Slices containing unrelated logic: 0 out of 2,238 0%
Total Number of 4 input LUTs: 3,908 out of 18,560 21%
  Number used as logic: 3,903
  Number used as a route-thru: 5
Number of bonded IOBs: 126 out of 564 22%
Total equivalent gate count for design: 24,231
Combinational Path Delay:- 58.181 ns
vi) Signed 16*16 Complex Multiplier Radix-4:-
Design Summary:-
Xilinx FPGA xc2vp20-5ff1152:-
Logic Utilization:
Number of 4 input LUTs: 3,195 out of 18,560 17%
Logic Distribution:
Number of occupied Slices: 1,758 out of 9,280 18%
Number of Slices containing only related logic: 1,758 out of 1,758 100%
Number of Slices containing unrelated logic: 0 out of 1,758 0%
Total Number of 4 input LUTs: 3,200 out of 18,560 17%
  Number used as logic: 3,195
  Number used as a route-thru: 5
Number of bonded IOBs: 126 out of 564 22%
Total equivalent gate count for design: 20,301
Combinational Path Delay:- 57.847 ns
5.3 Power Calculation:-
i) Unsigned Basic Multiplier 16*16:-
a) Xilinx FPGA xc2vp20-5ff1152:-
Dynamic Power:- 52.68 mW
Static Power:- 540.72 mW
Power-Delay Product:- 1.79 nJ
b) ASIC Encounter Synthesis:-
Number of Cells:- 668 out of 549815
Dynamic Power:- 18.97 mW
ii) Unsigned Complex Multiplier 16*16:-
a) Xilinx FPGA xc2vp20-5ff1152:-
Dynamic Power:- 6486.61 mW
Static Power:- 7248.75 mW
Power-Delay Product:- 267.7 nJ
iii) Signed Basic Multiplier 16*16 radix-2:-
a) Xilinx FPGA xc2vp20-5ff1152:-
Dynamic Power:- 87.34 mW
Static Power:- 554.68 mW
Power-Delay Product:- 3.09 nJ
b) ASIC Encounter Synthesis:-
Number of Cells:- 2818 out of 75981
Dynamic Power:- 3.84 mW
iv) Signed Basic Multiplier 16*16 radix-4:-
a) Xilinx FPGA xc2vp20-5ff1152:-
Dynamic Power:- 81.21 mW
Static Power:- 464.07 mW
Power-Delay Product:- 2.9 nJ
b) ASIC Encounter Synthesis:-
Number of Cells:- 653 out of 17774
Dynamic Power:- 2.83 mW
v) Signed Complex Multiplier 16*16 radix-2:-
a) Xilinx FPGA xc2vp20-5ff1152:-
Dynamic Power:- 80.78 mW
Static Power:- 951.67 mW
Power-Delay Product:- 4.69 nJ
b) ASIC Encounter Synthesis:-
Number of Cells:- 3509 out of 115564
Dynamic Power:- 25.63 mW
vi) Signed Complex Multiplier 16*16 radix-4:-
a) Xilinx FPGA xc2vp20-5ff1152:-
Dynamic Power:- 80.78 mW
Static Power:- 951.67 mW
Power-Delay Product:- 4.69 nJ
b) ASIC Encounter Synthesis:-
Number of Cells:- 1621 out of 46147
Dynamic Power:- 10.48 mW
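The FPGA power-delay products listed above appear consistent with the product of the dynamic power and the combinational path delay reported in section 5.2. The text does not state this formula explicitly, so it is an assumption. The Python sketch below recomputes the values under that assumption; the function name `power_delay_product_nj` and the table layout are illustrative only.

```python
def power_delay_product_nj(dynamic_power_mw, delay_ns):
    """Power-delay product in nJ: mW * ns gives pJ, so divide by 1000."""
    return dynamic_power_mw * delay_ns / 1000.0


# (dynamic power in mW, combinational path delay in ns, reported PDP in nJ)
fpga_results = {
    "Unsigned basic":         (52.68,   34.009, 1.79),
    "Unsigned complex":       (6486.61, 41.271, 267.7),
    "Signed basic radix-2":   (87.34,   35.432, 3.09),
    "Signed basic radix-4":   (81.21,   35.858, 2.9),
    "Signed complex radix-2": (80.78,   58.181, 4.69),
    "Signed complex radix-4": (80.78,   57.847, 4.69),
}

computed = {
    name: power_delay_product_nj(power, delay)
    for name, (power, delay, _reported) in fpga_results.items()
}

for name, (_power, _delay, reported) in fpga_results.items():
    print(f"{name}: computed {computed[name]:.3f} nJ, reported {reported} nJ")
```

Under this assumption every recomputed value lies within about 0.02 nJ of the reported figure.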
5.4 Layout:-
Signed Complex Multiplier 16*16:-
Chapter 6.
Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
This chapter summarizes the conclusions of the design and outlines future work.
6.1 Conclusion:-
Parallel Complex Multipliers using compressors of different orders are explained. The compressors are used to reduce the switching activity and the propagation delay of the multipliers. They also reduce the number of partial-product stages and hence the vertical critical path delay. Optimal use of all thirteen different compressors improves the speed as well as the power performance of the multiplier. As both the delay and the power are reduced, the power-delay product is also reduced. Results are calculated for both FPGA and ASIC. The FPGA used in our design is the xc2vp20-5ff1152, for which the synthesis report and the power of all multipliers are calculated. For the ASIC design, the Encounter synthesis tool is used to calculate the hardware information and the power of all multipliers. It is found that the signed complex multiplier consumes less power than the unsigned complex multiplier, and that its radix-4 version also occupies less area.
6.2 Future Work:-
Complex Multipliers of higher width can be implemented using these compressors. Higher-order compressors can be designed to reduce the vertical height of higher-width multipliers, and hence to achieve lower power. These Complex Multipliers can be used to implement FFT/IFFT designs, which are used in DSP applications.