1.2.3 If a computer connected to a 1 Gigabit Ethernet network needs to send a 256 KB file, how long would it take?

Answer:
Network speed: 1 Gbit/s = 125 MB/s.
File size: 256 KB = 0.256 MB.
Time = 0.256 MB / 125 MB/s = 2.048 ms.

For the problems below, use the access time for each type of memory given in the following table.

Cache    DRAM     Flash memory    Magnetic disk
5 ns     50 ns    5 μs            5 ms

1.2.4 Find how long it takes to read a file from DRAM if it takes 2 microseconds from the cache memory.

Answer:
DRAM is 10 times slower than cache (50 ns / 5 ns), so 2 microseconds from cache ==> 20 microseconds from DRAM.
By the same scaling:
20 microseconds from DRAM ==> 2 ms from flash memory.
20 microseconds from DRAM ==> 2 seconds from magnetic disk.

Exercise 1.3
Consider three different processors P1, P2, and P3 executing the same instruction set with the clock rates and CPIs given in the following table.

Processor     P1       P2        P3
Clock rate    2 GHz    1.5 GHz   3 GHz
CPI           1.5      1.0       2.5
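
The problems in this exercise all rest on one relation: instruction throughput equals clock rate divided by CPI. As a minimal sketch (the dictionary layout and the function name `instructions_per_second` are illustrative choices, not part of the exercise), the table above can be turned into throughput figures like this:

```python
# Clock rates (Hz) and CPIs copied from the table above.
processors = {
    "P1": {"clock_hz": 2.0e9, "cpi": 1.5},
    "P2": {"clock_hz": 1.5e9, "cpi": 1.0},
    "P3": {"clock_hz": 3.0e9, "cpi": 2.5},
}


def instructions_per_second(clock_hz, cpi):
    """Cycles per second divided by cycles per instruction."""
    return clock_hz / cpi


# Throughput of each processor in instructions per second.
ips = {
    name: instructions_per_second(p["clock_hz"], p["cpi"])
    for name, p in processors.items()
}
fastest = max(ips, key=ips.get)

for name, value in ips.items():
    print(f"{name}: {value / 1e6:.0f} MIPS")
print("highest throughput:", fastest)
```

Dividing by 10^6 converts instructions per second to MIPS, the unit used in the answers.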

1.3.1 [5] <1.4> Which processor has the highest performance expressed in instructions per second?

Answer: P2 has the highest performance.
IPS = clock rate / CPI
IPS(P1) = 1.33 × 10^9, MIPS(P1) = 1.33 × 10^3
IPS(P2) = 1.5 × 10^9, MIPS(P2) = 1.5 × 10^3
IPS(P3) = 1.2 × 10^9, MIPS(P3) = 1.2 × 10^3

1.3.2 [10] <1.4> If the processors each execute a program in 10 seconds, find the number of cycles and the number of instructions.

Answer:
No. cycles = time × clock rate
cycles(P1) = 10 × 2 × 10^9 = 20 × 10^9
cycles(P2) = 10 × 1.5 × 10^9 = 15 × 10^9
cycles(P3) = 10 × 3 × 10^9 = 30 × 10^9

time = (No. instr. × CPI) / clock rate => No. instructions = No. cycles / CPI
instructions(P1) = 20 × 10^9 / 1.5 = 13.33 × 10^9
instructions(P2) = 15 × 10^9 / 1 = 15 × 10^9
instructions(P3) = 30 × 10^9 / 2.5 = 12 × 10^9

1.3.3 [10] <1.4> We are trying to reduce the time by 30%, but this leads to an increase of 20% in the CPI. What clock rate should we have to get this time reduction?

Answer:
CPI_new = CPI_old × 1.2
CPI_new(P1) = 1.5 × 1.2 = 1.8
CPI_new(P2) = 1 × 1.2 = 1.2
CPI_new(P3) = 2.5 × 1.2 = 3
time_new = time_old × 0.7 = 10 × 0.7 = 7 s

f = No. instr. × CPI_new / time_new (No. instr. taken from 1.3.2)
f(P1) = 13.33 × 10^9 × 1.8 / 7 = 3.43 GHz
f(P2) = 15 × 10^9 × 1.2 / 7 = 2.57 GHz
f(P3) = 12 × 10^9 × 3 / 7 = 5.14 GHz

For the problems below, use the information in the following table.

Processor    Clock rate    No. instructions    Time
P1           2 GHz         20 × 10^9           7 s
P2           1.5 GHz       30 × 10^9           10 s
P3           3 GHz         90 × 10^9           9 s

1.3.4 Find the IPC (instructions per cycle) for each processor.

Answer:
IPC = 1/CPI = No. instr. / (time × clock rate)
IPC(P1) = 20 × 10^9 / (7 × 2 GHz) = 1.43
IPC(P2) = 30 × 10^9 / (10 × 1.5 GHz) = 2
IPC(P3) = 90 × 10^9 / (9 × 3 GHz) = 3.33

1.3.5 [5] <1.4> Find the clock rate for P2 that reduces its execution time to that of P1.

Answer:
f_new = No. instr. × CPI / time_new
f_old = No. instr. × CPI / time_old
f_new / f_old = time_old / time_new
f_new = f_old × 10/7 = 1.5 GHz × 10/7 = 2.14 GHz

1.3.6 [5] <1.4> Find the number of instructions for P2 that reduces its execution time to that of P3.

Answer:
No. instr_new = (f × time_new) / CPI
No. instr_old = (f × time_old) / CPI
No. instr_new / No. instr_old = time_new / time_old
No. instr_new = No. instr_old × 9/10 = 30 × 10^9 × 9/10 = 27 × 10^9

Exercise 1.4
Consider two different implementations of the same instruction set architecture. There are four classes of instructions, A, B, C, and D. The clock rate and CPI of each implementation are given in the following table.

Processor    Clock rate    CPI class A    CPI class B    CPI class C    CPI class D
P1           1.5 GHz       1              2              3              4
P2           2 GHz         2              2              2              2

1.4.1 Given a program with 10^6 instructions divided into classes as follows: 10% class A, 20% class B, 50% class C, and 20% class D, which implementation is faster?

Answer: P2
Class A: 10^5 instr.
Class B: 2 × 10^5 instr.
Class C: 5 × 10^5 instr.
Class D: 2 × 10^5 instr.

Time = No. instr. × CPI / clock rate

P1:
Time class A = 10^5 × 1 / (1.5 × 10^9) = 0.67 × 10^-4 s
Time class B = 2.67 × 10^-4 s
Time class C = 10 × 10^-4 s
Time class D = 5.33 × 10^-4 s
Total time P1 = 18.67 × 10^-4 s

P2:
Time class A = 10^-4 s
Time class B = 2 × 10^-4 s
Time class C = 5 × 10^-4 s
Time class D = 2 × 10^-4 s
Total time P2 = 10 × 10^-4 s

1.4.2 [5] <1.4> What is the global CPI for each implementation?

Answer:
CPI = time × clock rate / No. instr.
CPI(P1) = 18.67 × 10^-4 × 1.5 × 10^9 / 10^6 = 2.8
CPI(P2) = 10 × 10^-4 × 2 × 10^9 / 10^6 = 2.0

1.4.3 [5] <1.4> Find the clock cycles required in both cases.

Answer:
Clock cycles = sum over all classes of (No. instr. in the class × CPI of the class)
clock cycles(P1) = Instr_A × CPI_A + Instr_B × CPI_B + Instr_C × CPI_C + Instr_D × CPI_D
                 = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 4 = 28 × 10^5
clock cycles(P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2 = 20 × 10^5

The following table shows the number of instructions for a program.

Arith    Store    Load    Branch    Total
500      50       100     50        700

1.4.4 [5] <1.4> Assuming that arith instructions take 1 cycle, load and store 5 cycles, and branches 2 cycles, what is the execution time of the program in a 2 GHz processor?

Answer:
CPU time = sum of (No. instr. × cycles per instr.) / clock rate
         = (500 × 1 + 50 × 5 + 100 × 5 + 50 × 2) / (2 × 10^9) = 675 × 10^-9 s = 675 ns

1.4.5 [5] <1.4> Find the CPI for the program.

Answer:
CPI = time × clock rate / No. instr.
CPI = 675 × 10^-9 × 2 × 10^9 / 700 = 1.93

1.4.6 [10] <1.4> If the number of load instructions can be reduced by one half, what is the speedup and the CPI?

Answer:
Time = (500 × 1 + 50 × 5 + 50 × 5 + 50 × 2) × 0.5 × 10^-9 = 550 ns
Speed-up = 675 ns / 550 ns = 1.23
With 50 fewer loads the program executes 650 instructions, so
CPI = 550 × 10^-9 × 2 × 10^9 / 650 = 1.69

Exercise 1.5
Consider two different implementations, P1 and P2, of the same instruction set. There are five classes of instructions (A, B, C, D, and E) in the instruction set. The clock rate and CPI of each class are given below.

      Processor    Clock rate    CPI class A    CPI class B    CPI class C    CPI class D    CPI class E
a.    P1           1.0 GHz       1              2              3              4              3
      P2           1.5 GHz       2              2              2              4              4
b.    P1           1.0 GHz       1              1              2              3              2
      P2           1.5 GHz       1              2              3              4              3

1.5.1 [5] <1.4> Assume that peak performance is defined as the fastest rate that a computer can execute any instruction sequence. What are the peak performances of P1 and P2 expressed in instructions per second?

Answer:
a. Peak performance on P1 occurs when only class A instructions are executed.
peak(P1) = 1 inst/cycle × 1 × 10^9 cycles/s = 1 × 10^9 inst/s = 1 G inst/s
peak(P2) = (1/2) inst/cycle × 1.5 × 10^9 cycles/s = 0.75 × 10^9 inst/s = 0.75 G inst/s
b. Peak performance on P1 occurs when only class A instructions are executed.
peak(P1) = 1 inst/cycle × 1 × 10^9 cycles/s = 1 × 10^9 inst/s = 1 G inst/s
peak(P2) = 1 inst/cycle × 1.5 × 10^9 cycles/s = 1.5 × 10^9 inst/s = 1.5 G inst/s

1.5.2 [10] <1.4> If the number of instructions executed in a certain program is divided equally among the classes of instructions except for class A, which occurs twice as often as each of the others, which computer is faster? How much faster is it?

Answer:
For the CPIs in row a:

Class    CPI (P1)    Freq     Freq × CPI (P1)    CPI (P2)    Freq × CPI (P2)
A        1           0.333    0.333              2           0.666
B        2           0.167    0.334              2           0.334
C        3           0.167    0.501              2           0.334
D        4           0.167    0.668              4           0.668
E        3           0.167    0.501              4           0.668
Total                         2.337                          2.67

CPU time = I × CPI / F, where I is the number of instructions and F the clock rate.
CPU time1 = 2.337 × I / 1 GHz
CPU time2 = 2.67 × I / 1.5 GHz
perf2/perf1 = CPU time1 / CPU time2 = 1.5 × 2.337 / 2.67 = 1.31 (performance = 1 / execution time)
a. P2 is 1.31 times faster than P1.
b. Same method with the CPIs in row b: CPI(P1) = 1.667 and CPI(P2) = 2.333, so perf2/perf1 = 1.5 × 1.667 / 2.333 = 1.07. P2 is 1.07 times faster than P1.

1.5.3 [10] <1.4> If the number of instructions executed in a certain program is divided equally among the classes of instructions except for class E, which occurs twice as often as each of the others, which computer is faster? How much faster is it?

Answer:
Same method as 1.5.2, with class E weighted 0.333 and every other class 0.167.
a. CPI(P1) = 2.667 and CPI(P2) = 3, so perf2/perf1 = 1.5 × 2.667 / 3 = 1.33. P2 is 1.33 times faster than P1.
b. CPI(P1) = 1.833 and CPI(P2) = 2.667, so perf2/perf1 = 1.5 × 1.833 / 2.667 = 1.03. P2 is 1.03 times faster than P1.

The table below shows instruction type breakdown for different programs. Using this data, you will be exploring the performance tradeoffs for different changes made to a MIPS processor.

No. instructions    Compute    Load    Store    Branch    Total
Program 1           1000       400     100      50        1550
Program 2           1500       300     100      100       2000

1.5.4 [5] <1.4> Assuming that computes take 1 cycle, load and store instructions take 10 cycles, and branches take 3 cycles, find the execution time on a 3 GHz MIPS processor.

Answer:

Instruction    Cycles    Program 1 instr.    Program 1 cycles    Program 2 instr.    Program 2 cycles
Compute        1         1000                1000                1500                1500
Load           10        400                 4000                300                 3000
Store          10        100                 1000                100                 1000
Branch         3         50                  150                 100                 300
Total                                        6150                                    5800

CPU time1 = total cycles (Program 1) / F = 6150 / 3 GHz = 2.05 × 10^-6 s = 2.05 μs
CPU time2 = total cycles (Program 2) / F = 5800 / 3 GHz = 1.93 μs

1.5.5 [5] <1.4> Assuming that computes take 1 cycle, load and store instructions take 2 cycles, and branches take 3 cycles, find the execution time on a 3 GHz MIPS processor.

Answer:

Instruction    Cycles    Program 1 instr.    Program 1 cycles    Program 2 instr.    Program 2 cycles
Compute        1         1000                1000                1500                1500
Load           2         400                 800                 300                 600
Store          2         100                 200                 100                 200
Branch         3         50                  150                 100                 300
Total                                        2150                                    2600

CPU time1 = total cycles (Program 1) / F = 2150 / 3 GHz = 717 × 10^-9 s = 0.72 μs
CPU time2 = total cycles (Program 2) / F = 2600 / 3 GHz = 0.87 μs
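
The cycle tables for 1.5.4 and 1.5.5 follow the same pattern: multiply each instruction count by its cycle cost, sum the products, and divide by the clock rate. The sketch below is one way to check that arithmetic; the names `programs`, `total_cycles`, and `cpu_time_seconds` are illustrative and do not come from the exercise.

```python
CLOCK_HZ = 3.0e9  # 3 GHz processor assumed in 1.5.4 and 1.5.5

# Instruction counts per program, from the instruction breakdown table.
programs = {
    "Program 1": {"compute": 1000, "load": 400, "store": 100, "branch": 50},
    "Program 2": {"compute": 1500, "load": 300, "store": 100, "branch": 100},
}

# Cycles per instruction class in the two scenarios.
cycles_1_5_4 = {"compute": 1, "load": 10, "store": 10, "branch": 3}
cycles_1_5_5 = {"compute": 1, "load": 2, "store": 2, "branch": 3}


def total_cycles(counts, cycles):
    """Sum of (instruction count x cycles per instruction) over all classes."""
    return sum(counts[kind] * cycles[kind] for kind in counts)


def cpu_time_seconds(counts, cycles, clock_hz=CLOCK_HZ):
    """Execution time = total clock cycles / clock rate."""
    return total_cycles(counts, cycles) / clock_hz


for label, cycles in (("1.5.4", cycles_1_5_4), ("1.5.5", cycles_1_5_5)):
    for name, counts in programs.items():
        n = total_cycles(counts, cycles)
        t = cpu_time_seconds(counts, cycles)
        print(f"{label} {name}: {n} cycles, {t * 1e6:.2f} us")
```

Keeping the cycle costs in a separate dictionary makes it easy to try other scenarios without touching the instruction counts.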