1. Consider two cloud service systems: Google File System and Amazon S3. Explain how they achieve their design goals of securing data integrity and maintaining data consistency in the face of hardware failures, especially concurrent hardware failures.
Amazon S3: Amazon Simple Storage Service (Amazon S3) is storage for the Internet, designed to make web-scale computing easier for developers. It has a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web. Amazon S3 replicates each object across all Availability Zones within the respective region. Replication can provide data and service availability in the case of system failure, but it provides no protection against accidental deletion or data integrity compromise, because changes are replicated to every Availability Zone where copies are stored. Amazon S3 offers standard redundancy and reduced redundancy options, which have different durability objectives and price points.
2. Suggest two hardware mechanisms and software schemes to secure the application cloud (SaaS), the infrastructure cloud (IaaS), and the platform cloud (PaaS). Discuss their specific requirements and the difficulties and limitations that may be encountered.
3. Draw a layered diagram to relate the construction of IaaS, PaaS, and SaaS clouds from bare machine hardware to the user’s applications. Briefly list the representative cloud service offerings at each cloud layer from the major cloud providers that you know of.
4. Discuss the enabling technologies for building the cloud platforms from virtualized and automated data centers to provide IaaS, PaaS, or SaaS services. Identify hardware, software, and networking mechanisms or business models that enable multitenant services.
5. Consider a program for multiplying two large-scale $N \times N$ matrices, where $N$ is the matrix size. The sequential multiply time on a single server is $T_1 = cN^3$ minutes, where $c$ is a constant determined by the server used. An MPI-coded parallel program requires $T_n = cN^3/n + dN^2/n^{0.5}$ minutes to complete execution on an $n$-server cluster system, where $d$ is a constant determined by the MPI version used. Assume the program has a zero sequential bottleneck ($\alpha = 0$). The second term in $T_n$ accounts for the total message-passing overhead experienced by the $n$ servers. Answer the following questions for a given cluster configuration with $n = 64$ servers, $c = 0.8$, and $d = 0.1$. Parts (a, b) have a fixed workload corresponding to the matrix size $N = 15{,}000$. Parts (c, d) have a scaled workload associated with an enlarged matrix size $N' = n^{1/3} N = 64^{1/3} \times 15{,}000 = 4 \times 15{,}000 = 60{,}000$. Assume the same cluster configuration is used to process both workloads, so the system parameters $n$, $c$, and $d$ stay unchanged. When running the scaled workload, the overhead also increases with the enlarged matrix size $N'$.
– a. Using Amdahl’s law, calculate the speedup of the $n$-server cluster over a single server.
– b. What is the efficiency of the cluster system used in Part (a)?
– c. Calculate the speedup in executing the scaled workload for an enlarged $N' \times N'$ matrix on the same cluster configuration using Gustafson’s law.
– d. Calculate the efficiency of running the scaled workload in Part (c) on the 64-processor cluster.
– e. Compare the above speedup and efficiency results and comment on their implications.
6. Briefly explain the following terms associated with network threats or security defense in a distributed computing system:
– a. Denial of service (DoS)
– b. Trojan horse
– c. Network worm
– d. Service spoofing
– e. Authorization
– f. Authentication
– g. Data integrity
– h. Confidentiality
7. Answer the following questions regarding PC and HPC systems:
– a. Explain why PCs and HPCs were evolutionary rather than revolutionary in the past 30 years.
– b. Discuss the drawbacks of disruptive changes in processor architecture. Why is the memory wall a major problem in achieving scalable changes in performance?
– c. Explain why x86 processors are still dominating the PC and HPC markets.
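The timing model in Problem 5 can be checked numerically. The following is a minimal sketch, not a model answer: it assumes that speedup is taken as the ratio of the sequential time to the parallel time (T1/Tn) for whichever matrix size is being run, and that efficiency is speedup divided by n. The function names are illustrative and do not come from the problem statement.

```python
import math


def sequential_time(N, c):
    """T1 = c * N^3 (minutes) on a single server."""
    return c * N**3


def parallel_time(N, n, c, d):
    """Tn = c * N^3 / n + d * N^2 / sqrt(n) (minutes) on n servers."""
    return c * N**3 / n + d * N**2 / math.sqrt(n)


def speedup(N, n, c, d):
    """Assumed definition: ratio of sequential time to parallel time."""
    return sequential_time(N, c) / parallel_time(N, n, c, d)


def efficiency(N, n, c, d):
    """Assumed definition: speedup divided by the number of servers."""
    return speedup(N, n, c, d) / n


# Cluster parameters given in the problem statement.
n, c, d = 64, 0.8, 0.1
N_fixed = 15_000
# Scaled matrix size N' = n^(1/3) * N; rounded to absorb floating-point error.
N_scaled = round(n ** (1 / 3) * N_fixed)

if __name__ == "__main__":
    for label, N in (("fixed", N_fixed), ("scaled", N_scaled)):
        print(f"{label}: N = {N}, "
              f"speedup = {speedup(N, n, c, d):.4f}, "
              f"efficiency = {efficiency(N, n, c, d):.6f}")
```

Because the overhead term grows as N^2 while the computation term grows as N^3, the ratio of overhead to useful work shrinks as the matrix grows, which is the effect Part (e) asks you to comment on.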
8. Compare GPU and CPU chips in terms of their strengths and weaknesses. In particular, discuss the trade-offs between power efficiency, programmability, and performance. Also compare various MPP architectures in processor selection, performance target, efficiency, and packaging constraints.
9. Briefly answer the following questions regarding green information technology and energy efficiency in distributed systems:
– a. Why is power consumption critical to data-center operations?
– b. What constitutes the dynamic voltage frequency scaling (DVFS) technique?
– c. Conduct in-depth research on recent progress in green IT research, and write a report on its applications to data-center design and cloud service applications.
10. Compare three distributed operating systems: Amoeba, DCE, and MOSIX. Research their recent developments and their impact on applications in clusters, grids, and clouds. Discuss the suitability of each system in its commercial or experimental distributed applications. Also discuss each system’s limitations and explain why they were not successful as commercial systems.