A Hadoop-like Cloud Computing System

MapReduce (credit https://blog.sqlauthority.com/)

This is a course project for CS 425 Distributed Systems. You may use it for reference only if you are also implementing the CS 425 course project.

In this project, we used Go (Golang) to build a Hadoop-like parallel computing system called MapleJuice, which runs with one master and X workers.

Each node maintains a full membership list that is kept up to date by a gossip protocol (MP1 Instruction, MP1 Report, MP1 Code). Node failures, joins, and leaves are reflected in every membership list within 6 seconds.

Next, we implemented a simple distributed file system (SDFS), a simplified version of HDFS (Hadoop Distributed File System), in which users can put, get, and delete files (MP2 Instruction, MP2 Report, MP2 Code). The system tolerates up to three machine failures and efficiently re-replicates data after failures. File uploads and downloads are carried over HTTP.
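Surviving any three machine failures means every file must be stored on at least four machines. The placement scheme is not described here, so the following is a minimal sketch under that assumption: it hashes node IDs and file names onto a ring (FNV-1a, chosen only for illustration) and stores each file on the next four alive nodes clockwise from the file's position. `ReplicasFor`, `ringPos`, and `replicaCount` are hypothetical names, not the project's API.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// replicaCount: tolerating up to three failures requires at least four copies.
const replicaCount = 4

// ringPos hashes a string onto a 32-bit ring.
func ringPos(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s)) // Write on a hash.Hash never returns an error
	return h.Sum32()
}

// ReplicasFor returns the first replicaCount alive nodes clockwise from the
// file's ring position. If fewer nodes are alive, all of them are returned.
func ReplicasFor(file string, alive []string) []string {
	nodes := append([]string(nil), alive...)
	sort.Slice(nodes, func(i, j int) bool { return ringPos(nodes[i]) < ringPos(nodes[j]) })

	// First node whose position is at or after the file's position.
	start := sort.Search(len(nodes), func(i int) bool { return ringPos(nodes[i]) >= ringPos(file) })
	n := replicaCount
	if len(nodes) < n {
		n = len(nodes)
	}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, nodes[(start+i)%len(nodes)]) // wrap around the ring
	}
	return out
}

func main() {
	before := []string{"vm01", "vm02", "vm03", "vm04", "vm05", "vm06"}
	oldSet := ReplicasFor("data.txt", before)

	// Suppose the first replica fails; recompute placement over the survivors.
	var after []string
	for _, n := range before {
		if n != oldSet[0] {
			after = append(after, n)
		}
	}
	newSet := ReplicasFor("data.txt", after)

	// Nodes in the new set that were not replicas before need a copy.
	had := map[string]bool{}
	for _, n := range oldSet {
		had[n] = true
	}
	for _, n := range newSet {
		if !had[n] {
			fmt.Println("re-replicate data.txt to", n)
		}
	}
}
```

With this kind of successor placement, losing one replica shifts the window by a single node, so re-replication only has to copy the file to one new machine from any surviving replica rather than reshuffling the whole cluster.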

Finally, we built MapleJuice, a new parallel cloud computing framework that is a simpler version of MapReduce/Hadoop (MP3 Instruction, MP3 Report, MP3 Code). We built two applications and used them to compare the performance of MapleJuice and MapReduce; in our experiments, MapleJuice outperformed the original MapReduce.
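Assuming MapleJuice follows the MapReduce model, with Maple playing the role of map and Juice the role of reduce, a job emits intermediate key/value pairs in the Maple phase, partitions them by key across Juice tasks, and aggregates each key in the Juice phase. The sketch below runs both phases sequentially in one process for a word-count job; word count is an illustrative choice (the two benchmark applications are not named here), and `maple`, `juice`, `partition`, and `run` are hypothetical names. In the real system the master would assign these tasks to workers.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// KV is an intermediate key/value pair produced by a Maple task.
type KV struct {
	Key   string
	Value int
}

// maple is the per-record "map" function: emit (word, 1) for every word.
func maple(line string) []KV {
	var out []KV
	for _, w := range strings.Fields(strings.ToLower(line)) {
		out = append(out, KV{Key: w, Value: 1})
	}
	return out
}

// partition assigns a key to one of numJuices Juice tasks by hashing, so all
// values for the same key end up at the same task.
func partition(key string, numJuices int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(numJuices))
}

// juice is the per-key "reduce" function: sum the values for one key.
func juice(values []int) int {
	total := 0
	for _, v := range values {
		total += v
	}
	return total
}

// run executes both phases sequentially in one process.
func run(lines []string, numJuices int) map[string]int {
	// Maple phase: group intermediate values by Juice task, then by key.
	buckets := make([]map[string][]int, numJuices)
	for i := range buckets {
		buckets[i] = map[string][]int{}
	}
	for _, line := range lines {
		for _, kv := range maple(line) {
			b := buckets[partition(kv.Key, numJuices)]
			b[kv.Key] = append(b[kv.Key], kv.Value)
		}
	}
	// Juice phase: buckets are independent, so they could run in parallel.
	result := map[string]int{}
	for _, b := range buckets {
		for k, vs := range b {
			result[k] = juice(vs)
		}
	}
	return result
}

func main() {
	counts := run([]string{"the quick fox", "The lazy dog", "the end"}, 3)
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, counts[k])
	}
}
```

Hash partitioning is what lets the Juice tasks run without coordinating with each other: because every value for a given key lands in exactly one bucket, each task can produce its final counts independently.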

We also built an interface that displays the state of the system, shown in the following picture.

Interface

Through this project, we learned the basic theory and algorithms of distributed systems, and we implemented and debugged a complex system ourselves.

Beitong Tian
Ph.D. Student in Computer Science

My research interests include wireless sensing networks, mobile computing, and machine learning.