Hadoop is an open-source software framework for the distributed storage and processing of large data sets. Its distributed file system, HDFS, provides high data transfer rates between nodes and allows the system to keep operating when a node fails.
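
The fault tolerance described above comes from keeping several copies of each data block on different nodes. The following is a toy sketch of that idea in plain Python, not Hadoop's actual API or placement policy: the function names `place_blocks` and `readable_blocks`, the node names, and the random placement are all assumptions made for illustration. The replication factor of 3 matches the HDFS default.

```python
import random

REPLICATION_FACTOR = 3  # HDFS's default; used here only as an illustrative constant


def place_blocks(num_blocks, nodes, replication=REPLICATION_FACTOR, seed=0):
    """Assign each block to `replication` distinct nodes.

    Placement is random here; real HDFS uses a rack-aware policy.
    """
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in range(num_blocks)}


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return {block for block, replicas in placement.items() if replicas - failed_nodes}


# Hypothetical five-node cluster storing eight blocks.
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(num_blocks=8, nodes=nodes)

# With three replicas per block, losing any single node leaves every block readable.
for failed in nodes:
    assert len(readable_blocks(placement, {failed})) == 8
```

In this sketch a block becomes unreadable only if all of its replicas sit on failed nodes, which is why a single node failure does not interrupt access to the data.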