1 minute read

MapReduce

Definition

  • MapReduce is indeed a data processing model designed for handling large-scale data sets. It emerged as a response to the challenges of processing increasingly larger datasets that could not be managed effectively through traditional vertical scaling methods.
  • In the early 2000s, Google engineers recognized the limitations of existing data processing methods due to the rapidly growing size of datasets. This realization led to the development of a new approach.
  • Jeffrey Dean and Sanjay Ghemawat authored a seminal white paper in 2004 titled “MapReduce: Simplified Data Processing on Large Clusters.”
  • This paper introduced the MapReduce model to the world and detailed how it could efficiently process large amounts of data using a distributed and parallel architecture. The paper became a foundational text in the field of big data processing and inspired numerous implementations and adaptations in various big data technologies.
  • A MapReduce job is comprised of 3 main steps:
    • Map: The map function accesses various chunks of the dataset in a distributed system, transforming them into intermediate key-value pairs.
    • Shuffle: Reorganizes the intermediate key-value pairs such that pairs of the same key are routed to the same machine in the final step.
    • Reduce: The reduce function on the newly shuffled key-value pairs and transforms them into more meaningful data.

Distributed File System (DFS)

  • A DFS is an abstraction over a (usually large) cluster of machines that allows them to act like one large file system.
  • A DFS guarantees that Distributed storage, Replication and High Availability, Scalability, Transparency, Reliability and Fault tolerance.
  • 2 Famous DFS are the Google File System and Hadoop.

Hadoop

  • A popular, open-source framework that supports MapReduce jobs and many other kinds of data-processing pipelines. Its central component is HDFS (Hadoop Distributed File System), on top of which other technologies have been developed.

Leave a comment