Percolator

Notes
- Bigtable doesn't support multi-row or multi-table transactions.
- Why does Google need multi-table transactions? Example: removing duplicate documents (multiple URLs may lead to the same page). If duplicates are not handled consistently, calculations such as PageRank are affected.
- Percolator was built on top of Bigtable rather than by modifying it, because the team was small and did not have source-code access to Bigtable.

Locks
- Locks in Percolator could have been implemented in two ways:
  - In place (in the database). The problem with this is that you can't maintain complex locks with queues, or use techniques like wound-wait, wait-die, etc....
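A minimal sketch of the in-place option, assuming a hypothetical Bigtable-like `Table` that offers only an atomic check-and-set on a single cell and no multi-row transactions. `Table`, `check_and_set`, `try_lock`, and `unlock` are made-up names for illustration, not Bigtable's or Percolator's API. The lock is just a `"lock"` column in the same row as the data, and it stores only the holder's transaction ID. A conflicting transaction therefore learns only that the row is locked: there is no waiter queue and no transaction ordering to consult, which is the limitation noted above.

```python
import threading
from typing import Dict, Optional, Tuple


class Table:
    """Hypothetical Bigtable-like table: cells addressed by (row, column),
    with atomic check-and-set on a single cell and no multi-row transactions."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[str, str], str] = {}
        # One global mutex: stronger than, but stands in for, per-row atomicity.
        self._mutex = threading.Lock()

    def check_and_set(self, row: str, column: str,
                      expected: Optional[str], new: Optional[str]) -> bool:
        # Write `new` only if the cell currently holds `expected` (None = empty).
        with self._mutex:
            if self._cells.get((row, column)) != expected:
                return False
            if new is None:
                self._cells.pop((row, column), None)
            else:
                self._cells[(row, column)] = new
            return True

    def get(self, row: str, column: str) -> Optional[str]:
        with self._mutex:
            return self._cells.get((row, column))


def try_lock(table: Table, row: str, txn_id: str) -> bool:
    # In-place lock: the lock is just another column in the data row.
    # Only the holder's ID is stored, so there is no waiter queue and no
    # ordering information for schemes like wound-wait or wait-die.
    return table.check_and_set(row, "lock", None, txn_id)


def unlock(table: Table, row: str, txn_id: str) -> bool:
    # Release only if this transaction still holds the lock.
    return table.check_and_set(row, "lock", txn_id, None)


if __name__ == "__main__":
    t = Table()
    print(try_lock(t, "url:example", "txn1"))  # True: row was unlocked
    print(try_lock(t, "url:example", "txn2"))  # False: txn2 must abort or retry
    print(unlock(t, "url:example", "txn1"))    # True
    print(try_lock(t, "url:example", "txn2"))  # True
```

Because the failed `try_lock` leaves no trace, the losing transaction can only back off and retry (or abort); any fairer policy would need extra state that this single column does not hold.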

MapReduce

TIL: POSIX

Reasons for MapReduce (why use a distributed system?)
- Lots of data (about 1 petabyte), while machines had 160 GB of storage, so the data can't be processed on one machine
- I/O speed is very low, so a single machine would be too slow (performance)
- Fault tolerance (tolerate machine and disk failures)
- Application programmers don't need to work on systems code or make sure their job runs in a distributed fashion

Workflow: Input → Map → Reduce → Result
key, value → K1 <v1, v1', v1''> → K1 R(v1, v1', …)
K2, v2
K3, v3

Examples
- Word count
- Sort
- Reverse links (used for page ranking)
  - Input: key = URL, value = HTML
  - Map: (<target1, src1>, <target2, src2>, …....
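The workflow above can be sketched as a single-process Python function. This only shows the data flow (map, group by intermediate key, reduce), not the distribution, partitioning, or fault tolerance that motivate MapReduce. `map_reduce`, `wc_map`, `wc_reduce`, `rl_map`, and `rl_reduce` are hypothetical names. For reverse links, the regex-based link extraction is a toy stand-in for a real HTML parser, and the reduce step (collecting each target's sources into a sorted list) is an assumption of this sketch.

```python
import re
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    inputs: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterator[Tuple[str, Any]]],
    reducer: Callable[[str, List[Any]], Any],
) -> Dict[str, Any]:
    # Map: each input (key, value) pair yields zero or more intermediate pairs.
    groups: Dict[str, List[Any]] = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in mapper(key, value):
            groups[k2].append(v2)  # group by intermediate key: K1 -> <v1, v1', ...>
    # Reduce: one call per intermediate key, visited in sorted key order.
    return {k2: reducer(k2, groups[k2]) for k2 in sorted(groups)}


# Word count: map emits (word, 1); reduce sums the counts.
def wc_map(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    for word in text.split():
        yield word, 1


def wc_reduce(word: str, counts: List[int]) -> int:
    return sum(counts)


# Reverse links: map emits (target, source) for each link found in the page.
LINK_RE = re.compile(r'href="([^"]+)"')  # toy link extractor, not a real parser


def rl_map(url: str, html: str) -> Iterator[Tuple[str, str]]:
    for target in LINK_RE.findall(html):
        yield target, url


def rl_reduce(target: str, sources: List[str]) -> List[str]:
    return sorted(sources)


if __name__ == "__main__":
    docs = [("d1", "the cat sat"), ("d2", "the dog")]
    print(map_reduce(docs, wc_map, wc_reduce))
    # {'cat': 1, 'dog': 1, 'sat': 1, 'the': 2}
    pages = [
        ("a.com", '<a href="c.com">x</a>'),
        ("b.com", '<a href="c.com">y</a> <a href="a.com">z</a>'),
    ]
    print(map_reduce(pages, rl_map, rl_reduce))
    # {'a.com': ['b.com'], 'c.com': ['a.com', 'b.com']}
```

The framework function never changes between jobs; only the `mapper` and `reducer` differ, which is the point of the last "reason" above: application programmers write those two functions and leave distribution to the system.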