Friday, March 28, 2014

Unit 11 Reading Note (3/31)

IES Chapter 14: Parallel Information Retrieval

Document partitioning: the collection is divided into non-overlapping subsets of documents, and each node builds and searches an index over its own subset. Every query is sent to all nodes, and the per-node results are merged into one ranking. But it doesn't perform well if the index is stored on disk, because every node has to do a disk seek for every query term.
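To make the idea concrete for myself, here is a minimal sketch of document partitioning in Python. The toy documents, the round-robin assignment, and the scoring (a plain sum of term frequencies) are my own assumptions for illustration, not something taken from the chapter.

```python
from collections import defaultdict

def build_index(docs):
    """Build a tiny inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def partition_documents(docs, num_nodes):
    """Assign each document to exactly one node (round-robin over sorted ids)."""
    shards = [dict() for _ in range(num_nodes)]
    for i, doc_id in enumerate(sorted(docs)):
        shards[i % num_nodes][doc_id] = docs[doc_id]
    return shards

def search(node_indexes, query, k=3):
    """Send the query to every node, then merge the per-node scores."""
    merged = []
    for index in node_indexes:
        scores = defaultdict(int)
        for term in query.lower().split():
            # Every node looks up every query term in its own index.
            for doc_id, tf in index.get(term, {}).items():
                scores[doc_id] += tf
        merged.extend(scores.items())
    # Highest score first; ties broken by document id.
    merged.sort(key=lambda item: (-item[1], item[0]))
    return merged[:k]

# Hypothetical toy collection.
docs = {
    1: "parallel information retrieval",
    2: "information retrieval on disk",
    3: "parallel map reduce",
    4: "disk seek cost",
}
node_indexes = [build_index(shard) for shard in partition_documents(docs, 2)]
results = search(node_indexes, "parallel retrieval")
```

Note that the inner loop runs once per node per query term, which is where the extra disk seeks come from when the index does not fit in memory.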

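Term partitioning addresses the disk seek problem by splitting the index into sets of terms instead of sets of documents. Each node holds the complete postings lists for its own terms, so a query term only has to be looked up on one node.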
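A rough sketch of term partitioning, again in Python. The small hand-written index, the use of `zlib.crc32` to pick a node, and the term-frequency scoring are assumptions I made for the example; a real system would pick its own assignment and ranking functions.

```python
import zlib
from collections import defaultdict

def node_for(term, num_nodes):
    """Pick the single node responsible for a term (deterministic hash)."""
    return zlib.crc32(term.encode("utf-8")) % num_nodes

def partition_terms(index, num_nodes):
    """Give each term's complete postings list to exactly one node."""
    nodes = [dict() for _ in range(num_nodes)]
    for term, postings in index.items():
        nodes[node_for(term, num_nodes)][term] = postings
    return nodes

def search(nodes, query):
    """Look up each query term only on the node that owns it."""
    scores = defaultdict(int)
    contacted = set()
    for term in query.lower().split():
        n = node_for(term, len(nodes))
        contacted.add(n)
        for doc_id, tf in nodes[n].get(term, {}).items():
            scores[doc_id] += tf
    return dict(scores), contacted

# Hypothetical full index: term -> {doc_id: term frequency}.
full_index = {
    "parallel": {1: 1, 3: 1},
    "information": {1: 1, 2: 1},
    "retrieval": {1: 1, 2: 1},
    "disk": {2: 1, 4: 1},
}
nodes = partition_terms(full_index, 3)
scores, contacted = search(nodes, "parallel retrieval")
```

With a two-term query, at most two nodes are contacted no matter how many nodes there are, which is the saving in disk seeks compared with asking every node about every term.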

MapReduce jobs are highly parallelizable, because both the map phase and the reduce phase can be executed in parallel on many different machines.
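Here is a small single-machine imitation of a MapReduce job that builds postings lists, just to show why both phases parallelize. The function names (`map_fn`, `shuffle`, `reduce_fn`, `run_job`) and the toy records are my own, and the thread pool is only a stand-in for separate machines; this is not how a real MapReduce framework is implemented.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_fn(record):
    """Map: one input document -> list of (term, doc_id) pairs."""
    doc_id, text = record
    return [(term, doc_id) for term in text.lower().split()]

def shuffle(mapped):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_fn(item):
    """Reduce: one key and its values -> (term, sorted unique doc ids)."""
    term, doc_ids = item
    return term, sorted(set(doc_ids))

def run_job(records, workers=2):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each map call depends on one record only, so calls can run in parallel.
        mapped = list(pool.map(map_fn, records))
        grouped = shuffle(mapped)
        # Each reduce call depends on one key only, so calls can run in parallel.
        reduced = list(pool.map(reduce_fn, grouped.items()))
    return dict(reduced)

# Hypothetical toy input.
records = [
    (1, "parallel information retrieval"),
    (2, "information retrieval on disk"),
    (3, "parallel map reduce"),
]
postings = run_job(records)
```

No map call needs to see another document and no reduce call needs to see another key; only the shuffle step in between has to bring data together.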
