Friday, February 7, 2014

Unit 5 Reading Note (2/10)

IIR Chapter 11

Binary Independence Model (BIM) assumption: the presence or absence of a word in a document is independent of the presence or absence of any other word (given the query); terms not occurring in the query are equally likely to occur in relevant and nonrelevant documents: that is, if q_t = 0 then p_t = u_t.

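Retrieval Status Value (RSV) estimates: u_t ≈ df_t/N (nonrelevant documents approximated by the whole collection); p_t is re-estimated from a fixed-size set V of the highest-ranked documents.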
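
A rough sketch of how the RSV could be computed under the BIM assumptions above. It assumes the usual log odds ratio weight c_t = log[p_t(1 - u_t) / (u_t(1 - p_t))], approximates u_t by df_t/N, and uses a constant p_t = 0.5 as a starting guess. The function names and the constant are my own choices for illustration, not something fixed by the chapter.

```python
import math

def bim_term_weight(p_t, u_t):
    """Log odds ratio c_t = log[p_t(1 - u_t) / (u_t(1 - p_t))]."""
    return math.log(p_t * (1 - u_t) / (u_t * (1 - p_t)))

def rsv(query_terms, doc_terms, df, n_docs, p_t=0.5):
    """Sum c_t over query terms that occur in the document.

    u_t is approximated by df_t / N; p_t = 0.5 is an assumed initial guess.
    Terms with df_t = 0 or df_t = N are skipped to avoid log(0) or division by zero.
    """
    score = 0.0
    for t in set(query_terms):
        df_t = df.get(t, 0)
        if t in doc_terms and 0 < df_t < n_docs:
            u_t = df_t / n_docs
            score += bim_term_weight(p_t, u_t)
    return score
```

With p_t = 0.5 the weight reduces to log((N - df_t)/df_t), so rare terms contribute more to the score than common ones.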

Chapter 12

Language Model (LM): query likelihood model: P(d|q) ∝ P(d) ∏_{t∈q} ((1 − λ)P(t|Mc) + λP(t|Md))
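
To make the query likelihood formula concrete, here is a minimal sketch that scores a document by the log of the product over query terms of (1 − λ)P(t|Mc) + λP(t|Md). It assumes a uniform document prior P(d), maximum likelihood estimates from raw token counts, and λ = 0.5; the function name and the token-list inputs are hypothetical.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection, lam=0.5):
    """log of prod_t ((1 - lam) * P(t|Mc) + lam * P(t|Md)).

    query, doc, collection are lists of tokens; the collection list is
    assumed to hold all tokens of all documents. P(d) is taken as uniform.
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for t in query:
        p_doc = doc_tf[t] / len(doc)            # P(t|Md)
        p_coll = coll_tf[t] / len(collection)   # P(t|Mc)
        p = (1 - lam) * p_coll + lam * p_doc
        if p == 0.0:
            # Term unseen in the whole collection: likelihood is zero.
            return float("-inf")
        score += math.log(p)
    return score
```

Working in log space avoids underflow on long queries; the ranking is the same as with the raw product.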

Unlike BIM, the LM approach does away with explicitly modeling relevance.

Combining a query language model with a document language model: compare the two with Kullback-Leibler (KL) divergence.
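
A small sketch of the KL divergence comparison, assuming the standard definition sum over t of P(t|Mq) log(P(t|Mq)/P(t|Md)). Both models are passed as plain dicts from term to probability, and the document model is assumed to be smoothed so that every query-model term has nonzero probability. The function name is my own.

```python
import math

def kl_divergence(p_query, p_doc):
    """sum_t P(t|Mq) * log(P(t|Mq) / P(t|Md)).

    Terms with P(t|Mq) = 0 contribute nothing. p_doc is assumed smoothed,
    i.e. p_doc[t] > 0 for every term with P(t|Mq) > 0.
    """
    total = 0.0
    for t, pq in p_query.items():
        if pq > 0:
            total += pq * math.log(pq / p_doc[t])
    return total
```

A lower divergence means the document model is closer to the query model, so documents would be ranked by increasing divergence.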

Translation Model: P(q|Md) = ∏_{t∈q} ∑_{v∈V} P(v|Md) T(t|v)
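
The translation model formula can be sketched as a product over query terms of a sum over vocabulary terms v. The nested-dict layout trans[v][t] = T(t|v) and the function name are assumptions made for this illustration only.

```python
def translation_query_likelihood(query, p_doc, trans):
    """P(q|Md) = prod over t in q of sum over v of P(v|Md) * T(t|v).

    p_doc maps each vocabulary term v to P(v|Md).
    trans[v][t] holds T(t|v); missing entries are treated as 0.
    """
    likelihood = 1.0
    for t in query:
        likelihood *= sum(
            p_v * trans.get(v, {}).get(t, 0.0) for v, p_v in p_doc.items()
        )
    return likelihood
```

A query term that never appears in the document can still get nonzero probability if some document term v translates to it, which is the point of the model.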

Relating the New Language Models of Information Retrieval to the Traditional Retrieval Models

LM shares some characteristics with VS (vector space) and BIM: it gives a justification for using tf.idf weights and a new relevance weighting method (terms can be assigned a zero relevance weight; two steps are repeated until the relevance weights no longer change).

Extended Boolean retrieval: the probability of the disjunction of m possible translations; easy to add term/collection frequencies by OR; "grouping" by OR; disjunctions combined into conjunctive normal form.

According to the paper, LM outperforms the VS, BIM and Boolean models.
