IIR (Introduction to Information Retrieval) 1.3 and 1.4
Processing Boolean queries: intersection by merge algorithm
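The merge (two-pointer) intersection can be sketched as below; the postings lists must be sorted by document ID, and the term names and doc IDs here are illustrative:

```python
def intersect(p1, p2):
    """Intersect two sorted postings lists by walking both in step.
    Runs in O(len(p1) + len(p2)) comparisons."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:          # doc appears in both lists
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:         # advance the pointer with the smaller ID
            i += 1
        else:
            j += 1
    return answer

# Hypothetical postings for the query "brutus AND calpurnia":
brutus = [1, 2, 4, 11, 31, 45, 173, 174]
calpurnia = [2, 31, 54, 101]
print(intersect(brutus, calpurnia))  # [2, 31]
```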
Proximity operator (terms must occur close to each other in the document)
Chapter 6
Scoring zones and learning weights (machine-learned relevance; the zone weight g is learned from training examples by minimizing the total error on the training set)
Term frequency and weight: tf-idf weighting; tf: term frequency; idf: inverse document frequency (a rare term gets a high idf)
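A minimal sketch of tf-idf, assuming the common idf form log10(N/df); the collection size and document frequencies are made-up numbers:

```python
import math

def tf_idf(tf, df, N):
    """tf-idf weight: raw term frequency times inverse document frequency.
    idf = log10(N / df), so rare terms (small df) get a high weight."""
    return tf * math.log10(N / df)

N = 1000  # hypothetical collection of 1000 documents
print(tf_idf(tf=3, df=10, N=N))    # rare term: idf = 2.0, weight 6.0
print(tf_idf(tf=3, df=1000, N=N))  # term in every doc: idf = 0.0, weight 0.0
```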
-->Vector space model for scoring: documents as vectors of term weights; score by dot product (cosine similarity) -->queries as vectors too, score(q, d) = V(q).V(d)
Variant tf-idf functions (alternatives): sublinear tf scaling and maximum tf normalization; SMART scheme notation ddd.qqq, e.g. lnc.ltc: log-weighted tf, no idf, and cosine normalization (document); log-weighted tf, idf, and cosine normalization (query)
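The lnc.ltc scoring above can be sketched as follows; the collection size, document frequencies, and term counts are all invented for illustration:

```python
import math

def log_tf(tf):
    """Sublinear tf scaling: 1 + log10(tf), or 0 when the term is absent."""
    return 1 + math.log10(tf) if tf > 0 else 0.0

def cosine_normalize(weights):
    """Divide every weight by the vector's Euclidean length."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else weights

def lnc_ltc_score(query_tf, doc_tf, df, N):
    """SMART lnc.ltc: document = log tf, no idf, cosine norm;
    query = log tf, idf, cosine norm; score is the dot product."""
    q = cosine_normalize({t: log_tf(tf) * math.log10(N / df[t])
                          for t, tf in query_tf.items()})
    d = cosine_normalize({t: log_tf(tf) for t, tf in doc_tf.items()})
    return sum(w * d.get(t, 0.0) for t, w in q.items())

# Toy example with made-up statistics:
N = 1_000_000
df = {"best": 50_000, "car": 10_000, "insurance": 1_000}
query = {"best": 1, "car": 1, "insurance": 1}
doc = {"car": 1, "insurance": 2}
print(lnc_ltc_score(query, doc, df, N))
```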
Pivoted normalized document length (makes the dot-product score account for the effect of document length on relevance): normalize each document vector not by its Euclidean length but by a factor that is larger than the Euclidean length for documents shorter than the pivot length lp, and smaller for longer documents.
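A minimal sketch of this idea, assuming the common linear pivoted form (1 - slope) * pivot + slope * length; the pivot and slope values here are arbitrary:

```python
def pivoted_norm(euclidean_len, pivot, slope=0.75):
    """Pivoted length normalization: a linear function of the Euclidean
    length that is larger than it below the pivot, smaller above it."""
    return (1 - slope) * pivot + slope * euclidean_len

pivot = 10.0
print(pivoted_norm(5.0, pivot))   # short doc: 6.25, larger than 5.0
print(pivoted_norm(20.0, pivot))  # long doc: 17.5, smaller than 20.0
```

Dividing weights by this factor instead of the Euclidean length boosts long documents (smaller divisor) and penalizes short ones, correcting cosine normalization's bias against long documents.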