Precision and Recall

March 17, 2009

This month’s issue of IEEE Computer is a special issue featuring information seeking support systems, edited by Gary Marchionini and Ryen White. Their introduction is freely available online; unfortunately, the articles themselves, though also online, are free only to IEEE Xplore subscribers.

What I can share is a 500-word sidebar I wrote that appears on p. 39, in an article by Peter Pirolli entitled “Powers of 10: Modeling Complex Information-Seeking Systems at Multiple Scales”.


Precision and Recall

Information retrieval (IR) research today emphasizes precision at the expense of recall. Precision is the number of relevant documents a search retrieves divided by the total number of documents retrieved, while recall is the number of relevant documents retrieved divided by the total number of existing relevant documents that should have been retrieved.
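Both definitions reduce to counting the overlap between two sets. The Python sketch below is a minimal illustration that assumes documents are identified by string IDs (the IDs are hypothetical). Returning 0.0 when a set is empty is a convention adopted here, since the ratio is otherwise undefined.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision and recall.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    Empty denominators yield 0.0 by convention (an assumption of this sketch).
    """
    hits = len(retrieved & relevant)  # relevant documents the search actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the search returns 4 documents, 3 of which are relevant,
# while 6 relevant documents exist in the collection.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d3", "d7", "d8", "d9"}
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.75 0.5
```

The same result set is judged quite differently by the two measures: most of what the user sees is relevant, but half of what the user needs is missing.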

These measures were originally intended for set retrieval, but most current research assumes a ranked retrieval model, in which the search returns results in order of their estimated likelihood of relevance to a search query. Popular measures like mean average precision (MAP) and normalized discounted cumulative gain (NDCG) [1] mostly reflect precision for the highest-ranked results.
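To see why a rank-based measure can mostly reflect precision at the top, consider NDCG computed at a cutoff k. The sketch below is written under stated assumptions. It uses the widely used log2(rank + 1) discount, a common variant rather than the exact formulation in [1]. It also assumes binary gains, and the collection sizes are hypothetical. It covers NDCG only, not MAP.

```python
import math


def dcg_at_k(gains: list[float], k: int) -> float:
    """DCG over the top k results, using the common log2(rank + 1) discount."""
    # enumerate is 0-based, so 1-based rank i is discounted by log2(i + 1).
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_gains: list[float], judged_gains: list[float], k: int) -> float:
    """System DCG@k divided by the DCG@k of an ideal ordering of all judged documents."""
    ideal = dcg_at_k(sorted(judged_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


def recall_at_k(ranked_gains: list[float], num_relevant: int, k: int) -> float:
    """Fraction of all relevant documents found in the top k (gain > 0 counts as relevant)."""
    found = sum(1 for g in ranked_gains[:k] if g > 0)
    return found / num_relevant if num_relevant else 0.0


# Hypothetical collection: 50 relevant documents, each with binary gain 1.0.
# The system ranks 10 of them at the top and never finds the other 40.
judged = [1.0] * 50
ranking = [1.0] * 10 + [0.0] * 90

print(ndcg_at_k(ranking, judged, k=10))  # 1.0
print(recall_at_k(ranking, 50, k=10))    # 0.2
```

The ranking earns a perfect NDCG@10 even though it surfaces only a fifth of the relevant material, which is exactly the blind spot the sidebar is concerned with.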

For the most difficult and valuable information-seeking problems, however, recall is at least as important as precision. In particular, for tasks that involve exploration or progressive elaboration of the user’s needs, a user’s progress depends on understanding the breadth and organization of the available content related to those needs. Techniques designed for interactive retrieval, particularly those that support iterative query refinement, rely on communicating the properties of large document sets to the user, and thus benefit from a retrieval approach with high recall [2].

The extreme case for the importance of recall is the problem of information availability, where the seeker is uncertain whether the information of interest is available at all. Instances of this problem include some of the highest-value information tasks, such as those faced by national security, legal, and patent professionals, who might spend hours or days searching to determine whether the desired information exists.

The IR community would do well to develop benchmarks for systems that treat recall as at least as important as precision. Perhaps researchers should revive set retrieval models and measures, such as the F1 score, which is the harmonic mean of precision and recall.
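The harmonic mean is what makes F1 suitable here: it stays low unless both precision and recall are reasonably high. The following minimal sketch also includes the general F-beta form, where beta > 1 weights recall more heavily. F-beta is a standard generalization that the sidebar itself does not discuss, and the precision and recall values below are hypothetical.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta); beta = 1 gives F1.

    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta > 1 favors recall.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid 0/0 when nothing relevant was retrieved
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(f_score(0.75, 0.5))                    # 0.6   (F1)
print(round(f_score(1.0, 0.1), 3))           # 0.182 (arithmetic mean would be 0.55)
print(round(f_score(0.75, 0.5, beta=2), 3))  # 0.536 (F2 leans toward recall)
```

A system with perfect precision but 10% recall scores only about 0.18 on F1, whereas a simple average would rate it 0.55.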

Meanwhile, information scientists could use information availability problems as realistic tests in user studies of exploratory search systems, or of interactive retrieval approaches in general. The effectiveness of such systems would be measured in terms of the correctness of the outcome (does the user correctly conclude whether the information of interest is available?); the user’s confidence in the outcome, which admittedly may be hard to quantify; and efficiency, meaning the user’s expenditure of time or labor.

Precision will always be an important performance measure, particularly for tasks like known-item search and navigational search. For more challenging information-seeking tasks, however, recall is at least as important, and it is critical that the evaluation of information-seeking support systems take recall into account.

References

  1. K. Järvelin and J. Kekäläinen, “Cumulated Gain-Based Evaluation of IR Techniques,” ACM Trans. Information Systems, Oct. 2002, pp. 422-446.
  2. R. Rao et al., “Rich Interaction in the Digital Library,” Comm. ACM, Apr. 1995, pp. 29-39.

