Tuesday, October 14, 2014

Ada Lovelace Day: Karen Spärck Jones and Information Retrieval

Today is Ada Lovelace Day, a day to blog about female computer scientists we admire. Augusta Ada Byron, Countess of Lovelace (10 December 1815 – 27 November 1852) is credited with authoring the first computer algorithm (a method for calculating a sequence of Bernoulli numbers) in 1843, for use on Charles Babbage's early mechanical general-purpose computer, the Analytical Engine.

This year I am going to post about Karen Spärck Jones, whose work on information retrieval is fundamental to the operation of all modern search engines.

Karen Spärck Jones was born in Huddersfield in 1935. She attended Cambridge University and, in the late 1950s, began working as a researcher at the Cambridge Language Research Unit. During that time she worked in the field of Natural Language Processing (NLP), where she looked at the problem of near-synonyms and developed more sophisticated ways of distinguishing ambiguous terms.

By the 1960s she was focusing on Information Retrieval and helped develop a metric to measure the importance of an individual word (or a family of words) in a document. This is the notion of Inverse Document Frequency (IDF) weighting, which she introduced in a 1972 paper, "A Statistical Interpretation of Term Specificity and Its Application in Retrieval". The idea is that a term appearing in only a few documents of a collection is more specific, and so more useful for retrieval, than a term that appears in most of them.

IDF is used in all web search engines, where it is fundamental to classification and retrieval, and it has also spread into other areas of NLP.
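To make the idea concrete, here is a minimal sketch of IDF in Python. It assumes the common textbook form idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing the term t. This is not necessarily the exact formula from the 1972 paper, and the function name and toy documents are made up for illustration.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Return a {term: idf} mapping for a list of tokenised documents.

    Assumes the textbook form idf(t) = log(N / df(t)); real search
    engines typically use smoothed or otherwise adjusted variants.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document, however often it occurs.
        doc_freq.update(set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Toy collection, purely illustrative.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the analytical engine computes bernoulli numbers".split(),
]

idf = inverse_document_frequency(docs)

for term in ("the", "cat", "bernoulli"):
    print(term, round(idf[term], 3))
```

A word such as "the", which occurs in every document, gets a weight of zero, while a word that occurs in only one document gets the highest weight. That is the sense in which IDF captures term specificity.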

Her later work was on document retrieval, including speech applications, database query, user and agent modelling, summarising, and information and language system evaluation. Her projects covered automatic summarising, belief revision for information retrieval, video mail retrieval, and multimedia document retrieval, the last two in collaboration with the Engineering department.

As an influential figure in evaluation programmes, Karen Spärck Jones was also involved in setting the standards for a large proportion of the work in NLP.

She received a significant number of awards, including:

  • Gerard Salton Award (1988)
  • ASIS&T Award of Merit (2002)
  • ACL Lifetime Achievement Award (2004) 
  • BCS Lovelace Medal (2007)
  • ACM-AAAI Allen Newell Award (2007)

She died on 4 April 2007, and Computer Science lost one of its most important heroes.