Generative Neural Information Retrieval Models
Date created: | Last Updated:
: DOI | ARK
Creating DOI. Please wait...
Description: Search engines have become modern society's main sources of information. They put vast amounts of knowledge about virtually any topic at our fingertips wherever we go. To do so, a search engine studies our search query and tries to form an understanding of what it is that we were looking for in the first place, when querying for "boston tea party reason", "IRS form 1040" or "pizza near me". This internal representation is then compared with billions of webpages, books, or news articles in an attempt to find the best possible information for our searcher's query. This project will support and innovate this process in two ways. First, it will develop a more accurate understanding of search queries using insights into the way that humans use language, rather than just comparing queries and documents word-by-word. Secondly, using these improved representations of query meaning, the researchers will develop a fundamentally different way of searching for information. Instead of comparing our query with every possible match, they let the search engine come up with an idealized response to the query and then try to find those webpages that are most similar to this optimal answer. The expected consequences will be better search results and faster computation for the machines running the search engines (that, in turn, can lead to reduced electricity demand and CO2 emissions). Deep and representation learning have brought promising improvements to various Information Retrieval (IR) tasks. Existing neural IR models estimate a matching score between the information need - such as a query or question - and the documents, using semantic similarities between terms, learned from a large set of relevance information. In contrast to classical IR models where the estimation of matching scores is constrained to only those documents containing the query terms, neural IR models need to trace over all documents, or instead re-rank the top-retrieved documents, obtained from a classical IR model. In addition, since neural IR models are often based on purely distributional representations of term meaning, they lack a grounded understanding of language subtleties such as for example gradable terms. The objective of this project is to design generative information retrieval models enhanced by distributed representations of gradable terms. To accomplish this, the research team proposes the following concrete objectives (1) Generative IR models: Instead of computing matching scores for each query-document pair, a document generative model can effectively approximate a representation in the relevance sub-space for a given query, facilitating efficient fully-neural document retrieval. The investigators will explore generative models to approximate hierarchical representations of relevant documents, and use efficient nearest-neighbor algorithms to find and retrieve the most suitable organic documents in the collection. (2) Distributed representations of gradable terms: The often intangible meaning of gradable terms can be resolved by considering the global context of each term. The project will study a probabilistic formulation of gradable terms based on their hypothetical value ranges and frames of reference, estimated from the collection. (3) Incorporation of gradable term representations into generative retrieval systems: The integration of grounded representations of gradable terms in the generative retrieval model will provide better understanding and support of information needs. The project will study this effect on information needs with and without gradable terms. This project is supported by the National Science Foundation under grant agreement number 1956221.