Information retrieval

From Wikipedia, the free encyclopedia
 


Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an information need. The information need can be specified in the form of a search query. In the case of document retrieval, queries can be based on full-text or other content-based indexing. Information retrieval is the science[1] of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.

Automated information retrieval systems are used to reduce what has been called information overload. An IR system is a software system that provides access to books, journals and other documents; it also stores and manages those documents. Web search engines are the most visible IR applications.

Overview

An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In information retrieval, a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevance.

An object is an entity that is represented by information in a content collection or database. User queries are matched against the database information. Unlike classical SQL queries against a database, however, the results returned in information retrieval may or may not match the query exactly, so results are typically ranked. This ranking of results is a key difference between information retrieval searching and database searching.[2]

Depending on the application, the data objects may be, for example, text documents, images,[3] audio,[4] mind maps[5] or videos. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates or metadata.

Most IR systems compute a numeric score on how well each object in the database matches the query, and rank the objects according to this value. The top ranking objects are then shown to the user. The process may then be iterated if the user wishes to refine the query.[6]
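The score-then-rank loop described above can be sketched in a few lines of Python. This is only an illustration: the function names `tokenize` and `rank`, the whitespace tokenizer, and the choice of a tf–idf style score are assumptions made for the example, not a method prescribed by the sources cited here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def rank(query, documents, top_k=3):
    """Score every document against the query and return the best matches.

    The score is a simple tf-idf sum, chosen here only for illustration.
    Returns a list of (document index, score) pairs, best first.
    """
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))

    scored = []
    for doc_id, tokens in enumerate(doc_tokens):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if tf[term]:
                # Terms that occur in few documents get a larger weight.
                idf = math.log(1 + n_docs / df[term])
                score += tf[term] * idf
        scored.append((score, doc_id))

    # Highest score first; ties are broken by document position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:top_k] if score > 0]


documents = [
    "the cat sat on the mat",
    "information retrieval ranks documents",
    "ranking documents by relevance to a query",
]
results = rank("ranking documents", documents)
```

Here the third document matches both query terms and is ranked first, the second matches only one term and is ranked below it, and the first matches nothing and is dropped. A user refining the query would simply call `rank` again with the new query string.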

History

there is ... a machine called the Univac ... whereby letters and figures are coded as a pattern of magnetic spots on a long steel tape. By this means the text of a document, preceded by its subject code symbol, can be recorded ... the machine ... automatically selects and types out those references which have been coded in any desired way at a rate of 120 words a minute

— J. E. Holmstrom, 1948

The idea of using computers to search for relevant pieces of information was popularized in the article As We May Think by Vannevar Bush in 1945.[7] It would appear that Bush was inspired by patents for a 'statistical machine', filed by Emanuel Goldberg in the 1920s and 1930s, that searched for documents stored on film.[8] The first description of a computer searching for information was given by Holmstrom in 1948,[9] in an early mention of the Univac computer. Automated information retrieval systems were introduced in the 1950s: one even featured in the 1957 romantic comedy Desk Set. In the 1960s, the first large information retrieval research group was formed by Gerard Salton at Cornell. By the 1970s several different retrieval techniques had been shown to perform well on small text corpora such as the Cranfield collection (several thousand documents).[7] Large-scale retrieval systems, such as the Lockheed Dialog system, came into use early in the 1970s.

In 1992, the US Department of Defense, along with the National Institute of Standards and Technology (NIST), cosponsored the Text Retrieval Conference (TREC) as part of the TIPSTER text program. The aim was to support the information retrieval community by supplying the infrastructure needed to evaluate text retrieval methodologies on a very large text collection. This catalyzed research on methods that scale to huge corpora. The introduction of web search engines has increased the need for very large scale retrieval systems even further.

Applications

Areas where information retrieval techniques are employed include (the entries are in alphabetical order within each category):

General applications

Domain-specific applications

Other retrieval methods

Methods/Techniques in which information retrieval techniques are employed include:

Model types

Figure: Categorization of IR models (translated from the German entry; original source: Dominik Kuropka)

To retrieve relevant documents effectively, IR strategies typically transform the documents into a suitable representation. Each retrieval strategy incorporates a specific model for its document representation purposes. The figure illustrates the relationship of some common models, categorized along two dimensions: the mathematical basis and the properties of the model.
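To make the idea of a document representation concrete, the sketch below builds two different representations of the same text and matches a query against each. It assumes two commonly described model families, a set-theoretic one (Boolean matching on sets of terms) and an algebraic one (cosine similarity between term-count vectors); the function names and the whitespace tokenization are hypothetical choices for this example.

```python
import math
from collections import Counter


def as_term_set(text):
    """Set-theoretic representation: which terms occur at all."""
    return set(text.lower().split())


def as_term_vector(text):
    """Algebraic representation: a sparse vector of term counts."""
    return Counter(text.lower().split())


def boolean_and_match(query, document):
    """Boolean AND: every query term must be present in the document."""
    return as_term_set(query) <= as_term_set(document)


def cosine_similarity(query, document):
    """Cosine of the angle between the two term-count vectors."""
    q, d = as_term_vector(query), as_term_vector(document)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


doc = "retrieval models represent documents and queries"
exact = boolean_and_match("retrieval models", doc)      # all terms present
partial = boolean_and_match("retrieval systems", doc)   # "systems" is missing
similarity = cosine_similarity("retrieval systems", doc)
```

The Boolean representation can only answer yes or no, so the query "retrieval systems" is rejected outright, while the vector representation still assigns it a non-zero similarity because one term overlaps. That difference is one reason the choice of representation shapes what a retrieval strategy can rank.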

First dimension: mathematical basis

Second dimension: properties of the model

Performance and correctness measures

The evaluation of an information retrieval system is the process of assessing how well a system meets the information needs of its users. In general, measurement considers a collection of documents to be searched and a search query. Traditional evaluation metrics, designed for Boolean retrieval or top-k retrieval, include precision and recall. All measures assume a ground truth notion of relevance: every document is known to be either relevant or non-relevant to a particular query. In practice, queries may be ill-posed and there may be different shades of relevance.
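Under the binary ground-truth assumption described above, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A minimal sketch follows; the document identifiers and function names are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query under binary relevance."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def precision_at_k(ranked, relevant, k):
    """Precision computed over only the first k results of a ranked list."""
    top = ranked[:k]
    if not top:
        return 0.0
    relevant = set(relevant)
    return sum(1 for doc_id in top if doc_id in relevant) / len(top)


ranked = ["d1", "d2", "d3", "d4"]   # system output, best first (hypothetical)
relevant = {"d2", "d4", "d7"}       # ground-truth judgments (hypothetical)

precision, recall = precision_recall(ranked, relevant)
p_at_2 = precision_at_k(ranked, relevant, 2)
```

In this example two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall 2/3). Graded notions of relevance, as mentioned above, need other measures that this sketch does not cover.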

Timeline

Major conferences

Awards in the field

See also

  • Computer memory – Computer component that stores information for immediate use
  • Controlled vocabulary – Method of organizing knowledge
  • Cross-language information retrieval – Retrieval of information in different languages
  • Data mining – Process of extracting and discovering patterns in large data sets
  • Data retrieval – Way to obtain data from a database
  • European Summer School in Information Retrieval – ESSIR promotes research, innovation, and development of information access systems by educating junior and senior researchers, students, professionals, and developers on the latest developments in the field, both methodological and technological.
  • Human–computer information retrieval (HCIR)
  • Information extraction – Machine reading of unstructured documents
  • Information seeking – Process or activity of attempting to obtain information in both human and technological contexts
  • Information Retrieval Facility – Organization in Vienna, Austria 2006–2012
  • Knowledge visualization – Set of techniques for creating images, diagrams, or animations to communicate a message
  • Multimedia information retrieval
  • Personal information management – Tools and systems for managing one's own data
  • Pearl growing – Type of search strategy
  • Query understanding – Search engine processing step
  • Relevance (information retrieval) – Measure of a document's applicability to a given subject or search query
  • Relevance feedback – Type of feedback
  • Rocchio classification – A classification model in machine learning based on centroids
  • Search engine indexing – Method for data management
  • Special Interest Group on Information Retrieval – Subgroup of the Association for Computing Machinery
  • Subject indexing – Classifying a document by index terms
  • Temporal information retrieval – Area of research related to information retrieval centered on timeliness
  • tf–idf – Estimate of the importance of a word in a document
  • XML retrieval – Content-based retrieval of XML documents
  • Web mining – Process of extracting and discovering patterns in large data sets

References

  1. Luk, R. W. P. (2022). "Why is information retrieval a scientific discipline?". Foundations of Science. 27 (2): 427–453. doi:10.1007/s10699-020-09685-x. hdl:10397/94873. S2CID 220506422.
  2. Jansen, B. J.; Rieh, S. (2010). "The Seventeen Theoretical Constructs of Information Searching and Information Retrieval". Journal of the American Society for Information Sciences and Technology. 61 (8): 1517–1534. Archived 2016-03-04 at the Wayback Machine.
  3. Goodrum, Abby A. (2000). "Image Information Retrieval: An Overview of Current Research". Informing Science. 3 (2).
  4. Foote, Jonathan (1999). "An overview of audio information retrieval". Multimedia Systems. 7: 2–10. CiteSeerX 10.1.1.39.6339. doi:10.1007/s005300050106. S2CID 2000641.
  5. Beel, Jöran; Gipp, Bela; Stiller, Jan-Olaf (2009). Information Retrieval On Mind Maps - What Could It Be Good For?. Proceedings of the 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom'09). Washington, DC: IEEE. Archived from the original on 2011-05-13. Retrieved 2012-03-13.
  6. Frakes, William B.; Baeza-Yates, Ricardo (1992). Information Retrieval Data Structures & Algorithms. Prentice-Hall, Inc. ISBN 978-0-13-463837-9. Archived from the original on 2013-09-28.
  7. Singhal, Amit (2001). "Modern Information Retrieval: A Brief Overview" (PDF). Bulletin of the IEEE Computer Society Technical Committee on Data Engineering. 24 (4): 35–43.
  8. Sanderson, Mark; Croft, W. Bruce (2012). "The History of Information Retrieval Research". Proceedings of the IEEE. 100: 1444–1451. doi:10.1109/jproc.2012.2189916.
  9. Holmstrom, J. E. (1948). "Section III. Opening Plenary Session". The Royal Society Scientific Information Conference, 21 June-2 July 1948: Report and Papers Submitted: 85.
  10. Mooers, Calvin N. The Theory of Digital Handling of Non-numerical Information and its Implications to Machine Economics (Zator Technical Bulletin No. 48), cited in Fairthorne, R. A. (1958). "Automatic Retrieval of Recorded Information". The Computer Journal. 1 (1): 37. doi:10.1093/comjnl/1.1.36.
  11. Doyle, Lauren; Becker, Joseph (1975). Information Retrieval and Processing. Melville. 410 pp. ISBN 978-0-471-22151-7.
  12. Perry, James W.; Kent, Allen; Berry, Madeline M. (1955). "Machine literature searching X. Machine language; factors underlying its design and development". American Documentation. 6 (4): 242–254. doi:10.1002/asi.5090060411.
  13. Maron, Melvin E. (2008). "An Historical Note on the Origins of Probabilistic Indexing" (PDF). Information Processing and Management. 44 (2): 971–972. doi:10.1016/j.ipm.2007.02.012.
  14. Jardine, N.; van Rijsbergen, C. J. (December 1971). "The use of hierarchic clustering in information retrieval". Information Storage and Retrieval. 7 (5): 217–240. doi:10.1016/0020-0271(71)90051-9.
  15. Doszkocs, T. E.; Rapp, B. A. (1979). "Searching MEDLINE in English: a Prototype User Interface with Natural Language Query, Ranked Output, and Relevance Feedback". Proceedings of the ASIS Annual Meeting. 16: 131–139.
  16. Korfhage, Robert R. (1997). Information Storage and Retrieval. Wiley. 368 pp. ISBN 978-0-471-14338-3.

Further reading

External links


    Retrieved from "https://en.wikipedia.org/w/index.php?title=Information_retrieval&oldid=1207764981"

    Categories: 
    Information retrieval
    Natural language processing
     



    This page was last edited on 15 February 2024, at 17:27 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.


