Wikipedia





Enterprise search






Enterprise search is software technology for searching data sources internal to a company, typically intranet and database content. The search is generally offered only to users internal to the company.[1][2] Enterprise search can be contrasted with web search, which applies search technology to documents on the open web, and desktop search, which applies search technology to the content on a single computer.

Enterprise search systems index data and documents from a variety of sources, such as file systems, intranets, document management systems, e-mail, and databases. Many enterprise search systems integrate structured and unstructured data in their collections.[3] Enterprise search systems also use access controls to enforce a security policy on their users.[4]
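
One common way to apply access controls is to filter results against the searching user's permissions before they are shown. The following is a minimal sketch of that idea, not a description of any particular product. The `Hit` class, the group-based access list and the `trim_results` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A search result; allowed_groups is a hypothetical ACL stored with the document."""
    doc_id: str
    allowed_groups: frozenset


def trim_results(hits, user_groups):
    """Keep only hits whose ACL shares at least one group with the user."""
    user_groups = frozenset(user_groups)
    return [hit for hit in hits if hit.allowed_groups & user_groups]


hits = [
    Hit("payroll.xlsx", frozenset({"hr"})),
    Hit("roadmap.docx", frozenset({"eng", "hr"})),
    Hit("budget.pdf", frozenset({"finance"})),
]
visible = trim_results(hits, {"eng"})
```

Here a user who belongs only to the assumed `eng` group would see `roadmap.docx` and nothing else.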

Enterprise search can be seen as a type of vertical search of an enterprise.

Components of an enterprise search system

In an enterprise search system, content goes through various phases from source repository to search results:

Content awareness

Content awareness (or "content collection") usually follows either a push or a pull model. In the push model, a source system is integrated with the search engine so that the source connects to the engine and pushes new content directly to the engine's APIs. This model is used when real-time indexing is important. In the pull model, the search software gathers content from sources using a connector such as a web crawler or a database connector. The connector typically polls the source at set intervals to look for new, updated or deleted content.[5]
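
The difference between the two models can be sketched as follows. This is an illustrative example under simple assumptions: `SearchIndex` is a hypothetical in-memory stand-in for a search engine's indexing API, and the pull connector detects changes by comparing full snapshots of the source, which real connectors may replace with timestamps or change logs.

```python
class SearchIndex:
    """Hypothetical in-memory stand-in for a search engine's indexing API."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id, text):
        self.docs[doc_id] = text

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)


def on_document_saved(index, doc_id, text):
    """Push model: the source system calls the index as soon as content changes."""
    index.upsert(doc_id, text)


class PollingConnector:
    """Pull model: the connector periodically compares the source with what it last saw."""

    def __init__(self, index, fetch_snapshot):
        self.index = index
        self.fetch_snapshot = fetch_snapshot  # callable returning {doc_id: text}
        self.seen = {}

    def poll_once(self):
        current = self.fetch_snapshot()
        changes = {"added": [], "updated": [], "deleted": []}
        for doc_id, text in current.items():
            if doc_id not in self.seen:
                changes["added"].append(doc_id)
                self.index.upsert(doc_id, text)
            elif self.seen[doc_id] != text:
                changes["updated"].append(doc_id)
                self.index.upsert(doc_id, text)
        for doc_id in self.seen.keys() - current.keys():
            changes["deleted"].append(doc_id)
            self.index.delete(doc_id)
        self.seen = dict(current)  # copy, so later source edits are detected
        return changes
```

In a deployment, `poll_once` would be invoked on a schedule, while `on_document_saved` would be triggered by the source system itself.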

Content processing and analysis

Content from different sources may come in many different formats or document types, such as XML, HTML, Office document formats or plain text. The content processing phase converts the incoming documents to plain text using document filters. It is also often necessary to normalize content in various ways to improve recall or precision. These normalizations may include stemming, lemmatization, synonym expansion, entity extraction and part-of-speech tagging.
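
As an illustration of a document filter, the sketch below reduces an HTML document to plain text using Python's standard `html.parser` module. The `TextExtractor` class and `html_to_text` function are hypothetical names; a production filter would handle many more formats and edge cases.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of script and style elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

The extracted text can then be passed on to the normalization steps described above.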

As part of processing and analysis, tokenization is applied to split the content into tokens, which are the basic matching units. It is also common to normalize tokens to lower case to provide case-insensitive search, and to normalize accents to improve recall.
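
A minimal sketch of these steps, assuming that tokens are runs of word characters and that accents can be removed by Unicode decomposition (both simplifications; real analyzers are usually language-aware):

```python
import re
import unicodedata


def strip_accents(text):
    """Decompose characters and drop combining marks, so that 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Normalize accents and case, then split the text into word tokens."""
    normalized = strip_accents(text).lower()
    return re.findall(r"\w+", normalized)


tokens = tokenize("Résumé of the Café-Owner")
# tokens == ['resume', 'of', 'the', 'cafe', 'owner']
```

Because the same normalization is applied to both documents and queries, a query for "resume" can match a document containing "Résumé".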

Indexing

The resulting text is stored in an index, which is optimized for quick lookups without storing the full text of the document. The index may contain the dictionary of all unique words in the corpus as well as information about ranking and term frequency.
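
A structure of this kind is often implemented as an inverted index. The sketch below is one simple way to build such an index, assuming documents have already been tokenized; the `build_index` function and the nested-dictionary layout are illustrative choices rather than a standard format.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to the documents containing it and its frequency there.

    docs: {doc_id: list of tokens}
    returns: {term: {doc_id: term frequency}}
    """
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for term, frequency in Counter(tokens).items():
            index[term][doc_id] = frequency
    return dict(index)


index = build_index({
    "d1": ["search", "engine", "search"],
    "d2": ["enterprise", "search"],
})
# index["search"] == {"d1": 2, "d2": 1}
```

The keys of the resulting dictionary form the dictionary of unique terms in the corpus, and the stored counts are the term frequencies that a ranking function could use.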

Query processing

Using a web page, the user issues a query to the system. The query consists of any terms the user enters as well as navigational actions such as faceting and paging information.
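
The following sketch shows how such a request might be turned into a structured query. The parameter names (`q`, `page`, and the `facet.` prefix) and the `Query` class are assumptions made for illustration, not a standard interface.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    terms: list
    facets: dict = field(default_factory=dict)
    page: int = 1
    page_size: int = 10

    @property
    def offset(self):
        """Index of the first result on the requested page."""
        return (self.page - 1) * self.page_size


def parse_request(params):
    """Build a Query from a hypothetical dict of request parameters."""
    terms = params.get("q", "").lower().split()
    prefix = "facet."
    facets = {
        key[len(prefix):]: value
        for key, value in params.items()
        if key.startswith(prefix)
    }
    page = max(1, int(params.get("page", 1)))
    return Query(terms=terms, facets=facets, page=page)


query = parse_request({"q": "Annual Report", "facet.department": "finance", "page": "3"})
# query.terms == ['annual', 'report'], query.offset == 20
```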

Matching

The processed query is then compared to the stored index, and the search system returns results (or "hits") referencing source documents that match. Some systems are able to present the document as it was indexed.
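
A simple form of matching can be sketched as follows, assuming an index that maps each term to the documents containing it along with a term frequency. Requiring all query terms to be present and ranking by summed term frequency are simplifying assumptions for this example; practical systems generally use more elaborate ranking functions.

```python
def match(index, terms):
    """Return ids of documents containing every term, best match first.

    index: {term: {doc_id: term frequency}}
    """
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates.intersection_update(posting)
    scored = [(sum(p[doc_id] for p in postings), doc_id) for doc_id in candidates]
    # Highest score first; ties are broken by document id for stable output.
    return [doc_id for score, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


index = {
    "enterprise": {"d1": 1, "d2": 2},
    "search": {"d1": 3, "d2": 1, "d3": 5},
}
hits = match(index, ["enterprise", "search"])
# hits == ['d1', 'd2']
```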

References

  1. Kruschwitz, Udo; Hull, Charlie (2017). "Searching the Enterprise". Foundations and Trends in Information Retrieval. 11: 1–142. doi:10.1561/1500000053.
  2. "What is Enterprise Search?".
  3. "The New Face of Enterprise Search: Bridging Structured and Unstructured Information" (PDF). Archived from the original (PDF) on 2015-10-28. Retrieved 2013-05-27.
  4. "Security Requirements to Enterprise Search: part 1 - New Idea Engineering".
  5. "Understanding Content Collection and Indexing".

  • Retrieved from "https://en.wikipedia.org/w/index.php?title=Enterprise_search&oldid=1224144818"
     



    Last edited on 16 May 2024, at 14:26  





    Content is available under CC BY-SA 4.0 unless otherwise noted.


