Jump to content
 







Main menu
   


Navigation  



Main page
Contents
Current events
Random article
About Wikipedia
Contact us
Donate
 




Contribute  



Help
Learn to edit
Community portal
Recent changes
Upload file
 








Search  

































Create account

Log in
 









Create account
 Log in
 




Pages for logged out editors learn more  



Contributions
Talk
 



















Contents

   



(Top)
 


1 History  





2 Architecture  



2.1  Query management  





2.2  Cluster management  







3 Features  





4 See also  





5 References  





6 External links  














Apache Pinot







Add links
 









Article
Talk
 

















Read
Edit
View history
 








Tools
   


Actions  



Read
Edit
View history
 




General  



What links here
Related changes
Upload file
Special pages
Permanent link
Page information
Cite this page
Get shortened URL
Download QR code
Wikidata item
 




Print/export  



Download as PDF
Printable version
 




In other projects  



Wikimedia Commons
 
















Appearance
   

 






From Wikipedia, the free encyclopedia
 


Apache Pinot
Original author(s)
  • Kishore Gopalakrishna
  • Xiang Fu
  • Developer(s)Apache Pinot
    Stable release

    1.0.0 / 19 September 2023; 10 months ago (2023-09-19)

    RepositoryPinot repository
    Written inJava
    Operating systemCross-platform
    Type
  • real-time
  • column-oriented data store
  • LicenseApache License 2.0
    Websitepinot.apache.org

    Apache Pinot is a column-oriented, open-source, distributed data store written in Java. Pinot is designed to execute OLAP queries with low latency.[1][2][3][4][5] It is suited in contexts where fast analytics, such as aggregations, are needed on immutable data, possibly, with real-time data ingestion.[6][7][8] The name Pinot comes from the Pinot grape vines that are pressed into liquid that is used to produce a variety of different wines. The founders of the database chose the name as a metaphor for analyzing vast quantities of data from a variety of different file formats or streaming data sources.[9]

    Pinot was first created at LinkedIn after the engineering staff determined that there were no off the shelf solutions that met the social networking site's requirements like predictable low latency, data freshness in seconds, fault tolerance and scalability.[9][10] Pinot is used in production by technology companies such as Uber,[11] Microsoft,[8] and Factual.

    History

    [edit]

    Pinot was started as an internal project at LinkedIn in 2013 to power a variety of user-facing and business-facing products. The first analytics product at LinkedIn to use Pinot was a redesign of the social networking site's feature that allows members to see who has viewed their profile in real-time. The project was open-sourced in June 2015 under an Apache 2.0 license and was donated to the Apache Software Foundation by LinkedIn in June 2019.[9][8]

    Architecture

    [edit]
    Architecture of Apache Pinot
    Architecture diagram of Apache Pinot

    Pinot uses Apache Helix for cluster management. Helix is embedded as an agent within the different components and uses Apache ZooKeeper for coordination and maintaining the overall cluster state and health. All Pinot servers and brokers are managed by Helix. Helix is a generic cluster management framework to manage partitions and replicas in a distributed system.

    Query management

    [edit]

    Queries are received by brokers—which checks the request against the segment-to-server routing table—scattering the request between real-time and offline servers.

    Cluster management

    [edit]

    Pinot leverages Apache Helix for cluster management. Helix is a cluster management framework to manage replicated, partitioned resources in a distributed system. Helix uses Zookeeper to store cluster state and metadata.

    Features

    [edit]

    Pinot shares similar features with comparable OLAP datastores, such as Apache Druid.[12][13] Like Druid, Pinot is a column-oriented database with various compression schemes such as Run Length and Fixed-Bit Length. Pinot supports pluggable indexing technologies - Sorted Index, Bitmap Index, Inverted Index, Star-Tree Index, and Range Index, which are what primarily differentiates Pinot from other OLAP datastores.

    Pinot supports near real-time ingestion from streams such as Kafka, AWS Kinesis and batch ingestion from sources such as Hadoop, S3, Azure, GCS. Like most other OLAP datastores and data warehousing solutions, Pinot supports a SQL-like query language that supports selection, aggregation, filtering, group by, order by, distinct queries on data.

    See also

    [edit]

    References

    [edit]
    1. ^ Cui, Tingting; Peng, Lijun; Pardoe, David; Liu, Kun; Agarwal, Deepak; Kumar, Deepak (14 August 2017). "Data-Driven Reserve Prices for Social Advertising Auctions at LinkedIn". Proceedings of the ADKDD'17. Association for Computing Machinery. pp. 1–7. doi:10.1145/3124749.3124759. ISBN 9781450351942. S2CID 12327343.
  • ^ Rosa, Marcello La (2021). ADVANCED INFORMATION SYSTEMS ENGINEERING: 33rd International Conference. Springer Nature. ISBN 978-3-030-79382-1.
  • ^ Chin, Francis Y. L.; Chen, C. L. Philip; Khan, Latifur; Lee, Kisung; Zhang, Liang-Jie (20 June 2018). Big Data – BigData 2018: 7th International Congress, Held as Part of the Services Conference Federation, SCF 2018, Seattle, WA, USA, June 25–30, 2018, Proceedings. Springer. p. 153. ISBN 978-3-319-94301-5.
  • ^ Im, Jean-François; Gopalakrishna, Kishore; Subramaniam, Subbu; Shrivastava, Mayank; Tumbde, Adwait; Jiang, Xiaotian; Dai, Jennifer; Lee, Seunghyun; Pawar, Neha; Li, Jialiang; Aringunram, Ravi (2018-05-27). "Pinot: Realtime OLAP for 530 Million Users". Proceedings of the 2018 International Conference on Management of Data. Sigmod '18. Association for Computing Machinery. pp. 583–594. doi:10.1145/3183713.3190661. ISBN 9781450347037. S2CID 44083085.
  • ^ "The Apache Software Foundation Announces Apache® Pinot™ as a Top-Level Project". blogs.apache.org. 2 August 2021.
  • ^ Rogers, Ryan; Subramaniam, Subbu; Peng, Sean; Durfee, David; Lee, Seunghyun; Kancha, Santosh Kumar; Sahay, Shraddha; Ahammad, Parvez (16 November 2020). "LinkedIn's Audience Engagements API: A Privacy Preserving Data Analytics System at Scale". arXiv:2002.05839 [cs.CR].
  • ^ Javadi, Seyyed Ahmad; Gupta, Harsh; Manhas, Robin; Sahu, Shweta; Gandhi, Anshul (July 2018). "EASY: Efficient Segment Assignment Strategy for Reducing Tail Latencies in Pinot". 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS). pp. 1432–1437. doi:10.1109/ICDCS.2018.00144. ISBN 978-1-5386-6871-9. S2CID 21659844.
  • ^ a b c Pawar, Neha. "Pinot Joins Apache Incubator" Archived 2019-04-02 at the Wayback Machine, LinkedIn Engineering, 01 April 2019
  • ^ a b c Gopalakrishna, Kishore. "Open Sourcing Pinot: Scaling the Wall of Real-Time Analytics". engineering.linkedin.com. LinkedIn. Archived from the original on 10 September 2015. Retrieved 3 September 2020.
  • ^ Yegulalp, Serdar (2015-06-11). "LinkedIn fills another SQL-on-Hadoop niche". InfoWorld.
  • ^ Fu, Yupeng; Soman, Chinmay (9 June 2021). "Real-time Data Infrastructure at Uber". Proceedings of the 2021 International Conference on Management of Data. Sigmod/Pods '21. Association for Computing Machinery. pp. 2503–2516. arXiv:2104.00087. doi:10.1145/3448016.3457552. ISBN 9781450383431. S2CID 232478317.
  • ^ Ordonez, Carlos; Song, Il-Yeol; Anderst-Kotsis, Gabriele; Tjoa, A. Min; Khalil, Ismail (2 October 2019). Big Data Analytics and Knowledge Discovery: 21st International Conference, DaWaK 2019, Linz, Austria, August 26–29, 2019, Proceedings. Springer. p. 170. ISBN 978-3-030-27520-4.
  • ^ Uttamchandani, Sandeep (10 September 2020). The Self-Service Data Roadmap. "O'Reilly Media, Inc.". ISBN 978-1-4920-7520-2.
  • [edit]
    Retrieved from "https://en.wikipedia.org/w/index.php?title=Apache_Pinot&oldid=1190108063"

    Categories: 
    Apache Software Foundation projects
    Distributed data stores
    Structured storage
    Free database management systems
    Database engines
    Big data products
    Hidden categories: 
    Webarchive template wayback links
    Articles with short description
    Short description is different from Wikidata
     



    This page was last edited on 16 December 2023, at 00:20 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.



    Privacy policy

    About Wikipedia

    Disclaimers

    Contact Wikipedia

    Code of Conduct

    Developers

    Statistics

    Cookie statement

    Mobile view



    Wikimedia Foundation
    Powered by MediaWiki