Jump to content
 







Main menu
   


Navigation  



Main page
Contents
Current events
Random article
About Wikipedia
Contact us
Donate
 




Contribute  



Help
Learn to edit
Community portal
Recent changes
Upload file
 








Search  

































Create account

Log in
 









Create account
 Log in
 




Pages for logged out editors learn more  



Contributions
Talk
 



















Contents

   



(Top)
 


1 Stream format  



1.1  Duplicate string elimination  





1.2  Bit reduction  







2 Encoder/compressor  



2.1  Deflate64/Enhanced Deflate  







3 Using Deflate in new software  



3.1  Encoder implementations  





3.2  Hardware encoders  







4 Decoder/decompressor  



4.1  Inflate-only implementations  





4.2  Hardware decoders  







5 See also  





6 References  





7 External links  














Deflate






Català
Čeština
Deutsch
Español
Français
Galego

Italiano
Nederlands

Polski
Português
Русский
Slovenščina
Suomi
Svenska
Українська

 

Edit links
 









Article
Talk
 

















Read
Edit
View history
 








Tools
   


Actions  



Read
Edit
View history
 




General  



What links here
Related changes
Upload file
Special pages
Permanent link
Page information
Cite this page
Get shortened URL
Download QR code
Wikidata item
 




Print/export  



Download as PDF
Printable version
 
















Appearance
   

 






From Wikipedia, the free encyclopedia
 


Incomputing, Deflate (stylized as DEFLATE, and also called Flate[1][2]) is a lossless data compression file format that uses a combination of LZ77 and Huffman coding. It was designed by Phil Katz, for version 2 of his PKZIP archiving tool. Deflate was later specified in RFC 1951 (1996).[3]

Katz also designed the original algorithm used to construct Deflate streams. This algorithm was patentedasU.S. patent 5,051,745, and assigned to PKWARE, Inc.[4][5] As stated in the RFC document, an algorithm producing Deflate files was widely thought to be implementable in a manner not covered by patents.[3] This led to its widespread use – for example, in gzip compressed files and PNG image files, in addition to the ZIP file format for which Katz originally designed it. The patent has since expired.

Stream format

[edit]

A Deflate stream consists of a series of blocks. Each block is preceded by a 3-bit header:

The stored block option adds minimal overhead and is used for data that is incompressible.

Most compressible data will end up being encoded using method 10, the dynamic Huffman encoding, which produces an optimized Huffman tree customized for each block of data individually. Instructions to generate the necessary Huffman tree immediately follow the block header. The static Huffman option is used for short messages, where the fixed saving gained by omitting the tree outweighs the percentage compression loss due to using a non-optimal (thus, not technically Huffman) code.

Compression is achieved through two steps:

Duplicate string elimination

[edit]

Within compressed blocks, if a duplicate series of bytes is spotted (a repeated string), then a back-reference is inserted, linking to the previous location of that identical string instead. An encoded match to an earlier string consists of an 8-bit length (3–258 bytes) and a 15-bit distance (1–32,768 bytes) to the beginning of the duplicate. Relative back-references can be made across any number of blocks, as long as the distance appears within the last 32 KiB of uncompressed data decoded (termed the sliding window).

If the distance is less than the length, the duplicate overlaps itself, indicating repetition. For example, a run of 10 identical bytes can be encoded as one byte, followed by a duplicate of length 9, beginning with the previous byte.

Searching the preceding text for duplicate substrings is the most computationally expensive part of the DEFLATE algorithm, and the operation which compression level settings affect.

Bit reduction

[edit]

The second compression stage consists of replacing commonly used symbols with shorter representations and less commonly used symbols with longer representations. The method used is Huffman coding which creates an unprefixed tree of non-overlapping intervals, where the length of each sequence is inversely proportional to the logarithm of the probability of that symbol needing to be encoded. The more likely it is that a symbol has to be encoded, the shorter its bit-sequence will be.

A tree is created, containing space for 288 symbols:

A match length code will always be followed by a distance code. Based on the distance code read, further "extra" bits may be read in order to produce the final distance. The distance tree contains space for 32 symbols:

Note that for the match distance symbols 2–29, the number of extra bits can be calculated as .

The two codes (the 288-symbol length/literal tree and the 32-symbol distance tree) are themselves encoded as canonical Huffman codes by giving the bit length of the code for each symbol. The bit lengths are themselves run-length encoded to produce as compact a representation as possible. As an alternative to including the tree representation, the "static tree" option provides standard fixed Huffman trees. The compressed size using the static trees can be computed using the same statistics (the number of times each symbol appears) as are used to generate the dynamic trees, so it is easy for a compressor to choose whichever is smaller.

Encoder/compressor

[edit]

During the compression stage, it is the encoder that chooses the amount of time spent looking for matching strings. The zlib/gzip reference implementation allows the user to select from a sliding scale of likely resulting compression-level vs. speed of encoding. Options range from 0 (do not attempt compression, just store uncompressed) to 9 representing the maximum capability of the reference implementation in zlib/gzip.

Other Deflate encoders have been produced, all of which will also produce a compatible bitstream capable of being decompressed by any existing Deflate decoder. Differing implementations will likely produce variations on the final encoded bit-stream produced. The focus with non-zlib versions of an encoder has normally been to produce a more efficiently compressed and smaller encoded stream.

Deflate64/Enhanced Deflate

[edit]

Deflate64, specified by PKWARE, is a proprietary variant of Deflate. It's fundamentally the same algorithm. What has changed is the increase in dictionary size from 32 KB to 64 KB, an extension of the distance codes to 16 bits so that they may address a range of 64 KB, and the length code, which is extended to 16 bits so that it may define lengths of three to 65,538 bytes.[6] This leads to Deflate64 having a longer compression time, and potentially a slightly higher compression ratio, than Deflate.[7] Several free and/or open source projects support Deflate64, such as 7-Zip,[8] while others, such as zlib, do not, as a result of the proprietary nature of the procedure[9] and the very modest performance increase over Deflate.[10]

Using Deflate in new software

[edit]

Implementations of Deflate are freely available in many languages. Apps written in C typically use the zlib library (under the permissive zlib License). Apps in Borland Pascal (and compatible languages) can use paszlib. Apps in C++ can take advantage of the improved Deflate library in 7-Zip. Both Java and .NET Framework offer out-of-the-box support for Deflate in their libraries (respectively, java.util.zip and System.IO.Compression). Apps in Ada can use Zip-Ada (pure) or ZLib-Ada.

Encoder implementations

[edit]

AdvanceCOMP uses the higher compression ratio versions of Deflate in 7-Zip, libdeflate, and Zopfli to enable recompression of gzip, PNG, MNG and ZIP files with the possibility of smaller file sizes than zlib is able to achieve at maximum settings.[14]

Hardware encoders

[edit]

Decoder/decompressor

[edit]

Inflate is the decoding process that takes a Deflate bitstream for decompression and correctly produces the original full-size data or file.

Inflate-only implementations

[edit]

The normal intent with an alternative Inflate implementation is highly optimized decoding speed, or extremely predictable RAM usage for micro-controller embedded systems.

Hardware decoders

[edit]

See also

[edit]

References

[edit]
  1. ^ The Go Authors. "flate package - compress/flate - Go Packages". The Go Programming Language. Google. Retrieved 5 September 2023. Package flate implements the DEFLATE compressed data format, described in RFC issue 1951.
  • ^ Adobe Systems Incorporated. "PDF 32000-1:2008: Document management — Portable document format — Part 1: PDF 1.7" (PDF). Adobe Open Source. Adobe. p. 23. Retrieved 5 September 2023. FlateDecode [...] Decompresses data encoded using the zlib/deflate compression method
  • ^ a b Deutsch, L. Peter (May 1996). DEFLATE Compressed Data Format Specification version 1.3. IETF. p. 1. sec. Abstract. doi:10.17487/RFC1951. RFC 1951. Retrieved 2014-04-23.
  • ^ US patent 5051745, Katz, Phillip W., "String Searcher, and Compressor Using Same", published 1991-09-24, issued 1991-09-24, assigned to PKWare Inc. 
  • ^ David, Salomon (2007). Data Compression: The Complete Reference (4 ed.). Springer. p. 241. ISBN 978-1-84628-602-5.
  • ^ "Binary Essence – Deflate64". Archived from the original on 21 June 2017. Retrieved 22 May 2011.{{cite web}}: CS1 maint: bot: original URL status unknown (link)
  • ^ "Binary Essence – "Calgary Corpus" compression comparisons". Archived from the original on 27 December 2017. Retrieved 22 May 2011.{{cite web}}: CS1 maint: bot: original URL status unknown (link)
  • ^ "-m (Set compression Method) switch". sevenzip.osdn.jp. Archived from the original on 2022-04-09. Retrieved 2023-01-21.
  • ^ History of Lossless Data Compression Algorithms – Deflate64
  • ^ zlib FAQ – Does zlib support the new "Deflate64" format introduced by PKWare?
  • ^ "Plan 9 from Bell Labs's /n/sources/plan9/sys/src/libflate". plan9.bell-labs.com. Lucent Technologies. Archived from the original on 2006-03-15.
  • ^ "High Performance DEFLATE Compression with Optimizations for Genomic Data Sets". Intel Software. 1 October 2019. Retrieved 18 January 2020.
  • ^ "libdeflate". Heavily optimized library for DEFLATE/zlib/gzip compression and decompression.
  • ^ Mazzoleni, Andrea (21 February 2023). "amadvance/advancecomp". GitHub.
  • ^ "Intel® Xeon® Processor E5-2600 and E5-2400 Series with Intel® Communications Chipset 89xx Series". Retrieved 2016-05-18.
  • ^ a b "Introducing the IBM z15 - The enterprise platform for mission-critical hybrid multicloud". IBM. 12 September 2019. Retrieved 2021-11-01.
  • ^ a b Lascu, Octavian (28 April 2021). IBM z15 (8562) Technical Guide, Page 97. IBM Redbooks. ISBN 9780738458991. Retrieved 2021-11-01.
  • ^ a b "Data compression by using the zlibNX library - IBM Documentation". IBM. Retrieved 2021-11-01.
  • ^ a b "Exploitation of In-Core Acceleration of POWER Processors for AIX". Retrieved 2021-11-01.
  • [edit]
    Retrieved from "https://en.wikipedia.org/w/index.php?title=Deflate&oldid=1235299422"

    Categories: 
    Data compression
    Lossless compression algorithms
    Hidden categories: 
    CS1 maint: bot: original URL status unknown
    Articles with short description
    Short description matches Wikidata
    Webarchive template wayback links
    All articles with dead external links
    Articles with dead external links from November 2019
    Articles with permanently dead external links
     



    This page was last edited on 18 July 2024, at 16:28 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.



    Privacy policy

    About Wikipedia

    Disclaimers

    Contact Wikipedia

    Code of Conduct

    Developers

    Statistics

    Cookie statement

    Mobile view



    Wikimedia Foundation
    Powered by MediaWiki