3. LZMA
- group LZMA_API
LZMA is a lossless compression algorithm that provides a high degree of compression. For most inputs its compression ratios are lower (that is, its output is smaller) than those of other LZ77-based methods, in the range of 25-30 for the Silesia dataset. This better compression comes at the cost of lower compression speed. Decompression speed is nevertheless good: better than BZIP2, which can achieve compression ratios close to those of LZMA.
The LZMA compression library provides in-memory compression and decompression functions. Typical usage is as follows:
1. Call LzmaEncProps_Init() to initialize a CLzmaEncProps object.
2. Update the CLzmaEncProps fields if specific user settings, such as the compression level, are desired.
3. To compress a file, load it into a source buffer and pass that buffer and a destination buffer to LzmaEncode(). LzmaEncode() performs in-memory compression, writes the compressed data to the destination buffer, and writes the header bytes to propsEncoded.
4. To decompress, call LzmaDecode(), passing the compressed data as the source, the header bytes as propData, and a destination buffer to hold the uncompressed bytes.
Decode Functions
-
SRes LzmaProps_Decode(CLzmaProps *p, const Byte *data, unsigned size)
Decodes the header bytes in data and sets the properties in p.
| Parameters | Direction | Description |
|---|---|---|
| p | out | Properties object to be set |
| data | in | Header bytes from the compressed stream |
| size | in | Size of the header in bytes |
Returns:
| Result | Description |
|---|---|
| Success | SZ_OK |
| Fail | SZ_ERROR_UNSUPPORTED: unsupported properties |
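The header layout that LzmaProps_Decode() parses can be sketched in plain C. This assumes the standard 5-byte LZMA properties header: one byte packing lc, lp and pb as (pb * 5 + lp) * 9 + lc, followed by a little-endian 32-bit dictionary size. SketchProps and sketch_props_decode are hypothetical names, and the 4 KiB minimum dictionary size mirrors the reference LZMA SDK decoder rather than anything stated in this section.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical mirror of the CLzmaProps fields; real code uses LzmaDec.h. */
typedef struct {
    unsigned lc, lp, pb;
    uint32_t dicSize;
} SketchProps;

#define SKETCH_PROPS_SIZE 5          /* 1 byte lc/lp/pb + 4 bytes dictionary size */
#define SKETCH_DIC_MIN (1u << 12)    /* assumed minimum dictionary size (4 KiB) */

/* Returns 0 on success, -1 for a short or unsupported header
   (where the library would return SZ_ERROR_UNSUPPORTED). */
static int sketch_props_decode(SketchProps *p, const unsigned char *data, unsigned size)
{
    if (size < SKETCH_PROPS_SIZE)
        return -1;
    unsigned d = data[0];
    if (d >= 9 * 5 * 5)              /* requires lc < 9, lp < 5, pb < 5 */
        return -1;
    /* Dictionary size is stored little-endian in bytes 1..4. */
    uint32_t dicSize = (uint32_t)data[1]
                     | ((uint32_t)data[2] << 8)
                     | ((uint32_t)data[3] << 16)
                     | ((uint32_t)data[4] << 24);
    if (dicSize < SKETCH_DIC_MIN)
        dicSize = SKETCH_DIC_MIN;
    p->dicSize = dicSize;
    p->lc = d % 9;                   /* d = (pb * 5 + lp) * 9 + lc */
    d /= 9;
    p->lp = d % 5;
    p->pb = d / 5;
    return 0;
}
```

For example, the common header byte 0x5D decodes to lc = 3, lp = 0, pb = 2, which matches the defaults listed for CLzmaProps.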
-
void LzmaDec_Init(CLzmaDec *p)
Initializes the LZMA decoder.
| Parameters | Direction | Description |
|---|---|---|
| p | out | Decoder object to be initialized |
Returns: void
-
SRes LzmaDec_AllocateProbs(CLzmaDec *p, const Byte *props, unsigned propsSize, ISzAllocPtr alloc)
Allocates the probability tables in the decoder object and sets the properties in p by calling LzmaProps_Decode().
| Parameters | Direction | Description |
|---|---|---|
| p | out | Decoder object that holds the probability tables |
| props | in | Header bytes from the compressed stream |
| propsSize | in | Size of the header in bytes |
| alloc | in | Allocator object |
Returns:
| Result | Description |
|---|---|
| Success | SZ_OK |
| Fail | SZ_ERROR_MEM: memory allocation error |
| | SZ_ERROR_PARAM: incorrect parameter |
| | SZ_ERROR_UNSUPPORTED: unsupported properties |
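The allocation functions in this section take an ISzAllocPtr. A minimal malloc-based allocator might look like the sketch below. It assumes the ISzAlloc layout declared in recent LZMA SDK versions of 7zTypes.h (two callbacks, Alloc and Free, each receiving the allocator pointer); the struct is re-declared here only to keep the example self-contained, and real code should include the library header instead. SketchAlloc, SketchFree and g_SketchAlloc are hypothetical names.

```c
#include <stdlib.h>
#include <stddef.h>

/* Assumed allocator interface layout (mirrors 7zTypes.h in recent SDKs). */
typedef struct ISzAlloc ISzAlloc;
typedef const ISzAlloc *ISzAllocPtr;
struct ISzAlloc {
    void *(*Alloc)(ISzAllocPtr p, size_t size);
    void (*Free)(ISzAllocPtr p, void *address); /* address may be NULL */
};

/* malloc-based callbacks; the allocator object itself carries no state. */
static void *SketchAlloc(ISzAllocPtr p, size_t size)
{
    (void)p;
    return malloc(size);
}

static void SketchFree(ISzAllocPtr p, void *address)
{
    (void)p;
    free(address);  /* free(NULL) is a no-op */
}

/* One shared allocator object that can be passed wherever alloc is expected. */
static const ISzAlloc g_SketchAlloc = { SketchAlloc, SketchFree };
```

With the library's own ISzAlloc type, &g_SketchAlloc could be passed as alloc to LzmaDec_Allocate() and the same object passed later to LzmaDec_Free(), so that memory is released by the allocator that created it.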
-
void LzmaDec_FreeProbs(CLzmaDec *p, ISzAllocPtr alloc)
Frees the probability tables in the decoder object.
| Parameters | Direction | Description |
|---|---|---|
| p | out | Decoder object that holds the probability tables |
| alloc | in | Allocator object |
Returns: void
-
SRes LzmaDec_Allocate(CLzmaDec *p, const Byte *props, unsigned propsSize, ISzAllocPtr alloc)
Allocates the probability tables in the decoder object, sets the properties in p by calling LzmaProps_Decode(), and allocates the dictionary buffer.
| Parameters | Direction | Description |
|---|---|---|
| p | out | Decoder object that holds the probability tables and the dictionary |
| props | in | Header bytes from the compressed stream |
| propsSize | in | Size of the header in bytes |
| alloc | in | Allocator object |
Returns:
| Result | Description |
|---|---|
| Success | SZ_OK |
| Fail | SZ_ERROR_MEM: memory allocation error |
| | SZ_ERROR_PARAM: incorrect parameter |
| | SZ_ERROR_UNSUPPORTED: unsupported properties |
-
void LzmaDec_Free(CLzmaDec *p, ISzAllocPtr alloc)
Frees the probability tables and the dictionary in the decoder object.
| Parameters | Direction | Description |
|---|---|---|
| p | out | Decoder object that holds the probability tables and the dictionary |
| alloc | in | Allocator object |
Returns: void
-
SRes LzmaDec_DecodeToDic(CLzmaDec *p, SizeT dicLimit, const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode, ELzmaStatus *status)
Decodes src and writes the decompressed data into the decoder's internal dictionary buffer. Use this interface to avoid the overhead of copying data from the dictionary to a separate external buffer. With this interface you must work with the CLzmaDec variables directly.
Steps:
    LzmaDec_Construct()
    LzmaDec_Allocate()
    for (each new stream) {
        LzmaDec_Init()
        while (it needs more decompression) {
            LzmaDec_DecodeToDic()
            use data from CLzmaDec::dic and update CLzmaDec::dicPos
        }
    }
    LzmaDec_Free()
When decoding to the internal dictionary buffer (CLzmaDec::dic), you must update CLzmaDec::dicPos manually, wrapping it back to 0, when it reaches CLzmaDec::dicBufSize.
| Parameters | Direction | Description |
|---|---|---|
| p | in,out | Decoder object that contains the properties and the dictionary buffer |
| dicLimit | in | Output limit in the dictionary buffer: decoding stops once dicPos reaches dicLimit, so at most dicLimit - dicPos bytes are decompressed and saved |
| src | in | Source buffer containing compressed data |
| srcLen | in,out | In: length of the source buffer. Out: number of bytes consumed from src |
| finishMode | in | Has meaning only if decoding reaches the output limit (dicLimit). LZMA_FINISH_ANY: decode just up to dicLimit. LZMA_FINISH_END: the stream must be finished at dicLimit |
| status | out | Decompression status at the end of the current operation |
Returns:
| Result | Description |
|---|---|
| Success | SZ_OK |
| Fail | SZ_ERROR_DATA: data error |
| | SZ_ERROR_PARAM: incorrect parameter |
| | SZ_ERROR_FAIL: unexpected error (internal code error, memory corruption, or hardware failure) |
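The manual dicPos handling required by LzmaDec_DecodeToDic() can be illustrated without the library. In this sketch, SketchDic is a hypothetical stand-in for the dic, dicBufSize and dicPos fields of CLzmaDec, and fake_decode_to_dic() only writes a counter where a real LzmaDec_DecodeToDic() call would write decompressed bytes. The dicLimit computation follows the wrap-then-limit pattern used by the LZMA SDK's own LzmaDec_DecodeToBuf(); treat it as an assumption about how to drive the real function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the CLzmaDec fields the loop touches. */
typedef struct {
    unsigned char *dic;
    size_t dicBufSize;
    size_t dicPos;
} SketchDic;

/* Fake decoder: writes a running counter until dicPos reaches dicLimit. */
static void fake_decode_to_dic(SketchDic *d, size_t dicLimit, unsigned *counter)
{
    while (d->dicPos < dicLimit)
        d->dic[d->dicPos++] = (unsigned char)(*counter)++;
}

/* Copies outSize decoded bytes to out, wrapping dicPos manually. */
static void sketch_drain(SketchDic *d, unsigned char *out, size_t outSize,
                         unsigned *counter)
{
    while (outSize > 0) {
        /* The caller, not the decoder, wraps dicPos at the buffer end. */
        if (d->dicPos == d->dicBufSize)
            d->dicPos = 0;
        size_t start = d->dicPos;
        size_t room = d->dicBufSize - start;
        /* Never let the decoder write past the end of dic. */
        size_t dicLimit = (outSize < room) ? start + outSize : d->dicBufSize;
        fake_decode_to_dic(d, dicLimit, counter);
        /* Newly decoded bytes are dic[start .. dicPos). */
        size_t produced = d->dicPos - start;
        memcpy(out, d->dic + start, produced);
        out += produced;
        outSize -= produced;
    }
}
```

A real loop must also check the returned SRes and the status value, and stop when a call makes no progress; the fake decoder always fills up to dicLimit, so this sketch omits that check.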
Decode One Call Interface
-
SRes LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen, const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode, ELzmaStatus *status, ISzAllocPtr alloc)
Decodes the compressed data in src and saves the result to dest.
| Parameters | Direction | Description |
|---|---|---|
| dest | out | Destination buffer to save decompressed data |
| destLen | in,out | In: size of dest (the output limit). Out: size of the decompressed data |
| src | in | Source buffer containing compressed data |
| srcLen | in,out | In: length of the source buffer. Out: number of bytes consumed from src |
| propData | in | Header bytes of the compressed source data |
| propSize | in | Size of the header in bytes |
| finishMode | in | Has meaning only if decoding reaches the output limit (*destLen). LZMA_FINISH_ANY: decode just *destLen bytes. LZMA_FINISH_END: the stream must be finished after *destLen bytes |
| status | out | Decompression status at the end of the current operation |
| alloc | in | Memory allocator object |
Returns:
| Result | Description |
|---|---|
| Success | SZ_OK |
| Fail | SZ_ERROR_DATA: data error |
| | SZ_ERROR_MEM: memory allocation error |
| | SZ_ERROR_PARAM: incorrect parameter |
| | SZ_ERROR_UNSUPPORTED: unsupported properties |
| | SZ_ERROR_INPUT_EOF: more bytes are needed in the input buffer (src) |
| | SZ_ERROR_FAIL: unexpected error (internal code error, memory corruption, or hardware failure) |
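LzmaDecode() needs the header bytes and, for streams without an end mark, the exact uncompressed size, so the caller has to store both next to the compressed data. One common convention is the .lzma ("LZMA alone") layout written by the LZMA SDK's command-line tool: 5 property bytes, an 8-byte little-endian uncompressed size (all 0xFF bytes meaning unknown), then the compressed data. The sketch below assumes that layout; sketch_parse_alone_header is a hypothetical helper, not a library function.

```c
#include <stdint.h>
#include <stddef.h>

#define SKETCH_ALONE_HEADER_SIZE 13   /* 5 property bytes + 8-byte size */

/* Splits an assumed .lzma header. On success, *props points at the 5 bytes
   to pass as propData, *unpackSize holds the recorded size, and *sizeKnown
   is 0 when the size field is all 0xFF (unknown). Returns -1 if short. */
static int sketch_parse_alone_header(const unsigned char *src, size_t srcLen,
                                     const unsigned char **props,
                                     uint64_t *unpackSize, int *sizeKnown)
{
    if (srcLen < SKETCH_ALONE_HEADER_SIZE)
        return -1;
    uint64_t size = 0;
    for (int i = 7; i >= 0; i--)      /* little-endian: byte 5 is least significant */
        size = (size << 8) | src[5 + i];
    *props = src;
    *unpackSize = size;
    *sizeKnown = (size != UINT64_MAX);
    return 0;
}
```

With a known size, a caller could allocate dest of that size, set *destLen to it, and pass LZMA_FINISH_END; with an unknown size the stream must end with an end mark. In both cases src + 13 and srcLen - 13 would describe the compressed data.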
Encode Functions
-
void LzmaEncProps_Init(CLzmaEncProps *p)
Initializes the properties. Properties are set to auto-select mode.
| Parameters | Direction | Description |
|---|---|---|
| p | out | LZMA encoder properties object |
Returns: void
-
void LzmaEncProps_Normalize(CLzmaEncProps *p)
Sets default values for the properties that are in auto-select mode.
| Parameters | Direction | Description |
|---|---|---|
| p | out | LZMA encoder properties object |
Returns: void
-
UInt32 LzmaEncProps_GetDictSize(const CLzmaEncProps *props2)
Normalizes a copy of props2 and returns the resulting dictionary size. Because props2 is const, it is not modified.
| Parameters | Direction | Description |
|---|---|---|
| props2 | in | LZMA encoder properties object |
Returns: dictionary size
-
CLzmaEncHandle LzmaEnc_Create(ISzAllocPtr alloc)
Creates an LZMA encoder.
| Parameters | Direction | Description |
|---|---|---|
| alloc | in | Allocator object |
Returns: LZMA encoder handle
-
void LzmaEnc_Destroy(CLzmaEncHandle p, ISzAllocPtr alloc, ISzAllocPtr allocBig)
Frees an LZMA encoder.
| Parameters | Direction | Description |
|---|---|---|
| p | out | LZMA encoder handle |
| alloc | in | Allocator object |
| allocBig | in | Allocator object for large blocks |
Returns: void
-
SRes LzmaEnc_SetProps(CLzmaEncHandle p, const CLzmaEncProps *props)
Updates the properties in p with the values in props.
| Parameters | Direction | Description |
|---|---|---|
| p | out | LZMA encoder handle |
| props | in | LZMA encoder properties |
Returns:
| Result | Description |
|---|---|
| Success | SZ_OK |
| Fail | SZ_ERROR_PARAM: incorrect parameter in props |
-
void LzmaEnc_SetDataSize(CLzmaEncHandle p, UInt64 expectedDataSiize)
Sets the expected data size in p to expectedDataSiize.
| Parameters | Direction | Description |
|---|---|---|
| p | out | LZMA encoder handle |
| expectedDataSiize | in | Expected size of the data |
Returns: void
-
SRes LzmaEnc_WriteProperties(CLzmaEncHandle p, Byte *properties, SizeT *size)
Builds the header bytes and saves them in properties.
| Parameters | Direction | Description |
|---|---|---|
| p | in | LZMA encoder handle |
| properties | out | Buffer to write the header bytes into |
| size | in,out | In: size of the properties buffer. Out: number of header bytes saved in properties |
Returns:
| Result | Description |
|---|---|
| Success | SZ_OK |
| Fail | SZ_ERROR_PARAM: incorrect parameter (for example, a properties buffer that is too small) |
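The 5-byte header that LzmaEnc_WriteProperties() produces is the inverse of the layout decoded by LzmaProps_Decode(). This sketch assumes the same format (first byte (pb * 5 + lp) * 9 + lc, then the dictionary size little-endian). sketch_write_props is a hypothetical helper, and the reference LZMA SDK encoder additionally rounds dictSize up before storing it, which this sketch does not do.

```c
#include <stdint.h>
#include <stddef.h>

/* Packs lc/lp/pb and the dictionary size into an assumed 5-byte header.
   *size is in: buffer size, out: bytes written. Returns -1 (where the
   library would report SZ_ERROR_PARAM) if the buffer is too small or a
   value is out of range, 0 on success. */
static int sketch_write_props(unsigned lc, unsigned lp, unsigned pb,
                              uint32_t dictSize,
                              unsigned char *properties, size_t *size)
{
    if (*size < 5 || lc > 8 || lp > 4 || pb > 4)
        return -1;
    properties[0] = (unsigned char)((pb * 5 + lp) * 9 + lc);
    for (int i = 0; i < 4; i++)       /* little-endian dictionary size */
        properties[1 + i] = (unsigned char)(dictSize >> (8 * i));
    *size = 5;
    return 0;
}
```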
-
unsigned LzmaEnc_IsWriteEndMark(CLzmaEncHandle p)
Returns the writeEndMark setting stored in p.
| Parameters | Direction | Description |
|---|---|---|
| p | in | LZMA encoder handle |
Returns: writeEndMark
-
SRes LzmaEnc_MemEncode(CLzmaEncHandle p, Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen, int writeEndMark, ICompressProgress *progress, ISzAllocPtr alloc, ISzAllocPtr allocBig)
Encodes src in memory and saves the compressed data to dest.
| Parameters | Direction | Description |
|---|---|---|
| p | in,out | LZMA encoder handle |
| dest | out | Destination buffer to hold compressed data |
| destLen | in,out | In: size of dest. Out: size of the compressed data written to dest |
| src | in | Source buffer with uncompressed data |
| srcLen | in | Size of the uncompressed data in src |
| writeEndMark | in | If non-zero, finish the stream with an end mark |
| progress | in | Compression progress indicator |
| alloc | in | Allocator object |
| allocBig | in | Allocator object for large blocks |
Returns:
| Result | Description |
|---|---|
| Success | SZ_OK |
| Fail | SZ_ERROR_MEM: memory allocation error |
| | SZ_ERROR_PARAM: incorrect parameter in props |
| | SZ_ERROR_WRITE: ISeqOutStream write callback error |
| | SZ_ERROR_OUTPUT_EOF: output buffer overflow (version with Byte * output) |
| | SZ_ERROR_PROGRESS: break requested by the progress callback |
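The progress argument lets the caller observe or cancel a long encode. The sketch below assumes the ICompressProgress layout found in LZMA SDK 18.x-style 7zTypes.h (a single Progress callback that receives the interface pointer plus input and output byte counts) and re-declares it only to stay self-contained; check the header shipped with the library before relying on it. SketchProgress and its maxInput threshold are hypothetical. Returning any value other than SZ_OK requests a break, which the table above lists as SZ_ERROR_PROGRESS.

```c
#include <stdint.h>

typedef int SRes;                 /* assumed: SZ_OK is 0 */
#define SKETCH_SZ_OK 0
#define SKETCH_BREAK 1            /* any non-zero value requests a break */

/* Assumed callback interface layout (re-declared for self-containment). */
typedef struct ICompressProgress ICompressProgress;
struct ICompressProgress {
    SRes (*Progress)(const ICompressProgress *p, uint64_t inSize, uint64_t outSize);
};

/* Interface placed first so the callback can cast back to the container. */
typedef struct {
    ICompressProgress vt;
    uint64_t maxInput;            /* hypothetical cancellation threshold */
    uint64_t lastIn;
    uint64_t lastOut;
} SketchProgress;

static SRes sketch_progress(const ICompressProgress *p, uint64_t inSize, uint64_t outSize)
{
    /* Valid because vt is the first member and the object is not const. */
    SketchProgress *sp = (SketchProgress *)p;
    sp->lastIn = inSize;
    sp->lastOut = outSize;
    /* An all-ones size means "unknown"; never cancel on that value. */
    if (inSize != UINT64_MAX && inSize > sp->maxInput)
        return SKETCH_BREAK;
    return SKETCH_SZ_OK;
}
```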
Encode One Call Interface
-
SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen, const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark, ICompressProgress *progress, ISzAllocPtr alloc, ISzAllocPtr allocBig)
Encodes the data in src and saves the compressed data to dest.
| Parameters | Direction | Description |
|---|---|---|
| dest | out | Destination buffer to hold compressed data |
| destLen | in,out | In: size of dest. Out: size of the compressed data written to dest |
| src | in | Source buffer with uncompressed data |
| srcLen | in | Size of the uncompressed data in src |
| props | in | Properties that control the compression method |
| propsEncoded | out | Buffer to save the header bytes |
| propsSize | in,out | In: size of the propsEncoded buffer. Out: number of header bytes written |
| writeEndMark | in | If non-zero, finish the stream with an end mark |
| progress | in | Compression progress indicator |
| alloc | in | Allocator object |
| allocBig | in | Allocator object for large blocks |
Returns:
| Result | Description |
|---|---|
| Success | SZ_OK |
| Fail | SZ_ERROR_MEM: memory allocation error |
| | SZ_ERROR_PARAM: incorrect parameter in props |
| | SZ_ERROR_WRITE: ISeqOutStream write callback error |
| | SZ_ERROR_OUTPUT_EOF: output buffer overflow (version with Byte * output) |
| | SZ_ERROR_PROGRESS: break requested by the progress callback |
Typedefs
-
typedef void *CLzmaEncHandle
Pointer to the context object that maintains the state of the LZMA encoder.
-
typedef struct _CLzmaEncProps CLzmaEncProps
Structure that holds the configurable parameters used by the LZMA encoder.
-
typedef struct _CLzmaProps CLzmaProps
Structure that holds the header parameters stored in LZMA compressed streams.
Enums
-
enum ELzmaFinishMode
There are two types of LZMA streams:
- Streams with an end mark. The end mark adds about 6 bytes to the compressed size.
- Streams without an end mark. You must know the exact uncompressed size to decompress such a stream.
ELzmaFinishMode has meaning only if decoding reaches the output limit.
Note 1: You must use LZMA_FINISH_END when you know that the current output buffer covers the last bytes of the block. In other cases you must use LZMA_FINISH_ANY.
Note 2: If the LZMA decoder sees an end marker before reaching the output limit, it returns SZ_OK, and the output value of destLen will be less than the output buffer size limit. You can also check the status result.
Note 3: You can use multiple checks to test data integrity after full decompression:
- Check the result and the status variable.
- Check that output(destLen) = uncompressedSize, if you know the real uncompressedSize.
- Check that output(srcLen) = compressedSize, if you know the real compressedSize. You must use the correct finish mode in that case.
Values:
-
enumerator LZMA_FINISH_ANY
Finish at any point.
-
enumerator LZMA_FINISH_END
The block must be finished at the end.
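The integrity checks from Note 3 can be combined into one helper. In this sketch, SketchStatus is a hypothetical stand-in for ELzmaStatus, res is the SRes returned by the decode call (0 standing for SZ_OK), and the caller passes flags saying whether the real uncompressed and compressed sizes are known. Accepting LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK only when the uncompressed size is known is a design choice that follows from the rule that streams without an end mark require the exact size.

```c
#include <stddef.h>

/* Hypothetical stand-in for ELzmaStatus; include LzmaDec.h for the real enum. */
typedef enum {
    SK_STATUS_NOT_SPECIFIED,
    SK_STATUS_FINISHED_WITH_MARK,
    SK_STATUS_NOT_FINISHED,
    SK_STATUS_NEEDS_MORE_INPUT,
    SK_STATUS_MAYBE_FINISHED_WITHOUT_MARK
} SketchStatus;

/* Returns 1 if a completed decode passes the checks from Note 3, else 0. */
static int sketch_decode_ok(int res, SketchStatus status,
                            size_t destLen, size_t srcLen,
                            int sizeKnown, size_t uncompressedSize,
                            int packKnown, size_t compressedSize)
{
    if (res != 0)                                   /* check 1: result */
        return 0;
    if (status != SK_STATUS_FINISHED_WITH_MARK &&
        status != SK_STATUS_MAYBE_FINISHED_WITHOUT_MARK)
        return 0;                                   /* check 1: status */
    if (status == SK_STATUS_MAYBE_FINISHED_WITHOUT_MARK && !sizeKnown)
        return 0;   /* no end mark: trustworthy only with a known size */
    if (sizeKnown && destLen != uncompressedSize)   /* check 2 */
        return 0;
    if (packKnown && srcLen != compressedSize)      /* check 3 */
        return 0;
    return 1;
}
```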
-
enum ELzmaStatus
ELzmaStatus is used as the output status of a decode function call.
Values:
-
enumerator LZMA_STATUS_NOT_SPECIFIED
Use the main error code instead.
-
enumerator LZMA_STATUS_FINISHED_WITH_MARK
The stream was finished with an end mark.
-
enumerator LZMA_STATUS_NOT_FINISHED
The stream was not finished.
-
enumerator LZMA_STATUS_NEEDS_MORE_INPUT
You must provide more input bytes.
-
enumerator LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
The stream may have finished without an end mark.
-
struct _CLzmaEncProps
- #include <LzmaEnc.h>
Structure that holds the configurable parameters used by the LZMA encoder.
Public Members
-
int algo
Dictionary search algorithm to use: 0 - fast (hash chain), 1 - normal (binary search tree).
Default = 1.
-
int btMode
0 - hash chain mode, 1 - binary tree mode (normal). Default = 1.
-
UInt32 dictSize
Size of the dictionary (search buffer).
(1 << 12) <= dictSize <= (1 << 27) for the 32-bit version.
(1 << 12) <= dictSize <= (3 << 29) for the 64-bit version.
Default = (1 << 24).
-
int fb
Number of fast bytes.
5 <= fb <= 273, default = 32.
-
int lc
Number of high bits of the previous byte to use as context for literal encoding.
0 <= lc <= 8, default = 3.
-
int level
Controls the degree of compression. A lower level gives less compression at higher speed.
0 <= level <= 9.
-
int lp
Number of low bits of the dictionary position to include in the literal posState.
0 <= lp <= 4, default = 0.
-
UInt32 mc
Cut value: limit on the number of nodes to search in the dictionary.
1 <= mc <= (1 << 30), default = 32.
-
int numHashBytes
Number of bytes used to compute the hash.
2, 3 or 4, default = 4.
-
int numThreads
Number of threads used for processing.
1 or 2, default = 1.
-
int pb
Number of low bits of processedPos to include in posState.
0 <= pb <= 4, default = 2.
-
UInt64 reduceSize
Estimated size of the data that will be compressed. Default = (UInt64)(Int64)-1.
The encoder uses this value to reduce the dictionary size.
-
unsigned writeEndMark
0 - do not write EOPM (end-of-payload marker), 1 - write EOPM. Default = 0.
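Before calling LzmaEnc_SetProps(), a caller might pre-check explicitly set fields against the ranges listed above. This sketch assumes, as in the reference LZMA SDK, that LzmaEncProps_Init() leaves these int fields at -1 and dictSize at 0 to mean auto-select. SketchEncProps is a hypothetical subset of the real structure, and sketch_props_in_range is a hypothetical helper, not part of the library.

```c
#include <stdint.h>

/* Hypothetical subset of CLzmaEncProps; -1 / 0 are assumed to mean auto-select. */
typedef struct {
    int lc, lp, pb, fb, numHashBytes;
    uint32_t dictSize;
} SketchEncProps;

/* Returns 1 if every explicitly set value is within its documented range. */
static int sketch_props_in_range(const SketchEncProps *p, int is64bit)
{
    uint32_t maxDict = is64bit ? (3u << 29) : (1u << 27);
    if (p->lc != -1 && (p->lc < 0 || p->lc > 8))
        return 0;
    if (p->lp != -1 && (p->lp < 0 || p->lp > 4))
        return 0;
    if (p->pb != -1 && (p->pb < 0 || p->pb > 4))
        return 0;
    if (p->fb != -1 && (p->fb < 5 || p->fb > 273))
        return 0;
    if (p->numHashBytes != -1 && (p->numHashBytes < 2 || p->numHashBytes > 4))
        return 0;
    if (p->dictSize != 0 && (p->dictSize < (1u << 12) || p->dictSize > maxDict))
        return 0;
    return 1;
}
```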
-
struct _CLzmaProps
- #include <LzmaDec.h>
Structure that holds the header parameters stored in LZMA compressed streams.
Public Members
-
UInt32 dicSize
Size of the dictionary (search buffer) used to find matches.
-
Byte lc
Number of high bits of the previous byte to use as context for literal encoding (default 3).
-
Byte lp
Number of low bits of the dictionary position to include in the literal posState (default 0).
-
Byte pb
Number of low bits of processedPos to include in posState (default 2).
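The lc, lp and pb fields select probability contexts during coding. Following the formulas in the LZMA format specification distributed with the LZMA SDK, posState is the low pb bits of the position, and the literal state combines the low lp bits of the position with the high lc bits of the previous byte. The helper names below are hypothetical.

```c
#include <stdint.h>

/* posState: low pb bits of the current position. */
static unsigned sketch_pos_state(uint32_t processedPos, unsigned pb)
{
    return processedPos & ((1u << pb) - 1);
}

/* Literal context: low lp bits of the position, shifted left by lc,
   plus the high lc bits of the previous byte (prevByte < 256). */
static unsigned sketch_lit_state(uint32_t processedPos, unsigned prevByte,
                                 unsigned lc, unsigned lp)
{
    return ((processedPos & ((1u << lp) - 1)) << lc) + (prevByte >> (8 - lc));
}
```

With the defaults lc = 3 and lp = 0, only the top three bits of the previous byte select the literal context, giving 8 literal contexts.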
-
struct CLzmaDec
Public Members
-
const Byte *buf
Input stream of compressed bytes.
-
UInt32 checkDicSize
Indicator for the situation where the number of bytes processed exceeds the number of bytes that fit in the destination (dictionary) buffer.
-
UInt32 code
Range coder: encoded point within the range.
-
Byte *dic
Circular buffer of decompressed bytes, used as the reference to copy from for future matches.
-
SizeT dicBufSize
Size of the dictionary buffer (dic).
-
SizeT dicPos
Current position in the dictionary.
-
UInt32 numProbs
Number of items in the probs table.
-
CLzmaProb *probs
All context model probabilities.
-
CLzmaProb *probs_1664
Pointer into the context model probabilities, offset by 1664 entries from probs.
-
UInt32 processedPos
Number of bytes decompressed so far. Incremented by 1 or by len, depending on the type of decompression operation performed (literal or match).
-
CLzmaProps prop
Properties read from the header bytes of the compressed data.
-
UInt32 range
Range coder: range size.
-
UInt32 remainLen
Status of the LZMA decoder:
< kMatchSpecLenStart: number of bytes still to be copied with the rep0 offset
= kMatchSpecLenStart: the LZMA stream was finished with an end mark
= kMatchSpecLenStart + 1: the range coder needs to be initialized
= kMatchSpecLenStart + 2: the range coder and state need to be initialized
= kMatchSpecLen_Error_Fail: internal code failure
= kMatchSpecLen_Error_Data + [0 … 273]: LZMA data error
-
UInt32 reps[4]
Offsets for repeated matches rep0-rep3.
-
UInt32 state
Current state in the state machine.