3. LZMA

group LZMA_API

LZMA is a lossless compression algorithm that provides a high degree of compression. Its compression ratios are better than those of other LZ77-based methods for most inputs (in the range of 25-30 for the Silesia dataset). The better compression ratio comes at the expense of lower compression speed. However, it provides good decompression speed (better than BZIP2, which can give compression ratios close to LZMA).

The LZMA compression library provides in-memory compression and decompression functions. Typical usage is as follows:

  1. Call LzmaEncProps_Init() to initialize a CLzmaEncProps object.

  2. Update the CLzmaEncProps object if any specific user settings are desired, such as compression level.

  3. To compress a file, load the file into a source buffer and pass it, together with a destination buffer, to LzmaEncode(). LzmaEncode() performs in-memory compression and writes the compressed data to the destination buffer.

  4. To decompress, call LzmaDecode(), passing the compressed data as the source and a destination buffer to hold the uncompressed bytes.

Decode Functions

SRes LzmaProps_Decode(CLzmaProps *p, const Byte *data, unsigned size)

Decodes header bytes in data and sets properties in p.

| Parameters | Direction | Description |
| --- | --- | --- |
| p | out | Properties object to be set |
| data | in | Header bytes from compressed stream |
| size | in | Size of header in bytes |

Returns:

| Result | Description |
| --- | --- |
| Success | SZ_OK |
| Fail | SZ_ERROR_UNSUPPORTED - Unsupported properties |
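To make the header decoding concrete, the following is a minimal, library-independent sketch of how a standard 5-byte LZMA properties header can be unpacked. It assumes the usual LZMA layout (byte 0 packs lc, lp and pb; bytes 1 to 4 hold the dictionary size in little-endian order). The names SketchProps and sketch_props_decode are hypothetical and are not part of this API; the library function may apply further checks or adjustments that are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields stored in CLzmaProps. */
typedef struct {
    uint8_t  lc;
    uint8_t  lp;
    uint8_t  pb;
    uint32_t dic_size;
} SketchProps;

/* Returns 0 on success, -1 if the header is shorter than 5 bytes or
   byte 0 encodes an lc/lp/pb combination outside the valid range. */
static int sketch_props_decode(SketchProps *p, const uint8_t *data, size_t size)
{
    assert(p != NULL && data != NULL);
    if (size < 5)
        return -1;

    unsigned d = data[0];
    if (d >= 9 * 5 * 5)              /* requires lc <= 8, lp <= 4, pb <= 4 */
        return -1;

    /* Byte 0 is (pb * 5 + lp) * 9 + lc. */
    p->lc = (uint8_t)(d % 9);
    d /= 9;
    p->lp = (uint8_t)(d % 5);
    p->pb = (uint8_t)(d / 5);

    /* Bytes 1..4: dictionary size, little-endian. */
    p->dic_size = (uint32_t)data[1]
                | ((uint32_t)data[2] << 8)
                | ((uint32_t)data[3] << 16)
                | ((uint32_t)data[4] << 24);
    return 0;
}
```

For example, a header starting with byte 0x5D unpacks to lc = 3, lp = 0, pb = 2, which matches the defaults listed for these fields.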

void LzmaDec_Init(CLzmaDec *p)

Initialize LZMA decoder.

| Parameters | Direction | Description |
| --- | --- | --- |
| p | out | Decoder object to be initialized |

Returns:

void

SRes LzmaDec_AllocateProbs(CLzmaDec *p, const Byte *props, unsigned propsSize, ISzAllocPtr alloc)

Allocate probability tables in decoder object. Sets properties in p by calling LzmaProps_Decode().

| Parameters | Direction | Description |
| --- | --- | --- |
| p | out | Decoder object that holds prob tables |
| props | in | Header bytes from compressed stream |
| propsSize | in | Size of header in bytes |
| alloc | in | Allocator object |

Returns:

| Result | Description |
| --- | --- |
| Success | SZ_OK |
| Fail | SZ_ERROR_MEM - Memory allocation error |
| Fail | SZ_ERROR_PARAM - Incorrect parameter |
| Fail | SZ_ERROR_UNSUPPORTED - Unsupported properties |

void LzmaDec_FreeProbs(CLzmaDec *p, ISzAllocPtr alloc)

Free probability tables in decoder object.

| Parameters | Direction | Description |
| --- | --- | --- |
| p | out | Decoder object that holds prob tables |
| alloc | in | Allocator object |

Returns:

void

SRes LzmaDec_Allocate(CLzmaDec *p, const Byte *props, unsigned propsSize, ISzAllocPtr alloc)

Allocate probability tables and the dictionary buffer in decoder object. Sets properties in p by calling LzmaProps_Decode().

| Parameters | Direction | Description |
| --- | --- | --- |
| p | out | Decoder object that holds prob tables and dictionary |
| props | in | Header bytes from compressed stream |
| propsSize | in | Size of header in bytes |
| alloc | in | Allocator object |

Returns:

| Result | Description |
| --- | --- |
| Success | SZ_OK |
| Fail | SZ_ERROR_MEM - Memory allocation error |
| Fail | SZ_ERROR_PARAM - Incorrect parameter |
| Fail | SZ_ERROR_UNSUPPORTED - Unsupported properties |

void LzmaDec_Free(CLzmaDec *p, ISzAllocPtr alloc)

Free probability tables and dictionary in decoder object.

| Parameters | Direction | Description |
| --- | --- | --- |
| p | out | Decoder object that holds prob tables and dictionary |
| alloc | in | Allocator object |

Returns:

void

SRes LzmaDec_DecodeToDic(CLzmaDec *p, SizeT dicLimit, const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode, ELzmaStatus *status)

Decode src and write the decompressed data into the internal dictionary buffer. Use this function if you want to avoid the overhead of copying data from the dictionary to a separate external buffer. With this interface you must work with the CLzmaDec variables directly.

STEPS:

LzmaDec_Construct()
LzmaDec_Allocate()
for (each new stream)
{
  LzmaDec_Init()
  while (it needs more decompression)
  {
    LzmaDec_DecodeToDic()
    use data from CLzmaDec::dic and update CLzmaDec::dicPos
  }
}
LzmaDec_Free()

Note: when decoding to the internal dictionary buffer (CLzmaDec::dic), you must manually update CLzmaDec::dicPos if it reaches CLzmaDec::dicBufSize (typically by resetting it to 0 so that decoding wraps to the start of the buffer).

| Parameters | Direction | Description |
| --- | --- | --- |
| p | in,out | Decoder object that contains properties and dictionary buffer |
| dicLimit | in | Max number of bytes that can be decompressed and saved in dictionary |
| src | in | Source buffer containing compressed data |
| srcLen | in,out | Length of source buffer |
| finishMode | in | Has meaning only if the decoding reaches the output limit (dicLimit). LZMA_FINISH_ANY - Decode just dicLimit bytes. LZMA_FINISH_END - Stream must be finished after dicLimit. |
| status | out | Decompression status at the end of current operation |

Returns:

| Result | Description |
| --- | --- |
| Success | SZ_OK |
| Fail | SZ_ERROR_DATA - Data error |
| Fail | SZ_ERROR_PARAM - Incorrect parameter |
| Fail | SZ_ERROR_FAIL - Some unexpected error: internal error of code, memory corruption or hardware failure |

Decode One Call Interface

SRes LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen, const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode, ELzmaStatus *status, ISzAllocPtr alloc)

Decode compressed data in src and save the result to dest.

| Parameters | Direction | Description |
| --- | --- | --- |
| dest | out | Destination buffer to save decompressed data |
| destLen | in,out | In: output limit for dest. Out: size of decompressed data |
| src | in | Source buffer containing compressed data |
| srcLen | in,out | In: length of source buffer. Out: number of source bytes processed |
| propData | in | Header bytes in compressed source data |
| propSize | in | Size of header |
| finishMode | in | Has meaning only if the decoding reaches the output limit (*destLen). LZMA_FINISH_ANY - Decode just destLen bytes. LZMA_FINISH_END - Stream must be finished after (*destLen). |
| status | out | Decompression status at the end of current operation |
| alloc | in | Memory allocator object |

Returns:

| Result | Description |
| --- | --- |
| Success | SZ_OK |
| Fail | SZ_ERROR_DATA - Data error |
| Fail | SZ_ERROR_MEM - Memory allocation error |
| Fail | SZ_ERROR_PARAM - Incorrect parameter |
| Fail | SZ_ERROR_UNSUPPORTED - Unsupported properties |
| Fail | SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src) |
| Fail | SZ_ERROR_FAIL - Some unexpected error: internal error of code, memory corruption or hardware failure |

Encode Functions

void LzmaEncProps_Init(CLzmaEncProps *p)

Init properties. Properties are set to auto select mode.

| Parameters | Direction | Description |
| --- | --- | --- |
| p | out | Lzma encode properties object |

Returns:

void

void LzmaEncProps_Normalize(CLzmaEncProps *p)

Set default values for properties in auto select mode.

| Parameters | Direction | Description |
| --- | --- | --- |
| p | out | Lzma encode properties object |

Returns:

void

UInt32 LzmaEncProps_GetDictSize(const CLzmaEncProps *props2)

Normalize props2 and return dictionary size.

| Parameters | Direction | Description |
| --- | --- | --- |
| props2 | in | Lzma encode properties object |

Returns:

Dictionary size

CLzmaEncHandle LzmaEnc_Create(ISzAllocPtr alloc)

Construct Lzma encoder.

| Parameters | Direction | Description |
| --- | --- | --- |
| alloc | in | Allocator object |

Returns:

Lzma encoder handle

void LzmaEnc_Destroy(CLzmaEncHandle p, ISzAllocPtr alloc, ISzAllocPtr allocBig)

Free Lzma encoder.

| Parameters | Direction | Description |
| --- | --- | --- |
| p | out | Lzma encoder handle |
| alloc | in | Allocator object |
| allocBig | in | Allocator object for large blocks |

Returns:

void

SRes LzmaEnc_SetProps(CLzmaEncHandle p, const CLzmaEncProps *props)

Update properties in p with values in props.

| Parameters | Direction | Description |
| --- | --- | --- |
| p | out | Lzma encoder handle |
| props | in | Lzma encoder properties |

Returns:

| Result | Description |
| --- | --- |
| Success | SZ_OK |
| Fail | SZ_ERROR_PARAM - Incorrect parameter in props |

void LzmaEnc_SetDataSize(CLzmaEncHandle p, UInt64 expectedDataSiize)

Set expected data size in p to expectedDataSiize.

| Parameters | Direction | Description |
| --- | --- | --- |
| p | out | Lzma encoder handle |
| expectedDataSiize | in | Expected size of data |

Returns:

void

SRes LzmaEnc_WriteProperties(CLzmaEncHandle p, Byte *properties, SizeT *size)

Build header bytes and save in properties.

| Parameters | Direction | Description |
| --- | --- | --- |
| p | in | Lzma encoder handle |
| properties | out | Buffer to write header bytes into |
| size | in,out | Number of bytes saved in properties |

Returns:

| Result | Description |
| --- | --- |
| Success | SZ_OK |
| Fail | SZ_ERROR_PARAM - Incorrect parameter in props |
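As an illustration of what "building header bytes" involves, here is a minimal, library-independent sketch that packs lc, lp, pb and a dictionary size into the standard 5-byte LZMA properties layout. The function name sketch_write_props is hypothetical, and the sketch writes the dictionary size exactly as given; it is an assumption that the library function behaves the same way, since it may for instance round or adjust the dictionary size before writing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0 on success, -1 if a parameter is out of range or the
   output buffer is smaller than the 5-byte header. On success,
   *size is updated to the number of bytes written. */
static int sketch_write_props(unsigned lc, unsigned lp, unsigned pb,
                              uint32_t dict_size,
                              uint8_t *out, size_t *size)
{
    assert(out != NULL && size != NULL);
    if (*size < 5 || lc > 8 || lp > 4 || pb > 4)
        return -1;

    /* Byte 0 packs the three context parameters. */
    out[0] = (uint8_t)((pb * 5 + lp) * 9 + lc);

    /* Bytes 1..4: dictionary size, little-endian. */
    out[1] = (uint8_t)(dict_size);
    out[2] = (uint8_t)(dict_size >> 8);
    out[3] = (uint8_t)(dict_size >> 16);
    out[4] = (uint8_t)(dict_size >> 24);

    *size = 5;
    return 0;
}
```

The in,out behaviour of size in this sketch mirrors the direction documented for the size parameter above: the caller passes the buffer capacity and reads back the number of bytes saved.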

unsigned LzmaEnc_IsWriteEndMark(CLzmaEncHandle p)

Get write end mark saved within p.

| Parameters | Direction | Description |
| --- | --- | --- |
| p | in | Lzma encoder handle |

Returns:

writeEndMark

SRes LzmaEnc_MemEncode(CLzmaEncHandle p, Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen, int writeEndMark, ICompressProgress *progress, ISzAllocPtr alloc, ISzAllocPtr allocBig)

Encode src in-memory and save compressed data to dest.

| Parameters | Direction | Description |
| --- | --- | --- |
| p | in,out | Lzma encoder handle |
| dest | out | Destination buffer to hold compressed data |
| destLen | in,out | In: capacity of dest. Out: size of compressed data written to dest |
| src | in | Source buffer with uncompressed data |
| srcLen | in | Size of uncompressed data in src |
| writeEndMark | in | If non-0, finish stream with end mark |
| progress | in | Compression progress indicator |
| alloc | in | Allocator object |
| allocBig | in | Allocator object for large blocks |

Returns:

| Result | Description |
| --- | --- |
| Success | SZ_OK |
| Fail | SZ_ERROR_MEM - Memory allocation error |
| Fail | SZ_ERROR_PARAM - Incorrect parameter in props |
| Fail | SZ_ERROR_WRITE - ISeqOutStream write callback error |
| Fail | SZ_ERROR_OUTPUT_EOF - Output buffer overflow (version with (Byte *) output) |
| Fail | SZ_ERROR_PROGRESS - Some break from progress callback |

Encode One Call Interface

SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen, const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark, ICompressProgress *progress, ISzAllocPtr alloc, ISzAllocPtr allocBig)

Encode data in src and save compressed data to dest.

| Parameters | Direction | Description |
| --- | --- | --- |
| dest | out | Destination buffer to hold compressed data |
| destLen | in,out | In: capacity of dest. Out: size of compressed data written to dest |
| src | in | Source buffer with uncompressed data |
| srcLen | in | Size of uncompressed data in src |
| props | in | Properties to control compression method |
| propsEncoded | out | Buffer to save header bytes |
| propsSize | in,out | In: capacity of propsEncoded. Out: size of header bytes |
| writeEndMark | in | If non-0, finish stream with end mark |
| progress | in | Compression progress indicator |
| alloc | in | Allocator object |
| allocBig | in | Allocator object for large blocks |

Returns:

| Result | Description |
| --- | --- |
| Success | SZ_OK |
| Fail | SZ_ERROR_MEM - Memory allocation error |
| Fail | SZ_ERROR_PARAM - Incorrect parameter in props |
| Fail | SZ_ERROR_WRITE - ISeqOutStream write callback error |
| Fail | SZ_ERROR_OUTPUT_EOF - Output buffer overflow (version with (Byte *) output) |
| Fail | SZ_ERROR_PROGRESS - Some break from progress callback |

Defines

LzmaDec_Construct(p)

First operation to call before setting up CLzmaDec.

| Parameters | Direction | Description |
| --- | --- | --- |
| p | out | Decoder object to be initialized |

Typedefs

typedef void *CLzmaEncHandle

Pointer to context object that maintains state of LZMA encoder.

typedef struct _CLzmaEncProps CLzmaEncProps

Structure to hold configurable parameters that can be used by LZMA encoder.

typedef struct _CLzmaProps CLzmaProps

Structure to hold header parameters that are stored in LZMA compressed streams.

Enums

enum ELzmaFinishMode

There are two types of LZMA streams:

  • Stream with end mark. The end mark adds about 6 bytes to the compressed size.

  • Stream without end mark. You must know the exact uncompressed size to decompress such a stream.

ELzmaFinishMode has meaning only if the decoding reaches the output limit.

Notes

1. Use LZMA_FINISH_END when you know that the current output buffer covers the last bytes of the block. In all other cases, use LZMA_FINISH_ANY.

2. If the LZMA decoder sees the end marker before reaching the output limit, it returns SZ_OK, and the output value of destLen will be less than the output buffer size limit. You can also check the status result.

3. You can use multiple checks to test data integrity after full decompression:

  • Check Result and the status variable.

  • Check that output(destLen) = uncompressedSize, if you know the real uncompressedSize.

  • Check that output(srcLen) = compressedSize, if you know the real compressedSize. You must use the correct finish mode in that case.
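The integrity checks listed above can be combined into a single predicate. The following is a minimal sketch, not library code: the enum and function names are hypothetical local stand-ins that mirror the order of the ELzmaStatus enumerators on this page so that the example compiles without the library headers. It assumes SZ_OK is 0, and it is an assumption of this sketch that only the two "finished" statuses count as a complete stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for SZ_OK and ELzmaStatus. */
enum { SKETCH_SZ_OK = 0 };

typedef enum {
    SKETCH_STATUS_NOT_SPECIFIED,
    SKETCH_STATUS_FINISHED_WITH_MARK,
    SKETCH_STATUS_NOT_FINISHED,
    SKETCH_STATUS_NEEDS_MORE_INPUT,
    SKETCH_STATUS_MAYBE_FINISHED_WITHOUT_MARK
} SketchStatus;

/* Returns true only if all three checks pass:
   result and status, output size, and consumed input size. */
static bool sketch_decode_verified(int res, SketchStatus status,
                                   size_t dest_len, size_t uncompressed_size,
                                   size_t src_len, size_t compressed_size)
{
    /* Check 1: result code and status. */
    if (res != SKETCH_SZ_OK)
        return false;
    if (status != SKETCH_STATUS_FINISHED_WITH_MARK &&
        status != SKETCH_STATUS_MAYBE_FINISHED_WITHOUT_MARK)
        return false;

    /* Check 2: decoded size matches the known uncompressed size. */
    if (dest_len != uncompressed_size)
        return false;

    /* Check 3: consumed input matches the known compressed size. */
    return src_len == compressed_size;
}
```

A caller would pass the result code, the status output, and the updated destLen and srcLen values from a decode call, together with the sizes it knows independently.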

Values:

enumerator LZMA_FINISH_ANY

Finish at any point.

enumerator LZMA_FINISH_END

Block must be finished at the end.

enum ELzmaStatus

ELzmaStatus is used as the output status of a decode function call.

Values:

enumerator LZMA_STATUS_NOT_SPECIFIED

Use the main error code instead.

enumerator LZMA_STATUS_FINISHED_WITH_MARK

Stream was finished with end mark.

enumerator LZMA_STATUS_NOT_FINISHED

Stream was not finished.

enumerator LZMA_STATUS_NEEDS_MORE_INPUT

You must provide more input bytes.

enumerator LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK

The stream may have finished without an end mark.

struct _CLzmaEncProps
#include <LzmaEnc.h>

Structure to hold configurable parameters that can be used by LZMA encoder.

Public Members

int algo

Dictionary search algorithm to use: 0 - fast (hash chain), 1 - normal (binary search tree). Default = 1.

int btMode

0 - hashChain mode, 1 - binTree mode (normal). Default = 1.

UInt32 dictSize

Size of dictionary / search buffer.

(1 << 12) <= dictSize <= (1 << 27) for 32-bit version

(1 << 12) <= dictSize <= (3 << 29) for 64-bit version

Default = (1 << 24).

int fb

Number of fast bytes. 5 <= fb <= 273. Default = 32.

int lc

Number of high bits of the previous byte to use as a context for literal encoding. 0 <= lc <= 8. Default = 3.

int level

Controls the degree of compression. A lower level gives less compression at higher speed. 0 <= level <= 9.

int lp

Number of low bits of the dictionary position to include in literal posState. 0 <= lp <= 4. Default = 0.

UInt32 mc

Cut value: limit on the number of nodes to search in the dictionary. 1 <= mc <= (1 << 30). Default = 32.

int numHashBytes

Number of bytes used to compute hash: 2, 3 or 4. Default = 4.

int numThreads

Threads used for processing: 1 or 2. Default = 1.

int pb

Number of low bits of processedPos to include in posState. 0 <= pb <= 4. Default = 2.

UInt64 reduceSize

Estimated size of data that will be compressed. The encoder uses this value to reduce the dictionary size. Default = (UInt64)(Int64)-1.

unsigned writeEndMark

0 - do not write EOPM, 1 - write EOPM. Default = 0.
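The ranges listed for these members can be checked before handing a configuration to the encoder. The sketch below is a library-independent illustration: SketchEncProps is a hypothetical stand-in that copies only the range-limited fields, and the limits are taken directly from the member descriptions above. It applies to explicitly specified values only; it would reject the auto-select placeholders written by LzmaEncProps_Init(), whose values this page does not specify. Using the pointer size to choose between the 32-bit and 64-bit dictionary limits is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in holding the range-limited encoder settings. */
typedef struct {
    int      level;
    uint32_t dict_size;
    int      lc, lp, pb;
    int      algo, bt_mode;
    int      fb;
    int      num_hash_bytes;
    uint32_t mc;
    int      num_threads;
} SketchEncProps;

/* Returns true if every field lies inside its documented range. */
static bool sketch_enc_props_in_range(const SketchEncProps *p)
{
    assert(p != NULL);

    /* Upper dictionary limit depends on a 32-bit or 64-bit build. */
    const uint32_t dict_max = (sizeof(void *) >= 8)
                            ? ((uint32_t)3 << 29)
                            : ((uint32_t)1 << 27);

    return p->level >= 0 && p->level <= 9
        && p->dict_size >= ((uint32_t)1 << 12) && p->dict_size <= dict_max
        && p->lc >= 0 && p->lc <= 8
        && p->lp >= 0 && p->lp <= 4
        && p->pb >= 0 && p->pb <= 4
        && (p->algo == 0 || p->algo == 1)
        && (p->bt_mode == 0 || p->bt_mode == 1)
        && p->fb >= 5 && p->fb <= 273
        && p->num_hash_bytes >= 2 && p->num_hash_bytes <= 4
        && p->mc >= 1 && p->mc <= ((uint32_t)1 << 30)
        && (p->num_threads == 1 || p->num_threads == 2);
}
```

A configuration that uses the documented default for every field passes this check, while a value such as lc = 9 or fb = 4 fails it.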

struct _CLzmaProps
#include <LzmaDec.h>

Structure to hold header parameters that are stored in LZMA compressed streams.

Public Members

UInt32 dicSize

Size of dictionary / search buffer to use to find matches.

Byte lc

Number of high bits of the previous byte to use as a context for literal encoding (default 3).

Byte lp

Number of low bits of the dictionary position to include in literal posState (default 0).

Byte pb

Number of low bits of processedPos to include in posState (default 2).
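To show how lc, lp and pb select a context, here is a minimal sketch based on the standard LZMA scheme: the literal context combines the low lp bits of the position with the high lc bits of the previous byte, and posState is the low pb bits of the position. The function names are hypothetical and the code is independent of the library; it illustrates the indexing only, not the decoder's internal implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Literal context index: low lp bits of the position, followed by
   the high lc bits of the previous byte. */
static unsigned sketch_literal_context(uint32_t pos, uint8_t prev_byte,
                                       unsigned lc, unsigned lp)
{
    assert(lc <= 8 && lp <= 4);
    unsigned low_pos   = pos & ((1u << lp) - 1u);
    unsigned high_prev = (unsigned)prev_byte >> (8 - lc);
    return (low_pos << lc) + high_prev;
}

/* posState: low pb bits of the position. */
static unsigned sketch_pos_state(uint32_t pos, unsigned pb)
{
    assert(pb <= 4);
    return pos & ((1u << pb) - 1u);
}
```

With the defaults lc = 3 and lp = 0, the literal context is simply the top three bits of the previous byte, giving 8 possible contexts; with pb = 2, posState cycles through 4 values as the position advances.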

struct CLzmaDec

Public Members

const Byte *buf

Input stream of compressed bytes.

UInt32 checkDicSize

Indicator for the situation where the number of bytes to be processed is more than the number of bytes that can fit in the dest buffer.

UInt32 code

Range coder: encoded point within range.

Byte *dic

Circular buffer of decompressed bytes. Used as the reference to copy from for future matches.

SizeT dicBufSize

Dictionary size.

SizeT dicPos

Current position in dictionary.

UInt32 numProbs

Number of items in probs table.

CLzmaProb *probs

All context model probabilities.

CLzmaProb *probs_1664

All context model probabilities.

UInt32 processedPos

Number of bytes decompressed so far. Incremented by 1 or by len, depending on the type of decompress operation performed.

CLzmaProps prop

Properties read from header bytes in compressed data.

UInt32 range

Range coder: range size.

UInt32 remainLen

Shows status of LZMA decoder:

< kMatchSpecLenStart : the number of bytes to be copied with (rep0) offset

= kMatchSpecLenStart : the LZMA stream was finished with end mark

= kMatchSpecLenStart + 1 : need init range coder

= kMatchSpecLenStart + 2 : need init range coder and state

= kMatchSpecLen_Error_Fail : Internal Code Failure

= kMatchSpecLen_Error_Data + [0 … 273] : LZMA Data Error

UInt32 reps[4]

Offsets for repeated matches rep0-3.

UInt32 state

Current state in state machine.