From e1bd1af2bd99cbeb8ea8f81e970989061cb7ef23 Mon Sep 17 00:00:00 2001 From: Klaus Post Date: Mon, 7 Aug 2023 09:28:04 +0200 Subject: [PATCH] WIP: S2++ If the first bytes of a block is `0x40, 0x00` (repeat, length 4), this indicates that all [Copy with 4-byte offset (11)](https://github.com/google/snappy/blob/main/format_description.txt#L106) are all 3 bytes instead for the remainder of the block. There can be no literals before this tag and no repeats before a match as specified above. This will only trigger on this exact tag. > These are like the copies with 2-byte offsets (see previous subsection), > except that the offset is stored as a 24-bit integer instead of a > 16-bit integer (and thus will occupy three bytes). When in this mode the maximum backreference offset is 16777215. This *cannot* be combined with dictionaries. --- s2/README.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/s2/README.md b/s2/README.md index 8284bb0810..9f99044865 100644 --- a/s2/README.md +++ b/s2/README.md @@ -1047,6 +1047,21 @@ The first copy of a block cannot be a repeat offset and the offset is reset on e Default streaming block size is 1MB. +## S2++ + +If the first bytes of a block is `0x40, 0x00` (repeat, length 4), this indicates that all [Copy with 4-byte offset (11)](https://github.com/google/snappy/blob/main/format_description.txt#L106) are all 3 bytes instead for the remainder of the block. + +There can be no literals before this tag and no repeats before a match as specified above. +This will only trigger on this exact tag. + +> These are like the copies with 2-byte offsets (see previous subsection), +> except that the offset is stored as a 24-bit integer instead of a +> 16-bit integer (and thus will occupy three bytes). + +When in this mode the maximum backreference offset is 16777215. + +This *cannot* be combined with dictionaries. + # Dictionary Encoding Adding dictionaries allow providing a custom dictionary that will serve as lookup in the beginning of blocks.