Feature: Add option to force compressed data to be byte aligned #49

Draft
wants to merge 6 commits into
base: master

@ryancdotorg ryancdotorg commented Nov 22, 2020

This adds byte_align and bare_stream parameters to enable production of Brotli compressed blocks that can be used to construct a complete stream with trivial byte-wise concatenation.

To my knowledge, this functionality does not currently exist in any other Brotli implementation.

These options are exposed via the command line tool as --bytealign and --bare (bare mode enables byte alignment).

Byte align mode inserts an empty metadata block before the final empty block if the compressed data block is not already byte aligned.

Bare mode additionally omits the final empty block, and, in catable mode, the stream header.
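A minimal sketch of the alignment arithmetic, not taken from the PR. Per RFC 7932, the header of an empty metadata meta-block is 6 bits (ISLAST, MNIBBLES, a reserved bit, MSKIPBYTES) followed by fill bits up to the next byte boundary. The function and constant names below are hypothetical, and the sketch assumes the encoder emits that minimal form.

```rust
/// Header bits of an empty metadata meta-block (RFC 7932):
/// ISLAST (1) + MNIBBLES (2) + reserved (1) + MSKIPBYTES (2).
/// Assumption: the encoder writes this minimal form with no skip bytes.
const EMPTY_METADATA_HEADER_BITS: usize = 6;

/// Bit position after byte-align mode has run (hypothetical helper).
/// `storage_ix` is the bit offset at which the compressed data ended.
fn position_after_alignment(storage_ix: usize) -> usize {
    if storage_ix & 7 == 0 {
        // Already on a byte boundary: nothing needs to be inserted.
        return storage_ix;
    }
    // Write the 6 header bits, then round up to the next whole byte.
    (storage_ix + EMPTY_METADATA_HEADER_BITS + 7) & !7
}
```

Depending on where the data ended, the padding block may spill into a second byte: starting at bit 19, the 6 header bits reach bit 25, so the next boundary is bit 32.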

TODO

  • don't emit empty penultimate metadata block if the compressed block ended on a byte boundary
  • add support for emitting "bare" compressed data without a stream header or empty last block
  • don't bother with byte aligning the start in appendable mode
  • plumb for use by libraries
  • update documentation
  • add tests

Notes to maintainers

This is the first time I've ever touched Rust code, so please forgive me if I've done something silly.

@ryancdotorg ryancdotorg changed the title bytealign mode: basic functionality Feature: Add option to force compressed data to be byte aligned Nov 22, 2020
@ryancdotorg ryancdotorg force-pushed the bytealign branch 2 times, most recently from 2b03734 to 1596167 Compare November 22, 2020 18:14
@ryancdotorg
Author

@philippeitis @danielrh can i get some feedback on this please?

@philippeitis
Contributor

philippeitis commented Dec 16, 2020

I'm not actually a maintainer, but my feedback is largely nit-picking code style. However, given that the codebase itself was machine-generated, I don't think that's a major issue. Otherwise, I think providing a link in the PR to relevant documentation of this feature in other brotli implementations (if it exists, otherwise saying that it's original would also be fine) would be helpful. Again, not a maintainer, so take my comments with a grain of salt.

@@ -340,6 +341,17 @@ value: u32) -> i32 {
params.favor_cpu_efficiency = value != 0;
return 1i32;
}
+ if p as (i32) == BrotliEncoderParameter::BROTLI_PARAM_BYTE_ALIGN as (i32) {

I think if p == BrotliEncoderParameter::BROTLI_PARAM_BYTE_ALIGN should compile on rust 1.12? (I know that the rest of the code in this function uses the other convention, but from what I understand, this is all machine generated, and not how you'd naturally write the code).
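To illustrate the two styles being compared, here is a small sketch with a hypothetical stand-in enum. Direct comparison with `==` only compiles if the enum implements `PartialEq`; whether `BrotliEncoderParameter` derives it is an assumption I have not checked against the codebase.

```rust
/// Hypothetical stand-in for BrotliEncoderParameter; the variants and
/// discriminant values here are illustrative only.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Param {
    Quality = 1,
    ByteAlign = 2,
}

/// The convention used by the existing (machine-translated) code:
/// cast both sides to i32 and compare the integers.
fn is_byte_align_cast(p: Param) -> bool {
    p as i32 == Param::ByteAlign as i32
}

/// The suggested convention: compare enum values directly.
/// Requires `PartialEq` on the enum.
fn is_byte_align_direct(p: Param) -> bool {
    p == Param::ByteAlign
}
```

Both functions return the same result for every variant; the difference is purely stylistic.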

@@ -721,7 +740,9 @@ fn EnsureInitialized<Alloc: BrotliAlloc>
if (*s).params.quality == 0i32 || (*s).params.quality == 1i32 {
lgwin = brotli_max_int(lgwin, 18i32);
}
- EncodeWindowBits(lgwin, s.params.large_window, &mut (*s).last_bytes_, &mut (*s).last_bytes_bits_);
+ if !(*s).params.bare_stream {
+ EncodeWindowBits(lgwin, s.params.large_window, &mut (*s).last_bytes_, &mut (*s).last_bytes_bits_);

We don't need to do &mut (*s).field - &mut s.field works identically and looks nicer (again, machine generated code which isn't very natural).
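A short sketch of why the two spellings are equivalent: field access through a `&mut` reference auto-dereferences, so `(*s).field` and `s.field` name the same place. The struct and function names below are hypothetical simplifications, not the encoder's real types.

```rust
/// Hypothetical, simplified stand-ins for the encoder state.
struct Params {
    bare_stream: bool,
}

struct State {
    params: Params,
    last_bytes: u16,
    last_bytes_bits: u8,
}

/// Placeholder for a header-writing routine; the values are arbitrary.
fn write_header(last_bytes: &mut u16, last_bytes_bits: &mut u8) {
    *last_bytes = 1;
    *last_bytes_bits = 1;
}

/// Explicit dereference, as in the existing code style.
fn init_explicit(s: &mut State) {
    if !(*s).params.bare_stream {
        write_header(&mut (*s).last_bytes, &mut (*s).last_bytes_bits);
    }
}

/// Auto-dereference: same behavior, less noise.
fn init_auto(s: &mut State) {
    if !s.params.bare_stream {
        write_header(&mut s.last_bytes, &mut s.last_bytes_bits);
    }
}
```

Borrowing two distinct fields mutably at once is accepted in both forms, since the borrow checker tracks the fields separately.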

@@ -2014,7 +2035,13 @@ fn WriteMetaBlockInternal<Alloc: BrotliAlloc,
false,
cb);
if actual_is_last != is_last {
- BrotliWriteEmptyLastMetaBlock(storage_ix, storage)
+ // insert empty block for byte alignment if required
+ if params.byte_align && ((*storage_ix & 7u32 as (usize)) != 0) {

Better as (storage_ix & 7) != 0 (machine generated code)

@@ -2154,7 +2181,13 @@ fn WriteMetaBlockInternal<Alloc: BrotliAlloc,
cb);
}
if actual_is_last != is_last {
- BrotliWriteEmptyLastMetaBlock(storage_ix, storage)
+ // insert empty block for byte alignment if required
+ if params.byte_align && ((*storage_ix & 7u32 as (usize)) != 0) {

Better as (storage_ix & 7) != 0 (machine generated code)
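A sketch of the simplification suggested in the two comments above, assuming `storage_ix` is a `&mut usize` (the `*storage_ix` in the diff suggests so). Under that assumption the dereference presumably has to stay, so the shortened form would read `(*storage_ix & 7) != 0`; only the `7u32 as (usize)` cast is redundant, because the literal `7` is inferred as `usize`. The function names are hypothetical.

```rust
/// The form used in the diff: mask the low three bits of the bit cursor,
/// with an explicit u32-to-usize cast on the literal.
#[allow(unused_parens)]
fn needs_padding_verbose(storage_ix: &mut usize) -> bool {
    (*storage_ix & 7u32 as (usize)) != 0
}

/// The shortened form: the literal is inferred as usize, but the
/// reference still has to be dereferenced before masking.
fn needs_padding(storage_ix: &mut usize) -> bool {
    (*storage_ix & 7) != 0
}
```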

@ryancdotorg
Author

Thanks for following up. I was basing my additions on the existing codebase; I didn't realize it was machine generated. I think it'd be mildly confusing to mix use of e.g. (*s).field and s.field, but it's not my repo, and I'm fine doing it however will get merged.

To the best of my knowledge, this feature doesn't exist in any other Brotli implementation, and I can edit the PR description to reflect that a bit later. I can expand a bit on the motivation as well.

@danielrh
Collaborator

Hi! Thanks for doing this--sorry I've been away for a bit but I'm back now...I'd love to hear about the motivation of this patch and what problem it's trying to solve...is it taking the catable command one step further to allow vanilla cat instead of the special broccoli cat tools to combine files?

@ryancdotorg
Author

@danielrh Yes, the idea is to produce substreams which can be assembled with the generic cat command or other tools that do naive concatenation/copying of bytes. Simply copying bytes ought to be faster than having to do even minor bit shuffling, though I feel like simplicity is a bigger win here.

This could be used in a number of scenarios, but these are the two main examples I had in mind:

  • Stitching together multiple blocks of precompressed HTML, JavaScript and/or CSS into a stream
  • Limited templating with precompressed data, e.g. a precompressed JavaScript IIFE with an object literal embedded at the very end as a data parameter

@ryancdotorg
Author

What do we want to do about the issue of changes matching existing code conventions vs changes being reasonably idiomatic Rust code? My sensibilities swing towards keeping code style consistent, but it's your call.

@CLAassistant

CLAassistant commented Apr 16, 2022

CLA assistant check
All committers have signed the CLA.

@johnterickson

johnterickson commented Nov 9, 2023

Ok this is really cool! We have a scenario that is similar to https://dropbox.tech/infrastructure/-broccoli--syncing-faster-by-syncing-less where our customers upload to a content-addressable store.

Create a header block:
touch empty.bin
--appendable --bytealign --bare -c empty.bin start.br
Individually compress the actual data blocks like this:
--catable --bytealign --bare -c block001 block001.br
Then fake an ISLASTEMPTY metablock:
printf "\x03" > ~/end.br

Then you can:

  1. Decompress individual blocks by a) prepending the start block and b) appending the end block: cat start.br block001.br end.br | brotli -d
  2. A standard decompressor (e.g. curl --compressed) can recreate the whole file from the concatenation of all the compressed blocks: e.g. cat start.br block*.br end.br | brotli -d
  3. If you need to, rearrange the compressed blocks order to rearrange the output order
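The assembly step above is plain byte concatenation, so it can be sketched in a few lines. The function name is hypothetical and the inputs are treated as opaque bytes. The trailing 0x03 is the same byte the `printf` writes: Brotli packs bits least-significant first, so ISLAST = 1 and ISLASTEMPTY = 1 occupy the two low bits and the rest of the byte is zero padding.

```rust
/// A lone empty last meta-block: ISLAST = 1, ISLASTEMPTY = 1,
/// packed LSB-first and zero-padded to a full byte.
const EMPTY_LAST_METABLOCK: u8 = 0x03;

/// Concatenate a stream header, any number of byte-aligned bare blocks,
/// and the terminating empty meta-block (hypothetical helper).
/// This only copies bytes; it assumes every input ends on a byte boundary,
/// which is what the byte-align option is meant to guarantee.
fn assemble(header: &[u8], blocks: &[&[u8]]) -> Vec<u8> {
    let total = header.len() + blocks.iter().map(|b| b.len()).sum::<usize>() + 1;
    let mut out = Vec::with_capacity(total);
    out.extend_from_slice(header);
    for block in blocks {
        out.extend_from_slice(block);
    }
    out.push(EMPTY_LAST_METABLOCK);
    out
}
```

Reordering the output, as in the third item, amounts to passing the blocks in a different order.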

@johnterickson

@ryancdotorg Have you done anything more with this idea?

I brought it over to https://github.com/johnterickson/BrotliSharpLib/tree/bytealign

@ryancdotorg
Author

@johnterickson No, haven't been working on it. All this PR needs is a rebase, tests and updated docs...


5 participants