Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements writing e-expressions in binary 1.1 #722

Merged
merged 5 commits into from
Mar 7, 2024
Merged

Implements writing e-expressions in binary 1.1 #722

merged 5 commits into from
Mar 7, 2024

Conversation

zslayton
Copy link
Contributor

@zslayton zslayton commented Mar 5, 2024

Adds write support for e-expressions in the binary writer.

Additionally:

  • Adds a write_many_structs benchmark that compares time to write structs in Ion 1.0, Ion 1.1 w/length prefixes, Ion 1.1 w/delimited containers, and Ion 1.1 w/macros.
  • Adds an example program that generates a large number of Log4J-style event structs and serializes them in various forms.
  • Amends some binary encoding details that were ambiguous in the spec.
  • Adds ice_code as a dependency to label cold code paths without having to construct new methods.
  • Sets some default allocation sizes for encoding buffers and the bump allocator itself to reduce reallocations.
  • Adds a delegate_value_writer_to! macro that cuts the number of places the ValueWriter trait method collection is written out, simplifying refactoring.

Related to #671.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ PR tour

@@ -71,8 +71,9 @@ num-integer = "0.1.44"
num-traits = "0.2"
arrayvec = "0.7"
smallvec = {version ="1.9.0", features = ["const_generics"]}
bumpalo = {version = "3.14.0", features = ["collections", "std"]}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -142,7 +142,6 @@ pub fn criterion_benchmark(c: &mut Criterion) {
}

fn roundtrip_var_uint_test(unsigned_values: &[u64]) -> IonResult<Vec<u8>> {
println!("Roundtripping unsigned values as VarUInts to check for correctness.");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ These println! statements were being run even when the associated benchmark was not, leading to weird output.

use ion_rs::lazy::encoder::value_writer::{StructWriter, ValueWriter};
use ion_rs::RawSymbolTokenRef;

fn write_struct_with_string_values(value_writer: impl ValueWriter) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Each of these methods takes an impl ValueWriter, allowing us to pass in any writer in the set of {text, binary} x {v1.0, v1.1} and ensuring that the same logic is used for each.

//! This program demonstrates implementing WriteAsIon using Ion 1.1's e-expressions for a more
//! compact encoding. It uses raw-level writer APIs that end users are unlikely to leverage.
//! Ion 1.1 is not yet finalized; the encoding produced by this example and the APIs it uses
//! are very likely to change.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This program generates reasonable example data wherein each struct has the same general layout and values are likely to recur. Its output can be used for data size and compression testing.

src/element/mod.rs Show resolved Hide resolved
// is 10 bytes.

if bytes_available < 10 {
cold_path! {{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This code path is for encoding FlexUInts that require 9 or 10 bytes to encode, which is exceedingly rare. There's no logic change here, just indentation for cold_path!.

Comment on lines +83 to +85
pub trait MacroArgsWriter: SequenceWriter {
// TODO: methods for writing tagless encodings
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ For the moment, the MacroArgsWriter only supports tagged arguments.

// The ValueWriter implementation of these methods moves `self`. In contrast, all of the methods
// in the SequenceWriter interface take `&mut self`, which adds another lifetime to the mix. The
// borrow checker is not currently able to tell that `&mut self`'s lifetime will outlive the
// closure argument's.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ I took another crack at fixing this only to remember why it wasn't possible. 😅

@@ -31,3 +37,128 @@ impl<'top, D: LazyDecoder> From<Never> for MacroExpr<'top, D> {
unreachable!("macro in Ion 1.0 (method: into)")
}
}

impl AnnotatableValueWriter for Never {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Implementing all of these traits for Never allows it to be used to satisfy macro-related associated types in Ion 1.0, statically preventing those types/methods from being used.

@@ -22,15 +22,15 @@ use std::fmt::{Display, Formatter};
use crate::types::Str;
/// Configures and constructs new instances of [Reader].
pub struct ReaderBuilder {
// This will be set to to `Some` catalog when the reader builder is created with catalog information.
// Default value for this field will be set to `None`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This field used to be an Option. I've updated the comment in the constructor.

digest = { version = "0.9", optional = true }
ice_code = "0.1.4"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure we can trust the author of this dependency? 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems pretty shady to me!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that's why it's so cold.

src/element/mod.rs Show resolved Hide resolved
@@ -1,13 +1,14 @@
use bumpalo::collections::Vec as BumpVec;
use bumpalo::Bump as BumpAllocator;
use delegate::delegate;
use ice_code::ice as cold_path;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI—I think it's fine to use ice! { ... } without renaming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good feedback, thanks. I was wondering if I should push the name cold_path upstream since it's a bit more self-documenting (albeit less fun 😛). I'll leave this for the moment since it's a pretty easy find/replace to change later. @nirosys, do you have an opinion one way or another on the macro name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like the self-documenting nature of cold_path, since it's a bit easier to understand without familiarity of what ice_cold does. But, I don't feel too strongly about it, since it does rely on knowing what a cold path is to be more understandable.

src/lazy/encoder/binary/v1_1/container_writers.rs Outdated Show resolved Hide resolved
SymbolId(sid) if self.flex_uint_encoding => {
FlexUInt::write_u64(self.buffer(), sid as u64)?;
// and then opcode 0x90, which indicates symbol ID 0.
cold_path! {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question—while this one instance of using ice/cold_path might not make much difference on its own, do we need to be concerned about this macro inflating the compiled code?

I.e.: Is cold_path! going to inflate the size of the compiled code as compared to something like having a function like this so that ice/cold_path doesn't generate a new function for both the 0x80 and 0x90 cases?

#[inline(never)]
fn cold_extend_from_slice(self, slice: &[u8]) -> IonResult<()> {
    self.buffer().extend_from_slice_copy(slice)
}

Comment on lines +91 to +93
cold_path! {{
let _ = Self::write_large_i64(output, value, encoded_size_in_bytes);
}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, any particular reason we need to use cold_path! instead of just marking write_large_i64 as #[inline(never)]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downsides of marking the callee #inline(never) are:

  1. That information isn't visible at the callsite, so a reader would need to go investigate it separately.
  2. It prevents that method from being used in non-cold paths. (This isn't a problem for this method in particular since it's never going to be on the hot path, but it is an issue more generally.)
  3. If any of the method's arguments are statically known at the call site, cold_path! bakes them into the resulting function so the assembly doesn't have to push or pop them from the call stack.
  4. There may be other functionality that's part of the cold path but not part of the method. For example:
fn foo(&self) -> IonResult<()> {
    if self.something() {
        return Ok(());
    }
    cold_path! {{
        // This fn contains most of the cold path logic...
        fallible_fn()
        // ...but we need to map its result to an IonResult
            .map_err(IonError::from)
    }}
}

src/lazy/text/raw/v1_1/reader.rs Show resolved Hide resolved
@zslayton zslayton requested a review from popematt March 6, 2024 16:42
@zslayton zslayton merged commit 9fd213c into main Mar 7, 2024
28 checks passed
@zslayton zslayton deleted the write-eexp branch March 7, 2024 20:46
popematt pushed a commit to popematt/ion-rust that referenced this pull request Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants