Removes the crate's dependency on `BigInt`, `BigUint` #767

zslayton · 2024-05-14T19:00:19Z

This PR limits the range of integers that can be read and written to those that can be represented by the i128 type.

This reduces the size of Int, UInt, Decimal, Timestamp, RawValueRef, and ValueRef. It also allows those types to implement Copy.

It also allows us to remove several custom methods to compare across mixed i64/BigInt and u64/BigUint representations for equality and ordering.

Here's the benchmark output following this change:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

This PR limits the range of integers that can be read and written to those that can be represented by the `i128` type. This reduces the size of `Int`, `UInt`, `Decimal`, and `Timestamp`, `RawValueRef`, and `ValueRef`. It also allows those types to implement `Copy`. It also allows us to remove several custom methods to compare across mixed `i64`/`BigInt` and `u64`/`BigUint` representations for equality and ordering.

zslayton

🗺️ PR Tour

zslayton · 2024-05-15T20:13:00Z

Cargo.toml

+# We need at least 1.65 for GATs[1] and 1.67 for `ilog`[2]
+# [1] https://blog.rust-lang.org/2022/11/03/Rust-1.65.0.html
+# [2] https://blog.rust-lang.org/2023/01/26/Rust-1.67.0.html#stabilized-apis
+rust-version = "1.67"


🗺️ Small bump to the minimum Rust version to enable us to use the stabilized ilog10 method for counting decimal digits in integers.

zslayton · 2024-05-15T20:13:11Z

Cargo.toml

@@ -62,7 +63,6 @@ chrono = { version = "0.4", default-features = false, features = ["clock", "std"
 delegate = "0.12.0"
 thiserror = "1.0"
 nom = "7.1.1"
-num-bigint = "0.4.3"


🗺️ 👋

zslayton · 2024-05-15T20:14:05Z

benches/encoding_primitives.rs

@@ -99,7 +99,7 @@ pub fn criterion_benchmark(c: &mut Criterion) {
        b.iter(|| {
            let mut encoded_length: usize = 0;
            for value in &unsigned_values {
-                encoded_length += black_box(FlexUInt::write_u64(&mut output, *value).unwrap());
+                encoded_length += black_box(FlexUInt::write(&mut output, *value).unwrap());


🗺️ The FlexUInt::write method can now accept any int size. I ran the encoding_primitives benchmark and confirmed that this is the same or faster than the u64-specific method in all cases.

zslayton · 2024-05-15T20:14:39Z

src/binary/decimal.rs

-        magnitude: UInt {
-            data: UIntData::U64(0),
-        },
+        magnitude: UInt::ZERO,


🗺️ I added ZERO and NEGATIVE_ZERO const values to the Decimal and Coefficient types for easier reference.

zslayton · 2024-05-15T20:16:27Z

src/binary/uint.rs

+        // TODO: copy from the slice and use from_be_bytes
+        let mut magnitude: u128 = 0;
+        for &byte in uint_bytes {
+            let byte = u128::from(byte);
+            magnitude <<= 8;
+            magnitude |= byte;
+        }


🗺️ I moved this from the (now gone) BigUInt::from_bytes_be method. As the TODO notes, it could still be optimized, but that's for another PR.

zslayton · 2024-05-15T20:18:53Z

src/ion_hash/representation.rs

+    fn write_repr_decimal(&mut self, value: Option<Decimal>) -> IonResult<()>;
+    fn write_repr_timestamp(&mut self, value: Option<Timestamp>) -> IonResult<()>;


🗺️ Several small changes to ion-hash based on this update.

zslayton · 2024-05-15T20:25:32Z

src/lazy/encoder/binary/v1_1/value_writer.rs

                        self.push_byte(0xE3); // Biased FlexUInt follows
-                        FlexUInt::encode_u64(self.encoding_buffer, symbol_id as u64 - 65_792)
-                    }
+                        let _ = FlexUInt::write(self.encoding_buffer, symbol_id - 65_792).unwrap();


🗺️ This is encoding to a Vec, so it can't fail.

zslayton · 2024-05-15T20:26:50Z

src/lazy/encoder/binary/v1_1/value_writer.rs

-                    ),
-                    -40,
-                ),
+                Decimal::new(Int::from(27182818284590452353602874713526624i128), -40),


🗺️ I switched this test value out for a marginally shorter one that fits in an i128.

zslayton · 2024-05-15T20:30:44Z

src/types/decimal/mod.rs

-    /// Extract the integer and fractional parts and the exponents of this as a `(BigInt, (BigInt, i64))`
-    ///
-    /// ```ignore
-    /// # use num_bigint::BigInt;
-    /// # use ion_rs::Decimal;
-    /// let d1: Decimal = Decimal::new(123456789, -2);
-    /// let (int_part, (frac_part, exponent)) = d1.into_parts();
-    /// assert_eq!(int_part, BigInt::from(1234567u64));
-    /// assert_eq!(frac_part, BigInt::from(89));
-    /// assert_eq!(exponent, -2);
-    /// ```
-    pub(crate) fn into_parts(self) -> (BigInt, (BigInt, i64)) {
-        let magnitude: BigInt = self.coefficient.try_into().unwrap();
-        if self.exponent.is_zero() {
-            (magnitude, (BigInt::zero(), 0))
-        } else if self.exponent.is_negative() {
-            let divisor = BigInt::from(10u64).pow((-self.exponent) as u32);
-            let (i, f) = magnitude.div_rem(&divisor);
-            (i, (f, self.exponent))
-        } else {
-            let multiplicand = BigInt::from(10u64).pow(self.exponent as u32);
-            (magnitude * multiplicand, (BigInt::zero(), 0))
-        }
-    }


🗺️ This and related methods existed solely to enable conversion of decimal components to/from BigInt.

zslayton · 2024-05-15T20:32:32Z

src/types/decimal/mod.rs

-    }
-
-    #[test]
-    fn test_difference_by_parts_lossy() {


🗺️ Removed tests for the also-removed difference_by_parts_lossy & co.

src/binary/decimal.rs

jobarr-amzn · 2024-05-15T21:15:55Z

src/binary/decimal.rs

-
-    #[test]
-    fn oversized_decimal() {
-        let mut buf = vec![];
-        // Even though our output stream is a growable Vec, the encoder uses a
-        // fixed-size, stack-allocated buffer to encode the body of the decimal.
-        // If it's asked to encode a decimal that won't fit (>32 bytes), it should
-        // return an error.
-        let precise_pi = Decimal::new(
-            BigUint::from_str("31415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679").unwrap(),
-            -99
-        );
-        let result = buf.encode_decimal_value(&precise_pi);
-        assert!(result.is_err());
-    }


Do we want to keep this test and assert that it is an error? Do we elsewhere have a test that verifies the new encoding error behavior?

This sent me down a useful rabbit hole.

Because I had removed the constructor that accepted a BigUint coefficient, it became statically impossible to trigger this case. However, because the Coefficient type was a (UInt, Sign) tuple, it was possible for it to store magnitudes outside the i128 range but inside the u128 range.

In my latest commits I've modified Coefficient to an (Int, Sign) pair where the Int's sign information always agrees with Sign except when the coefficient is a negative zero. This is essentially the same logic as before but with a constructor that's now capped at Int::MAX. This had the nice side effect of also eliminating the need for constructors that take an explicit Sign parameter because that would only ever be necessary to construct a -0. There are now Coefficient::new(some_int_value) and Coefficient::negative_zero() constructors.

All of that too say, it's now statically impossible to programmatically construct a Decimal that's too big.

src/lazy/binary/immutable_buffer.rs

src/types/decimal/mod.rs

src/types/decimal/coefficient.rs

src/types/integer.rs

src/types/timestamp.rs

src/lazy/value_ref.rs

src/lazy/encoder/binary/v1_1/value_writer.rs

src/lazy/encoder/binary/v1_1/flex_uint.rs

src/binary/uint.rs

rmarrowstone

This is a great simplification. Particularly for the Decimal and Timestamp implementations. Love it.

I'm not concerned for the potential limiting of decimals and timestamps, but for Ion Integers I don't have a clear picture of what a users options are (could be none) when they have a value stream with an "overlarge" integer. I'm good with an answer like "they can access a vec of the two's complement bytes from this API", and am likely fine with they can't. But I'm not familiar enough with ion-rust to see it from this PR. Thanks.

src/binary/int.rs

zslayton · 2024-05-21T15:41:02Z

This is a great simplification. Particularly for the Decimal and Timestamp implementations. Love it.

I'm not concerned for the potential limiting of decimals and timestamps, but for Ion Integers I don't have a clear picture of what a users options are (could be none) when they have a value stream with an "overlarge" integer. I'm good with an answer like "they can access a vec of the two's complement bytes from this API", and am likely fine with they can't. But I'm not familiar enough with ion-rust to see it from this PR. Thanks.

For the moment, there's not a (stable) option. An upcoming release will expose the span APIs, allowing users to access a value's underlying data.

Longer-term, I've been thinking about augmenting the IonError type with variants to expose values that were outside the happy path but potentially usable. For example, returning an IonError::OversizedInt(oversized_int) variant where oversized_int had methods to access its bytes and determine its byte order. (This is related to #759, where users may wish to access top-level containers that are found to be incomplete or contain invalid child data--CC @sajidanw.) I haven't fully baked this idea though; it seems like it would require adding a lifetime to IonError, which could be a big change. Another alternative would be adding a tolerant_next() method (name tbd) alongside next() which returned a different enum of its own, allowing users to opt into handling such cases.

src/binary/int.rs

Co-authored-by: Joshua Barr <70981087+jobarr-amzn@users.noreply.github.com>

Zack Slayton and others added 5 commits May 14, 2024 14:51

Merge branch 'main' into rm-bigint

4011a3d

adds as_i128, as_u128, and expect_* methods to Int, UInt

3d92e03

Updates ion-hash

88f6fba

Updated f64-to-decimal conversion logic

0929c8b

zslayton commented May 15, 2024

View reviewed changes

clippy suggestion, comment

c42f704

zslayton marked this pull request as ready for review May 15, 2024 20:36

zslayton requested review from popematt and nirosys May 15, 2024 20:37

jobarr-amzn approved these changes May 15, 2024

View reviewed changes

popematt reviewed May 15, 2024

View reviewed changes

rmarrowstone approved these changes May 15, 2024

View reviewed changes

src/binary/int.rs Show resolved Hide resolved

Zack Slayton added 3 commits May 16, 2024 11:08

Refactors DecodedUInt::uint_from_slice

6ec63ba

Changes Coefficient from (UInt, Sign) to (Int, Sign)

7927f71

Removed unused constants, incorporated feedback

feeb419

zslayton and others added 2 commits May 21, 2024 11:44

Merge branch 'main' into rm-bigint

3c38fe8

unit test cleanup

87760f7

zslayton requested a review from popematt May 21, 2024 15:47

fixes merge conflict

9f0c291

jobarr-amzn approved these changes May 21, 2024

View reviewed changes

src/binary/int.rs Outdated Show resolved Hide resolved

Removes superfluous comment

0e88bfc

Co-authored-by: Joshua Barr <70981087+jobarr-amzn@users.noreply.github.com>

popematt approved these changes May 21, 2024

View reviewed changes

zslayton merged commit 92b5f86 into main May 21, 2024
31 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Removes the crate's dependency on `BigInt`, `BigUint` #767

Removes the crate's dependency on `BigInt`, `BigUint` #767

zslayton commented May 14, 2024 •

edited

zslayton left a comment

zslayton May 15, 2024

zslayton May 15, 2024

zslayton May 15, 2024

zslayton May 15, 2024

zslayton May 15, 2024

zslayton May 15, 2024

zslayton May 15, 2024

zslayton May 15, 2024

zslayton May 15, 2024

zslayton May 15, 2024

jobarr-amzn May 15, 2024

zslayton May 21, 2024

rmarrowstone left a comment

zslayton commented May 21, 2024

		fn write_repr_decimal(&mut self, value: Option<Decimal>) -> IonResult<()>;
		fn write_repr_timestamp(&mut self, value: Option<Timestamp>) -> IonResult<()>;

Removes the crate's dependency on BigInt, BigUint #767

Removes the crate's dependency on BigInt, BigUint #767

Conversation

zslayton commented May 14, 2024 • edited

zslayton left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rmarrowstone left a comment

Choose a reason for hiding this comment

zslayton commented May 21, 2024

Removes the crate's dependency on `BigInt`, `BigUint` #767

Removes the crate's dependency on `BigInt`, `BigUint` #767

zslayton commented May 14, 2024 •

edited