Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add raw array pointer types #496

Merged
merged 10 commits into from Nov 29, 2018
Merged

Conversation

jturner314
Copy link
Member

@jturner314 jturner314 commented Sep 28, 2018

The motivation for this is #388 and another project I'm working on where I'd like to be able to properly deal with non-Copy uninitialized data.

This PR introduces two new traits, DataRaw and DataRawMut, which are less restrictive than Data and DataMut. It also introduces ArrayPtr and ArrayPtrMut which are similar to ArrayView and ArrayViewMut but don't carry lifetime information. I believe all the changes are backwards compatible.

ArrayPtr and ArrayPtrMut are more restrictive than normal raw pointers in Rust: they require pointers to all elements in the array to be within bounds or one byte past the end of a single allocated object. The reason for this restriction is to ensure that .offset() operations on the array's pointer can be performed safely. If we lifted this restriction, then then following methods would have to be unsafe for ArrayPtr and ArrayPtrMut:

  • .slice_move()
  • .slice_inplace()
  • .slice_axis_inplace()
  • .subview_inplace()
  • .into_subview()
  • .invert_axis()
  • .split_at()

While we could make these methods safe for normal arrays and unsafe for array pointers by implementing them separately for ArrayBase<S: Data, D: Dimension>, ArrayPtr<A, D: Dimension>, and ArrayPtrMut<A, D: Dimension>, doing so requires some duplication and hurts code organization. Additionally, as a user, I'd prefer for most methods on ArrayPtr/ArrayPtrMut to be safe for convenience, and in all of my realistic use-cases, my ArrayPtr/ArrayPtrMut instances will point to valid allocations.

@bluss Will you please review this? This PR deals with more unsafe code and potential for undefined behavior than I'm used to.

Edit: I've now added a DataRawClone trait to allow cloning ArrayPtr and ArrayPtrMut. (I couldn't just implement DataClone for PtrRepr<*const A> and PtrRepr<*mut A> because DataClone: Data.)

src/impl_methods.rs Outdated Show resolved Hide resolved
unsafe fn clone_with_ptr(&self, ptr: *mut Self::Elem) -> (Self, *mut Self::Elem) {
(*self, ptr)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can be without this trait, and just use DataClone? Clone is the only consumer of the trait

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In current ndarray, DataClone has a Data bound. If we want to be able to clone ArrayPtr/ArrayPtrMut, this is too restrictive given only the current implementation of Clone for ArrayBase. Possible alternatives to the approach in this PR:

  1. Don't add a DataRawClone trait; instead, implement Clone individually for ArrayPtr and ArrayPtrMut. That would be inconsistent and would prevent code with just a DataClone bound from being able to clone ArrayPtr and ArrayPtrMut.

    With a breaking change, though, we could get rid of DataClone entirely and just implement Clone individually for ArcArray, Array, ArrayView, ArrayPtr, and ArrayPtrMut. This actually seems pretty nice, but it does make writing generic code more difficult. (In user code, the bound to allow array.clone() would have to be ArrayBase<S, D>: Clone instead of S: DataRawClone.)

  2. Add the DataRawClone trait and remove the DataClone trait. That would work but would be a breaking change.

Fwiw, DataClone does have a small amount of utility as the combination of Data and DataRawClone, but I agree it does seem like unnecessary complication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(2) I think a breaking change is ok. We have lots of them queued? We should get on to merging them as soon as 0.12.1 is released (and I can do the releasing if you want).

I'd hope it has very little impact on users, but maybe I underestimate how much these representation traits are used?

I'd use the name DataClone for the new trait.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use the name DataClone for the new trait.

The only thing that I don't like about this is consistency of naming with the other data traits, especially DataMut vs. DataRawMut. I was naming traits without a Data bound DataRaw* and traits with a Data bound Data*.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. We still should use only one trait if we don't need 2, we could use the name DataRawClone and let DataClone be a deprecated reexport of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataClone could be a trait alias of DataRawClone + Data? That's a new Rust feature we can transition to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah, I had no idea trait aliases were a thing. They look very convenient.

I've added a commit deprecating DataClone. We can switch DataClone to a trait alias once they're available on stable.

src/impl_ptrs.rs Outdated Show resolved Hide resolved
src/impl_ptrs.rs Outdated Show resolved Hide resolved
@jturner314 jturner314 force-pushed the arrayptr branch 7 times, most recently from 208b4df to d2daef9 Compare November 19, 2018 07:00
@bluss
Copy link
Member

bluss commented Nov 19, 2018

I like the new names. It also puts the Raw types a bit outside of the list of important types with Array* names, so we keep in a less visible place in the docs.

@jturner314
Copy link
Member Author

jturner314 commented Nov 19, 2018

Yeah, it's nice they're sorted near the end alphabetically. Edit: I also like the method name .as_raw_view() better than .as_array_ptr().

@jturner314
Copy link
Member Author

In this PR, I've created a separate RawViewRepr type, but I could just as easily reuse the existing ViewRepr type. Any thoughts?

@bluss
Copy link
Member

bluss commented Nov 19, 2018

Seems to not matter, separate works.

/// Unsafe because, `ptr` must point inside the current storage.
unsafe fn clone_with_ptr(&self, ptr: *mut Self::Elem) -> (Self, *mut Self::Elem);
#[deprecated(note="use `Data + DataRawClone` instead", since="0.12.1")]
pub trait DataClone : Data + DataRawClone {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to decide, but it seems best to me to still keep DataClone under its current name, and let the breaking change instead be, that DataClone does not require Data. The code using DataClone on github that I've seen (random sample) mostly uses DataClone + Data<Elem=X> as bound anyway.

Ideally, we should be here. These should be internal implementation traits. Users should have better ways of being general over our arrays.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why you lean towards dropping the Data bound from DataClone instead of adding a DataRawClone trait (and keeping DataClone: Data + DataRawClone if desired). The name DataClone is a little shorter than DataRawClone, which is nice for code that needs to specify a DataClone/DataRawClone bound, but I don't see any other advantages to this approach. If the issue is primarily the length of the name, we could call the new trait RawClone instead of DataRawClone.

Fwiw, I'd typically write DataClone<Elem = X> instead of DataClone + Data<Elem = X>. This does come up more often with DataMut (example below), but I could conceivably see it happening with DataClone too.

extern crate ndarray;

use ndarray::prelude::*;
use ndarray::{array, Data, DataMut};

fn foo<S1, S2>(a: &ArrayBase<S1, Ix2>, b: &mut ArrayBase<S2, Ix2>)
where
    S1: Data<Elem = f32>,
    S2: DataMut<Elem = f32>, // I like this better than Data<Elem = f32> + DataMut
{
    *b += a;
}

fn main() {
    let a = array![[1., 2.], [3., 4.]];
    let mut b = array![[5., 6.], [7., 8.]];
    foo(&a, &mut b);
    println!("{}", b);
}

I'm against dropping the Data bound from DataClone for two reasons:

  1. The biggest issue is inconsistency with the names of other traits: all the other traits named Data* have a Data bound, while the DataRaw* traits don't.
  2. If users don't notice the change in the changelog and update their code, it would lead to safety issues for user code relying on DataClone implying Data. For example, this could happen if existing user code specifies a DataClone<Elem = A> bound on an array and dereferences pointers to elements in the array. The consequences of this could be pretty bad, but this coming up in practice does seem unlikely.

We could resolve reason 1 and maintain consistency of naming by dropping the Data bound from DataMut, DataOwned, and DataShared too. Then, we'd have something like this:

/// Array representation trait.
///
/// For an array that meets the invariants of the `ArrayBase` type. This trait
/// does not imply any ownership or lifetime; pointers to elements in the array
/// may not be safe to dereference.
///
/// ***Note:*** `RawData` is not an extension interface at this point.
/// Traits in Rust can serve many different roles. This trait is public because
/// it is used as a bound on public methods.
pub unsafe trait RawData : Sized {
    /// The array element type.
    type Elem;

    #[doc(hidden)]
    // This method is only used for debugging
    fn _data_slice(&self) -> Option<&[Self::Elem]>;

    private_decl!{}
}

/// Array representation trait.
///
/// For an array with writable elements.
///
/// ***Internal trait, see `RawData`.***
pub unsafe trait DataMut : RawData {
    /// If possible, ensures that the array has unique access to its data.
    ///
    /// If `Self` provides safe mutable access to array elements, then it
    /// **must** panic or ensure that the data is unique.
    #[doc(hidden)]
    fn try_ensure_unique<D>(&mut ArrayBase<Self, D>)
    where Self: Sized,
          D: Dimension;

    /// If possible, returns whether the array has unique access to its data.
    ///
    /// If `Self` provides safe mutable access to array elements, then it
    /// **must** return `Some(_)`.
    #[doc(hidden)]
    fn try_is_unique(&mut self) -> Option<bool>;
}

/// Array representation trait.
///
/// A representation that is a unique or shared owner of its data.
///
/// ***Internal trait, see `Data`.***
pub unsafe trait DataOwned : RawData {}

/// Array representation trait.
///
/// A representation that is a lightweight view.
///
/// ***Internal trait, see `Data`.***
pub unsafe trait DataShared : Clone + RawData + DataClone { }

/// Array representation trait.
///
/// An array representation that can be cloned.
///
/// ***Internal trait, see `RawData`.***
pub unsafe trait DataClone : RawData {
    #[doc(hidden)]
    /// Unsafe because, `ptr` must point inside the current storage.
    unsafe fn clone_with_ptr(&self, ptr: *mut Self::Elem) -> (Self, *mut Self::Elem);

    #[doc(hidden)]
    unsafe fn clone_from_with_ptr(&mut self, other: &Self, ptr: *mut Self::Elem) -> *mut Self::Elem {
        let (data, ptr) = other.clone_with_ptr(ptr);
        *self = data;
        ptr
    }
}

/// Array representation trait.
///
/// For an array with elements that can be accessed with safe code.
///
/// ***Internal trait, see `RawData`.***
//
// # For implementers
//
// If you implement both the `Data` and `DataMut` traits, you are guaranteeing
// that the `DataMut::try_ensure_unique` implementation always panics or
// ensures that the data is unique. You are also guaranteeing that
// `try_is_unique` always returns `Some(_)`.
pub unsafe trait Data : RawData {
    #[doc(hidden)]
    fn new(elements: Vec<Self::Elem>) -> Self
    where
        Self: DataOwned;

    /// Converts the data representation to a shared (copy on write)
    /// representation, without any copying.
    #[doc(hidden)]
    fn into_shared(self) -> OwnedRcRepr<Self::Elem>
    where
        Self: DataOwned;

    /// Converts the array to a uniquely owned array, cloning elements if necessary.
    #[doc(hidden)]
    fn into_owned<D>(self_: ArrayBase<Self, D>) -> ArrayBase<OwnedRepr<Self::Elem>, D>
    where
        Self::Elem: Clone,
        D: Dimension;

    /// Ensures that the array has unique access to its data.
    #[doc(hidden)]
    #[inline]
    fn ensure_unique<D>(self_: &mut ArrayBase<Self, D>)
    where
        Self: DataMut + Sized,
        D: Dimension
    {
        Self::try_ensure_unique(self_)
    }

    /// Returns whether the array has unique access to its data.
    #[doc(hidden)]
    #[inline]
    fn is_unique(&mut self) -> bool
    where
        Self: DataMut,
    {
        self.try_is_unique().unwrap()
    }
}

There are two disadvantages to this: First, the safety issue (reason 2 above) is more likely to occur with more important traits like DataMut missing the Data bound. Second, this requires explicitly writing a Data bound in more cases (e.g. DataMut + Data<Elem = A> instead of DataMut<Elem = A>) which isn't as convenient.

These should be internal implementation traits. Users should have better ways of being general over our arrays.

Treating them as internal would be fine, but leaving them public doesn't seem too bad. If we treat them as internal, we'd need to add ArrayMut, ArrayOwned, etc., traits implemented on ArrayBase to allow users to operate in a generic way over array types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, you're right :)

@bluss
Copy link
Member

bluss commented Nov 27, 2018

After reading about the stacked borrows model and trying to catch up with what the unsafe guidelines people have been working on, this seems even more important. For example down the line, let's have an NdProducer that produces raw pointer elements, that should allow us (I think?) to get around problems like accessing uninitialized elements through a mutable reference, if that becomes an issue.

I hope that unsafe code guidelines will have some room for pragmatism and adapting to existing unsafe code practices.

@jturner314
Copy link
Member Author

I'd like to rename DataRaw/DataRawMut/DataRawClone to RawData/RawDataMut/RawDataClone. (I like these names better, and this makes them more consistent with RawViewRepr/RawView/RawViewMut.) Any objection to renaming those traits?

After reading about the stacked borrows model and trying to catch up with what the unsafe guidelines people have been working on, this seems even more important. For example down the line, let's have an NdProducer that produces raw pointer elements, that should allow us (I think?) to get around problems like accessing uninitialized elements through a mutable reference, if that becomes an issue.

Yeah, we should be able to refactor a lot of functionality in terms of RawView/RawViewMut, especially iterators/producers, and it would be useful to provide iterators/producers over RawView/RawViewMut for use in unsafe code.

For uninitialized data specifically, I'm looking forward to RFC 1892, but for the time being, raw views provide a safe way to work with uninitialized data. We could even have a "raw" owned array type for use with uninitialized data until RFC 1892 is implemented, but I'd prefer to just wait for RFC 1892.

@bluss
Copy link
Member

bluss commented Nov 28, 2018

New names sound good!

We'll replace `DataClone` with a trait alias once that feature becomes
available on stable.
@jturner314 jturner314 merged commit d8078f6 into rust-ndarray:master Nov 29, 2018
@jturner314 jturner314 deleted the arrayptr branch November 29, 2018 21:40
@bluss
Copy link
Member

bluss commented Nov 29, 2018

💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants