Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More ergonomic API for constructing sockets #56

Open
jonhoo opened this issue May 9, 2017 · 13 comments
Open

More ergonomic API for constructing sockets #56

jonhoo opened this issue May 9, 2017 · 13 comments

Comments

@jonhoo
Copy link

jonhoo commented May 9, 2017

I started out wanting to be able to bind a socket to a given address (for the purposes of using a particular network interface) before connecting, and quickly discovered the standard library does not provide methods for doing so. I then found net2::TcpBuilder, which does provide this feature, but found the API somewhat less ergonomic than I would have liked.

@sfackler pointed out that RFC 1461 (after the failed RFC 1158) pulled in a couple of changes from net2 into the standard library, and @seanmonstar that net2 is a "Desired out-of-band evaluation" in the libz blitz, so I figured I'd send some feedback in the hopes that eventually something like TcpBuilder might land in the standard library too.

My suggestions follow below. I'd be happy to file a PR, but @sfackler suggested that given the backwards-incompatibility of these changes, filing a ticket first for discussion would be a good idea. Some of these build on the Builders enable construction of complex values (C-BUILDER) section from @brson's Rust API guidelines, whereas others are just because I like them better. Thoughts and feedback very welcome!

  • The methods on TcpBuilder that modify the socket should all take &mut self, and return that &mut in their Result. This is because, while it is technically true that a socket can be mutated safely from multiple places, it leaves the consumer of the API confused about whether there's a difference between the original builder and the returned one. For instance, it is not clear just from the API (nor from the resulting code) that tcp and tcp2 are the same socket, bound to the same port in the following code:

    let tcp = TcpBuilder::new_v4().unwrap();
    let tcp2 = tcp.bind("0.0.0.0:4000");

    Using &mut self is also more semantically accurate, since it indicates that we are in fact modifying the underlying socket.

  • The methods on TcpBuilder that listen, connect, or otherwise "initiate" the socket, should take self, not &self. The current API implies that you can re-use a builder to construct a second socket after calling listen or connect on it once, but this is not true. The code will panic if you try to do this, presumably because a single socket cannot be re-used. In theory, the builder could remember the configuration, and lazily create and configure the socket when needed, which would enable this kind of API, but I'm not sure that's really better. I think it's fairly rare that you'll want to re-use a socket configuration.

  • The change above brings us back to the first point about &mut self vs &self. @brson's post says that:

    Under the rubric of making easy things easy and hard things possible, all builder methods for a consuming builder should take and returned an owned self.

    Which suggests that we should in fact make all the builder methods take self. This unfortunately combines poorly with the fact that our methods can fail, and thus return Result<Self, ...>. If a method fails, you'd like to still be able to recover the builder, which is tricky if self was consumed. Maybe @brson can provide some insight here?

  • This is a more controversial one, but it might be that we should provide a SocketBuilder rather than a separate TcpBuilder and UdpBuilder. Under the hood, that's what the Berkeley Socket API provides, and it's unclear you want the distinction here either. Instead, you could imagine SocketBuilder being parameterized by UDP/TCP, which would still let us only expose the TCP related methods for TCP sockets. I personally think this would make the API more readable, and it would avoid the complete duplication of methods between the implementations. Along the same lines, I'd prefer to see V4/V6 be an enum passed to new, but I feel less strongly about that particular point.

@seanmonster pointed out that this is unlikely to break existing code, since current implementations must already only be calling the finalizing methods once (since any other use panics). Making the methods take &mut self will also likely require minor or no changes, since they are unlikely to be using the aliased & pointer anyway. Moving to self will require more changes to user code, as the original element will no longer be available, but if the regular one-line builder pattern is used with unwrap (which is unfortunately common), everything will keep working as usual. Dealing with the error cases will be trickier though, and we need a story there.

Here's the original discussion with @seanmonstar, @habnabit, and @sfackler from #rust:
jonhoo: why on earth do all the methods on TcpBuilder in net2 take &self
jonhoo: and return Result&TcpBuilder:
jonhoo: that's a really weird builder pattern
seanmonstar: jonhoo: cause you can mutate the socket without a mutable reference
jonhoo: true, but it's a strange pattern
jonhoo: what does the return value even mean?
jonhoo: for bind() for example, do I need to be using the returned value?
jonhoo: or the original?
jonhoo: or is the original somehow not modified?
sfackler: jonhoo: they're the same pointer
seanmonstar: that was because since all those operations could return an error (they're adjusting options on the socket, not something in rustland)
_habnabit: jonhoo, the return value is for chaining
seanmonstar: it wasn't desirable for you to call build() and get a EINVAL, and be left wondering "well crap, which of those methods had the invalid argument"
jonhoo: sure, but the classic way to do this is to then take `self`
sfackler: jonhoo: no standard library builders work that way iirc
jonhoo: being left with both pointers is weird, because it's not *clear* that they're the same
jonhoo: taking `self` is unambigious
_habnabit: jonhoo, how do you get the socket back out then?
seanmonstar: jonhoo: https://github.com/brson/rust-api-guidelines#builders-enable-construction-of-complex-values-c-builder
seanmonstar: jonhoo: you'd then need it to return ResultTcpStream, (io::Error, TcpStream):
jonhoo: hmm
jonhoo: I see
seanmonstar: otherwise its lost
jonhoo: right right
jonhoo: I still think I'd prefer it taking an `&mut self` even though it's not technically necessary, just to indicate that you are actually modifying the socket (and thus that you can use either the original or the returned)
jonhoo: just like in brson's pattern above
seanmonstar: its cause TcpStream is an odd duckling, and can be mutated through &self
jonhoo: mmm
seanmonstar: its safe to copy from a stream back onto itself
seanmonstar: so io::copy(&tcp, &tcp) works
jonhoo: oh, sure, I realize that, but for the builder pattern specifically it seems strange to not take &mut self even though *technically* you can
jonhoo: also, are you allowed to connect multiple times from the same builder?
jonhoo: because the current API allows that, right?
seanmonstar: nope, looks like it wil panic if tried again
jonhoo: yeah, I don't think the underlying sockets allows that
seanmonstar: i agree that the semantics of the api feel very not-normal
jonhoo: I guess *technically* the builder could allow that by lazily creating the socket on connect
jonhoo: which is the case for brson's example, and the reason his Command's build takes &self
jonhoo: but somehow it feels cleaner to me, especially for TcpBuilder, to take `self`
jonhoo: since there is just *one* socket that you're constructing
jonhoo: and then I think it follows that it should also be &mut
jonhoo: even though the syscall API allows &self
jonhoo: it'd be a pretty serious breaking change though
seanmonstar: would it though? the most common case is likely building it all at once
jonhoo: ah, you think most code actually conforms to the stricter api
jonhoo: yeah, probably
jonhoo: though it would require people to place `mut` in a couple of places if we made the methods `&mut self`
seanmonstar: i dunno, im sure you'd get push back if you did
seanmonstar: but i'd be silently rooting for you
jonhoo: haha, probably
jonhoo: thanks :p
arete: me too, I remember the API being pretty terrible =)
jonhoo: maybe I'll file a PR to revamp it and fix up the docs
jonhoo: worth filing an issue first to discuss you think?
sfackler: jonhoo: I would
sfackler: that builder is very old and barely used
sfackler: conventions around it are not super clear
sfackler: jonhoo: we moved a bunch of stuff from net2 into libstd a year or so ago, but left those builders out of it because we weren't super sure about the way they should work
jonhoo: sfackler: mmm, fair enough. okay, I'll file a ticket discussing this
@jonhoo
Copy link
Author

jonhoo commented May 9, 2017

I guess this is cc @alexcrichton too — you're everywhere in Rust land!

@alexcrichton
Copy link
Contributor

Thanks for the report! There's definitely quite a few constraints on this API, and I agree that we have yet to thread the needle quite well enough just yet.

The original intent of the decisions in this crate were:

  • Using &self instead of self was chosen in order to align with Command, allowing ergonomic configuration of a builder over time without having to consume and rebind it.
  • Using &self over &mut self was chosen because this builder doesn't actually require a mutable reference (e.g. the socket under the hood is safe to use concurrently)
  • Methods returning a Result was chosen because each stage's error could be acted to differently. For example if one address failed to bind you could choose a fallback address to bind to, but if the original socket creation failed then that may be fatal.
  • The TCP/UDP split was intended to align with libstd, and also provide a place for the extension traits for various socket options to live. That is, many socket options for TCP don't make sense for UDP and vice versa.

So given all that, I'm not personally 100% wed to the current API. I'd be totally fine moving over to &mut self as it's just downright more flexible in the long run. I would prefer to not, however, take self by value because it makes configuration over time more difficult, for example methods like this. I'm not personally a huge fan of the panic-on-double-listen semantics, but in practice it doesn't seem to cause any problems and it allows methods to ergonomic be chained if it doesn't consume self.

Given that, what do you think?

@jonhoo
Copy link
Author

jonhoo commented May 9, 2017

I agree that &mut self -> Result<&mut self> is the most flexible solution, and would be happy to make that the interface. It does directly conflict with the recommendation in C-BUILDER, but having worked with self-consuming channel senders in futures, I agree that not consuming self makes a lot of things nicer. That said, I do think that the finalizers (e.g., TcpBuilder::connect) should consume self. This would not affect chaining (since you can't chain across a finalizer anyway), and would entirely avoid the double-connect panics.

As for TCP/UDP, I'm a bit on the fence here. I agree that for established connections,, the methods you can run are different, but I don't know that this is true prior to calling listen/bind/etc., which is what the builders are for. I don't have a particularly strong opinion here though, and the current scheme is also fine (assuming it shares the underlying code eventually anyway).

So, in total, my preferred API would be (taking #45 into account):

impl<Proto> SocketBuilder<Proto> {
    fn new_v4() -> Result<Self>;
    fn new_v6() -> Result<Self>;

    fn ttl(&mut self, ttl: u32) -> Result<&mut Self>;
    fn only_v6(&mut self, only_v6: bool) -> Result<&mut Self>;
    fn reuse_address(&mut self, reuse: bool) -> Result<&mut Self>;

    fn local_addr(&self) -> Result<SocketAddr>;
    fn get_reuse_address(&self) -> Result<bool>;
    fn take_error(&self) -> Result<Option<Error>>;
}
impl SocketBuilder<Tcp> {
    fn bind(&mut self, addr: SocketAddr) -> Result<&mut Self>;
    fn listen(self, backlog: i32) -> Result<TcpListener>;
    fn connect<T>(&self, addr: SocketAddr) -> Result<TcpStream>;    
}
impl SocketBuilder<Udp> {
    fn bind(self, addr: SocketAddr) -> Result<UdpSocket>;
}

I'm not entirely sure what purpose the to_tcp_* methods serve, but wonder if they should be implemented using TryInto instead?

It's sort of weird that bind has very different semantics for UDP vs TCP sockets. For TCP, it just binds the address and returns the builder, whereas for UDP you no longer have access to the builder after calling bind(). I'd suggest making bind return Result<&mut Self> for both, and then have a separate build(self) method for SocketBuilder<Udp>.

I'm also not sure about the naming of some of these methods:

  • take_error seems strange; does it acually "take" anything?
  • ttl and only_v6 seem like they should have a set_ prefix?
  • get_reuse_address vs local_addr should be consistent about prefix

@alexcrichton
Copy link
Contributor

Oh so one point with consumption as a final step (but &mut self in the middle) is that then chaining no longer works in the case of:

TcpBuilder::new_v4()?.bind("...")?.listen(n)

The return value of bind is &mut TcpBuilder where listen would then require a move of the internal contents :(.

In general we've avoided type state in the past, and I'd be hesitant to use it here. In general while it does deduplicate the implementation a little bit I've found it doesn't tend to help consumers (e.g. why would a consumer be generic over SocketBuilder<T>?). Do you think there's a use case for consumers though?

The to_tcp_* methods are currently mostly just there to enable more fine-grained construction, such as when operations have been manually performed or will be continued out of band, but the previous configuration is needed.

One thing that I've personally concluded over time is that using a builder basically just isn't worth it. Instead I feel like we should just have struct Socket and be done with it. It has tons of methods for everything and then you do everything manually basically. Essentially this crate would just become a "rusty" socket API.

All of my own personal previous attempts at least have ended up being unergonomic in one way or another in some situation. Having just a straight up Socket, however, would fix this as the expectation level would just be different for the level of ergonomics :)

Out of curiosity, what do you think of that?

@jonhoo
Copy link
Author

jonhoo commented May 10, 2017

I think that's a great idea. In a way, that's what the API above starts to approximate (including being protocol-agnostic) if you remove the builder parts. I like it. You could arguably also get rid of the <Proto> generic, and instead just panic if you try to connect or listen on a UDP socket?

@alexcrichton
Copy link
Contributor

Heh so I actually intended on doing this a long time ago, and finally got around to it! I haven't documented much but do the APIs here seem reasonable?

@jonhoo
Copy link
Author

jonhoo commented May 10, 2017

That looks pretty good! I think I'd also like to see a way to convert into TcpStream, TcpListener, and UdpSocket for interoperability, but having the full low-level interface is nice.

@alexcrichton
Copy link
Contributor

Oh bah looks like they didn't show up in the docs but the conversions are the via into

@njaard
Copy link

njaard commented Oct 7, 2019

Can I poke the authors to take a look at this again? The lack of backlog on UnixListener is a serious deficiency as it keeps one from using std with services that get more than minimal throughput.

@sfackler
Copy link
Contributor

sfackler commented Oct 7, 2019

@njaard I would recommend using socket2: https://docs.rs/socket2/0.3.11/socket2/

@njaard
Copy link

njaard commented Oct 7, 2019

Thanks for the fast response! I already found it, but since I'm using tokio I also have to propagate that through mio! It's disappointing that backlog was forgotten throughout the entire ecosystem.

@sfackler
Copy link
Contributor

sfackler commented Oct 7, 2019

You can turn a socket2 Socket into a std UnixStream and a std UnixStream into a tokio UnixStream.

@njaard
Copy link

njaard commented Oct 7, 2019

For anyone else who may stumble upon this: The following definitely changes the socket backlog to 4096:

    let path: &str = ...;
    use socket2::*;
    let socket = Socket::new(Domain::unix(), Type::stream(), None).unwrap();
    socket.bind(&SockAddr::unix(path).unwrap()).unwrap();
    socket.listen(4096).unwrap();
    let socket = socket.into_unix_listener();
    let incoming = tokio_uds::UnixListener::from_std(socket, &tokio_reactor::Handle::default())
      .unwrap()
      .incoming();

    let s = hyper::Server::builder(incoming)
      .serve(new_service)
      .map_err(|e| eprintln!("error: {}", e));

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants