* Make client::InFlightRequests generic over result.
Previously, InFlightRequests required the client response type to be a
server response. However, this prevented injection of non-server
responses: for example, if the client fails to send a request, it should
complete the request with an IO error rather than a server error.
* Gracefully handle client-side send errors.
Previously, a client channel would immediately disconnect when
encountering an error in Transport::try_send. One kind of error that can
occur in try_send is message validation, e.g. validating a message is
not larger than a configured frame size. The problem with shutting down
the client immediately is that debuggability suffers: it can be hard to
understand what caused the client to fail. Also, these errors are not
always fatal, as with frame size limits, so complete shutdown was
extreme.
By bubbling up errors, it's now possible for the caller to
programmatically handle them. For example, the error could be walked
via anyhow::Error:
```
2023-01-10T02:49:32.528939Z WARN client: the client failed to send the request
Caused by:
0: could not write to the transport
1: frame size too big
```
* Some follow-up work: right now, read errors will bubble up to all pending RPCs. However, on the write side, only `start_send` bubbles up. `poll_ready`, `poll_flush`, and `poll_close` do not propagate back to pending RPCs. This is probably okay in most circumstances, because fatal write errors likely coincide with fatal read errors, which *do* propagate back to clients. But it might still be worth unifying this logic.
---------
Co-authored-by: Tim Kuehn <tikue@google.com>
In the interest of the user's attention, some ancillary APIs have been
moved to new submodules:
- server::limits contains what was previously called Throttler and
ChannelFilter. Both of those names were very generic, when the methods
applied by these types were very specific (and also simplistic). Renames
have occurred:
- ThrottlerStream => MaxRequestsPerChannel
- Throttler => MaxRequests
- ChannelFilter => MaxChannelsPerKey
- server::incoming contains the Incoming trait.
- server::tokio contains the tokio-specific helper types.
The 5 structs and 1 enum remaining in the base server module are all
core to the functioning of the server.
tarpc is now instrumented with tracing primitives extended with
OpenTelemetry traces. Using a compatible tracing-opentelemetry
subscriber like Jaeger, each RPC can be traced through the client,
server, amd other dependencies downstream of the server. Even for
applications not connected to a distributed tracing collector, the
instrumentation can also be ingested by regular loggers like env_logger.
# Breaking Changes
## Logging
Logged events are now structured using tracing. For applications using a
logger and not a tracing subscriber, these logs may look different or
contain information in a less consumable manner. The easiest solution is
to add a tracing subscriber that logs to stdout, such as
tracing_subscriber::fmt.
## Context
- Context no longer has parent_span, which was actually never needed,
because the context sent in an RPC is inherently the parent context.
For purposes of distributed tracing, the client side of the RPC has all
necessary information to link the span to its parent; the server side
need do nothing more than export the (trace ID, span ID) tuple.
- Context has a new field, SamplingDecision, which has two variants,
Sampled and Unsampled. This field can be used by downstream systems to
determine whether a trace needs to be exported. If the parent span is
sampled, the expectation is that all child spans be exported, as well;
to do otherwise could result in lossy traces being exported. Note that
if an Openetelemetry tracing subscriber is not installed, the fallback
context will still be used, but the Context's sampling decision will
always be inherited by the parent Context's sampling decision.
- Context::scope has been removed. Context propagation is now done via
tracing's task-local spans. Spans can be propagated across tasks via
Span::in_scope. When a service receives a request, it attaches an
Opentelemetry context to the local Span created before request handling,
and this context contains the request deadline. This span-local deadline
is retrieved by Context::current, but it cannot be modified so that
future Context::current calls contain a different deadline. However, the
deadline in the context passed into an RPC call will override it, so
users can retrieve the current context and then modify the deadline
field, as has been historically possible.
- Context propgation precedence changes: when an RPC is initiated, the
current Span's Opentelemetry context takes precedence over the trace
context passed into the RPC method. If there is no current Span, then
the trace context argument is used as it has been historically. Note
that Opentelemetry context propagation requires an Opentelemetry
tracing subscriber to be installed.
## Server
- The server::Channel trait now has an additional required associated
type and method which returns the underlying transport. This makes it
more ergonomic for users to retrieve transport-specific information,
like IP Address. BaseChannel implements Channel::transport by returning
the underlying transport, and channel decorators like Throttler just
delegate to the Channel::transport method of the wrapped channel.
# References
[1] https://github.com/tokio-rs/tracing
[2] https://opentelemetry.io
[3] https://github.com/open-telemetry/opentelemetry-rust/tree/main/opentelemetry-jaeger
[4] https://github.com/env-logger-rs/env_logger
This required the breaking change of removing the Client trait. The
intent of the Client trait was to facilitate the decorator pattern by
allowing users to create their own Clients that added behavior on top of
the base client. Unfortunately, this trait had become a maintenance
burden, consistently causing issues with lifetimes and the lack of
generic associated types. Specifically, it meant that Client impls could
not use async fns, which is no longer tenable today.
1. Renames
Some of the items in this module were renamed to be less generic:
- Handler => Incoming
- ClientHandler => Requests
- ResponseHandler => InFlightRequest
- Channel::{respond_with => requests}
In the case of Handler: handler of *what*? Now it's a bit clearer that
this is a stream of Channels (aka *incoming* connections).
Similarly, ClientHandler was a stream of requests over a single
connection. Hopefully Requests better reflects that.
ResponseHandler was renamed InFlightRequest because it no longer
contains the serving function. Instead, it is just the request, plus
the response channel and an abort hook. As a result of this,
Channel::respond_with underwent a big change: it used to take the
serving function and return a ClientHandler; now it has been renamed
Channel::requests and does not take any args.
2. Execute methods
All methods thats actually result in responses being generated
have been consolidated into methods named `execute`:
- InFlightRequest::execute returns a future that completes when a
response has been generated and sent to the server Channel.
- Requests::execute automatically spawns response handlers for all
requests over a single channel.
- Channel::execute is a convenience for `channel.requests().execute()`.
- Incoming::execute automatically spawns response handlers for all
requests over all channels.
3. Removal of Server.
server::Server was removed, as it provided no value over the Incoming/Channel
abstractions. Additionally, server::new was removed, since it just
returned a Server.
I don't know what the intention was behind using u32::MAX + 1 but since the
argument's type is usize this is the only giant value that makes sense to me.
Instead, serde_transport::tcp::connect returns a future named Connect
that has methods to directly access the framing config. This is
consistent with how serde_transport::tcp::listen returns a future with
methods to access the framing config. In addition to this consistency,
it reduces the API surface and provides a simpler user transition from
"zero config" to "some config".
This PR obsoletes the JSON and Bincode transports and instead introduces a unified transport that
is generic over any tokio-serde serialization format as well as AsyncRead + AsyncWrite medium.
This comes with a slight hit for usability (having to manually specify the underlying transport
and codec), but it can be alleviated by making custom freestanding connect and listen fns.
As part of this, I made an optional tokio feature which, when enabled,
adds utility functions that spawn on the default tokio executor. This
allows for the removal of the runtime crate.
On the one hand, this makes the spawning utils slightly less generic. On
the other hand:
- The fns are just helpers and are easily rewritten by the user.
- Tokio is the clear dominant futures executor, so most people will just
use these versions.
Send + 'static was baked in to make it possible to spawn futures onto
the default executor. We can accomplish the same thing by offering
helper fns that do the spawning while not requiring it for the rest of
the functionality.
Fixes https://github.com/google/tarpc/issues/212
- fn serve -> Service::serve
- fn new_stub -> Client::new
This allows the generated function names to remain consistent across
service definitions while preventing collisions.
-- Connection Limits
The problem with having ConnectionFilter default-enabled is elaborated on in https://github.com/google/tarpc/issues/217. The gist of it is not all servers want a policy based on `SocketAddr`. This PR allows customizing the behavior of ConnectionFilter, at the cost of not having it enabled by default. However, enabling it is as simple as one line:
incoming.max_channels_per_key(10, ip_addr)
The second argument is a key function that takes the user-chosen transport and returns some hashable, equatable, cloneable key. In the above example, it returns an `IpAddr`.
This also allows the `Transport` trait to have the addr fns removed, which means it has become simply an alias for `Stream + Sink`.
-- Per-Channel Request Throttling
With respect to Channel's throttling behavior, the same argument applies. There isn't a one size fits all solution to throttling requests, and the policy applied by tarpc is just one of potentially many solutions. As such, `Channel` is now a trait that offers a few combinators, one of which is throttling:
channel.max_concurrent_requests(10).respond_with(serve(Server))
This functionality is also available on the existing `Handler` trait, which applies it to all incoming channels and can be used in tandem with connection limits:
incoming
.max_channels_per_key(10, ip_addr)
.max_concurrent_requests_per_channel(10).respond_with(serve(Server))
-- Global Request Throttling
I've entirely removed the overall request limit enforced across all channels. This functionality is easily gotten back via [`StreamExt::buffer_unordered`](https://rust-lang-nursery.github.io/futures-api-docs/0.3.0-alpha.1/futures/stream/trait.StreamExt.html#method.buffer_unordered), with the difference being that the previous behavior allowed you to spawn channels onto different threads, whereas `buffer_unordered ` means the `Channels` are handled on a single thread (the per-request handlers are still spawned). Considering the existing options, I don't believe that the benefit provided by this functionality held its own.
# Changes
## Client is now a trait
And `Channel<Req, Resp>` implements `Client<Req, Resp>`. Previously, `Client<Req, Resp>` was a thin wrapper around `Channel<Req, Resp>`.
This was changed to allow for mapping the request and response types. For example, you can take a `channel: Channel<Req, Resp>` and do:
```rust
channel
.with_request(|req: Req2| -> Req { ... })
.map_response(|resp: Resp| -> Resp2 { ... })
```
...which returns a type that implements `Client<Req2, Resp2>`.
### Why would you want to map request and response types?
The main benefit of this is that it enables creating different client types backed by the same channel. For example, you could run multiple clients multiplexing requests over a single `TcpStream`. I have a demo in `tarpc/examples/service_registry.rs` showing how you might do this with a bincode transport. I am considering factoring out the service registry portion of that to an actual library, because it's doing pretty cool stuff. For this PR, though, it'll just be part of the example.
## Client::new is now client::new
This is pretty minor, but necessary because async fns can't currently exist on traits. I changed `Server::new` to match this as well.
## Macro-generated Clients are generic over the backing Client.
This is a natural consequence of the above change. However, it is transparent to the user by keeping `Channel<Req, Resp>` as the default type for the `<C: Client>` type parameter. `new_stub` returns `Client<Channel<Req, Resp>>`, and other clients can be created via the `From` trait.
## example-service/ now has two binaries, one for client and one for server.
This serves as a "realistic" example of how one might set up a service. The other examples all run the client and server in the same binary, which isn't realistic in distributed systems use cases.
## `service!` trait fns take self by value.
Services are already cloned per request, so this just passes on that flexibility to the trait implementers.
# Open Questions
In the service registry example, multiple services are running on a single port, and thus multiple clients are sending requests over a single `TcpStream`. This has implications for throttling: [`max_in_flight_requests_per_connection`](https://github.com/google/tarpc/blob/master/rpc/src/server/mod.rs#L57-L60) will set a maximum for the sum of requests for all clients sharing a single connection. I think this is reasonable behavior, but users may expect this setting to act like `max_in_flight_requests_per_client`.
Fixes#103#153#205
# New Crates
- crate rpc contains the core client/server request-response framework, as well as a transport trait.
- crate bincode-transport implements a transport that works almost exactly as tarpc works today (not to say it's wire-compatible).
- crate trace has some foundational types for tracing. This isn't really fleshed out yet, but it's useful for in-process log tracing, at least.
All crates are now at the top level. e.g. tarpc-plugins is now tarpc/plugins rather than tarpc/src/plugins. tarpc itself is now a *very* small code surface, as most functionality has been moved into the other more granular crates.
# New Features
- deadlines: all requests specify a deadline, and a server will stop processing a response when past its deadline.
- client cancellation propagation: when a client drops a request, the client sends a message to the server informing it to cancel its response. This means cancellations can propagate across multiple server hops.
- trace context stuff as mentioned above
- more server configuration for total connection limits, per-connection request limits, etc.
# Removals
- no more shutdown handle. I left it out for now because of time and not being sure what the right solution is.
- all async now, no blocking stub or server interface. This helps with maintainability, and async/await makes async code much more usable. The service trait is thusly renamed Service, and the client is renamed Client.
- no built-in transport. Tarpc is now transport agnostic (see bincode-transport for transitioning existing uses).
- going along with the previous bullet, no preferred transport means no TLS support at this time. We could make a tls transport or make bincode-transport compatible with TLS.
- a lot of examples were removed because I couldn't keep up with maintaining all of them. Hopefully the ones I kept are still illustrative.
- no more plugins!
# Open Questions
1. Should client.send() return `Future<Response>` or `Future<Future<Response>>`? The former appears more ergonomic but it doesn’t allow concurrent requests with a single client handle. The latter is less ergonomic but yields back control of the client once it’s successfully sent out the request. Should we offer fns for both?
2. Should rpc service! Fns take &mut self or &self or self? The service needs to impl Clone anyway, technically we only need to clone it once per connection, and then leave it up to the user to decide if they want to clone it per RPC. In practice, everyone doing nontrivial stuff will need to clone it per RPC, I think.
3. Do the request/response structs look ok?
4. Is supporting server shutdown/lameduck important?
Fixes#178#155#124#104#83#38