Just pass `&[&str]` into custom storage providers. The scopeset struct has a range of unnecessary internal features.
It is now also part of the interface contract for custom storage providers that the given scopes will be both unique and sorted.
The only slightly awkward part is that there's no convenient way to expose a `scopes_covered_by` helper method (which almost all custom storage engines will need), but it is still included in the example code.
Instead, suggest using interior mutability (an `RwLock` in the example) to manage storage of token state. This makes it easier to share authenticators between threads.
Allow users to build their own token storage system by implementing the `TokenStorage` trait. This opens the door to more secure storage mechanisms like OS keychains, encrypted files, or secret-management tools.
Custom storage providers are boxed to avoid adding more generics to the API; the indirection cost applies only when a custom store is used.
I've added `anyhow` to allow easy handling of a wide range of errors from custom storage providers.
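A minimal sketch of such a provider, assuming a trait shaped roughly like the one below (the crate's real trait may be async and its exact signature may differ); `ScopedToken` is a hypothetical stand-in for the stored token type:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Clone)]
struct ScopedToken {
    access_token: String,
}

trait TokenStorage: Send + Sync {
    // The interface guarantees the given scopes are unique and sorted.
    fn set(&self, scopes: &[&str], token: ScopedToken) -> anyhow::Result<()>;
    fn get(&self, scopes: &[&str]) -> anyhow::Result<Option<ScopedToken>>;
}

struct MemoryStorage {
    // Interior mutability lets the methods take `&self`, so the
    // authenticator holding this store can be shared between threads.
    tokens: RwLock<HashMap<Vec<String>, ScopedToken>>,
}

impl TokenStorage for MemoryStorage {
    fn set(&self, scopes: &[&str], token: ScopedToken) -> anyhow::Result<()> {
        let key: Vec<String> = scopes.iter().map(|s| s.to_string()).collect();
        self.tokens
            .write()
            .map_err(|_| anyhow::anyhow!("lock poisoned"))?
            .insert(key, token);
        Ok(())
    }

    fn get(&self, scopes: &[&str]) -> anyhow::Result<Option<ScopedToken>> {
        let key: Vec<String> = scopes.iter().map(|s| s.to_string()).collect();
        Ok(self
            .tokens
            .read()
            .map_err(|_| anyhow::anyhow!("lock poisoned"))?
            .get(&key)
            .cloned())
    }
}
```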
It could be helpful when troubleshooting issues with various providers if the user is able to turn on debug logging. The most important logging provided covers the requests sent to and the responses received from the OAuth servers.
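For example, assuming the crate emits through the `log` facade, an application might enable it like this (`env_logger` is just one possible backend):

```rust
fn main() {
    // Run the binary with `RUST_LOG=debug` to surface the OAuth
    // request/response logging described above.
    env_logger::init();
    log::debug!("debug logging is enabled");
}
```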
What was previously called Token is now TokenInfo and is merely an
internal implementation detail. The publicly visible type is now called
AccessToken and differs from TokenInfo by not including the refresh
token. This makes it a smaller type for users to pass around and reduces the ways a refresh token can be leaked. Since the Authenticator is responsible for refreshing tokens, there isn't any reason for users to concern themselves with refresh tokens.
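The field layout below is illustrative rather than the crate's exact definition; it just shows the split:

```rust
// Internal type: keeps the refresh token so the Authenticator can refresh.
struct TokenInfo {
    access_token: String,
    refresh_token: Option<String>,
    expires_at: Option<std::time::SystemTime>,
}

// Public type handed to users: smaller to pass around, and a refresh
// token can never leak through it.
struct AccessToken {
    access_token: String,
    expires_at: Option<std::time::SystemTime>,
}

impl From<TokenInfo> for AccessToken {
    fn from(info: TokenInfo) -> Self {
        AccessToken {
            access_token: info.access_token,
            expires_at: info.expires_at,
        }
    }
}
```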
Deferring disk writes is still probably a good idea, but unfortunately there are some tradeoffs with Rust's async story that make it non-ideal. Ideally we would defer writes, but have a Drop impl on DiskStorage that waited for all the deferred writes to complete. While it's trivial to create a future that waits for all deferred writes to finish, it's not currently possible to write a Drop impl that waits on a future. It would be possible to write an inherent async fn that takes self by value and waits for the writes, but that method would need to be propagated all the way up to users of the library, and they would need to remember to invoke it before dropping the Authenticator.
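A sketch of the inherent-async-fn workaround just described, with hypothetical names and tokio assumed:

```rust
struct DiskStorage {
    pending_writes: Vec<tokio::task::JoinHandle<()>>,
}

impl DiskStorage {
    // Takes `self` by value so no further writes can be queued. Because
    // Drop can't await, every caller up the stack would have to expose
    // and remember to call something like this before dropping.
    async fn close(self) {
        for write in self.pending_writes {
            let _ = write.await;
        }
    }
}
```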
Keeping the same tokens in both a Vec and a BTreeMap created more overhead than was warranted. It makes much more sense to simply iterate over the BTreeMap than to keep a separate Vec.
This keeps DiskStorage Sync + Send and therefore Authenticator Sync + Send. The DiskStorage was thread-safe because JSONTokens contains a Mutex around all the Rc<RefCell<T>> objects, but there's no way to prove to the type system that none of the Rcs get cloned to an alias used outside the Mutex, so it's not provably safe. I'll probably reevaluate the design here, but in the meantime the double locking is fine.
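A small demonstration of why the compiler rejects the old layout:

```rust
use std::sync::{Arc, Mutex};

fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    // Compiles: a Mutex of a Send type is Send + Sync, and Arc shares it.
    assert_send_sync::<Arc<Mutex<String>>>();
    // Fails to compile if uncommented: Rc is !Send, and the type system
    // can't rule out a clone of the Rc escaping the Mutex.
    // assert_send_sync::<Mutex<std::rc::Rc<std::cell::RefCell<String>>>>();
}
```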
When bloom filters were added, the BTreeMap values changed to a vector of tokens to accommodate the possibility of bloom filter collisions. The implementation naively pushed new tokens onto the vec even when they were replacing previous tokens, meaning old tokens were kept around even after a refresh had replaced them. To fix this efficiently, the storage layer now tracks both a hash value and a bloom filter along with each token. There is a map keyed by hash for every token that points to a reference-counted version of the token, and each token also exists in a separate vector. Updates to existing tokens happen in place; new entries are added to both data structures.
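An illustrative shape for this layout (names hypothetical, with a String standing in for the real token type): a hash-keyed map for exact lookups and a vector for bloom-filter scans, both pointing at the same reference-counted slot so refreshes replace the token instead of piling up stale copies.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

struct Entry {
    bloom_filter: u64,
    token: RefCell<String>,
}

struct Storage {
    by_hash: HashMap<u64, Rc<Entry>>,
    all: Vec<Rc<Entry>>,
}

impl Storage {
    fn set(&mut self, hash: u64, bloom_filter: u64, token: String) {
        if let Some(entry) = self.by_hash.get(&hash) {
            // Same scope set as before: update in place; the vector sees
            // the new token through the shared Rc.
            *entry.token.borrow_mut() = token;
        } else {
            // New scope set: insert into both structures.
            let entry = Rc::new(Entry {
                bloom_filter,
                token: RefCell::new(token),
            });
            self.by_hash.insert(hash, Rc::clone(&entry));
            self.all.push(entry);
        }
    }
}
```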
Prior to this change, DeviceFlow and InstalledFlow were used within Authenticator, while ServiceAccountAccess was used on its own. AFAICT this was the case because ServiceAccountAccess never used refresh tokens and Authenticator assumed all tokens contained refresh tokens. Authenticator was recently modified to handle the case where a token does not contain a refresh token, so I don't see any reason to keep the service account access separate anymore. Folding it into the Authenticator provides a nice consistent interface, and the service account implementation no longer needs to provide its own caching since that is now handled by Authenticator.
Each token is stored along with a 64-bit bloom filter created from the set of scopes associated with that token. When retrieving tokens for a set of scopes, a new bloom filter is calculated for the requested scopes and compared to the filters of all previously fetched scope sets. The bloom filter allows efficiently skipping entries that are definitely not a superset.
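A sketch of the filter construction and the superset test; the per-scope hash shown here is illustrative, not necessarily the one the crate uses:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Each scope sets one bit of a 64-bit filter, derived from its hash.
fn scope_filter<'a>(scopes: impl IntoIterator<Item = &'a str>) -> u64 {
    scopes
        .into_iter()
        .map(|scope| {
            let mut hasher = DefaultHasher::new();
            scope.hash(&mut hasher);
            1u64 << (hasher.finish() % 64)
        })
        .fold(0, |filter, bit| filter | bit)
}

// A stored entry can only be a superset of the request if its filter
// contains every bit of the requested filter. False means "definitely
// not a superset"; true still requires an exact scope check.
fn might_be_superset(stored: u64, requested: u64) -> bool {
    stored & requested == requested
}
```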
Seahash is a stable hash, but there isn't any value in serializing its output. Instead, calculate the hash when deserializing and serialize only the scopes and tokens. This provides the flexibility to change the hash function in the future without breaking the on-disk format.
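A sketch of the idea, with hypothetical type names: only the scopes and token survive serialization, and the hash is rebuilt at load time.

```rust
use serde::{Deserialize, Serialize};

// On-disk form: only the data worth persisting.
#[derive(Serialize, Deserialize)]
struct StoredEntry {
    scopes: Vec<String>,
    token: String,
}

// In-memory form: carries the derived hash, recomputed at load time.
struct CachedEntry {
    scopes: Vec<String>,
    token: String,
    hash: u64,
}

impl From<StoredEntry> for CachedEntry {
    fn from(stored: StoredEntry) -> Self {
        // XOR-combine per-scope seahash values (the order-independent
        // scheme described elsewhere in these notes).
        let hash = stored
            .scopes
            .iter()
            .fold(0u64, |acc, s| acc ^ seahash::hash(s.as_bytes()));
        CachedEntry {
            scopes: stored.scopes,
            token: stored.token,
            hash,
        }
    }
}
```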
Use a BTreeMap to key the tokens by hash value. On retrieval, first look for a matching hash value and return the token if it exists. Only if it does not exist does retrieval fall back to subset matching. This makes the common case, where an application uses a consistent set of scopes, more efficient without detrimentally impacting the less common cases.
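A sketch of the two-step retrieval (names hypothetical; the subset test here is a naive stand-in for the real matching):

```rust
use std::collections::BTreeMap;

struct TokenEntry {
    scopes: Vec<String>,
    token: String,
}

fn lookup<'a>(
    tokens: &'a BTreeMap<u64, TokenEntry>,
    hash: u64,
    scopes: &[&str],
) -> Option<&'a str> {
    // Common case: the application asked for exactly this scope set before.
    if let Some(entry) = tokens.get(&hash) {
        return Some(entry.token.as_str());
    }
    // Fallback: any stored entry whose scopes are a superset of the request.
    tokens
        .values()
        .find(|entry| {
            scopes
                .iter()
                .all(|s| entry.scopes.iter().any(|e| e.as_str() == *s))
        })
        .map(|entry| entry.token.as_str())
}
```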
DefaultHasher is not documented as being consistent. It's best not to trust that the resulting hash value is consistent even across different executions of the same binary, let alone across different versions.
This is already the case when writing a token file. Presumably the only reason it was an Option was backwards compatibility, but we're already breaking compatibility with the change to the hash value, so this seems like an appropriate time to make the change. This change also highlights how little the hash value has been used previously. Future changes plan to use the hash value for more efficient handling.
No caller ever provided a None value. Presumably a None value should have deleted the token, but it didn't, and that would be expressed more clearly with a remove or delete method.
These previously accepted a hash and scopes. The hash was required to be a hash of the provided scopes, but that wasn't enforced by the compiler. We now have the compiler enforce it by creating a HashedScopes type that ties the scopes and the hash together, and pass that into the storage methods.
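A sketch of the pattern: private fields plus a single constructor mean a HashedScopes can only ever hold a hash that matches its scopes. The hash computation shown is illustrative (the XOR-of-per-scope-hashes scheme described in a later note).

```rust
pub struct HashedScopes<'a> {
    hash: u64,
    scopes: &'a [&'a str],
}

impl<'a> HashedScopes<'a> {
    // The only way to build one, so hash and scopes can't drift apart.
    pub fn new(scopes: &'a [&'a str]) -> Self {
        let hash = scopes
            .iter()
            .fold(0u64, |acc, s| acc ^ seahash::hash(s.as_bytes()));
        HashedScopes { hash, scopes }
    }

    pub fn hash(&self) -> u64 {
        self.hash
    }

    pub fn scopes(&self) -> &[&str] {
        self.scopes
    }
}
```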
No more need to macro_use serde. Order the imports consistently (albeit somewhat arbitrarily), starting with items from this crate, followed by std, followed by external crates.
The current code uses standard blocking I/O operations (std::fs::*). This is problematic, as it would block the entire futures executor while waiting for I/O.
This change is a major refactoring to make the token storage mechanism friendly to async I/O. The first major decision was to abandon the GetToken trait. The trait was only implemented internally and there was no mechanism for users to provide their own; since async fns are not currently supported in trait impls, keeping the trait would have required boxing futures. That probably would have been fine, but seemed unnecessary. Instead of a trait, the storage mechanism is just an enum with a choice between Memory and Disk storage.
The DiskStorage works primarily as it did before, rewriting the entire contents of the file on every set() invocation. The only difference is that the actual writing is now deferred to a separate task so that it does not block the return of the Token to the user. If disk I/O is too slow to keep up with the rate of incoming writes, it will push back and eventually block the return of tokens; this prevents a buildup of in-flight requests. One major drawback to this approach is that any errors that happen on write are simply logged, and no delegate function is invoked on error, because the delegate no longer has the ability to say sleep, retry, etc.
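A sketch of the deferred-write scheme, assuming tokio (the crate's actual task and channel plumbing may differ); `spawn_writer` is a hypothetical helper. A bounded channel provides the push-back: once the writer task falls behind, `send` waits, delaying the return of tokens instead of buffering unboundedly.

```rust
use tokio::sync::mpsc;

fn spawn_writer(path: std::path::PathBuf) -> mpsc::Sender<String> {
    // Small buffer so slow disk I/O pushes back on callers quickly.
    let (tx, mut rx) = mpsc::channel::<String>(2);
    tokio::spawn(async move {
        while let Some(contents) = rx.recv().await {
            // Rewrite the whole file per set(); errors are only logged,
            // since no delegate can intervene at this point.
            if let Err(e) = tokio::fs::write(&path, contents).await {
                log::error!("failed to persist tokens: {}", e);
            }
        }
    });
    tx
}
```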
Along with the public-facing change, the implementation has been modified to no longer clone the scopes, instead using the pointer to the scopes the user provided. This greatly reduces the number of allocations on each token() call.
Note that this also changes the hashing method used for token storage in a way that is incompatible with the previous implementation. The previous implementation pre-sorted the vector and hashed the contents to make the result independent of the ordering of the scopes. We now instead combine the hash values of each scope with XOR, producing a hash value that does not depend on order without needing to allocate and sort another vector.
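The combine step, sketched with seahash standing in for whichever per-scope hash the crate uses:

```rust
fn hash_scopes(scopes: &[&str]) -> u64 {
    // XOR is commutative and associative, so the result is independent of
    // order with no sorted copy of the input required. This relies on the
    // scopes being unique: a duplicated scope would cancel itself out.
    scopes
        .iter()
        .fold(0u64, |acc, scope| acc ^ seahash::hash(scope.as_bytes()))
}

fn main() {
    assert_eq!(hash_scopes(&["a", "b"]), hash_scopes(&["b", "a"]));
}
```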
This requires enforcing that errors returned from TokenStorage implementations are Send + Sync, which the ones in this crate are, but it is a breaking change because external implementations may currently return errors that are !Sync.
The motivation for this change is that Box<dyn Error + Send> is not as fully supported within the Rust stdlib as Box<dyn Error + Send + Sync>. In particular, these two From impls exist:
impl<'a, E: Error + 'a> From<E> for Box<dyn Error + 'a>
impl<'a, E: Error + Send + Sync + 'a> From<E> for Box<dyn Error + Send + Sync + 'a>
but no corresponding impl for
impl<'a, E: Error + Send + 'a> From<E> for Box<dyn Error + Send + 'a>
This may just be an oversight in the Rust stdlib that could be fixed, but in practice it means that dealing with 'Error + Send' types is not very ergonomic, because the '?' operator can't be used to convert from a Box<dyn Error + Send> to a Box<dyn Error>. Since the current implementations (not counting any external ones that may exist) implement Sync, this seems like a good tradeoff to make the type a little more ergonomic to use.
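For instance, the gap shows up as soon as `?` is used (file name hypothetical):

```rust
use std::error::Error;

fn read_config() -> Result<String, Box<dyn Error + Send + Sync>> {
    // io::Error converts automatically via the Send + Sync From impl.
    let contents = std::fs::read_to_string("config.toml")?;
    Ok(contents)
}

// fn read_config_send_only() -> Result<String, Box<dyn Error + Send>> {
//     // Fails to compile: no From<io::Error> for Box<dyn Error + Send>.
//     let contents = std::fs::read_to_string("config.toml")?;
//     Ok(contents)
// }
```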
Change it to accept an iterator of items that can be converted to
`String`s rather than an iterator of items that can be referenced as
`&str`s.
Primarily this allows it to be called with a larger variety of inputs. For example, ::std::env::args().skip(1) can now be passed directly to token, where before it would first need to be collected into a vector. Since all implementations unconditionally collected the iterator into a vector anyway, this shouldn't have any negative impact on performance and should actually reduce the number of allocations in some uses. It also simplifies the signature, since the lifetime bounds are no longer required.
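A sketch of the new shape (parameter list simplified; the real method takes more arguments):

```rust
fn token<I, T>(scopes: I) -> Vec<String>
where
    I: IntoIterator<Item = T>,
    T: Into<String>,
{
    // Every implementation collected into a Vec anyway; owned Strings
    // are now moved in instead of being re-allocated.
    scopes.into_iter().map(Into::into).collect()
}

fn main() {
    // std::env::args() yields owned Strings and can be passed directly.
    let _from_args = token(std::env::args().skip(1));
    let _from_strs = token(["scope-a", "scope-b"]);
}
```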
Use the power of the `AsRef` trait to take generic parameters for
several API functions. This makes the API more ergonomic because the
callers may pass in static `str` slices or references to owned `String`s
or even more exotic things like a `Cow`, all based on their particular
situation.
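A sketch of the pattern with a hypothetical function:

```rust
use std::borrow::Cow;

// One generic parameter accepts &str, String, &String, Cow<str>, etc.
fn set_user_agent<S: AsRef<str>>(agent: S) -> String {
    format!("client/{}", agent.as_ref())
}

fn main() {
    set_user_agent("static-str");
    set_user_agent(String::from("owned"));
    set_user_agent(&String::from("borrowed"));
    set_user_agent(Cow::from("cow"));
}
```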
Update the tests and examples to use the most natural types they have
available.
Fixes #77. No existing code should break, as `&String` implements both `AsRef<str>` and `AsRef<Path>`.