Maximizing Authentication Performance

This video is number fourteen of the Zero Trust Authentication Master Class series.  


I'm Jasson Casey and this is our section on scale-up. And specifically, we're going to focus on throughput and availability. So, we talked about scale-up in terms of latency, how we organize around the world, and we left our building block at the region. 

So, now we want to kind of expand that out. So, what is a region and how does that actually work? So, traffic is going to come in, and initially it's going to hit a load balancer, right? This happens at the region level. And this load balancer has a choice of where to send the traffic. And that choice is really going to be the breakup across availability zones. 

Now, I'm using Amazon terminology, but we also operate in the other cloud providers as well. It's just a lot of us kind of grew up on Amazon terminology and that's what we work with. So, I have my first instance of servers. And these servers, the first thing they really do is they worry about putting things onto a message bus, right? 

This is a horizontally scaling message bus where essentially microservices can talk to each other behind the scenes. And so anytime you see me draw these boxes like this, it just means that there's more than one of any particular function. And then there's usually data stores backing these, right? And these data stores can be horizontally scaling data stores if it's really a large massive data set, or they can be kind of more classic data stores, where you worry more about things like sharding, if it's actually small data sets. 

The key thing when you're actually working on these architectures is to match the tools with the problem at hand. Right? The counter technique that you have to be on the lookout for, unfortunately, and I certainly fall into this trap, is that engineers like to use new tools. Right? 

We read a blog about a thing, if it turns the right gear, if it tickles the right part of the brain, we want to use it even though it may not be the best tool for the job. And that's really kind of what we have to focus on down here. Okay. So, I come into a region, I have a load balancer. The load balancer is deciding essentially which AZ is actually getting my traffic. 

It's deciding on which AZ gets my traffic really based on past performance, like how active each one currently is. There's some other information that kind of comes in here really around the concept of failure detection on whether to stop sending traffic into one of these particular areas. We have a microservice approach, and this is just to allow kind of separation of concerns and isolation of concerns in our architecture. 
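To make the load balancer's choice a little more concrete, here's a minimal sketch, assuming the decision comes down to two inputs: a health flag from some failure detector and a count of in-flight requests per AZ. The `Zone` type, the zone names, and the "least busy healthy zone" rule are all assumptions for illustration, not how any particular cloud load balancer is implemented.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str        # hypothetical AZ name
    healthy: bool    # result of a hypothetical failure detector
    in_flight: int   # requests currently being served in this AZ


def pick_zone(zones):
    """Return the healthy zone with the fewest in-flight requests."""
    candidates = [z for z in zones if z.healthy]
    if not candidates:
        raise RuntimeError("no healthy availability zone")
    return min(candidates, key=lambda z: z.in_flight)


zones = [
    Zone("az-a", healthy=True, in_flight=12),
    Zone("az-b", healthy=False, in_flight=0),  # failed its health check
    Zone("az-c", healthy=True, in_flight=7),
]
chosen = pick_zone(zones)
```

Notice that az-b is idle but never gets picked, because failure detection takes it out of the running before load is even considered.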

Message buses are pretty common ways of reducing synchronous calls in any sort of operation. So, let me back up a little bit. When I'm worried about performance, throughput and latency are highly connected. Right? Previously, we talked about latency in the big sense, right, that is dominated by round trip times plus the latency of a wire, the physics of a wire. 

But there is computational latency, and that's really what I'm talking about right now. And the relationship between computational latency and throughput is such that if I can decrease the latency of a computation, I'll always get the inverse of that as a speedup in terms of throughput. If I'm able to add to my throughput, I'm not necessarily able to reduce my latency, like, it doesn't actually work in that exact way. 
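Here's a small worked example of that asymmetry, as a sketch under one simplifying assumption: each worker handles exactly one job at a time, so a worker's throughput is just one over the job latency. The function names and the 10 ms figure are made up for illustration.

```python
def throughput_per_worker(latency_s):
    """Jobs per second one worker finishes when each job takes latency_s."""
    return 1.0 / latency_s


def total_throughput(latency_s, workers):
    """Aggregate jobs per second across identical, independent workers."""
    return workers * throughput_per_worker(latency_s)


# Baseline: one worker, 10 ms per job.
baseline = total_throughput(0.010, workers=1)

# Halve the latency: throughput doubles AND each job finishes sooner.
faster = total_throughput(0.005, workers=1)

# Add a second worker instead: throughput doubles, but every
# individual job still takes the same 10 ms.
wider = total_throughput(0.010, workers=2)
```

Both changes get you to roughly 200 jobs per second from 100, but only the first one made any single job faster. That's why I go after computational latency first.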

So, I'm always focused on how do I reduce computational latency of any particular job? And I always hit some point of diminishing return. And once I hit that point of diminishing return, then the question becomes, how do I guarantee I'm maximally using the underlying resources? And so the basic idea here is this concept of, you know, we never really want one service or one thread of control to be able to starve another thread of control. 

Every thread runs until it can no longer run. It needs things that are no longer local, or it needs to transfer control to something that's no longer local, and it immediately ceases once it posts that message, and the next thread takes over. And that's kind of how we're able to kind of keep that non-blocking thought process going here. So, we have these AZs. 
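Going back to the run-until-you-post idea for a second, here's a minimal sketch of it using Python generators. Each task does its local work, posts a message, and immediately gives up control so the next task can run. The task names, the list standing in for the message bus, and the round-robin order are all assumptions for illustration; a real message bus and scheduler are far more involved.

```python
from collections import deque


def worker(name, steps, log):
    """Do local work, then post a message and give up control."""
    for i in range(steps):
        log.append(f"{name} ran step {i}")
        yield (name, i)  # post to the bus; this task stops running here


def run(tasks):
    """Run each task until it posts a message, then hand over to the next."""
    bus = []
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            bus.append(next(task))  # task runs until its next post
            ready.append(task)      # it gets another turn later
        except StopIteration:
            pass                    # task finished; drop it
    return bus


log = []
bus = run([worker("auth", 2, log), worker("audit", 1, log)])
# bus == [("auth", 0), ("audit", 0), ("auth", 1)]
```

No task can hold the processor past its next post, so nothing gets starved.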

These AZs all have replicated data. And when we think about our problem, right, our problem is authentication. So, there's also a way of dividing up the problem where I don't have to have such tight synchronization boundaries around these data stores as well. So, if one of these AZs were to fail between me logging in and you logging in, no one's going to notice the difference. 

If one of these fails mid-transaction while I'm logging on, I'm going to notice. And I'll notice it through a timeout. And so I then usually just retry and I'll come in through a second AZ. Now, the way that works is when we look at a transaction, we start to... Maybe we need to remind ourselves what's in the transaction, right? So, I'm going to just draw a super simplified version of the diagram. 

Alice wants to authenticate. I send out a challenge. She sends back a response. Right? So, clearly, I have to remember that challenge to get that response, then I send back a code, and then Alice will send that code, remember, to the third party, right? 

The SSO was the one we talked about. And then that SSO will try and redeem that code with us before it lets Alice proceed. So, clearly, we also have to store the code and remember that code. So, when I talk about the transaction, this is kind of what I'm talking about. Like, this is the only real state that we have to worry about keeping around, because if I can't keep this state around in a high-performance way, I'm not really servicing these customers. 
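As a rough picture of how little state that is, here's a minimal sketch of a store that remembers a challenge and a code for a short time and lets each be redeemed exactly once. The class name, the 60-second lifetime, and the in-memory dictionary are assumptions for illustration, and the actual cryptographic check of Alice's response is deliberately left out.

```python
import secrets
import time


class TransactionStore:
    """In-memory store for the short-lived state of one login."""

    def __init__(self, ttl_s=60.0, clock=time.monotonic):
        self._ttl = ttl_s       # assumed lifetime, not a recommendation
        self._clock = clock
        self._items = {}        # (kind, value) -> expiry time

    def put(self, kind):
        """Create a random value (challenge or code) and remember it."""
        value = secrets.token_urlsafe(16)
        self._items[(kind, value)] = self._clock() + self._ttl
        return value

    def redeem(self, kind, value):
        """Return True exactly once for a known, unexpired value."""
        expires = self._items.pop((kind, value), None)
        return expires is not None and self._clock() <= expires


store = TransactionStore()
challenge = store.put("challenge")  # sent to Alice
code = store.put("code")            # handed back after her response checks out
```

If the machine holding this dictionary goes away mid-login, the redeem simply fails and the login starts over.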

But it turns out all the other state, like, what's the client ID for this? What's the thing for that? All the other state, the change frequency in all that other state is really, really, really low. And because the change frequency on everything else is really, really low, and most of my operations on all the other state except for this are reads or read dominant, I can just do simple replication on the back. And so there's a really, really loose coupling with all of this. 

Not just loose replication amongst a particular region, but also between regions, right? Region one, region two. That way when Alice shows up to actually do an authentication with region two or a different AZ, all the data that's necessary to handle that is already there. Now, remember when we talked about pinning, sometimes we don't want it to be everywhere available, we just want it to be region-available. 

Well, it's actually easy to say replicate everywhere, but then here's my exception list, and for this exception list, keep it in this region or this region or these regions, etc. But that's kind of the exceptional case. This is the general case. So, this is really the only code or the only state that I have to worry about maintaining in a highly available manner. 
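The "replicate everywhere, except for this list" rule is small enough to sketch directly. The region names, the tenant names, and the dictionary used as the exception list are all hypothetical.

```python
# Hypothetical region names.
ALL_REGIONS = frozenset({"region-1", "region-2", "region-3"})

# Hypothetical exception list: tenant -> the regions its data is pinned to.
PINNED = {"tenant-eu": frozenset({"region-2"})}


def replication_targets(tenant):
    """Replicate everywhere unless the tenant is on the exception list."""
    return PINNED.get(tenant, ALL_REGIONS)
```

The general case costs nothing to express; only the exceptional, pinned tenants need an entry.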

And the other kind of out that I have is it's not like a phone call to a 911 dispatcher or an emergency dispatch facility. I don't have to always keep everything up. If I did have a failure and Alice has the ability to just hit Retry, and she comes through a different zone or a different region, then I'm actually servicing Alice in a way where the outside world may notice the failure count, but from an availability perspective, we're actually still providing service. 
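Here's a minimal sketch of that retry path, assuming a timeout is the only failure signal and that a retry just lands in the next zone. The zone names and the `fake_attempt` stand-in are hypothetical; in practice the retry is Alice clicking a button and the load balancer choosing a healthy zone.

```python
def authenticate_with_retry(zones, attempt):
    """Try each zone in order; a timeout in one zone moves on to the next."""
    failures = 0
    for zone in zones:
        try:
            return attempt(zone), failures
        except TimeoutError:
            failures += 1  # visible as a failure count, not as an outage
    raise TimeoutError("all zones timed out")


def fake_attempt(zone):
    """Stand-in for a real login attempt; az-a pretends to be down."""
    if zone == "az-a":
        raise TimeoutError(zone)
    return f"ok via {zone}"


result, failures = authenticate_with_retry(["az-a", "az-b"], fake_attempt)
```

The failure count went up by one, and Alice still got logged in.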

So, this really is the trick. How do I manage this? And it turns out the way I manage this is a couple of things. So, number one, all I really have to do is just guarantee that whatever microservice this auth lands on, these two come back to it. And it turns out in this protocol, it's really easy to track that transaction through things like the client ID. 

There's a nonce, although they don't call it a nonce, they call it something else. But basically, it's a unique value that you can use to hash this specific transaction. Also, the way these protocols work, it's very easy for you to just push an arbitrary parameter back in the challenge, because, again, remember these are, like, redirections. And it's going to come back to you and it's going to include that piece of information, so you can use that piece of information to track. 

And because this is in the path of the URL... So remember in HTTP, the first line of any HTTP request is called the request line, which is the method, the request URI, and then usually something that looks like this. Well, in this URI, if I explode that out, it usually looks like… Well, no, in this scenario, it would be 

And then it would have parameters, right? So, it would say client ID equals something. And we could push this arbitrary parameter, right? And this arbitrary parameter might be… Well, we would have pushed it in a previous message and it comes back to us as, you know, literally arbitrary equals X. 
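To show the round trip, here's a minimal sketch using Python's standard `urllib.parse`. The host `auth.example.com`, the `/authorize` path, and the values `abc123` and `X` are placeholders I made up; the parameter names `client_id` and `arbitrary` just mirror the ones described above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_redirect(base, client_id, txn):
    """Attach the client ID and our own tracking parameter to a redirect URL."""
    return base + "?" + urlencode({"client_id": client_id, "arbitrary": txn})


def read_params(url):
    """Pull the same two parameters back out when the request returns."""
    query = parse_qs(urlsplit(url).query)
    return query["client_id"][0], query["arbitrary"][0]


url = build_redirect("https://auth.example.com/authorize", "abc123", "X")
# url == "https://auth.example.com/authorize?client_id=abc123&arbitrary=X"
client_id, txn = read_params(url)
```

Whatever we push out in the redirect comes back to us unchanged, which is what makes it usable as a tracking key.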

And then there's more. So, because that's in the URL and I have not just a layer-three load balancer over here, but I have an application…these are application load balancers, by the way. App load balancers, i.e., they're operating at layer-seven. They read the path, right? 

They read the path, this part right here, the path, the thing that is actually beyond the domain part of the URI or the request URI, the HTTP request. I can route and I can switch on it, i.e., I can guarantee this goes back to the machine that actually has that state in low-latency, high-refresh memory. So, that's part of the problem. 
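Here's a sketch of the switching idea: take the tracking value out of the request URL and map it deterministically to one machine, so every message in the same transaction lands in the same place. The machine names are hypothetical, and plain hash-modulo is my simplification; a real application load balancer is configured with routing rules rather than Python, and a production setup would likely want something like consistent hashing so that adding a machine doesn't reshuffle everything.

```python
import hashlib
from urllib.parse import parse_qs, urlsplit

BACKENDS = ["auth-1", "auth-2", "auth-3"]  # hypothetical machine names


def pick_backend(url, backends=BACKENDS, param="arbitrary"):
    """Map the tracking parameter in the URL to one backend, deterministically."""
    value = parse_qs(urlsplit(url).query)[param][0]
    digest = hashlib.sha256(value.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Two different requests in the same transaction carry the same value...
first = pick_backend("https://auth.example.com/authorize?client_id=abc123&arbitrary=X")
second = pick_backend("https://auth.example.com/token?arbitrary=X")
# ...so both are sent to the same machine.
```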

The other interesting thing to point out is by keeping all of these things in the path as opposed to in the domain, I can keep a lot of this information out of DNS. I can keep a lot of this information out of certificate transparency. And these are two gargantuan datasets that are publicly available that generally both adversaries and red teams use to build inventory lists of what products and services are actually in play and what third-party relationships exist, etc. 

So, this is a strategy, right? You encode as much as you can in the path as opposed to a subdomain to really kind of guard against that information leakage. But it's also a strategy to be able to switch traffic to a very small subset of my architecture to guarantee that the data that needs to be there is in fact actually going to be there. 
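A tiny sketch makes the leakage difference visible. The tenant names and the `auth.example.com` host are made up; the point is only how many distinct hostnames each scheme produces, since hostnames are what end up in DNS lookups.

```python
from urllib.parse import urlsplit


def by_subdomain(tenant):
    """Tenant encoded in the hostname."""
    return f"https://{tenant}.auth.example.com/login"


def by_path(tenant):
    """Tenant encoded in the path; the hostname never changes."""
    return f"https://auth.example.com/{tenant}/login"


def hostnames(urls):
    """The distinct hostnames a set of URLs would expose."""
    return {urlsplit(u).hostname for u in urls}


tenants = ["acme", "globex"]  # hypothetical customers
leaky = hostnames(by_subdomain(t) for t in tenants)
quiet = hostnames(by_path(t) for t in tenants)
```

With subdomains, every customer shows up as its own name. With paths, an outside observer of hostnames sees one name no matter how many customers there are.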

And it turns out if it's not, we can restart the transaction in a pretty simple way. So, this scales horizontally. This has no state in it. This slow replication state has no impact really or almost a minimal impact on this. So, it turns out this actually scales quite nicely, both in terms of throughput and in terms of transactions per second. 

It's really just a question of adding EC2 instances where I need them whenever I notice certain load averages happening in the architecture. So, again, this was our kind of final section on scale-up performance from a design perspective. There are a couple of key ideas in this. Number one is, how do we minimize the state necessary across the transaction as much as possible? 
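As a rough sketch of that scale-out decision, assuming the signal is an average load percentage per instance and the goal is to bring it back down to some target: the 60 percent target and the "only ever add" rule are assumptions for illustration, not the actual policy.

```python
def instances_needed(current, load_pct, target_pct=60):
    """Instances needed so the average load per instance drops to target_pct.

    Uses integer ceiling division and never returns fewer than `current`,
    so this sketch only ever adds capacity.
    """
    wanted = (current * load_pct + target_pct - 1) // target_pct
    return max(current, wanted)


# Four instances running at 90% average load: grow to six.
grown = instances_needed(4, 90)

# Four instances at 50%: already under target, so stay at four.
steady = instances_needed(4, 50)
```

Because the tier being scaled holds no state of its own, adding instances is the whole job; there's nothing to migrate.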

Some of it can be accomplished by pushing into the URLs, but some of it cannot. You still have to store some of it in high-speed data stores or high-speed things like Memcached, that sort of thing, but you minimize it. And essentially, you make the failure path such that when something does break there, you just do a retry. Everything else is slow state. 

Slow state can be replicated willy-nilly. And your synchronization concerns on slow state don't matter because we're talking about updates on the order of days or weeks or months, whereas these things occur on the order of seconds or milliseconds. So, the trick is really just kind of keeping the domains separate and then, of course, also managing your controls at the L3 and the L7.