High Availability at Cloud-Scale

This video is number thirteen of the Zero Trust Authentication Master Class series.  


I'm still Jasson Casey, and you're still watching. So, let's finish up this section on scale. And that's going to bring me back a little bit to kind of the beginning of the company. A little bit about our background. 

A lot of the early engineers in the company have a background in telco, building telco equipment, switches, routers, firewalls, VPN concentrators, services that have to not just operate at scale in terms of high performance, but also at high availability. And the other half of the kind of the founding engineers, early engineers if you will, had a background in security, specifically security that was more focused on kind of surveillance, observational analytics, those sorts of things. 

So, when we got started here, we kind of pooled a lot of our learnings, and it resulted in a couple key objectives of how we wanted to build the platform here. We knew that we wanted to start from the beginning and build something that was telco scale. And when we say telco scale, we don't mean kind of traditional enterprise software hosted in the cloud. 

We mean designing kind of a truly multi-tenant, cloud-native, AZ- and region-independent architecture, where no matter where you are in the world, if you're using the Beyond Identity service, you're using a region's resources that are as close to you as possible, right? So, your latency is compressed. 

You're using a set of resources that is redundant, so if a service goes down, there's backup services that can immediately kind of take your retry, so to speak. And obviously, something that we could scale with the growth of our customers, right? So deploying in partners like Amazon and Azure, where it's rather trivial for us to scale up both compute and storage. 

The other part of it, the security part, I mean, obviously, we wanted to design correct security protocols that, you know, do what we say they do and nothing else, but a couple of us had some backgrounds in cybersecurity risk ratings and just kind of basic network intelligence and analytics, and a disturbing trend that we kind of picked up over the last decade is the proliferation of companies' assets and customer lists into the DNS architecture and even into the certificate architecture that shows up in things like certificate transparency. 

And we wanted to make sure that, while we can never truly eliminate all instances of those sorts of things, because, ultimately, sometimes you make choices where these bits of information do have to come out and do have to be public, our architecture wasn't opinionated. And I keep seeing "opinionated" used all over the place like it's a good thing. 

But, ultimately, when you're trying to service enterprises, I found it's a good thing to not be opinionated because every enterprise has a slightly different opinion, and ultimately, you're trying to kind of work within their opinion and get them good results. Or work within their definition of risk and get them good results. So, we didn't want to be opinionated on essentially how they set up integrations with applications and third parties and whatnot, to where we would force the disclosure of the tool chains that they actually use, right? 

Because, remember, from an adversarial perspective, when I'm attacking a company or an organization, the first thing that I do is I actually build a targeting list from surveillance: who works at the company, what products is the company actively using, right? And so if I can keep that list as small as possible, it raises the bar a little bit. And ultimately, you know, we have to do a lot more than just that, but, like, why make it easy? 

And that's kind of how we got started. So, I'll jump in, and I'll just kind of start with some first principles, and we'll just talk about the latency side of things. So, how fast can I truly do this? All right. So, in theory, that's North America, and there's South America. There's Africa. 

And you're seeing Jasson's, "How well does he understand Europe?" And then my knowledge is going to dip a little bit down here. And loop de doo. All right. Let's not forget our buddies over here, and down here, and over here, and is it over here? Yeah, I think it's over there. 

All right, good enough. And I managed to stay in the line, so you can see everything. Points for me. All right. So, it turns out any multinational company is going to operate across the world, right? Their headquarters may be based over here, or maybe their headquarters is based over here, or maybe their headquarters is based over here, or maybe their headquarters is based over here. But, their employees are going to, you know, be everywhere. 

Their contractors are going to be everywhere, right? So, there's this interesting problem that shows up, and this is the first thing that we're focusing on, right? We're talking about latency. And this interesting problem that shows up is, from an access perspective, I want to service the company, but I also want to service the user, and I want to service the user as close to the user as possible. 

And the reason for that really just has to do with physics, right? You can always write software that goes faster, but your software can never truly go faster than the physics that's powering it. You can never transmit data between two points faster than the speed of light, right? That's what we really mean when we say physics. So, we spend a lot of time on the architecture and writing efficient software and removing latencies there, but the long pole that really drives a lot of latency performance in products like ours is actually physical latency. 
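To put a rough number on that physics floor, here is a minimal sketch. It assumes light in optical fiber travels at about two thirds of its vacuum speed, and the distances are illustrative values, not measured paths. Real routes are longer than a straight line, so actual round trips come out higher than this bound.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in a vacuum
FIBER_VELOCITY_FACTOR = 2 / 3          # assumption: typical for glass fiber


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip time over fiber for a one-way distance in km."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_PER_S * FIBER_VELOCITY_FACTOR)
    return 2 * one_way_s * 1000.0


# Illustrative one-way distances (assumed, not measured).
examples = [("same metro", 50), ("cross-continent", 4_000), ("intercontinental", 10_000)]
for label, km in examples:
    print(f"{label}: at least {min_round_trip_ms(km):.1f} ms per round trip")
```

No amount of software tuning gets under these numbers; the only way to shrink them is to shorten the distance.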

How far are you from the service that's actually servicing you? So much so, there's a whole industry right now, and I had actually...I didn't know the cool name. I was calling it something else, but there's a whole industry really built around this concept, and it's called edge computing. And the basic idea is by bringing the computing as close to the edge as possible, you're going to give a much better latency experience. And just to see why that's the case, remember our little sequence diagrams from earlier, right? 

I have the user agent, I have the app, I have the SSO, and I have Beyond Identity. And remember what that flow looks like. I bounce here, I bounce here, I bounce here, I bounce here, I do a get again, I do it one more time, I come back, it verifies who I am. It comes back, I go back, it verifies who I am. I come back, and then, finally, I get the page I'm logged into, right? 

So, one, two, three, four. How do we want to count this? We can count this as five and six. Whether it's five or whether it's five and six really depends on are these two close together relative to the UA or not, right? But let's just assume they aren't. And then let's say this is seven and eight. 

All right. So, I have eight round-trip times. Let's say my average two-way latency is...actually, let's make it easy, 100 milliseconds, right? So, I have now built a system that is never going to perform better than, it's always going to be some number larger than 800 milliseconds. And that's assuming, you know, my computation times are negligible. 

And if we're building a proper system, we're usually operating in the microsecond range, right? Where there are actual transactions or actual computation time, right? If we're not, we're not really building high-performance systems. So, it's a good way of really thinking about how you're actually looking at some of these things. But, the point I'm driving at here is, like, this clearly dominates. 

Or put in another way, where you live, i.e. what this number is, and how you've designed your protocol. Sometimes you'll hear people describe protocols as chatty. These are the two most dominant factors that actually impact latency, which is the most dominant impact for kind of customer experience if you will, right? 
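The arithmetic behind that claim can be sketched in a few lines. This is a minimal model, assuming every round trip pays the full two-way latency and nothing overlaps; the function name and the 50-microsecond compute figure are illustrative assumptions, while the 8 round trips and 100 milliseconds come from the example above.

```python
def login_latency_floor_ms(round_trips: int, rtt_ms: float,
                           compute_us_per_step: float = 0.0) -> float:
    """Lower bound on login time: network round trips plus per-step compute."""
    network_ms = round_trips * rtt_ms
    compute_ms = round_trips * compute_us_per_step / 1000.0
    return network_ms + compute_ms


# Eight round trips at 100 ms each, compute treated as negligible.
floor = login_latency_floor_ms(round_trips=8, rtt_ms=100)
print(floor)  # 800.0

# Assumed 50 microseconds of compute per step barely moves the total.
with_compute = login_latency_floor_ms(8, 100, compute_us_per_step=50)
print(f"{with_compute:.1f} ms")

# Doubling the round-trip time doubles the floor.
print(login_latency_floor_ms(8, 200))
```

The two inputs that matter are the number of round trips (how chatty the protocol is) and the round-trip time (where you live); compute in the microsecond range is lost in the noise.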

Your end users are happier if things just happen, right? They're less happy if things take a longer time. So, when we set up latency, we have this problem, people live all over the place. So, how do we attack the problem? It turns out we attack the problem in two ways. There could be 10 ways we attack the problem, but it turns out 2 of the knobs have the biggest impact. 

So number one, how do we design the protocol to require the fewest round trips possible? And we showed you what things looked like earlier and, hopefully, we were able to prove to you to your satisfaction that there is no sequence of the protocol that's more compact than what we represented if you want to federate from an app to an SSO and to us. And at that point, it's really just driven by how you write policy, right? 

And, again, just remember, we have a minimized data approach in that, like, we'll collect almost no data. Data collection's really driven on how you write policy. And depending on how you write policy, you can gather a bunch of it, or you can do it progressively. And that will have an impact on round-trip times. So, now we move to the second part of the problem, which is latency, right? 

So, I could build one service node in North America and say, "You know what? I'm good. I'm going to make this super highly available. I'm going to make this super-high throughput. And I know I'm going to sell to companies overseas, and I know my U.S. companies are going to have people overseas, but you know what? I don't care. They're all going to home back to this U.S. 'node,' if you will." Well, the problem with that is on any given day, we are talking about...use the right color, from this region of the world, we're talking about, you know, 100 to 200 milliseconds, and from this region to... 

I meant to do that. To this part of the world is, like, 200-plus milliseconds, right? So, we're just taking this number, and we're jacking it up by a factor of two and oftentimes even more, right? So, how do we do better than that? The answer to how we do better than that is, at Beyond Identity, we actually operate regions. And each of our regions is identical. 

And what that basically means is, and I'm going to go into our regions architecture in the next part, but for now, let's say I have a region on the eastern part of the U.S., and I have a region on the western part of the U.S., and I have a... Oh, wow. How did I forget these guys? And I have a region right here, and I have a region right here. And over time, you know, I'll end up with a region over here, and this is probably not right, but I'll have a region over there, and I'll probably end up with a region right here, and, you know, we'll see how business goes, but we'll probably end up with a region down here. 

So, a key idea about all of these regions, they look identical. They have the same computing architecture, they have the same layout. So, if I'm a user in this system and let's say I happen to be over in, you know, Eastern Europe but I work for a U.S. company, when I go to access my service, I'm actually going to use a BI facility that is close to me. 

So, how do we determine closeness? Well, turns out there's really only two ways of dealing with proximity. One is through DNS, and the other is through BGP. How do I actually route to a resource that's close to me? There's really two primary mechanisms, right? DNS and BGP. 

So, let's remind ourselves what these things are real quick. So, humans, right? We're all humans, I think. We're all humans, don't remember things like this well. Like, I need you to go to, blah, blah, right? What we do remember is actual human names. We've been using this example, bank.com, so let's just keep using it. 

So, DNS is the mapping of that useful name to an actual address, whether it's a v4 address or a v6 address, and I'm not going to write v6 addresses because they're annoying, but they exist, too. All right. So, it's possible through the DNS configuration of the world to actually program your DNS entries to essentially respond with a different IP address depending on what servers are actually being used in what part of the world. 
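As a toy illustration of a name that answers differently depending on where the query comes from, here is a minimal sketch. Everything in it is hypothetical: the region names, the lookup tables, and the addresses, which are taken from the reserved documentation ranges. It is not how any particular DNS provider implements this.

```python
import ipaddress

# Hypothetical records for bank.com, one address per serving region.
RECORDS = {
    "bank.com": {"us-east": "192.0.2.10", "eu-central": "198.51.100.10"},
}

# Hypothetical map from the network a query arrives from to a region.
QUERY_NETWORKS = [
    (ipaddress.ip_network("203.0.113.0/25"), "us-east"),
    (ipaddress.ip_network("203.0.113.128/25"), "eu-central"),
]
DEFAULT_REGION = "us-east"  # fallback when the source network is unknown


def region_for(source_ip: str) -> str:
    """Guess the region from the source address of the query."""
    addr = ipaddress.ip_address(source_ip)
    for network, region in QUERY_NETWORKS:
        if addr in network:
            return region
    return DEFAULT_REGION


def resolve(name: str, source_ip: str) -> str:
    """Return the address record for the region nearest the querier."""
    return RECORDS[name][region_for(source_ip)]


print(resolve("bank.com", "203.0.113.7"))    # 192.0.2.10
print(resolve("bank.com", "203.0.113.200"))  # 198.51.100.10
```

The weak point is the guess in `region_for`: the answer is only as good as the mapping from source address to location.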

DNS can also do things like try to figure out based on where you're coming from, your source IP, guess at where you are in the world, and give you an appropriate answer. I'm not saying we'll never use it, but I tend to kind of not like that approach. I think the BGP approach is a better approach. Sorry. So, what BGP does, BGP has this concept...well, it doesn't really have this concept, it's just a general concept of...it's how networks essentially advertise to each other what they can reach, right? 

So we have...you know, all these continents have networks on them, right? Different network operators live all over the place. And these network operators, they form, you know, the "Internet," the big I, if you will, by interconnecting with each other. And so, there's two types of interconnections. There's a big fat pipe that's basically carrying traffic back and forth, but then there's a smaller, slower, much smarter pipe, which is the...and it's not really a pipe, it's more of a connection, but it's the BGP connection. 

And it's essentially the protocol that lets these regions not just talk to each other, but aggregate information that they've learned from elsewhere and pass that information along. And I'm sure you've heard of this thing called path-based routing before, but without this turning into a BGP session, you can basically get things that say, like, "Hey, I have a path to this guy, and it is starting at A, going to B, going to C. 

And I have another path to that guy, and it is from D directly to C," right? And so you get these different paths. And so through using BGP, what you can do is you can ensure... 

Because, remember, the way BGP routing works is I'm always trying to take the shortest distance to get to my destination. So, I would take this distance. What you're able to do is make sure that the route that exists for the same IP address, this one, in all of these regions is actually through a facility and through a link that is in that region, right? 

And so that's kind of the BGP approach. That's how we... When we actually use that, we're not using bank.com to distribute. We're actually using the IP address. So, it's kind of like the IP address lives in multiple data centers around the world at the same time. And it does. And the way that we're able to do that consistently is through this nice partitioning of traffic, and BGP, of course, is what gives us that partition. 
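Here is a minimal sketch of that idea, where the same prefix is announced from several regions and each network picks the shortest path it has learned. This pattern is commonly called anycast. The sketch simplifies heavily: real BGP best-path selection weighs several attributes before path length, and the prefix, AS numbers, and region names below are hypothetical values from the reserved documentation ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str      # the advertised address block
    as_path: tuple   # networks traversed to reach the origin
    region: str      # facility this path ends in (hypothetical label)


ANYCAST_PREFIX = "192.0.2.0/24"  # same prefix announced from every region


def best_route(routes):
    """Simplified selection: prefer the route with the shortest AS path."""
    return min(routes, key=lambda route: len(route.as_path))


# What a network in Europe has learned: A -> B -> C versus D -> C.
seen_from_europe = [
    Route(ANYCAST_PREFIX, (64501, 64502, 64500), "us-east"),
    Route(ANYCAST_PREFIX, (64503, 64500), "eu-central"),
]

# What a network in the U.S. has learned for the same prefix.
seen_from_us = [
    Route(ANYCAST_PREFIX, (64504, 64500), "us-east"),
    Route(ANYCAST_PREFIX, (64505, 64506, 64500), "eu-central"),
]

print(best_route(seen_from_europe).region)  # eu-central
print(best_route(seen_from_us).region)      # us-east
```

Both users connect to the same IP address, but each lands in the facility with the shortest route from where they sit.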

So, the reason that, when I'm over here, I go to this particular facility is that it's the shortest route versus this facility. Now that sets up the problem that we'll talk about in the next drawing, which is how do I ensure consistency of data across these different regions? But there's one more thing to address before we move on, which is, "But, Jasson, what if I don't want to go to the data center closest to me? What if I want to home to a physical data center?" 

And there are companies that will have reasons for wanting to do that, and that's actually easy. In that scenario, we can actually set up essentially the named region as part of the domain, and, of course, the general name exists everywhere, but the specific name is actually going to exist only in one location. 

And that one location, let's say, happens to be this region or this region. And that's how you get things to that area. So, what we focused on in this drawing is talking about latency. Latency is important, right, if you want this. Latency is predominantly driven by protocol round-trip times and physical latency, right? 

Speed of light, things like fiber in the ground, fiber under the ocean, and whatnot. So, we combat that in two ways, right? One is by designing the simplest protocol possible for the problem. Two is by having a proper cloud-based architecture to where I'm always going to the facility that is nearest to me. 

And this also will set up some of what we use when we get to the high-availability argument: when an entire region goes away, it turns out, from a routing perspective, as long as that route comes out of the global routing table, I always have backups that I can route to. So, traffic can seamlessly flow to a facility that's ready to receive it.
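That failover behavior can be sketched with a toy routing table. This is a simplified model under stated assumptions: the region names and path lengths are hypothetical, and removing a dictionary entry stands in for a route being withdrawn from the global routing table.

```python
def nearest_region(routes: dict) -> str:
    """Pick the region whose advertised path is shortest."""
    if not routes:
        raise LookupError("no route to the service")
    return min(routes, key=routes.get)


# Hypothetical path lengths to the same IP address, as seen by one user.
routes = {"eu-central": 2, "us-east": 3, "us-west": 4}

before = nearest_region(routes)
print(before)  # eu-central

# The whole region goes away and its route is withdrawn.
routes.pop("eu-central")

after = nearest_region(routes)
print(after)  # us-east
```

The user's destination address never changes; only the set of advertised routes does, so the next-nearest facility picks up the traffic.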