Policy is the Essence of Zero Trust Authentication
This video is number ten of the Zero Trust Authentication Master Class series.
I'm Jason Casey with Beyond Identity. Now, let's start focusing on how to add policy into the system. Earlier we introduced this concept of a three-dimensional world for Zero Trust access: when we're thinking about access, what do we really want to worry about?
It starts with the operation, with what someone is trying to do. The criticality of what they're trying to do informs what level of trust I need to establish in their identity and in their resource. If I can't make any statements about their identity or their resource, I'm not really connecting the dots, so to speak.
And just to bring it back: when we talk about perimeter-based security, so much of that architecture is based on questions like, do you have a privileged device? Do you live on a privileged part of the network? As opposed to asking the questions that actually matter: are the security controls I expect, given the risk environment or the criticality of what you're trying to do, in fact present?
And if I can ask those direct questions, let's just ask them, rather than leaning on properties like transitivity. It turns out that in math, if A is less than B and B is less than C, then A is less than C. But when you apply that sort of transitive thinking to security, you get yourself into trouble very quickly.
And unfortunately, perimeter defense, privileged positioning, and privileged devices fall right into that trap. The way we keep ourselves out of it is that, based on what someone is trying to do, we always establish a level of trust in both their identity and in their resource, the thing they're trying to do it from. We also introduced the concept of the platform authenticator.
The platform authenticator is essentially an authenticator that lives on the device or resource. It is our way of actually answering this question. But we haven't really shown a full architecture for it yet.
So, now we're going to show a full architecture and develop the concepts in a bit more detail. I have Alice as an operator. Alice is going to have a browser. I'll often just say user agent, because it turns out user agents can exist inside of native applications as well, but from the outside they still behave a lot like a browser.
We're going to have our platform authenticator. This forms the basis of the end user's side. Then I'm going to have some application; we'll call it Slack to make it real. I'm going to have an SSO, or you can call it an IdP.
Let's make it Okta, because why not? Okta's great. And then I'm going to have a couple of components on our side. First is an authentication server, or service. Then I'm also going to need a directory service.
I'm going to need a policy service and a data service. And this is going to form Beyond Identity Cloud.
So, again, we are a pure SaaS-based solution and we exist wherever the internet does to serve you. All right.
Enough of that. Now, let's actually get into how we start solving some of these problems. You'll notice I've introduced a policy service; you may have trouble reading it, but trust me, that says policy service. The Auth service is going to do the things you would expect. So, let's start this off. Alice wants the browser to go do a thing: let's log in to Slack.
302s are really compact when you're drawing sequence diagrams; that's why I use them, but it's not the only way. All right. I'm going to go to Okta and, surprise, surprise, Okta's going to redirect back to Beyond Identity. Because again, Okta's relationship to us is not dissimilar to Slack's relationship to Okta, right?
One acts as the client; the other acts as the Auth service. Okay. So, we've gotten to this point, and we want to log in. What's going to happen? Well, it turns out there was a step that actually happened ahead of time, and this is just based on our architecture.
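To ground that client/Auth-service relationship, here's a hedged sketch of the kind of OIDC authorization request one of those redirect hops might carry. Every endpoint and parameter value here is a made-up placeholder, not a real Beyond Identity or Okta URL.

```python
# Illustrative OIDC authorization request: the SSO (acting as the OIDC
# client) redirects the browser to the Auth service's authorize endpoint.
# All values below are placeholders for illustration only.
from urllib.parse import urlencode

params = {
    "response_type": "code",                      # authorization code flow
    "client_id": "okta-as-client",                # the SSO's client ID with us
    "redirect_uri": "https://okta.example/callback",
    "scope": "openid",
    "state": "xyz123",                            # CSRF-protection value
}
authorize_url = "https://auth.example/authorize?" + urlencode(params)
assert "response_type=code" in authorize_url
```

The point is symmetry: Slack makes exactly this kind of request to Okta, and Okta makes exactly this kind of request to the Auth service.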
It's not hard to believe that ahead of time our cloud understands all of our tenants. A tenant you can think of as equal to a customer, along with all the sub-configuration that goes therein, of which policy, or I should say active policy, is one piece. So, we just make sure that the Auth service has a copy of that.
It's important for this to happen ahead of time because the only latency that ever occurs when we do policy evaluation is latency due to third-party services, if you start to invoke them, not latency from our service. We're really trying to design this to be as compact as possible. So, we'll go through and basically run a policy evaluation loop.
We'll call that a PEL. Let's go ahead and define a PEL over here, a policy evaluation loop. What does it actually look like? Well, the first thing we always do is validation. We are speaking open protocols, right?
In this example our protocol of choice is OIDC, but it could just as easily be SAML. And in OIDC, there are lots of protocol requirements where, if a message isn't well formed to a certain point, I can't proceed and have to immediately send back an error. So, validation you can think of as a combination of basic syntax but also basic semantics. For instance, there's the client ID, a parameter specific to OAuth and OIDC.
If no client ID actually comes across, I can't index into a tenant, a customer, and start doing even basic processing. That's the idea behind the validation step. Okay. The next thing we do is what's called building a context. Remember, our theory of security says that every network, every architecture, is really a sea of transactions.
And so, if that's our atomic unit, if we can figure out how to make the atom of the transaction secure, everything else flows. Well, every transaction in our system, this included, can be described with a context, which is a fancy way of just saying, we have this state that talks about the transaction and it talks about all the things we know about the transaction.
And you could think of this as divided into two parts. One part you could think of as fact: we know this to be true. The other part you could think of as claim: someone is claiming something to be true. And during the process, the whole idea is: can we move this bar low enough that enough fact exists for us to decide to either proceed with or deny the operation?
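The fact/claim split can be sketched as a small data structure. This is purely illustrative; the `Context` class and `promote` method are names I'm inventing for the sketch, not the actual product API.

```python
# Minimal sketch of a transaction context split into verified facts and
# unverified claims. The whole flow is about promoting claims to facts.
from dataclasses import dataclass, field

@dataclass
class Context:
    facts: dict = field(default_factory=dict)    # known to be true
    claims: dict = field(default_factory=dict)   # asserted, not yet verified

    def promote(self, key: str) -> None:
        """Move a claim down into facts once verification succeeds."""
        self.facts[key] = self.claims.pop(key)

ctx = Context()
ctx.claims["user"] = "alice@omnicorp.example"
ctx.promote("user")        # e.g., after a signature check passes
assert "user" in ctx.facts and "user" not in ctx.claims
```

"Moving the bar" is just this: each verification step shrinks `claims` and grows `facts` until policy has enough fact to decide.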
That's really the framework here. So, I build the context. At this point, what is the context going to have? Some really basic things. From the client ID, I can index the tenant; I don't know it's Alice yet, but I understand it's maybe someone at, you know, OmniCorp. I can also determine, of all the policies that are loaded, which policies to evaluate.
There are a couple of other things I can do. The next step is what's called enriching the context. Enriching the context really involves reaching out to my directory service. It turns out that at this point I don't actually know who this user is claiming to be, so I can't index anything in the directory just yet about the user, but I certainly know the IP address the request is coming from.
Both the apparent IP address, what we see here, as well as some things inscribed inside. So there's no reason I can't reach out to the directory and data services and ask for enrichments on the IP. For instance: this IP has been observed as part of botnet X over the last 24 hours. That's kind of interesting.
Are they connecting from a known Tor exit node? That's kind of interesting. So there are all sorts of enrichments that can happen at step 3, which are really about taking this context and adding additional tags, additional things that we know. Some of them are trivial and some of them are profound, but all of them help in making a decision a little bit later. Okay. Step 4 is we evaluate the context over the policy, and this results in a set of actions.
And step 5 is we then execute that set of actions. So, this is our policy eval loop. It happens every time the Auth service gets hit.
Again, not much is going on right now, so the most likely rule to get hit in policy, and obviously you can write policy however you want, is going to say: I can't proceed in any sort of way, so I need you to execute a prove action. And the most basic proof is: I need you to prove that you possess a user key in a tenant.
We're not going to name the tenant. We're just going to say: I need you to prove that you have a key in a tenant, and we may ask for a couple of other proofs; we'll talk through what proofs mean here in a minute. And so, executing that action is going to result in this coming back and basically being that challenge. And again, in our system, the challenge is a series of proofs.
I want you to prove A, B, and C. So, this challenge is bundled, again, in a redirect. And so, we're going to get the browser, the user agent, to invoke our platform authenticator. It's almost like it's logging into our authenticator. And that challenge comes across to our platform authenticator.
It's able not just to interpret that challenge, but to respond to it with proofs. So, remember, we talked about the architecture of the platform authenticator. I'm going to overemphasize a couple of layers, but we said we have this crypto abstraction layer, then this operating system abstraction layer, and then this kind of trust library that sits on top.
You could think of the trust library as a solver. Its job is: given a set of proof requirements, provide a stream of evidence. And that's exactly what's going to happen right here. It's going to provide a stream of evidence, and this comes back up here; call that our challenge response. It will be cryptographically sound in that the evidence is signed over using a private key that's anchored in the secure enclave.
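The solver idea can be sketched in a few lines. One loud caveat: the real system signs with an asymmetric key anchored in the secure enclave; this sketch uses an HMAC purely as a stand-in so it stays runnable with the standard library, and all names are my own.

```python
# Sketch of the "solver": given proof requirements, gather evidence from
# device state and seal it with a signature. HMAC stands in for the real
# enclave-anchored asymmetric signature.
import hashlib
import hmac
import json

def solve(proof_requirements, device_state, key: bytes) -> dict:
    evidence = {p: device_state.get(p) for p in proof_requirements}
    payload = json.dumps(evidence, sort_keys=True).encode()
    sig = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"evidence": evidence, "signature": sig}

state = {"user_key_in_tenant": True, "disk_encrypted": True}
resp = solve(["user_key_in_tenant", "disk_encrypted"], state, b"enclave-key")

# The verifier recomputes the seal; any tampering with evidence breaks it.
payload = json.dumps(resp["evidence"], sort_keys=True).encode()
assert hmac.new(b"enclave-key", payload, hashlib.sha256).hexdigest() == resp["signature"]
```

That seal is what lets the Auth service's validation step treat the whole response as one tamper-evident unit.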
And the certificate of the enclave is presented to prove out the veracity of the key and of the resource that's hosting the key. And again, policy evaluation loop, the exact same processing. We try to validate it. Now on the validation step we can answer the question, because remember, any time we sign a payload, it essentially gets sealed under the signature.
So one of the questions that comes out of here is simply: is this valid? Based on everything that's produced, is this valid, or has it been tampered with? We build the context. Now we do have some additional claims. Alice is claiming that she's Alice under OmniCorp, and Alice is claiming that she possesses, that she has the rights to use, this particular key.
And if the key certificate is here, she's claiming certain things about the key. So the validation step is very mechanical, very mathematical. We're really just trying to push this line down: can we take the claims she just made and verify them as fact? And if the root of that certificate tracks back to a secure enclave manufacturer that we trust in our root of trust of enclave manufacturers, then we do move the line down for the sub-segment of things that were actually in the validated response.
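Here's a deliberately simplified sketch of that "push the line down" step. Real certificate chain validation involves signature checks at every link; this sketch only models the decision that matters here, whether the chain terminates in a trusted enclave root, and the root names are invented.

```python
# A claim is promoted to a fact only if the key certificate chains back to
# an enclave manufacturer in our root of trust (names are illustrative).
TRUSTED_ENCLAVE_ROOTS = {"ExampleEnclave Root CA", "ExampleTPM Vendor CA"}

def chains_to_trusted_root(cert_chain: list) -> bool:
    # Simplification: a real check also verifies each signature in the chain.
    return bool(cert_chain) and cert_chain[-1] in TRUSTED_ENCLAVE_ROOTS

claims = {"user": "alice@omnicorp",
          "key_cert_chain": ["device-key-cert", "ExampleEnclave Root CA"]}
facts = {}
if chains_to_trusted_root(claims["key_cert_chain"]):
    facts["user"] = claims.pop("user")   # the line moves down

assert facts["user"] == "alice@omnicorp"
```

If the chain ended in an unknown root, the claim would simply stay a claim, and policy would have less fact to work with.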
So, for the sake of argument, let's say that the action the second time around was an allow. And let's say it said: here's an allow, and here's a series of scopes that we're going to allow. Then, remember how this unwinds; I'm going to go back pretty quickly.
It unwinds with authorization codes and ID tokens, and these final redirects are really just the redemption of those codes. And again, I'm just calling out basic OAuth.
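For concreteness, here's roughly what that redemption looks like in standard OIDC: the authorization code goes to the token endpoint and comes back as tokens. All values are placeholders, and this is a shape sketch, not a real HTTP exchange.

```python
# Illustrative code-for-token redemption at the token endpoint (standard
# OIDC authorization code flow; all values below are placeholders).
token_request = {
    "grant_type": "authorization_code",
    "code": "placeholder-auth-code",
    "redirect_uri": "https://okta.example/callback",
    "client_id": "okta-as-client",
}

# A successful response carries the signed ID token asserting the
# authentication event, which the SSO then consumes.
token_response = {"id_token": "<signed JWT>", "token_type": "Bearer"}

assert token_request["grant_type"] == "authorization_code"
assert "id_token" in token_response
```

Each hop in the unwind is this same pattern one level up: Beyond Identity issues to Okta, and Okta in turn issues to Slack.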
So, this is how we get to the policy engine, and this is how we unwind the stack. So, really the decision-making is what happens in here.
And it is driven by policy. Policy always follows these particular steps. This is useful for conceptually understanding what's going on, and it's useful for debugging the system. The other reason we put it up there is to show that we have a sound and principled way of thinking about how we actually do policy execution.
The latency the user experiences in this environment is really a function of the distance between the person, the application, the SSO, and us. We'll get into scalability and performance a little bit later, and how we actually compress this down.
But something else you might have noticed from this drawing is that how many times you come back to the Auth service is actually policy dependent. Depending on how you write the rules in here, and by the way, that's kind of what the rules look like, we'll get into that in more detail, you can make this back-and-forth iterative or you can make it quick. Our reasoning is that every corporation, every enterprise, every organization has different data privacy concerns.
From a data privacy and security perspective, it's two sides of the same coin, but the coin gets flipped a different way depending on the organization, its policy, its people, its mission, and so on. So we wanted to make it possible with our policy engine to let the organization decide what information is collected from the endpoint, from the resource, given what's actually going on.
By default, we don't actually collect any information in this loop. When you're writing policy rules and you say, I need you to prove X, Y, and Z, everything you ask to be proved ends up being a data collection operation on the user agent. It's our way of giving the choice of what data to collect to adjudicate security decisions back to the security team at the enterprise. One thing we could have done instead is collect everything possible upfront; then you'd have all the data and could make all the decisions you want.
The downside is that that doesn't fly in privacy-conscious organizations. It's also a bit heavy-handed: if the risk of the operation is low, do I really need all that other information? That's why we made it progressive. As a policy writer, you can design the system to dig as deep or as shallow as you need, given the operation someone is actually trying to drive. A couple of other things I might point out: every time we get hit, there is an event that goes into our data system, which we can then export to your SIEMs or your data lakes, however you see fit.
We do have analytics that run over here. So the astute observer might say: well, how do your analytics run over data if there's no policy asking the user agent, asking the endpoint, to provide some of that data as part of the proof to proceed with an access attempt? The idea there is that we have actions, allow and deny. Those are kind of obvious, but we also have something called monitor.
Monitor is like: hey, whenever this condition is met, I just want you to shoot a little packet of SIEM data, about the environment that generated the condition, up to the cloud. That ends up being useful for policy writers to really interrogate the security posture of their devices without shutting down access.
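A monitor rule might look something like this sketch. The rule shape, field names, and event format are illustrative assumptions on my part, not the product's actual policy language.

```python
# Illustrative "monitor" action: when the condition is met, emit a SIEM
# event about the environment, but let the access proceed anyway.
events = []   # stand-in for the pipeline that ships events to the cloud

def monitor_rule(ctx: dict) -> str:
    if not ctx.get("disk_encrypted", True):
        events.append({"event": "unencrypted_disk",
                       "device": ctx["device_id"]})
    return "allow"   # access proceeds either way; we only observe

decision = monitor_rule({"device_id": "laptop-42", "disk_encrypted": False})
assert decision == "allow"
assert events[0]["event"] == "unencrypted_disk"
```

Contrast this with an allow/deny rule: same condition, but the consequence is visibility instead of a blocked login, which is why it's safe to deploy into an organization that already has running operations.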
Because it turns out, when we show up at most organizations, they already have business, they already have operations. It's also useful for analytics. Analytics can build what's called a requirement set of rules with monitor. So when you load an analytic, it essentially loads its monitor rules into policy, and then we get the right data for the analytic to actually fire. We've now connected the loop with policy. The next step is to talk a little more about what policy actually looks like and how you write it.
But this concludes the section on how the architecture functions and how it actually performs. Again, the key concepts: we have this platform authenticator. The platform authenticator truly is universal; it works on any operating system and any secure enclave technology. And it's based on this iterative proof concept with our policy engine.
The idea being, we only collect the data that you decide is necessary to really answer this question: based on the operation someone's trying to conduct, what level of proof do they need to present about their identity and their resource? Or, another way of internalizing that question: are the security controls I would expect in place on the device initiating the transaction, relative to what the transaction is?
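That proportionality idea, deeper proofs for more critical operations, can be sketched as a simple mapping. The criticality tiers and proof names here are hypothetical examples, not a built-in scheme.

```python
# Illustrative mapping from operation criticality to required proofs,
# echoing the progressive-collection idea: each proof requested is also
# a data collection operation, so low-risk operations collect less.
REQUIRED_PROOFS = {
    "low":    ["user_key_in_tenant"],
    "medium": ["user_key_in_tenant", "enclave_cert"],
    "high":   ["user_key_in_tenant", "enclave_cert",
               "disk_encrypted", "edr_running"],
}

def proofs_for(criticality: str) -> list:
    return REQUIRED_PROOFS[criticality]

# Higher-criticality operations demand strictly more evidence.
assert len(proofs_for("high")) > len(proofs_for("low"))
```

A policy writer tunes exactly this kind of table: dig deep for an admin action, stay shallow for a routine one.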
That's much more of a Zero Trust way of cutting to the heart of the problem.