CI/CD Pipelines Are a No-Man's Land of Ownership
Informal chat with our host Reece Guida, CTO Jasson Casey, VP of Product Strategy Husnain Bajwa, and Product Evangelist Nelson Melo about the recent CircleCI compromise, the phishability of traditional MFA, asymmetric secrets, credential theft and signing fools, and trusted computing.
Hello, everybody, and welcome to the first "Cybersecurity Hot Takes" episode of 2023. It's a sentimental moment. Today marks one year since we started recording our precious little podcast. And we did get a bit of a tech upgrade.
I don't know if you've noticed that the timbre of my voice is more resonant. And you will notice that my precious beloved co-hosts will also sound enhanced, thanks to our producer, Joshua. So, we're gathered here today to talk about CircleCI. And the hot take I'm going to offer all of you is that, you know, when it comes to CI/CD pipelines, it's kind of a no-man's land of ownership.
So, HB, can you get our fellow hosts and audience up to speed on what happened with CircleCI, and then we'll get into this hot take?
Sure. So, CircleCI is...
Do I need to re-record that?
We're doing great.
CircleCI is one of the dominant modern alternatives to Jenkins and Bamboo, providing sort of the automation and coordination of CI/CD pipelines, including policy application and build rules. CircleCI has been especially popular because of its usability.
They state that they have over a million developers using their platform. And during the holidays this past December, they experienced a compromise of one of their privileged accounts. They eventually realized that a valid session token had been stolen from a privileged user.
And it was not detected for a couple of weeks. Once it was detected, the scope of the compromise and the impacted secrets and attack surface evolved rapidly, especially over about a one-week period from December 31st to January 7th.
The result of their investigation was that they forced or recommended rotation of all secrets across their solutions. And they indicated that they were going to take some steps to upgrade their overall security. One of the interesting aspects of CI/CD systems is that they've been a target of attackers and adversaries over the past year especially.
So, when we hear about source code control and software supply chains, a good portion of that attention is now being applied to CI/CD pipelines, which tend to get implemented and then not adjusted over time. They're often treated as an adjunct and not directly associated with the production systems.
And as a result, they tend to be a little bit stranded. If I were to guess, like, why many of the customers at CircleCI go with CircleCI, it's because it's a modern cloud-delivered platform.
So they're able to provide a level of security, service assurance, and maintenance that is not typically associated with the legacy Jenkins or Bamboo setups that people might have. So, it's particularly jarring when this kind of attack occurs. And my impression is that it probably necessitates looking at safety practices and security practices altogether, and really seeing these things as being critical to security.
And to that end, I think Jasson probably has a lot of insights on how security and safety might be established in these kinds of systems.
Yeah, if you guys have to guess, what's the first place you would look at as the root source of this compromise?
Well, I mean, they got malware on the guy's device, right? So, clearly, the malware was deposited in some way: targeted delivery, he got phished, he got social engineered. I am kind of curious, like, did they confirm any sort of, like, technique for stealing the token?
Like, how did they steal the token? Do you know, or do they know?
They just... So far, they've only indicated that malware was deployed on the user's system and that it was not detected by the antivirus software. There was no indication of whether the antivirus software was actively updated, whether the signatures were current, or what specific malware was in use.
But the subsequent attack vector after initial access with malware installation involved cookie theft.
Got it. So all we really know is they had a foothold on the endpoint, not the configuration state of the endpoint or how they went about getting the token out of the browser or the UI. And clearly, the privilege level of the token the guy had to begin with must have been pretty high for the adversary to then turn around and mint production tokens, right?
Yeah, I don't know.
And for a company like this, it's probably safe to assume they have a single sign-on system. They have MFA. The phishability of the MFA, I'd be curious but...
I would say it's safe not to assume, right? Like, you don't really know how a company is set up until you actually talk to them. Right? So, maybe they're set up... Maybe it's a super secure organization and this was, like, highly targeted and advanced or not, right? We just don't know until they release more details.
But yeah, token theft is pretty interesting in general, but also, when you think about the amplification effect, right, like, this isn't the same thing, but it definitely reminds me of SolarWinds, right? Like, CircleCI is at the heart of a large number of products being built. If I can compromise that infrastructure and the customers downstream of it, then what steps would actually be necessary to carry out, like, the SUNBURST-style attacks that we saw in SolarWinds, where they're actually injecting implants, that sort of thing?
Yeah, I just don't know, at this point.
Yeah, so I think the interesting part of this to me was that, obviously, organizations have been engaged in some sort of third-party assessments and security questionnaires with a provider as important as CircleCI in their environments. And CircleCI has SOC 2 Type 2 certification.
And so, I think this probably necessitates sort of revisiting what compliance looks like versus effective security itself. And so...
I mean, of all the name-brand breaches of the last 12 months, how many of them didn't have SOC 2, right? Oh, go ahead. I'm sorry.
So what kind of approaches would you think should be taken? Because, like, to me, this feels like the mental model is not sufficient, that there's not enough being applied to functional safety and functional security in these environments. And there's not a holistic approach to take a look at these things from, you know, a threat model perspective.
That next level of why the breach occurred is not happening. And you probably need to, as an organization delivering safe products, go two or three levels deeper in terms of investigating the whys on a lot of these things, right? Like, when...
Yeah, I would break it up, right? Like, this one's tricky, just because you have the compulsory reporting that's happening, but you don't have the details. Number one is, how did the machine itself end up with malware? Right? Like, that's an interesting question. Number two is, how was that malware, if it was unprivileged, able to escalate, right, or able to actually harvest a particular token?
Which is a pretty interesting question. Number three, and this one's kind of the more obvious result is, why isn't there partitioning between production and development? Right? It doesn't make the problem completely go away, because then you can just do, like, essentially, staged attacks, but it makes it a lot harder, right?
Like, production infrastructure should, for the most part, be read-only. You shouldn't be able to modify it and that sort of thing. Yeah, no, there are definitely ways of bucketing what we do know into interesting questions, and then also, maybe we do another session on, like, well, what is the state of the art for these three different boxes right now, and who is doing interesting things there?
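The dev/prod partitioning idea can be sketched as an authorization check. This is a minimal Python illustration under stated assumptions: the `Token` fields, the `"dev"`/`"prod"` labels, and the single `release-pipeline` identity allowed to write production are all hypothetical, not how CircleCI or any particular platform scopes its tokens. As noted above, partitioning doesn't rule out staged attacks (an attacker could go after the release identity instead), but a stolen developer token no longer writes to production.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical policy: only this deploy identity may write to production.
PROD_WRITERS: FrozenSet[str] = frozenset({"release-pipeline"})


@dataclass(frozen=True)
class Token:
    subject: str             # who the token was issued to
    environment: str         # "dev" or "prod" (hypothetical labels)
    actions: FrozenSet[str]  # e.g. {"read", "write"}


def is_allowed(token: Token, environment: str, action: str) -> bool:
    """Decide whether `token` may perform `action` in `environment`."""
    if token.environment != environment:
        return False  # tokens never cross the dev/prod boundary
    if action not in token.actions:
        return False
    if environment == "prod" and action == "write":
        # Production is read-only for everyone except the deploy identity.
        return token.subject in PROD_WRITERS
    return True


dev_token = Token("engineer", "dev", frozenset({"read", "write"}))
print(is_allowed(dev_token, "dev", "write"))   # True
print(is_allowed(dev_token, "prod", "write"))  # False: wrong environment
```

The design choice here is deny-by-default: every check that fails returns `False`, and production writes need both the right environment and membership in a very small allow-list.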
Because I know token theft specifically is an area that we're starting to do deeper research into, right? Because we feel like we do a pretty good job of preventing credential theft, signing fools, and man-in-the-middle sort of things. And if we are really solving most of that problem, the next frontier, the next milestone, the next checkpoint is, all right, well, what about tokens?
Because aren't tokens, for the most part, Willy Wonka golden tickets? You show up with a golden ticket and you get all the chocolate you can eat. What can be done there to really reduce their blast radius when they do get stolen, and to make them harder to steal? Why aren't systems using hardware-level protections for these tokens? Why are they still symmetric?
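To make the "golden ticket" point concrete, here is a minimal Python sketch of a bearer session token. It assumes a simple HMAC-signed cookie format; `SERVER_KEY`, `mint_session_token`, and `authenticate` are hypothetical names, not CircleCI's actual scheme. The server checks only that the token is correctly tagged, so an exact copy lifted by malware works just as well as the original.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical server-side key for tagging session cookies. Note that this is
# itself a symmetric secret that every verifying server must share.
SERVER_KEY = secrets.token_bytes(32)


def mint_session_token(user: str) -> str:
    """Issue a bearer token: the user name plus an HMAC-SHA256 tag."""
    tag = hmac.new(SERVER_KEY, user.encode(), hashlib.sha256).hexdigest()
    return f"{user}.{tag}"


def authenticate(token: str) -> Optional[str]:
    """Return the user named by a correctly tagged token, else None.

    Nothing here asks who is presenting the token or from which device:
    possession is the whole proof, which is what makes it a bearer token.
    """
    user, _, tag = token.rpartition(".")
    expected = hmac.new(SERVER_KEY, user.encode(), hashlib.sha256).hexdigest()
    return user if hmac.compare_digest(tag, expected) else None


# The engineer signs in once (SSO and MFA would happen before this point).
cookie = mint_session_token("privileged-engineer")

# Malware copies the cookie out of the browser and replays it from elsewhere.
stolen_copy = cookie  # an exact copy is all the attacker needs
print(authenticate(stolen_copy))  # privileged-engineer
```

Shorter lifetimes shrink the window, but as long as possession alone is proof, a stolen copy is as good as the original, no matter how strong the MFA was at sign-in.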
Like, there's a lot of infrastructure behind the question that I'm posing that would have to change. But I do think that's where the frontier of some of the stuff that we kind of like to work on is going to start moving.
And that applies just as much to employee access tokens for internal applications as to the repository tokens that CI/CD platforms hold. I was looking at the mitigation actions they proposed, and revoking GitHub tokens, Bitbucket tokens, and personal API tokens is part of all of it.
So, I don't know if this is an area we've thought a lot about, but helping organizations use asymmetric keys and different types of tokens for external authentication, is that something we've talked about a lot?
We definitely haven't talked about it here on the show, right? But, like, from industry problems that are near and dear to our heart, right, like, philosophically, we're opposed to symmetric secrets, right? Because symmetric secrets, by their definition, have to be exchanged to be useful. And that brings in part of the problem when you have to exchange it. How do you know you're not exchanging it to either a bad actor or infrastructure that could be or is compromised by a bad actor?
Asymmetric secrets are so much better because you essentially don't have to go through the same process. Now, there's a little bit of weakness at sign-up, but it's one moment in life when you're doing an enrollment or endorsement, as opposed to continuously every time you try to use the credential. You know, we want to take it one step further by using hardware to anchor that credential, to guarantee not only that it doesn't move, but that it's never even in the memory of my computer, right?
It's never in the file system on my computer. There's literally a little thing on my motherboard, and my processor goes off to it and says, "Hey, can you pretty please sign this?" And it responds based on however the policy is set up, saying, "Yeah, give us your... Well, prove this. Prove that. And if you do, I'll give you a signature." Right?
So, we can do that for access today. Right? But where else are symmetric secrets being used? Because the problems of symmetric secrets are going to carry over, right, as we're seeing: cookie theft and token theft are a thing. And I don't have the answers, but I do think these are really interesting questions: how do we promote hardware-backed, enclave-backed, asymmetric crypto in the token space? Where are the performance bottlenecks?
What are the limitations? What infrastructure must be changed versus where are we just not being creative enough to allow for transparent operation? Yeah, I think that's kind of an evolving frontier for the type of technology we work on.
Yeah, you wrote a blog post about some of these that I think is still internal, but maybe we want to publish it externally, because some of these thoughts are pretty universal.
Credential theft and signing fools. Yeah. That's what Nelson is talking about, and we probably will make it public at some point. It was really just an internal dialogue that I've been having off and on with, you know, sales folks, product managers, engineers. And it was kind of the philosophy behind why we do certain things.
And it really... You know, the title I think sums it up pretty well, which is, when you're worried about theft at the point of authentication, there are two main classes of problems that you have to worry about. Technically, there are more than two, but these are the two biggies. Number one is just credential theft, right?
If I can steal your credential, I can be you. So, how do you make it impossible to steal a credential? And that's kind of the first part of the white paper. The second part is, all right, if the credential can't be stolen, what's the next likely avenue for exploitation? And the answer is a signing fool. Well, if I can't steal the key used to sign a thing, can I get the fool who has the key to sign things arbitrarily for me?
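One well-known defense against the signing-fool relay, used by WebAuthn/FIDO2, is to have the client include the origin it is actually talking to in the data that gets signed, and have the server check it. Below is a minimal Python sketch of that idea; `authenticator_sign` and `server_verify` are hypothetical names, and to stay short and self-contained the signer is textbook RSA over two well-known Mersenne primes, which means the "private" key is trivially recoverable. It is only a placeholder for a hardware-held key: what matters here is the structure of the signed payload, not the primitive.

```python
import hashlib
import json
import secrets
from typing import Tuple

# Placeholder signer: textbook RSA over two well-known Mersenne primes.
# Anyone can recompute _D, so this is NOT secret and NOT secure; it only
# stands in for a hardware-held key so the example runs on the stdlib.
_P, _Q, _E = 2**61 - 1, 2**89 - 1, 65537
_N = _P * _Q
_D = pow(_E, -1, (_P - 1) * (_Q - 1))


def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % _N


def authenticator_sign(challenge: str, origin_seen_by_client: str) -> Tuple[bytes, int]:
    """Sign the challenge together with the origin the user actually visited.

    The origin is filled in by the client (e.g. the browser), not by whatever
    page requested the signature, so a phishing page can't supply the real one.
    """
    payload = json.dumps(
        {"challenge": challenge, "origin": origin_seen_by_client}, sort_keys=True
    ).encode()
    return payload, pow(_h(payload), _D, _N)


def server_verify(payload: bytes, signature: int,
                  expected_challenge: str, expected_origin: str) -> bool:
    if pow(signature, _E, _N) != _h(payload):
        return False  # payload was not signed by the registered key
    data = json.loads(payload)
    return data["challenge"] == expected_challenge and data["origin"] == expected_origin


REAL_ORIGIN = "https://app.example.com"
challenge = secrets.token_hex(16)

# Honest login: the user really is on the real site.
payload, sig = authenticator_sign(challenge, REAL_ORIGIN)
print(server_verify(payload, sig, challenge, REAL_ORIGIN))  # True

# Relay: a look-alike site forwards the real challenge. The signature is valid,
# but it covers the look-alike origin, so the real server refuses it.
payload, sig = authenticator_sign(challenge, "https://app-example.attacker.test")
print(server_verify(payload, sig, challenge, REAL_ORIGIN))  # False
```

The fool still signs, but it can no longer be tricked into signing the one statement the real server would accept.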
And there are lots of really interesting attacks that show up there. And it turns out there are protocols that provably solve these problems, that are practical, and that exist today in production in commercial systems. So how do we promulgate their use into more and more of industry, so we can move from this symmetric secret approach to an asymmetric secret approach?
Maybe more food for later.
Yeah. So, what do you make of the fact that a lot of these name-brand breaches you were talking about over the past year are essentially saying, "Hey, look, we got breached. It's really bad, but you're safe. Nothing was really compromised." And, you know, with the trust-but-verify part of that, it feels like there's some sort of a gap there, especially when CI/CD tools and, like, core components of the developer lifecycle are involved.
I'm reminded of Ken Thompson's paper "Reflections on Trusting Trust," you know, compromising a compiler, then messing with the decompiler so that your friends at Bell Labs can't figure out exactly what's going on for a while. On that kind of stuff, what do you think the importance is of things like code signing, provenance, and attestation?
Like, at what level should people be worried about this? And how does one go back and account for things in their own environment?
So, the general area that we're talking about is called trusted computing, right? And I would argue that while trusted computing has been around for a while academically, it's only been practical in an industrial or commercial sense for the last couple of years. And today it doesn't solve all the problems you just walked through.
But trusted computing is going to become a more common phrase for any engineer who works on integrity, privacy, or authenticity, or who touches that part of their product. And it's just going to become a bigger and bigger part of everyone's life. So, it's important to understand what it means, but also to understand that these aren't marketing words. They're actually academic technical words that come with a very specific definition of proof.
How do I establish trust? How does a thing attest to its properties being true? How do I assert...? How do I distinguish the difference between the two? How do I make sure these things are cryptographically backed? So I think the...
And there are a lot of topics that we can talk about in this general area, but trusted computing is just becoming more and more important, right? TPMs, TPM 2.0, secure enclaves: these are just some examples of hardware toolkits coming out of this world, right? On the software side, you know, getting back to Ken Thompson's paper, like, that was a lot of fun. Also a lot of fun is this project that was the...
Actually the Ph.D. work of this graduate student, named Xavier Leroy, a French guy working out of Inria. Inria is kind of the Bell Labs of France, for those of you that don't know. And his project was really, "How do I prove that what goes in the front end of a compiler is actually what comes out of the back?"
And to describe what he meant, he uses this fancy term, "semantically equivalent and nothing more." And, you know, the easy way of understanding that is, if you wrote a C program that said X plus Y equals Z, that was it. Whatever comes out the back, the assembly, the machine code, needs to be semantically equivalent: X plus Y still equals Z, and nothing else. Nothing else is happening.
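To make "semantically equivalent and nothing more" concrete, here is a toy Python sketch: a compiler for a tiny language of variables and addition into a stack machine, plus a check that running the compiled code gives the same answer as interpreting the source. This is an illustration under heavy assumptions, not CompCert's method: Leroy's compiler carries a machine-checked proof (in the Coq proof assistant) covering every program, whereas the check below only tests sampled inputs. The names `Var`, `Add`, `compile_expr`, and `run` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Add]
Instr = Tuple[str, str]  # ("LOAD", var_name) or ("ADD", "")


def interpret(expr: Expr, env: Dict[str, int]) -> int:
    """Source semantics: what the program means."""
    if isinstance(expr, Var):
        return env[expr.name]
    return interpret(expr.left, env) + interpret(expr.right, env)


def compile_expr(expr: Expr) -> List[Instr]:
    """Compile to a tiny stack machine (post-order traversal)."""
    if isinstance(expr, Var):
        return [("LOAD", expr.name)]
    return compile_expr(expr.left) + compile_expr(expr.right) + [("ADD", "")]


def run(code: List[Instr], env: Dict[str, int]) -> int:
    """Target semantics: execute the compiled instructions."""
    stack: List[int] = []
    for op, arg in code:
        if op == "LOAD":
            stack.append(env[arg])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown instruction: {op}")
    (result,) = stack  # exactly one value must remain on the stack
    return result


# X + Y = Z: the compiled code must compute the same value as the source.
program = Add(Var("x"), Var("y"))
code = compile_expr(program)
for _ in range(1000):
    env = {"x": random.randint(-10**6, 10**6), "y": random.randint(-10**6, 10**6)}
    assert run(code, env) == interpret(program, env)
print(code)  # [('LOAD', 'x'), ('LOAD', 'y'), ('ADD', '')]
```

Testing like this can find a miscompilation but can never rule one out; that gap is what a certified compiler's proof closes.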
Now, he did this for a very large subset of the C language and built this thing called a certified C compiler, CompCert, which is actually in production use now in, like, the automotive industry, the aerospace industry, and a couple of other places where essentially failure means death. So, there have been a lot of developments over the 2010s, and especially the last five years, pushing trusted computing tools and concepts into the mainstream.
As for all of the security breaches you're talking about over the last year, you know, there are things they can do immediately, which they're doing, right, like putting mitigations in place and doing reporting. They're also under these interesting constraints, right? A lot of these are public companies; they have reporting obligations. Like, there are a lot of other dynamics forcing their behavior. But in the medium and long term, they need to be looking for structural solutions to these problems, not mitigations.
So when we say a structural solution, we actually mean a very precise thing. We mean changing the framework, changing the architecture, changing the structure, or even the math, in such a way that certain bad things are not possible, right?
So, for instance, when you create a key in a TPM with an attribute that says the key is not allowed to leave the TPM, then structurally I have a guarantee that the key can't leave the TPM, short of the TPM being destroyed. It would never end up in memory, right, if it can't leave the TPM.
If it's never in memory, it can clearly never end up in the file system, right? If it's never on the bus, it's never actually leaving the device. So that's what we mean by structural solutions. Like, these companies need to really be looking for structural security solutions coming out of the trusted computing world to solve some of their problems.
Where failure equals death. That line definitely stuck with me. And I think that if we look back at 2022 and all of the attacks, and then this, it's a mounting pile of evidence that failing to change and look at the problem from a new angle is going to result in some kind of death, whether it's the death of a company or the death of a piece of software. The stakes are very high.
Yeah, on one end, it would be the death of trust with your customers, on the other end, you know, like, automotive, aerospace, medical, like, it's legitimate, literal death.
But to that end, do you have, like, any...? Like, what would be...? I'm putting you on the spot, like, when you're looking at something critical in, like, a development pipeline...? Like, since you have a lot of background in compilers and languages and stuff...
When I look at a compromise of a CI/CD platform, it feels like unlimited potential damage and nearly irrecoverable conditions for people who are downstream. When you're talking about those safety promises, those feel like they're only really practical at the top, at the software provider level.
And then there's going to be sort of a spectrum of users of the platform and their abilities to manage it. Like, what does a low- or medium-skill user of something like CircleCI, or of a breached CI/CD platform, do in this scenario?
So, I would actually direct anyone listening who's truly interested to take a look at SLSA, that's S-L-S-A. It's an existing open-source framework. And, you know, like all frameworks, it doesn't have solutions to all your problems, but it helps you break down the big problem into smaller problems, right? So when you look at the core of any software organization, how does it build its product and get it to its customers?
You have source code, you have third-party dependencies, you have a build system, you have distribution, right? So, across those four components, how do you manage the integrity of your product? And when I say integrity, I mean in the cryptographic sense, right? Like, how do I know that what went in is exactly what came out? How do I know that what went in came from the people I expect and no one else? If you set out and say, "All right, I want Xavier Leroy's answer to this problem," you're going to struggle, and your head will probably explode, right?
Because you can't bring that to bear on day one. It's a great long-term goal. But day one, take your build process and break it into those four parts, right? What's my source control? What's the integrity over my source control? How do I know that's true, right?
And this is not necessarily for a low-skilled audience. I would say this is for a medium- and high-skilled audience. Like, this is not the everyday developer. This is your systems developer, your systems engineer, your SRE, your DevOps engineer. Like, whatever title happens to be in vogue that day, it's the person who actually manages your infrastructure. And they're going to have answers to these questions, right, but you also kind of have to prioritize the work.
How do we take our build system? How do we segment it into these four phases? And how do we answer questions around the authenticity and integrity of the product that goes in the front and comes out the back? If I put a pork shoulder in the front, I want to make sure that only it is coming out into the sausage casings in the back and nothing else. No boots. Clearly, it's a longer, more detailed discussion than we have time for on a podcast.
But there is a way to gain traction.
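As a first step toward that traction, the four parts (source, third-party dependencies, build system, distribution) can be tied together by recording cryptographic digests of what went in and what came out, then checking them before release. This minimal Python sketch is only loosely inspired by SLSA's provenance idea and is not its actual format (SLSA provenance is expressed as in-toto attestations); the field names, `make_provenance`, `verify_before_release`, and the builder ID are hypothetical. In practice the build platform would also sign the provenance record so it can't be forged; that step is omitted here.

```python
import hashlib
from typing import Dict, Set


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_provenance(builder_id: str, source: bytes,
                    dependencies: Dict[str, bytes], artifact: bytes) -> dict:
    """Record what went in (source, dependencies), who built it, and what came out."""
    return {
        "builder": builder_id,
        "source_sha256": sha256_hex(source),
        "dependencies": {name: sha256_hex(blob) for name, blob in dependencies.items()},
        "artifact_sha256": sha256_hex(artifact),
    }


def verify_before_release(artifact: bytes, provenance: dict, trusted_builders: Set[str],
                          expected_source_sha256: str,
                          pinned_dependencies: Dict[str, str]) -> bool:
    """Distribution gate: release only if every recorded input and output matches."""
    return (
        provenance["builder"] in trusted_builders
        and provenance["source_sha256"] == expected_source_sha256
        and provenance["dependencies"] == pinned_dependencies
        and provenance["artifact_sha256"] == sha256_hex(artifact)
    )


source = b"int main(void) { return 0; }"
deps = {"libexample-1.2.tar.gz": b"...dependency bytes..."}
artifact = b"...compiled binary bytes..."
prov = make_provenance("hypothetical-ci-builder", source, deps, artifact)

pinned = {"libexample-1.2.tar.gz": sha256_hex(deps["libexample-1.2.tar.gz"])}
ok = verify_before_release(artifact, prov, {"hypothetical-ci-builder"},
                           sha256_hex(source), pinned)
tampered = verify_before_release(artifact + b"implant", prov, {"hypothetical-ci-builder"},
                                 sha256_hex(source), pinned)
print(ok, tampered)  # True False
```

In the pork-shoulder terms above: the digests say exactly what went into the grinder, and the gate refuses to ship any sausage that doesn't match.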
Well, that inspired me to go eat a bit of pork now. I'm hungry.
But no boots.
No boots. I mean, the leathery flavor might be nice, but the texture would just be a nightmare. So thank you guys for talking about CircleCI. I think there's a lot of work to be done to prevent things like this from happening, but Jasson is on the right track when talking about trusted computing going mainstream, and we're looking forward to making that happen.
Tune in for the next episode, and we'll see you then. Don't forget to smash that Subscribe button.