Dangerous vulnerabilities in H.264 decoders.
Dave Bittner: Hello everyone and welcome to the CyberWire's Research Saturday. I'm Dave Bittner and this is our weekly conversation with researchers and analysts tracking down the threats and vulnerabilities, solving some of the hard problems and protecting ourselves in a rapidly evolving cyberspace. Thanks for joining us.
Willy R. Vasquez: So, the idea for this work actually started from a class project from other students back in 2018. Essentially what they did is that they were trying to fingerprint video decoders on the web.
Dave Bittner: That's Willy R. Vasquez, a doctoral student at the University of Texas at Austin. Today, we're discussing his research "The Most Dangerous Codec in the World; Finding and Exploiting Vulnerabilities in H.264 Decoders."
Willy R. Vasquez: And so, they got an MP4 file and they just created Hamming Distance 1 videos, so what they did is they got the video file and then flipped the bit at every single point inside of the file and then just played it back to see what would happen. And while doing this, they found that some videos were able to leak out contents of previously decoded videos. So, yeah, it was kind of weird, because video decoding is supposed to be this deterministic process, right? Whatever you put in, you should always get back the same results, and so they noticed every time they decoded these malformed videos something weird would pop up. Now, so they did this for a class and wrote up the report, got some good grades, but they didn't continue on with the project. Then in 2019, when I was looking for a research project to work on, my advisor, Hovav, suggested following up on this work and exploring more of the video decoder's state. So, the first task was figuring out why this specially crafted video played differently each time. And in doing that work, we started to find a lot more fun things to explore in the H [multiple speakers].
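The "Hamming Distance 1" idea described above can be sketched in a few lines of Python. This is only an illustration of the technique, not the students' actual tooling; the function names and the placeholder input bytes are assumptions.

```python
from typing import Iterator


def hamming_distance_1_variants(data: bytes) -> Iterator[bytes]:
    """Yield every copy of `data` that differs from it in exactly one bit."""
    for bit_index in range(len(data) * 8):
        mutated = bytearray(data)
        # Flip one bit: byte (bit_index // 8), bit position (bit_index % 8).
        mutated[bit_index // 8] ^= 1 << (bit_index % 8)
        yield bytes(mutated)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Placeholder bytes standing in for a video file (not a real MP4).
sample = b"\x00\x00\x01\x67"
variants = list(hamming_distance_1_variants(sample))
```

An input of n bytes produces 8n variants, so for a real video file each variant would be written out and played back one at a time rather than held in memory.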
Dave Bittner: Yeah.
Willy R. Vasquez: Explore space.
Dave Bittner: So, just for my own sort of background here, I mean in a previous world I was in the desktop video world and I remember a lot of these codecs coming to be. I remember we dealt with things like Motion JPEG and H.263, and then H.264 came out and it was kind of like this universal codec that had the capability to contain all sorts of different things, but it had a high overhead in terms of processing power. My recollection was that a lot of this stuff was baked into hardware at the time, you know, you would buy a video camera that would encode to this and the encoding was on chips. Can you give us a little bit of the backstory here on what exactly is going on when it comes to compression and decompression with a codec like this?
Willy R. Vasquez: Yes. So, all our commodity devices do have specialized hardware, as you described, in order to encode and decode videos. Either it comes as a part of GPUs, like Nvidia and AMD, or as a separate coprocessor on the systems-on-chip in devices. So, how this process usually works is you have an MP4 video file that contains the H.264 encoded contents, and this MP4 file is just a container format; it can have different codecs inside. So, the browser will first parse out the MP4 and then send the encoded contents down to the operating system, which would prepare the hardware to receive the encoded video, and then the hardware will reconstruct each frame that you'd see.
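To make the "container" point concrete, here is a minimal sketch of walking the top-level boxes of an MP4 (ISO base media file format) buffer. Each box starts with a 4-byte big-endian size and a 4-byte type. The function name `iter_boxes` is an assumption, and a real browser parser handles far more (nested boxes, sample tables, codec configuration) than this does.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, payload) for each top-level box in an MP4 buffer."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type field.
            if offset + 16 > len(data):
                raise ValueError("truncated largesize header")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # Size 0 means the box runs to the end of the buffer.
            size = len(data) - offset
        # Reject sizes smaller than the header or larger than what is left.
        if size < header or offset + size > len(data):
            raise ValueError(f"bad box size {size} at offset {offset}")
        yield box_type.decode("latin-1"), data[offset + header:offset + size]
        offset += size
```

The encoded video samples usually sit in the `mdat` box. Note that even this small parser needs explicit size checks, which is the same class of check the rest of the conversation is about.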
Dave Bittner: So, one of the things that struck me in reading your research was that, and correct me if I'm wrong here, but the H.264 standard is only laid out on the decode side. Do I have that correct?
Willy R. Vasquez: Yeah, that's correct. So, video encoding is actually a search problem. So, how codecs work is that they find similarities within frames of a video and also across frames. And so, encoding is this search problem of finding these similarities, and then the encoder will jot down the instructions to recreate each frame, and that's what the codec specifies. So, whenever you get your encoded video, the H.264 specification tells you how to take these instructions and reproduce an image, but encoding is all search-based and a lot of it is proprietary and patent-filled.
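A toy version of "encoding is a search problem" is exhaustive block matching: for one block of the current frame, try every nearby offset in the previous frame and keep the one with the smallest difference. The sketch below is an illustration only; real encoders use much smarter and often proprietary search strategies, and the names `sad` and `best_motion_vector` are assumptions.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of pixel values


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between an n x n block and a shifted reference block."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def best_motion_vector(cur: Frame, ref: Frame, bx: int, by: int,
                       n: int, search: int) -> Tuple[int, int]:
    """Return the (dx, dy) offset within +/- search that best predicts the block."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width):
                continue
            if not (0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    if best is None:
        raise ValueError("block does not fit inside the reference frame")
    return best[1], best[2]
```

The encoder would then write down the winning offset plus whatever small difference remains; the decoder never repeats the search, it only follows those instructions.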
Dave Bittner: I see. So, just to clarify that, I mean what it means is that the specification says this is what we're going to do on the decode side, but the folks who are doing the compression on that side of things, it's kind of the wild west on that side; they can do whatever they want as long as it meets the demands of what the decoder is expecting to see?
Willy R. Vasquez: Exactly, yeah. So, as part of the specification, there are all these profiles and levels. Profiles detail what kind of features are used when compressing the video, and then the level is an estimate for the expected bit rate for playback of a video. So, decoders are meant to satisfy that and encoding tries to, you know, reach a particular bit rate.
Dave Bittner: And on the decoder side, how much documentation do we have here? Is it laid out in a very specific and overt way, or is there black magic there as well?
Willy R. Vasquez: So, for the most part, there are many open source implementations. I think the most well-known one is OpenH264 by Cisco, and that is the H.264 decoder actually used in Firefox for WebRTC. And also, the people that create these specifications create reference encoders and decoders to compare your own custom decoders against. So, there's a lot of companies that create their own decoders, and I think that's one of the problems that we were able to identify: the heterogeneity of the ecosystem of decoders.
Dave Bittner: Well, let's dig into the actual security issues that you found here. Can you walk us through your research process and how you discovered things?
Willy R. Vasquez: Sure. So, as I previously mentioned, we wanted to understand why that specially crafted video was decoding differently each time. So, to understand that, it was a lot of time spent on the reference decoder and also just looking at the H.264 spec and understanding what each item means. So, how we got started is by trying to understand that video, that specially crafted video, just staring into the spec and looking at the reference decoder. And at the same time, to get a better understanding, I also began to write a decoder in Rust. This was the base of what would later become H26Forge. So, by looking at the spec and then understanding how the different syntax elements work together, and I should go back and say that the H.264 spec describes video reconstruction instructions using these things called syntax elements, and so these are variables that tell the decoder how to reconstruct the image, and each syntax element is expected to have a particular range. These are known as the H.264 semantics. So, what was going on in that crafted video is that for one of the prediction modes, the semantics were way off.
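The split between syntax (how a value is read from the bits) and semantics (what range it is allowed to take) can be sketched as follows. Many H.264 syntax elements use the unsigned Exp-Golomb code, ue(v). The conversation does not say which prediction-mode element was involved in the crafted video, so `intra_chroma_pred_mode`, whose allowed range is 0 to 3, is used here purely as an example; `BitReader` and `read_checked_ue` are hypothetical helper names.

```python
class BitReader:
    """Reads bits most-significant-bit first from a byte string."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def read_bit(self) -> int:
        if self.pos >= len(self.data) * 8:
            raise EOFError("ran out of bits")
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self) -> int:
        """Unsigned Exp-Golomb: count leading zeros, then read that many more bits."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


def read_checked_ue(reader: BitReader, name: str, lo: int, hi: int) -> int:
    """Syntax: decode the value. Semantics: reject it if it is out of range."""
    value = reader.read_ue()
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} is outside the allowed range [{lo}, {hi}]")
    return value
```

For instance, the bits `00101` decode cleanly to 4, so `read_checked_ue(BitReader(b"\x28"), "intra_chroma_pred_mode", 0, 3)` raises, while a decoder that calls `read_ue()` alone would carry the out-of-range 4 forward.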
Dave Bittner: So, we've got this oddball file that's making the decoder misbehave in an unpredictable way which, you know, anybody who works with computers would be like, wait a minute, this should be, you know, it should be repeatable, right? So, how do you dig into that and explore what's going on here under the hood?
Willy R. Vasquez: Yeah, so the first thing that we did was try to run that video under the H.264 reference decoder and see where that crashed, and that gave us an inkling of what part of the spec to look at. And then in understanding the spec, we found different areas that could be interesting to look at from a security point of view. So, there are many cases in which a variable is read in and then that's used as the loop bound. And in understanding that video, in understanding the codec, we built out this tool that became H26Forge, and first started to generate videos that had out-of-bounds syntax elements and just ran them on devices to see what was going on.
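The "variable read in and then used as the loop bound" pattern can be simulated like this. Python would normally raise an error on an out-of-range write, so the sketch models decoder memory as one flat `bytearray` in which a small table is followed by unrelated state, the way adjacent fields sit in a C struct. The table size, function names, and memory layout are all assumptions made for illustration and are not taken from any real decoder.

```python
TABLE_SIZE = 4  # hypothetical fixed-size table inside the decoder


def fill_table_unchecked(memory: bytearray, count: int, values: bytes) -> None:
    """`count` comes straight from the bitstream and is trusted as the loop bound."""
    for i in range(count):
        memory[i] = values[i]


def fill_table_checked(memory: bytearray, count: int, values: bytes) -> None:
    """Same loop, but the bound is validated against the table size first."""
    if not 0 <= count <= TABLE_SIZE:
        raise ValueError(f"count={count} exceeds table size {TABLE_SIZE}")
    if count > len(values):
        raise ValueError("bitstream ended before all entries were read")
    for i in range(count):
        memory[i] = values[i]


def fresh_memory() -> bytearray:
    """First TABLE_SIZE bytes are the table; the rest is unrelated decoder state."""
    return bytearray(TABLE_SIZE) + bytearray(b"KEY!")


# A crafted stream claims 6 entries for a 4-entry table.
corrupted = fresh_memory()
fill_table_unchecked(corrupted, 6, b"\xff" * 6)
```

After the unchecked call, the first two bytes of the neighboring state have been overwritten, which is the kind of effect a missing range check on a syntax element can have in a native decoder.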
Dave Bittner: And what did you discover when you started messing with these files?
Willy R. Vasquez: So, at first we discovered a bunch of older broken code in some Android devices. So, what we were interested in is looking at how different decoders reacted. So, depending on the output that you get, you could actually identify what decoder was being used, and you could use this as a kind of web-based fingerprinting based on the output image. And in generating these videos and running them on devices, we also created different heuristics to identify videos of interest.
Dave Bittner: Were you trying to sort of stress test the codec to see, you know, if we do this it will break here or isn't this an interesting way that it reacts if we mess with it this way?
Willy R. Vasquez: Yeah. So, we just started generating randomized videos, playing them on devices and seeing what would happen. So, some of the heuristics that we were looking at were: does the device turn off after we play this video? So, that's very interesting to see. Second was, if we decode the same video multiple times, do we get different outputs? So, that was also something worthy of investigation, and also just looking at interesting log messages. And in testing Android devices, yeah, we found a couple of issues in the hardware decoder. We were able to understand that one weird video. Essentially, how prediction inside of a particular frame works is that the frame is broken up into 16 by 16 pixel blocks, and the decoder looks at the edges to copy down information to create a prediction of the frame. And what we found is, on the topmost part of the frame, if you tried to predict up, there shouldn't be a frame there. But we were actually able to get pixels from previously decoded videos, and so that's what was going on in that video. It was reading stale information inside of the decoder. It wasn't resetting each time.
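The stale-data behavior can be illustrated with a toy decoder. This is a deliberately simplified model, not the hardware decoder's logic: it predicts each whole pixel row from the row above instead of working on 16 by 16 blocks, and the 128 fallback in the fixed path is an assumption (a stricter decoder could simply reject the stream). `ToyDecoder` and its parameters are hypothetical names.

```python
from typing import List

Row = List[int]


class ToyDecoder:
    """Predicts each row from the row above it, then adds the transmitted residual."""

    def __init__(self, width: int) -> None:
        self.width = width
        # Scratch buffer holding the last decoded row. It survives across videos.
        self.line_buffer: Row = [0] * width

    def decode_frame(self, residual_rows: List[Row], check_top: bool = False) -> List[Row]:
        frame = []
        for y, residual in enumerate(residual_rows):
            if y == 0 and check_top:
                # Nothing exists above the top row, so use a neutral predictor.
                predictor = [128] * self.width
            else:
                # Buggy path for y == 0: reads whatever the previous video left behind.
                predictor = self.line_buffer
            row = [(p + r) % 256 for p, r in zip(predictor, residual)]
            frame.append(row)
            self.line_buffer = row
        return frame


decoder = ToyDecoder(width=4)
earlier_video = decoder.decode_frame([[10, 20, 30, 40], [50, 60, 70, 80]])
# Crafted frame: "predict up" on the top row with an all-zero residual.
leaked = decoder.decode_frame([[0, 0, 0, 0]])
```

Here `leaked[0]` equals the last row of `earlier_video`, so the crafted frame's output depends on what was played before it. That also matches the second heuristic above: the same input decoding to different outputs.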
Dave Bittner: Wow. Tell me about H.264 itself. Is it easy to work with? Is it challenging or where does it stand?
Willy R. Vasquez: So, there are two challenges when trying to work with H.264 encoded videos. First is that the values are encoded at the bit level, meaning that traditional fuzzers like AFL couldn't set a particular syntax element to a chosen value. This is because AFL tries to work at byte-level granularity, so that's one issue, and then the second issue is the cascading effects between syntax elements. So, if you change one parameter, that's going to change how the rest of the video is decoded. So, what our tool aims to do is just change one particular element but keep everything else the same or, in other words, it tries to make sure that the specific values are decoded correctly, and it's more in how the decoder uses those values where the issues arise.
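Both challenges show up in a small sketch of the unsigned Exp-Golomb code that H.264 uses for many syntax elements. Values occupy a variable number of bits, so they rarely line up with byte boundaries, and changing one value shifts every bit after it. This is an illustration only and is not H26Forge's code; `encode_ue` and `pack` are hypothetical names.

```python
from typing import Iterable


def encode_ue(value: int) -> str:
    """Unsigned Exp-Golomb code of a non-negative integer, as a string of '0'/'1'."""
    if value < 0:
        raise ValueError("ue(v) encodes non-negative integers only")
    code = bin(value + 1)[2:]               # binary digits of value + 1
    return "0" * (len(code) - 1) + code      # prefix with len - 1 zeros


def pack(values: Iterable[int]) -> str:
    """Concatenate the codes of several syntax elements into one bit string."""
    return "".join(encode_ue(v) for v in values)


original = pack([0, 1, 2])   # three elements in 7 bits
modified = pack([3, 1, 2])   # only the first element changed
```

In `original` the second element starts at bit 1; in `modified` it starts at bit 5, and the stream has grown from 7 to 11 bits. A tool that edits one element therefore has to re-encode everything around it so that the rest still decodes to the same values, which byte-oriented mutation does not do.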
Dave Bittner: You know, what's interesting to me, again, you know, reaching back in my memory from the video side of things, being a video producer, I remember, you know, year after year the big camera manufacturers, the Sonys and the Panasonics of the world, their quality would get better each year using, you know, the same bit rate, same codec, but somehow whatever they were doing in there, it would get better and better each year. And I guess that sort of speaks to some of the research that you've done here, where they had a lot of flexibility or leeway on what they could do on the encode side.
Willy R. Vasquez: Yeah, correct. On the encode side, there's newer patents, there's faster chips, and so they can utilize different features of the H.264 codec and, also I should say that the codec itself has not remained dormant since it first came around in the early 2000s. Every so many years, they keep adding new stuff to it, new updates, so I think something interesting for security researchers is, you know, wherever there's new code there's likely new vulnerabilities, and so, I think our tool can help explore those issues as well.
Dave Bittner: So, what are the potential issues here? As you all dig into it, to what degree is this actually a real world security concern as opposed to, you know, an interesting finding from a research point-of-view? Where do we stand there?
Willy R. Vasquez: We all believe that this is a very important area to do research in, especially given that some of these issues in video decoders are being exploited in the wild. So, in the paper we talk about a root cause analysis done by Natalie Silvanovich of Google Project Zero for an Apple H.264 video decoder kernel bug. So, I mentioned before that we were doing a lot of work on Android, but then once we learned about this in-the-wild vulnerability, we were like, okay, let's get some iPhones, and began poking around there, and we were able to find a lot of issues inside of an older Apple video decoder. So, yeah, we think it's really concerning that these kinds of issues are being exploited in the wild, given that there's a possibility for zero-click exploits. So, someone just sends you a video, and while a thumbnail is being generated, that goes to the same video decoding pipeline, so the vulnerability can be there and you may not even notice, or alternatively you're just browsing the web and you get this video in an ad.
Dave Bittner: And it's fair to say, I mean, pretty much every bit of computing hardware that you get these days that has a display on it has some capability of decoding H.264 video.
Willy R. Vasquez: Definitely. I think as you mentioned in the beginning, that this codec has been around for a while. So, it's used by a lot of video companies as almost a default codec. It's assumed that every device can decode H.264, so you know, they'll experiment with newer codecs, but they always know that they can fall back to H.264. This is why we went ahead and said, you know, the most dangerous codec in the world.
Dave Bittner: Yeah. Do you suppose that that's a big part of this here? That, you know, with H.264, I suspect they were thinking about the spec back in the 90s, I'm guessing, and, you know, they probably weren't thinking about cybersecurity the way that we are today. Everything was hardware-based. It was a lot harder to do back then.
Willy R. Vasquez: Yeah. I think that the codec developers did have a good sense of the kind of issues that can arise, and inside of the spec, they do say that, hey, for each variable, this is the expected range, but the challenge comes from, you know, the actual implementation of the spec, in which errors can arise. You know, people may miss a bounds check and that can lead to the many vulnerabilities that we found.
Dave Bittner: This tool that you all created here, as you say, H26Forge, is that generally available? Can folks do their own work with it?
Willy R. Vasquez: Yeah. We're working on cleaning up the code and we plan to release it before August when this work will be presented at USENIX Security.
Dave Bittner: Alright, terrific. Willy, thanks so much for taking the time for us, fascinating conversation.
Willy R. Vasquez: Yeah, thanks Dave.
Dave Bittner: Our thanks to Willy R. Vasquez from the University of Texas at Austin for joining us. The research is titled, "The Most Dangerous Codec in the World; Finding and Exploiting Vulnerabilities in H.264 Decoders." We'll have a link in the Show Notes.
The CyberWire Research Saturday Podcast is a production of N2K Networks, proudly produced in Maryland out of the startup studios of DataTribe, where they're cobuilding the next generation of cybersecurity teams and technologies. This episode was produced by Liz Irvin, and Senior Producer, Jennifer Eiben, our mixer is Elliot Peltzman, our executive editor is Peter Kilpe, and I'm Dave Bittner. Thanks for listening.