What To Look For In CVEs
Dan Mellinger: Today on Security Science, CVE data is often misinterpreted; here's what to look for. Thank you for joining us. I'm Dan Mellinger and with me today is Jerry Gamblin, who's learned to speak the language of CVE, and is going to teach us what to look out for in CVE data to make better prioritization decisions. What's up, Jerry?
Jerry Gamblin: Not much, how's it going? Good to be back.
Dan Mellinger: Yeah, it's been a little while. Glad that you're getting to travel a little bit lately, getting back out on the road post- COVID, right?
Jerry Gamblin: Yeah. Just finished up at AWS re:Invent and looking forward to taking my third and last trip of the year this upcoming week. So it'll be good.
Dan Mellinger: Woo. Well, it's nice to have you back on the show. This is actually a pretty cool topic. So we're breaking down a lot of the misconceptions with common vulnerabilities and exposures, and so we'll get into it. But I wanted to start with a little bit of a primer on CVEs, just in case this is anyone's first time listening to the podcast or trying to get into security in general. So the mission of CVE, or Common Vulnerabilities and Exposures, the program itself, is to identify, define, and catalog publicly disclosed cybersecurity vulnerabilities. So there is one CVE record for each vulnerability in the catalog. The vulnerabilities are discovered and then assigned and published by organizations from around the world. So they're partners with CVE and we'll get into that, because that also comes with some downsides. The partners publish these CVE records to communicate consistent descriptions of vulnerabilities, which can be accurate or maybe not. And then information technology and cybersecurity professionals use CVE records to ensure they're discussing the same issue and to coordinate their efforts to prioritize and address vulnerabilities. So all of that is the goal. How effective all those little components are in reality is actually what we're talking about right now. And then I was just going to start with this little intro. So this is really based on a byline that you wrote, Jerry, for Dark Reading and it's called CVE Data Is Often Misinterpreted: Here's What to Look For. So I'm going to read your intro and then we'll kick off from there. And we will link that article on the podcast page. Most people only ever give common vulnerabilities and exposures a passing glance. They may look at the Common Vulnerability Scoring System, or CVSS, score, determine whether the list of affected products is a concern for them, and then move on. And that's not really surprising.
There's more to sift through than ever, considering there have been more than 14,000 CVEs and counting just published in 2021. It isn't practical to try to investigate them all. So we are on pace to see nearly 40% more CVEs in 2021 than last year. So when you do see a CVE that might apply to you, how can you tell? What should you be looking for, and what should you be looking at, to determine if it's worth your time? Unfortunately, you can't just read the title of the CVE and know whether it's safe to ignore. Within CVE data, there are actionable details that can help address your security concerns, including auxiliary data points like common platform enumeration specifics. It requires a little bit of extra work, but there could be a big payoff if you identify and patch a vulnerability before it's exploited. So let's dig deeper into this. Jerry, where do CVEs come from?
Jerry Gamblin: From the vendor, from researchers. So CVEs are the general reporting guideline for vulnerabilities. The program is run by MITRE for the US government. It started back in 1999; they decided at Purdue to put this together. I think we talked about that in an earlier podcast recording. They really are supposed to be the final resting place, the final public disclosure, for vulnerabilities. So everybody can go to one place and find the vulnerabilities and know that that's the source of truth. It's weird. Before CVEs the data was everywhere, and for 10, 15 years, up until about two years ago, the data was always in CVE. And that was the good point. And we're stretching away from that. And we'll get into that a little bit more as we go on, but there are so many data points now that CVE is getting, for lack of a better term, clogged up and a little behind the times.
Dan Mellinger: Interesting. So before '99, it was just shared, right? It was through the community, it was on... What are they called, message boards?
Jerry Gamblin: Bulletin boards. Yeah, BBS's.
Dan Mellinger: Yeah. So it was all over the place and the whole goal of this was to unify things. So it least people had a database of everything that's been discovered, roughly using the same characteristics to describe stuff, right?
Jerry Gamblin: As a history geek, I like to talk to people and tell them that it's the Rosetta Stone of security, is that it's the common language. So no matter what other language it's in, if you could get back to a CVE, you know you're going to get the details you want and from there you can spread out and read it in different platforms.
Dan Mellinger: Yeah. And it was interesting because to your point, we've done some research on the P2P reports and shown growth over time. And then we've done a podcast episode talking about that as well. But at first it was relatively low volume, so pretty easy for people to manage. And that is absolutely not the case anymore. So CVE has partnered with these CVE Numbering Authorities, right? So CNAs.
Jerry Gamblin: CNA. Correct.
Dan Mellinger: And their whole goal is basically they're vendors and research groups that are pre- validated to work with CVE and input data directly, right?
Jerry Gamblin: Yeah. So they can directly put data into MITRE. So they're able to fill out and have nearly direct access. So they become the people who check the data before it's pushed up to MITRE. It was supposed to make the system more streamlined and take some of the pressure away from a contractor for doing all the work onto more of the end user, of the end group.
Dan Mellinger: Which is interesting, because we did note a massive spike when CNAs started to become a process. CVE wasn't able to keep up with the volume of submissions so they were being published late. They rolled out the CNA process, and that year we saw things spike by a third in volume, I think, compared to the years before. And it hasn't slowed down. In fact, it's continued to increase every single year.
Jerry Gamblin: Yeah, we're seeing it again this year. We'll put the data in the blog post hopefully, but this year, the number one submitter of CVEs is GitHub at 1002.
Dan Mellinger: Wow.
Jerry Gamblin: Microsoft, their whole team has 803. And then the next one is WPScan.com, which just finds vulnerabilities in WordPress. And they're at 691 as of today.
Dan Mellinger: None of that honestly sounds like that much of a surprise.
Jerry Gamblin: No, it doesn't. But if you think about it, Microsoft is in the middle there, and the first one is all open source: people on GitHub can file a CVE. Right now, if you go and open a security issue on any repo in GitHub, you have the ability to request a CVE for that issue, and it'll go through and be scored and automatically created.
Dan Mellinger: Ah, that's interesting. People are increasingly publishing vulnerabilities to GitHub as well. We did a piece on that with Jay Jacobs earlier this year.
Jerry Gamblin: Yeah. Vulnerability proof-of-concepts. GitHub is becoming that funnel where it not only hosts some of the most popular code in the world, but it's also becoming a very, very research-heavy place where you see proof-of-concept code that lives there and is widespread.
Dan Mellinger: Interesting. You raise a good point as well. So having this volume come in through the CNAs, ultimately it's good to have this stuff out there, but it results in some challenges. So what are some of the downsides of having this many CNAs, of these companies having the ability to basically directly submit their own vulnerabilities, and of researchers essentially being able to automatically create CVEs as they're uploading to GitHub?
Jerry Gamblin: The data is bad, sometimes. It might not seem like a lot, but a 1 or 2% error rate in the CVE database can really cause headaches for some people. If you're a Microsoft Windows user, you might have switched to the Edge browser, which is a Chromium-based browser. Microsoft always did a really good job on CPEs for Internet Explorer on Windows. So you could say, I have this version of Internet Explorer, and you could go to the NVD database and find all the open CVEs. They've really struggled, and fallen flat on their faces a few times this last year, trying to get the CPE data correct for Edge. So that's one of the main examples. If you have a version of Edge and you're trying to find the open CVEs for it, the CPE data today isn't accurate.
Dan Mellinger: Interesting. And that just makes it harder to import data and do a better analysis, right?
Jerry Gamblin: Yeah. Think of it this way. A lot of people, and not even just some people, everybody does the same thing. If you're a big company or a security vendor, what you do is you say, "Okay, what version of Chrome are you running? I'm running Chrome version 82." So the first thing you do is find the Common Platform Enumeration, the CPE, of that string. It'll just say, here's the vendor, Google; here's the product, Chrome; here's the version, 82. And then it has some other information. You can then take that data and run it through the NVD. And then one of the data points there, one a lot of people overlook, is that it'll give you back a list of all the open CVEs for that version of the software. It's a way to reverse look up in the NVD database, to be able to see what vulnerabilities are in the software. The problem happens when somebody doesn't do a good job and just says this CVE is for all versions, no matter when it was published. If you go and look for it today, that's the case with Edge: you'll get a hundred CVEs and there's no way to narrow it down to the versions that are actually affected.
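As a rough illustration of the reverse lookup Jerry describes, here is a minimal sketch that builds an NVD query for a given CPE string. The endpoint and `cpeName` parameter follow the public NVD REST API 2.0; treat the exact endpoint, parameter name, and the Chrome version shown as assumptions to check against current NVD documentation.

```python
from urllib.parse import urlencode

# Base endpoint for the NVD REST API 2.0 (an assumption to verify; older
# integrations used the 1.x JSON feeds instead).
NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def cve_lookup_url(vendor, product, version):
    """Build the reverse-lookup URL: a CPE string in, open CVEs out."""
    # CPE 2.3 formatted string: part "a" means "application"; the trailing
    # asterisks are wildcards for update, edition, language, and so on.
    cpe = f"cpe:2.3:a:{vendor}:{product}:{version}:*:*:*:*:*:*:*"
    return f"{NVD_API}?{urlencode({'cpeName': cpe})}"

# Hypothetical example version, for illustration only.
url = cve_lookup_url("google", "chrome", "82.0.4062.3")
```

Fetching that URL returns JSON listing every CVE whose CPE data matches the version you passed in, which is exactly why bad CPE data (as with Edge) poisons the whole lookup.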
Dan Mellinger: Interesting. And just for everyone listening, NVD stands for National Vulnerability Database. I think you have a fun quote here that makes the NVD more like a best effort and less like the source of truth that it was intended to be.
Jerry Gamblin: That's what it's becoming. And you're starting to see some data slowness and some data issues. One good example here: VMware is having issues getting their stuff into the NVD in time. We've seen that a couple of times this year. VMware will put out their publication that says, "Hey, there's a major vulnerability here. Here's what the CVE is going to be. Here's the CVSS score. Here's the patch." And we've seen those sit on VMware's pages for two or three days before they've made it through the flow and been published in MITRE and then the NVD. So anybody who's waiting, like Kenna Security, like Cisco, for it to be in the NVD is a little bit behind now, just because we're waiting on that source of truth. So we as a company have started to try to move more and more to the edge to get around some of this lag. We're talking to VMware, to Microsoft, to most of the major CNAs directly now, and trying to pull that data as soon as they publish it on their sites.
Dan Mellinger: Interesting. So there's a data lag, and that lag can leave people vulnerable if a vulnerability's published and there are a few days in between before it actually hits the databases that most companies use for lookups to base their programs on. That's very, very interesting. You also talk here about how even the CVE description fields are limited, and how they can limit the accuracy of the data that you get from them as well.
Jerry Gamblin: Yeah. CVE description fields are 500 characters long, and explaining what a vulnerability means correctly and clearly in that space, so that you can read it and understand it, or so that someone smart on your team, like we have Michael Roytman, can write some NLP to pick out keywords, is hard. If they're not written correctly or clearly, you can just miss the whole meaning. And it can go either way: it can be overly descriptive, where it makes it sound like it's the worst thing in the world, or it can undervalue what a CVE is actually doing.
Dan Mellinger: Interesting. So there's supposed to be standards from that, but I think as a side corollary of automation and things like that, there's a little bit less rigor. So they're just suggestions, but not necessarily pulled through every time.
Jerry Gamblin: Correct.
Dan Mellinger: Which can make things difficult if you're trying to automate things like correlation, for example, and the description is Microsoft Word vulnerability.
Jerry Gamblin: Yes.
Dan Mellinger: Or overly descriptive and people get a little scared because I don't know. What's a good example of that?
Jerry Gamblin: Well, when it's overly descriptive, it can downplay things. It could be like, this specific version of Microsoft Word does this remote code execution on this platform. It over-describes what the vulnerability is, and you might not think it applies to you. We've seen that time and time again.
Dan Mellinger: Interesting. It seems like there was some good basis for why the data can be misleading or why you actually need to pay a little bit extra attention to this stuff. So let's dig in a little bit on some of the best practices here. I think you started with the 500 character description issue, talking about linking back to security advisories. So let's start there.
Jerry Gamblin: Yeah. So every CVE should have a security advisory that comes from a CNA. And I always try to go and look at those, and that's usually much more in-depth. It's normally hosted on the vendor's, the CNA's, website, and it should really do a deep dive into what's actually happening. Maybe have some screenshots. Should have a link to a patch. Should have a link to maybe a discussion board or to how you can get support. If you think of the CVE page as the SparkNotes for a CVE, the advisory page should be the full novel.
Dan Mellinger: Doing your homework, instead of reading the Cliffs Notes. Got it.
Jerry Gamblin: Yes, exactly.
Dan Mellinger: That makes a lot of sense. And in this case, let's pick on Microsoft because it's easy to do. Going back to their advisory, you'd click through the CVE, which should have a link to the Microsoft advisory, which they're actually very, very good about. You go back and you can get all the details on the software versions, how it's implemented, how they found it, what the timeline was roughly, all those background details, so you can really make a much better decision.
Jerry Gamblin: Yeah. Are they seeing exploits, et cetera, et cetera.
Dan Mellinger: Yep. Got it. And that's typically also where you're going to find patches for almost everything.
Jerry Gamblin: It should be. Yeah.
Dan Mellinger: Got it. Cool. And then there's also, you brought up VMware as well. So they're pretty good about putting out advisories. The lag time is a different story, but you advise, go check that out directly? And is there a list of companies that they should keep an eye out for, that post pretty good advisories, might have...
Jerry Gamblin: That comes down to the other part. When you start doing this on your own, you really need to figure out what you're running, and then figure out who those companies are. I teach a class, I taught it over at BSides London, on using this open source NVD data to build your own alerting system. If you know you're running VMware on your stuff, I can help you build, or we can actually link to, a VMware scraper that we have that'll go out and scrape those advisories so that you can keep track of them. There's no need to look at the WPScan site, which is the third biggest CVE publisher, if you're not running WordPress anywhere. So it really comes down to what's on your network and what you care about.
Dan Mellinger: Got it. So look at where you may be vulnerable, what you actually own, and then prioritize their importance and then go from there to build some tooling, possibly things like that.
Jerry Gamblin: And most of these companies, like VMware, have alerting. So if you're a customer, you can say, "Hey, if you publish a new security advisory, you send me an email, maybe a Slack message or a webhook of some kind."
Dan Mellinger: Got it. Nice. And then you also go into length on CPE data, and it seems like that might be a good way to help reverse look up and/or build some tooling to automate some of this stuff.
Jerry Gamblin: Exactly. CPE data is some of the most important and most overlooked data in the NVD, period. It just allows you to know what version is vulnerable to this CVE, and it's so important because there are so many versions of software. I don't have the number in front of me, but we were talking about this recently, about how many versions of Chrome have been put out this year. I think it's been 30, maybe? 30 main versions of Chrome. It seems like every other week they're asking you to patch your Chrome. So it's one of those things where you really need to know that this CVE that came out affects this version of Chrome, version 82.5, but we've already updated past that, so we don't have to worry about this CVE anymore.
Dan Mellinger: Got it. And CPE stands for a Common Platform Enumeration. So what is that exactly?
Jerry Gamblin: It's a version schema. And we'll link to the schema because it's easier to see it. What it does is it just tells you who the manufacturer of the software is, what the common name of the software is, and then what version of the software is vulnerable. So it would be something like google, colon, chrome, colon, and then the version, 82.5. And then there are about six more colon-separated fields after that where you can add extra data. But it's normally those first three that you look for.
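The colon-separated layout Jerry is describing can be sketched with a naive parser. The field names follow the CPE 2.3 naming specification, and the example string is made up; note that real CPE strings can contain escaped colons, which this simple split ignores.

```python
# The 13 colon-separated fields of a CPE 2.3 formatted string. The three
# Jerry mentions (vendor, product, version) come right after the "part"
# field ("a" for application, "o" for OS, "h" for hardware).
CPE_FIELDS = [
    "prefix", "cpe_version", "part", "vendor", "product", "version",
    "update", "edition", "language", "sw_edition", "target_sw",
    "target_hw", "other",
]

def parse_cpe(cpe_string):
    """Naive split of a CPE 2.3 string into named fields.

    Caveat: does not handle escaped colons ("\\:") that can appear in
    real-world CPE values.
    """
    return dict(zip(CPE_FIELDS, cpe_string.split(":")))

cpe = parse_cpe("cpe:2.3:a:google:chrome:82.0.4062.3:*:*:*:*:*:*:*")
# cpe["vendor"] -> "google", cpe["product"] -> "chrome",
# cpe["version"] -> "82.0.4062.3"
```

The asterisks are wildcards for the "extra data" fields Jerry mentions: update, edition, language, and so on.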
Dan Mellinger: So it's a universal way to basically enumerate the platform, so software, version and all that good stuff, so that whatever you're searching for matches up with whatever you actually own.
Jerry Gamblin: Correct.
Dan Mellinger: So you talk about using some of these data points and plugging them into a JSON schema, things like that. So let's talk about some of this automation stuff, because I think that could be pretty powerful.
Jerry Gamblin: Yeah. We support the EPSS. This is the data that they use. All of the key data points that they use are in here: the vector, the network vector. Is this vulnerability network or local? Is this vulnerability remote code execution? What's the likelihood? What's the CVSS score? It has all of those data points in the schema. And you can build your own data warehouse on top of that. If you're running in Amazon or Google, you don't care about local exploits, because nobody's going into the Google data center and plugging a USB drive into your Linux server, hopefully.
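The kind of filtering Jerry describes, dropping local-only vulnerabilities when your fleet is all cloud-hosted, can be sketched in a few lines. The records below are simplified stand-ins for NVD entries (the real CVSS v3 data nests the attack vector and base score deeper in the JSON), and the IDs and scores are made up.

```python
# Toy CVE records shaped loosely like NVD entries; IDs and scores are
# invented for illustration.
records = [
    {"id": "CVE-2021-11111", "attack_vector": "NETWORK", "base_score": 9.8},
    {"id": "CVE-2021-22222", "attack_vector": "LOCAL",   "base_score": 7.8},
    {"id": "CVE-2021-33333", "attack_vector": "NETWORK", "base_score": 5.3},
]

def network_exploitable(cves, min_score=0.0):
    """Keep CVEs reachable over the network, at or above a CVSS threshold."""
    return [c["id"] for c in cves
            if c["attack_vector"] == "NETWORK" and c["base_score"] >= min_score]

network_exploitable(records, min_score=7.0)  # -> ["CVE-2021-11111"]
```

Swapping in other schema fields (remote code execution, likelihood, CVSS score) follows the same pattern, which is how this kind of pipeline filters down the noise of 42 new CVEs a day.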
Dan Mellinger: Got it. Oh yeah, yeah. Yeah. Okay.
Jerry Gamblin: So it's just the ability to really filter down the noise. And that becomes pretty important. I think I just looked today and we're up to an average of 42 CVEs published every day. This year, I was hoping... Not hoping, that's the wrong word. My guesstimate was that it would be 50. So we're running under what I thought we would. But at 42 CVEs a day, nobody, or very few companies, has the time to have somebody go through those and see which ones really affect them and which ones don't. That's why you either need to invest in a platform like Kenna.VI+ that can help you, or you have to build your own automation to really help you dig through and make sure you're not missing those CVEs that are important to you and your company.
Dan Mellinger: Prioritization is key.
Jerry Gamblin: Yes, exactly.
Dan Mellinger: Yeah. Well, and then also you talk about one of the final challenges, that CVE records aren't required to include the versions that are affected, right?
Jerry Gamblin: Yeah. The CPEs aren't required, which is terrible if you're trying to look something up and understand it. The versions might be in the description, they might not. So a lot of times you just have to guess. I wrote a whole thing on my personal blog about this. There are a bunch that are just wide open. There are CVEs for IE from 2016 that have bad CPE data, and no matter what version of Internet Explorer you're on, it's always going to show up as vulnerable to those. So that's just something that you have to learn to ignore, because they're not going to go back and fix the data. It's really, really frustrating, and at some point the thing is to figure out how to not make it frustrating and how to automate your way out of this. Because we're getting to the point where we keep growing these CVE numbers. We were at 40% growth when we wrote that Dark Reading article; it's come down a little bit. We're probably in the thirties now. But we'll finish the year with 15,000 CVEs, which will be an all-time high. And if the guess that I took through some modeling is correct, next year may be over 20,000. It just keeps growing and there's no stopping it. And not in a bad way. That's what people ask me: "Should CVEs be harder to get?" I don't think so, but I think it changes how we have to use the data. If you can't read the data individually and look for individual records, it has to be automated, at large scale and fast. I think that's what companies like ours do, and I think that's what end users and big corporations are going to have to do to be able to stay on top of this.
Dan Mellinger: Yeah. It makes a ton of sense. Well, and ideally, the data would just be a little cleaner and more consistent, to allow companies to action it a little bit quicker.
Jerry Gamblin: I want to make sure that people understand this. The data is clean. It's probably 95% accurate out of the box, which is great. Anywhere else, 95% data accuracy is clean data for most people. But when you're talking about vulnerabilities and getting hacked, getting that last 5% cleaned up is really, really important for our community and for everyone who relies on the database being clean.
Dan Mellinger: Well, if we look at some of our research, 2-5% of CVEs are exploited in the wild, so if there's an overlap with that margin of error, that's not a good thing. And speaking of, the last piece of guidance in your byline was that there's actually an API to scrape some of this stuff.
Jerry Gamblin: Yeah. NVD has an API; they produce all these records in JSON that get updated hourly. I have a bunch of Jupyter Notebooks on my personal GitHub and on the Kenna GitHub that you can go pull down and look at, to start looking at this data yourself, because the easiest way to get into it is to start to visualize it and poke around in it yourself.
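To give a feel for poking around in that JSON, here is a minimal sketch of pulling descriptions out of an NVD 1.1-format feed. The `feed` below is a trimmed, invented stand-in with the same shape as the real feeds; in practice you would download and gunzip one of the hourly-updated files from nvd.nist.gov before loading it.

```python
import json

# Trimmed stand-in for the NVD JSON 1.1 feed format; the CVE IDs and
# description texts here are invented for illustration.
feed = json.loads("""
{
  "CVE_Items": [
    {"cve": {"CVE_data_meta": {"ID": "CVE-2021-0001"},
             "description": {"description_data": [
               {"lang": "en", "value": "Example vulnerability one."}]}}},
    {"cve": {"CVE_data_meta": {"ID": "CVE-2021-0002"},
             "description": {"description_data": [
               {"lang": "en", "value": "Example vulnerability two."}]}}}
  ]
}
""")

def descriptions(feed):
    """Map CVE ID -> English description for every item in a 1.1-format feed."""
    out = {}
    for item in feed["CVE_Items"]:
        cve_id = item["cve"]["CVE_data_meta"]["ID"]
        texts = item["cve"]["description"]["description_data"]
        out[cve_id] = next(d["value"] for d in texts if d["lang"] == "en")
    return out

descriptions(feed)  # two entries, keyed by CVE ID
```

Dropping this into a Jupyter Notebook and charting counts over time is exactly the kind of visualize-and-poke-around exploration Jerry is suggesting.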
Dan Mellinger: Awesome. I think those are good best practices to lead off with. We will post links to the byline, the definition of CPE, the CVE list, Jerry's personal blog and his GitHub. If you want to learn more about Jupyter Notebooks, because Jerry seems to be particularly fond of those, you can tweet at him directly. His DMs are open. I'm sure we'll just...
Jerry Gamblin: Yep. Slide on in.
Dan Mellinger: Awesome. And then just a quick reminder as well, don't forget to sign up and register. You can get your (ISC)2 Continuing Education credits as well for listening to the podcast. So Jerry, any final thoughts before we hop off?
Jerry Gamblin: No, thank you so much and have a great rest of the year.
Dan Mellinger: All right. Thanks everyone.
CVE data is often misinterpreted. Jerry Gamblin discusses why that is and what to look for to get the most out of CVE data.