When it comes to open source software, it’s natural for development and security leaders to want to know that the code they’re using is secure. Historically, they’ve relied on traditional software composition analysis (SCA) solutions and the National Vulnerability Database to mine for open source issues. Yet one little-discussed fact is that open source begets open source. Developers use open source libraries to speed up development by adding ready-made functionality to their code. The libraries they select and use are called direct dependencies, and often those direct dependencies have dependencies of their own.
Just like any other piece of software, open source libraries often rely on other open source libraries to achieve the desired functionality and goal. When developers choose an open source library, they may not be aware of the indirect dependencies they are stitching into their software. At Veracode, we’ve seen anywhere from two to more than 10 levels of libraries being called on, one after the other. Once you start assessing each level of library, the volume of vulnerabilities can skyrocket beyond your team’s ability to manage them.
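To make the idea of dependency levels concrete, here is a minimal sketch that walks a dependency graph breadth-first and records how many levels away each library sits from the application. The graph, the library names, and the `dependency_levels` helper are all hypothetical and made up for illustration; a real project would read this information from its package manager.

```python
from collections import deque

# Hypothetical dependency graph: each library maps to the libraries it depends on.
DEPENDENCIES = {
    "my-app": ["web-framework", "json-parser"],
    "web-framework": ["http-client", "template-engine"],
    "http-client": ["tls-lib"],
    "template-engine": [],
    "json-parser": [],
    "tls-lib": [],
}

def dependency_levels(graph, root):
    """Return {library: level}, where level 1 means a direct dependency of root."""
    levels = {}
    queue = deque([(root, 0)])
    while queue:
        name, level = queue.popleft()
        for dep in graph.get(name, []):
            # Record each library once, at the shallowest level it appears.
            if dep != root and dep not in levels:
                levels[dep] = level + 1
                queue.append((dep, level + 1))
    return levels

levels = dependency_levels(DEPENDENCIES, "my-app")
direct = sorted(name for name, lvl in levels.items() if lvl == 1)
indirect = sorted(name for name, lvl in levels.items() if lvl > 1)

print("direct:", direct)
print("indirect:", indirect)
print("deepest level:", max(levels.values()))
```

Even in this toy graph, the developer chose two libraries but ends up shipping five, and three of them were never selected directly.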
Using software composition analysis is a strong first step toward addressing some of this open source risk, but what happens when an open source library contributor fixes a security vulnerability and doesn’t tell anyone? Or when the time between submission and publication, with an organization like the National Vulnerability Database, is too long to wait?
A Database Is Only as Good as the Data it Captures
The National Vulnerability Database (NVD), upon which most traditional SCA solutions rely, is one of the most robust and widely used sources of vulnerability data available today, cataloguing tens of thousands of vulnerabilities across all application types and open source libraries. While it is no doubt a valuable and necessary library of flaws and fixes, through no fault of its own, the NVD is unable to keep pace with the volume of vulnerabilities disclosed and updated on a daily basis. Open source library vulnerabilities get stuck in a logjam behind everything else that is disclosed.
It’s important to note that vulnerabilities only make it into the database if a software developer or independent security researcher submits them. It’s common for a vulnerability to be fixed, but never disclosed or submitted to the NVD. For example, the Apache Struts Remote Code Execution vulnerability, the same type that led to the Equifax breach in 2017, was patched in April 2018 but not disclosed to the public until August of that same year.
Four months is plenty of time for malicious actors looking to take advantage of vulnerable software. Anyone monitoring the library’s commit logs would have known about the vulnerability well before organizations learned they needed to update to the latest version of the component.
Machine Learning and Natural Language Close the Gap
Machine learning can automate the identification of potential security vulnerabilities from commit messages and bug reports. In open source projects, bugs are typically tracked with issue trackers, and code changes are merged in the form of commits to source control repositories. If an organization were able to monitor all of these repositories and review each new bug issue and commit message, it could identify potential vulnerabilities. However, there are tens of thousands of open source repositories, with hundreds of thousands of bug-tracking issues and commit messages to comb through, and new ones arrive every day.
Natural language processing and machine learning can identify potential vulnerabilities in open source libraries with a high level of accuracy. By analyzing the patterns found in past commit messages and bug-tracking issues using machine learning, our model can identify when new commits or bug issues resemble a silent fix of a potential vulnerability. These potential vulnerabilities are then raised to security researchers.
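As a rough illustration of the general technique, and not a description of our production model, the sketch below trains a tiny naive Bayes text classifier on labeled commit messages and scores new ones. The training messages, labels, and function names are all hypothetical; a real system would learn from a far larger corpus and use richer features.

```python
import math
import re
from collections import Counter

# Hypothetical labeled commit messages (1 = security-related fix, 0 = other).
TRAINING = [
    ("fix buffer overflow in header parser", 1),
    ("sanitize user input to prevent injection", 1),
    ("prevent path traversal when extracting archives", 1),
    ("validate length before copy to avoid overflow", 1),
    ("update readme with install instructions", 0),
    ("add dark mode to settings page", 0),
    ("refactor logging module for readability", 0),
    ("bump version and update changelog", 0),
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count word occurrences per class and collect the shared vocabulary."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def security_score(message, model):
    """Return naive Bayes log-odds; values above 0 lean security-related."""
    word_counts, class_counts, vocab = model
    total_1 = sum(word_counts[1].values())
    total_0 = sum(word_counts[0].values())
    log_odds = math.log(class_counts[1]) - math.log(class_counts[0])
    for token in tokenize(message):
        if token not in vocab:
            continue  # Words never seen in training carry no evidence here.
        # Laplace (add-one) smoothing keeps unseen class/word pairs non-zero.
        p1 = (word_counts[1][token] + 1) / (total_1 + len(vocab))
        p0 = (word_counts[0][token] + 1) / (total_0 + len(vocab))
        log_odds += math.log(p1) - math.log(p0)
    return log_odds

model = train(TRAINING)
for msg in ["avoid overflow when parsing input", "update changelog and readme"]:
    label = "review" if security_score(msg, model) > 0 else "ignore"
    print(f"{label}: {msg}")
```

Note that neither example message mentions a vulnerability or a CVE. The classifier flags the first one only because its wording resembles past security fixes, which is the kind of signal that lets a silent fix surface for human review.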
These silent fixes can be a silent killer for your data protection.
Modern Software Composition Analysis Designed for Modern Application Development
We have developed our own database that includes all of the open source vulnerabilities in the NVD, as well as our own list of vulnerabilities in open source libraries that have not yet been disclosed to the NVD. In many cases, the vulnerabilities we find and record are in the window between patching and full public disclosure; in other cases, there was never any intent to disclose the vulnerability and its fix. A third category we track is “Reserved CVEs.” We take the Reserved CVE IDs from the NVD and then find the vulnerabilities in the public repos, in order to give you a head start on the fix prior to full public disclosure.
To learn more about how to use these silent fixes to your advantage by putting your development team on an even playing field with attackers, download our free white paper, Accelerating Software Development with Secure Open Source Software.