Can one approach to application security solve all your problems? Of course this is a silly question, as anyone tasked with reducing the risk of their application layer knows. The only people who ask it are vendors … who of course have a vested interest in drumming up business for their offerings. This week we are all treated to watching this spectacle play out in the pages of Dark Reading, loosely disguised as a discussion about a new industry benchmark. While vendors sling arrows at each other, the benchmark itself isn’t getting much attention, and I think it would benefit us all to focus on what’s important here: the benchmark.
If you haven’t been following the drama, over the past few days, the general manager of HP’s Fortify division, Jason Schmitt, and the CTO and Co-founder of Contrast Security, Jeff Williams, have been in a tit-for-tat argument over this question. In a post published yesterday, Williams points to a new benchmark from OWASP as a good way to objectively evaluate the strengths and weaknesses of different application security tools.
It’s no surprise that Williams is promoting the OWASP benchmark, since his company’s application testing tool performs well given the parameters that OWASP created. I understand his position – and the pressure to drum up interest in the one technology he sells – since 10 years ago I was in a similar situation and Veracode only had one offering to sell. With experience over time, the industry and I have come a long way in our thinking about how to solve the application security problem. No longer are questions like ‘on-premises or SaaS?’ an issue. Similarly, businesses aren’t asking for SAST, IAST, DAST, etc. – they’re asking, ‘how do I solve my problem?’ and the right answer is, ‘with a little bit of everything, depending on your environment’. Any other answer puts the industry back 10 years (in my opinion).
Benchmarks are important and I applaud the work OWASP has done. Yet it is still early days for the benchmark, now in version 1.2 beta. I expect it to become more accurate and the results presented more fairly in the future as other application security vendors and security experts dig in and help improve it.
Before I talk about the benchmark, I want to be transparent. Veracode believes that IAST (Interactive Application Security Testing) technology, such as that sold by Contrast Security, has a role in application security programs. That’s why we partnered with Contrast earlier this year to resell and integrate their technology into the Veracode platform and expand the range of solutions available to our customers. Each analysis technology has its own strengths and weaknesses – which is why Veracode offers static, dynamic, IAST, mobile behavioral, software composition analysis, web perimeter monitoring, and manual penetration testing.
No one technology is a silver bullet. For example, IAST is a poor fit for applications written in JavaScript, non-web applications, mobile clients, legacy applications in COBOL or ABAP, PHP applications, and completely misses many vulnerability categories such as hardcoded passwords, time bombs, and others.
Improving the Benchmark for the Betterment of All
With respect to the OWASP benchmark, I welcome the effort to create a usable guide that can look at different security technologies. However, there are some issues in the current implementation of the OWASP benchmark that need to be disclosed. The most egregious is that it tries to force all tools into a box and test them under identical circumstances, and this isn’t how real-world application security works, nor what IAST, SAST, DAST, etc. were designed to do. For example, in order to test IAST technologies, the benchmark comes with a pre-configured script that exercises every input form in the application, thus yielding a much higher code coverage percentage for IAST than is realistic. We’d all love to be able to write tests that looked precisely at where every vulnerability is, because we could see the rulebook for every application! For the scoring of IAST to be compared to that of other techniques, it needs to be adjusted for the real-world code coverage that security testers can actually obtain. For instance, if 65% is a real-world coverage expectation, IAST scores should be multiplied by 0.65 in order to be compared with SAST and DAST, which are required to drive their own coverage. People using IAST can look at the coverage they are getting in their own test environment and use their own multiplier to compare.
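As a rough sketch of the adjustment I have in mind (the raw score, coverage figure, and the scenario below are purely hypothetical, not benchmark results):

```python
def coverage_adjusted_score(raw_score, coverage_fraction):
    """Scale a benchmark score by the code coverage a tool realistically
    achieves when testers have to drive the coverage themselves."""
    return raw_score * coverage_fraction

# Hypothetical numbers for illustration only.
iast_raw_score = 90.0      # score obtained with the benchmark's pre-built exercise script
realistic_coverage = 0.65  # coverage a security tester might actually reach in practice

print(coverage_adjusted_score(iast_raw_score, realistic_coverage))  # 58.5
```

Teams running IAST in their own environment can substitute the coverage they actually observe for the 0.65 used here.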
I have also seen comparisons being made between different testing tools where the version of the OWASP benchmark is not disclosed. Some of the tools have been tested with OWASP Benchmark version 1.1, which contains 20,000 test cases. Version 1.2, currently in beta, was slimmed down to only 3,000 test cases. In the words of the OWASP project, “this was done to make it easier for DAST tools to scan it (so it doesn't take so long and they don't run out of memory, or blow up the size of their database).” This is reasonable, but you can’t simply compare scores between the two benchmark versions. It’s like having two Indy race tracks, one 21 miles long and the other three miles long (for cars with smaller gas tanks), and simply dividing the lap time of cars that ran the 21-mile course by seven to compare to cars that ran the three-mile course. That would be disingenuous. All tracks are different, and so are all sets of test cases.
I have a concern with the OWASP benchmark scoring as well. I don’t agree with the scoring process where the score is the true positive rate minus the false positive rate (score = TP% - FP%). It is much more important to be able to detect a vulnerability than to reject a false positive, up to a point. I am going to recommend to OWASP that TP% and FP% be reported separately rather than combined into a final score. This way more information is presented, and customers can make up their own minds about the FP rate their risk posture and resources can tolerate. For instance, if a tool has a TP% of 65% and an FP% around 35%, look at both numbers instead of just comparing a combined score of 30. That paints a more realistic picture of how a testing technology will perform.
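To make that concrete, here is a minimal sketch (the tool names and counts are invented for illustration) showing how two tools with the same combined score can behave very differently:

```python
def rates(found, missed, false_alarms, correctly_rejected):
    """Return (TP%, FP%) from raw true/false vulnerability counts."""
    tp_rate = 100.0 * found / (found + missed)
    fp_rate = 100.0 * false_alarms / (false_alarms + correctly_rejected)
    return tp_rate, fp_rate

# Hypothetical results: both tools score 30 under score = TP% - FP%.
tools = {
    "Tool A": (65, 35, 35, 65),  # finds 65% of real flaws, but raises 35% false alarms
    "Tool B": (35, 65, 5, 95),   # finds only 35% of real flaws, with very few false alarms
}
for name, counts in tools.items():
    tp, fp = rates(*counts)
    print(f"{name}: TP%={tp:.0f} FP%={fp:.0f} combined={tp - fp:.0f}")
```

Both tools report a combined score of 30, yet one finds nearly twice as many real vulnerabilities; reporting TP% and FP% separately lets readers weigh that trade-off for themselves.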
It is going to take more than one automated technique, plus manual processes, to secure your applications. Combine the strengths of multiple testing techniques across the entire application lifecycle to drive down application risk in your organization. SAST doesn’t require a fully functional system with test data and automated test suites. DAST doesn’t require modifying the production environment, let alone finding a server and the admin to modify it. Because of these strengths, SAST can be used earlier in the development cycle than both IAST and DAST, and DAST can be used more easily than SAST and IAST in production.
Once fully developed, the OWASP benchmark has the potential to be a valuable tool for companies struggling with application security challenges. In its current state, however, it does everyone a disservice by suggesting there’s a silver bullet to solve their problems. We all know that to be a silly presumption. Application security is a difficult challenge, and businesses need help understanding which techniques can and should be applied to its various aspects. It’s time for the vendor community to stop being salesmen and start being advisors on a complicated issue. Software has eaten the world, and attackers are gorging themselves on a seemingly endless supply of vulnerabilities.
Learn more about Veracode's approach to solving the appsec problem: