proposers: Daniela da Cruz, Nuno Oliveira
institution/company: Checkmarx
topic/title: Applying Machine-Learning techniques for fast security preview
scientific area: Machine learning, source code analysis, security vulnerabilities
location: Braga
master's programme: Integrated Master's in Informatics Engineering (Mestrado Integrado em Engenharia Informática)
We want to analyze thousands of projects and find correlations between the scan results we obtain
(possibly partitioned by language) and characteristics of each project (size, framework, number of
files, buzzwords appearing in the project, specific connectivity between files…). These
characteristics can be predetermined or, perhaps better, selected automatically by a machine learning
algorithm. The idea is to “meta-parse” a project, even a huge one, very quickly
(in seconds or minutes) and then assess whether it has a high probability of being
vulnerable, based on the correlations above. Customers with hundreds or thousands of projects (or a huge repository) can then choose which projects to
scan first, or whether to scan them at all, based on this information. As of today, it is impossible to scan hundreds of millions of LOC just to find 20 really problematic