Damien Octeau, Somesh Jha, Matthew Dering, Patrick McDaniel, Alexandre Bartel,
Li Li, Jacques Klein, and Yves Le Traon.
Combining Static Analysis with Probabilistic Models to Enable
Market-Scale Android Inter-Component Analysis.
In Proceedings of the 43rd ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages (POPL), January 2016.
[ bib ]
Static analysis has been successfully used in many areas, from
verifying mission-critical software to malware detection. Unfortunately,
static analysis often produces false positives, which require significant
manual effort to resolve. In this paper, we show how to overlay a
probabilistic model, trained using domain knowledge, on top of static
analysis results, in order to triage static analysis results. We apply this
idea to analyzing mobile applications. Android application components can
communicate with each other, both within single applications and between
different applications. Unfortunately, techniques to statically infer
Inter-Component Communication (ICC) yield many potential inter-component and
inter-application links, most of which are false positives. At large scales,
scrutinizing all potential links is simply not feasible. We therefore overlay
a probabilistic model of ICC on top of static analysis results. Since
computing the inter-component links is a prerequisite to inter-component
analysis, we introduce a formalism for inferring ICC links based on set
constraints. We design an efficient algorithm for performing link resolution.
We compute all potential links in a corpus of 11,267 applications in 30
minutes and triage them using our probabilistic approach. We find that over
95.1% of all 636 million potential links are associated with probability
values below 0.01 and are thus likely unfeasible links. Thus, it is possible
to consider only a small subset of all links without significant loss of
information. This work is the first significant step in making static
inter-application analysis more tractable, even at large scales.