PRIMO: Probabilistic ICC Modeling
Static analysis is a widely used technique for analyzing software, particularly in the context of security. Static analysis is often followed by a manual confirmation of any issues found in the code under scrutiny. If the analysis produces many false positives, then the subsequent manual effort can become prohibitive. This problem is particularly severe in the analysis of mobile applications. In Android, applications can reuse other applications' functionality through Inter-Component Communication (ICC), a sophisticated, platform-specific message-passing system. For example, if an application requires dialing a phone number, it can send a completely generic message requesting that an application handle the dialing process. The message need not be targeted at a specific application. In addition, the same communication mechanisms are used to send messages within an application, for example to transition between different user screens. Unfortunately, if developers expose these interfaces improperly, they can leak data or be used for privilege escalation.
Inter-component analysis computes ICC links between message-passing locations and potential message targets. It is important to avoid considering links that may never occur during an execution, since link imprecision propagates to analysis results that are based on ICC. Imprecise results have very limited usefulness, as they increase the amount of manual analysis needed to confirm any potential threat. Unfortunately, no current analysis technique can infer links in a precise and scalable manner. A variety of tools can statically infer the possible values of Intents, which are the main inter-component messages. However, static ICC analysis has inherent limitations that restrict its precision. In particular, Intents are composed of strings of characters that are sometimes impossible to infer precisely and efficiently. Even a few imprecisions can cause an explosion in the number of potential ICC links at large scales. This is because a conservative matching process has to consider all potential targets for an Intent when one of its field values is not known.
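The effect of a single unknown field can be illustrated with a minimal sketch. It is not PRIMO's implementation: the `IntentFilter` and `IntentValue` classes, the `conservative_links` function, and the component names are hypothetical, and matching is reduced to the action field only (real Android Intent resolution also considers categories and data).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentFilter:
    """Hypothetical, simplified target: a component and the actions it accepts."""
    component: str
    actions: frozenset


@dataclass(frozen=True)
class IntentValue:
    """Hypothetical inferred Intent: the action is a regular expression.

    A precisely inferred action is an escaped literal; an unknown one is ".*".
    """
    action_pattern: str


def conservative_links(intent, filters):
    """Return every component whose filter could match the inferred action."""
    pattern = re.compile(intent.action_pattern)
    return [
        f.component
        for f in filters
        if any(pattern.fullmatch(action) for action in f.actions)
    ]


# Illustrative targets only; these component names are made up.
FILTERS = [
    IntentFilter("com.example.dialer/.DialActivity",
                 frozenset({"android.intent.action.DIAL"})),
    IntentFilter("com.example.browser/.ViewActivity",
                 frozenset({"android.intent.action.VIEW"})),
    IntentFilter("com.example.mail/.SendActivity",
                 frozenset({"android.intent.action.SEND"})),
]

precise = IntentValue(re.escape("android.intent.action.DIAL"))
imprecise = IntentValue(".*")  # the static analysis could not infer the action

precise_links = conservative_links(precise, FILTERS)
imprecise_links = conservative_links(imprecise, FILTERS)
```

With a precisely inferred action, only the dialer component is linked. With an unknown action, every component in `FILTERS` becomes a potential target, which is the growth in link count described above.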
We overlay a probabilistic model, which is trained using domain knowledge of ICC, on top of static analysis results. We introduce PRIMO (PRobabilistic ICC MOdeling), the first system to triage ICC links based on estimating the probability that they are true positives. The PRIMO system requires no manual labeling of analysis results. Our probabilistic model takes into account many aspects of Intents and predicts the expected value of imprecise Intent fields. The model is guided by the insight that Intents are used by developers in predictable ways. More specifically, the patterns of Intent fields and expected targets are similar across message-passing code locations. Some fields may be ambiguously inferred by the static analysis, leading to infeasible links. However, by exploiting the predictability of ICC patterns, we estimate the probability that ICC links may actually occur. Since computing ICC links is required for any inter-application analysis, we introduce a formalization of the Intent resolution process that is based on solving set constraints. Our formalism accounts for the case where Intent fields are arbitrary regular expressions, since imprecisely inferred Intent values are expressed as such. We design an efficient algorithm to compute all ICC links in a large set of applications and analyze its average-case complexity.
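The idea of ranking links by how likely an imprecise field is to take each concrete value can be sketched as follows. This is a simple frequency estimate chosen for illustration, not the model PRIMO actually uses; the function names `field_value_distribution` and `link_probability` and the observed counts are assumptions made up for the example.

```python
import re
from collections import Counter


def field_value_distribution(pattern, observed_values):
    """Estimate P(value) for an imprecise field from precisely inferred values.

    Only observed values that fully match the regular expression are kept,
    and their relative frequencies are used as probabilities.
    """
    regex = re.compile(pattern)
    counts = Counter(v for v in observed_values if regex.fullmatch(v))
    total = sum(counts.values())
    if total == 0:
        return {}  # no evidence: the sketch makes no estimate
    return {value: n / total for value, n in counts.items()}


def link_probability(distribution, accepted_actions):
    """Probability that the Intent carries an action the target accepts."""
    return sum(p for value, p in distribution.items() if value in accepted_actions)


# Made-up action values standing in for precisely inferred Intents elsewhere.
observed = (
    ["android.intent.action.VIEW"] * 6
    + ["android.intent.action.DIAL"] * 3
    + ["android.intent.action.SEND"] * 1
)

# The static analysis inferred only a prefix for the action field.
distribution = field_value_distribution(r"android\.intent\.action\..*", observed)

p_dialer = link_probability(distribution, {"android.intent.action.DIAL"})
p_viewer = link_probability(distribution, {"android.intent.action.VIEW"})
```

Under these assumed counts, a link to a target accepting only the VIEW action scores higher than one accepting only DIAL, so it would be examined first during triage. Links whose score is near zero are candidates for being treated as infeasible.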
In order to enable other researchers to apply our ICC analysis to a variety of problems, we make it available for download and we release its entire source code. Please see our installation page for instructions on how to install and use it.
Questions and Issues
Please submit any questions or issues to the issue tracker for PRIMO.
INSR | SIIS | CSE | Penn State | Copyright 2016 SIIS Lab