PRIMO: Installation and Usage

Before installing PRIMO, you need to make sure that you have Python 2.7, SciPy, NumPy and python-gflags installed.

Installation

First, download the latest version of PRIMO. Then extract, build and install it with the following commands:

% tar xf primo-0.1.0.tar.gz
% cd primo-0.1.0
% python setup.py build
% python setup.py install

The last command is optional, since the code can be executed from the primo-0.1.0 directory without installing.

PRIMO uses Intent and Intent Filter values computed by IC3, in the form of protocol buffers. They can be generated with the -protodir option in IC3.

Note that by default, PRIMO only computes statistics about the links it finds. To build an analysis on top of PRIMO, you need to call the PRIMO interface, which is quite straightforward. It includes the linking.find_links module, which declares the FindLinks function. FindLinks takes a list of input paths and options as inputs, and returns a tuple containing the ICC links, the component objects, the Intent Filters, the applications and the Intents. The find_links module is compiled as a shared object (.so) file, but it should be possible to import it like a regular Python module (see the primo.py file for an example).

FindLinks returns the links as a Python dict. Each key is an Intent object and each value is a tuple with two elements: a list of target objects (either Component or IntentFilter), and a list of link probabilities given in the same order as the corresponding targets. If the --nocomputeattributes option is used, each value is just the list of targets instead of a tuple.
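As an illustration of this data structure, the following minimal sketch walks over a links dict and keeps the most probable target for each Intent. The helper name best_targets is hypothetical, and the Intents and targets are plain string stand-ins with made-up probabilities. In real use the dict would come from FindLinks, whose exact call signature is not reproduced here (see primo.py).

```python
def best_targets(links, with_probabilities=True):
    """Return {intent: (target, probability)} for the most probable target.

    `links` has the shape described above: each value is a
    (targets, probabilities) tuple, or only a list of targets when
    --nocomputeattributes was used (pass with_probabilities=False).
    Intents without any target are skipped.
    """
    result = {}
    for intent, value in links.items():
        if with_probabilities:
            targets, probabilities = value
        else:
            targets, probabilities = value, None
        if not targets:
            continue
        if probabilities is None:
            # No probabilities available: fall back to the first target.
            result[intent] = (targets[0], None)
            continue
        # Probabilities are listed in the same order as the targets.
        best = max(range(len(targets)), key=lambda i: probabilities[i])
        result[intent] = (targets[best], probabilities[best])
    return result


# Stand-in data (not real PRIMO output).
links = {
    "intent_a": (["filter_1", "component_2"], [0.2, 0.7]),
    "intent_b": ([], []),
}
for intent, (target, probability) in sorted(best_targets(links).items()):
    print("%s -> %s (%s)" % (intent, target, probability))
```

The with_probabilities flag mirrors the two value shapes described above, so the same loop can be reused whether or not link probabilities were computed.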

Usage

If you are interested in link probabilities and statistics about these links, PRIMO can be launched with:

% primo.py [input] [options]

Valid input arguments include the following:

  • --protobuf <path to protobuf file> - can be used multiple times.
  • --protodir <path to protobuf directory> - can be used multiple times.
  • --protobufs <paths to protobuf files (comma-separated)>.
Other options include:
  • --skipempty: Skip empty Intents.
  • --dumpintentlinks: Dump Intent links (see below for format).
  • --nocomputeattributes: Do not compute link probabilities (makes overall processing faster).
  • --validate <k>: Perform k-fold cross validation of the probabilistic model.
When using the --dumpintentlinks option, links are output as a NumPy array compressed with bloscpack. See https://github.com/Blosc/bloscpack#numpy for information about the format. Each row in the array corresponds to a potential link between an Intent and a Filter. For each link, the following are provided, in this order:
  • The Intent ID.
  • A Boolean indicating whether the Intent is explicit (1 = explicit, 0 = implicit).
  • A number indicating whether the data fields are set. Value 0 indicates that there is no data. Value 1 indicates that there is data but at least one of the data fields is a regular expression (i.e., imprecise). Value 2 indicates that there is data and all data fields are constants.
  • An unused value.
  • A Boolean indicating whether the Intent has its extras field set (1 = has extras, 0 = no extras).
  • The ID of the target (either a component ID or an Intent Filter ID).
  • A Boolean indicating whether the target is protected by a permission.
  • An integer between 0 and 100 that represents the probability that the link is real. To obtain an actual probability value between 0 and 1, divide by 100.
  • A Boolean indicating whether the link is intra-application (1 = intra-application link, 0 = inter-application link).
The reason for using a NumPy array is that it allows us to store each link in 15 bytes. As a result, we are able to compute and store over 600 million links on a single machine.
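The column layout listed above can be decoded along the following lines. This is a sketch only: the field names, the LinkRow type and the decode_row helper are hypothetical, and it assumes each row exposes the nine values in the documented order (a NumPy row, list or tuple all work). Loading the bloscpack file itself is left out; see the bloscpack documentation linked above.

```python
from collections import namedtuple

# Illustrative field names; the order follows the list above.
LinkRow = namedtuple("LinkRow", [
    "intent_id", "explicit", "data_kind", "unused", "has_extras",
    "target_id", "target_has_permission", "probability", "intra_app",
])

DATA_KINDS = {0: "no data", 1: "imprecise data", 2: "constant data"}


def decode_row(row):
    """Turn one 9-value row into a LinkRow with readable fields."""
    values = [int(v) for v in row]
    if len(values) != 9:
        raise ValueError("expected 9 columns, got %d" % len(values))
    (intent_id, explicit, data_kind, unused, has_extras,
     target_id, permission, percent, intra_app) = values
    return LinkRow(
        intent_id=intent_id,
        explicit=bool(explicit),            # 1 = explicit, 0 = implicit
        data_kind=DATA_KINDS[data_kind],
        unused=unused,
        has_extras=bool(has_extras),        # 1 = has extras
        target_id=target_id,
        # Assumption: 1 means the target is protected by a permission.
        target_has_permission=bool(permission),
        probability=percent / 100.0,        # stored as an integer in [0, 100]
        intra_app=bool(intra_app),          # 1 = intra-application link
    )


row = [12, 0, 2, 0, 1, 345, 0, 87, 1]  # made-up example values
link = decode_row(row)
print("Intent %d -> target %d with probability %.2f"
      % (link.intent_id, link.target_id, link.probability))
```

Decoding one row at a time is convenient for inspection. For hundreds of millions of links, column-wise operations on the array are likely a better fit than building one Python object per row.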

In its current form, PRIMO only computes links and their probabilities. It does not output the links in a format that can readily be used by client analyses. In particular, neither the mapping between Intent ID and Intent nor the mapping between target ID and target is output. It should be relatively straightforward for client analyses to add the necessary code. See https://github.com/siis/primo/blob/master/linking/intents.pyx#L75 for the mapping between Intent and ID and https://github.com/siis/primo/blob/master/linking/components.pyx#L53 and https://github.com/siis/primo/blob/master/linking/intent_filters.pyx#L78 for the target IDs.
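One way a client analysis might rebuild these mappings is sketched below. It assumes only that a numeric ID can be obtained from each object. Because the actual attribute names are defined in the linked .pyx files and are not reproduced here, the sketch takes a caller-supplied get_id function. The build_id_index helper, the Stub class and its id attribute are all hypothetical.

```python
def build_id_index(objects, get_id):
    """Map numeric ID -> object, raising an error on duplicate IDs."""
    index = {}
    for obj in objects:
        key = get_id(obj)
        if key in index:
            raise ValueError("duplicate ID: %r" % (key,))
        index[key] = obj
    return index


class Stub(object):
    """Hypothetical stand-in for an Intent, Component or IntentFilter."""

    def __init__(self, id, name):
        self.id = id
        self.name = name


# In real use, the objects would be those returned by FindLinks.
intents = [Stub(1, "intent_a"), Stub(2, "intent_b")]
intent_by_id = build_id_index(intents, get_id=lambda obj: obj.id)
print(intent_by_id[2].name)
```

Whether component IDs and Intent Filter IDs share one numbering is not stated here, so keeping a separate index for each kind of target is the safer default.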

PRIMO is developed using Cython, which extends Python with C-like typing and transforms Python code to C. If you wish to modify the Cython code in PRIMO, please follow the instructions on the source page.