5/12/2022
Research by:  
Jakob Lell and Daniel Corak

Extended Android security check: SnoopSnitch tests for Java vulnerabilities

Key takeaways

  • Many Android security issues arise in Java code. Missing Java patches can be detected directly on Android phones using custom signatures of patched code
  • The open-source Android app SnoopSnitch now includes Java patch tests, doubling its patch coverage
  • Your feedback can help us to further improve Android security testing

Introduction

The Android patch gap, which SRLabs found during previous research, is shrinking, but it has not closed. To keep track of it, SRLabs created the Android app SnoopSnitch, which lets users test whether their phone is missing security patches. Earlier versions of SnoopSnitch could only test patches for Android components written in C and C++. The latest version can also detect missing patches in Android Java components. Since more and more vulnerabilities are found in components written in Java, integrating these Java patch tests doubled SnoopSnitch's patch coverage.

This blog post discusses in detail how missing patches in Java code can be identified. The key is to dive deep into the bytecode, create unique signatures of Java methods, and classify these signatures as patched or unpatched. These fingerprints let SnoopSnitch automatically verify whether a phone is patched. The approach is not limited to Android: while Dalvik bytecode differs from other Java bytecode, our results can be a starting point for exploring how stable signatures can be created for Java bytecode outside of Android.

The fragmented Android patch ecosystem

Android is the most popular mobile OS, with over 2.8 billion users worldwide and a market share of 75%. Keeping it secure requires regular patching, and new patches are released every month. The monthly Android Security Bulletin lists current vulnerabilities (CVEs) and links to the available patches. Patches are provided by Google and other vendors and then distributed to the affected phones.

The Android security patch process is complex because the patch ecosystem is fragmented and responsibility for patching lies with the individual vendors. We looked into the Android patch gap in previous research, and even though the situation has improved significantly, the patch gap will not go away. There are two main reasons: different Android versions are in use simultaneously, and Android is an open-source operating system that different vendors modify. As a result, vendors often need to adapt patches to their modified versions.

Keep in mind that a missing Android patch does not automatically mean an exploitable vulnerability. The lion's share of Android exploitation is based on social engineering and malware. Nevertheless, to prevent known vulnerabilities from being used against their devices, users should make sure their phones are fully patched.

This is where SnoopSnitch helps: it compares a device's actual patch status to its claimed patch level and flags any hidden patch gap.

Automated Java patch testing

Why we automated Java patch testing

Android is mainly written in C/C++ and Java. For C/C++, we had already developed a patch test solution, which we presented at HITB 2018.

However, in the past couple of years, many of the vulnerabilities fixed in the monthly Android security patches have been in Android Java components, and our data shows that more of the missing patches affect Java code than C/C++ code.

Patch test heuristics depend on the source language and how it is compiled, so the test method differs between, e.g., C/C++ and Java. We therefore developed dedicated techniques for testing Java patches.

Java code is compiled to bytecode, an intermediate binary form that has to be either further compiled or interpreted before it can be executed. On Android, this bytecode is packaged into a .dex (Dalvik Executable) file, an Android-specific file format.

For a manual analysis, it is possible to decompile the bytecode in a given DEX file and check whether the patch has been applied. However, this method is not suitable for automated tests, for several reasons:

  • The decompiler output can vary considerably even if the original source code is identical
  • Java decompilers are notoriously unreliable and often fail to produce any output
  • Decompilation takes significant CPU time

Patch signatures are the solution

Our solution, based on custom signatures of patched code, provides a robust way of testing whether a patch has been applied. It builds on our earlier work on detecting patches in compiled C and C++ code. Both approaches rely on the same principle: if a signature has been classified as patched or unpatched and we find the same signature on the device under test, we can directly conclude that the device is also patched or unpatched.
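The lookup principle can be sketched in Python as follows. This is a minimal illustration, not SnoopSnitch's actual implementation: it assumes the bytecode has already been normalized (the subject of the steps that follow), uses SHA-256 as the hash, and the names PatchStatus, signature, build_reference_db and classify are hypothetical.

```python
import hashlib
from enum import Enum


class PatchStatus(Enum):
    PATCHED = "patched"
    UNPATCHED = "unpatched"
    UNKNOWN = "unknown"


def signature(normalized_code: bytes) -> str:
    """Hash already-normalized bytecode into a fixed-size signature."""
    return hashlib.sha256(normalized_code).hexdigest()


def build_reference_db(samples):
    """samples: iterable of (normalized_code, PatchStatus) pairs taken from
    reference firmware whose patch state is already known (hypothetical input)."""
    return {signature(code): status for code, status in samples}


def classify(device_code: bytes, reference_db) -> PatchStatus:
    """Same signature as a classified reference means the same patch state."""
    return reference_db.get(signature(device_code), PatchStatus.UNKNOWN)


db = build_reference_db([
    (b"normalized code before the fix", PatchStatus.UNPATCHED),
    (b"normalized code after the fix", PatchStatus.PATCHED),
])
print(classify(b"normalized code after the fix", db))  # PatchStatus.PATCHED
print(classify(b"vendor-modified code", db))           # PatchStatus.UNKNOWN
```

Returning UNKNOWN for an unseen signature, for example from vendor-modified code, means the test gives no verdict rather than a wrong one.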

Generating signatures in three steps

1/ Identify changes. We start by identifying unique changes in published patches, for example, a function name newly introduced in an already existing class. If that function name is present in the corresponding compiled class of the component, the patch must have been applied.
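At its simplest, step 1 is a set difference between the method names of a class before and after the patch. The following Python sketch assumes the method names have already been extracted (e.g., by parsing the DEX file as in step 2); the method lists and the added method validatePeer are hypothetical, and when a patch adds no new names, the bytecode signatures from step 3 are still needed.

```python
def unique_additions(unpatched_methods, patched_methods):
    """Method names that exist only in the patched version of a class."""
    return set(patched_methods) - set(unpatched_methods)


def has_patch_markers(device_methods, markers):
    """True if every marker introduced by the patch is present on the device."""
    return bool(markers) and set(markers) <= set(device_methods)


# Hypothetical method lists for ConnectionManager before and after a fix.
before = ["establishConnection", "close"]
after = ["establishConnection", "close", "validatePeer"]
markers = unique_additions(before, after)
print(markers)                                                     # {'validatePeer'}
print(has_patch_markers(["close", "establishConnection"], markers))  # False
```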

2/ Parse DEX. Once we have identified unique changes in the patches, we can look for them inside the bytecode. To do so, we first need to understand the DEX file format that contains the bytecode. Like every file format, it follows a well-defined structure in which specific information is stored at specific locations.

Let us say we want to find the code of the function establishConnection, which is part of the class ConnectionManager. Figure 1 illustrates the process of traversing the file with the goal of first identifying the class, then the method, and lastly the code inside the method so we can create a signature.

Figure 1: Traversing the DEX file format to find the relevant bytecode section

  1. Read the header of the DEX file and follow the class_defs_off pointer to the class_defs section, which contains one class_def_item for each Java class in the file.
  2. Iterate through the class_def_items until we find the target class (ConnectionManager) by its name.
  3. Follow the class_data_off pointer to the class_data_item, which contains references to the actual class data.
  4. The class_data_item contains two lists of methods, direct_methods and virtual_methods. Each entry in these lists is an encoded_method.
  5. In the encoded_method, follow code_off to reach the destination: the code_item, which contains some header data (e.g., the number of registers) as well as the actual bytecode in the insns field.
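The traversal in Figure 1 can be sketched in Python with the struct module. The field offsets follow the DEX format documentation (header_item, class_def_item, class_data_item, encoded_method, code_item); everything else is a simplification and the function names are hypothetical. String data is decoded as UTF-8 although DEX actually stores Modified UTF-8, and a method is matched by name only, so overloaded methods would also need their prototype checked.

```python
import struct


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def u32(data, off):
    return struct.unpack_from("<I", data, off)[0]


def get_string(data, idx):
    """string_ids[idx] -> string_data_item (uleb128 length, MUTF-8 bytes, NUL)."""
    str_off = u32(data, u32(data, 0x3C) + 4 * idx)     # header: string_ids_off
    _, pos = read_uleb128(data, str_off)               # skip utf16_size
    return data[pos:data.index(b"\x00", pos)].decode("utf-8", errors="replace")


def get_type_name(data, type_idx):
    return get_string(data, u32(data, u32(data, 0x44) + 4 * type_idx))  # type_ids_off


def get_method_name(data, method_idx):
    # method_id_item: class_idx (u16), proto_idx (u16), name_idx (u32)
    return get_string(data, u32(data, u32(data, 0x5C) + 8 * method_idx + 4))


def find_method_insns(data, class_descriptor, method_name):
    """Return the raw insns bytes of a method, or None if it is not found."""
    class_defs_size, class_defs_off = u32(data, 0x60), u32(data, 0x64)
    for i in range(class_defs_size):
        item = class_defs_off + 32 * i                  # class_def_item: 32 bytes
        if get_type_name(data, u32(data, item)) != class_descriptor:
            continue
        pos = u32(data, item + 24)                      # class_data_off
        if pos == 0:
            return None
        sizes = []
        for _ in range(4):      # static fields, instance fields, direct, virtual methods
            n, pos = read_uleb128(data, pos)
            sizes.append(n)
        for _ in range(sizes[0] + sizes[1]):            # skip encoded_field entries
            _, pos = read_uleb128(data, pos)
            _, pos = read_uleb128(data, pos)
        for count in (sizes[2], sizes[3]):
            method_idx = 0                              # method_idx_diff restarts per list
            for _ in range(count):
                diff, pos = read_uleb128(data, pos)
                _, pos = read_uleb128(data, pos)        # access_flags
                code_off, pos = read_uleb128(data, pos)
                method_idx += diff
                if code_off and get_method_name(data, method_idx) == method_name:
                    insns_size = u32(data, code_off + 12)   # in 16-bit code units
                    return data[code_off + 16:code_off + 16 + 2 * insns_size]
        return None
    return None
```

For the example in the text, find_method_insns(dex_bytes, "LConnectionManager;", "establishConnection") would return the raw bytecode, where dex_bytes is the file content and the class is given as a type descriptor (a real class would include its package path).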

3/ Create signatures. After locating the bytecode in the file, the next step is to create a unique signature of it. A simple hash of the bytecode will not do the job, since many instructions (e.g., calls to other methods) refer to other parts of the DEX file through numeric indices, such as references to strings or to other methods and classes. These indices vary between firmware images due to minor changes elsewhere in the codebase.
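A short Python illustration of why hashing raw bytecode fails: the two hypothetical builds below contain the same instructions (const-string v0 followed by return-void), but in the second build another patch has shifted the string index from 5 to 6. The encoding follows the Dalvik formats (opcode in the low byte of the first 16-bit code unit); the index values are made up.

```python
import hashlib
import struct


def naive_signature(insns: bytes) -> str:
    """Hash the raw bytecode as-is (the approach that does not work)."""
    return hashlib.sha256(insns).hexdigest()


# const-string v0, string@5 ; return-void  (16-bit code units, little-endian)
build_a = struct.pack("<3H", 0x001A, 0x0005, 0x000E)
# Identical source, but an unrelated change shifted the string index to 6
build_b = struct.pack("<3H", 0x001A, 0x0006, 0x000E)

print(naive_signature(build_a) == naive_signature(build_b))  # False
```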

Instead, the signature must include only the relevant parts of the instructions and leave out volatile values. The relevant instructions are documented by the Android Open Source Project on two pages, Dalvik bytecode and Dalvik Executable instruction formats. Both must be used together because the information is spread across them: the instructions are defined on the first, and their formats on the second (see Figure 2).
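Before individual operands can be normalized, the insns array has to be split into instructions. In the Dalvik instruction formats, the first digit of a format ID gives the number of 16-bit code units an instruction occupies (21c: two units, 35c: three), so a walker only needs an opcode-to-format table. The Python sketch below covers a handful of opcodes only; a real implementation would need the full table and would also have to skip the packed-switch, sparse-switch and fill-array-data payloads embedded in insns.

```python
# Small subset of the opcode table: opcode -> (mnemonic, format ID).
FORMATS = {
    0x0C: ("move-result-object", "11x"),
    0x0E: ("return-void", "10x"),
    0x12: ("const/4", "11n"),
    0x14: ("const", "31i"),
    0x1A: ("const-string", "21c"),
    0x22: ("new-instance", "21c"),
    0x6E: ("invoke-virtual", "35c"),
    0x70: ("invoke-direct", "35c"),
}


def split_instructions(units):
    """Split a sequence of 16-bit code units into (mnemonic, format, units) tuples."""
    pos, out = 0, []
    while pos < len(units):
        opcode = units[pos] & 0xFF             # the opcode sits in the low byte
        if opcode not in FORMATS:
            raise ValueError(f"opcode 0x{opcode:02x} is not in this sketch's table")
        name, fmt = FORMATS[opcode]
        length = int(fmt[0])                   # first digit = number of code units
        if pos + length > len(units):
            raise ValueError("truncated instruction")
        out.append((name, fmt, tuple(units[pos:pos + length])))
        pos += length
    return out


# const-string v0, string@5 ; return-void
for instruction in split_instructions([0x001A, 0x0005, 0x000E]):
    print(instruction)
```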

Figure 2: Combining information about Dalvik bytecode instructions and their format

An example of creating a custom Java bytecode signature

We concluded that we need to handle bytecode instructions and therefore need to understand their structure. Let us look at one specific example: loading a string into a register using the const-string instruction.

Each Dalvik bytecode instruction is identified by a unique number, the opcode. Let us take the instruction with opcode 0x1a as an example (Figure 3).

Figure 3: Components of the const-string bytecode instruction

Figure 2 shows that the corresponding instruction format is 21c. From this we know that the instruction has two operands: an 8-bit destination register and a 16-bit string index into the string_ids section of the DEX file (see Figure 1). The instruction therefore does not contain the string itself; the index selects an entry in string_ids, which in turn points to the actual string data.
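Decoding format 21c takes only a few bit operations on two 16-bit code units: the first holds the opcode in its low byte and the register vAA in its high byte, the second holds the index BBBB. In this Python sketch, the plain list string_table is a hypothetical stand-in for the string_ids lookup in a real DEX file.

```python
def decode_21c(units):
    """Format 21c (AA|op BBBB): return (opcode, register AA, index BBBB)."""
    return units[0] & 0xFF, units[0] >> 8, units[1]


def resolve_const_string(units, string_table):
    """Return (register, string value) for a const-string instruction."""
    opcode, register, index = decode_21c(units)
    if opcode != 0x1A:
        raise ValueError("not a const-string instruction")
    return register, string_table[index]


# const-string v3, string@2 with a hypothetical three-entry string table
print(resolve_const_string((0x031A, 0x0002), ["close", "connect", "timeout"]))  # (3, 'timeout')
```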

If another string is added to or removed from the same DEX file, for example by other patches, the numeric string index will likely change. Therefore, the string index cannot be included in the signature. However, we can look up the string value in the DEX file and add it to the signature instead of the numeric index. The value only changes if the string itself changes, so it is a valid part of the signature.

Similar logic is needed for a variety of other opcodes. For example, when creating a new instance of an object (opcode 0x22, the new-instance instruction), the class name is hashed instead of the numeric index that points to the class.
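Combining the last two paragraphs, a normalized signature replaces the string index of const-string with the string value and the type index of new-instance with the class descriptor, then hashes the result. The Python sketch below is a hypothetical illustration: strings and types are plain lists standing in for string_ids and type_ids, and all other instructions are hashed verbatim, whereas a real implementation would have to treat every index-carrying instruction (method, field and type references) the same way.

```python
import hashlib


def normalized_signature(instructions, strings, types):
    """Hash instructions with volatile indices replaced by the values they point to."""
    h = hashlib.sha256()
    for units in instructions:
        opcode, register = units[0] & 0xFF, units[0] >> 8
        if opcode == 0x1A:      # const-string vAA, string@BBBB -> use the string value
            token = f"const-string v{register} {strings[units[1]]!r}"
        elif opcode == 0x22:    # new-instance vAA, type@BBBB -> use the class descriptor
            token = f"new-instance v{register} {types[units[1]]}"
        else:                   # this sketch keeps all other instructions verbatim
            token = " ".join(f"{u:04x}" for u in units)
        h.update(token.encode("utf-8") + b"\n")
    return h.hexdigest()


# The same method in two hypothetical builds whose string and type tables differ.
sig_a = normalized_signature([(0x001A, 0x0001), (0x0122, 0x0000), (0x000E,)],
                             ["close", "timeout"], ["LConnectionManager;"])
sig_b = normalized_signature([(0x001A, 0x0002), (0x0122, 0x0001), (0x000E,)],
                             ["close", "extra", "timeout"], ["LFoo;", "LConnectionManager;"])
print(sig_a == sig_b)  # True
```

The indices differ between the two builds, but the values they resolve to do not, so both builds produce the same signature; changing the string itself would change it.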

Another challenge for creating bytecode signatures: Resource identifiers

Creating signatures by removing volatile indices and then hashing the bytecode is a big step towards stable signatures. But that is not yet a complete solution due to another source of volatility: Android resource identifiers.

Resources are a convenient way of managing strings in Android applications. The strings are kept separate from the application logic in an XML file, and each one is given a name, so that code can reference it with something like getString(R.string.hello_world). The advantage is that the code is more readable, and translations into different languages are selected automatically based on the system language.

The build system assigns every resource (strings are just one example) a numeric resource identifier at compile time. These identifiers are volatile by nature, so they need to be excluded from the signature. However, there is no special opcode for loading a resource identifier. Loading one almost always uses opcode 0x14 (CONST), but this opcode is not reserved for resource identifiers and is used for other constants as well.
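Format 31i, used by CONST, stores the 32-bit literal in the two code units after the opcode, low-order half first. The Python sketch below decodes it; the sample value 0x7f0b0012 is a made-up resource identifier, and interpreting the literal as signed is a choice made here, not something the signature logic depends on.

```python
import struct


def decode_const_31i(units):
    """Format 31i (AA|op BBBBlo BBBBhi): return (register AA, 32-bit literal)."""
    if (units[0] & 0xFF) != 0x14:
        raise ValueError("not a const instruction")
    raw = units[1] | (units[2] << 16)                          # low half comes first
    literal = struct.unpack("<i", struct.pack("<I", raw))[0]   # as signed 32-bit
    return units[0] >> 8, literal


print(hex(decode_const_31i((0x0114, 0x0012, 0x7F0B))[1]))  # 0x7f0b0012
print(decode_const_31i((0x0014, 0xFFFF, 0xFFFF)))           # (0, -1)
```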

We could simply exclude all integer constants loaded with 0x14 (CONST), but that could lead to incorrect signature matches when the code loads other numeric values. Additionally, some patches only change numeric values such as flags, and excluding all integer constants would make these patches undetectable.

The solution lies in recognizing resource identifiers with the help of heuristics. Resource identifiers are generated automatically by the Android build system and take the form 0xPPTTNNNN with certain constraints:

  • PP is the package identifier and is in practice always either 0x01 (Android framework resources) or 0x7f (the application's own resources).
  • TT is the resource type (e.g., string or layout), which the build system assigns sequentially to the resource types used within an application. This results in a small but non-zero number; according to our analysis of thousands of Android firmware images, it is almost always in the range 1 to 25.
  • NNNN is the entry number: all resources of a given type are numbered sequentially, starting at 0. In almost all cases, this value is at most 5000.

Combining these restrictions yields a heuristic that detects practically all valid resource identifiers while producing relatively few false positives (i.e., other 32-bit integers loaded with the same CONST opcode that happen to match).

If the heuristic detects that a loaded number is likely a resource identifier, the number is removed from the signature. This ensures that different binaries compiled from the same source code, but with different numeric resource identifiers assigned by the build system, match the same signature.
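The resource identifier heuristic and its use in the signature might look like the following Python sketch. The thresholds (package 0x01 or 0x7f, type 1 to 25, entry number at most 5000) come from the description above; since the Android build tools number entries from 0, the sketch accepts 0 as an entry number. The function names and the RESOURCE_ID placeholder token are hypothetical, and false positives remain possible: a flag value such as 0x7f010001 would also be dropped.

```python
FRAMEWORK_PACKAGE = 0x01  # android.R resources
APP_PACKAGE = 0x7F        # the application's own resources


def is_probable_resource_id(value, max_type=25, max_entry=5000):
    """Heuristic: 0xPPTTNNNN with PP in {0x01, 0x7f}, 1 <= TT <= max_type, NNNN <= max_entry."""
    value &= 0xFFFFFFFF                     # accept signed or unsigned 32-bit input
    package, type_id, entry = value >> 24, (value >> 16) & 0xFF, value & 0xFFFF
    return (package in (FRAMEWORK_PACKAGE, APP_PACKAGE)
            and 1 <= type_id <= max_type
            and entry <= max_entry)


def const_token(register, literal):
    """Signature token for `const vAA, #+BBBBBBBB`: drop likely resource IDs, keep other values."""
    if is_probable_resource_id(literal):
        return f"const v{register} RESOURCE_ID"
    return f"const v{register} {literal:#x}"


print(const_token(1, 0x7F0B0012))  # const v1 RESOURCE_ID
print(const_token(2, 0x10000000))  # const v2 0x10000000
```

A patch that only changes a flag constant such as 0x10000000 still changes the signature, as the text requires, while a rebuilt resource table no longer does.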

Conclusion

SRLabs devised an approach for creating and detecting signatures of Java bytecode in order to analyze the potential security patch gap in Android's Java components. We looked into the details of the DEX file format and the bytecode instruction set. Since some parts of the bytecode are volatile, we had to exclude them from the signature, and heuristics help where volatile values cannot be identified directly. The Java patch tests we create based on this approach are now part of our app SnoopSnitch and have doubled the coverage of its patch level analysis.

Despite the differences between Dalvik bytecode and other Java bytecode, further research should investigate whether these heuristics can be transferred and adapted to the latter. Unique patch signatures could, for example, be used to check that Log4j is patched in popular Java libraries.

Call for action: Help us further improve SnoopSnitch

The latest SnoopSnitch version is available on the Play Store. Alternatively, you can download it from our project site. You can report issues on our GitHub to help us improve SnoopSnitch further.

Disclaimer: Publicly available firmware images form the basis for our test creation. Vendor customizations of Android may limit what our Java tests can detect. As of May 2022, SnoopSnitch does not yet include patch tests for Android 12L, since it has only just been released. We add tests regularly.

Editing by: Maria Bühner

5/27/2019