5.3 Evaluation
5.3.4 Discussion
This section explores some of the limitations of the work presented above. One limitation relates to the generation of regular expressions. Currently, creating regular expressions requires manual observation and interaction, a process that is often tedious and difficult. As we continue to develop our approach, we intend to automate this process.
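To illustrate what such a hand-written pattern involves, the following is a minimal sketch in Python, assuming a trace is available as an ordered list of system call names. The pattern, the function name matches_pattern, and the sample traces are hypothetical and are not the patterns used in this thesis.

```python
import re

# Hypothetical pattern (an assumption for illustration, not from the thesis):
# a file is opened, read one or more times, written one or more times,
# closed, and then the original is unlinked or renamed.
ENCRYPT_LIKE = re.compile(
    r"\bopen(?:at)? (?:read )+(?:write )+close (?:unlink|rename)\b"
)


def matches_pattern(trace):
    """Return True if the sequence of system call names contains the pattern."""
    # Join the names with single spaces so each call is one token in the text.
    return ENCRYPT_LIKE.search(" ".join(trace)) is not None


# Hypothetical traces for illustration only.
suspicious = ["openat", "read", "read", "write", "close", "unlink"]
benign = ["openat", "read", "close"]

print(matches_pattern(suspicious))
print(matches_pattern(benign))
```

Even in this small case, the author of the pattern has to decide by hand which calls matter, in which order, and how many repetitions to allow, which is the manual effort described above.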
Another limitation is our approach to identifying behavioural patterns. The patterns were identified from our observations of various applications. As a result, some behaviours may not have been captured. In future, we would like to introduce a more formalised and robust methodology for identifying behavioural patterns at the system call level, to ensure that all behaviours are captured without uncertainty.
Additionally, we intend to introduce more behavioural patterns capable of detecting other types of malware, such as Backdoors and Trojans, which were identified as two of the most prominent types of infections for third-party apps (Google, 2019b). This would enable us to expand our dataset and evaluate the efficacy of our methodology on a larger sample consisting of different types of malware.
Many dynamic anti-malware solutions utilise isolated environments, most commonly Virtual Machines (VMs), to contain and analyse malware. This presents an issue, as more contemporary malware is equipped with the capability to detect and evade those environments (Gadhiya and Bhavsar, 2013; Uppal et al., 2014). Although the experiments conducted in this thesis were evaluated on a VM, the work has been shown to be feasible to implement on a real device.
In future, a potential avenue for improvement is to develop a system that incorporates the concepts proposed in this thesis and can efficiently capture system calls in real time on a real user device whilst adhering to the resource constraints of mobile devices.
As previously mentioned in Section 5.2.1, Suspicious and General patterns were not utilised in our evaluations. However, these patterns were still identified and created to lead into future work. They can be expanded to create a more robust real-time malware detection model for Android devices, or to aid current and future anti-malware solutions in detecting and deterring malware.
The use of system calls makes it possible to capture large quantities of information, which can often be associated with behaviours exhibited by an application. However, on further exploration of system calls, it was observed that more complex behaviours are difficult to capture and understand at the system call level. Two examples are locker-type ransomware and SMS Trojans. The core mechanism of locker-type ransomware is to restrict users' ability to access their devices; often, this restriction is imposed by a perpetual overlay or window that cannot be closed. SMS Trojans, in turn, send or intercept SMS messages for malicious purposes (e.g., stealing credentials and involuntary subscription to premium services). At the system call level, these types of behaviours are often handled by the ioctl system call, through which Binder transactions are issued. Due to the complexity of Binder transactions, the ioctl system calls are not easily understandable.
Given this limitation, it would not be feasible to accurately identify and understand the complex behaviours of specific types of malware by relying on system calls alone. Further additions, such as frequency analysis (Bhatia and Kaushal, 2017), specialised decoding of Binder transactions (Tam et al., 2015), or additional observable features, such as permissions (Ferrante et al., 2017), would be necessary to produce a more descriptive overview of an application's behaviour. Nevertheless, relying solely on system calls and their sequences does not inhibit the ability to observe behavioural patterns in general; it is a limitation that should be considered for more complex behaviours or specific malware types.
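As a rough illustration of the first of these additions, frequency analysis can be sketched as counting how often each system call appears in a trace. This is a minimal sketch under the assumption that a trace is a list of system call names; the function name syscall_frequencies and the sample trace are hypothetical, and the code is not the method of Bhatia and Kaushal (2017).

```python
from collections import Counter


def syscall_frequencies(trace):
    """Return each system call's share of the trace as a fraction of all calls."""
    counts = Counter(trace)
    total = sum(counts.values())
    if total == 0:
        # An empty trace has no frequencies to report.
        return {}
    return {name: n / total for name, n in counts.items()}


# Hypothetical trace for illustration only.
trace = ["ioctl", "read", "ioctl", "write", "ioctl", "close", "read", "ioctl"]
freqs = syscall_frequencies(trace)

# List the calls from most to least frequent.
for name, share in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{name}: {share:.2f}")
```

A frequency profile of this kind discards ordering, so it would complement sequence-based patterns rather than replace them.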
Dynamic analysis methods are more resilient to common static obfuscation techniques, such as code obfuscation and junk code insertion. However, it should be noted that obfuscation techniques have also been explored that effectively invalidate existing dynamic analysis methods. A dynamic analysis obfuscation technique of particular interest is system call obfuscation. Srivastava et al. (2011) proposed an Illusion attack that utilises an Alternative System Call Execution Path (ASEP) and the ioctl system call to obfuscate malicious behaviour. The proposed method showed that it was possible to masquerade the behaviours performed by malicious applications as system calls invoked through ioctl, which are difficult to distinguish from those of benign applications due to the marshalling process, unless a specialised decoding process is implemented.
In this work, the use of regular expressions to devise behavioural patterns has been shown to effectively detect encryption-type ransomware. However, regular expressions have their limitations, particularly for more complex pattern matching, such as nested parentheses and counting or checking for balanced sets of characters. In the following chapters of this thesis, this issue is addressed by deviating from the prevalent use of regular expressions in the behavioural patterns.
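The balancing limitation can be sketched with a small Python example. It assumes Python's standard re module, which has no recursion support, so a pattern for nested parentheses must hard-code a maximum depth. The names DEPTH_TWO, balanced_regex, and balanced_counter are hypothetical, and the counter-based check is shown only for contrast; it is not the approach adopted in the following chapters.

```python
import re

# Accepts sequences of parenthesised groups nested at most two levels deep.
# Each additional level would require extending the pattern by hand.
DEPTH_TWO = re.compile(r"(?:\((?:\(\))*\))*")


def balanced_regex(text):
    """Check balance with the fixed-depth regular expression."""
    return DEPTH_TWO.fullmatch(text) is not None


def balanced_counter(text):
    """Check balance at any depth by tracking the number of open parentheses."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A closing parenthesis appeared before its opening one.
                return False
    return depth == 0


print(balanced_regex("(())"), balanced_counter("(())"))
print(balanced_regex("((()))"), balanced_counter("((()))"))
```

The regular expression rejects the balanced string "((()))" because it exceeds the hard-coded depth, whereas the counter handles arbitrary depth by keeping state that a fixed pattern cannot express.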