Android Malware Datasets

Posted by Flight Tracer on January 24, 2018

I have set up a new personal blog; all posts have moved to http://blog.datarepo.cn and will be updated there.

Android malware datasets.

1. Android Malware Genome Project

In this project, we focus on the Android platform and aim to systematize or characterize existing Android malware. In particular, after more than a year of effort, we have collected more than 1,200 malware samples that cover the majority of existing Android malware families, ranging from their debut in August 2010 to recent ones in October 2011.

Publication Dissecting Android Malware: Characterization and Evolution. Yajin Zhou, Xuxian Jiang. Proceedings of the 33rd IEEE Symposium on Security and Privacy (Oakland 2012). San Francisco, CA, May 2012

Homepage (No longer supported) http://www.malgenomeproject.org

2. M0Droid Dataset

M0Droid is a behavioral pattern recognition tool for Android applications, used to identify Android malware and categorize it according to its behavior. It uses a kernel-level hook to capture all system call requests of the application and then generates a signature for the behavior of the application.

Publication Damshenas M, Dehghantanha A, Choo K K R, et al. M0droid: An android behavioral-based malware detection model[J]. Journal of Information Privacy and Security, 2015, 11(3): 141-157.

Homepage http://m0droid.netai.net/modroid/

Blog http://www.alid.info/blog/2015/2/4/android-malware-research-dataset
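As a rough illustration of turning captured system calls into a behavior signature, the sketch below summarizes a trace as relative call frequencies. This is not M0Droid's actual signature format, which the description above does not specify; the function name `behaviour_signature` and the sample trace are hypothetical.

```python
from collections import Counter


def behaviour_signature(syscalls):
    """Return the relative frequency of each system call in a trace.

    Assumption: the trace is a plain list of system call names.
    The real M0Droid signature may be built differently.
    """
    counts = Counter(syscalls)
    total = sum(counts.values())
    if total == 0:
        return {}  # empty trace, nothing to summarize
    # Sort by call name so the signature has a stable key order.
    return {name: count / total for name, count in sorted(counts.items())}


# Hypothetical trace captured for one application.
trace = ["open", "read", "read", "sendto"]
signature = behaviour_signature(trace)
```

Two applications could then be compared by measuring the distance between their frequency profiles, which is one simple way to group samples by behavior.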

3. The Drebin Dataset

The dataset contains 5,560 applications from 179 different malware families. The samples have been collected in the period of August 2010 to October 2012 and were made available to us by the MobileSandbox project. You can find more details on the dataset in the paper.

Publication Arp D, Spreitzenbarth M, Hübner M, et al. Drebin: Efficient and explainable detection of android malware in your pocket[C]//Proc. of 17th Network and Distributed System Security Symposium, NDSS, 2014.

Homepage http://user.informatik.uni-goettingen.de/~darp/drebin/

4. A Dataset based on ContagioDump

The dataset is a collection of Android-based malware seen in the wild. The malware samples were downloaded on October 26th, 2011. The total number of malware samples included is 189. I have qualitatively split them into categories based on their primary behaviours where available. I obtained their primary behaviours from malware reports from the various AV companies. If the malware would download a separate payload as its primary function, it was put in the Trojan category. If the malware executed an escalation of privilege attack, it was put in the escalation of privilege category. If the malware primarily stole data from the phone, it was classified as information stealing. If the malware sent premium SMS messages, it was classified as premium SMS transmitting malware.

Homepage http://cgi.cs.indiana.edu/~nhusted/dokuwiki/doku.php?id=datasets
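The categorization rules described for this dataset amount to a lookup from a sample's primary behaviour to a category. A minimal sketch follows, assuming hypothetical behaviour labels such as `downloads_payload`; the dataset's real label names and file layout are not given here, and `UNCATEGORIZED` is an assumed fallback for samples with no AV report.

```python
from enum import Enum


class Category(Enum):
    TROJAN = "Trojan"
    PRIVILEGE_ESCALATION = "Escalation of privilege"
    INFORMATION_STEALING = "Information stealing"
    PREMIUM_SMS = "Premium SMS"
    UNCATEGORIZED = "Uncategorized"  # assumed fallback, not named in the text


# Hypothetical behaviour labels mapped to the four categories in the text.
RULES = {
    "downloads_payload": Category.TROJAN,
    "escalates_privilege": Category.PRIVILEGE_ESCALATION,
    "steals_data": Category.INFORMATION_STEALING,
    "sends_premium_sms": Category.PREMIUM_SMS,
}


def categorize(primary_behaviour):
    """Map a primary behaviour to a category.

    Pass None when no AV report describes the sample's behaviour.
    """
    return RULES.get(primary_behaviour, Category.UNCATEGORIZED)
```

For example, `categorize("sends_premium_sms")` yields `Category.PREMIUM_SMS`, while a sample without a known primary behaviour falls back to `Category.UNCATEGORIZED`.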

5. AndroMalShare

AndroMalShare is a project focused on sharing Android malware samples. It is for research only, not for commercial use. We present statistical information about the samples, a detailed report of each malware sample scanned by SandDroid, and the detection results from anti-virus products. You can upload malware samples to share with others, and each malware sample can be downloaded (only by registered users).

Homepage http://sanddroid.xjtu.edu.cn:8080/#home

6. Kharon Malware Dataset

The Kharon dataset is a collection of malware that has been fully reverse engineered and documented. This dataset has been constructed to help us evaluate our research experiments. Its construction has required a huge amount of work to understand the malicious code, trigger it, and then write the documentation. This dataset is now available for research purposes; we hope it will help you conduct your own experiments.

Publication CIDRE, EPI. Kharon dataset: Android malware under a microscope. Learning from Authoritative Security Experiment Results (2016): 1.

Homepage http://kharon.gforge.inria.fr/dataset/

7. AMD Project

AMD contains 24,553 samples, categorized in 135 varieties among 71 malware families ranging from 2010 to 2016. The dataset provides an up-to-date picture of the current landscape of Android malware, and is publicly shared with the community.

Publication Li Y, Jang J, Hu X, et al. Android malware clustering through malicious payload mining[C]//International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, Cham, 2017: 192-214.

Wei F, Li Y, Roy S, et al. Deep Ground Truth Analysis of Current Android Malware[C]//International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, Cham, 2017: 252-276.

Homepage http://amd.arguslab.org

8. AAGM Dataset

The AAGM dataset is captured by installing Android apps on real smartphones in a semi-automated manner. The dataset is generated from 1,900 applications.

Publication Arash Habibi Lashkari, Andi Fitriah A. Kadir, Hugo Gonzalez, Kenneth Fon Mbah and Ali A. Ghorbani, Towards a Network-Based Framework for Android Malware Detection and Characterization, In Proceedings of the 15th International Conference on Privacy, Security and Trust, PST, Calgary, Canada, 2017.

Homepage http://www.unb.ca/cic/datasets/android-adware.html

9. Android PRAGuard Dataset

As retrieving malware for research purposes is a difficult task, we decided to release our dataset of obfuscated malware.

The dataset contains 10,479 samples, obtained by obfuscating the MalGenome and the Contagio Minidump datasets with seven different obfuscation techniques.

Publication Davide Maiorca, Davide Ariu, Igino Corona, Marco Aresu and Giorgio Giacinto. Stealth attacks: an extended insight into the obfuscation effects on Android malware. In Computers and Security, vol. 51, pp. 16-31, 2015.

Homepage http://pralab.diee.unica.it/en/AndroidPRAGuardDataset

10. AndroZoo

AndroZoo is a growing collection of Android applications collected from several sources, including the official Google Play app market. It currently contains 5,781,781 different APKs, each of which has been (or will soon be) analysed by tens of different antivirus products to determine which applications are detected as malware. We provide this dataset to contribute to ongoing research efforts, as well as to enable new potential research topics on Android apps. By releasing our dataset to the research community, we also aim to encourage our fellow researchers to engage in reproducible experiments.

Publication K. Allix, T. F. Bissyandé, J. Klein, and Y. Le Traon. AndroZoo: Collecting Millions of Android Apps for the Research Community. Mining Software Repositories (MSR) 2016.

Homepage https://androzoo.uni.lu/