Android Application Security A Semantics and Context-Aware Approach.pdf

SPRINGER BRIEFS IN COMPUTER SCIENCE

Mu Zhang Heng Yin

Android Application Android Application Security A Seman Semantics and Context-Aware Approach

1 3

SpringerBriefs in Computer Science

Series editors Stan Zdonik, Brown University, Providence, USA Shashi Shekhar, University of Minnesota, Minneapolis, USA Jonathan Katz, University of Maryland, College Park, USA Xindong Wu, University of Vermont, Burlington, USA Lakhmi C. Jain, University of South Australia, Adelaide, Australia David Padua, University of Illinois Urbana-Champaign, Urbana, USA Xuemin (Sherman) Shen, University of Waterloo, Waterloo, Canada Borko Furht, Florida Atlantic University, Boca Raton, USA V.S. Subrahmanian, University of Maryland, College Park, USA Martial Hebert, Carnegie Mellon University, Pittsburgh, USA Katsushi Ikeuchi, University of Tokyo, Tokyo, Japan Bruno Siciliano, Università di Napoli Federico II, Napoli, Italy Sushil Jajodia, George Mason University, Fairfax, USA Newton Lee, Newton Lee Laboratories, LLC, Tujunga, USA

More information about this series at http://www.springer.com/series/10028

Mu Zhang • Heng Yin

Android Application Security A Semantics and Context-Aware Approach

1 3

Mu Zhang Computer Security Department NEC Laboratories America, Inc. Princeton, NJ, USA

Heng Yin University of California, Riverside Riverside, CA, USA

ISSN 2191-5768 ISSN 2191-5776 (electronic) SpringerBriefs in Computer Science ISBN ISBN 978978-33-31 3199-47 4781 8111-1 1 ISBN ISBN 978978-33-31 3199-47 4781 8122-8 8 (eBo (eBook ok)) DOI 10.1007/978-3-319-47812-8 Library Library of Congress Congress Control Control Number: 201 6959410 6959 410 © The Author(s) 2016 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material material is concerne concerned, d, specifica specifically lly the rights rights of translati translation, on, reprintin reprinting, g, reuse reuse of illustrat illustrations, ions, recitati recitation, on, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of genera generall descr descript iptiv ivee names, names, regis register tered ed names, names, tradem trademark arks, s, servic servicee marks, marks, etc. etc. in this this public publicati ation on does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. Printed on acid-free paper This Springer imprint is published by Springer Nature The registered company is Springer International Publishing AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

The authors would like to dedicate this book to their beloved families and friends and to those who overcome their frustration and persevere persevere with resubmitting resubmitting papers to top-tier computer security conferences.

Preface

This book is an introduction to the cutting-edge technologies for discovery, diagnosis, and defense of emerging security problems in modern Android applications. With great power comes great threat. Recently, due to the popularity of Android smartphones, Android apps have attracted varieties of cyber attacks: some involve advanced advanced anti-dete anti-detectio ction n technique techniques; s; some exploit exploit “genetic” “genetic” defects defects in Android Android programs; some cover up identity theft with camouflage; some trick end users to fall into a trap using intriguing but misleading language. To defeat malicious attempts, researchers strike back. Many traditional techniques have been studied and practi practiced ced:: malwa malware re classi classifica ficati tion, on, taint taint analys analysis, is, acces accesss contro control, l, etc. etc. Yet, intrusive techniques also advance, and, unfortunately, existing defenses fall short, fundamentally due to the lack of sufficient interpretation of Android application behaviors. To addr addres esss this this limi limita tati tion on,, we look look at the the prob proble lem m from from a diff differ eren entt angl angle. e. Android apps, no matter good, bad, or vulnerable, are in fact software programs. Their functionality is concretized through semantically meaningful code and varies under different circumstances. This reveals two essential factors for understanding Android application, semantics and contexts, which, we believe, are also the key to tackle security problems in Android apps. As a result, we have developed a series of semantics and context-aware techniques to fight against Android security threats. We have have applie applied d our idea idea to four four signifi significan cantt areas, areas, namely namely,, malwa malware re detect detection ion,, vulnerability patching, privacy leakage mitigation, and misleading app descriptions. This will be elaborated through the whole book.

Intended Audience This book is suitable for security professionals and researchers. It will also be useful for graduate students who are interested in mobile application security.

vii

viii

Preface

Acknowledgments The author authorss would would like like to thank thank Lok, Lok, Aravin Aravind, d, Andre Andrew w, Qian, Qian, Xuncha Xunchao, o, Yue, Rundong, Jinghan, Manju, Eknath, and Curtis for the stimulating discussions and generous support. Princeton, NJ, USA Riverside, CA, USA September 2016

Mu Zhang Heng Yin

Contents

1

2

. . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. .. . . . Introd Intr oduc ucti tion on . . . . .. 1.1 Secur Security ity Threa Threats ts in Androi Android d Appli Applicati cations ons . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. .. . 1.1.1 1.1 .1 Mal Malwa ware re Att Attack ackss . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . 1.1.2 Softw Software are Vulner ulnerabil abilitie itiess . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. .. 1.1.3 1.1 .3 Inf Inform ormat ation ion Lea Leakag kagee . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. .. . 1.1.4 1.1 .4 Ins Insecu ecure re Des Descri cripti ptions ons . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. .. 1.2 A Semanti Semantics cs and and Context Context Aware Aware Approa Approach ch to Androi Android d Application Security . Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . .. . . .. . . .. . . .. . . .. .. . References . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. .. . . . . .

3 4

. . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . . Backgr Back grou ound nd . . . . . .. 2.1 And Androi roid d Appl Applica icatio tion n . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. 2.1.1 2.1 .1 And Androi roid d Fra Frame mewor work k API . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. .. . . 2.1.2 2.1 .2 And Androi roid d Per Permis missio sion n . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. .. . . 2.1.3 2.1 .3 And Androi roid d Com Compon ponent ent . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. .. . 2.1.4 2.1 .4 And Androi roid d App Des Descri cripti ption on.. . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. .. . . 2.2 And Androi roid d Mal Malwa ware re Det Detect ection ion . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. . . . .. .. . 2.2.1 Signa Signature ture Dete Detection ction and Malw Malware are Analy Analysis sis . . .. . . . .. . . . .. . . . .. . . . .. . . . .. .. 2.2.2 Androi Android d Malw Malware are Class Classifica ification tion . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . 2.3 Androi Android d Appli Applicati cation on Vulner ulnerabili abilities ties . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. 2.3.1 Compon Component ent Hija Hijacking cking Vulner ulnerabil abilitie itiess . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . 2.3.2 Autom Automatic atic Pat Patch ch and Signa Signature ture Gener Generatio ation n . . .. . . . .. . . . .. . . . .. . . . .. . . . .. .. 2.3.3 2.3 .3 Byt Byteco ecode de Re Rewri writin ting g . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . . . 2.3.4 Instr Instrument umentatio ation n Code Opti Optimiza mization tion . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . . 2.4 Pri Priva vacy cy Lea Leakag kagee in in Andro Android id App Appss . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . 2.4.1 Pri Privac vacy y Leak Leakage age Dete Detectio ction n . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . 2.4.2 Pri Privac vacy y Leak Miti Mitigati gation on . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. 2.4.3 2.4 .3 Inf Inform ormati ation on Flo Flow w Cont Control rol . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . . 2.5 Text Analy Analytics tics for Androi Android d Secur Security ity . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. 2.5.1 Autom Automated ated Gener Generatio ation n of Softw Software are Descr Descripti iption on . . .. . . . .. . . . .. . . . .. .. . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

7 7 8 8 8 9 9 10 10 11 11 12 12 13 13 13 14 14 14 15 15

1 1 1 2 2 2

ix

x

3

4

5

Contents

. . . . . . .. . . . . . . .. .. . . . . Semantics-Aware Android Malwar Semantics-Aware Malwaree Classificati Classification on . . . . .. 3.1 3. 1 In Intr trod oduc ucti tion on . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. .. . . . 3.2 3. 2 Ov Over ervi vieew . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1 3.2 .1 Pro Proble blem m Sta Statem tement ent . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. 3.2.2 Archi Architect tecture ure Ove Overvie rview w . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. 3.3 Weight eighted ed Conte Contextual xtual API Depen Dependenc dency y Graph Graph.. . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . . 3.3.1 3.3 .1 Ke Key y Beh Behav avior ioral al Asp Aspec ects ts . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. .. . . . . 3.3.2 3.3 .2 Fo Forma rmall Defi Definit nition ion . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. 3.3. 3. 3.3 3 A Re Real al Ex Exam ampl plee . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . 3.3.4 3.3 .4 Gra Graph ph Gen Genera eratio tion n . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . 3.4 Androi Android d Malw Malware are Class Classifica ification tion . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. 3.4.1 3.4 .1 Gra Graph ph Mat Matchi ching ng Sco Score re.. . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . 3.4.2 3.4 .2 Weig eight ht Ass Assign ignmen mentt . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . . . 3.4.3 Imple Implementa mentation tion and Graph Data Database base Query Query.. . .. . . . .. . . . .. . . . .. . . . .. . . . .. .. 3.4.4 Malw Malware are Class Classifica ification tion . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. 3.5 3. 5 Ev Eval alua uati tion on . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5.1 3.5 .1 Dat Datase asett and and Exp Experi erimen mentt Setu Setup p . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . . 3.5.2 3.5 .2 Sum Summar mary y of of Grap Graph h Gene Generat ration ion . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . 3.5.3 3.5 .3 Cla Classi ssifica ficati tion on Res Result ultss . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . 3.5.4 3.5 .4 Run Runtim timee Per Perfor forman mance ce . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . 3.5.5 Eff Effecti ectivene veness ss of Weight Weight Genera Generation tion and Weight Weighted ed Graph Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . .. . . .. . . .. . . .. .. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

40 42

Automatic Generation of Vulnerability ulnerability-Specific -Specific Patches for . . .. . . .. . . .. . . .. .. . Preventing Prev enting Component Hijacking Attacks . . . . . . . . . . . . . . . . . . . . .. 4.1 4. 1 In Intr trod oduc ucti tion on . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. .. . . . 4.2 Probl Problem em Stat Statement ement and Approa Approach ch Over Overvie view w . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . . 4.2.1 4.2 .1 Run Runnin ning g Exa Exampl mplee . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . 4.2.2 4.2 .2 Pro Proble blem m Sta Statem tement ent . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. 4.2.3 4.2 .3 App Approa roach ch Ov Overv ervie iew w . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . . 4.3 Taint Slic Slicee Comput Computatio ation n . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . 4.3.1 4.3 .1 Run Runnin ning g Exa Exampl mplee . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . 4.4 Pa Patc tch h Sta Statem tement ent Pla Placem cement ent . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . . 4.5 Pa Patc tch h Opt Optimi imizat zation ion . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . 4.5.1 Opti Optimize mized d Pat Patch ch for Runnin Running g Examp Example le . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. 4.6 Exper Experiment imental al Ev Evaluat aluation ion . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . . 4.6.1 4.6 .1 Exp Experi erimen mentt Set Setup up . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. 4.6.2 4.6 .2 Sum Summar marize ized d Res Result ultss . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . . 4.6.3 4.6 .3 Det Detail ailed ed Ana Analys lysis is . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

45 45 47 47 49 50 51 51 52 53 54 56 56 57 58 60

. . .. . . .. . . .. . . .. .. Efficient and Context-A Context-Aware ware Priva Privacy cy Leakage Confinement . .. 5.1 5. 1 In Intr trod oduc ucti tion on . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. .. . . . 5.2 App Approa roach ch Ov Overv ervie iew w . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . 5.2.1 5.2 .1 Ke Key y Tech echniq niques ues . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . .

63 63 65 65

19 19 21 21 22 23 23 24 24 26 30 30 31 32 33 34 34 34 36 40

Contents

5.3

xi

Context-Aware Context-A ware Polic Policy y . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . . . 5.3.1 Taint Propa Propagati gation on Tr Trace ace . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. .. . . . . 5.3.2 5.3 .2 Sou Source rce and Sin Sink k Call Call-Si -Sites tes . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . 5.3.3 Par Paramet ameteriz erized ed Sourc Sourcee and Sink Pai Pairs rs . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . . 5.3.4 5.3 .4 Imp Implem lement entati ation on . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . 5.4 Exper Experimen imental tal Ev Evaluat aluation ion . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . . 5.4.1 Summa Summarized rized Analy Analysis sis Resul Results ts . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. .. . . . . 5.4.2 5.4 .2 Det Detail ailed ed Ana Analys lysis is . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. 5.4.3 5.4 .3 Run Runtim timee Per Perfor forman mance ce . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

66 67 67 68 69 69 70 71 74 75

Autom utomatic atic Gener Generatio ation n of Security-C Security-Centr entric ic Descrip Descriptions tions for for . . .. . . .. . . .. . . .. .. . Android Apps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 6.1 6. 1 In Intr trod oduc ucti tion on . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. .. . . . 6.2 6. 2 Ov Over ervi vieew . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.1 6.2 .1 Pro Proble blem m Sta Statem tement ent . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. 6.2.2 Archi Architect tecture ure Ove Overvie rview w . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. 6.3 Sec Securi urity ty Beh Behav avior ior Gra Graph ph . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. 6.3.1 6.3 .1 Fo Forma rmall Defi Definit nition ion . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. 6.3.2 SBG of of Motivating Example . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. 6.3.3 6.3 .3 Gra Graph ph Gen Genera eratio tion n . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . 6.4 Beha Behavior vior Minin Mining g and Graph Compre Compression ssion.. . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . . . . 6.5 Des Descri cripti ption on Gen Genera eratio tion n . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . 6.5.1 Autom Automatic atically ally Gene Generated rated Descr Descripti iptions ons . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . 6.5.2 Beha Behavior vior Descr Descripti iption on Model . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. 6.5.3 Beha Behavior vior Graph Tr Transla anslation tion . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . 6.5.4 6.5 .4 Mot Motiv ivat ating ing Exa Exampl mplee . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. . . . 6.6 6. 6 Ev Eval alua uati tion on . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.6.1 Correc Correctness tness and Secur Security-A ity-Aware wareness ness . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . . 6.6.2 Reada Readabilit bility y and Eff Effecti ectivene veness ss . . . . . .. . . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

77 77 78 78 80 82 82 82 83 86 87 87 88 90 91 92 92 95 97

7

. . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. .. Limitation Limitatio n and Futur Futuree Work . . . . . .. 7.1 Androi Android d Malw Malware are Class Classifica ification tion . . . . .. . . . . . . .. . . . . . . .. . . . . . .. . . . . . . .. . . . . . . .. . . . . . .. .. 7.2 Autom Automated ated Vulner ulnerabili ability ty Pat Patching ching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.3 Conte Context-A xt-Aware ware Pri Privac vacy y Prote Protection ction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4 Autom Automated ated Gener Generatio ation n of Secur Security-C ity-Centri entricc Descri Description ptionss . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

99 99 100 101 102 103

8

Conc Co nclu lusi sion on . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

6

Chapter 1

Introduction

severe security security challenge challenges. s. Abstract Along with the boom of Android apps come severe Existing techniques fall short when facing emerging security problems in Android applications, such as zero-day or polymorphic malware, deep and complex vulnerabilities, privacy leaks and insecure app descriptions. To fight these threats, we have proposed a semantics and context aware approach, and designed and developed a series of advanced techniques.

1.1

Security Security Threats Threats in Android Android Applicati Applications ons

Android Android has domina dominated ted the smartp smartphon honee marke markett and become become the most most popula popularr operating system for mobile devices. In the meantime, security threats in Android apps have also quickly increased. In particular, four major classes of problems, malware, program program vulnerabilities vulnerabilities, privacy privacy leaks leaks and insecur insecuree app descripti descriptions ons, bring considerable challenges to Android application security. Although a great deal of research efforts have been made to address these threats, they have fundamental limitations and thus cannot solve the problems.

1.1. 1.1.1 1

Malw Malwar aree Attac Attacks ks

Malicious apps have been increasingly exponentially according to McAfee’s annual report report [1]. Malware Malware steals steals and pollutes pollutes sensiti sensitive ve information information,, execut executes es attacker attacker specified commands, or even totally roots and subverts the mobile devices. Unfortunately, existing automated Android malware detection and classification methods can can be evade vaded d both both in theo theory ry and and prac practi tice ce.. Signa Signatu ture re-b -bas ased ed dete detect ctio ions ns [ 2, 3] barely barely look look for specifi specificc byteco bytecode de patter patterns, ns, and theref therefore ore are easily easily evade evaded d by bytecode-l bytecode-lev evel el transformat transformation ion attacks attacks [ 4]. In contra contrast, st, machin machinee learni learningng-bas based ed approaches [5 [5–7] extract features from simple, external and isolated symptoms of an application (e.g., permission requests and individual API calls). The extracted features are thus associated with volatile application syntax, rather than high-level and robust program semantics. As a result, these detectors are also susceptible to evasion. © The Author(s) 2016 M. Zhang, H. Yin, Android Application Security , SpringerBriefs in Computer Science, DOI 10.1007/978-3-319-47812-8_1 10.1007/978-3-319-47812-8_1

1

2

1.1.2 1.1.2

1

Introduction

Softwa Software re Vulnera ulnerabil bilitie itiess

Apps may also contain security vulnerabilities, such as privilege escalation [8], capabi capabilit lity y leaks leaks [ 9], permis permissio sion n re-del re-deleg egati ation on [ 10 10], ], compon component ent hijack hijacking ing [11 11], ], content leaks & pollution [12 [ 12]] and inter-co inter-compone mponent nt communica communication tion vulnerabil vulnerabil-itie itiess [13 13,, 14 14]. ]. These These vulner vulnerabi abilit lities ies are largel largely y detect detected ed via automa automated ted static static analysis [9, 11 11– –14 14]] to guaran guarantee tee the scalab scalabili ility ty and satisf satisfact actory ory code code cove coverag rage. e. However, static analysis is conservative in nature and may raise false positives. Ther Theref efor ore, e, once once a “pot “poten enti tial al”” vulne vulnera rabi bili lity ty is disc discov over ered ed,, it first first need needss to be confirmed; once it is confirmed, it then needs to be patched. Nevertheless, it is fairly challenging to programmatically accomplish these two tasks because it requires automated interpretation of program semantics. So far, upon receiving a discovered vulnerability, the developer has no choice but to manually confirm if the reported vulnerability is real. It may also be nontrivial nontrivial for the (often inexperienced) inexperienced) developer developer to properly fix the vulnerability and release a patch for it. Thus, these discovered vulnerabilities may not be addressed for long time or not addressed at all, leaving a big time window for attackers to exploit these vulnerabilities.

1.1.3 1.1.3

Inform Informati ation on Leakag Leakagee

Infor Informa mati tion on leak leakag agee is pre prevail vailin ing g in both both malw malwar aree and and beni benign gn appl applic icat atio ions ns.. To address privacy leaks, prior efforts are made to perform both dynamic and static informatio information n flow analyses. analyses. Dynamic Dynamic taint tracking based approaches, approaches, including DroidS DroidScop copee [15 15], ], VetDr etDroi oid d [16 16]] and Taint aintDr Droi oid d [17 17], ], can can accu accura rate tely ly dete detect ct information exfiltration at runtime, but incur significant performance overhead and suffer from insufficient code coverage. Conversely, static program analyses (e.g., FlowDroid [18 [18]] and AmanDroid [19 19]) ]) can achieve high efficiency and coverage, but are only able to identify “potential” leakage and may cause considerable false positives. Besides, both approaches merely discover the presence of private data transmission, but do not consider how and why the transmission actually happens. Due to the lack of “context” information, these detectors cannot explain the cause of detected “privacy leaks”. Thus, they cannot differentiate legitimate usage of private data (e.g., Geo-location required by Google Map) from true data leakage, and may yield severe false alarms.

1.1.4 1.1.4

Insecu Insecure re Descri Descripti ptions ons

Unlike traditional desktop systems, Android provides end users with an opportunity to proactively accept or deny the installation of any app to the system. As a result, it is essential that the users become aware of each app’s behaviors so as to make

1.2 1.2 A Sem Semanti anticcs and Con Context text Aware are App Approac roach h to And Android roid Appl Applic icaation tion Secu Securi rity ty

3

appropriate decisions. To this end, Android markets directly present the consumers with two classes of information regarding each app: (1) permissions requested by the app and (2) textual descriptions provided by the developer. However, neither can serve the needs. Permissions are not only hard to understand [ 20 20]] but also incapable of explaining how the requested permissions are used. For instance, both a benign navigation app and a spyware instance of the same app can require the same permission to access GPS location, yet use it for completely different purposes. Whil Whilee the the beni benign gn app app deli delive vers rs GPS GPS data data to a legi legiti tima mate te map map serv server er upon upon the the user’s approval, the spyware instance can periodically and stealthily leak the user’s location information to an attacker’s site. Due to the lack of context clues, a user is not able to perceive such differences via the simple permission enumeration. Textual Textual descriptions provided by developers are not security-centric. There exists very little incentive for app developers to describe their products from a security perspective, and it is still a difficult task for average developers to write dependable descriptions. Besides, malware authors deliberately deliver misleading descriptions so as to hide malice from innocent users. Previous studies [ 21 21,, 22 22]] have revealed that the existing descriptive texts are deviated considerably from requested permissions. As a result, developer-driven description generation cannot be considered trustworthy.

1.2

A Semanti Semantics cs and and Context Context Aware ware Appr Approach oach to Andro Android id Application Security

The fundamental problem of existing defense techniques lies in the fact that they are not aware of program semantics or contexts. To direct address the threats to Android application security, we propose a semantics and context-aware approach, whic which h is more more effe effect ctiive for for malw malwar aree dete detect ctio ion n and and pri privacy vacy prot protec ecti tion, on, more more usab usable le to improve the security-awareness of end users and can address sophisticated software vulnerabilities that previously cannot be solved. Particularly, we propose four new techniques to address the specific security problems. (1) Android Malware Classification. To battle malware polymorphism and zeroday malware, we extract contextual API dependency graphs of each Android app via program analysis, assign different weights to the nodes in the graphs using learning-b learning-based ased technique technique,, measure measure the weighted weighted graph similari similarity ty,, and further use the similarity score to construct feature vectors. (2) Automati Automaticc Patch Patch Generati Generation on for Compone Component nt Hijackin Hijacking g Vulnerabil ulnerabilitie ities. s. To fix detect detected ed vulner vulnerabi abilit lities ies,, we perfor perform m static static analys analysis is to disco discove verr the small small portio portion n of code code that that leads leads to compon component ent hijac hijackin king g attack attacks, s, and then then selectively insert patch statements into the vulnerable program so as to defeat the exploitations at runtime. (3) Privacy Policy Enforcement. To o defeat privacy privacy leakage without compromising Enforcement. T legitimate legitimate functionality, functionality, we first rewrite the privacy breaching app to insert code that tracks sensitive information flow and enforces privacy polices, and then at

4

1

Introduction

runtime associate the policies to specific contexts via modeling the user reaction to each incident of privacy violation. (4) Automated Automated Generation of Security-Centric Security-Centric Descriptions. To improve the security sensitivity of app descriptions, we first retrieve the behavior graphs from each Android app to reflect its security-related operations, then compress the graphs by collapsing the common patterns so as to produce more succinct ones, ones, and finally finally transl translat atee the compre compresse ssed d graphs graphs into into concis concisee and humanhumanreadable descriptions.

References 1. McAfee McAfee Labs Labs Threat Threatss report report Fourt Fourth h Quarte Quarterr (2013) (2013) http://www http://www.mcafee.com/us/reso .mcafee.com/us/resources/ urces/ reports/rp-quarterly-threat-q4-2013.pdf 2. Zhou Y, Wang Z, Zhou W, Jiang X (2012) Hey, you, get off of my market: detecting malicious apps in official and alternative android markets. In: Proceedings of 19th annual network and distributed system security symposium (NDSS) 3. Grace M, Zhou Y, Zhang Q, Zou S, Jiang X (2012) RiskRanker: scalable and accurate zeroday android malware detection. In: Proceedings of the 10th international conference on mobile systems, applications and services (MobiSys) 4. Rastogi V, Chen Y, Jiang X (2013) DroidChameleon: evaluating android anti-malware against transformation attacks. In: Proceedings of the 8th ACM symposium on InformAtion, computer and communications security (ASIACCS) 5. Peng H, Gates C, Sarma B, Li N, Qi Y, Potharaju R, Nita-Rotaru C, Molloy I (2012) Using probabilistic generative models for ranking risks of android apps. In: Proceedings of the 2012 ACM conference on computer and communications security (CCS) 6. Aafer Y, Du W, Yin H (2013) DroidAPIMiner: mining API-level features for robust malware detect detection ion in androi android. d. In: Procee Proceedin dings gs of the 9th intern internati ationa onall confer conferenc encee on securi security ty and priva privacy cy in communication networks (SecureComm) 7. Arp D, Spreit Spreitzen zenbar barth th M, Hübner Hübner M, Gascon Gascon H, Rieck Rieck K (2014) (2014) Drebin Drebin:: effici efficient ent and explainable detection of android malware in your pocket. In: Proceedings of the 21th annual network and distributed system security symposium (NDSS) 8. Davi Davi L, Dmitri Dmitrienk enko o A, Sadeg Sadeghi hi AR, AR, Winan Winandy dy M (2011) (2011) Pri Privil vileg egee escala escalatio tion n attack attackss on androi android. d. In: Proceedings of the 13th international conference on Information security. Berlin/Heidelberg Berlin/Heidelberg 9. Grace M, Zhou Y, Wang Z, Jiang X (2012) Systematic detection of capability leaks in stock android smartphones. In: Proceedings of the 19th network and distributed system security symposium 10. Felt AP, Wang HJ, Moshchuk A, Hanna S, Chin E (2011) Permission re-delegation: attacks and defenses. In: Proceedings of the 20th USENIX security symposium 11. Lu L, Li Z, Wu Z, Lee W, Jiang G (2012) CHEX: statically vetting android apps for component hijacking hijacking vulnerabilitie vulnerabilities. s. In: Proceedin Proceedings gs of the 2012 ACM conferenc conferencee on computer computer and communications security (CCS) 12. Zhou Y, Jiang X (2013) Detecting passive content leaks and pollution in android applications. In: Proceedings of the 20th network and distributed system security symposium 13. Chin E, Felt AP, Greenwood K, Wagner D (2011) Analyzing inter-application communication in android. In: Proceedings of the 9th international conference on mobile systems, applications, and services (MobiSys) 14. Octeau D, McDaniel P, Jha S, Bartel A, Bodden E, Klein J, Traon YL (2013) Effective intercomponent communication mapping in android with epicc: an essential step towards holistic security analysis. In: Proceedings of the 22nd USENIX security symposium

References

5

15. Yan LK, Yin H (2012 (2012)) DroidS DroidScop cope: e: seamle seamlessl ssly y recons reconstru tructi cting ng OS and Dalvik Dalvik semant semantic ic views for dynamic android malware analysis. In: Proceedings of the 21st USENIX security symposium 16. Zhang Y, Yang M, Xu B, Yang Z, Gu G, Ning P, Wang XS, Zang B (2013) Vetting undesirable behaviors in android apps with permission use analysis. In: Proceedings of the 20th ACM conference on computer and communications security (CCS) 17. Enck W, Gilbert P, Chun BG, Cox LP, Jung J, McDaniel P, Sheth AN (2010) TaintDroid: an informatio information-flo n-flow w tracking tracking system system for realtime realtime privac privacy y monitoring monitoring on smartphon smartphones. es. In: Proceedings of the 9th USENIX symposium on operating systems design and implementation (OSDI) 18. Arzt S, Rasthofer S, Fritz C, Bodden E, Bartel A, Klein J, Traon YL, Octeau D, McDaniel P (2014) (2014) FlowDroid FlowDroid:: precise precise context, context, flow, flow, field, object-sen object-sensitiv sitivee and lifecyclelifecycle-awa aware re taint analys analysis is for androi android d apps. apps. In: Procee Proceedin dings gs of the 35th 35th ACM SIGPLA SIGPLAN N confer conferenc encee on programming language design and implementation (PLDI) 19. Wei F, Roy S, Ou X, Robby (2014) Amandroid: a precise and general inter-component data flow analysis framework for security vetting of android apps. In: Proceedings of the 21th ACM conference on computer and communications security (CCS). Scottsdale 20. Felt AP, Ha E, Egelman S, Haney A, Chin E, Wagner D (2012) Android permissions: user attention, comprehension, and behavior. In: Proceedings of the eighth symposium on usable privacy and security (SOUPS) 21. Pandita R, Xiao X, Yang W, Enck W, Xie T (2013) WHYPER: towards automating risk assess assessmen mentt of mobile mobile applic applicati ations ons.. In: Procee Proceedin dings gs of the 22nd 22nd USENIX USENIX confer conferenc encee on security 22. Qu Z, Rastogi V, Zhang X, Chen Y, Y, Zhu T, Chen Z (2014) Autocog: measuring the descriptionto-per to-permis missio sion n fidelit fidelity y in androi android d applic applicati ations ons.. In: Procee Proceedin dings gs of the 21st 21st confer conferenc encee on computer and communications security (CCS)

Chapter 2

Background

Abstract Android applications are developed on top of Android framework and therefore bear particular features compared to traditional desktop software. In the meantime, due to the unique design and implementation, Android apps are threatened by emerging cyber attacks that target at mobile operating systems. As a result, security researchers have made considerable efforts to discover, mitigate and defeat these threats.

2.1

Andro Android id Appli Applicat cation ion

Android is a popular operating system for mobile devices. It dominates in the battle to be the top smartphone system in the world, and ranked as the top smartphone platform with 52 % market share (71.1 million million subscribers) in Q1 2013. The success success of Andr Androi oid d is also also refle reflect cted ed from from the the popul popular arit ity y of its its appl applic icat atio ions ns.. Tens ens of thou thousa sand ndss of Android apps become available in Google Play while popular apps (e.g., Adobe Flash Player 11) have been downloaded and installed over 100 million times. Android apps are developed using mostly Java programming language, with the support of Android Software Development Kit (SDK). The source code is first compiled into Java classes and further compiled into a Dalvik executable (i.e., DEX file) via dx tool. Then, the DEX program and other resource files (e.g., XML layout files, images) are assembled into the same package, called an APK file. The APK package is later submitted to the Android app markets (e.g., Google Play Store) with the developer’s descriptions in text and other formats. An app market serves as the hub to distribute the application products, while consumers can browse the market and purchase the APK files. Once a APK file is downloaded and installed to a mobile device, the Dalvik executable will be running within a Dalvik virtual machine (DVM).

© The Author(s) 2016 M. Zhang, H. Yin, Android Application Security , SpringerBriefs in Computer Science, DOI 10.1007/978-3-319-47812-8_2 10.1007/978-3-319-47812-8_2

7

8

2.1.1

2

Background

Android Framework API

While an APK file is running in DVM, the Android framework code is also loaded and executed in the same domain. As a matter of fact, a Dalvik executable merely acts as a plug-in to the framework code, and a large portion of program execution happens within the Android framework. A DEX file interacts with the Android framework via Application Programming Inter Interfa face ce (API (API). ). Thes Thesee APIs APIs are are prov provid ided ed to the the deve develo lope pers rs thro throug ugh h Andr Androi oid d SDK. From developers’ perspective, Android API is the only channel for them to communicate with the underlying system and enable critical functionalities. Due to the nature nature of mobil mobilee operat operating ing system system,, Androi Android d offer offerss a broad broad spectr spectrum um of APIs that are specific to smartphone smartphone capabilitie capabilities. s. For instance, instance, an Android Android app can programmatically send SMS messages via sendTextMessage() API or retrieve retrieve user’s geographic location through getLastKnownLocation() .

2.1.2

Android Permission

Sensitive APIs are protected by Android permissions. Android exercises an installtime permission system. To enable the critical functionalities in an app, a developer oper has to specif specify y the needs for corres correspon pondin ding g permis permissio sions ns in a manife manifest st file AndroidManifest.xml AndroidManifest.xml . Once an end user agrees to install the app, the required perm permis issi sion onss are are gran grante ted. d. At runt runtim ime, e, perm permis issi sion on chec checks ks are are enfo enforc rced ed at both both framework and system levels to ensure that an app has adequate privileges to make critical API calls. There There exist exist two two major major limita limitatio tions ns for this this permis permissio sion n system system.. Firstl Firstly y, once once certain permission is granted to an app at the install time, there is no easy way to revoke it at runtime. Secondly and more importantly, the permission enforcement is fundamentally a single-point check and thus lacking continuous protection. If an application can pass a checkpoint and retrieve sensitive data via a critical API call, it can use the data without any further restrictions.

2.1.3

Android Component

Andro Android id fram frameework work also also prov provid ides es a spec specia iall set set of APIs APIs that that are are asso associ ciat ated ed to Android Android compone components nts.. Compone Components nts in Androi Android d are the basic basic buildi building ng units units for apps. In particular, there exist four types of components in Android: Activity , Service, Broadca Broadcast st Receiver Receiver, and Conten Content t Provider Provider. An Activity class takes care of creating the graphical user interface (GUI) and directly interacts with the end user. A Service, in contrast, performs non-interactive longer-running operations in background while accepting service requests from other apps or app

2.2 Android Malware Detection

9

compon component ents. s. A Broadca Broadcast st Recei Receive verr is compon component ent that that liste listens ns to and proces processes ses system-wide broadcast messages. A Content Provider encapsulates data content and shares it with multiple components via a unified interface. Compon Component entss commun communica icate te with with one anothe anotherr via Intents . An Inte Intent nt with with certain ACTION code, code, target target compon component ent and payloa payload d data data indica indicates tes a specifi specificc operation to be performed. For example, with different target parameter, an Intent can be used to launch an Activity, request a Service or send a message to any interested Broadcast Receiver. Receiver. A developer developer can create custom permissions to protect components from receiving receiving Intents from an arbitrary sender. Howe However ver,, such a simple mechanism cannot rule out malicious Intent communication because it does not prevent a malicious app author from requesting the same custom permission at install time.

2.1.4

Android App Description

Once an Android app has been developed, it is delivered to the app markets along with with the deve develop loper’ er’ss descri descripti ptions ons.. Deve Develop lopers ers are usuall usually y intere intereste sted d in descri describin bing g the app’s functionalities, unique features, special offers, use of contact information, etc. Neve Nevert rthe hele less ss,, they they are are not not moti motiva vate ted d to expl explai ain n the the secu securi rity ty risk riskss behi behind nd the the sens sensit itiive app app funct functio ions ns.. Prio Priorr stud studie iess [ 33 33,, 36 36]] have have reveale revealed d significant significant inconsist inconsistenci encies es between what the app is claimed to do and what the app actually does. This indicates that a majority majority of apps exerci exercise se undeclare undeclared d sensitiv sensitivee functional functionalitie itiess beyond beyond the users’ expectation. Such a practice may not necessarily be malicious, but it does provide a potential window for attacks to exploit. To mitigate this problem, Android markets also explains to users, in natural lang langua uage ge,, what what perm permis issi sions ons are are requi require red d by an app. app. The The goal goal is to help help user userss understand the program behaviors so as to avoid security risks. However, such a simple explanation is still too technical for average users to comprehend. Besides, a permission list does not illustrate how permissions are used by an app. As an example, if an application first retrieves user’s phone number and then sends it to a remote server, it in fact uses two permissions, READ_PHONE_STATE and INTERNET in a collaborative collaborative manner. Unfortunately, Unfortunately, the permission list can merely inform that two independent permissions have been used.

2.2

Andro Android id Malwar Malwaree Detect Detection ion

The number of new Android malware instances has grown exponentially in recent years. McAfee reports [28 [ 28]] that 2.47 million new mobile malware samples were collected in 2013, which represents represents a 197 % increase over over 2012. Greater and greater

10

2

Background

amounts of manual effort are required to analyze the increasing number of new malwa malware re insta instance nces. s. This This has led to a strong strong intere interest st in deve develop loping ing methods methods to automate the malware analysis process. Existing automated Android malware detection and classification methods fall into two general categories: (1) signature-based and (2) machine learning-based. Signature-based approaches [17 [ 17,, 54 54]] look for specific patterns in the bytecode and API calls, but they are easily evaded by bytecode-level transformation attacks [37 [ 37]. ]. Machine learning-based approaches [1 [ 1, 2, 34 34]] extract features features from an application’s application’s behavior (such as permission requests and critical API calls) and apply standard machine learning algorithms to perform binary classification. Because the extracted features are associated with application syntax, rather than program semantics, these detectors are also susceptible to evasion.

2.2.1

Signature Detection and Malware Analysis

Previo Previous us studie studiess were were focuse focused d on large large-sc -scale ale and lightlight-wei weight ght detect detection ion of malici malicious ous or dang danger erou ouss Andr Androi oid d apps apps.. Droi DroidR dRan ange gerr [54 54]] propos proposed ed permis permissio sion-b n-base ased d footpr footprint int-ing and heuristics-based schemes to detect new samples of known malware families and identify certain behaviors of unknown malicious families, respectively. RiskRanker [17 [17]] developed an automated system to uncover dangerous app behaviors, such as root exploits, and assess potential security risks. Kirin [11 11]] proposed a security service to certify apps based upon predefined security specifications. WHYWHYPER [ PER [33 33]] leveraged Natural Language Processing and automated risk assessment of mobile apps by revealing discrepancies between application descriptions and their true functionalities. Efforts were also made to pursue in-depth analysis of malware and applicati application on behaviors behaviors.. TaintDroid aintDroid [ 12 12]], DroidScope [47 [47]] and VetDroid [51 [ 51]] conducted dynamic taint analysis to detect suspicious behaviors during runtime. Ded [13 13], ], CHEX CHEX [25 25]], PEG [6], and and Flo FlowDro wDroid id [3] exercis exercised ed static static dataflow dataflow analysis analysis to identify identify dangerous dangerous code in Android Android apps. The effecti effectivene veness ss of these these approaches approaches depends upon the quality of human crafted detection patterns specific to certain dangerous or vulnerable behaviors.

2.2.2

Android Malware Classification

Many Many effor efforts ts have have also also been been made made to automa automati tical cally ly classi classify fy Androi Android d malwa malware re via machine learning. Peng et al. [34 34]] proposed a permission-based classification approa approach ch and introd introduce uced d probab probabili ilisti sticc genera generati tive ve models models for rankin ranking g risks risks for Android Android apps. Juxtapp [18 18]] performed feature hashing on the opcode sequence to detec detectt malici malicious ous code code reuse. reuse. DroidA DroidAPIM PIMine inerr [ 1] extracted extracted Android Android malware malware features at the API level and provided light-weight classifiers to defend against malware installations. DREBIN [2 [ 2] took a hybrid approach and considered both

2.3 Android Application Vulnerabilities

11

Andro Android id perm permis issi sion onss and and sens sensit itiive APIs APIs as malw malwar aree feat featur ures es.. To this this end, end, it perfor performed med broad broad static static analys analysis is to extra extract ct featur featuree sets sets from from both both manife manifest st files files and bytecode programs. It further embedded all feature sets into a joint vector space. As a result, the features contributing to malware detection can be analyzed geometrically and used to explain the detection results. Despite the effectiveness and computati computational onal efficienc efficiency y, these these machine machine learning learning based approaches approaches extract extract features from solely external symptoms and do not seek an accurate and complete interpretation of app behaviors.

2.3

Android Android Applicat Application ion Vulnerabilit ulnerabilities ies

Although the permission-based sandboxing mechanism enforced in Android can effectively confine each app’s behaviors by only allowing the ones granted with corres correspond ponding ing permis permissio sions, ns, a vulner vulnerabl ablee app with with certai certain n critic critical al permis permissio sions ns can perform perform securi securityty-sen sensit sitiv ivee behav behavior iorss on behalf behalf of a malici malicious ous app. app. It is so called confused deputy attack. This kind of security vulnerabilities can present in numerous forms, such as privilege escalation [8 [ 8], capability leaks [16 [ 16]], permission re-delegation [14 [14], ], content leaks and pollution [53 [ 53], ], component hijacking [ hijacking [25 25], ], etc. Prior work primarily focused on automatic discovery of these vulnerabilities. Once a vulnerability is discovered, it can be reported to the developer and a patch is expected. Some patches can be as simple as placing a permission validation at the entry point of an exposed interface (to defeat privilege escalation [8 [8] and permission re-delegation [14 [ 14]] attacks), or withholding the public access to the internal data repositories (to defend against content leaks and pollution [ 53 53]), ]), the fixes to the other problems may not be so straightforward.

2.3.1

Component Hijacking Hi jacking Vulnerabilities

In particular, component hijacking may fall into the latter category. When receiving a mani manipu pula late ted d inpu inputt from from a mali malici cious ous Andr Androi oid d app, app, an app app with with a comp compon onen entt hijack hijacking ing vulner vulnerabi abilit lity y may exfilt exfiltrat ratee sensit sensitiv ivee inform informati ation on or tamper tamper with with the sensitive data in a critical data repository on behalf of the malicious app. In other words, words, a dangerous dangerous information information flow may happen in either either an outbound outbound or inbound inbound direction depending on certain external conditions and/or the internal program state. A prior effort has been made to perform static analysis to discover potential compon component ent hijac hijackin king g vulner vulnerabi abilit lities ies [ 25 25]]. Stat Static ic anal analys ysis is is know known n to be cons conser erv vati ative in nature and may raise false positives. For instance, static analysis may find a viable execution path for information flow, which may never be reached in actual program execution; static analysis may find that interesting information is stored in some elements in a database, and thus has to conservatively treat the entire database

12

2

Background

as sensitive. As a result, upon receiving a discovered discovered vulnerability, vulnerability, the developer developer has to manually confirm if the reported vulnerability is real. However, it is nontrivial for average developers to properly fix the vulnerability and release a patch.

2.3.2

Automatic Patch and Signature Generation

While an automated patching method is still lacking for vulnerable Android apps, a series of studies have been made to automatically generate patch for conventional client client-se -serv rver er progra programs. ms. AutoP AutoPaG aG [ 23 23]] analy analyze zess the the prog progra ram m sour source ce code code and and identifies the root cause for out-of-bound exploit, and thus creates a fine-grained sourc sourcee code code patc patch h to temp tempor orar aril ily y fix it with without out any any huma human n inte interv rven enti tion on.. IntP IntPat atch ch [ 50 50]] utilizes classic type theory and dataflow analysis framework to identify potential integer-overflow-to-buffer-overflow vulnerabilities, and then instruments programs with runtime checks. Sidiroglou and Keromytis [ 39 39]] rely on source code transformations to quickly apply automatically created patches to vulnerable segments of the targeted applications, that are subject to zero-day worms. Newsome et al. [ 31 31]] propose an execution-based filter which filters out attacks on a specific vulnerability based on the vulnerable program’s execution trace. ShieldGen [7 [ 7] generates a data patch or a vulnerability signature for an unknown vulnerability, given a zero-day attac attack k instan instance. ce. Razmo Razmov v and Simon Simon [ 38 38]] automat automatee the filter filter generatio generation n process process based on a simple formal description of a broad class of assumptions about the inputs to an application.

2.3.3

Bytecode Rewriting

In principle, these aforementioned patching techniques can be leveraged to address the vulnerabilities in Android apps. Nevertheless, to fix an Android app, a specific bytecode rewriting technique is needed to insert patch code into the vulnerable programs. Previous studies have utilized this technique to address varieties of problems. The Privacy Blocker application [ 35 35]] performs static analysis of application binaries to identi identify fy and select selectiv ively ely replac replacee reques requests ts for sensit sensitiv ivee data data with with hard-c hard-code oded d shadow data. I-ARM-Droid [9 [9] rewrites Dalvik bytecode to interpose on all the API invocations and enforce the desired security policies. Aurasium [ 46 46]] repackages Android Android apps apps to sandbo sandbox x import important ant nativ nativee APIs APIs so as to monit monitor or securi security ty and privacy violations. Livshits and Jung [24 [ 24]] implement a graph-theoretic algorithm to place mediation prompts into bytecode program and thus protect resource access. However, due the simplicity of the target problems, prior work did not attempt to rewrite the bytecode program in an extensive fashion. In contrast, to address sophisticated vulnerabilities, such as component hijacking, a new machinery has to be developed, so that inserted patch code can effectively monitor and control sensitive information flow in apps.

2.4 Privacy Leakage in Android Apps

2.3.4

13

Instrumentation Code Optimization

The size of a rewritten program usually increases significantly. Thus, an optimization phase is needed. Several prior studies attempted to reduce code instrumentation overhead by performing various static analysis and optimizations. To find error patter patterns ns in Java Java source source code, code, Marti Martin n et al. optimi optimized zed dynamic dynamic instru instrumen menta tatio tion n by perform performing ing static static pointe pointerr alias alias analys analysis is [27 27]. ]. To detect detect numero numerous us softw software are attacks, Xu et al. inserted runtime checks to enforce various security policies in C source code, and remove redundant checks via compiler optimizations [ 45 45]. ]. As a comparison, due to the limited resources on mobile devices, there exists an even more strict strict restricti restriction on for app size. Therefore Therefore,, a novel method method is necessary necessary to address address this new challenge.

2.4

Priva Privacy cy Leakag Leakagee in in Andr Android oid Apps Apps

While powerful Android APIs facilitate versatile functionalities, they also arouse privacy concerns. Previous studies [12 [ 12,, 13 13,, 19 19,, 44 44,, 5 52 2, 54 54]] have exposed that both benign benign and malic maliciou iouss apps apps are steal stealthi thily ly leakin leaking g users’ users’ priva private te inform informati ation on to remote remote servers. Efforts have also been made to detect and analyze privacy leakage either statically or dynamically [12 [ 12,, 13 13,, 15 15,, 22 22,, 25 25,, 26 26,, 48 48]]. Nevertheless, a good solution to defeat privacy leakage at runtime is still lacking.

2.4.1

Privacy Leakage Detection

Egel Egelee et al. al. [10 10]] stud studie ied d the the pri privacy vacy thre threat atss in iOS iOS appl applic icat atio ions ns.. They They proproposed posed PiOS, PiOS, a static static analys analysis is tool tool to detect detect priva privacy cy leaks leaks in Mach-O Mach-O binari binaries. es. TaintDroid is a dynamic analysis tool for detecting and analyzing privacy leaks in Android applications [12 [ 12]]. It modifies Dalvik virtual machine and dynamically instruments Dalvik bytecode instructions to perform dynamic taint analysis. Enck et al. [13 13]] proposed a static analysis approach to study privacy leakage as well. They convert a Dalvik executable to Java source code and leverage a commercial Java source code analysis tool Fortify360 [20 20]] to detect surreptitious data flows. CHEX [25 25]] is designed to vet Android apps for component hijacking vulnerabilities and is essentially capable of detecting privacy leakage. It converted Dalvik bytecode bytecode to WALA [43 43]] SSA IR, and conduc conducted ted static static dataflow dataflow analys analysis is with with WALA framework. AndroidLeaks [15 [ 15]] is a static analysis framework, which also leverages WALA, and identifies potential leaks of personal information in Android applications on a large scale. Mann et al. [ 26 26]] analyzed the Android API for possible sources and sinks of private data and thus identified exemplary privacy policies. All the existing detection methods fundamental cause significant false alarms because

14

2

Background

they cannot differentiate legitimate use of sensitive data from authentic privacy leakage. Though effective in terms of privacy protection, these approaches did not attempt to preserve the system usability.

2.4.2

Privacy Leak Mitigation

Based on TaintDroid, Hornyack et al. [19 19]] proposed AppFence to further mitigate privacy privacy leaks. leaks. When TaintDroid aintDroid discove discovers rs the data dependenc dependency y between between source source and sink, AppFence enforces privacy policies, either at source or sink, to protect sensit sensitiv ivee inform informati ation. on. At source source,, it may provid providee the app with with fake fake inform informati ation on instead of the real one; at sink, it can block sending APIs. To take usability into consideration, the authors proposed multiple access control rules and conducted empirical studies to find the optimal policies in practice. The major limitation of AppFence is the lack of efficiency. AppFence requires modifications in the Dalvik virtual machine to track information flow and incurs considerable performance overhead overhead (14 % on average according to TaintDroid TaintDroid [12 [ 12]). ]). Beside Besides, s, the deploy deploymen mentt is also also challe challengi nging. ng. For one thing, thing, end users users have have to re-install the operating system on their mobile device to enable AppFence. For another, once the Android OS upgrades to a new version, AppFence needs to be re-engineered to work with the novel mechanisms.

2.4.3

Information Flow Control

Though AppFence is limited by its efficiency and deployment, it demonstrates that it is feasible to leverage Information-Flow Control (IFC) technique to address the privacy leakage problem in Android apps. In fact, IFC has been studied on different contexts. Chandra and Franz [5 [ 5] implement an information flow framework for Java virtual machine which combines static analysis to capture implicit flows. JFlow [ 30 30]] extends the Java language and adds statically-checked statically-checked information flow annotations. Jia et al. [21 [ 21]] proposes a component-level runtime enforcement system for Android apps. duPro [32 [32]] is an efficient user-space information flow control framework, which adopts software-based fault isolation to isolate protection domains within the same process. Zeng et al. [ 49 49]] introduces an IRM-implementation framework at a compiler intermediate-representation (IR) level.

2.5

Text Analytics Analytics for for Android Android Security Security

Recent Recently ly,, effor efforts ts have have been been made made to study study the securi security ty implic implicati ations ons of textu textual al descriptions for Android apps. WHYPER [33 [ 33]] used natural language processing technique to identify the descriptive sentences that are associated to permissions

References

15

requests. It implemented a semantic engine to connect textual elements to Android permissions. AutoCog [36 [ 36]] further applied machine learning technique to automaticall cally y corr correl elat atee the the desc descri ript ptiive scri script ptss to perm permis issi sions ons,, and and ther theref efor oree was was able able to asse assess ss description-to-permission fidelity of applications. These studies demonstrates the urgent need to bridge the gap between the textual description and security-related program semantics.

2.5.1

Automated Generation of Software Description

There exists a series of studies on software description generation for traditional Java Java progra programs. ms. Sridha Sridhara ra et al. [40 40]] automa automatic tical ally ly summa summariz rized ed method method syntax syntax and function logic using natural language. Later, they [ 41 41]] improved the method summaries by also describing the specific roles of method parameters and integrating parameter descriptions. They presented heuristics to generate comments and describe the specific roles of different method parameters. Further, they [42 42]] automatically identified high-level abstractions of actions in code and described them in natural language and attempted to automatically identify code fragments that implement high level abstractions of actions and express them as a natural language descri descripti ption. on. In the meanti meantime, me, Buse Buse [ 4] leve leverag raged ed symbol symbolic ic execu executio tion n and code code summarization technique to document program differences, and thus synthesize succinct human-readable documentation for arbitrary program differences. Moreno et al. [29 [ 29]] propose proposed d a summar summariza izatio tion n tool tool which which determ determine iness class class and method method stereotypes and uses them, in conjunction with heuristics, to select the information to be included in the class summaries. The goal of these studies is to improve the program comprehension for developers. As a result, they focus on documenting intra-procedural program logic and low-level code structures. On the contrary, they did not aim at depicting high-level program semantics and therefore cannot help end users to understand the risk of Android Android apps.

References 1. Aafer Y, Du W, Yin H (2013) DroidAPIMiner: mining API-level features for robust malware detect detection ion in androi android. d. In: Procee Proceedin dings gs of the 9th intern internati ationa onall confer conferenc encee on securi security ty and priva privacy cy in communication networks (SecureComm) 2. Arp D, Spreit Spreitzen zenbar barth th M, Hübner Hübner M, Gascon Gascon H, Rieck Rieck K (2014) (2014) Drebin Drebin:: effici efficient ent and explainable detection of android malware in your pocket. In: Proceedings of the 21th annual network and distributed system security symposium (NDSS) 3. Arzt S, Rasthofer S, Fritz C, Bodden E, Bartel A, Klein J, Traon YL, Octeau D, McDaniel P (2014) (2014) FlowDroid FlowDroid:: precise precise context, context, flow, flow, field, object-sen object-sensitiv sitivee and lifecyclelifecycle-awa aware re taint analys analysis is for androi android d apps. apps. In: Procee Proceedin dings gs of the 35th 35th ACM SIGPLA SIGPLAN N confer conferenc encee on programming language design and implementation (PLDI) 4. Buse RP, Weimer WR (2010) Automatically documenting program changes. In: Proceedings of the IEEE/ACM international conference on automated software engineering (ASE)

16

2

Background

5. Chandra D, Franz M (2007) Fine-grained information flow analysis and enforcement in a java virtual machine. In: Proceedings of the 23rd annual computer security applications conference (ACSAC) 6. Chen KZ, Johnson N, D’Silva V, Dai S, MacNamara K, Magrino T, Wu EX, Rinard M, Song D (2013) Contextual policy enforcement in android applications with permission event graphs. In: Procee Proceedin dings gs of the 20th 20th annual annual netwo network rk and distri distribu buted ted system system securi security ty sympos symposium ium (NDSS) (NDSS) 7. Cui W, Peinado M, Wang HJ (2007) Shieldgen: automatic data patch generation for unknown vulnerabilities with informed probing. In: Proceedings of 2007 IEEE symposium on security and privacy 8. Davi Davi L, Dmitri Dmitrienk enko o A, Sadeg Sadeghi hi AR, AR, Winan Winandy dy M (2011) (2011) Pri Privil vileg egee escala escalatio tion n attack attackss on androi android. d. In: Proceedings of the 13th international conference on Information security. Berlin/Heidelberg Berlin/Heidelberg 9. Davis B, Sanders B, Khodaverdian A, Chen H (2012) I-ARM-Droid: a rewriting framework for in-app reference monitors for android applications. In: Proceedings of the mobile security technologies workshop 10. 10. Egel Egelee M, Krue Kruege gell C, Kird Kirdaa E, Vigna igna G (201 (2011) 1) PiOS: PiOS: dete detect ctin ing g priv privac acy y leak leakss in iOS iOS applications. In: Proceedings of NDSS 11. Enck W, Ongtang M, McDaniel P (2009) On lightweight mobile phone application certification. In: Proceedings of the 16th ACM conference on computer and communications security (CCS) 12. Enck W, Gilbert P, Chun BG, Cox LP, Jung J, McDaniel P, Sheth AN (2010) TaintDroid: an informatio information-flo n-flow w tracking tracking system system for realtime realtime privac privacy y monitoring monitoring on smartphon smartphones. es. In: Proceedings of the 9th USENIX symposium on operating systems design and implementation (OSDI) 13. Enck W, Octeau D, McDaniel P, Chaudhuri S (2011) A study of android application security. In: Proceedings of the 20th USENIX Security Symposium 14. Felt AP, Wang HJ, Moshchuk A, Hanna S, Chin E (2011) Permission re-delegation: attacks and defenses. In: Proceedings of the 20th USENIX security symposium 15. Gibler Gibler C, Crusse Crussell ll J, Ericks Erickson on J, Chen Chen H (2012) (2012) Androi AndroidLe dLeaks aks:: automa automatic ticall ally y detect detecting ing potential privacy leaks in android applications on a large scale. In: Proceedings of the 5th international conference on trust and trustworthy computing 16. Grace M, Zhou Y, Wang Z, Jiang X (2012) Systematic detection of capability leaks in stock android smartphones. In: Proceedings of the 19th network and distributed system security symposium 17. Grace M, Zhou Y, Zhang Q, Zou S, Jiang X (2012) RiskRanker: scalable and accurate zeroday android malware detection. In: Proceedings of the 10th international conference on mobile systems, applications and services (MobiSys) 18. 18. Hann Hannaa S, Huan Huang g L, Wu E, Li S, Chen Chen C, Song Song D (201 (2012) 2) Juxt Juxtap app: p: a scal scalab able le syst system em for for detec detectin ting g code reuse among android applications. In: Proceedings of the 9th international conference on detection of intrusions and malware, and vulnerability assessment (DIMVA) 19. Hornyack P, Han S, Jung J, Schechter S, Wetherall D (2011) These aren’t the droids you’re looking for: retrofitting android to protect data from imperious applications. In: Proceedings of CCS 20. HP Fortify Fortify Source Source Code Analyzer Analyzer (2016) (2016) http://www8.hp.com/us/en/software-solutions/static http://www8.hp.com/us/en/software-solutions/staticcode-analysis-sast/ 21. Jia L, Aljuraidan J, Fragkaki E, Bauer L, Stroucken M, Fukushima K, Kiyomoto S, Miyake Y (2013) Run-time enforcement of information-flow properties on android (extended abstract). In: Computer Security–ESORICS 2013: 18th European symposium on research in computer security 22. Kim J, Yoon Y, Yi K, Shin J (2012) Scandal: static analyzer for detecting privacy leaks in android applications. In: Mobile security technologies (MoST) 23. Lin Z, Jiang X, Xu D, Mao B, Xie L (2007) AutoPAG: towards automated software patch generation with source code root cause identification and repair. In: Proceedings of the 2nd ACM symposium on information, computer and communications security

References

17

24. Livsh Livshits its B, Jung Jung J (2013) (2013) Automa Automatic tic mediat mediation ion of priva privacy cy-se -sensi nsiti tive ve resour resource ce access access in smartphone applications. In: Proceedings of the 22th USENIX security symposium 25. Lu L, Li Z, Wu Z, Lee W, Jiang G (2012) CHEX: statically vetting android apps for component hijacking hijacking vulnerabilitie vulnerabilities. s. In: Proceedin Proceedings gs of the 2012 ACM conferenc conferencee on computer computer and communications security (CCS) 26. Mann C, Starostin A (2012) A framework for static detection of privacy leaks in android applications. In: Proceedings of the 27th annual ACM symposium on applied computing 27. 27. Mart Martin in M, Livs Livshi hits ts B, Lam Lam MS (200 (2005) 5) Findi Finding ng appl applic icat atio ion n erro errors rs and and secu securi rity ty flaws flaws usin using g PQL: PQL: a program query language. In: Proceedings of the 20th annual ACM SIGPLAN conference on object-oriented programming, systems, languages, and applications 28. McAfee McAfee Labs Labs Threat Threatss report report Fourt Fourth h Quarte Quarterr (2013) (2013) http://www http://www.mcafee.com/us/reso .mcafee.com/us/resources/ urces/ reports/rp-quarterly-threat-q4-2013.pdf 29. Moreno L, Aponte J, Sridhara G, Marcus A, Pollock L, Vijay-Shanker K (2013) Automatic generatio generation n of natural natural language language summaries summaries for java classes. In: Proceeding Proceedingss of the 2013 IEEE 21th international conference on program comprehension (ICPC) 30. Myers AC (1999) JFlow: practical mostly-static information flow control. In: Proceedings of the 26th ACM symposium on principles of programming languages (POPL) 31. Newsome J (2006) Vulnerability-specific execution filtering for exploit prevention on commodity software. In: Proceedings of the 13th symposium on network and distributed system security (NDSS) 32. Niu B, Tan G (2013) Efficient user-space information flow control. In: Proceedings of the 8th ACM symposium on information, computer and communications security 33. Pandita R, Xiao X, Yang W, Enck W, Xie T (2013) WHYPER: towards automating risk assess assessmen mentt of mobile mobile applic applicati ations ons.. In: Procee Proceedin dings gs of the 22nd 22nd USENIX USENIX confer conferenc encee on security 34. Peng H, Gates C, Sarma B, Li N, Qi Y, Potharaju R, Nita-Rotaru C, Molloy I (2012) Using probabilistic generative models for ranking risks of android apps. In: Proceedings of the 2012 ACM conference on computer and communications security (CCS) 35. Privacy Blocker (2016) http://privacytools.xeudoxus.com/ 36. Qu Z, Rastogi V, Zhang X, Chen Y, Y, Zhu T, Chen Z (2014) Autocog: measuring the descriptionto-per to-permis missio sion n fidelit fidelity y in androi android d applic applicati ations ons.. In: Procee Proceedin dings gs of the 21st 21st confer conferenc encee on computer and communications security (CCS) 37. Rastogi V, Chen Y, Jiang X (2013) DroidChameleon: evaluating android anti-malware against transformation attacks. In: Proceedings of the 8th ACM symposium on information, computer and communications security (ASIACCS) 38. Razmo Razmov v V, Simon Simon D (2001) (2001) Practi Practical cal automa automated ted filter filter genera generatio tion n to expli explicit citly ly enforc enforcee implicit input assumptions. In: Proceedings of the 17th annual computer security applications conference 39. Sidiroglou S and Keromytis AD (2005) Countering network worms through automatic patch generation. IEEE Secur Priv 3:41–49 40. Sridhara G, Hill E, Muppaneni D, Pollock L, Vijay-Shanker K (2010) Towards automatically generating summary comments for java methods. In: Proceedings of the IEEE/ACM international conference on automated software engineering (ASE) 41. Sridhara G, Pollock L, Vijay-Shanker K (2011) Generating parameter comments and integrating with method method summaries. summaries. In: Proceeding Proceedingss of the 2011 IEEE 19th internatio international nal conference conference on program comprehension (ICPC) 42. Sridhara G, Pollock L, Vijay-Shanker K (2011) Automatically detecting and describing high level actions within methods. In: Proceedings of the 33rd international conference on software engineering (ICSE) 43. T.J. Watson Libraries for Analysis (2015) http://wala.sourcefor http://wala.sourceforge.net/wiki/index. ge.net/wiki/index.php/Main_ php/Main_ Page 44. Wu C, Zhou Y, Patel K, Liang Z, Jiang X (2014) AirBag: boosting smartphone resistance to malware infection. In: Proceedings of the 21th annual network and distributed system security symposium (NDSS)

18

2

Background

45. Xu W, Bhatkar S, Sekar R (2006) Taint-enhanced policy enforcement: a practical approach to defeat a wide range of attacks. In: Proceedings of the 15th conference on USENIX security symposium 46. Xu R, Sadi Sadi H, Anders Anderson on R (2012) (2012) Aurasi Aurasium: um: practi practical cal polic policy y enforc enforceme ement nt for androi android d applications. In: Proceedings of the 21th USENIX security symposium 47. Yan LK, Yin H (2012 (2012)) DroidS DroidScop cope: e: seamle seamlessl ssly y recons reconstru tructi cting ng OS and Dalvik Dalvik semant semantic ic views for dynamic android malware analysis. In: Proceedings of the 21st USENIX security symposium 48. Yang Z, Yang M, Zhang Y, Gu G, Ning P, Wang XS (2013) AppIntent: analyzing sensitive data transmission in android for privacy leakage detection. In: Proceedings of the 20th ACM conference on computer and communications security (CCS) 49. Zeng B, Tan G, Erlingsson U (2013) Strato: a retargetable framework for low-level inlinedreference monitors. In: Proceedings of the 22th USENIX security symposium 50. Zhang C, Wang T, Wei T, Chen Y, Zou W (2010) IntPatch: automatically fix integer-overflowto-bu to-buff ffer er-o -ove verflo rflow w vulner vulnerabi abilit lity y at compil compile-t e-time ime.. In: Procee Proceedin dings gs of the 15th 15th Europe European an conference on research in computer security 51. Zhang Y, Yang M, Xu B, Yang Z, Gu G, Ning P, Wang XS, Zang B (2013) Vetting undesirable behaviors in android apps with permission use analysis. In: Proceedings of the 20th ACM conference on computer and communications security (CCS) 52. Zhou Y, Jiang X (2012) (2012) Dissecting Dissecting android malware: malware: character characterizatio ization n and evolutio evolution. n. In: Proceedings of the 33rd IEEE symposium on security and privacy. Oakland 53. Zhou Y, Jiang X (2013) Detecting passive content leaks and pollution in android applications. In: Proceedings of the 20th network and distributed system security symposium 54. Zhou Y, Wang Z, Zhou W, Jiang X (2012) Hey, you, get off of my market: detecting malicious apps in official and alternative android markets. In: Proceedings of 19th annual network and distributed system security symposium (NDSS)

Chapter 3

Semantics-Aware Android Malware Classification

Abstract The drastic increase of Android malware has led to a strong interest in developing methods to automate the malware analysis process. Existing automated Android malware detection and classification methods fall into two general categories categories:: (1) signaturesignature-based based and (2) machine machine learning-b learning-based. ased. SignatureSignature-base based d approaches can be easily evaded by bytecode-level transformation attacks. Prior learning-based works extract features from application syntax, rather than program sema semant ntic ics, s, and and are are also also subj subjec ectt to evasi vasion on.. In this this pape paperr, we prop propos osee a nove novell semantic-based approach that classifies Android malware via dependency graphs. To battle transformation attacks, we extract a weighted contextual API dependency graph as program semantics to construct feature sets. To fight against malware variants and zero-day malware, we introduce graph similarity metrics to uncover homogeneous application behaviors while tolerating minor implementation differences. We implement a prototype system, DroidSIFT , in 23 thousand lines of Java code. We evaluate our system using 2200 malware samples and 13,500 benign samples. samples. Experiments Experiments show that our signature signature detection detection can correctly correctly label 93 % of malware instances; our anomaly detector is capable of detecting zero-day malware with a low false false negative negative rate (2 %) and an acceptable acceptable false positive positive rate (5.15 %) for a vetting purpose.

3.1 3.1

Intr Introd oduc ucti tion on

The number of new Android malware instances has grown exponentially in recent years. McAfee reports [1 [ 1] that 2.47 million new mobile malware samples were coll collec ecte ted d in 2013 2013,, whic which h repr repres esen ents ts a 197 197 % incr increa ease se over over 2012. 2012. Grea Greate terr and and greater amounts of manual effort are required to analyze the increasing number of new malware instances. This has led to a strong interest in developing methods to automate the malware analysis process. Existing automated Android malware detection and classification methods fall into two general categories: (1) signature-based and (2) machine learning-based. Signature-based approaches [2, 3] look for specific patterns in the bytecode and API calls, but they are easily evaded by bytecode-level transformation attacks [4]. Machine learning-based approaches [5–7] extra extract ct feature featuress from from an applica applicatio tion’ n’ss behavior (such as permission requests and critical API calls) and apply standard © The Author(s) 2016 M. Zhang, H. Yin, Android Application Security , SpringerBriefs in Computer Science, DOI 10.1007/978-3-319-47812-8_3 10.1007/978-3-319-47812-8_3

19

20

3


machine learning algorithms to perform binary classification. Because the extracted features are associated with application syntax, rather than program semantics, these detectors are also susceptible to evasion. To directly address malware that evades automated detection, prior works distill program semantics into graph representations, such as control-flow graphs [ 8], data dependenc dependency y graphs [9, 10 10]] and permission event graphs [ 11 11]. ]. These graphs are checked against manually-crafted specifications to detect malware. However, these detectors tend to seek an exact match for a given specification and therefore can potentially be evaded by polymorphic variants. Furthermore, the specifications used for detection are produced from known malware families and cannot be used to battle zero-day malware. In this this chapte chapterr, we propos proposee a novel novel semant semanticic-bas based ed approa approach ch that that class classifie ifiess Android Android malwa malware re via depend dependenc ency y graphs graphs.. To battle battle transf transform ormati ation on attac attacks ks [4], we extra extract ct a weight weighted ed conte contextu xtual al API depend dependenc ency y graph graph as progra program m semant semantics ics to constr construct uct featur featuree sets. sets. The subseq subsequen uentt classi classifica ficati tion on then then depend dependss on more more robust semantic-level behavior rather than program syntax. It is much harder for an adversary to use an elaborate bytecode-level transformation to evade such a traini training ng system system.. To fight fight agains againstt malwa malware re varia variants nts and zerozero-day day malw malware are,, we introduce graph similarity metrics to uncover homogeneous essential application behaviors while tolerating minor implementation differences. Consequently, new or polymorphic malware that has a unique implementation, but performs common malicious behaviors, cannot evade detection. To our knowledge, when compared to traditional semantics-aware approaches for deskto desktop p malwa malware re detect detection ion,, we are the first first to exami examine ne progra program m semant semantics ics within the context of Android malware classification. We also take a step further to defeat malware variants and zero-day malware by comparing the similarity of these programs to that of known malware at the behavioral level. We build a database of behavior graphs for a collection of Android apps. Each graph models the API usage scenario and program semantics of the app that it represents. Given a new app, a query is made for the app’s behavior graphs to search for the most similar counterpart in the database. The query result is a similarity score which sets the corresponding element in the feature vector of the app. Every element of this feature vector is associated with an individual graph in the database. We build build graph graph databa databases ses for two two sets sets of behav behavior iors: s: benign benign and malic maliciou ious. s. Feature vectors extracted from these two sets are then used to train two separate classifiers for anomaly detection and signature detection. The former is capable of discovering zero-day malware, and the latter is used to identify malware variants. We implement implement a prototype prototype system, system, DroidSIFT , in 23 thousand lines of Java code. Our dependency graph generation is built on top of Soot [ 12 12], ], while our graph similarity query leverages a graph matching toolkit [13 13]] to compute graph edit distance. We evaluate our system using 2200 malware samples and 13,500 benign samples. samples. Experiments Experiments show that our signature signature detection detection can correctly correctly label 93 % malware instances; our anomaly detector is capable of detecting zero-day malware with a low false false negative negative rate (2 %) and an acceptable acceptable false positive positive rate (5.15 %) for vetting purpose.

3.2 Overview

3.2 3.2

3.2.1

21

Over Overvi view ew

Problem Statement

An effec effecti tive ve vetti vetting ng process process for disco discove verin ring g malic maliciou iouss softw software are is essent essential ial for mainta maintaini ining ng a health healthy y ecosys ecosyste tem m in the Androi Android d app marke markets. ts. Unfort Unfortuna unate tely ly,, existing vetting processes are still fairly rudimentary. As an example, consider the Bouncer [ Bouncer [14 14]] vetting vetting system system that is used by the official official Google Google Play Play Android Android market. market. Though the technical details of Bouncer are not publicly available, experiments by Ober Oberhe heid idee and and Mill Miller er [15 15]] show show that that Bounce Bouncerr perfor performs ms dynamic dynamic analys analysis is to exam examine ine apps apps within within an emulat emulated ed Androi Android d envir environm onment ent for a limit limited ed period period of time time.. This This meth method od of anal analys ysis is can can be easi easily ly evade vaded d by apps apps that that perf perfor orm m emulator emulator detection, detection, contain hidden behaviors behaviors that are timer-ba timer-based, sed, or otherwise otherwise avoid triggering malicious behavior during the short time period when the app is being vetted. Signature detection techniques adopted by the current anti-malware products have also been shown to be trivially evaded by simple bytecode-level transformations [4 [4]. We propose a new technique, DroidSIFT , illustrated in Fig. 3.1 3.1,, that addresses these these shortc shortcomi omings ngs and can can be deplo deployed yed as a repla replacem cement ent for exist existing ing vetti vetting ng techniques currently in use by the Android app markets. This technique is based on static analysis, which is immune to emulation detection and is capable of analyzing the entirety of an app’s code. Further, to defeat bytecode-level transformations, our static analysis is semantics-aware and extracts program behaviors at the semantic level. Consequ Consequent ently ly,, we are able able to conduc conductt two two kinds kinds of classi classifica ficatio tions: ns: anomal anomaly y detection and signature detection. Upon receiving a new app submission, our vetting process will conduct anomaly detection to determine whether it contains behaviors that that sign signifi ifica cant ntly ly devi deviat atee from from the the beni benign gn apps apps with within in our data databa base se.. If such such a deviation is discovered, a potential malware instance is identified. Then, we conduct furth further er sign signat atur uree dete detect ctio ion n on it to dete determ rmin inee if this this app app fall fallss into into any any malw malwar aree fami family ly within our signature database. If so, the app is flagged as malicious and bounced back to the developer immediately. If the app passes this hurdle, it is still possible that a new malware species has been found. We bounce the app back to the developer with a detailed report when when suspic suspiciou iouss behav behavior iorss that that devia deviate te from from benign benign behav behavior iorss are disco discove vered red,,

Submit Developer

Android App Market

Fig. 3.1 Deployment of DroidSIFT

Update Database & Classifiers

Vet

Offline Graph Database Construction & Online Detection Training Phase

Report

22

3


and we request justification for the deviation. The app is approved only after the developer developer makes a convincing convincing justification for the deviation. Otherwise, after further inve investi stigat gation ion,, we may confirm confirm it it to indeed indeed be a new new malw malware are specie species. s. By placi placing ng this this information into our malware database, we further improve our signature detection to detect this new malware species in the future. It is also also poss possib ible le to depl deploy oy our our tech techni nique que via via a more more ad-h ad-hoc oc sche scheme me.. For For example, our detection mechanism can be deployed as a public service that allows a cautious app user to examine an app prior to its installation. An enterprise that maintains its own private app repository could utilize such a security service. The enterprise service conducts vetting prior to adding an app to the internal app pool, thereby protecting employees from apps that contain malware behaviors.

3.2.2

Architecture Overview

Figure 3.2 Figure 3.2 depicts depicts the workflow of our graph-based Android malware classification. This takes the following steps: (1) Behavior malware re classi classifica ficatio tion n consid considers ers graph graph Behavior Graph Graph Generati Generation. on. Our malwa similarity as the feature vector. To this end, we first perform static program analys analysis is to transf transform orm Androi Android d byteco bytecode de program programss into into their their corres correspon pondin ding g graph representations. Our program analysis includes entry point discovery and call call grap graph h anal analys ysis is to bett better er unde unders rsta tand nd the the API API call callin ing g cont contex exts ts,, and and it leve levera rage gess both both forward forward and backwa backward rd dataflo dataflow w analys analysis is to explo explore re API depend dependenc encies ies and uncover any constant parameters. The result of this analysis is expressed via Weighted Contextual API Dependency Graphs that expose security-related behaviors of Android apps. (2) Scalable Graph Similarity Query. Having generated graphs for both benign and and mali malici ciou ouss apps apps,, we then then quer query y the the grap graph h data databa base se for for the the one one grap graph h most similar to a given graph. To address the scalability challenge, we utilize a bucket-ba bucket-based sed indexing indexing scheme scheme to improve improve search search efficie efficiency ncy.. Each bucket cont contai ains ns grap graphs hs bear bearin ing g APIs APIs from from the the same same Andr Androi oid d pack packag ages es,, and and it is inde indexe xed d

{

}

Android Apps

buckets [0,0,...,0,0,0,0,1]

{

}

[0,0,0,0.9,…,0.8] [1,0.6,0,0,…,0.7] [0,0.9,0.7,0,…,0]

{

}

. . .

[0,0,...,0,0,0,1,0] [0,0,...,0,0,0,1,1]

. . . [1,0,...,1,1,1,1,1]

[0.6,0.9,0,0,…,0] [0.8,0,0.8,0,…,1]

[1,1,...,1,1,1,1,1]

Behavior Graph Generation

Overview of DroidSIFT DroidSIFT Fig. 3.2 Overview

Scalable Graph Similarity Query

Graph-based Feature Vector Extraction

Anomaly & Signature Detection

3.3 Weighted Contextual API Dependency Graph

23

with a bitvector that indicates the presence of such packages. Given a graph query, query, we can quickly quickly seek to the correspondi corresponding ng bucket bucket index by matching matching the package’s vector to the bucket’s bitvector. Once a matching bucket is located, we further iterate this bucket to find the best-matching graph. Finding the bestmatching graph, instead of an exact match, is necessary to identify polymorphic malware. (3) Graph-based Graph-base d Feature Feature Vector Vector Extraction. Given Extraction. Given an app, we attempt to find the best match for each of its graphs from the database. This produces a similarity feature vector. Each element of the vector is associated with an existing graph in the database. This vector bears a non-zero similarity score in one element only if the corresponding graph is the best match to one of the graphs for the given app. (4) Anomaly and Signature Detection. We have implemented a signature classifier and an anomaly detector. We have produced feature vectors for malicious apps, and these vectors are used to train the classifier for signature detection. The anomaly detection discovers zero-day Android malware, and the signature detector uncovers the type (family) of the malware.

3.3

3.3.1

Weighted eighted Contextual Contextual API Dependenc Dependency y Graph

Key Behavioral Aspects

We consider the following aspects as essential when describing the semantic-level behaviors of an Android malware sample: (1) API Dependency. API Dependency. API calls (including reflective calls to the private framework functions) indicate how an app interacts with the Android framework. It is essential to capture what API calls an app makes and the dependencies among those calls. Prior works on semantic- and behavior-based malware detection and classification for desktop environments all make use of API dependency information [ information [9 9, 10 10]. ]. Android malware shares the same characteristics. (2) Context. An entry point of an API call is a program entry point that directly or indire indirectl ctly y trigge triggers rs the call. call. From From a useruser-aw aware arenes nesss point point of view view, there there are are two two kinds kinds of entr entry y poin points ts:: user user inte interf rfac aces es and and back backgr grou ound nd call callba back cks. s. Malware authors commonly exploit background callbacks to enable malicious functionalities without the user’s knowledge. From a security analyst’s perspective, it is a suspicious behavior when a typical user interactive API (e.g., 11]. ]. As a result, we AudioRecord.sta AudioRecord.startRecor rtRecording() ding() ) is called stealthily [ 11 must pay special attention to APIs activated from background callbacks. (3) Constant. Constant Constantss conve convey y semant semantic ic inform informati ation on by reve reveali aling ng the value valuess of crit critic ical al para parame mete ters rs and and unco uncove veri ring ng finefine-gr grai aine ned d API API sema semant ntic ics. s. For For instance, Runtime.exec() may may exec execut utee varie aried d shel shelll comm comman ands ds,, such such as ps or chmod , depending upon the input string constant. Constant analysis

24

3


also discloses the data dependencies of certain security-sensitive APIs whose benign-ness is dependent upon whether an input is constant. For example, a sendTextMessage() call taking a constant premium-rate phone number as a parameter is a more suspicious behavior than the call to the same API receiving the phone number from user input via getText() . Consequently, it is crucial to extract information about the usage of constants for security analysis. Once Once we look look at app behav behavior iorss using using these these three three perspe perspect ctiv ives, es, we perfor perform m similarity checking, rather than seeking an exact match, on the behavioral graphs. Since each individual API node plays a distinctive role in an app, it contributes differently to the graph similarity. similarity. With regards to malware detection, we emphasize security-sensitive APIs combined with critical contexts or constant parameters. We assi assign gn weig weight htss to diff differ eren entt API API node nodes, s, givi giving ng grea greate terr weig weight htss to the the node nodess cont contai aini ning ng crit critic ical al call calls, s, to impr improv ovee the the “qua “quali lity ty”” of beha behavi vior or grap graphs hs when when meas measur urin ing g similarity. Moreover, the weight generation is automated. Thus, similar graphs have higher similarity scores by design.

3.3.2

Formal Definition

To addres addresss all all of the aforem aforement ention ioned ed factor factors, s, we descri describe be app behav behavior iorss using using (WC-ADG DG). ). At a high high leve level, l, a Weighted eighted Context Contextual ual API Dependenc Dependencyy Graphs Graphs (WC-A WC-ADG WC-ADG consis consists ts of API operat operation ionss where where some some of the operat operation ionss have have data data dependencies. A formal definition is presented as follows. Definition 1. A Weighted Contextual API Dependency Graph is a directed graph G D . D . V ; E ; ˛ ; ˇ / over a set of API operations ˙ and a weight space W , where: • The set of verti vertices ces V corresponds to the contextual API operations in ˙ ; • The se set of edg edgees E  V  V corresponds to the data dependencies between operations; • The The labe labeli ling ng functi function on ˛ W V ! ˙ asso associ ciat ates es node nodess with with the the labe labels ls of corresponding contextual API operations, where each label is comprised of 3 elements: API prototype, entry point and constant parameter; • The labeli labeling ng func functio tion n ˇ W V ! W associates nodes with their corresponding weights, where 8w 2 W , w 2 R, and R represents the space of real numbers.

3.3.3

A Real Example

Zitmo is a class of banking trojan malware that steals a user’s SMS messages to discover banking information (e.g., mTANs). Figure 3.3 presents an example WCADG that depicts the malicious behavior of a Zitmo malware sample in a concise, yet complete, manner. This graph contains five API call nodes. Each node contains


25

, BroadcastReceiver.onReceive, BroadcastReceiv er.onReceive, Ø , BroadcastReceiver.onReceive, BroadcastReceiv er.onReceive, Ø , BroadcastReceiver.onReceive, BroadcastReceiv er.onReceive, Ø (java.util.List)>, BroadcastReceiver.onReceive, BroadcastReceive r.onReceive, Ø , BroadcastReceiver.onReceive, BroadcastReceiv er.onReceive, Ø , BroadcastReceiver.onReceive, BroadcastReceive r.onReceive, Setc Setc = {”http://softthrifty.com/security.jsp”} of zitmo Fig. 3.3 WC-ADG of

the call’ call’s prototype, prototype, a set of any constant constant parameter parameters, s, and the entry points of the call. Dashed arrows that connect a pair of nodes indicates that a data dependency exists between the two calls in those nodes. By comb combin inin ing g the the knowl knowled edge ge of API API prot protot otyp ypes es with with the the data data depe depend nden enccy info inform rmaation tion sho shown in the gra graph, ph, we kno know that hat the the app is forw forwaardin rding g an incoming SMS to the network. Once an SMS is received by the mobile phone, Zitm Zitmo o crea create tess an SMS SMS obje object ct from from the the raw raw Prot Protoc ocol ol Data Data Unit Unit by call callin ing g createFromPdu(byte[]) . It extra extract ctss the sender sender’’s phone phone number number and messa message ge content by calling getOriginating- Address() and getMessageBody() . Both strings are encoded into an UrlEncoded- FormEntity object and enclosed into HttpEntityEnclosingRequestBase by using the setEntity() call. Finally, this HTTP request is sent to the network via AbstractHttpClient.execute() . Zitmo variants may also exploit various other communication-related API calls for the the send sendin ing g purp purpos ose. e. Anot Anothe herr Zitm Zitmo o inst instan ance ce uses uses SmsManager.sendTextMessage() to deliver the stolen information as a text message to the attacker’s phone. Such variations motivate us to consider graph similarity metrics, rather than an exact matching of API call behavior, when determining whether a sample app is benign or malicious. The context provided by the entry points of these API calls informs us that the user is not aware of this SMS forwarding behavior. These consecutive API invo invocat cation ionss start start within within the entry entry point point metho method d onReceive() wi with a cal call to createFromcreateFrom- Pdu(byte[]) Pdu(byte[]) . onReceive() is a broadcast receiver registered by the app to recei receive ve incomi incoming ng SMS messag messages es in the backgr backgroun ound. d. Theref Therefore ore,, the createFromPdu(byte[]) and subsequent API calls are activated from a non-userinteractive entry point and are hidden from the user.

26

3


Constant analysis of the graph further indicates that the forwarding destination is suspicious. The parameter of execute() is neither the sender (i.e., the bank) nor any familiar parties from the contacts. It is a constant URL belonging to an unknown third-party.

3.3.4

Graph Generation

We have implemented a graph generation tool on top of Soot [ 12 12]] in 20 k line ines of code. This tool examines an Android app to conduct entry point discovery and perform perform context-se context-sensiti nsitive, ve, flow-sensi flow-sensiti tive, ve, and interproce interprocedural dural dataflow dataflow analyses. analyses. These analyses locate API call parameters and return values of interest, extract constant parameters, and determine the data dependencies among the API calls.

3.3.4.1 3.3.4.1

Entry Entry Point Point Discov Discovery ery

Entry point discovery is essential to revealing whether the user is aware that a certain API call has been made. However, this identification is not straightforward. Consider the callgraph seen in Fig. 3.4 3.4.. This graph describes a code snippet that registers a onClick() event handler for a button. From within the event handler, the the code code star starts ts a thre thread ad inst instan ance ce by call callin ing g Thread.start() , which which invok invokes es the run() method implement implementing ing Runnable.run() . The run() method method passes passes an android.os.Message object to the message queue of the hosting thread via Handler.sendMessage() . A Handler object created in the same thread is then bound to this message queue and its Handler.handleMessage() callback will process the message and later execute sendTextMessage() . The sole entry point to the graph is the user-interactive callback onClick() . However, prior work [16 [16]] on the identification of program entry points does not consider asynchronous calls and recognizes all three callbacks in the program as individual entry points. This confuses the determination of whether the user is aware

Fig. 3.4 Callgraph for asynchronously sending an SMS message. “e” and “a” stand for “event handler” and “action” respectively respectively

OnClickListener. onClick

Runnable.run

Handler. handleMessage

e1

e2

e3

a1

a2

a3

Runnable.start

Handler. SmsManager. sendMessage sendTextMessage


27

Algorithm 1 Entry 1 Entry Point Reduction for Asynchronous Callbacks M entry entry  {Possible entry point callback methods} CM async async  {Pairs of ( BaseClass; RunMethod ) for asynchronous calls in framework} to StartMethod for for asynchronous calls in framework} RS async async  {Map from RunMethod to for m entry 2 M entry entry do c  the class declaring m entry base  the base class of c .base; mentry / 2 CM async if . async then mstart  Lookup(mentry ) in RS async async for 8 call to mstart do r  “this” reference of call PointsToSet  PointsToAnalysis( r ) c 2 PointsToSet then if c M entry entry D M entry entry  fmentry g BuildDependencyStub(mstart , m entry ) end if end for end if end for output M entry entry as reduced entry point set

that an API call has been made in response to a user-interactive callback. To address this limitation, we propose Algorithm 1 1 to to remove any potential entry points that are actually part of an asynchronous call chain with only a single entry point. Algorithm 1 accepts three inputs and provides one output. The first input is M entry entry , which is a set of possible entry points. The second is CM async async , which is a set of ( BaseClass, RunMethod ) pairs. BaseClass represents a top-level asynchronous base class (e.g., Runnable ) in the Android framework and RunMethod is is the asynchronous call target (e.g., Runnable.run() ) declared in this class. The third input is RS async async , which maps RunMethod to StartMethod . RunMethod and StartMethod are are the the call callee ee and and call caller er in an asyn asynch chro ronou nouss call call (e.g (e.g., ., Runnable.run() and Runnable.start() ). The output is a reduced M entry entry set. We comp comput utee the the M entry inputt by appl applyi ying ng the the algo algori rith thm m prop propos osed ed by Lu et al. al. [16 16], ], entry inpu which discovers all reachable callback methods defined by the app that are intended to be called only by the Android framework. To further consider the logical order between Intent senders and receivers, we leverage Epicc [17 [ 17]] to resolve the intercomponent communications and then remove the Intent receivers from M entry entry . Thro Throug ugh h exami xamina nattion ion of the Andr Androi oid d fram frameework ork code ode, we gene genera ratte a list list of 3-tu 3-tupl ples es cons consis isti ting ng of BaseClass, RunMethod and StartMethod . For example, we capture the Android-specific calling convention of AsyncTask with AsyncTask.on AsyncTask.on- PreExecute() PreExecute() being triggered by AsyncTask.execute() . When a new new async asynchr hron onous ous call call is intr introd oduc uced ed into into the the fram frameework work code code,, this this list list is upda update ted d to include the change. Table 3.1 3.1 presents presents our current model for the calling convention of top-level base asynchronous classes in Android framework. Given these inputs, our algorithm iterates over M entry entry . For every method mentry in this set, it finds the class c that declares this method and the top-level base class

28

3


convention of asynchronous asynchronous calls Table 3.1 Calling convention Top-level class

Run method

Start method

Runnable AsyncTask AsyncTask AsyncTask AsyncTask AsyncTask AsyncTask Mes Message

run() onPreExecute() onPreExecute() doInBackground( doInBackground() ) onPostExecute() onPostExecute() hand andleMes Message( ge()

start() execute() execute() onPreExecute() onPreExecute() doInBackground( doInBackground() ) send endMessa ssage()

for dataflow of AsyncTask.execute Fig. 3.5 Stub code for AsyncTask.execute

base that c inherits from. Then, it searches the pair of base and mentry in the CM async async set. If a match is found, the method mentry is a “callee” by convention. The algorithm thus looks up mentry in the map SRasync to find the corresponding “caller” mstart . Each call to m start is further examined examined and a points-to analysis analysis is performed performed on the “this” reference making the call. If class c of method mentry belongs to the points-to set, we can ensure the calling relationship between the caller mstart and the callee mentry and remove the callee from the entry point set. To indicate the data dependency between these two methods, we introduce a stub which connects the parameters of the asynchronous call to the corresponding parameters of its callee. Figure 3.5 3.5 depicts depicts the example stub code for AsyncTask , where the parameter of execute() is first passed to doInBackground() through the stub executeStub() , and then the return from this asynchronous execution is further transferred to onPostExecute() via onPostExecuteStub() . Once Once the the algo algori rith thm m has has redu reduce ced d the the numb number er of entr entry y poin pointt meth method odss in explor oree all all code code reac reacha habl blee from from thos thosee entr entry y point points, s, incl includ udin ing g both both M entry entry , we expl synchronous and asynchronous calls. We further determine the user interactivity of an entr entry y poin pointt by exami xamini ning ng its its toptop-le leve vell base base clas class. s. If the the entr entry y poin pointt call callba back ck over overri ride dess a coun counte terp rpar artt decl declar ared ed in one one of the the thre threee toptop-le leve vell UIUIrela relate ted d inte interf rfac aces es (i.e (i.e., ., android.graphics.drawable.Drawable.Callback , android.view.ac android.view.accessibicessibi- lity.Accessibi lity.AccessibilityEven lityEventSource tSource , and android.view.KeyEvent.Callback ), we then consider the derived entry point method as a user interface.


3.3.4. 3.3.4.2 2

29

Const Constant ant Analys Analysis is

We conduct constant analysis for any critical parameters of security sensitive API calls. These calls may expose security-related behaviors depending upon the values of their constant parameters. For example, Runtime.exec() can directly execute shell commands, and file or database operations can interact with distinctive targets by providing the proper URIs as input parameters. To understand these semantic-level differences, we perform backward dataflow analys analysis is on select selected ed parame paramete ters rs and colle collect ct all possib possible le consta constant nt value valuess on the backward trace. We generate a constant set for each critical API argument and mark the parameter as “Constant” in the corresponding node on the WC-ADG. While a more complete string constant analysis is also possible, the computation of regular expressions is fairly expensive for static analysis. The substring set currently generated effectively reflects the semantics of a critical API call and is sufficient for further feature extraction.

3.3.4.3 3.3.4.3

API Dependenc Dependency y Construct Construction ion

We perform global dataflow analysis to discover data dependencies between API nodes and build the edges on WC-ADG. However, it is very expensive to analyze every single API call made by an app. To address computational efficiency and our interests on security analysis, we choose to analyze only the security-related API calls. Permissions are strong indicators of security sensitivity sensitivity in Android systems, so we leverage the API-permission mapping from PScout [ 18 18]] to focus on permissionrelated API calls. Our static dataflow analysis is similar to the “split”-based approach used by CHEX [16 [16]]. Each program split includes all code reachable from a single entry point. Dataflow analysis is performed on each split, and then cross-split dataflows are examined. The difference between our analysis and that of CHEX lies in the fact that we compute larger splits due to the consideration of asynchronous calling conventions. We mak make a spec specia iall consi onside dera rati tion on for for refl reflecti ectiv ve call alls wit within hin our our ana analysis ysis.. In Andro ndroiid prog progra rams ms,, refle reflect ctiion is rea realiz lized by callin lling g the the meth method od java.lang.refle java.lang.reflect. ct. Method.invoke() Method.invoke() . The “this” reference of this API call is a Method object, which is usually obtained by invoking either getMethod() or getDeclaredMethod() from java.lang.Class . The class is often acquired in a reflective manner too, through Class.forName() . This API call resolves a string input and retrieves the associated Class object. We consider consider any reflecti reflective ve invoke() call call as a sink sink and and condu conduct ct back backwa ward rd dataflow analysis to find any prior data dependencies. If such an analysis reaches string constants, we are able to statically resolve the class and method information. Otherwise, the reflective call is not statically resolvable. However, statically unresolv solvab able le beha behavi vior or is stil stilll repr repres esen ente ted d with within in the the WC WC-A -ADG DG as node nodess whic which h cont contai ain n no constant parameters. Instead, this reflective call may have several preceding APIs, from a dataflow perspective, which are the sources of its metadata.

30

3.4

3


Andro Android id Malwar Malwaree Classifi Classificat cation ion

We generate WC-ADGs for both benign and malicious apps. Each unique graph is associated with a feature that we use to classify Android malware and benign applications.

3.4.1

Graph Matching Score

To quantify the similarity of two graphs, we first compute a graph edit distance. To our knowledge, all existing graph edit distance algorithms treat node and edge uniformly. However, in our case, our graph edit distance calculation must take into account the different weights of different API nodes. At present, we do not consider assigning different weights on edges because this would lead to prohibitively high complexity in graph matching. Moreover, to emphasize the differences between two nodes in different labels, we do not seek to relabel them. Instead, we delete the old node and insert the new one subsequently. This is because node “relabeling” cost, in our context, is not the string edit distance between the API labels of two nodes. It is the cost of deleting the old node plus that of adding the new node. Definition 2. The Weighted Graph Edit Distance (WGED) of two Weighted Contextual API Dependency Graphs G and G0 , with a uniform weight function ˇ , is the minimum cost to transform G to G 0 :

X

0

wged .G; G ; ˇ/ D ˇ/ D min .

ˇ.v I / C

0

v I 2fV V g

X

ˇ.v D / C j E I j C j E D j/;

(3.1)

0

v D 2fV V g

where V and V 0 are respectively the vertices of two graphs, v I and v D are individual vertices inserted to and deleted from G, while E I and E D are the edges added added to and removed from G . WGED presents the absolute difference between two graphs. This implies that wged (G; G0 ) is roughly proportional to the sum of graph sizes and therefore two larger graphs are likely to be more distant to one another. To eliminate this bias, we normalize the resulting distance and further define Weighted Graph Similarity based upon it. Weighted Graph Similarity of two two Weighte eighted d Contex Contextua tuall API Definition 3. The Weighted Dependency Graphs G and G 0 , with a weight function ˇ , is, 0

wgs.G; G ; ˇ/ D ˇ/ D 1 1 

wged .G; G0 ; ˇ/

wged .G; ;; ˇ/ C wged .;; G0 ; ˇ/

;

(3.2)

where ; is an empt empty y grap graph. h. wged .G; ;; ˇ/ C wged .;; G0 ; ˇ/ then then equate equatess the maximum possible edit cost to transform G to G 0 .

3.4 Android Malware Classification

3.4.2

31

Weight Assignment

Instead of manually specifying the weights on different APIs (in combination of their attributes), we wish to see a near-optimal weight assignment.

3.4.2.1 3.4.2.1

Selectio Selection n of Critical Critical API Labels Labels

Give Given n a larg largee numb number er of API API labe labels ls (uni (uniqu quee comb combin inat atio ions ns of API API name namess and and attributes), it is unrealistic to automatically assign weights for every one of them. Our goal is malware classification, so we concentrate on assigning weights to labels for the security-sensitive APIs and critical combinations of their attributes. To this end, we perform concept learning to discover critical API labels. Given a positive example set (PES) containing malware graphs and a negative example set (NES) containing benign graphs, we seek a critical API label (CA) based on two requirements: (1) frequency(CA,PES) > frequency(CA,NES) and (2) frequency(CA,NES) is less less than than the the medi median an freq freque uenc ncy y of all all crit critic ical al API API labe labels ls in NES. NES. The The first first requirement guarantees that a critical API label is more sensitive to a malware sample than a benign one, while the second requirement ensures the infrequent presence of such an API label in the benign set. Consequently, we have selected 108 critical API labels. Our goal becomes the assignment of appropriate weights to these 108 labels while assigning a default weight of 1 to all remaining API labels.

3.4.2.2 3.4.2.2

Weight eight Assignme Assignment nt

Intuitively, if two graphs come from the same malware family and share one or more critical API labels, we must maximize the similarity between the two. We call such a pair of graphs a “homogeneous pair”. Conversely, if one graph is malicious and the other is benign, even if they share one or more critical API labels, we must minimize the similarity between the two. We call such a pair of graphs a “heterogeneous pair”. Therefore, we cast the problem of weight assignment to be an optimization problem. is an optimization problem to maximize the Definition 4. The Weight Assignment is result of an objective function for a given set of graph pairs { < G; G0 >}: 0

max f .f< G ; G >g; ˇ/ D ˇ/ D

X

0

wgs.G; G ; ˇ/ 

is a homogeneous pair

X

wgs.G; G0 ; ˇ/

is a heterogeneous pair

(3.3)

s:t :

1  ˇ.v/  ˇ.v/     ; if v is a critical nodeI ˇ.v/ D ˇ.v/ D 1; 1; otherwise;

32

3


feedback loop to Fig. 3.6 A feedback solve the optimization problem

Homogeneous Heterogeneous Graph Pairs Graph Pairs

β

+

-

f({},β) Hill Climber

where ˇ is the weight function that requires optimization;  is the upper bound of a weight. Empirically, we set  to be 20. To achieve the optimization optimization of Eq. ( 3.3 3.3)), we use the Hill Climbing algorithm [19 [ 19]] to implement a feedback loop that gradually improves the quality of weight assignment. Figure 3.6 Figure 3.6 presents presents such a system, which takes two sets of graph pairs and an initial weight function ˇ as inputs. ˇ is a discrete function which is represented as a weight vector. vector. At each iteration, Hill Climbing adjusts a single element in the weight vector and determines whether the change improves the value of objective function f .f< G; G0 >g; ˇ/ . Any change that improves f .f< G; G0 >g; ˇ/ is accepted, and the process continues until no change can be found that further improves the value.

3.4.3

Implementation and Graph Database Query

To comput computee the weight weighted ed graph graph simila similarit rity y, we use a bipart bipartit itee graph graph matchi matching ng tool [13 [13]. ]. We cannot directly use this graph matching tool because it does not support assigning different weights on different nodes in a graph. To work around this limitation, we enhanced the bipartite algorithm to support weights on individual nodes. We then develop a graph database, where dependency graphs are stored into multiple buckets. Each bucket is labeled according to the presence of critical APIs. To ensure the scalability, we implement the bucket-based indexing with a hash map where the key is the API package bitvector and the value is a corresponding graph set. Empirically, we found this one-level indexing efficient enough for our problem. If the database grows much larger, we can transition to a hierarchical database structure, such as vantage point tree [ 20 20]], under each bucket.

3.4 Android Malware Classification

3.4.4

Malware Classification Cl assification

3.4.4.1 3.4.4.1

Anomaly Anomaly Detectio Detection n

33

We have implemented a detector to conduct anomaly detection. Given an app, the detector provides a binary result that indicates whether the app is abnormal or not. To achieve this goal, we build a graph database for benign apps. The detector then attempts to match the WC-ADGs of the given app against the ones in database. If a sufficiently similar one for any of the behavior graphs is not found, an anomaly is reported by the detector detector.. We have have set the similarity similarity threshold threshold to be 70 % per our empirical studies.

3.4.4.2 3.4.4.2

Signatur Signaturee Detectio Detection n

We next use a classifier to perform signature detection. Our signature detector is a multi-label classifier designed to identify the malware families of unknown malware instances. To enable classification, we first build a malware graph database. To this end, we conduct static analysis on the malware samples from the Android Malware Genome Project [ Project [21 21,, 22 22]] to extract WC-ADGs. In order to consider only the unique graphs, we remove any graphs that have a high level of similarity to existing ones. With experim experiment ental al study study,, we consid consider er a high high simil similari arity ty to be greate greaterr than than 80 %. Further, to guarantee the distinctiveness of malware behaviors, we compare these malware graphs against our benign graph set and remove the common ones. Next, given an app, we generate its feature vector for classification purpose. In such a vector, each element is associated with a graph in our database. And, in turn, all the existing graphs are projected to a feature vector. In other words, there exists a one-to-one correspondence between the elements in a feature vector and the existing graphs in the database. To construct the feature vector of the given app, we produce its WC-ADGs and then query the graph database for all the generated graphs. For each query, a best matching graph is found. The similarity score is then put into the feat featur uree vect vector or at the the posi positi tion on corr corres espon pondi ding ng to this this best best matc matchi hing ng graph graph.. Spec Specifi ifica call lly y, the the feat featur uree vect vector or of a kno known malw malwar aree samp sample le is atta attach ched ed with with its its fami family ly labe labell so that that the classifier can understand the discrepancy between different malware families. Figure 3.7 3.7 gives gives an example of feature vectors. In our malware graph database of 862 graphs, a feature vector of 862 elements is constructed for each app. The two behavior graphs of ADRD are most similar to graph G6 and G7, respectively, from the database. The corresponding elements of the feature vector are set to the similarity scores of those features. The rest of the elements remain set to zero. Once we have produced the feature vectors for the training samples, we can next use them to train a classifier. We select Naïve Bayes algorithm for the classification. In fact, we can choose different algorithms for the same purpose. However, since our graph-based features are fairly strong, even Naïve Bayes can produce satisfying

34

3

ADRD

G1

G2

G3

G4

G5

0

0

0

0

0

0

0

0.7

0

DroidDream 0.9 DroidKungFu


0

G6

G7

G8

0.8 0.9

0

0

0.8 0.7 0.7

0

0

0.6

0

0

0.6

... ... ... ...

G861 G86 1 G86 G862 2

0

0

0

0

0

0.9

of feature vectors Fig. 3.7 An example of

results. Naïve Bayes also has several advantages: it requires only a small amount of training data; parameter adjustment is straightforward and simple; and runtime performance is favorable.

3.5 3.5

3.5.1

Eval Evalua uati tion on

Dataset and Experiment Setup

We coll collec ecte ted d 2200 2200 mal malware ware samp sample less from from the the Andr Androi oid d Mal Malware ware Geno Genome me Project [21 21]] and McAfee, Inc, a leading antivirus company. To build a benign dataset, we received a number of benign samples from McAfee, and we downloaded a variety of popular apps having a high ranking from Google Play. Specifically, with without out loss loss of gene genera rali lity ty,, we foll follow owed ed the the prio priorr appr approa oach ch in majo majorr rese resear arch ch work [6 [6, 23 23]] and and coll collec ecte ted d the the top top 1000 1000 apps apps from from each each of the the 16 cate categor gorie ies. s. To further sanitize this benign dataset, we sent these apps to the VirusTotal service for inspection. The final benign dataset consisted of 13,500 samples. We performed the behavior graph generation, graph database creation, graph similarity query and feature vector extraction using this dataset. We conducted the experiment on a test machine machine equipped equipped with Intel(R) Intel(R) Xeon(R) E5-2650 E5-2650 CPU (20 M Cache, 2 GHz) and 128 GB of physical physical memory. memory. The operating operating system is Ubuntu Ubuntu 12.04.3 12.04.3 (64bit). (64bit).

3.5.2

Summary of Graph Generation

We summ summar ariz izee the the char charac acte teri rist stic icss of the the beha behavi vior or grap graphs hs gene genera rate ted d from from both both beni benign gn and malicious apps. Figures 3.8 and 3.9 illustrate the distribution of the number of grap graphs hs gene genera rate ted d from from beni benign gn and and mali malici cious ous apps apps.. On avera verage ge,, 7.8 7.8 grap graphs hs are computed from each benign app, while 9.8 graphs are generated from each malware instance. Most apps focus on limited functionalities and do not produce a large large number of behavior behavior graphs. graphs. In 92 % of benign samples samples and 98 % of malicious malicious ones, no more than 20 graphs are produced from an individua individuall app.

3.5 Evaluation

35

4000

3000 s p p A f o r 2000 e b m u N

1000

0

0

6

1 2

1 8

2 4

3 0

3 6

4 2

4 8

5 4

6 0

6 6

7 2

Number of Graphs

number of graphs in benign apps Fig. 3.8 Distribution of the number 600

450 s p p A f o r 300 e b m u N

150

0

1

4

7

1 0

1 3

1 6

1 9

2 2

2 5

2 8

3 1

3 4

3 7

Number of Graphs

number of graphs in malware Fig. 3.9 Distribution of the number

Figures 3.10 Figures 3.10 and and 3.11 3.11 present present the distribution of the number of nodes in benign and malicious behavior graphs. A benign graph, on average, has 15 nodes, while a malicious graph carries 16.4. Again, most of the activities are not intensive, so the majority of these graphs have a small number of nodes. Statistics show that 94 % of the benign graphs graphs and 91 % of the malicious malicious ones carry carry less than than 50 nodes. These facts serve as the basic requirements for the scalability of our approach, since the runtime performance of graph matching and query is largely dependent upon the number of nodes and graphs, respectively.

36

3


100000

75000

s h p a r G f o 50000 r e b m u N

25000

0 0

4 0

8 0

0 0 0 0 0 0 0 0 0 0 1 2 1 6 2 0 2 4 2 8 3 2 3 6 4 0 4 4 4 8

Number of Nodes

nodes in benign graphs Fig. 3.10 Distribution of the number of nodes 8000

m u N

ar G f o r e b

s h p

6000

4000

2000

0

0

2 0

4 0

6 0

8 0

0 1 0

0 1 2

0 1 4

0 1 6

0 1 8

0 2 0

0 2 2

Number of Nodes

nodes in malware graphs Fig. 3.11 Distribution of the number of nodes

3.5.3

Classification Results

3.5.3.1 3.5.3.1

Signatur Signaturee Detectio Detection n

We use a multi-label classification to identify the malware family of the unrecognized malicious samples. Therefore, we expect to only include those malware behavior graphs, that are well labeled with family information, into the database. To this end, we rely on the malware samples from the Android Malware Genome

3.5 Evaluation

37

Projec Projectt and use them them to constr construct uct the malwa malware re graph graph databa database. se. Conseq Consequen uently tly,, we built such a database of 862 unique behavior graphs, with each graph labeled with a specific malware family. We then selected 1050 malware samples from the Android Malware Genome Project and used them as a training set. Next, we would like to collect testing samples from the rest of our collection. However, a majority of malware samples from from McAf McAfee ee are are not not stro strong ngly ly labe labele led. d. Over Over 90 % of the the samp sample less are are coar coarse sely ly labeled as “Trojan” or “Downloader”, but in fact belong to a specific malware family family (e.g., (e.g., DroidDream DroidDream). ). Moreover Moreover,, even even VirusTotal irusTotal cannot cannot provide provide reliable reliable malware malware family family inform informati ation on for a give given n sample sample becaus becausee the anti antivirus virus product productss used used by VirusTotal seldom reach a consensus. This fact tells us two things: (1) It is a nontrivial task task to collect evident evident samples as as the ground truth in the context context of multi-label multi-label classification; (2) multi-label malware detection or classification is, in general, a challenging real-world problem. Despite the difficulty, we obtained 193 samples, each of which is detected as the same malware by major AVs. We then used those samples as testing data. The experiment result shows shows that our classifier can correctly label 93 % of these malware instances. Amon Among g the the succ succes essf sful ully ly labe labele led d malw malwar aree samp sample less are are two two type typess of Zitm Zitmo o variants. One uses HTTP for communication, and the other uses SMS. While the former one is present in our malware database, the latter one was not. Nevertheless, our signature detector is still able to capture this variant. This indicates that our similarity metrics effectively tolerate variations in behavior graphs. We further examined examined the 7 % of the samples samples that were mislabeled. mislabeled. It turns out that the mislabeled cases can be roughly put into two categories. First, DroidDream sample sampless are labele labeled d as DroidK DroidKungF ungFu. u. DroidD DroidDrea ream m and DroidK DroidKung ungFu Fu share share multip multiple le malici malicious ous behav behavior iorss such such as gather gathering ing priva privacy cy-re -relat lated ed inform informati ation on and hidden hidden network I/O. Consequently, there exists a significant overlap between their WCADGs ADGs.. Seco Second nd,, Zitm Zitmo, o, Zson Zsonee and and YZHC YZHC inst instan ance cess are are labe labele led d as one anot anothe herr. Thes Thesee three families are SMS Trojans. Though their behaviors are slightly different from each other, they all exploit sendTextMessage() to deliver the user’s information to an attac attacker ker-sp -speci ecified fied phone phone number number.. Despit Despitee the mislab mislabele eled d cases cases,, we still still manage manage to succes successfu sfull lly y label label 93 % of the malware malware samples samples with with a Naïve Naïve Bayes Bayes classifier. Applying a more advanced classification algorithm will further improve the accuracy.

3.5.3.2 3.5.3.2

Anomaly Anomaly Detectio Detection n

Since we wish to perform anomaly detection using our benign graph database, the cove coverag ragee of this database database is essent essential ial.. In theory theory, the more benign benign apps apps that that the database collects, the more benign behaviors it covers. However, in practice, it is extremely difficult to retrieve benign apps exhaustively. Luckily, different benign apps may share the same behaviors. Therefore, we can focus on unique behaviors (rather than unique apps). Moreover, with more and more apps being fed into the

38

3


12000

s h p a r 10000 G e 8000 u q i n U 6000 f o r 4000 e b m u 2000 N 0 3000 30 00 400 000 0 50 500 00 600 000 0 70 700 00 800 000 0 90 900 00 10000 10000 110 1000 00

Number of Benign Apps Convergence of unique unique graphs in benign apps Fig. 3.12 Convergence

benign database, the database size grows slower and slower. Figure 3.12 3.12 depicts depicts our discovery. When the number of apps increases from 3000 to 4000, there is a sharp increase (2087) of unique graphs. However, when the number of apps grows from 10,000 to 11,000, only 220 new, unique graphs are generated, and the curve begins to flatten. We built a database of 10,420 unique graphs from 11,400 benign apps. Then, we test tested ed 2200 2200 malw malwar aree samp sample less agai agains nstt the the beni benign gn clas classi sifie fierr. The The fals falsee nega negati tive ve rate rate was 2 %, which indicates that 42 malware instances were not detected. detected. However, However, we noted that most of the missed samples are exploits or Downloaders. In these cases, their bytecode programs do not bear significant API-level behaviors, and therefore generated WC-ADGs do not necessarily look abnormal when compared to benign ones. At this point, we have only considered the presence of constant parameters in an API call. We did not further differentiate API behaviors based upon constant values. Therefore, we cannot distinguish the behaviors of Runtime.exec() calls or network I/O APIs with varied string inputs. Nevertheless, if we create a custom filter for these string constants, we will then be able to identify these malware samples and the false negative rate will drop to 0. Next, we used the remaining 2100 benign apps as test samples to evaluate the false false positive positive rate of our anomaly anomaly detector. detector. The result result shows that 5.15 % of clean apps are mistakenly mistakenly recognized recognized as suspicious suspicious ones during anomaly detection. detection. This means, if our anomaly detector is applied to Google Play, among the approximately 1200 new apps per day [24 [ 24], ], around 60 apps will be mislabeled as containing anom anomal alie iess and and be boun bounce ced d back back to the the deve develo lope pers rs.. We beli belieeve that that this this is an acceptable ratio for vetting purpose. Moreover, since we do not reject the suspicious apps immediately, but rather ask the developers for justifications instead, we can

3.5 Evaluation

39

further eliminate these false positives during this interactive process. In addition, as we add more benign samples into the dataset, the false positive rate will further decrease. Further, we would like to evaluate our anomaly detector with malicious samples from new new malwa malware re famili families. es. To this this end, end, we retrie retrieve ve a new new piece piece of malwa malware re called Android.HeHe, which was first found and reported in January 2014 [25 25]. ]. Android.HeHe exercises a variety of malicious functionalities such as SMS interception, information stealing and command-and-control. This new malware family does not appear in the samples from the Android Malware Genome Project, which were collected collected from August 2010 to October October 2011, and therefore cannot be labeled labeled via signature detection. DroidSIFT generates 49 WC-ADGs for this sample and, once once thes thesee grap graphs hs are are pres presen ente ted d to the the anom anomal aly y dete detect ctor or,, a warni warning ng is rais raised ed indicating the abnormal behaviors expressed by these graphs.

3.5.3.3 3.5.3.3

Detectio Detection n of of Trans Transfo format rmation ion Attacks Attacks

We collected 23 DroidDream samples, which are all intentionally obfuscated using a transformation technique [ technique [4 4], and 2 benign apps that are deliberately disguised as malware instances by applying the same technique. We ran these samples through our anomaly detection engine and then sent the detected abnormal ones through the signature detector. detector. The result shows that while 23 true malware instances are flagged as abnormal ones in anomaly detection, the 2 clean ones also correctly pass the detection without raising any warnings. We then compared our signature detection results with antivirus products. To obtain detection results of antivirus software, we sent sent these these sample sampless to VirusT irusTota otall and select selected ed 10 anti-v anti-viru iruss (AV) (AV) produc products ts (i.e., AegisLab, F-Prot, ESET-NOD32, DrWeb, AntiVir, CAT-QuickHeal, Sophos, F-Secure, Avast, and Ad-Aware) that bear the highest detection detection rates. Notice that we consider the detection to be successful only if the AV AV can correctly flag a piece of malware as DroidDream or its variant. In fact, to our observation, many AV AV products can provide partial detection results based upon the native exploit code included in the app package or common network I/O behaviors. As a result, they usually recognize these DroidDream samples as “exploits” or “Downloaders” while missing many other important malicious behaviors. Figure 3.13 3.13 presents presents the detection ratios of “DroidDream” across different detectors. While none of the antivirus products can achieve achieve a detectio detection n rate higher than 61 %, DroidSIFT can successfully successfully flag all the obfuscated samples as DroidDream instances. In addition, we also notice that AV2 produc produces es a relati relative vely ly high high detec detectio tion n ratio ratio (52.17 (52.17 %), but but it also also mistak mistakenl enly y flags those two clean samples as malicious apps. Since the disguising technique simply renames the benign app package to the one commonly used by DroidDream (and thus confuses this AV detector), such false positives again explain that external symptoms are not robust and reliable features for malware detection.

40

3

ratio for Fig. 3.13 Detection ratio obfuscated malware


Tru rue e Po Posi siti tive ve

Fal alse se Po Posi siti tive ve

s n 25 o i t c 20 e t e 15 D f o 10 r e b 5 m u 0 N

Detector ID

3.5.4

Runtime Performance

The averag averagee detect detection ion runtime runtime of 3000 3000 apps apps is 175.8 175.8 s, while while the detectio detection n for a majo majori rity ty (86 %) of apps apps is comp comple lete ted d with within in 5 min. min. Furt Furthe herr, most most of the the apps apps (96 %) are processe processed d within within 10 min. min. The time cost of graph graph genera generatio tion n domina dominates tes the overall overall runtime, runtime, taking taking up at least 50 % of total runtime for 83.5 % of the apps. On the other hand, the signature signature and anomaly detectors detectors are usually usually (i.e., in 98 % of the cases) able to finish running in 3 and 1 min, respecti respectively vely..

3.5.5

Effectiveness Effectiveness of Weight Generation and Weighted Graph Matching

Finall Finally y, we eval evaluat uated ed the effec effecti tive venes nesss of the genera generated ted weight weightss and weight weighted ed graph matching. Our weight generation automatically assign weights to the crithomogeneouss graph pairs pairs and hetical API labels, based on a training set of homogeneou erogeneous graph pairs. Consequently, killProcess() , getMemoryInfo() and sendTextMessage() with a constant phone number, for example, are assigned with fairly high weights. Then, given a graph pair sharing the same critical API labels, other than the pairs used for training, we want to compare their weighted graph similarity with the similarity score calculated by the standard bipartite algorithm. To this end, we randomly picked 250 homogeneous pairs and 250 heterogeneous pairs. The results of these comparisons, presented in Figs. 3.14 and 3.15 and 3.15,, conform to our expectation. Figure 3.14 Figure 3.14 shows shows that for every homogeneous pair, the similarity score score genera generated ted by weight weighted ed graph graph matchi matching ng is almost almost alwa always ys higher higher than than the corresponding one computed using standard bipartite algorithm. In addition, the bipartite algorithm sometimes produces an extremely low similarity (i.e., near zero) between two malicious graphs of the same family, while weighted graph matching manages to improve the similarity score significantly for these cases.

3.5 Evaluation Fig. 3.14 Similarity between malicious graph pairs

41

Standar Sta ndard d Bipart Bipartite ite

Weight eighted ed Gra Graph ph Simila Similarity rity

1 e r o 0.8 c S 0.6 y t i r 0.4 a l i 0.2 m i S 0 0

50

100

150

200

250

Graph Pair ID Fig. 3.15 Similarity between benign and malicious graphs

Standard Stan dard Bipar Bipartite tite

Weighte eighted d Gra Graph ph Simil Similarity arity

1

e r o 0.8 c S 0.6 y t i r 0.4 a l i 0.2 m i S

0 0

50

100

150

200

250

Graph Pair ID

Similarly, Similarly, Fig. 3.15 reveal revealss that that betwee between n a hetero heterogen geneou eouss pair pair, the weight weighted ed similarity score is usually lower than the one from bipartite computation. Again, the bipartite algorithm occasionally considers a benign graph considerably similar to a malicious one, provided that they share the same API nodes. Such results can confuse a training system and the latter one thus fails to tell the differences between malicious and benign behaviors. On the other hand, weighted graph matching can effectively distinguish a malicious graph from a benign one, even if they both have the same critical API nodes. We further attempted to implement the standard bipartite algorithm and apply it to our detectors. We then compared the consequent detection results with those of the detectors with weighted graph matching enabled. The results show that weighted graph matching matching significant significantly ly outperforms outperforms the bipartite bipartite one. While the signature signature detector using the former one correctly labels 93 % of malware samples, the detector detector with with the latter latter one is able able to only only label label 73 % of them. On the other other hand, anomaly anomaly detection detection with the bipartite bipartite algorithm algorithm incurs a false negative negative rate of 10 %, which is 5 times greater than that introduced by the same detector using weighted matching. The result indicates that our algorithm is more sensitive to critical API-level semantics than the standard bipartite graph matching, and thus can produce more reasonable similarity scores for the feature extraction.

42

3


References 1. McAfee McAfee Labs Labs Threat Threatss report report Fourt Fourth h Quarte Quarterr (2013) (2013) http://www http://www.mcafee.com/us/reso .mcafee.com/us/resources/ urces/ reports/rp-quarterly-threat-q4-2013.pdf 2. Zhou Y, Wang Z, Zhou W, Jiang X (2012) Hey, you, get off of my market: detecting malicious apps in official and alternative android markets. In: Proceedings of 19th annual network and distributed system security symposium (NDSS) 3. Grace M, Zhou Y, Zhang Q, Zou S, Jiang X (2012) RiskRanker: scalable and accurate zeroday android malware detection. In: Proceedings of the 10th international conference on mobile systems, applications and services (MobiSys) 4. Rastogi V, Chen Y, Jiang X (2013) DroidChameleon: evaluating android anti-malware against transformation attacks. In: Proceedings of the 8th ACM symposium on information, computer and communications security (ASIACCS) 5. Peng H, Gates C, Sarma B, Li N, Qi Y, Potharaju R, Nita-Rotaru C, Molloy I (2012) Using probabilistic generative models for ranking risks of android apps. In: Proceedings of the 2012 ACM conference on computer and communications security (CCS) 6. Aafer Y, Du W, Yin H (2013) DroidAPIMiner: mining API-level features for robust malware detect detection ion in androi android. d. In: Procee Proceedin dings gs of the 9th intern internati ationa onall confer conferenc encee on securi security ty and priva privacy cy in communication networks (SecureComm) 7. Arp D, Spreit Spreitzen zenbar barth th M, Hübner Hübner M, Gascon Gascon H, Rieck Rieck K (2014) (2014) Drebin Drebin:: effici efficient ent and explainable detection of android malware in your pocket. In: Proceedings of the 21th annual network and distributed system security symposium (NDSS) 8. Christodorescu M, Jha S, Seshia SA, Song D, Bryant RE (2005) Semantics-aware malware detection. In: Proceedings of the 2005 IEEE symposium on security and privacy (Oakland) 9. Fredrikson M, Jha S, Christodorescu M, Sailer R, Yan X (2010) Synthesizing near-optimal malware specifications from suspicious behaviors. In: Proceedings of the 2010 IEEE symposium on security and privacy (Oakland) 10. Kolbitsch C, Comparetti PM, Kruegel C, Kirda E, Zhou X, Wang X (2009) Effective and efficient malware detection at the end host. In: Proceedings of the 18th conference on USENIX security symposium 11. Chen KZ, Johnson N, D’Silva V, Dai S, MacNamara K, Magrino T, Wu EX, Rinard M, Song D (2013) Contextual policy enforcement in android applications with permission event graphs. In: Procee Proceedin dings gs of the 20th 20th annual annual netwo network rk and distri distribu buted ted system system securi security ty sympos symposium ium (NDSS) (NDSS) 12. Soot: A Java Optimization Framework (2016) http://www http://www.sable.mcgill.ca/soot/ .sable.mcgill.ca/soot/ 13. Riesen K, Emmenegger S, Bunke H (2013) A novel software toolkit for graph edit distance computation. In: Proceedings of the 9th international workshop on graph based representations in pattern recognition 14. Lockhe Lockheime imerr H (2012) (2012) Androi Android d and securi security ty.. http://googlemobile.blogspot.com/2012/02/ android-and-security.html 15. Oberheide J, Miller C (2012) Dissecting the android bouncer. In: SummerCon 16. Lu L, Li Z, Wu Z, Lee W, Jiang G (2012) CHEX: statically vetting android apps for component hijacking hijacking vulnerabilitie vulnerabilities. s. In: Proceedin Proceedings gs of the 2012 ACM conferenc conferencee on computer computer and communications security (CCS) 17. Octeau D, McDaniel P, Jha S, Bartel A, Bodden E, Klein J, Traon YL (2013) Effective intercomponent communication mapping in android with epicc: an essential step towards holistic security analysis. In: Proceedings of the 22nd USENIX security symposium 18. Au KWY, Zhou Zhou YF, YF, Huang Huang Z, Lie D (2012) (2012) PScout PScout:: analyz analyzing ing the androi android d permis permissio sion n specification. In: Proceedings of the 2012 ACM conference on computer and communications security (CCS) 19. Russell SJ, Norvig P (2013) Artificial intelligence: a modern approach, 3rd edn. Prentice-Hall, Inc., Upper Saddle River 20. Hu X, Chiueh TC, Shin KG (2009) Large-scale malware indexing using function-call graphs. In: Proceedings of the 16th ACM conference on computer and communications security (CCS)

References

43

21. Android Malware Genome Project (2012) http://www http://www.malgenomeproject.o .malgenomeproject.org/ rg/ 22. Zhou Y, Jiang X (2012) (2012) Dissecting Dissecting android malware: malware: character characterizatio ization n and evolutio evolution. n. In: Proceedings of the 33rd IEEE symposium on security and privacy. Oakland 23. Enck W, Octeau D, McDaniel P, Chaudhuri S (2011) A study of android application security. In: Proceedings of the 20th USENIX security symposium 24. Number of Android Applications (2014) http://www http://www.appbrain.com/stats/n .appbrain.com/stats/number-ofumber-of-androidandroidapps 25. Dharmdasani H (2014) Android.HeHe: malware now disconnects phone calls. http://www. fireeye.com/blog/technical/2014/01/android-hehe-malware-now-disconnects-phone-calls. html

Chapter 4

Automatic Generation of Vulnerability-Specific Patches for Preventing Component Hijacking Attacks

Abstract Component hijacking is a class of vulnerabilities commonly appearing in Androi Android d applic applicat ation ions. s. When When these these vulner vulnerabi abilit lities ies are trigge triggered red by attack attackers ers,, the vulnerable apps can exfiltrate sensitive information and compromise the data integrity on Android devices, on behalf of the attackers. It is often unrealistic to purely purely rely rely on deve develop lopers ers to fix these these vulner vulnerabi abilit lities ies for two two reason reasons: s: (1) it is a time-c time-cons onsumi uming ng process process for the deve develop lopers ers to confirm confirm each each vulner vulnerabi abili lity ty and release a patch for it; and (2) the developers may not be experienced enough to properly fix the problem. In this paper, we propose a technique for automatic patch generation. Given a vulnerable Android app (without source code) and a discovered component hijacking vulnerability, we automatically generate a patch to disable this vulnerability. vulnerability. We have implemented a prototype called AppSealer and evaluated its efficacy on apps with component hijacking vulnerabilities. Our evaluation on 16 real-world vulnerable Android apps demonstrates that the generated patches can effectively track and mitigate component hijacking vulnerabilities. Moreover, after going through a series of optimizations, the patch code only represents a small portion (15.9 % on average) of the entire program. The runtime overhead overhead introduced by AppSealer AppSealer is also minimal, minimal, merely merely 2 % on average average..

4.1 4.1


With ith the the boom boom of Andr Androi oid d devi device ces, s, the the secu securi rity ty thre threat atss in Andr Androi oid d are are also also increa increasin sing. g. Althou Although gh the permis permissio sion-ba n-based sed sandbo sandboxin xing g mechan mechanism ism enforc enforced ed in Android can effectively confine each app’s behaviors by only allowing the ones grante granted d with with corres correspond ponding ing permis permissio sions, ns, a vulner vulnerabl ablee app with with certai certain n critic critical al perm permis issi sion onss can can perf perfor orm m secu securi rity ty-s -sen ensi siti tive ve beha behavi vior orss on beha behalf lf of a mali malici ciou ouss app. app. It is so called confused deputy attack. This kind of security vulnerabilities can present in numerous forms, such as privilege escalation [1 [1], capability leaks [2 [2], permission re-delegation [3 [3], content leaks and pollution [ pollution [4 4], component hijacking [ hijacking [5 5], etc. Prior work primarily focused on automatic discovery of these vulnerabilities. Once a vulnerability is discovered, it can be reported to the developer and a patch is expected. Some patches can be as simple as placing a permission validation at the entry point of an exposed interface (to defeat privilege escalation [1 [1] and permission


45

46

4

Autom utomat atic ic Gen Genera eratio tion of Vulne lnerab rabilit ility y-Sp -Specifi ecificc Patc atches hes for for Pre Preventin nting g. . .

re-delegation [3] attacks), or withholding the public access to the internal data repositories (to defend against content leaks and pollution [ 4]), the fixes to the other problems may not be so straightforward. In partic particula ularr, compone component nt hijack hijacking ing may fall fall into into the latte latterr categ category ory.. When When rece receiiving ving a mani manipu pula late ted d input input from from a mali malici cious ous Andr Androi oid d app, app, an app app with with a component hijacking vulnerability may exfiltrate sensitive information or tamper with the sensitive data in a critical data repository on behalf of the malicious app. In other words, a dangerous information flow may happen in either an outbound or inbound direction depending on certain external conditions and/or the internal program state. A prior effort has been made to perform static analysis to discover potential component hijacking vulnerabilities [5 [ 5]. Static analysis is known to be conservative in nature and may raise false positives. To name a few, static analysis may find a viable execution path for information flow, which may never be reached in actual program execution; static analysis may find that interesting information is stored in some elements in a database, and thus has to conservatively treat the entire database as sensitive. Upon receiving a discovered vulnerability, the developer has to manually confirm if the reported vulnerability is real. It may also be nontrivial for the (often inexperienced) developer to properly fix the vulnerability and release a patch for it. As a result, these discovered vulnerabilities may not be addressed for long time or not addressed at all, leaving a big time window for attackers to exploit these vulnerabilities. Out of the 16 apps with component hijacking vulnerabilities we tested, only 3 of them were fixed in 1 year. To close this window, we aim to automatically generate a patch that is specific to the discovered component hijacking vulnerability. In other words, we would like to automatically generate a vulnerability-specific vulnerability-specific patch on the original program to block the vulnerability as a whole, not just a set of malicious requests that exploit the vulnerability. vulnerability. Whil Whilee auto automa mati ticc patc patch h gene genera rati tion on is fair fairly ly new new in the the cont contex extt of Andr Androi oid d appl applic icat atio ions ns,, a grea greatt deal deal of rese resear arch ch has has been been done done for for trad tradit itio iona nall clie client nt and and server programs (with and without source code). Many efforts have been made to automatically generate signatures to block bad inputs, by performing symbolic execution execution and path exploration [ exploration [6 6–10 10]. ]. This approach is effective if the vulnerability is triggered purely by the external input and the library functions can be easily modele modeled d in symbol symbolic ic execu executi tion. on. Howe Howeve verr, for androi android d applic applicati ations ons,, due to the asynch asynchrono ronous us nature nature of the program program exec executi ution, on, a succes successfu sfull explo exploita itatio tion n may depend on not only the external input, but also the system events and API-call return values. Other efforts have been made to automatically generate code patches within within the vulner vulnerabl ablee progra programs ms to mitiga mitigate te certa certain in kinds kinds of vulner vulnerabi abilit lities ies.. To name name a few few, Auto AutoPa PaG G [ 11 11]] focuse focused d on buff buffer er overfl overflow ow and genera generall boundar boundary y errors; errors; IntPatch IntPatch [12 12]] addressed addressed integer integer-ov -overflo erflow-tow-to-buf bufferfer-ove overflo rflow w problem; problem; To defeat defeat zero-d zero-day ay worms worms,, Sidiro Sidiroglo glou u and Kerom Keromyti ytiss [ 13 13]] propos proposed ed an end-po end-point int first-reaction mechanism that tries to automatically patch vulnerable software by identifying and transforming the code surrounding the exploited software flaw; and VSEF [14 14]] monitors and instruments the part of program execution relevant to

4.2 Problem Statement and Approach Overview

47

specific vulnerability, and creates execution-based filters which filter out exploits based on vulnerable program’s execution trace, rather than solely upon input string. In princi principle ple,, our automa automatic tic patch patch genera generatio tion n techn techniqu iquee falls falls into into the second second category. However, our technique is very different from these existing techniques, because it requires a new machinery to address this new class of vulnerabilities. The key of our patch generation technique is to place minimally required required code into into the vulner vulnerabl ablee progra program m to accurately keep track of dangerous information origin originat ated ed from from the expos exposed ed interf interface acess and effec effecti tive vely ly block block the attac attack k at the security-sensitive security-sensitive APIs. To achieve this goal, we first perform static bytecode analysis to identify small program slices that lead to the discovered vulnerability. Then we but complete program devise several several shadowing mechanisms to insert new variables and instructions along the program slices, for the purpose of keeping track of dangerous information at runtime. In the end, we apply a series of optimizations to remove redundant instructions to minimize the footprint of the generated patch. Consequently, the automatically genera generated ted patch patch can can be guaran guarantee teed d to comple completel tely y disabl disablee the vulner vulnerabi abilit lity y with with minimal impact on runtime performance and usability. We implem implement ent a protot prototype ype,, AppSealer , in 16 thou thousa sand nd line liness of Jav Java code code,, base based d on the the Jav Java bytec bytecod odee opti optimi miza zati tion on fram frameework work Soot Soot [ 15 15]]. We leve leverag ragee Soot’s capability to perform static dataflow analysis and bytecode instrumentation. We evaluate our tool on 16 real-world Android apps with component hijacking vulner vulnerabi abili litie ties. s. Our exper experime iments nts show show that that the patche patched d progra programs ms run correc correctly tly,, while while the vulnerabilities are effectively mitigated.

4.2

4.2.1

Problem Problem Statement Statement and Approa Approach ch Overview Overview

Running Example

Figure 4.1 presents a synthetic running example in Java source code, which has a component hijacking vulnerability. More concretely, the example class is one of the the Acti Activi vity ty comp compon onen ents ts in an Andr Androi oid d appl applic icat atio ion. n. It exte extends nds an Andr Androi oid d Activity and overrides several callbacks including onCreate() , onStart() and onDestroy() . Upon receiving a triggering Intent, the Android framework creates the Activity and further invokes these callbacks. Once the Activity is created, onCreate() retrieves the piggybacked URL string from the triggering Intent. Next, it saves this URL to an instance field addr if the resulting string is not null, or uses the DEFAULT_ADDR otherwise. When the Activity starts, onStart() method acquires the latest location information by calling getLastKnownLocation() , and stores it to a static field location . Further, onDestroy() reads the location object from this static field, encodes the data into a string, encrypts the byte array of the string and sends it to the URL specified by addr through a raw socket.

48

4


1

public clas class s VulActivity extends Activity{ 2 private String String DEFA DEFAULT_ ULT_ADDR ADDR = "http://default.url" ; 3 private priv ate byte DEFAULT DEFAULT_KEY _KEY = 127; 4 5 6 7

private String String addr addr; ; private priv ate stati static c Location Location location; private priv ate byte key;

8 9

/* Entry Entry poi point nt of thi this s Act Activi ivity ty */ public publ ic void onCreate(Bundle savedInstan savedInstanceState){ ceState){ this.key = DEFA this.key DEFAULT_ ULT_KEY; KEY;

10 11 12 13

14 15 16 17

this.addr this .addr = getIntent() getIntent().getExtras() .getExtras().getString( .getString( "url" "url"); ); if( if (this this.ad .addr dr == null){ null){ this.addr = DEFA this.addr DEFAULT_ ULT_ADDR ADDR; ; } }

18 19

public void onStart(){ public VulActivity.location VulActivity.l ocation = getLocation( getLocation(); ); }

20 21 22 23

public void onDestroy(){ public String Str ing loc locati ation on = Double.toString(VulActivi Double.toStri ng(VulActivity.location. ty.location.getLongitude getLongitude()) ()) + "," + Dou Double ble. . toString(VulActivity.loc toString(Vul Activity.location.getLat ation.getLatitude()); itude()); byte[] byte byte[] bytes s = loca location tion.get .getByte Bytes(); s(); for( for (int i=0; i
24 25 26 27 28 29 30 31 32 33 34

35

public byte crypt(byte public crypt( byte plain){ return (byte byte)(p )(plai lain n ˆ key key); ); }

36 37 38 39 40 41 42

public Location getLocation getLocation(){ (){ Location Loca tion loca location tion = null; null; LocationManager LocationManag er locationMan locationManager ager = (LocationMa (LocationManager)getSys nager)getSystemService(C temService(Context ontext .LOCATION_SERVICE); location = locationMana locationManager.getLastK ger.getLastKnownLocation nownLocation(LocationMan (LocationManager. ager. GPS_PROVIDER); return location; }

43 44

public void post(String public post(String addr, byte[] byte[] byte bytes){ s){ URL UR L ur url l = new URL(addr); HttpURLConnection HttpURLConnec tion conn = (HttpURLConn (HttpURLConnection)url.o ection)url.openConnectio penConnection(); n(); ... OutputStr Outp utStream eam outp output ut = conn conn.get .getOutp OutputSt utStream ream(); (); output.write(bytes, output.write( bytes, 0, bytes.length bytes.length); ); ... }

45 46 47 48 49 50 51 52

}

code for the running example example Fig. 4.1 Java code

This This progra program m is subjec subjectt to compon component ent hijack hijacking ing attack attack,, becaus becausee a malic maliciou iouss app may send an Intent to this Activity with a URL specified by the atta attack cker er.. As a resu result lt,, this this vuln vulner erab able le app app will will send send the the loca locati tion on info inform rmat atio ion n to the the atta attack cker er’’s URL. URL. In othe otherr words ords,, this this vuln vulner erab abil ilit ity y allo allows ws the the mali mali-ciou ciouss app app to retr retrie ieve ve sens sensit itiive info inform rmat atio ion n with withou outt havi having ng to decl declar aree the the

4.2 Problem Statement and Approach Overview

49

relate related d permis permissio sions ns (i.e., (i.e., android.permission.ACCESS_FINE_LOCATION and android.permission.INTERNET ). Besides information leakage, a component hijacking attack may happen in the reve revers rsee dire direct ctio ion. n. That That is, is, a vuln vulner erab able le prog progra ram m may may allo allow w a mali malici ciou ouss app app to modi modify fy the content of certain sensitive data storage, such as the Contacts database, even though the malicious app is not granted with the related permissions.

4.2.2

Problem Statement

We anticipate our proposed technique to be deployed as a security service in the Android marketplace, as illustrated in Fig. 4.2 4.2.. Both the existing apps and the newly submitted apps must go through the vetting process by using static analysis tools like CHEX [5 [ 5]. If a component hijacking vulnerability is discovered in an app, its developer will be notified, and a patch will be automatically generated to disable the discovered vulnerability. So the vulnerable apps will never reach the end users. This approach wins time for the developer to come up with a more fundamental solution to the discovered security problem. Even if the developer does not have enough skills to fix the problem or is not willing to, the automatically generated patch can serve as a permanent solution for most cases (if not all). In addi additi tion, on, it is also also poss possib ible le to depl deploy oy our our tech techni niqu quee with with more more ad-h ad-hoc oc schemes. For instance, an enterprise can maintain its private app repository and security service too. The enterprise service conducts vetting and necessary patching before an app gets into the internal app pool, and thus employees are protected from vulnerable apps. Alternatively, establishing third-party public services and repositories is also viable and can benefit end users.

Report

Patched V’

Vulnerable App V

Original G

Developer A

Patching

In-Cloud App Inspection & Patching

Good App G

Submission Developer B

Install Android App Market

End User

Fig. 4.2 Deployment of AppSealer

50

4


Vulnerable App

Patched App

Resources

Resources

dex

Taint Slice Computation

Translation

IR

Patch Statement Placement

Slices

Patch Optimization

New IR

Code Generation

dex

Optimized IR

AppSealer Fig. 4.3 Architecture of AppSealer

4.2.3

Approach Overview

Figure 4.3 depicts the workflow of our automatic patch generation technique. It takes the following steps: (1) IR Translation. An Androi Android d app genera generally lly consis consists ts of a Dalvik Dalvik byteco bytecode de executable file, manifest files, native libraries, and other resources. Our patch generation is performed on the Dalvik bytecode program. So the other files remain the same, and will be repackaged into the new app in the last step. To facilitate the subsequent static analysis, code insertion, and code optimization, we first translate Dalvik bytecode into an Intermediate Representation (IR). In partic particula ularr, we first first conve convert rt the DEX into into Java Java bytec bytecode ode progra program m using using dex2jar [ dex2jar [16 16], ], and then using Soot [15 [ 15], ], translate translate the Java Java bytecode bytecode into Jimple IR. (2) Slice Computation. On context-sensitive Computation. On Jimple IR, we perform flow-sensitive context-sensitive inter-procedural dataflow analysis to discover component hijacking flows. We track the propagation of certain “tainted” sensitive information from sources like like intern internal al data data storag storagee and expos exposed ed interf interface aces, s, and detect detect if the tainte tainted d inform informati ation on propag propagate atess into into the danger dangerous ous data data sinks. sinks. By perfor performin ming g both both forwa forward rd dataflo dataflow w analys analysis is and backwa backward rd slici slicing, ng, we comput computee one or more more program slices that directly contribute to the dangerous information flow. To distinguish with other kinds of program slices, we call this slice taint slice, as it includes only the program statements that are involved in the taint propagation from a source to a sink. (3) Patch Statement Placement. With the guidance of the computed taint slices, we place shadow statements into the IR program. The inserted shadow statements serve as runtime checks to actually keep track of the taint propagation while the Android application is running. In addition, guarding statements are also placed at the sinks to block dangerous information flow right on site. (4) Patch Statement Optimization. We Optimization. We further optimize the inserted patch statement ments. s. This This is to remo remove ve the the redu redund ndan antt stat statem emen ents ts that that are are inse insert rted ed from from the the previ previou ouss step step.. As Soot Soot’’s buil builtt-in in opti optimi miza zati tions ons do not not appl apply y well well on these these patch patch statem statement ents, s, we devis devisee three three custom custom optimi optimizat zation ion techni technique quess to safely manipulate the statements and remove redundant ones. Thereafter, the optimized code is now amenable to the built-in optimizations. Consequently,

4.3 Taint Slice Computation

51

after going through both custom and built-in optimizations, the added patch statements can be reduced to the minimum, ensuring the best performance of the patched bytecode program. (5) Bytecode Generation. At Generation. At last, we convert the modified Jimple IR into a new package (.APK file). To do so, we translate the Jimple IR to Java package using Soot, and then re-target Java bytecode to Dalvik bytecode using Android SDK. In the end, we repackage the patched DEX file with old resources and create the new .APK file.

4.3

Taint aint Slice Slice Comput Computati ation on

Taking a Jimple IR program and the sources and sinks specified in the security policies as input, our application-wide dataflow analysis will output one or more taint slices for dangerous information flows. Our analysis takes the following steps: (1) we locate the information sources in the IR program, and starting from each source, we perform forward dataflow analysis to generate a taint propagation graph; (2) if a corresponding sink is found in this taint propagation graph, we perform backward dependency analysis from the sink node and generate a taint slice. We follo follow w the simil similar ar approa approach ch elabor elaborate ated d in Chap. Chap. 3 to conduc conductt dataflo dataflow w analyses, and particularly we take special consideration of programming features in Android apps, such as static and instance fields, Intent, threads, etc.

4.3.1

Running Example

Figure 4.4 illust illustrat rates es the taint taint slices slices for the runnin running g examp example, le, after after using using both both forward forward dataflo dataflow w analys analysis is and backwa backward rd depend dependenc ency y analys analysis. is. There There are two two slices in this graph, each one of which represents the data propagation of one source source.. The left left branch branch descri describes bes the taint taint propaga propagatio tion n of “gps” “gps” inform informat ation ion,, getLastKnownLocation() . The data which originates from the invocation of getLastKnownLocation() is then saved to a static field location , before a series of long-to-string and string-to-bytearray conversions in onDestroy() . Converted byte array is further passed to crypt() for byte-level encryption. The right branch begins with Intent receiving receiving in onCreate() . The piggybacked “url” data is thus extracted from the Intent and stored into an instance field addr. The two branches converge at post(String,byte[]) , when both the encrypted byte array and “addr” string are fed into the two parameters, respectively. “addr” string is used to construct an URL, then a connection, and further an OutputStream object, while the byte array serves as the payload. In the end, both sources flow into the sink OutputStream. write(byte[],int,int) , which sends the payload to the designated address.

52

4


: $r4 : $r4 = = virtualinvoke $r3.("gps");

: $r2 : $r2 = = virtualinvoke r0.();

: return $r4 return $r4;; : $r1 : $r1 = = virtualinvoke r0.();

Static Field

: $r3 : $r3 = = ; location>;

: $r3 = $r3 = virtualinvoke $r2 virtualinvoke $r2.(); : $r4 = : $r4 = virtualinvoke $r3 virtualinvoke $r3.("url");

Instance Field

: $d0 : $d0 = = virtualinvoke $r3 virtualinvoke $r3.(); .(); : $r4 : $r4 = = staticinvoke ($d0 ($d0); );

: $r11 : $r11 = = r0.; addr>;

: $r5 : $r5 = = virtualinvoke $r1.($r4 $r1.($r4); ); : $r6 : $r6 = = virtualinvoke $r5 virtualinvoke $r5.(","); .(","); : $r9 : $r9 = = virtualinvoke $r6 virtualinvoke $r6.($r8); .($r8); : $r10 : $r10 = = virtualinvoke $r9 virtualinvoke $r9.(); : r2 : r2 = = virtualinvoke $r10 virtualinvoke $r10.();

: virtualinvoke r0.( e[])>($r11 $r11,, r2 r2); ); : r1 : r1 := := @parameter0 @parameter0:: String; : r2 := : r2 := @parameter1 @parameter1:: byte[]; : specialinvoke $r3 specialinvoke $r3.(String)>(r1 (String)>( r1); );

r2[i0] : $b2 onDestroy()>: $b2 = = r2 [i0] : $b3 = virtualinvoke r0.($b2 r0.($b2))

: $r4 : $r4 = = virtualinvoke $r3 virtualinvoke $r3.();

: b0 : b0 := := @parameter0 @parameter0:: byte : $b2 : $b2 = = b0 b0 ^ ^ $b1 : return $b2 return $b2

: r5 : r5 = = (java.net.HttpURLConnection (java.net.HttpURLConnection)) $r4 $r4;; : r6 : r6 = = virtualinvoke r5 virtualinvoke r5.();

: $b3 : $b3 = = virtualinvoke r0.($b2 r0.($b2)) : r2 r2[i0] [i0] = $b3

: virtualinvoke r6 virtualinvoke r6.( .(r2 r2,, 0, $i0);

example Fig. 4.4 Taint slices for the running example

4.4

Patch Patch Statem Statement ent Placem Placement ent

Static dataflow analysis is usually conservative and may lead to false positives. Therefore, we insert instrumentation code in these taint propagation slices. The inserted code serves as runtime checks to actually keep track of the taint propagation whil whilee the the Andr Androi oid d appl applic icat atio ion n is runni running. ng. The The sink sinkss are are also also inst instru rume ment nted ed to examine the taint and confine the information leakage. We create shadows, for each data entity defined or used within the slices, to track its runtime taint status. The data entities outside the slices do not need to be shadowed, because they are irrelevant to the taint propagation. We create shadows for different types of data entities individually. individually. Static or instance fields are shadowed by adding boolean fields into the same class definition. A local variable is shadowed with with a boolea boolean n local local varia variable ble within within the same same method method scope, scope, so that that compil compiler er optimizations can be applied smoothly on related instrument code. Shadowing method parameters and return value requires special considerations. Firstly, the method prototype needs modification. Extra “shadow parameters” are adde added d to para parame mete terr list list to pass pass the the shad shadow owss of actu actual al para parame mete ters rs and and retu return rn value from caller to callee. Secondly, parameters are passed-by-value in Java, and therefore primitive-typed local shadow variables (boolean) need to be wrapped as objects before passing to a callee. Otherwise, the change of shadows in the callee cannot be reflected in the caller. To this end, we define a new class BoolWrapper . This class only contains one boolean instance field, which holds the taint status, and a default constructor, as shown below. Notice that the Java Boolean class cannot serve the same purpose because it is treated the same as primitive boolean type once passed as parameter.

4.5 Patch Optimization

53

public class BoolWrapper BoolWrapper extends extends java.lang.O java.lang.Object{ bject{ public public boolea boolean n b; public public void void (){ (){ BoolWrapper BoolWrapper r0; r0 := @this: @this: BoolWr BoolWrapp apper; er; specialinvok specialinvoke e r0.()>(); ()>(); return return; ; } }

With the created shadows, we instrument sources, data propagation code and sinks. At the source of information flow, we introduce taint by setting the corresponding shadow variable to “true”. For data propagation code, we instrument an individual instruction depending on its type. (1) If the instruction is an assignment statement or unary operation, we insert a definition statement on correlative shadow variables. (2) If it is a binary operation, a binary OR statement is inserted to operate on shadow variables. If one of the operators is a constant, we replace its shadow with a constant “false”. (3) Or, if it is a function call, we need to add code to bind shadows of actual parameters and return value to shadow parameters. (4) Further, if the instruction is a API call, we model and instrument the API with its taint propagation logic. We generally put APIs into the following categories and handle each category with a different model. “get” APIs have straightforward taint propagation logic, alwa always ys propag propagati ating ng taint taint from from parame paramete ters rs to their their return return value values. s. Theref Therefore ore,, we generate a default rule, which propagates taint from any of the input parameters to the return value. Similarly, simple “set” APIs are modeled as they propagate tain taintt from from one one para parame mete terr to anot anothe herr para parame mete terr or “thi “this” s” refe refere renc nce. e. APIs APIs like like Vector.add(Object) inserts new elements into an aggregate construct and thus can be modeled as a binary operation, such that the object is tainted if it is already tainted or the newly added element is tainted. APIs like android.content. that operat operatee on (key (key, value value)) ContentV ContentValue alues.p s.put(S ut(Strin tring g key, Byte value) value) that pairs can have more precise handling. In this case, an element is stored and accessed according to a “key”. To track taint more precisely, we keep a taint status for each key, so the taint for each (key, value) pair is updated individually. Then, at the sink, we insert code before the sink API to check the taint status of the sensitive parameter. If it turns out the critical parameter is tainted, the inserted code will query a separate policy service app for decision and a warning dialog is then displayed to the user. We also devise taint cleaning mechanism. That is, if a variable is redefined to be an untainted variable or a constant outside taint propagation slices, we thus insert a statement after that definition to set its shadow variable to 0 (false).

4.5 4.5

Patc Patch h Op Optim timiz izat atio ion n

We furt furthe herr opti optimi mize ze the the adde added d inst instru rume ment ntat atio ion n code code.. This This is to remo remove ve the the redu redunda ndant nt byteco bytecode de instru instruct ction ionss that that are insert inserted ed from from the previo previous us step. step. As Soot’ Soot’ss builtbuiltin optimizations do not apply well on this instrumentation code, we devise three

54

4


custom optimization techniques to safely manipulate the instrumentation code and remove redundant ones. Thereafter, the optimized code is now amenable to the built-in optimizations. Consequently, after going through both custom and builtin optimizations, the added instrumentation code can be reduced to a minimum, ensuring the best performance of the rewritten bytecode program. To be more specific, we have devised four steps of optimizations, described as follows. • O1: instrumen menta tatio tion, n, we add a shado shadow w parame parameter ter for every very single single actua actuall O1: In instru parameter and return value. However, some of them are redundant because they don’t contribute to taint propagations. Therefore, we can remove the inserted code which uses solely these unnecessary shadow parameters. • O2: O2: Next, we remove redundant shadow parameters from parameter list and adjust method prototype. Consequently, instrumentation code, that is used to initialize or update the taint status of these shadow parameters, can also be eliminated. • O3: O3: Further, if inserted taint tracking code is independent from the control-flow logic of a method, we can lift the tainting code from the method to its callers. Thus, the taint propagation logic is inlined. • O4: After custom custom optimi optimizat zation ions, s, instru instrume menta ntatio tion n code code is amenab amenable le to Soot’ Soot’ss O4: After built-in optimization, such as constant propagation, dead code elimination, etc.

4.5.1

Optimized Patch Patch for Running Example

Figure 4.5 presents the optimized patch for the running example. For the sake of readability, we present the patch code in Java as a “diff” to the original program, even though this patch is actually generated on the Jimple IR level. Statements with numeric line numbers are from the original program, whereas those with special line number “P “ P” are patch statements. The underlines mark either newly introduced code or modified parts from old statements. We can see that a boolean variable “addr_s0_t” is created to shadow the instance field “addr”, and another boolean variable “location_s1_t” is created to shadow the static field “location”. Then in onCreate() , the shadow variable “addr_s0_t” is set to 1 (tainted) when the Activity is created upon an external Intent. Otherwise it will be set to 0 (untainted). The shadow variable “location_s1_t” is set to 1 inside onStart() , after getLocation() is called. Note that this initialization is originally placed inside getLocation() after Ln.40, and a BoolWrapper is crea create ted d for the the retu return rn value alue of getLocation() . After After applyi applying ng the inlini inlining ng optimization(O3), this assignment is lifted into the caller function onStart() and the BoolWrapper variable is also removed. Due to the optimi optimizat zation ions, s, many many patch patch state statemen ments ts placed placed for tracki tracking ng tainte tainted d status have been removed. For example, the taint logic in crypt() has been lifted onDestroy() and further optimized there. The tainted values up to the body of onDestroy()

4.5 Patch Optimization

55

code for the patched running example example Fig. 4.5 Java code

should also be properly cleaned up. For instance, “addr_s0_t” is set to 0 after Ln.15, where “addr” is assigned a constant value, which means that if no “url” is provided in the Intent, the “addr” should not be tainted.

56

4


In the method method onDestroy() , when the information flows through the post() method, we wrap the local shadow variables for corresponding parameters and pass these BoolWrapper objects to new post() as additional parameters. In the end, we retrieve the taints in post() and check the taint statuses before the critical networking operation is conducted at Ln.49. Consequently, we stop this component hijacking attack right before the dangerous operation takes place.

4.6

Experi Experimen mental tal Evalu Evaluati ation on

To evaluate the efficacy, correctness and efficiency of AppSealer, we conducted experiments on real-world Android applications with component hijacking vulnerabilities and generated patches for them.

4.6.1

Experiment Setup

We collec collectt 16 vulner vulnerabl ablee Androi Android d apps, apps, which which expos exposee intern internal al capab capabili ilitie tiess to public interfaces and are subject to exploitation. Table 4.1 4.1 describes describes their exposed interfaces, leaked capabilities and possible security threats.

Overview of vulnerable apps apps Table 4.1 Overview ID Package-version

Exposed Leaked interface capa capabi bili lity ty

Thre Threat at desc descri ript ptio ion n

1

CN.MyPrivateMessages-52

Activity

Raw query

SQL injection

2

com.akbur.mathsworkout-92

Activity

Internet

Delegation attack

3

com.androidfu.torrents-26

Activity

Sel Selectio tion query SQL injection

4

com. com.ap apps pspo pot. t.sw swis issc scod odem emon onke keys ys.p .pai aint ntfx fx-4 -4 Acti Activi vity ty

Inte Intern rnet et

5

com.cnbc.client-1208

Activity

Selection query SQL injection

6

com.cnbc.client-1209

Activity


7

com.espn.score_center-141

Activity

Internet

Delegation attack

8

com.espn.score_center-142

Activity

Internet

Delegation attack

9

com. com.gm gmai ail. l.tr trav avel elde deve vel. l.an andr droi oid. d.vl vlc. c.ap appp-13 131 1 Serv Servic icee

Inte Intern rnet et

Dele Delega gati tion on atta attack ck

Dele Delega gati tion on atta attack ck

10 com.kmshack.BusanBus-30

Activity

Ra Raw query

SQL injection

11 com.utagoe.momentdiary-45

Service

Raw query

SQL injection

12 com.yoondesign.colorSti Sticker-8

Activity

Raw query

SQL injection

13 fr.pb.tvflash-9

Activity


14 gov.nasa-5

Activity


15 hu.tagsoft.ttorrent.lite-15

Service

Internet

Delegation attack

16 jp.h jp.hot otpe pepp pper er.a .and ndro roid id.b .bea eaut uty y.hai .hairr-12 -12

Acti Activi vity ty

Raw Raw quer query y

SQL SQL inje inject ctio ion n

4.6 Experimental Evaluation

57

Most of these vulnerable apps accidentally leave their internal Activities open and unguarded. Thus, any Intent whose target matches the vulnerable one can launch it. Others carelessly accept any Intent data from a public Service without input validations. Unauthorized external Intent can therefore penetrate the app, through these public interfaces, and exploit its internal capabilities. Such leaked capabilities, including SQLite database query and Internet access, are subject to various various security security threats. threats. For instance, instance, Intent data recei receive ved d at the expos exposed ed interface may cause SQL Injection; external Intent data sending to Internet may cause delegation attack. To detect and mitigate component hijacking vulnerabilities, AppSealer automatically performs analysis and rewriting, and generates patched apps. We conduct the experiment on our test machine, which is equipped with Intel(R) Core(TM) i7 CPU (8M Cache, 2.80 GHz) and 8 GB of physical physical memory memory. The operating operating system is Ubuntu 11.10 (64bit). To verify the effectiveness and evaluate runtime performance of our generated patc patche hes, s, we furt furthe herr run run them them on a real real devi device ce.. Expe Experi rime ment ntss are are carr carrie ied d out out on Goog Google le Nexus S, with Android OS version 4.0.4.

4.6.2

Summarized Results

We configure AppSealer to take incoming Intents from exposed interfaces as sources, and treat outgoing Internet traffic and internal database access as sinks. A taint slice is then a potential path from the Intent receiver to these privileged APIs. We compute the slice for each single vulnerable instance, and conduct a quantitative study on it. Figure 4.6 4.6 shows shows the proportional size of the slices, compared to the total size of the application. We can see that most of the taint slices represent a small portion of entire entire applicati applications, ons, with the average average proportion proportion being 11.7 %. However However,, we do observe that for a few applications, applications, the slices take up to 45 % of the total size. Some sample sampless (e.g., (e.g., com.km com.kmsha shack. ck.Bus BusanB anBus) us) are fairly fairly small. small. Althou Although gh the total total slice slice size size is only up to several thousands Jimple statements, the relative percentage becomes high. Apps like com.cnbc.client operate on incoming Intent data in an excessive way, and due to the conservative nature of static analysis, many static and instance fields are involved in the slices. We also also meas measur uree the the progr program am size size on dif differe ferent nt stag stages es of patc patch h gene genera rati tion on and optimizations. We observe that the increase of the program size is roughly proportional to the slice size, which is expected. After patch statement placement, the increased increased size is, on average, average, 41.6 % compared compared to the original original program. Figure 4.7 Figure 4.7 further further visualizes the impact of the four optimizations to demonstrate their their perfor performan mance. ce. The five five curve curvess on the figure figure repres represent ent the relati relative ve sizes sizes of the program, compared to the original app size, on different processing stages, respectively. The top curve is the program size when patch statement placement has been conducted, while the bottom one stands for the app size after all four patch

58

4


50% s e c i l S f o e z i S l a n o i t o p o r P

45% 40% 35% 30% 25% 20% 15% 10% 5% 0% 1

2

3

4

5

6

7

8

9

1 0 1 1 12 1 3 1 4 1 5 1 6

App ID

size of slices in percentage percentage Fig. 4.6 Relative size 240% 220%

e z i S 200% p p A 180% l a n o 160% i t o p 140% o r P

Instru O1 O2 O3

120%

O4

100% 1

2

3

4

5

6

7

8

9 10 10 11 12 13 14 1 5 16

App ID Five curves quantify app sizes at different stages Fig. 4.7 Relative app size in percentage. Five

optimizations. We can see that (1) for some of these apps, the increase of program size size due to patch stateme statement nt placeme placement nt can be big, big, and up to 130 %; and (2) these these optimizations are effective, because they significantly reduce the program sizes. In the end, the average average size of patch code drops to 15.9 %.

4.6.3

Detailed Analysis

Here we present detail analysis for these vulnerable apps to discuss the effectiveness and accuracy of our generated patches.


4.6.3.1 4.6.3.1

59

Apps Apps with Simple Simple Exploiti Exploiting ng Paths Paths

Some of the apps are vulnerable but exploitation paths are fairly simple. App 4, 7, 8, 9, 11, 12 fall into this category. Upon obtaining Intent data from an open Activity or Service, these apps directly use it either as URL to load a webpage in WebView, conduct a HTTP GET, or as a SQL string to query an internal database. Consequently, the exploitation path is guaranteed to happen every time a malicious Intent reaches the vulnerable interfaces. In this case, simply blocking the exposed interface might be as good as our patching approach. However, for other apps, a manipulated input may not always cause an actual exploitation.

4.6.3.2 4.6.3.2

Apps Apps with Pop-Up Pop-Up Dialogs Dialogs

Some apps ask user for consent before the capable sink API is called. App 2, 3, 5, 6, 10, 10, 13 shar sharee this this same same feat featur ure. e. If the the user user does does not not appr approv ovee furt furthe herr opera operati tions ons,, the the exploi xploita tati tion on will will not not occu occurr. In this this case case,, bloc blocki king ng at the the open open inte interf rfac acee causes unnecessary interventions. Our approach, on the other hand, disables the vulnerability and requires only necessary mediations. com.akbur.mathsworkout (version code 92) is one of these examples. This app is a puzzle game, which is subject to data pollution and leaks Internet capability. Granted with Internet permission, the app is supposed to send user’s “High Score” to a specific URL. However, the Activity to receive “High Score” data is left unguarded. Thus, an malicious app can send a manipulated Intent with a forged “score” to this vulnerable Activity, polluting the latter’s instance field. This field is accessed in another thread and the resulting data is sent to Internet, once the thread is started. However, starting this background thread involves human interactions. Unless a “OK” button is clicked in GUI dialog, no exploitation will happen. Our patch correctly addresses this case and only displays the warning when sending thread is about to call the sink API (i.e. HttpClient.execute() ). com.cnbc.client (version code 1208 and 1209) asks for user’s consent in a more straight-forward way. This finance app exposes an Activity interface that can access internal database, and thus is vulnerable to SQL injection attack. The exposed Activity is intended to receive the “name” of a stock, and further save it to or delete it from the “watch list”. Malicious Intent can manipulate this “name” and trick the victim app to delete an important one or add an unwanted one. Nevertheless, the deletion requires user’s approval. Before deletion, the app explicitly informs the user and asks for decision. Similarly, if the user chooses “Cancel”, no harm will be done. Our patch automatically enforces necessary checks but but avoid voidss inte interv rven enti tion on in this this scen scenar ario io.. Noti Notice ce that that tain taintt slic slices es of this this app app take take a grea greatt portion portion (27 %) of the program, and therefore therefore it is extremely extremely hard to confirm and fix the vulnerability, or further discover aforementioned secure path with pure human effort. effort. In contrast, contrast, AppSealer AppSealer automatic automatically ally differen differentiat tiates es secure secure and dangerous dangerous paths, and in the meantime manages to significantly reduce the amount of patch statements.

60

4.6.3.3 4.6.3.3

4


Apps Apps with Selectio Selection n Views iews

A similar but more generic case is that apps provide views such as AdapterView for selection. The actual exploit only occurs if an item is selected. Apps 1, 14, 16 are of this kind. CN.MyPrivateMessages (version code 52) is a communication app which suffers the SQL injection attack. An vulnerable Activity may save a manipulated Intent data to its instance field during creation. The app then displays an AdapterView for user to select call logs. Only upon selection does the event handler obtain data from the polluted field and use it to perform a “delete” operation in database.

4.6.3.4 4.6.3.4

Apps Apps with Multiple Multiple Threads Threads

Some samples extensively create new threads during runtime and pass the manipulated input across threads (e.g., app 2, 10, 15). Asynchronous program execution makes it hard for developers or security analysts to reproduce the exploitation and thus to confirm the vulnerability.

References 1. Davi Davi L, Dmitri Dmitrienk enko o A, Sadeg Sadeghi hi AR, Winan Winandy dy M (2011) (2011) Privil Privileg egee escala escalatio tion n attack attackss on Android. In: Proceedings of the 13th international conference on Information security, (Berlin, Heidelberg), 2011 2. Grace M, Zhou Y, Wang Z, Jiang X (2012) Systematic detection of capability leaks in stock Android smartphones. In: Proceedings of the 19th network and distributed system security symposium, 2012 3. Felt AP, Wang HJ, Moshchuk A, Hanna S, Chin E (2011) Permission re-delegation: attacks and defenses. In: Proceedings of the 20th USENIX security symposium, 2011 4. Zhou Y, Jiang X (2013) Detecting passive content leaks and pollution in Android applications. In: Proceedings of the 20th network and distributed system security symposium, 2013 5. Lu L, Li Z, Wu Z, Lee W, Jiang Jiang G (2012) (2012) CHEX: CHEX: stat statica ically lly vetti vetting ng Androi Android d apps apps for compon component ent hijacking hijacking vulnerabilitie vulnerabilities. s. In: Proceedin Proceedings gs of the 2012 ACM conferenc conferencee on computer computer and communications security (CCS’12), October 2012 6. Cui W, Peinado M, Wang HJ (2007) Shieldgen: automatic data patch generation for unknown vulnerabilities with informed probing. In: Proceedings of 2007 IEEE symposium on security and privacy, 2007 7. Brumley D, Newsome J, Song D, Wang H, Jha S (2006) Towards automatic generation of vulnerability-based signatures. In: Proceedings of the 2006 IEEE symposium on security and privacy (Oakland’06), May 2006 8. Costa M, Crowcroft J, Castro M, Rowstron A, Zhou L, Zhang L, Barham P (2005) Vigilante: end-to-end containment of internet worms. In: Proceedings of the twentieth ACM symposium on systems and operating systems principles (SOSP’05), October 2005 9. Costa M, Castro M, Zhou L, Zhang L, Peinado M (2007) Bouncer: securing software by blocking bad input. In: Proceedings of 21st ACM SIGOPS symposium on operating systems principles (SOSP’07), October 2007

References

61

10. Caball Caballero ero J, Liang Liang Z, Poosan Poosankam kam,, Song Song D (2009) (2009) Toward owardss genera generatin ting g high high cove coverag ragee vulnerability-based signatures with protocol-level constraint-guided exploration. In: Proceedings of the 12th international symposium on recent advances in intrusion detection (RAID’09), September 2009 11. Lin Z, Jiang X, Xu D, Mao B, Xie L (2007) AutoPAG: towards automated software patch generation with source code root cause identification and repair. In: Proceedings of the 2nd ACM symposium on information, computer and communications security, 2007 12. Zhang C, Wang T, Wei T, Chen Y, Zou W (2010) IntPatch: automatically fix integer-overflowto-bu to-buff ffer er-o -ove verflo rflow w vulner vulnerabi abilit lity y at compil compile-t e-time ime.. In: Procee Proceedin dings gs of the 15th 15th Europe European an conference on research in computer security, 2010. 13. Sidiroglou Sidiroglou S, Keromyt Keromytis is AD (2005) (2005) Countering Countering network worms through through automatic automatic patch generation. In: IEEE security and privacy, Nov 2005, vol 3, pp 41–49 14. Newsome J (2006) Vulnerability-specific execution filtering for exploit prevention on commodity software. In: Proceedings of the 13th symposium on network and distributed system security (NDSS), 2006 15. Soot: A Java Optimization Framework (2016) http://www http://www.sable.mcgill.ca/soot/ .sable.mcgill.ca/soot/ 16. dex2jar (2016) http://code.google.co http://code.google.com/p/dex2jar/ m/p/dex2jar/

Chapter 5

Efficient and Context-Aware Privacy Leakage Confinement

prevalent ent operating operating system in mobile Abstract As Android has become the most preval devices, privacy concerns in the Android platform are increasing. A mechanism for efficient runtime enforcement of information-flow security policies in Android apps is desira desirable ble to confine confine priva privacy cy leakag leakage. e. The prior prior works works towa towards rds this this proble problem m requir requiree firmware modification (i.e., modding) and incur considerable runtime overhead. Besides, no effective effective mechanism is in place to distinguish malicious privacy privacy leakage from those of legitimate uses. In this paper, we take a bytecode rewriting approach. Given an unknown Android app, we selectively insert instrumentation code into the app to keep track of private information and detect leakage at runtime. To distinguish legitimate and malicious leaks, we model the user’s decisions with a context-aware policy enforcement mechanism. We have implemented a prototype called Capper and evaluated evaluated its efficacy on confining privacy-breaching privacy-breaching apps. Our evaluation on 4723 real-world Android applications demonstrates that Capper can effectively track and mitigate privacy leaks. Moreover, after going through a series of optimizat optimizations, ions, the instrumentati instrumentation on code only represents represents a small small portion portion (4.48 % on average) of the entire program. The runtime overhead introduced by Capper is also minimal, merely 1.5 % for intensive data propagation. propagation.

5.1 5.1


Privacy concerns in the Android platform are increasing. Previous studies [ 1–6] have have expos exposed ed that that both both benign benign and malic maliciou iouss apps apps are stealt stealthil hily y leakin leaking g users’ users’ private information to remote servers. Efforts have also been made to detect and analyze privacy leakage either statically or dynamically [ 1, 2 2,, 7 7– –11 11]. ]. Nevertheless, a good solution to defeat privacy leakage at runtime is still lacking. We argue that a practical solution needs to achieve the following goals: • Information-flow Information-flow based security. security. Privacy leakage is fundamentally an information flow security problem. A desirable solution to defeat privacy leakage would detect sensitive information flow and block it right at the sinks. However, most of prior efforts to this problem are “end-point” solutions. Some earlier soluti solutions ons exten extended ded Androi Android’ d’ss instal install-t l-time ime constr constrai aints nts and enrich enriched ed Androi Android d permissions [12 [12,, 13 13]]. Some aimed at enforcing permissions in a finer-grained


63

64

5


manne annerr and in a more ore flexi flexibl blee way [ 14 14– –17 17]]. Some Some atte attemp mpte ted d to impr improv ovee isolation on various levels and each isolated component could be assigned with a different set of permissions [18 [ 18– –20 20]. ]. In addition, efforts were made to introduce supplementary user consent acquisition mechanism, so that access to sensitive resource also also requires user approval [ approval [21 21,, 22 22]. ]. All these these “end-po “end-point int”” solution solutionss only only mediate the access to private information, without directly tackling the privacy leakage problem. • Low runtime overhead. An overhead. An information-flow based solution must have very low runtime overhead to be adopted on end users’ devices. To To directly address privacy privacy leakage problem, Hornyack et al. proposed AppFence to enforce information flow policies at runtime [3 [ 3]. With support of TaintDroid [2], AppFence keeps track of the propagation of private information. Once privacy leakage is detected, AppFen AppFence ce eithe eitherr blocks blocks the the leakag leakagee at the sink sink or shuffl shufflee the inform informat ation ion from the sour source ce.. Thou Though gh effe effect ctiive in term termss of bloc blocki king ng pri privacy vacy leak leakag age, e, its its effic efficie ienc ncy y is not not favorable. Due to the taint tracking on every single Dalvik bytecode instruction, AppFence incurs significant performance overhead. • No firmware firmware modding. For modding. For a practical solution to be widely adopted, it is also crucial to avoid firmware modding. Unfortunately, the existing information-flow based solutions such as AppFence require modifications on the stock software stack, making it difficult to be deployed on millions of mobile devices. • Context-aware Context-aware policy enforceme enforcement. nt. Many apps need to access user’s privacy for legitimate functionalities and these information flows should not be stopped. Therefore, Therefore, to defeat defeat privac privacy y leakage leakage without without compromisi compromising ng legitim legitimate ate functionality, a good solution needs to be aware of the context where a sensitive information flow is observed and make appropriate security decisions. To the best of our knowledge, we are not aware that such a policy mechanism exists. In this chapter, we aim to achieve all these design goals by taking a bytecode rewriting approach. Given an unknown Android app, we selectively rewrite the program by inserting bytecode instructions for tracking sensitive information flows only in certain fractions of the program (which are called taint slices) that are potent potential ially ly invol involve ved d in inform informati ation on leakag leakage. e. When When an inform informati ation on leaka leakage ge is actual actually ly observ observed ed at a sink sink node (e.g., (e.g., an HTTP HTTP Post Post operat operation ion), ), this this behav behavior ior along with the program context is sent to the policy management service installed on the device and the user will be notified to make an appropriate decision. For example, the rewritten app may detect the location information being sent out to a Google server while the user is navigating with Google Map, and notify the user. Since the user is actively interacting with the device and understands the context very well, he or she can make a proper decision. In this case, the user will allow this behavior. To ensure good user experiences, the number of such prompts must be minimized. To do so, our policy service needs to accurately model the context for the user’s decisions. As a result, when an information leakage happens in the same same conte context, xt, the same same decisi decision on can be made made withou withoutt raisin raising g a prompt. prompt. After After exploring the design space of the context modeling and making a balance between sensitivity, performance overhead, and robustness, we choose to model the context using parameterized source and sink pairs.

5.2 Approach Overview

65

Consequently, our approach fulfills all the requirements: (1) actual privacy leaks are captur captured ed accur accurate ately ly at runtim runtime, e, with with the suppor supportt of insert inserted ed taint taint tracki tracking ng code; (2) the performance overhead of our approach is minimal, due to the static dataflow analysis in advance and numerous optimizations that are applied to the instrumentation code; (3) the deployment of our approach is simple, as we only rewrite the original app to enforce certain information flow policies and no firmware modification is needed; (4) policy enforcement is context-aware, because the user’s decisions are associated with abstract program contexts. We implement a prototype, Capper , ,1 in 16 thousand lines of Java code, based on the Java bytecode optimization framework Soot [ 23 23]]. We leverage leverage Soot’s Soot’s capability to perform static dataflow analysis and bytecode instrumentation. We evaluate our tool on 4723 real-world privacy-breaching Android apps. Our experiments show that rewritten programs run correctly after instrumentation, while privacy leakage is effectively eliminated.

5.2

5.2.1

Approac pproach h Over Overvie view w

Key Techniques Techniques

Figure 5.1 5.1 depicts depicts an overview of our techniques. When a user is about to install an app onto his Android device, this app will go through our bytecode rewriting engine (BRIFT) and be rewritten into a new app, in which sensitive information flows are monitored by the inserted bytecode instructions. Therefore, when this

Query Install

APK

Prompt

Decide

APK’ App

Service

BRIFT Warning Dialog

Phone Capper Fig. 5.1 Architecture of Capper

1

Privacy Policy Enforcement with Re-writing. Capper is short for Context-Aware Privacy

66

5


new app is actually running on the device and is observed to send out sensitive information, this behavior (along with the program context) will be reported to the policy management service for decision. If this behavior under this program context is observed for the first time, the policy management service will prompt the user to make a proper decision: either allow or deny such a behavior. The user’s decision will be recorded along with the program context, so the policy management service will make the recorded decision for the same behaviors under the same context. Therefore, our solution to defeat privacy leakage consists of the following two enabling techniques. (1) Bytecode Bytecode Rewriting Rewriting for Informa Information tion Flow Contro Control. l. Given a bytecode program, the goal of our bytecode rewriting is to insert a minimum amount of bytecode instructions into the bytecode program to trace the propagation of certai certain n sensit sensitiv ivee inform informati ation on flows flows (or taint) taint).. To achie achieve ve this this goal, goal, we first first conduct static dataflow analysis to compute a number of program slices that are involved in the taint propagation. Then we insert bytecode instructions along the program slices to keep track of taint propagation at runtime. Further, we perform a series of optimizations to reduce the amount of inserted instructions. Please refer to Chap. 4 for more details. The user user allo allows ws or deni denies es a cert certai ain n (2) ContextContext-awa aware re Policy Policy Enforc Enforcemen ement. t. The inform informati ation on flow flow in a specifi specificc conte context. xt. The key key for a conte contextxt-aw aware are polic policy y enforcement is to properly model the context. The context modeling must be sensitive sensitive enough to distinguish different program contexts, but not too sensitive. sensitive. Otherwise, a slight difference in the program execution may be treated as a new context and may cause unnecessarily annoying prompts to the user. Further, the context modeling should also be robust enough to counter mimicry attacks. An attacker may be able to “mimic” a legitimate program context to bypass the context-aware context-aware policy enforcement.

5.3

Contex Context-A t-Awar waree Polic Policy y

Once our inserted monitoring code detects an actual privacy leakage, policy service will enforce privacy policy based on user preferences. To To be specific, the service app inquires user’s decision upon detection and offers the user options to either “onetime” or “always” allow or deny the specific privacy breaching flow. The user can then make her decision according to her user experience and the policy manager will remember users preference for future decision making situations if the “always” option is chosen. There exist two advantages to enforce a privacy policy with user preference history. Firstly, it associates user decisions with certain program contexts and can thus selectively restrict privacy-related information flow under different circumstances. Privacy-related outbound traffic occurs in both benign and malicious semantics.

5.3 Context-Aware Policy

67

However However,, from dataflow analysis perspective, perspective, it is fairly hard to distinguish between, for example, a coordinates-based query towards a benign map service and a location leakage via some covert malicious process within, say, a wallpaper app. On the contrary, it is fairly straight-forward for a user to tell the difference because she knows the application semantics. With human knowledge, it is possible to avert overly strict restriction and preserve usability to the largest extent. Secondly, it avoids repetitive warning dialogs and improves user experience. Once an “always” decision is made, this decision will be remembered and used for the same scenario next time. Thus, the user doesn’t need to face the annoying dialog message over and over again for the exactly identical situation. However, it is non-trivial to appropriately model the program context specific to a user decision and the challenge lies in the way semantics is extracted from a dataflow point of view. We hereby discuss some possible options and our solution.

5.3.1

Taint Propagation Trace

To achieve high accuracy, accuracy, we first consider using the exact execution trace as pattern to represent a specific information flow. An execution trace can be obtained at either instruction or method level. It consists of all the instructions or methods propagating sensitive data from a source to a sink, and therefore can uniquely describe a dataflow path. Formally, Formally, we define a trace-based trace-based context as C T T D Œ t 0 ; t 1 ; : : : ; t n , where each t i represents an individual instruction or method. When a user decision is made for a certain information flow, its execution trace is computed and saved as a pattern along with user preference. Next time when a new leakage instance is detected, the trace computation will be done on the new flow and compared with saved ones. If there exists a match, action taken on the saved one will be applied correspondingly. Neve Neverth rthele eless, ss, there there exist exist two two major major drawba drawbacks cks with with this this approa approach. ch. Firstl Firstly y, dynamic tracing is considerably heavy-weight. Comparison of two traces is also fairly expensive. This may affect the responsiveness of interactive mobile apps. Secondly, each dataflow instance is modeled overly precisely. Since any execution divergence will lead to a different trace pattern, even if two leakage flows occur within the same semantics, it is still difficult to match their traces. This results in repeated warning messages for semantically equivalent privacy-related dataflows.

5.3.2

Source and Sink Call-Sites

Trace-based approach is too expensive because the control granularity is extremely fine. fine. We ther theref efor oree atte attemp mptt to rela relax x the the stri strict ctne ness ss and and achi achieeve bala balanc ncee in the the accu accura raccyefficiency trade-off. We propose a call-site approach which combines source-sink call-sites to model privacy flow. That is to say, information flows of same source and sink call-sites are put into one category. Thus, such a context is formally defined

68

5


to be C SS SS D fSource; Sink g. Once an action is taken on one leakage flow, the same acti action on will will be take taken n on futur futuree sens sensit itiive info inform rmat atio ion n flow flow in the the same same cate categor gory y. To this this end, we introduce labels for source and sink call-sites. Information flows starting from or arriving at these call-sites are associated with corresponding labels, so that they can be differentiated based on these labels. With a significant improvement of efficiency, this approach is not as sensitive to program contexts as the traced-based one—different execution paths can start from the same origin and end at the same sink. However, we rarely observed this inaccuracy in practice because the app execution with same source and sink callsites usually represents constant semantics.

5.3.3

Parameterized Parameterized Source and Sink Pairs Pairs

In addi additi tion on to sour source ce/s /sin ink k call call-s -sit ites es,, para parame mete ters rs fed fed into into thes thesee call call-s -sit ites es APIs APIs are are also also cruc crucia iall to the the sema semant ntic ic cont contex exts ts.. For For exam exampl ple, e, the the user user may may allo allow w an app app to send send data data to certain trustworthy URLs but may not be willing to allow access to the others. Therefore, it is important to compare critical parameters to determine if a new observed flow matches the ones in history. We therefore define this parameterized callsites based context as a triple C PSS PSS D fSource; Sink ; Paramsg, where Params contains the critical parameters that are consumed by the two callsites. Notice that checking parameters can minimize the impact of mimicry attack. Prior research shows that vulnerable Android apps are subject to various attacks [7, 24 24– –27 27]. ]. For instance, an exposed vulnerable app component can be exploited to leak private information to an attacker-specified URL. Without considering the URL parameter, it is difficult, if not possible, to distinguish internal use of critical call-sites from hijacking the same call-sites to target a malicious URL. Once a flow through some call-sites is allowed and user preference is saved, mimicking attack using same call-sites will also get approved. On the contrary, a parameter-aware approach can differentiate outgoing dataflows according to the destination URL, and thus, exploitation of a previously allowed call-sites will still raise a warning. Besi Beside dess the the URL of a Inter Interne nett API, API, we also also cons consid ider er some some othe otherr crit critic ical al combin combinati ations ons of an API and its its parame paramete terr. Table able 5.1 summari summarize zess our list. list. The target of a sink API is sensitive to security. Similar to Internet APIs, target phone numbers are crucial to sendTextMessage() APIs and thus need watching. On the other hand, some source call-sites also need to be distinguished according to the

Table 5.1 APIs and critical parameters API description ion

Source or sink

Critical parameter

Send data to internet

Sink

Destination URL

Send SMS message

Sink

Target phone number

Query contacts database

Source

Source URI


69

parameters. For instance, the API that queries contacts list may obtain different data depending on the input URI (e.g., ContactsContract.CommonDataKinds.Phone for phone number, ContactsContract.CommonDataKinds.Email for email).

5.3.4

Implementation

We implement the policy service as a separate app. This isolation guarantees the security of the service app and its saved policies. In other words, even if the client is exploited, the service is not affected or compromised, and can still correctly enforce privacy privacy policies. The The serv servic icee app app comm commun unic icat ates es with with a rewr rewrit itte ten n clie client nt app app sole solely ly thro throug ugh h Andr Androi oid d IPC. IPC. Once Once a clie client nt app app want wantss to quer query y the the serv servic ice, e, it enca encaps psul ulat ates es the the labe labels ls of sour source ce and sink call-sites as well as the specific critical parameter into an Intent as extra data payload. The client app is then blocked and waiting for a response. Since this transaction is usually fast, the blocking won’t affect the responsiveness of the app mostly. On the service side, it decodes the data and searches for a match in its database. If there exists a match, it returns immediately with the saved action to the client. Otherwise, the service app will display a dialog message within a created Activity. The user decision is saved if the user prefers, or not saved otherwise. Either way, user’s option is sent back to the client. On receiving the response from service, the rewritten app will either continue its execution or skip the sink call-site with respect to the reply. It is noteworthy that we have to defend against spoofing attack and prevent forge forged d mess messag ages es from from bein being g sent sent to a clie client nt app. app. To addr addres esss that that,, we inst instrurument the client app to listen for the service reply with a dynamically registered BroadcastReceiver . When a broadcast message is received, the receiver is immediately unregistered. Thus, attack window is reduced due to this on-demand receiver registration. Further, to restrict who can send the broadcast, we protect the receiver with a custom permission . Broadcaster without this permission is therefore unable to send messages to the client app. To defeat replay attack, we also embed a session token in the initial query message, and the client app can therefore authenticate the sender of a response message. Simi Simila larl rly y, we need need to prot protec ectt a poli policcy serv servic icee from from spoo spoofin fing, g, too. too. The The serv servic icee app app has has to chec check k the the call caller er iden identi tity ty from from a boun bound d comm communi unica cati tion on via via getCallingUid() , so that a malicious application cannot pretend to be another app and trick the service to configure the policies for the latter.

5.4

Experi Experimen mental tal Evalu Evaluati ation on

To evalu evaluate ate the effica efficacy cy,, correc correctne tness ss and effici efficienc ency y of Capper Capper,, we conduc conducte ted d experiments on real-world Android applications. In the policy setting, we consider IMEI, owner’s phone number, location, contacts to be the sensitive information

70

5


sources, and network output APIs (e.g., OutputStream.write() , HttpClient. execute() , WebView.loadUrl() , URLConnection.openConnection() and SmsManager.sendTextMessage() ) as the sinks. The action on the sink can be “block” or “allow”. User can also check an “always” option to have the same rule applied to future cases of the same semantic context. Note that this policy is mainly for demonstrating the usability of Capper. More work is needed to define a more complete policy for privacy leakage confinement. We obtain 4915 applications from Google Play and use them as our experiment sample set. We then perform bytecode rewriting on these apps to enable runtime priva privacy cy protec protectio tion. n. We conduc conductt the exper experime iment nt on our test test machin machine, e, which which is equipped equipped with Intel( Intel(R) R) Xeon(R) Xeon(R) CPU E5-2690 E5-2690 (20 M Cache, Cache, 2.90 GHz) and 200 200 GB of physic physical al memory memory.. The operating operating system system is CentOS CentOS 6.3 (64 bit). bit). To verif verify y the effectiveness and evaluate runtime performance of the rewritten apps, we further run them on a real device. Experiments are carried out on Google Nexus S, with Android OS version 4.0.4.

5.4.1

Summarized Analysis Results

Figure 5.2 illustrates the partition of 4915 realworld apps. Amongst these apps, Capper Capper did not finish finish analyz analyzing ing 314 of them within within 30 min. min. These These apps are fairly fairly larg largee (man (many y over over 10 MB). MB). Appl Applic icat atio ionn-wi wide de data dataflo flow w anal analys ysis is is kno known to be expensive for these large programs. We further extended the analysis timeout to 3 h, and 122 more apps were successfully successfully analyzed analyzed and rewritten. rewritten. Given Given sufficie sufficient nt analysis time, we believe that the success rate can further increase from currently 96 % to near nearly ly 100 100 %.

2% 3% 4%

No Need for Rewriting Safely Rewritten Unsafe with Native Code

24%

Unsafe Unsa fe with Reflec Re flection tion 67% 67 %

Timeout

results on 4915 realworld android android apps Fig. 5.2 Bytecode rewriting results


71

Out of the 4723 apps that were completely processed by Capper, 1414 apps may leak private information, according to our static analysis, and Capper successfully performed bytecode rewriting on them. We observed that most of them leak IMEI and location information. These types of information are frequently sent out by apps due to analytic or advertisement reasons. Apps may also sometimes leak owner’s phone number or phone numbers from contacts. contacts. For the rest of them (67 %), static analysis couldn’t find a viable path from the sensitive information sources to the network output sinks. It means that these apps do not leak private information, so no rewriting is needed for these apps. For these 1414 apps that were rewritten, we further investigate their use of native code and reflection. Our study shows that unknown native code is invoked within the taint slices for 118 (2 %) apps. As a bytecode-level bytecode-level solution, our system cannot cannot keep track of information flow processing in the native code. So the information flow may be broken on these unknown native calls. calls. The rest 3 % contain reflective reflective calls within the slices. If the class name and method name cannot be resolved statically, we do not know how information is propagated through this function. Therefore, totally 5 % apps may be unsafe and may not be fully enforced enforced with the specified policies. policies. The best suggestion for the end user is not to use these unsafe rewritten apps due to the potential incompleteness in policy enforcement. We compute the taint propagation slice for each single leakage instance, and cond conduc uctt a quan quanti tita tati tive ve stud study y on them them.. Whil Whilee most most of the the apps apps retr retrie ieve ve pri privacyacy-re rela late ted d information moderately, some apps leak user’s privacy through up to 31 taint slices. Such apps usually enclose various Ads libraries, each of which acquires private data separately. Most of the program slices represent a small portion of the entire application, with with the the avera verage ge propo proport rtio ion n bein being g 2.48 2.48 %. Howe Howeve verr, we do obser observe ve that that for a few few applic applicati ations ons,, the slices slices take up to 54 % of the total total size. size. Some Some sample sampless (e.g., (e.g., com.tarsin.android.dilbert) are fairly small. Although the total slice size is only up to several thousands Jimple statements, the relative percentage becomes high. Apps like de.joergjahnke.mario.android.free and de.schildbach.oeffi operate on privacy inform informati ation on in an exces excessi sive ve way way, and due to the conser conserva vati tive ve nature nature of static static analysis, many static and instance fields are involved in the slices. We measure the program size on different stages. We observe that the increase of the program size is roughly proportional to the slice size, which is as expected. After instrumentation, the increased increased size is, on average, average, 10.45 % compared to the original program. The proposed optimizations are effective and they significantly reduce the program program sizes. In the end, the average average size of inserted code drops to 4.48 %.

5.4.2

Detailed Analysis

Here Here we pres presen entt the the deta detail iled ed anal analys ysis is resu result ltss of ten ten appl applic icat atio ions ns,, and and demo demons nstr trat atee the the effectiveness of context-aware privacy policy enforcement. To this end, we rewrite these apps with Capper, and run them in a physical device along with our policy

72

5


service app. We further manually trigger the privacy leakage components in the app, so that inserted code would block the program and query policy service for decision. The service will then search its own database to see if there exists a rule for the specific dataflow context of the requesting app. If a corresponding rule exists, service replies to requester immediately. Otherwise, a dialog is displayed to the user asking for decision. The user can also check the “always” option so that current decision will be saved for further reference (Fig. 5.3 5.3). ). Notice that, in order to test the context-awareness of our approach, we always check this option during the experiment. Therefore, from the logcat information on the service side, we may observe and compare the number of queries an app makes with the amount of warning dialogs prompted to the user. We also compute the number of information flow contexts with trace-based model for comparison. Table 5.2 lists the summarized results including number of queries, prompts and tracetrace-bas based ed conte contexts xts.. For For these these apps, apps, the prompt prompt number number is often often equal equal to

Fig. 5.3 Warning dialog

Effectiveness of context-aware context-aware policy enforcement Table 5.2 Effectiveness ID

App-Version

1

artfulbits.aiMinesweeper-3.2.3

2

Queries

Prompts

Trace-based contexts

5

2

2

com.avantar.wny-4.1.5.1

3

2

2

3

com.avantar.yp-4.1.5

4

2

2

4

com.bfs.papertoss-1.09

3

2

2

5

com.rs.autorun-1.3.1

10

4

6

6

com.skyfire.browser-2.0.1

4

2

2

7

com.startapp.wallpaper.browser-1.4.15

5

3

3

8

mabilo.ringtones.app-3.6

5

3

3

9

mabilo.wallpapers-1.8.4

4

2

2

10

net.flixster.android-2.9.5

6

3

3


73

or sometimes slightly smaller than the amount of trace-based dataflow contexts, while the number of queries is usually much larger than that of prompts. This means leakage contexts are modeled correctly: disparate contexts are usually treated differently in our callsite-based approach; and equivalent contexts are enforced with the the same same rule. rule. The The fund fundam amen enta tall reas reason on is that that Andro Android id apps apps are are ofte often n comp compon onen enti tize zed d while a separate component exercises a dedicated function. Firstl Firstly y, differ different ent types types of priva private te inform informat ation ion are access accessed ed throug through h separa separate te execution paths. Some apps (ID 2, 3) retrieve both device identifier and location information, and send them at separate sinks. Similarly, mabilo.ringtones.app leaks both geolocation data and user’s phone number. Secondly, the same type of privacy-related data can be retrieved from isolated packages but serves as different purposes. These apps (ID 4, 7, 10) consist of a main program and advertisement components, both of which produce outgoing traffic taking IMEI or location data. Take net.flixster.android as an example. In this movie app, location data is both sent to flixster server for querying theaters and to Ads server for analytical purpose. Further, the same private data can be accessed in one package but contributes to different use cases. For instance, com.rs.autorun obtains device identifier via getDeviceId() API call in the first place. Then the app sends IMEI at multiple sinks sinks to load load advert advertise isemen mentt View, retrie retrieve ve configu configurat ration ion or upload upload analyt analytica icall information including demographics, ratings, etc. Each semantic context is captured from the perspective of taint propagation trace. However, due to the use of same sink call-site, all the analytical traffics are considered to be with the same semantic context in the call-site model. Though call-site-based model is not as sensitive to context as the trace-based approach, it is still able to differentiate the contexts of Ads View , configuration file loading and analytical dataflow, according to disparate sink call-sites. We also also obse observ rvee inco incons nsis iste tenc ncy y betw betwee een n trac traced ed-b -bas ased ed cont contex exts ts and and real real prog progra ram m seman emanttics. ics. That hat lies ies in apps pps (ID (ID 1, 6, 7, 8, 9, 10) 10) whic which h acqui cquire re locati location on inform informati ation, on, where where the number number of tracetrace-bas based ed conte contexts xts excee exceeds ds that that of actual actual conte contexts xts.. Geogra Geographi phicc locat location ion is obtain obtained ed either either approx approxima imatel tely y from from getLastKnownLocation() , or from a real-time update by registering a listener through requestLocationUpdates() and and rece receiiving ving upda update tess via via call callba back ck onLocationChanged(Location) . Some apps adopt both ways so as to achieve higher higher accura accuracy cy.. For For instan instance, ce, artful artfulbit bits.a s.aiMi iMines neswee weeper per reads reads locat location ion data data by calling getLastKnownLocation() at the very beginning of the program, stores it into an instance field, and then periodically updates it with the aforementioned callba callback. ck. Conseq Consequen uently tly,, two separa separate te paths paths achie achieve ve one sole sole purpos purposee and thus thus should be considered as of equivalent context. However, from either trace or callsite point of view, view, there exist two separate contexts. Despite the disparity, we believe that this would at most introduce one extra prompt. Further, it is also reasonable to split this context into two, because one conducts a one-time access while the other obtains data repeatedly.

74

5.4.3

5


Runtime Performance

We compare the runtime overhead of bytecode rewriting with that of dynamic taint analysis in TaintDroid (on Android gingerbread version 2.3.4). In principle, if an app leaks private information only occasionally, the rewritten version would have much better performance than the original version on TaintDroid. This is because in the rewritten app nearly no instrumentation code is added on non-leaking execution paths whereas TaintDroid has to monitor taint propagation all the time. Rather, we would like to compare the performance when the taint is actively propagated during the execution. This would be the worst-case scenario for Capper. Specifically, we build two customized applications for the measurement. Both leak IEMI string via getDeviceId() API, decode the string to a byte array, encrypt the array by doing XOR on every byte with a constant key, reassemble a string from the encrypted byte array, and in the end, send the string out to a specific URL through a raw socket interface. The only difference is that one merely sends the IMEI string, while while the other also appends appends extra information information of totally totally 10 KB to the IMEI string before encryption. In other words, the former conducts a short-period data data tran transf sfer er whil whilee the the latt latter er mani manipu pula late tess the the outgo outgoin ing g mess messag agee with within in a much much long longer er period. We expect that the execution of the first one to be mainly under mterp interpretation mode and the execution of the second to be boosted by JIT. We meas measur ured ed the the exec execut utio ion n time time from from getDeviceId() to the the netw networ ork k interface. We observed that the rewritten application runs significantly faster than the origin original al on TaintDr aintDroid oid,, and only only yields yields fairl fairly y small small overh overhead ead compar compared ed to the original one running on Android, for both short-period and long-period data propagation. Table 5.3 illustrates the result of runtime measurement. While our appr approa oach ch caus causes es 13 and and 1.5 1.5 % over overhe head ad for for shor shortt and and long long data data prop propag agat atio ion n respecti respectivel vely y, TaintDroid aintDroid incurs 330 and 47 % overhead overhead.. The results also show show that the presence of JIT significantly reduces runtime overhead, in both approaches. However, though newer version of TaintDroid (2.3.4 and later) benefits from JIT support, the overhead caused by dynamic instrumentation is still apparently high. To further confirm the runtime overhead of the rewritten programs, we conduct an experiment on Google Nexus S, with Android OS version 4.0.4. It is worth mentioning that such verification on real device requires considerable repetitive manual manual effor efforts ts and thus thus is fairl fairly y time time consum consuming ing.. We theref therefore ore random randomly ly pick pick 10 apps from the privacy-breaching ones, rewrite them, run both the original app and secured one on physical device, and compare the runtime performance before and after rewriting. We rely on the timestamps of Android framework debugging information (logcat logs) to compute the app load time as benchmark. The app load time is measured from when Android ActivityManager starts an Activity

Table 5.3 Runtime performance evaluation evaluation

Orig. rig.

Orig Orig.. on TaintD intDro roid id

Rewr Rewrit itte ten n

Short

30 ms

130 ms

34 ms

Long

10,583 ms ms

15,571 ms ms

10,742 ms ms

References

75

component to the time the Activity thread is displayed. This includes application resolution by ActivityManager , IPC and graphical display. Our observation complies complies with prior experiment experiment result: result: rewritt rewritten en apps usually usually have have insignifica insignificant nt slowdow slowdown, n, with an average average of 2.1 %, while the maximum runtime overhead overhead is less than than 9.4 %.

References 1. Enck W, Octeau D, McDaniel P, Chaudhuri S (2011) A study of Android application security. In: Proceedings of the 20th usenix security symposium, August 2011 2. Enck W, Gilbert P, Chun BG, Cox LP, Jung J, McDaniel P, Sheth AN (2010) TaintDroid: an informatio information-flo n-flow w tracking tracking system system for realtime realtime privac privacy y monitoring monitoring on smartphon smartphones. es. In: Proceedings of the 9th USENIX symposium on operating systems design and implementation (OSDI’10), October 2010 3. Hornyack P, Han S, Jung J, Schechter S, Wetherall D (2011) These aren’t the droids you’re looking for: retrofitting Android to protect data from imperious applications. In: Proceedings of CCS, 2011 4. Zhou Zhou Y, Jiang X (2012) (2012) Dissecting Dissecting Android malware: malware: characteriz characterization ation and evoluti evolution. on. In: Proceedings of the 33rd IEEE symposium on security and privacy (Oakland’12), May 2012 5. Zhou Y, Wang Z, Zhou W, Jiang X (2012) Hey, you, get off of my market: detecting malicious apps in official and alternative Android markets. In: Proceedings of 19th annual network and distributed system security symposium (NDSS’12), February 2012 6. Wu C, Zhou Y, Patel K, Liang Z, Jiang X (2014) AirBag: boosting smartphone resistance to malware infection. In: Proceedings of the 21th annual network and distributed system security symposium (NDSS’14), February 2014 7. Lu L, Li Z, Wu Z, Lee W, Jiang Jiang G (2012) (2012) CHEX: CHEX: stat statica ically lly vetti vetting ng Androi Android d apps apps for compon component ent hijacking hijacking vulnerabilitie vulnerabilities. s. In: Proceedin Proceedings gs of the 2012 ACM conferenc conferencee on computer computer and communications security (CCS’12), October 2012 8. Gibler Gibler C, Crusse Crussell ll J, Ericks Erickson on J, Chen Chen H (2012) (2012) Androi AndroidLe dLeaks aks:: automa automatic ticall ally y detect detecting ing potential privacy leaks in Android applications on a large scale. In: Proceedings of the 5th international conference on Trust and Trustworthy Computing, 2012 9. Kim J, Yoon Y, Yi K, Shin J (2012) Scandal: Static Analyzer for Detecting Privacy Leaks in Android Applications. In: Mobile Security Technologies (MoST) 2012 10. Mann C, Starostin A (2012) A framework for static detection of privacy leaks in Android applications. In: Proceedings of the 27th annual ACM symposium on applied computing, 2012 11. Yang Z, Yang M, Zhang Y, Gu G, Ning P, Wang XS (2013) AppIntent: analyzing sensitive data transmission in Android for privacy leakage detection. In: Proceedings of the 20th ACM conference on computer and communications security (CCS’13), November 2013 12. Ongtang M, McLaughlin S, Enck W, McDaniel P (2009) Semantically rich application-centric security in Android. In: Proceedings of ACSAC, 2009 13. Enck W, Ongtang M, McDaniel P (2009) On lightweight mobile phone application certification. In: Proceedings of the 16th ACM conference on computer and communications security (CCS’09), November 2009 14. Conti Conti M, Nguyen Nguyen VTN, VTN, Crispo Crispo B (2011 (2011)) Crepe: Crepe: conte contextxt-rel relate ated d policy policy enforc enforceme ement nt for Android. In: Proceedings of the 13th international conference on information security, 2011 15. Zhou Zhou Y, Zhang Zhang X, Jiang Jiang X, Freeh Freeh VW (2011 (2011)) Taming aming inform informati ationon-ste steali aling ng smartp smartphon honee applications (on Android). In: Proceedings of the 4th international conference on Trust and trustworthy computing, 2011 16. Nauman M, Khan S, Zhang X (2010) Apex: extending Android permission model and enforcement with user-defined runtime constraints. In: Proceedings of the 5th ACM symposium on information, computer and communications security, 2010

76

5


17. Beresford AR, Rice A, Skehin N, Sohan R (2011) Mockdroid: trading privacy for application functionality on smartphones. In: Proceedings of the 12th workshop on mobile computing systems and applications, 2011 18. Lang Langee M, Lieber Liebergel geld d S, Lacko Lackorzy rzynsk nskii A, Warg arg A, Peter Peter M (2011 (2011)) L4Andr L4Android oid:: a generi genericc operat operating ing system system frame framewo work rk for secure secure smartp smartphon hones. es. In: Procee Proceedin dings gs of the 1st ACM ACM works workshop hop on security and privacy in smartphones and mobile devices, 2011 19. Andrus Andrus J, Dall Dall C, Hof AV, Laadan Laadan O, Nieh Nieh J (2011) (2011) Cells: Cells: a virtua virtuall mobile mobile smartp smartphon honee architecture. In: Proceedings of SOSP, 2011 20. Shekh Shekhar ar S, Dietz M, Wallach allach DS (2012) (2012) Adsplit: Adsplit: separating separating smartphone smartphone advertisi advertising ng from applications. In: Proceedings of the 20th usenix security symposium, August 2012 21. Xu R, Sadi Sadi H, Anders Anderson on R (2012) (2012) Aurasi Aurasium: um: practi practical cal polic policy y enforc enforceme ement nt for Androi Android d applications. In: Proceedings of the 21th usenix security symposium, August 2012 22. Livsh Livshits its B, Jung Jung J (2013) (2013) Automa Automatic tic mediat mediation ion of priva privacy cy-se -sensi nsiti tive ve resour resource ce access access in smartphone applications. In: Proceedings of the 22th usenix security symposium, 2013 23. Soot: A Java Optimization Framework (2016) http://www http://www.sable.mcgill.ca/soot/ .sable.mcgill.ca/soot/ 24. Felt AP, Wang HJ, Moshchuk A, Hanna S, Chin E (2011) Permission re-delegation: attacks and defenses. In: Proceedings of the 20th USENIX security symposium, 2011 25. Grace M, Zhou Y, Wang Z, Jiang X (2012) Systematic detection of capability leaks in stock Android smartphones. In: Proceedings of the 19th network and distributed system security symposium, 2012 26. Zhou Y, Jiang X (2013) Detecting passive content leaks and pollution in Android applications. In: Proceedings of the 20th network and distributed system security symposium, 2013 27. Davi Davi L, Dmitri Dmitrienk enko o A, Sadeg Sadeghi hi AR, Winan Winandy dy M (2011) (2011) Privil Privileg egee escala escalatio tion n attack attackss on Android. In: Proceedings of the 13th international conference on information security, Berlin, Heidelberg, 2011

Chapter 6

Automatic Generation of Security-Centric Descriptions for Android Apps

awareness of end users, Android markets directly Abstract To improve the security awareness prese present nt two two clas classe sess of lite litera rall app app info inform rmat atio ion: n: (1) (1) perm permis issi sion on requ reques ests ts and and (2) text textua uall desc descri ript ptio ions ns.. Unfo Unfort rtun unat atel ely y, neit neithe herr can can serv servee the the need needs. s. A perm permis issi sion on list list is not only only hard to understand but also inadequate; textual descriptions provided by developers are not security-centric and are significantly deviated from the permissions. To fill in this this gap, gap, we propos proposee a novel novel techn techniqu iquee to automa automati tical cally ly genera generate te securi securitytycentric app descriptions, based on program analysis. We implement a prototype system system,, D ESCRIBEM E , and evaluate our system using both DroidBench and realworld Android apps. Experimental results demonstrate that D ESCRIBEM E enables a promising technique which bridges the gap between descriptions and permissions. A further user study shows that automatically produced descriptions are not only readable but also effectively help users avoid malware and privacy-breaching apps.

6.1 6.1


Unlike traditional desktop systems, Android provides end users with an opportunity to proactively accept or deny the installation of any app to the system. As a result, it is essential that the users become aware of each app’s behaviors so as to make appropriate decisions. To this end, Android markets directly present the consumers with two classes of information regarding each app: (1) the Android permissions requested and (2) textual description of the app’s behavior that is provided by the app’s developer. Unfortunately, neither of these can fully serve this need. Permission requests are not easy to understand. First, prior study [ 1] has shown that few users are cautious or knowledgeable enough to comprehend the security impl implic icat atio ions ns of Andr Androi oid d perm permis issi sion ons. s. Seco Second nd,, a perm permis issi sion on list list mere merely ly tell tellss the the user userss which permissions are used, but does not explain h ow they are used. Without such knowledge, one cannot properly assess the risk of allowing a permission request. For instance, both a benign navigation app and a spyware instance of the same app can require require the same permission permission to access GPS location, location, yet use it for completely completely different purposes. While the benign app delivers GPS data to a legitimate map server upon the user’s approval, the spyware instance can periodically and stealthily


77

78

6

Auto utomati maticc Gen Genera eration tion of Secu Securi rity ty-C -Ceentri ntricc Des Descrip cripti tio ons for for Andro ndroid id Apps Apps

leak the user’s location information to an attacker’s site. Due to the lack of context clues, a user is not able to perceive such differences via the simple permission enumeration. Textual descriptions provided by developers developers are not security-centric. There exists very little incentive for app developers to describe their products from a security perspective, and it is still a difficult task for average developers (usually inexperienced) to write dependable descriptions. Malware authors can also intentionally hide malice from innocent users by providing misleading descriptions. Previous studies [2 [2, 3] have revealed that the existing descriptive text deviate considerably from requested permissions. As a result, developer-driven description generation cannot be considered trustworthy. To address this issue, we propose a novel technique to automatically generate app descriptions which accurately describe the security-related behaviors of Android apps. To interpret panoramic app behaviors, we extract security behavior graphs as high-level program semantics. To create concise descriptions, we further condense the graphs by mining and compressing the frequent subgraphs. As we traverse Natural Language Language Generation Generation (NLG) to and parse parse these these graphs graphs,, we leve leverag ragee Natural automatically produce concise, human-understandable descriptions. A series of efforts have been made to describe the functionalities of traditional Java programs as human readable text via NLG. Textual summaries are automatically produced for methods [4 [ 4], method parameters [5 [ 5], classes [6 [ 6], conditional code snippets [ snippets [7 7] and algorithmic code structures [8 [ 8] through program analysis and comprehension. These works focus upon depicting the intra-procedural structurebased operations, while our technique presents the whole-program’s semantic-level acti activit vitie ies. s. Furthe Furthermo rmore, re, we take take the first first step step towa towards rds automa automatin ting g Android Android app description generation for security purposes. We implement a prototype system, D ESCRIBEM E , in 25 thousand lines of Java code code.. Our beha behavi vior or grap graph h gener generat atio ion n is buil builtt on top top of Soot Soot [9], whil whilee our our desc descri ript ptio ion n production leverages an NLG engine [10 [ 10]] to realize texts from the graphs. We evalu evaluat atee our system system using using both both DroidB DroidBenc ench h [ 11 11]] and real-w real-worl orld d Androi Android d apps. apps. Experimental results demonstrate that D ESCRIBEM E is able to effectively bridge the gap between descriptions and permissions. A further user study shows that our automatically-produced descriptions are both readable and effective at helping users to avoid malware and privacy-breaching apps.

6.2 6.2

6.2.1 6.2.1

Over Overvi view ew

Proble Problem m Statem Statement ent

Figures 6.1 and 6.2 demonstrate the two classes of descriptive metadata that are associated with an Android app available via Google Play. The app shown leaks the

6.2 Overview

79

Fig. 6.1 Permission requests

user’s phone number and service provider to a remote site. Unfortunately, neither of these two pieces of metadata can effectively inform the end users of the risk. The permission list (Fig. 6.1 6.1)) simply enumerates all of the permissions requested by the app while replacing permission primitives with straightforward explanations. Besi Beside des, s, it can can mere merely ly tell tell user userss that that the the app app uses uses two two sepa separa rate te perm permis issi sions ons,, READ_PHONE_STATE and INTERNET , but cannot indicate that these two permissions are used consecutively to send out phone number. The textual descriptions are not focused on security. As depicted in the example (the top part in Fig. 6.2 6.2)), developers are more interested in describing the app’s functionalities, unique features, special offers, use of contact information, etc. Prior studies [2 [2, 3] have revealed significant inconsistencies between app descriptions and permissions. We propose a new technique, D ESCRIBEM E, which addresses these shortcomings and can automatically produce complementary security-centric descriptions for apps in Android Android marke markets. ts. It is worth worth noting noting that we do not expect expect to replac replacee the developers’ descriptions with our own. Instead, we hope to provide additional app information information that is written written from a security security perspectiv perspective. e. For example, example, as shown shown in the bottom part of Fig. 6.2 6.2,, our security-sensitive descriptions are attached to the

80

6


Fig. 6.2 Old+New descriptions

existing ones. The new description states that the app retrieves the phone number and writes data to network, and therefore indicates the privacy-breaching behavior. We expect to primarily deploy D ESCRIBEME directly into the Android markets, as illustrated in Fig. 6.3 6.3.. Upon receiving an app submission from a developer, the mark market et dri drives ves our our syst system em to anal analyz yzee the the app app and and crea create te a secu securi rity ty-c -cen entr tric ic desc descri ript ptio ion. n. The genera generated ted descri descripti ptions ons are then then attach attached ed to the corres correspon pondin ding g apps apps in the marke markets. ts. As a result result,, these these new new descri descripti ptions ons,, along along with with the origin original al ones, ones, are displayed to consumers once the app is ready for purchase.

6.2.2 6.2.2

Archit Architect ecture ure Ov Overv erview iew

Figure 6.4 Figure 6.4 depicts depicts the workflow of our automated description generation. This takes the following steps: (1) Behavior Graph Generation. Our Generation. Our natural language descriptions are generated via directly interpreting program behavior graphs. To this end, we first perform

6.2 Overview

81

NLG

Security-centric Descriptions Submit

Attach

Behavior Analysis & Natural Language Generation Analysis

Developer’s App

Android App Market Fig. 6.3 Deployment of DESCRIBEME getDeviceId

{

}

getDeviceId startRecording

startRecording

{ Android App

}

sendTextMessage

{

getDeviceId

sendTextMessage

}

Beha Be havi vior or Gr Grap aph h Ge Gene nera rati tion on

{

} {

startRecording sendTextMessage

}

Subg Su bgra raph ph Mi Mini ning ng & Gr Grap aph h Co Comp mpre ress ssio ion n

{

}

Security-Centric Descriptions

Natu Na tura rall La Lang ngua uage ge Ge Gene nera rati tion on

Overview of DESCRIBEM E Fig. 6.4 Overview

stati staticc progra program m analys analyses es to extra extract ct behav behavior ior graphs graphs from from Androi Android d byteco bytecode de progra programs. ms. Our progra program m analys analyses es enable enable a condit condition ion analys analysis is to reve reveal al the triggering conditions of critical operations, provide entry point discovery to better better unders understa tand nd the API callin calling g conte contexts xts,, and leve leverag ragee both both forwar forward d and backward dataflow analyses to explore API dependencies and uncover constant parameters. The result of these analyses is expressed via Security Security Behavior Graphs that expose security-related behaviors of Android apps. (2) Subgraph Mining and Graph Due to the complexity of objectGraph Compression. Compression. Due oriented, event-driven Android programs, static program analyses may yield sizable behavior graphs which are extremely challenging challenging for graph traversal and automated interpretation. To address this problem, we next reduce the graph size using subgraph mining. More concretely, we first leverage data mining based technique to discover the frequent subgraphs that bear specific behavior patterns. Then, we compress the original graphs by substituting the identified subgraphs with single graph nodes. (3) Natural Language Generation. Finally, we utilize natural language generation technique to automatically convert the semantic-rich graphs to human understandable scripts. Given a compressed behavior graph, we traverse all of its paths and translate each graph node into a corresponding natural language sentence. To To avoid redundancy, redundancy, we perform sentence aggregation to organically

82

6


combine the produced texts of the same path, and further assemble only the distinctive descriptions among all the paths. Hence, we generate descriptive scripts for every individual behavior graph derived from an app and eventually develop the full description for the app.

6.3

6.3.1 6.3.1

Securi Security ty Beha Behavior vior Graph Graph

Formal ormal Definit Definition ion

Similar to our approach to defining API dependencies in Chap. 3, we consider four fact factor orss as esse essent ntia iall when when desc descri ribi bing ng the the secu securi rity ty-c -cen entr tric ic beha behavi vior orss of an Andr Androi oid d app app sample: (1) API (1) API call and dependencies; (2) Trigger condition; (3) Entry point; dependencies ; (2) Trigger condition ; (3) Entry point; and (4) Constant (4) Constant.. To address all of the aforementioned factors, we describe app behaviors using Security Behavior Graphs (SBG). At a high level, an SBG consists of behavioral operations where some operations have data dependencies. Definition 1. A Security Behavior Graph is a directed graph G D .V ; E ; ˛/ over a set of operations ˙ , where: • The The set of of vert vertic ices es V corresponds to the behavioral operations (i.e., APIs or behavior patterns) in ˙ ; • The se set of edg edgees E  V  V corresponds to the data dependencies between operations; • The labeli labeling ng func functio tion n ˛ W V ! ˙ associates nodes with the labels of corresponding semantic-level semantic-level operations, where each label is comprised of 4 elements: behavior name, entry point, constant parameter set and precondition list. Notice that the behavior name can be either an API prototype or a behavior pattern ID. However, when we build the SBGs using static program analysis, we only extract API-level dependency graphs (i.e., the raw SBGs). Then, we perform frequent subgraph mining to identify common behavior patterns so that we can replace the subgraphs with pattern nodes. Graph mining and compression will be discussed in Sect. 6.4 6.4..

6.3.2

SBG of

Motivating Example

Figure 6.5 6.5 presents presents an SBG of the motivating example. It shows that the app first obtains the user’s phone number ( getLine1Number() ) and service provider name (getSimOperatorName() ), then encodes the data into a format string ( format (String,byte[]) ), and finally sends the data to network ( write(byte[]) ).

6.3 Security Behavior Graph Fig. 6.5 An example SBG

83

, OnClickListener.onClick, Ø const , Set cond const cond Set cond } findViewById(View.getId)==B findViewBy Id(View.getId)==Buon(“Confirm uon(“Confirm”) ”) } cond = {

, OnClickListener.onClick, Ø const , Set cond const cond Set cond } findViewById(View.getId)==Bu findViewById (View.getId)==Buon(“Confirm on(“Confirm”) ”) } cond = {

, OnClickListener.onClick, Set const , Ø cond const cond 100/app_id=an1005/ani=%s/dest=%s/phone_number=%s/company=%s/ Set const } const = { 100/app_id=an1005/ani=%s/dest=%s/phone_number=%s/company=%s/

, OnClickListener.onClick, Ø const , Set cond const cond Set cond } findViewById(View.getId)==B findViewById (View.getId)==Buon(“Confirm uon(“Confirm”) ”) } cond = {

All APIs in this graph are called after the user has clicked a GUI component, so they share the same entry point, OnClickListener.onClick . This indicates that these API calls will be directly triggered by the user. The sensiti sensitive ve APIs, including including getLine1Number() , getSimOperatorName() and write(byte[]) , are predominated by a UI-related condition. It checks whether the clicked component is a Button object of a specific name. There exist two security implications behind this information: (1) the app is usually safe to use, without leaking the user’s phone number; (2) a user should be cautious when she is about to click this specific button, because the subsequent actions can directly cause privacy privacy leakage. The encoding encoding operation, operation, format(String,byte[]) , take takess a cons consta tant nt form format at string as the parameter. Such a string will later be used to compose the target URL, so it is an important piece of knowledge, that can be used to understand the scenario in which the privacy-related data is used.

6.3. 6.3.3 3

Grap Graph h Ge Gene nera rati tion on

We apply the same techniques illustrated in Chap. 3 to extract API data dependencies, constant parameters and entry points. dataflow analyCondition Reconstruction Reconstruction We then perform both control-flow and dataflow ses to uncover the triggering conditions of sensitive APIs. All conditions, in general, play an essential role in security analysis. However, we are only interested in certain trigger conditions for our work. This is because our goal is to generate human understandable descriptions for end users. This implies that an end user should

84

6


Algorithm 2 Condition 2 Condition Extraction for Sensitive APIs SG  Supergraph Set  null Set api api  {sensitive API statements in the SG } for api 2 Set api api do Set pred  GetConditionalPredecessors(SG,api) for pred 2 Set pred pred do defined and used in pred do for 8v ar defined do DDG  BackwardDataflowAnalysis(v ar ) Set cond cond  ExtractCondition( DDG; v ar ) Set  Set [ f< api ; Set cond cond >g end for end for end for < API ; conditions > pairs output Set as a set of <

be able to naturally evaluate the produced descriptions, including any condition information. As a result, it is pointless if we generate a condition that cannot be directly observed by a user. Conse Consequ quen entl tly y, our our anal analys ysis is is only only focu focuse sed d on thre threee majo majorr type typess of cond condit itio ions ns that that users can can directly directly observe. observe. (1) User Interface. An end user actively communicates with with the the user user inte interf rfac acee of an app, app, and and ther theref efor oree she she dire direct ctly ly notic notices es the the UIUIrelated conditions, such as a click on a specific button. (2) Device status. Similarly, a user user can can also also notic noticee the the curre current nt phon phonee stat status us,, such such as WI WIFI FI on/of on/off, f, scre screen en locked/unlocked, speakerphone on/off, etc. (3) Natural environment. A user is aware of environmental factors that can impact the device’s device’s behavior, including the current time and geolocation. The algorithm for condition extraction is presented in Algorithm 2. This algorithm accepts a supergraph SG as the input and produces Set as the output. The supergraph SG is derived from callgraph and control-flow analyses; Set is a set of < a; c > pairs, each of which is a mapping between a sensitive API and its conditions. Given the supergraph SG, our algorithm first identifies all the sensitive API statements, Set api api , on the graph. Then, it discovers the conditional predecessors Set pred (e.g., IF statement) for each API statement via GetConditionalPredecessors() . Conditional predecessor means that it is a predominator of that API statement but the API statem statement ent is not its postdo postdomin minato atorr. Intuit Intuitiv ively ely,, it means means the occurr occurrenc encee of that API statement is indeed conditional and depends on the predicate within that predecessor. Next, for every conditional statement pred in Set pred , it performs backward dataflow analysis on all the variables defined or used in its predicate. The result of BackwardDataflowAnalysis() is a data dependency graph DDG, which represents the dataflow from the variable definitions to the conditional statement. The algorithm further calls ExtractCondition() , which traverses this DDG and extracts the conditions Set cond cond for the corresponding api statement. In the end, the API/conditions pair < api; Set cond cond > is merged to output set Set .

6.3 Security Behavior Graph

85

We reiterate that ExtractCondition() only focuses on three types of conditions: user interface, device status and natural environment. It determines the condition types by examining the API calls that occur in the DDG. For instance, an API call call to findViewById() indi indica cate tess the the cond condit itio ion n is asso associ ciat ated ed with with GUI GUI comp compone onent nts. s. The APIs retrieving phone states (e.g., isWifiEnabled() , isSpeakerphoneOn() ) are clues to identify phone status related conditions. Similarly, if the DDG involves time- or location-related APIs (e.g., getHours() , getLatitude() ), the condition is corresponding to natural environment. User Interface Analysis in Android Apps We take special considerations when extracting UI-related conditions. Once we discover such a condition, we expect to know exactly which GUI component it corresponds to and what text this GUI actually displays to the users. In order to retrieve GUI information, we perform an analysis on the Android resource files for the app. Our UI resource analysis is different from prior work [ 12 12]] in two aspects. Firstly, the prior work aims at connecting texts to program entry points, whereas we associate textual resources to conditional statements. Secondly, the previous study did not consider those GUI callbacks that are registered in XML layout files. In contrast, we handle both programmatically and statically registered callbacks in order to guarantee the completeness. Figure 6.6 Figure 6.6 illustrates illustrates how we perform UI analysis. This analysis takes four steps. Firs First, t, we anal analyz yzee the the res/values/public.xml file file to retr retrie ieve ve the the mappi mapping ng betw betwee een n the GUI ID and GUI name. Then, we examine the res/values/strings.xml file to extract the string names and corresponding string values. Next, we recursively check all layout files in the res/layout/ directory to fetch the mapping of GUI type type,, GUI GUI name name and and stri string ng name name.. At last last,, all all the the infor informa mati tion on is comb combin ined ed to generate a set of 3-tuples {GUI type, GUI ID, string value}, which is queried by ExtractCondition() to resolve UI-related conditions.

res/values/public.xml Step 1

res/values/strings.xml Send binary sms (to port 8091) Step 2

res/layout/main.xml Step 3

{type=Checkbox, id name=bina name=binary, ry, string name=send_binarysms}

Step 4

3-tuple={type=Checkbox, id=0x7f050002, text=Send binary sms (to port 8091)} {GUI type, GUI ID, text}

resource analysis analysis Fig. 6.6 UI resource

{GUI type, id name, string name}

86

6


Intuitively,, we could use a constraint solver to compute prediCondition Solving Intuitively cates and extract concrete conditions. However, we argue that this technique is not suitable for our problem. Despite its accuracy, a constraint solver may sometimes genera generate te exces excessi sive vely ly sophis sophistic ticate ated d predic predicate ates. s. It is theref therefore ore extre extremel mely y hard hard to describe such complex conditions to end users in a human readable manner. As a result, we instead focus on simple conditions, such as equations or negations, because their semantics can be easily expressed using natural language. Theref Therefore ore,, once once we have have extra extracte cted d the definit definition ionss of condit condition ion varia variable bles, s, we further further analyz analyzee the equat equation ion and negat negation ion operat operation ionss to comput computee the condit condition ion predic predicate ates. s. To this this end, end, we analyz analyzee how how the varia variable bless are eval evaluat uated ed in condiconditional statements. Assume such a statement is if(h if(hou our r == 8). In its predicate record rd the the cons consta tant nt value alue 8 and and sear search ch back backwa ward rd for for the the (hou (hour r == 8), we reco definition definition of variabl variablee hour. If the value of hour is received directly from API call getHours() , we know that the condition is curr curren ent t time time is equa equal l to 8:00am . For conditions that contain negation, such as a condition like WIF WIFI I is NOT enabled enabled, we exami examine ne the compar compariso ison n operat operation ion and compar compariso ison n value value in the predic predicate ate to retrie retrieve ve the potent potential ial negat negation ion inform informati ation. on. We also also trace trace back back across the entire def-use chain of the condition variables. If there exists a negation operation, we negate the extracted condition.

6.4

Behavior Behavior Mining and Graph Graph Compres Compression sion

Static analysis sometimes results in huge behavior graphs. To address this problem, we identify higher-level behavior patterns from the raw SBGs so as to compress the raw graphs and produce more concise descriptions. Experience tells us certain APIs are typically used together to achieve particular functions. For example, SMSManager.getDefault() always happens before SMSManager.sendTextMessage() . We therefore expect to extract these behavior patterns, so that we can describe each pattern as an entirety instead of depicting every API included. To this end, we first discover the common subgraph patterns, and then compress the original raw graphs by collapsing pattern nodes. To this end, we focus on 109 security-sensitive APIs and perform “API-oriented” behavior mining on 1000 randomly-collected top Android apps. We first follow the approach of our prior work (Chap. 3) and conduct concept learning to obtain these critical APIs. Then, we construct the subset specific to each individual API. In the end, we apply subgraph mining algorithm [13 [ 13]] to each subset. Figure 6.7 Figure 6.7 exemplifies exemplifies our mining process. Specifically, Specifically, it shows that we discover a behavior pattern for the API getLastKnownLocation() . This pattern involves two other API calls, getLongitude() and getLatitude() . It demonstrates the common practice to retrieve location data in Android programs. Now that we have identified common subgraphs in the raw SBGs, we can further compress these raw graphs by replacing entire subgraphs with individual nodes. This involves two steps, subgraph isomorphism and subgraph collapse. We utilize

6.5 Description Generation

87

getLastKnownLocation()

getLastKnownLocation()

getLongitude()

getLatitude()

getL ge tLon ongi gitu tude de() ()

getAltitude()

write()

getL ge tLat atit itu ude( e())

getFromLocation()

a) raw SBG# 1

b) raw SBG# 2

getLastKnownLocation() getLongitude()

Graph Mining

getLatitude()

b) frequent paern

getLastKnownLocation() ation() Fig. 6.7 Graph mining for getLastKnownLoc

the VF2 [14 [14]] algorithm to solve the subgraph isomorphism problem. In order to maximize the graph compression rate, we always prioritize a better match (i.e., larger subgraph). To perform subgraph collapse, we first replace subgraph nodes with one single new node. Then, we merge the attributes (i.e., context, conditions and constants) of all the removed nodes, and put the merged label onto the new one.

6.5

Descri Descripti ption on Genera Generatio tion n

6.5.1 Automat Automatically ically Generated Generated Description Descriptionss Give Given n a behav behavior ior graph graph SBG, we transl translate ate its semant semantics ics into into natura naturall langua language ge descri descripti ptions ons.. This This descri descripti ptive ve langua language ge follo follows ws a subset subset of Engli English sh gramma grammarr, illust illustrat rated ed in Fig. Fig. 6.8 using using Extend Extended ed Backus Backus-Na -Naur ur form form (EBNF) (EBNF).. The description of an app is a conjunction of individual sentences. An atomic sentence makes a statement and specifies a modifier . Recursively, a non-empty atomic modifier can be an adverb clause of condition, which contains another sentence . The translation from a SBG to a textual description is then to map the graph components to the counterparts in this reduced language. To be more specific, each vertex of a graph is mapped to a single sentence , where the API or behavioral patt patter ern n is repr repres esen ente ted d by a statement ; the condit condition ions, s, conte contexts xts and consta constant nt parameter parameterss are expressed expressed using a modifier . Each edge is then translated to “ and” to indicate data dependency. One sentence may have several modifier s. This reflects the fact that one API call can be triggered in compound conditions and contexts, or a condition/context may accept several parameters. The modifiers are concatenated with “ and” or “or” in order to verbalize specific logical relations. A context modifier begins with “once” to show the temporal precedence. A condition modifier starts with either “if” or “depen dependi ding ng on if”. The former is applied when a condition is statically

88

6


Fig. 6.8 An abbreviated syntax of our descriptions

resolvable while the latter is prepared for any other conservative cases. Notice that it is always possible to find more suitable expressions for these conjunctions. In our motivating example, getLine1Number() is triggered under the condition that a specific button is selected. Due to the sophisticated internal computation, we did not extract the exact predicates. To be safe, we conservatively claim that the app retrieves the phone number depending on if the user selects Button “Confirm”.

6.5.2 6.5.2

Behavi Behavior or Descri Descripti ption on Model Model

Once we have associated a behavior graph to this grammatical structure, we further need to translate an API operation or a pattern to a proper combination of subject, verb and object. This translation is realized using our Behavior Description Model. SBG s are also translated using the same model because Conditions and contexts of SBG they are related to API calls. We manually create this description model and currently support 306 sensitive APIs APIs and and 103 103 API API patt patter erns ns.. Each Each entr entry y of this this mode modell cons consis ists ts of an API API or pattern signature and a 3-tuple of natural language words for subject, verb and object. We construct such a model by studying the Android documentation [15 15]. ]. For instance, the Android API call createFromPdu(byte[]) programmatically constructs incoming SMS messages from underlying raw Protocol Data Unit (PDU)


89

and hence it is documented as “Create an SmsMessage from a raw PDU” by Google. Our model records its API prototype and assigns texts “the app”, “retrieve” and “incoming SMS messages” to the three linguistic components respectively. These three components form a sentence template. Then, constants, concrete conditions and contexts serve as modifiers to complete the template. For example, the template of HttpClient.execute() HttpClient.execute() is represented using words “the app”, “send” and “data to network”. Suppose an app uses this API to deliver data to a constant URL “http://constant.url http://constant.url”, ”, when the phone is locked (i.e., keyguard is on). Then, such constant value and condition will be fed into the template to produce the sentence “The app sends data to network “http://constant.url “http://constant.url” ” if the phone is locked.” The condition APIs share the same model format. The API checking keyguard status (i.e., KeyguardManager.isKeyguardLocked() ) is modeled as words “the phone”, “be” and “locked”. It is noteworthy that an alternative approach is to generate this model programmatically. Sridhara et al. [ 8] proposed to automatically extract descriptive texts for APIs and produce the Software Word Usage Model. The API name, parameter type and return type are examined to extract the linguistic elements. For example, the model of createFromPdu(byte[]) may therefore contain the keywords “create”, “from” and “pdu”, all derived from the function name. Essentially, we can take the same approach. However, we argue that such a generic model was designed to assist software development and is not the best solution to our problem. An average user may not be knowledgeable enough to understand the low-level technical terms, such as “pdu”. In contrast, our text selections (i.e., “the app”, “retrieve” and “incoming SMS messages”) directly explain the behavior-level meaning. We gene genera rate te desc descri ript ptio ion n mode modell for for API API patt patter erns ns base based d on thei theirr inte intern rnal al program program logics logics.. Table able 6.1 pres presen ents ts the the thre threee majo majorr logi logics cs,, we have have disc discov ov-ered ered in beha behavi vior oral al patt patter erns ns.. (1) (1) A sing single leto ton n obje object ct is retr retrie ieve ved d for for furt furthe herr operat operation ions. s. For For examp example, le, a SmsManager.getDefault() is alwa always ys calle called d prior prior to SmsManager.sendTextMessage() beca becaus usee the the form former er fetc fetche hess the the defa defaul ultt that the latter latter needs. needs. We theref therefore ore descri describe be only only the latter latter which which SmsManager that is asso associ ciat ated ed to a more more conc concre rete te beha behavi vior or.. (2) (2) Succ Succes essi sive ve APIs APIs cons consti titu tute te a dedica dedicated ted workflo workflow w. For insta instance nce,, divideMessage() alwa always ys happen happenss before before sendMultipartTextMessage() , since the first provides the second with necessary inpu inputs ts.. In this this case case,, we stud study y the the docu docume ment nt of each each API API and and desc descri ribe be the the comp comple lete te beha behavi vior or as an enti entire rety ty.. (3) (3) Hier Hierar arch chic ical al info inform rmat atio ion n is acce access ssed ed usin using g mult multip iple le leve levels ls of APIs APIs.. For For inst instan ance ce,, to use use loca locati tion on data data,, one one has has to first first call call getLastKnownLocation() to fetch a Location object, and then call getLongitude() and getLatitude() to read the “double”-typed data from this

Table 6.1 Program logics in behavioral patterns

Program logic

How to describe

Sing Single leto ton n retr retrie iev val

Desc Descri ribe be the the latt latter er

Workflow

Describe both

Access Access to to hierarc hierarchical hical data

Describe Describe the former former

90

6


object object.. Since Since the higher higher leve levell objec objectt is alread already y meanin meaningful gful enough enough,, we hence hence describe this whole behavior according to only the former API. In fact, we only create description models for 103 patterns out of the total 109 discovered ones. Some patterns are large and complex, and therefore are hard to summarize. For these patterns, we have to fall back to the safe area and describe them in a API-by-API manner. In order to guarantee the security-sensitivity and readability of the descriptive texts, we carefully select the words to accommodate the model. To this end, we learn from the experience of prior security studies [ 2, 3] on app descriptions: (1) The selected vocabulary must be straightforward and stick to the essential API functionalities. As an counterexample, an audio recording behavior can hardly be inferred from the script “ Blow into the mic to extinguish the flame like a real candle” [2]. This is because it does not explicitly refer to the audio operation. (2) Descri Descripti ptive ve texts texts must must be distin distingui guisha shable ble for semant semantica ically lly differ different ent APIs. APIs. Otherw Otherwise ise,, poorly poorly-ch -chose osen n texts texts may confus confusee the reader readers. s. For instan instance ce,, an app turn recor ecordin dings gs into into ringto ringtones nes” in real with with descri descripti ption on “You can now turn realit ity y only only converts previously recorded files to ringtones, but can be mistakenly associated to the permission android.permission.RECORD_AUDIO due to the misleading text choice [ choice [2 2, 3 3]. ].

6.5.3 6.5.3

Behavi Behavior or Graph Graph Transla ranslatio tion n

Now that we have defined a target language and prepared a model to verbalize sensitive APIs and patterns, we further would like to translate an entire behavior graph into natural language scripts. Algorithm 3 demonstrates our graph traversal based translation. This algorithm takes a SBG G and the description model M desc desc as the inputs and eventually outputs a set of descriptions. The overall idea is to traverse the graph and translate each path. Hence, it first performs a breadth-first search and collects all the paths into Set path . Next, it examines each path in Set path to parse the nodes in sequence. Each node is then parsed to extract the node name, constants, conditions and contexts. The node name node:name (API or pattern) is used to query the model M desc desc and fetch the {subj,vb,obj} of a main clause. The constants, conditions and contexts are organized into the modifier (Cmod) of main clause, respectively. In the end, the main clause is realized by assembling {subj,vb,obj} and the aggregate modifier Cmod. The realized sentence is inserted into the output set Set desc desc if it is not a redundant redundant one.


91

Algorithm 3 Generating 3 Generating Descriptions from a SBG G  {A SBG } M desc desc  {Description model} Set desc desc  ; Set path  BFS(G) for path 2 Set path path do desc  null for node 2 path do {subj,vb,obj}  Query M desc (node :name) desc Cmod  null Set const const  GetConsts(node ) for 8const 2 Set const const do Cmod  Aggregate(Cmod,const ) end for Set cc cc  GetConditionsAndContext(node ) for 8cc 2 Set cc cc do {subj,vb,obj}cc  Query M desc (cc) desc text cc cc  RealizeSentence({subj,vb,obj}cc ) Cmod  Aggregate(Cmod,text cc cc ) end for text  RealizeSentence({subj,vb,obj,Cmod}) desc  Aggregate( desc; text ) end for Set desc desc  Set desc desc [ fdescg end for output Set desc desc as the generated description set

6.5.4 6.5.4

Motiva Motivatin ting g Exampl Examplee

We have implemented the natural language generation using a NLG engine [10 [ 10]] in 3 K LOC. Figur Figuree 6.9 6.9 illustrates illustrates how we step-by-step generate descriptions for the motivating motivating example. First, we discover two paths in the SBG : (1) getLine1Number() ! format() ! write() and (2) getSimOperatorName() ! format() ! write() . Next, we describe every node sequentially on each path. For example, for the first node, the API getLine1Number() is modeled by the 3-tuple {“the app”, “retrieve”, “your phone number”}; the entry point OnClickListener.onClick is translated using its model {“a GUI component”, “be”, “clicked”} and is preceded by “Once” ; the condition findViewById(View.getId)==Button(“Confirm”) is described using the template {“the user”, “select”, “ ”}, which accepts the GUI name, Button “Confirm”, as a parameter. The condition and main clause are connected using “depending on if”. At last last,, we aggr aggreg egat atee the the sent senten ence cess deri derive ved d from from indi indivi vidu dual al node nodes. s. In this this example, all the nodes share the same entry point. Thus, we only keep one copy “Once a GUI component is clicked”. Similarly, the statements on the nodes are of “Once

92

6


Behavior Graph

Natural Language Generation

API prototype ry

o in in t

Once “a Once “a GUI component”, “be”, “clicked”

, OnClickListener.onClick, Ø const const , Set cond cond Set cond = { findViewById(View.g } cond = findViewById(View.getId)==Button(“Con etId)==Button(“Confirm”) firm”) }

“The app”, “ retrieve ” retrieve”, “ your your phone number ” depending on if “the “the user”, “select”, “the Button ``Confirm’’ “

Conditions

, OnClickListener.onClick, Ø const const , Set cond cond cond = Set cond = { findViewById(View.getId)==Button(“Con findViewById(View.getId)==Button(“Confirm”) firm”) } }

, OnClickListener.onClick, Set const const , Ø cond cond

Translate using model

e t a g e r g g A

e t a g e r g g A

e t a g e r g g A


Set const } 00/app_id=an1005/ani=%s/dest=%s/pho id=an1005/ani=%s/dest=%s/phone_number=%s/company= ne_number=%s/company=%s/ %s/ const = { 1 00/app_

“The app”, “ encode ” encode”, “ the the data into format ” “100/app_id=an1005/ani=%s/dest=%s 100/app_id=an1005/ani=%s/dest=%s/phone_number=%s/company /phone_number=%s/company=%s/ =%s/ “


, const , Set cond cond OnClickListener.onClick, Ø const

“The app”, “ send ”, “ data ” send ”, data to network ” depending on if “the “the user”, “select”, “the Button ``Confirm’’ “

Set cond = { findViewById(View.get findViewById(View.getId)==Button(“Conf Id)==Button(“Confirm”) irm”) } } cond =

Realize Sentenc Sentence e

Description: Once a GUI component is clicked, the app retrieves retriev es you phone number, and encodes the data into format “100/app_id=an1005/ani=%s/ “100/app_id=an1005/ani=%s/ dest =%s/phone_number=%s/company=%s/ =%s/phone_number=%s/company=%s/ ”, ”, and sends data to network, depending depending on if the user selects the Button “Confirm”.

Once a GUI Once a componentt is componen clicked Finalize

The app retrieves your phone number, 00/app_id=a /app_id=an1 n1 00 005/ 5/ and encodes the data into format “ 1 00 ani=%s/dest=%s/phone_number=%s/company=%s/ ”, and sends data to network depending on if the the user selects the Button ``Confirm’’

motivating example example Fig. 6.9 Description generation for the motivating

also aggregated and thus share the same subject “The app”. We also aggregate the conditions in order to avoid the redundancy. As a result, we obtain the description illustrated at the bottom left of Fig. 6.9 6.9..

6.6 6.6

Eval Evalua uati tion on

6.6.1 Correctnes Correctnesss and Security-A Security-Aware wareness ness To evalu evaluate ate the correc correctne tness, ss, we produce produce textu textual al descri descripti ptions ons for Correctness To DroidBench apps (version 1.1) [11 [ 11]. ]. DroidBench apps are designed to assess the accuracy of static analyses on Android programs. We use these apps as the ground truths because they are open-sourced open-sourced programs programs with clear semantic semantics. s. Table 6.2 presents the experimental results, which show that D ESCRIBEM E achieves a true positiv positivee rate of 85 %. DESCRIBEME misses misses behav behavior ior descri descripti ptions ons due to three three major major reason reasons. s. (1) Points-to analysis lacks accuracy. We rely on Soot’s capability to perform points-to analysis. However, it is not precise enough to handle the instance fields accessed in callback functions. (2) D ESCRIBEME does not process exception exception handler code and therefore loses track of its dataflow. (3) Some reflective calls cannot be statically resolved. Thus, D ESCRIBEME fails to extract their semantics.

6.6 Evaluation Table 6.2 Description generation results for DroidBench

93

Total otal #

Corr Correc ectt

Miss Missin ing g desc desc..

Fals Falsee stat statem emen entt

65

55

6

4

DESCRIBEME produces false statements mainly because of two reasons. First, our static analysis is not sensitive to individual array elements. Thus, it generates false descriptions for the apps that intentionally manipulate data in array . Second, again, our points-to analysis is not accurate and may lead to over-approximation. Despite the incorrect cases, the accuracy of our static analysis is still comparable to that of FlowDroid [16 [ 16], ], which is the state-of-the-art static analysis technique for Android apps. Moreover, it is noteworthy that the accuracy of program analysis is not the major focus of this work. Our main contribution lies in the fact that, we combine static analysis with natural language generation so that we can automatically explain program behaviors to end users in human language. To this end, we can in fact apply any advanced analysis tools (e.g., FlowDroid, AmanDroid [ 17 17], ], etc.) to serve our needs. security-awareness of D ESCRIBEME, we Permission Permission Fidelity To demonstrate the security-awareness use a description vetting tool, AutoCog [ 3], to evaluate the “permission-fidelity” of descriptions. AutoCog examines the descriptions and permissions of an app to discover their discrepancies. We use it to analyze both the original descriptions and the security-centric ones produced by D ESCRIBEME, and assess whether our descriptions can be associated to more permissions that are actually requested. Unfortunately, AutoCog only supports 11 permissions in its current implementation. In particular, it does not handle some crucial permissions that are related to informatio information n stealing stealing (e.g., (e.g., phone number, number, device device identifier identifier,, service service provider provider,, etc.), sending and receiving text messages, network I/O and critical system-level behaviors (e.g., KILL_BACKGROUND_PROCESSES ). The limitation of AutoCog in fact brings difficulties to our evaluation: if generated descriptions are associated to these unsupported permissions, AutoCog fails to recognize them and thus cannot conduct equitable assessment. Such a shortcoming is also shared by another NLP-based (i.e., natural language processing) vetting tool, WHYPER [ 2], which focuses on even fewer (3) permissions. This implies that it is a major challenge for NLP-based approaches to achieve high permission coverage, probably because it is hard to correlate texts to semantically obscure permissions (e.g., READ_PHONE_STATE ). In contrast, our approach does not suffer from this limitation because API calls are clearly associated to permissions [ 18 18]. ]. Despit Despitee the difficu difficult lties ies,, we manage manage to collec collectt 30 benign benign apps apps from from Google Google play and 20 malware samples from Malware Genome Project [19 19], ], whose permissions are supported by AutoCog. We run D ESCRIBEME to create the securitycentric descriptions and present both the original and generated ones to AutoCog. Howe Howeve verr, we notice notice that that AutoCog AutoCog someti sometimes mes cannot cannot recogn recognize ize certai certain n words words that that have have strong strong securi security ty implic implicati ations ons.. For examp example, le, D ESCRIBEME uses uses “geographic graphic location location” to desc descri ribe be the the perm permis issi sion onss ACCESS_COARSE_LOCATION and ACCESS_FINE_LOC ACCESS_FINE_LOCATION ATION . Yet, AutoCog cannot associate this phrase to any of the permissions.

94

6


Described Permissions Orig.Desc.

New Desc.

Permission List

8

s n 6 o i s s i m r e P 4 f o r e b m u 2 N

0 APPS

Fig. 6.10 Permissions reflected in descriptions

The fundamental reason is that AutoCog and D ESCRIBEME use different glossaries. saries. AutoCog performs performs machine learning learning on a particula particularr set of apps and extracts extracts the permission-related glossary from these existing descriptions. In contrast, We manually select descriptive words for each sensitive API, using domain knowledge. To bridge bridge this this gap, gap, we enhanc enhancee AutoCog AutoCog to recogn recognize ize the manual manually ly chosen chosen keywords. The experimental result is illustrated in Fig. 6.10 6.10,, where X-axis represents the evaluated apps and Y-axis is the amount of permissions. The three bars, from top to bottom, represent the amounts of permissions that are requested by the apps, recognized by AutoCog from security-centric descriptions and identified from original descriptions, respectively. Cumulatively, 118 permissions are requested by these 50 apps. Twenty permissions are discovered from the old descriptions, while 66 are uncovered from our scripts. This reveals that D ESCRIBEM E can produce descriptions that are more security-sensitive than the original ones. DESCRIBEME fails to describe certain permission requests due to three reasons. First, some permissions are used for native code or reflections that cannot be reso resolv lved ed.. Seco Second nd,, a few few perm permis issi sion onss are are not asso associ ciat ated ed to API API call callss (e.g (e.g., ., RECEIVE_BOOT_COMPLETED ), and thus are not included into the SBGs. Last, some permis permissio sions ns are correl correlate ated d to certa certain in API parame parameter ters. s. For instan instance, ce, the query API requires permission READ_CONTACTS only if the target URI is the Contacts database. Thus, if the parameter value cannot be extracted statically, statically, such a behavior will not be described.

6.6 Evaluation

95

6.6.2 Readability Readability and Effectiv Effectiveness eness To evaluate the readability and effectiveness of generated descriptions, we perform a user study on the Amazon’s Mechanical Turk (MTurk) [ 20 20]] platform. The goal is two-fold. First, we hope to know whether the generated scripts are readable to average audience. Second, we expect to see whether our descriptions can actually help users avoid risky apps. To this end, we follow Felt et al.’s approach [21 21], ], which also designs experiments to understand the impact of text-based protection mechanisms. descriptions for Android apps using Methodology We produce the security-centric descriptions DESCRIBEME and and meas measur uree user user reac reacti tion on to the the old old desc descri ript ptio ions ns (Cond (Condit itio ion n 1.1, 1.1, 2.1– 2.1– 2.3), machine-generated ones (Condition 2.1) and the new descriptions (Condition 2.4–2.6). Notice that the new description is the old one plus the generated one. efficiency y consideration, consideration, we perform the user study based on Dataset Due to the efficienc the descriptions of 100 apps. We choose these 100 apps in a mostly random manner but we also consider the distribution of app behaviors. In particular, 40 apps are malware and the others are benign. We manually inspect the 60 benign ones and further put them into two categories: 16 privacy-breaching apps and 44 completely clean ones. Participants Recruitment We recruit participants directly from MTurk and we require participants to be smartphone users. Besides, we ask screening questions to make sure participants understand basic smartphone terms, such as “Contacts” or “GPS location”. Hypotheses and Conditions Hypothesis 1: Machine-generated descriptions are readable to average smartphone users. To assess the readability, we prepare both the old descri descripti ptions ons (Condi (Conditio tion n 1.1) 1.1) and genera generated ted ones ones (Condi (Conditio tion n 1.2) 1.2) of the same same apps. apps. We would would like like to evalua valuate te machin machine-g e-gene enerat rated ed descri descripti ptive ve texts texts via comparison. Hypothesis 2: Security-centric Security-centric descriptions can help reduce the downloading of risky apps. To test the impact of the security-centric descriptions, we present both the old and new (i.e., old C generated) descriptions for malware (Condition 2.1 and 2.4), benign apps that leak privacy (Condition 2.2 and 2.5) and benign apps without privacy violations (Condition 2.3 and 2.6). We expect to assess the app download rates on different conditions. Study Deployment We post all the descriptions on MTurk and anonymize their sourc sources es.. We info inform rm the the part partic icip ipan ants ts that that the the task taskss are are abou aboutt Andr Androi oid d app app desc descri ript ptio ions ns and we pay 0.3 dollars for each task. Participants are asked to take part in two sets of experiments. First, they are given a random mixture of original and machine-generated descriptions, and are asked to provide a rating for each script with respect to its readability. The rating is ranged from 1 to 5, where 1 means completely unreadable and 5 means highly readable.

96

6


Readability Comparison C2:New Desc. (count)

C1:Old Desc. (count)

30

22.5

15

7.5

0 0. 9

1. 2

1. 5

1. 8

2. 1

2. 4

2. 7

3.

3. 3

3. 6

3. 9

4. 2

4. 5

4. 8

5. 1

Fig. 6.11 Readability ratings

Second, we present the participants another random sequence of descriptions. Such a sequence contains both the old and new descriptions for the same apps. Again, Again, we stress stress that the new description is the old one plus the generated generated one. one. Then, we ask participants the following question: “ Will you download an app based on the given description and the security concern it may bring to you?”. We emphasize “security concern” here and we hope participants should not accept or reject an app due to the considerations (e.g., functionalities, personal interests) other than security risks. Results and Implications Eventually, we receive 573 responses and a total of 2865 rating ratings. s. Figure Figure 6.11 shows shows the distri distribu butio tion n of readab readabili ility ty rating ratingss of 100 apps for Condition 1.1 and 1.2. For our automatically created descriptions, the avera average ge readab readabil ility ity rating rating is 3.596 3.596 while while over over 80 % reader readerss give give a rating rating higher higher than 3. As a comparison, the average rating of the original ones is 3.788. This indicates our description is readable, even compared to texts created by human developers. The figure also reveals that the readability of human descriptions are relatively stable while machine-generated ones sometimes bear low ratings. In a furth further er inve invest stig igat atio ion, n, we noti notice ce that that our our desc descri ript ptio ions ns with with low low rati rating ngss usua usuall lly y incl includ udee relatively technical terms (e.g., subscriber ID) or lengthy constant string parameters. We believe that this can be further improved during an interactive process. User feedbacks and expert knowledge can help us find out more smooth and user-friendly words and expressions to construct our descriptions. We leave this improvement as a future work.

References Table 6.3 App download rates (ADR)

97

#

Condition

ADR

2.1

Mal Malware are w/ old old desc desc..

63.4% 63.4%

2.2

Lea Leakage kage w/ old desc desc..

80.0% 80.0%

2.3

Clean w/ old desc.

71.1 %

2.4 2.4

Malw Malwar aree w/ new new desc desc..

24.7 24.7 %

2.5 2.5

Leak Leakag agee w/ new new desc desc..

28.2 28.2 %

2.6

Clean w/ new desc.

59.3 %

Table 6.3 depicts experimental results for Condition 2.1–2.6. It demonstrates the the secu securi rity ty impa impact ct of our our new new desc descri ript ptio ions ns.. We can can see see a 38.7 38.7 % decr decrea ease se of application download rate (ADR) for malware, when the new descriptions instead of old ones are presented to the participants. We believe that this is because malware authors deliberately provide fake descriptions to avoid alerting victims, while our descriptions can inform users of the real risks. Similar results are also observed for privacy-breaching benign apps, whose original descriptions are not focused on the security and privacy aspects. On the contrary, our descriptions have much less impact on the ADR of clean apps. Nevertheless, they still raise false alarms for 11.8 % participa participants. nts. We We notice notice that these false alarms result from descriptions descriptions of legitimate but sensitive functionalities, such as accessing and sending location data in social apps. A possible solution to this problem is to leverage the “peer voting” mechanism from prior work [22 [22]] to identify and thus avoid documenting the typical benign app behaviors.

References 1. Felt AP, Ha E, Egelman S, Haney A, Chin E, Wagner D (2012) Android permissions: user attention, comprehension, and behavior. In: Proceedings of the eighth symposium on usable privacy and security (SOUPS’12), 2012 2. Pandita R, Xiao X, Yang W, Enck W, Xie T (2013) WHYPER: towards automating risk assess assessmen mentt of mobile mobile applic applicati ations ons.. In: Procee Proceedin dings gs of the 22nd 22nd USENIX USENIX confer conferenc encee on security, August 2013 3. Qu Z, Rastogi V, Zhang X, Chen Y, Y, Zhu T, Chen Z (2014) Autocog: measuring the descriptionto-permissi to-permission on fidelity fidelity in Android Android applicatio applications. ns. In: Proceeding Proceedingss of the 21st conference conference on computer and communications security (CCS), 2014 4. Sridhara G, Hill E, Muppaneni D, Pollock L, Vijay-Shanker K (2010) Towards automatically generating summary comments for Java methods. In: Proceedings of the IEEE/ACM international conference on automated software engineering (ASE’10), 2010 5. Sridhara G, Pollock L, Vijay-Shanker K (2011) Generating parameter comments and integrating with method method summaries. summaries. In: Proceeding Proceedingss of the 2011 IEEE 19th internatio international nal conference conference on program comprehension (ICPC’11), 2011 6. Moreno L, Aponte J, Sridhara G, Marcus A, Pollock L, Vijay-Shanker K (2013) Automatic generation of natural language summaries for Java classes. In: Proceedings of the 2013 IEEE 21th international conference on program comprehension (ICPC’13), 2013 7. Buse RP, RP, Weimer WR (2010) Automatically Automatically documenting program program changes. changes. In: Proceedings Proceedings of the IEEE/ACM international conference on automated software engineering (ASE’10), 2010

98

6


8. Sridhara G, Pollock L, Vijay-Shanker K (2011) Automatically detecting and describing high level actions within methods. In: Proceedings of the 33rd international conference on software engineering (ICSE’11), 2011 9. Soot: A Java Optimization Framework (2016) http://www http://www.sable.mcgill.ca/soot .sable.mcgill.ca/soot 10. SimpleNLG: Java API for Natural Language Generation (2016) https://code.google.com/p/ simplenlg/ 11. Droidbench-benchmarks (2016) http://sseblog.ec-spride.de/tools/droidbench http://sseblog.ec-spride.de/tools/droidbench/ / 12. Huang J, Zhang X, Tan L, Wang P, Liang B (2014) AsDroid: detecting stealthy behaviors in Android applications by user interface and program behavior contradiction. In: Proceedings of the 36th international conference on software engineering (ICSE’14), 2014 13. Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceedings of IEEE international conference on data mining(ICDM’03), 2002 14. Cordella LP, Foggia P, Sansone C, Vento M (2004) A (Sub) graph isomorphism algorithm for matching large graphs. In: IEEE transactions on pattern analysis and machine intelligence, vol 26(10), 2004, pp 1367–1372 15. Reference - Android Developers (2016) http://dev http://developer eloper.android.com/refere .android.com/reference/packages.html nce/packages.html 16. Arzt S, Rasthofer S, Fritz C, Bodden E, Bartel A, Klein J, Traon YL, Octeau D, McDaniel P (2014) (2014) FlowDroid FlowDroid:: precise precise context, context, flow, flow, field, object-sen object-sensitiv sitivee and lifecyclelifecycle-awa aware re taint analys analysis is for Androi Android d apps. apps. In: Procee Proceedin dings gs of the 35th 35th ACM SIGPLA SIGPLAN N confer conferenc encee on programming language design and implementation (PLDI ’14), June 2014 17. Wei F, Roy S, Ou X, Robby X (2014) Amandroid: a precise and general inter-component data flow analysis framework for security vetting of Android apps. In: Proceedings of the 21th ACM conference on computer and communications security (CCS’14), Scottsdale, AZ, November 2014 18. Au KWY, Zhou Zhou YF, YF, Huang Huang Z, Lie D (2012) (2012) PScout PScout:: analyz analyzing ing the Androi Android d permis permissio sion n specification. In: Proceedings of the 2012 ACM conference on computer and communications security (CCS’12), October 2012 19. Android Malware Genome Project (2012) http://www http://www.malgenomeproject.o .malgenomeproject.org/ rg/ 20. Amazon Mechanical Turk (2016) https://www https://www.mturk.com/mturk/welco .mturk.com/mturk/welcome me 21. Felt AP, Reeder RW, Almuhimedi H, Consolvo S (2014) Experimenting at scale with google chrome chrome’’s SSL warnin warning. g. In: Procee Proceedin dings gs of the SIGCHI SIGCHI confer conferenc encee on human human factor factorss in computing systems, 2014 22. Lu K, Li Z, Kemerlis V, Wu Z, Lu L, Zheng C, Qian Z, Lee W, Jiang G (2015) Checking more and alerting less: detecting privacy leakages via enhanced data-flow analysis and peer voting. In: Proceedings of the 22th annual network and distributed system security symposium (NDSS’15), 2015

Chapter 7

Limitation and Future Work

chapter, we discuss the limitation limitation of our work and propose further Abstract In this chapter, improvement as future work.

7.1

Andro Android id Malwar Malwaree Classifi Classificat cation ion

perform m static static analys analysis is on Dalvik Dalvik Native Native Code Code and HTML5-ba HTML5-based sed Apps Apps We perfor bytecode to generate the behavior graphs. In general, bytecode-level static program analysis cannot handle native code or HTML5-based applications. This is because neit neithe herr the the ARM ARM bina binary ry runn runnin ing g on the the unde underl rlyi ying ng Linu Linux x nor nor the the Jav JavaScr aScrip iptt code execute executed d in WebVie ebView w are visible visible from a bytecode bytecode perspecti perspective. ve. Therefore, Therefore, an alternative mechanism is necessary to defeat malware hidden from the Dalvik bytecode. Learning ing-ba -based sed detect detection ion is subjec subjectt to poison poisoning ing attack attacks. s. To confus confusee Evasion Learn a trai traini ning ng syst system em,, an adve advers rsar ary y can can pois poison on the the beni benign gn data datase sett by intr introd oduc ucin ing g clean apps bearing malicious features. For example, she can inject harmless code intensively making sensitive API calls that are rarely observed in clean apps. Once such samples are accepted by the benign dataset, these APIs are therefore no longer the distinctive features to detect related malware instances. However, our detectors are slight slightly ly differ different ent from from prior prior works. works. First First of all, all, the featur features es are associ associat ated ed with behavior graphs, rather than individual APIs. Therefore, it is much harder for an attacker to engineer confusing samples at the behavioral-level. Second, our anomaly detection serves as a sanitizer for new benign samples. Any abnormal behavior will be detected, and the developer is requested to provide justifications for the anomalies. On the other hand, in theory, it is possible for adversaries to launch mimicry attacks and embed malicious code into seemingly benign graphs to evade our detection mechanism. This, by itself, is an interesting research topic and deserves serious consideration. Nevertheless, we note that it is non-trivial to evade detections based upon high-level program semantics, and automating such evasion attacks does not appear to be an easy task. In contrast, existing low-level transformation attacks can be easily automated to generate many malware variants to bypass the AV scanners. DroidSIFT certainly defeats such evasion attempts.


99

100

7.2

7


Automated utomated Vulnerabilit ulnerability y Patching Patching

soundness of our approach results results from that Soundness of Patch Generation The soundness of slice computation, patch statement placement and patch optimizations. (1) We perform standard static dataflow analysis to compute taint slices. Static analysis, especially on event-driven, object-oriented and asynchronous programs, is known to introd introduce uce false false positi positive ves. s. Howe Howeve verr, such such false false positi positive vess can can be verifi verified ed and mitigated during runtime, with our devised shadowing mechanism and inserted patch code. (2) Our patch statement placement follows the standard taint tracking techniques, which may introduce imprecision. Specifically, our taint policy follows that of TaintDroid. While effective and efficient in a sense, this multi-level tainting is not perfectly accurate in some cases. For instance, one entire file is associated with a single taint. Thus, once a tainted object is saved to a file, the whole file becomes tainted causing over-tainting. Other aggregate structures, such as array, share the same limitation. It is worth noting that improvement of tainting precision is possible. More complex shadowing mechanism (e.g., shadow file, shadow array, etc.) can be devised to precisely track taint propagation in aggregations. However, these these mechan mechanism ismss are more more expen expensi sive ve consid consideri ering ng runtime runtime cost cost and memory memory consumption. (3) Our optimizations take the same algorithms used in compilers, such such as consta constant nt propag propagati ation, on, dead dead code code elimin eliminati ation. on. Thus, Thus, by design design,, origin original al program semantics is still preserved when patch optimization is applied. (4) In spite of the fact that our approach may cause false positives in theory, we did not observe such cases in practice. Most vulnerable apps do not exercise sophisticated data transfer for Intent propagation, and thus it is safe to patch them with our technique. patch state statemen mentt Con Conversi version on Betwe Between en Dalv Dalvik ik Bytec Bytecode ode and Jimple Jimple IR Our patch placem placement ent and optimi optimizat zation ionss are performed performed at Jimple Jimple IR leve level. l. So we need need to convert Dalvik bytecode program into Jimple IR, and after patching, back to Dalvik bytecode. We use dex2jar [1 [ 1] to translate Dalvik bytecode into Java bytecode and then use Soot [2 [ 2] to lift Java bytecode to Jimple IR. This translation process is not always successful. Occasionally we encountered that some applications could not be converted. Enck et al. [3] pointed out several challenges in converting Dalvik byteco bytecode de into into Java Java source source code, code, includ including ing ambigu ambiguous ous cases cases for type type infere inference nce,, different layouts of constant pool, sophisticated conversion from register-based to stack-based virtual machine and handling unique structures (e.g., try/finally block) in Java Java.. In compar compariso ison, n, our conve conversi rsion on faces faces the same, same, if not less, less, challe challenge nges, s, because we do not need to lift all the way up to Java source code. We consider these problems to be mainly implementation errors. Indeed, we have identified a few cases that Soot performs overly strict constraint checking. After we patched Soot, the translation problems are greatly reduced. We expect that the conversion failures can be effectively fixed over time. A complementary implementation option is to engineer a Dalvik bytecode analysis and instrumentation framework, so that operations are directly applied on Dalvik bytecode. Since it avoids conversions between different tools, it could introduce minimal conflicts and failures.

7.3 Context-Aware Privacy Protection

101

Fully Automatic utomatic Defense Defense For most vulnerable samples in our experiment, we are able able to manual manually ly verif verify y the compon component ent hijack hijacking ing vulner vulnerabi abilit lities ies.. Howe Howeve verr, due to the object-oriented nature of Android programs, computed taint slices can sometimes become rather huge and sophisticated. Consequently, we were not able to confirm confirm the explo exploita itable ble paths paths for some some vulner vulnerabl ablee apps apps with with human human effor effort, t, and thus could not reproduce the expected attack. Developers are faced with the same, if not more, challenges, and thus fail to come up with a solution in time. Devising a fully automated mechanism is therefore essential to defend this specific complicated vulnerability. In principle, our automatic patching approach can still protect these unconfirmed cases, without knowing the real presence of potential vulnerability. That is to say if a vulnerability does exist, AppSealer will disable the actual exploitation on the fly. Otherwise, AppSealer does not interrupt the program execution and thus does not affect usability. With automated patching, users do not have to wait until developers fix the problem.

7.3

Context-A Context-Awar waree Privacy Privacy Protectio Protection n

analysis, code instrumentation, instrumentation, Soundness of Our Bytecode Rewriting Our static analysis, and optimizations follow the standard program analysis and compiler techniques, which have been proven to be correct in the single threading context. In the multithreading context, our shadow variables for local variables and function parameters are still safe because they are local to each individual thread, while the soundness of shadow fields depends on whether race condition vulnerability is present in original bytecode programs. In other words, if the accesses to static or instance fields are properly guarded to avoid race condition in the original app, the corresponding operations on shadow fields are also guarded because they are placed in the same code block. However, if the original app does have race condition on certain static or instance fields, the information flow tracking on these fields may be out of sync. We mode modele led d Andr Androi oid d APIs APIs for for both both anal analys ysis is and and inst instru rume ment ntat atio ion. n. We manu manual ally ly gene genera rate te dedicated taint propagation rules for frequently used APIs and those of significant importance (e.g., security sensitive APIs). Besides, we have general default rules for the rest. It is well-recognized that it is a non-trivial task to build a fairly complete API model, and it is also true that higher coverage of API model may improve the soundness of our rewriting. However, previous study [ 4] shows that a model of approximat approximately ely 1000 APIs can already already cover 90 % of calls calls in over over 90,000 Android applications. In addition, it is also possible to automatically create a better API model by analyzing and understanding Android framework, and we leave it as our future work. sensitive information can propagate Tracking Implicit Flow It is well known that sensitive in other channels than direct data flow, such as control flow and timing channels. It is extremely challenging to detect and keep track of all these channels. In this work, we do not consider keeping track of implicit flow. This means that a dedicated

102

7


malicious Android developer is able to evade Capper. This limitation is also shared by other solutions based on taint analysis, such as TaintDroid [ 5] and AppFence [ AppFence [6 6]. Serious research in this problem is needed and is complementary to our work. [7] shows that many Android applications make use of Java Reflection A study [7 Java Java reflectio reflection n to call call undocu undocumen mented ted methods. methods. While While in 88.3 88.3 % cases, cases, the class class names and method names of these reflective calls can be statically resolved, the rest can still cause problems. In our experiment, we seldom encounter this situation, because even though some apps indeed use reflective calls, they are rarely located within the taint propagation slices. That is, these reflective calls in general are not involved in privacy leakage. We could use a conservative function summary, such that all output parameters and the return value are tainted if any of input parameter is tainted, but it might be too conservative. conservative. A more elegant solution might be b e to capture the class name and the method name at runtime and redirect to the corresponding function summary, which enforces more precise propagation logic. We leave this as our future work. native comNative Components Android applications sometimes need auxiliary native ponent ponentss to functi function, on, while while,, unfort unfortuna unatel tely y, static static byteco bytecodede-le leve vell analys analysis is is not capable of keeping track of information flow within JNI calls. However, many apps in fact use common native components, which originate from reliable resources and are of widely widely recogn recognize ized d dataflo dataflow w behav behavior ior.. Thus, Thus, it is possib possible le to model model these known components with offline knowledge. In other words, we could build a database for well-known native libraries and create proper function summaries for JNI calls, and therefore exercise static data propagation through native calls with the help of such summaries.

7.4

Automated utomated Generatio Generation n of of Securi Security-Cen ty-Centric tric Description Descriptionss

The correctness and accuracy of generated description are largely affected by that of static program analysis. Static dataflow analysis is conservative and may cause over-approximation. We believe more advanced static analysis techniques can help reduce false statements of generated descriptions. Besides, in order not to mislead end users, D ESCRIBEM E attempts to stay on the safe side and does not over-claim its findings. Static bytecode-level program analysis may also yield false negatives due to the presence of Java reflections, native code, dynamically loaded classes or JavaScript/ HTML5-based code. Fundamentally, we need whole-system dynamic analysis (e.g., DroidScope [8]) to addr addres esss runt runtim imee beha behavi vior ors. s. Neve Nevert rthe hele less ss,, from from stat static ic anal analys ysis is,, we can observe the special APIs (e.g., java.lang.reflect.Method.invoke() ) that trigger those statically unresolvable code. Further, D ESCRIBEME can retrieve and describe the dependencies between these special APIs and other explicit and critical ones. Such dependenci dependencies es can help assess the risks of former. former. Even Even if there exists exists no such dependencies dependencies,, D ESCRIBEM E can still directly document the occurrence of

References

103

special APIs. The prevalence of unresolved operations and the drastic discrepancy between our description and permission requests are clues for end users to raise alert.

References 1. dex2jar (2016) http://code.google.c http://code.google.com/p/dex2jar/ om/p/dex2jar/ 2. Soot: A Java Optimization Framework (2016) http://www http://www.sable.mcgill.ca/soot/ .sable.mcgill.ca/soot/ 3. Enck W, Octeau D, McDaniel P, Chaudhuri S (2011) A study of Android application security. In: Proceedings of the 20th usenix security symposium, August 2011 4. Chen KZ, Johnson N, D’Silva V, Dai S, MacNamara K, Magrino T, Wu EX, Rinard M, Song D (2013) Contextual policy enforcement in Android applications with permission event graphs. In: Proceedings of the 20th annual network and distributed system security symposium (NDSS’13), February 2013 5. Enck W, Gilbert P, Chun BG, Cox LP, Jung J, McDaniel P, Sheth AN (2010) TaintDroid: an inform informati ationon-flo flow w tracki tracking ng system system for realtim realtimee priva privacy cy monito monitorin ring g on smartp smartphon hones. es. In: Proceeding Proceedingss of the 9th USENIX USENIX symposium symposium on operating operating systems design design and implementation implementation (OSDI’10), October 2010 6. Hornyack P, Han S, Jung J, Schechter S, Wetherall D (2011) These aren’t the droids you’re looking for: retrofitting Android to protect data from imperious applications. In: Proceedings of CCS, 2011 7. Felt AP, Chin E, Hanna S, Song D, Wagner D (2011) Android permissions demystified. In: Proceedings of CCS, 2011 8. Yan LK, Yin H (2012) (2012) DroidS DroidScop cope: e: seamle seamlessl ssly y recons reconstru tructi cting ng OS and Dalvik Dalvik semant semantic ic views views for dynamic Android malware analysis. In: Proceedings of the 21st USENIX security symposium, August 2012

Chapter 8

Conclusion

chapter, we conclude our work and summarize this book. Abstract In this chapter,

To battle various security threats in Android applications, we propose a semantics and conte contextxt-aw aware are approa approach. ch. We argue argue that that such such an approa approach ch impr improv oves es the effec effecti tive ve-ness Android malware detection and privacy preservation, ameliorates the usability of security-related app descriptions and can solve complex software vulnerabilities in mobile apps. Our argument has been validated via the design, implementation and evaluation of a series of security enhancement techniques. DroidSIFT demonstrated that semantics-aware Android malware classification not only achieves high detection rate and low false positive and negative rates, but also defeats polymorphic and zero-day malware. Moreover, it is resilient to bytecode-l bytecode-lev evel el transforma transformation tion attacks attacks and outperforms outperforms all the existing existing antivirus antivirus detectors with respect to the detection of obfuscated malware. AppSealer showed that with the analysis of program semantics in advance, the patch of complex complex applicati application on vulnerabil vulnerabilitie itiess can be automatic automatically ally generated generated.. In addition, static program analysis facilitates a selective patch code instrumentation and therefore improves the runtime performance of patched programs. Capper Capper illus illustra trate ted d that that a conte contextxt-aw aware are priva privacy cy polic policy y can effec effecti tive vely ly differ different entiat iatee legit legitima imate te use of priva private te user user data data from from real real priva privacy cy leakag leakage, e, becaus becausee program program context can faithfully reflect the true intention of critical operations. DESCRIBEME showed that natural language app descriptions, of better readability and higher security sensitivity, are created via program analysis and comprehension of applicati application on semantics semantics.. Automatic Automatically ally produced descripti descriptions ons can help users avoid malware and privacy-breaching apps.


105

Android Application Security A Semantics and Context-Aware Approach.pdf

Recommend Documents