TaintDroid lecture notes (Enck et al., OSDI 2010)

the threat:
- smartphones widely used
- many sources of sensitive personal data:
  - microphone input
  - contacts database
  - accelerometer input
  - GPS input
  - camera input
- typical smartphone application security model:
  - downloaded app is binary code (no source code available)
  - at installation time, OS asks "allow this app to access the [device here]?" (e.g., GPS)
  - perhaps at application data use time, OS may ask (quickly exceeds users' annoyance tolerance; users begin to always "allow")
  - application may talk to the network, to other apps within phone
  - perhaps some isolation within filesystem (e.g., jail, unique user IDs per app; break sharing, but provide isolation)
- what might maliciously written apps do with sensitive information--once you say "allow access to location," little to no control over how app uses that information

today's topic, TaintDroid:
- extensions to Android mobile phone OS that:
  - allow sensitive information to be *tagged* as such (and in a way that indicates the source of the information)
  - instrument the Dalvik VM (register-based virtual machine that interprets DEX bytecode, compiled from Java) to propagate tags on data as an interpreted DEX app runs
  - note when data tagged as sensitive are about to exit the system (i.e., be written to network) and announce that sensitive data have been disclosed (i.e., to warn user or app tester)
  - propagate tags over RPC "parcels" sent between processes on the same smartphone
  - propagate tags into (and back from) the filesystem

before we go further, which of these is/are TaintDroid's contribution(s), and what requirements must the system meet to deliver these contributions?
- "public service announcement", presenting the existential statement, "some Android apps leak your sensitive data--be careful out there!"
  - bar fairly low here: just need to show *some* private information disclosures have been identified
  - perhaps more useful if TaintDroid finds private information disclosures (a) previously unknown and (b) not readily identifiable by simpler means (e.g., watching network traffic!)
  - impact for users? what would change?
- private data leak detection system users will run every day, to identify bad apps
  - key metrics: false positives (FPs, false alarms), false negatives (FNs, missed leaks).
  - how accurate is it if adversary tries to evade detection?
  - how would you detect FNs? pretty hard... (does this paper attempt this?)
  - how would you detect FPs? see error flagged, scrutinize output data, timing of events, and make an educated guess that data sent innocuous
  - doesn't *prevent* leaks; just identifies bad apps that leak after horse has left barn
  - FPs annoy users (waste user attention and time)
  - if not for *users*, maybe for another audience?
- private data leak prevention system users will run every day, to stop leakage of their sensitive information
  - if this is the goal, low FPs even more important

- review basic taint propagation concept
- review Android architecture, taint granularities supported by TaintDroid
  - per-variable
  - per-method
  - per-parcel (RPC message within phone)
  - per-file
- primary goal: performance--minimal latency/run-time increase when running apps under TaintDroid
- secondary goals: full coverage (low FNs), rare false alarms (low FPs)
- always ask yourself: how fine-grained are these taint granularities? which introduce false positives and which false negatives?
  - how many taint flags per array? 1! risk of FPs: if any element of array tainted, whole array seen as tainted.
  - at what granularity is per-parcel tainting? for the whole parcel! risk of FPs: all content of parcel deemed to be tainted with union of all flags of data marshaled into parcel.
  - no implicit flows. risks FNs--why?
  - heuristic for JNI methods (native C and C++ methods): no instrumentation of native code (why?); "sets union of method arg taint tags to return value"
    - what happens if a JNI method references an object (in args, return value, or method body)?
    - does heuristic capture taint across object references?
    - what's the risk?
    - authors: ~1900 methods that reference objects still using this heuristic
    - "define method profiles as needed": how do you know?
  - oddity: why is "unpack" trusted? (used to justify message-granularity taint propagation in RPC)
- basic taint propagation logic as expected; two slightly more interesting rules:
  - when storing to array, add union of taints of index and data element to taint of whole array
  - when setting a register variable to hold an instance field, taint the register variable with both the taint of the field and the taint of the instance reference
- evaluation:
  - press-baiting findings ;-)
  - FPs: 105 flagged connections, "37 clearly legitimate use". what does that mean? that these connections were false positives? same para says "there were no false positives."
    - what's the definition of a "false positive"? reporting sending position (or other) information over the network when no such sending occurred? differs from *user's* desire: "is this a *dangerous* disclosure or one I don't want?"
  - performance reduction minor (14% on CPU-intensive workload with taint tracking; informally little perceptible difference in interactive apps' latencies)
  - accuracy? constrains use case for system.
- limitations:
  - native address operations in VM: no tracking of taint
  - no implicit flow tracing
  - no preservation of taint on external network
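
worked sketch (not from the paper): the propagation rules and limitations above can be made concrete with a toy tracker. This is a minimal Python sketch that follows the rules as worded in these notes; it is not TaintDroid's implementation. All names here (Tainted, TaintedArray, Instance, native_call, network_send, the TAINT_* constants) are hypothetical and chosen only for illustration. It shows one tag per array (FP risk), the instance-field rule, the JNI "union of argument tags" heuristic, and why ignoring implicit flows risks FNs.

```python
# Hypothetical taint sources, one bit each, so that tags combine with bitwise OR.
TAINT_CLEAR = 0
TAINT_GPS = 1 << 0
TAINT_CONTACTS = 1 << 1


class Tainted:
    """A value paired with its taint tag (per-variable granularity)."""

    def __init__(self, value, tag=TAINT_CLEAR):
        self.value = value
        self.tag = tag


class TaintedArray:
    """An array with ONE tag shared by all elements."""

    def __init__(self, size):
        self.items = [0] * size
        self.tag = TAINT_CLEAR

    def store(self, index, data):
        # Array-store rule as stated in the notes:
        # array tag |= tag(index) | tag(data).
        self.items[index.value] = data.value
        self.tag |= index.tag | data.tag

    def load(self, index):
        # Every element read back carries the whole array's tag.
        return Tainted(self.items[index.value], self.tag)


class Instance:
    """An object reference with its own tag, holding tagged fields."""

    def __init__(self, ref_tag=TAINT_CLEAR):
        self.ref_tag = ref_tag
        self.fields = {}

    def put(self, name, data):
        self.fields[name] = data

    def get(self, name):
        # Instance-field rule: register tag = tag(field) | tag(reference).
        field = self.fields[name]
        return Tainted(field.value, field.tag | self.ref_tag)


def native_call(fn, *args):
    # JNI heuristic: the return value gets the union of the argument tags;
    # the body of fn itself is not tracked.
    tag = TAINT_CLEAR
    for arg in args:
        tag |= arg.tag
    return Tainted(fn(*(a.value for a in args)), tag)


def copy_explicit(secret):
    # Explicit flow: the data dependence carries the tag along.
    return Tainted(secret.value, secret.tag)


def copy_implicit(secret):
    # Implicit flow: the value leaks through control flow only. A tracker
    # that ignores control dependence leaves the result untagged.
    leaked = Tainted(0)
    for candidate in range(256):
        if secret.value == candidate:
            leaked = Tainted(candidate)  # a constant: no tag to propagate
    return leaked


def network_send(data):
    # Taint sink: report (not block) tagged data leaving the system.
    return data.tag != TAINT_CLEAR
```

With gps = Tainted(42, TAINT_GPS), storing gps into one slot of a TaintedArray makes a load of any other slot come back tagged TAINT_GPS (a false positive at the sink). copy_implicit(gps) returns the value 42 with a clear tag, so network_send reports nothing (a false negative), whereas copy_explicit(gps) is flagged.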