TaintDroid lecture notes (Enck et al., OSDI 2010)

the threat:
    - smartphones widely used
    - many sources of sensitive personal data:
      	      - microphone input
	      - contacts database
	      - accelerometer input
	      - GPS input
	      - camera input
    - typical smartphone application security model:
      	      - downloaded app is binary code (no source code available)
	      - at installation time, OS asks "allow this app to access
                the [device here]?" (e.g., GPS)
              - OS may instead (or also) ask each time the app uses the
                data (quickly exceeds users' annoyance tolerance; users
                begin to always click "allow")
              - application may talk to the network, to other apps within
	        phone
              - perhaps some isolation within filesystem (e.g., jail, unique
                user IDs per app; breaks sharing, but provides isolation)
    - what might maliciously written apps do with sensitive information? once
      you say "allow access to location," you have little to no control over
      how the app uses that information

today's topic, taintdroid:
    - extensions to Android mobile phone OS that:
      	      - allow sensitive information to be *tagged* as such (and
	        in a way that indicates the source of the information)
              - instrument Dalvik (register-based virtual machine that runs
                DEX bytecode compiled from Java) to propagate tags on data
                as an interpreted app runs
	      - note when data tagged as sensitive are about to exit the
	        system (i.e., be written to network) and announce that
		sensitive data have been disclosed (i.e., to warn user or
		app tester)
	      - propagate tags over RPC "parcels" sent between processes
	        on the same smartphone
	      - propagate tags into (and back from) the filesystem

before we go further, which of these is/are TaintDroid's
contribution(s), and what requirements must the system meet to deliver
these contributions?
    - "public service announcement", presenting the existential statement,
      "some Android apps leak your sensitive data--be careful out there!"
      	      - bar fairly low here: just need to show *some* private
	        information disclosures have been identified
	      - perhaps more useful if TaintDroid finds private information
	        disclosures (a) previously unknown and (b) not readily
		identifiable by simpler means (e.g., watching network
		traffic!)
              - impact for users? what would change?
    - private data leak detection system that users will run every day, to
      identify bad apps
    	      - key metrics: false positives (FPs, false alarms),
	        false negatives (FNs, missed leaks).
              - how accurate is it if adversary tries to evade detection?
	      - how would you detect FNs? pretty hard... (does this paper
	        attempt this?)
              - how would you detect FPs? see error flagged, scrutinize
                output data, timing of events, and make an educated guess
                that the data sent were innocuous
	      - doesn't *prevent* leaks; just identifies bad apps that leak
	        after horse has left barn
              - FPs annoy users (waste user attention and time)
	      - if not for *users*, maybe for another audience?
    - private data leak prevention system that users will run every day, to
      stop leakage of their sensitive information
              - if this is the goal, low FPs even more important

- review basic taint propagation concept
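
a minimal sketch of the basic concept (source, propagation, sink), assuming
tags are bit flags that are OR-ed together when values are combined. all
names here (Tainted, TAINT_GPS, send_to_network) are hypothetical and for
illustration only; this is not TaintDroid's code or API.

```python
# Hypothetical taint tags: one bit per sensitive source.
TAINT_CLEAR = 0
TAINT_GPS = 1 << 0
TAINT_CONTACTS = 1 << 1
TAINT_MIC = 1 << 2


class Tainted:
    """A value paired with a taint tag (illustrative only)."""

    def __init__(self, value, tag=TAINT_CLEAR):
        self.value = value
        self.tag = tag

    def __add__(self, other):
        # Propagation: the result carries the union of the operands' tags.
        if not isinstance(other, Tainted):
            other = Tainted(other)
        return Tainted(self.value + other.value, self.tag | other.tag)


def send_to_network(data):
    """Taint sink: report (but do not block) tagged data leaving the system."""
    leaked = data.tag != TAINT_CLEAR
    if leaked:
        print(f"ALERT: data with taint tag {data.tag:#x} sent to network")
    return leaked


# Taint source: a location reading is tagged where it enters the app.
location = Tainted("51.52,-0.13", TAINT_GPS)
message = Tainted("pos=") + location      # tag propagates through the concat
banner = Tainted("hello") + Tainted("!")  # never touched sensitive data
```

the sink only reports; nothing in this sketch stops the send.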

- review Android architecture, taint granularities supported by TaintDroid
    - per-variable
    - per-method
    - per-parcel (RPC message within phone)
    - per-file
    - primary goal: performance--minimal latency/run-time increase when
      running apps under TaintDroid
    - secondary goals: full coverage (low FNs), rare false alarms (low FPs)

    - always ask yourself: how fine-grained are these taint
      granularities? which introduce false positives and which false
      negatives?

    - how many taint flags per array? 1! risk of FPs: if any element
      of array tainted, whole array seen as tainted.
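
a small sketch of why one tag per array risks FPs. TaintedArray and the tag
constants are hypothetical names; index taint is ignored here to keep the
example short.

```python
TAINT_CLEAR = 0
TAINT_CONTACTS = 1 << 1  # hypothetical tag bit


class TaintedArray:
    """Array with a single taint tag shared by all elements."""

    def __init__(self, size):
        self.items = [None] * size
        self.tag = TAINT_CLEAR

    def store(self, index, value, value_tag):
        self.items[index] = value
        self.tag |= value_tag      # one tag for the whole array

    def load(self, index):
        # Every element is reported with the array's tag.
        return self.items[index], self.tag


arr = TaintedArray(2)
arr.store(0, "app version 1.0", TAINT_CLEAR)
arr.store(1, "Alice: 555-0100", TAINT_CONTACTS)
value, tag = arr.load(0)   # element 0 never held contact data
```

element 0 comes back tagged as contact data even though it never held any:
sending it would raise a false alarm.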

    - at what granularity is per-parcel tainting? for the whole parcel!
      risk of FPs: all content of parcel deemed to be tainted with union of
      all flags of data marshaled into parcel.
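
the same effect at message granularity, as a sketch. marshal/unmarshal and
the tag constants are hypothetical stand-ins, not the Android Parcel API.

```python
from functools import reduce

TAINT_CLEAR, TAINT_GPS, TAINT_MIC = 0, 1 << 0, 1 << 2  # hypothetical bits


def marshal(fields):
    """Pack (value, tag) pairs into a parcel with one message-level tag."""
    payload = [value for value, _ in fields]
    parcel_tag = reduce(lambda acc, f: acc | f[1], fields, TAINT_CLEAR)
    return payload, parcel_tag


def unmarshal(parcel):
    """Receiver side: every field inherits the parcel's tag."""
    payload, parcel_tag = parcel
    return [(value, parcel_tag) for value in payload]


parcel = marshal([
    ("ping", TAINT_CLEAR),
    ("51.52,-0.13", TAINT_GPS),
    (b"\x00\x01", TAINT_MIC),
])
received = unmarshal(parcel)
```

after unmarshaling, the untainted "ping" field carries both the GPS and the
microphone flags.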

    - no implicit flows. risks FNs--why?
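
one way to see the FN risk, as a sketch: an app can copy a secret through
control flow alone, so a tracker that follows only data flow never tags the
copy. function and constant names are hypothetical.

```python
TAINT_CLEAR, TAINT_SECRET = 0, 1 << 0  # hypothetical bits


def copy_explicit(secret, secret_tag):
    """Explicit (data) flow: output is computed from the input, tag follows."""
    return secret, secret_tag


def copy_implicit(secret, secret_tag):
    """Implicit (control) flow: output is built from untainted constants,
    chosen by branching on the secret. secret_tag is deliberately unused:
    a data-flow-only tracker has no rule that carries it into `out`."""
    out = ""
    for ch in secret:
        for candidate in "0123456789":
            if ch == candidate:      # branch depends on tainted data
                out += candidate     # but only a constant is appended
    return out, TAINT_CLEAR          # tag a data-flow-only tracker assigns


pin = "4921"
explicit_copy, explicit_tag = copy_explicit(pin, TAINT_SECRET)
leaked_copy, leaked_tag = copy_implicit(pin, TAINT_SECRET)
```

leaked_copy equals the secret but carries no tag, so a sink check would let
it through silently.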

    - heuristic for JNI methods (native C and C++ methods): no
      instrumentation of native code (why?); "sets union of method arg
      taint tags to return value"
      	    - what happens if a JNI method references an object (in
              args, return value, or method body)?
	    - does heuristic capture taint across object references?
	    - what's the risk?
	    - authors: ~1900 methods that reference objects still using
	      this heuristic

	    - "define method profiles as needed": how do you know?
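
a sketch of the heuristic and its blind spot, assuming the native method is
a black box and only the return value gets the union of the argument tags.
Buffer, native_copy, and call_native are hypothetical stand-ins, not JNI or
TaintDroid interfaces.

```python
TAINT_CLEAR, TAINT_GPS = 0, 1 << 0  # hypothetical bits


class Buffer:
    """Stand-in for a Java object passed by reference to native code."""

    def __init__(self):
        self.data = ""
        self.tag = TAINT_CLEAR


def native_copy(src, dst):
    """Stand-in for an uninstrumented native method: writes src into dst
    and returns a length. No tags are updated inside native code."""
    dst.data = src
    return len(src)


def call_native(func, args_with_tags):
    """Heuristic: return-value tag = union of the argument tags."""
    args = [arg for arg, _ in args_with_tags]
    ret_tag = TAINT_CLEAR
    for _, tag in args_with_tags:
        ret_tag |= tag
    return func(*args), ret_tag


out = Buffer()
length, length_tag = call_native(
    native_copy, [("51.52,-0.13", TAINT_GPS), (out, out.tag)]
)
```

the returned length is tagged, but the object that now holds the location
string (out) is not: that is the kind of FN a method profile would need to
patch.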

    - oddity: why is "unpack" trusted? (used to justify message-granularity
      taint propagation in RPC)

- basic taint propagation logic as expected; two slightly more interesting
  rules:
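    - when loading from an array, taint the destination register with the
      union of the taints of the array and the index; when storing to an
      array, add the taint of the data element to the taint of the whole
      array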
    - when setting a register variable to hold an instance field, taint the
      register variable with both the taint of the field and the taint of
      the instance reference
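
a toy sketch of these rules that tracks tags only. it assumes: array load =
array tag OR index tag; array store = fold the stored value's tag into the
array tag; instance-field read = field tag OR reference tag. placing the
index taint on the load side follows my reading of the paper's propagation
table and should be treated as an assumption. Machine and its method names
are hypothetical, loosely named after Dalvik's aget/aput/iget opcodes.

```python
TAINT_CLEAR, TAINT_GPS, TAINT_CONTACTS = 0, 1 << 0, 1 << 1  # hypothetical


class Machine:
    """Toy register machine tracking tags only (values omitted).
    Arrays and objects are keyed by register name for simplicity."""

    def __init__(self):
        self.reg = {}      # register -> tag
        self.array = {}    # array register -> single tag for whole array
        self.field = {}    # (object register, field name) -> tag

    def tag(self, r):
        return self.reg.get(r, TAINT_CLEAR)

    def aput(self, v_src, v_arr, v_idx):
        # v_arr[v_idx] = v_src: fold the stored value's tag into the array.
        self.array[v_arr] = self.array.get(v_arr, TAINT_CLEAR) | self.tag(v_src)

    def aget(self, v_dst, v_arr, v_idx):
        # v_dst = v_arr[v_idx]: array tag plus the index register's tag.
        self.reg[v_dst] = self.array.get(v_arr, TAINT_CLEAR) | self.tag(v_idx)

    def iget(self, v_dst, v_obj, name):
        # v_dst = v_obj.name: field tag plus the object reference's tag.
        self.reg[v_dst] = (
            self.field.get((v_obj, name), TAINT_CLEAR) | self.tag(v_obj)
        )


m = Machine()
m.reg["v_index"] = TAINT_GPS             # index derived from sensitive data
m.aget("v_out", "v_table", "v_index")    # the table itself is untainted
m.reg["v_ref"] = TAINT_CONTACTS          # tainted reference, untainted field
m.iget("v_field", "v_ref", "name")
```

the lookup through an untainted table still yields a tainted result, which
is the point of including the index: otherwise a translation table could
launder tainted values.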

- evaluation:
    - press-baiting findings ;-)
    
    - FPs: 105 flagged connections, "37 clearly legitimate use". what
      does that mean? that these connections were false positives?
      same para says "there were no false positives."
           - what's the definition of a "false positive"? reporting sending
	     position (or other) information over the network when no such
	     sending occurred? differs from *user's* desire: "is this a
	     *dangerous* disclosure or one I don't want?"

    - performance overhead minor (14% on CPU-intensive workload with
      taint tracking; informally, little perceptible difference in
      interactive apps' latencies)

    - accuracy? constrains use case for system.

- limitations:
    - native address operations in VM: no tracking of taint
    - no implicit flow tracing
    - no preservation of taint on external network