StarTrack: A Framework for Enabling Track-Based Applications
============================================================

- Mobile devices are equipped with hardware and software for determining location
  - GPS: satellite signal; works well outdoors, not indoors
  - WiFi beacons: overhear beacon identifiers, associate them with a database
  - Cell (3G) tower triangulation: triangulate based on cell tower ids
- _Track_: a time-ordered sequence of location readings
  - Captures the path taken by a device (person)
  - Contains discretely-sampled points along the true path
  - Each point may have significant location error (the above technologies aren't perfect)
- Tracks enable new applications
  - Ride sharing, e.g.:
    - Record tracks of co-workers driving to work
    - Suggest groups of co-workers who can share a ride
    - Reduces fuel consumption and road traffic congestion
- StarTrack: a system designed for tracks
  - Mobile devices collect and upload tracks to a central cloud server
  - Provides facilities to store, compare, cluster, index, and retrieve tracks
  - Handles large volumes of tracks across multiple users
- Architecture (slide)
  - Mobiles have a _location manager_ that grabs data from GPS/WiFi/cell towers and exposes a uniform location (presumably latitude-longitude coordinates)
  - _Track recorder_ takes location samples and constructs tracks
    - Apps decide when to start/stop the track recorder
  - _StarTrack client_ talks to the server
  - _StarTrack server_: implemented using a SQL Server database
  - Briefly discuss privacy issues
- Programming interface (slide)
  - Between the StarTrack client and the StarTrack server
  - *And* between apps on the mobile and the StarTrack client
  - Applications decide when to start/stop recording a track (energy- and computation-expensive)
  - ClipTrack: restrict to a geographic region
  - ClusterTracks: clusters a set of tracks
    - Selects a representative track from each cluster
    - Returns the set of representative tracks
  - Querying tracks: runs on the StarTrack server
- Use case: Ride Sharing (slide)
  - Ride sharing: StartTrack when you begin your commute; EndTrack when you finish the commute
  - First narrow to the desired commute time
  - Then RepresentativeTrack clusters the tracks and returns the representative track from the largest cluster, to make sure they regularly drive that route
- Further Use Cases (slide)
  - Urban sensing: apps can add metadata to tracks (unrestricted data)
  - Collaborative downloading: coordinate (via 802.11) a striped download over many 3G interfaces
  - Health monitoring: seems interesting, not explicated here
- Track comparison: Challenges (slide)
  - Input: two tracks; output: a value between 0 and 1 quantifying the similarity between the two
  - Red points = one track; white points = the other track
  - Reasons sampling frequency varies: power of GPS/wireless, speed of movement
  - Minor detours: for some applications, we want to ignore these
  - Sub-matches: the person with the solid-dot track can give a ride to the person with the hollow-dot track, but not the other way around
    - Asymmetric: Similarity(T_a, T_b) != Similarity(T_b, T_a)
  - Intersecting segments: they say they don't want to be biased in favor of tracks with the same starting/ending points
- Track clustering (Section 3.2)
  - Goal: eliminate near-duplicate tracks, pick a representative set of tracks (*aggregate* information)
  - Use k-medians clustering; issues? (need to choose k; the entire input set must be available)
  - How do they get around this? (group tracks into epochs)
- Track retrieval (Section 3.3)
  - Spatial indices: the DB tracks the association between table rows and geographic grids on a map
  - Each track is a single row in the table containing:
    - track id, XML metadata, track entry locations, timestamps
  - API calls map to SQL statements on the database
  - trks = QueryByTrack(query-track):
    1. Find the set of regions that surround the target track
    2. Retrieve all tracks that go through these regions
    3. Order the tracks by similarity
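The three retrieval steps above can be pictured with a toy grid index. Everything here is illustrative (the cell size, the cell-overlap similarity, and all names are mine, not the paper's implementation); note that the crude similarity below is deliberately asymmetric, echoing the Similarity(T_a, T_b) != Similarity(T_b, T_a) point from the comparison challenges:

```python
from collections import defaultdict

# Toy spatial index: map grid cells to the tracks that pass through them.
# A track is a list of (lat, lng) samples; CELL controls grid resolution.
CELL = 0.01  # roughly 1 km in latitude; illustrative choice

def cell_of(point):
    lat, lng = point
    return (round(lat / CELL), round(lng / CELL))

def build_index(tracks):
    index = defaultdict(set)
    for tid, track in tracks.items():
        for p in track:
            index[cell_of(p)].add(tid)
    return index

def similarity(a, b):
    # Crude, asymmetric similarity: fraction of a's cells also covered by b.
    ca, cb = {cell_of(p) for p in a}, {cell_of(p) for p in b}
    return len(ca & cb) / len(ca)

def query_by_track(query, tracks, index):
    # 1. Find the grid regions the query track passes through.
    regions = {cell_of(p) for p in query}
    # 2. Retrieve all tracks that go through any of these regions.
    candidates = set().union(*(index.get(r, set()) for r in regions))
    # 3. Order the candidates by similarity to the query track.
    return sorted(candidates, key=lambda t: similarity(query, tracks[t]),
                  reverse=True)

tracks = {
    "t1": [(37.40, -122.10), (37.41, -122.11), (37.42, -122.12)],
    "t2": [(37.40, -122.10), (37.41, -122.11), (37.50, -122.20)],
    "t3": [(37.70, -122.40), (37.71, -122.41)],
}
index = build_index(tracks)
print(query_by_track(tracks["t1"], tracks, index))  # → ['t1', 't2']; t3 is filtered out
```

The region filter (steps 1-2) is what keeps the expensive similarity computation (step 3) off tracks that are nowhere near the query.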
- Evaluation (Section 4)
  - Nine real tracks from GPS devices in the San Francisco Bay Area
  - Synthetic trace generation
    - Look at a road map of the Bay Area, compute the shortest road path between two points
    - Variables:
      - Hop distance between successive samples
      - Measurement error introduced for readings (uniform from 0 to 100 meters) (realistic?)
- Track comparison (slide)
  - Initial validation
    - Ground truth based on visual comparison of tracks
    - Later they compute ground truth for synthetic tracks
- Parameter selection (slide)
  - Pick one track t, compare it to four sets of 1000 tracks each
    - Each set has a particular start + end point
  - Ground truth based on distances between intersections
    - But the specific computation is not given in the paper
- Fixed vs. variable comparison (slide)
  - Is this the whole story?
  - Doesn't measure the accuracy of evaluating the similarity of tracks that are not the same, i.e., that should not match
- Similarity error (slide)
  - What is the problem with imposing a minimum distance?
    - Imposing a minimum distance skirts the problem
    - We will see a better way in StarTrack Next Generation and VTrack
- Computational time (slide)
  - The circle/square techniques implemented use n^2 comparisons, n = number of entries per track
  - How to speed this up?
    - (Could have used location to filter)
    - (Keep spatial indices in the database anyway)

StarTrack Next Generation
=========================

- Motivation
  - Mismatch between the API provided by the system and what was required by applications
    - Functions were missing or unnecessary, which led to excessive communication between the StarTrack client and server
  - Track comparison not efficient (n^2)
  - Map matching (canonicalization)
    - Will discuss how this is done next time
- StarTrack API
  - *Delayed evaluation*: What is it?
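One way to picture delayed evaluation before unpacking it, a minimal sketch (class and method names are hypothetical, not the actual StarTrack API; tracks are simplified to lists of scalar positions):

```python
class TrackCollection:
    """Records manipulations; nothing runs until get_tracks() is called."""

    def __init__(self, tracks):
        self._tracks = tracks
        self._ops = []  # recorded (operation, argument) pairs, not yet applied

    def clip(self, region):
        # Just record the operation and its argument; do no work yet.
        self._ops.append(("clip", region))
        return self  # returning self allows chaining further manipulations

    def sort_by_length(self):
        self._ops.append(("sort_by_length", None))
        return self

    def get_tracks(self):
        # Only now are the recorded manipulations actually applied, in order.
        result = list(self._tracks)
        for op, arg in self._ops:
            if op == "clip":
                lo, hi = arg
                result = [[p for p in t if lo <= p <= hi] for t in result]
            elif op == "sort_by_length":
                result.sort(key=len)
        return result

tc = TrackCollection([[1, 2, 3, 9], [5, 6]])
pending = tc.clip((1, 6)).sort_by_length()  # nothing computed yet
print(pending.get_tracks())  # work happens here → [[5, 6], [1, 2, 3]]
```

In the real system the payoff is larger: because the server sees the whole recorded pipeline at once, it can fuse operations and avoid shipping intermediate track sets to the client.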
    - The general technique is called "lazy evaluation"
    - Principle: do all the work just before you need to show results to the user
    - Calls to the manipulation API just record the sequence of operations + arguments
    - Only when tracks are retrieved with a GetTracks call are the manipulations performed
- Track Tree (slide)
  - Goal: speed up retrieval of similar tracks
  - Post map-matching, the only segments allowed are road segments
  - Where does the track tree live? (in memory, to speed up database operations)
    - Is this reasonable? (Yes: the track tree contains one leaf per road segment, a few bytes of storage at each leaf, and fewer internal nodes)
  - Each road segment = a leaf
  - Finding tracks with similarity >= x to a track T; algorithm:
    - "Map" step: find the nodes in the track tree that T covers
    - Nodes keep track of which tracks they are associated with
    - cf. Code segment 3.1 in the paper
- Storage platform (Section 4)
  - Partitioned between multiple servers, each running SQL Server
  - Database tables:
    1. User table: user information
    2. Track table: one track per row; timestamped lat-long coordinates; both raw and canonicalized versions in the table
    3. Representative track table
       - To insert a new track, check if there is a representative track that matches it; if not, insert the new track into the representative track table
       - Replaces the k-medians clustering from the previous StarTrack
    4. Coordinate table: maps location id --> (lat, long)
    5. Coordinate-to-Track table: maps (lat, long) --> tracks *or* location id --> tracks -- a bit ambiguous here
- Evaluation (slide)
  - Synthetically generated tracks: 6K tracks, 250 users
  - *Separate* "scalability" experiments (synthetic): 4.5M tracks, 18K users
    - Did they go far enough? Google: millions of users, billions of tracks
  - DB filtering: store the set of tracks in the DB, use a query to find tracks in the DB that intersect the query track, then compute similarity against only the retrieved tracks
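The DB-filtering baseline just described can be sketched with an in-memory sqlite3 database; the schema, table name, and cell encoding are all illustrative assumptions, not the paper's actual SQL Server layout:

```python
import sqlite3

# Toy DB filtering: store (cell, track_id) rows, let a SQL query find the
# tracks that intersect the query track's cells, then compute similarity in
# application code against only the retrieved tracks.
tracks = {
    1: [(0, 0), (0, 1), (0, 2)],
    2: [(0, 0), (0, 1), (5, 5)],
    3: [(9, 9), (9, 8)],
}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE coord_to_track (cell TEXT, track_id INTEGER)")
for tid, pts in tracks.items():
    db.executemany("INSERT INTO coord_to_track VALUES (?, ?)",
                   [(f"{x},{y}", tid) for x, y in pts])

def db_filter(query_track):
    # SQL does the coarse intersection test; similarity happens outside the DB.
    cells = [f"{x},{y}" for x, y in query_track]
    placeholders = ",".join("?" * len(cells))
    rows = db.execute(
        f"SELECT DISTINCT track_id FROM coord_to_track "
        f"WHERE cell IN ({placeholders})", cells).fetchall()
    return sorted(r[0] for r in rows)

def similarity(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa)

candidates = db_filter(tracks[1])          # track 3 is never retrieved
scores = {tid: similarity(tracks[1], tracks[tid]) for tid in candidates}
print(candidates, scores)
```

The contrast with the next bullet is the point: brute force pays the similarity cost against every stored track, while DB filtering pays it only for tracks the index says intersect the query.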
  - In-memory brute force: all-pairs similarity comparison against all tracks *in memory*
  - Query time with and without the coordinate table
  - Figure 3 (in paper)
    - "Original method": spatial index over all tracks' raw data
    - 100K track dataset
    - Big speedup when we canonicalize tracks: less demand on spatial indexing
- GetSimilarTracks query time (slide)
  - Not evaluating across database partitions -- everything fits in one database
- Track Tree Construction Costs (slide)
  - Breakeven point: the minimum number of queries such that the amortized query time using the track tree < the query time of the brute-force method
  - Answer: 60-70 queries (Figure 7b)
- Performance of applications (slide)
  - PDD: can handle 250 requests/sec while giving a 55 ms service time
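The breakeven point above is simple amortization: the one-time tree construction cost must be recouped by the per-query savings over brute force. The numbers below are made up for illustration, not the paper's measurements:

```python
import math

def breakeven_queries(build_cost, tree_query_time, brute_query_time):
    """Smallest q with build_cost + q*tree_query_time < q*brute_query_time."""
    savings_per_query = brute_query_time - tree_query_time
    if savings_per_query <= 0:
        return None  # the tree never pays for itself
    # Smallest integer strictly greater than build_cost / savings_per_query.
    return math.floor(build_cost / savings_per_query) + 1

# Hypothetical costs in milliseconds:
print(breakeven_queries(build_cost=6500, tree_query_time=5,
                        brute_query_time=105))  # → 66
```

With these toy numbers, the tree wins once a workload issues 66 or more GetSimilarTracks queries, which is the same shape of argument Figure 7b makes with the measured 60-70 query breakeven.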