StarTrack: A Framework for Enabling Track-Based Applications
============================================================

- Mobile devices are equipped with hardware and software for determining location
  - GPS: satellite signal; works well outdoors, not indoors
  - WiFi beacons: overhear beacon identifiers, associate them with a database
  - Cell (3G) tower triangulation: triangulate based on cell tower ids
- _Track_: a time-ordered sequence of location readings
  - Captures the path taken by a device (person)
  - Contains discretely-sampled points along the true path
  - Each point may have significant location error (the above technologies aren't perfect)
- Tracks enable new applications
  - Ride sharing, e.g.:
    - Record tracks of co-workers driving to work
    - Suggest groups of co-workers who can share a ride
    - Reduces fuel consumption and road traffic congestion
- StarTrack: a system designed for tracks
  - Mobile devices collect and upload tracks to a central cloud server
  - Provides facilities to store, compare, cluster, index, and retrieve tracks
  - Handles large volumes of tracks across multiple users
- Architecture (slide)
  - Mobiles have a _location manager_ that grabs data from GPS/WiFi/cell towers and exposes a uniform location (presumably latitude-longitude coordinates)
  - _Track recorder_ takes location samples and constructs tracks
    - Apps decide when to start/stop the track recorder
  - _StarTrack client_ talks to the server
  - _StarTrack server_: implemented using a SQL Server database
  - Briefly discuss privacy issues
- Programming interface (slide)
  - Between the StarTrack client and the StarTrack server
  - *And* between apps on the mobile and the StarTrack client
  - Applications decide when to start/stop recording a track (energy- and computation-expensive)
  - ClipTrack: restrict to a geographic region
  - ClusterTracks: clusters a set of tracks
    - Selects a representative track from each cluster
    - Returns the set of representative tracks
  - Querying tracks: runs on the StarTrack server
- Use case: Ride Sharing (slide)
  - Ride sharing: StartTrack when you begin your commute; EndTrack when you finish the commute
  - First narrow to the desired commute time
  - Then RepresentativeTrack clusters the tracks and returns the representative track from the largest cluster, to make sure they regularly drive that route
- Further Use Cases (slide)
  - Urban sensing: apps can add metadata to tracks (unrestricted data)
  - Collaborative downloading: coordinate (via 802.11) a striped download over many 3G interfaces
  - Health monitoring: seems interesting, not explicated here
- Track comparison: Challenges (slide)
  - Input: two tracks; output: a value between 0 and 1 quantifying the similarity between the two
  - Red points = one track; white points = the other track
  - Reasons sampling frequency varies: power of GPS/wireless, speed of movement
  - Minor detours: for some applications, we want to ignore these
  - Sub-matches: the person with the solid-dot track can give a ride to the person with the hollow-dot track, but not the other way around
    - Asymmetric: Similarity(T_a, T_b) != Similarity(T_b, T_a)
  - Intersecting segments: they say they don't want to be biased in favor of tracks with the same starting/ending points
- Track clustering (Section 3.2)
  - Goal: eliminate near-duplicate tracks, pick a representative set of tracks (*aggregate* information)
  - Use k-medians clustering; issues? (need to choose k; the entire input set must be available)
  - How do they get around this? (group tracks into epochs)
- Track retrieval (Section 3.3)
  - Spatial indices: the DB tracks the association between table rows and geographic grids on a map
  - Each track is a single row in the table containing:
    - track id, XML metadata, track entry locations, timestamps
  - API calls map to SQL statements on the database
  - trks = QueryByTrack(query-track):
    1. Find the set of regions that surround the target track
    2. Retrieve all tracks that go through these regions
    3. Order the tracks by similarity
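The three retrieval steps above can be pictured with a toy grid index. Everything here is illustrative (the cell size, the cell-overlap similarity, and all names are mine, not the paper's implementation); note that the crude similarity below is deliberately asymmetric, echoing the Similarity(T_a, T_b) != Similarity(T_b, T_a) point from the comparison challenges:

```python
from collections import defaultdict

# Toy spatial index: map grid cells to the tracks that pass through them.
# A track is a list of (lat, lng) samples; CELL controls grid resolution.
CELL = 0.01  # roughly 1 km in latitude; illustrative choice

def cell_of(point):
    lat, lng = point
    return (round(lat / CELL), round(lng / CELL))

def build_index(tracks):
    index = defaultdict(set)
    for tid, track in tracks.items():
        for p in track:
            index[cell_of(p)].add(tid)
    return index

def similarity(a, b):
    # Crude, asymmetric similarity: fraction of a's cells also covered by b.
    ca, cb = {cell_of(p) for p in a}, {cell_of(p) for p in b}
    return len(ca & cb) / len(ca)

def query_by_track(query, tracks, index):
    # 1. Find the grid regions the query track passes through.
    regions = {cell_of(p) for p in query}
    # 2. Retrieve all tracks that go through any of these regions.
    candidates = set().union(*(index.get(r, set()) for r in regions))
    # 3. Order the candidates by similarity to the query track.
    return sorted(candidates, key=lambda t: similarity(query, tracks[t]),
                  reverse=True)

tracks = {
    "t1": [(37.40, -122.10), (37.41, -122.11), (37.42, -122.12)],
    "t2": [(37.40, -122.10), (37.41, -122.11), (37.50, -122.20)],
    "t3": [(37.70, -122.40), (37.71, -122.41)],
}
index = build_index(tracks)
print(query_by_track(tracks["t1"], tracks, index))  # → ['t1', 't2']; t3 is filtered out
```

The region filter (steps 1-2) is what keeps the expensive similarity computation (step 3) off tracks that are nowhere near the query.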
- Evaluation (Section 4)
  - Nine real tracks from GPS devices in the San Francisco Bay Area
  - Synthetic trace generation
    - Look at a road map of the Bay Area, compute the shortest road path between two points
    - Variables:
      - Hop distance between successive samples
      - Measurement error introduced for readings (uniform from 0 to 100 meters) (realistic?)
- Track comparison (slide)
  - Initial validation
    - Ground truth based on visual comparison of tracks
    - Later they compute ground truth for synthetic tracks
- Parameter selection (slide)
  - Pick one track t, compare it to four sets of 1000 tracks each
    - Each set has a particular start + end point
  - Ground truth based on distances between intersections
    - But the specific computation is not given in the paper
- Fixed vs. variable comparison (slide)
  - Is this the whole story?
  - Doesn't measure the accuracy of evaluating the similarity of tracks that are not the same, i.e., that should not match
- Similarity error (slide)
  - What is the problem with imposing a minimum distance?
    - Imposing a minimum distance skirts the problem
    - We will see a better way in StarTrack Next Generation and VTrack
- Computational time (slide)
  - The circle/square techniques implemented use n^2 comparisons, n = number of entries per track
  - How to speed this up?
    - (Could have used location to filter)
    - (Keep spatial indices in the database anyway)

StarTrack Next Generation
=========================

- Motivation
  - Mismatch between the API provided by the system and what was required by applications
    - Functions were missing or unnecessary, which led to excessive communication between the StarTrack client and server
  - Track comparison not efficient (n^2)
  - Map matching (canonicalization)
    - Will discuss how this is done next time
- StarTrack API
  - *Delayed evaluation*: What is it?
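One way to picture delayed evaluation before unpacking it, a minimal sketch (class and method names are hypothetical, not the actual StarTrack API; tracks are simplified to lists of scalar positions):

```python
class TrackCollection:
    """Records manipulations; nothing runs until get_tracks() is called."""

    def __init__(self, tracks):
        self._tracks = tracks
        self._ops = []  # recorded (operation, argument) pairs, not yet applied

    def clip(self, region):
        # Just record the operation and its argument; do no work yet.
        self._ops.append(("clip", region))
        return self  # returning self allows chaining further manipulations

    def sort_by_length(self):
        self._ops.append(("sort_by_length", None))
        return self

    def get_tracks(self):
        # Only now are the recorded manipulations actually applied, in order.
        result = list(self._tracks)
        for op, arg in self._ops:
            if op == "clip":
                lo, hi = arg
                result = [[p for p in t if lo <= p <= hi] for t in result]
            elif op == "sort_by_length":
                result.sort(key=len)
        return result

tc = TrackCollection([[1, 2, 3, 9], [5, 6]])
pending = tc.clip((1, 6)).sort_by_length()  # nothing computed yet
print(pending.get_tracks())  # work happens here → [[5, 6], [1, 2, 3]]
```

In the real system the payoff is larger: because the server sees the whole recorded pipeline at once, it can fuse operations and avoid shipping intermediate track sets to the client.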
    - The general technique is called "lazy evaluation"
    - Principle: do all the work just before you need to show results to the user
    - Calls to the manipulation API just record the sequence of operations + arguments
    - Only when tracks are retrieved with a GetTracks call are the manipulations performed
- Track Tree (slide)
  - Goal: speed up retrieval of similar tracks
  - Post map-matching, the only segments allowed are road segments
  - Where does the track tree live? (in memory, to speed up database operations)
    - Is this reasonable? (Yes: the track tree contains one leaf per road segment, a few bytes of storage at each leaf, and fewer internal nodes)
  - Each road segment = a leaf
  - Finding tracks with similarity >= x to a track T; algorithm:
    - "Map" step: find the nodes in the track tree that T covers
    - Nodes keep track of which tracks they are associated with
    - cf. Code segment 3.1 in the paper
- Storage platform (Section 4)
  - Partitioned between multiple servers, each running SQL Server
  - Database tables:
    1. User table: user information
    2. Track table: one track per row; timestamped lat-long coordinates; both raw and canonicalized versions in the table
    3. Representative track table
       - To insert a new track, check if there is a representative track that matches it; if not, insert the new track into the representative track table
       - Replaces the k-medians clustering from the previous StarTrack
    4. Coordinate table: maps location id --> (lat, long)
    5. Coordinate-to-Track table: maps (lat, long) --> tracks *or* location id --> tracks -- a bit ambiguous here
- Evaluation (slide)
  - Synthetically generated tracks: 6K tracks, 250 users
  - *Separate* "scalability" experiments (synthetic): 4.5M tracks, 18K users
    - Did they go far enough? Google: millions of users, billions of tracks
  - DB filtering: store the set of tracks in the DB, use a query to find tracks in the DB that intersect the query track, then compute similarity against only the retrieved tracks
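The DB-filtering baseline just described can be sketched with an in-memory sqlite3 database; the schema, table name, and cell encoding are all illustrative assumptions, not the paper's actual SQL Server layout:

```python
import sqlite3

# Toy DB filtering: store (cell, track_id) rows, let a SQL query find the
# tracks that intersect the query track's cells, then compute similarity in
# application code against only the retrieved tracks.
tracks = {
    1: [(0, 0), (0, 1), (0, 2)],
    2: [(0, 0), (0, 1), (5, 5)],
    3: [(9, 9), (9, 8)],
}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE coord_to_track (cell TEXT, track_id INTEGER)")
for tid, pts in tracks.items():
    db.executemany("INSERT INTO coord_to_track VALUES (?, ?)",
                   [(f"{x},{y}", tid) for x, y in pts])

def db_filter(query_track):
    # SQL does the coarse intersection test; similarity happens outside the DB.
    cells = [f"{x},{y}" for x, y in query_track]
    placeholders = ",".join("?" * len(cells))
    rows = db.execute(
        f"SELECT DISTINCT track_id FROM coord_to_track "
        f"WHERE cell IN ({placeholders})", cells).fetchall()
    return sorted(r[0] for r in rows)

def similarity(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa)

candidates = db_filter(tracks[1])          # track 3 is never retrieved
scores = {tid: similarity(tracks[1], tracks[tid]) for tid in candidates}
print(candidates, scores)
```

The contrast with the next bullet is the point: brute force pays the similarity cost against every stored track, while DB filtering pays it only for tracks the index says intersect the query.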
  - In-memory brute force: all-pairs similarity comparison against all tracks *in memory*
  - Query time with and without the coordinate table
  - Figure 3 (in paper)
    - "Original method": spatial index over all tracks' raw data
    - 100K track dataset
    - Big speedup when we canonicalize tracks: less demand on spatial indexing
- GetSimilarTracks query time (slide)
  - Not evaluating across database partitions -- everything fits in one database
- Track Tree Construction Costs (slide)
  - Breakeven point: the minimum number of queries such that the amortized query time using the track tree < the query time of the brute-force method
  - Answer: 60-70 queries (Figure 7b)
- Performance of applications (slide)
  - PDD: can handle 250 requests/sec while giving a 55 ms service time
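The breakeven point above is simple amortization: the one-time tree construction cost must be recouped by the per-query savings over brute force. The numbers below are made up for illustration, not the paper's measurements:

```python
import math

def breakeven_queries(build_cost, tree_query_time, brute_query_time):
    """Smallest q with build_cost + q*tree_query_time < q*brute_query_time."""
    savings_per_query = brute_query_time - tree_query_time
    if savings_per_query <= 0:
        return None  # the tree never pays for itself
    # Smallest integer strictly greater than build_cost / savings_per_query.
    return math.floor(build_cost / savings_per_query) + 1

# Hypothetical costs in milliseconds:
print(breakeven_queries(build_cost=6500, tree_query_time=5,
                        brute_query_time=105))  # → 66
```

With these toy numbers, the tree wins once a workload issues 66 or more GetSimilarTracks queries, which is the same shape of argument Figure 7b makes with the measured 60-70 query breakeven.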