The Placelab framework generates geometric positions as longitude and latitude coordinates. However, many applications need a more symbolic notion of location, as suggested by Hightower et al. in [2]. Our project builds an abstraction layer that derives such symbolic locations from the coordinates generated by the Placelab framework.
The challenge is to label places. Different users may label the same place differently, so it is unrealistic for a single organization to label every place in the world. Instead, each user should be able to label the places where she spends a lot of time.
The question, then, is how to find the significant places where the user spends a lot of time. In our project, we infer the significant places from the user's behavior. First, we collect location data. Then, we analyze the user's location data to find the significant places. Finally, we ask the user to label them. This work is related to the work of Ashbrook and Starner [3].
Collecting location data
The Placelab framework provides geometric locations as latitude and longitude coordinates, for example (47.6683932, -122.3141346). The user's location data is timestamped and logged periodically, every 2 seconds. Below is a plot of the locations logged during one day in the life of one user.
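The log format is not described here. As a minimal sketch, assume a hypothetical plain-text log in which each line holds a Unix timestamp in seconds, a latitude and a longitude, separated by commas. The hypothetical `parse_log` helper and `Sample` record below turn such lines into a time-ordered trace.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Sample:
    """One timestamped position fix (hypothetical record layout)."""
    t: float    # Unix time in seconds
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def parse_log(lines: Iterable[str]) -> List[Sample]:
    """Parse hypothetical 'timestamp,lat,lon' lines, skipping blanks and '#' comments."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        t, lat, lon = (float(field) for field in line.split(","))
        samples.append(Sample(t, lat, lon))
    # Sort by time so that later time-domain processing sees an ordered trace.
    samples.sort(key=lambda s: s.t)
    return samples


log = [
    "# t,lat,lon",
    "1000,47.6683932,-122.3141346",
    "1002,47.6683940,-122.3141350",
]
trace = parse_log(log)
print(len(trace), trace[1].t - trace[0].t)  # prints: 2 2.0
```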
Analyzing location data and finding significant places
As the logged data shows, significant places are those where the logged positions are densely clustered. To find them, we can cluster the latitude and longitude coordinates in the log using algorithms such as k-means and Gaussian mixture models (GMMs).
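One practical detail, not discussed in the project itself: k-means and GMMs measure Euclidean distance, but away from the equator a degree of longitude is shorter than a degree of latitude (by a factor of cos 47.67° ≈ 0.67 at the example position above). A minimal sketch of one common fix, an equirectangular projection to meters around a reference point, is below; `to_local_xy` is a hypothetical helper name.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def to_local_xy(lat: float, lon: float, lat0: float, lon0: float) -> Tuple[float, float]:
    """Equirectangular projection of (lat, lon) to meters east/north of (lat0, lon0).

    Adequate over a few kilometers and far from the poles.
    """
    x = EARTH_RADIUS_M * math.radians(lon - lon0) * math.cos(math.radians(lat0))
    y = EARTH_RADIUS_M * math.radians(lat - lat0)
    return x, y


# A point 0.001 degrees north of the example position: x is 0.0, y is about 111 m.
print(to_local_xy(47.6693932, -122.3141346, 47.6683932, -122.3141346))
```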
K-means clustering finds k cluster means such that each data point is assigned to the cluster mean closest to it; the means are chosen to minimize the total squared distance between the points and their assigned means. GMM is a generative model: each data point is assumed to be "generated" by one of k Gaussians. A priori (i.e., before a data point is observed), each Gaussian has a prior probability of generating the data point. Once the data point is observed, one computes, for each Gaussian, the posterior probability that the data point came from that Gaussian. This combines the prior with the likelihood of the data point under the Gaussian in question, via Bayes' rule. The Gaussian with the highest posterior probability is typically chosen as the one that generated the data point. See Bishop's book Neural Networks for Pattern Recognition for more details.
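A minimal pure-Python sketch of Lloyd's algorithm, the standard iterative procedure for k-means, is below. The deterministic farthest-point initialization is an assumption made to keep the example reproducible (k-means++ is a common randomized alternative), and `kmeans` is an illustrative name, not the project's code. Distances are plain Euclidean, so the inputs should be roughly planar coordinates.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # 2-D position, e.g. local (x, y) in meters


def kmeans(points: Sequence[Point], k: int,
           max_iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate nearest-mean assignment and mean update."""
    # Farthest-point initialization (deterministic): start from the first point,
    # then repeatedly add the point farthest from all means chosen so far.
    means: List[Point] = [points[0]]
    while len(means) < k:
        means.append(max(points, key=lambda p: min(math.dist(p, m) for m in means)))

    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each point goes to its closest mean.
        new_labels = [min(range(k), key=lambda j: math.dist(p, means[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each mean to the average of its assigned points.
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # keep the old mean if a cluster ends up empty
                means[j] = (sum(p[0] for p in members) / len(members),
                            sum(p[1] for p in members) / len(members))
    return means, labels


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
means, labels = kmeans(pts, k=2)
print(labels)  # prints: [0, 0, 0, 1, 1, 1]
```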
The subject who collected this data visited 6 distinct places. If we set the number of clusters to 6, the clusters found by both algorithms include transition points (positions logged while the subject was moving between places), a less than ideal situation.
Figure: (a) Raw data clustered using the k-means algorithm with k = 6; (b) raw data clustered using the GMM algorithm with 6 Gaussians.
Another issue is that both algorithms require the number of clusters to be specified in advance. (There are variants of k-means and GMM that estimate the number of clusters themselves.)
To eliminate the transition points and determine the number of clusters, we apply a time-based filter to the raw location log data. The filter first clusters the location data in the time domain: consecutive samples are grouped along the time axis so that the distances among the positions in one cluster are smaller than a distance threshold. It then eliminates the clusters whose time duration is shorter than a time threshold.
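A minimal sketch of such a filter follows. It assumes samples are (timestamp, latitude, longitude) tuples sorted by time, measures distances with the haversine formula, and reads "the distances among the positions in one cluster" literally: a new sample joins the current cluster only if it lies within the distance threshold of every sample already in it. The names `time_filter`, `dist_m` and `min_duration_s` are illustrative, not from the project.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (timestamp in seconds, latitude, longitude)

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def time_filter(trace: Sequence[Sample], dist_m: float,
                min_duration_s: float) -> List[List[Sample]]:
    """Group consecutive samples into clusters whose pairwise distances stay
    below dist_m, then keep only clusters lasting at least min_duration_s."""
    clusters: List[List[Sample]] = []
    current: List[Sample] = []
    for s in trace:  # trace must be sorted by timestamp
        # The new sample may join only if it is close to every current member.
        if all(haversine_m(s[1], s[2], m[1], m[2]) < dist_m for m in current):
            current.append(s)
        else:
            clusters.append(current)
            current = [s]
    if current:
        clusters.append(current)
    # Drop short clusters: these are mostly transition points.
    return [c for c in clusters if c[-1][0] - c[0][0] >= min_duration_s]


trace = [(0, 47.0, -122.0), (2, 47.0, -122.0), (4, 47.00001, -122.0),   # stay A
         (6, 47.01, -122.0),                                            # in transit
         (8, 47.02, -122.0), (10, 47.02, -122.0), (12, 47.02, -122.0)]  # stay B
places = time_filter(trace, dist_m=50.0, min_duration_s=4.0)
print([[s[0] for s in c] for c in places])  # prints: [[0, 2, 4], [8, 10, 12]]
```

Checking against every member costs time proportional to the cluster size for each new sample; comparing against a running centroid is a cheaper approximation, at the cost of letting a cluster drift slowly.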
Figure: (a) Clustering in the time domain; (b) eliminating small clusters.
Now, we apply the clustering algorithms to the filtered location data. Each algorithm can now focus on the areas of interest. Because the filtered data is well separated spatially, both algorithms pick out the same 6 clusters.
Figure: (a) Filtered data clustered using the k-means algorithm with k = 6; (b) filtered data clustered using the GMM algorithm with 6 Gaussians.
Labeling the places
Once the significant places are found, the user can label them.
Place Extractor in action
We have a simple application, called Position Plotter, that shows the user's current location on a map and the place the user is currently in. The place is determined from the results of either the k-means or the GMM clustering algorithm. As noted before, for k-means one picks the closest cluster, and for GMM one picks the Gaussian with the highest posterior probability. Given a distance threshold or a probability threshold, we can also label outlier points that should not be assigned to any place. Below we show all of the original raw data labeled using the GMM learned from the time-filtered data. The place is denoted by the color of each data point; white points are not assigned to any place. Locations that the time filter originally removed, but that are still close to an extracted place, are labeled correctly.
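The two assignment rules can be sketched as follows, assuming positions are expressed in a local planar frame in meters and, for the GMM, that the weights, means and diagonal variances have already been learned (for example by EM). The function names and the diagonal-covariance restriction are assumptions for illustration. One design choice deserves a note: because the posteriors of all components always sum to 1, a point far from every place can still get a posterior near 1 for the nearest component, so this sketch applies the outlier threshold to the mixture density p(x) rather than to the posterior.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # local planar position in meters (assumption)


def nearest_mean(p: Point, means: Sequence[Point], max_dist: float) -> Optional[int]:
    """k-means rule: index of the closest mean, or None if it is too far away."""
    j = min(range(len(means)), key=lambda i: math.dist(p, means[i]))
    return j if math.dist(p, means[j]) <= max_dist else None


def gaussian_pdf(p: Point, mean: Point, var: Point) -> float:
    """Density of a 2-D Gaussian with diagonal covariance diag(var)."""
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    norm = 2 * math.pi * math.sqrt(var[0] * var[1])
    return math.exp(-0.5 * (dx * dx / var[0] + dy * dy / var[1])) / norm


def gmm_place(p: Point, weights: Sequence[float], means: Sequence[Point],
              variances: Sequence[Point], min_density: float) -> Optional[int]:
    """GMM rule: component with the highest posterior, or None for outliers."""
    # prior * likelihood for each component; normalizing gives the posteriors.
    joint = [w * gaussian_pdf(p, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)  # mixture density p(x)
    if total <= 0.0 or total < min_density:
        return None  # far from every component: not assigned a place
    posteriors = [q / total for q in joint]
    return max(range(len(posteriors)), key=posteriors.__getitem__)


weights = [0.5, 0.5]
means = [(0.0, 0.0), (100.0, 0.0)]
variances = [(25.0, 25.0), (25.0, 25.0)]  # 5 m standard deviation per axis
for p in [(1.0, 1.0), (99.0, 0.0), (50.0, 0.0)]:
    print(p, gmm_place(p, weights, means, variances, min_density=1e-6),
          nearest_mean(p, means, max_dist=20.0))
```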
Learning Places Over Time
Over time, new places can be identified from log files of the user's daily activities. New places could be defined as clusters of locations where the user spends significant amounts of time but that are not assigned a label under the current model of places.
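One way to find candidate new places, sketched under stated assumptions: take the centroids of the long stays extracted from a new log (for instance with the time-based filter), drop those that fall within a distance threshold of an already known place, and merge repeat visits to the same unknown spot. The function `find_new_places` and its distance-based matching rule are hypothetical, not part of the project.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # local planar position in meters (assumption)


def find_new_places(stay_centroids: Sequence[Point], known_places: Sequence[Point],
                    max_dist: float) -> List[Point]:
    """Return stay centroids that match no known place, merging repeats.

    A stay matches a place when its centroid lies within max_dist of it.
    Unmatched stays within max_dist of an already-found candidate are treated
    as repeat visits to that candidate rather than as separate places.
    """
    candidates: List[Point] = []
    for c in stay_centroids:
        if any(math.dist(c, p) <= max_dist for p in known_places):
            continue  # already explained by the current model of places
        if any(math.dist(c, q) <= max_dist for q in candidates):
            continue  # a repeat visit to a candidate found earlier
        candidates.append(c)
    return candidates


known = [(0.0, 0.0), (100.0, 0.0)]
stays = [(2.0, 1.0), (300.0, 0.0), (301.0, 2.0), (98.0, -1.0), (0.0, 500.0)]
print(find_new_places(stays, known, max_dist=20.0))  # prints: [(300.0, 0.0), (0.0, 500.0)]
```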
Related Work