Proposals for Ground Truth: Place, Contacts, Sedentary Activity

Following discussion with several expert backers, we’re proposing the following labels for ground truth on Places Visited, Contact Relationships, and Sedentary Activities.  If you have any feedback, please share it in the comments section or send directly to us at

Place Mining

Places Visited
   The data on places visited captures a user’s mobility patterns using semantic place categories that do not include any absolute location coordinates. We use a relatively standard approach to extracting significant places from location data (e.g., clustering WLAN, GPS, and GSM data over time). Each resulting place is initially tied to a geographic location and one or more visit events that include arrival and departure times. Participants will review the (geographically pinned) places on a map and assign each one of the labels below. The data received by backers will include only place labels, with no absolute location information.
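As a rough sketch of the kind of significant-place extraction described above, here is a simple stay-point detector over timestamped GPS fixes. The thresholds, record shapes, and the specific detection method are illustrative assumptions, not our actual pipeline (which also folds in WLAN and GSM data):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon, time) fixes."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def stay_points(track, dist_m=100, min_stay_s=300):
    """Extract candidate places from a time-ordered track of
    (lat, lon, unix_time) fixes. A stay point is the centroid of a run
    of fixes that stay within dist_m of the run's first fix for at
    least min_stay_s seconds; arrival/departure times come along."""
    stays, i, n = [], 0, len(track)
    while i < n:
        j = i + 1
        while j < n and haversine_m(track[i], track[j]) <= dist_m:
            j += 1
        run = track[i:j]
        if run[-1][2] - run[0][2] >= min_stay_s:
            lat = sum(p[0] for p in run) / len(run)
            lon = sum(p[1] for p in run) / len(run)
            stays.append({"lat": lat, "lon": lon,
                          "arrival": run[0][2], "departure": run[-1][2]})
        i = j
    return stays
```

Each detected stay point would then be pinned on a map for the participant, who picks a label from the taxonomy below.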

Major Place Category: Minor Place Categories
Personal: Home, Work, Friend’s house, Family member’s house
Automotive: Automobile club, Parking, Car parts, Car rental, Repair service, Car dealership/repair, Car wash, Gas station
Business: Bank, Service business, ATM, Convention/Exhibition center, Currency exchange, Manufacturing business
Education: University/College, School, Nursery/Pre-school, Elementary school, Middle school, High school
Emergency: Pharmacy, General practitioner, Specialist, Dental surgeon/Dentist, Veterinarian, EMS, Fire station, Hospital, Police station
Entertainment: Art gallery, Arcade, Casino, Cinema, Museum, Night life/Disco, Stage, Winery
Food & Drink: Fast food, Bar, Ice cream, Pizzeria, Restaurant
Government: Court house, Embassy, Government office, Prison
Lodging: Camping, Guest house, Hotel, Recreational camp, Youth hostel
Other: Travel agency, Cemetery, Others
Recreation: Amusement park, Beach, Fairground, National park, National forest, State park, Park, Zoo/Aquarium, Stadium/Arena, Outdoor sport, Bowling alley, Golf course, Ice rink, Sports center, Swimming pool, Tennis court, Marina, Squash court, Pool hall, Hiking ground, Ski resort, Fitness club, Others
Public Services: Library, Post office, Tourist information
Service Shops: Shopping center, Service shop, Specialty store, Grocery
Tourist Attraction: Building, Monument, Mountain, Church, Other tourist attractions
Traffic Related: Border post/Frontier crossing, Mountain pass, Rest area
Travel: Airline access, Airport, Ferry terminal, Railway station
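To make the privacy property above concrete (backers receive labels and visit times, never coordinates), the release step might look like the following sketch; all field names here are assumptions for illustration, not a finalized schema:

```python
def anonymize_place(place):
    """Return a backer-facing copy of a labeled place record with all
    absolute location fields (lat/lon) stripped out."""
    return {
        "place_id": place["place_id"],
        "major_category": place["major_category"],
        "minor_category": place["minor_category"],
        # visit events keep only arrival/departure timestamps
        "visits": list(place["visits"]),
    }
```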

Contact Relationships

   Contact relationship labels capture the participant–contact relationship category for each anonymous contact communicated with during the data collection period. Participants will use the following labels when labeling contact relationships.

Contact Category: Contact Relationships
Family: Daughter, Son, Mother, Father, Brother, Sister, Aunt, Uncle, Nephew, Niece, Cousin, Grandfather, Grandmother, Grandson, Granddaughter, Mother-in-law, Father-in-law, Brother-in-law, Sister-in-law
Romantic: Husband, Wife, Spouse, Domestic Partner, Significant Other
Friendship: Best Friend, Friend, Significant Other
Organizational: Boss, Employee, Co-Worker, Business Partner, Teacher, Student, Classmate, Religious Leader, Religious Group Member
Community: Neighbor, Member of Community Group

Sedentary Activities
   Sedentary activity labels will be collected via a simple two-part lockscreen questionnaire: first, are you seated? If so, what are you doing? The second answer may take any of the following values:

Sedentary Behaviors
Eating, In Transit, Hobby, Playing a Game, Reading, Socializing, Using Computer, Watching Television, Working
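The two-part questionnaire maps naturally onto a small label record. Here is a sketch; the record format is our assumption for illustration, not a finalized schema:

```python
SEDENTARY_BEHAVIORS = {
    "Eating", "In Transit", "Hobby", "Playing a Game", "Reading",
    "Socializing", "Using Computer", "Watching Television", "Working",
}

def sedentary_label(seated, behavior=None):
    """Build a two-part sedentary label: the behavior answer is asked,
    and required, only when the participant reports being seated."""
    if not seated:
        return {"seated": False}
    if behavior not in SEDENTARY_BEHAVIORS:
        raise ValueError("unknown sedentary behavior: %r" % (behavior,))
    return {"seated": True, "behavior": behavior}
```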


Key Insights from the First-Ever Data Crowdfunding Campaign

Our campaign was funded successfully at 134% of its target. To the best of our knowledge, this is the first-ever example of a crowdfunded data set. So while the experiment is still ongoing, we wanted to share some of our findings from the fundraising campaign!

Contributions by Field

Backers and Contributions
   We received support from groups representing several thousand researchers and students globally, across a wide variety of disciplines. While the percentage of funds received from each discipline (see above) in part reflects the communities we marketed to, the New York Times article about the campaign attracted support from a much more diverse group. As shown above, a majority of funds came from start-ups that paid to include a custom ground truth label in the data set. Academic and industrial researchers working on the Internet of Things contributed the next largest portion of funding at over $3500; researchers in Sensors, Data Science, Networks, and HCI formed the next largest segments, followed by a variety of other disciplines including Data Journalism, Finance, Psychology, and Urban Planning.

It’s also interesting to note that while the majority of contributions came in small denominations (<=$50) from academic researchers (104 of 158 total contributions), the bulk of funds came from groups that wanted to add their own ground truth label to the data. The chart below shows the distribution of funds received across the available Indiegogo perks.


Some Key Insights on Data Crowdfunding
   We’ll learn more as we continue the experiment, but several key insights regarding the funding process have emerged. We share them here in case other individuals or institutions are interested in crowdfunded data collection.

  • Insight: Institutional approval chains are not a natural match for the excitement and urgency of crowdfunding.  Nearly all of our backers are from academic, commercial, or non-profit institutions (very few independent individuals). As such, many needed to obtain institutional approvals to: (1) use funds to purchase data via crowdfunding, and/or (2) use the data for institutional research. For many backers, both (1) and (2) took days or weeks from the time they decided to back to the time the approvals were obtained. We found that for some institutions it was helpful to prepare a customized invoice and data license to accompany the Indiegogo transaction. The “IRB Kit” was also a useful document for academic institutions that needed approval from internal review boards or ethics committees.
  • Insight: Researchers and data scientists are busy people and most do not frequent crowdfunding sites – the best way to reach them is direct email, followed by Facebook, then LinkedIn. A majority of our backers are researchers or data scientists at large institutions – all are very busy people, many have overfull inboxes and slow response times when it comes to communication outside their immediate circle of colleagues. It helped to start getting the word out via email and social media about 6 months before the campaign launched. However, a significant effort was required to connect with many backers even immediately before and after launch. The most effective method proved to be personal follow-up emails (sorry if we spammed you! 🙂 ) – over 60% of backers came to our campaign page via an email link. Facebook and LinkedIn were also effective in advertising the campaign to large pre-existing groups (e.g., professional IoT, Machine Learning, and/or Deep Learning groups). We also found that a majority of our backers had never used Indiegogo before.
  • Insight: The greatest perceived value is in customization of the data set. As shown above, most of the funds received were for adding a custom ground truth label to the data collection campaign.  In this way, several backers were able to customize the data to answer a particular question of interest at their institution.  While research has shown that even completely unlabeled data holds significant value (e.g., it can be mined to reveal significant patterns, one stream of data can be used as a label for another), our observation is that customization is critical for a crowdfunded mobile data set.  Most researchers backing the project indicated that they were specifically interested in the labels we are collecting or that they would need their own labels in order to make use of the data.  This may be different for different types of data (e.g., climate, astronomy), but it does raise the question of whether or not a “one size fits all” data set is actually achievable in this area of research.

We’ll share additional insights as we move forward and would love to hear any feedback you have in the comments below or by email at


Boost! 100+GB Mobile-Social-Sensor-System Data Guaranteed


Most emailed in NY Times Technology this week: CrowdSignals Aims to Create a Marketplace for Smartphone Sensor Data
In KDnuggets: Building Big Mobile Social Sensor Dataset

Help Us Boost The Dataset!

At our current level of funding we’re guaranteeing 100+GB of data from 30 volunteers for 30 days. This includes sensor, social, system, and interaction data in addition to ground truth on contact relationships, places visited, and 2 additional phenomena to be selected by Backers. But we can do better! With your support we can boost the diversity and density of ground truth labels, making the data more useful for an even broader spectrum of researchers and data scientists!

  • It only costs $2 per academic researcher or $5 per data scientist to contribute
  • Visit the Campaign!

Help us prove the concept and receive Big Data at a tiny fraction of the cost. Support the campaign at any level and share or tweet the news!
Contact us directly with any questions or feedback:

Phase 1: 100+GB of Data for Research and Products

Dear Colleagues,

Today we launch Phase 1 of our campaign on Indiegogo! We’re collecting 100+ GB (over 20K hours!) of rich sensor, social, system, interaction, and ground truth data from smartphones and smartwatches. We’re confident we can create an excellent dataset: the real experiment is in crowdfunding and community.

We’re asking for your help to generate funds that will pay volunteers and administrative staff. In return, we’ll share all collected data, sample code, and a direct connection to a community of 1,000s of researchers and developers.

More about the campaign:

  • Donations are just $2 per academic researcher and $5 per data scientist or engineer!
  • 100+ GB of sensor, social, system, and interaction data
  • Precise ground truth labels
  • Executed by AlgoSnap, a bootstrapped, Seattle-based start-up
  • Advised by:
    – Andrew Campbell (Dartmouth)
    – Deborah Estrin (Cornell)
    – Henry Tirri (Aalto U)
    – Jason Hong (CMU)

Please support this crowdfunded dataset and/or forward to any lists or colleagues you think may be interested!