Following discussions with several expert backers, we’re proposing the following ground truth labels for Places Visited, Contact Relationships, and Sedentary Activities. If you have any feedback, please share it in the comments section or send it directly to us at email@example.com.
The data on places visited captures a user’s mobility patterns using semantic place categories that do not include any absolute location coordinates. We’re using a fairly standard approach to extracting significant places from location data (e.g., clustering WLAN, GPS, and GSM data over time). Each resulting place is initially tied to a geographic location and to one or more visit events with arrival and departure times. CrowdSignals.io participants will review these (geographically pinned) places on a map and assign one of the following labels to each. The data received by CrowdSignals.io backers will include only place labels, with no absolute location information.
| Major Place Category | Minor Place Category |
|---|---|
| | Home, Work, Friend’s house, Family member’s house |
| | Automobile club, Parking, Car parts, Car rental, Repair service, Car dealership/repair, Car wash, Gas station |
| | Bank, Service business, ATM, Convention/Exhibition center, Currency exchange, Manufacturing business |
| | University/College, School, Nursery/Pre-school, Elementary school, Middle school, High school |
| | Pharmacy, General practitioner, Specialist, Dental surgeon/Dentist, Veterinarian, EMS, Fire station, Hospital, Police station |
| | Art gallery, Arcade, Casino, Cinema, Museum, Night life/Disco, Stage, Winery |
| Food & Drink | Fast food, Bar, Ice cream, Pizzeria, Restaurant |
| | Court house, Embassy, Government office, Prisons |
| | Camping, Guest house, Hotel, Recreational camp, Youth hostel |
| | Travel agency, Cemetery, Others |
| | Amusement park, Beach, Fairground, National park, National forest, State park, Park, Zoo/Aquarium, Stadium/Arena, Outdoor sport, Bowling alley, Golf course, Ice rink, Sports center, Swimming pool, Tennis court, Marina, Squash court, Pool hall, Others, Hiking ground, Ski resort, Fitness club |
| | Library, Post office, Tourist information |
| | Shopping center, Service shop, Specialty store, Grocery |
| | Building, Monument, Mountain, Other tourist attractions, Church |
| | Border post/Frontier crossing, Mountain pass, Rest area |
| | Airline access, Airport, Ferry terminal, Railway station |
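The post describes place extraction only at a high level (clustering location data over time into places with timed visits). As a rough illustration of one common way to do this, here is a minimal sketch of a time-threshold stay-point detector over GPS fixes, followed by a greedy merge of nearby stays into places and an export step that drops all coordinates. It uses GPS only for simplicity, whereas the post also mentions WLAN and GSM data; the thresholds (200 m, 20 minutes, 100 m) and every function and class name are hypothetical, not the CrowdSignals.io pipeline. Keeping visit times in the export is also an assumption.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical thresholds, chosen for illustration only.
STAY_RADIUS_M = 200.0    # fixes within this distance of the anchor fix form one stay
MIN_STAY_S = 20 * 60     # a stay must last at least 20 minutes
MERGE_RADIUS_M = 100.0   # stays closer than this are treated as the same place


@dataclass
class Fix:
    t: float    # Unix timestamp in seconds
    lat: float
    lon: float


@dataclass
class Place:
    lat: float
    lon: float
    visits: List[Tuple[float, float]] = field(default_factory=list)  # (arrival, departure)
    label: Optional[str] = None  # set by the participant on the review map


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def stay_points(fixes):
    """Yield (lat, lon, arrival, departure) for each stay in time-ordered fixes."""
    i, n = 0, len(fixes)
    while i < n:
        j = i + 1
        # Extend the window while fixes remain within STAY_RADIUS_M of the anchor fix.
        while j < n and haversine_m(fixes[i].lat, fixes[i].lon,
                                    fixes[j].lat, fixes[j].lon) <= STAY_RADIUS_M:
            j += 1
        if fixes[j - 1].t - fixes[i].t >= MIN_STAY_S:
            window = fixes[i:j]
            yield (sum(f.lat for f in window) / len(window),
                   sum(f.lon for f in window) / len(window),
                   window[0].t, window[-1].t)
            i = j      # resume scanning after the stay
        else:
            i += 1     # too short to count: slide the anchor forward


def cluster_places(stays):
    """Greedily attach each stay to the first known place within MERGE_RADIUS_M."""
    places: List[Place] = []
    for lat, lon, arrival, departure in stays:
        for place in places:
            if haversine_m(lat, lon, place.lat, place.lon) <= MERGE_RADIUS_M:
                place.visits.append((arrival, departure))
                break
        else:
            places.append(Place(lat, lon, [(arrival, departure)]))
    return places


def export_for_backers(places):
    """Drop all coordinates; keep the participant's label and (assumed) visit times."""
    return [{"label": p.label, "visits": list(p.visits)} for p in places]
```

A fixed-threshold detector like this is easy to reason about but sensitive to GPS noise and indoor signal gaps, which is one plausible reason to combine it with WLAN and GSM observations.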
Contact relationship labels capture the category of the relationship between the participant and each anonymous contact they communicated with during the data collection period. The proposed labels for participants to use when labeling contact relationships are as follows:
- Daughter, Son, Mother, Father, Brother, Sister, Aunt, Uncle, Nephew, Niece, Cousin, Grandfather, Grandmother, Grandson, Granddaughter, Mother-in-law, Father-in-law, Brother-in-law, Sister-in-law
- Husband, Wife, Spouse, Domestic Partner, Significant Other
- Best Friend, Friend, Significant Other
- Boss, Employee, Co-Worker, Business Partner, Teacher, Student, Classmate, Religious Leader, Religious Group Member
- Neighbor, Member of Community Group
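The post does not say how contacts are made anonymous. One common approach, sketched here purely as an assumption, is to replace each raw identifier with a keyed hash (HMAC-SHA256) whose key stays with the study team, so a contact keeps one consistent token across calls and messages, while the relationship label is checked against the proposed vocabulary. `pseudonymize`, `label_contact`, and the study key are hypothetical names.

```python
import hashlib
import hmac

# Relationship vocabulary transcribed from the list above; the set collapses
# the duplicate "Significant Other" entry.
_ROWS = [
    "Daughter, Son, Mother, Father, Brother, Sister, Aunt, Uncle, Nephew, Niece, Cousin, "
    "Grandfather, Grandmother, Grandson, Granddaughter, Mother-in-law, Father-in-law, "
    "Brother-in-law, Sister-in-law",
    "Husband, Wife, Spouse, Domestic Partner, Significant Other",
    "Best Friend, Friend, Significant Other",
    "Boss, Employee, Co-Worker, Business Partner, Teacher, Student, Classmate, "
    "Religious Leader, Religious Group Member",
    "Neighbor, Member of Community Group",
]
RELATIONSHIP_LABELS = frozenset(label for row in _ROWS for label in row.split(", "))


def pseudonymize(contact_id: str, study_key: bytes) -> str:
    """Map a raw identifier (e.g. a phone number) to a stable, opaque token.

    Assumed scheme: HMAC-SHA256 with a secret study key. The same contact always
    yields the same token, and tokens cannot be recomputed without the key.
    """
    return hmac.new(study_key, contact_id.encode("utf-8"), hashlib.sha256).hexdigest()


def label_contact(contact_id: str, relationship: str, study_key: bytes) -> dict:
    """Return a shareable record: pseudonymous contact token plus relationship label."""
    if relationship not in RELATIONSHIP_LABELS:
        raise ValueError(f"unknown relationship label: {relationship!r}")
    return {"contact": pseudonymize(contact_id, study_key), "relationship": relationship}
```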
Sedentary activity labels will be collected with a simple two-part lock-screen questionnaire: first, “Are you seated?” and, if so, “What are you doing?” The answer to the second question may take any of the following values:
Eating, In Transit, Hobby, Playing a Game, Reading, Socializing, Using Computer, Watching Television, Working
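A minimal sketch of how a single lock-screen response could be stored and validated, assuming each answer is timestamped when the prompt is dismissed. `SedentaryResponse` and `record_response` are hypothetical names, not part of any CrowdSignals.io API.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed answers to the second question, transcribed from the list above.
SEDENTARY_ACTIVITIES = (
    "Eating", "In Transit", "Hobby", "Playing a Game", "Reading",
    "Socializing", "Using Computer", "Watching Television", "Working",
)


@dataclass(frozen=True)
class SedentaryResponse:
    timestamp: float          # when the lock-screen prompt was answered (Unix seconds)
    seated: bool              # answer to "Are you seated?"
    activity: Optional[str]   # answer to "What are you doing?", or None if not seated


def record_response(timestamp: float, seated: bool,
                    activity: Optional[str] = None) -> SedentaryResponse:
    """Combine both answers into one record, validating the activity when seated."""
    if not seated:
        # The second question is only asked when seated, so any activity is ignored.
        return SedentaryResponse(timestamp, False, None)
    if activity not in SEDENTARY_ACTIVITIES:
        raise ValueError(f"activity must be one of {SEDENTARY_ACTIVITIES}")
    return SedentaryResponse(timestamp, True, activity)
```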
CrowdSignals.io was successfully funded at 134% of our target. To the best of our knowledge, this is the first-ever example of a crowdfunded data set. While the experiment is still ongoing, we wanted to share some of our findings from the fundraising campaign!
Backers and Contributions
We received support from groups representing several thousand researchers and students around the world and across a wide variety of disciplines. While the percentage of funds received from each discipline (see above) partly reflects the communities we marketed to, the New York Times article on CrowdSignals.io attracted support from a much more diverse group. As shown above, the majority of funds came from start-ups that paid to include a custom ground truth label in the CrowdSignals.io data set. Academic and industrial researchers working on the Internet of Things contributed the next largest portion of funding, at over $3,500. Researchers in Sensors, Data Science, Networks, and HCI formed the next largest segments, followed by a variety of other disciplines including Data Journalism, Finance, Psychology, and Urban Planning.
It’s also interesting to note that while most contributions came in small denominations ($50 or less) from academic researchers (104 of 158 total contributions), the bulk of the funds came from groups that wanted to add their own ground truth label to the data. The chart below shows the distribution of funds received across the available Indiegogo perks.
Some Key Insights on Data Crowdfunding
We’ll learn more as we continue the CrowdSignals.io experiment, but several key insights regarding the funding process have emerged. We share them here in case other individuals or institutions are interested in crowdfunded data collection.
- Insight: Institutional approval chains are not a natural match for the excitement and urgency of crowdfunding. Nearly all of our backers are from academic, commercial, or non-profit institutions (very few independent individuals). As such, many needed to obtain institutional approvals to: (1) use funds to purchase data via crowdfunding, and/or (2) use the CrowdSignals.io data for institutional research. For many backers, both (1) and (2) took days or weeks from the time they decided to back CrowdSignals.io to the time the approvals were obtained. We found that for some institutions it was helpful to prepare a customized invoice and data license to accompany the Indiegogo transaction. The CrowdSignals.io “IRB Kit” was also a useful document for academic institutions that needed approval from internal review boards or ethics committees.
- Insight: Researchers and data scientists are busy people, and most do not frequent crowdfunding sites. The best way to reach them is direct email, followed by Facebook, then LinkedIn. A majority of our backers are researchers or data scientists at large institutions. All are very busy, and many have overfull inboxes and slow response times for communication from outside their immediate circle of colleagues. It helped to start getting the word out via email and social media about 6 months before the campaign launched. However, connecting with many backers still took significant effort, even immediately before and after launch. The most effective method proved to be personal follow-up emails (sorry if we spammed you! 🙂): over 60% of backers came to our campaign page via an email link. Facebook and LinkedIn were also effective for advertising the campaign to large pre-existing groups (e.g., professional IoT, Machine Learning, and/or Deep Learning groups). We also found that a majority of our backers had never used Indiegogo before.
- Insight: The greatest perceived value is in customization of the data set. As shown above, most of the funds received were for adding a custom ground truth label to the data collection campaign. In this way, several backers were able to customize the data to answer a particular question of interest at their institution. While research has shown that even completely unlabeled data holds significant value (e.g., it can be mined to reveal significant patterns, one stream of data can be used as a label for another), our observation is that customization is critical for a crowdfunded mobile data set. Most researchers backing the project indicated that they were specifically interested in the labels we are collecting or that they would need their own labels in order to make use of the data. This may be different for different types of data (e.g., climate, astronomy), but it does raise the question of whether or not a “one size fits all” data set is actually achievable in this area of research.
We’ll share additional insights as we move forward, and we would love to hear any feedback you have in the comments below or by email at firstname.lastname@example.org.
Help Us Boost The Dataset!
At our current level of funding, we’re guaranteeing 100+ GB of data from 30 volunteers over 30 days. This includes sensor, social, system, and interaction data, in addition to ground truth on contact relationships, places visited, and 2 additional phenomena to be selected by backers. But we can do better! With your support we can increase the diversity and density of ground truth labels, making the data useful to an even broader range of researchers and data scientists!
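As a back-of-the-envelope check on what this guarantee implies, assuming data is logged around the clock on every device (an assumption; actual sensing schedules will vary) and decimal units (1 GB = 1,000 MB):

$$
30 \text{ volunteers} \times 30 \text{ days} \times 24 \text{ h/day} = 21{,}600 \text{ volunteer-hours}
$$

$$
\frac{100 \text{ GB}}{21{,}600 \text{ h}} \approx 4.6 \text{ MB/h} \qquad \left(\approx 111 \text{ MB per volunteer-day}\right)
$$

The 21,600 figure is in line with the "over 20K hours" quoted in the launch announcement, and since 100 GB is a floor, these rates are lower bounds.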
- It only costs $2 per academic researcher or $5 per data scientist to contribute
- Visit the Campaign!
Help us prove the concept and receive Big Data at a tiny fraction of the cost. Support the campaign at any level and share or tweet the news!
Contact us directly with any questions or feedback: email@example.com
Today we launch Phase 1 of CrowdSignals.io on Indiegogo! We’re collecting 100+ GB (over 20K hours!) of rich sensor, social, system, interaction, and ground truth data from smartphones and smartwatches. We’re confident we can create an excellent dataset: the real experiment is in crowdfunding and community.
We’re asking for your help to raise funds that will pay volunteers and administrative staff. In return, we’ll share all of the collected data, sample code, and a direct connection to a community of thousands of researchers and developers.
More about CrowdSignals.io:
- Donations are just $2 per academic researcher and $5 per data scientist or engineer!
- 100+ GB of sensor, social, system, and interaction data
- Precise ground truth labels
- Executed by AlgoSnap, a bootstrapped, Seattle-based start-up
- Advised by:
  - Andrew Campbell (Dartmouth)
  - Deborah Estrin (Cornell)
  - Henry Tirri (Aalto U)
  - Jason Hong (CMU)
Please support this crowdfunded dataset and/or forward to any lists or colleagues you think may be interested!