Personal device data is one of the largest drivers of innovation in areas as diverse as computing, journalism, public health, social science, and urban planning, among others. Yet data collection campaigns are expensive, time-consuming, and both technically and legally challenging. In addition, researchers in many fields lack the funding and expertise to collect and analyze data from personal devices. Our goal is to equip students, developers, researchers, and scientists across a variety of fields with the data they need to solve important societal problems, and to collect this data using best practices that inform, engage, and compensate data collection volunteers while respecting their privacy.
Since cost is a fundamental barrier to collection of large, high-quality datasets, CrowdSignals.io will use crowdfunding to finance the collection of a massive, shared dataset at a per sponsor cost that is orders of magnitude less than do-it-yourself data collection. We aim to have enough sponsors to make the data accessible even to students who need data for a thesis or class project (e.g., sponsors pay $1-$2 per data collection volunteer). The hope is that CrowdSignals.io could generate a massive dataset that jumpstarts work on hundreds of problems that cannot currently be solved due to a lack of data.
AlgoSnap is a C corporation founded and run full-time by Evan Welbourne. AlgoSnap is the entity to which all legal agreements with CrowdSignals.io Sponsors and Volunteers will refer. AlgoSnap also built and runs the system that securely collects, anonymizes, stores, and maintains the personal data during collection. AlgoSnap receives advising from a global panel of experts in different areas such as privacy, data transparency, and ethical data collection. AlgoSnap also works with a major Silicon Valley law firm with world-class specialists in data privacy law. For more on AlgoSnap visit our website.
Visit AlgoSnap Website
We have more than a decade of expertise in sensors, signal processing, machine learning, and pattern classification for mobile and smart devices. AlgoSnap was founded to provide solutions that accelerate the development of intelligent algorithms for IoT devices. As such, AlgoSnap's business is fundamentally about support for algorithm development, not the collection and sale of data. Moreover, all funds contributed by sponsors will be used to pay for data collection - this includes compensation for AlgoSnap staff in addition to the data collection volunteers. At the same time, the lack of quality data is a huge, pervasive problem in the development of intelligent algorithms - and one for which there is no suitable solution available. As a first step toward solving the data scarcity problem we're proposing CrowdSignals.io to build at least one massive dataset that can begin to propel research, innovation, and problem solving in many areas.
Our long-term plan for CrowdSignals is to build the most powerful, robust, flexible, and easy-to-use platform for IoT data collection and make it available to the community for efficient, low-overhead data collection campaigns. We'll leverage our experience in building data collection and analytics infrastructure for numerous institutions, including UW, Intel, Nokia, and Samsung. We're testing the CrowdSignals platform with limited partners as of early 2016.
Regarding legal structure, AlgoSnap is a C corporation. If we can validate the CrowdSignals.io concept by crowdfunding and executing several initial data collection campaigns then we will look into a legal structure that allows CrowdSignals.io to be spun out as a non-profit.
In late 2015 we collected feedback from academia and industry on priorities for: (1) data collection parameters (e.g., data recorded, volunteer demographics, device categories), (2) ground truth event labels to collect from volunteers (e.g., activities, events, and situations) and (3) anonymization techniques and data license terms that maximize the utility of the data collected while offering an adequate level of protection for volunteers.
We're also gathering data on the number of potential sponsors in order to define appropriate funding levels for the crowdfunding campaign when it launches in early 2016. As noted above, we aim to have enough sponsors to make the data accessible even to students (e.g., $1-$2 per data collection volunteer), but this will depend on the community response.
To make sure we take your feedback into account, please complete our online surveys:
Our 2 minute Survey and Vote On The Data Collection Parameters
Our 2 minute Survey and Vote On The Labels To Collect
Visit CrowdSignals.io Main Page
Contact CrowdSignals.io Organizers
The cost of the data per sponsor will be determined by the number of sponsors that join the CrowdSignals.io crowdfunding campaign. We're actively refining estimates of the number of potential sponsors in order to set the cost associated with each funding level in our early 2016 crowdfunding campaign. As noted above, our goal is to drive the cost so low that we can make the data accessible to as many sponsors as possible, including students working on thesis or class projects (e.g., each sponsor pays $1-$2 USD per volunteer). For example, a dataset containing rich data from 100 users would cost approximately $100 USD instead of $50k-$100k+ for a do-it-yourself data collection. The price and terms of use will vary slightly for academic vs. commercial use. We'd love to hear your feedback on the cost of the dataset.
Contact CrowdSignals.io Organizers
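To make the per-sponsor arithmetic above concrete, here is a minimal sketch. It assumes, purely for illustration, that the total cost of a shared campaign is comparable to the do-it-yourself figure quoted above ($50k-$100k for 100 users) and is split evenly across sponsors. The actual CrowdSignals.io pricing model is not specified here, and the function names are hypothetical.

```python
import math


def per_sponsor_cost(total_cost_usd: float, num_sponsors: int) -> float:
    """Even split of a campaign's total cost across sponsors (illustrative assumption)."""
    if num_sponsors <= 0:
        raise ValueError("num_sponsors must be positive")
    return total_cost_usd / num_sponsors


def sponsors_needed(total_cost_usd: float, target_per_sponsor_usd: float) -> int:
    """Smallest number of sponsors that brings the even split down to the target price."""
    if target_per_sponsor_usd <= 0:
        raise ValueError("target_per_sponsor_usd must be positive")
    return math.ceil(total_cost_usd / target_per_sponsor_usd)


if __name__ == "__main__":
    volunteers = 100
    target_usd = 100.0  # roughly $1 per volunteer, as in the example above
    # Assumed campaign totals, taken from the do-it-yourself range quoted above.
    for total_usd in (50_000.0, 100_000.0):
        n = sponsors_needed(total_usd, target_usd)
        per_volunteer = per_sponsor_cost(total_usd, n) / volunteers
        print(f"total ${total_usd:,.0f}: {n} sponsors, ${per_volunteer:.2f} per volunteer")
```

Under these assumptions, reaching the $100 price point would take somewhere between 500 and 1,000 sponsors, which is why the final cost depends so strongly on community response.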
Yes, the data collected in the CrowdSignals.io effort will be licensed, via legal agreement, to a declared group that includes yourself. However, a sponsor cannot share the data beyond the declared group. The dataset will also be watermarked with a fingerprint that is unique to each sponsor so that any unauthorized sharing of the data can be traced. Each sponsor will be responsible for appointing a site manager who supervises access to the data.
Privacy and legal constraints are key reasons why we do not allow the dataset to be shared with any third party. Sponsors gaining access to the data will need to sign a legal agreement with AlgoSnap in which they commit to protect the personal data, for example by not reverse engineering the identities of the data collection volunteers or sharing the data with other entities who have not signed this legal agreement and might use the data for unethical or illegal purposes.
Contact CrowdSignals.io Organizers
Funds collected by the crowdfunding campaign will be applied to compensate volunteers as well as to pay for any equipment, cloud services, software development, AlgoSnap admin personnel, consulting, or legal services.
We may define stretch goals if we surpass the initial funding goal set in a crowdfunding campaign.
You can read our paper "CrowdSignals: A Call to Crowdfund the Community’s Largest Mobile Dataset" published at the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2014).
Please note that not all technical details in the paper (e.g., specific architecture and software components) are consistent with our current approach. The same is true of the notion of "open data", because the data collected in CrowdSignals.io will initially be available only to sponsors of the project. However, the main ideas and motivations are there, as well as a good summary of costs associated with previous large-scale data collections.
Note that there are two key reasons why we are not making the data collected in CrowdSignals.io immediately open to everyone: (1) Sponsors gaining access to the data will need to sign a legal agreement with AlgoSnap in which they commit to protect the personal data, for example by not reverse engineering the identities of the volunteers or sharing the data with other entities that have not signed this legal agreement. This level of protection would not be possible with a completely open dataset. (2) It would be unfair for sponsors who contributed funds to the campaign to later find out that other (possibly competing) groups obtained free access to the data at the same time. However, we will release the data more widely and free of charge 12-18 months after it is collected.
See Our UBICOMP 2014 Publication