Backer Profile: Flora Salim, Senior Lecturer at RMIT University

Dr. Flora Salim is a Senior Lecturer in the Computer Science and IT department, School of Science, RMIT University. Her research interests are mobile data mining, context-aware computing, activity and behaviour recognition, and context and semantic learning. Her research seeks to enhance users’ experience by monitoring their behaviours and how they use and interact with their environments, such as smart homes, smart cities, and urban transport and mobility systems, using ambient technologies and ubiquitous computing. Her recent work focuses on analysing and predicting fine-grained behaviours in human mobility by leveraging heterogeneous sensor data. Previously, she was a Research Fellow at the RMIT Spatial Information Architecture Laboratory and an Honorary Research Fellow and Associate Lecturer at the Faculty of Information Technology, Monash University. She obtained her PhD in Computer Science from Monash University in 2009. She has secured grants from the Australian Research Council, IBM Smarter Cities Lab, the Australian Urban Research Infrastructure Network, and numerous industry partners.

Flora will use the data to conduct multiple activity recognition tasks for the purpose of situation recognition.

Follow Flora on Twitter

Featured Sponsor: Goergen Institute for Data Science

We’re introducing our top sponsors through a series of blog posts over the next few weeks, and we’re happy to announce that the leading academic sponsor of the campaign is the Goergen Institute for Data Science at the University of Rochester. The Goergen Institute supports interdisciplinary research in data science across the University of Rochester’s College of Arts and Sciences, the Hajim College of Engineering and Applied Sciences, the School of Medicine and Dentistry and the University of Rochester Medical Center, and the Eastman School of Music. The Institute offers undergraduate and master’s programs in Data Science. In 2017, the Institute and the Department of Computer Science will move into Wegmans Hall, a state-of-the-art facility now under construction.

For more on the Goergen Institute see:

Understanding the neural encoding of time and space in real world memories

We’re introducing the first in a series of fascinating guest posts from experts who have backed the campaign.  Today’s post is from Simon Dennis, Head of the School of Psychology at the University of Newcastle and CEO of Unforgettable Technologies LLC.
— Evan

We live in exciting times. In the cognitive sciences, the big news for the last twenty or thirty years has been the ability to look inside the functioning brain in real time. A lot has been learned but, as always, science is hard and progress occurs in fits and starts. A critical piece that has been missing is the ability to characterize the environment in which people operate. In the early 1990s, John Anderson introduced rational analysis, which uses statistical analyses of environmental contingencies in order to understand the structure of cognition. Despite showing early promise, the method was stymied by a lack of technologies to collect the environmental data. Now the situation has changed. Smartphones, watches and other wearables are starting to provide us with access to environmental data at scale. For the first time, we can look at cognition from the inside and outside at the same time. Efforts such as this one are going to be key to realizing the potential.

As an example of what is possible, I would like to highlight a line of research I have been engaged in with Per Sederberg, Vishnu Sreekumar, Dylan Nielson and Troy Smith, which was published in the Proceedings of the National Academy of Sciences last year.

The story starts with rats. In 2014, the Nobel Prize in Physiology or Medicine was awarded to John O’Keefe, May-Britt Moser and Edvard Moser for their discovery of place and grid cells in the rat hippocampus. Within the medial temporal lobe are cells that fire when a rat is in a given location in a room. The cells are laid out in regular patterns creating a coordinate system. For rats, we are talking about spatial scales of meters and temporal scales of seconds. We were interested in whether the same areas would be involved as people attempted to remember experiences over the much longer spatial and temporal scales on which we operate.

To test the idea, we had people wear a smartphone in a pouch around their necks for 2-4 weeks. The phone captured images, accelerometry, GPS coordinates and audio (obfuscated for privacy) automatically as they engaged in their activities of daily living. Later, we showed them their images and asked them to recall their experiences while we scanned their brains using functional magnetic resonance imaging.

We knew when and where each image had been taken, so we were able to create a matrix of the distances between the images in time and in space. Rather than meters and seconds our distances ranged over kilometers and weeks. We then created similar matrices by examining the pattern of neural activity in each of many small regions across the brain for each image. If the pattern of distances in the neural activity is able to predict the pattern of distances in time and/or space then one can deduce that that region is coding information that is related to time and/or space.
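The comparison of distance matrices described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis pipeline from the paper: the function names, the use of 1 minus Pearson correlation as the neural distance, the plain Pearson correlation between matrices, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np

def pairwise_abs_diff(values):
    """Matrix of absolute differences between every pair of scalar values."""
    v = np.asarray(values, dtype=float)
    return np.abs(v[:, None] - v[None, :])

def pairwise_haversine_km(lat_deg, lon_deg):
    """Matrix of great-circle distances (km) between every pair of points."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    cos_prod = np.cos(lat)[:, None] * np.cos(lat)[None, :]
    a = np.sin(dlat / 2) ** 2 + cos_prod * np.sin(dlon / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def neural_distance(patterns):
    """1 - Pearson correlation between the activity patterns of each image pair.

    patterns has one row per image and one column per voxel in the region.
    """
    return 1.0 - np.corrcoef(np.asarray(patterns, dtype=float))

def upper_triangle(matrix):
    """Entries above the diagonal, so each image pair is counted once."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def matrix_correlation(a, b):
    """Pearson correlation between the off-diagonal entries of two matrices."""
    return float(np.corrcoef(upper_triangle(a), upper_triangle(b))[0, 1])

# Synthetic data (assumption): 40 images over three weeks, one 50-voxel region
# whose activity pattern drifts with time and is unrelated to location.
rng = np.random.default_rng(0)
n_images, n_voxels = 40, 50
hours = np.sort(rng.uniform(0, 24 * 21, n_images))
lat = 40.0 + rng.normal(0, 0.05, n_images)
lon = -83.0 + rng.normal(0, 0.05, n_images)
s = hours / hours.max()
start, end = rng.normal(size=n_voxels), rng.normal(size=n_voxels)
noise = 0.1 * rng.normal(size=(n_images, n_voxels))
patterns = np.outer(1 - s, start) + np.outer(s, end) + noise

time_dist = pairwise_abs_diff(hours)
space_dist = pairwise_haversine_km(lat, lon)
brain_dist = neural_distance(patterns)

r_time = matrix_correlation(brain_dist, time_dist)
r_space = matrix_correlation(brain_dist, space_dist)
print(f"neural vs. time: r = {r_time:.2f}; neural vs. space: r = {r_space:.2f}")
```

In a real study the same comparison would be repeated for each small brain region, and a region whose neural distances track the temporal or spatial distances would be a candidate for coding time or space.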

We found that areas at the front (anterior) of the hippocampus coded for both time and space. As with the work by O’Keefe, Moser and Moser, it was the hippocampus and surrounding areas that were implicated. What was different, however, is that the regions that were most strongly implicated were at the front of the hippocampus, rather than towards the back as is usually the case. More work is necessary, but one interesting hypothesis is that the scale of both temporal and spatial representations decreases as one moves along the hippocampus towards the back of the brain. Perhaps as people attempt to isolate specific autobiographical events they start with a broad idea of where the event is in time and space and zoom in on the specific event progressively activating representations along the hippocampus.

Beyond this specific hypothesis, this work demonstrates what one can achieve if one combines neuroimaging techniques with experience sampling technologies like smartphones. No doubt it won’t be long before our current efforts are seen as terribly crude. Nonetheless, we have reached a milestone – a place where two powerful techniques intersect – and I think that bodes well for attacking what is in my opinion the most fascinating challenge of all: understanding the human mind.


Nielson, D. M., Smith, T. A., Sreekumar, V., Dennis, S., & Sederberg, P. B. (2015). Human hippocampus represents space and time during retrieval of real-world memories. Proceedings of the National Academy of Sciences, 112(35), 11078-11083.

Proposals for Ground Truth: Place, Contacts, Sedentary Activity

Following discussion with several expert backers, we’re proposing the following labels for ground truth on Places Visited, Contact Relationships, and Sedentary Activities.  If you have any feedback, please share it in the comments section or send directly to us at

Place Mining

Places Visited
   The data on places visited captures a user’s mobility patterns using semantic place categories that do not include any absolute location coordinates. We’re using a relatively standard approach to extracting significant places from location data (e.g., clustering WLAN, GPS, and GSM data over time). The resulting places are initially tied to a geographic location and one or more visit events that include arrival and departure times. Participants will review the (geographically pinned) places on a map and provide one of the following labels for each. The data received by backers will include only place labels with no absolute location information.

Major Place Category: Minor Place Categories
Personal: Home, Work, Friend’s house, Family member’s house
Automotive: Automobile club, Parking, Car parts, Car rental, Repair service, Car dealership/repair, Car wash, Gas station
Business: Bank, Service business, ATM, Convention/Exhibition center, Currency exchange, Manufacturing business
Education: University/College, School, Nursery/Pre-school, Elementary school, Middle school, High school
Emergency: Pharmacy, General practitioner, Specialist, Dental surgeon/Dentist, Veterinarian, EMS, Fire station, Hospital, Police station
Entertainment: Art gallery, Arcade, Casino, Cinema, Museum, Night life/Disco, Stage, Winery
Food & Drink: Fast food, Bar, Ice cream, Pizzeria, Restaurant
Government: Court house, Embassy, Government office, Prisons
Lodging: Camping, Guest house, Hotel, Recreational camp, Youth hostel
Other: Travel agency, Cemetery, Others
Recreation: Amusement park, Beach, Fairground, National park, National forest, State park, Park, Zoo/Aquarium, Stadium/Arena, Outdoor sport, Bowling alley, Golf course, Ice rink, Sports center, Swimming pool, Tennis court, Marina, Squash court, Pool hall, Others, Hiking ground, Ski resort, Fitness club
Public Services: Library, Post office, Tourist information
Service Shops: Shopping center, Service shop, Specialty store, Grocery
Tourist Attraction: Building, Monument, Mountain, Other tourist attractions, Church
Traffic Related: Border post/Frontier crossing, Mountain pass, Rest area
Travel: Airline access, Airport, Ferry terminal, Railway station
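
The place-extraction step described above (clustering location fixes into places with arrival and departure times, then releasing only the labels) can be illustrated with a simple stay-point sketch. This is not the project's actual implementation: it assumes GPS-style latitude/longitude fixes only, and the function names, thresholds, and sample data are hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Visit:
    arrival: float    # timestamp in seconds
    departure: float

@dataclass
class Place:
    lat: float
    lon: float
    visits: list = field(default_factory=list)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(min(1.0, a)))

def detect_stays(fixes, max_radius_m=100.0, min_duration_s=600.0):
    """Find runs of fixes that stay near their first fix for long enough.

    fixes is a time-sorted list of (timestamp_s, lat, lon) tuples.
    Returns a list of (mean_lat, mean_lon, Visit) tuples.
    """
    stays, i = [], 0
    while i < len(fixes):
        j = i + 1
        while j < len(fixes) and haversine_m(
                fixes[i][1], fixes[i][2], fixes[j][1], fixes[j][2]) <= max_radius_m:
            j += 1
        run = fixes[i:j]  # every fix here lies within max_radius_m of fixes[i]
        if run[-1][0] - run[0][0] >= min_duration_s:
            lat = sum(f[1] for f in run) / len(run)
            lon = sum(f[2] for f in run) / len(run)
            stays.append((lat, lon, Visit(run[0][0], run[-1][0])))
            i = j
        else:
            i += 1
    return stays

def group_into_places(stays, merge_radius_m=150.0):
    """Greedily merge stays that fall near an already known place."""
    places = []
    for lat, lon, visit in stays:
        for place in places:
            if haversine_m(lat, lon, place.lat, place.lon) <= merge_radius_m:
                place.visits.append(visit)
                break
        else:
            places.append(Place(lat, lon, [visit]))
    return places

def strip_coordinates(places, labels):
    """Pair each place with its participant-supplied label and drop lat/lon."""
    return [{"label": label,
             "visits": [(v.arrival, v.departure) for v in place.visits]}
            for place, label in zip(places, labels)]

# Synthetic day (assumption): home, commute, work, commute, home again.
home, work = (47.6000, -122.3300), (47.6200, -122.3500)
fixes = [(t, *home) for t in range(0, 1801, 60)]
fixes += [(2400, 47.6070, -122.3370), (3000, 47.6140, -122.3430)]
fixes += [(t, *work) for t in range(3600, 7201, 60)]
fixes += [(7800, 47.6140, -122.3430), (8400, 47.6070, -122.3370)]
fixes += [(t, *home) for t in range(9000, 10801, 60)]

places = group_into_places(detect_stays(fixes))
released = strip_coordinates(places, ["Home", "Work"])
print(released)
```

In this sketch the participant's labels replace the coordinates before anything is shared, which mirrors the requirement that backers receive place labels with no absolute location information.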

Contact Relationships
   Contact relationship labels capture the participant-contact relationship category for each anonymous contact communicated with during the data collection period.  The proposed labels for use by participants when labeling contact relationships are as follows.

Contact Category: Contact Relationships
Family: Daughter, Son, Mother, Father, Brother, Sister, Aunt, Uncle, Nephew, Niece, Cousin, Grandfather, Grandmother, Grandson, Granddaughter, Mother-in-law, Father-in-law, Brother-in-law, Sister-in-law
Romantic: Husband, Wife, Spouse, Domestic Partner, Significant Other
Friendship: Best Friend, Friend, Significant Other
Organizational: Boss, Employee, Co-Worker, Business Partner, Teacher, Student, Classmate, Religious Leader, Religious Group Member
Community: Neighbor, Member of Community Group

Sedentary Activities
   Sedentary activity labels will be collected simply with a two-part lockscreen questionnaire: first, are you seated? Then if so, what are you doing?  The second label may take on any of the following values:

Sedentary Behaviors
Eating, In Transit, Hobby, Playing a Game, Reading, Socializing, Using Computer, Watching Television, Working


Key Insights from the First-Ever Data Crowdfunding Campaign

The campaign was funded successfully at 134% of our target. To the best of our knowledge, this is the first-ever example of a crowdfunded data set. So while the experiment is still ongoing, we wanted to share some of our findings from the fundraising campaign!

Contributions by Field

Backers and Contributions
   We received support from groups representing several thousand researchers and students globally and across a wide variety of disciplines. While the percentage of funds received from each discipline (see above) in part reflects the communities we marketed to, the New York Times article on the campaign attracted support from a much more diverse group. As shown above, a majority of funds came from start-ups that paid to include a custom ground truth label in the data set. Academic and industrial researchers working on the Internet of Things contributed the next largest portion of funding at over $3,500. Researchers in Sensors, Data Science, Networks, and HCI formed the next largest segments, followed by a variety of other disciplines including Data Journalism, Finance, Psychology, and Urban Planning.

It’s also interesting to note that while the majority of contributions came in small denominations ($50 or less) from academic researchers (104 of 158 total contributions), the bulk of funds came from groups that wanted to add their own ground truth label to the data. The chart below shows the distribution of funds received across available Indiegogo perks.


Some Key Insights on Data Crowdfunding
   We’ll learn more as we continue the experiment, but several key insights regarding the funding process have emerged. We share them here in case other individuals or institutions are interested in crowdfunded data collection.

  • Insight: Institutional approval chains are not a natural match for the excitement and urgency of crowdfunding.  Nearly all of our backers are from academic, commercial, or non-profit institutions (very few independent individuals). As such, many needed to obtain institutional approvals to: (1) use funds to purchase data via crowdfunding, and/or (2) use the data for institutional research. For many backers, both (1) and (2) took days or weeks from the time they decided to back to the time the approvals were obtained. We found that for some institutions it was helpful to prepare a customized invoice and data license to accompany the Indiegogo transaction. The “IRB Kit” was also a useful document for academic institutions that needed approval from internal review boards or ethics committees.
  • Insight: Researchers and data scientists are busy people and most do not frequent crowdfunding sites – the best way to reach them is direct email, followed by Facebook, then LinkedIn. A majority of our backers are researchers or data scientists at large institutions. All are very busy people, and many have overfull inboxes and slow response times when it comes to communication outside their immediate circle of colleagues. It helped to start getting the word out via email and social media about 6 months before the campaign launched. However, a significant effort was required to connect with many backers even immediately before and after launch. The most effective method proved to be personal follow-up emails (sorry if we spammed you! 🙂) – over 60% of backers came to our campaign page via an email link. Facebook and LinkedIn were also effective in advertising the campaign to large pre-existing groups (e.g., professional IoT, Machine Learning, and/or Deep Learning groups). We also found that a majority of our backers had never used Indiegogo before.
  • Insight: The greatest perceived value is in customization of the data set. As shown above, most of the funds received were for adding a custom ground truth label to the data collection campaign.  In this way, several backers were able to customize the data to answer a particular question of interest at their institution.  While research has shown that even completely unlabeled data holds significant value (e.g., it can be mined to reveal significant patterns, one stream of data can be used as a label for another), our observation is that customization is critical for a crowdfunded mobile data set.  Most researchers backing the project indicated that they were specifically interested in the labels we are collecting or that they would need their own labels in order to make use of the data.  This may be different for different types of data (e.g., climate, astronomy), but it does raise the question of whether or not a “one size fits all” data set is actually achievable in this area of research.

We’ll share additional insights as we move forward and would love to hear any feedback you have in the comments below or by email at


Boost! 100+GB Mobile-Social-Sensor-System Data Guaranteed


Most emailed in NY Times Technology this week: CrowdSignals Aims to Create a Marketplace for Smartphone Sensor Data
In KDnuggets: Building Big Mobile Social Sensor Dataset

Help Us Boost The Dataset!

At our current level of funding we’re guaranteeing 100+GB of data from 30 volunteers for 30 days. This includes sensor, social, system, and interaction data in addition to ground truth on contact relationships, places visited, and 2 additional phenomena to be selected by Backers. But we can do better! With your support we can boost the diversity and density of ground truth labels, making the data more useful for an even broader spectrum of researchers and data scientists!

  • It only costs $2 per academic researcher or $5 per data scientist to contribute
  • Visit the Campaign!

Help us prove the concept and receive Big Data at a tiny fraction of the cost. Support the campaign at any level and share or tweet the news!
Contact us directly with any questions or feedback:

Phase 1: 100+GB of Data for Research and Products

Dear Colleagues,

Today we launch Phase 1 of the campaign on Indiegogo! We’re collecting 100+ GB (over 20K hours!) of rich sensor, social, system, interaction, and ground truth data from smartphones and smartwatches. We’re confident we can create an excellent dataset: the real experiment is in crowdfunding and community.

We’re asking for your help to generate funds that will pay volunteers and administrative staff. In return, we’ll share all the collected data, sample code, and a direct connection to a community of thousands of researchers and developers.

More about the campaign:

  • Donations are just $2 per academic researcher and $5 per data scientist or engineer!
  • 100+ GB of sensor, social, system, and interaction data
  • Precise ground truth labels
  • Executed by AlgoSnap, a bootstrapped, Seattle-based start-up
  • Advised by:
    – Andrew Campbell (Dartmouth)
    – Deborah Estrin (Cornell)
    – Henry Tirri (Aalto U)
    – Jason Hong (CMU)

Please support this crowdfunded dataset and/or forward to any lists or colleagues you think may be interested!