Using what we know to solve what we don’t

Every two days we generate as much data as was produced in the whole of human history up to 2003. Such a scale of data is almost daunting, with much, if not most, of it recorded consistently and ubiquitously through sensors embedded in the tools of our everyday lives: mobile phones, tablets, barcodes and travel passes. With global mobile phone penetration reaching 95.5% in 2014 (90% in developing nations), the growing use of mobile phones is providing researchers with a veritable feast of data from users. The data are collected not only directly through the sensors in smartphones (GPS, gyroscope, accelerometer, microphone, camera and Bluetooth) but also indirectly through the cellular infrastructure, creating enormous streams of ‘big’ datasets. This big data represents a new kind of research method for scientific endeavour. The sheer size of the datasets calls for novel ways of interpreting, analysing and discussing the knowledge they create. Distinguished by its volume, velocity and variety, much big data will be geographically referenced and in real time - a sea change in statistical analysis from small-scale studies that comment on static data from the past.

So what can big (mobile) data do? Already, the data generated through users’ internet searches has been used by marketers to enhance our online advertising experience with ‘targeted’ ads matched to our frequently searched phrases - a development that some find useful and others unnerving. As mobile phone penetration has skyrocketed, so has the opportunity to further understand consumer behaviour and create personalised advertising. With a vast number of people using their mobiles as their primary internet source, clickstream data (which records users’ browsing habits in a tiny text file of search and viewing history) grants access to the trends in internet searches for shopping, information and entertainment purposes. The development of Google Now demonstrates such a use of big data collection in action for users: a function of the Google search application, it works by recording the frequent actions of users (common locations, popular contacts, calendar appointments etc.) and providing relevant information such as nearby attractions and events, product listings, developing localised news stories, traffic alerts and event reminders, mostly in anticipation of the user’s location and schedule.

Social behaviours

The data deluge is clearly extremely valuable for marketing applications, but it also has the potential to provide new areas and methods for researching social behaviours. Mobile big data has been used to understand the structure of social groupings and how they adapt during critical times. Research in Oaxaca, Mexico analysed cellular data on social responses to an urban earthquake in 2012. Looking at the volume of calls, call duration and extent of social circles, it found that in the minutes after the quake call volume increased, duration decreased and social circles expanded widely. The change in call behaviours is easily explained - a panic situation began during the earthquake, with call volumes increasing (larger increases were seen in cities closer to the epicentre) as inhabitants rapidly made short calls to their immediate family and then to their wider social group to confirm their safety. Previously, this kind of information could only be gained via self-reporting - the ability to provide data-based evidence is thus a step forward.

Mining data from smartphones and the cellular infrastructure now allows us to access detailed knowledge about people, things and events as they happen, potentially anywhere in the world. The dissemination of new scientific information across the globe is another social behaviour investigated with the use of mobile big data. For their 2013 paper ‘The anatomy of a scientific rumour’, De Domenico et al. crawled worldwide Twitter feeds after the research team at CERN in Switzerland announced the discovery of a Higgs boson-like particle. The data enabled the researchers to model the spread of information across the network via tweets and retweets, with users also commenting and giving their personal opinions. Clearly, mining big data for analysis is opening up new methods and areas of scientific inquiry.

City planning

What attracts people to places is other people. Big data can further our understanding of why locations work as they do, opening doors for us to see deeper into the meanings behind places, consider new reasons for what makes some areas successes (busy but efficient and safe) and others failures (congested or desolate, with a high fear of crime), and see the ways in which the public interact with space and infrastructure. Roth et al. (2011) created a map of the flows of movement across London as passengers used Oyster cards to travel on the Tube. They found a polycentric spatial arrangement, composed of large flows organised around a limited number of activity centres rather than one central hub of activity. Knowledge such as this about an area allows city planners to better orientate their designs to locals’ actual use of space. Understanding individual movement patterns will allow planners to improve the efficiency of public transport routes, answering questions such as where to ease congestion by creating new lines or directing new lines to existing stations. The Array of Things is an urban planning project hoping to ensure central Chicago’s ‘success’ as an area using this idea. The plan includes installing hundreds of sensors across the city that would capture data such as light level, ambient volume, pollution, humidity, pollen count, smartphone usage and parking availability. Collecting real-time data on the activity, environment and infrastructure surrounding a sensor has potential uses including providing locals with more fine-grained weather updates and quicker and safer walking and driving routes through the city. All the data recorded would also be openly available for public download, so that it can be used by app developers and researchers too.

Big data for development

The outcomes of big data mining projects are not solely a developed-nation phenomenon. Global Pulse, a UN initiative launched in 2009, is a programme that aims to mine data to assist development and humanitarian projects in developing countries. The areas of research in this project are vast: food and agriculture, humanitarian action, economic well-being, climate and resilience, gender, and public health. This research therefore helps us explore issues including immunisation awareness, trends in workplace discrimination, migration flows, early warnings of conflict, disaster management during floods, seasonal mobility of populations and global engagement on climate change. Here, big data has been able to go beyond targeted ads and assist governments and humanitarians in receiving up-to-date policy feedback and greater knowledge of the groups they strive to assist.

Much of the work in such a project is focused on aiding humanitarian projects in response to major environmental events or violent conflict. Intervention planning for emergencies depends greatly upon knowing where people are. With this in mind, mobile data, which provides a geographically located ‘tag’ of phone users, can enable researchers to describe trends in the macroscopic behaviour of populations by creating relatively cheap population maps in emergency and data-scarce situations. This feature of mobile data can be incredibly useful in low-income countries where directed data collection by governments may be infrequent (e.g. in the Democratic Republic of Congo the last census was taken in 1984, yet mobile phone penetration there is 64%). A team working with Antonio Lima as part of the Orange Data for Development Challenge used a dataset of mobile phone calls from Ivory Coast and found that social ties are an incredibly useful tool to draw on in intervention planning. When vital information needs to be disseminated rapidly (e.g. at the beginning of a disease epidemic), ‘social beacons’ (individuals who make a large number of calls to a wide social set) in communities can be contacted to ensure greater spread of information such as the nearest vaccination centre and suggested hygiene practices to prevent transmission. With knowledge of phone user location, as well as the prime candidates within a population for information spread, governments and organisations tasked with responding to crises are better equipped to reach vulnerable people more quickly.

However, big data analysis, as an emerging discipline, has not come without its concerns. Our mobile devices are with us for a significant portion of our daily lives, acting as witness to many private moments. Issues of privacy therefore abound, as sceptics and proponents debate how easily personal information can be de-anonymised and individuals identified from their activity on social media, particularly with mobile data mined from geo-social media (where users tag themselves at a location). Experimental results have found that identification strategies can achieve an accuracy of more than 80% using only 10 ‘check-ins’. One can argue that by participating in location-based social networks, users are implicitly agreeing to the privacy disclosure agreement, as they are sharing their location and information with all other users. However, there is clearly a call for greater public awareness and information campaigns concerning this, as many users will remain unaware of the potential uses of their data.

This could be partly addressed in future by introducing more computer-science-based classes in schools, in an attempt to demystify the technology and aims of research in this discipline, as well as to encourage greater interest in the emerging field. Currently only a few universities in the UK are able to offer courses at undergraduate level, despite widespread agreement that computational social science is a fundamental skill for the modern workplace. Educating the public about the technology, developments and potential issues would surely only make them more aware, better informed, and able to decide when to update their social media status.

Sarah Willson is an undergraduate student of Geography at University College London, and can be contacted on Twitter at @SarahWllsn