Big data, big challenge? Together with Harald Sterly of the University of Cologne I presented a little piece of research in the Extended Spatial Analytics session of the German Geography Congress (Deutscher Kongress für Geographie) in Berlin. The project “Calling Abidjan”, which we worked on with Kouassi Dongo of Université de Cocody-Abidjan, started after we successfully applied for participation in the D4D Challenge. According to its initiator, the telecommunications company Orange, ‘Data for Development’ is “an innovation challenge open on ICT Big Data for the purposes of societal development”. The project allowed us to work with anonymised mobile phone data from individual call records provided by Orange in the country of Côte d’Ivoire (Ivory Coast).
We were interested in investigating what non-computer scientists with a social science and urban planning background can do with such data in a more contextual rather than technically driven way, and therefore explored how mobile phone call records can be used to better estimate population distribution.
For our analysis we used anonymised call data records consisting of information about the base station, timestamp, and caller ID, produced by the approximately 500,000 Orange Télecom users in the country. There were 1079 base stations at the time the data was generated, and we were able to work with data covering 183 days. The dataset consisted of 13 GB of raw data, which some would perhaps call ‘Big Data’ (though I personally do not like this term for many reasons).
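To give a rough idea of how records of this shape can be turned into counts per base station, here is a minimal sketch in Python. The field order, the toy records and the "home station is the most frequently used station" heuristic are all illustrative assumptions, not a description of the workflow actually used in the project.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (caller_id, base_station_id, timestamp).
# The identifiers and timestamps are made up for illustration only.
records = [
    ("u1", "bs_A", "2012-01-01T08:00"),
    ("u1", "bs_A", "2012-01-01T21:30"),
    ("u1", "bs_B", "2012-01-02T12:15"),
    ("u2", "bs_B", "2012-01-01T09:10"),
    ("u2", "bs_B", "2012-01-03T19:45"),
    ("u3", "bs_A", "2012-01-02T07:05"),
]


def assign_home_station(records):
    """Map each caller to the base station they used most often (assumed heuristic)."""
    per_user = defaultdict(Counter)
    for caller, station, _timestamp in records:
        per_user[caller][station] += 1
    return {
        caller: counts.most_common(1)[0][0]
        for caller, counts in per_user.items()
    }


def users_per_station(records):
    """Count how many callers have each base station as their assumed home station."""
    homes = assign_home_station(records)
    return Counter(homes.values())


station_counts = users_per_station(records)
print(dict(station_counts))
```

Counting each caller once, rather than counting calls, keeps heavy phone users from inflating the figures for a station. Whether the most-used station is a good proxy for where someone lives is a separate question that this sketch does not answer.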
The following two (draft) maps give an insight into the results. The purple circles show the distribution and density of the population estimates that we derived using only the mobile phone call records dataset. To better see the correlation with what other population data tells us about where people live, we produced not only a normal land area map (on the left, which also gives a basic idea of the topography of the country) but also showed the data on a gridded population cartogram, which we generated from the LandScan population grid, perhaps the most detailed population dataset currently available on a globally consistent high-resolution basis:
The correlations that show up in these (admittedly quite drafty and basic) maps already give an idea of our preliminary findings: using only mobile phone call records, we were able to reproduce a number of similar patterns, namely the higher population densities in the Southeast and the Centre, the lower densities in the Northeast and Northwest, and particularly the urban areas of Abidjan, Yamoussoukro, Bouaké, Man and others (which especially stand out in the gridded population cartogram, which shows the details within these areas). However, aggregated to the 255 subprefectures, our population dataset was not consistent in all parts. Considerable differences can be seen especially in the Western parts of the country and, most notably, in the Southwest.
Further analysis showed that the call data record analysis generally seems to overestimate population figures in urban areas and to underestimate them in rural areas or areas with low population density.
These differences could be explained by differing mobile phone subscription rates in urban and rural areas, but possibly also by inaccuracies in the underlying census data and population models of the AfriPop dataset that we used to validate our approach.
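The kind of comparison described here can be sketched as a simple ratio between estimated and reference population per area. The following Python sketch assumes that user counts are scaled uniformly so that they sum to a reference total; the area names and all numbers are made up for illustration and are not results from our analysis.

```python
def scale_to_population(user_counts, total_population):
    """Scale user counts so they sum to a reference total (assumes uniform subscription rates)."""
    total_users = sum(user_counts.values())
    return {
        area: count * total_population / total_users
        for area, count in user_counts.items()
    }


def estimation_ratio(estimates, reference):
    """Ratio of estimated to reference population; above 1 means overestimation."""
    return {
        area: estimates[area] / reference[area]
        for area in reference
        if area in estimates and reference[area] > 0
    }


# Made-up toy numbers, not taken from the project data.
user_counts = {"urban_x": 600, "rural_y": 400}
reference = {"urban_x": 5000, "rural_y": 5000}

estimates = scale_to_population(user_counts, sum(reference.values()))
ratios = estimation_ratio(estimates, reference)

for area, ratio in ratios.items():
    label = "overestimated" if ratio > 1 else "underestimated or equal"
    print(f"{area}: ratio {ratio:.2f} ({label})")
```

In this toy case the uniform scaling pushes the area with more subscribers per inhabitant above a ratio of 1 and the other below it, which is the pattern one would expect if subscription rates differ between urban and rural areas.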
To highlight one other notable difference: in the Southwest (region of Bas-Sassandra) the figures derived from mobile usage are significantly higher, which we interpret as being caused by a particularly positive economic development of the region (to a large extent related to the port of San-Pédro), resulting in both population growth and higher mobile subscription rates.
Our presentation at the conference was far from overly technical. Instead, we framed it in the geographic context of the ongoing urbanisation processes on the African continent and the relevance of mobile phone communication technologies in this region (which I looked at from a health perspective with another colleague). To get an idea of the full context, take a look at the slides that we used during our talk (also available on Slideshare):