Big data, big challenge? Together with Harald Sterly of the University of Cologne, I presented a small piece of research in the Extended Spatial Analytics session of the German Geography Congress (Deutscher Kongress für Geographie) in Berlin. The project “Calling Abidjan”, on which we worked with Kouassi Dongo of Université de Cocody-Abidjan, started after we successfully applied to take part in the D4D Challenge. According to its initiator, Orange telecommunications, ‘Data for Development’ is “an innovation challenge open on ICT Big Data for the purposes of societal development”. The project allowed us to work with anonymised mobile phone data from individual call records provided by Orange in Côte d’Ivoire (Ivory Coast).
We wanted to find out what non-computer scientists with a social science and urban planning background can do with such data in a contextual rather than technically driven way, and we therefore explored how mobile phone call records can be used to better estimate population distribution.
For our analysis we used anonymised call data records containing the base station, timestamp, and caller ID, produced by the approximately 500,000 Orange Télécom users in the country. There were 1,079 base stations at the time the data was generated, and the data covered 183 days. The dataset comprised 13 GB of raw data, which some would perhaps call ‘Big Data’ (though I personally dislike the term for many reasons).
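To make the idea of turning call records into population estimates more concrete, here is a minimal Python sketch of one widely used approach; it is an illustration only and not necessarily how the maps below were produced. Each anonymised caller is assigned a hypothetical “home” base station (the one they used most often, optionally counting only evening calls), callers are counted per station, and the counts are scaled up to a national population total supplied by the user. The `CallRecord` field names, the timestamp format, and the evening-hours filter are all assumptions for illustration, not the D4D data schema.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Container, Dict, Iterable, NamedTuple, Optional


class CallRecord(NamedTuple):
    """One anonymised call record (hypothetical field names)."""
    caller_id: str
    base_station: int
    timestamp: str  # assumed format, e.g. "2012-01-15 21:30:00"


def home_stations(
    records: Iterable[CallRecord],
    hours: Optional[Container[int]] = None,
) -> Dict[str, int]:
    """Assign each caller to the base station they used most often.

    If `hours` is given (e.g. range(19, 24) for evenings), only calls made
    in those hours are counted, a common proxy for where people live rather
    than where they work. Callers with no qualifying calls are left out.
    """
    per_caller: Dict[str, Counter] = defaultdict(Counter)
    for r in records:
        if hours is not None and datetime.fromisoformat(r.timestamp).hour not in hours:
            continue
        per_caller[r.caller_id][r.base_station] += 1
    # most_common(1) returns [(station, count)]; ties go to the station seen first
    return {caller: c.most_common(1)[0][0] for caller, c in per_caller.items()}


def estimate_population(homes: Dict[str, int], national_population: float) -> Dict[int, float]:
    """Scale per-station subscriber counts up to a national population total.

    Assumes subscribers are a uniform sample of residents, which ignores
    regional differences in phone ownership and operator market share.
    """
    counts = Counter(homes.values())
    total_users = sum(counts.values())
    scale = national_population / total_users
    return {station: n * scale for station, n in counts.items()}
```

Assigning a single modal station per caller keeps the estimate simple and robust to how often people phone; the uniform-sample scaling is the weakest assumption and is exactly where comparison with an independent dataset such as LandScan becomes useful.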
The following two (draft) maps give an insight into the results. The purple circles show the distribution and density of the population estimates we derived using only the mobile phone call records. To better see how these correlate with what other population data tell us about where people live, we not only produced a normal land area map (on the left, which also gives a basic idea of the country's topography) but also showed the data on a gridded population cartogram, which we generated from the LandScan population grid, perhaps the most detailed population dataset currently available at a globally consistent high resolution: