The ability to visualize data and models superimposed on their basic social landmarks and geographic context is invaluable in spatial statistics. Visualizing location data is now easier than ever, thanks to several utility functions that give access to the Google Geocoding, Distance Matrix, and Directions APIs.
Geocoding is the conversion of addresses to latitude and longitude coordinates; the opposite conversion is called reverse geocoding. It is a useful tool for spatial data analysis and a skill that any analyst or visualization designer should have when working with address data in spreadsheets. Full-scale geographic information systems (GIS) applications such as QGIS or ArcGIS are also available for this task.
In this post, we will go over how to generate a Google Maps API key, how to integrate it with R, and how to perform in-depth geocoding and mapping of location data using the ggmap package for R, which allows you to create maps with ggplot.
This exercise uses R, an open-source statistical tool used by data scientists, statisticians, and other researchers, together with RStudio (Radian is an alternative R console). We’re using R because it’s a commonly used open-source tool that lets us visualize and analyze our data using a variety of methods that can be easily expanded upon. Some prior knowledge of the software and of statistics is helpful. To create a Google API key, you need a Google account and basic knowledge of how to navigate the Google Cloud Platform.
Google Cloud Platform is a suite of cloud computing services from Google that runs on the same infrastructure Google uses internally for its end-user products. It offers a set of modular cloud services such as computing, data storage, data analytics, and machine learning. It includes the Maps Platform, with APIs for maps, routes, and places based on Google Maps. In this exercise, the Maps APIs will help us create a personalised experience through static and dynamic maps and Street View imagery.
However, due to recent updates to the Google Cloud Platform, setting up an API key and integrating it with an application can be far from trivial. For a beginner in data science using R who wants to use Google Maps in an application, here is a step-by-step guide on how to:
- Develop a Google project
- Create and enable an API key
Develop a Google Project
Visit the Google Cloud page and sign up with your Google account. Now, press the “Go to console” button. To start a project for the Google Maps API key, open the “Select a project” drop-down menu on the Google Cloud Platform page. A page for building the project will appear; click the “New Project” button and name your project in the pop-up. In this article we call the project “Google Maps,” but give it any name that is easy for you to remember, and then click “CREATE”.
After you’ve been notified that the project has been created, click the “Select a project” drop-down menu once more. The projects window will appear; select the “Google Maps” project in it. You’ll note that the “Select a project” field has been replaced with “Google Maps”. The project is now active, which means any future action or change you make will apply to it. Now we are ready to create the Maps API key within the active project.
An API key (Application Programming Interface key) is a code that an application passes along with each request. It identifies the calling project, authenticates the request, and is used for usage tracking and billing. The key carries access rights only for the APIs that it is associated with. To create one:
- Click the left panel APIs & Services > Credentials
- Click Create credentials > API key
Once you click “API key”, your key will be created and an “API key created” window will pop up showing it. Click “Close”, then proceed to activate the Google services.
Note that this is the key that we will call in our R code during integration. Be sure to copy it somewhere safe.
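To see where the key ends up, here is a small sketch of how a Geocoding API request URL carries it. The helper name build_geocode_url is hypothetical and “XXXX” stands in for a real key; in practice the ggmap package builds this request for you.

```r
# Hypothetical helper: shows where the API key travels in a Geocoding API request.
build_geocode_url <- function(address, key) {
  paste0(
    "https://maps.googleapis.com/maps/api/geocode/json",
    "?address=", utils::URLencode(address, reserved = TRUE),  # escape spaces and commas
    "&key=", key                                              # the key identifies your project
  )
}

# "XXXX" is a placeholder, not a working key
url <- build_geocode_url("Zimmerman, Nairobi, Kenya", key = "XXXX")
print(url)
```

Because the key is part of every request, anyone who sees the URL can reuse it, which is why it should not be shared or committed to a public repository.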
Enable Google Map Services
- Click “Library” on the left-hand panel. This redirects us to the Google library that has a list of services Google offers for integrations.
- Click “View All” in the Maps section to see all the services, then open the service you want to enable for your API key. For this example, we enable the Geocoding API, the Maps Embed API, the Maps Static API, and the Distance Matrix API.
- Click on the Geocoding API then click Enable. This redirects you to the Geocoding API metrics dashboard where tracking of requests is registered.
- Do the same for the Maps Embed API, the Maps Static API, and the Distance Matrix API.
Now we are ready to use the map API key with R.
GEO-CODING AND MAPPING VISUALIZATION
Data structures and plot methods, such as those in the RgoogleMaps package and related packages, have made it easier to visualize spatial data in R. With the available methods, one can now plot geographical details from a shapefile containing polygons for areal data. Areal data are exhaustive tessellations of the study area, with no portion of the total remaining unassigned to an entity. In the field of crime and security, for example, areal data may be attached to several geometric entities, such as “the number of crime incidents that have occurred in the counties/constituencies.”
An analyst who has worked on a visualization problem for some time, handling the raw address data again and again, finds the resulting plots easy to understand. A viewer engaging with the plots for the first time often does not, because of the disconnect between the plot and the data. Unfortunately, this is usually the case.
The ggmap package contains a number of resources that help bridge this gap by providing a consistent way of defining plots that are easily interpretable by both experts and audiences while protecting against graphical inconsistencies.
As a result, we will now learn how to visualize geocoded spatial data in R using the ggmap package, which combines the layered grammar of graphics implemented in the ggplot2 package with static map context from Google Maps and OpenStreetMap.
Setting R Environment
The first steps in setting the R environment will be to install and load the necessary libraries that will allow R to perform the geographic functions used for this type of analysis:
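A minimal sketch of this step follows. The helper load_packages is hypothetical, and the package set in the commented call (ggmap, ggplot2, leaflet, magick) is an assumption based on the tools mentioned in this post; adjust it to what you actually use.

```r
# Hypothetical helper: install anything absent, then attach every package.
load_packages <- function(packages, repos = "https://cloud.r-project.org") {
  installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  if (any(!installed)) {
    install.packages(packages[!installed], repos = repos)
  }
  # require() returns TRUE or FALSE for each package
  invisible(vapply(packages, require, logical(1),
                   character.only = TRUE, quietly = TRUE))
}

# Assumed package set for this tutorial:
# load_packages(c("ggmap", "ggplot2", "leaflet", "magick"))

# Hypothetical working directory; point it at your own project folder:
# setwd("path/to/your/project")
```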
In the code above, we have installed ggmap and the other libraries whose functionality we need. As we continue with the exercise, the function of each library will be explained. It is also important to have a centralized working directory where your data and other files are stored. This makes writing code easier because you avoid long paths when loading data or other files.
The security data used for visualization was gathered and compiled from newspapers, KNBS reports, and local media platforms (The Standard, Nation Media, Citizen, and radio stations) and covers the period from September 2020 to mid-April 2021. Part of the raw data contains the constituencies and locations of security incidents in various parts of Nairobi, whose latitudes and longitudes we need to find through geocoding.
Reading the Data
We start by loading the selected data from the working directory. This is done by creating a variable and reading the data from the working directory into it. Once run, the security_incidents variable will contain the data and geographic information that we will analyze. To import the data, run:
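A minimal sketch of the import, assuming the data sit in a CSV file. So that the snippet runs on its own, it first writes two made-up rows to a temporary file; the column names are assumptions, and with the real data you would pass your own file name to read.csv().

```r
# Made-up rows standing in for the real file; column names are assumptions
sample_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,Constituency,Location,Offence_Type,Offence_Category",
  "2020-09-15,Roysambu,Zimmerman,Robbery,Violent",
  "2021-01-05,Westlands,Parklands,Burglary,Property"
), sample_path)

# With the real data: read.csv("your_file.csv", stringsAsFactors = FALSE)
security_incidents <- read.csv(sample_path, stringsAsFactors = FALSE)
head(security_incidents)
```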
Summary of data:
First, set the date column to be of date type with the as.Date() function. To learn more about dates, check out the Date & Time Analysis with R. Now you can check the overall structure of the data with the str() function. This returns the number of variables and observations in the data, the data type of each variable, and a sample of observations in each variable:
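A small sketch of both calls on made-up rows. The Date column name and the "%Y-%m-%d" format are assumptions; the format string must match how dates are written in your file.

```r
# Made-up rows standing in for the real data; column names are assumptions
security_incidents <- data.frame(
  Date = c("2020-09-15", "2021-01-05"),
  Constituency = c("Roysambu", "Westlands"),
  stringsAsFactors = FALSE
)

# Convert the character column to the Date type
security_incidents$Date <- as.Date(security_incidents$Date, format = "%Y-%m-%d")

# Inspect variables, their types, and a sample of values
str(security_incidents)
```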
Now to find brief descriptives of the data:
To find the count of security incidents that have occurred in each constituency, the count of each offence type, and the count of each category of offence in Nairobi for the period registered, run the “#By Constituency”, “#By Offence type” and “#By Offence Category” code.
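A sketch of the descriptives and the three counts using base R, on made-up rows with assumed column names. The same counts could be produced with other tools such as dplyr.

```r
# Made-up rows standing in for the real data; column names are assumptions
security_incidents <- data.frame(
  Constituency = c("Roysambu", "Westlands", "Roysambu"),
  Offence_Type = c("Robbery", "Burglary", "Assault"),
  Offence_Category = c("Violent", "Property", "Violent"),
  stringsAsFactors = FALSE
)

# Brief descriptives of every column
summary(security_incidents)

# By Constituency
by_constituency <- sort(table(security_incidents$Constituency), decreasing = TRUE)

# By Offence type
by_offence_type <- sort(table(security_incidents$Offence_Type), decreasing = TRUE)

# By Offence Category
by_offence_category <- sort(table(security_incidents$Offence_Category), decreasing = TRUE)

print(by_constituency)
```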
As mentioned earlier, geocoding is the conversion of addresses to latitude and longitude coordinates and vice versa (reverse geocoding). From our data, we seek to find the latitudes and longitudes of the locations.
First, register the Google API key created earlier in Google Cloud with R by calling register_google(). Where there is “XXXX”, input the API key. Remember to keep your API key private, as every request is billable.
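One way to do this without pasting the key into a script is to read it from an environment variable. This is only a sketch: the variable name GOOGLE_MAPS_API_KEY is an assumption, and the registration is skipped when ggmap is not installed or only the “XXXX” placeholder is present.

```r
# Read the key from an environment variable (the name is an assumption);
# fall back to the "XXXX" placeholder when it is not set.
api_key <- Sys.getenv("GOOGLE_MAPS_API_KEY", unset = "XXXX")
key_is_real <- nzchar(api_key) && api_key != "XXXX"

# Register the key with ggmap only when a real key and the package are available
if (key_is_real && requireNamespace("ggmap", quietly = TRUE)) {
  ggmap::register_google(key = api_key)
}
```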
Now run the below code to loop through the locations to get the latitude and longitude of each location and add it to the security_incidents data frame in the new columns lat and long.
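A minimal sketch of such a loop. The function name geocode_incidents and its geocode_fn argument are hypothetical; passing the geocoder in as an argument lets the loop be tried offline with a stand-in function. Unlike the walkthrough, this version writes lat and long straight into the data frame instead of collecting them in a separate loc_coord object first.

```r
# Hypothetical helper: geocode every row, recording NA when a lookup fails.
# geocode_fn takes one address string and returns a data frame with lon and lat.
geocode_incidents <- function(incidents, geocode_fn) {
  # Build "Constituency, Location, Kenya" for every row
  incidents$Add <- paste(incidents$Constituency, incidents$Location, "Kenya", sep = ", ")
  incidents$lat <- NA_real_
  incidents$long <- NA_real_

  for (i in seq_len(nrow(incidents))) {
    # A failed lookup yields NULL, so the loop moves on to the next row
    result <- tryCatch(geocode_fn(incidents$Add[i]), error = function(e) NULL)
    if (!is.null(result) && nrow(result) > 0) {
      incidents$lat[i] <- result$lat[1]
      incidents$long[i] <- result$lon[1]
    }
  }
  incidents
}

# With a registered key, the geocoder would be ggmap's geocode():
# security_incidents <- geocode_incidents(
#   security_incidents,
#   function(address) ggmap::geocode(address, output = "latlona", source = "google")
# )
```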
For loop explained:
The loop, which starts with for(i in 1:nrow(security_incidents)), processes one row of the security_incidents data frame at a time, and that data frame is where the geocoded coordinates are eventually stored.
The paste() function merges the values of the Constituency and Location variables from the security_incidents data frame and appends “Kenya”, separating them with commas, so that if the constituency is “Roysambu” and the location is “Zimmerman” the value will be “Roysambu, Zimmerman, Kenya”. The function does this for all the observations, and the values are assigned to a new variable, Add. The $Add[i] index tells R to take each value in turn and return its longitude and latitude together with the matched address (output = "latlona") as stored in the Google database (source = "google"). The tryCatch() function lets the loop skip to the next value if the coordinates of a value are not found, and NA is recorded for latitudes and longitudes whose coordinates are not found. Once the loop has gone through the values, it stores all the coordinates in a data frame, loc_coord. Finally, the values in loc_coord are now merged into security_incidents in the columns lat for latitudes and long for longitudes.
Now that we have the coordinates, we can visualize them on a map. For starters, to check whether all the addresses were geocoded correctly, i.e. fall within the Nairobi county boundary, run:
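A sketch of such a check with the leaflet package. The bounding box is a rough, assumed rectangle around Nairobi county rather than the official boundary, and both helper names are hypothetical.

```r
# Rough, assumed bounding box around Nairobi county (not an official boundary)
nairobi_bbox <- list(lon = c(36.65, 37.11), lat = c(-1.45, -1.16))

# Flag rows whose coordinates are missing or fall outside the box
outside_bbox <- function(incidents, bbox = nairobi_bbox) {
  is.na(incidents$lat) | is.na(incidents$long) |
    incidents$long < bbox$lon[1] | incidents$long > bbox$lon[2] |
    incidents$lat < bbox$lat[1] | incidents$lat > bbox$lat[2]
}

# Interactive map: points inside the box in blue, suspicious ones in red
map_incidents_leaflet <- function(incidents) {
  map <- leaflet::addTiles(leaflet::leaflet(incidents))
  leaflet::addCircleMarkers(
    map, lng = ~long, lat = ~lat, radius = 4,
    color = ifelse(outside_bbox(incidents), "red", "blue")
  )
}

# map_incidents_leaflet(security_incidents)
```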
Some incidents have fallen outside the Nairobi boundary. This happens when the address text matches a different place stored in the Google database, so the coordinates of that other place are returned. It is therefore imperative that you record addresses accurately.
Let us also plot the security incidents in the county by running:
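A minimal sketch of such a plot with ggmap. The function name is hypothetical, the centre coordinates are an approximate point in central Nairobi, and the zoom level is a guess to tune; a registered key with the Maps Static API enabled is assumed.

```r
# Hypothetical helper: static Google basemap with incidents drawn on top.
plot_incidents_ggmap <- function(incidents, zoom = 11) {
  # Approximate centre of Nairobi (assumed values)
  basemap <- ggmap::get_googlemap(
    center = c(lon = 36.82, lat = -1.29),
    zoom = zoom,
    maptype = "roadmap"
  )
  ggmap::ggmap(basemap) +
    ggplot2::geom_point(
      data = incidents,
      ggplot2::aes(x = long, y = lat),
      colour = "red", alpha = 0.6
    ) +
    ggplot2::labs(title = "Incident Mapping by ggmap")
}

# plot_incidents_ggmap(security_incidents)
```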
The output is the “Incident Mapping by ggmap” image.
Comparing the two maps, “Incidents mapping by leaflet” and “Incident Mapping by ggmap” (possible with the Maps Static API), the latter gives an observer more detail to relate to. Given a title, it is simple for the observer to read the map and the story it is telling. This is because, unlike leaflet, ggmap goes a step further by placing the contextual information of various kinds of static maps in the ggplot2 plotting framework. The result is an easy, consistent way of specifying plots that are readily interpretable by both the expert and the audience.
By definition, the layered grammar demands that every plot consists of five components:
- A default dataset with aesthetic mappings,
- One or more layers, each with a geometric object (“geom”), a statistical transformation (“stat”), and a dataset with aesthetic mappings (possibly defaulted),
- A scale for each aesthetic mapping (which can be automatically generated),
- A coordinate system, and
- A facet specification.
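To make the five components concrete, here is a small ggplot2 sketch with one line per component. The function name and the column names long, lat, and Offence_Category are assumptions.

```r
# Hypothetical example: one line for each component of the layered grammar.
build_layered_plot <- function(incidents) {
  ggplot2::ggplot(incidents, ggplot2::aes(x = long, y = lat)) +  # default data and aesthetics
    ggplot2::geom_point(stat = "identity") +                     # a layer: geom plus stat
    ggplot2::scale_x_continuous(name = "Longitude") +            # a scale per aesthetic
    ggplot2::scale_y_continuous(name = "Latitude") +
    ggplot2::coord_fixed() +                                      # coordinate system
    ggplot2::facet_wrap(~ Offence_Category)                       # facet specification
}

# build_layered_plot(security_incidents)
```

ggmap supplies the basemap as an extra layer underneath, so the same five components apply to every map in this post.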
Mapping a Single Address
Let us now plot a single address with details of its surroundings. For example, when the terrorist attack happened at dusitD2, one would want to know its location and details of its environs. For that:
Anyone can now tell that dusitD2 is in Nairobi, near the PigiaMe Riverside Park along Waiyaki Way. A terrain representation of the same map (top right) is coded as:
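Both versions of the map can be sketched with one helper that takes the map type as an argument. The function name, the zoom level, and the address string in the commented calls are assumptions; a registered key with the Geocoding and Maps Static APIs enabled is required to run the calls.

```r
# Hypothetical helper: geocode one address and draw it on a Google basemap.
map_single_address <- function(address, maptype = "roadmap", zoom = 16) {
  # Reject unsupported map types before any request is made
  maptype <- match.arg(maptype, c("roadmap", "terrain", "satellite", "hybrid"))

  spot <- ggmap::geocode(address, output = "latlon", source = "google")
  basemap <- ggmap::get_googlemap(
    center = c(lon = spot$lon[1], lat = spot$lat[1]),
    zoom = zoom,
    maptype = maptype
  )
  ggmap::ggmap(basemap) +
    ggplot2::geom_point(data = spot, ggplot2::aes(x = lon, y = lat),
                        colour = "red", size = 3)
}

# Road map, then the terrain version of the same place:
# map_single_address("dusitD2, Nairobi, Kenya")
# map_single_address("dusitD2, Nairobi, Kenya", maptype = "terrain")
```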
Violent Crimes In General
We can get a good idea of the spatial distribution of violent crimes by using a contour plot. However, a density plot gives a clearer description of what the data are saying. The two plots are achieved by:
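A sketch of the two layers, written as helpers that add to an existing base plot such as a ggmap basemap. The function names, colours, and the Offence_Category filter in the commented usage are assumptions.

```r
# Contour lines of a 2D kernel density estimate of incident locations
add_contours <- function(base_plot, incidents) {
  base_plot +
    ggplot2::geom_density_2d(
      data = incidents,
      ggplot2::aes(x = long, y = lat),
      colour = "black"
    )
}

# Filled density: polygons shaded by the estimated density level
add_density <- function(base_plot, incidents) {
  base_plot +
    ggplot2::stat_density_2d(
      data = incidents,
      ggplot2::aes(x = long, y = lat, fill = ggplot2::after_stat(level)),
      geom = "polygon", alpha = 0.3
    ) +
    ggplot2::scale_fill_gradient(low = "yellow", high = "red")
}

# Assumed usage, with nairobi_map being a ggmap() plot of the city:
# violent <- subset(security_incidents, Offence_Category == "Violent")
# add_contours(nairobi_map, violent)
# add_density(nairobi_map, violent)
```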
The conclusion drawn from the density plot is that crime in Nairobi decreases away from the city center. Broken down by day of the week, the data show that it is more dangerous to be in the Nairobi city center on a Tuesday and Wednesday. This is illustrated below:
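One way to get a panel per day is to derive a weekday column from the date and facet on it. This is a sketch: the helper names and the Weekday column are assumptions, and the day names are built from the numeric weekday so they do not depend on the system locale.

```r
# Add a Weekday factor derived from the Date column (0 = Sunday in POSIXlt)
add_weekday <- function(incidents) {
  day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")
  incidents$Weekday <- factor(
    day_names[as.POSIXlt(incidents$Date)$wday + 1],
    levels = day_names
  )
  incidents
}

# Split an existing density plot into one panel per weekday
facet_by_weekday <- function(density_plot) {
  density_plot + ggplot2::facet_wrap(~ Weekday, nrow = 2)
}

demo <- add_weekday(data.frame(Date = as.Date("2020-09-15")))
print(demo)
```

The density layer must be built from data that already contains the Weekday column, so call add_weekday() before plotting.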
Distance to places (ggmap: mapdist)
Knowing the distance from where one incident occurred to another can be helpful in analysis. With the mapdist() function this is possible, provided the Distance Matrix API has been activated. The Distance Matrix API limits users to 100 requests per query, 100 requests per 10 seconds, and 2500 requests per 24 hours. Say you want to find the distance between dusitD2 and Westgate Mall, both having been attacked by al-Shabaab militants; the code is as follows:
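A minimal sketch of the call, wrapped in a hypothetical helper that validates the travel mode first. The address strings in the commented usage are assumptions about how the two places are best written for Google.

```r
# Hypothetical wrapper around ggmap::mapdist() with a checked travel mode.
distance_between <- function(from, to, mode = "driving") {
  mode <- match.arg(mode, c("driving", "walking", "bicycling", "transit"))
  ggmap::mapdist(from = from, to = to, mode = mode, output = "simple")
}

# Requires a registered key with the Distance Matrix API enabled:
# distance_between("dusitD2, Nairobi, Kenya", "Westgate Mall, Nairobi, Kenya")
```

The result is a data frame with the distance in metres, kilometres, and miles, along with the estimated travel time.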
The route between the two places is coded as below. Alternative routes can also be requested.
I have only explored a few of the possibilities R offers in mapping; how you play around with your data also determines the type of plots you can produce. So don’t stop here.
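A sketch using ggmap's route() function, which relies on the Directions API being enabled for the key. The helper names are hypothetical; route() accepts an alternatives argument, and setting it to TRUE asks Google for more than one route.

```r
# Fetch the route as a sequence of lon/lat points
get_route <- function(from, to, alternatives = FALSE) {
  ggmap::route(from = from, to = to, mode = "driving",
               structure = "route", alternatives = alternatives)
}

# Draw the route as a path on top of an existing base plot
draw_route <- function(base_plot, route_df) {
  base_plot +
    ggplot2::geom_path(
      data = route_df,
      ggplot2::aes(x = lon, y = lat),
      colour = "blue", lineend = "round"
    )
}

# Assumed usage, with nairobi_map being a ggmap() plot of the city:
# path <- get_route("dusitD2, Nairobi, Kenya", "Westgate Mall, Nairobi, Kenya")
# draw_route(nairobi_map, path)
```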
THE FUN SEGMENT
MAKE a GIF of satellite imagery in R
I wouldn’t leave you without a gift for getting to the end. Here is a refreshing task that will take a minute. Go to the Sentinelhub, search for a place in the search bar, and navigate through the calendar, clicking “Generate” to download as many images as you like of the aerial view of the place on different dates. After downloading five images of the Mozambique, 1740 S Coast Hwy, Laguna Beach (choose an area of your liking), I saved them in a folder in the working directory and then made the GIF in R by:
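A minimal sketch using the magick package, which is an assumption about the tool. The function name, folder name, output file name, and frame width are all hypothetical.

```r
# Hypothetical helper: turn the images in a folder into an animated GIF.
make_gif <- function(image_dir, output_path = "progression.gif", fps = 1) {
  # Collect the image files in name order, which sets the frame order
  frames <- sort(list.files(image_dir, pattern = "\\.(png|jpe?g)$",
                            full.names = TRUE, ignore.case = TRUE))
  stopifnot(length(frames) > 1)  # an animation needs at least two frames

  images <- magick::image_read(frames)
  images <- magick::image_scale(images, "800")          # resize to 800 px wide
  animation <- magick::image_animate(images, fps = fps)  # one frame per second
  magick::image_write(animation, path = output_path)
}

# make_gif("satellite_images")
```

Naming the files by date (for example 2021-01-05.png) keeps the frames in chronological order.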
The result is a beautiful weather progression. Try it out!
The compiled code can be found on GitHub.