# A Geographic Information System with ACLED Analysis in R: Part 2

## Introduction

In Part 1 of this series, I presented a comprehensive exploratory data analysis of the ACLED data for Zimbabwe, which we extracted from the ACLED API via the `acled.api` package. I also pointed out that the data contain the disaggregated location coordinates of each individual security incident. In this project, our aim is first to translate the findings of the exploratory analysis into spatial information and then to visualize them on maps.

### What is Spatial Analysis?

Spatial analysis, often known as spatial statistics, uses analytical techniques to examine the properties of locations and the correlations between features in spatial data. It covers the application of formal approaches to studying locations based on their topological, geometric, or geographic properties. Spatial analysis, also known as locational analysis, is a type of geographical analysis that aims to explain patterns of human behaviour, crime included, and their spatial expression in terms of mathematics and geometry.

The basic purpose of this analysis, like any other, is to gain helpful knowledge and insight. Each step must be performed with great care to produce authentic and dependable results on which decisions can be based. While most analyses arrive at a hypothetical conclusion, data acquired in real time help bridge the gap between the hypotheses formed during the analysis and the decisions made from its results. For this reason, any researcher should remain objective throughout the analysis.

Spatial analysis is the process of extracting or creating new information from spatial data. Spatial analysis is used in a variety of applications, including emergency management, service delivery, business location and retail analysis, transportation modelling, crime and disease mapping, land process and natural resource management, to mention a few. Spatial analysis has been regarded as helpful for use in any field that deals with the planning of services and/or goods, taking into account measures such as population, gender distribution, mortality rates, and other aspects.

Spatial analysis answers the *where* questions of data analysis: Where are the locations with the highest crime rate? Where has there been a geological change? Where is the ideal place to open a new restaurant? Spatial data describe the positions of the Earth's features relative to one another, and a given location on Earth is defined by a pair of latitude and longitude coordinates. Depending on the storage technique, spatial data are classified into raster data and vector data.

Raster data are composed of grid cells identified by row and column. The whole geographic area is divided into groups of individual cells, which represent an image. Satellite images, photographs, scanned images, etc., are examples of raster data. Vector data are composed of points, polylines, and polygons. Wells, houses, etc., are represented by points. Roads, rivers, streams, etc., are represented by polylines. Villages and towns are represented by polygons.
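In R, the `sf` package represents the three vector types as simple-feature geometries. The sketch below builds one of each from made-up coordinates and stores them together in a single layer; the feature names and coordinates are purely illustrative.

```r
library(sf)

# Toy coordinates (longitude, latitude); illustrative only, not real features
well <- st_point(c(31.05, -17.83))                          # point, e.g. a well
road <- st_linestring(rbind(c(31.00, -17.80),
                            c(31.10, -17.85)))              # polyline, e.g. a road
town <- st_polygon(list(rbind(c(31.0, -17.9), c(31.1, -17.9),
                              c(31.1, -17.8), c(31.0, -17.8),
                              c(31.0, -17.9))))             # closed ring: first vertex = last

# One layer holding all three, in WGS84 (EPSG:4326)
features <- st_sf(kind = c("well", "road", "town"),
                  geometry = st_sfc(well, road, town, crs = 4326))
st_geometry_type(features)
```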

Geoprocessing is a critical tool in spatial analysis that should not be overlooked. It is a GIS activity that involves manipulating GIS data: a typical geoprocessing operation takes an input dataset, performs an operation on it, and returns the result as an output dataset. Geographic feature overlay, feature selection and analysis, topological processing, raster processing, and data conversion are all common geoprocessing procedures, and all of them can be carried out in R. Geoprocessing enables the definition, management, and analysis of the data used to make decisions. Simply put, geoprocessing applies the appropriate tool(s) to conduct a spatial analysis that generates new information or data.
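As a small illustration of an input, operation, output geoprocessing step, the sketch below counts how many incident points fall inside each polygon with `sf::st_intersects()`. This is the same kind of operation that turns incident coordinates into per-region counts; the two square "provinces" and four points are toy data, not ACLED records.

```r
library(sf)

# Helper building a 1 x 1 degree square with its lower-left corner at (x0, y0)
sq <- function(x0, y0) st_polygon(list(rbind(c(x0, y0), c(x0 + 1, y0),
                                             c(x0 + 1, y0 + 1), c(x0, y0 + 1),
                                             c(x0, y0))))

# Input 1: two toy "provinces" side by side
provinces <- st_sf(name = c("A", "B"),
                   geometry = st_sfc(sq(30, -18), sq(31, -18), crs = 4326))

# Input 2: toy incident locations (three fall in A, one in B)
incidents <- st_as_sf(data.frame(lon = c(30.2, 30.5, 30.8, 31.4),
                                 lat = c(-17.5, -17.2, -17.9, -17.6)),
                      coords = c("lon", "lat"), crs = 4326)

# Operation: point-in-polygon test; output: a new attribute on the polygons
provinces$incidents <- lengths(st_intersects(provinces, incidents))
provinces$incidents
```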

As a build-up from the Google Map API Setup & Geo-Coding with R article of the Using R project, we will continue exploring the spatial packages in R, using them to visualize the information acquired in Part 1 of this project and translate it into geographic information. R as a Geographic Information System (GIS) is equipped with the `cartography`, `leaflet`, `sf`, `sp`, `rgdal` and many more packages that let us carry out the desired functionality effectively. Without wasting time, let us get to it.

## Setting up the Environment

From the Humanitarian Data Exchange website, download the Zimbabwe shapefiles as a zip file. The archive contains the polygon coordinates that represent the `admin0`, `admin1`, `admin2` and `location` boundaries in the Crime_in_Zimbabwe data that we downloaded initially. Unzip the file into a working folder.

As before, install and load the necessary spatial packages in R: `sf`, `leaflet`, `leaflet.opacity`, `leaflet.providers` and `cartography`.
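A minimal base-R sketch for this step: it checks which of the listed packages are already installed, reports the missing ones instead of installing them silently, and attaches the rest. The package list comes from the text; the rest is an illustrative choice.

```r
# Packages named in the text
pkgs <- c("sf", "leaflet", "leaflet.opacity", "leaflet.providers", "cartography")

# requireNamespace() returns FALSE (without an error) for packages that are absent
missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing) > 0) {
  message("Not installed: ", paste(missing, collapse = ", "),
          ". Run install.packages(missing) first.")
}

# Attach every package that is available
for (p in setdiff(pkgs, missing)) library(p, character.only = TRUE)
```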

Now load the administrative boundaries from the folder where they are stored. I have assigned them as follows:

• `zim`: the country boundaries
• `level_1`: the provincial boundaries of Zimbabwe
• `level_2`: the district boundaries
• `level_3`: the ward and location boundaries
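Reading the layers can be wrapped in a small helper around `sf::st_read()`. This is a sketch under assumptions: `read_boundaries()` is a hypothetical helper, and the folder name and `.shp` file names below are placeholders, so check the actual names inside the unzipped Humanitarian Data Exchange archive before using it.

```r
library(sf)

# Read every shapefile listed in `files` (a named vector) from folder `dir`;
# lapply() keeps the names, so the result is a named list of sf objects
read_boundaries <- function(dir, files) {
  lapply(files, function(f) st_read(file.path(dir, f), quiet = TRUE))
}

# Hypothetical file names: replace them with the real ones from the download
shp_files <- c(zim     = "zwe_adm0.shp",   # country outline
               level_1 = "zwe_adm1.shp",   # provinces
               level_2 = "zwe_adm2.shp",   # districts
               level_3 = "zwe_adm3.shp")   # wards / locations

# bounds  <- read_boundaries("Zimbabwe_shapefiles", shp_files)
# zim     <- bounds$zim
# level_1 <- bounds$level_1   # and so on for level_2 and level_3
```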

The boundaries can be plotted all at once, in pairs, or in threes, depending on what you want to visualize. Below is a visual of the country and provincial boundaries.
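Overlaying two levels comes down to drawing the outer layer first and adding the inner one on top. The helper `plot_pair()` below is a hypothetical sketch using `sf`'s base plotting; toy geometries stand in for `zim` and `level_1` so that it runs on its own.

```r
library(sf)

# Draw the outer boundary with a thick line, then the subdivisions on top
plot_pair <- function(outer, inner, main = "") {
  plot(st_geometry(outer), lwd = 2, main = main)
  plot(st_geometry(inner), add = TRUE, border = "grey40")
  invisible(NULL)
}

# Toy stand-ins: a 2 x 1 "country" split into two square "provinces".
# With the real data: plot_pair(zim, level_1)
toy_country   <- st_sfc(st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 1),
                                              c(0, 1), c(0, 0)))))
toy_provinces <- st_make_grid(toy_country, n = c(2, 1))
plot_pair(toy_country, toy_provinces, main = "Country and provinces")
```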



For the other pairs and for all three levels together, the code and the respective outputs are below.
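The same idea extends to any pair or to all three levels. The hypothetical helper `plot_levels()` below draws whatever layers it is given, coarsest first, and thins the border line for each finer level; the line widths are an arbitrary styling choice.

```r
library(sf)

# Draw any number of boundary layers in order; the first call sets the extent
plot_levels <- function(..., widths = c(2.5, 1.5, 0.8, 0.4)) {
  layers <- list(...)
  for (i in seq_along(layers)) {
    plot(st_geometry(layers[[i]]), add = i > 1,
         lwd = widths[min(i, length(widths))])
  }
  invisible(length(layers))   # number of layers drawn
}

# plot_levels(level_1, level_2)        # provinces + districts
# plot_levels(zim, level_1, level_2)   # all three levels
```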



## Map Visualization

Part 1 of this series concentrated on the exploratory statistics of the ACLED data. It served as the foundation for generating aggregated statistics, mostly based on the provinces, districts, wards, and sites where crime incidents occurred in Zimbabwe. Through that study we were also able to identify patterns of rampant activity, with protests being the most prevalent event type in the country. Click here to see the graphical representations from Part 1.

While the aggregated charts gave us enough information to get started, it would be an injustice not to show those aggregates for each region where the crimes occurred. In this project, we will use map visualizations to chart both individual incidents and location aggregates.

Map graphics, in essence, tell the story quickly to someone who does not want to dig through the data. Map visualization is the process of analyzing, displaying, and presenting spatially connected data in the form of maps. This way of expressing data is straightforward and intuitive: we can see the distribution or proportion of the data in each region at a glance, which makes it easier for everyone to gather information and make better decisions.

In addition to the high efficiency of transmitting information are the aesthetics that map visualizations offer. No matter how boring the content is, it will be eye-catching as long as it is equipped with a cool map. Let me catch your attention too.

#### How many incidents occurred in each province?

We found Harare, Manicaland, and Mashonaland East to be the provinces with the highest levels of insecurity in the aggregates. Presumably because Harare City is the capital of Zimbabwe, it registered the highest number of insecurity incidents, and the provincial aggregates reflected this as well. I mapped this information with `leaflet`.


I first converted the counts to a logarithmic scale with the `log10()` function and stored the values in the `crime` vector. I then stored the colours I wanted to use for the map in the `pal` vector. The codes beginning with '#' are hexadecimal (CSS) colour codes, which you can easily find by searching the web; popular ready-made palettes include RColorBrewer, Wes Anderson and Viridis. Finally, I used the `level_1` spatial data, which holds the provincial boundary polygons, to plot the map, with each province's crime count determining its fill colour.
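The steps just described (log10 counts, a vector of hex colours, fill colour driven by the count) can be sketched with `leaflet` as follows. The three square provinces and their counts are made up, the colour stops are illustrative, and `colorNumeric()` interpolates between them. The native pipe `|>` assumes R 4.1 or later.

```r
library(sf)
library(leaflet)

# Toy provinces and counts standing in for level_1 and the ACLED aggregates
sq <- function(x0) st_polygon(list(rbind(c(x0, -20), c(x0 + 1, -20),
                                         c(x0 + 1, -19), c(x0, -19), c(x0, -20))))
prov <- st_sf(name  = c("P1", "P2", "P3"),
              count = c(10, 120, 1500),
              geometry = st_sfc(sq(28), sq(29), sq(30), crs = 4326))

# log10 compresses the range so one very large count does not wash out the rest
prov$crime <- log10(prov$count)

# Light-to-dark hex colours; higher values map to darker fills
pal <- colorNumeric(c("#FFEDA0", "#FD8D3C", "#800026"), domain = prov$crime)

m <- leaflet(prov) |>
  addTiles() |>                                   # OpenStreetMap base map
  addPolygons(fillColor = ~pal(crime), fillOpacity = 0.8,
              color = "white", weight = 1, label = ~name) |>
  addLegend("bottomright", pal = pal, values = ~crime,
            title = "log10(incidents)")
# Print m in an interactive session to open the map
```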

The default base map in `leaflet` is OpenStreetMap. As the map shows, the more insecurity incidents a province recorded, the darker its colour. Harare, which recorded the highest number of incidents, is darker than any other province, while Matabeleland North and Matabeleland South registered the lowest counts.

One could also use the `cartography` package to explore the provincial statistics. Below is the equivalent code using `cartography`.


With this package, I first attached the aggregate data to the `level_1` spatial data. The size of each circle represents the count of insecurity incidents in that province.
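When circles encode counts, proportional-symbol maps conventionally make the circle *area*, not the radius, proportional to the value, so the radius grows with the square root of the count. The sketch below makes that arithmetic explicit in base R and shows the attribute join; the province labels, counts and `max_radius` are made up, and `prop_radius()` is a hypothetical helper rather than a `cartography` function.

```r
# Radius proportional to sqrt(count) so that circle area is proportional to count;
# max_radius is an arbitrary display size for the largest circle
prop_radius <- function(count, max_radius = 30) {
  max_radius * sqrt(count / max(count, na.rm = TRUE))
}

# Made-up aggregates, joined to a stand-in for level_1's attribute table
agg    <- data.frame(province = c("A", "B", "C"), count = c(400, 100, 25))
attrs  <- data.frame(province = c("C", "A", "B"))
joined <- merge(attrs, agg, by = "province", all.x = TRUE)   # rows sorted by province

# A count four times smaller gets half the radius (a quarter of the area)
joined$radius <- prop_radius(joined$count)
joined
```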


#### What is the rate of insecurity per district in Zimbabwe?

To map this question, we first needed to aggregate the count of incidents in each district. I then renamed the columns of the aggregated table.

Not all districts in Zimbabwe were recorded in the data, which I noticed while trying to merge the `level_2` district map file with the per-district aggregates. Hence, I first extracted the spatial data entries of the districts that recorded insecurity incidents. I then stored the aggregates in this new district spatial data and plotted the map with the `cartography` package.
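The aggregate, subset and merge steps above can be sketched in base R. The data frames are toy stand-ins, and the district name column `DISTRICT` is an assumption; check `names(level_2)` for the real one. Because `merge()` performs an inner join by default, it would also drop unmatched districts on its own; the explicit subset simply mirrors the order of steps described.

```r
# Toy stand-ins: one row per incident, and a district attribute table
acled     <- data.frame(admin2 = c("D1", "D1", "D2", "D1", "D2", "D3"))
districts <- data.frame(DISTRICT = c("D1", "D2", "D3", "D4", "D5"))

# 1. Count incidents per district and rename the columns
agg <- as.data.frame(table(acled$admin2), stringsAsFactors = FALSE)
names(agg) <- c("district", "incidents")

# 2. Keep only the districts that appear in the incident data
hit <- districts[districts$DISTRICT %in% agg$district, , drop = FALSE]

# 3. Attach the counts; merge() also works directly on an sf object such as level_2
hit <- merge(hit, agg, by.x = "DISTRICT", by.y = "district")
hit
```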


The district of Bulawayo recorded the highest number of incidents. On 23 June 2018, a grenade exploded at White City Stadium in Bulawayo. The blast occurred at a ZANU–PF campaign rally, just after President Emmerson Mnangagwa had finished giving a speech. The bombing injured at least 49 people, including Vice-Presidents Constantino Chiwenga and Kembo Mohadi and other high-ranking government officials, and two security agents later died of their injuries. This is but one of the major incidents in Bulawayo.

In 2019, the district faced multiple protests over freedom of expression and assembly, women's and girls' rights, gender identity and violence, and health rights. Bulawayo is also known for the gold mines in its vicinity, where the mineral is excavated before being transported. This could be one of the major reasons for the region's high insecurity, given limited access to a resource so valuable in trade: in pursuit of a living, some residents may be driven to cross legal lines to meet their needs.

Hurungwe North district also recorded a very high insecurity rate, with violence against civilians, protests, and riots accounting for a very high number of incidents. Population statistics show that the area is moderately densely populated. With violence against civilians and protests being rampant, robbery of pedestrians, businesses and residences, mob violence, killings, and attacks on motorists in transit can be expected in the area.

Hurungwe North is also known for growing tobacco, followed by cotton. Speculatively, transporting these crops makes encounters with bandits more likely, so highway banditry could be an insecurity problem in the area. Generally, the east of Zimbabwe tends to face more insecurity than the west, with the central parts falling between the two. On our plot, the more insecurity incidents a district recorded, the darker its colour.

#### Describe the country's population against crime incidents on a map.

What is a population? A population is defined as a group of individuals living and interbreeding within a given area. Members of a population often rely on the same resources, are subject to similar environmental constraints, and depend on the availability of other members to persist over time.

When analysing the insecurity rate of a place, one might expect that where the population is high, resources are more likely to be scarce, constraining individuals' lives in that area. As a result of these constraints, it seems reasonable to assume that residents will rise to the occasion to request, and in some cases demand, that their needs be addressed in one way or another.

In Zimbabwe, this has been the case. However, as we saw in Part 1 of the project when comparing the male population statistics with the insecurity rate, it was false to assume that a rise in one is directly proportional to a rise in the other. In fact, using the population statistics from the Inter-Censal Demographic Survey, 2017, the two variables showed an inverse relationship.

Below is the map analysis comparing the population against the crime count in each province.

##### The Data

I opted to analyze the 2018 ACLED crime statistics against the population survey carried out in 2017. I therefore first subset the 2018 ACLED data, then aggregated it to find the number of insecurity incidents in each province, and renamed the data frame columns.
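A base-R sketch of the subset-then-aggregate step, assuming the ACLED extract has `year` and `admin1` (province) columns; the rows below are toy data.

```r
# Toy rows standing in for the ACLED extract
acled <- data.frame(year   = c(2017, 2018, 2018, 2018, 2019),
                    admin1 = c("P1", "P1", "P2", "P1", "P2"))

# 1. Keep 2018 only
acled_2018 <- subset(acled, year == 2018)

# 2. Count rows per province: length() of any column gives the row count
prov_2018 <- aggregate(year ~ admin1, data = acled_2018, FUN = length)

# 3. Rename the columns
names(prov_2018) <- c("province", "incidents_2018")
prov_2018
```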

I then downloaded the Inter-Censal Demographic Survey, 2017 report. I copied the summary table of population statistics per province from the PDF into Microsoft Word as text, converted the text into a table, pasted it into Microsoft Excel, and then imported it into R from the working directory. Below is the data in the report.

After dropping the Totals row, I merged the population data with the aggregated provincial insecurity counts and added each as a column in the `level_1` spatial data. Finally, I mapped the data using the `cartography` package.
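The clean-up and join can be sketched as follows. The `Total` row label, the column names and the numbers are hypothetical, and filling provinces without 2018 incidents with 0 is an assumption rather than a step stated in the text.

```r
# Hypothetical population table as imported, including a totals row
pop <- data.frame(province   = c("P1", "P2", "P3", "Total"),
                  population = c(1000, 2500, 1800, 5300))
counts <- data.frame(province = c("P1", "P2"), incidents_2018 = c(2, 1))

# Drop the totals row before merging
pop <- pop[pop$province != "Total", ]

# Left join keeps every province; assumed: no recorded incidents means 0
prov_stats <- merge(pop, counts, by = "province", all.x = TRUE)
prov_stats$incidents_2018[is.na(prov_stats$incidents_2018)] <- 0
prov_stats

# With the real data, something like (NAME_1 is a hypothetical column name):
# level_1 <- merge(level_1, prov_stats, by.x = "NAME_1", by.y = "province")
```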

#### The map


From the map, it is clear that the majority of the provinces recorded low insecurity counts and have medium population counts. Harare has the highest population and also recorded the highest insecurity rate, which is consistent with our assumption that population and insecurity are positively correlated for that particular province. Still, this is subject to hypothesis testing with other variables in play.

#### What would an analysis of the insecurity rate in the Midlands province of Zimbabwe look like?

The Midlands province is the central province of Zimbabwe. It borders the Masvingo, Matabeleland North and Matabeleland South provinces. It has an area of 49,166 square kilometres (18,983 sq mi) and a population of 1,614,941 (2012). Because of its central location, it is home to speakers of Shona, Ndebele, Tswana, Sotho, Chewa and various other languages. Gweru, the third-largest city in Zimbabwe, is the provincial capital.

Let's look at the Midlands map as a part of Zimbabwe's.


One may also be interested in the visuals of the map with districts included:



As the attribution on the two maps above and the code show, I used the `leaflet` and `leaflet.providers` packages to plot them. From the `level_1` (provincial) spatial data, I extracted the Midlands provincial polygon. For the districts, I subset the district data using the names of the districts in the Midlands province. As the base map, I used the `Stamen.Toner` tiles, built on OpenStreetMap data, from the `leaflet.providers` package. The outputs are as beautiful as you can see.
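A sketch of the subsetting and plotting described above, with toy polygons standing in for `level_1` and `level_2`. The `province` and `district` column names and the district labels are assumptions. It uses the `CartoDB.Positron` tiles from `leaflet`'s `providers` list instead of `Stamen.Toner`, since the Stamen tiles have moved to a different provider and may not load under their old name in recent package versions; swap in whichever tile set your installation offers.

```r
library(sf)
library(leaflet)

# Square of side s with its lower-left corner at (x0, y0)
sq <- function(x0, y0, s = 1) st_polygon(list(rbind(c(x0, y0), c(x0 + s, y0),
                                                   c(x0 + s, y0 + s), c(x0, y0 + s),
                                                   c(x0, y0))))
level_1_toy <- st_sf(province = c("Midlands", "Masvingo"),
                     geometry = st_sfc(sq(29, -20), sq(30, -20), crs = 4326))
level_2_toy <- st_sf(district = c("M1", "M2", "V1"),
                     geometry = st_sfc(sq(29, -20, 0.5), sq(29.5, -20, 0.5),
                                       sq(30, -20, 0.5), crs = 4326))

# Province outline, and districts subset by (hypothetical) Midlands names
midlands  <- level_1_toy[level_1_toy$province == "Midlands", ]
mid_dists <- level_2_toy[level_2_toy$district %in% c("M1", "M2"), ]

m <- leaflet() |>
  addProviderTiles(providers$CartoDB.Positron) |>
  addPolygons(data = midlands, color = "black", weight = 3, fill = FALSE) |>
  addPolygons(data = mid_dists, color = "steelblue", weight = 1,
              fillOpacity = 0.3, label = ~district)
```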

##### Insecurity in Midlands

Midlands recorded the fourth-lowest level of insecurity, with 6.04% of the total insecurity count in Zimbabwe cumulatively between 2017 and 2021. Just as we plotted the country's crime rate by province, we will now carry out a provincial analysis by spatially visualizing the count in each district.


The first major concern is the data. Using the names of the districts in the Midlands, I subset the ACLED data. I then aggregated the subset and merged it with the spatial data of the Midlands districts. Finally, I plotted the statistics on a map using the Viridis colour palette.
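A base-R sketch of the Midlands workflow: subset by district name, count per district, then bin the counts into a Viridis-style palette with `hcl.colors()`, which ships with R. The district names and incidents are placeholders. Unlike the text, the sketch keeps districts with zero incidents by fixing the factor levels, which is a design choice you may or may not want.

```r
# Placeholder inputs: incidents with an admin2 column, and Midlands district names
acled <- data.frame(admin2 = c("D1", "D2", "D1", "X9", "D3", "D1"))
midlands_districts <- c("D1", "D2", "D3", "D4")

# 1. Keep only incidents in Midlands districts
mid <- acled[acled$admin2 %in% midlands_districts, , drop = FALSE]

# 2. Count per district; fixing the factor levels keeps D4 with a count of 0
agg <- as.data.frame(table(factor(mid$admin2, levels = midlands_districts)),
                     stringsAsFactors = FALSE)
names(agg) <- c("district", "incidents")

# 3. Bin counts into four Viridis colours; rev = TRUE makes higher counts darker
cols <- hcl.colors(4, palette = "viridis", rev = TRUE)
agg$fill <- cols[cut(agg$incidents, breaks = 4, labels = FALSE)]
agg

# Then attach to the district polygons, e.g. (column name hypothetical):
# mid_sf <- merge(mid_sf, agg, by.x = "DISTRICT", by.y = "district")
```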

The above are but some of the basic visuals in play as we spatially analyze the ACLED data: from a continent to a country, to provinces and counties, to districts, and down to wards. R can support the analysis at every one of these levels.

However, it is important to note that the limits of an analysis are set by the data one is dealing with. In our data, we have mostly dealt with continuous numeric data and categorical data, and despite being limited to these two types of variables, we have produced a comprehensive analysis. The breadth of your analysis is limited not by the program but by your ability to first scale through the data available or acquired, identify the analysis needed, and then flexibly carry out each of the analyses identified.

Time is a factor in any analysis. Any analyst is required to find the most important trends in the data within the time limit set by the client requesting the service. It is therefore not necessary to perform every kind of analysis on the data; rather, the client's specific needs should define the path one takes before carrying out any study.

## Extras

Have you been keen? You may have noticed that we skipped a component of our study that seems trivial but is far from it: before any map analysis or model training is done on any data, one first wants to know what the raw data look like visually. That calls for answering the question:

### How is the general distribution of the insecurity data in Zimbabwe?

We got ourselves lucky. With this question in mind, I chose to introduce you to the `mapview` package for plotting the following map.

#### Mapview

Mapview provides functions to very quickly and conveniently create interactive visualizations of spatial data. Its main goal is to fill the gap of quick (not presentation grade) interactive plotting to examine and visually investigate both aspects of spatial data, the geometries, and their attributes. It can also be considered a data-driven API for the leaflet package as it will automatically render correct map types, depending on the type of the data (points, lines, polygons, raster).

You will have noticed that we used `leaflet` to fill polygon shapes using two variables, and that defining the groups a colour palette uses to fill the polygons takes several more lines of code. With the `mapview` package, it is as simple as:
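A minimal sketch of such a call, assuming a `level_1`-like `sf` layer whose province names sit in a column called `province` (the column name and the toy squares are assumptions). The `zcol` argument picks the attribute used for colouring.

```r
library(sf)
library(mapview)

# Toy provinces standing in for level_1
sq <- function(x0) st_polygon(list(rbind(c(x0, -20), c(x0 + 1, -20),
                                         c(x0 + 1, -19), c(x0, -19), c(x0, -20))))
level_1_toy <- st_sf(province = c("P1", "P2", "P3"),
                     geometry = st_sfc(sq(28), sq(29), sq(30), crs = 4326))

# Colour each polygon by its province name (a categorical variable)
mv <- mapview(level_1_toy, zcol = "province")
# Print mv in an interactive session to open the map
```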

The names of the provinces in `level_1` are used as the groups for colouring, and by default the `mapview()` function fills each provincial polygon with a colour from the Viridis palette. Most palettes are used to scale a continuous numeric variable, but `mapview` also offers a genius way of colouring map polygons by a factor (categorical) variable.
Here is the output for the above:

In package development, the simpler the code for a given functionality, the better. You will agree that, from such simple code, one would hardly expect such a diverse map with detailed functionality.

The `mapview` package expects the longitude and latitude to be numeric. However, right after importing the `Crime_in_Zimbabwe` data, R read them as character variables. This shows the flexibility of `leaflet` compared with `mapview`.

So, first convert the longitude and latitude to numeric with the `as.numeric()` function. Then set the spatial coordinates to create a Spatial object from the Crime_in_Zimbabwe data using the `coordinates()` function, specify the coordinate reference system for the object, and finally view the spatial object:
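An `sf` counterpart of these steps is sketched below as an alternative to the `sp` functions `coordinates()` and `proj4string()`: convert the columns to numeric, build point geometries with `st_as_sf()`, and declare WGS84 (EPSG:4326). The two rows are hypothetical stand-ins for `Crime_in_Zimbabwe`.

```r
library(sf)
library(mapview)

# Hypothetical rows where the coordinates were imported as character strings
crime <- data.frame(event_type = c("Protests", "Riots"),
                    latitude   = c("-17.83", "-20.15"),
                    longitude  = c("31.05", "28.58"))

# 1. Character -> numeric
crime$latitude  <- as.numeric(crime$latitude)
crime$longitude <- as.numeric(crime$longitude)

# 2. Build point geometries and set the coordinate reference system (WGS84)
crime_sf <- st_as_sf(crime, coords = c("longitude", "latitude"), crs = 4326)

# 3. View the spatial object
mv <- mapview(crime_sf)
```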


At first glance, one can see that Bulawayo and Harare are high-risk areas. The Manica area in the far east also presents a moderately high level of insecurity. As you may have noted, `mapview()`, like `leaflet()`, uses OpenStreetMap as its default base map.

The second question I would like to answer using mapview is:

### How are insecurity incident categories distributed in Zimbabwe?

Well, this is an in-depth question that is critical, especially when planning risk mitigation. Zimbabwe, as we have noted, is a protest-prone country, but where and when?
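One way to approach this with `mapview` is to colour the incident points by category via `zcol`, and to pair the map with a province-by-category table for the *where* part. The incident rows below are hypothetical, and the `admin1` and `event_type` column names are assumed to match the ACLED fields used in this series.

```r
library(sf)
library(mapview)

# Hypothetical incidents with ACLED-style columns
incidents <- st_as_sf(
  data.frame(admin1     = c("P1", "P2", "P3", "P1"),
             event_type = c("Protests", "Riots",
                            "Violence against civilians", "Protests"),
             longitude  = c(31.05, 28.58, 32.67, 29.81),
             latitude   = c(-17.83, -20.15, -18.97, -19.45)),
  coords = c("longitude", "latitude"), crs = 4326)

# Map: points coloured by event category
mv <- mapview(incidents, zcol = "event_type")

# Companion table: incidents per province and category
tab <- xtabs(~ admin1 + event_type, data = st_drop_geometry(incidents))
tab
```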



Violence against civilians was recorded widely across Zimbabwe cumulatively from 2017 to 2021. Such cases include robbery with violence at residential places and educational institutions, highway banditry, and other criminal incidents involving attacks on social amenities. As before, Harare, with its high population, also shows a high level of violence against civilians.

In a nutshell, I have tried to show you the functionality of mapview and its role in spatial analysis: it saves time and produces aesthetic, eye-catching maps for clients. Should you decide to use mapview in your analysis, here is its documentation.

## Application

In Kenya today, the Ministry of Lands and Physical Planning is developing a land portal called Ardhisasa for land allocation. Through this portal, one will be able to acquire and sell land easily. One of the portal's aims is to reduce, and if possible eliminate, the land fraud that has been rampant in Kenya. Among the portal's tools are geolocation and geoprocessing tools that visualize the land already taken and the land still available. With such a system in place, fraudsters can be kept at bay.

## Conclusion

From geocoding to geoprocessing, spatial analysis and visualization combined with data analysis are building blocks that add authenticity to a project. Extensive as this analysis was, we have not covered data space distribution maps, three-dimensional rectangular maps, time-space distribution maps, heat point maps, flow maps and much more, all of which are valuable skills to learn. I urge you to go on and learn them.

Click here to access the code for the whole of Part 1 and Part 2 on GitHub.

Please let me know your thoughts and contributions in the comment section.

Happy Coding!!
