A Geographic Information System with ACLED Analysis with R: Part 1

An R data analysis and spatial analysis & visualization project tracking political violence and demonstrations around the world from the ACLED database.

Introduction

Security has become a universal concern, and it is now integral to important decisions. According to Sourya Biswas, it is critical in business to understand that the role of security is to “meet the needs of the company.” Failing to conduct a risk assessment forces a business to operate in a constantly preventative mode, which hampers its ability to meet business goals and leaves security perceived as a sunk cost. As he puts it, from corporate investment and establishment, to settlement and relocation decisions in housing, to the siting of schools and social facilities, the security of a location is critical.

In this project, we seek the technical “know-how” of assessing the security of a location, such as a town, city, province, or country. With access to security data, it is now possible to verify a security assessment, and ACLED provides that access. The Armed Conflict Location & Event Data Project (ACLED) is a disaggregated data collection, analysis, and crisis mapping project that tracks political violence and demonstrations worldwide. With easy access to its database, it is now possible to assess claims about the risk of a chosen location.

R makes this simple. Security data can be retrieved through the ACLED API via the acled.api package. ACLED currently covers Africa, Latin America and the Caribbean, the Middle East, South Asia, Southeast Asia, East Asia, Central Asia and the Caucasus, Europe, and the United States, registering security incidents as they occur in real time. Because the data is carefully collected from diverse places, risk levels can be compared across locations. R, as a statistical tool, will help us visualize these data. We will draw bar charts, line charts, and histograms of aggregated data using packages like plotly and ggplot2, which will help tell the story of how likely notable places are to suffer a security threat.

It is worth noting that ACLED is a disaggregated database: entries are documented as individual events as they occur. As a result, the narrative that emerges when the data is aggregated can be trusted. Aggregating data for visualization is simple with tools such as tidyverse, tibble, and janitor. Aggregated security data for a location is related to local security forces, employment, and population. With such factors at play, it is possible to build a predictive model for mitigation measures that promote accessibility and investment in a location.

How about some spatial analysis? ACLED data documents the location coordinates of security incidents, and R makes it easy to plot these places. In Part 2 of this project, we will translate the information in the bar and line charts onto maps using R spatial packages (cartography, ggplot2, and leaflet), and I will also show you different stylistic methods for improving your visuals. Let’s get to it.

Setting Up the Environment

We will install and load the packages required to import the ACLED data, do feature engineering, add variables that ease a comprehensive analysis, and lastly visualize our data:
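A minimal sketch of this setup step, assuming the packages named in this article (acled.api, dplyr, tibble, janitor, and plotly). The object names required_pkgs, is_installed, and missing_pkgs are illustrative. Rather than installing anything automatically, the sketch reports what is missing and loads what is available:

```r
# Packages named in this article; adjust the list to your own needs.
required_pkgs <- c("acled.api", "dplyr", "tibble", "janitor", "plotly")

# Check which of them are already installed.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "),
          ". Install them with install.packages() before continuing.")
}

# Load every package that is available.
for (pkg in required_pkgs[is_installed]) {
  library(pkg, character.only = TRUE)
}
```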

The above packages help in the following:

  • acled.api: The package makes it easy to retrieve a user-defined sample (or all of the available data) of ACLED, enabling seamless integration of regular data updates into the research workflow. It requires a minimal number of dependencies.
  • dplyr: This package has a primary set of functions designed to enable dataframe manipulation in an intuitive way.
  • tibble: This package provides the tibble, a modern take on the data frame that prints compactly and behaves more predictably when subsetting.
  • janitor: This package has functions that clean data.frame column names, provide quick counts of variable combinations (i.e., frequency tables and crosstabs), and isolate duplicate records.

Import ACLED Data

Having installed and loaded the acled.api package, we can now download the ACLED data. In our case, we shall download the Zimbabwe data:

To get the email.address and the access.key, register and subscribe to the ACLED services for a subscription fee, after which the credentials will be given. The country we want is Zimbabwe, and the data we seek to analyze runs from 01 January 2017 to 01 July 2021; these are given as the start and end dates, respectively. The API returns a standard set of columns, and any additional columns have to be specified explicitly. In our case, these are latitude, longitude, geo_precision, and time_precision. Consult the ACLED documentation to understand what each column represents.
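The request described here might look like the sketch below. It assumes the argument names of acled.api() as documented by the acled.api package at the time of writing (email.address, access.key, country, start.date, end.date, add.variables), so check ?acled.api against your installed version. The helper fetch_acled() and the environment variable names are hypothetical, introduced only to keep credentials out of the script:

```r
# Query parameters taken from the text; dates use the YYYY-MM-DD format.
acled_request <- list(
  country = "Zimbabwe",
  start.date = "2017-01-01",
  end.date = "2021-07-01",
  add.variables = c("latitude", "longitude", "geo_precision", "time_precision")
)

# Hypothetical helper: reads credentials from environment variables
# (the variable names are an assumption) and forwards the request.
fetch_acled <- function(request,
                        email = Sys.getenv("ACLED_EMAIL_ADDRESS"),
                        key = Sys.getenv("ACLED_ACCESS_KEY")) {
  args <- c(list(email.address = email, access.key = key), request)
  do.call(acled.api::acled.api, args)
}

# Uncomment once your credentials are set:
# Crime_in_Zimababwe <- fetch_acled(acled_request)
```

Keeping the query in a plain list makes it easy to change the country or date range without touching the call itself.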

Once the data has been imported you will receive a prompt stating:

Your ACLED data request was successful.
Events were retrieved for the period starting 2005-01-01 until 2021-06-28.

Our data is stored in the Crime_in_Zimababwe object.

Feature Engineering

With the data in place, we need to extract the time from the Unix timestamp automatically generated by the API when the incident was entered. From the date column, extract the dow (day of the week) and the month of the year:


The derived columns help generate time series trends for monitoring incident rates per month and/or per year.
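As a rough illustration of this feature engineering, the sketch below works on a tiny stand-in data frame. The column names event_date and timestamp follow the ACLED export as I understand it, and the three rows are made up purely for demonstration; recorded_at, dow, month, and year are illustrative names for the derived columns:

```r
# Toy stand-in for the imported data (values invented for illustration).
events <- data.frame(
  event_date = c("2019-01-14", "2019-01-15", "2020-04-03"),
  timestamp = c(1547424000, 1547510400, 1585872000),
  stringsAsFactors = FALSE
)

# Parse the event date and convert the Unix timestamp to a date-time.
events$event_date <- as.Date(events$event_date)
events$recorded_at <- as.POSIXct(events$timestamp, origin = "1970-01-01", tz = "UTC")

# Derive day of week, month, and year from the event date.
# weekdays() and months() return names in the current locale.
events$dow <- weekdays(events$event_date)
events$month <- months(events$event_date)
events$year <- as.integer(format(events$event_date, "%Y"))

events
```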

Finally, attach the data with attach() so that R searches it when evaluating a variable; columns in the data can then be accessed simply by giving their names.

Visualization of ACLED Data

We are ultimately after spatial visualizations. Therefore, we will focus on visual representations of when, where, and how many incidents occurred in a region. This is called exploratory data analysis.

What is Exploratory Analysis?

Exploratory Data Analysis (EDA) is a way of evaluating data sets to summarize their primary characteristics, typically using data visualization. EDA is used before modelling to see what the data can tell us.

It isn’t easy to detect essential data properties by looking at a column of numbers or an entire spreadsheet. Basic aggregation tables of descriptive statistics help, but graphics make the data far easier to interpret. Extracting insights from raw numbers can be time-consuming, uninteresting, and/or daunting, and exploratory data analysis approaches were developed to help.

EDA helps determine how best to modify data sources to get the answers you need, making it easier for data scientists to detect patterns, identify anomalies, test hypotheses, and validate assumptions. It provides a deeper understanding of the variables in a data set and their interactions, and it can help you assess whether the statistical techniques you are considering are appropriate. A statistical model may or may not be used, but the primary goal of EDA is to explore what the data can tell us beyond the formal modelling or hypothesis testing task.

It is best to analyze the data first and then glean as many insights as possible from it. A careful data analyst or scientist will spend more time understanding what the variables signify, how they correlate with one another, and how strongly or weakly they are related. Since EDA is all about making sense of the data at hand, those who analyze carefully at this stage are likely to spend less time constructing efficient models.

The following techniques were used in this study of the ACLED data: bar plots, histograms, scatter plots, line graphs, and pie charts. The questions below will help us focus on what each visual should show. Let’s get started.

1. How many security incidents occurred per location recorded?

As the question suggests, we will first count the occurrences of each unique location with the table() function and store the aggregated data in a variable Location. We will then change the column names of the data frame before requesting the ten locations with the highest number of security occurrences. This is done as follows:
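A minimal sketch of this aggregation in base R follows. The location vector is a made-up stand-in for the location column of the imported data, and top_locations is an illustrative name:

```r
# Toy stand-in for the location column (values invented for illustration).
location <- c("Harare", "Bulawayo", "Harare", "Gweru", "Harare", "Bulawayo")

# Count incidents per unique location and turn the table into a data frame.
Location <- as.data.frame(table(location), stringsAsFactors = FALSE)

# Give the columns readable names.
colnames(Location) <- c("Location", "Count")

# Sort by count, highest first, and keep the top ten.
Location <- Location[order(-Location$Count), ]
top_locations <- head(Location, 10)

top_locations
```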

Now, using the plotly package, we will plot the aggregated data, with each location against its count of security occurrences:
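One way to draw such a bar chart with plotly is sketched below. The data frame is a small made-up example rather than the real ACLED counts, and the chart is only built when plotly is installed:

```r
# Toy aggregated data (counts invented for illustration).
top_locations <- data.frame(
  Location = c("Harare", "Bulawayo", "Gweru"),
  Count = c(30L, 12L, 5L),
  stringsAsFactors = FALSE
)

if (requireNamespace("plotly", quietly = TRUE)) {
  # One bar per location, with bar height given by the incident count.
  fig <- plotly::plot_ly(top_locations, x = ~Location, y = ~Count, type = "bar")
  fig <- plotly::layout(
    fig,
    title = "Security incidents per location",
    xaxis = list(title = "Location", categoryorder = "total descending"),
    yaxis = list(title = "Number of incidents")
  )
  # Show the chart when running interactively.
  if (interactive()) print(fig)
}
```

Setting categoryorder to "total descending" keeps the tallest bar on the left regardless of the row order in the data frame.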

Output:

As the plot shows, Harare recorded the highest level of insecurity for 2017 to 2021 combined. Several factors can explain this. Harare is the capital city of Zimbabwe and the administrative centre of Harare province. A capital city is typically known for advanced growth, and many people believe that greener pastures can be found there. As a result, we anticipate a higher population density in Harare than in any other city in Zimbabwe, as evidenced by the 2017 census. With this migration to the city in search of greener pastures, many end up jobless, and some may resort to criminal acts to put bread on the table.

While these are likely causes, other factors also contribute to a sense of insecurity in a location. Individualist and collectivist attitudes impacted by education, drug and alcohol misuse, political activities, the number of police stations, and law enforcement are just a few of the numerous aspects that can be evaluated to explain insecurity in a location.

Because of the high level of criminal activity in Harare, anyone creating a predictive model for criminal activity might find that Harare appears as an outlier during the aggregation process, forcing its removal from the data. The outliers package in R helps identify not just the outliers but also the significant ones that need to be replaced or removed. Outliers in R are best studied visually using a boxplot.
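As a small illustration of the boxplot approach, the sketch below uses base R's boxplot.stats() rather than the outliers package. The counts are made up, and the location labels are placeholders, not ACLED figures:

```r
# Toy incident counts per location (invented for illustration).
incident_counts <- c(A = 12, B = 15, C = 9, D = 14, E = 11, F = 13, G = 10, H = 120)

# boxplot.stats() flags values more than 1.5 times the interquartile
# range beyond the hinges, the same rule a boxplot uses for its points.
stats <- boxplot.stats(incident_counts)
outlier_values <- stats$out

outlier_values

# Draw the boxplot itself when running interactively.
if (interactive()) boxplot(incident_counts, horizontal = TRUE,
                           main = "Incident counts per location")
```

Here only the value 120 falls outside the whiskers, which mirrors how a single dominant city can stand apart from the rest.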

2. How many security incidents occurred per province?

The province of each recorded incident is registered in the admin1 column. As with the location data, one first aggregates the counts for the top 10 provinces in the admin1 variable, then plots the data on a horizontal bar plot showing each province's percentage of incident occurrences:
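A sketch of the provincial aggregation and horizontal bar plot follows. The admin1 vector is a made-up stand-in for the real column, Province and top_provinces are illustrative names, and the plotly chart is only built when the package is installed:

```r
# Toy stand-in for the admin1 column (values invented for illustration).
admin1 <- c("Harare", "Harare", "Harare", "Manicaland",
            "Manicaland", "Bulawayo", "Midlands", "Harare")

# Count incidents per province, highest first.
province_counts <- sort(table(admin1), decreasing = TRUE)
Province <- data.frame(
  Province = names(province_counts),
  Count = as.integer(province_counts),
  stringsAsFactors = FALSE
)

# Express each count as a percentage of all incidents.
Province$Percent <- round(100 * Province$Count / sum(Province$Count), 1)
top_provinces <- head(Province, 10)

if (requireNamespace("plotly", quietly = TRUE)) {
  # orientation = "h" puts the provinces on the y axis.
  fig <- plotly::plot_ly(top_provinces, x = ~Percent, y = ~Province,
                         type = "bar", orientation = "h")
  fig <- plotly::layout(fig, xaxis = list(title = "Share of incidents (%)"),
                        yaxis = list(title = "Province"))
  if (interactive()) print(fig)
}
```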

Harare, as a province, has a correspondingly high level of insecurity. As previously indicated, this reflects the high level of insecurity in the city of Harare. Harare province also includes Chitungwiza and Epworth, which also appear among the top ten locations in the location analysis. Together, the three make Harare province the top province in terms of insecurity.

Bulawayo as a location is found in Bulawayo province, Mt. Darwin and Bindura in Mashonaland Central, Gweru and Mbare in the Midlands, and Masvingo in Masvingo. While these main cities have relatively high numbers of incidents within their respective provinces, those provinces rank below Manicaland, Mashonaland East, and Mashonaland West for insecurity. Zimbabwe has 10 provinces in total, and Matabeleland North records the least insecurity.

3. How many incidents were registered in each month? What is the count of each incident category in each month?

Times and seasons have always been a factor of consideration in the analysis of criminal activity. Schedules of public holidays, political activities, climate and weather, sowing and harvesting seasons, and other seasons tend to determine the levels of insecurity in a place.

Seasons have an impact on all industries. In the financial sphere, significant insecurity rates are expected when cash transfers from businesses to banks occur regularly. As one fiscal year ends and another begins, protestors in many parts of the country are likely to voice their demands for improved working conditions. National elections are an important political concern in every country, particularly in Africa, where corruption and splinter groups tend to skew the process in favour of a candidate, resulting in retaliation from the opposing party. Consider also the harvesting season in agriculture, when bandits attempt to make off with farmers’ produce while it is in transit.

As these examples show, time is a factor in insecurity analysis and is useful in planning to mitigate risk. Time units are expressed in hours of the day, days of the week, months of the year, and so forth. In our case, we will proceed with monthly units.

Considering that we are in July (as of the date of writing this project), we will plot the incident counts from January to July for the relevant crime categories. In our case, these are the Abduction/forced disappearance, Mob violence, Looting/property destruction, Peaceful protest, and Violent demonstration categories.

  • Subset data for the security incident categories:
  • Subset data for January to July:
  • Using the tabyl() function from the janitor package, count the incidents in each month against each crime category, then change the names of the columns:

    The data should eventually look as below.
  • Find the percentage of each incident category count by month (row-wise):
  • Finally plot the grouped bar chart of this:
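The steps in the list above can be sketched as follows. To keep the example dependency-free, it uses base R's table() and prop.table() in place of janitor's tabyl() and its row-percentage helper; the events data frame is a made-up stand-in for the imported data:

```r
# Toy stand-in for the imported data (values invented for illustration).
events <- data.frame(
  month = c("January", "January", "April", "April", "April", "September"),
  sub_event_type = c("Peaceful protest", "Mob violence", "Peaceful protest",
                     "Peaceful protest", "Violent demonstration", "Armed clash"),
  stringsAsFactors = FALSE
)

categories <- c("Abduction/forced disappearance", "Mob violence",
                "Looting/property destruction", "Peaceful protest",
                "Violent demonstration")
jan_to_jul <- month.name[1:7]

# Subset by category and by month, keeping calendar order for the months.
keep <- events$sub_event_type %in% categories & events$month %in% jan_to_jul
subset_events <- events[keep, ]
subset_events$month <- droplevels(factor(subset_events$month, levels = jan_to_jul))

# Cross-tabulate month against category, then compute row-wise percentages.
counts <- table(month = subset_events$month, category = subset_events$sub_event_type)
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

if (requireNamespace("plotly", quietly = TRUE)) {
  long <- as.data.frame(counts)  # columns: month, category, Freq
  fig <- plotly::plot_ly(long, x = ~month, y = ~Freq, color = ~category, type = "bar")
  fig <- plotly::layout(fig, barmode = "group", yaxis = list(title = "Incidents"))
  if (interactive()) print(fig)
}
```

Setting barmode to "group" places the category bars side by side within each month instead of stacking them.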

    Output:

From 2017 to 2021, cumulatively, April registered the highest number of insecurity incidents. Peaceful protests were the most frequent category, followed by a tie between violent demonstrations and mob violence. Abductions and property destruction recorded the fewest cases. In most cases, a rise in violent demonstrations can eventually lead to property destruction and mob violence as well. Take, for example, the protests in South Africa that followed the arrest of former President Jacob Zuma. Similarly, in Kenya, the 2007 post-election violence involved mob violence together with property destruction.

In Zimbabwe, across the months plotted, peaceful demonstrations dominate, with high numbers in each month. For a statistician, this would make fine research to document.

4. How many incidents were recorded per each security category?

We will use a pie chart to visualize this. The crime categories and their subcategories are defined in the event_type and sub_event_type columns, respectively. We will work out the percentage of each sub_event_type.

According to the ACLED documentation, event_type describes the core categories used to record events in the database, and the sub_event_type values are subcategories of those core categories. The ACLED documentation describes both in detail. However, it is important to note that this is not a standard categorisation of events and incidents of concern. From one organization's database to another, the categories change according to the definitions given to each incident. These differences may confuse a data analyst sourcing information from several databases. It is therefore imperative, after sourcing the data, to engineer its features carefully according to the analysis one intends to do.

To answer our question, first subset the data by the sub_event_type categories you desire to plot. Using the table function, count the number of incidents in each category and finally plot the pie chart of the count:
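A minimal sketch of these steps is shown below. The sub_event_type vector is a made-up stand-in for the real column, and wanted, kept, and pie_data are illustrative names; the pie chart is only built when plotly is installed:

```r
# Toy stand-in for the sub_event_type column (values invented for illustration).
sub_event_type <- c("Peaceful protest", "Peaceful protest", "Mob violence",
                    "Armed clash", "Peaceful protest", "Violent demonstration",
                    "Attack")

# Keep only the categories we want to plot.
wanted <- c("Peaceful protest", "Mob violence", "Violent demonstration",
            "Looting/property destruction", "Armed clash")
kept <- sub_event_type[sub_event_type %in% wanted]

# Count incidents per category.
pie_data <- as.data.frame(table(kept), stringsAsFactors = FALSE)
colnames(pie_data) <- c("Category", "Count")

if (requireNamespace("plotly", quietly = TRUE)) {
  # plotly computes the percentage shown on each slice from the counts.
  fig <- plotly::plot_ly(pie_data, labels = ~Category, values = ~Count, type = "pie")
  fig <- plotly::layout(fig, title = "Share of incidents by sub_event_type")
  if (interactive()) print(fig)
}
```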

Output:

As initially noted, peaceful protests stand out in Zimbabwe, accounting for 43.8% of all insecurity events registered cumulatively since 2017. Mob violence follows with 23.9%, violent demonstrations with 16.7%, property destruction with 9.25%, and finally armed clashes with 6.66%.

According to the Human Rights Watch (HRW) World Report 2021, Zimbabwe continues to face protests in which citizens and leaders voice their grievances over the abuse, ill-treatment, and torture they suffer, the ruling methods of the current governing authorities, the lack of accountability, violations of children's rights, and the mishandling of COVID-19 vaccines, which have been inaccessible to most citizens.

5. How many incidents were recorded in each month of 2019, 2020 and 2021?

To answer this, we will use a line plot. With years as the grouping and plotly as the plotting tool, we plot a scatter trace and use the "lines+markers" mode to turn it into a line graph. The steps for visualizing the question above are as follows:

  • Subset the data aggregating each month against each year. Thereafter, order the data according to the chronological order of appearance of months:
  • Now plot the line chart of the months against incident counts faceted by years:
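The two steps above can be sketched as follows. The events data frame is a made-up stand-in with year and month columns, and the sketch groups the lines by year using colour rather than separate panels, which is one of several ways to compare years in plotly:

```r
# Toy stand-in for the imported data (values invented for illustration).
events <- data.frame(
  year = c(2019, 2019, 2019, 2020, 2020, 2021),
  month = c("January", "January", "February", "January", "March", "February"),
  stringsAsFactors = FALSE
)

# Count incidents for every month and year combination.
monthly <- as.data.frame(table(month = events$month, year = events$year),
                         stringsAsFactors = FALSE)
names(monthly)[3] <- "count"

# Put the months in calendar order rather than alphabetical order.
monthly$month <- factor(monthly$month, levels = month.name)
monthly <- monthly[order(monthly$year, monthly$month), ]

if (requireNamespace("plotly", quietly = TRUE)) {
  # A scatter trace in "lines+markers" mode gives one line per year.
  fig <- plotly::plot_ly(monthly, x = ~month, y = ~count, color = ~year,
                         type = "scatter", mode = "lines+markers")
  fig <- plotly::layout(fig, yaxis = list(title = "Incidents"))
  if (interactive()) print(fig)
}
```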

    Output:

In January 2019, Zimbabwe recorded a strikingly high number of insecurity incidents, followed by a sharp decrease in February and a steady trend between March and September before the numbers began to rise again.

According to prominent news sources such as the British Broadcasting Corporation (BBC), Voice of America (VOA), ReliefWeb, and GardaWorld, January 2019 saw a significant increase in protests around the country as a result of a 130 per cent increase in fuel costs. The protests began on Monday, January 14, 2019. Despite the government’s attempt to restrict social media sites, news of a violent crackdown circulated across the country.

Aside from the fuel protest, it was also hinted that the opposition leader of the Movement for Democratic Change (MDC) had taken advantage of the situation by recruiting citizens for riots against the results of the previous year’s elections. According to a statement, the MDC leadership had been persistently promoting the idea that they would use violent street action to reverse the results of the 2018 poll, in which President Emmerson Mnangagwa was declared the winner. In an interview with the state-owned Sunday Mail newspaper, presidential spokesperson George Charamba blamed the opposition for the violence that followed the protests.

At least 12 people were killed during the protests, which included civil resistance, demonstrations, protest marches, riots, and picketing. The protests came to an end on January 17, 2019.

Competition within and across political parties in Zimbabwe has led to intense bouts of ‘eliminationist’ violence. This form involves competitors from local to national scales engaging in violence to limit opposition, secure dominant positions and capture rent-seeking opportunities.

6. How many incident event_types were recorded from the year 2016 – 2020?

Like the monthly count visualization, this question asks what types of events were recorded in each year. Organizations produce reports at various intervals: daily, weekly, monthly, and yearly. Yearly reports are rich in trends from previous years, and expectations can be set for the following year. Yearly trends monitored over time are what help data scientists develop models. With data, the factors that affect a business can be determined and incorporated into a model that helps minimize cost and increase revenue.

Zimbabwe data in ACLED has been recorded since 1997. Through the governing authorities, the country has since made advancements in the security field and addressed the political, socio-economic, health care, education, and media fields to avoid friction with the actors in those fields. Should any of these fields be left unattended, or one prioritized while another is neglected, this could become the cause of an uprising.

This being the last visual, I will leave it to you as a challenge to work out what is happening in the code. As a hint, it is a faceted bar chart.
Code:
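A minimal sketch of a faceted bar chart of event types per year is given below. It uses ggplot2 with facet_wrap(), which is one common way to facet and is an assumption on my part rather than the exact approach of the original listing. The events data frame is a made-up stand-in for the imported data:

```r
# Toy stand-in for the imported data (values invented for illustration).
events <- data.frame(
  year = c(2016, 2016, 2017, 2017, 2017, 2018),
  event_type = c("Protests", "Riots", "Protests", "Protests", "Battles", "Riots"),
  stringsAsFactors = FALSE
)

# Count incidents for every year and event type combination.
yearly <- as.data.frame(table(year = events$year, event_type = events$event_type),
                        stringsAsFactors = FALSE)
names(yearly)[3] <- "count"

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One panel per year, one bar per event type within each panel.
  p <- ggplot(yearly, aes(x = event_type, y = count, fill = event_type)) +
    geom_col(show.legend = FALSE) +
    facet_wrap(~ year) +
    labs(x = "Event type", y = "Incidents")
  if (interactive()) print(p)
}
```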

Output:

From 2016 to 2018, there was a consistent rise in the total number of insecurity cases. Thereafter, the country registered a consistent decrease. Explosions recorded the lowest count, followed by battles. Riots, violence and protests registered the highest number of cases in the country.

From 2016 to 2021, protests have been on the rise in Zimbabwe. Statistically, they express citizens' general dissatisfaction with poor service delivery in the country. Based on the current analysis, it is also reasonable to expect more riots in the future. As noted earlier, violent protests can easily occur together with mob violence and property destruction. With this in view, security enforcement is likely to be boosted to counter the rising protests.

Question: Males mostly participate in protests. Does an increase in the number of men equate to an increase in insecurity in an area?

The population statistics from 2017 covered each of the 10 provinces and included the percentage of males in each provincial population. The data was then merged with the aggregated incident totals for each province, and a plot was made:
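A sketch of the merge and the resulting bubble plot follows. Both data frames hold made-up values with placeholder province labels, not the 2017 statistics or the ACLED totals; the column names are illustrative, and the plotly chart is only built when the package is installed:

```r
# Toy incident totals per province (values invented for illustration).
incidents <- data.frame(
  admin1 = c("Province A", "Province B", "Province C"),
  incidents = c(40L, 15L, 5L),
  stringsAsFactors = FALSE
)

# Toy population statistics per province (values invented for illustration).
population <- data.frame(
  admin1 = c("Province A", "Province B", "Province C"),
  population = c(2000000, 1200000, 700000),
  male_pct = c(48.5, 47.9, 49.2),
  stringsAsFactors = FALSE
)

# Merge on the province name and derive the male count.
merged <- merge(incidents, population, by = "admin1")
merged$male_count <- round(merged$population * merged$male_pct / 100)

if (requireNamespace("plotly", quietly = TRUE)) {
  # Marker size reflects the provincial population.
  fig <- plotly::plot_ly(merged, x = ~male_count, y = ~incidents,
                         size = ~population, text = ~admin1,
                         type = "scatter", mode = "markers")
  fig <- plotly::layout(fig, xaxis = list(title = "Male population"),
                        yaxis = list(title = "Incidents"))
  if (interactive()) print(fig)
}
```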

Output:

The plot shows the crime count against the male count per province. The marker sizes are proportional to the provincial population. From the observations:

  • As the male count increases, the crime count increases.
  • The areas with the smallest populations had the largest share of males and registered the fewest crimes.

It is tempting to assume that it is males who participate in protests and crimes, and that an increase in the number of males would lead to an increase in criminal activity. However, the statistics shown do not bear these assumptions out. It could be speculated that people of all kinds participate in the riots, but this is just speculation. It is also a statistic worth checking.

From the exploratory data analysis above, it is clear that more can be drawn from data and that misconceptions can be corrected. The EDA visuals have given us enough information to carry into our Spatial Analysis and Visualization, which we will cover in Part 2 of this series.

Check out Part 2 of this series.

References

  1. Package acled.api
  2. Plotly