author: ‘Akash’ date: ‘15 December 2017’
美國2017年發生了大規模的槍擊案件,此分析可能為移民,槍支法,精神健康問題和吸毒問題提供信息。但這個分析僅限於大規模射擊,其他犯罪,如武器,殺人和爆炸等都沒有討論。
In this Project, The USA mass shooting dataset is used to analyze information about the shooter with respect to Race, States, Causes and Targets.
library(data.table)
library(shiny)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:data.table':
##
## hour, isoweek, mday, minute, month, quarter, second, wday,
## week, yday, year
## The following object is masked from 'package:base':
##
## date
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(tidyverse)
## -- Attaching packages -------------------------------------------- tidyverse 1.2.1 --
## √ tibble 1.4.2 √ purrr 0.2.5
## √ tidyr 0.8.1 √ dplyr 0.7.6
## √ readr 1.1.1 √ stringr 1.3.1
## √ tibble 1.4.2 √ forcats 0.3.0
## -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
## x lubridate::as.difftime() masks base::as.difftime()
## x dplyr::between() masks data.table::between()
## x lubridate::date() masks base::date()
## x dplyr::filter() masks plotly::filter(), stats::filter()
## x dplyr::first() masks data.table::first()
## x lubridate::hour() masks data.table::hour()
## x lubridate::intersect() masks base::intersect()
## x lubridate::isoweek() masks data.table::isoweek()
## x dplyr::lag() masks stats::lag()
## x dplyr::last() masks data.table::last()
## x lubridate::mday() masks data.table::mday()
## x lubridate::minute() masks data.table::minute()
## x lubridate::month() masks data.table::month()
## x lubridate::quarter() masks data.table::quarter()
## x lubridate::second() masks data.table::second()
## x lubridate::setdiff() masks base::setdiff()
## x purrr::transpose() masks data.table::transpose()
## x lubridate::union() masks base::union()
## x lubridate::wday() masks data.table::wday()
## x lubridate::week() masks data.table::week()
## x lubridate::yday() masks data.table::yday()
## x lubridate::year() masks data.table::year()
library(readr)
library(tidyr)
library("leaflet")
#格式化日期
Dateformat = cols(Date = col_date(format = "%m/%d/%Y"))
#讀取csv檔
Mass_Shootings_Dataset_Ver_5 <- read_csv("../week3/us-mass-shootings-last-50-years/Mass Shootings Dataset Ver 5.csv"
, col_types = Dateformat)
#將數據分配到數據框
Crime <- data.table(Mass_Shootings_Dataset_Ver_5)
#Dimenstions of the dataset
dim(Crime)
## [1] 323 21
#Headers of the dataset
head(Crime)
## S# Title Location
## 1: 1 Texas church mass shooting Sutherland Springs, TX
## 2: 2 Walmart shooting in suburban Denver Thornton, CO
## 3: 3 Edgewood businees park shooting Edgewood, MD
## 4: 4 Las Vegas Strip mass shooting Las Vegas, NV
## 5: 5 San Francisco UPS shooting San Francisco, CA
## 6: 6 Pennsylvania supermarket shooting Tunkhannock, PA
## Date Incident Area
## 1: 2017-11-05 Church
## 2: 2017-11-01 Wal-Mart
## 3: 2017-10-18 Remodeling Store
## 4: 2017-10-01 Las Vegas Strip Concert outside Mandala Bay
## 5: 2017-06-14 UPS facility
## 6: 2017-06-07 Weis grocery
## Open/Close Location Target Cause
## 1: Close random unknown
## 2: Open random unknown
## 3: Close coworkers unknown
## 4: Open random unknown
## 5: Close coworkers <NA>
## 6: Close coworkers terrorism
## Summary
## 1: Devin Patrick Kelley, 26, an ex-air force officer, shot and killed 26 people and wounded 20 at a church in Texas. He was found dead later in his vehicle.
## 2: Scott Allen Ostrem, 47, walked into a Walmart in a suburb north of Denver and fatally shot two men and a woman, then left the store and drove away. After an all-night manhunt, Ostrem, who had financial problems but no serious criminal history, was captured by police after being spotted near his apartment in Denver.
## 3: Radee Labeeb Prince, 37, fatally shot three people and wounded two others around 9am at Advance Granite Solutions, a home remodeling business where he worked near Baltimore. Hours later he shot and wounded a sixth person at a car dealership in Wilmington, Delaware. He was apprehended that evening following a manhunt by authorities.
## 4: Stephen Craig Paddock, opened fire from the 32nd floor of Manadalay Bay hotel at Last Vegas concert goers for no obvious reason. He shot himself and died on arrival of law enforcement agents. He was 64
## 5: Jimmy Lam, 38, fatally shot three coworkers and wounded two others inside a UPS facility in San Francisco. Lam killed himself as law enforcement officers responded to the scene.
## 6: Randy Stair, a 24-year-old worker at Weis grocery fatally shot three of his fellow employees. He reportedly fired 59 rounds with a pair of shotguns before turning the gun on himself as another co-worker fled the scene for help and law enforcement responded.
## Fatalities Injured Total victims Policeman Killed Age Employeed (Y/N)
## 1: 26 20 46 0 26 NA
## 2: 3 0 3 0 47 NA
## 3: 3 3 6 0 37 NA
## 4: 59 527 585 1 64 NA
## 5: 3 2 5 0 38 1
## 6: 3 0 3 NA 24 1
## Employed at Mental Health Issues Race Gender Latitude
## 1: <NA> No White M NA
## 2: <NA> No White M NA
## 3: Advance Granite Store No Black M NA
## 4: <NA> Unclear White M 36.18127
## 5: <NA> Yes Asian M NA
## 6: Weis grocery Unclear White M NA
## Longitude
## 1: NA
## 2: NA
## 3: NA
## 4: -115.1341
## 5: NA
## 6: NA
#Preprocessing Race
Crime[Race=="unclear",Race:="Unknown"]
Crime[is.na(Race),Race:="Other"]
Crime[ Race=="black" | Race=="Black American or African American" | Race=="Black American or African American/Unknown",Race:="Black"]
Crime[Race=="white" | Race=="White American or European American" | Race=="White American or European American/Some other Race" ,Race:="White"]
Crime[Race=="Asian American"| Race=="Asian American/Some other race" ,Race:="Asian"]
Crime[Race=="Unknown"|Race=="Two or more races" | Race=="Some other race",Race:="Other"]
#Preprocessing Date
Crime[,Month:=as.factor(month(Date))]
Crime[,Day:=as.factor(day(Date))]
Crime[,Year:=as.factor(year(Date))]
#Preprocessing Gender
Crime[Gender=='M',Gender:="Male"]
Crime[Gender=='M/F',Gender:="Male/Female"]
Crime[is.na(Gender),Gender:="Unknown"]
#Preprocessing Cause
Crime[is.na(Cause)|Cause=="unknown",Cause:="Unknown"]
Crime[Cause=='domestic dispute'|Cause=='domestic disputer', Cause:='domestic dispute']
Crime[Cause=="anger"|Cause=="frustration",Cause:="Anger and Fustration"]
Crime[Cause=="failing exams" | Cause=="suspension", Cause:="Failing exams and suspension"]
#Preprocessing Target
Crime[is.na(Target),Target:="Unknown"]
Crime[Target=='Family'|Target=='Family/Neighbors'|Target=='Family+students'|Target=='Coworkers'|Target=='coworkers'
|Target=='Students+Parents'|Target=='Family+random'|Target=="partner's family"|Target=="Coworker's Family"
|Target=='neighbors'|Target=='Girlfriend'|Target=="House Owner"
|Target=="Friends",Target:="Family,Girlfriend(Relationships at that time)"]
Crime[Target=='Children'|Target=='school girls'|Target=='Students'
|Target=='Students+Teachers'|Target=='Teachers',Target:="School(Teachers or Students)"]
Crime[Target=='Ex-Girlfriend'|Target=='Ex-Wife & Family'|Target == 'Ex-Girlfriend & Family' |Target == 'Ex-GirlFriend'
|Target=='Ex-Girlfriend+random'|Target=='Ex-girlfriend'|Target=="Ex-Coworkers"|Target=='Ex-Wife'|Target=='Ex-Girlfriend', Target:="Girlfriend/Wife/coworkers ( past relationships)"]
Crime[Target=='Sikhs'|Target=='monks'|Target=='prayer group',Target:="Motives against religion"]
Crime[Target=="Marines"|Target=="Policeman+Council Member"| Target=="police"|
Target=="Policeman"|Target=="Trooper"|Target=="Social Workers", Target:="Marines,Police and Social Workers(Police Brutality)"]
Crime[Target=="birthday party bus"|Target=="party guests"|Target=="uninvited guests", Target:="Parties"]
#Preprocessing Mental Health Issues
Crime[`Mental Health Issues`=="unknown",`Mental Health Issues`:="Unknown"]
#Preprocessing Age
Crime[is.na(Age),Age:=0]
NewAge <-
sapply(Crime$Age,function(x){
if(x>=10 && x<20){
"10+ Teen" }
else if(x>=20 && x<30){
"20's"}
else if(x>=30 && x<40){
"30's"}
else if(x>=40 && x<50){
"40's"}
else if(x>=50 && x<60){
"50's"}
else if(x>=60 && x<70){
"60's"}
else if(x>=70 && x<80){
"70's"}
else if(x>80){
"Multiple Shooters"}
else{
"Age Unknown"}
})
#Preprocessing State
Crime$State <- sapply(Crime$Location, function(parts)
{
temp <- strsplit(parts, split = ",")
sapply(temp, function(new)
{
new[2]
}
)
})
Crime$City <- sapply(Crime$Location, function(parts)
{
temp <- strsplit(parts, split = ",")
sapply(temp, function(new)
{
new[1]
}
)
})
Crime[is.na(`State`), `State`:="Unknown"]
Crime[`State`==' CA'| `State`== ' San Diego'| `State`==" LA" |`State`== " California", `State`:="California"]
Crime[`State`==' NV'| `State`==" Nevada",`State`:="Nevada"]
Crime[`State`==' CO'| `State` == " Colorado",`State`:="Colorado"]
Crime[`State`==' Virginia'|`State`==" Virginia",`State`:="Virginia"]
Crime[`State`==" TX"| `State` == " Texas",`State`:="Texas"]
Crime[`State`==" MD",`State`:="Maryland"]
Crime[`State`==" PA"|`State`==" Lancaster"|`State`==" Souderton",`State`:="Pennsylvania"]
Crime[`State`==" WA"|`State`==" Washington",`State`:="Washington"]
#Preprocessing Map Data
KnownLocations = Crime %>% drop_na(Latitude) %>% drop_na(Longitude)
colmap <- colorFactor(topo.colors(9), Crime$Race)
colgen <- colorFactor(topo.colors(6), Crime$Gender)
Data after preprocessing
table(Crime$Race)
##
## Asian Black
## 18 85
## Latino Native American or Alaska Native
## 5 3
## Other White
## 68 144
table(Crime$`Mental Health Issues`)
##
## No Unclear Unknown Yes
## 93 13 111 106
table(Crime$Gender)
##
## Female Male Male/Female Unknown
## 5 292 5 21
table(Crime$Day)
table(Crime$Month)
table(Crime$Year)
table(NewAge)
## NewAge
## 10+ Teen 20's 30's 40's
## 38 43 36 41
## 50's 60's 70's Age Unknown
## 11 3 1 145
## Multiple Shooters
## 5
table(Crime$State)
##
## Alabama Alaska Albuquerque Arizona
## 9 1 1 11
## Arkansas Connecticut Florida Georgia
## 2 3 20 13
## Hawaii Idaho Illinois Indiana
## 1 1 9 1
## Iowa Kansas Kentucky Louisiana
## 1 4 5 4
## Maine Massachusetts Michigan Minnesota
## 1 4 6 4
## Mississippi Missouri Montana Nebraska
## 4 3 2 3
## New Jersey New Mexico New York North Carolina
## 3 2 10 11
## Ohio Oklahoma Oregon Pennsylvania
## 10 5 4 6
## South Carolina South Dakota Tennessee Utah
## 5 1 5 2
## Vermont West Virginia Wisconsin Wyoming
## 1 1 9 1
## California Colorado Maryland Nevada
## 32 7 1 6
## Pennsylvania Texas Unknown Virginia
## 3 18 46 6
## Washington
## 15
table(Crime$Cause)
##
## Anger and Fustration breakup
## 62 1
## domestic dispute drunk
## 13 1
## Failing exams and suspension psycho
## 4 68
## racism religious radicalism
## 6 1
## revenge robbery
## 10 1
## terrorism unemployement
## 65 10
## Unknown
## 81
table(Crime$Target)
##
## basketball players
## 1
## black men
## 1
## club members
## 1
## Congresswoman
## 1
## Contestant
## 1
## drug dealer
## 1
## Family,Girlfriend(Relationships at that time)
## 73
## Girlfriend/Wife/coworkers ( past relationships)
## 25
## hunters
## 1
## lawyers
## 1
## Marines,Police and Social Workers(Police Brutality)
## 13
## Motives against religion
## 3
## Parties
## 10
## postmaster
## 1
## protestors
## 1
## psychologist+psychiatrist
## 1
## random
## 140
## rapper+random
## 1
## School(Teachers or Students)
## 39
## TSA Officer
## 1
## Unknown
## 5
## welding shop employees
## 1
## women
## 1
根據資料,加州、德州及佛羅里達州都有大規模槍擊,作者認為這些州的共通點都有海岸及炎熱的天氣,「可能」與槍擊案發生的原因有關
plot_ly(data = Crime
,type = 'histogram'
,x = ~`State`) %>%
layout(title = "Number of incidents by State"
, xaxis = list(title = "Year"))
資料中沒有出現內華達州,表示該州並沒有發生太多事件
plot_ly(data = Crime[,.('Total victims'= sum(`Total victims`)),by=.(`State`)]
,type = 'bar'
,x = ~`State`
,y = ~`Total victims`) %>%
layout(title = "Total victims by State"
, barmode = 'stack'
, xaxis = list(title = "State")
, yaxis = list(title = "Victims")
, legend = list(x = 0, y = 1)
, hovermode = 'compare')
KnownLocations %>%
leaflet() %>%
addTiles() %>%
addProviderTiles("OpenStreetMap.HOT")%>%
addMarkers(lng = ~Longitude, lat = ~Latitude,clusterOptions = markerClusterOptions(), label=~Location)
加州,伊利諾伊州和阿拉巴馬州有女性射手。
plot_ly(data = Crime
,type = 'histogram'
,x = ~`State`
,color =~`Gender`) %>%
layout(title = "Total incidents by State with respect to Gender"
, showlegend = T
, barmode = 'stack'
, xaxis = list(title = "State")
, legend = list(x = 1, y = 1)
, hovermode = 'compare')
資料顯示,槍手之中有精神疾病的在加州最多,其次是華盛頓。
心理健康問題的原因很多,在學校欺凌,沒有達到社會標準,容易獲得影響大腦獎勵制度的藥物,以及社交媒體(當人們停止互相交流並失去與現實的聯繫時)。
plot_ly(data = Crime
,type = 'histogram'
,x = ~`State`
,color =~`Mental Health Issues`) %>%
layout(title = "Total incidents by State with respect to Mental Health"
, showlegend = T
, barmode = 'stack'
, xaxis = list(title = "State")
, legend = list(x = 1, y = 1)
, hovermode = 'compare')
#### 精神健康問題的散點圖
plot_ly(data = Crime
,type = 'scatter'
,mode = 'markers'
,hoverinfo = 'text'
,x = ~Year
,y = ~State
,size = ~`Total victims`
,color = ~`Mental Health Issues`
,colors = c('Green','Black','Blue','Red')
,text = ~paste("Location: ", Location
,'\n Date: ', Date
,'\n Total victims : ', `Total victims`
,'\n Race: ', Race)) %>%
layout(title = "Mass Shootings in US with year and state"
, xaxis = list(title = "Years")
, yaxis = list(title = "States"))
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
例如,加州的白人射手數量最多
plot_ly(data = Crime
,type = 'histogram'
,x = ~`State`
,color =~`Race`) %>%
layout(title = "Total victims by State with respect to Race"
, showlegend = T
, barmode = 'stack'
, xaxis = list(title = "State")
, legend = list(x = 1, y = 1)
, hovermode = 'compare')
plot_ly(data = Crime
,type = 'histogram'
,x = ~`State`
,color =~`NewAge`) %>%
layout(title = "Total incidents by State with respect to Age"
, showlegend = T
, barmode = 'stack'
, xaxis = list(title = "State")
, legend = list(x = 1, y = 1)
, hovermode = 'compare')
例如,加州擁有最多精神病患者以及會犯下恐怖行動的槍手
plot_ly(data = Crime
,type = 'histogram'
,x = ~`State`
,color =~`Cause`) %>%
layout(title = "Total incidents by State with respect to Cause"
, showlegend = T
, barmode = 'stack'
, xaxis = list(title = "State")
, legend = list(x = 1, y = 1)
, hovermode = 'compare')
學校和家庭,是恐怖槍手所鎖定的最佳目標之一
作者認為可能有牽涉到心理健康問題,當時位在喬治亞的7名槍手犯下槍擊案的目標也是鎖定家人、女朋友以及同事等等。
plot_ly(data = Crime
,type = 'histogram'
,x = ~`State`
,color =~`Target`) %>%
layout(title = "Total incidents by State with respect to Target"
, showlegend = F
, barmode = 'stack'
, xaxis = list(title = "State")
, hovermode = 'compare')
藉由這次的學習,第一次透過資料來達到分析數據的體驗