Sunday, June 28, 2015

Deaths in the Netherlands by cause and age

I downloaded counts of deaths by age, year and mayor cause from the Dutch statistics site. In this post I do some plots to look at causes and changes between the years.

Data 

Data from CBS. I downloaded the data in Dutch, hence the first thing to do was provide some kind of translation. The coding used seems slightly different from IDC-10 main categories (and has been alphabetically disordered compared to that). I used Google translate and IDC-10 to obtain the translations

Plots

Preparation

In the following I will be using both percentage of population and percentage of deaths by age cohort. The need for the percentage of deaths is because in some cohorts the percentages of deaths are much higher, thereby hiding anything happening in other cohorts. In addition I should mention that for visual purposes only the most important eight causes are used in the plots

Young

It seems that most of risks are associated with birth. In addition, these risks have steadily been decreasing. 
Looking at the age cohorts above 0 years, it seems accidents are most important. Most remarkable is a spike at 1953, which occurs for all four ages. After some consideration, I link this to the North Sea flood of 1953. It is remarkable that this is visible in the plot. It says a lot about how safe we are from accidents that it does. In the age category 15 to 20 there is also a relatively large bump during the 1970 to 1975. This is more difficult to explain, but I suspect traffic, especially the moped. A light motorcycle which preferably would be boosted to run much faster than the legal speed. 1975 saw the requirement to wear a helmet. It was much hated at the time, but in hindsight I can see that government felt compelled to do something and that it did have effect.
Looking at the plots, it seems the next big cause are Neoplasms. This is not because these become more deadly, it is because accidents are getting under control.

Elderly

For the elderly, diseases of the circulatory system are the main cause and decreasing quite a bit. The number of Symptoms and Abnormal Clinical Observations seems to decrease too. Since this seems to be a nice name for the 'other' category, this may be better diagnostics.
What is less visible is the increase in mental and behavioral disorders, especially after 1980 and at oldest age. It also seems that Neoplasms are getting lower very slowly.

Code

data reading

library(dplyr)
library(ggplot2)
txtlines <- readLines('Overledenen__doodsoo_170615161506.csv')
txtlines <- grep('Centraal',txtlines,value=TRUE,invert=TRUE) 
#txtlines[1:5]
#cat(txtlines[4])
r1 <- read.csv(sep=';',header=FALSE,
        col.names=c('Causes','Causes2','Age','year','aantal','count'),
        na.strings='-',text=txtlines[3:length(txtlines)]) %>%
    select(.,-aantal,-Causes2)
transcauses <- c(
    "Infectious and parasitic diseases",
    "Diseases of skin and subcutaneous",
    "Diseases musculoskeletal system and connective ",
    "Diseases of the genitourinary system",
    "Pregnancy, childbirth",
    "Conditions of perinatal period",
    "Congenital abnormalities",
    "Sympt., Abnormal clinical Observations",
    "External causes of death",
    "Neoplasms",
    "Illness of blood, blood-forming organs",
    "Endocrine, nutritional, metabolic illness",
    "Mental and behavioral disorders",
    "Diseases of the nervous system and sense organs",
    "Diseases of the circulatory system",
    "Diseases of the respiratory organs",
    "Diseases of the digestive organs",
    "Population",
    "Total all causes of death")
#cc <- cbind(transcauses,levels(r1$Causes))
#options(width=100)
levels(r1$Causes) <- transcauses
levels(r1$Age) <- 
    gsub('jaar','year',levels(r1$Age)) %>%
    gsub('tot','to',.) %>%
    gsub('of ouder','+',.) 

Preparation for plots

perc.of.death <- filter(r1,Causes=='Total all causes of death') %>%
    mutate(.,Population=count) %>%
    select(.,-count,-Causes) %>%
    merge(.,r1) %>%
    filter(.,Causes %in% transcauses[1:17]) %>%
    mutate(.,Percentage=100*count/Population,
        Causes = factor(Causes),
        year = as.numeric(gsub('*','',year,fixed=TRUE))
    )
perc.of.pop <- filter(r1,Causes=='Population') %>%
    mutate(.,Population=count) %>%
    select(.,-count,-Causes) %>%
    merge(.,r1) %>%
    filter(.,Causes %in% transcauses[1:17]) %>%
    mutate(.,Percentage=100*count/Population,
        Causes = factor(Causes),
        year = as.numeric(gsub('*','',year,fixed=TRUE))

    )

young

png('youngpop1.png')
tmp1 <- perc.of.pop %>% filter(.,Age %in% levels(perc.of.pop$Age)[c(1,2,11,3)],
        !is.na(Percentage)) %>%
    mutate(.,Age=factor(Age,levels=levels(perc.of.pop$Age)[c(1,2,11,3)]),
        Causes =factor(Causes)) 
# select 'important' causes (which somewhen got over 15%)
group_by(tmp1,Causes)%>%
    summarize(.,mp = max(Percentage)) %>%
    mutate(.,rk=rank(-mp)) %>%
    merge(.,tmp1) %>%
    filter(.,rk<=8) %>%
    ggplot(.,
        aes(y=Percentage,x=year,col=Causes)) +
    geom_line()+
    guides(col=guide_legend(ncol=2)) + 
    facet_wrap( ~Age ) +
    theme(legend.position="bottom")+
    ylab('Percentage of Cohort')
dev.off()
###
png('youngpop2.png')
tmp1 <- perc.of.pop %>% filter(.,Age %in% levels(perc.of.pop$Age)[c(2,11,3,4)],
        !is.na(Percentage)) %>%
    mutate(.,Age=factor(Age,levels=levels(perc.of.pop$Age)[c(2,11,3,4)]),
        Causes =factor(Causes)) 
# select 'important' causes (which somewhen got over 15%)
group_by(tmp1,Causes)%>%
    summarize(.,mp = max(Percentage)) %>%
    mutate(.,rk=rank(-mp)) %>%
    merge(.,tmp1) %>%
    filter(.,rk<=8) %>%
    ggplot(.,
        aes(y=Percentage,x=year,col=Causes)) +
    geom_line()+
    guides(col=guide_legend(ncol=2)) + 
    facet_wrap( ~Age ) +
    theme(legend.position="bottom")+
    ylab('Percentage of Cohort')
# https://en.wikipedia.org/wiki/North_Sea_flood_of_1953
dev.off()

old

png('oldpop.png')
tmp2 <- perc.of.pop %>% filter(.,Age %in% levels(perc.of.pop$Age)[18:21],
        !is.na(Percentage)) %>%
    mutate(.,Age=factor(Age),
        Causes =factor(Causes)) 
group_by(tmp2,Causes)%>%
    summarize(.,mp = max(Percentage)) %>%
    mutate(.,rk=rank(-mp)) %>%
    merge(.,tmp2) %>%
    filter(.,rk<=8) %>%
    ggplot(.,
        aes(y=Percentage,x=year,col=Causes)) +
    geom_line()+
    guides(col=guide_legend(ncol=2)) + 
    facet_wrap( ~Age ) +
    theme(legend.position="bottom")+
    ylab('Percentage of Cohort')
dev.off()
# rj.GD 
#     2 

png('oldpop2.png')
tmp2 <- perc.of.pop %>% 
    filter(.,
        Age %in% levels(perc.of.death$Age)[18:21],
        year>=1980,
        !is.na(Percentage)) %>%
    mutate(.,Age=factor(Age),
        Causes =factor(Causes)) 
group_by(tmp2,Causes)%>%
    summarize(.,mp = max(Percentage)) %>%
    mutate(.,rk=rank(-mp)) %>%
    merge(.,tmp2) %>%
    filter(.,rk<=8) %>%
    ggplot(.,
        aes(y=Percentage,x=year,col=Causes)) +
    geom_line()+
    guides(col=guide_legend(ncol=2)) + 
    facet_wrap( ~Age ) +
    theme(legend.position="bottom")+
    ylab('Percentage of Cohort')

dev.off()


  





1 comment:

  1. Great idea. One suggestion: plot log mortality, it compresses the enormous differences you see by age.

    ReplyDelete