Take-home Exercise 2

An interactive data visualisation featuring Singapore Demographic Trend

Authors

Affiliations

Ang Bi Lian

 

 

Published

Feb. 3, 2022

DOI

1.0 About the Task

This take-home exercise requires the application of appropriate interactivity and animation methods to design an age-sex pyramid based data visualisation to show the changes of demographic structure of Singapore by age cohort and gender between 2000-2020 at planning area level.

For this task, the data set used is the Singapore Residents by Planning Area / Subzone, Age Group, Sex and Type of Dwelling, June 2000-2010 and Singapore Residents by Planning Area / Subzone, Age Group, Sex and Type of Dwelling, June 2011-2020 The data set is obtained from Department of Statistics.

2.0 Sketch of Proposed Design

3.0 Installing and loading the required libraries

The code chunk below is used to install and load the required packages onto RStudio.

packages = c('gganimate', 'tidyverse', 'gifski', 'plotly')
for(p in packages){
  if(!require(p,character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

4.0 Importing the dataset

The source file is in csv format, hence read_csv of readr package is used to import the dataset.

Popdata2010 <- read_csv("data/respopagesextod2000to2010.csv")
Popdata2020 <- read_csv("data/respopagesextod2011to2020.csv")
glimpse(Popdata2010)
Rows: 1,040,592
Columns: 7
$ PA   <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio",~
$ SZ   <chr> "Cheng San", "Cheng San", "Cheng San", "Cheng San", "Ch~
$ AG   <chr> "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to~
$ Sex  <chr> "Males", "Males", "Males", "Males", "Males", "Males", "~
$ TOD  <chr> "HDB 1- and 2-Room Flats", "HDB 3-Room Flats", "HDB 4-R~
$ Pop  <dbl> 20, 480, 220, 80, 0, 0, 0, 0, 20, 390, 200, 90, 0, 0, 0~
$ Time <dbl> 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2~
glimpse(Popdata2020)
Rows: 984,656
Columns: 7
$ PA   <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio",~
$ SZ   <chr> "Ang Mo Kio Town Centre", "Ang Mo Kio Town Centre", "An~
$ AG   <chr> "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to~
$ Sex  <chr> "Males", "Males", "Males", "Males", "Males", "Males", "~
$ TOD  <chr> "HDB 1- and 2-Room Flats", "HDB 3-Room Flats", "HDB 4-R~
$ Pop  <dbl> 0, 10, 30, 50, 0, 0, 40, 0, 0, 10, 30, 60, 0, 0, 40, 0,~
$ Time <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2~

5.0 Data Wrangling

5.1 Joining the two datasets

After confirming that the 2 datasets, Popdata2010 and Popdata2020, contain the same variables, rbind function in Base R is used to combine both datasets into 1 dataset.

PopdataAll <- rbind(Popdata2010,Popdata2020)
glimpse(Popdata2010)
Rows: 1,040,592
Columns: 7
$ PA   <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio",~
$ SZ   <chr> "Cheng San", "Cheng San", "Cheng San", "Cheng San", "Ch~
$ AG   <chr> "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to~
$ Sex  <chr> "Males", "Males", "Males", "Males", "Males", "Males", "~
$ TOD  <chr> "HDB 1- and 2-Room Flats", "HDB 3-Room Flats", "HDB 4-R~
$ Pop  <dbl> 20, 480, 220, 80, 0, 0, 0, 0, 20, 390, 200, 90, 0, 0, 0~
$ Time <dbl> 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2~

5.2 Data Transformation

Transforming data of one gender

Drawing reference from Take-home Ex01, Pop values of Males are transformed to negative values, and binded with the Female dataset using the following code chunk:

PopdataAll_M <- PopdataAll %>%
                  filter(`Sex` ==  "Males") %>%
                  mutate (Pop = -Pop)
#
PopdataAll_F <- PopdataAll %>%
                  filter(`Sex` ==  "Females") 

PopdataAll_MF <- rbind(PopdataAll_M,PopdataAll_F)
glimpse(PopdataAll_MF)
Rows: 2,025,248
Columns: 7
$ PA   <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio",~
$ SZ   <chr> "Cheng San", "Cheng San", "Cheng San", "Cheng San", "Ch~
$ AG   <chr> "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to~
$ Sex  <chr> "Males", "Males", "Males", "Males", "Males", "Males", "~
$ TOD  <chr> "HDB 1- and 2-Room Flats", "HDB 3-Room Flats", "HDB 4-R~
$ Pop  <dbl> -20, -480, -220, -80, 0, 0, 0, 0, -20, -620, -260, -120~
$ Time <dbl> 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2~

Correcting data type of the grouping variables

The following code chunk uses factor() and as.integer() from Base R to convert the variables AG and PG to factors, and Time as integers. This step is required for the data groups to be ordered correctly in the chart (e.g. for Time, when using {frame_time} function, it will become a continuous variable with decimal values if not converted to integers).

PopdataAll_MF$AG <- factor (PopdataAll_MF$AG, levels = unique(PopdataAll_MF$AG)) 
PopdataAll_MF$PA <- factor (PopdataAll_MF$PA, levels = unique(PopdataAll_MF$PA)) 
PopdataAll_MF$Time <-as.integer(PopdataAll_MF$Time)

Deriving the frequency count by Planning Area, Time, Age Group and Sex

Next we compute the frequency count of the total population by planning area, time, age group and sex. In the code chunk below, group_by() of dplyr package is used to group the data by PA, Time, AG and Sex, and summarise() from the same package is used to count the number of residents.

PopdataAll_Grp <- PopdataAll_MF %>%
                    group_by(`PA`, `Time`, `AG`, `Sex`) %>%
                    summarise('Count'= sum(`Pop`)) %>%
                    ungroup()

head(PopdataAll_Grp)
# A tibble: 6 x 5
  PA          Time AG       Sex     Count
  <fct>      <int> <fct>    <chr>   <dbl>
1 Ang Mo Kio  2000 0_to_4   Females  4460
2 Ang Mo Kio  2000 0_to_4   Males   -4830
3 Ang Mo Kio  2000 5_to_9   Females  5880
4 Ang Mo Kio  2000 5_to_9   Males   -5890
5 Ang Mo Kio  2000 10_to_14 Females  5520
6 Ang Mo Kio  2000 10_to_14 Males   -5710

6.0 Plotting the chart

6.1 Population Pyramid by Planning Area (2000 to 2020)

As the requirement is to plot the pyramid by planning area, 4 planning areas (out of a total of 57 planning areas) are randomly selected using subset() of Base R.

print(unique(PopdataAll_Grp$PA))
 [1] Ang Mo Kio              Bedok                  
 [3] Bishan                  Boon Lay/Pioneer       
 [5] Bukit Batok             Bukit Merah            
 [7] Bukit Panjang           Bukit Timah            
 [9] Central Water Catchment Changi                 
[11] Changi Bay              Choa Chu Kang          
[13] Clementi                Downtown Core          
[15] Geylang                 Hougang                
[17] Jurong East             Jurong West            
[19] Kallang                 Lim Chu Kang           
[21] Mandai                  Marina East            
[23] Marina South            Marine Parade          
[25] Museum                  Newton                 
[27] North-Eastern Islands   Novena                 
[29] Orchard                 Outram                 
[31] Pasir Ris               Paya Lebar             
[33] Punggol                 Queenstown             
[35] River Valley            Rochor                 
[37] Seletar                 Sembawang              
[39] Sengkang                Serangoon              
[41] Simpang                 Singapore River        
[43] Southern Islands        Straits View           
[45] Sungei Kadut            Tampines               
[47] Tanglin                 Tengah                 
[49] Toa Payoh               Tuas                   
[51] Western Islands         Western Water Catchment
[53] Woodlands               Yishun                 
[55] Not Stated              Boon Lay               
[57] Pioneer                
57 Levels: Ang Mo Kio Bedok Bishan Boon Lay/Pioneer ... Pioneer

To plot the static chart, drawing reference from Take-home Ex01, the relevant codes from ggplot2 package are used.

xbrks <- seq(-300000, 300000, 5000)
xlabls <- paste0(as.character(c(seq(300, 0, -5), seq(5, 300, 5))))

p <- ggplot(subset(PopdataAll_Grp, PA %in% c("Ang Mo Kio", "Sengkang", "Tampines", "Jurong East")), 
      aes (x = AG, y = Count, fill = Sex)) +
      geom_bar(stat = "identity", width = .6 ) +
      scale_y_continuous(breaks = xbrks, labels = xlabls, name = "Population Size ('000)") +
      xlab("Age Groups") +
      coord_flip() +
      theme_bw() 

Using facet_wrap of ggplot2, we are able to plot the population pyramids of the 4 different planning areas into a 2 by 2 panel. ggplotly() is then used to convert to an interactive plot.

xbrks1 <- seq(-300000, 300000, 50000)
xlabls1 <- paste0(as.character(c(seq(300, 0, -50), seq(50, 300, 50))))
p1 <- p +  facet_wrap (~ PA, nrow = 2, ncol = 2) + 
            xlab("Age Groups") +
            scale_y_continuous(breaks = xbrks1, labels = xlabls1, name = "Population Size ('000)")

ggplotly(p1)
0_to_45_to_910_to_1415_to_1920_to_2425_to_2930_to_3435_to_3940_to_4445_to_4950_to_5455_to_5960_to_6465_to_6970_to_7475_to_7980_to_8485_to_8990_and_over200150100500501001502002500_to_45_to_910_to_1415_to_1920_to_2425_to_2930_to_3435_to_3940_to_4445_to_4950_to_5455_to_5960_to_6465_to_6970_to_7475_to_7980_to_8485_to_8990_and_over20015010050050100150200250
SexFemalesMalesPopulation Size ('000)Age GroupsAng Mo KioJurong EastSengkangTampines

While this plot using ggplotly allows interactivity, such as viewing the data for any selected data point, it does not show the changes in data over the years.

6.2 Displaying changes in population demographic from 2000 to 2020 by Planning Area

To show changes in the demographic over the years by respective planning area, functions from gganimate package is used:

p2 <- p + 
      facet_wrap (~ PA, nrow = 2, ncol = 2) +
      transition_time(Time) +
      labs(title = 'Singapore Age-Sex Population Pyramid 2000-2020 \n\nYear: {frame_time}') +
      enter_fade() +
      exit_fade() +
      ease_aes('cubic-in-out')

p2

6.3 Final Visualisation after formatting

Finally, animate() of gganimate is used to render the the chart object and finetune the animation. 24 frame per second over a duration of 20 seconds is chosen for smoother motion. The size of the object is also adjusted to 750 x 600 for better visualisation.

animate (p2, fps = 24, duration = 20, 
         width = 750,
         height = 600
        )

From the visualisation, we can easily see that across most of the planning areas, with the exception of Sengkang, Singapore’s birth rate is slowing down over the years, and the population is ageing. For Sengkang, it is one of the newer estates which sees an increasing population of younger families over the years.

As the R codes are reproducible, we can easily add in other planning areas for comparison depending on the areas of interest.

Challenges and future work

The main challenge with this take-home exercise is the amount of time given and having sufficient experience in using the correct R packages. With suitable R packages and functions, more layers can be added to the chart object depending on user needs. For instance, using crosstalk package, we can have a drop down list to allow users to select the planning area, and a slider for users who prefer to manually filter the planning area and year.

References

Back to homepage