Benini & Sons



Ratings and rankings account for the majority of the data generated in the course of rapid key informant needs assessments during humanitarian emergencies. Using this information to develop a consolidated prioritisation of needs comes with challenges.

Practically speaking, the method presented here can be used for any dataset that includes questions such as:

what’s your first need? (select_one among different options)

what’s your second need? (select_one among different options)

what’s your third need? (select_one among different options)
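
As a minimal illustration (hypothetical values), such answers arrive as one select_one column per question:

## Toy example (hypothetical data) of the expected input shape:
toy <- data.frame(
  priority1 = c("Food", "Water", "Food"),
  priority2 = c("Water", "Shelter", "Health"),
  priority3 = c("Health", "Food", "Water")
)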

This tutorial is based on the publication Priorities and preferences in humanitarian needs assessments – their measurement by rating and ranking methods and on the article Modelling rankings in R: the PlackettLuce package.

Humanitarian needs assessments seek to establish priorities for action and preferences of those whom the action will affect. Assessment teams collect data meant to capture the severity of problems and the situational aspects that matter in the evaluation of response options. Needs assessments establish priorities whereas response plans seek viable compromises between priorities and the means to address them.

When samples of stakeholders, such as key informants or relief workers, state priorities or preferences, the information chiefly produces ratings or rankings, and such data are ordinal. In summarising them, we need to avoid operations that require metric variables (such as the arithmetic mean, which does not properly account for compensability); yet we will seek methods that produce aggregates with metric qualities (such as Borda counts), as these have a higher information value and offer greater flexibility for subsequent prioritisation analyses.
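
A two-line illustration of the pitfall (hypothetical severity codes): the arithmetic mean treats the gaps between adjacent categories as equal, which an ordinal scale does not guarantee, whereas order-based summaries remain safe:

## Hypothetical severity ratings on an ordinal 4-point scale:
sev <- factor(c("minor", "severe", "severe", "catastrophic"),
              levels = c("minor", "moderate", "severe", "catastrophic"),
              ordered = TRUE)
mean(as.integer(sev))    # 2.75: assumes equal spacing between categories
median(as.integer(sev))  # 3: uses only the ordering, safe for ordinal data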

When aggregating rating and ranking data, and when interpreting the resulting measures of priority and preference, two distinctions are fundamental:

  • The first concerns statistical independence: ratings (such as of the severity of unmet needs in various sectors), though cognitively interdependent among items, are statistically independent. Rankings reflect an order among items and are dependent in both respects. For the analysis, both types have pros and cons; they contribute the most when used in tandem. This can happen during data collection – by asking both rating and ranking questions – and/or in the analysis – such as by ranking aggregate measures from ratings.

  • Second, certain measures characterize cases (units like geographical areas, communities, camps, and even households and individuals), while others express relationships with items (formally: options, aspects, attributes, indicators; substantively: sectors, agencies, problems, etc.). Methods suited to comparing Case A vs. Case B as well as Item X vs. Item Y are few; most of those based on ordinal data work in one way or the other, not both.

It is always good to recall that ranking should be used only when the respondents can cognitively handle all the objects at the same time. If the number of objects exceeds their memory and decisional capacity, the ensuing ranks will be incomplete or unreliable. If a complete ranking is desired, one possibility is to break the set of objects into subsets.
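
A sketch of the subset idea (hypothetical item list): split a long list into blocks small enough to be ranked reliably, each block becoming a separate ranking task:

## Hypothetical: split 12 objects into blocks of 4 for separate ranking tasks
objects_to_rank <- paste0("item_", 1:12)
blocks <- split(objects_to_rank, ceiling(seq_along(objects_to_rank) / 4))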

Problem statement

Simply visualising the three variables Need1, Need2 and Need3 does not give a clear picture of the priorities among needs. The Borda count can be used to consolidate ranks, although it makes a strong implicit assumption: that respondents are aware that they are casting votes.
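
As a toy illustration of the Borda idea (hypothetical ballots): each rank position is converted into points, and the points are summed per item:

## Toy Borda count (hypothetical ballots): three voters rank three items;
## rank 1 earns 2 points, rank 2 earns 1 point, rank 3 earns 0.
ballots <- matrix(c(1, 2, 3,   # voter 1: Food > Water > Shelter
                    2, 1, 3,   # voter 2: Water > Food > Shelter
                    1, 3, 2),  # voter 3: Food > Shelter > Water
                  nrow = 3, byrow = TRUE,
                  dimnames = list(NULL, c("Food", "Water", "Shelter")))
colSums(3 - ballots)  # Borda scores: Food 5, Water 3, Shelter 1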

The Plackett-Luce model of rankings overcomes barriers to the analysis of ranked preferences and priorities:

  • It produces a true ratio-level measure of the strength of preferences from full or partial choices.

  • It supports tests for differences between items through confidence intervals; such comparisons can also be extended to groups within a given item.

The Plackett-Luce model supplies a stronger measure: it is ratio-level. Moreover, it is based on more intuitive behavioural assumptions than the Borda count.
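
For reference, the standard formulation behind these claims: each item $i$ carries a positive "worth" $\alpha_i$, and a complete ranking $i_1 \succ i_2 \succ \cdots \succ i_J$ is modelled as a sequence of choices among the items still available:

$$P(i_1 \succ i_2 \succ \cdots \succ i_J) = \prod_{j=1}^{J} \frac{\alpha_{i_j}}{\alpha_{i_j} + \alpha_{i_{j+1}} + \cdots + \alpha_{i_J}}$$

The worths are identified only up to a common scale factor, which is why ratios of worths, rather than their raw values, carry the interpretation.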

In this tutorial, we present code to quickly implement these two methods.

Loading packages & functions

## This function will retrieve the packages if they are not yet installed.
using <- function(...) {
    libs <- unlist(list(...))
    req <- unlist(lapply(libs,require,character.only = TRUE))
    need <- libs[req == FALSE]
    if (length(need) > 0) { 
        install.packages(need)
        lapply(need,require,character.only = TRUE)
    }
}

## Getting all the necessary packages
## Note: bbplot is not on CRAN; install it from GitHub (bbc/bbplot) beforehand
using("foreign", "PlackettLuce", "tidyverse", "qvcalc", "kableExtra", "stargazer", "NLP",
      "ggthemes", "ggrepel", "GGally", "bbplot", "ggpubr", "grid", "gridExtra", "forcats", "psychotree")
rm(using)

# This small function left-aligns text (title, subtitle, caption) within charts produced with ggplot2
left_align <- function(plot_name, pieces){
  grob <- ggplot2::ggplotGrob(plot_name)
  n <- length(pieces)
  grob$layout$l[grob$layout$name %in% pieces] <- 2
  return(grob)
}

## Function to calculate the Borda count (as average ranks)
AvgRank <- function(BallotMatrix){
    ## The first column is expected to hold the item names; the remaining
    ## columns hold one ballot (set of ranks) per respondent.
    Ballots <- as.matrix(BallotMatrix[, -1], mode = "numeric")
    Num_Candidates <- dim(Ballots)[1]
    Names <- BallotMatrix[, 1]
    Ballots[is.na(Ballots)] <- Num_Candidates + 1 # Treat blanks as one worse than the minimum
    MeanRanks <- rowMeans(Ballots)
    Rankings <- data.frame(Names, MeanRanks)
    Rankings <- Rankings[order(rank(Rankings[, 2], ties.method = "random")), ] # Ties handled through a random draw
    Rankings <- data.frame(Rankings, seq_along(Rankings[, 1]))
    names(Rankings) <- c("Items", "Average Rank", "Position")
    return(Rankings)
}


## Some ggplot2 style
kobo_unhcr_style_scatter <- function() {
    font <- "Arial"
    ggplot2::theme(

      #This sets the font, size, type and colour of text for the chart's title
      plot.title = ggplot2::element_text(family = font, size = 12, face = "bold", color = "#222222"),

      #This sets the font, size, type and colour of text for the chart's subtitle,  as well as setting a margin between the title and the subtitle
      plot.subtitle = ggplot2::element_text(family = font, size = 11, margin = ggplot2::margin(9,0,9,0)),
      plot.caption = ggplot2::element_blank(),

      #This sets the position and alignment of the legend, removes a title and background for it and sets the requirements for any text within the legend. The legend may often need some more manual tweaking when it comes to its exact position based on the plot coordinates.
      legend.position = "top",
      legend.text.align = 0,
      legend.background = ggplot2::element_blank(),
      legend.title = ggplot2::element_blank(),
      legend.key = ggplot2::element_blank(),
      legend.text = ggplot2::element_text(family = font, size = 9, color = "#222222"),

      #This sets the text font, size and colour for the axis text, as well as setting the margins and removes lines and ticks. In some cases, axis lines and axis ticks are things we would want to have in the chart
      axis.title = ggplot2::element_text(family = font, size = 11, color = "#222222"),
      axis.text = ggplot2::element_text(family = font, size = 10, color = "#222222"),
      #axis.text.x = ggplot2::element_text(margin = ggplot2::margin(5, b = 9)),
      axis.ticks = ggplot2::element_blank(),
      axis.line = ggplot2::element_blank(),

      #This removes all minor gridlines and adds major y gridlines. In many cases you will want to change this to remove y gridlines and add x gridlines.
      panel.grid.minor = ggplot2::element_blank(),
      panel.grid.major.x = ggplot2::element_line(color = "#cbcbcb"),
      panel.grid.major.y = ggplot2::element_line(color = "#cbcbcb"),

      #This sets the panel background as blank, removing the standard grey ggplot background colour from the plot
      panel.background = ggplot2::element_blank(),

      #This sets the panel background for facet-wrapped plots to white, removing the standard grey ggplot background colour, and sets the size and alignment of the facet titles
      strip.background = ggplot2::element_rect(fill = "white"),
      strip.text = ggplot2::element_text(size  = 11,  hjust = 0)
    )
  }

Dataset

The data used in this tutorial come from the NPM Site Assessment Round 11, 2018, for the Rohingya refugee camps in Bangladesh.

data_csv <- read.csv(file = "181227_2027AB_NPM11_Priorities_PlackettLuce.csv", header = T, sep = ",")

# Structure of the data:
#str(data_csv)

When the csv file is loaded in R, all non-numeric data columns are read as factors because of the default value of the stringsAsFactors argument in read.csv.
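
Under R 4.0.0 and later the default changed to stringsAsFactors = FALSE, so on recent R versions the conversion has to be requested explicitly:

## Under R >= 4.0.0, request the factor conversion explicitly:
# data_csv <- read.csv(file = "181227_2027AB_NPM11_Priorities_PlackettLuce.csv",
#                      stringsAsFactors = TRUE)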

The model-fitting function in PlackettLuce requires data in the form of rankings, with the rank (1st, 2nd, 3rd, …) recorded for each item. The data therefore need to be transformed from their original shape, as we can see below.
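
In this dataset the per-item rank columns (prefixed "l_") were pre-computed in the csv. As a sketch (hypothetical helper, assuming the item labels match across the three columns), such columns could be derived from the select_one answers like this:

## Sketch: derive one rank column per item from the three priority columns
## (1 = first priority, 2 = second, 3 = third, NA = not mentioned).
item_set <- na.omit(unique(c(as.character(data_csv$priority1),
                             as.character(data_csv$priority2),
                             as.character(data_csv$priority3))))
rank_cols <- sapply(item_set, function(item) {
  r <- rep(NA_integer_, nrow(data_csv))
  r[which(data_csv$priority3 == item)] <- 3L
  r[which(data_csv$priority2 == item)] <- 2L
  r[which(data_csv$priority1 == item)] <- 1L
  r
})
colnames(rank_cols) <- paste0("l_", make.names(item_set))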

used_var <- as.character(names(data_csv))[grepl("l_", as.character(names(data_csv)))]  ## columns prefixed "l_"

kable( data_csv[1:10 , c("block_id","population","priority1","priority2","priority3") ],
       caption = "Data Overview") %>%
        kable_styling(bootstrap_options = c("striped", "bordered", "condensed", "responsive"), font_size = 9)
Table 1: Data Overview
block_id population priority1 priority2 priority3
CXB-001-001 40 Food assistance Cooking fuel and firewood Job opportunities
CXB-005-001 129 Cooking fuel and firewood Job opportunities Education for children
CXB-006-001 221 Food assistance Cooking fuel and firewood Healthcare
CXB-007-001 130 Food assistance Sanitation Education for children
CXB-008-001 112 Food assistance Cooking fuel and firewood Healthcare
CXB-009-001 54 Cooking fuel and firewood Healthcare Shelter
CXB-010-001 230 Food assistance Cooking fuel and firewood Healthcare
CXB-013-001 106 Food assistance Cooking fuel and firewood Sanitation
CXB-014-001 570 Cooking fuel and firewood Sanitation Job opportunities
CXB-016-001 67 Sanitation Education for children Healthcare

We can visualise this all together. We first need to sort all three needs by the frequency of the first need, using the forcats package. Note that the factor levels are reversed because the bar chart will then be flipped.

data_csv$priority1 <- factor(data_csv$priority1, levels =  levels(fct_rev(fct_infreq(data_csv$priority1))))

data_csv$priority2 <- factor(data_csv$priority2, levels =  levels(fct_rev(fct_infreq(data_csv$priority1))))
                             
data_csv$priority3 <- factor(data_csv$priority3, levels =  levels(fct_rev(fct_infreq(data_csv$priority1))))
plot1 <- ggplot(data = data_csv, aes( x = priority1)) +
  geom_bar(fill = "#2a87c8",colour = "#2a87c8", stat = "count", width = .8) +
  guides(fill = FALSE) +
  #geom_label(aes(label = ..count.., y = ..count..), stat = "count", fill = "#2a87c8", color = 'white') +
  
  geom_label(aes(label = priority1, y = 1), hjust = 0, vjust=0,  fill = "#222222", color = 'white') +
  coord_flip() +
        xlab("Priority") +
        ylab("") +
       labs(title = "Main Needs",  
        subtitle = "First Priority") +
      bbc_style() +
      theme( plot.title = element_text(size = 13),
         plot.subtitle = element_text(size = 11),
         plot.caption = element_text(size = 7, hjust = 1),
         axis.text = element_text(size = 10), 
         axis.text.y = element_blank(),
         strip.text.x = element_text(size = 11),
         panel.grid.major.x = element_line(color = "#cbcbcb"), 
         panel.grid.major.y = element_blank())

plot1 <- ggpubr::ggarrange(left_align(plot1, c("subtitle", "title", "caption")), ncol = 1, nrow = 1)


plot2 <-   ggplot(data = data_csv, aes( x = priority2)) +
    geom_bar(fill = "#2a87c8",colour = "#2a87c8", stat = "count", width = .8) +
    guides(fill = FALSE) +
   # geom_label(aes(label = ..count.., y = ..count..), stat = "count", fill = "#2a87c8", color = 'white') +
    coord_flip() +
    xlab("") +
    ylab("") +
     labs(title = "",  
        subtitle = "Second Priority") +
      bbc_style() +
      theme( plot.title = element_text(size = 13),
         plot.subtitle = element_text(size = 11),
         plot.caption = element_text(size = 7, hjust = 1),
         axis.text = element_text(size = 10),
         axis.text.y = element_blank(),
         strip.text.x = element_text(size = 11),
         panel.grid.major.x = element_line(color = "#cbcbcb"), 
         panel.grid.major.y = element_blank())
  
plot2 <- ggpubr::ggarrange(left_align(plot2, c("subtitle", "title")), ncol = 1, nrow = 1)

plot3 <-   ggplot(data = data_csv, aes( x = priority3)) +
    geom_bar(fill = "#2a87c8",colour = "#2a87c8", stat = "count", width = .8) +
    guides(fill = FALSE) +
    #geom_label(aes(label = ..count.., y = ..count..), stat = "count", fill = "#2a87c8", color = 'white') +
    coord_flip() +
    xlab("") +
    ylab("") +
     labs(title = "",  
        subtitle = "Third Priority") +
      bbc_style() +
      theme( plot.title = element_text(size = 13),
         plot.subtitle = element_text(size = 11),
         plot.caption = element_text(size = 7, hjust = 1),
         axis.text = element_text(size = 10), 
         axis.text.y = element_blank(),
         strip.text.x = element_text(size = 11),
         panel.grid.major.x = element_line(color = "#cbcbcb"), 
         panel.grid.major.y = element_blank())
  
plot3 <- ggpubr::ggarrange(left_align(plot3, c("subtitle", "title")), ncol = 1, nrow = 1)

  
grid.arrange( plot1, plot2, plot3,  ncol = 3)

The analysis will provide estimates of the priorities among 10 areas of need extracted from these three variables.

    ## thanks to: https://stackoverflow.com/questions/44232180/list-to-dataframe
    # tosplitlist <- strsplit(as.character(data[ , id]), " ")
    # tosplitlist <- stats::setNames(tosplitlist, seq_along(tosplitlist))
    # tosplitlist2 <- utils::stack(tosplitlist)
    # tosplitframe <- reshape2::dcast(tosplitlist2, ind ~ values, value.var = "ind", fun.aggregate = length)

The variables to be used in the Plackett-Luce model have been prefixed with “l_” so that they can be retrieved automatically.

Borda Count

## transpose the data so that items are in rows and ballots (respondents) in columns

vote <- t(data_csv[ , used_var])

borda <- AvgRank(vote)

borda$Items <- row.names(borda)
row.names(borda) <- NULL

kable( borda, 
       caption = "Borda Count per sector") %>%
        kable_styling(bootstrap_options = c("striped", "bordered", "condensed", "responsive"), font_size = 9)
Table 2: Borda Count per sector
Items Average Rank Position
l_nfis 0.2398190 1
l_healthcare 0.2584213 2
l_safety 0.2845651 3
l_education 0.2890900 4
l_sanitation 0.3076923 5
l_shelter 0.4529915 6
l_food 0.6158874 7
l_jobs 0.7616893 8
l_water 0.9381599 9
l_fuel 1.7073906 10

We can visualise this all together:

borda$items <- gsub(pattern     = "l_",
                    replacement = "",
                    borda$Items)

plot1 <- ggplot(data = borda, aes( x = reorder( items, borda[ ,2]), ## Reordering country by Value
                         y = borda[ ,2])) +
  geom_bar(fill = "#2a87c8", colour = "#2a87c8",
           stat = "identity") + 
  coord_flip() +
  scale_y_continuous( ) + ## Format axis number
  guides(fill = FALSE) +
  xlab("") +
  ylab("Priority average Rank") +
  labs(title = "Main Needs according to Borda Count") +
  bbc_style() +
  theme( plot.title = element_text(size = 14),
         plot.subtitle = element_text(size = 12),
         plot.caption = element_text(size = 7, hjust = 1),
         axis.text = element_text(size = 9),
         panel.grid.major.x = element_line(color = "#cbcbcb"), 
         panel.grid.major.y = element_blank(),
         strip.text.x = element_text(size = 11))

ggpubr::ggarrange(left_align(plot1, c("subtitle", "title", "caption")), ncol = 1, nrow = 1)

Plackett-Luce Model

Generate the model

In the needs assessment context, we call the Plackett-Luce “worth” coefficients “intensities”; this terminology is arbitrary. The messages below will appear; they are specific to this dataset and can be ignored.

data_csv %>%
  dplyr::select(starts_with("l_")) %>%
  PlackettLuce() -> intensities
## Recoded rankings that are not in dense form
## Rankings with only 1 item set to `NA`
#names(intensities)
# Summary of intensities:
# "ref = NULL" sets the mean of all intensities as the reference value.
# Else the first item would be the reference, with its coefficient constrained to 0.

Plackett_Luce_est <- summary(intensities, ref = NULL)

Plackett_Luce_est
## Call: PlackettLuce(rankings = .)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## l_fuel        0.40822    0.04211   9.694  < 2e-16 ***
## l_education  -0.05125    0.09615  -0.533   0.5940    
## l_food        1.45591    0.06502  22.393  < 2e-16 ***
## l_healthcare -0.50588    0.11155  -4.535 5.76e-06 ***
## l_jobs       -0.74085    0.07239 -10.235  < 2e-16 ***
## l_nfis       -0.75151    0.12079  -6.221 4.93e-10 ***
## l_safety     -0.75632    0.11546  -6.550 5.74e-11 ***
## l_sanitation -0.17088    0.09585  -1.783   0.0746 .  
## l_shelter     0.06927    0.07656   0.905   0.3656    
## l_water       1.04329    0.05337  19.550  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual deviance:  5912.8 on 5729 degrees of freedom
## AIC:  5930.8 
## Number of iterations: 17
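
Because the model is ratio-level, exponentiated differences of coefficients have a direct reading. For instance, taking the estimates above for food and water:

## Worth ratio of food vs. water from the estimates above:
## exp(1.45591 - 1.04329) is about 1.51, i.e. in a head-to-head choice
## food would be picked first about 1.5 times as often as water.
exp(1.45591 - 1.04329)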

Preparations for confidence intervals

First, we prepare a table for export, stripping the prefix “l_”. We ensure items will be listed in the sequence of the arguments, not alphabetically.

Plackett_Luce_est$coefficients %>%
  gsub(pattern     = "l_",
       replacement = "",
       x           = row.names(.)) %>%
  factor(x = ., levels = .) -> items
data.frame(items_num = as.integer(items),
           items     = items,
           Plackett_Luce_est$coefficients) -> Plackett_Luce_est_z.values

kable(Plackett_Luce_est_z.values, caption = "Model estimation") %>%
           kable_styling(bootstrap_options = c("striped", "bordered", "condensed", "responsive"), font_size = 9)
Table 3: Model estimation
items_num items Estimate Std. Error z value Pr(>|z|)
l_fuel 1 fuel 0.4082170 0.0421095 9.6941844 0.0000000
l_education 2 education -0.0512509 0.0961501 -0.5330300 0.5940128
l_food 3 food 1.4559099 0.0650167 22.3928506 0.0000000
l_healthcare 4 healthcare -0.5058790 0.1115519 -4.5349226 0.0000058
l_jobs 5 jobs -0.7408503 0.0723857 -10.2347628 0.0000000
l_nfis 6 nfis -0.7515089 0.1207942 -6.2213973 0.0000000
l_safety 7 safety -0.7563186 0.1154639 -6.5502617 0.0000000
l_sanitation 8 sanitation -0.1708777 0.0958493 -1.7827746 0.0746230
l_shelter 9 shelter 0.0692724 0.0765627 0.9047796 0.3655822
l_water 10 water 1.0432861 0.0533651 19.5499757 0.0000000
# ========================================================================
# Export output: if needed
# ------------------------------------------------------------------------
# write.csv(x         = Plackett_Luce_est_z.values,
#           file      = paste0(path.output, "/", "Plackett_Luce_coef_table.csv"),
#           row.names = F)

In order to obtain confidence intervals, quasi-variances of the coefficients are needed:

qv <- qvcalc(intensities, ref = NULL)
summary(qv)
## Model call:  PlackettLuce(rankings = .) 
##                     estimate         SE    quasiSE    quasiVar
##     l_fuel        0.40821699 0.04210947 0.03288481 0.001081411
##     l_education  -0.05125089 0.09615010 0.10275089 0.010557745
##     l_food        1.45590988 0.06501673 0.06142255 0.003772729
##     l_healthcare -0.50587901 0.11155185 0.12108662 0.014661971
##     l_jobs       -0.74085027 0.07238568 0.07490996 0.005611502
##     l_nfis       -0.75150887 0.12079423 0.13156000 0.017308034
##     l_safety     -0.75631864 0.11546388 0.12518339 0.015670881
##     l_sanitation -0.17087771 0.09584931 0.10267997 0.010543176
##     l_shelter     0.06927241 0.07656274 0.07906022 0.006250518
##     l_water       1.04328611 0.05336508 0.04693406 0.002202806
## Worst relative errors in SEs of simple contrasts (%):  -2.9 6.4 
## Worst relative errors over *all* contrasts (%):  -11.3 7.8
#plot(qv, xlab = "Needs Priorities", ylab = "Worth (log)", main = NULL)

We strip the prefix “l_” in preparation for export and ensure that the items will be listed in the sequence of the arguments, not alphabetically.

qv$qvframe %>%
  gsub(pattern     = "l_",
       replacement = "",
       x           = row.names(.)) %>%
  factor(x = ., levels = .) -> items

Now we compute the additional columns needed for the 95% confidence intervals, and exponentiate the coefficient and CI-bound estimates, in accordance with the Plackett-Luce model.

qv$qvframe %>% # qv table ...
  data.frame(items_num = as.integer(items),
             items     = items,
             .) %>%
  mutate(quasiSD  = sqrt(quasiVar),
         quasiLCI = estimate - quasiSD * qnorm(0.975, 0, 1),
         quasiUCI = estimate + quasiSD * qnorm(0.975, 0, 1),
         expLCI   = exp(quasiLCI),
         expEst   = exp(estimate),
         expUCI   = exp(quasiUCI)) -> qv_estim_CI

## Sort based on preference level
qv_estim_CI <- qv_estim_CI[ order(-  qv_estim_CI$expEst), ]
qv_estim_CI$items <- reorder(qv_estim_CI$items, qv_estim_CI$expEst)


kable(qv_estim_CI, caption = "Model Confidence Interval") %>%
           kable_styling(bootstrap_options = c("striped", "bordered", "condensed", "responsive"), font_size = 9)
Table 4: Model Confidence Interval
items_num items estimate SE quasiSE quasiVar quasiSD quasiLCI quasiUCI expLCI expEst expUCI
3 3 food 1.4559099 0.0650167 0.0614225 0.0037727 0.0614225 1.3355239 1.5762959 3.8019873 4.2883836 4.8370056
10 10 water 1.0432861 0.0533651 0.0469341 0.0022028 0.0469341 0.9512970 1.1352752 2.5890656 2.8385294 3.1120298
1 1 fuel 0.4082170 0.0421095 0.0328848 0.0010814 0.0328848 0.3437639 0.4726700 1.4102457 1.5041335 1.6042719
9 9 shelter 0.0692724 0.0765627 0.0790602 0.0062505 0.0790602 -0.0856828 0.2242276 0.9178854 1.0717281 1.2513558
2 2 education -0.0512509 0.0961501 0.1027509 0.0105577 0.1027509 -0.2526389 0.1501372 0.7767483 0.9500403 1.1619936
8 8 sanitation -0.1708777 0.0958493 0.1026800 0.0105432 0.1026800 -0.3721268 0.0303713 0.6892669 0.8429246 1.0308372
4 4 healthcare -0.5058790 0.1115519 0.1210866 0.0146620 0.1210866 -0.7432044 -0.2685536 0.4755875 0.6029753 0.7644845
5 5 jobs -0.7408503 0.0723857 0.0749100 0.0056115 0.0749100 -0.8876711 -0.5940294 0.4116132 0.4767084 0.5520982
6 6 nfis -0.7515089 0.1207942 0.1315600 0.0173080 0.1315600 -1.0093617 -0.4936560 0.3644515 0.4716544 0.6103907
7 7 safety -0.7563186 0.1154639 0.1251834 0.0156709 0.1251834 -1.0016736 -0.5109637 0.3672643 0.4693912 0.5999172
#str(qv_estim_CI)
# ========================================================================
# Export output:
# ------------------------------------------------------------------------
# write.csv(x         = qv_estim_CI,
#           file      = paste0(path.output, "/", "Plackett_Luce_estimates_CI.csv"),
#           row.names = F)

Visualise Results

In order to visualise the results together with their confidence intervals, the most suitable type of visual is a forest plot.

The PlackettLuce package provides a default plot method, but we can also build the chart with ggplot2.
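
The default plot (via the quasi-variance object computed above) is a one-liner; it shows the worths on the log scale with comparison intervals:

plot(qv, xlab = "Needs Priorities", ylab = "Worth (log)", main = NULL)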

plot1 <- ggplot(data = qv_estim_CI, 
             aes( x = items, 
                  y = expEst, 
                  ymin = expLCI,
                  ymax = expUCI)) +
      # geom_pointrange(color = "purple", shape = 18,  size = 2) + 
  
        geom_point(color = "purple", shape = 18,  size = 5) + 
        geom_errorbar(aes(ymin = expLCI, ymax = expUCI), color = "black", width = 0.2, size = 1) +
  
        #geom_hline(yintercept = 1, lty = 2) +  # add a dotted line at x=1 after flip
        coord_flip() +  # flip coordinates (puts labels on y axis)
        xlab("Preference Level") +
        ylab("") +
        labs(title = " At first: Food, Water & Fuel!",
                 subtitle =  "Sectorial Preference, analysis based on Plackett-Luce model",
                 caption = "The black lines (also called 'whiskers') around the point represent the confidence interval for each sector (the shorter the line, the more precise is the estimate)") +
          bbc_style() +
            theme( plot.title = element_text(size = 14),
                   plot.subtitle = element_text(size = 12),
                   plot.caption = element_text(size = 7, hjust = 1),
                  axis.text = element_text(size = 9),
                  panel.grid.major.x = element_line(color = "#cbcbcb"), 
                  panel.grid.major.y = element_blank(),
                  strip.text.x = element_text(size = 11))

ggpubr::ggarrange(left_align(plot1, c("subtitle", "title", "caption")), ncol = 1, nrow = 1)

Compare Borda Count & Plackett-Luce Model

Plackett-Luce does not score the observed ranks mechanically, as the Borda count does. Rather, it considers the tendency of items to sit at the higher end of the observed ranks and then extrapolates that information to the unobserved ranks.

The Borda count, by contrast, is indifferent to the unobserved rank orders; it is uninformative in this regard.

We can compare the two methods with the chart below.

Note the use of ggrepel to avoid overlapping labels.

compare <- merge( x = qv_estim_CI, y = borda, by = "items")

plot1 <- ggplot(data = compare, aes(x = `Average Rank`, y = expEst)) +  ## Borda average rank from the merged table
  geom_point(shape = 1) +    # Use hollow circles
  # geom_smooth()  +          # Add a loess smoothed fit curve with confidence region
  geom_label_repel(aes(label = items), size = 5) +
  xlab("Preference Level - Borda") +
  ylab("Preference Level - Plackett-Luce") +
  labs(title = "Comparing Needs Prioritisation",
       subtitle =  "Food or Fuel? It depends on how you count it...") +
  kobo_unhcr_style_scatter() +
  theme( plot.title = element_text(size = 14),
         plot.subtitle = element_text(size = 12),
         plot.caption = element_text(size = 7, hjust = 1),
         axis.text = element_text(size = 9),
         panel.grid.major.x = element_line(color = "#cbcbcb"),
         panel.grid.major.y = element_line(color = "#cbcbcb"),
         strip.text.x = element_text(size = 11))

ggpubr::ggarrange(left_align(plot1, c("subtitle", "title")), ncol = 1, nrow = 1)
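
As a quick numeric complement to the chart (a sketch), a rank correlation summarises how strongly the two orderings agree:

## Rank agreement between the two methods (sketch): Spearman's rho between
## the Borda average rank and the Plackett-Luce worth. Mind the direction
## in which each scale counts "better" when reading the sign.
cor(compare[, "Average Rank"], compare$expEst, method = "spearman")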
