Benini & Sons



Affected populations or key informants are often asked to state their preferences for specific types of interventions. How can we identify patterns of preference within a dataset? Can we identify groupings, based on several categorical or continuous variables, that optimally differentiate preference profiles?

This tutorial is based on the publication "Priorities and preferences in humanitarian needs assessments - Their measurement by rating and ranking methods".

Loading packages

## This function loads the packages and installs any that are not yet available.
using <- function(...) {
    libs <- unlist(list(...))
    req <- vapply(libs, require, logical(1), character.only = TRUE)
    need <- libs[!req]
    if (length(need) > 0) {
        install.packages(need)
        lapply(need, require, character.only = TRUE)
    }
}

## Getting all necessary packages
## Note: bbplot is not on CRAN; install it with devtools::install_github("bbc/bbplot")
using("foreign", "PlackettLuce", "tidyverse", "qvcalc", "knitr", "kableExtra", "stargazer", "NLP",
      "ggthemes", "ggrepel", "GGally", "bbplot", "ggpubr", "grid", "gridExtra", "forcats", "psychotree")
rm(using)

# This small helper left-aligns text elements (e.g. "title", "subtitle", "caption")
# of a chart produced with ggplot2; `pieces` lists the layout names to move.
# Draw the returned grob with grid::grid.draw().
left_align <- function(plot_name, pieces){
  grob <- ggplot2::ggplotGrob(plot_name)
  grob$layout$l[grob$layout$name %in% pieces] <- 2
  return(grob)
}

Dataset

The data used in this tutorial come from the NPM Site Assessment Round 11 (2018) for the Rohingya refugee camps in Bangladesh.

data_csv <- read.csv(file = "181227_2027AB_NPM11_Priorities_PlackettLuce.csv", header = TRUE, sep = ",")

## Names of the item (priority) variables, i.e. the columns containing "l_" (e.g. respl_food)
used_var <- grep("l_", names(data_csv), value = TRUE)

# Structure of the data:
# str(data_csv)
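The pattern "l_" matches any column name that contains those two characters, not only the item variables. The sketch below uses a hypothetical vector of column names (`total_hh` is invented for illustration) to show how a stray column can slip in, and how anchoring the pattern on the `respl_` prefix seen in the item names avoids it.

```r
# Hypothetical column names; "total_hh" stands in for any non-item column
# whose name happens to contain "l_".
toy_names <- c("upazila", "log10pop", "healthWalk_enc", "total_hh",
               "respl_food", "respl_water")

# Loose pattern, as in the selection above: also picks up "total_hh"
loose <- grep("l_", toy_names, value = TRUE)

# Anchored pattern: only names that start with the item prefix "respl_"
strict <- grep("^respl_", toy_names, value = TRUE)

print(loose)
print(strict)
```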

Can we identify differences within preferences?

Some variables in the dataset define groups among which sectoral needs priorities may differ significantly. Detecting such differences corresponds to detecting Differential Item Functioning (DIF).

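For instance, the dataset includes the four sub-districts with refugee settlements (upazila), the log-transformed population size of each of the 1,990 camp blocks (log10pop, a continuous variable), and the distance to the nearest health care facility (healthWalk_enc, with five levels) as a marginalization indicator. These three variables are used as covariates below.

Rasch trees are designed to identify significant differences in preferences between such groups: a Rasch tree fits a Rasch model to binary item responses and recursively splits the sample on the covariate along which the item parameters are most significantly unstable.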
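For reference, the dichotomous Rasch model underlying the tree gives the probability that respondent (here, a site) $i$ scores 1 on item $j$ as a function of a person parameter $\theta_i$ and an item parameter $\beta_j$. Assuming a 1 means that sector $j$ was named as a priority, a lower $\beta_j$ means the sector is more likely to be prioritised. DIF means that the $\beta_j$ differ between the groups that the tree finds, i.e. $\beta_j^{(g_1)} \neq \beta_j^{(g_2)}$ for at least one item across two groups $g_1$ and $g_2$.

$$
P(Y_{ij} = 1 \mid \theta_i, \beta_j) = \frac{\exp(\theta_i - \beta_j)}{1 + \exp(\theta_i - \beta_j)}
$$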

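We first need to format the data so that raschtree() can use it: the item responses must be binary (0/1) and stored as a single matrix column of the data frame that holds the covariates.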

covariate <- data_csv[, c("upazila", "log10pop", "healthWalk_enc")]

resp <- as.matrix(data_csv[, used_var])

## raschtree() accepts only 0/1 responses, so every value above 0 is recoded to 1.
## Missing values are left as NA.
resp[!is.na(resp) & resp > 0] <- 1
## resp will be a matrix variable used in the model
covariate$resp <- resp

# Rows in which all observed item responses are 0, or all are 1, carry no information
# on the item parameters of a Rasch model. We therefore keep only the cases for which
# the proportion of items selected is strictly between 0 and 1.
covariate <- subset(covariate, rowMeans(resp, na.rm = TRUE) > 0 &
                        rowMeans(resp, na.rm = TRUE) < 1)
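To see what the recoding and the filter do, here is a sketch on a small hypothetical matrix (three respondents, three items, values invented for illustration). After recoding, the second row contains only 1s and the third only 0s plus a missing value, so both are dropped.

```r
# Hypothetical responses: 3 respondents x 3 items (values invented)
toy <- matrix(c(0, 2, 1,
                3, 1, 2,
                0, 0, NA),
              nrow = 3, byrow = TRUE,
              dimnames = list(NULL, c("respl_food", "respl_water", "respl_shelter")))

# Recode any positive value to 1, leaving NA untouched
toy[!is.na(toy) & toy > 0] <- 1

# Keep rows whose share of 1s (ignoring NA) is strictly between 0 and 1
share <- rowMeans(toy, na.rm = TRUE)
keep <- share > 0 & share < 1
toy[keep, , drop = FALSE]
```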

We can now compute and display the Rasch tree.

If the Rasch tree shows at least one split, DIF is present and there are groups with significant differences in their hierarchy of needs.
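Whether the tree split can also be checked programmatically by counting its terminal nodes: a single terminal node means no DIF was detected on the covariates supplied. The sketch below assumes the psychotree package and uses its bundled `SPISA` example data (first nine items, `age` and `gender` as covariates) rather than the NPM data. `partykit::nodeids()` lists the node ids, and `terminal = TRUE` restricts the list to the leaves.

```r
library(psychotree)

# Example data shipped with psychotree (not the NPM data used in this tutorial)
data("SPISA", package = "psychotree")

# Rasch tree on the first nine items with two covariates
rt_demo <- raschtree(spisa[, 1:9] ~ age + gender, data = SPISA)

# Ids of the terminal nodes (leaves) of the fitted tree
leaves <- partykit::nodeids(rt_demo, terminal = TRUE)

# More than one leaf means at least one split, i.e. DIF on some covariate
has_dif <- length(leaves) > 1
has_dif
```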

## Compute the Rasch tree
rt <- raschtree(resp ~ upazila + log10pop + healthWalk_enc, data = covariate)

## and plot it
plot(rt)

We can now extract the item parameters estimated for each group, i.e. for each terminal node of the tree.

## Item parameters per terminal node (one row per node)

kable(as.data.frame(itempar(rt)))
| Node | respl_fuel | respl_education | respl_food | respl_healthcare | respl_jobs | respl_nfis | respl_safety | respl_sanitation | respl_shelter | respl_water |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | -0.9081861 | 0.5297518 | -0.5548941 | -0.9081861 | 0.7762540 | -0.3023904 | 0.3228189 | -0.3023904 | -0.1656218 | 1.5128443 |
| 6 | -2.5504606 | 0.6877839 | -0.9459138 | 1.2537563 | 0.0479739 | 2.3062634 | 0.0746899 | 0.7803303 | -0.1944528 | -1.4599705 |
| 9 | -3.0026343 | 0.7659191 | -0.4038798 | 1.2941211 | -0.4606402 | 0.4042383 | 1.3474463 | 1.0606580 | -0.2085509 | -0.7966776 |
| 10 | -3.1250880 | 0.7742496 | -0.5240226 | 1.3287548 | -0.5240184 | 1.6405011 | 0.7472167 | 0.8019314 | -0.1480432 | -0.9714814 |
| 11 | -3.5968421 | 1.3106673 | -1.0336702 | 1.2624465 | -0.3000406 | 0.9384069 | 0.7453142 | 1.0490368 | 0.5053854 | -0.8807043 |
| 13 | -1.5111376 | 0.4193461 | -1.4367589 | -0.4108577 | -0.2586516 | 0.4193461 | 1.5902196 | 0.4193461 | -0.0932484 | 0.8623964 |
| 15 | -2.3620951 | -0.0799298 | -0.4793960 | 1.1217560 | -0.5267838 | 1.2052988 | 0.9715678 | 1.5033119 | 0.1482659 | -1.5019957 |
| 17 | -3.0771201 | 0.7448512 | -0.3089609 | 0.6938837 | 0.0052486 | 1.0407114 | 0.9910606 | 0.6331558 | 0.6570708 | -1.3799010 |
| 18 | -2.1892234 | 0.3722365 | -2.1892233 | 1.5358228 | 0.0477570 | 1.5358228 | -Inf | -0.6490151 | -Inf | NA |
| 19 | 0.5575216 | -Inf | 2.4097182 | 0.5575217 | 2.4097182 | -1.8998867 | -Inf | NA | -Inf | NA |
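Each row of the table is one terminal node, and the values are Rasch item parameters. Assuming a 1 meant that the sector was named, a lower value means a higher chance of being prioritised, so sorting each row gives the order of priorities for that group. The hypothetical helper `rank_priorities()` below is a minimal sketch of this; NA estimates, such as those in nodes 18 and 19, are placed last. The example input reuses three columns of nodes 3 and 19 from the table.

```r
# Hypothetical helper: order the items of each node from lowest to highest
# item parameter. In a Rasch model a lower parameter means a higher probability
# of a 1, so the first column is the sector most likely to be prioritised.
rank_priorities <- function(pars) {
  pars <- as.matrix(pars)
  ranked <- t(apply(pars, 1, function(x) names(sort(x, na.last = TRUE))))
  colnames(ranked) <- paste0("rank_", seq_len(ncol(ranked)))
  ranked
}

# Values copied from nodes 3 and 19 of the table above (three items only)
pars <- rbind(
  `3`  = c(respl_food = -0.5548941, respl_nfis = -0.3023904, respl_water = 1.5128443),
  `19` = c(respl_food = 2.4097182, respl_nfis = -1.8998867, respl_water = NA)
)
rank_priorities(pars)
```

Applied to the full output of `itempar()` on the fitted tree, the same helper would give one priority ordering per terminal node.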

We can also explore trees that split on a single covariate at a time.

## Tree using upazila only
plot(raschtree(resp ~ upazila, data = covariate), main = "For upazila")

## Tree using population size only
plot(raschtree(resp ~ log10pop, data = covariate), main = "For population")

## Tree using distance to the nearest health center only
plot(raschtree(resp ~ healthWalk_enc, data = covariate), main = "For distance to Health Center")
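The three calls differ only in the covariate. A small hypothetical helper can build the one-covariate formulas with base R's `reformulate()` and plot one tree per covariate. It assumes psychotree is installed and that `data` holds the binary matrix column `resp` prepared earlier.

```r
# Hypothetical helper: one formula "resp ~ <covariate>" per covariate name
single_covariate_formulas <- function(covars) {
  setNames(lapply(covars, function(v) reformulate(v, response = "resp")), covars)
}

# Hypothetical helper: fit and plot one Rasch tree per covariate
plot_single_trees <- function(data, covars) {
  forms <- single_covariate_formulas(covars)
  for (v in names(forms)) {
    plot(psychotree::raschtree(forms[[v]], data = data), main = paste("For", v))
  }
  invisible(forms)
}

# Usage with the data prepared above:
# plot_single_trees(covariate, c("upazila", "log10pop", "healthWalk_enc"))
```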
