Profile of preference: Rashtree analysis

December 10, 2019 Benini & Sons

4 minutes read

Often affected population or key informant are requested to share their preference for specific type of interventions. How can we identify patterns of preference within a dataset? Can we identify groupings on the basis of several (categorical or continuous) variables that differentiate profiles optimally.

This tutorial is based on the publication Priorities and preferences in humanitarian needs assessments -Their measurement by rating and ranking methods.

Loading packages

## This function will retrieve the packae if they are not yet installed.
using <- function(...) {
   libs <- unlist(list(...))
    req <- unlist(lapply(libs,require,character.only = TRUE))
    need <- libs[req == FALSE]
    if (length(need) > 0) { 
        install.packages(need)
        lapply(need,require,character.only = TRUE)
    }
}

## Getting all necessary package
using("foreign", "PlackettLuce", "tidyverse", "qvcalc","kableExtra","stargazer","NLP",
                     "ggthemes", "ggrepel", "GGally", "bbplot","ggpubr",'grid','gridExtra', 'forcat', 'psychotree')

## [[1]]
## [1] FALSE

rm(using)

# This small function is used to have nicely left align text within charts produced with ggplot2
left_align <- function(plot_name, pieces){
  grob <- ggplot2::ggplotGrob(plot_name)
  n <- length(pieces)
  grob$layout$l[grob$layout$name %in% pieces] <- 2
  return(grob)
}

Dataset

The data used in the this tutorial comes from NPM Site Assessment Round 11, 2018 - for the Rohingya refugee camps in Bangladesh.

data_csv <- read.csv(file = "181227_2027AB_NPM11_Priorities_PlackettLuce.csv", header = T, sep = ",")


 
used_var <- as.character(names(data_csv))[grepl("l_",as.character(names(data_csv)))]

# Structure of the data:
#str(data_csv)

Can we identify differences within preferences?

In the dataset, there are variables define groups among which sectoral needs priorities may differ significantly.This is corresponds to the detection of Differential Item Functioning (DIF).

For instance, we have 4 sub-districts with refugee settlements (upazila), a continuous population size variable for the 1,990 camp blocks (log10pop) (logarithmic), and the distance from the nearest health care facility (healthWalk_enc, with five levels) as a marginalization indicator. These

Rashtree Visualisation are designed to identify significant differences within preferences.

We first need to format the data so that it can be consumed by the algorithm.

covariate <- data_csv[, c("upazila", "log10pop", "healthWalk_enc")]

resp <- as.matrix(data_csv[, used_var])

## Rashtree accepts only 0 or 1 - so everything above 0 shalle be replaced by 1
for (i in 1:nrow(resp)) {
    for (j in 1:ncol(resp)) if (resp[i, j] > 0) 
        resp[i, j] = 1
}
## resp will be a matrix variable used in the model
covariate$resp <- resp

# To exclude rows where all observed item responses are either 0 or 1, we select
# only the subsetof cases for which the proportion of correct item responses is
# strictly between 0 and 1 forfuther analysis.
covariate <- subset(covariate, rowMeans(resp, na.rm = TRUE) > 0 & rowMeans(resp, 
    na.rm = TRUE) < 1)

We can now compute and display it.

If the Rasch tree shows at least one split, DIF is present and there are groups with significant difference in their hiearchy of needs.

## Compute the rashtree
raschtree <- raschtree(resp ~ upazila + log10pop + healthWalk_enc, data = covariate)

## and plotting it...
plot(raschtree)

We can now extract the groups and descriptions.

## Compute the rashtree

kable(as.data.frame(itempar(raschtree)))

	respl_fuel	respl_education	respl_food	respl_healthcare	respl_jobs	respl_nfis	respl_safety	respl_sanitation	respl_shelter	respl_water
3	-0.9081861	0.5297518	-0.5548941	-0.9081861	0.7762540	-0.3023904	0.3228189	-0.3023904	-0.1656218	1.5128443
6	-2.5504606	0.6877839	-0.9459138	1.2537563	0.0479739	2.3062634	0.0746899	0.7803303	-0.1944528	-1.4599705
9	-3.0026343	0.7659191	-0.4038798	1.2941211	-0.4606402	0.4042383	1.3474463	1.0606580	-0.2085509	-0.7966776
10	-3.1250880	0.7742496	-0.5240226	1.3287548	-0.5240184	1.6405011	0.7472167	0.8019314	-0.1480432	-0.9714814
11	-3.5968421	1.3106673	-1.0336702	1.2624465	-0.3000406	0.9384069	0.7453142	1.0490368	0.5053854	-0.8807043
13	-1.5111376	0.4193461	-1.4367589	-0.4108577	-0.2586516	0.4193461	1.5902196	0.4193461	-0.0932484	0.8623964
15	-2.3620951	-0.0799298	-0.4793960	1.1217560	-0.5267838	1.2052988	0.9715678	1.5033119	0.1482659	-1.5019957
17	-3.0771201	0.7448512	-0.3089609	0.6938837	0.0052486	1.0407114	0.9910606	0.6331558	0.6570708	-1.3799010
18	-2.1892234	0.3722365	-2.1892233	1.5358228	0.0477570	1.5358228	-Inf	-0.6490151	-Inf	NA
19	0.5575216	-Inf	2.4097182	0.5575217	2.4097182	-1.8998867	-Inf	NA	-Inf	NA

We can also explore trees by focusing on one aspect…

## and plotting it...
plot(raschtree(resp ~ upazila, data = covariate), main = "For upazila")

## and plotting it...
plot(raschtree(resp ~ log10pop, data = covariate), main = "For population")

## and plotting it...
plot(raschtree(resp ~ healthWalk_enc, data = covariate), main = "For distance to Health Center")

post

Home

About

Contributors

Categories

Join the Skype-group

learn more with R-bloggers

Recent posts

Anonymisation: Intrusion scenario and risk threshold

Post

Working with Survey Samples in the Tidyverse

Using Gridded Population Data for Household Survey Sampling

Quick tips for visualising data (with R)

Conjoint analysis: modeling judgement to calibrate vulnerability scoring

Using Analytic Hierarchy Process to weight vulnerability scorecard

Profile of preference: Rashtree analysis

Loading packages

Dataset

Can we identify differences within preferences?

Recent posts

Anonymisation: Intrusion scenario and risk threshold

Post

Working with Survey Samples in the Tidyverse

Using Gridded Population Data for Household Survey Sampling

Quick tips for visualising data (with R)

Conjoint analysis: modeling judgement to calibrate vulnerability scoring

Using Analytic Hierarchy Process to weight vulnerability scorecard

about

Home

About

Contributors

Categories

Join the Skype-group

learn more with R-bloggers

Loading packages

Dataset

Can we identify differences within preferences?

Related articles

about