Punya Prasad Sapkota



R has become a language of choice for data science work. It has a rich ecosystem of packages that can be readily used for a variety of data management tasks. This article focuses on accessing data from KoBoToolbox.

KoBoToolbox

KoBoToolbox provides a suite of tools for field data collection in challenging environments. It is free and open source and works both online and offline.

KoBoToolbox has a number of advanced features that are very useful for advanced use cases. It has made data accessible through a REST API, which opens up enormous possibilities for automating the workflow of survey data collection, processing and analysis. Even if the data has not been made publicly accessible, it can be fetched by passing the username and password of the KoBo account. KoBoToolbox has two API access points, namely kpi and kc. The toolbox originally supported form deployment and data access through the ‘kc’ API; in the latest version, ‘kpi’ is the primary API.
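The two access points use different URL patterns. A minimal sketch of both, assuming the humanitarianresponse.info deployment; the kpi v2 path and the asset uid shown here are illustrative assumptions, not values from this article:

```r
# Hypothetical identifiers -- replace with those of your own form.
kc_form_id    <- 253043                     # kc forms use a numeric id
kpi_asset_uid <- "aUBanmH6zXuLaDfUyoSbZB"   # kpi assets use an alphanumeric uid

# kc (legacy v1) API: CSV export of the form data
kc_url  <- paste0("https://kc.humanitarianresponse.info/",
                  "api/v1/data/", kc_form_id, ".csv")

# kpi (current v2) API: JSON export of the same data
kpi_url <- paste0("https://kobo.humanitarianresponse.info/",
                  "api/v2/assets/", kpi_asset_uid, "/data.json")
```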

This article demonstrates methods to access data from KoBo using R.

Main packages used:

library(httr)
library(jsonlite)
#for reading and writing data
library(readr) # to read CSV data
library(openxlsx) # to write to excel file

The ‘httr’ package has very useful tools for working with HTTP requests such as GET, POST and PATCH, and lets us pass parameters such as authentication information. The ‘jsonlite’ package offers flexible, robust, high-performance tools for working with JSON data.

First of all, let’s set the KoBo server URL.

KoBoToolbox Example

####----set global variables ----------
kobo_server_url<-"https://kobo.humanitarianresponse.info/"
kc_server_url<-"https://kc.humanitarianresponse.info/"

To download data, the unique identifier of the form is required to construct the download URL. In the older API version 1, each form/project deployed to collect data has a unique six-digit identifier (for example, 145448). With the ID of the form, the KoBo data download URL can be constructed as:

form_id<-253043 #change the id to your form's id
url<-paste0(kc_server_url,"api/v1/data/",form_id,".csv")
#returns the CSV content of the form

The above URL points to the CSV content of the form data, which can easily be transformed into an R data frame for further processing.

To download the data, a GET request is made. If the data is not publicly available, the username and password of the KoBo account need to be passed as authentication parameters.

#supply url for the data
rawdata<-GET(url)
#if form data is not accessible publicly, then username and password need to be provided.
#Use the following code to download the data.
#u<-"username"
#pw<-"password"
#rawdata<-GET(url,authenticate(u,pw),progress())

The server responds with a status code, which should be checked to confirm that the requested information was returned successfully before processing it further.

print(paste0("Status Code: ",rawdata$status_code))
## [1] "Status Code: 200"
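Rather than only printing the code, the script can abort on any non-success status using httr’s helpers. A sketch wrapped in a helper with a hypothetical name; after the GET above it would be called as `stop_on_http_error(rawdata)`:

```r
library(httr)

# Hypothetical helper: abort when the server returns an HTTP error (status >= 400)
stop_on_http_error <- function(resp) {
  if (http_error(resp)) {
    stop("Request failed with status ", status_code(resp))
  }
  invisible(resp)
}
```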

When an API call responds to a request, the data is returned as a list. The list has many different items, and most of them are administrative information from the API. It is worth exploring the list elements to understand the structure so that the required piece of data can be extracted correctly.

# to check the elements returned:
str(rawdata)
## List of 10
##  $ url        : chr "https://kc.humanitarianresponse.info/api/v1/data/253043.csv"
##  $ status_code: int 200
##  $ headers    :List of 8
##   ..$ date               : chr "Fri, 29 Nov 2019 22:31:10 GMT"
##   ..$ content-type       : chr "application/csv"
##   ..$ content-length     : chr "68467"
##   ..$ server             : chr "nginx/1.10.3 (Ubuntu)"
##   ..$ content-disposition: chr "attachment; filename=aUBanmH6zXuLaDfUyoSbZB_2019_11_29.csv"
##   ..$ content-language   : chr "en"
##   ..$ vary               : chr "Accept, Accept-Language, Cookie"
##   ..$ allow              : chr "GET, POST, PATCH, DELETE, HEAD, OPTIONS"
##   ..- attr(*, "class")= chr [1:2] "insensitive" "list"
##  $ all_headers:List of 1
##   ..$ :List of 3
##   .. ..$ status : int 200
##   .. ..$ version: chr "HTTP/2"
##   .. ..$ headers:List of 8
##   .. .. ..$ date               : chr "Fri, 29 Nov 2019 22:31:10 GMT"
##   .. .. ..$ content-type       : chr "application/csv"
##   .. .. ..$ content-length     : chr "68467"
##   .. .. ..$ server             : chr "nginx/1.10.3 (Ubuntu)"
##   .. .. ..$ content-disposition: chr "attachment; filename=aUBanmH6zXuLaDfUyoSbZB_2019_11_29.csv"
##   .. .. ..$ content-language   : chr "en"
##   .. .. ..$ vary               : chr "Accept, Accept-Language, Cookie"
##   .. .. ..$ allow              : chr "GET, POST, PATCH, DELETE, HEAD, OPTIONS"
##   .. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
##  $ cookies    :'data.frame': 0 obs. of  7 variables:
##   ..$ domain    : logi(0) 
##   ..$ flag      : logi(0) 
##   ..$ path      : logi(0) 
##   ..$ secure    : logi(0) 
##   ..$ expiration: 'POSIXct' num(0) 
##   ..$ name      : logi(0) 
##   ..$ value     : logi(0) 
##  $ content    : raw [1:68467] 51 5f 45 2f ...
##  $ date       : POSIXct[1:1], format: "2019-11-29 22:31:10"
##  $ times      : Named num [1:6] 0 0.183 0.314 0.595 0.97 ...
##   ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
##  $ request    :List of 7
##   ..$ method    : chr "GET"
##   ..$ url       : chr "https://kc.humanitarianresponse.info/api/v1/data/253043.csv"
##   ..$ headers   : Named chr "application/json, text/xml, application/xml, */*"
##   .. ..- attr(*, "names")= chr "Accept"
##   ..$ fields    : NULL
##   ..$ options   :List of 2
##   .. ..$ useragent: chr "libcurl/7.58.0 r-curl/4.2 httr/1.4.1"
##   .. ..$ httpget  : logi TRUE
##   ..$ auth_token: NULL
##   ..$ output    : list()
##   .. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
##   ..- attr(*, "class")= chr "request"
##  $ handle     :Class 'curl_handle' <externalptr> 
##  - attr(*, "class")= chr "response"

The main raw data of interest is returned in the ‘content’ element. In addition, when the content has non-English characters, for example Arabic text, we can specify ‘UTF-8’ encoding to ensure that the information is retained in its original form.

d_content <- content(rawdata,"raw",encoding="UTF-8")

The raw content can be parsed to CSV format by using read_csv() function from ‘readr’ package.

d_content_csv <- read_csv(d_content)
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `Q_E/Q_E1` = col_double(),
##   `Q_E/Q_E2` = col_date(format = ""),
##   `Q_E/Q_E3` = col_double(),
##   `Q_E/Q_E5` = col_double(),
##   `Q_E/Q_E6` = col_double(),
##   `gchoose_th/inter_sect` = col_logical(),
##   `gchoose_th/cccm` = col_logical(),
##   `gchoose_th/education` = col_logical(),
##   `gchoose_th/nfi_shelte` = col_logical(),
##   `gchoose_th/food_secur` = col_logical(),
##   `gchoose_th/health_sec` = col_logical(),
##   `gchoose_th/early_reco` = col_logical(),
##   `gchoose_th/protection` = col_logical(),
##   `I_S_Q/Q_K1/Q_K1_B` = col_double(),
##   `I_S_Q/Q_K1/Q_K1_E` = col_date(format = ""),
##   `I_S_Q/aap_group/Q1_1_4/food` = col_logical(),
##   `I_S_Q/aap_group/Q1_1_4/nfi` = col_logical(),
##   `I_S_Q/aap_group/Q1_1_4/shelter` = col_logical(),
##   `I_S_Q/aap_group/Q1_1_4/wash` = col_logical(),
##   `I_S_Q/aap_group/Q1_1_4/nutrition` = col_logical()
##   # ... with 472 more columns
## )
## See spec(...) for full column specifications.

Let’s change it to the R data frame object.

d_content_csv <- as.data.frame(d_content_csv)

To save it to an external CSV or even an Excel file, we can use any of the CSV or xlsx writer functions available in R packages.

#using openxlsx package to save data
openxlsx::write.xlsx(d_content_csv,"kobo_data.xlsx",sheetName="data", row.names = FALSE)
## Note: zip::zip() is deprecated, please use zip::zipr() instead
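If Excel output is not required, the same data frame can be written to a plain CSV file instead. A minimal sketch with a stand-in data frame (in the article this would be d_content_csv):

```r
library(readr)

# Stand-in data frame for illustration
demo_df <- data.frame(id = 1:2, name = c("a", "b"))

# write_csv() avoids the xlsx/zip dependency chain entirely
write_csv(demo_df, "kobo_data.csv")
```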

Let’s look at another example where the data is in JSON format. When downloading data, it is common practice to check the number of submissions first; the data download request is made only if the form has one or more records.

#submission count
d_count_submission <- 0 #initialise
#returns the number of data submissions for the form
stat_url <- paste0(kc_server_url,"api/v1/stats/submissions/",form_id,"?group=anygroupname")
rawdata_stat <- GET(stat_url)
#supply username and password if data is not publicly available
#rawdata_stat <- GET(stat_url,authenticate(u,pw),progress())
d_content_stat <- rawToChar(rawdata_stat$content)
d_content_stat <- fromJSON(d_content_stat)
d_count_submission <- d_content_stat$count
#check whether there is a record or not
if (is.null(d_count_submission)) {
  d_count_submission <- 0
}
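The null check at the end can be wrapped in a small reusable helper (hypothetical name) so the same pattern works across many forms:

```r
# Hypothetical helper: treat a missing submission count as zero
count_or_zero <- function(count) {
  if (is.null(count)) 0 else count
}
```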

Country Based Pooled Fund API

Another example downloads Country Based Pooled Fund (CBPF) data using its API. The CBPF has made several datasets available through a REST API.

url<-"https://cbpfapi.unocha.org/vo1/odata/Location?poolfundAbbrv=TUR70&$format=csv"
#returns the CSV content of the form
rawdata<-GET(url)
d_content <- content(rawdata,"raw",encoding="UTF-8")
d_content_csv <- as.data.frame(read_csv(d_content))
## Parsed with column specification:
## cols(
##   PooledFundName = col_character(),
##   PooledFundId = col_double(),
##   AllocationTypeId = col_double(),
##   AllocationYear = col_double(),
##   AllocationSourceName = col_character(),
##   ChfId = col_double(),
##   ChfProjectCode = col_character(),
##   Location = col_character(),
##   ActivityName = col_character(),
##   Men = col_double(),
##   Women = col_double(),
##   Boys = col_double(),
##   Girls = col_double(),
##   Percentage = col_double(),
##   LocationAdminLevelLatitude = col_double(),
##   LocationAdminLevelLongitude = col_double(),
##   AdminLocation1TypeName = col_character(),
##   AdminLocation1 = col_character(),
##   AdminLevel1PCode = col_character()
## )
View(d_content_csv)

The above examples illustrate the functions and process used to fetch information through the REST API. A collection of such functions has been developed in the KoBo APItoolbox GitHub repo, which includes additional functions and utilities for working with KoBo datasets, including support for the new KoBo API.
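The steps walked through above can also be combined into a single convenience function. This is a sketch under the same assumptions as the examples (kc v1 endpoint, optional basic authentication); the function name is hypothetical, not part of any package:

```r
library(httr)
library(readr)

# Hypothetical wrapper: build the kc URL, fetch, check status, parse to a data frame
fetch_kobo_csv <- function(form_id, user = NULL, password = NULL,
                           kc_server_url = "https://kc.humanitarianresponse.info/") {
  url <- paste0(kc_server_url, "api/v1/data/", form_id, ".csv")
  resp <- if (is.null(user)) GET(url) else GET(url, authenticate(user, password))
  if (http_error(resp)) {
    stop("Request failed with status ", status_code(resp))
  }
  as.data.frame(read_csv(content(resp, "raw", encoding = "UTF-8")))
}
```

Usage would mirror the article: `df <- fetch_kobo_csv(253043)` for public data, or `fetch_kobo_csv(253043, "username", "password")` otherwise.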
