Edouard Legoupil



Geocoding locations is a common task for many humanitarian information management officers. While the regular Google geocoder works very well in many countries, most of the countries where we work are often poorly covered. Geonames.org is the most extensive database of toponyms. It aggregates a large number of data sources: https://www.geonames.org/datasources/.

In this post, we will see how to quickly geocode a list of locations using the fuzzy search capability of Geonames. A fuzzy search returns results that are likely to be relevant to a search term even when the term does not exactly match the desired information. It relies on an algorithm that measures the edit distance, i.e. a way of quantifying how dissimilar two strings (e.g., words) are by counting the minimum number of operations required to transform one string into the other.
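To make the idea of edit distance concrete, here is a minimal sketch using base R's `adist()`, which computes the generalized Levenshtein distance between strings. The similarity score below (one minus the distance divided by the length of the longer string) is only an illustrative normalization; it is not necessarily how Geonames scores fuzzy matches.

```r
# Edit distance between a search term and a few candidate spellings
query <- "Ndoumba"
candidates <- c("Ndumba", "Ndjamba", "Ndamba")

# adist() returns a matrix of distances (rows: query, columns: candidates)
distances <- as.vector(utils::adist(query, candidates))
names(distances) <- candidates

# Illustrative similarity in [0, 1]: 1 means the strings are identical
similarity <- 1 - distances / pmax(nchar(query), nchar(candidates))

print(distances)
print(round(similarity, 2))
```

Here "Ndumba" is one deletion away from "Ndoumba", while the other two spellings each need two operations.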

Fuzzy search is quite powerful when searching for locations whose toponyms are transliterations, from Arabic for instance (i.e. words transferred from the alphabet of one language to another).

Geonames.org

There’s a dedicated R package for Geonames. First, you need to create an account on Geonames (https://www.geonames.org/login) in order to get your own geonamesUsername. You will also need to enable the free web services for that account: http://www.geonames.org/enablefreewebservice.

Install package

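# install.packages("geonames")  # run once if the package is not installed yet
library(geonames)
library(kableExtra)
# Set your own Geonames username before calling the API:
# options(geonamesUsername = "user")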
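
One way to avoid hard-coding the username in a script is to read it from an environment variable. This is a small sketch only: the variable name `GEONAMES_USERNAME` and the helper `set_geonames_user()` are assumptions made for this example, not something the package requires or provides.

```r
# Read the Geonames username from an environment variable.
# GEONAMES_USERNAME is a hypothetical name chosen for this example;
# it could be defined in ~/.Renviron so it never appears in the script.
set_geonames_user <- function(var = "GEONAMES_USERNAME") {
  user <- Sys.getenv(var, unset = "")
  if (!nzchar(user)) {
    stop("Environment variable ", var, " is not set")
  }
  # Same option as the one set by hand with options(geonamesUsername = "user")
  options(geonamesUsername = user)
  invisible(user)
}

# Usage, once the variable is defined:
# set_geonames_user()
```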

Available functions in the package

Get nearby functions

# GNneighbours(3041565)
# GNneighbourhood(40.7834,-73.96625)
# GNfindNearbyPlaceName(52,-128,300, "30","FULL") 
# GNfindNearbyStreets(37.45,-122.18)

Country functions

# GNcountryCode(lat=47.03,lng=10.2)
# GNcountryInfo()
# GNcountryInfo("DE")

Wikipedia functions

# GNwikipediaSearch("london")
# GNfindNearbyWikipedia(postalcode=8775,country="CH",radius=10)
# GNwikipediaBoundingBox(north=44.1,south=-9.9,east=-22.4,west=55.2)

Timezone function

# GNtimezone(57.01,-2)
# GNtimezone(lat=0,lng=-40)
# GNtimezone(lat=0,lng=-40, radius=200)

Postal code functions

# GNfindNearbyPostalCodes(lat=47,lng=9)
# GNpostalCodeSearch(postalcode=90210,country="FI")
# GNpostalCodeSearch(postalcode=90210,country="US")
# GNpostalCodeLookup(postalcode="LA1",country="UK")
# GNpostalCodeLookup(postalcode="90210")
# GNpostalCodeCountryInfo()

Practical example in DRC

Let’s use a dataset recently shared on the “GRP Inter Agency IM” Skype group.

drc <- read.delim("drc.tsv")
str(drc)
## 'data.frame':    40 obs. of  5 variables:
##  $ Departement: Factor w/ 2 levels "cuvette","Likouala": 1 1 2 2 2 2 2 2 2 2 ...
##  $ District   : Factor w/ 8 levels "betou","Bouanela",..: 8 8 7 7 3 7 3 3 7 2 ...
##  $ VILLAGES   : Factor w/ 40 levels "","Afrique du sud",..: 37 36 40 39 38 35 34 33 32 31 ...
##  $ Lat        : Factor w/ 1 level "#N/A": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Long       : Factor w/ 1 level "#N/A": 1 1 1 1 1 1 1 1 1 1 ...

We can search for locations one by one.

This would be the URL on Geonames: https://www.geonames.org/advanced-search.html?q=Ndoumba&country=CD&featureClass=P&continentCode=AF&fuzzy=0.6

Besides the location name, note the additional arguments used to query the API:

  • featureClass = “P”: here we are looking for populated places. You can narrow the search further by selecting a specific feature among the 645 types available: https://www.geonames.org/export/codes.html

  • fuzzy = “0.6”: this enables the fuzzy search. The value ranges from 0 to 1, where 1 means no fuzziness; changing it tunes how loosely similar place names are matched.

  • country = “CD”, continentCode = “AF”: these narrow the search to a country or a continent. “CD” is the ISO 3166 code for the Democratic Republic of the Congo (the Republic of the Congo is “CG”).

More documentation on the webservice is available at https://www.geonames.org/export/web-services.html.
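
To see how these arguments map onto the web service, the sketch below assembles a query URL by hand for the JSON search endpoint (`http://api.geonames.org/searchJSON`). The helper `build_search_url()` and the `"demo"` username are illustrative only; in practice `GNsearch()` builds and sends the request for you.

```r
# Hypothetical helper: turn named arguments into a Geonames search URL
build_search_url <- function(q, username, ...,
                             base = "http://api.geonames.org/searchJSON") {
  params <- c(list(q = q), list(...), list(username = username))
  # Percent-encode each value so that spaces and accents survive in the URL
  encoded <- vapply(params,
                    function(v) utils::URLencode(as.character(v), reserved = TRUE),
                    character(1))
  paste0(base, "?", paste(names(encoded), encoded, sep = "=", collapse = "&"))
}

url <- build_search_url("Ndoumba", username = "demo",
                        featureClass = "P", fuzzy = 0.6,
                        country = "CD", continentCode = "AF", maxRows = 20)
print(url)
```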

search1 <- GNsearch(q = "Ndoumba",
         featureClass = "P",
         fuzzy = "0.6",
         country = "CD",
         continentCode = "AF",
         maxRows = 20)



kable(search1)
| adminCode1 | lng | geonameId | toponymName | countryId | fcl | population | countryCode | name | fclName | adminCodes1.ISO3166_2 | countryName | fcodeName | adminName1 | lat | fcode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22 | 22.35055 | 923194 | Ndumba | 203312 | P | 0 | CD | Ndumba | city, village,… | LU | DR Congo | populated place | Lualaba | -10.71362 | PPL |
| 22 | 22.0647 | 8541961 | Ndumba | 203312 | P | 0 | CD | Ndumba | city, village,… | LU | DR Congo | populated place | Lualaba | -9.73041 | PPL |
| 23 | 22.27717 | 8285154 | Ndumba | 203312 | P | 0 | CD | Ndumba | city, village,… | KC | DR Congo | populated place | Kasai-Central | -6.97654 | PPL |
| 23 | 22.51405 | 8285223 | Ndumba | 203312 | P | 0 | CD | Ndumba | city, village,… | KC | DR Congo | populated place | Kasai-Central | -6.89121 | PPL |
| 22 | 23.98294 | 8462158 | Ndumba | 203312 | P | 0 | CD | Ndumba | city, village,… | LU | DR Congo | populated place | Lualaba | -9.47691 | PPL |
| 18 | 21.34056 | 8327318 | Ndumba | 203312 | P | 0 | CD | Ndumba | city, village,… | KS | DR Congo | populated place | Kasai | -6.47715 | PPL |
| 22 | 22.48846 | 923204 | Ndjamba | 203312 | P | 0 | CD | Ndjamba | city, village,… | LU | DR Congo | populated place | Lualaba | -10.50816 | PPL |
| 11 | 29.13872 | 8327977 | Ndoluma | 203312 | P | 0 | CD | Ndoluma | city, village,… | NK | DR Congo | populated place | Nord Kivu | -0.30456 | PPL |
| 21 | 25.00174 | 212789 | Kipase Deuxième | 203312 | P | 0 | CD | Kipase Deuxième | city, village,… | LO | DR Congo | populated place | Lomami | -6.47707 | PPL |
| 22 | 22.67092 | 8463674 | Ndjamba | 203312 | P | 0 | CD | Ndjamba | city, village,… | LU | DR Congo | populated place | Lualaba | -10.32607 | PPL |
| 23 | 21.9611 | 207350 | Dombi | 203312 | P | 0 | CD | Dombi | city, village,… | KC | DR Congo | populated place | Kasai-Central | -7.59213 | PPL |
| 23 | 22.53354 | 8285243 | Ndumbu | 203312 | P | 0 | CD | Ndumbu | city, village,… | KC | DR Congo | populated place | Kasai-Central | -6.99824 | PPL |
| 23 | 22.5749 | 8285247 | Ndumbu | 203312 | P | 0 | CD | Ndumbu | city, village,… | KC | DR Congo | populated place | Kasai-Central | -6.9623 | PPL |
| 19 | 17.5095 | 2312111 | Ndamba | 203312 | P | 0 | CD | Ndamba | city, village,… | KG | DR Congo | populated place | Kwango | -6.33728 | PPL |
| 08 | 15.25322 | 8254913 | Ndamba | 203312 | P | 0 | CD | Ndamba | city, village,… | BC | DR Congo | populated place | Bas-Congo | -5.47251 | PPL |
| 23 | 22.17769 | 8283546 | Ndumbi | 203312 | P | 0 | CD | Ndumbi | city, village,… | KC | DR Congo | populated place | Kasai-Central | -6.23421 | PPL |
| 22 | 22.22394 | 8461668 | Ndemba | 203312 | P | 0 | CD | Ndemba | city, village,… | LU | DR Congo | populated place | Lualaba | -9.8634 | PPL |
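
Since a fuzzy search returns many candidates, it can help to rank them by how close each name is to the search term. The sketch below does this with base R's `adist()` on a few rows copied by hand from the results above. The `rank_candidates()` helper is hypothetical, and it converts `lat` and `lng` to numeric on the assumption that the API results arrive as character strings.

```r
# A few rows copied from the search results, kept as character columns
candidates <- data.frame(
  name = c("Ndjamba", "Ndumba", "Dombi", "Ndumbu"),
  lat  = c("-10.50816", "-10.71362", "-7.59213", "-6.99824"),
  lng  = c("22.48846", "22.35055", "21.9611", "22.53354"),
  stringsAsFactors = FALSE
)

rank_candidates <- function(df, query) {
  # Case-insensitive Levenshtein distance between the query and each name
  df$distance <- as.vector(utils::adist(query, df$name, ignore.case = TRUE))
  df$lat <- as.numeric(df$lat)
  df$lng <- as.numeric(df$lng)
  # Closest names first; ties keep their original order
  df[order(df$distance), , drop = FALSE]
}

ranked <- rank_candidates(candidates, "Ndoumba")
print(ranked)
```

Several candidates can share the same name or the same distance, so the province or district still has to be checked by hand before accepting a match.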

In order to get quick results, we can also loop through the list of locations.

First we create an empty dataframe with the right structure.

results <- data.frame( adminCode1= "",
                       lng= "",
                       geonameId= "",
                       toponymName= "",
                       countryId= "",
                       fcl= "",
                       population= "",
                       countryCode= "",
                       name= "",
                       fclName= "",
                       adminCodes1.ISO3166_2= "",
                       countryName= "",
                       fcodeName= "", 
                       adminName1= "",
                       lat= "", 
                       fcode= "",
                       Village= "",
                       Departement= "",
                       District= "",
                       stringsAsFactors = FALSE)[0, ]  # drop the placeholder row so the data frame is truly empty

names(results)
##  [1] "adminCode1"            "lng"                   "geonameId"            
##  [4] "toponymName"           "countryId"             "fcl"                  
##  [7] "population"            "countryCode"           "name"                 
## [10] "fclName"               "adminCodes1.ISO3166_2" "countryName"          
## [13] "fcodeName"             "adminName1"            "lat"                  
## [16] "fcode"                 "Village"               "Departement"          
## [19] "District"

Now we can loop over the locations, searching for each one and appending the results.

for (i in seq_len(nrow(drc))) {
  # Pull the search term and its administrative context for the current row
  Village <- as.character(drc[i, "VILLAGES"])
  Departement <- as.character(drc[i, "Departement"])
  District <- as.character(drc[i, "District"])
  cat(paste0("searching for ", Village, "\n"))

  resulti <- GNsearch(q = Village,
                      country = "CD",
                      featureClass = "P",
                      continentCode = "AF",
                      fuzzy = "0.6",
                      maxRows = 20)
  cat(paste0("Results potential ", nrow(resulti), "\n"))

  # Tag each candidate with the source row, then append it to the results
  if (nrow(resulti) > 0) {
    resulti$Village <- Village
    resulti$Departement <- Departement
    resulti$District <- District
    results <- rbind(results, resulti)
  }
}
## searching for Obembo
## Results potential 0
## searching for Obambou
## Results potential 0
## searching for Yeleyele
## Results potential 4
## searching for Yabayeleyele
## Results potential 0
## searching for Talanamisso
## Results potential 0
## searching for Nongo
## Results potential 12
## searching for Nianga Kake
## Results potential 3
## searching for Ndzokou
## Results potential 0
## searching for Ndoumba
## Results potential 17
## searching for Moutembou
## Results potential 0
## searching for Moumenguele
## Results potential 0
## searching for Monokoboli
## Results potential 1
## searching for Mongombete
## Results potential 6
## searching for Moliangolo
## Results potential 0
## searching for Mogbala
## Results potential 20
## searching for Mobéyé (village autochtones)
## Results potential 0
## searching for Mawangui
## Results potential 20
## searching for Mankolo
## Results potential 20
## searching for Makolo
## Results potential 20
## searching for Loumbe
## Results potential 20
## searching for Lissala Ngomba
## Results potential 0
## searching for Ligoyi
## Results potential 8
## searching for Lignete
## Results potential 1
## searching for Kpeta
## Results potential 1
## searching for Itele
## Results potential 6
## searching for Ikongolo
## Results potential 10
## searching for Gaga
## Results potential 20
## searching for Etima
## Results potential 2
## searching for Epena centre (4 quartiers)
## Results potential 0
## searching for Elenda
## Results potential 19
## searching for Bossessengue
## Results potential 0
## searching for Bokpende
## Results potential 20
## searching for Bokoumba
## Results potential 20
## searching for Bogboko
## Results potential 20
## searching for Boboukou
## Results potential 2
## searching for Bissobe
## Results potential 1
## searching for Bete-Ntolo
## Results potential 1
## searching for Bangala
## Results potential 20
## searching for Afrique du sud
## Results potential 0
## searching for 
## Results potential 0
#names(resulti)
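
Several searches above return nothing because the village name carries extra text in parentheses or is empty. A possible variation on the loop, sketched below, cleans the names first and skips blanks. The helpers `clean_name()` and `geocode_all()` are hypothetical, and the search function is passed in as an argument so that the sketch can be tried with a stand-in instead of a live `GNsearch()` call.

```r
# Drop a trailing parenthetical such as "(4 quartiers)" and trim whitespace
clean_name <- function(x) {
  trimws(sub("\\s*\\(.*\\)\\s*$", "", x))
}

# Run `search_fn` on each non-empty name and stack the results
geocode_all <- function(place_names, search_fn) {
  pieces <- lapply(place_names, function(nm) {
    q <- clean_name(nm)
    if (!nzchar(q)) return(NULL)                  # nothing to search for
    res <- tryCatch(search_fn(q), error = function(e) NULL)
    if (is.null(res) || nrow(res) == 0) return(NULL)
    res$Village <- nm                             # keep the original label
    res
  })
  pieces <- Filter(Negate(is.null), pieces)
  if (length(pieces) == 0) return(data.frame())
  do.call(rbind, pieces)
}

# Stand-in for the API: echoes the query back with placeholder coordinates
fake_search <- function(q) {
  data.frame(name = q, lat = NA_character_, lng = NA_character_,
             stringsAsFactors = FALSE)
}

out <- geocode_all(c("Epena centre (4 quartiers)", "", "Nongo"), fake_search)
print(out)

# With the real service, something like:
# out <- geocode_all(as.character(drc$VILLAGES), function(q)
#   GNsearch(q = q, country = "CD", featureClass = "P", fuzzy = "0.6", maxRows = 20))
```

Note that `rbind()` expects every result to have the same columns, so a real run may need an extra step to align them.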

Finally, write the results to a CSV file.

write.csv(results, "results.csv", row.names = FALSE)

et voila…
