Geocoding locations is a common task for many humanitarian information management officers. While Google's geocoding works very well in many countries, most of the countries where we work are often poorly covered. GeoNames (geonames.org) is one of the most extensive databases of toponyms (place names). It aggregates a large number of data sources: https://www.geonames.org/datasources/.
In this post, we will see how to quickly geocode a list of locations using the fuzzy search capability of GeoNames. A fuzzy search returns results that are likely to be relevant to a search term even when the term does not exactly match the desired information. It relies on an algorithm that measures the edit distance, i.e. a way of quantifying how dissimilar two strings (e.g. words) are by counting the minimum number of operations required to transform one string into the other.
Fuzzy search is quite powerful when searching for locations whose toponyms are transliterations, from Arabic for instance (transliteration means transferring a word from the alphabet of one language to that of another).
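To make the idea of edit distance concrete, here is a minimal sketch using adist() from base R, applied to spellings that appear later in this post. It illustrates the concept only; it is not a description of how GeoNames computes its fuzzy matches.

```r
# adist() ships with base R (package utils) and returns the
# generalized Levenshtein (edit) distance between strings.
adist("Ndoumba", "Ndumba")    # one deletion (the "o")
adist("Ndoumba", "Ndjamba")   # two substitutions

# Distances from one query to several candidate spellings at once
query <- "Ndoumba"
candidates <- c("Ndumba", "Ndjamba", "Ndumbu")
distances <- setNames(drop(adist(query, candidates)), candidates)
distances
```

The smaller the distance, the closer the candidate spelling is to the name we searched for.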
Geonames.org
There’s a dedicated R package for GeoNames. First you need to create an account on GeoNames at https://www.geonames.org/login in order to get your own geonamesUsername. You will also need to enable the free web services for that account: http://www.geonames.org/enablefreewebservice.
Install package
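# Install the package once, if needed:
# install.packages("geonames")
library(geonames)
library(kableExtra)
# Set your own GeoNames username before calling any function:
# options(geonamesUsername = "user")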
Available functions in the package
Basic search
# Some sample usages
# GNsearch(q="london",maxRows=10)
# GNcities(north=44.1,south=-9.9,east=-22.4,west=55.2,lang="de")
# GNchildren(3175395)
Get nearby functions
# GNneighbours(3041565)
# GNneighbourhood(40.7834,-73.96625)
# GNfindNearbyPlaceName(52,-128,300, "30","FULL")
# GNfindNearbyStreets(37.45,-122.18)
Country functions
# GNcountryCode(lat=47.03,lng=10.2)
# GNcountryInfo()
# GNcountryInfo("DE")
Wikipedia functions
# GNwikipediaSearch("london")
# GNfindNearbyWikipedia(postalcode=8775,country="CH",radius=10)
# GNwikipediaBoundingBox(north=44.1,south=-9.9,east=-22.4,west=55.2)
Timezone function
# GNtimezone(57.01,-2)
# GNtimezone(lat=0,lng=-40)
# GNtimezone(lat=0,lng=-40, radius=200)
Postal code function
# GNfindNearbyPostalCodes(lat=47,lng=9)
# GNpostalCodeSearch(postalcode=90210,country="FI")
# GNpostalCodeSearch(postalcode=90210,country="US")
# GNpostalCodeLookup(postalcode="LA1",country="UK")
# GNpostalCodeLookup(postalcode="90210")
# GNpostalCodeCountryInfo()
Practical example in DRC
Let’s use a dataset recently shared on the “GRP Inter Agency IM” Skype group.
drc <- read.delim("drc.tsv")
str(drc)
## 'data.frame': 40 obs. of 5 variables:
## $ Departement: Factor w/ 2 levels "cuvette","Likouala": 1 1 2 2 2 2 2 2 2 2 ...
## $ District : Factor w/ 8 levels "betou","Bouanela",..: 8 8 7 7 3 7 3 3 7 2 ...
## $ VILLAGES : Factor w/ 40 levels "","Afrique du sud",..: 37 36 40 39 38 35 34 33 32 31 ...
## $ Lat : Factor w/ 1 level "#N/A": 1 1 1 1 1 1 1 1 1 1 ...
## $ Long : Factor w/ 1 level "#N/A": 1 1 1 1 1 1 1 1 1 1 ...
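Some of the village names in this list carry annotations in parentheses, for instance "Epena centre (4 quartiers)", and such names are unlikely to match a toponym as they stand. A minimal sketch of a cleaning step follows; the helper name clean_village_name is hypothetical and not part of the geonames package.

```r
# Hypothetical helper: drop parenthetical notes and surrounding whitespace
clean_village_name <- function(x) {
  x <- gsub("\\s*\\([^)]*\\)", "", x)  # remove " (...)" annotations
  trimws(x)                            # trim leading/trailing spaces
}

cleaned <- clean_village_name(c("Epena centre (4 quartiers)", "Ndoumba", " Gaga "))
cleaned
```

The cleaned names could then be passed to the search instead of the raw VILLAGES column, keeping the original value alongside for traceability.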
We can search for locations one by one.
The equivalent query on the GeoNames website is https://www.geonames.org/advanced-search.html?q=Ndoumba&country=CD&featureClass=P&continentCode=AF&fuzzy=0.6.
Beyond the location name itself, note the additional arguments used to query the API:
featureClass = "P" - Here we are looking for populated places. You can narrow down the search by selecting one of the 645 feature types available: https://www.geonames.org/export/codes.html
fuzzy = "0.6" - This enables the fuzzy search. Changing this value, a number between 0 and 1, tunes how tolerant the search is to spelling differences.
country = "CD", continentCode = "AF" - This narrows down the search to a country or continent.
More documentation on the web service is available at https://www.geonames.org/export/web-services.html.
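To see how these arguments map onto a query, the sketch below assembles the advanced-search URL quoted above from its parts. The helper build_search_url is hypothetical and only produces a link to check a name by hand in the browser; the defaults are the values used in this post.

```r
# Hypothetical helper: builds the advanced-search URL for one place name
build_search_url <- function(name, country = "CD", feature_class = "P",
                             continent = "AF", fuzzy = "0.6") {
  paste0(
    "https://www.geonames.org/advanced-search.html",
    "?q=", utils::URLencode(name, reserved = TRUE),  # escape spaces etc.
    "&country=", country,
    "&featureClass=", feature_class,
    "&continentCode=", continent,
    "&fuzzy=", fuzzy
  )
}

build_search_url("Ndoumba")
build_search_url("Nianga Kake")
```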
search1 <- GNsearch(q = "Ndoumba",
                    featureClass = "P",
                    fuzzy = "0.6",
                    country = "CD",
                    continentCode = "AF",
                    maxRows = 20)
kable(search1)
adminCode1 | lng | geonameId | toponymName | countryId | fcl | population | countryCode | name | fclName | adminCodes1.ISO3166_2 | countryName | fcodeName | adminName1 | lat | fcode |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
22 | 22.35055 | 923194 | Ndumba | 203312 | P | 0 | CD | Ndumba | city, village,… | LU | DR Congo | populated place | Lualaba | -10.71362 | PPL |
22 | 22.0647 | 8541961 | Ndumba | 203312 | P | 0 | CD | Ndumba | city, village,… | LU | DR Congo | populated place | Lualaba | -9.73041 | PPL |
23 | 22.27717 | 8285154 | Ndumba | 203312 | P | 0 | CD | Ndumba | city, village,… | KC | DR Congo | populated place | Kasai-Central | -6.97654 | PPL |
23 | 22.51405 | 8285223 | Ndumba | 203312 | P | 0 | CD | Ndumba | city, village,… | KC | DR Congo | populated place | Kasai-Central | -6.89121 | PPL |
22 | 23.98294 | 8462158 | Ndumba | 203312 | P | 0 | CD | Ndumba | city, village,… | LU | DR Congo | populated place | Lualaba | -9.47691 | PPL |
18 | 21.34056 | 8327318 | Ndumba | 203312 | P | 0 | CD | Ndumba | city, village,… | KS | DR Congo | populated place | Kasai | -6.47715 | PPL |
22 | 22.48846 | 923204 | Ndjamba | 203312 | P | 0 | CD | Ndjamba | city, village,… | LU | DR Congo | populated place | Lualaba | -10.50816 | PPL |
11 | 29.13872 | 8327977 | Ndoluma | 203312 | P | 0 | CD | Ndoluma | city, village,… | NK | DR Congo | populated place | Nord Kivu | -0.30456 | PPL |
21 | 25.00174 | 212789 | Kipase Deuxième | 203312 | P | 0 | CD | Kipase Deuxième | city, village,… | LO | DR Congo | populated place | Lomami | -6.47707 | PPL |
22 | 22.67092 | 8463674 | Ndjamba | 203312 | P | 0 | CD | Ndjamba | city, village,… | LU | DR Congo | populated place | Lualaba | -10.32607 | PPL |
23 | 21.9611 | 207350 | Dombi | 203312 | P | 0 | CD | Dombi | city, village,… | KC | DR Congo | populated place | Kasai-Central | -7.59213 | PPL |
23 | 22.53354 | 8285243 | Ndumbu | 203312 | P | 0 | CD | Ndumbu | city, village,… | KC | DR Congo | populated place | Kasai-Central | -6.99824 | PPL |
23 | 22.5749 | 8285247 | Ndumbu | 203312 | P | 0 | CD | Ndumbu | city, village,… | KC | DR Congo | populated place | Kasai-Central | -6.9623 | PPL |
19 | 17.5095 | 2312111 | Ndamba | 203312 | P | 0 | CD | Ndamba | city, village,… | KG | DR Congo | populated place | Kwango | -6.33728 | PPL |
08 | 15.25322 | 8254913 | Ndamba | 203312 | P | 0 | CD | Ndamba | city, village,… | BC | DR Congo | populated place | Bas-Congo | -5.47251 | PPL |
23 | 22.17769 | 8283546 | Ndumbi | 203312 | P | 0 | CD | Ndumbi | city, village,… | KC | DR Congo | populated place | Kasai-Central | -6.23421 | PPL |
22 | 22.22394 | 8461668 | Ndemba | 203312 | P | 0 | CD | Ndemba | city, village,… | LU | DR Congo | populated place | Lualaba | -9.8634 | PPL |
In order to get quick results, we can also loop through the list of locations.
First we create a data frame with the right structure (a single row of empty strings) to collect the results.
results <- data.frame(adminCode1 = "",
                      lng = "",
                      geonameId = "",
                      toponymName = "",
                      countryId = "",
                      fcl = "",
                      population = "",
                      countryCode = "",
                      name = "",
                      fclName = "",
                      adminCodes1.ISO3166_2 = "",
                      countryName = "",
                      fcodeName = "",
                      adminName1 = "",
                      lat = "",
                      fcode = "",
                      Village = "",
                      Departement = "",
                      District = "",
                      stringsAsFactors = FALSE)
names(results)
names(results)
## [1] "adminCode1" "lng" "geonameId"
## [4] "toponymName" "countryId" "fcl"
## [7] "population" "countryCode" "name"
## [10] "fclName" "adminCodes1.ISO3166_2" "countryName"
## [13] "fcodeName" "adminName1" "lat"
## [16] "fcode" "Village" "Departement"
## [19] "District"
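Note that a data frame built from empty strings holds one row, which ends up as a blank first line in the final output. A possible variation, sketched here under the assumption that the same 19 column names are wanted, is a template with zero rows. The object names cols, empty_results and hit are only illustrative.

```r
# Column names: the 16 fields returned by the search plus the 3 source columns
cols <- c("adminCode1", "lng", "geonameId", "toponymName", "countryId", "fcl",
          "population", "countryCode", "name", "fclName",
          "adminCodes1.ISO3166_2", "countryName", "fcodeName", "adminName1",
          "lat", "fcode", "Village", "Departement", "District")

# One zero-length character column per name gives a data frame with 0 rows
empty_results <- as.data.frame(
  setNames(replicate(length(cols), character(0), simplify = FALSE), cols),
  stringsAsFactors = FALSE
)
dim(empty_results)

# Hypothetical single hit, only to show that appending to the template works
hit <- as.data.frame(as.list(setNames(rep("x", length(cols)), cols)),
                     stringsAsFactors = FALSE)
nrow(rbind(empty_results, hit))
```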
Now we can loop over the locations, searching for each one and appending the results.
for (i in seq_len(nrow(drc))) {
  # i <- 1  # uncomment to test a single row
  Village <- as.character(drc[i, "VILLAGES"])
  Departement <- as.character(drc[i, "Departement"])
  District <- as.character(drc[i, "District"])
  cat(paste0("searching for ", Village, "\n"))
  # Fuzzy search for populated places matching the village name
  resulti <- GNsearch(q = Village,
                      country = "CD",
                      featureClass = "P",
                      continentCode = "AF",
                      fuzzy = "0.6",
                      maxRows = 20)
  cat(paste0("Results potential ", nrow(resulti), "\n"))
  # Keep the searched record alongside each candidate match
  if (nrow(resulti) > 0) {
    resulti$Village <- Village
    resulti$Departement <- Departement
    resulti$District <- District
    results <- rbind(results, resulti)
  }
}
## searching for Obembo
## Results potential 0
## searching for Obambou
## Results potential 0
## searching for Yeleyele
## Results potential 4
## searching for Yabayeleyele
## Results potential 0
## searching for Talanamisso
## Results potential 0
## searching for Nongo
## Results potential 12
## searching for Nianga Kake
## Results potential 3
## searching for Ndzokou
## Results potential 0
## searching for Ndoumba
## Results potential 17
## searching for Moutembou
## Results potential 0
## searching for Moumenguele
## Results potential 0
## searching for Monokoboli
## Results potential 1
## searching for Mongombete
## Results potential 6
## searching for Moliangolo
## Results potential 0
## searching for Mogbala
## Results potential 20
## searching for Mobéyé (village autochtones)
## Results potential 0
## searching for Mawangui
## Results potential 20
## searching for Mankolo
## Results potential 20
## searching for Makolo
## Results potential 20
## searching for Loumbe
## Results potential 20
## searching for Lissala Ngomba
## Results potential 0
## searching for Ligoyi
## Results potential 8
## searching for Lignete
## Results potential 1
## searching for Kpeta
## Results potential 1
## searching for Itele
## Results potential 6
## searching for Ikongolo
## Results potential 10
## searching for Gaga
## Results potential 20
## searching for Etima
## Results potential 2
## searching for Epena centre (4 quartiers)
## Results potential 0
## searching for Elenda
## Results potential 19
## searching for Bossessengue
## Results potential 0
## searching for Bokpende
## Results potential 20
## searching for Bokoumba
## Results potential 20
## searching for Bogboko
## Results potential 20
## searching for Boboukou
## Results potential 2
## searching for Bissobe
## Results potential 1
## searching for Bete-Ntolo
## Results potential 1
## searching for Bangala
## Results potential 20
## searching for Afrique du sud
## Results potential 0
## searching for
## Results potential 0
#names(resulti)
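Many villages come back with up to 20 candidates, so some ranking helps before reviewing them. The sketch below orders candidates by edit distance between the searched name and the returned toponym, using adist() from base R. The data frame results_sample is a hypothetical excerpt built from four rows of the search output shown earlier, so that the example runs without calling the web service.

```r
# Hypothetical excerpt of `results`: rows copied from the "Ndoumba" search output
results_sample <- data.frame(
  Village     = c("Ndoumba", "Ndoumba", "Ndoumba", "Ndoumba"),
  toponymName = c("Ndjamba", "Ndumba", "Dombi", "Ndumbu"),
  lat         = c("-10.50816", "-10.71362", "-7.59213", "-6.99824"),
  lng         = c("22.48846", "22.35055", "21.9611", "22.53354"),
  stringsAsFactors = FALSE
)

# Edit distance between the searched name and each returned toponym
results_sample$distance <- mapply(
  function(a, b) adist(a, b, ignore.case = TRUE)[1, 1],
  results_sample$Village,
  results_sample$toponymName,
  USE.NAMES = FALSE
)

# Closest candidates first within each village
ranked <- results_sample[order(results_sample$Village, results_sample$distance), ]

# Keep the single closest candidate per village
best <- ranked[!duplicated(ranked$Village), ]
best[, c("Village", "toponymName", "lat", "lng", "distance")]
```

Distance alone cannot settle the match: the search output above lists several distinct places named Ndumba, so the administrative unit (adminName1) still needs to be checked against the Departement and District of the source record.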
Finally, we write the results to a CSV file.
write.csv(results, "results.csv", row.names = FALSE)
Et voilà…