Country code harmonization in data.fao.org
Most of the data disseminated through the data.fao.org portal and its APIs is country-based: agricultural land use, production, trade and consumption, water and agriculture information, and so on. This large volume of data comes from heterogeneous databases and different FAO applications. One of the goals of the data.fao.org project is to facilitate the retrieval, comparability, and exchange of these country-based datasets. In this context, adopting standard coding systems is a core requirement for data interoperability, because they define a common way to identify countries and territories. The goal is a 'common language' shared by the data provider and the data consumer.
In a nutshell, these coding systems define short alphabetic or numeric geographical codes that represent countries and dependent areas for use in data processing and communications. Several coding systems have been adopted to achieve this goal, both international ones and FAO's own.
International coding systems:
- The ISO 3166-1 standard defines three codes for each country:
  - alpha-2: a two-letter code.
  - alpha-3: a three-letter code.
  - numeric: a three-digit numeric code.
- Many other international coding systems can be used to identify countries, geographical entities, or areas, for example UNDP codes and DBpedia.
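The three ISO 3166-1 code formats listed above can be sketched as a small validated record. This is a minimal illustration only: the class name `Iso3166Codes` and its fields are hypothetical and are not part of data.fao.org. The numeric code is held as a string so that leading zeros are preserved.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Iso3166Codes:
    """Hypothetical container for the three ISO 3166-1 codes of one country."""

    alpha2: str   # two uppercase letters
    alpha3: str   # three uppercase letters
    numeric: str  # three digits, stored as text to keep leading zeros

    def __post_init__(self) -> None:
        # Check only the *format* of each code, not whether it is assigned.
        if not re.fullmatch(r"[A-Z]{2}", self.alpha2):
            raise ValueError(f"invalid alpha-2 code: {self.alpha2!r}")
        if not re.fullmatch(r"[A-Z]{3}", self.alpha3):
            raise ValueError(f"invalid alpha-3 code: {self.alpha3!r}")
        if not re.fullmatch(r"[0-9]{3}", self.numeric):
            raise ValueError(f"invalid numeric code: {self.numeric!r}")


# ISO 3166-1 codes for Bangladesh.
bangladesh = Iso3166Codes(alpha2="BD", alpha3="BGD", numeric="050")
```

Storing the numeric code as an integer would turn `050` into `50`, which no longer matches the three-digit form the standard describes.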
FAO coding systems:
- The Global Administrative Unit Layers (GAUL) project - an FAO initiative that aims to provide reliable and standardized geographic information on national and sub-national administrative units for all countries in the world. It also provides a mechanism to track boundary changes over time and to maintain a consistent coding system throughout the layers. The GAUL codes are numeric. An updated version of GAUL is currently planned.
- FAOSTAT - bases its regional classification on the M49 UN classification. FAOSTAT codes are numeric.
- AGROVOC - a controlled vocabulary (thesaurus) that identifies and codifies concepts related to food, nutrition, agriculture, fisheries, forestry, environment, and other related domains, as well as country and geographic terms. AGROVOC codes are numeric.
AGROVOC search terms page
Country identification in data.fao.org
The work of identifying countries in data.fao.org was done in collaboration with other FAO projects, such as Name of Countries (NOCS) and Country Profiles. The NOCS project maintains the official country and territory names, and it is now expanding to provide additional information and to manage country codes. The NOCS and data.fao.org teams worked jointly to update and maintain country codes in order to obtain a complete and official reference. The geopolitical ontology behind the Country Profiles project gives FAO a master reference for geopolitical information: it maps standard coding systems (UN, ISO, FAOSTAT, AGROVOC, etc.), provides relationships among territories (land borders, group membership, etc.), and tracks historical changes.
In practice, harmonizing country codes in data.fao.org starts with identifying each country under each of the different standards. To achieve a consistent mapping, we first extracted from the NOCS source the codes for all countries and territories of interest. We then inventoried the various country codes and other geographical entities from the official sources and built a master table from them. Finally, we compared the information from the NOCS source with that from the official standard sources. Where there were differences or gaps, we investigated further to identify the official codes and insert the right mappings.
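The comparison step described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure the teams actually ran: the function name `compare_code_tables`, the table layout, and the sample rows are all hypothetical, and the gap and the wrong code in the extracted data are planted on purpose to show what the check reports.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: country name -> {coding system -> code}.
CodeTable = Dict[str, Dict[str, str]]


def compare_code_tables(
    extracted: CodeTable, official: CodeTable
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str, str]]]:
    """Compare an extracted table against the official reference.

    Returns (gaps, conflicts):
      gaps      -> (country, system) pairs missing from the extracted table
      conflicts -> (country, system, extracted_code, official_code) tuples
    """
    gaps: List[Tuple[str, str]] = []
    conflicts: List[Tuple[str, str, str, str]] = []
    for country in sorted(official):
        extracted_codes = extracted.get(country, {})
        for system in sorted(official[country]):
            expected = official[country][system]
            found = extracted_codes.get(system)
            if found is None:
                gaps.append((country, system))
            elif found != expected:
                conflicts.append((country, system, found, expected))
    return gaps, conflicts


# Illustrative reference rows (real ISO 3166-1 alpha codes).
official = {
    "Bangladesh": {"ISO2": "BD", "ISO3": "BGD"},
    "Bhutan": {"ISO2": "BT", "ISO3": "BTN"},
    "Nepal": {"ISO2": "NP", "ISO3": "NPL"},
}

# Invented extract with one planted error and one planted gap.
extracted = {
    "Bangladesh": {"ISO2": "BD", "ISO3": "BGD"},
    "Bhutan": {"ISO2": "BT", "ISO3": "BHU"},  # deliberately wrong
    "Nepal": {"ISO2": "NP"},                  # ISO3 deliberately missing
}

gaps, conflicts = compare_code_tables(extracted, official)
print("gaps:", gaps)
print("conflicts:", conflicts)
```

Each reported gap or conflict is only a candidate for manual investigation; deciding which code is the official one still requires checking the standard itself.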
In the framework of data.fao.org, the focus was on the ISO3, ISO2, UN, GAUL, UNDP and FAOSTAT country codes that appear on the country profile page of the portal. For example:
Country name (en): Bangladesh
In data.fao.org, the landing page for each country displays its codes under the various standards mentioned above.
Country landing page for Bangladesh in data.fao.org
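Once a master table exists, translating a code from one system to another becomes a simple lookup. The sketch below assumes a hypothetical in-memory table and a hypothetical `translate` helper; it does not reflect the portal's actual storage or API. Only the ISO and UN columns are filled, and the UN column is assumed to hold the M49 numeric code.

```python
from typing import Dict, List, Optional

# Hypothetical master table: one row per country, one column per coding system.
MASTER_TABLE: List[Dict[str, str]] = [
    {"name_en": "Bangladesh", "ISO2": "BD", "ISO3": "BGD", "UN": "050"},
    {"name_en": "Nepal", "ISO2": "NP", "ISO3": "NPL", "UN": "524"},
]


def translate(
    code: str,
    source: str,
    target: str,
    table: List[Dict[str, str]] = MASTER_TABLE,
) -> Optional[str]:
    """Return the `target` code of the row whose `source` code equals `code`.

    Returns None when the code is unknown or the target column is empty.
    """
    for row in table:
        if row.get(source) == code:
            return row.get(target)
    return None


print(translate("BGD", "ISO3", "ISO2"))
print(translate("524", "UN", "name_en"))
```

A linear scan is enough for a table of a few hundred countries; a larger table could be indexed by a dictionary keyed on `(system, code)`.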
Mapping is crucial and requires good communication among stakeholders, because the issue cuts across a number of roles: project management, legal, data management, and IT. Country codes depend on governance, international standards, management issues, and so on.
In conclusion, this work contributes to harmonizing country codes at the FAO level, in line with international and FAO standards. It also strengthens the metadata of the data provided per geographical entity through the portal, which improves quality for the end user at two levels. First, with a common country code we speak the same language as our data providers and can integrate data from numerous sources (for instance, the World Bank, countries, and so on). Second, and reciprocally, a common language for identifying countries allows us to disseminate information through data.fao.org accurately and relevantly.
Author: Stéphanie Petit