OpenAGRIS: AGRIS' Linked Open Data Model
OpenAGRIS is an RDF-aware system; a mashup application that allows users to query AGRIS-RDF content, interlinking all records to external sources of information. It is the only published Web application in FAO that uses the Linked Open Data methodology and one of the few in the agricultural domain. In OpenAGRIS, 7.6 million bibliographic records become 7.6 million mashup pages. OpenAGRIS currently receives over 300,000 hits a month.
OpenAGRIS is based on the RDF-ization of the AGRIS database, a vast collection of bibliographic records on agricultural science and technology that serves both developed and developing countries. In other words, OpenAGRIS transforms the AGRIS bibliographical database into a Linked Open Data environment. The records are enhanced by the AGROVOC thesaurus, which is extensively used by cataloguers to enrich data indexing in agricultural information systems. This multilingual agricultural thesaurus was developed with the cooperation of FAO member countries. The first version of AGROVOC was produced in 1982 and distributed to all AGRIS centers.
In recent years the life cycle of an AGRIS record has changed enormously. More specifically:
- In the past, data was catalogued and delivered to a central database by national libraries (traditional AGRIS Centers) via floppy disks and email.
- With the arrival of the Open Access movement and the growing number of OAI-PMH (The Open Archives Initiative Protocol for Metadata Harvesting) repositories, AGRIS modified its approach and began to also index data harvested from service providers such as DOAJ (Directory of Open Access Journals), whose content comes from external publishers.
- AGRIS then decided to publish its records as linked data in RDF, but it quickly became clear that crucial metadata was missing.
- Every phase in the process generates administrative metadata. For each record, AGRIS has always registered authors, titles, dates, and the cataloging institute.
The work of converting AGRIS to RDF quickly brought out the need not only to produce administrative metadata, but also to disambiguate data wherever possible in order to enhance traceability. The eventual goal has been to completely disambiguate journals, authors and institutions.
What AGRIS has become: A semantic mashup!
The AGRIS team at FAO decided to treat AGRIS records abstractly as metadata sets that could be leveraged to automatically access and display related data. It was at that moment that the semantic mashup application OpenAGRIS was conceived. OpenAGRIS aggregates information from different Web sources using AGRIS records exposed as sets of triples in a Linked Open Data environment.
In the last three years, AGRIS has focused attention on the metadata offered to end users and has used several methods to create and enrich these metadata.
- AGRIS consumes metadata provided by the community and publishes it as open data. The metadata is captured by harvesting (pulling) it from clients such as aggregators and institutional repositories, using protocols such as OAI-PMH.
- Metadata is also received when clients (e.g. national libraries or journal publishers) push data to AGRIS.
- Random samples of the metadata are checked manually for inconsistencies or recurring semantic errors.
- Input format is mapped to AGRIS RDF.
- Metadata is converted to AGRIS RDF, running the AgroTagger when AGROVOC keywords are not available. In fact, AGROVOC is the backbone of the mashup application: thanks to its alignments to other thesauri and to the semantic meaning it gives to a resource, AGROVOC is used to interlink with external datasets, discovering information semantically related to the current resource with a high degree of accuracy.
- Before adding metadata to the triplestore and indexing them in the Solr index, duplicates are detected and managed, as the same record may be indexed in multiple collections or be duplicated in the same repository.
Fig. 1 - Diagram showing the long flow of an AGRIS artifact, from genesis to dissemination.
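The duplicate-detection step in the workflow above can be illustrated with a small sketch. The article does not say how AGRIS actually detects duplicates, so the heuristic here (matching on a normalised title plus the year) is purely an assumption, and the field names and sample records are invented for illustration.

```python
import re
import unicodedata
from typing import Dict, Iterable, List, Tuple


def dedup_key(record: Dict) -> Tuple:
    """Build a comparison key from the normalised title and the year.

    Assumption: two records with the same normalised title and year are
    the same publication. A real pipeline may use richer criteria.
    """
    title = unicodedata.normalize("NFKD", record.get("title", ""))
    # Drop accents so that "café" and "cafe" compare equal.
    title = "".join(ch for ch in title if not unicodedata.combining(ch))
    # Lowercase and collapse punctuation and whitespace to single spaces.
    title = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return (title, record.get("year"))


def drop_duplicates(records: Iterable[Dict]) -> List[Dict]:
    """Keep the first record seen for each key and skip later matches."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


# Invented sample: the same record arriving by harvesting and by push.
sample = [
    {"title": "Wheat yields in dry areas", "year": 2008, "via": "harvest"},
    {"title": "Wheat Yields in Dry Areas.", "year": 2008, "via": "push"},
    {"title": "Wheat yields in dry areas", "year": 2009, "via": "harvest"},
]
print(len(drop_duplicates(sample)))
```

Keeping the first occurrence is only one possible policy; merging the metadata of matching records would be another way to "manage" duplicates.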
How is each AGRIS record identified?
Each record has an identifier (ARN) that has a predefined structure and contains information on the data source together with the bibliographic record’s year of creation. For example, “IT 2008 0 00091” refers to a record created in 2008 from a specific AGRIS data provider in Italy, whose progressive number is 91. Data providers’ information is stored in the CIARD RING and triplified in the AGRIS centres dataset (each data provider has its own unique URI).
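As a minimal sketch, an ARN like the one above could be split into its parts as follows. The layout (a two-letter source code, a four-digit year, one further digit, and a five-digit progressive number) is inferred only from the single example in the text, so it is an assumption rather than the official ARN specification. The article does not explain the single digit between the year and the number, so the sketch keeps it as an uninterpreted field. The names `Arn` and `parse_arn` are hypothetical.

```python
import re
from typing import NamedTuple


class Arn(NamedTuple):
    source: str   # data source code, e.g. "IT"
    year: int     # year the bibliographic record was created
    flag: str     # single digit whose meaning the article does not state
    number: int   # progressive number within the source and year


# Assumed layout, inferred from the example "IT 2008 0 00091".
ARN_PATTERN = re.compile(r"^([A-Z]{2})\s*(\d{4})\s*(\d)\s*(\d{5})$")


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its parts, with or without separating spaces."""
    match = ARN_PATTERN.match(arn.strip())
    if match is None:
        raise ValueError(f"not a recognisable ARN: {arn!r}")
    source, year, flag, number = match.groups()
    return Arn(source, int(year), flag, int(number))


print(parse_arn("IT 2008 0 00091"))
```

For the example from the text, the parsed result has source "IT", year 2008 and progressive number 91.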
OpenAGRIS - In practice
Let’s imagine a PhD student searching for the latest figures on poppy crops in Afghanistan. A simple Google search on the keywords “poppy, harvest, Afghanistan” produces results including several newspaper articles, a Wikipedia entry and a Yahoo chat: everything but the specific data the student needs. Performing the same keyword search in the OpenAGRIS database produces a significantly different result: no fewer than 51,824 relevant entries. Just looking at the first two, the PhD student finds exactly what they need: an Afghanistan Opium Survey from the UN Office on Drugs and Crime (UNODC) and a record from the International Center for Agricultural Research in the Dry Areas (ICARDA) on the Future Harvest Consortium to Rebuild Agriculture in Afghanistan. A day’s worth of work is made easy by the OpenAGRIS database.
Fig. 2 - An example OpenAGRIS search result
The future alongside data.fao.org
So what is the link between OpenAGRIS and data.fao.org? AGROVOC is essentially the glue; both OpenAGRIS and data.fao.org resources have been indexed with AGROVOC. As a result OpenAGRIS can and will consume data.fao.org data in the near future and be able to display them on the mashup pages.
The idea is that OpenAGRIS can dynamically access data.fao.org widgets (maps, statistics, charts and pictures) using AGROVOC to extract significant content or, even better, to extract content that is semantically related to OpenAGRIS articles.
Future developments to data.fao.org APIs will allow searches made in OpenAGRIS to point to data.fao.org’s statistical data and maps. Another possibility is that data.fao.org will be able to access AGRIS data by querying the SPARQL endpoint and filtering with AGROVOC to give semantic meaning to queries.
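To make the idea of filtering with AGROVOC more concrete, here is a hedged sketch of how a client might build such a SPARQL query. It assumes that AGRIS records link to AGROVOC concepts through `dcterms:subject` and carry a `dcterms:title`; the article does not describe the actual AGRIS RDF modelling, so both predicates are assumptions. The function name `build_agrovoc_query` is hypothetical, and the concept URI in the example is a placeholder rather than a real concept identifier. The sketch only builds the query text and does not contact any endpoint.

```python
def build_agrovoc_query(concept_uri: str, limit: int = 10) -> str:
    """Return a SPARQL query for records indexed with one AGROVOC concept.

    Assumption: records use dcterms:subject to point at AGROVOC concept
    URIs and dcterms:title for their titles.
    """
    forbidden = '<>" {}|\\^`'
    if not concept_uri.startswith(("http://", "https://")) or any(
        ch in concept_uri for ch in forbidden
    ):
        # Reject values that could break out of the <...> IRI syntax.
        raise ValueError(f"not a safe IRI: {concept_uri!r}")
    return f"""PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?record ?title WHERE {{
  ?record dcterms:subject <{concept_uri}> ;
          dcterms:title ?title .
}}
LIMIT {int(limit)}"""


# Placeholder concept URI, for illustration only.
print(build_agrovoc_query("http://aims.fao.org/aos/agrovoc/c_0000", limit=5))
```

The resulting string could then be sent to the endpoint with any HTTP or SPARQL client. Because both systems share AGROVOC concept URIs, the same URI could in principle be used on the data.fao.org side to select related widgets.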
Author: Christabel Clark