Data citations in a virtual world
Citations play an important role in the accuracy and transparency of data reuse. Whenever data is reproduced, it is vital that the author provide access to the original source so that readers can verify its accuracy. Statements and facts that cannot be verified carry less weight. In addition, authors have to pass peer review, and for their research to be taken seriously, the data that they quote from databases has to be reproducible.
A researcher needs to be able to cite FAO statistical databases. Beyond the reasons given above, citing data found in FAO databases also protects intellectual property rights and provides the means to track how much and where the data is being used.
Therefore, it is important that data is reproducible and retrievable, not just in the short term but in the future as well.
Citation challenges
The traditional methods of creating citations, based on titles, page numbers, and so on, work well for books and journals but are of limited use in today’s virtual information world. Take, for example, a printed document that contains citations. No matter how long ago the document or the citations were created (whether 2 days or 200 years ago), the citations remain constant and valid.
On the other hand, citations of virtual data are difficult to reproduce and maintain. URLs and website titles change constantly for many different reasons, especially when it comes to queries. As a result, a URL provided in a citation often brings the user to a page that no longer exists.
It is clear that new methods are needed for creating citations of virtual resources. This is especially true for results from online statistical databases, which can be produced by highly complex queries. How does someone cite the results of a query in a way that is useful to both humans and machines, not only today but also in the future? The challenge is to create a standardized form of citation that remains retrievable over time while being easy for the author to create.
The data.fao.org solution
To address these issues, data.fao.org is implementing a solution that will make the data retrievable in a consistent and reliable way, now and in the future. The data.fao.org citation solution includes the following aspects:
Reproducing queries presents another challenge. Queries can be very long and complex, so putting them into words isn’t efficient. The solution, therefore, is not to include all query parameters as human-readable words in the citation. Instead, the citation should use basic words describing the query, such as those used to cite a table, without including all the specific parameters that were chosen. More importantly, the URI will include the query and be able to reproduce it.
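The idea of a short, readable citation paired with a URI that carries the full query can be sketched as follows. This is only an illustration under stated assumptions, not the data.fao.org implementation: the base address, the parameter names, the sample values, and the citation layout are all hypothetical, since the text does not describe the actual URI scheme.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# Hypothetical base address; the real data.fao.org URI scheme is not given in the text.
BASE_URI = "http://data.fao.org/query"


def build_query_uri(params: dict) -> str:
    """Encode the query parameters into a URI.

    Parameters are sorted by name so that the same query always
    produces the same URI, whatever order the user chose them in.
    """
    return f"{BASE_URI}?{urlencode(sorted(params.items()))}"


def parse_query_uri(uri: str) -> dict:
    """Recover the query parameters from a URI built by build_query_uri."""
    return dict(parse_qsl(urlsplit(uri).query))


def format_citation(title: str, year: int, accessed: str, uri: str) -> str:
    """Build a short citation: the words describe the table, the URI carries the query."""
    return f"FAO. {year}. {title}. data.fao.org. Accessed {accessed}. {uri}"


# Hypothetical query parameters and values, for illustration only.
params = {
    "dataset": "crop-production",
    "item": "wheat",
    "area": "Kenya",
    "years": "2000-2010",
}

uri = build_query_uri(params)
citation = format_citation("Wheat production, Kenya", 2012, "2012-05-01", uri)
print(citation)
```

In this sketch the human-readable part stays brief, like the citation of a table, while a machine can pass the URI to parse_query_uri to recover every parameter and rerun the same query.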
Once the solution is implemented, data.fao.org will be easier and more convenient to use. Additional benefits include allowing FAO to track the impact of its statistical data, understand where and how it is being used, and improve how it is acknowledged. The solution that we are developing could be applied to other types of databases as well, and may even be of use to organizations outside FAO.

