Integrating Economists Online and ODaP

Economists Online and ODaP have similar goals. Although the scope of Economists Online is broader than that of ODaP, both aim to link Open Access publications to the corresponding data. Economists Online does this for a specific discipline, economics, with an international scope. ODaP is limited to a specific university, Tilburg University, but covers more than economics. Economists Online and ODaP each have their own dataverse. The dataverse for Economists Online is labelled NEEO (Network of European Economists Online), and the dataverse for ODaP is labelled TU (Tilburg University). Within the NEEO dataverse, a collection is defined for each university that contributes to Economists Online. The collections of the TU dataverse correspond to the Schools of Tilburg University.

There is an overlap between the NEEO and the TU dataverses. We decided that the NEEO collection for Tilburg University and the TU collection for the Tilburg School of Economics and Management (TISEM) will have the same content. It is not efficient to describe the same datasets twice. To avoid this, we have defined the TISEM collection to be dynamic: its content is determined by a search in the Dataverse Network (DVN). The DVN system has three types of collections: static, dynamic and linked. http://thedata.org/book/manage-collections defines the three types as follows:

  • Static collection – You assign specific studies to this type of collection.
  • Dynamic collection – You can create a query that gathers studies into a collection based on matching criteria, and keep the contents current. If a study matches the query selection criteria one week, then is changed and no longer matches the criteria, that study is only a member of the collection as long as its criteria match the query.
  • Linked collection – You can link an existing collection from another dataverse to your dataverse homepage. Note that the contents of that collection can be edited only in the originating dataverse.

At the moment we have defined the NEEO collection for Tilburg University as a static collection and the TU collection for TISEM as a dynamic collection. Originally the query that defined the TISEM collection was:

relatedPublications:"wo.uvt.nl"

The reason for this was that the permalinks to publications of Tilburg University contain the string wo.uvt.nl. Searching for this string in the element relatedPublications would find all the studies in the NEEO collection for Tilburg University. For example, the study with handle hdl:1902.1/12892 has two permalinks in its relatedPublications element.

By adding permalinks to its description, a dataset can be linked to the search systems in which the related publication can be found. In this example, both permalinks use the same OAI identifier oai:wo.uvt.nl:3125512, because they point to the same repository record, which is included in two different search systems. The string wo.uvt.nl is unique to the OAI identifiers of the Tilburg University Repository. However, this query is too broad for collecting the TISEM records: it finds not only the Tilburg University records in the NEEO dataverse, but also the records in the other collections of the TU dataverse, and even records in other dataverses.

We had to redefine the query. The query must match something that is unique to the Tilburg records in the NEEO dataverse. The string wo-tilburguniversity-nl is the first part of the identifiers of the Tilburg records in Economists Online. The identifiers of the repository records in the local search system, by contrast, start with ir-uvt-nl. The new query defining the TISEM collection has become:

relatedPublications:"wo-tilburguniversity-nl"
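
The difference between the two queries can be illustrated with a small Python sketch. It is only a model of the idea, not of how DVN evaluates a query: matching is simplified to a substring test on the relatedPublications values, and the handles, the example.org permalinks and the Study class are hypothetical names invented for the illustration. Only the strings wo.uvt.nl, wo-tilburguniversity-nl and ir-uvt-nl come from the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Study:
    handle: str       # hypothetical handle, for illustration only
    dataverse: str    # "NEEO" or "TU"
    related_publications: List[str] = field(default_factory=list)


def matches(study, term):
    """True if any relatedPublications value contains the term (simplified)."""
    return any(term in link for link in study.related_publications)


def select(studies, term):
    """Handles of the studies a dynamic collection with this term would gather."""
    return [s.handle for s in studies if matches(s, term)]


# Hypothetical permalinks: one in Economists Online, two in the local search system.
EO_LINK = "http://example.org/eo?id=wo-tilburguniversity-nl:oai:wo.uvt.nl:0000001"
LOCAL_LINK_1 = "http://example.org/getit?id=ir-uvt-nl:oai:wo.uvt.nl:0000001"
LOCAL_LINK_2 = "http://example.org/getit?id=ir-uvt-nl:oai:wo.uvt.nl:0000002"

STUDIES = [
    # An economic study in NEEO, linked to both search systems.
    Study("hdl:example/neeo-tilburg", "NEEO", [EO_LINK, LOCAL_LINK_1]),
    # A study of another School in TU, linked to the local search system only.
    Study("hdl:example/tu-other-school", "TU", [LOCAL_LINK_2]),
    # A study of another university in NEEO.
    Study("hdl:example/neeo-elsewhere", "NEEO",
          ["http://example.org/eo?id=some-other-repository:0000003"]),
]

OLD_RESULT = select(STUDIES, "wo.uvt.nl")                # too broad
NEW_RESULT = select(STUDIES, "wo-tilburguniversity-nl")  # Tilburg records in NEEO only

print(OLD_RESULT)
print(NEW_RESULT)
```

In this model the old term also picks up the study of the other School, because every Tilburg repository identifier contains wo.uvt.nl, while the new term only matches studies that carry an Economists Online permalink for a Tilburg record.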

We considered linking to the TISEM collection from the NEEO dataverse. The TISEM collection of the TU dataverse would then become the static source collection for the linked Tilburg University collection in NEEO. However, linked collections are not integrated into the collections tree of a dataverse the way static and dynamic collections are. Linked collections sit at the same level as the root collection of a dataverse. This makes linked collections less attractive for our purposes.

In the present situation the economic datasets are handled in the NEEO dataverse and the other Tilburg datasets are handled in the TU dataverse. This is confusing for information specialists, who have to switch to another dataverse for an economic dataset (and back again for a non-economic dataset). So it would be more convenient to make the TU dataverse the origin of all Tilburg datasets. In this proposal the Tilburg collection in the NEEO dataverse becomes a dynamic collection that is filled (dynamically) with studies from the TISEM collection in the TU dataverse. The query of the dynamic collection in the NEEO dataverse would be: relatedPublications:"wo-tilburguniversity-nl"

There is another issue: the same publication can have more than one record in a search system, because the publication is described in several of the sources that the search system uses. This will be the topic of a future blog post.

How datasets and publications are linked in ODaP

In ODaP the publications in the institutional repository (IR) and the datasets in DVN are linked in such a way that a user searching the IR can follow links to the datasets in DVN and, vice versa, a user accessing the Tilburg University dataverse in DVN can follow links to the repository records describing the related publications. So the linking in ODaP is symmetrical: if A links to B, then B links to A. This is implemented in such a way that the links are maintained in only one system. The system that is the source of the links is consulted regularly in order to add the reverse links to the other system.

The source system for the links is DVN. In the description of a dataset the permalink of the related publication is added. A permalink refers to a page of the Tilburg University search system Get It!. Such a permalink page functions as a splash page or a jump-off page of the publication in the repository. In this way studies in DVN link to the Open Access version of the related publications.

DVN uses the DDI standard (version 2) as the metadata format for the description of the datasets. We store the permalinks of the related publications in the DDI element /codebook/stdyDscr/othrStdyMat/relPubl (Related Publications). The DDI records of DVN can be harvested using the OAI-PMH protocol. The ODaP harvester that harvests DVN sends the DDI records to the Enrichment Server using the SRU Record Update protocol. The Enrichment Server uses the permalinks stored in the DDI records to determine which records of the Tilburg University search system Get It! have to be enriched with the DDI. The records in Get It! come from different sources, one of them being the Tilburg University Repository, which is based on the ARNO system. The ARNO system has no end-user interface of its own; iPort and Get It! are used for this. Getting the DIDL/MODS records supplied by ARNO into Get It! is done by a harvester, as depicted in the following diagram.

Note that the harvesting of the DIDL/MODS from the repository comes first and the harvesting of the DDI from DVN comes next. In this way the DIDL/MODS, as a representation of a publication, is enriched with the DDI, as a representation of a dataset, and not the other way around. The Enrichment Server can also be used to enrich a search engine record with other information coming from an external source. The enriched whole can also be represented as an OAI-ORE Resource Map.
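
To make the enrichment step more concrete, here is a minimal Python sketch of the idea, under several assumptions. It does not implement OAI-PMH or SRU Record Update, and it is not the code of the Enrichment Server. The search index is modelled as a plain dictionary keyed by permalink, the sample DDI is a stripped-down fragment written for this illustration, each relPubl element is assumed to hold just the permalink, and the handle and example.org permalink are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{http://...}relPubl' down to 'relPubl'."""
    return tag.rsplit("}", 1)[-1]


def related_publications(ddi_xml):
    """Return the text of every stdyDscr/othrStdyMat/relPubl element in a DDI record."""
    root = ET.fromstring(ddi_xml)  # the root element name itself is not checked
    links = []
    for stdy in root:
        if local_name(stdy.tag) != "stdyDscr":
            continue
        for other in stdy:
            if local_name(other.tag) != "othrStdyMat":
                continue
            for rel in other:
                if local_name(rel.tag) == "relPubl":
                    text = "".join(rel.itertext()).strip()
                    if text:
                        links.append(text)
    return links


def enrich(index, handle, ddi_xml):
    """Attach the DDI to each publication record its permalinks point to.

    Returns the permalinks for which no publication record was found.
    """
    unmatched = []
    for permalink in related_publications(ddi_xml):
        record = index.get(permalink)
        if record is None:
            unmatched.append(permalink)
        else:
            record.setdefault("datasets", []).append({"handle": handle, "ddi": ddi_xml})
    return unmatched


# Hypothetical permalink and DDI fragment, for illustration only.
PERMALINK = "http://example.org/getit?id=ir-uvt-nl:oai:wo.uvt.nl:0000001"
SAMPLE_DDI = (
    "<codeBook><stdyDscr><othrStdyMat>"
    f"<relPubl>{PERMALINK}</relPubl>"
    "</othrStdyMat></stdyDscr></codeBook>"
)

# Step 1: the publication record is already in the index (harvested from the repository).
index = {PERMALINK: {"title": "Hypothetical publication"}}
# Step 2: the DDI harvested from DVN enriches that record.
unmatched = enrich(index, "hdl:example/1", SAMPLE_DDI)

# If the order is reversed, there is nothing to enrich yet.
unmatched_when_empty = enrich({}, "hdl:example/1", SAMPLE_DDI)
```

The last call shows why the order of harvesting matters in this model: a DDI record that arrives before its publication record finds nothing to attach to, and its permalink ends up in the unmatched list.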

This way of enriching bibliographical records is also implemented for Economists Online and for the European Values Study portal. ODaP is most similar to Economists Online. Because the ODaP implementation is still experimental, I will give an example from Economists Online.

This permalink http://www.economistsonline.org/publications?id=eprints-lse-ac-uk:oai:eprints.lse.ac.uk:3607 contains a link to a dataset in DVN. The DVN dataset descriptions have a handle as a unique identifier. In this case the handle is hdl:1902.1/12930, which is resolved by http://dvn.iq.harvard.edu/dvn/study?globalId=hdl:1902.1/12930. When we follow the latter link, DVN presents a record whose Related Publications field contains the permalink of the publication.
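
The relation between a handle and the DVN link can be sketched in a few lines of Python. The URL pattern is taken from the example above; the function names are my own, and the sketch assumes that all studies in this DVN installation are reachable through the same pattern.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL as it appears in the example above.
DVN_STUDY_URL = "http://dvn.iq.harvard.edu/dvn/study"


def study_url(handle):
    """Build the DVN link for a handle such as 'hdl:1902.1/12930'."""
    # Keep ':' and '/' readable instead of percent-encoding them.
    return DVN_STUDY_URL + "?" + urlencode({"globalId": handle}, safe=":/")


def handle_from_url(url):
    """Extract the handle from a DVN study link, or return None if it has none."""
    values = parse_qs(urlsplit(url).query).get("globalId")
    return values[0] if values else None


print(study_url("hdl:1902.1/12930"))
print(handle_from_url("http://dvn.iq.harvard.edu/dvn/study?globalId=hdl:1902.1/12930"))
```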

ODaP and Dataverse Network

In ODaP a dataset belongs to a study that is defined by a particular publication. The data used in the study that resulted in the publication are what we call a dataset. These data can be part of a larger dataset or database. In many cases the data as used in a study are the result of processing existing data. The datasets as used for publications are stored and described in the Dataverse Network of the Institute for Quantitative Social Science at Harvard University. We use two dataverses. One is the existing dataverse for Economists Online, which was set up in the European project NEEO, and the other is a new dataverse for Tilburg University. The dataverse for Tilburg University is made up of collections that correspond to the Schools of the university. The collection of the Tilburg School of Economics and Management (TISEM) is a so-called dynamic collection, while the collections of the other Schools are static. A dynamic collection is populated by studies from other (static) collections. The TISEM collection is defined to be the same collection as the static collection of Tilburg University in the dataverse for Economists Online. In this way our economic datasets can live in two dataverses. Studies that live in more than one dataverse are managed in the dataverse that houses the static collection. In our case the economic studies are described in the dataverse for Economists Online and the studies (datasets) of the other Schools are described in the dataverse for our university.

We held several one-hour sessions to make our information specialists Corry, Ingrid and Trijnie familiar with the DVN system. We try to follow the Guidelines developed for the NEEO project. It turned out, however, that it is better for ODaP to have its own Guidelines. These Guidelines are under review. We will make them available when they are finished.

ODaP and the acquisition of datasets

Acquiring datasets is a lot of work. It means a lot of talking and explaining. We had meetings with the management of the Schools. Mails were sent to leaders of research groups and to individual researchers. We have a (growing) list of 62 names that we want to contact. Where possible, we contact researchers with whom the library has collaborated before, but old contacts can lead to new ones and so expand our network. We have already contacted two thirds of the names on the list; in most cases we send a mail first and then make an appointment. So far we have collected 6 datasets, but we are just starting. In many cases researchers are sympathetic to the idea, but first want to organise (and archive) their data better. This means that ideally we collaborate with the researchers in the different phases of their research. If successful, one contact can be enough to connect datasets to several publications. We located more than 60 publications with Tilburg authors that are based on one (longitudinal) dataset. Hopefully we can use these publications in our project. Personal meetings are in most cases necessary to explain the project and convince researchers.