Integrating Economists Online and ODaP

Economists Online and ODaP have similar goals. Although the scope of Economists Online is broader than that of ODaP, both aim at linking Open Access publications and the corresponding data. Economists Online does this for a specific discipline, economics, and its scope is international. ODaP is limited to a specific university, Tilburg University, but covers more than economics. Economists Online and ODaP each have their own dataverse. The label of the dataverse for Economists Online is NEEO (Network of European Economists Online), and the label of the dataverse of ODaP is TU (Tilburg University). Within the NEEO dataverse, a collection is defined for each university that contributes to Economists Online. The collections of the TU dataverse correspond to the Schools of Tilburg University.

There is an overlap between the NEEO and the TU dataverses. We decided that the NEEO collection for Tilburg University and the TU collection for the Tilburg School of Economics and Management (TISEM) will have the same content. It is not efficient to describe the same datasets twice. To avoid this, we have defined the TISEM collection to be dynamic: its content is determined dynamically by a search in DVN. The Dataverse Network system has three types of collections: static, dynamic and linked. http://thedata.org/book/manage-collections defines the three types as follows:

  • Static collection – You assign specific studies to this type of collection.
  • Dynamic collection – You can create a query that gathers studies into a collection based on matching criteria, and keep the contents current. If a study matches the query selection criteria one week, then is changed and no longer matches the criteria, that study is only a member of the collection as long as it’s criteria matches the query.
  • Linked collection – You can link an existing collection from another dataverse to your dataverse homepage. Note that the contents of that collection can be edited only in the originating dataverse.
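The difference between a static and a dynamic collection can be sketched in a few lines of Python. This is only an illustration of the membership rules quoted above, not DVN code: the class names, the `Study` record and the keyword-based query are all invented for the example, and linked collections are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Study:
    """Hypothetical stand-in for a DVN study description."""
    study_id: str
    keywords: Set[str] = field(default_factory=set)


@dataclass
class StaticCollection:
    """Membership is an explicit set of assigned study ids."""
    study_ids: Set[str] = field(default_factory=set)

    def members(self, studies: Dict[str, Study]) -> List[str]:
        return sorted(sid for sid in self.study_ids if sid in studies)


@dataclass
class DynamicCollection:
    """Membership is recomputed from a query every time it is requested."""
    query: Callable[[Study], bool]

    def members(self, studies: Dict[str, Study]) -> List[str]:
        return sorted(sid for sid, study in studies.items() if self.query(study))


studies = {
    "s1": Study("s1", {"economics"}),
    "s2": Study("s2", {"law"}),
}
static = StaticCollection({"s1"})
dynamic = DynamicCollection(lambda study: "economics" in study.keywords)

before = dynamic.members(studies)
# The study is edited and no longer matches the query.
studies["s1"].keywords = {"history"}
after = dynamic.members(studies)
```

After the edit, the study drops out of the dynamic collection but remains in the static one, because the static collection never looks at the study's content.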

At the moment we have defined the NEEO collection for Tilburg University as a static collection and the TU collection for TISEM as a dynamic collection. Originally the query that defined the TISEM collection was:

relatedPublications:"wo.uvt.nl"

The reason for this was that the permalinks to publications of Tilburg University contain the string wo.uvt.nl. Searching for this string in the element relatedPublications would find all the studies in the NEEO collection for Tilburg University. For example, the study with handle hdl:1902.1/12892 has two permalinks in the relatedPublications element:

By adding permalinks to its description it is possible to link a dataset to the search systems in which the related publication can be found. In this example, both permalinks use the same oai-identifier oai:wo.uvt.nl:3125512 because the link is to the same repository record that is included in two different search systems. The string wo.uvt.nl is unique to the oai-identifiers of the Tilburg University Repository. However, this query is too broad for collecting the records of TISEM: it finds not only the Tilburg University records in the NEEO dataverse, but also the records in the other collections in the TU dataverse and even records in other dataverses.

We had to redefine the query. The query must match something that is unique to the Tilburg records in the NEEO dataverse. The string wo-tilburguniversity-nl is the first part of the identifiers of the Tilburg records in Economists Online. The identifiers of the repository records in the local search system start with ir-uvt-nl. The new query defining the TISEM collection has become:

relatedPublications:"wo-tilburguniversity-nl"
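Why the first query is too broad and the second one is not can be shown with a small Python sketch. The study records, the host search.example.org and the record numbers are made up; the permalinks only imitate the identifier patterns described above. Plain substring matching stands in for the DVN search, which is an assumption and may not behave the same as the real search engine.

```python
from typing import Dict, List

# Made-up study records. The permalinks imitate the identifier patterns
# described in the text; they are not copied from real studies.
STUDIES: List[Dict] = [
    {
        "id": "neeo-tilburg-study",
        "relatedPublications": [
            "http://www.economistsonline.org/publications"
            "?id=wo-tilburguniversity-nl:oai:wo.uvt.nl:0000001",
            "http://search.example.org/record?id=ir-uvt-nl:oai:wo.uvt.nl:0000001",
        ],
    },
    {
        "id": "tu-non-economic-study",
        "relatedPublications": [
            "http://search.example.org/record?id=ir-uvt-nl:oai:wo.uvt.nl:0000002",
        ],
    },
    {
        "id": "neeo-other-university-study",
        "relatedPublications": [
            "http://www.economistsonline.org/publications"
            "?id=eprints-lse-ac-uk:oai:eprints.lse.ac.uk:3607",
        ],
    },
]


def search_related_publications(studies: List[Dict], phrase: str) -> List[str]:
    """Return the ids of studies with a related publication containing the phrase."""
    return [
        study["id"]
        for study in studies
        if any(phrase in link for link in study["relatedPublications"])
    ]


broad = search_related_publications(STUDIES, "wo.uvt.nl")
narrow = search_related_publications(STUDIES, "wo-tilburguniversity-nl")
```

In this toy data the broad query also picks up the non-economic Tilburg study, because its local permalink contains the repository's oai-identifier, while the narrow query only matches the study that has an Economists Online permalink for a Tilburg record.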

We have considered linking to the TISEM collection from the NEEO dataverse. The TISEM collection of the TU dataverse would then become the static source collection for the linked Tilburg University collection in NEEO. However, linked collections are not integrated into the collections tree of a dataverse as static and dynamic collections are. Linked collections are at the same level as the root collection of a dataverse. This makes linked collections less attractive for our purposes.

In the present situation the economic datasets are handled in the NEEO dataverse and the other Tilburg datasets are handled in the TU dataverse. This is confusing for information specialists, who have to switch to another dataverse for an economic dataset (and back for a non-economic dataset). So it would be more convenient to make the TU dataverse the origin of all Tilburg datasets. In this proposal the Tilburg collection in the NEEO dataverse becomes a dynamic collection that is filled (dynamically) with studies from the TISEM collection in the TU dataverse. The query of the dynamic collection in the NEEO dataverse would be: relatedPublications:"wo-tilburguniversity-nl"

There is another issue: the same publication can have more than one record in a search system, because the publication is described in several of the sources that the search system uses. This will be the topic of a next blog.

How datasets and publications are linked in ODaP

In ODaP the publications in the institutional repository (IR) and the datasets in DVN are linked in such a way that a user searching the IR can follow links to the datasets in DVN and, vice versa, a user accessing the Tilburg University dataverse in DVN can follow links to the repository records describing the related publications. So the linking in ODaP is symmetrical: if A links to B then B links to A. This is implemented in such a way that the links are maintained in only one system. The system that is the source of the links is regularly consulted in order to add the reverse links to the other system.
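The idea of maintaining links in one place and deriving the reverse links can be sketched as inverting a mapping. This is a minimal illustration in Python, not the ODaP implementation; the function name and the dataset and publication identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reverse_links(dataset_to_publications: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert dataset -> publication links into publication -> dataset links."""
    reverse = defaultdict(list)
    for dataset, publications in dataset_to_publications.items():
        for publication in publications:
            # Guard against the same link being listed twice in the source.
            if dataset not in reverse[publication]:
                reverse[publication].append(dataset)
    return dict(reverse)


# Links as maintained in the source system (hypothetical identifiers).
source_links = {
    "dataset-A": ["publication-1"],
    "dataset-B": ["publication-1", "publication-2"],
}
derived = reverse_links(source_links)
```

Because the reverse links are always recomputed from the source, a link that is removed there disappears from the other system at the next run, and the two sides cannot drift apart.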

The source system for the links is DVN. In the description of a dataset the permalink of the related publication is added. A permalink refers to a page of the Tilburg University search system Get It!. Such a permalink page functions as a splash page or a jump-off page of the publication in the repository. In this way studies in DVN link to the Open Access version of the related publications.

DVN uses the DDI standard (version 2) as metadata format for the description of the datasets. We store the permalinks of the related publications in the DDI element /codebook/stdyDscr/othrStdyMat/relPubl (Related Publications). The DDI records of DVN can be harvested using the OAI-PMH protocol. The ODaP harvester that harvests DVN sends the DDI records to the Enrichment Server using the SRU Record Update protocol. The Enrichment Server uses the permalinks stored in the DDI records to determine which records of the Tilburg University search system Get It! have to be enriched with the DDI. The records in Get It! come from different sources, one of them being the Tilburg University Repository, which is based on the ARNO system. The ARNO system has no end-user interface of its own; iPort and Get It! are used for this. Getting the DIDL/MODS records supplied by ARNO into Get It! is done by a harvester, as depicted in the following diagram.

Note that the harvesting of the DIDL/MODS from the repository is first and the harvesting of the DDI from DVN comes next. In this way the DIDL/MODS as a representation of a publication is enriched with the DDI as a representation of a dataset and not the other way around. The Enrichment Server can also be used to enrich a search engine record with other information that is coming from an external source. This enriched whole can also be represented as an OAI ORE Resource Map.
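The order of the two steps can be illustrated with a small Python sketch. It is not the Enrichment Server itself: the DDI fragment is hand-written and heavily simplified, the permalinks and the `enrich` function are hypothetical, and the search records are reduced to plain dictionaries keyed by permalink.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hand-written, simplified DDI-like fragment. A real DVN record is much
# larger and uses an XML namespace, so tags are compared by local name.
DDI_SAMPLE = """
<codeBook>
  <stdyDscr>
    <othrStdyMat>
      <relPubl>http://search.example.org/record?id=pub-1</relPubl>
      <relPubl>http://search.example.org/record?id=pub-2</relPubl>
    </othrStdyMat>
  </stdyDscr>
</codeBook>
"""


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def related_publications(ddi_xml: str) -> List[str]:
    """Collect the text of all relPubl elements in a DDI record."""
    root = ET.fromstring(ddi_xml)
    return [
        element.text.strip()
        for element in root.iter()
        if local_name(element.tag) == "relPubl" and element.text and element.text.strip()
    ]


def enrich(search_records: Dict[str, dict], ddi_xml: str) -> List[str]:
    """Attach the DDI to every already-harvested publication it links to."""
    enriched = []
    for permalink in related_publications(ddi_xml):
        record = search_records.get(permalink)
        if record is None:
            continue  # publication not harvested: nothing to enrich
        record.setdefault("datasets", []).append(ddi_xml)
        enriched.append(permalink)
    return enriched


# Step 1: publication records are harvested first (keyed by permalink here).
search_records = {
    "http://search.example.org/record?id=pub-1": {"title": "Example publication"},
}
# Step 2: the DDI record arrives and is attached to the matching publication.
enriched = enrich(search_records, DDI_SAMPLE)
```

The publication record is the container and the DDI is added to it, which mirrors the direction of enrichment described above. A permalink that points to a publication that has not been harvested is simply skipped in this sketch.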

This way of enriching bibliographical records is also implemented for Economists Online and for the European Values Study portal. ODaP is most similar to Economists Online. Because the ODaP implementation is still experimental, I will give an example of Economists Online.

This permalink http://www.economistsonline.org/publications?id=eprints-lse-ac-uk:oai:eprints.lse.ac.uk:3607 contains a link to a dataset in DVN. The DVN dataset descriptions have a handle as a unique identifier. In this case the handle is hdl:1902.1/12930, which is resolved by http://dvn.iq.harvard.edu/dvn/study?globalId=hdl:1902.1/12930. When we follow the latter link, DVN presents a record that has the permalink of the publication in its Related Publications field.
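The two URLs in this example have a regular shape, which a short Python sketch can make explicit. The function names are my own, and the reading of the id parameter as a source prefix followed by an oai-identifier is an assumption based on the example, not a documented rule.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlparse


def split_permalink_id(permalink: str) -> Tuple[str, str]:
    """Split the id parameter into (source prefix, oai-identifier).

    Assumes the id has the form '<prefix>:<oai-identifier>'.
    """
    query = parse_qs(urlparse(permalink).query)
    record_id = query["id"][0]
    prefix, oai_identifier = record_id.split(":", 1)
    return prefix, oai_identifier


def dvn_study_url(handle: str) -> str:
    """Build the DVN study URL for a handle, following the example above."""
    return "http://dvn.iq.harvard.edu/dvn/study?globalId=" + handle


permalink = (
    "http://www.economistsonline.org/publications"
    "?id=eprints-lse-ac-uk:oai:eprints.lse.ac.uk:3607"
)
prefix, oai_identifier = split_permalink_id(permalink)
study_url = dvn_study_url("hdl:1902.1/12930")
```

The handle is appended without URL-encoding, so that the result matches the form of the link given in the text.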

ODaP and Dataverse Network

In ODaP a dataset belongs to a study that is defined by a particular publication. The data used in the study that resulted in the publication are what we call a dataset. These data can be part of a larger dataset or database. In many cases the data as used in a study are the result of processing existing data. The dataset as used for a publication is stored and described in the Dataverse Network of the Institute for Quantitative Social Science at Harvard University. We use two dataverses. One is the existing dataverse for Economists Online that was set up in the European project NEEO, and the other is a new dataverse for Tilburg University. The dataverse for Tilburg University is made up of collections that correspond to the Schools of the university. The collection of the Tilburg School of Economics and Management (TISEM) is a so-called dynamic collection, while the collections of the other Schools are static. A dynamic collection is populated by studies from other (static) collections. The TISEM collection is defined to be the same collection as the static collection of Tilburg University in the dataverse for Economists Online. In this way our economic datasets can live in two dataverses. The management of the studies that live in one or more dataverses is done in the dataverse that houses the static collection. In our case the economic studies are described in the dataverse for Economists Online and the studies (datasets) of the other Schools are described in the dataverse for our university.
We held several one-hour sessions to familiarise our information specialists Corry, Ingrid and Trijnie with the DVN system. We tried to follow the Guidelines developed for the NEEO project, but it turned out to be better for ODaP to have its own Guidelines. These Guidelines are under review. We will make them available when they are finished.

ODaP and the acquisition of datasets

Acquiring datasets is a lot of work. It means a lot of talking and explaining. We had meetings with the management of the Schools. Mails were sent to leaders of research groups and to individual researchers. We have a (growing) list of 62 names that we want to contact. As far as possible we contact researchers with whom the library has already collaborated in the past, but old contacts can lead to new ones, expanding our network. We have already contacted two thirds of the names on the list; in most cases we mail first and then make an appointment. The result so far is that we have collected 6 datasets, but we are just starting. In many cases researchers are sympathetic to the idea, but first want to organise (and archive) their data better. This means that ideally we collaborate with the researchers in the different phases of their research. If successful, one contact can be good for connecting datasets to several publications. We located more than 60 publications with Tilburg authors that are based on one (longitudinal) dataset. Hopefully we can use these 60 publications in our project. In most cases personal meetings are necessary to explain the project and convince researchers.

Open Data and Publications

We received funding from SURFfoundation for a five-month project. In this project we will help researchers to publish the datasets that they used for their publications. The project focusses on researchers from the School of Economics and Management and the School of Social and Behavioral Science. The goal is to connect 40 publications with the underlying datasets.

Technical information
The Dataverse Network system will be used for the description and storage of the datasets. In the description of the dataset there will be a link to the metadata record of the publication in the search system of Tilburg University. The dataset descriptions follow the DDI version 2 standard. The DDI records are harvested from the DVN system using the OAI-PMH protocol. In the search system, the DDI records are added to the metadata records of the corresponding publications. The metadata records are MPEG21/DIDL records that contain the bibliographical description in MODS and links to the full text in the institutional repository. The combination of the DDI record and the DIDL record represents a so-called enhanced publication that can be represented by an OAI-ORE Resource Map. With the exception of the Resource Maps, the same setup is used in Economists Online. The technical work for this was done in the European project NEEO. Also related is the portal of the European Values Study, which is the outcome of the project DatapluS funded by SURFfoundation.
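To give an impression of the harvesting step, the following Python sketch builds an OAI-PMH ListRecords request. The verb and the metadataPrefix and set parameters are part of the OAI-PMH protocol; the base URL, the prefix value "ddi" and the set name "TU" are assumptions made for the example and may differ from what DVN actually exposes.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (no network access here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, metadata prefix and set name.
url = list_records_url("http://dvn.example.org/dvn/OAIHandler", "ddi", "TU")
```

A real harvester would fetch this URL, process the returned records and continue with the resumptionToken from the response until the list is complete.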

The real challenge
The real challenge, however, is not technical but organisational and behavioural. How do we convince and motivate researchers to make their datasets available for open access (this also involves limited access, in the sense that access requires the consent of the researcher or someone acting on his/her behalf)? At the end of the project we want to have procedures in place for the delivery of datasets that are comparable to and integrated with the procedures for the delivery of open access publications to the institutional repository. In following blogs, I will describe how we handle this challenge.

LAIRD is ODAP in Edinburgh

Rob and I met Robin Rice a year ago at the NEEO conference that was held in the British Library. She attended the workshop that we gave on enhancing economic publications with datasets. Rob was invited to contribute to a session of an Open Access Conference in Glasgow later that year.

Today Robin wrote to congratulate us on our new project Open Data and Publications. “This sounds similar to what we’ve been working on in LAIRD (Linking Articles Into Research Data), maybe we can keep in touch about it.” Yes, we will do that.

OPACs and the real information marketplace : why providing a mediocre product at a high price no longer works

Presentation of Lloyd Sokvitne of the State Library of Tasmania.

Paper: http://www.nla.gov.au/lis/stndrds/grps/acoc/documents/Sokvitne.doc
Powerpoint: http://www.nla.gov.au/lis/stndrds/grps/acoc/documents/Sokvitne.ppt
Audio: http://www.nla.gov.au/lis/stndrds/grps/acoc/Skovitne.mp3
Audio, questions: http://www.nla.gov.au/lis/stndrds/grps/acoc/Skovitne_questions.mp3

Records are exported from ILMS. New search service with Verity K2. Links to ILMS for info from circulation control.
Faceted browsing; also from home page (thus facets apply to complete collection).
Default sorting result set:
According to Sokvitne, the most common sort order is alphabetical, and sometimes the creation date of the record is used. Alphabetical order is probably the default for public libraries, while academic libraries prefer date. But since Google, the user expects the most relevant items at the top of the list. But what is relevant? For academic users, recency is probably an important ingredient of relevance. Other ranking criteria? Sokvitne mentions popularity (based on lending info), citations, …
According to Sokvitne, the user applies the principle of ‘satisficing’ in dealing with result lists. The user doesn’t read the list from beginning to the end but stops with the first result that is good enough.

Problem with a heterogeneous database that is a merger of other databases is the minimal set of shared facets.

Catalogue: a call-number lookup system or also a resource discovery tool?

http://www.lib.ncsu.edu/staff/kaantelm/antelman_lynema_pace.pdf

the catalog has become for many students a call-number lookup system, with resource discovery happening elsewhere

This article describes the new catalogue of North Carolina State University (NCSU) Libraries. The technology used is Endeca’s Information Access Platform. This allows, e.g., for faceted searching; in this article it is called faceted browsing and Endeca calls it Guided Navigation. An assessment study is reported based on log analysis, analysis of search results and usability testing.

Remarkable is the short implementation timeline: only 5 months. A critical factor was the belief of the library staff that “not all issues, particularly “edge cases,” (i.e., rarely occurring scenarios) must be resolved before releasing a new service.”

See (or better, hear) also the sound recording of the Access 2006 presentation by Tito Sierra on Improving the Catalogue Interface using Endeca: http://www.access2006.uottawa.ca/2006-10-12-03-sierra.mp3; the slides are here: http://www.lib.ncsu.edu/endeca/presentations/200610-access-endeca.ppt

No progress in web services for library systems

“The NISO Web Services and Practices Working Group, after spending several months on its charge to create a best practices document, has concluded that the time is not right for this group to complete this task. The current web services landscape is still in development and the group feels it is too early to write such a backwards-facing document. Members of this group came together, largely drawn from the then-inactive VIEWS group, expecting various things, from a VIEWS-like survey of problem areas needing standards to a narrow focus on a particular area of difficulty to a state-of-the-universe overview. The group feels that there are problems in the library/web services interface that need to be tackled with standards efforts, such as system frameworks and architecture (particularly around connecting) and an e-framework in the e-learning area. There is strong consensus that NISO would benefit in some areas from working with existing efforts (such as the UK-based e-learning efforts) and in some cases in using its vantage point in the standards world to identify areas in which a broader set of standards, perhaps starting as an extension of existing standards, might bring wider usage and constituency.”

Link: http://www.niso.org/committees/Services/Services_comm.html

Importance of information databases for the UvT researcher

From June to September of this year the Facultaire Dienstverlening (FD, Faculty Services) team carried out a study of the information needs and information use of academic staff of the UvT. More than a hundred researchers who use the Persoonlijke Vakliteratuurservice (PVS, Personal Professional Literature Service) were interviewed by the information specialists of FD. One question concerned the importance of information databases. The research report does not discuss the outcome of this question. Yet in my opinion it is an important question. Fortunately, the report contains a book of tables with the percentages per answer category.

The question about the importance of information databases was broken down into seven (types of) databases. For each (type of) database a respondent could answer ‘essential’, ‘important’, ‘somewhat important’ or ‘unimportant’. The following table gives the percentage of ‘essential’ and ‘important’ answers. The highest-scoring databases are at the top.

Database                           Essential or important   Highest        Lowest
Online Contents UvT                86%                      FEB 97%
UvT Catalogue                      83%                      FEB 89%
Subject-specific databases         77%                      FEB 96%        FCC 41%
UvT Journals Catalogue             76%                      FSW 86%
Nederlandse Centrale Catalogus     72%                      FRW 96%        FEB 46%
National Online Contents           70%                      FCC/FRW 82%
Google Scholar                     43%                      FSW 58%        FRW 35%

These figures are consistent with the log data of iPort. Measured by the number of searches, the local Online Contents is number 1, followed by the catalogue and, in a good third place, ABI/Inform. For requesting full text, the local Online Contents is again used most, followed by ABI/Inform. For the users of the largest faculty, FEB, Online Contents UvT and the subject-specific databases, among which ABI/Inform must be counted, score highest in this study with 97% and 93%. For the FEB staff in the study the catalogue is in third place, even though they rate the importance of the catalogue slightly higher on average than the group as a whole.