BASE - Bielefeld Academic Search Engine

Golden Rules for Repository Managers

Why the Golden Rules?

We index all sources with academic content (journals, repositories, digital collections, etc.) that have an OAI interface and provide their metadata via OAI-PMH (more information on OAI can be found on the website of the Open Archives Initiative or in Wikipedia). The data is stored on servers at Bielefeld University.

These "Golden Rules" help you optimize the delivery of your data via your OAI interface. If you follow them, your source can be indexed in BASE quickly and without problems. Documents from your source are then presented optimally and completely in BASE, and all other services that harvest your OAI interface benefit as well.

You can check some of the points listed here with our OAI-PMH Validator OVAL.

If your source does not have an OAI interface

If your source does not have an OAI interface, direct indexing of your source is currently not possible. In this case, upload your documents to aggregators (e.g. DataCite or Zenodo) or to specialized repositories that are already indexed in BASE (see our list of sources), or register your Open Access journal in the DOAJ. We index these sources regularly. If your documents are already contained in such an aggregator, they can usually already be found in BASE, and a separate registration of your source in BASE is not necessary.

However, the best way to have your documents indexed by us and found in BASE is to operate your own OAI interface. Only then can, for example, the name of your source appear under "Content Provider" in the results list with supplementary information, and only then does your source appear as an independent entry in the list of sources.

General OAI Interface

Functioning OAI interface

Your OAI interface is freely accessible, stable and always responsive. A ListRecords request in oai_dc format returns results without timeouts or output errors. Check at regular intervals that your OAI interface works, e.g. using a browser.

If your OAI interface does not function correctly, your source cannot be indexed. Support for the oai_dc format is mandatory.
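
A check from a script instead of a browser can build the request URL and look for an OAI-PMH `<error>` element in the response. The following is a minimal sketch using only the Python standard library; `https://repository.example.org/oai` is a placeholder base URL, and the network call itself is left out so the check can run on any saved response.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for the given OAI-PMH base URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def oai_errors(response_xml):
    """Return (code, message) pairs for every OAI-PMH <error> element in a response."""
    root = ET.fromstring(response_xml)
    return [
        (err.get("code"), (err.text or "").strip())
        for err in root.findall("oai:error", OAI_NS)
    ]
```

For the placeholder above, `list_records_url("https://repository.example.org/oai")` yields a URL ending in `?verb=ListRecords&metadataPrefix=oai_dc`, which can be pasted into a browser; an empty list from `oai_errors` means the response contained no protocol error.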

Records per page

Each ListRecords response of your OAI interface ideally returns 50 to 1,000 records. The resumptionToken at the end of each OAI-PMH response works and delivers the next 50 to 1,000 records.

If fewer than 50 records per page are delivered, harvesting your source requires many individual requests. More than 1,000 records per page, on the other hand, make the delivered files relatively large and increase the risk of aborted harvests. If the resumptionToken does not work, complete indexing is not possible.
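
How a harvester walks through the pages can be sketched as a loop that follows the resumptionToken until an empty or missing token marks the end of the list. This is an illustrative sketch, not BASE's harvester: `fetch` is a hypothetical callable that returns the response XML for a given token, so the logic can be tested without a network. It also flags page sizes outside the 50 to 1,000 range and stops if the same token is returned twice.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(fetch, min_page=50, max_page=1000):
    """Follow resumptionTokens until a ListRecords result is complete.

    `fetch(token)` is a hypothetical callable returning the response XML for a
    token (None for the first request). Returns all <record> elements and the
    sizes of pages outside the recommended range.
    """
    records, odd_pages, seen = [], [], set()
    token = None
    while True:
        root = ET.fromstring(fetch(token))
        list_records = root.find(OAI + "ListRecords")
        if list_records is None:  # e.g. an <error> response instead of records
            break
        page = list_records.findall(OAI + "record")
        records.extend(page)
        rt = list_records.find(OAI + "resumptionToken")
        token = (rt.text or "").strip() if rt is not None else ""
        # Only the last page may legitimately be smaller than min_page.
        if len(page) > max_page or (token and len(page) < min_page):
            odd_pages.append(len(page))
        if not token:  # empty or missing token: the list is complete
            break
        if token in seen:
            raise RuntimeError(f"resumptionToken {token!r} repeated; harvest would not end")
        seen.add(token)
    return records, odd_pages
```

A token that never changes or never becomes empty is exactly the "resumptionToken does not work" case: the loop either raises or never reaches the end of the list.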

Contact Persons

In the Identify response of your OAI interface, the adminEmail field contains an e-mail address through which the technical operator of the OAI interface can be contacted. The homepage of your source also gives an e-mail address through which the source operator can be contacted directly.

We can only contact you about problems or questions if these e-mail addresses work and e-mails sent to them are read and answered.
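
The adminEmail entries can be read from a saved Identify response (`?verb=Identify`) with a few lines of Python. This sketch only checks that each address is syntactically plausible, using a deliberately loose pattern of our own choosing; whether mails to it are actually read can only be verified by people.

```python
import re
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
# Deliberately loose pattern: it only flags obviously malformed entries.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def admin_emails(identify_xml):
    """Return (address, looks_valid) for each adminEmail in an Identify response."""
    root = ET.fromstring(identify_xml)
    identify = root.find(OAI + "Identify")
    if identify is None:
        return []
    result = []
    for el in identify.findall(OAI + "adminEmail"):
        address = (el.text or "").strip()
        result.append((address, bool(EMAIL.match(address))))
    return result
```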

Changes / Deletions / Updates

Identification of changes to metadata of individual data records

Any subsequent change to a record must be marked in your OAI interface by updating the datestamp of the document.

All indexed sources are updated regularly in BASE. If the datestamp is not updated, the change cannot be transferred to the BASE index, and the document remains unchanged, and therefore incorrect, in the index.
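
A common way for OAI-PMH harvesters to pick up changes is a selective request such as `ListRecords&from=2024-01-01`, which returns only records whose datestamp is on or after that date (the bound is inclusive). Assuming incremental harvesting of this kind, the sketch below mimics the selection on (identifier, datestamp) pairs; the identifiers are made up. A record that was edited without a new datestamp is simply not selected.

```python
from datetime import datetime, timezone


def parse_datestamp(value):
    """Parse an OAI-PMH datestamp (YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ) as UTC."""
    fmt = "%Y-%m-%dT%H:%M:%SZ" if "T" in value else "%Y-%m-%d"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


def changed_since(headers, last_harvest):
    """Return identifiers whose datestamp is on or after the last harvest.

    `headers` is a list of (identifier, datestamp) pairs; this mirrors what a
    harvester receives for ListRecords with from=<last_harvest>.
    """
    since = parse_datestamp(last_harvest)
    return [ident for ident, stamp in headers if parse_datestamp(stamp) >= since]
```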

Deletion of data records

If a document is deleted from your source, its record must be marked as deleted in the OAI interface and still be delivered. Under no circumstances should the record be removed from the OAI interface completely.

If a record is removed completely from the OAI interface instead of being marked as deleted, it cannot be deleted from the BASE index, and the document remains in the index incorrectly.
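
In OAI-PMH, a deleted record keeps its `<header>`, now with the attribute `status="deleted"`, the same identifier and the date of deletion as its datestamp, but has no `<metadata>` part; the Identify response declares this support with `deletedRecord` set to `transient` or `persistent`. The following sketch shows how a harvester can separate such records from active ones in a ListRecords response; the identifiers in the test are invented.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def split_deleted(list_records_xml):
    """Split the records of a ListRecords response into active and deleted identifiers."""
    root = ET.fromstring(list_records_xml)
    active, deleted = [], []
    for header in root.iter(OAI + "header"):
        identifier = header.findtext(OAI + "identifier")
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            active.append(identifier)
    return active, deleted
```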

Information about fundamental changes

Should the name of your source or the URL of the OAI interface change (e.g. due to a move to another system), please let us know via our contact form. If necessary, let us know the old and new URL of your source and, if possible, the collection name of your source in BASE (you can find this in our list of sources by clicking on the number in the Documents column of your source).

We check all sources at irregular intervals and correct information (name, system, URL) if necessary. By informing us actively about changes, you ensure that your source is always fully and correctly captured and indexed by BASE. This information is then shared with the worldwide community via our OAI-PMH blog.

Contents / Metadata

Character Encoding

All content in your OAI interface (titles, author names, abstracts) is encoded in UTF-8.

Other encodings or double encodings cause errors in the display of hits from your source.
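
Two checks help here: whether the raw bytes of a response decode as UTF-8 at all, and whether text was encoded twice, which typically turns "Müller" into "MÃ¼ller". The second function is a heuristic under a stated assumption: it only detects UTF-8 bytes that were mistakenly decoded as Latin-1; other mix-ups (e.g. via Windows-1252) need additional checks.

```python
def is_valid_utf8(data):
    """True if the raw bytes of a response decode as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def looks_double_encoded(text):
    """Heuristic: True if `text` seems to be UTF-8 bytes that were decoded as Latin-1."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Latin-1, or not valid UTF-8 once re-encoded.
        return False
    return repaired != text
```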

Separation of Multiple Entries in a Metadata Field

If you provide several entries in one metadata field (for example, the name of the author and their ORCID iD), separate them with a space, a semicolon and a space (" ; ").

This separation enables us to index the information separately and make it searchable.
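
The separator works because a comma already occurs inside names ("Last Name, First Name"), while " ; " does not. A small sketch of joining and splitting field values on this separator; the function names are illustrative only.

```python
SEPARATOR = " ; "  # space, semicolon, space


def join_entries(*entries):
    """Combine several entries into one field value, skipping empty ones."""
    return SEPARATOR.join(e.strip() for e in entries if e and e.strip())


def split_entries(value):
    """Split a field value back into its entries."""
    return [part.strip() for part in value.split(SEPARATOR) if part.strip()]
```

For example, `join_entries("Summann, Friedrich", "orcid:0000-0002-6297-3348")` produces the value used in the author examples below, and `split_entries` recovers both parts without splitting at the comma in the name.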

Completeness of Metadata

BASE harvests the metadata of your source in the standard format oai_dc. Each record of your OAI interface should describe the document as completely as possible and use standardized vocabularies. A working URL in <dc:identifier> is mandatory.

The more complete the metadata you provide, the easier it is to find documents from your source in BASE. Standardized vocabularies help us, for example, to assign documents from your source to the correct document type or to the correct re-use rights. Documents without a URL in <dc:identifier> are not indexed.

Notes on Individual Metadata Fields

Information                                Element in oai_dc  Requirement
URL of the publication                     <dc:identifier>    Mandatory
Title                                      <dc:title>         Recommended
Author                                     <dc:creator>       Recommended
Publication type                           <dc:type>          Recommended
Publication date                           <dc:date>          Recommended
Language of the document                   <dc:language>      Recommended
Access and re-use rights                   <dc:rights>        Recommended
Reference / Citation                       <dc:source>        Recommended
Other parties involved in the publication  <dc:contributor>   Optional
File format                                <dc:format>        Optional
Description                                <dc:description>   Optional
Keywords                                   <dc:subject>       Optional
Publisher                                  <dc:publisher>     Optional
Related documents                          <dc:relation>      Optional
Content delimitation                       <dc:coverage>      Optional
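
To review a single record against the table, the Dublin Core elements that actually carry text can be compared with the mandatory and recommended fields. A minimal sketch, assuming the input is the `<oai_dc:dc>` element of one record; the two field lists are transcribed from the table, and empty elements count as missing.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
MANDATORY = ["identifier"]
RECOMMENDED = ["title", "creator", "type", "date", "language", "rights", "source"]


def missing_fields(oai_dc_xml):
    """Return (missing mandatory, missing recommended) elements of one oai_dc record."""
    root = ET.fromstring(oai_dc_xml)
    present = {
        el.tag[len(DC):]
        for el in root.iter()
        if el.tag.startswith(DC) and (el.text or "").strip()
    }
    return ([f for f in MANDATORY if f not in present],
            [f for f in RECOMMENDED if f not in present])
```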

URL of the publication <dc:identifier>

Each record contains a working URL in the <dc:identifier> field (starting with http:, https:, doi: or urn:nbn:de:). If possible, this URL leads to the front door of the document (an info page with bibliographic information and a link to the full text) or directly to the Open Access full text in PDF format.

If a record has several <dc:identifier> fields, or if the full text is not offered in a common file format (HTML, PDF) or is not Open Access, the first identifier should always lead to the front door of the document.

Provide persistent identifiers (DOI, handle, URN) that continue to work even if the server is relocated and the URL changes. Make sure that your DOIs etc. are registered with the appropriate registration agency and resolve correctly.

In DSpace installations in particular, the handle prefix must be configured; otherwise the identifiers point to a "dummy URL" (handle.net/123456789), which produces an error message (see www.handle.net/documentation.html).

Only documents whose identifiers begin with http:, https:, doi: or urn:nbn:de: and do not lead to a "dummy URL" (123456789) are indexed. If a DOI etc. is not registered, the document is indexed, but its link in the BASE hit list leads to an error message. Sources in which most of the links do not work may be removed from the index.

Examples

  • <dc:identifier>https://pub.uni-bielefeld.de/record/2710028</dc:identifier>
  • <dc:identifier>http://hdl.handle.net/10760/12746</dc:identifier>
  • <dc:identifier>https://doi.org/10.1108/07378830610715473</dc:identifier>
  • <dc:identifier>doi:10.1108/07378830610715473</dc:identifier>
  • <dc:identifier>https://nbn-resolving.de/urn:nbn:de:0070-pub-27663089</dc:identifier>
  • <dc:identifier>urn:nbn:de:0070-pub-27663089</dc:identifier>
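
The syntactic part of these rules can be checked before the records are published. This sketch accepts the four prefixes named above and rejects handles that still use the default DSpace prefix 123456789, either as a handle.net URL or as a local `/handle/` path (the second pattern is our assumption about how such dummy links typically look). It cannot tell whether a DOI is registered or a URL resolves; that requires an actual HTTP request.

```python
import re

ACCEPTED_PREFIXES = ("http:", "https:", "doi:", "urn:nbn:de:")
# Unconfigured default DSpace handle prefix, as a global or local handle URL.
DUMMY_HANDLE = re.compile(r"(handle\.net|/handle)/123456789(/|$)", re.IGNORECASE)


def indexable_identifier(value):
    """Syntactic check of one <dc:identifier> value against the rules above."""
    value = value.strip()
    if not value.lower().startswith(ACCEPTED_PREFIXES):
        return False
    return DUMMY_HANDLE.search(value) is None
```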

Note on ISBN, ISSN etc.

The current DINI certificate 2019 (in German only) recommends adding information such as the ISBN or ISSN in the <dc:identifier> field. However, BASE currently indexes only URLs in <dc:identifier>. Other values are not indexed there and therefore cannot be found if they appear only in <dc:identifier>. If you want to be both DINI-compliant and BASE-optimized, enter values such as the ISBN in both <dc:identifier> and <dc:source>.

Title <dc:title>

Enter the title in the <dc:title> field as in the original. If the publication has several titles (e.g. in different languages), repeat the field.

Example

  • <dc:title>Advanced calculus: student handbook</dc:title>

Author <dc:creator>

In <dc:creator>, give the persons or institutions who authored the publication. Give author names in the form Last Name, First Name, and include the ORCID iD in the same field as the author name.

Encourage the use of ORCID iDs (and other person identifiers, if applicable) so that authors can be identified unambiguously, even if they share a name. Encourage authors who publish in your source to register with ORCID, and add their ORCID iDs to the metadata directly alongside the author name. Separate the ORCID iD from the author name with a space, a semicolon and a space, and either put "orcid:" in front of the number or give the full URL of the iD. If an ORCID iD is given, authors can also be found by their ORCID iD when searching in BASE.

Examples

  • <dc:creator>Smit, J.H. de</dc:creator>
  • <dc:creator>Utrecht University. Department of Computer Sciences</dc:creator>
  • <dc:creator>Summann, Friedrich ; orcid:0000-0002-6297-3348</dc:creator>
  • <dc:creator>Summann, Friedrich ; https://orcid.org/0000-0002-6297-3348</dc:creator>
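
Typos in ORCID iDs can be caught before export: the last character is a check character computed with ISO 7064 MOD 11-2, as documented by ORCID. The sketch validates that checksum and then builds a creator value in the "orcid:" form shown above; `creator_value` is an illustrative helper name.

```python
import re

ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_checksum_ok(orcid):
    """Validate the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    if not ORCID_RE.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    check = "X" if result == 10 else str(result)
    return digits[-1] == check


def creator_value(name, orcid=None):
    """Build a <dc:creator> value: 'Last Name, First Name ; orcid:XXXX-XXXX-XXXX-XXXX'."""
    if orcid is None:
        return name
    if not orcid_checksum_ok(orcid):
        raise ValueError(f"invalid ORCID iD: {orcid}")
    return f"{name} ; orcid:{orcid}"
```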

Publication type <dc:type>

In the <dc:type> field, enter the publication type of the document (e.g. article, chapter).

If possible, use a standardized vocabulary, e.g. the info:eu-repo Vocabulary for Publication Types or the COAR Resource Type Vocabulary.

The values used by your source must be known to BASE so that we can correctly assign your documents to our document types.

Examples

  • <dc:type>info:eu-repo/semantics/article</dc:type>
  • <dc:type>journal article</dc:type>
  • <dc:type>http://purl.org/coar/resource_type/c_6501</dc:type>

Publication date <dc:date>

The <dc:date> field of each record should contain the publication year or date of the document in ISO 8601 format (according to the Gregorian calendar). Otherwise, restricting or sorting results by publication year in BASE does not work correctly for your source.

Use the <dc:date> field only once. If there is no exact publication date, give an estimate; approximate dates such as "17th century" should be given as 1650.

Examples

  • <dc:date>2000-12-25</dc:date>
  • <dc:date>1978-02</dc:date>
  • <dc:date>1650</dc:date>
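
The three accepted shapes (year, year and month, full date) can be validated with a regular expression, and a century can be mapped to its midpoint year, which reproduces the "17th century" to 1650 rule. A minimal sketch; the pattern deliberately does not check day numbers against month lengths.

```python
import re

ISO_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")


def is_iso_date(value):
    """True for YYYY, YYYY-MM or YYYY-MM-DD (day not checked against month length)."""
    return bool(ISO_DATE.match(value))


def century_to_year(century):
    """Map an ordinal century to its midpoint year, e.g. 17 (17th century) -> 1650."""
    return (century - 1) * 100 + 50
```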

Language of the document <dc:language>

Give the language of the document as an ISO 639 code (2 or 3 letters) in the <dc:language> field.

Otherwise, the language of documents from your source is not displayed in BASE or is displayed incorrectly, and restricting results to one language does not work correctly for your source.

Examples

  • <dc:language>eng</dc:language>
  • <dc:language>deu</dc:language>
  • <dc:language>en</dc:language>
  • <dc:language>nld/dut</dc:language>

Access and re-use rights <dc:rights>

Access rights (Access status)

The <dc:rights> field contains information on access to the full text, using the info:eu-repo Access Rights vocabulary or the COAR Access Rights vocabulary. Alternatively, make Open Access documents available in their own OAI set; the name of this set is contained in the setSpec field of each record. Name the set as unambiguously as possible, e.g. open access.

Information about access to a document is particularly important for our users in the hit list. If it is missing or insufficient, access information for documents from your source is displayed incompletely, incorrectly or not at all, and restricting results by access type does not work correctly for your source.

Examples

  • <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  • <dc:rights>closed access</dc:rights>
  • <dc:rights>http://purl.org/coar/access_right/c_abf2</dc:rights>
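
Because controlled values can be interpreted mechanically, a harvester can map them to an access status. The sketch below covers the info:eu-repo access terms, the four COAR access-right URIs (c_abf2 open, c_f1cf embargoed, c_16ec restricted, c_14cb metadata only) and plain-text phrases like the example above; the set-based alternative is handled through a hypothetical set spec `open_access`. This is an illustration, not the mapping BASE uses internally.

```python
ACCESS_STATUS = {
    "info:eu-repo/semantics/openaccess": "open",
    "info:eu-repo/semantics/embargoedaccess": "embargoed",
    "info:eu-repo/semantics/restrictedaccess": "restricted",
    "info:eu-repo/semantics/closedaccess": "closed",
    "http://purl.org/coar/access_right/c_abf2": "open",
    "http://purl.org/coar/access_right/c_f1cf": "embargoed",
    "http://purl.org/coar/access_right/c_16ec": "restricted",
    "http://purl.org/coar/access_right/c_14cb": "metadata only",
    "open access": "open",
    "closed access": "closed",
}


def access_status(rights_values, set_specs=(), open_access_set="open_access"):
    """Return the first recognised access status, or 'unknown'.

    Values are compared case-insensitively; `open_access_set` is a hypothetical
    setSpec used as a fallback when no rights value is recognised.
    """
    for value in rights_values:
        status = ACCESS_STATUS.get(value.strip().lower())
        if status:
            return status
    if open_access_set in set_specs:
        return "open"
    return "unknown"
```

License URLs in additional <dc:rights> fields are simply skipped by this lookup, so access and re-use information can sit side by side in the same record.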

Subsequent use rights (licenses)

Offer your authors the opportunity to place documents under a license.

Use licenses that are as widespread as possible, such as Creative Commons licenses, and enter the license in an additional <dc:rights> field in your OAI interface.

If this information is missing or insufficient, re-use information for documents from your source is displayed incompletely, incorrectly or not at all, and restricting results by re-use options does not work correctly for your source.

Examples

  • <dc:rights>http://creativecommons.org/licenses/by-sa/2.0/uk/</dc:rights>
  • <dc:rights>https://creativecommons.org/licenses/by/4.0/</dc:rights>

Reference / Citation <dc:source>

Give information about the source or the citation in <dc:source> (e.g. for articles, the name, volume and issue of the journal). In particular, include the ISSN of the journal in which the article appeared.

This information allows users to better find your documents in BASE.

Examples

  • <dc:source>Ecology Letters (1461023X) vol.4 (2001)</dc:source>
  • <dc:source>ISSN: 0928-0987</dc:source>
  • <dc:source>Pieper D, Summann F.: Bielefeld Academic Search Engine (BASE). An end-user oriented institutional repository search service. Library Hi Tech. 2006; 24(4):614–619. ISSN 0737-8831.</dc:source>

Other parties involved in the publication <dc:contributor>

Indicate persons and institutions who have contributed to a publication without being an author (e.g. editor, reviewer) in <dc:contributor>. The recommendations given in the Author <dc:creator> section apply.

File format <dc:format>

In <dc:format>, specify the file format of the publication. It is best to use the Internet Media Types (MIME types) registered with IANA. The complete list can be found at: http://www.iana.org/assignments/media-types.

Examples

  • <dc:format>video/quicktime</dc:format>
  • <dc:format>application/pdf</dc:format>
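
If the repository software does not record the media type of a file, it can often be derived from the file extension with Python's standard `mimetypes` module. A `MimeTypes` instance created without file names uses only Python's built-in table, so the result does not depend on the host system. This is a fallback under that assumption; the Content-Type your server actually sends is the better source when available.

```python
import mimetypes

# Built-in table only: no system files such as /etc/mime.types are read.
_TYPES = mimetypes.MimeTypes()


def dc_format(url):
    """Guess the IANA media type for a full-text URL, or None if unknown."""
    media_type, _encoding = _TYPES.guess_type(url)
    return media_type
```

A URL without a file extension, such as a front-door page, yields `None`; in that case it is better to omit <dc:format> than to guess.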

Description <dc:description>

Use <dc:description> to describe the content of the publication (abstract).

Keywords <dc:subject>

In the <dc:subject> field, both keywords and classification notations can be specified. If notations are used, the corresponding classification scheme should also be indicated (preferably as a URI). The content should also be provided as human-readable text, preferably in English, in another <dc:subject> field.

Examples

  • <dc:subject>info:eu-repo/classification/ddc/641</dc:subject>
  • <dc:subject>Anatomy</dc:subject>

If no specific vocabulary is to be used, we recommend the general Dewey Decimal Classification (DDC): https://www.oclc.org/en/dewey/resources.html.

Publisher <dc:publisher>

The <dc:publisher> field gives the publisher of the publication, which may be either an institution or a natural person. For university theses, enter the name of the university in this field. For hierarchically structured organizations, separate the hierarchy levels from each other with full stops.

Examples

  • <dc:publisher>Peter Langford</dc:publisher>
  • <dc:publisher>Jumper Media</dc:publisher>
  • <dc:publisher>Loughborough University. Department of Computer Science</dc:publisher>

Related documents <dc:relation>

Related/referenced publications are indicated in the field <dc:relation>.

Example

  • <dc:relation>http://hdl.handle.net/10</dc:relation>

Content delimitation <dc:coverage>

The field <dc:coverage> is used to describe the spatial and temporal limitation of the subject of the publication. This includes location information, geocoordinates, time information or the indication of a jurisdiction.

Examples

  • <dc:coverage>Netherlands</dc:coverage>
  • <dc:coverage>name=Western Australia; northlimit=-13.5; southlimit=-35.5; westlimit=112.5; eastlimit=129</dc:coverage>
  • <dc:coverage>1800-1850</dc:coverage>
  • <dc:coverage>52.031629, 8.541202</dc:coverage>
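
The second example follows the DCMI Box encoding scheme, a list of `name=value` components separated by semicolons. A sketch of how such a value can be parsed; converting only components whose name ends in "limit" to numbers is our simplification, and values in other forms (such as the plain coordinate pair) yield an empty result.

```python
def parse_dcmi_box(value):
    """Parse a DCMI Box value such as 'name=X; northlimit=-13.5; ...' into a dict.

    Components whose name ends in 'limit' are converted to float; all other
    components (e.g. name, units, projection) are kept as strings.
    """
    box = {}
    for part in value.split(";"):
        key, sep, raw = part.partition("=")
        if not sep:
            continue  # skip fragments without '='
        key, raw = key.strip(), raw.strip()
        box[key] = float(raw) if key.endswith("limit") else raw
    return box
```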

Beyond BASE and OAI ...

Web address of the repository

If possible, offer the start page under its own subdomain (without port or subdirectory). If the start page of the repository is accessible via a port (e.g. repository.domain.com:8080) or a subdirectory (repository.domain.com/xmlui), set up a redirect from the subdomain (repository.domain.com).

Any change to the port or subdirectory breaks the links to your source, e.g. in the BASE hit list under "Content Provider" or in our list of sources.

Use generic names for subdomains

Avoid version numbers in subdomain or directory names (e.g. ojs3.domain.com or ojs.domain.com/ojs-3/).

With every software update, the URL may have to change, or it may end up containing an outdated version number, e.g. if you have updated software from version 2 to version 3 but the URL is still ojs2.domain.com. As mentioned in the previous point, any change to the URL breaks the links to your source.

If the URL (domain / subdomain) is changed, set redirects.

If the Internet address of your source changes (even by a single character), please set up a redirect from the old address to the new one. Also make sure, by means of a redirect, that the OAI interface can still be reached.

Without a redirect, links lead nowhere. We check the addresses of sources regularly. If we find broken links and there is no redirect, we will in individual cases briefly search for the new address of the source. If this search is unsuccessful, your source will be removed from the index. Other search engines also remove sources that are no longer accessible from their index.

Title of the repository / journal in plain text

The name of the repository or the title of the journal should always appear in plain text in the source code of your website, either in <title>, in the main heading (<h1>) or as the alternative text of a logo.
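
Whether the name is present in one of these places can be checked with Python's built-in `html.parser`. The sketch collects the text of `<title>` and `<h1>` and the `alt` text of images, then looks for the name case-insensitively; "Example Repository" in the test is a placeholder name.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collect the text of <title> and <h1> and the alt text of <img> elements."""

    def __init__(self):
        super().__init__()
        self._current = None
        self.found = {"title": [], "h1": [], "img_alt": []}

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._current = tag
            self.found[tag].append("")
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.found["img_alt"].append(alt.strip())

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.found[self._current][-1] += data


def name_in_plain_text(html, name):
    """True if `name` appears in <title>, an <h1> or an image alt text."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    texts = [t.strip() for values in finder.found.values() for t in values]
    return any(name.lower() in t.lower() for t in texts)
```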

If the title does not appear in plain text, entering the source name correctly in our database is cumbersome. In addition, search engines such as Google will not find your source under its name, or will find it only poorly.

Offer start page also in English

Offer at least the start page of your repository in English.

BASE has a global user community. With an English-language website, you give an international audience easy access to your documents.

If the start page is not available in German or English, we usually have to rely on automatic translation services to find out about the content of your source. An English-language start page also makes your source easier to find, both in the BASE list of sources and in general search engines.

Contact area with active e-mail address

The start page of your source links to a contact area ("Contact") that lists a working e-mail address of the person responsible for content and/or technical matters. E-mails sent to this address are regularly read and answered by those responsible.

If there is no contact area or e-mails are not read, we can hardly contact you when problems with indexing your source or other questions arise. This may result in your source not being indexed.

Announcement of your source / indexing in search engines

Register your source in OAI directories (e.g. OpenDOAR, ROAR, re3data or Open Archives) and update your information in the directories if changes are made.

In this way you make your source and interface known worldwide and enable other search engines to index documents from your source.

Use a search-engine-friendly folder structure. For example, offer a sitemap through which all documents (front doors and PDFs) can be reached directly, and submit this sitemap to search engines such as Google via the appropriate registration tools. Use search-engine-friendly meta tags (e.g. Google Scholar meta tags).
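
A sitemap in the sitemaps.org format is a `<urlset>` with one `<url><loc>` entry per page. Many repository systems generate it themselves; where they do not, a minimal sketch with the standard library could look like this (the URLs in the test are placeholders). ElementTree escapes characters such as `&` in the URLs automatically.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap.xml document (sitemaps.org protocol) listing the given URLs."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
    # Serialize with the sitemap namespace as the default (unprefixed) namespace.
    body = ET.tostring(urlset, encoding="unicode", default_namespace=SITEMAP_NS)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```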

If your source is easy to find in general and academic search engines, its documents are found, retrieved and used more often. If we do not yet know your source, we may also come across it when searching OAI directories or search engines. If technically possible, we will then add your source to our database and index it.