Help
FAQ
Expectations, effort and costs
-
Once you have registered your institution, a DeepGreen account will be created. The access information for your institutional account will then be sent to you immediately. To get as many affiliations as possible through DeepGreen, your institution should configure an affiliation file that contains as many name variations of your institution as possible. From a technical point of view, your institution would need to consider which of the interfaces you would like to use. If your repository has a SWORD interface and you want to use it, then the associated article data will be fed directly into the repository and will then wait there to be released. Your institution can set up retrieval of the article data via the web API or via OAI-PMH. The third option is to retrieve the data manually. If your institution has been sent article data, you must check before publication in your repository whether your institution is actually authorised to publish these articles.
-
During the pilot operation, your institution will not incur any costs for the time being. After the two-year pilot operation, it is planned to involve the institutions financially in the operation of DeepGreen.
Legal and licenses
-
DeepGreen distributes article data on the legal basis of various licenses as well as Gold Open Access articles. Participation in DeepGreen is also possible without participating in an alliance licence.
-
The consent of the author is not obtained by DeepGreen, but must be clarified on site if necessary.
Registration
-
You will find the exclusion of liability on the one hand in the attachment of the e-mail we sent to your institution and on the other hand you can download the exclusion of liability here. In order to participate in DeepGreen, it is a requirement that your institution signs and returns the exclusion of liability to us. When distributing the article data, we cannot assure that the matching will always work 100%. For this reason, DeepGreen cannot assume responsibility to publishers for ensuring that all articles are published only in legitimate repositories. Each institution is responsible for checking whether it is authorized to publish the matched article data in the repository.
-
The signed and scanned exclusion of liability can be sent directly by e-mail to info-deepgreen@zib.de.
-
No, we do not need the original exclusion of liability. It is sufficient if you send us a scan of the signed document.
-
With the participation in the EZB you will receive an identifier. You can find this in your admin area under BibID.
DeepGreen account information
-
If you are no longer able to log in with your DeepGreen account information, simply send a message to info-deepgreen@zib.de. We will take care of it as soon as possible.
-
No problem. Just contact the e-mail address info-deepgreen@zib.de
-
Please do not enter any personal email addresses in your DeepGreen account, as we cannot and do not want to store any personal data in the DeepGreen router.
-
You can specify both Sigel separated by a comma in the corresponding field.
-
The e-mail address is only used for the login. For messages during the test phase, the e-mail addresses of the organizational and technical contact persons specified in the registration form will be used. Please do not enter any personal e-mail addresses for the login.
-
This is up to you. It is not necessary to change the email address. If you wish, you can assign a new e-mail address. However, we ask you not to use personal addresses.
Affiliation file
-
This error message can have several causes. Your file may not be encoded in UTF-8. Another common cause is that a line does not contain exactly 5 commas. Also note that a name variation that itself contains a comma (e.g. Humboldt University, Berlin) must be enclosed in quotation marks (“”).
-
It depends on what program you are using.
Libre Office: With Libre Office you can select the character set “Unicode (UTF-8)” directly when opening the text import. When saving, you have the additional option of saving in UTF-8 by clicking on the checkbox “Edit filter setting” before you press “Save”. In the separate window you can then select “Unicode (UTF-8)” for the character set.
MS Excel: In Excel, you can open the “Web Options” window in the save window under “Tools” next to the “Save” button. There you can select under the tab “Encoding” that you want to save the document as “Unicode (UTF-8)”.
Editor: If you use an editor for editing, you can select the encoding “UTF-8” next to the “Save” button when saving.
Please make sure to select UTF-8 without BOM for all programs!
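The two most common causes named above (a byte order mark and a wrong number of commas) can be checked before uploading. The following is only a minimal sketch: the function name check_affiliation_file and its messages are hypothetical and not part of DeepGreen, and it does not verify that the file is valid UTF-8 in general.

```shell
# Hypothetical helper (not part of DeepGreen): pre-check an affiliation CSV.
# Returns 0 if no problem was found, 1 otherwise.
check_affiliation_file() (
    file=$1
    status=0

    # A UTF-8 byte order mark is the byte sequence EF BB BF at the file start.
    first_bytes=$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$first_bytes" = "efbbbf" ]; then
        echo "BOM found: save the file as UTF-8 without BOM"
        status=1
    fi

    # Drop quoted fields (they may contain commas), then require exactly
    # 5 commas, i.e. 6 comma-separated fields, on every line.
    bad_lines=$(sed 's/"[^"]*"//g' "$file" | awk -F',' 'NF != 6 { printf "%d ", NR }')
    if [ -n "$bad_lines" ]; then
        echo "lines without exactly 5 commas: $bad_lines"
        status=1
    fi

    exit "$status"
)
```

Called as check_affiliation_file my_affiliations.csv, it prints one message per problem found and returns a non-zero status if there was any.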
-
Yes, however, you would have to put this name variation in “”. For example, “Humboldt University, Berlin”.
-
Please do not make any changes to the first line of the affiliation file. This is important for the import routine.
-
Yes, you are welcome to use the name attributions from the Web of Science for your affiliation file.
-
Upper and lower case letters do not have to be considered.
-
The entries must be clearly assignable. For example, it is not sufficient to write only Faculty of Philosophy, since this exists at very many institutions. In such a case, either the location or the main institution would have to be added.
-
Yes, the columns are independent of each other. So, for example, name variants and e-mail domains can be next to each other without being related to each other.
-
For the matching process it does not matter which of the two variants you choose. If you use both it is important that both CSV files have the same content.
-
The affiliation file contains some columns that are not relevant for institutional repositories. The columns “Dummy1” and “Dummy2” are placeholders in case information is added later that might be relevant for matching, e.g. institutional IDs. The column “Keywords” could also be interesting later.
-
This does not work, because the keywords are only compared with keywords in article metadata at a later point in time. With reference to the GDPR, we also ask you to refrain from providing personal data in the affiliation file.
-
On the one hand, you will find information and assistance in the webinar recordings provided to you as a participating institution. In addition, you will find information in the screencast “Use from the perspective of the scientific institution” on our project website.
-
Unfortunately, the use of wildcards or Boolean operators in general is not possible in the affiliation file.
-
No, unfortunately this is not possible.
-
The affiliation file that is already in your account is a good template. For better results, we recommend adding further entries to this file.
-
The affiliation details are matched as a single string. This means that each name variation is used in its entirety for matching and not just parts of it. The matching mechanism in DeepGreen has since been adjusted so that a name variation from the affiliation file no longer matches when it is only part of a word in the affiliation given in the article metadata.
-
No, unfortunately this is not possible.
Technical
-
Participating repositories have several ways to obtain the article data.
It is possible to retrieve article data via the web API. This option is popular with our users, as the institution retains control over what data is retrieved and when. Documentation of this interface can be found here.
Furthermore, OAI-PMH can be used for item retrieval. Please note, however, that only metadata can be obtained via this interface. Documentation of this interface can be found here.
There is also the possibility to get the article data automatically via SWORD. For OPUS4 repositories the connection via SWORD is relatively simple, especially if your repository is hosted. Both the KOBV and the BSZ already have experience in connecting to DeepGreen via SWORD. For DSpace repositories, you can find documentation here. For MyCoRe and EPrints repositories, there is experience in the DeepGreen user community. If you have any questions about these repository types, feel free to contact us.
Of course, it is also possible to retrieve the article data manually. For each article in the routing history, a ZIP package with the full text and metadata can be downloaded by entering the notification ID and API key in the browser address bar in the following format:
Feel free to contact us for detailed instructions.
-
As long as the data is still stored on the DeepGreen router, you can download the data as often as you like. Only the automated delivery via SWORD cannot be started multiple times easily.
-
With automated delivery via SWORD, multiple deliveries cannot be made as easily. Confirmations via the SWORD interface are not possible.
Documentation
Guides
Important information for institutional repositories can be found in the guide DeepGreen – Open Access Transformation: A Guide for Institutional Repositories
A comprehensive overview is also provided by the guide DeepGreen: Open Access-Transformation in the Information Infrastructure – Requirements and Recommendations
Technical documentation
DeepGreen is being developed open-source and the code can be viewed at https://github.com/oa-deepgreen
Available interfaces for publishers
-
SFTP
Publishers send their data to the DeepGreen router via SFTP. The deliveries consist of ZIP files, each containing an xml file with metadata in NISO JATS format and a PDF file with the full text of the article. Our technical specification describes the details.
Available interfaces for repositories
-
Web API (native)
DeepGreen offers a (rudimentary) REST interface. The current version of the DeepGreen REST API is v1. It can be accessed via the web address https://www.oa-deepgreen.de/api/v1, e.g. with the command curl in a shell (bash, tcsh, etc.):
    $ curl -s [-X GET|POST|PUT|HEADER] https://www.oa-deepgreen.de/api/v1/...
In the following examples, the ... is replaced by the concrete REST resources validate, notification and routed of the web API. Another REST resource, intended for GUI-less operation of repository accounts, is the config endpoint. This resource is meant for repository operators who want to develop their own scripts to manage their accounts. It is included here for completeness.
For clarity, GET, POST, etc. are written below in place of the entire curl line specified above. Here is a brief overview of the possible calls in this short form:
-
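To make the short form concrete, the sketch below expands a method and a resource into the full curl line. The base address is the one given in the text; the function name dg_curl_line is hypothetical, and the date is an arbitrary example. The function only prints the command and does not contact the server.

```shell
# Base address as stated in the documentation.
DG_API="https://www.oa-deepgreen.de/api/v1"

# Hypothetical helper: print (do not run) the full curl line for a short
# form such as "GET /routed?since=2024-01-01".
dg_curl_line() {
    printf 'curl -s -X %s "%s%s"\n' "$1" "$DG_API" "$2"
}

dg_curl_line GET "/routed?since=2024-01-01"
```

The printed line can be pasted into a shell once the resource and its parameters have been checked.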
Request: POST /validate
Specification of metadata by “Incoming Notification JSON” (internal DeepGreen format)
    POST /validate?api_key=<api_key>
    Content-Type: application/json

    [Incoming Notification JSON]
-
Request: POST /notification
HTTP POST: bundling through “Content-Type: multipart/form-data; ...”
    POST /notification?api_key=<api_key>
    Content-Type: multipart/form-data; boundary=FulltextBoundary

    --FulltextBoundary
    Content-Disposition: form-data; name="metadata"
    Content-Type: application/json

    [Incoming Notification JSON]

    --FulltextBoundary
    Content-Disposition: form-data; name="content"
    Content-Type: application/zip

    [Package]
    --FulltextBoundary--
-
Request: GET /routed
-
Request: GET /notification
A specific notification, delivered in the format “Outgoing Notification JSON” (internal DeepGreen format)
    GET /notification/<notification_id>
In addition, as already mentioned at the beginning, there is another call intended exclusively for repository operators:
-
Request: POST /config
Install a new “match-config” file for one's own account (JSON format)
    POST /config?api_key=<api_key>
    Content-Type: application/json; charset=utf-8

    [New (overwriting!) match config settings JSON]
-
Request: GET /config
Query the current “match-config” settings of one's own account (JSON format)
    GET /config?api_key=<api_key>
All these ways of calling the web API are explained in more detail below.
Deliver article data to DeepGreen
Before a data package for a publication is transferred to DeepGreen, the package can be checked by DeepGreen. Two REST resources of the web API are available for this:
-
Validation of a data package (validation),
-
Transmission of a data package (notification).
For very impatient readers, here are two (working!) bash sample scripts that validate or deliver zip packages with an api_key to DeepGreen. The scripts first analyse the specified zip file to determine which metadata schema is present in the data package. Currently, DeepGreen processes three schemas: DTD JATS, DTD Journal (a variant of the DTD JATS schema) and DTD RSC.
bash script for validation
    #!/usr/bin/env bash
    # Validate a zip package with DeepGreen; nothing is delivered.
    host_url="https://www.oa-deepgreen.de"

    if [ $# -ge 2 ]; then
        api_key=$1
        zip_file=$2
    else
        echo "usage: $(basename "$0") {api-key} {zip-file}"
        exit 1
    fi

    pkg_fmt="https://datahub.deepgreen.org/FilesAndJATS"

    # Exactly one article XML (recognised by its DOCTYPE) is expected per package.
    has_xml=$(zipgrep "DOCTYPE article" "${zip_file}" | wc -l)
    if [ ${has_xml} -eq 1 ]; then
        # Determine the metadata schema from the DOCTYPE declaration.
        is_jrnl=$(zipgrep "//NLM//DTD Journal " "${zip_file}" | wc -l)
        is_jats=$(zipgrep "//NLM//DTD JATS " "${zip_file}" | wc -l)
        is_rsc=$(zipgrep "//RSC//DTD RSC " "${zip_file}" | wc -l)
        if [ ${is_jrnl} -eq 1 ]; then
            pkg_fmt="https://datahub.deepgreen.org/FilesAndJATS"
        elif [ ${is_jats} -eq 1 ]; then
            pkg_fmt="https://datahub.deepgreen.org/FilesAndJATS"
        elif [ ${is_rsc} -eq 1 ]; then
            pkg_fmt="https://datahub.deepgreen.org/FilesAndRSC"
        else
            echo "error: no valid .xml (JATS or RSC) in zip archive found: stop."
            exit 2
        fi
    else
        echo "error: no valid (or too many?!) .xml (JATS xor RSC) in zip archive found: stop."
        exit 3
    fi

    echo "$(basename "$0"): packaging format in zip archive found:"
    echo "$(basename "$0"): ${pkg_fmt}"

    # Note: -k disables TLS certificate verification.
    # The metadata JSON is read from standard input (the here-document).
    curl -i -k -s -XPOST "${host_url}/api/v1/validate?api_key=${api_key}" \
         -F "content=@${zip_file};type=application/zip" \
         -F "metadata=@-;type=application/json" <<EOF
    {
      "content" : {
        "packaging_format" : "${pkg_fmt}"
      }
    }
    EOF
    echo
bash script for data transmission
    #!/usr/bin/env bash
    # Deliver a zip package to DeepGreen.
    host_url="https://www.oa-deepgreen.de"

    if [ $# -ge 2 ]; then
        api_key=$1
        zip_file=$2
    else
        echo "usage: $(basename "$0") {api-key} {zip-file}"
        exit 1
    fi

    pkg_fmt="https://datahub.deepgreen.org/FilesAndJATS"

    # Exactly one article XML (recognised by its DOCTYPE) is expected per package.
    has_xml=$(zipgrep "DOCTYPE article" "${zip_file}" | wc -l)
    if [ ${has_xml} -eq 1 ]; then
        # Determine the metadata schema from the DOCTYPE declaration.
        is_jrnl=$(zipgrep "//NLM//DTD Journal " "${zip_file}" | wc -l)
        is_jats=$(zipgrep "//NLM//DTD JATS " "${zip_file}" | wc -l)
        is_rsc=$(zipgrep "//RSC//DTD RSC " "${zip_file}" | wc -l)
        if [ ${is_jrnl} -eq 1 ]; then
            pkg_fmt="https://datahub.deepgreen.org/FilesAndJATS"
        elif [ ${is_jats} -eq 1 ]; then
            pkg_fmt="https://datahub.deepgreen.org/FilesAndJATS"
        elif [ ${is_rsc} -eq 1 ]; then
            pkg_fmt="https://datahub.deepgreen.org/FilesAndRSC"
        else
            echo "error: no valid .xml (JATS or RSC) in zip archive found: stop."
            exit 2
        fi
    else
        echo "error: no valid (or too many?!) .xml (JATS xor RSC) in zip archive found: stop."
        exit 3
    fi

    echo "$(basename "$0"): packaging format in zip archive found:"
    echo "$(basename "$0"): ${pkg_fmt}"

    # Note: -k disables TLS certificate verification.
    # The metadata JSON is read from standard input (the here-document).
    curl -i -k -s -XPOST "${host_url}/api/v1/notification?api_key=${api_key}" \
         -F "content=@${zip_file};type=application/zip" \
         -F "metadata=@-;type=application/json" <<EOF
    {
      "content" : {
        "packaging_format" : "${pkg_fmt}"
      }
    }
    EOF
    echo
Depending on whether a package is validated or actually delivered, DeepGreen returns different responses. For simplicity, the two functions are documented here only with their typical use cases. It is of course also possible to validate a complete data package with metadata and full text(s), or to deliver only metadata. Note that each data package may contain only one article (!). The return values are explained in detail below:
Return values for validation (example: deliver metadata only)
For the verification of a data delivery, it makes perfect sense to send only the metadata of the delivery. For this purpose, the json format “Incoming Notification JSON” is used to describe the metadata to be checked. Deliveries bundled with binary data can of course also be checked. This works exactly as indicated in the notification example below.
Specification of metadata by “Incoming Notification JSON” (internal DeepGreen format)
    POST /validate?api_key=<api_key>
    Content-Type: application/json

    [Incoming Notification JSON]
-
http header return values
Code
Description
204 No Content
Data package ok!
400 Bad Request
HTTP 1.1 400 Bad Request Content-Type: application/json { "error" : "<comprehensible(!) error message (in english)>" }
401 Unauthorised
e.g. invalid api_key, wrong user type
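A script that calls the validation resource can branch on the three status codes listed in this table. The following is a minimal sketch; the function name explain_validate_status and the wording of its messages are hypothetical, and codes not listed in the table are merely reported as undocumented.

```shell
# Hypothetical helper: map the HTTP status code of POST /validate to a message.
# The status code could, for example, be captured with
#   curl -s -o /dev/null -w '%{http_code}' ...
explain_validate_status() {
    case "$1" in
        204) echo "data package ok" ;;
        400) echo "bad request: see the error field of the JSON body" ;;
        401) echo "unauthorised: check api_key and user type" ;;
        *)   echo "status $1 is not described in this documentation" ;;
    esac
}

explain_validate_status 204
```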
So much for the HTTP return values of the validation function. The return values of the notification function are described next.
Return values for notification (example: delivering metadata including full text(s))
Article deliveries that are to contain binary content (e.g. the full text as a pdf) are bundled by specifying “multipart/form-data” in the HTTP header. The field content.packaging_format must be present in the json of the metadata part (“Incoming Notification JSON”).
HTTP POST: bundling through “Content-Type: multipart/form-data; ...”
    POST /notification?api_key=<api_key>
    Content-Type: multipart/form-data; boundary=FulltextBoundary

    --FulltextBoundary
    Content-Disposition: form-data; name="metadata"
    Content-Type: application/json

    [Incoming Notification JSON]

    --FulltextBoundary
    Content-Disposition: form-data; name="content"
    Content-Type: application/zip

    [Package]
    --FulltextBoundary--
Minimal specification of the metadata by “packaging_format”
    POST /notification?api_key=<api_key>
    Content-Type: multipart/form-data; boundary=FulltextBoundary

    --FulltextBoundary
    Content-Disposition: form-data; name="metadata"
    Content-Type: application/json

    {
      "content" : {
        "packaging_format" : "https://datahub.deepgreen.org/FilesAndJATS"
      }
    }

    --FulltextBoundary
    Content-Disposition: form-data; name="content"
    Content-Type: application/zip

    [Package]
    --FulltextBoundary--
-
http-Header return values:
Code
Description
202 Accepted
HTTP 1.1 202 Accepted Content-Type: application/json Location: <URL of the api endpoint of the accepted delivery> { "status" : "accepted", "id" : "<unique ID of this new notification>", "location" : "<URL of the api endpoint of this notification>" }
400 Bad Request
HTTP 1.1 400 Bad Request Content-Type: application/json { "error" : "<comprehensible (!) error message (in english)>" }
401 Unauthorised
e.g. invalid api_key, wrong user type
Receive article data from DeepGreen
There are several ways to query and list DeepGreen’s data collections. They are selected by different calls and by certain parameters. The following parameters are available for the HTTP addresses:
Possible parameters for GET functions of DeepGreen
    GET <http address/...>?since=<YYYY-MM-DD>
    #
    # Required; determines the date from which notifications are listed
    #
    GET <http address/...>?pageSize=<number>
    #
    # Optional, preset to 25, maximum 100; determines how many records are returned at once
    #
    GET <http address/...>?page=<number>
    #
    # Optional, defaults to 1; determines which page of the results is returned
    #
    GET <http address/...>?api_key=<api_key>
    #
    # Optional; used to authenticate the request (e.g. for obtaining full texts)
    #
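The parameter rules above (since is required, pageSize defaults to 25 and may not exceed 100, page defaults to 1, api_key is optional) can be sketched as a small URL builder. The function name routed_url is hypothetical, the lower bound of 1 for pageSize is an assumption, and the function only prints the address without sending a request.

```shell
# Hypothetical helper: build the address for GET /routed.
# Usage: routed_url SINCE [PAGE_SIZE] [PAGE] [API_KEY]
routed_url() (
    since=$1
    page_size=${2:-25}   # documented default
    page=${3:-1}         # documented default
    api_key=${4:-}

    # "since" is required and must look like YYYY-MM-DD.
    case "$since" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "since must be given as YYYY-MM-DD" >&2; exit 1 ;;
    esac

    # pageSize must be a number; the maximum is 100 (lower bound 1 is assumed).
    case "$page_size" in
        ''|*[!0-9]*) echo "pageSize must be a number" >&2; exit 1 ;;
    esac
    if [ "$page_size" -lt 1 ] || [ "$page_size" -gt 100 ]; then
        echo "pageSize must be between 1 and 100" >&2
        exit 1
    fi

    url="https://www.oa-deepgreen.de/api/v1/routed?since=${since}&pageSize=${page_size}&page=${page}"
    if [ -n "$api_key" ]; then
        url="${url}&api_key=${api_key}"
    fi
    echo "$url"
)

routed_url 2024-01-01 50 2
```

Rejecting invalid values locally keeps obviously malformed requests from being sent at all.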
Several parameters can be combined with the & sign.
The notification lists of delivered articles provided by DeepGreen within a certain time window (typically three months) are JSON lists of the DeepGreen-specific schema “Outgoing Notification”.
Full JSON schema for Outgoing Notification (sorted alphabetically by key)
    {
      "analysis_date": "2016-08-09T14:22:11Z",
      "content": { "packaging_format": "string" },
      "created_date": "2016-08-09T14:22:11Z",
      "embargo": { "duration": 0 },
      "id": "string",
      "issn_data": "string",
      "links": [
        {
          "format": "string",
          "packaging": "string",
          "type": "string",
          "url": "string"
        }
      ],
      "metadata": {
        "author": [
          {
            "affiliation": "string",
            "firstname": "string",
            "identifier": [ { "id": "string", "type": "string" } ],
            "lastname": "string",
            "name": "string"
          }
        ],
        "date_accepted": "2016-08-09T14:22:11Z",
        "date_submitted": "2016-08-09T14:22:11Z",
        "fpage": "string",
        "identifier": [ { "id": "string", "type": "string" } ],
        "issue": "string",
        "journal": "string",
        "license_ref": {
          "title": "string",
          "type": "string",
          "url": "string",
          "version": "string"
        },
        "lpage": "string",
        "project": [
          {
            "grant_number": "string",
            "identifier": [ { "id": "string", "type": "string" } ],
            "name": "string"
          }
        ],
        "publication_date": "2016-08-09T14:22:11Z",
        "publisher": "string",
        "source": {
          "identifier": [ { "id": "string", "type": "string" } ],
          "name": "string"
        },
        "subject": [ "string" ],
        "title": "string",
        "volume": "string"
      }
    }
The following table gives a concise description for all fields mentioned in the schema. All fields are optional and it should be noted that neither the schema nor the field descriptions in the table may be complete. In a further version of DeepGreen, additional fields can be added or individual fields can be deleted.
-
Field description for outgoing notifications (in alphabetical order)
Field identifier
Description
analysis_date
Time of delivery analysis
content.packaging_format
Format of the associated binary package (usually .zip)
created_date
Time when the notification was created
embargo.duration
Embargo period (stated in months)
id
Persistent System ID for this notification
links.format
MIME type of the source
links.packaging
Package format of the source
links.type
Keyword for the type of source (e.g. package)
links.url
URL for the source (publisher- or system-side)
metadata.author.affiliation
Affiliation information, retrieved from the metadata
metadata.author.firstname
First name of an author
metadata.author.lastname
Surname of an author
metadata.author.name
Composite name of an author/originator
metadata.date_accepted
Date when the publication was accepted
metadata.date_submitted
Date when the publication was submitted
metadata.fpage
First page number of the publication
metadata.identifier.id
ID for the publication (e.g. DOI)
metadata.identifier.type
ID type (e.g. “doi”; however, no vocabulary)
metadata.issue
Issue count (journal issue)
metadata.journal
Name of the journal of the publication (journal title)
metadata.license_ref.title
Name of a licence (free text)
metadata.license_ref.type
Licence type (free text)
metadata.license_ref.url
URL leading to further licence information
metadata.license_ref.version
Version of a licence
metadata.lpage
Last page number of the publication
metadata.project.grant_number
Funding abbreviation, if indicated in the metadata
metadata.project.identifier.id
ID of a grant (e.g. Ringold)
metadata.project.identifier.type
ID type in relation to the funding ID
metadata.project.name
Name of a sponsor / a funding institution
metadata.publication_date
Publication date
metadata.publisher
Publisher name or publishing house
metadata.source.identifier.id
ID of the publication source (e.g. ISSN)
metadata.source.identifier.type
ID type of the publication source
metadata.source.name
Name of the publication source (journal title)
metadata.subject
Keyword
metadata.title
Publication title
metadata.volume
Volume Count (Journal Volume)
List of all successfully delivered articles
List of all successfully delivered notifications
    GET /routed?since=<YYYY-MM-DD>[&<other params>]
-
http-Header return values
Code
Description
200 OK
HTTP 1.1 200 OK Content-Type: application/json { "since" : "<Date YYYY-MM-DDThh:mm:ssZ>", "page" : "<Page number of results>", "pageSize" : "<Number of results per page>", "timestamp" : "<Time stamp of query>", "total" : "<Number of results at this time>", "notifications" : [ "<List of 'Outgoing Notification'-JSON objects>" ] }
400 Bad Request
HTTP 1.1 400 Bad Request Content-Type: application/json { "error" : "<comprehensible (!) error message (in english)>" }
List of articles delivered to an institution
List of notifications delivered to one institution
    GET /routed/<repo_id>?since=<YYYY-MM-DD>[&<other params>]
-
http-Header return values
Code
Description
200 OK
HTTP 1.1 200 OK Content-Type: application/json { "since" : "<Date YYYY-MM-DDThh:mm:ssZ>", "page" : "<Page number of results>", "pageSize" : "<Number of results per page>", "timestamp" : "<timestamp of query>", "total" : "<Number of results at this time>", "notifications" : [ "<List of 'Outgoing Notification'-JSON objects>" ] }
400 Bad Request
HTTP 1.1 400 Bad Request Content-Type: application/json { "error" : "<comprehensible(!) error message (in english)>" }
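Because the response reports total, page and pageSize, a client can work out how many requests are needed to fetch a complete list. The sketch below shows the arithmetic only; the function name page_count, the example total of 60 and the date are assumptions, and the loop merely prints the short-form requests.

```shell
# Hypothetical helper: number of pages needed for "total" results.
# Usage: page_count TOTAL [PAGE_SIZE]
page_count() {
    total=$1
    page_size=${2:-25}
    # Integer ceiling of total / page_size.
    echo $(( (total + page_size - 1) / page_size ))
}

# Example: 60 results at the default page size of 25 need 3 requests.
pages=$(page_count 60 25)
page=1
while [ "$page" -le "$pages" ]; do
    echo "GET /routed/<repo_id>?since=2024-01-01&pageSize=25&page=${page}"
    page=$((page + 1))
done
```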
Query of a specific article
Each notification for an article that was successfully delivered by DeepGreen is marked with a unique ID, the so-called notification ID. This ID can be found in the output lists of the previous queries. The following query returns the individual JSON record of a specific notification ID.
Query of a specific notification, delivered in “Outgoing Notification JSON” (internal DeepGreen format)
    GET /notification/<notification_id>
-
http-Header return values
Code
Description
200 OK
HTTP 1.1 200 OK Content-Type: application/json [Outgoing Notification JSON]
404 Not Found
e.g. notification_id does not exist, or the notification could not (yet) be delivered and you are not the author (i.e. publisher) of the notification (authenticated via api_key)
Retrieval of the binary data of a notification
A special links field in the Outgoing Notification JSON, if set, can be used to obtain the corresponding full text, provided one is authorised to do so. Assuming that the notification contains the JSON section
    "links" : [
      {
        "type" : "package",
        "format" : "application/zip",
        "url" : "https://www.oa-deepgreen.de/api/v1/notification/123456789/content",
        "packaging" : "https://datahub.deepgreen.org/FilesAndJATS"
      },
      {
        "type" : "package",
        "format" : "application/zip",
        "url" : "https://www.oa-deepgreen.de/api/v1/notification/123456789/content/SimpleZip",
        "packaging" : "http://purl.org/net/sword/package/SimpleZip"
      }
    ]
then the full text is retrieved as follows:
Query the binary data of a data package, usually a .zip file with the full text(s).
    GET <links url>?api_key=<api_key>
-
http-Header return values
Code
Description
200 OK
HTTP 1.1 200 OK Content-Type: application/zip [(binary) package]
401 Unauthorised
e.g. invalid api_key, wrong user type or the corresponding notification has not (yet) been delivered
404 Not Found
There is no content at the given URL or the URL simply does not exist.
Query and update match-config settings for repositories
To query the current affiliation and other matching settings of a repository account, a valid api_key is of course required:
Query the current matching criteria
    GET /config?api_key=<api_key>
-
http-Header return values
Code
Description
200 OK
HTTP 1.1 200 OK Content-Type: application/json [matching criteria JSON]
401 Unauthorised
e.g. invalid api_key
404 Not Found
There is no content under the specified URL or the URL simply does not exist (GET /config/)
Update the matching criteria of a repository account
To load new affiliation and matching information for a repository into the associated account, you need (as usual) a valid api_key:
Overwrite the matching criteria with new values
    POST /config?api_key=<api_key>
    Content-Type: application/json; charset=utf-8

    [Overwriting new match config settings JSON]
-
http-Header return values
Code
Description
200 OK
HTTP 1.1 200 OK Content-Length: 0
401 Unauthorised
e.g. invalid api_key
400 Bad Request
A syntax error has been detected in the JSON format that was uploaded (e.g. a missing comma in enumerations).
Example of a correct match-config file in internal JSON format
    {
      "name_variants": [
        "Academia Fridericiana Erlangensis",
        "Academia Friderico Alexandrina Erlangen-Nürnberg",
        "Academia Friderico-Alexandrina",
        "Academia Regia Bavarica Friderico-Alexandrina",
        "Academia Regia Friderico-Alexandrina",
        "Bayerische Friedrich-Alexanders-Universität",
        "F.A.U. Erlangen-Nürnberg",
        "University of Erlangen"
      ],
      "grants": [ "2491691", "2673762", "6273863" ],
      "domains": [ "fau.de", "uk-erlangen.de", "uni-erlangen.de" ],
      "keywords": [ "research", "bavarian", "erlangen" ]
    }
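Since a syntax error such as a missing comma leads to 400 Bad Request, it can help to generate the lists instead of typing them by hand. The following is a minimal sketch that turns one entry per line into a JSON array. The function name json_array is hypothetical; only backslashes and double quotes are escaped, so entries containing control characters are not handled.

```shell
# Hypothetical helper: read one entry per line on stdin, print a JSON array.
json_array() {
    printf '['
    # Escape backslashes and double quotes, then emit comma-separated strings.
    sed 's/\\/\\\\/g; s/"/\\"/g' | {
        sep=''
        while IFS= read -r entry || [ -n "$entry" ]; do
            if [ -n "$entry" ]; then
                printf '%s "%s"' "$sep" "$entry"
                sep=','
            fi
        done
    }
    printf ' ]\n'
}

# Compose a match-config document from two lists (example values taken
# from the file shown above).
printf '{ "name_variants": %s, "domains": %s }\n' \
    "$(printf '%s\n' "University of Erlangen" "Academia Friderico-Alexandrina" | json_array)" \
    "$(printf '%s\n' "fau.de" "uni-erlangen.de" | json_array)"
```

Because the upload overwrites the existing settings, a real file would presumably list every section that should remain in effect, not just the two shown here.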
-
-
Another interface that is particularly suitable for metadata harvesting is DeepGreen's OAI-PMH compliant query option (see Open Archives Initiative (in German)). This type of metadata query can be addressed via the web address https://www.oa-deepgreen.de/oaipmh/, e.g. with the command curl in a shell (bash, tcsh, etc.):
    $ curl -s https://www.oa-deepgreen.de/oaipmh/...
Here, the ... is replaced by one of the two possible call variants, all or repo, of the DeepGreen OAI-PMH interface. As before, GET below always stands for the complete curl line given above:
-
Call: GET /all
List of all successfully delivered notifications
    GET /all[?<params (i.e. oai_verb + params)>]
-
Call: GET /repo
List of notifications delivered to one institution
    GET /repo/<repo_id>[?<params (i.e. oai_verb + params)>]
As an illustrative example, the identification of the DeepGreen OAI-PMH interface is given:
    $ curl -k -s https://www.oa-deepgreen.de/oaipmh/all?verb=Identify | xml_pp
    <?xml version="1.0" encoding="UTF-8"?>
    <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
      <responseDate>2018-08-17T10:47:33Z</responseDate>
      <request verb="Identify">http://www.oa-deepgreen.de/oaipmh/all</request>
      <Identify>
        <repositoryName>DeepGreen Prototype OAI-PMH Endpoint</repositoryName>
        <baseURL>http://www.oa-deepgreen.de/oaipmh/all</baseURL>
        <protocolVersion>2.0</protocolVersion>
        <adminEmail>***</adminEmail>
        <earliestDatestamp>2018-05-19T08:47:33Z</earliestDatestamp>
        <deletedRecord>transient</deletedRecord>
        <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
      </Identify>
    </OAI-PMH>
Please note that the earliest possible timestamp (earliestDatestamp) is always approximately three months before the call date (responseDate), as DeepGreen only keeps article data available for that long (moving wall principle).
Metadata received from DeepGreen
It should be expressly pointed out once again that DeepGreen’s OAI-PMH interface only offers metadata. The delivery of full texts via OAI-PMH is not envisaged (and will not be in the future!).
For the sake of clarity, the most common OAI verbs supported by DeepGreen are listed below:
Possible parameters for OAI verbs from DeepGreen
    GET <http address/...>?verb=Identify
    #
    # This OAI verb has no further parameters
    #
    GET <http address/...>?verb=[ListIdentifiers|ListRecords]&from=<YYYY-MM-DD>
    #
    # Optional; determines the date from which notifications are listed
    #
    GET <http address/...>?verb=[ListIdentifiers|ListRecords]&until=<YYYY-MM-DD>
    #
    # Optional; specifies a (maximum) end date until which notifications are listed
    #
    GET <http address/...>?verb=GetRecord&identifier=<oai_id>
    #
    # Required; specifies the particular record to be fetched for this OAI verb
    #
For all OAI verbs that return one or more records, the parameter ...&metadataPrefix=oai_dc must also be appended. Where it makes sense (or is mandatory) for an OAI verb, several parameters are combined with the & sign.
The HTTP header return value is always 200 OK for all OAI requests (according to the OAI-PMH specification):
-
HTTP header return values
Code
Description
200 OK
HTTP 1.1 200 OK Content-Type: text/xml; charset=utf-8 [XML response to OAI-PMH request; either content (e.g. list of records) or error message]
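Because the HTTP status does not distinguish success from failure here, a harvesting script has to look into the XML body. In OAI-PMH, errors are reported as error elements carrying a code attribute (for example badVerb or noRecordsMatch). The sketch below checks a response for such elements; the function name oai_check is hypothetical.

```shell
# Hypothetical helper: read an OAI-PMH response on stdin, print any error
# codes to stderr and return 1 if at least one was found, 0 otherwise.
oai_check() (
    # Put every tag on its own line, then pick out <error code="...">.
    codes=$(tr '>' '\n' | sed -n 's/.*<error code="\([^"]*\)".*/\1/p')
    if [ -n "$codes" ]; then
        echo "$codes" >&2
        exit 1
    fi
    exit 0
)

# Example with a literal response; in practice the input would come from curl.
printf '%s\n' '<OAI-PMH><ListIdentifiers></ListIdentifiers></OAI-PMH>' | oai_check \
    && echo "no OAI-PMH error reported"
```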
Examples for the use of the OAI-PMH interface
Exemplary result of an OAI-PMH call (with fictitious account number 1234567890; the notification IDs are masked with asterisks):
    #
    # ListIdentifiers
    #
    $ curl -k -s "https://www.oa-deepgreen.de/oaipmh/repo/1234567890?verb=ListIdentifiers&metadataPrefix=oai_dc" | xml_pp
    <?xml version="1.0" encoding="UTF-8"?>
    <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
      <responseDate>2018-08-20T12:03:31Z</responseDate>
      <request metadataPrefix="oai_dc" verb="ListIdentifiers">http://www.oa-deepgreen.de/oaipmh/repo/1234567890</request>
      <ListIdentifiers>
        <header xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
          <identifier>oai:www.oa-deepgreen.de/notification:*****</identifier>
          <datestamp>2018-08-03T07:56:26Z</datestamp>
        </header>
        <header xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
          <identifier>oai:www.oa-deepgreen.de/notification:*****</identifier>
          <datestamp>2018-07-04T12:41:44Z</datestamp>
        </header>
        <header xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
          <identifier>oai:www.oa-deepgreen.de/notification:*****</identifier>
          <datestamp>2018-06-18T13:39:04Z</datestamp>
        </header>
      </ListIdentifiers>
    </OAI-PMH>
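The notification IDs in such a response can be extracted with standard text tools and then used with the web API call GET /notification/<notification_id>. The following is a minimal sketch; the function name notification_ids is hypothetical, and the IDs abc123 and def456 are made-up placeholders, not real notification IDs.

```shell
# Hypothetical helper: print the notification IDs contained in an OAI-PMH
# response read from stdin, one ID per line.
notification_ids() {
    # Put every tag on its own line, then keep what follows the fixed
    # identifier prefix up to the closing </identifier tag.
    tr '>' '\n' |
        sed -n 's|.*oai:www\.oa-deepgreen\.de/notification:\([^<]*\)</identifier$|\1|p'
}

# Example with a shortened literal response.
sample='<ListIdentifiers><header><identifier>oai:www.oa-deepgreen.de/notification:abc123</identifier></header><header><identifier>oai:www.oa-deepgreen.de/notification:def456</identifier></header></ListIdentifiers>'
printf '%s\n' "$sample" | notification_ids | while IFS= read -r id; do
    echo "GET /notification/${id}"
done
```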
Another example with concrete data sets. Using the notification IDs (marked with asterisks here), authorised repositories can obtain the articles via DeepGreen’s web API, if necessary:

    #
    # ListRecords
    #
    $ curl -k -s "https://www.oa-deepgreen.de/oaipmh/repo/1234567890?verb=ListRecords&metadataPrefix=oai_dc" | xml_pp
    <?xml version="1.0" encoding="UTF-8"?>
    <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
      <responseDate>2018-08-20T12:24:10Z</responseDate>
      <request metadataPrefix="oai_dc" verb="ListRecords">http://www.oa-deepgreen.de/oaipmh/repo/1234567890</request>
      <ListRecords>
        <record>
          <header xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
            <identifier>oai:www.oa-deepgreen.de/notification:*****</identifier>
            <datestamp>2018-08-10T08:29:00Z</datestamp>
          </header>
          <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
            <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
              <dc:title>Healthcare IT Utilization and Penetration among Physicians: Novel IT Solutions in Healthcare – Use and Acceptance in Hospitals</dc:title>
              <dc:publisher>S. Karger AG</dc:publisher>
              <dc:identifier>issn:0014-312X</dc:identifier>
              <dc:identifier>issn:1421-9921</dc:identifier>
              <dc:identifier>doi:10.1159/000490241</dc:identifier>
              <dc:creator>Ferdinand Vogt</dc:creator>
              <dc:creator>Fritz Seidl</dc:creator>
              <dc:creator>Giuseppe Santarpino</dc:creator>
              <dc:creator>Martijn van Griensven</dc:creator>
              <dc:creator>Martin Emmert</dc:creator>
              <dc:creator>Guenther Edenharter</dc:creator>
              <dc:creator>Dominik Pförringer</dc:creator>
              <dc:contributor>eDepartment of Anaesthesiology, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany</dc:contributor>
              <dc:contributor>bDepartment of Trauma Surgery, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany</dc:contributor>
              <dc:contributor>bDepartment of Trauma Surgery, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany; cDepartment of Experimental Trauma Surgery, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany</dc:contributor>
              <dc:contributor>aDepartment of Cardiac Surgery, Klinikum Nürnberg, Paracelsus Medical University, Nuremberg, Germany</dc:contributor>
              <dc:contributor>cDepartment of Experimental Trauma Surgery, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany</dc:contributor>
              <dc:contributor>dInstitute of Management (IFM), School of Business and Economics, Friedrich Alexander University Erlangen-Nuremberg, Nuremberg, Germany</dc:contributor>
              <dc:date>2018-07-26T00:00:00Z</dc:date>
              <dc:rights>Alliance License DFG</dc:rights>
              <dc:subject>Original Paper</dc:subject>
              <dc:subject>Health monitoring</dc:subject>
              <dc:subject>Information technology</dc:subject>
              <dc:subject>Physicians</dc:subject>
              <dc:subject>Demand</dc:subject>
              <dc:subject>Outlook</dc:subject>
              <dc:subject>Expectations</dc:subject>
              <dc:subject>Data storage</dc:subject>
              <dc:subject>Healthcare IT</dc:subject>
            </oai_dc:dc>
          </metadata>
        </record>
        <record>
          <header xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
            <identifier>oai:www.oa-deepgreen.de/notification:*****</identifier>
            <datestamp>2018-08-10T08:34:06Z</datestamp>
          </header>
          <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
            <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
              <dc:title>Gastrointestinal Stromal Tumors: Clinical Symptoms, Location, Metastasis Formation, and Associated Malignancies in a Single Center Retrospective Study</dc:title>
              <dc:publisher>S. Karger AG</dc:publisher>
              <dc:identifier>issn:0257-2753</dc:identifier>
              <dc:identifier>issn:1421-9875</dc:identifier>
              <dc:identifier>doi:10.1159/000489556</dc:identifier>
              <dc:creator>Ali Aghdassi</dc:creator>
              <dc:creator>Agnes Christoph</dc:creator>
              <dc:creator>Frank Dombrowski</dc:creator>
              <dc:creator>Paula Döring</dc:creator>
              <dc:creator>Christoph Barth</dc:creator>
              <dc:creator>Jan Christoph</dc:creator>
              <dc:creator>Markus M. Lerch</dc:creator>
              <dc:creator>Peter Simon</dc:creator>
              <dc:contributor>aDepartment of Medicine A, University Medicine Greifswald, Greifswald, Germany</dc:contributor>
              <dc:contributor>dChair of Medical Informatics, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany</dc:contributor>
              <dc:contributor>cGastroenterologische Praxis, Kempten, Germany</dc:contributor>
              <dc:contributor>bInstitute of Pathology, University Medicine Greifswald, Greifswald, Germany</dc:contributor>
              <dc:date>2018-06-05T00:00:00Z</dc:date>
              <dc:rights>Alliance License DFG</dc:rights>
              <dc:subject>Stomach and Duodenum: Original Paper</dc:subject>
              <dc:subject>Gastrointestinal stromal tumor</dc:subject>
              <dc:subject>Gastrointestinal oncology</dc:subject>
              <dc:subject>Gastrointestinal symptoms</dc:subject>
              <dc:subject>Gastrointestinal tract</dc:subject>
              <dc:subject>Metastasis</dc:subject>
              <dc:subject>Recurrence</dc:subject>
            </oai_dc:dc>
          </metadata>
        </record>
      </ListRecords>
    </OAI-PMH>
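A repository that harvests via OAI-PMH usually needs the notification ID and the DOI of each record. The following Python sketch shows one way to pull them out of a ListRecords response using only the standard library. The embedded sample is a shortened, hypothetical response (the notification ID `abc123` is invented), and the function name `parse_records` is an assumption of this sketch, not part of DeepGreen.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Shortened, hypothetical ListRecords response for illustration only.
SAMPLE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:www.oa-deepgreen.de/notification:abc123</identifier>
        <datestamp>2018-08-10T08:29:00Z</datestamp>
      </header>
      <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
                xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
        <oai_dc:dc>
          <dc:title>Example title</dc:title>
          <dc:identifier>issn:0014-312X</dc:identifier>
          <dc:identifier>doi:10.1159/000490241</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Return one dict per record with notification ID, title and DOIs."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        oai_id = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dois = [
            el.text[len("doi:"):]
            for el in rec.iterfind(".//dc:identifier", NS)
            if el.text and el.text.startswith("doi:")
        ]
        records.append({
            # The notification ID is the part after the last colon.
            "notification_id": oai_id.rsplit(":", 1)[-1],
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "dois": dois,
        })
    return records


for record in parse_records(SAMPLE):
    print(record)
```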
-
-
The default installation of a SWORDv2 DSpace instance understands three main formats according to the <acceptPackaging> tag of the service document: METSDSpaceSIP, Binary and SimpleZip. Depending on the specified format (packaging), the package to be uploaded must be provided accordingly for a successful deposit.
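To see which packaging formats a given collection announces, the service document can be inspected. The following Python sketch reads the <acceptPackaging> entries per collection. It assumes the usual SWORDv2 namespace URI `http://purl.org/net/sword/terms/`; the service document shown is a hypothetical, shortened sample, and the helper name `accepted_packagings` is invented for this illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "app": "http://www.w3.org/2007/app",
    "sword": "http://purl.org/net/sword/terms/",  # assumed SWORDv2 terms namespace
}
METSMODS = "http://purl.org/net/sword/package/METSMODS"

# Hypothetical, shortened service document of a default DSpace installation.
SAMPLE_SERVICE_DOCUMENT = """\
<service xmlns="http://www.w3.org/2007/app"
         xmlns:sword="http://purl.org/net/sword/terms/">
  <workspace>
    <collection href="https://repository.example.org/swordv2/collection/1">
      <sword:acceptPackaging>http://purl.org/net/sword/package/METSDSpaceSIP</sword:acceptPackaging>
      <sword:acceptPackaging>http://purl.org/net/sword/package/Binary</sword:acceptPackaging>
      <sword:acceptPackaging>http://purl.org/net/sword/package/SimpleZip</sword:acceptPackaging>
    </collection>
  </workspace>
</service>
"""


def accepted_packagings(service_document):
    """Map each collection href to the list of packaging URIs it accepts."""
    root = ET.fromstring(service_document)
    result = {}
    for coll in root.iterfind(".//app:collection", NS):
        result[coll.get("href")] = [
            el.text.strip()
            for el in coll.iterfind("sword:acceptPackaging", NS)
            if el.text
        ]
    return result


for href, formats in accepted_packagings(SAMPLE_SERVICE_DOCUMENT).items():
    print(href, "announces METSMODS:", METSMODS in formats)
```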
DSpace configuration to receive the METSMODS packages via SWORDv2:
- The SWORDv2 interface must be enabled. See https://wiki.duraspace.org/display/DSDOC5x/SWORDv2+Server.
- In order to be able to read METSMODS files and get as much DeepGreen metadata as possible, an XSLT file has been implemented that performs the mapping from METSMODS metadata to DSpace metadata. The file can be found at: https://github.com/OA-DeepGreen/DSpace/blob/depositonce-6.3x/dspace/config/crosswalks/sword-mods-ingest.xsl.
You must set the following configurations in the “dspace.cfg” or “local.cfg”:
- mets.default.ingest.crosswalk.MODS = MODS
- crosswalk.submission.MODS.stylesheet = crosswalks/sword-mods-ingest.xsl
Important: In the SWORD configuration of your DeepGreen repository account, you must enter http://purl.org/net/sword/package/METSMODS as the packaging preference.
Concordance NISO JATS to METSMODS
NISO JATS (tag) | METSMODS (xpath) | Allocation in METSMODS (example)
n/a | //mods/typeOfResource | text
n/a | //mods/genre | journal article
<article> | //mods/language/languageTerm | EN
<article-title> | //mods/titleInfo/title | A Short Discussion on…

//mods/relatedItem/[@type="host"]/…
NISO JATS (tag) | METSMODS (xpath) | Allocation in METSMODS (example)
<journal-title> | titleInfo/title | Acta numerica
<journal-id> | identifier[@type="publisher-id"] | Act. Num.
<issn> | identifier[@type="eIssn"] | 1234-234X
<issn> | identifier[@type="pIssn"] | 5432-9634
<volume> | part/detail[@type="volume"]/number | 7
<issue> | part/detail[@type="issue"]/number | 10
<fpage> | part/extent[@unit="pages"]/start | 521
<lpage> | part/extent[@unit="pages"]/end | 534

//mods/relatedItem/[@type="host"]/…
NISO JATS (tag) | METSMODS (xpath) | Allocation in METSMODS (example)
<surname> | namePart[@type="family"] | Hoppenstadt
<given-names> | namePart[@type="given"] | Dickie M.
<contrib> | role/roleTerm[@type="text"] | author
<contrib-id> | nameIdentifier[@type="orcid"] | 1111-1010-1111-0101
<aff> | affiliation | Zuse Institut Berlin

NISO JATS (tag) | METSMODS (xpath) | Allocation in METSMODS (example)
<abstract> | //mods/abstract | In this paper we will …
<kwd> | //mods/subject/topic | viscoelasticity

//mods/originInfo
NISO JATS (tag) | METSMODS (xpath) | Allocation in METSMODS (example)
<publisher-name> | publisher | MDPI
<publisher-loc> | place/placeTerm[@type="text"] | Basel
<pub-date> | dateIssued[@encoding="iso8601"] | 2015-02-23
<date> | dateOther[@type="accepted"] | 2015-02-20
<date> | dateOther[@type="received"] | 2015-12-24

NISO JATS (tag) | METSMODS (xpath) | Allocation in METSMODS (example)
<article-id> | //mods/identifier[@type="doi"] | 10.1212sjsdh45723dg
<license> | //mods/accessCondition[@description="uri"] | http:creativecomm.org
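The XPath expressions of the concordance can be tried out directly. The Python sketch below reads a few fields from a MODS fragment using the standard library. It assumes the standard MODS namespace `http://www.loc.gov/mods/v3`; the fragment is a hypothetical sample built from the example values in the table (the DOI is invented), and `FIELDS` and `extract_fields` are names chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"  # assumed: standard MODS v3 namespace
NS = {"m": MODS_NS}

# Field name -> path relative to the <mods> element (taken from the concordance).
FIELDS = {
    "title": "m:titleInfo/m:title",
    "doi": "m:identifier[@type='doi']",
    "journal": "m:relatedItem[@type='host']/m:titleInfo/m:title",
    "eissn": "m:relatedItem[@type='host']/m:identifier[@type='eIssn']",
    "volume": "m:relatedItem[@type='host']/m:part/m:detail[@type='volume']/m:number",
    "first_page": "m:relatedItem[@type='host']/m:part/m:extent[@unit='pages']/m:start",
    "publisher": "m:originInfo/m:publisher",
}

# Hypothetical MODS fragment; in a real package it is wrapped in a METS document.
SAMPLE_MODS = """\
<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>A Short Discussion on Examples</title></titleInfo>
  <identifier type="doi">10.1234/example</identifier>
  <relatedItem type="host">
    <titleInfo><title>Acta numerica</title></titleInfo>
    <identifier type="eIssn">1234-234X</identifier>
    <part>
      <detail type="volume"><number>7</number></detail>
      <extent unit="pages"><start>521</start><end>534</end></extent>
    </part>
  </relatedItem>
  <originInfo><publisher>MDPI</publisher></originInfo>
</mods>
"""


def extract_fields(xml_text):
    """Return the FIELDS values found in the first <mods> element."""
    root = ET.fromstring(xml_text)
    if root.tag == "{%s}mods" % MODS_NS:
        mods = root
    else:
        mods = root.find(".//m:mods", NS)
    if mods is None:
        raise ValueError("no MODS element found")
    return {
        name: mods.findtext(path, default=None, namespaces=NS)
        for name, path in FIELDS.items()
    }


print(extract_fields(SAMPLE_MODS))
```

The article title is read with a path relative to <mods> on purpose: a search for any titleInfo/title below <mods> would also match the journal title inside relatedItem.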
The repository account
To participate in DeepGreen, institutional repositories must be registered with the Electronic Journals Library (EZB) in Regensburg.
If this requirement is met, an account can be applied for at DeepGreen. When a new repository account is allocated, a unique 32-digit ID and an API key are generated. These credentials are required for access to the expected full texts.
This account is used to log in to DeepGreen as a repository operator in order to
- find out your own ID and API key,
- view and download the current article allocations,
- check the current settings of the match-config file (affiliation file), download it or upload a new match-config file if necessary,
- check and adjust the DeepGreen account settings in general (password, SWORD settings for automated item data delivery, if applicable).
The affiliation file
With this special file (a .csv file with exactly six columns: Name Variants, Domains, Grant numbers, Dummy1, Dummy2 and Keywords), an institution specifies the affiliations (name variants, domains and so on) under which it would like to have publications assigned.
Comments:
- The columns Dummy1 and Dummy2 have no function in DeepGreen so far and should therefore always remain empty. These columns are left free for future extensions.
- In any case, exactly five commas must be given per line.
- All columns are read in and interpreted as Unicode strings. For internal processing within DeepGreen, Unicode NFD normalisation is used (cf. https://www.unicode.org/reports/tr15/).
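What NFD normalisation means in practice can be shown with a short Python sketch. It only illustrates the Unicode mechanism using the standard library; whether DeepGreen applies further steps such as case folding is not stated here, and the helper name `nfd` is chosen for this example.

```python
import unicodedata


def nfd(text):
    """Return the NFD (canonical decomposition) form of a string."""
    return unicodedata.normalize("NFD", text)


# The same name typed in two ways: precomposed umlauts (single code points)
# versus base letter plus combining diaeresis (U+0308).
precomposed = "Universit\u00e4t Erlangen-N\u00fcrnberg"
decomposed = "Universita\u0308t Erlangen-Nu\u0308rnberg"

print(precomposed == decomposed)            # the raw strings differ
print(nfd(precomposed) == nfd(decomposed))  # after NFD they compare equal
print(len(precomposed), len(nfd(precomposed)))
```

For the affiliation file this means that a name variant matches regardless of whether the umlauts were entered as single characters or as combining sequences.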
Example of an affiliation file
The first line (header) is unchangeable and must always read as follows:
Name Variants,Domains,Grant numbers,Dummy1,Dummy2,Keywords
Academia Friedericiana Erlangensis,,,,,
Academia Friderico-Alexandrina Erlangen-Nürnberg,,,,,
Academia Friderico-Alexandrina,,,,,
Academia Regia Bavarica Friderico-Alexandrina,,,,,
Academia Regia Friderico-Alexandrina,,,,,
Bayerische Friedrich-Alexanders-Universität,,,,,
F.A.U. Erlangen-Nürnberg,,,,,
FAU Erlangen-Nürnberg,,,,,
Friedrich Alexander University,,,,,
Friedrich-Alexander-Universität Erlangen,,,,,
Friedrich-Alexander-Universität Erlangen-Nürnberg,,,,,
Friedrich-Alexander-Universität zu Erlangen,,,,,
Friedrich-Alexander-University Erlangen,,,,,
Friedrich-Alexanders-Universität,,,,,
Friedrichs-Akademie,,,,,
Königlich-Bayerische Friedrich-Alexanders-Universität,,,,,
Univ. Erlangen-Nürnberg,,,,,
Universidad de Erlangen-Núremberg,,,,,
Universidad de Erlangen-Nürnberg,,,,,
Universitas Literarum Regia Friderico-Alexandrina,,,,,
Università di Erlangen-Nürnberg,,,,,
Universität Erlangen,,,,,
Universität Erlangen-Nürnberg,,,,,
University Erlangen-Nuremberg,,,,,
University of Erlangen-Nuremberg,,,,,
University of Erlangen-Nürnberg,,,,,
,fau.de,,,,
,uk-erlangen.de,,,,
,uni-erlangen.de,,,,
,,123456-563/2,,,
,,99988/365-2,,,
The more comprehensive and precise the information in this file, the more accurately DeepGreen can determine from the publisher metadata of an article whether the publication in question can be assigned to an institution.
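Before uploading, the formal rules for the affiliation file (fixed header, exactly five commas per line, empty Dummy columns) can be checked locally. The following Python sketch is one possible check, not a tool provided by DeepGreen; the function name `check_affiliation_file` and the sample content are assumptions. It counts commas literally and therefore assumes that no field contains a comma.

```python
EXPECTED_HEADER = "Name Variants,Domains,Grant numbers,Dummy1,Dummy2,Keywords"


def check_affiliation_file(text):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    lines = text.splitlines()
    if not lines or lines[0] != EXPECTED_HEADER:
        problems.append("line 1: header differs from the required header")
    for lineno, line in enumerate(lines[1:], start=2):
        columns = line.split(",")
        if len(columns) != 6:
            problems.append(
                "line %d: expected 5 commas, found %d" % (lineno, len(columns) - 1)
            )
            continue
        if columns[3] or columns[4]:
            problems.append("line %d: Dummy1 and Dummy2 must stay empty" % lineno)
        if not any(columns):
            problems.append("line %d: no value given" % lineno)
    return problems


# Hypothetical file content: two valid lines followed by two faulty ones.
sample = "\n".join([
    EXPECTED_HEADER,
    "FAU Erlangen-Nürnberg,,,,,",
    ",fau.de,,,,",
    "Universität Erlangen,,,,",  # only four commas
    ",,,x,,",                    # value in the Dummy1 column
])

for problem in check_affiliation_file(sample):
    print(problem)
```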
Forum
Stay in touch!
Please feel free to use the mailing list “deepgreen-forum@zib.de” to exchange information with other DeepGreen users:
https://listserv.zib.de/mailman/listinfo/deepgreen-forum
Once a password has been set, the message archive can also be viewed and searched.