Proposed Workflow for INSPIRE with holdingpen2
Currently, all HEP ingestions except the arXiv categories that we harvest directly are processed in a standalone workflow at DESY. This functionality needs to be ported to INSPIRE. It relies on holdingpen2 to work on records before the final ingestion into INSPIRE. This is a proposal (from the DESY point of view) for what the INSPIRE workflow could look like.
Updated slides from CERN Meeting Jan 2014
workflow.pdf
To start with, a minimum workflow for arXiv:
No matching / merging - assume all records are new.
For hep* nothing changes.
For all other articles:
- harvest to holdingpen2 incl. fulltext
- run BibClassify (later more, might use references)
- what we harvest now by complete category goes directly to INSPIRE (CC)
- depending on output of BibClassify, ask Curator to select article - pre-fill form depending on info (for the mock-up see below and the full slides)
- ingest selected articles incl. added information to INSPIRE
- create tickets (or other means of workflow) for ingested articles
  - CORE: HEP_curation
  - CC, not hep*, N>0: Assign_CORE&FC
    same GUI as for selection - without selection button
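The routing and ticketing steps above can be sketched roughly as follows. This is only an illustration of the proposal, not existing holdingpen2 code: the function names, the route labels, and the `complete_categories` parameter are hypothetical, "hep*" is assumed to match any listed category starting with `hep`, and `n` stands for the N in the list above without assuming how it is computed.

```python
def is_hep(categories):
    """True if any arXiv category of the record is hep* (assumption: cross-lists count)."""
    return any(c.startswith("hep") for c in categories)


def route(categories, complete_categories):
    """Decide where a harvested arXiv record goes first.

    categories: arXiv categories of the record.
    complete_categories: set of categories harvested completely (CC),
        supplied by the caller; the actual set is not defined here.
    """
    if is_hep(categories):
        return "direct"        # for hep* nothing changes
    if any(c in complete_categories for c in categories):
        return "direct-cc"     # complete category: goes directly to INSPIRE
    return "curator"           # holdingpen2: BibClassify, then curator selection


def ticket_queue(is_core, route_name, n):
    """Pick the ticket queue for an ingested article, or None if no ticket is needed."""
    if is_core:
        return "HEP_curation"
    if route_name == "direct-cc" and n > 0:
        return "Assign_CORE&FC"
    return None
```

A record routed to "curator" is only ingested, and only gets a ticket, after a curator has selected it.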
When this is running DESY can stop harvesting all of arXiv.
The mockups for the GUI / holdingpen action are examples for arXiv and journal.
The layout is almost arbitrary; it would be nice if the action and input fields were on the right side. The select and CORE buttons are to be replaced by 3 buttons: reject/select/CORE.
The numbers shown between the input area and the keywords come from several procedures:
We run BibClassify three times: on the full text to extract CORE keywords, on title/abstract (metadata) for automatic INSPIRE keywords, and with an anti-HEP ontology for Anti-KW. We check how many of the resolvable references are in INSPIRE-HEP and how many of those are CORE papers. We also check how many CORE papers have been written by the authors.
DESY can provide the code to compute these numbers.
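To make the reference and author numbers concrete, here is a minimal sketch of how they could be counted. It is not the DESY code: `HepRecord`, the `lookup` mapping, and both function names are hypothetical, and resolving references or finding the authors' papers is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HepRecord:
    recid: int    # hypothetical INSPIRE-HEP record id
    core: bool    # True if the record is flagged CORE


def reference_counts(references, lookup):
    """Count resolvable references, those found in INSPIRE-HEP, and those that are CORE.

    references: identifiers of the resolvable references of one article.
    lookup: mapping from identifier to HepRecord for references that are in INSPIRE-HEP.
    """
    matched = [lookup[ref] for ref in references if ref in lookup]
    return {
        "resolvable": len(references),
        "in_hep": len(matched),
        "core": sum(1 for rec in matched if rec.core),
    }


def author_core_count(author_records):
    """Count distinct CORE papers among the records written by the article's authors."""
    unique = {rec.recid: rec for rec in author_records}  # drop papers shared by co-authors
    return sum(1 for rec in unique.values() if rec.core)
```

For example, with three resolvable references of which two are in INSPIRE-HEP and one of those is CORE, `reference_counts` returns `{"resolvable": 3, "in_hep": 2, "core": 1}`.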
Some of these numbers should also be displayed in the HP maintable. Can you add color? Green for positive info, red for negative info.
For the holdingpen maintable:
- Title: yes, usually over 2 lines or more
- Identifier: something like Chin.J.Phys. 52 (2014) 707 or arXiv:1403.2174. For arXiv, categories in 2nd line: astro-ph, nucl-ex
- Category: not needed. Either not available yet or part of the identifier
- Created: as long as we can filter by date, we don't need it in the display
- Type, Status: do we need this displayed?
- CORE info: in 2 lines, number of references // number of keywords. Really just bare numbers, possibly with color. E.g.
  10 | 14 | 15
  1 | 1/3 | 0
- Actions: Reject / Accept / CORE
green: existing stuff / red: new stuff
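As an illustration of the two multi-line cells described above (Identifier and CORE info), a small formatting sketch follows. The function names are hypothetical and this is not holdingpen code; the numbers are passed through as given, without assuming what each position means.

```python
def identifier_cell(identifier, arxiv_categories=()):
    """Identifier on the first line; arXiv categories, if any, on the second line."""
    lines = [identifier]
    if arxiv_categories:
        lines.append(", ".join(arxiv_categories))
    return "\n".join(lines)


def core_info_cell(reference_numbers, keyword_numbers):
    """Two lines of bare numbers: references first, keywords second."""
    rows = (reference_numbers, keyword_numbers)
    return "\n".join(" | ".join(str(n) for n in row) for row in rows)
```

Using the examples from the list, `identifier_cell("arXiv:1403.2174", ["astro-ph", "nucl-ex"])` gives the identifier with the categories on a second line, and `core_info_cell([10, 14, 15], [1, "1/3", 0])` gives the two lines shown under CORE info.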
-- KirstenSachs - 09 Apr 2014