SPIRES Collection Policy
This page is an attempt to codify and standardize the collection policy currently followed by SPIRES. This has never really been done before, since each lab has operated somewhat autonomously. In moving forward to INSPIRE, we first need to understand SPIRES policy; then we can consider the CERN collection policy and how it fits with existing SPIRES practices.
SLAC, Fermilab, and DESY should each comment on their views and practices here. CERN folks are of course welcome to comment as well, but we are trying to figure out what SPIRES has been doing (and should be doing) within our framework before we get lost in deciding what INSPIRE should be doing.
Selection rules for HEP
This is an initial starting point for discussion by Annette, edited by Travis.
1. define core of HEP
suggestion: use DESY selection criteria
(also used to define the scope of SCOAP3, see p. 15/16 of the Report of the SCOAP3 Working Party)
To what extent does 2 play a role in 1? I.e., isn't all of PRD core by definition? hep-XX, nucl-XX, gr-qc are defined as core a priori.
hep-xx are completely core; the others are not, according to DESY rules. A rule of thumb (though a bit over-simplified) would be whether an article could be cross-listed to some hep-xx.
2. define criteria for additional publications outside core
- formal (e.g. complete arXiv archives like astro-ph, complete journals like PRD)
- all publications from our own institutions?
- conferences? (JACoW)
- content related ?
- Highly cited (i.e. SLAC adds all papers with 50+ citations)
- This becomes a less useful measure if we add more non-core references. Not necessarily, if we tag core papers and look only at their references. - AH
3. tag for core articles?
- currently DESY tag (not usable by users)
- possibility to narrow searches to core
- statistics for SCOAP3 or any statistics on the HEP publication landscape
- useful for alerts (e.g. all astroparticle papers)
4. transparency for users
- display selection criteria? open to user proposals
- core completeness, patchy coverage outside core
- possibly reduced metadata outside core (e.g. no affiliations, keywords, or references)
5. collaboration rules
- consensus on extension of HEP content (e.g. complete coverage of a journal)
- clear definition of who's doing what
- consensus on shifts of responsibilities
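As a starting point for discussion, the rules in points 1 to 3 could be sketched roughly as below. This is only an illustration, not an existing SPIRES or DESY procedure: the record layout (a dict with "categories", "journal", and "citations"), the function names, and the two configuration sets are all assumptions made up for the example. It follows the rule of thumb from point 1 (an article is core if it is in, or cross-listed to, some hep-xx) and treats the formal and highly-cited criteria from point 2 as reasons for non-core inclusion.

```python
# Hypothetical record layout: a dict with "categories" (arXiv categories),
# "journal", and "citations". None of these names come from SPIRES itself.

FULLY_COVERED_ARXIV = {"astro-ph"}    # example from point 2 (assumed setting)
FULLY_COVERED_JOURNALS = {"PRD"}      # example from point 2 (assumed setting)
HIGHLY_CITED_THRESHOLD = 50           # "50+ citations"


def is_core(record):
    """Rule of thumb from point 1: in, or cross-listed to, some hep-xx."""
    return any(cat.startswith("hep-") for cat in record.get("categories", []))


def classify(record):
    """Return (decision, reason); decision is 'core', 'non-core', or 'skip'."""
    if is_core(record):
        return "core", "hep-xx category"
    categories = record.get("categories", [])
    # "astro-ph.CO" belongs to the "astro-ph" archive, so compare the prefix.
    if any(cat.split(".")[0] in FULLY_COVERED_ARXIV for cat in categories):
        return "non-core", "fully covered arXiv archive"
    if record.get("journal") in FULLY_COVERED_JOURNALS:
        return "non-core", "fully covered journal"
    if record.get("citations", 0) >= HIGHLY_CITED_THRESHOLD:
        return "non-core", "highly cited"
    return "skip", "no rule matched"


print(classify({"categories": ["gr-qc", "hep-th"]}))
print(classify({"categories": ["gr-qc"], "journal": "PRD"}))
```

Returning the reason alongside the decision would make it easier to display the selection criteria to users (point 4) and to produce core-only statistics (point 3).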
Discussion
In points 2 and 4, note that some thought should be given to the following dilemma:
With HTML scraping and OAI harvesting, it is now somewhat harder to do partial (content-driven) harvesting than to harvest entire collections of journals. Adding certain types of metadata may still scale with the number of articles, but bare inclusion in SPIRES does not carry much per-article overhead.
I agree. But we have to select anyway for keywording. So the question is whether we keep everything and tag the important articles, or whether we throw some away. AH
Note as well that the above should change as publishers begin to expose content tagging/categorization that can be used to automatically select a reduced set from a harvest with no effort penalty.
Yes, I expect that automatic preselection will do more and more of the job. An improved bibclassify should some day help as well. AH
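To make the two options above concrete (keep everything and tag, versus throw some away), here is a minimal sketch of automatic preselection on a harvested batch. It assumes each harvested record is a dict whose "categories" field carries whatever subject tags the publisher exposes; that field, the function name, and the "selected_for_keywording" flag are hypothetical and not part of any existing harvesting tool.

```python
def preselect(harvested, wanted_categories, keep_all=True):
    """Flag harvested records for keywording using publisher-supplied tags.

    keep_all=True  keeps every record and only tags the important ones.
    keep_all=False discards records that match none of the wanted categories.
    """
    wanted = set(wanted_categories)
    result = []
    for record in harvested:
        # A record is selected if it shares at least one category with `wanted`.
        selected = bool(wanted & set(record.get("categories", [])))
        if selected or keep_all:
            # Copy the record so the harvested input is left unchanged.
            result.append({**record, "selected_for_keywording": selected})
    return result


batch = [
    {"id": 1, "categories": ["particle physics"]},
    {"id": 2, "categories": ["optics"]},
    {"id": 3},  # the publisher exposed no categories for this record
]
tagged = preselect(batch, ["particle physics"])
reduced = preselect(batch, ["particle physics"], keep_all=False)
```

With keep_all=True nothing is lost and the flag can later drive keywording or a core-only search; with keep_all=False the harvest is reduced up front, and untagged records such as the third one are dropped.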