Mappings to rules for BibCheck
Mappings of the tasks below to a CernBibCheck style language.
;; S16 Conference Information
;; actually should store only code and t; Formats should fetch c,d,f
(check-field-instance ("111" 0 9999 :mandatory-subfields "cdfgt"))
;;S18 S19 Journal References
(check-field-instance("773" :mandatory-subfields "p" :group-optional-subfields "cvy" :inclusive-optional-subfields "nxa"))
(check-field-replace-subfield-content-via-kbr("773" $$p "<journal_kbr>" report ))
;;S20 Dates: should be repeated for all date fields (260c, 269c, 111d,f, 773y, 961c,x) or could use regexp
(check-field-subfield-content ("260" $$c 4 10 :content "date"))
;;S21 pages should be 61 or 61-70 (could conceivably check that end page is larger?)
(check-field-subfield-content-regexp("773" $$c "\\d+\\-?\\d*"))
;; S22 affiliations correct (should we replace with inst code in 100/700i?)
(check-field-replace-subfield-content-via-kbr("100" $$u "<inst kbr>" report ))
(check-field-replace-subfield-content-via-kbr("700" $$u "<inst kbr>" report ))
;; S34 exp codes coll_exp.kbr should contain collaboration names, and the experiment codes that generally match them
(check-field-replace-subfield-content-via-kbrs("693" '("710" $$g "<coll_exp kbr>" match-exactly )) )
(check-field-replace-subfield-content-via-kbr("693" $$e "<exp kbr>" report ))
;;S41/42 Correct published and other collections
(check-field-instance("980" 1 9999 :inclusive-mandatory-subfields "ac"))
;; peer_review.kbr should contain names of journals we consider to be published and '$$aPUBLISHED'
(check-field-replace-subfield-content-via-kbrs("980" '("773" $$p "<peer_review.kbr>" match-exactly )) )
;;need a way to remove published tag?
;;need a way to do rest of FC/TC assignments??
(check-field-subfield-content-via-kba("980" $$a "<collections.kba>") )
;;Now translate inprocs from the HEP filedefn
;; check that arXiv number is properly formatted?? use regexp + $$9 conditional on 037a?? Also other report numbers are standardized via
;;   s/[\=\/\s\-]+/-/g
;; but not arXiv!
;; check that citations (999C5) are properly formatted? 999C5r should look like eprint, 999C5s should have journal vol page? via regexp? worthwhile?
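The report-number standardization quoted above (the s/[\=\/\s\-]+/-/g substitution, skipped for arXiv numbers) could look roughly like this in Python. This is only a sketch: the function name is made up, and recognising arXiv numbers by an "arXiv:" prefix is an assumption, so old-style identifiers without that prefix are not protected here.

```python
import re

# Same character class as the substitution s/[\=\/\s\-]+/-/g.
_SEPARATORS = re.compile(r"[=/\s\-]+")


def standardize_report_number(value):
    """Collapse runs of '=', '/', whitespace and '-' into a single '-'.

    Assumption: arXiv numbers carry an 'arXiv:' prefix (any case) and
    are returned unchanged.
    """
    if value.lower().startswith("arxiv:"):
        return value
    return _SEPARATORS.sub("-", value)
```

For example, standardize_report_number("SLAC PUB/1234") gives "SLAC-PUB-1234", while "arXiv:0801.0697" is left alone.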
Based on what I've found in CernBibCheck I might add the following functions:
- Might be nice to have a way of allowing manual exceptions to any rule? I.e. in metadata add something that causes BibCheck to always pass certain rules
- check-field-instance f-tag sf-code :group-optional-subfields str1: all subfields in str1 must appear if any appear, i.e. they must appear as a group.
- check-field-subfield-content f-tag sf-code :content predicate: add predicate "date"
- check-field-subfield-content-regexp f-tag sf-code regexp action-when-no-match value-to-add
- Testing uniqueness? Handled by control fields?
- Looking up in a knowledge base that is a collection, not a file?
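To pin down what the proposed :group-optional-subfields option would mean (all subfields in the group appear, or none of them do), here is a small Python sketch. The function name and the way subfield codes are passed in are assumptions for illustration, not part of CernBibCheck.

```python
def group_optional_ok(present_codes, group):
    """Return True when the subfields in `group` are all present or all absent.

    present_codes: iterable of one-letter subfield codes found in the field.
    group: string of one-letter codes, e.g. "cvy" for 773 $$c, $$v, $$y.
    """
    members = set(group)
    found = members & set(present_codes)
    # An empty intersection (none present) and a full one (all present) pass.
    return not found or found == members
```

With the S18/S19 rule above, a 773 field holding p, c, v and y passes, p alone passes, but p with only c fails.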
Descriptions of tasks
Brief functional descriptions of the jobs described in the SLAC Automated section of
ComparisonSlacFermilabDesyCernEnrichmentScripts. The numbers here should match that table.
Spirestasks #9 Proceedings information into CONFERENCE
Adding published proceedings information to the CONFERENCE subfile
SPIRES protocols:
conf.check.proceedings
conf.published.proc
Searches through the entire BOOKS subfile for records with the CONF-NUMBER element and
makes a list of all the CONF-NUMBERs.
Searches the CONFERENCE subfile for all records with no PUB-NOTE element and makes
a list of all the CNUMs. Each list is sorted, then the two are compared to find lines
found in both lists: conference numbers found in the BOOKS subfile (proceedings
have been published), but with no publication note in the CONFERENCE file.
This list is retained. The protocol fetches information from the BOOKS subfile for
each conference one by one (book call number, title, editor) in a format ready
to merge into the CONFERENCE record. The information is presented on the screen
to an experienced person for review and possible editing before being merged into the
CONFERENCE subfile record.
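The list comparison at the heart of this task amounts to a set intersection. A minimal Python sketch, assuming both lists have already been exported as plain conference-number strings (the function and argument names are hypothetical):

```python
def conferences_needing_pub_note(book_conf_numbers, conf_numbers_without_pub_note):
    """Return conference numbers that have proceedings in BOOKS but no PUB-NOTE.

    book_conf_numbers: CONF-NUMBER values collected from the BOOKS subfile.
    conf_numbers_without_pub_note: CNUMs of CONFERENCE records lacking PUB-NOTE.
    """
    published = set(book_conf_numbers)
    lacking_note = set(conf_numbers_without_pub_note)
    # Sorted so the reviewer always sees the worklist in a stable order.
    return sorted(published & lacking_note)
```

Each returned conference number would then drive the fetch of call number, title and editor for manual review.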
Spirestasks #16 Add MEETING-NOTE to HEP
Finding HEP records with CNUM and adding MEETING-NOTE if needed
SPIRES protocols:
hep.check.meeting.note
hep.add.meeting.note
Checks records added to HEP (we check last 30 days) which include the element CNUM, but
do not have the element MEETING-NOTE. Checks the RESULT for records with more than one
occurrence of the element CNUM and stores the IRN separately to be updated manually,
since there will be 2 or more occurrences of element MEETING-NOTE.
Using the virtual element GETCONF to get MEETING-NOTE information from the CONF
subfile, the protocol uses the format ADD.MN to collect a text file listing IRN, CNUM and
MEETING-NOTE for the records, to be checked and merged into the proper HEP records.
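The selection step described above can be sketched in Python as follows. The record layout (dicts with "irn", "cnum" and "meeting_note" keys) is an assumption made for the example and is not how SPIRES stores records.

```python
def split_meeting_note_candidates(records):
    """Split records that have CNUM but no MEETING-NOTE into two worklists.

    Returns (automatic, manual):
      automatic: (irn, cnum) pairs with exactly one CNUM, ready for lookup.
      manual: IRNs with two or more CNUMs, to be updated by hand.
    """
    automatic, manual = [], []
    for rec in records:
        cnums = rec.get("cnum", [])
        if not cnums or rec.get("meeting_note"):
            continue  # nothing to add, or the note is already there
        if len(cnums) > 1:
            manual.append(rec["irn"])
        else:
            automatic.append((rec["irn"], cnums[0]))
    return automatic, manual
```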
Spirestasks #18 Check new PUB-NOTE
Checking and correcting PUB-NOTE
SPIRES protocols:
hep.check.newlyadded.journals
(makes text files for tasks no.18-21)
hep.check.new.pbn
Checks newly updated records (we check last 3 days, but not TODAY) for records with
a PUB-NOTE (Published in ...). The text file alljour.list is created or added to
with lines containing the IRN of each record and the PUB-NOTE element:
Published in AIP Conf.Proc.45:138-147,1978.
This text file is checked by an experienced person for anomalies in the PUB-NOTE
element. Any mistakes can be corrected by searching for the PUB-NOTE and editing
the record.
Spirestasks #19 SPICITE elements starting with NONE
Correcting SPICITE elements starting with NONE
SPIRES protocols:
hep.check.newlyadded.journals
(makes text files for tasks no.18-21)
hep.fix.bad.pbn
Checks newly updated records (we check last 3 days, but not TODAY) for SPICITE
elements starting with NONE, indicating that the PUB-NOTE could not be turned into a
SPICITE element with the normal five-letter CODEN. The text file nones.list is created or
added to and contains the IRN of each record and the SPICITE virtual element in
this form:
41095820 NONE,29,(1955)1
Using the IRN, a record is accessed and presented on the screen to an experienced
person with choices to edit the record, skip to the next record or quit out of the
protocol. The PBN element is edited so the virtual element SPICITE will
contain the correct five-letter CODEN. The line in the text file nones.list is
then deleted and the IRN in the next line is read in order to process the next record.
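Several of these tasks share the same worklist pattern: read the first line of a text file, handle that record, delete the line, move on. A rough Python sketch of that loop is below. The function name and the callback convention are assumptions, and the "skip" choice is not modelled.

```python
from pathlib import Path


def process_worklist(path, handle):
    """Work through a worklist file, deleting each line once it is handled.

    handle(line) returns True when the record is done and False to quit;
    lines not yet processed stay in the file for the next session.
    """
    worklist = Path(path)
    lines = worklist.read_text().splitlines()
    while lines:
        if not handle(lines[0]):
            break
        lines.pop(0)
        # Rewrite after every record so an interruption loses no progress.
        worklist.write_text("".join(line + "\n" for line in lines))
    return lines
```

Rewriting the file after each record is what lets the person stop at any point and resume later from the first remaining line.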
Spirestasks #20 Incorrect or missing DATE elements
Checking and correcting incorrect or missing DATE elements
SPIRES protocols:
hep.check.newlyadded.journals
(makes text files for tasks no.18-21)
hep.fix.bad.dates
Checks newly updated records (we check last 3 days, but not TODAY) for records with
incorrect or missing DATE elements. Creates the text file noyear.list containing
the IRN, DATE and DATE-RECEIVED elements:
- 6171249 Jul 1001
- 6213588 1915
- 6205461
- 7560320 Dec 2007
Reading the first IRN, the protocol searches for the record in the HEP subfile,
displays the record on the screen with choices to edit the record, skip to the next
record or quit out of the protocol. An experienced person will edit and update
the record. The line containing the IRN is then deleted from the text file noyear.list
and the next line is read.
Spirestasks #21 Bad page numbering
Check and correct page numbers
SPIRES protocols:
hep.check.newlyadded.journals
(makes text files for tasks no.18-21)
hep.fix.bad.page
Checks newly updated records (we check last 3 days, but not TODAY) for records with
weird page ranges, e.g. 64-, 46-35, by subtracting the first page number from the
second in the PUB-NOTE element and calling the difference DELTA (in the samples below,
a missing second page number counts as 0). A text file is created
or lines added to an existing file for all records where the DELTA is a negative number
or 0. The text file contains IRN, DELTA and the PUB-NOTE elements with the journal
title converted to a five-letter CODEN:
- 6045812 -41 CSJAAA,5,41-.2005
- 6099572 -82 JPAGB,A38,4665-4583.2005
Reading the IRN, a protocol searches for the first IRN in the HEP subfile, displays the
record to be edited manually, and updates the record, then deletes the line containing
the IRN in the text file (so the process can be interrupted and continued at any time).
The next line is then read to process the next record.
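The DELTA check can be sketched in Python as follows. The two sample lines (41- giving -41, 4665-4583 giving -82) imply that DELTA is the second page number minus the first, with a missing second number counted as 0; the sketch follows that reading. The function names are hypothetical.

```python
import re

# A start page, a hyphen, and an optional end page, e.g. "41-" or "4665-4583".
_PAGE_RANGE = re.compile(r"^(\d+)-(\d*)$")


def page_delta(page_range):
    """Return end page minus start page, or None if this is not a range.

    A missing end page counts as 0, as the sample line '41-' suggests.
    """
    match = _PAGE_RANGE.match(page_range.strip())
    if match is None:
        return None
    first = int(match.group(1))
    second = int(match.group(2) or 0)
    return second - first


def is_bad_page_range(page_range):
    """True when DELTA is negative or 0, i.e. the range needs manual review."""
    delta = page_delta(page_range)
    return delta is not None and delta <= 0
```

A single page such as 61 is not a range and is never flagged, which matches the S21 rule that both 61 and 61-70 are acceptable.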
Spirestasks #29 Duplicate REPORT-NUM or SPICITE elements
Checking and correcting records with duplicate REPORT-NUM or SPICITE elements
SPIRES protocols:
hep.check.duplicate
hep.deal.wwith.duplicates
Checks records updated in the last 30 days (excludes those with HIDDEN-NOTE = rdupok;
i.e. records that have been checked and may have duplicate REPORT-NUM elements
for some reason). Collects a text file of all REPORT-NUM and SPICITE elements,
which is then sorted and uniqued. Each line is made a search command in HEP,
and IRNs with a RESULT of 2 or more are checked for HIDDEN-NOTE = dupok: and
collected into another text file. This text file is again sorted and uniqued and
has the form:
- SPICITE = PRTEA,4,37;
- REPORT-NUM = rl 78 059;
Each line is made a search command in HEP and the resulting set of 2-4 records
presented on the screen to an experienced person who will edit, annotate or
delete the affected records. Before going on to the next record, the person
is given a menu of choices: deleting the line containing the
search because the duplication has been fixed and going on to the next set
of records, editing some more, adding HIDDEN-NOTE = rdupok; to records which cannot
be changed (so they don't come up in the initial search again and again),
skipping this set of records or quitting out of the program. The line in the text
file containing the SPICITE or REPORT-NUM element is then deleted and the next
line is read to start the next search.
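The duplicate search itself can be approximated without issuing one search per line by grouping records on the element value. A hedged Python sketch, assuming records are available as dicts keyed by element name (that layout, and the function name, are inventions for this example):

```python
from collections import defaultdict


def find_duplicates(records, element="REPORT-NUM", ok_note="rdupok"):
    """Map each element value held by 2 or more records to their sorted IRNs.

    Records whose HIDDEN-NOTE list contains `ok_note` are left out, since
    they have already been reviewed and accepted as duplicates.
    """
    holders = defaultdict(set)
    for rec in records:
        if ok_note in rec.get("HIDDEN-NOTE", []):
            continue
        for value in rec.get(element, []):
            holders[value].add(rec["IRN"])
    return {value: sorted(irns) for value, irns in holders.items() if len(irns) > 1}
```

Calling it once with element="REPORT-NUM" and once with element="SPICITE" would yield the two kinds of lines shown above.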
Spirestasks #33 New authors in HEP
Checking new authors noticed at input into HEP
SPIRES protocols:
ppfin.checks
hep.check.new.authors
At the time of input into HEP each author name is checked in HEP.
If the name does not appear in HEP, the person doing the input is
prompted to check for misspelling and given a chance to correct the input. If the
author is new, the name is added to a text file in the form:
arXiv:0709.0009:Daniel, Scott F.
Reading the first line, the author name is presented to an experienced person with
choices to browse the author name in PRE-FOLIOAUTH, correct the name, skip this name
or quit the protocol. PRE-FOLIOAUTH contains all the author names in HEP; using the
format AUTHCHECK, it displays the author name with the number of records for the author
in HEP. An experienced person will then decide whether to accept the new author name
or correct it in the HEP record. The line containing this name in the text file is
then deleted and the next line is read, presenting the next AUTHOR.
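A small Python sketch of the two mechanical pieces here: spotting names not yet known in HEP, and reading a line of the new-author file back into its parts. Both function names are hypothetical, and splitting on the last colon assumes the author name itself never contains one.

```python
def find_new_authors(input_authors, known_authors):
    """Return the input author names that do not yet appear in HEP."""
    known = set(known_authors)
    return [name for name in input_authors if name not in known]


def parse_new_author_line(line):
    """Split 'arXiv:0709.0009:Daniel, Scott F.' into (record id, author name).

    Assumption: the last colon separates the record id from the name.
    """
    record_id, _, author = line.strip().rpartition(":")
    return record_id, author
```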
Spirestasks #34 Records with COLLABORATION, but no EXPERIMENT element
Checks records with the COLLABORATION element, but missing the EXPERIMENT element
SPIRES protocols:
hep.check.exp
hep.add.experiments
Checks records in HEP updated the last 4 days with the COLLABORATION element,
but no EXPERIMENT element (excluding deleted records, temporary records, and
records with a HIDDEN-NOTE = noexpok, records that we have decided do not need
the EXPERIMENT element). Stores IRN, BULL and COLLABORATION elements in a text
file in the form:
7603878 arXiv:0801.0697 BABAR
Using the IRN on the first line searches for the COLLABORATION in the EXPERIMENTS
subfile and displays the record. Searches the EXPERIMENTS subfile to find the experiment
connected with this collaboration and constructs the element EXPERIMENT = xxx; ready
to be merged into the HEP subfile. Displays the proposed element, then gives choices to
merge it into HEP, edit before merging, skip this record or quit out of the protocol.
If the COLLABORATION is not found in the EXPERIMENTS subfile, offers
choices to skip the record, look at the record on another screen, or merge
HIDDEN-NOTE = noexpok; into the record. If the search in EXPERIMENTS results in
more than one COLLABORATION, displays the records and suggests solving this on another
screen, then again presents the choices to merge, skip, edit, etc. Finally the line
containing the IRN, BULL and COLLABORATION is deleted in the text file and the
next line is read.
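The three outcomes of the EXPERIMENTS lookup (no match, exactly one match, several matches) can be sketched in Python like this. The mapping from collaboration name to experiment codes stands in for the EXPERIMENTS subfile, much as a coll_exp knowledge base would; the function name and the action labels are assumptions.

```python
def propose_experiment(collaboration, experiments_by_collaboration):
    """Return (action, experiment codes) for one COLLABORATION value.

    action is 'not-found' (offer skip or HIDDEN-NOTE = noexpok),
    'propose' (one experiment, ready to merge after confirmation), or
    'ambiguous' (several candidates, to be resolved by a person).
    """
    matches = sorted(set(experiments_by_collaboration.get(collaboration, [])))
    if not matches:
        return "not-found", []
    if len(matches) == 1:
        return "propose", matches
    return "ambiguous", matches
```

Only the "propose" case yields an EXPERIMENT element that can be offered for merging directly; the other two still need a decision from the person at the screen.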
Spirestasks #36 Missing FIELD-CODE element
Checks for missing FIELD-CODE elements and prompts for input
SPIRES protocols:
hep.missing.fieldcode
hep.add.fieldcode
Checks records added to HEP in the last 4 days (excluding deleted records) to find
records without the FIELD-CODE element and stores the IRN in a text file.
Using the IRN on the first line, attempts to construct a FIELD-CODE element from the
BBDESCRIP element and presents it to an experienced person with the choice to merge it
into the record; otherwise prompts for a FIELD-CODE element to be typed in.
Offers choices to merge, skip the record or quit, then erases the line containing
the IRN that has been done and goes on to the next IRN.
--
TravisBrooks - 06 Feb 2008