1.- General
-----------

o Online analysis                                    Steve Armstrong
o Comments on the likelihood ratio estimator
  and FFT method                                     Hongbo Hu
o Possible combined signal ID and limit setting,
  illustrated with hnunu                             Gavin Davies
o Likelihood fit to a Higgs signal                   Nikos Konstantinidis
o b-tag status report                                Nikos Konstantinidis

2.- Analyses
------------

o Z -> ll in the 4-jet channel                       Anders Waananen
o Studies in Hvv                                     Jennifer Kile

Minutes taken by David Smith
Steve gave an update on the online analysis proposed at the last
meeting. The analysis has been named BEHOLD!, which stands for
'BEhold, a Higgs Online Limit and/or Discovery!'. It consists of the
four tasks described in the proposed architecture, which may be
summarised as:
1) Data and integrated luminosity acquisition (from scanbook),
   performed daily.
2) Final state reference analyses.
3) Calculation and combination of C.L.s using the candidate lists
   and their associated discriminating variables.
4) Generation of suitable output, i.e. plots of the results, and
   determination of the 95% C.L. for Cs+b.
All these tasks are done, subject to probable 'evolution' of the
output format. The 192 GeV efficiencies, backgrounds and shapes still
need to be determined, which will only be possible once the 192 GeV
MC becomes available.
BEHOLD! is non-trivial, so the authors (Steve and Jason) will remain
responsible for maintenance and updating.
The current BEHOLD! output for the 189 GeV data was shown, with the SM
limit at 93.07 GeV.
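The four tasks above chain naturally into a daily pipeline. As a purely
hypothetical sketch of that flow (every function name, data structure and
the placeholder C.L. formula here are invented for illustration; this is
not the real package):

```python
# Invented sketch of the four BEHOLD! tasks chained into a daily run.

def acquire_data():
    """1) Daily data and integrated luminosity acquisition (scanbook)."""
    return {"lumi_pb": 10.0, "events": [{"mass": 91.0}, {"mass": 95.2}]}

def reference_analyses(run):
    """2) Final-state reference analyses: build the candidate list."""
    return [e for e in run["events"] if e["mass"] > 90.0]

def combine_cls(candidates):
    """3) Combine C.L.s from the candidate list (placeholder formula)."""
    return 1.0 / (1.0 + len(candidates))

def report(cl):
    """4) Generate output; here just a one-line summary string."""
    return "CL = %.3f" % cl

print(report(combine_cls(reference_analyses(acquire_data()))))
```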
Discussion:
The online package will be installed on aloha as soon as possible.
It was pointed out that the limit computed by the online analysis from
the 189 GeV data was not the one sent to the winter conference. This
was attributed to two differences: first, one of the Hll candidates
was different; second, the signal estimator is different.
For MC production there was much discussion on which Ecm(s) to produce
signal at. The WW group had decided to make their major productions at
196 GeV, and it was felt that this should be the priority for the HTF
as well. It was pointed out that in general it is easier to interpolate
shapes and efficiencies than to extrapolate them. However, near
thresholds, as is the case for the ZZ background and some HZ samples,
additional MC at 192 GeV may be required.
Hongbo gave a presentation, 'Comments on the Likelihood Ratio (LR)
estimator and FFT method'. In this he reminded us of the definition of
the LR estimator, and that it can be shown to be the best estimator
(given certain conditions). He then compared the FFT method with toy
MC experiments. Running the whole analysis with the FFT takes
O(10 mins), whereas with toy MC experiments it takes O(a day).
For a precise 5 sigma discovery Hongbo estimated that the toy MC
experiments would take O(a year) to perform, rendering the method
unusable for discovery.
His conclusion was that the combination of the LR estimator and the
FFT method is the natural and best way to combine the results: it is
considerably faster than the present method and satisfies internal
consistency.
It was discussed whether the conditions under which the LR is the best
estimator apply to the HTF. It was not entirely clear that they do,
but in any case nothing is lost by using the LR estimator. Since it is
a more sensitive estimator, it was also questioned whether the LR
might increase systematics in a low-background environment.
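For reference, the LR estimator discussed here can be sketched as
follows (the shapes and numbers are invented; this is not the HTF
code). For an experiment with candidates x_i, expected signal s_tot,
expected background b_tot and normalised shapes f_s, f_b, the log of
the likelihood ratio Q = L(s+b)/L(b) is
ln Q = -s_tot + sum_i ln(1 + s_tot*f_s(x_i) / (b_tot*f_b(x_i))):

```python
import math

def ln_q(candidates, f_s, f_b, s_tot, b_tot):
    """ln Q = ln L(s+b) - ln L(b) for a single 'experiment':
    -s_tot + sum_i ln(1 + s_tot*f_s(x_i) / (b_tot*f_b(x_i)))."""
    lnq = -s_tot
    for x in candidates:
        lnq += math.log(1.0 + s_tot * f_s(x) / (b_tot * f_b(x)))
    return lnq

# Invented shapes on [0, 1]: flat background, signal peaked at high x.
f_b = lambda x: 1.0
f_s = lambda x: 3.0 * x * x
print(ln_q([0.9, 0.8, 0.2], f_s, f_b, s_tot=1.5, b_tot=3.0))
```

A positive ln Q favours the signal-plus-background hypothesis; the FFT
enters when convolving many such per-channel distributions, which is
not reproduced in this toy.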
Gavin talked about a combined signal discovery
and limit setting procedure.
The method was inspired by techniques developed by the UKDMC (a group
Gavin worked in previously) to calculate a C.L. on the upper limit of
the WIMP component of the Galactic dark matter. He showed that the
method naturally gives an indication of discovery or, in the absence
of a signal, a limit.
The measured distribution of a discriminating variable is treated as a
linear combination of the background and signal shapes. The fraction
of signal shape in the measured distribution is then estimated by a
log-likelihood fit. As an example, three hypothetical data sets were
constructed, each with a fixed fraction of signal but with 10, 100 and
1000 candidates respectively. The signal fraction was fitted and
plotted, and the procedure repeated 1000 times. The results showed
that the method is sensitive to the discriminating power present in
the background and signal shapes. When a 2D fit using both the mass
and b-tag shapes was made, the distribution was clearly peaked at the
input signal fraction even in the case of only 10 candidates. This is
without using any cross-section information.
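A minimal sketch of this kind of signal-fraction fit (toy shapes and
toy data invented for illustration; a grid scan stands in for the real
minimiser):

```python
import math, random

def fit_signal_fraction(data, f_s, f_b, steps=1000):
    """Scan the signal fraction f in p(x) = f*f_s(x) + (1-f)*f_b(x)
    and return the value maximising the log likelihood."""
    best_f, best_lnl = 0.0, -float("inf")
    for i in range(steps + 1):
        f = i / steps
        lnl = sum(math.log(max(f * f_s(x) + (1 - f) * f_b(x), 1e-12))
                  for x in data)
        if lnl > best_lnl:
            best_f, best_lnl = f, lnl
    return best_f

# Invented shapes on [0, 1]: flat background, signal rising linearly.
random.seed(1)
f_b = lambda x: 1.0
f_s = lambda x: 2.0 * x
# Toy data set with a true signal fraction of 0.3.
data = ([math.sqrt(random.random()) for _ in range(300)]   # x ~ 2x
        + [random.random() for _ in range(700)])
print(fit_signal_fraction(data, f_s, f_b))
```

With 1000 candidates the fitted fraction lands near the input 0.3; the
spread over repeated toy data sets is what the 1000-repetition exercise
in the talk maps out.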
For C.L. setting, many background-only experiments were prepared,
enabling an upper limit on the signal fraction to be set at the
desired level; the mass of the corresponding signal then gives the
limit. However, this method is clearly unsuitable for calculating the
observed limit, where there is only one given data set. It was
therefore proposed that the parent mass shapes might be fluctuated
according to their errors.
This would also enable one to work out the amount of MC needed to make
fluctuations in the fit negligible. In that case the shape of the
likelihood fit itself could be used for limit setting.
In conclusion, work is ongoing, but current results show that this
relatively simple method is robust, can 'see' the effects of
uncertainty in the MC shapes, and could provide a single method for
both discovery and limit setting.
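The background-only toy-experiment procedure for C.L. setting can be
sketched like this (self-contained toy with invented shapes; a grid
scan stands in for the real fit):

```python
import math, random

def fit_fraction(data, f_s, f_b, steps=200):
    """Maximum-likelihood signal fraction for
    p(x) = f*f_s(x) + (1-f)*f_b(x), found by a simple grid scan."""
    def lnl(f):
        return sum(math.log(max(f * f_s(x) + (1 - f) * f_b(x), 1e-12))
                   for x in data)
    return max((i / steps for i in range(steps + 1)), key=lnl)

def upper_limit(n_events, f_s, f_b, sample_b, n_toys=200, cl=0.95):
    """Fit the signal fraction in many background-only
    pseudo-experiments and take the cl-quantile of the fitted
    fractions as the upper limit."""
    fits = sorted(fit_fraction([sample_b() for _ in range(n_events)],
                               f_s, f_b)
                  for _ in range(n_toys))
    return fits[min(int(cl * n_toys), n_toys - 1)]

random.seed(2)
f_b = lambda x: 1.0        # flat background on [0, 1]
f_s = lambda x: 2.0 * x    # invented signal shape
print(upper_limit(50, f_s, f_b, random.random))
```

Any observed fitted fraction above this quantile would be incompatible
with background-only at the chosen level; as noted above, this toy
treatment does not by itself give the observed limit for the single
real data set.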
Nikos talked about the discovery exercise 'part three'. In this he
showed a likelihood fit to a signal for the 1998 data (hqq + hvv) and
a 'fake data' sample in which a 95 GeV signal had been added.
The method is to take the background and signal shapes and perform a
log-likelihood fit for the best signal x-section at every Higgs mass.
If there is a signal, a statistically significant x-section should be
obtained; otherwise a limit can be set on it.
In conclusion it was shown that this (simple) method yields a 95%
C.L., a signal x-section and mass, along with the statistical
significance. There is also no standard model dependence assumed,
since the x-section is a free parameter. The indication is that the
sensitivity for setting a limit is similar to that of the existing
methods, although this is to be checked.
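This mass scan can be caricatured as follows (toy shapes and numbers
invented, not from the talk): for each mass hypothesis, fit the
expected signal count mu in an extended likelihood with the background
fixed, and look at the log-likelihood gain over mu = 0.

```python
import math

def fit_mu(data, f_s, b_exp=5.0, steps=100, mu_max=10.0):
    """Grid-scan the expected signal count mu in an extended
    likelihood with a flat background of b_exp events on [0, 1].
    Returns (best mu, log-likelihood gain over mu = 0)."""
    def lnl(mu):
        return (-(mu + b_exp)
                + sum(math.log(mu * f_s(x) + b_exp) for x in data))
    best = max((i * mu_max / steps for i in range(steps + 1)), key=lnl)
    return best, lnl(best) - lnl(0.0)

def gauss(mean, sigma=0.03):
    """Invented Gaussian signal shape at a test 'mass' on [0, 1]."""
    return lambda x: math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (
        sigma * math.sqrt(2.0 * math.pi))

data = [0.48, 0.50, 0.51, 0.52, 0.20, 0.75, 0.90]  # toy 'masses'
for m in (0.3, 0.5, 0.7):                          # the mass scan
    mu, gain = fit_mu(data, gauss(m))
    print(m, round(mu, 1), round(gain, 2))
```

The hypothesis whose fitted mu gives the largest likelihood gain
(here the cluster near 0.5) is the discovery candidate; elsewhere the
fit prefers mu near zero and yields a limit.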
Nikos went on to talk about the b-tagging checks, including
comparisons of the 188.6 GeV MC produced this year with the
reprocessed 1998 data.
The reprocessed data has about 1% more 2-VDET-hit tracks than before,
and a few percent better resolution on the track d0s. The vertexing
and primary vertex are OK. The changes in the MC produced in January
over that available in November include small alignment errors, and no
VDET or TPC hit smearing.
Moving on to the data vs MC comparison, a number of things look good:
track multiplicity, QIPBTAG track types, and cos(theta) of tracks.
However, comparing tagging results using only 2-VDET-hit tracks with
those using all tracks, it became clear that the data/MC agreement is
much worse when using all tracks (2-VDET-hit tracks make up about 80%
of the total number of tracks). The disagreement is seen to be with
udsc events, as determined by an investigation of hemisphere tagging
at the Z peak. Looking for the cause of this discrepancy, the track z0
and d0 were compared between data and MC for 2-VDET-hit tracks and
others. Quite large discrepancies, of the order of 20%, were noticed
for non-2-VDET-hit tracks.
To determine the effect on the Higgs analyses, the d0/z0's were
over-smeared to get the QIPBTAG results to agree at the Z. The
smearing was then applied to the high-energy MC and the changes in
efficiency in the (hqq+4b) analyses found. These were seen to be about
+12% background for 4b and +7% for hqq. The differences do not account
for the observed selection discrepancies in these analyses.
Anders presented a proposed change in the 4-jet selection for 1999. He
showed a Zll candidate from the data that was also selected by the
4-jet cuts selection. While it was evidently a Zuu event, it was not
clear why it was selected for Hqq. Some investigation showed that one
of the high-pt muons was forming a low-multiplicity jet and being
b-tagged well. This suggested reintroducing a cut on high-pt leptons
entering the b-tagger. He showed that adding such a cut removed 2 data
events from the Hqq selection, including the Zll candidate.
The Higgs efficiency was reduced by 1.7%, although the analysis would
have to be reoptimised if this were introduced. In addition to this
suggestion, he also proposed a simple anti-Z->ll cut, which could make
the Hll and Hqq analyses complementary. In particular, the dijet mass
of leptons from QSELEP (only u and e) with opposite sign and same
flavour is found. A cut above 40 GeV for Hll and below for Hqq makes
the two analyses complementary. For Hqq it removes the Zll data
candidate but does not touch the Higgs efficiency. It also reduces the
expected contamination from Zll in the 4-jet channel by a factor of 2.
In conclusion, the overlap of Hqq and Hll can be decreased. More study
is needed, of the high-pt b-tag cut and of possible improvements to
the proposed anti-Zll cut.
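The proposed routing on the dilepton mass can be sketched
schematically (the lepton representation here is invented; the real
QSELEP selection is not reproduced):

```python
import math

def dilepton_mass(p1, p2):
    """Invariant mass of a lepton pair given (E, px, py, pz)
    four-vectors in GeV (simplified stand-in for QSELEP pairing)."""
    e = p1[0] + p2[0]
    px, py, pz = (p1[i] + p2[i] for i in (1, 2, 3))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def route(leptons, m_cut=40.0):
    """Route an event: any opposite-sign, same-flavour pair with mass
    above m_cut sends it to Hll, otherwise it stays in Hqq.
    'leptons' is a list of (flavour, charge, four_vector)."""
    for i in range(len(leptons)):
        for j in range(i + 1, len(leptons)):
            fi, qi, pi = leptons[i]
            fj, qj, pj = leptons[j]
            if fi == fj and qi * qj < 0 and dilepton_mass(pi, pj) > m_cut:
                return "Hll"
    return "Hqq"

# Toy Z -> mumu: two back-to-back 45.6 GeV muons (mass ~ 91 GeV).
mu_plus = ("mu", +1, (45.6, 0.0, 0.0, 45.6))
mu_minus = ("mu", -1, (45.6, 0.0, 0.0, -45.6))
print(route([mu_plus, mu_minus]))
```

With the single 40 GeV threshold, every event lands in exactly one of
the two selections, which is what makes them complementary.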
Jennifer presented an investigation of a possible alternative to the
current Wisconsin-Orsay combination in the Hvv channel. The method is
to use the Wisconsin and Orsay NN outputs as inputs to another NN.
The net structure tried was 2-20-3. It was trained with the standard
preselection on a cocktail of qq and WW background MC together with
95 GeV signal. The possible advantages of this analysis would be
(hopefully) smaller systematics from the shapes and an overall simpler
analysis, since a 3-channel combination reduces to a single analysis.
The performance curve of the new net was shown, together with
optimisation details: the normal 80% QQ, WW subtraction leading to a
30% working point.
Comparisons were then made between the new NN, the Hvv combinations
and the individual analyses.
The conclusions drawn included that the analyses can indeed be
combined via a single NN. The performance shows a slight improvement
over the individual analyses near mH = 95 GeV, but a slight
degradation with respect to the Wisconsin analysis at other masses. It
does not recover the full performance of the standard Hvv combination.
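For illustration, a 2-20-3 feed-forward net of the kind described has
the following structure (untrained random weights; the three class
labels are an assumption, not taken from the talk):

```python
import math, random

random.seed(0)

# 2 inputs (the Wisconsin and Orsay NN outputs), a hidden layer of
# 20 tanh units, and 3 softmax outputs (assumed: signal / qq / WW).
W1 = [[random.gauss(0, 1) for _ in range(20)] for _ in range(2)]
b1 = [0.0] * 20
W2 = [[random.gauss(0, 1) for _ in range(3)] for _ in range(20)]
b2 = [0.0] * 3

def forward(x):
    h = [math.tanh(sum(x[i] * W1[i][j] for i in range(2)) + b1[j])
         for j in range(20)]
    z = [sum(h[j] * W2[j][k] for j in range(20)) + b2[k]
         for k in range(3)]
    m = max(z)
    e = [math.exp(v - m) for v in z]      # numerically stable softmax
    s = sum(e)
    return [v / s for v in e]

out = forward([0.8, 0.6])   # the two upstream NN outputs
print(out, sum(out))        # three class scores summing to 1
```

Training (not shown) would tune W1, b1, W2, b2 on the qq/WW/signal
cocktail; the point of the stacking is that the two upstream outputs
already summarise most of the discriminating information.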