1.- General
-----------

o Online analysis                                    Steve Armstrong
o Comments on the likelihood ratio estimator
  and FFT method                                     Hongbo Hu
o Possible combined signal ID and limit setting,
  illustrated with hnunu                             Gavin Davies
o Likelihood fit to a Higgs signal                   Nikos Konstantinidis
o b-tag status report                                Nikos Konstantinidis

2.- Analyses
------------

o Z -> ll in the 4-jet channel                       Anders Waananen
o Studies in Hvv                                     Jennifer Kile

Minutes taken by David Smith
Steve gave an update on the online analysis proposed at the last
meeting. The analysis has been named BEHOLD!, which stands for
'BEhold, a Higgs Online Limit and/or Discovery!'. It consists of the
four tasks described in the proposed architecture, which may be
summarised as:
1) Data and integrated luminosity acquisition (from scanbook),
   performed daily.
2) Final state reference analyses.
3) Calculation and combination of C.L.s using the candidate lists
   and their associated discriminating variables.
4) Generation of suitable output, i.e. plots of the results, and
   determination of the 95% C.L. for Cs+b.
All these tasks are done, subject to probable 'evolution' of the
output format. The 192 GeV efficiencies, backgrounds and shapes still
need to be determined, which will only be possible once the 192 GeV
MC becomes available.
BEHOLD! is non-trivial, so the authors (Steve and Jason) will remain
responsible for maintenance and updating.
The current BEHOLD! output for the 189 GeV data was shown, with the SM
limit at 93.07 GeV.
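The four tasks above chain naturally into a daily pipeline. As a purely
hypothetical sketch of that flow (every function name, data structure and
the placeholder C.L. formula here are invented for illustration; this is
not the real package):

```python
# Invented sketch of the four BEHOLD! tasks chained into a daily run.

def acquire_data():
    """1) Daily data and integrated luminosity acquisition (scanbook)."""
    return {"lumi_pb": 10.0, "events": [{"mass": 91.0}, {"mass": 95.2}]}

def reference_analyses(run):
    """2) Final-state reference analyses: build the candidate list."""
    return [e for e in run["events"] if e["mass"] > 90.0]

def combine_cls(candidates):
    """3) Combine C.L.s from the candidate list (placeholder formula)."""
    return 1.0 / (1.0 + len(candidates))

def report(cl):
    """4) Generate output; here just a one-line summary string."""
    return "CL = %.3f" % cl

print(report(combine_cls(reference_analyses(acquire_data()))))
```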
Discussion:
The online package will be installed on aloha as soon as possible.
It was pointed out that the limit computed by the online analysis from
the 189 GeV data was not the one sent to the winter conference. This
was attributed to two differences: first, one of the Hll candidates
was different; second, the signal estimator is different.
For MC production there was much discussion on which Ecm(s) to produce
signal at. The WW group had decided to make their major productions at
196 GeV, and it was felt that this should be the priority for the HTF
as well. It was pointed out that in general it is easier to interpolate
shapes and efficiencies than to extrapolate them. However, near
thresholds, as is the case for the ZZ background and some HZ samples,
additional MC at 192 GeV may be required.
Hongbo gave a presentation, 'Comments on the Likelihood Ratio (LR)
estimator and FFT method'. In this he reminded us of the definition of
the LR estimator, and that it can be shown to be the best estimator
(given certain conditions). He then compared the FFT method with toy
MC experiments. Running the whole analysis with the FFT takes
O(10 mins), whereas with toy MC experiments it takes O(a day).
For a precise 5 sigma discovery Hongbo estimated that the toy MC
experiments would take O(a year) to perform, rendering the method
unusable for discovery.
His conclusion was that the combination of the LR estimator and the
FFT method is the natural and best way to combine the results: it is
considerably faster than the present method and satisfies internal
consistency.
It was discussed whether the conditions under which the LR is the best
estimator apply to the HTF. It was not entirely clear that they do,
but in any case nothing is lost by using the LR estimator. Since it is
a more sensitive estimator, it was also questioned whether the LR
might increase systematics in a low-background environment.
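For reference, the LR estimator discussed here can be sketched as
follows (the shapes and numbers are invented; this is not the HTF
code). For an experiment with candidates x_i, expected signal s_tot,
expected background b_tot and normalised shapes f_s, f_b, the log of
the likelihood ratio Q = L(s+b)/L(b) is
ln Q = -s_tot + sum_i ln(1 + s_tot*f_s(x_i) / (b_tot*f_b(x_i))):

```python
import math

def ln_q(candidates, f_s, f_b, s_tot, b_tot):
    """ln Q = ln L(s+b) - ln L(b) for a single 'experiment':
    -s_tot + sum_i ln(1 + s_tot*f_s(x_i) / (b_tot*f_b(x_i)))."""
    lnq = -s_tot
    for x in candidates:
        lnq += math.log(1.0 + s_tot * f_s(x) / (b_tot * f_b(x)))
    return lnq

# Invented shapes on [0, 1]: flat background, signal peaked at high x.
f_b = lambda x: 1.0
f_s = lambda x: 3.0 * x * x
print(ln_q([0.9, 0.8, 0.2], f_s, f_b, s_tot=1.5, b_tot=3.0))
```

A positive ln Q favours the signal-plus-background hypothesis; the FFT
enters when convolving many such per-channel distributions, which is
not reproduced in this toy.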
Gavin talked about a combined signal discovery
and limit setting procedure.
The method was inspired by techniques developed by the UKDMC (a group
Gavin worked in previously) to calculate a C.L. on the upper limit of
the WIMP component of the Galactic dark matter. He showed that the
method naturally gives an indication of discovery or, in the absence
of a signal, a limit.
The measured distribution of a discriminating variable is treated as a
linear combination of the background and signal shapes. The fraction
of signal shape in the measured distribution is then estimated by a
log-likelihood fit. As an example, three hypothetical data sets were
constructed, each with a fixed fraction of signal but with 10, 100 and
1000 candidates respectively. The signal fraction was fitted and
plotted, and the procedure repeated 1000 times. The results showed
that the method is sensitive to the discriminating power present in
the background and signal shapes. When a 2D fit using both the mass
and b-tag shapes was made, the distribution was clearly peaked at the
input signal fraction even in the case of only 10 candidates. This is
without using any cross-section information.
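A minimal sketch of this kind of signal-fraction fit (toy shapes and
toy data invented for illustration; a grid scan stands in for the real
minimiser):

```python
import math, random

def fit_signal_fraction(data, f_s, f_b, steps=1000):
    """Scan the signal fraction f in p(x) = f*f_s(x) + (1-f)*f_b(x)
    and return the value maximising the log likelihood."""
    best_f, best_lnl = 0.0, -float("inf")
    for i in range(steps + 1):
        f = i / steps
        lnl = sum(math.log(max(f * f_s(x) + (1 - f) * f_b(x), 1e-12))
                  for x in data)
        if lnl > best_lnl:
            best_f, best_lnl = f, lnl
    return best_f

# Invented shapes on [0, 1]: flat background, signal rising linearly.
random.seed(1)
f_b = lambda x: 1.0
f_s = lambda x: 2.0 * x
# Toy data set with a true signal fraction of 0.3.
data = ([math.sqrt(random.random()) for _ in range(300)]   # x ~ 2x
        + [random.random() for _ in range(700)])
print(fit_signal_fraction(data, f_s, f_b))
```

With 1000 candidates the fitted fraction lands near the input 0.3; the
spread over repeated toy data sets is what the 1000-repetition exercise
in the talk maps out.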
For C.L. setting, many background-only experiments were prepared,
enabling an upper limit on the signal fraction to be set at the
desired level; the mass of the corresponding signal then gives the
limit. However, this method is clearly unsuitable for calculating the
observed limit, where there is only one given data set. It was
therefore proposed that the parent mass shapes might be fluctuated
according to their errors.
This would also enable one to work out the amount of MC needed to make
fluctuations in the fit negligible. In that case the shape of the
likelihood fit itself could be used for limit setting.
In conclusion, work is ongoing, but current results show that this
relatively simple method is robust, can 'see' the effects of
uncertainty in the MC shapes, and could provide a single method for
both discovery and limit setting.
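The background-only toy-experiment procedure for C.L. setting can be
sketched like this (self-contained toy with invented shapes; a grid
scan stands in for the real fit):

```python
import math, random

def fit_fraction(data, f_s, f_b, steps=200):
    """Maximum-likelihood signal fraction for
    p(x) = f*f_s(x) + (1-f)*f_b(x), found by a simple grid scan."""
    def lnl(f):
        return sum(math.log(max(f * f_s(x) + (1 - f) * f_b(x), 1e-12))
                   for x in data)
    return max((i / steps for i in range(steps + 1)), key=lnl)

def upper_limit(n_events, f_s, f_b, sample_b, n_toys=200, cl=0.95):
    """Fit the signal fraction in many background-only
    pseudo-experiments and take the cl-quantile of the fitted
    fractions as the upper limit."""
    fits = sorted(fit_fraction([sample_b() for _ in range(n_events)],
                               f_s, f_b)
                  for _ in range(n_toys))
    return fits[min(int(cl * n_toys), n_toys - 1)]

random.seed(2)
f_b = lambda x: 1.0        # flat background on [0, 1]
f_s = lambda x: 2.0 * x    # invented signal shape
print(upper_limit(50, f_s, f_b, random.random))
```

Any observed fitted fraction above this quantile would be incompatible
with background-only at the chosen level; as noted above, this toy
treatment does not by itself give the observed limit for the single
real data set.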
Nikos talked about the discovery exercise 'part three'. In this he
showed a likelihood fit to a signal for the 1998 data (hqq + hvv) and
a 'fake data' sample in which a 95 GeV signal had been added.
The method is to take the background and signal shapes and perform a
log-likelihood fit for the best signal x-section at every Higgs mass.
If there is a signal, a statistically significant x-section should be
obtained; otherwise a limit can be set on it.
In conclusion it was shown that this (simple) method yields a 95%
C.L., a signal x-section and mass, along with the statistical
significance. There is also no standard model dependence assumed,
since the x-section is a free parameter. The indication is that the
sensitivity for setting a limit is similar to that of the existing
methods, although this is to be checked.
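This mass scan can be caricatured as follows (toy shapes and numbers
invented, not from the talk): for each mass hypothesis, fit the
expected signal count mu in an extended likelihood with the background
fixed, and look at the log-likelihood gain over mu = 0.

```python
import math

def fit_mu(data, f_s, b_exp=5.0, steps=100, mu_max=10.0):
    """Grid-scan the expected signal count mu in an extended
    likelihood with a flat background of b_exp events on [0, 1].
    Returns (best mu, log-likelihood gain over mu = 0)."""
    def lnl(mu):
        return (-(mu + b_exp)
                + sum(math.log(mu * f_s(x) + b_exp) for x in data))
    best = max((i * mu_max / steps for i in range(steps + 1)), key=lnl)
    return best, lnl(best) - lnl(0.0)

def gauss(mean, sigma=0.03):
    """Invented Gaussian signal shape at a test 'mass' on [0, 1]."""
    return lambda x: math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (
        sigma * math.sqrt(2.0 * math.pi))

data = [0.48, 0.50, 0.51, 0.52, 0.20, 0.75, 0.90]  # toy 'masses'
for m in (0.3, 0.5, 0.7):                          # the mass scan
    mu, gain = fit_mu(data, gauss(m))
    print(m, round(mu, 1), round(gain, 2))
```

The hypothesis whose fitted mu gives the largest likelihood gain
(here the cluster near 0.5) is the discovery candidate; elsewhere the
fit prefers mu near zero and yields a limit.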
Nikos went on to talk about the b-tagging checks, including
comparisons of the 188.6 GeV MC produced this year with the
reprocessed 1998 data.
The reprocessed data has about 1% more 2-VDET-hit tracks than before,
and a few percent better resolution on the track d0s. The vertexing
and primary vertex are OK. The changes in the MC produced in January
over that available in November include small alignment errors, and no
VDET or TPC hit smearing.
Moving on to the data vs MC comparison, a number of things look good:
track multiplicity, QIPBTAG track types, and cos(theta) of tracks.
However, comparing tagging results using only 2-VDET-hit tracks with
those using all tracks, it became clear that the data/MC agreement is
much worse when using all tracks (2-VDET-hit tracks make up about 80%
of the total number of tracks). The disagreement is seen to be with
udsc events, as determined by an investigation of hemisphere tagging
at the Z peak. Looking for the cause of this discrepancy, the track z0
and d0 were compared between data and MC for 2-VDET-hit tracks and
others. Quite large discrepancies, of the order of 20%, were noticed
for non-2-VDET-hit tracks.
To determine the effect on the Higgs analyses, the d0/z0's were
over-smeared to get the QIPBTAG results to agree at the Z. The
smearing was then applied to the high-energy MC and the changes in
efficiency in the (hqq+4b) analyses found. These were seen to be about
+12% background for 4b and +7% for hqq. The differences do not account
for the observed selection discrepancies in these analyses.
Anders presented a proposed change in the 4-jet selection for 1999. He
showed a Zll candidate from the data that was also selected by the
4-jet cuts selection. While it was evidently a Zuu event, it was not
clear why it was selected for Hqq. Some investigation showed that one
of the high-pt muons was forming a low-multiplicity jet and being
b-tagged well. This suggested reintroducing a cut on high-pt leptons
entering the b-tagger. He showed that adding such a cut removed 2 data
events from the Hqq selection, including the Zll candidate.
The Higgs efficiency was reduced by 1.7%, although the analysis would
have to be reoptimised if this were introduced. In addition to this
suggestion, he also proposed a simple anti-Z->ll cut, which could make
the Hll and Hqq analyses complementary. In particular, the dijet mass
of leptons from QSELEP (only u and e) with opposite sign and same
flavour is found. A cut above 40 GeV for Hll and below for Hqq makes
the two analyses complementary. For Hqq it removes the Zll data
candidate but does not touch the Higgs efficiency. It also reduces the
expected contamination from Zll in the 4-jet channel by a factor of 2.
In conclusion, the overlap of Hqq and Hll can be decreased. More study
is needed, of the high-pt b-tag cut and of possible improvements to
the proposed anti-Zll cut.
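The proposed routing on the dilepton mass can be sketched
schematically (the lepton representation here is invented; the real
QSELEP selection is not reproduced):

```python
import math

def dilepton_mass(p1, p2):
    """Invariant mass of a lepton pair given (E, px, py, pz)
    four-vectors in GeV (simplified stand-in for QSELEP pairing)."""
    e = p1[0] + p2[0]
    px, py, pz = (p1[i] + p2[i] for i in (1, 2, 3))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def route(leptons, m_cut=40.0):
    """Route an event: any opposite-sign, same-flavour pair with mass
    above m_cut sends it to Hll, otherwise it stays in Hqq.
    'leptons' is a list of (flavour, charge, four_vector)."""
    for i in range(len(leptons)):
        for j in range(i + 1, len(leptons)):
            fi, qi, pi = leptons[i]
            fj, qj, pj = leptons[j]
            if fi == fj and qi * qj < 0 and dilepton_mass(pi, pj) > m_cut:
                return "Hll"
    return "Hqq"

# Toy Z -> mumu: two back-to-back 45.6 GeV muons (mass ~ 91 GeV).
mu_plus = ("mu", +1, (45.6, 0.0, 0.0, 45.6))
mu_minus = ("mu", -1, (45.6, 0.0, 0.0, -45.6))
print(route([mu_plus, mu_minus]))
```

With the single 40 GeV threshold, every event lands in exactly one of
the two selections, which is what makes them complementary.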
Jennifer presented an investigation of a possible alternative to the
current Wisconsin-Orsay combination in the Hvv channel. The method is
to use the Wisconsin and Orsay NN outputs as inputs to another NN.
The net structure tried was 2-20-3. It was trained with the standard
preselection on a cocktail of qq and WW background MC together with
95 GeV signal. The possible advantages of this analysis would be
(hopefully) smaller systematics from the shapes and an overall simpler
analysis, since a 3-channel combination reduces to a single analysis.
The performance curve of the new net was shown, together with
optimisation details: the normal 80% QQ, WW subtraction leading to a
30% working point.
Comparisons were then made between the new NN, the Hvv combinations
and the individual analyses.
The conclusions drawn included that the analyses can indeed be
combined via a single NN. The performance shows a slight improvement
over the individual analyses near mH = 95 GeV, but a slight
degradation with respect to the Wisconsin analysis at other masses. It
does not recover the full performance of the standard Hvv combination.
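For illustration, a 2-20-3 feed-forward net of the kind described has
the following structure (untrained random weights; the three class
labels are an assumption, not taken from the talk):

```python
import math, random

random.seed(0)

# 2 inputs (the Wisconsin and Orsay NN outputs), a hidden layer of
# 20 tanh units, and 3 softmax outputs (assumed: signal / qq / WW).
W1 = [[random.gauss(0, 1) for _ in range(20)] for _ in range(2)]
b1 = [0.0] * 20
W2 = [[random.gauss(0, 1) for _ in range(3)] for _ in range(20)]
b2 = [0.0] * 3

def forward(x):
    h = [math.tanh(sum(x[i] * W1[i][j] for i in range(2)) + b1[j])
         for j in range(20)]
    z = [sum(h[j] * W2[j][k] for j in range(20)) + b2[k]
         for k in range(3)]
    m = max(z)
    e = [math.exp(v - m) for v in z]      # numerically stable softmax
    s = sum(e)
    return [v / s for v in e]

out = forward([0.8, 0.6])   # the two upstream NN outputs
print(out, sum(out))        # three class scores summing to 1
```

Training (not shown) would tune W1, b1, W2, b2 on the qq/WW/signal
cocktail; the point of the stacking is that the two upstream outputs
already summarise most of the discriminating information.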