Compact Muon Solenoid
LHC, CERN

CMS-BPH-16-004; CERN-EP-2019-215
Measurement of properties of ${\mathrm{B}^{0}_{\mathrm{s}}\to\mu^{+}\mu^{-}}$ decays and search for ${\mathrm{B}^{0}\to\mu^{+}\mu^{-}}$ with the CMS experiment
JHEP 04 (2020) 188
Abstract: Results are reported for the ${\mathrm{B}^{0}_{\mathrm{s}}\to\mu^{+}\mu^{-}}$ branching fraction and effective lifetime and from a search for the decay ${\mathrm{B}^{0}\to\mu^{+}\mu^{-}} $. The analysis uses a data sample of proton-proton collisions accumulated by the CMS experiment in 2011, 2012, and 2016, with center-of-mass energies (integrated luminosities) of 7 TeV (5 fb$^{-1}$), 8 TeV (20 fb$^{-1}$), and 13 TeV (36 fb$^{-1}$). The branching fractions are determined by measuring event yields relative to ${\mathrm{B^{+}}\to{\mathrm{J}/\psi} \mathrm{K^{+}}} $ decays (with ${\mathrm{J}/\psi} \to\mu^{+}\mu^{-}$), which results in the reduction of many of the systematic uncertainties. The decay ${\mathrm{B}^{0}_{\mathrm{s}}\to\mu^{+}\mu^{-}}$ is observed with a significance of 5.6 standard deviations. The branching fraction is measured to be $\mathcal{B}({\mathrm{B}^{0}_{\mathrm{s}}\to\mu^{+}\mu^{-}} ) = $ [2.9 $^{+0.7}_{-0.6}$ (exp) $\pm$ 0.2 (frag) ]$\times 10^{-9}$, where the first uncertainty combines the experimental statistical and systematic contributions, and the second is due to the uncertainty in the ratio of the $\mathrm{B}^{0}_{\mathrm{s}}$ and the $\mathrm{B^{+}}$ fragmentation functions. No significant excess is observed for the decay ${\mathrm{B}^{0}\to\mu^{+}\mu^{-}} $, and an upper limit of ${\mathcal{B}} ({\mathrm{B}^{0}\to\mu^{+}\mu^{-}} ) < $ 3.6$\times10^{-10}$ is obtained at 95% confidence level. The ${\mathrm{B}^{0}_{\mathrm{s}}\to\mu^{+}\mu^{-}}$ effective lifetime is measured to be ${\tau_{\mu^{+}\mu^{-}}} = $ 1.70 $^{+0.61}_{-0.44}$ ps. These results are consistent with standard model predictions.
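The normalization to ${\mathrm{B^{+}}\to{\mathrm{J}/\psi} \mathrm{K^{+}}}$ described above can be written as a product of a yield ratio, an efficiency ratio, and a fragmentation-fraction ratio. The Python sketch below assumes the common form $\mathcal{B}(\mathrm{B}^0_s\to\mu^+\mu^-) = (N_s/N_{\mathrm{norm}})\,(\varepsilon_{\mathrm{norm}}/\varepsilon_s)\,(f_u/f_s)\,\mathcal{B}(\mathrm{B^+}\to\mathrm{J}/\psi\,\mathrm{K^+})\,\mathcal{B}(\mathrm{J}/\psi\to\mu^+\mu^-)$. The function and argument names are hypothetical, and the numbers in the example call are placeholders, not CMS inputs.

```python
def normalized_branching_fraction(n_sig, n_norm, eff_sig, eff_norm,
                                  frag_ratio, bf_norm_chain):
    """Branching fraction measured relative to a normalization channel.

    n_sig, n_norm      -- fitted signal and normalization (B+ -> J/psi K+) yields
    eff_sig, eff_norm  -- total efficiencies for the signal and normalization channels
    frag_ratio         -- f_u / f_q, ratio of the B+ and B_q fragmentation fractions
    bf_norm_chain      -- B(B+ -> J/psi K+) * B(J/psi -> mu+ mu-)
    """
    if n_norm <= 0 or eff_sig <= 0 or eff_norm <= 0:
        raise ValueError("normalization yield and efficiencies must be positive")
    # Yield ratio, corrected by the efficiency ratio and the production ratio
    return (n_sig / n_norm) * (eff_norm / eff_sig) * frag_ratio * bf_norm_chain


# Placeholder inputs chosen only for easy arithmetic
print(normalized_branching_fraction(10, 1000, 0.5, 0.25, 4.0, 1e-4))
```

For ${\mathrm{B}^{0}\to\mu^{+}\mu^{-}}$ the ratio $f_u/f_d$ is usually set to 1 by isospin symmetry (an assumption of this sketch), so the fragmentation uncertainty quoted in the abstract enters mainly through $f_u/f_s$.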
Figures

Figure 1:
Invariant mass distributions for the $\mu \mu \mathrm{K} $ system used to reconstruct the $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ normalization sample. The plot on the left shows the 2016A central-region channel ($ {| {\eta ^{\text {f}}_{\mu}} |} < $ 0.7), while the plot on the right shows the 2016B forward-region channel (0.7 $ < {| {\eta ^{\text {f}}_{\mu}} |} < $ 1.4). The mass resolutions for these channels are 30 and 43 MeV, respectively. The data are shown by solid black circles, the result of the fit is overlaid with the black line, and the different components are indicated by the hatched regions.
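The quantity histogrammed here is the invariant mass of the two muons and the kaon. A minimal sketch, assuming plain four-vector addition with PDG masses assigned to the tracks; the actual reconstruction (vertex fitting and any kinematic constraints) is not modeled, and the candidate momenta in the example are made up.

```python
import math

MUON_MASS = 0.1056583755  # GeV, PDG value
KAON_MASS = 0.493677      # GeV, charged kaon, PDG value


def invariant_mass(particles):
    """Invariant mass (GeV) of particles given as (px, py, pz, mass) tuples in GeV."""
    e_sum = px_sum = py_sum = pz_sum = 0.0
    for px, py, pz, mass in particles:
        # Energy from the momentum and the assumed mass hypothesis
        e_sum += math.sqrt(px * px + py * py + pz * pz + mass * mass)
        px_sum += px
        py_sum += py
        pz_sum += pz
    m2 = e_sum ** 2 - (px_sum ** 2 + py_sum ** 2 + pz_sum ** 2)
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding


# Hypothetical mu+ mu- K+ candidate, momenta in GeV
candidate = [(3.1, 0.4, 2.0, MUON_MASS),
             (-0.5, 2.8, 1.1, MUON_MASS),
             (1.2, 0.9, 0.7, KAON_MASS)]
print(round(invariant_mass(candidate), 3))
```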

Figure 1-a:
Invariant mass distribution for the $\mu \mu \mathrm{K} $ system used to reconstruct the $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ normalization sample. The plot shows the 2016A central-region channel ($ {| {\eta ^{\text {f}}_{\mu}} |} < $ 0.7). The mass resolution for this channel is 30 MeV. The data are shown by solid black circles, the result of the fit is overlaid with the black line, and the different components are indicated by the hatched regions.

Figure 1-b:
Invariant mass distribution for the $\mu \mu \mathrm{K} $ system used to reconstruct the $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ normalization sample. The plot shows the 2016B forward-region channel (0.7 $ < {| {\eta ^{\text {f}}_{\mu}} |} < $ 1.4). The mass resolution for this channel is 43 MeV. The data are shown by solid black circles, the result of the fit is overlaid with the black line, and the different components are indicated by the hatched regions.

Figure 2:
Comparison of measured and simulated $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ distributions for the most discriminating analysis BDT variables in the central channel for 2016B: the flight length significance, the pointing angle, and the number of tracks close to the secondary vertex. The events are required to pass the preselection for the analysis BDT training. See text for details. The background-subtracted data are shown by solid circles and the MC simulation by the hatched histogram. The MC histograms are normalized to the number of events in the data. The lower panels display the ratio of the data to the MC simulation. The band in the ratio plot illustrates a $ \pm $20% variation.
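The lower panels compare background-subtracted data with MC simulation normalized to the number of events in the data. The sketch below shows that normalization and a per-bin check against the $\pm$20% band, assuming both histograms share the same binning; the function names and bin contents are illustrative only.

```python
def data_mc_ratio(data, mc):
    """Per-bin data/MC ratio after scaling the MC histogram to the data integral.

    Bins with no MC content get None. Background-subtracted data may be negative.
    """
    if len(data) != len(mc):
        raise ValueError("histograms must have the same binning")
    data_total, mc_total = sum(data), sum(mc)
    if mc_total <= 0 or data_total <= 0:
        raise ValueError("histograms must have positive integrals")
    scale = data_total / mc_total  # normalize MC to the data yield
    return [d / (m * scale) if m > 0 else None for d, m in zip(data, mc)]


def inside_band(ratios, half_width=0.20):
    """True for bins whose ratio lies within 1 +/- half_width."""
    return [r is not None and abs(r - 1.0) <= half_width for r in ratios]


ratios = data_mc_ratio([13.0, 20.0, 27.0], [1.0, 2.0, 3.0])
print(ratios, inside_band(ratios))
```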

Figure 2-a:
Comparison of measured and simulated $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ distribution for the flight length significance in the central channel for 2016B. The events are required to pass the preselection for the analysis BDT training. See text for details. The background-subtracted data are shown by solid circles and the MC simulation by the hatched histogram. The MC histograms are normalized to the number of events in the data. The lower panel displays the ratio of the data to the MC simulation. The band in the ratio plot illustrates a $ \pm $20% variation.

Figure 2-b:
Comparison of measured and simulated $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ distribution for the pointing angle in the central channel for 2016B. The events are required to pass the preselection for the analysis BDT training. See text for details. The background-subtracted data are shown by solid circles and the MC simulation by the hatched histogram. The MC histograms are normalized to the number of events in the data. The lower panel displays the ratio of the data to the MC simulation. The band in the ratio plot illustrates a $ \pm $20% variation.

Figure 2-c:
Comparison of measured and simulated $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ distribution for the number of tracks close to the secondary vertex in the central channel for 2016B. The events are required to pass the preselection for the analysis BDT training. See text for details. The background-subtracted data are shown by solid circles and the MC simulation by the hatched histogram. The MC histograms are normalized to the number of events in the data. The lower panel displays the ratio of the data to the MC simulation. The band in the ratio plot illustrates a $ \pm $20% variation.

Figure 3:
Comparison of measured and simulated $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ distributions for kinematic variables in the central channel for 2016B: the subleading muon $ {p_{\mathrm {T}}}$, the muon helicity angle, and the B meson proper decay time. The events are required to pass the preselection for the analysis BDT training. See text for details. The background-subtracted data are shown by solid circles and the MC simulation by the hatched histogram. The MC histograms are normalized to the number of events in the data. The lower panels display the ratio of the data to the MC simulation. The band in the ratio plot illustrates a $ \pm $20% variation.

Figure 3-a:
Comparison of measured and simulated $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ distribution for the subleading muon $ {p_{\mathrm {T}}}$ in the central channel for 2016B. The events are required to pass the preselection for the analysis BDT training. See text for details. The background-subtracted data are shown by solid circles and the MC simulation by the hatched histogram. The MC histograms are normalized to the number of events in the data. The lower panel displays the ratio of the data to the MC simulation. The band in the ratio plot illustrates a $ \pm $20% variation.

Figure 3-b:
Comparison of measured and simulated $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ distribution for the muon helicity angle in the central channel for 2016B. The events are required to pass the preselection for the analysis BDT training. See text for details. The background-subtracted data are shown by solid circles and the MC simulation by the hatched histogram. The MC histograms are normalized to the number of events in the data. The lower panel displays the ratio of the data to the MC simulation. The band in the ratio plot illustrates a $ \pm $20% variation.

Figure 3-c:
Comparison of measured and simulated $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ distribution for the B meson proper decay time in the central channel for 2016B. The events are required to pass the preselection for the analysis BDT training. See text for details. The background-subtracted data are shown by solid circles and the MC simulation by the hatched histogram. The MC histograms are normalized to the number of events in the data. The lower panel displays the ratio of the data to the MC simulation. The band in the ratio plot illustrates a $ \pm $20% variation.

Figure 4:
(Top row) Comparison of the analysis BDT discriminator distributions for $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ in background-subtracted data and MC simulation in the central channel for 2011 (left column), 2012 (middle column), and 2016B (right column). The lower panels display the ratio of the data to the MC simulation. The band in the ratio plot illustrates a $ \pm $20% variation. (Bottom row) Illustration of the analysis BDT discriminator distribution in dimuon background data from the 5.45 $ < {m_{\mu^{+} \mu^{-}}} < $ 5.9 GeV sideband and $ {\mathrm{B}^{0}_{\mathrm{s}} \to \mu^{+} \mu^{-}} $ signal MC simulation. The distributions correspond to the full preselection and are normalized to the same number of entries. The solid markers show the data and the hatched histogram the MC simulation. The arrows show the BDT discriminator boundaries provided in Table 1.

Figure 4-a:
Comparison of the analysis BDT discriminator distributions for $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ in background-subtracted data and MC simulation in the central channel for 2011. The lower panel displays the ratio of the data to the MC simulation. The band in the ratio plot illustrates a $ \pm $20% variation.

Figure 4-b:
Comparison of the analysis BDT discriminator distributions for $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ in background-subtracted data and MC simulation in the central channel for 2012. The lower panel displays the ratio of the data to the MC simulation. The band in the ratio plot illustrates a $ \pm $20% variation.

Figure 4-c:
Comparison of the analysis BDT discriminator distributions for $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ in background-subtracted data and MC simulation in the central channel for 2016B. The lower panel displays the ratio of the data to the MC simulation. The band in the ratio plot illustrates a $ \pm $20% variation.

Figure 4-d:
Illustration of the analysis BDT discriminator distribution in dimuon background data from the 5.45 $ < {m_{\mu^{+} \mu^{-}}} < $ 5.9 GeV sideband and $ {\mathrm{B}^{0}_{\mathrm{s}} \to \mu^{+} \mu^{-}} $ signal MC simulation. The distribution corresponds to the full preselection in the central channel for 2011 and is normalized to the same number of entries. The solid markers show the data and the hatched histogram the MC simulation. The arrows show the BDT discriminator boundaries provided in Table 1.

Figure 4-e:
Illustration of the analysis BDT discriminator distribution in dimuon background data from the 5.45 $ < {m_{\mu^{+} \mu^{-}}} < $ 5.9 GeV sideband and $ {\mathrm{B}^{0}_{\mathrm{s}} \to \mu^{+} \mu^{-}} $ signal MC simulation. The distribution corresponds to the full preselection in the central channel for 2012 and is normalized to the same number of entries. The solid markers show the data and the hatched histogram the MC simulation. The arrows show the BDT discriminator boundaries provided in Table 1.

Figure 4-f:
Illustration of the analysis BDT discriminator distribution in dimuon background data from the 5.45 $ < {m_{\mu^{+} \mu^{-}}} < $ 5.9 GeV sideband and $ {\mathrm{B}^{0}_{\mathrm{s}} \to \mu^{+} \mu^{-}} $ signal MC simulation. The distribution corresponds to the full preselection in the central channel for 2016B and is normalized to the same number of entries. The solid markers show the data and the hatched histogram the MC simulation. The arrows show the BDT discriminator boundaries provided in Table 1.

Figure 5:
Invariant mass distributions with the fit projection overlays for the branching fraction results. The left (right) plot shows the combined results from the high- (low-)range analysis BDT categories defined in Table 1. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched distributions.

Figure 5-a:
Invariant mass distributions with the fit projection overlays for the branching fraction results. The plot shows the combined results from the high-range analysis BDT categories defined in Table 1. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched distributions.

Figure 5-b:
Invariant mass distributions with the fit projection overlays for the branching fraction results. The plot shows the combined results from the low-range analysis BDT categories defined in Table 1. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched distributions.

Figure 6:
(Left) Likelihood contours for the fit to the branching fractions $ {\mathcal {B}} ({\mathrm{B}^{0}_{\mathrm{s}} \to \mu^{+} \mu^{-}})$ and $ {\mathcal {B}} ({\mathrm{B}^{0}_{\phantom{\mathrm{s}}} \to \mu^{+} \mu^{-}})$, together with the best-fit value (cross) and the SM expectation (solid square). The contours correspond to regions with 1-5 standard deviation coverage. (Right) The quantity 1$-$CL as a function of the assumed $ {\mathrm{B}^{0}_{\phantom{\mathrm{s}}} \to \mu^{+} \mu^{-}} $ branching fraction. The dashed curve shows the median expected value for the background-only hypothesis, while the solid line is the observed value. The shaded region indicates the $ \pm $1 standard deviation uncertainty band.
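Reading a 95% confidence level upper limit off the right-hand panel amounts to finding where the observed 1$-$CL curve drops below 0.05. A minimal sketch, assuming the curve is available as a scan on an increasing grid of branching-fraction values and that linear interpolation between grid points is adequate; how the CL value at each point is computed is not reproduced here, and the scan values in the example are arbitrary.

```python
def upper_limit(bf_scan, one_minus_cl, alpha=0.05):
    """Smallest scanned branching fraction at which 1-CL falls below alpha.

    bf_scan must be increasing; the crossing is interpolated linearly
    between neighbouring scan points. alpha = 0.05 gives a 95% CL limit.
    """
    pairs = list(zip(bf_scan, one_minus_cl))
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if y0 >= alpha > y1:  # guarantees y0 > y1, so no division by zero
            return x0 + (y0 - alpha) * (x1 - x0) / (y0 - y1)
    raise ValueError("scan never crosses alpha")


# Arbitrary toy scan (branching fraction in arbitrary units)
print(upper_limit([0.0, 1.0, 2.0, 3.0], [1.0, 0.5, 0.1, 0.0]))
```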

Figure 6-a:
Likelihood contours for the fit to the branching fractions $ {\mathcal {B}} ({\mathrm{B}^{0}_{\mathrm{s}} \to \mu^{+} \mu^{-}})$ and $ {\mathcal {B}} ({\mathrm{B}^{0}_{\phantom{\mathrm{s}}} \to \mu^{+} \mu^{-}})$, together with the best-fit value (cross) and the SM expectation (solid square). The contours correspond to regions with 1-5 standard deviation coverage.

Figure 6-b:
The quantity 1$-$CL as a function of the assumed $ {\mathrm{B}^{0}_{\phantom{\mathrm{s}}} \to \mu^{+} \mu^{-}} $ branching fraction. The dashed curve shows the median expected value for the background-only hypothesis, while the solid line is the observed value. The shaded region indicates the $ \pm $1 standard deviation uncertainty band.

Figure 7:
Invariant mass (left) and proper decay time (right) distributions, with the 2D UML fit projections overlaid. The data combine all channels passing the analysis BDT discriminator requirements as given in Table 4. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched distributions. The signal component is shown by the single-hatched distribution.
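In its simplest form, the decay-time part of such a fit is an unbinned maximum-likelihood (UML) fit of an exponential. The sketch below keeps only that piece: signal-only candidates, a hypothetical hard lower cut $t_{\min}$, and none of the background, resolution, or efficiency modeling that a full 2D fit needs. For this truncated exponential the likelihood maximum has the closed form $\hat\tau = \bar t - t_{\min}$; the decay times in the example are invented.

```python
import math


def fit_lifetime(times, t_min=0.0):
    """Unbinned ML lifetime for a pure exponential truncated at t >= t_min.

    PDF: p(t) = exp(-(t - t_min) / tau) / tau, whose ML estimate is mean(t) - t_min.
    Returns (tau_hat, approximate statistical uncertainty tau_hat / sqrt(n)).
    """
    n = len(times)
    if n == 0 or min(times) < t_min:
        raise ValueError("need at least one decay time, all >= t_min")
    tau = sum(times) / n - t_min
    if tau <= 0:
        raise ValueError("degenerate sample: all decay times equal t_min")
    return tau, tau / math.sqrt(n)


def nll(tau, times, t_min=0.0):
    """Negative log-likelihood of the truncated exponential, e.g. for a likelihood scan."""
    return sum(math.log(tau) + (t - t_min) / tau for t in times)


# Invented decay times in ps
times = [0.9, 1.4, 2.2, 3.8, 1.1]
print(fit_lifetime(times, t_min=0.5))
```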

Figure 7-a:
Invariant mass distribution, with the 2D UML fit projections overlaid. The data combine all channels passing the analysis BDT discriminator requirements as given in Table 4. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched distributions. The signal component is shown by the single-hatched distribution.

Figure 7-b:
Proper decay time distribution, with the 2D UML fit projections overlaid. The data combine all channels passing the analysis BDT discriminator requirements as given in Table 4. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched distributions. The signal component is shown by the single-hatched distribution.

Figure 8:
Invariant mass (left) and proper decay time (right) distributions, with the sPlot fit projections overlaid. The data combine all channels passing the analysis BDT discriminator requirements as given in Table 4. For the mass distribution, no requirement on the decay time is applied. The total fit is shown by the solid line and the different background components by the broken lines and the cross-hatched distribution. The signal component is shown by the single-hatched distribution.
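The sPlot technique named in this caption assigns each candidate a signal weight derived from a fit to the mass alone; weighting the decay times by these weights then yields a background-subtracted decay-time distribution. A minimal two-species sketch of the standard sWeight formula follows. It assumes fixed, normalized mass shapes (a Gaussian signal and a flat background in a hypothetical 4.9 to 5.9 GeV window) and yields from an extended maximum-likelihood fit, done here by a simple EM iteration. The toy sample, shape parameters, and function names are all invented for illustration.

```python
import math
import random


def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def toy_masses(seed=1, n_sig=200, n_bkg=800, lo=4.9, hi=5.9):
    """Hypothetical toy sample: Gaussian signal peak plus flat background, masses in GeV."""
    rng = random.Random(seed)
    return ([rng.gauss(5.367, 0.04) for _ in range(n_sig)]
            + [rng.uniform(lo, hi) for _ in range(n_bkg)])


def fit_yields(pdfs, n_iter=500):
    """Extended-ML yields for fixed shapes via EM; pdfs[e] = (f_sig(m_e), f_bkg(m_e))."""
    yields = [len(pdfs) / 2.0, len(pdfs) / 2.0]
    for _ in range(n_iter):
        new = [0.0, 0.0]
        for f_s, f_b in pdfs:
            denom = yields[0] * f_s + yields[1] * f_b
            new[0] += yields[0] * f_s / denom
            new[1] += yields[1] * f_b / denom
        yields = new
    return yields


def sweights(pdfs, yields):
    """sPlot weights (w_sig, w_bkg) per event for two species."""
    # Inverse covariance matrix V^-1 = [[a, b], [b, c]], summed over events
    a = b = c = 0.0
    denoms = []
    for f_s, f_b in pdfs:
        d = yields[0] * f_s + yields[1] * f_b
        denoms.append(d)
        a += f_s * f_s / d ** 2
        b += f_s * f_b / d ** 2
        c += f_b * f_b / d ** 2
    det = a * c - b * b
    v_ss, v_sb, v_bb = c / det, -b / det, a / det  # elements of V
    return [((v_ss * f_s + v_sb * f_b) / d, (v_sb * f_s + v_bb * f_b) / d)
            for (f_s, f_b), d in zip(pdfs, denoms)]


masses = toy_masses()
pdfs = [(gaussian(m, 5.367, 0.04), 1.0 / (5.9 - 4.9)) for m in masses]
yields = fit_yields(pdfs)
weights = sweights(pdfs, yields)
print(yields, sum(w[0] for w in weights))
```

At the maximum-likelihood yields, the signal weights sum to the fitted signal yield and the two weights of each event sum to one, which makes a quick sanity check. The method also relies on the decay time being statistically independent of the mass within each species.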

Figure 8-a:
Invariant mass distribution, with the sPlot fit projections overlaid. The data combine all channels passing the analysis BDT discriminator requirements as given in Table 4. For the mass distribution, no requirement on the decay time is applied. The total fit is shown by the solid line and the different background components by the broken lines and the cross-hatched distribution. The signal component is shown by the single-hatched distribution.

Figure 8-b:
Proper decay time distribution, with the sPlot fit projections overlaid. The data combine all channels passing the analysis BDT discriminator requirements as given in Table 4. For the mass distribution, no requirement on the decay time is applied. The total fit is shown by the solid line and the different background components by the broken lines and the cross-hatched distribution. The signal component is shown by the single-hatched distribution.
Tables

Table 1:
Analysis BDT discriminator boundaries per category, channel, and running period for the branching fraction determination (2011 has only one category because of the small sample size). Examples of the requirements for the central channels are illustrated in Fig. 4 (bottom row).
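Table 1 defines each category as a range of the analysis BDT discriminator. A minimal sketch of assigning a candidate to a category follows. It assumes contiguous ranges given by their lower boundaries, an open-ended highest range, and that a value exactly on a boundary belongs to the higher category. The boundary values in the example are placeholders, not those of Table 1.

```python
import bisect


def bdt_category(score, lower_edges):
    """Index of the BDT category containing score, or None if it fails the loosest requirement.

    lower_edges: increasing lower boundaries, one per category; category i spans
    [lower_edges[i], lower_edges[i + 1]) and the last category is open-ended.
    """
    i = bisect.bisect_right(lower_edges, score)
    return i - 1 if i > 0 else None


# Placeholder boundaries for one channel and running period: "low" and "high" ranges
edges = [0.2, 0.5]
print([bdt_category(s, edges) for s in (0.1, 0.3, 0.5, 0.9)])
```

A running period with a single category, as for 2011, corresponds to a one-element boundary list.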

Table 2:
Summary of systematic uncertainty sources described in the text. The uncertainties quoted for the branching fraction $ {\mathcal {B}} ({\mathrm{B}^{0}_{\mathrm{s}} \to \mu^{+} \mu^{-}})$ are relative uncertainties, while the uncertainties for the effective lifetime $ {\tau _{\mu^{+} \mu^{-}}} $ are absolute and are given for both the 2D UML and sPlot analysis methods. The relative uncertainties in the upper limit on $ {\mathcal {B}} ({\mathrm{B}^{0}_{\phantom{\mathrm{s}}} \to \mu^{+} \mu^{-}})$ differ for the background yields, but have negligible impact on that result. The bottom rows provide the total systematic uncertainty and the total uncertainty in the branching fraction and the effective lifetime measurements. Contributions that are included in other items are indicated by (*).

Table 3:
Summary of the fitted yields for ${\mathrm{B}^{0}_{\mathrm{s}} \to \mu^{+} \mu^{-}}$, ${\mathrm{B}^{0}_{\phantom{\mathrm{s}}} \to \mu^{+} \mu^{-}}$, the combinatorial background for 5.2 $ < {m_{\mu^{+} \mu^{-}}} < $ 5.45 GeV, and the $ {\mathrm{B^{+}} \to {\mathrm{J}/\psi} \mathrm{K^{+}}} $ normalization, the average $ {p_{\mathrm {T}}}$ of the $ {\mathrm{B}^{0}_{\mathrm{s}} \to \mu^{+} \mu^{-}} $ signal, and the ratio of efficiencies between the normalization and the signal for all 14 categories of the 3D UML branching fraction fit. The high and low ranges of the analysis BDT discriminator distribution are defined in Table 1. The size of the peaking background is 5-10% of the $ {\mathrm{B}^{0}_{\phantom{\mathrm{s}}} \to \mu^{+} \mu^{-}} $ signal. The average $ {p_{\mathrm {T}}}$ is calculated from the MC simulation and has negligible uncertainties. The uncertainties shown include the statistical and systematic components. It should be noted that the $ {\mathrm{B}^{0}_{\mathrm{s}} \to \mu^{+} \mu^{-}} $ and $ {\mathrm{B}^{0}_{\phantom{\mathrm{s}}} \to \mu^{+} \mu^{-}} $ yields and their uncertainties are determined from the branching fraction fit and also include the normalization uncertainties.

Table 4:
Analysis BDT discriminator minimum requirements per channel and running period for the 1D and 2D effective lifetime fits.
Summary
Measurements of the rare leptonic B meson decays ${\mathrm{B}^{0}_{\mathrm{s}}\to\mu^{+}\mu^{-}}$ and ${\mathrm{B}^{0}\to\mu^{+}\mu^{-}}$ have been performed in ${\mathrm{p}}{\mathrm{p}}$ collision data collected by the CMS experiment at the LHC, corresponding to integrated luminosities of 5 fb$^{-1}$ at a center-of-mass energy of 7 TeV, 20 fb$^{-1}$ at 8 TeV, and 36 fb$^{-1}$ at 13 TeV. The ${\mathrm{B}^{0}_{\mathrm{s}}\to\mu^{+}\mu^{-}}$ decay is observed with a significance of 5.6 standard deviations, and the time-integrated branching fraction is measured to be ${\mathcal{B}} ({\mathrm{B}^{0}_{\mathrm{s}}\to\mu^{+}\mu^{-}} ) = [2.9^{+0.7}_{-0.6}\,(\mathrm{exp}) \pm 0.2\,(\mathrm{frag})]\times 10^{-9}$, where the experimental uncertainty combines the statistical and systematic terms, and the second uncertainty refers to the uncertainty in the ratio of the $\mathrm{B}^{0}_{\mathrm{s}}$ and the $\mathrm{B^{+}}$ fragmentation functions. No significant ${\mathrm{B}^{0}\to\mu^{+}\mu^{-}}$ signal is observed, and an upper limit ${\mathcal{B}} ({\mathrm{B}^{0}\to\mu^{+}\mu^{-}} ) < 3.6\times10^{-10}$ is determined at 95% confidence level. The ${\mathrm{B}^{0}_{\mathrm{s}}\to\mu^{+}\mu^{-}}$ effective lifetime is found to be ${\tau_{\mu^{+}\mu^{-}}} = 1.70^{+0.61}_{-0.44}$ ps, where the uncertainty combines both statistical and systematic components. The results for the branching fractions supersede the previous results from CMS [11], which were based on the 7 and 8 TeV data only. All of the results are in agreement with the standard model predictions.
Additional Figures

Additional Figure 1:
(Left) The two-dimensional probability contours representing the simultaneous measurement of the relative probabilities of the $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ vs. $\mathrm{B}^{0}_{\phantom{\mathrm{s}}}\to \mu^{+}\mu^{-}$ decays; the various contours correspond to (innermost to outermost) 1, 2, 3, 4, and 5 standard deviations. The black cross represents the CMS measurement, while the red point corresponds to the SM prediction. (Right) Confidence level (CL) as a function of the assumed $\mathrm{B}^{0}_{\phantom{\mathrm{s}}}\to \mu^{+}\mu^{-}$ branching fraction. The blue solid curve shows the values calculated with a likelihood scan based on Wilks' theorem, while the red dashed curve shows the results from the Feldman-Cousins approach.
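Each contour in the left-hand panel is a line of constant $-2\Delta\ln L$. If Wilks' theorem holds (the assumption behind the blue curve in the right-hand panel), the threshold for an $n$ standard deviation region in two parameters follows from the $\chi^2$ distribution with two degrees of freedom. The sketch below is written under that assumption and does not describe how the published contours were produced; the function name is hypothetical.

```python
import math


def delta_nll2_threshold_2d(n_sigma):
    """Delta(-2 ln L) bounding an n-sigma region for two parameters (Wilks, 2 dof).

    Chooses q with P(chi2_2 <= q) = P(|Z| <= n_sigma). Since the chi2_2 CDF is
    1 - exp(-q / 2), this gives q = -2 ln(erfc(n_sigma / sqrt(2))).
    """
    return -2.0 * math.log(math.erfc(n_sigma / math.sqrt(2.0)))


# Thresholds for the 1 to 5 standard deviation contours
for n in range(1, 6):
    print(n, round(delta_nll2_threshold_2d(n), 2))
```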

Additional Figure 1-a:
The two-dimensional probability contours representing the simultaneous measurement of the relative probabilities of the $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ vs. $\mathrm{B}^{0}_{\phantom{\mathrm{s}}}\to \mu^{+}\mu^{-}$ decays; the various contours correspond to (innermost to outermost) 1, 2, 3, 4, and 5 standard deviations. The black cross represents the CMS measurement, while the red point corresponds to the SM prediction.

Additional Figure 1-b:
Confidence level (CL) as a function of the assumed $\mathrm{B}^{0}_{\phantom{\mathrm{s}}}\to \mu^{+}\mu^{-}$ branching fraction. The blue solid curve shows the values calculated with a likelihood scan based on Wilks' theorem, while the red dashed curve shows the results from the Feldman-Cousins approach.

Additional Figure 2:
The one-dimensional likelihood ratio ($-2\ln(L/L_{\rm max})$) as a function of the assumed $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ (left) or $\mathrm{B}^{0}_{\phantom{\mathrm{s}}}\to \mu^{+}\mu^{-}$ (right) branching fraction, with the other branching fraction profiled together with the other nuisance parameters.
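A scan of $-2\ln(L/L_{\rm max})$ like this can be turned into a confidence level or an approximate significance. The minimal sketch below assumes Wilks' theorem, that is, asymptotic $\chi^2$ behaviour with one degree of freedom. Additional Figure 1 (right) compares this kind of likelihood-scan result with a Feldman-Cousins construction, so treat the sketch as illustrative only. The function names are hypothetical.

```python
import math


def one_minus_cl(q):
    """1 - CL for a one-parameter likelihood-ratio value q = -2 ln(L / L_max).

    Under Wilks' theorem q follows a chi2 distribution with 1 dof, so
    P(chi2_1 > q) = erfc(sqrt(q / 2)).
    """
    if q < 0:
        raise ValueError("q must be non-negative")
    return math.erfc(math.sqrt(q / 2.0))


def significance(q0):
    """Approximate significance (standard deviations) from q0 evaluated at zero signal."""
    return math.sqrt(q0)


print(one_minus_cl(1.0), one_minus_cl(3.841), significance(4.0))
```

In this approximation, a value of $-2\ln(L/L_{\rm max})$ of about 3.84 corresponds to the 95% CL boundary, and $\sqrt{q_0}$ at zero branching fraction gives the approximate significance of a non-negative signal.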

Additional Figure 2-a:
The one-dimensional likelihood ratio ($-2\ln(L/L_{\rm max})$) as a function of the assumed $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ branching fraction, with the other branching fraction profiled together with the other nuisance parameters.

Additional Figure 2-b:
The one-dimensional likelihood ratio ($-2\ln(L/L_{\rm max})$) as a function of the assumed $\mathrm{B}^{0}_{\phantom{\mathrm{s}}}\to \mu^{+}\mu^{-}$ branching fraction, with the other branching fraction profiled together with the other nuisance parameters.

Additional Figure 3:
The lifetime distributions of the observed $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ candidates, fitted with a function that has the exponential shape expected for the lifetime of a particle. The plot on the left shows all of the candidates; the plot in the middle shows the candidates in the signal region of $m_{\mu^{+}\mu^{-}}$; the plot on the right shows the distribution after a subtraction of the background contributions. The data in the left-hand histogram are plotted on a logarithmic scale, while the middle and right-hand histograms are plotted using a linear scale. The turnover of the function at very low lifetimes is an artifact of a selection requirement on the minimum decay length of the $\mathrm{B}^{0}_{\mathrm{s}}$ meson.

Additional Figure 3-a:
The lifetime distribution of the observed $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ candidates, fitted with a function that has the exponential shape expected for the lifetime of a particle. The plot shows all of the candidates. The data are plotted on a logarithmic scale. The turnover of the function at very low lifetimes is an artifact of a selection requirement on the minimum decay length of the $\mathrm{B}^{0}_{\mathrm{s}}$ meson.

Additional Figure 3-b:
The lifetime distribution of the observed $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ candidates, fitted with a function that has the exponential shape expected for the lifetime of a particle. The plot shows the candidates in the signal region of $m_{\mu^{+}\mu^{-}}$. The data are plotted using a linear scale. The turnover of the function at very low lifetimes is an artifact of a selection requirement on the minimum decay length of the $\mathrm{B}^{0}_{\mathrm{s}}$ meson.

Additional Figure 3-c:
The lifetime distribution of the observed $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ candidates, fitted with a function that has the exponential shape expected for the lifetime of a particle. The plot shows the distribution after a subtraction of the background contributions. The data are plotted using a linear scale. The turnover of the function at very low lifetimes is an artifact of a selection requirement on the minimum decay length of the $\mathrm{B}^{0}_{\mathrm{s}}$ meson.

Additional Figure 4:
Event displays of a $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ candidate in Run 2 data. The two curved red lines correspond to the two muons from the decay. The inset zooms in on the innermost CMS detector region. All tracks other than the two muons have been removed for clarity. The two muons do not come from the proton-proton collision point, shown as a yellow dot, but from the decay vertex of the $\mathrm{B}^{0}_{\mathrm{s}}$ meson, shown as a red dot.

Additional Figure 4-a:
Event display of a $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ candidate in Run 2 data. The two curved red lines correspond to the two muons from the decay. All tracks other than the two muons have been removed for clarity.

Additional Figure 4-b:
Event display of a $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ candidate in Run 2 data. The two curved red lines correspond to the two muons from the decay. The inset zooms in on the innermost CMS detector region. All tracks other than the two muons have been removed for clarity. The two muons do not come from the proton-proton collision point, shown as a yellow dot, but from the decay vertex of the $\mathrm{B}^{0}_{\mathrm{s}}$ meson, shown as a red dot.

Additional Figure 5:
Invariant mass distribution for each analysis BDT category with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 5-a:
Invariant mass distribution for one of the analysis BDT categories with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 5-b:
Invariant mass distribution for one of the analysis BDT categories with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 5-c:
Invariant mass distribution for one of the analysis BDT categories with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 5-d:
Invariant mass distribution for one of the analysis BDT categories with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 5-e:
Invariant mass distribution for one of the analysis BDT categories with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 5-f:
Invariant mass distribution for one of the analysis BDT categories with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 5-g:
Invariant mass distribution for one of the analysis BDT categories with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 5-h:
Invariant mass distribution for one of the analysis BDT categories with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 5-i:
Invariant mass distribution for one of the analysis BDT categories with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 5-j:
Invariant mass distribution for one of the analysis BDT categories with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 5-k:
Invariant mass distribution for one of the analysis BDT categories with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 5-l:
Invariant mass distribution for one of the analysis BDT categories with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 5-m:
Invariant mass distribution for one of the analysis BDT categories with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 5-n:
Invariant mass distribution for one of the analysis BDT categories with the fit projection overlays for the branching fraction results. The total fit is shown by the solid line and the different background components by the broken lines. The signal components are shown by the hatched histograms.

Additional Figure 6:
Invariant mass and proper decay time distributions for each analysis BDT category, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-a:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-b:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-c:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-d:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-e:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-f:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-g:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-h:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-i:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-j:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-k:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-l:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-m:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-n:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-o:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 6-p:
Invariant mass and proper decay time distributions for one of the analysis BDT categories, with the 2D UML fit projections overlaid. The total fit is shown by the solid line and the different background components by the broken lines and cross-hatched histogram. The signal component is shown by the red single-hatched histogram.

Additional Figure 7:
Invariant-mass distributions for the $\mu \mu \mathrm{K}$ (top) and $\mu \mu \mathrm{KK}$ (bottom) systems used to reconstruct the $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ normalization and $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ control samples. From left to right, the panels show the 2016A central-region, 2016A forward-region, 2016B central-region, and 2016B forward-region channels. The data are shown by solid black circles, the result of the fit is overlaid as the black line, and the different components are indicated by the hatched regions.

Additional Figure 7-a:
Invariant-mass distribution for the $\mu \mu \mathrm{K}$ system used to reconstruct the $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ normalization sample. The plot shows the 2016A central-region channel. The data are shown by solid black circles, the result of the fit is overlaid as the black line, and the different components are indicated by the hatched regions.

Additional Figure 7-b:
Invariant-mass distribution for the $\mu \mu \mathrm{K}$ system used to reconstruct the $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ normalization sample. The plot shows the 2016A forward-region channel. The data are shown by solid black circles, the result of the fit is overlaid as the black line, and the different components are indicated by the hatched regions.

Additional Figure 7-c:
Invariant-mass distribution for the $\mu \mu \mathrm{K}$ system used to reconstruct the $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ normalization sample. The plot shows the 2016B central-region channel. The data are shown by solid black circles, the result of the fit is overlaid as the black line, and the different components are indicated by the hatched regions.

Additional Figure 7-d:
Invariant-mass distribution for the $\mu \mu \mathrm{K}$ system used to reconstruct the $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ normalization sample. The plot shows the 2016B forward-region channel. The data are shown by solid black circles, the result of the fit is overlaid as the black line, and the different components are indicated by the hatched regions.

Additional Figure 7-e:
Invariant-mass distribution for the $\mu \mu \mathrm{KK}$ system used to reconstruct the $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ control sample. The plot shows the 2016A central-region channel. The data are shown by solid black circles, the result of the fit is overlaid as the black line, and the different components are indicated by the hatched regions.

Additional Figure 7-f:
Invariant-mass distribution for the $\mu \mu \mathrm{KK}$ system used to reconstruct the $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ control sample. The plot shows the 2016A forward-region channel. The data are shown by solid black circles, the result of the fit is overlaid as the black line, and the different components are indicated by the hatched regions.

Additional Figure 7-g:
Invariant-mass distribution for the $\mu \mu \mathrm{KK}$ system used to reconstruct the $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ control sample. The plot shows the 2016B central-region channel. The data are shown by solid black circles, the result of the fit is overlaid as the black line, and the different components are indicated by the hatched regions.

Additional Figure 7-h:
Invariant-mass distribution for the $\mu \mu \mathrm{KK}$ system used to reconstruct the $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ control sample. The plot shows the 2016B forward-region channel. The data are shown by solid black circles, the result of the fit is overlaid as the black line, and the different components are indicated by the hatched regions.

Additional Figure 8:
Expected mass distributions from MC simulations for a combination of all rare processes (left), of all rare semileptonic decays (middle), and of rare two-body hadronic background components (right), corresponding to the sum of all categories of the high-BDT mass plot.

Additional Figure 8-a:
Expected mass distribution from MC simulations for a combination of all rare processes, corresponding to the sum of all categories of the high-BDT mass plot.

Additional Figure 8-b:
Expected mass distribution from MC simulations for a combination of all rare semileptonic decays, corresponding to the sum of all categories of the high-BDT mass plot.

Additional Figure 8-c:
Expected mass distribution from MC simulations for a combination of rare two-body hadronic background components, corresponding to the sum of all categories of the high-BDT mass plot.

Additional Figure 9:
Comparison of measured and simulated distributions of the flight length significance in the central-region channel. From left to right, the panels show the distributions for 2016A $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$, 2016A $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $, 2016B $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$, and 2016B $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 9-a:
Comparison of measured and simulated distributions of the flight length significance in the central-region channel. The plot shows the distribution for 2016A $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates.

Additional Figure 9-b:
Comparison of measured and simulated distributions of the flight length significance in the central-region channel. The plot shows the distribution for 2016A $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 9-c:
Comparison of measured and simulated distributions of the flight length significance in the central-region channel. The plot shows the distribution for 2016B $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates.

Additional Figure 9-d:
Comparison of measured and simulated distributions of the flight length significance in the central-region channel. The plot shows the distribution for 2016B $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 10:
Comparison of measured and simulated distributions of the pointing angle (defined in the main text) in the central-region channel. From left to right, the panels show the distributions for 2016A $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$, 2016A $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $, 2016B $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$, and 2016B $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 10-a:
Comparison of measured and simulated distributions of the pointing angle (defined in the main text) in the central-region channel. The plot shows the distribution for 2016A $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates.

Additional Figure 10-b:
Comparison of measured and simulated distributions of the pointing angle (defined in the main text) in the central-region channel. The plot shows the distribution for 2016A $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 10-c:
Comparison of measured and simulated distributions of the pointing angle (defined in the main text) in the central-region channel. The plot shows the distribution for 2016B $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates.

Additional Figure 10-d:
Comparison of measured and simulated distributions of the pointing angle (defined in the main text) in the central-region channel. The plot shows the distribution for 2016B $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 11:
Comparison of measured and simulated distributions of the number of tracks close to the secondary vertex in the central-region channel. From left to right, the panels show the distributions for 2016A $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$, 2016A $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $, 2016B $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$, and 2016B $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 11-a:
Comparison of measured and simulated distributions of the number of tracks close to the secondary vertex in the central-region channel. The plot shows the distribution for 2016A $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates.

Additional Figure 11-b:
Comparison of measured and simulated distributions of the number of tracks close to the secondary vertex in the central-region channel. The plot shows the distribution for 2016A $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 11-c:
Comparison of measured and simulated distributions of the number of tracks close to the secondary vertex in the central-region channel. The plot shows the distribution for 2016B $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates.

Additional Figure 11-d:
Comparison of measured and simulated distributions of the number of tracks close to the secondary vertex in the central-region channel. The plot shows the distribution for 2016B $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 12:
Comparison of measured and simulated distributions of the subleading muon $p_{\rm T}$ in the central-region channel. From left to right, the panels show the distributions for 2016A $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$, 2016A $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $, 2016B $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$, and 2016B $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 12-a:
Comparison of measured and simulated distributions of the subleading muon $p_{\rm T}$ in the central-region channel. The plot shows the distribution for 2016A $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates.

Additional Figure 12-b:
Comparison of measured and simulated distributions of the subleading muon $p_{\rm T}$ in the central-region channel. The plot shows the distribution for 2016A $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 12-c:
Comparison of measured and simulated distributions of the subleading muon $p_{\rm T}$ in the central-region channel. The plot shows the distribution for 2016B $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates.

Additional Figure 12-d:
Comparison of measured and simulated distributions of the subleading muon $p_{\rm T}$ in the central-region channel. The plot shows the distribution for 2016B $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 13:
Comparison of measured and simulated distributions of the muon helicity angle for 2016A (left) and 2016B (right) $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates in the central-region channel.

Additional Figure 13-a:
Comparison of measured and simulated distributions of the muon helicity angle for 2016A $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates in the central-region channel.

Additional Figure 13-b:
Comparison of measured and simulated distributions of the muon helicity angle for 2016B $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates in the central-region channel.

Additional Figure 14:
Comparison of measured and simulated distributions of the $\mathrm{B}$ meson proper decay time in the central-region channel. From left to right, the panels show the distributions for 2016A $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$, 2016A $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $, 2016B $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$, and 2016B $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 14-a:
Comparison of measured and simulated distributions of the $\mathrm{B}$ meson proper decay time in the central-region channel. The plot shows the distribution for 2016A $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates.

Additional Figure 14-b:
Comparison of measured and simulated distributions of the $\mathrm{B}$ meson proper decay time in the central-region channel. The plot shows the distribution for 2016A $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 14-c:
Comparison of measured and simulated distributions of the $\mathrm{B}$ meson proper decay time in the central-region channel. The plot shows the distribution for 2016B $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates.

Additional Figure 14-d:
Comparison of measured and simulated distributions of the $\mathrm{B}$ meson proper decay time in the central-region channel. The plot shows the distribution for 2016B $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 15:
Illustration of the analysis BDT discriminator distributions in background-subtracted data and MC simulation in the central channel. In the top row, from left to right, the panels show the distributions for 2011 $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$, 2011 $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $, 2012 $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$, and 2012 $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates. In the bottom row, from left to right, the panels show the distributions for 2016A $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$, 2016A $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $, 2016B $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$, and 2016B $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 15-a:
Illustration of the analysis BDT discriminator distributions in background-subtracted data and MC simulation in the central channel. The plot shows the distribution for 2011 $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates.

Additional Figure 15-b:
Illustration of the analysis BDT discriminator distributions in background-subtracted data and MC simulation in the central channel. The plot shows the distribution for 2011 $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 15-c:
Illustration of the analysis BDT discriminator distributions in background-subtracted data and MC simulation in the central channel. The plot shows the distribution for 2012 $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates.

Additional Figure 15-d:
Illustration of the analysis BDT discriminator distributions in background-subtracted data and MC simulation in the central channel. The plot shows the distribution for 2012 $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 15-e:
Illustration of the analysis BDT discriminator distributions in background-subtracted data and MC simulation in the central channel. The plot shows the distribution for 2016A $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates.

Additional Figure 15-f:
Illustration of the analysis BDT discriminator distributions in background-subtracted data and MC simulation in the central channel. The plot shows the distribution for 2016A $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 15-g:
Illustration of the analysis BDT discriminator distributions in background-subtracted data and MC simulation in the central channel. The plot shows the distribution for 2016B $\mathrm{B}^{+}\to \mathrm{J}/\psi \mathrm{K}^{+}$ candidates.

Additional Figure 15-h:
Illustration of the analysis BDT discriminator distributions in background-subtracted data and MC simulation in the central channel. The plot shows the distribution for 2016B $\mathrm{B}^{0}_{\mathrm{s}}\to \mathrm{J}/\psi \phi $ candidates.

Additional Figure 16:
Illustration of the analysis BDT discriminator distribution in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. In the top row, from left to right, the panels show the distributions for 2011 central, 2011 forward, 2012 central, and 2012 forward events. In the bottom row, from left to right, the panels show the distributions for 2016A central, 2016A forward, 2016B central, and 2016B forward events. The arrows show the BDT discriminator boundaries.

Additional Figure 16-a:
Illustration of the analysis BDT discriminator distribution in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2011 central events. The arrows show the BDT discriminator boundaries.

Additional Figure 16-b:
Illustration of the analysis BDT discriminator distribution in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2011 forward events. The arrows show the BDT discriminator boundaries.

Additional Figure 16-c:
Illustration of the analysis BDT discriminator distribution in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2012 central events. The arrows show the BDT discriminator boundaries.

Additional Figure 16-d:
Illustration of the analysis BDT discriminator distribution in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2012 forward events. The arrows show the BDT discriminator boundaries.

Additional Figure 16-e:
Illustration of the analysis BDT discriminator distribution in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2016A central events. The arrows show the BDT discriminator boundaries.

Additional Figure 16-f:
Illustration of the analysis BDT discriminator distribution in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2016A forward events. The arrows show the BDT discriminator boundaries.

Additional Figure 16-g:
Illustration of the analysis BDT discriminator distribution in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2016B central events. The arrows show the BDT discriminator boundaries.

Additional Figure 16-h:
Illustration of the analysis BDT discriminator distribution in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2016B forward events. The arrows show the BDT discriminator boundaries.

Additional Figure 17:
Illustration of the analysis BDT discriminator distribution on a logarithmic scale in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. In the top row, from left to right, the panels show the distributions for 2011 central, 2011 forward, 2012 central, and 2012 forward events. In the bottom row, from left to right, the panels show the distributions for 2016A central, 2016A forward, 2016B central, and 2016B forward events. The arrows show the BDT discriminator boundaries.

Additional Figure 17-a:
Illustration of the analysis BDT discriminator distribution on a logarithmic scale in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2011 central events. The arrows show the BDT discriminator boundaries.

Additional Figure 17-b:
Illustration of the analysis BDT discriminator distribution on a logarithmic scale in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2011 forward events. The arrows show the BDT discriminator boundaries.

Additional Figure 17-c:
Illustration of the analysis BDT discriminator distribution on a logarithmic scale in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2012 central events. The arrows show the BDT discriminator boundaries.

Additional Figure 17-d:
Illustration of the analysis BDT discriminator distribution on a logarithmic scale in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2012 forward events. The arrows show the BDT discriminator boundaries.

Additional Figure 17-e:
Illustration of the analysis BDT discriminator distribution on a logarithmic scale in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2016A central events. The arrows show the BDT discriminator boundaries.

Additional Figure 17-f:
Illustration of the analysis BDT discriminator distribution on a logarithmic scale in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2016A forward events. The arrows show the BDT discriminator boundaries.

Additional Figure 17-g:
Illustration of the analysis BDT discriminator distribution on a logarithmic scale in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2016B central events. The arrows show the BDT discriminator boundaries.

Additional Figure 17-h:
Illustration of the analysis BDT discriminator distribution on a logarithmic scale in dimuon background data from the $5.45 < m_{\mu^{+}\mu^{-}} < 5.9$ GeV sideband and in $\mathrm{B}^{0}_{\mathrm{s}}\to \mu^{+}\mu^{-}$ signal MC simulation. The plot shows the distribution for 2016B forward events. The arrows show the BDT discriminator boundaries.
References
1 C. Bobeth et al. $ \mathrm{B_{s,d}^0 \to \ell^{+}\ell^{-} } $ in the standard model with reduced theoretical uncertainty PRL 112 (2014) 101801 1311.0903
2 C. Bobeth, M. Gorbahn, and E. Stamou Electroweak corrections to $ \mathrm{B_{s,d}^0 \to \ell^{+}\ell^{-} } $ PRD 89 (2014) 034023 1311.1348
3 T. Hermann, M. Misiak, and M. Steinhauser Three-loop QCD corrections to $ \mathrm{B_{s}^0 \to \mu^{+}\mu^{-} } $ JHEP 12 (2013) 097 1311.1347
4 M. Beneke, C. Bobeth, and R. Szafron Enhanced electromagnetic correction to the rare B-meson decay $ \mathrm{B_{s,d}^0 \to \mu^{+}\mu^{-} } $ PRL 120 (2018) 011801 1708.09152
5 M. Beneke, C. Bobeth, and R. Szafron Power-enhanced leading-logarithmic QED corrections to $ \mathrm{B_{q}^0 \to \mu^{+}\mu^{-} } $ 1908.07011
6 Flavour Lattice Averaging Group Collaboration FLAG Review 2019 1902.08191
7 Fermilab Lattice and MILC Collaboration B- and D-meson leptonic decay constants from four-flavor lattice QCD PRD 98 (2018) 074512 1712.09262
8 ETM Collaboration Mass of the b quark and B meson decay constants from N$ _f = $ 2+1+1 twisted-mass lattice QCD PRD 93 (2016) 114505 1603.04306
9 HPQCD Collaboration B-Meson decay constants from improved lattice nonrelativistic QCD with physical u, d, s, and c quarks PRL 110 (2013) 222003 1302.2644
10 C. Hughes, C. T. H. Davies, and C. J. Monahan New methods for B meson decay constants and form factors from lattice NRQCD PRD 97 (2018) 054509 1711.09981
11 CMS Collaboration Measurement of the $ \mathrm{B_{s}^0 \to \mu^{+}\mu^{-} }\ $ branching fraction and search for $ \mathrm{B_{d}^0 \to \mu^{+}\mu^{-} }\ $ with the CMS Experiment PRL 111 (2013) 101804 CMS-BPH-13-004
1307.5025
12 CMS and LHCb Collaborations Observation of the rare $ \mathrm{B_{s}^0 \to \mu^{+}\mu^{-} } $ decay from the combined analysis of CMS and LHCb data Nature 522 (2015) 68 1411.4413
13 LHCb Collaboration Measurement of the $ \mathrm{B_{s}^0 \to \mu^{+}\mu^{-} } $ branching fraction and effective lifetime and search for $ \mathrm{B_{d}^0 \to \mu^{+}\mu^{-} } $ decays PRL 118 (2017) 191801 1703.05747
14 ATLAS Collaboration Study of the rare decays of $ \mathrm{B_{s}^0 } $ and $ \mathrm{B^0 } $ mesons into muon pairs using data collected during 2015 and 2016 with the ATLAS detector JHEP 04 (2019) 098 1812.03017
15 HFLAV Collaboration Averages of b-hadron, c-hadron, and $ \tau $-lepton properties as of summer 2016 EPJC 77 (2017) 895 1612.07233
16 Particle Data Group, M. Tanabashi et al. Review of particle physics PRD 98 (2018) 030001
17 K. De Bruyn et al. Probing new physics via the $ \mathrm{B_{s}^0 \to \mu^{+}\mu^{-} } $ effective lifetime PRL 109 (2012) 041801 1204.1737
18 K. De Bruyn et al. Branching ratio measurements of $ \mathrm{B_{s}^0 } $ decays PRD 86 (2012) 014027 1204.1735
19 LHCb Collaboration Measurement of the fragmentation fraction ratio $ f_{\mathrm{s}}/f_{\mathrm{d}} $ and its dependence on B meson kinematics JHEP 04 (2013) 001 1301.5286
20 ATLAS Collaboration Determination of the ratio of b-quark fragmentation fractions $ f_{\mathrm{s}}/f_{\mathrm{d}} $ in pp collisions at $ \sqrt{s}= $ 7 TeV with the ATLAS detector PRL 115 (2015) 262001 1507.08925
21 LHCb Collaboration Measurement of b hadron fractions in 13 TeV pp collisions PRD 100 (2019) 031102 1902.06794
22 M. Pivk and F. R. Le Diberder sPlot: a statistical tool to unfold data distributions NIMA 555 (2005) 356 physics/0402083
23 A. Khodjamirian, C. Klein, T. Mannel, and Y. M. Wang Form factors and strong couplings of heavy baryons from QCD light-cone sum rules JHEP 09 (2011) 106 1108.2971
24 T. Sjostrand, S. Mrenna, and P. Z. Skands PYTHIA 6.4 physics and manual JHEP 05 (2006) 026 hep-ph/0603175
25 T. Sjostrand et al. An introduction to PYTHIA 8.2 CPC 191 (2015) 159 1410.3012
26 D. J. Lange The EvtGen particle decay simulation package NIMA 462 (2001) 152
27 P. Golonka and Z. Was PHOTOS Monte Carlo: a precision tool for QED corrections in Z and W decays EPJC 45 (2006) 97 hep-ph/0506026
28 N. Davidson, T. Przedzinski, and Z. Was PHOTOS interface in C++: technical and physics documentation CPC 199 (2016) 86 1011.0937
29 GEANT4 Collaboration GEANT4: a simulation toolkit NIMA 506 (2003) 250
30 CMS Collaboration The CMS experiment at the CERN LHC JINST 3 (2008) S08004 CMS-00-001
31 CMS Collaboration CMS tracking performance results from early LHC operation EPJC 70 (2010) 1165 CMS-TRK-10-001 1007.1988
32 CMS Collaboration Tracking POG results for pion efficiency with the $ \mathrm{D^{*+}} $ meson using data from 2016 and 2017 CDS
33 CMS Collaboration Performance of CMS muon reconstruction in pp collision events at $ \sqrt{s}= $ 7 TeV JINST 7 (2012) P10002 CMS-MUO-10-004 1206.4071
34 CMS Collaboration Performance of the CMS muon detector and muon reconstruction with proton-proton collisions at $ \sqrt{s} = $ 13 TeV JINST 13 (2018) P06015 CMS-MUO-16-001 1804.04528
35 CMS Collaboration The CMS trigger system JINST 12 (2017) P01020 CMS-TRG-12-001 1609.02366
36 H. Voss, A. Hocker, J. Stelzer, and F. Tegenfeldt TMVA, the toolkit for multivariate data analysis with ROOT in XIth International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT), 2007, p. 40 [PoS(ACAT)040] physics/0703039
37 CMS Collaboration Description and performance of track and primary-vertex reconstruction with the CMS tracker JINST 9 (2014) P10009 CMS-TRK-11-001 1405.6569
38 CMS Collaboration Measurement of b-hadron lifetimes in pp collisions at $ \sqrt{s} = $ 8 TeV EPJC 78 (2018) 457 CMS-BPH-13-008 1710.08949
39 M. J. Oreglia A study of the reactions $\psi' \to \gamma\gamma \psi$ PhD thesis, Stanford University, 1980 SLAC Report SLAC-R-236, see A
40 K. S. Cranmer Kernel estimation in high-energy physics CPC 136 (2001) 198 hep-ex/0011057
41 S. S. Wilks The large-sample distribution of the likelihood ratio for testing composite hypotheses Annals Math. Statist. 9 (1938) 60
42 G. J. Feldman and R. D. Cousins A unified approach to the classical statistical analysis of small signals PRD 57 (1998) 3873 physics/9711021
43 A. L. Read Presentation of search results: the $ \mathrm{CL_s} $ technique JPG 28 (2002) 2693
44 T. Junk Confidence level computation for combining searches with small statistics NIMA 434 (1999) 435 hep-ex/9902006
45 CMS Collaboration Measurement of the $ \Lambda_{\mathrm{b}}^0 $ lifetime in pp collisions at $ \sqrt{s} = $ 7 TeV JHEP 07 (2013) 163 CMS-BPH-11-013 1304.7495
46 F. E. James Statistical methods in experimental physics World Scientific, Singapore
47 J. Neyman Outline of a theory of statistical estimation based on the classical theory of probability Phil. Trans. Roy. Soc. Lond. A 236 (1937) 333