Principal Component Analysis in ROOT

Introduction

Principal Component Analysis (PCA) is a nifty way to reduce the amount of data you need to crunch through when doing physics analysis, which is often a hairy business.

Documentation

You can find some documentation here. There's also a PostScript document available.

The Algorithms

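The covariance matrix and the mean values of the input data are calculated on the fly by the following equations, where x_{in} is the value of variable i in the n-th data point:

$$\langle x_i \rangle^{(1)} = x_{i1}$$
$$\langle x_i \rangle^{(n)} = \langle x_i \rangle^{(n-1)} + \frac{1}{n}\left(x_{in} - \langle x_i \rangle^{(n-1)}\right)$$

$$C_{ij}^{(1)} = 0$$
$$C_{ij}^{(n)} = C_{ij}^{(n-1)} + \frac{1}{n-1}\left[\left(x_{in} - \langle x_i \rangle^{(n)}\right)\left(x_{jn} - \langle x_j \rangle^{(n)}\right)\right] - \frac{1}{n} C_{ij}^{(n-1)}$$

This is a fast, single-pass method that needs no second loop over the data and accumulates very little rounding error (please refer to CERN 72-21, pp. 54-106).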
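To make the recurrences concrete, here is a minimal sketch in plain Python. It is only an illustration of the update equations above, not the code used in the ROOT class; the function name `running_mean_covariance` and its interface are made up for this example. Worked through by hand, the recurrence as written gives the covariance normalised by 1/n (not 1/(n-1)).

```python
def running_mean_covariance(rows):
    """Single-pass means and covariance matrix (hypothetical helper).

    rows: iterable of equal-length sequences, one data point per row.
    Returns (mean, cov), with cov normalised by 1/n.
    """
    mean = None
    cov = None
    n = 0
    for row in rows:
        n += 1
        if n == 1:
            # <x_i>(1) = x_i1 and C_ij(1) = 0
            mean = [float(x) for x in row]
            p = len(mean)
            cov = [[0.0] * p for _ in range(p)]
            continue
        # <x_i>(n) = <x_i>(n-1) + (x_in - <x_i>(n-1)) / n
        mean = [m + (x - m) / n for m, x in zip(mean, row)]
        # Deviations are taken from the *updated* means <x_i>(n)
        dev = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                # C_ij(n) = C_ij(n-1) + dev_i*dev_j/(n-1) - C_ij(n-1)/n
                cov[i][j] += dev[i] * dev[j] / (n - 1) - cov[i][j] / n
    return mean, cov
```

For the three data points (1, 2), (3, 6), (5, 4) this gives the means (3, 4) and the covariance matrix [[8/3, 4/3], [4/3, 8/3]], which is what you get from the two-pass textbook formula with a 1/n normalisation.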

Numerical Recipes

The symmetric matrix C that is generated from the data points is then tridiagonalised using the Householder algorithm, as presented in Numerical Recipes in C.
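The idea of the Householder step can be sketched as follows. This is a deliberately naive plain-Python version that builds each reflection as a full matrix and multiplies it out, which is much slower than the in-place routine in Numerical Recipes and is not the code used here; the names `matmul` and `householder_tridiagonalise` are made up for this example.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def householder_tridiagonalise(c):
    """Return a tridiagonal matrix similar to the symmetric matrix c."""
    n = len(c)
    a = [row[:] for row in c]
    for k in range(n - 2):
        # Part of column k below the sub-diagonal position
        x = [a[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already in the required form
        # Sign chosen to avoid cancellation in v[0]
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(t * t for t in v))
        v = [t / norm_v for t in v]
        # P = identity, with the lower-right block replaced by I - 2 v v^T
        p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        for i in range(len(v)):
            for j in range(len(v)):
                p[k + 1 + i][k + 1 + j] -= 2.0 * v[i] * v[j]
        # P is symmetric and orthogonal, so P A P is a similarity transform
        a = matmul(matmul(p, a), p)
    return a
```

Since every step is a similarity transform, the result has the same eigenvalues (and hence the same trace) as C; only the elements outside the three central diagonals are driven to zero, up to rounding.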

Finally, using a QL decomposition with implicit shifts, as presented in Numerical Recipes in C, the eigenvalues and eigenvectors of the covariance matrix C are found.
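For the curious, the QL iteration with implicit shifts can be sketched like this for a symmetric tridiagonal matrix. It follows the usual textbook formulation of the algorithm, computes the eigenvalues only (the eigenvector accumulation is left out to keep it short), and is not the code used in the ROOT class; the function name `tridiagonal_eigenvalues` and its arguments are made up for this example.

```python
import math


def tridiagonal_eigenvalues(diag, off, max_iter=30):
    """Eigenvalues of a symmetric tridiagonal matrix, sorted ascending.

    diag: the n diagonal elements.
    off:  the n-1 off-diagonal elements; off[i] couples rows i and i+1.
    """
    n = len(diag)
    d = [float(t) for t in diag]
    e = [float(t) for t in off] + [0.0]
    for l in range(n):
        iterations = 0
        while True:
            # Look for a negligible off-diagonal element to split the matrix
            m = l
            while m < n - 1:
                dd = abs(d[m]) + abs(d[m + 1])
                if abs(e[m]) + dd == dd:
                    break
                m += 1
            if m == l:
                break  # d[l] has converged to an eigenvalue
            iterations += 1
            if iterations > max_iter:
                raise RuntimeError("too many QL iterations")
            # Form the implicit shift
            g = (d[l + 1] - d[l]) / (2.0 * e[l])
            r = math.hypot(g, 1.0)
            g = d[m] - d[l] + e[l] / (g + math.copysign(r, g))
            s = c = 1.0
            p = 0.0
            underflow = False
            # Plane rotations, followed by rotations restoring tridiagonal form
            for i in range(m - 1, l - 1, -1):
                f = s * e[i]
                b = c * e[i]
                r = math.hypot(f, g)
                e[i + 1] = r
                if r == 0.0:
                    # Recover from underflow and restart this iteration
                    d[i + 1] -= p
                    e[m] = 0.0
                    underflow = True
                    break
                s = f / r
                c = g / r
                g = d[i + 1] - p
                r = (d[i] - g) * s + 2.0 * c * b
                p = s * r
                d[i + 1] = g + p
                g = c * r - b
            if underflow:
                continue
            d[l] -= p
            e[l] = g
            e[m] = 0.0
    return sorted(d)
```

As a quick sanity check, the 2x2 matrix with diagonal (2, 2) and off-diagonal element 1 has the eigenvalues 1 and 3, and the sum of the returned eigenvalues always equals the trace of the input matrix up to rounding.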

The ROOT Class

Older stuff - LINTRA

You can get the LINTRA code here in a tar-ball with one big flat Fortran (4) source file plus example code. You need ZEBRA, HBOOK, and FFREAD from CERNLIB to compile this program/library.

For nostalgic reasons, the LINTRA code is also available here as a tar-ball with a PATCHY archive (.car) and the same example code as above.

In the latter case, you need PATCHY (Release notes) to unpack the source files. You can find PATCHY on CERN's FTP server:

            ftp://asisftp.cern.ch/cernlib/<arch>/patchy/4.15/bin/
          

You also need kernlib and packlib from CERNLIB to compile LINTRA. You can get it from:

            ftp://asisftp.cern.ch/cernlib/<arch>/2000/
          

(The above links will take you to the Red Hat 6.1 Linux architecture; substitute your own architecture for <arch>.)

Interesting Links:

Christian Holm Christensen