Note 8015

   The note describes the new technique I developed for vertexing. The idea is to use
EM clustering technique to perform clustering in two dimensions:
space (Z position of the track) and time. You can find the description of the EM techniques,
resolutions for tracks, resolution for vertexes, efficiencies for vertexes, ... .


The code is available: ClusteringModule.cc and ClusteringModule.hh
The code is completely independent from any CDF software and can be used from inside any piece of code. It is a general implementation of the EM (Expectation Minimization) clustering technique in multi-dimensions (3 maximum at the moment) and one can cluster, for example, the penguin population at the south pole.

Here is how I use in from inside of standard ntuple to cluster tracks in space-time. Space is the Z position of the track, and time is calculated from the COT hits.

Compile it:
gSystem->CompileMacro("ClusteringModule.cc","k");

In the .hh of my analysis module:
#include "ClusteringModule.hh"
ClusteringModuleCl3d _clmod;

Once in the constructor or beginning of the job:
 _clmod.setModulePars(6,2.4,0,  2, 0.,  1.,0.6,0.);
  _clmod.setSeedStrategy(ClusteringModuleCl3d::Combined);
  float lowleft[3] = {-120.,-16,0.};
  float ritop[3] = {120.,16,0.};
  _clmod.findRoughClusters(lowleft, ritop);
!!! If this part is executed for every event, you will end up with a memory leak !!!

setModulePars(6,2.4,0,  2, 0.,  1.,0.6,0.) - first three parameters determine the grid size for the initial seeds, first is Z position, second one is time, and third dimension is not used. Fourth parameter defines the number of dimensions, 2 here for space and time. Fifth parameter (0) determines the minimum tracks SumPt for the cluster. Last three parameters determine the a priori resolution in each dimension - position, time, and the last dimension is not used.
_clmod.findRoughClusters(lowleft, ritop) - determines the total size of each dimension - 120 cm corresponds to Z position, and 16 ns corresponds to time.

In the event:
 _clmod.wipeClean(); //important - clean up from the last event
  const int ne = 1000; // the size for the total number of tracks

//start iterating over tracks and put information into the clustering object
  int ntrk = 0;
  for(int ie=0; ie< ne, ie < fTrackBlock->NTracks(); ie++) {
    TStnTrack* trk = (fTrackBlock->Track(ie));
    TMatrix55* cov = trk->Cov();
    float z0err = sqrt( (*cov)(2,2) );
    //select tracks that you think should be used //continue;
   
    float trpars[3] = {trk->Z0(),trk->CotT0(),0.};
    float tperrs[3] = {z0err,trk->CotT0Err(),0.};
    PointCl3d point(&trpars[0],&tperrs[0]);
    point.setObject(trk);
    point.setWeight(trk->Pt());
    _clmod.addEventPoint(&point);
    ntrk++;
    if(ntrk >= ne) break;
  }
  _clmod.getIntoClustering(); //here the clustering is done

Get Results:
 _clmod.orderBySumPWeights(); //order clusters by the maximum weight - SumPt in our case
//many other orderings exist, look for them in the beginning of .cc file or ask me
  for(int i = 0; i < _clmod.clusters()->size(); i++){
    ClusterCl3d* cl = _clmod.cluster(i);
     cl->mean(0);                // z position of the cluster determined by EM algorithm
    cl->mean(1);                 // time
    cl->sigma(0);               //z RMS
    cl->sigma(1);               // time RMS
    cl->sumWeights();       // tracks SumPt
    cl->nPoints();               //number of tracks
    cl->weight();                // weight as determined by EM algorithm !!! do not confuse with SumPt
    (cl->pointsMean())[0]; // mean position in Z calculated from tracks
    (cl->pointsMean())[1]; // mean position in time calculated from tracks
    (cl->pointsRms())[0];  // RMS
    (cl->pointsRms())[1];  // RMS
 }

I hope that all possible information one needs to know is accessible. If you need something that you think is not provided, please ask me.



Max Goncharov, maxi@fnal.gov.