Things to Know About Database
Before you read any further:
This is the view
of the database from the end point of the user who is
not the Oracle expert, language and terminology are not what database
experts use.
This page assumes that you are somewhat familiar
with C++ and AC++ framework, and words like SQL and database do not
make you feel uncomfortable. There is an excellent CDF note 5088 that
you should also read before proceeding any further. In my experience if
you want to fully understand and control how AC++ job uses database,
you should always look into this note on page 8, figure 2, and been
able to imagine what really happens in the database.
Where to read/write
There are 4 databases (DB) for CDF calibration
purposes: integration online (cdfonint) and offline (cdfofint) exist
for development purposes. Production online (cdfonprd) and offline
(cdfofprd) are the databases used for production. Offline databases
have write access disabled, which means that users can not write data
into those. Online databases are behind a very restrictive firewall
which allows access only from online machines and fcdflnx2.
Unlike offline databases, online ones allow write access by the users
(that is why they are behind firewall). Data is streamlined from
online DBs to offline ones. Therefor the normal mode of operation is to
commit data into online DB, wait for several minutes while your data
propagates into offline DB. Once it happens, you can fire up AC++ job
and use the tables you have just committed. Remember to make sure that
CalibrationManager looks into the offline DB, otherwise you are not
going to access the tables on online DBs because of the firewall
(unless you are running on online machines or fcdflnx2).
Access to calib
tables
There are 3 major ways to access calibrations in the
database:
1:
By using USED_SETS (or "unkyed get"): it happens
when you use detDB_mgr.get(detDB_data), where detDB_mgr is the Manager
object that you create like: detDB_mgr = Manager<...>
(tableName), and detDB_data is the storage container for you data. You
must understand that underneath get() method actually SQL command is
executed, and all parameters (except for the table name) like database
name, etc. are determined by CalibrationManager. This is the standard
when production runs. However, you must be aware how used are created
and propagated, and understand how to set CalibrationManager
accordingly. There is an excellent web page explaining the USED_SETS
mechanism:
2:
Use of "keyed" get. You can also control what table
you want to extract: by using detDB_mgr.get(fkey,detDB_data) method.
fkey is the object of type RunListKey that carries all necessary
information about the table you want:
RunListKey
fkey(status,tableName, algorithm, Run(runNumber),
Version(Version::latest) )
If you look at the fields that are provided for fkey
you will realize that they correspond to the fields in CALIBRUNLISTS
tables and, in fact, uniquely identify data you want. If the DB manager
is constructed the same way is in example 1, the database name is still
under the control of CalibrationManager. But if you do something like:
detDB_mgr =
Manager<...> (database, tableName), the database will
be chosen by you.
3:
Third, and the least obvious way is to write your
own SQL and use C++ interface to the database to extract data.
EMTiming
Calibration scheme.
There are 10 total tables for EMTiming system, they
all start with TDC. Each table is uniquely identified by CID number,
and by a combination of 4 other fields in CALIBRUNLISTS table:
CALIB_RUN, CALIB_VESRION, CALIB_TABLE, and DATA_STATUS.
CALIB_TABLE is the name of the calibration table,
something like TDCCEMSLEWING for EMTiming system.
DATA_STATUS can be COMPLETE, TEST, RAW, and GOOD. I
am not sure where this field is used at all.
CALIB_VERSION is any integer number
CALIB_RUN is any integer number
ALGORITHM is the filed that can be any string
The scheme:
- CALIB_RUN for each table is the actually run number starting from
which this table is valid
- CALIB_VERSION can be any number, but tables with the highest
version number should be used
- ALGORITHM is integer number that specifies what reconstruction
algorithm should be applied.
- Once 2 tables with CALIB_RUN=run1 and CALIB_RUN=run2 are
committed, no table can be committed with the run1<CALIB_RUN<run2
and same other 3 fields (CALIB_VESRION, DATA_STATUS, and ALGORITHM)
Those 3 simple
rules make development and testing of the system quite
flexible and powerful.
- If you are reconstructing data for run=RunNumber, then you
know that the table with the highest CALIB_RUN <= RunNumber must be
used.
- If you made a mistake in creating a table or came up with
the table that does better reconstruction job, just bump up the
CALIB_VERSION by one.
- If in the future you realize that the reconstruction must
be changed, change the algorithm number and modify your code
accordingly.
Normally one has to specify the algorithm version
and DATA_STATUS for AC++ jobs, and the appropriate table with the
highest CALIB_RUN will be chosen. You can also set the particular
CALIB_VESRION if you want to use only tables with this version.
Also these scheme allows writing the code with the
minimal access to the database. Normally in the beginning of each run
all systems access the calibrations for this run. Now imagine that you
stripped few events from each run for ALL data CDF has ever taken and
want to run AC++ job on it. You will end up accessing the database
constantly. If, however, the calibrations do not change very often,
then when the table for particular CALIB_RUN is accessed, you KNOW
right away at what run the calibration data is changed in the future:
just find the table with MIN(CALIB_RUN) > this CALIB_RUN.