REST-for-Physics
v2.3
Rare Event Searches ToolKit for Physics
It groups a number of runs that satisfy given metadata conditions.
This class selects the ROOT data files that fulfill certain metadata conditions, creating a group of files that defines a particular dataset. The files are searched in a relative or absolute path that is given together with the filePattern parameter.
The range of dates in which files are accepted is defined using the startTime and endTime parameters. The run start time and end time stored inside TRestRun are evaluated to decide whether a file should be considered.
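As a rough illustration of this date selection, the following C++ sketch (standard library only, not REST code) keeps the runs that lie inside the requested window and accumulates their duration. The RunInfo struct and both functions are hypothetical names, and treating the window limits as inclusive is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the start/end timestamps that TRestRun stores.
struct RunInfo {
    std::string fileName;
    double startTime;  // run start, in seconds
    double endTime;    // run end, in seconds
};

// A run is kept only if it lies entirely inside [windowStart, windowEnd].
// Inclusive limits are an assumption of this sketch.
bool InsideTimeWindow(const RunInfo& run, double windowStart, double windowEnd) {
    return run.startTime >= windowStart && run.endTime <= windowEnd;
}

// Returns the accepted runs and stores their summed duration in totalDuration.
std::vector<RunInfo> SelectRuns(const std::vector<RunInfo>& runs, double windowStart,
                                double windowEnd, double& totalDuration) {
    std::vector<RunInfo> selected;
    totalDuration = 0;
    for (const auto& run : runs) {
        if (!InsideTimeWindow(run, windowStart, windowEnd)) continue;
        selected.push_back(run);
        totalDuration += run.endTime - run.startTime;
    }
    return selected;
}
```

The accumulated value plays a role similar to the total integrated run time that the dataset keeps for the selected files.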
A summary of the basic parameters follows:
filePattern: A glob file pattern that must be satisfied by all files.
startTime: Only files with a run start time after startTime will be considered.
endTime: Only files with a run end time before endTime will be considered.
Rules may be added for any metadata class existing inside the ROOT data files. For this purpose, the filter key defines the metadata member on which the rules are evaluated. Its metadata field specifies the class name, or the user-given metadata name, together with the metadata member to be accessed. The metadata member must be named using the conventions defined inside the methods TRestRun::ReplaceMetadataMember and TRestRun::ReplaceMetadataMembers.
Three optional fields can be used to apply the rule:
Example of metadata rule:
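The logic of such a rule can be illustrated with a minimal C++ sketch that is independent of REST. It assumes the optional fields are named contains, greaterThan and lowerThan, after the fFilterContains, fFilterGreaterThan and fFilterLowerThan members; the FilterRule struct and the PassesRule function are hypothetical names introduced here, not part of the REST API.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical representation of one filter rule; unset conditions are ignored.
struct FilterRule {
    std::string contains;         // empty: condition not applied
    bool useGreaterThan = false;  // true: apply the greaterThan condition
    double greaterThan = 0;
    bool useLowerThan = false;    // true: apply the lowerThan condition
    double lowerThan = 0;
};

// Evaluates the rule against a metadata member value given as text.
// Numeric conditions require the value to be convertible by std::stod.
bool PassesRule(const FilterRule& rule, const std::string& value) {
    if (!rule.contains.empty() && value.find(rule.contains) == std::string::npos) return false;
    if (rule.useGreaterThan || rule.useLowerThan) {
        double number = 0;
        try {
            number = std::stod(value);
        } catch (const std::exception&) {
            return false;  // a non-numeric value cannot satisfy a numeric condition
        }
        if (rule.useGreaterThan && !(number > rule.greaterThan)) return false;
        if (rule.useLowerThan && !(number < rule.lowerThan)) return false;
    }
    return true;
}
```

In this sketch a file would be accepted only when every condition that is set holds, and a rule with no condition set accepts any value.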
Once the files that fulfill the given dates, filename pattern and metadata rules have been identified, the initialization produces a ROOT::RDataFrame instance and a TTree instance that give access to the unified analysis tree. The columns or branches available in those instances are defined by the user inside this metadata class, through the special keywords observables and processObservables.
Their use can be seen in the following example:
This class is used by loading it like any other metadata class. After initialization, the user gets access to the internal RDataFrame and TTree data members, as shown in the following example:
We can then use our favorite ROOT::RDataFrame or TTree methods.
Besides compiling runs to construct a dataset and accessing the data in a unified way, the generated dataset may also be saved to disk. This feature can be used to generate easier-to-handle data compilations extracted from an official data repository.
Different output formats are supported:
root: It will store the simplified TTree with the observables selected by the user and compiled with the corresponding file selection. The root file will also contain a TRestDataSet object so that future users of the output file can identify the origin of the data.
txt or csv: It will create an ASCII table where each column contains the data of a given branch. A header will be written inside the file with all the information found inside the TRestDataSet instance.
Example 1 Generate DataSet from config file:
Example 2 Import existing DataSet:
Example 3 Automatically importing a dataset using restRoot:
Sometimes we want the dataset to contain a few variables that are extremely meaningful for the data compilation, and that will be required for further calculations or for the proper interpretation of the data. The <quantity key allows the user to define relevant quantities that will be stored together with the dataset. These quantities must be extracted from existing metadata members that are present in the original files. Different fields are allowed inside, such as: name, metadata, strategy and description.
Example:
The name field is the user-given name of the quantity. The metadata field inside the <quantity definition allows including a metadata member, or a calculation based on a formula in which metadata members intervene. The method TRestRun::ReplaceMetadataMembers is responsible for translating the given metadata formula into a numeric value; check the documentation of that method to find out the proper format of metadata members inside this field.
There are also different strategies for extracting the quantity value, which the user selects through the strategy field. The available options are:
[...] metadata definition for each of the selected files that will be included in the dataset.
Using the TRestDataSet::DefineColumn method we can implement a formula based on column names and relevant quantities. The relevant quantities will then be substituted by their dataset value.
It is also possible to add new column definitions inside the RML, so that the new column is pre-generated and included in the new dataset when TRestDataSet::GenerateDataSet is invoked.
Any valid mathematical expression may be used, including any already existing column or observable, and/or any relevant quantity introduced in the dataset definition.
The addColumn keyword adds new columns as follows:
where SolarFlux, GeneratorArea and Nsim are the given names of the relevant quantities inside the dataset.
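The substitution of relevant quantities inside an expression can be pictured with the following C++ sketch. It is not the REST implementation: SubstituteQuantities is a hypothetical helper, and any values or expressions passed to it here are made up for illustration.

```cpp
#include <cctype>
#include <map>
#include <string>

// Replaces every whole-word occurrence of a quantity name by its value.
// Names that are not found (e.g. ordinary column names) are left untouched.
std::string SubstituteQuantities(const std::string& expression,
                                 const std::map<std::string, std::string>& quantities) {
    std::string result;
    std::size_t i = 0;
    while (i < expression.size()) {
        const unsigned char c = static_cast<unsigned char>(expression[i]);
        if (std::isalpha(c) || c == '_') {
            // Read a full identifier so that "Nsim" does not match inside "Nsimulated".
            std::size_t j = i;
            while (j < expression.size() &&
                   (std::isalnum(static_cast<unsigned char>(expression[j])) || expression[j] == '_')) {
                ++j;
            }
            const std::string token = expression.substr(i, j - i);
            const auto it = quantities.find(token);
            result += (it != quantities.end()) ? it->second : token;
            i = j;
        } else {
            result += expression[i];
            ++i;
        }
    }
    return result;
}
```

For instance, with Nsim mapped to the text 1000, the expression counts/Nsim becomes counts/1000, which could then be handed over as an ordinary column formula.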
It is also possible to add cuts that filter the data stored inside the dataset, by including a TRestCut definition inside the TRestDataSet.
For example, the following cut definition would discard entries with unexpected values inside the specified column, process_status.
REST-for-Physics - Software for Rare Event Searches Toolkit
History of developments:
2022-November: First implementation of TRestDataSet Javier Galan
Definition at line 34 of file TRestDataSet.h.
#include <TRestDataSet.h>
Data Structures | |
struct | RelevantQuantity |
Public Member Functions | |
ROOT::RDF::RNode | ApplyRange (size_t from, size_t to) |
This method reduces the number of samples inside the dataset by selecting a range. | |
ClassDefOverride (TRestDataSet, 8) | |
ROOT::RDF::RNode | DefineColumn (const std::string &columnName, const std::string &formula) |
This function will add a new column to the RDataFrame using the same scheme as the usual RDF::Define method, but it will on top of that evaluate the values of any relevant quantities used. More... | |
void | EnableMultiThreading (Bool_t enable=true) |
void | Export (const std::string &filename, std::vector< std::string > excludeColumns={}) |
It will generate an output file with the dataset compilation. Only the selected branches and the files that fulfill the metadata filter conditions will be included. More... | |
void | GenerateDataSet () |
This function generates the data frame with the filelist and column names (or observables) that have been defined by the user. | |
auto | GetAddedColumns () const |
auto | GetCut () const |
ROOT::RDF::RNode | GetDataFrame () const |
Gives access to the RDataFrame. | |
auto | GetEndTime () const |
size_t | GetEntries () |
It returns the number of entries found inside fDataFrame and prints out a warning if the number of entries inside the tree is not the same. | |
auto | GetFilePattern () const |
std::vector< std::string > | GetFileSelection () |
It returns a list of the files that have been finally selected. | |
auto | GetFileSelection () const |
auto | GetFilterContains () const |
auto | GetFilterEndTime () const |
auto | GetFilterEqualsTo () const |
auto | GetFilterGreaterThan () const |
auto | GetFilterLowerThan () const |
auto | GetFilterMetadata () const |
auto | GetFilterStartTime () const |
size_t | GetNumberOfBranches () |
Number of variables (or observables) | |
size_t | GetNumberOfColumns () |
Number of variables (or observables) | |
auto | GetObservablesList () const |
auto | GetProcessObservablesList () const |
auto | GetQuantity () const |
auto | GetStartTime () const |
Double_t | GetTotalTimeInSeconds () const |
It returns the accumulated run time in seconds. | |
TTree * | GetTree () const |
Gives access to the tree. | |
void | Import (const std::string &fileName) |
This function imports metadata from a root file: it imports the metadata info from the previous dataSet while it opens the analysis tree. | |
void | Import (std::vector< std::string > fileNames) |
This function initializes the chained tree and the RDataFrame using as input several root files that should contain TRestDataSet metadata information. The values of the first dataset will be considered to be stored in this new instance. More... | |
void | Initialize () override |
This function initializes different parameters from the TRestDataSet. | |
auto | IsMergedDataSet () const |
ROOT::RDF::RNode | MakeCut (const TRestCut *cut) |
This function applies a TRestCut to the dataframe and returns a dataframe with the applied cuts. Note that the cuts are not applied directly to the dataframe on TRestDataSet, to do so you should do fDataFrame = MakeCut(fCut);. | |
Bool_t | Merge (const TRestDataSet &dS) |
This function merges different TRestDataSet metadata into the current dataSet. | |
TRestDataSet & | operator= (TRestDataSet &dS) |
Operator to copy TRestDataSet metadata. | |
void | PrintMetadata () override |
Prints on screen the information about the metadata members of TRestDataSet. | |
ROOT::RDF::RNode | Range (size_t from, size_t to) |
This method returns an RDataFrame node in which the number of samples inside the dataset has been reduced by selecting a range. It will not modify the dataset internally. See ApplyRange to modify the dataset internally. | |
void | SetDataFrame (const ROOT::RDF::RNode &dS) |
void | SetFilePattern (const std::string &pattern) |
void | SetObservablesList (const std::vector< std::string > &obsList) |
void | SetQuantity (const std::map< std::string, RelevantQuantity > &quantity) |
void | SetTotalTimeInSeconds (Double_t seconds) |
TRestDataSet () | |
Default constructor. | |
TRestDataSet (const char *cfgFileName, const std::string &name="") | |
Constructor loading data from a config file. More... | |
~TRestDataSet () | |
Default destructor. | |
Public Member Functions inherited from TRestMetadata | |
void | AddLog (std::string log="", bool print=true) |
Add logs to messageBuffer. | |
void | DoNotStore () |
If this method is called the metadata information will not be stored in disk. | |
TVector2 | Get2DVectorParameterWithUnits (std::string parName, TVector2 defaultValue=TVector2(-1, -1)) |
TVector3 | Get3DVectorParameterWithUnits (std::string parName, TVector3 defaultValue=TVector3(-1, -1, -1)) |
TString | GetCommit () |
Returns the REST commit value stored in fCommit. | |
std::string | GetConfigBuffer () |
Returns the config section of this class. | |
std::string | GetDataMemberValue (std::string memberName) |
Get the value of data member as string. More... | |
std::vector< std::string > | GetDataMemberValues (std::string memberName, Int_t precision=0) |
Get the value of datamember as a vector of strings. More... | |
TString | GetDataPath () |
Returns a std::string with the path used for data storage. | |
Double_t | GetDblParameterWithUnits (std::string parName, Double_t defaultValue=PARAMETER_NOT_FOUND_DBL) |
Gets the value of the parameter name parName, after applying unit conversion. More... | |
Bool_t | GetError () const |
It returns true if an error was identified by a derived metadata class. | |
TString | GetErrorMessage () |
Returns a std::string containing the error message. | |
TString | GetLibraryVersion () |
Returns the REST library version stored in fLibraryVersion. | |
TString | GetMainDataPath () |
Gets a std::string with the path used for data storage. | |
Int_t | GetNumberOfErrors () const |
Int_t | GetNumberOfWarnings () const |
std::string | GetParameter (std::string parName, TString defaultValue=PARAMETER_NOT_FOUND_STR) |
Returns corresponding REST Metadata parameter from multiple sources. More... | |
std::string | GetSectionName () |
Returns the section name of this class, defined at the beginning of fSectionName. | |
TRestStringOutput::REST_Verbose_Level | GetVerboseLevel () |
returns the verboselevel in type of REST_Verbose_Level enumerator | |
TString | GetVerboseLevelString () |
returns the verbose level in type of TString More... | |
TString | GetVersion () |
Returns the REST version stored in fVersion. | |
Int_t | GetVersionCode () |
UInt_t | GetVersionMajor () const |
UInt_t | GetVersionMinor () const |
UInt_t | GetVersionPatch () const |
Bool_t | GetWarning () const |
It returns true if a warning was identified by a derived metadata class. | |
TString | GetWarningMessage () |
Returns a std::string containing the warning message. | |
TRestMetadata * | InstantiateChildMetadata (int index, std::string pattern="") |
This method will retrieve a new TRestMetadata instance of a child element of the present TRestMetadata instance based on the index given by argument, which defines the element order to be retrieved, 0 for first element found, 1 for the second element found, etc. More... | |
TRestMetadata * | InstantiateChildMetadata (std::string pattern="", std::string name="") |
This method will retrieve a new TRestMetadata instance of a child element of the present TRestMetadata instance based on the name given by argument. More... | |
Bool_t | isCleanState () const |
Bool_t | isOfficialRelease () const |
Int_t | LoadConfigFromBuffer () |
Initialize data from a string element buffer. More... | |
Int_t | LoadConfigFromElement (TiXmlElement *eSectional, TiXmlElement *eGlobal, std::map< std::string, std::string > envs={}) |
Main starter method. More... | |
Int_t | LoadConfigFromFile (const std::string &configFilename, const std::string &sectionName="") |
Give the file name, find out the corresponding section. Then call the main starter. | |
virtual void | Merge (const TRestMetadata &) |
TRestMetadata & | operator= (const TRestMetadata &) |
void | Print () |
Implementing TObject::Print() method. | |
void | PrintConfigBuffer () |
Print the config xml section stored in the class. More... | |
void | PrintMessageBuffer () |
Print the buffered message. | |
void | PrintTimeStamp (Double_t timeStamp) |
Print the current time on local machine. More... | |
void | SetConfigFile (std::string configFilename) |
set config file path from external | |
void | SetError (std::string message="", bool print=true, int maxPrint=5) |
A metadata class may use this method to signal that something went wrong. | |
void | SetHostmgr (TRestManager *m) |
Set the host manager for this class. | |
void | SetSectionName (std::string sName) |
set the section name, clear the section content | |
void | SetVerboseLevel (TRestStringOutput::REST_Verbose_Level v) |
sets the verbose level | |
void | SetWarning (std::string message="", bool print=true, int maxPrint=5) |
A metadata class may use this method to signal that something went wrong. | |
void | Store () |
If this method is called the metadata information will be stored in disk. | |
TRestMetadata (const TRestMetadata &) | |
virtual void | UpdateMetadataMembers () |
Method to allow implementation of specific metadata members updates at inherited classes. | |
virtual Int_t | Write (const char *name=nullptr, Int_t option=0, Int_t bufsize=0) |
overwriting the write() method with fStore considered | |
void | WriteConfigBuffer (std::string fName) |
Writes the config buffer to a file in append mode. | |
~TRestMetadata () | |
TRestMetadata default destructor. | |
Protected Member Functions | |
virtual std::vector< std::string > | FileSelection () |
Function to determine the filenames that satisfy the dataset conditions. | |
void | RegenerateTree (std::vector< std::string > finalList={}) |
It regenerates the tree so that it is an exact copy of the present DataFrame. | |
Protected Member Functions inherited from TRestMetadata | |
std::string | ElementToString (TiXmlElement *ele) |
Convert a TiXmlElement object to string. More... | |
TVector2 | Get2DVectorParameterWithUnits (std::string parName, TiXmlElement *e, TVector2 defaultValue=TVector2(-1, -1)) |
TVector3 | Get3DVectorParameterWithUnits (std::string parName, TiXmlElement *e, TVector3 defaultValue=TVector3(-1, -1, -1)) |
Double_t | GetDblParameterWithUnits (std::string parName, TiXmlElement *e, Double_t defaultVal=PARAMETER_NOT_FOUND_DBL) |
TiXmlElement * | GetElement (std::string eleDeclare, TiXmlElement *e=nullptr) |
Get an xml element from a given parent element, according to its declaration. | |
TiXmlElement * | GetElementFromFile (std::string configFilename, std::string NameOrDecalre="") |
Open an xml encoded file and find its element. More... | |
TiXmlElement * | GetElementWithName (std::string eleDeclare, std::string eleName) |
Get an xml element from the default location, according to its declaration and its field "name". | |
TiXmlElement * | GetElementWithName (std::string eleDeclare, std::string eleName, TiXmlElement *e) |
Get an xml element from a given parent element, according to its declaration and its field "name". | |
std::string | GetFieldValue (std::string fieldName, std::string definition, size_t fromPosition=0) |
Gets field value in an xml element string by parsing it as TiXmlElement. | |
std::string | GetFieldValue (std::string parName, TiXmlElement *e) |
Returns the field value of an xml element which has the specified name. More... | |
std::string | GetKEYDefinition (std::string keyName) |
Gets the first key definition for keyName found inside buffer starting at fromPosition. More... | |
std::string | GetKEYDefinition (std::string keyName, size_t &Position) |
std::string | GetKEYDefinition (std::string keyName, size_t &Position, std::string buffer) |
std::string | GetKEYDefinition (std::string keyName, std::string buffer) |
std::string | GetKEYStructure (std::string keyName) |
Gets the first key structure for keyName found inside buffer after fromPosition. More... | |
std::string | GetKEYStructure (std::string keyName, size_t &Position) |
std::string | GetKEYStructure (std::string keyName, size_t &Position, std::string buffer) |
std::string | GetKEYStructure (std::string keyName, size_t &Position, TiXmlElement *ele) |
std::string | GetKEYStructure (std::string keyName, std::string buffer) |
TiXmlElement * | GetNextElement (TiXmlElement *e) |
Get the next sibling xml element of this element, with same eleDeclare. | |
std::string | GetParameter (std::string parName, size_t &pos, std::string inputString) |
Returns the value for the parameter name parName found in inputString. More... | |
std::string | GetParameter (std::string parName, TiXmlElement *e, TString defaultValue=PARAMETER_NOT_FOUND_STR) |
Returns the value for the parameter named parName in the given section. More... | |
std::pair< std::string, std::string > | GetParameterAndUnits (std::string parname, TiXmlElement *e=nullptr) |
Returns the unit string of the given parameter of the given xml section. More... | |
std::map< std::string, std::string > | GetParametersList () |
It retrieves a map of all parameter:value found in the metadata class. | |
TString | GetSearchPath () |
virtual void | InitFromRootFile () |
Method called after the object is retrieved from root file. | |
virtual Int_t | LoadSectionMetadata () |
This method does some preparation of xml section. More... | |
void | ReadAllParameters () |
Reflection methods, Set value of a datamember in class according to TRestMetadata::fElement. More... | |
void | ReadParametersList (std::map< std::string, std::string > &list) |
It reads a parameter list and associates it to its corresponding metadata member. par0 --> fPar0. | |
std::string | ReplaceConstants (const std::string buffer) |
Identifies "constants" in the input buffer, and replace them with corresponding value. More... | |
std::string | ReplaceVariables (const std::string buffer) |
Identifies environmental variable replacing marks in the input buffer, and replace them with corresponding value. More... | |
void | ReSetVersion () |
Resets the version of TRestRun to REST_RELEASE. Only TRestRun is allowed to update version. | |
std::string | SearchFile (std::string filename) |
Search files in current directory and directories specified in "searchPath" section. More... | |
void | SetLibraryVersion (TString version) |
Set the library version of this metadata class. | |
TiXmlElement * | StringToElement (std::string definition) |
Parsing a string into TiXmlElement object. More... | |
TRestMetadata () | |
TRestMetadata default constructor. | |
TRestMetadata (const char *configFilename) | |
constructor | |
void | UnSetVersion () |
Resets the version of TRestRun to -1, in case the file is old REST file. Only TRestRun is allowed to update version. | |
Private Member Functions | |
void | InitFromConfigFile () override |
Initialization of specific TRestDataSet members through an RML file. More... | |
Private Attributes | |
std::vector< std::pair< std::string, std::string > > | fColumnNameExpressions |
A list of new columns together with its corresponding expressions added to the dataset. | |
TRestCut * | fCut = nullptr |
Parameter cuts over the selected dataset. | |
ROOT::RDF::RNode | fDataFrame = ROOT::RDataFrame(0) |
The resulting RDF::RNode object after initialization. | |
Double_t | fEndTime = REST_StringHelper::StringToTimeStamp(fFilterStartTime) |
TimeStamp for the end time of the last file. | |
Bool_t | fExternal = false |
std::string | fFilePattern = "" |
A glob file pattern that must be satisfied by all files. | |
std::vector< std::string > | fFileSelection |
A list populated by the FileSelection method using the conditions of the dataset. | |
std::vector< std::string > | fFilterContains |
If not empty it will check if the metadata member contains the string. | |
std::string | fFilterEndTime = "3000/12/31" |
All the selected runs will have an ending date before fEndTime. | |
std::vector< Double_t > | fFilterEqualsTo |
If the corresponding element is not empty it will check if the metadata member is equal. | |
std::vector< Double_t > | fFilterGreaterThan |
If the corresponding element is not empty it will check if the metadata member is greater. | |
std::vector< Double_t > | fFilterLowerThan |
If the corresponding element is not empty it will check if the metadata member is lower. | |
std::vector< std::string > | fFilterMetadata |
A list of metadata members where filters will be applied. | |
std::string | fFilterStartTime = "2000/01/01" |
All the selected runs will have a starting date after fStartTime. | |
std::vector< std::string > | fImportedFiles |
The list of dataset files imported. | |
Bool_t | fMergedDataset = false |
It keeps track if the generated dataset is a pure dataset or a merged one. | |
Bool_t | fMT = false |
A flag to enable Multithreading during dataframe generation. | |
std::vector< std::string > | fObservablesList |
It contains a list of the observables that will be added to the final tree or exported file. | |
std::vector< std::string > | fProcessObservablesList |
It contains a list of the process where all observables should be added. | |
std::map< std::string, RelevantQuantity > | fQuantity |
The properties of a relevant quantity that we want to store together with the dataset. | |
Double_t | fStartTime = REST_StringHelper::StringToTimeStamp(fFilterEndTime) |
TimeStamp for the start time of the first file. | |
Double_t | fTotalDuration = 0 |
The total integrated run time of selected files. | |
TChain * | fTree = nullptr |
A pointer to the generated tree. | |
Additional Inherited Members | |
Protected Attributes inherited from TRestMetadata | |
std::string | configBuffer |
The buffer where the corresponding metadata section is stored. Filled only during Write() | |
std::string | fConfigFileName |
Full name of the rml file. More... | |
std::map< std::string, std::string > | fConstants |
Saving a list of rml constants. name-value std::pair. Constants are temporary for this class only. | |
TiXmlElement * | fElement |
Saving the sectional element together with global element. | |
TiXmlElement * | fElementGlobal |
Saving the global element, to be passed to the resident class, if necessary. | |
Bool_t | fError = false |
It can be used as a way to identify that something went wrong using SetError method. | |
TString | fErrorMessage = "" |
A std::string to store an optional error message through method SetError. | |
TRestManager * | fHostmgr |
All metadata classes can be initialized and managed by TRestManager. | |
Int_t | fNErrors = 0 |
It counts the number of errors notified. | |
Int_t | fNWarnings = 0 |
It counts the number of warnings notified. | |
std::string | fSectionName |
Section name given in the constructor of the derived metadata class. | |
Bool_t | fStore |
This variable is used to determine if the metadata structure should be stored in the ROOT file. | |
std::map< std::string, std::string > | fVariables |
Saving a list of rml variables. name-value std::pair. | |
TRestStringOutput::REST_Verbose_Level | fVerboseLevel |
Verbose level used to print debug info. | |
Bool_t | fWarning = false |
It can be used as a way to identify that something went wrong using SetWarning method. | |
TString | fWarningMessage = "" |
A std::string to store an optional warning message through method SetWarning. | |
std::string | messageBuffer |
The buffer to store the output message through TRestStringOutput in this class. | |
endl_t | RESTendl |
Termination flag object for TRestStringOutput. | |
TRestDataSet::TRestDataSet | ( | const char * | cfgFileName, | const std::string & | name = "" | ) |
Constructor loading data from a config file.
If no configuration path is defined using TRestMetadata::SetConfigFilePath the path to the config file must be specified using full path, absolute or relative.
The default behaviour is that the config file must be specified with full path, absolute or relative.
cfgFileName | A const char* giving the path to an RML file. |
name | The name of the specific metadata. It will be used to find the corresponding TRestDataSet section inside the RML. |
Definition at line 319 of file TRestDataSet.cxx.
ROOT::RDF::RNode TRestDataSet::DefineColumn | ( | const std::string & | columnName, | const std::string & | formula | ) |
This function will add a new column to the RDataFrame using the same scheme as the usual RDF::Define method, but it will on top of that evaluate the values of any relevant quantities used.
For example, the following code line would create a new column named test, replacing the relevant quantity Nsim and using the previously existing column probability.
Definition at line 618 of file TRestDataSet.cxx.
void TRestDataSet::Export | ( | const std::string & | filename, | std::vector< std::string > | excludeColumns = {} | ) |
It will generate an output file with the dataset compilation. Only the selected branches and the files that fulfill the metadata filter conditions will be included.
Two output formats are currently supported:
Plain text (csv or txt extension): It will produce a table with observable values, including a header of the dataset conditions.
ROOT (root extension): It will write to disk a Snapshot of the current dataset, i.e. in standard TTree format, together with a copy of the TRestDataSet instance that contains the conditions used to generate the dataset.
Definition at line 861 of file TRestDataSet.cxx.
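A minimal sketch of a plain-text export, written in standard C++ and not taken from the REST sources, could look as follows. The Table alias, the WriteCsv function and the '#' marker used for header lines are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory table: column name -> one value per entry.
using Table = std::map<std::string, std::vector<double>>;

// Writes '#'-prefixed header lines followed by a comma-separated table,
// skipping any column listed in excludeColumns.
void WriteCsv(std::ostream& out, const Table& table, const std::vector<std::string>& headerLines,
              const std::vector<std::string>& excludeColumns = {}) {
    for (const auto& line : headerLines) out << "# " << line << "\n";

    // Collect the columns to be written and the largest number of entries.
    std::vector<std::string> columns;
    std::size_t nEntries = 0;
    for (const auto& column : table) {
        if (std::find(excludeColumns.begin(), excludeColumns.end(), column.first) !=
            excludeColumns.end()) {
            continue;
        }
        columns.push_back(column.first);
        nEntries = std::max(nEntries, column.second.size());
    }

    // Column names first, then one line per entry.
    for (std::size_t c = 0; c < columns.size(); ++c) out << (c ? "," : "") << columns[c];
    out << "\n";
    for (std::size_t row = 0; row < nEntries; ++row) {
        for (std::size_t c = 0; c < columns.size(); ++c) {
            const auto& values = table.at(columns[c]);
            out << (c ? "," : "");
            if (row < values.size()) out << values[row];
        }
        out << "\n";
    }
}
```

The function accepts any std::ostream, so the same code can write to a std::ostringstream for inspection or to a std::ofstream for a file on disk.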
void TRestDataSet::Import | ( | std::vector< std::string > | fileNames | ) |
This function initializes the chained tree and the RDataFrame using as input several root files that should contain TRestDataSet metadata information. The values of the first dataset will be considered to be stored in this new instance.
The metadata member fMergedDataset will be set to true to indicate that this dataset is a combination of several datasets, and not a pure original one.
Definition at line 1097 of file TRestDataSet.cxx.
void TRestDataSet::InitFromConfigFile | ( | ) | override private virtual |
Initialization of specific TRestDataSet members through an RML file.
Reading filters
Reading observables
Reading process observables
Reading relevant quantities
Reading new dataset columns
Reimplemented from TRestMetadata.
Definition at line 732 of file TRestDataSet.cxx.