Commit f5193454 authored by Andrea Tugores

command description

parent a9a57d6b
%% Cell type:markdown id: tags:
<CENTER><img src="http://opendata.atlas.cern/DataAndTools/pictures/ATLASOD.gif" style="width:50%"></CENTER>
%% Cell type:markdown id: tags:
<CENTER><h1> Analysis Techniques used in Experimental Particle Physics</h1></CENTER>
%% Cell type:markdown id: tags:
<p style='text-align: justify;'>
A set of pp collision data has been released by the ATLAS Collaboration to the public for educational purposes. The general aim of the 13 TeV ATLAS Open Data and tools released is to provide a straightforward interface to replicate the procedures used by high-energy-physics researchers and enable users to experience the analysis of particle physics data in educational environments.
</p>
Let's take a current ATLAS Open Data sample and create histograms.
%% Cell type:markdown id: tags:
### Import some Python packages
<p style='text-align: justify;'>
Just as a large number of files is organized into folders and subfolders so that they are easy to find and manage, Python code is organized into packages. A package is a hierarchical directory structure that groups related modules and subpackages.
</p>
<p style='text-align: justify;'>
A module allows you to logically organize your Python code. Grouping related code into a module makes the code easier to understand and use. A module can define functions, classes and variables. A module can also include runnable code.
</p>
<p style='text-align: justify;'>
The <strong>os module</strong> provides functions for interacting with the operating system. It is part of Python’s standard library and offers a portable way of using operating-system-dependent functionality, so many operating system tasks can be automated. For example, it provides functions for creating and removing a directory (folder), listing its contents, and identifying or changing the current directory.
</p>
<p style='text-align: justify;'>
The <strong>datetime module</strong> supplies classes for manipulating dates, times and time intervals. Dates and datetimes are objects in Python, so when you manipulate them you are manipulating objects, not strings or timestamps.
</p>
<p style='text-align: justify;'>
With PyROOT, ROOT’s Python-C++ bindings, you can use <strong>ROOT</strong> from Python. PyROOT is HEP’s entrance to all C++ from Python, for example, for frameworks and their steering code. The PyROOT bindings are automatic and dynamic: no pre-generation of Python wrappers is necessary. With PyROOT you can access the full ROOT functionality from Python while benefiting from the performance of the ROOT C++ libraries.
</p>
**TMath** encapsulates the most frequently used math functions. The basic functions Min, Max, Abs and Sign are defined in TMathBase.
<p style='text-align: justify;'>
Finally, <strong>pandas</strong> is a Python library for reading, writing and manipulating data in the form of DataFrame and Series objects. It is not used to run or plot the analysis, only to read the ATLAS Open Data table with the descriptions of the analyses that can be run in this notebook.
</p>
%% Cell type:code id: tags:
``` python
# Suggestion
# !pip install pandas
# !pip install matplotlib
```
#### Install modules or packages in Python
When running from the virtual machine there is no need to install anything, because the required packages are already installed.
If you do need to install a package, run for example: **!pip install pandas**
%% Cell type:code id: tags:
``` python
import os
import datetime
import ROOT
from ROOT import TMath
import pandas as pd
import matplotlib.pyplot as plt
```
%% Cell type:markdown id: tags:
<p style='text-align: justify;'>
One of the classes defined in the datetime module is the datetime class. It combines a date and a time and has the attributes year, month, day, hour, minute, second, microsecond and tzinfo. Its now() method creates a datetime object containing the current local date and time.
</p>
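As a small illustration of these attributes, the sketch below builds a datetime object from made-up values (so the output is reproducible) and reads a few of its fields. The timestamp is an assumption chosen for the example, not something taken from the analysis.

```python
import datetime

# A fixed, made-up timestamp so the output is reproducible
stamp = datetime.datetime(2021, 3, 1, 9, 30, 15)

print(stamp.year, stamp.month, stamp.day)      # 2021 3 1
print(stamp.hour, stamp.minute, stamp.second)  # 9 30 15
print(stamp.tzinfo)                            # None (a "naive" datetime)
print(stamp.isoformat())                       # 2021-03-01T09:30:15
```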
%% Cell type:code id: tags:
``` python
starttime = datetime.datetime.now()
print(starttime)
```
%% Cell type:markdown id: tags:
<p style='text-align: justify;'>
<strong>os.popen():</strong> This function opens a pipe to or from a command. The returned file object can be read or written depending on whether the mode is ‘r’ or ‘w’. The mode parameter is optional; if it is omitted, the default ‘r’ is used.
</p>
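For comparison, the standard-library subprocess module can capture a command's output in a similar way. The following is only a sketch: the helper name `run_and_capture` is hypothetical, and the Python interpreter itself is used as a portable stand-in for a shell command such as date.

```python
import subprocess
import sys

def run_and_capture(args):
    """Run a command given as a list of arguments and return its stdout as text."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout

# Use the Python interpreter itself as a portable stand-in for a shell command
output = run_and_capture([sys.executable, "-c", "print('hello')"])
print(output)
```

Passing the command as a list avoids shell quoting issues, and check=True raises an exception if the command fails.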
%% Cell type:code id: tags:
``` python
myCmd = os.popen('date').read()
print(myCmd)
```
%% Cell type:markdown id: tags:
## Get the code for the 13 TeV analysis
<p style='text-align: justify;'>
Create a folder named <strong>'atlas-data'</strong>, into which the code for twelve physics analysis examples will be cloned. The examples are inspired by published results of the ATLAS Collaboration and demonstrate a wide range of scenarios. The folder is created and the repository is cloned only the first time the notebook is run.
</p>
Store the path in a variable. <strong>os.getcwd()</strong> returns the location of the current working directory.
%% Cell type:code id: tags:
``` python
directory0 = os.getcwd()
print(directory0)
```
%% Cell type:markdown id: tags:
<p style='text-align: justify;'>
The <strong>os.system(command)</strong> function executes the command (a string) in a subshell. It is implemented by calling the standard C function system(). If the command generates any output, it is sent to the interpreter's standard output stream. The command is run by the shell of the operating system.
</p>
The <strong>mkdir</strong> command allows users to create or make new directories. mkdir stands for “make directory”.
%% Cell type:code id: tags:
``` python
folder_demo = 'atlas-data'
if os.path.exists(directory0+"/atlas-data"):
    print("The folder exists")
else:
    command = 'mkdir '+folder_demo
    os.system(command)
```
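The same "create only if missing" logic can also be written without a shell command. This is a minimal sketch using os.makedirs with exist_ok=True; it works in a temporary directory (an assumption made so the example leaves nothing behind) rather than in the notebook's real folder.

```python
import os
import tempfile

# Work inside a throwaway directory so the example leaves nothing behind
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "atlas-data")
    os.makedirs(target, exist_ok=True)   # creates the folder
    os.makedirs(target, exist_ok=True)   # second call is a no-op, not an error
    created = os.path.isdir(target)

print(created)
```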
%% Cell type:markdown id: tags:
Show the contents in the current folder, and check your new folder is there.
Commands:
<p style='text-align: justify;'>
'ls' with no options lists files and directories in a bare format, without details such as file type, size, modification date and time, permissions and links.
</p>
The command <strong>'ls -lhrt'</strong> lists them in long format (-l) with human-readable sizes (-h), sorted by modification time (-t) in reverse order (-r), so the most recently modified entries appear last.
%% Cell type:code id: tags:
``` python
myCmd = os.popen('ls -lhrt').read()
print(myCmd)
```
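To make the ordering of 'ls -rt' concrete, here is a rough pure-Python sketch that sorts directory entries by modification time, oldest first. The function name `list_by_mtime` and the two file names are hypothetical, and the modification times are set by hand so the result is predictable.

```python
import os
import tempfile

def list_by_mtime(path="."):
    """Return entry names in `path`, oldest modification time first."""
    entries = sorted(os.scandir(path), key=lambda e: e.stat().st_mtime)
    return [e.name for e in entries]

with tempfile.TemporaryDirectory() as d:
    # Create two empty files and assign explicit modification times
    for name, mtime in [("new.txt", 2_000_000_000), ("old.txt", 1_000_000_000)]:
        p = os.path.join(d, name)
        open(p, "w").close()
        os.utime(p, (mtime, mtime))
    names = list_by_mtime(d)

print(names)
```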
%% Cell type:markdown id: tags:
Print the current path/location.
<p style='text-align: justify;'>
The <strong>pwd</strong> command stands for print working directory. When invoked the command prints the complete path of the current working directory.
</p>
%% Cell type:code id: tags:
``` python
myCmd = os.popen('pwd').read()
print(myCmd)
```
%% Cell type:markdown id: tags:
Get into the folder which we just created.
Use <strong>os.chdir()</strong> to change to another directory. It takes the path of the directory you want to go to as its argument.
%% Cell type:code id: tags:
``` python
os.chdir(folder_demo+"/")
```
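Because os.chdir() changes the working directory for every cell that follows, it can be convenient to restore the previous location automatically. The context manager below is a sketch of that idea; `working_directory` is a hypothetical helper, not part of the notebook or of the os module.

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory, then restore it."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)

before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside = os.getcwd()   # we are inside the temporary directory here
after = os.getcwd()

print(before == after)
```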
%% Cell type:markdown id: tags:
Print the current path and check that you are really there.
%% Cell type:code id: tags:
``` python
myCmd = os.popen('pwd').read()
print(myCmd)
```
%% Cell type:markdown id: tags:
Setting the path as a variable.
%% Cell type:code id: tags:
``` python
directory1 = os.getcwd()
print(directory1)
```
%% Cell type:markdown id: tags:
#### Clone the analysis code
If it is the first time the notebook is run, the data is cloned. Otherwise, it warns that the data already exists.
The frameworks implement the protocols needed for reading the datasets, making an analysis selection, writing out histograms and plotting the results.
<p style='text-align: justify;'>
<strong>os.path</strong> is a module that implements some useful functions on pathnames. Path arguments can be passed as either strings or bytes. The os.path module is always the path module suitable for the operating system Python is running on, and is therefore usable for local paths.
</p>
**os.path.exists()** checks whether the given path exists on the file system.
%% Cell type:code id: tags:
``` python
if os.path.exists(directory1+"/atlas-outreach-cpp-framework-13tev"):
    print("The repository exists")
else:
    myCmd = os.popen('git clone https://github.com/atlas-outreach-data-tools/atlas-outreach-cpp-framework-13tev.git').read()
    print(myCmd)
```
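The "clone only if the folder is missing" pattern above can be wrapped in a small function. The sketch below is one possible shape: `clone_if_missing` and `fake_run` are hypothetical names, the URL is a placeholder, and the command runner is injectable so the logic can be tried without network access or git.

```python
import os
import subprocess

def clone_if_missing(url, dest, run=subprocess.run):
    """Clone `url` into `dest` unless `dest` already exists.

    Returns True if a clone was attempted, False if it was skipped.
    `run` is injectable so the logic can be exercised without network access.
    """
    if os.path.exists(dest):
        return False
    run(["git", "clone", url, dest], check=True)
    return True

# Demonstration with a fake runner that only records the command
calls = []

def fake_run(args, check):
    calls.append(args)

# The current directory exists, so the clone is skipped
skipped = clone_if_missing("https://example.invalid/repo.git", os.getcwd(), run=fake_run)
print(skipped, calls)
```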
%% Cell type:markdown id: tags:
Get into the folder which contains the analysis code.
%% Cell type:code id: tags:
``` python
os.chdir("atlas-outreach-cpp-framework-13tev/")
```
%% Cell type:markdown id: tags:
Print the current path and check that you are really there.
%% Cell type:code id: tags:
``` python
myCmd = os.popen('pwd').read()
print(myCmd)
```
%% Cell type:markdown id: tags:
Let's see the time.
%% Cell type:code id: tags:
``` python
myCmd = os.popen('date').read()
print(myCmd)
```
%% Cell type:markdown id: tags:
Show the contents in the current folder.
The framework consists of two main parts:
<p style='text-align: justify;'>
The analysis part, located within the <strong>Analysis directory</strong>: it performs the particular object selection and stores the output histograms.
The plotting part, located within the <strong>Plotting directory</strong>: it makes the final Data / Prediction plots.
</p>
%% Cell type:code id: tags:
``` python
myCmd = os.popen('ls -lhrt').read()
print(myCmd)
```
%% Cell type:markdown id: tags:
Setting the path as a variable.
%% Cell type:code id: tags:
``` python
directory2 = os.getcwd()
print(directory2)
```
%% Cell type:markdown id: tags:
**Output folders**
<p style='text-align: justify;'>
Create the output folders where the analysis results will be stored. If it is the first time the notebook is run, the output folders are created; otherwise a message reports that they already exist. Because all the output folders are created at the same time, the next two cells check the folders of just one of the 12 analysis examples to verify that they were created.
</p>
<p style='text-align: justify;'>
The <strong>echo</strong> command displays the line of text passed to it as an argument. It is a built-in command, mostly used in shell scripts and batch files to write status text to the screen or to a file.
</p>
<p style='text-align: justify;'>
<strong>"echo \"1\" | ./welcome.sh"</strong>: the 1 tells the script to create all output directories in the 12 analysis subfolders automatically; with 0 instead, you can delete their content if necessary.
</p>
%% Cell type:code id: tags:
``` python
if os.path.exists(directory2+"/Analysis/ZBosonAnalysis/Output_ZBosonAnalysis"):
    print("Folders exist")
else:
    command1 = "echo \"1\" | ./welcome.sh"
    os.system(command1)
```
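The echo-and-pipe idiom feeds an answer to the script's standard input. A rough Python-only equivalent, which assumes nothing about welcome.sh itself, passes the answers through the input argument of subprocess.run. The helper `run_with_answers` is hypothetical, and a tiny Python program stands in for the interactive script.

```python
import subprocess
import sys

def run_with_answers(args, answers):
    """Run `args`, feeding each answer on its own line to the program's stdin."""
    stdin_text = "\n".join(answers) + "\n"
    result = subprocess.run(args, input=stdin_text, capture_output=True, text=True)
    return result.returncode, result.stdout

# Stand-in for an interactive script: a tiny program that echoes what it reads
reader = "import sys; print(sys.stdin.read().split())"
code, out = run_with_answers([sys.executable, "-c", reader], ["1"])
print(code, out)
```

The returned code is the exit status of the program, where 0 conventionally means success.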
%% Cell type:markdown id: tags:
Check that the folders were created.
%% Cell type:code id: tags:
``` python
myCmd = os.popen('ls -lhrt Analysis/HZZAnalysis').read()
print(myCmd)
```
%% Cell type:markdown id: tags:
### Let's run an Analysis now
<p style='text-align: justify;'>
The naming of the sub-folders follows a simple rule: "NNAnalysis", where NN can be WBoson, ZBoson, TTbar, SingleTop, WZDiBoson, ZZDiBoson, HZZ, HWW, Hyy, ZPrimeBoosted, ZTauTau and SUSY. After you enter your choice, you will see the description of the analysis that has been obtained from a pandas dataframe generated from a .csv file with the analysis information.
</p>
### Select one analysis
**analysis_df** is a DataFrame that contains a description and an explanation of each of the analyses.
%% Cell type:code id: tags:
``` python
analysis_df = pd.read_csv(f"{directory0}/notebooks-info/analysis_info.csv", sep="_").set_index("Analysis")
```
%% Cell type:markdown id: tags:
Print all possible analysis options.
%% Cell type:code id: tags:
``` python
print("The options are:\n WBosonAnalysis\n ZBosonAnalysis\n TTbarAnalysis\n SingleTopAnalysis\n WZDiBosonAnalysis\n ZZDiBosonAnalysis\n HZZAnalysis\n HWWAnalysis\n ZTauTauAnalysis\n HyyAnalysis\n SUSYAnalysis\n ZPrimeBoostedAnalysis")
```
%% Cell type:code id: tags:
``` python
while True:
    analysis = input("Enter your choice: ")
    try:
        os.chdir("Analysis/"+analysis)
        break
    except FileNotFoundError:
        print("Analysis not found")
```
%% Cell type:markdown id: tags:
Show a short description of the selected analysis.
%% Cell type:code id: tags:
``` python
print(analysis_df["Description"].loc[f"{analysis}"])
```
%% Cell type:markdown id: tags:
Print the current path/location.
%% Cell type:code id: tags:
``` python
myCmd = os.popen('pwd').read()
print(myCmd)
```
%% Cell type:code id: tags:
``` python
myCmd = os.popen('ls -lhrt').read()
print(myCmd)
```
%% Cell type:markdown id: tags:
### Run the C++ code now
Please, note that we are writing the output in a log file. (This file can be inspected in real time in a terminal).
<p style='text-align: justify;'>
A bash script (run.sh), executed from a Linux/UNIX shell, helps you run the analysis. The script offers two options that let the user choose how to run it.
</p>
<p style='text-align: justify;'>
The first option will ask you: do you want to run over all the samples one-by-one, or to run over only data or only simulated samples? The latter options can help you to speed up the analysis, as you can run several samples in several terminals.
</p>
<p style='text-align: justify;'>
The second option will ask you: do you want to use the Parallel ROOT Facility (PROOF), a ROOT-integrated tool that enables the analysis of the input samples in parallel on a many-core machine? If your ROOT version has PROOF integrated, it will speed up the analysis by a factor of roughly 5.
</p>
<p style='text-align: justify;'>
After you choose the options, the code will compile and create the needed ROOT shared libraries, and the analysis selection will begin: it will run over each input sample defined in main_NNAnalysis.C.
</p>
<p style='text-align: justify;'>
<strong>"echo \"0\n0\" | ./run.sh >log"</strong>: the first 0 means run over all data and MC samples one after another. The second 0 means that you do not want to use PROOF. The output is redirected to the file named log.
</p>
%% Cell type:markdown id: tags:
Setting the path as a variable.
%% Cell type:code id: tags:
``` python
directory3 = os.getcwd()
print(directory3)
```
%% Cell type:markdown id: tags:
Functions such as **expanduser()** and **expandvars()** can be invoked explicitly when an application desires shell-like path expansion.
<p style='text-align: justify;'>
The <strong>os.path.expanduser()</strong> function expands an initial ~ (tilde) or ~user component in the given path to the user’s home directory. On Unix platforms, an initial ~ is replaced by the value of the HOME environment variable, if it is set. Otherwise, the current user’s home directory is looked up in the password database through the built-in pwd module. An initial ~user component is looked up directly in the password database.
</p>
%% Cell type:code id: tags:
``` python
homedirectory = os.path.expanduser("~")
print(homedirectory)
```
%% Cell type:code id: tags:
``` python
logPath = directory3.replace(f"{homedirectory}/", "")
print(logPath)
```
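The string replacement above works when the working directory lies under the home directory. As an alternative sketch, os.path.relpath computes the relative path directly. The helper name `path_relative_to` and the example location are hypothetical.

```python
import os

def path_relative_to(path, base):
    """Return `path` expressed relative to the directory `base`."""
    return os.path.relpath(path, start=base)

home = os.path.expanduser("~")
example = os.path.join(home, "atlas-data", "Analysis")   # hypothetical location
relative = path_relative_to(example, home)
print(relative)
```

Unlike str.replace, this only strips the leading part of the path and still returns a usable relative path (with .. components) if the directory is outside the home folder.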
%% Cell type:markdown id: tags:
**tail** outputs the last part of a file; with the -f option it keeps printing new lines as the file grows.
%% Cell type:code id: tags:
``` python
print(f"Copy all of this command:\ntail -f {logPath}/log")
```
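If no terminal is at hand, the last lines of the log can also be read from Python. The sketch below mimics plain tail (not the follow mode of tail -f); the function name `tail` and the sample log content are made up for the illustration.

```python
import collections
import os
import tempfile

def tail(path, n=10):
    """Return the last `n` lines of a text file as a list of strings."""
    with open(path, "r") as handle:
        # A deque with maxlen keeps only the most recent n lines
        return list(collections.deque(handle, maxlen=n))

with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "log")
    with open(log, "w") as handle:
        handle.write("".join(f"line {i}\n" for i in range(1, 6)))
    last_two = tail(log, n=2)

print(last_two)
```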
%% Cell type:markdown id: tags:
<p style='text-align: justify;'>
<strong>"echo \"0\n0\" | ./run.sh >log"</strong>: the first 0 means run over all data and MC samples one after another. The second 0 means that you do not want to use PROOF.
</p>
%% Cell type:code id: tags:
``` python
command4 = "echo \"0\n0\" | ./run.sh >log"
os.system(command4)
```
%% Cell type:markdown id: tags:
If you want, you can paste the command that you just copied into a terminal and follow the progress of the running analysis live.
%% Cell type:markdown id: tags:
#### Now the code is done (you should see a 0 as output, the exit status of a successful run)
Move into the Plotting folder, where the plotting code is located.
%% Cell type:code id: tags:
``` python
os.chdir("../../Plotting/")
```
%% Cell type:markdown id: tags:
Print the current path and check that you are really there.
%% Cell type:code id: tags:
``` python
myCmd = os.popen('pwd').read()
print(myCmd)
```
%% Cell type:markdown id: tags:
### Let's run the Plotting code
### Select the right analysis
#### The options are:
0 = WBosonAnalysis
1 = ZBosonAnalysis
2 = TTbarAnalysis
3 = SingleTopAnalysis
4 = WZDiBosonAnalysis
5 = ZZDiBosonAnalysis
6 = HWWAnalysis
7 = HZZAnalysis
8 = ZTauTauAnalysis
9 = HyyAnalysis
10 = SUSYAnalysis
11 = ZPrimeBoostedAnalysis
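
Since the analysis name was already chosen earlier, the plotting number could in principle be derived from it instead of being typed again. The sketch below encodes the numbering listed above; `PLOT_ORDER` and `plot_number` are hypothetical names, not part of the framework.

```python
# Order taken from the numbered list of plotting options above
PLOT_ORDER = [
    "WBosonAnalysis", "ZBosonAnalysis", "TTbarAnalysis", "SingleTopAnalysis",
    "WZDiBosonAnalysis", "ZZDiBosonAnalysis", "HWWAnalysis", "HZZAnalysis",
    "ZTauTauAnalysis", "HyyAnalysis", "SUSYAnalysis", "ZPrimeBoostedAnalysis",
]

def plot_number(analysis_name):
    """Return the plotting option number for an analysis name."""
    return PLOT_ORDER.index(analysis_name)

print(plot_number("HyyAnalysis"))  # 9
```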
%% Cell type:markdown id: tags:
A bash script (plotme.sh) helps you run the plotting code.
The script asks the user for two options.
<p style='text-align: justify;'>
The first option asks which of the 12 analyses you want to plot.
The second option asks for the location of the Output_NNAnalysis directory that was created by running the analysis code.
</p>
<p style='text-align: justify;'>
After you choose the options, the code will compile and create the needed ROOT shared libraries, and the plotting will begin. If everything was successful, the code will create in the output directory (histograms) the corresponding plots defined in HistoList_ANALYSISNAME.txt.
</p>
<p style='text-align: justify;'>
<strong>"echo \"9\n0\" | ./plotme.sh"</strong>: the first number selects which of the 12 analyses to plot (9 in this example) and the second, 0 in this case, indicates that the output directory is Output_NNAnalysis.
</p>
%% Cell type:code id: tags:
``` python
while True:
    number = input('Choose your analysis. Enter a number: ')
    try:
        number = int(number)
        # The 12 plotting options listed above are numbered 0 to 11
        if 0 <= number <= 11:
            break
        else:
            print("Valid range, please: 0-11.")
    except ValueError:
        print("Try it again.")
```
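The validation inside the loop can be separated from the prompt so that it is easy to check on its own. This is a sketch under the assumption that all 12 listed options are valid choices; `parse_choice` and its `n_options` parameter are hypothetical.

```python
def parse_choice(text, n_options=12):
    """Return the integer choice if `text` is a whole number in [0, n_options), else None."""
    try:
        value = int(text)
    except ValueError:
        return None
    return value if 0 <= value < n_options else None

print(parse_choice("9"))    # 9
print(parse_choice("12"))   # None
print(parse_choice("abc"))  # None
```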
%% Cell type:code id: tags:
``` python
command5 = f"echo \"{number}\n0\" | ./plotme.sh"
os.system(command5)
```
%% Cell type:markdown id: tags:
Show histograms created.
%% Cell type:code id: tags:
``` python
myCmd = os.popen('ls -lhrt histograms/*png').read()
print(myCmd)
```
%% Cell type:markdown id: tags:
Setting the path as a variable.
%% Cell type:code id: tags:
``` python
directory4 = os.getcwd()
```
%% Cell type:markdown id: tags:
#### Show histograms
**os.listdir()** returns a list of all files and directories in the specified directory.
%% Cell type:code id: tags:
``` python
files = os.listdir(f"{directory4}/histograms")
for i in files:
    path = f"{directory4}/histograms/{i}"
    image = plt.imread(path)
    plt.figure(figsize=(50,30))
    plt.imshow(image)
    plt.axis("off")
    plt.show()
```
```
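os.listdir() returns every entry in the folder, in arbitrary order, so a file that is not an image would make the loop above fail. One way to guard against that, shown here only as a sketch, is to keep the .png files and sort them by name. The helper `png_files` and the sample file names are hypothetical.

```python
import os
import tempfile

def png_files(directory):
    """Return full paths of the .png files in `directory`, sorted by name."""
    return [
        os.path.join(directory, name)
        for name in sorted(os.listdir(directory))
        if name.lower().endswith(".png")
    ]

with tempfile.TemporaryDirectory() as d:
    # Two empty stand-in "images" and one unrelated file
    for name in ["b.png", "a.png", "notes.txt"]:
        open(os.path.join(d, name), "w").close()
    found = [os.path.basename(p) for p in png_files(d)]

print(found)
```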
%% Cell type:markdown id: tags:
Show explanation from the ATLAS Open Data 13 TeV Documentation.
%% Cell type:code id: tags:
``` python
print(analysis_df["Explanation"].loc[f"{analysis}"])
```
%% Cell type:code id: tags:
``` python
myCmd = os.popen('date').read()
print(myCmd)
endtime = datetime.datetime.now()
print("Analysis finished in % 2d min % 2d s" %(((endtime - starttime).seconds)//60,((endtime - starttime).seconds)%60))
```
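The elapsed time can also be split into minutes and seconds with divmod. The sketch below uses total_seconds(), which, unlike the seconds attribute, also counts whole days for very long runs. `format_elapsed` is a hypothetical helper and the two timestamps are made up.

```python
import datetime

def format_elapsed(start, end):
    """Format the interval between two datetimes as 'M min S s'."""
    total = int((end - start).total_seconds())
    minutes, seconds = divmod(total, 60)
    return f"{minutes} min {seconds} s"

example_start = datetime.datetime(2021, 1, 1, 10, 0, 0)
example_end = datetime.datetime(2021, 1, 1, 10, 12, 15)
print(format_elapsed(example_start, example_end))  # 12 min 15 s
```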
%% Cell type:markdown id: tags:
### Done!
......
atlas-outreach-cpp-framework-13tev @ e80943df
Subproject commit 0fabe1d45102cb4339d808c9187276e6c555bf8c
Subproject commit e80943df6c4e9e22e79662bdd9c90915cc237427