Commit f319cc45 authored by Andrea Tugores

Update README.md

parent ab4d58cb
......@@ -3,9 +3,9 @@
Taking advantage of the open science approach of the ATLAS Open Data project and its educational tools, which give an idea of how data analysis is performed in high-energy physics, this project focused on automating the notebooks where these analyses are carried out and on expanding their documentation to make them easier to understand. In addition, the Git and Zenodo tools were integrated into a program to promote the concepts of open science and reproducibility.
## Overview
ATLAS Open Data is an educational project in High Energy Physics that provides data and tools to high-school, master's and undergraduate students, as well as teachers and lecturers, to help educate them in the physics analysis techniques used in experimental particle physics.
[ATLAS Open Data](https://atlas-opendata.web.cern.ch/atlas-opendata/) is an educational project in High Energy Physics that provides data and tools to high-school, master's and undergraduate students, as well as teachers and lecturers, to help educate them in the physics analysis techniques used in experimental particle physics.
One of the main datasets of ATLAS Open Data is the set of 13 TeV samples, which were released in 2016. As part of the resources available to the public, the samples are accompanied by a set of Jupyter notebooks. The ATLAS Open Data Jupyter Notebooks allow data analysis to be performed directly in a web browser by integrating the ROOT framework with the Jupyter Notebook technology, a combination called *ROOTbook*. The frameworks implement the protocols needed for reading the datasets, making an analysis selection, writing out histograms and plotting the results; the user only needs to execute the code written in the Jupyter Notebook, taking advantage of its functionalities.
One of the main datasets of ATLAS Open Data is the set of 13 TeV samples, which were released in 2016. As part of the resources available to the public, the samples are accompanied by a set of Jupyter notebooks. The ATLAS Open Data Jupyter Notebooks allow data analysis to be performed directly in a web browser by integrating the [ROOT](https://root.cern/about/) framework with the Jupyter Notebook technology, a combination called *ROOTbook*. The frameworks implement the protocols needed for reading the datasets, making an analysis selection, writing out histograms and plotting the results; the user only needs to execute the code written in the [Jupyter Notebook](https://jupyter.org/), taking advantage of its functionalities.
The project is encapsulated in the ATLAS Open Data project, by following and sharing the philosophy behind it. To understand the project, it is necessary to know some important concepts that have revolutionized the scientific world, such as Open Science and Reproducibility.
......@@ -33,9 +33,9 @@ The last update of the ATLAS Open Data Jupyter Notebooks, although containing th
In addition, the notebooks lacked proper documentation, regarding both the physics aspects of the analysis and the commands in the code. These weak points meant a longer learning process for the less experienced and knowledgeable users of this Jupyter Notebook series.
As a side note, although ATLAS has a platform to create code, run analyses, and get data from ATLAS Open Data, namely the ATLAS Virtual Machine (VM), it was not fully integrated with the Jupyter Notebooks: the mention of the ATLAS VM did not describe it accurately, as an extraordinary tool to run the ATLAS Open Data Jupyter Notebooks and later create original content from ATLAS Open Data. The ATLAS VM was also not introduced clearly in the README file. For students who accessed the ATLAS GitHub repository before the ATLAS Open Data website, the information and tutorials for the VM would remain unknown until they navigated to the other sites.
As a side note, although ATLAS has a platform to create code, run analyses, and get data from ATLAS Open Data, namely the ATLAS Virtual Machine (VM), it was not fully integrated with the Jupyter Notebooks: the mention of the ATLAS VM did not describe it accurately, as an extraordinary tool to run the ATLAS Open Data Jupyter Notebooks and later create original content from ATLAS Open Data. The ATLAS VM was also not introduced clearly in the original README file. For students who accessed the ATLAS GitHub repository before the ATLAS Open Data website, the information and tutorials for the VM would remain unknown until they navigated to the other sites.
Finally, even though the philosophy behind ATLAS Open Data is about Open Science and Reproducibility, no information is offered to the user about platforms that make science more shareable and discoverable. Although Git is well-known version control software that facilitates scientific reproducibility, not all students are Git users, and not all students are familiar with executing commands in a terminal or have a complete understanding of the basic git commands. The ATLAS Open Data Jupyter Notebooks are great educational tools, but they work best for users who already use GitHub and have experience running git commands on their local machine.
Finally, even though the philosophy behind ATLAS Open Data is about Open Science and Reproducibility, no information is offered to the user about platforms that make science more shareable and discoverable. Although Git is well-known version control software that facilitates scientific reproducibility, not all students are Git users, and not all students are familiar with executing commands in a terminal or have a complete understanding of the basic git commands.
Going beyond the code, the need for sharing, citing and DOI versioning in the scientific community is constantly growing. Although the ATLAS Open Data website is not required to include options that help scientists publish their contributions, CERN is one of the organizations that fund Zenodo.
......@@ -47,7 +47,7 @@ Considering interactive content as a powerful resource that provides a unique ex
Python packages are the fundamental unit of shareable code in Python, and they make it easy to reuse and maintain your code and to share it with your colleagues and the wider Python community. With this in mind, making sure that all the needed packages were imported at the beginning of the notebook was an important step to guarantee that the analysis and plotting would work when the user executed the corresponding commands.
Apart from the previously required packages, new libraries were added, such as pandas and matplotlib, to store information about the analysis in DataFrames and to show the produced histograms within the same notebook.
Apart from the previously required packages, new libraries were added, such as [pandas](https://pandas.pydata.org/pandas-docs/stable/) and [matplotlib](https://matplotlib.org/), to store information about the analysis in DataFrames and to show the produced histograms within the same notebook.
To run the code as smoothly as possible, conditionals were included in certain cells to avoid the repeated execution of commands that only needed to run once.
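A minimal sketch of such a guard follows. The flag and the `load_samples` step are hypothetical names chosen for illustration and are not taken from the notebooks; the point is only that re-running the cell skips a step that has already succeeded.

```python
# Hypothetical example: all names are illustrative, not taken from the notebooks.
_setup_done = False   # survives between cell executions in the same kernel
samples = None        # result of the one-time step
setup_calls = 0       # counts how often the expensive step really ran


def load_samples():
    """Stand-in for a command that should only run once (e.g. opening files)."""
    global setup_calls
    setup_calls += 1
    return ["sample_A", "sample_B"]


def run_setup_cell():
    """Body of a notebook cell that is safe to execute repeatedly."""
    global _setup_done, samples
    if not _setup_done:          # the conditional that prevents a repeat
        samples = load_samples()
        _setup_done = True
    return samples


run_setup_cell()
run_setup_cell()                 # second execution reuses the earlier result
print(setup_calls)               # prints 1
```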
......@@ -57,21 +57,27 @@ By adding interactivity with the user and storing their responses in variables t
This work was complemented by including detailed information in the markdown cells about the imported Python packages, their uses and functionalities; what happens in each of the code cells; how the commands and methods work; and the details of the physics analysis.
Continuing with the approach to integrate the ATLAS Virtual Machine, the solution to this problem was simple but no less effective: by writing a tutorial in the README file and including links to the ATLAS VM documentation and the existing video tutorials, the information became easier to reach and navigate.
Continuing with the approach to integrate the ATLAS Virtual Machine, the solution to this problem was simple but no less effective: a tutorial was written in the README file found in the Documentation folder.
Ultimately, to promote the usage of Git and Zenodo, a program was created, named the Git\&Zenodo Assistant. Its main menu is stored in a tutorial.py file that calls the functions in the packages and subpackages which contain the necessary code to run all the program’s options.\\
Ultimately, to promote the usage of Git and Zenodo, a program was created, named the Git\&Zenodo Assistant. Its main menu is stored in a tutorial.py file that calls the functions in the packages and subpackages which contain the necessary code to run all the program’s options.
The code for the program is written in Python 3 and requires the installation of these packages and their updates: pip, to install and manage the other packages; requests, for calls to the Zenodo REST API; termcolor, for coloring the words that appear in the program; pandas, to manage the database of the controlled vocabulary for the Zenodo metadata; and tkinter (and tkcalendar), for the graphical user interface. All of these packages are listed in a shell script named requirements.sh. The installation instructions are explained in the README file.
The code for the program is written in Python 3 and requires the installation of these packages and their updates: pip, to install and manage the other packages; requests, for calls to the Zenodo REST API; termcolor, for coloring the words that appear in the program; pandas, to manage the database of the controlled vocabulary for the Zenodo metadata; and tkinter (and tkcalendar), for the graphical user interface. All of these packages are listed in a shell script named requirements.sh. The installation instructions are explained in the README file found in the Git-Zenodo-Assistant folder.
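To illustrate what a requests-based call to the Zenodo REST API can look like, here is a hedged sketch of a two-step upload: create a deposition, then send the file to its bucket. It is not the Assistant's actual code. The function names and the metadata fields chosen here are assumptions for illustration, and `session` is any object with requests-style `post` and `put` methods.

```python
import os

# Production endpoint; https://sandbox.zenodo.org/api is the place for trial runs.
ZENODO_API = "https://zenodo.org/api"


def build_metadata(title, description, creator_name, upload_type="software"):
    """Assemble a small metadata block for a Zenodo deposition (illustrative)."""
    return {
        "metadata": {
            "title": title,
            "upload_type": upload_type,
            "description": description,
            "creators": [{"name": creator_name}],
        }
    }


def upload_file(session, token, path, metadata, api=ZENODO_API):
    """Create a deposition, then stream one file into its bucket.

    `session` is anything with requests-style post/put methods,
    for example requests.Session().
    """
    params = {"access_token": token}
    # Step 1: create the deposition and attach its metadata.
    resp = session.post(f"{api}/deposit/depositions", params=params, json=metadata)
    resp.raise_for_status()
    deposition = resp.json()
    # Step 2: upload the file to the bucket URL returned by Zenodo.
    bucket_url = deposition["links"]["bucket"]
    filename = os.path.basename(path)
    with open(path, "rb") as handle:
        put = session.put(f"{bucket_url}/{filename}", data=handle, params=params)
    put.raise_for_status()
    return deposition["id"]
```

With requests installed, a call could look like `upload_file(requests.Session(), token, "notebook.ipynb", build_metadata(...))`. The deposition stays a draft until it is published in a separate step.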
The program was designed to run specifically in the ATLAS Virtual Machine terminal for better integration, although a version compatible with local machines and different operating systems would be desirable.\\
The program was designed to run specifically in the ATLAS Virtual Machine terminal for better integration, although a version compatible with local machines and different operating systems would be desirable.
\noindent The true functionality of the Git\&Zenodo Assistant program comes from its packages and subpackages. They contain functions that allow the user to get links to the Git and Zenodo documentation; to review some important concepts of Git, its commands and Zenodo’s features, in windows that are optional to open; to upload a file to Zenodo, including its metadata, by filling in some required information; and to push a file to Git not by typing git commands, but by pressing Enter to execute them. In case help is needed, the README file contains the links and steps to create GitHub and Zenodo accounts and to generate the tokens needed to access the program's options, which are necessary to git push and upload a file.
Note that, because interactivity was considered an important part of the experience, the tkinter module was used to generate windows for selecting a date from a calendar, choosing a directory or a file from a file browser to get a path, and choosing from a list of options.
Note that, because interactivity was considered an important part of the experience, the tkinter module was used to generate windows for selecting a date from a calendar, choosing a directory or a file from a file browser to get a path, and choosing from a list of options. All of this preserves the terminal experience, as the git commands appear in green letters to emphasize the importance of executing git commands in the terminal.
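The combination of a tkinter window and colored terminal output could be sketched as follows. This is an illustration under assumptions, not the Assistant's code: the helper names `green` and `choose_file` are hypothetical, and the ANSI fallback used when termcolor is missing is an addition of this sketch.

```python
def green(text):
    """Return text wrapped for green terminal output.

    Uses termcolor when it is installed; the ANSI fallback is an
    assumption of this sketch, not something the program is known to do.
    """
    try:
        from termcolor import colored
        return colored(text, "green")
    except ImportError:
        return f"\033[32m{text}\033[0m"


def choose_file(title="Select a file"):
    """Open a file browser and return the chosen path (empty if cancelled)."""
    # Imported here so the module can be loaded on machines without a display.
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.withdraw()  # hide the empty main window, show only the dialog
    try:
        return filedialog.askopenfilename(title=title)
    finally:
        root.destroy()
```

In a terminal session with a display, `print(green("git add " + choose_file()))` would open the file browser and then show the resulting git command in green.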
## Repository structure
The repository contains three folders: Documentation, Analysis-notebook, and Git-Zenodo-Assistant. Each of these folders has its own README, which explains the contents of the folder and, where the folder has code files, the code in detail.
The repository in which the README, the Notebook and the program are stored also contains image files and markdown documents with the information that is provided in the notebook and the program, as a way to make it easier for the user to review the information without running any code, making documentation always available.
### Documentation folder:
The README file in this folder explains how to install the virtual machine, which is essential to run the notebook. By including links to the ATLAS VM documentation and the existing video tutorials, it makes the information easier to reach and navigate.
## Structure of repository
### Analysis-notebook folder:
### Git-Zenodo-Assistant folder:
The true functionality of the Git\&Zenodo Assistant program comes from its packages and subpackages. They contain functions that allow the user to get links to the Git and Zenodo documentation; to review some important concepts of Git, its commands and Zenodo’s features, in windows that are optional to open; to upload a file to Zenodo, including its metadata, by filling in some required information; and to push a file to Git not by typing git commands, but by pressing Enter to execute them. In case help is needed, the README file contains the links and steps to create GitHub and Zenodo accounts and to generate the tokens needed to access the program's options, which are necessary to git push and upload a file.
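The "press Enter instead of typing git commands" idea can be sketched as below. The function names, the default remote and branch, and the rule that any typed text skips a command are assumptions made for this example, not a description of the Assistant's implementation.

```python
import subprocess


def push_steps(path, message, remote="origin", branch="master"):
    """Return the usual add/commit/push sequence for one file."""
    return [
        ["git", "add", path],
        ["git", "commit", "-m", message],
        ["git", "push", remote, branch],
    ]


def confirm_and_run(commands, prompt=input, runner=subprocess.run):
    """Show each command, wait for Enter, then execute it.

    Typing anything before pressing Enter skips that command
    (a choice made for this sketch).
    """
    executed = []
    for cmd in commands:
        answer = prompt(f"Press Enter to run: {' '.join(cmd)} ")
        if answer.strip():
            continue
        runner(cmd, check=True)  # raises CalledProcessError if git fails
        executed.append(cmd)
    return executed
```

Calling `confirm_and_run(push_steps("analysis.ipynb", "Update notebook"))` inside a Git repository would walk through the three commands one keypress at a time. Passing `prompt` and `runner` as parameters keeps the function easy to try without touching a real repository.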
## Results
The notebook gives the user a clearer idea of what is happening as each cell is executed. User interactivity no longer happens through direct modifications to the code but through inputs where the user is asked to enter one of the available options. With the added information on the physics and on the generated histograms, the user can follow the entire analysis process, and even see the results, without leaving the notebook.
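An input of this kind might be sketched as follows. The `ask_option` helper and the option names in the usage note are hypothetical and only show the pattern of asking until one of the available options is entered.

```python
def ask_option(question, options, reader=input, writer=print):
    """Keep asking until the reply matches one of the allowed options."""
    allowed = {opt.lower(): opt for opt in options}
    while True:
        reply = reader(f"{question} ({'/'.join(options)}): ").strip().lower()
        if reply in allowed:
            return allowed[reply]
        writer(f"'{reply}' is not available, please choose again.")
```

In a notebook cell, `channel = ask_option("Which lepton?", ["electron", "muon"])` would store the validated answer in a variable that later cells can use, so the user never has to edit the code itself.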
......