| Plain tests in dev branch: [](https://jenkins.eosc-synergy.eu/job/eosc-synergy-org/job/onedataSim/job/dev/) | onedatasim-s0 image: [](https://jenkins.eosc-synergy.eu/job/eosc-synergy-org/job/onedataSim/job/build-S0/) | onedatasim-s1 image: [](https://jenkins.eosc-synergy.eu/job/eosc-synergy-org/job/onedataSim/job/build-S1/) |
|------|------|-------|
## About

onedataSim standardises the simulations and their analysis in the LAGO Collaboration to curate, re-use and publish the results, following the established [Data Management Plan (DMP)](https://lagoproject.github.io/DMP/). For this purpose, onedataSim packages ARTI and related software into a Docker image, giving researchers the advantage of obtaining results on any platform and publishing them in LAGO repositories.
When using onedataSim or the data and metadata that it outputs, please cite the following article:
A. J. Rubio-Montero, R. Pagán-Muñoz, R. Mayo-García, A. Pardo-Diaz, I. Sidelnik and H. Asorey, "*A Novel Cloud-Based Framework For Standardized Simulations In The Latin American Giant Observatory (LAGO)*," 2021 Winter Simulation Conference (WSC), 2021, pp. 1-12, doi: [10.1109/WSC52266.2021.9715360](https://doi.org/10.1109/WSC52266.2021.9715360)
### Acknowledgment
This work is financed by the [EOSC-Synergy](https://www.eosc-synergy.eu/) project (EU H2020 RI Grant No 857647), and it is also currently supported by human and computational resources under the [EOSC](https://www.eosc-portal.eu/) umbrella (especially [EGI](https://www.egi.eu) and [GEANT](https://geant.org)) and by the [members](http://lagoproject.net/collab.html) of the LAGO Collaboration.
However, the main objective of onedataSim is to standardise the simulation and its analysis through two main programs:
1. **``do_sims_onedata.py``**, which:
   - executes simulations as ``do_sims.sh`` does, with exactly the same parameters;
   - caches partial results in local scratch and then copies them to the official [LAGO repository](https://datahub.egi.eu), based on [OneData](https://github.com/onedata);
   - generates standardised metadata for all inputs and results and includes it as extended attributes in the OneData filesystem.
2. **``do_showers_onedata.py``**, which:
   - executes analyses as ``do_showers.sh`` does;
   - caches the selected simulation to be analysed locally from the official [LAGO repository](https://datahub.egi.eu) and then stores the results back in the repository.
Storing results in the official repository with standardised metadata enables:
- sharing results with other LAGO members;
- future searches and publishing through institutional/government catalogue providers and virtual observatories such as [B2FIND](https://b2find.eudat.eu/group/lago);
- properly citing scientific data and disseminating results on the internet through Handle.net PIDs;
- building new results based on data-mining or big-data techniques thanks to linked metadata.

Therefore, we encourage LAGO researchers to use these programs for their simulations.
## Pre-requisites

1. Be accredited in the [LAGO Virtual Organisation](https://lagoproject.github.io/DMP/docs/howtos/how_to_join_LAGO_VO/) to obtain a OneData personal [token](https://lagoproject.github.io/DMP/docs/howtos/how_to_login_into_OneData/).
2. Have [Docker](https://www.docker.com/) (or [Singularity](https://singularity.lbl.gov/) or [udocker](https://pypi.org/project/udocker/)) installed on your PC (or HPC/HTC facility).

Only the [Docker Engine](https://docs.docker.com/engine/install/) is needed to run the onedataSim container, that is, the *SERVER* mode. However, the *DESKTOP* mode is the only one available for Windows and macOS; it includes the Docker Engine but also more functionality.

On Linux, the recommended way is to remove all Docker packages included by default in your distribution and to make use of the Docker repositories.
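On a Debian-based system, that recommendation could be sketched as follows (commands follow the official Docker Engine installation documentation; adapt the distribution name to your system):

```shell
# Remove the Docker packages shipped with the distribution
sudo apt-get remove docker docker-engine docker.io containerd runc

# Add the official Docker repository and its signing key
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/debian $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine from the Docker repository
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
```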
onedataSim, ARTI and the required software (CORSIKA, GEANT4, ROOT) are built, tested and published as Docker images.
Depending on the type of data that you want to generate and/or process (i.e. [S0, S1, S2](https://lagoproject.github.io/DMP/DMP/#types-and-formats-of-generatedcollected-data)), you should pull a different image, because of their sizes.

- **``onedatasim-s0``** is mainly for generating S0 datasets (simulations with ``do_sims_onedata.py``), but it also allows S1 analysis. Therefore it includes the modified CORSIKA for LAGO, which results in a heavy image (~911.7 MB).
- **``onedatasim-s1``** is only for generating S1 datasets (analysis with ``do_showers_onedata.py``), but the image is smaller (currently ~473.29 MB).
- (Future: ``onedatasim-s2`` will be mainly for generating S2 datasets (detector response). It will include GEANT4/ROOT and, consequently, will be the heaviest (~1 GB).)

(Currently, downloads from our DockerHub space are limited to 100/day per IP. If you have many nodes under a NAT, you should consider distributing the Docker image internally through the ``docker save`` and ``docker load`` commands.)
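That internal distribution could be sketched as follows (the image tag ``onedatasim-s0:dev`` is taken from the examples in this document; the hostname ``node01`` is hypothetical):

```shell
# On the node that already pulled the image: export it to a tarball
sudo docker save -o onedatasim-s0.tar onedatasim-s0:dev

# Copy the tarball to the other nodes (node01 is a hypothetical hostname)
scp onedatasim-s0.tar node01:/tmp/

# On every other node: load the image from the tarball instead of pulling it
sudo docker load -i /tmp/onedatasim-s0.tar
```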
## Executing a standardised simulation & analysis to be stored in OneData repositories for LAGO

This automatised execution is the preferred one in the LAGO Collaboration.

You can execute ``do_sims_onedata.py`` or ``do_showers_onedata.py`` in a single command, without the need to log into the container. If any parameter is missing, it prompts you for it; otherwise the run starts and the current progress is shown while the results are automatically stored in OneData.

If you count on a standalone server or a virtual machine instantiated with enough processors, memory and disk, you only need to add the **-j \<procs\>** parameter to enable multi-processing:
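For instance, such a multi-processing run might look like this (a sketch based on the ``do_sims_onedata.py`` invocation shown further below; the value ``-j 4`` is a hypothetical example):

```shell
# Run a simulation with 4 parallel processes (-j 4);
# TOKEN and ONEPROVIDER hold your OneData credentials
sudo docker run --privileged \
  -e ONECLIENT_ACCESS_TOKEN=$TOKEN \
  -e ONECLIENT_PROVIDER_HOST=$ONEPROVIDER \
  -i onedatasim-s0:dev \
  bash -lc "do_sims_onedata.py -j 4"
```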
If you have enough permissions (sudo) to run Docker in privileged mode on a cluster and can get the computing nodes in exclusive mode, you can run many simulations at a time.

2. Example for a Slurm cluster instantiated on EOSC resources (pre-configured by IM):

You can access the head node through SSH using the ``cloudadm`` account, and then gain root privileges with ``sudo``.

Slurm and a directory shared by NFS (/home) are already configured, but some configuration still has to be done: sharing the users' directories and installing the packages needed for Docker.
```sh
cd /home/cloudadm
sbatch simulation.sbatch
```
A ``simulation.sbatch`` file for testing the functionality can be one that writes the allowed parameters into ``<jobnumber>.log``:
```sh
sudo docker run --privileged -e ONECLIENT_ACCESS_TOKEN=$TOKEN -e ONECLIENT_PROVIDER_HOST=$ONEPROVIDER -i onedatasim-s0:dev bash -lc "do_sims_onedata.py -?"
```
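A complete version of such a file could be sketched as follows (the ``#SBATCH`` values are hypothetical examples; only the ``docker run`` command comes from this document):

```shell
#!/bin/bash
#SBATCH --job-name=onedatasim-test   # hypothetical job name
#SBATCH --nodes=1                    # run on a single node in exclusive mode
#SBATCH --output=%j.log              # write output to <jobnumber>.log

# Print the allowed parameters of do_sims_onedata.py into the job log
sudo docker run --privileged \
  -e ONECLIENT_ACCESS_TOKEN=$TOKEN \
  -e ONECLIENT_PROVIDER_HOST=$ONEPROVIDER \
  -i onedatasim-s0:dev \
  bash -lc "do_sims_onedata.py -?"
```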
## Instructions only for developers

### Building the onedataSim container

Every container has different requirements. To build the ``onedatasim-s0`` container, it is necessary to provide an official ``lago-corsika`` image as the base installation through a parameter. This is because ARTI simulations currently call [CORSIKA 7](https://www.ikp.kit.edu/corsika/79.php), whose source code is licensed only for the internal use of LAGO collaborators. On the other hand, ``onedatasim-s2`` requires GEANT4/ROOT, and other official images must be used.

Additionally, other parameters allow choosing the ARTI and onedataSim branches, which is fundamental for development.

#### Example: building images from default branches (currently "dev"):

You must indicate the BASE_OS parameter if you want to create S0 or S2 images:

#### Example: building ``onedatasim-s0`` from feature branches:

If you have a newer release of *git* installed on your machine, you can build the container with one command. Note that after the *.git* link, there is a '#' followed by the ONEDATASIM_BRANCH name again.
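Such a one-command build could be sketched as follows (the repository URL is inferred from the project's Jenkins jobs; the exact build-arg names other than ONEDATASIM_BRANCH and BASE_OS, and the ``lago-corsika`` tag, are assumptions):

```shell
# Build onedatasim-s0 straight from the git repository; note the '#'
# after the .git link, followed by the ONEDATASIM_BRANCH name (here: dev)
sudo docker build \
  --build-arg BASE_OS=lago-corsika:dev \
  --build-arg ONEDATASIM_BRANCH=dev \
  -t onedatasim-s0:dev \
  https://github.com/lagoproject/onedataSim.git#dev
```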
### Storing data on testing spaces based on OneData

You can use testing spaces such as ``test8`` to store testing runs during development. For this purpose, you should choose the suitable OneData provider and use the ``--onedata_path`` parameter to select the correct path.

For ``test8``, you should choose ceta-ciemat-**02**.datahub.egi.eu and any directory \<dir\> under the ``--onedata_path /mnt/datahub.egi.eu/test8/<dir>`` path:
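Combined with the ``docker run`` invocation shown earlier in this document, a testing run could look like this (``mytest`` is a hypothetical directory name):

```shell
# Point the client at the test8 provider and store results under a testing path
sudo docker run --privileged \
  -e ONECLIENT_ACCESS_TOKEN=$TOKEN \
  -e ONECLIENT_PROVIDER_HOST=ceta-ciemat-02.datahub.egi.eu \
  -i onedatasim-s0:dev \
  bash -lc "do_sims_onedata.py --onedata_path /mnt/datahub.egi.eu/test8/mytest"
```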