established. For this purpose, onedataSim includes two main programs:

1. **``do_sims_onedata.py``** that:
   - caches partial results as local scratch and then copies them to the
     official [LAGO repository](https://datahub.egi.eu), which is based on
     [OneData](https://github.com/onedata);
   - generates standardised metadata for every input and result and includes
     it as extended attributes in the OneData filesystem.
2. **``do_showers_onedata.py``** that:
   - executes the analysis as ``do_showers.sh`` does;
   - caches the simulation selected for analysis locally from the official
     [LAGO repository](https://datahub.egi.eu) and then stores the results
     back in the repository;
   - also generates standardised metadata for these results and updates the
     corresponding catalogue on OneData.
Storing results in the official repository with standardised metadata enables:

- sharing results with other LAGO members;
- future searches and publishing through institutional/governmental catalogue
  providers and virtual observatories such as
  [B2FIND](https://b2find.eudat.eu/group/lago);
- properly citing scientific data and disseminating results on the internet
  through Handle.net PIDs;
- building new results based on data-mining or big-data techniques thanks to
  linked metadata.
Therefore, we encourage LAGO researchers to use these programs for their
simulations.
## Pre-requisites
1. Be accredited in the [LAGO Virtual Organisation](https://lagoproject.github.io/DMP/docs/howtos/how_to_join_LAGO_VO/) to obtain a personal OneData [token](https://lagoproject.github.io/DMP/docs/howtos/how_to_login_into_OneData/).
2. Have [Docker](https://www.docker.com/)
   (or [Singularity](https://singularity.lbl.gov/)
   or [udocker](https://pypi.org/project/udocker/))
   installed on your PC (or HPC/HTC facility).
Only the [Docker Engine](https://docs.docker.com/engine/install/) is needed to
run the onedataSim container, that is, the *SERVER* mode. However, the
*DESKTOP* mode is the only one available for Windows and macOS; it includes
the Docker Engine but also additional functionality.
On Linux, the recommended way is to remove all Docker packages included by
default in your distribution and to use the Docker repositories instead.
For example, for an old Debian-based distribution such as Ubuntu:

```sh
...
```
On CentOS 7 with root:

```sh
...
```
## Downloading the official docker images to run onedataSim
onedataSim, ARTI and the required software (CORSIKA, GEANT4, ROOT) are built,
tested and packed into Docker images in a
[CI/CD fashion](https://jenkins.eosc-synergy.eu/job/eosc-synergy-org/job/onedataSim/).
When all steps of the CI/CD pipeline are successfully completed, the images
are certified and stored in the LAGO Collaboration space at
[Docker-Hub](https://hub.docker.com/u/lagocollaboration). This process assures
the quality of the published software.

Depending on the type of data that you want to generate and/or process (i.e.
[S0, S1, S2](https://lagoproject.github.io/DMP/DMP/#types-and-formats-of-generatedcollected-data)),
you should pull a different image, because of their size:

- **``onedatasim-s0``** is mainly for generating S0 datasets (simulations with
  ``do_sims_onedata.py``), but it also allows S1 analysis. It therefore
  includes the modified CORSIKA for LAGO, which results in a heavy image
  (~911.7 MB).
- **``onedatasim-s1``** is only for generating S1 datasets (analysis with
  ``do_showers_onedata.py``), but the image is smaller (currently ~473.29 MB).
- (Future: ``onedatasim-s2`` will be mainly for generating S2 datasets
  (detector response). It will include GEANT4/ROOT and will consequently be
  the heaviest (~1 GB).)
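As a sketch of the pull step (the ``dev`` tag is an assumption; check the
lagocollaboration space on Docker Hub for the tags actually published), with
the real command kept in a comment so the snippet has no side effects:

```sh
# Image name in the LAGO Collaboration Docker Hub space; the "dev" tag
# is an assumption -- check Docker Hub for the tags actually published.
IMAGE="lagocollaboration/onedatasim-s0:dev"

# The actual pull (commented out to keep the sketch side-effect free):
#   docker pull "$IMAGE"
echo "docker pull $IMAGE"
```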
(Currently, downloads from our Docker Hub space are limited to 100 per day per
IP. If you have many nodes behind a NAT, you should consider distributing the
Docker image internally through the ``docker save`` and ``docker load``
commands.)
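The ``docker save``/``docker load`` distribution can be sketched as follows
(the image name/tag and tarball name are assumptions, and the docker commands
are kept in comments so the sketch has no side effects):

```sh
IMAGE="lagocollaboration/onedatasim-s0:dev"   # assumed image/tag
TARBALL="onedatasim-s0.tar"

# On one node with internet access:
#   docker pull "$IMAGE"
#   docker save -o "$TARBALL" "$IMAGE"
# Copy the tarball to the other nodes (e.g. through a shared filesystem),
# then load it locally on each node instead of pulling from Docker Hub:
#   docker load -i "$TARBALL"
echo "docker save -o $TARBALL $IMAGE"
```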
## Executing a standardised simulation & analysis to be stored in OneData repositories for LAGO
This automatised execution is the preferred one in the LAGO collaboration.
You can execute ``do_sims_onedata.py`` or ``do_showers_onedata.py`` in a single
command, without the need to log into the container. If any parameters are
missing, it prompts you for them; otherwise the run starts and the current
progress is shown while the results are automatically stored in OneData.
If you have a standalone computing server or a virtual machine instantiated
with enough processors, memory and disk, you only need to add the
**-j \<procs\>** parameter to enable multi-processing:
```sh
sudo docker run --privileged -e ONECLIENT_ACCESS_TOKEN=$TOKEN \
  ...
```
If you have enough permissions (sudo) to run Docker in privileged mode on a
cluster and can get the computing nodes in exclusive mode, you can run many
simulations at a time.
### Executing on clusters instantiated by oneself in IaaS cloud providers
1. First, you have to create and configure a cluster in the cloud:
   - Using the EOSC public cloud, which enables the pre-configuration of Slurm
     and other schedulers (Kubernetes).
     [See the EOSC-Synergy training for LAGO](https://moodle.learn.eosc-synergy.eu/course/view.php?id=16)
   - Using commercial public clouds (Amazon, Azure, Google, etc.).
   - Using private clouds (institutional orchestrators such as OpenStack,
     OpenNebula, XenServer, VMware, etc.).
2. Example for a Slurm cluster instantiated on EOSC resources (pre-configured by IM):
You can access the head node through SSH using the ``cloudadm`` account, and
then you can gain root privileges with ``sudo``. Slurm and a directory shared
by NFS (/home) are already configured, but some configuration still has to be
done: sharing the users' directories and installing the packages needed for
Docker.
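On such a cluster, runs are typically submitted through Slurm. A minimal
sketch of a batch script follows; the image tag, the ``do_sims_onedata.py``
options and the SBATCH directives are all hypothetical placeholders:

```sh
# Write a minimal Slurm batch script; the image tag and the
# do_sims_onedata.py options are hypothetical placeholders.
cat > onedatasim_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=onedatasim
#SBATCH --nodes=1
#SBATCH --exclusive

sudo docker run --privileged -e ONECLIENT_ACCESS_TOKEN=$TOKEN \
  lagocollaboration/onedatasim-s0:dev \
  do_sims_onedata.py -j "$SLURM_CPUS_ON_NODE"
EOF

# Submit it with: sbatch onedatasim_job.sh
head -n 1 onedatasim_job.sh
```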
Every container has different requirements. To build the ``onedatasim-s0``
container, an official ``lago-corsika`` image has to be provided as a
parameter to serve as the base installation. This is because ARTI simulations
currently call [CORSIKA 7](https://www.ikp.kit.edu/corsika/79.php), whose
source code is licensed only for the internal use of LAGO collaborators. On
the other hand, ``onedatasim-s2`` requires GEANT4/ROOT, so other official
images must be used.
Additionally, other parameters allow choosing the ARTI and onedataSim
branches, which is fundamental for development.
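In general terms, such a build might be sketched as below. The
``--build-arg`` names (``CORSIKA_IMAGE``, ``ONEDATASIM_BRANCH``) and the
``lago-corsika`` tag are assumptions; check the project's Dockerfile for the
real argument names:

```sh
# Hypothetical build invocation; CORSIKA_IMAGE and ONEDATASIM_BRANCH are
# assumed --build-arg names -- check the project's Dockerfile for the
# real ones, and Docker Hub for the available lago-corsika tags.
BASE_IMAGE="lagocollaboration/lago-corsika:latest"   # tag assumed
BRANCH="dev"

# The actual build (commented out to keep the sketch side-effect free):
#   docker build --build-arg CORSIKA_IMAGE="$BASE_IMAGE" \
#     --build-arg ONEDATASIM_BRANCH="$BRANCH" -t onedatasim-s0 .
echo "building onedatasim-s0 from $BASE_IMAGE (branch $BRANCH)"
```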
#### Example: building images from default branches (currently "dev")
#### Example: building ``onedatasim-s0`` from featured branches
If you have a newer release of *git* installed on your machine, you can build
the container with a single command. Note that after the *.git* link there is
a '#' followed by the ONEDATASIM_BRANCH name.
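That remote build context can be sketched as follows (the repository URL is an
assumption; the real ``docker build`` call is kept in a comment so the snippet
has no side effects):

```sh
# Remote build context: the '#' after ".git" selects the branch
# (ONEDATASIM_BRANCH). Repository URL and branch name are assumptions.
BRANCH="dev"
BUILD_CONTEXT="https://github.com/lagoproject/onedataSim.git#$BRANCH"

# The actual build (commented out to keep the sketch side-effect free):
#   docker build -t onedatasim-s0 "$BUILD_CONTEXT"
echo "$BUILD_CONTEXT"
```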
### Logging into container for developing purposes
1. Running scripts & attaching a local directory at login.
   To log into the container, you only have to run bash without parameters,
   placed always at the end of the command. Additionally, you can mount a
   local directory inside the container with the parameter
   **--volume \<local path\>:\<container path\>**. For example:
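A minimal sketch of such a login, assuming the image tag and the
``/opt/work`` container path (the ``docker run`` call is kept in a comment so
the snippet has no side effects):

```sh
# Log into the container with a local directory mounted inside it;
# the image tag and the /opt/work container path are assumptions.
LOCAL_DIR="$PWD"
VOLUME_PARAM="--volume $LOCAL_DIR:/opt/work"

#   sudo docker run --privileged -e ONECLIENT_ACCESS_TOKEN=$TOKEN \
#     $VOLUME_PARAM -it lagocollaboration/onedatasim-s0:dev bash
echo "$VOLUME_PARAM"
```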
### Storing data on testing spaces based on OneData
You can use testing spaces such as ``test8`` to store test runs during
development. For this purpose, you should choose the suitable OneData provider
and use the ``--onedata_path`` parameter to select the correct path.
For ``test8``, you should choose ceta-ciemat-**02**.datahub.egi.eu and any
directory ``<dir>`` under the ``--onedata_path /mnt/datahub.egi.eu/test8/<dir>``
path:
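For illustration, a test run into ``test8`` might be sketched as follows (the
directory name ``mytest``, the image tag and the full ``docker run`` line are
assumptions; the real command is kept in a comment so the snippet has no side
effects):

```sh
# Hypothetical test run writing into the test8 space; the provider host
# comes from the text above, the "mytest" directory is an assumption.
ONEDATA_PATH="/mnt/datahub.egi.eu/test8/mytest"

#   sudo docker run --privileged -e ONECLIENT_ACCESS_TOKEN=$TOKEN \
#     -it lagocollaboration/onedatasim-s0:dev \
#     do_sims_onedata.py --onedata_path "$ONEDATA_PATH" ...
echo "--onedata_path $ONEDATA_PATH"
```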