Parallelisation

If you want to speed up time-consuming tangos operations from your command line, you can run many of them in parallel. For this you can either use python’s built-in multiprocessing module, or MPI.

The tangos commands eligible for parallel execution are tangos write, tangos link, tangos crosslink and tangos add, each of which is discussed below.

How to run in parallel with multiprocessing (since 1.8.0)

From your command line or job script use:

tangos [normal tangos command here] --backend multiprocessing-<N>

where <N> should be replaced by the number of processes to use. See notes below on how to choose the number of processes (it’s not necessarily just the number of cores available).
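For example, to calculate a single property for the tutorial simulation using five worker processes (a sketch reusing the property and simulation names from the worked example later on this page):

tangos write dm_density_profile --for tutorial_changa --backend multiprocessing-5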

How to run in parallel with mpi4py

You first need to install MPI and mpi4py on your machine. This is straightforward with, for example, the Anaconda python distribution: just type conda install mpi4py. With regular python distributions, you need to install MPI on your machine and then pip install mpi4py (which will compile the python bindings).
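In other words, one of the following should suffice, depending on your distribution:

conda install mpi4py
pip install mpi4py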

Once this has installed successfully, you can run in parallel using the following from your command line:

mpirun -np <N> [normal tangos command here] --backend mpi4py

Here, mpirun is your system's MPI launcher, <N> is again the number of processes to launch (see the notes below on choosing it), and the rest is the usual tangos command with the mpi4py backend selected.

Alternatively, you can use the pypar library to interface with MPI. If you specify no backend, tangos will default to running in single-processor mode, which means MPI will launch N processes that are unaware of each other's presence. This is very much not what you want. Limitations in the MPI library mean it is not possible for tangos to reliably auto-detect that it has been MPI-launched.
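For instance, with a hypothetical property my_property and simulation my_sim, the first invocation below launches four co-operating processes, whereas the second (with --backend omitted) launches four independent copies of the same single-processor job:

mpirun -np 4 tangos write my_property --for my_sim --backend mpi4py
mpirun -np 4 tangos write my_property --for my_sim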

Important options for tangos write

For tangos write, there are multiple parallelisation modes. For best results, it’s important to understand which is appropriate for your use case – they balance file IO, memory usage and throughput differently.

The default mode parallelises at the snapshot level: each worker loads an entire snapshot at once, then works through the halos within that snapshot. This is highly efficient in terms of communication requirements. However, for large simulations it can push memory usage very high, and it can also place heavy simultaneous IO demands on shared disks.

If either the memory or the IO requirements of this approach are prohibitive, tangos write accepts a --load-mode argument:
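For instance, the worked example later on this page selects the shared-memory server mode:

mpirun -np 5 tangos write dm_density_profile --for tutorial_changa --backend mpi4py --load-mode server-shared-mem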

Older load modes

The load modes below are still available, but they are less flexible and are rarely the best option.

tangos write worked example

Let's consider the longest process in the tutorials, which involves writing images and more for the changa+AHF tutorial simulation.
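In serial, that command looks something like the following (a sketch obtained by stripping the parallel options from the full command given below):

tangos write dm_density_profile gas_density_profile uvi_image --with-prerequisites --include-only="NDM()>5000" --include-only="contamination_fraction<0.01" --for tutorial_changa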

Some of the underlying pynbody manipulations are already parallelised. One can therefore experiment with different configurations, but experience suggests the best option is to switch off all pynbody parallelisation (i.e. set the number of threads to 1) and give tangos complete control. This is because only some pynbody routines are parallelised, whereas tangos is close to embarrassingly parallel. Once pynbody threading is disabled, the most efficient version of the above command is:

mpirun -np 5 tangos write dm_density_profile gas_density_profile uvi_image --with-prerequisites --include-only="NDM()>5000" --include-only="contamination_fraction<0.01" --for tutorial_changa --backend mpi4py --load-mode server-shared-mem

for a machine with 4 processors.
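How to disable pynbody threading? One way is via pynbody's configuration file; a minimal sketch, assuming the ~/.pynbodyrc file and its number_of_threads option:

[general]
number_of_threads: 1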

How many processes? Note that the number of processes was actually one more than the number of processors. This is because the server process generally uses the CPU only while the other processes are blocked. So, using only 4 processes would leave one processor idle most of the time.
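The same reasoning applies on larger nodes: on a 16-core machine, for example, one would launch 17 processes (mpirun -np 17), giving one server plus 16 workers.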

What load mode? Here we chose --load-mode=server-shared-mem by considering the possibilities:

For tangos link and tangos crosslink, parallelisation is currently implemented only at the snapshot level. Suppose you have a simulation with M particles: each worker needs to store at least 2M integers at any one time (possibly more, depending on the underlying formats) in order to work out the merger tree.

Consequently for large simulations, you may need to use a machine with lots of memory and/or use fewer processes than you have cores available.
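As a rough illustration, a 1024³-particle simulation implies at least 2 × 1024³ ≈ 2.1 billion integers per worker, which is around 17 GB if they are stored as 64-bit values (the exact figure depends on the underlying formats).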

tangos add

The tangos add tool is also parallelised only at the snapshot level, but it does not need to load much data per snapshot, so this does not normally pose a problem.
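For example, adding the tutorial simulation with five processes might look like the following (a hypothetical invocation, reusing the simulation name from earlier on this page):

mpirun -np 5 tangos add tutorial_changa --backend mpi4py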