Halos in Pynbody

Finding the groups of particles that represent galaxies is the key first step of simulation analysis. There are several public group finders available today that find these groups of particles. Pynbody includes interfaces to several commonly used group finders.

Groups that are virialized are called “halos”, so groups are available in pynbody using the halos function on a simulation object (SimSnap). When halos is called, pynbody creates a HaloCatalogue that consists of Halo objects. The Halo object holds information about the particle IDs and other properties about a given halo. A HaloCatalogue is an object is a compilation of all the halos in a given snapshot.

Some possible halo finders that pynbody recognises include:

The halos() function in SimSnap automatically determines whether halo data exists on disk already for pynbody to read, or if it should run a halo finder to create halo data.

This tutorial will show you how to setup and configure pynbody to best use the group finder functionality built into pynbody. If you are not familiar with Pynbody in general, it is recommended that you first have a look at the Basic snapshot manipulation tutorial.

Configuration

pynbody reads a number of different halo formats including the popular subfind. However, it is most comfortable with either AHF or (more experimentally) Rockstar and can in many cases actually run these codes for you if you haven’t already generated halo catalogues for your simulation.

Rockstar

To install Rockstar, grab the code from Peter Behroozi’s bitbucket repository, make it, and copy it into your $PATH

> git clone https://bitbucket.org/pbehroozi/rockstar-galaxies.git
> cd rockstar-galaxies; make
> cp rockstar-galaxies ~/bin/

AHF

To install AHF, take the most recent version from Alexander Knebe’s AHF page, uncompress it

> wget http://popia.ft.uam.es/AHF/files/ahf-v1.0-084.tgz
> tar zxf ahf-v1.0-084.tgz; cd ahf-v1.0-084

Edit Makefile.config appropriate for your code, make AHF, and copy it into your $PATH.

> make AHF
> cp AHF-v1.0-084 ~/bin/

Now pynbody will use one of these halo finders to create group files you can use to analyze your simulations.

Configuration

As described in Configuring pynbody, you can tell pynbody which group finder you prefer in your configuration file, ~/.pynbodyrc. In the general section, you can arrange the priority of halo finders to use as you like.

Working with Halos and Catalogues

We will use the AHF catalogue here since that is the one that is available for the sample output in the testdata bundle. The SubFind specific halo / subhalo structure is handled later.

In [1]: import pynbody, matplotlib.pylab as plt

In [2]: s = pynbody.load('testdata/g15784.lr.01024.gz')

In [3]: s.physical_units()

We’ve got the snapshot loaded, now we ask pynbody to load any available halo catalogue:

In [4]: h = s.halos()

h is the halo catalogue.

Note

If the halo finders have to run to find the groups, they may take some time. AHF typically takes 5 minutes for a million particle simulation while Rockstar takes 5-10 minutes running on a single processor.

We can easily retrieve some basic information, like the total number of halos in this catalogue:

In [5]: len(h)
Out[5]: 1411

To actually access a halo, use square bracket syntax. For example, the following returns the number of particles in halos 1 and 2

In [6]: len(h[1]), len(h[2])
Out[6]: (502300, 82543)

The catalogue has halos ordered by number of particles, so the first halo for this zoom simulation will be the one we would most likely be interested in. Halo IDs begin with 1 for many halo finders (including AHF, which is the sample file being used here).

As may now be evident, “halos” are treated using the SubSnap class. The syntax for dealing with an individual halo therefore precisely mirrors the syntax for dealing with an entire simulation. For example, we can get the total mass in halo 1 and see the position of its first few particles as follows:

In [7]: h[1]['mass'].sum().in_units('1e12 Msol')
Out[7]: SimArray(1.69639241, '1.00e+12 Msol')

In [8]: h[1]['pos'][:5]
Out[8]: 
SimArray([[ 1612.48349829, -2366.71869028, -8636.70597822],
          [ 1686.51999651, -2283.22670956, -8590.83078126],
          [ 1598.7917708 , -2379.39974068, -8668.24339003],
          [ 1574.22601517, -2269.69573101, -8562.83392873],
          [ 1645.0148744 , -2306.01402053, -8661.97775342]], 'kpc')

A really common use-case is that one wants to center the simulation on a given halo and analyze some of its properties. Since halos are just SubSnap objects, this is easy to do:

In [9]: pynbody.analysis.halo.center(h[1])
Out[9]: <pynbody.transformation.GenericTranslation at 0x7f9288e4d2e0>

In [10]: im = pynbody.plot.image(h[1].d, width = '500 kpc', cmap=plt.cm.Greys, units = 'Msol kpc^-2')
../_images/halo1_image.png

Halo catalogue information

Any additional information generated by the halo finder is available through the properties dictionary associated with halos. For example

In [11]: h[1].properties['children']
Out[11]: []

returns a list of sub-halos of this halo. Here there are no sub-halos, so we’ve been returned an empty list. To see everything that is known about the halo one can use the standard python dictionary method keys:

In [12]: h[1].properties.keys()
Out[12]: dict_keys(['omegaM0', 'omegaL0', 'h', 'boxsize', 'a', 'time', 'halo_id', 'npart', 'nvpart', 'Xc', 'Yc', 'Zc', 'VXc', 'VYc', 'VZc', 'mass', 'Rvir', 'Vmax', 'Rmax', 'sigV', 'lambda', 'Lx', 'Ly', 'Lz', 'a_axis', 'Eax', 'Eay', 'Eaz', 'b_axis', 'Ebx', 'Eby', 'Ebz', 'c_axis', 'Ecx', 'Ecy', 'Ecz', 'ovdens', 'Redge', 'nbins', 'Ekin', 'Epot', 'mbp_offset', 'com_offset', 'r2', 'lambdaE', 'v_esc', 'Phi0', 'n_gas', 'M_gas', 'lambda_gas', 'Lx_gas', 'Ly_gas', 'Lz_gas', 'a_gas', 'Eax_gas', 'Eay_gas', 'Eaz_gas', 'b_gas', 'Ebx_gas', 'Eby_gas', 'Ebz_gas', 'c_gas', 'Ecx_gas', 'Ecy_gas', 'Ecz_gas', 'Ekin_gas', 'Epot_gas', 'lambdaE_gas', 'n_star', 'M_star', 'lambda_star', 'Lx_star', 'Ly_star', 'Lz_star', 'a_star', 'Eax_star', 'Eay_star', 'Eaz_star', 'b_star', 'Ebx_star', 'Eby_star', 'Ebz_star', 'c_star', 'Ecx_star', 'Ecy_star', 'Ecz_star', 'Ekin_star', 'Epot_star', 'lambdaE_star', 'fstart', 'children'])

Dealing with big simulations and lots of halos

Sometimes, simulations are too large to fit in the memory of your analysis machine. On the other hand, pynbody never actually loads particle data until it’s needed so it is possible to load a halo catalogue anyway.

Consider the following example.

In [13]: f = pynbody.load("testdata/g15784.lr.01024")

In [14]: h = f.halos()

In [15]: h[2].properties['mass']/1e12 # another property calculated by AHF in Msol/h
Out[15]: 11.9684

In [16]: len(h[2])
Out[16]: 82543

At no point does this load data from the simulation file; it only accesses the halo catalogue. In fact, with some formats (including AHF, which is what’s in our sample test data here), you can specify dummy=True to load only the properties dictionary:

In [17]: h = f.halos(dummy=True)

In [18]: h[2].properties['mass'] # this is still OK
Out[18]: 11968400000000.0

In [19]: len(h[2]) # this, of course, is unknown
Out[19]: 82543

Note

The remainder of this section requires the underlying snapshot loader to support partial loading, which is currently only the case for tipsy and nchilada formats. See Code-specific loaders.

Combined with pynbody’s partial-loading system, one can go further and pull only a single halo into your computer’s memory at once. The following example shows you how:

In [20]: h2data = h.load_copy(2)

In [21]: len(h2data) # this is correct again
Out[21]: 82543

In [22]: h2data['mass']
Out[22]: 
SimArray([2.23517413e-10, 2.23517413e-10, 2.23517413e-10, ...,
          1.14440915e-07, 1.14440915e-07, 1.14440915e-07], '4.75e+16 Msol')

As you can see from the last line, you can now access particle arrays but the key difference is that h2data as constructed above only loads the particles that are required. Conversely accessing arrays directly from h[2] actually loads the full simulation array into memory, even if only part of it is ever going to be used.

Write halo catalog (i.e. convert AHF outfiles to tipsy format)

Tipsy is a particle viewer. A tipsy format file can be useful for quick viewing in tipsy to check whether the AHF halo finder did anything sensible. Write the (ahf) halo catalog to disk. Former idl users might notice that this produces outfiles similar to ‘Alyson’s idl script’.

The 3 written file types are:

  1. .gtp (tipsy file with halos as star particles);

  2. .grp (ascii halo id of every snapshot particle, 0 if none);

  3. .stat (ascii condensed version of AHF halos file).

This halo file set emulates the halo finder SKID. Tipsy and skid can be found at http://www-hpcc.astro.washington.edu/tools/.

Working with SubFind Halos and Subhalos

If using the Gadget3 SubFind HDF5 output (for example, OWLS / Eagle or Smaug sims) most of the examples from AHF above can be used, except for the subhalos structure. One major change is that the halo catalogue is a separate file to the snapshot.

In [23]: import pynbody, matplotlib.pylab as plt

In [24]: s = pynbody.load('testdata/Test_NOSN_NOZCOOL_L010N0128/data/snapshot_103/snap_103.hdf5')

In [25]: s.physical_units()

We’ve got the snapshot loaded and can access the particle data in any manner we like as usual but unlike AHF we can’t load halos. Instead to get pynbody to load the halo catalogue we have to access the subfind output directly:

In [26]: s = pynbody.load('testdata/Test_NOSN_NOZCOOL_L010N0128/data/subhalos_103/subhalo_103')

In [27]: s.physical_units()

In [28]: h = s.halos()

h is the Friends-of-Friends (FOF) halo catalogue, upon which SubFind is based.

As with the AHF example we can easily retrieve some basic information, like the total number of halos in this catalogue:

In [29]: len(h), h.ngroups, h.nsubhalos
Out[29]: (4226, 4226, 3294)

Where the last value is the number of subhalos, see next section on these. To actually access a halo, use square bracket syntax as before. For example, the following returns the number of particles in halos 1 and 2

In [30]: len(h[1]), len(h[2])
Out[30]: (5064, 4910)

The catalogue has FOF halos ordered by number of particles, so the first halo for this small box simulation will be the largest object. Halo IDs begin with 0 for SubFind / FOF unlike AHF.

The “halos” are treated using the SubFindHDFSnap class. The syntax for dealing with an individual halo is the same as AHF and the snapshot simulation. For example, we can get the total mass in the second FOF halo and see the position of its first few particles as follows:

In [31]: h[1]['mass'].sum().in_units('1e12 Msol')
Out[31]: SimArray(0.12650432, '1.00e+12 Msol')

In [32]: h[1]['pos'][:5]
Out[32]: 
SimArray([[2361.6180398 , 1970.33215876,  790.31792316],
          [2361.5917753 , 1970.42023542,  790.38369763],
          [2361.5818129 , 1970.40982018,  790.26443187],
          [2361.68822943, 1970.38491419,  790.33558378],
          [2361.54785018, 1970.31540382,  790.31322499]], 'kpc')

A really common use-case is that one wants to center the simulation on a given halo and analyze some of its properties:

In [33]: pynbody.analysis.halo.center(h[1], vel=False)
Out[33]: <pynbody.transformation.GenericTranslation at 0x7f9289445a30>

In [34]: im = pynbody.plot.image(h[1].d, width = '40 kpc', cmap=plt.cm.Greys, units = 'Msol kpc^-2')
../_images/halo1_image_subfind.png

Subhalo catalogue information

After the FOF group has been found, SubFind runs on this reduced particle list to determine gravitational bound substructures (or subhalos) within the larger FOF halo. To access the list of subhalos simply call:

In [35]: h[1].sub[:]
Out[35]: 
[<SimSnap "testdata/Test_NOSN_NOZCOOL_L010N0128/data/subhalos_103/subhalo_103:fof_group_1_subhalo_0" len=4353>,
 <SimSnap "testdata/Test_NOSN_NOZCOOL_L010N0128/data/subhalos_103/subhalo_103:fof_group_1_subhalo_1" len=658>]

to return a list of sub-halos of this halo. Then one can select subhalo particles as before (e.g. dark matter velocities):

In [36]: h[1].sub[0].d['vel']
Out[36]: 
SimArray([[  30.14551033,  -18.16250408,   19.76518634],
          [ -10.09107784,  -20.86471745,   14.59541386],
          [  30.12826743,   28.85348224,  -61.51923505],
          ...,
          [  50.27657113, -172.77083383,  -24.60836056],
          [   3.33786135, -166.7441768 ,   19.80777034],
          [  71.04409011, -192.64996009,  -21.51857116]], 'km s**-1')

for the main (i.e. first) subhalo of the second FOF halo. As with AHF additional halo catalogue values such as the centre of mass, or the velocity dispersion, can be accessed by the properties list for each halo / subhalo. Note that the subhalo properties list is far more extensive than the FOF halo:

In [37]: h[2].properties
Out[37]: 
{'a': 0.1666666,
 'omegaB0': 0.0458,
 'omegaM0': 0.275,
 'omegaL0': 0.725,
 'boxsize': Unit("2.37e+03 kpc"),
 'h': 0.702,
 'time': Unit("1.23e+00 kpc s km**-1"),
 'Flag_Cooling': 0,
 'Flag_Feedback': 1,
 'Flag_Metals': 1,
 'Flag_Sfr': 1,
 'Flag_StellarAge': 1,
 'MassTable': array([0.        , 0.00303354, 0.        , 0.        , 0.        ,
        0.        ]),
 'NumFilesPerSnapshot': 1,
 'NumPart_ThisFile': array([277386, 279009,      0,      0,   1635,      0], dtype=int32),
 'NumPart_Total': array([0, 0, 0, 0, 0, 0], dtype=uint32),
 'NumPart_Total_HighWord': array([0, 0, 0, 0, 0, 0], dtype=uint32),
 'Redshift': 5.00000240000096,
 'RunLabel': b'"SubFind Parent Box 10 with 128 WMAP7"\n',
 'halo_id': 2,
 'CenterOfMass': SimArray([4.10285216, 6.03583403, 0.52038905], '3.09e+24 cm a h**-1'),
 'CenterOfMassVelocity': SimArray([-0.43261364,  7.97283217, -3.20020145], '1.00e+05 cm a**1/2 s**-1'),
 'Mass': SimArray(8.55388652, '1.99e+43 g h**-1'),
 'MassType': SimArray([1.30246423, 6.97107166, 0.        , 0.        , 0.28035063,
           0.        ], '1.99e+43 g h**-1'),
 'StarFormationRate': SimArray(21.57577237, 'Msol yr**-1'),
 'EnvironmentMass': SimArray(5.66983794, '1.99e+43 g h**-1'),
 'FirstSubOfHalo': SimArray(8., '1.00e+00'),
 'Halo_M_Crit200': SimArray(5.66983788, '1.99e+43 g h**-1'),
 'Halo_M_Crit2500': SimArray(2.06321389, '1.99e+43 g h**-1'),
 'Halo_M_Crit500': SimArray(3.71012813, '1.99e+43 g h**-1'),
 'Halo_M_Mean200': SimArray(5.70745231, '1.99e+43 g h**-1'),
 'Halo_M_Mean2500': SimArray(2.09112239, '1.99e+43 g h**-1'),
 'Halo_M_Mean500': SimArray(3.74168088, '1.99e+43 g h**-1'),
 'Halo_M_TopHat200': SimArray(6.10513212, '1.99e+43 g h**-1'),
 'Halo_R_Crit200': SimArray(0.09568132, '3.09e+24 cm a h**-1'),
 'Halo_R_Crit2500': SimArray(0.02943404, '3.09e+24 cm a h**-1'),
 'Halo_R_Crit500': SimArray(0.06120673, '3.09e+24 cm a h**-1'),
 'Halo_R_Mean200': SimArray(0.09628485, '3.09e+24 cm a h**-1'),
 'Halo_R_Mean2500': SimArray(0.0296866, '3.09e+24 cm a h**-1'),
 'Halo_R_Mean500': SimArray(0.06162657, '3.09e+24 cm a h**-1'),
 'Halo_R_TopHat200': SimArray(0.10221511, '3.09e+24 cm a h**-1'),
 'NsubPerHalo': SimArray(6., '1.00e+00')}

In [38]: h[2].properties['CenterOfMass']
Out[38]: SimArray([4.10285216, 6.03583403, 0.52038905], '3.09e+24 cm a h**-1')

In [39]: h[2].sub[4].properties
Out[39]: 
{'a': 0.1666666,
 'omegaB0': 0.0458,
 'omegaM0': 0.275,
 'omegaL0': 0.725,
 'boxsize': Unit("2.37e+03 kpc"),
 'h': 0.702,
 'time': Unit("1.23e+00 kpc s km**-1"),
 'Flag_Cooling': 0,
 'Flag_Feedback': 1,
 'Flag_Metals': 1,
 'Flag_Sfr': 1,
 'Flag_StellarAge': 1,
 'MassTable': array([0.        , 0.00303354, 0.        , 0.        , 0.        ,
        0.        ]),
 'NumFilesPerSnapshot': 1,
 'NumPart_ThisFile': array([277386, 279009,      0,      0,   1635,      0], dtype=int32),
 'NumPart_Total': array([0, 0, 0, 0, 0, 0], dtype=uint32),
 'NumPart_Total_HighWord': array([0, 0, 0, 0, 0, 0], dtype=uint32),
 'Redshift': 5.00000240000096,
 'RunLabel': b'"SubFind Parent Box 10 with 128 WMAP7"\n',
 'halo_id': 4,
 'CenterOfMass': SimArray([4.04154364, 6.01377394, 0.52375431], '3.09e+24 cm a h**-1'),
 'CenterOfMassVelocity': SimArray([-110.75990997,  -42.79930189,   12.74084843], '1.00e+05 cm s**-1'),
 'GasSpin': SimArray([-0.00720652,  0.00422568,  0.00261926], '3.09e+29 cm**2 s**-1 h**-1'),
 'GrNr': SimArray(2., '1.00e+00'),
 'InertiaTensor': SimArray([ 4.03272663e-08,  1.44714877e-08, -1.30650405e-08,
            1.44714877e-08,  2.74477659e-08, -2.03776720e-08,
           -1.30650405e-08, -2.03776720e-08,  1.09618478e-07], '1.00e+00'),
 'KineticEnergy': SimArray(43.73894562, '1.99e+53 cm**2 g h**-1 s**-2'),
 'Mass': SimArray(0.07278115, '1.99e+43 g h**-1'),
 'MassType': SimArray([0.01114693, 0.06067077, 0.        , 0.        , 0.00096345,
           0.        ], '1.99e+43 g h**-1'),
 'NSFSpin': SimArray([-0.01060776,  0.00725118,  0.00515794], '3.09e+29 cm**2 s**-1 h**-1'),
 'Position': SimArray([4.04084158, 6.01266718, 0.52605957], '3.09e+24 cm a h**-1'),
 'SFSpin': SimArray([-0.00051066, -0.00173048, -0.00237851], '3.09e+29 cm**2 s**-1 h**-1'),
 'StarFormationRate': SimArray(0.02626844, 'Msol yr**-1'),
 'StarSpin': SimArray([-0.00552418, -0.00141915,  0.00129262], '3.09e+29 cm**2 s**-1 h**-1'),
 'SubHalfMass': SimArray([0.00133806, 0.00118501, 0.        , 0.        , 0.        ,
           0.        ], '3.09e+24 cm h**-1'),
 'SubHalfMassProj': SimArray([0.00088806, 0.00087589, 0.        , 0.        , 0.        ,
           0.        ], '3.09e+24 cm h**-1'),
 'SubMostBoundID': SimArray(189504., '1.00e+00'),
 'SubParentHalo': SimArray(0., '1.00e+00'),
 'SubSpin': SimArray([-0.00650772, -0.00060551, -0.00222365], '3.09e+29 cm**2 s**-1 h**-1'),
 'SubStellarVelDisp': SimArray(5.86365743, '1.00e+05 cm s**-1'),
 'SubStellarVelDispHalfProj': SimArray(7.82626988, '1.00e+05 cm s**-1'),
 'SubVelDisp': SimArray(20.69081247, '1.00e+05 cm s**-1'),
 'SubVmax': SimArray(37.465686, '1.00e+05 cm s**-1'),
 'SubVmaxRad': SimArray(0.00157904, '3.09e+24 cm h**-1'),
 'ThermalEnergy': SimArray(2.66547059, '1.99e+53 cm**2 g h**-1 s**-2'),
 'TotalEnergy': SimArray(-25.37823918, '1.99e+53 cm**2 g h**-1 s**-2')}

In [40]: h[2].sub[4].properties['CenterOfMass']
Out[40]: SimArray([4.04154364, 6.01377394, 0.52375431], '3.09e+24 cm a h**-1')

To access the entire dataset of a given property (say all of the Stellar Velocity Dispersions) requires an embedded for loop over the HDF5 catalogue and appending to an array:

In [41]: SubStellarVelDisp = [[subhalo.properties['SubStellarVelDisp'] for subhalo in halo.sub] for halo in h]

In [42]: SubStellarVelDisp[5]
Out[42]: 
[SimArray(41.94170324, '1.00e+05 cm s**-1'),
 SimArray(33.54862684, '1.00e+05 cm s**-1'),
 SimArray(17.48874067, '1.00e+05 cm s**-1'),
 SimArray(0., '1.00e+05 cm s**-1')]