A walk through pynbody’s low-level facilities

The following example shows how to load a file, determine various attributes, access some data and make use of unit information.

If you’re more interested in making pretty pictures and plots straight away, you may wish to read the basic facilities tutorial first.

Note

This tutorial assumes basic familiarity with python and is written as a series of ipython cells, like a Jupyter notebook. Before you start make sure pynbody is properly installed. See Pynbody Installation for more information. You will also need the standard pynbody test files if you want to follow the tutorial. These files are available separately here: <https://github.com/pynbody/pynbody/releases> (testdata.tar.gz).

After you have extracted the testdata folder (e.g. with tar -xzf testdata.tar.gz), launch ipython or Jupyter. At the prompt or in a cell, type import pynbody. If all is installed correctly, this should silently succeed, and you are ready to use pynbody.

First steps

We will start by loading one of the test files. We’ll also load the numpy module as it provides some functions we’ll make use of later.

In [1]: import pynbody

In [2]: import numpy as np

In [3]: f = pynbody.load("testdata/test_g2_snap")

Here we’ve loaded a sample gadget file. Not much seems to have happened when you called pynbody.load(), but the variable f is now your calling point for accessing data.

In fact f is an object known as a SimSnap (see pynbody.snapshot.SimSnap).

Behind the scenes, the function inspects the provided path and decides what file format it is. At the time of writing, supported file formats include tipsy, nchilada, gadget-1, gadget-2, gadget-HDF and ramses. For most purposes you should never need to know what type of file you are dealing with – that’s the whole point of the pynbody framework.

Note

If you look inside the testdata folder, you’ll notice that our test snapshot is actually an example of a spanned gadget snapshot. There is not actually a file called test_g2_snap, but rather two files from two CPUs, test_g2_snap.0 and test_g2_snap.1. pynbody knows to load both of these if you ask for test_g2_snap; if you ask for test_g2_snap.1 (for instance), it’ll load only that particular file.

The SimSnap object that’s returned is currently a fairly empty holder for data. The data will be loaded from disk as and when you request it.

Finding out something about the file

Let’s start to inspect the file we’ve opened. The standard python operator len can be used to query the number of particles in the file:

In [4]: len(f)
Out[4]: 8192

We can also find out about particles of a particular type or family as it is known within pynbody. To find out which families are present in the file, use families():

In [5]: f.families()
Out[5]: [<Family gas>, <Family dm>, <Family star>]

You can pick out just the particles belonging to a family by using the syntax f.family. So, for example, we can see how many particles of each type are present:

In [6]: len(f.dm)
Out[6]: 4096

In [7]: len(f.gas)
Out[7]: 4039

In [8]: len(f.star)
Out[8]: 57

Useful information about the file is stored in a python dictionary called properties:

In [9]: f.properties
Out[9]: 
{'omegaM0': 0.2669,
 'omegaL0': 0.7331,
 'boxsize': Unit("3.00e+03 kpc a h**-1"),
 'a': 0.2777777798158637,
 'h': 0.71,
 'time': Unit("3.55e+00 s kpc a**1/2 km**-1 h**-1")}

Like any python dictionary, specific properties can be accessed by name:

In [10]: f.properties['a']
Out[10]: 0.2777777798158637

These names are standardized across different file formats. Here for example z means redshift, a means the cosmological scalefactor, h indicates the Hubble constant in standard units (100 km/s/Mpc).

Note

Actually f.properties has some behaviour which is very slightly different from a normal python dictionary. For further information see SimDict.

Retrieving data

Like f.properties, f itself also behaves like a python dictionary. The standard python method f.keys() returns a list of arrays that are currently in memory.

In [11]: f.keys()
Out[11]: []

Right now it’s empty! That’s actually correct because data is only retrieved when you first access it. To find out what could be loaded, use the pynbody-specific method f.loadable_keys().

In [12]: f.loadable_keys()
Out[12]: ['iord', 'pos', 'vel', 'mass']

This looks a bit more promising. To access data, simply use the normal dictionary syntax. For example f['pos'] returns an array containing the 3D-coordinates of all the particles.

In [13]: f['pos']
Out[13]: 
SimArray([[  53.318974,  177.84364 ,  128.22311 ],
          [ 306.75046 ,  140.44455 ,  215.37149 ],
          [ 310.99908 ,   64.1345  ,  210.53595 ],
          ...,
          [2870.9016  , 2940.1711  , 1978.7949  ],
          [2872.4114  , 2939.2197  , 1983.916   ],
          [2863.6511  , 2938.0544  , 1980.0615  ]], dtype=float32, 'kpc a h**-1')

Note

Array names are standardized across all file formats. For instance, even if you load a Gadget-HDF file – which internally refers to the position array as coordinates – you still access that array from pynbody by the name pos. The intention is that code never needs to be adapted simply because you have switched file format. However the name mapping is fully configurable should you wish to adopt different conventions.

Some arrays are stored only for certain families. For example, densities are stored only for gas particles and are accessed as f.gas['rho']. To find out what arrays are available for the gas family, use f.gas.loadable_keys():

In [14]: f.gas.loadable_keys()
Out[14]: 
['nheq',
 'nhe',
 'iord',
 'nhp',
 'sfr',
 'pos',
 'u',
 'nh',
 'smooth',
 'vel',
 'nhep',
 'mass',
 'rho']

So, we can get the density of the gas particles like this:

In [15]: f.gas['rho']
Out[15]: 
SimArray([1.3888609e-09, 3.3617684e-09, 4.5273674e-09, ..., 8.5340952e-09,
          7.4101774e-09, 1.4051752e-09], dtype=float32, '1.00e+10 h**2 Msol kpc**-3 a**-3')

Note

The SimArray objects are actually numpy arrays with some added functionality (such as unit tracking, discussed below). Numerical operations are very nearly as fast as their numpy equivalents. However, if you want to squeeze the performance of your code, you can always get a vanilla numpy array by using the numpy view mechanism, e.g. f.gas['rho'].view(type=numpy.ndarray)

Creating your own arrays

You can create arrays using the obvious assignment syntax:

In [16]: f['twicethemass'] = f['mass']*2

You can also define new arrays for one family of particles:

In [17]: f.gas['myarray'] = f.gas['rho']**2

An array created in this way exists only for the gas particles; trying to access it for other particles raises an exception.

Alternatively, you can define derived arrays which are calculated (and re-calculated) on demand. For example,

In [18]: @pynbody.derived_array
   ....: def thricethemass(sim) :
   ....:     return sim['mass']*3
   ....: 

At this point, nothing has been calculated. However, when you ask for the array, the values are calculated and stored

In [19]: f['thricethemass']
Out[19]: 
SimArray([0.02464408, 0.02464408, 0.02464408, ..., 0.02464408, 0.02464408,
          0.02464408], dtype=float32, '1.00e+10 Msol h**-1')

This has the advantage that your new thricethemass array is automatically updated when you change the mass array:

In [20]: f['mass'][0] = 1

In [21]: f['thricethemass']
Out[21]: 
SimArray([3.        , 0.02464408, 0.02464408, ..., 0.02464408, 0.02464408,
          0.02464408], dtype=float32, '1.00e+10 Msol h**-1')

Note, however, that the array is not re-calculated every time you access it, only if the mass array has changed. Therefore you don’t waste any time by using derived arrays. For more information see the reference documentation for derived arrays.

Keeping on top of units

You might have noticed in the output from the above experiments that pynbody keeps track of unit information whenever it can.

Warning

It’s worth understanding exactly where pynbody gets this information from, in case anything goes wrong. In the case of Ramses, and Gadget-HDF files the unit information is stored within your snapshot, and pynbody takes advantage of this. For old-style Gadget snapshots, the default cosmological gadget setup is assumed. For nchilada and tipsy, an nchilada or gasoline .param file is sought in the directory from which you are loading the snapshot and its immediate parent. You can also create a text file with the same name as your snapshot but the extension .units to override the units at load time. For example, such a file can contain

pos: kpc a
vel: km s^-1
mass: Msol

to specify distance units are comoving kiloparsecs, velocity units are kilometers per second, and mass is in solar masses.

You can print out the units of any given array by accessing the units property:

In [22]: f['mass'].units
Out[22]: Unit("1.00e+10 Msol h**-1")

However, it’s usually more helpful to simply convert your arrays into something more managable than the internal units. Pynbody arrays can be converted using the in_units() function; just pass in a string representing the units you want.

In [23]: f['pos'].in_units('Mpc')
Out[23]: 
SimArray([[0.02086032, 0.06957889, 0.05016554],
          [0.12001192, 0.05494701, 0.08426115],
          [0.12167414, 0.02509174, 0.08236931],
          ...,
          [1.1232009 , 1.1503017 , 0.7741764 ],
          [1.1237916 , 1.1499294 , 0.77617997],
          [1.1203643 , 1.1494735 , 0.774672  ]], dtype=float32, 'Mpc')

Note

The function in_units() returns a copy of your array in new units. Next time you access f['pos'] it will be back in its original units. If you want to permanently convert the array in-place use convert_units() or see below.

Another option is to request that pynbody converts all your arrays into something sensible, using physical_units(),

In [24]: f.physical_units()

Take a look at what’s happened to the density:

In [25]: f.gas['rho']
Out[25]: 
SimArray([ 326.6502 ,  790.66406, 1064.8047 , ..., 2007.1586 , 1742.821  ,
           330.4872 ], dtype=float32, 'Msol kpc**-3')

Note that the conversion will also be made when loading any arrays in future; for example:

In [26]: f['vel']
Out[26]: 
SimArray([[ 27.938293 ,   4.983705 , -10.008866 ],
          [ 15.361564 ,   5.7859726,   4.3631563],
          [ -8.357319 ,  -2.8885257,  22.809904 ],
          ...,
          [ 27.749176 ,  85.60175  ,  15.532437 ],
          [ 40.755856 ,  59.442867 ,  44.244846 ],
          [ 38.383965 ,  68.63973  ,  46.01429  ]], dtype=float32, 'km s**-1')

A new array generated from a unary or binary operation will inherit the correct units. For example

In [27]: 5*f['vel']
Out[27]: 
SimArray([[139.69147 ,  24.918526, -50.04433 ],
          [ 76.807816,  28.929863,  21.81578 ],
          [-41.786594, -14.442629, 114.04952 ],
          ...,
          [138.74588 , 428.00876 ,  77.662186],
          [203.77928 , 297.21432 , 221.22423 ],
          [191.91983 , 343.19867 , 230.07144 ]], dtype=float32, 'km s**-1')

In [28]: (f['vel']**2).units
Out[28]: Unit("km**2 s**-2")

In [29]: np.sqrt(((f['vel']**2).sum(axis=1)*f['mass'])).units
Out[29]: Unit("km Msol**1/2 s**-1")

You can even associate arrays with the loaded SimSnap unit system even when you create them outside the SimSnap. This is useful for keeping things tidy with your unit conversions if you are calculating quantities that don’t apply to all of the particles. For instance:

In [30]: array = pynbody.array.SimArray(np.random.rand(10)) # make the newly-formed numpy array a pynbody array

In [31]: array.sim = f # this links the array to the simulation

In [32]: array.units = 'Mpc a' # we set units that require cosmology information

In [33]: array
Out[33]: 
SimArray([0.85226331, 0.23154282, 0.47172   , 0.00602858, 0.45068116,
          0.27028286, 0.23957908, 0.33810041, 0.79071016, 0.27698084], 'Mpc a')

In [34]: array.in_units('kpc')
Out[34]: 
SimArray([236.73980995,  64.31744957, 131.03333337,   1.67460631,
          125.18921295,  75.07857348,  66.54974572,  93.91678135,
          219.64171327,  76.93912349], 'kpc')

Note that the units were correctly converted into physical units in the last step.

For more information see the reference documentation for pynbody.units.

Subsnaps

An important concept within pynbody is that of a subsnap. These are objects that look just like a SimSnap but actually only point at a subset of the particles within a parent. Subsnaps are always instances of the SubSnap class.

You’ve already seen some examples of subsnaps, actually. When you accessed f.gas or f.dm, you’re given back a subsnap pointing at only those particles. However, subsnaps can be used in a much more general way. For example, you can use python’s normal array slicing operations. Here we take every tenth particle:

In [35]: every_tenth = f[::10]

In [36]: len(every_tenth)
Out[36]: 820

In common with python’s normal mode of working, this does not copy any data, it merely creates another pointer into the existing data. As an example, let’s modify the position of one of our particles in the new view:

In [37]: every_tenth['pos'][1]
Out[37]: SimArray([140.2888 , 122.21798,  75.80529], dtype=float32, 'kpc')

In [38]: every_tenth['pos'][1] = [1,2,3]

In [39]: every_tenth['pos'][1]
Out[39]: SimArray([1., 2., 3.], dtype=float32, 'kpc')

This change is reflected in the main snapshot.

In [40]: f['pos'][10]
Out[40]: SimArray([1., 2., 3.], dtype=float32, 'kpc')

Note

If you’re used to numpy’s flexible indexing abilities, you might like to note that, typically, f[array_name][index] == f[index][array_name]. The difference is that applying the index to the whole snapshot is more flexible and can lead to simpler code. In particular, numpy_array[index] may involve copying data whereas f[index] never does; it always returns a new object pointing back at the old one.

You can pass in an array of boolean values representing whether each successive particle should be included (True) or not (False). This allows the use of numpy’s comparison operators. For example:

In [41]: f_slab = f[(f['x']>1000)&(f['x']<2000)]

In [42]: f_slab['x'].min()
Out[42]: SimArray(1000.11975, dtype=float32, 'kpc')

In [43]: f_slab['x'].max()
Out[43]: SimArray(1173.6926, dtype=float32, 'kpc')

In [44]: f['x'].min()
Out[44]: SimArray(0.04504352, dtype=float32, 'kpc')

In [45]: f['x'].max()
Out[45]: SimArray(1173.6926, dtype=float32, 'kpc')

Here f_slab is pointing at only those particles which have x-coordinates between 1000 and 2000.

Note that subsnaps really do behave exactly like snapshots. So, for instance, you can pick out sub-subsnaps or sub-sub-subsnaps.

In [46]: len(f_slab.dm)
Out[46]: 1236

In [47]: len(f_slab.dm[::10])
Out[47]: 124

In [48]: f_slab[[100,105,252]].gas['pos']
Out[48]: 
SimArray([[1029.6443 ,  421.70737,  169.25818],
          [1097.2487 ,  377.8681 ,  149.81082],
          [1059.9261 ,  374.7446 ,  192.37201]], dtype=float32, 'kpc')

Note

Under most circumstances there is very little performance penalty to using a SubSnap. However in performance-critical code it is worth understanding a little more about what’s going on under the hood. See Performance optimisation in pynbody.

Filters

Another way you can select a subset of particles is to use a filter. This can lead to more readable code than the equivalent explicitly written condition. For example, to pick out a sphere centered on the origin, you can use:

In [49]: from pynbody.filt import *

In [50]: f_sphere = f[Sphere('10 kpc')]

For a list of filters, see pynbody.filt.

Where next?

This concludes the tutorial for basic use of pynbody. Further tutorials for specific tasks are available. We are happy to provide further assistance via our user group email list.