A walk through pynbody’s low-level facilities¶
The following example shows how to load a file, determine various attributes, access some data and make use of unit information.
If you’re more interested in making pretty pictures and plots straight away, you may wish to read the basic facilities tutorial first.
Note
This tutorial assumes basic familiarity with python and is written as a series of ipython cells, like a Jupyter notebook. Before you start make sure pynbody is properly installed. See Pynbody Installation for more information. You will also need the standard pynbody test files if you want to follow the tutorial. These files are available separately here: <https://github.com/pynbody/pynbody/releases> (testdata.tar.gz).
After you have extracted the testdata folder (e.g. with tar -xzf
testdata.tar.gz
), launch ipython or Jupyter. At the prompt or in
a cell, type import pynbody
. If all is installed correctly, this should silently
succeed, and you are ready to use pynbody.
First steps¶
We will start by loading one of the test files. We’ll also load the numpy module as it provides some functions we’ll make use of later.
In [1]: import pynbody
In [2]: import numpy as np
In [3]: f = pynbody.load("testdata/test_g2_snap")
Here we’ve loaded a sample gadget file. Not much seems to have
happened when you called pynbody.load()
, but the variable f
is now your calling point for accessing data.
In fact f
is an object known as a SimSnap (see
pynbody.snapshot.SimSnap
).
Behind the scenes, the function inspects the provided path and decides what file format it is. At the time of writing, supported file formats include tipsy, nchilada, gadget-1, gadget-2, gadget-HDF and ramses. For most purposes you should never need to know what type of file you are dealing with – that’s the whole point of the pynbody framework.
Note
If you look inside the testdata folder, you’ll notice that
our test snapshot is actually an example of a spanned gadget
snapshot. There is not actually a file called test_g2_snap
, but
rather two files from two CPUs, test_g2_snap.0
and
test_g2_snap.1
. pynbody knows to load both of these if you ask
for test_g2_snap; if you ask for test_g2_snap.1 (for instance),
it’ll load only that particular file.
The SimSnap object that’s returned is currently a fairly empty holder for data. The data will be loaded from disk as and when you request it.
Finding out something about the file¶
Let’s start to inspect the file we’ve opened. The standard python operator len
can be used to query the number
of particles in the file:
In [4]: len(f)
Out[4]: 8192
We can also find out about particles of a particular type or family
as it is known within pynbody. To find out which families are present
in the file, use families()
:
In [5]: f.families()
Out[5]: [<Family gas>, <Family dm>, <Family star>]
You can pick out just the particles belonging to a family by using the
syntax f.family
. So, for example, we can see how many particles of
each type are present:
In [6]: len(f.dm)
Out[6]: 4096
In [7]: len(f.gas)
Out[7]: 4039
In [8]: len(f.star)
Out[8]: 57
Useful information about the file is stored in a python dictionary called properties:
In [9]: f.properties
Out[9]:
{'omegaM0': 0.2669,
'omegaL0': 0.7331,
'boxsize': Unit("3.00e+03 kpc a h**-1"),
'a': 0.2777777798158637,
'h': 0.71,
'time': Unit("3.55e+00 s kpc a**1/2 km**-1 h**-1")}
Like any python dictionary, specific properties can be accessed by name:
In [10]: f.properties['a']
Out[10]: 0.2777777798158637
These names are standardized across different file formats. Here for example z means redshift, a means the cosmological scalefactor, h indicates the Hubble constant in standard units (100 km/s/Mpc).
Note
Actually f.properties
has some behaviour which is
very slightly different from a normal python dictionary. For further
information see SimDict
.
Retrieving data¶
Like f.properties
, f
itself also behaves like a python
dictionary. The standard python method
f.
keys()
returns a list of arrays
that are currently in memory.
In [11]: f.keys()
Out[11]: []
Right now it’s empty! That’s actually correct because data is only
retrieved when you first access it. To find out what could be loaded,
use the pynbody-specific method
f.
loadable_keys()
.
In [12]: f.loadable_keys()
Out[12]: ['iord', 'pos', 'vel', 'mass']
This looks a bit more promising.
To access data, simply use the normal dictionary syntax. For example
f['pos']
returns an array containing the 3D-coordinates of all the
particles.
In [13]: f['pos']
Out[13]:
SimArray([[ 53.318974, 177.84364 , 128.22311 ],
[ 306.75046 , 140.44455 , 215.37149 ],
[ 310.99908 , 64.1345 , 210.53595 ],
...,
[2870.9016 , 2940.1711 , 1978.7949 ],
[2872.4114 , 2939.2197 , 1983.916 ],
[2863.6511 , 2938.0544 , 1980.0615 ]], dtype=float32, 'kpc a h**-1')
Note
Array names are standardized across all file
formats. For instance, even if you load a Gadget-HDF file – which
internally refers to the position array as coordinates – you
still access that array from pynbody by the name pos
. The
intention is that code never needs to be adapted simply because you
have switched file format. However the name mapping is fully
configurable should you wish to adopt
different conventions.
Some arrays are stored only for certain families. For example,
densities are stored only for gas particles and are accessed as
f.gas['rho']
. To find out what arrays are available for the gas
family, use
f.gas.
loadable_keys()
:
In [14]: f.gas.loadable_keys()
Out[14]:
['nheq',
'nhe',
'iord',
'nhp',
'sfr',
'pos',
'u',
'nh',
'smooth',
'vel',
'nhep',
'mass',
'rho']
So, we can get the density of the gas particles like this:
In [15]: f.gas['rho']
Out[15]:
SimArray([1.3888609e-09, 3.3617684e-09, 4.5273674e-09, ..., 8.5340952e-09,
7.4101774e-09, 1.4051752e-09], dtype=float32, '1.00e+10 h**2 Msol kpc**-3 a**-3')
Note
The SimArray
objects are actually
numpy arrays with some added functionality (such as unit tracking,
discussed below). Numerical operations are very nearly as fast as
their numpy equivalents. However, if you want to squeeze the
performance of your code, you can always get a vanilla numpy array by
using the numpy view mechanism,
e.g. f.gas['rho'].view(type=numpy.ndarray)
Creating your own arrays¶
You can create arrays using the obvious assignment syntax:
In [16]: f['twicethemass'] = f['mass']*2
You can also define new arrays for one family of particles:
In [17]: f.gas['myarray'] = f.gas['rho']**2
An array created in this way exists only for the gas particles; trying to access it for other particles raises an exception.
Alternatively, you can define derived arrays which are calculated (and re-calculated) on demand. For example,
In [18]: @pynbody.derived_array
....: def thricethemass(sim) :
....: return sim['mass']*3
....:
At this point, nothing has been calculated. However, when you ask for the array, the values are calculated and stored
In [19]: f['thricethemass']
Out[19]:
SimArray([0.02464408, 0.02464408, 0.02464408, ..., 0.02464408, 0.02464408,
0.02464408], dtype=float32, '1.00e+10 Msol h**-1')
This has the advantage that your new thricethemass array is automatically updated when you change the mass array:
In [20]: f['mass'][0] = 1
In [21]: f['thricethemass']
Out[21]:
SimArray([3. , 0.02464408, 0.02464408, ..., 0.02464408, 0.02464408,
0.02464408], dtype=float32, '1.00e+10 Msol h**-1')
Note, however, that the array is not re-calculated every time you access it, only if the mass array has changed. Therefore you don’t waste any time by using derived arrays. For more information see the reference documentation for derived arrays.
Keeping on top of units¶
You might have noticed in the output from the above experiments that pynbody keeps track of unit information whenever it can.
Warning
It’s worth understanding exactly where pynbody gets this
information from, in case anything goes wrong. In the case
of Ramses, and Gadget-HDF files the unit information is stored
within your snapshot, and pynbody takes advantage of this. For
old-style Gadget snapshots, the default cosmological gadget setup is
assumed. For nchilada and tipsy, an nchilada or gasoline
.param
file is sought in the directory from which you are loading
the snapshot and its immediate parent. You can also create a text file
with the same name as your snapshot but the extension .units
to override
the units at load time. For example, such a file can contain
pos: kpc a
vel: km s^-1
mass: Msol
to specify distance units are comoving kiloparsecs, velocity units are kilometers per second, and mass is in solar masses.
You can print out the units of any given array by accessing the
units
property:
In [22]: f['mass'].units
Out[22]: Unit("1.00e+10 Msol h**-1")
However, it’s usually more helpful to simply convert your arrays into
something more managable than the internal units. Pynbody arrays can
be converted using the in_units()
function; just pass in a string representing the units you want.
In [23]: f['pos'].in_units('Mpc')
Out[23]:
SimArray([[0.02086032, 0.06957889, 0.05016554],
[0.12001192, 0.05494701, 0.08426115],
[0.12167414, 0.02509174, 0.08236931],
...,
[1.1232009 , 1.1503017 , 0.7741764 ],
[1.1237916 , 1.1499294 , 0.77617997],
[1.1203643 , 1.1494735 , 0.774672 ]], dtype=float32, 'Mpc')
Note
The function in_units()
returns a copy of
your array in new units. Next time you access f['pos']
it will be
back in its original units. If you want to permanently convert the array in-place
use convert_units()
or see below.
Another option is to request that pynbody converts all your arrays
into something sensible, using
physical_units()
,
In [24]: f.physical_units()
Take a look at what’s happened to the density:
In [25]: f.gas['rho']
Out[25]:
SimArray([ 326.6502 , 790.66406, 1064.8047 , ..., 2007.1586 , 1742.821 ,
330.4872 ], dtype=float32, 'Msol kpc**-3')
Note that the conversion will also be made when loading any arrays in future; for example:
In [26]: f['vel']
Out[26]:
SimArray([[ 27.938293 , 4.983705 , -10.008866 ],
[ 15.361564 , 5.7859726, 4.3631563],
[ -8.357319 , -2.8885257, 22.809904 ],
...,
[ 27.749176 , 85.60175 , 15.532437 ],
[ 40.755856 , 59.442867 , 44.244846 ],
[ 38.383965 , 68.63973 , 46.01429 ]], dtype=float32, 'km s**-1')
A new array generated from a unary or binary operation will inherit the correct units. For example
In [27]: 5*f['vel']
Out[27]:
SimArray([[139.69147 , 24.918526, -50.04433 ],
[ 76.807816, 28.929863, 21.81578 ],
[-41.786594, -14.442629, 114.04952 ],
...,
[138.74588 , 428.00876 , 77.662186],
[203.77928 , 297.21432 , 221.22423 ],
[191.91983 , 343.19867 , 230.07144 ]], dtype=float32, 'km s**-1')
In [28]: (f['vel']**2).units
Out[28]: Unit("km**2 s**-2")
In [29]: np.sqrt(((f['vel']**2).sum(axis=1)*f['mass'])).units
Out[29]: Unit("km Msol**1/2 s**-1")
You can even associate arrays with the loaded
SimSnap
unit system even when you create
them outside the SimSnap
. This is useful
for keeping things tidy with your unit conversions if you are
calculating quantities that don’t apply to all of the particles. For
instance:
In [30]: array = pynbody.array.SimArray(np.random.rand(10)) # make the newly-formed numpy array a pynbody array
In [31]: array.sim = f # this links the array to the simulation
In [32]: array.units = 'Mpc a' # we set units that require cosmology information
In [33]: array
Out[33]:
SimArray([0.85226331, 0.23154282, 0.47172 , 0.00602858, 0.45068116,
0.27028286, 0.23957908, 0.33810041, 0.79071016, 0.27698084], 'Mpc a')
In [34]: array.in_units('kpc')
Out[34]:
SimArray([236.73980995, 64.31744957, 131.03333337, 1.67460631,
125.18921295, 75.07857348, 66.54974572, 93.91678135,
219.64171327, 76.93912349], 'kpc')
Note that the units were correctly converted into physical units in the last step.
For more information see the reference documentation for
pynbody.units
.
Subsnaps¶
An important concept within pynbody is that of a subsnap. These are
objects that look just like a SimSnap
but actually only point
at a subset of the particles within a parent. Subsnaps are always
instances of the SubSnap
class.
You’ve already seen some examples of subsnaps, actually. When you
accessed f.gas
or f.dm
, you’re given back a subsnap pointing
at only those particles. However, subsnaps can be used in a much more
general way. For example, you can use python’s normal array slicing
operations. Here we take every tenth particle:
In [35]: every_tenth = f[::10]
In [36]: len(every_tenth)
Out[36]: 820
In common with python’s normal mode of working, this does not copy any data, it merely creates another pointer into the existing data. As an example, let’s modify the position of one of our particles in the new view:
In [37]: every_tenth['pos'][1]
Out[37]: SimArray([140.2888 , 122.21798, 75.80529], dtype=float32, 'kpc')
In [38]: every_tenth['pos'][1] = [1,2,3]
In [39]: every_tenth['pos'][1]
Out[39]: SimArray([1., 2., 3.], dtype=float32, 'kpc')
This change is reflected in the main snapshot.
In [40]: f['pos'][10]
Out[40]: SimArray([1., 2., 3.], dtype=float32, 'kpc')
Note
If you’re used to numpy’s flexible indexing abilities, you
might like to note that, typically, f[array_name][index] ==
f[index][array_name]
. The difference is that applying the index to
the whole snapshot is more flexible and can lead to simpler code. In
particular, numpy_array[index]
may involve copying data whereas
f[index]
never does; it always returns a new object pointing back at
the old one.
You can pass in an array of boolean values representing whether each successive particle should be included (True) or not (False). This allows the use of numpy’s comparison operators. For example:
In [41]: f_slab = f[(f['x']>1000)&(f['x']<2000)]
In [42]: f_slab['x'].min()
Out[42]: SimArray(1000.11975, dtype=float32, 'kpc')
In [43]: f_slab['x'].max()
Out[43]: SimArray(1173.6926, dtype=float32, 'kpc')
In [44]: f['x'].min()
Out[44]: SimArray(0.04504352, dtype=float32, 'kpc')
In [45]: f['x'].max()
Out[45]: SimArray(1173.6926, dtype=float32, 'kpc')
Here f_slab is pointing at only those particles which have x-coordinates between 1000 and 2000.
Note that subsnaps really do behave exactly like snapshots. So, for instance, you can pick out sub-subsnaps or sub-sub-subsnaps.
In [46]: len(f_slab.dm)
Out[46]: 1236
In [47]: len(f_slab.dm[::10])
Out[47]: 124
In [48]: f_slab[[100,105,252]].gas['pos']
Out[48]:
SimArray([[1029.6443 , 421.70737, 169.25818],
[1097.2487 , 377.8681 , 149.81082],
[1059.9261 , 374.7446 , 192.37201]], dtype=float32, 'kpc')
Note
Under most circumstances there is very little performance penalty to using a SubSnap. However in performance-critical code it is worth understanding a little more about what’s going on under the hood. See Performance optimisation in pynbody.
Filters¶
Another way you can select a subset of particles is to use a filter. This can lead to more readable code than the equivalent explicitly written condition. For example, to pick out a sphere centered on the origin, you can use:
In [49]: from pynbody.filt import *
In [50]: f_sphere = f[Sphere('10 kpc')]
For a list of filters, see pynbody.filt
.
Where next?¶
This concludes the tutorial for basic use of pynbody. Further tutorials for specific tasks are available. We are happy to provide further assistance via our user group email list.