# Performance optimisation in pynbody¶

Pynbody is built on top of numpy, which means that learning how to optimize numpy array manipulations is the most important route to writing efficient code; see http://scipy-lectures.github.com/advanced/optimizing/index.html for an introduction.

However there are a few issues which are specific to pynbody. First, a large number of pynbody’s most common operations are parallelized: make sure you have set up these routines to behave in a way that matches your needs by reading the page on Use of multiple processors by pynbody.

Other than that, there are some more subtle issues. The most important of these that we have come across is the overheads incurred by using `SubSnap`s which are explained below.

## Overheads of SubSnaps¶

### A template for performance-critical code¶

To cut a long story short, if your routine does a lot of array access on an object which might be a SubSnap of a certain flavour (explained further below), you will find that wrapping your code as follows speeds it up.

```
def my_expensive_operation(sim_or_subsim) :
with sim_or_subsim.immediate_mode :
mass_array = sim_or_subsim['mass']
other_array = sim_or_subsim['other']
# ... get other arrays required...
#
# perform multiple operations on arrays
#
# At end, copy back results if the arrays have
# changed
sim_or_subsim['other'] = other_array
```

The remainder of this document unpacks what this does and why it should be necessary.

### What is a SubSnap, really?¶

When you construct a SubSnap, the framework records which particles of the underlying SimSnap are included and which are not. Thereafter, if you access an array from the SubSnap, it is constructed in one of two ways.

If the set of particles indexed by the SubSnap is expressable a slice, the arrays constructed are still numpy arrays. This will be the case if you explicitly slice the simulation (e.g.

`f[2:100:3]`

), or if you ask for a specific particle family (e.g.`f.dm`

).If the set of particles is not expressable in this way, the arrays constructed are emulating numpy arrays and this can become expensive (see below). This will be the cae if you ask for a list of particles (e.g.

`f[[2,10,15,22]]`

) use a numpy-like indexing trick (e.g.`f[f['x']>10]`

or`f[numpy.where(f['x']>10)]`

) or use a`filter`

.

In the first case, the optimization is unchanged from the raw SimSnap case. In the second case, the situation is different. To understand how and why, we need to look at the difference between an indexed and a sliced numpy array.

### Numpy’s behaviour¶

This section explains why the reason for the slightly awkward design of IndexedSubArray. The pynbody framework requires all sub-arrays to continue pointing to the original data but a simple experiment with numpy shows that it does not enable this behaviour in all cases that we want to cover.

Here’s what happens when you use a slice of an existing numpy array.

```
In [1]: import numpy as np
In [2]: a = np.zeros(10)
In [3]: b = a[3:5]
In [4]: b[1] = 100
In [5]: a
Out[5]: array([ 0., 0., 0., 0., 100., 0., 0., 0., 0., 0.])
```

The a array has been updated as required, because the b and a objects actually point back to the same part of the computer memory.

On the other hand, when you *index* a numpy array, the behaviour is different.

```
In [6]: c = a[[4,5,6]]
In [7]: c[1] = 200
In [8]: a
Out[8]: array([ 0., 0., 0., 0., 100., 0., 0., 0., 0., 0.])
```

Here changing c has not updated a. That’s because the construction of c actually
*copied* the relevant data instead of just pointing back at it. This is necessitated by
the underlying design of numpy arrays requiring the data to lie in a predictable
pattern in the memory.

### Back to pynbody¶

The IndexedSubArray class fixes this problem:

```
In [9]: import pynbody
In [10]: d = pynbody.array.IndexedSimArray(a, [4,5,6])
In [11]: d[1] = 200
In [12]: a
Out[12]: array([ 0., 0., 0., 0., 100., 200., 0., 0., 0., 0.])
```

Note that a has been updated correctly. This is achieved by the IndexedSimArray
*emulating*, rather than *wrapping*, a numpy array; internally
the syntax d[1]=200 is then translated into a[[4,5,6][1]]=200.

The cost of this is that each time you call a function that requires a numpy array as an input, the IndexedSimArray has to generate a proxy for this purpose. This can become slow. Have a look at the following timings:

```
In [13]: %time for i in range(10000) : d+=1
CPU times: user 76.7 ms, sys: 199 us, total: 76.9 ms
Wall time: 77 ms
In [14]: %time for i in range(10000) : a+=1
CPU times: user 14.3 ms, sys: 21 us, total: 14.3 ms
Wall time: 14.4 ms
```

Adding to the subarray is *slower* than adding to the entire array!
This is because of the overheads of continually constructing proxy
numpy arrays to pass to the `__add__`

method.

### How to remove this bottleneck¶

We should emphasize that the example above is quite contrived, since it forces re-construction of the numpy proxy 10000 times. In user code, the number of numpy proxies that have to be constructed will be vastly smaller, so the fractional overheads are normally quite small.

Nonetheless, it does sometimes become a problem for performance-critical code. For that reason, it’s possible to avoid constructing IndexedSimArray`s altogether and force only `numpy arrays to be returned. This means you must take responsibility for understanding which operations copy, as opposed to referencing, data.

This is known as immediate mode and is activated using python’s with mechanism. Let’s create a test snapshot and a subview into that snapshot to try it out.

```
In [15]: f = pynbody.new(dm=100)
In [16]: sub_f = f[[20,21,22]]
```

Under normal conditions, the type of arrays returned from `sub_f`

is `IndexedSimArray`

.
Updating one of these arrays will transparently update the main snapshot.

```
In [17]: sub_mass = sub_f['mass']
In [18]: type(sub_mass)
Out[18]: pynbody.array.IndexedSimArray
In [19]: sub_mass[:]=3
In [20]: f['mass'][[20,21,22]]
Out[20]: SimArray([3., 3., 3.])
```

Conversely, in immediate mode, the type of arrays returned from `sub_f`

is `SimArray`

(so just
a wrapper round a real numpy array). But updating that returned numpy array has *no effect* on the
parent snapshot.

```
In [21]: with f.immediate_mode :
....: sub_mass = sub_f['mass']
....:
In [22]: type(sub_mass)
Out[22]: pynbody.array.SimArray
In [23]: sub_mass
Out[23]: SimArray([3., 3., 3.])
In [24]: sub_mass[:]=5
In [25]: sub_mass # updated as expected
Out[25]: SimArray([5., 5., 5.])
In [26]: f['mass'][[20,21,22]] # NOT updated - should still be 3,3,3!
Out[26]: SimArray([3., 3., 3.])
```

So it becomes your responsibility to copy the results back in this case, if required. A template for performance critical code which might be operating on a SubSnap follows.

Note

`with f_sub.immediate_mode`

is equivalent to `with f.immediate_mode`

where f_sub is any SubSnap of f.

So in summary, the template code at the start of this document advocates:

storing a

copyof the data for the subset of particlesworking on the copy

(if necessary) updating the main snapshot data explicitly before returning

## Overheads of raw SimArrays¶

Note

This information is provided for interest. We have never come across a realistic use case where the following is necessary.

In pynbody, arrays are implemented by the class `SimArray`

. This is a wrapper
around a numpy array. There is a small extra cost associated with every operation to allow
units to be matched and updated. For long arrays such as those found in typical simulations, this is usually a tiny fraction of the
actual computation time. We have never found it to be a problem, but if you want to disable the
unit tracking you can always do so using numpy’s view mechanism to get a raw numpy array.
Suppose you have a SimSnap `f`

; then `pos = f['pos'].view(numpy.ndarray)`

(for example) will return the position
array without any of the SimArray wrapping. The new pos variable can be manipulated without
any unit handling code being called.