By this I don't mean software implementations of numerical algorithms, but rather software for conducting numerical analysis.
from nodepy import *
import sympy
one = sympy.Rational(1)
import numpy as np
from sympy import init_printing
%matplotlib inline
import matplotlib.pyplot as plt
NodePy knows about many common methods:
rk4 = rk.loadRKM('RK44')
print rk4
Classical RK4
The original four-stage, fourth-order method of Kutta

 0   |
 1/2 | 1/2
 1/2 |      1/2
 1   |           1
_____|_____________________
     | 1/6  1/3  1/3  1/6
ab6=lm.Adams_Bashforth(6)
print ab6.alpha, ab6.beta
[0 0 0 0 0 -1 1] [-95/288 959/480 -3649/720 4991/720 -2641/480 4277/1440 0]
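It is also handy to see everything that is built in. Here is a minimal sketch, assuming that calling loadRKM with no argument returns the whole dictionary of methods (check NodePy's documentation if your version differs):
# List all of NodePy's built-in Runge-Kutta methods by name
# (assumes loadRKM() with no argument returns the full dictionary).
all_methods = rk.loadRKM()
print sorted(all_methods.keys())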
You can also make up your own methods -- just give the coefficients:
A = np.array([[0, 0,],[one/2, one/2]])
b = np.array([one/2,one/2])
itrap = rk.RungeKuttaMethod(A,b)
print itrap
Runge-Kutta Method

 0 |
 1 | 1/2  1/2
___|___________
   | 1/2  1/2
What do we know about these methods? We may be interested, for instance, in their stability regions:
S=rk4.plot_stability_region()
Here is something a little more difficult:
ab6.plot_boundary_locus()
ab6.plot_stability_region()
Notice that NodePy automatically finds the part of the stability region that is connected to the origin and sets the plot bounds appropriately.
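The boundary locus itself is easy to recompute by hand: it is the image of the unit circle under \(\rho(\zeta)/\sigma(\zeta)\). Here is a minimal sketch, assuming (as the output above suggests) that ab6.alpha and ab6.beta hold the coefficients of \(\rho\) and \(\sigma\) from lowest to highest power:
# Recompute the boundary locus of AB6 directly from its characteristic
# polynomials: z(theta) = rho(exp(i*theta)) / sigma(exp(i*theta)).
theta = np.linspace(0., 2*np.pi, 1000)
zeta = np.exp(1j*theta)
alpha = np.array(ab6.alpha, dtype=float)   # coefficients of rho
beta = np.array(ab6.beta, dtype=float)     # coefficients of sigma
rho = sum(a*zeta**j for j, a in enumerate(alpha))
sigma = sum(b*zeta**j for j, b in enumerate(beta))
z = rho/sigma
plt.plot(z.real, z.imag)
plt.axis('equal')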
Next let's check the order of each method.
print rk4.order()
print itrap.order()
print ab6.order()
4
2
6
This we knew already. But how badly does RK4 violate the 5th order conditions?
print rk4.order_conditions(5)
print rk4.principal_error_norm()
[1/576 -1/288 1/96 0 1/288 -1/96 0 1/96 1/120]
0.0145045823432
The first line gives the amounts by which RK4 violates the order condition corresponding to each of the order-five rooted trees. The second gives the \(L_2\) norm of those coefficients.
forest = rt.list_trees(5)
print forest
['{{T^3}}', '{{{T^2}}}', '{{{{T}}}}', '{{T{T}}}', '{T{T^2}}', '{T{{T}}}', '{T^2{T}}', '{{T}{T}}', '{T^4}']
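The two lists can be lined up, assuming list_trees and order_conditions enumerate the order-five trees in the same order (worth double-checking against the NodePy source):
# Pair each order-five rooted tree with the amount by which RK4
# violates the corresponding order condition.
residuals = rk4.order_conditions(5)
for tree, res in zip(forest, residuals):
    print tree, res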
And here are those trees.
T=rt.plot_all_trees(5)
Having all these methods and the ability to query their properties makes it trivial to ask some interesting questions.
Let's take a look at extrapolation methods. We'll look at methods obtained by extrapolating explicit Euler and the explicit midpoint rule.
Note: this part of the talk comes from a paper that is joint work with student Umair bin Waheed (KAUST).
from IPython.display import HTML
HTML('<iframe src=http://arxiv.org/abs/1305.6165?format=mobile width=800 height=350></iframe>')
We usually think of extrapolation as an iterative process, but if we know beforehand how many iterations we'll use, then it's just a kind of Runge-Kutta method.
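For example, extrapolating a single Euler step against two half-steps and forming \(2T_2 - T_1\) cancels the leading error term, and the resulting two-stage scheme is just the explicit midpoint rule. A quick hand-constructed check (the printed layout may differ slightly from NodePy's):
# Order-2 Euler extrapolation should coincide with the explicit midpoint rule.
ex2 = rk.extrap(2)
midpoint = rk.RungeKuttaMethod(np.array([[0,0],[one/2,0]]), np.array([one*0,one]))
print ex2
print midpoint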
ex4E = rk.extrap(4)
ex4M = rk.extrap(2,'midpoint')
print ex4E
print ex4M
Ex-Euler 4

 0   |
 1/2 | 1/2
 1/3 | 1/3
 2/3 | 1/3         1/3
 1/4 | 1/4
 1/2 | 1/4                     1/4
 3/4 | 1/4                     1/4   1/4
_____|__________________________________________
     | 0     2     -9/2  -9/2  8/3   8/3   8/3

Ex-Midpoint 2

 0   |
 1/2 | 1/2
 1/4 | 1/4
 1/2 |             1/2
 3/4 | 1/4               1/2
_____|______________________________
     | 0     -1/3  2/3   0     2/3
S=ex4E.plot_stability_region()
S=ex4M.plot_stability_region()
Extrapolation methods have a lot of potential for parallelism (see this review paper):
rk_graph.plot_dependency_graph(ex4E)
print len(ex4E)
print ex4E.num_seq_dep_stages()
7
4
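So the ideal speedup from evaluating independent stages concurrently is the ratio of these two numbers; this is a rough measure that assumes derivative evaluations dominate the cost:
# Ideal parallel speedup: total stages / longest chain of dependent stages.
print float(len(ex4E)) / ex4E.num_seq_dep_stages()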
from IPython.display import Image
Image('./parallel_table.png')
NodePy also understands embedded Runge-Kutta pairs. Let's look at some pairs based on extrapolation.
ex54 = rk.extrap_pair(5)
print ex54
Euler extrapolation pair of order 5(4)

 0   |
 1/2 | 1/2
 1/3 | 1/3
 2/3 | 1/3           1/3
 1/4 | 1/4
 1/2 | 1/4                         1/4
 3/4 | 1/4                         1/4    1/4
 1/5 | 1/5
 2/5 | 1/5                                              1/5
 3/5 | 1/5                                              1/5    1/5
 4/5 | 1/5                                              1/5    1/5    1/5
_____|_____________________________________________________________________________
     | 0      -4/3   27/4   27/4   -32/3  -32/3  -32/3  125/24 125/24 125/24 125/24
     | 0      -2/3   9/2    9/2    -8     -8     -8     25/6   25/6   25/6   25/6
Let's compare this pair with Fehlberg's pair.
fehlberg = rk.loadRKM('Fehlberg45')
print fehlberg.order()
print fehlberg.embedded_method.order()
5
4
problem = ivp.load_ivp('vdp')
methods = [fehlberg,ex54]
tol = [10.**(-m) for m in range(2,8)]
[work,err] = conv.ptest(methods,problem,tol,verbosity=0)
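We can plot the work-precision data returned by ptest; a sketch, assuming work and err each hold one list per method, in the same order as methods:
# Work-precision comparison: derivative evaluations vs. achieved error.
labels = ['Fehlberg 5(4)', 'Euler extrapolation 5(4)']
for i in range(len(methods)):
    plt.loglog(work[i], err[i], '-o', label=labels[i])
plt.xlabel('Derivative evaluations')
plt.ylabel('Error')
plt.legend(loc='best')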
The relatively poor performance of the extrapolation method may seem puzzling if we just compare the size of the leading error term coefficients:
print ex54.principal_error_norm()
print fehlberg.principal_error_norm()
0.00184975286137
0.00335574469285
Instead, the difference in performance is related to the number of stages used by each method:
print len(ex54)
print len(fehlberg)
11
6
Image('./parallel_table.png')
Notice that the number of stages grows quadratically with the order! Most interest in extrapolation methods is focused on methods of very high order, for which it is difficult to directly develop Runge-Kutta methods.
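With the harmonic step-number sequence (which the tableaus above suggest is what NodePy uses here), the Euler extrapolation method of order \(p\) has one shared initial stage plus \(k-1\) further stages for the sequence with \(k\) substeps, for a total of \(p(p-1)/2 + 1\) stages. A quick check:
# Stage count of Euler extrapolation as a function of order: s = p(p-1)/2 + 1.
for p in range(2,11):
    print p, len(rk.extrap(p))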
There is another well-known class of very high order integrators: the deferred correction methods.
dc109 = rk.DC_pair(9)
There are a few 10th-order Runge-Kutta methods out there; the one we've had the most success with is due to Curtis (1975):
from nodepy import loadmethod
curtis10 = loadmethod.load_rkpair_from_file('rk108curtis.txt')
Which of these 10th order methods is most efficient in general?
ex108 = rk.extrap_pair(5,'midpoint')
methods = [ex108,dc109,curtis10]
problem = ivp.detest('D1')
[work,err] = conv.ptest(methods,problem,tol,verbosity=0)
At 10th order, extrapolation is competitive. Why is deferred correction so inefficient?
print curtis10.principal_error_norm()
print ex108.principal_error_norm()
print dc109.principal_error_norm()
7.68312478367e-07
1.44835573043e-06
6.42205612914e-08
It's definitely not the error coefficients -- DC has the smallest! What about the number of stages?
print len(curtis10)
print len(ex108)
print len(dc109)
21
26
82
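On the other hand, many of extrapolation's stages are independent of each other. Counting only sequentially dependent stages, which is what matters if stage evaluations can be done in parallel, gives a different picture. A sketch, assuming num_seq_dep_stages is available on these pair objects just as it was for ex4E above:
# Length of the longest chain of sequentially dependent stages for each method.
for method in (curtis10, ex108, dc109):
    print method.name, method.num_seq_dep_stages()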
Image('./fulltable.png')
Given that extrapolation is competitive in serial, it seems that a parallel implementation should give it a significant advantage over traditional RK methods.
One immediate question is, can the theoretical parallel speedup be achieved in practice?
We examined this by adding a single OpenMP directive to Ernst Hairer's well-known odex code. We tested the speedup using a gravitational \(N\)-body problem with 400 bodies.
Here's what you get with the most naive approach (dynamic scheduling):
Image(filename='odex_speedup.png')
By doing slightly more work (another 10 lines of code), you can get (nearly) the theoretical maximum speedup with the expected number of processors.
Image('odex_speedup_load_balanced.png')
Something curious happens when one uses very high order Euler extrapolation methods at tight tolerances.
\[\begin{align} x''(t) & = -x/r^3 & x(0) & = 0.7 & x'(0) & = 0,\\ y''(t) & = -y/r^3 & y(0) & = 0 & y'(0) & = \sqrt{13/7}, \\ r^2 & = x^2 + y^2. \end{align}\]
f45 = rk.loadRKM('Fehlberg45')
ex = rk.extrap_pair(12)
problem = ivp.detest('D2')
tolerances = [1.e-3,1.e-4,1.e-5,1.e-6,1.e-7,1.e-8,1.e-9,1.e-10,1.e-11,1.e-12,1.e-13,1.e-14]
errors = [[],[]]
work = [[],[]]
tols = [[],[]]
for imeth, method in enumerate((f45,ex)):
    print method.name
    for tol in tolerances:
        if (imeth==1 and tol<1.e-10): continue
        t,y = method(problem,errtol=tol,dt=0.01)
        errors[imeth].append(np.abs(y[-1][0]+0.177702735714040))
        work[imeth].append(len(t))
        tols[imeth].append(tol)
Fehlberg RK5(4)6
Euler extrapolation pair of order 12(11)
Maximum number of steps reached; giving up.
plotstyles = ['-ok','--sb','-.dr']
for i in range(2):
    plt.loglog(tols[i],errors[i],plotstyles[i],linewidth=3,markersize=10)
    plt.hold(True)
plt.legend(['Fehlberg','Extrapolation'],loc='best')
plt.xlabel('Input tolerance')
plt.ylabel('Measured error')
plt.xticks(plt.xticks()[0][1::3])
plt.yticks(plt.yticks()[0][1::3])
plt.hold(False)
for i in range(2):
    plt.loglog(errors[i],work[i],plotstyles[i],linewidth=3,markersize=10)
    plt.hold(True)
plt.legend(['Fehlberg','Extrapolation'],loc='best')
plt.xlabel('Measured error')
plt.ylabel('Time steps taken')
plt.xticks(plt.xticks()[0][1::3])
plt.yticks(plt.yticks()[0][1::3])
plt.hold(False)
What's going on here?
An \(s\)-stage RK method can be written
\[\begin{align} y_i &= u_n + \Delta t \sum_{j=1}^s a_{ij} f(t_n+c_j \Delta t, y_j) & (1\leq i \leq s), \\ u_{n+1} &= u_n + \Delta t \sum_{j=1}^s b_j f(t_n+c_j \Delta t, y_j). \end{align}\]
But in reality there are (roundoff) errors in the stages: \[\begin{align} \tilde{y}_i &= u_n + \Delta t \sum_{j=1}^s a_{ij} f(t_n+c_j \Delta t, \tilde{y}_j) + \tilde{r}_i & (1\leq i \leq s), \\ \tilde{u}_{n+1} &= u_n + \Delta t \sum_{j=1}^s b_j f(t_n+c_j \Delta t, \tilde{y}_j) + \tilde{r}_{s+1}. \end{align}\]
Of course, nobody implements extrapolation in Butcher form; they use a certain Shu-Osher form. With roundoff errors, it reads:
\[\begin{align} \tilde{y}_i &= v_i u_n + \sum_{j=1}^s (\alpha_{ij} \tilde{y}_j + \Delta t \beta_{ij}f(t_n+c_j \Delta t, \tilde{y}_j) ) + \tilde{r}_i & (1 \leq i \leq s+1), \\ \tilde{u}_{n+1} &= \tilde{y}_{s+1} \end{align}\]
This implementation choice makes a big difference in the propagation of roundoff errors.
Internal stability can be understood by considering the linear, scalar (\(m=1\)) initial value problem
\[\begin{align} \begin{cases} u'(t) = \lambda u(t), & \lambda \in {\mathbb C} \\ u(t_0) = u_0. \end{cases} \end{align}\]
Then we find that
\[\begin{equation} \begin{split} \epsilon_{n+1} & = P(z) \epsilon_n + Q(z;\alpha,\beta) \tilde{r}_{1:s} + \tilde{r}_{s+1} \end{split} \end{equation}\]
where \(P\) is the method's stability function and \(Q\) is a vector of internal stability polynomials.
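For the Butcher form it is easy to see where \(Q\) comes from. Substituting \(f(y)=\lambda y\) and \(z=\lambda \Delta t\) into the perturbed equations above gives (a sketch; NodePy's normalization of the \(Q_j\) may differ)
\[\begin{align} \tilde{y} &= (I - zA)^{-1}\left(u_n e + \tilde{r}_{1:s}\right), \\ \tilde{u}_{n+1} &= P(z)\, u_n + z b^T (I - zA)^{-1} \tilde{r}_{1:s} + \tilde{r}_{s+1}, \end{align}\]
where \(e\) is the vector of ones and \(P(z) = 1 + z b^T (I - zA)^{-1} e\). So in Butcher form \(Q_j(z) = z b^T (I - zA)^{-1} e_j\): each component measures how strongly a perturbation of stage \(j\) is felt in the step update. The Shu-Osher form leads to different internal stability polynomials, which is why the implementation choice matters.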
Definition: The maximum internal amplification factor of an RK method is
\[\begin{align} {\mathcal M}(\alpha,\beta):= \max_{j} \sup_{z \in {\mathcal S}} |Q_{j}(z)|. \end{align}\]
We can consider the whole stability region \({\mathcal S}\), but when the step size is very small, the value at the origin is most relevant:
\[\begin{align} {\mathcal M_0} = \max_j |Q_j(0)|. \end{align}\]
print fehlberg.maximum_internal_amplification()
(5.3895917900074943, 0.0)
print ex.maximum_internal_amplification()
(332603.42306967499, 137786.59611992945)
print ex.maximum_internal_amplification(use_butcher=True)
(169897.66258197918, 0.0)
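If the second returned value is the amplification factor at the origin, \({\mathcal M}_0\) (an assumption about the return convention; check the docstring), then a back-of-envelope estimate of the roundoff contributed per step is \({\mathcal M}_0\) times machine epsilon:
# Rough per-step roundoff estimate: M0 * machine epsilon
# (assumes the second return value is the amplification factor at z=0).
M0 = ex.maximum_internal_amplification()[1]
print M0 * np.finfo(float).eps
This suggests an error floor many orders of magnitude above machine precision for the order-12 Euler extrapolation method.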
Image(filename='mtable.png')
Image(filename='internal_errors.png')
The last portion of the talk is taken from joint work with Lajos Loczi (KAUST) and Matteo Parsani (NASA):
HTML('<iframe src=http://arxiv.org/abs/1309.1317?format=mobile width=800 height=350></iframe>')