By this I don't mean software implementations of numerical algorithms, but rather software for conducting numerical analysis.
from nodepy import *
import sympy
one = sympy.Rational(1)
import numpy as np
from sympy import init_printing
%matplotlib inline
import matplotlib.pyplot as plt
NodePy knows about many common methods:
rk4 = rk.loadRKM('RK44')
print rk4
Classical RK4
The original four-stage, fourth-order method of Kutta

 0   |
 1/2 | 1/2
 1/2 |      1/2
 1   |           1
_____|_____________________
     | 1/6  1/3  1/3  1/6
ab6=lm.Adams_Bashforth(6)
print ab6.alpha, ab6.beta
[0 0 0 0 0 -1 1] [-95/288 959/480 -3649/720 4991/720 -2641/480 4277/1440 0]
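It is also handy to see everything that is built in. Here is a minimal sketch, assuming that calling loadRKM with no argument returns the whole dictionary of methods (check NodePy's documentation if your version differs):
# List all of NodePy's built-in Runge-Kutta methods by name
# (assumes loadRKM() with no argument returns the full dictionary).
all_methods = rk.loadRKM()
print sorted(all_methods.keys())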
You can also make up your own methods -- just give the coefficients:
A = np.array([[0, 0,],[one/2, one/2]])
b = np.array([one/2,one/2])
itrap = rk.RungeKuttaMethod(A,b)
print itrap
Runge-Kutta Method

 0 |
 1 | 1/2  1/2
___|___________
   | 1/2  1/2
What do we know about these methods? We may be interested, for instance, in their stability regions:
S=rk4.plot_stability_region()
Here is something a little more difficult:
ab6.plot_boundary_locus()
ab6.plot_stability_region()
Notice that NodePy automatically finds the part of the stability region that is connected to the origin and sets the plot bounds appropriately.
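The boundary locus itself is easy to recompute by hand: it is the image of the unit circle under \(\rho(\zeta)/\sigma(\zeta)\). Here is a minimal sketch, assuming (as the output above suggests) that ab6.alpha and ab6.beta hold the coefficients of \(\rho\) and \(\sigma\) from lowest to highest power:
# Recompute the boundary locus of AB6 directly from its characteristic
# polynomials: z(theta) = rho(exp(i*theta)) / sigma(exp(i*theta)).
theta = np.linspace(0., 2*np.pi, 1000)
zeta = np.exp(1j*theta)
alpha = np.array(ab6.alpha, dtype=float)   # coefficients of rho
beta = np.array(ab6.beta, dtype=float)     # coefficients of sigma
rho = sum(a*zeta**j for j, a in enumerate(alpha))
sigma = sum(b*zeta**j for j, b in enumerate(beta))
z = rho/sigma
plt.plot(z.real, z.imag)
plt.axis('equal')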
Next let's check the order of each method.
print rk4.order()
print itrap.order()
print ab6.order()
4
2
6
This we knew already. But how badly does RK4 violate the 5th order conditions?
print rk4.order_conditions(5)
print rk4.principal_error_norm()
[1/576 -1/288 1/96 0 1/288 -1/96 0 1/96 1/120]
0.0145045823432
The first line gives the amounts by which RK4 violates the order condition corresponding to each of the order-five rooted trees. The second gives the \(L_2\) norm of those coefficients.
forest = rt.list_trees(5)
print forest
['{{T^3}}', '{{{T^2}}}', '{{{{T}}}}', '{{T{T}}}', '{T{T^2}}', '{T{{T}}}', '{T^2{T}}', '{{T}{T}}', '{T^4}']
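The two lists can be lined up, assuming list_trees and order_conditions enumerate the order-five trees in the same order (worth double-checking against the NodePy source):
# Pair each order-five rooted tree with the amount by which RK4
# violates the corresponding order condition.
residuals = rk4.order_conditions(5)
for tree, res in zip(forest, residuals):
    print tree, res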
And here are those trees.
T=rt.plot_all_trees(5)
Having all these methods and the ability to query their properties makes it trivial to ask some interesting questions.
Let's take a look at extrapolation methods. We'll look at methods obtained by extrapolating explicit Euler and the explicit midpoint rule.
Note: this part of the talk comes from a paper that is joint work with student Umair bin Waheed (KAUST).
from IPython.display import HTML
HTML('<iframe src=http://arxiv.org/abs/1305.6165?format=mobile width=800 height=350></iframe>')
We usually think of extrapolation as an iterative process, but if we know beforehand how many iterations we'll use, then it's just a kind of Runge-Kutta method.
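For example, extrapolating a single Euler step against two half-steps and forming \(2T_2 - T_1\) cancels the leading error term, and the resulting two-stage scheme is just the explicit midpoint rule. A quick hand-constructed check (the printed layout may differ slightly from NodePy's):
# Order-2 Euler extrapolation should coincide with the explicit midpoint rule.
ex2 = rk.extrap(2)
midpoint = rk.RungeKuttaMethod(np.array([[0,0],[one/2,0]]), np.array([one*0,one]))
print ex2
print midpoint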
ex4E = rk.extrap(4)
ex4M = rk.extrap(2,'midpoint')
print ex4E
print ex4M
Ex-Euler 4

 0   |
 1/2 | 1/2
 1/3 | 1/3
 2/3 | 1/3         1/3
 1/4 | 1/4
 1/2 | 1/4                     1/4
 3/4 | 1/4                     1/4   1/4
_____|__________________________________________
     | 0     2     -9/2  -9/2  8/3   8/3   8/3

Ex-Midpoint 2

 0   |
 1/2 | 1/2
 1/4 | 1/4
 1/2 |             1/2
 3/4 | 1/4               1/2
_____|______________________________
     | 0     -1/3  2/3   0     2/3
S=ex4E.plot_stability_region()
S=ex4M.plot_stability_region()
Extrapolation methods have a lot of potential for parallelism (see this review paper):
rk_graph.plot_dependency_graph(ex4E)
print len(ex4E)
print ex4E.num_seq_dep_stages()
7
4
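So the ideal speedup from evaluating independent stages concurrently is the ratio of these two numbers; this is a rough measure that assumes derivative evaluations dominate the cost:
# Ideal parallel speedup: total stages / longest chain of dependent stages.
print float(len(ex4E)) / ex4E.num_seq_dep_stages()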
from IPython.display import Image
Image('./parallel_table.png')
NodePy also understands embedded Runge-Kutta pairs. Let's look at some pairs based on extrapolation.
ex54 = rk.extrap_pair(5)
print ex54
Euler extrapolation pair of order 5(4)

 0   |
 1/2 | 1/2
 1/3 | 1/3
 2/3 | 1/3           1/3
 1/4 | 1/4
 1/2 | 1/4                         1/4
 3/4 | 1/4                         1/4    1/4
 1/5 | 1/5
 2/5 | 1/5                                              1/5
 3/5 | 1/5                                              1/5    1/5
 4/5 | 1/5                                              1/5    1/5    1/5
_____|_____________________________________________________________________________
     | 0      -4/3   27/4   27/4   -32/3  -32/3  -32/3  125/24 125/24 125/24 125/24
     | 0      -2/3   9/2    9/2    -8     -8     -8     25/6   25/6   25/6   25/6
Let's compare this pair with Fehlberg's pair.
fehlberg = rk.loadRKM('Fehlberg45')
print fehlberg.order()
print fehlberg.embedded_method.order()
5
4
problem = ivp.load_ivp('vdp')
methods = [fehlberg,ex54]
tol = [10.**(-m) for m in range(2,8)]
[work,err] = conv.ptest(methods,problem,tol,verbosity=0)
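We can plot the work-precision data returned by ptest; a sketch, assuming work and err each hold one list per method, in the same order as methods:
# Work-precision comparison: derivative evaluations vs. achieved error.
labels = ['Fehlberg 5(4)', 'Euler extrapolation 5(4)']
for i in range(len(methods)):
    plt.loglog(work[i], err[i], '-o', label=labels[i])
plt.xlabel('Derivative evaluations')
plt.ylabel('Error')
plt.legend(loc='best')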
The relatively poor performance of the extrapolation method may seem puzzling if we just compare the size of the leading error term coefficients:
print ex54.principal_error_norm()
print fehlberg.principal_error_norm()
0.00184975286137
0.00335574469285
Instead, the difference in performance is related to the number of stages used by each method:
print len(ex54)
print len(fehlberg)
11
6
Image('./parallel_table.png')
Notice that the number of stages grows quadratically with the order! Most interest in extrapolation methods is focused on methods of very high order, for which it is difficult to directly develop Runge-Kutta methods.
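With the harmonic step-number sequence (which the tableaus above suggest is what NodePy uses here), the Euler extrapolation method of order \(p\) has one shared initial stage plus \(k-1\) further stages for the sequence with \(k\) substeps, for a total of \(p(p-1)/2 + 1\) stages. A quick check:
# Stage count of Euler extrapolation as a function of order: s = p(p-1)/2 + 1.
for p in range(2,11):
    print p, len(rk.extrap(p))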
There is another well-known class of very high order integrators: the deferred correction methods.
dc109 = rk.DC_pair(9)
There are a few 10th-order Runge-Kutta methods out there; the one we've had the most success with is due to Curtis (1975):
from nodepy import loadmethod
curtis10 = loadmethod.load_rkpair_from_file('rk108curtis.txt')
Which of these 10th order methods is most efficient in general?
ex108 = rk.extrap_pair(5,'midpoint')
methods = [ex108,dc109,curtis10]
problem = ivp.detest('D1')
[work,err] = conv.ptest(methods,problem,tol,verbosity=0)
At 10th order, extrapolation is competitive. Why is deferred correction so inefficient?
print curtis10.principal_error_norm()
print ex108.principal_error_norm()
print dc109.principal_error_norm()
7.68312478367e-07
1.44835573043e-06
6.42205612914e-08
It's definitely not the error coefficients -- DC has the smallest! What about the number of stages?
print len(curtis10)
print len(ex108)
print len(dc109)
21
26
82
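On the other hand, many of extrapolation's stages are independent of each other. Counting only sequentially dependent stages, which is what matters if stage evaluations can be done in parallel, gives a different picture. A sketch, assuming num_seq_dep_stages is available on these pair objects just as it was for ex4E above:
# Length of the longest chain of sequentially dependent stages for each method.
for method in (curtis10, ex108, dc109):
    print method.name, method.num_seq_dep_stages()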
Image('./fulltable.png')
Given that extrapolation is competitive in serial, it seems that a parallel implementation should give it a significant advantage over traditional RK methods.
One immediate question is, can the theoretical parallel speedup be achieved in practice?
We examined this by adding a single OpenMP directive to Ernst Hairer's well-known odex code. We tested the speedup using a gravitational \(N\)-body problem with 400 bodies.
Here's what you get with the most naive approach (dynamic scheduling):
Image(filename='odex_speedup.png')
By doing slightly more work (another 10 lines of code), you can get (nearly) the theoretical maximum speedup with the expected number of processors.
Image('odex_speedup_load_balanced.png')
Something curious happens when one uses very high order Euler extrapolation methods at tight tolerances.
\[\begin{align} x''(t) & = -x/r^3 & x(0) & = 0.7 & x'(0) & = 0,\\ y''(t) & = -y/r^3 & y(0) & = 0 & y'(0) & = \sqrt{13/7}, \\ r^2 & = x^2 + y^2. \end{align}\]
f45 = rk.loadRKM('Fehlberg45')
ex = rk.extrap_pair(12)
problem = ivp.detest('D2')
tolerances = [1.e-3,1.e-4,1.e-5,1.e-6,1.e-7,1.e-8,1.e-9,1.e-10,1.e-11,1.e-12,1.e-13,1.e-14]
errors = [[],[]]
work = [[],[]]
tols = [[],[]]
for imeth, method in enumerate((f45,ex)):
    print method.name
    for tol in tolerances:
        if (imeth==1 and tol<1.e-10): continue
        t,y = method(problem,errtol=tol,dt=0.01)
        errors[imeth].append(np.abs(y[-1][0]+0.177702735714040))
        work[imeth].append(len(t))
        tols[imeth].append(tol)
Fehlberg RK5(4)6
Euler extrapolation pair of order 12(11)
Maximum number of steps reached; giving up.
plotstyles = ['-ok','--sb','-.dr']
for i in range(2):
    plt.loglog(tols[i],errors[i],plotstyles[i],linewidth=3,markersize=10)
    plt.hold(True)
plt.legend(['Fehlberg','Extrapolation'],loc='best')
plt.xlabel('Input tolerance')
plt.ylabel('Measured error')
plt.xticks(plt.xticks()[0][1::3])
plt.yticks(plt.yticks()[0][1::3])
plt.hold(False)
for i in range(2):
    plt.loglog(errors[i],work[i],plotstyles[i],linewidth=3,markersize=10)
    plt.hold(True)
plt.legend(['Fehlberg','Extrapolation'],loc='best')
plt.xlabel('Measured error')
plt.ylabel('Time steps taken')
plt.xticks(plt.xticks()[0][1::3])
plt.yticks(plt.yticks()[0][1::3])
plt.hold(False)
What's going on here?
An \(s\)-stage RK method can be written
\[\begin{align} y_i &= u_n + \Delta t \sum_{j=1}^s a_{ij} f(t_n+c_j \Delta t, y_j) & (1\leq i \leq s), \\ u_{n+1} &= u_n + \Delta t \sum_{j=1}^s b_j f(t_n+c_j \Delta t, y_j). \end{align}\]
But in reality there are (roundoff) errors in the stages: \[\begin{align} \tilde{y}_i &= u_n + \Delta t \sum_{j=1}^s a_{ij} f(t_n+c_j \Delta t, \tilde{y}_j) + \tilde{r}_i & (1\leq i \leq s), \\ \tilde{u}_{n+1} &= u_n + \Delta t \sum_{j=1}^s b_j f(t_n+c_j \Delta t, \tilde{y}_j) + \tilde{r}_{s+1}. \end{align}\]
Of course, nobody implements extrapolation in Butcher form; they use a certain Shu-Osher form. With roundoff errors, it reads:
\[\begin{align} \tilde{y}_i &= v_i u_n + \sum_{j=1}^s (\alpha_{ij} \tilde{y}_j + \Delta t \beta_{ij}f(t_n+c_j \Delta t, \tilde{y}_j) ) + \tilde{r}_i & (1 \leq i \leq s+1), \\ \tilde{u}_{n+1} &= \tilde{y}_{s+1} \end{align}\]
This implementation choice makes a big difference in the propagation of roundoff errors.
Internal stability can be understood by considering the linear, scalar (\(m=1\)) initial value problem
\[\begin{align} \begin{cases} u'(t) = \lambda u(t), & \lambda \in {\mathbb C} \\ u(t_0) = u_0. \end{cases} \end{align}\]
Then we find that
\[\begin{equation} \begin{split} \epsilon_{n+1} & = P(z) \epsilon_n + Q(z;\alpha,\beta) \tilde{r}_{1:s} + \tilde{r}_{s+1} \end{split} \end{equation}\]
where \(P\) is the method's stability function and \(Q\) is a vector of internal stability polynomials.
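For the Butcher form it is easy to see where \(Q\) comes from. Substituting \(f(y)=\lambda y\) and \(z=\lambda \Delta t\) into the perturbed equations above gives (a sketch; NodePy's normalization of the \(Q_j\) may differ)
\[\begin{align} \tilde{y} &= (I - zA)^{-1}\left(u_n e + \tilde{r}_{1:s}\right), \\ \tilde{u}_{n+1} &= P(z)\, u_n + z b^T (I - zA)^{-1} \tilde{r}_{1:s} + \tilde{r}_{s+1}, \end{align}\]
where \(e\) is the vector of ones and \(P(z) = 1 + z b^T (I - zA)^{-1} e\). So in Butcher form \(Q_j(z) = z b^T (I - zA)^{-1} e_j\): each component measures how strongly a perturbation of stage \(j\) is felt in the step update. The Shu-Osher form leads to different internal stability polynomials, which is why the implementation choice matters.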
Definition: The maximum internal amplification factor of an RK method is
\[\begin{align} {\mathcal M}(\alpha,\beta):= \max_{j} \sup_{z \in {\mathcal S}} |Q_{j}(z)|. \end{align}\]
We can consider the whole stability region \({\mathcal S}\), but when the step size is very small, the value at the origin is most relevant:
\[\begin{align} {\mathcal M_0} = \max_j |Q_j(0)|. \end{align}\]
print fehlberg.maximum_internal_amplification()
(5.3895917900074943, 0.0)
print ex.maximum_internal_amplification()
(332603.42306967499, 137786.59611992945)
print ex.maximum_internal_amplification(use_butcher=True)
(169897.66258197918, 0.0)
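If the second returned value is the amplification factor at the origin, \({\mathcal M}_0\) (an assumption about the return convention; check the docstring), then a back-of-envelope estimate of the roundoff contributed per step is \({\mathcal M}_0\) times machine epsilon:
# Rough per-step roundoff estimate: M0 * machine epsilon
# (assumes the second return value is the amplification factor at z=0).
M0 = ex.maximum_internal_amplification()[1]
print M0 * np.finfo(float).eps
This suggests an error floor many orders of magnitude above machine precision for the order-12 Euler extrapolation method.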
Image(filename='mtable.png')
Image(filename='internal_errors.png')
The last portion of the talk is taken from joint work with Lajos Loczi (KAUST) and Matteo Parsani (NASA):
HTML('<iframe src=http://arxiv.org/abs/1309.1317?format=mobile width=800 height=350></iframe>')