These notes are based on Chapter 11 of Jones, O., Maillardet, R., & Robinson, A. (2009). Introduction to scientific programming and simulation using R. CRC Press.

Introduction

A definite integral \(\int_a^b f(x)dx\) is easy to compute if we can find an antiderivative \(F\) such that \(F'(x) = f(x)\) because \(\int_a^b f(x)dx = F(b)-F(a)\).

Unfortunately, for many functions it is impossible to write down an antiderivative in “closed form”, so we cannot compute their definite integrals without numerical methods.

Numerical integration can provide good approximations of definite integrals. For example, the cumulative distribution function (CDF) of the standard normal distribution \[ \Phi(z) = \int_{-\infty}^z\frac{1}{\sqrt{2\pi}}e^{-x^2/2}dx \] is commonly computed via numerical integration. In R, for example, pnorm(-1) gives \(\Phi(-1)\approx0.159\).
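As a quick check (not part of the original notes), the same value can be obtained with base R's integrate() applied to the standard normal density dnorm:

# Phi(-1) via numerical integration of the standard normal density
integrate(dnorm, lower = -Inf, upper = -1)
# compare with R's built-in CDF
pnorm(-1)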

We cover three numerical integration techniques:

  • Trapezoidal rule,
  • Simpson’s rule,
  • Adaptive quadrature.

An example problem is to integrate \(f(x) = x\sin(x)+5\) over \([0,3\pi]\), which can be done analytically using integration by parts. \[\begin{align} \int_0^{3\pi} (x\sin(x)+5) dx &= \int_0^{3\pi} x\sin(x) dx + \int_0^{3\pi} 5 dx\\ &= \left( -x\cos(x)\Bigr\rvert_0^{3\pi} + \int_0^{3\pi}\cos(x)dx \right) + 15\pi\\ &= (3\pi + 0) + 15\pi\\ &= 18\pi\\ &\approx 56.5486677646 \end{align}\]
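As a sanity check (not in the original notes), base R's integrate() reproduces the analytic answer:

# numerical check of the analytic result 18*pi
integrate(function(x) x*sin(x) + 5, lower = 0, upper = 3*pi)
18*pi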

Trapezoidal rule

Given a partition of \([a,b]\) into subintervals, the trapezoidal rule approximates the area under \(y=f(x)\) by a sum of trapezoid areas, which is easy to calculate.

The following is an example with \(n=6\) intervals.

curve(x*sin(x)+5, xaxt = "n", from=-.5, to=9.9, ylab="y")
abline(h=0)
title(expression(y==x*sin(x)+5))
# How to generate this using a loop?
x <- seq(0, 3*pi, .5*pi)
xlabs <- c(expression(x[0]==0),expression(x[1]),expression(x[2]),
           expression(x[3]),expression(x[4]),expression(x[5]),
           expression(x[6]==3*pi))
axis(1, at=x, labels=xlabs)
# vertical dotted lines from the x-axis up to the curve at each x_i
from_x <- c(0,.5*pi,pi,1.5*pi,2*pi,2.5*pi,3*pi)
from_y <- rep(0,7)
to_x <- from_x
to_y <- sapply(from_x, function(x) return(x*sin(x)+5))
segments(from_x,from_y,to_x,to_y,lty=3)

# dotted chords joining consecutive points (x_i, f(x_i)): the tops of the trapezoids
from_x <- from_x[-length(from_x)]
from_y <- to_y[-length(to_y)]
to_x <- to_x[-1]
to_y <- to_y[-1]
segments(from_x,from_y,to_x,to_y,lty=3)

Let the length of each interval be denoted by \(h = (b-a)/n\). Then, the sum of all trapezoid areas (\(T\)) is given by: \[\begin{align} T &= \frac{h}{2}(f(x_0)+f(x_1))+\frac{h}{2}(f(x_1)+f(x_2))+\dots\\ &\qquad +\frac{h}{2}(f(x_{n-1})+f(x_n))\\ &= \frac{h}{2}(f(x_0)+2f(x_1)+2f(x_2)+\dots+2f(x_{n-1})+f(x_n)) \end{align}\]

trapezoid <- function(f, a, b, n = 100) {
  h <- (b-a)/n
  x <- seq(a, b, by = h)
  y <- sapply(x, f)
  T <- h*(y[1]/2 + sum(y[2:n]) + y[n+1]/2)
  return(T)
}
f <- function(x) return(x*sin(x)+5)
trapezoid(f, 0, 3*pi, 100)
## [1] 56.54169

Simpson’s rule

As indicated by the use of trapezoids, the trapezoidal rule approximates \(f\) by a piecewise linear function. Simpson’s rule instead approximates \(f\) by a piecewise quadratic function.

Since a quadratic function \(p(x) = \alpha x^2 + \beta x + \gamma\) has three parameters, three non-collinear points \((u,f(u))\), \((v,f(v))\), and \((w,f(w))\) pin down the function through a system of three linear equations. (Without loss of generality, assume \(u<v<w\).)

Collinearity means \((v,f(v)) = t(u,f(u))+(1-t)(w,f(w))\) for some \(t\in(0,1)\). Can you see its implication?

Solving the system of equations, we get \[ p(x) = f(u)\frac{(x-v)(x-w)}{(u-v)(u-w)} + f(v)\frac{(x-u)(x-w)}{(v-u)(v-w)} + f(w)\frac{(x-u)(x-v)}{(w-u)(w-v)} .\]

Plug each of the three points into \(p\) to see that this makes sense.

If \(u=v-h\) and \(w=v+h\), then integrating \(p\) over \([u,w]\) gives \[ \int_u^w p(x)dx = \frac{h}{3}(f(u)+4f(v)+f(w)) .\]
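The following sketch is not part of the original notes; it builds the interpolant \(p\) in R for a test function and checks numerically that \(\int_u^w p(x)dx\) matches \(\frac{h}{3}(f(u)+4f(v)+f(w))\). The choices of \(f\), \(u\), and \(h\) are only for illustration.

# quadratic interpolant p through (u,f(u)), (v,f(v)), (w,f(w)) in Lagrange form
f <- function(x) x*sin(x) + 5   # any test function will do
u <- 1; h <- 0.5; v <- u + h; w <- v + h
p <- function(x) {
  f(u)*(x - v)*(x - w)/((u - v)*(u - w)) +
    f(v)*(x - u)*(x - w)/((v - u)*(v - w)) +
    f(w)*(x - u)*(x - v)/((w - u)*(w - v))
}
integrate(p, u, w)$value        # integral of the interpolant
h/3*(f(u) + 4*f(v) + f(w))      # Simpson's formula over [u,w]

The two numbers agree because Simpson’s rule integrates quadratics exactly.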

Now consider \(n\) subintervals, where \(n\) is even. Then we can apply the above integration \(n/2\) times, once to each consecutive pair of subintervals, and sum the results to get an approximate integral (\(S\)).

\[\begin{align} S &= \frac{h}{3}(f(x_0)+4f(x_1)+f(x_2)) + \frac{h}{3}(f(x_2)+4f(x_3)+f(x_4)) + \dots \\ &\qquad + \frac{h}{3}(f(x_{n-2})+4f(x_{n-1})+f(x_{n}))\\ &= \frac{h}{3}(f(x_0)+4f(x_1)+2f(x_2)+4f(x_3)+\dots+4f(x_{n-1})+f(x_{n})) \end{align}\]

As above, the function value at the midpoint of each pair of subintervals gets a coefficient of \(4\). In contrast, the values at the even-indexed points (except \(x_0\) and \(x_n\)) get \(2\), since each appears in two consecutive applications of the rule.

Let’s apply Simpson’s rule to the same problem.

simpson_n <- function(f, a, b, n = 100) {
  n <- max(c(2*(n %/% 2), 4)) # Make sure n is even and >= 4
  h <- (b-a)/n
  x1 <- seq(a+h, b-h, by = 2*h) # from x_1 to x_{n-1}
  x2 <- seq(a+2*h, b-2*h, by = 2*h) # from x_2 to x_{n-2}
  f1 <- sapply(x1, f)
  f2 <- sapply(x2, f)
  S <- h/3*(f(a) + f(b) + 4*sum(f1) + 2*sum(f2))
  return(S)
}
f <- function(x) return(x*sin(x)+5)
simpson_n(f, 0, 3*pi, 100)
## [1] 56.54867

For \(n=100\), Simpson’s rule is noticeably more accurate than the trapezoidal rule: it matches the exact value \(18\pi\approx56.54867\) to the displayed precision, whereas the trapezoidal rule gives \(56.54169\).
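One way to quantify this, using the functions defined above (this check is not in the original notes), is to compare absolute errors against the exact value \(18\pi\):

# absolute errors at n = 100
abs(trapezoid(f, 0, 3*pi, 100) - 18*pi)
abs(simpson_n(f, 0, 3*pi, 100) - 18*pi)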

How large should \(n\) be?

When using Simpson’s rule in practice, we need some rule for choosing \(n\) that results in a reasonably accurate approximation.

Suppose that \(f\) is Riemann integrable (e.g., bounded and continuous almost everywhere). Let \(I = \int_a^b f(x)dx\). Also, let \(S(n)\) be the approximate integral via Simpson’s rule using \(n\) subintervals. Then, we have \(|S(n) - I|\to0\). Furthermore, by the triangle inequality, \[\begin{align} |S(2n)-S(n)| &= |S(2n)-I-S(n)+I|\\ &= |S(2n)-I+(-(S(n)-I))|\\ &\leq |S(2n)-I|+|-(S(n)-I)|\\ &= |S(2n)-I|+|S(n)-I|\\ &\to 0 \end{align}\] Thus, given some small \(\epsilon\), we can use \(|S(2n)-S(n)|\leq\epsilon\) as a stopping criterion as we increase \(n\). (This is a heuristic: a small gap between successive approximations does not strictly bound \(|S(2n)-I|\), but it usually indicates convergence.)
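The following sketch (not in the original notes) uses simpson_n() from above to illustrate how the gap \(|S(2n)-S(n)|\) shrinks as \(n\) doubles:

# how the gap |S(2n) - S(n)| behaves as n doubles
f <- function(x) x*sin(x) + 5
for (n in c(4, 8, 16, 32, 64)) {
  gap <- abs(simpson_n(f, 0, 3*pi, 2*n) - simpson_n(f, 0, 3*pi, n))
  cat("n =", n, " |S(2n) - S(n)| =", gap, "\n")
}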

As a rule of thumb, \(\epsilon = 10^{-8}\) works well.

Now, let’s modify the above Simpson’s rule so that, given \(\epsilon\), it searches for a large enough \(n\). Observe that we increase \(n\) by a factor of \(2\) each time so that we can reuse previous function evaluations; in practice, evaluating \(f\) is the most expensive operation in numerical integration. If we instead increased \(n\) by just \(2\) (\(n\) has to be even), then the evaluation points would all change except \(a\) and \(b\), so nearly all of the \(n+1\) interior points would require new evaluations of \(f\) just to gain two subintervals. If we double \(n\), we can reuse all our existing \(f\) values, so we only need to calculate \(f\) at the \(n\) new midpoints.

simpson <- function(f, a, b, tol=1e-8, verbose = FALSE) {
  n <- 4
  h <- (b - a)/n
  fx <- sapply(seq(a, b, by = h), f)
  S <- sum(fx*c(1, 4, 2, 4, 1))*h/3
  S.diff <- tol + 1 # ensures we loop at least once
  
  while (S.diff > tol) {
    S.old <- S
    n <- 2*n
    h <- h/2
    # reuse for the original locations
    fx[seq(1, n+1, by = 2)] <- fx
    # new values for each midpoint
    fx[seq(2, n, by = 2)] <- sapply(seq(a+h, b-h, by = 2*h), f)
    S <- h/3*(fx[1] + fx[n+1] + 4*sum(fx[seq(2, n, by = 2)]) +
                2*sum(fx[seq(3, n-1, by = 2)]))
    S.diff <- abs(S - S.old)
  }
  if (verbose) cat('The number of intervals: n =', n, '\n')
  return(S)
}
f <- function(x) return(x*sin(x)+5)
simpson(f, 0, 3*pi, verbose = TRUE)
## The number of intervals: n = 1024
## [1] 56.54867

Adaptive quadrature

Evenly dividing the integration range is natural and will probably work if we have the computational capacity for a large \(n\). If not, we can divide the range more cleverly and achieve similar accuracy with a smaller \(n\).

Adaptive quadrature is Simpson’s rule with the subinterval width \(h\) adapted automatically to the local behaviour of \(f\). The key idea is to keep halving each subinterval (producing two sub-subintervals) until a local stopping criterion is met.

  1. Halve \([a,b]\) into subintervals \([a,c]\) and \([c,b]\), where \(c=(a+b)/2\).
  2. Apply Simpson’s rule with \(n=2\) to each subinterval.
  3. Take the integral over \([a,b]\) as the sum of the integrals over the two subintervals.
  4. If the stopping criterion is met (i.e., the difference between the integrals before and after the subdivision is less than \(\epsilon\)), stop.
  5. Otherwise, go back to #1 for each of \([a,c]\) and \([c,b]\).

Note the recursion in going back to #1 from #5: each subinterval at #5 becomes the interval \([a,b]\) at #1, and the procedure starts over.

This way, subdivision stops locally for each subinterval, resulting in subintervals of varying lengths. Let’s apply adaptive quadrature to integrate \(f(x) = 1.5\sqrt{x}\) over \([0,1]\). Note that the graph is steep near the origin since \(f'(x) = \frac{3}{4\sqrt{x}} \to \infty\) as \(x\to0^+\).
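A quick plot of the integrand (a small addition, in the same style as the plot above) shows the steepness near the origin:

curve(1.5*sqrt(x), from = 0, to = 1, ylab = "y")
title(expression(y == 1.5*sqrt(x)))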

q.recursion <- function(f, a, b, c, fa, fb, fc, I.old, tol, level) {
  level.max <- 64
  if (level > level.max) {
    cat("recursion limit reached: singularity likely\n")
    return(NULL)
  } else {
    h <- (b - a)/4
    f1 <- f(a + h) # for the midpoint of left subinterval
    f2 <- f(b - h) # for the midpoint of right subinterval
    I.left <- h*(fa + 4*f1 + fc)/3 # integral for the left
    I.right <- h*(fc + 4*f2 + fb)/3 # integral for the right
    I <- I.left + I.right
    f.count <- 2 # due to f1 and f2 above

    if (abs(I - I.old) > tol) {
      q.left <- q.recursion(f, a, c, a+h, fa, fc, f1, I.left, tol/2, level+1)
      q.right <- q.recursion(f, c, b, b-h, fc, fb, f2, I.right, tol/2, level+1)
      I <- q.left[1] + q.right[1]
      f.count <-  f.count + q.left[2] + q.right[2]
    }

    return(c(I, f.count))
  }
}
quadrature <- function(f, a, b, tol=1e-8) {
  c <- (a + b)/2
  fa <- f(a)
  fb <- f(b)
  fc <- f(c)
  h <- (b - a)/2
  I <- h*(fa + 4*fc + fb)/3 # approximate integral
  q.out <- q.recursion(f, a, b, c, fa, fb, fc, I, tol, 1)
  q.out[2] <- q.out[2] + 3 # count the initial evaluations fa, fb, fc
  return(q.out)
}

f <- function(x) return(1.5 * sqrt(x))
result <- quadrature(f, 0, 1, 1e-09)

The approximate integral is 1, obtained with 1205 function evaluations. (The exact value is \(\int_0^1 1.5\sqrt{x}\,dx = 1\).)
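For comparison (a suggestion, not part of the original notes), one could also run the fixed-grid simpson() from above on the same integrand and contrast the number of subintervals it requires with the 1205 evaluations used by the adaptive routine:

# not run: fixed-grid Simpson's rule on the same problem, for comparison
f <- function(x) 1.5*sqrt(x)
simpson(f, 0, 1, tol = 1e-09, verbose = TRUE)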