These notes are based on Chapter 11 of Jones, O., Maillardet, R., & Robinson, A. (2009). Introduction to scientific programming and simulation using R. CRC Press.
A definite integral \(\int_a^b
f(x)dx\) is easy to compute if we can find an antiderivative
\(F\) such that \(F'(x) = f(x)\) because \(\int_a^b f(x)dx = F(b)-F(a)\).
Unfortunately, for many functions, it is impossible to write down an
antiderivative in a closed form. This is a key motivation for numerical
integration, which could provide good approximations of definite
integrals. For example, the cumulative distribution function (CDF) of
the standard normal distribution \[\Phi(z) =
\int_{-\infty}^z\frac{1}{\sqrt{2\pi}}e^{-x^2/2}dx\] is commonly
computed via numerical integration. R has a built-in pnorm
for \(\Phi\) and, for example, the
following is a good approximation of \(\Phi(1.96)\).
pnorm(1.96)
## [1] 0.9750021
A general strategy in both analytical and numerical mathematics is when a function is too complex to handle, we approximate it by simple functions. However, there is a key difference between two realms. In analytical integration, an integral is defined as a limit. So, it usually does not matter whether we approximate the integrand by functions or constant values. In numerical integration, on the other hand, we always have to settle for finite granularity, so the choice of simple functions affects the performance of approximation.
In these notes, we consider linear and quadratic approximations. Specifically, the following are three methods in ascending order of sophistication:
These are deterministic methods for numerical integration. In contrast, Monte Carlo integration is a class of stochastic methods.
A test problem is to integrate \(f(x) = x\sin(x)+5\) over \([0,3\pi]\), which can be analytically computed using integration by parts. \[\begin{align} \int_0^{3\pi} x\sin(x)+5 dx &= \int_0^{3\pi} x\sin(x) dx + \int_0^{3\pi} 5 dx\\ &= \left( -x\cos(x)\Bigr\rvert_0^{3\pi} + \int_0^{3\pi}\cos(x)dx \right) + 15\pi\\ &= 18\pi\\ &\approx 56.5486677646 \end{align}\]
<- function(x) x*sin(x)+5
f integrate(f, 0, 3*pi) # R built-in function
## 56.54867 with absolute error < 8e-12
Given a finite set of subintervals, the trapezoidal rule approximates the area under \(y=f(x)\) by the finite sum of trapezoid areas, which are easy to compute. For example, using \(n=6\) subintervals,
curve(x*sin(x)+5, xaxt="n", lwd=1.5, from=-.5, to=9.9, ylab="y")
abline(h=0)
title("Illustration of the trapezoidal rule")
<- seq(0, 3*pi, .5*pi)
x # How can we generate these expressions using a loop?
<- c(expression(x[0]==0), expression(x[1]), expression(x[2]),
letters expression(x[3]), expression(x[4]), expression(x[5]),
expression(x[6]==3*pi))
axis(1, at=x, labels=letters)
<- c(0, .5*pi, pi, 1.5*pi, 2*pi, 2.5*pi, 3*pi)
from_x <- rep(0,7)
from_y <- from_x
to_x <- sapply(from_x, function(x) x*sin(x)+5)
to_y segments(from_x, from_y, to_x, to_y, lty=3)
<- from_x[-length(from_x)]
from_x <- to_y[-length(to_y)]
from_y <- to_x[-1]
to_x <- to_y[-1]
to_y segments(from_x, from_y, to_x, to_y, lty=3)
and using \(n=12\) subintervals,
curve(x*sin(x)+5, xaxt="n", lwd=1.5, from=-.5, to=9.9, ylab="y")
abline(h=0)
title("Illustration of the trapezoidal rule")
<- seq(0, 3*pi, length.out=13)
x # How can we generate these expressions using a loop?
<- c(expression(x[0]==0), expression(x[1]), expression(x[2]),
letters expression(x[3]), expression(x[4]), expression(x[5]),
expression(x[6]), expression(x[7]), expression(x[8]),
expression(x[9]), expression(x[10]), expression(x[11]),
expression(x[12]==3*pi))
axis(1, at=x, labels=letters)
<- seq(0, 3*pi, length.out=13)
from_x <- rep(0,13)
from_y <- from_x
to_x <- sapply(from_x, function(x) x*sin(x)+5)
to_y segments(from_x, from_y, to_x, to_y, lty=3)
<- from_x[-length(from_x)]
from_x <- to_y[-length(to_y)]
from_y <- to_x[-1]
to_x <- to_y[-1]
to_y segments(from_x, from_y, to_x, to_y, lty=3)