These notes are based on Chapter 10 of Jones, O., Maillardet, R., & Robinson, A. (2009). Introduction to scientific programming and simulation using R. CRC Press.

Introduction

A root is a solution of \(f(x)=0\). Visually, a root is a point where the graph \(y=f(x)\) crosses the x-axis. For example, in the interval \([0,10]\), there are two roots of the following \(f\):

plot(c(1,7), c(0,0), pch = 19,
     xlim=c(0,10), ylim=c(-1050,1050),
     xlab="x", ylab="y",
     main=expression(y==x^4-37*x^3+447*x^2-1867*x+1456))
curve((x-1)*(x-7)*(x-13)*(x-16), from=0, to=10, add=TRUE)
abline(h=0)
text(1, 100, "root at x = 1", pos=4)
text(7, -100, "root at x = 7", pos=4)

Root finding is important because:

Loan repayments

Let’s suppose:

  • \(P_n\) : Remaining amount of loan after \(n\) months
  • \(P_0\) : Initial amount of loan
  • \(N\) : Duration of repayments
  • \(A\) : Monthly repayment amount
  • \(r\) : Monthly interest rate

Given \(P_{n-1}\), we know: \[P_n = (1+r)P_{n-1} - A\] That is, each month, interest based on \(P_{n-1}\) is added and then you repay \(A\). The solution of the difference equation (i.e., the expression of \(P_n\) as a function of \(n\)) is:

\[ P_n = (1+r)^nP_0 - \frac{A}{r}\left((1+r)^n-1\right) .\]

To see this, for each \(k\in\{1,\dots,n\}\), multiply both sides of \(P_k = (1+r)P_{k-1} - A\) by \((1+r)^{n-k}\) and get a telescoping sum.

We negotiate with a bank on the interest rate. A question is under what \(r\), can we repay the loan after \(N\) months?

We could get the answer by solving \(P_N=0\) for \(r\): \[\begin{alignat}{2} && P_N &= 0\\ &\Leftrightarrow \quad & (1+r)^NP &= \frac{A}{r}\left((1+r)^N-1\right)\\ &\Leftrightarrow \quad & (1+r)^N &= \frac{A}{A-rP}\\ &&&\vdots \end{alignat}\] Well, this is hard (at least for me). Instead, we may use numerical root-finding methods for \(f(r):=P_N\).

Fixed-point iteration

Fixed point

Let \(g: \mathbb{R}\to\mathbb{R}\) be a continuous function. Then, a fixed point of \(g\) is a solution of \(g(x)=x\). Visually, a fixed point is a point where the graph \(y=g(x)\) crosses the 45-degree line. For example, a hyperbolic tangent \[g(x) = 2\tanh(x) = 2\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}\] has three fixed points: \(x = -1.915, 0, 1.915\).

x <- c(-1.915,0,1.915)
y <- c(-1.915,0,1.915)
plot(x, y, pch = 19, xlim=c(-2,2), ylim=c(-2,2))
title(expression(y==2*tanh(x)))
curve(2*tanh(x), from=-3, to=3, add=TRUE)
curve(x*1, from=-3, to=3, add=TRUE)
text(-1.55, -.8, "fixed point at -1.915")
text(0, -.2, "fixed point at 0", pos=4)
text(1.55, .8, "fixed point at 1.915")
arrows(-1.8, -1, -1.9, -1.8, length=0.1)
arrows(1.8, 1, 1.9, 1.8, length=0.1)

Given \(f\), root finding is to compute a value \(a\) such that \(f(a)=0\). But, as long as we legally manipulate the equation \(f(a)=0\), the same \(a\) satisfies both the original \(f(a)=0\) and transformed equations.

This is why we may turn root finding into a fixed-point problem. Among many possible ways to manipulate \(f(a)=0\), the following is perhaps the simplest. \[f(a)=0 \;\Leftrightarrow\; f(a)+a=a\] Then, we define a new function \(g\) such that \(g(x):=f(x)+x\) and try to find a fixed point of \(g\) to find a root of \(f\).

Clearly, this is not the only way of transformation and, as seen below, the performance of fixed-point iteration depends on how we transform \(f(a)=0\) and define \(g\).

Method

The fixed-point iteration is a method for solving \(g(x)=x\) by iteratively generating points \(x_0,x_1,\dots\) that, we hope, converge to \(a\) such that \(g(a)=a\).

Given \(x_n\), the generation of points follows a simple difference equation \(x_{n+1}=g(x_n)\). Clearly, to start somewhere at \(n=0\), we need an initial guess \(x_0\).

If a sequence \(\{x_n\}\) converges to some point \(a\), then the continuity of \(g\) implies that \(a\) is indeed a fixed point of \(g\).

\[a = \lim_{n\to\infty}x_{n+1} = \lim_{n\to\infty}g(x_n) = g\left(\lim_{n\to\infty}x_n\right) = g(a)\]

An issue of fixed-point iteration is that convergence of \(\{x_n\}\) is not guaranteed but depends on \(g\) and \(x_0\).

For example, in the case of \(g(x)=2\tanh(x)\), \(\{x_n\}\) converges no matter what \(x_0\) is.

x <- c(.2,0.395,0.4,0.751,0.751,1.271,1.271,1.708,1.708,
       -.2,-0.395,-0.4,-0.751,-0.751,-1.271,-1.271,-1.708,-1.708)
y <- c(0.395,0.4,0.751,0.751,1.271,1.271,1.708,1.708,1.873,
       -0.395,-0.4,-0.751,-0.751,-1.271,-1.271,-1.708,-1.708,-1.873)
plot(x, y, xlim=c(-2,2), ylim=c(-2,2), cex=.7)
points(c(-1.915,0,1.915), c(-1.915,0,1.915), pch = 19)
text(0, -.2, "unstable fixed point", pos=4)
title(expression(y==2*tanh(x)))
curve(2*tanh(x), from=-3, to=3, add=TRUE)
curve(x*1, from=-3, to=3, add=TRUE)
arrows(.2, .4, .4, .4, length=0.1, angle=10)
arrows(.4, .4, .4, .75, length=0.1, angle=10)
arrows(.4, .75, .75, .75, length=0.1, angle=10)
arrows(.75, .75, .75, 1.27, length=0.1, angle=10)
arrows(.75, 1.27, 1.27, 1.27, length=0.1, angle=10)
arrows(1.27, 1.27, 1.27, 1.7, length=0.1, angle=10)
arrows(1.27, 1.7, 1.71, 1.71, length=0.1, angle=10)
arrows(1.71, 1.71, 1.71, 1.87, length=0.1, angle=10)
arrows(1.71, 1.87, 1.84, 1.87, length=0.1, angle=10)
arrows(-.2, -.4, -.4, -.4, length=0.1, angle=10)
arrows(-.4, -.4, -.4, -.75, length=0.1, angle=10)
arrows(-.4, -.75, -.75, -.75, length=0.1, angle=10)
arrows(-.75, -.75, -.75, -1.27, length=0.1, angle=10)
arrows(-.75, -1.27, -1.27, -1.27, length=0.1, angle=10)
arrows(-1.27, -1.27, -1.27, -1.7, length=0.1, angle=10)
arrows(-1.27, -1.7, -1.71, -1.71, length=0.1, angle=10)
arrows(-1.71, -1.71, -1.71, -1.87, length=0.1, angle=10)
arrows(-1.71, -1.87, -1.84, -1.87, length=0.1, angle=10)

Can you follow how arrows move? Basically, a point on the graph \(y=g(x)\) moves horizontally to the 45-degree line because the current \(y\) value, say \(y=g(x_n)\), becomes a new \(x=x_{n+1}\) value. Then, it moves vertically to the graph and gets a new \(y=g(x_{n+1})\) value.

In contrast, in the case of \(g(x)=e^x-1\), \(\{x_n\}\) converges if \(x_0\leq0\) and diverges if \(x_0>0\).

x <- c(.5,0.65,0.65,0.916,0.916,1.5,
       -1.5,-0.777,-0.777,-0.54,-0.54)
y <- c(0.65,0.65,0.916,0.916,1.5,1.5,
       -0.777,-0.777,-0.54,-0.54,-0.417)
plot(x, y, xlim=c(-1.5,1.5), ylim=c(-1.5,2), cex=.7)
points(0, 0, pch = 19)
text(0, -.2, "half-stable fixed point", pos=4)
title(expression(y==e^x-1))
curve(exp(x)-1, from=-2, to=2, add=TRUE)
curve(x*1, from=-2, to=2, add=TRUE)
arrows(.5, .65, .65, .65, length=0.1, angle=10)
arrows(.65, .65, .65, .916, length=0.1, angle=10)
arrows(.65, .916, .916, .916, length=0.1, angle=10)
arrows(.916, .916, .916, 1.5, length=0.1, angle=10)
arrows(.916, 1.5, 1.5, 1.5, length=0.1, angle=10)
arrows(1.5, 1.5, 1.5, 2, length=0.1, angle=10)
arrows(-1.5, -0.777, -.777, -0.777, length=0.1, angle=10)
arrows(-0.777, -0.777, -.777, -0.54, length=0.1, angle=10)
arrows(-0.777, -0.54, -0.54, -0.54, length=0.1, angle=10)
arrows(-0.54, -0.54, -0.54, -0.417, length=0.1, angle=10)
arrows(-0.54, -0.417, -0.45, -0.417, length=0.1, angle=10)