These notes are based on Chapter 10 of Jones, O., Maillardet, R., & Robinson, A. (2009). Introduction to scientific programming and simulation using R. CRC Press.

# Introduction

A root is a solution of $$f(x)=0$$. Visually, a root is a point where the graph $$y=f(x)$$ crosses the x-axis. For example, in the interval $$[0,10]$$, there are two roots of the following $$f$$:

plot(c(1,7), c(0,0), pch = 19,
xlim=c(0,10), ylim=c(-1050,1050),
xlab="x", ylab="y",
main=expression(y==x^4-37*x^3+447*x^2-1867*x+1456))
curve((x-1)*(x-7)*(x-13)*(x-16), from=0, to=10, add=TRUE)
abline(h=0)
text(1, 100, "root at x = 1", pos=4)
text(7, -100, "root at x = 7", pos=4)

Root finding is important because:

• Factorisation of a polynomial, one of the most popular classes of functions (e.g., Taylor series approximation). For example, \begin{align} f(x) &= x^4-37x^3+447x^2-1867x+1456\\ &= (x-1)(x-7)(x-13)(x-16) \end{align}
• Many problems can be modelled as root finding of suitable functions. Remember that finding a root of $$f(x)$$ is equivalent to solving an equation $$g(x)=h(x)$$ where $$f(x) = g(x)-h(x)$$.

## Loan repayments

Let’s suppose:

• $$P_n$$ : Remaining amount of loan after $$n$$ months
• $$P_0$$ : Initial amount of loan
• $$N$$ : Duration of repayments
• $$A$$ : Monthly repayment amount
• $$r$$ : Monthly interest rate

Given $$P_{n-1}$$, we know: $P_n = (1+r)P_{n-1} - A$ That is, each month, interest based on $$P_{n-1}$$ is added and then you repay $$A$$. The solution of the difference equation (i.e., the expression of $$P_n$$ as a function of $$n$$) is:

$P_n = (1+r)^nP_0 - \frac{A}{r}\left((1+r)^n-1\right) .$

To see this, for each $$k\in\{1,\dots,n\}$$, multiply both sides of $$P_k = (1+r)P_{k-1} - A$$ by $$(1+r)^{n-k}$$ and get a telescoping sum.

We negotiate with a bank on the interest rate. A question is under what $$r$$, can we repay the loan after $$N$$ months?

We could get the answer by solving $$P_N=0$$ for $$r$$: \begin{alignat}{2} && P_N &= 0\\ &\Leftrightarrow \quad & (1+r)^NP &= \frac{A}{r}\left((1+r)^N-1\right)\\ &\Leftrightarrow \quad & (1+r)^N &= \frac{A}{A-rP}\\ &&&\vdots \end{alignat} Well, this is hard (at least for me). Instead, we may use numerical root-finding methods for $$f(r):=P_N$$.

# Fixed-point iteration

## Fixed point

Let $$g: \mathbb{R}\to\mathbb{R}$$ be a continuous function. Then, a fixed point of $$g$$ is a solution of $$g(x)=x$$. Visually, a fixed point is a point where the graph $$y=g(x)$$ crosses the 45-degree line. For example, a hyperbolic tangent $g(x) = 2\tanh(x) = 2\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$ has three fixed points: $$x = -1.915, 0, 1.915$$.

x <- c(-1.915,0,1.915)
y <- c(-1.915,0,1.915)
plot(x, y, pch = 19, xlim=c(-2,2), ylim=c(-2,2))
title(expression(y==2*tanh(x)))
curve(2*tanh(x), from=-3, to=3, add=TRUE)
curve(x*1, from=-3, to=3, add=TRUE)
text(-1.55, -.8, "fixed point at -1.915")
text(0, -.2, "fixed point at 0", pos=4)
text(1.55, .8, "fixed point at 1.915")
arrows(-1.8, -1, -1.9, -1.8, length=0.1)
arrows(1.8, 1, 1.9, 1.8, length=0.1)

Given $$f$$, root finding is to compute a value $$a$$ such that $$f(a)=0$$. But, as long as we legally manipulate the equation $$f(a)=0$$, the same $$a$$ satisfies both the original $$f(a)=0$$ and transformed equations.

This is why we may turn root finding into a fixed-point problem. Among many possible ways to manipulate $$f(a)=0$$, the following is perhaps the simplest. $f(a)=0 \;\Leftrightarrow\; f(a)+a=a$ Then, we define a new function $$g$$ such that $$g(x):=f(x)+x$$ and try to find a fixed point of $$g$$ to find a root of $$f$$.

Clearly, this is not the only way of transformation and, as seen below, the performance of fixed-point iteration depends on how we transform $$f(a)=0$$ and define $$g$$.

## Method

The fixed-point iteration is a method for solving $$g(x)=x$$ by iteratively generating points $$x_0,x_1,\dots$$ that, we hope, converge to $$a$$ such that $$g(a)=a$$.

Given $$x_n$$, the generation of points follows a simple difference equation $$x_{n+1}=g(x_n)$$. Clearly, to start somewhere at $$n=0$$, we need an initial guess $$x_0$$.

If a sequence $$\{x_n\}$$ converges to some point $$a$$, then the continuity of $$g$$ implies that $$a$$ is indeed a fixed point of $$g$$.

$a = \lim_{n\to\infty}x_{n+1} = \lim_{n\to\infty}g(x_n) = g\left(\lim_{n\to\infty}x_n\right) = g(a)$

An issue of fixed-point iteration is that convergence of $$\{x_n\}$$ is not guaranteed but depends on $$g$$ and $$x_0$$.

For example, in the case of $$g(x)=2\tanh(x)$$, $$\{x_n\}$$ converges no matter what $$x_0$$ is.

x <- c(.2,0.395,0.4,0.751,0.751,1.271,1.271,1.708,1.708,
-.2,-0.395,-0.4,-0.751,-0.751,-1.271,-1.271,-1.708,-1.708)
y <- c(0.395,0.4,0.751,0.751,1.271,1.271,1.708,1.708,1.873,
-0.395,-0.4,-0.751,-0.751,-1.271,-1.271,-1.708,-1.708,-1.873)
plot(x, y, xlim=c(-2,2), ylim=c(-2,2), cex=.7)
points(c(-1.915,0,1.915), c(-1.915,0,1.915), pch = 19)
text(0, -.2, "unstable fixed point", pos=4)
title(expression(y==2*tanh(x)))
curve(2*tanh(x), from=-3, to=3, add=TRUE)
curve(x*1, from=-3, to=3, add=TRUE)
arrows(.2, .4, .4, .4, length=0.1, angle=10)
arrows(.4, .4, .4, .75, length=0.1, angle=10)
arrows(.4, .75, .75, .75, length=0.1, angle=10)
arrows(.75, .75, .75, 1.27, length=0.1, angle=10)
arrows(.75, 1.27, 1.27, 1.27, length=0.1, angle=10)
arrows(1.27, 1.27, 1.27, 1.7, length=0.1, angle=10)
arrows(1.27, 1.7, 1.71, 1.71, length=0.1, angle=10)
arrows(1.71, 1.71, 1.71, 1.87, length=0.1, angle=10)
arrows(1.71, 1.87, 1.84, 1.87, length=0.1, angle=10)
arrows(-.2, -.4, -.4, -.4, length=0.1, angle=10)
arrows(-.4, -.4, -.4, -.75, length=0.1, angle=10)
arrows(-.4, -.75, -.75, -.75, length=0.1, angle=10)
arrows(-.75, -.75, -.75, -1.27, length=0.1, angle=10)
arrows(-.75, -1.27, -1.27, -1.27, length=0.1, angle=10)
arrows(-1.27, -1.27, -1.27, -1.7, length=0.1, angle=10)
arrows(-1.27, -1.7, -1.71, -1.71, length=0.1, angle=10)
arrows(-1.71, -1.71, -1.71, -1.87, length=0.1, angle=10)
arrows(-1.71, -1.87, -1.84, -1.87, length=0.1, angle=10)

Can you follow how arrows move? Basically, a point on the graph $$y=g(x)$$ moves horizontally to the 45-degree line because the current $$y$$ value, say $$y=g(x_n)$$, becomes a new $$x=x_{n+1}$$ value. Then, it moves vertically to the graph and gets a new $$y=g(x_{n+1})$$ value.

In contrast, in the case of $$g(x)=e^x-1$$, $$\{x_n\}$$ converges if $$x_0\leq0$$ and diverges if $$x_0>0$$.

x <- c(.5,0.65,0.65,0.916,0.916,1.5,
-1.5,-0.777,-0.777,-0.54,-0.54)
y <- c(0.65,0.65,0.916,0.916,1.5,1.5,
-0.777,-0.777,-0.54,-0.54,-0.417)
plot(x, y, xlim=c(-1.5,1.5), ylim=c(-1.5,2), cex=.7)
points(0, 0, pch = 19)
text(0, -.2, "half-stable fixed point", pos=4)
title(expression(y==e^x-1))
curve(exp(x)-1, from=-2, to=2, add=TRUE)
curve(x*1, from=-2, to=2, add=TRUE)
arrows(.5, .65, .65, .65, length=0.1, angle=10)
arrows(.65, .65, .65, .916, length=0.1, angle=10)
arrows(.65, .916, .916, .916, length=0.1, angle=10)
arrows(.916, .916, .916, 1.5, length=0.1, angle=10)
arrows(.916, 1.5, 1.5, 1.5, length=0.1, angle=10)
arrows(1.5, 1.5, 1.5, 2, length=0.1, angle=10)
arrows(-1.5, -0.777, -.777, -0.777, length=0.1, angle=10)
arrows(-0.777, -0.777, -.777, -0.54, length=0.1, angle=10)
arrows(-0.777, -0.54, -0.54, -0.54, length=0.1, angle=10)
arrows(-0.54, -0.54, -0.54, -0.417, length=0.1, angle=10)
arrows(-0.54, -0.417, -0.45, -0.417, length=0.1, angle=10)