Lecture 2 — Nonlinear systems - Root finding#
Today we will cover nonlinear equations and systems of nonlinear equations (single and multivariable)
Learning Outcomes#
By the end of this lecture you will be able to:
Explain why many engineering problems described by a nonlinear equation (or a system) can be seeing to solving f(x)=0 or F(x)=0.
Understand convergence/stop criteria (function tolerance, step tolerance, iteration cap).
Distinguish bracketed (bisection) vs open (Newton) root-finding numerical methods.
Implement bisection and Newton methods from scratch, as well as use
scipy.optimizeequivalents.Understand what can go wrong when employing these methods and how to circumvent such challenges.
Transition from single to multivariable, system of equations root-finding.
Be able to use
scipy.optimize.rootto solve systems of nonlinear equations.
1) Root finding: exact vs numerical; errors & tolerance#
Exact vs numerical: most ChemE models need numerical roots. There is no analytical (closed-form) solution to the expression (or set of expressions) that arise. We saw some examples last lecture. A simple example (not ChemE) is:
\(f(x) = ℯ^{-x}+ 0.1 \, \,x \,\, sin(x) = 0\)
This nonlinear equation has no simple algebraic solution for \(x\). Instead, numerical root-finding methods are used to determine the approximate solution. That is, within a tolerance margin.
We should reduce/simplify the system at hand to f(x)=0 (or F(x)=0 in multivariable case, uppercase here means a vector function).
A numerical method needs tolerances to iterate towards a solution. For example:
Function (\(|f(x_k)|<\varepsilon_f\)), step (\(|x_{k+1}-x_k|<\varepsilon_x\)), and maximum number of iterations are alternatives.
We stop when one (or more) conditions are met:
Function tolerance: \(|f(x_k)| < \varepsilon_f\) — we’re close to a zero/root.
Step/interval tolerance: \(|x_{k+1}-x_k| < \varepsilon_x\) or bisection width \((b-a) < \varepsilon_x\).
Iteration cap:
max_iter— prevents infinite loops.
2) Interval Halving Methods: Bisection#
Idea. With a sign-change bracket \([a,b]\) for a continuous \(f\), a root exists inside this bracket. This is called the intermediate value theorem.
If we halve repeatedly and keep the sign-change subinterval, we will find a solution eventually!
Bisection Visual Exploration#
Let’s use an interactive animation to better understand how interval halving methods, such as Bisection, work.
If you are seeing this in the website version of the lecture, just open this notebook in your google colab session or locally in your machine, by downloading the *.ipynb file.
The example is a nonlinear function defined as \(y(x) = e^{-x} - 0.1sin(x)\)
How does this in code looks like? Let’s see one possible implementation
import numpy as np
def bisection(f, a, b, ftol=1e-10, xtol=1e-12, max_iter=200):
"""
Bisection's Method.
Inputs
------
f: function
The function we want to find the root.
a: float
Lower bound of the initial interval.
b: float
Upper bound of the initial interval.
ftol: float
Tolerance for absolute function value (default: 1e-10)
xtol: float
Tolerance for the step length (default: 1e-12)
max_iter: int
Max number of iterations (default: 200)
Returns
-------
m: float
Approximate root of f(x)
iter: int
Number of iterations.
fm: float
Value of f(x) at the solution.
"""
fa = f(a)
fb = f(b)
if fa * fb > 0:
raise ValueError("f(a)*f(b)>0, hence we don't have a sign change. Exit.")
for k in range(max_iter):
m = 0.5 * (a + b)
fm = f(m)
if np.abs(fm) < ftol or np.abs((b-a)) < xtol:
return m, k+1, fm
if fa * fm < 0:
b, fb = m, fm
else:
a, fa = m, fm
m = 0.5 * (a + b)
fm = f(m)
return m, max_iter, fm
def f_bisect(x):
return np.exp(-x) - 0.1*x*np.sin(x)
a = 5.00
b = 8.00
solution = bisection(f_bisect, a, b)
print(solution)
(6.286147252423689, 32, np.float64(-2.1169849293706244e-11))
A few notes regarding the bisection method:
We will always find a solution, given that the intermediate value theorem is true when choosing the interval \([a,b]\)
Bisection can be a bit slow to converge. It has linear convergence.
This leaves an open question of “can we have faster methods?” Let’s see it below
3) Open Methods: Newton’s Method#
In the open methods, we don’t need an interval \([a,b]\) to try to find a solution. Instead, we need only an initial estimate \(x_0\). This might look an absolute improvement at a first glance, since these methods tend to converge faster in general, but there are a few caveats. First, let’s take a look at the iterative equation for Newton’s method:
Newton’s method formula: \(x_{k+1}=x_k-\dfrac{f(x_k)}{f'(x_k)}\)
How do we even got this formula in the first place?
Newton’s Method derivation via Taylor Series Expansion (TSE)#
Reminder: What is a Taylor Series?#
If a function \(f(x)\) is sufficiently smooth (that is, we can take derivatives), we are able to approximate it near a point, say, \(x=a\) by its derivatives at \(a\):
\(f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f^{(3)}(a)}{3!}(x-a)^3 + \cdots\)
The first term is the value of \(f\) at \(a\).
The second term adds the local linear (slope) correction.
Higher-order terms capture curvature and what we call “high-order” effects.
When \(x\) is close to \(a\), the higher-order terms are small.
Apply Taylor Expansion to Root Finding#
Remember our goal: solve \(f(x)=0\) ! (exclamation mark, not factorial :) )
Let \(x_k\) be the current guess. Expand \(f\) with a \(1^{st}\) order TSE expansion around \(x_k\): \(f(x) \approx f(x_k) + f'(x_k)(x - x_k)\)
We know \(x_k\) and hence, can calculate \(f(x_k)\) and \(f'(x_k)\)
Impose the Root Condition#
At the next iterate \(x_{k+1}\) we want \(f(x_{k+1})=0\). After all, we want it to converge, correct?
Let’s impose this condition then. \(f(x_{k+1})=0\):
\((x_{k+1})=0 \approx f(x_k) + f'(x_k)(x_{k+1} - x_k)\)
Solve for \(x_{k+1}\) to get the Newton update (also called newton step):
\(x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}\)
This is also written sometimes as
\(x_{k+1} = x_k + d_k\)
\(d_k = - \frac{f(x_k)}{f'(x_k)}\)
Why is Newton’s method fast?#
Let the root be \(r\) and define the error \(e_k = x_k - r\)
Start from the full Taylor series of \(f\) about \(r\) (with \(x_k = r + e_k\)):
Since \(f(r)=0\), this simplifies to
If \(x_k\) is close to \(r\) (so \(|e_k|\) is small), higher-order terms are negligible, giving
Newton’s update is
Hence the next error is
Substitute the Taylor approximation for \(f(x_k)\) and (near \(r\)) replace \(f'(x_k)\) by \(f'(r)\):
Thus,
The new error is proportional to the square of the old error (quadratic convergence).
If \(e_k \sim 10^{-2}\), then \(e_{k+1} \sim 10^{-4}\).
By contrast, bisection has only linear convergence: \(e_{k+1} \approx \tfrac{1}{2}e_k\).
Newton’s Method Visual Exploration#
This works in your jupyter notebook sesssion. Not in the website. Let’s explore Newton’s method!
Again, you should not worry about the code that generated the anmiations below. It’s just the use of jupyter widgets (and a bit of prompting using ChatGPT to get the widget working well) so we could generate an interactive visualization of how Newton’s method works :)
An example in which things can go wrong:
It quickly diverges! If we change the initial step slightly, say, \(x_0=0.9\), it converges. Try it yourself in the interface above.
Implementation of Newton’s Method (scalar case)#
import numpy as np
def newton(f, fprime, x0, tol=1e-8, max_iter = 50):
"""
Newton's method.
Inputs
------
f: function
The function we want to determine the root.
fprime: function
The derivative of the function f
x0: float
Initial estimate (guess).
tol: float
Tolerance.
max_iter:
Maximum number of iterations...
Returns
-------
x: float
Approximate root of f(x)
iter: int
Number of iterations
sol: float
Value of f(x) at the solution.
"""
x = x0
if np.abs(f(x)) < tol:
return x
# Main for loop:
for k in range(max_iter):
x_new = x - f(x)/fprime(x) # Newton's update formula
if np.abs(f(x_new)) < tol: # Function value small enough. stop.
return x_new, k+1, np.abs(f(x_new))
if np.abs(x_new - x) < 1e-14: # Step size small enough. Stop.
## TODO: Maybe we should add 1e-14 as a keyword argument instead of hard coding it.
return x_new, k+1, np.abs(f(x_new))
x = x_new
fx = f(x)
x = x_new
raise ValueError(f"Newton's method did not converge after {k+1} iterations;", f"x={x:.6f}, f(x)={fx:.6f}")
Let’s try a \(3^{rd}\) order degree polynomial, defined as \(f(x) = x^3 - 7x^2 + 14x - 8\) and with derivative defined as \(f'(x)=3x^2 - 14x + 14\)
def f(x):
return x**3 - 7*x**2 + 14*x - 8
def fprime(x):
return 3*x**2 - 14*x + 14
Solving using our function newton.
x, k, sol = newton(f, fprime, 0.7, tol=1e-8, max_iter=50)
print('optimal solution is x=', x)
print('number of iterations =', k)
print('f(x)=', sol)
optimal solution is x= 0.9999999967198463
number of iterations = 4
f(x)= 9.840460890586655e-09
4) Using SciPy: bisect, newton#
Scipy already contains implementations of bisection and Newton’s method. We will go over the use of these now. But their functionality is no different from what we just saw in class.
optimize.bisect(f, a, b): robust with a bracket.optimize.newton(f, x0, fprime=...): Newton with derivative
Let’s try another example, and compare our implementation with scipy’s. Let’s take a look at \(f(x) = x-cos(x)\)
from scipy import optimize
import numpy as np
# Defining the function and its derivative (for Newton's method)
def f(x): return x - np.cos(x)
def fprime(x): return 1 + np.sin(x)
# Let's solve all of these
print('SciPy bisection, x =', optimize.bisect(f, 0.0, 1.0, xtol=1e-12, rtol=1e-12, maxiter=200))
print('SciPy newton, x =', optimize.newton(f, x0=0.7, fprime=fprime, tol=1e-12, maxiter=50))
print('Our implementation of bisection, x =', bisection(f, 0.0, 1.0)[0])
print('Our implementation of Newton\'s method, x =', newton(f, fprime, x0=0.7)[0])
SciPy bisection, x = 0.7390851332156672
SciPy newton, x = 0.7390851332151607
Our implementation of bisection, x = 0.7390851331874728
Our implementation of Newton's method, x = 0.7390851332151608
Now, let’s try the example that was used in the demo above for Newton’s method, that is \(f(x) = x^3 + 7x^2 + 14x -8\).
The derivative for this is \(f'(x) = 3x^2 - 14x + 14\).
This equation has three roots as we saw in the animation above (1, 2 and 4)
def f(x):
return x**3 - 7*x**2 + 14*x - 8
def fprime(x):
return 3*x**2 - 14*x + 14
print('SciPy bisection, x = :', optimize.bisect(f, 0.0, 1.46, xtol=1e-12, rtol=1e-12, maxiter=200))
print('SciPy newton, x = :', optimize.newton(f, x0=1.45, fprime=fprime, tol=1e-12, maxiter=50))
print('Our implementation of bisection, x =', bisection(f, 0.0, 1.5)[0])
print('Our implementation of Newton\'s method, x =', newton(f, fprime, x0=1.45)[0])
SciPy bisection, x = : 1.0000000000000548
SciPy newton, x = : 0.9999999999999999
Our implementation of bisection, x = 1.0000000000291038
Our implementation of Newton's method, x = 0.9999999999986046
print('SciPy bisection, x = :', optimize.bisect(f, 1.5, 2.5, xtol=1e-12, rtol=1e-12, maxiter=200))
print('SciPy newton, x = :', optimize.newton(f, x0=1.5, fprime=fprime, tol=1e-12, maxiter=50))
print('Our implementation of bisection, x =', bisection(f, 1.5, 2.5)[0])
print('Our implementation of Newton\'s method, x =', newton(f, fprime, x0=1.5)[0])
SciPy bisection, x = : 2.0
SciPy newton, x = : 4.0
Our implementation of bisection, x = 2.0
Our implementation of Newton's method, x = 4.0
from scipy import optimize
import numpy as np
def f(x):
return np.tanh(x)
def fprime(x):
t = np.tanh(x)
return 1.0 - t*t # sech^2(x)
print('SciPy newton, x = :', optimize.newton(f, x0=1.08, fprime=fprime, tol=1e-12, maxiter=50))
print('Our implementation of Newton\'s method, x =', newton(f, fprime, x0=1.01)[0])
SciPy newton, x = : 0.0
Our implementation of Newton's method, x = -5.025528739899601e-12
Extension to systems: Newton’s method in higher dimensions#
It is not uncommon that instead of needing to solve a single nonlinear function, we have a system of nonlinear equations. These arise all the time in chemical reaction engineering, process design and control, process dynamics, and so on. These all share in common the particular feature that now, the unknown is a vector \(x_k \in \mathbb{R}^n\).
We put \(n\) nonlinear equations into what we call a vector function \(F:\mathbb{R}^n \to \mathbb{R}^n\):
And we want \(F(x)=0\). Nothing has changed in our goal (root finding), we now just have more equations to deal with simultaneously. This is actually a system of nonlinear equations, and hence, we have to use the appropriate mathematical concepts to deal with this increased dimensionality.
Jacobian#
The Jacobian \(J(x)\) is the matrix of all first derivatives:
Expanding the notation above you can see the full matrix as:
And here we show that the Newton’s method is conceptually the same for the multivariate case. If we expand \(F\) employing Taylor around \(x_k\):
And impose the root condition at \(x_{k+1}\), that is, we want \(F(x_{k+1}) = 0\), we have:
Lastly, if we solve for \(x_{k+1}\):
Equivalently, we can write this equation as
But we avoid this in practice since evaluating an inverse is expensive from a computational standpoint. Instead, we solve the linear system posed by \(J(x_k)d_k = -F(x_k)\)
Side-by-side: scalar vs. vector versions of Newton’s method#
To further stress the fact that these are the same algorithm, just adapted for higher dimensions, please take a look at the table below for a quick comparison:
Concept |
Scalar (1D) |
Vector (nD) |
|---|---|---|
Unknown |
\(x_k \in \mathbb{R}\) |
\(x_k \in \mathbb{R}^n\) |
Function |
\(f:\mathbb{R}\to\mathbb{R}\) |
\(F:\mathbb{R}^n\to\mathbb{R}^n\) |
Expansion |
\(f(x)\approx f(x_k)+f'(x_k)(x-x_k)\) |
\(F(x)\approx F(x_k)+J(x_k)(x-x_k)\) |
Root condition |
\(0\approx f(x_k)+f'(x_k)(x_{k+1}-x_k)\) |
\(0\approx F(x_k)+J(x_k)(x_{k+1}-x_k)\) |
Newton step |
\(d_k=-\tfrac{f(x_k)}{f'(x_k)}\) |
\(J(x_k)d_k=-F(x_k)\) |
Update |
\(x_{k+1}=x_k+d_k\) |
\(x_{k+1}=x_k+d_k\) |
Example: \(2\times 2\) generic system#
Suppose
The Newton step solves
Then \(x_{k+1} = x_k + d_k\).
Can we code this in Python? (We continue from here on Sep 3rd. Your first homework does not require the code development below.)#
Yup! It’s pretty much the same idea:
import numpy as np
def newton_nd(F, J, x0, tol=1e-8, max_iter = 50):
"""
Newton's method (multivariate).
Inputs
------
f: function
The vector-valued function we want to determine the root.
J: function
The Jacobian of the function F
x0: float
Initial estimate (guess).
tol: float
Tolerance.
max_iter:
Maximum number of iterations...
Returns
-------
x: float
Approximate root of f(x)
iter: int
Number of iterations
sol: float
Value of f(x) at the solution.
"""
x = x0
if np.linalg.norm(F(x), ord =2) < tol:
return x, 0, F(x)
# Main for loop:
for k in range(max_iter):
x_new = x - np.linalg.inv(J(x)) @ F(x)
if np.linalg.norm(F(x_new)) < tol: # Function value small enough. stop.
return x_new, k+1, F(x_new)
if np.linalg.norm(x_new - x) < 1e-14: # Step size small enough. Stop.
## TODO: Maybe we should add 1e-14 as a keyword argument instead of hard coding it.
return x_new, k+1, F(x_new)
x = x_new
Fx = F(x)
x = x_new
raise ValueError(f"Newton's method did not converge after {k+1} iterations;", f"x={x:.6f}, f(x)={F:.6f}")
Let’s try this for a system defined as
# Define F and J for the system
def F(x):
x1, x2 = x
return np.array([
2.0*x1**2 + x2**2 - 6.0,
x1 + 2.0*x2 - 3.5
])
def J(x):
x1, x2 = x
return np.array([
[4.0*x1, 2.0*x2],
[1.0, 2.0 ]
])
# Two different initial guesses
x0_1 = np.array([1.0, 1.0])
x0_2 = np.array([-1.0, 2.2])
root1, it1, res1 = newton_nd(F, J, x0_1, tol=1e-10, max_iter=50)
root2, it2, res2 = newton_nd(F, J, x0_2, tol=1e-10, max_iter=50)
print("From x0 =", x0_1, "root =", root1, "in", it1, "iters; residual =", res1)
print("From x0 =", x0_2, "root =", root2, "in", it2, "iters; residual =", res2)
From x0 = [1. 1.] root = [1.5958645 0.95206775] in 5 iters; residual = [8.8817842e-16 0.0000000e+00]
From x0 = [-1. 2.2] root = [-0.81808672 2.15904336] in 4 iters; residual = [8.8817842e-16 0.0000000e+00]
What about using scipy?#
You can use scipy.optimize.root to solve the exact same problem (or any root finding problem). Let’s take a look.
import scipy
help(scipy.optimize.root)
Help on function root in module scipy.optimize._root:
root(
fun,
x0,
args=(),
method='hybr',
jac=None,
tol=None,
callback=None,
options=None
)
Find a root of a vector function.
Parameters
----------
fun : callable
A vector function to find a root of.
Suppose the callable has signature ``f0(x, *my_args, **my_kwargs)``, where
``my_args`` and ``my_kwargs`` are required positional and keyword arguments.
Rather than passing ``f0`` as the callable, wrap it to accept
only ``x``; e.g., pass ``fun=lambda x: f0(x, *my_args, **my_kwargs)`` as the
callable, where ``my_args`` (tuple) and ``my_kwargs`` (dict) have been
gathered before invoking this function.
x0 : ndarray
Initial guess.
args : tuple, optional
Extra arguments passed to the objective function and its Jacobian.
method : str, optional
Type of solver. Should be one of
- 'hybr' :ref:`(see here) <optimize.root-hybr>`
- 'lm' :ref:`(see here) <optimize.root-lm>`
- 'broyden1' :ref:`(see here) <optimize.root-broyden1>`
- 'broyden2' :ref:`(see here) <optimize.root-broyden2>`
- 'anderson' :ref:`(see here) <optimize.root-anderson>`
- 'linearmixing' :ref:`(see here) <optimize.root-linearmixing>`
- 'diagbroyden' :ref:`(see here) <optimize.root-diagbroyden>`
- 'excitingmixing' :ref:`(see here) <optimize.root-excitingmixing>`
- 'krylov' :ref:`(see here) <optimize.root-krylov>`
- 'df-sane' :ref:`(see here) <optimize.root-dfsane>`
jac : bool or callable, optional
If `jac` is a Boolean and is True, `fun` is assumed to return the
value of Jacobian along with the objective function. If False, the
Jacobian will be estimated numerically.
`jac` can also be a callable returning the Jacobian of `fun`. In
this case, it must accept the same arguments as `fun`.
tol : float, optional
Tolerance for termination. For detailed control, use solver-specific
options.
callback : function, optional
Optional callback function. It is called on every iteration as
``callback(x, f)`` where `x` is the current solution and `f`
the corresponding residual. For all methods but 'hybr' and 'lm'.
options : dict, optional
A dictionary of solver options. E.g., `xtol` or `maxiter`, see
:obj:`show_options()` for details.
Returns
-------
sol : OptimizeResult
The solution represented as a ``OptimizeResult`` object.
Important attributes are: ``x`` the solution array, ``success`` a
Boolean flag indicating if the algorithm exited successfully and
``message`` which describes the cause of the termination. See
`OptimizeResult` for a description of other attributes.
See also
--------
show_options : Additional options accepted by the solvers
Notes
-----
This section describes the available solvers that can be selected by the
'method' parameter. The default method is *hybr*.
Method *hybr* uses a modification of the Powell hybrid method as
implemented in MINPACK [1]_.
Method *lm* solves the system of nonlinear equations in a least squares
sense using a modification of the Levenberg-Marquardt algorithm as
implemented in MINPACK [1]_.
Method *df-sane* is a derivative-free spectral method. [3]_
Methods *broyden1*, *broyden2*, *anderson*, *linearmixing*,
*diagbroyden*, *excitingmixing*, *krylov* are inexact Newton methods,
with backtracking or full line searches [2]_. Each method corresponds
to a particular Jacobian approximations.
- Method *broyden1* uses Broyden's first Jacobian approximation, it is
known as Broyden's good method.
- Method *broyden2* uses Broyden's second Jacobian approximation, it
is known as Broyden's bad method.
- Method *anderson* uses (extended) Anderson mixing.
- Method *Krylov* uses Krylov approximation for inverse Jacobian. It
is suitable for large-scale problem.
- Method *diagbroyden* uses diagonal Broyden Jacobian approximation.
- Method *linearmixing* uses a scalar Jacobian approximation.
- Method *excitingmixing* uses a tuned diagonal Jacobian
approximation.
.. warning::
The algorithms implemented for methods *diagbroyden*,
*linearmixing* and *excitingmixing* may be useful for specific
problems, but whether they will work may depend strongly on the
problem.
.. versionadded:: 0.11.0
References
----------
.. [1] More, Jorge J., Burton S. Garbow, and Kenneth E. Hillstrom.
1980. User Guide for MINPACK-1.
.. [2] C. T. Kelley. 1995. Iterative Methods for Linear and Nonlinear
Equations. Society for Industrial and Applied Mathematics.
<https://archive.siam.org/books/kelley/fr16/>
.. [3] W. La Cruz, J.M. Martinez, M. Raydan. Math. Comp. 75, 1429 (2006).
Examples
--------
The following functions define a system of nonlinear equations and its
jacobian.
>>> import numpy as np
>>> def fun(x):
... return [x[0] + 0.5 * (x[0] - x[1])**3 - 1.0,
... 0.5 * (x[1] - x[0])**3 + x[1]]
>>> def jac(x):
... return np.array([[1 + 1.5 * (x[0] - x[1])**2,
... -1.5 * (x[0] - x[1])**2],
... [-1.5 * (x[1] - x[0])**2,
... 1 + 1.5 * (x[1] - x[0])**2]])
A solution can be obtained as follows.
>>> from scipy import optimize
>>> sol = optimize.root(fun, [0, 0], jac=jac, method='hybr')
>>> sol.x
array([ 0.8411639, 0.1588361])
**Large problem**
Suppose that we needed to solve the following integrodifferential
equation on the square :math:`[0,1]\times[0,1]`:
.. math::
\nabla^2 P = 10 \left(\int_0^1\int_0^1\cosh(P)\,dx\,dy\right)^2
with :math:`P(x,1) = 1` and :math:`P=0` elsewhere on the boundary of
the square.
The solution can be found using the ``method='krylov'`` solver:
>>> from scipy import optimize
>>> # parameters
>>> nx, ny = 75, 75
>>> hx, hy = 1./(nx-1), 1./(ny-1)
>>> P_left, P_right = 0, 0
>>> P_top, P_bottom = 1, 0
>>> def residual(P):
... d2x = np.zeros_like(P)
... d2y = np.zeros_like(P)
...
... d2x[1:-1] = (P[2:] - 2*P[1:-1] + P[:-2]) / hx/hx
... d2x[0] = (P[1] - 2*P[0] + P_left)/hx/hx
... d2x[-1] = (P_right - 2*P[-1] + P[-2])/hx/hx
...
... d2y[:,1:-1] = (P[:,2:] - 2*P[:,1:-1] + P[:,:-2])/hy/hy
... d2y[:,0] = (P[:,1] - 2*P[:,0] + P_bottom)/hy/hy
... d2y[:,-1] = (P_top - 2*P[:,-1] + P[:,-2])/hy/hy
...
... return d2x + d2y - 10*np.cosh(P).mean()**2
>>> guess = np.zeros((nx, ny), float)
>>> sol = optimize.root(residual, guess, method='krylov')
>>> print('Residual: %g' % abs(residual(sol.x)).max())
Residual: 5.7972e-06 # may vary
>>> import matplotlib.pyplot as plt
>>> x, y = np.mgrid[0:1:(nx*1j), 0:1:(ny*1j)]
>>> plt.pcolormesh(x, y, sol.x, shading='gouraud')
>>> plt.colorbar()
>>> plt.show()
You can also do ?scipy.optimize.root. Same idea.
?scipy.optimize.root
Signature:
scipy.optimize.root(
fun,
x0,
args=(),
method='hybr',
jac=None,
tol=None,
callback=None,
options=None,
)
Docstring:
Find a root of a vector function.
Parameters
----------
fun : callable
A vector function to find a root of.
Suppose the callable has signature ``f0(x, *my_args, **my_kwargs)``, where
``my_args`` and ``my_kwargs`` are required positional and keyword arguments.
Rather than passing ``f0`` as the callable, wrap it to accept
only ``x``; e.g., pass ``fun=lambda x: f0(x, *my_args, **my_kwargs)`` as the
callable, where ``my_args`` (tuple) and ``my_kwargs`` (dict) have been
gathered before invoking this function.
x0 : ndarray
Initial guess.
args : tuple, optional
Extra arguments passed to the objective function and its Jacobian.
method : str, optional
Type of solver. Should be one of
- 'hybr' :ref:`(see here) <optimize.root-hybr>`
- 'lm' :ref:`(see here) <optimize.root-lm>`
- 'broyden1' :ref:`(see here) <optimize.root-broyden1>`
- 'broyden2' :ref:`(see here) <optimize.root-broyden2>`
- 'anderson' :ref:`(see here) <optimize.root-anderson>`
- 'linearmixing' :ref:`(see here) <optimize.root-linearmixing>`
- 'diagbroyden' :ref:`(see here) <optimize.root-diagbroyden>`
- 'excitingmixing' :ref:`(see here) <optimize.root-excitingmixing>`
- 'krylov' :ref:`(see here) <optimize.root-krylov>`
- 'df-sane' :ref:`(see here) <optimize.root-dfsane>`
jac : bool or callable, optional
If `jac` is a Boolean and is True, `fun` is assumed to return the
value of Jacobian along with the objective function. If False, the
Jacobian will be estimated numerically.
`jac` can also be a callable returning the Jacobian of `fun`. In
this case, it must accept the same arguments as `fun`.
tol : float, optional
Tolerance for termination. For detailed control, use solver-specific
options.
callback : function, optional
Optional callback function. It is called on every iteration as
``callback(x, f)`` where `x` is the current solution and `f`
the corresponding residual. For all methods but 'hybr' and 'lm'.
options : dict, optional
A dictionary of solver options. E.g., `xtol` or `maxiter`, see
:obj:`show_options()` for details.
Returns
-------
sol : OptimizeResult
The solution represented as a ``OptimizeResult`` object.
Important attributes are: ``x`` the solution array, ``success`` a
Boolean flag indicating if the algorithm exited successfully and
``message`` which describes the cause of the termination. See
`OptimizeResult` for a description of other attributes.
See also
--------
show_options : Additional options accepted by the solvers
Notes
-----
This section describes the available solvers that can be selected by the
'method' parameter. The default method is *hybr*.
Method *hybr* uses a modification of the Powell hybrid method as
implemented in MINPACK [1]_.
Method *lm* solves the system of nonlinear equations in a least squares
sense using a modification of the Levenberg-Marquardt algorithm as
implemented in MINPACK [1]_.
Method *df-sane* is a derivative-free spectral method. [3]_
Methods *broyden1*, *broyden2*, *anderson*, *linearmixing*,
*diagbroyden*, *excitingmixing*, *krylov* are inexact Newton methods,
with backtracking or full line searches [2]_. Each method corresponds
to a particular Jacobian approximations.
- Method *broyden1* uses Broyden's first Jacobian approximation, it is
known as Broyden's good method.
- Method *broyden2* uses Broyden's second Jacobian approximation, it
is known as Broyden's bad method.
- Method *anderson* uses (extended) Anderson mixing.
- Method *Krylov* uses Krylov approximation for inverse Jacobian. It
is suitable for large-scale problem.
- Method *diagbroyden* uses diagonal Broyden Jacobian approximation.
- Method *linearmixing* uses a scalar Jacobian approximation.
- Method *excitingmixing* uses a tuned diagonal Jacobian
approximation.
.. warning::
The algorithms implemented for methods *diagbroyden*,
*linearmixing* and *excitingmixing* may be useful for specific
problems, but whether they will work may depend strongly on the
problem.
.. versionadded:: 0.11.0
References
----------
.. [1] More, Jorge J., Burton S. Garbow, and Kenneth E. Hillstrom.
1980. User Guide for MINPACK-1.
.. [2] C. T. Kelley. 1995. Iterative Methods for Linear and Nonlinear
Equations. Society for Industrial and Applied Mathematics.
<https://archive.siam.org/books/kelley/fr16/>
.. [3] W. La Cruz, J.M. Martinez, M. Raydan. Math. Comp. 75, 1429 (2006).
Examples
--------
The following functions define a system of nonlinear equations and its
jacobian.
>>> import numpy as np
>>> def fun(x):
... return [x[0] + 0.5 * (x[0] - x[1])**3 - 1.0,
... 0.5 * (x[1] - x[0])**3 + x[1]]
>>> def jac(x):
... return np.array([[1 + 1.5 * (x[0] - x[1])**2,
... -1.5 * (x[0] - x[1])**2],
... [-1.5 * (x[1] - x[0])**2,
... 1 + 1.5 * (x[1] - x[0])**2]])
A solution can be obtained as follows.
>>> from scipy import optimize
>>> sol = optimize.root(fun, [0, 0], jac=jac, method='hybr')
>>> sol.x
array([ 0.8411639, 0.1588361])
**Large problem**
Suppose that we needed to solve the following integrodifferential
equation on the square :math:`[0,1]\times[0,1]`:
.. math::
\nabla^2 P = 10 \left(\int_0^1\int_0^1\cosh(P)\,dx\,dy\right)^2
with :math:`P(x,1) = 1` and :math:`P=0` elsewhere on the boundary of
the square.
The solution can be found using the ``method='krylov'`` solver:
>>> from scipy import optimize
>>> # parameters
>>> nx, ny = 75, 75
>>> hx, hy = 1./(nx-1), 1./(ny-1)
>>> P_left, P_right = 0, 0
>>> P_top, P_bottom = 1, 0
>>> def residual(P):
... d2x = np.zeros_like(P)
... d2y = np.zeros_like(P)
...
... d2x[1:-1] = (P[2:] - 2*P[1:-1] + P[:-2]) / hx/hx
... d2x[0] = (P[1] - 2*P[0] + P_left)/hx/hx
... d2x[-1] = (P_right - 2*P[-1] + P[-2])/hx/hx
...
... d2y[:,1:-1] = (P[:,2:] - 2*P[:,1:-1] + P[:,:-2])/hy/hy
... d2y[:,0] = (P[:,1] - 2*P[:,0] + P_bottom)/hy/hy
... d2y[:,-1] = (P_top - 2*P[:,-1] + P[:,-2])/hy/hy
...
... return d2x + d2y - 10*np.cosh(P).mean()**2
>>> guess = np.zeros((nx, ny), float)
>>> sol = optimize.root(residual, guess, method='krylov')
>>> print('Residual: %g' % abs(residual(sol.x)).max())
Residual: 5.7972e-06 # may vary
>>> import matplotlib.pyplot as plt
>>> x, y = np.mgrid[0:1:(nx*1j), 0:1:(ny*1j)]
>>> plt.pcolormesh(x, y, sol.x, shading='gouraud')
>>> plt.colorbar()
>>> plt.show()
File: /opt/anaconda3/envs/numerical/lib/python3.13/site-packages/scipy/optimize/_root.py
Type: function
x0_1 = np.array([1.0, 1.0])
x0_2 = np.array([-1, -1])
x_soln_1 = scipy.optimize.root(fun=F, x0=x0_1)
print(x_soln_1.x)
x_soln_2 = scipy.optimize.root(fun=F, x0=x0_2)
print(x_soln_2.x)
[1.5958645 0.95206775]
[-0.81808672 2.15904336]
There is a small detail here. We did not supply a Jacobian to scipy.optimize.root. Why do you think that happened? The method needs derivatives/jacobians, but we did not supply one. Why it still worked?
But if we do supply a Jacobian, what happens?
x_soln_1_jac = scipy.optimize.root(fun=F, x0=x0_1, jac=J)
print(x_soln_1.x)
print(x_soln_1_jac.x)
[1.5958645 0.95206775]
[1.5958645 0.95206775]
The solutions are the same. But what about number of iterations? Let us print both (without and with) exact Jacobian:
print(x_soln_1) # Without providing the Jacobian function we created.
message: The solution converged.
success: True
status: 1
fun: [ 0.000e+00 0.000e+00]
x: [ 1.596e+00 9.521e-01]
method: hybr
nfev: 12
fjac: [[-9.876e-01 -1.572e-01]
[ 1.572e-01 -9.876e-01]]
r: [-6.360e+00 -1.992e+00 -1.708e+00]
qtf: [ 1.086e-10 -1.728e-11]
print(x_soln_1_jac) # Providing the Jacobian function we created.
message: The solution converged.
success: True
status: 1
fun: [ 0.000e+00 0.000e+00]
x: [ 1.596e+00 9.521e-01]
method: hybr
nfev: 10
njev: 1
fjac: [[-9.876e-01 -1.572e-01]
[ 1.572e-01 -9.876e-01]]
r: [-6.360e+00 -1.992e+00 -1.708e+00]
qtf: [ 1.086e-10 -1.728e-11]
⚠️ Important ⚠️
It took scipy.optimize.root fewer iterations (nfev argument printed in the output, 12 vs 10). Why is that?
⚠️ Important ⚠️
Let’s take a look into derivative approximations
Wrap-up & references#
Bisection is reliable in a bracket, but slow. We don’t always have a bracket though.
Newton is fast with good starting points, but can easily diverge or even find different solutions from what we anticipated (it is based on derivative information).
For systems of nonlinear equations, provide Jacobians;
scipy.optimize.rootis robust in practice.
Docs and References used
SciPy Optimize: https://docs.scipy.org/doc/scipy/reference/optimize.html
PyCSE (Kitchin): https://kitchingroup.cheme.cmu.edu/pycse/
Harishankar Manikantan, Solutions of Nonlinear Equations (Chapter 6), ECH60 course notes (GitHub: hmanikantan/ECH60): hmanikantan/ECH60