Lecture 2 — Nonlinear systems - Root finding

Lecture 2 — Nonlinear systems - Root finding#

Today we will cover nonlinear equations and systems of nonlinear equations (single and multivariable)

Learning Outcomes#

By the end of this lecture you will be able to:

Explain why many engineering problems described by a nonlinear equation (or a system) can be seeing to solving f(x)=0 or F(x)=0.
Understand convergence/stop criteria (function tolerance, step tolerance, iteration cap).
Distinguish bracketed (bisection) vs open (Newton) root-finding numerical methods.
Implement bisection and Newton methods from scratch, as well as use scipy.optimize equivalents.
Understand what can go wrong when employing these methods and how to circumvent such challenges.
Transition from single to multivariable, system of equations root-finding.
Be able to use scipy.optimize.root to solve systems of nonlinear equations.

1) Root finding: exact vs numerical; errors & tolerance#

Exact vs numerical: most ChemE models need numerical roots. There is no analytical (closed-form) solution to the expression (or set of expressions) that arise. We saw some examples last lecture. A simple example (not ChemE) is:

\(f(x) = ℯ^{-x}+ 0.1 \, \,x \,\, sin(x) = 0\)

This nonlinear equation has no simple algebraic solution for \(x\). Instead, numerical root-finding methods are used to determine the approximate solution. That is, within a tolerance margin.

We should reduce/simplify the system at hand to f(x)=0 (or F(x)=0 in multivariable case, uppercase here means a vector function).
A numerical method needs tolerances to iterate towards a solution. For example:

Function (\(|f(x_k)|<\varepsilon_f\)), step (\(|x_{k+1}-x_k|<\varepsilon_x\)), and maximum number of iterations are alternatives.

We stop when one (or more) conditions are met:

Function tolerance: \(|f(x_k)| < \varepsilon_f\) — we’re close to a zero/root.
Step/interval tolerance: \(|x_{k+1}-x_k| < \varepsilon_x\) or bisection width \((b-a) < \varepsilon_x\).
Iteration cap: max_iter — prevents infinite loops.

2) Interval Halving Methods: Bisection#

Idea. With a sign-change bracket \([a,b]\) for a continuous \(f\), a root exists inside this bracket. This is called the intermediate value theorem.

If we halve repeatedly and keep the sign-change subinterval, we will find a solution eventually!

Bisection Visual Exploration#

Let’s use an interactive animation to better understand how interval halving methods, such as Bisection, work.

If you are seeing this in the website version of the lecture, just open this notebook in your google colab session or locally in your machine, by downloading the *.ipynb file.

The example is a nonlinear function defined as \(y(x) = e^{-x} - 0.1sin(x)\)

How does this in code looks like? Let’s see one possible implementation

import numpy as np


def bisection(f, a, b, ftol=1e-10, xtol=1e-12, max_iter=200):

    """
    Bisection's Method.

    Inputs
    ------
    f: function
        The function we want to find the root.
    
    a: float
        Lower bound of the initial interval.
    b: float
        Upper bound of the initial interval.
    ftol: float
        Tolerance for absolute function value (default: 1e-10)
    xtol: float
        Tolerance for the step length (default: 1e-12)
    max_iter: int
        Max number of iterations (default: 200)

    Returns
    -------
    m: float
        Approximate root of f(x)
    iter: int
        Number of iterations.
    
    fm: float
        Value of f(x) at the solution.    
    """
    fa = f(a)
    fb = f(b)

    if fa * fb > 0:
        raise ValueError("f(a)*f(b)>0, hence we don't have a sign change. Exit.")


    for k in range(max_iter):
        m = 0.5 * (a + b)

        fm = f(m)

        if np.abs(fm) < ftol or np.abs((b-a)) < xtol:
            return m, k+1, fm

        if fa * fm < 0:
            b, fb = m, fm
        else:
            a, fa = m, fm
    
    m =  0.5 * (a + b)

    fm = f(m)


    return m, max_iter, fm

def f_bisect(x): 
    return np.exp(-x) - 0.1*x*np.sin(x)

a = 5.00
b = 8.00
solution = bisection(f_bisect, a, b)

print(solution)

(6.286147252423689, 32, np.float64(-2.1169849293706244e-11))

A few notes regarding the bisection method:

We will always find a solution, given that the intermediate value theorem is true when choosing the interval \([a,b]\)
Bisection can be a bit slow to converge. It has linear convergence.

This leaves an open question of “can we have faster methods?” Let’s see it below

3) Open Methods: Newton’s Method#

In the open methods, we don’t need an interval \([a,b]\) to try to find a solution. Instead, we need only an initial estimate \(x_0\). This might look an absolute improvement at a first glance, since these methods tend to converge faster in general, but there are a few caveats. First, let’s take a look at the iterative equation for Newton’s method:

Newton’s method formula: \(x_{k+1}=x_k-\dfrac{f(x_k)}{f'(x_k)}\)

How do we even got this formula in the first place?

Newton’s Method derivation via Taylor Series Expansion (TSE)#

Reminder: What is a Taylor Series?#

If a function \(f(x)\) is sufficiently smooth (that is, we can take derivatives), we are able to approximate it near a point, say, \(x=a\) by its derivatives at \(a\):

\(f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f^{(3)}(a)}{3!}(x-a)^3 + \cdots\)

The first term is the value of \(f\) at \(a\).
The second term adds the local linear (slope) correction.
Higher-order terms capture curvature and what we call “high-order” effects.
When \(x\) is close to \(a\), the higher-order terms are small.

Apply Taylor Expansion to Root Finding#

Remember our goal: solve \(f(x)=0\) ! (exclamation mark, not factorial :) )

Let \(x_k\) be the current guess. Expand \(f\) with a \(1^{st}\) order TSE expansion around \(x_k\): \(f(x) \approx f(x_k) + f'(x_k)(x - x_k)\)

We know \(x_k\) and hence, can calculate \(f(x_k)\) and \(f'(x_k)\)

Impose the Root Condition#

At the next iterate \(x_{k+1}\) we want \(f(x_{k+1})=0\). After all, we want it to converge, correct?

Let’s impose this condition then. \(f(x_{k+1})=0\):

\((x_{k+1})=0 \approx f(x_k) + f'(x_k)(x_{k+1} - x_k)\)

Solve for \(x_{k+1}\) to get the Newton update (also called newton step):

\(x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}\)

This is also written sometimes as

\(x_{k+1} = x_k + d_k\)

\(d_k = - \frac{f(x_k)}{f'(x_k)}\)

Why is Newton’s method fast?#

Let the root be \(r\) and define the error \(e_k = x_k - r\)

Start from the full Taylor series of \(f\) about \(r\) (with \(x_k = r + e_k\)):

\[ f(x_k) = f(r+e_k) = f(r) + f'(r)e_k + \frac{f''(r)}{2!}e_k^2 + \frac{f^{(3)}(r)}{3!}e_k^3 + \cdots \]

Since \(f(r)=0\), this simplifies to

\[ f(x_k) = f'(r)e_k + \frac{f''(r)}{2}e_k^2 + \frac{f^{(3)}(r)}{6}e_k^3 + \cdots \]

If \(x_k\) is close to \(r\) (so \(|e_k|\) is small), higher-order terms are negligible, giving

\[ f(x_k) \approx f'(r)e_k + \tfrac{1}{2} f''(r)e_k^2 \]

Newton’s update is

\[ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} \]

Hence the next error is

\[ e_{k+1} = x_{k+1} - r = e_k - \frac{f(x_k)}{f'(x_k)} \]

Substitute the Taylor approximation for \(f(x_k)\) and (near \(r\)) replace \(f'(x_k)\) by \(f'(r)\):

\[f'(x_k) = f'(r + e_k) = f'(r) + f''(r)e_k + \tfrac12 f^{(3)}(r)e_k^2 + \cdots \approx f'(r) \]

\[ e_{k+1} \approx e_k - \frac{f'(r)e_k + \tfrac{1}{2}f''(r)e_k^2}{f'(r)} = e_k - (e_k - \frac{f''(r)}{2f'(r)} e_k^2) = \frac{f''(r)}{2f'(r)} e_k^2 \]

Thus,

\[ e_{k+1} \propto e_k^2 \]

The new error is proportional to the square of the old error (quadratic convergence).

If \(e_k \sim 10^{-2}\), then \(e_{k+1} \sim 10^{-4}\).

By contrast, bisection has only linear convergence: \(e_{k+1} \approx \tfrac{1}{2}e_k\).

Newton’s Method Visual Exploration#

This works in your jupyter notebook sesssion. Not in the website. Let’s explore Newton’s method!

Again, you should not worry about the code that generated the anmiations below. It’s just the use of jupyter widgets (and a bit of prompting using ChatGPT to get the widget working well) so we could generate an interactive visualization of how Newton’s method works :)

An example in which things can go wrong:

It quickly diverges! If we change the initial step slightly, say, \(x_0=0.9\), it converges. Try it yourself in the interface above.

Implementation of Newton’s Method (scalar case)#

import numpy as np


def newton(f, fprime, x0, tol=1e-8, max_iter = 50):
    """
    Newton's method.


    Inputs
    ------

    f: function
        The function we want to determine the root.

    fprime: function
        The derivative of the function f
    x0: float
        Initial estimate (guess).
    tol: float
        Tolerance.
    max_iter:
        Maximum number of iterations...


    Returns
    -------

    x: float
        Approximate root of f(x)
    iter: int
        Number of iterations
    sol: float
        Value of f(x) at the solution.
    
    
    """

    x = x0

    if np.abs(f(x)) < tol:
        return x
    
    # Main for loop:

    for k in range(max_iter):

        x_new = x - f(x)/fprime(x)  # Newton's update formula

        if np.abs(f(x_new)) < tol: # Function value small enough. stop.
            return x_new, k+1, np.abs(f(x_new))
        if np.abs(x_new - x) < 1e-14: # Step size small enough. Stop.
            ## TODO: Maybe we should add 1e-14 as a keyword argument instead of hard coding it.
            return x_new, k+1, np.abs(f(x_new))
        
        x = x_new

    
    fx = f(x)
    x = x_new

    raise ValueError(f"Newton's method did not converge after {k+1} iterations;", f"x={x:.6f}, f(x)={fx:.6f}")

Let’s try a \(3^{rd}\) order degree polynomial, defined as \(f(x) = x^3 - 7x^2 + 14x - 8\) and with derivative defined as \(f'(x)=3x^2 - 14x + 14\)

def f(x):
    return x**3 - 7*x**2 + 14*x - 8

def fprime(x):
    return 3*x**2 - 14*x + 14

Solving using our function newton.

x, k, sol = newton(f, fprime, 0.7, tol=1e-8, max_iter=50)

print('optimal solution is x=', x)
print('number of iterations =', k)
print('f(x)=', sol)

optimal solution is x= 0.9999999967198463
number of iterations = 4
f(x)= 9.840460890586655e-09

4) Using SciPy: `bisect`, `newton`#

Scipy already contains implementations of bisection and Newton’s method. We will go over the use of these now. But their functionality is no different from what we just saw in class.

optimize.bisect(f, a, b): robust with a bracket.
optimize.newton(f, x0, fprime=...): Newton with derivative

Let’s try another example, and compare our implementation with scipy’s. Let’s take a look at \(f(x) = x-cos(x)\)

from scipy import optimize
import numpy as np

# Defining the function and its derivative (for Newton's method)
def f(x):      return x - np.cos(x)
def fprime(x): return 1 + np.sin(x)

# Let's solve all of these
print('SciPy bisection, x =', optimize.bisect(f, 0.0, 1.0, xtol=1e-12, rtol=1e-12, maxiter=200))
print('SciPy newton, x =', optimize.newton(f, x0=0.7, fprime=fprime, tol=1e-12, maxiter=50))
print('Our implementation of bisection, x =', bisection(f, 0.0, 1.0)[0])
print('Our implementation of Newton\'s method, x =', newton(f, fprime, x0=0.7)[0])

SciPy bisection, x = 0.7390851332156672
SciPy newton, x = 0.7390851332151607
Our implementation of bisection, x = 0.7390851331874728
Our implementation of Newton's method, x = 0.7390851332151608

Now, let’s try the example that was used in the demo above for Newton’s method, that is \(f(x) = x^3 + 7x^2 + 14x -8\).

The derivative for this is \(f'(x) = 3x^2 - 14x + 14\).

This equation has three roots as we saw in the animation above (1, 2 and 4)

def f(x):
    return x**3 - 7*x**2 + 14*x - 8

def fprime(x):
    return 3*x**2 - 14*x + 14


print('SciPy bisection, x = :', optimize.bisect(f, 0.0, 1.46, xtol=1e-12, rtol=1e-12, maxiter=200))
print('SciPy newton, x = :', optimize.newton(f, x0=1.45, fprime=fprime, tol=1e-12, maxiter=50))
print('Our implementation of bisection, x =', bisection(f, 0.0, 1.5)[0])
print('Our implementation of Newton\'s method, x =', newton(f, fprime, x0=1.45)[0])

SciPy bisection, x = : 1.0000000000000548
SciPy newton, x = : 0.9999999999999999
Our implementation of bisection, x = 1.0000000000291038
Our implementation of Newton's method, x = 0.9999999999986046

print('SciPy bisection, x = :', optimize.bisect(f, 1.5, 2.5, xtol=1e-12, rtol=1e-12, maxiter=200))
print('SciPy newton, x = :', optimize.newton(f, x0=1.5, fprime=fprime, tol=1e-12, maxiter=50))
print('Our implementation of bisection, x =', bisection(f, 1.5, 2.5)[0])
print('Our implementation of Newton\'s method, x =', newton(f, fprime, x0=1.5)[0])

SciPy bisection, x = : 2.0
SciPy newton, x = : 4.0
Our implementation of bisection, x = 2.0
Our implementation of Newton's method, x = 4.0

from scipy import optimize
import numpy as np

def f(x):
    return np.tanh(x)

def fprime(x):
    t = np.tanh(x)
    return 1.0 - t*t  # sech^2(x)

print('SciPy newton, x = :', optimize.newton(f, x0=1.08, fprime=fprime, tol=1e-12, maxiter=50))
print('Our implementation of Newton\'s method, x =', newton(f, fprime, x0=1.01)[0])

SciPy newton, x = : 0.0
Our implementation of Newton's method, x = -5.025528739899601e-12

Extension to systems: Newton’s method in higher dimensions#

It is not uncommon that instead of needing to solve a single nonlinear function, we have a system of nonlinear equations. These arise all the time in chemical reaction engineering, process design and control, process dynamics, and so on. These all share in common the particular feature that now, the unknown is a vector \(x_k \in \mathbb{R}^n\).
We put \(n\) nonlinear equations into what we call a vector function \(F:\mathbb{R}^n \to \mathbb{R}^n\):

\[\begin{split} F(x) = \begin{bmatrix} f_1(x_1,\dots,x_n) \\ \vdots \\ f_n(x_1,\dots,x_n) \end{bmatrix}, \qquad \end{split}\]

And we want \(F(x)=0\). Nothing has changed in our goal (root finding), we now just have more equations to deal with simultaneously. This is actually a system of nonlinear equations, and hence, we have to use the appropriate mathematical concepts to deal with this increased dimensionality.

Jacobian#

The Jacobian \(J(x)\) is the matrix of all first derivatives:

\[ J(x) = \left[\frac{\partial f_i}{\partial x_j}(x)\right]_{i,j=1}^n \]

Expanding the notation above you can see the full matrix as:

\[\begin{split} J(x) = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1}(x) & \dfrac{\partial f_1}{\partial x_2}(x) & \cdots & \dfrac{\partial f_1}{\partial x_n}(x) \\ \dfrac{\partial f_2}{\partial x_1}(x) & \dfrac{\partial f_2}{\partial x_2}(x) & \cdots & \dfrac{\partial f_2}{\partial x_n}(x) \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial f_n}{\partial x_1}(x) & \dfrac{\partial f_n}{\partial x_2}(x) & \cdots & \dfrac{\partial f_n}{\partial x_n}(x) \end{bmatrix} \end{split}\]

And here we show that the Newton’s method is conceptually the same for the multivariate case. If we expand \(F\) employing Taylor around \(x_k\):

\[ F(x) \approx F(x_k) + J(x_k)(x - x_k) \]

And impose the root condition at \(x_{k+1}\), that is, we want \(F(x_{k+1}) = 0\), we have:

\[ 0 \approx F(x_k) + J(x_k)(x_{k+1} - x_k) \]

Lastly, if we solve for \(x_{k+1}\):

\[ J(x_k)(x_{k+1} - x_k) = -F(x_k), \]

\[ x_{k+1} = x_k + d_k, \quad d_k \text{ solves } J(x_k)d_k = -F(x_k) \]

Equivalently, we can write this equation as

\[ x_{k+1} = x_k - J(x_k)^{-1} F(x_k) \]

But we avoid this in practice since evaluating an inverse is expensive from a computational standpoint. Instead, we solve the linear system posed by \(J(x_k)d_k = -F(x_k)\)

Side-by-side: scalar vs. vector versions of Newton’s method#

To further stress the fact that these are the same algorithm, just adapted for higher dimensions, please take a look at the table below for a quick comparison:

Concept	Scalar (1D)	Vector (nD)
Unknown	\(x_k \in \mathbb{R}\)	\(x_k \in \mathbb{R}^n\)
Function	\(f:\mathbb{R}\to\mathbb{R}\)	\(F:\mathbb{R}^n\to\mathbb{R}^n\)
Expansion	\(f(x)\approx f(x_k)+f'(x_k)(x-x_k)\)	\(F(x)\approx F(x_k)+J(x_k)(x-x_k)\)
Root condition	\(0\approx f(x_k)+f'(x_k)(x_{k+1}-x_k)\)	\(0\approx F(x_k)+J(x_k)(x_{k+1}-x_k)\)
Newton step	\(d_k=-\tfrac{f(x_k)}{f'(x_k)}\)	\(J(x_k)d_k=-F(x_k)\)
Update	\(x_{k+1}=x_k+d_k\)	\(x_{k+1}=x_k+d_k\)

Example: \(2\times 2\) generic system#

Suppose

\[\begin{split} F(x) = \begin{bmatrix} f_1(x_1,x_2) \\ f_2(x_1,x_2) \end{bmatrix}, \quad J(x) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2}\\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} \end{bmatrix} \end{split}\]

The Newton step solves

\[\begin{split} \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2}\\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} \end{bmatrix}_{x_k} \begin{bmatrix} d_{k,1}\\ d_{k,2} \end{bmatrix} = - \begin{bmatrix} f_1(x_k)\\ f_2(x_k) \end{bmatrix} \end{split}\]

Then \(x_{k+1} = x_k + d_k\).

Can we code this in Python? (We continue from here on Sep 3rd. Your first homework does not require the code development below.)#

Yup! It’s pretty much the same idea:

import numpy as np


def newton_nd(F, J, x0, tol=1e-8, max_iter = 50):
    """
    Newton's method (multivariate).


    Inputs
    ------

    f: function
        The vector-valued function we want to determine the root.

    J: function
        The Jacobian of the function F
    x0: float
        Initial estimate (guess).
    tol: float
        Tolerance.
    max_iter:
        Maximum number of iterations...


    Returns
    -------

    x: float
        Approximate root of f(x)
    iter: int
        Number of iterations
    sol: float
        Value of f(x) at the solution.
    
    
    """

    x = x0

    if np.linalg.norm(F(x), ord =2)  < tol:
        return x, 0, F(x) 
    # Main for loop:

    for k in range(max_iter):

       

        x_new = x - np.linalg.inv(J(x)) @ F(x)

        if np.linalg.norm(F(x_new)) < tol: # Function value small enough. stop.
            return x_new, k+1, F(x_new)
        if np.linalg.norm(x_new - x) < 1e-14: # Step size small enough. Stop.
            ## TODO: Maybe we should add 1e-14 as a keyword argument instead of hard coding it.
            return x_new, k+1, F(x_new)
        
        x = x_new

    
    Fx = F(x)
    x = x_new

    raise ValueError(f"Newton's method did not converge after {k+1} iterations;", f"x={x:.6f}, f(x)={F:.6f}")

Let’s try this for a system defined as

\[\begin{split} \begin{align*} 2 x_1^2 + x_2^2 = 6 \\ x_1 + 2 x_2 = 3.5 \end{align*} \end{split}\]

# Define F and J for the system
def F(x):
    x1, x2 = x
    return np.array([
        2.0*x1**2 + x2**2 - 6.0,
        x1 + 2.0*x2 - 3.5
    ])

def J(x):
    x1, x2 = x
    return np.array([
        [4.0*x1, 2.0*x2],
        [1.0,    2.0   ]
    ])

# Two different initial guesses
x0_1 = np.array([1.0, 1.0])      
x0_2 = np.array([-1.0, 2.2])     

root1, it1, res1 = newton_nd(F, J, x0_1, tol=1e-10, max_iter=50)
root2, it2, res2 = newton_nd(F, J, x0_2, tol=1e-10, max_iter=50)

print("From x0 =", x0_1, "root =", root1, "in", it1, "iters; residual =", res1)
print("From x0 =", x0_2, "root =", root2, "in", it2, "iters; residual =", res2)

From x0 = [1. 1.] root = [1.5958645  0.95206775] in 5 iters; residual = [8.8817842e-16 0.0000000e+00]
From x0 = [-1.   2.2] root = [-0.81808672  2.15904336] in 4 iters; residual = [8.8817842e-16 0.0000000e+00]

What about using scipy?#

You can use scipy.optimize.root to solve the exact same problem (or any root finding problem). Let’s take a look.

import scipy

help(scipy.optimize.root)

Help on function root in module scipy.optimize._root:

root(
    fun,
    x0,
    args=(),
    method='hybr',
    jac=None,
    tol=None,
    callback=None,
    options=None
)
    Find a root of a vector function.

    Parameters
    ----------
    fun : callable
        A vector function to find a root of.

        Suppose the callable has signature ``f0(x, *my_args, **my_kwargs)``, where
        ``my_args`` and ``my_kwargs`` are required positional and keyword arguments.
        Rather than passing ``f0`` as the callable, wrap it to accept
        only ``x``; e.g., pass ``fun=lambda x: f0(x, *my_args, **my_kwargs)`` as the
        callable, where ``my_args`` (tuple) and ``my_kwargs`` (dict) have been
        gathered before invoking this function.
    x0 : ndarray
        Initial guess.
    args : tuple, optional
        Extra arguments passed to the objective function and its Jacobian.
    method : str, optional
        Type of solver. Should be one of

        - 'hybr'             :ref:`(see here) <optimize.root-hybr>`
        - 'lm'               :ref:`(see here) <optimize.root-lm>`
        - 'broyden1'         :ref:`(see here) <optimize.root-broyden1>`
        - 'broyden2'         :ref:`(see here) <optimize.root-broyden2>`
        - 'anderson'         :ref:`(see here) <optimize.root-anderson>`
        - 'linearmixing'     :ref:`(see here) <optimize.root-linearmixing>`
        - 'diagbroyden'      :ref:`(see here) <optimize.root-diagbroyden>`
        - 'excitingmixing'   :ref:`(see here) <optimize.root-excitingmixing>`
        - 'krylov'           :ref:`(see here) <optimize.root-krylov>`
        - 'df-sane'          :ref:`(see here) <optimize.root-dfsane>`

    jac : bool or callable, optional
        If `jac` is a Boolean and is True, `fun` is assumed to return the
        value of Jacobian along with the objective function. If False, the
        Jacobian will be estimated numerically.
        `jac` can also be a callable returning the Jacobian of `fun`. In
        this case, it must accept the same arguments as `fun`.
    tol : float, optional
        Tolerance for termination. For detailed control, use solver-specific
        options.
    callback : function, optional
        Optional callback function. It is called on every iteration as
        ``callback(x, f)`` where `x` is the current solution and `f`
        the corresponding residual. For all methods but 'hybr' and 'lm'.
    options : dict, optional
        A dictionary of solver options. E.g., `xtol` or `maxiter`, see
        :obj:`show_options()` for details.

    Returns
    -------
    sol : OptimizeResult
        The solution represented as a ``OptimizeResult`` object.
        Important attributes are: ``x`` the solution array, ``success`` a
        Boolean flag indicating if the algorithm exited successfully and
        ``message`` which describes the cause of the termination. See
        `OptimizeResult` for a description of other attributes.

    See also
    --------
    show_options : Additional options accepted by the solvers

    Notes
    -----
    This section describes the available solvers that can be selected by the
    'method' parameter. The default method is *hybr*.

    Method *hybr* uses a modification of the Powell hybrid method as
    implemented in MINPACK [1]_.

    Method *lm* solves the system of nonlinear equations in a least squares
    sense using a modification of the Levenberg-Marquardt algorithm as
    implemented in MINPACK [1]_.

    Method *df-sane* is a derivative-free spectral method. [3]_

    Methods *broyden1*, *broyden2*, *anderson*, *linearmixing*,
    *diagbroyden*, *excitingmixing*, *krylov* are inexact Newton methods,
    with backtracking or full line searches [2]_. Each method corresponds
    to a particular Jacobian approximations.

    - Method *broyden1* uses Broyden's first Jacobian approximation, it is
      known as Broyden's good method.
    - Method *broyden2* uses Broyden's second Jacobian approximation, it
      is known as Broyden's bad method.
    - Method *anderson* uses (extended) Anderson mixing.
    - Method *Krylov* uses Krylov approximation for inverse Jacobian. It
      is suitable for large-scale problem.
    - Method *diagbroyden* uses diagonal Broyden Jacobian approximation.
    - Method *linearmixing* uses a scalar Jacobian approximation.
    - Method *excitingmixing* uses a tuned diagonal Jacobian
      approximation.

    .. warning::

        The algorithms implemented for methods *diagbroyden*,
        *linearmixing* and *excitingmixing* may be useful for specific
        problems, but whether they will work may depend strongly on the
        problem.

    .. versionadded:: 0.11.0

    References
    ----------
    .. [1] More, Jorge J., Burton S. Garbow, and Kenneth E. Hillstrom.
       1980. User Guide for MINPACK-1.
    .. [2] C. T. Kelley. 1995. Iterative Methods for Linear and Nonlinear
       Equations. Society for Industrial and Applied Mathematics.
       <https://archive.siam.org/books/kelley/fr16/>
    .. [3] W. La Cruz, J.M. Martinez, M. Raydan. Math. Comp. 75, 1429 (2006).

    Examples
    --------
    The following functions define a system of nonlinear equations and its
    jacobian.

    >>> import numpy as np
    >>> def fun(x):
    ...     return [x[0]  + 0.5 * (x[0] - x[1])**3 - 1.0,
    ...             0.5 * (x[1] - x[0])**3 + x[1]]

    >>> def jac(x):
    ...     return np.array([[1 + 1.5 * (x[0] - x[1])**2,
    ...                       -1.5 * (x[0] - x[1])**2],
    ...                      [-1.5 * (x[1] - x[0])**2,
    ...                       1 + 1.5 * (x[1] - x[0])**2]])

    A solution can be obtained as follows.

    >>> from scipy import optimize
    >>> sol = optimize.root(fun, [0, 0], jac=jac, method='hybr')
    >>> sol.x
    array([ 0.8411639,  0.1588361])

    **Large problem**

    Suppose that we needed to solve the following integrodifferential
    equation on the square :math:`[0,1]\times[0,1]`:

    .. math::

       \nabla^2 P = 10 \left(\int_0^1\int_0^1\cosh(P)\,dx\,dy\right)^2

    with :math:`P(x,1) = 1` and :math:`P=0` elsewhere on the boundary of
    the square.

    The solution can be found using the ``method='krylov'`` solver:

    >>> from scipy import optimize
    >>> # parameters
    >>> nx, ny = 75, 75
    >>> hx, hy = 1./(nx-1), 1./(ny-1)

    >>> P_left, P_right = 0, 0
    >>> P_top, P_bottom = 1, 0

    >>> def residual(P):
    ...    d2x = np.zeros_like(P)
    ...    d2y = np.zeros_like(P)
    ...
    ...    d2x[1:-1] = (P[2:]   - 2*P[1:-1] + P[:-2]) / hx/hx
    ...    d2x[0]    = (P[1]    - 2*P[0]    + P_left)/hx/hx
    ...    d2x[-1]   = (P_right - 2*P[-1]   + P[-2])/hx/hx
    ...
    ...    d2y[:,1:-1] = (P[:,2:] - 2*P[:,1:-1] + P[:,:-2])/hy/hy
    ...    d2y[:,0]    = (P[:,1]  - 2*P[:,0]    + P_bottom)/hy/hy
    ...    d2y[:,-1]   = (P_top   - 2*P[:,-1]   + P[:,-2])/hy/hy
    ...
    ...    return d2x + d2y - 10*np.cosh(P).mean()**2

    >>> guess = np.zeros((nx, ny), float)
    >>> sol = optimize.root(residual, guess, method='krylov')
    >>> print('Residual: %g' % abs(residual(sol.x)).max())
    Residual: 5.7972e-06  # may vary

    >>> import matplotlib.pyplot as plt
    >>> x, y = np.mgrid[0:1:(nx*1j), 0:1:(ny*1j)]
    >>> plt.pcolormesh(x, y, sol.x, shading='gouraud')
    >>> plt.colorbar()
    >>> plt.show()

You can also do ?scipy.optimize.root. Same idea.

?scipy.optimize.root

Signature:
scipy.optimize.root(
    fun,
    x0,
    args=(),
    method='hybr',
    jac=None,
    tol=None,
    callback=None,
    options=None,
)
Docstring:
Find a root of a vector function.

Parameters
----------
fun : callable
    A vector function to find a root of.

    Suppose the callable has signature ``f0(x, *my_args, **my_kwargs)``, where
    ``my_args`` and ``my_kwargs`` are required positional and keyword arguments.
    Rather than passing ``f0`` as the callable, wrap it to accept
    only ``x``; e.g., pass ``fun=lambda x: f0(x, *my_args, **my_kwargs)`` as the
    callable, where ``my_args`` (tuple) and ``my_kwargs`` (dict) have been
    gathered before invoking this function.
x0 : ndarray
    Initial guess.
args : tuple, optional
    Extra arguments passed to the objective function and its Jacobian.
method : str, optional
    Type of solver. Should be one of

    - 'hybr'             :ref:`(see here) <optimize.root-hybr>`
    - 'lm'               :ref:`(see here) <optimize.root-lm>`
    - 'broyden1'         :ref:`(see here) <optimize.root-broyden1>`
    - 'broyden2'         :ref:`(see here) <optimize.root-broyden2>`
    - 'anderson'         :ref:`(see here) <optimize.root-anderson>`
    - 'linearmixing'     :ref:`(see here) <optimize.root-linearmixing>`
    - 'diagbroyden'      :ref:`(see here) <optimize.root-diagbroyden>`
    - 'excitingmixing'   :ref:`(see here) <optimize.root-excitingmixing>`
    - 'krylov'           :ref:`(see here) <optimize.root-krylov>`
    - 'df-sane'          :ref:`(see here) <optimize.root-dfsane>`

jac : bool or callable, optional
    If `jac` is a Boolean and is True, `fun` is assumed to return the
    value of Jacobian along with the objective function. If False, the
    Jacobian will be estimated numerically.
    `jac` can also be a callable returning the Jacobian of `fun`. In
    this case, it must accept the same arguments as `fun`.
tol : float, optional
    Tolerance for termination. For detailed control, use solver-specific
    options.
callback : function, optional
    Optional callback function. It is called on every iteration as
    ``callback(x, f)`` where `x` is the current solution and `f`
    the corresponding residual. For all methods but 'hybr' and 'lm'.
options : dict, optional
    A dictionary of solver options. E.g., `xtol` or `maxiter`, see
    :obj:`show_options()` for details.

Returns
-------
sol : OptimizeResult
    The solution represented as a ``OptimizeResult`` object.
    Important attributes are: ``x`` the solution array, ``success`` a
    Boolean flag indicating if the algorithm exited successfully and
    ``message`` which describes the cause of the termination. See
    `OptimizeResult` for a description of other attributes.

See also
--------
show_options : Additional options accepted by the solvers

Notes
-----
This section describes the available solvers that can be selected by the
'method' parameter. The default method is *hybr*.

Method *hybr* uses a modification of the Powell hybrid method as
implemented in MINPACK [1]_.

Method *lm* solves the system of nonlinear equations in a least squares
sense using a modification of the Levenberg-Marquardt algorithm as
implemented in MINPACK [1]_.

Method *df-sane* is a derivative-free spectral method. [3]_

Methods *broyden1*, *broyden2*, *anderson*, *linearmixing*,
*diagbroyden*, *excitingmixing*, *krylov* are inexact Newton methods,
with backtracking or full line searches [2]_. Each method corresponds
to a particular Jacobian approximations.

- Method *broyden1* uses Broyden's first Jacobian approximation, it is
  known as Broyden's good method.
- Method *broyden2* uses Broyden's second Jacobian approximation, it
  is known as Broyden's bad method.
- Method *anderson* uses (extended) Anderson mixing.
- Method *Krylov* uses Krylov approximation for inverse Jacobian. It
  is suitable for large-scale problem.
- Method *diagbroyden* uses diagonal Broyden Jacobian approximation.
- Method *linearmixing* uses a scalar Jacobian approximation.
- Method *excitingmixing* uses a tuned diagonal Jacobian
  approximation.

.. warning::

    The algorithms implemented for methods *diagbroyden*,
    *linearmixing* and *excitingmixing* may be useful for specific
    problems, but whether they will work may depend strongly on the
    problem.

.. versionadded:: 0.11.0

References
----------
.. [1] More, Jorge J., Burton S. Garbow, and Kenneth E. Hillstrom.
   1980. User Guide for MINPACK-1.
.. [2] C. T. Kelley. 1995. Iterative Methods for Linear and Nonlinear
   Equations. Society for Industrial and Applied Mathematics.
   <https://archive.siam.org/books/kelley/fr16/>
.. [3] W. La Cruz, J.M. Martinez, M. Raydan. Math. Comp. 75, 1429 (2006).

Examples
--------
The following functions define a system of nonlinear equations and its
jacobian.

>>> import numpy as np
>>> def fun(x):
...     return [x[0]  + 0.5 * (x[0] - x[1])**3 - 1.0,
...             0.5 * (x[1] - x[0])**3 + x[1]]

>>> def jac(x):
...     return np.array([[1 + 1.5 * (x[0] - x[1])**2,
...                       -1.5 * (x[0] - x[1])**2],
...                      [-1.5 * (x[1] - x[0])**2,
...                       1 + 1.5 * (x[1] - x[0])**2]])

A solution can be obtained as follows.

>>> from scipy import optimize
>>> sol = optimize.root(fun, [0, 0], jac=jac, method='hybr')
>>> sol.x
array([ 0.8411639,  0.1588361])

**Large problem**

Suppose that we needed to solve the following integrodifferential
equation on the square :math:`[0,1]\times[0,1]`:

.. math::

   \nabla^2 P = 10 \left(\int_0^1\int_0^1\cosh(P)\,dx\,dy\right)^2

with :math:`P(x,1) = 1` and :math:`P=0` elsewhere on the boundary of
the square.

The solution can be found using the ``method='krylov'`` solver:

>>> from scipy import optimize
>>> # parameters
>>> nx, ny = 75, 75
>>> hx, hy = 1./(nx-1), 1./(ny-1)

>>> P_left, P_right = 0, 0
>>> P_top, P_bottom = 1, 0

>>> def residual(P):
...    d2x = np.zeros_like(P)
...    d2y = np.zeros_like(P)
...
...    d2x[1:-1] = (P[2:]   - 2*P[1:-1] + P[:-2]) / hx/hx
...    d2x[0]    = (P[1]    - 2*P[0]    + P_left)/hx/hx
...    d2x[-1]   = (P_right - 2*P[-1]   + P[-2])/hx/hx
...
...    d2y[:,1:-1] = (P[:,2:] - 2*P[:,1:-1] + P[:,:-2])/hy/hy
...    d2y[:,0]    = (P[:,1]  - 2*P[:,0]    + P_bottom)/hy/hy
...    d2y[:,-1]   = (P_top   - 2*P[:,-1]   + P[:,-2])/hy/hy
...
...    return d2x + d2y - 10*np.cosh(P).mean()**2

>>> guess = np.zeros((nx, ny), float)
>>> sol = optimize.root(residual, guess, method='krylov')
>>> print('Residual: %g' % abs(residual(sol.x)).max())
Residual: 5.7972e-06  # may vary

>>> import matplotlib.pyplot as plt
>>> x, y = np.mgrid[0:1:(nx*1j), 0:1:(ny*1j)]
>>> plt.pcolormesh(x, y, sol.x, shading='gouraud')
>>> plt.colorbar()
>>> plt.show()
File:      /opt/anaconda3/envs/numerical/lib/python3.13/site-packages/scipy/optimize/_root.py
Type:      function

x0_1 = np.array([1.0, 1.0])      
x0_2 = np.array([-1, -1])     



x_soln_1 = scipy.optimize.root(fun=F, x0=x0_1)
print(x_soln_1.x) 

x_soln_2 = scipy.optimize.root(fun=F, x0=x0_2)
print(x_soln_2.x) 

[1.5958645  0.95206775]
[-0.81808672  2.15904336]

There is a small detail here. We did not supply a Jacobian to scipy.optimize.root. Why do you think that happened? The method needs derivatives/jacobians, but we did not supply one. Why it still worked?

But if we do supply a Jacobian, what happens?

x_soln_1_jac = scipy.optimize.root(fun=F, x0=x0_1, jac=J)

print(x_soln_1.x) 
print(x_soln_1_jac.x) 

[1.5958645  0.95206775]
[1.5958645  0.95206775]

The solutions are the same. But what about number of iterations? Let us print both (without and with) exact Jacobian:

print(x_soln_1) # Without providing the Jacobian function we created.

 message: The solution converged.
 success: True
  status: 1
     fun: [ 0.000e+00  0.000e+00]
       x: [ 1.596e+00  9.521e-01]
  method: hybr
    nfev: 12
    fjac: [[-9.876e-01 -1.572e-01]
           [ 1.572e-01 -9.876e-01]]
       r: [-6.360e+00 -1.992e+00 -1.708e+00]
     qtf: [ 1.086e-10 -1.728e-11]

print(x_soln_1_jac) # Providing the Jacobian function we created.

 message: The solution converged.
 success: True
  status: 1
     fun: [ 0.000e+00  0.000e+00]
       x: [ 1.596e+00  9.521e-01]
  method: hybr
    nfev: 10
    njev: 1
    fjac: [[-9.876e-01 -1.572e-01]
           [ 1.572e-01 -9.876e-01]]
       r: [-6.360e+00 -1.992e+00 -1.708e+00]
     qtf: [ 1.086e-10 -1.728e-11]

⚠️ Important ⚠️

It took scipy.optimize.root fewer iterations (nfev argument printed in the output, 12 vs 10). Why is that?