Definitions
-----------

Matrices
========

Starting off, we have the general Tikhonov function

.. math::

    J = \frac{1}{2} (A x - y)^T W (A x - y) + \frac{1}{2} \vert \Lambda x \vert^2

where :math:`\Lambda` is the regularization matrix, and :math:`W` is the weight matrix.
We want to solve for :math:`x` such that :math:`J` is minimal:

.. math::

    \frac{\partial J}{\partial x^T} = A^T W (A x - y) + (\Lambda^T \Lambda) x = 0

which yields

.. math::

    \Lambda^T \Lambda x &= - A^T W A x + A^T W y \\
    (\Lambda^T \Lambda + A^T W A) x &= A^T W y \\
    x &= (\Lambda^T \Lambda + A^T W A)^{-1} A^T W y

Alternative derivation: defining the residue :math:`r = A x - y`, we instead find

.. math::

    \frac{\partial J}{\partial x^T} = A^T W r + (\Lambda^T \Lambda) x = 0

.. math::

    x = - (\Lambda^T \Lambda)^{-1} A^T W r

We can substitute this solution for :math:`x` back into :math:`r`:

.. math::

    r = A x - y = - A (\Lambda^T \Lambda)^{-1} A^T W r - y

and collect the terms in :math:`r`:

.. math::

    (I + A (\Lambda^T \Lambda)^{-1} A^T W) r = - y

The simplest choice is :math:`\Lambda = \lambda I`, in which case this equation simplifies to

.. math::

    \left( I + \frac{1}{\lambda^2} A A^T W \right) r = - y

We introduce the following notation:

.. math::

    T &= \Lambda^T \Lambda \\
    H_y &= A A^T \\
    H_x &= A^T A \\
    R_y &= I + A (\Lambda^T \Lambda)^{-1} A^T W

This notation is chosen because :math:`H_x` is the Hessian with respect to :math:`x`, and
:math:`H_y = A A^T` can be thought of as the Hessian with respect to :math:`y`.
:math:`R_y` is so named because it is the regularized version of :math:`A`; with it, the
residue equation above reads :math:`R_y r = - y`.

Multiple Datasets
=================

In the previous section :math:`y` was assumed to be a vector (technically, an
:math:`(N_y, 1)` matrix). However, it is perfectly possible to regularize multiple data
sets at once by turning :math:`y` into an :math:`(N_y, N_{sets})` matrix, where
:math:`N_{sets}` is the number of data sets. The function :math:`J` then becomes

.. math::

    J_k &= \frac{1}{2} (A x_k - y_k)^T W (A x_k - y_k) + \frac{1}{2} \vert \Lambda x_k \vert^2 \\
    J &= \sum_{k=1}^{N_{sets}} J_k

where :math:`x_k` and :math:`y_k` are the :math:`k`-th columns of :math:`x` and :math:`y`.
Since the data sets do not couple, the solution of the previous section applies column by
column, with the same matrix :math:`(\Lambda^T \Lambda + A^T W A)^{-1} A^T W` shared by all
data sets.

Functionals
===========

Things get truly interesting, and surprisingly simple, when we work with functionals
instead. We start from

.. math::

    J = \frac{1}{2} \sum_{i=1}^{N} \left( \int_{-\infty}^{\infty} A_i(t) x(t) dt - y_i \right)^2 + \frac{1}{2} \int_{-\infty}^{\infty} (\Lambda(t) x(t))^2 dt

where :math:`A_i(t)` is the kernel of the integral, for example :math:`e^{- s_i t}` for a
Laplace transform. As always, there is some ambiguity/freedom in the shape of
:math:`\Lambda`. Here it is written as a scalar function, but it could also be chosen as a
constant, or as a function with index :math:`i`.

Repeating the same steps as above, with :math:`r_i = \int_{-\infty}^{\infty} A_i(t) x(t) dt - y_i`,
we set the functional derivative to zero

.. math::

    \frac{\delta J}{\delta x(t)} = \sum_{i=1}^{N} A_i(t) r_i + \Lambda(t)^2 x(t) = 0

and find that

.. math::

    x(t) = - \frac{1}{\Lambda(t)^2} \sum_{i=1}^{N} A_i(t) r_i

Substituting this back into :math:`r_i` leads to

.. math::

    r_i &= \int_{-\infty}^{\infty} A_i(t) x(t) dt - y_i \\
    r_i &= - \sum_{j=1}^{N} r_j \int_{-\infty}^{\infty} \frac{A_j(t) A_i(t)}{\Lambda(t)^2} dt - y_i \\
    r_i &= - \sum_{j=1}^{N} M_{ij} r_j - y_i

where :math:`M_{ij} = \int_{-\infty}^{\infty} \frac{A_j(t) A_i(t)}{\Lambda(t)^2} dt`. In
matrix form this is again :math:`(I + M) r = - y`, the same structure as
:math:`R_y r = - y` above.
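
As a sanity check on the matrix formulas, the following is a minimal NumPy sketch (the
synthetic Laplace-transform test problem, the grid, and all variable names are
illustrative choices made here, not part of any particular code base). It solves the
direct equation :math:`x = (\Lambda^T \Lambda + A^T W A)^{-1} A^T W y` and verifies that
the residue route :math:`R_y r = - y`, :math:`x = - (\Lambda^T \Lambda)^{-1} A^T W r`
gives the same answer.

.. code-block:: python

    # Numerical check of the matrix identities above (illustrative sketch only;
    # the synthetic problem and all names are arbitrary choices).
    import numpy as np

    rng = np.random.default_rng(0)

    # Small discretized Laplace-transform kernel A_ij = exp(-s_i t_j) * dt,
    # matching the example kernel from the Functionals section.
    N_y, N_x = 20, 50
    s = np.linspace(0.1, 5.0, N_y)           # data coordinates s_i
    t = np.linspace(0.0, 10.0, N_x)          # model coordinates t_j
    dt = t[1] - t[0]
    A = np.exp(-np.outer(s, t)) * dt

    x_true = np.exp(-((t - 4.0) ** 2))       # smooth ground truth
    y = A @ x_true + 1e-3 * rng.standard_normal(N_y)

    W = np.diag(1.0 + s)                     # some symmetric positive weight matrix
    lam = 0.1
    Lam = lam * np.eye(N_x)                  # simplest choice: Lambda = lambda * I
    T = Lam.T @ Lam                          # T = Lambda^T Lambda

    # Direct solution: x = (Lambda^T Lambda + A^T W A)^{-1} A^T W y
    x_direct = np.linalg.solve(T + A.T @ W @ A, A.T @ W @ y)

    # Residue route: R_y r = -y with R_y = I + A (Lambda^T Lambda)^{-1} A^T W,
    # followed by x = -(Lambda^T Lambda)^{-1} A^T W r.
    R_y = np.eye(N_y) + A @ np.linalg.solve(T, A.T) @ W
    r = np.linalg.solve(R_y, -y)
    x_residue = -np.linalg.solve(T, A.T @ W @ r)

    print(np.allclose(x_direct, x_residue))   # True: both routes agree
    print(np.allclose(A @ x_direct - y, r))   # True: r is indeed the residue A x - y

The same two ``np.linalg.solve`` calls also cover the multiple-dataset case: if ``y`` is
an :math:`(N_y, N_{sets})` array, they return one column of ``x`` and ``r`` per data set.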
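
The functional result can be checked in the same way by discretizing the integrals on a
finite grid. The sketch below (again purely illustrative; the grid, the Laplace kernels,
and the constant :math:`\Lambda(t)` are arbitrary choices) builds :math:`M_{ij}` by
quadrature, solves :math:`(I + M) r = - y`, and reconstructs :math:`x(t)` from the
residues.

.. code-block:: python

    # Discretized check of the functional equations (illustrative sketch only).
    import numpy as np

    # Grid for t; the infinite integrals are truncated to [0, 10] and
    # approximated with a simple Riemann sum.
    t = np.linspace(0.0, 10.0, 400)
    dt = t[1] - t[0]

    s = np.linspace(0.2, 3.0, 15)             # Laplace variables s_i
    A_t = np.exp(-np.outer(s, t))             # kernels A_i(t) = exp(-s_i t), shape (N, N_t)

    x_true = np.sin(t) * np.exp(-0.3 * t)     # some ground-truth x(t)
    y = A_t @ x_true * dt                     # y_i = integral of A_i(t) x(t) dt

    lam_t = np.full_like(t, 0.05)             # Lambda(t), here a constant function

    # M_ij = integral of A_j(t) A_i(t) / Lambda(t)^2 dt
    M = (A_t / lam_t**2) @ A_t.T * dt

    # Solve (I + M) r = -y, then x(t) = -(1/Lambda(t)^2) sum_i A_i(t) r_i
    r = np.linalg.solve(np.eye(len(s)) + M, -y)
    x_rec = -(A_t.T @ r) / lam_t**2

    # Consistency check: r_i equals the residue  integral A_i(t) x(t) dt - y_i
    print(np.allclose(A_t @ x_rec * dt - y, r))   # True

Note that the linear system :math:`(I + M) r = - y` is only :math:`N \times N`,
independent of the resolution of the :math:`t` grid.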