Conjugate Method
Quadratic Problem
$$
\min f(x) : f(x) = \frac{1}{2} \langle x, Qx \rangle - b^T x \;\;\;\;\;\text{ minimizer } x^* = Q^{-1}b \\
\nabla f(x) = Qx - b = g(x)$$
Steepest Descent
$$
x_{i+1} = x_i - \alpha_i g_i \;\;\;\;\; \alpha_i = \underset{\lambda \geq 0}{\arg\min} f(x_i - \lambda g_i)$$
In the update above, the optimal $\lambda$ satisfies $\frac{\partial f}{\partial \lambda} = 0$. That is (viewing $x = x_{i+1}$ as a function of $\lambda$, writing $\lambda$ in place of $\alpha$),
$$
\frac{\partial f}{\partial \lambda} = {\frac{\partial f}{\partial x}}^T \frac{\partial x}{\partial \lambda} = [Q(x_i - \lambda g_i) -b]^T(-g_i) = 0$$
Therefore
$$
\begin{align}
0 &= [Q(x_i - \lambda g_i) -b]^T g_i = x_i^T Q g_i - \lambda g_i^T Q g_i -b^T g_i = (Q x_i - b)^T g_i - \lambda g_i^T Q g_i \\
&\implies (Q x_i - b)^T g_i = g_i^T g_i = \lambda g_i^T Q g_i \\
&\implies \lambda = \frac{g_i^T g_i}{g_i^T Q g_i}
\end{align}$$
Therefore,
$$
x_{i+1} = x_i - \frac{g_i^T g_i}{g_i^T Q g_i} g_i$$
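As a sanity check, here is a minimal numerical sketch of this update; the matrix $Q$, vector $b$, starting point, and tolerance below are illustrative assumptions, not from the text.

```python
import numpy as np

# Steepest descent on f(x) = 1/2 <x,Qx> - b'x with the exact step
# size derived above; Q, b, x0, and the tolerance are illustrative.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])            # symmetric positive definite
b = np.array([1.0, 2.0])

x = np.zeros(2)                       # x_0
for i in range(1000):
    g = Q @ x - b                     # g_i = Q x_i - b
    if np.linalg.norm(g) < 1e-10:
        break
    alpha = (g @ g) / (g @ Q @ g)     # alpha_i = g'g / g'Qg
    x = x - alpha * g                 # x_{i+1} = x_i - alpha_i g_i

print(x, np.linalg.solve(Q, b))       # both should equal x* = Q^{-1} b
```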
Usage
For a general $f(x)$ and a given $x_i$ $\rightarrow$ take the second-order approximation near $x_i$:
$$
f(x) \approx f(x_i) + \langle \nabla f(x_i), x - x_i \rangle + \frac{1}{2} \langle x - x_i, Q(x - x_i) \rangle$$
Then, for the problem
$$
\left. \min f(x) \right|_{x^*}, \;\;\; f(x) = \frac{1}{2} \langle x, Qx \rangle - b^T x$$
let
$$
F(x) = \frac{1}{2} \langle x - x^*, Q(x - x^*) \rangle$$
Claim
$\min f(x)$ and $\min F(x)$ generate the same sequence $\{x_i\}$.
proof of claim
$$
\begin{align}
F(x) &= \frac{1}{2} \langle x, Qx \rangle - {x^*}^T Q x + \frac{1}{2} \langle x^*, Qx^* \rangle &\\
&= \frac{1}{2} \langle x, Qx \rangle - b^T x + \frac{1}{2} \langle x^*, Qx^* \rangle &\because \nabla f(x^*) = Qx^* - b = 0 \\
&= f(x) + \frac{1}{2} \langle x^*, Qx^* \rangle
\end{align}$$
Therefore $\frac{\partial f}{\partial x} = \frac{\partial F}{\partial x}$ (the two differ only by a constant), so they define the same optimization problem.
Lemma
$$
F(x_{i+1}) = \left[ 1 - \frac{(g_i^T g_i)^2}{(g_i^T Q g_i)(g_i^T Q^{-1} g_i)} \right] F(x_i)$$
proof of Lemma
$$
g_i \triangleq Qx_i -b = Qx_i - Qx^{*} = Q(x_i - x^{*}) = Qy_i, \;\;\;\; y_i \triangleq x_i - x^{*}$$
Therefore $F(x_i) = \frac{1}{2}y_i^T Q y_i$, and hence
$$
F(x_i) - F(x_{i+1}) = \frac{1}{2} [y_i^T Q y_i - y_{i+1}^T Q y_{i+1}]$$
where, expanding $y_{i+1}^T Q y_{i+1}$,
$$
\begin{align}
y_{i+1}^T Q y_{i+1} &= (x_{i+1} - x^*)^T Q (x_{i+1} - x^*) = (x_i - \alpha_i g_i - x^*)^T Q (x_i - \alpha_i g_i - x^*) \\
&= (y_i - \alpha_i g_i)^T Q (y_i - \alpha_i g_i) \\
&= y_i^T Qy_i - 2 \alpha_i g_i^T Q y_i + \alpha_i^2 g_i^T Q g_i
\end{align}$$
Therefore, since $\alpha_i = \lambda = \frac{g_i^T g_i}{g_i^T Q g_i}$,
$$
\begin{align}
\frac{F(x_i) - F(x_{i+1})}{F(x_i)} &= - \frac{- 2 \alpha_i g_i^T Q y_i + \alpha_i^2 g_i^T Q g_i }{y_i^T Q y_i} = \frac{ 2 \alpha_i g_i^T Q Q^{-1} g_i - \alpha_i^2 g_i^T Q g_i}{g_i^T Q^{-1} Q Q^{-1} g_i} \\
&= \frac{ 2 \alpha_i g_i^T g_i - \alpha_i^2 g_i^T Q g_i}{g_i^T Q^{-1} g_i} = \frac{2 \frac{g_i^T g_i}{g_i^T Q g_i} g_i^T g_i - \frac{(g_i^T g_i)^2}{(g_i^T Q g_i)^2} g_i^T Q g_i}{g_i^T Q^{-1} g_i} \\
&= \frac{\frac{1}{g_i^T Q g_i}\left( 2(g_i^T g_i)^2 - (g_i^T g_i)^2 \right)}{g_i^T Q^{-1} g_i} \\
&= \frac{(g_i^T g_i)^2}{(g_i^T Q g_i)(g_i^T Q^{-1} g_i)}
\end{align}$$
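The contraction factor in the Lemma is easy to check numerically; a small sketch (the $Q$, $b$, and the iterate $x_i$ below are illustrative assumptions):

```python
import numpy as np

# Check of the Lemma: after one exact steepest-descent step,
# F(x_{i+1}) / F(x_i) = 1 - (g'g)^2 / ((g'Qg)(g'Q^{-1}g)).
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])                 # illustrative SPD matrix
b = np.array([1.0, 2.0])
x_star = np.linalg.solve(Q, b)

F = lambda x: 0.5 * (x - x_star) @ Q @ (x - x_star)

x = np.array([5.0, -3.0])                  # an arbitrary iterate x_i
g = Q @ x - b
x_next = x - (g @ g) / (g @ Q @ g) * g     # exact line-search step

ratio = F(x_next) / F(x)
predicted = 1 - (g @ g)**2 / ((g @ Q @ g) * (g @ np.linalg.solve(Q, g)))
print(ratio, predicted)                    # agree to rounding error
```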
Method of Conjugate Directions
Definition
A basis $\{u_i\}_{i=0}^{n-1}$ of $\mathbb{R}^n$ is said to be orthogonal if $\langle u_i, u_j \rangle = 0, \;\forall i \neq j$.
Definition : Q- Conjugate
Given a symmetric positive definite matrix $Q$, a basis $\{u_i\}_{i=0}^{n-1}$ of $\mathbb{R}^n$ is said to be "$Q$-conjugate" if $\langle u_i, Qu_j \rangle = 0, \;\forall i \neq j$
Theorem
Let $f:\mathbb{R}^n \rightarrow \mathbb{R}$ be continuously differentiable. Suppose that $\hat{x}$ minimizes $f(\cdot)$ on the subspace spanned by $\{y_1, \cdots, y_k\}$, $k \leq n$; then $\langle \nabla f(\hat{x}), y_i \rangle = 0, \; i=1, \cdots, k$.
proof
Let $S$ be the subspace spanned by $\{y_1, \cdots, y_k\}$.
Since $\hat{x}$ is a minimizer on $S$ ($\min_{x \in S} f(x) = f(\hat{x})$),
$$
\exists \hat{\alpha} \in \mathbb{R}^k \;\;\text{ such that } \hat{x} = Y \hat{\alpha} \;\;\text{ where }\;\; Y = [y_1, \cdots, y_k], \; \hat{\alpha} = [\hat{\alpha}_1, \cdots, \hat{\alpha}_k]^T$$
Since any $x \in S$ can be represented as $x = Y \alpha$ for some $\alpha \in \mathbb{R}^k$, we have
$$
\min_{x \in S} f(x) = \min_{\alpha \in \mathbb{R}^k} f(Y \alpha) \triangleq \min_{\alpha \in \mathbb{R}^k} g(\alpha), \;\;\;\text{ with } \;\;\;\left. \frac{\partial g}{\partial \alpha} \right|_{\alpha = \hat{\alpha}} = 0$$
then (since $\hat{x} = Y \hat{\alpha}$, $\left. \frac{\partial x}{\partial \alpha} \right|_{x = \hat{x}, \alpha = \hat{\alpha}}= Y$)
$$
\left. \frac{\partial g}{\partial \alpha} \right|_{\alpha = \hat{\alpha}} = \left( \frac{\partial f}{\partial x} \frac{\partial x}{\partial \alpha}\right)^T = \left( \nabla f(\hat{x})^T Y \right)^T = Y^T \nabla f(\hat{x}) = 0$$
Thus,
$$
\langle \nabla f(\hat{x}), y_i \rangle = 0, \;\;\; \forall i = 1, \cdots k$$
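The theorem can be verified numerically for the quadratic case treated in this note; in the sketch below, $H$, $d$, and the spanning vectors $Y$ are illustrative assumptions:

```python
import numpy as np

# Verify: if x_hat minimizes f(x) = 1/2 <x,Hx> + <d,x> over S = span(Y),
# then grad f(x_hat) is orthogonal to every column of Y.
rng = np.random.default_rng(0)
n, k = 5, 3
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)          # symmetric positive definite
d = rng.standard_normal(n)
Y = rng.standard_normal((n, k))      # columns y_1,...,y_k span S

# On S, g(a) = f(Ya); setting dg/da = 0 as in the proof gives
# (Y' H Y) a_hat = -Y' d.
a_hat = np.linalg.solve(Y.T @ H @ Y, -Y.T @ d)
x_hat = Y @ a_hat

grad = H @ x_hat + d
print(Y.T @ grad)                    # ~ zero vector: <grad f(x_hat), y_i> = 0
```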
Theorem 2
Let $H$ be a symmetric positive definite $n \times n$ matrix, and let $d \in \mathbb{R}^n$ be arbitrary.
Let $ f(x) = \frac{1}{2} \langle x, Hx \rangle + \langle d, x \rangle $
Let $\{ h_i \}_{i=0}^{n-1}$ be an $H$-conjugate basis for $\mathbb{R}^n$
If $\{ x_i \}_{i=0}^{n}$ is the sequence constructed from a given $x_0$ by
$$
\begin{align}
x_{i+1} &= x_i - \lambda_i h_i \\
\lambda_i &= \arg \min_{\lambda \geq 0} f(x_i - \lambda h_i)
\end{align}$$
then $x_i$ minimizes $f(\cdot)$ on the subspace spanned by $h_0, \cdots , h_{i-1}$
(That is, minimize over a subspace of dimension 1, then dimension 2, ..., finally over all of $\mathbb{R}^n$.)
proof
Since $f$ is strictly convex
$\hat{x}$ is a minimizer of $f(\cdot)$ on the subspace spanned by
$$
\{y_1, \cdots, y_k \} \Leftrightarrow \langle \nabla f(\hat{x}), y_i \rangle = 0, \;\;\;\; i=1, \cdots, k$$
- This is immediate from the theorem proved above.
By this equivalence, it suffices to prove that $\langle \nabla f(x_i), h_j \rangle = 0$ for $j=0, \cdots, i-1$.
First, the gradient is
$$
\nabla f(x_i) = Hx_i + d = g_i$$
Next, since by assumption $\lambda_i$ minimizes $f(x_i - \lambda h_i)$,
$$
0 = \left. \frac{\partial f}{\partial \lambda} \right|_{\lambda=\lambda_i} = {\frac{\partial f}{\partial x}}^T \cdot {\left. \frac{\partial x}{\partial \lambda} \right|_{\lambda = \lambda_i}} = \nabla f(x_i - \lambda_i h_i)^T (-h_i)$$
Therefore
$$
\langle g_{i+1}, h_i \rangle = 0, \;\;\; \forall i=0, \cdots , n-1
\tag{1}$$
Thus
$$
\begin{align}
g_i &= H x_i + d \\
&= H(x_{i-1} - \lambda_{i-1} h_{i-1}) + d \\
&= g_{i-1} - \lambda_{i-1} H h_{i-1} \\
&= g_0 - \lambda_{0} H h_{0} - \lambda_{1} H h_{1} - \cdots - \lambda_{i-1} H h_{i-1}
\end{align}$$
Since the $\{ h_i \}$ are $H$-conjugate (meaning $\langle h_i, Hh_j \rangle = 0$ for $j \neq i$),
$$
\begin{align}
\langle g_i, h_i \rangle &= -\lambda_{i-1} \langle H h_{i-1}, h_i \rangle -\lambda_{i-2} \langle H h_{i-2}, h_i \rangle - \cdots - \lambda_{0} \langle H h_{0}, h_i \rangle + \langle g_0, h_i \rangle \\
&= \langle g_0, h_i \rangle
\end{align}
\tag{2}$$
For $k < i$, on the other hand, equation (2) picks up exactly one non-vanishing term $- \lambda_k \langle h_k, Hh_k \rangle$, so that
$$
\langle g_i, h_k \rangle = \langle g_0, h_k \rangle - \lambda_k \langle h_k, Hh_k \rangle$$
Setting $i = k+1$, i.e. $k = i-1$, equation (1) gives
$$
0 = \langle g_{k+1}, h_k \rangle = \langle g_0, h_k \rangle - \lambda_k \langle h_k, Hh_k \rangle$$
Therefore, for an arbitrary $\bar{k} < k < i$,
$$
\langle g_{i}, h_{\bar{k}} \rangle = \langle g_0, h_{\bar{k}} \rangle - \lambda_{\bar{k}} \langle h_{\bar{k}}, Hh_{\bar{k}} \rangle = 0$$
Consequently,
$$
\langle g_i, h_k \rangle = \langle \nabla f(x_i), h_k \rangle = 0 \;\;\;\;\; \text{for } k=0, \cdots, i-1$$
Q.E.D
Conjugate Method
$$
\text{solve }\;\;\;\; \min_{x \in \mathbb{R}^n} \frac{1}{2} \langle x, Hx \rangle + \langle d, x \rangle, \;\;\; H > 0$$
Procedure | Processing |
---|---|
Data | $x_0 \in \mathbb{R}^n$ |
Step 0 | Set $i=0$, $h_0 = g_0 = Hx_0 + d$ |
Step 1 | Compute the step length $\lambda_i = \arg \min_{\lambda \geq 0} f(x_i - \lambda h_i)$ |
Step 2 | Update: set $x_{i+1} = x_i - \lambda_i h_i$, $g_{i+1} = Hx_{i+1} + d$, and $h_{i+1} = g_{i+1} + r_i h_i$ so that $\langle h_{i+1}, Hh_i \rangle = 0$ |
Step 3 | Replace $i$ by $i+1$ and go to Step 1 |
where $r_i$ is defined as
$$
r_i = -\frac{\langle Hh_i, g_{i+1} \rangle }{\langle h_i , Hh_i \rangle}$$
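A minimal sketch of the procedure above, using the closed-form step length $\lambda_i = \langle g_i, h_i \rangle / \langle Hh_i, h_i \rangle$ (derived later in the proofs below) in place of the abstract arg-min; $H$, $d$, and $x_0$ are illustrative assumptions:

```python
import numpy as np

# Sketch of the tabulated procedure on an assumed SPD example; H, d, x0
# are illustrative.  For a quadratic, the exact line search has the
# closed form lambda_i = <g_i,h_i> / <H h_i, h_i>.
rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)              # H > 0
d = rng.standard_normal(n)

x = rng.standard_normal(n)               # Data: x_0
g = H @ x + d                            # Step 0: h_0 = g_0 = Hx_0 + d
h = g.copy()
for i in range(n):                       # at most n iterations (see below)
    Hh = H @ h
    lam = (g @ h) / (h @ Hh)             # Step 1: exact step length
    x = x - lam * h                      # Step 2: x_{i+1} = x_i - lam_i h_i
    g = H @ x + d                        # g_{i+1} = H x_{i+1} + d
    if np.linalg.norm(g) < 1e-12:
        break
    r = -(Hh @ g) / (h @ Hh)             # r_i so that <h_{i+1}, H h_i> = 0
    h = g + r * h                        # h_{i+1} = g_{i+1} + r_i h_i

print(i + 1, np.allclose(x, np.linalg.solve(H, -d)))   # x* solves Hx* = -d
```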
Theorem 1
Suppose that $H$ is an $n \times n$ symmetric positive definite matrix and $d \in \mathbb{R}^n$ is arbitrary; then the conjugate gradient method solves the problem
$$
\min_{x \in \mathbb{R}^n} \frac{1}{2} \langle x, Hx \rangle + \langle d, x \rangle$$
in at most $n$ iterations.
proof
This proof has some issues, so it needs further review.
- We need to prove that $\langle h_i, Hh_j \rangle = 0$ for $i \neq j$.
By induction: since $h_0 = g_0$,
$$
\begin{align}
\langle g_1, h_0 \rangle &= \langle Hx_1 + d, h_0 \rangle \\
&= \langle H(x_0 - \lambda_0 h_0) + d, h_0 \rangle\\
&= \langle Hx_0 + d - \lambda_0 H h_0, h_0 \rangle
\end{align}
\tag{3}$$
At $\lambda = \lambda_0$,
$$
\begin{align}
\left. \frac{\partial f}{\partial \lambda} \right|_{\lambda = \lambda_0} &= \left. {\frac{\partial f}{\partial x}}^T \frac{\partial x}{\partial \lambda} \right|_{\lambda = \lambda_0} = \langle {\left. Hx + d \right|_{x = x_0 - \lambda_0 h_0 }}, -h_0 \rangle \\
&= \langle Hx_0 - \lambda_0 H h_0 + d, -h_0 \rangle = 0
\end{align}$$
so equation (3) becomes
$$
\langle Hx_0 + d - \lambda_0 H h_0, h_0 \rangle = 0,
\tag{4}$$
i.e. $\langle g_1, h_0 \rangle = 0$. The condition $\langle h_{i+1}, Hh_i \rangle = 0$ gives $\langle h_{1}, Hh_0 \rangle = 0$; solving for the $r_i$ that satisfies this condition,
$$
0 = \langle Hh_i, h_{i+1} \rangle = \langle Hh_i, g_{i+1} + r_i h_i \rangle = \langle Hh_i, g_{i+1} \rangle + r_i \langle Hh_i, h_i \rangle$$
therefore
$$
r_i = - \frac{\langle Hh_i, g_{i+1} \rangle}{\langle Hh_i, h_i \rangle}$$
- First, we set up the following induction hypothesis.
For $i=k$: suppose that $\langle g_i, g_j \rangle = 0$ and $\langle h_i, Hh_j \rangle = 0$ for $ 0 \leq i, j \leq k < n$ with $i \neq j$.
For $i = k+1$ we need to prove that
$$
\langle g_{k+1}, g_i \rangle = 0, \langle h_{k+1}, H h_i \rangle = 0 \;\;\;\; \text { for } i=0, \cdots, k$$
Replacing the index $0$ in equation (4) by $i$ gives
$$
\langle g_{i+1}, h_i \rangle = 0$$
In other words, from $g_{i+1} = Hx_{i+1} + d = H(x_i -\lambda_i h_i) + d$ this reads
$$
\langle Hx_i + d - \lambda_i H h_i, h_i \rangle = 0 \Leftrightarrow \langle g_i, h_i \rangle - \lambda_i \langle Hh_i, h_i \rangle = 0$$
Therefore
$$
\lambda_i = \frac{\langle g_i, h_i \rangle}{\langle Hh_i, h_i \rangle}$$
Using this result, we now prove $\langle g_{k+1}, g_i \rangle = 0$ in earnest.
$$
\begin{align}
\langle g_i, g_{i+1} \rangle &= \langle g_i, g_i - \lambda_i H h_i \rangle \\
&= \langle g_i, g_i \rangle - \lambda_i \langle g_i, Hh_i \rangle \\
&= \langle g_i, g_i \rangle - \frac{\langle g_i, h_i \rangle \cdot \langle g_i, Hh_i \rangle}{\langle h_i, Hh_i \rangle}
\end{align}
\tag{5}$$
(The last expression in equation (5) is the crucial one.)
$$
\begin{align}
\langle g_i, h_i \rangle &= \langle g_i, g_i + r_{i-1} h_{i-1} \rangle &\\
&= \langle g_i, g_i \rangle + r_{i-1} \langle g_i, h_{i-1} \rangle &\\
&= \langle g_i, g_i \rangle &\because \langle g_i, h_{i-1} \rangle = 0
\end{align}$$
and
$$
\begin{align}
\langle h_i, Hh_i \rangle &= \langle g_i + r_{i-1}h_{i-1}, Hh_i \rangle &\\
&= \langle g_i, Hh_i \rangle + r_{i-1} \langle h_{i-1}, Hh_i \rangle & \\
&= \langle g_i, Hh_i \rangle &\because \langle h_{i-1}, Hh_i \rangle = \langle h_i, Hh_{i-1} \rangle = 0 \; (\text{by symmetry of } H)
\end{align}$$
Therefore
$$
\langle g_i, g_{i+1} \rangle = \langle g_i, g_i \rangle - \frac{\langle g_i, h_i \rangle \cdot \langle g_i, Hh_i \rangle}{\langle h_i, Hh_i \rangle} = 0$$
Thus
$$
\forall i, \langle g_i, g_{i+1} \rangle = 0 \Rightarrow \text{ if } i=k, \langle g_k, g_{k+1} \rangle = 0$$
So far this shows that gradients whose indices differ by 1 are orthogonal.
The general case, i.e. orthogonality of $g_{k+1}$ to $g_i$ for $i=1, \cdots, k-1$, has not yet been shown, so we prove it now.
Let $i \neq 0$, $i < k$, i.e. $i = 1, \cdots, k-1$. Then
$$
\langle g_{k+1}, g_i \rangle = \langle g_k - \lambda_k H h_k, g_i \rangle = \langle g_k, g_i \rangle - \lambda_k \langle H h_k, g_i \rangle$$
By the hypothesis $\langle g_i, g_j \rangle = 0$ for $ 0 \leq i, j \leq k < n$ with $i \neq j$,
- (Instead of the full hypothesis, one can also argue incrementally: for $i = k-1$ the indices differ by 1, so $\langle g_k, g_i \rangle = 0$ already, and the argument extends outward from that case.)
$$
\langle g_k, g_i \rangle = 0$$
so the expression above becomes
$$
\langle g_{k+1}, g_i \rangle = - \lambda_k \langle H h_k, h_i - r_{i-1}h_{i-1} \rangle = - \lambda_k \langle H h_k, h_i \rangle + \lambda_k r_{i-1} \langle H h_k, h_{i-1} \rangle$$
Here, by the hypothesis $\langle h_i, Hh_j \rangle = 0$ for $ 0 \leq i, j \leq k < n$ with $i \neq j$,
$$
\langle H h_k, h_i \rangle = 0, \;\;\;\; \langle H h_k, h_{i-1} \rangle = 0 \tag{6}$$
Therefore
$$
\langle g_{k+1}, g_i \rangle = 0$$
Finally, consider the case $i=0$:
$$
\begin{align}
\langle g_{k+1}, g_0 \rangle &= \langle g_k - \lambda_k H h_k, g_0 \rangle & \\
&= \langle g_k, g_0 \rangle - \lambda_k \langle Hh_k, g_0 \rangle & \because \langle g_k, g_0 \rangle = 0 \\
&= - \lambda_k \langle H h_k, h_0 \rangle & \because g_0 = h_0, \;\;\text{ and }\langle H h_k, h_0 \rangle = 0 \\
&= 0
\end{align}$$
Therefore the $\{ g_i \}$ are mutually orthogonal; that is, we have found an orthogonal basis $\{ g_i \}$.
Finally, since $\langle h_{k+1}, Hh_k \rangle = 0$ (from the first part of this proof),
for $i=0, \cdots, k-1$:
$$
\begin{align}
\langle h_{k+1}, Hh_i \rangle &= \langle g_{k+1} + r_k h_k, Hh_i \rangle &\\
&= \langle g_{k+1}, Hh_i \rangle + r_k \langle h_k, Hh_i \rangle &\\
&= \langle g_{k+1}, Hh_i \rangle & \because \langle h_k, H h_i \rangle = 0 \text{ by (6)} \\
&= \langle g_{k+1}, \tfrac{1}{\lambda_i}(g_i - g_{i+1}) \rangle & \because g_{i+1} = g_i - \lambda_i H h_i \Rightarrow Hh_i = \tfrac{1}{\lambda_i}(g_i - g_{i+1}) \\
&= 0 & \because \langle g_{k+1}, g_j \rangle = 0 \text{ for } j \leq k
\end{align}$$
This proves that the $\{ h_i \}$ are $H$-conjugate.
Q.E.D
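The two invariants established in this proof, mutual orthogonality of the $g_i$ and $H$-conjugacy of the $h_i$, can also be checked numerically; a sketch with illustrative $H$, $d$, $x_0$:

```python
import numpy as np

# Numerical check of the invariants proved above: the gradients g_i are
# mutually orthogonal and the directions h_i are H-conjugate.
rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)                # illustrative SPD matrix
d = rng.standard_normal(n)

x = rng.standard_normal(n)
g = H @ x + d
h = g.copy()
G, D = [g], [h]                            # histories of g_i and h_i
for _ in range(n - 1):
    Hh = H @ h
    lam = (g @ h) / (h @ Hh)
    x = x - lam * h
    g = H @ x + d
    h = g - ((Hh @ g) / (h @ Hh)) * h      # h_{i+1} = g_{i+1} + r_i h_i
    G.append(g); D.append(h)

G, D = np.array(G), np.array(D)
print(np.max(np.abs(np.tril(G @ G.T, -1))))      # ~0: <g_i, g_j> = 0, i != j
print(np.max(np.abs(np.tril(D @ H @ D.T, -1))))  # ~0: <h_i, H h_j> = 0, i != j
```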
General Case
For $\min_{x \in \mathbb{R}^n} f(x)$, the following idea is applied.
In the conjugate method, computing the actual Hessian is a difficult problem, so it is estimated from gradients. That is, in formulas such as
$$
r_i = -\frac{\langle Hh_i, g_{i+1} \rangle }{\langle h_i, Hh_i \rangle}$$
the matrix $H$ is hard to obtain. However, since
$$
g_{i+1} = g_i - \lambda_i H h_i \Leftrightarrow Hh_i = \frac{1}{\lambda_i}(g_i - g_{i+1})$$
we get
$$
r_i = -\frac{\langle \frac{1}{\lambda_i}(g_i - g_{i+1}), g_{i+1} \rangle}{\langle h_i, \frac{1}{\lambda_i}(g_i - g_{i+1}) \rangle} = \frac{\langle g_{i+1}, g_{i+1} \rangle - \langle g_i, g_{i+1} \rangle}{\langle h_i, g_i \rangle} \;\;\;\; (\because \langle h_i, g_{i+1} \rangle = 0)$$
Since $h_i = g_i + r_{i-1} h_{i-1}$
$$
\langle h_i, g_i \rangle = \langle g_i + r_{i-1} h_{i-1}, g_i \rangle = \langle g_i, g_i \rangle + r_{i-1} \langle h_{i-1}, g_i \rangle = \|g_i\|^2 + 0$$
Therefore
$$
r_i = \frac{\langle g_{i+1}, g_{i+1} \rangle - \langle g_i, g_{i+1} \rangle}{\| g_i \|^2}$$
This yields the following two formulas.
- Polak-Ribière Formula
$$
r_i = \frac{\langle g_{i+1}, g_{i+1} \rangle - \langle g_i, g_{i+1} \rangle}{\| g_i \|^2}$$
- Fletcher-Reeves Formula
- Since in the quadratic case $\langle g_{i+1}, g_{i} \rangle = 0$, the Polak-Ribière result simplifies, and the simplified form is extended to the general case:
$$
r_i = \frac{\| g_{i+1} \|^2}{\| g_i \|^2}$$
General Conjugate Gradient Algorithm
Procedure | Processing |
---|---|
Data | $x_0 \in \mathbb{R}^n$ |
Step 0 | Set $i=0$, $h_0 = \nabla f(x_0)$ |
Step 1 | Compute the step length $\lambda_i = \arg \min_{\lambda \geq 0} f(x_i - \lambda h_i)$ |
Step 2 | Update: set $x_{i+1} = x_i - \lambda_i h_i$, compute $r_i^{PR}$ or $r_i^{FR}$, and set $h_{i+1} = \nabla f(x_{i+1}) + r_i^{PR} h_i$ or $h_{i+1} = \nabla f(x_{i+1}) + r_i^{FR} h_i$ |
Step 3 | Replace $i$ by $i+1$ and go to Step 1 |
where $r_i^{PR}$ and $r_i^{FR}$ are defined as
$$
\begin{align}
r_i^{PR} &= \frac{\langle \nabla f(x_{i+1}) - \nabla f(x_i), \nabla f(x_{i+1}) \rangle }{\| \nabla f(x_{i}) \|^2} \\
r_i^{FR} &= \frac{\|\nabla f(x_{i+1}) \|^2 }{\| \nabla f(x_{i}) \|^2}
\end{align}$$
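A hedged sketch of this general algorithm, with a backtracking Armijo line search standing in for the exact arg-min; the test function, safeguard, and tolerances are illustrative assumptions, not from the text.

```python
import numpy as np

# Nonlinear CG with the PR / FR formulas above.  The test function, the
# backtracking line search, and the restart safeguard are illustrative
# assumptions; any sufficiently accurate line search would do.
def f(x):
    return 0.5 * x @ x + np.log(1 + x[0]**2) + (x[1] - 1.0)**4

def grad(x):
    return np.array([x[0] + 2 * x[0] / (1 + x[0]**2),
                     x[1] + 4 * (x[1] - 1.0)**3])

def line_search(x, h, lam=1.0, c=1e-4):
    # Backtracking Armijo rule along the ray x - lam*h, lam >= 0.
    g = grad(x)
    while f(x - lam * h) > f(x) - c * lam * (g @ h):
        lam *= 0.5
    return lam

def conjugate_gradient(x0, rule="PR", tol=1e-8, max_iter=200):
    x = x0
    g = grad(x)
    h = g.copy()                              # h_0 = grad f(x_0)
    for _ in range(max_iter):
        lam = line_search(x, h)
        x = x - lam * h                       # x_{i+1} = x_i - lam_i h_i
        g_new = grad(x)
        if np.linalg.norm(g_new) < tol:
            break
        if rule == "PR":                      # Polak-Ribiere
            r = (g_new - g) @ g_new / (g @ g)
        else:                                 # Fletcher-Reeves
            r = (g_new @ g_new) / (g @ g)
        h = g_new + r * h                     # h_{i+1}
        if g_new @ h <= 0:                    # safeguard: restart with
            h = g_new.copy()                  # steepest descent
        g = g_new
    return x

print(conjugate_gradient(np.zeros(2), "PR"))
print(conjugate_gradient(np.zeros(2), "FR"))
```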
Theorem 1
Suppose that $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is twice continuously differentiable and
$$
\exists 0 < m \leq M \;\;\; \text{ such that } \;\;\; \forall x,y \in \mathbb{R}^n, \;\; m \|y\|^2 \leq \langle y, H(x)y \rangle \leq M \|y\|^2$$
If the Polak-Ribière algorithm is applied to solve $\min f(x)$, producing a sequence $\{ x_i \}_{i=0}^{\infty}$, then
- $\exists \rho \in (0,1) \;\;\; \text{such that} \;\;\; \langle \nabla f(x_i), h_i \rangle \geq \rho \|\nabla f(x_i) \| \| h_i \|$
- The sequence $\{ x_i \}_{i=0}^{\infty}$ converges to $\hat{x}$, the unique minimizer of $f(\cdot)$.
(If the first statement holds, the second follows from it.)
proof of 1.
let $ g_i = \nabla f(x_i) $, then
$$
\begin{align}
g_{i+1} &= \nabla f (x_{i+1}) = \nabla f( x_i - \lambda_i h_i) \\
&= \nabla f(x_i) + \int_0^1 H(x_i - s \lambda_i h_i )(-\lambda_i h_i) \, ds \\
&= g_i - \lambda_i \int_0^1 H(x_i - s \lambda_i h_i ) h_i \, ds
\end{align}$$
Now, $\langle g_{i+1}, h_i \rangle = 0$ by construction (the exact line search, as in equation (1)).
It means that $\langle g_{i} - \lambda_i H_i h_i, h_i \rangle = 0$, where $H_i \triangleq \int_0^1 H(x_i - s \lambda_i h_i) \, ds$ (written simply as $H$ below). Thus,
$$
\begin{align}
\lambda_i &= \frac{\langle g_i, h_i \rangle}{\langle Hh_i, h_i \rangle} & \text{in the algorithm, } h_i = g_i + r_{i-1}^{PR} h_{i-1} \\
&= \frac{\langle g_i, g_i + r_{i-1}^{PR} h_{i-1} \rangle}{\langle Hh_{i}, h_i \rangle} = \frac{\|g_i\|^2}{\langle Hh_{i}, h_i \rangle} & \because \langle g_i, h_{i-1} \rangle = 0
\end{align}$$
and
$$
r_{i}^{PR} = \frac{\langle g_{i+1} - g_i, g_{i+1} \rangle}{\| g_i \|^2} = -\lambda_i \frac{\langle Hh_i, g_{i+1} \rangle}{\| g_i \|^2} = -\frac{\langle Hh_i, g_{i+1} \rangle}{\langle Hh_{i}, h_i \rangle}$$
Therefore
$$
| r_{i}^{PR} | \leq \frac{M \| h_i \| \| g_{i+1} \|}{m \| h_i \|^2} = \frac{M}{m} \cdot \frac{\| g_{i+1} \|}{\| h_i \|}$$
and hence
$$
\| h_{i+1} \| \leq \| g_{i+1} \| + \frac{M}{m} \frac{\| g_{i+1} \|}{\| h_i \|} \cdot \|h_i \| = \left(1 + \frac{M}{m}\right) \| g_{i+1} \|$$
Finally, since $\langle g_{i+1}, h_i \rangle = 0$,
$$
\langle g_{i+1}, h_{i+1} \rangle = \langle g_{i+1}, g_{i+1} + r_i^{PR} h_i \rangle = \| g_{i+1} \|^2$$
Therefore
$$
\frac{\langle g_{i+1}, h_{i+1} \rangle}{\| g_{i+1} \| \| h_{i+1} \| } = \frac{\| g_{i+1} \|^2}{\| g_{i+1} \| \| h_{i+1} \|} \geq \frac{\| g_{i+1} \|^2}{(1+ \frac{M}{m}) \| g_{i+1} \|^2} = \frac{1}{1+ \frac{M}{m}} \triangleq \rho$$
The $\rho$ defined above satisfies $0 < \rho < 1$. Therefore
$$
\langle g_{i+1}, h_{i+1} \rangle \geq \rho \| g_{i+1} \| \|h_{i+1} \|$$
Q.E.D
Theorem 2
If the Fletcher-Reeves algorithm is applied to solve $\min f(x)$, then
(a) $\exists \{ t_i \}_{i=0}^{\infty}$ such that
- $t_i > 0, \;\; \forall i$
- $t_i \rightarrow 0 \;\;\; \text{as} \;\;\; i \rightarrow \infty$
- $\sum_{i=0}^k t_i^2 \rightarrow \infty \;\;\; \text{as} \;\;\; k \rightarrow \infty$
- $\langle \nabla f(x_i), h_i \rangle \geq t_i \| \nabla f(x_i) \| \| h_i \|$
(b) $\{ x_i \}$ converges to $\hat{x}$, the unique minimizer.
The proof is omitted.
Note
- In the quadratic case, the step size $\lambda_i$ is computed as
$$
\lambda_i = \frac{\langle g_i, h_i \rangle}{\langle Hh_i, h_i \rangle}$$
- In the end, the conjugacy construction amounts to:
$$
\langle g_i, h_k \rangle = \langle \nabla f(x_i), h_k \rangle = 0 \;\;\;\;\; \text{for } k=0, \cdots, i-1$$