Newton’s Method
Local Newton’s Algorithm
Assumption
- $f$ is twice continuously differentiable … convexity condition
- Locally Lipschitz continuous … the variation of the derivative is bounded
- i.e. given any bounded set $S \subset \mathbb{R}^n$
$$
\exists L < \infty \;\;\textit{such that}\;\; \forall x', x \in S,\;\; \|f(x') - f(x) \| \leq L \cdot \|x' - x\|$$
- $\exists\, 0 < m \leq M < \infty$ such that $\forall y \in \mathbb{R}^n$,
$$
m \|y\|^2 \leq \langle y, H(\hat{x}) y \rangle \leq M \|y\|^2$$where $\hat{x}$ is a local minimizer.
Basic Idea
Given $f(\cdot)$, we can approximate $f$ near $x_i$ using a Taylor expansion:
$$
f(x) \approx f(x_i) + \langle \nabla f(x_i), x - x_i \rangle + \frac{1}{2} \langle x - x_i, H(x_i)(x - x_i) \rangle \tag{1}$$
This quadratic model is minimized at $x_{i+1} = x_i - H^{-1}(x_i)\nabla f(x_i)$: differentiating (1) with respect to $x$ and setting $\nabla f(x) = 0$,
$$
\begin{align}
\nabla f(x) &= \nabla f(x_i) + H(x_i) (x - x_i) \\
0 &= \nabla f(x_i) + H(x_i) (x - x_i) \\
-\nabla f(x_i) &= H(x_i) (x - x_i) \\
x - x_i &= -H^{-1}(x_i)\nabla f(x_i)
\end{align}$$
By assumption, $H(\hat{x})$ is positive definite, and since $H(\cdot)$ is continuous,
$$
\exists B(\hat{x}, \rho) \;\;\textit{such that}\;\; \frac{m}{2} \|y\|^2 \leq \langle y, H(x) y \rangle \leq 2M \|y\|^2 \;\;\; \forall x \in B(\hat{x}, \rho),\; \forall y \in \mathbb{R}^n$$
That is, the method is justified only in a region where $H$ remains positive definite.
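A quick numerical sanity check of this derivation (a sketch with a random positive definite matrix standing in for $H(x_i)$; all names here are illustrative):

```python
import numpy as np

# Verify that the quadratic model (1) has its stationary point exactly at
# x_{i+1} = x_i - H(x_i)^{-1} grad f(x_i).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
H = A @ A.T + 3.0 * np.eye(3)          # stand-in positive definite Hessian
g = rng.standard_normal(3)             # stand-in gradient at x_i
x_i = rng.standard_normal(3)

x_next = x_i - np.linalg.solve(H, g)   # Newton step, without forming H^{-1}

# The model gradient, grad f(x_i) + H(x_i)(x - x_i), vanishes at x_next:
print(np.allclose(g + H @ (x_next - x_i), 0.0))   # True
```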
Local Newton’s Method
Procedure | Processing |
---|---|
Data | $x_0 \in \mathbb{R}^n$ |
Step 0 | Set $i=0$ |
Step 1 | Compute the search direction $h_i = -H^{-1}(x_i) \nabla f(x_i)$ |
Step 2 | Update $x_{i+1} = x_i + h_i$; replace $i$ by $i+1$ |
Step 3 | Go to Step 1 |
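A minimal sketch of this procedure in Python (the helper names and the $\cosh$ test objective are illustrative choices, not from the notes); solving the linear system $H(x_i)h_i = -\nabla f(x_i)$ avoids forming $H^{-1}(x_i)$ explicitly:

```python
import numpy as np

def local_newton(grad, hess, x0, tol=1e-10, max_iter=50):
    """Local Newton iteration: x_{i+1} = x_i - H(x_i)^{-1} grad f(x_i).

    grad and hess are assumed user-supplied callables returning the
    gradient vector and Hessian matrix of f; x0 should lie inside the
    ball around the minimizer where H is positive definite.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # stop once grad f(x_i) ~ 0
            break
        h = np.linalg.solve(hess(x), -g)   # Step 1: search direction
        x = x + h                          # Step 2: full Newton step
    return x

# Illustrative objective: f(x) = cosh(x_1) + cosh(x_2), strictly convex
# with unique minimizer at the origin.
grad = lambda x: np.sinh(x)
hess = lambda x: np.diag(np.cosh(x))
print(local_newton(grad, hess, [0.5, 1.0]))  # -> approximately [0, 0]
```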
Assumption
- $H(\cdot)$ is locally Lipschitz continuous, i.e.
- For a bounded set $S \subset \mathbb{R}^n$
$$
\exists L < \infty, \;\; \|H(x') - H(x'') \| \leq L \cdot \| x' - x'' \|, \;\; \forall x', x'' \in S$$
- $\exists\, 0 < m \leq M < \infty$ such that
$$
m \| y \|^2 \leq \langle y, H(\hat{x}) y \rangle \leq M \|y\|^2 \;\; \forall y \in \mathbb{R}^n$$
Theorem 1.
Suppose that the above assumptions are satisfied. Then $\exists \hat{\rho} > 0$ such that $\forall x_0 \in B^o(\hat{x}, \hat{\rho})$, the sequence $\{x_i\}_{i=0}^{\infty}$ constructed by the algorithm converges to $\hat{x}$ quadratically (i.e. with root rate 2).
Proof
Since $H(\cdot)$ is continuous, $\exists \rho > 0$ and $L < \infty$ such that $\forall x \in B^o(\hat{x}, \rho)$,
$$
\begin{align}
\frac{m}{2}\|y\|^2 \leq \langle y, H(x)y \rangle \leq 2 M \|y\|^2 \;\;\;\;\;\;\;\;\;& \forall y \in \mathbb{R}^n \\
\| H(x') - H(x'') \| \leq L \cdot \| x' - x'' \|, \;\;\;\;\;\;\; & x', x'' \in B^o(\hat{x}, \rho)
\end{align}$$
Suppose that $x_i \in B^o(\hat{x}, \rho)$. Then
$$
\begin{align}
H(x_i) (x_{i+1} - x_i) &= H(x_i) \left( -H^{-1}(x_i) \nabla f(x_i) \right) = - \nabla f(x_i) & \\
&= - (\nabla f(x_i) - \nabla f(\hat{x})) & \because \nabla f(\hat{x}) = 0
\end{align}$$
Since, for any continuously differentiable $g$,
$$
g(x_i) - g(\hat{x}) = \int_0^1 \langle \nabla g(\hat{x} + s (x_i - \hat{x})), (x_i - \hat{x}) \rangle ds$$
applying this to each component of $\nabla f$ gives
$$
- (\nabla f(x_i) - \nabla f(\hat{x})) = -\int_0^1 H(\hat{x} + s (x_i - \hat{x}))\,(x_i - \hat{x})\, ds$$
Adding and subtracting $H(x_i)\hat{x}$,
$$
H(x_i) (x_{i+1} - x_i) = H(x_i) (x_{i+1} - x_i) - H(x_i)\hat{x} + H(x_i)\hat{x} = H(x_i)(x_{i+1} - \hat{x}) + H(x_i)(\hat{x} - x_i)$$
Hence,
$$
H(x_i)(x_{i+1} - \hat{x}) + H(x_i)(\hat{x} - x_i) = -\int_0^1 H(\hat{x} + s (x_i - \hat{x}))\,(x_i - \hat{x})\, ds \\
\begin{align}
\Rightarrow H(x_i)(x_{i+1} - \hat{x}) &= \int_0^1 \left( H(x_i) - H(\hat{x} + s (x_i - \hat{x})) \right) ds \,(x_i - \hat{x}) \\
\Rightarrow x_{i+1} - \hat{x} &= H^{-1}(x_i)\int_0^1 \left( H(x_i) - H(\hat{x} + s (x_i - \hat{x})) \right) ds\, (x_i - \hat{x}) \\
\Rightarrow \| x_{i+1} - \hat{x} \| &\leq \| H^{-1}(x_i) \| \int_0^1 \| H(x_i) - H(\hat{x} + s (x_i - \hat{x})) \|\,ds\, \| x_i - \hat{x} \|
\end{align}$$
By the definition of the induced norm, $\|H(x)\| = \sup_{\|y\|=1} \langle y, H(x)y \rangle \leq 2M$ by the assumptions; likewise, the lower bound $\frac{m}{2}\|y\|^2 \leq \langle y, H(x)y \rangle$ gives $\| H^{-1}(x) \| \leq \frac{2}{m}$. Therefore
$$
\begin{align}
\| x_{i+1} - \hat{x} \| &\leq \frac{2}{m} \int_0^1 L \cdot \| x_i - \hat{x} - s (x_i - \hat{x}) \|\,ds\, \| x_i - \hat{x} \| \\
&= \frac{2}{m} \int_0^1 L \cdot \| (1-s)(x_i - \hat{x}) \|\,ds\, \| x_i - \hat{x} \| \\
&\leq \frac{2}{m} L \cdot \| x_i - \hat{x} \|^2 \cdot \int_0^1 (1-s) ds \\
&= \frac{L}{m} \| x_i - \hat{x} \|^2
\end{align}$$
If $\frac{L}{m} \| x_i - \hat{x} \| < 1$, then $\| x_{i+1} - \hat{x} \| < \| x_i - \hat{x} \|$ (this neutralizes the effect of the squaring).
For any $\alpha \in (0,1)$ pick $\hat{\rho} = \min(\rho, \alpha \frac{m}{L})$. Then, if $x_0 \in B^o(\hat{x}, \hat{\rho})$,
$$
\| x_0 - \hat{x} \| < \hat{\rho} \leq \alpha \frac{m}{L} \Rightarrow \frac{L}{m} \| x_0 - \hat{x} \| \leq \alpha < 1$$
Now we have
$$
\begin{align}
\| x_1 - \hat{x} \| &\leq \frac{L}{m} \| x_0 - \hat{x} \|^2 \leq \alpha \|x_0 - \hat{x} \| \\
\| x_2 - \hat{x} \| &\leq \frac{L}{m} \| x_1 - \hat{x} \|^2 \leq \alpha \|x_1 - \hat{x} \| \leq \alpha^2 \|x_0 - \hat{x} \| \\
&\vdots \\
\| x_i - \hat{x} \| &\leq \alpha^i \| x_0 - \hat{x} \|
\end{align}$$
Since $\alpha \in (0,1)$, $\alpha^i \rightarrow 0$ as $i \rightarrow \infty$. Therefore $\|x_i - \hat{x} \| \rightarrow 0$ as $i \rightarrow \infty$, i.e. $\{x_i \}_{i=0}^{\infty}$ converges to $\hat{x}$.
Is the convergence quadratic? From the above inequalities we obtain
$$
\| x_{i+1} - \hat{x} \| \leq \frac{L}{m} \| x_i - \hat{x} \|^2.$$
Multiplying both sides by $\frac{L}{m}$,
$$
\frac{L}{m} \| x_{i+1} - \hat{x} \| \leq \left(\frac{L}{m} \| x_i - \hat{x} \|\right)^2.$$
Let $\mu_i = \frac{L}{m} \| x_i - \hat{x} \|$; then
$$
\mu_{i+1} \leq \mu_i^2 \leq (\mu_{i-1}^2)^2 \leq \cdots \leq \mu_0^{2^{i+1}}$$
This shows that the sequence $\{x_i \}_{i=0}^{\infty}$ converges with root rate 2, which is what is called quadratic convergence.
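A short numerical illustration of root rate 2 (the test objective $f(x) = e^x - x$, with minimizer $\hat{x} = 0$, is an illustrative choice, not from the notes); note how the error exponent roughly doubles at each step:

```python
import math

# Newton iteration for f(x) = exp(x) - x: grad f(x) = exp(x) - 1 and
# H(x) = exp(x) > 0, so the unique minimizer is x_hat = 0.
x = 1.0
for i in range(6):
    print(f"i={i}  error={abs(x):.3e}")
    x -= (math.exp(x) - 1.0) / math.exp(x)   # x - H(x)^{-1} grad f(x)
# errors: 1.0e+00, 3.7e-01, 6.0e-02, 1.8e-03, 1.6e-06, 1.2e-12
```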
Global Newton's Method for Convex Functions
Assumption
a) $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is twice continuously differentiable with locally Lipschitz Hessian, i.e. given any bounded set $S \subset \mathbb{R}^n$,
$$
\exists L_s < \infty \;\;\text{ such that }\;\; \|H(x) - H(y) \| \leq L_s \| x- y \| \;\;\; \forall x,y \in S$$
b) $\exists m > 0$ such that $m \|y\|^2 \leq \langle y, H(x)y \rangle \;\; \forall x, y \in \mathbb{R}^n$
- In other words, (b) means that $f$ is strictly convex (the lower bound is strictly positive, not merely zero).
Proposition
Under the above assumptions,
a) The level sets of $f(\cdot)$ are compact
b) $\min f(x)$ has a unique minimizer.
Proof
a) Already shown.
b) Suppose not, i.e. there exist two distinct minimizers $x^*$ and $x^{**}$.
(At least one minimizer exists: the level set is compact, so $f$ attains its minimum on it; see the notes on convex sets.)
Therefore
$$
\begin{align}
0 &= f(x^{**}) - f(x^*) = \langle \nabla f(x^{*}), x^{**} - x^{*} \rangle + \int_0^1 (1-s) \langle x^{**} - x^{*}, H(x^* + s(x^{**} - x^*))(x^{**} - x^*) \rangle ds \\
&\geq \frac{1}{2} m \| x^{**} - x^* \|^2 \geq 0 \;\;\;\;\;\; \because \nabla f(x^*) = 0
\end{align}$$
For the above to hold, we must have $x^{**} = x^*$.
Armijo-Newton Method
Procedure | Processing |
---|---|
Data | $x_0 \in \mathbb{R}^n$ |
Step 0 | Set $i=0$ |
Step 1 | Compute the search direction $h_i = -H^{-1}(x_i) \nabla f(x_i)$ |
Step 2 | Compute the step size $\lambda_i$ (Armijo rule below) |
Step 3 | Set $x_{i+1} = x_i + \lambda_i h_i$ |
Step 4 | Replace $i$ by $i+1$ and go to Step 1 |
- Step size for the algorithm:
$$
\lambda_i = \arg \max_{k \in \mathbb{N}} \{ \beta^k \mid f(x_i + \beta^k h_i) - f(x_i) \leq \beta^k \alpha \langle h_i, \nabla f(x_i) \rangle \}$$
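A minimal backtracking sketch of this rule (names are illustrative; `f` is the objective and `grad_fx` its gradient at `x`, with $\alpha \in (0, \frac{1}{2})$ and $\beta \in (0,1)$ as in the notes):

```python
import numpy as np

def armijo_step(f, x, h, grad_fx, alpha=0.25, beta=0.5, max_k=30):
    """Return the largest lam = beta**k, k = 0, 1, 2, ..., satisfying
    f(x + lam*h) - f(x) <= lam * alpha * <h, grad f(x)>."""
    fx = f(x)
    slope = alpha * np.dot(h, grad_fx)   # < 0 when h is a descent direction
    lam = 1.0
    for _ in range(max_k):
        if f(x + lam * h) - fx <= lam * slope:
            return lam
        lam *= beta                      # backtrack to the next beta^k
    return lam

# One Armijo-Newton step, reusing grad/hess from the earlier sketch:
# h = np.linalg.solve(hess(x), -grad(x))
# x = x + armijo_step(f, x, h, grad(x)) * h
```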
Theorem 2
Suppose the above assumptions hold. If $\{x_i\}_{i=0}^{\infty}$ is a sequence constructed by the Armijo-Newton method, then $x_i \rightarrow \hat{x}$ quadratically as $i \rightarrow \infty$,
where $\hat{x}$ is the unique minimizer of $f(x)$
- Here the search direction and the step-size condition are as follows:
$$
\begin{align}
h_i &= -H^{-1}(x_i) \nabla f(x_i) \\
f(x_i + \beta^k h_i) - f(x_i) &\leq \beta^k \alpha \langle h_i, \nabla f(x_i) \rangle
\end{align}$$
Proof
We need to prove that
- $x_i \rightarrow \hat{x}$ as $i \rightarrow \infty$
- the convergence is quadratic
Proof of 1
First we prove convergence.
By the proposition, the level set is compact.
Consider $L_0 = \{ x \in \mathbb{R}^n \mid f(x) \leq f(x_0) \}$. On it,
- strict convexity gives
$$
m \| y \|^2 \leq \langle y, H(x)y \rangle$$
- and compactness gives
$$
\langle y, H(x)y \rangle \leq M \| y \|^2$$
so that
$$
m \| y \|^2 \leq \langle y, H(x)y \rangle \leq M \| y \|^2$$
We now prove the upper bound.
Let $h(x) = -H^{-1}(x) \nabla f(x)$
Then $H(\cdot), \nabla f(\cdot)$ are continuous by assumption.
Let $r = \max \{f(x + \lambda h(x)) \mid x \in L_0, \lambda \in [0,1] \}$ and consider
$$
g(x,y) = \langle y, H(x)y \rangle \;\;\;\text{ for } \;\; \|y\| \leq 1 \;\;\;\text{ (compact)} \\
x \in L_r = \{ z \in \mathbb{R}^n \mid f(z) \leq r \} \;\;\;\;\;\;\;\;\;\text{ (compact)}$$
- Indeed, for $g : \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}$ restricted to $X \times Y$ with $X = L_r \subset \mathbb{R}^n$ compact and $Y = \{y : \|y\| \leq 1\} \subset \mathbb{R}^n$ compact, the image $g(X \times Y) \subset \mathbb{R}$ is compact.
Since $g(x,y)$ is continuous in $(x,y)$, it attains a maximum $M$, hence $g(x,y) \leq M \| y \|^2$.
Therefore $\{x_i\}_{i=0}^{\infty} \subset L_r$, which is compact. This implies that there exists at least one accumulation point, i.e. a subsequence $x_i \overset{K}{\rightarrow} \hat{x}$ converges to some $\hat{x}$
(because $f(x_i)$ is a monotone decreasing sequence).
Next we prove that $\nabla f(\hat{x}) = 0$.
Suppose not, i.e. $\nabla f(\hat{x}) \neq 0$. For any $x \in L_r$, with $h(x) = - H^{-1}(x)\nabla f(x)$, write
$$
\begin{align}
f(x + \lambda h) - f(x) - \lambda \alpha \langle h, \nabla f(x) \rangle &= \langle \nabla f(x), \lambda h \rangle - \lambda \alpha \langle h, \nabla f(x) \rangle + \int_0^1 (1-s) \langle \lambda h, H(x + s \lambda h) \lambda h \rangle ds \\
&= -\lambda (1 - \alpha) \langle \nabla f(x), H^{-1}(x)\nabla f(x) \rangle + \lambda^2 \int_0^1 (1-s) \langle h, H(x + s \lambda h) h \rangle ds \\
&\leq -\lambda (1 - \alpha) \frac{1}{M}\| \nabla f(x) \|^2 + \lambda^2 \int_0^1 (1-s) \| H^{-1}(x) \|^2 \cdot \| \nabla f(x) \|^2 \cdot \| H(x + s \lambda h) \| ds \\
& \leq -\lambda (1 - \alpha) \frac{1}{M}\| \nabla f(x) \|^2 + \frac{\lambda^2 M}{2 m^2} \| \nabla f(x) \|^2 \\
&= -\lambda \| \nabla f(x) \|^2 \left( \frac{1- \alpha}{M} - \frac{\lambda M}{2 m^2} \right)
\end{align}$$
Here, the Armijo condition
$$
f(x_i + \beta^k h_i) - f(x_i) \leq \beta^k \alpha \langle h_i, \nabla f(x_i) \rangle \Rightarrow f(x_i + \beta^k h_i) - f(x_i) - \beta^k \alpha \langle h_i, \nabla f(x_i) \rangle \leq 0$$
is guaranteed by the bound above whenever
$$
\left( \frac{1- \alpha}{M} - \frac{\lambda M}{2 m^2} \right) \geq 0.$$
Therefore, since $\lambda > 0$ by definition, for any $\hat{\lambda}$ with $0 < \hat{\lambda} \leq \frac{2m^2 (1 - \alpha)}{M^2}$,
$$
\forall \lambda \in (0, \hat{\lambda}], \;\; f(x_i + \lambda h(x_i)) - f(x_i) \leq \lambda \alpha \langle h(x_i), \nabla f(x_i) \rangle \tag{2}$$
is satisfied.
Moreover, since the Armijo rule selects the largest admissible $\beta^k$, we have $\lambda_i \geq \beta \hat{\lambda}$, so $\lambda_i$ is bounded below and above.
Now, since $\nabla f(\hat{x}) \neq 0$, we have $\langle h(\hat{x}), \nabla f(\hat{x}) \rangle = -\delta < 0$ (immediate from (2) with $\lambda$ chosen as above, because $h$ is a descent direction), and hence
$$
\exists i_0 \;\; \text{ such that } \;\; \langle h_i, \nabla f(x_i) \rangle \leq -\frac{\delta}{2}$$
by continuity of $h(\cdot)$ and $\nabla f(\cdot)$ and since $x_i \overset{K}{\rightarrow} \hat{x}$. Then, for all $i \geq i_0$,
$$
f(x_{i+1}) - f(x_i) \leq \lambda_i \alpha \langle \nabla f(x_i), h_i \rangle \leq \beta \hat{\lambda} \alpha \left( -\frac{\delta}{2} \right) < 0$$
Then $f(x_i)$ decreases by at least a fixed amount at every step $i \geq i_0$, i.e. $f(x_i) \rightarrow -\infty$, contradicting the fact that $f$ is bounded below on the compact set $L_r$.
Therefore $\nabla f(\hat{x}) = 0$.
Proof of quadratic convergence
- The idea is to exploit the fact that the local Newton method converges quadratically.
- Local Newton method
$$
\begin{align}
h_i &= -H^{-1}(x_i) \nabla f(x_i) \\
x_{i+1} &= x_i + h_i
\end{align}$$
- Armijo Newton
$$
\begin{align}
h_i &= -H^{-1}(x_i) \nabla f(x_i) \\
x_{i+1} &= x_i + \lambda_i h_i
\end{align}$$
The two algorithms coincide when $\lambda_i = 1$.
We need to prove that
$$
\exists i_0 \;\; \text{ such that } \;\; \forall i \geq i_0, \; \lambda_i = 1$$
Setting $\lambda = 1$,
$$
f(x_i + h_i) - f(x_i) - \alpha \langle \nabla f(x_i), h_i \rangle = \langle \nabla f(x_i), h_i \rangle + \int_0^1 (1-s) \langle h_i, H(x_i + sh_i) h_i \rangle ds - \alpha \langle \nabla f(x_i), h_i \rangle \tag{3}$$
Assumption (a) holds: $H(\cdot)$ is locally Lipschitz continuous, i.e.
$$
\| H(x') - H(x'') \| \leq L \| x' - x'' \| \;\;\; \forall x', x'' \in S \subset \mathbb{R}^n$$
where $S$ is a bounded set.
Since $x_i \rightarrow \hat{x}$ and $\nabla f(\cdot)$ is continuous, $\nabla f(x_i) \rightarrow \nabla f(\hat{x}) = 0$
Then write
$$
\begin{align}
\langle \nabla f(x_i), h_i \rangle &= \frac{1}{2} \langle \nabla f(x_i), h_i \rangle + \frac{1}{2} \langle \nabla f(x_i), h_i \rangle \\
&= -\frac{1}{2} \langle \nabla f(x_i), H^{-1}(x_i)\nabla f(x_i) \rangle + \int_0^1 (1-s)ds\, \langle \nabla f(x_i), h_i \rangle \\
&= -\frac{1}{2} \langle \nabla f(x_i), H^{-1}(x_i)\nabla f(x_i) \rangle - \int_0^1 (1-s) \langle \nabla f(x_i), H^{-1}(x_i)\nabla f(x_i) \rangle ds \\
&= -\frac{1}{2} \langle \nabla f(x_i), H^{-1}(x_i)\nabla f(x_i) \rangle - \int_0^1 (1-s) \langle H(x_i) H^{-1}(x_i) \nabla f(x_i), H^{-1}(x_i)\nabla f(x_i) \rangle ds\\
&= -\frac{1}{2} \langle \nabla f(x_i), H^{-1}(x_i)\nabla f(x_i) \rangle - \int_0^1 (1-s) \langle H^{-1}(x_i) \nabla f(x_i), H(x_i) H^{-1}(x_i)\nabla f(x_i) \rangle ds
\end{align}
\tag{4}$$
Substituting (4) into (3), the right-hand side becomes
$$
\begin{align}
&-\frac{1}{2} \langle \nabla f(x_i), H^{-1}(x_i)\nabla f(x_i) \rangle + \alpha \langle \nabla f(x_i), H^{-1}(x_i)\nabla f(x_i) \rangle \\
&+ \int_0^1(1-s) \langle H^{-1}(x_i)\nabla f(x_i), \left( H(x_i + s h_i) - H(x_i) \right) H^{-1}(x_i) \nabla f(x_i) \rangle ds
\end{align}
\tag{5}$$
Since $\alpha \in (0, \frac{1}{2})$ and $m\|y\|^2 \leq \langle y, H(x)y \rangle \leq M \|y\|^2$,
$$
\begin{align}
(5) &= -\left(\frac{1}{2} - \alpha\right) \langle \nabla f(x_i), H^{-1}(x_i)\nabla f(x_i) \rangle + \int_0^1(1-s) \langle h_i, ( H(x_i + s h_i) - H(x_i)) h_i\rangle ds \\
&\leq -\left(\frac{1}{2} - \alpha\right) \frac{1}{M} \| \nabla f(x_i) \|^2 + \int_0^1(1-s) \| H^{-1}(x_i) \|^2 \| \nabla f(x_i) \|^2 \| H(x_i + s h_i) - H(x_i) \| ds \\
&\leq -\left(\frac{1}{2} - \alpha\right) \frac{1}{M} \| \nabla f(x_i) \|^2 + \int_0^1(1-s) \| H^{-1}(x_i) \|^2 \| \nabla f(x_i) \|^2 \cdot L \cdot s \| h_i \| ds \\
&\leq -\left(\frac{1}{2} - \alpha\right) \frac{1}{M} \| \nabla f(x_i) \|^2 + \int_0^1 s(1-s) ds \cdot L \cdot \frac{\| \nabla f(x_i) \|^3}{m^3} \;\;\;\;\;\; \because \|H^{-1}(x_i) \| \leq \frac{1}{m} \\
&= \| \nabla f(x_i) \|^2 \left( -\frac{1}{M}\left(\frac{1}{2} - \alpha\right) + \frac{L}{6m^3} \| \nabla f(x_i) \| \right)
\end{align}
\tag{6}$$
Since $\nabla f(x_i) \rightarrow \nabla f(\hat{x}) = 0$, it is possible to find an $i_0$ such that, for all $i \geq i_0$,
$$
\| \nabla f(x_i) \| \leq \frac{6m^3}{L M}\left(\frac{1}{2} - \alpha\right).$$
This implies that
$$
\exists i_0 \in \mathbb{N} \;\;\text{ such that } \;\; f(x_i + h_i) - f(x_i) - \alpha \langle \nabla f(x_i), h_i \rangle \leq 0 \;\;\forall i \geq i_0$$
i.e. the full step $\lambda_i = 1$ is accepted by the Armijo rule for all $i \geq i_0$.
Equation (6) shows that the Armijo-Newton method converges quadratically to the local minimizer (because of the $\| \nabla f(x_i) \|^2$ factor).
The remainder of the proof is identical to that of the local Newton method.
Drawbacks of Newton's Method
In $h_i = -H^{-1}(x_i) \nabla f(x_i)$:
- Does $H^{-1}(x_i)$ exist for every $x_i$?
- Must we compute $H^{-1}(x_i)$ and $\nabla f(x_i)$ at every iteration?
Solution
Let $g_i \equiv \nabla f(x_i)$ be applied to the update rule $x_{i+1} = x_i - \alpha_i M_i g_i$ for some matrix $M_i$.
Consider a Taylor expansion of $f(\cdot)$ at $x = x_i$:
$$
f(x_{i+1}) = f(x_i) + \langle \nabla f(x_i), x_{i+1} - x_i \rangle + O(\|x_{i+1} - x_i\|^2) \\
\Rightarrow f(x_{i+1}) - f(x_i) = \langle g_i, -\alpha_i M_i g_i \rangle + O(\|x_{i+1} - x_i\|^2) = -\alpha_i \langle g_i, M_i g_i \rangle + O(\|x_{i+1} - x_i\|^2) < 0$$
For this inequality to hold (for small enough $\alpha_i$), $M_i$ must be positive definite.
Let $M_i = I$; then $x_{i+1} = x_i - \alpha_i g_i$ is plain gradient descent, which converges only linearly.
To combine this idea with the Armijo-Newton method, let
$$
M_i = (\varepsilon_i I + H(x_i))^{-1}$$
- Then how should $\varepsilon_i$ be chosen?
- Pick any $\delta > 0$ and choose $\varepsilon_i \geq 0$ so that $\varepsilon_i + \lambda_{\min}(H(x_i)) \geq \delta$, where $\lambda_{\min}(H(x_i))$ is the smallest eigenvalue of $H(x_i)$; this makes $\varepsilon_i I + H(x_i)$ positive definite (see the sketch below).
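A sketch of this regularization, under the reading above that $\varepsilon_i$ is chosen so the smallest eigenvalue of $\varepsilon_i I + H(x_i)$ is at least $\delta$ (computing the eigenvalue directly is one straightforward way to do this, not necessarily the cheapest):

```python
import numpy as np

def regularized_newton_direction(g, H, delta=1e-3):
    """Direction h = -(eps*I + H)^{-1} g, with eps >= 0 chosen so that
    eps*I + H has all eigenvalues >= delta, hence is positive definite."""
    lam_min = np.linalg.eigvalsh(H).min()   # smallest eigenvalue of H
    eps = max(0.0, delta - lam_min)         # shift only if H is too flat
    return np.linalg.solve(eps * np.eye(len(g)) + H, -g)
```

Far from the minimizer, where $H(x_i)$ may be indefinite or nearly singular, the shift makes the step behave like a damped gradient step; near the minimizer $\varepsilon_i = 0$, so the pure Newton direction, and with it quadratic convergence, is recovered.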