# L-BFGS

L-BFGS is an algorithm for quasi-Newton optimization. L-BFGS stands for 'limited-memory BFGS': it uses the Broyden–Fletcher–Goldfarb–Shanno update to approximate the Hessian matrix. L-BFGS is particularly well suited to optimization problems with a large number of dimensions, because it never explicitly forms or stores the Hessian matrix, which becomes quite expensive when the number of dimensions $n\,\!$ is large. Instead, L-BFGS maintains a history of the past $m\,\!$ updates of the position $x\,\!$ and gradient $\nabla f(x)$, where the history length $m\,\!$ can generally be short, often less than 10. These updates are used to implicitly carry out operations requiring the Hessian (or its inverse). Strictly speaking, a straightforward BFGS implementation at the $i\,\!$th iteration would represent the Hessian approximation as informed by all updates $0 \ldots i-1$; L-BFGS does quite well using updates from only the most recent iterations $i-m \ldots i-1$.

## Representation

An L-BFGS implementation looks much like any other straightforward quasi-Newton algorithm, except for the way the direction $d_k=-H_k g_k\,\!$ is obtained. There are multiple published approaches to using a history of updates to form this direction vector. Here we give a common approach, the so-called "two-loop recursion", initially given by [1] and [2].

We take as given $x_k\,\!$, the position at the $k\,\!$th iteration, and $g_k=\nabla f(x_k)$, where $f\,\!$ is the function being minimized; both are column vectors. We keep the updates $s_k = x_k - x_{k-1}\,\!$ and $y_k = g_k - g_{k-1}\,\!$, and define $r_k = \frac{1}{y^T_k s_k}$. $H^0_k\,\!$ is the 'initial' approximate inverse Hessian that our estimate at iteration $k\,\!$ begins with. We can then compute the (uphill) direction $z = H_k g_k\,\!$ as follows, indexing the $m\,\!$ stored updates $0 \ldots m-1$ with $m-1$ the most recent:
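As an illustration, the $(s_k, y_k, r_k)$ triples above can be kept in a fixed-length buffer so that only the most recent $m$ updates survive; this is a minimal sketch (function and variable names are ours, not from any particular library):

```python
from collections import deque

import numpy as np

m = 5  # history length; often less than 10
history = deque(maxlen=m)  # automatically discards the oldest (s, y, r) triple

def record_update(history, x_prev, x_curr, g_prev, g_curr):
    """Store one (s_k, y_k, r_k) triple built from consecutive iterates."""
    s = x_curr - x_prev   # s_k = x_k - x_{k-1}
    y = g_curr - g_prev   # y_k = g_k - g_{k-1}
    r = 1.0 / y.dot(s)    # r_k = 1 / (y_k^T s_k)
    history.append((s, y, r))
```

Because the deque is bounded, recording an update at every iteration costs $O(n)$ memory per stored pair rather than the $O(n^2)$ a dense Hessian approximation would need.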

$q = g_k\,\!$
For $i = m-1, \ldots, 0$:
&nbsp;&nbsp;$a_i = r_i s^T_i q\,\!$
&nbsp;&nbsp;$q = q - a_i y_i\,\!$
$z = H^{0}_k q$
For $i = 0, \ldots, m-1$:
&nbsp;&nbsp;$b_i = r_i y^T_i z\,\!$
&nbsp;&nbsp;$z = z + s_i (a_i - b_i)\,\!$
$H_k g_k = z\,\!$

Commonly, the initial approximate inverse Hessian $H^0_k\,\!$ is represented as a diagonal matrix, so that initially setting $z\,\!$ requires only an element-by-element multiplication.
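The two-loop recursion above fits in a few lines of code. The following is a minimal sketch (the function name is ours), assuming the last $m$ updates are stored as $(s_i, y_i, r_i)$ triples with the most recent last, and that $H^0_k$ is given by its diagonal:

```python
import numpy as np

def two_loop_recursion(g_k, history, h0_diag):
    """Compute z = H_k g_k via the two-loop recursion.

    history: list of (s_i, y_i, r_i) triples, most recent update last
    h0_diag: diagonal of the initial inverse Hessian H_k^0
    """
    q = g_k.copy()
    a = [0.0] * len(history)
    # First loop: from the newest update (i = m-1) down to the oldest (i = 0)
    for i in range(len(history) - 1, -1, -1):
        s_i, y_i, r_i = history[i]
        a[i] = r_i * s_i.dot(q)
        q -= a[i] * y_i
    # Diagonal H_k^0: element-by-element multiplication
    z = h0_diag * q
    # Second loop: from the oldest update up to the newest
    for i in range(len(history)):
        s_i, y_i, r_i = history[i]
        b_i = r_i * y_i.dot(z)
        z += (a[i] - b_i) * s_i
    return z  # z = H_k g_k; the search direction is -z
```

With an empty history the recursion reduces to $z = H^0_k g_k$, and with a single stored update it reproduces one exact BFGS inverse update applied to $g_k$.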

Unfortunately, this two-loop update works only for the inverse Hessian. Approaches to implementing L-BFGS using the direct approximate Hessian $B_k\,\!$ have also been developed, as have other means of approximating the inverse Hessian [3].

## Implementation and variants

An early, open-source implementation of L-BFGS in Fortran exists. Multiple other open-source implementations have been produced as translations of this Fortran code (e.g. one in Java). Other implementations exist (e.g. in Matlab), frequently as part of generic optimization libraries (e.g. Mathematica). A variant that can handle box constraints on the variables, L-BFGS-B, also exists as ACM TOMS algorithm 778,[4] and can be called from R's optim general-purpose optimizer routine by using method="L-BFGS-B".[5]
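As a concrete usage example from one such generic library (our choice for illustration), SciPy's minimize routine wraps the L-BFGS-B Fortran code; a minimal sketch with an analytic gradient and box constraints:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Simple convex quadratic with minimum at (3, -1)
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2

def grad_f(x):
    return np.array([2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)])

# Box-constrained minimization with L-BFGS-B; supplying the gradient via
# jac= avoids finite-difference approximations of g_k.
res = minimize(f, x0=np.zeros(2), jac=grad_f, method="L-BFGS-B",
               bounds=[(0.0, 10.0), (-10.0, 10.0)])
```

Here the unconstrained minimum lies inside the box, so the bound-constrained solution coincides with it.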

## Works cited

1. H. Matthies and G. Strang. The solution of non linear finite element equations (1979), International Journal for Numerical Methods in Engineering 14, pp. 1613-1626.
2. J. Nocedal. Updating Quasi-Newton Matrices with Limited Storage (1980), Mathematics of Computation 35, pp. 773-782.
3. R. H. Byrd, J. Nocedal and R. B. Schnabel. Representations of Quasi-Newton Matrices and their Use in Limited Memory Methods (1994), Mathematical Programming, 63, 4, pp. 129-156.
4. C. Zhu, R. H. Byrd and J. Nocedal. L-BFGS-B: Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization (1997), ACM Transactions on Mathematical Software, Vol 23, Num. 4, pp. 550-560
5.

## Works referenced

1. D. C. Liu and J. Nocedal. On the Limited Memory Method for Large Scale Optimization (1989), Mathematical Programming B, 45, 3, pp. 503-528.
2. R. H. Byrd, P. Lu and J. Nocedal. A Limited Memory Algorithm for Bound Constrained Optimization (1995), SIAM Journal on Scientific and Statistical Computing, 16, 5, pp. 1190-1208.