Commit d2d9f76
Docs improvement, sparsity and symbolic AD (#141)
* docs and breaking changes
* minor changes
* dep version bump
* refactor
* add docs links
* fix docs links
* remove pairs in content table
* fix headers
* use quote
* add repo link
* remove an AD remark
1 parent f03f541 commit d2d9f76

23 files changed: +720 additions, −313 deletions

Project.toml (3 additions, 3 deletions)

```diff
@@ -1,7 +1,7 @@
 name = "Nonconvex"
 uuid = "01bcebdf-4d21-426d-b5c4-6132c1619978"
 authors = ["Mohamed Tarek <[email protected]> and contributors"]
-version = "1.0.4"
+version = "2.0.0"

 [deps]
 NonconvexCore = "035190e5-69f1-488f-aaab-becca2889735"
@@ -10,8 +10,8 @@ Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
 Reexport = "189a3867-3050-52da-a836-e630ba90ab69"

 [compat]
-NonconvexCore = "1"
-NonconvexUtils = "0.2"
+NonconvexCore = "1.1"
+NonconvexUtils = "0.4"
 Reexport = "1"
 julia = "1"
```

README.md (48 additions, 0 deletions)

@@ -29,6 +29,54 @@ The `JuliaNonconvex` organization hosts a number of packages which are available

| [NonconvexUtils.jl](https://github.com/JuliaNonconvex/NonconvexUtils.jl) | Some utility functions for automatic differentiation, history tracing, implicit functions and more. | [![Build Status](https://github.com/JuliaNonconvex/NonconvexUtils.jl/workflows/CI/badge.svg)](https://github.com/JuliaNonconvex/NonconvexUtils.jl/actions) | [![Coverage](https://codecov.io/gh/JuliaNonconvex/NonconvexUtils.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/JuliaNonconvex/NonconvexUtils.jl) |
| [NonconvexTOBS.jl](https://github.com/JuliaNonconvex/NonconvexTOBS.jl) | Binary optimization algorithm called "topology optimization of binary structures" ([TOBS](https://www.sciencedirect.com/science/article/abs/pii/S0168874X17305619?via%3Dihub)) which was originally developed in the context of optimal distribution of material in mechanical components. | [![Build Status](https://github.com/JuliaNonconvex/NonconvexTOBS.jl/workflows/CI/badge.svg)](https://github.com/JuliaNonconvex/NonconvexTOBS.jl/actions) | [![Coverage](https://codecov.io/gh/JuliaNonconvex/NonconvexTOBS.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/JuliaNonconvex/NonconvexTOBS.jl) |

Added:

## Design philosophy

Nonconvex.jl is a Julia package that implements and wraps a number of constrained nonlinear and mixed integer nonlinear programming solvers. There are three unique features of Nonconvex.jl compared to similar packages such as JuMP.jl and NLPModels.jl:

1. An emphasis on a function-based API: objectives and constraints are normal Julia functions.
2. The ability to nest algorithms to create more complicated algorithms.
3. The ability to automatically handle structs and different container types in the decision variables by vectorizing and un-vectorizing them in an AD-compatible way.

## Installing Nonconvex

To install Nonconvex.jl, open a Julia REPL and type `]` to enter the package mode. Then run:
```julia
add Nonconvex
```

Alternatively, copy and paste the following code into a Julia REPL:
```julia
using Pkg; Pkg.add("Nonconvex")
```

## Loading Nonconvex

To load and start using Nonconvex.jl, run:
```julia
using Nonconvex
```

## Quick example

```julia
using Nonconvex
Nonconvex.@load NLopt

f(x) = sqrt(x[2])
g(x, a, b) = (a*x[1] + b)^3 - x[2]

model = Model(f)
addvar!(model, [0.0, 0.0], [10.0, 10.0])
add_ineq_constraint!(model, x -> g(x, 2, 0))
add_ineq_constraint!(model, x -> g(x, -1, 1))

alg = NLoptAlg(:LD_MMA)
options = NLoptOptions()
r = optimize(model, alg, [1.0, 1.0], options = options)
r.minimum # objective value
r.minimizer # decision variables
```

## How to contribute?

**A beginner?** The easiest way to contribute is to read the documentation, test the package and report issues.

docs/make.jl (3 additions, 2 deletions)

```diff
@@ -5,7 +5,8 @@ makedocs(
     sitename="Nonconvex.jl",
     pages = [
         "Getting started" => "index.md",
-        "Problem definition" => "problem.md",
+        "Problem definition" => "problem/problem.md",
+        "Gradients, Jacobians and Hessians" => "gradients/gradients.md",
         "Algorithms" => [
             "Overview" => "algorithms/algorithms.md",
             "algorithms/mma.md",
@@ -18,7 +19,7 @@ makedocs(
             "algorithms/mts.md",
             "algorithms/sdp.md",
         ],
-        "Gradients, Jacobians and Hessians" => "gradients.md",
+        "Optimization result" => "result.md"
     ],
 )
```
docs/src/algorithms/ipopt.md (3 additions, 2 deletions)

````diff
@@ -27,10 +27,11 @@ alg = IpoptAlg()

 The options keyword argument to the `optimize` function shown above must be an instance of the `IpoptOptions` struct when the algorithm is an `IpoptAlg`. To specify options, use keyword arguments in the constructor of `IpoptOptions`, e.g.:
 ```julia
-options = IpoptOptions(first_order = false, tol = 1e-4)
+options = IpoptOptions(first_order = false, tol = 1e-4, sparse = false)
 ```
-There are 2 important and special options:
+There are 3 important and special options:
 - `first_order`: `true` by default. When `first_order` is `true`, the first order Ipopt algorithm will be used; when it is `false`, the second order Ipopt algorithm will be used.
+- `sparse`: `false` by default. When `sparse` is set to `true`, the gradients, Jacobians and Hessians of the objective, constraint and Lagrangian functions will be treated as sparse vectors/matrices. To be effective, this should be combined with custom gradient/Hessian rules, sparsification or symbolification of some functions in the model. For more on custom gradients, sparsification and symbolification, see the [gradients section](../gradients/gradients.md) of the documentation.
 - `linear_constraints`: `false` by default. When `linear_constraints` is `true`, the Jacobian of the constraints will be computed and sparsified once at the beginning. When it is `false`, dense Jacobians will be computed in every iteration.

 All the other options that can be set can be found in the [Ipopt options](https://coin-or.github.io/Ipopt/OPTIONS.html) section of Ipopt's documentation.
````
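To illustrate how the `sparse` option might be combined with sparsification of a function, here is a minimal sketch. The model is purely illustrative and the `sparsify` call is an assumption about the API (it may require example inputs or extra keyword arguments); see the gradients section of the documentation for the authoritative usage.

```julia
# Sketch: second-order Ipopt with sparse derivative handling enabled.
# `sparsify(f, x; hessian = true)` is a hypothetical usage, not verified here.
using Nonconvex
Nonconvex.@load Ipopt

f(x) = sum(abs2, x)                          # separable => sparse Hessian
x0 = fill(0.5, 3)

model = Model(sparsify(f, x0; hessian = true))
addvar!(model, fill(-2.0, 3), fill(2.0, 3))
add_ineq_constraint!(model, x -> [x[1]^2 - 1.0])

# `sparse = true` tells Ipopt to treat the derivatives as sparse
options = IpoptOptions(first_order = false, sparse = true)
r = optimize(model, IpoptAlg(), x0, options = options)
```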

docs/src/algorithms/minlp.md (2 additions, 2 deletions)

````diff
@@ -21,7 +21,7 @@ alg = JuniperIpoptAlg()
 options = JuniperIpoptOptions()
 result = optimize(model, alg, x0, options = options)
 ```
-Juniper is an optional dependency of Nonconvex, so you need to load it in order to use it. Note that the integer constraints must be specified when defining variables. See the [problem definition](../problem.md) documentation for more details.
+Juniper is an optional dependency of Nonconvex, so you need to load it in order to use it. Note that the integer constraints must be specified when defining variables. See the [problem definition](../problem/problem.md) documentation for more details.

 ### Construct an instance

@@ -56,7 +56,7 @@ alg = PavitoIpoptCbcAlg()
 options = PavitoIpoptCbcOptions()
 result = optimize(model, alg, x0, options = options)
 ```
-Pavito is an optional dependency of Nonconvex, so you need to load it in order to use it. Note that the integer constraints must be specified when defining variables. See the [problem definition](../problem.md) documentation for more details.
+Pavito is an optional dependency of Nonconvex, so you need to load it in order to use it. Note that the integer constraints must be specified when defining variables. See the [problem definition](../problem/problem.md) documentation for more details.

 ### Construct an instance
````
docs/src/gradients.md (deleted; 69 deletions)

New file (7 additions):

# Using ChainRules in ForwardDiff

`ForwardDiff` is a forward-mode AD package that pre-dates `ChainRules`, so it does not use the `frule`s defined in `ChainRules`. To force `ForwardDiff` to use the `frule` defined for a function, one can use the `Nonconvex.NonconvexUtils.@ForwardDiff_frule` macro provided in `Nonconvex`. This is useful when `ForwardDiff` is used for the entire function but a component of that function has an efficient `frule` defined that you want to take advantage of. To force `ForwardDiff` to use the `frule` defined for a function `f(x::AbstractVector)`, you can use:
```julia
Nonconvex.NonconvexUtils.@ForwardDiff_frule f(x::AbstractVector{<:ForwardDiff.Dual})
```
The signature of the function specifies the method that will be redirected to use the `frule` from `ChainRules`. Such an `frule` therefore needs to be defined for `f` to begin with. Functions with multiple inputs, scalar inputs and other input collection types are also supported.
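As a sketch of how the pieces fit together, the following defines an `frule` for an illustrative function and then redirects `ForwardDiff` to it. `mysum` and its rule are assumptions made for the example, not part of `Nonconvex`:

```julia
# Sketch: define an frule for a custom function, then force ForwardDiff
# to use it via the macro described above.
using Nonconvex, ForwardDiff
import ChainRulesCore

mysum(x::AbstractVector) = sum(x)

# frule: the forward tangent of sum(x) in direction Δx is sum(Δx)
function ChainRulesCore.frule((_, Δx), ::typeof(mysum), x::AbstractVector)
    return mysum(x), sum(Δx)
end

# Redirect ForwardDiff's Dual-based differentiation of `mysum` to the frule
Nonconvex.NonconvexUtils.@ForwardDiff_frule mysum(x::AbstractVector{<:ForwardDiff.Dual})
```

After this, `ForwardDiff.gradient(mysum, x)` goes through the `frule` rather than tracing `sum` element by element with `Dual` numbers.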

docs/src/gradients/gradients.md (new file; 19 additions)

# Gradients, Jacobians and Hessians

By default, `Nonconvex` uses:
- The reverse-mode automatic differentiation (AD) package [`Zygote.jl`](https://github.com/FluxML/Zygote.jl) for computing gradients and Jacobians of functions, and
- The forward-mode AD package [`ForwardDiff.jl`](https://github.com/JuliaDiff/ForwardDiff.jl) over `Zygote.jl` for computing Hessians.

However, one can force `Nonconvex` to use other AD packages or even user-defined gradients and Hessians using special function modifiers. These function modifiers customize the behaviour of individual functions without enforcing the same behaviour on other functions. For instance:
- A specific AD package can be used for one constraint function while the default AD packages are used for the other functions in the optimization problem.
- The history of gradients of a specific function can be stored without storing the gradients of all the other functions.
- For functions with a sparse Jacobian or Hessian, the sparsity can be used to speed up the AD using sparse, forward-mode AD for these functions.

In some cases, function modifiers can even be composed on top of each other to create more complex behaviours.

> In `Nonconvex`, function modifiers modify the behaviour of a function when differentiated once or twice using either `ForwardDiff` or any [`ChainRules`](https://github.com/JuliaDiff/ChainRules.jl)-compatible AD package, such as `Zygote.jl`. The following features are all implemented in [`NonconvexUtils.jl`](https://github.com/JuliaNonconvex/NonconvexUtils.jl) and re-exported from `Nonconvex`.

```@contents
Pages = ["user_defined.md", "other_ad.md", "chainrules_fd.md", "sparse.md", "symbolic.md", "implicit.md", "history.md"]
Depth = 3
```

docs/src/gradients/history.md (new file; 8 additions)

# Storing history of gradients

Often one may want to store intermediate solutions, function values and gradients for visualisation or post-processing. This is currently not possible with callbacks in `Nonconvex.jl`, as not all solvers support a callback mechanism. To work around this, the `TraceFunction` modifier can be used to store input, output and optionally gradient values during the optimization:
```julia
F = TraceFunction(f; on_call = false, on_grad = true)
```
`F` can now be used in place of `f` in objective and/or constraint functions in a `Nonconvex` model. If the `on_call` keyword argument is `true` (the default), the input and output values are stored every time the function `F` is called. If the `on_grad` keyword argument is `true` (the default), the input, output and gradient values are stored every time the function `F` is differentiated with either `ForwardDiff` or any `ChainRules`-compatible AD package such as `Zygote.jl`. The history is stored in `F.trace`. The `TraceFunction` modifier can be composed with other AD-centric function modifiers in `Nonconvex`, e.g. the `sparsify` or `symbolify` function modifiers.
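A minimal sketch of the workflow, assuming `Nonconvex` and `Zygote` are loaded (the exact layout of the entries in `F.trace` is not specified here):

```julia
# Sketch: record gradient evaluations of a simple quadratic objective.
using Nonconvex, Zygote

f(x) = sum(abs2, x)                   # function to trace
F = TraceFunction(f; on_call = false, on_grad = true)

Zygote.gradient(F, [1.0, 2.0])        # recorded, because on_grad = true
F([1.0, 2.0])                         # not recorded, because on_call = false

F.trace                               # history of the recorded evaluations
```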

docs/src/gradients/implicit.md (new file; 91 additions)

# Implicit differentiation

## Background

Differentiating implicit functions efficiently using the implicit function theorem has many applications, including:
- Nonlinear partial differential equation constrained optimization
- Differentiable optimization layers in deep learning (aka deep declarative networks)
- Differentiable fixed point iteration algorithms for optimal transport (e.g. the Sinkhorn methods)
- Gradient-based bi-level and robust optimization (aka anti-optimization)
- Multi-parametric programming (aka optimization sensitivity analysis)

For more on implicit differentiation, refer to the last part of the [_Understanding automatic differentiation (in Julia)_](https://www.youtube.com/watch?v=UqymrMG-Qi4) video on YouTube and the [_Efficient and modular implicit differentiation_](https://arxiv.org/abs/2105.15183) manuscript for an introduction to the methods implemented here.

## Relationship to [`ImplicitDifferentiation.jl`](https://github.com/gdalle/ImplicitDifferentiation.jl)

[`ImplicitDifferentiation.jl`](https://github.com/gdalle/ImplicitDifferentiation.jl) is an attempt to simplify the implementation in `Nonconvex`, making it more lightweight and better documented. For instance, the [documentation of `ImplicitDifferentiation`](https://gdalle.github.io/ImplicitDifferentiation.jl/) presents a number of examples of implicit functions, all of which can be defined and used using `Nonconvex`.

## Explicit parameters

There are 4 components to any implicit function:
1. The parameters `p`
2. The variables `x`
3. The residual `f(p, x)`, which is used to define `x(p)` as the `x` which satisfies `f(p, x) == 0` for a given value of `p`
4. The algorithm used to evaluate `x(p)` satisfying the condition `f(p, x) == 0`

In order to define a differentiable implicit function using `Nonconvex`, you have to specify the "forward" algorithm which finds `x(p)`. For instance, consider the following example:
```julia
using SparseArrays, NLsolve, Zygote, Nonconvex

N = 10
A = spdiagm(0 => fill(10.0, N), 1 => fill(-1.0, N-1), -1 => fill(-1.0, N-1))
p0 = randn(N)

f(p, x) = A * x + 0.1 * x.^2 - p
function forward(p)
    # Solving nonlinear system of equations
    sol = nlsolve(x -> f(p, x), zeros(N), method = :anderson, m = 10)
    # Return the zero found (ignore the second returned value for now)
    return sol.zero, nothing
end
```
`forward` above solves for `x` in the nonlinear system of equations `f(p, x) == 0` given the value of `p`. In this case, the residual function is the same as the function `f(p, x)` used in the forward pass. One can then use the 2 functions `forward` and `f` to define an implicit function using:
```julia
imf = ImplicitFunction(forward, f)
xstar = imf(p0)
```
where `imf(p0)` solves the nonlinear system for `p = p0` and returns the zero `xstar` of the nonlinear system. This function can now be part of any arbitrary Julia function differentiated by Zygote, e.g. it can be part of an objective function in an optimization problem using gradient-based optimization:
```julia
obj(p) = sum(imf(p))
g = Zygote.gradient(obj, p0)[1]
```

In the implicit function's adjoint rule definition, the partial Jacobian `∂f/∂x` is used according to the implicit function theorem. Often this Jacobian, or a good approximation of it, may be a by-product of the `forward` function. For example, when the `forward` function does an optimization using a BFGS-based approximation of the Hessian of the Lagrangian function, the final BFGS approximation can be a good approximation of `∂f/∂x`, where the residual `f` is the gradient of the Lagrangian function with respect to `x`. In those cases, this Jacobian by-product can be returned as the second output of `forward` instead of `nothing`.

## Implicit parameters

In some cases, it may be more convenient to avoid having to specify `p` as an explicit argument to `forward` and `f`. The following is also valid to use and will give correct gradients with respect to `p`:
```julia
function obj(p)
    N = length(p)
    f(x) = A * x + 0.1 * x.^2 - p
    function forward()
        # Solving nonlinear system of equations
        sol = nlsolve(f, zeros(N), method = :anderson, m = 10)
        # Return the zero found (ignore the second returned value for now)
        return sol.zero, nothing
    end
    imf = ImplicitFunction(forward, f)
    return sum(imf())
end
g = Zygote.gradient(obj, p0)[1]
```
Notice that `p` was not an explicit argument to `f` or `forward` in the above example and that the implicit function is called using `imf()`. Using some explicit parameters and some implicit parameters is also supported.

## Matrix-free linear solver in the adjoint

In the adjoint definition of implicit functions, a linear system:
```julia
(∂f/∂x) * u = v
```
is solved to find the adjoint vector. To solve the system using a matrix-free iterative solver (GMRES by default) that avoids constructing the Jacobian `∂f/∂x`, you can set the `matrixfree` keyword argument to `true` (the default is `false`). When `matrixfree` is set to `false`, the entire Jacobian matrix is formed and the linear system is solved using LU factorization.

## Arbitrary data structures

Both `p` and `x` above can be arbitrary data structures, not just arrays of numbers.

## Tolerance

The implicit function theorem assumes that the condition `f(p, x) == 0` is satisfied. In practice, it will only be approximately satisfied. When this condition is violated, the gradient reported by the implicit function theorem cannot be trusted since its assumption is violated. The maximum tolerance allowed to "accept" the solution `x(p)` and the gradient is given by the keyword argument `tol` (the default value is `1e-5`). When the norm of the residual `f(p, x)` is greater than this tolerance, `NaN`s are returned for the gradient instead of the value computed via the implicit function theorem. If, additionally, the keyword argument `error_on_tol_violation` is set to `true` (the default value is `false`), an error is thrown if the norm of the residual exceeds the specified tolerance `tol`.
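The tolerance options described above can be sketched as follows, reusing the nonlinear system from the explicit-parameters example (repeated here so the snippet is self-contained):

```julia
# Sketch: pass `tol` and `error_on_tol_violation` to the constructor so
# inaccurate solves raise an error instead of silently returning NaNs.
using SparseArrays, NLsolve, Nonconvex

N = 10
A = spdiagm(0 => fill(10.0, N), 1 => fill(-1.0, N-1), -1 => fill(-1.0, N-1))
p0 = randn(N)

f(p, x) = A * x + 0.1 * x.^2 - p
function forward(p)
    sol = nlsolve(x -> f(p, x), zeros(N), method = :anderson, m = 10)
    return sol.zero, nothing
end

imf = ImplicitFunction(forward, f; tol = 1e-6, error_on_tol_violation = true)
xstar = imf(p0)  # gradients through `xstar` are now guarded by `tol`
```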
