Commit a9ab220

Add WoodburyEstimator for high dimensionality (#94)
* Add `WoodburyEstimator` for high dimensionality

  When the covariance matrix is too large to handle in full, this provides the option to model it as `Σ = σ²I + U * Λ * U'` for some low-rank `U` and diagonal `Λ`.

* Address review comments

* Fix errors

  I swapped two of the norms (whoops) and made one typo.

* Add complete docs on Woodbury

* Fix c formula
1 parent 3c916a6 commit a9ab220

12 files changed: +411 −4 lines changed

Project.toml

Lines changed: 5 additions & 1 deletion
@@ -1,17 +1,21 @@
 name = "CovarianceEstimation"
 uuid = "587fd27a-f159-11e8-2dae-1979310e6154"
 authors = ["Mateusz Baran <[email protected]>", "Thibaut Lienart"]
-version = "0.2.11"
+version = "0.2.12"

 [deps]
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
 StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
+TSVD = "9449cd9e-2762-5aa3-a617-5413e99d722e"
+WoodburyMatrices = "efce3f68-66dc-5838-9240-27a6d6f5f9b6"

 [compat]
 LinearAlgebra = "1"
 Statistics = "1"
 StatsBase = "0.33, 0.34"
+WoodburyMatrices = "1"
+TSVD = "0.4"
 julia = "1.6"

 [extras]
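Since two dependencies are new here, a brief orientation sketch may help (not part of the diff; the data and values are illustrative): `TSVD.tsvd` computes truncated SVDs, and `WoodburyMatrices.SymWoodbury` stores `σ²I + B*D*B'` in factored form, which is exactly the representation the new estimator returns.

```julia
using TSVD, WoodburyMatrices, LinearAlgebra

X = randn(100, 500)        # 100 observations of a 500-dimensional variable
U, s, V = tsvd(X, 5)       # rank-5 truncated SVD: X ≈ U * Diagonal(s) * V'

# SymWoodbury represents σ²I + B*D*B' without forming the dense 500×500 matrix
W = SymWoodbury(0.1 * I(500), V, Diagonal(s.^2 ./ 100))
v = randn(500)
W * v                      # matrix-vector products stay cheap
W \ v                      # solves use the Woodbury identity
```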

docs/src/assets/donoho_fig3.png

202 KB

docs/src/index.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ A package for robustly estimating covariance matrices of real-valued data.
 ## Package Features

 - Standard corrected and uncorrected covariance estimators,
-- Linear and Nonlinear shrinkage estimators
+- Linear and Nonlinear shrinkage estimators, including estimators for covariance matrices too large to store in dense form
 - Focus on speed and lightweight dependencies

 ## Manual outline

docs/src/lib/public.md

Lines changed: 3 additions & 0 deletions
@@ -29,4 +29,7 @@ PerfectPositiveCorrelation
 ConstantCorrelation
 AnalyticalNonlinearShrinkage
 BiweightMidcovariance
+NormLossCov
+StatLossCov
+WoodburyEstimator
 ```

docs/src/man/nlshrink.md

Lines changed: 23 additions & 1 deletion
@@ -8,4 +8,26 @@ F = eigen(X)
 F.U*(d̃ .* F.U') # d̃ is a vector of transformed eigenvalues
 ```

-Currently, only the analytical nonlinear shrinkage ([`AnalyticalNonlinearShrinkage`](@ref)) method is implemented.
+Currently, there are two flavors of analytical nonlinear shrinkage:
+- [`AnalyticalNonlinearShrinkage`](@ref) is recommended in cases where the covariance matrix can be stored as a dense matrix
+- for cases where the covariance matrix is too large to handle in dense form, [`WoodburyEstimator`](@ref) models the covariance matrix as
+
+      Σ = σ²I + U * Λ * U'
+
+  where `σ` is a scalar, `I` is the identity matrix, `U` is a low-rank semi-orthogonal matrix, and `Λ` is diagonal.
+One can readily compute with this representation via the [Woodbury matrix identity](https://en.wikipedia.org/wiki/Woodbury_matrix_identity) and the [WoodburyMatrices package](https://github.com/JuliaLinearAlgebra/WoodburyMatrices.jl).
+This formulation approximates the covariance matrix as if all but a few of the largest eigenvalues were equal to `σ²`.
+A [truncated singular value decomposition](https://github.com/JuliaLinearAlgebra/TSVD.jl) of the data matrix is
+performed and the corresponding eigenvalues are shrunk by methods that are optimal for a wide variety of loss functions:
+
+- [`NormLossCov`](@ref) lets you minimize a norm-based loss against the "true" covariance matrix
+- [`StatLossCov`](@ref) lets you optimize for specific statistical outcomes, e.g., the accuracy of Mahalanobis distances.
+
+The eigenvalue shrinkage function is plotted for all choices below:
+
+![Donoho et al Fig 3](../assets/donoho_fig3.png)
+
+For complete details, see:
+
+Donoho, D.L., Gavish, M. and Johnstone, I.M., 2018.
+Optimal shrinkage of eigenvalues in the spiked covariance model. Annals of Statistics, 46(4), p. 1742.
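To make the workflow concrete, here is a minimal usage sketch (not part of the commit; the data and parameter choices are illustrative, while `WoodburyEstimator`, `StatLossCov`, and `cov` are the names this commit adds):

```julia
using CovarianceEstimation, LinearAlgebra

# 2000-dimensional data, 100 observations, with one strong low-rank "spike"
# planted on top of isotropic noise:
n, p = 100, 2000
v = normalize(randn(p))
X = randn(n, p) .+ 10 .* randn(n) .* v'

estimator = WoodburyEstimator(StatLossCov(:ent), 10)   # keep at most 10 spikes
Σ = cov(estimator, X)                                  # a SymWoodbury matrix, never dense

# The σ²I + UΛU' structure keeps linear algebra cheap:
w = randn(p)
Σ * w      # O(p · rank) multiply
Σ \ w      # solve via the Woodbury identity
```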

src/CovarianceEstimation.jl

Lines changed: 9 additions & 1 deletion
@@ -4,6 +4,8 @@ using Statistics
 using StatsBase
 using LinearAlgebra
 import StatsBase: cov
+using WoodburyMatrices
+using TSVD

 export cov
 export CovarianceEstimator, SimpleCovariance,
@@ -14,12 +16,18 @@ export CovarianceEstimator, SimpleCovariance,
     # Eigendecomposition-based methods
     AnalyticalNonlinearShrinkage,
     # Biweight midcovariance
-    BiweightMidcovariance
+    BiweightMidcovariance,
+    # Woodbury-based methods
+    WoodburyEstimator,
+    # Loss functions
+    NormLossCov, StatLossCov


 include("utils.jl")
+include("loss.jl")
 include("biweight.jl")
 include("linearshrinkage.jl")
 include("nonlinearshrinkage.jl")
+include("woodbury.jl")

 end # module

src/loss.jl

Lines changed: 120 additions & 0 deletions
# Nonlinear shrinkage estimation can be described in terms of a loss function measuring the error between
# the target and the sample eigenvalues. In:
#     Donoho, D.L., Gavish, M. and Johnstone, I.M., 2018.
#     Optimal shrinkage of eigenvalues in the spiked covariance model. Annals of Statistics, 46(4), p. 1742.
# there is a systematic analysis of different loss functions; their classification scheme is encoded here.

# Implementation note:
# While we could parametrize all these different loss functions in the type system and use dispatch,
# that would induce needless specialization: every function that took a `LossFunction` would have to be
# specialized, even if the argument encoding the loss function is merely "passed through."
# So instead, we hide details from the type system, dividing the Donoho classification scheme into just two types,
# `NormLossCov` and `StatLossCov`, and use a fast runtime check to determine which shrinkage function to use.

abstract type LossFunction end

"""
    NormLossCov(norm::Symbol, pivotidx::Int)

Specify a loss function for which the estimated covariance will be optimal. `norm` is one of
`:L1`, `:L2`, or `:Linf`, and `pivotidx` is an integer from 1 to 7, as specified in Table 1 (p. 1755)
of Donoho et al. (2018). In the table below, `A` and `B` are the target and sample covariances,
respectively, and the loss function is the specified norm of the quantity in the `pivot` column:

| `pivotidx` | `pivot` | Notes |
|------------|---------|-------|
| 1 | `A - B` | |
| 2 | `A⁻¹ - B⁻¹` | |
| 3 | `A⁻¹ B - I` | Not available for `:Linf` |
| 4 | `B⁻¹ A - I` | Not available for `:Linf` |
| 5 | `A⁻¹ B + B⁻¹ A - 2I` | Not supported |
| 6 | `sqrt(A) \\ B / sqrt(A) - I` | |
| 7 | `log(sqrt(A) \\ B / sqrt(A))` | Not supported |

See also [`StatLossCov`](@ref).

Reference:
Donoho, D.L., Gavish, M. and Johnstone, I.M., 2018.
Optimal shrinkage of eigenvalues in the spiked covariance model. Annals of Statistics, 46(4), p. 1742.
"""
struct NormLossCov <: LossFunction
    # Lᴺᴷ where N is the norm and K is an integer (1 through 7) representing the pivot function
    norm::Symbol
    pivotidx::Int

    function NormLossCov(norm::Symbol, pivotidx::Int)
        norm ∈ (:L1, :L2, :Linf) || throw(ArgumentError("norm must be :L1, :L2, or :Linf"))
        1 <= pivotidx <= 7 || throw(ArgumentError("pivotidx must be from 1 to 7 (see Table 1 in Donoho et al. (2018))"))
        return new(norm, pivotidx)
    end
end

"""
    StatLossCov(mode::Symbol)

Specify a loss function for which the estimated covariance will be optimal. `mode` is one of
`:st`, `:ent`, `:div`, `:aff`, or `:fre`, as specified in Table 2 (p. 1757) of Donoho et al. (2018).
In the table below, `A` and `B` are the target and sample covariances, respectively:

| `mode` | loss | Interpretation |
|--------|------|----------------|
| `:st`  | `st(A, B) = tr(A⁻¹ B - I) - log(det(B)/det(A))` | Minimize `2 Dₖₗ(N(0, B)||N(0, A))`, where `N` is the normal distribution |
| `:ent` | `st(B, A)` | Minimize errors in Mahalanobis distances |
| `:div` | `st(A, B) + st(B, A)` | |
| `:aff` | `0.5 * log(det(A + B) / (2 * sqrt(det(A*B))))` | Minimize the Hellinger distance between `N(0, A)` and `N(0, B)` |
| `:fre` | `tr(A + B - 2sqrt(A*B))` | |
"""
struct StatLossCov <: LossFunction
    mode::Symbol

    function StatLossCov(mode::Symbol)
        statlosses = (:st, :ent, :div, :aff, :fre)
        mode ∈ statlosses || throw(ArgumentError("mode must be among $(statlosses)"))
        return new(mode)
    end
end


# Implement Table 2, Donoho et al. (2018), p. 1757

function shrinker(loss::NormLossCov, ℓ::Real, c::Real, s::Real)
    # See top of file for why these are branches rather than dispatch
    norm, pivotidx = loss.norm, loss.pivotidx
    pivotidx ∈ (5, 7) && throw(ArgumentError("Pivot index $(pivotidx) is not supported, see Table 2 in Donoho et al. 2018"))
    if norm == :L2   # Frobenius
        return pivotidx == 1 ? ℓ * c^2 + s^2 :
               pivotidx == 2 ? ℓ / (c^2 + ℓ * s^2) :
               pivotidx == 3 ? (ℓ * c^2 + ℓ^2 * s^2) / (c^2 + ℓ^2 * s^2) :
               pivotidx == 4 ? (ℓ^2 * c^2 + s^2) / (ℓ * c^2 + s^2) :
               #= pivotidx == 6 =# 1 + (ℓ - 1) * c^2 / (c^2 + ℓ * s^2)^2
    elseif norm == :Linf   # Operator
        pivotidx ∈ (3, 4) && throw(ArgumentError("Pivot index $(pivotidx) is not supported for Linf norm, see Table 2 in Donoho et al. 2018"))
        return pivotidx ∈ (1, 2) ? ℓ :
               #= pivotidx == 6 =# 1 + (ℓ - 1) / (c^2 + ℓ * s^2)
    elseif norm == :L1   # Nuclear
        val = pivotidx == 1 ? 1 + (ℓ - 1) * (1 - 2s^2) :
              pivotidx == 2 ? ℓ / (c^2 + (2ℓ - 1) * s^2) :
              pivotidx == 3 ? ℓ / (c^2 + ℓ^2 * s^2) :
              pivotidx == 4 ? (ℓ^2 * c^2 + s^2) / ℓ :
              #= pivotidx == 6 =# (ℓ - (ℓ - 1)^2 * c^2 * s^2) / (c^2 + ℓ * s^2)^2
        return max(val, 1)
    end
    throw(ArgumentError("Norm $(norm) is not supported"))
end

function shrinker(loss::StatLossCov, ℓ::Real, c::Real, s::Real)
    mode = loss.mode
    if mode == :st
        return ℓ / (c^2 + ℓ * s^2)
    elseif mode == :ent
        return ℓ * c^2 + s^2
    elseif mode == :div
        return sqrt((ℓ^2 * c^2 + ℓ * s^2) / (c^2 + ℓ * s^2))
    elseif mode == :fre
        return (sqrt(ℓ) * c^2 + s^2)^2
    elseif mode == :aff
        return ((1 + c^2) * ℓ + s^2) / (1 + c^2 + ℓ * s^2)
    end
    throw(ArgumentError("Mode $(mode) is not supported"))
end
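For orientation, a small sketch of how these types behave (the numbers are illustrative; `shrinker` is internal, so it is reached through the module):

```julia
using CovarianceEstimation
const CE = CovarianceEstimation

CE.NormLossCov(:L2, 1)    # Frobenius norm on A - B
CE.StatLossCov(:ent)      # optimize Mahalanobis-distance accuracy
# CE.NormLossCov(:L3, 1) would throw an ArgumentError ("norm must be :L1, :L2, or :Linf")

# For a de-biased eigenvalue ℓ = 4 with cos/sin of the eigenvector angle
# c = 0.8, s = 0.6, the Frobenius/pivot-1 shrinker returns ℓ*c^2 + s^2:
CE.shrinker(CE.NormLossCov(:L2, 1), 4.0, 0.8, 0.6)   # == 2.92
```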

src/utils.jl

Lines changed: 6 additions & 0 deletions
@@ -17,3 +17,9 @@ totalweight(_, weights) = sum(weights)
 # Dividing by zero produces zero
 guardeddiv(num, denom) = iszero(denom) ? zero(num)/oneunit(denom) : num/denom
 diaginv(guard::Bool, num, v) = guard ? map(z -> guardeddiv(num, z), v) : num ./ v
+
+function weightedX(X::AbstractMatrix, weights::FrequencyWeights; dims=1)
+    rootweights = sqrt.(weights)
+    return dims == 1 ? X .* rootweights : rootweights' .* X
+end
+weightedX(X::AbstractMatrix; dims=1) = X
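A quick illustration of the new helper (hypothetical values; `weightedX` is internal to the package, so it is reached through the module):

```julia
using CovarianceEstimation, StatsBase
const CE = CovarianceEstimation

X = [1.0 2.0;
     3.0 4.0]
w = FrequencyWeights([1, 4])

# With dims=1 (observations in rows), row i is scaled by sqrt(w[i]), so that
# weightedX(X, w)' * weightedX(X, w) equals the weighted scatter X' * Diagonal(w) * X:
CE.weightedX(X, w)    # == [1.0 2.0; 6.0 8.0]
CE.weightedX(X)       # no weights: X is returned unchanged
```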

src/woodbury.jl

Lines changed: 109 additions & 0 deletions
# Covariance estimation for high-dimensional data
# Uses a covariance model of the form `Σ = σ²I + U * Λ * U'`, where `U` is a low-rank matrix of eigenvectors and
# `Λ` is a diagonal matrix capturing the excess width along the dimensions in `U` compared to isotropic.

# If you're curious about dispatch and inferrability, see the "Implementation note" at the top of src/loss.jl.
# We employ the same de-specialization trick even for `σ²` in `WoodburyEstimator`, as we'll adopt the eltype of
# `Λ` anyway.

"""
    WoodburyEstimator(loss::LossFunction, rank::Integer;
                      σ²::Union{Real,Nothing}=nothing, corrected::Bool=false)

Specify that covariance matrices should be estimated using a "spiked" covariance model

    Σ = σ²I + U * Λ * U'

`loss` is either a [`NormLossCov`](@ref) or [`StatLossCov`](@ref) object, which specifies the
loss function for which the estimated covariance will be optimal. `rank` is the maximum
number of eigenvalues `Λ` to retain in the model. Optionally, one may specify `σ²` directly,
or it can be estimated from the data matrix (`σ²=nothing`). Set `corrected=true` to use
the unbiased estimator of the variance.
"""
struct WoodburyEstimator{L<:LossFunction} <: CovarianceEstimator
    loss::L
    rank::Int
    σ²::Union{Real,Nothing}   # common diagonal variance, `nothing` indicates unknown
    corrected::Bool
end
WoodburyEstimator(loss::LossFunction, rank::Integer; σ²::Union{Real,Nothing}=nothing, corrected::Bool=false) =
    WoodburyEstimator(loss, rank, σ², corrected)

"""
    cov(estimator::WoodburyEstimator, X::AbstractMatrix, weights::FrequencyWeights...; dims::Int=1, mean=nothing, UsV=nothing)

Estimate the covariance matrix from the data matrix `X` using a "spiked" covariance model

    Σ = σ²I + U * Λ * U',

where `U` is a low-rank matrix of eigenvectors and `Λ` is a diagonal matrix.

Reference:
Donoho, D.L., Gavish, M. and Johnstone, I.M., 2018.
Optimal shrinkage of eigenvalues in the spiked covariance model. Annals of Statistics, 46(4), p. 1742.

When `σ²` is not supplied in `estimator`, it is calculated from the residuals `X - X̂`, where `X̂` is the
low-rank approximation of `X` used to generate `U` and `Λ`.

If `X` is too large to manipulate in memory, you can pass `UsV = (U, s, V)` (a truncated SVD of `X - mean(X; dims)`),
and then `X` will only be used to compute the dimensionality and number of observations. This requires that you
specify `estimator.σ²`.
"""
function cov(estimator::WoodburyEstimator, X::AbstractMatrix{<:Real}, weights::FrequencyWeights...;
             dims::Int=1, mean=nothing, UsV=nothing)
    # Argument validation
    dims ∈ (1, 2) || throw(ArgumentError("Argument dims can only be 1 or 2 (given: $dims)"))
    p = size(X, 3 - dims)
    p >= estimator.rank || throw(ArgumentError("Argument rank (got $(estimator.rank)) must be less than the number of observations (size(X, dims)=$(size(X, dims)))"))
    wn = totalweight(size(X, dims), weights...)

    local ΔX
    U, s, V = if UsV === nothing
        if mean === nothing
            mean = Statistics.mean(X, weights...; dims=dims)
        end
        # Compute the low-rank approximation of the centered data matrix
        ΔX = weightedX(X .- mean, weights...; dims=dims)
        tsvd(ΔX, estimator.rank)
    else
        UsV
    end

    T = eltype(s)
    σ² = estimator.σ²
    σ² = if σ² === nothing
        ΔΔX = ΔX - U * Diagonal(s) * V'
        # The number of degrees of freedom is (number of observations minus the rank) * dimensionality
        nσ = (totalweight(size(X, dims), weights...) - estimator.rank) * size(X, 3 - dims)
        sum(abs2, ΔΔX) / (nσ - estimator.corrected)
    else
        T(σ²)
    end::T   # fix inferrability (see note at top of file)

    # Ratio of dimensionality to number of observations (the principal parameter in Random Matrix Theory)
    γ = p / wn

    # Implement the optimal shrinkage algorithm
    λ_shrunk = shrink.(Ref(estimator.loss), s.^2 ./ wn, σ², γ)
    keep = (!iszero).(λ_shrunk)

    # Return the shrunk covariance matrix as a WoodburyMatrix
    return SymWoodbury(σ² * I(p), dims == 1 ? V[:, keep] : U[:, keep], Diagonal(λ_shrunk[keep]))
end

function shrink(loss::LossFunction, λ::Real, σ²::Real, γ::Real)
    # Implement the procedure in Donoho et al. (2018), p. 1758
    # We return the difference from σ², since that's already contained in the diagonal term
    λu = λ / σ²
    λ₊ = (1 + sqrt(γ))^2
    λu < λ₊ && return zero(σ²)
    # Calculate the "de-biased" eigenvalue ℓ (Eq. 1.10)
    λ′ = λu + 1 - γ
    ℓ = (λ′ + sqrt(λ′^2 - 4λu)) / 2
    # Calculate the cosine (Eq. 1.6)
    c = sqrt((1 - γ / (ℓ - 1)^2) / (1 + γ / (ℓ - 1)))
    # Calculate the sine
    s = sqrt(1 - c^2)
    # Apply the shrinker
    return σ² * (shrinker(loss, ℓ, c, s) - 1)
end
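Note that `shrink` zeroes out every sample eigenvalue below the bulk-edge threshold `σ²(1 + √γ)²` (the `λ₊` check above), so directions indistinguishable from noise drop out of the model entirely. A hedged sketch of the `UsV` path for data too large to center twice in memory (the data and names like `μ` are illustrative):

```julia
using CovarianceEstimation, TSVD, Statistics, LinearAlgebra

# Illustrative data: isotropic unit-variance noise plus one planted spike.
n, p = 200, 5000
v = normalize(randn(p))
X = randn(n, p) .+ 12 .* randn(n) .* v'

μ = mean(X; dims=1)
U, s, V = tsvd(X .- μ, 20)          # precompute the truncated SVD once

# When passing UsV, σ² must be supplied in the estimator:
est = WoodburyEstimator(NormLossCov(:L2, 1), 20; σ²=1.0)
Σ = cov(est, X; mean=μ, UsV=(U, s, V))

# With γ = p/n = 25, the detection threshold is σ²(1 + √γ)² = 36: the planted
# spike (sample eigenvalue ≈ 144) survives, while the other candidate
# directions fall below the threshold and are shrunk to zero.
```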
