Merge pull request #16 from JuliaLinearAlgebra/teh/docs

timholy · web-flow · commit aa2f81f5f618 · 2025-10-22T03:25:35.000-05:00
Minor documentation improvements and workflow updates
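
As a sanity check on the `V` formula that this change adds to the `isvd` docstring, here is a minimal sketch that uses only the standard-library `svd` in place of `isvd`. The 4×3 matrix `X` and the value `nc = 2` are made up for illustration. It checks that `(X' * U) / Diagonal(s)` recovers `V`, that it agrees with the README's `Vt = Diagonal(s) \ (U' * X)`, and that the same identity holds when only the leading `nc` components are kept.

```julia
using LinearAlgebra

# Hypothetical small example with full column rank (4×3).
X = [2.0 0.0 1.0;
     0.0 3.0 1.0;
     1.0 1.0 4.0;
     0.0 2.0 0.0]

F = svd(X)            # exact thin SVD from LinearAlgebra, standing in for isvd
U, s = F.U, F.S

# Since X = U * Diagonal(s) * V', we have X' * U = V * Diagonal(s).
V  = (X' * U) / Diagonal(s)    # form used in the docstring
Vt = Diagonal(s) \ (U' * X)    # form used in the README

@assert V ≈ F.V
@assert Vt ≈ V'
@assert U * Diagonal(s) * V' ≈ X

# The same identity holds for a truncated factorization with nc components.
nc = 2
Vnc = (X' * U[:, 1:nc]) / Diagonal(s[1:nc])
@assert Vnc ≈ F.V[:, 1:nc]
```

With an approximate `U` and `s` from `isvd`, the same expression would yield a correspondingly approximate `V`; the exact equalities above hold only because `svd` is exact to numerical precision.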
diff --git a/.github/dependabot.yml b/.github/dependabot.yml
@@ -0,0 +1,7 @@
+# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
+version: 2
+updates:
+  - package-ecosystem: "github-actions"
+    directory: "/" # Location of package manifests
+    schedule:
+      interval: "weekly"
diff --git a/.github/workflows/CI.yml b/.github/workflows/CI.yml
@@ -13,19 +13,19 @@ jobs:
       fail-fast: false
       matrix:
         version:
-          - '1.7'
+          - 'min'
           - '1'
         os:
           - ubuntu-latest
         arch:
           - x64
     steps:
       - uses: actions/checkout@v2
-      - uses: julia-actions/setup-julia@v1
+      - uses: julia-actions/setup-julia@v2
         with:
           version: ${{ matrix.version }}
           arch: ${{ matrix.arch }}
-      - uses: actions/cache@v1
+      - uses: actions/cache@v4
         env:
           cache-name: cache-artifacts
         with:
diff --git a/Project.toml b/Project.toml
@@ -1,7 +1,7 @@
 name = "IncrementalSVD"
 uuid = "de227602-7e15-40a7-b166-bbaff82a52b8"
 authors = ["Tim Holy <tim.holy@gmail.com>"]
-version = "1.0.0"
+version = "1.0.1"
 
 [deps]
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
diff --git a/README.md b/README.md
@@ -3,11 +3,13 @@
 IncrementalSVD provides incremental (updating) singular value decomposition.
 This allows you to update an existing SVD with new columns, and even implement
 online SVD with streaming data.
+For cheap approximations of the SVD on large data, it can be orders of magnitude
+more accurate than techniques involving random projection.
 
 ## All-at-once usage
 
-For reasons that will be described below, if you want a truncated SVD and your matrix is small enough to fit in memory,
-you're better off using [TSVD](https://github.com/JuliaLinearAlgebra/TSVD.jl). However, IncrementalSVD can do it too:
+If you want a truncated SVD and your matrix is small enough to fit in memory,
+you can use IncrementalSVD like this:
 
 ```julia
 julia> using IncrementalSVD, LinearAlgebra
@@ -21,8 +23,11 @@ julia> Vt = Diagonal(s) \ (U' * X);
 
 Note that `Vt` is *not* returned by `isvd`; for reasons described [below](#on-the-fly-v) we compute it afterwards.
 
-`isvd` uses incremental updating, which is lossy to an extent that depends on the distribution of singular values.
-For comparison:
+In typical cases, `isvd` returns a (good) *approximation* of the true SVD.
+This is in contrast with packages like
+[TSVD](https://github.com/JuliaLinearAlgebra/TSVD.jl), which return an exact
+(to within numerical precision) answer.
+Let's compare the error of the rank-4 approximation computed by `isvd` with that computed by TSVD:
 
 ```julia
 julia> using TSVD
@@ -36,9 +41,10 @@ julia> norm(X - U2*Diagonal(s2)*V2')
 1.9177860422120783
 ```
 In this particular case, the rank-4 absolute error with TSVD is a few percent better than with IncrementalSVD.
-The error of incremental SVD comes from the fact that it works on chunks, and there is a truncation step after each chunk that discards information; see [Brand 2006](#references) Eq 5 for more insight.
+The error of incremental SVD comes from the fact that it works on chunks, and after each chunk any excess components are truncated, resulting in a loss of information.
+See [Brand 2006](#references) Eq 5 for more insight.
 
-However, the *real* use-case for IncrementalSVD is in computing incremental updates or handling cases where `X` is too large to fit in memory all at once, and for such applications it handily beats alternatives like random projection + power iteration (e.g., `rsvd` from [RandomizedLinAlg.jl](https://github.com/JuliaLinearAlgebra/RandomizedLinAlg.jl)).
+However, the *real* use-case for IncrementalSVD is in computing incremental updates or handling cases where `X` is too large to fit in memory all at once, and for such applications it handily beats alternatives like random projection + power iteration (e.g., `rsvd` from [RandomizedLinAlg.jl](https://github.com/JuliaLinearAlgebra/RandomizedLinAlg.jl)). See details below.
 
 ## Incremental updates
 
@@ -63,7 +69,11 @@ julia> s
  4.18050301615471
  3.662876466035874
  2.923979120208828
+```
+
+For comparison, the true answer is:
 
+```julia
 julia> F = svd(X);
 
 julia> F.S
@@ -75,6 +85,8 @@ julia> F.S
  1.7956053622541457
 ```
 
+The singular values computed by `update!` were accurate to 3-5 digits.
+
 `isvd` is just a thin wrapper over this basic iterative update.
 
 ## Reducing error
diff --git a/src/IncrementalSVD.jl b/src/IncrementalSVD.jl
@@ -14,6 +14,19 @@ The public functions are:
 - [`IncrementalSVD.Cache`](@ref)          (not exported)
 """ IncrementalSVD
 
+"""
+    U, s = isvd(X::AbstractMatrix{<:Real}, nc)
+
+Compute an incremental thin SVD of the matrix `X`, returning the left singular
+vectors `U` and the singular values `s`.  The number of retained components
+is specified by `nc`.
+
+`V` may be obtained via `V = (X' * U) / Diagonal(s)`.
+
+`isvd` is just a wrapper around repeated calls to [`IncrementalSVD.update!`](@ref).
+For streaming data, or when `X` is too large to fit into memory, you may prefer
+to call `IncrementalSVD.update!` directly on smaller chunks of `X`.
+"""
 function isvd(X::AbstractMatrix{<:Real}, nc)
     Base.require_one_based_indexing(X)
     T = float(eltype(X))
@@ -80,9 +93,8 @@ computation of the SVD. `U` and `s` are updated in-place as well as returned.
 You can reuse temporary storage by creating `cache` (see [`IncrementalSVD.Cache`](@ref)).
 
 There are two ways to initialize:
-- `U, s, V = zeros(T, m, r), zeros(T, r), zeros(T, n, r)`. This specifies
-  the element type `T`, the number of rows `m`, the rank `r`, and the number
-  of columns `n`. If you're computing `V`, this is the only option.
+- `U, s = zeros(T, m, r), zeros(T, r)`. This specifies the element type `T`, the
+  number of rows `m`, and the rank `r`.
 - `U, s = nothing, nothing`. This will use `size(U) = size(A)`, i.e.,
   the chunk size specifies the truncated rank.