140 changes: 140 additions & 0 deletions finance/proposal-calls/cycle5/mhvk-units-masked-uncertainties.md
### Title

Quantities with Array API support, Improved Support for Masks and Uncertainties

### Project Team

Marten van Kerkwijk

### Project Description / Scope of Work

I request continued partial buy-out from my professorship at UofT to be able
to work one day a week on projects that are too large for the time I can
otherwise commit to astropy. Specifically, I propose to:

- Facilitate Quantity becoming a container class that can handle not just
ndarray but any type of array, i.e., also dask, jax, etc.
- Ensure Quantity is fully compliant with the new Quantity API being developed.
- Extend the same machinery to Masked and Distribution so that all main astropy
classes can use arbitrary array classes.
- Also extend the machinery to the internal arrays used by Time.
- Speed up unit conversion and thus all of astropy by smarter conversion functions
and caching.
- Finish my implementation of a Variable class that tracks uncertainties and
their correlations analytically (based on the uncertainties package).

#### Roadmap Items

I split these into direct goals of my work and pieces that will be enabled by
it. Here, note that my goal of adding quantity support for non-numpy arrays
includes support for JAX and Dask arrays, which would thus provide a major
requirement for astropy as a whole having support for those.

Directly addressed:

- :green_circle: Add quantity support for non-NumPy arrays.

- :large_orange_diamond: Improve interoperability between unit packages (e.g.,
`astropy.units`, `pint`, `unyt`).

Provides a major requirement for:

- :red_square: Support JIT compilation (e.g., numba, JAX, etc.) throughout
Astropy core and coordinated packages.

- :large_orange_diamond: Improve and/or maintain interoperability with
performant I/O file formats and libraries such as HDF5 and Dask.

#### Project / Work / Deliverables

Prior to cycle 4, I spent about a day per week on astropy core, in reviews,
bug fixes, and development. I managed to use extra time for fairly large
developments (Quantity historically and Masked and Uncertainty more recently,
with also fairly major contributions to Time, Table, Representation and
numpy), but it was difficult to find enough time to actually wrap up larger
projects (at least outside sabbaticals). This changed with cycle 4 funding,
and a major part of this request is to complete some of the main parts of the
project I proposed for that cycle.

In particular, in the current cycle I have started to develop Quantity 2.0.
As proposed in [APE 25](https://github.com/astropy/astropy-APEs/pull/91), this
follows the [Array API](https://data-apis.org/array-api/), ensuring
that the new Quantity class will work with any array that supports that API,
which includes those that really matter, like Dask for large, disk-based data
sets and JAX for GPU acceleration. There is a
[prototype](https://github.com/astropy/quantity-2.0), which already supports a
large part of the Array API (basically, the functions provided by numpy ufuncs) for JAX
and Dask. The work has been waylaid a little in a good way: during this period,
serious discussions started between the various units packages about a shared
[Quantity API](https://github.com/quantity-dev), which we would of course want
to follow.
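
To illustrate the container idea, here is a minimal sketch, not the prototype's actual code. The class `SimpleQuantity` and its string-based unit handling are hypothetical; the only real interface assumed is the Array API's `__array_namespace__` method, with a fallback to numpy for arrays that lack it.

```python
import numpy as np


class SimpleQuantity:
    """Hypothetical container pairing any Array API array with a unit string."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def _namespace(self):
        # Array API arrays expose their namespace via __array_namespace__;
        # fall back to numpy for array objects that do not provide it.
        get_ns = getattr(self.value, "__array_namespace__", None)
        return get_ns() if get_ns is not None else np

    def __mul__(self, other):
        if isinstance(other, SimpleQuantity):
            xp = self._namespace()
            # Toy unit algebra: just concatenate the unit strings.
            return SimpleQuantity(xp.multiply(self.value, other.value),
                                  f"{self.unit} {other.unit}")
        return SimpleQuantity(self.value * other, self.unit)

    def __add__(self, other):
        if not isinstance(other, SimpleQuantity) or other.unit != self.unit:
            raise TypeError("this sketch only adds quantities with identical units")
        xp = self._namespace()
        return SimpleQuantity(xp.add(self.value, other.value), self.unit)


length = SimpleQuantity(np.asarray([1.0, 2.0, 3.0]), "m")
width = SimpleQuantity(np.asarray([2.0, 2.0, 2.0]), "m")
area = length * width
```

Because the container never calls numpy directly when a namespace is available, the same class should in principle accept any array type that implements the Array API.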

The primary goal of my proposal here is to finish the implementation, make it
compatible with the new Quantity API, ensure there are no performance
regressions, and of course document it all.

A nice benefit of the approach laid out in
[APE 25](https://github.com/astropy/astropy-APEs/pull/91) is that it will be
very easy to extend it to Masked and Distribution (and possibly Variable), as
those basically are already the type of container classes that APE 25
envisions.

Furthermore, a direct benefit of Quantity being able to use other array types
than ndarray is that this will nearly automatically extend to coordinates
(since those use quantities almost exclusively; I foresee little more work
than adjusting tests!). Time will be slightly more work, as it works directly
with ndarray, but also here the path is straightforward: I can just follow my
earlier work on ensuring Time can work with Masked.

Most of the above would benefit the application of astropy to large arrays, by
allowing disk-based arrays and analysis on GPUs. But astropy is often used on
small arrays too, and while reviewing our own Quantity code as well as the
code for ndarray that it relies on, I realized there are a number of ways in
which we can improve the performance of Quantity and Unit operations for
scalars and small arrays, mostly by reducing overhead. Some initial PRs on
the numpy side add a [fast path for
scalars](https://github.com/numpy/numpy/pull/29819) and [include array storage
in the object](https://github.com/numpy/numpy/pull/29878). On the Quantity
side proper, I have a skeleton of code that would make unit conversion
substantially faster, especially if combined with caching. This would again
mostly benefit small arrays. Also for larger ones, I see a nice path forward:
the new dtype machinery of numpy provides a way to do the scaling needed for
unit conversion as part of an operation, thus avoiding the need to create
large temporary arrays.
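
As a rough illustration of how caching can reduce per-call overhead for small inputs, consider the following sketch. It is an assumption-laden toy and not the skeleton code mentioned above: the tables `_SCALE_TO_SI` and `_DIMENSION` and the functions `conversion_factor` and `convert` are hypothetical names, and real unit conversion in astropy is considerably more involved.

```python
from functools import lru_cache

# Hypothetical scale factors to a reference unit; not astropy's actual tables.
_SCALE_TO_SI = {"m": 1.0, "km": 1000.0, "cm": 0.01,
                "s": 1.0, "ms": 0.001, "min": 60.0}
_DIMENSION = {"m": "length", "km": "length", "cm": "length",
              "s": "time", "ms": "time", "min": "time"}


@lru_cache(maxsize=None)
def conversion_factor(from_unit, to_unit):
    """Return f such that value_in_to_unit = f * value_in_from_unit."""
    if _DIMENSION[from_unit] != _DIMENSION[to_unit]:
        raise ValueError(f"cannot convert {from_unit} to {to_unit}")
    return _SCALE_TO_SI[from_unit] / _SCALE_TO_SI[to_unit]


def convert(value, from_unit, to_unit):
    """Convert a number (or anything supporting multiplication by a float)."""
    if from_unit == to_unit:
        return value  # fast path: nothing to compute or look up
    # The factor is computed once per unit pair and then served from the cache.
    return value * conversion_factor(from_unit, to_unit)
```

After the first call for a given pair of units, repeated conversions reduce to a cache lookup and a single multiplication, which is where scalars and small arrays would gain most.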

Finally, as an undergraduate I was taught that a number without a unit or an
uncertainty is meaningless. Quantity provides the former, and Distribution
provides a Monte Carlo-like method for the latter. But often we would just
like to have error propagation that includes covariance. More than a decade
ago, I made a [PR](https://github.com/astropy/astropy/pull/3715) to introduce
a Variable class that tracks uncertainties and covariances (based on the
[uncertainties package](https://pythonhosted.org/uncertainties/), but extended
to deal natively with arrays). This has been stalled since, but I believe it
would still be super useful. A stretch goal of the current proposal is to
finally finish it.
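
The idea of analytic error propagation with covariance can be sketched for scalars as follows. This is a hypothetical toy and not the code from the PR: the class below only borrows the name `Variable`, handles only `+`, `-` and `*`, and uses first-order (linear) propagation by storing the derivative of each result with respect to the independent inputs it depends on.

```python
import math


class Variable:
    """Hypothetical sketch: a value plus derivatives to independent inputs."""

    def __init__(self, nominal, std=0.0, derivs=None):
        self.nominal = float(nominal)
        # Only meaningful for independent inputs; derived results keep 0.
        self.input_std = float(std)
        # An independent input has derivative 1 with respect to itself.
        self.derivs = {self: 1.0} if derivs is None else derivs

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Variable) else Variable(other, 0.0)

    def _combine(self, other, nominal, d_self, d_other):
        # Chain rule: accumulate derivatives to each shared independent input.
        derivs = {}
        for inp, d in self.derivs.items():
            derivs[inp] = derivs.get(inp, 0.0) + d_self * d
        for inp, d in other.derivs.items():
            derivs[inp] = derivs.get(inp, 0.0) + d_other * d
        return Variable(nominal, derivs=derivs)

    def __add__(self, other):
        other = self._lift(other)
        return self._combine(other, self.nominal + other.nominal, 1.0, 1.0)

    def __sub__(self, other):
        other = self._lift(other)
        return self._combine(other, self.nominal - other.nominal, 1.0, -1.0)

    def __mul__(self, other):
        other = self._lift(other)
        return self._combine(other, self.nominal * other.nominal,
                             other.nominal, self.nominal)

    @property
    def std(self):
        return math.sqrt(sum((d * inp.input_std) ** 2
                             for inp, d in self.derivs.items()))

    def covariance(self, other):
        return sum(d * other.derivs[inp] * inp.input_std ** 2
                   for inp, d in self.derivs.items() if inp in other.derivs)


x = Variable(10.0, 0.3)
y = Variable(5.0, 0.4)
difference = x - x   # fully correlated with itself, so no uncertainty remains
total = x + y        # independent inputs add in quadrature
```

Because `x - x` ends up with a zero derivative with respect to `x`, its uncertainty vanishes, which naive propagation that ignores correlations would not reproduce.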

### Approximate Budget

I request funding to replace salary equivalent to one day a week, reducing my
regular employment at the University of Toronto correspondingly. At a
standard rate of USD 150/hour for 8 hours per week and 45 weeks, this
corresponds to USD 54,000 per year.

I note that my approved cycle 4 proposal was for two years, but the funding
cycle means that only 16 months were used. Here, I request funding both to
finish that approved part of my project ($36,000), and in addition request
funding for another year to finish the whole ($54,000). The total request is
thus $90,000.

### Period of Performance

Ideally, I would be covered until June 2027, which is the end of an academic
year.

I note that the funding provides me with teaching relief for one semester of
an academic year. During those semesters I spend more than 1 day/week on
astropy, while I spend less time when I teach. The first semester I had
teaching relief was July-December 2024, and the second will be January-June
2026 (some of which will thus be covered by funds already paid). I think it
will be possible to ensure the next semester will be July-December 2026, so
that most work is finished in 2026.