DEBT: Prefill Trickery #226

@will-lms

Description

PR #224 adds a "ratchet" to the prompt progress callback. This is needed so that prompt progress does not appear to go backwards when we perform prefill in the CacheWrapper (the component that enables cache re-use for text-only prompts).

When prefill is done by the CacheWrapper, the progress callback is called as expected. The callback and a single input token are then passed to mlx_lm.stream_generate, which calls the progress callback again, this time reporting "0 out of 1" tokens processed. This causes progress to appear to regress briefly. The ratchet patches this by ignoring any progress update that would move the reported value backwards.
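To illustrate the idea, here is a minimal sketch of a ratchet. It is not the code from PR #224: the name `make_ratchet`, the single-float progress signature, and the choice to let repeated (equal) values through are all assumptions made for this example.

```python
from typing import Callable


def make_ratchet(callback: Callable[[float], None]) -> Callable[[float], None]:
    """Wrap a progress callback so that reported progress never decreases.

    Hypothetical helper: `callback` is assumed to take one float
    (e.g. percent of prompt tokens processed).
    """
    high_water_mark = float("-inf")

    def ratcheted(progress: float) -> None:
        nonlocal high_water_mark
        if progress < high_water_mark:
            # A regressing update, e.g. the "0 out of 1" report that
            # follows a completed prefill. Drop it silently.
            return
        high_water_mark = progress
        callback(progress)

    return ratcheted


# Usage: prefill reports 0 -> 50 -> 100, then the second stage
# starts over at 0 before finishing at 100.
reported = []
on_progress = make_ratchet(reported.append)
for value in [0.0, 50.0, 100.0, 0.0, 100.0]:
    on_progress(value)
```

In this sketch the late `0.0` is dropped, so the consumer never sees progress fall. The wrapper hides the symptom only: the second stage still believes it is processing a one-token prompt.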

This is more of a band-aid than a proper fix. A proper solution would be to unify the prefill logic so that we don't need to "trick" mlx_lm.stream_generate into thinking there is only a single prompt token to process.
