
Commit c788f17

[Docs]: Readme Fix (#617)
Signed-off-by: Abukhoyer Shaik <[email protected]>
1 parent ed965fd commit c788f17

File tree: 2 files changed, 4 insertions(+), 2 deletions(-)


README.md

Lines changed: 2 additions & 2 deletions

@@ -108,8 +108,8 @@ For more details about using ``QEfficient`` via Cloud AI 100 Apps SDK, visit [Li

 ## Documentation

-* [Quick Start Guide](https://quic.github.io/efficient-transformers/source/quick_start.html#)
-* [Python API](https://quic.github.io/efficient-transformers/source/hl_api.html)
+* [Quick Start Guide](https://quic.github.io/efficient-transformers/source/quick_start.html)
+* [QEFF API](https://quic.github.io/efficient-transformers/source/qeff_autoclasses.html)
 * [Validated Models](https://quic.github.io/efficient-transformers/source/validate.html)
 * [Models coming soon](https://quic.github.io/efficient-transformers/source/validate.html#models-coming-soon)

docs/source/supported_features.rst

Lines changed: 2 additions & 0 deletions

@@ -30,6 +30,8 @@ Supported Features
     - Enables execution with FP8 precision, significantly improving performance and reducing memory usage for computational tasks.
   * - Prefill caching
     - Enhances inference speed by caching key-value pairs for shared prefixes, reducing redundant computations and improving efficiency.
+  * - On Device Sampling
+    - Enables sampling operations to be executed directly on the QAIC device rather than the host CPU for QEffForCausalLM models. This significantly reduces host-device communication overhead and improves inference throughput and scalability. Refer to the `sample script <https://github.com/quic/efficient-transformers/blob/main/examples/on_device_sampling.py>`_ for more details.
   * - Prompt-Lookup Decoding
     - Speeds up text generation by using overlapping parts of the input prompt and the generated text, making the process faster without losing quality. Refer to the `sample script <https://github.com/quic/efficient-transformers/blob/main/examples/pld_spd_inference.py>`_ for more details.
   * - :ref:`PEFT LoRA support <QEffAutoPeftModelForCausalLM>`
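The On Device Sampling entry added above moves the filter/softmax/draw steps of token selection onto the QAIC device, so only a sampled token id (not the full logits tensor) crosses the host-device boundary. As a rough host-side illustration of the sampling step being moved, here is a plain-Python top-k sampler; the function name and shape are illustrative only and are not part of the QEfficient API.

```python
import math
import random


def sample_top_k(logits, k=5, temperature=1.0, rng=None):
    """Draw one token id from the k highest-scoring logits.

    Host-side illustration only: on-device sampling performs the
    equivalent filter -> softmax -> draw inside the compiled graph,
    so the full logits never have to be copied back to the host.
    """
    rng = rng or random.Random(0)
    # Keep only the k most likely token ids.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled, numerically stable softmax weights.
    scaled = [logits[i] / max(temperature, 1e-6) for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # Inverse-CDF draw over the surviving tokens.
    r = rng.random() * sum(weights)
    acc = 0.0
    for token_id, w in zip(top, weights):
        acc += w
        if r <= acc:
            return token_id
    return top[-1]
```

With k=1 this degenerates to greedy decoding, which is a handy sanity check; the throughput win described in the diff comes from running this loop per decode step on the device rather than per step on the host.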
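The Prompt-Lookup Decoding row above relies on a simple idea: when the text being generated repeats spans of the prompt (common in summarization or code editing), the tokens that followed a matching n-gram earlier in the sequence make cheap draft candidates for speculative verification. A minimal sketch of that candidate-lookup step, assuming nothing about the repo's actual implementation (function and parameter names here are hypothetical):

```python
def prompt_lookup_candidates(tokens, ngram_size=3, num_draft=5):
    """Propose draft tokens by matching the trailing n-gram of `tokens`
    against an earlier occurrence in the same sequence.

    Illustrative sketch only: returns up to `num_draft` tokens that
    followed the most recent earlier match, or [] if no match exists.
    """
    if len(tokens) < ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Scan backwards so the most recent earlier occurrence wins;
    # the trailing n-gram itself (at len(tokens) - ngram_size) is excluded.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            follow = tokens[start + ngram_size:start + ngram_size + num_draft]
            if follow:
                return follow
    return []
```

The drafts are then verified in one forward pass, which is why the diff can claim a speedup "without losing quality": mispredicted drafts are simply discarded, so the output distribution is unchanged.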
