Description
Bug Description
The audio transcription endpoint does not consider the actual audio file content when generating cache keys. Instead, it only uses the filename, which causes incorrect cache hits.
Problem
When caching is enabled, sending two different audio files in a short interval returns the same transcription, even though the audio content is different. This happens because:
- The cache key is generated based on the filename only (via get_audio_file_name())
- Different audio files with the same filename share the same cache key
- Audio files passed as bytes or file-like objects without unique names also share cache keys incorrectly
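The collision described above can be illustrated with a small, self-contained sketch. This is not LiteLLM's actual key-generation code: `filename_cache_key` is a hypothetical stand-in that, like the behavior reported here, derives the key from the model and the filename only, and the plain dict stands in for the cache.

```python
import hashlib


def filename_cache_key(model: str, file_name: str) -> str:
    """Hypothetical stand-in: build a cache key from the model and filename only."""
    return hashlib.sha256(f"model={model},file={file_name}".encode()).hexdigest()


# Two uploads with different audio content but the same filename.
upload_a = ("audio.wav", b"first recording")
upload_b = ("audio.wav", b"second recording")

key_a = filename_cache_key("whisper-1", upload_a[0])
key_b = filename_cache_key("whisper-1", upload_b[0])

# The first request populates the cache (a plain dict in this sketch).
cache = {key_a: "transcript of the first recording"}

# The second request looks up its own key and finds the first request's entry,
# even though its audio bytes are different.
stale = cache.get(key_b)
```

Because the audio bytes never enter the key, `key_a` and `key_b` are identical and the second lookup returns the first transcript.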
Impact
- Data Integrity: Users receive incorrect transcriptions from cache
- Reliability: Same transcription returned for different audio files
- User Experience: Workaround requires manually disabling cache with {cache: { 'no-cache': true }}
Steps to Reproduce
- Enable caching (e.g., litellm.cache = Cache())
- Send two different audio files with the same filename (or without filenames)
- Observe that both requests return the same transcription
Expected Behavior
Each unique audio file should generate a unique cache key based on its content, not just its filename. Different audio files should never share the same cache key, regardless of their names.
Current Workaround
Users can disable cache for transcription calls:
litellm.transcription(
    model="whisper-1",
    file=audio_file,
    cache={"no-cache": True}
)
However, this is not ideal, because caching is still useful for identical audio files.
Proposed Solution
I've implemented a fix in PR #16462 that:
- Calculates a SHA-256 hash of the audio file content instead of using the filename
- Handles all input types (bytes, file paths, file-like objects, tuples)
- Maintains backward compatibility with existing code
- Includes comprehensive tests to prevent regression
The solution ensures that:
- Different audio files always generate different cache keys
- Identical audio files generate the same cache key
- Works correctly with all supported input types
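As a rough illustration of the content-hashing idea (not the code in PR #16462), the sketch below hashes the audio bytes for each input type listed above. The helper name `audio_content_hash` and the assumed tuple layout `(filename, content, ...)` are hypothetical choices made for this example.

```python
import hashlib
import io
import os
from typing import Any


def audio_content_hash(file: Any) -> str:
    """Hypothetical helper: return a SHA-256 hex digest of the audio content."""
    # Assumed tuple layout: (filename, content, ...); only the content is hashed.
    if isinstance(file, tuple):
        if len(file) < 2:
            raise ValueError("expected a (filename, content, ...) tuple")
        return audio_content_hash(file[1])

    # Raw bytes are hashed directly.
    if isinstance(file, (bytes, bytearray)):
        return hashlib.sha256(bytes(file)).hexdigest()

    # File paths are opened in binary mode and hashed as file-like objects.
    if isinstance(file, (str, os.PathLike)):
        with open(file, "rb") as handle:
            return audio_content_hash(handle)

    # Binary file-like objects are read in chunks from their current position.
    if hasattr(file, "read"):
        can_rewind = hasattr(file, "seek") and hasattr(file, "tell")
        start = file.tell() if can_rewind else None
        hasher = hashlib.sha256()
        for chunk in iter(lambda: file.read(65536), b""):
            hasher.update(chunk)
        if can_rewind:
            # Restore the position so the caller can still upload the file.
            file.seek(start)
        return hasher.hexdigest()

    raise TypeError(f"unsupported audio input type: {type(file).__name__}")


# Same filename, different content: the digests differ.
digest_a = audio_content_hash(("audio.wav", b"first recording"))
digest_b = audio_content_hash(("audio.wav", b"second recording"))

# Same content passed as bytes and as a file-like object: the digests match.
digest_c = audio_content_hash(io.BytesIO(b"first recording"))
```

One design point worth noting: a file-like object has to be rewound after hashing, otherwise the later upload would send an empty body. Streams that cannot seek are consumed by the hash, so a real implementation would need to buffer them or fall back to another key.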
Related PR
See PR #16462 for the complete implementation.
Additional Context
This bug affects all users who:
- Use caching with audio transcription
- Process multiple audio files
- Pass audio files as bytes or file-like objects
The fix is minimal, well-tested, and maintains full backward compatibility.