Qwen3 VL models rely on "Deepstack" embeddings to improve their image-to-text capabilities. mlx-lm does not implement these yet, so OCR quality with the unified architecture (i.e. vision_add_on) is worse than with the mlx-vlm implementation. Ref: 12
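
For context, here is a rough sketch of what the missing step might look like. It assumes Deepstack works the way it does in the reference Qwen3-VL implementation: visual features taken from several levels of the vision encoder are added to the hidden states of the first few decoder layers, only at image-token positions. The function names, shapes, and the NumPy backend are all hypothetical and chosen for illustration; this is not the mlx-lm or mlx-vlm API.

```python
import numpy as np


def inject_deepstack(hidden, visual_mask, visual_embeds):
    """Add one level of visual features at image-token positions (illustrative only).

    hidden:        (seq_len, dim) hidden states coming out of a decoder layer
    visual_mask:   (seq_len,) bool array, True where the token is an image placeholder
    visual_embeds: (num_visual_tokens, dim) features for this Deepstack level
    """
    out = hidden.copy()
    # Only image-token rows are modified; text-token rows pass through unchanged.
    out[visual_mask] = out[visual_mask] + visual_embeds
    return out


def run_layers(hidden, layers, visual_mask, deepstack_embeds):
    """Run decoder layers, injecting Deepstack level i after layer i.

    Assumption: the i-th entry of deepstack_embeds pairs with the i-th decoder
    layer, and layers beyond len(deepstack_embeds) receive no injection.
    """
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        if i < len(deepstack_embeds):
            hidden = inject_deepstack(hidden, visual_mask, deepstack_embeds[i])
    return hidden
```

Without this step the language model only sees the final vision-encoder output, which would be consistent with the weaker OCR results described above.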