Thank you for your work.
May I ask what the differences are between the open-source code on GitHub and the version described in the paper?
I tested DeepSeek-V2-Lite-Chat on BIG-bench, and the per-token latency is around 280 ms, whereas the paper reports 155 ms. I'd like to know where the difference comes from.
The test command is:

```bash
CUDA_VISIBLE_DEVICES=0 python examples/interface_example.py --model_name_or_path /app/data/DeepSeek-V2-Lite-Chat --offload_dir /root/moe-infinity --device_memory_ratio 0.75 --out_len 32
```
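For clarity, by per-token latency I mean total generation wall time divided by the number of newly generated tokens, roughly as in the minimal sketch below. This assumes a HuggingFace-style `generate()` call; `per_token_latency_ms` is an illustrative helper I wrote for this issue, not part of `interface_example.py`:

```python
# Minimal timing sketch (illustrative; not taken from interface_example.py).
# Assumes a HuggingFace-style model.generate and a CUDA device.
import time
import torch

def per_token_latency_ms(generate_fn, input_ids, max_new_tokens=32):
    """Average wall-clock decode time per generated token, in ms."""
    torch.cuda.synchronize()          # make sure prior GPU work has finished
    start = time.perf_counter()
    output_ids = generate_fn(input_ids=input_ids, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()          # wait for generation to complete
    elapsed = time.perf_counter() - start
    new_tokens = output_ids.shape[1] - input_ids.shape[1]
    return elapsed / new_tokens * 1000.0

# Usage:
# latency = per_token_latency_ms(model.generate, inputs["input_ids"], max_new_tokens=32)
```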
Many thanks!