Chinese characters have higher information density and are well suited to communicating with LLMs #706

@king-bcolor

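We could distill a set of abstract conventions for communicating with large models in Chinese characters, with the goal of building the most token-efficient language possible.
Take this Chinese passage as an example:
晨,小明食椒,腹痛而奔,求水于厨人,不遂而亡
ChatGPT translates it into English as:
In the morning, Xiao Ming ate chili, suffered stomach pain, ran away, asked the cook for water, but failed to get it, and died.

Counting tokens with https://platform.openai.com/tokenizer
gives 22 tokens for the Chinese and 28 tokens for the English.
In Classical Chinese, when "小明" (Xiao Ming) appears a second time it is usually shortened to just "明".
The string "Xiao Ming" takes 3 tokens, whereas "明" takes only 1.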
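
The comparison above can be reproduced roughly in code. The following is a minimal sketch, not the method used by the tokenizer page: it assumes the third-party `tiktoken` package and the `cl100k_base` encoding, and the helper names `count_tokens` and `relative_saving` are made up for illustration. Token counts depend on the encoding, so the numbers it yields may differ from the 22 and 28 reported here.

```python
def relative_saving(shorter: int, longer: int) -> float:
    """Fraction of tokens saved by the shorter text relative to the longer one."""
    if longer <= 0:
        raise ValueError("longer must be positive")
    return (longer - shorter) / longer


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens with tiktoken (assumption: `pip install tiktoken`)."""
    import tiktoken  # imported lazily so the arithmetic below runs without it

    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))


# Counts reported in this issue (taken from the post, not recomputed here).
print(f"whole passage: {relative_saving(22, 28):.1%} fewer tokens")
print(f"repeated name: {relative_saving(1, 3):.1%} fewer tokens")
```

With `tiktoken` installed, `count_tokens("晨,小明食椒,腹痛而奔,求水于厨人,不遂而亡")` and `count_tokens` on the English translation can be passed to `relative_saving` to check the claim under a specific encoding.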
