Xinhua Evaluates China's LLMs: Ernie Bot (文心一言) Scores Close to GPT-3.5

2023-06-11

Published by

藍骨

Baidu CTO Wang Haifeng speaks at the unveiling of Baidu's AI chatbot Ernie Bot at an event in Beijing on March 16, 2023. Shares of the Chinese search engine company fell as much as 10 percent after it unveiled its ChatGPT-like AI software, with investors unimpressed by the bot's display of linguistic and maths skills. (Photo by MICHAEL ZHANG / AFP)

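The boom in AI large language models (LLMs) has continued ever since ChatGPT launched, and Chinese technology companies have been releasing similar models one after another. Xinhua News Agency recently tested a range of LLMs and found that Ernie Bot (文心一言) now performs close to GPT-3.5.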

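The China Enterprise Development Research Center of the Xinhua News Agency Research Institute earlier ran a hands-on evaluation of China's mainstream LLMs and then published the "Artificial Intelligence Large Model Experience Report" (《人工智能大模型體驗報告》). The report uses four evaluation criteria: basic capabilities, an IQ test, an EQ test, and work-assistance capabilities. Scores were based on the models' answers to 300 questions. OpenAI's GPT-4 and GPT-3.5 still lead the models developed by Chinese vendors, but Baidu's Ernie Bot, with 1112 points, comes close to GPT-3.5's 1148 points, and in the IQ test portion it has already overtaken GPT-3.5.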

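The report says that as artificial intelligence grows in status and importance, government, enterprises, and society need to work together. Major vendors should invest more resources, technology companies can keep pushing their self-developed LLMs, and industry vendors focused on solutions can consider standing out by building deep expertise in their own sectors.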

Source: 豆苗網