
Author | Zichuan (子川)

Source | AI先鋒官

OpenAI has just launched three new models in its API in one go: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano.

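Key points:

The GPT-4.1 series is multimodal, but it is available only through the API.
The GPT-4.1 series outperforms GPT-4o and GPT-4o mini across the board.
Many of GPT-4.1's improvements in instruction following, coding, and intelligence have already been incorporated into the latest version of GPT-4o, and OpenAI says it will bring more GPT-4.1 capabilities into GPT-4o later.
It supports a context window of up to 1 million tokens, about 8 times that of GPT-4o.
GPT-4.1 nano is the fastest and cheapest model OpenAI has released to date.
The knowledge cutoff is June 2024.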
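The "8 times" figure in the key points can be checked with simple arithmetic. The sketch below assumes GPT-4o's context window is 128,000 tokens, a commonly cited figure that the article itself does not state, and treats "1 million" as exactly 1,000,000. The constant and function names are illustrative only.

```python
# Context-window sizes in tokens.
GPT_41_CONTEXT = 1_000_000  # "1 million tokens", as stated in the key points
GPT_4O_CONTEXT = 128_000    # assumption: GPT-4o's commonly cited 128K window


def context_ratio(new_tokens: int, old_tokens: int) -> float:
    """Return how many times larger the new window is than the old one."""
    return new_tokens / old_tokens


ratio = context_ratio(GPT_41_CONTEXT, GPT_4O_CONTEXT)
print(f"{ratio:.2f}x")  # 7.81x, which rounds to the quoted 8x
```

Under these assumptions the ratio is a little under 8, so "8 times" is a rounded figure.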

Let's start with GPT-4.1's benchmark scores.

On SWE-bench Verified (a measure of real-world software engineering skills), GPT-4.1 scored 54.6%, an improvement of 21.4 percentage points over GPT-4o and 26.6 percentage points over GPT-4.5.

On Scale's MultiChallenge benchmark (a measure of instruction-following ability), GPT-4.1 scored 38.3%, 10.5 percentage points higher than GPT-4o.
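The gains above are quoted in percentage points, which is not the same as a relative improvement. As a rough sketch, the comparison models' scores can be backed out from the quoted figures. The baselines below are only what those figures imply and have not been independently verified, and the helper names are hypothetical.

```python
def baseline_from_gain(score: float, gain_points: float) -> float:
    """Recover the comparison model's score from a percentage-point gain."""
    return round(score - gain_points, 1)


def relative_gain(score: float, baseline: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return round((score - baseline) / baseline * 100, 1)


# SWE-bench Verified: GPT-4.1 scored 54.6%
swe_gpt4o = baseline_from_gain(54.6, 21.4)  # implied GPT-4o score: 33.2
swe_gpt45 = baseline_from_gain(54.6, 26.6)  # implied GPT-4.5 score: 28.0

# MultiChallenge: GPT-4.1 scored 38.3%
mc_gpt4o = baseline_from_gain(38.3, 10.5)   # implied GPT-4o score: 27.8

# A 21.4-point gain from an implied 33.2% baseline is a much larger
# relative jump than the raw point difference suggests.
swe_relative = relative_gain(54.6, swe_gpt4o)
print(swe_gpt4o, swe_gpt45, mc_gpt4o, swe_relative)
```

Reading the numbers this way makes it clear that "21.4 percentage points" and "21.4% better" describe different things.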

The GPT-4.1 series also performs well on image understanding, especially GPT-4.1 mini, which even beats GPT-4o on image benchmarks.

In a test that asks multiple-choice questions about 30 to 60 minute videos without subtitles, GPT-4.1 scored 72.0% in the "long, no subtitles" category, up from 65.3% for GPT-4o.

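Beyond its strong results on paper, GPT-4.1 also performs well in practice.

Compared with GPT-4o, GPT-4.1 is significantly better at front-end coding and can create web apps that are both more functional and more visually polished. In OpenAI's internal testing, paid human graders preferred the websites generated by GPT-4.1 over those generated by GPT-4o 80% of the time.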

Test prompt: Make a flashcard web application. The user should be able to create flashcards, search through their existing flashcards, review flashcards, and see statistics on flashcards reviewed. Preload ten cards containing a Hindi word or phrase and its English translation. Review interface: In the review interface, clicking or pressing Space should flip the card with a smooth 3-D animation to reveal the translation. Pressing the arrow keys should navigate through cards. Search interface: The search bar should dynamically provide a list of results as the user types in a query. Statistics interface: The stats page should show a graph of the number of cards the user has reviewed, and the percentage they have gotten correct. Create cards interface: The create cards page should allow the user to specify the front and back of a flashcard and add to the user's collection. Each of these interfaces should be accessible in the sidebar. Generate a single page React app (put all styles inline).

GPT-4.1

GPT-4o

In addition to its own testing, OpenAI invited outside companies to evaluate GPT-4.1's performance.

Windsurf (a coding assistant) found in its testing that GPT-4.1 scored 60% higher than GPT-4o on Windsurf's internal coding benchmark, and its users noted that GPT-4.1 was 30% more efficient at tool calling.

On Blue J's internal benchmark of its most challenging real-world tax scenarios, GPT-4.1 was 53% more accurate than GPT-4o.

Finally, the part everyone cares about most: pricing.