OpenAI Archives

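Tencent Hunyuan-Large: Tencent's Hunyuan Model

by rainchu | Nov 19, 2024 | AI

Hunyuan-Large is a Mixture-of-Experts (MoE) model, following in OpenAI's footsteps. It has 389 billion parameters and supports a 256K context length, and it is particularly strong at writing code and at math. Its training data includes large amounts of Chinese and English text, which makes it friendly to Chinese speakers. It is still several times smaller than GPT-4, which is reported (but not officially confirmed) to have about 1.8 trillion parameters.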

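MoE

An MoE model builds specialized expert sub-networks into the model. GPT-4, for example, is reported to have 16 experts, with 2 of them called for each inference step. Because only the selected experts run, this reduces the computation needed per token and speeds up inference, and each expert can concentrate on the kind of input it handles best.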
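The routing idea can be sketched in a few lines of Python. This is only a toy illustration of top-2 routing, not Hunyuan's or GPT-4's actual implementation: the experts here are hypothetical functions that scale their input, and the router scores are made up.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(weight * experts[i](x) for i, weight in route_top_k(router_scores, k))


# Hypothetical toy experts: expert number s simply multiplies its input by s.
experts = [lambda x, scale=s: scale * x for s in range(16)]
# Hypothetical router scores for one token; experts 3 and 7 score highest.
router_scores = [0.0] * 16
router_scores[3] = 2.0
router_scores[7] = 2.0

selected = route_top_k(router_scores)
output = moe_forward(1.0, experts, router_scores)
```

Only 2 of the 16 expert functions are evaluated for this token, which is where the compute saving comes from. All 16 still have to exist in memory.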

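Thinking Claude: Make Your LLM Think Deeply Like ChatGPT o1

by rainchu | Nov 18, 2024 | AI, Chat

OpenAI recently released ChatGPT o1, a large language model that thinks problems through deeply. Its hallmark is deeper and broader reasoning. The drawbacks are that it is noticeably slow and uses many more tokens; the benefit is that it reflects on its own reasoning and iterates on it while working through a problem.

Model prompt V4 lite

To use it, send the prompt below to Claude first, then send your question.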

<anthropic_thinking_protocol>

Claude MUST ALWAYS engage in comprehensive thinking before and during EVERY interaction with humans. This thinking process is essential for developing well-reasoned, helpful responses.

Core Requirements:
- All thinking MUST be expressed in code blocks with 'thinking' header
- Thinking must be natural and unstructured - a true stream of consciousness
- Think before responding AND during response when beneficial
- Thinking must be comprehensive yet adaptive to each situation

Essential Thinking Steps:
1. Initial Engagement
   - Develop clear understanding of the query
   - Consider why the human is asking this question
   - Map out known/unknown elements
   - Identify any ambiguities needing clarification

2. Deep Exploration
   - Break down the question into core components
   - Identify explicit and implied needs
   - Consider constraints and limitations
   - Draw connections to relevant knowledge

3. Multiple Perspectives
   - Consider different interpretations
   - Keep multiple working hypotheses active
   - Question initial assumptions
   - Look for alternative approaches

4. Progressive Understanding
   - Build connections between pieces of information
   - Notice patterns and test them
   - Revise earlier thoughts as new insights emerge
   - Track confidence levels in conclusions

5. Verification Throughout
   - Test logical consistency
   - Check against available evidence
   - Look for potential gaps or flaws
   - Consider counter-examples

6. Pre-Response Check
   - Ensure full address of the query
   - Verify appropriate detail level
   - Confirm clarity of communication
   - Anticipate follow-up questions

Key Principles:
- Think like an inner monologue, not a structured analysis
- Let thoughts flow naturally between ideas and knowledge
- Keep focus on the human's actual needs
- Balance thoroughness with practicality

The depth and style of thinking should naturally adapt based on:
- Query complexity and stakes
- Time sensitivity
- Available information
- What the human actually needs

Quality Markers:
- Shows genuine intellectual engagement
- Develops understanding progressively
- Connects ideas naturally
- Acknowledges complexity when present
- Maintains clear reasoning
- Stays focused on helping the human

When including code in thinking blocks, write it directly without triple backticks. Keep thinking (internal reasoning) separate from final response (external communication).

Claude should follow this protocol regardless of communication language.

</anthropic_thinking_protocol>
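Because the protocol asks for thinking to appear in code blocks with a 'thinking' header, a reply can be split into its reasoning and its final answer afterwards. The sketch below assumes the model emits fenced blocks whose info string is exactly "thinking"; the helper name and the sample reply are hypothetical.

```python
import re

FENCE = "`" * 3  # built at runtime so this example can itself sit inside a code fence

# Matches a fenced block whose info string is "thinking", as the protocol requests.
THINKING_BLOCK = re.compile(
    re.escape(FENCE) + r"thinking[ \t]*\n(.*?)" + re.escape(FENCE), re.DOTALL
)


def split_thinking(reply):
    """Return (list of thinking texts, reply with the thinking blocks removed)."""
    thoughts = [m.strip() for m in THINKING_BLOCK.findall(reply)]
    answer = THINKING_BLOCK.sub("", reply).strip()
    return thoughts, answer


# Hypothetical reply text, made up for illustration.
reply = (
    FENCE + "thinking\n"
    "The user wants a sum. 2 + 2 is 4.\n"
    + FENCE + "\n"
    "The answer is 4."
)
thoughts, answer = split_thinking(reply)
```

This makes it easy to show only the answer to end users while keeping the reasoning for logging or debugging.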

GitHub project page

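OmniParser: Microsoft's Open-Source Screen Parsing Tool

by Rain Chu | Nov 6, 2024 | Agent, AI

This follows the earlier post on Anthropic Computer Use, which was stunning at the time. Microsoft soon released its own take. OmniParser does not execute actions automatically, but you can pair it with pyautogui to do that. It also does not support Chinese out of the box, but a Chinese OCR engine or tesseract can handle that.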
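As a rough sketch of the pyautogui pairing, the snippet below turns a detected element's bounding box into a click. It assumes the box is given as normalized (x1, y1, x2, y2) ratios of the screen size; check the format your OmniParser version actually returns. Both function names are hypothetical.

```python
def bbox_center_px(bbox, screen_width, screen_height):
    """Convert a normalized (x1, y1, x2, y2) box into a pixel click point."""
    x1, y1, x2, y2 = bbox
    center_x = (x1 + x2) / 2 * screen_width
    center_y = (y1 + y2) / 2 * screen_height
    return round(center_x), round(center_y)


def click_element(bbox):
    """Click the middle of a detected element with pyautogui."""
    import pyautogui  # imported lazily: it needs a desktop session to load

    width, height = pyautogui.size()
    x, y = bbox_center_px(bbox, width, height)
    pyautogui.click(x, y)


# Hypothetical detection: a box covering the top-left quarter of a 1920x1080 screen.
point = bbox_center_px((0.0, 0.0, 0.5, 0.5), 1920, 1080)
```

Keeping the coordinate math in its own function means it can be checked without a display, while `click_element` is the only part that touches the real mouse.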

Local installation

First, create a virtual environment:

conda create -n omni python=3.12 -y
conda activate omni

Optional: if you have a GPU, install the CUDA build of PyTorch first:

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

The installation itself is simple, just five steps:

git clone https://github.com/microsoft/OmniParser.git && cd OmniParser
pip install -r requirements.txt
huggingface-cli download --repo-type model microsoft/OmniParser --local-dir weights --include "icon_detect/*" "icon_caption_blip2/*" "icon_caption_florence/*"
python weights/convert_safetensor_to_pt.py
python gradio_demo.py
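If the demo fails to start, a missing or empty weights folder is one thing worth ruling out. The check below is a small sketch that looks for the three folders named in the download command's --include patterns; the function name is hypothetical and it does not validate the files themselves.

```python
from pathlib import Path

# Folder names taken from the --include patterns in the download command.
EXPECTED_DIRS = ("icon_detect", "icon_caption_blip2", "icon_caption_florence")


def missing_weight_dirs(weights_root="weights"):
    """List expected weight folders that are absent or empty."""
    root = Path(weights_root)
    missing = []
    for name in EXPECTED_DIRS:
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_weight_dirs()
    print("all weight folders present" if not gaps else f"missing: {gaps}")
```

Run it from the OmniParser directory so that the relative weights path resolves.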

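OmniParser 2.0 Update

Key improvements and advantages of OmniParser V2

1. A larger, cleaner training dataset

OmniParser V2 is trained on a larger, well-cleaned "icon caption + grounding" dataset that covers richer UI labels and functional descriptions, which improves the model's ability to recognize interactable regions.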

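2. Significantly lower inference latency

V2 is 60% faster than V1 at inference, with an average latency of 0.6 seconds per frame on an A100 GPU or 0.8 seconds on an RTX 4090, which suits real-time GUI interpretation and interaction.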
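To put those latencies in perspective, the arithmetic below converts them to frames per second. It also estimates what V1's latency would have been under one reading of "60% faster", namely that V2's latency is 60% lower. That reading is an assumption; the source gives no V1 figure.

```python
def frames_per_second(latency_s):
    """Throughput implied by a per-frame latency."""
    return 1.0 / latency_s


def implied_v1_latency(v2_latency_s, reduction=0.60):
    """V1 latency if V2's latency is `reduction` lower (an assumed reading)."""
    return v2_latency_s / (1.0 - reduction)


a100_fps = frames_per_second(0.6)           # about 1.67 frames per second
rtx4090_fps = frames_per_second(0.8)        # about 1.25 frames per second
v1_a100_estimate = implied_v1_latency(0.6)  # about 1.5 seconds under this reading
```

At one to two frames per second, an agent can re-parse the screen after each action, but it is still well short of video-rate processing.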

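3. Much higher grounding accuracy

On ScreenSpot Pro, a benchmark of small annotated UI elements, V2 paired with GPT-4o reaches an average accuracy of 39.6%, far above the 0.8% that GPT-4o scores on its own.

4. OmniTool integration for a complete AI GUI agent workflow

V2 works with OmniTool to form a plug-and-play environment that controls a Windows 11 VM and pairs with large language models from several vendors, such as OpenAI (4o, o1, o3-mini), DeepSeek R1, Qwen 2.5VL, and Anthropic, which makes building a GUI agent simpler.

5. Broader use cases and better stability

Besides supporting PC and phone screenshots, V2 has a more stable and more general architecture, which suits many kinds of applications that need to interpret a GUI.

V1 vs V2 feature comparison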

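Feature	OmniParser V1	OmniParser V2
Training dataset	Standard icon caption + grounding data, smaller in size	Larger, cleaner training dataset
Inference speed	Slower	About 60% faster; average latency 0.6 to 0.8 s
Grounding accuracy	Low baseline; struggles with small UI elements	39.6% average accuracy when paired with GPT-4o
Workflow integration	Models and LLMs must be integrated manually	Supports OmniTool for quick hookup to many LLMs
Range of use cases	Narrower	Broader, covering various GUI interactions and screenshot inputs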

Download the new models

for f in icon_detect/{train_args.yaml,model.pt,model.yaml} icon_caption/{config.json,generation_config.json,model.safetensors}; do huggingface-cli download microsoft/OmniParser-v2.0 "$f" --local-dir weights; done
mv weights/icon_caption weights/icon_caption_florence

On Windows, you can download the models from Hugging Face, create the directory weights\icon_caption_florence, and put the downloaded model in that directory.

https://huggingface.co/microsoft/OmniParser-v2.0/tree/main
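For a route that works the same on Windows, Linux, and macOS, the shell loop can be approximated in Python with the huggingface_hub package. This is a sketch rather than the project's documented procedure: the two function names are hypothetical, and the file list is copied from the shell loop in this section.

```python
from pathlib import Path

REPO_ID = "microsoft/OmniParser-v2.0"
# Same files as the shell loop in this section.
FILES = [
    "icon_detect/train_args.yaml",
    "icon_detect/model.pt",
    "icon_detect/model.yaml",
    "icon_caption/config.json",
    "icon_caption/generation_config.json",
    "icon_caption/model.safetensors",
]


def rename_caption_dir(local_dir="weights"):
    """Mirror the `mv` step; skipped if the target already exists."""
    source = Path(local_dir) / "icon_caption"
    target = Path(local_dir) / "icon_caption_florence"
    if source.is_dir() and not target.exists():
        source.rename(target)
    return target


def download_v2_weights(local_dir="weights"):
    """Fetch each file, then rename icon_caption to icon_caption_florence."""
    from huggingface_hub import hf_hub_download  # needs `pip install huggingface_hub`

    for filename in FILES:
        hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=local_dir)
    return rename_caption_dir(local_dir)
```

Calling `download_v2_weights()` from the OmniParser directory should leave the same weights layout as the shell commands.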

OmniParser 1.5 Update

First, download the model

python weights/convert_safetensor_to_pt.py

For v1.5: 
download 'model_v1_5.pt' from https://huggingface.co/microsoft/OmniParser/tree/main/icon_detect_v1_5, make a new dir: weights/icon_detect_v1_5, and put it inside the folder. No weight conversion is needed.

Change the launch command to use the 1.5 version:

python gradio_demo.py --icon_detect_model weights/icon_detect_v1_5/model_v1_5.pt --icon_caption_model florence2

Supporting other languages

For example, to switch to Chinese, open utils.py in the project and change en to ch.

import easyocr
from paddleocr import PaddleOCR

# EasyOCR reader; it stays on English in this example
reader = easyocr.Reader(['en'])
paddle_ocr = PaddleOCR(
    # lang='en',  # original setting
    lang='ch',  # switched to Chinese; other languages are also available
    use_angle_cls=False,
    use_gpu=False,  # using cuda will conflict with pytorch in the same process
    show_log=False,
    max_batch_size=1024,
    use_dilation=True,  # improves accuracy
    det_db_score_mode='slow',  # improves accuracy
    rec_batch_num=1024)

In the interface, select PaddleOCR.
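The two OCR libraries use different language codes, so a small lookup table can keep them in sync when you switch languages. The codes below are written from memory of each library's language list and should be verified against the versions you have installed; the helper itself is hypothetical and not part of OmniParser.

```python
# Language codes differ between the two OCR libraries used in utils.py.
# Assumption: these codes match the installed easyocr and paddleocr versions.
OCR_LANG_CODES = {
    "english": {"easyocr": ["en"], "paddleocr": "en"},
    "simplified_chinese": {"easyocr": ["ch_sim", "en"], "paddleocr": "ch"},
    "traditional_chinese": {"easyocr": ["ch_tra", "en"], "paddleocr": "chinese_cht"},
}


def ocr_settings(language):
    """Return the language arguments for both OCR engines (hypothetical helper)."""
    try:
        codes = OCR_LANG_CODES[language]
    except KeyError:
        raise ValueError(f"unsupported language: {language!r}") from None
    return {"easyocr_langs": codes["easyocr"], "paddleocr_lang": codes["paddleocr"]}


settings = ocr_settings("simplified_chinese")
# The values would then be passed on, for example:
#   easyocr.Reader(settings["easyocr_langs"])
#   PaddleOCR(lang=settings["paddleocr_lang"], ...other arguments unchanged...)
```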

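Open Canvas: Use OpenAI's Canvas Feature Locally

by Rain Chu | Nov 6, 2024 | Agent, AI, Chat

OpenAI recently released Canvas, and writing code, drafting emails, and similar tasks inside ChatGPT quickly became popular. Almost immediately, someone released Open Canvas, which offers the same kind of service locally and frees you from being able to use it only on OpenAI. Besides the Git repository, there is also a Docker version for a quick trial.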

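LiveKit: Build Your Own Real-Time AI Voice Customer-Service Assistant

by rainchu | Sep 23, 2024 | AI, Chat

This post shows how to combine OpenAI with LiveKit to build a multilingual voice assistant that answers your questions in real time. It is as simple and easy to use as Twilio.

Get a LiveKit key

Sign in at LiveKit Login with a Google account and name a project.

Then go to Settings -> Keys in the project to get the API key.

Code

First, install the dependencies: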

pip install livekit-agents livekit-plugins-openai livekit-plugins-silero python-dotenv

Set the environment variables

LIVEKIT_URL=""
LIVEKIT_API_KEY=""
LIVEKIT_API_SECRET=""
OPENAI_API_KEY=""
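A blank value in any of these four variables tends to surface later as a confusing connection or authentication error, so it can help to check them up front. The sketch below is a hypothetical helper, and the sample values are placeholders rather than real credentials.

```python
import os

REQUIRED_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "OPENAI_API_KEY")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Hypothetical values for illustration only; real keys come from the
# LiveKit and OpenAI dashboards.
example_env = {
    "LIVEKIT_URL": "wss://example.livekit.cloud",
    "LIVEKIT_API_KEY": "key",
    "LIVEKIT_API_SECRET": "",
}
missing = missing_env_vars(example_env)
```

In a real script, call `missing_env_vars()` with no argument after `load_dotenv()` and stop early if the list is not empty.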

Main code

import asyncio
from dotenv import load_dotenv
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import openai, silero

# Read the LIVEKIT_* and OPENAI_API_KEY values from the .env file
load_dotenv()


async def entry(ctx: JobContext):
    # System prompt that sets the assistant's tone
    chat_ctx = llm.ChatContext().append(
        role="system",
        text="You are a professional assistant. Answer in a professional tone.",
    )

    # Join the room and subscribe to audio tracks only
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Pipeline: voice activity detection -> speech-to-text -> LLM -> text-to-speech
    assistant = VoiceAssistant(
        vad=silero.VAD.load(),
        stt=openai.STT(),
        tts=openai.TTS(voice="nova"),
        llm=openai.LLM(model="gpt-4o-mini"),
        chat_ctx=chat_ctx,
    )
    assistant.start(ctx.room)

    # Pause briefly, then greet the user
    await asyncio.sleep(1)
    await assistant.say("Hello, this is our first meeting. Nice to meet you.", allow_interruptions=True)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entry))

Testing and verification

In the project, the Get started section offers sample code for various platforms as well as a server you can use.

Pricing

https://livekit.io/pricing

References

https://livekit.io

https://github.com/livekit/agents

demo code

Lobe Chat UI: A Multimodal AI Chat UI with Plugins

AnythingLLM: Installing with Docker
