Computer Use 彙整

OmniParser-微軟的開源螢幕解析工具

by Rain Chu | 11 月 6, 2024 | Agent, AI

繼之前提到的 Ahthropic Computer Use ，那時候超級驚豔的，馬上就看到MS也有推出自己的版本，雖然沒有自動執行功能，但可以配合 pyautogui 達成，雖然不支援中文，但可以透過中文OCR 或是 tesseract 處理

安裝到本地端

先建立一個虛擬環境起來

conda create -n omni python=3.12 -y
conda activate omni

選項:有GPU的，先把CUDA安裝起來

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

整個安裝也很簡單，就五個步驟

git clone https://github.com/microsoft/OmniParser.git && cd OmniParser
pip install -r requirements.txt
huggingface-cli download --repo-type model microsoft/OmniParser --local-dir weights --include "icon_detect/*" "icon_caption_blip2/*" "icon_caption_florence/*"
python /home/Ubuntu/OmniParser/weights/convert_safetensor_to_pt.py
python gradio_demo.py

OmniParser 1.5 更新

先下載模型

python weights/convert_safetensor_to_pt.py

For v1.5: 
download 'model_v1_5.pt' from https://huggingface.co/microsoft/OmniParser/tree/main/icon_detect_v1_5, make a new dir: weights/icon_detect_v1_5, and put it inside the folder. No weight conversion is needed.

執行指令要改成 1.5 版本

python gradio_demo.py --icon_detect_model weights/icon_detect_v1_5/model_v1_5.pt --icon_caption_model florence2

支援其他的語言

舉例來說，要改成中文，請找到專案下的 utils.py ，將 en 改成 ch

reader = easyocr.Reader(['en'])
paddle_ocr = PaddleOCR(
#    lang='en',  # other lang also available
    lang='ch',  # other lang also available
    use_angle_cls=False,
    use_gpu=False,  # using cuda will conflict with pytorch in the same process
    show_log=False,
    max_batch_size=1024,
    use_dilation=True,  # improves accuracy
    det_db_score_mode='slow',  # improves accuracy
    rec_batch_num=1024)

在介面中選取使用 PaddleOCR

OmniParser-微軟的開源螢幕解析工具

安裝到本地端

OmniParser 1.5 更新

支援其他的語言

相關資源

近期文章

近期留言

彙整

分類