OmniParser: Microsoft's Open-Source Screen Parsing Tool
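This follows up on the Anthropic Computer Use I covered earlier, which really impressed me at the time. Microsoft quickly released its own version. It has no automatic execution feature, but you can get that by pairing it with pyautogui. It also does not support Chinese out of the box, but that can be handled with a Chinese OCR model or tesseract.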
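As a rough sketch of the pyautogui pairing mentioned above: assuming a parsed element comes back as a bounding box in normalized [x1, y1, x2, y2] ratio coordinates (an assumption about the output format, so check what your version actually returns), the click point could be computed like this. `bbox_center_px` and `click_element` are hypothetical helper names, not part of OmniParser.

```python
def bbox_center_px(bbox, screen_w, screen_h):
    """Convert a normalized [x1, y1, x2, y2] box to a pixel center point."""
    x1, y1, x2, y2 = bbox
    cx = (x1 + x2) / 2 * screen_w
    cy = (y1 + y2) / 2 * screen_h
    return round(cx), round(cy)


def click_element(bbox):
    """Click the center of a parsed element (needs a desktop session)."""
    import pyautogui  # imported lazily so the helper above also works headless

    screen_w, screen_h = pyautogui.size()
    x, y = bbox_center_px(bbox, screen_w, screen_h)
    pyautogui.click(x, y)
```

Keeping the coordinate math separate from the click makes it easy to verify the target point before letting the script move the mouse.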
Installing locally
First, create a virtual environment
conda create -n omni python=3.12 -y
conda activate omni
Optional: if you have a GPU, install PyTorch with CUDA support first
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
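After the install, a quick check along these lines can confirm whether PyTorch actually sees the GPU. This is only a minimal sketch, and `describe_device` is a hypothetical helper name.

```python
def describe_device():
    """Report which device PyTorch would use, without failing if torch is missing."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return f"cuda ({torch.cuda.get_device_name(0)})"
    return "cpu"


print(describe_device())
```

If this prints `cpu` on a machine with a GPU, the CUDA build was probably not the one installed into the active environment.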
The installation itself is simple, just five steps
git clone https://github.com/microsoft/OmniParser.git && cd OmniParser
pip install -r requirements.txt
huggingface-cli download --repo-type model microsoft/OmniParser --local-dir weights --include "icon_detect/*" "icon_caption_blip2/*" "icon_caption_florence/*"
python weights/convert_safetensor_to_pt.py
python gradio_demo.py
OmniParser 1.5 update
Download the model first
python weights/convert_safetensor_to_pt.py
For v1.5: download 'model_v1_5.pt' from https://huggingface.co/microsoft/OmniParser/tree/main/icon_detect_v1_5, create a new directory weights/icon_detect_v1_5, and put the file inside it. No weight conversion is needed.
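The manual download could also be scripted. Below is a minimal sketch using `hf_hub_download` from the `huggingface_hub` package, assuming the file still sits at `icon_detect_v1_5/model_v1_5.pt` in the repository as described above. `expected_weight_path` and `download_v15` are hypothetical helper names.

```python
from pathlib import Path

REPO_ID = "microsoft/OmniParser"
V15_FILE = "icon_detect_v1_5/model_v1_5.pt"  # path inside the repo, per the note above


def expected_weight_path(weights_dir="weights"):
    """Where the v1.5 detector should end up locally."""
    return Path(weights_dir) / V15_FILE


def download_v15(weights_dir="weights"):
    """Fetch model_v1_5.pt into weights/icon_detect_v1_5/ (needs network access)."""
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=V15_FILE, local_dir=weights_dir)
```

Calling `download_v15()` from the project root should leave the file at the location `expected_weight_path()` reports, which is the path the 1.5 run command expects.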
The run command has to be changed to the 1.5 version
python gradio_demo.py --icon_detect_model weights/icon_detect_v1_5/model_v1_5.pt --icon_caption_model florence2
Supporting other languages
For example, to switch to Chinese, find utils.py in the project and change en to ch in the PaddleOCR settings
reader = easyocr.Reader(['en'])
paddle_ocr = PaddleOCR(
    # lang='en',  # other lang also available
    lang='ch',  # other lang also available
    use_angle_cls=False,
    use_gpu=False,  # using cuda will conflict with pytorch in the same process
    show_log=False,
    max_batch_size=1024,
    use_dilation=True,  # improves accuracy
    det_db_score_mode='slow',  # improves accuracy
    rec_batch_num=1024)
Then select PaddleOCR in the interface
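One thing to watch when changing the language: the two OCR backends do not share language codes. The lookup below is a small sketch of the codes as I understand them (`ch` in PaddleOCR, `ch_sim` / `ch_tra` in EasyOCR). Treat the table as an assumption and check each library's supported-language list. `OCR_LANG_CODES` and `lang_code` are hypothetical names.

```python
# Language codes differ between the two OCR backends.
OCR_LANG_CODES = {
    "english": {"easyocr": "en", "paddleocr": "en"},
    "chinese_simplified": {"easyocr": "ch_sim", "paddleocr": "ch"},
    "chinese_traditional": {"easyocr": "ch_tra", "paddleocr": "chinese_cht"},
}


def lang_code(language, backend):
    """Look up the code a backend expects; raises KeyError for unknown names."""
    return OCR_LANG_CODES[language][backend]
```

For instance, `lang_code("chinese_simplified", "paddleocr")` gives the `ch` value used in the snippet above, while the EasyOCR reader would need `ch_sim` instead.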