
Using GraphRAG with a Local Ollama

Graph Retrieval-Augmented Generation (GraphRAG) is extremely useful, but it is also extremely expensive to run. If you want to save money, you can point it at a local service such as Ollama instead. The steps below assume you already have the OpenAI version of GraphRAG working; this post covers switching it from OpenAI to Ollama.

Before You Start

Set up GraphRAG first.

Download and install Ollama.

Install the ollama Python package:

pip install ollama
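
Before wiring it into GraphRAG, it is worth confirming that the local Ollama server and the Python package actually work. A minimal sanity check, assuming you have already pulled the two models used below (for example with ollama pull llama3 and ollama pull nomic-embed-text):

import ollama

# Quick check against the local Ollama server (default: http://localhost:11434)
reply = ollama.chat(model="llama3", messages=[{"role": "user", "content": "Say hi"}])
print(reply["message"]["content"])

emb = ollama.embeddings(model="nomic-embed-text", prompt="GraphRAG with Ollama")
print(len(emb["embedding"]))  # nomic-embed-text normally returns a 768-dimensional vector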

Modify the original settings.yaml file

Replace the old YAML with an Ollama configuration. The new configuration file is shown below:

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  # model: gpt-4o-mini
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  # api_base: https://<instance>.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

  # ollama api_base
  api_base: http://localhost:11434/v1
  model: llama3

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    # model: text-embedding-3-small
    
    # ollama
    model: nomic-embed-text 
    api_base: http://localhost:11434/api

    # api_base: https://<instance>.openai.azure.com
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
  
chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
    
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: true # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: true
  raw_entities: true
  top_level_nodes: true

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32

In the llm block:

Set model: llama3

Add api_base: http://localhost:11434/v1

In the embeddings block:

Set model: nomic-embed-text

Add api_base: http://localhost:11434/api
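
Note that the two blocks point at different base URLs: the llm block goes through Ollama's OpenAI-compatible endpoint (/v1), while the embeddings block, together with the code patch below, uses Ollama's native API (/api). A rough way to confirm both endpoints respond (this check is only a sketch and is not part of GraphRAG itself):

import requests

# OpenAI-compatible endpoint used by the llm block
resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={"model": "llama3", "messages": [{"role": "user", "content": "ping"}]},
)
print(resp.status_code)  # expect 200

# Native endpoint used by the patched embedding code below
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "hello"},
)
print(len(resp.json()["embedding"]))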

Modify the GraphRAG Source Code

Besides settings.yaml, the library code itself has to be changed to support Ollama. Two files need patching; you can use the ready-made code below.

  1. C:\Users\xxx\anaconda3\envs\GraphRAG\Lib\site-packages\graphrag\llm\openai\openai_embeddings_llm.py

Add the ollama setting block:

# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

"""The EmbeddingsLLM class."""

from typing_extensions import Unpack

from graphrag.llm.base import BaseLLM
from graphrag.llm.types import (
    EmbeddingInput,
    EmbeddingOutput,
    LLMInput,
)
import ollama

from .openai_configuration import OpenAIConfiguration
from .types import OpenAIClientTypes


class OpenAIEmbeddingsLLM(BaseLLM[EmbeddingInput, EmbeddingOutput]):
    """A text-embedding generator LLM."""

    _client: OpenAIClientTypes
    _configuration: OpenAIConfiguration

    def __init__(self, client: OpenAIClientTypes, configuration: OpenAIConfiguration):
        self.client = client
        self.configuration = configuration

    async def _execute_llm(
        self, input: EmbeddingInput, **kwargs: Unpack[LLMInput]
    ) -> EmbeddingOutput | None:
        args = {
            "model": self.configuration.model,
            **(kwargs.get("model_parameters") or {}),
        }
        # openai setting
        # embedding = await self.client.embeddings.create(
        #     input=input,
        #     **args,
        # )
        # return [d.embedding for d in embedding.data]

        # ollama setting
        embedding_list = []
        for inp in input:
            # Replace the model name here if you use a different embedding model;
            # it should match the embeddings model configured in settings.yaml.
            embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding["embedding"])
        return embedding_list

  2. C:\Users\xxx\anaconda3\envs\GraphRAG\Lib\site-packages\graphrag\query\llm\oai\embedding.py

Add the ollama setting block and comment out the openai setting block:

# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

"""OpenAI Embedding model implementation."""

import asyncio
from collections.abc import Callable
from typing import Any

import numpy as np
import tiktoken
from tenacity import (
    AsyncRetrying,
    RetryError,
    Retrying,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential_jitter,
)

from graphrag.query.llm.base import BaseTextEmbedding
from graphrag.query.llm.oai.base import OpenAILLMImpl
from graphrag.query.llm.oai.typing import (
    OPENAI_RETRY_ERROR_TYPES,
    OpenaiApiType,
)
from graphrag.query.llm.text_utils import chunk_text
from graphrag.query.progress import StatusReporter
import ollama

class OpenAIEmbedding(BaseTextEmbedding, OpenAILLMImpl):
    """Wrapper for OpenAI Embedding models."""

    def __init__(
        self,
        api_key: str | None = None,
        azure_ad_token_provider: Callable | None = None,
        model: str = "text-embedding-3-small",
        deployment_name: str | None = None,
        api_base: str | None = None,
        api_version: str | None = None,
        api_type: OpenaiApiType = OpenaiApiType.OpenAI,
        organization: str | None = None,
        encoding_name: str = "cl100k_base",
        max_tokens: int = 8191,
        max_retries: int = 10,
        request_timeout: float = 180.0,
        retry_error_types: tuple[type[BaseException]] = OPENAI_RETRY_ERROR_TYPES,  # type: ignore
        reporter: StatusReporter | None = None,
    ):
        OpenAILLMImpl.__init__(
            self=self,
            api_key=api_key,
            azure_ad_token_provider=azure_ad_token_provider,
            deployment_name=deployment_name,
            api_base=api_base,
            api_version=api_version,
            api_type=api_type,  # type: ignore
            organization=organization,
            max_retries=max_retries,
            request_timeout=request_timeout,
            reporter=reporter,
        )

        self.model = model
        self.encoding_name = encoding_name
        self.max_tokens = max_tokens
        self.token_encoder = tiktoken.get_encoding(self.encoding_name)
        self.retry_error_types = retry_error_types

    def embed(self, text: str, **kwargs: Any) -> list[float]:
        """
        Embed text using OpenAI Embedding's sync function.

        For text longer than max_tokens, chunk texts into max_tokens, embed each chunk, then combine using weighted average.
        Please refer to: https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb
        """
        token_chunks = chunk_text(
            text=text, token_encoder=self.token_encoder, max_tokens=self.max_tokens
        )
        chunk_embeddings = []
        chunk_lens = []
        for chunk in token_chunks:
            try:
                # openai setting
                # embedding, chunk_len = self._embed_with_retry(chunk, **kwargs)
                # chunk_embeddings.append(embedding)
                # chunk_lens.append(chunk_len)

                # ollama setting
                # chunk_text() may yield tuples of token ids, so decode them back
                # to text before sending them to Ollama.
                if not isinstance(chunk, str):
                    chunk = self.token_encoder.decode(list(chunk))
                # Replace the model name here if you use a different embedding model.
                embedding = ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]
                chunk_embeddings.append(embedding)
                chunk_lens.append(len(chunk))
            # TODO: catch a more specific exception
            except Exception as e:  # noqa BLE001
                self._reporter.error(
                    message="Error embedding chunk",
                    details={self.__class__.__name__: str(e)},
                )

                continue
        # Combine the chunk embeddings into a single normalized vector, the same
        # way the original OpenAI code path does.
        chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
        chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
        return chunk_embeddings.tolist()

    async def aembed(self, text: str, **kwargs: Any) -> list[float]:
        """
        Embed text using OpenAI Embedding's async function.

        For text longer than max_tokens, chunk texts into max_tokens, embed each chunk, then combine using weighted average.
        """
        token_chunks = chunk_text(
            text=text, token_encoder=self.token_encoder, max_tokens=self.max_tokens
        )
        chunk_embeddings = []
        chunk_lens = []
        embedding_results = await asyncio.gather(*[
            self._aembed_with_retry(chunk, **kwargs) for chunk in token_chunks
        ])
        embedding_results = [result for result in embedding_results if result[0]]
        chunk_embeddings = [result[0] for result in embedding_results]
        chunk_lens = [result[1] for result in embedding_results]
        chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)  # type: ignore
        chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
        return chunk_embeddings.tolist()

    def _embed_with_retry(
        self, text: str | tuple, **kwargs: Any
    ) -> tuple[list[float], int]:
        try:
            retryer = Retrying(
                stop=stop_after_attempt(self.max_retries),
                wait=wait_exponential_jitter(max=10),
                reraise=True,
                retry=retry_if_exception_type(self.retry_error_types),
            )
            for attempt in retryer:
                with attempt:
                    embedding = (
                        self.sync_client.embeddings.create(  # type: ignore
                            input=text,
                            model=self.model,
                            **kwargs,  # type: ignore
                        )
                        .data[0]
                        .embedding
                        or []
                    )
                    return (embedding, len(text))
        except RetryError as e:
            self._reporter.error(
                message="Error at embed_with_retry()",
                details={self.__class__.__name__: str(e)},
            )
            return ([], 0)
        else:
            # TODO: why not just throw in this case?
            return ([], 0)

    async def _aembed_with_retry(
        self, text: str | tuple, **kwargs: Any
    ) -> tuple[list[float], int]:
        try:
            retryer = AsyncRetrying(
                stop=stop_after_attempt(self.max_retries),
                wait=wait_exponential_jitter(max=10),
                reraise=True,
                retry=retry_if_exception_type(self.retry_error_types),
            )
            async for attempt in retryer:
                with attempt:
                    embedding = (
                        await self.async_client.embeddings.create(  # type: ignore
                            input=text,
                            model=self.model,
                            **kwargs,  # type: ignore
                        )
                    ).data[0].embedding or []
                    return (embedding, len(text))
        except RetryError as e:
            self._reporter.error(
                message="Error at embed_with_retry()",
                details={self.__class__.__name__: str(e)},
            )
            return ([], 0)
        else:
            # TODO: why not just throw in this case?
            return ([], 0)
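
With settings.yaml and the two patched files in place, indexing and querying run the same way as with OpenAI. For the GraphRAG version this post targets, and assuming your workspace lives in ./ragtest, that looks roughly like:

python -m graphrag.index --root ./ragtest
python -m graphrag.query --root ./ragtest --method global "What are the main themes in the documents?"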

References

Monica AI: An All-in-One Chrome Extension

Monica AI started out as a Chrome browser extension. After installing it, press Cmd+M or Ctrl+M to start chatting with it, or let Monica help you compose and insert text on any web page. On the back end it can connect to models from various AI providers, and you can pick from more than 80 templates to quickly generate marketing copy. Select text on a web page and Monica can explain, translate, or rewrite it for you.

The features of the Monica AI all-round assistant

Official site:

https://monica.im

References

Lobe Chat UI: A Multimodal AI Chat UI with Plugins

A powerful web chat UI that supports local models (Ollama), dragging images into the chat box, text-to-image generation, STT, TTS, a plugin system, building your own GPTs, and a database backend.

It supports several installation methods:

https://lobehub.com/zh-TW/docs/self-hosting/start

Local development

git clone https://github.com/lobehub/lobe-chat.git
cd lobe-chat
pnpm install
pnpm run dev

Docker installation

docker run -d -p 3210:3210 \
  -e OPENAI_API_KEY=sk-xxxx \
  -e ACCESS_CODE=lobe66 \
  --name lobe-chat \
  lobehub/lobe-chat
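
Once the container is running, the UI is served on the mapped port, i.e. http://localhost:3210.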

References

GitHub

Automating Development with Claude Dev

Claude Dev is an AI-powered development tool for generating code automatically, combining the VS Code editor with the powerful Claude 3.5 Sonnet model. It integrates automatic code generation into the workflow and offers support across every stage of software development.

Features

A new model for code development and file management

Leveraging Claude 3.5 Sonnet's agentic coding capabilities, Claude Dev can work through complex software development tasks step by step. It can create and edit files, explore complex projects, and, with the user's permission, execute terminal commands.

Previously, developing with AI meant switching between various AI tools and windows, and manually pasting error messages into the AI for debugging. Claude Dev goes beyond traditional code autocompletion and tech support, offering developers far more comprehensive assistance.

AI supervision with user interaction

Traditional autonomous AI scripts usually run in a sandboxed environment. Claude Dev instead provides a graphical UI for reviewing every file change and every command it executes. This keeps operations safe and lets developers explore the potential of agentic AI with confidence. You can also paste images into the chat and use Claude's vision capabilities to turn mockups into fully functional applications, or fix bugs from screenshots.

Insight into every change

Claude Dev lets you view a diff of every change directly in the editor and track progress in the chat through syntax-highlighted previews. Terminal commands can also be run directly from the chat, so you never need to open a terminal yourself. In addition, a permission button (for example, "Approve terminal command") appears before every tool use or API call, keeping you in control.

Efficient tools for code and project management

Claude Dev has a comprehensive set of coding capabilities:

  • Execute terminal commands on your system
  • List the top-level file paths of a given directory
  • Recursively list all file paths in a directory and its subdirectories
  • Parse top-level source files to extract the names of key elements such as classes and functions

With these tools, combined with natural language understanding, Claude Dev can grasp the structure and intent of a codebase and effectively help developers build large, complex projects.

High-level code overviews and intelligent analysis of file structure

From the project's file structure to a high-level code overview, Claude Dev uses tools such as tree-sitter to parse source code and extract definitions of classes, functions, methods, and so on. This deep analysis lets Claude Dev quickly understand the structure and purpose of the code and read the files most relevant to the task at hand.

Real-time monitoring of AI costs

Claude Dev also tracks the API cost of the whole task loop and of individual requests, and lets you set the maximum number of API requests allowed per task. When a task finishes, Claude can decide whether to show you the result with a terminal command such as

open -a "Google Chrome" index.html

which you can then run with a single click.

These advanced features show that Claude Dev is not just a code generation tool but a full development environment that lets developers control and optimize their workflow more effectively. By providing an all-in-one development solution, Claude Dev genuinely points toward a future of zero-code development.

References

https://github.com/saoudrizwan/claude-dev

Download the VS Code extension

ANTHROPIC API

GraphRAG and the Pitfalls I Hit

In July 2024, the hottest thing in AI was arguably Microsoft's GraphRAG. It looks simple, but there are plenty of pitfalls. There are lots of tutorials online already; this article focuses specifically on the traps and how to climb out of them.

Indexing is too expensive

Use the cheaper gpt-4o-mini model

llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: gpt-4o-mini
  model_supports_json: true # recommended if this is available for your model.

Use a local Ollama, vLLM, or LM Studio

To use Ollama, first install the ollama package:

pip install ollama

and use code that someone has already adapted:

git clone https://github.com/TheAiSingularity/graphrag-local-ollama.git

For the execution details, see:

https://medium.com/@vamshirvk/unlocking-cost-effective-local-model-inference-with-graphrag-and-ollama-d9812cc60466

Visualizing the graph

Download Gephi.

Open settings.yaml, find the snapshots section, and enable graphml. The index run will then generate a .graphml file for you, which you can open and edit in Gephi.

snapshots:
  graphml: true
  raw_entities: true
  top_level_nodes: true
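
Besides opening the .graphml file in Gephi, you can also inspect it programmatically. A minimal sketch with networkx; the timestamped folder and file name below are placeholders, so use whatever your index run actually wrote under output/:

import networkx as nx

# Placeholder path: substitute the timestamp folder and .graphml file produced by your run
graph = nx.read_graphml("output/20240801-120000/artifacts/summarized_graph.graphml")
print(graph.number_of_nodes(), "nodes,", graph.number_of_edges(), "edges")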

References

GraphRAG GitHub

https://github.com/microsoft/graphrag

Installing AnythingLLM with Docker

The installation method AnythingLLM officially recommends most is Docker: it is the fastest way to try the AnythingLLM web UI and validate ideas. Even though installing with Docker is already very simple, a few small details are worth writing down so you can avoid the pitfalls.

Minimum requirements

You need at least Docker v18.03+ on Windows/macOS, or v20.10+ on Linux/Ubuntu, for host.docker.internal to resolve.

Linux: add the --add-host=host.docker.internal:host-gateway flag to the docker run command so that it resolves. For example, a Chroma instance running on the host at localhost:8000 must be referenced as http://host.docker.internal:8000 when used inside AnythingLLM.
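
For example, a minimal Linux run with that flag added (a sketch; adjust the rest of the command to match the full examples below):

docker run -d -p 3001:3001 --add-host=host.docker.internal:host-gateway mintplexlabs/anythingllm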

Installation command

 docker pull mintplexlabs/anythingllm

Setup commands for Windows; define the location where AnythingLLM's data will be stored:

$env:STORAGE_LOCATION="$HOME\Documents\anythingllm"; `
If(!(Test-Path $env:STORAGE_LOCATION)) {New-Item $env:STORAGE_LOCATION -ItemType Directory}; `
If(!(Test-Path "$env:STORAGE_LOCATION\.env")) {New-Item "$env:STORAGE_LOCATION\.env" -ItemType File}; `
docker run -d -p 3001:3001 `
--cap-add SYS_ADMIN `
-v "$env:STORAGE_LOCATION`:/app/server/storage" `
-v "$env:STORAGE_LOCATION.env:/app/server/.env" `
-e STORAGE_DIR="/app/server/storage" `
mintplexlabs/anythingllm;
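
Once the container is up, open http://localhost:3001 in your browser to go through the first-time setup.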

Multiple AnythingLLM containers

If you need to run several AnythingLLM instances, remember to change two things:

  1. $env:STORAGE_LOCATION="$HOME\Documents\anythingllm-yourid"; `
  2. docker run -d -p 8001:3001 ` (replace 8001 with your own port)

After the changes it will look like this:

$env:STORAGE_LOCATION="$HOME\Documents\anythingllm-yourid"; `
If(!(Test-Path $env:STORAGE_LOCATION)) {New-Item $env:STORAGE_LOCATION -ItemType Directory}; `
If(!(Test-Path "$env:STORAGE_LOCATION\.env")) {New-Item "$env:STORAGE_LOCATION\.env" -ItemType File}; `
docker run -d -p 8001:3001 `
--cap-add SYS_ADMIN `
-v "$env:STORAGE_LOCATION`:/app/server/storage" `
-v "$env:STORAGE_LOCATION.env:/app/server/.env" `
-e STORAGE_DIR="/app/server/storage" `
--name yourid `
mintplexlabs/anythingllm;

Afterwards you can run the container with a single docker run command:

docker run -d -p 8001:3001 --cap-add SYS_ADMIN --user root -v "$env:STORAGE_LOCATION`:/app/server/storage" -v "$env:STORAGE_LOCATION\.env:/app/server/.env" -e STORAGE_DIR="/app/server/storage" mintplexlabs/anythingllm

Installing with Docker Compose

Write a docker-compose.yml like the one below, then run docker-compose up -d:

version: '3.8'
services:
  anythingllm:
    image: mintplexlabs/anythingllm
    container_name: anythingllm
    ports:
      - "3001:3001"
    volumes:
      - ./storage:/app/server/storage
      - ./env.txt:/app/server/.env
    environment:
      - STORAGE_DIR=/app/server/storage
    cap_add:
      - SYS_ADMIN
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

Installing from the Docker Desktop UI

In the Docker Desktop search box, type anythingllm, find mintplexlabs/anythingllm, and press Run.

After the image finishes downloading, remember to go through the first-time setup.

Tips

Remember that the service is now running inside Docker. If you run other services on localhost, such as Chroma, LocalAI, or LM Studio, you will need to reach them from inside the Docker container via http://host.docker.internal:xxxx, because localhost does not resolve to the host system from inside the container.

References

https://docs.useanything.com/installation/self-hosted/local-docker

Troubleshooting

If you run into the error 'Invalid file upload. EACCES: permission denied, open '/app/collector/hotdir/xxxx.txt'', you can fix it with the following command:

docker run -d -p 8001:3001 --cap-add SYS_ADMIN --user root -v "$env:STORAGE_LOCATION`:/app/server/storage" -v "$env:STORAGE_LOCATION\.env:/app/server/.env" -e STORAGE_DIR="/app/server/storage" mintplexlabs/anythingllm