
[Image Segmentation] Grounded Segment Anything: Environment Setup and Usage Tutorial for Automatic Box Drawing and Segmentation from Text

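Grounded Segment Anything combines IDEA-Research's Grounding DINO, an open-set object detector driven by text prompts, with Meta's Segment Anything Model (SAM). Given a text prompt from the user, it automatically produces bounding boxes or segmentation masks for the objects the prompt describes. This tutorial starts with environment setup, then walks step by step through installing and using the tool, with code examples and diagrams.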

I. Overview of Grounded Segment Anything

1. What is Grounded Segment Anything?

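Function: given a natural-language description from the user, it segments, or draws boxes around, the matching regions of the target image.
Advantages: no training is required, so it can be deployed quickly. Grounding DINO locates the objects named in the prompt, and SAM's strong segmentation ability turns each located box into a precise mask, so it is not limited to a fixed set of object classes.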

II. Environment Setup

To use Grounded Segment Anything, we need to install its dependencies, including PyTorch, SAM and GroundingDINO.

1. Requirements

Python version: 3.8 or later
GPU: a CUDA-capable graphics card is recommended
Operating system: Linux / macOS / Windows
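The requirements above can be checked before installing anything else. The sketch below uses a hypothetical helper, check_environment, which is not part of the repository. It relies only on the standard library to report the Python version. If PyTorch is already installed, it also asks PyTorch whether CUDA is visible.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8)):
    """Return a dict describing the parts of the setup this tutorial needs.

    Hypothetical helper: it inspects the interpreter and, only if PyTorch
    is installed, asks it whether a CUDA device is available.
    """
    report = {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs without PyTorch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If cuda_available is False on a machine with an NVIDIA GPU, the installed PyTorch build most likely does not match the local CUDA setup.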

2. Installation steps

(1) Install PyTorch

Install the PyTorch build that matches your hardware. The example below uses CUDA 11.8:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

(2) Clone the Grounded Segment Anything repository

git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git
cd Grounded-Segment-Anything

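(3) Install dependencies

pip install -r requirements.txt

The code below also imports the segment_anything and GroundingDINO packages. Both live in subdirectories of the repository, so install them from the repository root:

python -m pip install -e segment_anything
pip install --no-build-isolation -e GroundingDINO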

(4) Download the pretrained weights

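Download the weight files for GroundingDINO and SAM:

GroundingDINO: download link (the code below uses groundingdino_swint_ogc.pth)
SAM: download link (the code below uses the ViT-H checkpoint sam_vit_h_4b8939.pth)

After downloading, save the weight files in the models/ directory.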
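The example code loads weights from fixed paths, so a quick check that the files actually landed in models/ can prevent a confusing error later. missing_weights is a hypothetical helper. The file names are the ones the code example in this tutorial uses; adjust them if you download different variants.

```python
from pathlib import Path

# File names used by this tutorial's code example; change them if you
# download other variants (for example a smaller SAM checkpoint).
EXPECTED_WEIGHTS = ("groundingdino_swint_ogc.pth", "sam_vit_h_4b8939.pth")


def missing_weights(models_dir="models", expected=EXPECTED_WEIGHTS):
    """Return the expected weight files that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weight files:", ", ".join(missing))
    else:
        print("All weight files found.")
```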

III. Code Example

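The following is a complete example of image segmentation with Grounded Segment Anything. Its paths are relative, so run it from the root of the cloned repository.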

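1. Import libraries and load the models

import torch
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import SamPredictor, sam_model_registry
from torchvision.ops import box_convert
import numpy as np
import matplotlib.pyplot as plt

# Use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load GroundingDINO: load_model takes the model config file first, then the weights
dino_model = load_model(
    "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "models/groundingdino_swint_ogc.pth",
    device=device,
)

# Load SAM (ViT-H backbone), move it to the device and wrap it in a predictor
sam_checkpoint = "models/sam_vit_h_4b8939.pth"
sam = sam_model_registry["vit_h"](checkpoint=sam_checkpoint)
sam.to(device)
sam_predictor = SamPredictor(sam)

2. Load the image

# load_image returns the RGB image as a NumPy array (H, W, 3)
# plus the resized, normalized tensor that GroundingDINO expects
image_path = "example.jpg"
image_np, image_tensor = load_image(image_path)

# Compute SAM's image embedding once; every box prompt below reuses it
sam_predictor.set_image(image_np)

3. Generate boxes from a text prompt

# Text prompt
text_prompt = "a cat"

# Use GroundingDINO to generate candidate boxes (the prompt argument is named caption)
boxes, logits, phrases = predict(
    model=dino_model,
    image=image_tensor,
    caption=text_prompt,
    box_threshold=0.3,    # minimum confidence for a box to be kept
    text_threshold=0.25,  # minimum score for a word to appear in the returned phrase
    device=device,
)

# predict returns normalized (cx, cy, w, h) boxes; convert them to pixel (x0, y0, x1, y1)
h, w, _ = image_np.shape
boxes = box_convert(boxes * torch.tensor([w, h, w, h]), in_fmt="cxcywh", out_fmt="xyxy").numpy()

# Visualize the boxes: draw the image first, then the rectangles on top
plt.figure(figsize=(10, 10))
plt.imshow(image_np)
for box, phrase, score in zip(boxes, phrases, logits):
    plt.gca().add_patch(plt.Rectangle(
        (box[0], box[1]),
        box[2] - box[0],
        box[3] - box[1],
        edgecolor="red",
        fill=False,
        linewidth=2,
    ))
    plt.text(box[0], box[1], f"{phrase} {float(score):.2f}", color="red")
plt.axis("off")
plt.show()

4. Segment the region inside a box with SAM

# Stop if GroundingDINO found nothing above the thresholds
if len(boxes) == 0:
    raise ValueError(f"No box found for prompt {text_prompt!r}")

# Select one box (the first one, as an example); SAM expects pixel XYXY coordinates
selected_box = boxes[0]

# Segment the region inside the box; multimask_output=False returns a single mask
masks, _, _ = sam_predictor.predict(
    box=selected_box,
    multimask_output=False,
)

# Show the result: masks has shape (1, H, W); overlay only the foreground pixels
plt.figure(figsize=(10, 10))
plt.imshow(image_np)
plt.imshow(np.ma.masked_where(~masks[0], masks[0]), alpha=0.5, cmap="jet", vmin=0, vmax=1)
plt.axis("off")
plt.show()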

IV. The Full Pipeline, Illustrated

1. GroundingDINO extracts boxes that match the text

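Input: the image and text_prompt="a cat".
Output: box coordinates (normalized cx, cy, w, h), a confidence score for each box, and the phrase each box matched.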

[Figure: GroundingDINO box-drawing illustration]
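GroundingDINO's predict returns boxes as normalized (cx, cy, w, h) values in [0, 1]. SAM's box prompt, by contrast, expects pixel coordinates in (x0, y0, x1, y1) order. The conversion is simple arithmetic. The sketch below is a plain-Python equivalent of scaling the boxes and then calling torchvision.ops.box_convert; cxcywh_to_xyxy is a hypothetical helper name.

```python
def cxcywh_to_xyxy(box, image_width, image_height):
    """Convert one normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    # Scale from the [0, 1] range to pixels, then move from center/size to corners
    cx, w = cx * image_width, w * image_width
    cy, h = cy * image_height, h * image_height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A box centered in a 640x480 image, covering half its width and half its height
print(cxcywh_to_xyxy((0.5, 0.5, 0.5, 0.5), 640, 480))  # (160.0, 120.0, 480.0, 360.0)
```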

2. SAM segments the target precisely

Input: the boxes provided by GroundingDINO, converted to pixel (x0, y0, x1, y1) coordinates.
Output: the segmentation masks.

[Figure: SAM segmentation illustration]
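SAM's predictor returns each mask as a boolean array with the same height and width as the image. Two quantities often needed downstream are the mask's pixel area and its tight bounding box. mask_stats below is a hypothetical NumPy helper that computes both.

```python
import numpy as np


def mask_stats(mask):
    """Return (area, (x0, y0, x1, y1)) for a boolean HxW mask.

    x1 and y1 are exclusive, so x1 - x0 is the box width in pixels.
    Returns (0, None) for an empty mask.
    """
    mask = np.asarray(mask, dtype=bool)
    area = int(mask.sum())
    if area == 0:
        return 0, None
    ys, xs = np.nonzero(mask)  # row (y) and column (x) indices of foreground pixels
    return area, (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)


demo = np.zeros((4, 6), dtype=bool)
demo[1:3, 2:5] = True  # a block 2 rows tall and 3 columns wide
print(mask_stats(demo))  # (6, (2, 1, 5, 3))
```

Comparing this tight box with the GroundingDINO box that prompted SAM is a quick sanity check on the segmentation.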

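V. Use Cases

1. Automated annotation

Natural-language prompts can generate segmentation annotations automatically, which greatly reduces the manual effort of data labeling.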
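For automated annotation, masks usually have to be written in a dataset format. One example target is COCO-style uncompressed run-length encoding: counts over the mask flattened in column-major order, always starting with a run of zeros. Assuming that format, a NumPy sketch could look like this. mask_to_coco_rle is a hypothetical helper; pycocotools' mask.encode produces the compressed variant.

```python
import numpy as np


def mask_to_coco_rle(mask):
    """Encode a boolean HxW mask as COCO-style uncompressed RLE."""
    mask = np.asarray(mask, dtype=bool)
    flat = mask.flatten(order="F").astype(np.uint8)  # column-major, as COCO expects
    # Indices where the value changes, plus both ends of the flattened array
    change = np.flatnonzero(np.diff(flat)) + 1
    bounds = np.concatenate(([0], change, [flat.size]))
    counts = np.diff(bounds).tolist()
    if flat.size and flat[0] == 1:
        counts = [0] + counts  # counts always begin with a (possibly empty) run of zeros
    return {"size": [mask.shape[0], mask.shape[1]], "counts": counts}


demo = np.array([[0, 1],
                 [1, 1]], dtype=bool)
print(mask_to_coco_rle(demo))  # {'size': [2, 2], 'counts': [1, 3]}
```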

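2. Object detection and segmentation

Quickly detect and segment specific objects, which is useful in fields such as industrial inspection and medical imaging.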

3. Intelligent image editing

Use the segmentation results to replace, enhance or otherwise edit the target region.
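The sketch below is a minimal illustration of mask-based editing. It blends a solid color into the masked pixels of an RGB image and leaves the rest untouched. fill_mask is a hypothetical helper. A real editing pipeline, such as inpainting, would replace the solid fill with generated content.

```python
import numpy as np


def fill_mask(image, mask, color=(0, 255, 0), alpha=1.0):
    """Blend `color` into the pixels of an HxWx3 uint8 image where mask is True.

    alpha=1.0 replaces the region outright; smaller values give a tint.
    """
    out = image.astype(np.float32)  # astype makes a copy, so the input is not modified
    region = np.asarray(mask, dtype=bool)
    out[region] = (1 - alpha) * out[region] + alpha * np.asarray(color, dtype=np.float32)
    return np.clip(out, 0, 255).round().astype(np.uint8)


img = np.zeros((2, 2, 3), dtype=np.uint8)
m = np.array([[True, False], [False, False]])
result = fill_mask(img, m)
print(result[0, 0].tolist(), result[1, 1].tolist())  # [0, 255, 0] [0, 0, 0]
```

With the tutorial's variables, fill_mask(image_np, masks[0], alpha=0.5) would tint the segmented object green.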

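VI. Common Problems and Solutions

1. CUDA out of memory error

Cause: the image is too large, or the models take up too much GPU memory.
Solution: shrink the image, or switch to a smaller SAM variant (vit_l or vit_b instead of vit_h).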

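2. Unsatisfactory segmentation results

Cause: the text description is too vague.
Solution: make the description more specific, for example by adding the target's color, position or other distinguishing features.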
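GroundingDINO treats the prompt as a caption. It lowercases the text and expects it to end with a period, and the GroundingDINO project separates several categories in one prompt with " . ". The hypothetical helper below builds such a prompt from more specific phrases. This lets a description like "black cat on the sofa" be combined with other targets in a single query.

```python
def build_caption(phrases):
    """Join target descriptions into one GroundingDINO-style caption.

    Each phrase is stripped and lowercased, empty phrases are skipped,
    and categories are separated by " . " with a trailing period.
    """
    cleaned = [p.strip().lower().rstrip(".").strip() for p in phrases]
    cleaned = [p for p in cleaned if p]
    if not cleaned:
        raise ValueError("at least one non-empty phrase is required")
    return " . ".join(cleaned) + " ."


print(build_caption(["Black cat on the sofa", "red cushion."]))
# black cat on the sofa . red cushion .
```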

3. Slow model downloads

Solution: use a download accelerator or a mirror hosted in China.

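VII. Summary

Grounded Segment Anything makes text-prompted image segmentation straightforward. It has clear practical value for tasks ranging from automated annotation to intelligent editing. With this tutorial you can get started with the tool quickly and bring new capabilities to your projects.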

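Suggested experiments:

Try different text prompts and see how they change the segmentation results.
Modify the code to save the segmentation result as a PNG file.
Integrate the pipeline into a Flask or Streamlit app to offer an online segmentation service.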
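For the PNG experiment, the usual route is Pillow: Image.fromarray(mask.astype(np.uint8) * 255).save("mask.png"). To avoid the extra dependency, the sketch below writes a binary mask as an 8-bit grayscale PNG using only the standard library. save_mask_png and mask_to_png_bytes are hypothetical helpers. They accept any 2-D sequence of truthy or falsy values, including a NumPy boolean array.

```python
import struct
import zlib


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, then a CRC over type + data."""
    return (struct.pack(">I", len(data)) + kind + data
            + struct.pack(">I", zlib.crc32(kind + data) & 0xFFFFFFFF))


def mask_to_png_bytes(mask):
    """Encode a 2-D mask as an 8-bit grayscale PNG (True -> 255, False -> 0)."""
    rows = [bytes(255 if v else 0 for v in row) for row in mask]
    if not rows or not rows[0]:
        raise ValueError("mask must be non-empty")
    width, height = len(rows[0]), len(rows)
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same length")
    # Every scanline starts with filter type 0 (no filtering)
    raw = b"".join(b"\x00" + r for r in rows)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, no interlacing
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))


def save_mask_png(mask, path):
    """Write the mask to `path` as a grayscale PNG file."""
    with open(path, "wb") as f:
        f.write(mask_to_png_bytes(mask))
```

For example, save_mask_png(masks[0], "mask.png") would store the mask from the SAM step; white pixels mark the segmented object.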

Go try it out! 🎉
