Baidu Netdisk AI Competition, Document Detection Optimization Track: First-Place Solution on Leaderboard B




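I. Task Overview

Taking photos with a phone is now a routine part of daily work and life. This competition asks for an algorithm that removes the cluttered photo background and accurately frames the edges of the document: given document images captured in real scenes with the shooting background included, the algorithm detects the document edges and outputs the processed, scan-like result image.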

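II. Competition Dataset

The dataset covers most document types commonly seen in daily life and contains 2797 images. Three kinds of annotation are provided: document masks in PNG format, document-edge masks in PNG format, and document-edge keypoint pairs.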

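III. Task Analysis and Training/Tuning Process

1. Task Analysis

The goal of the task is to compute the coordinates of the document's four corner points. The provided baseline (https://aistudio.baidu.com/aistudio/projectdetail/3861946) computes them by regressing four keypoints directly, with the ground-truth corner coordinates extracted from the edge keypoint pairs.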

# Extract the coordinates of the four corner points from the keypoint pairs
def get_corner(self, sites, corner_flag):
    # corner_flag 1:top_left 2:top_right 3:bottom_right 4:bottom_left
    if corner_flag == 1:
        target_sites = [0,0]
    elif corner_flag == 2 :
        target_sites = [1,0]
    elif corner_flag == 3 :
        target_sites = [1,1]
    elif corner_flag == 4 :
        target_sites = [0,1]

    min_dis = 3
    best_x = 0
    best_y = 0
    # Keep the keypoint with the smallest Manhattan distance to the target corner
    for site in sites:
        if abs(site[0]-target_sites[0])+abs(site[1]-target_sites[1])<min_dis:
            min_dis=abs(site[0]-target_sites[0])+abs(site[1]-target_sites[1])
            best_x = site[0]
            best_y = site[1]
    return best_x, best_y
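The selection rule above can be tried in isolation. The following is a minimal standalone sketch, assuming the keypoints are normalized to the range [0, 1] (suggested by the targets [0,0] to [1,1] and the initial min_dis of 3, but not stated in the source). The helper name pick_corners and the sample points are hypothetical.

```python
# Corner targets in normalized image coordinates (assumed range [0, 1]).
CORNER_TARGETS = {
    "top_left": (0.0, 0.0),
    "top_right": (1.0, 0.0),
    "bottom_right": (1.0, 1.0),
    "bottom_left": (0.0, 1.0),
}


def pick_corners(sites):
    """For each image corner, return the keypoint with the smallest L1 distance to it."""
    corners = {}
    for name, (tx, ty) in CORNER_TARGETS.items():
        corners[name] = min(sites, key=lambda s: abs(s[0] - tx) + abs(s[1] - ty))
    return corners


# Hypothetical edge keypoints of a slightly rotated document.
sites = [(0.12, 0.10), (0.50, 0.08), (0.90, 0.15),
         (0.88, 0.85), (0.15, 0.90), (0.11, 0.50)]
corners = pick_corners(sites)
for name, point in corners.items():
    print(name, point)
```

Points in the middle of an edge, such as (0.50, 0.08), are never selected because a true corner is always closer to one of the four targets.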

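From the baseline and the task introduction video, two ways of solving this task emerge.

  • The first is to use a regression model to compute the document's four corner points directly. Inspecting the data shows that some corner points lie at the image border, which hinders convergence of corner regression, so this approach has a bottleneck.

  • The second is to compute a mask of the document region and submit the mask directly; the backend then extracts the four corner points and computes mIoU. This approach is simple and direct and can be trained with PaddleSeg out of the box, so this solution uses PaddleSeg for training and validation.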
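To make the metric concrete, here is a minimal sketch of mIoU between a predicted mask and a ground-truth mask. The organizers' scoring code is not shown in the source (it reportedly works on corner points extracted from the mask), so this only illustrates the metric on raw masks. The function names and the 4×4 masks are hypothetical.

```python
def class_iou(pred, target, cls):
    """IoU of one class between two masks given as nested lists of labels."""
    inter = union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p_hit, t_hit = (p == cls), (t == cls)
            inter += p_hit and t_hit
            union += p_hit or t_hit
    return inter / union if union else float("nan")


def mean_iou(pred, target, num_classes=2):
    """Average the per-class IoU, skipping classes absent from both masks."""
    ious = [class_iou(pred, target, c) for c in range(num_classes)]
    valid = [v for v in ious if v == v]  # NaN != NaN, so this drops NaN entries
    return sum(valid) / len(valid)


# 1 = document, 0 = background; the prediction has one extra document pixel.
target = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
pred = [[0, 0, 0, 0],
        [0, 1, 1, 1],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(round(mean_iou(pred, target), 4))
```

In this toy case the document class has IoU 4/5 and the background class 11/12, so a single wrong pixel on a small mask already costs about 14 points of mIoU.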

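2. Model Selection

  • Attempt 1: starting from the official baseline, replace resnet152 with HRNet64 and keep everything else unchanged. mIoU on leaderboard A: 0.92851, rank 22.
  • Attempt 2: use HRNet48 as the backbone with DBNet-style segmentation regression. Final mIoU on leaderboard A: 0.94751, rank 18.
  • Attempt 3: switch to segmentation with PaddleSeg. UNet produced segmentation maps with holes; after a literature search turned up OCRNet, OCRNet was trained for 3000 epochs and scored an mIoU of 0.97011 on leaderboard A, rank 9.
  • Attempt 4: on the theory that brute force works wonders, train OCRNet for 30000 epochs (the training call in step 11 sets iters=30000), use OHEM for the loss, and set the cosine annealing T_max to the total number of training iterations. There was no time to test on leaderboard A; the submission to leaderboard B scored an mIoU of 0.96402, rank 1.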

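2.1 Introduction to HRNet

The HRNet backbone is divided into 4 stages, and each stage consists of two parts, drawn as a blue box and an orange box in the architecture figure. The blue box is the basic structure of the stage and is made up of multiple branches: in HRNet, the blue box of stage 1 uses BottleNeck blocks, and those of stages 2, 3, and 4 use BasicBlock. The orange box is the transition structure of the stage: for stage 1 it is a TransitionLayer, for stages 2 and 3 it is a FuseLayer followed by a TransitionLayer, and for stage 4 it is a FuseLayer.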

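2.2 OCRNet: Context Information Based on Object Regions

The main idea of the OCR method proposed by Microsoft Research Asia is to explicitly turn the pixel classification problem into an object-region classification problem. This is consistent with the original definition of semantic segmentation, in which the class of each pixel is the class of the object the pixel belongs to. In other words, the main difference from the context information of PSPNet and DeepLabv3+ is that OCR explicitly enhances object information.

The OCR method is implemented in 3 stages; the matrix shape at each stage is given below (for the exact computation, refer to its open-source code):

(1) Obtain the feature representation from the backbone and estimate a simple, coarse semantic segmentation result to serve as one input of the OCR method, namely the soft object regions (Soft Object Regions), with shape b×c×h×w;

(2) From the soft object regions (b×c×h×w) and the feature representation of the deepest layer of the network (b×k×h×w), compute a set of vectors, the object region representations (Object Region Representations), in which each vector is the feature representation of one semantic class, with shape b×c×k×1;

(3) Compute the relation matrix (b×(h×w)×c) between the pixel representations (Pixel Representations) output by the deepest layer and the computed object region representations, then take a weighted sum of the object region features using each pixel's values in the relation matrix to obtain the final object contextual representation, OCR (Object Contextual Representation), with shape b×k×h×w.

Concatenating the object contextual representation OCR (b×k×h×w) with the deepest-layer feature representation (b×k×h×w) gives the context-augmented representation (Augmented Representation) (b×2k×h×w), from which the semantic class of each pixel can be predicted. ASPP features can also be concatenated, i.e. OCR+Features+ASPP. The overall framework is shown in Figure 5.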


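In summary, OCR computes a set of object-region feature representations and then propagates them to every pixel according to the similarity between the object-region representations and the pixel representations.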
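The three stages can be traced on a toy input. The sketch below is a simplified illustration in plain Python for a single image with the spatial grid flattened to N = h×w pixels. It deliberately omits the learned 1×1 convolutions that the real OCRNet applies to pixels and regions before computing the relation, so it is not PaddleSeg's implementation. The function name ocr_context and the sample values are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def ocr_context(pixel_feats, coarse_logits):
    """pixel_feats: N pixels x k channels; coarse_logits: c classes x N pixels."""
    k = len(pixel_feats[0])
    # Stage 2: each region vector is an average of the pixel features,
    # weighted by that class's soft object region (normalized over pixels).
    regions = []
    for class_logits in coarse_logits:
        w = softmax(class_logits)
        regions.append([sum(w[n] * pixel_feats[n][j] for n in range(len(w)))
                        for j in range(k)])
    # Stage 3: pixel-to-region relation (N x c), then a weighted sum of regions.
    augmented = []
    for feat in pixel_feats:
        sims = [sum(f * r for f, r in zip(feat, region)) for region in regions]
        rel = softmax(sims)
        context = [sum(rel[c] * regions[c][j] for c in range(len(regions)))
                   for j in range(k)]
        augmented.append(context + feat)  # concatenation gives 2k channels
    return regions, augmented


# Four pixels with k = 2 channels; the coarse result assigns pixels 0-1 to class 0.
pixel_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
coarse_logits = [[4.0, 4.0, -4.0, -4.0], [-4.0, -4.0, 4.0, 4.0]]
regions, augmented = ocr_context(pixel_feats, coarse_logits)
print(len(regions), len(regions[0]), len(augmented), len(augmented[0]))
```

The shapes follow the text: c region vectors of length k, and an augmented representation with 2k channels per pixel.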

3. Extracting the Data

In [1]
! wget https://staticsns.cdn.bcebos.com/amis/2025-4/1649731549425/train_datasets_document_detection_0411.zip
! unzip -oq /home/aistudio/train_datasets_document_detection_0411.zip
! rm -rf __MACOSX
! rm -rf /home/aistudio/train_datasets_document_detection_0411.zip
--2025-05-24 20:34:44--  https://staticsns.cdn.bcebos.com/amis/2025-4/1649731549425/train_datasets_document_detection_0411.zip
Resolving staticsns.cdn.bcebos.com (staticsns.cdn.bcebos.com)... 221.195.34.35
Connecting to staticsns.cdn.bcebos.com (staticsns.cdn.bcebos.com)|221.195.34.35|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 258661599 (247M) [application/zip]
Saving to: 'train_datasets_document_detection_0411.zip'

train_datasets_docu 100%[===================>] 246.68M  54.7MB/s    in 4.4s    

2025-05-24 20:34:49 (56.4 MB/s) - 'train_datasets_document_detection_0411.zip' saved [258661599/258661599]

4. Splitting the Data (train:val = 9:1)

Create folders to hold the split data.

In [2]
!mkdir -p data/train/images data/train/labels
!mkdir -p data/val/images data/val/labels

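To work with PaddleSeg, the annotated PNG images must be converted to single-channel PNGs in which the mask is labeled 1 (document region) or 0 (background region).

In [3]
import os
import cv2
import shutil
from glob import glob
from tqdm import tqdm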
In [4]
idx = 0
train_lst = []
val_lst = []
images = glob('train_datasets_document_detection_0411/images/*')  # collect all images
for image in tqdm(images):
    idx = idx + 1
    name = os.path.basename(image)
    label = image.replace('images', 'segments').replace('.jpg', '.png')  # path of the mask image
    tp = 'val' if idx % 10 == 0 else 'train'  # train/val split
    label_img = cv2.imread(label) // 255  # 1 = document region, 0 = background region
    cv2.imwrite(f'data/{tp}/labels/{os.path.basename(label)}', label_img[:,:,0])  # save as a single-channel image
    shutil.copy(image, f'data/{tp}/images')
    # shutil.copy(label, f'data/{tp}/labels')
    if tp == 'train':
        train_lst.append(name)
    else:
        val_lst.append(name)

# Write the train_list.txt and val_list.txt files that PaddleSeg needs for training
with open('train_list.txt', 'w') as f:
    for fn in train_lst:
        f.write(f"/home/aistudio/data/train/images/{fn} /home/aistudio/data/train/labels/{fn.replace('.jpg', '.png')}\n")

with open('val_list.txt', 'w') as f:
    for fn in val_lst:
        f.write(f"/home/aistudio/data/val/images/{fn} /home/aistudio/data/val/labels/{fn.replace('.jpg', '.png')}\n")
100%|██████████| 2797/2797 [00:13<00:00, 205.28it/s]
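The split rule sends every tenth image to validation. A small sketch of the same rule, using hypothetical file names, shows the resulting set sizes for 2797 images. The validation size agrees with the 279 prediction steps logged in the inference step.

```python
def split_names(names, val_every=10):
    """Send every val_every-th item (counting from 1) to val, the rest to train."""
    train, val = [], []
    for idx, name in enumerate(names, start=1):
        (val if idx % val_every == 0 else train).append(name)
    return train, val


# Hypothetical file names standing in for the 2797 competition images.
names = [f"img_{i:04d}.jpg" for i in range(2797)]
train, val = split_names(names)
print(len(train), len(val))
```

Because glob returns files in filesystem order, which images land in the validation set depends on that order. Sorting the list first would be one way to make the split reproducible; that is a suggestion, not something the original notebook does.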

5. Installing Dependencies

Install a specific version of paddleseg (2.5.0).

In [5]
!pip install paddleseg==2.5.0
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting paddleseg==2.5.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/17/76/84a07245cb5a0ceae11a9a94c5d2be8a2cec94b3a0b883676d166eeacf2a/paddleseg-2.5.0-py3-none-any.whl (295 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 295.9/295.9 KB 1.0 MB/s eta 0:00:00
Requirement already satisfied: pyyaml>=5.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddleseg==2.5.0) (5.1.2)
Requirement already satisfied: scipy in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddleseg==2.5.0) (1.6.3)
Requirement already satisfied: visualdl>=2.0.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddleseg==2.5.0) (2.2.3)
Requirement already satisfied: tqdm in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddleseg==2.5.0) (4.27.0)
Requirement already satisfied: filelock in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddleseg==2.5.0) (3.0.12)
Requirement already satisfied: prettytable in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddleseg==2.5.0) (0.7.2)
Requirement already satisfied: sklearn in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddleseg==2.5.0) (0.0)
Requirement already satisfied: opencv-python in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddleseg==2.5.0) (4.1.1.26)
Requirement already satisfied: Pillow>=7.0.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl>=2.0.0->paddleseg==2.5.0) (8.2.0)
Requirement already satisfied: flask>=1.1.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl>=2.0.0->paddleseg==2.5.0) (1.1.1)
Requirement already satisfied: requests in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl>=2.0.0->paddleseg==2.5.0) (2.24.0)
Requirement already satisfied: protobuf>=3.11.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl>=2.0.0->paddleseg==2.5.0) (3.14.0)
Requirement already satisfied: six>=1.14.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl>=2.0.0->paddleseg==2.5.0) (1.16.0)
Requirement already satisfied: bce-python-sdk in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl>=2.0.0->paddleseg==2.5.0) (0.8.53)
Requirement already satisfied: pandas in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl>=2.0.0->paddleseg==2.5.0) (1.1.5)
Requirement already satisfied: flake8>=3.7.9 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl>=2.0.0->paddleseg==2.5.0) (4.0.1)
Requirement already satisfied: pre-commit in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl>=2.0.0->paddleseg==2.5.0) (1.21.0)
Requirement already satisfied: shellcheck-py in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl>=2.0.0->paddleseg==2.5.0) (0.7.1.1)
Requirement already satisfied: numpy in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl>=2.0.0->paddleseg==2.5.0) (1.19.5)
Requirement already satisfied: Flask-Babel>=1.0.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl>=2.0.0->paddleseg==2.5.0) (1.0.0)
Requirement already satisfied: matplotlib in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from visualdl>=2.0.0->paddleseg==2.5.0) (2.2.3)
Requirement already satisfied: scikit-learn in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from sklearn->paddleseg==2.5.0) (0.24.2)
Requirement already satisfied: importlib-metadata<4.3 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flake8>=3.7.9->visualdl>=2.0.0->paddleseg==2.5.0) (4.2.0)
Requirement already satisfied: pycodestyle<2.9.0,>=2.8.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flake8>=3.7.9->visualdl>=2.0.0->paddleseg==2.5.0) (2.8.0)
Requirement already satisfied: pyflakes<2.5.0,>=2.4.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flake8>=3.7.9->visualdl>=2.0.0->paddleseg==2.5.0) (2.4.0)
Requirement already satisfied: mccabe<0.7.0,>=0.6.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flake8>=3.7.9->visualdl>=2.0.0->paddleseg==2.5.0) (0.6.1)
Requirement already satisfied: itsdangerous>=0.24 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flask>=1.1.1->visualdl>=2.0.0->paddleseg==2.5.0) (1.1.0)
Requirement already satisfied: click>=5.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flask>=1.1.1->visualdl>=2.0.0->paddleseg==2.5.0) (7.0)
Requirement already satisfied: Jinja2>=2.10.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flask>=1.1.1->visualdl>=2.0.0->paddleseg==2.5.0) (3.0.0)
Requirement already satisfied: Werkzeug>=0.15 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from flask>=1.1.1->visualdl>=2.0.0->paddleseg==2.5.0) (0.16.0)
Requirement already satisfied: Babel>=2.3 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from Flask-Babel>=1.0.0->visualdl>=2.0.0->paddleseg==2.5.0) (2.8.0)
Requirement already satisfied: pytz in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from Flask-Babel>=1.0.0->visualdl>=2.0.0->paddleseg==2.5.0) (2019.3)
Requirement already satisfied: pycryptodome>=3.8.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from bce-python-sdk->visualdl>=2.0.0->paddleseg==2.5.0) (3.9.9)
Requirement already satisfied: future>=0.6.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from bce-python-sdk->visualdl>=2.0.0->paddleseg==2.5.0) (0.18.0)
Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->visualdl>=2.0.0->paddleseg==2.5.0) (2.8.2)
Requirement already satisfied: cycler>=0.10 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->visualdl>=2.0.0->paddleseg==2.5.0) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->visualdl>=2.0.0->paddleseg==2.5.0) (1.1.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from matplotlib->visualdl>=2.0.0->paddleseg==2.5.0) (3.0.8)
Requirement already satisfied: identify>=1.0.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pre-commit->visualdl>=2.0.0->paddleseg==2.5.0) (1.4.10)
Requirement already satisfied: aspy.yaml in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pre-commit->visualdl>=2.0.0->paddleseg==2.5.0) (1.3.0)
Requirement already satisfied: toml in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pre-commit->visualdl>=2.0.0->paddleseg==2.5.0) (0.10.0)
Requirement already satisfied: nodeenv>=0.11.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pre-commit->visualdl>=2.0.0->paddleseg==2.5.0) (1.3.4)
Requirement already satisfied: cfgv>=2.0.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pre-commit->visualdl>=2.0.0->paddleseg==2.5.0) (2.0.1)
Requirement already satisfied: virtualenv>=15.2 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from pre-commit->visualdl>=2.0.0->paddleseg==2.5.0) (16.7.9)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from requests->visualdl>=2.0.0->paddleseg==2.5.0) (1.25.6)
Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from requests->visualdl>=2.0.0->paddleseg==2.5.0) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from requests->visualdl>=2.0.0->paddleseg==2.5.0) (2019.9.11)
Requirement already satisfied: idna<3,>=2.5 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from requests->visualdl>=2.0.0->paddleseg==2.5.0) (2.8)
Requirement already satisfied: joblib>=0.11 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from scikit-learn->sklearn->paddleseg==2.5.0) (0.14.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from scikit-learn->sklearn->paddleseg==2.5.0) (2.1.0)
Requirement already satisfied: zipp>=0.5 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from importlib-metadata<4.3->flake8>=3.7.9->visualdl>=2.0.0->paddleseg==2.5.0) (3.8.0)
Requirement already satisfied: typing-extensions>=3.6.4 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from importlib-metadata<4.3->flake8>=3.7.9->visualdl>=2.0.0->paddleseg==2.5.0) (4.2.0)
Requirement already satisfied: MarkupSafe>=2.0.0rc2 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from Jinja2>=2.10.1->flask>=1.1.1->visualdl>=2.0.0->paddleseg==2.5.0) (2.0.1)
Requirement already satisfied: setuptools in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib->visualdl>=2.0.0->paddleseg==2.5.0) (56.2.0)
Installing collected packages: paddleseg
Successfully installed paddleseg-2.5.0
WARNING: You are using pip version 22.0.4; however, version 22.1.1 is available.
You should consider upgrading via the '/opt/conda/envs/python35-paddle120-env/bin/python -m pip install --upgrade pip' command.

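6. Training-Time Image Augmentation

During training, the augmentations are random horizontal flip, random pixel distortion, random rotation, random blur, and random scale change, followed by resize and normalize.

During inference and testing, only resize and normalize are applied.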

In [6]
# Create the transforms
import paddleseg.transforms as T
from paddleseg.datasets import OpticDiscSeg,Dataset

train_transforms = [
    T.RandomHorizontalFlip(),                                                              # horizontal flip
    T.RandomDistort(),                                                                     # random distortion
    T.RandomRotation(max_rotation = 10,im_padding_value =(0,0,0),label_padding_value = 0), # random rotation
    T.RandomBlur(),                                                                        # random blur
    T.RandomScaleAspect(min_scale = 0.8, aspect_ratio = 0.5),                              # random scale

    T.Resize(target_size=(512, 512)),
    T.Normalize()                                                                          # normalization; default mean [0.5, 0.5, 0.5], default std [0.5, 0.5, 0.5]
]

val_transforms = [
    T.Resize(target_size=(512, 512)),
    T.Normalize()
]

test_transforms = [
    T.Resize(target_size=(512, 512)),
    T.Normalize()
]
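With the default mean and std of 0.5, normalization maps 8-bit pixel values to roughly [-1, 1], which is why the preview step later multiplies by 0.5 and adds 0.5 before plotting. A minimal per-pixel sketch follows, assuming the transform first scales pixels to [0, 1] and then applies (x - mean) / std. The helper names are hypothetical.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Scale an 8-bit pixel value to [0, 1], then standardize it."""
    return (pixel / 255.0 - mean) / std


def denormalize(value, mean=0.5, std=0.5):
    """Invert normalize, returning a value in [0, 1] suitable for plotting."""
    return value * std + mean


for pixel in (0, 128, 255):
    z = normalize(pixel)
    print(pixel, round(z, 4), round(denormalize(z), 4))
```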

7. Building the Datasets (Dataset)

In [7]
# Create the datasets
dataset_root = '/home/aistudio/data'
train_path  = '/home/aistudio/train_list.txt'
val_path  = '/home/aistudio/val_list.txt'
# Build the training set
train_dataset = Dataset(  # Dataset is PaddleSeg's default data loader; it can be subclassed if needed, which is not necessary here
    dataset_root=dataset_root,
    train_path=train_path,
    transforms=train_transforms,
    num_classes=2,  # two classes: 0 and 1
    mode='train'
    )
# Build the validation set
val_dataset = Dataset(
    dataset_root=dataset_root,
    val_path=val_path,
    transforms=val_transforms,
    num_classes=2,
    mode='val'
    )

8. Previewing the Data

The first run may fail; if so, run the cell again.

In [9]
# Preview the data
import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize=(16,16))
for i in range(1,6,2):
    img, label = train_dataset[100]
    label = label * 255
    img = np.transpose(img, (1,2,0))  # CHW -> HWC for plotting
    img = img*0.5 + 0.5               # undo Normalize (mean 0.5, std 0.5)
    plt.subplot(3,2,i),plt.imshow(img,'gray'),plt.title('img'),plt.xticks([]),plt.yticks([])
    plt.subplot(3,2,i+1),plt.imshow(label,'gray'),plt.title('label'),plt.xticks([]),plt.yticks([])
    plt.show()
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/image.py:425: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  a_min = np.asscalar(a_min.astype(scaled_dtype))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/image.py:426: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  a_max = np.asscalar(a_max.astype(scaled_dtype))
<Figure size 1152x1152 with 2 Axes>
<Figure size 432x288 with 2 Axes>
<Figure size 432x288 with 2 Axes>

9. Initializing the Network

In [10]
from paddleseg.models import OCRNet, UNet
from paddleseg.models import HRNet_W48
backbone = HRNet_W48(pretrained="https://bj.bcebos.com/paddleseg/dygraph/hrnet_w48_ssld.tar.gz")
model = OCRNet(num_classes=2, backbone=backbone, backbone_indices=[0])
W0524 20:35:36.742081   165 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0524 20:35:36.745203   165 gpu_context.cc:306] device: 0, cuDNN Version: 8.2.
2025-05-24 20:35:41 [INFO]	Loading pretrained model from https://bj.bcebos.com/paddleseg/dygraph/hrnet_w48_ssld.tar.gz
Connecting to https://bj.bcebos.com/paddleseg/dygraph/hrnet_w48_ssld.tar.gz
Downloading hrnet_w48_ssld.tar.gz
[==================================================] 100.00%
Uncompress hrnet_w48_ssld.tar.gz
[==================================================] 100.00%
2025-05-24 20:36:03 [INFO]	There are 1525/1525 variables loaded into HRNet.

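10. Setting the Optimizer, Scheduler, and Loss

The optimizer is Paddle's built-in Momentum.

The scheduler is cosine annealing with T_max set to the total number of training iterations. (Note: the schedule formula tells you what value the learning rate ends at. If T_max is smaller than the number of iterations, the rate may also climb back up, i.e. restart. Setting T_max == max_iter mainly ensures that the learning rate has decayed to nearly 0 by the end of training.)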
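The note about T_max can be checked numerically. Below is a minimal sketch of the closed-form cosine annealing curve, assuming a minimum learning rate of 0. Paddle's CosineAnnealingDecay is documented with a recursive update, which is assumed here to trace the same curve. The function name cosine_lr is hypothetical.

```python
import math


def cosine_lr(step, base_lr=0.002, t_max=30000, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at step 0, eta_min at step t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))


# T_max equal to the total iteration count: the rate decays smoothly to 0.
for step in (0, 15000, 30000):
    print(step, f"{cosine_lr(step):.6f}")

# T_max shorter than the run: the rate climbs back to base_lr (a "restart").
print(20000, f"{cosine_lr(20000, t_max=10000):.6f}")
```

With T_max = 30000 the rate is 0.002 at the start, 0.001 halfway, and 0 at the end. With T_max = 10000 it is back at 0.002 by step 20000, which is the restart behaviour the note warns about.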

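OCRNet requires two losses. Here they are cross-entropy with online hard example mining (OHEM) and Dice loss, weighted 1 and 0.2, which places more emphasis on the hard-example cross-entropy.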

In [13]
from paddleseg.models.losses import CrossEntropyLoss,DiceLoss,LovaszHingeLoss, MixedLoss, OhemCrossEntropyLoss
import paddle

# Set the learning rate
base_lr = 0.002
# Arguments: initial learning rate, T_max (the upper bound on training steps); with verbose=True a message is printed on every update
lr = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=base_lr, T_max=30000, verbose=False)

# Set the optimizer (Momentum is used here)
# Arguments: learning rate, parameters to optimize, momentum factor, regularization (a float L2 coefficient or a regularizer)
optimizer = paddle.optimizer.Momentum(lr, parameters=model.parameters(), momentum=0.9, weight_decay=4.0e-5)

# Combine the losses (OHEM cross-entropy plus Dice)
losses = {}
losses['types'] = [OhemCrossEntropyLoss(), DiceLoss()]
losses['coef'] = [1, 0.2]
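To illustrate what the two loss terms measure, here is a simplified binary sketch in plain Python. It is not PaddleSeg's code: the threshold 0.7 and the small min_kept are illustrative values, and the pixel-selection rule is a simplification of OHEM. PaddleSeg also appears to pair each entry of losses['types'] with one of the model's outputs, whereas this sketch applies both terms to a single prediction for brevity.

```python
import math


def ohem_cross_entropy(probs, labels, thresh=0.7, min_kept=2):
    """Binary OHEM cross-entropy over a flat list of pixels.

    probs[i] is the predicted probability of class 1; labels[i] is 0 or 1.
    Returns (loss, indices of the pixels that were kept).
    """
    # Probability the model assigns to the correct class of each pixel.
    p_true = [p if y == 1 else 1.0 - p for p, y in zip(probs, labels)]
    # Raise the threshold if needed so that at least min_kept pixels survive.
    kth = sorted(p_true)[min(len(p_true), min_kept) - 1]
    threshold = max(thresh, kth)
    kept = [i for i, p in enumerate(p_true) if p <= threshold]
    loss = sum(-math.log(p_true[i]) for i in kept) / len(kept)
    return loss, kept


def dice_loss(probs, labels, eps=1e-7):
    """1 minus the Dice overlap between soft predictions and binary labels."""
    inter = sum(p * y for p, y in zip(probs, labels))
    return 1.0 - 2.0 * inter / (sum(probs) + sum(labels) + eps)


probs = [0.95, 0.90, 0.60, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 0, 0]
ohem, kept = ohem_cross_entropy(probs, labels)
plain_ce, _ = ohem_cross_entropy(probs, labels, thresh=1.0)  # keeps every pixel
total = 1 * ohem + 0.2 * dice_loss(probs, labels)            # same 1 : 0.2 weighting
print(kept, round(ohem, 4), round(plain_ce, 4), round(total, 4))
```

Only the two poorly predicted pixels (indices 2 and 3) contribute to the OHEM term, so it is larger than the plain cross-entropy averaged over all pixels.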

11. Starting Training

Training is started with PaddleSeg's built-in train function.

In [14]
from paddleseg.core import train

train(
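    model=model,                       # network model
    train_dataset=train_dataset,       # training dataset
    val_dataset=val_dataset,           # validation dataset
    optimizer=optimizer,               # optimizer
    save_dir='/home/aistudio/output',  # output directory
    iters=30000,                       # number of training iterations
    batch_size=16,                     # images per batch
    save_interval=3000,                # checkpoint interval, in iterations
    log_iters=100,                     # logging interval, in iterations
    num_workers=0,                     # number of worker processes for asynchronous data loading
    losses=losses,                     # loss configuration
    use_vdl=True)                      # enable VisualDL, PaddlePaddle's visualization tool for training curves, model structure, data samples, and high-dimensional data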
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/norm.py:654: UserWarning: When training, we now always track global mean and variance.
  "When training, we now always track global mean and variance.")
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:278: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.int64, the right dtype will convert to paddle.float32
  format(lhs_dtype, rhs_dtype, lhs_dtype))
2025-05-24 17:54:44 [INFO]	[TRAIN] epoch: 1, iter: 100/30000, loss: 0.7298, lr: 0.002000, batch_cost: 1.6588, reader_cost: 0.00992, ips: 9.6454 samples/sec | ETA 13:46:38

12. Inference and Testing

The prediction results are saved under /home/aistudio/output/results.

In [11]
from paddleseg.core import predict
transforms = T.Compose([
    T.Resize(target_size=(512, 512)),
    T.Normalize()
])
from paddleseg.models import OCRNet, UNet
from paddleseg.models import HRNet_W48
backbone = HRNet_W48(pretrained="https://bj.bcebos.com/paddleseg/dygraph/hrnet_w48_ssld.tar.gz")
model = OCRNet(num_classes=2, backbone=backbone, backbone_indices=[0])

# Build the list of images to predict
image_list = []
with open('/home/aistudio/val_list.txt' ,'r') as f:
    for line in f.readlines():
        image_list.append(line.split()[0])

predict(
        model,
        # path to the saved model weights
        model_path = '/home/aistudio/output/best_model/model.pdparams',
        transforms=transforms,
        image_list=image_list,
        save_dir='/home/aistudio/output/results',
    )
2025-05-24 20:36:08 [INFO]	Loading pretrained model from https://bj.bcebos.com/paddleseg/dygraph/hrnet_w48_ssld.tar.gz
2025-05-24 20:36:11 [INFO]	There are 1525/1525 variables loaded into HRNet.
2025-05-24 20:36:11 [INFO]	Loading pretrained model from /home/aistudio/output/best_model/model.pdparams
2025-05-24 20:36:12 [INFO]	There are 1583/1583 variables loaded into OCRNet.
2025-05-24 20:36:12 [INFO]	Start to predict...
279/279 [==============================] - 57s 204ms/step

13. Cleaning Up File Space

In [14]
! rm -rf train_datasets_document_detection_0411/
! rm -rf output/iter*
! rm -rf output/results
! rm -rf data/train* data/val*
! rm *.txt
rm: cannot remove '*.txt': No such file or directory
