Power Systems and Big Data (电力大数据), 2025, 28(01):-
Table structure recognition in documents based on deep learning
Jin Sha (金莎)1, Li Linhan (李林汉)1, Xie Hailong (谢海龙)1, Huang Xiaohong (黄晓宏)1, Dong Qianqian (董前前)2
(1. PowerChina Jiangxi Electric Power Construction Co., Ltd.; 2. Xi'an Polytechnic University)
Received: 2024-10-15    Revised: 2024-12-24
Abstract: Large models are increasingly applied in new-type power systems, where they play a major role in processing and analyzing large volumes of power data and in improving the efficiency of grid operations. To normalize the tables contained in model training data, this paper proposes a deep learning method that combines a Spatial CNN and a Grid CNN to recognize and reconstruct table structure. The Spatial CNN predicts a mask of split (separator) lines, and a connected component analysis (CCA) algorithm extracts the split lines from that mask to build a grid of cells. A Grid CNN cell-merging module then merges adjacent cells to correct splitting errors and form an accurate table structure. To improve model efficiency, a feature pyramid network (FPN) based on ResNet-18 serves as the backbone, integrating multi-scale feature information to strengthen table structure recognition. The method improves recognition accuracy and also streamlines the reconstruction process, providing an effective technical approach for automated table processing. Experimental results show that, compared with existing high-performing methods, the proposed method improves precision and F1 on the SciTSR dataset by 1.1 and 0.3 percentage points, respectively; on the more complex PubTabNet dataset, it reaches a TEDS-Struct score of 97.1%. The work offers an efficient and accurate solution for preprocessing tables in power-industry documents and for preparing training data for large language models.
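The split-line extraction step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the Spatial CNN has already produced two binary masks (one for row separators, one for column separators), and the function names `connected_components`, `separator_positions`, and `build_grid` are hypothetical. The Grid CNN merging stage is not covered.

```python
from collections import deque


def connected_components(mask):
    """Return the 4-connected foreground regions of a binary mask.

    `mask` is a list of rows containing 0/1 values; each region is
    returned as a list of (row, col) pixel coordinates.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def separator_positions(mask, axis):
    """Collapse each component to one coordinate.

    axis=0 gives the mean row (horizontal separators);
    axis=1 gives the mean column (vertical separators).
    """
    positions = [
        round(sum(p[axis] for p in comp) / len(comp))
        for comp in connected_components(mask)
    ]
    return sorted(positions)


def build_grid(row_seps, col_seps, height, width):
    """Return cell boxes (top, left, bottom, right) between consecutive separators."""
    ys = [0] + list(row_seps) + [height]
    xs = [0] + list(col_seps) + [width]
    return [
        [(ys[i], xs[j], ys[i + 1], xs[j + 1]) for j in range(len(xs) - 1)]
        for i in range(len(ys) - 1)
    ]


# Toy 6x8 "table image": one horizontal separator at row 3,
# two vertical separators at columns 2 and 5.
row_mask = [[1 if y == 3 else 0 for x in range(8)] for y in range(6)]
col_mask = [[1 if x in (2, 5) else 0 for x in range(8)] for y in range(6)]

rows = separator_positions(row_mask, axis=0)
cols = separator_positions(col_mask, axis=1)
grid = build_grid(rows, cols, height=6, width=8)
```

On the toy masks this yields a grid of 2 rows by 3 columns. In a pipeline like the one the abstract describes, a later merging stage would decide which neighbouring cells of such a grid belong to one spanning cell.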
Article ID:     CLC number: TP311    Document code:
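Funding: Project of Power Construction Corporation of China, Ltd.; Key Scientific Research Program of the Shaanxi Provincial Department of Education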