Research on Parsing Technology for State Grid Bidding Documents Based on a Fusion Model of Bi-LSTM and Conditional Random Field (CRF)
Xu Shiyang (徐世阳)
(State Grid Chongqing Electric Power Company Materials Branch)
Submitted: 2024-03-10    Revised: 2024-04-06
Abstract: With the rapid growth of the State Grid Corporation of China's electronic bidding business, suppliers need to extract key bidding information relevant to their own business, such as the bidding project, bid opening time, and qualification requirements, promptly and accurately from a large volume of bidding documents. This paper studies a parsing technique suited to the characteristics of State Grid bidding documents that presents the data in structured, visual form, helping suppliers identify bidding opportunities and supporting business decisions. First, the bidding document text is processed through document structure analysis, table detection, and text error correction to obtain valid text data. The data are then fed into different parsing algorithm models, whose performance is judged against annotated data, yielding five suitable candidate algorithms. Next, using State Grid bidding document samples, the models are customized and tuned for further study and testing; by collecting training corpora and constructing test datasets, a State Grid bidding document parsing model is built that fuses bidirectional long short-term memory (Bi-LSTM), a conditional random field (CRF), and BERT. Finally, 823 State Grid bidding document samples are used to train the model and run comparative tests. The results show that the Bi-LSTM fusion model scores higher than the BERT+Bi-LSTM model on every metric. In addition, the CRF layer can add constraints that keep the final predictions valid, and these constraints are learned automatically from the training data, so the model parses bidding documents more accurately and effectively.
Keywords: State Grid bidding; Bi-LSTM; CRF; document structure analysis; text analysis
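The abstract's point that a CRF layer keeps the final prediction valid can be illustrated with a minimal sketch of Viterbi decoding over per-token scores. Everything in it is an assumption made for illustration and is not taken from the paper: the BIO tag set for a bid opening time field, the score values, and the function names. The transition scores are set by hand here, whereas in a trained CRF they would be learned from the annotated data.

```python
# Hypothetical BIO tag set for one field (bid opening time); not from the paper.
TAGS = ["O", "B-TIME", "I-TIME"]

# Illustrative per-token scores, standing in for Bi-LSTM outputs (one row per token).
EMISSIONS = [
    [2.0, 0.5, 0.1],
    [0.2, 1.0, 1.5],
    [0.1, 0.3, 2.0],
]

# Illustrative CRF transition scores, TRANSITIONS[i][j] = score of tag i -> tag j.
# The large negative O -> I-TIME score mimics a learned constraint that an
# entity cannot continue before it has begun.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 1.0],    # from B-TIME
    [0.0, 0.0, 0.5],    # from I-TIME
]

# Illustrative start scores: a sequence should not open with I-TIME.
START = [0.0, 0.0, -10.0]


def greedy_decode(emissions):
    """Pick the best tag for each token independently (no CRF layer)."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in emissions]


def viterbi_decode(emissions, transitions, start):
    """Return the tag-index path with the highest total score."""
    n_tags = len(start)
    scores = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j on this token.
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(range(n_tags), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


greedy_tags = [TAGS[i] for i in greedy_decode(EMISSIONS)]
crf_tags = [TAGS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS, START)]
print(greedy_tags)  # ['O', 'I-TIME', 'I-TIME']
print(crf_tags)     # ['O', 'B-TIME', 'I-TIME']
```

With these scores, choosing each token's tag independently gives a sequence in which I-TIME follows O, which is invalid under the BIO scheme. Decoding with the transition scores instead returns a well-formed sequence.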
Article ID: 20240310001