Submitted: 2022-05-28 Revised: 2023-04-25
Abstract: To match addresses in the power address database against those in external address databases, thereby keeping power addresses accurate and enabling data sharing between the power system and external systems, this paper proposes an address matching model based on a "Retriever-Discriminator" architecture. The paper first describes the structure of the model, which comprises an address retriever that narrows the search scope and an address discriminator that makes the final decision on whether a candidate address is a correct match. The address retriever is built on the term frequency-inverse document frequency (TF-IDF) algorithm, and the address discriminator is built on NEZHA, a Chinese pre-trained language model. A negative-sample training method is proposed to improve the discriminator's performance. The two datasets used in the experiments are described in detail. Experimental results show that the model accurately finds, in the external address database, the address that matches a given power address; in particular, the address discriminator identifies the correct match among multiple candidate addresses with an F1 score above 0.99.
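The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the character-bigram tokenization, the smoothed IDF formula, the class and function names, the `k` and `threshold` parameters, and the sample addresses are all assumptions made for illustration. In particular, `overlap_discriminator` is a trivial stand-in (Jaccard overlap) for the NEZHA-based discriminator, which would be a fine-tuned model scoring each (query, candidate) pair.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Split an address string into overlapping character n-grams."""
    text = text.replace(" ", "")
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class TfidfRetriever:
    """Stage 1: narrow the external address library to k candidates."""

    def __init__(self, addresses, n=2):
        self.addresses = list(addresses)
        self.n = n
        docs = [Counter(char_ngrams(a, n)) for a in self.addresses]
        doc_freq = Counter()
        for doc in docs:
            doc_freq.update(doc.keys())
        total = len(docs)
        # Smoothed inverse document frequency (an assumed variant).
        self.idf = {t: math.log((1 + total) / (1 + c)) + 1.0
                    for t, c in doc_freq.items()}
        self.vectors = [self._weight(doc) for doc in docs]

    def _weight(self, counts):
        """Return an L2-normalized TF-IDF vector as a sparse dict."""
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def retrieve(self, query, k=3):
        """Return the k addresses with the highest cosine similarity."""
        q = self._weight(Counter(char_ngrams(query, self.n)))
        scored = [(sum(w * vec.get(t, 0.0) for t, w in q.items()), addr)
                  for addr, vec in zip(self.addresses, self.vectors)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [addr for _, addr in scored[:k]]


def overlap_discriminator(query, candidate):
    """Placeholder for the NEZHA-based discriminator: Jaccard overlap."""
    a, b = set(char_ngrams(query)), set(char_ngrams(candidate))
    return len(a & b) / len(a | b) if (a | b) else 0.0


def match_address(query, retriever, discriminator, k=3, threshold=0.5):
    """Stage 2: score each candidate and keep the best one above threshold."""
    best, best_score = None, threshold
    for candidate in retriever.retrieve(query, k):
        score = discriminator(query, candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best


# Made-up external address library, for illustration only.
external = [
    "杭州市西湖区文三路100号",
    "杭州市西湖区文二路100号",
    "杭州市上城区解放路5号",
    "宁波市海曙区中山路8号",
]
retriever = TfidfRetriever(external)
result = match_address("西湖区文三路100号", retriever, overlap_discriminator)
```

Separating the stages this way means the expensive pairwise model only has to score `k` candidates per query rather than the whole external library; returning `None` when no candidate clears the threshold lets the caller treat "no match" as a distinct outcome.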
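Author | Affiliation | Email |
Zhao Jianpeng (赵坚鹏)* | State Grid Hangzhou Power Supply Company | zhaojianpeng0830@163.com |
Sheng Fang (盛方) | State Grid Hangzhou Power Supply Company | |
Xu Chuanzi (徐川子) | State Grid Hangzhou Power Supply Company | |
Chen Yi (陈奕) | State Grid Hangzhou Power Supply Company | |
Luo Qing (罗庆) | State Grid Hangzhou Power Supply Company | |
Chen Cong (陈聪) | State Grid Hangzhou Power Supply Company | |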