电力大数据:2019,22(01):-
基于隐马尔科夫和主成分分析的电网数据词典构建
(国网北京市电力公司电力科学研究院)
The Construction of Grid Data Dictionary Based on HMM and PCA
(State Grid Beijing Electric Power Research Institute, Beijing 100075, China)
Received: 2018-07-02    Revised: 2018-07-22
Abstract: Power grid enterprises hold a large volume of unstructured text recorded in Chinese, which contains a great deal of important reliability statistics. Mining this text manually is inefficient, and its accuracy varies from person to person. How to mine the important reliability statistics in equipment defect texts efficiently, accurately, and intelligently is therefore a pressing problem. In this work, a modified Hidden Markov Model (HMM) algorithm is used to split the unstructured text data collected during whole-process technical supervision into sentences and words, and rules are formulated for expressing the unstructured data in structured form. Natural Language Processing (NLP) algorithms, including Principal Component Analysis (PCA), word vectors, and deep neural networks, are used to compute the semantic similarity of identical terms, synonyms, and near-synonyms in the existing problem-description texts, and the K-Nearest Neighbor algorithm is used to classify and cluster the word vectors after dimensionality reduction. This work addresses the difficulty of dividing defect-text sentences into their components and of extracting numerical quantities precisely, and it produces a data dictionary for the operation and maintenance domain of the State Grid system. It provides a new technique for mining unstructured data in the power grid field and is significant for future technical supervision work.
Keywords: text classification; word segmentation; Hidden Markov Model; technical supervision
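The dimensionality-reduction and similarity steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy "word vectors", the English word labels, and the helper names `pca_reduce`, `cosine_similarity_matrix`, and `nearest_neighbours` are all assumptions made for illustration, and the neighbour lookup is a plain unsupervised nearest-neighbour search rather than the paper's classifier.

```python
import numpy as np

def pca_reduce(vectors, n_components):
    """Project row vectors onto their top principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def cosine_similarity_matrix(points):
    """Pairwise cosine similarity between row vectors."""
    norms = np.linalg.norm(points, axis=1, keepdims=True)
    unit = points / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def nearest_neighbours(points, k):
    """Indices of the k most similar other rows, for every row."""
    sims = cosine_similarity_matrix(points)
    np.fill_diagonal(sims, -np.inf)  # a word is not its own neighbour
    return np.argsort(-sims, axis=1)[:, :k]

# Hypothetical toy data: two groups of 6-dimensional vectors standing in
# for embeddings of near-synonymous defect terms.
words = ["leak", "seepage", "oil_leak", "overheat", "hot_spot", "temperature_rise"]
rng = np.random.default_rng(0)
group_a = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
group_b = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])
vectors = np.vstack(
    [group_a + 0.05 * rng.standard_normal(6) for _ in range(3)]
    + [group_b + 0.05 * rng.standard_normal(6) for _ in range(3)]
)

reduced = pca_reduce(vectors, n_components=2)
neighbours = nearest_neighbours(reduced, k=2)

for word, idx in zip(words, neighbours):
    print(word, "->", [words[j] for j in idx])
```

In this toy setting each term's two nearest neighbours fall within its own group, which is the behaviour a synonym dictionary would rely on; real word vectors would come from a trained embedding model rather than synthetic noise.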