DOI:
Operation optimization of active distribution network with battery swap stations based on dual-agent deep reinforcement learning
Xu Zhengbo
(Baise Xinlü Electric Power Co., Ltd., 百色新铝电力有限公司)
Received: 2025-02-09    Revised: 2025-05-14
Abstract: Large-scale integration of distributed photovoltaic (PV) generation into distribution networks causes node voltage violations, which in turn increase PV curtailment. To address this problem, a dual-agent deep reinforcement learning strategy based on grid-load interaction is proposed for the optimal operation of active distribution networks. On the distribution network side, a Double Deep Q-Network (DDQN) is used to train an agent that determines the action strategy of two types of regulating devices, capacitor banks and PV inverters, with the objectives of minimizing voltage deviation and PV curtailment. On the load side, considering the large regulating capacity of battery swap stations, a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent is trained on the station's historical battery swap data and operating revenue data, so that it learns the charging and discharging behavior that maximizes operating revenue. During training, the two agents exchange information through the charging and discharging power of the battery swap station, and each optimizes its own objective function. After training, the two agents are deployed in the distribution network dispatching center and the battery swap station control center, respectively, where they decide the regulating-device actions and the station's charging and discharging strategy online from real-time grid operating data. Finally, a 10 kV system is used as a case study to verify the effectiveness of the model.
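For readers unfamiliar with the two algorithms named in the abstract, the sketch below shows only their standard textbook learning targets, not the paper's implementation. It assumes the DDQN Q-values index discrete regulating-device settings and the TD3 action is a normalized charging/discharging power in [-1, 1]. All function names, parameter names, and default values are hypothetical.

```python
import random


def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target for one transition.

    The online network selects the next action; the target network
    evaluates it. Both arguments are lists of Q-values over the same
    discrete actions (assumed here: device settings).
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def td3_target(reward, next_action, q1_target, q2_target, gamma=0.99,
               done=False, noise_std=0.2, noise_clip=0.5,
               a_low=-1.0, a_high=1.0, rng=random):
    """TD3 target for one transition with a scalar action.

    next_action is the target actor's output for the next state
    (assumed here: normalized charging/discharging power).
    q1_target and q2_target are callables mapping an action to a Q-value.
    """
    if done:
        return reward
    # Target policy smoothing: add clipped Gaussian noise, then clip to bounds.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    smoothed = max(a_low, min(a_high, next_action + noise))
    # Clipped double-Q learning: take the smaller of the two critic estimates.
    return reward + gamma * min(q1_target(smoothed), q2_target(smoothed))
```

In both cases the target is deliberately conservative: Double DQN decouples action selection from evaluation, and TD3 takes the minimum of two critics, which limits the overestimation of action values.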
Article ID: 20250209001     CLC number:    Document code:
Funding:
Author    Affiliation    Postcode
Xu Zhengbo*    Baise Xinlü Electric Power Co., Ltd. (百色新铝电力有限公司)    533013
Citation: