Journal of Lanzhou University of Technology ›› 2021, Vol. 47 ›› Issue (1): 97-104.

• Automation Technology and Computer Technology •

Short-term load forecasting based on linear regression under MapReduce framework

WU Li-zhen, KONG Chun, CHEN Wei

  1. College of Electrical and Information Engineering, Lanzhou Univ. of Tech., Lanzhou 730050, China
  • Received: 2019-11-11 Online: 2021-02-28 Published: 2021-03-11
  • About the author: WU Li-zhen (born 1973), female, from Fuzhou, Fujian; Ph.D., professor.
  • Funding:
    National Natural Science Foundation of China (62063016); Gansu Province Basic Research Innovation Group Project (18JR3RA133)

Abstract: To address the slow computation and low prediction accuracy caused by the large volume and variety of data in load forecasting, a linear regression model based on mini-batch stochastic gradient descent is proposed under the MapReduce parallel programming framework. First, to clean the duplicate and bad data generated by intelligent distribution terminals, an adaptive sorted-neighborhood algorithm is used to remove duplicate records, and K-means clustering is used to eliminate abnormal data and incomplete records. The F-test is then used to check whether the data set can represent the load linearly, and the t-test is used to check the significance of the linear relationship between each feature vector and the load; feature vectors with a weak linear relationship to the load are removed. A short-term load forecasting model is built with these methods and applied to short-term load forecasting for a regional distribution network in Wuwei, Gansu Province. The results show that the mean absolute percentage error of the proposed model is 2.043% and the root mean square error is 3 112.62. These errors meet the requirements of load forecasting, and the model greatly increases the speed of load calculation and shortens the forecasting time.

Key words: big data analysis, mini-batch stochastic gradient descent, short-term load forecasting, distributed parallel computing, MapReduce framework
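
The abstract names the technique but gives no implementation details, so the following is only a minimal sketch of how mini-batch stochastic gradient descent for linear regression can be phrased in a map/reduce style. It is not the authors' code. Everything in it is an assumption made for illustration: the function names (`map_gradient`, `reduce_gradients`, `fit`), the learning rate, batch size and shard count, and the synthetic noise-free data. The map and reduce steps run in a single Python process here; on a real cluster each shard would be handled by a separate mapper. The two error measures reported in the abstract (mean absolute percentage error and root mean square error) are included as small helpers.

```python
import random
from functools import reduce


def map_gradient(shard, w, b):
    """Map step: partial gradient sums of the squared error over one shard."""
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
        for j, xj in enumerate(x):
            gw[j] += err * xj
        gb += err
    return gw, gb, len(shard)


def reduce_gradients(left, right):
    """Reduce step: add two partial results element-wise."""
    gw = [p + q for p, q in zip(left[0], right[0])]
    return gw, left[1] + right[1], left[2] + right[2]


def fit(data, n_features, lr=0.5, epochs=300, batch_size=8, n_shards=2, seed=0):
    """Mini-batch SGD; each batch is split into shards (hypothetical mappers)."""
    rng = random.Random(seed)
    data = list(data)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            shards = [batch[i::n_shards] for i in range(n_shards)]
            partials = [map_gradient(s, w, b) for s in shards if s]
            gw, gb, n = reduce(reduce_gradients, partials)
            # Step along the averaged gradient of 0.5 * mean squared error.
            w = [wi - lr * g / n for wi, g in zip(w, gw)]
            b -= lr * gb / n
    return w, b


def predict(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the load values."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5


# Synthetic demo data (assumed, not from the paper): load = 3*x1 - 2*x2 + 5.
rng = random.Random(42)
xs = [[rng.random(), rng.random()] for _ in range(40)]
ys = [3.0 * x1 - 2.0 * x2 + 5.0 for x1, x2 in xs]

w, b = fit(list(zip(xs, ys)), n_features=2)
preds = [predict(w, b, x) for x in xs]
print("weights:", w, "bias:", b)
print("MAPE (%):", mape(ys, preds), "RMSE:", rmse(ys, preds))
```

Because the partial gradients are plain sums, the reduce step is associative and the result does not depend on how a batch is split across shards; this is what makes the update suitable for a MapReduce-style job.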

CLC number: