Short Term Load Forecasting Based On Feature Extraction and Improved General Regression Neural Network Model
Energy
journal homepage: www.elsevier.com/locate/energy
Article info

Article history:
Received 27 May 2018
Received in revised form 16 October 2018
Accepted 20 October 2018
Available online 23 October 2018

Keywords:
Short term load forecasting (STLF)
Empirical mode decomposition (EMD)
Minimal redundancy maximal relevance (mRMR)
General regression neural network (GRNN)
Fruit fly optimization algorithm (FOA)

Abstract

Along with the deregulation of the electric power market and the aggregation of renewable resources, short term load forecasting (STLF) has become increasingly important. However, it is a hard task because various influential factors lead to volatility and instability of the series. Therefore, this paper proposes a hybrid model which combines empirical mode decomposition (EMD), minimal redundancy maximal relevance (mRMR), and general regression neural network (GRNN) with fruit fly optimization algorithm (FOA), namely EMD-mRMR-FOA-GRNN. The original load series is first decomposed into a number of intrinsic mode functions (IMFs) and a residue with different frequencies so as to weaken the volatility of the series influenced by complicated factors. Then, mRMR is employed to obtain the best feature set through correlation analysis between each IMF and the features, including day types, temperature, meteorological conditions and so on. Finally, FOA is utilized to optimize the smoothing factor in GRNN. The ultimate forecasted load can be derived from the summation of the predicted results for all IMFs. To validate the proposed technique, load data from Langfang, China are used. The results demonstrate that EMD-mRMR-FOA-GRNN is a promising approach for STLF.

© 2018 Elsevier Ltd. All rights reserved.
https://doi.org/10.1016/j.energy.2018.10.119
0360-5442/© 2018 Elsevier Ltd. All rights reserved.
654 Y. Liang et al. / Energy 166 (2019) 653e663
Additionally, the selection or expression of input in regression analysis has influence on the diversity and testability, which limits its application in STLF [15]. The gray analysis has a good fitting effect on smooth discrete series, thus this approach is not suitable for STLF due to the discrete data [16].

At this stage, with the proposal and spread of artificial intelligence algorithms, scholars have gradually applied these models to load forecasting, such as artificial neural networks (ANNs) [17,18,20] and support vector machine (SVM) [17,19,20]. For example, in Ref. [17], the authors reveal the effect of data integrity attacks on the accuracy of four representative load forecasting models (multiple linear regression, support vector regression, artificial neural networks, and fuzzy interaction regression). Then, the four aforementioned load forecasting models are used to generate one-year-ahead ex post point forecasts in order to provide a comparison of their forecast errors. The results show that the support vector regression model is the most robust, followed closely by the multiple linear regression model, while the fuzzy interaction regression model is the least robust of the four. In Ref. [20], the authors build a power consumption forecasting model using various machine learning algorithms. They propose two electric load forecasting models using artificial neural network and support vector regression. The experimental results show that the two forecasting models can achieve an average error rate of 3.46–10% for all clusters. This kind of approach improves forecasting accuracy through simulation of the human brain mechanism with self-learning and self-optimizing capability [21]. Back propagation neural network (BPNN) is a typical representative of ANNs. Reference [22] introduced BPNN into STLF with the consideration of weather factors. Reference [23] integrated improved differential evolution with wavelet neural network to forecast the load. However, the drawbacks of slow convergence and easily falling into a local optimum limit their application [24]. Therefore, SVM is utilized to avoid network structure selection and local optimization in load forecasting. From References [25] and [26], it can be seen that the prediction precision of SVM increases, but it has difficulty coping with large-scale training samples and multi-classification problems.

Generalized Regression Neural Network (GRNN) is a highly parallel radial basis function network based on a one-pass algorithm, which can approximate the implied mapping relationship based on samples [27]. Even if the samples are scarce, the output results of GRNN are able to converge to the optimal regression. This algorithm has been applied to many forecasting fields owing to its reduced risk of being trapped in a local optimum and its improved learning rate and generalization ability [28–33]. For example, in Ref. [28], the authors apply GRNN in wind speed forecasting. The mean absolute percentage errors of the forecasting results in two cases are 8.95% and 9.87%, respectively, suggesting that the proposed approach outperforms the compared models. In Ref. [29], the authors propose that GRNN can also be used in electrical power system forecasting. In view of the effect of the smoothing factor on GRNN performance, fruit fly optimization algorithm (FOA) is employed in this paper to determine its value. FOA is a novel global optimization technique based on foraging behaviors [34]. The fruit fly is superior to other species especially in terms of olfaction and vision. The olfactory organs can collect all kinds of odor floating in the air, even from food sources 40 km away. Then the sharp vision can be effectively utilized to find the location of food as well as companions and fly in that direction. It has been shown that this approach presents excellent performance and high efficiency in solving complex optimization problems [35–37], especially in optimizing the smoothing factor of GRNN [28,35,38–40].

Multiple factors affect power load, such as temperature, meteorology and day types. For instance, the accumulation effect of temperature in summer on load is obviously significant [41]. Meanwhile, the large-scale connection of electric vehicles and renewable distributed power, as well as the implementation of demand side management based on electricity price and incentives, intensify the load forecasting complexity. Thus it is hard for the existing single intelligent algorithms to obtain satisfactory forecasting results [42]. In order to deal with these challenges, the combination of data processing methods and prediction approaches has been widely applied to STLF. The hybrid prediction techniques are primarily divided into two categories. One is the weighted integration of the forecasting results derived from multiple single models. Nie et al. [43] studied STLF based on autoregressive integrated moving average (ARIMA) and SVM. Niu et al. [44] applied a BPNN-SVM-KELM model to prediction on the foundation of variance-covariance weight dynamic allocation. This type of approach gives full play to the advantages of each single model, but the accuracy is restricted because the complexity of the original data itself is not dealt with. The other is the combination of data pre-processing technology and an intelligent algorithm. Here, data pre-processing methods contain two aspects, namely the decomposition of the original load series and the extraction of correlated factors.

The decomposition methods of original load series commonly used at home and abroad consist of wavelet transform (WT) and empirical mode decomposition (EMD). It is notable that WT has the advantage of localization [45] but it is a difficult task to rationalize the wavelet basis and decomposition scale [46]. EMD can effectively decompose nonstationary load series with adaptive ability to further improve prediction precision, because there is no need to select a wavelet basis [47–49]. In terms of extraction of modal associated factors after decomposition, Reference [50] executed conditional mutual information to extract features. The first 50 characteristics were selected as the best feature set by sorting the candidate features. In Reference [51], the input was studied through feature extraction from temperature and historical load based on mutual information. The results demonstrated that the processing could improve the prediction precision. Reference [52] adopted phase space reconstruction to establish the original feature set and analyzed the correlation between load and these characteristics to complete optimal feature selection. However, these techniques merely take the relevance into account and ignore the redundancy among factors. In Reference [53], minimal redundancy maximal relevance (mRMR) applied to pattern recognition feature selection was proposed, which not only considered the correlation between characteristics and target variables, but also obtained the redundant information among features. Due to its advantages, mRMR has been applied in various fields, such as wind speed forecasting [54], epigenetic biomarker identification [55] and feature selection of transient stability assessment [56].

This paper proposes an integrated model that combines EMD, mRMR and FOA with GRNN. EMD is first exploited to decompose the original load series into regular IMFs and a residue. mRMR is utilized to derive the best feature set through the correlation analysis between each IMF and the features. Then, GRNN optimized by FOA is treated as the forecasting tool. EMD-mRMR-FOA-GRNN can improve the prediction accuracy in load forecasting as a result of input reduction and full consideration of external sensitive factors.

The rest of the paper is organized as follows: Section 2 provides a brief description of EMD, mRMR, GRNN and FOA and establishes a complete prediction framework; Section 3 verifies the precision and stability of the developed model through a practical case; Section 4 makes a further validation and the paper is concluded in Section 5.
2. Methodology

2.1. EMD

The load can be regarded as a time series composed of a set of intrinsic mode functions (IMFs) [47]. According to the definition of IMF, there exists only one vibration mode in each cycle and no other complicated singularities. However, the raw data may contain multiple vibration modes at any time. Thus, the application of EMD to decompose load series is grounded on two assumptions: (1) the complex signal to be decomposed is made up of IMFs; (2) the IMFs are independent of each other. The specific steps of EMD are summarized as follows:

(1) Identify the local maxima and minima in the original time series $x(t)$ and employ a cubic spline function to fit the upper envelope $e_{up}(t)$ and lower envelope $e_{low}(t)$.

(2) Calculate the mean value $m_1(t)$ of the two envelopes $e_{up}(t)$ and $e_{low}(t)$:

$$m_1(t) = \frac{e_{up}(t) + e_{low}(t)}{2} \tag{1}$$

(3) Calculate the difference $h_1(t)$ between the original time series $x(t)$ and $m_1(t)$:

$$h_1(t) = x(t) - m_1(t) \tag{2}$$

(4) If $h_1(t)$ conforms to the conditions of IMFs, it can be treated as the first IMF, which consists of the shortest periodic component in the original signal. Otherwise, $h_1(t)$ is regarded as the original load series and Step (1) to Step (3) are repeated until the difference $h_1^{k}(t)$ at the k-th iteration meets the conditions of an IMF:

$$imf_1(t) = h_1^{k}(t) \tag{3}$$

The standard deviation (SD) is introduced to determine whether the sifting process can be stopped, that is, to judge whether $h_1^{k}(t)$ is an IMF:

$$SD = \frac{\sum_{t=0}^{T} \left| h_1^{k-1}(t) - h_1^{k}(t) \right|^2}{\sum_{t=0}^{T} \left[ h_1^{k}(t) \right]^2} \tag{4}$$

where $h_1^{k-1}(t) - h_1^{k}(t)$ is the mean value of the upper and lower envelopes of $h_1^{k-1}(t)$; the stopping threshold for SD is set between 0.2 and 0.3 [47].

(5) The residue $r_1(t)$ can be obtained when the first IMF $imf_1(t)$ is separated from the original load series $x(t)$:

$$r_1(t) = x(t) - imf_1(t) \tag{5}$$

(6) The remaining IMFs are separated from the successive residues in the same way:

$$\begin{cases} r_1(t) - imf_2(t) = r_2(t) \\ r_2(t) - imf_3(t) = r_3(t) \\ \quad \vdots \\ r_{N-1}(t) - imf_N(t) = r_N(t) \end{cases} \tag{6}$$

where $r_N(t)$ is a monotone function and the number of modes $N$ depends on the original load series.

(7) The original signal $x(t)$ is reconstructed as described in Eq. (7):

$$x(t) = \sum_{n=1}^{N} imf_n(t) + r_N(t) \tag{7}$$

According to Step (1) to Step (7), the original load series is decomposed into sub-series of diverse frequency, namely IMFs and a residue $r$. Then, feature correlation analysis is implemented on the sub-series to find the influential factors that affect the frequency.

2.2. mRMR

mRMR applies mutual information to measure the dependence between two variables, which not only takes the correlation between characteristics and target variables into account, but also obtains the redundant information among features [53].

2.2.1. Maximum relevance

In view of mRMR, maximum relevance can be expressed as the mean value of mutual information between feature $x_i$ and the target variable $y$:

$$\max D, \quad D = \frac{1}{|J|} \sum_{x_i \in J} I(x_i; y) \tag{8}$$

where $x_i$ represents the influential factors of the components; $y$ is the component; $J$ is the feature set of $x_i$ including day types (workdays and weekends, hours, holidays) and weather (temperature); $|J|$ equals the number of features in $J$; $D$ means the average value of mutual information $I(x_i; y)$ between feature $x_i$ and the target variable $y$ in $J$.

Mutual information measures the information shared by two or more random variables. In load forecasting, the mutual information method is exploited to capture the linear and nonlinear dependence between the input and the target variables. If they are independent, the value of mutual information equals 0; otherwise it is positive and grows with the strength of the relationship. $I(x_i; y)$ can be calculated as:

$$I(x_i; y) = \iint p(x_i, y) \log \frac{p(x_i, y)}{p(x_i)\, p(y)} \, dx_i \, dy \tag{9}$$

where $p(x_i)$ and $p(y)$ represent the marginal probability densities of the random variables $x_i$ and $y$, respectively; $p(x_i, y)$ equals their joint probability density. The value of mutual information $I(x_i; y)$ increases with the correlation between $x_i$ and $y$. If the two variables are independent of each other, $I(x_i; y)$ equals 0, implying that there exists no interdependence.
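As a rough illustration of how mutual information can drive feature ranking, the following sketch estimates $I(x; y)$ from samples with an equal-width histogram and then ranks features greedily by relevance minus mean redundancy, in the spirit of mRMR. The binning scheme, the number of bins and the function names (`discretize`, `mutual_information`, `mrmr_rank`) are assumptions made for this example; the paper does not state which estimator it uses.

```python
import math
from collections import Counter


def discretize(values, bins=8):
    """Map values to equal-width bin indices (an assumed, simple estimator)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=8):
    """Plug-in estimate of I(x; y) in nats from a joint histogram."""
    xb, yb = discretize(x, bins), discretize(y, bins)
    n = len(xb)
    joint = Counter(zip(xb, yb))
    px, py = Counter(xb), Counter(yb)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def mrmr_rank(features, target, bins=8):
    """Greedy ranking: at each step pick the feature whose relevance to the
    target minus its mean redundancy with the already selected features is
    largest. `features` maps a (hypothetical) feature name to its samples."""
    relevance = {k: mutual_information(v, target, bins) for k, v in features.items()}
    remaining = set(features)
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s], bins) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining:
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

A histogram estimate is biased for small samples, so in practice the bin count would need to be chosen with the sample size in mind.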
Minimum redundancy means the minimal dependence between $x_i$ and $x_j$, as expressed in Eq. (10):

$$\min R, \quad R = \frac{1}{|J|^2} \sum_{x_i, x_j \in J} I(x_i; x_j) \tag{10}$$

where $R$ measures the dependency among the features.

Eqs. (8) and (10) are integrated into the mRMR function, as described in Eq. (11):

$$\max \varphi(D, R), \quad \varphi = D - R \tag{11}$$

The aim of the mRMR technique is to select the features that maximize the relevance and minimize the redundancy simultaneously, thus incremental search can be used here [53]. Suppose the feature set $J_{n-1}$ consists of $n-1$ features extracted from $F_m$; the n-th feature selected from $\{F_m - J_{n-1}\}$ on the basis of incremental search is given by:

$$\text{mRMR}: \max_{x_j \in F_m - J_{n-1}} \left[ I(x_j; y) - \frac{1}{n-1} \sum_{x_i \in J_{n-1}} I(x_j; x_i) \right] \tag{12}$$

According to Eq. (12), the features with the maximum value of mRMR are searched from the remaining characteristics in $F_m$ successively, which constitute the candidate feature set $J$.

Because the values of mRMR differ for the $x_i$ in $J$, inputs with insufficient information, corresponding to a smaller mRMR, will reduce accuracy. Hence, it is imperative to select suitable characteristics from $J$ as the optimal feature set. Additionally, the mean ratio of the absolute error of each component to the actual load is taken as the basis to evaluate the effect of the quantity of input characteristics on the prediction performance. For instance, load forecasting in $n$ previous days can be implemented as:

$$E = \frac{1}{n} \sum_{t=1}^{n} \frac{\left| imf'(t) - imf(t) \right|}{y(t)} \times 100\% \tag{13}$$

where $y(t)$ is the actual load at time period $t$; $imf'(t)$ and $imf(t)$ represent the forecasted and actual value of each component at time period $t$, respectively. The value of $imf'(t)$ depends on the prediction approach.

2.3. GRNN

GRNN is a highly parallel radial basis function network based on a one-pass algorithm [27]. The diagram of GRNN is illustrated in Fig. 1. The number of neurons in the input layer is equal to the dimension $m$ of the input vector in the learning samples. Each neuron is a simple distribution unit which directly transfers the input to the hidden layer [28].

The number of neurons in the pattern layer equals the quantity of learning samples $n$. Each neuron corresponds to a different learning sample. The transfer function of the i-th neuron in the pattern layer can be described as:

$$p_i = \exp\left[ -\frac{(X - X_i)^T (X - X_i)}{2\sigma^2} \right], \quad i = 1, 2, \ldots, n \tag{14}$$

where $X$ and $X_i$ represent the input and the corresponding learning sample of the i-th neuron, respectively; $\sigma$ is the smoothing factor, which pertains to the width coefficient in the Gaussian function.

There are two types of neurons in the summation layer, one of which calculates the arithmetical sum of the output in the pattern layer. The weights between this neuron and the ones in the pattern layer all equal 1. The transfer function is shown in Eq. (15):

$$S_A = \sum_{i=1}^{n} p_i \tag{15}$$

The other neurons calculate the weighted sum of the output in the pattern layer. $y_{ij}$ represents the j-th element in the i-th output $Y_i$, namely the weight between the i-th neuron in the pattern layer and the j-th neuron in the summation layer. The transfer function in the summation layer can be expressed as:

$$S_{Nj} = \sum_{i=1}^{n} y_{ij}\, p_i, \quad j = 1, 2, \ldots, k \tag{16}$$

The number of neurons in the output layer is equivalent to the dimension $k$ of the output vector in the learning samples. $\hat{Y}(X)$ is derived by dividing the outputs of the two types of neurons in the summation layer, that is:

$$y_j = \frac{S_{Nj}}{S_A}, \quad j = 1, 2, \ldots, k \tag{17}$$

When the smoothing factor $\sigma$ is large, $\hat{Y}(X)$ approximately equals the mean of all dependent variables in the sample. However, if $\sigma$ tends to 0, $\hat{Y}(X)$ approaches the training sample. When the forecasted point belongs to the training sample, the predicted values of the dependent variables will be very close to the corresponding ones in the sample. Otherwise, the poor generalization ability of GRNN will limit its prediction performance. If $\sigma$ is moderate, all the dependent variables are taken into account, and the dependent variables of samples closer to the predicted point are endowed with larger weights. As we can see, the performance of GRNN greatly depends on $\sigma$. In this paper, FOA is introduced to determine the optimal value of this parameter.
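The layer equations of the GRNN reduce to a kernel-weighted average of the training targets, which the following minimal sketch illustrates for a scalar output ($k = 1$). The function name `grnn_predict` and the toy data in the usage note are illustrative assumptions, not the authors' implementation. Subtracting the smallest squared distance before exponentiating is a numerical safeguard added here; it cancels in the ratio $S_N / S_A$ and so leaves the output unchanged.

```python
import math


def grnn_predict(x, train_X, train_Y, sigma):
    """GRNN output for one input vector x with scalar targets.

    Pattern layer: Gaussian kernel of the squared distance to each sample.
    Summation layer: S_A (plain sum) and S_N (target-weighted sum).
    Output layer: S_N / S_A.
    """
    d2 = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in train_X]
    d2_min = min(d2)  # shift keeps at least one weight equal to 1 for tiny sigma
    p = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
    s_a = sum(p)
    s_n = sum(w * y for w, y in zip(p, train_Y))
    return s_n / s_a
```

With three samples at 0, 1 and 2 and targets 0, 10 and 20, a very large `sigma` returns roughly the target mean (10), while a very small `sigma` returns the target of the nearest sample, matching the limiting behaviour described above.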
[Fig. 1: diagram of GRNN with input layer, pattern layer, summation layer and output layer.]

2.4. FOA

$$Smellbest = bestSmell \tag{25}$$

$$X\_axis = X(bestIndex) \tag{26}$$

$$Y\_axis = Y(bestIndex) \tag{27}$$

(7) Implement iteration optimization. Repeat Step (2) to Step (5) and judge whether the smell concentration is better than the previous one. If it is, go to Step (6).

2.5. STLF model based on EMD-mRMR-FOA-GRNN

The 24 h STLF model combining EMD, mRMR, FOA, and GRNN is

[Flowchart fragment: Start; imf1(t), imf2(t), ..., imfn(t), r; set Wn = imfn, Wn+1 = r, i = 1.]

3. Case study

Fig. 4. Original load series.

3.1. Load series decomposition with EMD

The hourly load data from August 1, 2017 to October 31, 2017 are collected from the power system in Langfang, China, 2208 records in total. Fig. 4 illustrates that the original load series ranges from around 764.69 MW to 1030.9 MW with no apparent regularity. The original load series is decomposed into eight IMFs and one residue, as presented in Fig. 5.

Fig. 5. The EMD results of original load series.

It can be seen that the frequency of imf1 to imf3 is obviously higher than that of imf4 to imf8 and the residue. The prediction is implemented on each component separately and the results are then summed up.

The original feature set $F_m$ of each $imf_i(t)$ and $r$ can be obtained as listed in Table 1. The incremental search method is employed in this paper to extract the characteristics that satisfy Eq. (12) as the candidate feature set. Then the mRMR value of each feature is calculated and the features are arranged in descending order.

On the foundation of the ranking results, the features selected in $J$ from left to right are input to the FOA-GRNN model. The relationship between the number of input features and the errors is established in line with Eq. (13).

As shown in Fig. 6, the best feature numbers of the IMFs and the residue are 2, 7, 5, 14, 10, 11, 7, 4 and 6, respectively. The corresponding best input feature sets Q are listed in Table 2.

From Table 1, it can be seen that the load of imf1 to imf3 is mainly influenced by historical load, while the load of imf4 to imf8 and r is closely related to day types, temperature, rainy and snowy days as well as the historical load.

3.3. Load forecasting based on FOA-GRNN

The data from August 8, 2017 to October 30, 2017 are selected as the training set and the remaining data on October 31, 2017 are utilized as the test set. Based on the determination of the best input features for the IMFs and the residue, FOA is employed to optimize the smoothing factor $\sigma$ in GRNN. In FOA, the population size is set to 20, the maximum iteration number equals 100 and the random flight distance ranges in [-10, 10]. The values of the smoothing factor in GRNN optimized by FOA are recorded in Table 3.

The optimized smoothing factor is brought into the GRNN for training, and the root mean square error is selected as the training error index. The calculation equation is shown in Eq. (29). The
Table 1
Original feature set.
and residue is low. The IMFs and the residue are superimposed, and
the overall training error value is calculated to be 10.2279. The
respective components are predicted separately, and the results are
shown in Fig. 7. Then, the results of the respective component
predictions are superimposed to obtain a final prediction result,
and the prediction result is as shown in Fig. 8.
From Fig. 8, it is proved that the STLF model proposed in this
paper presents a high fitting degree for the original load series.
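The FOA settings quoted in Section 3.3 (population 20, 100 iterations, flight range [-10, 10]) can be tried out with the sketch below, which tunes the GRNN smoothing factor on toy data. Several parts are assumptions rather than the authors' procedure: the swarm update follows the commonly published FOA formulation of Ref. [34] (candidate value = reciprocal of the distance to the origin), because only the last FOA steps survive in this text; the fitness is a hold-out RMSE chosen for this example; and the names `foa_search` and `holdout_rmse` are hypothetical.

```python
import math
import random


def grnn_predict(x, train_X, train_Y, sigma):
    """Kernel-weighted average of training targets (scalar-output GRNN)."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in train_X]
    d2_min = min(d2)  # shift cancels in the ratio and avoids 0/0 for tiny sigma
    p = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
    return sum(w * y for w, y in zip(p, train_Y)) / sum(p)


def holdout_rmse(sigma, train_X, train_Y, valid_X, valid_Y):
    """Assumed fitness: RMSE of GRNN predictions on a held-out set."""
    sq = [(grnn_predict(x, train_X, train_Y, sigma) - y) ** 2
          for x, y in zip(valid_X, valid_Y)]
    return math.sqrt(sum(sq) / len(sq))


def foa_search(fitness, pop_size=20, max_iter=100, flight=10.0, seed=0):
    """Minimise fitness(sigma) with a basic fruit fly swarm."""
    rng = random.Random(seed)
    x_axis = rng.uniform(-flight, flight)  # initial swarm location
    y_axis = rng.uniform(-flight, flight)
    best_smell, best_sigma = float("inf"), None
    for _ in range(max_iter):
        flies = []
        for _ in range(pop_size):
            xi = x_axis + rng.uniform(-flight, flight)  # random flight
            yi = y_axis + rng.uniform(-flight, flight)
            dist = max(math.hypot(xi, yi), 1e-12)
            sigma = 1.0 / dist  # smell concentration judgment value
            flies.append((fitness(sigma), sigma, xi, yi))
        smell, sigma, xi, yi = min(flies)  # best fly of this generation
        if smell < best_smell:  # keep it only if it improves the record
            best_smell, best_sigma = smell, sigma
            x_axis, y_axis = xi, yi  # swarm moves to the best location
    return best_sigma, best_smell


if __name__ == "__main__":
    xs = [[i / 10.0] for i in range(40)]
    ys = [math.sin(x[0]) for x in xs]
    tr_X, tr_Y, va_X, va_Y = xs[0::2], ys[0::2], xs[1::2], ys[1::2]
    sigma, err = foa_search(lambda s: holdout_rmse(s, tr_X, tr_Y, va_X, va_Y))
    print("sigma:", sigma, "hold-out RMSE:", err)
```

Using a held-out set rather than the training samples themselves for the fitness is a deliberate choice in this sketch: with a training-set error, the search would be pulled toward very small smoothing factors, where GRNN simply reproduces the samples.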
Table 2
The best input features of IMFs and the residue.
Table 3
The values of smoothing factor for each FOA-GRNN trained by imfi and r.

Component        | imf1   | imf2   | imf3   | imf4   | imf5   | imf6   | imf7   | imf8   | r
Smoothing factor | 0.0753 | 0.0435 | 0.0066 | 0.0138 | 0.0128 | 0.0251 | 0.0042 | 0.0087 | 0.0051
$$RE(t) = \frac{\hat{X}(t) - X(t)}{X(t)} \times 100\% \tag{28}$$
Table 5
Parameter settings and input selection of contrastive models.

Model         | Parameter settings                                                                         | Input
mRMR-FOA-GRNN | The initial population: 20; Maximum number of iterations: 100; Random flight range: [-10, 10] | After mRMR feature extraction: PLt-24, PLt-23, PLt-25, HRt, PLt-1, PLt-2, PLt-48, DTt, TPt, PLt-72, HDt, WHt
FOA-GRNN      | The initial population: 20; Maximum number of iterations: 100; Random flight range: [-10, 10] | PLt-1, PLt-2, PLt-3, PLt-24, PLt-48, PLt-72, TPt, HDt, DTt, HRt, WHt
GRNN          | Smoothing factor: 0.05                                                                     | PLt-1, PLt-2, PLt-3, PLt-24, PLt-48, PLt-72, TPt, HDt, DTt, HRt, WHt
SVM           | Punish coefficient: 2; kernel parameter: 0.1                                               | PLt-1, PLt-2, PLt-3, PLt-24, PLt-48, PLt-72, TPt, HDt, DTt, HRt, WHt
Table 6
Prediction results of load on October 31, 2017 (Unit: MW).
$$MAE = \frac{1}{N} \sum_{t=1}^{N} \left| \hat{X}(t) - X(t) \right| \tag{29}$$

$$RMSE = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( \hat{X}(t) - X(t) \right)^2} \tag{30}$$

$$MAPE = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{\hat{X}(t) - X(t)}{X(t)} \right| \times 100\% \tag{31}$$
4. Further study
Fig. 11. MAE, RMSE, MAPE and TIC of forecasting models (I).

Fig. 13. MAE, RMSE, MAPE and TIC of forecasting models (II).
5. Conclusion 2017/v10i10/86243.
[16] Zhao H, Guo S. An optimized grey model for annual power load forecasting.
Energy 2016;107:272e86. https://doi.org/10.1016/j.energy.2016.04.009.
Aiming at the nonlinearity and randomness of power load se- [17] Luo J, Hong T, Fang SC. Benchmarking robustness of load forecasting models
ries, a hybrid model integrating EMD, mRMR, FOA and GRNN is under data integrity attacks. Int J Forecast 2018;34(1):89e104. https://
established in this paper. The original load series is firstly decom- doi.org/10.1016/j.ijforecast.2017.08.004.
[18] Barrow DK, Crone SF. A comparison of AdaBoost algorithms for time series
posed into several IMFs and a residue in order to weaken the forecast combination. Int J Forecast 2016;32(4):1103e19. https://doi.org/
volatility of the series affected by complex indicators. Then, mRMR 10.1016/j.ijforecast.2016.01.006.
is applied to obtain the best feature set through the correlation [19] Ma Y, Sclavounos PD, Cross-Whiter J, et al. Wave forecast and its application
to the optimal control of offshore floating wind turbine for load mitigation.
analysis between each IMF and the features that contain day types, Renew Energy 2018;128:163e76. https://doi.org/10.1016/
temperature, meteorology conditions and so on. In the end, FOA is j.renene.2018.05.059.
utilized to optimize the smoothing factor in GRNN. The final pre- [20] Moon J, Park J, Hwang E, et al. Forecasting power consumption for higher
educational institutions based on machine learning. J Supercomput 2017;(3):
dicted load can be obtained through the summation of the fore- 1e23. https://doi.org/10.1007/s11227-017-2022-x.
casted results for all IMFs. It has been proved that the proposed [21] Ren Y, Suganthan PN, Srikanth N, et al. Random vector functional link network
model outperforms other four contrast methods (mRMR-FOA- for short-term electricity load demand forecasting. Inf Sci 2016;367:1078e93.
https://doi.org/10.1016/j.ins.2015.11.039.
GRNN, FOA-GRNN, GRNN and SVM) with the lowest MAE, RMSE, [22] Zeng YR, Zeng Y, Choi B, et al. Multifactor-influenced energy consumption
MAPE and TIC. Thus, the established technique is effective, efficient forecasting using enhanced back-propagation neural network. Energy
and practicable in STLF on the power system. 2017;127:381e96. https://doi.org/10.1016/j.energy.2017.03.094.
[23] Gwo-Ching L. Hybrid improved differential evolution and wavelet neural
network with load forecasting problem of air conditioning. Int J Electr Power
Acknowledgment Energy Syst 2014;61(1):673e82. https://doi.org/10.1016/j.ijepes.2014.04.014.
[24] Liang Y, Niu D, Ye M, et al. Short-term load forecasting based on wavelet
transform and least squares support vector machine optimized by improved
This work is supported by the Natural Science Foundation of cuckoo search. Energies 2016;9(10):827. https://doi.org/10.3390/en9100827.
China (Project No. 71471059 and 7180445). Wei-Chiang Hong [25] Abdoos A, Hemmati M, Abdoos AA. Short term load forecasting using a hybrid
intelligent method. Knowl Base Syst 2015;76:139e47. https://doi.org/
thanks the support from Ministry of Science and Technology,
10.1016/j.knosys.2014.12.008.
Taiwan (MOST 106-2221-E-161-005-MY2). [26] Barman M, Choudhury NBD, Sutradhar S. A regional hybrid Goa-SVM model
based on similar day approach for short-term load forecasting in Assam, India.
Energy 2018;145:710e20. https://doi.org/10.1016/j.energy.2017.12.156.
References [27] Specht DF. A general regression neural network. IEEE Trans Neural Network
1991;2(6):568e76. https://doi.org/10.1109/72.97934.
[1] Yang Y, Li S, Li W, et al. Power load probability density forecasting using [28] Niu D, Liang Y, Hong WC. Wind speed forecasting based on EMD and GRNN
Gaussian process quantile regression. Appl Energy 2018;213:499e509. optimized by FOA. Energies 2017;10(12):2001. https://doi.org/10.3390/
https://doi.org/10.1016/j.apenergy.2017.11.035. en10122001.
[2] Sun W, Liang Y. Research of least squares support vector regression based on [29] Zhu S, Lian X, Liu H, et al. Daily air quality index forecasting with hybrid
differential evolution algorithm in short-term load forecasting model. J Renew models: a case in China. Environ Pollut 2017;231(2):1232e44. https://doi.org/
Sustain Energy 2014;6(5):1e10. https://doi.org/10.1063/1.4900552. 10.1016/j.envpol.2017.08.069.
[3] Sadaei HJ, Guimara ~es FG, Silva CJD, et al. Short-term load forecasting method [30] Debnath KB, Mourshed M. Forecasting methods in energy planning models.
based on fuzzy time series, seasonality and long memory process. Int J Approx Renew Sustain Energy Rev 2018;88:297e325. https://doi.org/10.1016/
Reason 2017;83(C):196e217. https://doi.org/10.1016/j.ijar.2017.01.006. j.rser.2018.02.002.
[4] Tarsitano A, Amerise IL. Short-term load forecasting using a two-stage sarimax [31] Du P, Wang J, Yang W, et al. Multi-step ahead forecasting in electrical power
model. Energy 2017;133:108e14. https://doi.org/10.1016/ system using a hybrid forecasting system. Renew Energy 2018;122:533e50.
j.energy.2017.05.126. https://doi.org/10.1016/j.renene.2018.01.113.
[5] Taylor JW, Mcsharry PE. Short-term load forecasting methods: an evaluation [32] Haidar AMA, Mustafa MW, Ibrahim FAF, et al. Transient stability evaluation of
based on European data. IEEE Trans Power Syst 2007;22(4):2213e9. https:// electrical power system using generalized regression neural networks. Appl
doi.org/10.1109/TPWRS.2007.907583. Soft Comput J 2011;11(4):3558e70. https://doi.org/10.1016/
[6] Boroojeni KG, Amini MH, Bahrami S, et al. A novel multi-time-scale modeling j.asoc.2011.01.028.
for electric power demand forecasting: from short-term to medium-term [33] Cigizoglu HK. Application of generalized regression neural networks to
horizon. Elec Power Syst Res 2017;142:58e73. https://doi.org/10.1016/ intermittent flow forecasting and estimation. J Hydrol Eng 2005;10(4):
j.epsr.2016.08.031. 336e41. https://doi.org/10.1061/(ASCE)1084-0699(2005)10:4(336).
[7] Yildiz B, Bilbao JI, Sproul AB. A review and analysis of regression and machine [34] Pan WT. A new Fruit Fly Optimization Algorithm: taking the financial distress
learning models on commercial building electricity load forecasting. Renew model as an example. Knowl Base Syst 2012;26(2):69e74. https://doi.org/
Sustain Energy Rev 2017;73:1104e22. https://doi.org/10.1016/ 10.1016/j.knosys.2011.07.001.
j.rser.2017.02.023. [35] Cheng J, Xiong Y. The quality evaluation of classroom teaching based on FOA-
[8] Dudek G. Pattern-based local linear regression models for short-term load GRNN. Procedia Comput Sci 2017;107:355e60. https://doi.org/10.1016/
forecasting. Elec Power Syst Res 2016;130:139e47. https://doi.org/10.1016/ j.procs.2017.03.117.
j.epsr.2015.09.001.
[9] Zamo M, Mestre O, Arbogast P, et al. A benchmark of statistical regression methods for short-term forecasting of photovoltaic electricity production. Part II: probabilistic forecast of daily production. Sol Energy 2014;105:804–16. https://doi.org/10.1016/j.solener.2014.03.026.
[10] Wu J, Wang J, Lu H, et al. Short term load forecasting technique based on the seasonal exponential adjustment method and the regression model. Energy Convers Manag 2013;70(70):1–9. https://doi.org/10.1016/j.enconman.2013.02.010.
[11] Asrari A, Javan DS, Monfared M. Application of gray-fuzzy-Markov chain method for day-ahead electric load forecasting. Przeglad Elektrotechniczny 2012;3:228–37.
[12] Li GD, Wang CH, Masuda S, et al. A research on short term load forecasting problem applying improved grey dynamic model. Int J Electr Power Energy Syst 2011;33(4):809–16. https://doi.org/10.1016/j.ijepes.2010.11.005.
[13] Li DC, Chang CJ, Chen CC, et al. Forecasting short-term electricity consumption using the adaptive grey-based approach – an Asian case. Omega 2012;40(6):767–73. https://doi.org/10.1016/j.omega.2011.07.007.
[14] Deb C, Zhang F, Yang J, et al. A review on time series forecasting techniques for building energy consumption. Renew Sustain Energy Rev 2017;74:902–24. https://doi.org/10.1016/j.rser.2017.02.085.
[15] Samuel IA, Emmanuel A, Odigwe IA, et al. A comparative study of regression analysis and artificial neural network methods for medium-term load forecasting. Indian J Sci Technol 2017;10(10):7. https://doi.org/10.17485/ijst/
[36] Iscan H, Gunduz M. An application of fruit fly optimization algorithm for traveling salesman problem. Procedia Comput Sci 2017;111:58–63. https://doi.org/10.1016/j.procs.2017.06.010.
[37] Du TS, Ke XT, Liao JG, et al. DSLC-FOA: an improved fruit fly optimization algorithm application to structural engineering design optimization problems. Appl Math Model 2018;55:314–39. https://doi.org/10.1016/j.apm.2017.08.013.
[38] Li HZ, Guo S, Li CJ, et al. A hybrid annual power load forecasting model based on generalized regression neural network with fruit fly optimization algorithm. Knowl Base Syst 2013;37(2):378–87. https://doi.org/10.1016/j.knosys.2012.08.015.
[39] Niu D, Wang H, Chen H, et al. The general regression neural network based on the fruit fly optimization algorithm and the data inconsistency rate for transmission line icing prediction. Energies 2017;10(12):2066. https://doi.org/10.3390/en10122066.
[40] Zhang Y, Na S, Niu J, et al. The influencing factors, regional difference and temporal variation of industrial technology innovation: evidence with the FOA-GRNN model. Sustainability 2018;10(1):187. https://doi.org/10.3390/su10010187.
[41] Li Y, Bao YQ, Yang B, et al. Modification method to deal with the accumulation effects for summer daily electric load forecasting. Int J Electr Power Energy Syst 2015;73:913–8. https://doi.org/10.1016/j.ijepes.2015.06.027.
[42] Niu D, Wei Y. Short-term power load combinatorial forecast adaptively weighted by FHNN similar-day clustering. Autom Electr Power Syst
2013;37(3):54–7. https://doi.org/10.7500/AEPS201202139.
[43] Nie H, Liu G, Liu X, et al. Hybrid of ARIMA and SVMs for short-term load forecasting. Energy Procedia 2012;16(5):1455–60. https://doi.org/10.1016/j.egypro.2012.01.229.
[44] Niu D, Liang Y, Wang H, et al. Icing forecasting of transmission lines with a modified back propagation neural network-support vector machine-extreme learning machine with kernel (BPNN-SVM-KELM) based on the variance-covariance weight determination method. Energies 2017;10:1196. https://doi.org/10.3390/en10081196.
[45] Li S, Wang P, Goel L. Short-term load forecasting by wavelet transform and evolutionary extreme learning machine. Elec Power Syst Res 2015;122:96–103. https://doi.org/10.1016/j.epsr.2015.01.002.
[46] Fletcher P, Sangwine SJ. The development of the quaternion wavelet transform. Signal Process 2017;136:2–15. https://doi.org/10.1016/j.sigpro.2016.12.025.
[47] Huang NE, Wu MLC, Long SR, et al. A confidence limit for the empirical mode decomposition and Hilbert spectral analysis. Proc Math Phys Eng Sci 2003;459(2037):2317–45. https://doi.org/10.1098/rspa.2003.1123.
[48] Amjady N, Abedinia O. Short term wind power prediction based on improved kriging interpolation, empirical mode decomposition, and closed-loop forecasting engine. Sustainability 2017;9(11):2104. https://doi.org/10.3390/su9112104.
[49] Lahmiri S. Comparing variational and empirical mode decomposition in forecasting day-ahead energy prices. IEEE Syst J 2017;99:1–4. https://doi.org/10.1109/JSYST.2015.2487339.
[50] Li S, Wang P, Goel L. A novel wavelet-based ensemble method for short-term load forecasting with hybrid neural networks and feature selection. IEEE Trans Power Syst 2016;31(3):1788–98. https://doi.org/10.1109/TPWRS.2015.2438322.
[51] Wi YM, Joo SK, Song KB. Holiday load forecasting using fuzzy polynomial regression with weather feature selection and adjustment. IEEE Trans Power Syst 2012;27(2):596–603. https://doi.org/10.1109/TPWRS.2011.2174659.
[52] Kouhi S, Keynia F, Ravadanegh SN. A new short-term load forecast method based on neuro-evolutionary algorithm and chaotic feature selection. Int J Electr Power Energy Syst 2014;62(11):862–7. https://doi.org/10.1016/j.ijepes.2014.05.036.
[53] Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 2005;27(8):1226–38. https://doi.org/10.1109/TPAMI.2005.159.
[54] Wang Q, Guan T, Qin B. Short-term wind speed forecasting of ORELM based on MRMR. Renew Energy Resour 2018;36(01):85–90. https://doi.org/10.13941/j.cnki.21-1469/tk.2018.01.013.
[55] Mallik S, Bhadra T, Maulik U. Identifying epigenetic biomarkers using maximal relevance and minimal redundancy based feature selection for multi-omics data. IEEE Trans NanoBioscience 2017;(99):1–1. https://doi.org/10.1109/TNB.2017.2650217.
[56] Yang LI, Xueping GU. Feature selection for transient stability assessment based on improved maximal relevance and minimal redundancy criterion. Proc Chin Soc Electr Eng 2013;33(34):179–86. https://doi.org/10.13334/j.0258-8013.pcsee.2013.34.024.