
IEEE TRANSACTIONS ON ENERGY CONVERSION, VOL. 21, NO. 1, MARCH 2006

Long-Term Wind Speed and Power Forecasting Using Local Recurrent Neural Network Models

Thanasis G. Barbounis, John B. Theocharis, Member, IEEE, Minas C. Alexiadis, and Petros S. Dokopoulos, Member, IEEE

Abstract—This paper deals with the problem of long-term wind speed and power forecasting based on meteorological information. Hourly forecasts up to 72-h ahead are produced for a wind park on the Greek island of Crete. As inputs, our models use the numerical forecasts of wind speed and direction provided by the atmospheric modeling system SKIRON for four nearby positions up to 30 km away from the wind turbine cluster. Three types of local recurrent neural networks are employed as forecasting models, namely, the infinite impulse response multilayer perceptron (IIR-MLP), the local activation feedback multilayer network (LAF-MLN), and the diagonal recurrent neural network (DRNN). These networks contain internal feedback paths, with the neuron connections implemented by means of IIR synaptic filters. Two novel and optimal on-line learning schemes are suggested for the update of the recurrent network's weights, based on the recursive prediction error algorithm. The methods assure continuous stability of the network during the learning phase and exhibit improved performance compared to conventional dynamic back propagation. Extensive experimentation is carried out where the three recurrent networks are additionally compared to two static models, a finite impulse response NN (FIR-NN) and a conventional static MLP network. Simulation results demonstrate that the recurrent models, trained by the suggested methods, outperform the static ones while exhibiting significant improvement over the persistent method.

Index Terms—Local recurrent neural networks, long-term wind power forecasting, nonlinear recursive least squares learning, real-time learning.

I. INTRODUCTION

WIND ENERGY conversion systems (WECS) appear as an appealing alternative to conventional power generation, being most appropriate for isolated power systems on islands or in rural areas. Integration of accurate wind forecasts into the management routines of WECS provides a significant tool for optimizing operating costs and improving reliability. However, due to highly complex interactions and the contribution of various meteorological parameters, wind forecasting is a difficult task. Despite the difficulties, a variety of approaches have been suggested in the literature. The particular prediction method used depends on the available information and the time-scale of the application. Wind forecasts in the range of a few seconds are used for wind turbine control [1], [2]. Time scales on the order of several minutes or even hours are encountered when power system scheduling is to be addressed [3]. In these cases, the time series approach is usually followed, as in [4], where a recurrent high-order neural network (NN) is used for short-term prediction of wind power. Furthermore, similar models have been applied to daily, weekly, or even monthly time series [5].

Long-term prediction of wind power allows planning the connection or disconnection of wind turbines or conventional generators, thus achieving low spinning reserve and optimal operating cost. It refers to hourly data and a time horizon of up to two to three days ahead. In such cases, the statistical properties of the wind are not helpful, and hence we have to rely on approximate wind forecasts provided by the national meteorological services. These predictions are calculated for some predefined reference points, not necessarily at the position of the park, so the need arises to reduce these predictions to the site of interest.

In the past, considerable efforts have been devoted to utilizing meteorological information to derive the required forecasts. Micro- and meso-scale models (such as WASP or PARK) are actually deterministic models that suggest various correcting parameters according to the terrain properties (orography, roughness). They also take into consideration the number, type, and location of the wind turbines in the wind farm, as well as the position, hub height, and power curve of each one, to produce the total power output of the wind farm [6], [7].

Model output statistics (MOS) are used to translate numerical meteorological inputs to actual wind power outputs, e.g., simple methods to correct bias and scaling errors of the initial forecasted values [8]–[11]. Artificial intelligence models are actually advanced MOS techniques combining complex input-output architectures with robust and flexible adaptive algorithms [12].

In this paper, we deal with long-term wind speed and power forecasting for a wind park. Owing to the large time-scale, we rely on three-days-ahead meteorological predictions (wind speed and direction) provided at four sites near the park. To account for the complex dynamics of the process, three local recurrent neural networks with internal feedback paths are employed to produce 72-hours-ahead wind forecasts. The above choice is motivated by the fact that these models exhibit faster learning compared to fully recurrent neural networks [13]. First, the infinite impulse response multilayer perceptron (IIR-MLP) is suggested, having a sufficiently rich network architecture. Additionally, more relaxed structures are considered, including the local activation feedback multilayer

Manuscript received December 9, 2003; revised June 11, 2004. Paper no. TEC-00357-2003.
T. G. Barbounis and J. B. Theocharis are with the Electronic and Computer Engineering Division, Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Thessaloniki, Greece.
M. C. Alexiadis and P. S. Dokopoulos are with the Electrical Power Systems Laboratory, Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Thessaloniki, Greece.
Digital Object Identifier 10.1109/TEC.2005.847954

0885-8969/$20.00 © 2005 IEEE
Fig. 1. Geographical location of the wind park of Rokas, on the Greek island of Crete. Also shown are the nodes where meteorological predictions are available and the prevailing wind direction.

Fig. 2. Configuration of the forecasting approach. The node predictions and the wind forecasts produced by the models are given at the beginning of each day. They cover a time period of 72-h ahead and are updated every 24 h.

network (LAF-MLN) and the diagonal recurrent neural network (DRNN) model. Two novel and efficient learning schemes are developed, a global and a decoupled approach of the recursive prediction error (RPE) algorithm, for updating the network's weights, suitable for on-line applications. The experimental results show that accurate forecasts are obtained, with the recurrent forecast models exhibiting superior performance with regard to the static networks.

II. PROBLEM FORMULATION

Let us consider the Rokas W/F with a capacity of 10.2 MW, located in Eastern Crete, Greece. For efficient maintenance and resource planning of WECS, it is very helpful to know in advance the wind speed and power at the park, for a time horizon of a few days ahead. This allows an optimal policy to be designed, using the optimum number of the available wind turbines and scheduling the possible need for storing or supplementing the generated power.

Due to the large time scale, the main source of information is the meteorological wind predictions, near-surface "node predictions", calculated for four specific positions (N, S, E, W) located around the wind park (see Fig. 1). The node predictions are given once per day; for simplicity, we assume that they are available at the beginning of each day $d = 1, \ldots, D$, where $D$ is the number of days considered in the data set. For each node $j$ and day $d$, the meteorological data are formulated as records comprising predictions of the wind speed and direction for the succeeding 72 h:

$$ \mathbf{x}_j(d) = \big\{ \hat{ws}_j(d,t),\ \hat{wd}_j(d,t) \big\}, \quad t = 1, \ldots, 72 \qquad (1) $$

where $\hat{ws}_j(d,t)$ and $\hat{wd}_j(d,t)$ denote the meteorological node predictions of wind speed and direction, respectively, at hour $t$ of day $d$. Also available are predictions of atmospheric pressure and temperature at the park site.

The real data provided by the W/F SCADA system include values of the wind speed measured at a reference point within the park and the total power of the farm, denoted in the following as $ws(t)$ and $p(t)$, respectively. The data are recorded hourly from April 1st, 2000 until December 31st, 2000.

Given the node predictions, the objective is to develop an advanced forecast model providing 72-h-ahead wind speed and power forecasts at the park, denoted as $\hat{ws}(t)$ and $\hat{p}(t)$, respectively. The configuration of the forecasting approach is depicted in Fig. 2. The wind estimates are given at the beginning of each day, and the model is generally described by

$$ \hat{y}(t) = F\big(\mathbf{u}(t),\ \hat{\mathbf{y}}(t-1)\big) \qquad (2) $$

where $\hat{y}(t)$ stands for either $\hat{ws}(t)$ or $\hat{p}(t)$, $F(\cdot)$ represents the nonlinear mapping function of the process, and the vectors $\mathbf{u}(t)$ and $\hat{\mathbf{y}}(t-1)$ are given as

$$ \mathbf{u}(t) = \big[\hat{ws}_1(t), \hat{wd}_1(t), \ldots, \hat{ws}_4(t), \hat{wd}_4(t)\big]^T, \qquad \hat{\mathbf{y}}(t-1) = \big[\hat{y}(t-1), \hat{y}(t-2), \ldots\big]^T \qquad (3) $$

Apparently, there are both spatial and temporal correlations between the node predictions and the wind variables to be forecasted at the park, rendering the system a highly complex, dynamic, and nonstationary process. The wind values are affected by large-scale atmospheric conditions and by the morphology of the surface landscape.

Efficient forecasting dictates that the model should exhibit the following properties. First, for each $t$, the current and past values of the node predictions should be considered as model inputs, as suggested by $\mathbf{u}(t)$ in (2), so that the model can properly identify the input trends and variations. Moreover, in the absence of real wind speed and power values for times greater than $t$, the model's previous estimates $\hat{\mathbf{y}}(t-1)$ should be used
to derive the current output estimates. Finally, the model should be capable of memorizing the dynamic nature of the process. In this paper, we employ three types of recurrent neural networks as advanced forecast models to generate long-term estimates of the wind speed and power at the park. These networks belong to the family of local-recurrent global-feedforward (LRGF) models with internal dynamics, have strong temporal modeling capabilities, and completely fulfill the quality criteria described above. Furthermore, two novel and efficient learning schemes are devised for the update of the network weights.

When an advanced model is not available, the so-called persistent forecasts can be obtained with minimal effort using the most recent information available. Following this approach, the forecast at a future time-step $t+k$ is determined as an average of the past values

$$ \hat{y}(t+k) = \frac{1}{N_k} \sum_{i=0}^{N_k-1} y(t-i) \qquad (4) $$

where the $y(t-i)$ are real values of wind speed or power measured at times prior to $t$, and $N_k$ is the length of the averaging window. This naive predictor suggests that, as the forecasting time lag $k$ increases, the correlation with recent past measurements becomes negligible, so a longer-scale average should be preferred instead. The advantage gained by an advanced model is referred to as the percentage error improvement over persistent and serves as a means to evaluate the model's performance.
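To make the persistent baseline concrete, here is a minimal Python sketch of (4). The window-growth rule passed as `window_for_lag` is our own illustrative assumption; the paper does not specify how the averaging length scales with the lag.

```python
def persistent_forecast(history, horizon, window_for_lag=lambda k: max(1, k)):
    """Naive persistent forecast of (4): the estimate for lag k ahead is
    the average of the last window_for_lag(k) measured values.

    history : list of past hourly measurements, most recent last
    horizon : number of hourly steps ahead to forecast (e.g. 72)
    """
    forecasts = []
    for k in range(1, horizon + 1):
        # use a longer averaging window as the forecast lag grows
        n_k = min(window_for_lag(k), len(history))
        window = history[-n_k:]
        forecasts.append(sum(window) / n_k)
    return forecasts
```

A model's percentage error improvement over this baseline is then a simple, scale-free figure of merit for any advanced forecaster.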

III. NETWORK ARCHITECTURE

The LRGF recurrent networks are composed of layers arranged in a feedforward fashion. Each layer contains dynamic processing units (neurons) with time-delay lines and/or feedback. Depending on the neuron dynamic model, three network types are mainly considered, namely, the IIR-MLP, the local activation feedback MLN (LAF-MLN), and the output feedback MLN. In this paper, we focus on the IIR-MLP and the LAF-MLN. Additionally, a simplification of the LAF-MLN is considered, namely, the DRNN.

Fig. 3. Local recurrent NNs used for wind forecasting. (a) Neuron model for IIR-MLP. (b) Neuron model for LAF-MLN. (c) DRNN network configuration.

A. The IIR-MLP Model

The IIR-MLP architecture, suggested by Tsoi and Back [13], consists of neurons where the synaptic weights are replaced with infinite impulse response (IIR) linear filters, also referred to as autoregressive moving average (ARMA) models, as shown in Fig. 3(a).

To cope with the structural complexity of the networks, the following notational convention is employed. The IIR-MLP network is assumed to consist of $M+1$ layers, $l = 0, 1, \ldots, M$, with $l = 0$ and $l = M$ denoting the input and the output layer, respectively. The $l$th layer contains $N_l$ neurons, $N_0$ and $N_M$ being the number of neurons in the input and the output layer, respectively. $x_n^l(t)$ is the output of the $n$th neuron of the $l$th layer at time $t$. In particular, $x_n^0(t)$ are the network's input signals, while $x_n^M(t)$ are the output signals. $s_n^l(t)$ is the output of the summing point, that is, the input to the activation function of the $n$th neuron of the $l$th layer at time $t$. $y_{nm}^l(t)$ is the synaptic filter output at time $t$ connecting the $n$th neuron in the $l$th layer with the $m$th neuron of the $(l-1)$th layer. $L_{nm}^l$ and $I_{nm}^l$ denote the orders of the MA and the AR part, respectively, of the synapse connecting the $n$th neuron in the $l$th layer with the $m$th input coming from the $(l-1)$th layer, with $n = 1, \ldots, N_l$ and $m = 1, \ldots, N_{l-1}$. $w_{nm,i}^l$ and $v_{nm,j}^l$ are the coefficients of the MA and the AR part, respectively, of the corresponding synapse. $b_n^l$ is the bias term of each neuron. Finally, $f(\cdot)$ and $f'(\cdot)$ are the node's activation function and its derivative.

The forward run at time $t$, evaluated for $l = 1, \ldots, M$ and $n = 1, \ldots, N_l$, is described as follows:

$$ y_{nm}^l(t) = \sum_{i=0}^{L_{nm}^l} w_{nm,i}^l\, x_m^{l-1}(t-i) + \sum_{j=1}^{I_{nm}^l} v_{nm,j}^l\, y_{nm}^l(t-j) \qquad (5) $$

$$ s_n^l(t) = b_n^l + \sum_{m=1}^{N_{l-1}} y_{nm}^l(t) \qquad (6) $$

$$ x_n^l(t) = f\big(s_n^l(t)\big) \qquad (7) $$

Assuming that the network is running along a training sequence (epoch-wise mode), with the weights remaining fixed throughout the epoch, the neuron's dynamics can be described in a compact way using a notation employed in adaptive filter theory, as follows:

$$ y_{nm}^l(t) = \frac{W_{nm}^l(q^{-1})}{V_{nm}^l(q^{-1})}\, x_m^{l-1}(t) \qquad (8) $$

where

$$ W_{nm}^l(q^{-1}) = \sum_{i=0}^{L_{nm}^l} w_{nm,i}^l\, q^{-i} \qquad (9) $$

$$ V_{nm}^l(q^{-1}) = 1 - \sum_{j=1}^{I_{nm}^l} v_{nm,j}^l\, q^{-j} \qquad (10) $$

and $q^{-1}$ is the delay operator, $q^{-1}x(t) = x(t-1)$. It can be seen that the IIR-MLP architecture realizes a sufficiently rich class of models. It includes several known neural types as special cases, depending on the parameter settings of the IIR synaptic filters. For instance, the IIR-MLP can be reduced to a FIR-NN [14] by eliminating the feedback connections ($v_{nm,j}^l = 0$). Moreover, the IIR-MLP reduces to the conventional static MLP [15] by discarding both the MA and the AR parts of the synaptic filters.

B. The LAF-MLN Model

The neuron model of the LAF-MLN structure [16] is shown in Fig. 3(b). The output of a neuron's summing node is filtered through an autoregressive (AR) adaptive filter before feeding the activation function. It should be noticed that, regarding structural complexity, the LAF-MLN is considerably simpler than the IIR-MLP. This is due to the fact that, while an AR part is introduced in every synaptic link of the IIR-MLP, a single AR recursion exists for each neuron in the LAF-MLN architecture. As a result, for the same network structure, the latter network contains a smaller number of tunable weights than the former.

The forward run equations for the LAF-MLN are described as follows:

$$ s_n^l(t) = b_n^l + \sum_{m=1}^{N_{l-1}} \sum_{i=0}^{L_{nm}^l} w_{nm,i}^l\, x_m^{l-1}(t-i) \qquad (11) $$

$$ z_n^l(t) = s_n^l(t) + \sum_{j=1}^{I_n^l} v_{n,j}^l\, z_n^l(t-j) \qquad (12) $$

$$ x_n^l(t) = f\big(z_n^l(t)\big) \qquad (13) $$

C. The DRNN Model

In order to further reduce the structural complexity, we consider the DRNN, a modified form of the fully recurrent model [17]. The DRNN is a two-layer network [Fig. 3(c)], where the hidden layer contains self-recurrent neurons while the output layer is composed of linear neurons. The hidden layer equations are

$$ z_n(t) = b_n + \sum_{m=1}^{N_0} w_{nm}\, x_m^0(t) + v_n\, z_n(t-1) \qquad (14) $$

$$ x_n(t) = f\big(z_n(t)\big) \qquad (15) $$

while the network's output is determined by

$$ \hat{y}_k(t) = b_k + \sum_{n=1}^{N_1} c_{kn}\, x_n(t) \qquad (16) $$

Notice that the DRNN can be derived as a special case of the LAF-MLN. Particularly, we consider only the constant terms of the MA filters for all neurons. Moreover, the AR parts are reduced to first-order filters for the hidden layer's neurons, while feedback is broken for the neurons in the output layer.

IV. GRADIENT CALCULATIONS

The learning algorithm to be developed in the next section requires the computation of the gradients of the network's output with respect to all trainable weights. Because of the existence of internal dynamics, the traditional procedure of standard back propagation (BP) cannot be applied to determine these gradients. Therefore, we employ the method of ordered derivatives [18], extensively used in the literature for calculating partial derivatives in complex recurrent networks. Notice that, since the recursive approach is adopted, the chain-rule derivative expansion is developed in a forward fashion.

The gradients relate differential changes of the neuron outputs to differential changes of a weight within the network. Consider an arbitrary weight $\theta$, denoting either an MA or an AR coefficient of a synapse ending at the $n$th neuron of the $l$th layer. Derivation of the gradient calculations for the IIR-MLP model is given in the Appendix. The respective calculations for the LAF-MLN and the DRNN proceed along similar lines, and are therefore omitted.

Notice that the calculation of the gradients is achieved through higher-order recurrent difference equations. This is opposed to static network structures, where the weights are updated using static relations [15]. As regards the IIR-MLP model, and based on the gradient analysis described in the Appendix, the following comments are in order. The gradients of a synaptic filter output with respect to its own MA and AR weights are derived by passing the delayed inputs and delayed outputs of the filter, respectively, through the AR part of that synaptic filter. Furthermore, at each $t$, the gradients of the synaptic filter outputs belonging to the succeeding layer with respect to the weight $\theta$, where the weight change is assumed, are calculated by passing the corresponding gradients of the $n$th neuron's output at the $l$th layer through the succeeding filter. Hence, the gradient dynamics are identical to the forward dynamics describing the propagation of the neuron output through the synapse. Finally, the gradients of the filter outputs of each subsequent layer are derived in terms of the gradients of the neuron outputs of the preceding layer with respect to $\theta$.

Following the above observations, the "gradient amidst dynamics" with respect to an arbitrary weight $\theta$ is described in terms of an auxiliary network, called the sensitivity network [19]. For each $\theta$, the associated sensitivity network is formed as a sub-network of the original one, starting with the $n$th node of the $l$th layer and proceeding ahead until the output layer.
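As an illustration of the IIR-MLP forward run (5)–(7), the following Python sketch evaluates one neuron whose incoming synapses are ARMA filters. The function names and the `tanh` activation are our own choices for the example; the paper does not prescribe a particular activation here.

```python
import math

def iir_synapse(x_hist, y_hist, w, v):
    """ARMA synaptic filter, one step of (5):
    y(t) = sum_i w[i]*x(t-i) + sum_j v[j]*y(t-1-j).
    x_hist: [x(t), x(t-1), ...]; y_hist: [y(t-1), y(t-2), ...]."""
    ma = sum(wi * x_hist[i] for i, wi in enumerate(w) if i < len(x_hist))
    ar = sum(vj * y_hist[j] for j, vj in enumerate(v) if j < len(y_hist))
    return ma + ar

def iir_neuron(x_hists, y_hists, synapses, bias, f=math.tanh):
    """Forward run (5)-(7) of one neuron: sum the ARMA filter outputs of
    all incoming synapses, add the bias, apply the activation f."""
    s = bias
    for m, (w, v) in enumerate(synapses):
        y_t = iir_synapse(x_hists[m], y_hists[m], w, v)
        y_hists[m].insert(0, y_t)  # keep filter output for the AR part at t+1
        s += y_t
    return f(s)
```

Setting all AR coefficients to zero degenerates each synapse to a FIR tap line, and keeping only single-tap MA filters collapses the neuron to a static MLP unit, mirroring the special cases noted above.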
The sensitivity network is fed by the gradients of the corresponding neuron's signals with respect to $\theta$, while its output is the gradient of the network outputs with regard to $\theta$.

Notice that the gradient calculations are carried out, through the sensitivity networks, in parallel with the forward network, following similar dynamics. That is, the gradients are calculated on an on-line basis, as the original network runs in time. Hence, the use of sensitivity networks provides a transparent tool for the evolution of the network gradients. It is assumed that during on-line training the weights change smoothly, that is, they remain almost constant. In that case, the higher-order weight gradients can be obtained as delayed versions of the gradients with respect to the current weights.

V. LEARNING ALGORITHM

Having determined the network gradients, we proceed to develop the learning algorithms for the on-line update of the tunable weights. The most common algorithm used for on-line training of recurrent networks is real-time recurrent learning (RTRL) [20], where the weight update is performed along the negative gradient direction. Nevertheless, this method exhibits slow convergence rates because of the small learning rates required, and most often becomes trapped in local minima. In an attempt to improve the performance, we employ an optimal learning scheme, the recursive prediction error (RPE) identification algorithm, with enhanced training qualities. Owing to the second-order information used in the update recursions, better-conditioned training is accomplished compared to the RTRL method.

Along this direction, two novel algorithms are suggested for training the local feedback recurrent networks considered in this paper, where stability is assured throughout the learning process. First, a global scheme is developed, called the global RPE (GRPE), where all weights are simultaneously updated. Additionally, to cope with the increased computational complexity of the GRPE, we devised a local version of the algorithm, called the decoupled RPE (DRPE). The DRPE is derived by partitioning the global optimization problem into a set of manageable sub-problems, at the neuron level. Thus, considerable computational and storage savings are gained while preserving accuracy qualities similar to those of the GRPE.

A. The GRPE Algorithm

Let $\theta$ denote an $n$-dimensional composite vector including all synaptic weights of the recurrent network under consideration. We consider a nonlinear recurrent predictor of the type

$$ \hat{\mathbf{y}}(t) = g\big(\theta, \mathbf{u}(t), \mathbf{x}(t)\big) \qquad (17) $$

where $g(\cdot)$ describes the structure of the network, $\hat{\mathbf{y}}(t)$ is a vector comprising the network's outputs, $\mathbf{u}(t)$ is a vector including the network's inputs, and $\mathbf{x}(t)$ represents the internal states describing the network's dynamics. The real process to be modeled by the network, denoted as $\mathbf{y}(t)$, is obtained by

$$ \mathbf{y}(t) = \hat{\mathbf{y}}(t) + \mathbf{e}(t) \qquad (18) $$

where $\mathbf{e}(t)$ is the prediction error for a particular value of $\theta$. According to the GRPE method, all network weights are continuously determined at each $t$ using the following recursion:

$$ \mathbf{e}(t) = \mathbf{y}(t) - \hat{\mathbf{y}}(t) \qquad (19) $$
$$ S(t) = \Psi^T(t)\, P(t-1)\, \Psi(t) + \lambda(t) I \qquad (20) $$
$$ K(t) = P(t-1)\, \Psi(t)\, S^{-1}(t) \qquad (21) $$
$$ \hat{\theta}(t) = \Big[\hat{\theta}(t-1) + \mu(t)\, K(t)\, \mathbf{e}(t)\Big]_{\mathcal{S}} \qquad (22) $$
$$ P(t) = \frac{1}{\lambda(t)} \Big[P(t-1) - K(t)\, S(t)\, K^T(t)\Big] \qquad (23) $$

where $\lambda(t)$ is the forgetting factor, $0 < \lambda(t) \le 1$. The GRPE algorithm provides a recursive way of minimizing a quadratic criterion of the prediction errors using the stochastic Gauss–Newton search method [21]. The algorithm is identical to a nonlinear recursive least squares (RLS) [22] method that minimizes the error criterion

$$ V(\theta) = \frac{1}{2} \sum_{k=1}^{t} \lambda^{t-k}\, \mathbf{e}^T(k)\, \mathbf{e}(k) \qquad (24) $$

Furthermore, the recursion (19)–(23) has strong similarities to the extended Kalman filter (EKF) algorithm used in [23]. $K(t)$ is the gain matrix controlling the weight update and $P(t)$ is the error covariance matrix; together they define the search steps along the Gauss–Newton direction. Assuming that no prior information is available, $P(0)$ is usually taken as $P(0) = \beta I$, where $\beta$ is an arbitrarily large number.

A key issue for the performance of the algorithm is the gradient matrix $\Psi(t)$, defined as follows:

$$ \Psi(t) = \frac{\partial \hat{\mathbf{y}}^T(t)}{\partial \theta} \qquad (25) $$

that is, an $n \times N_M$ matrix containing the partial derivatives of the predictor's model (the network's outputs) with respect to the trainable weights

$$ \Psi(t) = \begin{bmatrix} \partial \hat{y}_1(t)/\partial \theta_1 & \cdots & \partial \hat{y}_{N_M}(t)/\partial \theta_1 \\ \vdots & \ddots & \vdots \\ \partial \hat{y}_1(t)/\partial \theta_n & \cdots & \partial \hat{y}_{N_M}(t)/\partial \theta_n \end{bmatrix} \qquad (26) $$

These gradients are computed using the sensitivity networks, as described in Section IV.

The operator $[\cdot]_{\mathcal{S}}$ in (22) implements a projection mechanism into the stability region. As discussed in Section IV, the gradient dynamics are identical to the forward dynamics of the recurrent network. Hence, learning stability also guarantees stable operation of the network run. The necessary and sufficient condition for the gradients to tend to zero is that the AR parts of the synaptic IIR filters be stable. This dictates that the zeros of $V_{nm}^l(q^{-1})$ in the IIR-MLP and LAF-MLN should lie within the unit circle, which determines the stability region of the algorithm. Particularly, for the DRNN model, stability suggests that the self-feedback weights should lie within the region $(-1, 1)$. Hence, for stable training of the recurrent networks, the GRPE algorithm has to be supplied with stability monitoring and a projection tool. In this paper, we follow a simple approach: the correction term is successively reduced until the new estimates fall within the stability region.
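For concreteness, one step of the recursion (19)–(23) for a scalar network output can be sketched in Python as below. All names are ours, and the halving-based projection merely stands in for the unit-circle check on the AR polynomial roots described above.

```python
import numpy as np

def grpe_step(theta, P, psi, y, y_hat, lam=0.99, mu=1.0,
              is_stable=lambda th: True):
    """One GRPE update for a scalar network output.
    theta: (n,) weights; P: (n,n) covariance; psi: (n,) gradient of the
    output w.r.t. the weights (from the sensitivity network);
    is_stable: predicate delimiting the stability region."""
    e = y - y_hat                          # (19) prediction error
    S = float(psi @ P @ psi) + lam         # (20) scalar innovation term
    K = P @ psi / S                        # (21) gain vector
    step = mu * K * e                      # Gauss-Newton correction
    for _ in range(50):                    # (22) projection: shrink the
        if is_stable(theta + step):        # correction until the estimate
            break                          # re-enters the stability region
        step *= 0.5
    theta = theta + step
    P = (P - np.outer(K, K) * S) / lam     # (23) covariance update
    return theta, P
```

Initializing `P` as a large multiple of the identity reproduces the no-prior-information setting discussed above.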
The learning rate $\mu(t)$ takes values in the range [0, 1]. Because of the gradient complexity, it scales the correction term in the weight updates and reduces the aggressiveness of the GRPE during the initial training phase. The learning rate is not changed at each time step but after each pass through the data of the epoch (iteration). Initially, it takes a relatively small value, e.g., 0.01, thus avoiding bad estimates. In the following iterations, as training proceeds, it is raised to unity following a user-defined profile, and the GRPE takes over fully.
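The ramp profile is user-defined; as one plausible example (our choice, not the paper's), a geometric schedule that starts at 0.01 and saturates at unity:

```python
def learning_rate_profile(epoch, mu0=0.01, growth=1.5):
    """Per-epoch learning-rate schedule: start small so that early,
    badly conditioned updates are damped, then grow geometrically and
    clip at 1.0, at which point the full GRPE correction is applied."""
    return min(1.0, mu0 * growth ** epoch)
```

Any monotone profile ending at 1.0 serves the same purpose; the growth factor only controls how many epochs the damped phase lasts.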

B. The DRPE Algorithm

The DRPE is a local learning scheme that is obtained by dividing the network weights into $G$ groups, at the neuron level. Each group consists of the MA and AR weights of the synapses pointing to a neuron. Let us consider the $g$th neuron (group), described by a weight vector $\theta_g$ of dimension $n_g$, so that $\theta = [\theta_1^T, \ldots, \theta_G^T]^T$. The effect of the $g$th neuron on the network's outputs is locally given by the gradient matrix

$$ \Psi_g(t) = \frac{\partial \hat{\mathbf{y}}^T(t)}{\partial \theta_g} \qquad (27) $$

Apparently, the above gradient is the $g$th block of rows of the global matrix $\Psi(t)$ in (26). Consider also the covariance matrix of the GRPE in block-diagonal form

$$ P(t) = \mathrm{diag}\big\{P_1(t), \ldots, P_G(t)\big\} \qquad (28) $$

Owing to the above weight grouping, the GRPE is decomposed into a set of decoupled algorithms where the weight vector $\theta_g$ of each group is independently updated at each time step, following a recursion similar to the one in (19)–(23):

$$ \mathbf{e}(t) = \mathbf{y}(t) - \hat{\mathbf{y}}(t) \qquad (29) $$
$$ S_g(t) = \Psi_g^T(t)\, P_g(t-1)\, \Psi_g(t) + \lambda(t) I \qquad (30) $$
$$ K_g(t) = P_g(t-1)\, \Psi_g(t)\, S_g^{-1}(t) \qquad (31) $$
$$ \hat{\theta}_g(t) = \Big[\hat{\theta}_g(t-1) + \mu(t)\, K_g(t)\, \mathbf{e}(t)\Big]_{\mathcal{S}} \qquad (32) $$
$$ P_g(t) = \frac{1}{\lambda(t)} \Big[P_g(t-1) - K_g(t)\, S_g(t)\, K_g^T(t)\Big] \qquad (33) $$

The recursion of each group uses the associated gradient matrices $\Psi_g(t)$, $S_g(t)$, and $K_g(t)$. Additionally, each neuron requires the storage of an individual covariance matrix $P_g(t)$ of size $n_g \times n_g$.

C. Algorithmic Complexity

Algorithmic complexity involves the computational cost and the memory demands of a particular learning algorithm. This load is measured in terms of the number of additions and multiplications required to train a network for one time-step. For simplicity, let us consider a two-layered recurrent network with $N_0$ inputs, one hidden layer, and a linear output layer, and let $n_g$ denote the number of weights pertaining to the group of a neuron at the $l$th layer, so that the total number of weights included in the network is $n = \sum_g n_g$. Computational analysis shows that the computational load of the GRPE grows with the square of the total number of weights, as do its storage requirements. By contrast, the computational and storage costs of the DRPE grow with the sum of the squared group sizes, $\sum_g n_g^2$. Since $\sum_g n_g^2 \ll n^2$, the computational burden of the DRPE is considerably smaller than that of the GRPE.

Fig. 4. Wind speed distributions of the four node predictions and the actual wind at the Rokas park.

VI. INPUT SELECTION

As mentioned before, the recurrent neural networks were trained and evaluated using 72-h-ahead meteorological predictions at four nodes located in the vicinity of the wind park of Rokas, Crete. Although wind speed (in m/s), direction (in degrees), pressure, and temperature predictions are given by the atmospheric modeling system SKIRON, only the real values of the wind speed and power are measured at the park site. Finally, the meteorological "prediction set" includes a linear interpolation of the node predictions derived for the exact position of the W/F.

Selection of the model's inputs is the first stage in model building. This problem has to be properly addressed, as it greatly affects the performance of the resulting forecast models. Considering the pool of all available candidate inputs, those variables should be selected that exhibit a significant degree of correlation with the model's outputs to be forecasted. The decision as to which variables will be included as model inputs is made on the basis of two criteria: comparison of the wind speed distributions, and the degree of cross-correlation between the node and actual wind values. For the Rokas park, these criteria are shown in Fig. 4 and Table I, respectively. Fig. 4 shows that the (N, E, W) nodes have wind speed profiles similar to the actual one at the park, covering the entire speed range. By contrast, the Southern node follows a different distribution, with its average speed being relatively low. Moreover, Table I indicates that the Southern node has the smallest
TABLE I. Average "node predictions" and cross-correlations of them with the real values of wind speed and power at the Rokas park.

degree of cross-correlation with regard to the wind speed and especially the wind power, with the rest of the nodes exhibiting an adequate correlation of almost equivalent level. This is explained by the fact that, while the (N, E, W) nodes are placed on the surface of the sea, the Southern node is located on land, possibly in a leeward position (see Fig. 1). Therefore, the Southern node is discarded from being used as a forecast model input.

The input data based on interpolated values were also discarded, because linear interpolation is a very rough tool for describing such a complex nonlinear system as the one under consideration. Data manipulation justifies this decision. Finally, after extensive experimentation, it was found that inclusion of the temperature and the atmospheric pressure as inputs did not improve the performance of the resulting models. On the contrary, the presence of two more inputs actually slowed down the learning process. Therefore, these variables are also not included in the input set.

In view of the above discussion, six major inputs are employed for the forecast models, comprising the wind speeds and directions of the N, E, and W nodes. Before training, the data were normalized. As for the wind directions, in order for the networks to discriminate between values located around the critical point of 0 or 360 degrees and establish the correct associations, they were first biased, that is, the degree axis was shifted. For the Rokas park, the bias value was decided by observing the distribution of the directions in the data.

Concluding, as can be seen from Fig. 2, the real values of wind speed and power for a one-day time period involve node predictions at the input side obtained currently, one day before, and two days before. Therefore, to discriminate between the three different node predictions corresponding to the same output data, we have introduced an additional input, the input index, which takes a distinct value for each of the three prediction ages. The model's input vector thus contains seven input terms in total.

The above formulation indicates that the recurrent models fulfill all the requirements imposed by an efficient forecast model, as suggested by (2). Our method combines the time-series approach [4], [5] and atmospheric modeling simultaneously. On the other hand, in the absence of future target data, when conventional static (memoryless) NNs are employed for multi-step-ahead prediction, two approaches are usually followed. According to the first approach, a separate model is developed providing the wind forecast at each time-step ahead. The second approach consists of feeding back the model's output estimates as explicit inputs through tapped-delay lines. In either case, we are faced with two drawbacks. First, the order of the past values to be introduced to the network is unknown in practice. Second, the inclusion of additional inputs aggravates the parametric complexity of the models.

Fig. 5. Selection of the training and checking data patterns. Each square denotes a 24-hour batch.

After having determined the model inputs, the training and testing data patterns are created, as shown in Fig. 5. The patterns are arranged in data batches containing 72 hourly node predictions (inputs), along with the 72 respective values of wind speed or power measurements. The data batches are selected in such a way that neither the training nor the checking data sets overlap, thus guaranteeing complete independence between these two sets. Finally, in order to ensure continuity of the network's states with those of the next 72-hour data batch, we introduced in front of each batch the 24-hour data pairs of the previous day. These extra 24 h are not used for training; they are simply passed through the network to develop the proper states.
Notice that only the current values of the wind speeds and
VII. SIMULATION RESULTS
directions are used as inputs to the recurrent models at each ,
leading to parsimonious forecast models with a small number of The available data are divided into training and a checking
parameters. It should be mentioned that at the beginning of a day data set, composed of 3264 and 960 patterns, respectively. The
only the meteorological node predictions are given, while the training data set is used for training of the recurrent models
actual wind and power values are unknown (Fig. 2). Neverthe- using the learning algorithms suggested in Section V.
less, because of the MA and the AR parts of the synaptic filters, Moreover, the checking data set is used to evaluate the fore-
the output estimates are recurrently derived using past values of cast performance of the resulting models. For each recurrent
network type, two separate forecast models are developed, providing at the beginning of each day 72-h-ahead forecasts of the wind speed and power at the park. On-line training is carried out on the data batches for 400 epochs.

TABLE II
MAE AND RMSE FOR WIND POWER AND SPEED FORECASTS OBTAINED BY THREE MODEL TYPES INCLUDING THREE (N, W, E), TWO (N, W), AND ONE (W) INPUT NODES, AND TRAINED BY THE GRPE ALGORITHM
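The batch layout just described (non-overlapping 72-h batches of node predictions and measurements, each preceded by the previous day's 24 h that are only passed through the network to develop its states) can be sketched as follows; the function and array names are illustrative, not from the paper:

```python
import numpy as np

def make_batches(inputs, targets, horizon=72, warmup=24):
    """Split hourly series into non-overlapping `horizon`-hour batches.

    Each batch is prefixed with the previous day's `warmup` hours, which
    are fed through the network to build up its internal states but are
    excluded from the training loss.
    """
    batches = []
    # start at `warmup` so every batch has a full warm-up prefix
    for start in range(warmup, len(inputs) - horizon + 1, horizon):
        batches.append({
            "warmup_in": inputs[start - warmup:start],
            "train_in":  inputs[start:start + horizon],
            "train_out": targets[start:start + horizon],
        })
    return batches

x = np.arange(24 + 3 * 72)           # dummy hourly node predictions
y = np.arange(24 + 3 * 72) * 0.5     # dummy hourly measurements
batches = make_batches(x, y)         # three 72-h batches, each with warm-up
```

Because consecutive batches share no training hours, the training and checking sets built this way stay fully independent, as required above.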
IIR-MLP networks with two hidden layers and seven neurons
per layer are selected. Both MA and AR parts of order 3 are
considered for the IIR synaptic filters in the hidden neurons.
Especially for the output neuron, an AR part of order 5 is chosen.
This allows the model to learn more efficiently the dependence of the current output on its own past. Structures with a similar number of parameters are chosen for the other two recurrent networks used. Particularly, a network with two hidden layers
and eight neurons per layer is considered for the LAF-MLN. The
order of the MA parts is set to 4 while the AR parts are the same as
in the case of IIR-MLP models. Finally, a DRNN model with one
hidden layer composed of 32 self-recurrent neurons is selected.
In order to validate our input selection, we considered three
scenarios (models), in which only the one (W), two (N, W), or
three (N, W, E) most correlated nodes are used as inputs to the
networks. Hence, the respective networks have three, five, and
seven inputs, respectively: the selected nodes’ predictions of the
speed and direction, and the input index.

TABLE III
MAE AND RMSE FOR WIND POWER AND SPEED FORECASTS OBTAINED BY THREE MODEL TYPES INCLUDING THREE (N, W, E), TWO (N, W), AND ONE (W) INPUT NODES, AND TRAINED BY THE DRPE ALGORITHM

The network weights are initially selected at random within a fixed range, while the coefficients involved in the AR parts of the synaptic filters are initialized so that the roots of the resulting polynomials lie inside the unit circle, as required for stable operation of the learning algorithm. The hyperbolic tangent is used as the activation function in our experiments. In order to avoid excessive errors during the training stage caused by bad initial estimates of the weights, the learning rate was initially set to a small value and, as learning proceeds, is gradually increased to unity following a recursive update formula whose parameter is set to 0.8. The correlation matrix is initialized as a scaled identity matrix with size equal to that of the weight vector.
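A minimal sketch of this initialization and scheduling logic; the 0.8 schedule parameter and the projection factor of 0.2 come from the description here, while the exponential form of the learning-rate recursion, the pole-sampling radius, and all names are assumptions:

```python
import numpy as np

def stable_ar_init(order, rng, radius=0.5):
    """Draw AR coefficients whose characteristic roots lie inside the
    unit circle, by sampling real roots directly and forming the
    monic polynomial z^n - a1*z^(n-1) - ... - an."""
    roots = rng.uniform(-radius, radius, order)   # real roots, |r| < 1
    return -np.poly(roots)[1:]                    # AR feedback coefficients

def rate_schedule(lam, lam0=0.8):
    """Recursively drive the learning rate toward unity."""
    return lam0 * lam + (1.0 - lam0)

def project(correction, weights, is_stable, factor=0.2, max_tries=20):
    """Shrink the correction term until the updated weights are stable."""
    for _ in range(max_tries):
        if is_stable(weights + correction):
            return weights + correction
        correction = factor * correction
    return weights                                # give up: keep old weights

rng = np.random.default_rng(0)
a = stable_ar_init(3, rng)
roots = np.roots(np.concatenate(([1.0], -a)))     # recover the sampled poles
```

Sampling the poles rather than the coefficients guarantees stability by construction, which is one simple way to satisfy the unit-circle condition stated above.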
Finally, during learning, stability monitoring of the RPE algorithms is continuously performed. Whenever the current estimates lie outside the stability region, the projection mechanism is activated. Accordingly, the correction term is successively multiplied by a factor of 0.2 until the revised weight estimates fulfill the stability conditions for each neuron and network type.

In order to justify the benefit gained by using local recurrent NNs, apart from the IIR-MLP, the LAF-MLN, and the DRNN, two additional static models are examined for comparison: a static MLP and a FIR neural network, where the connection weights are realized by linear FIR filters [14]. FIR-NNs are functionally equivalent to static MLPs, although with better modeling capabilities, since due to the FIR synapses past values of the inputs are also taken into consideration. The networks again had three, five, or seven inputs, depending on the node predictions being used.

To establish a fair comparison basis, the structure of the competing networks is selected so that they contain approximately the same number of parameters as the recurrent networks. Particularly, a static MLP is chosen with two hidden layers and 20 neurons per layer, and a FIR-NN with two hidden layers and linear FIR filters of sixth order. Furthermore, after the necessary modifications to the computation of the network gradients, all forecast models are trained by means of the suggested GRPE and DRPE algorithms. Under these conditions, the models are evaluated in terms of representation power and their capability to produce efficient multistage forecasts. Additionally, a case is considered where the static MLP is trained using the conventional BP algorithm.

Based on the data available for the Rokas’ park, an exhaustive set of experiments is carried out for each input case (3, 2, or 1 input nodes) and network type. As measures of forecast performance, the mean absolute error (MAE) and the root mean square error (RMSE) are used. The best results achieved for each case are cited in Tables II and III, where the models are trained by the GRPE and the DRPE, respectively.
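Both error measures are standard; for completeness, a short sketch of how they, together with the per-horizon-step MAE curves of the kind plotted in Figs. 6 and 8, can be computed (names illustrative):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error over a forecast series."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def rmse(y_true, y_pred):
    """Root mean square error over a forecast series."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    return np.sqrt(np.mean(err ** 2))

def stepwise_mae(Y_true, Y_pred):
    """MAE per forecast step across many 72-h curves (rows = days),
    i.e. one error value for each time-step ahead."""
    return np.mean(np.abs(np.asarray(Y_true) - np.asarray(Y_pred)), axis=0)

y  = np.array([1.0, 2.0, 4.0])
yh = np.array([1.0, 3.0, 2.0])
# mae = (0 + 1 + 2) / 3 = 1.0 ; rmse = sqrt((0 + 1 + 4) / 3)
```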
Fig. 6. Power forecast errors (MAE) for a horizon of 72 h obtained by the IIR-MLP, LAF-MLN, DRNN, and the static MLP (trained by BP) with three input nodes, along with the errors of the persistent method.

Fig. 7. Percentage improvement over the persistent method of the power forecasts achieved by IIR-MLP, LAF-MLN, DRNN, and the static MLP (trained by BP) with three input nodes.
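For reference, the persistent method against which these improvements are measured simply holds the last available measurement over the whole horizon. A sketch of this baseline and of the percentage-improvement measure (names illustrative):

```python
import numpy as np

def persistent_forecast(last_measurement, horizon=72):
    """Naive baseline: hold the last measured value for every step ahead."""
    return np.full(horizon, last_measurement)

def improvement_pct(err_model, err_persistent):
    """Percentage improvement of a model's error over the persistent error."""
    return 100.0 * (err_persistent - err_model) / err_persistent

f = persistent_forecast(3.5, horizon=4)
imp = improvement_pct(1.0, 2.0)   # model halves the error -> 50%
```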

It can be seen that the wind speed and power performance is improved when additional nodes are included as inputs. The
best results are attained for each network type when the com-
plete models are considered, including all three nodes (N, E, W).
Since the meteorological predictions at the three most correlated
nodes contain almost an equivalent amount of information, the
above observation verifies experimentally our input selection.
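As an aside, the 0/360-degree wrap-around preprocessing of the wind directions described in the previous section is simple to implement; a sketch with a purely illustrative bias value:

```python
def bias_direction(deg, bias=90.0):
    """Shift the degree axis by `bias` and wrap back into [0, 360).

    Directions clustered around 0/360 deg (e.g. 355 and 5) end up
    numerically close after the shift, so the network can associate them.
    The bias value here is illustrative; the paper chose it from the
    observed distribution of directions at the park.
    """
    return (deg + bias) % 360.0

# 355 deg and 5 deg are 10 deg apart physically but 350 apart numerically;
# after biasing they become 85 and 95 - now close on the shifted axis.
a, b = bias_direction(355.0), bias_direction(5.0)
```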
For the models trained by GRPE (Table II) it can be seen
that the recurrent forecast models exhibit consistently better
performance compared to the static models, FIR-NN and the
static MLP, thus justifying their use for multiple-step wind
forecasting. Owing to the richness of the network architecture,
IIR-MLP shows the best performance, outperforming the
LAF-MLN and the DRNN models in all cases. Particularly,
IIR-MLP provides for the wind power and speed a MAE (rmse)
of 1.2117 MW (1.5263) and 1.9755 m/s (2.7211). Regarding the power forecasts, it outperforms the FIR-NN and the static MLP by 11.82% and 12.7% in terms of MAE. Furthermore, for the speed forecasts, an improvement of 7.12% and 9.44% is obtained, respectively. Notice that the static models in Table II are trained using an enhanced learning scheme, the GRPE algorithm. Assuming that the MLP is trained by the conventional BP method, a considerably larger improvement is achieved. In that case, the IIR-MLP outperforms the static MLP by 24.94% and 21.10% for the wind power and speed, respectively.

Fig. 6 depicts the forecasting errors (MAE) of the wind power for the complete models of the three recurrent networks trained by GRPE and the static MLP, along with the errors obtained by the persistent method. Fig. 7 shows the percentage improvement in wind power forecasting of each network model over the persistent method, on the basis of the MAE criterion. In view of the above figures, the following observations are in order.

— The persistent error is almost monotonically increasing when larger time-steps are considered in the future. This behavior continues for the first 50 steps ahead, settling to a high error level for the rest of the time horizon. Nevertheless, good estimates are obtained for the first few time-steps, as expected. This is the time range (up to 6–8 h ahead) where it is suggested to use short-term models (or a coupling of them with long-term models) in order to outperform the persistent forecast.

— Compared to the static MLP, the recurrent models consistently exhibit the best forecasting errors for all time-steps ahead; the best performance is shown by the IIR-MLP. In contrast to the persistent method, the errors remain in the vicinity of the MAE value (1.2117 MW) for the entire time horizon. This indicates the capability of the recurrent models to produce robust multistep-ahead predictions. As a result, considerable improvement is attained over the persistent forecasts. Particularly, an average improvement of over 50% is obtained for time-steps larger than 20, although smaller improvement is shown for shorter time lags. Similar observations are also valid for the wind speed, as shown in Figs. 8 and 9, where the forecasting errors and improvement over the persistent method are given.

Fig. 8. Speed forecast errors (MAE) for a horizon of 72 h obtained by the IIR-MLP, the LAF-MLN, the DRNN, and the static MLP (trained by BP) with three input nodes, along with the errors of the persistent method.

The ability of the model to learn the process dynamics and provide efficient forecasts is demonstrated in Fig. 10. A typical 72-h curve belonging to the checking data set is considered,
including the actual values of the wind power at the park and the outputs of an IIR-MLP model. As shown, the forecasts are good, following the trends of the real power closely. Similarly, in Fig. 11 a typical curve is plotted, including wind speed forecasts at the park site together with the nodes’ respective forecasts provided by the SKIRON system.

Fig. 9. Percentage improvement over the persistent method of the speed forecasts achieved by IIR-MLP, the LAF-MLN, the DRNN, and the static MLP (trained by BP) with three input nodes.

Fig. 10. Real wind power (solid line) and predicted wind power (dashed line) in MW, for a typical power forecast curve of the checking data set.

Fig. 11. Real wind speed (solid line) and predicted wind speed (dashed line) in meters per second (m/s), for a typical speed forecast curve of the checking data set, along with the respective predictions for three surrounding nodes.

From the results shown in Table III, it can be concluded that the forecast models trained by the simplified local algorithm, the DRPE, exhibit slightly inferior performance compared to the ones obtained by the GRPE (Table II). Notice, however, that as revealed by the complexity analysis, the DRPE has considerably smaller requirements than the GRPE in terms of computational cost and storage needs. Nevertheless, the overall picture remains the same; that is, the recurrent models outperform the static forecast models, with the best results achieved by the IIR-MLP.

VIII. CONCLUSION

Three local recurrent neural networks are employed in this paper, providing 72 time-step-ahead forecasts of the wind speed and power at the Rokas’ wind park on the Greek island of Crete. Forecasting is based on meteorological data given at four nodes nearby the park site. Two novel learning algorithms are introduced for the training of the recurrent forecast models, the GRPE and the DRPE, which have considerably smaller computational and storage requirements. Extensive experimentation and model comparison reveal the effectiveness of the suggested learning methods. Moreover, it is shown that the recurrent forecast models outperform their static rivals in terms of forecast errors and the improvement gained over the persistent method.

APPENDIX

In view of the multilayer structure of the IIR-MLP, we can distinguish three distinct cases, as described below.

Case 1: Gradients of the neuron’s output with respect to its synaptic weights.

First, from (2) and (3), we have

(A.1)

Applying the ordered derivatives forward chain rule with respect to the MA coefficients, we finally get

(A.2)

and, following an adaptive filter notation,

(A.3)

where the notation distinguishes between an ordinary and an ordered derivative. Similarly, differentiation with respect to the AR coefficients leads to the following relations:

(A.4)

(A.5)
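To make the role of the MA and AR parts of the synaptic filters concrete, here is a sketch of a single IIR synapse's forward step; the orders, names, and sign conventions are illustrative and not the paper's exact formulation:

```python
from collections import deque

class IIRSynapse:
    """y(n) = sum_k b[k]*x(n-k) + sum_k a[k]*y(n-k):
    an MA part on past inputs plus an AR part on past outputs."""

    def __init__(self, b, a):
        self.b, self.a = list(b), list(a)
        self.x_hist = deque([0.0] * len(b), maxlen=len(b))  # x(n), x(n-1), ...
        self.y_hist = deque([0.0] * len(a), maxlen=len(a))  # y(n-1), y(n-2), ...

    def step(self, x):
        self.x_hist.appendleft(x)
        y = sum(bk * xk for bk, xk in zip(self.b, self.x_hist))   # MA part
        y += sum(ak * yk for ak, yk in zip(self.a, self.y_hist))  # AR part
        self.y_hist.appendleft(y)
        return y

# a pure-MA synapse (a = [0]) reduces to an FIR filter
syn = IIRSynapse(b=[0.5, 0.25], a=[0.0])
out = [syn.step(x) for x in [1.0, 0.0, 0.0]]
# out: [0.5, 0.25, 0.0]
```

The internal output history is what lets the recurrent models carry information forward in time even though only the current node predictions are supplied at each hour.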
Case 2: Gradients of a neuron’s output in a given layer with respect to synaptic weights of the immediately preceding layer.

Using the network’s architecture and the forward chain formula, we get

(A.6)

(A.7)

The required gradients are derived through

(A.8)

Case 3: Gradients of a neuron’s output in a given layer with respect to synaptic weights of a nonadjacent earlier layer.

Based on the network’s structure, we have

(A.9)

The above equation can be rewritten as

(A.10)

The required derivatives for the neuron outputs are

(A.11)

where

(A.12)

ACKNOWLEDGMENT

SKIRON is a weather forecasting system initially developed for the Hellenic Meteorological Service based on the Eta/NCEP model. The meteorological numerical predictions are now produced on a daily basis by the Atmospheric Modeling and Weather Forecasting Group (AM&WFG) of the University of Athens, Athens, Greece [24]. The authors especially wish to thank Prof. G. Kallos and Dr. P. Katsafados for their collaboration. Actual data from the Rokas W/F at Crete were provided by the Public Power Corporation of Greece. All data were gathered and given to the authors under the frame of the MORE-CARE project supported by the European Commission.

REFERENCES

[1] E. A. Bossanyi, “Short-term wind prediction using Kalman filters,” Wind Eng., vol. 9, no. 1, pp. 1–8, 1985.
[2] J. O. G. Tande and L. Landberg, “A 10 sec. forecast of wind turbine output with neural networks,” in Proc. 4th European Wind Energy Conf. (EWEC’93), Lübeck-Travemünde, Germany, 1993, pp. 747–777.
[3] G. C. Contaxis and J. Kabouris, “Short-term scheduling in wind/diesel autonomous system,” IEEE Trans. Power Syst., vol. 6, no. 3, pp. 1161–1167, Aug. 1991.
[4] G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret, “Wind power forecasting using advanced neural networks models,” IEEE Trans. Energy Convers., vol. 11, no. 4, pp. 762–767, Dec. 1996.
[5] A. More and M. C. Deo, “Forecasting wind with neural networks,” Marine Structures, vol. 16, pp. 35–49, 2003.
[6] L. Landberg and S. J. Watson, “Short-term prediction of local wind conditions,” Boundary-Layer Meteorol., vol. 70, p. 171, 1994.
[7] L. Landberg, A. Joensen, G. Giebel, H. Madsen, and T. S. Nielsen, “Short-term prediction toward the 21st century,” in Proc. British Wind Energy Association, vol. 21, Cambridge, U.K., 1999.
[8] L. Landberg and A. Joensen, “A model to predict the output from wind farms—An update,” in Proc. British Wind Energy Association, vol. 20, Cardiff, Wales, U.K., 1998.
[9] S. J. Watson, G. Giebel, and A. Joensen, “The economic value of accurate wind power forecasting to utilities,” in Proc. EWEC99, Nice, France, 1999, pp. 1109–1012.
[10] T. S. Nielsen and H. Madsen, “Experiences with statistical methods for wind power prediction,” in Proc. EWEC99, Nice, France, 1999, pp. 1066–1069.
[11] E. Akylas, “Investigation of the effects of wind speed forecasts and economic evaluation of the increased penetration of wind energy for the island of Crete,” in Proc. EWEC99, Nice, France, 1999, pp. 1074–1077.
[12] G. Kariniotakis, D. Mayer, J. A. Halliday, A. G. Dutton, A. D. Irving, R. A. Brownsword, P. S. Dokopoulos, and M. C. Alexiadis, “Load, wind, and hydro power forecasting functions of the more-care EMS system,” in Proc. Med Power 2002, Athens, Greece, Nov. 2002.
[13] A. C. Tsoi and A. D. Back, “Locally recurrent globally feedforward networks: A critical review of architectures,” IEEE Trans. Neural Netw., vol. 5, no. 3, pp. 229–239, May 1994.
[14] E. A. Wan, “Temporal backpropagation for FIR neural networks,” in Proc. Int. Joint Conf. Neural Networks, vol. 1, 1990, pp. 575–580.
[15] S. Haykin, Neural Networks: A Comprehensive Foundation. New York: IEEE Press, 1994.
[16] P. Frasconi, M. Gori, and G. Soda, “Local feedback multilayered networks,” Neural Comput., vol. 4, pp. 120–130, 1992.
[17] C.-C. Ku and K. Y. Lee, “Diagonal recurrent neural networks for dynamic systems control,” IEEE Trans. Neural Netw., vol. 6, no. 1, pp. 144–156, Jan. 1995.
[18] P. J. Werbos, “Beyond regression: New tools for prediction and analysis in the behavioral sciences,” Ph.D. dissertation, Committee on Appl. Math., Harvard Univ., Cambridge, MA, 1974.
[19] E. A. Wan and F. Beaufays, “Diagrammatic derivation of gradient algorithms for neural networks,” Neural Comput., vol. 8, pp. 182–201, 1996.
[20] R. J. Williams and J. Peng, “An efficient gradient-based algorithm for on-line training of recurrent network structures,” Neural Comput., vol. 2, pp. 490–501, 1990.
[21] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification. Cambridge, MA: MIT Press, 1983.
[22] S. Shah, F. Palmieri, and M. Datum, “Optimal filtering algorithms for fast learning in feedforward neural networks,” Neural Netw., vol. 5, pp. 779–787, 1992.
[23] G. V. Puskorius and L. A. Feldkamp, “Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks,” IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 279–297, Mar. 1994.
[24] G. Kallos, S. Nickovic, A. Papadopoulos, and P. Katsafados, “The SKIRON forecasting system and its capability to predict extreme weather events in the Mediterranean,” in Proc. 7th Int. Symp. Natural and Man-Made Hazards (HAZARDS-98), Chania, Greece, May 1998.
Thanasis G. Barbounis was born in Lamia, Greece, in 1977. He graduated in electrical engineering in 1999 from the Aristotle University of Thessaloniki, Thessaloniki, Greece, where he is currently pursuing the Ph.D. degree. His research interests include artificial neural networks, fuzzy logic systems, and modeling of nonlinear systems.

John B. Theocharis (M’90) graduated in electrical engineering in 1980 and received the Ph.D. degree in 1985, both from the Aristotle University of Thessaloniki, Thessaloniki, Greece. He is currently an Associate Professor in the Department of Electronic and Computer Engineering, Aristotle University of Thessaloniki. His research activities include fuzzy systems, neural networks, adaptive control, and modeling of complex nonlinear systems.

Minas C. Alexiadis was born in Thessaloniki, Greece, in July 1969. He received the Dipl. Eng. degree in 1994 and the Ph.D. degree in 2003, both from the Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece. His research interests include renewable energy sources and artificial intelligence applications in power systems.

Petros S. Dokopoulos (M’77) was born in Athens, Greece, in September 1939. He received the Dipl. Eng. degree from the Technical University of Athens, Athens, Greece, in 1962 and the Ph.D. degree from the University of Brunswick, Brunswick, Germany, in 1967. He was with the Laboratory for High Voltage and Transmission, University of Brunswick (1962–1967), the Nuclear Research Center at Jülich, Jülich, Germany (1967–1974), and the Joint European Torus (1974–1978). Since 1978, he has been a Full Professor at the Department of Electrical Engineering, Aristotle University of Thessaloniki. He has worked as a consultant to Brown Boveri and Cie, Mannheim, Germany, to Siemens, Erlangen, Germany, to the Public Power Corporation, Greece, and to the National Telecommunication Organization, Greece. His scientific fields of interest are dielectrics, power switches, generators, power cables, alternative energy sources, transmission and distribution, and fusion.