Long-Term Wind Speed and Power Forecasting Using Local Recurrent Neural Network Models
Abstract—This paper deals with the problem of long-term wind speed and power forecasting based on meteorological information. Hourly forecasts up to 72-h ahead are produced for a wind park on the Greek island of Crete. As inputs, our models use the numerical forecasts of wind speed and direction provided by the atmospheric modeling system SKIRON for four nearby positions up to 30 km away from the wind turbine cluster. Three types of local recurrent neural networks are employed as forecasting models, namely, the infinite impulse response multilayer perceptron (IIR-MLP), the local activation feedback multilayer network (LAF-MLN), and the diagonal recurrent neural network (DRNN). These networks contain internal feedback paths, with the neuron connections implemented by means of IIR synaptic filters. Two novel and optimal on-line learning schemes are suggested for the update of the recurrent networks' weights, based on the recursive prediction error algorithm. The methods assure continuous stability of the network during the learning phase and exhibit improved performance compared to conventional dynamic back propagation. Extensive experimentation is carried out in which the three recurrent networks are additionally compared to two static models, a finite impulse response NN (FIR-NN) and a conventional static MLP network. Simulation results demonstrate that the recurrent models, trained by the suggested methods, outperform the static ones, while they exhibit significant improvement over the persistent method.

Index Terms—Local recurrent neural networks, long-term wind power forecasting, nonlinear recursive least squares learning, real-time learning.

I. INTRODUCTION

[…] scales in the order of several minutes or even hours are encountered when power system scheduling is to be addressed [3]. In these cases, the time series approach is usually followed, as in [4], where a recurrent high-order neural network (NN) is used for short-term prediction of wind power. Furthermore, similar models have been applied to daily, weekly, or even monthly time series [5].

Long-term prediction of wind power allows planning the connection or disconnection of wind turbines or conventional generators, thus achieving low spinning reserve and optimal operating cost. It refers to hourly data and a time horizon of up to two to three days ahead. In such cases, the statistical properties of the wind are not helpful, and hence we have to rely on approximate wind forecasts provided by the national meteorological services. These predictions are calculated for some predefined reference points, not necessarily at the position of the park, so the need arises to reduce these predictions to the site of interest.

In the past, considerable efforts have been devoted to utilizing meteorological information to derive the required forecasts. Micro- and meso-scale models (such as WASP or PARK) are actually deterministic models that suggest various correcting parameters according to the terrain properties (orography, roughness). They also take into consideration the number, type, and location of the wind turbines in the wind farm, as well as the position, hub height, and power curve of each one, to produce the total […]
Fig. 1. Geographical location of the wind park of Rokas, on the Greek island of Crete. Also shown are the nodes where meteorological predictions are available and the prevailing wind direction.

Fig. 2. Configuration of the forecasting approach. The node predictions and the wind forecasts produced by the models are given at the beginning of each day. They cover a time period of 72-h ahead and are updated every 24 h.
network (LAF-MLN), and the diagonal recurrent neural network (DRNN) model. Two novel and efficient learning schemes are developed, a global and a decoupled approach of the recursive prediction error (RPE) algorithm, for updating the network's weights, suitable for on-line applications. The experimental results show that accurate forecasts are obtained, with the recurrent forecast models exhibiting superior performance with regard to the static networks.

II. PROBLEM FORMULATION

Let us consider the Rokas W/F with a capacity of 10.2 MW located at Eastern Crete, Greece. For efficient maintenance and resource planning of WECS, it is very helpful to know in advance the wind speed and power at the park for a time horizon of a few days ahead. This allows an optimal policy to be designed, using the optimum number of the available wind turbines and scheduling the possible need for storing or supplementing the generated power.

Due to the large time scale, the main source of information is the meteorological wind predictions, near-surface "node predictions", calculated for four specific positions (N, S, E, W) located around the wind park (see Fig. 1). The node predictions are given once per day; for simplicity, assume they are available at the beginning of each day $d$, $d = 1, \ldots, D$, where $D$ is the number of days considered in the data set. For each node $i \in \{N, S, E, W\}$ and day $d$, the meteorological data are formulated as records comprising predictions of the wind speed and direction for the succeeding 72 h

$$R_i(d) = \big\{ \hat{ws}_i(t),\ \hat{wd}_i(t) \big\}, \quad t = t_d + 1, \ldots, t_d + 72 \qquad (1)$$

where $t_d = 24(d-1)$ is the last hour before day $d$, and $\hat{ws}_i(t)$ and $\hat{wd}_i(t)$ denote the meteorological node predictions of wind speed and direction, respectively, at time $t$. Also available are the predictions of atmospheric pressure and temperature at the park site.

The real data provided by the W/F SCADA system include values of the wind speed measured at a reference point within the park and the total power of the farm, denoted as $ws(t)$ and $p(t)$, respectively. The data are recorded hourly from April 1st, 2000 until December 31st, 2000.

Given the node predictions, the objective is to develop an advanced forecast model providing 72-h-ahead wind speed and power forecasts at the park, denoted as $\hat{ws}(t)$ and $\hat{p}(t)$, respectively. The configuration of the forecasting approach is depicted in Fig. 2. The wind estimates are given at the beginning of each day, and the model is generally described by

$$\hat{y}(t) = f\big(\mathbf{u}(t),\, y(t-1)\big) \qquad (2)$$

where $\hat{y}(t)$ stands for either $\hat{ws}(t)$ or $\hat{p}(t)$, $f(\cdot)$ represents the nonlinear mapping function of the process, and the input vector $\mathbf{u}(t)$ is given as

$$\mathbf{u}(t) = \big[\, \hat{ws}_{i_1}(t),\ \hat{wd}_{i_1}(t),\ \ldots,\ \hat{ws}_{i_k}(t),\ \hat{wd}_{i_k}(t) \,\big]^{T} \qquad (3)$$

where $i_1, \ldots, i_k$ are the selected prediction nodes.

Apparently, there are both spatial and temporal correlations between the node predictions and the wind variables at the park to be forecasted, rendering the system a highly complex, dynamic, and nonstationary process. The wind values are affected by large-scale atmospheric conditions and the morphology of the surface landscape.

Efficient forecasting dictates that the model should exhibit the following properties. First, for each $t$, the current and past values of the node predictions should be considered as model inputs, as suggested by (3), so that the model can properly identify the input trends and variations. Moreover, in the absence of real wind speed and power values for times greater than $t_d$, the model's previous estimates, $\hat{y}(t-1)$, should be used
BARBOUNIS et al.: LONG-TERM WIND SPEED AND POWER FORECASTING USING LOCAL RNN MODELS 275
$$\hat{y}(t) = f\big(\mathbf{u}(t),\, \hat{y}(t-1)\big), \quad t = t_d + 1, \ldots, t_d + 72 \qquad (4)$$

where the previous estimate $\hat{y}(t-1)$ replaces the unavailable measurement.
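The recursive multistep use of the model's own estimates, as in (4), can be sketched as follows. This is a schematic illustration only, not the paper's actual networks; the toy linear model and the function names are ours:

```python
import numpy as np

def multistep_forecast(model, node_preds, horizon=72, y0=0.0):
    """Produce a 72-h-ahead forecast recursively: beyond the forecast
    origin no measurements exist, so each step feeds the model's own
    previous estimate back as an input."""
    y_prev = y0                  # last measured value at the forecast origin
    forecasts = []
    for t in range(horizon):
        y_hat = model(node_preds[t], y_prev)  # y_hat(t) = f(u(t), y_hat(t-1))
        forecasts.append(y_hat)
        y_prev = y_hat           # feed the estimate back, not a measurement
    return np.array(forecasts)

# toy stand-in for f: a damped blend of node prediction and fed-back estimate
toy_model = lambda u, y_prev: 0.7 * u + 0.3 * y_prev
print(multistep_forecast(toy_model, np.full(72, 10.0), y0=8.0)[:3])
```

Note that forecast errors compound along the horizon through the fed-back estimates, which is why the error curves in the results grow with the look-ahead time.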
[…] in a compact way, using a notation employed in adaptive filter theory, as follows:

$$y_{nk}^{l}(t) = \frac{B_{nk}^{l}(q^{-1})}{A_{nk}^{l}(q^{-1})}\; x_k^{l-1}(t) \qquad (8)$$

where

$$B_{nk}^{l}(q^{-1}) = \sum_{i=0}^{L_B} b_{nk,i}^{l}\, q^{-i} \qquad (9)$$

$$A_{nk}^{l}(q^{-1}) = 1 - \sum_{j=1}^{L_A} a_{nk,j}^{l}\, q^{-j} \qquad (10)$$

and $q^{-1}$ is the delay operator, $q^{-1}x(t) = x(t-1)$. Here, $y_{nk}^{l}(t)$ is the output of the IIR synaptic filter connecting neuron $k$ of layer $l-1$ to neuron $n$ of layer $l$, $x_k^{l-1}(t)$ is the output of neuron $k$ of layer $l-1$, and $B_{nk}^{l}$ and $A_{nk}^{l}$ are the moving average (MA) and autoregressive (AR) parts of the synapse, respectively. It can be seen that the IIR-MLP architecture realizes a sufficiently rich class of models. It includes several known neural types as special cases, depending on the parameter settings of the IIR synaptic filters. For instance, the IIR-MLP can be reduced to a FIR-NN [14] by eliminating the feedback connections ($a_{nk,j}^{l} = 0$). Moreover, the IIR-MLP reduces to the conventional static MLP [15] by discarding both the MA and the AR parts of the synaptic filters.

B. The LAF-MLN Model

The neuron model of the LAF-MLN structure [16] is shown in Fig. 3(b). The output of a neuron summing-node is filtered through an autoregressive (AR) adaptive filter before feeding the activation function. It should be noticed that, regarding structural complexity, the LAF-MLN is considerably simpler than the IIR-MLP. This is due to the fact that, while an AR part is introduced in every synaptic link of the IIR-MLP, a single AR recursion exists for each neuron in the LAF-MLN architecture. As a result, for the same network structure, the latter network contains a smaller number of tunable weights than the former.

The forward run equations for the LAF-MLN are described as follows:

$$s_n^{l}(t) = \sum_{k} B_{nk}^{l}(q^{-1})\, x_k^{l-1}(t) \qquad (11)$$

$$z_n^{l}(t) = s_n^{l}(t) + \sum_{j=1}^{L_A} a_{n,j}^{l}\, z_n^{l}(t-j) \qquad (12)$$

$$x_n^{l}(t) = \varphi\big(z_n^{l}(t)\big) \qquad (13)$$

where $s_n^{l}(t)$ is the summing-node output, $z_n^{l}(t)$ the AR-filtered activation, and $\varphi(\cdot)$ the activation function.

C. The DRNN Model

In order to further reduce the structural complexity, we consider the DRNN, a modified form of the fully recurrent model [17]. The DRNN is a two-layer network [Fig. 3(c)], where the hidden layer contains self-recurrent neurons while the output layer is composed of linear neurons. The hidden layer equations are

$$s_n(t) = \sum_{k} w_{nk}\, u_k(t) + w_n^{d}\, x_n(t-1) \qquad (14)$$

$$x_n(t) = \varphi\big(s_n(t)\big) \qquad (15)$$

where $u_k(t)$ are the network inputs, while the network's output is determined by

$$\hat{y}(t) = \sum_{n} w_n^{o}\, x_n(t). \qquad (16)$$

Notice that the DRNN can be derived as a special case of the LAF-MLN for suitably restricted filter orders. Particularly, we consider only the constant terms of the MA filters for all neurons. Moreover, the AR parts are reduced to first-order filters for the hidden layer's neurons, while feedback is broken for the neurons in the output layer.

IV. GRADIENT CALCULATIONS

The learning algorithm to be developed in the next section requires the computation of the gradients of the network's output with respect to all trainable weights. Because of the existence of internal dynamics, the traditional procedure of standard back propagation (BP) cannot be applied to determine these gradients. Therefore, we employ the method of ordered derivatives [18], extensively used in the literature for calculating partial derivatives in complex recurrent networks. Notice that, since the recursive approach is adopted, the chain-rule derivative expansion is developed in a forward fashion.

The gradients relate differential changes of the neuron outputs to differential changes of a weight within the network. Consider an arbitrary weight $w$ denoting either $b_{nk,i}^{l}$ or $a_{nk,j}^{l}$ of a synapse ending at the $n$th neuron of the $l$th layer. Derivation of the gradient calculations for the IIR-MLP model is given in the Appendix. The respective calculations for the LAF-MLN and the DRNN proceed along similar lines and are therefore omitted.

Notice that the calculation of the gradients is achieved through higher-order recurrent difference equations. This is opposed to static network structures, where the weights are updated using static relations [15]. As regards the IIR-MLP model, and based on the gradient analysis described in the Appendix, the following comments are in order. The gradients of the synaptic filter output with regard to the filter weights $b_{nk,i}^{l}$ and $a_{nk,j}^{l}$ are derived by passing $x_k^{l-1}(t-i)$ and $y_{nk}^{l}(t-j)$, respectively, through the AR part of the synaptic filter. Furthermore, at each $t$, the gradients of the synaptic filter outputs belonging to the succeeding layer $l+1$ with respect to $w$ ($b_{nk,i}^{l}$ or $a_{nk,j}^{l}$), where the weight change is assumed, are calculated by passing the corresponding gradients of the $n$th neuron's output at the $l$th layer through the filter $B_{pn}^{l+1}(q^{-1})/A_{pn}^{l+1}(q^{-1})$. Hence, the gradient dynamics is identical to the forward one, describing the dynamics of $x_n^{l}(t)$ through the synapse to produce $y_{pn}^{l+1}(t)$. Finally, the gradients of the filter outputs of the $m$th layer, with $m > l+1$, are derived in terms of the gradients of the neuron outputs of the preceding layer, the $(m-1)$th, with respect to $w$.

Following the above observations, the "gradient amidst dynamics" with respect to an arbitrary weight $w$ is described in terms of an auxiliary network, $SN_w$, called the sensitivity network [19]. For each $w$, the associated $SN_w$ is formed as a sub-network of the original one, starting with the $n$th node of the $l$th layer and proceeding ahead until the output layer.
The sensitivity network is fed by the gradients of the synaptic filter outputs with respect to $w$, while its output is the gradient of the network outputs with regard to $w$.

Notice that the gradient calculations are carried out, through the sensitivity networks, in parallel with the forward network, following similar dynamics. That is, the gradients are calculated on an on-line basis, as the original network runs in time. Hence, the use of the sensitivity networks provides a transparent tool for the evolution of the network gradients. It is assumed that during on-line training the weights change smoothly, that is, they remain almost constant. In that case, the weight gradients of higher order can be obtained as delayed versions of the gradients with respect to the current weight estimates.

V. LEARNING ALGORITHM

Having determined the network gradients, we proceed to develop the learning algorithms for the on-line update of the tunable weights. The most common algorithm used for on-line training of recurrent networks is real-time recurrent learning (RTRL) [20], where the weight update is performed along the negative gradient direction. Nevertheless, this method exhibits slow convergence rates because of the small learning rates required, and most often becomes trapped in local minima. In an attempt to improve the performance, we employ an optimal learning scheme, the recursive prediction error (RPE) identification algorithm, with enhanced training qualities. Owing to the second-order information used in the update recursions, better-conditioned training is accomplished compared to the RTRL method.

Along this direction, two novel algorithms are suggested for training the local feedback recurrent networks considered in this paper, where stability is assured throughout the learning process. First, a global scheme is developed, called the global RPE (GRPE), where all weights are simultaneously updated. Additionally, to cope with the increased computational complexity of the GRPE, we devised a local version of the algorithm, called the decoupled RPE (DRPE). The DRPE is derived by partitioning the global optimization problem into a set of manageable sub-problems, at the neuron level. Thus, considerable computational and storage savings are gained while preserving high accuracy qualities, similar to those of the GRPE.

A. The GRPE Algorithm

Let $\theta$ denote an $n$-dimensional composite vector including all synaptic weights of the recurrent network under consideration. We consider a nonlinear recurrent predictor of the type

$$\hat{y}(t) = g\big(\theta,\, u(t),\, x(t)\big) \qquad (17)$$

where $g(\cdot)$ describes the structure of the network, $\hat{y}(t)$, an $m \times 1$ vector, comprises the network's outputs, $u(t)$ is a vector including the network's inputs, and $x(t)$ represents the internal states describing the network's dynamics. The real process to be modeled by the network, denoted as $y(t)$, is obtained by

$$y(t) = \hat{y}(t \mid \theta) + e(t) \qquad (18)$$

where $e(t)$ is the prediction error for a particular value of $\theta$. According to the GRPE method, all network weights are continuously determined at each $t$ using the following recursion:

$$\epsilon(t) = y(t) - \hat{y}(t) \qquad (19)$$

$$S(t) = \Psi^{T}(t)\, P(t-1)\, \Psi(t) + \lambda(t)\, I \qquad (20)$$

$$K(t) = P(t-1)\, \Psi(t)\, S^{-1}(t) \qquad (21)$$

$$\hat{\theta}(t) = \Big[\, \hat{\theta}(t-1) + K(t)\, \epsilon(t) \,\Big]_{\mathcal{D}} \qquad (22)$$

$$P(t) = \big[\, P(t-1) - K(t)\, \Psi^{T}(t)\, P(t-1) \,\big] / \lambda(t) \qquad (23)$$

where $\lambda(t)$ is the forgetting factor, $0 < \lambda(t) \le 1$. The GRPE algorithm provides a recursive way of minimizing a quadratic criterion of the prediction errors using the stochastic Gauss–Newton search method [21]. The algorithm is identical to a nonlinear recursive least squares (RLS) [22] method that minimizes the error criterion

$$V(\theta) = \frac{1}{2} \sum_{k=1}^{t} \lambda^{t-k}\, \epsilon^{T}(k)\, \epsilon(k). \qquad (24)$$

Furthermore, the recursion (19)–(23) has strong similarities to the extended Kalman filter (EKF) algorithm used in [23]. $K(t)$ is the gain matrix controlling the weight update and $P(t)$ is the error covariance matrix; together they define the search changes along the Gauss–Newton direction. Assuming that no prior information is available, $P(0)$ is usually taken as $P(0) = \rho I$, where $\rho$ is an arbitrarily large number.

A key issue for the performance of the algorithm is the matrix $\Psi(t)$, defined as follows:

$$\Psi(t) = \frac{d\hat{y}(t)}{d\theta}\bigg|_{\theta = \hat{\theta}(t-1)} \qquad (25)$$

where $\Psi(t)$ is an $n \times m$ matrix containing the partial derivatives of the predictor's model, that is, the network's outputs, with respect to the trainable weights

$$\Psi(t) = \begin{bmatrix} \dfrac{\partial \hat{y}_1(t)}{\partial \theta_1} & \cdots & \dfrac{\partial \hat{y}_m(t)}{\partial \theta_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial \hat{y}_1(t)}{\partial \theta_n} & \cdots & \dfrac{\partial \hat{y}_m(t)}{\partial \theta_n} \end{bmatrix}. \qquad (26)$$

These gradients are computed using the sensitivity networks, as described in Section IV.

The operator $[\cdot]_{\mathcal{D}}$ in (22) implements a projection mechanism into the stability region $\mathcal{D}$. As discussed in Section IV, the gradient dynamics are identical to the forward dynamics of the recurrent network. Hence, learning stability also guarantees stable operation of the network run. The necessary and sufficient condition for the gradients to tend to zero is that the AR parts of the synaptic IIR filters should be stable. This dictates that the roots of the AR polynomials in the IIR-MLP and LAF-MLN should lie within the unit circle, which determines the stability region of the algorithm. Particularly, for the DRNN model, stability suggests that the self-feedback weights should lie within the region $(-1, 1)$. Hence, for stable training of the recurrent networks, the GRPE algorithm has to be supplied with stability monitoring and a projection tool. In this paper, we follow a simple approach, where […]
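For a scalar-output predictor, one step of a recursion of the (19)–(23) type can be sketched as below. This is a generic Gauss–Newton RPE/RLS update under a forgetting factor, not the paper's exact implementation: the gradient vector psi would come from the sensitivity networks, and the stability projection step is omitted here.

```python
import numpy as np

def rpe_step(theta, P, psi, y, y_hat, lam=0.99):
    """One recursive-prediction-error (RLS-style) update.
    theta: (n,) weight vector, P: (n,n) error covariance matrix,
    psi: (n,) gradient of y_hat w.r.t. theta, lam: forgetting factor."""
    eps = y - y_hat                       # prediction error, as in (19)
    S = psi @ P @ psi + lam               # scalar innovation variance
    K = P @ psi / S                       # gain vector
    theta = theta + K * eps               # Gauss-Newton weight update
    P = (P - np.outer(K, psi @ P)) / lam  # covariance update
    return theta, P

# identify a linear model y = w.x with streaming data (true w = [2, -1]);
# for a linear model, psi is simply the input vector x
rng = np.random.default_rng(0)
theta, P = np.zeros(2), 1e4 * np.eye(2)   # large P(0): no prior information
for _ in range(200):
    x = rng.normal(size=2)
    y = np.array([2.0, -1.0]) @ x
    theta, P = rpe_step(theta, P, psi=x, y=y, y_hat=theta @ x)
print(np.round(theta, 3))
```

With noiseless data the recursion recovers the true weights almost exactly; with a recurrent network, psi must itself be generated by the gradient dynamics of Section IV.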
278 IEEE TRANSACTIONS ON ENERGY CONVERSION, VOL. 21, NO. 1, MARCH 2006
TABLE I
AVERAGE “NODE PREDICTIONS” AND CROSS-CORRELATIONS OF THEM WITH
THE REAL VALUES OF WIND SPEED AND POWER AT THE ROKAS’ PARK
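The input selection below is driven by cross-correlations of the kind reported in Table I: nodes whose predictions correlate most strongly with the measured series at the park are preferred as inputs. A small sketch with synthetic data; the node names and noise levels are purely illustrative:

```python
import numpy as np

def rank_nodes_by_correlation(node_preds, target):
    """Rank prediction nodes by the cross-correlation of their wind
    speed predictions with the measured series at the park."""
    corr = {name: np.corrcoef(series, target)[0, 1]
            for name, series in node_preds.items()}
    return sorted(corr, key=corr.get, reverse=True), corr

rng = np.random.default_rng(2)
target = rng.normal(size=500)            # stand-in for measured wind speed
node_preds = {
    "W": target + 0.3 * rng.normal(size=500),  # strongly correlated node
    "N": target + 1.0 * rng.normal(size=500),
    "E": target + 2.0 * rng.normal(size=500),
    "S": rng.normal(size=500),                 # essentially uncorrelated
}
order, corr = rank_nodes_by_correlation(node_preds, target)
print(order)  # most to least correlated node
```

With these noise levels the ranking comes out W, N, E, S, mirroring the way the one-, two-, and three-node input scenarios below pick the most correlated nodes first.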
TABLE II. MAE AND RMSE FOR WIND POWER AND SPEED FORECASTS OBTAINED BY THREE MODEL TYPES INCLUDING THREE (N, W, E), TWO (N, W), AND ONE (W) INPUT NODES, AND TRAINED BY THE GRPE ALGORITHM

[…] network type, two separate forecast models are developed, providing at the beginning of each day 72-h-ahead forecasts of the wind speed and power at the park. On-line training is carried out on the data batches for 400 epochs.

IIR-MLP networks with two hidden layers and seven neurons per layer are selected. Both MA and AR parts of order 3 are considered for the IIR synaptic filters in the hidden neurons. Especially for the output neuron, an AR part of order 5 is chosen. This allows the model to learn more efficiently the dependence of the current output on its past. Structures with a similar number of parameters are chosen for the other two recurrent networks. Particularly, a network with two hidden layers and eight neurons per layer is considered for the LAF-MLN. The order of the MA parts is set to 4, while the AR parts are the same as in the case of the IIR-MLP models. Finally, a DRNN model with one hidden layer composed of 32 self-recurrent neurons is selected.

In order to validate our input selection, we considered three scenarios (models), in which only the one (W), two (N, W), or three (N, W, E) most correlated nodes are used as inputs to the networks. Hence, the networks have three, five, and seven inputs, respectively: the selected nodes' predictions of the speed and direction, and the input index.

TABLE III. MAE AND RMSE FOR WIND POWER AND SPEED FORECASTS OBTAINED BY THREE MODEL TYPES INCLUDING THREE (N, W, E), TWO (N, W), AND ONE (W) INPUT NODES, AND TRAINED BY THE DRPE ALGORITHM

The network weights are initially selected at random, while the weights involved in the AR parts of the synaptic filters are initialized so that the roots of the resulting polynomials lie inside the unit circle, as required for stable operation of the learning algorithm. As the activation function, the hyperbolic tangent, $\varphi(x) = \tanh(x)$, is used in our experiments. In order to avoid excessive errors during the training stage caused by bad initial estimates of the weights, the forgetting factor was set initially to a small value. As learning proceeds, it is gradually increased to unity, following the formula $\lambda(t) = \lambda_0\, \lambda(t-1) + (1 - \lambda_0)$, where $\lambda_0$ is set to 0.8. The correlation matrix is initialized as $P(0) = \rho I$, where $\rho$ is an arbitrarily large constant and $I$ is the identity matrix with size equal to that of the weight vector.

Finally, during learning, stability monitoring of the RPE algorithms is continuously performed. Whenever the current estimates lie outside the stability region, the projection mechanism is activated. Accordingly, the correction term is successively multiplied by a factor of 0.2 until the revised weight estimates fulfill the stability conditions for each neuron and network type.

In order to justify the benefit gained by using local recurrent NNs, apart from the IIR-MLP, the LAF-MLN, and the DRNN, two additional static models are examined for comparison, namely, a static MLP and a FIR neural network, where the connection weights are realized by linear FIR filters [14]. The FIR-NNs are functionally equivalent to static MLPs, although with better modeling capabilities since, due to the FIR synapses, past values of the inputs are also taken into consideration. The networks again have three, five, or seven inputs, depending on the node predictions being used.

To establish a fair comparison basis, the structure of the competing networks is selected so that they contain approximately the same number of parameters as the recurrent networks. Particularly, a static MLP is chosen with two hidden layers and 20 neurons per layer, and a FIR-NN with two hidden layers and linear FIR filters of sixth order. Furthermore, after the necessary modifications to the computation of the network gradients, all forecast models are trained by means of the suggested GRPE and DRPE algorithms. Under these conditions, the models are evaluated in terms of representation power and their capabilities to produce efficient multistage forecasts. Additionally, a case is considered where the static MLP is trained using the conventional BP algorithm.

Based on the data available for the Rokas park, for each input case (3, 2, 1 input nodes) and network type, an exhaustive set of experiments is carried out. To evaluate the performance of the forecast models, the mean absolute error (MAE) and the root mean square error (RMSE) are used. The best results achieved for each case are cited in Tables II and III, where the models are trained by the GRPE and the DRPE, respectively.
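The stability monitoring described above can be sketched as follows. The root test on the AR polynomial and the repeated shrinking of the correction term by 0.2 follow the text; the helper names are ours:

```python
import numpy as np

def ar_stable(a):
    """True if the AR part 1 - a1 q^-1 - ... - ap q^-p is stable,
    i.e. all roots of z^p - a1 z^(p-1) - ... - ap lie inside the unit circle."""
    poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return bool(np.all(np.abs(np.roots(poly)) < 1.0))

def projected_update(a, correction, shrink=0.2, max_tries=20):
    """Apply a weight correction to the AR coefficients; if the new
    estimate leaves the stability region, successively multiply the
    correction by `shrink` until stability is restored."""
    correction = np.asarray(correction, dtype=float)
    for _ in range(max_tries):
        candidate = np.asarray(a, dtype=float) + correction
        if ar_stable(candidate):
            return candidate
        correction = shrink * correction
    return np.asarray(a, dtype=float)  # give up: keep the old stable weights

print(ar_stable([0.5]))                # True: single root at 0.5
print(projected_update([0.5], [1.0]))  # full step unstable, shrunk step kept
```

In the first-order example, the full correction would put the root at 1.5 (unstable); one shrink by 0.2 lands it at 0.7, inside the unit circle.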
Fig. 6. Power forecast errors (MAE) for a horizon of 72 h obtained by the IIR-MLP, LAF-MLN, DRNN, and the static MLP (trained by BP) with three input nodes, along with the errors of the persistent method.

Fig. 7. Percentage improvement over the persistent method of the power forecasts achieved by IIR-MLP, LAF-MLN, DRNN, and the static MLP (trained by BP) with three input nodes.

Fig. 9. Percentage improvement over the persistent method of the speed forecasts achieved by the IIR-MLP, the LAF-MLN, the DRNN, and the static MLP (trained by BP) with three input nodes.

Fig. 11. Real wind speed (solid line) and predicted wind speed (dashed line) in meters per second (m/s), for a typical speed forecast curve of the checking data set, along with the respective predictions for three surrounding nodes.
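The persistent method used as baseline in these comparisons forecasts every future hour with the last measured value; the percentage improvement of a model over it is then computed from the respective errors. A minimal sketch, with illustrative array values:

```python
import numpy as np

def persistent_forecast(last_measured, horizon=72):
    """Naive baseline: hold the last measured value over the whole horizon."""
    return np.full(horizon, last_measured, dtype=float)

def improvement_over_persistence(y_true, y_model, last_measured):
    """Percentage MAE improvement of a model over the persistent method."""
    mae_model = np.mean(np.abs(y_true - y_model))
    baseline = persistent_forecast(last_measured, len(y_true))
    mae_pers = np.mean(np.abs(y_true - baseline))
    return 100.0 * (mae_pers - mae_model) / mae_pers

y_true = np.array([10.0, 12.0, 14.0, 16.0])    # measured values
y_model = np.array([10.5, 12.5, 13.5, 15.1])   # model forecasts
print(improvement_over_persistence(y_true, y_model, last_measured=10.0))
```

Because the persistent forecast degrades quickly as the true series drifts away from the last measurement, improvements over it naturally grow with the look-ahead time, as Figs. 7 and 9 illustrate.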
Fig. 10. Real wind power (solid line) and predicted wind power (dashed line) in MW, for a typical power forecast curve of the checking data set.

[…] including the actual values of the wind power at the park and the outputs of an IIR-MLP model. As shown, the forecasts are good, following the trends of the real power closely. Similarly, in Fig. 11, a typical curve is plotted including wind speed forecasts at the park site, along with the nodes' respective forecasts provided by the SKIRON system.

From the results shown in Table III, it can be concluded that the forecast models trained by the simplified local algorithm, the DRPE, exhibit slightly inferior performance compared to the ones obtained by the GRPE (Table II). Notice, however, that as revealed by the complexity analysis, the DRPE has considerably smaller requirements than the GRPE in terms of computational cost and storage needs. Nevertheless, the overall picture remains the same, that is, the recurrent models outperform the static forecast models, with the best results achieved by the IIR-MLP.

VIII. CONCLUSION

Three local recurrent neural networks are employed in this paper, providing forecasts of the wind speed and power up to 72 time steps ahead at the Rokas wind park on the Greek island of Crete. Forecasting is based on meteorological data given at four nodes near the park site. Two novel learning algorithms are introduced for the training of the recurrent forecast models, the GRPE and the DRPE, the latter having considerably smaller computational and storage requirements. Extensive experimentation and model comparison reveal the effectiveness of the suggested learning methods. Moreover, it is shown that the recurrent forecast models outperform their static rivals in terms of forecast errors and the improvement gained over the persistent method.

APPENDIX

In view of the multilayer structure of the IIR-MLP, we can distinguish three distinct cases, as described below.

Case 1: Gradients of the neuron's output with respect to its own synaptic weights $b_{nk,i}^{l}$, $a_{nk,j}^{l}$.

First, from the forward run equations, we have

$$y_{nk}^{l}(t) = \sum_{i=0}^{L_B} b_{nk,i}^{l}\, x_k^{l-1}(t-i) + \sum_{j=1}^{L_A} a_{nk,j}^{l}\, y_{nk}^{l}(t-j). \qquad (A.1)$$

Applying the ordered derivatives forward chain rule with respect to the $b_{nk,i}^{l}$'s, we finally get

$$\frac{\partial^{+} y_{nk}^{l}(t)}{\partial b_{nk,i}^{l}} = x_k^{l-1}(t-i) + \sum_{j=1}^{L_A} a_{nk,j}^{l}\, \frac{\partial^{+} y_{nk}^{l}(t-j)}{\partial b_{nk,i}^{l}} \qquad (A.2)$$

and, following an adaptive filter notation,

$$\frac{\partial^{+} y_{nk}^{l}(t)}{\partial b_{nk,i}^{l}} = \frac{q^{-i}}{A_{nk}^{l}(q^{-1})}\, x_k^{l-1}(t) \qquad (A.3)$$

where $A_{nk}^{l}(q^{-1}) = 1 - \sum_{j} a_{nk,j}^{l} q^{-j}$ is the AR part of the synaptic filter, $q^{-1}$ is the delay operator, and $\partial$ and $\partial^{+}$ denote an ordinary and an ordered derivative, respectively. Similarly, differentiation with respect to the $a_{nk,j}^{l}$'s leads to the following relations:

$$\frac{\partial^{+} y_{nk}^{l}(t)}{\partial a_{nk,j}^{l}} = y_{nk}^{l}(t-j) + \sum_{r=1}^{L_A} a_{nk,r}^{l}\, \frac{\partial^{+} y_{nk}^{l}(t-r)}{\partial a_{nk,j}^{l}} \qquad (A.4)$$

$$\frac{\partial^{+} y_{nk}^{l}(t)}{\partial a_{nk,j}^{l}} = \frac{q^{-j}}{A_{nk}^{l}(q^{-1})}\, y_{nk}^{l}(t). \qquad (A.5)$$
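The Case 1 relations can be checked numerically: the ordered derivative of an IIR synapse output with respect to an MA weight equals the delayed input passed through the AR part of the filter. A finite-difference sketch for a first-order filter, with illustrative weight values:

```python
import numpy as np

def iir(x, b0, a1):
    """First-order IIR synapse: y(t) = b0 * x(t) + a1 * y(t-1)."""
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        y[t] = b0 * x[t] + (a1 * y[t - 1] if t > 0 else 0.0)
    return y

rng = np.random.default_rng(1)
x = rng.normal(size=50)
b0, a1, h = 0.8, 0.5, 1e-6

# finite-difference gradient of y(t) with respect to b0 ...
fd = (iir(x, b0 + h, a1) - iir(x, b0 - h, a1)) / (2 * h)
# ... versus x(t) filtered through the AR part 1/(1 - a1 q^-1)
filt = iir(x, 1.0, a1)
print(np.max(np.abs(fd - filt)))  # negligibly small: the two coincide
```

The same filtered-gradient structure is what the sensitivity networks of Section IV implement for every weight of the full multilayer architecture.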
Case 2: Gradients of the neuron outputs of the $(l+1)$th layer with respect to synaptic weights of the $l$th layer.

Using the network's architecture and the forward chain formula, we get

$$\frac{\partial^{+} x_p^{l+1}(t)}{\partial w} = \varphi'\big(s_p^{l+1}(t)\big)\, \frac{\partial^{+} s_p^{l+1}(t)}{\partial w} \qquad (A.6)$$

$$\frac{\partial^{+} s_p^{l+1}(t)}{\partial w} = \frac{\partial^{+} y_{pn}^{l+1}(t)}{\partial w} \qquad (A.7)$$

where $w$ denotes either $b_{nk,i}^{l}$ or $a_{nk,j}^{l}$. The required gradients are derived through

$$\frac{\partial^{+} y_{pn}^{l+1}(t)}{\partial w} = \frac{B_{pn}^{l+1}(q^{-1})}{A_{pn}^{l+1}(q^{-1})}\, \frac{\partial^{+} x_n^{l}(t)}{\partial w}. \qquad (A.8)$$

Case 3: Gradients of the neuron outputs of the $m$th layer, $m > l+1$, with respect to synaptic weights of the $l$th layer.

Based on the network's structure, we have

$$\frac{\partial^{+} x_p^{m}(t)}{\partial w} = \varphi'\big(s_p^{m}(t)\big) \sum_{r} \frac{\partial^{+} y_{pr}^{m}(t)}{\partial w}. \qquad (A.9)$$

The above equation can be rewritten as

$$\frac{\partial^{+} x_p^{m}(t)}{\partial w} = \varphi'\big(s_p^{m}(t)\big) \sum_{r} \frac{B_{pr}^{m}(q^{-1})}{A_{pr}^{m}(q^{-1})}\, \frac{\partial^{+} x_r^{m-1}(t)}{\partial w}. \qquad (A.10)$$

The required derivatives of the neuron outputs are

$$\frac{\partial^{+} x_r^{m-1}(t)}{\partial w} = \varphi'\big(s_r^{m-1}(t)\big)\, \frac{\partial^{+} s_r^{m-1}(t)}{\partial w} \qquad (A.11)$$

where

$$\frac{\partial^{+} s_r^{m-1}(t)}{\partial w} = \sum_{k} \frac{\partial^{+} y_{rk}^{m-1}(t)}{\partial w}. \qquad (A.12)$$

ACKNOWLEDGMENT

SKIRON is a weather forecasting system initially developed for the Hellenic Meteorological Service based on the Eta/NCEP model. The meteorological numerical predictions are now produced on a daily basis by the Atmospheric Modeling and Weather Forecasting Group (AM&WFG) of the University of Athens, Athens, Greece [24]. The authors especially wish to thank Prof. G. Kallos and Dr. P. Katsafados for their collaboration. Actual data from the Rokas W/F at Crete were provided by the Public Power Corporation of Greece. All data were gathered and given to the authors under the frame of the MORE-CARE project supported by the European Commission.

REFERENCES

[1] E. A. Bossanyi, "Short-term wind prediction using Kalman filters," Wind Eng., vol. 9, no. 1, pp. 1–8, 1985.
[2] J. O. G. Tande and L. Landberg, "A 10 sec. forecast of wind turbine output with neural networks," in Proc. 4th European Wind Energy Conf. (EWEC'93), Lübeck-Travemünde, Germany, 1993, pp. 747–777.
[3] G. C. Contaxis and J. Kabouris, "Short-term scheduling in a wind/diesel autonomous system," IEEE Trans. Power Syst., vol. 6, no. 3, pp. 1161–1167, Aug. 1991.
[4] G. N. Kariniotakis, G. S. Stavrakakis, and E. F. Nogaret, "Wind power forecasting using advanced neural networks models," IEEE Trans. Energy Convers., vol. 11, no. 4, pp. 762–767, Dec. 1996.
[5] A. More and M. C. Deo, "Forecasting wind with neural networks," Marine Structures, vol. 16, pp. 35–49, 2003.
[6] L. Landberg and S. J. Watson, "Short-term prediction of local wind conditions," Boundary-Layer Meteorol., vol. 70, p. 171, 1994.
[7] L. Landberg, A. Joensen, G. Giebel, H. Madsen, and T. S. Nielsen, "Short-term prediction toward the 21st century," in Proc. British Wind Energy Association, vol. 21, Cambridge, U.K., 1999.
[8] L. Landberg and A. Joensen, "A model to predict the output from wind farms—An update," in Proc. British Wind Energy Association, vol. 20, Cardiff, Wales, U.K., 1998.
[9] S. J. Watson, G. Giebel, and A. Joensen, "The economic value of accurate wind power forecasting to utilities," in Proc. EWEC99, Nice, France, 1999, pp. 1109–1012.
[10] T. S. Nielsen and H. Madsen, "Experiences with statistical methods for wind power prediction," in Proc. EWEC99, Nice, France, 1999, pp. 1066–1069.
[11] E. Akylas, "Investigation of the effects of wind speed forecasts and economic evaluation of the increased penetration of wind energy for the island of Crete," in Proc. EWEC99, Nice, France, 1999, pp. 1074–1077.
[12] G. Kariniotakis, D. Mayer, J. A. Halliday, A. G. Dutton, A. D. Irving, R. A. Brownsword, P. S. Dokopoulos, and M. C. Alexiadis, "Load, wind, and hydro power forecasting functions of the MORE-CARE EMS system," in Proc. Med Power 2002, Athens, Greece, Nov. 2002.
[13] A. C. Tsoi and A. D. Back, "Locally recurrent globally feedforward networks: A critical review of architectures," IEEE Trans. Neural Netw., vol. 5, no. 3, pp. 229–239, May 1994.
[14] E. A. Wan, "Temporal backpropagation for FIR neural networks," in Proc. Int. Joint Conf. Neural Networks, vol. 1, 1990, pp. 575–580.
[15] S. Haykin, Neural Networks: A Comprehensive Foundation. New York: IEEE Press, 1994.
[16] P. Frasconi, M. Gori, and G. Soda, "Local feedback multilayered networks," Neural Comput., vol. 4, pp. 120–130, 1992.
[17] C.-C. Ku and K. Y. Lee, "Diagonal recurrent neural networks for dynamic systems control," IEEE Trans. Neural Netw., vol. 6, no. 1, pp. 144–156, Jan. 1995.
[18] P. J. Werbos, "Beyond regression: New tools for prediction and analysis in the behavioral sciences," Ph.D. dissertation, Committee on Appl. Math., Harvard Univ., Cambridge, MA, 1974.
[19] E. A. Wan and F. Beaufays, "Diagrammatic derivation of gradient algorithms for neural networks," Neural Comput., vol. 8, pp. 182–201, 1996.
[20] R. J. Williams and J. Peng, "An efficient gradient-based algorithm for on-line training of recurrent network trajectories," Neural Comput., vol. 2, pp. 490–501, 1990.
[21] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification. Cambridge, MA: MIT Press, 1983.
[22] S. Shah, F. Palmieri, and M. Datum, "Optimal filtering algorithms for fast learning in feedforward neural networks," Neural Netw., vol. 5, pp. 779–787, 1992.
[23] G. V. Puskorius and L. A. Feldkamp, "Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks," IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 279–297, Mar. 1994.
[24] G. Kallos, S. Nickovic, A. Papadopoulos, and P. Katsafados, "The SKIRON forecasting system and its capability to predict extreme weather events in the Mediterranean," in Proc. 7th Int. Symp. Natural and Man-Made Hazards (HAZARDS-98), Chania, Greece, May 1998.
Thanasis G. Barbounis was born in Lamia, Greece, in 1977. He graduated in electrical engineering in 1999 from the Aristotle University of Thessaloniki, Thessaloniki, Greece, where he is currently pursuing the Ph.D. degree. His research interests include artificial neural networks, fuzzy logic systems, and modeling of nonlinear systems.

Minas C. Alexiadis was born in Thessaloniki, Greece, in July 1969. He received the Dipl. Eng. degree in 1994 and the Ph.D. degree in 2003, both from the Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece. His research interests include renewable energy sources and artificial intelligence applications in power systems.