Forced Outage Analysis of Brazilian Thermal Power Plants using the Kruskal-Wallis Test

In this paper, the forced outage data from the Brazilian National Interconnected System (SIN) is provided and the statistics from the thermal power plants are computed. The SIN is characterized by a marked seasonality in electricity supply. In addition, the expansion pattern of the Brazilian electric sector shows signs of exhaustion, and the demand for flexible thermal power plants, based on availability, requires outage management as a reliability-centered maintenance (RCM) strategy. In this work, the non-parametric Kruskal-Wallis test and Dunn’s pairwise-comparisons were chosen for evaluating the mean forced outage duration (MFOD) and the forced outage factor (FOF) using the data from the national electricity system operator (ONS) with R Software. The distribution fitting was provided using Weibull++ software from Reliasoft. Based on the MFOD and unit failure rate data, the FOF for Brazilian thermal power plants is 3.33% with a 90% probability and 95% confidence level. Finally, Brazilian thermal power plants were benchmarked against North American power plants.


/2012 to the end of 2019, as that of the historic crit
cal period from June 1948to November 1955(Nacional do Sistema, 2020a).

Due to the predominance of hydraulic-powered electricity generation in the SIN, the contracting of thermal power plants is based on availability.This is because the cost merit, economic benefit, and security risk mitigation needed to supply the national power demand need to the evaluated before considering the use of contracted energy.These factors are evaluated based on the remuneration for the availability of the thermal generator and the reimbursement of operating costs incurred with the actual plant dispatch (Center for Regulatory and Infrastructure Studies, 2017).Figure 3  The Brazilian power grid long-term load forecasting (GW avg)

Figure 3: Electrical load in SIN from 2016 to 2020 (MW avg).Adapted from (Nacional do Sistema, 2017Sistema, , 2018Sistema, , 2019aSistema, , 2020b)).

Currently, thermal power generation in Brazil has a high variable unit cost (CVU), with 40% of the installed capacity having a CVU of over 250 BRL/MWh (~47 USD/MWh).This indicates that thermal power plants are used for electrical dispatch only in extremely unfavorable hydrological conditions, thus, increasing the challenges for maintaining the water level in reservoirs (Nacional do Sistema, 2020a).

The ONS, a private entity created in 1998, coordinates and controls the operations of the power generation and transmission facilities integrated in the SIN.In addition, the ONS evaluates th future and short-term conditions of these facilities (Sousa, 2009).

The decision-making process of the ONS is regulated by the grid procedures and approved by the Brazilian Electricity Regulatory Agency (ANEEL) through public hearings.In addition, the ONS provides ANEEL with input data and optimization models, to enable the reproduction of the method used for determining the plants' dispatch (Sousa, 2009).

In summary, the ONS functions to minimize the service cost, thus, decreasing the risk of insufficient power supply in each subsystem that composes the SIN.Thermal power stations operation provides efficiency gains during the dry and low-pour period, thus, increasing the firm energy of the system (Sousa, 2009).

According to the National Confederation of Indus ry, the national thermoelectric system of Brazil was originally contracted as a reserve for sporadic electrical performance during unfavorable hydrological periods.However, the data presented by Chamber of Electric Energy Commercialization (CCEE) show that the percentage of thermal power plants in the SIN load is approximately 16.5% (De Comercialização de Energia Elétrica, 2020) at most times in the ase operation (National Confederation of Industry, 2018).

The base operation is aimed at complementing electrical generation at all load levels.In addition to this, there is also the peak operation, which occurs in periods of fluctuating demand, and the operat on aimed at performing ancillary services such as (Ribeiro, 2019): controlling (i) primary frequency, (ii) secondary frequency, (iii) reactive support, (iv) self-restoration, (v) special protection system (SEP), and (vi) complementary dispatch for maintaining the operational power reserve (Nacional do Sistema, 2019b).

However, the expansion pattern of the Brazilian electric sector shows signs of exhaustion, with the inclusion of the ROR and intermittent renewable sources (wind and solar).Consequently, there is a growing demand for flexible thermal power plants operated in an electric dispatch mode (National Confederation of Industry, 2018).

The st uctural trend in the SIN involves the gradual reduction in the capacity of country's reservoirs regularization.Therefore, to recover water storage, thermal power complementation needs to be carried out more frequently and for longer periods, even during hydrological periods close to the long-term average (National Confederation of Industry, 2018).

In a study carried out by (Nogueira et al., 2019), they reported that 50% of Brazilian hydroelectric plants are over 20 years old and 32% are over 40 years old.T e aging of these plants is the major cause for the increasing frequency and duration of maintenance interventions, whether as scheduled or due to forced shutdowns.

ANEEL calculates the unavailability of hydroelectric plants based on two key performance indicators (KPI): scheduled unavailability rate (IP) and equivalent rate of forced unavailability (TEIF).The IP quantifies the percentage of hours that the turbo-generator was turned off to carry out scheduled maintenance, while the TEIF quantifies the percentage of hou s that the equipment remained in forced shutdown (Ministério de Minas e Energias, 2014).The reference values of the KPI are shown in Table 1.TM -2006 (IEEE, 2007)   The Brazilian electrical system (SEB) contracts are based on physical guarantee through energy guaranty contracts certified by the Ministry of Mines and Energy (MME) (Castro et al., 2016).One of the main performance risks for power plants is related to the decreasing physical guarantee due to unavailability, thus, leading to a reduction in the revenue of the commercialization of Electric Energy in Regulated Environments (CCEAR) and an increase in the additional costs for the acquisition of ballast in the Short-Term Mar et (MCP) (Martins, 2013).

The unavailability of each plant is computed by the ONS using the IP and TEIF.These values are calculated monthly, based on the moving average of the previous 60 months.If the verified unavailability KPIs are higher than those declared, there will be a reduction in the physical guarantee (Martins, 2 13).

Therefore, to minimize the risk of not supplying the national power demand, the combination of these variables is important: (i) basis operation, (ii) aging of hydroelectric plants, and (iii) flexible dispatch requirements: and demands from thermal power plants, high availability, and reliability standards.


Maintenance at Thermal Power Plants

Maintenance Management at the generating units is important for an economically optimized electrical dispatch in hydrothermal generation systems.However, it is challenging to choose the best schedule for preventive maintenance to minimize the operating cost of the generating agency, maximize the reliability of the system, and extend the units' operational life.In addition, with an increase in the size of the generating unit, the challenges also increase (Balaji et al., 2016).The maintenance costs of thermal power plants are greatly influenced by: (i) the type of starting, (ii) the frequency of starting, and (iii) the loading pattern (Dipak, 2015a).Figures 4 and 5 show examples of the effect of these factors on a g s turbine.(Dipak, 2015a).

Currently, the steam thermal power plants have larger sizes with a high steam-generating capacity and highly sophisticated firing system.However, with an increase in the unit size and capacity of thermal plants, forced outages have greater significance not only on the larger loss f revenue, but also greater risk of injury and damage to the plants (Dipak, 2015b).

To effectively manage these challenges, the reliability-centered Maintenance (RCM) process is carried out.The RCM process is a common-sense procedure for creating maintenance strategies to preserve assets' functions.The stan

rd SAE JA1011, published in 1999, se
s the criteria that any process must comply with to be considered an RCM.It establishes that for a process to be acknowledged as RCM, it must follow these seven steps (Sifonte and Reyes-Picknell, 2017): i. Delineate the operational context and the functions, and associated desired standards of assets performance (operational context and functions) ii.Determine how an asset can fail to fulfill its functions (functional failures) iii.Define the causes of each functional failure (failure modes) iv.Describe what happens when each failure occurs (failure effects) v. Classify the consequences of failure (failure consequences) vi.Determine what should be performed to predict or prevent each failure (tasks and intervals) vii.Decide whether other failure m nagement strategies may be more effective (one-time changes).Some uncertainties in thermal power plant dispatch can cause deviations from the system balance, which sometimes require inefficient and costly last-minute solutions in the near real-time time frame (Makarov et al., 2017).These uncertainties include: (i) uninstructed deviations of conventional generators rom their dispatch set points, (ii) generator forced outages, (iii) generator failures to start up, (iv) load drops, (v) losses of major transmission lines, and (vi) frequency variation (Makarov et al., 2017).

Outage management organization and administration ensures the safe and effective implementation and control of the maintenance activities during planned and forced outages.Outage planning and performance takes into consideration the safety, quality, and schedule in this order.Thus, the maintenance planning and scheduling process should reflect this (Lipár, 2012).

The objective of this study is to perform a multiple pairwise-comparisons of the annual forced shutdown statistics by ONS using the Kruskal-Wallis non-parametric test to verify if the populations' distributions have changed over the years.This allows for the computing of the reliability analysis for the forced outage data and to analyze the availability of thermal power plants and comparing it to benchmarks.


Materials and Methods


Thermal power plants: forced shutdown statistics

Forced shutdown involves the unscheduled removal of a component out of service due to failure or emergency shutdown.Forced shutdown requires the manual or automatic switching off of an equipment to avoid risks to the physical integrity of the people or the environment, or damage to the equipment, or other consequences to the electrical system.It includes accidental shutdown (without disturbance in the SIN), incorrect shutdown (with disturbance in the SIN), and shutdown resulting from SEP or configuration actions (Nacional do Sistema, 201 c).

The ONS provides performance and statistical reports issued from the generating units belonging to the SIN.Therefore, to apply the methodology used in this work, the report ONS DPL-REL-0099/2020: Relatório de Análise Estatística de Desligamentos Forçados de Equipamentos Referente ao ano de 2019 (Statistical Analysis Report on Forced Equipment Shutdowns for 201 ) was analyzed (Nacional do Sistema, 2019 c).

The ONS reported that the occurrence of the unsatisfactory operational performance of a thermal power plant directly impacts the security and reliability of electricity supply and energy tariffs, given that its unavailability causes the generation of another power unit with a higher CVU (Agência Nacional de Energia Elétrica, 2020).

The consolidated data of the fo

generators are presented using the indicators: Me
n Forced Shutdown Duration for Transmission and Generation Functions (DMDFF) and Forced Shutdown Rate for Transmission and Generation Functions (TDFF) (Nacional do Sistema, 2019c).

The DMDFF indicator aims to manage the performance of the transmission, transformation, reactive control, and the generation functions, regarding the mean duration of the forced shutdowns during the period considered.The DMDFF is calculated as follows (Nacional do Sistema, 2019d):
𝐷𝑀𝐷𝐹𝐹 = 𝐹𝑜𝑟𝑐𝑒𝑑 𝑜𝑢𝑡𝑎𝑔𝑒 ℎ𝑜𝑢𝑟𝑠 (𝐹𝑂𝐻) 𝑁𝑢𝑚𝑏𝑒𝑟𝑠 𝑜𝑓 𝑢𝑛𝑝𝑙𝑎𝑛𝑛𝑒𝑑 𝑜𝑢𝑡𝑎𝑔𝑒𝑠 𝑓𝑟𝑜𝑚 𝑖𝑛 𝑠𝑒𝑟𝑣𝑖𝑐𝑒 𝑠𝑡𝑎𝑡𝑒 Eq 1
The TDFF indicator aims to manage the rate of forced shutdowns on the transmission and generation functions during service hours at the period considered.The TDFF is calculated is as follows (Nacional do Sistema, 2019d):
𝑇𝐷𝐹𝐹 = 𝑁𝑢𝑚𝑏𝑒𝑟𝑠 𝑜𝑓 𝑢𝑛𝑝𝑙𝑎𝑛𝑛𝑒𝑑 𝑜𝑢𝑡𝑎𝑔𝑒𝑠 𝑓𝑟𝑜𝑚 𝑖𝑛 𝑠𝑒𝑟𝑣𝑖𝑐𝑒 𝑠𝑡𝑎𝑡𝑒 𝑆𝑒𝑟𝑣𝑖𝑐𝑒 𝐻𝑜𝑢𝑟𝑠 (𝑆𝐻)  8760
Eq. 2

Where:

The constant value 8760 is the annualization factor -24 h for 365 d.Where MFOD is the m an forced outage duration and MSTFO is the mean service time to forced.PH is the period hours or active hours and it represents the number of hours a unit was in the active state, FOF is the forced outage factor, i.e., the fraction of a given operating period in which a generating unit was not available due to forced outages.The IEE nomenclature was adopted to present the consolidated data in this work.


Kruskal-Wallis test and Multiple pairwise-comparisons

The data from the statistic report provided by ONS presents the information grouping over a 5 years interval (2015 to 2019).In this study, the Kruskal-Wallis KW) test was performed to test the null hypothesis (H0) that the annual probability distributions of the Brazilian thermal generators are equal.The KW test is a well-known non-parametric test applied in a wide range of disciplines such as engineering and manufacturing applications, ogy, psychology, and education (Ostertagová et al., 2014).

The KW test, named in honor of the American statisticians William Kruskal and W. Allen Wallis, was created in 1952 and is a non-parametric test used to compare th opulations.The test does not make assumptions about normality.However, it assumes that the populations have the same distribution, and that the samples are random and independent (Ostertagová et al., 2014;Ronald et al., 2007;Moreno et al., 2019).

The KW test requires that the data come from continuous probability distributions.The H0 being tested by the KW statistic assumes that all the population distributions are equal, and the

lternative hypothesis (H1) assumes that at least two p
pulation distributions differ from each other (Ostertagová et al., 2014;Ronald et al., 2007).Data are pooled across groups and ranked from the lowest value of the dependent variable to the highest value.

In case of a tie, the average rank is attributed to the tied experimental values (Andrew, 1998).

According to McDonald, , the KW test also assumes that the variation within the groups is equal (homoscedasticity), however, groups with different standard deviations have different distributions.Thus, if different groups have different shapes, the KW test may give inaccurate results (McDonald, 2007;Fagerland and Sandvik, 2009).

Since the KW test assumes that different groups have similar distribution, and groups with different standard deviations have different distributions, if the data are heteroscedastic, KW is no better than the one-way ANOVA and may be worse.For heteroscedastic data, the Welch's ANOVA is preferred (McDonald, 2007).The H0 of KW assumes that the distribution for all k populations are similar, while the H1 assumes that the distribution of at least two population differs.H0 and H1 can be expressed as follows (Andrew, 1998):
𝐻 0 : 𝐹 1 (𝑥) = 𝐹 2 (𝑥) = ⋯ = 𝐹 𝑘 (𝑥) Eq. 6
 1 : ∃1 ≤ ,  ≤ :   () ≠   () Eq. 7

The sum of the ranks Ri is calculated for each group i (i = 1, 2, . . ., k) of size ni, then the test statistic H is calculated, which represents the var ance of the ranks among all groups, with an adjustment for the number of ties (Hecke, 2010).
𝐻 = ( 12 𝑁(𝑁 + 1) ∑ 𝑅 𝑖 2 𝑛 𝑖 𝑘 𝑖=1 ) − 3(𝑁 + 1), 𝑁 = ∑ 𝑛 𝑖 𝑘 𝑖=1
Eq. 8

Where ni is the sample size for the i th group, Ri is the rank-sum for the i th group, N is the total sample size, and k is the number of groups.

If there are many ties in the samples, a correction factor must be applied, and the test statistic corrected for the ties will be (Ostertagová et al., 2014;Hollander et al., 2013):
𝐻 * = 𝐻 𝑓 * , 𝑓 * = 1 − ( ∑ (𝑡 𝑖 3 − 𝑡 𝑖 ) 𝑚 𝑖=1 𝑁 3 − 𝑁 ) Eq. 9
Where ti is the number of ties in the i th group of the m ties actor, H is the KW statistic, and H * is the KW statistic corrected for ties.Whenever H0 is true and either (Ostertagová et al., 2014):
{ 𝑘 = 3, 𝑛𝑖 ≥ 6 𝑓𝑜𝑟 𝑖 = 1,2,3 𝑘 > 3, 𝑛𝑖 ≥ 5 𝑓𝑜𝑟 𝑖 = 1,2, … , 𝑘 Eq. 10
The distribution of the test statistic, H, ion with (k-1) degrees of freedom.The H0 is rejected on the right-hand tail of the chi-square distribution (Ostertagová et al., 2014) After the te t statistic is calculated, the p-value is then calculated.The p-value is defined as the probability of observing the given value of the test statistic, or greater, under the H0 (Fer rejected on a significance level α, when a one-sided p-value is less than the significance level (Ferreira and Patino, 2015):
𝑃[𝜒 2 (𝑘) > 𝐻] < 𝛼 Eq. 11
Where α is the significance level, i.e., the probability of making the wrong de e of the chi-square distribution, and H is the statistic from the KW test.


Effect size for the Kruskal-Wallis test

The statistics of the effect size for the KW test provide the degree to which the data of one group has higher ranks than that of another group.The effect size is related to the probability that the value from one group will be greater than the value from another group (Mangiafico, 2016).

The eta-squared coefficient can be calculated as the measure of the KW test effect size.According to Prajapati et al., eta is ion and is the proportion of the total variance that is attributed to an effect (Prajapati et al., 2010).Eta-squared ranges from 0 to 1, and as a rule, 0.01 is a small effect, 0.06 is a moderate effec

and 0.14 is a large effect (Laerd Stati
tics, 2020).
𝐸 𝑅 2 = 𝐻 (𝑁 2 − 1)/(𝑁 + 1)
, { 0.01  0.06,   0.06  0.14,    0.14 ,  


Eq. 12

Where H is the statistic from the KW Test, k is the number of groups, and N is the total number of observations.According to Ferreira and Patino, there is a misconception that a very small p-value i dicates a highly relevant difference between groups.However, it is necessary to consider the effect size, as it may indicate that the sample size should be increased.The authors recommend preferably reporting the mean values for each group, the difference, and the 95% confidence interval, and then the p-value (Ferreira and Patino, 2015).

This significant result in a KW test indicates that there are gr it does not indicate which g

ups.Thu
, a post hoc procedure can be used to determine which groups are different from each other (Andrew, 1998).

According to Laerd Statistics, if the populations distributions have similar shapes, then, the medians can be compared to evaluate the distribution differences (Laerd Statistics, 2020).However, when the distribution shapes are different, the mean rank from the KW test should be considered.This is because with an increase in the group's mean rank, the observation values in that group increases in comparison to those of the other groups (Minitab 19 Support, 2020).The concept is shown in Figure 6.The KW test does not assume normality but assumes that the shapes of the distributions in different groups are similar.This suggests that non-p rametric tests are not a good solution for heteroscedastic data (McDonald, 2007).

According to Dinno, in the case of a rejected H0, the Dunn's test should be performed after the KW test (Dinno, 2015).The Dunn's test is based on the normal distribution with Bonferroni correction, where nc is the possible number of two-to-two comparisons that will be made between the k groups (Moreno et al., 2019).
𝑛 𝑐 = 𝐶 2 𝑘 = 𝑘! 2! (𝑘 − 2)! = 𝑘(𝑘 − 1) 2 Eq. 13
The test involves the comparison of the module of the differences between the means of the ranks for two groups | ̅  −  ̅  | with the least significant difference (LSD) (Moreno et al., 2019).
𝐿𝑆𝐷 = 𝑍 (𝛼 𝑐 ) √[ 𝑁(𝑁 + 1) 12 ( 1 𝑛 𝑖 + 1 𝑛 𝑗 )] , 𝛼 𝑐 = 𝛼 𝑘(𝑘 − 1)/2
Eq. 14

Where: LSD is the least significant difference, αc is the Bonferroni's correction, α is the significance level, ni and nj are the sample sizes of two compared populations, N is the total sample size, and Zαc comes from negative Z score table for αc (Moreno et al., 2019).

When | ̅  −  ̅  | ≥ , H0 is rejected, and the pairwise co lations are significantly different (Moreno et al., 2019).


Reliability analysis for the forced outage data

The analysis method proposed in this work considers all the thermal generators elected facilities according to the ONS's criteria (see Table 2) (Nacional do Sistema, 2019c):

i.

Thermal power plants with an effective power equal to or greater than 300 MW. ii.

Thermal power plants with 200 MW ≤ effective power <300 MW, with a transformation equal to or greater than 230 kV.iii.

Thermal power plants powered by natural gas, coal, and nuclear sources.The MFOD and unit failure rate indicators from the ther

l power plants are presented for 2015 to 2019, a
shown in Table 3 and Appendix    Based on the criteria by ONS, there are four major reasons for the forced power outage of power plants in SIN (Nacional do Sistema, 2019c): i. Internal outage, which is related to the mai parts of the power plant such as the insulators, the primary winding from transfo mers, stator, bearing and generator shaft, circuit breakers, and others.ii.Secondary outage, which is related to the com lementary or auxiliary equipment of the power plants such as wiring, protection, control, command, auxiliary services, ventilation, and cooling system.In addition, it also includes outages due to the incorrect performance from the protection system in case of external failures.iii.External outage, which is related to the outage due to the correct performance of the protection system due to failures (acting as a back-up protection) or due to overload caused by outage from a third party.

In addition, it also includes the outage caused by system configuration.iv.Operational outage, which is related to systemic electrical conditions such as oscillations, overvoltage, over-frequency, overload, and other systemic causes.


Results

The KW non-parametric test was computed using R software.The R software is a free software used for statistical computing and for the construction of graphics that can be downloaded and distributed free of charge under the GNU license (Landeiro, 2011;R Development Core Team, 2011).

The prerequisites for computing the KW test using R software require the R packages: (i) tidverse for data manipulation and visualization (Wickham et al., 2019), (ii) ggpubr for creating ready to publish plots (Alboukadel, 2020a), and (iii) rstatix, which provides R f

ctions f
r statistical analyses (Alboukadel, 2020b).Subsequently, the summary statistics was computed by groups using the kruskal_test function from the rstatix package (DataNovia, 2020).Since the number of groups used in this study was greater than 3 and the sample size was greater than 5 fo all groups, according to Eq. 10, the distribution of the test statistic, H, was well approximated using the chi-square distribution with (5 -1) degrees of freedom.The significance level chosen was α = 0.05.Manual calculations were performed to determine the mean ranks (Ri/ni), and the results are shown in Tables 4 and 5.Where N is the number of individuals, min is the minimum value, max is maximum value, median is the median, mean is the mean, sd is the standard deviation of the mean, se is the standard error of the mean, and ci is the 95% confidence interval of the mean (Kassambara, 2020a).

Mean forced outage duration Kruskal-Wallis test: Where .y. is the y variable used in the test, n is the sample count, statistic is the Kruskal-Wallis rank-sum statistic used to compute the p-value, df is the degree of freedom, p is the computed p-value, and method is the statistical test used to compare groups (Kassambara, 2020b).

The p-values of the computed KW test was less than a 0.05 significance level, thus, rejecting the H0 for both KPIs, indicating that at least two population distributions differ from each other.

After computing the KW p-values and plotting the boxplots, the statistics of effect size for the test was verified by computing the eta-squared from the kruskal_effsize function:

Mean forced outage duration effect size: Where .y. is the y variable used in the test, n is the sample counts, effsize is the estimate of the effect size, method is the eta-squared, and magnitude is the magnitude of effect size (Kassambara, 2020c).

This significant results in the KW tests indicate that there were group differences, however it does not indicate which groups.Thus, a Dunn test procedure was used to determine which groups were different from each other.To identify the differences between the consolidated annual values, boxplot graphs were plotted, as shown in figures 8 and 9.

Mean forced outage duration boxplot:

# Visualization: box plots with p-values pwc <-pwc %>% add_xy_position(x = "g oup") ggboxplot(mfod, x = "group", y = "weight", color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07", "#3CB371", "#BA55D3"), order = c("2015", "2016", "2017", "2018", "2019"), Unit f ilure rate boxplot:

# Visualization: box plots with p-values pwc <-pwc %>% add_xy_position(x = "group") ggboxplot(rate, x = " roup", y = "weight", color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07", "#3CB371", "#BA55D3"), order = c("2015", "2016", "2017", "2018", "2019"),  To verify the distribution fitting for the annual data from the MFOD and the unit failure rate, the data was fitted using the distribution wizard from software Weibull++ (version 19.0.2.1075, from Reliasoft).According to Reliasoft, the distribution wizard performs multiple goodness of fit tests to determine the best distribution for a data set based on the chosen parameter estimation method.In addition, it performs three goodness of fit tests to determine the rank of the distributions (Reliasoft, 2017):

i.The Kolmogorov-Smirnov test (Goodness of fit -GOF) tests used to determine the statistical d fference (the difference between the expected and obtained results).Default weighted in 40% for the Maximum likelihood (MLE) analysis method.ii.The Correlation coefficient test (Plot fit -PLOT) measures how well the plotted points fit a straight line.Default weighted in 10% for the MLE analysis method.iii.The Likelihood value test (Likelihood Ratio -LKV) computes the value of the log-likelihood function,

given the parameters of the distribution.Default weighted in 50% for the MLE analysis method.

Tables 6 and 7 show the distribution fitting for the MFOD and unit failure rate.The top ranked distri utions implemented for each year and KPI.The distribution shapes were plotted to evaluate the scale and skewness within the groups, as an indication of heteroscedasticity, as shown in Figures 10 and 11.Then, the distribution locations were computed considering the B50% life, assuming the groups had equal variation (homoscedasticity).The median is the value that the variable has a 50% probability of exceeding (Camarillo et al., 2017).Tables 8 and 9 show the B50% life as per computed using the Quick Calculation Pad from Weibull++ Software.The Two-sided Confidence Level of 95% was considered, which is the measure of the imprecision of the true effect size in the population of interest estimated in the study population (Patino and Ferreira, 2015).


Availability analysis of the thermal power plants

Using the implemented distributi

fitting, the B90% life was calculated for both th
MFOD and the unit failure rate with a 95% Confidence Level.The aim of the distribution fitting was to evaluate with a 90% probability, the expected time needed to recover the functions of the thermal power plants and the average number of outages.Data from the KW test and distribution fitting indicates that the MFOD increased for 2019 and the unit failure rate has been stable since 2016.Accordingly, the data from 2019 were chosen for the analysis.Tables 10 and 11 show the B90% life as computed using the quick calculation pad from Weibull++ Software.The Two-sided confidence level of 95% was also considered.Figures 12 and 13 show the probability of restoring the power plant function and the failure rate.Using the Eq. 5 and the consolidated data from Tables 10 and 11, the FOF for 2019 was calculated, as shown in Table 12: The FOF was compared to the data from the North American Electric Reliability Corporation (NERC).NERC is a not-for-profit international regulatory authority: the Electric Reliability Organization (ERO) for North America, subject to oversight by the Federal Energy Regulatory Commission (FERC) and governmental authorities in Canada (NERC, 2020).The consolidated statistics are shown in Table 13, using data from Generating unit statistical brochure (2018), containing data on units reporting events only.Nuclear (all types) All Sizes 99 1.25


Discussion

In this study, the computed KW test p-values wa

less than a
0.05 significance level, thus, rejecting the H0 for both KPIs, indicating that at least two population distributions differ from each other.In addition, the pairwise comparisons using the Dunn test show that:

i.The computed KW for the MFOD indicator rejected the H0, indicating that at least two population distributions differ from each other (P[χ ² (4) > 12.5] = 0.0142 < 0.05).In addition, based on the etasquared computing and the pairwise analysis by the Dunn test for the 0.05 significance level (p ≤ 0.05 for pairwise comparison 2017-2019), there was a difference between the population distributions from 2017 and 2019 with a small size effect.Furthermore, the KW test mean rank was 188.63 and 252.80 for 2017 and 2019, respectively.

ii.The computed KW for the unit failure rate indicator rejected the H 0 , indicating that at least two population distributions differ from each other (P[χ ² (4) > 36.5]= 0.000000231 < 0.05).In addition, based on the eta-squared computing and he pairwise analysis by the Dunn test for the 0.05 significance level (p ≤ 0.001 for pairwise comparison 2015-2016, p ≤ 0.05 for pairwise comparison 2015-2017, and p ≤ 0.0001 for pairwise comparisons 2015-2018 and 2015-2019), there was a difference between the population distribution from 2015 and that from the other years with a moderate size effect.The KW test mean rank was 289.93 for 2015 and varied from 184.50 to 229.39 for 2016-2020 data.

This significant result in the KW test indicated that there are differences between the groups, however it does not indicate which groups (Andrew, 1998).In addition, it does not indicate whether the difference is meaningful, nor does it specify how many of the groups are different from each other (Chan and Walmsley, 1997).If the result indicated no differences for the 0.05 significance level (p > 0.05), the stability system scenario could be verified by considering the five-years timeframe.Since the result showed significant differences, a multiple comparison between treatments was performed to construct pair-wise multiple comparisons to identify the source of the significance (Chan and Walmsley, 1997).

According to McDonald, the KW test does not assume normality, but that the shapes of the distributions in different groups are similar (McDonald and John, 2007).This indicates that non-parametric tests are not a good solution for heteroscedastic data.According to Laerd Statistics, the mean rank from the KW test should be considered for different shapes.With an increase in the group's mean rank, the observation values increase in comparison to those of the other groups (Minitab 19 Support, 2020).In addition, as reported by (McDonald, 2007), the standard deviations of the measurements of different groups should always be compared to evaluate their differences.

Based on the eta-squared computing, the computed KW for the MFOD indicator indicates a difference between the popu ation distributions from 2017 and 2019 with a small size effect.The KW test mean rank for 2017 and 2019 was 188.63 and 252.80, respectively, While the standard deviation for 2017 and 2019 was 4.89 and 15.1 hours, respectively.

These results indicate that the observation values from 2019 are higher than thos from 2017.In addition, it also indicates differences in the group distributions.Therefore, it is necessary to investigate for signs of skewness and variance, as proposed by (Fagerland and Sandvik, 2009).

The effect size of the KW test was computed to verify the degree to which one group has data with higher ranks than another group.This test is related to the probability that a value from one group will be greater than that fro another group (Mangiafico, 2016).According to Prajapati et al., the eta-squared is a measure of association, and the proportion of the total variance attributed to an effect (Prajapati et al., 2010).Eta-squared ranges from 0 to 1, and as a rule, 0.01 is a small effect, 0.06 is a moderate effect, and 0.14 is a large effect (Tomczak et al., 2014).

Based on the eta-squared (  2 = 0.0198 ≤ 0.06), the effect size for the MFOD KW test indicated that there is a difference between the population dist ibutions with a small size effect.In addition, based on the eta-squared (0.06 <   2 = 0.0732 ≤ 0.14), the effect size for the unit failure rate KW test indicated that there is a difference between the population distributions with a moderate size effect.

A post hoc procedure was performed to determine which roups are different from each other (Ribeiro, 2019.;Andrew, 1998).According to Dinno, when the H0 is rejected, the Dunn's test should follow the KW test (Dinno, 2015).The computed distribution fitting nd the distribution locations analysis show that:

i.When the groups had equal variation (homoscedasticity), the distribution fitting for the MFOD indicator confirmed a small size effect based on the com uted eta-squared (  2 = 0.0198 ≤ 0.06) and the pairwise analysis by the Dunn test for the 0.05 significance level.In addition, the distribution location (B50% life) slightly increased for the 2019 data (4.39 in 2017 and 6.84 in 2019), indicating a higher forced outage duration.

ii.When the groups had equal variation (homoscedasticity), the distribution fitting for the unit failure rate indicator confirmed the moderate size effect based on the computed eta-squared (0.06 <   2 = 0.0732 ≤ 0.14) and the pairwise analysis by the Dunn test for the 0.05 significance level.In addition, the distribution location (B50% life) decreased from the 2016 data on, indicating an overall lower unit failure rate (7.91 in 2015, 4 86 in 2016, 5.43 in 2017, 3.75 in 2018, and 4.20 in 2019).

iii.The distribution fitting for both the MFOD and unit failure rate indicated that the data have a nonnormal distribution, thus, ranking the distributions loglogistic and generalized gama for the MFOD data and 3-parameters Weibull, and generalized gama and lognormal distributions for the unit failure rate.

iv.The raphical analysis of MFOD pdf indicates a right-skewed behavior with an increased shifting in the 2019's location parameter, and a slightly higher scale-parameter compared to the other years.

v. The graphical analysis of unit failure rate pdf indicates a right-skewed behavior with an increased shifting in the 2015's location-parameter and a higher scale-parameter compared to the other years.

However, there are some limitations with using the KW test.According to McDonald, the KW test cannot detect the differences between symmetrical distributions with similar location, and very d fferent scale-parameter have different distributions.Sometimes, the KW test is considered a median test for the H0.This is because it assumes that the distributions in each group have similar shape, and the KW test can reject the null hypothesis even for same medians (McDonald, 2007).Another situation in which the median test for the H0 is violated is when the distributions have different degrees of skewness, which affects both type I error, rejecting the null hypothesis when it is true (Fagerland and Sandvik, 2009).

In addition, according to McDonald, there is no consensus about heteroscedastic data for applying a test that assumes homoscedasticity (McDonald, 2007).The graphical analysis of the MFOD probability and unit failure rate pdfs indicate a right-skewed behavior for all the available groups, and the medians were considered for the complementary evaluation of the omputed data.

The Distribution Wizard from Weibull++ oftware used in this study is a valuable tool for fitting distribution models.According to Reliasoft, the MLE analysis is an appropriate method for data sets with many o served failures, however, the MLE tends to be statistically distorted for small sample sizes.Therefore, the default weighte composition of three goodness of fit tests were applied to determine the rank of the distributions: (i) 40% for the Kolmogorov-Smirnov test, (ii) 10% for the Correlation coefficient test, and (iii) 50% for the Like ihood value test (Reliasoft, 2017).

By combining the analytical and graphical tools, it was possible to verify which data pack to choose for applying the B90% life analysis: in this case, the 2019 data from the MFOD and unit failure rate reports provided by ONS was used.The BX% life is the lifetime metric, in which X% of the units in a population fail, when considering the reliability analysis (Woo, 2017).Applying B90% life calculation for the MFOD data indicates a lifetime metric, in which 90% of the power units would recover to available state.Likewise, applying the B90% if calculation for the unit failure rate data indicates the failure/year metric which 90% of the power units would achieve.The choice of the year 2019 to process the availability analysis considers the following scenarios: (i) the atypical rise in the MFOD for 2019 and (ii) the stability of the unit failure rate o

r the past
years.

The B90% life and unavailability computing show that:

i.The FOF of the Brazilian thermal power plants is 3.33% with a 90% probability and a 95% confidence level, based on the data from the MFOD and unit failure rate data.

ii.This value is slightly higher than the upper limit applied by ANEEL for hydroelectric plants (higher value is 3.115%).

iii.The FOF of the Brazilian thermal power plants is also lower than 70% of North America thermal power plants (Fossil, Gas turbine, and Multi-boiler/multi-turbine types), based solely on the evaluated statistics.

The analysis of the operating scenario indicates that the combination of: (i) the historical second critical period in SIN (Nacional do Sistema, 2020 a) and (ii) the higher power demand in the f rst quarter of 2019 (Nacional do Sistema, 2020 a) (Figure 3), have contributed to the increase in the MFOD.Furthermore, there has been no significant difference in the annual unit failure rate since 2016, assuming that the groups had equal variation (homoscedasticity) and applying the KW test (p > 0.05).In addition, the FOF of the Brazilian thermal power plants (3.33% at 90% probability and 95% confidence level) is comparable t that from 2436 power plants from the NERC (NERC, 2018), indicating that the Brazilian thermal generation is a reliable system regardless of the challenges due to (i) the type of starting, (ii) the frequency of starting, and (iii) the loading pattern (Dipak, 2015a).


Conclusion

Brazil is a continental-size country and its SIN is characterized by a marked seasonality in its electricity supply.

The expansion pattern of the Brazilian electric sector shows signs of exhaustion, with the inclusion of the ROR and intermittent renewable sources.In addition, 50% of the Brazilian hydroelectric plant are over 20 years old and 32% are over 40 years old.Therefore, there is an increase in the demand for flexible thermal power plants based on availability that can be operated in electric dispatch mode.Therefore, the knowledge acquired from the statistical analysis of forced outages is fundamental to understand the availability of Brazilian thermal power plants, to ensure the security risk mitigation to supplying the national power demand.

Assuming the homoscedasticity, the KW test and Dunn's pairwis

comparisons are valuable non-parame
ric approaches for evaluating the differences in the population's distributions for forced outage data.The tests were applied for available forced outage data provided b