Willmott et al. (2011) proposed a new index, dr, and compared it against the mean absolute error (MAE), showing that it varies logically with MAE. However, it should instead be compared against a mean absolute relative error, because MAE can differ across samples/data sets while the mean absolute relative error remains the same (i.e., there is no change in relative model performance). In this study, the dr index does not follow the logical trend within a given data set, as seen in Table 2 (combined analysis), and is also ambiguous between different sets (1st-year and combined data) when judged against the PMARE value. Similar inconsistencies are observed for the random records (Table 4, recordings 1–3, with PMARE). In this case, the denominator is obtained by summing, over all points, the absolute deviations of both X and Y from the mean of X. The original version of the index was based on squared deviations but was later modified to use absolute deviations [15], on the argument that MAD (or MAE in this case, because it refers to errors between predictions and observations rather than deviations) is a more natural measure of average error and less ambiguous than RMSD (or RMSE) [12]. Another refinement of the index [16] removed the predictions from the denominator, but, as others have argued [14], this amounts to a rescaling of the coefficient of efficiency, and the interesting reference point is lost. Again, these indices do not meet the requirement of symmetry.
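The contrast between MAE and a relative error measure, and the behaviour of dr, can be sketched as follows. This is a minimal illustration, not the study's own code: the `dr` function assumes the c = 2 formulation of the refined index, and `pmare` is taken here to be the percent mean absolute relative error discussed above.

```python
import numpy as np

def mae(obs, pred):
    """Mean absolute error: scale-dependent, so it changes with the data set."""
    return np.mean(np.abs(pred - obs))

def pmare(obs, pred):
    """Percent mean absolute relative error: scale-free, so it can stay
    the same across data sets whose relative errors are identical."""
    return 100.0 * np.mean(np.abs(pred - obs) / np.abs(obs))

def dr(obs, pred, c=2.0):
    """Refined index of agreement dr, assuming the c = 2 formulation:
    the denominator sums the absolute deviations of the observations
    from their own mean."""
    num = np.sum(np.abs(pred - obs))
    den = c * np.sum(np.abs(obs - obs.mean()))
    if num <= den:
        return 1.0 - num / den
    return den / num - 1.0

obs = np.array([2.0, 4.0, 6.0, 8.0])
pred = np.array([2.2, 4.4, 6.6, 8.8])   # uniform +10% relative error

print(mae(obs, pred))        # ≈ 0.5
print(mae(10 * obs, 10 * pred))  # ≈ 5.0 — MAE changed with the scale
print(pmare(obs, pred))      # ≈ 10.0
print(pmare(10 * obs, 10 * pred))  # ≈ 10.0 — relative error unchanged
print(dr(obs, pred))         # ≈ 0.875
```

Comparing two data sets by MAE alone can thus rank them differently than a relative measure does, which is the inconsistency the text points at.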

To illustrate how the proposed index can be used in real case studies and how it compares to other metrics, some examples using real data are provided. Geophysical data are generally structured along the two or three spatial dimensions and the temporal dimension, which leads to time series of spatial data. It is often of interest to evaluate separately the evolution over time of spatial agreement and the patterns of temporal agreement, using dedicated protocols [23]. For brevity, the analysis here is limited: the first example focuses on the temporal agreement of satellite image time series, while the second illustrates the spatial agreement between different gross primary productivity (GPP) fields at a single point in time. Previous studies have provided comparable information on model evaluation indices (for specific models or in general), but there is no full standardization (nor concrete proposals), including for newly developed indices. The objective of this study is to review and evaluate the available indices for model performance evaluation and to identify a logical, interpretable and unique index for general use in model evaluation. R, R2 and RMSE have been found to be non-logical, ambiguous and prone to misinterpretation both by previous studies (despite being widely recommended among performance indicators) and in this study. When the RMSE is normalized by the mean of the measurements, it is sometimes called the scatter index (SI) (Zambresky 1989).
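The scatter index just mentioned is straightforward to compute; a minimal sketch (the function name is illustrative, not from the source):

```python
import numpy as np

def scatter_index(obs, pred):
    """Scatter index (SI): RMSE normalized by the mean of the measurements."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    rmse = np.sqrt(np.mean((pred - obs) ** 2))
    return rmse / np.mean(obs)

# A constant bias of 0.2 on measurements with mean 2.0 gives SI ≈ 0.1:
obs = np.array([1.0, 2.0, 3.0])
print(scatter_index(obs, obs + 0.2))  # ≈ 0.1
```

Because SI divides by the mean measurement, it is only meaningful for quantities on a ratio scale with a mean well away from zero.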

When the RMSE is normalized by a characteristic measure used to force the model, it is sometimes referred to as the operational performance index (OPI) (Ris et al. 1999). The OPI can, for example, provide an estimate of the skill of a shallow-water wave transformation model, using the incident wave height measured offshore as the normalizing scale. The four terms of the denominator can be represented geometrically, as shown in the Supplementary Information. Through the explicit addition of the covariance term, the index ensures that it equals zero whenever X and Y are negatively correlated, as in the figure.
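Assuming the OPI is, as described above, the RMSE divided by a forcing scale (here a hypothetical offshore wave height; names and values are illustrative only):

```python
import numpy as np

def opi(obs, pred, forcing_scale):
    """Operational performance index: RMSE normalized by a measure
    characteristic of the model forcing, e.g. the incident offshore
    wave height driving a shallow-water wave model."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    rmse = np.sqrt(np.mean((pred - obs) ** 2))
    return rmse / forcing_scale

# Hypothetical nearshore wave heights (m) and an offshore forcing height of 2.0 m:
obs = np.array([0.8, 1.0, 1.2])
pred = np.array([0.9, 1.1, 1.1])
print(opi(obs, pred, forcing_scale=2.0))  # ≈ 0.05
```

Unlike SI, the normalization here comes from the model's driving input rather than from the observations themselves, so the OPI relates the error to the energy put into the model.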