Quality control and homogenization of daily meteorological data in the trans-boundary region of the Jhelum River basin

Rashid MAHMOOD; JIA Shaofeng

doi:10.1007/s11442-016-1351-7

Journal of Geographical Sciences, 2016, Vol. 26, Issue 12: 1661-1674

Original Article

Key Laboratory of Water Cycle and Related Land Surface Processes, Institute of Geographic Science and Natural Resources Research, CAS, Beijing 100101, China

Author: Rashid Mahmood, E-mail: rashi1254@gmail.com; Jia Shaofeng, E-mail: jiasf@igsnrr.ac.cn

Received date: 2015-07-22

Accepted date: 2015-10-29

Online published: 2016-12-20

Supported by

National Natural Science Foundation of China, No.41471463

President’s International Fellowship Initiative CAS

Abstract

Many studies, such as those on climate variability, climate change, trend analysis, hydrological design and agricultural decision-making, require long-term homogeneous datasets. Since homogeneous climate data are not available for climate analysis in Pakistan and India, the present study focuses on extensive quality control and homogenization of daily maximum temperature, minimum temperature and precipitation data in the Jhelum River basin, Pakistan and India. A combination of different quality control methods and relative homogeneity tests was applied to achieve this objective. To check the improvement after homogenization, correlation coefficients between the test and reference series calculated before and after the homogenization process were compared. About 0.59%, 0.78% and 0.023% of the total data values were detected as outliers in maximum temperature, minimum temperature and precipitation data, respectively. About 32% of maximum temperature, 50% of minimum temperature and 7% of precipitation time series in the Jhelum River basin were inhomogeneous. After quality control and homogenization, an improvement of 1% to 11% was observed in the affected climate variables. This study concludes that the daily precipitation time series are fairly homogeneous, except at two stations (Naran and Gulmarg), and of good quality. However, the maximum and minimum temperature datasets require extensive quality control and homogeneity checks before they are used in climate analysis in the Jhelum River basin.

Key words: quality control; homogenization; daily meteorological data; Jhelum River basin; Pakistan

Cite this article

Rashid MAHMOOD, JIA Shaofeng. Quality control and homogenization of daily meteorological data in the trans-boundary region of the Jhelum River basin[J]. Journal of Geographical Sciences, 2016, 26(12): 1661-1674. DOI: 10.1007/s11442-016-1351-7

1 Introduction

High-quality, homogeneous long-term data series are essential in climate research, especially in climate change studies, where they are used to assess climate variability and historical trends in mean and extreme climate events. However, most long climatic series not only contain outliers and missing values but are also inhomogeneous (Cao and Yan, 2012; Trewin, 2013). Homogeneous climate time series are those in which variations are caused solely by variations in climate and not by non-climatic factors. Potential non-climatic factors include changes in instruments, changes in surroundings, relocation of monitoring stations and changes in observation methods (Li-Juan and Zhong-Wei, 2012; Štěpánek et al., 2013). These factors may hide the true signals of climate variability and climate change, leading to wrong conclusions in climate and hydrological studies (Costa and Soares, 2009). They are discussed in more detail in Peterson et al. (1998), Aguilar et al. (2003) and Trewin (2010).

The climatic series that span from decades to centuries are rarely free of irregularities, errors and missing values. Although some specific inhomogeneous sites have only a marginal effect on the observed climate trends at the global scale, they can have substantial impact at the local or regional scale (Trewin, 2013). Thus, it is essential to produce homogeneous and quality controlled climate records before using them in climate analysis (Costa and Soares, 2009).

Several techniques, such as the Buishand range test (Buishand, 1982), Kruskal-Wallis test (Kruskal, 1952; Kruskal and Wallis, 1952), Mann-Kendall test (Mann, 1945; Kendall, 1975), Multiple Analysis of Series for Homogenization (MASH) (Szentimrey, 1999), Pettitt test (Pettitt, 1979), regression-based methods (Easterling and Peterson, 1995; Vincent, 1998) and Standard Normal Homogeneity Test (SNHT) (Alexandersson, 1986), have been developed to detect irregularities at a site and adjust for them.

There are two main groups of homogeneity testing techniques: ‘absolute’ and ‘relative’. In absolute methods, the statistical tests are applied to each time series separately. In relative methods, the statistical tests are applied to the difference between the test series (the time series under consideration) and a reference series created from highly correlated stations in the region. Although both approaches are useful and valid for detecting an inhomogeneity, the relative approach is more reliable than the absolute one because it also considers changes at neighbor stations in the region (Peterson et al., 1998).

In homogenization, inhomogeneities are first identified in a time series and then adjusted to make the series homogeneous (Trewin, 2013). Although several techniques are available, no single procedure is recommended.

Thus, the following four steps are commonly used to detect and adjust an inhomogeneous site: 1) basic quality control and metadata analysis, 2) reference series creation, 3) inhomogeneity detection and 4) adjustment for the compensation of inhomogeneity (Costa and Soares, 2009).

Many countries such as Australia (Trewin, 2013), Spain (Vicente-Serrano et al., 2010), Croatia (Zahradníček et al., 2014), Czech Republic (Štěpánek et al., 2009) and China (Feng et al., 2004) have developed homogenized meteorological datasets for climate analysis. However, in Pakistan and India, no quality controlled and homogeneous datasets are available for climate research. Thus, in the present study, quality controlled and homogeneous daily maximum temperature, minimum temperature and precipitation datasets are developed for the Jhelum River basin, Pakistan and India. This will provide an unprecedented resource for climate and climate change research in Pakistan. Station characteristics and data sources are described in Section 2. In Section 3, quality control and homogenization techniques are outlined. The main results and discussion are described in Section 4 and conclusions in Section 5.

2 Study area and data description

The upper Jhelum River basin is located in the north of Pakistan and spans 33°-35°N and 73°-75.62°E, as shown in Figure 1. It is the second largest tributary of the Indus River basin. The Jhelum basin has a drainage area of 33,342 km², with elevations ranging from 200 to 6248 m. The whole basin drains into the Mangla Reservoir, the second largest reservoir in Pakistan, which was constructed in 1967. The primary function of this reservoir is to provide water for the irrigation of 6 million ha of land and to produce electricity as a byproduct. Its installed capacity is 1000 MW, which is 6% of the installed capacity of the country’s power production (Archer and Fowler, 2008; Mahmood and Babel, 2013).

Figure 1 Location of the study area and geographic distribution of weather stations

Observed daily historical data of maximum temperature (22 weather stations), minimum temperature (22) and precipitation (27) were collected from the Pakistan Meteorological Department (PMD), the Water and Power Development Authority of Pakistan (WAPDA) and the Indian Meteorological Department (IMD). The daily data of the Gulmarg, Kupwara, Qazigund and Srinagar weather stations were obtained from IMD. The PMD provided climate data for the Astore, Balakot, Garidopatta, Kotli, Muzaffarabad, Murree and Jhelum climate stations, and the remaining data were collected from WAPDA. Most of the precipitation series cover 1961-2009, whereas most of the temperature series cover 1971-2009. The geographic distribution of these stations is shown in Figure 1: most of the stations are located in the eastern parts of the basin and at lower altitudes. Basic information about the stations, such as location, mean distance between stations, mean altitudinal difference between stations, available data period and missing data at each station, is given in Table 1.

Table 1 Geographic and basic information about the climate stations available in the Jhelum River basin

3 Methodology

3.1 Quality control

The primary purpose of quality control is to treat outliers before any homogenization approach is applied, because outliers can mislead the homogenization results (González-Rouco et al., 2001; Štěpánek et al., 2013; Zahradníček et al., 2014). There is no generally recommended methodology for quality control of meteorological data. Thus, in the present study, a combination of the methods applied in Feng et al. (2004), Štěpánek et al. (2013) and Zahradníček et al. (2014) was used to identify erroneous data resulting from observation sources and digitization.

3.1.1 Extreme value check

In this method, daily values of a variable such as temperature are compared with the global and/or local historically observed extreme values of that variable. Data values that are greater than the highest or less than the lowest observed value of a variable are considered erroneous. These values are adjusted or removed from the data before subsequent quality control (Feng et al., 2004). In the present study, the daily records were compared with local temperature and precipitation extremes (PMD, 2014; Atta Ur and Shaw, 2015) to check for outliers in the data series. These local extreme values for temperature and precipitation are presented in Table 2.

Table 2 Local extremes of Tx, Tn and Pr

Variable	High Extreme	Low Extreme	Source
Tx (°C)	53.5	-24.1	(PMD, 2014; Atta Ur and Shaw, 2015)
Tn (°C)	53.5	-24.1	(PMD, 2014; Atta Ur and Shaw, 2015)
Pr (mm)	668	-	(PMD, 2014)

3.1.2 Internal consistency check

Reek et al. (1992) concluded that errors in data series are mostly due to digitizing, unit differences, typos, different ways of reporting data, etc., and developed eight rules to check for erroneous data in meteorological time series. In the present study, the following three rules were used to check the daily time series, as in Feng et al. (2004): 1) the internal consistency check detects errors such as Tx being lower than Tn; 2) the flat-liner check identifies identical data values on at least seven consecutive days (not applied to zero values of Pr); and 3) the excessive diurnal temperature range check (Tx-Tn>53.5°C) detects extraordinarily large daily temperature ranges. Since no highest diurnal temperature range was found in the literature for Pakistan, a value of 53.5°C, the highest maximum temperature in Pakistan, was used as the upper limit of the diurnal temperature range in the present study. Values of Tx-Tn exceeding 53.5°C are identified as outliers.
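The three rules can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the text, not the software used in the study; the function names and the convention of marking missing days with `None` are assumptions made for the example.

```python
from itertools import groupby

MAX_DTR = 53.5  # deg C, upper limit of the diurnal range quoted in the text


def tx_below_tn(tx, tn):
    """Rule 1: indices of days on which Tx is lower than Tn."""
    return [i for i, (hi, lo) in enumerate(zip(tx, tn))
            if hi is not None and lo is not None and hi < lo]


def flat_lines(values, min_run=7, skip_zero=False):
    """Rule 2: (start index, length) of runs of identical consecutive values.

    Set skip_zero=True for precipitation, where runs of zeros are normal.
    """
    runs, start = [], 0
    for value, group in groupby(values):
        length = len(list(group))
        is_dry_run = skip_zero and value == 0
        if value is not None and length >= min_run and not is_dry_run:
            runs.append((start, length))
        start += length
    return runs


def excessive_dtr(tx, tn, limit=MAX_DTR):
    """Rule 3: indices of days on which Tx - Tn exceeds the limit."""
    return [i for i, (hi, lo) in enumerate(zip(tx, tn))
            if hi is not None and lo is not None and hi - lo > limit]
```

Each function only flags candidate errors; how a flagged value is corrected or removed is a separate decision, as described in Section 4.2.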

3.1.3 Temporal outlier check

The above methods can detect some obvious errors in the data series. However, they cannot identify errors where a data value differs significantly from the previous or the following value in the same time series (Feng et al., 2004). To detect this kind of outlier, Tukey’s method, known as the interquartile range (IQR) method (Tukey, 1977), was used in the present study, as in González-Rouco et al. (2001), Štěpánek et al. (2013) and Zahradníček et al. (2014). There are three main steps: 1) calculate the IQR, which is the difference between the third quartile (Q3) and the first quartile (Q1); 2) calculate the lower and upper limits by subtracting 1.5×IQR from Q1 and adding 1.5×IQR to Q3, respectively; 3) treat values beyond these limits as possible outliers. If an IQR coefficient of 3 is used instead of 1.5 to calculate the limits, the values beyond them are considered the most probable outliers. Because it relies on quartiles, this method is more resistant to extreme values than methods such as the Z-score and standard deviation methods, which use the mean and/or standard deviation to detect outliers (Tukey, 1977; González-Rouco et al., 2001; Seo, 2006). In the present study, the method was applied to the differences between the test series (a specific station) and the reference series (Section 3.1.5) for the detection of erroneous data. An IQR coefficient of 2 was used to give more assurance about outliers.
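As a minimal sketch of these steps, the following Python code computes Tukey limits with a coefficient of 2 and flags values of the test-minus-reference difference series that fall outside them. The function names are illustrative, the series are assumed to contain no missing values, and the quartile convention (`method="inclusive"`) is an assumption, since the text does not say which convention was used.

```python
from statistics import quantiles


def tukey_limits(values, coef=2.0):
    """Lower and upper limits Q1 - coef*IQR and Q3 + coef*IQR."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - coef * iqr, q3 + coef * iqr


def temporal_outliers(test, reference, coef=2.0):
    """Indices where (test - reference) lies outside the Tukey limits."""
    diffs = [t - r for t, r in zip(test, reference)]
    low, high = tukey_limits(diffs, coef)
    return [i for i, d in enumerate(diffs) if d < low or d > high]


# Example: the last value of the test series departs strongly from the reference.
test = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 40.0]
reference = [9.0] * 9
print(temporal_outliers(test, reference))  # [8]
```

In the example the differences are 1 to 8 and 31, so Q1 = 3, Q3 = 7 and the limits are -5 and 15; only the last value is flagged.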

3.1.4 Spatial outlier check

This method is used to detect outliers that are not detected by the previously mentioned methods, by comparing the values of the test station with those of neighbor stations (Feng et al., 2004). Since no single method is generally recommended for spatial outliers (Štěpánek et al., 2013), a combination of several methods was applied in the present study, as in Štěpánek et al. (2013) and Zahradníček et al. (2014). For this purpose, the ProclimDB software developed by Štěpánek et al. (2010), a fully automated package for quality control of climate data, was used. Of the several methods it offers for detecting spatial outliers, the following were used in the present study:

1) Pairwise comparison method. In this method, series of differences between the test and neighbor stations are created and standardized. Cumulative distribution functions (CDFs) are calculated for each difference series. If the average CDF exceeds the critical value (0.95), that value is considered an outlier. In other words, if the difference between the values at the test and neighbor stations is statistically significant (α=0.05), the value at the test station is considered an outlier.

2) Inter quartile range method. In this method, limits (higher and lower) are calculated from the neighbor stations and applied to the test series to find out the outliers. In the present study, a value of 2 (Tukey’s coefficient) was used during calculation of limits.

3) Technical series method. A technical (theoretical) series is created from neighbor stations by means of some statistical methods for spatial data (e.g., kriging and IDW). Then, this series is compared with the test series at a significance level of 0.5.

In the present study, five highly correlated neighbor stations, as discussed below, were used to create theoretical (technical) series for calculating limits for IQR method and for pairwise comparison method.

3.1.5 Creation of reference series

A change in a climatic time series may be considered an inhomogeneity in a dataset, but it may also result from a change in local or regional climate (Peterson et al., 1998). Several techniques have been introduced to overcome this problem. Most of them use data from highly correlated nearby stations in the region to establish a new time series (called a reference series) as a descriptor of regional climate. In the present study, the technique used in Zahradníček et al. (2014) and Štěpánek et al. (2013) was applied to create a reference series for each variable at each site. The first step is to select neighbor stations, either by distance or by correlation. Correlation coefficients can be calculated either from raw station data or from first-order differences. In the present study, five highly correlated neighbor stations were selected, with distance restricted to 150 km and altitude difference to 600 m. The datasets of these stations were then standardized with the mean and standard deviation. Finally, the inverse distance weighting (IDW) method, equation 1, was used to average the five selected standardized neighbors into the reference series.

$$y_{\mathrm{ref}} = \frac{\sum_{i=1}^{n} y_i / d_i^{\,p}}{\sum_{i=1}^{n} 1 / d_i^{\,p}} \qquad (1)$$

where $y_{\mathrm{ref}}$ is the reference series; $y_i$ is the series of neighbor station $i$; $d_i$ is the distance between the test station and neighbor station $i$; $n$ is the number of neighbor stations; and $p$ is the power of distance (the higher the value of $p$, the greater the weight of the nearest neighbor station). In this study, powers of 0.5 and 1, as recommended in the manual of the ProclimDB (Processing of Climatological Database) software (Štěpánek, 2010), were used to create the reference series for temperature (Tx and Tn) and precipitation, respectively.
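The inverse distance weighting step can be illustrated with a short Python sketch. It assumes complete neighbor series of equal length, uses the sample standard deviation for standardization (the text does not specify which estimator was used), and the function names are illustrative rather than part of ProclimDB.

```python
from statistics import mean, stdev


def standardize(series):
    """Subtract the mean and divide by the (sample) standard deviation."""
    mu, sigma = mean(series), stdev(series)
    return [(x - mu) / sigma for x in series]


def idw_reference(neighbors, distances, power):
    """Weighted average of neighbor series with weights 1 / distance**power.

    neighbors: list of equally long series, one per neighbor station
    distances: distance from the test station to each neighbor (e.g. in km)
    power:     0.5 for Tx and Tn, 1 for Pr in the study
    """
    weights = [1.0 / d ** power for d in distances]
    total = sum(weights)
    n_days = len(neighbors[0])
    return [sum(w * s[t] for w, s in zip(weights, neighbors)) / total
            for t in range(n_days)]


# Two hypothetical neighbors at 1 km and 4 km with power 0.5:
# the weights are 1 and 0.5, so the nearer station counts twice as much.
reference = idw_reference([[1.0, 2.0], [3.0, 6.0]], [1.0, 4.0], power=0.5)
```

With a power of 0.5 the weights fall off slowly with distance, so distant temperature stations still contribute; a power of 1 concentrates more weight on the nearest stations, which suits the shorter correlation length of precipitation.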

3.2 Homogenization

The presence of inhomogeneities is a common problem in climate time series. Most of them are related to abrupt changes in average values, but they also appear as changes in the trend of a time series. These irregularities in climate data can distort the results and lead to wrong conclusions (Vicente-Serrano et al., 2010). Thus, for a meaningful climate analysis, the climate data must be homogeneous (Štěpánek et al., 2009). An ideal way to deal with such irregularities is to examine the station’s metadata, which record historical information about the station such as relocation, instrument changes and the types of instruments used. After an inhomogeneity is detected through the metadata, the temporal variation of the inhomogeneous dataset can be compared with the variation at a neighbor station or with the regional climate variation. However, complete metadata are usually not available for all stations in a region. Thus, alternative subjective and objective methods are used to check homogeneity (Feng et al., 2004). These methods are reviewed comprehensively in Peterson et al. (1998) and Costa and Soares (2009). They generally involve the following steps: 1) creation of a reference series for comparison with the test series for relative homogenization, 2) application of statistical tests to detect irregularities and 3) homogenization, i.e., adjustments to compensate for inhomogeneities and imputation of missing data. Since each statistical test gives results with some degree of uncertainty because of noise in the time series (Zahradníček et al., 2014), a combination of different tests is considered more effective in uncovering data inhomogeneity.

Thus, in this study, three relative homogeneity tests were applied: SNHT (Alexandersson, 1986), the Maronna and Yohai bivariate test (Maronna and Yohai, 1978; Potter, 1981) and the Easterling & Peterson test (Easterling and Peterson, 1995). Reference series for each test station were created from five highly correlated neighbor stations. If the series are long, e.g., more than 70 years, they can be divided into segments of 40 years with an overlap of 10 years, which is recommended for the SNHT to perform properly. Since there is a lack of methods for detecting inhomogeneities directly in daily time series (Vicente-Serrano et al., 2010), these tests were applied to the monthly, seasonal and annual time series, as in Feng et al. (2004), Vicente-Serrano et al. (2010), Zahradníček et al. (2014) and Štěpánek et al. (2013). This approach is commonly used for inhomogeneity detection.

In ProclimDB, the main criterion for identifying the year of a breakpoint (abrupt change) is the probability of detection (PD) of a given year. This is the ratio of the total number of breakpoints detected for a given year by all tests to the total number of theoretically possible detections across all tests. PD values exceeding 10% and 20% (recommended by Štěpánek et al., 2010) are used to identify potential inhomogeneities in precipitation and temperature, respectively. The same values were used in the present study. Before the final decision was taken, the breakpoints were also examined graphically to reduce uncertainty.
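The PD criterion amounts to a simple count, as the following Python sketch shows. It is not ProclimDB code: the function names are illustrative, and the example figure of 51 possible detections (3 tests applied to 12 monthly, 4 seasonal and 1 annual series) is a hypothetical way of counting the denominator.

```python
from collections import Counter


def probability_of_detection(detected_years, n_possible):
    """PD in percent for each year that was flagged at least once.

    detected_years: one entry per detection, pooled over all tests and series
    n_possible:     number of theoretically possible detections of one year
    """
    counts = Counter(detected_years)
    return {year: 100.0 * count / n_possible for year, count in counts.items()}


def candidate_breakpoints(detected_years, n_possible, threshold_pct):
    """Years whose PD exceeds the threshold (10 for Pr, 20 for Tx and Tn)."""
    pd_by_year = probability_of_detection(detected_years, n_possible)
    return sorted(year for year, pd in pd_by_year.items() if pd > threshold_pct)


# Hypothetical detections pooled from all tests for one station.
detections = [1989] * 12 + [2001] * 6 + [1975] * 3
print(candidate_breakpoints(detections, n_possible=51, threshold_pct=20))  # [1989]
```

Here 1989 is flagged in 12 of 51 possible cases (about 24%), so it passes the 20% temperature threshold, whereas 2001 (about 12%) would pass only the 10% precipitation threshold.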

The inhomogeneous series were corrected on a daily scale. The daily adjustments were calculated based on the reference series and smoothed by a low-pass filter because this better reflects the physical properties of the time series (Figure 1). Fifteen years of data on each side of the breakpoint were used to calculate the adjustments.

In the adjustment calculation, the difference series between the test and reference series are first calculated before and after the breakpoint. The adjustments are then calculated as the difference between the difference series before the breakpoint and the difference series after the breakpoint. These adjustments can be smoothed by a low-pass filter, high-pass filter or moving average (Štěpánek et al., 2013).
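A rough Python sketch of this calculation is given below. Several choices are assumptions made for illustration and are not taken from the text: the inputs are mean test-minus-reference differences for each calendar day, the adjustment is defined as "after minus before" and added to the segment before the breakpoint, and a circular moving average stands in for the smoothing step (the study itself used a low-pass filter).

```python
def raw_adjustments(mean_diff_before, mean_diff_after):
    """Per-calendar-day adjustment: difference after minus difference before.

    Both inputs hold the mean (test - reference) difference for each day of
    the year, computed from the years on either side of the breakpoint.
    """
    return [after - before
            for before, after in zip(mean_diff_before, mean_diff_after)]


def smooth_circular(values, window):
    """Moving average that wraps around the end of the year (odd window)."""
    half = window // 2
    n = len(values)
    return [sum(values[(i + k) % n] for k in range(-half, half + 1))
            / (2 * half + 1) for i in range(n)]


def apply_to_pre_break(test_values, day_of_year, adjustments):
    """Add the adjustment for each day's calendar position (0-based index)."""
    return [x + adjustments[d] for x, d in zip(test_values, day_of_year)]
```

Under this sign convention the earlier segment is shifted so that its difference from the reference matches the later segment, which leaves the most recent observations unchanged.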

To validate the homogenization, correlation coefficients between the test and reference series are calculated for each month before and after the adjustments are applied, and the two sets of correlations are compared. If the correlation coefficients increase, the adjustments are accepted (Zahradníček et al., 2014). The same was done in the present study.
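This acceptance rule can be sketched as follows, assuming complete series of equal length for one month; the function names are illustrative and the Pearson coefficient is written out so that the example needs no external library.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def accept_adjustment(raw, adjusted, reference):
    """Return (accepted, change in correlation) for one month's series."""
    before = pearson(raw, reference)
    after = pearson(adjusted, reference)
    return after > before, after - before


# Hypothetical series: the first three raw values carry a +2 shift.
reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
raw = [3.0, 4.0, 5.0, 4.0, 5.0, 6.0]
adjusted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
accepted, change = accept_adjustment(raw, adjusted, reference)
```

In this toy case removing the shift raises the correlation with the reference from about 0.87 to 1, so the adjustment would be accepted.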

Missing data are a common problem in climate time series and must be considered when climate data are treated statistically. They can distort results and even prevent important analyses of the variable in question. Several statistical techniques have been developed to overcome this problem, ranging from simple methods, such as substituting a mean value, to very sophisticated techniques, such as multiple imputation. Their applicability depends mainly on the percentage of missing data. It is suggested that any method can be used if no more than 5% of values are missing, but that more sophisticated methods such as regression and multiple imputation must be applied if more than 5% are missing (Lo Presti et al., 2010). In the present study, a multiple imputation method, predictive mean matching (Heitjan and Little, 1991), was used because most of the stations available for this study have more than 5% missing data. At some stations, the missing percentage is even greater than 15% (Table 1). Predictive mean matching is a semi-parametric approach similar to the regression method, except that the missing values are imputed randomly. It ensures that the imputed values are plausible and may perform better than regression if the normality assumption is violated (Horton and Lipsitz, 2001).

4 Results and discussion

4.1 Spatial correlation

For homogenization and quality control of climate data, information about spatial correlations among climate stations is essential. Figure 2 shows the average correlation coefficient of each climate station with all the other stations in the study area. The correlations were calculated between stations for each variable (Tx, Tn and Pr) from the daily time series. For Tx, the highest correlation (0.95) was observed at Kallar, Kotli, Muzaffarabad and Srinagar and the lowest (0.79) at Bagh. For Tn, the highest correlation (0.96) was found at Mangla, Srinagar, Domel and Muzaffarabad and the lowest (0.85) at Dhudial. Among the Pr stations, Domel had the highest correlation (0.62) and Gulmarg the lowest (0.2). The spatial correlations for Tx and Tn were much stronger than those for Pr.

Figure 2 Average correlation coefficients between weather stations in the Jhelum River basin

Figure 3 shows the average correlations as a function of distance between climate stations for Tx, Tn and Pr. Highly correlated stations were found within 40 km. As expected, correlations decreased with increasing distance between stations. Beyond 40 km, the correlations for Pr decreased quickly, whereas those for temperature decreased only slightly.

Figure 3 Variation in correlation coefficients with respect to distance between weather stations in the Jhelum River basin

4.2 Quality control of daily data

In the present study, an extensive methodology comprising four checks (high/low extremes, internal consistency, temporal outliers and spatial outliers) was used to detect outliers. Table 3 shows the percentage of outliers detected by each quality control method in Tx, Tn and Pr, summed over all climate stations. Very few values, 6 (0.0019%), 24 (0.0076%) and 4 (0.001%), were identified by the high/low extreme check in Tx, Tn and Pr, respectively. These values were adjusted manually by examining the neighbor station values as well as the values preceding and following the affected value. A total of 846 (0.211%) values were identified as errors in the Tx and Tn time series during the internal consistency check. Among them, 354 (0.0558%) errors were identified by the Tx-lower-than-Tn rule, and no error was detected by the excessive diurnal range check. These errors were corrected by taking the average of the five neighbor stations and of the values preceding and following the affected value. The flat-liner check (seven identical consecutive values) detected 216 (0.0677%) values in Tx and 276 (0.0874%) in Tn. In this case, all values except the first of each group of identical consecutive values were removed from the datasets, as in Feng et al. (2004).

Table 3 Percentages of erroneous data in Tx, Tn and Pr time series during quality control in the Jhelum River basin

Method		Tx (%)	Tn (%)	Pr (%)
Total number of values processed		319100	315794	418499
High/Low extremes		0.0019	0.0076	0.0010
Internal consistency	Tx lower than Tn	0.0279	0.0279	‒
	Flat-liner	0.0677	0.0874	0.0000
	Excessive diurnal temperature range	0.0000	0.0000	‒
Temporal outliers		0.2407	0.2825	‒
Spatial outliers		0.2288	0.3467	0.0222
Total		0.5669	0.7521	0.0232

Tukey’s method detected 768 (0.241%) and 892 (0.283%) temporal outliers in Tx and Tn, respectively, and the spatial outlier methods identified 730 (0.229%) and 1095 (0.347%) errors. The errors detected by the temporal and spatial outlier checks were removed from the datasets before the homogenization process. Table 3 shows that the most affected variable is Tn and the least affected is Pr. Since, to the best of our knowledge, no studies on quality control have been reported for the Jhelum basin, these results were compared with those of Feng et al. (2004) for China. The Tx and Tn data were found to be more problematic than those of the Chinese stations. Nonetheless, the precipitation data are of high quality, comparable with the Chinese precipitation data.

4.3 Homogenization

Table 4 lists the stations with inhomogeneous datasets, the number of inhomogeneities in each climate variable and the years of the inhomogeneities. A total of 23 inhomogeneities (2 in Pr, 9 in Tx and 12 in Tn) were detected, and 20 station series (2 in Pr, 7 in Tx and 11 in Tn) were identified as inhomogeneous. This means that 28% of the 71 climatic series (22 Tx, 22 Tn and 27 Pr) were inhomogeneous. Most of the inhomogeneous series were Tx and Tn series, with 7 (32%) and 11 (50%) stations, respectively. Only 2 (7%) of the Pr series were detected as inhomogeneous. Some examples of inhomogeneous stations and detected breakpoints are shown in Figure 4. Since no homogenization studies were found for Pakistan, these results were compared with other studies, namely Feng et al. (2004) in China, Štěpánek et al. (2013) in the Czech Republic and Zahradníček et al. (2014) in Croatia, in which the percentage of inhomogeneous stations was 37%, 42% and 23%, respectively. However, those studies homogenized more climate variables than this study.

Table 4 Inhomogeneous stations and number of breakpoints in Tx, Tn and Pr in the Jhelum River basin

SR	Station	Tx (years)	Tn (years)	Pr (years)
1	Bagh	1970	1970, 2004
2	Balakot		1979
3	Dhudial	1989	1989
4	Domel	1969, 1989	1969
5	Gulmarg		1987	1968
6	Gharidopatta	1969
7	Kotli	1981, 1995	1970
8	Kupwara		1985
9	Mangla		1972
10	Murree	1989	1989
11	Muzaffarabad	1969	1969
12	Naran	1989		1988
13	Rawalakot		1990
	Stations having inhomogeneities	7	11	2
	Stations having inhomogeneities (%)	31.8	50.0	7.4

Figure 4 Detected inhomogeneities (a) in Tx on Gharidopatta weather station, (b) in Tn on Rawalakot and (c) in Pr on Naran, in the Jhelum River basin

When data series are detected as inhomogeneous, those data become questionable or invalid for climate change, climate variability and trend analysis (Feng et al., 2004). Since 28% of the station series overall are inhomogeneous, the inhomogeneities must be removed from the data series to make them useful for climate analysis in the Jhelum River basin. Therefore, corrections (adjustments) were calculated for each inhomogeneous climate station from the daily reference series and applied to the affected raw data to compensate for the breakpoints. An example of the adjustments calculated for the breakpoint in 1990 in Tn at Rawalakot station is shown in Figure 5.

Figure 5 Daily adjustments for the identified breakpoint in 1990 in Tn of Rawalakot station (shown in Figure 4b)

Figure 6 shows the changes in average correlation coefficients (CC) between the test and reference series before and after homogenization at the affected climate stations. These changes were calculated for each climate station and each climate variable (Tx, Tn and Pr), and the monthly change in CC was then averaged over all affected climate stations. An increasing (positive) change in CC means improvement after homogenization, and a decreasing (negative) change means no improvement in the data series. Almost all months showed a positive change, except September (Pr), October (Tx) and November (Pr). The improvement ranged from 2% to 11% in Tx, from 1% to 8% in Tn and from 0.1% to 3% in Pr, as shown in Figure 6. The maximum improvement was observed in March (Tx), August (Tn) and March (Pr). On the whole, the climate time series are improved after homogenization and can be used for further climate analysis.

Figure 6 Change in correlation coefficients (CC) between test and reference series before and after homogenization

5 Conclusions

In the present study, daily climate data (maximum temperature, minimum temperature and precipitation) of the Jhelum River basin were extensively quality controlled by applying a combination of methods (high/low extreme check, internal consistency check, and temporal and spatial outlier checks). Then, inhomogeneities were detected by a combination of relative homogeneity tests (Standard Normal Homogeneity Test (SNHT), bivariate test and Easterling & Peterson test) and adjusted by applying correction factors calculated from the reference series on a daily basis.

During quality control, 0.59%, 0.78% and 0.023% of the total data values were detected as outliers in the maximum temperature, minimum temperature and precipitation time series, respectively. During homogenization, 32% of the maximum temperature series, 50% of the minimum temperature series and 7% of the precipitation series in the Jhelum River basin were identified as inhomogeneous. After homogenization, the affected series were improved by 1% to 11%.

It was concluded that the daily precipitation time series are fairly homogeneous, except at two stations (Naran and Gulmarg), and of good quality. However, the maximum and minimum temperature datasets require extensive quality control and homogeneity checks before being used in climate analysis, especially in climate variability, climate change and trend analysis. The homogenized dataset will be used to assess the impact of climate change on the water resources of the Jhelum River basin in further studies.

The authors have declared that no competing interests exist.

References

1	Aguilar E, Auer I, Brunet M et al., 2003. Guidelines on Climate Metadata and Homogenization. WMO/TD No.1186. Geneva: World Meteorological Organization, 52 pp.

2	Alexandersson H, 1986. A homogeneity test applied to precipitation data. Journal of Climatology, 6: 661-675. doi: 10.1002/joc.3370060607.

