Journal of Geographical Sciences
Machine learning-based identification for the main influencing factors of alluvial fan development in the Lhasa River Basin, Qinghai-Tibet Plateau
Chen Tongde (1993-), PhD Candidate, specializing in soil erosion and land quality evaluation. E-mail: xnctd2015@126.com
Received date: 2021-10-01
Accepted date: 2022-02-14
Online published: 2022-10-25
Supported by
The Strategic Priority Research Program of Chinese Academy of Sciences (XDA20040202)
The Second Tibetan Plateau Scientific Expedition and Research Program (STEP) (2019QZKK0603)
Alluvial fans have become an important land resource on the Qinghai-Tibet Plateau as human activities expand. However, the factors that control alluvial fan development are poorly understood. According to our previous investigation and research, approximately 826 alluvial fans exist in the Lhasa River Basin (LRB). The main purpose of this work is to identify the main influencing factors of alluvial fan development by using machine learning. A development index (Di) of alluvial fans was created by combining fan area, perimeter, height and gradient. Models were trained and built with 72% of the data, which comprised Di and 11 types of environmental parameters of the catchment matching each alluvial fan, using 10 commonly used machine learning algorithms. A further 18% of the data were used to validate the models, and the remaining 10% were used to test model accuracy. The feature importance of the model was used to illustrate the significance of the 11 types of environmental parameters to Di. The primary modelling results showed that the accuracy (R²) of the ensemble models, including Gradient Boost Decision Tree, Random Forest and XGBoost, was not less than 0.5. The accuracy of Gradient Boost Decision Tree and XGBoost improved after grid search, with R² values of 0.782 and 0.870, respectively. XGBoost was selected as the final model because of its optimal accuracy and its generalisation ability at the sites closest to the LRB. Morphology parameters are the main factors in alluvial fan development, with a cumulative relative feature importance of 74.60% in XGBoost. The final model is expected to achieve better accuracy and generalisation ability after training samples from other regions are added.
Key words: alluvial fan; machine learning; feature importance; XGBoost; Lhasa River Basin
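The workflow summarised in the abstract (a 72%/18%/10% split, a grid search over hyperparameters, and a feature importance ranking) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are randomly generated rather than taken from the study, scikit-learn's `GradientBoostingRegressor` stands in for the Gradient Boost Decision Tree and XGBoost models, the hyperparameter grid is hypothetical, and selecting the best grid point by its validation-set R² is one plausible reading of how the validation data were used.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import ParameterGrid, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's data): 826 samples, 15 columns
# labelled with the catchment abbreviations; the values are random.
names = ["CA", "CP", "CSGa", "CSA1", "CSA2", "CSA3", "CSA4", "CR",
         "CRR", "CDD", "CSC", "RHa", "VIa", "Ra", "GSa"]
X = rng.normal(size=(826, len(names)))
# Hypothetical target: depends on the first two columns plus noise.
di = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=826)

# 10% held out for testing; the remaining 90% is split 80/20,
# which gives 72% training and 18% validation overall.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, di, test_size=0.10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.20, random_state=0)

# Hypothetical grid; each candidate is fitted on the training set
# and scored by R² on the validation set.
param_grid = {"n_estimators": [100, 200],
              "max_depth": [2, 3],
              "learning_rate": [0.05, 0.1]}
best_score, best_model = -np.inf, None
for params in ParameterGrid(param_grid):
    model = GradientBoostingRegressor(random_state=0, **params)
    model.fit(X_train, y_train)
    score = r2_score(y_val, model.predict(X_val))
    if score > best_score:
        best_score, best_model = score, model

# Final accuracy on the untouched test set.
test_r2 = r2_score(y_test, best_model.predict(X_test))

# Relative feature importance in percent (sums to 100).
importance = 100.0 * best_model.feature_importances_
ranking = sorted(zip(names, importance), key=lambda t: -t[1])

print(f"validation R2 = {best_score:.3f}, test R2 = {test_r2:.3f}")
for name, pct in ranking[:5]:
    print(f"{name}: {pct:.2f}%")
```

Keeping the test set out of both fitting and model selection means the reported test R² is not inflated by the grid search.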
CHEN Tongde, WEI Wei, JIAO Juying, ZHANG Ziqi, LI Jianjun. Machine learning-based identification for the main influencing factors of alluvial fan development in the Lhasa River Basin, Qinghai-Tibet Plateau[J]. Journal of Geographical Sciences, 2022, 32(8): 1557-1580. DOI: 10.1007/s11442-022-2010-9
Figure 1 Location of the Lhasa River Basin (LRB)
Figure 2 Distribution of alluvial fans in the Lhasa River Basin
Figure 3 Typical alluvial fan and its matching catchment in the Lhasa River Basin
Figure 4 Flow chart of modelling
Figure 5 Conceptual map of the developmental state of alluvial fans. Height 1 is greater than Height 2, and Area 1 is smaller than Area 2. State a is unsteady: the alluvial fan is easily eroded by external forces because its area is small and its height is great. State d is steadier than states a, b and c because of its large area and great height.
Table 1 Brief information on the four dependent parameters
No. | Alluvial fan parameter | Abbreviation | Unit | Range |
---|---|---|---|---|
1 | Area | Fa | km² | 0.05-82.99 |
2 | Perimeter | Fp | km | 0.95-50.59 |
3 | Average gradient | Fg | ° | 2.32-23.34 |
4 | Height | Fh | m | 19-570 |
Table 2 Brief information on the 15 independent parameters of the matching catchments of alluvial fans
No. | Name of parameter | Abbreviation | Unit | Range |
---|---|---|---|---|
1 | Catchment area | CA | km² | 0.16-490.77 |
2 | Catchment perimeter | CP | km | 1.51-125.78 |
3 | Catchment average slope gradient | CSGa | ° | 4.91-36.34 |
4 | Sunny slope percentage | CSA1 | % | 0-100 |
5 | Half-sunny slope percentage | CSA2 | % | 0-100 |
6 | Shady slope percentage | CSA3 | % | 0-100 |
7 | Half-shady slope percentage | CSA4 | % | 0-100 |
8 | Catchment relief | CR | m | 80-2283 |
9 | Catchment relief ratio | CRR | m/m | 0.35-1 |
10 | Catchment drainage density | CDD | km/km² | 2.61-10.65 |
11 | Catchment shape coefficient | CSC | / | 1.08-2.49 |
12 | Catchment average rock hardness | RHa | / | 1-5 |
13 | Catchment average NDVI | VIa | / | 0.14-0.80 |
14 | Average annual rainfall | Ra | mm | 409-759 |
15 | Average annual glacier and snow cover | GSa | km² | 0-0.01 |
Figure 6 Geological map of the Lhasa River Basin
Figure 7 Distribution of rock hardness in the Lhasa River Basin
Table 3 Correlation analyses between Di and independent parameters
Parameter | Di | CA | CP | CSGa | CSA1 | CSA2 | CSA3 | CSA4 | CR | CRR | CDD | CSC | RHa | VIa | Ra | GSa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Di | 1.000 | 0.580** | 0.641** | 0.057 | 0.097** | -0.004 | 0.028 | -0.003 | 0.190** | 0.198** | -0.076* | 0.527** | -0.249** | -0.256** | -0.208** | 0.126** |
CA | | 1.000 | 0.979** | 0.026 | 0.088* | 0.030 | 0.052 | 0.024 | 0.729** | -0.117** | 0.149** | 0.296** | -0.127** | -0.159** | -0.126** | 0.236** |
CP | | | 1.000 | 0.033 | 0.093** | 0.024 | 0.051 | 0.020 | 0.686** | -0.080* | 0.106** | 0.468** | -0.144** | -0.191** | -0.160** | 0.189** |
CSGa | | | | 1.000 | 0.051 | -0.007 | -0.049 | -0.009 | 0.065 | 0.017 | 0.027 | 0.067 | -0.025 | 0.048 | -0.136** | -0.013 |
CSA1 | | | | | 1.000 | 0.541** | -0.866** | -0.803** | 0.033 | 0.046 | -0.022 | 0.056 | -0.425** | -0.131** | -0.100** | 0.024 |
CSA2 | | | | | | 1.000 | -0.743** | -0.427** | 0.025 | 0.022 | -0.047 | -0.028 | -0.267** | -0.173** | 0.005 | 0.032 |
CSA3 | | | | | | | 1.000 | 0.578** | 0.033 | -0.046 | -0.025 | 0.018 | 0.377** | 0.108** | 0.031 | -0.003 |
CSA4 | | | | | | | | 1.000 | 0.024 | -0.042 | 0.034 | 0.005 | 0.362** | 0.104** | 0.102** | 0.008 |
CR | | | | | | | | | 1.000 | -0.481** | 0.351** | 0.095** | -0.073* | -0.047 | -0.086* | 0.121** |
CRR | | | | | | | | | | 1.000 | 0.129** | 0.123** | -0.011 | -0.026 | -0.061 | -0.031 |
CDD | | | | | | | | | | | 1.000 | -0.121** | 0.084* | 0.108** | -0.028 | 0.040 |
CSC | | | | | | | | | | | | 1.000 | -0.146** | -0.193** | -0.221** | -0.130** |
RHa | | | | | | | | | | | | | 1.000 | 0.267** | 0.146** | -0.140** |
VIa | | | | | | | | | | | | | | 1.000 | 0.074* | -0.343** |
Ra | | | | | | | | | | | | | | | 1.000 | 0.119** |
GSa | | | | | | | | | | | | | | | | 1.000 |
Note: * indicates a significant correlation at the 0.05 level, and ** indicates a significant correlation at the 0.01 level. Bold font marks correlations greater than 0.7. The Spearman correlation analysis was conducted in SPSS 19.0 (SPSS Inc., Chicago, USA).
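A Spearman correlation screen of the kind reported in Table 3 can also be reproduced outside SPSS. The sketch below is illustrative only: it uses `scipy.stats.spearmanr` on randomly generated columns (not the study's data), the column `noise` is hypothetical, and comparing the absolute value of the coefficient with 0.7 is an assumption about how the threshold in the note is applied.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n = 200

# Synthetic stand-in columns (NOT the study's data).
ca = rng.lognormal(size=n)                                  # "catchment area"
cp = np.sqrt(ca) * np.exp(rng.normal(scale=0.05, size=n))   # nearly monotone in ca
noise = rng.normal(size=n)                                  # unrelated column
columns = {"CA": ca, "CP": cp, "noise": noise}


def stars(p_value):
    """Significance markers matching the table note: ** for 0.01, * for 0.05."""
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# Upper triangle only, as in the table layout.
results = {}
labels = list(columns)
for i, a in enumerate(labels):
    for b in labels[i + 1:]:
        rho, p_value = spearmanr(columns[a], columns[b])
        results[(a, b)] = (rho, p_value)
        flag = " (|rho| > 0.7)" if abs(rho) > 0.7 else ""
        print(f"{a} vs {b}: {rho:.3f}{stars(p_value)}{flag}")
```

Pairs flagged by the threshold carry largely overlapping information, which is worth keeping in mind when reading feature importance values for strongly correlated parameters.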
Figure 8 Sample sites of alluvial fans in the Qinghai-Tibet Plateau (QTP)
Figure 9 Primary results of different types of models
Figure 10 Optimisation results of different types of models
Table 4 Testing results of the generalisation ability of the alternative models
River basin | Gradient Boost Decision Tree (R²) | XGBoost (R²) |
---|---|---|
Danupu River Basin | 0.569 | 0.670 |
Niyangqu River Basin | 0.277 | 0.389 |
Bayin River Basin | 0.093 | 0.297 |
Figure 11 Relative feature importance of the independent parameters