Plant and Animal Ecology

Vegetation Coverage Inversion based on Combined Active and Passive Remote Sensing: A Case Study of the Baiyangdian-Daqinghe Basin

  • YANG Jin 1, 2,
  • SHI Mingchang 1, 2, *,
  • YANG Jianying 1, 2,
  • CHENG Fu 3,
  • YU Hongfeng 1, 2
  • 1. College of Soil and Water Conservation, Beijing Forestry University, Beijing 100083, China
  • 2. Key Laboratory of Soil and Water Conservation and Desertification Combating, Ministry of Education, Beijing Forestry University, Beijing 100083, China
  • 3. Monitor Center of Soil and Water Conservation, Ministry of Water Resources, Beijing 100055, China
*SHI Mingchang, E-mail:

YANG Jin, E-mail:

Received date: 2022-05-23

  Accepted date: 2022-08-20

  Online published: 2023-04-21

Supported by

The National Science and Technology Major Project of the Ministry of Science and Technology of China (2018ZX07110001)

Abstract

Due to its poor penetrability, optical remote sensing cannot identify understory vegetation beneath the canopy. The vegetation coverage extracted by optical remote sensing alone therefore does not capture enough understory vegetation information to formulate the vegetation coverage factor for soil erosion evaluation. To address this issue, this study took the Baiyangdian-Daqing River Basin as the study area and examined photon-counting ICESat-2/ATLAS vegetation coverage sampling under different photon point classifications. Combined with the measured data, satellite-ground collaborative vegetation coverage sampling was achieved in the study area. The results showed that the vegetation coverage estimated by the random forest regression model constructed in this study was more accurate than the inversion results extracted by the traditional NDVI pixel dichotomy. To a certain extent, the proposed model can monitor the understory vegetation of dense forests and compensate for the lack of understory vegetation signal in optical remote sensing. Within the three error tolerances of 0.05, 0.1, and 0.15, the inversion accuracy of vegetation coverage changed by -4.1%, 5.3%, and 9.4%, reaching 55.6%, 71.1%, and 94.3%, respectively.

Cite this article

YANG Jin, SHI Mingchang, YANG Jianying, CHENG Fu, YU Hongfeng. Vegetation Coverage Inversion based on Combined Active and Passive Remote Sensing: A Case Study of the Baiyangdian-Daqinghe Basin[J]. Journal of Resources and Ecology, 2023, 14(3): 591-603. DOI: 10.5814/j.issn.1674-764x.2023.03.014

1 Introduction

The soil and water conservation capacity of vegetation is mainly determined by its horizontal coverage and vertical structure (Huang et al., 2005). Wei et al. (2002) proposed a soil erosion model with layered coverage of arbor, shrub, and grass and improved the accuracy of soil erosion assessment. Scholars also suggested that the leaf area index could characterize the horizontal coverage and vertical structure of vegetation (Wang et al., 2006; Sun et al., 2010), which was more suitable for soil erosion research than the traditional vegetation coverage. Wen et al. (2010) proposed a layered vegetation index assessing soil erosion, which combined the soil and water conservation effect of different vegetation layers and weighted the vegetation coverage of arbor, shrub, grass, and litter layers to synthesize a structural vegetation coverage. According to their results, the layered vegetation index was more suitable for soil erosion evaluation than the traditional vegetation coverage, but its calculation was relatively complex. On this basis, the green vegetation index and the yellow vegetation index were defined to extract the structural vegetation coverage (Wen et al., 2010).
Although vegetation coverage has been replaced by other indicators in current soil erosion research, their extraction still relies on passive optical remote sensing, which has several technical limitations. 1) With single-direction orthographic imaging and poor penetrability, passive optical remote sensing is prone to signal saturation in areas with dense vegetation; it reflects only the horizontal structure of the vegetation canopy and cannot effectively monitor the understory vegetation (Huang et al., 2005). 2) Optical remote sensing is easily affected by cloud and rain, which leads to frequent missing data and makes it ineffective for long time series (He et al., 2015). 3) Since traditional optical remote sensing usually extracts vegetation cover information from spectral information, the large spectral differences between vegetation types, such as photosynthetic and non-photosynthetic vegetation, may cause errors (Zheng et al., 2016). The spectral information provided by optical remote sensing data makes it easy to distinguish vegetation from non-vegetation, which is its advantage in extracting vegetation level information. However, signal saturation in dense vegetation and interference from cloud and rain limit its vegetation coverage extraction accuracy. Because of these limitations, a single remote sensing source cannot meet the requirements of some applications, which calls for multi-source remote sensing data that integrate the advantages of different remote sensing methods for vegetation cover information extraction (Guo et al., 2016).
Lidar actively transmits laser pulses to the ground and records the echoes reflected by ground objects. Because the laser pulses can pass through canopy gaps and capture the surface coverage beneath the forest canopy, lidar has clear advantages in extracting vegetation vertical structure parameters (Li et al., 2016). Among space-borne lidar systems, ICESat-GLAS has been used to extract forest biomass, leaf area index, and other vegetation parameters (Ballhorn et al., 2011; Liu et al., 2016; Yang et al., 2019). The recently launched space-borne lidar ICESat-2/ATLAS provides a new data source for improving the accuracy of vegetation parameters and opens a new direction for extracting vegetation coverage for soil erosion assessment (Glenn et al., 2016; Narine et al., 2019a; Narine et al., 2019b; Narine et al., 2020; Silva et al., 2021). Due to current technical limitations, space-borne lidar data cannot cover the entire study area and can only sample it discretely in space, while optical and microwave radar remote sensing can compensate for this deficiency (Scarth et al., 2019). In addition, the study area is a typical northern China rocky mountain area with complex terrain. Since the vegetation cover information extracted from remote sensing images contains terrain interference, terrain factors should also be considered (Liu et al., 2015). Multi-source remote sensing data thus provide sufficient vegetation coverage information to improve the extraction accuracy of vegetation coverage for soil erosion assessment.
In summary, this study takes the mountainous area in the upper reaches of the Baiyangdian-Daqinghe River as the study area to extract vegetation coverage for soil erosion. Active and passive multi-source remote sensing is adopted to integrate their advantages and fully capture the horizontal and vertical vegetation coverage information, thus achieving the inversion of vegetation coverage of soil erosion in large areas.

2 Study area and data sources

2.1 Overview of the study area

The study area is in the upper reaches of the Baiyangdian-Daqinghe Basin (113.19°E-115.85°E and 38.59°N- 40.01°N) in the Taihang Mountains, covering Hebei, Shanxi, Beijing, and Tianjin (Fig. 1). It is a typical northern China rocky mountain area with complex terrain and significant altitude difference. There are abundant vegetation types in the region with regular vertical differentiation characteristics. Larix principis-rupprechtii, Pinus tabulaeformis, Quercus mongolica, Prunus armeniaca, and Robinia pseudoacacia are the main vegetation types in the low and middle mountains. In high mountains, the coverage of Betula platyphylla Suk and Betula albosinensis Burk increases, and shrubs such as Vitex negundo, Ziziphus jujuba, Lespedeza bicolor, Cotinus coggygria, Corylus heterophylla, and Spiraea japonica are distributed in areas with poor conditions. Alpine meadows are distributed at the mountain tops beyond 2000 m above sea level.
Fig. 1 Scope of the study area and distribution of the quadrat

2.2 Data sources

The data used in this study mainly include field survey data, remote sensing data, and other auxiliary data, as detailed in Table 1.
Table 1 Data details
Data types | Description | Data source | Function
Field survey data | Ground survey/Sub-compartment data | Field survey | Sampling and verification
Spaceborne lidar | ICESat-2-ATL03/ATL08 | https://nsidc.org/data/icesat-2 | Vegetation coverage sampling
Optical remote sensing | Sentinel-2 | Google Earth Engine (GEE) | Extracting feature variables for vegetation coverage inversion modeling
Synthetic aperture radar (SAR) | Sentinel-1 | Google Earth Engine (GEE) | Extracting feature variables for vegetation coverage inversion modeling
Terrain data | ALOS DEM | https://search.asf.alaska.edu/ | Extracting feature variables for vegetation coverage inversion modeling
Land use data | CGLS-LC100 | Google Earth Engine (GEE) | Analysis of vegetation coverage sampling results and modeling of vegetation coverage
(1) Field survey data
In this study, two types of field survey data were used. The first is a set of vegetation coverage quadrats surveyed in the mountainous area of the upper reaches of the Baiyangdian-Daqinghe Basin from July 11 to August 11, 2019. A total of 866 quadrats were sampled. The sampling criterion was that the vegetation within the quadrat be uniformly distributed. The vertical structure of coverage (arbor layer, shrub layer, and herb layer) was fully considered in obtaining the coverage of each quadrat. The coverage was obtained by combining visual estimation and probability calculation, and the quadrat size was 30 m×30 m. A certain number of observation points were set in each quadrat, and the coverage was calculated as the ratio of vegetation observation points to total observation points.
The other set is the 2014 sub-compartment survey data. In this study, seven indicators were used, including total vegetation coverage, arbor vegetation coverage (canopy density), shrub vegetation coverage, herb vegetation coverage, tree species structure, community structure, and forest age. Since the sub-compartment data was collected five years before the study, it was first filtered. According to the forest age group indicators recorded in the survey, the patches of mature and overmature forests with relatively stable forest structures were filtered from sub-compartment, and the Sentinel-2 image in 2019 was compared to ensure that there was no significant change in the types of vegetation within the patches. The data of 929 patches were finally obtained after screening, and the spatial distribution map is shown in Fig. 1.
(2) ICESat-2/ATLAS data
Space-borne lidar ICESat-2/ATLAS data were from NASA (https://icesat-2.gsfc.nasa.gov/), which launched the ICESat-2 satellite in September 2018. The ATLAS instrument emits green (532 nm) laser pulses at a 10 kHz repetition rate, with a circular footprint only 17 m in diameter, and yields approximately one transmitted laser pulse every 0.7 m along ground tracks. Approximately $10^{14}$ photons leave the ATLAS sensor with each pulse and travel through the atmosphere to Earth. Of those that reflect off the surface, approximately 10 travel back through the atmosphere and into the ATLAS telescope, where their arrival is time-tagged by the instrument's electronics (Neumann et al., 2021). ICESat-2/ATLAS provides Level 0, Level 1, Level 2, and Level 3 products. The Level 0 product ATL00 is the original measurement data acquired by the sensor. The Level 1 products are the data after format conversion (ATL01) and instrument error correction (ATL02). ATL03 and ATL04 are Level 2 products; ATL03 is the global geolocated photon data, which records the position (longitude, latitude, height) of each photon along the orbit and marks each photon as a signal or noise point. Each photon has undergone geophysical corrections, such as those for earth tide and atmospheric delay. ATL06 to ATL21 are Level 3 products, mainly providing elevation information for glaciers, ice sheets, sea ice, land and vegetation, and water (Zhu et al., 2020).
This study mainly used ATL03 and ATL08. ICESat-2 ATL03 and ATL08 data products acquired from June to September 2019 were downloaded from https://nsidc.org/data/icesat-2, as detailed in Table 2. The ATL03 and ATL08 products were associated using the PhoREAL-v3.27 software (https://github.com/icesat-2UT/PhoREAL) to obtain photon point cloud data with positioning information. The spatial distribution of ICESat-2 laser points is shown in Fig. 2.
Table 2 List of ICESat-2/ATL03 and ICESat-2/ATL08 data
Order number ICESat-2 data product name used in research
1 ATL03/8_20190623194316_13220302_004_01.h5
2 ATL03/8_20190722181912_03770402_004_01.h5
3 ATL03/8_20190726181055_04380402_004_01.h5
4 ATL03/8_20190816170341_07580402_004_01.h5
5 ATL03/8_20190820165523_08190402_004_01.h5
6 ATL03/8_20190824164704_08800402_004_01.h5
7 ATL03/8_20190914153947_12000402_004_01.h5
8 ATL03/8_20190918153128_12610402_004_01.h5
9 ATL03/8_20190922152309_13220402_004_01.h5
10 ATL03/8_20190926151450_13830402_004_01.h5
Fig. 2 Distribution of ICESat-2/ATLAS laser points

3 Research method

3.1 Vegetation coverage sampling method based on satellite-ground coordination

To increase the number of samples for vegetation coverage inversion modeling, this study introduced ICESat-2 space-borne lidar to explore the feasibility of obtaining a large number of vegetation coverage samples through collaborative sampling by satellite (space-borne lidar) and ground (field survey samples). First, to define the sampling unit for space-borne lidar vegetation coverage, Sentinel-2 images were segmented with a multi-scale segmentation algorithm in the eCognition software, and the resulting segments were used as the sampling units on which vegetation coverage sampling was carried out. The classification accuracy of photon points is the prerequisite for accurate ICESat-2 vegetation coverage sampling, so two photon point classification methods were compared to identify the better one. One is the ATL08 photon point classification algorithm provided by NASA, and the other is the height threshold method. To identify the optimal separation height, the height threshold for photon point classification was set to 0.5 m, 1.0 m, 1.5 m, 2.0 m, 2.5 m, 3.0 m, 3.5 m, 4.0 m, 4.5 m, and 5.0 m in turn.
Following previous studies (Narine et al., 2019b), the vegetation coverage is calculated from the numbers of vegetation canopy photon points and ground photon points obtained by photon point classification:
$\text{cover}=\frac{n_{ca}}{n_{ca}+n_{te}}$
where cover is the vegetation coverage in the sampling unit, $n_{ca}$ is the number of vegetation canopy photon points, and $n_{te}$ is the number of ground photon points.
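The cover ratio under the height threshold method can be illustrated with a minimal sketch. It assumes that photon heights are already expressed relative to the ground, that noise photons have been removed, and that a photon at or above the threshold counts as canopy; the function name, the inclusive comparison, and the sample heights are hypothetical and are not taken from the study.

```python
from typing import Iterable


def cover_from_heights(heights_m: Iterable[float], threshold_m: float = 1.5) -> float:
    """Vegetation cover of one sampling unit from photon heights above ground.

    Assumption: photons at or above `threshold_m` are canopy points (n_ca),
    all remaining photons are ground points (n_te).
    """
    heights = list(heights_m)
    if not heights:
        raise ValueError("sampling unit contains no photons")
    n_ca = sum(1 for h in heights if h >= threshold_m)  # canopy photon points
    n_te = len(heights) - n_ca                          # ground photon points
    return n_ca / (n_ca + n_te)


# Hypothetical photon heights (m) for one sampling unit.
heights = [0.1, 0.3, 0.2, 2.4, 5.1, 7.8, 0.0, 3.3]
cover = cover_from_heights(heights, threshold_m=1.5)
print(cover)
```

Repeating the call with each candidate threshold (0.5 m to 5.0 m) gives one cover value per threshold for the same sampling unit, which is the comparison the text describes.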

3.2 Construction of the vegetation coverage inversion method

3.2.1 Feature variable extraction and optimization method

(1) Feature variable extraction method based on Sentinel-1 and Sentinel-2 Data
The relationship between remote sensing factors and vegetation coverage was fully explored with reference to previous studies (Chen et al., 2019). A total of 189 remote sensing factors related to vegetation coverage were extracted based on Sentinel-1 and Sentinel-2, including three tasseled cap transformation variables, three principal component variables and their gray level co-occurrence matrix texture features (3×8), ten bands in Sentinel-2 (B2-8, B8A, B11-12), 147 spectral indexes, and two radar vegetation indexes as the alternative feature variables in the vegetation coverage remote sensing estimation model. Among them, the principal component analysis and texture feature extraction are completed in the ENVI software, the extraction method of tasseled cap transformation is detailed in the literature (Biradar et al., 2006), and the calculation of vegetation indexes is shown in Table 3.
Table 3 The calculation of vegetation indexes
Vegetation index Calculation formula Quantity
Normalized difference vegetation index NDI(i,j)=(Bi-Bj)/(Bi+Bj), where i, j=2,$\cdots$, 8, 8A, 11, 12; i≠j and i>j 45
Ratio vegetation index SR(i,j)= Bi/Bj, where i, j=2, $\cdots$, 8, 8A, 11, 12; i≠j 90
Enhanced vegetation index EVI=2.5×(B8-B4)/(B8+6B4-7.5B2+1) 1
Transformed normalized vegetation index TNDVI$=\sqrt{\left( \text{B}8-\text{B}4 \right)/\left( \text{B}8+\text{B}4 \right)+0.5}$ 1
Renormalized vegetation index RDVI$=\left( \text{B}8-\text{B}4 \right)/\sqrt{\text{B}8+\text{B}4}$ 1
Global environmental monitoring index GEMI=a×(1-0.25×a)-(B4-0.125)/(1-B4), where a=[2×(B8A²-B4²)+1.5×B8A+0.5×B4]/(B8A+B4+0.5) 1
Soil adjusted vegetation index SAVI=1.5×(B8-B4)/(B8+B4+0.5) 1
Adjustable vegetation index for transformed soil TSAVI=0.5×(B8-0.5×B4-0.5)/(0.5×B8+B4-0.15) 1
Red edge chlorophyll index CI=B7/B5-1 1
Red edge position index REP=705+35×[(B4+B7)/2-B5]/(B6-B5) 1
Plant senescence reflectance index PSRI=(B4-B3)/B6 1
Normalized vegetation canopy shadow index NDCSI(i)=NDVI×(Bi-Bimin)/(Bimax-Bimin), where NDVI=(B8-B4)/(B8+B4) and i=5, 6, 7 3
Normalized radar vegetation index NDVHVV=(VH-VV)/(VH+VV) 1
Ratio radar vegetation index RAVHVV=VV/VH 1

Note: Bi: The band i in Sentinel-2; Bimin: The minimum value at Bi band in Sentinel-2; Bimax: The maximum value at Bi band in Sentinel-2; VH: The backscattering intensity in Sentinel-1 under VH polarization; VV: The backscattering intensity in Sentinel-1 under VV polarization.
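The band-pair indexes in Table 3 can be enumerated programmatically. The sketch below is illustrative only: it assumes surface reflectance values are held in a dictionary keyed by band name, and the function name and reflectance values are hypothetical. It reproduces the counts in the table (45 normalized difference indexes and 90 ratio indexes from ten bands).

```python
from itertools import combinations, permutations

# The ten Sentinel-2 bands listed in the text, in ascending band order.
BANDS = ["B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A", "B11", "B12"]


def pairwise_indexes(refl: dict[str, float]) -> dict[str, float]:
    """Compute NDI(i,j) for i > j and SR(i,j) for i != j."""
    out: dict[str, float] = {}
    # combinations() yields (lower band, higher band); NDI uses higher band first.
    for bj, bi in combinations(BANDS, 2):
        out[f"NDI({bi},{bj})"] = (refl[bi] - refl[bj]) / (refl[bi] + refl[bj])
    # permutations() yields every ordered pair with i != j.
    for bi, bj in permutations(BANDS, 2):
        out[f"SR({bi},{bj})"] = refl[bi] / refl[bj]
    return out


# Hypothetical reflectance values for a single vegetated pixel.
pixel = {"B2": 0.03, "B3": 0.05, "B4": 0.04, "B5": 0.10, "B6": 0.25,
         "B7": 0.30, "B8": 0.32, "B8A": 0.33, "B11": 0.18, "B12": 0.09}
idx = pairwise_indexes(pixel)
n_ndi = sum(1 for k in idx if k.startswith("NDI"))
n_sr = sum(1 for k in idx if k.startswith("SR"))
print(n_ndi, n_sr)
```

Pixels with zero reflectance in a denominator band would need masking before this step; the sketch does not handle that case.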

(2) Feature variable extraction method based on digital elevation model (DEM) data
Terrain affects the expression of vegetation information in remote sensing images and the growth environment of vegetation, thereby affecting the growth and distribution of plants (Chen et al., 2019). Therefore, the terrain factors are extracted based on the preprocessed 10 m resolution ALOS DEM due to the complex terrain in this study area, as shown in Table 4.
Table 4 Terrain factor feature variables
Terrain factor Description
H Elevation
S Slope
SinA Sine of aspect (eastward component)
CosA Cosine of aspect (northward component)

Note: A: Aspect.
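As a small illustration of the two aspect terms, the sketch below assumes aspect is given in degrees clockwise from north, which is a common GIS convention but is not stated in the text; the function name is hypothetical.

```python
import math


def aspect_components(aspect_deg: float) -> tuple[float, float]:
    """Return (SinA, CosA) for an aspect angle in degrees clockwise from north."""
    a = math.radians(aspect_deg)
    return math.sin(a), math.cos(a)


# An east-facing slope (90 degrees) has SinA close to 1 and CosA close to 0.
sin_a, cos_a = aspect_components(90.0)
print(round(sin_a, 6), round(cos_a, 6))
```

Splitting aspect this way avoids the discontinuity between 359 and 0 degrees, which is one reason the sine and cosine are often used instead of the raw angle.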

(3) Feature variable optimization method
The feature variables were extracted from Sentinel-1, Sentinel-2, and DEM data, and the correlation between each feature variable and the vegetation coverage at the sampling points was analyzed. Factors that were not significantly correlated with vegetation coverage (P>0.01) were excluded. Where the correlation coefficient between feature variables exceeded 0.8, the variables were regarded as redundant, and only the one most strongly correlated with vegetation coverage was retained. The Pearson correlation coefficient was used for the correlation analysis and is calculated as:
$r=\frac{\sum\limits_{i=1}^{n}{({{x}_{i}}-\bar{x})({{y}_{i}}-\bar{y})}}{\sqrt{\sum\limits_{i=1}^{n}{{{({{x}_{i}}-\bar{x})}^{2}}}\sum\limits_{i=1}^{n}{{{({{y}_{i}}-\bar{y})}^{2}}}}}$
where n is the number of samples; xi and yi are the feature variable value and the measured vegetation coverage of the i-th sample, respectively; $\bar{x}$ and $\bar{y}$ are the averages of the feature variable and the measured vegetation coverage, respectively.
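The redundancy step of this screening can be sketched as follows. This is one possible reading rather than the authors' procedure: it assumes a greedy pass in which variables are ranked by the absolute value of r with vegetation coverage and a variable is dropped when its absolute r with an already retained variable exceeds 0.8. The significance filter (P>0.01) is assumed to have been applied beforehand and is not implemented here. Function names and data are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient r between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_redundant(features: dict[str, list[float]], cover: list[float],
                   max_r: float = 0.8) -> list[str]:
    """Greedy redundancy filter (assumed strategy, see lead-in)."""
    # Strongest correlation with vegetation coverage first.
    ranked = sorted(features, key=lambda k: abs(pearson(features[k], cover)),
                    reverse=True)
    kept: list[str] = []
    for name in ranked:
        # Keep the variable only if it is not redundant with any retained one.
        if all(abs(pearson(features[name], features[k])) <= max_r for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature values at five sampling points.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.0, 6.0, 8.0, 11.0],   # nearly a multiple of f1
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
cover = [0.1, 0.2, 0.3, 0.4, 0.5]
kept = drop_redundant(features, cover)
print(kept)
```

A constant feature would make the denominator of r zero, so such variables should be removed before calling `pearson`.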

3.2.2 Vegetation coverage modeling algorithm

Random forest is a nonparametric regression algorithm widely used to estimate vegetation parameters (Cheng et al., 2020). In this study, random forest regression was carried out on the vegetation coverage sample points collected by ICESat-2 and the selected feature variables. The regression was implemented with the RandomForest algorithm in the machine learning software Weka, with the number of trees set to 500 and the maximum number of features set to the square root of the number of feature variables.

3.3 Accuracy evaluation method

The random forest regression model was applied to the study area, and the results were compared with the inversion results of the traditional NDVI pixel dichotomy model to analyze the extraction effect (Wei et al., 2018). Among them, the pixel dichotomy assumes that each mixed pixel in the remote sensing image is composed of vegetation and bare soil. The spectral information of each pixel is the linear combination of the spectral information of these two components, and the proportion of each component is the corresponding weight in the linear combination.
Assuming that the proportion of vegetation in a pixel is FVC and the proportion of non-vegetation is 1-FVC, the vegetation coverage can be calculated as follows:
$NDVI=FVC\times NDV{{I}_{v}}+(1-FVC)\times NDV{{I}_{s}}$
which can be rearranged as
$FVC=\frac{NDVI-NDV{{I}_{\text{s}}}}{NDV{{I}_{\text{v}}}-NDV{{I}_{\text{s}}}}$
where FVC is the vegetation coverage, NDVI is the vegetation index of the pixel, NDVIv is the vegetation index of pure vegetation pixels, and NDVIs is the vegetation index of pure bare soil pixels. The NDVI values at cumulative frequencies of 5% and 95% in the study area are taken as NDVIs and NDVIv, respectively.
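A minimal sketch of the pixel dichotomy calculation is given below. It interprets the 5% and 95% cumulative values as the 5th and 95th percentiles of NDVI over the study area, and it clips the result to the range 0 to 1; both the percentile interpolation scheme and the clipping are assumptions, and the NDVI values are hypothetical.

```python
def percentile(values: list[float], q: float) -> float:
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def fvc_pixel_dichotomy(ndvi: float, ndvi_s: float, ndvi_v: float) -> float:
    """FVC = (NDVI - NDVIs) / (NDVIv - NDVIs), clipped to [0, 1] (assumption)."""
    fvc = (ndvi - ndvi_s) / (ndvi_v - ndvi_s)
    return min(1.0, max(0.0, fvc))


# Hypothetical NDVI values for the study area: 0.00, 0.05, ..., 1.00.
ndvi_values = [i / 20 for i in range(21)]
ndvi_s = percentile(ndvi_values, 5)    # pure bare soil end member
ndvi_v = percentile(ndvi_values, 95)   # pure vegetation end member
fvc_mid = fvc_pixel_dichotomy(0.5, ndvi_s, ndvi_v)
print(ndvi_s, ndvi_v, fvc_mid)
```

Without clipping, pixels below NDVIs or above NDVIv would yield coverage values outside the physically meaningful range.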
In this study, the inversion results were evaluated by four indicators: the root mean square error (RMSE), evaluation accuracy (e < 0.05), evaluation accuracy (e < 0.1), and evaluation accuracy (e< 0.15).
RMSE measures the deviation between measured and estimated vegetation coverage, which can be calculated as follows:
$RMSE=\sqrt{\frac{\sum\limits_{i=1}^{N}{{{({{y}_{i}}-{{{\hat{y}}}_{i}})}^{2}}}}{N}}$
where N is the number of samples; yi and ${{\hat{y}}_{i}}$ are the measured and estimated vegetation coverage of the i-th sample, respectively.
The evaluation accuracy evaluates how accurate the inversion results are within a specific allowable error range, which can be calculated as follows:
$E{{A}_{(e<i)}}=\frac{Count(e<i)}{Countall}$
where i = 0.05, 0.1, 0.15 represents the allowable error; Count (e < i) indicates the number of samples with error less than allowable error i between measured and estimated vegetation coverage; Countall is the total number of evaluation samples.
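Both indicators can be computed in a few lines. The sketch below assumes that the error e is the absolute difference between measured and estimated vegetation coverage; the function names and sample values are hypothetical.

```python
import math


def rmse(measured: list[float], estimated: list[float]) -> float:
    """Root mean square error between measured and estimated coverage."""
    n = len(measured)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(measured, estimated)) / n)


def evaluation_accuracy(measured: list[float], estimated: list[float],
                        tol: float) -> float:
    """Share of samples whose absolute error is below the allowable error."""
    hits = sum(1 for y, yh in zip(measured, estimated) if abs(y - yh) < tol)
    return hits / len(measured)


# Hypothetical measured and estimated vegetation coverage for four samples.
measured = [0.50, 0.60, 0.70, 0.80]
estimated = [0.52, 0.68, 0.58, 0.80]
ea = {tol: evaluation_accuracy(measured, estimated, tol)
      for tol in (0.05, 0.1, 0.15)}
err = rmse(measured, estimated)
print(ea, round(err, 4))
```

Because each tolerance band contains the previous one, the evaluation accuracy can only stay equal or increase as the allowable error grows.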

4 Results and analysis

4.1 Accuracy analysis of satellite-ground vegetation coverage sampling

The vegetation coverage extraction accuracy of ICESat-2 is mainly determined by the classification accuracy between vegetation photon points and ground photon points. Two different photon point classification methods were compared in this study. One is to use the photon classification results of ATL08 products to identify vegetation photon points and ground photon points. The other is using the height threshold method for photon point classification. In order to identify the optimal photon point separation height, this study set the heights to 0.5 m, 1.0 m, 1.5 m, 2.0 m, 2.5 m, 3.0 m, 3.5 m, 4.0 m, 4.5 m, and 5.0 m, respectively.
To ensure sufficient sampling density of vegetation coverage in each sampling unit, samples with fewer than 50 photon points were excluded. The vegetation coverage extraction accuracy of ICESat-2 was analyzed and verified based on 66 samples from the ground survey in 2019, and the optimal photon classification method was selected. Figure 3 shows scatter plots of the measured vegetation coverage against the ICESat-2 vegetation coverage under different photon point classification methods. Comparison with the 1:1 line showed that the ATL08 classification algorithm and the 0.5 to 2.0 m height thresholds tended to overestimate the vegetation coverage collected by ICESat-2. With a separation height threshold of 2.5 to 3.5 m, the vegetation coverage collected by ICESat-2 was overestimated at low vegetation coverage and underestimated at high vegetation coverage. With a separation height threshold greater than 3.5 m, the vegetation coverage collected by ICESat-2 was underestimated because low vegetation points were lost at the excessive separation height. Specifically, the accuracy of vegetation coverage under the different classifications ranked from high to low as Cover1.5m > Cover1m > Cover0.5m > Cover2m > Cover2.5m > CoverATL08 > Cover3m > Cover3.5m > Cover4m > Cover4.5m > Cover5m. The vegetation coverage calculated from photon points separated at a height of 1.5 m was the most accurate: its correlation coefficient with the measured vegetation coverage was 0.763, higher than that obtained with the ATL08 classification algorithm.
Fig. 3 Scatter plot of ICESat-2 vegetation coverage based on photon point classification and measured vegetation coverage

Note: FVC_ICESat (ATL08) is the ICESat-2 vegetation coverage under ATL08 photon point classification; FVC_ICESat (i) is the ICESat-2 vegetation coverage under photon point classification with height threshold i; R is the correlation coefficient between ICESat-2 vegetation coverage and measured vegetation coverage.

4.2 Accuracy evaluation of vegetation coverage inversion results based on measured data

Based on 866 sample points from the ground survey in 2019, the accuracy of the results from the random forest regression model and the NDVI pixel dichotomy was verified and analyzed (Table 5). Compared with the NDVI pixel dichotomy model, the root mean square error of the random forest regression model is reduced by 0.026 (Fig. 4a). Under the tolerance error of 0.05, the accuracy decreased by 4.1 percentage points, from 59.7% to 55.6%. Under the tolerance error of 0.1, the accuracy increased from 65.8% to 71.1%. Under the tolerance error of 0.15, the accuracy of the random forest regression model is 94.3%, which is 9.4 percentage points higher than that of the NDVI pixel dichotomy.
Table 5 Accuracy of vegetation coverage estimation models based on the random forest regression model and NDVI pixel dichotomy
Methods RMSE EA (e<0.05) EA (e<0.1) EA (e<0.15)
RF 0.086 55.6% 71.1% 94.3%
NDVI 0.102 59.7% 65.8% 84.9%
AI 0.026 -4.1% 5.3% 9.4%

Note: RF: random forest regression model, NDVI: NDVI pixel dichotomy model, AI: accuracy improvement.

To further analyze the accuracy of the inversion results in forest land, 926 plots selected from the sub-compartment data were used as forest land field survey data. According to the tree species structure, the inversion results were divided into seven types: broad-leaved pure forest, broad-leaved relative pure forest, broad-leaved mixed forest, coniferous and broad-leaved mixed forest, coniferous pure forest, coniferous relative pure forest, and coniferous mixed forest. The accuracy of the random forest inversion results for each type was evaluated and compared with the inversion results of the traditional NDVI pixel dichotomy model. As shown in Table 6 and Fig. 4(b-h), the vegetation coverage estimated by the random forest regression model is more consistent with the measured vegetation coverage and more accurate than the inversion results extracted by the traditional NDVI pixel dichotomy. Specifically, the inversion accuracy in broad-leaved forests (BPF, BRPF, BMF) and coniferous and broad-leaved mixed forests (CBMF) is greatly improved. However, the improvement in coniferous forests is relatively low, and the accuracy is even reduced in coniferous mixed forests. A possible reason is that the ICESat-2 vegetation coverage samples used for modeling included 1308 broad-leaved forest sample points but no coniferous forest sample points, resulting in poor accuracy of the inversion results in coniferous forest. This also indicates that vegetation coverage estimation based on ICESat-2 vegetation coverage sampling can improve the inversion accuracy where such samples are available.
Fig. 4 Box plots of vegetation coverage estimation errors for random forest regression and the NDVI pixel dichotomy

Note: (a) All: ground survey; (b) BPF: Broad-leaved pure forest; (c) BRPF: Broad-leaved relative pure forest; (d) BMF: Broad-leaved mixed forest; (e) CBMF: Coniferous and broad-leaved mixed forest; (f) CPF: Coniferous pure forest; (g) CRPF: Relatively pure coniferous forest; (h) CMF: Coniferous mixed forest.

5 Discussion and outlook

5.1 Distribution characteristics of ICESat-2 point cloud under different land-use types

To intuitively analyze the point cloud distribution characteristics of ICESat-2 under different ground object types, the point cloud distributions for buildings, water, shrub, woodland, and farmland are shown in Fig. 5. The ATL08 photon point classification algorithm misidentifies building photons as canopy points (Fig. 5a). The photon points over water fluctuate within 0.5 m, and the ATL08 classification algorithm identifies them as ground points (Fig. 5b). The photon point distribution of shrubs and grass is similar to that of low-rise buildings (Fig. 5c), although the ground-reflected photon points are denser for buildings. For forest land, ICESat-2 point clouds display the vertical stratification structure well and effectively identify the understory vegetation (Fig. 5d). For farmland (maize), the photon points fluctuate within 2 m, which is consistent with the growth height of maize (Fig. 5e).
Fig. 5 ICESat-2 point cloud distribution map for different land-use types

5.2 Optimal selection results of feature variables under different vegetation types

Random forest regression was carried out separately for six vegetation types, namely farmland, shrub, grassland, wetland, deciduous broad-leaved forest, and other forests, based on the vegetation coverage samples collected by ICESat-2 (Table 7) and the optimized feature variables (Fig. 6). The optimal feature variables differ among the six vegetation types. The ratio vegetation index is more sensitive than the normalized difference vegetation index in vegetation coverage monitoring. The red edge bands (B5-7, B8A), near infrared band (B8), and short-wave infrared bands (B11, B12) of Sentinel-2, and the vegetation indexes that combine them, show a high correlation with vegetation coverage for all vegetation types. This indicates that Sentinel-2 data with red edge bands are more suitable for vegetation coverage extraction than Landsat data. The short-wave infrared bands and the vegetation indexes constructed from them also show great advantages in vegetation coverage modeling. This is because the cellulose and lignin in vegetation absorb short-wave infrared radiation, so monitoring in these bands does not depend on the photosynthetic characteristics of the vegetation. Therefore, the short-wave infrared bands have advantages in the extraction of non-photosynthetic vegetation (Wang et al., 2018), while the red edge bands are sensitive to photosynthetic vegetation. By combining the red edge and short-wave infrared bands, the coverage of both photosynthetic and non-photosynthetic vegetation can be extracted, contributing to a more effective characterization of the role of vegetation in soil erosion. Furthermore, texture features and slope play important roles in random forest modeling, especially for farmland, shrub, and grassland.
Fig. 6 Optimization results of features variables under different vegetation types

Note: PCAi_j is texture feature j of principal component i of Sentinel-2, where i=1, 2, 3; Cor: Correlation; Con: Contrast; Var: Variance; Entro: Entropy; Homo: Homogeneity; Diss: Dissimilarity; SM: Second moment.

Table 6 Accuracy evaluation of the random forest regression model and the NDVI pixel dichotomy model in different forests
TSS Methods RMSE EA (e<0.05) (%) EA (e<0.1) (%) EA (e<0.15) (%)
BPF RF 0.114 58.50 64.20 82.90
BPF NDVI 0.152 39.80 52.80 70.70
BPF AI 0.038 18.70 11.40 12.20
BRPF RF 0.108 63.70 75.60 88.90
BRPF NDVI 0.126 49.60 63.70 80.70
BRPF AI 0.018 14.10 11.90 8.20
BMF RF 0.109 51.30 61.80 88.20
BMF NDVI 0.141 46.10 48.70 63.20
BMF AI 0.032 5.20 13.10 25.00
CBMF RF 0.075 78.60 85.70 89.30
CBMF NDVI 0.100 60.70 64.30 85.70
CBMF AI 0.025 17.90 21.40 3.50
CPF RF 0.123 40.70 49.70 75.00
CPF NDVI 0.147 43.10 49.20 66.00
CPF AI 0.024 -2.40 0.50 9.00
RPCF RF 0.129 55.80 64.10 76.90
RPCF NDVI 0.134 48.10 60.30 77.60
RPCF AI 0.005 7.70 3.80 -0.70
CMF RF 0.132 41.90 48.40 74.20
CMF NDVI 0.138 64.50 77.40 80.60
CMF AI 0.006 -22.60 -31.00 -6.40

Note: TSS: Tree species structure; BPF: Broad-leaved pure forest; BRPF: Broad-leaved relative pure forest; BMF: Broad-leaved mixed forest; CBMF: Coniferous and broad-leaved mixed forest; CPF: Coniferous pure forest; RPCF: Relatively pure coniferous forest; CMF: Coniferous mixed forest; RF: Random forest regression model; NDVI: NDVI pixel dichotomy model; AI: Accuracy improvement.
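As a minimal sketch of how the columns of Table 6 could be computed, the following Python snippet assumes that EA (e<t) is the percentage of validation samples whose absolute error is below the tolerance t, and that AI is the difference between the two models (RF minus NDVI for EA), which is consistent with most tabulated values. The function names and the sample values are hypothetical and serve only as an illustration.

```python
import math

def rmse(predicted, measured):
    """Root mean square error between predicted and measured coverage."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)

def estimation_accuracy(predicted, measured, tolerance):
    """Percentage of samples whose absolute error is below the tolerance."""
    hits = sum(1 for p, m in zip(predicted, measured) if abs(p - m) < tolerance)
    return 100.0 * hits / len(measured)

# Hypothetical coverage values (fractions between 0 and 1), for illustration only.
measured = [0.80, 0.65, 0.90, 0.40]
rf_pred = [0.78, 0.69, 0.83, 0.52]
ndvi_pred = [0.92, 0.74, 0.97, 0.43]

for tol in (0.05, 0.10, 0.15):
    ea_rf = estimation_accuracy(rf_pred, measured, tol)
    ea_ndvi = estimation_accuracy(ndvi_pred, measured, tol)
    # Assumed definition of the accuracy improvement: RF minus NDVI.
    print(f"e<{tol:.2f}: RF {ea_rf:.1f}%, NDVI {ea_ndvi:.1f}%, AI {ea_rf - ea_ndvi:+.1f}")

print(f"RMSE: RF {rmse(rf_pred, measured):.3f}, NDVI {rmse(ndvi_pred, measured):.3f}")
```

Under these assumptions, a negative AI (as in the CPF and CMF rows) simply means that the NDVI pixel dichotomy model had more samples within the given tolerance than the random forest model.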

Table 7 Parameters in random forest modeling
Vegetation type Sampling numbers Num Max_features
Farmland 772 27 6
Shrub 1087 33 6
Grassland 7724 41 7
Wetland 33 15 4
DBF 1308 31 6
OF 4971 40 7

Note: DBF: Deciduous broad-leaved forest; OF: Other forests; Num: The number of feature variables; Max_features: The max features in random forest modeling.
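The text does not state how Max_features was chosen for each vegetation type. As a quick arithmetic check only, the sketch below transcribes Table 7 into a Python dictionary and compares Max_features with the square root of Num rounded up, a heuristic often used for the number of features tried at each split. The helper name `sqrt_rule` is hypothetical, and the agreement is an observation about the tabulated numbers, not a documented setting of this study.

```python
import math

# Values transcribed from Table 7: (sampling numbers, Num, Max_features).
TABLE_7 = {
    "Farmland": (772, 27, 6),
    "Shrub": (1087, 33, 6),
    "Grassland": (7724, 41, 7),
    "Wetland": (33, 15, 4),
    "DBF": (1308, 31, 6),
    "OF": (4971, 40, 7),
}

def sqrt_rule(num_features):
    """Square root of the feature count, rounded up to an integer."""
    return math.ceil(math.sqrt(num_features))

for veg_type, (samples, num, max_features) in TABLE_7.items():
    print(f"{veg_type:<10} samples={samples:>5} Num={num:>2} "
          f"Max_features={max_features} sqrt rule={sqrt_rule(num)}")
```

The table also shows how uneven the sampling is: wetland has only 33 samples, so its model is fitted on far less data than the grassland model.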

5.3 Comparison of inversion results based on the random forest model and NDVI pixel dichotomy

The vegetation coverage extraction results based on the random forest regression and NDVI pixel dichotomy models are presented in Fig. 7(a-b), which shows a large gap between the two. To further analyze the difference between the inversion results of the two models, the average absolute error (MAE) and average relative error (ME) of the vegetation coverage extracted by the two models were calculated (Fig. 7(c-d)) as follows:
$FVC_{ME}=FVC_{RF}-FVC_{NDVI}$
$FVC_{MAE}=|FVC_{RF}-FVC_{NDVI}|$
where $FVC_{ME}$ is the average relative error; $FVC_{MAE}$ is the average absolute error; $FVC_{RF}$ represents the vegetation coverage extracted by the random forest regression model; $FVC_{NDVI}$ represents the vegetation coverage extracted by the NDVI pixel dichotomy model.
Fig. 7 Vegetation coverage inversion results and differences between random forest and NDVI pixel dichotomy model
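The two difference measures defined above can be sketched in a few lines of Python. The snippet assumes that the per-pixel differences are averaged over all pixels of a vegetation type to obtain the summary values; the function names and the pixel values are hypothetical.

```python
def mean_error(rf_values, ndvi_values):
    """Average signed difference FVC_RF - FVC_NDVI over paired pixels."""
    diffs = [rf - ndvi for rf, ndvi in zip(rf_values, ndvi_values)]
    return sum(diffs) / len(diffs)

def mean_absolute_error(rf_values, ndvi_values):
    """Average absolute difference |FVC_RF - FVC_NDVI| over paired pixels."""
    diffs = [abs(rf - ndvi) for rf, ndvi in zip(rf_values, ndvi_values)]
    return sum(diffs) / len(diffs)

# Hypothetical coverage values for four pixels of one vegetation type.
rf_fvc = [0.30, 0.45, 0.60, 0.50]
ndvi_fvc = [0.50, 0.40, 0.80, 0.45]

print(f"ME  = {mean_error(rf_fvc, ndvi_fvc):+.3f}")
print(f"MAE = {mean_absolute_error(rf_fvc, ndvi_fvc):.3f}")
```

Because positive and negative differences cancel in the signed measure but not in the absolute one, a vegetation type can have a small ME and still a sizeable MAE.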
Ranked from highest to lowest, the average absolute error by vegetation type is farmland > evergreen coniferous forest > deciduous broad-leaved forest > shrub > herb > other forests (Table 8). The vegetation coverage extracted by the two methods differs most for farmland, where the coverage extracted from ICESat-2 sampling is significantly lower than that extracted by the NDVI pixel dichotomy model. A possible reason for this obvious underestimation is that the 1.5 m height threshold separation method is not suitable for sampling vegetation coverage in farmland: some vegetation points are incorrectly classified as ground points, leading to underestimated farmland vegetation coverage.
Table 8 Comparison of extraction results of the random forest regression model and the NDVI pixel dichotomy model in different vegetation types
Land cover types Shrub Herb Farmland EC DB OF
FVC_ME 0.021 0.041 -0.163 -0.089 -0.069 -0.063
FVC_MAE 0.145 0.141 0.256 0.161 0.153 0.109
FVC_MEAN_RF 0.572 0.609 0.368 0.753 0.788 0.735
FVC_MEAN_NDVI 0.792 0.548 0.537 0.878 0.927 0.792

Note: FVC_ME: Average relative error (signed difference, random forest minus NDVI); FVC_MAE: Average absolute error; FVC_MEAN_RF: Average vegetation coverage extracted by the random forest model; FVC_MEAN_NDVI: Average vegetation coverage extracted by the NDVI pixel dichotomy model; EC: Evergreen coniferous; DB: Deciduous broadleaved; OF: Other forests.

For farmlands, the average vegetation coverage extracted using ICESat-2 sampling under different photon point classification methods is listed in Table 9. The average vegetation coverage of farmland extracted based on the ATL08 classification method is the highest. Using the height threshold method, the average vegetation coverage of farmland decreases with the increase of the set height threshold.
Table 9 Average ICESat-2 vegetation coverage of different photon point classification methods in farmland
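Classification method Separation height Average vegetation coverage
ATL08 algorithm - 0.507
Height threshold method 0.5 m 0.454
Height threshold method 1.5 m 0.313
Height threshold method 2 m 0.257
Height threshold method 2.5 m 0.214
Height threshold method 3.5 m 0.156
Height threshold method 4 m 0.135
Height threshold method 5 m 0.102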
Therefore, future research could select the appropriate photon point classification method based on the different farmland features and field survey data. Due to the lack of farmland samples, no specific analysis was conducted in this study. In future studies, the photon point classification method under different vegetation types should be fully considered to improve the inversion accuracy of vegetation coverage.
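The dependence of the sampled coverage on the separation height can be illustrated with a minimal Python sketch. It assumes that the coverage of a segment is the share of signal photons at or above the separation height, with all remaining photons treated as ground; this definition, the function name, and the photon heights are assumptions made for illustration and are not taken from the ICESat-2 processing chain used in this study.

```python
def coverage_by_height_threshold(photon_heights, threshold):
    """Share of photons at or above the separation height (treated as vegetation)."""
    if not photon_heights:
        raise ValueError("segment contains no photons")
    vegetation = sum(1 for h in photon_heights if h >= threshold)
    return vegetation / len(photon_heights)

# Hypothetical photon heights above ground (m) for one farmland segment.
heights = [0.1, 0.2, 0.3, 0.6, 0.8, 1.2, 1.6, 1.9, 2.2, 3.0]

for threshold in (0.5, 1.5, 2.5):
    coverage = coverage_by_height_threshold(heights, threshold)
    print(f"separation height {threshold} m -> coverage {coverage:.2f}")
```

Under this assumption, raising the separation height can only move photons from the vegetation class to the ground class, which matches the monotonic decrease reported in Table 9 and explains why low crops are easily counted as ground.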
The distribution maps generated by the two methods (Fig. 7(a-b)) differ markedly in the upper right corner of the study area. The reason is that the Sentinel-2 data there are affected by cloud and rain, resulting in abnormal vegetation coverage calculated by the NDVI pixel dichotomy method. In contrast, the random forest regression method draws on multiple remote sensing data sources and does not rely on any single type of remote sensing data, so effective vegetation coverage can still be extracted even where the Sentinel-2 data are invalid (Fig. 8).
Fig. 8 Comparison of inversion results of the random forest regression model and the NDVI pixel dichotomy model in cloud cover areas
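To illustrate why cloud contamination distorts the NDVI-based result, the sketch below uses the commonly cited form of the pixel dichotomy model, FVC = (NDVI - NDVI_soil) / (NDVI_veg - NDVI_soil). The reflectance values and the bare-soil and full-vegetation endpoints are hypothetical; they are not the values used in this study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

def pixel_dichotomy_fvc(ndvi_value, ndvi_soil, ndvi_veg):
    """Pixel dichotomy coverage, clipped to the valid range [0, 1]."""
    fvc = (ndvi_value - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return min(1.0, max(0.0, fvc))

# Hypothetical endpoints for bare soil and full vegetation cover.
NDVI_SOIL, NDVI_VEG = 0.05, 0.85

# Same hypothetical forest pixel, once clear and once under thin cloud,
# where the cloud raises the red reflectance and flattens the NDVI.
clear = pixel_dichotomy_fvc(ndvi(nir=0.45, red=0.05), NDVI_SOIL, NDVI_VEG)
cloudy = pixel_dichotomy_fvc(ndvi(nir=0.45, red=0.40), NDVI_SOIL, NDVI_VEG)

print(f"clear pixel FVC:  {clear:.3f}")
print(f"cloudy pixel FVC: {cloudy:.3f}")
```

In this illustration the coverage collapses for the cloudy pixel because the model depends on a single optical index, whereas a model that also uses radar and terrain features has other inputs to fall back on.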

6 Conclusions

To address the problem that optical remote sensing alone cannot effectively extract vegetation coverage information for soil erosion assessment, this study took the Baiyangdian-Daqinghe Basin as the research area and combined measured data with space-borne lidar ICESat-2 data for satellite-ground collaborative vegetation coverage sampling. On this basis, multi-source data, such as Sentinel-2, Sentinel-1, and DEM, were combined, and vegetation coverage inversion was achieved with a random forest regression model. The results were compared with those of the traditional NDVI pixel dichotomy, and the following conclusions were obtained:
(1) Different photon classification methods affected the vegetation coverage extracted from ICESat-2 differently. Specifically, photon point classification based on the 1.5 m height threshold gave the best extraction results, better than those of the official NASA photon point classification and of the 0.5 m, 1.0 m, 2.0 m, 2.5 m, 3.0 m, 3.5 m, 4.0 m, 4.5 m, and 5.0 m height thresholds. Its correlation coefficient with the measured vegetation coverage was 0.763.
(2) The inversion results of the two models were evaluated using 866 sets of field survey data and 929 sets of forest land subcompartment survey data. The random forest regression model performed better than the traditional NDVI pixel dichotomy model, and the accuracy improvement was more obvious in forests. To a certain extent, the problem of missing understory vegetation signal in optical remote sensing of dense forests was addressed. Within the error tolerances of 0.05, 0.1, and 0.15, the inversion accuracy of vegetation coverage changed by -4.1%, 5.3%, and 9.4%, reaching 55.6%, 71.1%, and 94.3%, respectively.
(3) The inversion results of the two methods showed certain differences in farmland and cloud cover areas. The random forest regression model based on ICESat-2 vegetation coverage sampling underestimated vegetation coverage in farmland because the separation height set for the ICESat-2 data was unsuitable for farmland vegetation coverage sampling. In cloud cover areas, the NDVI pixel dichotomy extraction results showed serious deviations. The method in this study integrated multi-source remote sensing information and had low dependence on any single remote sensing data source, so effective inversion of vegetation coverage was also achieved in areas where the optical imagery was covered by cloud.

This research is supported by “The National Science and Technology Major Project of the Ministry of Science and Technology of China”. The authors would like to thank the editors and anonymous reviewers for their useful comments in improving the article. We would like to thank the PhoREAL team and the Google Earth Engine team for their support in processing the ICESat-2, Sentinel-1, Sentinel-2, and land use data. We are very grateful to NASA and the Alaska Satellite Facility for distributing the ICESat-2 data (https://nsidc.org/data/icesat-2) and the ALOS PALSAR DEM (https://asf.alaska.edu).

[1] Ballhorn U, Jubanski J, Siegert F. 2011. ICESat/GLAS data as a measurement tool for peatland topography and peat swamp forest biomass in Kalimantan, Indonesia. Remote Sensing, 3(9): 1957-1982.
[2] Biradar C M, Li D, Xia L, et al. 2006. A global map of rainfed cropland areas at the end of last millennium using remote sensing and geospatial techniques. International Society for Optics and Photonics, 6418: 64181Q. DOI: 10.1117/12.713204.
[3] Chen L, Wang Y, Ren C, et al. 2019. Optimal combination of predictors and algorithms for forest above-ground biomass mapping from Sentinel and SRTM data. Remote Sensing, 11(4): 414. DOI: 10.3390/rs11040414.
[4] Cheng J Y, Zhang X F, Sun M, et al. 2020. Random forest model for the estimation of fractional vegetation coverage based on a UAV-ground co-sampling strategy. Acta Scientiarum Naturalium Universitatis Pekinensis, 56(1): 143-154. (in Chinese)
[5] Glenn N F, Neuenschwander A, Vierling L A, et al. 2016. Landsat 8 and ICESat-2: Performance and potential synergies for quantifying dryland ecosystem vegetation cover and biomass. Remote Sensing of Environment, 185: 233-242.
[6] Guo Y G, Wang X Y, He J, et al. 2016. Study on forest biomass estimation model based on multisource remote sensing data. Yangtze River, 47(3): 17-22. (in Chinese)
[7] He H Y, Ling F L, Wang X Q, et al. 2015. Estimation of fractional vegetation coverage in water and soil loss area based on Radar vegetation index. Remote Sensing for Land & Resources, 27(4): 165-170. (in Chinese)
[8] Huang J X, Wu B F, Zeng Y, et al. 2005. Review of tree, shrub, grass cover of horizontal and vertical scale retrieval from remotely sensed data. Advance in Earth Sciences, 20(8): 871-881. (in Chinese)
[9] Li Z Y, Liu Q W, Pang Y. 2016. Review on forest parameters inversion using LiDAR. Journal of Remote Sensing, 20(5): 1138-1150. (in Chinese)
[10] Liu C X, Wang X Y, Huang H B, et al. 2016. The importance of data type, laser spot density and modelling method for vegetation height mapping in continental China. International Journal of Remote Sensing, 37(24): 6127-6148.
[11] Liu Q, Yang L, Liu Q H, et al. 2015. Review of forest above ground biomass inversion methods based on remote sensing technology. Journal of Remote Sensing, 19(1): 62-74. (in Chinese)
[12] Narine L L, Popescu S, Malambo L. 2020. Using ICESat-2 to estimate and map forest aboveground biomass: A first example. Remote Sensing, 12(11): 1824. DOI: 10.3390/rs12111824.
[13] Narine L L, Popescu S, Neuenschwander A, et al. 2019a. Estimating aboveground biomass and forest canopy cover with simulated ICESat-2 data. Remote Sensing of Environment, 224: 1-11.
[14] Narine L L, Popescu S, Zhou T, et al. 2019b. Mapping forest aboveground biomass with a simulated ICESat-2 vegetation canopy product and Landsat data. Annals of Forest Research, 62(1): 69-86.
[15] Neumann T A, Brenner A, Hancock D, et al. 2021. ATLAS/ICESat-2 L2A Global Geolocated Photon Data, Version 5. Boulder, Colorado, USA: NASA National Snow and Ice Data Center Distributed Active Archive Center. DOI: 10.5067/ATLAS/ATL03.005.
[16] Scarth P, Armston J, Lucas R, et al. 2019. A structural classification of Australian vegetation using ICESat/GLAS, ALOS PALSAR, and Landsat sensor data. Remote Sensing, 11(2): 147. DOI: 10.3390/rs11020147.
[17] Silva C A, Duncanson L, Hancock S, et al. 2021. Fusing simulated GEDI, ICESat-2 and NISAR data for regional aboveground biomass mapping. Remote Sensing of Environment, 253: 112234. DOI: 10.1016/j.rse.2020.112234.
[18] Sun J J, Yu D S, Shi X Z, et al. 2010. Comparison of between LAI and VFC in relationship with soil erosion in the red soil hilly region of South China. Acta Pedologica Sinica, 47(6): 1060-1066. (in Chinese)
[19] Wang K, Shi X Z, Yu D S, et al. 2006. Relationship between LAI and distributional character of soil erosion in hilly red soil regions. Ecology and Environment, 15(5): 1052-1055. (in Chinese)
[20] Wei H B, Li R, Yang Q K. 2002. Research advances of vegetation effect on soil and water conservation in China. Chinese Journal of Plant Ecology, 26(4): 489-496. (in Chinese)
[21] Wei X, Wang S, Wang Y. 2018. Spatial and temporal change of fractional vegetation cover in north-western China from 2000 to 2010. Geological Journal, 53: 427-434.
[22] Wen Z M, Lees B, Jiao F, et al. 2010. Stratified vegetation cover index: A new way to assess vegetation impact on soil erosion. Catena, 83(1): 87-93.
[23] Yang X B, Wang C, Pan F F, et al. 2019. Retrieving leaf area index in discontinuous forest using ICESat/GLAS full-waveform data based on gap fraction model. ISPRS Journal of Photogrammetry and Remote Sensing, 148: 54-62.
[24] Zheng G X, Li X S, Zhang K X, et al. 2016. Spectral mixing mechanism analysis of photosynthetic/non-photosynthetic vegetation and bared soil mixture in the Hunshandake (Otindag) sandy land. Spectroscopy and Spectral Analysis, 36(4): 1063-1068. (in Chinese)
[25] Zhu X X, Wang C, Xi X H, et al. 2020. Research progress of ICESat-2/ATLAS data processing and applications. Infrared and Laser Engineering, 49(11): 76-85. (in Chinese)
