Harvard Forest Data Archive


Harmonic Baseline Experiments for Landsat-Based Forest Condition Monitoring in Southern New England 2017




  • Lead: Valerie Pasquarella
  • Investigators: Audrey Barker Plotkin, Robert Bagchi, James Mickley
  • Contact: Information Manager
  • Start date: 2017
  • End date: 2017
  • Status: completed
  • Location: Southern New England
  • Latitude: +39.826112 to +43.997333
  • Longitude: -74.858418 to -68.528739
  • Elevation: 0 to 1062 meters
  • Taxa: Lymantria dispar (gypsy moth)
  • Release date: 2021
  • Revisions:
  • EML file: knb-lter-hfr.374.3
  • DOI: digital object identifier
  • EDI: data package
  • DataONE: data package
  • Related links:
  • Study type: modeling
  • Research topic: ecological informatics and modelling; historical and retrospective studies; invasive plants, pests and pathogens; regional studies
  • LTER core area: disturbance
  • Keywords: defoliation, forest disturbance, geographic information systems, insects, invasive species, modeling, remote sensing, spatial variability
  • Abstract:

    This dataset was developed as part of a study of harmonic baseline model parameterization for forest condition monitoring using Landsat time series. We implemented a previously published harmonic modeling approach for forest condition monitoring in Google Earth Engine and systematically assessed how well condition change products generated using various model parameterizations predict pest abundances and defoliation during the 2016-2018 gypsy moth (Lymantria dispar) outbreak in southern New England. We ran a series of 32 experiments that considered a variety of parameter choices for establishing multi-year “baseline” models representing relatively stable forest conditions for each Landsat pixel in our study area. We tested a full set of factors including (a) spectral vegetation index used for model fitting, (b) baseline-modeling period, (c) frequencies of harmonic regression terms, and (d) differences in Landsat time series input imagery. We generated average condition score estimates for each of these 32 baseline parameterizations for a May 1 to September 30, 2017 monitoring period, then used Generalized Linear Mixed Models to test the relationships between these condition scores and ground-based observations of defoliation and defoliator abundance (larvae and egg masses). This archived dataset includes the full set of experimental raster results, as well as a “reanalysis” product from a previous implementation of our condition monitoring workflow. More information on model parameterization rankings can be found in the associated publication (Pasquarella et al. 2021).

  • Methods:

    We adapted the existing forest condition monitoring workflow presented in Pasquarella et al. (2017) to the Google Earth Engine (GEE) platform in order to improve our ability to test, scale, and reproduce results.

    The Earth Engine workflow used to produce our experimental results consists of four scripts: 1_baseline_generator; 2_predict_monitor_w_qa; 3_assessments_combine_paths_w_qa; 4_sample_results

    These scripts, as well as a utilities package (utils.js) and several visualization scripts, are available on GitHub and as a public GEE repository that can be accessed via the GitHub page.


    The baseline generator script fits harmonic baseline models to time series of historic Landsat observations for each pixel in a user-defined study area, which is specified by the geometry of a GEE feature or feature collection. Additional user-specified parameters include the start and end years for the reference period, the set of harmonic frequencies used for fitting (“h12” or “h13”), the spectral index or transform to use as the dependent variable (TCG, NDVI, SR and EVI were used in our study; TCG, TCW, NBR, NDMI and NDSI are also available as options), and whether to use all available observations or constrain the use of Landsat 7 imagery to reduce spatial artifacts (“full” or “16d”). Users may also adjust a cloud cover threshold, with the default set to only use images with less than 80% cloud cover.

    The baseline generator script outputs a GEE image asset for each Landsat Path in the study area, and images are saved to an Image Collection created by the user. Path-based processing enables preservation of the native Landsat UTM projections during the model fitting stage, and duplicate results in endlap regions are removed using a quality mosaic that preserves the model results fit using the maximum number of observations. The resulting baseline raster images have 8 bands that store the harmonic regression coefficients as well as the model RMSE and number of observations used for fitting.
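
    As a rough illustration of the kind of per-pixel model described above, the following Python sketch fits an intercept, a linear slope, and a cosine/sine pair for each harmonic frequency by ordinary least squares, then returns eight values (six coefficients, RMSE, and number of observations). It is not the GEE code used in the study. The function names, the ordering of the returned values, the use of time in years as the regressor, and the RMSE denominator (n rather than degrees of freedom) are all assumptions made for illustration.

```python
import math

# Harmonic multipliers per code: h12 = 12- and 6-month terms, h13 = 12- and 4-month terms.
HARMONICS = {"h12": (1, 2), "h13": (1, 3)}

def design_row(t_years, code="h12"):
    """Intercept, linear trend, and one cos/sin pair per harmonic.

    Time is in years, so a frequency of 1/365.25 per day is one cycle per year.
    """
    row = [1.0, t_years]
    for k in HARMONICS[code]:
        w = 2.0 * math.pi * k * t_years
        row += [math.cos(w), math.sin(w)]
    return row

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_baseline(t_years, values, code="h12"):
    """Least-squares fit via the normal equations.

    Returns [intercept, slope, cos1, sin1, cos2, sin2, rmse, nobs]
    (hypothetical ordering, chosen only for this sketch).
    """
    rows = [design_row(t, code) for t in t_years]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(p)]
    coeffs = solve(xtx, xty)
    resid = [v - sum(c * x for c, x in zip(coeffs, r)) for r, v in zip(rows, values)]
    rmse = math.sqrt(sum(e * e for e in resid) / len(resid))
    return coeffs + [rmse, float(len(values))]
```

    For a pixel with one clear observation every 16 days over a ten-year reference period, fit_baseline would be called with roughly 229 time stamps and the matching index values (for example TCG).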


    The predict-monitor script uses the baseline images produced by the previous script to generate predicted values and associated condition scores for a user-specified monitoring period. For this study, we set the monitoring period within 2017, with a starting date of 05-01 and an end date of 09-30 for consistency with previous work. Users may also adjust the monitoring period cloud cover threshold, with a default setting to use only images with less than 50% cloud cover for monitoring.

    The predict-monitor script outputs a GEE image asset for each Landsat Path in the study area. Like the baseline generator, the Path-based processing model ensures that predictions and associated condition scores are generated in the same UTM projection as the baseline inputs and fitted harmonic models. The Path-based approach also controls for variations in view angle effect and observation frequency, keeping the Landsat sensors (rather than ground-based pixels) as the primary frame of reference.

    Predictions are generated for each clear pixel for each image acquired within the monitoring period. Condition scores are then calculated as the observed spectral value minus the predicted spectral value, divided by the RMSE of the harmonic baseline model used to generate predictions, and resulting scores are averaged such that the final product provides the mean score for each pixel over the given monitoring period (following Pasquarella et al. 2017). Predict-monitor results are added to a user-specified image collection, and each predict-monitor image has six bands, including the mean and standard deviation of condition scores for the monitoring period and the number of observations within the monitoring period, as well as the number of observations, RMSE, and slope of the baseline model, which provide a source of quality assessment.
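
    The score calculation described above can be sketched for a single pixel as follows. This is a minimal Python illustration, not the GEE implementation: the function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
import math

def condition_summary(observed, predicted, rmse):
    """Summarize condition scores for one pixel over a monitoring period.

    Each score is (observed - predicted) / baseline RMSE, one per clear
    observation. Returns (mean score, standard deviation, number of observations).
    """
    scores = [(o - p) / rmse for o, p in zip(observed, predicted)]
    n = len(scores)
    mean = sum(scores) / n
    # Population standard deviation (assumption for this sketch).
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    return mean, std, n

# Three clear observations that fall well below the baseline prediction.
mean, std, nobs = condition_summary([0.20, 0.18, 0.22], [0.30, 0.30, 0.30], 0.05)
```

    In this example the individual scores are about -2.0, -2.4, and -1.6, so the mean score is about -2.0: the pixel is roughly two baseline RMSEs less green than the harmonic model predicts.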


    While Path-based results preserve input geometry, our southern New England study area, like many larger areas of interest, spans multiple Landsat Paths and UTM Zones. Therefore, as a final step in our monitoring and assessment workflow, we combine the Path-based predict-monitor results into a single product. The combine paths script reads in the assessment images generated by the predict-monitor script, reprojects all Paths to a common projection (NAD83 / Conus Albers, EPSG:5070), and calculates a weighted average in Path overlap zones where two sets of mean condition scores exist. The result is a seamless image with a single weighted-mean score band, which is written to an image collection such that there is one product for each monitoring period/baseline experiment.
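
    The weighting scheme is not spelled out here. One plausible reading, given that the archived rasters also carry a total observation count, is that each Path's mean score is weighted by its number of monitoring-period observations. The Python sketch below illustrates that reading for a single pixel; the function name and the choice of weights are assumptions, not a description of the actual script.

```python
def combine_paths(path_results):
    """Combine per-Path results for one pixel.

    path_results holds one (mean_score, nobs) tuple per Landsat Path, or None
    where a Path does not cover the pixel. Returns (weighted mean score,
    total nobs), or None if no Path has observations.
    """
    valid = [r for r in path_results if r is not None and r[1] > 0]
    if not valid:
        return None
    total = sum(n for _, n in valid)
    # Weight each Path's mean score by its observation count (assumed weighting).
    score = sum(s * n for s, n in valid) / total
    return score, total

# Overlap zone: Path A has 6 observations, Path B has 2.
overlap = combine_paths([(-2.0, 6), (-1.0, 2)])
# Outside the overlap zone only one Path contributes.
single = combine_paths([(-0.5, 4), None])
```

    Under this assumption the overlap pixel receives a score of -1.75 from 8 observations, while the single-Path pixel keeps its original score of -0.5.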

    Harmonic Baseline Experiments dataset

    The dataset developed as part of this study is archived on Zenodo. This archived dataset includes the full set of experimental raster results, as well as a “reanalysis” product from a previous implementation of our condition monitoring workflow. The list of raster files in this dataset can be found in hf374-01-inventory.csv.

    Raster file names

    Parameters used for each experiment are indicated in raster file names. Each results raster includes two bands: (1) score_weighted_mean, the metric of estimated change in vegetation “greenness” (unitless), calculated as the average difference between observed and predicted vegetation index values divided by the harmonic baseline model RMSE for the May 1 to September 30 monitoring period, with scores initially computed for each Landsat scene footprint and final scores weighted across Landsat orbital paths; and (2) monitor_nobs, the total number of observations used to compute score_weighted_mean.

    Raster file name codes

    Spectral transforms: TCG = Tasseled Cap Greenness; NDVI = Normalized Difference Vegetation Index; SR = Simple Ratio; EVI = Enhanced Vegetation Index

    Baseline periods: 2000-2010; 2005-2015

    Harmonic frequencies: h12 = 12-month and 6-month harmonics, i.e. 1/365.25, 2/365.25; h13 = 12-month and 4-month harmonics, i.e. 1/365.25, 3/365.25

    Time series image inputs: full = All available observations; 16d = Single-sensor, excluding Landsat 7 when possible
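
    The four code lists above are consistent with a full factorial design: 4 spectral transforms × 2 baseline periods × 2 harmonic sets × 2 input options gives the 32 experiments mentioned in the abstract. The short Python sketch below enumerates the combinations and converts the harmonic frequencies to periods in days. The variable names are illustrative only, and the tuples do not reproduce the actual raster file name pattern, which is listed in hf374-01-inventory.csv.

```python
from itertools import product

SPECTRAL = ("TCG", "NDVI", "SR", "EVI")
BASELINES = ("2000-2010", "2005-2015")
# Frequencies in cycles per day, as given in the code list above.
HARMONICS = {
    "h12": (1 / 365.25, 2 / 365.25),  # 12-month and 6-month harmonics
    "h13": (1 / 365.25, 3 / 365.25),  # 12-month and 4-month harmonics
}
INPUTS = ("full", "16d")

# Every combination of the four factors (iterating a dict yields its keys).
experiments = list(product(SPECTRAL, BASELINES, HARMONICS, INPUTS))

# Period in days for each harmonic term, rounded for display.
periods = {code: [round(1 / f, 3) for f in freqs] for code, freqs in HARMONICS.items()}

print(len(experiments))
print(periods)
```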


    Pasquarella, V.J., Mickley, J.G., Barker Plotkin, A., MacLean, R.G., Anderson, R.M., Brown, L.M., Wagner, D.L., Singer, M.S., & Bagchi, R. (2021). Predicting defoliator abundance and defoliation measurements using Landsat-based condition scores. Remote Sensing in Ecology and Conservation.

    Pasquarella, V.J., Elkinton, J.S., & Bradley, B.A. (2018). Extensive gypsy moth defoliation in Southern New England characterized using Landsat satellite observations. Biological Invasions, 20, 3047-3053.

    Pasquarella, V.J., Bradley, B.A., & Woodcock, C.E. (2017). Near-real-time monitoring of insect defoliation using Landsat time series. Forests, 8(8), 275.

  • Use:

    This dataset is released to the public under Creative Commons CC0 1.0 (No Rights Reserved). Please keep the dataset creators informed of any plans to use the dataset. Consultation with the original investigators is strongly encouraged. Publications and data products that make use of the dataset should include proper acknowledgement.

  • Citation:

    Pasquarella V. 2021. Harmonic Baseline Experiments for Landsat-Based Forest Condition Monitoring in Southern New England 2017. Harvard Forest Data Archive: HF374 (v.3). Environmental Data Initiative:

Detailed Metadata

hf374-01: raster files archived on Zenodo

  1. dir: directory
  2. filename: file name
  3. filesize_mb: file size in megabytes (unit: dimensionless / missing value: NA)
  4. description: contents of file
  5. doi: digital object identifier

hf374-02: Earth Engine code, scripts and datasets

  • Compression: zip
  • Format: csv, JavaScript script, R Markdown
  • Type: script, text