Chapter 2 Proposal

2.1 Research Topic

We initially began the search for a data source by identifying areas of interest - the members of our group expressed interest in geospatial data and environmental data. The effects of climate change are already tangible in many parts of the country and the world. For example, for two of our group members from California, USA, increased frequency and severity of drought and wildfire is already a reality. For this reason, analyzing environmental data is relevant to our interest and experiences. Our group is interested in exploring how the frequency, duration and severity of drought, and its relation to the occurrence of wildfires, has changed throughout the United States over time. Using US-based datasets, we would like to answer the following questions: 1. Have droughts in the USA become more frequent over time? Have they become more severe over time? Have certain areas become more or less prone to drought than others? 2. Have wildfires in the USA become more frequent or severe over time? Are certain locations now more prone to wildfire or less prone to wildfire? 3. Are changes in the frequency, duration, severity and location of drought related to similar trends in wildfire occurrence? In areas that have become more drought-prone, are wildfires now more common?

2.2 Data Availability

We will draw on datasets from the Center for Disease Control (CDC) and the National Fire Incident Reporting System (NFIRS) in our analyses. We chose to analyze two datasets because we are interested in both drought and wildfire trends and the relationship between the two, and we found excellent data sources for each trend maintained separately. The two datasets chosen are detailed below:

  1. The CDC United States Drought Monitor reports weekly drought estimates at the county level throughout the contiguous United States. This data was originally recorded as part of the United States Drought Monitor, a collaborative program jointly produced by the National Oceanic and Atmospheric Administration (NOAA), US Department of Agriculture (USDA), and National Drought Mitigation Center (NDMC) at the University of Nebraska - Lincoln. This information is included in a metadata file maintained on the CDC website. The data was provided to the CDC by the Cooperative Institute for Climate and Satellites - North Carolina. The data is maintained by the CDC National Environmental Public Health Tracking Network and used to generate drought measures. The data spans a time period between 2000-2016 and was last updated November 27, 2018. Each week, each county in the contiguous (lower 48) United States and Washington DC, encoded by a combination of state ID and county ID or FIPS, has a recorded ordinal value describing the drought status. The data is available in .csv format and contains 2.8 million rows - it can be read directly into R.

  2. NFIRS Public Data contains fire incident information across the US. The data is provided by the US Fire Administration’s National Fire Data Center. Each year the U.S. Fire Administration compiles publicly-released NFIRS incidents, collected by states during the previous calendar year, into a public data release. The data is collected based on uniform codes that fire departments use to report on the characteristics of incidents to which they respond, including but not limited to fires. Each year, the data is released as a collection of separate text files delimited by the “^” character. Additional text files contain more detail about specific incidents including fires, hazardous materials etc. To analyze the data, the text files for each year were be read into R, then combined into one DataFrame and filtered only to include wildfire-related incidents, then saved. In other words, some pre-processing was required and performed locally due to the large file sizes. Since this data is recorded by fire department personnel, there is some subjectivity when describing an incident’s characteristics (type of structure, cause of fire, etc.) and perhaps some data entry errors and missing values which will have to be addressed throughout the analysis.