Data Files

Input data

Input data should be specified using the following formats. The metadata CSV file specifies unique values for each thermostat such as equipment type and location. Each thermostat interval data CSV file contains hourly runtime information and is linked to the metadata CSV file by the interval_data_filename column.

Data files must contain all fields, even if there is no data for that field. Please refer to the example files here.

Thermostat Summary Metadata CSV format

Columns

Name Data Format Units Description
thermostat_id string N/A A uniquely identifying marker for the thermostat.
heat_type string N/A The type of controlled HVAC heating equipment. [1]
heat_stage string N/A The stages of controlled HVAC heating equipment. [2]
cool_type string N/A The type of controlled HVAC cooling equipment. [3]
cool_stage string N/A The stages of controlled HVAC cooling equipment. [4]
zipcode string, 5 digits N/A The ZIP Code where the thermostat is installed. [5]
utc_offset string N/A The UTC offset of the times in the corresponding interval data CSV. (e.g. “-0700” or “-5”. Data in UTC is offset “+0”)
interval_data_filename string N/A The filename of the interval data file corresponding to this thermostat. Should be specified relative to the location of the metadata file.
  • Each row should correspond to a single thermostat.
  • Nulls should be specified by leaving the field blank.
  • All interval data for a particular thermostat should use the same, single UTC offset provided in the metadata file.
  • The zipcode field should use the ZIP Code of the thermostat. This will be turned into a latitude / longitude that will be used for station lookups. The package that does this lookup is the zipcodes package. This package may be used to determine if a ZIP Code is valid or doesn’t map to a location.
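The metadata rules above can be checked before any processing starts. The following is a minimal sketch, not the package's own validation: the helper names missing_metadata_columns and parse_utc_offset are hypothetical, and the accepted offset spellings are inferred only from the examples given for utc_offset ("-0700", "-5", "+0").

```python
import re
from datetime import timedelta

# Column names taken from the metadata table above.
REQUIRED_METADATA_COLUMNS = [
    "thermostat_id", "heat_type", "heat_stage", "cool_type",
    "cool_stage", "zipcode", "utc_offset", "interval_data_filename",
]

# Optional sign, one or two hour digits, optional two minute digits.
_OFFSET_RE = re.compile(r"^([+-]?)(\d{1,2})(?::?(\d{2}))?$")


def missing_metadata_columns(header):
    """Return the required columns absent from a CSV header (hypothetical helper)."""
    present = set(header)
    return [name for name in REQUIRED_METADATA_COLUMNS if name not in present]


def parse_utc_offset(text):
    """Convert strings such as "-0700", "-5", or "+0" to a timedelta."""
    match = _OFFSET_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized UTC offset: {text!r}")
    sign, hours, minutes = match.groups()
    delta = timedelta(hours=int(hours), minutes=int(minutes or 0))
    return -delta if sign == "-" else delta
```

The header passed to missing_metadata_columns could come from csv.DictReader(...).fieldnames; an empty result means every required column is present.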

Thermostat Interval Data CSV format

Columns

Name Data Format Units Description
thermostat_id string N/A Uniquely identifying marker for the thermostat.
datetime YYYY-MM-DD hh:mm:ss (ISO-8601) N/A Date and time of this set of readings.
cool_runtime_stg1 decimal or integer minutes Hourly runtime of cooling equipment (all units).
cool_runtime_stg2 decimal or integer minutes Hourly runtime of cooling equipment second stage (two-stage units only).
cool_runtime_equiv decimal or integer minutes Hourly full load equivalent runtime of cooling equipment (multi-stage units only).
heat_runtime_stg1 decimal or integer minutes Hourly runtime of heating equipment (all units).
heat_runtime_stg2 decimal or integer minutes Hourly runtime of heating equipment second stage (two-stage units only).
heat_runtime_equiv decimal or integer minutes Hourly full load equivalent runtime of heating equipment (multi-stage units only).
auxiliary_heat decimal or integer minutes Hourly runtime of auxiliary heat equipment.
emergency_heat decimal or integer minutes Hourly runtime of emergency heat equipment.
temp_in decimal, to nearest 0.5 °F °F Hourly average conditioned space temperature over the period of the reading.
  • If a heating or cooling type or stage is not present or not applicable, a value of none or a blank field is sufficient to signify that no data is present.
  • All headers must be present in the file, even if there is no data for that column (use none or blank for missing data).
  • Dates should be specified in the ISO 8601 date format (e.g. 2015-05-19 01:00:00, 2020-01-01 23:00:00).
  • Dates and times must be consecutive (e.g. 2020-01-01 23:00:00 should have 2020-01-02 00:00:00 on the next line and 2020-01-02 01:00:00 after that).
  • All dates for the period must be represented and consecutive. (i.e. each date for a period must have a line in the data file.)
  • Each row should correspond to a single hourly reading from a thermostat. [6]
  • NULL should be specified by leaving the field blank.
  • Zero values should be specified as 0, rather than as blank.
  • If data is missing for a particular row of one column, data should still be provided for other columns in that row. For example, if runtime is missing for a particular hour, please still provide indoor conditioned space temperature for that hour, if available.
  • Runtimes should be less than or equal to 60 minutes (1 hour).
  • All temperatures should be specified in °F (to the nearest 0.5°F).
  • All runtime data MUST have the same UTC offset, as provided in the corresponding metadata file.
  • Outdoor temperature data need not be provided - it will be fetched automatically from NCDC using the eeweather package.
  • If any hour of a particular day’s runtime is missing, then all of the data for that day will be marked as missing (NULL), since the day’s data is incomplete.
  • If more than 5% of runtime data is missing for a thermostat, that thermostat will be discarded.
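The consecutive-timestamp and missing-data rules above can be sketched as a pre-submission check. This is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, a missing runtime is represented as None, and the missing fraction is assumed to be counted after whole days have been marked as missing (the text does not say in which order the two rules apply).

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def find_gaps(timestamps):
    """Return (previous, current) pairs that are not exactly one hour apart."""
    parsed = [datetime.strptime(t, FMT) for t in timestamps]
    return [
        (a.strftime(FMT), b.strftime(FMT))
        for a, b in zip(parsed, parsed[1:])
        if b - a != timedelta(hours=1)
    ]


def missing_fraction(rows):
    """Fraction of hourly rows treated as missing.

    rows is a list of (timestamp string, runtime or None) pairs.
    Assumption: a day with any missing hour counts as entirely missing.
    """
    by_day = {}
    for stamp, runtime in rows:
        by_day.setdefault(stamp[:10], []).append(runtime)
    total = sum(len(hours) for hours in by_day.values())
    missing = sum(
        len(hours)
        for hours in by_day.values()
        if any(runtime is None for runtime in hours)
    )
    return missing / total if total else 0.0
```

Under these assumptions, a thermostat whose missing_fraction exceeds 0.05 would be expected to be discarded.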
[1]

Possible values for heat_type are:

  • furnace_or_boiler: Forced air furnace (any fuel)
  • heat_pump_electric_backup: Heat pump with electric resistance heat (strip heat)
  • heat_pump_no_electric_backup: Heat pump without electric resistance heat
  • heat_pump_dual_fuel: Dual fuel heat pump (e.g. gas or oil fired)
  • electric_resistance: Electric resistance heat (Line-voltage thermostat)
  • other: Multi-zone, etc.
  • none: No central heating system
  • (blank): No central heating system
[2]

Possible values for heat_stage are:

  • single_stage: Single capacity heater or single stage compressor
  • single_speed: Synonym for single capacity heater or single stage compressor
  • two_stage: Dual capacity heater or dual stage compressor
  • two_speed: Synonym for dual capacity heater or dual stage compressor
  • modulating: Modulating or variable capacity unit
  • variable_speed: Modulating or variable capacity unit
  • none: No central heating system
  • (blank): No central heating system
[3]

Possible values for cool_type are:

  • heat_pump: Heat pump w/ cooling
  • central: Central AC
  • other: Mini-split, evaporative cooler, etc.
  • none: No central cooling system
  • (blank): No central cooling system
[4]

Possible values for cool_stage are:

  • single_stage: Single stage compressor
  • two_stage: Dual stage compressor
  • single_speed: Single stage compressor (synonym for single_stage)
  • two_speed: Dual stage compressor (synonym for two_stage)
  • modulating: Modulating or variable capacity compressor
  • none: No central cooling system
  • (blank): No central cooling system
[5] Will be used for matching with a weather station that provides external dry-bulb temperature data. This temperature data will be used to determine the bounds of the heating and cooling season over which metrics will be computed.
[6] Previous versions of this software had each row as one daily result. This version changes this to use hourly rows instead.

Metrics data

Individual thermostat-season

This file is referred to as the metrics file. The metrics file contains the metrics output from the metrics calculation method. It is an intermediary file that is currently not submitted for certification.

The following columns are an intermediate output generated for each thermostat-season.

Columns

Name Data Format Units Description
General outputs      
sw_version string N/A Software version.
ct_identifier string N/A Identifier for thermostat as provided in the metadata file.
heat_type string N/A Heating type for the thermostat
heat_stage string N/A Heating stage for the thermostat
cool_type string N/A Cooling type for the thermostat
cool_stage string N/A Cooling stage for the thermostat
heating_or_cooling string N/A Label for the core day set (e.g. ‘heating_2012-2013’).
station string, USAF ID N/A USAF identifier for station used to fetch hourly temperature data.
climate_zone string N/A EIC climate zone (consolidated).
start_date date ISO-8601 Earliest date in input file.
end_date date ISO-8601 Latest date in input file.
n_days_both_heating_and_cooling integer # days Number of days not included as core days due to presence of both heating and cooling.
n_days_insufficient_data integer # days Number of days not included as core days due to missing data.
n_core_cooling_days integer # days Number of days meeting criteria for inclusion in core cooling day set.
n_core_heating_days integer # days Number of days meeting criteria for inclusion in core heating day set.
n_days_in_inputfile_date_range integer # days Number of potential days in inputfile date range.
baseline10_core_cooling_comfort_temperature float °F Baseline comfort temperature as determined by 10th percentile of indoor temperatures.
baseline90_core_cooling_comfort_temperature float °F Baseline comfort temperature as determined by 90th percentile of indoor temperatures.
regional_average_baseline_cooling_comfort_temperature float °F Baseline comfort temperature as determined by regional average.
regional_average_baseline_heating_comfort_temperature float °F Baseline comfort temperature as determined by regional average.
Model outputs      
percent_savings_baseline_percentile float percent Percent savings as given by hourly average CTD or HTD method with 10th or 90th percentile baseline
avoided_daily_mean_core_day_runtime_baseline_percentile float minutes Avoided average daily runtime for core cooling days
avoided_total_core_day_runtime_baseline_percentile float minutes Avoided total runtime for core cooling days
baseline_daily_mean_core_day_runtime_baseline_percentile float minutes Baseline average daily runtime for core cooling days
baseline_total_core_day_runtime_baseline_percentile float minutes Baseline total runtime for core cooling days
percent_savings_baseline_regional float percent Percent savings as given by hourly average CTD or HTD method with 10th or 90th percentile regional baseline
avoided_daily_mean_core_day_runtime_baseline_regional float minutes Avoided average daily runtime for core cooling days
avoided_total_core_day_runtime_baseline_regional float minutes Avoided total runtime for core cooling days
baseline_daily_mean_core_day_runtime_baseline_regional float minutes Baseline average daily runtime for core cooling days
baseline_total_core_day_runtime_baseline_regional float minutes Baseline total runtime for core cooling days
mean_demand float °F Average cooling demand
alpha float minutes/Δ°F The fitted slope of cooling runtime to demand regression
tau float °F The fitted intercept of cooling runtime to demand regression
mean_sq_err float N/A Mean squared error of regression
root_mean_sq_err float N/A Root mean squared error of regression
cv_root_mean_sq_err float N/A Coefficient of variation of root mean squared error of regression
mean_abs_err float N/A Mean absolute error
mean_abs_pct_err float N/A Mean absolute percent error
Runtime outputs      
total_core_cooling_runtime float minutes Total core cooling equipment runtime
total_core_heating_runtime float minutes Total core heating equipment runtime
total_auxiliary_heating_core_day_runtime float minutes Total core auxiliary heating equipment runtime
total_emergency_heating_core_day_runtime float minutes Total core emergency heating equipment runtime
daily_mean_core_cooling_runtime float minutes Average daily core cooling runtime
daily_mean_core_heating_runtime float minutes Average daily core heating runtime
Core mean temperatures      
core_cooling_days_mean_indoor_temperature float °F Mean of core cooling days indoor temperature
core_cooling_days_mean_outdoor_temperature float °F Mean of core cooling days outdoor temperature
core_heating_days_mean_indoor_temperature float °F Mean of core heating days indoor temperature
core_heating_days_mean_outdoor_temperature float °F Mean of core heating days outdoor temperature
core_mean_indoor_temperature float °F Mean of indoor temperature
core_mean_outdoor_temperature float °F Mean of outdoor temperature
Resistance heat outputs      
rhu1_00F_to_05F decimal 0.0=0%, 1.0=100% Resistance heat utilization for hourly temperature bin \(0 \leq T_{out} < 5\)
rhu1_05F_to_10F decimal 0.0=0%, 1.0=100% Resistance heat utilization for hourly temperature bin \(5 \leq T_{out} < 10\)
rhu1_10F_to_15F decimal 0.0=0%, 1.0=100% Resistance heat utilization for hourly temperature bin \(10 \leq T_{out} < 15\)
rhu1_15F_to_20F decimal 0.0=0%, 1.0=100% Resistance heat utilization for hourly temperature bin \(15 \leq T_{out} < 20\)
rhu1_20F_to_25F decimal 0.0=0%, 1.0=100% Resistance heat utilization for hourly temperature bin \(20 \leq T_{out} < 25\)
rhu1_25F_to_30F decimal 0.0=0%, 1.0=100% Resistance heat utilization for hourly temperature bin \(25 \leq T_{out} < 30\)
rhu1_30F_to_35F decimal 0.0=0%, 1.0=100% Resistance heat utilization for hourly temperature bin \(30 \leq T_{out} < 35\)
rhu1_35F_to_40F decimal 0.0=0%, 1.0=100% Resistance heat utilization for hourly temperature bin \(35 \leq T_{out} < 40\)
rhu1_40F_to_45F decimal 0.0=0%, 1.0=100% Resistance heat utilization for hourly temperature bin \(40 \leq T_{out} < 45\)
rhu1_45F_to_50F decimal 0.0=0%, 1.0=100% Resistance heat utilization for hourly temperature bin \(45 \leq T_{out} < 50\)
rhu1_50F_to_55F decimal 0.0=0%, 1.0=100% Resistance heat utilization for hourly temperature bin \(50 \leq T_{out} < 55\)
rhu1_55F_to_60F decimal 0.0=0%, 1.0=100% Resistance heat utilization for hourly temperature bin \(55 \leq T_{out} < 60\)
rhu1_30F_to_45F decimal 0.0=0%, 1.0=100% Resistance heat utilization for hourly temperature bin \(30 \leq T_{out} < 45\)
rhu2_00F_to_05F decimal 0.0=0%, 1.0=100% RHU2 filtered resistance heat utilization for hourly temperature bin \(0 \leq T_{out} < 5\)
rhu2_05F_to_10F decimal 0.0=0%, 1.0=100% RHU2 filtered resistance heat utilization for hourly temperature bin \(5 \leq T_{out} < 10\)
rhu2_10F_to_15F decimal 0.0=0%, 1.0=100% RHU2 filtered resistance heat utilization for hourly temperature bin \(10 \leq T_{out} < 15\)
rhu2_15F_to_20F decimal 0.0=0%, 1.0=100% RHU2 filtered resistance heat utilization for hourly temperature bin \(15 \leq T_{out} < 20\)
rhu2_20F_to_25F decimal 0.0=0%, 1.0=100% RHU2 filtered resistance heat utilization for hourly temperature bin \(20 \leq T_{out} < 25\)
rhu2_25F_to_30F decimal 0.0=0%, 1.0=100% RHU2 filtered resistance heat utilization for hourly temperature bin \(25 \leq T_{out} < 30\)
rhu2_30F_to_35F decimal 0.0=0%, 1.0=100% RHU2 filtered resistance heat utilization for hourly temperature bin \(30 \leq T_{out} < 35\)
rhu2_35F_to_40F decimal 0.0=0%, 1.0=100% RHU2 filtered resistance heat utilization for hourly temperature bin \(35 \leq T_{out} < 40\)
rhu2_40F_to_45F decimal 0.0=0%, 1.0=100% RHU2 filtered resistance heat utilization for hourly temperature bin \(40 \leq T_{out} < 45\)
rhu2_45F_to_50F decimal 0.0=0%, 1.0=100% RHU2 filtered resistance heat utilization for hourly temperature bin \(45 \leq T_{out} < 50\)
rhu2_50F_to_55F decimal 0.0=0%, 1.0=100% RHU2 filtered resistance heat utilization for hourly temperature bin \(50 \leq T_{out} < 55\)
rhu2_55F_to_60F decimal 0.0=0%, 1.0=100% RHU2 filtered resistance heat utilization for hourly temperature bin \(55 \leq T_{out} < 60\)
rhu2_30F_to_45F decimal 0.0=0%, 1.0=100% RHU2 filtered resistance heat utilization for hourly temperature bin \(30 \leq T_{out} < 45\)
rhu2IQFLT_00F_to_05F decimal 0.0=0%, 1.0=100% RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(0 \leq T_{out} < 5\)
rhu2IQFLT_05F_to_10F decimal 0.0=0%, 1.0=100% RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(5 \leq T_{out} < 10\)
rhu2IQFLT_10F_to_15F decimal 0.0=0%, 1.0=100% RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(10 \leq T_{out} < 15\)
rhu2IQFLT_15F_to_20F decimal 0.0=0%, 1.0=100% RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(15 \leq T_{out} < 20\)
rhu2IQFLT_20F_to_25F decimal 0.0=0%, 1.0=100% RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(20 \leq T_{out} < 25\)
rhu2IQFLT_25F_to_30F decimal 0.0=0%, 1.0=100% RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(25 \leq T_{out} < 30\)
rhu2IQFLT_30F_to_35F decimal 0.0=0%, 1.0=100% RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(30 \leq T_{out} < 35\)
rhu2IQFLT_35F_to_40F decimal 0.0=0%, 1.0=100% RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(35 \leq T_{out} < 40\)
rhu2IQFLT_40F_to_45F decimal 0.0=0%, 1.0=100% RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(40 \leq T_{out} < 45\)
rhu2IQFLT_45F_to_50F decimal 0.0=0%, 1.0=100% RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(45 \leq T_{out} < 50\)
rhu2IQFLT_50F_to_55F decimal 0.0=0%, 1.0=100% RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(50 \leq T_{out} < 55\)
rhu2IQFLT_55F_to_60F decimal 0.0=0%, 1.0=100% RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(55 \leq T_{out} < 60\)
rhu2IQFLT_30F_to_45F decimal 0.0=0%, 1.0=100% RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(30 \leq T_{out} < 45\)
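
The regression error columns in the Model outputs group (mean_sq_err, root_mean_sq_err, cv_root_mean_sq_err, mean_abs_err, mean_abs_pct_err) are named after common goodness-of-fit measures. The sketch below uses the textbook definitions of those measures as an aid to reading the columns; it is an assumption that the package normalizes them the same way, and the function name regression_errors is hypothetical.

```python
import math


def regression_errors(observed, predicted):
    """Textbook error measures for paired observed/predicted values.

    Assumes equal-length, non-empty inputs and nonzero observed values
    (the percent error divides by each observation).
    """
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    mean_observed = sum(observed) / n
    return {
        "mean_sq_err": mse,
        "root_mean_sq_err": rmse,
        # RMSE expressed relative to the mean observed value.
        "cv_root_mean_sq_err": rmse / mean_observed,
        "mean_abs_err": sum(abs(r) for r in residuals) / n,
        "mean_abs_pct_err": sum(abs(r / o) for r, o in zip(residuals, observed)) / n,
    }
```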

Summary Statistics

The results of the statistics module are output into two separate file types: statistics and advanced statistics. Currently only the statistics file is required to be submitted for certification. See the tutorial for an example of the distinction between these files.

For each real- or integer-valued column (“###”) from the individual thermostat-season output (metrics file), the following summary statistics are generated.

(For readability, these columns are actually rows.)

Columns

Name Description
###_n Number of samples
###_upper_bound_95_perc_conf 95% confidence upper bound on mean value
###_mean Mean value
###_lower_bound_95_perc_conf 95% confidence lower bound on mean value
###_sem Standard error of the mean
###_1q q1 (q=quantile)
###_2.5q q2.5
###_5q q5
###_10q q10
###_15q q15
###_20q q20
###_25q q25
###_30q q30
###_35q q35
###_40q q40
###_45q q45
###_50q q50
###_55q q55
###_60q q60
###_65q q65
###_70q q70
###_75q q75
###_80q q80
###_85q q85
###_90q q90
###_95q q95
###_98q q98
###_99q q99
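
As a rough illustration of how the suffixes above map to values, the following sketch computes them for one metric column. It is not the statistics module itself: the function names are hypothetical, the 95% bounds are assumed to be the mean plus or minus 1.96 standard errors, and quantiles are assumed to use linear interpolation between the closest ranks.

```python
import math
import statistics

# 1, 2.5, then every 5 from 5 to 95, then 98 and 99, as listed in the table above.
QUANTILES = [1, 2.5] + list(range(5, 100, 5)) + [98, 99]


def quantile(sorted_values, q):
    """q-th percentile by linear interpolation (assumed method)."""
    pos = (len(sorted_values) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarize(name, values):
    """Build the ###_* entries for one metric; needs at least two values."""
    values = sorted(values)
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    out = {
        f"{name}_n": n,
        f"{name}_upper_bound_95_perc_conf": mean + 1.96 * sem,  # assumed normal approximation
        f"{name}_mean": mean,
        f"{name}_lower_bound_95_perc_conf": mean - 1.96 * sem,
        f"{name}_sem": sem,
    }
    for q in QUANTILES:
        out[f"{name}_{q}q"] = quantile(values, q)
    return out
```

Calling summarize("percent_savings_baseline_percentile", values) would then yield keys such as percent_savings_baseline_percentile_2.5q and percent_savings_baseline_percentile_mean.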

The following general columns are also output:

Columns

Name Description
sw_version Software version
product_id Alphanumeric product identifier
n_thermostat_core_day_sets_total Number of relevant thermostat rows from thermostat module output prior to filtering
n_thermostat_core_day_sets_kept Number of relevant thermostat rows from thermostat module not filtered out
n_thermostat_core_day_sets_discarded Number of relevant thermostat rows from thermostat module filtered out

Certification File

This file contains all of the relevant statistics for certification. It is submitted for certification.

Columns

Name Description
product_id Product ID
sw_version Software Version
metric Metric (percent_savings_baseline_percentile or rhu_30F_to_45F)
filter Filter Used (tau_cvrmse_savings_p01)
region Region (national_weighted_mean or all)
statistic Statistic (lower_bound_95 (95% confidence lower bound on mean value), q20 (20th percentile) or upper_bound_95 (95% confidence upper bound on mean value))
season Season (heating or cooling)
value Value

National weighted percent savings are computed by weighted average of percent savings results grouped by climate zone. Heavier weights are applied to results in climate zones which tend to have longer runtimes. Weightings used are available for download.
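
The weighted average described above can be sketched as follows. The zone labels and weights in the usage note are made up for illustration; the actual weightings are the ones provided for download, and the function name national_weighted_mean is hypothetical.

```python
def national_weighted_mean(zone_means, zone_weights):
    """Weighted average of per-climate-zone mean percent savings.

    zone_means: mapping of climate zone to mean percent savings.
    zone_weights: mapping of climate zone to its weight.
    Only zones present in zone_means contribute to the result.
    """
    total_weight = sum(zone_weights[zone] for zone in zone_means)
    if total_weight == 0:
        raise ValueError("weights for the supplied zones sum to zero")
    weighted_sum = sum(zone_means[zone] * zone_weights[zone] for zone in zone_means)
    return weighted_sum / total_weight
```

For example, with illustrative means {"zone_a": 10.0, "zone_b": 20.0} and weights {"zone_a": 1.0, "zone_b": 3.0}, the result is pulled toward zone_b because it carries the heavier weight.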

Thermostat Import Errors

This file is used for troubleshooting any thermostat import or creation errors. It is not submitted for certification.

It contains the following entries:

Columns

Name Description
thermostat_id Thermostat ID
error Error message for the thermostat