Data Files¶
Input data¶
Input data should be specified using the following formats. The metadata CSV file
specifies unique values for each thermostat such as equipment type and location.
Each thermostat interval data CSV file contains hourly runtime information and is linked
to the metadata CSV file by the interval_data_filename column.
Data files must contain all fields, even if there is no data for that field. Please refer to the example files.
Thermostat Summary Metadata CSV format¶
Columns¶
Name | Data Format | Units | Description |
thermostat_id |
string | N/A | A uniquely identifying marker for the thermostat. |
heat_type |
string | N/A | The type of controlled HVAC heating equipment. [1] |
heat_stage |
string | N/A | The stages of controlled HVAC heating equipment. [2] |
cool_type |
string | N/A | The type of controlled HVAC cooling equipment. [3] |
cool_stage |
string | N/A | The stages of controlled HVAC cooling equipment. [4] |
zipcode |
string, 5 digits | N/A | The ZIP Code where the thermostat is installed. [5] |
utc_offset |
string | N/A | The UTC offset of the times in the corresponding interval data CSV. (e.g. “-0700” or “-5”. Data in UTC is offset “+0”) |
interval_data_filename |
string | N/A | The filename of the interval data file corresponding to this thermostat. Should be specified relative to the location of the metadata file. |
- Each row should correspond to a single thermostat.
- Nulls should be specified by leaving the field blank.
- All interval data for a particular thermostat should use the same, single UTC offset provided in the metadata file.
- The zipcode field should contain the ZIP Code where the thermostat is installed. It is converted to a latitude / longitude that is used for weather station lookups. The lookup is done by the zipcodes package, which may also be used to check whether a ZIP Code is valid or fails to map to a location.
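The metadata rules above can be checked before a run. The following is a minimal sketch, not the package's own loader: `parse_utc_offset`, `read_metadata`, the sample row, and the file paths are hypothetical names introduced for illustration. It checks that all headers are present, that the ZIP Code has five digits, parses offsets written as "-0700", "-5" or "+0", and resolves the interval data filename relative to the metadata file.

```python
import csv
import io
import re
from datetime import timedelta
from pathlib import Path

REQUIRED_COLUMNS = [
    "thermostat_id", "heat_type", "heat_stage", "cool_type", "cool_stage",
    "zipcode", "utc_offset", "interval_data_filename",
]


def parse_utc_offset(text):
    """Parse offsets such as "-0700", "-5" or "+0" into a timedelta."""
    match = re.fullmatch(r"([+-]?)(\d{1,2})(\d{2})?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised UTC offset: {text!r}")
    sign, hours, minutes = match.groups()
    delta = timedelta(hours=int(hours), minutes=int(minutes or 0))
    return -delta if sign == "-" else delta


def read_metadata(metadata_path, text):
    """Yield one checked dict per thermostat row of the metadata CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Interval files are given relative to the metadata file's directory.
    base_dir = Path(metadata_path).parent
    for row in reader:
        if not re.fullmatch(r"\d{5}", row["zipcode"]):
            raise ValueError(f"zipcode must be 5 digits: {row['zipcode']!r}")
        yield {
            **row,
            "utc_offset": parse_utc_offset(row["utc_offset"]),
            "interval_data_path": base_dir / row["interval_data_filename"],
        }


# Hypothetical sample: equipment fields are left blank (null) on purpose.
sample = (
    "thermostat_id,heat_type,heat_stage,cool_type,cool_stage,"
    "zipcode,utc_offset,interval_data_filename\n"
    "thermostat_01,,,,,62223,-0600,interval_data/thermostat_01.csv\n"
)
rows = list(read_metadata("data/metadata.csv", sample))
```

The sketch does not check whether the ZIP Code maps to a real location; that step is left to the zipcodes package mentioned above.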
Thermostat Interval Data CSV format¶
Columns¶
Name | Data Format | Units | Description |
thermostat_id |
string | N/A | Uniquely identifying marker for the thermostat. |
datetime |
YYYY-MM-DD hh:mm:ss (ISO-8601) | N/A | Date and time of this set of readings. |
cool_runtime_stg1 |
decimal or integer | minutes | Hourly runtime of cooling equipment (all units). |
cool_runtime_stg2 |
decimal or integer | minutes | Hourly runtime of cooling equipment second stage (two-stage units only). |
cool_runtime_equiv |
decimal or integer | minutes | Hourly full load equivalent runtime of cooling equipment (multi-stage units only). |
heat_runtime_stg1 |
decimal or integer | minutes | Hourly runtime of heating equipment (all units). |
heat_runtime_stg2 |
decimal or integer | minutes | Hourly runtime of heating equipment second stage (two-stage units only). |
heat_runtime_equiv |
decimal or integer | minutes | Hourly full load equivalent runtime of heating equipment (multi-stage units only). |
auxiliary_heat |
decimal or integer | minutes | Hourly runtime of auxiliary heat equipment. |
emergency_heat |
decimal or integer | minutes | Hourly runtime of emergency heat equipment. |
temp_in |
decimal, to nearest 0.5 °F | °F | Hourly average conditioned space temperature over the period of the reading. |
- If a heating or cooling type or stage is not present or not applicable, a value of none or blank is sufficient to signify no data present.
- All headers must be present in the file, even if there is no data for that column (use none or blank for missing data).
- Dates should be specified in the ISO 8601 date format (e.g. 2015-05-19 01:00:00, 2020-01-01 23:00:00).
- Dates and times must be consecutive (e.g. 2020-01-01 23:00:00 should have 2020-01-02 00:00:00 on the next line and 2020-01-02 01:00:00 after that).
- All dates for the period must be represented and consecutive (i.e. each date for a period must have a line in the data file).
- Each row should correspond to a single hourly reading from a thermostat. [6]
- NULL should be specified by leaving the field blank.
- Zero values should be specified as 0, rather than as blank.
- If data is missing for a particular row of one column, data should still be provided for other columns in that row. For example, if runtime is missing for a particular hour, please still provide indoor conditioned space temperature for that hour, if available.
- Runtimes should be less than or equal to 60 minutes (1 hour).
- All temperatures should be specified in °F (to the nearest 0.5°F).
- All runtime data MUST have the same UTC offset, as provided in the corresponding metadata file.
- Outdoor temperature data need not be provided - it will be fetched automatically from NCDC using the eeweather package.
- If any hour of a particular day’s runtime is missing data then all of the data for that particular day will also be marked as missing (NULL) since the day’s data is incomplete.
- If more than 5% of runtime data is missing for a thermostat that thermostat will be discarded.
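Two of the rules above, consecutive hourly timestamps and runtimes of at most 60 minutes, are easy to check mechanically. The sketch below is illustrative only; `check_interval_rows` is a hypothetical helper, not part of the package, and it assumes rows have already been read into dictionaries keyed by column name (for example with `csv.DictReader`).

```python
from datetime import datetime, timedelta

RUNTIME_COLUMNS = [
    "cool_runtime_stg1", "cool_runtime_stg2", "cool_runtime_equiv",
    "heat_runtime_stg1", "heat_runtime_stg2", "heat_runtime_equiv",
    "auxiliary_heat", "emergency_heat",
]


def check_interval_rows(rows):
    """Return a list of human-readable problems found in hourly rows."""
    problems = []
    previous = None
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        stamp = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S")
        # Each row must be exactly one hour after the previous one.
        if previous is not None and stamp - previous != timedelta(hours=1):
            expected = previous + timedelta(hours=1)
            problems.append(f"line {line_no}: expected {expected}, got {stamp}")
        previous = stamp
        for column in RUNTIME_COLUMNS:
            value = row.get(column, "")
            if value in ("", "none"):  # blank or none means missing data
                continue
            if float(value) > 60:
                problems.append(f"line {line_no}: {column}={value} exceeds 60 minutes")
    return problems


example_rows = [
    {"datetime": "2020-01-01 23:00:00", "heat_runtime_stg1": "30"},
    {"datetime": "2020-01-02 00:00:00", "heat_runtime_stg1": "61"},
    {"datetime": "2020-01-02 02:00:00", "heat_runtime_stg1": ""},
]
example_problems = check_interval_rows(example_rows)
```

In this example the second row is flagged for a runtime above 60 minutes and the third for skipping 2020-01-02 01:00:00.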
[1] | Possible values for
|
[2] | Possible values for
|
[3] | Possible values for
|
[4] | Possible values for
|
[5] | Will be used for matching with a weather station that provides external dry-bulb temperature data. This temperature data will be used to determine the bounds of the heating and cooling season over which metrics will be computed. |
[6] | Previous versions of this software had each row as one daily result. This version changes this to use hourly rows instead. |
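The missing-data rules in the interval data notes (a day with any missing hour is treated as missing, and a thermostat with more than 5% missing runtime is discarded) can be illustrated as follows. This is a sketch under stated assumptions: `daily_runtime` and `should_discard` are hypothetical names, and the 5% threshold is assumed here to be measured over days after the day-level rule has been applied, which the text above does not specify.

```python
from collections import defaultdict


def daily_runtime(hourly):
    """Sum hourly runtimes per day; a day with any missing hour becomes None.

    `hourly` maps "YYYY-MM-DD hh:mm:ss" strings to minutes, or None if missing.
    """
    by_day = defaultdict(list)
    for stamp, minutes in hourly.items():
        by_day[stamp[:10]].append(minutes)  # first 10 characters are the date
    return {
        day: None if any(m is None for m in values) else sum(values)
        for day, values in by_day.items()
    }


def should_discard(daily, threshold=0.05):
    """True when more than `threshold` of days have missing runtime.

    Assumption: the fraction is counted over days, not over hours.
    """
    missing = sum(1 for total in daily.values() if total is None)
    return missing / len(daily) > threshold


# Two days of hourly data, with one hour missing on the second day.
hourly = {f"2020-01-01 {h:02d}:00:00": 10.0 for h in range(24)}
hourly.update({f"2020-01-02 {h:02d}:00:00": 10.0 for h in range(24)})
hourly["2020-01-02 05:00:00"] = None
daily = daily_runtime(hourly)
```

Here the first day sums to 240 minutes, the second day is marked missing as a whole, and the thermostat would be discarded because half of its days are missing.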
Metrics data¶
Individual thermostat-season¶
This file is referred to as the metrics file. The metrics file contains the metrics output from the metrics calculation method. It is an intermediary file that is currently not submitted for certification.
The following columns are an intermediate output generated for each thermostat-season.
Columns¶
Name | Data Format | Units | Description |
---|---|---|---|
General outputs | |||
sw_version |
string | N/A | Software version. |
ct_identifier |
string | N/A | Identifier for thermostat as provided in the metadata file. |
heat_type |
string | N/A | Heating type for the thermostat |
heat_stage |
string | N/A | Heating stage for the thermostat |
cool_type |
string | N/A | Cooling type for the thermostat |
cool_stage |
string | N/A | Cooling stage for the thermostat |
heating_or_cooling |
string | N/A | Label for the core day set (e.g. ‘heating_2012-2013’). |
station |
string, USAF ID | N/A | USAF identifier for station used to fetch hourly temperature data. |
climate_zone |
string | N/A | EIC climate zone (consolidated). |
start_date |
date | ISO-8601 | Earliest date in input file. |
end_date |
date | ISO-8601 | Latest date in input file. |
n_days_both_heating_and_cooling |
integer | # days | Number of days not included as core days due to presence of both heating and cooling. |
n_days_insufficient_data |
integer | # days | Number of days not included as core days due to missing data. |
n_core_cooling_days |
integer | # days | Number of days meeting criteria for inclusion in core cooling day set. |
n_core_heating_days |
integer | # days | Number of days meeting criteria for inclusion in core heating day set. |
n_days_in_inputfile_date_range |
integer | # days | Number of potential days in inputfile date range. |
baseline10_core_cooling_comfort_temperature |
float | °F | Baseline comfort temperature as determined by 10th percentile of indoor temperatures. |
baseline90_core_cooling_comfort_temperature |
float | °F | Baseline comfort temperature as determined by 90th percentile of indoor temperatures. |
regional_average_baseline_cooling_comfort_temperature |
float | °F | Baseline comfort temperature as determined by regional average. |
regional_average_baseline_heating_comfort_temperature |
float | °F | Baseline comfort temperature as determined by regional average. |
Model outputs | |||
percent_savings_baseline_percentile |
float | percent | Percent savings as given by hourly average CTD or HTD method with 10th or 90th percentile baseline |
avoided_daily_mean_core_day_runtime_baseline_percentile |
float | minutes | Avoided average daily runtime for core cooling days |
avoided_total_core_day_runtime_baseline_percentile |
float | minutes | Avoided total runtime for core cooling days |
baseline_daily_mean_core_day_runtime_baseline_percentile |
float | minutes | Baseline average daily runtime for core cooling days |
baseline_total_core_day_runtime_baseline_percentile |
float | minutes | Baseline total runtime for core cooling days |
percent_savings_baseline_regional |
float | percent | Percent savings as given by hourly average CTD or HTD method with regional average baseline |
avoided_daily_mean_core_day_runtime_baseline_regional |
float | minutes | Avoided average daily runtime for core cooling days |
avoided_total_core_day_runtime_baseline_regional |
float | minutes | Avoided total runtime for core cooling days |
baseline_daily_mean_core_day_runtime_baseline_regional |
float | minutes | Baseline average daily runtime for core cooling days |
baseline_total_core_day_runtime_baseline_regional |
float | minutes | Baseline total runtime for core cooling days |
mean_demand |
float | °F | Average cooling demand |
alpha |
float | minutes/Δ°F | The fitted slope of cooling runtime to demand regression |
tau |
float | °F | The fitted intercept of cooling runtime to demand regression |
mean_sq_err |
float | N/A | Mean squared error of regression |
root_mean_sq_err |
float | N/A | Root mean squared error of regression |
cv_root_mean_sq_err |
float | N/A | Coefficient of variation of root mean squared error of regression |
mean_abs_err |
float | N/A | Mean absolute error |
mean_abs_pct_err |
float | N/A | Mean absolute percent error |
Runtime outputs | |||
total_core_cooling_runtime |
float | minutes | Total core cooling equipment runtime |
total_core_heating_runtime |
float | minutes | Total core heating equipment runtime |
total_auxiliary_heating_core_day_runtime |
float | minutes | Total core auxiliary heating equipment runtime |
total_emergency_heating_core_day_runtime |
float | minutes | Total core emergency heating equipment runtime |
daily_mean_core_cooling_runtime |
float | minutes | Average daily core cooling runtime |
daily_mean_core_heating_runtime |
float | minutes | Average daily core heating runtime |
Core mean temperatures | |||
core_cooling_days_mean_indoor_temperature |
float | °F | Mean of core cooling days indoor temperature |
core_cooling_days_mean_outdoor_temperature |
float | °F | Mean of core cooling days outdoor temperature |
core_heating_days_mean_indoor_temperature |
float | °F | Mean of core heating days indoor temperature |
core_heating_days_mean_outdoor_temperature |
float | °F | Mean of core heating days outdoor temperature |
core_mean_indoor_temperature |
float | °F | Mean of indoor temperature |
core_mean_outdoor_temperature |
float | °F | Mean of outdoor temperature |
Resistance heat outputs | |||
rhu1_00F_to_05F |
decimal | 0.0=0%, 1.0=100% | Resistance heat utilization for hourly temperature bin \(0 \leq T_{out} < 5\) |
rhu1_05F_to_10F |
decimal | 0.0=0%, 1.0=100% | Resistance heat utilization for hourly temperature bin \(5 \leq T_{out} < 10\) |
rhu1_10F_to_15F |
decimal | 0.0=0%, 1.0=100% | Resistance heat utilization for hourly temperature bin \(10 \leq T_{out} < 15\) |
rhu1_15F_to_20F |
decimal | 0.0=0%, 1.0=100% | Resistance heat utilization for hourly temperature bin \(15 \leq T_{out} < 20\) |
rhu1_20F_to_25F |
decimal | 0.0=0%, 1.0=100% | Resistance heat utilization for hourly temperature bin \(20 \leq T_{out} < 25\) |
rhu1_25F_to_30F |
decimal | 0.0=0%, 1.0=100% | Resistance heat utilization for hourly temperature bin \(25 \leq T_{out} < 30\) |
rhu1_30F_to_35F |
decimal | 0.0=0%, 1.0=100% | Resistance heat utilization for hourly temperature bin \(30 \leq T_{out} < 35\) |
rhu1_35F_to_40F |
decimal | 0.0=0%, 1.0=100% | Resistance heat utilization for hourly temperature bin \(35 \leq T_{out} < 40\) |
rhu1_40F_to_45F |
decimal | 0.0=0%, 1.0=100% | Resistance heat utilization for hourly temperature bin \(40 \leq T_{out} < 45\) |
rhu1_45F_to_50F |
decimal | 0.0=0%, 1.0=100% | Resistance heat utilization for hourly temperature bin \(45 \leq T_{out} < 50\) |
rhu1_50F_to_55F |
decimal | 0.0=0%, 1.0=100% | Resistance heat utilization for hourly temperature bin \(50 \leq T_{out} < 55\) |
rhu1_55F_to_60F |
decimal | 0.0=0%, 1.0=100% | Resistance heat utilization for hourly temperature bin \(55 \leq T_{out} < 60\) |
rhu1_30F_to_45F |
decimal | 0.0=0%, 1.0=100% | Resistance heat utilization for hourly temperature bin \(30 \leq T_{out} < 45\) |
rhu2_00F_to_05F |
decimal | 0.0=0%, 1.0=100% | RHU2 filtered resistance heat utilization for hourly temperature bin \(0 \leq T_{out} < 5\) |
rhu2_05F_to_10F |
decimal | 0.0=0%, 1.0=100% | RHU2 filtered resistance heat utilization for hourly temperature bin \(5 \leq T_{out} < 10\) |
rhu2_10F_to_15F |
decimal | 0.0=0%, 1.0=100% | RHU2 filtered resistance heat utilization for hourly temperature bin \(10 \leq T_{out} < 15\) |
rhu2_15F_to_20F |
decimal | 0.0=0%, 1.0=100% | RHU2 filtered resistance heat utilization for hourly temperature bin \(15 \leq T_{out} < 20\) |
rhu2_20F_to_25F |
decimal | 0.0=0%, 1.0=100% | RHU2 filtered resistance heat utilization for hourly temperature bin \(20 \leq T_{out} < 25\) |
rhu2_25F_to_30F |
decimal | 0.0=0%, 1.0=100% | RHU2 filtered resistance heat utilization for hourly temperature bin \(25 \leq T_{out} < 30\) |
rhu2_30F_to_35F |
decimal | 0.0=0%, 1.0=100% | RHU2 filtered resistance heat utilization for hourly temperature bin \(30 \leq T_{out} < 35\) |
rhu2_35F_to_40F |
decimal | 0.0=0%, 1.0=100% | RHU2 filtered resistance heat utilization for hourly temperature bin \(35 \leq T_{out} < 40\) |
rhu2_40F_to_45F |
decimal | 0.0=0%, 1.0=100% | RHU2 filtered resistance heat utilization for hourly temperature bin \(40 \leq T_{out} < 45\) |
rhu2_45F_to_50F |
decimal | 0.0=0%, 1.0=100% | RHU2 filtered resistance heat utilization for hourly temperature bin \(45 \leq T_{out} < 50\) |
rhu2_50F_to_55F |
decimal | 0.0=0%, 1.0=100% | RHU2 filtered resistance heat utilization for hourly temperature bin \(50 \leq T_{out} < 55\) |
rhu2_55F_to_60F |
decimal | 0.0=0%, 1.0=100% | RHU2 filtered resistance heat utilization for hourly temperature bin \(55 \leq T_{out} < 60\) |
rhu2_30F_to_45F |
decimal | 0.0=0%, 1.0=100% | RHU2 filtered resistance heat utilization for hourly temperature bin \(30 \leq T_{out} < 45\) |
rhu2IQFLT_00F_to_05F |
decimal | 0.0=0%, 1.0=100% | RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(0 \leq T_{out} < 5\) |
rhu2IQFLT_05F_to_10F |
decimal | 0.0=0%, 1.0=100% | RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(5 \leq T_{out} < 10\) |
rhu2IQFLT_10F_to_15F |
decimal | 0.0=0%, 1.0=100% | RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(10 \leq T_{out} < 15\) |
rhu2IQFLT_15F_to_20F |
decimal | 0.0=0%, 1.0=100% | RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(15 \leq T_{out} < 20\) |
rhu2IQFLT_20F_to_25F |
decimal | 0.0=0%, 1.0=100% | RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(20 \leq T_{out} < 25\) |
rhu2IQFLT_25F_to_30F |
decimal | 0.0=0%, 1.0=100% | RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(25 \leq T_{out} < 30\) |
rhu2IQFLT_30F_to_35F |
decimal | 0.0=0%, 1.0=100% | RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(30 \leq T_{out} < 35\) |
rhu2IQFLT_35F_to_40F |
decimal | 0.0=0%, 1.0=100% | RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(35 \leq T_{out} < 40\) |
rhu2IQFLT_40F_to_45F |
decimal | 0.0=0%, 1.0=100% | RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(40 \leq T_{out} < 45\) |
rhu2IQFLT_45F_to_50F |
decimal | 0.0=0%, 1.0=100% | RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(45 \leq T_{out} < 50\) |
rhu2IQFLT_50F_to_55F |
decimal | 0.0=0%, 1.0=100% | RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(50 \leq T_{out} < 55\) |
rhu2IQFLT_55F_to_60F |
decimal | 0.0=0%, 1.0=100% | RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(55 \leq T_{out} < 60\) |
rhu2IQFLT_30F_to_45F |
decimal | 0.0=0%, 1.0=100% | RHU2 IQR filtered resistance heat utilization for hourly temperature bin \(30 \leq T_{out} < 45\) |
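The resistance heat columns above use half-open 5 °F outdoor temperature bins, so an hour at exactly 5 °F belongs to the 05F_to_10F bin. The sketch below shows how an hourly outdoor temperature could be mapped to a column name and how a utilization fraction between 0.0 and 1.0 could be accumulated per bin. It is not the package's implementation: `rhu_bin_label` and `rhu_by_bin` are hypothetical helpers, and the ratio of resistance heat minutes to total heating minutes is an assumption made for illustration, since the exact formula is not given here. The wider 30F_to_45F bin and the RHU2 filters are not covered.

```python
def rhu_bin_label(t_out, prefix="rhu1", width=5, low=0, high=60):
    """Return the column name of the bin containing t_out, or None if outside."""
    if not low <= t_out < high:
        return None
    # Half-open bins: start <= t_out < start + width.
    start = int((t_out - low) // width) * width + low
    return f"{prefix}_{start:02d}F_to_{start + width:02d}F"


def rhu_by_bin(hours, prefix="rhu1"):
    """Accumulate utilization per bin.

    `hours` is an iterable of (t_out, resistance_minutes, heating_minutes).
    Assumption: utilization = total resistance minutes / total heating minutes.
    """
    totals = {}
    for t_out, resistance, heating in hours:
        label = rhu_bin_label(t_out, prefix)
        if label is None:
            continue
        r, h = totals.get(label, (0.0, 0.0))
        totals[label] = (r + resistance, h + heating)
    # Bins with no heating runtime are omitted rather than divided by zero.
    return {label: r / h for label, (r, h) in totals.items() if h > 0}


example_hours = [(2.0, 30, 60), (4.0, 0, 60), (5.0, 15, 30), (72.0, 0, 0)]
example_rhu = rhu_by_bin(example_hours)
```

In this example the two hours below 5 °F share a bin, the hour at exactly 5 °F falls in the next bin, and the 72 °F hour is ignored because it lies outside every bin.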
Summary Statistics¶
The results of the statistics module are output into two separate file types: statistics and advanced statistics. Currently only the statistics file is required to be submitted for certification. See the tutorial for an example of the distinction between these files.
For each real- or integer-valued column (“###”) from the individual thermostat-season output (metrics file), the following summary statistics are generated.
(For readability, these columns are actually rows.)
Columns¶
Name | Description |
---|---|
###_n |
Number of samples |
###_upper_bound_95_perc_conf |
95% confidence upper bound on mean value |
###_mean |
Mean value |
###_lower_bound_95_perc_conf |
95% confidence lower bound on mean value |
###_sem |
Standard error of the mean |
###_1q |
q1 (q=quantile) |
###_2.5q |
q2.5 |
###_5q |
q5 |
###_10q |
q10 |
###_15q |
q15 |
###_20q |
q20 |
###_25q |
q25 |
###_30q |
q30 |
###_35q |
q35 |
###_40q |
q40 |
###_45q |
q45 |
###_50q |
q50 |
###_55q |
q55 |
###_60q |
q60 |
###_65q |
q65 |
###_70q |
q70 |
###_75q |
q75 |
###_80q |
q80 |
###_85q |
q85 |
###_90q |
q90 |
###_95q |
q95 |
###_98q |
q98 |
###_99q |
q99 |
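As an illustration of how the names in the table above expand for one metric column, the sketch below builds the full set of summary statistics for a list of values. It is a sketch under assumptions rather than the statistics module itself: `summarize` and `quantile` are hypothetical helpers, the confidence bounds use a normal approximation (mean ± 1.96 × SEM), and quantiles use linear interpolation; the package may make different choices.

```python
import math
import statistics

QUANTILES = [1, 2.5, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50,
             55, 60, 65, 70, 75, 80, 85, 90, 95, 98, 99]


def quantile(sorted_values, q):
    """Linearly interpolated q-th percentile (0-100) of already sorted data."""
    position = (len(sorted_values) - 1) * q / 100
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def summarize(name, values):
    """Build the ###_* statistics for one column; needs at least two values."""
    values = sorted(values)
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    stats = {
        f"{name}_n": n,
        # Assumption: normal approximation for the 95% confidence bounds.
        f"{name}_upper_bound_95_perc_conf": mean + 1.96 * sem,
        f"{name}_mean": mean,
        f"{name}_lower_bound_95_perc_conf": mean - 1.96 * sem,
        f"{name}_sem": sem,
    }
    for q in QUANTILES:
        stats[f"{name}_{q}q"] = quantile(values, q)
    return stats


example_stats = summarize("percent_savings_baseline_percentile", [1, 2, 3, 4, 5])
```

Each call yields 28 entries for the column: the five statistics at the top of the table plus the 23 quantiles.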
The following general columns are also output:
Columns¶
Name | Description |
---|---|
sw_version |
Software version |
product_id |
Alphanumeric product identifier |
n_thermostat_core_day_sets_total |
Number of relevant thermostat rows from thermostat module output prior to filtering |
n_thermostat_core_day_sets_kept |
Number of relevant thermostat rows from thermostat module not filtered out |
n_thermostat_core_day_sets_discarded |
Number of relevant thermostat rows from thermostat module filtered out |
Certification File¶
This file contains all of the relevant statistics for certification. It is submitted for certification.
Columns¶
Name | Description |
---|---|
product_id |
Product ID |
sw_version |
Software Version |
metric |
Metric (percent_savings_baseline_percentile or rhu_30F_to_45F) |
filter |
Filter Used (tau_cvrmse_savings_p01) |
region |
Region (national_weighted_mean or all) |
statistic |
Statistic (lower_bound_95 (95% confidence lower bound on mean value), q20 (20th percentile) or upper_bound_95 (95% confidence upper bound on mean value)) |
season |
Season (heating or cooling) |
value |
Value |
National weighted percent savings are computed as a weighted average of percent savings results
grouped by climate zone. Heavier weights are applied to results in climate
zones which tend to have longer runtimes. The weightings used are available for download.
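The weighting described above amounts to a weighted average over climate zones. A minimal sketch follows; `national_weighted_mean`, the zone names, and the weights are hypothetical placeholders, and the real weightings are those published for download.

```python
def national_weighted_mean(zone_means, zone_weights):
    """Weighted average of per-climate-zone mean percent savings.

    Zones without a weight are skipped; weights need not sum to 1.
    """
    zones = [zone for zone in zone_means if zone in zone_weights]
    total_weight = sum(zone_weights[zone] for zone in zones)
    if total_weight == 0:
        raise ValueError("no weighted climate zones to average")
    weighted_sum = sum(zone_means[zone] * zone_weights[zone] for zone in zones)
    return weighted_sum / total_weight


# Hypothetical zones and weights, for illustration only.
example_means = {"zone_a": 10.0, "zone_b": 20.0}
example_weights = {"zone_a": 1.0, "zone_b": 3.0}
example_national = national_weighted_mean(example_means, example_weights)
```

With these placeholder numbers the zone with the heavier weight pulls the result toward its own mean: (10 × 1 + 20 × 3) / 4 = 17.5.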