Quickstart

Installation

To install the thermostat package for the first time, we highly recommend that you create a virtual environment or a conda environment in which to install it. You may choose to skip this step, but doing so risks corrupting your existing Python environment. An isolated environment also makes problems easier to debug.

# if using virtualenvwrapper (see https://virtualenvwrapper.readthedocs.org/en/latest/install.html)
$ mkvirtualenv thermostat
(thermostat)$ pip install thermostat

# if using Python 3 with venv (see https://docs.python.org/3/library/venv.html)
# (cd to directory with data files)
$ python3 -m venv venv
$ source venv/bin/activate
(venv)$ pip install thermostat

# if using conda (see note below - conda is distributed with Anaconda)
# Certain Windows installations may have issues with Thermostat 1.7.x. See "Notes for Windows Conda Users" below.
$ conda create --yes --name thermostat pip
$ conda activate thermostat
(thermostat)$ pip install thermostat

Note

We no longer recommend using pipenv because of difficulties keeping the environment up to date. Please use one of the other methods mentioned above to install the software.

If you already have an environment, use the following:

# if using virtualenvwrapper
$ workon thermostat
(thermostat)$

# if using conda
$ conda activate thermostat
(thermostat)$

# if using venv, or virtualenv directly
$ source /path/to/venv/bin/activate

To deactivate the environment when you’ve finished, use the following:

# if using virtualenvwrapper / venv
(thermostat)$ deactivate
$

# if using conda
(thermostat)$ conda deactivate
$

Check to make sure you are on the most recent version of the package.

>>> import thermostat; thermostat.get_version()

'1.7.4'
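
If you would rather check the version from a script, the comparison can be sketched as follows. This is a minimal illustration that assumes the version is a purely numeric dotted string like the one shown above; the helper names version_tuple and is_at_least are hypothetical and are not part of the thermostat package.

```python
def version_tuple(version):
    """Convert a dotted version string such as '1.7.4' into a tuple of ints.

    Assumes every component is numeric (no 'rc1' or 'dev' suffixes).
    """
    return tuple(int(part) for part in version.split(".")[:3])


def is_at_least(installed, minimum):
    """Return True when the installed version is >= the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


# Example with literal strings; in practice you would pass the string
# returned by thermostat.get_version() as the first argument.
print(is_at_least("1.7.4", "1.7.0"))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating '1.10.0' as older than '1.9.9'.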

If you are not on the correct version, you should upgrade:

$ pip install thermostat --upgrade

The command above will update dependencies as well. If you wish to skip this, use the --no-deps flag:

$ pip install thermostat --upgrade --no-deps

Previous versions of the package are available on GitHub.

Note

If you experience issues installing Python packages with C extensions, such as numpy or scipy, we recommend installing and using the free Anaconda Python distribution by Anaconda, Inc. (formerly Continuum Analytics). It contains many of the numeric and scientific packages used by this package and has installers for Windows, macOS and Linux.

Once you have verified a correct installation, import the necessary methods and set a directory for finding and storing data.

Note

If you suspect a package version conflict or error, you can verify the versions of the packages you have installed against the package versions in requirements.txt.

To list your package versions, use:

$ pip freeze

or (if you’re using Anaconda):

$ conda list
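
The comparison against requirements.txt can also be automated. The sketch below is one possible approach, assuming the file pins versions with lines of the form name==version; the helper names parse_pins and find_mismatches are hypothetical. It relies only on importlib.metadata from the Python standard library (Python 3.8 or later).

```python
from importlib import metadata


def parse_pins(lines):
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0]  # drop trailing comments
        line = line.split(";", 1)[0]  # drop environment markers
        line = line.strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {package: (pinned, installed)} for versions that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = installed_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Usage (assumes requirements.txt is in the current directory):
# with open("requirements.txt") as f:
#     print(find_mismatches(parse_pins(f)))
```

Lines that use other specifiers, such as >= or ~=, are ignored by this sketch and would need to be checked by hand.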

Script setup and imports

Import the few Python standard-library modules and functions we will be using in this tutorial as follows.

import sys
import os
import warnings
from os.path import expanduser

Also make sure to import the methods we will be using from the thermostat package.

from thermostat.importers import from_csv
from thermostat.exporters import metrics_to_csv
from thermostat.stats import compute_summary_statistics
from thermostat.stats import summary_statistics_to_csv

If you wish to use multiple processors for your thermostat calculations you’ll need some additional modules:

from thermostat.multiple import multiple_thermostat_calculate_epa_field_savings_metrics

Set the data_dir variable as a convenience. We will refer to this directory and save our results in it. You should also move all downloaded and extracted files used in this tutorial into this directory before using them. You may, of course, choose to use a different directory, which you can set here, or override it entirely by replacing it where it appears in the tutorial.

data_dir = os.path.join(expanduser("~"), "thermostat_tutorial")
# or data_dir = "/full/path/to/custom/directory/"
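
If the directory does not exist yet, it needs to be created before files are saved into it. A minimal sketch of one way to do that follows; the helper name ensure_dir is hypothetical and is not part of the thermostat package.

```python
import os
from os.path import expanduser

data_dir = os.path.join(expanduser("~"), "thermostat_tutorial")


def ensure_dir(path):
    """Create the directory (and any parents) if missing; return its absolute path."""
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return os.path.abspath(path)


# Usage: uncomment to create the tutorial directory in your home folder.
# data_dir = ensure_dir(data_dir)
```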

Optional Setup

If you wish to follow the progress of downloading and caching external weather files, which will be the most time-consuming portion of this tutorial, you can configure logging at this point. The example here will work within most IPython / Jupyter Notebook or script environments. It uses the root logger; if you have a more complicated logging setup, you may need to use a different logger.

import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
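
For setups where changing the root logger is not desirable, a named logger with its own handler is one alternative. The sketch below uses only the standard logging module; the function configure_logger and the logger name "my_analysis" are placeholders chosen for illustration, and which logger names the thermostat package itself emits under is not assumed here.

```python
import logging
import sys


def configure_logger(name, level=logging.INFO):
    """Attach one stream handler to the named logger and return the logger."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


# "my_analysis" is a placeholder name used only for this illustration.
log = configure_logger("my_analysis", logging.DEBUG)
log.debug("logging configured")
```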

Note

The thermostat package depends on the eemeter and eeweather packages for weather data fetching. The eeweather package automatically creates its own cache directory in which it keeps cached versions of weather source data. This speeds up the (generally I/O bound) NOAA weather fetching routine on subsequent internal calls to fetch the same weather data (for example, getting outdoor temperature data for thermostats that map to the same weather station).

For more information, see the eeweather package.

Note

US Census Bureau ZIP Code Tabulation Areas (ZCTAs) are used to map USPS ZIP codes to outdoor temperature data. If the automatic mapping is unsuccessful for one or more of the ZIP codes in your dataset, the likely reason is a discrepancy between “true” USPS ZIP codes and the US Census Bureau ZCTAs. “True” ZIP codes are not used because they do not always map well to a location (for example, ZIP codes for P.O. boxes). You may need to first map ZIP codes to ZCTAs, or these thermostats will be skipped. There are roughly 32,000 ZCTAs and roughly 42,000 ZIP codes, so there are many fewer ZCTAs than ZIP codes.
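
One way to find problem ZIP codes before running the calculation is to check them against a ZIP-to-ZCTA lookup table. The sketch below assumes you have obtained such a table from another source and loaded it into a dictionary; the function split_by_zcta and the lookup values are made up for illustration and are not part of the thermostat package.

```python
def split_by_zcta(zip_codes, zip_to_zcta):
    """Split ZIP codes into (mapped, unmapped) using a ZIP-to-ZCTA lookup."""
    mapped, unmapped = {}, []
    for zip_code in zip_codes:
        key = str(zip_code).zfill(5)  # restore leading zeros lost in CSV parsing
        if key in zip_to_zcta:
            mapped[key] = zip_to_zcta[key]
        else:
            unmapped.append(key)
    return mapped, unmapped


# Made-up lookup values, for illustration only.
example_lookup = {"00001": "00002", "94105": "94105"}
mapped, unmapped = split_by_zcta(["94105", 1, "99999"], example_lookup)
print(mapped)    # ZIP codes that have a ZCTA in the lookup
print(unmapped)  # ZIP codes that need manual attention
```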

Computing individual thermostat-season metrics

After importing the package methods, load the example thermostat data, or provide data of your own. See Input data for more detailed file format information.

Fabricated example data from 35 thermostats in various climate zones is available for download here.

Loading the thermostat data below will take more than a few minutes, even if the weather cache is enabled (see note above). This is because loading thermostat data involves downloading hourly weather data from a remote source - in this case, the NCDC.

The following loads a lazy iterator over the thermostats. The thermostats will be loaded into memory as necessary in the following steps.

metadata_filename = os.path.join(data_dir, "examples/metadata.csv")
thermostats = from_csv(metadata_filename, verbose=True)
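
Lazy iterators in Python are typically consumed as they are read. The stand-in generator below illustrates that general behavior; whether the object returned by from_csv behaves in exactly this way is an assumption, so treat this as a reason to be cautious about iterating twice rather than as a description of the package.

```python
def lazy_numbers(limit):
    """Yield items one at a time instead of building a list up front."""
    for n in range(limit):
        yield n


items = lazy_numbers(3)
first_pass = list(items)   # consumes the iterator
second_pass = list(items)  # nothing is left to consume
print(first_pass, second_pass)
```

If you need to go over the same items more than once, re-create the iterator or store the items in a list, keeping the memory cost in mind.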

To calculate savings metrics, iterate through the thermostats and save the results. Uncomment the commented lines if you would like to keep the thermostats in memory for inspection. Note that this can consume a large amount of memory and is recommended only for debugging purposes.

metrics = []
# saved_thermostats = []
for thermostat in thermostats:
    outputs = thermostat.calculate_epa_field_savings_metrics()
    metrics.extend(outputs)
    # saved_thermostats.append(thermostat)

If you would like to use multiple processors for the calculation, you may replace the above code with the following method call:

metrics = multiple_thermostat_calculate_epa_field_savings_metrics(thermostats)

This will use all of the available CPUs on the machine in order to calculate the savings metrics.

Note

You will need to have imported the multiple_thermostat_calculate_epa_field_savings_metrics method from thermostat.multiple prior to using this method.

If you’re running under Windows please see the “Notes for Windows Users” below.

Next, write the single-thermostat metrics to a CSV file and convert them to dataframe format.

output_filename = os.path.join(data_dir, "thermostat_example_output.csv")
metrics_df = metrics_to_csv(metrics, output_filename)

The output CSV will be saved in your data directory and should very nearly match the output CSV provided in the example data.
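
To check how closely two output files match, the cells can be compared with a numeric tolerance. The sketch below uses only the standard library and assumes both files have the same columns and the same row order; the function name csv_differences and the default tolerance are illustrative choices, not part of the thermostat package.

```python
import csv
import math


def csv_differences(path_a, path_b, rel_tol=1e-6):
    """Return a list of (row, column, a, b) cells that differ between two CSVs."""
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        rows_a = list(csv.DictReader(fa))
        rows_b = list(csv.DictReader(fb))
    differences = []
    for i, (row_a, row_b) in enumerate(zip(rows_a, rows_b)):
        for column in row_a:
            a, b = row_a[column], row_b.get(column)
            try:
                same = math.isclose(float(a), float(b), rel_tol=rel_tol)
            except (TypeError, ValueError):
                same = a == b  # non-numeric cells are compared as text
            if not same:
                differences.append((i, column, a, b))
    if len(rows_a) != len(rows_b):
        differences.append(("row count", None, len(rows_a), len(rows_b)))
    return differences


# Usage (file names are placeholders):
# print(csv_differences("thermostat_example_output.csv", "reference_output.csv"))
```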

See Output data for more detailed file format information.

Computing summary statistics

Once you have obtained output for each individual thermostat in your dataset, use the stats module to compute summary statistics, which are formatted for submission to the EPA. The example below works with the output file from the tutorial above and can be modified to use your data.

Compute statistics across all thermostats.

# Uses the metrics_df created in the previous section.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")

    stats = compute_summary_statistics(metrics_df)

    # If you want to have advanced filter outputs, use this instead
    # stats_advanced = compute_summary_statistics(metrics_df, advanced_filtering=True)
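
The block above silences every warning raised during the computation. If you would rather review what was raised, the standard warnings module can record warnings instead of ignoring them. The sketch below demonstrates this with a stand-in function, noisy_computation, which is hypothetical and only imitates a computation that warns.

```python
import warnings


def noisy_computation():
    """Stand-in for a computation that emits a warning and returns a value."""
    warnings.warn("example warning from a computation", RuntimeWarning)
    return 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure every warning is recorded
    result = noisy_computation()

# Review the recorded warnings after the computation has finished.
for w in caught:
    print(f"{w.category.__name__}: {w.message}")
```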

Save these results to file.

Each row of the saved CSV will represent one type of output, with one row per statistic per output. Each column in the CSV will represent one subset of thermostats, as determined by grouping by EIC climate zone and applying various filtering methods. National weighted averages will be available near the top of the file.

At this point, you will also need to provide an alphanumeric product identifier for the connected thermostat; e.g. a combination of the connected thermostat service plus one or more connected thermostat device models that comprises the data set.

product_id = "INSERT ALPHANUMERIC PRODUCT ID HERE"
stats_filepath = os.path.join(data_dir, "thermostat_example_stats.csv")
stats_df = summary_statistics_to_csv(stats, stats_filepath, product_id)

# or with advanced filter outputs
# stats_advanced_filepath = os.path.join(data_dir, "thermostat_example_stats_advanced.csv")
# stats_advanced_df = summary_statistics_to_csv(stats_advanced, stats_advanced_filepath, product_id)

National savings are computed by weighted average of percent savings results grouped by climate zone. Heavier weights are applied to results in climate zones which, regionally, tend to have longer runtimes. Weightings used are available for download.
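
The weighted average described above can be sketched as follows. The zone names, savings values, and weights are made up for illustration and are not the weightings used by the package; the function weighted_average is likewise a hypothetical helper.

```python
def weighted_average(values, weights):
    """Return sum(w * v) / sum(w) over the keys present in both mappings."""
    zones = [zone for zone in values if zone in weights]
    total_weight = sum(weights[zone] for zone in zones)
    if total_weight == 0:
        raise ValueError("no overlapping zones with non-zero weight")
    return sum(values[zone] * weights[zone] for zone in zones) / total_weight


# Made-up percent savings and weights, for illustration only.
savings_by_zone = {"zone_a": 10.0, "zone_b": 6.0, "zone_c": 8.0}
weight_by_zone = {"zone_a": 0.5, "zone_b": 0.3, "zone_c": 0.2}
print(weighted_average(savings_by_zone, weight_by_zone))
```

With these numbers, the zone with the largest weight pulls the result toward its own savings value, so the weighted figure (about 8.4) is higher than the unweighted mean of 8.0.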

Notes for Windows Users

On Windows, Python requires that code which uses multiprocessing be run from inside a function that is called only under an if __name__ == "__main__": guard. If you are on Windows, you will need to wrap your code as follows:

def main():
    # Code goes here

if __name__ == "__main__":
    main()

Without this wrapper, Python will raise a RuntimeError: “Attempt to start a new process before the current process has finished its bootstrapping phase.”

Other platforms should not be affected by this.

Notes for Windows Conda Users

Thermostat 1.7.x may fail to install on Windows machines using pip because of issues with the Shapely wheel and numpy. If you see errors such as “WindowsError: [Error 126] The specified module could not be found”, please try installing the thermostat package this way instead:

$ conda env remove --name thermostat
$ conda create --yes --name thermostat python==3.10
$ conda activate thermostat
(thermostat)$ conda install -c conda-forge shapely pandas==1.4.1 numpy==1.22.2
(thermostat)$ pip install thermostat==1.7.4

Note

This is only recommended in cases where the Python environment has issues running the thermostat module. If you are not having issues then we recommend sticking with pip for installing the software.

Sample Program

Here is a complete version of the above tutorial code (this code can be found in the scripts/multi_thermostat_tutorial.py file):

import os
import logging
import logging.config
import json
from thermostat.importers import from_csv
from thermostat.exporters import metrics_to_csv
from thermostat.stats import compute_summary_statistics
from thermostat.stats import summary_statistics_to_csv
from thermostat.multiple import multiple_thermostat_calculate_epa_field_savings_metrics


# This is an example of how to best use the new multi-processing functionality.
# It shows the proper format for wrapping the code under a main() function and
# shows how to use the multiple_thermostat_calculate_epa_field_savings_metrics
# function. Windows needs to have this code wrapped in a main() function in
# order to work.


def main():

    logging.basicConfig()
    # Example logging configuration for file and console output
    # logging.json: Normal logging example
    # logging_noisy.json: Turns on all debugging information
    # logging_quiet.json: Only logs error messages
    with open("logging.json", "r") as logging_config:
        logging.config.dictConfig(json.load(logging_config))

    logger = logging.getLogger('epathermostat')  # Uses the 'epathermostat' logger
    logger.debug("Starting...")
    logging.captureWarnings(True)  # Set to True to log additional warning messages, False to only display on console

    data_dir = os.path.join("..", "tests", "data")
    metadata_filename = os.path.join(data_dir, "metadata.csv")

    # Use this to save the weather cache to local disk files
    # thermostats = from_csv(metadata_filename, verbose=True, save_cache=True, cache_path='/tmp/epa_weather_files/')

    # Verbose will override logging to display the imported thermostats. Set to "False" to use the logging level instead
    thermostats = from_csv(metadata_filename, verbose=True)

    output_dir = "."
    metrics = multiple_thermostat_calculate_epa_field_savings_metrics(thermostats)

    output_filename = os.path.join(output_dir, "thermostat_example_output.csv")
    metrics_out = metrics_to_csv(metrics, output_filename)

    stats = compute_summary_statistics(metrics_out)
    stats_advanced = compute_summary_statistics(metrics_out, advanced_filtering=True)

    product_id = "test_product"
    stats_filepath = os.path.join(data_dir, "thermostat_example_stats.csv")
    summary_statistics_to_csv(stats, stats_filepath, product_id)

    stats_advanced_filepath = os.path.join(data_dir, "thermostat_example_stats_advanced.csv")
    summary_statistics_to_csv(stats_advanced, stats_advanced_filepath, product_id)


if __name__ == "__main__":
    main()
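
The sample program reads its logging configuration from a logging.json file. As a rough illustration of what such a configuration can contain, the sketch below builds an equivalent dictionary directly in Python and passes it to logging.config.dictConfig. The contents are an assumption made for this example; the actual logging.json shipped with the scripts may differ.

```python
import logging
import logging.config

# Hypothetical configuration; the real logging.json may differ.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"}
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "simple",
        }
    },
    "loggers": {
        "epathermostat": {
            "level": "DEBUG",
            "handlers": ["console"],
            "propagate": False,
        }
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
logger = logging.getLogger("epathermostat")
logger.info("logging configured from a dictionary")
```

The same dictionary can be written to a file with json.dump and loaded again with json.load, which is the pattern the sample program uses.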

More information

For additional information on package usage, please see the API documentation. For additional information on the input and output data files, please see the Input data and Output data documentation.