Utilities have always had some interval meters that record energy usage at regular intervals (such as 5-minute, 15-minute or hourly). Loggers on individual equipment also collect interval data. With the advent of Advanced Metering Infrastructure (AMI) systems, interval data is becoming much more common.
Whether interval data is used for special studies or for regular billing, its volume is many times what it used to be. Utilities used to keep twelve readings of monthly usage data for each customer each year. Now, even with hourly data, that volume has increased to 8,760 pieces of data for each customer for each year. That kind of increase in the volume of collected data brings with it a potential increase in the volume of missing data. Since every hour counts, both for studies and for billing, it is becoming standard practice for interval data to go through Validation, Estimation and Editing (VEE) algorithms that identify and fill in missing data.
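As a rough illustration of the "identify and fill in" step, the sketch below walks an hourly window, flags hours with no reading, and estimates them by linear interpolation between the nearest valid neighbors. It is a minimal example under stated assumptions, not a description of any particular utility's VEE rules: the function name `fill_hourly_gaps`, the dictionary-of-timestamps input, and the choice of linear interpolation are all hypothetical.

```python
from datetime import datetime, timedelta


def fill_hourly_gaps(readings, start, end):
    """Return (filled, estimated) for every hour in [start, end).

    readings maps hourly datetimes to usage values (assumed layout).
    filled holds one value per hour; estimated lists the hours that were
    missing and had to be estimated.
    """
    step = timedelta(hours=1)
    hours = []
    t = start
    while t < end:
        hours.append(t)
        t += step

    # Positions of hours that have a real reading.
    known = [i for i, h in enumerate(hours) if h in readings]
    if not known:
        raise ValueError("no valid readings in the requested window")

    filled = {}
    estimated = []
    for i, h in enumerate(hours):
        if h in readings:
            filled[h] = readings[h]
            continue
        estimated.append(h)
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            # Interior gap: interpolate linearly between the neighbors.
            lo, hi = before[-1], after[0]
            frac = (i - lo) / (hi - lo)
            v_lo, v_hi = readings[hours[lo]], readings[hours[hi]]
            filled[h] = v_lo + frac * (v_hi - v_lo)
        else:
            # Gap at the edge of the window: carry the nearest value.
            nearest = before[-1] if before else after[0]
            filled[h] = readings[hours[nearest]]
    return filled, estimated


if __name__ == "__main__":
    start = datetime(2012, 1, 1, 0)
    sample = {start: 1.0, start + timedelta(hours=3): 4.0}
    filled, estimated = fill_hourly_gaps(sample, start, start + timedelta(hours=4))
    print(len(estimated), "hours estimated")
```

Keeping the list of estimated hours separate from the filled values lets later analysis or billing tell measured data from estimated data.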
Encountering missing data when you are dealing with very large datasets is nothing new, and Daniel has developed many algorithms for identifying, estimating and cleaning missing and suspect energy usage data over the years. To do this well requires comprehensive knowledge of how utility billing data is collected and stored, as well as an understanding of basic engineering principles related to energy consumption, mathematical skills to aid in identification of unusual patterns in the data and sophisticated programming skills to efficiently analyze large volumes of data. Fortunately, Daniel has all of these skills and he has had the opportunity to apply them to many different projects. Individual projects where Daniel verified, analyzed and/or cleaned very large energy usage datasets are listed on this page.
Real-time Display of Customer AMI Data. Daniel developed graphics for real-time Web display of customer interval data as it was being collected. He used C++, Informix and DB2 on a Linux server. (2012)
Long-term Analysis of Customer Billing Data and Hourly AMI Data. Daniel used SQL and mainframe systems to directly download billing data and premise profiles for all of Wisconsin Public Service Corporation's Residential and Agricultural electric and gas customers over the last twenty years. Data was cleaned and merged by customer across years to facilitate analysis. It was necessary to standardize changes in rate codes and billing systems that occurred over the historical period. Daniel also gathered telephone numbers and e-mail addresses to create population lists for general random sampling. After performing monthly billing analysis, the work was extended to hourly data, where Daniel identified and investigated very low use customers. (2011)
Analysis of SCADA Data. Daniel prepared, merged and cleaned five-minute SCADA data for use in impact evaluation of a Conservation Voltage Reduction program for PECO Energy. This required understanding substation schematics so a database could be built that correctly accounted for all power flows and relationships between the components and transducers at each substation in the study. (2010)
Investigation into Reliability of Texas AMI Meter Data. Daniel worked with a team to compare AMI to non-AMI customers in Texas, looking at usage patterns and geo-location to see if the presence of an AMI meter affected usage. Daniel cleaned and merged the meter data for the study. This was a very large dataset: 47 million records for Oncor and 28 million for CenterPoint. He created logic to determine the best control group matches based on energy use levels and patterns for a treatment group of 2,000 customers. He also compared error rates for different meter manufacturers and models. (2010)
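The matching logic used in the Texas study is not described on this page. Purely as an illustration of what matching on "energy use levels and patterns" can look like, the sketch below pairs each treatment customer with the closest unused control customer, using a distance that combines the relative difference in average usage (level) with the difference in mean-normalized load shape (pattern). The function names, the distance formula, the greedy one-to-one matching, and the sample profiles are all assumptions made for this example.

```python
import math


def profile_distance(a, b, shape_weight=1.0):
    """Distance between two equal-length usage profiles.

    Combines a relative gap in mean usage (level) with the RMS gap
    between the mean-normalized profiles (pattern).
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    level_gap = abs(mean_a - mean_b) / max(mean_a, mean_b, 1e-9)
    shape_a = [x / mean_a if mean_a else 0.0 for x in a]
    shape_b = [x / mean_b if mean_b else 0.0 for x in b]
    shape_gap = math.sqrt(
        sum((x - y) ** 2 for x, y in zip(shape_a, shape_b)) / len(a)
    )
    return level_gap + shape_weight * shape_gap


def match_controls(treatment, controls):
    """Greedily assign each treatment customer its nearest unused control.

    treatment and controls map customer IDs to usage profiles
    (hypothetical layout). Returns {treatment_id: control_id}.
    """
    available = dict(controls)
    matches = {}
    for tid, tprofile in treatment.items():
        if not available:
            break  # more treatment customers than controls
        best = min(
            available,
            key=lambda cid: profile_distance(tprofile, available[cid]),
        )
        matches[tid] = best
        del available[best]  # match without replacement
    return matches


if __name__ == "__main__":
    treatment = {"T1": [1.0, 2.0, 3.0], "T2": [10.0, 10.0, 10.0]}
    controls = {
        "C1": [10.0, 11.0, 10.0],
        "C2": [1.0, 2.0, 3.1],
        "C3": [5.0, 5.0, 5.0],
    }
    print(match_controls(treatment, controls))
```

Greedy matching is simple and fast, but the result depends on the order in which treatment customers are processed. An optimal assignment method would avoid that at a higher computational cost.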
Power Flow Analysis for Los Angeles. Daniel analyzed generation, transmission and load data for a California Energy Commission study of the Los Angeles basin. He used probit and other regression models to analyze energy imports and exports to the region. Results were shown in three-dimensional graphics. (2009)
Efficient Logger Data Handling. Daniel wrote code to automatically find all zipped and regular logger files in multiple subdirectories, unpack them, and build a single dataset from them. This allowed quick, error-free compilation of individual logger files without manually moving or unzipping any files. This effort is often combined with data preparation and cleaning. It was used to save time and improve data quality in the following projects:
Pepco Holdings Direct Load Control Study (2010 to 2011)
KCPL Direct Load Control Study (2010)
Progress Energy-Carolinas Energy Efficiency Benchmarking (2010)
University of California Lighting Logger Study (2009)
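The logger-file compilation described above can be sketched roughly as follows. This is not the code used on those projects. It is a minimal illustration that assumes the logger files are CSV files with a header row, stored either loose or inside ZIP archives anywhere under one root folder. The function names and the added `source_file` column are hypothetical.

```python
import csv
import io
import zipfile
from pathlib import Path


def read_csv_rows(text_stream, source):
    """Yield each CSV row as a dict, tagged with the file it came from."""
    for row in csv.DictReader(text_stream):
        row["source_file"] = source
        yield row


def compile_logger_files(root):
    """Collect rows from every .csv file, loose or zipped, under root."""
    rows = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix == ".csv":
            with path.open(newline="") as f:
                rows.extend(read_csv_rows(f, str(path)))
        elif suffix == ".zip":
            # Read archive members in place; nothing is unpacked to disk.
            with zipfile.ZipFile(path) as archive:
                for name in sorted(archive.namelist()):
                    if name.lower().endswith(".csv"):
                        with archive.open(name) as raw:
                            text = io.TextIOWrapper(
                                raw, encoding="utf-8", newline=""
                            )
                            rows.extend(
                                read_csv_rows(text, f"{path}!{name}")
                            )
    return rows
```

Tagging each row with its source file makes it possible to trace a suspect reading back to the individual logger file during later cleaning.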