SA-Timeseries

Temporal Saliency Analysis for Multi-Horizon Time Series Forecasting using Deep Learning

Interpreting a model's behavior is important for understanding its decision-making in practice. However, explaining complex time series forecasting models is challenging due to the temporal dependencies between subsequent time steps and the varying importance of input features over time. Many time series forecasting models use an input context with a look-back window for better prediction performance. However, existing studies (1) do not consider the temporal dependencies among the feature vectors in the input window and (2) treat the time dimension separately from the feature dimension when calculating importance scores. In this work, we propose a novel Windowed Temporal Saliency Analysis method to address these issues.
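
As a rough illustration of the idea (a simplified sketch, not the exact algorithm implemented in this repository), the snippet below scores each (time window, feature) slice of the look-back input by ablating it with a baseline value and measuring how much the multi-horizon forecast changes; the window size and zero baseline are arbitrary choices for illustration.

import torch

def windowed_saliency(model, x, window=4, baseline=0.0):
    # x: (batch, seq_len, n_features) look-back window.
    # Returns a (n_windows, n_features) matrix of importance scores, where each
    # score is the mean absolute change in the forecast after replacing that
    # window/feature slice with the baseline value.
    model.eval()
    with torch.no_grad():
        base_pred = model(x)
        n_windows = x.shape[1] // window
        scores = torch.zeros(n_windows, x.shape[2])
        for w in range(n_windows):
            for f in range(x.shape[2]):
                x_abl = x.clone()
                x_abl[:, w * window:(w + 1) * window, f] = baseline
                scores[w, f] = (model(x_abl) - base_pred).abs().mean()
    return scores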

Saliency Analysis

Saliency analysis studies the importance of input features to a model's output using black-box interpretation techniques. We use the following libraries to implement the saliency analysis methods.

Captum

Captum (“comprehension” in Latin) is an open-source library for model interpretability built on PyTorch.
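
As a quick example, a Captum attribution method can be applied directly to any PyTorch forecasting model; the tiny model and input sizes below are placeholders for illustration.

import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Placeholder forecaster: maps a (batch, seq_len, n_features) window to one value.
seq_len, n_features = 24, 3
model = nn.Sequential(nn.Flatten(), nn.Linear(seq_len * n_features, 1))

x = torch.randn(8, seq_len, n_features)

ig = IntegratedGradients(model)
# Attributions have the same shape as the input: (batch, seq_len, n_features).
attributions = ig.attribute(x, baselines=torch.zeros_like(x), target=0)
print(attributions.shape)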

Time Interpret (tint)

This package expands the Captum library with a specific focus on time-series. As such, it includes various interpretability methods specifically designed to handle time series data.
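
A hedged sketch of how a tint attribution method could be called: it mirrors the Captum interface, but the exact class names and arguments may vary across tint versions, and the GRU forecaster below is only a placeholder.

import torch
import torch.nn as nn
from tint.attr import TemporalIntegratedGradients

# Placeholder forecaster that accepts variable-length sequences.
class GRUForecaster(nn.Module):
    def __init__(self, n_features=3, hidden=16):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.head(out[:, -1])

model = GRUForecaster()
x = torch.randn(8, 24, 3)

# Temporal variant of Integrated Gradients: attributions keep the input shape.
explainer = TemporalIntegratedGradients(model)
attributions = explainer.attribute(x, target=0)
print(attributions.shape)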

Multi-Horizon Forecasting

Multi-horizon forecasting is the prediction of variables-of-interest at multiple future time steps. It is a crucial challenge in time series machine learning. Most real-world datasets have a time component, and forecasting the future can unlock great value. For example, retailers can use future sales to optimize their supply chain and promotions, investment managers are interested in forecasting the future prices of financial assets to maximize their performance, and healthcare institutions can use the number of future patient admissions to have sufficient personnel and equipment.
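
Concretely, a multi-horizon forecaster maps a look-back window of past observations to predictions at several future time steps at once; the sizes below are illustrative.

import torch
import torch.nn as nn

seq_len, pred_len, n_features = 96, 24, 7   # illustrative sizes

# Minimal multi-horizon forecaster: one linear map from the whole
# look-back window to all pred_len future values of the target.
model = nn.Sequential(nn.Flatten(), nn.Linear(seq_len * n_features, pred_len))

x = torch.randn(32, seq_len, n_features)    # past observations
y_hat = model(x)                            # forecasts for the next pred_len steps
print(y_hat.shape)                          # torch.Size([32, 24])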

We use the following library for implementing the time series models,

Time-Series-Library (TSlib)

TSlib is an open-source library for deep learning researchers, especially for deep time series analysis.

Interpretation Methods

The following local interpretation methods are currently supported (a brief usage sketch follows this list):

  1. Feature Ablation
  2. Occlusion
  3. Feature Permutation
  4. Augmented Occlusion
  5. Gradient Shap
  6. Integrated Gradients
  7. Deep Lift
  8. Lime
  9. WinIT
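
As a brief usage sketch, most of these methods share Captum's attribute interface. The example below applies Captum's Occlusion with a sliding window over the time dimension; the model, window shape, and baseline are illustrative choices, not this repository's exact configuration.

import torch
import torch.nn as nn
from captum.attr import Occlusion

# Placeholder forecaster: (batch, seq_len, n_features) -> one prediction.
seq_len, n_features = 48, 5
model = nn.Sequential(nn.Flatten(), nn.Linear(seq_len * n_features, 1))

x = torch.randn(4, seq_len, n_features)

occlusion = Occlusion(model)
# Occlude 6 consecutive time steps of one feature at a time and measure
# the change in the prediction; attributions keep the input shape.
attributions = occlusion.attribute(
    x,
    sliding_window_shapes=(6, 1),   # (time steps, features) per occlusion patch
    baselines=0.0,
    target=0,
)
print(attributions.shape)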

Time Series Models

This repository currently includes time series models collected from the Time-Series-Library.

Datasets

The datasets are available at this Google Drive in the long-term-forecast folder. Download and keep them in the dataset folder here. Only the MIMIC-III dataset is private; access must be requested and approved through PhysioNet.

Electricity

The electricity dataset [1] was collected in 15-minute intervals from 2011 to 2014. Following prior works, we aggregate it to 1-hour intervals and select the records from 2012 to 2014, since many zero values exist in 2011. The processed dataset contains the hourly electricity consumption of 321 clients. We use 'MT 321' as the target, and the train/val/test split is 12/2/2 months.
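
A hedged sketch of the 15-minute to hourly aggregation with pandas; the file name, column layout, and the choice of summing readings per hour are assumptions, not this repository's exact preprocessing script.

import pandas as pd

# Assumed layout: a CSV with a 'date' column followed by one column per client.
df = pd.read_csv("dataset/electricity.csv", parse_dates=["date"], index_col="date")

# Keep 2012-2014 (2011 contains many zero values) and aggregate the
# 15-minute readings into 1-hour intervals.
hourly = df.loc["2012":"2014"].resample("1h").sum()   # assumption: sum per hour
hourly.to_csv("dataset/electricity_hourly.csv")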

Traffic

This dataset [2] records the road occupancy rates from different sensors on San Francisco freeways.

Mimic-III

MIMIC-III is a multivariate clinical time series dataset with a range of vital signs and lab measurements recorded over time for around 40,000 patients at the Beth Israel Deaconess Medical Center in Boston, MA (Johnson et al., 2016) [3]. It is widely used in healthcare and medical AI research. Multiple tasks are associated with it, including mortality prediction, length-of-stay prediction, and phenotyping. We follow the pre-processing procedure described in Tonekaboni et al. (2020) [4] and use 8 vitals and 20 lab measurements, recorded hourly over a 48-hour period, to predict patient mortality. For more details, visit the source description.

This is a private dataset; refer to the official MIMIC-III documentation. The MIMIC ReadMe and data generation scripts are taken from the Dynamask repo. This repository followed the database setup instructions from the official site here.

How to Reproduce

The module was developed using Python 3.10.

Option 1. Use Singularity Container

Ensure you have Singularity installed. On Rivanna, you might need to load Singularity first with module load singularity. Pull the Singularity container from the remote library,

singularity pull timeseries.sif library://khairulislam/collection/timeseries:latest

## uncomment the following if you want to build it from scratch instead
# sudo singularity build timeseries.sif singularity.def

This saves the pulled container with the name timeseries.sif in the current directory. You can use the container to run the scripts. For example,

singularity run --nv timeseries.sif python run.py

Here the --nv option enables NVIDIA GPU support inside the container.

Option 2. Use Virtual Environment

First, create a virtual environment with the required libraries. For example, to create a venv named ml, you can use either Anaconda or your locally installed Python. An example using Anaconda,

conda create -n ml python=3.10
conda activate ml

This will activate the venv ml. To run code on your GPU, you need a CUDA-enabled setup. Check whether PyTorch can already detect your GPU,

import torch

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
print(f'Using {device} backend')

If this fails to detect your GPU, install a CUDA-enabled PyTorch build, for example,

pip install torch==1.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html

Then install the rest of the required libraries,

python3 -m pip install -r requirements.txt

References

  1. https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014. 

  2. https://pems.dot.ca.gov/. 

  3. Alistair EW Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 2016. 

  4. Sana Tonekaboni, Shalmali Joshi, Kieran Campbell, David K Duvenaud, and Anna Goldenberg. What went wrong and when? Instance-wise feature importance for time-series black-box models. In Neural Information Processing Systems, 2020.