Experiments#
Having introduced our principal TFT model, we describe our experimental design in this chapter.
Data Splits#
Unless otherwise mentioned, we use the Primary Split to conduct experiments. The validation set comprises the 15 days immediately following the training period, and the test set comprises the 15 days following the validation set.
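A minimal sketch of how such a date-based split can be produced with pandas is shown below; the DataFrame `df`, its `date` column, and the training cut-off are assumptions for illustration, not the exact preprocessing code.

```python
import pandas as pd

# `df` is an assumed daily DataFrame with a "date" column; the cut-off is illustrative
train_end = df["date"].max() - pd.Timedelta(days=30)   # reserve 30 days for validation + test
val_end = train_end + pd.Timedelta(days=15)            # next 15 days: validation
test_end = val_end + pd.Timedelta(days=15)             # following 15 days: test

train = df[df["date"] <= train_end]
val = df[(df["date"] > train_end) & (df["date"] <= val_end)]
test = df[(df["date"] > val_end) & (df["date"] <= test_end)]
```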
Hyperparameter Tuning#
We tune a total of five models: our proposed TFT model, LSTM, Bi-LSTM, NBEATS, and NHiTS, with the last four serving as baselines for comparison against our TFT model.
We use the PyTorch implementation of our TFT model and tune the hyperparameters of each model with Optuna over 25 trial runs, selecting the best configuration based on validation loss. All models are optimized with the Adam optimizer and MSE loss, and all comparisons use the best configuration found for each model.
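A minimal sketch of this tuning loop with Optuna is shown below; `load_primary_split`, `build_model`, `train_and_evaluate`, and the search-space ranges are hypothetical placeholders standing in for each model's construction and Adam/MSE training routine, not our exact implementation.

```python
import optuna

train_set, val_set = load_primary_split()   # hypothetical loader for the Primary Split

def objective(trial):
    # Hypothetical search space; the actual ranges differ per model
    params = {
        "hidden_size": trial.suggest_int("hidden_size", 16, 128),
        "dropout": trial.suggest_float("dropout", 0.1, 0.5),
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True),
    }
    model = build_model(**params)   # hypothetical factory: TFT, LSTM, Bi-LSTM, NBEATS, or NHiTS
    # hypothetical helper: trains with Adam + MSE loss and returns the validation loss
    return train_and_evaluate(model, train_set, val_set)

study = optuna.create_study(direction="minimize")   # select by validation loss
study.optimize(objective, n_trials=25)              # 25 trial runs per model
best_config = study.best_params
```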
Evaluation Metrics and Comparison#
We evaluate with five metrics: MAE, RMSE, RMSLE, SMAPE, and NNSE. Each metric emphasizes a different aspect of prediction error, so together they give a holistic view of model performance.
| Metric | Usage | Lower is Better |
|---|---|---|
| MAE | penalizes errors uniformly, irrespective of their magnitude | Yes |
| RMSE | penalizes larger errors (outliers) more heavily | Yes |
| SMAPE | measures the proportional (relative) error | Yes |
| RMSLE | useful when the error distribution is skewed, due to the logarithmic transform | Yes |
| NNSE | robust to error variance and bounded in [0, 1] | No |
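For reference, the sketch below gives one common NumPy formulation of each metric; exact conventions vary (e.g., SMAPE is sometimes reported as a percentage), so treat it as illustrative rather than the precise definitions used in our evaluation code.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def rmsle(y_true, y_pred):
    # assumes non-negative values; log1p handles exact zeros
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

def smape(y_true, y_pred):
    # fractional form with range [0, 2]; some references scale it to a percentage
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    return np.mean(np.abs(y_pred - y_true) / denom)

def nnse(y_true, y_pred):
    # normalized Nash-Sutcliffe efficiency, bounded in (0, 1]; higher is better
    nse = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 / (2 - nse)
```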
Our TFT model outperforms all other models on all five evaluation metrics.
| Model | MAE | RMSE | RMSLE | SMAPE | NNSE |
|---|---|---|---|---|---|
| TFT | 35.68 | 221.3 | 1.347 | 0.842 | 0.679 |
| LSTM | 40.27 | 267.1 | 1.434 | 1.054 | 0.616 |
| Bi-LSTM | 40.36 | 261.8 | 1.465 | 1.022 | 0.626 |
| NHiTS | 36.79 | 247.5 | 1.366 | 1.066 | 0.628 |
| NBEATS | 41.22 | 244.8 | 1.649 | 1.134 | 0.633 |
Our line graph comparing prediction performance across models further reinforces the conclusions drawn from the evaluation metrics.