Financial-Time-Series

Transformer and foundation models for Financial Time Series Forecasting (FTSF).

High-level overview

Pre-training an LLM and fine-tuning it on a custom dataset (e.g. the Financial Aid dataset) for downstream tasks.

Citation

This short paper was published at the IEEE International Workshop on Large Language Models for Finance.

@article{islam2024large,
  title={Large Language Models for Financial Aid in Financial Time-series Forecasting},
  author={Islam, Md Khairul and Karmacharya, Ayush and Sue, Timothy and Fox, Judy},
  journal={arXiv preprint arXiv:2410.19025},
  year={2024}
}

Dataset

Financial Aid by US States

Financial aid distributed by the government to each US state to support student education, collected for the years 2004 to 2020 from InformedStates.org. Details of the available features are in the following table. Aid is given based on financial need, academic merit, or both. The sub-categories are simplified, and each may describe multiple features.

Category	Sub-category	Description
	Identifier	State id and name abbreviation.
	Number	Total students receiving the award.
	Public/Private	Whether the funds can be used for public or private sectors and how long (2 or 4 years).
Need, Merit, both	Flags	0 or 1 based on whether the aid falls in a particular category.
	Program	Aid program with the most generous eligibility criteria.
	Notes	Related text.
	Threshold	GPA, SAT, income, and other academic or financial limits to qualify for the aid.
Time	Year	Fiscal or academic year.
Target	Amount	Aid amount received by the students.

Need amount aggregated at the state level

From 2004 to 2020 (17 years), in billions of US dollars. Access to historical datasets is limited to yearly intervals.

Currency Exchange Rate

Representative rates of the US dollar for the period August 01, 2014 to August 01, 2024, collected from the IMF rates database. These rates, normally quoted as currency units per U.S. dollar, are reported daily to the Fund by the issuing central bank. (The IMF does not maintain exchange rates on weekends and some holidays.) The collected data covers the following currencies:

Australian Dollar (AUD)
Canadian Dollar (CAD)
Chinese yuan (CNY)
Euro (EUR)
Indian rupee (INR)
Japanese yen (JPY)
U.K. pound (GBP)

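Converted to CSV using the following:

import pandas as pd

# Read the tab-separated IMF report and drop the two unnamed columns.
df = pd.read_csv('./data/Exchange_Rate_Report.tsv', sep='\t')
df = df.drop(columns=['Unnamed: 0', 'Unnamed: 9'])
# Fill missing values forward, then backward to cover any leading gaps.
# (fillna(method=...) is deprecated in recent pandas releases.)
df.ffill().bfill().to_csv('./data/Exchange_Rate_Report.csv', index=False)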
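To make the effect of the two fill steps concrete, here is a minimal sketch on a made-up frame. The column names, dates, and values are hypothetical and are not taken from the IMF report; only the `ffill` then `bfill` order mirrors the conversion above.

```python
import pandas as pd

# Hypothetical rates with gaps; these are illustrative values, not IMF data.
rates = pd.DataFrame(
    {
        "Euro": [None, 0.92, None, 0.93],
        "Japanese yen": [150.0, None, None, 151.0],
    },
    index=pd.to_datetime(["2024-07-29", "2024-07-30", "2024-07-31", "2024-08-01"]),
)

# Forward fill carries the last known rate into later gaps;
# backward fill then covers gaps at the very start of a column.
filled = rates.ffill().bfill()
print(filled)
```

Forward filling first means a gap takes the most recent earlier value; the backward fill only matters for rows before the first observation.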

Stock Market

Daily stock prices (Close, Open, High, Low) and volumes for each stock for up to 10 years, from the NASDAQ database.

Commodity

Models

Time Series models implemented using the Time Series Library

DLinear - Are Transformers Effective for Time Series Forecasting? [AAAI 2023]
iTransformer - iTransformer: Inverted Transformers Are Effective for Time Series Forecasting [ICLR 2024].
TimeMixer - TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting [ICLR 2024].
PatchTST - A Time Series is Worth 64 Words: Long-term Forecasting with Transformers [ICLR 2023].
TimesNet - TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis [ICLR 2023].

Time Series LLM models

GPT4TS - One Fits All (OFA): Power General Time Series Analysis by Pretrained LM (NeurIPS 2023 Spotlight)
CALF - CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning (Under review 2024)
TimeLLM - Time-LLM: Time Series Forecasting by Reprogramming Large Language Models (ICLR 2024)

Results

Which models are better as few-shot learners?

Few-shot learning performance with 10% of the training data. TimeLLM and PatchTST outperform the other models. The best and the second best results are shown in bold and underlined, respectively.
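As a rough illustration of what training on 10% of the data can mean, the sketch below splits a series chronologically and keeps only the earliest 10% of the training portion. The 70/10/20 split, the choice of the earliest portion, and the helper name `few_shot_split` are assumptions made for this example; the repository's data loaders may select the subset differently.

```python
import numpy as np

def few_shot_split(values, percent=10, train_pct=70, val_pct=10):
    """Chronological split that keeps only the first `percent`% of the training part.

    The split ratios are assumed for illustration, not taken from the repository.
    """
    n = len(values)
    train_end = n * train_pct // 100
    val_end = train_end + n * val_pct // 100
    keep = max(1, train_end * percent // 100)  # always keep at least one point
    return values[:keep], values[train_end:val_end], values[val_end:]

prices = np.arange(1000, dtype=float)  # placeholder series, not real stock data
train, val, test = few_shot_split(prices)
print(len(train), len(val), len(test))  # 70 100 200
```

Validation and test sets keep their full size, so the models are compared on the same evaluation data regardless of how little they were trained on.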

Can LLMs perform zero-shot learning in FTSF?

GPT4TS achieves the best zero-shot performance. The best and the second best results are shown in bold and underlined, respectively. The traditional models are excluded here since they are not pre-trained.

Reproduce

Install the required libraries using

pip install -r requirements.txt

Use the run.py script for the traditional models. The run_CALF, run_OFA, and run_TimeLLM scripts are for CALF, GPT4TS, and TimeLLM, respectively. Sample scripts are available in the scripts folder. Run those commands from the project root folder.

For example, to train the DLinear model on the Apple stock dataset, run the following from the project root. Apple.csv has 5 feature columns, and the whole experiment is repeated 3 times. You can test the trained model later with --test.

python run.py --n_features 5 --data_path Apple.csv --model DLinear --itrs 3

To run a pretrained LLM on the financial aid data, run the following.

python run_OFA.py \
    --n_features 1 --features S \
    --data_path Financial_Aid_State.csv --group_id GROUP_ID \
    --itrs 3 --d_model 768 \
    --model_id ori --patch_size 1 --stride 2 \
    --seq_len 10 --label_len 5 --freq a \
    --pred_len 1 --target need_amt
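The `--seq_len`, `--label_len`, and `--pred_len` values above determine how each series is cut into samples. The sketch below follows the sliding-window convention used by Time Series Library style data loaders, where the decoder window overlaps the last `label_len` input steps. It assumes this repository uses the same convention and, for simplicity, ignores the train/validation/test split; `make_windows` is a hypothetical helper, not part of the project.

```python
import numpy as np

def make_windows(values, seq_len=10, label_len=5, pred_len=1):
    """Cut one series into (input, decoder) windows.

    The decoder window holds the last `label_len` input steps
    followed by the `pred_len` steps to be predicted.
    """
    samples = []
    for start in range(len(values) - seq_len - pred_len + 1):
        enc_end = start + seq_len
        dec_start = enc_end - label_len
        dec_end = enc_end + pred_len
        samples.append((values[start:enc_end], values[dec_start:dec_end]))
    return samples

years = np.arange(2004, 2021)  # 17 yearly points, as in the financial aid data
windows = make_windows(years)
first_x, first_y = windows[0]
print(len(windows))             # 7
print(first_x[0], first_x[-1])  # 2004 2013
print(first_y[-1])              # 2014, the single step being predicted
```

With only 17 yearly points per state, a 10-step input leaves 7 windows per series before any split, which is why short settings such as `--pred_len 1` are used for this dataset.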
