Transformers and foundation models for Financial Time Series Forecasting (FTSF): pre-training an LLM and fine-tuning it on a custom dataset (e.g. the Financial Aid dataset) for downstream tasks.
This short paper is published in the IEEE International Workshop on Large Language Models for Finance.
@article{islam2024large,
title={Large Language Models for Financial Aid in Financial Time-series Forecasting},
author={Islam, Md Khairul and Karmacharya, Ayush and Sue, Timothy and Fox, Judy},
journal={arXiv preprint arXiv:2410.19025},
year={2024}
}
Financial aid distributed to each US state by the government to support student education, collected for the years 2004 to 2020 from InformedStates.org. Details of the available features are in the following table. Aid is given based on financial need, academic merit, or both. The sub-categories are simplified, and each may describe multiple features.
Category | Sub-category | Description |
---|---|---|
Identifier | State id and name abbreviation. | |
Number | Total students receiving the award. | |
Public/Private | Whether the funds can be used for public or private sectors and how long (2 or 4 years). | |
Need, Merit, both | Flags | 0 or 1 based on whether the aid falls in a particular category. |
Program | Aid program with the most generous eligibility criteria. | |
Notes | Related text. | |
Threshold | GPA, SAT, income, and other academic or financial limits to qualify for the aid. | |
Time | Year | Fiscal or academic year. |
Target | Amount | Aid amount received by the students. |
From 2004 to 2020 (17 years), in billions of US dollars. Access to historical datasets is limited to yearly intervals.
Representative rates of the US dollar for the period August 1, 2014 to August 1, 2024, collected from the IMF rates database.
These rates, normally quoted as currency units per U.S. dollar, are reported daily to the Fund by the issuing central bank. (The IMF does not maintain exchange rates on weekends and some holidays.) The collected data covers the following currencies:
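Converted to CSV using the following:
import pandas as pd

df = pd.read_csv('./data/Exchange_Rate_Report.tsv', sep='\t')
# Drop the two unnamed columns.
df.drop(['Unnamed: 0', 'Unnamed: 9'], axis=1, inplace=True)
# Forward-fill days without a reported rate, then backward-fill any leading gaps.
# fillna(method=...) is deprecated in recent pandas; ffill()/bfill() are equivalent.
df.ffill().bfill().to_csv(
'./data/Exchange_Rate_Report.csv',
sep=',', index=False
)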
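The forward-then-backward fill used in the conversion above matters because the IMF reports no rate on weekends and some holidays. The following is a minimal plain-Python sketch of that fill logic, not the repository's code: `ffill_bfill` is a hypothetical helper and the rate values are made up for illustration.

```python
from typing import List, Optional


def ffill_bfill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Forward-fill gaps, then backward-fill any gaps left at the start."""
    filled = list(values)

    # Forward pass: carry the last observed value into the gaps that follow it.
    last = None
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = last
        else:
            last = v

    # Backward pass: only leading gaps can remain; use the first observed value.
    nxt = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


# Made-up daily rates with missing days (None) at the start, middle, and end.
rates = [None, 1.10, None, None, 1.12, None]
print(ffill_bfill(rates))
```

Forward-filling first means a missing day takes the most recent published rate, so no future information leaks into the past; the backward fill only affects rows before the first observation.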
Daily stock prices (Close, Open, High, Low) and volumes for each stock for up to 10 years from the NASDAQ database.
Time Series models implemented using the Time Series Library
Time Series LLM models
Few-shot learning performance with 10% training data. TimeLLM and PatchTST outperform the other models. The best and the second best results are in bold and underlined.
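One common way to build a 10% few-shot setting is to keep only the earliest 10% of the training samples in chronological order and leave the validation and test splits untouched. The text here does not say how the subset is chosen, so the sketch below is an assumption, and `few_shot_subset` is a hypothetical helper rather than a function from this repository.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def few_shot_subset(train: Sequence[T], fraction: float = 0.10) -> List[T]:
    """Keep the earliest `fraction` of a chronologically ordered training set."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not train:
        return []
    # Keep at least one sample so that very short series remain usable.
    keep = max(1, math.floor(len(train) * fraction))
    return list(train[:keep])


# Stand-in for 200 chronologically ordered training windows.
train_windows = list(range(200))
subset = few_shot_subset(train_windows)
print(len(subset))
```

Taking a contiguous prefix rather than a random sample preserves temporal order, which avoids training on observations that come after the ones being evaluated.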
GPT4TS performs best in the zero-shot setting. The best and the second best results are in bold and underlined. The traditional models are excluded here since they are not pre-trained.
Install the required libraries using
pip install -r requirements.txt
Use the `run.py` script for the traditional models. The `run_CALF`, `run_OFA` and `run_TimeLLM` scripts are for `CALF`, `GPT4TS` and `TimeLLM`, respectively. The sample scripts are available in the `scripts` folder.
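The model-to-script mapping described above can be summarized as a small lookup. This is only an illustrative sketch: `RUNNERS` and `runner_for` are hypothetical names, the script names are copied as written above, and command-line arguments are deliberately left out because they are not documented here.

```python
# Entry-point scripts for the LLM-based models, as named in the text above.
RUNNERS = {
    "CALF": "run_CALF",
    "GPT4TS": "run_OFA",  # GPT4TS is launched through the OFA script
    "TimeLLM": "run_TimeLLM",
}


def runner_for(model: str) -> str:
    """Return the entry-point script for a model name.

    Any name not listed in RUNNERS is treated as a traditional model
    and falls back to run.py.
    """
    return RUNNERS.get(model, "run.py")


for name in ("CALF", "GPT4TS", "TimeLLM", "PatchTST"):
    print(name, "->", runner_for(name))
```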