initial commit

Ming Jin 2024-01-29 15:53:06 +11:00
parent cf621c5222
commit f0ae0b0e9b
39 changed files with 4176 additions and 2 deletions

LEGAL.md Normal file

@ -0,0 +1,7 @@
Legal Disclaimer
Within this source code, the comments in Chinese shall be the original, governing version. Any comments in other languages are for reference only. In the event of any conflict between the Chinese-language comments and comments in other languages, the Chinese-language version shall prevail.
法律免责声明
关于代码注释部分,中文注释为官方版本,其它语言注释仅做参考。中文注释可能与其它语言注释存在不一致,当中文注释与其它语言注释存在不一致时,请以中文注释为准。

README.md

@ -1,2 +1,120 @@
# Time-LLM
Official implementation of "Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
<div align="center">
<!-- <h1><b> Time-LLM </b></h1> -->
<!-- <h2><b> Time-LLM </b></h2> -->
<h2><b> Time-LLM: Time Series Forecasting by Reprogramming Large Language Models </b></h2>
</div>
<div align="center">
![](https://img.shields.io/github/last-commit/KimMeen/Time-LLM?color=green)
![](https://img.shields.io/github/stars/KimMeen/Time-LLM?color=yellow)
![](https://img.shields.io/github/forks/KimMeen/Time-LLM?color=lightblue)
![](https://img.shields.io/badge/PRs-Welcome-green)
</div>
<div align="center">
**[<a href="https://arxiv.org/abs/2310.01728">Paper Page</a>]**
**[<a href="https://mp.weixin.qq.com/s/FSxUdvPI713J2LiHnNaFCw">中文解读1</a>]**
**[<a href="https://mp.weixin.qq.com/s/nUiQGnHOkWznoBPqM0KHXg">中文解读2</a>]**
</div>
<p align="center">
<img src="./figures/logo.png" width="60">
</p>
---
>
> 🙋 Please let us know if you find a mistake or have any suggestions!
>
> 🌟 If you find this resource helpful, please consider starring this repository and citing our research:
```
@inproceedings{jin2023time,
title={Time-llm: Time series forecasting by reprogramming large language models},
author={Jin, Ming and Wang, Shiyu and Ma, Lintao and Chu, Zhixuan and Zhang, James Y and Shi, Xiaoming and Chen, Pin-Yu and Liang, Yuxuan and Li, Yuan-Fang and Pan, Shirui and others},
booktitle={International Conference on Learning Representations},
year={2024}
}
```
## Introduction
Time-LLM is a reprogramming framework that repurposes LLMs for general time series forecasting while keeping the backbone language models intact.
Notably, we show that time series analysis (e.g., forecasting) can be cast as yet another "language task" that can be effectively tackled by an off-the-shelf LLM.
<p align="center">
<img src="./figures/framework.png" height = "360" alt="" align=center />
</p>
- Time-LLM comprises two key components: (1) reprogramming the input time series into text prototype representations that are more natural for the LLM, and (2) augmenting the input context with declarative prompts (e.g., domain expert knowledge and task instructions) to guide LLM reasoning.
<p align="center">
<img src="./figures/method-detailed-illustration.png" height = "190" alt="" align=center />
</p>
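For intuition, here is a minimal, self-contained sketch of these two components. The names, shapes, and values below are illustrative only and are not this repository's API:

```python
import torch
import torch.nn as nn

# (1) Reprogramming (illustrative): cross-attend time-series patch embeddings
# onto a small set of text prototypes condensed from the LLM's word embeddings.
B, N, d_model, n_proto = 4, 16, 32, 8
patch_emb = torch.randn(B, N, d_model)             # embedded input patches
prototypes = torch.randn(n_proto, d_model)         # condensed word embeddings

cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
proto = prototypes.unsqueeze(0).expand(B, -1, -1)  # [B, n_proto, d_model]
reprogrammed, _ = cross_attn(patch_emb, proto, proto)  # [B, N, d_model]

# (2) Prompt-as-prefix (illustrative): prepend domain knowledge, task
# instructions, and input statistics as natural-language context.
prompt = ("The dataset records the oil temperature of electricity "
          "transformers; forecast the next 96 steps given the past 512.")
```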
## Requirements
- accelerate==0.20.3
- einops==0.7.0
- matplotlib==3.7.0
- numpy==1.23.5
- pandas==1.5.3
- scikit_learn==1.2.2
- scipy==1.5.4
- torch==2.0.1
- tqdm==4.65.0
- peft==0.4.0
- transformers==4.31.0
- deepspeed==0.13.0
To install all dependencies:
```
pip install -r requirements.txt
```
## Datasets
You can access the well-preprocessed datasets from [[Google Drive]](https://drive.google.com/file/d/1NF7VEefXCmXuWNbnNe858WvQAkJ_7wuP/view?usp=sharing); then place the downloaded contents under `./dataset`.
## Quick Demos
1. Download the datasets and place them under `./dataset`.
2. Tune the model. We provide five experiment scripts for demonstration purposes under the folder `./scripts`. For example, you can evaluate on the ETT datasets by running:
```bash
bash ./scripts/TimeLLM_ETTh1.sh
bash ./scripts/TimeLLM_ETTh2.sh
bash ./scripts/TimeLLM_ETTm1.sh
bash ./scripts/TimeLLM_ETTm2.sh
```
## Detailed Usage
Please refer to `run_main.py` and `run_m4.py` for a detailed description of each hyperparameter.
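For orientation, a hypothetical invocation is sketched below. The flag names mirror the data-provider arguments added in this commit, and the paths and values are illustrative; the scripts under `./scripts` remain the authoritative reference:
```bash
python run_main.py \
  --data ETTh1 \
  --root_path ./dataset/ETT-small/ \
  --data_path ETTh1.csv \
  --seq_len 512 --label_len 48 --pred_len 96 \
  --batch_size 16
```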
## Further Reading
[**Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook**](https://arxiv.org/abs/2310.10196)
**Authors**: Ming Jin, Qingsong Wen*, Yuxuan Liang, Chaoli Zhang, Siqiao Xue, Xue Wang, James Zhang, Yi Wang, Haifeng Chen, Xiaoli Li (IEEE Fellow), Shirui Pan*, Vincent S. Tseng (IEEE Fellow), Yu Zheng (IEEE Fellow), Lei Chen (IEEE Fellow), Hui Xiong (IEEE Fellow)
🌟 If you find this resource helpful, please consider citing it in your research:
```
@article{jin2023lm4ts,
title={Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook},
author={Ming Jin and Qingsong Wen and Yuxuan Liang and Chaoli Zhang and Siqiao Xue and Xue Wang and James Zhang and Yi Wang and Haifeng Chen and Xiaoli Li and Shirui Pan and Vincent S. Tseng and Yu Zheng and Lei Chen and Hui Xiong},
journal={arXiv preprint arXiv:2310.10196},
year={2023}
}
```
## Acknowledgement
Our implementation adapts [Time-Series-Library](https://github.com/thuml/Time-Series-Library) and [GPT4TS](https://github.com/DAMO-DI-ML/NeurIPS2023-One-Fits-All) as the code base, and we have extensively modified them for our purposes. We thank the authors for sharing their implementations and related resources.

data_provider/data_factory.py Normal file

@ -0,0 +1,62 @@
from data_provider.data_loader import Dataset_ETT_hour, Dataset_ETT_minute, Dataset_Custom, Dataset_M4
from torch.utils.data import DataLoader
data_dict = {
'ETTh1': Dataset_ETT_hour,
'ETTh2': Dataset_ETT_hour,
'ETTm1': Dataset_ETT_minute,
'ETTm2': Dataset_ETT_minute,
'custom': Dataset_Custom,
'm4': Dataset_M4,
}
def data_provider(args, flag):
Data = data_dict[args.data]
timeenc = 0 if args.embed != 'timeF' else 1
percent = args.percent
    # Keep temporal order for evaluation; shuffle training windows.
    shuffle_flag = flag != 'test'
    drop_last = True
    batch_size = args.batch_size
    freq = args.freq
    if args.data == 'm4':
        drop_last = False  # M4 series are ragged; keep every sample
data_set = Data(
root_path=args.root_path,
data_path=args.data_path,
flag=flag,
size=[args.seq_len, args.label_len, args.pred_len],
features=args.features,
target=args.target,
timeenc=timeenc,
freq=freq,
seasonal_patterns=args.seasonal_patterns
)
else:
data_set = Data(
root_path=args.root_path,
data_path=args.data_path,
flag=flag,
size=[args.seq_len, args.label_len, args.pred_len],
features=args.features,
target=args.target,
timeenc=timeenc,
freq=freq,
percent=percent,
seasonal_patterns=args.seasonal_patterns
)
data_loader = DataLoader(
data_set,
batch_size=batch_size,
shuffle=shuffle_flag,
num_workers=args.num_workers,
drop_last=drop_last)
return data_set, data_loader
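For reference, a minimal sketch of driving `data_provider` directly. The `Namespace` fields below are exactly the attributes read above; the values and paths are illustrative:

```python
from argparse import Namespace

from data_provider.data_factory import data_provider

args = Namespace(
    data='ETTh1', root_path='./dataset/ETT-small/', data_path='ETTh1.csv',
    features='M', target='OT', embed='timeF', freq='h',
    seq_len=512, label_len=48, pred_len=96,
    batch_size=16, num_workers=4, percent=100, seasonal_patterns=None,
)
train_set, train_loader = data_provider(args, flag='train')
seq_x, seq_y, seq_x_mark, seq_y_mark = next(iter(train_loader))
print(seq_x.shape)  # [16, 512, 1]: channel-independent univariate windows
```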

data_provider/data_loader.py Normal file

@ -0,0 +1,389 @@
import os
import numpy as np
import pandas as pd
from torch.utils.data import Dataset
from sklearn.preprocessing import StandardScaler
from utils.timefeatures import time_features
from data_provider.m4 import M4Dataset, M4Meta
import warnings
warnings.filterwarnings('ignore')
class Dataset_ETT_hour(Dataset):
def __init__(self, root_path, flag='train', size=None,
features='S', data_path='ETTh1.csv',
target='OT', scale=True, timeenc=0, freq='h', percent=100,
seasonal_patterns=None):
        if size is None:
self.seq_len = 24 * 4 * 4
self.label_len = 24 * 4
self.pred_len = 24 * 4
else:
self.seq_len = size[0]
self.label_len = size[1]
self.pred_len = size[2]
# init
assert flag in ['train', 'test', 'val']
type_map = {'train': 0, 'val': 1, 'test': 2}
self.set_type = type_map[flag]
self.percent = percent
self.features = features
self.target = target
self.scale = scale
self.timeenc = timeenc
self.freq = freq
# self.percent = percent
self.root_path = root_path
self.data_path = data_path
self.__read_data__()
self.enc_in = self.data_x.shape[-1]
self.tot_len = len(self.data_x) - self.seq_len - self.pred_len + 1
def __read_data__(self):
self.scaler = StandardScaler()
df_raw = pd.read_csv(os.path.join(self.root_path,
self.data_path))
        # Fixed ETT split: first 12 months train, next 4 val, next 4 test (hourly steps)
        border1s = [0, 12 * 30 * 24 - self.seq_len, 12 * 30 * 24 + 4 * 30 * 24 - self.seq_len]
        border2s = [12 * 30 * 24, 12 * 30 * 24 + 4 * 30 * 24, 12 * 30 * 24 + 8 * 30 * 24]
border1 = border1s[self.set_type]
border2 = border2s[self.set_type]
if self.set_type == 0:
border2 = (border2 - self.seq_len) * self.percent // 100 + self.seq_len
if self.features == 'M' or self.features == 'MS':
cols_data = df_raw.columns[1:]
df_data = df_raw[cols_data]
elif self.features == 'S':
df_data = df_raw[[self.target]]
if self.scale:
train_data = df_data[border1s[0]:border2s[0]]
self.scaler.fit(train_data.values)
data = self.scaler.transform(df_data.values)
else:
data = df_data.values
df_stamp = df_raw[['date']][border1:border2]
df_stamp['date'] = pd.to_datetime(df_stamp.date)
        if self.timeenc == 0:
            df_stamp['month'] = df_stamp.date.apply(lambda row: row.month)
            df_stamp['day'] = df_stamp.date.apply(lambda row: row.day)
            df_stamp['weekday'] = df_stamp.date.apply(lambda row: row.weekday())
            df_stamp['hour'] = df_stamp.date.apply(lambda row: row.hour)
            data_stamp = df_stamp.drop(['date'], axis=1).values
elif self.timeenc == 1:
data_stamp = time_features(pd.to_datetime(df_stamp['date'].values), freq=self.freq)
data_stamp = data_stamp.transpose(1, 0)
self.data_x = data[border1:border2]
self.data_y = data[border1:border2]
self.data_stamp = data_stamp
def __getitem__(self, index):
feat_id = index // self.tot_len
s_begin = index % self.tot_len
s_end = s_begin + self.seq_len
r_begin = s_end - self.label_len
r_end = r_begin + self.label_len + self.pred_len
seq_x = self.data_x[s_begin:s_end, feat_id:feat_id + 1]
seq_y = self.data_y[r_begin:r_end, feat_id:feat_id + 1]
seq_x_mark = self.data_stamp[s_begin:s_end]
seq_y_mark = self.data_stamp[r_begin:r_end]
return seq_x, seq_y, seq_x_mark, seq_y_mark
def __len__(self):
return (len(self.data_x) - self.seq_len - self.pred_len + 1) * self.enc_in
def inverse_transform(self, data):
return self.scaler.inverse_transform(data)
class Dataset_ETT_minute(Dataset):
def __init__(self, root_path, flag='train', size=None,
features='S', data_path='ETTm1.csv',
target='OT', scale=True, timeenc=0, freq='t', percent=100,
seasonal_patterns=None):
        if size is None:
self.seq_len = 24 * 4 * 4
self.label_len = 24 * 4
self.pred_len = 24 * 4
else:
self.seq_len = size[0]
self.label_len = size[1]
self.pred_len = size[2]
# init
assert flag in ['train', 'test', 'val']
type_map = {'train': 0, 'val': 1, 'test': 2}
self.set_type = type_map[flag]
self.percent = percent
self.features = features
self.target = target
self.scale = scale
self.timeenc = timeenc
self.freq = freq
self.root_path = root_path
self.data_path = data_path
self.__read_data__()
self.enc_in = self.data_x.shape[-1]
self.tot_len = len(self.data_x) - self.seq_len - self.pred_len + 1
def __read_data__(self):
self.scaler = StandardScaler()
df_raw = pd.read_csv(os.path.join(self.root_path,
self.data_path))
        # Same 12/4/4-month split at 15-minute granularity (4 steps per hour)
        border1s = [0, 12 * 30 * 24 * 4 - self.seq_len, 12 * 30 * 24 * 4 + 4 * 30 * 24 * 4 - self.seq_len]
        border2s = [12 * 30 * 24 * 4, 12 * 30 * 24 * 4 + 4 * 30 * 24 * 4, 12 * 30 * 24 * 4 + 8 * 30 * 24 * 4]
border1 = border1s[self.set_type]
border2 = border2s[self.set_type]
if self.set_type == 0:
border2 = (border2 - self.seq_len) * self.percent // 100 + self.seq_len
if self.features == 'M' or self.features == 'MS':
cols_data = df_raw.columns[1:]
df_data = df_raw[cols_data]
elif self.features == 'S':
df_data = df_raw[[self.target]]
if self.scale:
train_data = df_data[border1s[0]:border2s[0]]
self.scaler.fit(train_data.values)
data = self.scaler.transform(df_data.values)
else:
data = df_data.values
df_stamp = df_raw[['date']][border1:border2]
df_stamp['date'] = pd.to_datetime(df_stamp.date)
        if self.timeenc == 0:
            df_stamp['month'] = df_stamp.date.apply(lambda row: row.month)
            df_stamp['day'] = df_stamp.date.apply(lambda row: row.day)
            df_stamp['weekday'] = df_stamp.date.apply(lambda row: row.weekday())
            df_stamp['hour'] = df_stamp.date.apply(lambda row: row.hour)
            df_stamp['minute'] = df_stamp.date.apply(lambda row: row.minute)
            df_stamp['minute'] = df_stamp.minute.map(lambda x: x // 15)  # bucket into 15-min slots
            data_stamp = df_stamp.drop(['date'], axis=1).values
elif self.timeenc == 1:
data_stamp = time_features(pd.to_datetime(df_stamp['date'].values), freq=self.freq)
data_stamp = data_stamp.transpose(1, 0)
self.data_x = data[border1:border2]
self.data_y = data[border1:border2]
self.data_stamp = data_stamp
def __getitem__(self, index):
feat_id = index // self.tot_len
s_begin = index % self.tot_len
s_end = s_begin + self.seq_len
r_begin = s_end - self.label_len
r_end = r_begin + self.label_len + self.pred_len
seq_x = self.data_x[s_begin:s_end, feat_id:feat_id + 1]
seq_y = self.data_y[r_begin:r_end, feat_id:feat_id + 1]
seq_x_mark = self.data_stamp[s_begin:s_end]
seq_y_mark = self.data_stamp[r_begin:r_end]
return seq_x, seq_y, seq_x_mark, seq_y_mark
def __len__(self):
return (len(self.data_x) - self.seq_len - self.pred_len + 1) * self.enc_in
def inverse_transform(self, data):
return self.scaler.inverse_transform(data)
class Dataset_Custom(Dataset):
def __init__(self, root_path, flag='train', size=None,
features='S', data_path='ETTh1.csv',
target='OT', scale=True, timeenc=0, freq='h', percent=100,
seasonal_patterns=None):
        if size is None:
self.seq_len = 24 * 4 * 4
self.label_len = 24 * 4
self.pred_len = 24 * 4
else:
self.seq_len = size[0]
self.label_len = size[1]
self.pred_len = size[2]
# init
assert flag in ['train', 'test', 'val']
type_map = {'train': 0, 'val': 1, 'test': 2}
self.set_type = type_map[flag]
self.features = features
self.target = target
self.scale = scale
self.timeenc = timeenc
self.freq = freq
self.percent = percent
self.root_path = root_path
self.data_path = data_path
self.__read_data__()
self.enc_in = self.data_x.shape[-1]
self.tot_len = len(self.data_x) - self.seq_len - self.pred_len + 1
def __read_data__(self):
self.scaler = StandardScaler()
df_raw = pd.read_csv(os.path.join(self.root_path,
self.data_path))
'''
df_raw.columns: ['date', ...(other features), target feature]
'''
cols = list(df_raw.columns)
cols.remove(self.target)
cols.remove('date')
df_raw = df_raw[['date'] + cols + [self.target]]
num_train = int(len(df_raw) * 0.7)
num_test = int(len(df_raw) * 0.2)
num_vali = len(df_raw) - num_train - num_test
border1s = [0, num_train - self.seq_len, len(df_raw) - num_test - self.seq_len]
border2s = [num_train, num_train + num_vali, len(df_raw)]
border1 = border1s[self.set_type]
border2 = border2s[self.set_type]
if self.set_type == 0:
border2 = (border2 - self.seq_len) * self.percent // 100 + self.seq_len
if self.features == 'M' or self.features == 'MS':
cols_data = df_raw.columns[1:]
df_data = df_raw[cols_data]
elif self.features == 'S':
df_data = df_raw[[self.target]]
if self.scale:
train_data = df_data[border1s[0]:border2s[0]]
self.scaler.fit(train_data.values)
data = self.scaler.transform(df_data.values)
else:
data = df_data.values
df_stamp = df_raw[['date']][border1:border2]
df_stamp['date'] = pd.to_datetime(df_stamp.date)
        if self.timeenc == 0:
            df_stamp['month'] = df_stamp.date.apply(lambda row: row.month)
            df_stamp['day'] = df_stamp.date.apply(lambda row: row.day)
            df_stamp['weekday'] = df_stamp.date.apply(lambda row: row.weekday())
            df_stamp['hour'] = df_stamp.date.apply(lambda row: row.hour)
            data_stamp = df_stamp.drop(['date'], axis=1).values
elif self.timeenc == 1:
data_stamp = time_features(pd.to_datetime(df_stamp['date'].values), freq=self.freq)
data_stamp = data_stamp.transpose(1, 0)
self.data_x = data[border1:border2]
self.data_y = data[border1:border2]
self.data_stamp = data_stamp
def __getitem__(self, index):
feat_id = index // self.tot_len
s_begin = index % self.tot_len
s_end = s_begin + self.seq_len
r_begin = s_end - self.label_len
r_end = r_begin + self.label_len + self.pred_len
seq_x = self.data_x[s_begin:s_end, feat_id:feat_id + 1]
seq_y = self.data_y[r_begin:r_end, feat_id:feat_id + 1]
seq_x_mark = self.data_stamp[s_begin:s_end]
seq_y_mark = self.data_stamp[r_begin:r_end]
return seq_x, seq_y, seq_x_mark, seq_y_mark
def __len__(self):
return (len(self.data_x) - self.seq_len - self.pred_len + 1) * self.enc_in
def inverse_transform(self, data):
return self.scaler.inverse_transform(data)
class Dataset_M4(Dataset):
def __init__(self, root_path, flag='pred', size=None,
features='S', data_path='ETTh1.csv',
target='OT', scale=False, inverse=False, timeenc=0, freq='15min',
seasonal_patterns='Yearly'):
self.features = features
self.target = target
self.scale = scale
self.inverse = inverse
self.timeenc = timeenc
self.root_path = root_path
self.seq_len = size[0]
self.label_len = size[1]
self.pred_len = size[2]
self.seasonal_patterns = seasonal_patterns
self.history_size = M4Meta.history_size[seasonal_patterns]
self.window_sampling_limit = int(self.history_size * self.pred_len)
self.flag = flag
self.__read_data__()
def __read_data__(self):
# M4Dataset.initialize()
if self.flag == 'train':
dataset = M4Dataset.load(training=True, dataset_file=self.root_path)
else:
dataset = M4Dataset.load(training=False, dataset_file=self.root_path)
training_values = np.array(
[v[~np.isnan(v)] for v in
dataset.values[dataset.groups == self.seasonal_patterns]]) # split different frequencies
self.ids = np.array([i for i in dataset.ids[dataset.groups == self.seasonal_patterns]])
self.timeseries = [ts for ts in training_values]
def __getitem__(self, index):
insample = np.zeros((self.seq_len, 1))
insample_mask = np.zeros((self.seq_len, 1))
outsample = np.zeros((self.pred_len + self.label_len, 1))
outsample_mask = np.zeros((self.pred_len + self.label_len, 1)) # m4 dataset
sampled_timeseries = self.timeseries[index]
cut_point = np.random.randint(low=max(1, len(sampled_timeseries) - self.window_sampling_limit),
high=len(sampled_timeseries),
size=1)[0]
insample_window = sampled_timeseries[max(0, cut_point - self.seq_len):cut_point]
insample[-len(insample_window):, 0] = insample_window
insample_mask[-len(insample_window):, 0] = 1.0
outsample_window = sampled_timeseries[
cut_point - self.label_len:min(len(sampled_timeseries), cut_point + self.pred_len)]
outsample[:len(outsample_window), 0] = outsample_window
outsample_mask[:len(outsample_window), 0] = 1.0
return insample, outsample, insample_mask, outsample_mask
def __len__(self):
return len(self.timeseries)
    def inverse_transform(self, data):
        # Note: Dataset_M4 never creates self.scaler (scale defaults to False);
        # this is kept for interface parity with the other datasets and is unused as-is.
        return self.scaler.inverse_transform(data)
def last_insample_window(self):
"""
The last window of insample size of all timeseries.
This function does not support batching and does not reshuffle timeseries.
:return: Last insample window of all timeseries. Shape "timeseries, insample size"
"""
insample = np.zeros((len(self.timeseries), self.seq_len))
insample_mask = np.zeros((len(self.timeseries), self.seq_len))
for i, ts in enumerate(self.timeseries):
ts_last_window = ts[-self.seq_len:]
insample[i, -len(ts):] = ts_last_window
insample_mask[i, -len(ts):] = 1.0
return insample, insample_mask
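Note the channel-independence scheme shared by the ETT and Custom datasets above: `__len__` multiplies the number of sliding windows by `enc_in`, and `__getitem__` decodes the flat index back into a (variate, window-start) pair, so every sample is a single univariate slice. A quick sanity check of that arithmetic:

```python
# Decode flat sample indices the way the __getitem__ methods above do.
tot_len, enc_in = 100, 7  # sliding windows per variate, number of variates
for index in (0, 99, 100, 699):
    feat_id, s_begin = index // tot_len, index % tot_len
    print(f'{index} -> variate {feat_id}, window start {s_begin}')
# 0 -> variate 0, window start 0      99 -> variate 0, window start 99
# 100 -> variate 1, window start 0    699 -> variate 6, window start 99
```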

data_provider/m4.py Normal file

@ -0,0 +1,132 @@
# This source code is provided for the purposes of scientific reproducibility
# under the following limited license from Element AI Inc. The code is an
# implementation of the N-BEATS model (Oreshkin et al., N-BEATS: Neural basis
# expansion analysis for interpretable time series forecasting,
# https://arxiv.org/abs/1905.10437). The copyright to the source code is
# licensed under the Creative Commons - Attribution-NonCommercial 4.0
# International license (CC BY-NC 4.0):
# https://creativecommons.org/licenses/by-nc/4.0/. Any commercial use (whether
# for the benefit of third parties or internally in production) requires an
# explicit license. The subject-matter of the N-BEATS model and associated
# materials are the property of Element AI Inc. and may be subject to patent
# protection. No license to patents is granted hereunder (whether express or
# implied). Copyright © 2020 Element AI Inc. All rights reserved.
"""
M4 Dataset
"""
from dataclasses import dataclass
import numpy as np
import pandas as pd
import logging
import os
import pathlib
import sys
from urllib import request
def url_file_name(url: str) -> str:
"""
Extract file name from url.
:param url: URL to extract file name from.
:return: File name.
"""
return url.split('/')[-1] if len(url) > 0 else ''
def download(url: str, file_path: str) -> None:
"""
Download a file to the given path.
:param url: URL to download
:param file_path: Where to download the content.
"""
def progress(count, block_size, total_size):
progress_pct = float(count * block_size) / float(total_size) * 100.0
sys.stdout.write('\rDownloading {} to {} {:.1f}%'.format(url, file_path, progress_pct))
sys.stdout.flush()
if not os.path.isfile(file_path):
opener = request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
request.install_opener(opener)
pathlib.Path(os.path.dirname(file_path)).mkdir(parents=True, exist_ok=True)
f, _ = request.urlretrieve(url, file_path, progress)
sys.stdout.write('\n')
sys.stdout.flush()
file_info = os.stat(f)
logging.info(f'Successfully downloaded {os.path.basename(file_path)} {file_info.st_size} bytes.')
else:
file_info = os.stat(file_path)
logging.info(f'File already exists: {file_path} {file_info.st_size} bytes.')
@dataclass()
class M4Dataset:
ids: np.ndarray
groups: np.ndarray
frequencies: np.ndarray
horizons: np.ndarray
values: np.ndarray
@staticmethod
def load(training: bool = True, dataset_file: str = '../dataset/m4') -> 'M4Dataset':
"""
Load cached dataset.
:param training: Load training part if training is True, test part otherwise.
"""
info_file = os.path.join(dataset_file, 'M4-info.csv')
train_cache_file = os.path.join(dataset_file, 'training.npz')
test_cache_file = os.path.join(dataset_file, 'test.npz')
m4_info = pd.read_csv(info_file)
return M4Dataset(ids=m4_info.M4id.values,
groups=m4_info.SP.values,
frequencies=m4_info.Frequency.values,
horizons=m4_info.Horizon.values,
values=np.load(
train_cache_file if training else test_cache_file,
allow_pickle=True))
@dataclass()
class M4Meta:
seasonal_patterns = ['Yearly', 'Quarterly', 'Monthly', 'Weekly', 'Daily', 'Hourly']
horizons = [6, 8, 18, 13, 14, 48]
frequencies = [1, 4, 12, 1, 1, 24]
horizons_map = {
'Yearly': 6,
'Quarterly': 8,
'Monthly': 18,
'Weekly': 13,
'Daily': 14,
'Hourly': 48
} # different predict length
frequency_map = {
'Yearly': 1,
'Quarterly': 4,
'Monthly': 12,
'Weekly': 1,
'Daily': 1,
'Hourly': 24
}
history_size = {
'Yearly': 1.5,
'Quarterly': 1.5,
'Monthly': 1.5,
'Weekly': 10,
'Daily': 10,
'Hourly': 10
} # from interpretable.gin
# Assumed default location, consistent with M4Dataset.load's dataset_file default.
INFO_FILE_PATH = '../dataset/m4/M4-info.csv'

def load_m4_info() -> pd.DataFrame:
    """
    Load M4-info file.
    :return: Pandas DataFrame of M4Info.
    """
    return pd.read_csv(INFO_FILE_PATH)
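A minimal loading sketch, assuming the cached M4 files from the Datasets section sit under `./dataset/m4`:

```python
from data_provider.m4 import M4Dataset

# Expects M4-info.csv plus training.npz/test.npz under ./dataset/m4.
train = M4Dataset.load(training=True, dataset_file='./dataset/m4')
yearly = train.values[train.groups == 'Yearly']  # object array of 1-D series
print(len(yearly), 'yearly series; first has', len(yearly[0]), 'observations')
```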


@ -0,0 +1,2 @@
The Electricity Transformer Temperature (ETT) is a crucial indicator in the long-term deployment of electric power. This dataset consists of 2 years of data from two separate counties in China. To explore different granularities of the long sequence time-series forecasting (LSTF) problem, different subsets are created: {ETTh1, ETTh2} at the 1-hour level and ETTm1 at the 15-minute level. Each data point consists of the target value "oil temperature" and 6 power load features. The train/val/test split is 12/4/4 months.


@ -0,0 +1,2 @@
The M4 dataset is a collection of 100,000 time series used for the fourth edition of the Makridakis forecasting competition. It consists of yearly, quarterly, monthly, and other (weekly, daily, and hourly) series, divided into training and test sets. The minimum number of observations in the training sets is 13 for yearly, 16 for quarterly, 42 for monthly, 80 for weekly, 93 for daily, and 700 for hourly series. Participants were asked to produce the following numbers of forecasts beyond the given data: six for yearly, eight for quarterly, 18 for monthly, 13 for weekly, and 14 and 48 for daily and hourly series, respectively.

ds_config_zero2.json Normal file

@ -0,0 +1,21 @@
{
"bf16": {
"enabled": true,
"auto_cast": true
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 2e8,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 2e8,
"contiguous_gradients": true,
"sub_group_size": 1e9
},
"gradient_accumulation_steps": "auto",
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"steps_per_print": 10,
"wall_clock_breakdown": false
}
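The `"auto"` entries above defer batch sizing and gradient accumulation to the launcher. A plausible way to consume this ZeRO stage-2 config with the pinned `accelerate`/`deepspeed` versions (assumed wiring; the training entry point is not part of this excerpt):

```python
from accelerate import Accelerator, DeepSpeedPlugin

# Hand the ZeRO stage-2 JSON to Accelerate; "auto" fields resolve at launch.
deepspeed_plugin = DeepSpeedPlugin(hf_ds_config='ds_config_zero2.json')
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)
```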

BIN
figures/framework.png Normal file

Binary file not shown.


Width:  |  Height:  |  Size: 93 KiB

BIN
figures/logo.png Normal file

Binary file not shown.


Width:  |  Height:  |  Size: 259 KiB

figures/method-detailed-illustration.png Normal file

Binary file not shown.


Width:  |  Height:  |  Size: 752 KiB

layers/AutoCorrelation.py Normal file

@ -0,0 +1,163 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
import math
from math import sqrt
import os
class AutoCorrelation(nn.Module):
"""
AutoCorrelation Mechanism with the following two phases:
(1) period-based dependencies discovery
(2) time delay aggregation
This block can replace the self-attention family mechanism seamlessly.
"""
def __init__(self, mask_flag=True, factor=1, scale=None, attention_dropout=0.1, output_attention=False):
super(AutoCorrelation, self).__init__()
self.factor = factor
self.scale = scale
self.mask_flag = mask_flag
self.output_attention = output_attention
self.dropout = nn.Dropout(attention_dropout)
def time_delay_agg_training(self, values, corr):
"""
SpeedUp version of Autocorrelation (a batch-normalization style design)
This is for the training phase.
"""
head = values.shape[1]
channel = values.shape[2]
length = values.shape[3]
# find top k
top_k = int(self.factor * math.log(length))
mean_value = torch.mean(torch.mean(corr, dim=1), dim=1)
index = torch.topk(torch.mean(mean_value, dim=0), top_k, dim=-1)[1]
weights = torch.stack([mean_value[:, index[i]] for i in range(top_k)], dim=-1)
# update corr
tmp_corr = torch.softmax(weights, dim=-1)
# aggregation
tmp_values = values
delays_agg = torch.zeros_like(values).float()
for i in range(top_k):
pattern = torch.roll(tmp_values, -int(index[i]), -1)
delays_agg = delays_agg + pattern * \
(tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length))
return delays_agg
def time_delay_agg_inference(self, values, corr):
"""
SpeedUp version of Autocorrelation (a batch-normalization style design)
This is for the inference phase.
"""
batch = values.shape[0]
head = values.shape[1]
channel = values.shape[2]
length = values.shape[3]
# index init
        init_index = torch.arange(length).unsqueeze(0).unsqueeze(0).unsqueeze(0).repeat(batch, head, channel, 1).to(values.device)  # device-agnostic (was hard-coded .cuda())
# find top k
top_k = int(self.factor * math.log(length))
mean_value = torch.mean(torch.mean(corr, dim=1), dim=1)
weights, delay = torch.topk(mean_value, top_k, dim=-1)
# update corr
tmp_corr = torch.softmax(weights, dim=-1)
# aggregation
tmp_values = values.repeat(1, 1, 1, 2)
delays_agg = torch.zeros_like(values).float()
for i in range(top_k):
tmp_delay = init_index + delay[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length)
pattern = torch.gather(tmp_values, dim=-1, index=tmp_delay)
delays_agg = delays_agg + pattern * \
(tmp_corr[:, i].unsqueeze(1).unsqueeze(1).unsqueeze(1).repeat(1, head, channel, length))
return delays_agg
def time_delay_agg_full(self, values, corr):
"""
Standard version of Autocorrelation
"""
batch = values.shape[0]
head = values.shape[1]
channel = values.shape[2]
length = values.shape[3]
# index init
        init_index = torch.arange(length).unsqueeze(0).unsqueeze(0).unsqueeze(0).repeat(batch, head, channel, 1).to(values.device)  # device-agnostic (was hard-coded .cuda())
# find top k
top_k = int(self.factor * math.log(length))
weights, delay = torch.topk(corr, top_k, dim=-1)
# update corr
tmp_corr = torch.softmax(weights, dim=-1)
# aggregation
tmp_values = values.repeat(1, 1, 1, 2)
delays_agg = torch.zeros_like(values).float()
for i in range(top_k):
tmp_delay = init_index + delay[..., i].unsqueeze(-1)
pattern = torch.gather(tmp_values, dim=-1, index=tmp_delay)
delays_agg = delays_agg + pattern * (tmp_corr[..., i].unsqueeze(-1))
return delays_agg
def forward(self, queries, keys, values, attn_mask):
B, L, H, E = queries.shape
_, S, _, D = values.shape
if L > S:
zeros = torch.zeros_like(queries[:, :(L - S), :]).float()
values = torch.cat([values, zeros], dim=1)
keys = torch.cat([keys, zeros], dim=1)
else:
values = values[:, :L, :, :]
keys = keys[:, :L, :, :]
# period-based dependencies
q_fft = torch.fft.rfft(queries.permute(0, 2, 3, 1).contiguous(), dim=-1)
k_fft = torch.fft.rfft(keys.permute(0, 2, 3, 1).contiguous(), dim=-1)
res = q_fft * torch.conj(k_fft)
corr = torch.fft.irfft(res, dim=-1)
# time delay agg
if self.training:
V = self.time_delay_agg_training(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2)
else:
V = self.time_delay_agg_inference(values.permute(0, 2, 3, 1).contiguous(), corr).permute(0, 3, 1, 2)
if self.output_attention:
return (V.contiguous(), corr.permute(0, 3, 1, 2))
else:
return (V.contiguous(), None)
class AutoCorrelationLayer(nn.Module):
def __init__(self, correlation, d_model, n_heads, d_keys=None,
d_values=None):
super(AutoCorrelationLayer, self).__init__()
d_keys = d_keys or (d_model // n_heads)
d_values = d_values or (d_model // n_heads)
self.inner_correlation = correlation
self.query_projection = nn.Linear(d_model, d_keys * n_heads)
self.key_projection = nn.Linear(d_model, d_keys * n_heads)
self.value_projection = nn.Linear(d_model, d_values * n_heads)
self.out_projection = nn.Linear(d_values * n_heads, d_model)
self.n_heads = n_heads
def forward(self, queries, keys, values, attn_mask):
B, L, _ = queries.shape
_, S, _ = keys.shape
H = self.n_heads
queries = self.query_projection(queries).view(B, L, H, -1)
keys = self.key_projection(keys).view(B, S, H, -1)
values = self.value_projection(values).view(B, S, H, -1)
out, attn = self.inner_correlation(
queries,
keys,
values,
attn_mask
)
out = out.view(B, L, -1)
return self.out_projection(out), attn
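The `forward` above relies on the Wiener-Khinchin relation: multiplying one series' FFT by the conjugate of another's and inverting yields the circular cross-correlation over all lags in O(L log L), from which the top-k delays are picked. In one dimension:

```python
import torch

x = torch.randn(64)
y = torch.roll(x, shifts=-5)  # y leads x by 5 steps
# corr[tau] = sum_t x[t + tau] * y[t], computed for all lags at once
corr = torch.fft.irfft(torch.fft.rfft(x) * torch.conj(torch.fft.rfft(y)), n=64)
print(int(torch.argmax(corr)))  # 5: the dominant delay, recovered via FFT
```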

layers/Autoformer_EncDec.py Normal file

@ -0,0 +1,203 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
class my_Layernorm(nn.Module):
"""
Special designed layernorm for the seasonal part
"""
def __init__(self, channels):
super(my_Layernorm, self).__init__()
self.layernorm = nn.LayerNorm(channels)
def forward(self, x):
x_hat = self.layernorm(x)
bias = torch.mean(x_hat, dim=1).unsqueeze(1).repeat(1, x.shape[1], 1)
return x_hat - bias
class moving_avg(nn.Module):
"""
Moving average block to highlight the trend of time series
"""
def __init__(self, kernel_size, stride):
super(moving_avg, self).__init__()
self.kernel_size = kernel_size
self.avg = nn.AvgPool1d(kernel_size=kernel_size, stride=stride, padding=0)
def forward(self, x):
# padding on the both ends of time series
front = x[:, 0:1, :].repeat(1, (self.kernel_size - 1) // 2, 1)
end = x[:, -1:, :].repeat(1, (self.kernel_size - 1) // 2, 1)
x = torch.cat([front, x, end], dim=1)
x = self.avg(x.permute(0, 2, 1))
x = x.permute(0, 2, 1)
return x
class series_decomp(nn.Module):
"""
Series decomposition block
"""
def __init__(self, kernel_size):
super(series_decomp, self).__init__()
self.moving_avg = moving_avg(kernel_size, stride=1)
def forward(self, x):
moving_mean = self.moving_avg(x)
res = x - moving_mean
return res, moving_mean
class series_decomp_multi(nn.Module):
"""
Multiple Series decomposition block from FEDformer
"""
def __init__(self, kernel_size):
super(series_decomp_multi, self).__init__()
self.kernel_size = kernel_size
        self.series_decomp = nn.ModuleList([series_decomp(kernel) for kernel in kernel_size])
def forward(self, x):
moving_mean = []
res = []
for func in self.series_decomp:
sea, moving_avg = func(x)
moving_mean.append(moving_avg)
res.append(sea)
sea = sum(res) / len(res)
moving_mean = sum(moving_mean) / len(moving_mean)
return sea, moving_mean
class EncoderLayer(nn.Module):
"""
Autoformer encoder layer with the progressive decomposition architecture
"""
def __init__(self, attention, d_model, d_ff=None, moving_avg=25, dropout=0.1, activation="relu"):
super(EncoderLayer, self).__init__()
d_ff = d_ff or 4 * d_model
self.attention = attention
self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False)
self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False)
self.decomp1 = series_decomp(moving_avg)
self.decomp2 = series_decomp(moving_avg)
self.dropout = nn.Dropout(dropout)
self.activation = F.relu if activation == "relu" else F.gelu
def forward(self, x, attn_mask=None):
new_x, attn = self.attention(
x, x, x,
attn_mask=attn_mask
)
x = x + self.dropout(new_x)
x, _ = self.decomp1(x)
y = x
y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
y = self.dropout(self.conv2(y).transpose(-1, 1))
res, _ = self.decomp2(x + y)
return res, attn
class Encoder(nn.Module):
"""
Autoformer encoder
"""
def __init__(self, attn_layers, conv_layers=None, norm_layer=None):
super(Encoder, self).__init__()
self.attn_layers = nn.ModuleList(attn_layers)
self.conv_layers = nn.ModuleList(conv_layers) if conv_layers is not None else None
self.norm = norm_layer
def forward(self, x, attn_mask=None):
attns = []
if self.conv_layers is not None:
for attn_layer, conv_layer in zip(self.attn_layers, self.conv_layers):
x, attn = attn_layer(x, attn_mask=attn_mask)
x = conv_layer(x)
attns.append(attn)
x, attn = self.attn_layers[-1](x)
attns.append(attn)
else:
for attn_layer in self.attn_layers:
x, attn = attn_layer(x, attn_mask=attn_mask)
attns.append(attn)
if self.norm is not None:
x = self.norm(x)
return x, attns
class DecoderLayer(nn.Module):
"""
Autoformer decoder layer with the progressive decomposition architecture
"""
def __init__(self, self_attention, cross_attention, d_model, c_out, d_ff=None,
moving_avg=25, dropout=0.1, activation="relu"):
super(DecoderLayer, self).__init__()
d_ff = d_ff or 4 * d_model
self.self_attention = self_attention
self.cross_attention = cross_attention
self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False)
self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1, bias=False)
self.decomp1 = series_decomp(moving_avg)
self.decomp2 = series_decomp(moving_avg)
self.decomp3 = series_decomp(moving_avg)
self.dropout = nn.Dropout(dropout)
self.projection = nn.Conv1d(in_channels=d_model, out_channels=c_out, kernel_size=3, stride=1, padding=1,
padding_mode='circular', bias=False)
self.activation = F.relu if activation == "relu" else F.gelu
def forward(self, x, cross, x_mask=None, cross_mask=None):
x = x + self.dropout(self.self_attention(
x, x, x,
attn_mask=x_mask
)[0])
x, trend1 = self.decomp1(x)
x = x + self.dropout(self.cross_attention(
x, cross, cross,
attn_mask=cross_mask
)[0])
x, trend2 = self.decomp2(x)
y = x
y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
y = self.dropout(self.conv2(y).transpose(-1, 1))
x, trend3 = self.decomp3(x + y)
residual_trend = trend1 + trend2 + trend3
residual_trend = self.projection(residual_trend.permute(0, 2, 1)).transpose(1, 2)
return x, residual_trend
class Decoder(nn.Module):
"""
    Autoformer decoder
"""
def __init__(self, layers, norm_layer=None, projection=None):
super(Decoder, self).__init__()
self.layers = nn.ModuleList(layers)
self.norm = norm_layer
self.projection = projection
def forward(self, x, cross, x_mask=None, cross_mask=None, trend=None):
for layer in self.layers:
x, residual_trend = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask)
trend = trend + residual_trend
if self.norm is not None:
x = self.norm(x)
if self.projection is not None:
x = self.projection(x)
return x, trend
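A quick property check of the decomposition primitive these layers build on: `series_decomp` returns a residual ("seasonal") part and a moving-average trend that sum back exactly to the input.

```python
import torch
from layers.Autoformer_EncDec import series_decomp

decomp = series_decomp(kernel_size=25)      # odd kernel preserves the length
x = torch.randn(2, 96, 7)                   # [batch, length, channels]
seasonal, trend = decomp(x)
print(torch.allclose(seasonal + trend, x))  # True: an exact additive split
```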

layers/Conv_Blocks.py Normal file

@ -0,0 +1,60 @@
import torch
import torch.nn as nn
class Inception_Block_V1(nn.Module):
def __init__(self, in_channels, out_channels, num_kernels=6, init_weight=True):
super(Inception_Block_V1, self).__init__()
self.in_channels = in_channels
self.out_channels = out_channels
self.num_kernels = num_kernels
kernels = []
for i in range(self.num_kernels):
kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=2 * i + 1, padding=i))
self.kernels = nn.ModuleList(kernels)
if init_weight:
self._initialize_weights()
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
if m.bias is not None:
nn.init.constant_(m.bias, 0)
def forward(self, x):
res_list = []
for i in range(self.num_kernels):
res_list.append(self.kernels[i](x))
res = torch.stack(res_list, dim=-1).mean(-1)
return res
class Inception_Block_V2(nn.Module):
def __init__(self, in_channels, out_channels, num_kernels=6, init_weight=True):
super(Inception_Block_V2, self).__init__()
self.in_channels = in_channels
self.out_channels = out_channels
self.num_kernels = num_kernels
kernels = []
for i in range(self.num_kernels // 2):
kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=[1, 2 * i + 3], padding=[0, i + 1]))
kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=[2 * i + 3, 1], padding=[i + 1, 0]))
kernels.append(nn.Conv2d(in_channels, out_channels, kernel_size=1))
self.kernels = nn.ModuleList(kernels)
if init_weight:
self._initialize_weights()
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
if m.bias is not None:
nn.init.constant_(m.bias, 0)
    def forward(self, x):
        # Assumes num_kernels is even: the list built above holds
        # 2 * (num_kernels // 2) + 1 == num_kernels + 1 convolutions.
        res_list = []
        for i in range(self.num_kernels + 1):
            res_list.append(self.kernels[i](x))
        res = torch.stack(res_list, dim=-1).mean(-1)
        return res

layers/Embed.py Normal file

@ -0,0 +1,198 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor
from torch.nn.utils import weight_norm
import math
class PositionalEmbedding(nn.Module):
def __init__(self, d_model, max_len=5000):
super(PositionalEmbedding, self).__init__()
# Compute the positional encodings once in log space.
pe = torch.zeros(max_len, d_model).float()
        pe.requires_grad = False
position = torch.arange(0, max_len).float().unsqueeze(1)
div_term = (torch.arange(0, d_model, 2).float()
* -(math.log(10000.0) / d_model)).exp()
pe[:, 0::2] = torch.sin(position * div_term)
pe[:, 1::2] = torch.cos(position * div_term)
pe = pe.unsqueeze(0)
self.register_buffer('pe', pe)
def forward(self, x):
return self.pe[:, :x.size(1)]
class TokenEmbedding(nn.Module):
def __init__(self, c_in, d_model):
super(TokenEmbedding, self).__init__()
padding = 1 if torch.__version__ >= '1.5.0' else 2
self.tokenConv = nn.Conv1d(in_channels=c_in, out_channels=d_model,
kernel_size=3, padding=padding, padding_mode='circular', bias=False)
for m in self.modules():
if isinstance(m, nn.Conv1d):
nn.init.kaiming_normal_(
m.weight, mode='fan_in', nonlinearity='leaky_relu')
def forward(self, x):
x = self.tokenConv(x.permute(0, 2, 1)).transpose(1, 2)
return x
class FixedEmbedding(nn.Module):
def __init__(self, c_in, d_model):
super(FixedEmbedding, self).__init__()
w = torch.zeros(c_in, d_model).float()
        w.requires_grad = False
position = torch.arange(0, c_in).float().unsqueeze(1)
div_term = (torch.arange(0, d_model, 2).float()
* -(math.log(10000.0) / d_model)).exp()
w[:, 0::2] = torch.sin(position * div_term)
w[:, 1::2] = torch.cos(position * div_term)
self.emb = nn.Embedding(c_in, d_model)
self.emb.weight = nn.Parameter(w, requires_grad=False)
def forward(self, x):
return self.emb(x).detach()
class TemporalEmbedding(nn.Module):
def __init__(self, d_model, embed_type='fixed', freq='h'):
super(TemporalEmbedding, self).__init__()
minute_size = 4
hour_size = 24
weekday_size = 7
day_size = 32
month_size = 13
Embed = FixedEmbedding if embed_type == 'fixed' else nn.Embedding
if freq == 't':
self.minute_embed = Embed(minute_size, d_model)
self.hour_embed = Embed(hour_size, d_model)
self.weekday_embed = Embed(weekday_size, d_model)
self.day_embed = Embed(day_size, d_model)
self.month_embed = Embed(month_size, d_model)
def forward(self, x):
x = x.long()
minute_x = self.minute_embed(x[:, :, 4]) if hasattr(
self, 'minute_embed') else 0.
hour_x = self.hour_embed(x[:, :, 3])
weekday_x = self.weekday_embed(x[:, :, 2])
day_x = self.day_embed(x[:, :, 1])
month_x = self.month_embed(x[:, :, 0])
return hour_x + weekday_x + day_x + month_x + minute_x
class TimeFeatureEmbedding(nn.Module):
def __init__(self, d_model, embed_type='timeF', freq='h'):
super(TimeFeatureEmbedding, self).__init__()
freq_map = {'h': 4, 't': 5, 's': 6,
'm': 1, 'a': 1, 'w': 2, 'd': 3, 'b': 3}
d_inp = freq_map[freq]
self.embed = nn.Linear(d_inp, d_model, bias=False)
def forward(self, x):
return self.embed(x)
class DataEmbedding(nn.Module):
def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1):
super(DataEmbedding, self).__init__()
self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
self.position_embedding = PositionalEmbedding(d_model=d_model)
self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type,
freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding(
d_model=d_model, embed_type=embed_type, freq=freq)
self.dropout = nn.Dropout(p=dropout)
def forward(self, x, x_mark):
if x_mark is None:
x = self.value_embedding(x) + self.position_embedding(x).to(x.device)
else:
x = self.value_embedding(
x) + self.temporal_embedding(x_mark) + self.position_embedding(x)
return self.dropout(x)
class DataEmbedding_wo_pos(nn.Module):
def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1):
super(DataEmbedding_wo_pos, self).__init__()
self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
self.position_embedding = PositionalEmbedding(d_model=d_model)
self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type,
freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding(
d_model=d_model, embed_type=embed_type, freq=freq)
self.dropout = nn.Dropout(p=dropout)
def forward(self, x, x_mark):
if x_mark is None:
x = self.value_embedding(x)
else:
x = self.value_embedding(x) + self.temporal_embedding(x_mark)
return self.dropout(x)
class ReplicationPad1d(nn.Module):
def __init__(self, padding) -> None:
super(ReplicationPad1d, self).__init__()
self.padding = padding
def forward(self, input: Tensor) -> Tensor:
replicate_padding = input[:, :, -1].unsqueeze(-1).repeat(1, 1, self.padding[-1])
output = torch.cat([input, replicate_padding], dim=-1)
return output
class PatchEmbedding(nn.Module):
def __init__(self, d_model, patch_len, stride, dropout):
super(PatchEmbedding, self).__init__()
# Patching
self.patch_len = patch_len
self.stride = stride
self.padding_patch_layer = ReplicationPad1d((0, stride))
# Backbone, Input encoding: projection of feature vectors onto a d-dim vector space
self.value_embedding = TokenEmbedding(patch_len, d_model)
# Positional embedding
# self.position_embedding = PositionalEmbedding(d_model)
# Residual dropout
self.dropout = nn.Dropout(dropout)
def forward(self, x):
# do patching
n_vars = x.shape[1]
x = self.padding_patch_layer(x)
x = x.unfold(dimension=-1, size=self.patch_len, step=self.stride)
x = torch.reshape(x, (x.shape[0] * x.shape[1], x.shape[2], x.shape[3]))
# Input encoding
x = self.value_embedding(x)
return self.dropout(x), n_vars
class DataEmbedding_wo_time(nn.Module):
def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1):
super(DataEmbedding_wo_time, self).__init__()
self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
self.position_embedding = PositionalEmbedding(d_model=d_model)
self.dropout = nn.Dropout(p=dropout)
def forward(self, x):
x = self.value_embedding(x) + self.position_embedding(x)
return self.dropout(x)
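For shape intuition: `PatchEmbedding` replication-pads the series, slices it into overlapping patches with `unfold`, folds the variate dimension into the batch, and projects each patch to `d_model`.

```python
import torch
from layers.Embed import PatchEmbedding

emb = PatchEmbedding(d_model=32, patch_len=16, stride=8, dropout=0.1)
x = torch.randn(4, 7, 96)    # [batch, n_vars, seq_len]
tokens, n_vars = emb(x)
# 12 patches per series: pad to length 104, then (104 - 16) // 8 + 1 = 12
print(tokens.shape, n_vars)  # torch.Size([28, 12, 32]) 7
```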

layers/SelfAttention_Family.py Normal file

@ -0,0 +1,242 @@
import torch
import torch.nn as nn
import numpy as np
from math import sqrt
from utils.masking import TriangularCausalMask, ProbMask
from reformer_pytorch import LSHSelfAttention
class DSAttention(nn.Module):
'''De-stationary Attention'''
def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False):
super(DSAttention, self).__init__()
self.scale = scale
self.mask_flag = mask_flag
self.output_attention = output_attention
self.dropout = nn.Dropout(attention_dropout)
def forward(self, queries, keys, values, attn_mask, tau=None, delta=None):
B, L, H, E = queries.shape
_, S, _, D = values.shape
scale = self.scale or 1. / sqrt(E)
tau = 1.0 if tau is None else tau.unsqueeze(
1).unsqueeze(1) # B x 1 x 1 x 1
delta = 0.0 if delta is None else delta.unsqueeze(
1).unsqueeze(1) # B x 1 x 1 x S
# De-stationary Attention, rescaling pre-softmax score with learned de-stationary factors
scores = torch.einsum("blhe,bshe->bhls", queries, keys) * tau + delta
if self.mask_flag:
if attn_mask is None:
attn_mask = TriangularCausalMask(B, L, device=queries.device)
scores.masked_fill_(attn_mask.mask, -np.inf)
A = self.dropout(torch.softmax(scale * scores, dim=-1))
V = torch.einsum("bhls,bshd->blhd", A, values)
if self.output_attention:
return (V.contiguous(), A)
else:
return (V.contiguous(), None)
class FullAttention(nn.Module):
def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False):
super(FullAttention, self).__init__()
self.scale = scale
self.mask_flag = mask_flag
self.output_attention = output_attention
self.dropout = nn.Dropout(attention_dropout)
def forward(self, queries, keys, values, attn_mask, tau=None, delta=None):
B, L, H, E = queries.shape
_, S, _, D = values.shape
scale = self.scale or 1. / sqrt(E)
scores = torch.einsum("blhe,bshe->bhls", queries, keys)
if self.mask_flag:
if attn_mask is None:
attn_mask = TriangularCausalMask(B, L, device=queries.device)
scores.masked_fill_(attn_mask.mask, -np.inf)
A = self.dropout(torch.softmax(scale * scores, dim=-1))
V = torch.einsum("bhls,bshd->blhd", A, values)
if self.output_attention:
return (V.contiguous(), A)
else:
return (V.contiguous(), None)
class ProbAttention(nn.Module):
def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False):
super(ProbAttention, self).__init__()
self.factor = factor
self.scale = scale
self.mask_flag = mask_flag
self.output_attention = output_attention
self.dropout = nn.Dropout(attention_dropout)
def _prob_QK(self, Q, K, sample_k, n_top): # n_top: c*ln(L_q)
# Q [B, H, L, D]
B, H, L_K, E = K.shape
_, _, L_Q, _ = Q.shape
# calculate the sampled Q_K
K_expand = K.unsqueeze(-3).expand(B, H, L_Q, L_K, E)
# real U = U_part(factor*ln(L_k))*L_q
index_sample = torch.randint(L_K, (L_Q, sample_k))
K_sample = K_expand[:, :, torch.arange(
L_Q).unsqueeze(1), index_sample, :]
Q_K_sample = torch.matmul(
Q.unsqueeze(-2), K_sample.transpose(-2, -1)).squeeze()
# find the Top_k query with sparisty measurement
M = Q_K_sample.max(-1)[0] - torch.div(Q_K_sample.sum(-1), L_K)
M_top = M.topk(n_top, sorted=False)[1]
# use the reduced Q to calculate Q_K
Q_reduce = Q[torch.arange(B)[:, None, None],
torch.arange(H)[None, :, None],
M_top, :] # factor*ln(L_q)
Q_K = torch.matmul(Q_reduce, K.transpose(-2, -1)) # factor*ln(L_q)*L_k
return Q_K, M_top
def _get_initial_context(self, V, L_Q):
B, H, L_V, D = V.shape
if not self.mask_flag:
# V_sum = V.sum(dim=-2)
V_sum = V.mean(dim=-2)
contex = V_sum.unsqueeze(-2).expand(B, H,
L_Q, V_sum.shape[-1]).clone()
else: # use mask
# requires that L_Q == L_V, i.e. for self-attention only
assert (L_Q == L_V)
contex = V.cumsum(dim=-2)
return contex
def _update_context(self, context_in, V, scores, index, L_Q, attn_mask):
B, H, L_V, D = V.shape
if self.mask_flag:
attn_mask = ProbMask(B, H, L_Q, index, scores, device=V.device)
scores.masked_fill_(attn_mask.mask, -np.inf)
attn = torch.softmax(scores, dim=-1) # nn.Softmax(dim=-1)(scores)
context_in[torch.arange(B)[:, None, None],
torch.arange(H)[None, :, None],
index, :] = torch.matmul(attn, V).type_as(context_in)
if self.output_attention:
attns = (torch.ones([B, H, L_V, L_V]) /
L_V).type_as(attn).to(attn.device)
attns[torch.arange(B)[:, None, None], torch.arange(H)[
None, :, None], index, :] = attn
return (context_in, attns)
else:
return (context_in, None)
def forward(self, queries, keys, values, attn_mask, tau=None, delta=None):
B, L_Q, H, D = queries.shape
_, L_K, _, _ = keys.shape
queries = queries.transpose(2, 1)
keys = keys.transpose(2, 1)
values = values.transpose(2, 1)
U_part = self.factor * \
np.ceil(np.log(L_K)).astype('int').item() # c*ln(L_k)
u = self.factor * \
np.ceil(np.log(L_Q)).astype('int').item() # c*ln(L_q)
U_part = U_part if U_part < L_K else L_K
u = u if u < L_Q else L_Q
scores_top, index = self._prob_QK(
queries, keys, sample_k=U_part, n_top=u)
# add scale factor
scale = self.scale or 1. / sqrt(D)
if scale is not None:
scores_top = scores_top * scale
# get the context
context = self._get_initial_context(values, L_Q)
# update the context with selected top_k queries
context, attn = self._update_context(
context, values, scores_top, index, L_Q, attn_mask)
return context.contiguous(), attn
class AttentionLayer(nn.Module):
def __init__(self, attention, d_model, n_heads, d_keys=None,
d_values=None):
super(AttentionLayer, self).__init__()
d_keys = d_keys or (d_model // n_heads)
d_values = d_values or (d_model // n_heads)
self.inner_attention = attention
self.query_projection = nn.Linear(d_model, d_keys * n_heads)
self.key_projection = nn.Linear(d_model, d_keys * n_heads)
self.value_projection = nn.Linear(d_model, d_values * n_heads)
self.out_projection = nn.Linear(d_values * n_heads, d_model)
self.n_heads = n_heads
def forward(self, queries, keys, values, attn_mask, tau=None, delta=None):
B, L, _ = queries.shape
_, S, _ = keys.shape
H = self.n_heads
queries = self.query_projection(queries).view(B, L, H, -1)
keys = self.key_projection(keys).view(B, S, H, -1)
values = self.value_projection(values).view(B, S, H, -1)
out, attn = self.inner_attention(
queries,
keys,
values,
attn_mask,
tau=tau,
delta=delta
)
out = out.view(B, L, -1)
return self.out_projection(out), attn
class ReformerLayer(nn.Module):
def __init__(self, attention, d_model, n_heads, d_keys=None,
d_values=None, causal=False, bucket_size=4, n_hashes=4):
super().__init__()
self.bucket_size = bucket_size
self.attn = LSHSelfAttention(
dim=d_model,
heads=n_heads,
bucket_size=bucket_size,
n_hashes=n_hashes,
causal=causal
)
def fit_length(self, queries):
# inside reformer: assert N % (bucket_size * 2) == 0
B, N, C = queries.shape
if N % (self.bucket_size * 2) == 0:
return queries
else:
# fill the time series
fill_len = (self.bucket_size * 2) - (N % (self.bucket_size * 2))
return torch.cat([queries, torch.zeros([B, fill_len, C]).to(queries.device)], dim=1)
def forward(self, queries, keys, values, attn_mask, tau, delta):
        # in Reformer: default queries=keys
B, N, C = queries.shape
queries = self.attn(self.fit_length(queries))[:, :N, :]
return queries, None
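A minimal smoke test of the attention plumbing above, assuming the module path `layers.SelfAttention_Family` (this file's conventional name in Time-Series-Library):

```python
import torch
from layers.SelfAttention_Family import AttentionLayer, FullAttention

layer = AttentionLayer(FullAttention(mask_flag=False), d_model=32, n_heads=4)
x = torch.randn(2, 24, 32)  # [batch, length, d_model]
out, attn = layer(x, x, x, attn_mask=None)
print(out.shape, attn)      # torch.Size([2, 24, 32]) None
```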

layers/StandardNorm.py Executable file

@ -0,0 +1,68 @@
import torch
import torch.nn as nn
class Normalize(nn.Module):
def __init__(self, num_features: int, eps=1e-5, affine=False, subtract_last=False, non_norm=False):
"""
:param num_features: the number of features or channels
:param eps: a value added for numerical stability
:param affine: if True, RevIN has learnable affine parameters
"""
super(Normalize, self).__init__()
self.num_features = num_features
self.eps = eps
self.affine = affine
self.subtract_last = subtract_last
self.non_norm = non_norm
if self.affine:
self._init_params()
def forward(self, x, mode: str):
if mode == 'norm':
self._get_statistics(x)
x = self._normalize(x)
elif mode == 'denorm':
x = self._denormalize(x)
else:
raise NotImplementedError
return x
def _init_params(self):
# initialize RevIN params: (C,)
self.affine_weight = nn.Parameter(torch.ones(self.num_features))
self.affine_bias = nn.Parameter(torch.zeros(self.num_features))
def _get_statistics(self, x):
dim2reduce = tuple(range(1, x.ndim - 1))
if self.subtract_last:
self.last = x[:, -1, :].unsqueeze(1)
else:
self.mean = torch.mean(x, dim=dim2reduce, keepdim=True).detach()
self.stdev = torch.sqrt(torch.var(x, dim=dim2reduce, keepdim=True, unbiased=False) + self.eps).detach()
def _normalize(self, x):
if self.non_norm:
return x
if self.subtract_last:
x = x - self.last
else:
x = x - self.mean
x = x / self.stdev
if self.affine:
x = x * self.affine_weight
x = x + self.affine_bias
return x
def _denormalize(self, x):
if self.non_norm:
return x
if self.affine:
x = x - self.affine_bias
x = x / (self.affine_weight + self.eps * self.eps)
x = x * self.stdev
if self.subtract_last:
x = x + self.last
else:
x = x + self.mean
return x
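A round-trip check of `Normalize` (RevIN-style instance normalization): statistics computed in `'norm'` mode are cached on the module and reused by `'denorm'`.

```python
import torch
from layers.StandardNorm import Normalize

norm = Normalize(num_features=7, affine=False)
x = torch.randn(2, 96, 7)                    # [batch, length, channels]
x_n = norm(x, mode='norm')                   # standardize per instance
x_back = norm(x_n, mode='denorm')            # uses the cached mean/stdev
print(torch.allclose(x_back, x, atol=1e-5))  # True
```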

layers/Transformer_EncDec.py Normal file

@ -0,0 +1,135 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
class ConvLayer(nn.Module):
def __init__(self, c_in):
super(ConvLayer, self).__init__()
self.downConv = nn.Conv1d(in_channels=c_in,
out_channels=c_in,
kernel_size=3,
padding=2,
padding_mode='circular')
self.norm = nn.BatchNorm1d(c_in)
self.activation = nn.ELU()
self.maxPool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)
def forward(self, x):
x = self.downConv(x.permute(0, 2, 1))
x = self.norm(x)
x = self.activation(x)
x = self.maxPool(x)
x = x.transpose(1, 2)
return x
class EncoderLayer(nn.Module):
def __init__(self, attention, d_model, d_ff=None, dropout=0.1, activation="relu"):
super(EncoderLayer, self).__init__()
d_ff = d_ff or 4 * d_model
self.attention = attention
self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
self.norm1 = nn.LayerNorm(d_model)
self.norm2 = nn.LayerNorm(d_model)
self.dropout = nn.Dropout(dropout)
self.activation = F.relu if activation == "relu" else F.gelu
def forward(self, x, attn_mask=None, tau=None, delta=None):
new_x, attn = self.attention(
x, x, x,
attn_mask=attn_mask,
tau=tau, delta=delta
)
x = x + self.dropout(new_x)
y = x = self.norm1(x)
y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
y = self.dropout(self.conv2(y).transpose(-1, 1))
return self.norm2(x + y), attn
class Encoder(nn.Module):
def __init__(self, attn_layers, conv_layers=None, norm_layer=None):
super(Encoder, self).__init__()
self.attn_layers = nn.ModuleList(attn_layers)
self.conv_layers = nn.ModuleList(conv_layers) if conv_layers is not None else None
self.norm = norm_layer
def forward(self, x, attn_mask=None, tau=None, delta=None):
# x [B, L, D]
attns = []
if self.conv_layers is not None:
for i, (attn_layer, conv_layer) in enumerate(zip(self.attn_layers, self.conv_layers)):
delta = delta if i == 0 else None
x, attn = attn_layer(x, attn_mask=attn_mask, tau=tau, delta=delta)
x = conv_layer(x)
attns.append(attn)
x, attn = self.attn_layers[-1](x, tau=tau, delta=None)
attns.append(attn)
else:
for attn_layer in self.attn_layers:
x, attn = attn_layer(x, attn_mask=attn_mask, tau=tau, delta=delta)
attns.append(attn)
if self.norm is not None:
x = self.norm(x)
return x, attns
class DecoderLayer(nn.Module):
def __init__(self, self_attention, cross_attention, d_model, d_ff=None,
dropout=0.1, activation="relu"):
super(DecoderLayer, self).__init__()
d_ff = d_ff or 4 * d_model
self.self_attention = self_attention
self.cross_attention = cross_attention
self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1)
self.conv2 = nn.Conv1d(in_channels=d_ff, out_channels=d_model, kernel_size=1)
self.norm1 = nn.LayerNorm(d_model)
self.norm2 = nn.LayerNorm(d_model)
self.norm3 = nn.LayerNorm(d_model)
self.dropout = nn.Dropout(dropout)
self.activation = F.relu if activation == "relu" else F.gelu
def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None):
x = x + self.dropout(self.self_attention(
x, x, x,
attn_mask=x_mask,
tau=tau, delta=None
)[0])
x = self.norm1(x)
x = x + self.dropout(self.cross_attention(
x, cross, cross,
attn_mask=cross_mask,
tau=tau, delta=delta
)[0])
y = x = self.norm2(x)
y = self.dropout(self.activation(self.conv1(y.transpose(-1, 1))))
y = self.dropout(self.conv2(y).transpose(-1, 1))
return self.norm3(x + y)
class Decoder(nn.Module):
def __init__(self, layers, norm_layer=None, projection=None):
super(Decoder, self).__init__()
self.layers = nn.ModuleList(layers)
self.norm = norm_layer
self.projection = projection
def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None):
for layer in self.layers:
x = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask, tau=tau, delta=delta)
if self.norm is not None:
x = self.norm(x)
if self.projection is not None:
x = self.projection(x)
return x

layers/__init__.py Normal file (empty)

models/Autoformer.py Normal file

@ -0,0 +1,158 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
from layers.Embed import DataEmbedding, DataEmbedding_wo_pos
from layers.AutoCorrelation import AutoCorrelation, AutoCorrelationLayer
from layers.Autoformer_EncDec import Encoder, Decoder, EncoderLayer, DecoderLayer, my_Layernorm, series_decomp
import math
import numpy as np
class Model(nn.Module):
"""
Autoformer is the first method to achieve series-wise connections,
with inherent O(L log L) complexity.
Paper link: https://openreview.net/pdf?id=I55UqU-M11y
"""
def __init__(self, configs):
super(Model, self).__init__()
self.task_name = configs.task_name
self.seq_len = configs.seq_len
self.label_len = configs.label_len
self.pred_len = configs.pred_len
self.output_attention = configs.output_attention
# Decomp
kernel_size = configs.moving_avg
self.decomp = series_decomp(kernel_size)
# Embedding
self.enc_embedding = DataEmbedding_wo_pos(configs.enc_in, configs.d_model, configs.embed, configs.freq,
configs.dropout)
# Encoder
self.encoder = Encoder(
[
EncoderLayer(
AutoCorrelationLayer(
AutoCorrelation(False, configs.factor, attention_dropout=configs.dropout,
output_attention=configs.output_attention),
configs.d_model, configs.n_heads),
configs.d_model,
configs.d_ff,
moving_avg=configs.moving_avg,
dropout=configs.dropout,
activation=configs.activation
) for l in range(configs.e_layers)
],
norm_layer=my_Layernorm(configs.d_model)
)
# Decoder
if self.task_name == 'long_term_forecast' or self.task_name == 'short_term_forecast':
self.dec_embedding = DataEmbedding_wo_pos(configs.dec_in, configs.d_model, configs.embed, configs.freq,
configs.dropout)
self.decoder = Decoder(
[
DecoderLayer(
AutoCorrelationLayer(
AutoCorrelation(True, configs.factor, attention_dropout=configs.dropout,
output_attention=False),
configs.d_model, configs.n_heads),
AutoCorrelationLayer(
AutoCorrelation(False, configs.factor, attention_dropout=configs.dropout,
output_attention=False),
configs.d_model, configs.n_heads),
configs.d_model,
configs.c_out,
configs.d_ff,
moving_avg=configs.moving_avg,
dropout=configs.dropout,
activation=configs.activation,
)
for l in range(configs.d_layers)
],
norm_layer=my_Layernorm(configs.d_model),
projection=nn.Linear(configs.d_model, configs.c_out, bias=True)
)
if self.task_name == 'imputation':
self.projection = nn.Linear(
configs.d_model, configs.c_out, bias=True)
if self.task_name == 'anomaly_detection':
self.projection = nn.Linear(
configs.d_model, configs.c_out, bias=True)
if self.task_name == 'classification':
self.act = F.gelu
self.dropout = nn.Dropout(configs.dropout)
self.projection = nn.Linear(
configs.d_model * configs.seq_len, configs.num_class)
def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):
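# Decomposition-based decoder initialization: the trend component is
# extended with the series mean, the seasonal component with zeros.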
# decomp init
mean = torch.mean(x_enc, dim=1).unsqueeze(
1).repeat(1, self.pred_len, 1)
zeros = torch.zeros([x_dec.shape[0], self.pred_len,
x_dec.shape[2]], device=x_enc.device)
seasonal_init, trend_init = self.decomp(x_enc)
# decoder input
trend_init = torch.cat(
[trend_init[:, -self.label_len:, :], mean], dim=1)
seasonal_init = torch.cat(
[seasonal_init[:, -self.label_len:, :], zeros], dim=1)
# enc
enc_out = self.enc_embedding(x_enc, x_mark_enc)
enc_out, attns = self.encoder(enc_out, attn_mask=None)
# dec
dec_out = self.dec_embedding(seasonal_init, x_mark_dec)
seasonal_part, trend_part = self.decoder(dec_out, enc_out, x_mask=None, cross_mask=None,
trend=trend_init)
# final
dec_out = trend_part + seasonal_part
return dec_out
def imputation(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask):
# enc
enc_out = self.enc_embedding(x_enc, x_mark_enc)
enc_out, attns = self.encoder(enc_out, attn_mask=None)
# final
dec_out = self.projection(enc_out)
return dec_out
def anomaly_detection(self, x_enc):
# enc
enc_out = self.enc_embedding(x_enc, None)
enc_out, attns = self.encoder(enc_out, attn_mask=None)
# final
dec_out = self.projection(enc_out)
return dec_out
def classification(self, x_enc, x_mark_enc):
# enc
enc_out = self.enc_embedding(x_enc, None)
enc_out, attns = self.encoder(enc_out, attn_mask=None)
# Output
# the transformer encoder/decoder embeddings do not include a non-linearity
output = self.act(enc_out)
output = self.dropout(output)
# zero-out padding embeddings
output = output * x_mark_enc.unsqueeze(-1)
# (batch_size, seq_length * d_model)
output = output.reshape(output.shape[0], -1)
output = self.projection(output) # (batch_size, num_classes)
return output
def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
if self.task_name == 'long_term_forecast' or self.task_name == 'short_term_forecast':
dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)
return dec_out[:, -self.pred_len:, :] # [B, L, D]
if self.task_name == 'imputation':
dec_out = self.imputation(
x_enc, x_mark_enc, x_dec, x_mark_dec, mask)
return dec_out # [B, L, D]
if self.task_name == 'anomaly_detection':
dec_out = self.anomaly_detection(x_enc)
return dec_out # [B, L, D]
if self.task_name == 'classification':
dec_out = self.classification(x_enc, x_mark_enc)
return dec_out # [B, N]
return None

models/DLinear.py Normal file
@@ -0,0 +1,107 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
from layers.Autoformer_EncDec import series_decomp
class Model(nn.Module):
"""
Paper link: https://arxiv.org/pdf/2205.13504.pdf
"""
def __init__(self, configs, individual=False):
"""
individual: Bool, whether to use a separate linear layer for each variate.
"""
super(Model, self).__init__()
self.task_name = configs.task_name
self.seq_len = configs.seq_len
if self.task_name == 'classification' or self.task_name == 'anomaly_detection' or self.task_name == 'imputation':
self.pred_len = configs.seq_len
else:
self.pred_len = configs.pred_len
self.decomposition = series_decomp(configs.moving_avg)
self.individual = individual
self.channels = configs.enc_in
if self.individual:
self.Linear_Seasonal = nn.ModuleList()
self.Linear_Trend = nn.ModuleList()
for i in range(self.channels):
self.Linear_Seasonal.append(
nn.Linear(self.seq_len, self.pred_len))
self.Linear_Trend.append(
nn.Linear(self.seq_len, self.pred_len))
self.Linear_Seasonal[i].weight = nn.Parameter(
(1 / self.seq_len) * torch.ones([self.pred_len, self.seq_len]))
self.Linear_Trend[i].weight = nn.Parameter(
(1 / self.seq_len) * torch.ones([self.pred_len, self.seq_len]))
else:
self.Linear_Seasonal = nn.Linear(self.seq_len, self.pred_len)
self.Linear_Trend = nn.Linear(self.seq_len, self.pred_len)
self.Linear_Seasonal.weight = nn.Parameter(
(1 / self.seq_len) * torch.ones([self.pred_len, self.seq_len]))
self.Linear_Trend.weight = nn.Parameter(
(1 / self.seq_len) * torch.ones([self.pred_len, self.seq_len]))
if self.task_name == 'classification':
self.act = F.gelu
self.dropout = nn.Dropout(configs.dropout)
self.projection = nn.Linear(
configs.enc_in * configs.seq_len, configs.num_class)
def encoder(self, x):
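# Decompose the input into seasonal and trend parts, then map each part
# to the horizon with a linear layer (optionally one per channel).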
seasonal_init, trend_init = self.decomposition(x)
seasonal_init, trend_init = seasonal_init.permute(
0, 2, 1), trend_init.permute(0, 2, 1)
if self.individual:
seasonal_output = torch.zeros([seasonal_init.size(0), seasonal_init.size(1), self.pred_len],
dtype=seasonal_init.dtype).to(seasonal_init.device)
trend_output = torch.zeros([trend_init.size(0), trend_init.size(1), self.pred_len],
dtype=trend_init.dtype).to(trend_init.device)
for i in range(self.channels):
seasonal_output[:, i, :] = self.Linear_Seasonal[i](
seasonal_init[:, i, :])
trend_output[:, i, :] = self.Linear_Trend[i](
trend_init[:, i, :])
else:
seasonal_output = self.Linear_Seasonal(seasonal_init)
trend_output = self.Linear_Trend(trend_init)
x = seasonal_output + trend_output
return x.permute(0, 2, 1)
def forecast(self, x_enc):
return self.encoder(x_enc)
def imputation(self, x_enc):
return self.encoder(x_enc)
def anomaly_detection(self, x_enc):
return self.encoder(x_enc)
def classification(self, x_enc):
enc_out = self.encoder(x_enc)
# Output
# (batch_size, seq_length * d_model)
output = enc_out.reshape(enc_out.shape[0], -1)
output = self.projection(output) # (batch_size, num_classes)
return output
def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
if self.task_name == 'long_term_forecast' or self.task_name == 'short_term_forecast':
dec_out = self.forecast(x_enc)
return dec_out[:, -self.pred_len:, :] # [B, L, D]
if self.task_name == 'imputation':
dec_out = self.imputation(x_enc)
return dec_out # [B, L, D]
if self.task_name == 'anomaly_detection':
dec_out = self.anomaly_detection(x_enc)
return dec_out # [B, L, D]
if self.task_name == 'classification':
dec_out = self.classification(x_enc)
return dec_out # [B, N]
return None

models/TimeLLM.py Normal file
@@ -0,0 +1,209 @@
from math import sqrt
import torch
import torch.nn as nn
from transformers import LlamaConfig, LlamaModel, LlamaTokenizer
from layers.Embed import PatchEmbedding
import transformers
from layers.StandardNorm import Normalize
transformers.logging.set_verbosity_error()
class FlattenHead(nn.Module):
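# Flattens the per-variable patch features and projects them linearly
# onto the forecast horizon (one head shared across variables).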
def __init__(self, n_vars, nf, target_window, head_dropout=0):
super().__init__()
self.n_vars = n_vars
self.flatten = nn.Flatten(start_dim=-2)
self.linear = nn.Linear(nf, target_window)
self.dropout = nn.Dropout(head_dropout)
def forward(self, x):
x = self.flatten(x)
x = self.linear(x)
x = self.dropout(x)
return x
class Model(nn.Module):
def __init__(self, configs, patch_len=16, stride=8):
super(Model, self).__init__()
self.task_name = configs.task_name
self.pred_len = configs.pred_len
self.seq_len = configs.seq_len
self.d_ff = configs.d_ff
self.top_k = 5
self.d_llm = 4096
self.patch_len = configs.patch_len
self.stride = configs.stride
self.llama_config = LlamaConfig.from_pretrained('/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/')
# self.llama_config = LlamaConfig.from_pretrained('huggyllama/llama-7b')
self.llama_config.num_hidden_layers = configs.llm_layers
self.llama_config.output_attentions = True
self.llama_config.output_hidden_states = True
self.llama = LlamaModel.from_pretrained(
"/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/",
# 'huggyllama/llama-7b',
trust_remote_code=True,
local_files_only=True,
config=self.llama_config,
load_in_4bit=True
)
self.tokenizer = LlamaTokenizer.from_pretrained(
"/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/tokenizer.model",
# 'huggyllama/llama-7b',
trust_remote_code=True,
local_files_only=True
)
if self.tokenizer.eos_token:
self.tokenizer.pad_token = self.tokenizer.eos_token
else:
pad_token = '[PAD]'
self.tokenizer.add_special_tokens({'pad_token': pad_token})
self.tokenizer.pad_token = pad_token
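# Freeze the LLM backbone; only the patching, reprogramming, and output
# projection layers remain trainable.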
for param in self.llama.parameters():
param.requires_grad = False
self.dropout = nn.Dropout(configs.dropout)
self.patch_embedding = PatchEmbedding(
configs.d_model, self.patch_len, self.stride, configs.dropout)
self.word_embeddings = self.llama.get_input_embeddings().weight
self.vocab_size = self.word_embeddings.shape[0]
self.num_tokens = 1000
self.mapping_layer = nn.Linear(self.vocab_size, self.num_tokens)
self.reprogramming_layer = ReprogrammingLayer(configs.d_model, configs.n_heads, self.d_ff, self.d_llm)
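# Number of patches: (seq_len - patch_len) / stride + 1, plus one more
# from the replication padding applied inside PatchEmbedding.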
self.patch_nums = int((configs.seq_len - self.patch_len) / self.stride + 2)
self.head_nf = self.d_ff * self.patch_nums
if self.task_name == 'long_term_forecast' or self.task_name == 'short_term_forecast':
self.output_projection = FlattenHead(configs.enc_in, self.head_nf, self.pred_len,
head_dropout=configs.dropout)
else:
raise NotImplementedError
self.normalize_layers = Normalize(configs.enc_in, affine=False)
def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
if self.task_name == 'long_term_forecast' or self.task_name == 'short_term_forecast':
dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)
return dec_out[:, -self.pred_len:, :]
return None
def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):
x_enc = self.normalize_layers(x_enc, 'norm')
B, T, N = x_enc.size()
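# Channel independence: fold the N variables into the batch dimension so
# each univariate series is prompted and reprogrammed separately.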
x_enc = x_enc.permute(0, 2, 1).contiguous().reshape(B * N, T, 1)
min_values = torch.min(x_enc, dim=1)[0]
max_values = torch.max(x_enc, dim=1)[0]
medians = torch.median(x_enc, dim=1).values
lags = self.calculate_lags(x_enc)
trends = x_enc.diff(dim=1).sum(dim=1)
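# Build a per-series natural-language prompt carrying the dataset/task
# description and simple input statistics; its embeddings are later
# prepended to the reprogrammed patch embeddings.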
prompt = []
for b in range(x_enc.shape[0]):
min_values_str = str(min_values[b].tolist()[0])
max_values_str = str(max_values[b].tolist()[0])
median_values_str = str(medians[b].tolist()[0])
lags_values_str = str(lags[b].tolist())
prompt_ = (
f"<|start_prompt|>Dataset description: The Electricity Transformer Temperature (ETT) is a crucial indicator in the electric power long-term deployment."
f"Task description: forecast the next {str(self.pred_len)} steps given the previous {str(self.seq_len)} steps information; "
"Input statistics: "
f"min value {min_values_str}, "
f"max value {max_values_str}, "
f"median value {median_values_str}, "
f"the trend of input is {'upward' if trends[b] > 0 else 'downward'}, "
f"top 5 lags are : {lags_values_str}<|<end_prompt>|>"
)
prompt.append(prompt_)
x_enc = x_enc.reshape(B, N, T).permute(0, 2, 1).contiguous()
prompt = self.tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=2048).input_ids
prompt_embeddings = self.llama.get_input_embeddings()(prompt.to(x_enc.device)) # (batch, prompt_token, dim)
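# Linearly condense the full vocabulary embedding matrix (vocab_size x
# d_llm) into a small set of text prototypes (num_tokens x d_llm).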
source_embeddings = self.mapping_layer(self.word_embeddings.permute(1, 0)).permute(1, 0)
x_enc = x_enc.permute(0, 2, 1).contiguous()
enc_out, n_vars = self.patch_embedding(x_enc.to(torch.bfloat16))
enc_out = self.reprogramming_layer(enc_out, source_embeddings, source_embeddings)
llama_enc_out = torch.cat([prompt_embeddings, enc_out], dim=1)
dec_out = self.llama(inputs_embeds=llama_enc_out).last_hidden_state
dec_out = dec_out[:, :, :self.d_ff]
dec_out = torch.reshape(
dec_out, (-1, n_vars, dec_out.shape[-2], dec_out.shape[-1]))
dec_out = dec_out.permute(0, 1, 3, 2).contiguous()
dec_out = self.output_projection(dec_out[:, :, :, -self.patch_nums:])
dec_out = dec_out.permute(0, 2, 1).contiguous()
dec_out = self.normalize_layers(dec_out, 'denorm')
return dec_out
def calculate_lags(self, x_enc):
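# Estimate the top-k dominant lags from the autocorrelation, computed in
# the frequency domain (Wiener-Khinchin theorem).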
q_fft = torch.fft.rfft(x_enc.permute(0, 2, 1).contiguous(), dim=-1)
k_fft = torch.fft.rfft(x_enc.permute(0, 2, 1).contiguous(), dim=-1)
res = q_fft * torch.conj(k_fft)
corr = torch.fft.irfft(res, dim=-1)
mean_value = torch.mean(corr, dim=1)
_, lags = torch.topk(mean_value, self.top_k, dim=-1)
return lags
class ReprogrammingLayer(nn.Module):
def __init__(self, d_model, n_heads, d_keys=None, d_llm=None, attention_dropout=0.1):
super(ReprogrammingLayer, self).__init__()
d_keys = d_keys or (d_model // n_heads)
self.query_projection = nn.Linear(d_model, d_keys * n_heads)
self.key_projection = nn.Linear(d_llm, d_keys * n_heads)
self.value_projection = nn.Linear(d_llm, d_keys * n_heads)
self.out_projection = nn.Linear(d_keys * n_heads, d_llm)
self.n_heads = n_heads
self.dropout = nn.Dropout(attention_dropout)
def forward(self, target_embedding, source_embedding, value_embedding):
B, L, _ = target_embedding.shape
S, _ = source_embedding.shape
H = self.n_heads
target_embedding = self.query_projection(target_embedding).view(B, L, H, -1)
source_embedding = self.key_projection(source_embedding).view(S, H, -1)
value_embedding = self.value_projection(value_embedding).view(S, H, -1)
out = self.reprogramming(target_embedding, source_embedding, value_embedding)
out = out.reshape(B, L, -1)
return self.out_projection(out)
def reprogramming(self, target_embedding, source_embedding, value_embedding):
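# Cross-attention: patch embeddings (queries) attend to text prototypes
# (keys/values), re-expressing the series in the prototypes' space.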
B, L, H, E = target_embedding.shape
scale = 1. / sqrt(E)
scores = torch.einsum("blhe,she->bhls", target_embedding, source_embedding)
A = self.dropout(torch.softmax(scale * scores, dim=-1))
reprogramming_embedding = torch.einsum("bhls,she->blhe", A, value_embedding)
return reprogramming_embedding

models/__init__.py Normal file

requirements.txt Normal file
@@ -0,0 +1,12 @@
accelerate==0.20.3
einops==0.7.0
matplotlib==3.7.0
numpy==1.23.5
pandas==1.5.3
scikit_learn==1.2.2
scipy==1.5.4
torch==2.0.1
tqdm==4.65.0
peft==0.4.0
transformers==4.31.0
deepspeed==0.13.0

run_m4.py Normal file
@@ -0,0 +1,309 @@
import argparse
import torch
from accelerate import Accelerator, DeepSpeedPlugin
from accelerate import DistributedDataParallelKwargs
from torch import optim
from torch.optim import lr_scheduler
from data_provider.m4 import M4Meta
from models import Autoformer, DLinear, TimeLLM
from data_provider.data_factory import data_provider
import time
import random
import numpy as np
import pandas
from utils.losses import smape_loss
from utils.m4_summary import M4Summary
import os
os.environ['CURL_CA_BUNDLE'] = ''
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:64"
from utils.tools import del_files, EarlyStopping, adjust_learning_rate, load_content, test
parser = argparse.ArgumentParser(description='Time-LLM')
fix_seed = 2021
random.seed(fix_seed)
torch.manual_seed(fix_seed)
np.random.seed(fix_seed)
# basic config
parser.add_argument('--task_name', type=str, required=True, default='long_term_forecast',
help='task name, options:[long_term_forecast, short_term_forecast, imputation, classification, anomaly_detection]')
parser.add_argument('--is_training', type=int, required=True, default=1, help='status')
parser.add_argument('--model_id', type=str, required=True, default='test', help='model id')
parser.add_argument('--model_comment', type=str, required=True, default='none', help='prefix when saving test results')
parser.add_argument('--model', type=str, required=True, default='Autoformer',
help='model name, options: [Autoformer, DLinear]')
parser.add_argument('--seed', type=int, default=0, help='random seed')
# data loader
parser.add_argument('--data', type=str, required=True, default='ETTm1', help='dataset type')
parser.add_argument('--root_path', type=str, default='./dataset', help='root path of the data file')
parser.add_argument('--data_path', type=str, default='ETTh1.csv', help='data file')
parser.add_argument('--features', type=str, default='M',
help='forecasting task, options:[M, S, MS]; '
'M:multivariate predict multivariate, S: univariate predict univariate, '
'MS:multivariate predict univariate')
parser.add_argument('--target', type=str, default='OT', help='target feature in S or MS task')
parser.add_argument('--loader', type=str, default='modal', help='dataset type')
parser.add_argument('--freq', type=str, default='h',
help='freq for time features encoding, '
'options:[s:secondly, t:minutely, h:hourly, d:daily, b:business days, w:weekly, m:monthly], '
'you can also use more detailed freq like 15min or 3h')
parser.add_argument('--checkpoints', type=str, default='./checkpoints/', help='location of model checkpoints')
# forecasting task
parser.add_argument('--seq_len', type=int, default=96, help='input sequence length')
parser.add_argument('--label_len', type=int, default=48, help='start token length')
parser.add_argument('--pred_len', type=int, default=96, help='prediction sequence length')
parser.add_argument('--seasonal_patterns', type=str, default='Monthly', help='subset for M4')
# model define
parser.add_argument('--enc_in', type=int, default=7, help='encoder input size')
parser.add_argument('--dec_in', type=int, default=7, help='decoder input size')
parser.add_argument('--c_out', type=int, default=7, help='output size')
parser.add_argument('--d_model', type=int, default=16, help='dimension of model')
parser.add_argument('--n_heads', type=int, default=8, help='num of heads')
parser.add_argument('--e_layers', type=int, default=2, help='num of encoder layers')
parser.add_argument('--d_layers', type=int, default=1, help='num of decoder layers')
parser.add_argument('--d_ff', type=int, default=32, help='dimension of fcn')
parser.add_argument('--moving_avg', type=int, default=25, help='window size of moving average')
parser.add_argument('--factor', type=int, default=1, help='attn factor')
parser.add_argument('--dropout', type=float, default=0.1, help='dropout')
parser.add_argument('--embed', type=str, default='timeF',
help='time features encoding, options:[timeF, fixed, learned]')
parser.add_argument('--activation', type=str, default='gelu', help='activation')
parser.add_argument('--output_attention', action='store_true', help='whether to output attention in encoder')
parser.add_argument('--patch_len', type=int, default=16, help='patch length')
parser.add_argument('--stride', type=int, default=8, help='stride')
parser.add_argument('--prompt_domain', type=int, default=0, help='')
# optimization
parser.add_argument('--num_workers', type=int, default=10, help='data loader num workers')
parser.add_argument('--itr', type=int, default=1, help='experiments times')
parser.add_argument('--train_epochs', type=int, default=10, help='train epochs')
parser.add_argument('--align_epochs', type=int, default=10, help='alignment epochs')
parser.add_argument('--batch_size', type=int, default=32, help='batch size of train input data')
parser.add_argument('--eval_batch_size', type=int, default=8, help='batch size of model evaluation')
parser.add_argument('--patience', type=int, default=20, help='early stopping patience')
parser.add_argument('--learning_rate', type=float, default=0.0001, help='optimizer learning rate')
parser.add_argument('--des', type=str, default='test', help='exp description')
parser.add_argument('--loss', type=str, default='MSE', help='loss function')
parser.add_argument('--lradj', type=str, default='type1', help='adjust learning rate')
parser.add_argument('--pct_start', type=float, default=0.2, help='pct_start')
parser.add_argument('--use_amp', action='store_true', help='use automatic mixed precision training', default=False)
parser.add_argument('--llm_layers', type=int, default=6)
parser.add_argument('--percent', type=int, default=100)
args = parser.parse_args()
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
deepspeed_plugin = DeepSpeedPlugin(hf_ds_config='./ds_config_zero2.json')
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs], deepspeed_plugin=deepspeed_plugin)
for ii in range(args.itr):
# setting record of experiments
setting = '{}_{}_{}_{}_ft{}_sl{}_ll{}_pl{}_dm{}_nh{}_el{}_dl{}_df{}_fc{}_eb{}_{}_{}'.format(
args.task_name,
args.model_id,
args.model,
args.data,
args.features,
args.seq_len,
args.label_len,
args.pred_len,
args.d_model,
args.n_heads,
args.e_layers,
args.d_layers,
args.d_ff,
args.factor,
args.embed,
args.des, ii)
if args.data == 'm4':
args.pred_len = M4Meta.horizons_map[args.seasonal_patterns]  # horizon set by the M4 configuration
args.seq_len = 2 * args.pred_len
args.label_len = args.pred_len
args.frequency_map = M4Meta.frequency_map[args.seasonal_patterns]
train_data, train_loader = data_provider(args, 'train')
vali_data, vali_loader = data_provider(args, 'val')
test_data, test_loader = data_provider(args, 'test')
if args.model == 'Autoformer':
model = Autoformer.Model(args).float()
elif args.model == 'DLinear':
model = DLinear.Model(args).float()
else:
model = TimeLLM.Model(args).float()
path = os.path.join(args.checkpoints,
setting + '-' + args.model_comment) # unique checkpoint saving path
args.content = load_content(args)
if not os.path.exists(path) and accelerator.is_local_main_process:
os.makedirs(path)
time_now = time.time()
train_steps = len(train_loader)
early_stopping = EarlyStopping(accelerator=accelerator, patience=args.patience, verbose=True)
model_optim = optim.Adam(model.parameters(), lr=args.learning_rate)
if args.lradj == 'COS':
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(model_optim, T_max=20, eta_min=1e-8)
else:
scheduler = lr_scheduler.OneCycleLR(optimizer=model_optim,
steps_per_epoch=train_steps,
pct_start=args.pct_start,
epochs=args.train_epochs,
max_lr=args.learning_rate)
criterion = smape_loss()
train_loader, vali_loader, model, model_optim, scheduler = accelerator.prepare(
train_loader, vali_loader, model, model_optim, scheduler)
for epoch in range(args.train_epochs):
iter_count = 0
train_loss = []
model.train()
epoch_time = time.time()
for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in enumerate(train_loader):
iter_count += 1
model_optim.zero_grad()
batch_x = batch_x.float().to(accelerator.device)
batch_y = batch_y.float().to(accelerator.device)
batch_y_mark = batch_y_mark.float().to(accelerator.device)
# decoder input
dec_inp = torch.zeros_like(batch_y[:, -args.pred_len:, :]).float().to(accelerator.device)
dec_inp = torch.cat([batch_y[:, :args.label_len, :], dec_inp], dim=1).float().to(
accelerator.device)
outputs = model(batch_x, None, dec_inp, None)
f_dim = -1 if args.features == 'MS' else 0
outputs = outputs[:, -args.pred_len:, f_dim:]
batch_y = batch_y[:, -args.pred_len:, f_dim:]
batch_y_mark = batch_y_mark[:, -args.pred_len:, f_dim:]
loss = criterion(batch_x, args.frequency_map, outputs, batch_y, batch_y_mark)
train_loss.append(loss.item())
if (i + 1) % 100 == 0:
accelerator.print(
"\titers: {0}, epoch: {1} | loss: {2:.7f}".format(i + 1, epoch + 1, loss.item())
)
speed = (time.time() - time_now) / iter_count
left_time = speed * ((args.train_epochs - epoch) * train_steps - i)
accelerator.print('\tspeed: {:.4f}s/iter; left time: {:.4f}s'.format(speed, left_time))
iter_count = 0
time_now = time.time()
accelerator.backward(loss)
model_optim.step()
if args.lradj == 'TST':
adjust_learning_rate(accelerator, model_optim, scheduler, epoch + 1, args, printout=False)
scheduler.step()
accelerator.print("Epoch: {} cost time: {}".format(epoch + 1, time.time() - epoch_time))
train_loss = np.average(train_loss)
vali_loss = test(args, accelerator, model, train_loader, vali_loader, criterion)
test_loss = vali_loss
accelerator.print(
"Epoch: {0}, Steps: {1} | Train Loss: {2:.7f} Vali Loss: {3:.7f} Test Loss: {4:.7f}".format(
epoch + 1, train_steps, train_loss, vali_loss, test_loss))
early_stopping(vali_loss, model, path) # model saving
if early_stopping.early_stop:
accelerator.print("Early stopping")
break
if args.lradj != 'TST':
adjust_learning_rate(accelerator, model_optim, scheduler, epoch + 1, args, printout=True)
else:
accelerator.print('Updating learning rate to {}'.format(scheduler.get_last_lr()[0]))
best_model_path = path + '/' + 'checkpoint'
accelerator.wait_for_everyone()
unwrapped_model = accelerator.unwrap_model(model)
torch.cuda.synchronize()
torch.cuda.empty_cache()
unwrapped_model.load_state_dict(torch.load(best_model_path, map_location=lambda storage, loc: storage))
x, _ = train_loader.dataset.last_insample_window()
y = test_loader.dataset.timeseries
x = torch.tensor(x, dtype=torch.float32).to(accelerator.device)
x = x.unsqueeze(-1)
model.eval()
with torch.no_grad():
B, _, C = x.shape
dec_inp = torch.zeros((B, args.pred_len, C)).float().to(accelerator.device)
dec_inp = torch.cat([x[:, -args.label_len:, :], dec_inp], dim=1)
outputs = torch.zeros((B, args.pred_len, C)).float().to(accelerator.device)
id_list = np.arange(0, B, args.eval_batch_size)
id_list = np.append(id_list, B)
for i in range(len(id_list) - 1):
outputs[id_list[i]:id_list[i + 1], :, :] = model(
x[id_list[i]:id_list[i + 1]],
None,
dec_inp[id_list[i]:id_list[i + 1]],
None
)
accelerator.wait_for_everyone()
f_dim = -1 if args.features == 'MS' else 0
outputs = outputs[:, -args.pred_len:, f_dim:]
outputs = outputs.detach().cpu().numpy()
preds = outputs
trues = y
x = x.detach().cpu().numpy()
accelerator.print('test shape:', preds.shape)
folder_path = './m4_results/' + args.model + '-' + args.model_comment + '/'
if not os.path.exists(folder_path) and accelerator.is_local_main_process:
os.makedirs(folder_path)
if accelerator.is_local_main_process:
forecasts_df = pandas.DataFrame(preds[:, :, 0], columns=[f'V{i + 1}' for i in range(args.pred_len)])
forecasts_df.index = test_loader.dataset.ids[:preds.shape[0]]
forecasts_df.index.name = 'id'
forecasts_df.set_index(forecasts_df.columns[0], inplace=True)
forecasts_df.to_csv(folder_path + args.seasonal_patterns + '_forecast.csv')
# calculate metrics
accelerator.print(args.model)
file_path = folder_path
if 'Weekly_forecast.csv' in os.listdir(file_path) \
and 'Monthly_forecast.csv' in os.listdir(file_path) \
and 'Yearly_forecast.csv' in os.listdir(file_path) \
and 'Daily_forecast.csv' in os.listdir(file_path) \
and 'Hourly_forecast.csv' in os.listdir(file_path) \
and 'Quarterly_forecast.csv' in os.listdir(file_path):
m4_summary = M4Summary(file_path, args.root_path)
# m4_forecast.set_index(m4_winner_forecast.columns[0], inplace=True)
smape_results, owa_results, mape, mase = m4_summary.evaluate()
accelerator.print('smape:', smape_results)
accelerator.print('mape:', mape)
accelerator.print('mase:', mase)
accelerator.print('owa:', owa_results)
else:
accelerator.print('After all 6 seasonal patterns are forecast, the averaged performance can be computed')
accelerator.wait_for_everyone()
if accelerator.is_local_main_process:
path = './checkpoints' # unique checkpoint saving path
del_files(path) # delete checkpoint files
accelerator.print('successfully deleted checkpoints')

run_main.py Normal file
@@ -0,0 +1,267 @@
import argparse
import torch
from accelerate import Accelerator, DeepSpeedPlugin
from accelerate import DistributedDataParallelKwargs
from torch import nn, optim
from torch.optim import lr_scheduler
from tqdm import tqdm
from models import Autoformer, DLinear, TimeLLM
from data_provider.data_factory import data_provider
import time
import random
import numpy as np
import os
os.environ['CURL_CA_BUNDLE'] = ''
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:64"
from utils.tools import del_files, EarlyStopping, adjust_learning_rate, vali, load_content
parser = argparse.ArgumentParser(description='Time-LLM')
fix_seed = 2021
random.seed(fix_seed)
torch.manual_seed(fix_seed)
np.random.seed(fix_seed)
# basic config
parser.add_argument('--task_name', type=str, required=True, default='long_term_forecast',
help='task name, options:[long_term_forecast, short_term_forecast, imputation, classification, anomaly_detection]')
parser.add_argument('--is_training', type=int, required=True, default=1, help='status')
parser.add_argument('--model_id', type=str, required=True, default='test', help='model id')
parser.add_argument('--model_comment', type=str, required=True, default='none', help='prefix when saving test results')
parser.add_argument('--model', type=str, required=True, default='Autoformer',
help='model name, options: [Autoformer, DLinear]')
parser.add_argument('--seed', type=int, default=2021, help='random seed')
# data loader
parser.add_argument('--data', type=str, required=True, default='ETTm1', help='dataset type')
parser.add_argument('--root_path', type=str, default='./dataset', help='root path of the data file')
parser.add_argument('--data_path', type=str, default='ETTh1.csv', help='data file')
parser.add_argument('--features', type=str, default='M',
help='forecasting task, options:[M, S, MS]; '
'M:multivariate predict multivariate, S: univariate predict univariate, '
'MS:multivariate predict univariate')
parser.add_argument('--target', type=str, default='OT', help='target feature in S or MS task')
parser.add_argument('--loader', type=str, default='modal', help='dataset type')
parser.add_argument('--freq', type=str, default='h',
help='freq for time features encoding, '
'options:[s:secondly, t:minutely, h:hourly, d:daily, b:business days, w:weekly, m:monthly], '
'you can also use more detailed freq like 15min or 3h')
parser.add_argument('--checkpoints', type=str, default='./checkpoints/', help='location of model checkpoints')
# forecasting task
parser.add_argument('--seq_len', type=int, default=96, help='input sequence length')
parser.add_argument('--label_len', type=int, default=48, help='start token length')
parser.add_argument('--pred_len', type=int, default=96, help='prediction sequence length')
parser.add_argument('--seasonal_patterns', type=str, default='Monthly', help='subset for M4')
# model define
parser.add_argument('--enc_in', type=int, default=7, help='encoder input size')
parser.add_argument('--dec_in', type=int, default=7, help='decoder input size')
parser.add_argument('--c_out', type=int, default=7, help='output size')
parser.add_argument('--d_model', type=int, default=16, help='dimension of model')
parser.add_argument('--n_heads', type=int, default=8, help='num of heads')
parser.add_argument('--e_layers', type=int, default=2, help='num of encoder layers')
parser.add_argument('--d_layers', type=int, default=1, help='num of decoder layers')
parser.add_argument('--d_ff', type=int, default=32, help='dimension of fcn')
parser.add_argument('--moving_avg', type=int, default=25, help='window size of moving average')
parser.add_argument('--factor', type=int, default=1, help='attn factor')
parser.add_argument('--dropout', type=float, default=0.1, help='dropout')
parser.add_argument('--embed', type=str, default='timeF',
help='time features encoding, options:[timeF, fixed, learned]')
parser.add_argument('--activation', type=str, default='gelu', help='activation')
parser.add_argument('--output_attention', action='store_true', help='whether to output attention in encoder')
parser.add_argument('--patch_len', type=int, default=16, help='patch length')
parser.add_argument('--stride', type=int, default=8, help='stride')
parser.add_argument('--prompt_domain', type=int, default=0, help='')
# optimization
parser.add_argument('--num_workers', type=int, default=10, help='data loader num workers')
parser.add_argument('--itr', type=int, default=1, help='experiments times')
parser.add_argument('--train_epochs', type=int, default=10, help='train epochs')
parser.add_argument('--align_epochs', type=int, default=10, help='alignment epochs')
parser.add_argument('--batch_size', type=int, default=32, help='batch size of train input data')
parser.add_argument('--eval_batch_size', type=int, default=8, help='batch size of model evaluation')
parser.add_argument('--patience', type=int, default=10, help='early stopping patience')
parser.add_argument('--learning_rate', type=float, default=0.0001, help='optimizer learning rate')
parser.add_argument('--des', type=str, default='test', help='exp description')
parser.add_argument('--loss', type=str, default='MSE', help='loss function')
parser.add_argument('--lradj', type=str, default='type1', help='adjust learning rate')
parser.add_argument('--pct_start', type=float, default=0.2, help='pct_start')
parser.add_argument('--use_amp', action='store_true', help='use automatic mixed precision training', default=False)
parser.add_argument('--llm_layers', type=int, default=6)
parser.add_argument('--percent', type=int, default=100)
args = parser.parse_args()
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
deepspeed_plugin = DeepSpeedPlugin(hf_ds_config='./ds_config_zero2.json')
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs], deepspeed_plugin=deepspeed_plugin)
for ii in range(args.itr):
# setting record of experiments
setting = '{}_{}_{}_{}_ft{}_sl{}_ll{}_pl{}_dm{}_nh{}_el{}_dl{}_df{}_fc{}_eb{}_{}_{}'.format(
args.task_name,
args.model_id,
args.model,
args.data,
args.features,
args.seq_len,
args.label_len,
args.pred_len,
args.d_model,
args.n_heads,
args.e_layers,
args.d_layers,
args.d_ff,
args.factor,
args.embed,
args.des, ii)
train_data, train_loader = data_provider(args, 'train')
vali_data, vali_loader = data_provider(args, 'val')
test_data, test_loader = data_provider(args, 'test')
if args.model == 'Autoformer':
model = Autoformer.Model(args).float()
elif args.model == 'DLinear':
model = DLinear.Model(args).float()
else:
model = TimeLLM.Model(args).float()
path = os.path.join(args.checkpoints,
setting + '-' + args.model_comment) # unique checkpoint saving path
args.content = load_content(args)
if not os.path.exists(path) and accelerator.is_local_main_process:
os.makedirs(path)
time_now = time.time()
train_steps = len(train_loader)
early_stopping = EarlyStopping(accelerator=accelerator, patience=args.patience)
trained_parameters = []
for p in model.parameters():
if p.requires_grad is True:
trained_parameters.append(p)
model_optim = optim.Adam(trained_parameters, lr=args.learning_rate)
if args.lradj == 'COS':
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(model_optim, T_max=20, eta_min=1e-8)
else:
scheduler = lr_scheduler.OneCycleLR(optimizer=model_optim,
steps_per_epoch=train_steps,
pct_start=args.pct_start,
epochs=args.train_epochs,
max_lr=args.learning_rate)
criterion = nn.MSELoss()
mae_metric = nn.L1Loss()
train_loader, vali_loader, test_loader, model, model_optim, scheduler = accelerator.prepare(
train_loader, vali_loader, test_loader, model, model_optim, scheduler)
if args.use_amp:
scaler = torch.cuda.amp.GradScaler()
for epoch in range(args.train_epochs):
iter_count = 0
train_loss = []
model.train()
epoch_time = time.time()
for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in tqdm(enumerate(train_loader)):
iter_count += 1
model_optim.zero_grad()
batch_x = batch_x.float().to(accelerator.device)
batch_y = batch_y.float().to(accelerator.device)
batch_x_mark = batch_x_mark.float().to(accelerator.device)
batch_y_mark = batch_y_mark.float().to(accelerator.device)
# decoder input
dec_inp = torch.zeros_like(batch_y[:, -args.pred_len:, :]).float().to(
accelerator.device)
dec_inp = torch.cat([batch_y[:, :args.label_len, :], dec_inp], dim=1).float().to(
accelerator.device)
# encoder - decoder
if args.use_amp:
with torch.cuda.amp.autocast():
if args.output_attention:
outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
else:
outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
f_dim = -1 if args.features == 'MS' else 0
outputs = outputs[:, -args.pred_len:, f_dim:]
batch_y = batch_y[:, -args.pred_len:, f_dim:].to(accelerator.device)
loss = criterion(outputs, batch_y)
train_loss.append(loss.item())
else:
if args.output_attention:
outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
else:
outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
f_dim = -1 if args.features == 'MS' else 0
outputs = outputs[:, -args.pred_len:, f_dim:]
batch_y = batch_y[:, -args.pred_len:, f_dim:]
loss = criterion(outputs, batch_y)
train_loss.append(loss.item())
if (i + 1) % 100 == 0:
accelerator.print(
"\titers: {0}, epoch: {1} | loss: {2:.7f}".format(i + 1, epoch + 1, loss.item()))
speed = (time.time() - time_now) / iter_count
left_time = speed * ((args.train_epochs - epoch) * train_steps - i)
accelerator.print('\tspeed: {:.4f}s/iter; left time: {:.4f}s'.format(speed, left_time))
iter_count = 0
time_now = time.time()
if args.use_amp:
scaler.scale(loss).backward()
scaler.step(model_optim)
scaler.update()
else:
accelerator.backward(loss)
model_optim.step()
if args.lradj == 'TST':
adjust_learning_rate(accelerator, model_optim, scheduler, epoch + 1, args, printout=False)
scheduler.step()
accelerator.print("Epoch: {} cost time: {}".format(epoch + 1, time.time() - epoch_time))
train_loss = np.average(train_loss)
vali_loss, vali_mae_loss = vali(args, accelerator, model, vali_data, vali_loader, criterion, mae_metric)
test_loss, test_mae_loss = vali(args, accelerator, model, test_data, test_loader, criterion, mae_metric)
accelerator.print(
"Epoch: {0} | Train Loss: {1:.7f} Vali Loss: {2:.7f} Test Loss: {3:.7f} MAE Loss: {4:.7f}".format(
epoch + 1, train_loss, vali_loss, test_loss, test_mae_loss))
early_stopping(vali_loss, model, path)
if early_stopping.early_stop:
accelerator.print("Early stopping")
break
if args.lradj != 'TST':
if args.lradj == 'COS':
scheduler.step()
accelerator.print("lr = {:.10f}".format(model_optim.param_groups[0]['lr']))
else:
if epoch == 0:
args.learning_rate = model_optim.param_groups[0]['lr']
accelerator.print("lr = {:.10f}".format(model_optim.param_groups[0]['lr']))
adjust_learning_rate(accelerator, model_optim, scheduler, epoch + 1, args, printout=True)
else:
accelerator.print('Updating learning rate to {}'.format(scheduler.get_last_lr()[0]))
accelerator.wait_for_everyone()
if accelerator.is_local_main_process:
path = './checkpoints' # unique checkpoint saving path
del_files(path) # delete checkpoint files
accelerator.print('successfully deleted checkpoints')

scripts/TimeLLM_ETTh1.sh Normal file
@@ -0,0 +1,117 @@
model_name=TimeLLM
train_epochs=100
learning_rate=0.01
llama_layers=32
master_port=00097
num_process=8
batch_size=24
d_model=32
d_ff=128
comment='TimeLLM-ETTh1'
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTh1.csv \
--model_id ETTh1_512_96 \
--model $model_name \
--data ETTh1 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 96 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model $d_model \
--d_ff $d_ff \
--batch_size $batch_size \
--learning_rate $learning_rate \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTh1.csv \
--model_id ETTh1_512_192 \
--model $model_name \
--data ETTh1 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 192 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model 32 \
--d_ff 128 \
--batch_size $batch_size \
--learning_rate 0.02 \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTh1.csv \
--model_id ETTh1_512_336 \
--model $model_name \
--data ETTh1 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 336 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model $d_model \
--d_ff $d_ff \
--batch_size $batch_size \
--lradj 'COS' \
--learning_rate 0.001 \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTh1.csv \
--model_id ETTh1_512_720 \
--model $model_name \
--data ETTh1 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 720 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model $d_model \
--d_ff $d_ff \
--batch_size $batch_size \
--learning_rate $learning_rate \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--model_comment $comment

scripts/TimeLLM_ETTh2.sh Normal file
@@ -0,0 +1,122 @@
model_name=TimeLLM
train_epochs=10
learning_rate=0.01
llama_layers=32
master_port=00098
num_process=8
batch_size=24
d_model=32
d_ff=128
comment='TimeLLM-ETTh2'
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTh2.csv \
--model_id ETTh2_512_96 \
--model $model_name \
--data ETTh2 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 96 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model $d_model \
--d_ff $d_ff \
--batch_size $batch_size \
--learning_rate $learning_rate \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTh2.csv \
--model_id ETTh2_512_192 \
--model $model_name \
--data ETTh2 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 192 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model $d_model \
--d_ff $d_ff \
--batch_size $batch_size \
--lradj 'TST' \
--learning_rate 0.002 \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTh2.csv \
--model_id ETTh2_512_336 \
--model $model_name \
--data ETTh2 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 336 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model $d_model \
--d_ff $d_ff \
--batch_size $batch_size \
--lradj 'TST' \
--learning_rate 0.005 \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTh2.csv \
--model_id ETTh2_512_720 \
--model $model_name \
--data ETTh2 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 720 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model 16 \
--d_ff 128 \
--batch_size $batch_size \
--learning_rate 0.005 \
--lradj 'TST' \
--llm_layers $llama_layers \
--train_epochs 20 \
--patience 10 \
--model_comment $comment

scripts/TimeLLM_ETTm1.sh Normal file
@@ -0,0 +1,126 @@
model_name=TimeLLM
train_epochs=100
learning_rate=0.01
llama_layers=32
master_port=00097
num_process=8
batch_size=24
d_model=32
d_ff=128
comment='TimeLLM-ETTm1'
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTm1.csv \
--model_id ETTm1_512_96 \
--model $model_name \
--data ETTm1 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 96 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model $d_model \
--d_ff $d_ff \
--batch_size $batch_size \
--lradj 'TST' \
--learning_rate 0.001 \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTm1.csv \
--model_id ETTm1_512_192 \
--model $model_name \
--data ETTm1 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 192 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model $d_model \
--d_ff $d_ff \
--batch_size $batch_size \
--lradj 'TST' \
--learning_rate 0.001 \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--patience 20 \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTm1.csv \
--model_id ETTm1_512_336 \
--model $model_name \
--data ETTm1 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 336 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model $d_model \
--d_ff $d_ff \
--batch_size $batch_size \
--lradj 'TST' \
--learning_rate 0.001 \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--patience 20 \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTm1.csv \
--model_id ETTm1_512_720 \
--model $model_name \
--data ETTm1 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 720 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model $d_model \
--d_ff $d_ff \
--batch_size $batch_size \
--lradj 'TST' \
--learning_rate 0.001 \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--patience 20 \
--model_comment $comment

scripts/TimeLLM_ETTm2.sh Normal file
@@ -0,0 +1,124 @@
model_name=TimeLLM
train_epochs=10
learning_rate=0.01
llama_layers=32
master_port=00097
num_process=8
batch_size=24
d_model=32
d_ff=128
comment='TimeLLM-ETTm2'
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTm2.csv \
--model_id ETTm2_512_96 \
--model $model_name \
--data ETTm2 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 96 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model $d_model \
--d_ff $d_ff \
--batch_size 16 \
--learning_rate $learning_rate \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTm2.csv \
--model_id ETTm2_512_192 \
--model $model_name \
--data ETTm2 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 192 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model $d_model \
--d_ff $d_ff \
--batch_size $batch_size \
--lradj 'TST' \
--learning_rate 0.002 \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTm2.csv \
--model_id ETTm2_512_336 \
--model $model_name \
--data ETTm2 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 336 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model $d_model \
--d_ff $d_ff \
--batch_size $batch_size \
--lradj 'TST' \
--learning_rate 0.002 \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_main.py \
--task_name long_term_forecast \
--is_training 1 \
--root_path ./dataset/ETT-small/ \
--data_path ETTm2.csv \
--model_id ETTm2_512_720 \
--model $model_name \
--data ETTm2 \
--features M \
--seq_len 512 \
--label_len 48 \
--pred_len 720 \
--factor 3 \
--enc_in 7 \
--dec_in 7 \
--c_out 7 \
--des 'Exp' \
--itr 1 \
--d_model $d_model \
--d_ff $d_ff \
--batch_size $batch_size \
--lradj 'TST' \
--learning_rate 0.002 \
--llm_layers $llama_layers \
--train_epochs $train_epochs \
--model_comment $comment

scripts/TimeLLM_M4.sh Executable file
@@ -0,0 +1,164 @@
model_name=TimeLLM
train_epochs=50
llama_layers=32
batch_size=24
learning_rate=0.001
d_model=8
d_ff=32
master_port=00097
num_process=8
comment='TimeLLM-M4'
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_m4.py \
--task_name short_term_forecast \
--is_training 1 \
--root_path ./dataset/m4 \
--seasonal_patterns 'Monthly' \
--model_id m4_Monthly \
--model $model_name \
--data m4 \
--features M \
--enc_in 1 \
--dec_in 1 \
--c_out 1 \
--llm_layers $llama_layers \
--d_model $d_model \
--d_ff $d_ff \
--patch_len 1 \
--stride 1 \
--batch_size $batch_size \
--des 'Exp' \
--itr 1 \
--learning_rate $learning_rate \
--loss 'SMAPE' \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_m4.py \
--task_name short_term_forecast \
--is_training 1 \
--root_path ./dataset/m4 \
--seasonal_patterns 'Yearly' \
--model_id m4_Yearly \
--model $model_name \
--data m4 \
--features M \
--enc_in 1 \
--dec_in 1 \
--c_out 1 \
--llm_layers $llama_layers \
--d_model $d_model \
--d_ff $d_ff \
--patch_len 1 \
--stride 1 \
--batch_size $batch_size \
--des 'Exp' \
--itr 1 \
--learning_rate $learning_rate \
--loss 'SMAPE' \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_m4.py \
--task_name short_term_forecast \
--is_training 1 \
--root_path ./dataset/m4 \
--seasonal_patterns 'Weekly' \
--model_id m4_Weekly \
--model $model_name \
--data m4 \
--features M \
--enc_in 1 \
--dec_in 1 \
--c_out 1 \
--llm_layers $llama_layers \
--d_model $d_model \
--d_ff $d_ff \
--patch_len 1 \
--stride 1 \
--batch_size $batch_size \
--des 'Exp' \
--itr 1 \
--learning_rate $learning_rate \
--loss 'SMAPE' \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_m4.py \
--task_name short_term_forecast \
--is_training 1 \
--root_path ./dataset/m4 \
--seasonal_patterns 'Daily' \
--model_id m4_Daily \
--model $model_name \
--data m4 \
--features M \
--enc_in 1 \
--dec_in 1 \
--c_out 1 \
--llm_layers $llama_layers \
--d_model $d_model \
--d_ff $d_ff \
--patch_len 1 \
--stride 1 \
--batch_size $batch_size \
--des 'Exp' \
--itr 1 \
--learning_rate $learning_rate \
--loss 'SMAPE' \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_m4.py \
--task_name short_term_forecast \
--is_training 1 \
--root_path ./dataset/m4 \
--seasonal_patterns 'Quarterly' \
--model_id m4_Quarterly \
--model $model_name \
--data m4 \
--features M \
--enc_in 1 \
--dec_in 1 \
--c_out 1 \
--llm_layers $llama_layers \
--d_model $d_model \
--d_ff $d_ff \
--patch_len 1 \
--stride 1 \
--batch_size $batch_size \
--des 'Exp' \
--itr 1 \
--learning_rate $learning_rate \
--loss 'SMAPE' \
--train_epochs $train_epochs \
--model_comment $comment
accelerate launch --multi_gpu --mixed_precision bf16 --num_processes $num_process --main_process_port $master_port run_m4.py \
--task_name short_term_forecast \
--is_training 1 \
--root_path ./dataset/m4 \
--seasonal_patterns 'Hourly' \
--model_id m4_Hourly \
--model $model_name \
--data m4 \
--features M \
--enc_in 1 \
--dec_in 1 \
--c_out 1 \
--llm_layers $llama_layers \
--d_model $d_model \
--d_ff $d_ff \
--patch_len 1 \
--stride 1 \
--batch_size $batch_size \
--des 'Exp' \
--itr 1 \
--learning_rate $learning_rate \
--loss 'SMAPE' \
--train_epochs $train_epochs \
--model_comment $comment

utils/__init__.py Normal file

utils/losses.py Normal file
@@ -0,0 +1,89 @@
# This source code is provided for the purposes of scientific reproducibility
# under the following limited license from Element AI Inc. The code is an
# implementation of the N-BEATS model (Oreshkin et al., N-BEATS: Neural basis
# expansion analysis for interpretable time series forecasting,
# https://arxiv.org/abs/1905.10437). The copyright to the source code is
# licensed under the Creative Commons - Attribution-NonCommercial 4.0
# International license (CC BY-NC 4.0):
# https://creativecommons.org/licenses/by-nc/4.0/. Any commercial use (whether
# for the benefit of third parties or internally in production) requires an
# explicit license. The subject-matter of the N-BEATS model and associated
# materials are the property of Element AI Inc. and may be subject to patent
# protection. No license to patents is granted hereunder (whether express or
# implied). Copyright © 2020 Element AI Inc. All rights reserved.
"""
Loss functions for PyTorch.
"""
import torch as t
import torch.nn as nn
import numpy as np
def divide_no_nan(a, b):
"""
a/b where the resulted NaN or Inf are replaced by 0.
"""
result = a / b
result[result != result] = .0
result[result == np.inf] = .0
return result
class mape_loss(nn.Module):
def __init__(self):
super(mape_loss, self).__init__()
def forward(self, insample: t.Tensor, freq: int,
forecast: t.Tensor, target: t.Tensor, mask: t.Tensor) -> t.float:
"""
MAPE loss as defined in: https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
:param forecast: Forecast values. Shape: batch, time
:param target: Target values. Shape: batch, time
:param mask: 0/1 mask. Shape: batch, time
:return: Loss value
"""
weights = divide_no_nan(mask, target)
return t.mean(t.abs((forecast - target) * weights))
class smape_loss(nn.Module):
def __init__(self):
super(smape_loss, self).__init__()
def forward(self, insample: t.Tensor, freq: int,
forecast: t.Tensor, target: t.Tensor, mask: t.Tensor) -> t.float:
"""
sMAPE loss as defined in https://robjhyndman.com/hyndsight/smape/ (Makridakis 1993)
:param forecast: Forecast values. Shape: batch, time
:param target: Target values. Shape: batch, time
:param mask: 0/1 mask. Shape: batch, time
:return: Loss value
"""
return 200 * t.mean(divide_no_nan(t.abs(forecast - target),
t.abs(forecast.data) + t.abs(target.data)) * mask)
class mase_loss(nn.Module):
def __init__(self):
super(mase_loss, self).__init__()
def forward(self, insample: t.Tensor, freq: int,
forecast: t.Tensor, target: t.Tensor, mask: t.Tensor) -> t.float:
"""
MASE loss as defined in "Scaled Errors" https://robjhyndman.com/papers/mase.pdf
:param insample: Insample values. Shape: batch, time_i
:param freq: Frequency value
:param forecast: Forecast values. Shape: batch, time_o
:param target: Target values. Shape: batch, time_o
:param mask: 0/1 mask. Shape: batch, time_o
:return: Loss value
"""
masep = t.mean(t.abs(insample[:, freq:] - insample[:, :-freq]), dim=1)
masked_masep_inv = divide_no_nan(mask, masep[:, None])
return t.mean(t.abs(target - forecast) * masked_masep_inv)
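
# Minimal usage sketch (illustration only, not part of the original file);
# the tensor shapes and freq value here are assumptions.
if __name__ == '__main__':
    criterion = smape_loss()
    insample = t.rand(8, 96)                          # batch, in-sample history
    forecast, target = t.rand(8, 48), t.rand(8, 48)   # batch, forecast horizon
    mask = t.ones_like(target)                        # 1 = observed point
    print(criterion(insample, freq=4, forecast=forecast, target=target, mask=mask))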

140
utils/m4_summary.py Normal file

@ -0,0 +1,140 @@
# This source code is provided for the purposes of scientific reproducibility
# under the following limited license from Element AI Inc. The code is an
# implementation of the N-BEATS model (Oreshkin et al., N-BEATS: Neural basis
# expansion analysis for interpretable time series forecasting,
# https://arxiv.org/abs/1905.10437). The copyright to the source code is
# licensed under the Creative Commons - Attribution-NonCommercial 4.0
# International license (CC BY-NC 4.0):
# https://creativecommons.org/licenses/by-nc/4.0/. Any commercial use (whether
# for the benefit of third parties or internally in production) requires an
# explicit license. The subject-matter of the N-BEATS model and associated
# materials are the property of Element AI Inc. and may be subject to patent
# protection. No license to patents is granted hereunder (whether express or
# implied). Copyright © 2020 Element AI Inc. All rights reserved.
"""
M4 Summary
"""
from collections import OrderedDict
import numpy as np
import pandas as pd
from data_provider.m4 import M4Dataset
from data_provider.m4 import M4Meta
import os
def group_values(values, groups, group_name):
return np.array([v[~np.isnan(v)] for v in values[groups == group_name]])
def mase(forecast, insample, outsample, frequency):
return np.mean(np.abs(forecast - outsample)) / np.mean(np.abs(insample[:-frequency] - insample[frequency:]))
def smape_2(forecast, target):
denom = np.abs(target) + np.abs(forecast)
    # divide by 1.0 instead of 0.0; when denom is zero the numerator is 0.0 anyway.
denom[denom == 0.0] = 1.0
return 200 * np.abs(forecast - target) / denom
def mape(forecast, target):
denom = np.abs(target)
    # divide by 1.0 instead of 0.0; when denom is zero the numerator is 0.0 anyway.
denom[denom == 0.0] = 1.0
return 100 * np.abs(forecast - target) / denom
class M4Summary:
def __init__(self, file_path, root_path):
self.file_path = file_path
self.training_set = M4Dataset.load(training=True, dataset_file=root_path)
self.test_set = M4Dataset.load(training=False, dataset_file=root_path)
self.naive_path = os.path.join(root_path, 'submission-Naive2.csv')
def evaluate(self):
"""
Evaluate forecasts using M4 test dataset.
:param forecast: Forecasts. Shape: timeseries, time.
:return: sMAPE and OWA grouped by seasonal patterns.
"""
grouped_owa = OrderedDict()
naive2_forecasts = pd.read_csv(self.naive_path).values[:, 1:].astype(np.float32)
naive2_forecasts = np.array([v[~np.isnan(v)] for v in naive2_forecasts])
model_mases = {}
naive2_smapes = {}
naive2_mases = {}
grouped_smapes = {}
grouped_mapes = {}
for group_name in M4Meta.seasonal_patterns:
file_name = self.file_path + group_name + "_forecast.csv"
if os.path.exists(file_name):
model_forecast = pd.read_csv(file_name).values
naive2_forecast = group_values(naive2_forecasts, self.test_set.groups, group_name)
target = group_values(self.test_set.values, self.test_set.groups, group_name)
                # all time series within a group share the same frequency
frequency = self.training_set.frequencies[self.test_set.groups == group_name][0]
insample = group_values(self.training_set.values, self.test_set.groups, group_name)
model_mases[group_name] = np.mean([mase(forecast=model_forecast[i],
insample=insample[i],
outsample=target[i],
frequency=frequency) for i in range(len(model_forecast))])
naive2_mases[group_name] = np.mean([mase(forecast=naive2_forecast[i],
insample=insample[i],
outsample=target[i],
frequency=frequency) for i in range(len(model_forecast))])
naive2_smapes[group_name] = np.mean(smape_2(naive2_forecast, target))
grouped_smapes[group_name] = np.mean(smape_2(forecast=model_forecast, target=target))
grouped_mapes[group_name] = np.mean(mape(forecast=model_forecast, target=target))
grouped_smapes = self.summarize_groups(grouped_smapes)
grouped_mapes = self.summarize_groups(grouped_mapes)
grouped_model_mases = self.summarize_groups(model_mases)
grouped_naive2_smapes = self.summarize_groups(naive2_smapes)
grouped_naive2_mases = self.summarize_groups(naive2_mases)
for k in grouped_model_mases.keys():
grouped_owa[k] = (grouped_model_mases[k] / grouped_naive2_mases[k] +
grouped_smapes[k] / grouped_naive2_smapes[k]) / 2
def round_all(d):
return dict(map(lambda kv: (kv[0], np.round(kv[1], 3)), d.items()))
return round_all(grouped_smapes), round_all(grouped_owa), round_all(grouped_mapes), round_all(
grouped_model_mases)
def summarize_groups(self, scores):
"""
Re-group scores respecting M4 rules.
:param scores: Scores per group.
:return: Grouped scores.
"""
scores_summary = OrderedDict()
def group_count(group_name):
return len(np.where(self.test_set.groups == group_name)[0])
weighted_score = {}
for g in ['Yearly', 'Quarterly', 'Monthly']:
weighted_score[g] = scores[g] * group_count(g)
scores_summary[g] = scores[g]
others_score = 0
others_count = 0
for g in ['Weekly', 'Daily', 'Hourly']:
others_score += scores[g] * group_count(g)
others_count += group_count(g)
weighted_score['Others'] = others_score
scores_summary['Others'] = others_score / others_count
average = np.sum(list(weighted_score.values())) / len(self.test_set.groups)
scores_summary['Average'] = average
return scores_summary
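
# Minimal usage sketch (illustration only): the paths below are assumptions.
# Assumes per-group forecast CSVs ('Yearly_forecast.csv', ...) exist under
# ./m4_results/ and the raw M4 data plus 'submission-Naive2.csv' live under
# ./dataset/m4/.
if __name__ == '__main__':
    summary = M4Summary(file_path='./m4_results/', root_path='./dataset/m4')
    smape, owa, mape, mase = summary.evaluate()
    print('sMAPE:', smape, 'OWA:', owa)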

26
utils/masking.py Normal file

@ -0,0 +1,26 @@
import torch
class TriangularCausalMask():
def __init__(self, B, L, device="cpu"):
mask_shape = [B, 1, L, L]
with torch.no_grad():
self._mask = torch.triu(torch.ones(mask_shape, dtype=torch.bool), diagonal=1).to(device)
@property
def mask(self):
return self._mask
class ProbMask():
def __init__(self, B, H, L, index, scores, device="cpu"):
_mask = torch.ones(L, scores.shape[-1], dtype=torch.bool).to(device).triu(1)
_mask_ex = _mask[None, None, :].expand(B, H, L, scores.shape[-1])
indicator = _mask_ex[torch.arange(B)[:, None, None],
torch.arange(H)[None, :, None],
index, :].to(device)
self._mask = indicator.view(scores.shape).to(device)
@property
def mask(self):
return self._mask
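
# Minimal usage sketch (illustration only): True marks positions attention
# must not attend to (strictly above the diagonal).
if __name__ == '__main__':
    m = TriangularCausalMask(B=2, L=5)
    print(m.mask.shape)   # torch.Size([2, 1, 5, 5])
    print(m.mask[0, 0])   # upper-triangular True above the diagonal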

41
utils/metrics.py Normal file

@ -0,0 +1,41 @@
import numpy as np
def RSE(pred, true):
return np.sqrt(np.sum((true - pred) ** 2)) / np.sqrt(np.sum((true - true.mean()) ** 2))
def CORR(pred, true):
u = ((true - true.mean(0)) * (pred - pred.mean(0))).sum(0)
d = np.sqrt(((true - true.mean(0)) ** 2 * (pred - pred.mean(0)) ** 2).sum(0))
return (u / d).mean(-1)
def MAE(pred, true):
return np.mean(np.abs(pred - true))
def MSE(pred, true):
return np.mean((pred - true) ** 2)
def RMSE(pred, true):
return np.sqrt(MSE(pred, true))
def MAPE(pred, true):
return np.mean(np.abs((pred - true) / true))
def MSPE(pred, true):
return np.mean(np.square((pred - true) / true))
def metric(pred, true):
mae = MAE(pred, true)
mse = MSE(pred, true)
rmse = RMSE(pred, true)
mape = MAPE(pred, true)
mspe = MSPE(pred, true)
return mae, mse, rmse, mape, mspe
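
# Minimal usage sketch (illustration only). Note that MAPE and MSPE divide by
# `true` element-wise, so they assume the targets contain no zeros.
if __name__ == '__main__':
    pred = np.array([1.0, 2.0, 3.0])
    true = np.array([1.1, 1.9, 3.2])
    print(metric(pred, true))  # (mae, mse, rmse, mape, mspe)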

134
utils/timefeatures.py Normal file

@ -0,0 +1,134 @@
from typing import List
import numpy as np
import pandas as pd
from pandas.tseries import offsets
from pandas.tseries.frequencies import to_offset
class TimeFeature:
def __init__(self):
pass
def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
pass
def __repr__(self):
return self.__class__.__name__ + "()"
class SecondOfMinute(TimeFeature):
"""Minute of hour encoded as value between [-0.5, 0.5]"""
def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
return index.second / 59.0 - 0.5
class MinuteOfHour(TimeFeature):
"""Minute of hour encoded as value between [-0.5, 0.5]"""
def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
return index.minute / 59.0 - 0.5
class HourOfDay(TimeFeature):
"""Hour of day encoded as value between [-0.5, 0.5]"""
def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
return index.hour / 23.0 - 0.5
class DayOfWeek(TimeFeature):
"""Hour of day encoded as value between [-0.5, 0.5]"""
def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
return index.dayofweek / 6.0 - 0.5
class DayOfMonth(TimeFeature):
"""Day of month encoded as value between [-0.5, 0.5]"""
def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
return (index.day - 1) / 30.0 - 0.5
class DayOfYear(TimeFeature):
"""Day of year encoded as value between [-0.5, 0.5]"""
def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
return (index.dayofyear - 1) / 365.0 - 0.5
class MonthOfYear(TimeFeature):
"""Month of year encoded as value between [-0.5, 0.5]"""
def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
return (index.month - 1) / 11.0 - 0.5
class WeekOfYear(TimeFeature):
"""Week of year encoded as value between [-0.5, 0.5]"""
def __call__(self, index: pd.DatetimeIndex) -> np.ndarray:
return (index.isocalendar().week - 1) / 52.0 - 0.5
def time_features_from_frequency_str(freq_str: str) -> List[TimeFeature]:
"""
Returns a list of time features that will be appropriate for the given frequency string.
Parameters
----------
freq_str
Frequency string of the form [multiple][granularity] such as "12H", "5min", "1D" etc.
"""
features_by_offsets = {
offsets.YearEnd: [],
offsets.QuarterEnd: [MonthOfYear],
offsets.MonthEnd: [MonthOfYear],
offsets.Week: [DayOfMonth, WeekOfYear],
offsets.Day: [DayOfWeek, DayOfMonth, DayOfYear],
offsets.BusinessDay: [DayOfWeek, DayOfMonth, DayOfYear],
offsets.Hour: [HourOfDay, DayOfWeek, DayOfMonth, DayOfYear],
offsets.Minute: [
MinuteOfHour,
HourOfDay,
DayOfWeek,
DayOfMonth,
DayOfYear,
],
offsets.Second: [
SecondOfMinute,
MinuteOfHour,
HourOfDay,
DayOfWeek,
DayOfMonth,
DayOfYear,
],
}
offset = to_offset(freq_str)
for offset_type, feature_classes in features_by_offsets.items():
if isinstance(offset, offset_type):
return [cls() for cls in feature_classes]
supported_freq_msg = f"""
Unsupported frequency {freq_str}
The following frequencies are supported:
Y - yearly
alias: A
M - monthly
W - weekly
D - daily
B - business days
H - hourly
T - minutely
alias: min
S - secondly
"""
raise RuntimeError(supported_freq_msg)
def time_features(dates, freq='h'):
return np.vstack([feat(dates) for feat in time_features_from_frequency_str(freq)])
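
# Minimal usage sketch (illustration only): an hourly index yields four features
# (HourOfDay, DayOfWeek, DayOfMonth, DayOfYear), each scaled to [-0.5, 0.5].
if __name__ == '__main__':
    dates = pd.date_range('2024-01-01', periods=24, freq='h')
    print(time_features(dates, freq='h').shape)  # (4, 24)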

226
utils/tools.py Normal file

@ -0,0 +1,226 @@
import numpy as np
import torch
import matplotlib.pyplot as plt
import shutil
from tqdm import tqdm
plt.switch_backend('agg')
def adjust_learning_rate(accelerator, optimizer, scheduler, epoch, args, printout=True):
if args.lradj == 'type1':
lr_adjust = {epoch: args.learning_rate * (0.5 ** ((epoch - 1) // 1))}
elif args.lradj == 'type2':
lr_adjust = {
2: 5e-5, 4: 1e-5, 6: 5e-6, 8: 1e-6,
10: 5e-7, 15: 1e-7, 20: 5e-8
}
elif args.lradj == 'type3':
lr_adjust = {epoch: args.learning_rate if epoch < 3 else args.learning_rate * (0.9 ** ((epoch - 3) // 1))}
elif args.lradj == 'PEMS':
lr_adjust = {epoch: args.learning_rate * (0.95 ** (epoch // 1))}
elif args.lradj == 'TST':
lr_adjust = {epoch: scheduler.get_last_lr()[0]}
    elif args.lradj == 'constant':
        lr_adjust = {epoch: args.learning_rate}
    else:
        lr_adjust = {}  # unknown schedule: leave the learning rate unchanged
if epoch in lr_adjust.keys():
lr = lr_adjust[epoch]
for param_group in optimizer.param_groups:
param_group['lr'] = lr
if printout:
if accelerator is not None:
accelerator.print('Updating learning rate to {}'.format(lr))
else:
print('Updating learning rate to {}'.format(lr))
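
# Schedule sketch (illustration only): with args.lradj == 'type1' and a base
# learning_rate of 1e-3, epochs 1, 2, 3, ... run at 1e-3, 5e-4, 2.5e-4, ...
# (halved every epoch); 'TST' simply defers to the external scheduler's last lr.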
class EarlyStopping:
def __init__(self, accelerator=None, patience=7, verbose=False, delta=0, save_mode=True):
self.accelerator = accelerator
self.patience = patience
self.verbose = verbose
self.counter = 0
self.best_score = None
self.early_stop = False
self.val_loss_min = np.Inf
self.delta = delta
self.save_mode = save_mode
def __call__(self, val_loss, model, path):
score = -val_loss
if self.best_score is None:
self.best_score = score
if self.save_mode:
self.save_checkpoint(val_loss, model, path)
elif score < self.best_score + self.delta:
self.counter += 1
if self.accelerator is None:
print(f'EarlyStopping counter: {self.counter} out of {self.patience}')
else:
self.accelerator.print(f'EarlyStopping counter: {self.counter} out of {self.patience}')
if self.counter >= self.patience:
self.early_stop = True
else:
self.best_score = score
if self.save_mode:
self.save_checkpoint(val_loss, model, path)
self.counter = 0
def save_checkpoint(self, val_loss, model, path):
if self.verbose:
if self.accelerator is not None:
self.accelerator.print(
f'Validation loss decreased ({self.val_loss_min:.6f} --> {val_loss:.6f}). Saving model ...')
else:
print(
f'Validation loss decreased ({self.val_loss_min:.6f} --> {val_loss:.6f}). Saving model ...')
if self.accelerator is not None:
model = self.accelerator.unwrap_model(model)
torch.save(model.state_dict(), path + '/' + 'checkpoint')
else:
torch.save(model.state_dict(), path + '/' + 'checkpoint')
self.val_loss_min = val_loss
class dotdict(dict):
"""dot.notation access to dictionary attributes"""
__getattr__ = dict.get
__setattr__ = dict.__setitem__
__delattr__ = dict.__delitem__
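# Usage sketch (illustration only):
#   cfg = dotdict({'learning_rate': 1e-3})
#   cfg.learning_rate -> 1e-3; cfg.missing -> None (dict.get semantics)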
class StandardScaler():
def __init__(self, mean, std):
self.mean = mean
self.std = std
def transform(self, data):
return (data - self.mean) / self.std
def inverse_transform(self, data):
return (data * self.std) + self.mean
def adjustment(gt, pred):
    """Point-adjustment for anomaly detection evaluation: once any point in a
    ground-truth anomaly segment is detected, the whole segment is marked as
    detected in `pred`."""
anomaly_state = False
for i in range(len(gt)):
if gt[i] == 1 and pred[i] == 1 and not anomaly_state:
anomaly_state = True
for j in range(i, 0, -1):
if gt[j] == 0:
break
else:
if pred[j] == 0:
pred[j] = 1
for j in range(i, len(gt)):
if gt[j] == 0:
break
else:
if pred[j] == 0:
pred[j] = 1
elif gt[i] == 0:
anomaly_state = False
if anomaly_state:
pred[i] = 1
return gt, pred
def cal_accuracy(y_pred, y_true):
return np.mean(y_pred == y_true)
def del_files(dir_path):
shutil.rmtree(dir_path)
def vali(args, accelerator, model, vali_data, vali_loader, criterion, mae_metric):
total_loss = []
total_mae_loss = []
model.eval()
with torch.no_grad():
for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in tqdm(enumerate(vali_loader)):
batch_x = batch_x.float().to(accelerator.device)
batch_y = batch_y.float()
batch_x_mark = batch_x_mark.float().to(accelerator.device)
batch_y_mark = batch_y_mark.float().to(accelerator.device)
# decoder input
dec_inp = torch.zeros_like(batch_y[:, -args.pred_len:, :]).float()
dec_inp = torch.cat([batch_y[:, :args.label_len, :], dec_inp], dim=1).float().to(
accelerator.device)
# encoder - decoder
if args.use_amp:
with torch.cuda.amp.autocast():
if args.output_attention:
outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
else:
outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
else:
if args.output_attention:
outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
else:
outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
# self.accelerator.wait_for_everyone()
f_dim = -1 if args.features == 'MS' else 0
outputs = outputs[:, -args.pred_len:, f_dim:]
batch_y = batch_y[:, -args.pred_len:, f_dim:].to(accelerator.device)
pred = outputs.detach()
true = batch_y.detach()
loss = criterion(pred, true)
mae_loss = mae_metric(pred, true)
total_loss.append(loss.item())
total_mae_loss.append(mae_loss.item())
total_loss = np.average(total_loss)
total_mae_loss = np.average(total_mae_loss)
model.train()
return total_loss, total_mae_loss
def test(args, accelerator, model, train_loader, vali_loader, criterion):
x, _ = train_loader.dataset.last_insample_window()
y = vali_loader.dataset.timeseries
x = torch.tensor(x, dtype=torch.float32).to(accelerator.device)
x = x.unsqueeze(-1)
model.eval()
with torch.no_grad():
B, _, C = x.shape
dec_inp = torch.zeros((B, args.pred_len, C)).float().to(accelerator.device)
dec_inp = torch.cat([x[:, -args.label_len:, :], dec_inp], dim=1)
outputs = torch.zeros((B, args.pred_len, C)).float().to(accelerator.device)
id_list = np.arange(0, B, args.eval_batch_size)
id_list = np.append(id_list, B)
for i in range(len(id_list) - 1):
outputs[id_list[i]:id_list[i + 1], :, :] = model(
x[id_list[i]:id_list[i + 1]],
None,
dec_inp[id_list[i]:id_list[i + 1]],
None
)
accelerator.wait_for_everyone()
f_dim = -1 if args.features == 'MS' else 0
outputs = outputs[:, -args.pred_len:, f_dim:]
pred = outputs
true = torch.from_numpy(np.array(y)).to(accelerator.device)
batch_y_mark = torch.ones(true.shape).to(accelerator.device)
loss = criterion(x[:, :, 0], args.frequency_map, pred[:, :, 0], true, batch_y_mark)
model.train()
return loss
def load_content(args):
if 'ETT' in args.data:
file = 'ETT'
else:
file = args.data
with open('./dataset/prompt_bank/{0}.txt'.format(file), 'r') as f:
content = f.read()
return content
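
# Minimal usage sketch for EarlyStopping (illustration only): assumes any
# nn.Module and a writable checkpoint directory; both names below are
# placeholders, not part of the original file.
if __name__ == '__main__':
    import os
    os.makedirs('./tmp_ckpt', exist_ok=True)
    model = torch.nn.Linear(4, 1)
    early_stopping = EarlyStopping(patience=3, verbose=True)
    for epoch, val_loss in enumerate([1.0, 0.9, 0.95, 0.96, 0.97]):
        early_stopping(val_loss, model, './tmp_ckpt')
        if early_stopping.early_stop:
            print('early stop at epoch', epoch)
            break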