Imagine you have data like the picture below: 100 rows, each with an ID, Date, Value (IDR), and Expense. They record your expenses for 100 consecutive days. Can you use that data to predict your expenses for the next day? Or even for every day until the end of the month?

Training Data

The answer is yes.

Testing Data

For example, suppose you have testing data like the picture above. First, you enter your Balance. If today is Sept 6, you enter today's Expense (Travel) and its Value (12000). Then, given the history of the 4 previous days, the model predicts the expense Value for each remaining day of the month and tells you whether your finances are healthy or not, like the pictures below:

Financial Health

Financial Unhealth

Curious how to do that? Let's look into it. We will need 3 files. You can find all the files and documents needed on my GitHub.

fileconfig

  1. We call the first file fileconfig (it is imported later as cfg). Its job is to store every parameter value and file path that we need to configure the model.
  2. The first configuration is for training. We need two variables here: the path of the training data and the timestep of our data (I will explain more about timestep in the second file).
  3. We have to set the paths of the testing data and the saved model, too. In this part, I show two ways to write a path as a variable: with forward slashes, or as a raw string with backslashes. You can download the training and testing data from the GitHub link at the end of this post
    train = {
    'training_data_dir' : "C:/Users/LENOVO/Jupyter Notebook/Data/Expense/Data dummy Daily Expense.csv",
    'timestep' : 5
    }
    
    main = {
    'testing_data_dir' : "C:/Users/LENOVO/Jupyter Notebook/Data/Expense/Data dummy Daily Expense_testing.csv",
    'model_dir' : r"C:\Users\LENOVO\Jupyter Notebook\Money Prediction\model.h5"
    }
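As a quick check that the two path styles above are interchangeable, here is a minimal sketch using the standard-library pathlib. The folder and file names are shortened, hypothetical stand-ins, not the real paths from fileconfig.

```python
from pathlib import PureWindowsPath

# Hypothetical, shortened paths used only for illustration.
forward_slashes = PureWindowsPath("C:/Users/LENOVO/Data/expense.csv")
raw_backslashes = PureWindowsPath(r"C:\Users\LENOVO\Data\expense.csv")

# Both spellings describe the same Windows path.
same_path = forward_slashes == raw_backslashes

# Joining parts with "/" avoids typing separators by hand.
data_dir = PureWindowsPath("C:/Users/LENOVO/Data")
joined = data_dir / "expense.csv"

print(same_path)
print(joined)
```

PureWindowsPath only manipulates the text of a path, so this runs on any operating system. If you plan to share the project, building paths relative to the project folder instead of a user folder is usually easier for others to run.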

Train

  1. Import all libraries we need
    import pandas as pd
    import fileconfig as cfg
    from keras.models import Sequential
    from keras.layers import LSTM
    from keras.layers import Dense
    from numpy import array
  2. Read the training data from config file
    df=pd.read_csv(cfg.train['training_data_dir'])
  3. Store the Value and Date columns as arrays. This is the first thing to understand about time-series data. Our Date column has the format %year-%month-%date, and we have to think carefully about which parts of it to use as input. With only 100 instances, using every part of the date is not a good choice. Here the best option is to use only the %date (the day of the month), which is why we use [-2:] in the code
    Values = df['Value'].values
    # Keep only the day of the month and cast it to int so the model gets numeric input
    Dates = df['Date'].str[-2:].astype(int).values
  4. Values and Dates are still independent columns, which means there is no relationship between them yet. To relate them, we have to convert them into time-series data that is ready for training. This function cuts them into windows of n_steps days and also creates the target variable.
    def split_sequence(sequence, date, n_steps):
        X, y = list(), list()
        for i in range(len(sequence)):
            # index just past the end of the current window
            end_ix = i + n_steps
            # stop when there is no value left to use as the target
            if end_ix > len(sequence)-1:
                break
            seq_x, seq_y = [date[i:end_ix], sequence[i:end_ix]], sequence[end_ix]
            X.append(seq_x)
            y.append(seq_y)
        return array(X), array(y)
  5. The function above keeps the Date window and the Value window side by side, but it does not yet pair them per timestep. To make Date a second feature of each timestep, we use the function below
    def combine_X(sequence):
        result = list()
        for subseq in sequence:
            seq = list()
            for s in range(len(subseq[1])):
                #print(subseq[0][s], subseq[1][s])
                seq.append([subseq[0][s], subseq[1][s]])
            result.append(seq)
        return array(result)
  6. Set the number of steps, call the functions from steps 4 and 5, then reshape the input data so it is ready for training
    n_steps = cfg.train['timestep']
    X, Y = split_sequence(Values, Dates, n_steps)
    X = combine_X(X)
    n_features =  X.shape[2]
    X = X.reshape((X.shape[0], X.shape[1], n_features))
  7. Create the model. This post is about how to process time-series data, so we will not focus on model complexity. One LSTM layer and one Dense layer are enough to train our model
    model = Sequential()
    model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
  8. Fit and save the model
    model.fit(X, Y, epochs=20, verbose=1)               
    model.save(cfg.main['model_dir'])
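To see what steps 4 to 6 produce, here is a minimal sketch on made-up numbers. It folds split_sequence and combine_X into one hypothetical helper, make_windows, and uses plain lists instead of NumPy arrays, so it illustrates the resulting (samples, timesteps, features) layout rather than reproducing the code above.

```python
def make_windows(days, values, n_steps):
    """Build windows of [day, value] pairs and the value that follows each window."""
    X, y = [], []
    for start in range(len(values) - n_steps):
        end = start + n_steps
        # Each timestep is a [day, value] pair, like combine_X produces.
        window = [[days[t], values[t]] for t in range(start, end)]
        X.append(window)
        y.append(values[end])  # the value right after the window
    return X, y

# Toy data: 7 days of made-up expenses.
days = [1, 2, 3, 4, 5, 6, 7]
values = [10, 20, 30, 40, 50, 60, 70]

X, y = make_windows(days, values, n_steps=5)

# (samples, timesteps, features), the layout the LSTM layer expects.
shape = (len(X), len(X[0]), len(X[0][0]))

print(shape)
print(X[0], "->", y[0])
print(X[1], "->", y[1])
```

With 7 days and a timestep of 5 there are only 2 windows: days 1 to 5 predict day 6, and days 2 to 6 predict day 7. The same arithmetic applies to the 100-row training file.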

Main

  1. Import the packages we need
    import time
    import fileconfig as cfg
    import pandas as pd
    import datetime as dt
    import numpy as np
    import matplotlib.pyplot as plt
    from numpy import array
    from keras.models import load_model
  2. Load the trained model
    model = load_model(cfg.main['model_dir'])
  3. We need Date and Value from the testing data, but they are still in independent columns, so this function takes the 5 latest Date and Value pairs and represents them in time-series form. Basically, we use the data from the 1st to the 5th to predict the expense on the 6th, then the data from the 2nd to the 6th to predict the expense on the 7th, and so on.
    def get_testing_data(data):
        step = cfg.train['timestep']
        data = data.tail(step)
    
        date = data['Date'].str[-2:]
        value = data['Value'].values
        result = list()
    
        for i, j in enumerate(date):
            result.append([j, value[i]])
        return result
  4. This function counts the rows of the expense history, which we use to work out today's date
    def get_history_length(data):
        his_len = data.shape[0]
        return his_len
  5. Predict using the model and return the output (predicted Value for tomorrow)
    def prediction(X_input):
        X_input = X_input.reshape((1, len(X_input), len(X_input[0])))
        yhat = model.predict(X_input, verbose=0)
        return yhat
  6. This function acts as the controller for prediction(X_input). It prepares the data before prediction, appends the predicted data to the testing data, and returns the predicted Value for the next day together with the full list including the predictions.
    def predict_rest(X_test):
        n_previous_data = cfg.train['timestep'] * (-1)
        # cast to float so the day strings become numbers; predict returns a (1, 1) array, keep the scalar
        value_pred = float(prediction(array(X_test[n_previous_data:], dtype=float))[0][0])
        X_test.append([int(X_test[-1][0])+1, value_pred])
        return value_pred, X_test
  7. Return the date in the format YYYY-MM-DD, so we can use it as the Date when storing new data
    def get_today_date(yesterday_date):
        string_date = time.strftime("%Y-%m-"+str(format((yesterday_date+1), '02d')))
        return string_date
  8. main() – Part I. Read the testing data, count the number of dates, take the expense history, and ask for the balance.
    expense_hist = pd.read_csv(cfg.main['testing_data_dir'])
    expense_len = expense_hist.shape[0] 
    expend_df = expense_hist[expense_hist.Expense.notnull()]
    balance = input("Insert Your Balance (For simulation only):")
  9. main() – Part II. The core of main() is this loop, which predicts the Value for the next day until the end of the month
    for i in range(expend_df.shape[0]+1, expense_len+1):
  10. main() – Part III. If you notice, line 3 in main() – Part I has a similar purpose to line 2 in main() – Part III. Yes, the purpose is to take the whole expense history. I could not find a more efficient way to do this. If you can think of one, please share it with me.
            print("Date: ",i)
            expend_df = expend_df[expend_df.Expense.notnull()]
            money = expend_df['Value'].values
            history_len = get_history_length(expend_df)
            unpredicted_len = expense_len - history_len
            today_date = get_today_date(history_len)
  11. main() – Part IV. Take the expense (string) and value (int) as input. Stop the whole process if the expense is blank
            expense = input("What's your expense? (Blank to STOP)")
            if not expense.strip():
                break
            value  = input("How much you spend? ")
  12. main() – Part V. Append the new expense to expend_df and append its value to money (this variable contains only values; we use it for plotting the diagram). We also call the function get_testing_data().
            # DataFrame.append was removed in pandas 2.0, so build a one-row frame and concatenate it
            new_row = pd.DataFrame([[history_len+1, today_date, expense, int(value)]], columns=expend_df.columns)
            expend_df = pd.concat([expend_df, new_row], ignore_index=True)
            money = np.append(money, [int(value)])
            predict = get_testing_data(expend_df)
  13. main() – Part VI. Predict the expense until the end of the month
            money_temp = 0
            for i in range(unpredicted_len):
                value_pred, predict = predict_rest(predict)
                money_temp += int(value_pred)
                money = np.append(money, [int(value_pred)])
  14. main() – Part VII. Plot the diagram. We create the variable date separately, just to keep things simple. If the balance is smaller than the total predicted value until the end of the month, your financial condition is unhealthy and the line in your diagram is red. Otherwise, your financial condition is healthy and the line is green.
            date = [i for i in range(1,32)]
            if(int(balance) < money_temp):
                print("Your Financial Condition is Unhealthy")
                plt.plot(date, money, 'r')
            else:
                print("Your Financial Condition is Healthy")
                plt.plot(date, money, 'g')
            plt.show()
  15. The system works as shown in the demo video
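The loop in Parts II to VI is a form of recursive forecasting: each prediction is appended to the history and becomes input for the next one. The sketch below shows that idea in isolation. It assumes a stand-in predictor, fake_model, that simply averages the window (it is not the trained LSTM), and the helper name forecast_rest, the history, and the balance are all made up for illustration.

```python
def fake_model(window):
    """Stand-in for model.predict: average of the values in the window."""
    values = [value for _day, value in window]
    return sum(values) / len(values)

def forecast_rest(history, n_steps, last_day):
    """Roll the window forward one day at a time until last_day."""
    series = list(history)  # copy so the caller's list is untouched
    predicted = []
    while series[-1][0] < last_day:
        window = series[-n_steps:]
        next_value = fake_model(window)
        next_day = series[-1][0] + 1
        # The prediction is fed back in as if it were real history.
        series.append([next_day, next_value])
        predicted.append(next_value)
    return predicted

# Made-up history for days 2 to 6: [day, value] pairs.
history = [[2, 10000], [3, 20000], [4, 30000], [5, 20000], [6, 20000]]
predicted = forecast_rest(history, n_steps=5, last_day=9)

balance = 50000  # hypothetical balance
total_predicted = sum(predicted)
# Same rule as Part VII: unhealthy when the balance is below the predicted total.
status = "Healthy" if balance >= total_predicted else "Unhealthy"

print(len(predicted), total_predicted, status)
```

Because every predicted value is reused as input, errors can accumulate the further the forecast runs from the last real entry, which is one reason to treat the end-of-month total as a rough guide.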

But the thing you need to remember is that this is still a playground; we have more things to learn 🙂

You can see the full code in this link (https://github.com/muhammadfhadli1453/Time-Series-Data-Financial-Prediction)
