Artificial Intelligence (CO3519) Tutorial 3.4: Calendar Week 50

Modelling by regression using statsmodels

Apply regression using statsmodels, as discussed in the lecture and demonstrated in two example Jupyter Notebooks (correlations.ipynb, bundesliga.ipynb), to a data set consisting of 50 x-y pairs.

In particular,

  1. Split the data set into training, validation, and test data (for example, at a ratio of 32:9:9). Attempt a linear regression of the type y = ax + b, using the training data set only.
  2. Construct at least two other candidate models, one based on a quadratic equation y = ax² + bx + c and another one based on a hypothesis of your choice. For developing each of these candidate models, only the training data set should be used.
  3. Determine the root mean square deviation (RMSD) of each of your candidate models from the validation data. Select the model that performs best during validation (i.e., the one with the lowest RMSD) as your final model.
  4. Test the final model, using the test data set only. Determine the margin of error from an appropriate measure such as two times the root mean square deviation between predicted and actual values of y for the test data.
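For reference, the root mean square deviation used in steps 3 and 4 can be written as follows. Here n is the number of points in the set being evaluated (9 each for validation and test at a 32:9:9 split), y_i is an actual value and ŷ_i the model's prediction for the same x_i. The factor of two in the margin of error is the example measure suggested in step 4, not the only possible choice.

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2},
\qquad
\text{margin of error} \approx 2\,\mathrm{RMSD}_{\text{test}}
```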
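The four steps above can be sketched in Python with the statsmodels formula interface. This is a minimal illustration under several assumptions. It is not the code from the lecture notebooks. The 50 x-y pairs are synthetic placeholders, because the actual tutorial data set is not reproduced here. The split is a random 32:9:9 permutation with a fixed seed. The cubic model stands in for the "hypothesis of your choice" and is only an example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder data (assumption): 50 synthetic x-y pairs.
# Replace this with the tutorial's actual data set.
rng = np.random.default_rng(seed=0)
x = rng.uniform(0.0, 10.0, size=50)
y = 0.5 * x**2 - 2.0 * x + 3.0 + rng.normal(scale=2.0, size=50)
data = pd.DataFrame({"x": x, "y": y})

# Step 1: random 32:9:9 split into training, validation and test sets.
order = rng.permutation(len(data))
train = data.iloc[order[:32]]
valid = data.iloc[order[32:41]]
test = data.iloc[order[41:]]

# Step 2: candidate models, each fitted by ordinary least squares
# on the training set only.
formulas = {
    "linear": "y ~ x",                     # y = ax + b
    "quadratic": "y ~ x + I(x**2)",        # y = ax^2 + bx + c
    "cubic": "y ~ x + I(x**2) + I(x**3)",  # example own hypothesis (assumption)
}
models = {name: smf.ols(f, data=train).fit() for name, f in formulas.items()}


def rmsd(model, df):
    """Root mean square deviation between predicted and actual y in df."""
    residuals = model.predict(df) - df["y"]
    return float(np.sqrt(np.mean(residuals**2)))


# Step 3: validation; keep the model with the lowest validation RMSD.
val_rmsd = {name: rmsd(m, valid) for name, m in models.items()}
best = min(val_rmsd, key=val_rmsd.get)

# Step 4: evaluate the selected model once on the held-out test set.
test_rmsd = rmsd(models[best], test)
margin = 2.0 * test_rmsd

for name, value in val_rmsd.items():
    print(f"{name:9s} validation RMSD = {value:.3f}")
print(f"selected: {best}; test RMSD = {test_rmsd:.3f}; margin of error = +/-{margin:.3f}")
```

The test set is used only after the final model has been chosen. This keeps the reported margin of error independent of the model selection. Calling `models[best].summary()` prints the fitted coefficients a, b (and c) together with the usual statsmodels diagnostics.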

Submission deadline: 22nd January 2022; discussion planned for 12th February 2022. Group work in teams of up to four people is welcome.