Regression analysis. When prediction and forecasting are all fun and entertainment

posted in: AdMoRe-updates

Have you ever wondered how scientists and experts predict the way a phenomenon is going to behave?

Do you think we can predict the number of trees needed to obtain a certain quantity of wood? Or can we even determine the relationship (if any) between the weight of a mammal's brain and its body weight?

This week I attended a course on Statistical Model of Data at UPC Campus and, in particular, on regression analysis, defined as a set of statistical processes for estimating the relationships among two or more variables.

First, let me share a curious fact!

A bit of history: the term ‘regression’ was coined by Francis Galton in the 19th century to describe a biological phenomenon. He studied the heights of offspring with respect to the heights of their parents and found that, as time went by, the average height of individuals tended to increase. However, he observed a curious fact: the offspring of short parents were on average taller than their parents, while the offspring of tall parents were on average shorter than theirs. In other words, the heights of the offspring of exceptionally tall or short parents (who lie at the tails of the distribution) tended to lie closer to the centre of the distribution, the mean: exceptionality was indeed receding. He called this behaviour ‘regression towards mediocrity’, which later turned into ‘regression to the mean’.

Over time, those expressions acquired a more general meaning in statistics. Nowadays regression analysis is widely used for prediction and forecasting: we turn to it whenever we want to estimate the conditional expectation of a dependent variable given the independent variables, or when we want to understand which of the independent variables are related to the dependent variable and in what form.

Thus, when faced with a dataset, there are a few steps we have to follow to perform a regression analysis. In order, we have

  1. to do an Exploratory Data Analysis (EDA), where we ‘explore’ the data and suggest hypotheses about the causes of the observed phenomena;
  2. to fit a model to the dataset, namely to find the coefficients that characterize our model (in the photo to the right, we imagine the original dataset to be described by the model y = beta_0 + beta_1 x + epsilon, where epsilon denotes the errors, and our goal is to find the coefficients b_0 and b_1 of the fitted model, which are estimates of beta_0 and beta_1);
  3. to validate our model, that is, to check that it does not breach the constraints that come with the kind of model we chose to fit.

If at least one of these constraints is violated, then we have to formulate another model and validate it in turn.
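To make the three steps a bit more concrete, here is a minimal sketch in Python. It is not what we did in the course, and it is not how MINITAB works internally: the data are simulated, the "true" coefficients 2.0 and 0.5 are made up for the illustration, and the function names are my own. It fits y = b_0 + b_1 x by ordinary least squares and runs one very crude check on the errors (whether their spread looks similar for small and large x).

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Sample correlation coefficient: a quick EDA summary of linear association."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def fit_simple_linear(xs, ys):
    """Ordinary least-squares estimates (b_0, b_1) for y = b_0 + b_1 * x."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b_1 = s_xy / s_xx
    b_0 = y_bar - b_1 * x_bar
    return b_0, b_1


# Simulated dataset (an assumption for this sketch, not real measurements):
# y = beta_0 + beta_1 * x + epsilon, with epsilon drawn from a normal distribution.
random.seed(0)
beta_0, beta_1 = 2.0, 0.5
xs = [float(i) for i in range(1, 51)]
ys = [beta_0 + beta_1 * x + random.gauss(0.0, 1.0) for x in xs]

# Step 1 (EDA): is a straight line a plausible description of the data?
r = pearson_r(xs, ys)

# Step 2 (fit): estimate the coefficients b_0 and b_1.
b_0, b_1 = fit_simple_linear(xs, ys)

# Step 3 (validation, very crude): compare the spread of the residuals
# for the lower and the upper half of the x values (xs is sorted).
residuals = [y - (b_0 + b_1 * x) for x, y in zip(xs, ys)]
half = len(residuals) // 2
spread_low = statistics.stdev(residuals[:half])
spread_high = statistics.stdev(residuals[half:])

print(f"correlation r = {r:.3f}")
print(f"fitted model: y = {b_0:.3f} + {b_1:.3f} x")
print(f"residual spread: low x = {spread_low:.3f}, high x = {spread_high:.3f}")
```

If the two spreads were very different, or the residuals showed a clear pattern when plotted against x, that would be a hint that the straight-line model is not adequate and another one should be tried. A real validation would look at residual plots and formal tests rather than this single comparison.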

Luckily, there’s a tool called MINITAB that does most of the dirty work for us, so the regression analysis becomes easier to perform and sometimes even entertaining: inferring a valid model is a real challenge, and it’s very satisfying once you realise that the guess you made was the right one.

I hope I have convinced you that regression analysis is not that boring after all. Moreover, it is a very powerful and indispensable tool for predicting the trend of phenomena in many fields of study.

 

A photo of me along with other students attending the course at UPC Campus.

 

Follow Simona Vermiglio:

ESR Researcher at UPC·BarcelonaTech

I was born in Lecce, a town in the south of Italy. After a B.Sc. in Mathematical Engineering from Politecnico di Torino (PoliTo, Turin, Italy), I enrolled in the M.Sc. in the same subject area. During this academic program, I spent two years at the École Centrale de Nantes (ECN, France), divided among some classes of the 'Erasmus Mundus Master Course in Computational Mechanics', an internship and a joint master project between ECN and PoliTo. I finally received my master's degree at PoliTo in July 2016. Since September of the same year, I have been a PhD student of the 'Erasmus Mundus Joint Doctorate: SEED' at Universitat Politècnica de Catalunya (UPC, Barcelona, Spain). In the AdMoRe project, I am ESR#3 and my PhD thesis is about V&V techniques of numerical methods applied to simulations within the acoustics and electro-magnetics frameworks. Besides numerical analysis and mathematical models and tools, I'm interested in number theory, game theory, machine learning and technology. My non-scientific interests include Romance languages, interculturalism, animal rights and the natural environment, as well as playing volleyball, reading and travelling.