## A call for help about understanding Ordinary Least Squares (OLS) vs. Orthogonal Distance Regression (ODR) vs. Robust Regression

Just when I think I have enough of the underlying mathematics wrapped up to start writing Python code for my doctoral research, I find there are new questions… In the current case, I was originally going to determine the strength of a linear or non-linear correlation using Ordinary Least Squares, which is what is usually used to find the Coefficient of Determination. But when I started looking at regression functions in SciPy, I ran across Orthogonal Distance Regression (ODR), and when I tried to research ODR further, I ran across the concept of robust regression. Now I’m trying to understand both of these concepts better, and I could use some help from someone who really understands this material and can explain it in a more conceptual manner, so I can determine which statistical method is most appropriate for my research. Here is what I believe I understand so far:
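As a point of reference for the OLS approach I had originally planned, here is a minimal sketch (with made-up synthetic data) of fitting a line with SciPy’s `linregress` and computing the Coefficient of Determination from its returned correlation coefficient:

```python
# A minimal sketch of Ordinary Least Squares and the Coefficient of
# Determination (R^2) using SciPy. The data here are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)  # noisy line, true slope 2, intercept 1

result = stats.linregress(x, y)
r_squared = result.rvalue ** 2  # Coefficient of Determination

print(f"slope={result.slope:.2f}, intercept={result.intercept:.2f}, R^2={r_squared:.3f}")
```

Note that `linregress` only handles straight lines; for other function shapes one would use something like `scipy.optimize.curve_fit` instead.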

Orthogonal Distance Regression (ODR) seems to be better than Ordinary Least Squares when there are measurement errors in both of the variables whose relationship you are trying to fit (whether that relationship is a line, hyperbola, etc.). Since my research will use preexisting data collected with varying degrees of rigor, it seems reasonable to assume that either variable could have measurement errors, so this may be the type of regression I should use… (And it seems fairly easy to apply, thanks to the nice SciPy module!)
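Here is a minimal sketch of the SciPy module in question (`scipy.odr`), fitting a line to synthetic data with errors deliberately injected into both x and y; the noise scales and starting guess are illustrative assumptions:

```python
# A minimal sketch of Orthogonal Distance Regression with scipy.odr,
# for data with measurement error in both x and y. Data are synthetic.
import numpy as np
from scipy import odr

def linear(beta, x):
    # Model function in scipy.odr's expected form: f(beta, x)
    return beta[0] * x + beta[1]

rng = np.random.default_rng(1)
x_true = np.linspace(0, 10, 40)
x_obs = x_true + rng.normal(scale=0.3, size=x_true.size)               # error in x
y_obs = 2.0 * x_true + 1.0 + rng.normal(scale=0.3, size=x_true.size)   # error in y

data = odr.RealData(x_obs, y_obs, sx=0.3, sy=0.3)  # stated uncertainties in both axes
model = odr.Model(linear)
fit = odr.ODR(data, model, beta0=[1.0, 0.0]).run()  # beta0 is the initial guess

print("slope, intercept:", fit.beta)
```

The same `Model` mechanism accepts any function of `beta` and `x`, so a hyperbola or other curve would just be a different `linear`-style function.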

But then I ran across the concept of “robust regression”, which appears to describe methods that automatically adjust for outliers, and I thought this too might be the type of statistical analysis I should use, because it is likely that, for various reasons, many of the data sets I wish to search automatically will contain outliers. (And I’m not entirely sure that ODR isn’t already “robust”.)
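One way to see what “robust” buys you is `scipy.optimize.least_squares`, which can swap the ordinary squared-error loss for a robust one (here `soft_l1`) that down-weights large residuals. A minimal sketch with synthetic data and a few injected outliers:

```python
# A minimal sketch of robust regression via scipy.optimize.least_squares:
# the "soft_l1" loss down-weights outliers that would drag a plain
# least-squares fit off course. Data and outliers are synthetic.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 60)
y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=x.size)
y[::10] -= 15.0  # inject a few large one-sided outliers

def residuals(beta):
    return beta[0] * x + beta[1] - y

robust = least_squares(residuals, x0=[1.0, 0.0], loss="soft_l1", f_scale=1.0)
plain = least_squares(residuals, x0=[1.0, 0.0])  # ordinary squared loss

print("robust slope, intercept:", robust.x)
print("plain  slope, intercept:", plain.x)
```

Comparing the two printed results shows the plain fit’s intercept pulled well below the true value of 1 by the outliers, while the robust fit stays close.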

And then, as I was searching for a professor in UNISA’s College of Science, Engineering, and Technology who might be able to help me, I ran across Quantile Regression, which seems like it too may have advantages and be worth using in my algorithm…
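For a conceptual picture of quantile regression: instead of minimizing squared error, it minimizes the “pinball” (quantile) loss, and at q = 0.5 it yields the median regression line. A minimal from-scratch sketch of that idea, on synthetic data (statsmodels offers a more complete `QuantReg` implementation):

```python
# A minimal from-scratch sketch of quantile regression: minimizing the
# pinball (quantile) loss with scipy.optimize.minimize. Data are synthetic.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

def pinball_loss(beta, q):
    r = y - (beta[0] * x + beta[1])  # residuals
    # Penalize residuals asymmetrically: weight q above the line, (1-q) below
    return np.mean(np.maximum(q * r, (q - 1) * r))

# q = 0.5 gives the median regression line (least absolute deviations)
median_fit = minimize(pinball_loss, x0=[1.0, 0.0], args=(0.5,), method="Nelder-Mead")
print("median-line slope, intercept:", median_fit.x)
```

Changing `q` to, say, 0.9 would instead fit the line below which 90% of the data falls, which is what makes quantile regression useful for one-sided scatter.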

And then there is the issue that I am sure I will have many data sets with a bias in a single direction, as illustrated in the following graph, taken from a relevant Stack Overflow question:

In this graph, the eye can see the sine wave that “should” be the best fit within the tight red dots, but the regression algorithm in SciPy instead chose the blue line, because there are more dots below the “true” sine wave than above it. I have seen the same thing happen with long-tail curves, where a lot of the data falls between what appears to the eye to be the curve of best fit and the axis below it. (I wish I had a chart to show this as well, but I will try to get one in the future.)
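This one-sided-bias situation can be reproduced with synthetic data: scatter many points below a true sine curve, and a plain least-squares fit is dragged downward while a robust loss largely resists. A sketch, with all data and parameters invented for illustration:

```python
# A synthetic sketch of the one-sided bias problem: many points lie below
# the true sine curve, pulling a plain least-squares fit downward, while
# a robust loss stays near the true amplitude and offset.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(4)
t = np.linspace(0, 4 * np.pi, 120)
y = 3.0 * np.sin(t) + rng.normal(scale=0.1, size=t.size)   # tight points on the curve
y[::4] -= rng.uniform(1.0, 4.0, size=y[::4].size)          # one-sided scatter below it

def residuals(params):
    amplitude, offset = params
    return amplitude * np.sin(t) + offset - y

plain = least_squares(residuals, x0=[1.0, 0.0])
robust = least_squares(residuals, x0=[1.0, 0.0], loss="soft_l1", f_scale=0.3)

print("plain  amplitude, offset:", plain.x)   # offset pulled below zero
print("robust amplitude, offset:", robust.x)  # offset near the true value of 0
```

The plain fit’s vertical offset lands noticeably below the true curve, which is the same effect as the blue line in the graph above.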

In any case, I would like to know how to deal with all of these. Maybe in the end I will have my automated correlation discovery algorithm use multiple statistical methods and then see which goodness-of-fit coefficient is highest. And if I can wrap my mind around all these potential methods (and maybe more that I find), then I can potentially also explain them in a way that makes them easier for others to understand!

#### Post Revisions:

- September 26, 2015 @ 06:09:26 [Current Revision] by Jacob Walker
- September 26, 2015 @ 06:08:29 by Jacob Walker
- September 26, 2015 @ 05:51:16 by Jacob Walker
