Jacob J. Walker's Blog

Scholarly Thoughts, Research, and Journalism for Informal Peer Review

My Email to my Appointed Mentor from UNISA

I was excited today to receive an email from Dr. Abraham Tlhalefang Motlhabane of UNISA, who has been appointed to be my doctoral supervisor.  While Dr. Motlhabane has an excellent background in science education, I hope he will also have sufficient background in statistical methods to help me get beyond my current limitations, or that other UNISA professors may be able to help.  After he emailed me, I wrote the following email, which I think is a good self-reflection of where I am on this project.

Hello,

I am excited to get started on this project.  While I am still working on the full proposal, attached is my original outline (Research Outline for the Large-Scale Automated Discovery of Correlations of National Characteristics) that I submitted for admissions.  Since writing the outline, however, I have come to see the many flaws of Pearson’s and Spearman’s coefficients, as their methodologies often have over-fitting (hence Type I errors), especially with data that does not have a Gaussian distribution.  Since social science data often isn’t Gaussian, and the relationships that may exist between two variables are often non-linear, I believe changing to non-parametric methods that can detect monotonic relationships is more appropriate for the data mining that I’m attempting to do.  Specifically, I believe that Kendall’s tau coefficient is probably best, although I’m open to Bayesian or other methodologies that I may not be as familiar with yet.

To give you a little background about myself, I have a broad set of interests.  My current job is as an administrator of an adult charter school in California, in which I focus a lot on the organizational aspects, such as policies and procedures, scalability, etc.  But I originally started out as a “computer nerd”, and am still involved in ICT both at my job and, to an extent, regionally, as I helped to write the ICT educational standards for California.  I am also heavily involved in data analysis at my work, have been diving into data science over the past several years, and am a member of the International Educational Data Mining Society.  My CV (which needs a little updating) is at http://jacobjwalker.effectiveeducation.org/cv

Here is what I see as my strengths, challenges, and needs in this process:

Strengths

  • I am a self-starter
  • I am creative and can think “outside the box”, as is evident from my choosing UNISA
  • I can learn a lot on my own by reading academic articles, books, etc.
  • I have sufficient financial resources to purchase books that may contain information I need
  • I have a strong ICT background, being a recognized expert in Excel, and having strong database skills
  • I have already been working on conceptualizing and refining the idea and methods for this project for over a year.

Challenges

  • Much of my focus in life is on my job, which is very demanding (and rewarding), as we are working on having a California-wide system of schools that can benefit low-income adults.  This will at times require me to pause my doctoral work, and hence you will probably see me work in “spurts”, where for a few weeks I will make tremendous progress, and then I will “go dark” and not communicate or proceed for several more.
  • I am not a statistical expert.  I have learned a lot from my own self-study, but I truly would like to be able to ask a lot of questions of statistical experts.  For instance, you can see this in the blog post that I wrote when I was still considering parametric methods and was going down the wrong path of trying to eliminate outliers, instead of just recognizing that these “outliers” were happening because the underlying data was non-Gaussian.
  • I am becoming more proficient in Python and R, but don’t have nearly the expertise in them that I do in Excel.  I recognize, though, that Excel is neither scalable nor able to do the crowd-sourcing of data that will be necessary for this project.  Hence I believe SageMathCloud (for which I have already purchased a subscription) is the best solution, because it can provide the best of both Python and R.

Needs from my Mentor

  • My biggest need is to have a mentor with expertise in statistical methods whom I can ask questions, especially one who has a broad understanding of non-parametric, non-linear, and Bayesian “regression” methods.
  • Having a mentor who has a background in Python and/or R would be nice, although, given my general background in ICT, I am confident that I can learn any parts of this on my own in a timely manner.

I feel that my Research Proposal is 90% complete (it is not attached, as my current draft is not fully coherent due to changing methodologies and going away from my original idea of using Excel).  While there might still be better methods to use than Kendall’s tau, I believe it is sufficient for the initial research, and as long as I build a good database structure, other statistical methods can be applied later.
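
As a rough illustration of the point in the email about monotonic but non-linear relationships, here is a minimal sketch in Python.  It is not the code for the project itself: the function names (`pearson_r`, `kendall_tau_a`) and the toy data are my own assumptions for the example, and it uses the simple tau-a form of Kendall’s coefficient, which does not correct for ties.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson's product-moment correlation (measures linear association)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Hypothetical helper for illustration; tied pairs count as neither
    concordant nor discordant, so this is only suitable for data without
    many ties.
    """
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy data (an assumption for the example): y grows exponentially with x,
# so the relationship is perfectly monotonic but far from linear.
xs = list(range(1, 11))
ys = [math.exp(x) for x in xs]

tau = kendall_tau_a(xs, ys)
r = pearson_r(xs, ys)
print("Kendall's tau-a:", tau)
print("Pearson's r:", round(r, 3))
```

On this toy data every pair of points is concordant, so tau-a reports a perfect monotonic relationship, while Pearson’s r falls short of 1 because the relationship is not a straight line.  For real data with ties, a library routine such as `scipy.stats.kendalltau` (which computes tau-b by default) would probably be a better choice than this hand-rolled version.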

Written by Jacob Walker

February 19th, 2016 at 11:59 am
