Jacob J. Walker's Blog

Scholarly Thoughts, Research, and Journalism for Informal Peer Review

The Four Major Activities of Data Science / Machine Learning



Recently Erle Hall, lead for Information and Communication Technologies (ICT) at the California Department of Education (CDE), posted a diagram about machine learning on LinkedIn. That diagram had six steps: Select Data, Model Data, Validate Model, Test Model, Use the Model, and Tune Model. Those six steps mostly encapsulate what has traditionally been called the “data mining” phase. But there are three other important phases, which I will call “data surfing”, “data wrangling”, and “data artistry”. (I chose these names to be easier to understand and more interesting for students; the same phases also go by other names.) I also personally prefer the term “algorithm” to “model”: data science traditionally relied on statistical models, but methods such as neural networks are now common, and they are less like a traditional statistical model. In the next few posts, I’ll dive into each of these four phases and give a basic explanation of what each one does and why it is important.
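To make the six steps from the diagram a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the diagram itself is not reproduced here, so the synthetic data, the choice of a straight-line fit, the `penalty` tuning knob, and every function name below are my own hypothetical stand-ins rather than anything from Erle Hall's post.

```python
import random


def make_data(n, seed=0):
    """Synthetic (x, y) pairs near the line y = 2x + 1 (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0, 0.5)))
    return data


def fit_line(rows, penalty=0.0):
    """Least-squares line; `penalty` shrinks the slope (a ridge-style knob)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / (sxx + penalty)
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


def mean_squared_error(model, rows):
    """Average squared gap between predictions and observed values."""
    return sum((predict(model, x) - y) ** 2 for x, y in rows) / len(rows)


# 1. Select Data: split the rows into training, validation, and test sets.
data = make_data(300)
train, validation, test = data[:200], data[200:250], data[250:]

# 2. Model Data: fit the algorithm on the training rows only.
model = fit_line(train)

# 3. Validate Model: check the error on rows the fit has not seen.
validation_error = mean_squared_error(model, validation)

# 4. Test Model: a second held-out check, kept apart from any tuning.
test_error = mean_squared_error(model, test)

# 5. Use the Model: predict for a new input.
prediction = predict(model, 4.0)

# 6. Tune Model: try several penalty values, keep the best on validation.
candidates = [0.0, 1.0, 10.0, 100.0]
best_penalty = min(
    candidates,
    key=lambda p: mean_squared_error(fit_line(train, p), validation),
)
tuned_model = fit_line(train, best_penalty)
```

The step order follows the diagram as listed above. In practice, tuning and validation usually loop back and forth before the final test, so treat the numbering as a reading aid rather than a strict recipe.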


Written by Jacob Walker

May 8th, 2017 at 11:59 am
