## Archive for May 8th, 2017

## The Four Major Activities of Data Science / Machine Learning

Recently, Erle Hall, the lead for Information and Communication Technologies (ICT) for the California Department of Education (CDE), posted a diagram about machine learning on LinkedIn. The diagram had six steps: Select Data, Model Data, Validate Model, Test Model, Use the Model, and Tune Model. Those six steps mostly cover what has traditionally been called the “data mining” phase. But there are three other important phases, which I will call “data surfing”, “data wrangling”, and “data artistry”. (I chose these names to be easier to understand and more interesting for students; each phase also goes by other names.)

I also prefer the term “algorithm” to “model”. Data science traditionally relied on statistical models, but methods such as neural networks are now common, and they look less like a traditional statistical model. In the next few posts, I’ll dive into each of these four phases and give a basic explanation of what each one does and why it is important.