

The term Data Scientist was initially coined and used in California and New York at the same time in 2010. At that time, Hilary Mason and a group of colleagues wrote their own list of the steps in the job because they could not find one anywhere. The list goes as follows: obtain, scrub, explore, model, and interpret.
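To make the five steps a little more concrete, here is a minimal Python sketch of the list as a tiny pipeline. It is only an illustration: the function names, the made-up records, and the "predict the average" model are all assumptions of mine, not anything taken from the original list.

```python
from statistics import mean


def obtain():
    # Hypothetical raw records; real data would come from a file, API, or database.
    return ["12", "15", " 14 ", "n/a", "", "13"]


def scrub(raw):
    # Keep only the entries that parse as whole numbers.
    cleaned = []
    for value in raw:
        text = value.strip()
        if text.isdigit():
            cleaned.append(int(text))
    return cleaned


def explore(values):
    # Summarize the cleaned data before modeling it.
    return {"count": len(values), "min": min(values), "max": max(values)}


def model(values):
    # The simplest possible "model": expect the average next time.
    return mean(values)


def interpret(summary, prediction):
    # Turn the numbers into a sentence a business reader can use.
    return (
        f"From {summary['count']} clean records "
        f"(range {summary['min']}-{summary['max']}), "
        f"expect about {prediction:.1f} next."
    )


def run_pipeline():
    values = scrub(obtain())
    return interpret(explore(values), model(values))


if __name__ == "__main__":
    print(run_pipeline())
```

Each function stands in for one step, so a real project would swap in its own data source, cleaning rules, and model while keeping the same order.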

People have been performing the function of a data scientist for quite a few years, but it has just recently been given a special name.

A data scientist, or a data science team, needs a blend of four main talents. These talents provide the foundation for success. They are Mathematician, Software Engineer, Statistician, and Business Communicator. Data scientists are found in the center of a Venn diagram of these four talents.

[Image: data scientist Venn diagram]

The engineer component of the team needs to be proficient with data analysis, security, visualization, and quality. They also need to be comfortable with large amounts of unstructured data and be able to organize it in a consumable way. They must also have a technical background that includes knowledge of Hadoop, be able to write in many programming languages, and be willing to learn technology that aggregates the data and uses machine learning to guide them to better predictive outcomes.

The business interface component of the team needs to understand both technical and business language. They act as the interpreter between the business and the rest of the scientific team. They are responsible for making sure that the data science team understands what the business people are saying and requesting. They are also responsible for helping the business team understand the information that the data science team unearthed. They also need to know the business well enough to understand what the company does and where it wants to go, so they will know which questions are irrelevant and which ones help solve the true business problems.

When it comes to the mathematical and statistical components of the group, my research skills have not been as helpful. I read several descriptions of mathematicians and statisticians, including those on Wikipedia and Dictionary.com. None of them increased my understanding of these two jobs.

I know how to do math, and I use it frequently to figure out how many teaspoons I need in cooking when I double a recipe or cut it in half. Beyond cooking, sewing, and shopping, I much prefer to let someone else enjoy it.

I finally found a couple of broad generalities about the differences between statistics and mathematics. Here is the short and sweet of it.

Mathematics – Deductive, very structured approaches and conclusions, technical terminology, and an old field of study.

Statistics – Inductive, few definitive approaches or conclusions, everyday phrases that gain technical meanings, results that have to be described in layman’s terms, and a much newer field of study.
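One way to picture the deductive versus inductive split is with a small Python sketch. The example is my own assumption, not something from the sources I read: scaling a recipe stands in for deduction (the rule gives one certain answer), and estimating an average from a few observations stands in for induction (the answer is a best guess with some spread). The function names and sample numbers are hypothetical.

```python
from fractions import Fraction
from statistics import mean, stdev


def scale_teaspoons(teaspoons, factor):
    # Deductive: apply a known rule and get an exact, certain answer.
    return Fraction(teaspoons) * Fraction(factor)


def estimate_average(sample):
    # Inductive: generalize from a few observations, and report
    # the spread because the estimate is uncertain.
    return mean(sample), stdev(sample)


if __name__ == "__main__":
    # Doubling and halving 3/4 teaspoon.
    print(scale_teaspoons("3/4", 2))
    print(scale_teaspoons("3/4", "1/2"))

    # Hypothetical observations, for example minutes a dish took to bake.
    average, spread = estimate_average([4, 5, 6])
    print(average, spread)
```

The recipe answer never changes no matter how many times it is computed, while the estimate would shift if the sample contained different observations.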

Exactly how these fit into what a data scientist does, I’m still not sure, but that’s okay because it means I have more to learn.

If you would like to look at a chart of how much time data scientists in different businesses spend focusing on the four different aspects, take a look at Data Community DC.

Do you want to be a data scientist?