De Facto Data Science Life Cycle

Successful data science projects go through seven stages (Figure 1).

Figure 1: The Seven Stages of the De Facto Data Science Life Cycle

Data scientists often enter projects in the aftermath of an organization’s attempts to add analytics to existing business processes. In project kick-off meetings, phrases like “Big data is too big!”, or answers to simple questions like “What do you want from your big data project?” that come back as “I have no idea even what questions to ask!”, indicate that a prior project has convinced the client that the outside help of a data scientist might be useful.

Stage 1: Client drowning in data … is a result of attempting analytics. Analytics brings uncertainties that are new to the team: compute requirements, new relationships between IT and marketing, data use models that are new to the company, mathematical models that are new to the company, new visualizations, and so on. These analytic uncertainties often combine to discourage first analytics attempts. Why?

Because as IT, marketing, and operations people attempt to deal with analytic uncertainties, day-to-day business processes begin to fall behind. If you think about it, this makes perfect sense: these resources were not scoped to do analytics. Scoping to day-to-day operations is the norm, and analytics expands the resources required.

So while the kick-off of a data science project may appear to be a “mess”, as long as the resources can be scoped up to handle analytic uncertainties, there is no reason a mess can’t be transformed into a masterpiece.

Stage 2: Analytics-ready data … is the result of bringing properly scoped “big iron” (often cloud-based) resources to bear. Data engineers with modern tools can build a schedule for providing analytics-ready data, and deliver on that schedule.

Stage 3: Enrichment … is the result of companies being bombarded with amazing claims for analytics. Suppliers promise the sky. Trade association meetings demonstrate projects from other companies. Senior management pushes for answers not only in “What happened?” operational debriefs, but also in “Why did it happen?” debriefs. The pandemonium of positive press for analytics gives everyone in the organization ideas for additional data to collect and use in predicting KPIs (Key Performance Indicators). For example: weather data, tourist activity, zip code data, industrial activity, and so on.
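
As a rough illustration of what enrichment can look like in practice, the Python sketch below joins a hypothetical external weather table onto internal daily sales records by date and zip code. The field names, the sample values, and the `enrich` helper are all assumptions made up for this example, not taken from any real project.

```python
# Hypothetical internal records: one row per (date, zip code) with a KPI.
sales = [
    {"date": "2024-07-01", "zip": "60601", "units_sold": 120},
    {"date": "2024-07-02", "zip": "60601", "units_sold": 95},
    {"date": "2024-07-01", "zip": "94105", "units_sold": 210},
]

# Hypothetical external enrichment data keyed by the same (date, zip) pair.
weather = [
    {"date": "2024-07-01", "zip": "60601", "high_temp_f": 88},
    {"date": "2024-07-01", "zip": "94105", "high_temp_f": 67},
]


def enrich(base_rows, extra_rows, keys):
    """Left-join extra_rows onto base_rows using the given key fields.

    Base rows without a match are kept; their enrichment fields are None.
    """
    # Index the external rows by their key tuple for fast lookup.
    lookup = {tuple(row[k] for k in keys): row for row in extra_rows}
    # Every non-key field in the external data becomes a new column.
    extra_fields = sorted({f for row in extra_rows for f in row} - set(keys))
    enriched = []
    for row in base_rows:
        match = lookup.get(tuple(row[k] for k in keys), {})
        merged = dict(row)
        for field in extra_fields:
            merged[field] = match.get(field)
        enriched.append(merged)
    return enriched


enriched_sales = enrich(sales, weather, keys=("date", "zip"))
for row in enriched_sales:
    print(row)
```

Keeping unmatched rows (a left join) is one reasonable choice here: it makes gaps in the external data visible before modeling, rather than silently shrinking the data set.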

Stage 4: Models … is where data scientists want to live. We relish using mathematics to help business decision makers cut through the noise of their markets and see the competitive dynamics.

Stage 5: Client sees results … is where data scientists step Business Decision Makers (BDMs) through the analysis that has been done, and watch and listen as the client reacts. Reactions range from “This is trivial” or “Not useful” at the beginning of a project, to “… gasp … ‘This is going to be awesome.’” near the end of a well-crafted project.

Recycling … in a well-managed project, Stage 4 and Stage 5 cycle back and forth several times. This recycling happens because BDMs see results, and the results give them new ideas of what may be possible in the project. A good data scientist will watch BDMs carefully and develop new ideas about what they would like to see.

New ideas feed new models into the analysis, generating more new ideas, until after several iterations (10 or so) BDMs and data scientists align around the business value to be captured by the project. This produces …

Stage 6: “Aha!” discovery … literally every project I have worked on has produced a new revelation as BDMs and data scientists recycle back and forth. These revelations are often larger in profit impact than the original project that was envisioned in Stage 1: Client drowning in data.

Stage 7: Project implementation in refined processes … often the most difficult stage in a data science project is implementation. Again, this is because corporate IT and people resources are scoped to pre-existing business processes, and analytics opportunities usually require more resources to capture more margin.

This data science life cycle is by no means perfect. It is simply what I have experienced as a data scientist working for over 20 years across consumer goods, industrial, software, and robotics companies. Comments are welcome if you would be willing to share your thoughts.