Considerations for Managing a Data Science Team
Managing a data science team calls for certain considerations of its own. One of the most common mistakes businesses make is to view the data mining process as a software development cycle. The assumption is understandable: data mining projects are often initiated by software departments, with data generated by a software system and analytics results fed back into that system. Both software development and data mining projects can have agreed-upon milestones, and both can measure success unambiguously. Managers are comfortable and familiar with the software development lifecycle, so their natural inclination is to treat data mining projects similarly. These software managers look at the CRISP data mining cycle, quickly see how similar it appears to the software development cycle, and manage both types of projects in the same way.
This is a mistake, however, because data mining is an exploratory undertaking that is closer to research and development than to engineering.
The outcomes of data mining projects are far less certain, and the results of a given step may change the fundamental understanding of the problem, whereas software engineering projects (ideally) strive to attain a clearly defined outcome. Engineering a data mining solution directly for deployment can be an expensive and premature commitment, one that is likely to fail or, at the very least, fall short of expectations. Instead, analytics projects should be prepared to invest in information that reduces uncertainty. Small investments can be made via pilot studies and throwaway prototypes. Data scientists should review the literature to see what else has been done and how well it has worked. On a larger scale, a team can invest substantially in building experimental testbeds that allow extensive agile experimentation. If you are a software manager, this will look more like research and exploration than you are used to, and perhaps more than you are comfortable with. It is, however, the correct path.
Software Engineering Skills
- Ability to write efficient, high-quality code from requirements
- Team members evaluated by amount of code written or number of bug tickets closed
Analytic Skills
- Individuals must be able to formulate problems well
- Individuals must be able to prototype solutions quickly
- Individuals must be able to make reasonable assumptions in the face of ill-structured problems
- Individuals must be able to design experiments that represent good investments and to analyze results
Software Skills Versus Analytic Skills
Although data mining involves software, it also requires skills that may not be common among programmers. In software engineering, the ability to write efficient, high-quality code from requirements may be paramount. Team members may be evaluated using software metrics such as the amount of code written or the number of bug tickets closed. In analytics, it is more important for individuals to be able to formulate problems well, prototype solutions quickly, make reasonable assumptions in the face of ill-structured problems, design experiments that represent good investments, and analyze results. In building a data science team, these qualities, rather than traditional software engineering expertise, are the skills that should be sought.