SBE BDD for Data Science

Data Science and Big Data are hot topics, and rightly so – companies can save huge sums, or dramatically increase sales, by analyzing the data that they already collect. Specification by Example (SBE) is a great partner in this effort; used correctly, it enables Data Scientists, Product Owners, and Developers to communicate clearly about Big Data and Data Science.

I spent two years working with a Data Science team at a large aerospace firm, training and coaching them in the best ways to write requirements and deliver automated tests; the teams with which I worked delivered roughly two bugs to Production in a year. The developers came to expect automated tests in-sprint – they would finish writing code, run the automated scenarios for both the new functionality and all of the old functionality, and wouldn’t open a pull request until all of those tests passed. It is really difficult to deliver a bug when you follow that process! You can find a discussion of that process on YouTube.

I spoke about this at XP 2021, explaining in detail how to write requirements for the ETL process and the data analysis. Clean data is essential for generating valuable insights, and SBE is ideally suited to creating the communication that lets the Data Scientists specify the cleaning process clearly and precisely, so the Developers can deliver exactly what the Data Scientists want. If you are a member of the Agile Alliance you can find the recording of my presentation there; you can find an interview regarding it on YouTube – or contact me and I would be happy to discuss it.
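To give a flavor of what specifying a cleaning step by example can look like, here is a minimal sketch in Python. It is not taken from the presentation or from the aerospace project; the rule, the function name `clean_reading`, and the example rows are all hypothetical, chosen only to show how a table of concrete examples can act as both the requirement and the automated test.

```python
import math
from typing import Optional

# Hypothetical cleaning rule, as Data Scientists and Developers might agree on it:
# - surrounding whitespace is ignored
# - blank, non-numeric, or NaN readings become missing (None)
# - negative readings are treated as sensor errors and become missing
def clean_reading(raw: str) -> Optional[float]:
    text = raw.strip()
    try:
        value = float(text)
    except ValueError:
        return None
    if math.isnan(value) or value < 0:
        return None
    return value

# The specification: each row is one concrete example (raw input, expected output).
EXAMPLES = [
    (" 12.5 ", 12.5),
    ("", None),
    ("n/a", None),
    ("-3", None),
    ("0", 0.0),
]

def failing_examples():
    """Return (raw, expected, actual) for every example the code gets wrong."""
    results = [(raw, expected, clean_reading(raw)) for raw, expected in EXAMPLES]
    return [row for row in results if row[1] != row[2]]

if __name__ == "__main__":
    failures = failing_examples()
    print("all examples pass" if not failures else failures)
```

The point of the table is that a Data Scientist can read and correct it without reading the code: if negative readings should be kept after all, changing one row makes the disagreement visible as a failing example before anything reaches Production.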

