The Agile Database Techniques Stack: Bridging the Agile/Data Cultural Divide Presented by Scott Ambler Monday, July 25, 2016 at Agile2016 in Atlanta, Georgia
Review by Joe Bernardini
One of the best, most enlightening sessions I attended at Agile 2016 was Scott Ambler’s session on “The Agile Database Techniques Stack.” The theme of the presentation was the difference, or as Scott put it, the chasm, between the DevOps development process and the processes used in data warehouse / business intelligence (DW/BI) applications.
This chart* graphically showed this point:
*Source: Agiledata.org/essays/culturalImpedanceMismatch.html © Disciplined Agile Consortium
The question then is: “Why is there this chasm, and how do we as DW/BI professionals close this gap?”
Here’s Why:
- Rigid data organizations
- Developers don’t speak data, and data people don’t know development
- BRUF – Big Requirements Up Front
- Lack of Data Testing – data is not treated as the asset it is described to be
- Traditional Change management is actually change prevention, stifling Continuous Integration
- Lost sight of the real goal: fulfilling stakeholders’ needs
- No strategy to address data quality issues
How We Can Fix This:
- Work Collaboratively – pair / mob development
- Embrace changing requirements
- Adopt usage-driven development – data usage is more important than data structures, “user stories drive development, not data models”
- Deliver incrementally
- Pay down technical debt – refactor, work to change the T in ETL from Big T to Little T
- Be pragmatic – one version of the truth may not be an answer
- Create lightweight and valuable artifacts
The Agile Database Techniques Stack, developed by Scott, includes the following:
- Vertical Slicing
- Clean Architecture and Design
- Agile Data Modeling
- Database Refactoring
- Database Regression Testing
- Continuous Database Integration
- Configuration Management
Vertical Slices of a Solution
Every iteration of an agile team produces a working solution, with functionality added in vertical slices. Scott gave examples of slicing strategies for DW/BI: one new data element from a single data source, one new data element from several sources, a change to an existing report, a new report, a new reporting view and a new data mart table.
Scott reviewed considerations for clean data architecture. I won’t go into them here, but will list them:
- Create Clean Data Architecture
- Consistency
- Solution Fit
- Latency
- Security
- Scalability
Then he discussed strategies for Clean Database Design (here’s a quick overview):
- Normalization understanding (are you storing data in one and only one place?)
- Design for Database Type – OLTP (transactional -> normalized design) and OLAP (analytical -> denormalized design)
- Tables and columns should be cohesive
- Future Proof – Historical values, soft/hard deletes, unique surrogate key, all relationships many-to-many
- Set and follow naming conventions
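As an illustration of the “Future Proof” bullet above, here is a minimal sketch of a table that uses a surrogate key and soft deletes. It is my own example, not something shown in the session: the `customer` table, its column names, and the helper functions are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical schema, for illustration only (not from Scott's slides).
SCHEMA = """
CREATE TABLE customer (
    customer_id     INTEGER PRIMARY KEY,   -- surrogate key, no business meaning
    customer_number TEXT NOT NULL UNIQUE,  -- business key kept as ordinary data
    name            TEXT NOT NULL,
    deleted_at      TEXT                   -- NULL = active; timestamp = soft-deleted
);
"""


def create_schema(conn):
    """Create the example table."""
    conn.executescript(SCHEMA)


def add_customer(conn, customer_number, name):
    """Insert a customer and return its generated surrogate key."""
    cur = conn.execute(
        "INSERT INTO customer (customer_number, name) VALUES (?, ?)",
        (customer_number, name),
    )
    return cur.lastrowid


def soft_delete(conn, customer_id):
    """Mark a row as deleted instead of physically removing it."""
    conn.execute(
        "UPDATE customer SET deleted_at = CURRENT_TIMESTAMP "
        "WHERE customer_id = ? AND deleted_at IS NULL",
        (customer_id,),
    )


def active_customer_names(conn):
    """Return names of customers that have not been soft-deleted."""
    rows = conn.execute(
        "SELECT name FROM customer WHERE deleted_at IS NULL ORDER BY customer_id"
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    first = add_customer(conn, "C-1", "Ada")
    add_customer(conn, "C-2", "Grace")
    soft_delete(conn, first)
    print(active_customer_names(conn))
```

The soft-deleted row stays in the table, so historical reports can still see it, while everyday queries filter on `deleted_at IS NULL`.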
Agile Data Modeling is a concept. Rather than starting development from a fully completed, detailed data model, the agile data modeling process flows from a conceptual model to a detailed physical model, at which point the DDL is created and development starts. Rounds of iterative development and testing of data usage follow, after which a detailed data specification is completed. Data modeling, says Scott, is an act of exploring data-oriented structures. It should be evolutionary and incremental, focused on data usage, and done collaboratively.
High-level conceptual -> detailed physical (DDL) -> detailed specification
Scott listed and discussed several other agile database techniques, including refactoring and regression testing. A database refactoring is a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics. He has a very good book on this, “Refactoring Databases,” that goes into a lot more detail.
On regression testing, he pointed out that as agilists we develop regression unit and acceptance test suites for our applications, so why would we not also do this for databases? He also believes that database testing at the black-box level is just a starting point and is really not enough. He supports Test/Behavior Driven Database Development as a practical approach to ensuring database quality, but noted that it’s not a stand-alone activity.
He also likes continuous database integration, but said that there is no good single-tool solution and that increased use of mocks may be required.
On configuration management, Scott is a proponent of keeping all work products, including database objects, in a versioned repository. He gave several good reasons for doing so, including that it maintains the integrity of the system and all supporting work products as they evolve.
Overall, Scott Ambler and this session stimulated my thinking about the challenges of using Agile in a Data Warehouse / Business Intelligence environment. He was honest about the barriers in the way, but offered a vision and techniques to move BI into an Agile development world.
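To make the refactoring and regression-testing ideas above concrete, here is a small sketch of my own; it is not taken from the session or from the book. It renames a column in two steps (add the new column and backfill it, keeping the old one during a transition period) and then checks that the stored information is unchanged. The `customer` table, the `fname` and `first_name` columns, and all function names are hypothetical.

```python
import sqlite3


def build_original(conn):
    """Create the pre-refactoring schema with some sample rows."""
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customer (fname) VALUES (?)", [("Ada",), ("Grace",)]
    )


def refactor_rename_column(conn):
    """Introduce first_name alongside fname and backfill it.

    The old column is kept so existing readers keep working; dropping it
    would be a later step once every consumer has moved over.
    """
    conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
    conn.execute("UPDATE customer SET first_name = fname")


def names(conn, column):
    """Read one of the two known name columns, ordered by id."""
    if column not in ("fname", "first_name"):
        raise ValueError(f"unexpected column: {column}")
    rows = conn.execute(f"SELECT {column} FROM customer ORDER BY id")
    return [row[0] for row in rows]


def regression_test():
    """Check that the refactoring preserved the stored information."""
    conn = sqlite3.connect(":memory:")
    build_original(conn)
    before = names(conn, "fname")
    refactor_rename_column(conn)
    assert names(conn, "fname") == before       # old readers still work
    assert names(conn, "first_name") == before  # new column carries same data
    return before


if __name__ == "__main__":
    regression_test()
    print("refactoring preserved the data")
```

A test like this can live in the same versioned repository as the schema change and run on every build. A production refactoring would also need to keep the two columns in sync during the transition period, which this sketch leaves out.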
See the full description and obtain a PDF of his presentation: The Agile Database Techniques Stack: Bridging the Agile/Data Cultural Divide