a quick pitch for Concourse
Software development is hard; working with other people is hard. Making sure you never skip a step is hard; reminding other people not to skip steps is harder. If you are really strict about it, you are a jerk; if you are not, you are a vindictive jerk. The airline industry solves this with checklists, but something about office workers really resists a checklist. Using one is an admission that you do not know everything, or that your contribution is fungible. Gloomy assertions aside, I have had good results with computer-run checklists.
Continuous integration, or ‘making a computer run the tests’, is table stakes for responsible professional software development. A basic CI setup will work, it will be reliable, and it is a reasonable place to stop, but a number of problems around the edges can overwhelm a team. The most common are testing too late, unspecified or underspecified build environments, and management of the build system itself.
The problem dearest to my heart is that testing the ‘master’ branch is testing too late. That branch is where other developers start when they go to fix a bug or add a feature. Starting from a broken place can waste an enormous amount of time, and it is the sort of thing that leads to hurt feelings. Instead, we should test every branch as soon as possible and publish those results.
At the most basic level, this gives developers a responsible robot that never forgets to run the tests. There are social benefits too: when another developer goes to review a pending change, they don’t have to double-check that the checklist has been followed. This makes it easier to concentrate on higher-level concerns, and it prevents the embarrassing “that can’t work” moments that erode a team’s trust.
The management and evolution of the build system should not be an afterthought. It is a crucial part of your workflow, it can be an enormous force-multiplier, and it should be treated with the same care and process as a production environment. Many build systems were designed with manual administration through a web UI as a first-class citizen and automation as a second-class citizen, and it shows when you try to be rigorous with them. I can be a little dogmatic on this point, but all of your configuration should come from source control, and the build system is no exception. Your team knows (or really should know) how to use source control to see why things changed and to revert changes precisely. The same process should apply to the build code: every change to it should be reviewed, and the team should be aware of it.
Now that I have staked out a vague, philosophical stance, what about concrete recommendations? You should look at Concourse. It is not an out-of-the-box solution for every build problem, but it is quite serious about solving the build management problem, and the rest falls out of that.