Episode 34: Slack and the Safety Dance of Chaos Engineering

Published: Oct. 31, 2018, 6 a.m.

In the early days, angry nerd corners on the Internet viewed Slack and some of its predecessors as, “Oh, it’s just IRC. Now, you pay someone for it.” Many fell into that trap of wondering what value such systems offered. The big differentiator? Slack is built as a collaborative business tool.

Today, we’re talking to Holly Allen, who helped make government software better while serving as the director of engineering at 18F. Now, she’s a senior engineering manager at Slack, a collaborative chat program where you can do most of your work through a rich platform of integrations. Holly enjoys taking a weird set of skills that make a computer do things and convincing people who know how to make computers do things to do things.

Some of the highlights of the show include:

Safety engineering brings chaos and resilience engineering, incident management, and post-mortem processes together for resiliency and reliability
Slack strives to move really fast while being in complete control
Slack is primarily on AWS, but is working on a multi-cloud strategy because if AWS is down, Slack still needs to work
Slack has a close relationship with AWS and is a collaborative company; it has immediate access to AWS staff anytime there’s a problem
Slack uses Terraform and Chef and is working to determine whether running its production workflows in Kubernetes would be worthwhile
Disasterpiece Theater: take a real scenario that might happen and surmise what will happen; the goal is not to cause production issues, but to teach Slack employees
Slack hires collaborative, empathetic people to create a collaborative environment where everyone works together toward a goal
Slack was firmly in a centralized operations model, but is moving toward development teams taking on more responsibility and service ownership
Slack doesn’t encourage remote work because it’s not in a position to put in that investment; day-to-day work happens in hallways and between desks
Slack sees itself as an enterprise software company; an enterprise software company must have enterprise software reliability, stability, and processes
Slack has thousands of servers, so events and disruptions happen more often; the system needs to respond, react, and repair itself without human intervention

Links:

Holly Allen on Twitter
18F
Slack
Freenode IRC
HipChat
AWS
Kubernetes
Terraform
Chef
QCon
Datadog