Operating a large distributed system in a reliable way: practices I learned (blog.pragmaticengineer.com) | 378 points | gregdoesit | 7 years ago