Case study of migrating Mattermost installations to the new Custom Resource.
Overview In the daily life of a Site Reliability Engineer, the main goal is to reduce all the work we call toil. But what is toil? Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows. This blog post describes our journey to automate our node rotation process when a new AMI is released, and the open source tools we built for it.
With access to the Enterprise source code, the developer build tooling now automates the setup of Prometheus and Grafana for performance monitoring. Even the canonical Grafana dashboards are set up without any manual configuration!
Languages are complicated, and every language is complicated in different ways that can be hard to understand without learning every single one of them. Some languages form words from multiple characters while others have symbols that represent entire concepts. Some feature words without pluralization or gender and rely on context for that while others have two or even more genders for words. Some are very phonetic while others pronounce words seemingly at random (cough English, though cough).
Optimizing SQL queries is always fun, except when it isn’t. If you’re a MySQL veteran and have read the title, you already know where this is heading 😉. In that case, allow me to regale the uninitiated reader. This is the story of an (apparently) smart optimization to a SQL query that backfired spectacularly and how we finally fixed it. Act I: A slow query It started off with a customer noticing that a SQL query was running slowly in their environment.
When I started implementing Docker Content Trust in our workflow, I thought it wouldn’t take long. Of course, I was wrong. The following is the boiled-down version of what I learned and what I wish I had known when starting out. Prerequisites: Docker version 19.03.12; the root *.key and passphrase for the Docker Content Trust; the delegation/signer private key *.key, public key *.pub, and passphrase for the delegated person/bot who should sign the repository/image:tag. Please make sure you have your keys backed up and versioned.
Have you ever wondered how many active users your application can handle at the same time? If so, you’re not alone. Here at Mattermost we’re building a highly concurrent messaging platform for team collaboration that may need to serve up to several thousand users simultaneously. While standard functional testing (e.g. unit tests) is critical to verify the correct behavior of your application, it’s usually not sufficient to guarantee its performance at scale.
It’s been more than a year and a half since we started using Cypress for our automated functional testing, and it has been worth the investment. It has now become an essential part of our process for automating regression testing so we can ship new releases faster and with higher quality. It’s fun and easy to get started with Cypress, but as we added more scripts with varying requirements, we faced several setbacks and hurdles, such as flaky tests, which slowed down our efforts to automate test cases.
Organizations usually use an internal network to prevent unauthorized people from connecting to their private resources. By running their own network infrastructure and connectivity, they can maintain the level of security they want for their data. But it would be convenient for users to connect to that private network while they are away from the office building, on their own internet connection. To solve that problem, a VPN (Virtual Private Network) is used to allow authorized remote access to an organization’s private network.
What is distributed tracing? Large-scale cloud applications are usually built from interconnected services that can be rather hard to troubleshoot. When a service is scaled, simple logging doesn’t cut it anymore and a more in-depth view into the system’s flow is required. That’s where distributed tracing comes into play; it allows developers and SREs to get a detailed view of a request as it travels through the system of services. With distributed tracing you can: