“This really helped us understand how do we get patients to the right care at the right place in the right amount of time and also to understand the needs for personal protective equipment across the state — who had what and where were the gaps,” Henry said.
Cloud providers like Microsoft Azure are by their nature designed to expand and scale quickly and meet elastic demand. With more than 60 datacenter regions around the globe, including three new regions announced this past May in Italy, New Zealand and Poland, Microsoft can shift traffic if a natural disaster or power outage affects capacity in one part of the world. Computing hardware to protect against demand spikes is stockpiled in warehouses around the globe, ready to be deployed to where it’s needed most.
By March, the epicenter of that demand was in Europe, as countries such as Italy and Spain imposed nationwide lockdowns to slow the spread of the coronavirus. Around the same time, manufacturing issues in China and Southeast Asia caused by the pandemic temporarily disrupted the supply chain for certain datacenter hardware, just as dramatic spikes in usage began challenging computing capacity in some regions. In a Microsoft quarterly earnings call on April 29, the company said those hardware supply chain issues had largely begun resolving themselves late in the quarter that ended in March.
“The scope and the scale of the response to COVID-19 was completely unprecedented, in terms of how much of the world went digital inside a month,” said Mark Simms, a partner software architect who helped manage the COVID-19 response across Azure. “So the work that we had to do to get through the initial surge in demand and free up capacity for our customers to run critical health and safety workloads was also unprecedented.”
“We made some pretty profound changes in order to do the right thing, and we did them under a very short time frame,” Simms said.
Datacenter employees began working in round-the-clock shifts to install new servers while staying at least six feet apart. Microsoft product teams worked to find any further efficiencies to free up Azure resources for other customers. The company doubled capacity on one of its own undersea cables carrying data across the Atlantic and negotiated with owners of another to open up additional capacity. Network engineers installed new hardware and tripled the deployed capacity on the America Europe Connect cable in just two weeks.
Microsoft’s plan emphasized continuity of service for all its customers, but especially for those on the front lines of the COVID-19 response: health care providers, police and emergency responders, financial institutions, manufacturers of critical supplies, grocery stores and health agencies providing critical information to the public about how quickly the virus was spreading.
The Azure Global and Customer Experience engineering teams mobilized and monitored Azure services around the clock to ensure critical customers could continue to operate smoothly and meet new challenges posed by COVID-19. Employees from across the company volunteered to switch gears to help deploy urgent projects like WA HEALTH.
Microsoft’s Regional Government Emergency Response and Monitoring Solution, on which WA HEALTH was built, uses Microsoft Power Apps Portal and Microsoft Power BI, both of which run on Azure. It allows healthcare workers to quickly update counts of COVID-19 cases and critical resources through a web portal or by mobile phone, at the end of a shift or whenever it is convenient.
As the pandemic was unfolding and cases were surging, it allowed state and local health officials to use near-real-time dashboards — based on data from more than 100 acute care hospitals around the state — to coordinate responses. It continues to allow the state to monitor trends, as well as inform decisions such as how quickly counties can move toward safely reopening their economies, Henry said.
“This was a real-time moving event, and people needed to make decisions hour by hour and minute by minute,” said Gary Bird, a principal program manager for Microsoft Power Platform who worked to deploy WA HEALTH. “You really saw the whole company lean in across all fronts to make solutions happen.”
Balancing supply and demand
Once the pandemic hit, Microsoft’s plans to deal with unexpected spikes in usage and expand Azure’s computing resources kicked in. The company began adding new servers to the hardest hit regions and installing new hardware racks 24 hours a day. To protect the health of critical datacenter employees, Microsoft also quickly established social distancing requirements, provided protective equipment and implemented strict disinfectant policies.
Microsoft prioritized capacity for existing customers while also reserving capacity for first responders who needed to quickly scale life and safety services. It also expanded cloud support for non-profits working to protect people’s health during the pandemic.
Microsoft’s Azure and product teams worked around the clock to ensure that services like Teams, Office and Xbox could meet rapidly exploding demand from customers. Next, they looked for efficiencies across all of Microsoft’s services running on Azure to free up more capacity for external customers.
Microsoft Teams was the first and most obvious service to experience massive growth. But other Azure services that enable remote work, such as Windows Virtual Desktop, which saw its usage triple in one month, and Azure Active Directory’s Application Proxy, also had to scale dramatically as financial institutions, schools, call centers and other companies moved thousands of employees onto those platforms practically overnight.
Not only were new customers signing up, but suddenly existing users were relying on the tools to power every single meeting or interaction in their workday, said Mark Longton, a principal group program manager for Microsoft Teams.
“Teams went from a service that was cool and convenient and something that people realized was the future of communication to very quickly becoming a mission critical, can’t-live-without-it sort of thing,” Longton said. “Really what this did was accelerate us into the future.”
Microsoft Teams began using early data from China and Italy to plan for expected growth as the pandemic spread. As more countries went into lockdown, dozens of Redmond-based Microsoft employees gathered remotely each Sunday night to watch telemetry, look for bottlenecks and troubleshoot as unprecedented numbers of remote European workers began logging in first thing Monday morning.
“Normally, you find and fix issues organically as you grow. When you take software and put it under explosive growth, with services getting used an order of magnitude more in one day, you tend to find all of those in a really short period of time,” said John Sheehan, Microsoft distinguished engineer for Azure quality.
Once critical services were successfully scaled and stabilized, the company shifted to looking for efficiencies. Microsoft opened up performance data for all services running on Azure to engineers across the company, asking them to submit ideas that would allow Microsoft to provide more computing capacity to the wider pool of Azure customers.
In Teams, small tweaks that few customers would notice — lengthening the time it takes for the three dots to appear when someone else is typing in a Teams chat or disabling the feature that suggests a contact every time you type a new letter in the “to” field — made the system run much more leanly. Engineers rewrote the code to make video stream processing 10 times more efficient in a marathon push over a weekend.