Day 05: What Are Azure Data Centers and Service-Level Agreements (SLAs)?


"I'm a 3rd-year Computer Engineering student at Marwadi University with skills in C++, web development (MERN stack), and DevOps tools like Kubernetes. I contribute to open-source projects and share tech knowledge on GitHub and LinkedIn. I'm learning cloud technologies and app deployment. As an Internshala Student Partner, I help others find jobs and courses." now currently focusing on #90DaysOfDevops

Azure Data Centers

  • Azure operates more than 100 redundant, secure facilities worldwide, linked together by a global network.

    • Allows you to

      • gain global reach with local presence

      • keep your data secure and compliant with local laws

  • You can pick the region and sometimes availability zone you want resources deployed into.

    • ❗You can't select a specific datacenter or location within a datacenter.

Regions

  • Region = a geographical area containing at least one, but often multiple, datacenters that are nearby and connected by a low-latency network.

    • Azure assigns and controls the resources within each region to ensure workloads are appropriately balanced.

    • E.g. West US, Canada Central, West Europe, Australia East, and Japan West.

  • ❗Some services or virtual machine features are only available in certain regions, such as specific virtual machine sizes or storage types.

  • [Figure: map of Azure regions as of February 2020]

  • 💡Regions provide better scalability and redundancy, and preserve data residency for your services.

  • Read more: Azure regions

Special regions

  • For compliance or legal purposes.

  • Azure Government

    • US DoD Central, US Gov Virginia, US Gov Iowa and more

    • 📝 Physically and logically network-isolated instances of Azure for US government agencies and partners.

  • China East, China North and more

    • Unique partnership between Microsoft and 21Vianet

    • Microsoft does not directly maintain the datacenters.

Geographies

  • Each region belongs to a single geography

  • Defined by geopolitical boundaries or country borders.

  • Has specific service availability, compliance, and data residency/sovereignty rules applied to it

  • Fault-tolerant to withstand complete region failure through their connection to dedicated networking infrastructure

    • 📝 Fault tolerance: an application's ability to detect and correct problems in its environment on its own
  • Data residency

    • Defines the legal or regulatory requirements imposed on data

    • Based on the country or region in which it resides

    • 💡 An important consideration when planning out your application data storage.

  • Geographies are broken up into the following areas

    • Americas

    • Europe

    • Asia Pacific

    • Middle East and Africa

  • Read more: Azure geographies

Availability Zones

  • 📝 Physically separate datacenters within an Azure region.

  • 💡 Allows you to make applications highly available through redundancy.

    • Replicate your compute, storage, networking, and data resources in other zones.

    • Costs more

    • Primarily for VMs, managed disks, load balancers, and SQL databases

    • Zonal services: Pin resource to a specific zone.

    • Zone-redundant services: Replicates automatically across zones.

  • Have independent power, cooling, and networking

  • Set up to be an isolation boundary

    • If one zone goes down, the other continues working
  • Identified as zones 1, 2, and 3

    • Logically mapped to the actual physical zones for each subscription independently.

    • Availability Zone 1 in a given subscription might refer to a different physical zone than Availability Zone 1 in a different subscription.

  • Connected through high-speed, private fiber-optic networks.

  • ❗There are regions that do not support (multiple) availability zones

Region Pairs

  • Each Azure region is paired with another region, normally within the same geography

    • E.g. West US paired with East US, and Southeast Asia paired with East Asia
  • Paired regions are at least 300 miles (≈ 480 km) apart.

  • Allows for the replication of resources, e.g. virtual machine storage

    • Some services offer automatic geo-redundant storage using region pairs.
  • Reduce the likelihood of interruptions to both regions

    • E.g. natural disasters, civil unrest, power outages, or physical network outages
  • If one region fails, services that support it can fail over to the other region in the pair.

  • Data continues to reside within the same geography as its pair (except for Brazil South) for tax and law enforcement jurisdiction purposes.

  • If there's an extensive Azure outage =>

    • One region out of every pair is prioritized to make sure at least one is restored as quickly as possible.
  • Planned Azure updates are rolled out to paired regions one region at a time to minimize downtime and risk of application outage.

Service-level Agreements (SLA)

  • Formal documents that define the performance standards Azure commits to.

  • They also specify what happens if a service or product fails to perform to its governing SLA's specification.

  • There are SLAs for individual Azure products and services.

  • ❗ Azure does not provide SLAs for most services under the Free or Shared tiers

    • e.g. Azure Advisor
  • Three key characteristics of SLAs for Azure products and services:

    1. Performance Targets

      • Specific to each Azure product and service.

      • E.g. uptime guarantees or connectivity rates

    2. Uptime and Connectivity Guarantees

      • 📝 Monthly Uptime % = (Maximum Available Minutes − Downtime) / Maximum Available Minutes × 100

      • 📝 Range from 99.9% ("three nines") to 99.999% ("five nines") for any paid tier service.

        • In other words, the minimum SLA for any non-free Azure service is 99.9%
      • E.g. Azure Cosmos DB (Database) service SLA offers 99.999 percent uptime

        • meaning it allows for about 5 minutes of total downtime per year.

        • also includes low-latency commitments of less than 10 milliseconds on DB read + write operations.

    3. 📝 Service credits

      • Given to paying Azure customers if uptime percentage is lower than given in SLA.

      • Describe how Microsoft will respond if an Azure product or service fails to perform to its governing SLA's specification.

      • E.g. customers may have a discount applied to their Azure bill, as compensation for an under-performing Azure product or service.

  • Read more: SLA Summary for Azure Services
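The uptime formula and the Cosmos DB "about 5 minutes per year" figure above can be checked with a few lines of Python. This is only a minimal sketch: the function names, the 30-day month, the 365-day year, and the 50-minute outage are assumptions made for illustration and do not come from any Azure SLA document.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200; assumes a 30-day month
MINUTES_PER_YEAR = 365 * 24 * 60         # 525,600; assumes a 365-day year


def monthly_uptime_percent(max_available_minutes, downtime_minutes):
    """(Maximum Available Minutes - Downtime) / Maximum Available Minutes x 100."""
    if max_available_minutes <= 0:
        raise ValueError("max_available_minutes must be positive")
    return (max_available_minutes - downtime_minutes) / max_available_minutes * 100


def meets_sla(uptime_percent, sla_percent):
    """True when the measured uptime is at least the SLA target."""
    return uptime_percent >= sla_percent


# Hypothetical month with 50 minutes of downtime.
uptime = monthly_uptime_percent(MINUTES_PER_30_DAY_MONTH, 50)
print(f"{uptime:.3f}%")         # 99.884%
print(meets_sla(uptime, 99.9))  # False: below a 99.9% target

# Yearly downtime allowed by a 99.999% ("five nines") SLA.
five_nines_budget = MINUTES_PER_YEAR * (100 - 99.999) / 100
print(f"{five_nines_budget:.2f} minutes per year")  # 5.26 minutes per year
```

In this hypothetical month, 50 minutes of downtime is already enough to miss a 99.9% target, which is the situation in which service credits would come into play.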

Composite SLA

  • Result of combining SLAs across different service offerings.

  • 📝 Calculating downtime

    • E.g. web app (99.95% SLA from Azure) writes to SQL database (99.99% SLA from Azure)

      • Composite SLA = 99.95 percent × 99.99 percent ≈ 99.94 percent

        • = 0.9995 × 0.9999 ≈ 0.9994
      • This means the combined probability of failure is higher than that of either service on its own

  • You can improve the composite SLA by creating independent fallback paths.

    • E.g. if the SQL Database is unavailable, you can put transactions into a queue for processing at a later time.

      • Web app (99.95%) writes to either SQL Database (99.99%) or queue (99.9%)

      • Application is still available even if it can't connect to the database.

        • ❗But it fails if both the database and the queue fail simultaneously.
      • The expected fraction of time for a simultaneous failure is 0.0001 × 0.001 = 0.0000001

        • so the composite SLA for this combined path of a database or queue would be:

          • 1.0 − (0.0001 × 0.001) = 99.99999 percent
      • If we add the queue to our web app, the total composite SLA is:

        • 99.95 percent × 99.99999 percent = ~99.95 percent

      • Improves SLA but application logic gets more complicated

        • You are paying more to add the queue support and there may be data-consistency issues you'll have to deal with due to retry behavior.
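The composite SLA arithmetic above can be reproduced with a short Python sketch. The helper names `serial_sla` and `fallback_sla` are made up for this example, and the calculation assumes the services fail independently of each other, which real deployments may not fully satisfy.

```python
from math import prod


def serial_sla(*slas):
    """Composite SLA when every service must be up (SLAs as fractions, e.g. 0.9995)."""
    return prod(slas)


def fallback_sla(*slas):
    """Composite SLA when any one of several independent alternatives is enough."""
    # The path fails only if all alternatives fail at the same time.
    return 1 - prod(1 - sla for sla in slas)


web_app, sql_db, queue = 0.9995, 0.9999, 0.999

without_queue = serial_sla(web_app, sql_db)
db_or_queue = fallback_sla(sql_db, queue)
with_queue = serial_sla(web_app, db_or_queue)

print(f"web app + database:            {without_queue:.4%}")  # 99.9400%
print(f"database or queue:             {db_or_queue:.5%}")    # 99.99999%
print(f"web app + (database or queue): {with_queue:.4%}")     # 99.9500%
```

With the fallback path in place, the web app's own 99.95% becomes the limiting factor, so adding more fallbacks behind it would not raise the total much further.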

Application SLA

  • By creating your own SLAs, you can set performance targets to suit your specific Azure application.

  • 💡 >= four 9's (99.99%) SLA performance targets =>

    • manual recovery from failures may not be fast enough

    • should have self-diagnosing & self-healing solutions.
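To see why manual intervention becomes impractical at four nines, it helps to turn an SLA target into a downtime budget. The sketch below is illustrative only: the period lengths (30-day month, 365-day year) and the function name `downtime_budget` are assumptions.

```python
PERIOD_MINUTES = {
    "day": 24 * 60,
    "week": 7 * 24 * 60,
    "30-day month": 30 * 24 * 60,
    "365-day year": 365 * 24 * 60,
}


def downtime_budget(sla_percent):
    """Minutes of downtime allowed per period while still meeting sla_percent."""
    return {
        period: minutes * (100 - sla_percent) / 100
        for period, minutes in PERIOD_MINUTES.items()
    }


for target in (99.9, 99.99, 99.999):
    budget = downtime_budget(target)
    row = ", ".join(f"{period}: {minutes:.2f} min" for period, minutes in budget.items())
    print(f"{target}% -> {row}")
```

At 99.99% the budget works out to roughly one minute per week (10,080 × 0.0001 ≈ 1.008), which leaves little room for a person to notice an alert, diagnose the problem, and fix it by hand.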

Resiliency

  • Resiliency is the ability of a system to recover from failures and continue to function.

  • High availability and disaster recovery are two crucial components of resiliency

    • 📝 Disaster recovery: when Godzilla destroys your data center, you have alternative locations to keep providing your service, plus the protocols/means for the other location to take over delivering it.
  • Failure Mode Analysis (FMA)

    • Goal:

      • Identify possible points of failure.

      • Define how the application will respond to those failures.

  • Read more: Designing resilient applications for Azure

High availability

  • 📝 Availability is often given as percentage uptime

  • Refers to the time that a system is functional and working.

  • Most providers prefer to maximize the availability of their Azure solutions by minimizing downtime.

    • ❗ As you increase availability, you also increase the cost and complexity of your solution.

    • As your solution grows in complexity, you will have more services depending on each other.

      • You might overlook possible failure points in your solution if you have any interdependent services.

      • 💡E.g. a workload that requires 99.99 percent uptime shouldn't depend upon a service with a 99.9 percent SLA.

  • Read more: Availability choices for Azure compute
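The rule of thumb that a workload shouldn't depend on a service with a lower SLA than its own target can be expressed as a simple check. This is a hedged sketch: the dependency names and SLA values below are hypothetical, and `weak_dependencies` is a made-up helper, not part of any Azure tooling.

```python
def weak_dependencies(target_percent, dependency_slas):
    """Return the dependencies whose SLA is below the workload's uptime target."""
    return {
        name: sla
        for name, sla in dependency_slas.items()
        if sla < target_percent
    }


# Hypothetical dependencies of a workload that targets 99.99% uptime.
dependencies = {"database": 99.99, "queue": 99.9, "cache": 99.95}

print(weak_dependencies(99.99, dependencies))  # {'queue': 99.9, 'cache': 99.95}
```

Any dependency flagged here would need either a higher service tier or an independent fallback path before the 99.99% target is realistic.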


Anand Raval


Hello, I am Anand Raval. I have worked in robotics (Arduino Uno), front-end web development, and competitive programming, and I am currently focusing on #90DaysOfDevops.