<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Osama's Tech Blog]]></title><description><![CDATA[Osama's Tech Blog]]></description><link>https://blog.osshaikh.com</link><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 15:26:12 GMT</lastBuildDate><atom:link href="https://blog.osshaikh.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[From Chaos to RCA: How Azure SRE Agent Investigates Production Incidents (Part 1)]]></title><description><![CDATA[Introduction
It usually starts the same way.An application goes down. Users report errors. A alert gets fired. Someone opens the Azure portal, someone else starts checking logs, and within minutes the]]></description><link>https://blog.osshaikh.com/from-chaos-to-rca-a-practical-look-at-azure-sre-agent-part-1</link><guid isPermaLink="true">https://blog.osshaikh.com/from-chaos-to-rca-a-practical-look-at-azure-sre-agent-part-1</guid><category><![CDATA[Azure]]></category><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><category><![CDATA[ai agents]]></category><category><![CDATA[SRE]]></category><category><![CDATA[monitoring]]></category><category><![CDATA[GitHub]]></category><dc:creator><![CDATA[Osama Shaikh]]></dc:creator><pubDate>Thu, 16 Apr 2026 10:04:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/c798df5f-c514-404b-a095-16940b0f9c94.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Introduction</h2>
<p>It usually starts the same way. <strong>An application goes down.</strong> Users report errors. An alert gets fired. Someone opens the Azure portal, someone else starts checking logs, and within minutes the team is juggling dashboards, alerts, traces, deployment history, configuration settings, and GitHub commits — all while pressure keeps rising.</p>
<p>At first glance, modern cloud-native systems look highly observable. We have Azure Monitor, Application Insights, Datadog, Prometheus, New Relic, distributed tracing, metrics, logs, alerts, and dashboards everywhere. But when a real incident happens, most teams still fall back to the same manual process: <strong>hunt for symptoms, jump across tools, form a hypothesis, test it, repeat</strong>.</p>
<p>That approach breaks down quickly in environments built on microservices, containers, App Services, multiple data sources, and 24/7 uptime expectations. The problem isn’t that we lack telemetry. The problem is that <strong>humans still have to correlate everything fast enough to reduce customer impact</strong>.</p>
<p>This is where the Azure SRE Agent comes in.</p>
<p>Not as a chatbot.<br />Not as “AI for the sake of AI.”<br />But as an operational teammate that can reason through incidents, use prior knowledge, investigate across telemetry and code, and help engineers move from symptoms to root cause much faster.</p>
<p>In this post, I’ll walk through <strong>two practical scenarios</strong> from my lab environment:</p>
<ol>
<li><p><strong>A .NET application outage returning HTTP 503 errors</strong></p>
</li>
<li><p><strong>Autonomous handling of a high response time alert</strong></p>
</li>
</ol>
<p>The interesting part isn’t that the agent can read logs. Plenty of tools can do that.<br />The interesting part is how the agent combines <strong>telemetry, knowledge files, runtime configuration, and engineering workflows</strong> into a single investigation loop.</p>
<h2>Why SRE Agent?</h2>
<p>Over the years, one pattern has shown up again and again: adding more dashboards and more alerts does not automatically improve reliability. In many organizations, it does the opposite. It creates more noise, more fatigue, and more context switching.</p>
<p>Most incidents are not caused by a single obvious failure. They’re caused by a chain of small things:</p>
<ul>
<li><p>a config drift no one noticed,</p>
</li>
<li><p>a deployment mismatch,</p>
</li>
<li><p>an auth change,</p>
</li>
<li><p>a missing secret,</p>
</li>
<li><p>a misleading health signal,</p>
</li>
<li><p>a known issue sitting somewhere in a runbook nobody has time to read during an outage.</p>
</li>
</ul>
<p>This is where an SRE agent becomes interesting. Instead of only surfacing symptoms, it can:</p>
<ul>
<li><p>gather context from prior operational knowledge,</p>
</li>
<li><p>inspect metrics, logs, and platform state,</p>
</li>
<li><p>correlate signals across layers,</p>
</li>
<li><p>check code and repository history,</p>
</li>
<li><p>propose or apply remediation,</p>
</li>
<li><p>and keep the human engineer in control for sensitive actions.</p>
</li>
</ul>
<p>In other words, it doesn’t just help you <strong>observe</strong> a problem. It helps you <strong>investigate</strong> it.</p>
<h2>This is Part 1 of a 2-part series</h2>
<p><strong>Part 1:</strong> What Azure SRE Agent can do — real incident investigation and resolution workflows</p>
<p><strong>Part 2:</strong> How it works — architecture, configuration, tools, memory, permissions, and implementation details</p>
<h3>Real world Incident Scenarios:</h3>
<p><strong>Scenario 1: Application Not Accessible with HTTP 503 error</strong></p>
<p>The first scenario starts with a very common but frustrating production symptom: <strong>the application is down, but the platform looks fine</strong>.</p>
<p>I had a .NET application running on <strong>Azure App Service</strong>, backed by <strong>Azure SQL</strong>. When I opened the site, instead of the application UI, I was greeted with a generic application error page.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/375fbaca-6f83-4d69-8506-ceff8b2947e6.png" alt="" style="display:block;margin:0 auto" />

<p>At this point, nothing about the error was especially helpful. It told me the app was broken, but not <em>why</em>.</p>
<p>So I did what most engineers do first: I checked the Azure portal.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/6fcefe5b-810a-4702-aace-168f112f5f9e.png" alt="" style="display:block;margin:0 auto" />

<p>The initial signal was misleading. App Service was running. The database was running. Basic resource health didn’t immediately scream “critical platform issue.” There were HTTP 5xx errors visible in monitoring, but CPU and memory were not showing the kind of dramatic spike you’d normally expect during a major failure.</p>
<p>This is exactly the kind of situation where manual incident response starts to drag:</p>
<ul>
<li><p>the platform is “up,”</p>
</li>
<li><p>the app is effectively down,</p>
</li>
<li><p>the dashboards are incomplete,</p>
</li>
<li><p>and the root cause is hiding somewhere between runtime, config, and code.</p>
</li>
</ul>
<p>So instead of manually hopping across five different tools, I asked the <strong>Azure SRE Agent</strong> to investigate:</p>
<blockquote>
<p>“My dotnet application is not working, I see application error on webpage, can you investigate and find out why is that?”</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/52127db3-a748-4b1b-a927-f804330ee72a.png" alt="" style="display:block;margin:0 auto" />

<p>What happened next is what made the experience feel different from traditional monitoring.</p>
<p>The agent did not begin with blind trial and error. It first pulled in context from its <strong>knowledge files</strong>, specifically files like <code>debugging.md</code> and <code>deployment.md</code>. That’s important, because real incident response is rarely just about live telemetry. It also depends on <strong>what the team already knows</strong> about the service.</p>
<p>In my setup, the agent’s operational memory is not stored as a messy chronological log. It is organized <strong>semantically by topic</strong>.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/f8bbc161-af02-4f11-ada4-9ec63357623a.png" alt="" style="display:block;margin:0 auto" />

<p>That structure matters. Instead of searching through one giant document, the agent can jump directly into focused knowledge areas such as:</p>
<ul>
<li><p>service overview,</p>
</li>
<li><p>team and ownership,</p>
</li>
<li><p>architecture,</p>
</li>
<li><p>logs and queries,</p>
</li>
<li><p>deployment details,</p>
</li>
<li><p>auth behavior,</p>
</li>
<li><p>debugging patterns,</p>
</li>
<li><p>and known issue references.</p>
</li>
</ul>
<p>That gives the agent something most dashboards do not: <strong>historical operational context</strong>.</p>
<p>In this case, the knowledge files already referenced a <strong>known recurring issue around SQL authentication</strong>. So the agent used that as a working hypothesis, but importantly, it didn’t stop there. It still moved on to validate the application’s real-time state through telemetry and platform inspection.</p>
<p>The agent checked:</p>
<ul>
<li><p>App Service health and metrics,</p>
</li>
<li><p>container/runtime configuration,</p>
</li>
<li><p>logs,</p>
</li>
<li><p>startup behavior,</p>
</li>
<li><p>and application exceptions.</p>
</li>
</ul>
<p>It noticed that CPU and memory were not showing meaningful activity, which suggested something deeper than a normal runtime load problem. It also recognized that the container might not actually be starting correctly.</p>
<p>Because I had configured the agent in <strong>review mode</strong>, it did not make changes silently. When it needed to access or modify configuration, it explicitly asked for approval.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/2b72cf7b-cc69-4fdc-9860-712ff15c2cf6.png" alt="" style="display:block;margin-left:auto" />

<p>That review model is a big part of what makes this usable in real enterprise environments. The value is not “uncontrolled automation.” The value is <strong>fast investigation with human guardrails</strong>.</p>
<p>Once approved, the agent started walking through App Service configuration, logs, and runtime settings.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/a4d7c6f3-c3ee-4517-b388-aaf37df88ee2.png" alt="" style="display:block;margin:0 auto" />

<p>The first clear issue it found was not in the application code at all — it was in the hosting configuration.</p>
<p>The container image exposed port <strong>8080</strong>, but the App Service configuration was missing the required <code>WEBSITES_PORT</code> setting. Even if the app code and database auth had been correct, App Service would still not have routed traffic properly to the container.</p>
<p>At the same time, the agent also confirmed the earlier knowledge-based suspicion: the application was using <strong>SQL authentication</strong>, while the SQL server was configured in a way that conflicted with that access pattern.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/2b8314bf-90fc-4d57-8ee9-7dc0a9d54cbb.png" alt="" style="display:block;margin:0 auto" />

<p>This is where the investigation started to become more powerful than a typical dashboard workflow.</p>
<p>A human engineer under pressure might fix the first visible issue and move on. But the agent continued reasoning through the dependency chain:</p>
<ul>
<li><p>one issue in container routing,</p>
</li>
<li><p>another in database authentication,</p>
</li>
<li><p>and likely more hidden misconfigurations behind them.</p>
</li>
</ul>
<p>It applied the first round of fixes and re-validated the service.</p>
<p>But the application was <strong>still returning HTTP 503</strong>.</p>
<p>That’s the point where many incident bridges get stuck: one fix has been applied, the expectation is recovery, and when recovery doesn’t happen, the whole investigation needs to widen again.</p>
<p>The agent did exactly that. It went back into the App Service configuration and noticed something subtle but critical: the <code>DOCKER_REGISTRY_SERVER_PASSWORD</code> setting was effectively empty. That meant the container could not pull its image from Azure Container Registry correctly.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/8fe54bf4-7288-4947-8d59-8250a6bef657.png" alt="" style="display:block;margin:0 auto" />

<p>Now the picture was clear. This outage was not caused by one dramatic failure. It was caused by <strong>multiple compounding misconfigurations</strong>:</p>
<ol>
<li><p>Missing container port configuration (<code>WEBSITES_PORT</code>)</p>
</li>
<li><p>SQL authentication mismatch</p>
</li>
<li><p>Missing container registry credentials</p>
</li>
<li><p>And, as the final investigation summary confirmed, a connectivity issue related to SQL access configuration as well</p>
</li>
</ol>
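<p>To make the configuration side of this incident concrete, here is a hedged, illustrative sketch of the App Service app settings involved, written as simple key/value pairs. Only <code>WEBSITES_PORT</code> and <code>DOCKER_REGISTRY_SERVER_PASSWORD</code> were explicitly called out by the agent; the registry URL and username settings are included just to round out the picture, and all values are placeholders.</p>
<pre><code class="lang-yaml"># Illustrative App Service app settings (values are placeholders, not from the real environment)
WEBSITES_PORT: "8080"                                          # must match the port the container image exposes
DOCKER_REGISTRY_SERVER_URL: "https://&lt;acr-name&gt;.azurecr.io"    # registry endpoint (assumed for illustration)
DOCKER_REGISTRY_SERVER_USERNAME: "&lt;acr-name&gt;"                  # registry credential (assumed for illustration)
DOCKER_REGISTRY_SERVER_PASSWORD: "&lt;registry-password&gt;"         # this setting was effectively empty during the incident
</code></pre>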
<p>After applying the necessary fixes, the agent validated the service again.</p>
<p>This time, the results were exactly what you want to see during an incident: the application recovered, the endpoints returned <strong>HTTP 200</strong>, and the UI became available again.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/56e3d1a2-3a1f-4474-b11d-8dd78cb99d41.png" alt="" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/6e42534c-b804-4a5d-8acc-299d866b19ba.png" alt="" style="display:block;margin:0 auto" />

<p>What I liked most about this scenario is that the agent did not behave like a scripted “runbook bot.” It behaved more like a junior-but-fast SRE partner:</p>
<ul>
<li><p>it started from known issues,</p>
</li>
<li><p>validated live telemetry,</p>
</li>
<li><p>checked config,</p>
</li>
<li><p>correlated runtime behavior,</p>
</li>
<li><p>asked for approval before making changes,</p>
</li>
<li><p>and kept investigating even after the first fix did not fully resolve the incident.</p>
</li>
</ul>
<p>The final recommendation was also the right long-term one:<br />instead of relying on fragile SQL credentials, move the application toward <strong>Managed Identity / Entra-based authentication</strong> to reduce future failures.</p>
<p>That is what a good SRE workflow should do: not only restore service, but also point toward a more durable operating model.</p>
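<p>As a reference, here is a hedged sketch of what that shift can look like at the configuration level: a connection string app setting that uses Entra ID / Managed Identity instead of a SQL username and password. The setting name and server/database values are placeholders; the <code>Authentication</code> keyword follows Microsoft.Data.SqlClient syntax.</p>
<pre><code class="lang-yaml"># Hypothetical App Service connection string setting using Managed Identity (no SQL credentials)
ConnectionStrings__DefaultConnection: "Server=tcp:&lt;sql-server&gt;.database.windows.net,1433;Database=&lt;db-name&gt;;Authentication=Active Directory Managed Identity;Encrypt=True;"
</code></pre>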
<h3><strong>Scenario 2: Autonomous Handling of Incident</strong></h3>
<p>The <strong>first</strong> scenario showed guided investigation in review mode.<br />The <strong>second scenario</strong> shows what happens when the agent is connected to an <strong>incident platform</strong> and can begin working automatically from an alert.</p>
<p>This time, the trigger was not a user-facing outage page. It was a <strong>Sev2 Azure Monitor alert</strong> for <strong>high response time</strong>.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/9c214330-26b5-4ba7-a0c1-31f0828d913c.png" alt="" style="display:block;margin:0 auto" />

<p>The alert indicated that requests were averaging over 5 seconds. That kind of alert is common — and often noisy. Sometimes it points to a real production issue. Sometimes it reflects a spike, a bad test path, or a monitoring rule that needs more context.</p>
<p>What matters is how quickly the investigation can distinguish between those possibilities.</p>
<p>As soon as the alert was detected, the SRE Agent acknowledged it and created an investigation plan.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/90c97fa3-686b-41b7-9893-0cdb31216df6.png" alt="" style="display:block;margin:0 auto" />

<p>This part is important.</p>
<p>Instead of immediately jumping to one likely cause, the agent laid out a structured plan:</p>
<ol>
<li><p>gather knowledge from memory files,</p>
</li>
<li><p>search past incidents,</p>
</li>
<li><p>inspect the currently affected app,</p>
</li>
<li><p>correlate recent changes,</p>
</li>
<li><p>and then mitigate.</p>
</li>
</ol>
<p>That sequencing mirrors how a strong human SRE would approach the issue — but the agent can do it in parallel and without fatigue.</p>
<p>During investigation, it pulled from prior knowledge, reviewed logs, and correlated the alert with application behavior. It discovered that the high response time was not simply a generic infrastructure slowdown. A key part of the latency was being driven by <strong>chaos/testing endpoints</strong> being hit and by application behavior that still needed code fixes.</p>
<p>The agent did something particularly valuable here: it connected the incident to the <strong>engineering workflow</strong>, not just to the monitoring workflow.</p>
<p>It created a <strong>GitHub issue</strong> summarizing the incident, identifying the root cause, and documenting the remediation path.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/9fc18887-4451-4f2a-874a-3e3a6186d0e9.png" alt="" style="display:block;margin:0 auto" />

<p>That issue became more than an alert record. It became an engineering artifact:</p>
<ul>
<li><p>what happened,</p>
</li>
<li><p>what caused it,</p>
</li>
<li><p>what had already been investigated,</p>
</li>
<li><p>and what change was needed next.</p>
</li>
</ul>
<p>The agent then moved toward code-level remediation:</p>
<ul>
<li><p>it identified the needed changes,</p>
</li>
<li><p>prepared a fix branch / PR flow,</p>
</li>
<li><p>and connected the operational incident to the software delivery pipeline.</p>
</li>
</ul>
<p>The remediation path included both <strong>monitoring-side correction</strong> and <strong>application-side correction</strong>:</p>
<ul>
<li><p>refining the alert rule logic so chaos endpoints did not distort signal quality,</p>
</li>
<li><p>and updating application behavior to reduce timeout-related failure patterns.</p>
</li>
</ul>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/f7a4b993-42fe-4a2e-afde-8b91c48059aa.png" alt="" style="display:block;margin:0 auto" />

<p>What makes this scenario compelling is that the agent did not stop at “I found the problem.” It helped push the process all the way toward:</p>
<ul>
<li><p>alert understanding,</p>
</li>
<li><p>signal cleanup,</p>
</li>
<li><p>code fix identification,</p>
</li>
<li><p>deployment workflow,</p>
</li>
<li><p>and verification.</p>
</li>
</ul>
<p>That is a very different experience from the traditional pattern where one tool raises the alert, another tool shows logs, a third tool tracks the issue, and a fourth system eventually gets the fix.</p>
<p>Here, the incident workflow becomes much more continuous.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/77733ce7-c793-446a-9daa-97b80755f4c7.png" alt="" style="display:block;margin:0 auto" />

<h3>What Scenario 2 Demonstrates</h3>
<p>If Scenario 1 was about guided troubleshooting, Scenario 2 is about something broader: <strong>operational orchestration</strong>.</p>
<p>The agent is not just a log reader. It becomes a bridge between:</p>
<ul>
<li><p>observability,</p>
</li>
<li><p>incident handling,</p>
</li>
<li><p>historical context,</p>
</li>
<li><p>GitHub engineering workflows,</p>
</li>
<li><p>and remediation execution.</p>
</li>
</ul>
<p>That’s where the SRE model starts to evolve.</p>
<p>Instead of engineers spending most of their time gathering context, the agent does the context gathering first. The human engineer can then focus on:</p>
<ul>
<li><p>approving sensitive changes,</p>
</li>
<li><p>validating business impact,</p>
</li>
<li><p>reviewing code changes,</p>
</li>
<li><p>and making the final operational judgment.</p>
</li>
</ul>
<p>That shift matters because the biggest cost in many incidents is not only the outage itself — it is the <strong>time lost assembling the investigation</strong>.</p>
<h3><strong>Why THIS Matters</strong></h3>
<p>There’s a lot of hype right now around AI in operations. Some of it is useful. A lot of it is vague. That’s why I prefer to evaluate these systems through practical incident scenarios.</p>
<p>For me, the interesting question is not:</p>
<blockquote>
<p>“Can AI read logs?”</p>
</blockquote>
<p>The interesting question is:</p>
<blockquote>
<p>“Can an agent meaningfully reduce investigation time, correlate across layers, and safely help engineers move toward resolution?”</p>
</blockquote>
<p>From these scenarios, the answer looks increasingly like <strong>yes, with the right guardrails</strong>.</p>
<p>What stood out to me most was not raw automation. It was the combination of:</p>
<ul>
<li><p><strong>memory</strong> (knowledge files and past learnings),</p>
</li>
<li><p><strong>reasoning</strong> (moving beyond one-signal diagnosis),</p>
</li>
<li><p><strong>workflow integration</strong> (Azure resources + GitHub),</p>
</li>
<li><p>and <strong>human control</strong> (approval gates for risky actions).</p>
</li>
</ul>
<p>That combination is what takes an SRE agent beyond a fancy assistant and makes it operationally relevant.</p>
<h3><strong>Conclusion</strong></h3>
<p>Traditional monitoring stacks are very good at telling us <strong>something is wrong</strong>.</p>
<p>What they usually do not do well is help us answer, quickly and reliably:</p>
<ul>
<li><p>What exactly failed?</p>
</li>
<li><p>What changed?</p>
</li>
<li><p>Is this a known issue?</p>
</li>
<li><p>Is it platform, config, code, auth, or deployment?</p>
</li>
<li><p>What is the safest next action?</p>
</li>
<li><p>And how do we prevent this from happening again?</p>
</li>
</ul>
<p>That is the gap Azure SRE Agent begins to fill.</p>
<p>In the <strong>first scenario</strong>, it helped trace a 503 outage across telemetry, memory, config, registry credentials, and SQL access issues.<br />In the <strong>second</strong>, it moved from alert detection into structured incident investigation and GitHub-driven remediation.</p>
]]></content:encoded></item><item><title><![CDATA[Building an AI-powered IT Ops agent for on-premises servers using Microsoft Foundry Agent Service, Azure Arc & MCP]]></title><description><![CDATA[Introduction:
Recently, a customer asked me a question that really got me thinking:: "Can we build an AI agent that integrates with our datacenter — one that IT admins talk to in plain English, and it]]></description><link>https://blog.osshaikh.com/building-an-ai-powered-it-ops-agent</link><guid isPermaLink="true">https://blog.osshaikh.com/building-an-ai-powered-it-ops-agent</guid><category><![CDATA[Azure]]></category><category><![CDATA[agents]]></category><category><![CDATA[agentic AI]]></category><category><![CDATA[foundry]]></category><category><![CDATA[openai]]></category><category><![CDATA[agentic]]></category><category><![CDATA[mcp]]></category><dc:creator><![CDATA[Osama Shaikh]]></dc:creator><pubDate>Tue, 03 Mar 2026 14:25:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/ac9cdccb-cc38-4689-994f-9604c20dc509.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Introduction:</h3>
<p>Recently, a customer asked me a question that really got me thinking: "Can we build an AI agent that integrates with our datacenter — one that IT admins talk to in plain English, and it figures out which tools to run, in what order, to get the job done?"</p>
<p>Not a chatbot. Not a script wrapper. An agent that reasons about the problem, picks the right automation tools and runbooks, executes them, and explains what it found — all while respecting security controls &amp; ITSM policies.</p>
<p>So I built one as an MVP for enterprise customer IT admins. Here are the details of the solution.</p>
<h3>Solution Overview</h3>
<p>The solution is an AI-powered IT Operations agent that manages on-premises servers through conversation. You type "Why is my ABC server running slow?" and the agent SSHs into the server, runs diagnostics, analyzes the results, and tells you the root cause — usually within a few seconds, depending on the nature of the question.</p>
<p><strong>The following are the high-level components of this solution</strong></p>
<p><strong>Azure Arc</strong>: acts as the bridge between Azure and the datacenter servers. Your servers stay in your datacenter; Azure Arc installs a lightweight agent that creates a secure outbound private connection to Azure — no VPN, no public IPs, no firewall holes. The AI agent reaches your server securely through this relay using SSH.</p>
<p><strong>MCP Server</strong>: an MCP server for Azure Arc built with the FastMCP SDK, which exposes the API as two core AI-accessible components: Tools &amp; Resources. Azure Arc API operations involve complex resource IDs and specific command syntax; the MCP server abstracts this so you can simply ask, <em>"Which of my Arc servers in the datacenter have missing security patches?"</em></p>
<p><strong>AI Agent</strong>: built using the Foundry Agent SDK and hosted on the Foundry Agent Service. The agent receives your request, decides which tools to call and in what order, interprets the results, and follows up if needed. Ask it to investigate high CPU — it runs a full diagnosis first, spots a memory-pressure error in the output, then pulls the event logs to confirm the root cause. That multi-step reasoning is what separates an agent from a conversational bot.</p>
<p><strong>WebApp</strong>: the chat UI that interacts with the AI agent running in the Foundry Agent Service runtime. It serves the chat UI and handles the agent conversation loop: when you type a message, the backend sends it to the Foundry Agent, waits for tool-call requests, executes them by calling the MCP server's REST API, submits results back to the agent, and streams the final reply to your browser.</p>
<p><strong>Durable Functions</strong>: alerts for the Arc servers are configured in Azure Monitor. When something goes wrong on an Arc-enabled server — high CPU, a service crash, a disk filling up — Azure Monitor fires a webhook to a URL, and that URL needs to be always listening and ready to receive it. Azure Functions gives us that always-on webhook endpoint without running a dedicated server 24/7: it sits idle costing nothing until an alert arrives, spins up in milliseconds, hands the alert to the agent, and goes back to sleep.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/c2040046-e525-4ecd-a1ef-55843f25070a.png" alt="" style="display:block;margin:0 auto" />

<p>Four major components of the architecture:</p>
<table>
<thead>
<tr>
<th><strong>Component</strong></th>
<th><strong>What it does</strong></th>
<th><strong>Where it runs</strong></th>
</tr>
</thead>
<tbody><tr>
<td><strong>Web App</strong></td>
<td>Chat interface + handles the AI-to-tool execution loop</td>
<td>Azure Container App</td>
</tr>
<tr>
<td><strong>AI Agent</strong></td>
<td>Reasons about problems, decides which tools to call</td>
<td>Microsoft Foundry Agent Service</td>
</tr>
<tr>
<td><strong>MCP Server</strong></td>
<td>Executes SSH commands on servers, returns results</td>
<td>Azure Container App</td>
</tr>
<tr>
<td><strong>Durable Function</strong></td>
<td>Receives alert webhooks, triggers autonomous investigations</td>
<td>Azure Functions</td>
</tr>
</tbody></table>
<p><strong>Architecture Flow</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/96a7b16f-de82-44bc-afdf-7d756150d9f5.png" alt="" style="display:block;margin:0 auto" />

<p><strong>Key Design Decisions</strong></p>
<p><strong>Azure Arc:</strong><br />The Azure Arc agent enables on-premises servers to become Azure ARM resources, which allows Arc-enabled servers to be managed through Azure-supported interfaces such as the portal UI, CLI, and REST APIs, securely and without being exposed to the internet.</p>
<p><strong>Foundry Agent Service:</strong><br />Zero infrastructure to manage: no GPU provisioning, no container orchestration for the AI runtime. Built-in conversation state: Foundry manages threads natively, so each conversation has persistent memory across messages. Native tool-calling orchestration. Enterprise identity and compliance: the agent authenticates via Managed Identity and Azure RBAC — the same identity layer as every other Azure resource.</p>
<p><strong>SSH over Run Command APIs:</strong><br />Azure Arc gives you two ways to execute commands on your on-prem server.<br /><strong>Run Command</strong> (Azure REST API) — sends a PowerShell script to Azure, which queues it on the Arc agent, which executes it, which reports the result back through Azure. Round trip: 45-90 seconds. Every single time.</p>
<p><strong>SSH via Arc</strong> — opens a real-time SSH tunnel through the Arc relay. The MCP server runs <code>az ssh arc</code>, connects to the server's SSH daemon, executes the PowerShell command, and gets the output. Round trip: 5-7 seconds, including the relay setup, SSH handshake, and command execution. The SSH user can be restricted to specific OS-level groups (Event Log Readers, Performance Monitor Users).</p>
<h3>Security Layers</h3>
<p>All authentication and authorization are facilitated via Entra ID using Azure managed identities; there are no local or hardcoded credentials.</p>
<p>Layer 1: Identity &amp; Authentication</p>
<h4><strong>Managed Identity (All Cloud Components)</strong></h4>
<table>
<thead>
<tr>
<th><strong>Component</strong></th>
<th><strong>Identity Type</strong></th>
<th><strong>Authenticates To</strong></th>
</tr>
</thead>
<tbody><tr>
<td><strong>Web App</strong> (Container App)</td>
<td>System-Assigned MI</td>
<td>Azure AI Foundry Agent Service</td>
</tr>
<tr>
<td><strong>MCP Server</strong> (Container App)</td>
<td>System-Assigned MI</td>
<td>ARM API, Key Vault, Arc SSH Relay</td>
</tr>
<tr>
<td><strong>Durable Function</strong></td>
<td>System-Assigned MI</td>
<td>Azure AI Foundry Agent Service</td>
</tr>
</tbody></table>
<p>Layer 2: Authorization</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/e9b6ede2-9025-4843-886b-b2cd3aacfe50.png" alt="" style="display:block;margin:0 auto" />

<p>Layer 3: Network &amp; Transport</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/832018df-7f1d-49cf-adc1-7774f361c4c8.png" alt="" style="display:block;margin:0 auto" />

<p>Layer 4: OS/App Controls</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/30d8c86a-3889-489c-a33a-ec760f3b120c.png" alt="" style="display:block;margin:0 auto" />

<p>Prerequisites</p>
<ul>
<li><p>Azure Subscription</p>
</li>
<li><p>On-premises Windows/Linux server onboarded to Azure via the Arc agent</p>
</li>
<li><p>Azure AI Foundry project with reasoning LLM models (GPT models)</p>
</li>
<li><p>Python 3.11</p>
</li>
</ul>
<p>💡 <strong>Tip:</strong> If you don't have an Arc server yet, you can use <a href="https://azurearcjumpstart.com/">Azure Arc Jumpstart</a> to set one up quickly with ArcBox.</p>
<p>Walkthrough of Solution Demo:</p>
<p>Part 1: Read-Only Operations (no ticket required)</p>
<p>First, we demonstrate read-only operations — health checks, diagnostics, event log queries, service status. These run freely without any change ticket because they don't modify the server.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/a2ed46b2-b96f-41a2-87b7-0d5700b65ff5.png" alt="" style="display:block;margin:0 auto" />

<p>Health check:<br />A question about the health of a server triggers the corresponding tool call in the MCP server, which performs several steps on the server and responds with results covering server performance, recent errors, and a review of its services.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/d22b60a1-9892-4cab-8421-73b108a94c83.png" alt="" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/a79b23a4-3845-4eb4-85ff-a17f05046802.png" alt="" style="display:block;margin:0 auto" />

<p>Part 2: Change-Controlled Operation (ticket required)</p>
<p>Next, an admin requests a risky operation — restarting a critical service. The agent does not proceed. Instead, it informs the admin that this is a change-controlled operation and asks for a valid ServiceNow CHG or INC ticket number.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/5ea1511c-406f-46e8-8e0b-d695487e9767.png" alt="" style="display:block;margin:0 auto" />

<p>Part 3: Ticket Validation — Denied</p>
<p>The admin provides a ticket number. The agent validates it against ServiceNow and finds the ticket is pending approval — not yet authorized for execution. The agent politely denies the request and asks the admin to get the ticket approved first.</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/11c25748-6122-494e-98cc-0034f63367eb.png" alt="" style="display:block;margin:0 auto" />

<p>Part 4: Ticket Approved — Execution</p>
<p>The admin returns after the ticket is approved. The agent re-validates against ServiceNow, confirms the ticket is now approved, and proceeds to execute the service restart. It reports the result back to the admin with the ticket number referenced for audit</p>
<img src="https://cdn.hashnode.com/uploads/covers/65bcb0daad92b2d5776669b3/cf4565fe-5af5-4c13-ad2a-a668fea5ae83.png" alt="" style="display:block;margin:0 auto" />

<p>For more details on the internals and source code of this solution, check out the GitHub repo:<br /><a href="https://github.com/Osshaikh/IT-Ops-Agent">Osshaikh/IT-Ops-Agent</a></p>
]]></content:encoded></item><item><title><![CDATA[Designing a Unified Log Platform on Azure]]></title><description><![CDATA[Introduction:
Recently I have seen a similar requirement across many enterprises, especially in FSI: a centralised log store and analytics solution spanning their IT infrastructure
In the modern enterprise, IT infrastructure is a complex tapestry woven fr...]]></description><link>https://blog.osshaikh.com/unified-log-platform-azure</link><guid isPermaLink="true">https://blog.osshaikh.com/unified-log-platform-azure</guid><category><![CDATA[Azure]]></category><category><![CDATA[Logs]]></category><category><![CDATA[analytics]]></category><category><![CDATA[#AIOps]]></category><category><![CDATA[storage]]></category><category><![CDATA[data]]></category><category><![CDATA[Data-lake]]></category><dc:creator><![CDATA[Osama Shaikh]]></dc:creator><pubDate>Fri, 29 Aug 2025 11:52:37 GMT</pubDate><content:encoded><![CDATA[<h3 id="heading-introduction">Introduction:</h3>
<p>Recently I have seen a similar requirement across many enterprises, especially in FSI: a centralised log store and analytics solution spanning their IT infrastructure.</p>
<p>In the modern enterprise, IT infrastructure is a complex tapestry woven from on-premises systems, private clouds, and multiple public cloud environments. For large organizations, particularly those in heavily regulated sectors like financial services, managing the sheer volume and diversity of machine-generated logs is not just an operational hurdle – it's a fundamental business requirement for maintaining security, ensuring compliance, and driving performance.</p>
<p>A financial institution faces stringent regulatory requirements for log retention, security monitoring, and compliance reporting. In this case, the current log volume exceeds 100TB daily, creating significant challenges for storage, retention, and cost optimization.</p>
<h3 id="heading-key-challenges"><strong>Key Challenges:</strong></h3>
<p>The complexity and scale of a large enterprise's infrastructure estate, as observed with our financial-sector customer, present significant challenges for traditional or decentralized log management approaches:</p>
<ol>
<li><p><strong>Massive Data Volume:</strong> Handling the ingestion, processing, and storage of hundreds of terabytes (100TB+) of log data daily requires a platform built for scale and efficiency from the ground up.</p>
</li>
<li><p><strong>Diverse Log Sources:</strong> Logs originate from a heterogeneous mix of operating systems (Windows, Linux, Unix, Solaris, mainframes), network devices, applications, security tools, and cloud services (Azure, AWS, GCP), each generating log data in different formats. Normalizing and analyzing these disparate logs is complex.</p>
</li>
<li><p><strong>Compliance Requirements:</strong> Financial institutions face stringent regulations (e.g., SOX, PCI DSS, GDPR, and local financial laws) mandating specific log retention periods (often multi-year), data immutability, audit trails, and strict access controls. Meeting these across a distributed IT environment is critical.</p>
</li>
<li><p><strong>Cost Management:</strong> Storing and analyzing large volumes of data for extended periods can be prohibitively expensive. Optimizing storage costs while maintaining compliance and analytical capabilities is a key concern.</p>
</li>
<li><p><strong>Security Analytics:</strong> Effectively detecting and responding to increasingly sophisticated cyber threats requires aggregating security-relevant logs from all sources and applying advanced analytics and threat intelligence in near real-time.</p>
</li>
<li><p><strong>Performance Troubleshooting:</strong> Quickly identifying the root cause of performance issues across complex application stacks and infrastructure components relies on having centralized, searchable, and correlated performance logs.</p>
</li>
<li><p><strong>Operational Efficiency:</strong> Managing decentralized log collection, storage silos, and disparate analysis tools leads to high operational overhead and hinders rapid insights.</p>
</li>
</ol>
<p>Expectation:</p>
<h3 id="heading-architecture-overview">Architecture Overview:</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758124627927/b884ff75-2e2d-469b-88a7-4eb8b34c86dc.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-data-collection-options">Data Collection Options:</h3>
<p>Being able to collect logs across all these diverse sources is the first major problem, as the estate includes a large number of physical security/network appliances and even many legacy operating systems.</p>
<p>In this solution, different log collection methods are used depending on the type of source. The following is a list of the log collection options available that can be integrated with ADX.</p>
<h3 id="heading-agent-based"><strong>Agent based</strong></h3>
<ul>
<li><p><strong>Azure Monitor Agent</strong><br />  <strong>When should you use AMA?</strong><br />  For Windows/Linux VMs (on-prem or cloud) where you need native Azure integration. Collect Windows Event Logs, Syslog, Performance Counters, IIS logs. Best for time-series telemetry and routing to Log Analytics or ADX via Data Collection Rules (DCR). Ideal for VM-centric monitoring and hybrid environments (with Azure Arc).</p>
</li>
<li><p><strong>Fluent Bit</strong><br />  <strong>When should you use Fluent Bit?</strong><br />  For containers (AKS/Kubernetes), IoT edge, or custom application/system logs. When you need lightweight, high-performance log forwarding with rich parsing (JSON, regex, multiline). Supports multiple outputs (Event Hubs, ADX, Blob, OTLP). Great for streaming pipelines and edge scenarios.<br />  Fluent Bit started as a <strong>log processor and forwarder</strong>, and that’s still its sweet spot. It can now handle metrics and even output in <strong>OTLP</strong>, but its core strength is <strong>fast, lightweight log pipelines</strong>.<br />  <strong>Flexible</strong>: it supports a huge range of outputs — <strong>Kafka, Loki, ADX, Blob, OTLP/HTTP</strong>, and more.</p>
</li>
<li><p><strong>OTel Collector</strong><br />  It natively handles <strong>traces, metrics, and logs</strong> through receivers, processors, exporters, and connectors, providing <strong>multi-signal observability</strong> in a unified pipeline.</p>
<p>  <strong>When should you use the OTel Collector?</strong><br />  When you need multi-signal observability: logs + metrics + traces in one pipeline. For microservices, distributed apps, or multi-cloud environments. Provides advanced processors (sampling, filtering, redaction) and vendor-neutral OTLP. Scales well for complex observability pipelines and integrates with Azure Monitor, ADX, or Event Hubs.</p>
</li>
</ul>
<h3 id="heading-agents-os-supportiblity-matrix"><strong>Agents OS Supportiblity Matrix</strong></h3>
<div class="hn-table">
<table>
<thead>
<tr>
<th><strong>Collector</strong></th><th><strong>Linux</strong></th><th><strong>Windows</strong></th><th><strong>macOS</strong></th><th><strong>FreeBSD</strong></th><th><strong>Solaris</strong></th><th><strong>AIX</strong></th><th><strong>Remarks</strong></th><th><strong>OS-specific limitations</strong></th></tr>
</thead>
<tbody>
<tr>
<td><strong>OpenTelemetry Collector</strong></td><td>✔ Official DEB/RPM/APK packages and Docker images</td><td>✔ MSI &amp; ZIP builds</td><td>⚠ Build from source; no official binary</td><td>✖ Unsupported – build failures reported</td><td>✖ Not supported</td><td>✖ Feature request open; no builds yet</td><td>First-class support only for Linux (x86-64/ARM64/PPC64-le) and Windows; multi-signal pipeline (logs + metrics + traces).</td><td>- macOS: manual Go tool-chain build, no CI.<br />- FreeBSD: known compilation errors.<br />- Solaris/AIX: no upstream support; cannot build some Go deps.</td></tr>
<tr>
<td><strong>Fluent Bit</strong></td><td>✔ RPM/DEB/Alpine packages &amp; Docker images</td><td>✔ Native installer &amp; Chocolatey</td><td>✔ Homebrew / tar.gz</td><td>✔ FreeBSD ports/pkg</td><td>⚠ Source build only; some plugins absent</td><td>⚠ Source build possible; no upstream packages</td><td>Very small C binary; wide plugin ecosystem for logs; OTLP input/output available.</td><td>- Solaris/AIX: must compile; verify needed plugins compile.<br />- FreeBSD: some inputs/outputs may be missing.</td></tr>
<tr>
<td><strong>Azure Arc Agent</strong> (Connected-Machine + Log Analytics)</td><td>✔ Ubuntu, RHEL, SLES, CentOS, Amazon Linux, Oracle Linux</td><td>✔ Windows Server 2008 R2/2012 R2 and newer (incl. Core)</td><td>✖ Not supported</td><td>✖ Not supported</td><td>✖ Not supported</td><td>✖ Not supported</td><td>Purpose-built for Azure governance/monitoring; ships only for Windows &amp; mainstream Linux.</td><td>- Linux: limited to listed distros; requires systemd &amp; kernel ≥ 3.10.<br />- Windows: only server SKUs; no Windows 10/11.<br />- All other Unix variants unsupported.</td></tr>
</tbody>
</table>
</div><h3 id="heading-log-streaming">Log Streaming</h3>
<ul>
<li><p><strong>Event Hubs (Kafka API)</strong>:<br />  When should you use Event Hubs?<br />  High-throughput streaming of logs, metrics, or events from multiple producers. When you need buffering, replay, and decoupling between data producers and consumers. Ideal for real-time pipelines where logs flow from Fluent Bit or the OTel Collector to downstream analytics (ADX, SIEM, Data Lake). Use the Kafka API when existing tools or agents already speak the Kafka protocol.</p>
<p>  Why use Event Hubs (streaming)?</p>
<p>  <strong>Flexibility</strong>: log sources only need to know about the Kafka stream; backend teams can swap the underlying data/log warehouse on Azure without touching edge agents.</p>
<p>  <strong>Durability</strong>: never lose data – logs are written to disk and replicated, so they survive node failures and can be replayed after outages. A minimal edge-agent output sketch targeting Event Hubs follows below.</p>
</li>
</ul>
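<p>To make the "agents only need to know about the Kafka stream" point concrete, here is a minimal, hedged sketch of a Fluent Bit output (YAML configuration format) pointed at the Event Hubs Kafka endpoint. The namespace, event hub name, and connection string are placeholders; Event Hubs speaks Kafka on port 9093 over SASL_SSL with the literal username <code>$ConnectionString</code>.</p>
<pre><code class="lang-yaml"># Hedged sketch: Fluent Bit shipping logs to Azure Event Hubs via the Kafka API
pipeline:
  outputs:
    - name: kafka
      match: "*"
      brokers: mynamespace.servicebus.windows.net:9093   # Event Hubs namespace FQDN (placeholder)
      topics: app-logs                                    # event hub name used as the Kafka topic (placeholder)
      rdkafka.security.protocol: SASL_SSL
      rdkafka.sasl.mechanism: PLAIN
      rdkafka.sasl.username: "$ConnectionString"          # literal value required by Event Hubs
      rdkafka.sasl.password: "Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=send;SharedAccessKey=..."
</code></pre>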
<h3 id="heading-schedule-ingestion">Schedule Ingestion</h3>
<ul>
<li>Azure Data Factory (batch jobs):<br />  When should you use ADF?<br />  For batch ingestion of log files from on-premises or multi-cloud environments. When logs are stored as CSV, JSON, Parquet, or other file formats and need scheduled or event-driven movement. Best for legacy systems or offline sources where real-time streaming isn’t possible. Provides secure hybrid connectivity and broad connector coverage.</li>
</ul>
<h3 id="heading-log-collection-options"><strong>Log Collection options :</strong></h3>
<p>Azure native solution provide comprehensive capablities in different type log Injestion from diverse log sources</p>
<h3 id="heading-windows-servers">Windows Servers</h3>
<ul>
<li><p>Windows Event logs (System, Application, Security)</p>
</li>
<li><p>Performance counters (CPU, Memory, Disk, Network)</p>
</li>
<li><p>IIS logs for web servers</p>
</li>
<li><p>SQL Server logs and metrics</p>
</li>
<li><p>Custom Windows logs and ETW events</p>
</li>
<li><p>Application-specific logs</p>
</li>
</ul>
<h3 id="heading-linux-servers">Linux Servers</h3>
<ul>
<li><p>Syslog messages (all facilities and priorities)</p>
</li>
<li><p>Performance metrics (CPU, Memory, Disk, Network)</p>
</li>
<li><p>Custom log files with configurable parsing</p>
</li>
<li><p>Journal logs from systemd-based systems</p>
</li>
<li><p>Container logs (Docker, Kubernetes)</p>
</li>
<li><p>Application-specific logs (Apache, Nginx, MySQL, etc.)</p>
</li>
</ul>
<h3 id="heading-aixunixsolaris-systems">AIX/Unix/Solaris Systems</h3>
<ul>
<li><p>Syslog forwarding to collection servers</p>
</li>
<li><p>Custom connectors for specialized Unix variants</p>
</li>
<li><p>Script-based collection for legacy systems</p>
</li>
<li><p>Direct integration with Event Hubs</p>
</li>
<li><p>API-based collection methods</p>
</li>
<li><p>Log file exports via SFTP/SCP</p>
</li>
</ul>
<h3 id="heading-network-appliances">Network Appliances</h3>
<ul>
<li><p>Flow logs for traffic analysis</p>
</li>
<li><p>SNMP traps and metrics</p>
</li>
<li><p>Syslog collection from network devices</p>
</li>
<li><p>Azure Network Watcher integration</p>
</li>
<li><p>NSG flow logs</p>
</li>
<li><p>API-based collection from vendor platforms</p>
</li>
</ul>
<h3 id="heading-firewall-devices">Firewall Devices</h3>
<ul>
<li><p>Security events and alerts</p>
</li>
<li><p>Traffic flow logs</p>
</li>
<li><p>Rule match logs</p>
</li>
<li><p>Administrative audit logs</p>
</li>
<li><p>Threat intelligence feeds</p>
</li>
<li><p>VPN connection logs</p>
</li>
</ul>
<h3 id="heading-cloud-services">Cloud Services</h3>
<ul>
<li><p>Native Azure Services logs</p>
</li>
<li><p>Azure Active Directory logs</p>
</li>
<li><p>Microsoft 365 Defender logs</p>
</li>
<li><p>Azure PaaS service logs</p>
</li>
<li><p>3P Cloud Platform Services logs</p>
</li>
</ul>
<h3 id="heading-saas-platform">SaaS Platform</h3>
<ul>
<li><p>Microsoft M365 Services</p>
</li>
<li><p>Salesforce</p>
</li>
<li><p>Qualys</p>
</li>
<li><p>ZScaler</p>
</li>
</ul>
<h2 id="heading-ingestion-patterns"><strong>Ingestion Patterns</strong></h2>
<h3 id="heading-legacy-systems-unixaixsolariscustomos">Legacy Systems (Unix/AIX/Solaris/CustomOS)</h3>
<p>Configure Unix/Solaris systems to forward syslog messages to a central syslog server:</p>
<ul>
<li><p>Deploy Linux servers with Fluent Bit or the OTel Collector as syslog collectors</p>
</li>
<li><p>Configure syslog forwarding on the Unix/Solaris systems to the Fluent Bit forwarder server</p>
</li>
<li><p>Use Fluent Bit filters to reduce noise in the logs being ingested</p>
</li>
<li><p>Parse the raw log data into the structured format defined for the syslog table in ADX (a minimal collector sketch follows this list)</p>
</li>
</ul>
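<p>A minimal, hedged sketch of the collector side of this pattern, using Fluent Bit's YAML configuration format. The listener port, filter pattern, and the ADX cluster, database, table, and mapping names are assumptions for illustration; the <code>azure_kusto</code> output plugin handles ingestion into ADX.</p>
<pre><code class="lang-yaml"># Hedged sketch: Fluent Bit as the central syslog collector for Unix/Solaris forwarders, writing to ADX
pipeline:
  inputs:
    - name: syslog
      mode: udp                           # or tcp, depending on what the legacy systems can forward
      listen: 0.0.0.0
      port: 5140
      parser: syslog-rfc3164              # parse raw syslog lines into structured fields
  filters:
    - name: grep
      match: "*"
      exclude: message healthcheck        # drop obvious noise at the edge (illustrative pattern)
  outputs:
    - name: azure_kusto
      match: "*"
      tenant_id: &lt;tenant-id&gt;
      client_id: &lt;client-id&gt;
      client_secret: &lt;client-secret&gt;
      ingestion_endpoint: https://ingest-&lt;cluster&gt;.&lt;region&gt;.kusto.windows.net
      database_name: logsdb
      table_name: Syslog
      ingestion_mapping_reference: syslog_json_mapping
</code></pre>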
<h2 id="heading-windowslinuxfreebsd-server">Windows/Linux/FreeBSD Server</h2>
<h3 id="heading-option-a-ama-azure-monitor-agent">Option A — AMA (Azure Monitor Agent)</h3>
<p><strong>Best for:</strong> Windows Events, Syslog, <strong>Performance Counters (time‑series)</strong>, IIS logs, “VM observability first”.</p>
<p><strong>Pattern</strong></p>
<ol>
<li><p>Onboard servers to <strong>Azure Arc</strong> (for hybrid) or use native Azure VMs.</p>
</li>
<li><p>Deploy <strong>AMA</strong> and configure <strong>Data Collection Rules (DCR)</strong>:</p>
<ul>
<li><p>Windows: <strong>WindowsEvent</strong>, <strong>IIS</strong> (custom text), <strong>Perf counters</strong></p>
</li>
<li><p>Linux: <strong>Syslog</strong>, <strong>Perf counters</strong></p>
</li>
</ul>
</li>
<li><p>Route to <strong>Log Analytics</strong>; optionally <strong>export</strong> or <strong>data connect</strong> to <strong>ADX</strong> for advanced KQL and long‑term analytics.</p>
</li>
</ol>
<h3 id="heading-option-b-fluent-bit-lightweight-log-shipping-rich-parsing">Option B — Fluent Bit (lightweight log shipping + rich parsing)</h3>
<p><strong>Best for:</strong> Containers/AKS, edge, <strong>file/journald/syslog</strong> logs with <strong>multiline/regex</strong> parsing; routing to <strong>Event Hubs (Kafka API)</strong> → <strong>ADX</strong>.</p>
<p><strong>Pattern</strong></p>
<ol>
<li><p>Install <strong>Fluent Bit</strong> on Windows/Linux/Unix Servers</p>
</li>
<li><p>Collect logs: <strong>journald/systemd</strong>, <strong>syslog</strong>, <strong>file tail</strong> (IIS, App logs).</p>
</li>
<li><p>Apply <strong>filters</strong> (multiline, <code>parser</code>, <code>modify</code>, <code>lua/wasm</code> if needed).</p>
</li>
<li><p>Output to <strong>Azure Data Explorer or Event Hubs (Kafka API)</strong> for buffering and replay (a minimal collection-side sketch follows this list).</p>
</li>
<li><p><strong>ADX</strong> ingests from the Event Hub (data connection + JSON mapping).</p>
</li>
</ol>
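<p>A hedged sketch of the collection side of this pattern: a Fluent Bit tail input with multiline handling, again in the YAML configuration format. Paths, tags, and the multiline parser are illustrative; the output would be the Event Hubs Kafka output shown earlier, or the <code>azure_kusto</code> output for direct ADX ingestion.</p>
<pre><code class="lang-yaml"># Hedged sketch: tailing application log files with multiline handling and light enrichment
pipeline:
  inputs:
    - name: tail
      path: /var/log/app/*.log            # illustrative path; point at IIS/app log locations as needed
      tag: app.logs
      multiline.parser: java              # built-in multiline parser; pick one matching the app's stack traces
      read_from_head: false
  filters:
    - name: modify
      match: app.logs
      add: environment production         # light enrichment at the edge (illustrative key/value)
</code></pre>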
<h3 id="heading-option-c-opentelemetry-collector-multisignal-pipelines">Option C — OpenTelemetry Collector (multi‑signal pipelines)</h3>
<p><strong>Best for:</strong> <strong>Logs + Metrics + Traces</strong> together (microservices, distributed apps), standard <strong>OTLP</strong>, centralized processing (sampling, filtering, redaction).</p>
<p><strong>Pattern</strong></p>
<ol>
<li><p>Run <strong>OTel Collector</strong> as a service/sidecar/DaemonSet.</p>
</li>
<li><p>Receivers: <code>filelog</code> (app logs), <code>syslog</code>, <code>hostmetrics</code> / <code>windowsperfcounters</code>, <code>otlp</code> (SDKs).</p>
</li>
<li><p>Processors: <code>batch</code>, <code>attributes</code>, <code>memory_limiter</code>, optional <code>transform</code>/<code>redaction</code>.</p>
</li>
<li><p>Export:</p>
<ul>
<li><p><strong>Kafka exporter → Event Hubs (Kafka API)</strong> → <strong>ADX</strong> (logs)</p>
</li>
<li><p><strong>azuremonitor exporter</strong> → Azure Monitor / Application Insights (traces/metrics)</p>
</li>
</ul>
</li>
</ol>
<p>Sample OTel Collector config (YAML):</p>
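<p>A minimal, hedged sketch of such a collector configuration, assembled from the pattern above. Component names follow the upstream OTel Collector (contrib) distribution; paths, the Event Hubs broker, topic, and connection string are placeholders.</p>
<pre><code class="lang-yaml"># Hedged sketch: OTel Collector pipeline for server logs and host metrics, exporting via the Kafka API
receivers:
  filelog:
    include: [ /var/log/app/*.log ]       # illustrative application log path
  hostmetrics:
    collection_interval: 60s
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
  otlp:
    protocols:
      grpc: {}
      http: {}

processors:
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
    spike_limit_percentage: 20
  batch: {}

exporters:
  kafka:
    brokers: [ mynamespace.servicebus.windows.net:9093 ]   # Event Hubs Kafka endpoint (placeholder)
    topic: otel-logs
    auth:
      sasl:
        mechanism: PLAIN
        username: "$ConnectionString"                      # literal value required by Event Hubs
        password: "Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=send;SharedAccessKey=..."
      tls: {}

service:
  pipelines:
    logs:
      receivers: [ filelog, otlp ]
      processors: [ memory_limiter, batch ]
      exporters: [ kafka ]
    metrics:
      receivers: [ hostmetrics, otlp ]
      processors: [ memory_limiter, batch ]
      exporters: [ kafka ]
</code></pre>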
<h2 id="heading-kubernetescontainers-openshiftaroeksgke"><strong>Kubernetes/Containers (OpenShift/ARO/EKS/GKE)</strong></h2>
<ul>
<li><h3 id="heading-fluentd-lightweight-log-shipping-rich-parsing">FluentD (lightweight log shipping + rich parsing)</h3>
<p>  <strong>Best for:</strong> Containers/AKS, edge, <strong>file/journald/syslog</strong> logs with <strong>multiline/regex</strong> parsing; routing to <strong>Event Hubs (Kafka API)</strong> → <strong>ADX</strong>.</p>
<p>  <strong>Pattern</strong></p>
<ol>
<li><p>Deploy FluentD as a DaemonSet to collect logs from Kubernetes/containers (a minimal DaemonSet sketch follows this section)</p>
</li>
<li><p>Collect logs: container stdout/stderr from all pods by tailing CRI log files such as <code>/var/log/containers/*.log</code>; Kubernetes context such as pod name, namespace, labels, and annotations is added to each log record via the kubernetes_metadata_filter plugin</p>
</li>
<li><p>Apply <strong>filters</strong> (multiline, <code>parser</code>, <code>modify</code>, <code>lua/wasm</code> if needed).</p>
</li>
<li><p>Output to <strong>Azure Event Hubs (Kafka API)</strong> for buffering and replay.</p>
</li>
<li><p><strong>ADX</strong> ingests from the Event Hub (data connection + JSON mapping).</p>
</li>
</ol>
</li>
</ul>
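<p>As referenced above, a minimal, hedged sketch of the DaemonSet portion of this pattern. The namespace, image tag, service account, and ConfigMap name are placeholders; the FluentD configuration itself (tail input, kubernetes_metadata_filter, Kafka output) would live in the mounted ConfigMap.</p>
<pre><code class="lang-yaml"># Hedged sketch: minimal FluentD DaemonSet for cluster-wide log collection
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccountName: fluentd              # needs RBAC to read pod metadata for kubernetes_metadata_filter
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1.16-debian-kafka2-1   # placeholder image tag
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: fluentd-config
              mountPath: /fluentd/etc
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: fluentd-config
          configMap:
            name: fluentd-config               # holds the tail + kubernetes_metadata_filter + kafka output config
</code></pre>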
<h3 id="heading-network-device-and-firewall-collection">Network Device and Firewall Collection</h3>
<h4 id="heading-a-snmp-device-health-amp-performance"><strong>A) SNMP (device health &amp; performance)</strong></h4>
<ul>
<li><p>Azure Monitor does <strong>not</strong> poll SNMP on your behalf. Use one of:</p>
<ul>
<li><p><strong>Prometheus SNMP Exporter</strong> adjacent to devices → <strong>OTel Collector</strong> <code>prometheus</code> receiver → export to <strong>Azure Monitor metrics</strong> or <strong>ADX</strong>.</p>
</li>
<li><p><strong>OTel Collector</strong> <code>snmp</code> receiver (contrib) to poll SNMP <strong>v3</strong> directly → export as metrics.</p>
</li>
</ul>
</li>
<li><p>Always prefer <strong>SNMP v3</strong> (auth+priv), restrict to a <strong>management network</strong>, and standardize on <strong>MIBs</strong> + <strong>targets.yaml</strong>. A hedged scrape-config sketch follows this list.</p>
</li>
</ul>
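<p>As referenced above, a hedged sketch of the first option: the OTel Collector scraping a Prometheus SNMP Exporter. The module name, device IPs, and the exporter address are placeholders; the relabelling follows the standard snmp_exporter scrape pattern.</p>
<pre><code class="lang-yaml"># Hedged sketch: OTel Collector prometheus receiver scraping an SNMP Exporter for network devices
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: snmp-network-devices
          metrics_path: /snmp
          params:
            module: [if_mib]                       # SNMP Exporter module (placeholder)
          static_configs:
            - targets: [ "10.10.0.1", "10.10.0.2" ]    # device IPs to poll (placeholders)
          relabel_configs:
            - source_labels: [__address__]
              target_label: __param_target         # pass the device IP as the ?target= query parameter
            - source_labels: [__param_target]
              target_label: instance
            - target_label: __address__
              replacement: snmp-exporter:9116       # address of the SNMP Exporter itself (placeholder)
</code></pre>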
<h4 id="heading-b-syslog-recommended-primary-path"><strong>B) Syslog (recommended primary path)</strong></h4>
<ul>
<li><p><strong>Configure devices</strong> to send syslog (prefer <strong>TCP/TLS</strong> if supported, else UDP) to a <strong>collector VM</strong>.</p>
</li>
<li><p>Run <strong>rsyslog</strong>/<strong>syslog‑ng</strong> on the VM to <strong>receive</strong> and <strong>write</strong> messages to files (e.g., per‑device/per‑program).</p>
</li>
<li><p>Choose one of:</p>
<ul>
<li><p><strong>Azure Monitor Agent (AMA)</strong>: Create a <strong>DCR (Syslog data source)</strong> to read the local syslog files and route to <strong>Log Analytics</strong> (and/or <strong>ADX via export</strong>).</p>
</li>
<li><p><strong>Fluent Bit</strong>: Use the <code>syslog</code> input to receive logs directly (UDP/TCP), apply parsing/enrichment, and forward to <strong>Event Hubs (Kafka API)</strong>, <strong>ADX</strong>, or <strong>Blob/ADLS</strong>.</p>
</li>
<li><p><strong>OTel Collector</strong>: Use the <code>syslogreceiver</code> (contrib) to ingest syslog, then process and export to <strong>Event Hubs</strong>, <strong>Azure Monitor</strong>, or <strong>ADX</strong>.</p>
</li>
</ul>
</li>
</ul>
<h4 id="heading-c-api-based-collection-for-vendors-that-expose-logstelemetry-over-reststream"><strong>C) API-based collection (for vendors that expose logs/telemetry over REST/stream)</strong></h4>
<ul>
<li><p><strong>Azure Functions or Logic Apps</strong> to <strong>poll or stream</strong> vendor APIs (pagination, throttling, retries).</p>
</li>
<li><p>Normalize to a common schema (time, device, severity, category, message, properties) and emit to <strong>Event Hubs</strong> or <strong>ADX</strong>.</p>
</li>
</ul>
<h3 id="heading-data-transformation"><strong>Data Transformation:</strong></h3>
<p>Log transformation should be done in two places: at the edge layer (agents) and at the log store, i.e. Azure Data Explorer.<br />a) Agents: do coarse-grained noise reduction, basic parsing, minimal sensitive-data removal, and light enrichment at the <strong>edge</strong> to reduce volume/cost and prevent sensitive content from leaving hosts or clusters.</p>
<p>b) ADX: do schema shaping, deep parsing, normalization, advanced masking, joins with reference data, routing, and lifecycle transformations in <strong>ADX</strong> using ingestion-time update policies and KQL-based pipelines for durable, consistent transformations at scale.<br />Examples: normalization, PII data masking, sensitive-data eradication, enrichment, and log labelling.</p>
<ul>
<li><p><strong>Log Filters:</strong><br />  The OTel Collector can parse raw log lines into structured fields (like time and severity) using regex-based parsing so filters and routing can act on real fields instead of brittle string matches. It can then drop unwanted records at the edge (for example, anything containing the word “secret”) using a filter processor so those logs never leave the node or hit storage.</p>
<p>  Fluent Bit tails log files and applies a JSON parser so unstructured lines become structured records, which makes filtering, routing, and indexing far more accurate and efficient. It uses the grep filter to exclude noisy patterns early (for example, health checks or verbose debug lines), cutting costs and noise before data leaves the host.</p>
<p>  AMA lets data collection rules select by severity and XPath so only meaningful event IDs and levels are sent, reducing volume at the source. DCR transformations (KQL) can then add new fields, reshape columns, and redact or remove sensitive content before it reaches the workspace.</p>
</li>
<li><p><strong>Log Parsing</strong></p>
<p>  OTel’s filelog receiver parses unstructured lines into structured fields using operators such as regex_parser.<br />  After parsing at the receiver, transformations can further standardize fields (for example, aligning severity) with OTTL in the transform processor when needed for normalization and routing logic.</p>
<p>  Fluent Bit parses at the input using parsers (JSON or regex) so filters and outputs operate on structured keys like time, level, and message rather than brittle substring matches.</p>
<p>  AMA doesn’t define regex/JSON parsers the way the other agents do; instead, parsing is done in Data Collection Rule “transformations” with KQL functions such as parse_json, extract, split, extend, and project before data lands in Log Analytics.</p>
</li>
<li><p><strong>Data masking &amp; Eradication:</strong><br />  <strong>OTel Collector</strong> masks logs using the transform processor and OTTL: redact substrings with regex, delete sensitive attributes, or do deterministic hashing (e.g., SHA256) so correlation is possible without leaking raw values, all inline in the log context before export.</p>
<p>  e.g.: redact card-like patterns, delete auth tokens, and hash emails at source with OTTL.</p>
<p>  <strong>Fluent Bit</strong> masks by removing keys or entire value fields with the modify and record_modifier filters, and uses the Lua filter for partial redaction within strings when regex replacement is needed, enabling precise field-level or substring masking at high throughput on the edge.<br />  e.g.: delete sensitive keys (e.g., password, token) or allowlist only safe keys, add placeholders, and apply custom redaction logic via Lua to mask substrings like emails, IPs, SSNs, or card numbers across any fields in a record.</p>
<p>  <strong>AMA</strong> (Azure Monitor Agent) applies masking by not projecting sensitive columns, setting sensitive fields to constants like “redacted”, and performing pattern-based rewrites using supported KQL operators; note that not all KQL string functions are available in DCR transforms, so rely on the documented supported set and test in the DCR editor.<br />  e.g.: remove secrets and redact card-like strings with a supported replace form and projection in a DCR transformation query.</p>
<p>  Sample raw logs that include noise, PII data, and secrets/passwords:</p>
<pre><code class="lang-plaintext">  {"time":"2025-08-31T16:40:01Z","level":"DEBUG","service":"payments","log":"healthcheck ok"}
  {"time":"2025-08-31T16:40:02Z","level":"VERBOSE","service":"cart","log":"retrying request"}
  {"time":"2025-08-31T16:40:03Z","level":"INFO","service":"orders","log":"pay 4111-1111-1111-1111 PAN AAAPL1234C Aadhaar 3675 9834 6012"}
  {"time":"2025-08-31T16:40:04Z","level":"WARN","service":"auth","Authorization":"Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyIjoiYWJjIn0.RxYb2V7o0wTF3aoRkFkh7y8H0Pr1dI3o1gqTQc2y7yY","password":"P@ssw0rd!","log":"login failed"}
  2025-08-31T16:40:01Z DEBUG payments: healthcheck ok
  2025-08-31T16:40:02Z VERBOSE cart: retrying request
  2025-08-31T16:40:03Z INFO orders: pay 4111-1111-1111-1111 PAN AAAPL1234C Aadhaar 3675 9834 6012
  2025-08-31T16:40:04Z WARN auth: login failed password="P@ssw0rd!" Authorization="Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyIjoiYWJjIn0.RxYb2V7o0wTF3aoRkFkh7y8H0Pr1dI3o1gqTQc2y7yY"
</code></pre>
</li>
<li><p>Sample OTel Collector YAML config based on the raw logs above</p>
<pre><code class="lang-yaml">  <span class="hljs-attr">receivers:</span>
    <span class="hljs-attr">filelog/app:</span>
      <span class="hljs-attr">include:</span> [<span class="hljs-string">/var/log/app/*.log</span>]
      <span class="hljs-attr">start_at:</span> <span class="hljs-string">beginning</span>
      <span class="hljs-attr">operators:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">regex_parser</span>
          <span class="hljs-attr">regex:</span> <span class="hljs-string">'^(?P&lt;ts&gt;[^ ]+) (?P&lt;level&gt;[A-Z]+) (?P&lt;service&gt;[^:]+): (?P&lt;body&gt;.*)$'</span>
          <span class="hljs-attr">timestamp:</span>
            <span class="hljs-attr">parse_from:</span> <span class="hljs-string">attributes.ts</span>
            <span class="hljs-attr">layout:</span> <span class="hljs-string">'%Y-%m-%dT%H:%M:%S%z'</span>
          <span class="hljs-attr">severity:</span>
            <span class="hljs-attr">parse_from:</span> <span class="hljs-string">attributes.level</span>

  <span class="hljs-attr">processors:</span>
    <span class="hljs-attr">filter/logs:</span>
      <span class="hljs-attr">logs:</span>
        <span class="hljs-attr">log_record:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">'attributes["level"] == "DEBUG" or attributes["level"] == "VERBOSE"'</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">'IsMatch(body, "(?i)healthcheck")'</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">'IsMatch(body, "(?i)password=")'</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">'IsMatch(body, "(?i)authorization=")'</span>

    <span class="hljs-attr">transform/logs:</span>
      <span class="hljs-attr">log_statements:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">context:</span> <span class="hljs-string">log</span>
          <span class="hljs-attr">statements:</span>
            <span class="hljs-comment"># Redact JWT (JWS/JWE)</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">set(body,</span> <span class="hljs-string">ReplaceAllMatches(body,</span> <span class="hljs-string">"(?:ey[A-Za-z0-9-_]+\\.){2,4}[A-Za-z0-9-_]+"</span><span class="hljs-string">,</span> <span class="hljs-string">"JWT_REDACTED"</span><span class="hljs-string">))</span>
            <span class="hljs-comment"># Redact credit cards (common 4-4-4-4 or 16-digits with separators)</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">set(body,</span> <span class="hljs-string">ReplaceAllMatches(body,</span> <span class="hljs-string">"((?:\\d{4}[-\\s]*){3}\\d{4})"</span><span class="hljs-string">,</span> <span class="hljs-string">"****-****-****-****"</span><span class="hljs-string">))</span>
            <span class="hljs-comment"># Redact PAN (AAAAA9999A)</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">set(body,</span> <span class="hljs-string">ReplaceAllMatches(body,</span> <span class="hljs-string">"\\b[A-Z]{5}[0-9]{4}[A-Z]\\b"</span><span class="hljs-string">,</span> <span class="hljs-string">"PAN_REDACTED"</span><span class="hljs-string">))</span>
            <span class="hljs-comment"># Redact Aadhaar (spaced and unspaced; starts 2-9)</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">set(body,</span> <span class="hljs-string">ReplaceAllMatches(body,</span> <span class="hljs-string">"\\b[2-9][0-9]{3}\\s[0-9]{4}\\s[0-9]{4}\\b"</span><span class="hljs-string">,</span> <span class="hljs-string">"AADHAAR_REDACTED"</span><span class="hljs-string">))</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">set(body,</span> <span class="hljs-string">ReplaceAllMatches(body,</span> <span class="hljs-string">"\\b[2-9][0-9]{11}\\b"</span><span class="hljs-string">,</span> <span class="hljs-string">"AADHAAR_REDACTED"</span><span class="hljs-string">))</span>

  <span class="hljs-attr">exporters:</span>
    <span class="hljs-attr">otlphttp:</span>
      <span class="hljs-attr">endpoint:</span> <span class="hljs-string">https://collector.example/v1/logs</span>

  <span class="hljs-attr">service:</span>
    <span class="hljs-attr">pipelines:</span>
      <span class="hljs-attr">logs:</span>
        <span class="hljs-attr">receivers:</span> [<span class="hljs-string">filelog/app</span>]
        <span class="hljs-attr">processors:</span> [<span class="hljs-string">filter/logs</span>, <span class="hljs-string">transform/logs</span>]
        <span class="hljs-attr">exporters:</span> [<span class="hljs-string">otlphttp</span>]
</code></pre>
</li>
<li><p>Sample Fluent Bit YAML config for the raw logs</p>
</li>
</ul>
<pre><code class="lang-yaml">[<span class="hljs-string">PARSER</span>]
  <span class="hljs-string">Name</span>   <span class="hljs-string">app_json</span>
  <span class="hljs-string">Format</span> <span class="hljs-string">json</span>

[<span class="hljs-string">INPUT</span>]
  <span class="hljs-string">Name</span>   <span class="hljs-string">tail</span>
  <span class="hljs-string">Path</span>   <span class="hljs-string">/var/log/app/app.jsonl</span>
  <span class="hljs-string">Parser</span> <span class="hljs-string">app_json</span>
  <span class="hljs-string">Tag</span>    <span class="hljs-string">app.logs</span>

<span class="hljs-comment"># Filter out DEBUG/VERBOSE and healthchecks</span>
[<span class="hljs-string">FILTER</span>]
  <span class="hljs-string">Name</span>   <span class="hljs-string">grep</span>
  <span class="hljs-string">Match</span>  <span class="hljs-string">app.logs</span>
  <span class="hljs-string">Exclude</span> <span class="hljs-string">level</span>   <span class="hljs-string">^(DEBUG|VERBOSE)$</span>
[<span class="hljs-string">FILTER</span>]
  <span class="hljs-string">Name</span>   <span class="hljs-string">grep</span>
  <span class="hljs-string">Match</span>  <span class="hljs-string">app.logs</span>
  <span class="hljs-string">Exclude</span> <span class="hljs-string">log</span>     <span class="hljs-string">healthcheck</span>

<span class="hljs-comment"># Eradicate secret-bearing keys</span>
[<span class="hljs-string">FILTER</span>]
  <span class="hljs-string">Name</span>   <span class="hljs-string">modify</span>
  <span class="hljs-string">Match</span>  <span class="hljs-string">app.logs</span>
  <span class="hljs-string">Remove</span> <span class="hljs-string">password</span>
  <span class="hljs-string">Remove</span> <span class="hljs-string">authorization</span>
  <span class="hljs-string">Remove</span> <span class="hljs-string">access_token</span>
  <span class="hljs-string">Remove</span> <span class="hljs-string">refresh_token</span>
  <span class="hljs-string">Add</span>    <span class="hljs-string">pii_masked</span> <span class="hljs-literal">true</span>

<span class="hljs-comment"># Redact JWT, Credit Cards, PAN, Aadhaar in strings</span>
[<span class="hljs-string">FILTER</span>]
  <span class="hljs-string">Name</span>   <span class="hljs-string">lua</span>
  <span class="hljs-string">Match</span>  <span class="hljs-string">app.logs</span>
  <span class="hljs-string">Script</span> <span class="hljs-string">redact.lua</span>
  <span class="hljs-string">Call</span>   <span class="hljs-string">filter</span>

[<span class="hljs-string">OUTPUT</span>]
    <span class="hljs-string">Name</span>                         <span class="hljs-string">azure_kusto</span>
    <span class="hljs-string">Match</span>                        <span class="hljs-string">*</span>
    <span class="hljs-comment"># --- ADX connection ---</span>
    <span class="hljs-string">Ingestion_Endpoint</span>           <span class="hljs-string">https://ingest-adx-log-store-01.eastus2.kusto.windows.net</span>
    <span class="hljs-string">Database_Name</span>                <span class="hljs-string">log_store</span>
    <span class="hljs-string">Table_Name</span>                   <span class="hljs-string">FluentBitLogs</span>
    <span class="hljs-comment"># --- Auth (choose ONE) ---</span>
    <span class="hljs-string">Tenant_Id</span>                    <span class="hljs-string">89b26e60-021d-43d4-8e47-b0bb496a95a4</span>          
    <span class="hljs-string">Client_Id</span>                    <span class="hljs-string">93e81a16-d997-4027-93e0-5fc660ceb022</span>      
    <span class="hljs-string">Client_Secret</span>                <span class="hljs-string">xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</span>      
    <span class="hljs-comment"># --- Reliability / performance ---</span>
    <span class="hljs-string">buffering_enabled</span>            <span class="hljs-string">On</span>
    <span class="hljs-string">buffer_dir</span>          <span class="hljs-string">C:\ProgramData\fluent-bit\buffer</span>
    <span class="hljs-string">upload_timeout</span>               <span class="hljs-string">2m</span>
    <span class="hljs-string">upload_file_size</span>             <span class="hljs-string">150M</span>
    <span class="hljs-string">store_dir_limit_size</span>         <span class="hljs-string">8GB</span>
    <span class="hljs-string">compression_enabled</span>          <span class="hljs-string">On</span>
</code></pre>
<ul>
<li>Sample Azure Monitor Agent DCR transformation to filter, parse, and redact the raw logs above</li>
</ul>
<pre><code class="lang-sql">source
// Filter out noise (DEBUG/VERBOSE/healthcheck)
| where tolower(tostring(severity)) !in ("debug","verbose")
| where tostring(message) !contains "healthcheck"

// Remove secret-bearing columns
| project-away password, authorization, access_token, refresh_token, id_token

// Redact JWT, credit cards, PAN, Aadhaar inside message
| extend m = tostring(message)
| extend m = replace_regex(m, @"(ey[\w\-_]+\.){2,4}[\w\-_]+", "JWT_REDACTED")
| extend m = replace_regex(m, @"((\d{4}[-\s]*){3}\d{4})", "****-****-****-****")
| extend m = replace_regex(m, @"\b[A-Z]{5}[0-9]{4}[A-Z]\b", "PAN_REDACTED")
| extend m = replace_regex(m, @"\b[2-9][0-9]{3}\s[0-9]{4}\s[0-9]{4}\b", "AADHAAR_REDACTED")
| extend m = replace_regex(m, @"\b[2-9][0-9]{11}\b", "AADHAAR_REDACTED")

// Final schema
| project TimeGenerated = todatetime(time), Severity = tostring(severity), Message = m
</code></pre>
<p><strong>Storage &amp; Retention</strong></p>
<p><strong>Hot</strong></p>
<ul>
<li><p><strong>Where it is:</strong> Local <strong>SSD</strong> on ADX engine nodes</p>
</li>
<li><p><strong>Control:</strong> <strong>Caching policy</strong> (pin the last <em>N</em> days in cache)</p>
</li>
<li><p><strong>Purpose:</strong> <strong>Low‑latency</strong> dashboards and investigations on <strong>recent</strong> data</p>
</li>
<li><p><strong>Notes:</strong> Data is still persisted to Azure Storage; hot is a performance cache.</p>
</li>
</ul>
<p><strong>Warm</strong></p>
<ul>
<li><p><strong>Where it is:</strong> ADX <strong>internal extents</strong> stored on <strong>Azure Storage</strong> (not pinned to SSD)</p>
</li>
<li><p><strong>Control:</strong> <strong>Retention policy</strong> (soft‑delete window, plus <strong>recoverability</strong> option for 14‑day post‑delete recovery)</p>
</li>
<li><p><strong>Purpose:</strong> <strong>Historical analytics</strong> that you still want inside ADX for fast access without keeping everything in SSD</p>
</li>
<li><p><strong>Notes:</strong> Great for 30–180‑day investigative horizons, then let retention reclaim space.</p>
</li>
</ul>
<p><strong>Cold</strong></p>
<ul>
<li><p><strong>Where it is:</strong> <strong>External ADLS/Blob</strong> store attached to the ADX cluster</p>
</li>
<li><p><strong>Control:</strong> <strong>Continuous/periodic export</strong> from ADX; define <strong>external tables</strong> for on‑demand queries; keep storage with immutability/lifecycle rules if required</p>
</li>
<li><p><strong>Purpose:</strong> <strong>Long‑term, low‑cost</strong> retention; <strong>query on demand</strong> or <strong>restore a slice</strong> back to ADX for hot performance</p>
</li>
</ul>
<p><strong>Lifecycle Policies</strong><br />Caching &amp; retention policies control data movement across the different tiers, from hot to cold storage.</p>
<p><strong>Ingest</strong>: Agents send sanitized logs to ADX CuratedLogs with ingestion mappings and update policies (already in place from the edge work), so operational queries run fast on hot/warm tiers.</p>
<p><strong>Warm</strong>: Cache policy keeps the freshest 15 days “hot,” while older data stays “warm” and is still queryable from the cluster</p>
<p><strong>Archive</strong>: Continuous export streams the same data to ADLS in Parquet for long‑term retention; after 90 days, ADX retention deletes aged extents, but archived data remains queryable via external tables, or can be rehydrated for high‑performance needs. The policy commands behind this lifecycle are sketched after the timeline below.<br />Example:<br /><strong>Hot → warm → archive timeline</strong></p>
<ul>
<li><p>Days 0–15: Data is “hot” in local SSD due to the caching policy and is immediately queryable at the lowest latency</p>
</li>
<li><p>Days 16–90: Data cools to “warm”; it’s still in the cluster’s reliable storage and fully queryable, just less likely to be in SSD unless a <a target="_blank" href="https://learn.microsoft.com/en-us/kusto/management/cache-policy?view=microsoft-fabric">hot window</a> is defined for targeted historical ranges.</p>
</li>
<li><p>After 90 days: Extents age out per retention policy; by then, the continuous export has already landed Parquet copies in ADLS, so the same data can be queried in <a target="_blank" href="https://learn.microsoft.com/en-us/azure/data-explorer/hot-windows">place</a> via external tables or used for offline analytics pipelines.</p>
</li>
</ul>
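<p>A minimal KQL sketch of these policies follows, assuming the illustrative CuratedLogs table from earlier and an ADLS account/URI invented for the example:</p>
<pre><code class="lang-sql">// Keep the freshest 15 days on local SSD (hot cache)
.alter table CuratedLogs policy caching hot = 15d

// Soft-delete extents after 90 days, with the recoverability window enabled
.alter-merge table CuratedLogs policy retention softdelete = 90d recoverability = enabled

// External table over the Parquet archive in ADLS (storage URI is illustrative)
.create external table ArchivedLogs (TimeGenerated: datetime, Severity: string, Service: string, Message: string)
kind = storage
dataformat = parquet
(
    h@'https://logarchivestore.blob.core.windows.net/curatedlogs;impersonate'
)

// Continuously export curated data to the archive before retention reclaims it
.create-or-alter continuous-export ExportCuratedLogs
over (CuratedLogs)
to table ArchivedLogs
with (intervalBetweenRuns = 1h)
&lt;| CuratedLogs
</code></pre>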
<p><strong>Analytics &amp; AIOps</strong></p>
<p>I have ingested two sample application logs into ADX, named ‘Java’ &amp; ‘dotnet’. From there it is easy to build a dashboard that depicts the status of the application logs.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759702507854/5a667e1c-69fb-47f6-995c-82a101ff5cbe.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-sql">//<span class="hljs-keyword">Check</span> distribution <span class="hljs-keyword">of</span> <span class="hljs-keyword">each</span> <span class="hljs-keyword">log</span> <span class="hljs-keyword">type</span>
appLogs
| summarize <span class="hljs-keyword">count</span>() <span class="hljs-keyword">by</span> log_type
| render piechart

//<span class="hljs-keyword">Error</span> <span class="hljs-keyword">logs</span> <span class="hljs-keyword">over</span> <span class="hljs-built_in">time</span>
appLogs
| <span class="hljs-keyword">where</span> <span class="hljs-keyword">level</span> == <span class="hljs-string">"ERROR"</span>
| summarize <span class="hljs-keyword">count</span>() <span class="hljs-keyword">by</span> <span class="hljs-keyword">level</span>, <span class="hljs-keyword">bin</span>(<span class="hljs-built_in">timestamp</span>,<span class="hljs-number">1</span>m)
| render timechart
</code></pre>
<pre><code class="lang-sql">//<span class="hljs-keyword">Check</span> distribution <span class="hljs-keyword">of</span> <span class="hljs-keyword">each</span> <span class="hljs-keyword">log</span> <span class="hljs-keyword">type</span>
appLogs
| summarize <span class="hljs-keyword">count</span>() <span class="hljs-keyword">by</span> log_type, <span class="hljs-keyword">bin</span>(<span class="hljs-built_in">timestamp</span>,<span class="hljs-number">5</span>m)
</code></pre>
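<p>Because the curated logs already sit in ADX, KQL’s native time-series functions can support simple AIOps-style detection on top of the same table. A minimal sketch, assuming the appLogs table shown above:</p>
<pre><code class="lang-sql">// Flag anomalous spikes in ERROR volume per log type
appLogs
| where level == "ERROR"
| make-series ErrorCount = count() default = 0 on timestamp step 5m by log_type
| extend (Anomalies, Score, Baseline) = series_decompose_anomalies(ErrorCount, 1.5, -1, 'linefit')
| render anomalychart
</code></pre>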
<p><strong>Azure MCP/Data Agents in ADX:</strong><br />The Azure MCP Server implements a full open-source MCP server integration that supports natural language queries while dynamically discovering schemas and metadata from ADX resources. This capability enables seamless interaction with log lakes and data repositories without requiring prior KQL knowledge, making data exploration accessible to a broader range of users, including developers, analysts, and business stakeholders.</p>
<p>Example Use Case</p>
<p><strong>Natural Language Question:</strong><br /><em>"I want to know about trends observed in ADX from my sample application (Java/DotNet) logs."</em></p>
<p>The MCP server translates this query into appropriate KQL statements, executes them against the specified ADX database, and returns trend analysis results—all without the user writing a single line of KQL code.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759703163199/4823e05e-bd88-42c8-9527-d461f3bfc007.png" alt class="image--center mx-auto" /></p>
<p>Example Use Case:</p>
<p><em>Root Cause Analysis:</em><br /><em>Asking the agent to perform RCA of an error observed in the app logs and recommend a possible next set of actions</em></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1759704293775/4a8edaf7-01d5-48b2-b542-80c5a256a0bf.png" alt class="image--center mx-auto" /></p>
<p><strong>Security &amp; Compliance</strong> (RBAC, Encryption, Connectivity, RLS)</p>
<p><strong>RBAC for ADX:</strong></p>
<p><strong>Permission Hierarchy</strong></p>
<p><strong>Cluster Scope</strong> provides organization-wide control with roles for all databases (Admin, Viewer, Monitor) and cluster management operations.</p>
<p><strong>Database Scope</strong> offers granular access through Admin (full control), User (query and create entities), Viewer (read-only queries), Ingestor (data ingestion), Monitor (metadata viewing), and Management (permission administration) roles.</p>
<p><strong>Table Scope</strong> restricts permissions to specific tables with Admin (full control), Ingestor (data ingestion), and Management (role assignment) capabilities.</p>
<p><strong>External Table Scope</strong> manages permissions for external data sources, requiring Database User or Viewer roles as prerequisites, with Admin and Management roles available.</p>
<p><strong>Materialized View Scope</strong> controls access to specific views, requiring Database User or Table Admin permissions, with Admin (alter/delete/delegate) and Management (role assignment) roles.</p>
<p><strong>Row-Level Security</strong> enables data filtering based on user identity, restricting query results to authorized rows within tables.</p>
<p>All role assignments are managed through Kusto management commands, with higher-level roles (cluster/database) typically granting access to lower-level resources within their scope.</p>
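<p>For example (the group names, GUIDs, and table names below are placeholders for illustration), database and table role assignments plus a row-level security policy look like this in Kusto management commands:</p>
<pre><code class="lang-sql">// Read-only access to the whole database for an analyst group
.add database log_store viewers ('aadgroup=soc-analysts@contoso.com') 'SOC analysts - read only'

// Ingestion-only access to a single table for the collector's app registration
.add table CuratedLogs ingestors ('aadapp=93e81a16-d997-4027-93e0-5fc660ceb022') 'Fluent Bit ingestion app'

// Row-level security: admins see everything, the payments team sees only its own service
.create-or-alter function CuratedLogsRLS() {
    union
      (CuratedLogs | where current_principal_is_member_of('aadgroup=platform-admins@contoso.com')),
      (CuratedLogs | where current_principal_is_member_of('aadgroup=payments-team@contoso.com') and Service == "payments")
}
.alter table CuratedLogs policy row_level_security enable "CuratedLogsRLS()"
</code></pre>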
<p><strong>Network Connectivity:</strong><br />Public Connectivity</p>
<ul>
<li><p>ADX clusters can be accessed over the public internet using a public endpoint (URL), allowing users with valid identities to connect from any location if public access is enabled.</p>
</li>
<li><p>Administrators can restrict public accessibility using IP address filtering, specifying allowed IP addresses, service tags, or CIDR ranges in the Azure portal</p>
</li>
</ul>
<p>Private Connectivity</p>
<ul>
<li><p>ADX offers private endpoint integration using Azure Private Link, which places the cluster within a designated virtual network and assigns a private IP address for secure, internal connectivity.</p>
</li>
<li><p>Clients on the same virtual network connect seamlessly via private endpoints using standard connection strings; DNS resolution routes traffic internally, isolating access from the public internet</p>
</li>
</ul>
<p><strong>Encryption</strong><br />ADX data stored in Azure Storage is <strong>automatically encrypted by default</strong> using <strong>Microsoft-managed keys</strong> at the service level.<br /><strong>Double encryption</strong> can optionally be enabled, adding infrastructure-level encryption on top of service-level encryption using separate algorithms and keys. This dual-layer approach protects against scenarios where one encryption algorithm or key becomes compromised.<br /><strong>Customer-Managed Keys</strong>: Organizations requiring additional control can configure <strong>customer-managed keys</strong> stored in Azure Key Vault instead of Microsoft-managed keys. This option provides full control over key lifecycle operations including creation, rotation, disabling, and revocation, with audit capabilities for compliance requirements. CMK is configured at the cluster level and requires Azure Key Vault integration.</p>
<p><strong>Cost Controls</strong></p>
<p><strong>Understanding what drives ADX cost</strong></p>
<ul>
<li><p><strong>Storage duration</strong>: The longer you store data, the higher the cost. Each table and materialized view has a <strong>retention policy</strong> that defines how long the data is kept.</p>
</li>
<li><p><strong>High CPU usage</strong>: Actions like heavy queries, data processing, or transformations cause high CPU usage (see the query sketch after this list).</p>
</li>
<li><p><strong>Cache settings</strong>: Caching more data boosts performance but can increase costs. The caching policy allows you to choose which data should be cached. You can differentiate between <em>hot data cache</em> and <em>cold data cache</em> by setting a caching policy on hot data. Hot data is kept in local SSD storage for faster query performance, while cold data is stored in object storage.</p>
</li>
<li><p><strong>Cold data usage</strong>: Queries that access cold data trigger read transactions and add to cost.</p>
</li>
<li><p><strong>Data transformation and optimization</strong>: Features like Update Policies, Materialized Views, and Partitioning consume CPU resources and can raise cost.</p>
</li>
<li><p><strong>Ingestion volume</strong>: Clusters operate more cost-effectively at higher ingestion volumes. Clusters with <strong>higher cost per GB are less common</strong> and usually have <strong>lower ingestion volumes</strong>. In general, smaller data volumes mean higher cost per GB.</p>
</li>
<li><p><strong>Schema design</strong>: Wide tables with many columns need more compute and storage resources, which raises costs.</p>
</li>
<li><p><strong>Advanced features</strong>: Options like followers, private endpoints, and Python sandboxes consume more resources and can add to cost.</p>
</li>
</ul>
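<p>To see which of these drivers is actually hurting you, the cluster’s own metadata can be queried directly. A small sketch (this assumes Database Monitor/Admin rights, and the available columns can vary by service version):</p>
<pre><code class="lang-sql">// Longest-running queries over the last day - a rough proxy for CPU-heavy work
.show queries
| where StartedOn &gt; ago(1d)
| top 10 by Duration desc
| project StartedOn, User, Application, Duration, Text
</code></pre>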
<p>Conclusion:<br />A centralized log store is no longer a “nice to have” for large enterprises—especially in financial services. When daily volumes cross 100TB and logs span everything from Windows/Linux to legacy Unix variants, appliances, and multiple clouds, decentralised tooling quickly becomes costly, noisy, and operationally brittle. What enterprises need instead is a single, governed observability plane: one place to <strong>ingest at scale</strong>, <strong>normalize</strong>, <strong>secure</strong>, <strong>retain</strong>, and <strong>analyze</strong> logs consistently across the entire estate.</p>
<p>The architecture outlined in this post addresses that reality by combining flexible collection options (AMA, Fluent Bit, OpenTelemetry, syslog collectors, and batch ingestion) with a scalable analytics engine. Azure Data Explorer is purpose-built for <strong>high-volume ingestion and fast interactive querying across structured, semi-structured, and free-text data</strong>, and it supports both streaming and batch ingestion patterns, making it a strong foundation for enterprise-wide log analytics.</p>
<p>Equally important, the solution treats governance and cost as first-class design goals. By performing <strong>coarse filtering + sensitive-data redaction at the edge</strong>, and reserving <strong>deep parsing/normalization and durable transformations in the log store</strong>, you reduce noise, shrink spend, and ensure sensitive data is controlled before it spreads. On the retention side, tiering logs through hot/warm and into low-cost archive with export-based lifecycle keeps recent investigations fast while meeting long retention requirements.</p>
<p>Finally, once logs are centralized and curated, the platform becomes more than storage; it becomes an enabler for <strong>security investigations, compliance reporting, troubleshooting, and AIOps</strong> (correlation, anomaly detection, and faster RCA). The net outcome is a standardized, scalable, and secure logging strategy that can evolve over time without rewriting every collector whenever backend analytics or storage choices change.</p>
]]></content:encoded></item><item><title><![CDATA[Observability: End to End Monitoring (Part 1)]]></title><description><![CDATA[Common requested feature from Observblity solution from customers:

Open Telemetry based

Verbose logging support

Retention period customisation

Cost Efficient

Alerts/Notification customisation

Absorb High Volume scale

Analytics and Holistic Das...]]></description><link>https://blog.osshaikh.com/observability-end-to-end-monitoring-part-1</link><guid isPermaLink="true">https://blog.osshaikh.com/observability-end-to-end-monitoring-part-1</guid><category><![CDATA[Azure]]></category><category><![CDATA[observability]]></category><category><![CDATA[monitoring]]></category><category><![CDATA[app]]></category><dc:creator><![CDATA[Osama Shaikh]]></dc:creator><pubDate>Sat, 09 Aug 2025 12:41:00 GMT</pubDate><content:encoded><![CDATA[<p>Commonly requested features of an observability solution from customers:</p>
<ul>
<li><p>Open Telemetry based</p>
</li>
<li><p>Verbose logging support</p>
</li>
<li><p>Retention period customisation</p>
</li>
<li><p>Cost Efficient</p>
</li>
<li><p>Alerts/Notification customisation</p>
</li>
<li><p>Absorb High Volume scale</p>
</li>
<li><p>Analytics and Holistic Dashboards</p>
</li>
</ul>
<p>The first important question is: which components should one typically monitor, specifically for cloud-native apps, to get a holistic view of the entire estate across multiple moving parts?</p>
<ul>
<li><p>Infra/Platform Monitoring - Log Analytics</p>
</li>
<li><p>App Monitoring - App Insights</p>
</li>
<li><p>UI/UX - App Insights</p>
</li>
<li><p>Dependency Monitoring - App Insights with LA</p>
</li>
<li><p>Database - Az Monitor</p>
</li>
<li><p>Edge Devices - Az Monitor</p>
</li>
<li><p>Dashboard - Az Grafana</p>
</li>
</ul>
<p>In this scenario we are using Azure Monitor’s different features to cover all important components of cloud-native systems. Here is a high-level view of the data collection pillars of the Azure Monitor solution.</p>
<p><img src="https://learn.microsoft.com/en-us/azure/azure-monitor/media/overview/overview-blowout-20230707-opt.svg#lightbox" alt="Diagram that shows an overview of Azure Monitor with data sources on the left sending data to a central data platform and features of Azure Monitor on the right that use the collected data." /></p>
<ul>
<li><p>Monitor the security posture of all Azure subscriptions in a single pane of glass via a Grafana dashboard.</p>
</li>
<li><p>Create Azure Managed Grafana &amp; add the Azure Monitor workspace as a data source.</p>
</li>
<li><p>In the Dashboards section you will find many built-in dashboards for Azure resources, including Defender for Cloud. Choose Defender for Cloud to see all alerts and recommendations from MDC.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1740919550344/b7342485-6329-47f5-bc23-f7e480133a02.png" alt class="image--center mx-auto" /></p>
<p><strong>Infrastructure Monitoring capabilities using Azure Monitor</strong></p>
<p><strong>Enable Traffic Analytics:</strong></p>
<ol>
<li><p>Ensure that you have Network Watcher deployed in the same region and have a Log Analytics workspace set up.</p>
</li>
<li><p>Navigate to the <strong>Azure Portal</strong> and go to <strong>Network Watcher</strong>. Then, select either <strong>NSG Flow Logs</strong> or <strong>VNet Flow Logs</strong>.</p>
</li>
<li><p>Choose the <strong>Network Security Group (NSG)</strong> or <strong>Virtual Network</strong> you want to monitor.</p>
</li>
<li><p>Click <strong>Enable</strong> for Flow Logs, then select the storage account and retention period.</p>
</li>
<li><p>In the same flow logs pane, toggle Traffic Analytics to On and select the Log Analytics workspace.</p>
</li>
</ol>
<p>Traffic Analytics provides analysis of your virtual network on Azure, including the following information:</p>
<ul>
<li><p>Who is connecting to the network?</p>
</li>
<li><p>Where are they connecting from?</p>
</li>
<li><p>Which ports are open to the internet?</p>
</li>
<li><p>What's the expected network behavior?</p>
</li>
<li><p>Are there any sudden rises in traffic?</p>
</li>
</ul>
<p>It records the total volume of traffic in your network, including information about ingress and egress IPs.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747983653685/c7bc6a90-26b8-45b3-95e2-307f5eaca0dd.png" alt class="image--center mx-auto" /></p>
<p>The screenshot below reflects the top VMs/servers with the most egress communication to the internet.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749639498524/49754cec-0d65-4644-9d67-034b2b02cf1c.png" alt class="image--center mx-auto" /></p>
<p>Workloads receiving traffic from the internet on different ports</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749640843141/e2e1f513-a0f7-4690-937f-8eb8b4e68b6f.png" alt class="image--center mx-auto" /></p>
<p><strong>Traffic consumption of workloads on Azure based on IP</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749642446542/48a44e79-950c-4f72-8691-3eb8241db93e.png" alt class="image--center mx-auto" /></p>
<p>Map view of malicious traffic blocked around the world via the internet</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749641283920/e3e39995-445e-4853-b259-cccbad3c3db4.png" alt class="image--center mx-auto" /></p>
<p><strong>Virtual Machines Performance Metrics</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1749655037871/d12a8001-cdda-4613-9c5b-238ad48c0fc1.png" alt class="image--center mx-auto" /></p>
<p>Container-based workloads can be effectively monitored using Azure Monitor's Container Insights. This tool provides a detailed overview of container health, including the status of underlying nodes and performance utilization of both containers and the supporting infrastructure.</p>
<p>Additionally, it offers logging and tracing capabilities for application pods and their peripheral services, enabling you to track and analyze application performance and troubleshoot issues efficiently.</p>
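<p>Container Insights data lands in the Log Analytics workspace as well, so pod health can be queried directly. A small sketch, assuming the default KubePodInventory table that Container Insights populates:</p>
<pre><code class="lang-sql">// Pods with the most container restarts in the last hour
KubePodInventory
| where TimeGenerated &gt; ago(1h)
| summarize Restarts = max(ContainerRestartCount) by Namespace, Name
| top 10 by Restarts desc
</code></pre>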
<p>High-level dashboard view of the application’s infra components (via native Azure Monitor)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1732137837156/2333a581-bd26-4b44-a3d0-9c89c5930b6e.png" alt class="image--center mx-auto" /></p>
<p>Kubernetes Cluster Overview in Grafana via Prometheus Metrics (via Azure Managed Prometheus)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1732137964132/5c72c256-f472-4038-9147-6cdfbbcb7ffc.png" alt class="image--center mx-auto" /></p>
<p>Web, application, and database logs/traces can be configured as custom logs using Log Analytics in Azure Monitor. In this example, logs are parsed to highlight specific details such as URLs, raw data, and HTTP status codes. This setup can be further customized based on the specific requirements of the application or analytics team, allowing for tailored insights and analysis.</p>
<p>Custom logs in data collection rules via Azure Monitor can be used to capture any time series-based text logs. This feature allows you to define specific log formats and structures, enabling the collection and analysis of time-stamped data that is crucial for monitoring and troubleshooting applications. By configuring custom logs, you can tailor the data collection to meet the specific needs of your organization or application, ensuring that relevant information is captured and available for analysis.</p>
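<p>As a hedged example of that kind of parsing (the table and column names below are illustrative, not from the actual setup), a custom text log can be shaped at query time like this:</p>
<pre><code class="lang-sql">// Pull the request URL and HTTP status out of a raw custom log line
WebAppLogs_CL
| extend Url = extract(@"(GET|POST|PUT|DELETE)\s+(\S+)", 2, RawData),
         StatusCode = toint(extract(@"\s(\d{3})\s", 1, RawData))
| where StatusCode &gt;= 500
| summarize Errors = count() by Url, bin(TimeGenerated, 5m)
| render timechart
</code></pre>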
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753027358042/94979bd2-7bc4-43bd-afb2-ea94a6d2ab90.png" alt class="image--center mx-auto" /></p>
<p>I will delve deeper into each component involved in end-to-end monitoring in the next blog post.</p>
]]></content:encoded></item><item><title><![CDATA[Oracle on Azure]]></title><description><![CDATA[This Blogs aims to explain why Azure is a better destination than GCP/OCI/AWS for migrating on-premises Oracle workloads, including Oracle proprietary databases, business applications, custom applications, and ISV applications.
There are multiple opt...]]></description><link>https://blog.osshaikh.com/oracle-on-azure</link><guid isPermaLink="true">https://blog.osshaikh.com/oracle-on-azure</guid><category><![CDATA[Azure]]></category><category><![CDATA[OCI]]></category><category><![CDATA[Oracle]]></category><dc:creator><![CDATA[Osama Shaikh]]></dc:creator><pubDate>Sat, 01 Mar 2025 14:16:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1740590141855/702b2731-bb9e-4076-8953-23db4e6c6c34.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>This Blogs aims to explain why Azure is a better destination than GCP/OCI/AWS for migrating on-premises Oracle workloads, including Oracle proprietary databases, business applications, custom applications, and ISV applications.</strong></p>
<p><strong>There are multiple options for hosting applications that use Oracle as the DB. Based on complexity, I have categorised these into different solution categories.</strong></p>
<h2 id="heading-oracle-on-native-azure-vm"><strong>Oracle on Native Azure VM</strong></h2>
<p>In <strong>Rehost</strong> or <strong>‘Lift and Shift’</strong> scenarios, on-premises workloads are migrated to run on <strong>Azure Native Virtual Machines (VMs)</strong>. This approach involves moving your Oracle Database (DB) to standard Azure VMs, which you manage yourself, offering flexibility and cost savings compared to fully managed services.</p>
<p>Recommending <strong>native Azure VMs</strong> for Oracle workloads is appropriate in specific scenarios, balancing trade-offs between performance, management effort, and cost. Here are the key use cases:</p>
<ul>
<li><p><strong>Small to Medium Workloads:</strong> Ideal for databases where high performance isn’t critical, such as smaller Online Transaction Processing (OLTP) or analytical workloads. Standard VM hardware often suffices, reducing costs compared to Exadata-based managed services like Oracle Database@Azure.</p>
</li>
<li><p><strong>Cost-Sensitive Projects:</strong> When budget constraints are a priority, native VMs avoid the premium pricing of managed services and specialized hardware. Using <strong>constrained core vCPUs</strong> can further lower Oracle licensing costs, making this option attractive for budget-conscious projects. (<a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/workloads/oracle/oracle-reference-architecture">Architectures for Oracle Database on Azure Virtual Machines)</a></p>
</li>
<li><p><strong>Custom Configurations:</strong> If you require specific hardware or software setups unavailable with managed options—such as custom OS images or unique storage configurations—native VMs provide the flexibility to meet those needs.</p>
</li>
<li><p><strong>Non-Critical Applications:</strong> Suitable for development, testing, or less critical applications where downtime or performance hiccups are tolerable. This includes environments where experimentation is common, and managed services might be excessive.</p>
</li>
<li><p><strong>Full Control</strong> <strong>and Compliance:</strong> When regulatory or compliance requirements (e.g., data residency laws or security policies) demand complete control over the database environment, native VMs allow you to configure settings like <strong>Network Security Groups</strong> and <strong>Azure Firewall</strong> to align with those needs. (<a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/workloads/oracle/oracle-overview">Overview</a> <a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/workloads/oracle/oracle-overview">of</a> <a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/workloads/oracle/oracle-reference-architecture">Oracle Applications and Solutions on Azure)</a></p>
</li>
</ul>
<h3 id="heading-oracle-licensing-on-native-azure-vms">Oracle Licensing on Native Azure VMs</h3>
<p>There are some constraints around Oracle Database licensing when running on native VMs in cloud environments like Azure or GCP. Microsoft Azure follows this licensing model:</p>
<ul>
<li><p><strong>With Multi-Threading Enabled:</strong> Two vCPUs are counted as equivalent to one Oracle Processor license.</p>
</li>
<li><p><strong>With Multi-Threading Disabled:</strong> One vCPU equals one Oracle Processor license.</p>
</li>
</ul>
<p>For this reason, we recommend using <strong>constrained vCPU VM series</strong>. These VMs offer the same underlying memory, throughput, and network capabilities as standard VMs but with a limited vCPU count. This helps fit within Oracle’s per-core licensing model, reducing costs. For example, a Standard_E16-8s_v5 exposes only 8 vCPUs (4 Oracle Processor licenses with multi-threading enabled) while retaining the memory and I/O of the full E16 size.</p>
<p><strong>How Constrained VMs Work:</strong></p>
<ul>
<li><p>In a hyper-threaded VM, Azure allocates <strong>2 threads per core</strong>, known as vCPUs.</p>
</li>
<li><p>When hyper-threading is disabled, Azure presents <strong>single-threaded cores (physical cores)</strong>, resulting in a 1:1 vCPU-to-CPU ratio.</p>
</li>
</ul>
<p><strong>Recommended Constrained VM SKU</strong><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/constrained-vcpu?tabs=family-E#list-of-available-sizes-with-constrained-vcpus"><strong>s for Oracle:</strong> Check the list of available sizes here: List of Available Sizes</a> <a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/constrained-vcpu?tabs=family-E#list-of-available-sizes-with-constrained-vcpus">with Constrained vCPUs</a>.</p>
<h3 id="heading-pros">Pros</h3>
<ul>
<li><p><strong>Ideal Migration Scenarios:</strong> Perfect for cases where Oracle is currently running on <strong>Hyper-V</strong> or <strong>physical servers</strong>. Azure’s right-sizing tools, paired with constrained VM SKUs, make it a seamless fit for these environments.</p>
</li>
<li><p><strong>High Availability (HA):</strong> Oracle Databases in a <strong>Real Application Clusters (RAC)</strong> setup are used for HA. While RAC isn’t officially certified by Oracle on Azure, you can achieve HA using Azure’s native capabilities, such as <strong>Availability Zones</strong> and <strong>Oracle Data Guard</strong> replication across zones. This provides redundancy at both rack and datacenter levels.</p>
</li>
<li><p><strong>Cost Reduction:</strong> Using constrained VM SKUs reduces Oracle licensing costs without significantly impacting performance for many workloads.</p>
</li>
<li><p><strong>Licensing Optimization:</strong> Disabling multi-threading or hyper-threading on regular Azure VMs adjusts the CPU ratio to 1:1, further lowering license costs.</p>
</li>
</ul>
<h3 id="heading-conshttpslearnmicrosoftcomen-usazurevirtual-machinesworkloadsoracleoracle-reference-architecture"><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/workloads/oracle/oracle-reference-architecture">Cons</a></h3>
<ul>
<li><p><strong>Licensing Complexity:</strong> Oracle’s licensing rules can be challenging in cloud environments. You’ll often need to navigate <strong>Bring Your Own License (BYOL)</strong> scenarios and ensure VM configurations align with Oracle’s core-counting policies.</p>
</li>
<li><p><strong>RAC Limitations:</strong> Although high availability is achievable with Azure’s native features, <strong>RAC is not officially certified by Oracle on Azure</strong>, which might deter some users relying on official support.</p>
</li>
</ul>
<h2 id="heading-oracle-onhttpslearnmicrosoftcomen-usazurevirtual-machinesworkloadsoracleoracle-overview-avs-azure-vmwarhttpslearnmicrosoftcomen-usazurevirtual-machinesconstrained-vcputabsfamily-elist-of-available-sizes-with-constrained-vcpuse-solutionhttpslearnmicrosoftcomen-usazurevirtual-machinesworkloadsoracleoracle-reference-architecture"><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/workloads/oracle/oracle-overview">Oracle on</a> <a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/constrained-vcpu?tabs=family-E#list-of-available-sizes-with-constrained-vcpus">AVS (Azure VMwar</a><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/workloads/oracle/oracle-reference-architecture">e Solution)</a></h2>
<p>In scenarios where customers are running their Oracle databases in a VMware environment, the easiest migration path is moving VMware to the cloud with a lift-and-shift approach using VMware vMotion or the bulk migration method.</p>
<p>There are license constraints here as well, which I will go through one by one.</p>
<p>Oracle does not recognize VMware as a <strong>hard partitioning technology</strong>, which has significant implications for licensing. Hard partitioning allows users to license only a subset of a server’s processors, thus offering more control and potentially lowering costs. However, since Oracle considers VMware a <strong>soft partitioning</strong> tool, customers must license all physical cores accessible by the Oracle software.</p>
<p>Careful management of VMware clusters is essential to minimize licensing costs. Cluster segregation can be an effective strategy to avoid having to license an entire vCenter. Organizations can limit the number of physical cores that need to be licensed by isolating clusters that run Oracle software. This requires rigorous <strong>VMware management practices</strong> to ensure that Oracle workloads are restricted to designated clusters and do not migrate freely across environments that are not fully licensed.</p>
<p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/workloads/oracle/oracle-overview">Key Poin</a><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/constrained-vcpu?tabs=family-E#list-of-available-sizes-with-constrained-vcpus">ts</a></p>
<p>Oracle makes a distinction between <strong>soft partitioning</strong> and <strong>hard partitioning</strong> in its licensing model:</p>
<ul>
<li><p><strong>Soft Partitioning</strong>: Technologies that allow for dynamic allocation of resources but do not provide Oracle-recognized methods to limit the number of processors being used. VMware falls under this category, requiring licensing of all cores accessible.</p>
</li>
<li><p><strong>Hard Partitioning</strong>: Oracle-approved methods to limit resource usage to specific cores, such as Oracle VM Server, IBM LPAR, and certain physical hardware partitioning tools. These technologies allow sub-capacity licensing, making them more cost-effective in environments with fewer Oracle workloads.</p>
</li>
</ul>
<p>For example, consider a 3-node AVS cluster using AV36P nodes, where each node has 36 physical cores:</p>
<ul>
<li><p>Total cores = 3 nodes × 36 cores/node = 108 cores. With a core factor of 0.5 (typical for Intel CPUs with hyper-threading), the number of processors to license = 108 × 0.5 = 54 processors.</p>
</li>
<li><p>Oracle Database Enterprise Edition costs approximately $47,500 per processor (per Oracle’s Technology Global Price List). Thus, the licensing cost = 54 × $47,500 = $2,565,000.</p>
</li>
</ul>
<p><strong>Example</strong>: If a company has Oracle deployed on a VMware cluster with <strong>3 Hosts</strong>, all of those hosts’ cores must be licensed, regardless of how many virtual CPUs are allocated to Oracle in the environment. Even if only a portion of the virtual CPUs is actively running Oracle, the company must account for all the physical hardware.</p>
<p>This makes AVS an expensive option for Oracle workloads unless mitigated by specific licensing agreements, such as an Unlimited License Agreement (ULA).</p>
<h3 id="heading-strategies-to-manage-licensing-costs">Strategies to Manage Licensing Costs</h3>
<p>To minimize licensing costs:</p>
<ul>
<li><p><strong>Cluster Segregation</strong>: Isolate Oracle workloads to a dedicated cluster (e.g., a 3-node AV36 cluster). This limits the number of cores requiring licenses but demands strict management to prevent Oracle VMs from migrating to unlicensed clusters via vMotion.</p>
</li>
<li><p><strong>Rigorous Management</strong>: Use VMware tools to enforce boundaries, ensuring Oracle software does not inadvertently span across the entire vCenter environment, which would require licensing all accessible cores.</p>
</li>
</ul>
<h3 id="heading-unlimited-license-agreementsulahttpslearnmicrosoftcomen-usazurevirtual-machinesconstrained-vcputabsfamily-elist-of-available-sizes-with-constrained-vcpus"><strong>Unlimited License Agreements(ULA</strong><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/constrained-vcpu?tabs=family-E#list-of-available-sizes-with-constrained-vcpus"><strong>):</strong></a></h3>
<ul>
<li><p>ULAs allow unlimited use of specified Oracle software for a fixed term, avoiding per-core or per-cluster licensing.</p>
</li>
<li><p>For AVS, the ULA must include a license mobility clause, enabling on-premises licenses to be applied to the cloud. Oracle’s strategic partnership with Microsoft supports this mobility (per the Oracle and Microsoft Strategic Partnership FAQ).</p>
</li>
<li><p>This provides flexibility to scale Oracle deployments on AVS without additional licensing costs during the ULA period, making it a cost-effective option for large enterprises.</p>
</li>
</ul>
<p><a target="_blank" href="https://www.oracle.com/a/ocom/docs/corporate/pricing/technology-price-list-070617.pdf">Pros</a></p>
<ul>
<li><p><strong>Familiarity and Ease of Use</strong>: If your team uses VMware on-premises, AVS lets you stick with the same tools, reducing the learning curve and potential errors for managing Oracle databases.</p>
</li>
<li><p><strong>Enhanced Security</strong>: AVS offers a private cloud on dedicated hardware, providing better isolation for sensitive Oracle data, which is crucial for compliance.</p>
</li>
<li><p><strong>Similar Setup</strong>: AVS mirrors your on-premises VMware environment, so you can move your Oracle VM with minimal changes, reducing migration time and complexity.</p>
</li>
<li><p><strong>Familiar Tools</strong>: Using the same management tools means less learning and fewer errors, speeding up the process.</p>
</li>
<li><p><strong>Automated Migration</strong>: AVS provides tools designed for VMware migrations such vMotion,svMotion, automating much of the work, unlike native VM which may need more manual setup.</p>
</li>
</ul>
<p><a target="_blank" href="https://www.oracle.com/a/ocom/docs/corporate/pricing/technology-price-list-070617.pdf">Co</a><a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/constrained-vcpu?tabs=family-E#list-of-available-sizes-with-constrained-vcpus">ns</a>:</p>
<ul>
<li><p><strong>Lack of Scalability Savings</strong>: Businesses cannot license only the cores they need; instead, they must license all cores that could potentially run Oracle software, regardless of actual use.</p>
</li>
<li><p><strong>Higher Costs</strong>: Organizations often face <strong>exponentially higher costs</strong> because Oracle insists on licensing all the cores within a given cluster or data center rather than the specific hosts or VMs running Oracle.</p>
</li>
<li><p><strong>Compliance Challenges</strong>: The movement of virtual machines between physical hosts through VMware’s <strong>vMotion</strong> further complicates compliance, as the potential for Oracle workloads to migrate triggers the requirement to license more physical cores.</p>
</li>
<li><p>Technically, Oracle RAC is supported in a VMware environment, but per Oracle policy it is not officially certified to run on non-Oracle public clouds.</p>
</li>
</ul>
<h2 id="heading-oracle-databaseazureexadata">Oracle Database@Azure(Exadata)</h2>
<p>Oracle Database@Azure is a service where Oracle databases run on Exadata X9M hardware within Microsoft Azure's data centers. This setup ensures low latency access to Azure resources, making it ideal for customers wanting Oracle’s database performance within Azure’s ecosystem. Enterprise-critical features like Oracle Real Application Clusters (Oracle RAC), Oracle Data Guard, Oracle GoldenGate, managed backups, self-managed Oracle Recovery Manager (RMAN) backups, Oracle Zero Downtime Migration (Oracle ZDM), on-premises connectivity, and seamless integration with other Azure services are supported.</p>
<p>Key features include:</p>
<ul>
<li><p><strong>High Performance with Exadata X9M Hardware:</strong> It currently runs on Oracle Exadata X9M hardware, with configurations starting at a quarter-rack (minimum 2 database servers and 3 storage servers), scalable up to 32 database servers and 64 storage servers.</p>
</li>
<li><p><strong>Private Connectivity</strong> via Azure Virtual Network: customers can deploy Oracle Exadata within their Azure Virtual Network (VNet), using private IP addresses for secure, isolated access without public internet exposure. This helps meet data residency and security requirements, critical for industries like BFSI.</p>
</li>
<li><p><strong>High Availability:</strong> Supports Oracle Real Application Clusters (RAC) and Oracle Data Guard for redundancy and disaster recovery, with built-in redundancy and failover capabilities, and offers an SLA of 99.9%.</p>
</li>
<li><p><strong>Sub-1 Millisecond Network Latency:</strong> When applications and the database are in the same Azure region, network latency can be sub-1 millisecond due to co-location within Azure data centers.</p>
</li>
<li><p><strong>Integration with Azure Services:</strong> Integrates with Azure’s ecosystem, including Microsoft Entra ID for identity management and Azure AI/ML services. Combine Oracle’s database capabilities with Azure’s AI tools for advanced analytics or generative AI applications.</p>
</li>
<li><p><strong>Simplified Migration and Compatibility:</strong> Full compatibility with on-premises Oracle Database and Exadata deployments, supported by tools like Oracle Zero Downtime Migration. Lift and shift on-premises workloads to Azure without refactoring, minimizing migration effort and risk.</p>
</li>
<li><p><strong>Managed Service with Clear Responsibility Matrix</strong>: Oracle manages the Exadata hardware, Oracle Database software, and operating system (Oracle Linux 8.6+), while Microsoft manages the Azure infrastructure, and customers manage data, applications, and VNet.</p>
</li>
<li><p><strong>Cost Management and Licensing:</strong> Available through the Azure Marketplace with options to use existing Oracle licenses (BYOL) or Microsoft Azure Consumption Commitments.</p>
</li>
<li><p><strong>Oracle Database software</strong> includes Enterprise Edition, with supported versions from 12c to 23c, customer-selectable based on compatibility and needs.</p>
</li>
</ul>
<p>Responsibility Matrix for Oracle Exadata</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Owner</strong></td><td><strong>Responsibilities</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Oracle</td><td>Manages Exadata hardware, Oracle Database software, operating system, updates, patches</td></tr>
<tr>
<td>Microsoft</td><td>Manages underlying Azure infrastructure (data centers, networking, power)</td></tr>
<tr>
<td>Customer</td><td>Manages data, applications, and Virtual Network (configuration and connections)</td></tr>
</tbody>
</table>
</div><h3 id="heading-oracle-software-stack-in-exadata">Oracle Software Stack in Exadata</h3>
<ul>
<li><p>Oracle Database Software Versions: software includes Enterprise Edition, with supported versions from 12c to 23c, customer-selectable based on compatibility and needs.</p>
</li>
<li><p>Additional Components: Data Guard for high availability and disaster recovery &amp; GoldenGate for data replication for DR</p>
</li>
<li><p>Operating System: Oracle Linux 8.6 or later</p>
</li>
<li><p><strong>Licensing:</strong> Customers can bring their own Oracle Database licenses (BYOL) or purchase them through the Azure Marketplace.</p>
</li>
</ul>
<h3 id="heading-oracle-exadata-x9m-overview">Oracle Exadata X9M Overview</h3>
<p>Oracle typically offers the X9M in several rack‑scale configurations:</p>
<ul>
<li><p><strong>Quarter Rack:</strong><br />  • Typically includes 2 database servers and 3 storage servers.<br />  • Often used for smaller-scale or departmental workloads.</p>
</li>
<li><p><strong>Half Rack:</strong><br />  • Generally configured with 4 database servers and 6 storage servers.<br />  • Balances performance and capacity for medium‑sized enterprises.</p>
</li>
<li><p><strong>Full Rack:</strong><br />  • Usually features 8 database servers and 12 storage servers.<br />  • Designed for very large, mission‑critical deployments requiring maximum throughput and IOPS.</p>
</li>
</ul>
<p>Hardware Specifications for Oracle DB Exadata (X9M)</p>
<p>Exadata X9M consists of database servers and storage servers, with different configurations for high capacity (HC) and potentially extreme flash (EF).</p>
<p><strong>Database Servers:</strong></p>
<ul>
<li><p>The database servers in Exadata X9M, as used in Oracle Database@Azure, are equipped with advanced compute capabilities. Research suggests the following specifications, based on Exadata Cloud Services X9M documentation:</p>
</li>
<li><p><strong>Processor:</strong> 1 x AMD EPYC 7J13, offering 64 cores and 128 threads. The base clock speed is 2.6 GHz, with a turbo frequency up to 3.5 GHz, providing high performance for database operations (AMD EPYC 7J13 Processor)</p>
</li>
<li><p><strong>Cores:</strong> 64 cores, with 128 threads, supporting high compute workloads.</p>
</li>
<li><p><strong>Memory:</strong> 512 GB, expandable, supporting large in-memory database workloads, as seen in Exadata Cloud Services X9M configurations (Exadata Cloud Services X9M Specifications).</p>
</li>
<li><p><strong>Storage:</strong> 2 x 3.84 TB NVMe Flash SSD for local storage, with the option to expand to 4 x 3.84 TB, ensuring fast access to frequently used data (Oracle Exadata Database Machine Hardware Components).</p>
</li>
</ul>
<p><strong>Storage Server (High Capacity - HC):</strong></p>
<ul>
<li><p><strong>CPU Manufacturer and Family:</strong> Intel, specifically the Intel Xeon Scalable 4th generation (codenamed Sapphire Rapids), with the model being Intel® Xeon® Gold 6448Y</p>
</li>
<li><p><strong>Cores:</strong> 32 cores total, with 2 processors each having 16 cores, operating at 2.6 GHz, based on Exadata X9M HC storage server specs</p>
</li>
<li><p><strong>Memory:</strong> 256 GB DDR4 for general operations and 1.5 TB Persistent Memory, allocated for Exadata RDMA Memory (XRMEM)</p>
</li>
<li><p><strong>Storage:</strong> Includes 12 x 18 TB 7,200 RPM SAS HDDs for <a target="_blank" href="https://docs.oracle.com/en/engineered-systems/exadata-database-machine/dbmso/high-capacity-exadata-storage-server-x9m-2-components.html">bulk storage</a> (total raw capacity 216 TB) and 4 x 6.4 TB NVMe Flash SSDs for caching (total 25.6 TB)</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<p><strong>Cost:</strong> Oracle Exadata on Azure uses specialized Exadata hardware and is a managed service, likely resulting in higher costs compared to running Oracle on standard Azure VMs. Customers can choose from a variety of VM sizes in Azure VM, potentially finding more cost-effective options, especially with constrained vCPU VMs to reduce Oracle licensing costs</p>
<p><strong>Scalability</strong>: Exadata on Azure offers scalability in predefined rack configurations (quarter-rack to full-rack), which might not be as flexible as scaling individual VM sizes in standard Azure VM. Customers can choose from various VM sizes in Azure VM, including constrained vCPUs for cost savings, offering more granular scaling.</p>
<p>Reference Articles:<br /><a target="_blank" href="https://docs.oracle.com/en-us/iaas/Content/database-at-azure/oaa.htm">https://docs.oracle.com/en-us/iaas/Content/database-at-azure/oaa.htm</a><br />X9M: <a target="_blank" href="https://docs.oracle.com/en-us/iaas/exadatacloud/doc/exa-service-desc.html#GUID-EC1A62C6-DDA1-4F39-B28C-E5091A205DD3">https://docs.oracle.com/en-us/iaas/exadatacloud/doc/exa-service-desc.html#GUID-EC1A62C6-DDA1-4F39-B28C-E5091A205DD3</a><br />License: <a target="_blank" href="https://palisadecompliance.com/counting-licenses-on-azure-avs/">Counting Licenses on Azure AVS - Palisade Compliance</a><br />Implementation: <a target="_blank" href="https://docs.oracle.com/en-us/iaas/Content/database-at-azure-exadata/odexa-exadata-services-azure.html">https://docs.oracle.com/en-us/iaas/Content/database-at-azure-exadata/odexa-exadata-services-azure.html</a></p>
]]></content:encoded></item><item><title><![CDATA[DR Options for Storage on Azure]]></title><description><![CDATA[Key Considerations for Storage DR on Azure
1. Understanding RPO and RTO
Define RPO (Recovery Point Objective):RPO determines the maximum acceptable amount of data loss in the event of a disaster. For critical workloads such as financial systems or he...]]></description><link>https://blog.osshaikh.com/dr-options-for-storage-on-azure</link><guid isPermaLink="true">https://blog.osshaikh.com/dr-options-for-storage-on-azure</guid><category><![CDATA[Azure]]></category><category><![CDATA[NetApp]]></category><category><![CDATA[Disaster recovery]]></category><category><![CDATA[storage]]></category><category><![CDATA[RPO]]></category><dc:creator><![CDATA[Osama Shaikh]]></dc:creator><pubDate>Thu, 16 Jan 2025 14:39:22 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-key-considerations-for-storage-dr-on-azure"><strong>Key Considerations for Storage DR on Azure</strong></h2>
<h3 id="heading-1-understanding-rpo-and-rto"><strong>1. Understanding RPO and RTO</strong></h3>
<p><strong>Define RPO (Recovery Point Objective):</strong><br />RPO determines the maximum acceptable amount of data loss in the event of a disaster. For critical workloads such as financial systems or healthcare applications, a <strong>sub-10-minute RPO</strong> is essential for near-continuous data protection.</p>
<p><strong>Define RTO (Recovery Time Objective):</strong><br />RTO specifies how quickly applications and services need to be restored after an outage. For business-critical applications, an <strong>RTO of minutes</strong> is often required to minimize downtime and ensure operational continuity.</p>
<p><strong>Achieving Near Zero RPO:</strong><br />Achieving <strong>near-zero RPO</strong> can be challenging, but it is feasible with technologies like <strong>Azure NetApp Files with cross-region replication</strong>, which employs low-latency replication strategies. These solutions require fine-tuned configurations and a high-performance infrastructure to meet the stringent requirements of critical workloads.</p>
<hr />
<h2 id="heading-2-azure-storage-dr-options"><strong>2. Azure Storage DR Options</strong></h2>
<h3 id="heading-a-overview-of-azure-storage-types"><strong>a. Overview of Azure Storage Types</strong></h3>
<ul>
<li><p><strong>Blob Storage:</strong><br />  Blob storage is an object storage solution optimized for unstructured data, including images, videos, logs, and backups. It is highly scalable and cost-effective, making it ideal for large-scale data storage needs.</p>
</li>
<li><p><strong>Azure Files:</strong><br />  Azure Files offers fully managed CIFS/SMB file shares, suitable for workloads requiring file-level access, such as collaboration platforms or enterprise applications.</p>
</li>
<li><p><strong>NetApp Files:</strong><br />  Azure NetApp Files supports high-performance workloads with SMB, NFS, or dual protocol support. It's designed for mission-critical applications that require high availability and performance, such as databases and SAP applications.</p>
</li>
<li><p><strong>Azure Disk:</strong><br />  Managed Disks provide block-level storage designed for Azure VMs, offering high durability and scalability for mission-critical virtualized workloads.</p>
</li>
</ul>
<h3 id="heading-b-redundancy-options"><strong>b. Redundancy Options</strong></h3>
<ul>
<li><p><strong>Blob &amp; Files:</strong></p>
<ul>
<li><p><strong>Locally Redundant Storage (LRS):</strong> Suitable for non-critical workloads where data loss in a single data center failure is acceptable.</p>
</li>
<li><p><strong>Zone-Redundant Storage (ZRS):</strong> Ensures availability within a region across availability zones, providing protection against zone-level outages.</p>
</li>
<li><p><strong>Geo-Redundant Storage (GRS):</strong> Provides automatic cross-region replication, ideal for applications requiring data durability across regions.</p>
</li>
<li><p><strong>Read-Access Geo-Redundant Storage (RA-GRS):</strong> Offers read access in secondary regions, facilitating faster recovery.</p>
</li>
</ul>
</li>
<li><p><strong>NetApp Files:</strong><br />  NetApp’s <strong>cross-region replication</strong> (via SnapMirror technology) replicates data asynchronously across regions, enabling high availability and disaster recovery for performance-sensitive workloads.</p>
</li>
</ul>
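<p>As a quick illustration of the redundancy options above, the replication level is simply a property of the storage account. A minimal Azure CLI sketch with placeholder names, creating a GRS account and then checking the last geo-sync time (informational only, since there is no RPO SLA):</p>
<pre><code class="lang-bash"># Create a storage account with geo-redundant storage (GRS)
az storage account create --name mydrstorageacct --resource-group my-dr-rg --location uksouth --sku Standard_GRS --kind StorageV2

# Check when the secondary region was last synchronised
az storage account show --name mydrstorageacct --resource-group my-dr-rg --expand geoReplicationStats --query "geoReplicationStats.lastSyncTime"
</code></pre>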
<h3 id="heading-c-when-to-choose-each-option"><strong>c. When to Choose Each Option</strong></h3>
<ul>
<li><p><strong>Blob Storage:</strong></p>
<ul>
<li><p><strong>LRS:</strong> Ideal for non-critical workloads like development environments, where cost is a priority and occasional downtime is acceptable.<br />  RPO: Near zero (synchronous replication)</p>
</li>
<li><p><strong>ZRS:</strong> Best for regional applications needing fault tolerance at the zone level.<br />  RPO: Near zero (synchronous replication)</p>
</li>
<li><p><strong>GRS &amp; RA-GRS:</strong> Suitable for business-critical workloads needing cross-region disaster recovery that can tolerate asynchronous replication lag.<br />  RPO: ~1 hour<br />  SLA RPO: None (can be several hours)  </p>
</li>
</ul>
</li>
<li><p><strong>Azure Files (GRS):</strong><br />  Ideal for use cases like archival storage and backup systems that require cross-region replication but can tolerate a slight delay in data synchronization.<br />  RPO: 15 mins<br />  SLA RPO: None (several hours for Large Datasets)</p>
</li>
<li><p><strong>NetApp Files:</strong><br />  High-performance SMB/NFS file shares: Use NetApp Files for workloads like virtual desktop infrastructures (VDI), large-scale databases, SAP applications, or high-performance computing (HPC). NetApp Files' cross-region replication is ideal for mission-critical applications requiring near-zero RPO and rapid recovery, for example replicating SAP HANA databases or media processing workloads across regions to ensure high availability and disaster recovery. It typically replicates your data in under 10 minutes, depending on the replication schedule configuration.<br />  RPO: 10 minutes<br />  SLA RPO: Less than 20 minutes</p>
</li>
<li><p><strong>Azure Disk:</strong><br />  Managed Disks are excellent for VM workloads, but to enable geo-replication you would need to implement Azure Backup, Azure Site Recovery, or a custom solution using Azure Disk snapshots (see the sketch after this list).<br />  SLA RPO: One hour</p>
</li>
</ul>
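<p>For the custom snapshot-based approach mentioned above, the starting point is scriptable with the CLI. A minimal sketch with placeholder names; incremental snapshots are the variant suited to cross-region copies:</p>
<pre><code class="lang-bash"># Take an incremental snapshot of a managed disk as the basis for a DR copy
az snapshot create --resource-group my-vm-rg --name osdisk-snap-01 --source my-vm-osdisk --incremental true
</code></pre>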
<hr />
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Azure provides a diverse range of storage solutions that meet various disaster recovery needs, from object storage to high-performance file systems. For <strong>critical workloads</strong> requiring <strong>near-zero RPO</strong> and fast recovery, <strong>Azure NetApp Files</strong> offers the best performance and flexibility. <strong>Blob Storage</strong> and <strong>Azure Files</strong>, with Geo-Redundant Storage, provide cost-effective solutions for less critical workloads but still offer robust data durability. The right choice depends on the specific needs of the workload, including performance, cost, and recovery objectives. By selecting the appropriate Azure storage solution and configuring it correctly, organizations can ensure that their critical workloads remain highly available and resilient to disruptions.</p>
]]></content:encoded></item><item><title><![CDATA[Azure Arc Connectivity Options: Choosing the Right Path for Your On-Premises Servers]]></title><description><![CDATA[In this post, we'll explore the private connectivity options available for integrating on-premises servers with Azure via Azure Arc. Connecting servers to Azure Arc can involve interacting with a significant number of endpoints—ranging from 15 URLs f...]]></description><link>https://blog.osshaikh.com/azure-arc-connectivity-options</link><guid isPermaLink="true">https://blog.osshaikh.com/azure-arc-connectivity-options</guid><category><![CDATA[Azure]]></category><category><![CDATA[Hybrid Cloud]]></category><category><![CDATA[arc]]></category><category><![CDATA[server]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[networking]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Osama Shaikh]]></dc:creator><pubDate>Fri, 09 Aug 2024 04:18:06 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1723176958259/ea1dddea-a5cd-451f-b570-6a7863b76760.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this post, we'll explore the private connectivity options available for integrating on-premises servers with Azure via Azure Arc. Connecting servers to Azure Arc can involve interacting with a significant number of endpoints—ranging from 15 URLs for basic onboarding to over 150 URLs when utilizing all features and extensions, such as SQL Server, Azure Defender, and Kubernetes.</p>
<p>Azure Arc provides a versatile platform that allows you to manage servers running outside of Azure (whether on-premises or in other cloud environments) as if they were native Azure resources. When connecting these servers to Azure, you can choose from several connectivity options:</p>
<ul>
<li><p><strong>Public (Default option)</strong></p>
</li>
<li><p><strong>Private Link (Partially private)</strong></p>
</li>
<li><p><strong>Proxy (On-premises)</strong></p>
</li>
<li><p><strong>Proxy (Azure)</strong></p>
</li>
<li><p><strong>Arc Gateway (Preview)</strong></p>
</li>
</ul>
<h3 id="heading-1-public-connectivity">1. Public Connectivity</h3>
<p><strong>Overview:</strong> Public connectivity involves direct internet connections where servers communicate with Azure services over the public internet. This option is typically the easiest to set up, as it doesn’t require any changes to existing network infrastructure or private networking configurations.</p>
<p><strong>Use Cases:</strong></p>
<ul>
<li><p><strong>Quick Deployment:</strong> Ideal for rapid deployments or testing environments where security is not the primary concern.</p>
</li>
<li><p><strong>Remote Locations:</strong> Suitable for remote sites where private networking options like VPNs or ExpressRoute are not feasible.</p>
</li>
</ul>
<p><strong>Security Considerations:</strong></p>
<ul>
<li><p><strong>Network Security:</strong> Servers must be protected by firewalls and other security measures to mitigate internet-based threats.</p>
</li>
<li><p><strong>Encryption:</strong> Implement TLS 1.2 or higher for data in transit to ensure secure communication.</p>
</li>
</ul>
<p><strong>Advantages:</strong></p>
<ul>
<li><strong>Cost-Effective:</strong> No additional costs associated with setting up private network connections.</li>
</ul>
<p><strong>Disadvantages:</strong></p>
<ul>
<li><p><strong>Security Risks:</strong> Direct internet connections are more vulnerable to security threats.</p>
</li>
<li><p><strong>Performance Variability:</strong> Potential for variable latency and bandwidth depending on internet conditions.</p>
</li>
</ul>
<p><strong>Configuration:</strong> No specific configuration is required other than whitelisting at least 15+ public endpoint URLs.</p>
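<p>For reference, public onboarding is just the standard <code>azcmagent connect</code> call once those endpoint URLs are reachable. A sketch with placeholder values:</p>
<pre><code class="lang-bash"># Onboard the machine to Azure Arc over the public internet
azcmagent connect --resource-group "arc-servers-rg" --tenant-id "your-tenant-id" --location "uksouth" --subscription-id "your-subscription-id"
</code></pre>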
<h3 id="heading-2-private-link-connectivity">2. Private Link Connectivity</h3>
<p><strong>Overview:</strong> Private Link establishes a secure, private connection to Azure services through a private endpoint in your virtual network. This setup ensures that traffic between the server and Azure remains within the Microsoft backbone network, avoiding the public internet.</p>
<p><strong>Use Cases:</strong></p>
<ul>
<li><p><strong>High Security Requirements:</strong> Suitable for environments that require stringent security and compliance, such as healthcare and government sectors.</p>
</li>
<li><p><strong>Regulatory Compliance:</strong> Helps meet regulatory mandates that require private network communication for sensitive data.</p>
</li>
</ul>
<p><strong>Security Considerations:</strong></p>
<ul>
<li><p><strong>Private Network:</strong> Minimizes exposure to internet-based threats by keeping traffic within a private network.</p>
</li>
<li><p><strong>Public Authentication Traffic:</strong> Note that traffic for authentication (Microsoft Entra ID) and Azure Resource Manager still routes through the public internet, which should be secured.</p>
</li>
</ul>
<p><strong>Advantages:</strong></p>
<ul>
<li><p><strong>Enhanced Security:</strong> Provides an additional layer of security by isolating traffic from the public internet.</p>
</li>
<li><p><strong>Reliable Performance:</strong> Offers more consistent and reliable performance by leveraging the Microsoft backbone network.</p>
</li>
</ul>
<p><strong>Disadvantages:</strong></p>
<ul>
<li><p><strong>Complexity:</strong> More complex to set up and manage, requiring network configuration changes and potentially additional infrastructure.</p>
</li>
<li><p><strong>Security:</strong> A few Microsoft Entra ID &amp; Azure Resource Manager URLs still go via the public endpoint.</p>
</li>
<li><p><strong>Cost:</strong> Involves additional expenses for maintaining private endpoint connections.</p>
</li>
</ul>
<p><strong>Configuration:</strong> Leverage an existing circuit connection or IPsec tunnel to route Arc traffic privately. Set up a Private Link scope and configure a private endpoint in a subnet that has line-of-sight to the on-premises gateway VNet. For DNS resolution in a hybrid environment, deploy a DNS forwarder solution on Azure and configure conditional DNS forwarding on the on-premises DNS to forward Azure-related requests to the DNS forwarder VM.</p>
<ul>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723167686252/ec909ac3-2da2-432d-8644-8979d19e56d2.png" alt class="image--center mx-auto" /></p>
<p>  Once Private Link is enabled, you can verify that name resolution for the Arc-related telemetry endpoints (4 URLs) is routed via the private endpoint</p>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723167761285/549bfe02-63d0-4285-b73d-457bdfc192da.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
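<p>When onboarding a server against a Private Link scope like the one described above, the connect command also takes the scope's resource ID. A sketch with placeholder values (verify the parameter name against your agent version):</p>
<pre><code class="lang-bash"># Onboard the machine and bind it to the Arc Private Link scope
azcmagent connect --resource-group "arc-servers-rg" --tenant-id "your-tenant-id" --location "uksouth" --subscription-id "your-subscription-id" --private-link-scope "/subscriptions/your-subscription-id/resourceGroups/arc-pls-rg/providers/Microsoft.HybridCompute/privateLinkScopes/arc-pls"
</code></pre>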
<h3 id="heading-3-proxy-connectivity">3. Proxy Connectivity</h3>
<p><strong>Overview:</strong> With proxy connectivity, servers connect to Azure services through a proxy server, which acts as an intermediary. This is particularly useful in environments where direct internet access is restricted or where compliance requires that all traffic passes through a proxy.</p>
<p><strong>Use Cases:</strong></p>
<ul>
<li><p><strong>Controlled Environments:</strong> Ideal for environments that restrict direct internet access or require centralized management of outbound traffic.</p>
</li>
<li><p><strong>Compliance:</strong> Ensures that all traffic adheres to corporate security and compliance policies by routing it through a proxy.</p>
</li>
</ul>
<p><strong>Security Considerations:</strong></p>
<ul>
<li><p><strong>Traffic Monitoring:</strong> Proxies allow for inspection, logging, and monitoring of outbound traffic to detect and respond to suspicious activities.</p>
</li>
<li><p><strong>Controlled Access:</strong> Proxy servers can enforce policies on what traffic is permitted to reach Azure, enhancing overall security.</p>
</li>
<li><p><strong>SSL/TLS Inspection:</strong> FW/proxies can inspect SSL/TLS traffic to ensure secure communications and prevent data exfiltration.</p>
</li>
</ul>
<p><strong>Advantages:</strong></p>
<ul>
<li><p><strong>Enhanced Security:</strong> Provides additional security by isolating traffic from the public internet.</p>
</li>
<li><p><strong>Compliance:</strong> Facilitates compliance with industry regulations and standards.</p>
</li>
</ul>
<p><strong>Disadvantages:</strong></p>
<ul>
<li><p><strong>Complexity:</strong> More complex to set up and manage, requiring changes to network configurations and potentially additional infrastructure.</p>
</li>
<li><p><strong>Cost:</strong> May involve additional costs for setting up and maintaining the proxy server infrastructure.</p>
</li>
</ul>
<p><strong>Configuration:</strong> Azure Arc agent creates four services on your system, including an Arc proxy. Configure your on-premises firewall or proxy IP address in the Arc agent proxy settings, so that all traffic destined for Azure Arc passes through the proxy server, which then communicates with Azure endpoints on behalf of the Arc agent.</p>
<pre><code class="lang-bash">azcmagent config <span class="hljs-built_in">set</span> proxy.url <span class="hljs-string">"http://ProxyServerFQDNorIP:port"</span>
azcmagent config get proxy.url
</code></pre>
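<p>After pointing the agent at the proxy, it is worth confirming that the agent can actually reach the required endpoints through it. A quick sanity check using the agent's own tooling:</p>
<pre><code class="lang-bash"># Verify connectivity from this machine to the Azure Arc endpoints
azcmagent check

# Review agent status and the effective configuration (including the proxy URL)
azcmagent show
</code></pre>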
<p>Here is a screenshot of an on-premises server using the proxy to route Arc agent traffic.<br />Since I have not whitelisted the URL for SQL, traffic for the Arc SQL extension is failing.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722807745424/b3edaaf6-e65f-4987-b53c-951f72fea791.png" alt class="image--center mx-auto" /></p>
<p>Sample architecture for an Arc server leveraging an on-premises proxy solution:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723174094423/0cbebdb5-baf7-4aad-b2f9-a341dbb3f68e.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-4-proxy-private-connectivity">4. Proxy (Private Connectivity)</h3>
<p><strong>Overview:</strong> In this scenario, the proxy server is hosted on Azure, and traffic from on-premises servers is routed privately to Azure via ExpressRoute or IPsec VPN. This approach ensures that all communication between on-premises servers and Azure remains private, leveraging the secure connection to the Azure-based proxy before reaching Azure services over the Microsoft backbone network.</p>
<p><strong>Use Cases:</strong></p>
<ul>
<li><p><strong>High-Security Deployments:</strong> Suitable for environments that demand keeping all traffic off the internet, ensuring that no data traverses the public internet.</p>
</li>
<li><p><strong>Centralized Security Management:</strong> Ideal for organizations that want to centralize outbound traffic management while ensuring private connectivity.</p>
</li>
</ul>
<p><strong>Advantages:</strong></p>
<ul>
<li><p><strong>Enhanced Security:</strong> Offers a high level of security by ensuring that all traffic from on-premises servers stays off the public internet until it reaches the Arc service.<br />  Traffic can be routed across multiple layers of firewall in the customer DMZ before it connects to the Arc/Azure endpoint via the Azure backbone network.</p>
</li>
<li><p><strong>Compliance:</strong> Meets stringent security and compliance requirements by leveraging private connections.</p>
</li>
</ul>
<p><strong>Disadvantages:</strong></p>
<ul>
<li><p><strong>Complexity:</strong> More complex to set up due to the need for private networking configurations, including ExpressRoute or IPsec VPN.</p>
</li>
<li><p><strong>Cost:</strong> Higher costs associated with maintaining private connectivity and the Azure-based proxy infrastructure.</p>
</li>
</ul>
<p><strong>Configuration:</strong></p>
<ol>
<li><p><strong>Enable Forward Proxy on Azure Firewall (Premium SKU):</strong> Configure the proxy service to expose its service over specific ports (e.g., HTTP/HTTPS).</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723018665760/f37b027f-85e2-47bf-b7e6-62408633b13b.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><strong>Configure Proxy in Arc Agent:</strong> Use the <code>azcmagent</code> utility on your on-premises servers to configure the proxy settings using the IP and port details of the Azure-based proxy.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723034906101/c6d3ed6e-8667-43c1-820e-10645e84f335.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><strong>Verify Traffic:</strong> Monitor traffic on the Azure Firewall logs, ensuring that it passes through the proxy as configured. Based on the whitelisted rules, allow traffic to the necessary Arc endpoints.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723047059064/308d0ea6-5650-4a47-b4bd-e0bd2b9a9348.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><strong>DMZ Considerations:</strong> For added security, route traffic through a DMZ zone firewall before connecting to Arc endpoints via the Microsoft backbone network.<br /> Sample architecture for secure Arc traffic via proxy which traverse through multiple layers of firewall (ingress, egress) &amp; connect Arc via backbone network</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723050318850/41033f29-99d9-4daa-997d-048dbbf31562.png" alt class="image--center mx-auto" /></p>
</li>
</ol>
<h3 id="heading-5-arc-gateway-preview">5. Arc Gateway (Preview)</h3>
<p><strong>Overview:</strong></p>
<p>Managing Azure Arc often involves dealing with numerous endpoints—15+ for basic scenarios, and over 150 for full extension use. The Arc Gateway simplifies this by reducing the number of URLs that need to be whitelisted in enterprise proxies or firewalls.</p>
<p><strong>Components:</strong></p>
<ul>
<li><p><strong>Arc Gateway:</strong> A centralized front end for all infrastructure traffic between Azure and on-premises servers.</p>
</li>
<li><p><strong>Azure Arc Proxy:</strong> A component within the Arc agent that routes traffic through the Arc Gateway, reducing the operational burden of managing multiple endpoints.</p>
</li>
</ul>
<p><strong>Use Cases:</strong></p>
<ul>
<li><p><strong>Simplified Management:</strong> Ideal for enterprises that want to reduce the operational complexity associated with managing numerous Arc-related endpoints.</p>
</li>
<li><p><strong>Secure, Centralized Traffic Routing:</strong> Ensures all Arc traffic is routed through a single, secure gateway.</p>
</li>
</ul>
<p><strong>Advantages:</strong></p>
<ul>
<li><p><strong>Reduced Complexity:</strong> Simplifies endpoint management by limiting the number of required URLs to 7 for all scenarios</p>
</li>
<li><p><strong>Improved Security:</strong> Centralizes traffic management, enhancing security and reducing the attack surface.</p>
</li>
</ul>
<p><strong>Disadvantages:</strong></p>
<ul>
<li><p><strong>Limited Availability:</strong> As a preview feature, it may not yet be suitable for production environments without thorough testing.</p>
</li>
<li><p><strong>Security</strong>: All traffic to the Arc Gateway traverses a public endpoint.</p>
</li>
</ul>
<p><strong>Configuration:</strong></p>
<ol>
<li><p><strong>Create Arc Gateway Resource:</strong> Set up the Arc Gateway within your Azure environment.</p>
<pre><code class="lang-bash"> az connectedmachine gateway create --name demo-gateway --resource-group demo-arc-gw --location eastus2 --gateway-type public --allowed-features <span class="hljs-string">'*'</span> --subscription <span class="hljs-string">'subsid'</span>
</code></pre>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723051073008/98d26441-dbfc-496e-b116-5155b9a3b279.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><strong>Enable Association:</strong> Associate your Arc-enabled servers with the Arc Gateway.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723051132055/f97fe6e1-9211-4633-96dc-6075dc32c291.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-bash"> az connectedmachine setting update --resource-group demo-arc-gw --subscription subscription-name --base-provider Microsoft.HybridCompute --base-resource-type machines --base-resource-name arcbox-win2k19  --settings-resource-name default --gateway-resource-id <span class="hljs-string">'/subscriptions/subsid/resourceGroups/demo-arc-gw/providers/Microsoft.HybridCompute/gateways/demo-gateway'</span>
</code></pre>
</li>
<li><p><strong>Initiate Arc Proxy Service:</strong> Configure the Arc Proxy to leverage the Arc Gateway, reducing the number of required endpoints.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723168842987/bfce7315-49b9-4033-ac4d-810f98f552dc.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><strong>Verify Traffic Routing:</strong> Use the <code>azcmagent</code> utility to confirm that traffic is being routed through the Arc Gateway, reducing the list of required endpoints to just four.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1723169317779/6cd226e2-b839-4ab0-8f0a-523d91a01ac2.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>A visual representation of the traffic flow would look similar to this:</p>
<p> <img src="https://learn.microsoft.com/en-us/azure/azure-arc/servers/media/arc-gateway/arc-gateway-overview.png" alt="Diagram showing the route of traffic flow for Azure Arc gateway." /></p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Using GenAI Gateway capabilities for OpenAI]]></title><description><![CDATA[Introduction
In this blog post, I will demonstrate how to leverage the newly announced GenAI gateway features in API Management (APIM) to enhance the resiliency and capacity of your Azure OpenAI deployments using circuit breaker and load balancing pa...]]></description><link>https://blog.osshaikh.com/genai-gateway-openai</link><guid isPermaLink="true">https://blog.osshaikh.com/genai-gateway-openai</guid><category><![CDATA[Azure]]></category><category><![CDATA[openai]]></category><category><![CDATA[llm]]></category><category><![CDATA[APIs]]></category><category><![CDATA[APIM]]></category><category><![CDATA[genai]]></category><category><![CDATA[Microsoft]]></category><dc:creator><![CDATA[Osama Shaikh]]></dc:creator><pubDate>Sat, 03 Aug 2024 03:55:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1722657183010/f2efbe61-8eac-4786-82a9-3ff45d2178fc.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-introduction">Introduction</h3>
<p>In this blog post, I will demonstrate how to leverage the newly announced GenAI gateway features in API Management (APIM) to enhance the resiliency and capacity of your Azure OpenAI deployments using circuit breaker and load balancing patterns.</p>
<h3 id="heading-background">Background</h3>
<p>Previously, I have shared various methods for using different routing algorithms (weightage/priority) to load balance traffic across multiple OpenAI endpoints to improve resilience and performance. However, these custom solutions often failed under extremely heavy traffic scenarios.</p>
<p>Since I have already discussed the challenges associated with OpenAI TPM and APIM patterns in a previous blog post, this post will focus directly on the solution.</p>
<h3 id="heading-load-balancing-features-in-apim-for-openai">Load Balancing Features in APIM for OpenAI</h3>
<p>The solution comprises two main configuration parts: load balancing and the circuit breaker. These configurations can be managed via two methods: the APIM portal (UI) and Bicep (Infrastructure as Code).</p>
<h3 id="heading-ia"> </h3>
<p>Configuration of APIM via Portal (UI)</p>
<ol>
<li><p><strong>Support for Load Balancing Techniques</strong>: Recently announced at the last Microsoft Build event, APIM now supports weighted and priority-based load balancing techniques.</p>
</li>
<li><p><strong>Creating Backend Resources</strong>: Begin by creating backend resources for each OpenAI endpoint in APIM. For instance, you can add two OpenAI endpoints from different regions.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722536100419/41e824e0-ff27-40ed-909e-b5bda19225a4.png" alt class="image--center mx-auto" /></p>
<ul>
<li><strong>Import New API</strong>: Use the OpenAI service template to import a new API. Select an OpenAI endpoint and base URL suffix to form the APIM URI path, e.g., <a target="_blank" href="https://myapim.azure-api.net/(basesuffix)"><code>https://myapim.azure-api.net/(basesuffix)</code></a>.</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722536891352/12864698-5c9b-4c6d-802e-81b554e3eccb.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>Add each of your Azure OpenAI endpoints as a backend resource in API Management.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722648270803/15e19737-068b-41a9-a02b-666785669aff.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
<h3 id="heading-configuration-of-lb-amp-circuit-breaker-via-bicep">Configuration of LB &amp; Circuit Breaker via Bicep</h3>
<p>Due to the current lack of UI support for load balancer or circuit breaker configurations, these must be done via Bicep (Infrastructure as Code).</p>
<p><strong>Load Balancer Configuration</strong>:</p>
<ol>
<li><p><strong>Define Load Balancer Method</strong>: You can configure the load balancer to use either weight or priority, or both. In this demo, we will use weight-based load balancing, where one endpoint will have a higher weight than another.</p>
</li>
<li><p><strong>Code Snippet Adjustments</strong>: Modify the names of APIM and its backend pools, specifying the backend pools created in the previous step. This example uses two pools, but you can add more as needed.  </p>
<pre><code class="lang-yaml"> <span class="hljs-string">resource</span> <span class="hljs-string">symbolicname</span> <span class="hljs-string">'Microsoft.ApiManagement/service/backends@2023-09-01-preview'</span> <span class="hljs-string">=</span> {
   <span class="hljs-attr">name:</span> <span class="hljs-string">'(APIM-Name)/mybackendpool'</span>
   <span class="hljs-attr">properties:</span> {
     <span class="hljs-attr">description:</span> <span class="hljs-string">'Load balancer for multiple backends'</span>
     <span class="hljs-attr">type:</span> <span class="hljs-string">'Pool'</span>
     <span class="hljs-attr">pool:</span> {
       <span class="hljs-attr">services:</span> [
         {
           <span class="hljs-attr">id:</span> <span class="hljs-string">'/subscriptions/(subscripitonid)/resourceGroups/DefaultResourceGroup-SUK/providers/Microsoft.ApiManagement/service/opai-apim/backends/backend1-openai-endpoint'</span>
           <span class="hljs-attr">priority:</span> <span class="hljs-number">1</span>
           <span class="hljs-attr">weight:</span> <span class="hljs-number">1</span>
         }
         {
           <span class="hljs-attr">id:</span> <span class="hljs-string">'/subscriptions/(subscripitonid)/resourceGroups/DefaultResourceGroup-SUK/providers/Microsoft.ApiManagement/service/opai-apim/backends/backend2-openai-endpoint'</span>
           <span class="hljs-attr">priority:</span> <span class="hljs-number">1</span>
           <span class="hljs-attr">weight:</span> <span class="hljs-number">10</span>
         }
       ]
     }
   }
 }
</code></pre>
</li>
<li><p><strong>Deploy the Bicep Template</strong>: Deploy this Bicep template and look for changes in the backend pool to ensure it is deployed successfully.</p>
</li>
<li><p><strong>Update API with New Backend Name</strong>: Once deployed, you will see the new backend name in APIM backends. Copy that name and replace it in the API you created in the initial steps.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722652750957/19509848-b1e2-4162-b506-7730c68284d0.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-bash">  az deployment group create --resource-group Rresource-group-name --template-file apim.bicep
</code></pre>
</li>
<li><p><strong>Circuit Breaker Configuration</strong>:</p>
<ol>
<li><p><strong>Health Probe Setup</strong>: Deploy circuit breaker conditions for each backend pool resource, consisting of a 'failure condition' and a 'retry after' period.</p>
</li>
<li><p><strong>Failure Condition</strong>: Define the number of errors observed within a specific duration (e.g., 5 errors in 1 minute) and specify the HTTP status code ranges that are considered errors.</p>
</li>
<li><p><strong>Retry After</strong>: Specify the duration after which APIM can retry the health probe on a failed backend pool endpoint. In this configuration, error code ranges from 400-599 and a duration of 1 minute are set for each backend pool.  </p>
</li>
</ol>
</li>
</ol>
<pre><code class="lang-yaml">    <span class="hljs-string">resource</span> <span class="hljs-string">openaiprodbackend</span> <span class="hljs-string">'Microsoft.ApiManagement/service/backends@2023-09-01-preview'</span> <span class="hljs-string">=</span> {
      <span class="hljs-attr">name:</span> <span class="hljs-string">'APIM-name/nameofbackendpool'</span>
      <span class="hljs-attr">properties:</span> {
        <span class="hljs-attr">url:</span> <span class="hljs-string">'https://openaiendpointname.openai.azure.com/openai'</span>
        <span class="hljs-attr">protocol:</span> <span class="hljs-string">'http'</span>
        <span class="hljs-attr">circuitBreaker:</span> {
          <span class="hljs-attr">rules:</span> [
            {
              <span class="hljs-attr">failureCondition:</span> {
                <span class="hljs-attr">count:</span> <span class="hljs-number">1</span>
                <span class="hljs-attr">errorReasons:</span> [
                  <span class="hljs-string">'Server errors'</span>
                ]
                <span class="hljs-attr">interval:</span> <span class="hljs-string">'PT1M'</span>
                <span class="hljs-attr">statusCodeRanges:</span> [
                  {
                    <span class="hljs-attr">min:</span> <span class="hljs-number">400</span>
                    <span class="hljs-attr">max:</span> <span class="hljs-number">599</span>
                  }
                ]
              }
              <span class="hljs-attr">name:</span> <span class="hljs-string">'myBreakerRulePTU'</span>
              <span class="hljs-attr">tripDuration:</span> <span class="hljs-string">'PT1M'</span>
              <span class="hljs-attr">acceptRetryAfter:</span> <span class="hljs-literal">true</span>
            }
          ]
        }
        <span class="hljs-attr">description:</span> <span class="hljs-string">'OpenAI PTU'</span>
        <span class="hljs-attr">title:</span> <span class="hljs-string">'openai-PTU'</span>
      }
    }
</code></pre>
<ol start="6">
<li><h4 id="heading-verification-via-api">Verification via API</h4>
<p> Since there is no UI to verify the circuit breaker configuration, use the APIM backend REST API to confirm changes. Use Postman, refer to the APIM documentation for the validation tool, or see the <code>az rest</code> sketch after this list. Specify the backend name in <code>backendID</code> and the APIM name in <code>serviceName</code>.</p>
</li>
<li><p>The Bicep configuration is now complete. Next, test the load balancing &amp; circuit breaker policy in action.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722652277574/05465ca0-1bee-457a-913a-eb4c89f63101.png" alt class="image--center mx-auto" /></p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722652380261/27f6e668-5f7e-4a54-9951-d9e181593814.png" alt class="image--center mx-auto" /></p>
</li>
</ol>
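<p>A hedged alternative to Postman for the verification step above is <code>az rest</code> against the same backend resource. Subscription ID is a placeholder, the resource group, APIM, and backend names follow the Bicep example earlier, and the API version matches the one used there:</p>
<pre><code class="lang-bash"># Read the backend definition, including its circuitBreaker rules, via the ARM REST API
az rest --method get --url "https://management.azure.com/subscriptions/your-subscription-id/resourceGroups/DefaultResourceGroup-SUK/providers/Microsoft.ApiManagement/service/opai-apim/backends/backend1-openai-endpoint?api-version=2023-09-01-preview"
</code></pre>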
<h3 id="heading-validationtest">Validation/Test</h3>
<p>To test the circuit breaker with load balancing:</p>
<ol>
<li><p><strong>Induce Errors</strong>: Ensure the OpenAI endpoint throws errors within the 400-599 range. Perform a stress test on the OpenAI endpoint preferred in weight/priority, which should result in an HTTP 429 error ("Server is Busy").</p>
</li>
<li><p><strong>Test APIM Response</strong>: Once the preferred endpoint is unresponsive, make a call via APIM to OpenAI and verify if the response is received from the second endpoint.</p>
</li>
<li><p><strong>Example</strong>: Perform a stress test on the OpenAI endpoint in EastUS, resulting in a 429 error state. Validate with an ad-hoc call to the EastUS OpenAI endpoint to confirm it fails with 429. At the same time, a POST request to APIM should work fine, as it is handled by the secondary backend, which is the OpenAI endpoint in UKSouth.</p>
</li>
<li><p><strong>Backend Endpoint 2</strong>: Backend endpoint 2 has a higher weight, located in EastUS. Perform a stress test on this endpoint using Postman.</p>
</li>
<li><p><strong>Stress Test Results</strong>: Stress test on the OpenAI endpoint in EastUS should result in a 429 error state.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722654684357/1a754a09-0480-4a8a-8f9c-4f8947c5492e.png" alt class="image--center mx-auto" /></p>
<p><strong>Ad-Hoc Call Validation</strong>: Validate with an ad-hoc call to the EastUS OpenAI endpoint to confirm it is failing with a 429 error.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722654809535/740d31f4-2a93-4f4f-9c2a-c91cec15b096.png" alt class="image--center mx-auto" /></p>
<p>Once the preferred endpoint is unresponsive, make a call via APIM to OpenAI and verify if the response is received from the second endpoint.</p>
<p>This POST request to APIM should work fine, as it is handled by the secondary backend, which is the OpenAI endpoint in UKSouth.  </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722655026626/8e028d73-d001-471b-8dfd-e5b3861eac03.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-conclusion">Conclusion:</h3>
<p>In conclusion, using the new GenAI gateway features in API Management can greatly improve the reliability and performance of Azure OpenAI services. By setting up smart traffic routing and safety measures, we ensure that your services stay responsive even under heavy load. This approach helps balance the workload and quickly redirects traffic if something goes wrong, making sure your users have a smooth and reliable experience.  </p>
<p>Appendix:<br /><a target="_blank" href="https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/intelligent-load-balancing-with-apim-for-openai-weight-based/ba-p/4115155">Intelligent Load Balancing with APIM for OpenAI: Weight-Based Routing - Microsoft Community Hub</a></p>
]]></content:encoded></item><item><title><![CDATA[Cross Tenant Authentication for TDE via Workload Federation identity]]></title><description><![CDATA[Introduction:
In my recent engagement with a customer operating under a multi-tenant architecture, I encountered a shared service model where a centralized Hub included a Key Management Service (KMS) using a Dedicated Hardware Security Module (HSM). ...]]></description><link>https://blog.osshaikh.com/cross-tenant-authentication</link><guid isPermaLink="true">https://blog.osshaikh.com/cross-tenant-authentication</guid><category><![CDATA[Azure]]></category><category><![CDATA[crypto]]></category><category><![CDATA[Cryptography]]></category><category><![CDATA[keys]]></category><category><![CDATA[authentication]]></category><category><![CDATA[authorization]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Osama Shaikh]]></dc:creator><pubDate>Mon, 15 Jul 2024 07:02:02 GMT</pubDate><content:encoded><![CDATA[<h3 id="heading-introduction">Introduction:</h3>
<p>In my recent engagement with a customer operating under a multi-tenant architecture, I encountered a shared service model where a centralized Hub included a Key Management Service (KMS) using a Dedicated Hardware Security Module (HSM). In this setup, all encryption activities across the organization leveraged the HSM for Bring Your Own Key (BYOK) to secure various data services, including MySQL and MSSQL databases.</p>
<p>Given that the data services and the HSM resided in different Azure tenants, it was crucial for the managed identities associated with these data services to authenticate to the HSM cross-tenant with the necessary Role-Based Access Control (RBAC) privileges. However, managed identities typically do not support cross-tenant scenarios by default.</p>
<p>With the recent General Availability (GA) of the 'Workload Identity Federation' feature in Microsoft Entra, a new approach has emerged to address this challenge. In this article, I will explore how federated credentials can be leveraged to facilitate cross-tenant authentication, enabling secure and efficient data encryption across disparate Azure tenants.  </p>
<p><strong>Prerequisites</strong></p>
<ul>
<li><p>User Access Administrator + Contributor RBAC on the source Azure tenant subscription (data services subscription)</p>
</li>
<li><p>Microsoft Entra Application Administrator RBAC on the source tenant</p>
</li>
<li><p>Entra ID Application Administrator + Contributor RBAC on the destination subscription that hosts the HSM/AKV</p>
</li>
</ul>
<p><strong>Setup Microsoft Entra for App Registration</strong></p>
<ul>
<li><p>Make sure the data service (MSSQL) where encryption is needed is configured with a user-assigned managed identity</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721020053689/c7a1121c-b36d-4237-a789-908939fa8fe1.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Create an App Registration (SP) in Microsoft Entra ID (source tenant) with the multi-tenant option  </p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721020287467/c4c575f6-52bd-426f-b925-5d770607b55f.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>The next step is to connect the App Registration we just created with the user-assigned managed identity associated with the data service (SQL)</p>
</li>
<li><p>Enable the federated credentials feature on the App Registration with the CMK option, and select the user-assigned managed identity of MSSQL</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721020905217/f4d06bbe-14cc-43b3-ac0c-a254bee3e276.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Log in to the destination Azure tenant (where the AKV is hosted) &amp; create an App Registration using the client ID of the App Registration created in the earlier step  </p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721022513793/a129ea8f-fe39-411b-b4d8-2d6b5c77f25b.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>The App Registration is now linked with the existing App Registration in the source tenant</p>
</li>
<li><p>Next, assign the "Key Vault Crypto Officer" RBAC role to the App Registration using its name (a CLI sketch is included at the end of this post)</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721024473156/d9446269-f50c-45d3-b44f-080bb5661935.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Once permissions are assigned, switch back to the source tenant which hosts your data service, i.e., SQL databases.</p>
</li>
<li><p>Under the Identity blade of SQL, configure the federated identity &amp; select the App Registration name that we created initially</p>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721026196822/5703d713-f501-4e25-97ea-8f95a1677b59.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Now, under Data encryption for SQL, select 'user managed identity' &amp; 'federated identity', then paste the HSM/AKV key identifier URI from the other tenant &amp; save</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721026439672/1ba3e399-7293-4327-8821-9f41459a2a92.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>Once the configuration is saved successfully, TDE is enabled for your databases using the HSM/AKV from the other tenant</p>
</li>
<li><p>This cross-tenant scenario wouldn't have been possible without leveraging the federated credential feature of Entra ID</p>
</li>
</ul>
</li>
</ul>
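<p>For teams that prefer scripting the RBAC step mentioned above, the role assignment in the destination tenant can also be made with the Azure CLI. A sketch with placeholder IDs, run while logged in to the destination tenant:</p>
<pre><code class="lang-bash"># Grant the app registration the Key Vault Crypto Officer role on the vault/HSM scope
az role assignment create --assignee "app-client-id" --role "Key Vault Crypto Officer" --scope "/subscriptions/destination-subscription-id/resourceGroups/kms-rg/providers/Microsoft.KeyVault/vaults/central-kms-vault"
</code></pre>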
]]></content:encoded></item><item><title><![CDATA[Karpenter: Run your Workloads upto 80% Off using Spot with AKS]]></title><description><![CDATA[Introduction
At last year's KubeCon North America, Microsoft announced the adoption of Karpenter in Azure Kubernetes Service (AKS) as an alternative to the Cluster Autoscaler (CA), referred to as Node Autoprovisioning (NAP). While Cluster Autoscaler ...]]></description><link>https://blog.osshaikh.com/karpenter-aks-spot-nodes</link><guid isPermaLink="true">https://blog.osshaikh.com/karpenter-aks-spot-nodes</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Azure]]></category><category><![CDATA[k8s]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Osama Shaikh]]></dc:creator><pubDate>Tue, 21 May 2024 15:26:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716374995785/97b4a3ae-7531-438f-b377-f32c894f4900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-introduction"><strong>Introduction</strong></h3>
<p>At last year's KubeCon North America, Microsoft announced the adoption of Karpenter in Azure Kubernetes Service (AKS) as an alternative to the Cluster Autoscaler (CA), referred to as Node Autoprovisioning (NAP). While Cluster Autoscaler has been the default node scaler in AKS/Kubernetes, there have been significant challenges that led to the adoption of Karpenter. This post delves into these challenges and explores how Karpenter addresses them.</p>
<h3 id="heading-challenges-with-cluster-autoscaler">Challenges with Cluster Autoscaler</h3>
<p>Here is Node Autosclaing flow chart for Cluster-Autoscaler</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716374617575/e812c05e-9d7f-4b75-b335-1554e82bdb05.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p><strong>Limited to VMSS Groups:</strong> Cluster Autoscaler can only operate with Virtual Machine Scale Sets (VMSS) in AKS. Each VMSS consists of a specific group of VM instances with a specific VM SKU, hardware, and CPU:Memory ratio (e.g., Standard D4sv5 with 4 CPUs and 16 GB RAM).</p>
</li>
<li><p><strong>Node Latency:</strong> CA triggers the node pool API, which calls the VMSS instance API. This scaling process has latency, taking over a minute for a node to be ready in AKS.</p>
</li>
<li><p><strong>Node Pool Constraints:</strong> When deploying new pods, if the existing node capacity is exhausted, CA attempts to spin up a new node of the same VMSS SKU type. If that instance is unavailable, pods remain in a pending state.</p>
</li>
<li><p><strong>Scalability Limitations:</strong> CA can only scale up based on specific node pool SKU VMSS availability. It cannot leverage the capacity of other VM SKUs even if they have available resources.</p>
</li>
</ul>
<h3 id="heading-introducing-karpenter-node-autoprovisioning"><strong>Introducing Karpenter (Node Autoprovisioning)</strong></h3>
<p>Karpenter is an efficient node autoscaler for Kubernetes clusters, designed to optimize performance and cost. It can scale up and down worker nodes faster than Cluster Autoscaler and can launch appropriate individual nodes without creating traditional node groups in AKS.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1716374666038/b08fe3a7-407f-4374-b2c4-21d4f5aac471.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p><strong>Key Features of Karpenter:</strong></p>
<ul>
<li><p><strong>Efficiency:</strong> Faster scaling of Kubernetes nodes.</p>
</li>
<li><p><strong>Flexibility:</strong> Launches nodes without needing VMSS.</p>
</li>
<li><p><strong>Cost Optimization:</strong> Reduces overall costs and helps with patching of node images and Kubernetes versions.</p>
</li>
<li><p>NodePool YAML-based config which defines what types of nodes it can provision</p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-handling-disruptions">Handling Disruptions</h3>
<p>The Disruption Controller is responsible for terminating/replacing nodes in the Kubernetes cluster.<br />It uses one of three automated methods to decide which nodes to disrupt:</p>
<ul>
<li>Expiration: Karpenter will mark nodes as expired and disrupt them after they have lived a set number of seconds. This parameter acts as a TTL for Kubernetes nodes</li>
</ul>
<pre><code class="lang-bash">spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 300s
</code></pre>
<ul>
<li><p>ConsolidateAfter: used to configure the disruption interval, i.e. the amount of time Karpenter should wait before considering another disruption cycle</p>
</li>
<li><p>Consolidation: actively reduces cluster cost by analyzing nodes.<br />  The consolidation policy has two modes:<br />  a) WhenEmpty: Karpenter will only disrupt nodes with no workload pods<br />  b) WhenUnderutilized: Karpenter will attempt to remove/replace nodes when they are underutilised</p>
</li>
</ul>
<pre><code class="lang-bash">apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: ondemand
spec:
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 60s
</code></pre>
<h3 id="heading-enable-napkarpenter-on-aks">Enable NAP(Karpenter) on AKS</h3>
<p>There are few pre requisites to enable NAP on AKS</p>
<ul>
<li><p>Install the Azure CLI with the preview extension (version greater than 0.5.17)</p>
</li>
<li><p>Register the NAP feature flag called "NodeAutoProvisioningPreview" (see the commands after this list)</p>
</li>
<li><p>AKS with network configuration as Cilium + Overlay</p>
</li>
</ul>
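<p>The preview extension and feature flag from the list above can be set up with a few CLI calls; the extension name below is the standard aks-preview one, and the subscription context is assumed to be already selected:</p>
<pre><code class="lang-bash"># Install/refresh the AKS preview CLI extension
az extension add --name aks-preview
az extension update --name aks-preview

# Register the Node Autoprovisioning preview feature and re-register the provider
az feature register --namespace "Microsoft.ContainerService" --name "NodeAutoProvisioningPreview"
az feature show --namespace "Microsoft.ContainerService" --name "NodeAutoProvisioningPreview" --query "properties.state"
az provider register --namespace Microsoft.ContainerService
</code></pre>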
<p><strong>Enable NAP on existing AKS cluster</strong><br />Make sure the existing AKS cluster has the 'Azure' network plugin with Cilium as the network policy. The key thing in this command is the flag '--node-provisioning-mode Auto', which sets NAP as the default node autoscaler:</p>
<pre><code class="lang-bash">az aks update --name aksclustername --resource-group rgname --node-provisioning-mode Auto
</code></pre>
<p>Deploy NAP with new AKS cluster</p>
<pre><code class="lang-bash">az aks create --name aksclustername--resource-group rgname--node-provisioning-mode Auto --network-plugin azure --network-plugin-mode overlay --network-dataplane cilium
</code></pre>
<p><strong>Verify Karpenter Enablement:</strong></p>
<pre><code class="lang-bash">kubectl api-resources | grep -e aksnodeclasses -e nodeclaims -e nodepools

aksnodeclasses                     aksnc,aksncs                        karpenter.azure.com/v1alpha2           <span class="hljs-literal">false</span>        AKSNodeClass
nodeclaims                                                             karpenter.sh/v1beta1                   <span class="hljs-literal">false</span>        NodeClaim
nodepools                                                              karpenter.sh/v1beta1                   <span class="hljs-literal">false</span>        NodePool
</code></pre>
<h3 id="heading-disabling-cluster-autoscaler"><strong>Disabling Cluster-Autoscaler</strong></h3>
<p>To switch from Cluster-Autoscaler to Karpenter, disable Cluster-Autoscaler on your AKS cluster:</p>
<pre><code class="lang-bash">az aks update --name aksclustername --resource-group aksrg --disable-cluster-autoscaler
</code></pre>
<h3 id="heading-deploying-a-sample-application"><strong>Deploying a Sample Application</strong></h3>
<p>To see Node-Autoprovisioning in action, deploy a sample application:</p>
<pre><code class="lang-bash">osama [ ~ ]$ kubectl get nodes
NAME                                STATUS   ROLES   AGE     VERSION
aks-default-h2jxh                   Ready    agent   35m     v1.27.9
aks-nodepool1-41633911-vmss000000   Ready    agent   3d19h   v1.27.9
</code></pre>
<p>Scale the replicas of the Vote application to trigger scale-out events:</p>
<pre><code class="lang-bash">osama [ ~ ]$ kubectl scale deployment azure-vote-front --replicas=12 -n karpenter-demo-ns
deployment.apps/azure-vote-front scaled
osama [ ~ ]$ kubectl scale deployment azure-vote-back --replicas=12 -n karpenter-demo-ns
deployment.apps/azure-vote-back scaled
</code></pre>
<p>Verify node autoscaling by reading the Karpenter events using the kubectl command below:</p>
<pre><code class="lang-bash">kubectl get events -A --field-selector <span class="hljs-built_in">source</span>=karpenter --sort-by=<span class="hljs-string">'.lastTimestamp'</span> -n 10
NAMESPACE           LAST SEEN   TYPE     REASON                  OBJECT                                  MESSAGE
default             50m         Normal   Unconsolidatable        nodeclaim/default-95f54                 SpotToSpotConsolidation is disabled, can<span class="hljs-string">'t replace a spot node with a spot node
default             50m         Normal   Unconsolidatable        node/aks-default-95f54                  SpotToSpotConsolidation is disabled, can'</span>t replace a spot node with a spot node
default             38m         Normal   DisruptionBlocked       nodepool/default                        No allowed disruptions due to blocking budget
default             5m33s       Normal   Unconsolidatable        nodeclaim/default-h2jxh                 Can<span class="hljs-string">'t remove without creating 2 candidates
default             5m33s       Normal   Unconsolidatable        node/aks-default-h2jxh                  Can'</span>t remove without creating 2 candidates
default             2m12s       Normal   DisruptionBlocked       nodepool/system-surge                   No allowed disruptions due to blocking budget
karpenter-demo-ns   63s         Normal   Nominated               pod/azure-vote-front-6855444955-bnq7p   Pod should schedule on: nodeclaim/default-mrh7w
karpenter-demo-ns   63s         Normal   Nominated               pod/azure-vote-front-6855444955-gbwk6   Pod should schedule on: nodeclaim/default-mrh7w
karpenter-demo-ns   63s         Normal   Nominated               pod/azure-vote-front-6855444955-l2bgj   Pod should schedule on: nodeclaim/default-mrh7w
karpenter-demo-ns   63s         Normal   Nominated               pod/azure-vote-front-6855444955-nvc56   Pod should schedule on: nodeclaim/default-mrh7w
karpenter-demo-ns   63s         Normal   Nominated               pod/azure-vote-front-6855444955-22glj   Pod should schedule on: nodeclaim/default-mrh7w
karpenter-demo-ns   63s         Normal   Nominated               pod/azure-vote-front-6855444955-sxdl6   Pod should schedule on: nodeclaim/default-mrh7w
karpenter-demo-ns   63s         Normal   Nominated               pod/azure-vote-front-6855444955-t69w4   Pod should schedule on: nodeclaim/default-mrh7w
</code></pre>
<h3 id="heading-customise-karpenter-config"><strong>Customise Karpenter Config</strong></h3>
<p>Karpenter leverages a new resource type (Kind) in Kubernetes, i.e., NodePools:</p>
<ul>
<li><p>Customise Nodepools: Specific specific VM series or VM family or even Specific CPU or Memory ratio.</p>
</li>
<li><p>Select node based on features sets like GPU enable or Network Acceleration</p>
</li>
<li><p>Define the CPU architecture, either ARM or AMD, based on the capability required by a specific workload.</p>
</li>
<li><p>Architect your nodes for resiliency by configuring zone topology (see the snippet after this list).</p>
</li>
<li><p>Limit the amount of CPU &amp; memory that can be utilised by the nodes at the NodePool level.</p>
</li>
</ul>
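<p>As an example of the zone topology point above, a NodePool requirement can pin nodes to specific availability zones using the standard zone label (a minimal sketch; the zone names shown are illustrative for East US):</p>
<pre><code class="lang-bash">      requirements:
      # Provision nodes only in the listed availability zones
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - eastus-1
        - eastus-2
        - eastus-3
</code></pre>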
<p>Here is the default NodePool YAML for Karpenter (NAP), which configures the node SKU types and capacity, limits the NodePool's total CPU and memory, and sets a weight in case of multiple NodePools:</p>
<pre><code class="lang-bash">apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 10s
  template:
    spec:
      nodeClassRef:
        name: default

      <span class="hljs-comment"># Requirements that constrain the parameters of provisioned nodes.</span>
      <span class="hljs-comment"># These requirements are combined with pod.spec.affinity.nodeAffinity rules.</span>
      <span class="hljs-comment"># Operators { In, NotIn, Exists, DoesNotExist, Gt, and Lt } are supported.</span>
      <span class="hljs-comment"># https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#operators</span>
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - ondemand
      - key: karpenter.azure.com/sku-family
        operator: In
        values:
        - E
        - D
      - key: karpenter.azure.com/sku-name
        operator: In
        values:
        - Standard_E2s_v5
        - Standard_D4s_v3
  limits:
    cpu: <span class="hljs-string">"1000"</span>
    memory: 1000Gi
  weight: 100
</code></pre>
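<p>Assuming the manifest above is saved locally as nodepool.yaml, it can be applied and inspected like any other Kubernetes resource:</p>
<pre><code class="lang-bash">kubectl apply -f nodepool.yaml
kubectl get nodepool default
</code></pre>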
<h3 id="heading-using-spot-node-with-karpenter">Using Spot Node with Karpenter</h3>
<ul>
<li><p>Add a toleration in the sample AKS Vote application, i.e. "karpenter.sh/disruption:NoSchedule", which comes by default on Spot nodes provisioned with an AKS cluster</p>
</li>
<li><p>Please refer to my GitHub <a target="_blank" href="https://github.com/Osshaikh/Karpenter-AKS-Spot">repo</a> for the application YAML and a sample NodePool config</p>
<pre><code class="lang-bash">  spec:
        nodeSelector:
          <span class="hljs-string">"kubernetes.io/os"</span>: linux
        tolerations:
        - key: <span class="hljs-string">"kubernetes.azure.com/scalesetpriority"</span>
          operator: <span class="hljs-string">"Equal"</span>
          value: <span class="hljs-string">"spot"</span>
          effect: <span class="hljs-string">"NoSchedule"</span>
        containers:
        - name: azure-vote-front
          image: mcr.microsoft.com/azuredocs/azure-vote-front:v1
</code></pre>
</li>
<li><p>Scale down your application replicas to allow Karpenter to evict existing on-demand nodes and replace them with Spot nodes:</p>
<pre><code class="lang-bash">  osama [ ~/karpenter ]$ kubectl get nodes
  NAME                                STATUS   ROLES   AGE     VERSION
  aks-nodepool1-41633911-vmss000000   Ready    agent   3d21h   v1.27.9
  aks-nodepool1-41633911-vmss00000b   Ready    agent   24m     v1.27.9

  osama [ ~/karpenter ]$ kubectl get pods -n karpenter-demo-ns -o wide
  No resources found <span class="hljs-keyword">in</span> karpenter-demo-ns namespace.

  osama [ ~/karpenter ]$ kubectl scale deployment azure-vote-back --replicas=10 -n karpenter-demo-ns
  deployment.apps/azure-vote-back scaled
  osama [ ~/karpenter ]$ kubectl scale deployment azure-vote-front --replicas=10 -n karpenter-demo-ns
  deployment.apps/azure-vote-front scaled
  osama [ ~/karpenter ]$
</code></pre>
</li>
<li><p>Deploy and scale the vote application replicas so that Karpenter spins up Spot nodes based on the NodePool configuration and schedules the pods on them after the tolerations are validated.</p>
</li>
<li><p>Karpenter spins up new Spot nodes and nominates them for scheduling the sample vote app:</p>
<pre><code class="lang-bash">
  osama [ ~/karpenter ]$ kubectl get events -A --field-selector <span class="hljs-built_in">source</span>=karpenter --sort-by=<span class="hljs-string">'.lastTimestamp'</span>
  NAMESPACE           LAST SEEN   TYPE      REASON                       OBJECT                                    MESSAGE
  karpenter-demo-ns   104s        Normal    Nominated                    pod/azure-vote-back-687ddb67bd-pz8sp      Pod should schedule on: nodeclaim/default-52gbg
  karpenter-demo-ns   104s        Normal    Nominated                    pod/azure-vote-back-687ddb67bd-ckdcq      Pod should schedule on: nodeclaim/default-52gbg
  karpenter-demo-ns   104s        Normal    Nominated                    pod/azure-vote-back-687ddb67bd-v9nqj      Pod should schedule on: nodeclaim/default-52gbg
  karpenter-demo-ns   104s        Normal    Nominated                    pod/azure-vote-back-687ddb67bd-vswvs      Pod should schedule on: nodeclaim/default-52gbg
  karpenter-demo-ns   104s        Normal    Nominated                    pod/azure-vote-back-687ddb67bd-lnxmp      Pod should schedule on: nodeclaim/default-52gbg
  karpenter-demo-ns   104s        Normal    Nominated                    pod/azure-vote-back-687ddb67bd-jc2jz      Pod should schedule on: nodeclaim/default-52gbg
  karpenter-demo-ns   104s        Normal    Nominated                    pod/azure-vote-back-687ddb67bd-hwnbh      Pod should schedule on: nodeclaim/default-52gbg
  karpenter-demo-ns   104s        Normal    Nominated                    pod/azure-vote-back-687ddb67bd-r7msb      Pod should schedule on: nodeclaim/default-52gbg
  karpenter-demo-ns   104s        Normal    Nominated                    pod/azure-vote-back-687ddb67bd-96lm9      Pod should schedule on: nodeclaim/default-52gbg
  karpenter-demo-ns   104s        Normal    Nominated                    pod/azure-vote-back-687ddb67bd-5qcvk      Pod should schedule on: nodeclaim/default-52gbg
  default             1s          Normal    DisruptionLaunching          nodeclaim/default-bkz6c                   Launching NodeClaim: Expiration/Replace
  default             1s          Normal    DisruptionWaitingReadiness   nodeclaim/default-bkz6c                   Waiting on readiness to <span class="hljs-built_in">continue</span> disruption
  default             1s          Normal    DisruptionBlocked            nodepool/system-surge                     No allowed disruptions due to blocking budget
  default             1s          Normal    DisruptionWaitingReadiness   nodeclaim/default-5vp7x                   Waiting on readiness to <span class="hljs-built_in">continue</span> disruption
  default             1s          Normal    DisruptionLaunching          nodeclaim/default-5vp7x                   Launching NodeClaim: Expiration/Replace
</code></pre>
<h3 id="heading-configuring-multiple-nodepools"><strong>Configuring Multiple NodePools</strong></h3>
</li>
<li><p>To configure separate NodePools for Spot and On-Demand capacity:</p>
<p>  Spot nodes configure with E series VM "Standard E2s_v5" and OnDemand with D series VM as "Standard_D4s_v5"</p>
</li>
<li><p>In a multi-NodePool scenario, each NodePool needs to be configured with a 'weight' attribute; the NodePool with the highest weight is prioritized over the others. Here the Spot NodePool has weight 100 and the On-Demand NodePool has weight 60 (a sketch of the On-Demand NodePool is shown after the Spot NodePool output below).</p>
</li>
</ul>
<pre><code class="lang-bash">osama [ ~ ]$ kubectl get nodepool default -o yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    budgets:
    - nodes: 100%
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - spot
      - key: karpenter.azure.com/sku-family
        operator: In
        values:
        - B
      - key: karpenter.azure.com/sku-name
        operator: In
        values:
        - Standard_B2s_v2
  weight: 100
</code></pre>
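<p>As referenced above, a sketch of the companion On-Demand NodePool could look like the following, using the D-series SKU and weight 60 mentioned earlier (the disruption settings are illustrative):</p>
<pre><code class="lang-bash">apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: ondemand
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
      - key: karpenter.azure.com/sku-family
        operator: In
        values:
        - D
      - key: karpenter.azure.com/sku-name
        operator: In
        values:
        - Standard_D4s_v5
  # Lower weight than the Spot NodePool (100), so Spot capacity is preferred
  weight: 60
</code></pre>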
<ul>
<li><p>If we do not specify an explicit SKU name, Karpenter will consider the entire VM series.</p>
</li>
<li><p>To validate that the sample VoteApp is running on Spot nodes, use the following commands:</p>
</li>
<li><p>The output should indicate that the nodes are of capacity type "spot":</p>
<pre><code class="lang-bash">  osama [ ~ ]$ kubectl get pods -n karpenter-demo-ns -o wide
  NAME                                READY   STATUS    RESTARTS   AGE   IP             NODE                NOMINATED NODE   READINESS GATES
  azure-vote-back-687ddb67bd-w7ghm    1/1     Running   0          63m   10.244.3.11    aks-default-5cr5f   &lt;none&gt;           &lt;none&gt;
  azure-vote-front-6855444955-64558   1/1     Running   0          63m   10.244.3.168   aks-default-5cr5f   &lt;none&gt;           &lt;none&gt;
  osama [ ~ ]$ kubectl describe node aks-default-5cr5f | grep karpenter.sh
                      karpenter.sh/capacity-type=spot
                      karpenter.sh/initialized=<span class="hljs-literal">true</span>
                      karpenter.sh/nodepool=default
                      karpenter.sh/registered=<span class="hljs-literal">true</span>
                      karpenter.sh/nodepool-hash: 12393960163388511505
                      karpenter.sh/nodepool-hash-version: v2
</code></pre>
<h3 id="heading-simulating-spot-node-eviction"><strong>Simulating Spot Node Eviction</strong></h3>
<p>  To test the spot eviction scenario, simulate a spot eviction using the Azure CLI:</p>
<pre><code class="lang-bash">  osama [ ~ ]$ az vm simulate-eviction --resource-group MC_aks-lab_aks-karpenter_eastus --name aks-default-5cr5f
  osama [ ~ ]$ date
  Tue May 21 06:20:02 PM IST 2024
</code></pre>
</li>
<li><p>Monitor the availability of your VoteApp using a simple curl command:</p>
<pre><code class="lang-bash">  <span class="hljs-keyword">while</span> <span class="hljs-literal">true</span>; <span class="hljs-keyword">do</span> <span class="hljs-built_in">echo</span> <span class="hljs-string">"<span class="hljs-subst">$(date)</span> <span class="hljs-subst">$(curl -s -v -o /dev/null -w 'HTTP %{http_code}\n' http://voteapp.com 2&gt;&amp;1 | grep 'HTTP')</span>"</span>; sleep 2; <span class="hljs-keyword">done</span>
</code></pre>
</li>
<li><p>After running the spot simulation, the existing node will be marked for termination, and a new Spot node will be created to schedule the VoteApp pods. Within less than a minute, the VoteApp should start responding with HTTP 200 status codes.</p>
</li>
<li><pre><code class="lang-bash">    root@MININT-8C81HDE:/home/osamaex <span class="hljs-keyword">while</span> <span class="hljs-literal">true</span>; <span class="hljs-keyword">do</span> <span class="hljs-built_in">echo</span> <span class="hljs-string">"<span class="hljs-subst">$(date)</span> <span class="hljs-subst">$(curl -s -v -o /dev/null -w 'HTTP %{http_code}\n' http://voteapp.com 2&gt;&amp;1 | grep 'HTTP')</span>"</span>; sleep 2; <span class="hljs-keyword">done</span>
    Tue May 21 18:20:04 IST 2024 &gt; GET / HTTP/1.1
    &lt; HTTP/1.1 200 OK
    HTTP 200
    Tue May 21 18:20:07 IST 2024 &gt; GET / HTTP/1.1
    &lt; HTTP/1.1 200 OK
    HTTP 200
    Tue May 21 18:20:09 IST 2024 &gt; GET / HTTP/1.1
    &lt; HTTP/1.1 200 OK
    HTTP 200
    Tue May 21 18:20:12 IST 2024 HTTP 000  <span class="hljs-variable">$Failure</span>-Alert
    Tue May 21 18:21:14 IST 2024 &gt; GET / HTTP/1.1
    &lt; HTTP/1.1 200 OK                      <span class="hljs-variable">$Successful</span>-Response
    HTTP 200
    Tue May 21 18:22:58 IST 2024 &gt; GET / HTTP/1.1
    &lt; HTTP/1.1 200 OK
    HTTP 200
</code></pre>
</li>
<li><p>Check the events logged by Karpenter:</p>
</li>
<li><pre><code class="lang-bash">    kuctl get events -A --field-selector <span class="hljs-built_in">source</span>=karpenter --sort-by=<span class="hljs-string">'.lastTimestamp'</span>
</code></pre>
</li>
<li><p>Events logged by Karpenter while replacing the evicted Spot node:</p>
</li>
<li><pre><code class="lang-bash">    osama [ ~ ]$ 
    NAMESPACE           LAST SEEN   TYPE      REASON           OBJECT                                  MESSAGE
    default             23s         Warning   FailedDraining   node/aks-default-5cr5f                  Failed to drain node, 10 pods are waiting to be evicted
    karpenter-demo-ns   22s         Normal    Evicted          pod/azure-vote-back-687ddb67bd-w7ghm    Evicted pod
    karpenter-demo-ns   22s         Normal    Evicted          pod/azure-vote-front-6855444955-64558   Evicted pod
    karpenter-demo-ns   21s         Normal    Nominated        pod/azure-vote-back-687ddb67bd-tb2pv    Pod should schedule on: nodeclaim/default-6zkkl
    karpenter-demo-ns   21s         Normal    Nominated        pod/azure-vote-front-6855444955-7wzss   Pod should schedule on: nodeclaim/default-6zkkl
</code></pre>
</li>
<li><p>Verify that the pods are running on the new Spot node:</p>
<pre><code class="lang-bash">  kubectl get pods -n karpenter-demo-ns -o wide
  NAME                                READY   STATUS    RESTARTS   AGE   IP             NODE                NOMINATED NODE   READINESS GATES
  azure-vote-back-687ddb67bd-tb2pv    1/1     Running   0          18m   10.244.2.103   aks-default-6zkkl   &lt;none&gt;           &lt;none&gt;
  azure-vote-front-6855444955-7wzss   1/1     Running   0          18m   10.244.2.47    aks-default-6zkkl   &lt;none&gt;           &lt;none&gt;
</code></pre>
</li>
</ul>
<h3 id="heading-save-cost-by-utilizing-reserved-instance-vms"><strong>Save Cost by utilizing Reserved Instance VM's</strong></h3>
<ul>
<li><p>NodePool configuration allows you to specify different VM series along with multiple VM SKUs. Create a separate NodePool with the highest weight value and specify all Reserved Instance VM SKU families or explicit SKU names using the <code>karpenter.azure.com/sku-name</code> or <code>karpenter.azure.com/sku-family</code> parameter.</p>
<pre><code class="lang-bash">     spec:
        nodeClassRef:
          name: default
        requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
          - amd64
        - key: kubernetes.io/os
          operator: In
          values:
          - linux
        - key: karpenter.sh/capacity-type
          operator: In
          values:
          - on-demand
        - key: karpenter.azure.com/sku-family
          operator: In
          values:
          - D
        - key: karpenter.azure.com/sku-name
          operator: In
          values:
          - Standard_D2s_v3
          - Standard_D4s_v3
          - Standard_D8s_v3
          - Standard_D16s_v3
          - Standard_D32s_v3
          - Standard_D64s_v3
          - Standard_D96s_v3
    weight: 90
</code></pre>
</li>
</ul>
<h3 id="heading-conclusion"><strong>Conclusion</strong></h3>
<p>The adoption of Karpenter in AKS signifies a major advancement in node scaling efficiency, flexibility, and cost optimization. By addressing the limitations of the Cluster Autoscaler and introducing dynamic, rapid provisioning of nodes, Karpenter provides a robust solution for managing Kubernetes clusters. Its flexibility in handling different VM types, faster scaling capabilities, and cost optimization make it a valuable addition to Kubernetes cluster management. By leveraging Karpenter, organizations can achieve more responsive and cost-effective Kubernetes deployments.</p>
]]></content:encoded></item><item><title><![CDATA[Intelligent Load Balancing with APIM: Using Weight-Based Routing for Improved OpenAI Performance]]></title><description><![CDATA[Ever Since launch of ChatGPT, demand for OpenAI GPT Models has increased exponentially.Due such vast demand in short span of time, its been challenging for customer to get thier desired capacity in thier respective region
In that case my recommendati...]]></description><link>https://blog.osshaikh.com/intelligent-lb-apim-openai</link><guid isPermaLink="true">https://blog.osshaikh.com/intelligent-lb-apim-openai</guid><category><![CDATA[openai]]></category><category><![CDATA[AI]]></category><category><![CDATA[llm]]></category><category><![CDATA[Azure]]></category><category><![CDATA[APIs]]></category><dc:creator><![CDATA[Osama Shaikh]]></dc:creator><pubDate>Mon, 08 Apr 2024 15:10:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1712609399661/08a21a9a-d76c-48c0-8d37-43c7584e7c94.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Ever since the launch of ChatGPT, demand for OpenAI GPT models has increased exponentially.<br />Due to such vast demand in a short span of time, it has been challenging for customers to get their desired capacity in their respective regions.</p>
<p>In that case, my recommendation has been to deploy multiple OpenAI instances with the <strong>S0 plan</strong> (<strong>token-based consumption</strong>) in any region where capacity is available, and then use a load balancer as a facade to distribute traffic across all your Azure OpenAI endpoints.</p>
<p>Each OpenAI model has its own limit, called the <strong>token limit</strong>, which is measured per minute (TPM). In this case I am referring to the <strong>GPT-3.5</strong> model, which comes with a maximum TPM limit of 300K for a given region.</p>
<p>Since there is a surge in demand from customers all over the world for LLM models like GPT, capacity for OpenAI instances is limited in each region.</p>
<p>Many customers, especially D2C, eventually need over 700-800K+ tokens for their use case.<br />In that scenario, customers tend to deploy OpenAI instances with the S0 plan across multiple locations,<br />although the TPM limit is not the same for every instance because of overall capacity constraints.</p>
<p>The following is a real-world example of a customer that uses multiple OpenAI instances with a different TPM limit for each region:</p>
<p>OpenAI-EastUS :300k<br />OpenAI-WestUS: 240K<br />OpenAI-SouthUK: 150K<br />OpenAI-Southindia:100k<br />OpenAI-CentralIndia:50K</p>
<h3 id="heading-why-not-use-existing-load-balancers-amp-how-it-is-different-from-other-existing-lb-technique">Why not use existing Load balancers &amp; how it is different from other existing LB technique?</h3>
<p>There are multiple load balancing option are now available using AppGW/FrontDoor or API management custom policies with algorithm based on roundrobin, random or even prirotiy based. Although there are some cons with each routing algorithm</p>
<p><strong>AppGW/FrontDoor</strong>:Roundrobin pattern to Distribte traffic across OpeAI endpoints.<br />Can be used in scenario when all OpenAI instance has same TPM Limit.But it cannot do health probe of OpenAI endpoints if its TPM exhausted &amp; received HTTP 429(Server is Busy) &amp; it will continue to forward request to throttled openai endpoint</p>
<p><strong>API management:</strong><br />One differentiting capablities with APIM using its policies that it is aware about endpoint HTTP 429 status code and offers Retry mechanism which allows send traffic to another available endpoints from backend pool until current endpoints becomes active/healthy again.</p>
<p><strong>APIM Policies for OpenAI:</strong><br /><strong>Roundrobin</strong>: if you have multiple OpenAI endpoint across location in APIM backendpool, it simply select any of backend based on roundrobin provided its not being throtlled at that moment &amp; it distribute traffic across sequentially doesnt honor OpenAI Endpoints TPM limits</p>
<p><strong>Random</strong>: Its mostly similar with roundrobin method, primary differnence is that it selected backend endpoint randomly from backend pool. &amp; its also doesnt honor openai TPM limit</p>
<p><strong>Priority</strong>: In this case if you have endpoint across many location, you can assigned priotiy order sequence either based on TPM limit or latency from your base region,<br />But even then all traffic would always forwarded to endpoint which has lowest priority &amp; rest of available openai endpoint would simply be waiting on standby unless lowest priority instance is throttled</p>
<p>Now in ths case customer requested for load balancing technique based on its TPM limit assigned against each OpenAI endpoint which should distribute traffic accordingly.</p>
<h3 id="heading-weightage"><strong>Weightage:</strong></h3>
<p>There is no direct feature in APIM for weight-based routing. I have tried to achieve the same result using custom logic with APIM policies.</p>
<p><strong>Selection Process:</strong><br />The backend logic used in this policy is based on a weighted selection method to choose an endpoint route for each attempt. Endpoints with higher weights are more likely to be chosen, but every endpoint route has at least some chance of being selected. This is because the selection is based on a random number that is compared against cumulative weights, which means the selection process inherently favors routes with higher weights due to the way cumulative weights are calculated and utilized. Let's break down the process with a simple example to clarify how routes with higher weights are more likely to be selected.</p>
<p>This image depicts APIM configured with the weight-based policy across the OpenAI endpoints:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712577551542/4cbbdb42-56b7-4471-863d-852a54cc75c2.png" alt class="image--center mx-auto" /></p>
<p>There are three variables used in the backend selection process:<br />1) <strong>Total weight</strong><br />2) <strong>Cumulative weight</strong><br />3) <strong>Random number</strong></p>
<p><strong>Example:</strong> Let's understand this policy using the image above.<br />Assume you have five OpenAI endpoints, as shown in the image, with the following weights:</p>
<p><strong>Endpoint A</strong>: Weight = 50<br /><strong>Endpoint B</strong>: Weight = 100<br /><strong>Endpoint C</strong>: Weight = 150<br /><strong>Endpoint D</strong>: Weight = 300<br /><strong>Endpoint E</strong>: Weight = 600</p>
<p><strong>Step 1: Calculate Total Weight</strong><br />First, you calculate the total weight of all endpoint routes, which in this case is 50+100+150+300+600=1200.</p>
<p><strong>Step 2: Generate Random Weight</strong></p>
<p>Next, a random number (let’s call it <code>randomWeight</code>) is generated between 1 and the total weight (inclusive). So, <code>randomWeight</code> is between 1 and 1200.</p>
<p><strong>Step 3: Calculate Cumulative Weights</strong><br />The cumulative weights are calculated to determine the ranges that correspond to each endpoint route. Here’s how they look based on the weights:</p>
<ul>
<li><p><strong>Cumulative Weight after Endpoint A</strong>: 50 (just the weight of A)</p>
</li>
<li><p><strong>Cumulative Weight after Endpoint B</strong>: 150 (the weight of A + B, 50 + 100)</p>
</li>
<li><p><strong>Cumulative Weight after Endpoint C</strong>: 300 (the weight of A + B + C, 50 + 100 + 150)</p>
</li>
<li><p><strong>Cumulative Weight after Endpoint D</strong>: 600 (the weight of A + B + C + D, 50 + 100 + 150 + 300)</p>
</li>
<li><p><strong>Cumulative Weight after Endpoint E</strong>: 1200 (the weight of A + B + C + D + E, 50 + 100 + 150 + 300 + 600)</p>
<p>  <strong>Weight Distribution Percentage Calculation:</strong></p>
<ul>
<li><p><strong>Route A</strong> 50/1200×100%=4.17%</p>
</li>
<li><p><strong>Route B</strong> 100/1200×100%=8.33%</p>
</li>
<li><p><strong>Route C</strong> 150/1200×100%=12.50%</p>
</li>
<li><p><strong>Route D</strong> 300/1200×100%=25.00%</p>
</li>
<li><p><strong>Route E</strong> 600/1200×100%=50.00%</p>
</li>
</ul>
</li>
</ul>
<p><strong>Step 4: Select the Endpoint Route Based on Random Weight</strong></p>
<p>The <code>randomWeight</code> determines which Endpoint route is selected:</p>
<ul>
<li><p>If <code>randomWeight</code> is between 1 and 50, <strong>Endpoint A</strong> is selected (4.17% chance)</p>
</li>
<li><p>If <code>randomWeight</code> is between 51 and 150, <strong>Endpoint B</strong> is selected (8.33% chance)</p>
</li>
<li><p>If <code>randomWeight</code> is between 151 and 300, <strong>Endpoint C</strong> is selected (12.50% chance)</p>
</li>
<li><p>If <code>randomWeight</code> is between 301 and 600, <strong>Endpoint D</strong> is selected (25% chance)</p>
</li>
<li><p>If <code>randomWeight</code> is between 601 and 1200, <strong>Endpoint E</strong> is selected (50% chance)</p>
</li>
</ul>
<h3 id="heading-opai-tpm-exhausted-scenario">OPAI TPM Exhausted Scenario:</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712578337623/2ddd62f0-34d5-4b46-b7b7-8d06a43168f2.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>In this image, the first endpoint is throttled with HTTP response code 429, and APIM is routing requests to the other available backends based on weight.</p>
</li>
<li><p>When an OpenAI endpoint starts getting traffic beyond its token capacity limit, it returns HTTP status code "<strong>429</strong>", which translates to "<strong>server is busy</strong>".</p>
</li>
<li><p>In that situation, APIM is configured to health-probe the endpoint based on the '429' response using Retry-After logic.</p>
</li>
<li><p>Once APIM gets HTTP 429 from an endpoint, it starts to prioritize the other endpoints in weight order and forwards most requests to the endpoint with the next highest weight.</p>
</li>
<li><p>Meanwhile, it continues to retry the actual throttled endpoint with health-probe requests after waiting for 60 seconds.</p>
</li>
</ul>
<h3 id="heading-openai-ptu-tier-provision-throughput-unit">OpenAI PTU Tier (Provision throughput Unit ):</h3>
<p>To Ensure OpenAI endpoint with PTU plan should be always be preferred. you would need adjust PTU instance endpoint in such way that it out weight other endpoint significantly, effectively making it the most likely choice under normal circumstances.</p>
<p><strong>1. Significantly Increase PTU endpoint Weight: I</strong>ncrease the weight of PTU endpoint so high compared to Endpoint E,D,C that the random selection virtually always lands on PTU based OpenAI endpoint.In Above example highest weight was 600 for Endpoint E, we could set PTU endpoint Weight something like 5000 or even more.This would make PTU endpoint overwhelmingly more likely to be selected.<br /><strong>2</strong>.<strong>Preferential Selection:</strong> Modify the selection logic to check for PTU endpoint availability first and choose it by default, only falling back to other endpoint A/B/C under certain conditions (e.g., Route C is down or throttling). This requires a bit of custom logic in your policy.<br />3.<strong>Use Priority Method: I</strong>f your system allows, Clubbed a Prority based system where routes are tried in order of priority rather than by weight. PTU would be given the highest priority, with endpoint A,B,C &amp; D as fallbacks.</p>
<h3 id="heading-setup-policy-in-apim">Setup Policy in APIM:</h3>
<ul>
<li><p>Create an APIM instance with the desired SKU if one does not already exist, and enable managed identity on APIM.</p>
</li>
<li><p>Make sure to have the same model (GPT-3.5/4) and deployment name across all OpenAI endpoints.</p>
</li>
<li><p>Grant the Azure RBAC role "Cognitive Services OpenAI User" on all OpenAI resources to the APIM managed identity via the 'Access Control (IAM)' blade.</p>
</li>
<li><p>Download the Azure OpenAI API version schema and integrate it with APIM using Import; you can read more on the OpenAI integration instructions <a target="_blank" href="https://techcommunity.microsoft.com/t5/apps-on-azure-blog/build-an-enterprise-ready-azure-openai-solution-with-azure-api/ba-p/3907562">here</a>.</p>
</li>
<li><p>Once the OpenAI API is imported, copy and paste this policy into the APIM policy editor.</p>
</li>
<li><p>Modify or replace lines #15-43 with your existing OpenAI instance endpoint details.</p>
</li>
<li><p>Assign a weight to each endpoint listed under routes as per its TPM limit.</p>
</li>
<li><p>Update your OpenAI model deployment name and routes index array sequence at lines #47-48 and also at line #129.</p>
</li>
<li><p>Perform a test run with APIM traces enabled to see the policy logic in action (a sample request is shown after the policy below).</p>
</li>
</ul>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">policies</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">inbound</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">base</span> /&gt;</span>
        <span class="hljs-comment">&lt;!-- Getting OpenAI clusters configuration --&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">authentication-managed-identity</span> <span class="hljs-attr">resource</span>=<span class="hljs-string">"https://cognitiveservices.azure.com"</span> /&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">cache-lookup-value</span> <span class="hljs-attr">key</span>=<span class="hljs-string">"@("</span><span class="hljs-attr">oaClusters</span>" + <span class="hljs-attr">context.Deployment.Region</span> + <span class="hljs-attr">context.Api.Revision</span>)" <span class="hljs-attr">variable-name</span>=<span class="hljs-string">"oaClusters"</span> /&gt;</span>
        <span class="hljs-comment">&lt;!-- If we can't find the configuration, it will be loaded --&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">choose</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">when</span> <span class="hljs-attr">condition</span>=<span class="hljs-string">"@(context.Variables.ContainsKey("</span><span class="hljs-attr">oaClusters</span>") == <span class="hljs-string">true)</span>"&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"oaClusters"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"@{
                    JArray routes = new JArray();
                    JArray clusters = new JArray();
                    if(context.Deployment.Region == "</span><span class="hljs-attr">West</span> <span class="hljs-attr">Europe</span>" || <span class="hljs-attr">true</span>)
                    {
                        <span class="hljs-attr">routes.Add</span>(<span class="hljs-attr">new</span> <span class="hljs-attr">JObject</span>()
                        {
                            { "<span class="hljs-attr">name</span>", "<span class="hljs-attr">openai</span> <span class="hljs-attr">1</span>" },
                            { "<span class="hljs-attr">location</span>", "<span class="hljs-attr">eastus</span>" },
                            { "<span class="hljs-attr">url</span>", "<span class="hljs-attr">https:</span>//<span class="hljs-attr">openaiendpoint-a.openai.azure.com</span>/" },
                            { "<span class="hljs-attr">isThrottling</span>", <span class="hljs-attr">false</span> }, 
                            { "<span class="hljs-attr">weight</span>", "<span class="hljs-attr">100</span>"},
                            { "<span class="hljs-attr">retryAfter</span>", <span class="hljs-attr">DateTime.MinValue</span> } 
                        });

                        <span class="hljs-attr">routes.Add</span>(<span class="hljs-attr">new</span> <span class="hljs-attr">JObject</span>()
                        {
                            { "<span class="hljs-attr">name</span>", "<span class="hljs-attr">openai</span> <span class="hljs-attr">2</span>" },
                            { "<span class="hljs-attr">location</span>", "<span class="hljs-attr">UK</span> <span class="hljs-attr">SOuth</span>" },
                            { "<span class="hljs-attr">url</span>", "<span class="hljs-attr">https:</span>//<span class="hljs-attr">openaiendpoint-b.openai.azure.com</span>/" },
                            { "<span class="hljs-attr">isThrottling</span>", <span class="hljs-attr">false</span> },
                            { "<span class="hljs-attr">weight</span>", "<span class="hljs-attr">150</span>"},
                            { "<span class="hljs-attr">retryAfter</span>", <span class="hljs-attr">DateTime.MinValue</span> }
                        });

                        <span class="hljs-attr">routes.Add</span>(<span class="hljs-attr">new</span> <span class="hljs-attr">JObject</span>()
                        {
                            { "<span class="hljs-attr">name</span>", "<span class="hljs-attr">openai</span> <span class="hljs-attr">3</span>" },
                            { "<span class="hljs-attr">location</span>", "<span class="hljs-attr">Central</span> <span class="hljs-attr">india</span>" },
                            { "<span class="hljs-attr">url</span>", "<span class="hljs-attr">https:</span>//<span class="hljs-attr">openendpointai-c.openai.azure.com</span>/" },
                            { "<span class="hljs-attr">isThrottling</span>", <span class="hljs-attr">false</span> },
                            { "<span class="hljs-attr">weight</span>", "<span class="hljs-attr">300</span>"},
                            { "<span class="hljs-attr">retryAfter</span>", <span class="hljs-attr">DateTime.MinValue</span> }
                        });

                        <span class="hljs-attr">clusters.Add</span>(<span class="hljs-attr">new</span> <span class="hljs-attr">JObject</span>()
                        {
                            { "<span class="hljs-attr">deploymentName</span>", "<span class="hljs-attr">gpt35turbo16k</span>" },
                            { "<span class="hljs-attr">routes</span>", <span class="hljs-attr">new</span> <span class="hljs-attr">JArray</span>(<span class="hljs-attr">routes</span>[<span class="hljs-attr">0</span>], <span class="hljs-attr">routes</span>[<span class="hljs-attr">1</span>], <span class="hljs-attr">routes</span>[<span class="hljs-attr">2</span>]) }
                        });
                    }
                    <span class="hljs-attr">else</span>
                    {
                        //<span class="hljs-attr">Error</span> <span class="hljs-attr">has</span> <span class="hljs-attr">no</span> <span class="hljs-attr">clusters</span> <span class="hljs-attr">for</span> <span class="hljs-attr">the</span> <span class="hljs-attr">region</span>
                    }

                    <span class="hljs-attr">return</span> <span class="hljs-attr">clusters</span>;   
                }" /&gt;</span>
                <span class="hljs-comment">&lt;!-- Add cluster configurations to cache --&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">cache-store-value</span> <span class="hljs-attr">key</span>=<span class="hljs-string">"@("</span><span class="hljs-attr">oaClusters</span>" + <span class="hljs-attr">context.Deployment.Region</span> + <span class="hljs-attr">context.Api.Revision</span>)" <span class="hljs-attr">value</span>=<span class="hljs-string">"@((JArray)context.Variables["</span><span class="hljs-attr">oaClusters</span>"])" <span class="hljs-attr">duration</span>=<span class="hljs-string">"86400"</span> /&gt;</span>
            <span class="hljs-tag">&lt;/<span class="hljs-name">when</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">choose</span>&gt;</span>
        <span class="hljs-comment">&lt;!-- Getting OpenAI routes configuration based on deployment name, region and api revision --&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">cache-lookup-value</span> <span class="hljs-attr">key</span>=<span class="hljs-string">"@(context.Request.MatchedParameters["</span><span class="hljs-attr">deployment-id</span>"] + "<span class="hljs-attr">Routes</span>" + <span class="hljs-attr">context.Deployment.Region</span> + <span class="hljs-attr">context.Api.Revision</span>)" <span class="hljs-attr">variable-name</span>=<span class="hljs-string">"routes"</span> /&gt;</span>
        <span class="hljs-comment">&lt;!-- If we can't find the configuration, it will be loaded --&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">choose</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">when</span> <span class="hljs-attr">condition</span>=<span class="hljs-string">"@(context.Variables.ContainsKey("</span><span class="hljs-attr">routes</span>") == <span class="hljs-string">true)</span>"&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"routes"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"@{
                    string deploymentName = context.Request.MatchedParameters["</span><span class="hljs-attr">deployment-id</span>"];
                    <span class="hljs-attr">JArray</span> <span class="hljs-attr">clusters</span> = <span class="hljs-string">(JArray)context.Variables[</span>"<span class="hljs-attr">oaClusters</span>"];
                    <span class="hljs-attr">JObject</span> <span class="hljs-attr">cluster</span> = <span class="hljs-string">(JObject)clusters.FirstOrDefault(o</span> =&gt;</span> o["deploymentName"]?.Value<span class="hljs-tag">&lt;<span class="hljs-name">string</span>&gt;</span>() == deploymentName);
                    if(cluster == null)
                    {
                        //Error has no cluster matched the deployment name
                    }
                    JArray routes = (JArray)cluster["routes"];
                    return routes;
                }" /&gt;
                <span class="hljs-comment">&lt;!-- Set total weights for selected routes based on model --&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"totalWeight"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"@{
                int totalWeight = 0;
                JArray routes = (JArray)context.Variables["</span><span class="hljs-attr">routes</span>"];
                <span class="hljs-attr">foreach</span> (<span class="hljs-attr">JObject</span> <span class="hljs-attr">route</span> <span class="hljs-attr">in</span> <span class="hljs-attr">routes</span>)
                {
                    <span class="hljs-attr">totalWeight</span> += <span class="hljs-string">int.Parse(route[</span>"<span class="hljs-attr">weight</span>"]<span class="hljs-attr">.ToString</span>());
                }
                <span class="hljs-attr">return</span> <span class="hljs-attr">totalWeight</span>;
                }" /&gt;</span>
                <span class="hljs-comment">&lt;!-- Set cumulative weights for selected routes based on model--&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"cumulativeWeights"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"@{
                JArray cumulativeWeights = new JArray();
                int totalWeight = 0;
                JArray routes = (JArray)context.Variables["</span><span class="hljs-attr">routes</span>"];
                <span class="hljs-attr">foreach</span> (<span class="hljs-attr">JObject</span> <span class="hljs-attr">route</span> <span class="hljs-attr">in</span> <span class="hljs-attr">routes</span>)
                {
                    <span class="hljs-attr">totalWeight</span> += <span class="hljs-string">int.Parse(route[</span>"<span class="hljs-attr">weight</span>"]<span class="hljs-attr">.ToString</span>());
                    <span class="hljs-attr">cumulativeWeights.Add</span>(<span class="hljs-attr">totalWeight</span>);
                }
                <span class="hljs-attr">return</span> <span class="hljs-attr">cumulativeWeights</span>;
            }" /&gt;</span>
                <span class="hljs-comment">&lt;!-- Add cluster configurations to cache --&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">cache-store-value</span> <span class="hljs-attr">key</span>=<span class="hljs-string">"@(context.Request.MatchedParameters["</span><span class="hljs-attr">deployment-id</span>"] + "<span class="hljs-attr">Routes</span>" + <span class="hljs-attr">context.Deployment.Region</span> + <span class="hljs-attr">context.Api.Revision</span>)" <span class="hljs-attr">value</span>=<span class="hljs-string">"@((JArray)context.Variables["</span><span class="hljs-attr">routes</span>"])" <span class="hljs-attr">duration</span>=<span class="hljs-string">"86400"</span> /&gt;</span>
            <span class="hljs-tag">&lt;/<span class="hljs-name">when</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">choose</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"routeIndex"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"-1"</span> /&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"remainingRoutes"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"1"</span> /&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">inbound</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">backend</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">retry</span> <span class="hljs-attr">condition</span>=<span class="hljs-string">"@(context.Response != null &amp;&amp; (context.Response.StatusCode == 429 || context.Response.StatusCode &gt;= 500) &amp;&amp; ((Int32)context.Variables["</span><span class="hljs-attr">remainingRoutes</span>"]) &gt;</span> 0)" count="3" interval="0"&gt;
            <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"routeIndex"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"@{
            Random random = new Random();
            int totalWeight = (Int32)context.Variables["</span><span class="hljs-attr">totalWeight</span>"];
            <span class="hljs-attr">JArray</span> <span class="hljs-attr">cumulativeWeights</span> = <span class="hljs-string">(JArray)context.Variables[</span>"<span class="hljs-attr">cumulativeWeights</span>"];
            <span class="hljs-attr">int</span> <span class="hljs-attr">randomWeight</span> = <span class="hljs-string">random.Next(1,</span> <span class="hljs-attr">totalWeight</span> + <span class="hljs-attr">1</span>);
            <span class="hljs-attr">int</span> <span class="hljs-attr">nextRouteIndex</span> = <span class="hljs-string">0;</span>
            <span class="hljs-attr">for</span> (<span class="hljs-attr">int</span> <span class="hljs-attr">i</span> = <span class="hljs-string">0;</span> <span class="hljs-attr">i</span> &lt; <span class="hljs-attr">cumulativeWeights.Count</span>; <span class="hljs-attr">i</span>++)
            {
                <span class="hljs-attr">if</span> (<span class="hljs-attr">randomWeight</span> &lt;= <span class="hljs-string">cumulativeWeights[i].Value</span>&lt;<span class="hljs-attr">int</span>&gt;</span>())
                {
                    nextRouteIndex = i;
                    break;
                }
            }
            return nextRouteIndex;
        }" /&gt;
            <span class="hljs-comment">&lt;!-- This is the main logic to pick the route to be used --&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"routeUrl"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"@(((JObject)((JArray)context.Variables["</span><span class="hljs-attr">routes</span>"])[(<span class="hljs-attr">Int32</span>)<span class="hljs-attr">context.Variables</span>["<span class="hljs-attr">routeIndex</span>"]])<span class="hljs-attr">.Value</span>&lt;<span class="hljs-attr">string</span>&gt;</span>("url") + "/openai")" /&gt;
            <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"routeLocation"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"@(((JObject)((JArray)context.Variables["</span><span class="hljs-attr">routes</span>"])[(<span class="hljs-attr">Int32</span>)<span class="hljs-attr">context.Variables</span>["<span class="hljs-attr">routeIndex</span>"]])<span class="hljs-attr">.Value</span>&lt;<span class="hljs-attr">string</span>&gt;</span>("location"))" /&gt;
            <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"routeName"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"@(((JObject)((JArray)context.Variables["</span><span class="hljs-attr">routes</span>"])[(<span class="hljs-attr">Int32</span>)<span class="hljs-attr">context.Variables</span>["<span class="hljs-attr">routeIndex</span>"]])<span class="hljs-attr">.Value</span>&lt;<span class="hljs-attr">string</span>&gt;</span>("name"))" /&gt;
            <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"deploymentName"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"@("</span><span class="hljs-attr">gpt35turbo16k</span>")" /&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">set-backend-service</span> <span class="hljs-attr">base-url</span>=<span class="hljs-string">"@((string)context.Variables["</span><span class="hljs-attr">routeUrl</span>"])" /&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">forward-request</span> <span class="hljs-attr">buffer-request-body</span>=<span class="hljs-string">"true"</span> /&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">retry</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">backend</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">outbound</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">base</span> /&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">outbound</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">on-error</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">base</span> /&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">on-error</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">policies</span>&gt;</span>
</code></pre>
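<p>Once the policy is saved, a quick smoke test against the APIM gateway helps confirm the routing behaviour before inspecting traces. The gateway host, API path suffix and key header below are assumptions based on a typical Azure OpenAI import; adjust them to match your APIM API settings:</p>
<pre><code class="lang-bash"># Replace the placeholders with your APIM gateway URL, API suffix and subscription key
curl -s -X POST \
  "https://&lt;your-apim&gt;.azure-api.net/openai/deployments/gpt35turbo16k/chat/completions?api-version=2023-05-15" \
  -H "api-key: &lt;your-apim-subscription-key&gt;" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say hello"}]}'
</code></pre>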
<h3 id="heading-policy-internals">Policy Internals:</h3>
<p>This policy incorporates several key configuration items that enable it to efficiently manage and route API requests to OpenAI ChatGPT instances, so highlighting a few of these configurations can provide valuable insight for readers.</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">authentication-managed-identity</span> <span class="hljs-attr">resource</span>=<span class="hljs-string">"https://cognitiveservices.azure.com"</span> /&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">cache-lookup-value</span> <span class="hljs-attr">key</span>=<span class="hljs-string">"@("</span><span class="hljs-attr">oaClusters</span>" + <span class="hljs-attr">context.Deployment.Region</span> + <span class="hljs-attr">context.Api.Revision</span>)" <span class="hljs-attr">variable-name</span>=<span class="hljs-string">"oaClusters"</span> /&gt;</span>
</code></pre>
<ul>
<li><p>Managed identity is used for authentication in this case instead of keys, so the first line enables the policy to leverage the managed identity.</p>
</li>
<li><p>The next block attempts to fetch pre-configured OpenAI cluster information from the cache, significantly reducing the overhead of reconstructing the configuration for each API call.</p>
</li>
</ul>
<pre><code class="lang-xml"> <span class="hljs-tag">&lt;<span class="hljs-name">choose</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">when</span> <span class="hljs-attr">condition</span>=<span class="hljs-string">"@(context.Variables.ContainsKey("</span><span class="hljs-attr">oaClusters</span>") == <span class="hljs-string">true)</span>"&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"oaClusters"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"@{
                    JArray routes = new JArray();
                    JArray clusters = new JArray();
                    if(context.Deployment.Region == "</span><span class="hljs-attr">West</span> <span class="hljs-attr">Europe</span>" || <span class="hljs-attr">true</span>)
                    {
                        <span class="hljs-attr">routes.Add</span>(<span class="hljs-attr">new</span> <span class="hljs-attr">JObject</span>()
                        {
                            { "<span class="hljs-attr">name</span>", "<span class="hljs-attr">openai-a</span>" },
                            { "<span class="hljs-attr">location</span>", "<span class="hljs-attr">india</span>" },
                            { "<span class="hljs-attr">url</span>", "<span class="hljs-attr">https:</span>//<span class="hljs-attr">openai-endpoint-a.openai.azure.com</span>/" },
                            { "<span class="hljs-attr">isThrottling</span>", <span class="hljs-attr">false</span> }, 
                            { "<span class="hljs-attr">weight</span>", "<span class="hljs-attr">600</span>"},
                            { "<span class="hljs-attr">retryAfter</span>", <span class="hljs-attr">DateTime.MinValue</span> } 
                        });
                        <span class="hljs-attr">clusters.Add</span>(<span class="hljs-attr">new</span> <span class="hljs-attr">JObject</span>()
                        {
                            { "<span class="hljs-attr">deploymentName</span>", "<span class="hljs-attr">gpt35turbo16k</span>" },
                            { "<span class="hljs-attr">routes</span>", <span class="hljs-attr">new</span> <span class="hljs-attr">JArray</span>(<span class="hljs-attr">routes</span>[<span class="hljs-attr">0</span>], <span class="hljs-attr">routes</span>[<span class="hljs-attr">1</span>], <span class="hljs-attr">routes</span>[<span class="hljs-attr">2</span>]) }
                        });
                    }
                    <span class="hljs-attr">else</span>
                    {
                        //<span class="hljs-attr">Error</span> <span class="hljs-attr">has</span> <span class="hljs-attr">no</span> <span class="hljs-attr">clusters</span> <span class="hljs-attr">for</span> <span class="hljs-attr">the</span> <span class="hljs-attr">region</span>
                    }

                    <span class="hljs-attr">return</span> <span class="hljs-attr">clusters</span>;</span>
</code></pre>
<ul>
<li><p>In the choose block, custom logic dynamically generates the configuration for the OpenAI clusters if it is not found in the cache.</p>
</li>
<li><p>Depending on the deployment region, the policy constructs clusters with specific attributes (e.g., name, location, URL, weight) reflecting each OpenAI instance's characteristics and intended traffic handling capacity.</p>
</li>
<li><p>Each route is assigned a specific weight, dictating its selection probability for handling incoming requests, thereby facilitating a balanced and efficient distribution of traffic based on the defined capacities or priorities.</p>
</li>
<li><p>Defines the OpenAI model (in this case, <code>gpt35turbo16k</code>) to which API requests should be routed. By clearly defining the <code>deploymentName</code>, this specification allows the policy to support different OpenAI models.</p>
</li>
</ul>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">choose</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">when</span> <span class="hljs-attr">condition</span>=<span class="hljs-string">"@(context.Variables.ContainsKey("</span><span class="hljs-attr">routes</span>") == <span class="hljs-string">true)</span>"&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"routes"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"@{
                    string deploymentName = context.Request.MatchedParameters["</span><span class="hljs-attr">deployment-id</span>"];
                    <span class="hljs-attr">JArray</span> <span class="hljs-attr">clusters</span> = <span class="hljs-string">(JArray)context.Variables[</span>"<span class="hljs-attr">oaClusters</span>"];
                    <span class="hljs-attr">JObject</span> <span class="hljs-attr">cluster</span> = <span class="hljs-string">(JObject)clusters.FirstOrDefault(o</span> =&gt;</span> o["deploymentName"]?.Value<span class="hljs-tag">&lt;<span class="hljs-name">string</span>&gt;</span>() == deploymentName);
                    if(cluster == null)
                    {
                        //Error has no cluster matched the deployment name
                    }
                    JArray routes = (JArray)cluster["routes"];
                    return routes;
                }" /&gt;
</code></pre>
<ul>
<li>In this block it checks if the <code>routes</code> configuration for the current API request's deployment (e.g., a specific OpenAI model indicated by <code>deployment-id</code>) is already available within the context variables. If so, it proceeds to confirm and utilize these endpoint routes for further processing.</li>
</ul>
<pre><code class="lang-xml"><span class="hljs-comment">&lt;!-- Set total weights for selected routes based on model --&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"totalWeight"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"@{
                int totalWeight = 0;
                JArray routes = (JArray)context.Variables["</span><span class="hljs-attr">routes</span>"];
                <span class="hljs-attr">foreach</span> (<span class="hljs-attr">JObject</span> <span class="hljs-attr">route</span> <span class="hljs-attr">in</span> <span class="hljs-attr">routes</span>)
                {
                    <span class="hljs-attr">totalWeight</span> += <span class="hljs-string">int.Parse(route[</span>"<span class="hljs-attr">weight</span>"]<span class="hljs-attr">.ToString</span>());
                }
                <span class="hljs-attr">return</span> <span class="hljs-attr">totalWeight</span>;
                }" /&gt;</span>
                <span class="hljs-comment">&lt;!-- Set cumulative weights for selected routes based on model--&gt;</span>
                <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"cumulativeWeights"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"@{
                JArray cumulativeWeights = new JArray();
                int totalWeight = 0;
                JArray routes = (JArray)context.Variables["</span><span class="hljs-attr">routes</span>"];
                <span class="hljs-attr">foreach</span> (<span class="hljs-attr">JObject</span> <span class="hljs-attr">route</span> <span class="hljs-attr">in</span> <span class="hljs-attr">routes</span>)
                {
                    <span class="hljs-attr">totalWeight</span> += <span class="hljs-string">int.Parse(route[</span>"<span class="hljs-attr">weight</span>"]<span class="hljs-attr">.ToString</span>());
                    <span class="hljs-attr">cumulativeWeights.Add</span>(<span class="hljs-attr">totalWeight</span>);
                }
                <span class="hljs-attr">return</span> <span class="hljs-attr">cumulativeWeights</span>;
            }" /&gt;</span>
</code></pre>
<ul>
<li><p>Both of these variables are key elements in the functioning of this policy.</p>
</li>
<li><p><code>totalWeight</code>: by iterating over each route in the endpoint routes collection and summing up their weights, it represents the aggregate capacity of all endpoint routes. This total is crucial because it defines the range within which a random number can be generated to select a route proportionally based on its weight.</p>
</li>
<li><p><code>cumulativeWeights</code>: as the policy iterates through the endpoint routes, it progressively adds each route's weight to a running total, creating a series of increasing values. This cumulative approach allows the policy to determine which route to select by generating a random number within the totalWeight range and finding the segment (route) where this number falls.</p>
</li>
</ul>
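<p>To make the math concrete, here is a small standalone C# sketch. It is not part of the APIM policy itself; the 50/30/20 weights are hypothetical, and it simply reproduces the same selection logic to show that routes are picked roughly in proportion to their weights.</p>
<pre><code class="lang-csharp">// Standalone illustration of the weighted selection used by the policy.
// Weights 50/30/20 are hypothetical; totalWeight = 100, cumulativeWeights = 50, 80, 100.
using System;
using System.Linq;

class WeightedRouteDemo
{
    static void Main()
    {
        int[] weights = { 50, 30, 20 };
        int totalWeight = weights.Sum();

        // Build the cumulative weights exactly as the policy expression does.
        int[] cumulative = new int[weights.Length];
        for (int i = 0, running = 0; i &lt; weights.Length; i++)
        {
            running += weights[i];
            cumulative[i] = running;
        }

        // Simulate many selections to show the distribution follows the weights.
        var random = new Random();
        int[] hits = new int[weights.Length];
        for (int n = 0; n &lt; 100000; n++)
        {
            int randomWeight = random.Next(1, totalWeight + 1);
            for (int i = 0; i &lt; cumulative.Length; i++)
            {
                if (randomWeight &lt;= cumulative[i]) { hits[i]++; break; }
            }
        }

        Console.WriteLine(string.Join(", ", hits.Select(h =&gt; $"{h * 100.0 / 100000:F1}%")));
        // Prints approximately: 50.0%, 30.0%, 20.0%
    }
}
</code></pre>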
<pre><code class="lang-xml"> <span class="hljs-tag">&lt;<span class="hljs-name">retry</span> <span class="hljs-attr">condition</span>=<span class="hljs-string">"@(context.Response != null &amp;&amp; (context.Response.StatusCode == 429 || context.Response.StatusCode &gt;= 500) &amp;&amp; ((Int32)context.Variables["</span><span class="hljs-attr">remainingRoutes</span>"]) &gt;</span> 0)" count="3" interval="30"&gt;
            <span class="hljs-tag">&lt;<span class="hljs-name">set-variable</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"routeIndex"</span> <span class="hljs-attr">value</span>=<span class="hljs-string">"@{
            Random random = new Random();
            int totalWeight = (Int32)context.Variables["</span><span class="hljs-attr">totalWeight</span>"];
            <span class="hljs-attr">JArray</span> <span class="hljs-attr">cumulativeWeights</span> = <span class="hljs-string">(JArray)context.Variables[</span>"<span class="hljs-attr">cumulativeWeights</span>"];
            <span class="hljs-attr">int</span> <span class="hljs-attr">randomWeight</span> = <span class="hljs-string">random.Next(1,</span> <span class="hljs-attr">totalWeight</span> + <span class="hljs-attr">1</span>);
            <span class="hljs-attr">int</span> <span class="hljs-attr">nextRouteIndex</span> = <span class="hljs-string">0;</span>
            <span class="hljs-attr">for</span> (<span class="hljs-attr">int</span> <span class="hljs-attr">i</span> = <span class="hljs-string">0;</span> <span class="hljs-attr">i</span> &lt; <span class="hljs-attr">cumulativeWeights.Count</span>; <span class="hljs-attr">i</span>++)
            {
                <span class="hljs-attr">if</span> (<span class="hljs-attr">randomWeight</span> &lt;= <span class="hljs-string">cumulativeWeights[i].Value</span>&lt;<span class="hljs-attr">int</span>&gt;</span>())
                {
                    nextRouteIndex = i;
                    break;
                }
            }
            return nextRouteIndex;
        }" /&gt;
</code></pre>
<ul>
<li><p>The <code>retry</code> policy segment inside the backend block handles scenarios where an Azure OpenAI instance fails due to throttling (HTTP 429) or server errors (HTTP status codes &gt;= 500). It also takes into account whether alternative endpoint routes are still available for retrying the request.</p>
</li>
<li><p>The maximum number of retry attempts is set to 3, meaning the policy will try up to three times to forward the request to a backend service before marking it as unhealthy. Retry attempts are made at an interval of 30 seconds, one after another.</p>
</li>
<li><p>A random number is generated in the range of 1 to the <code>totalWeight</code> of all available endpoint routes. This random weight determines which route is selected for the next retry attempt: the policy iterates through the cumulative weights of all endpoint routes stored in the <code>cumulativeWeights</code> variable.</p>
</li>
<li><p>The index of the selected route (<code>nextRouteIndex</code>) is determined and stored, telling API Management which route to use for the next retry attempt.</p>
</li>
<li><p>By incorporating a retry mechanism with weighted random route selection, the policy ensures that requests are not simply retried on the same route that might be experiencing issues, but are intelligently rerouted based on predefined weights reflecting each route's capacity.</p>
</li>
<li><p>The system effectively balances load across multiple routes, minimizing the impact of temporary issues on any one endpoint.</p>
</li>
</ul>
<h3 id="heading-appendix">Appendix</h3>
<p>For simple, easy-to-deploy load-balancing options on AppGW/AFD, refer to my <a target="_blank" href="https://github.com/Osshaikh/OpenAI-Demo">repo</a>.</p>
<p>For the overall reference architecture pattern for APIM &amp; OpenAI, refer to this <a target="_blank" href="https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-openai-architecture-patterns-and-implementation-steps/ba-p/3979934">doc</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Azure HSM: Navigating Compliance for FSI]]></title><description><![CDATA[Introduction
As a technology enthusiast I recently had the opportunity to dive deep into the world of Azure Managed Hardware Security Modules (HSMs) for FSI customer. These powerful cryptographic guardians play a pivotal role in helping Non-Banking F...]]></description><link>https://blog.osshaikh.com/azure-managed-hsm</link><guid isPermaLink="true">https://blog.osshaikh.com/azure-managed-hsm</guid><category><![CDATA[#FSI]]></category><category><![CDATA[Security]]></category><category><![CDATA[hsm]]></category><category><![CDATA[Cryptography]]></category><category><![CDATA[keys]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[Azure]]></category><category><![CDATA[keyvault]]></category><category><![CDATA[RBI]]></category><dc:creator><![CDATA[Osama Shaikh]]></dc:creator><pubDate>Sun, 17 Mar 2024 03:42:22 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-introduction"><strong>Introduction</strong></h2>
<p>As a technology enthusiast, I recently had the opportunity to dive deep into the world of <strong>Azure Managed Hardware Security Modules (HSMs)</strong> for an FSI customer. These powerful cryptographic guardians play a pivotal role in helping Non-Banking Financial Companies (NBFCs) meet the stringent compliance requirements, i.e. FIPS 140-2 Level 3, set by the Reserve Bank of India (RBI). In this blog post, I'll cover some best practices for implementing Azure Managed HSM, explore its practical applications, and guide you through its operational aspects.</p>
<h2 id="heading-what-is-managed-hsm-why-its-important-for-nbfc"><strong>What is Managed HSM? Why its important for NBFC</strong></h2>
<hr />
<p>A <strong>managed HSM</strong> is a single-tenant, highly available, and <strong>FIPS 140-2 Level 3 validated</strong> hardware security module. Imagine a secure vault within Azure, purpose-built to protect your most sensitive secrets. Whether you’re safeguarding financial transactions, securing healthcare data, or ensuring the integrity of critical applications, Managed HSM has your back.</p>
<ul>
<li><p><strong>Managed HSM</strong> establishes a <strong>cryptographic boundary</strong> for key material using a unique <strong>security domain</strong>. It ensures that <strong>Microsoft cannot access</strong> your keys within the HSM. Customers have full ownership and control over their cryptographic keys. It isolates your keys within the HSM, preventing unauthorized access.</p>
</li>
<li><p><strong>Security domain</strong> is an <strong>encrypted blob file</strong> unique to each managed HSM instance. It contains critical artifacts such as:<br />  <strong>HSM backup, User credentials, Signing key, Data encryption key.</strong></p>
</li>
</ul>
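<p>As a quick illustration, a Managed HSM instance can be provisioned with the Azure CLI. The names below are placeholders and the retention period is only an example; after creation, the HSM still has to be activated by downloading its security domain, as described above.</p>
<pre><code class="lang-bash"># Placeholder names; replace with your own resource group, HSM name and region.
# The initial administrator is the signed-in user here, but any object ID can be supplied.
oid=$(az ad signed-in-user show --query id -o tsv)

az keyvault create \
  --hsm-name "contoso-mhsm" \
  --resource-group "contoso-rg" \
  --location "centralindia" \
  --administrators "$oid" \
  --retention-days 28
</code></pre>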
<p><strong>Advantages of Managed HSM over standard Azure Key Vault (AKV):</strong></p>
<ol>
<li><p><strong>Granular Access Control</strong>:</p>
<ul>
<li><p><strong>Per-key permissions</strong> enable fine-grained control over access.</p>
</li>
<li><p><strong>Local RBAC model</strong> ensures designated HSM cluster administrators have full control.</p>
</li>
</ul>
</li>
<li><p><strong>Private Endpoints</strong>:</p>
<ul>
<li><p>Securely connect to Managed HSM from your application using <strong>private endpoints</strong>.</p>
</li>
<li><p>Ensures data privacy by avoiding public internet access.</p>
</li>
</ul>
</li>
<li><p><strong>FIPS 140-2 Level 3 Validated HSMs</strong>:</p>
<ul>
<li><p>Managed HSMs use <strong>Marvell Liquid Security HSM adapters</strong>.</p>
</li>
<li><p>Complies with stringent security standards.</p>
</li>
</ul>
</li>
<li><p><strong>Integrated Monitoring and Audit</strong>:</p>
<ul>
<li><p>Fully integrated with <strong>Azure Monitor</strong>.</p>
</li>
<li><p>Provides complete logs of all activity.</p>
</li>
<li><p>Use <strong>Azure Log Analytics</strong> for analytics and alerts (a CLI sketch follows this list).</p>
</li>
</ul>
</li>
<li><p><strong>Data Residency</strong>:</p>
<ul>
<li>Managed HSM ensures data doesn’t leave the region where the HSM instance is deployed.</li>
</ul>
</li>
<li><p><strong>Centralized Key Management</strong>:</p>
<ul>
<li>Manage critical keys across your organization in one place and follow the principle of <strong>least-privileged access</strong>.</li>
</ul>
</li>
</ol>
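<p>As a quick illustration of point 4, diagnostic logs can be routed to a Log Analytics workspace with the Azure CLI. The resource names below are placeholders, and the <code>AuditEvent</code> category is the one I used in my lab for Managed HSM audit logging.</p>
<pre><code class="lang-bash"># Placeholder names; the HSM and Log Analytics workspace must already exist.
hsm_id=$(az keyvault show --hsm-name "contoso-mhsm" --query id -o tsv)
law_id=$(az monitor log-analytics workspace show --resource-group "contoso-rg" --workspace-name "contoso-law" --query id -o tsv)

az monitor diagnostic-settings create \
  --resource "$hsm_id" \
  --name "mhsm-audit-logs" \
  --workspace "$law_id" \
  --logs '[{"category":"AuditEvent","enabled":true}]'
</code></pre>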
<h2 id="heading-operational-excellence-making-hsm-private-for-secure-access"><strong>Operational Excellence: Making HSM Private for secure access.</strong></h2>
<hr />
<p>Access to a managed HSM is controlled through two interfaces:</p>
<ul>
<li><p>Management plane: On the management plane, you manage the HSM itself. Operations in this plane include creating and deleting managed HSMs and retrieving managed HSM properties.</p>
</li>
<li><p>Data plane: On the data plane, you work with the data that's stored in a managed HSM, which is essentially the keys generated on the HSM or imported into it from a different key manager.</p>
</li>
</ul>
<blockquote>
<h3 id="heading-authorization">Authorization</h3>
</blockquote>
<p>There are two levels of permission required to work with an HSM.</p>
<p>Azure RBAC: covers all management plane operations on the HSM. Operations in this plane include create/delete, backup/restore, networking, and managing the security domain.</p>
<p>Local RBAC: role assignments at this level are scoped either to all keys or to a single key.<br />All key operations, i.e. create/retrieve/delete, are granted using local RBAC (a minimal CLI example follows).</p>
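<p>As a hedged sketch (the HSM name, user, and key name below are placeholders), a local RBAC role can be assigned on the data plane with the Azure CLI, scoped either to all keys or to a single key:</p>
<pre><code class="lang-bash"># Grant a principal the "Managed HSM Crypto User" role on all keys ("/" scope).
az keyvault role assignment create \
  --hsm-name "contoso-mhsm" \
  --role "Managed HSM Crypto User" \
  --assignee "user@contoso.com" \
  --scope "/"

# Or scope the same role to a single key.
az keyvault role assignment create \
  --hsm-name "contoso-mhsm" \
  --role "Managed HSM Crypto User" \
  --assignee "user@contoso.com" \
  --scope "/keys/storage-cmk"
</code></pre>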
<blockquote>
<h3 id="heading-networking">Networking</h3>
</blockquote>
<p>In the Networking section for HSM, the options are either 'Allow all networks' or 'Private endpoints with allow trusted services.'</p>
<p>As a general rule of thumb, always prefer private connections over public ones.</p>
<ul>
<li><p>All networks: Exposed over a public endpoint, accessible over the internet by default.</p>
</li>
<li><p>Private endpoint: It exposes the HSM only on your specific VNet; only resources in that VNet have access to the HSM data plane.<br />  The steps to create a private endpoint for HSM are similar to those for any Azure resource: specify the VNet and subnet for the PE where an application/service has line of sight to the HSM PE (a minimal CLI sketch follows this list).</p>
</li>
<li><p>One thing to note with a private endpoint is that <strong>it restricts access to the HSM data plane from the Azure ARM interface</strong>. You would need an <strong>Azure VM in the same VNet or a peered VNet</strong> to be able to access/manage HSM keys from the ARM interface, i.e. the portal/CLI.</p>
<p>  <em>Error observed accessing HSM via portal once PE is enabled for HSM.</em></p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710599511438/d25f0076-4712-4dbf-ba18-81ecb125b549.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
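<p>A minimal CLI sketch for creating the private endpoint is below. All names are placeholders, and the <code>--group-id</code> of <code>managedhsm</code> is the private-link sub-resource I used in my lab; the <code>privatelink.managedhsm.azure.net</code> private DNS zone still needs to be linked separately so that names resolve to the PE.</p>
<pre><code class="lang-bash"># Placeholder names; the HSM, VNet and subnet must already exist.
hsm_id=$(az keyvault show --hsm-name "contoso-mhsm" --query id -o tsv)

az network private-endpoint create \
  --resource-group "contoso-rg" \
  --name "contoso-mhsm-pe" \
  --vnet-name "hub-vnet" \
  --subnet "pe-subnet" \
  --private-connection-resource-id "$hsm_id" \
  --group-id "managedhsm" \
  --connection-name "contoso-mhsm-pe-conn"
</code></pre>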
<h2 id="heading-operational-excellence-encryption-with-managed-hsm"><strong>Operational Excellence:</strong> Encryption with managed HSM</h2>
<hr />
<ul>
<li><p>Based on your requirements, use HSM keys to encrypt data at rest on Azure services such as Blob Storage, PostgreSQL, MySQL, etc.</p>
</li>
<li><p>Let's take the example of encrypting an existing Blob storage account with a CMK on HSM; we already have the HSM configured with a private endpoint &amp; the required RBAC role "Managed HSM Crypto User."</p>
</li>
<li><p>Notice that once we enable the PE on the HSM, we can't access the HSM data plane publicly, which means all operations on keys, including local RBAC management, are restricted from the portal/CLI.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710609867440/5a4dc533-4161-4db7-836f-9ad9d7ff647c.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>When trying to encrypt the storage account using the CMK option, after the HSM is selected, an error related to connectivity to the HSM data plane appears on the storage account blade.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710619255891/ccf6636b-576c-4b0a-9ccf-0d39af669350.png" alt class="image--center mx-auto" /></p>
<p>  Note: 'Allow trusted Microsoft services' is also enabled along with the PE on the HSM.</p>
</li>
<li><p>I have tested a couple more services, i.e. MySQL/PostgreSQL, with similar errors, which means this behaviour is common to all Azure data services that support encryption with a CMK on HSM.</p>
</li>
<li><p>The primary reason for this error: when we enable private networking on the HSM, it locks down network access and only allows access to the HSM via the private network, i.e. VNet-connected devices, to maximize security.</p>
</li>
<li><p>Even though the "Allow trusted Microsoft services" option is enabled, setting up encryption on Blob storage fails because the request to the HSM APIs goes via the end user's browser, not via the Storage service IP range.</p>
</li>
<li><p>When we access the HSM interface from the Azure portal, the user's browser interacts with the managed HSM API. Even when configuring encryption for other services via the portal, the user's browser is used as the client to interact with the HSM APIs.</p>
<p>  <em>Here is the error screenshot when setting encryption for a new Blob storage account.</em></p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710631597234/637611fc-ad6d-4105-9aab-801ac1cd754b.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Now, if an admin wants to perform operations on a private managed HSM, they need an Azure VM that has line-of-sight connectivity to the HSM private endpoint (a CLI-based alternative is sketched after the screenshot below).<br />  <em>In this screenshot, users are able to configure encryption on Blob storage via the Azure portal from an Azure VM.</em></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710633603024/ae5bf6be-3273-4524-b188-ae1567438c54.png" alt="Using Azure VM users is able to configure encryption on Blob storage via Azure portal with Private managed HSM" /></p>
<h2 id="heading-resiliency-disaster-recovery-with-hsm">Resiliency: Disaster Recovery with HSM</h2>
<hr />
<ul>
<li><p>HSM offers a multi-region feature that allows data from the primary instance to replicate to a secondary instance.</p>
</li>
<li><p>Once an HSM replica is enabled in the secondary region, it functions as active-passive behind a backend Traffic Manager endpoint.</p>
</li>
<li><p>However, the replica instance isn't visible to users in the subscription; rather, it functions in the backend as an extension of the primary instance.</p>
</li>
<li><p>Failover of the HSM is managed by Azure in case of a service outage affecting the primary instance.</p>
</li>
<li><p>Since the secondary instance is not visible in the portal/CLI interfaces, many data-related services such as PostgreSQL and MySQL require the keystore/HSM to be available locally in the DR region.</p>
</li>
</ul>
<p>Recommendation: deploy another HSM instance rather than a replica if you have configured the HSM with many native Azure DB services.</p>
<h2 id="heading-disaster-recovery-backup-restore-on-managed-hsm">Disaster Recovery: Backup Restore on Managed HSM</h2>
<hr />
<p>The easiest way is to set up another HSM instance in the DR region and perform a complete backup &amp; restore. The HSM stores its backup on Blob storage, so it needs access to the storage account with sufficient RBAC granted to a security principal, i.e. a managed identity.</p>
<p>Note: you will need the <strong>security domain</strong> of the primary HSM while restoring the backup.</p>
<ol>
<li><p>Create a managed identity and assign the "Storage Blob Data Contributor" RBAC role on the SA.</p>
</li>
<li><p>Associate the managed identity with your primary HSM to enable backup write permission on the SA (storage account).</p>
<pre><code class="lang-json">  az keyvault update-hsm --hsm-name primary-hsm --mi-user-assigned <span class="hljs-string">"/subscriptions/subid/resourcegroups/rgname/providers/Microsoft.ManagedIdentity/userAssignedIdentities/manageidentityname"</span>
</code></pre>
</li>
<li><p>Once the MI is associated with the HSM, create a container on the SA for your HSM backups &amp; trigger the backup using Az CLI (backup/restore options are not available in the Azure portal).</p>
<pre><code class="lang-abap"> az keyvault backup <span class="hljs-keyword">start</span> --use-managed-identity true --hsm-name <span class="hljs-keyword">primary</span>-hsm --storage-account-name hsmbackupsaname --blob-container-name conatiner1  --subscription Subs-guid
</code></pre>
</li>
<li><p>Look for the success message in the response from the backup command. Now create another vanilla HSM instance in the DR region, but don't activate it.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710643719339/b065a537-195e-44c8-ab2f-44310fb42f1b.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Normally, after creating an HSM instance, we initialize and download the new HSM's security domain as mentioned at the start. However, since we're executing a DR procedure, we will enable security domain recovery mode on this HSM.</p>
<pre><code class="lang-abap"> az keyvault security-domain <span class="hljs-keyword">init</span>-recovery --hsm-name secondry-hsm --sd-exchange-<span class="hljs-keyword">key</span> hsmrecoveryfilename
</code></pre>
</li>
<li><p>Collect/download the security domain of the primary HSM along with 2 of its 3 keys, per the quorum configuration (a hedged CLI sketch follows this list).</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710643613283/13daf4c6-759c-44bb-ab38-73ae4bbe5da8.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Before we trigger the restore, make sure the secondary HSM has access to the Blob storage where the backup is stored.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710643897055/25a89c7e-7069-4f29-bf36-f6acaf337e27.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Now initiate the restore of the backup on the secondary HSM. Notice that I received an error, which means that before any backup can be restored onto an HSM, a backup of the target HSM must have been triggered within the previous 30 minutes.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710646430984/723fce29-959c-4e5b-8a7e-78b53a57967e.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Once that backup completed, we re-initiated the restoration on the secondary HSM, and this time it completed successfully.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710646581635/9b42a3fd-e999-41dd-ae7e-2ccb8c49e423.png" alt class="image--center mx-auto" /></p>
<p> I have tried to cover the important areas around the implementation of HSM that came up during discussions with FSI customers. Feel free to share your thoughts or questions in the comments &amp; for more details on the HSM solution, refer to the <a target="_blank" href="https://learn.microsoft.com/en-us/azure/key-vault/managed-hsm/overview">official documentation</a>.</p>
</li>
</ol>
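<p>For completeness, here is a hedged sketch of the commands behind steps 6 and 8 above. The certificate and file names are placeholders, and flag names can vary between Azure CLI versions, so treat this as an outline rather than a copy-paste recipe.</p>
<pre><code class="lang-bash"># Step 6: download the security domain of the primary HSM, wrapped with 3 keys and a quorum of 2.
az keyvault security-domain download \
  --hsm-name primary-hsm \
  --sd-wrapping-keys cert0.cer cert1.cer cert2.cer \
  --sd-quorum 2 \
  --security-domain-file primary-hsm-SD.json

# Step 8: restore the backup onto the secondary HSM from the container used earlier.
az keyvault restore start \
  --use-managed-identity true \
  --hsm-name secondry-hsm \
  --storage-account-name hsmbackupsaname \
  --blob-container-name conatiner1 \
  --backup-folder mhsm-primary-hsm-&lt;timestamp&gt;
</code></pre>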
]]></content:encoded></item><item><title><![CDATA[Managed DNS solution for Hybrid customers]]></title><description><![CDATA[Azure DNS Private Resolver is a first-party Azure managed network component that facilitates DNS name resolution integration between On-premises to Azure and vice-versa.
Historically, on Azure cloud customers needed to rely on Infrastructure-as-a-Ser...]]></description><link>https://blog.osshaikh.com/managed-dns-solution-for-hybrid-customers</link><guid isPermaLink="true">https://blog.osshaikh.com/managed-dns-solution-for-hybrid-customers</guid><category><![CDATA[Azure]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[networking]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[architecture]]></category><category><![CDATA[dns]]></category><dc:creator><![CDATA[Osama Shaikh]]></dc:creator><pubDate>Sun, 10 Mar 2024 19:54:26 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1710100396562/ae15d7e8-a797-46dd-8ba3-812a034294d3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Azure DNS Private Resolver is a first-party Azure managed network component that facilitates DNS name resolution integration between On-premises to Azure and vice-versa.</p>
<p>Historically, on Azure cloud customers needed to rely on Infrastructure-as-a-Service (IaaS) solutions, configuring virtual machines either as DNS forwarders or Network Virtual Appliances (NVA) to achieve correct name resolution across their on-premises environments and Azure. These setups were not only complex but also introduced potential single points of failure, diminishing the network's overall resilience and efficiency.</p>
<p>Recently, I encountered scenarios where customers expressed a desire to transition away from these IaaS-based DNS forwarders. Their goal was clear: to eliminate these single points of failure and significantly boost their network's resiliency, especially in disaster recovery (DR) scenarios. They envisioned an active-active DR solution where, even in the event of a DNS service outage in the primary region, their organization's name resolution would continue uninterrupted. Moreover, they expected name resolution to be handled by their regional Azure Private DNS (Az PDNS) service by default, only routing to another region if the primary was compromised.<br />Here is the architecture design for Prod/DR in an active-active scenario.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710094575503/226c3e4b-db05-433f-ab0f-4ceb6e1bc92c.png" alt class="image--center mx-auto" /></p>
<p>To fulfill these requirements, I have devised an architectural design tailored for production and DR in an active-active configuration. This design necessitates several critical adjustments:</p>
<ol>
<li><p><strong>DNS Resolver Service Configuration</strong>: Each DNS resolver service in every region must be capable of resolving the names (URLs) of all workloads.</p>
</li>
<li><p><strong>Private DNS Zone Linking</strong>: As Platform-as-a-Service (PaaS) resources have separate instances running in both regions, their respective private DNS zones should be linked to the Virtual Network (VNet) of the PDNS resolver service.</p>
</li>
<li><p><strong>Conditional Forwarder Zones</strong>: Production and DR Active Directory (AD) DNS servers must be set up with non-AD-integrated conditional forwarder zones for Azure workloads (e.g., Blob Storage, Azure SQL), enabling private access through the on-premises network.</p>
</li>
<li><p><strong>Configuration of Conditional Forwarders</strong>: The conditional forwarder zones in the Prod and DR AD-DNS servers must list the primary and secondary endpoints in a different order (see the sketch after this list). This ensures that DNS traffic from each environment or region is handled by its respective DNS resolver service, addressing the customer's need for regional traffic management and eliminating cross-region dependencies during DR scenarios.</p>
</li>
</ol>
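<p>To illustrate point 4, here is a minimal PowerShell sketch for the on-premises AD DNS servers. The zone name and resolver inbound-endpoint IPs are hypothetical; the key idea is that the production DNS servers list the production-region inbound endpoint first, while the DR DNS servers reverse the order.</p>
<pre><code class="lang-powershell"># On the production AD DNS servers: prefer the production-region resolver inbound endpoint.
Add-DnsServerConditionalForwarderZone -Name "blob.core.windows.net" `
    -MasterServers 10.10.0.4, 10.20.0.4

# On the DR AD DNS servers: same zone, DR-region inbound endpoint listed first.
Add-DnsServerConditionalForwarderZone -Name "blob.core.windows.net" `
    -MasterServers 10.20.0.4, 10.10.0.4
</code></pre>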
<p>Here is a screenshot of the conditional forwarder config &amp; name resolution details.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710027304812/8070aae3-2109-408c-9e54-f0966a055992.png" alt class="image--center mx-auto" /></p>
<p>Name Resolution from on-premises against Blob storage private endpoint.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710100011265/0fb5772e-a859-44ad-8421-846ebcaabb71.png" alt class="image--center mx-auto" /></p>
<p>Private DNS zone details for Blob storage on Azure.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710100170524/cb537424-6a36-4f74-a6fb-7b2d67570a6e.png" alt class="image--center mx-auto" /></p>
<p>This architecture pattern not only improves DNS management but also results in a more resilient, efficient, and scalable overall Azure infrastructure, with the two regions complementing each other in active-active DR scenarios.</p>
<p>Thanks for reading!</p>
<p>Do you have any other tips to improve my blog? Let me know in the comments, or just say hi 👋.</p>
<p>Until next time, Happy Learning</p>
]]></content:encoded></item></channel></rss>