Are you a site reliability engineer looking to enhance alerting on your platform? Or are you intrigued by the possibilities of Bicep? Regardless of your background and experience level, this article is tailored to guide you through defining and deploying Alert Rules resources to Log Analytics Workspaces using Bicep Templates.
Architecture
Scenario
Consider this scenario: your monitoring solution is up and running, smoothly funneling logs into your Log Analytics Workspace. These logs populate a metrics table, which serves as the bedrock for the routine queries you run to ensure everything works as expected. With your handcrafted graphs in Log Analytics workbooks, you have an idea of the alerts you need. For instance, you want timely email notifications for your support team whenever a Databricks Job stops writing data for over six hours.
Beyond defining the alert rules, it is important to you that the rules are versioned. Reusability is key, as you intend to share the rules with other platform teams in your organisation. Moreover, you are deploying the rules across test and production environments. With potentially over 100 rules and multiple engineers making changes, versioning becomes essential to maintain order and enable collaboration.
Resources
When delving into the deployment of alert rules, one of the first questions that emerges is: which resources are required?
I want timely email notifications for my team whenever a Databricks Job stops writing data for over six hours.
In this scenario, we will use the scheduledQueryRules resource alongside the actionGroups resource. The scheduled query rule will hold all configuration tied to our Databricks alert rule, while the action group will hold all configuration linked to the email notification action.
While there are other rule options, such as metric alert rules, we chose the scheduled query rule because we want to query the log metric data every six hours. The action group, in turn, is selected because we need to trigger email notifications when an alert is fired.
For reference, see the scheduledQueryRules overview and the actionGroups overview in the Azure documentation.
Resource interdependence or resource independence?
You should consider whether you want resources to be deployed interdependently or not. For example, do you want it to be a requirement that every alert rule must be deployed with at least one action group? Or do you want to be able to deploy action groups and alerts independently from one another? In this scenario, we would like to have independent deployments. These are the reasons:
- We do not want the overhead of interdependence. For example, if we change one alert rule and deploy that change, all action groups will be checked for changes.
- We would always have to deploy a default resource for the counter-resource, even if we only wanted alert rules or action groups to be deployed.
- A main.bicep file that references the resource modules would become significantly more complex and thus harder to understand.
- We have fewer action groups than alert rules.
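For contrast, an interdependent design would funnel both resources through a single main.bicep that wires the alert rule to the action group. A minimal sketch of what we decided against (module paths and parameter values are illustrative, not from the actual setup):

```bicep
// main.bicep — the interdependent design we decided against (illustrative)
module actionGroup 'actionGroup.bicep' = {
  name: 'actionGroupDeployment'
  params: {
    name: 'Support'
  }
}

module alertRule 'scheduledQueryRule.bicep' = {
  name: 'alertRuleDeployment'
  params: {
    name: 'DatabricksAlert'
    // Every rule deployment now implies an action group deployment
    associatedActionGroups: [ actionGroup.name ]
  }
}
```

Every change to either module now flows through this one file, which is exactly the overhead we want to avoid.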
Implementation
We can start the implementation now that we have answered the architectural questions.
First of all, we create a file called actionGroup.bicep. This file serves as the main file for the action group deployment. In it, we specify all configurations according to the actionGroups parameter list. Please note that only the resource definition of the actionGroup and the parameters name, emailReceivers, and smsReceivers are shown with full descriptions in the code snippet below; the remaining parameters follow the same pattern:
@description('Required. The resource name.')
param name string = 'Support'

@description('Optional. The list of email receivers that are part of this action group.')
param emailReceivers array = [
  {
    name: 'AlertWasFiredEmail'
    emailAddress: 'some.email@somecompany.com'
    useCommonAlertSchema: false
    status: 'Enabled'
  }
]

@description('Optional. The list of SMS receivers that are part of this action group.')
param smsReceivers array = []

// Remaining parameters, declared without descriptions for brevity
param location string = 'global'
param tags object = {}
param groupShortName string = 'Support'
param enabled bool = true
param webhookReceivers array = []
param armRoleReceivers array = []
param eventHubReceivers array = []
param itsmReceivers array = []
param logicAppReceivers array = []
param voiceReceivers array = []
param azureAppPushReceivers array = []
param azureFunctionReceivers array = []
param automationRunbookReceivers array = []

resource actionGroup 'Microsoft.Insights/actionGroups@2023-01-01' = {
  name: name
  location: location
  tags: tags
  properties: {
    groupShortName: groupShortName
    enabled: enabled
    emailReceivers: emailReceivers
    smsReceivers: smsReceivers
    webhookReceivers: webhookReceivers
    armRoleReceivers: armRoleReceivers
    eventHubReceivers: eventHubReceivers
    itsmReceivers: itsmReceivers
    logicAppReceivers: logicAppReceivers
    voiceReceivers: voiceReceivers
    azureAppPushReceivers: azureAppPushReceivers
    azureFunctionReceivers: azureFunctionReceivers
    automationRunbookReceivers: automationRunbookReceivers
  }
}
Notes
Appreciate for a moment the cleanliness of the configuration file. Every parameter gets a description, indicated by the @description() decorator. Furthermore, we give every parameter a default value and specify its type. All in all, that makes our configuration file easy to understand.
In the same manner, we create a file called scheduledQueryRule.bicep, which functions as the main file for the scheduled query rule deployment. For the sake of this article, not all parameters are included in the snippet.
@description('Required. The evaluation frequency of the alert rule.')
@allowed([
  'PT15M'
  'PT30M'
  'PT1H'
])
param evaluationFrequency string = 'PT15M'

@description('Required. Action Group resource names, used to get Action Group resource IDs.')
param associatedActionGroups array = []

@description('Optional. ActionGroupsResourceIds')
var actionGroupsResourceIds = [for associatedActionGroup in associatedActionGroups: resourceId('Microsoft.Insights/actionGroups', associatedActionGroup)]

resource scheduledQueryRule 'Microsoft.Insights/scheduledQueryRules@2023-03-15-preview' = {
  name: name
  location: location
  tags: tags
  properties: {
    actions: {
      actionGroups: actionGroupsResourceIds
      customProperties: {}
    }
    autoMitigate: autoMitigate
    criteria: {
      allOf: [
        {
          dimensions: [
            {
              name: dimensionsName
              operator: dimensionsOperator
              values: [
                dimensionsValue
              ]
            }
          ]
          failingPeriods: {
            minFailingPeriodsToAlert: minFailingPeriodsToAlert
            numberOfEvaluationPeriods: numberOfEvaluationPeriods
          }
          metricMeasureColumn: metricMeasureColumn
          operator: operator
          query: query
          resourceIdColumn: resourceIdColumn
          threshold: threshold
          timeAggregation: timeAggregation
        }
      ]
    }
    description: description
    displayName: displayName
    enabled: enabled
    evaluationFrequency: evaluationFrequency
    muteActionsDuration: muteActionsDuration
    overrideQueryTimeRange: overrideQueryTimeRange
    ruleResolveConfiguration: {
      autoResolved: autoResolved
      timeToResolve: timeToResolve
    }
    scopes: [
      sourceId
    ]
    severity: alertSeverity
    targetResourceTypes: [
      'Microsoft.OperationalInsights/workspaces'
    ]
    windowSize: windowSize
  }
}
Notes
As you can see, Bicep also lets us restrict the allowed values for a parameter. This is particularly handy if you want to keep users from creating costly alerts. For example, a query executed every minute is significantly more expensive than one executed every 15 minutes. From the start, we can disallow rules that are evaluated every minute.
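Beyond @allowed, Bicep offers other decorators that can guard parameter values in the same spirit. A short sketch with illustrative parameter names:

```bicep
@description('Optional. Mute duration between fired alerts, restricted to coarse intervals.')
@allowed([
  'PT1H'
  'PT6H'
  'P1D'
])
param muteActionsDuration string = 'PT6H'

@description('Optional. Alert severity; 0 (critical) through 4 (verbose).')
@minValue(0)
@maxValue(4)
param alertSeverity int = 3
```

Invalid values are rejected at template validation time, before anything is deployed.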
Now, I want to draw your attention to the following line:
@description('Optional. ActionGroupsResourceIds')
var actionGroupsResourceIds = [for associatedActionGroup in associatedActionGroups: resourceId('Microsoft.Insights/actionGroups', associatedActionGroup)]
This line allows you to reference any action group by its name. The resourceId() function handles the retrieval of the IDs for you. Since actionGroupsResourceIds is a variable, you can dynamically populate it before passing it to the scheduledQueryRules resource. It is even possible to leave it empty.
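To make this concrete: for an action group named 'Support' in the same resource group, the variable expands to a fully qualified resource ID, where the subscription and resource group segments come from the deployment context:

```bicep
// Input:  associatedActionGroups = ['Support']
// Result: actionGroupsResourceIds resolves to something of the form
// ['/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.Insights/actionGroups/Support']
var actionGroupsResourceIds = [for associatedActionGroup in ['Support']: resourceId('Microsoft.Insights/actionGroups', associatedActionGroup)]
```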
Parameter Files
Now that we have created the resource deployment skeleton for the rules and action deployments, we can focus on the .bicepparam parameter files.
First, let's look at the action group parameter file, following the naming convention main.<name>ActionGroup.bicepparam:
using '../actionGroup.bicep'

// Action Group - Email Notification - Parameter Definition
param name = 'Support'
param emailReceivers = [
  {
    name: 'email-notification'
    emailAddress: 'your.supportTeam@email.com'
    useCommonAlertSchema: false
    status: 'Enabled'
  }
]
Note the using statement on the first line; it is required in every parameter file.
Equally, you can create a parameter file for your rule that looks like this:
using '../scheduledQueryRule.bicep'
// Scheduled Query Rule - Databricks Alert Rule - Parameter Definition
param name = 'DatabricksAlert'
param displayName = 'Databricks Alert'
param alertSeverity = 3
param query = 'Metrics | where row_count <= threshold'
param windowSize = 'PT6H'
param resourceIdColumn = '_ResourceId'
param metricMeasureColumn = 'count'
param threshold = 0
param associatedActionGroups = ['Support']
Deployment
The only thing left to do is deploy the action groups and the rules. The easiest way is to deploy with either a PowerShell task or an Azure CLI task. I used an Azure CLI task with a bash script, as shown below:
workspaceDirectory='$(Pipeline.Workspace)/artifacts'
for parameterFile in "$workspaceDirectory"/main.*ActionGroup.json; do
  output=$(az deployment group create \
    --resource-group $(rg) \
    --template-file "path_to/actionGroup.bicep" \
    --parameters "$parameterFile")
done
Naturally, you need to deploy the action groups first. That way, the action groups can be referenced by name in your rules deployment.
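In Azure Pipelines, that ordering falls out of simply placing the two tasks in sequence. A sketch of the two-step job (the service connection name, template paths, and variable names are illustrative):

```yaml
steps:
  - task: AzureCLI@2
    displayName: 'Deploy action groups'
    inputs:
      azureSubscription: 'my-service-connection' # illustrative name
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        for parameterFile in "$(Pipeline.Workspace)/artifacts"/main.*ActionGroup.json; do
          az deployment group create \
            --resource-group $(rg) \
            --template-file "path_to/actionGroup.bicep" \
            --parameters "$parameterFile"
        done

  - task: AzureCLI@2
    displayName: 'Deploy alert rules'
    inputs:
      azureSubscription: 'my-service-connection'
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        for parameterFile in "$(Pipeline.Workspace)/artifacts"/main.*Alert.json; do
          az deployment group create \
            --resource-group $(rg) \
            --template-file "path_to/scheduledQueryRule.bicep" \
            --parameters "$parameterFile" \
            --parameters sourceId=$(sourceId)
        done
```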
In the next deployment task, you can then deploy the rules like so:
workspaceDirectory='$(Pipeline.Workspace)/artifacts'
for parameterFile in "$workspaceDirectory"/main.*Alert.json; do
  output=$(az deployment group create \
    --resource-group $(rg) \
    --template-file "${{ parameters.ruleTemplatePath }}" \
    --parameters "$parameterFile" \
    --parameters sourceId=$(sourceId))
done
Notes
We have to provide the sourceId of the rule. With the source ID, we scope the alert rule to the right workspace. Also, please note that each Bicep main file has already been built to a .json file. To build a .bicep file, you can use the following command:
az bicep build --file **/bicep_templates/scheduledQueryRule.bicep --outdir your/outdir/
Thanks for reading.
Alert Rules Deployment with Bicep Made Easy was originally published in Better Programming on Medium.