New instance
Instructions in this section document the process of deploying a new instance or region.
Note
In order to deploy a new region, you need the following utilities:
CLI utilities: base64, openssl
If you are working with Cloud Shell these utilities are already available.
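If you want to verify the prerequisites before starting, a quick sketch that checks both utilities are on the PATH:

```shell
# Verify the required CLI utilities are available before proceeding
for util in base64 openssl; do
  if ! command -v "$util" >/dev/null 2>&1; then
    echo "Missing required utility: $util" >&2
    exit 1
  fi
done
echo "All required utilities are available"
```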
Create database
Edit and use the snippet below to deploy the database using az-cli:
LOCATION=centralus
PSQL_NAME=megforms-${LOCATION:?}
PSQL_PASSWORD=$(openssl rand -base64 20)
echo "${PSQL_NAME:?} database password: ${PSQL_PASSWORD:?}"
az postgres flexible-server create \
--resource-group meg \
--name ${PSQL_NAME:?} \
--location ${LOCATION:?} \
--database-name megforms \
--admin-user megforms \
--admin-password "${PSQL_PASSWORD:?}" \
--storage-size 128 \
--backup-retention 30 \
--geo-redundant-backup Enabled \
--tier Burstable \
--sku-name Standard_B2s \
--tags "purpose=production" "region=${LOCATION:?}" \
--version 14
See also
To get a full list of locations, run az account list-locations -o table.
To learn more about az postgres flexible-server create, visit az-cli documentation
Test connection
You can use the output connection string to try connecting to the database:
psql "postgresql://megforms:${PSQL_PASSWORD}@${PSQL_NAME}.postgres.database.azure.com/postgres?sslmode=require"
Upgrading PostgreSQL Flexible Server
You can upgrade your Azure Database for PostgreSQL Flexible Server to a new major version using the following command:
VERSION=16
az postgres flexible-server upgrade \
--resource-group meg \
--name ${PSQL_NAME:?} \
--version ${VERSION:?}
Important
After completing the major version upgrade, it is mandatory to run the ANALYZE command in the database.
postgres=> ANALYZE;
See the official Azure PostgreSQL Post upgrade documentation for details.
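If you prefer to run ANALYZE non-interactively, it can be issued over the same connection string used in Test connection. A sketch, with hypothetical placeholder values in place of the real ones created in Create database:

```shell
# Hypothetical values; substitute the ones created in Create database
PSQL_NAME=megforms-centralus
PSQL_PASSWORD='example-password'
# Run ANALYZE in one shot after the major version upgrade
psql "postgresql://megforms:${PSQL_PASSWORD}@${PSQL_NAME}.postgres.database.azure.com/postgres?sslmode=require" -c 'ANALYZE;'
echo "ANALYZE issued against ${PSQL_NAME}"
```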
Performance optimization and tweaking
The new database will appear in Azure PostgreSQL flexible servers. You can edit the created database to tune the following settings as needed:
See also
- Networking
Tick “Allow public access from any Azure service within Azure to this server”. You can later un-tick it and instead whitelist the Kubernetes cluster's public IP address.
To view the list of IP addresses, use az network public-ip list -o table.
- Compute + storage
The above command creates a database in the Burstable tier, which is suitable for testing and demoing. For production workloads, consider upgrading to GeneralPurpose once clients in the region start using it full time.
- Maintenance
Set desired maintenance window based on expected usage times. For example, schedule maintenance for Sunday morning.
- Reservations
Create a reservation to commit to using the resource for a minimum time period and reduce the cost of the database.
- Monitoring
Add the database to the relevant widgets in the monitoring dashboard
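The whitelisting step mentioned under Networking can be scripted with az-cli. A sketch, assuming a hypothetical cluster IP and the server name from Create database:

```shell
# Hypothetical values; substitute the cluster's public IP and your server name
PSQL_NAME=megforms-centralus
KUBERNETES_IP=20.40.0.10
# Whitelist only the kubernetes IP instead of allowing all Azure traffic
az postgres flexible-server firewall-rule create \
  --resource-group meg \
  --name ${PSQL_NAME:?} \
  --rule-name allow-kubernetes \
  --start-ip-address ${KUBERNETES_IP:?} \
  --end-ip-address ${KUBERNETES_IP:?}
echo "Firewall rule requested for ${KUBERNETES_IP}"
```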
Create Kubernetes cluster
Edit and use the following snippet to deploy a new kubernetes cluster:
LOCATION=centralus
NAME=${LOCATION}
kubernetes_version=$(az aks get-versions --location ${LOCATION:?} --query 'orchestrators[-1].orchestratorVersion' -o tsv)
# Create the cluster
az aks create \
--resource-group meg \
--name ${NAME} \
--location ${LOCATION} \
--load-balancer-sku standard \
--tier standard \
--enable-cluster-autoscaler \
--min-count 2 \
--max-count 5 \
--network-plugin kubenet \
--tags "purpose=production" "region=${LOCATION}" \
--attach-acr MegForms \
--auto-upgrade-channel patch \
--node-vm-size Standard_D2as_v6 \
--kubernetes-version ${kubernetes_version:?} \
--nodepool-name meg
# Make it available to the kubectl command
az aks get-credentials \
--name ${NAME} \
--resource-group meg
The new Kubernetes cluster will appear in Azure Kubernetes services.
See also
To learn more about az aks create, visit documentation
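Before moving on, it may be worth confirming that kubectl is pointing at the new cluster and that its nodes come up healthy. A sketch, with a hypothetical cluster name:

```shell
NAME=centralus   # hypothetical; use the cluster name set above
# Confirm kubectl is pointing at the new cluster
kubectl config current-context
# All nodes should eventually report a Ready status
kubectl get nodes -o wide
echo "Verification commands issued for ${NAME}"
```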
Create a static IP address & domain name
Create a public static IP address for the cluster
export DOMAIN_NAME="${LOCATION}.qms.megit.com"
export RESOURCE_GROUP=$(az aks show --resource-group meg --name ${NAME:?} --query 'nodeResourceGroup' -o tsv)
# Create the IP address
export IP_ADDRESS=$(az network public-ip create -g ${RESOURCE_GROUP:?} --name kubernetes-prod --sku Standard --allocation-method static --query publicIp.ipAddress -o tsv)
echo "Created IP ${IP_ADDRESS} in ${RESOURCE_GROUP}"
az network dns record-set a add-record --resource-group meg --zone-name qms.megit.com --record-set-name "${LOCATION}" --ipv4-address ${IP_ADDRESS}
echo "Record set added: ${DOMAIN_NAME}"
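To confirm the A record was created and resolves, a sketch using a hypothetical region name (substitute the actual values used above):

```shell
DOMAIN_NAME=centralus.qms.megit.com   # hypothetical; use the value exported above
# Show the A record directly from the Azure DNS zone
az network dns record-set a show \
  --resource-group meg \
  --zone-name qms.megit.com \
  --name centralus \
  -o table
# Or resolve it publicly once the record has propagated
dig +short "${DOMAIN_NAME:?}"
echo "DNS checks issued for ${DOMAIN_NAME}"
```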
Install Helm charts
Run the following snippet to install the ingress controller and cert-manager.
Note that it relies on IP_ADDRESS and DOMAIN_NAME set in Create a static IP address & domain name,
and NAME containing cluster name.
# Ensure you're installing charts on the correct cluster
kubectl config use-context ${NAME:?}
# Add helm repos
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add jetstack https://charts.jetstack.io
# Install charts
helm install ingress-nginx ingress-nginx/ingress-nginx --set controller.replicaCount=3 --set controller.service.loadBalancerIP="${IP_ADDRESS:?}" --set controller.service.externalTrafficPolicy=Local -n ingress --create-namespace
helm install cert-manager jetstack/cert-manager --set installCRDs=true -n cert-manager --create-namespace
# Deploy Ingress - it is important that DOMAIN_NAME variable is exported so that envsubst can access it
cat kubernetes/deployment/template.yaml | envsubst | kubectl apply -f -
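After the charts are installed, you can check that the ingress controller picked up the static IP. A sketch; the service name follows the ingress-nginx chart's default naming for a release called ingress-nginx:

```shell
# Service name assumed from the chart's default naming convention
SERVICE_NAME=ingress-nginx-controller
# The EXTERNAL-IP column should show the static IP created earlier
kubectl get service ${SERVICE_NAME:?} -n ingress
# cert-manager pods should all be Running
kubectl get pods -n cert-manager
echo "Post-install checks issued for ${SERVICE_NAME}"
```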
Note
In order to upgrade Helm charts, use the following command:
CERT_MANAGER_VERSION=updated-cert-manager-version
NGINX_VERSION=updated-nginx-version
helm upgrade --version ${CERT_MANAGER_VERSION:?} cert-manager jetstack/cert-manager --set installCRDs=true -n cert-manager
helm upgrade --version ${NGINX_VERSION:?} ingress-nginx ingress-nginx/ingress-nginx --set controller.replicaCount=3 --set controller.service.loadBalancerIP="${IP_ADDRESS:?}" --set controller.service.externalTrafficPolicy=Local -n ingress
NGINX Telemetry
To enable telemetry for the NGINX ingress controller using the OpenTelemetry collector, apply the following manifest. This allows exporting metrics and traces from NGINX to your observability backend.
kubectl apply -f ./kubernetes/nginx-telemetry/nginx-telemetry.yaml
Note
If the OpenTelemetry collector is not deployed when telemetry is enabled in NGINX, metrics and traces will be lost, and observability will be unavailable until the collector is running. NGINX itself is unlikely to crash or become unstable.
Enabling telemetry
By default, OpenTelemetry is disabled. To enable it, you can patch the nginx-ingress-controller ConfigMap using the following command:
kubectl patch configmap nginx-ingress-controller -n ingress --patch '{"data": {"enable-opentelemetry": "true"}}'
Ensure the Ingress resource includes the nginx.ingress.kubernetes.io/enable-opentelemetry: "true" annotation,
and the nginx-ingress-controller ConfigMap is configured with OpenTelemetry settings.
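If the Ingress resource was created without the annotation, it can be added in place. A sketch; the Ingress name here is hypothetical, so check yours with `kubectl get ingress` first:

```shell
# Hypothetical Ingress name; replace with the one created by the deployment template
INGRESS_NAME=megforms
kubectl annotate ingress ${INGRESS_NAME:?} \
  nginx.ingress.kubernetes.io/enable-opentelemetry="true" --overwrite
echo "Annotation applied to ${INGRESS_NAME}"
```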
Sampling rate
To adjust sampling rate, set otel-sampler-ratio on the ingress config map:
kubectl patch configmap nginx-ingress-controller -n ingress --patch '{"data": {"otel-sampler-ratio": "0.01"}}'
Deploy the project into the cluster
Deploy base resources & configuration
Deploy yaml files located in /kubernetes/setup directory, and update secret containing database password.
Note that PSQL_PASSWORD variable created in Create database is being used here
as well as DOMAIN_NAME from Create a static IP address & domain name.
kubectl apply -f kubernetes/setup/providers/azure.yaml -f kubernetes/setup/
POSTGRES_HOST=$(az postgres flexible-server show --resource-group meg --name ${PSQL_NAME:?} --query 'fullyQualifiedDomainName' -o tsv)
kubectl create secret generic database-password --from-literal="POSTGRES_PASSWORD=${PSQL_PASSWORD:?}" --dry-run=client -o yaml | kubectl apply -f -
kubectl patch configmap megforms --patch "{\"data\":{\"POSTGRES_HOST\":\"${POSTGRES_HOST:?}\", \"SITE_DOMAIN\": \"${DOMAIN_NAME:?}\", \"SAML_ENTITY_ID\": \"https://${DOMAIN_NAME:?}/\"}}"
Additional changes can be made to the ConfigMap by using kubectl edit configmap megforms
Add new deployment target to CI & Deploy
Important
Before proceeding with the following changes, you must assign the Contributor role to the Service Principal in Azure to grant GitLab the necessary permissions to make modifications in the new cluster. Run the following command:
az role assignment create --assignee 5ed4ddae-1272-478c-971e-e3a9a97716c3 --role "Contributor" --scope subscriptions/0f14d0e0-e21b-4d40-912a-3111c3a445e2/resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.ContainerService/managedClusters/${NAME}
Add the following section to .gitlab-ci.deploy.yml, replacing CLUSTER_NAME with the actual k8s cluster name:
k8s:CLUSTER_NAME:deploy:production:
extends: .k8s:deploy:production
variables:
cluster: CLUSTER_NAME
DOMAIN_NAME: CLUSTER_NAME.qms.megit.com
# Remove these after testing
when: manual
only: []
Commit and push this change, and trigger the job to deploy current version to the new cluster. Once deployed, you should start seeing the new pods in kubectl get pods.
Important
Remove when and only keys after initial deployment
to allow the job to run only when new tags are being released.
Wait for database migrations to complete. Run kubectl logs -f job/migration to observe progress.
Add new server to the Status Page
To ensure the new region is monitored on the status page, update the site list in index.js within the status-page repository.
Set up Azure Backup Media Files
The following instructions will guide you through the process of setting up a backup solution for media files stored in a persistent volume within an Azure Kubernetes Service (AKS) cluster.
Edit and use the snippets below to deploy the backup vault, create a backup policy, and register a file share with the backup vault using az-cli:
# Set variables
RESOURCE_GROUP=meg
LOCATION=westeurope
BACKUP_VAULT_NAME=west-backups
FILE_SHARE_NAME=file-share-name
STORAGE_ACCOUNT_NAME=storage-account-name
POLICY_NAME=west-daily-policy
SCHEDULE_RUN_TIME="2024-07-09T02:00:00+00:00"
RETENTION_DAYS=90
# Create a Backup Vault
az backup vault create \
--resource-group ${RESOURCE_GROUP:?} \
--name ${BACKUP_VAULT_NAME:?} \
--location ${LOCATION:?}
# Create a Daily Backup Policy with 90 Days Retention
az backup policy create \
--backup-management-type AzureStorage \
--resource-group ${RESOURCE_GROUP:?} \
--vault-name ${BACKUP_VAULT_NAME:?} \
--name ${POLICY_NAME:?} \
--policy '{
"name": "'${POLICY_NAME:?}'",
"properties": {
"backupManagementType": "AzureStorage",
"workLoadType": "AzureFileShare",
"schedulePolicy": {
"schedulePolicyType": "SimpleSchedulePolicy",
"scheduleRunFrequency": "Daily",
"scheduleRunTimes": ["'${SCHEDULE_RUN_TIME:?}'"]
},
"retentionPolicy": {
"retentionPolicyType": "LongTermRetentionPolicy",
"dailySchedule": {
"retentionTimes": ["'${SCHEDULE_RUN_TIME:?}'"],
"retentionDuration": {
"count": '${RETENTION_DAYS:?}',
"durationType": "Days"
}
}
}
}
}'
# Register the File Share with the Backup Vault Using the Daily Backup Policy
az backup protection enable-for-azurefileshare \
--resource-group ${RESOURCE_GROUP:?} \
--vault-name ${BACKUP_VAULT_NAME:?} \
--storage-account ${STORAGE_ACCOUNT_NAME:?} \
--azure-file-share ${FILE_SHARE_NAME:?} \
--policy-name ${POLICY_NAME:?}
See also
To learn more about az backup, visit az-cli documentation for az backup
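To confirm the file share is now protected, the backup items registered in the vault can be listed. A sketch using the example values set above:

```shell
RESOURCE_GROUP=meg
BACKUP_VAULT_NAME=west-backups
# The file share should appear in the list with a healthy protection status
az backup item list \
  --resource-group ${RESOURCE_GROUP:?} \
  --vault-name ${BACKUP_VAULT_NAME:?} \
  -o table
echo "Listed backup items in ${BACKUP_VAULT_NAME}"
```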
Site set-up
Visit the site created in Create a static IP address & domain name. Once migration is completed, you should be able to log in with credentials listed in Test user accounts.
Link to the EU site as a region
Go to Region admin and add the new site.
Link to the EU permission groups
Go to the Region group description link admin page and add a link for each permission group you want to sync to the new server. You can then use the Django admin action to sync the newly created links with the downstream server. Each time a permission group is updated in the EU server, it will sync with the downstream permission groups it is linked with. The group description slug is used as the unique identifier to match groups across regions. When a permission group is being synced from the EU site and no matching group description slug exists in the downstream server, a new permission group will automatically be created.
Set-up incoming e-mail hook
Create a new MX record in the GoDaddy DNS pointing at mx.sendgrid.net
See also
SendGrid documentation for adding MX record
Create a new TXT entry for the subdomain with the following value to authorize Sendgrid to send mail from this domain using SPF:
v=spf1 include:sendgrid.net -all
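Once the TXT record is in place, you can verify that it resolves. A sketch with a hypothetical subdomain; substitute the actual one:

```shell
DOMAIN_NAME=centralus.qms.megit.com   # hypothetical; use the new subdomain
# The output should include the SPF value configured above
dig +short TXT "${DOMAIN_NAME:?}"
echo "TXT lookup issued for ${DOMAIN_NAME}"
```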
Add the domain to SendGrid Sender Authentication to enable DKIM
Add the new domain to SendGrid. Enter the webhook URL https://audits.megsupporttools.com/emails/receive, but replace the domain name with the new domain.
Set-up SSO with Google for meg staff
Add a new app in Google Admin.
Add a new SAML identity provider in the Django admin.
- Settings:
Link with existing user account: by e-mail
wantNameId: true
wantAttributeStatement: false
Client-side Entity ID: must match the domain name (this comes from the env var SAML_ENTITY_ID)