Pearl Platform — Central Runbook

01

When To Use This Runbook

This page is for the person who needs the flow, not just the files. If you are asking what comes first, what can be skipped, or what a successful environment actually looks like, start here.

Use this page if your question sounds like this

What should I do next? Do I need the build VM first? What does the worker VM prove? Why does Terraform success not automatically mean the application works?

Use the home page when you need orientation and navigation.
Use this runbook when you need the practical execution order.
Use the Terraform guide when the issue is provisioning, bootstrap, storage, secrets, or apply behavior.
Use the detailed reference page when you need the mirrored commands, paths, logs, service names, or script locations without going back to the markdown docs.

02

What Each VM Is Responsible For

The most common confusion is treating all three VMs as if they prove the same thing. They do not. Each one answers a different operational question.

🌐

Web VM

The web VM proves the interactive application tier works: IIS, deployed web content, host bindings, Solr, Memcached, and internal routing.

⚙️

Worker VM

The worker VM proves the background-processing tier works: Queue Processor, System Checker, AI Spooler, Totem, service hosting, and writable logs.

🧱

Build VM

The build VM proves repeatable delivery is possible from inside the test estate: source restore, compilation, packaging, and artifact output.

Important separation

A runtime-ready environment and a proven build host are related, but they are not the same milestone. You can prove the web and worker VMs before the build VM if artifacts are produced elsewhere.

03

The Tested Order To Follow

This order reflects the tested behavior of the current environment and avoids false failures caused by validating a later phase before an earlier dependency is ready.

1

Confirm the environment baseline exists

Make sure Terraform apply has completed and the role bootstrap has run on the web, worker, and build VMs. This gives you the platform and guest preparation, not the final application deployment.

2

Prepare trusted web and worker artifacts

Build artifacts on a trusted Windows build machine if the build VM is not yet your proven build host. Do not confuse placeholder folders on the VM with real deployed application output.

3

Deploy and validate the web VM

Prove that IIS, content, host bindings, Solr, and Memcached work before you expect worker-side features to behave consistently.

4

Deploy and validate the worker VM

Prove that services start, Totem responds, logs are writable, and background processors are running from the deployed worker payload.

5

Run the shared database and functional checks

This is where you confirm the environment is operational rather than only provisioned: SQL connectivity, session state reachability, queue job execution, Totem flow, system checker status, and AI spooler connectivity.

6

Validate the build VM

After the environment already works, prove the build VM can build and package the same deliverables so it becomes the repeatable delivery machine going forward.
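The ordering above can be sketched as a small runner that executes one check per phase and stops at the first failure, so a later phase is never validated before its dependency is ready. This is only an illustration: the phase names mirror the six steps, and the check callables are hypothetical placeholders, not part of the Pearl tooling.

```python
from typing import Callable, List, Optional, Tuple

# (phase name, check) pairs; each check is a hypothetical stand-in for
# whatever validation the team actually runs for that phase.
Phase = Tuple[str, Callable[[], bool]]


def run_in_order(phases: List[Phase]) -> Tuple[List[str], Optional[str]]:
    """Run each phase check in order and stop at the first failure.

    Returns (names of phases that passed, name of the failed phase or None).
    """
    passed: List[str] = []
    for name, check in phases:
        if not check():
            # Later phases are deliberately not attempted.
            return passed, name
        passed.append(name)
    return passed, None


# Example wiring with placeholder checks that simulate a web-tier failure.
example_phases: List[Phase] = [
    ("baseline", lambda: True),
    ("artifacts", lambda: True),
    ("web", lambda: False),
    ("worker", lambda: True),
    ("shared-checks", lambda: True),
    ("build", lambda: True),
]
```

With the example wiring, `run_in_order(example_phases)` reports the first two phases as passed and names `web` as the failed phase, leaving the worker, shared, and build checks untouched.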

04

Before You Touch The VM Validation Guides

The fastest way to waste time is to run validation against half-prepared machines. Complete these checks first.

Confirm VM access method: use the tested Bastion and private-connectivity route the team now documents.
Confirm the expected application drive is `F:` for the prepared application paths on the current test environment.
Confirm your deployment payloads are real and not just empty bootstrap-created directories.
Confirm environment settings point to test resources, especially SQL Managed Instance and other service endpoints.
Confirm you know your purpose, runtime proof or build-host proof, because the checklist order changes based on that.
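The payload check in the list above can be approximated with a short script that separates an empty bootstrap-created folder tree from a directory that actually contains files. This is a minimal sketch: the function name and the "at least one non-empty file" threshold are assumptions, not the team's tested criteria.

```python
from pathlib import Path


def has_real_payload(root: str, min_files: int = 1) -> bool:
    """Return True if `root` contains at least `min_files` non-empty files.

    A missing directory, or a tree made only of empty folders (such as one
    created by bootstrap alone), returns False.
    """
    base = Path(root)
    if not base.is_dir():
        return False
    count = 0
    for path in base.rglob("*"):
        if path.is_file() and path.stat().st_size > 0:
            count += 1
            if count >= min_files:
                return True
    return count >= min_files
```

Point it at the prepared `F:` path for the tier in question. A `True` result only shows that files exist, not that they are the correct build.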

Why this matters

Terraform and bootstrap create the landing zone. They do not guarantee that the web application, worker services, or build outputs have already been deployed into that landing zone.

05

Web VM Validation Path

Start runtime validation here. The worker tier depends on parts of the web tier, so proving the web VM first reduces noise in later troubleshooting.

Check area	What success means	What failure usually means
Application folders	Real web payload is present in the prepared `F:` paths.	Deployment never happened or copied to the wrong location.
IIS sites and bindings	Expected sites load and internal hostnames resolve correctly.	Binding mismatch, host file issue, or missing content.
Supporting services	Solr and Memcached are reachable and behaving as expected.	Bootstrap gap, service startup failure, or local firewall issue.
Connection settings	Config points to the test SQL MI and other test endpoints.	Old environment settings or incomplete transform work.
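The "Supporting services" row can be given a first-pass check with a plain TCP probe. The sketch below assumes the upstream default ports (8983 for Solr, 11211 for Memcached); the Pearl environment may be configured differently. An open port shows reachability only, not that the service is behaving as expected.

```python
import socket
from typing import Dict, Optional

# Upstream default ports, used here as an assumption about the environment.
DEFAULT_PORTS: Dict[str, int] = {"solr": 8983, "memcached": 11211}


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_supporting_services(
    host: str, ports: Optional[Dict[str, int]] = None
) -> Dict[str, bool]:
    """Map each service name to whether its TCP port accepted a connection."""
    if ports is None:
        ports = DEFAULT_PORTS
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

Run on the web VM itself, `check_supporting_services("localhost")` returns one boolean per service. A `False` entry points toward the bootstrap, service-startup, or firewall causes listed in the table.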

Exit condition for the web phase

You can reach the application through the intended internal path, the correct content is deployed, and the supporting services used by the site are healthy enough to support worker-side testing.

06

Worker VM Validation Path

The worker VM proves background execution. This is where service hosting, Totem behavior, log write permissions, and downstream reachability become important. After VNet peering is recreated, propagation of the Azure VirtualNetwork service tag to the SQL MI NSG can take 15–60 seconds; automated validation now includes retry logic to handle this.

Queue Processor service is present and can run from the deployed worker payload.
System Checker service can start and record status as expected.
AI Spooler service can start and reach its configured endpoint path.
Totem endpoint path can register, respond to ping, and participate in notification flow.
Local log paths are writable by the service identity and the logs actually change during runtime activity.
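The last item in the list above has two parts: the log directory is writable, and the log actually changes during runtime activity. A minimal sketch of both checks follows. Note the assumption that the script runs under the same identity as the service; otherwise the write probe proves nothing about the service account. The function names are illustrative.

```python
import os
from pathlib import Path
from typing import Tuple


def dir_is_writable(log_dir: str) -> bool:
    """Create and remove a probe file in log_dir as the current identity."""
    probe = Path(log_dir) / f".write-probe-{os.getpid()}"
    try:
        probe.write_text("probe")
        probe.unlink()
        return True
    except OSError:
        return False


def log_fingerprint(log_file: str) -> Tuple[int, int]:
    """Return (size in bytes, mtime in ns) for a log file.

    Take one fingerprint before triggering activity and one after;
    differing values mean the log was written to in between.
    """
    st = os.stat(log_file)
    return (st.st_size, st.st_mtime_ns)
```

In practice, capture `log_fingerprint(...)` for a service log, trigger a queue job or a Totem ping, then capture it again and compare the two tuples.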

Do not validate an empty worker

If services exist only as placeholders or point to incomplete binaries, the result tells you almost nothing. Validate only after the real worker payload is deployed and configured.

07

Shared Database And Functional Checks

This is the stage that answers the real question: does the environment function as a system, not just as three prepared Windows machines? SQL connectivity checks use retry logic (3 attempts, 15-second TCP timeout, 5-second delay) to accommodate transient delays after VNet peering or NSG propagation.
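The retry behavior described above (3 attempts, 15-second TCP timeout, 5-second delay) can be sketched as follows. This is not the automated validation's actual implementation, which this page does not show; it is an illustration using the same parameters. The `connect` and `sleep` arguments are hypothetical injection points so the logic can be exercised without a real network.

```python
import socket
import time
from typing import Callable


def tcp_check_with_retry(
    host: str,
    port: int,
    attempts: int = 3,
    timeout: float = 15.0,
    delay: float = 5.0,
    connect: Callable = socket.create_connection,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the 1-based attempt number that connected, or 0 if all failed."""
    for attempt in range(1, attempts + 1):
        try:
            conn = connect((host, port), timeout=timeout)
            conn.close()
            return attempt
        except OSError:
            # Wait before the next try; no sleep after the final attempt.
            if attempt < attempts:
                sleep(delay)
    return 0
```

A call such as `tcp_check_with_retry(sql_mi_host, 1433)` assumes the instance is reached on the standard SQL port 1433, and `sql_mi_host` stands for whatever hostname the test environment uses. A return value above 1 indicates a transient delay rather than a hard failure.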

1

Verify database targeting

Check that both web and worker connection strings point to the test SQL Managed Instance and not to production or stale non-test infrastructure.

2

Verify basic read and write activity

Prove that the web and worker tiers can reach the database and perform simple operations without authentication or network failures.

3

Verify session state reachability

Because the platform uses SQL-backed session state, reachability here matters for realistic web application behavior.

4

Run functional smoke checks

Prove a background queue job can be claimed and completed, Totem can register and poll, System Checker can write a status, and AI Spooler can reach its external dependency path.

Exit condition for shared checks

The environment is now operational enough to say the application works inside the test estate, not merely that the servers exist and bootstrapped successfully.
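The database-targeting check in step 1 of this section can be partly automated by extracting the server host from each tier's connection string and comparing it with the expected test SQL Managed Instance host. The sketch below assumes ADO.NET-style `key=value;` connection strings and ignores quoted values and named instances. The host names in the usage note are placeholders.

```python
SERVER_KEYS = ("server", "data source", "address", "addr", "network address")


def server_host(connection_string: str) -> str:
    """Extract the lower-cased host from an ADO.NET-style connection string.

    Handles an optional `tcp:` prefix and an optional `,port` suffix.
    Returns "" if no server key is present.
    """
    for part in connection_string.split(";"):
        key, sep, value = part.partition("=")
        if sep and key.strip().lower() in SERVER_KEYS:
            host = value.strip()
            if host.lower().startswith("tcp:"):
                host = host[4:]
            return host.split(",")[0].strip().lower()
    return ""


def targets_expected_host(connection_string: str, expected_host: str) -> bool:
    """Return True if the connection string points at expected_host."""
    return server_host(connection_string) == expected_host.strip().lower()
```

For example, `targets_expected_host("Server=tcp:test-mi.example.invalid,1433;Database=app", "test-mi.example.invalid")` is `True`. Running the same comparison against both the web and worker configuration catches a tier that still points at stale or production infrastructure.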

08

Build VM Validation Path

This is the final proof that future builds can happen inside the test environment itself. It is not the first thing you need when the immediate goal is runtime validation.

Build VM area	What current tested guidance expects
Tools and package cache	Blob-backed installer prerequisites are used for the tested bootstrap path, with the package cache prepared on `F:\vs-package-cache`.
Source and restore tooling	Git, NuGet, PowerShell modules, and related tooling are present for source retrieval and restore steps.
Compilation path	MSBuild and ASP.NET compilation steps can produce the expected web and worker outputs.
Artifact trust	The build output is good enough to be promoted into the same runtime validation flow already proven elsewhere.

Simple decision rule

If your only question is whether the environment works, validate web and worker first. If your question is whether the test environment can also build itself consistently, add the build VM proof after runtime success.

09

Decision Guide

Use this section when you are uncertain which path applies to your situation.

A

You already have trusted artifacts

Skip directly to web deployment and runtime validation. The build VM can wait until later.

B

You are changing infrastructure or bootstrap

Use the Terraform guide first, then return here for the runtime order.

C

You are trying to prove repeatable delivery

Complete runtime proof first if possible, then use the build VM path to replace your external trusted build host.

10

GitHub CI/CD Path

The repository has seven GitHub Actions workflows covering environment provisioning, building, production deployment, test environment deployment, and automated cleanup. All deployment uses Azure VM Run Commands (v2) — no inbound ports required. Authentication uses OIDC federated identity.

Create or confirm infrastructure through create-environment.yaml (Terraform plan/apply/destroy with target environment dropdown).
Deploy to production via deploy-production.yaml (manual dispatch only, includes build + approval gate + deploy to both VMs).
Deploy test environments automatically via deploy-test.yaml (triggers on push to feature/env branches, creates isolated IIS site).
Cleanup test environments via cleanup-test-env.yaml (triggers on branch delete + daily orphan scan at 03:00 UTC).
Standalone build via build.yaml on GitHub-hosted windows-2022 runners with encrypted vendor dependencies (.zop AES-256).
Manual deploy/rollback via deploy.yaml for targeted VM deployments using existing artifacts.
After destroy/rebuild cycles the firewall public IP changes. Retrieve it with `terraform output firewall_public_ip` and update external access rules.
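The last item above, re-reading the firewall public IP after a rebuild, can be scripted. The sketch below shells out to the Terraform CLI with the `-raw` flag and validates the result as an IP address before anything downstream uses it. It assumes `terraform` is on the PATH, the working directory is the initialized Terraform root, and `firewall_public_ip` is a plain string output. The `run` parameter is a hypothetical hook for testing.

```python
import ipaddress
import subprocess
from typing import Callable, List


def _run_terraform(args: List[str]) -> str:
    """Run the terraform CLI in the current directory and return its stdout."""
    result = subprocess.run(
        ["terraform", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def firewall_public_ip(run: Callable[[List[str]], str] = _run_terraform) -> str:
    """Read the firewall_public_ip output and validate it as an IP address.

    Raises ValueError if the output is not a valid IPv4 or IPv6 address.
    """
    raw = run(["output", "-raw", "firewall_public_ip"]).strip()
    return str(ipaddress.ip_address(raw))
```

Validating the value first means an empty or malformed output is caught before it is copied into an external access rule.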

Best companion pages

Use the GitHub CI/CD Guide for workflow details, the Multi-Environment Guide for the test environment system, or the Deployment Manual for the complete reference.

11

Common Failure Patterns

These are the patterns that create the most confusion because they look like application defects but are really environment-state issues.

IIS looks healthy but the application is wrong: this usually means placeholder directories exist but real web content was not deployed.
Worker services start and stop unexpectedly: this usually means an incomplete worker payload, bad config, or missing dependencies rather than a Windows-service hosting problem alone.
Database tests fail from one tier only: this usually means mismatched connection strings or tier-specific reachability issues.
SCP assumptions fail: Bastion/RDP tunneling is not the same thing as SSH support inside the guest VM.
Build tools mismatch: this often comes from old documentation assuming a different installer path than the current Blob-backed bootstrap prerequisite layout.

12

When To Open The Detailed Reference Library

The runbook tells you the right order. The detailed reference page carries the deeper technical mirror of the internal docs: exact paths, expected service names, bootstrap logs, access methods, and sample command blocks.

🌐

Web deep checks

Expected IIS paths, app names, host aliases, Solr and Memcached notes, and web-tier command examples.

⚙️

Worker deep checks

Expected worker folders, service names, binary names, log paths, Totem checks, and worker-tier validation commands.

🧱

Build deep checks

Build-tool locations, package cache paths, bootstrap logs, artifact directories, and build smoke-test guidance.

Best usage pattern

Stay on this runbook for flow decisions, then open the detailed reference page beside it when you need the deeper commands and proof points during execution.

13

How To Reuse This Runbook For A New Project

This page should remain useful even after Pearl. The reusable pattern is to separate platform readiness, deployment readiness, runtime readiness, and repeatable-build readiness.

Platform readiness means cloud resources and baseline VM preparation exist.
Deployment readiness means real application payloads and settings are available.
Runtime readiness means the app actually works on the provisioned estate.
Build readiness means the estate can produce its own trustworthy artifacts.

Portable principle

For any future project, build the operator guide around proofs instead of around technologies. That makes the documentation easier for non-technical readers and more durable when implementation details change.

14

Final Checklist

If you only remember one page from this site, remember this sequence.

Provision and bootstrap the test environment.
Prepare trusted web and worker artifacts.
Validate the web VM first.
Validate the worker VM second.
Run shared database and functional checks third.
Validate the build VM after runtime success.
Capture repeatable lessons and turn them into automation.
