<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Cloud Playbook]]></title><description><![CDATA[Playbooks for engineering leaders running multitenant SaaS on AWS who choose predictability over speed: boringly reliable services, controlled cloud spend, and audit-ready compliance.]]></description><link>https://www.thecloudplaybook.com</link><image><url>https://substackcdn.com/image/fetch/$s_!7MI5!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b1ca555-b578-4ae3-8dda-a03cbc6b1d18_500x500.png</url><title>The Cloud Playbook</title><link>https://www.thecloudplaybook.com</link></image><generator>Substack</generator><lastBuildDate>Sun, 14 Jun 2026 10:46:57 GMT</lastBuildDate><atom:link href="https://www.thecloudplaybook.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Amrut Patil]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[thecloudplaybook@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[thecloudplaybook@substack.com]]></itunes:email><itunes:name><![CDATA[Amrut Patil]]></itunes:name></itunes:owner><itunes:author><![CDATA[Amrut Patil]]></itunes:author><googleplay:owner><![CDATA[thecloudplaybook@substack.com]]></googleplay:owner><googleplay:email><![CDATA[thecloudplaybook@substack.com]]></googleplay:email><googleplay:author><![CDATA[Amrut Patil]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[TCP# 124: The 6 golden paths worth building first, and how to sequence them.]]></title><description><![CDATA[Scored by frequency, friction, risk, and reach. 
With backlog template and adoption metrics.]]></description><link>https://www.thecloudplaybook.com/p/first-6-golden-paths-aws-platform-teams</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/first-6-golden-paths-aws-platform-teams</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Sun, 24 May 2026 15:01:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!XSxO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a1509-c22d-4960-8e5a-4aa4129cda82_1254x1254.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Platform teams rarely have a golden path shortage. They have a sequencing problem.</p><p>The backlog of potential paths is long: containerized deployments, serverless functions, database provisioning, secrets management, observability setup, CI/CD templates. Every senior engineer has a candidate. Every app team has a complaint about the current manual process.</p><p>The question is not what to build. It is what to build first.</p><p>Wrong sequencing produces low adoption. A golden path for Kubernetes service mesh configuration is technically impressive and practically irrelevant if most teams are still deploying Lambda functions manually. App teams ignore paths that don&#8217;t match their daily work. 
The platform team loses credibility before the program gets traction.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XSxO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a1509-c22d-4960-8e5a-4aa4129cda82_1254x1254.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XSxO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a1509-c22d-4960-8e5a-4aa4129cda82_1254x1254.png 424w, https://substackcdn.com/image/fetch/$s_!XSxO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a1509-c22d-4960-8e5a-4aa4129cda82_1254x1254.png 848w, https://substackcdn.com/image/fetch/$s_!XSxO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a1509-c22d-4960-8e5a-4aa4129cda82_1254x1254.png 1272w, https://substackcdn.com/image/fetch/$s_!XSxO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a1509-c22d-4960-8e5a-4aa4129cda82_1254x1254.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XSxO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a1509-c22d-4960-8e5a-4aa4129cda82_1254x1254.png" width="1254" height="1254" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a2a1509-c22d-4960-8e5a-4aa4129cda82_1254x1254.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1254,&quot;width&quot;:1254,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1492693,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/197148237?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a1509-c22d-4960-8e5a-4aa4129cda82_1254x1254.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XSxO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a1509-c22d-4960-8e5a-4aa4129cda82_1254x1254.png 424w, https://substackcdn.com/image/fetch/$s_!XSxO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a1509-c22d-4960-8e5a-4aa4129cda82_1254x1254.png 848w, https://substackcdn.com/image/fetch/$s_!XSxO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a1509-c22d-4960-8e5a-4aa4129cda82_1254x1254.png 1272w, https://substackcdn.com/image/fetch/$s_!XSxO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a2a1509-c22d-4960-8e5a-4aa4129cda82_1254x1254.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>Why the Sequence of Golden Paths Determines Platform Trust</h2><p>The first three paths a platform team ships set the tone for everything that follows.</p><p>If the first paths are fast, correct, and clearly easier than the alternative, app teams adopt them. They stop routing around the platform. They start asking for the next path instead of building their own. The platform team earns the right to set standards for more complex decisions.</p><p>If the first paths are slow to produce, hard to use, or misaligned with what teams actually need in their daily work, adoption fails. Platform engineering becomes the team that built tools nobody uses. Credibility takes quarters to rebuild.</p><p>The first six paths are not a starting point. 
They are the foundation of platform trust.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Cloud Playbook! Subscribe for free to receive new posts and support my work.</p></div></div></div><div><hr></div><h2>The Failure Mode: Building for Interest Instead of Impact</h2><p>Most platform teams prioritize golden paths based on what is technically interesting to build, not on what creates the most daily impact for app teams.</p><p>They build a Kubernetes deployment path because the platform engineers are skilled in Kubernetes. They build an advanced multi-region failover path because it is architecturally sophisticated. They build a developer portal integration because it makes a good conference talk.</p><p>Meanwhile, every app team is manually configuring Lambda functions with inconsistent IAM roles, provisioning RDS instances without encryption enforcement, and setting up CloudWatch alarms from scratch for every new service. 
These are the gaps producing incidents, cost anomalies, and audit findings right now.</p><p>A golden path program that prioritizes interesting work over daily friction earns goodwill from platform engineers and indifference from app teams.</p><div><hr></div><h2>A Scoring Model for Golden Path Prioritization</h2><p>Score each candidate path across four dimensions before committing the build investment.</p><ul><li><p><strong>Frequency.</strong> How many engineers would use this path in a given month? A path used daily by every team outranks one used once per quarter by a single team. Count the actual provisioning events, not the theoretical ones.</p></li><li><p><strong>Friction.</strong> How much time does the current manual approach cost? A path that eliminates 90 minutes of manual configuration per deployment creates more measurable value than a path that eliminates 10 minutes.</p></li><li><p><strong>Risk.</strong> What goes wrong when teams build without this path? Score on two sub-dimensions: incident risk (how often has the manual approach produced a production failure?) and audit risk (how often has the manual approach produced a compliance gap?).</p></li><li><p><strong>Reach.</strong> How many existing services would benefit from retroactive adoption? A path that applies to 40 current services without re-provisioning creates value beyond new deployments.</p></li></ul><p>The six paths below score highest across these four dimensions in most AWS SaaS environments.</p><div><hr></div><h2>The Six Paths, Scored and Sequenced</h2><p><strong>Path 1: Containerized Service Deployment (ECS on Fargate)</strong></p><p>The highest-frequency deployment pattern for most SaaS platform teams. 
Manual deployment requires engineers to configure the task definition, ECS service, target group, log group, IAM execution role, and cost allocation tags separately, and to keep all of them consistent from one service to the next.</p><ul><li><p>Owner: Platform Engineering, compute chapter</p></li><li><p>What the module covers: Task definition template, ECS service configuration, ALB target group, CloudWatch log group with standard retention, IAM execution role with least-privilege policy, required cost tags</p></li><li><p>Adoption metric: Percentage of new ECS services deployed via the module in the trailing 30 days</p></li><li><p>Rollout sequence: Release to one app team as a pilot, collect feedback on friction points, iterate, then open to all teams</p></li></ul><p><strong>Path 2: Serverless Function Deployment (Lambda)</strong></p><p>Lambda is the second-highest-frequency deployment pattern and the one most likely to have inconsistent IAM scopes and missing observability in existing deployments.</p><ul><li><p>Owner: Platform Engineering, serverless chapter</p></li><li><p>What the module covers: Function configuration with runtime defaults, IAM execution role scoped to required permissions, CloudWatch log group with retention policy, error rate alarm with defined threshold, required cost and service tags</p></li><li><p>Adoption metric: Percentage of new Lambda functions with IAM roles sourced from the module&#8217;s policy template</p></li><li><p>Rollout sequence: Pilot with the team that has the most Lambda functions in production, then prioritize retroactive adoption to close existing IAM and observability gaps</p></li></ul><p><strong>Path 3: Managed Relational Database Provisioning (RDS)</strong></p><p>Database provisioning has the highest incident and audit risk among self-service infrastructure decisions. 
Encryption at rest, backup retention, deletion protection, subnet group placement, and parameter group selection are regularly misconfigured by teams building in the AWS console or from memory.</p><ul><li><p>Owner: Platform Engineering, data chapter</p></li><li><p>What the module covers: Instance class guardrails by environment tier, encryption at rest enforced, automated backup enabled with defined retention period, deletion protection enabled in production, subnet group assignment to private subnets, parameter group set to approved baseline, required tags</p></li><li><p>Adoption metric: Percentage of RDS instances in production accounts that were provisioned via the module</p></li><li><p>Rollout sequence: Release to production after extensive testing in staging, then prioritize retroactive compliance for existing instances</p></li></ul><p><strong>Path 4: Cost-Tagged Infrastructure Module</strong></p><p>Every resource in your environment should carry a consistent set of cost allocation tags. The alternative is a quarterly archaeology project to attribute spend. This path is not a deployment module for a specific resource. It is a tagging standard embedded in every other module.</p><ul><li><p>Owner: Platform Engineering, FinOps chapter</p></li><li><p>What the module covers: Required tag definitions (team, product, environment, cost-center, service-owner), tag validation logic, tag policy enforcement via AWS Config or Terraform plan validation, tag propagation to child resources</p></li><li><p>Adoption metric: Percentage of resources in production accounts that carry all required cost allocation tags</p></li><li><p>Rollout sequence: Release as a dependency of Paths 1, 2, and 3: embedded, not optional</p></li></ul><p><strong>Path 5: Observability Baseline for New Services</strong></p><p>Every new service needs CloudWatch metrics, structured logging, distributed tracing, and at least two alarms: one on error rate and one on latency. 
Teams building without this path produce services that are invisible until they fail.</p><ul><li><p>Owner: Platform Engineering, observability chapter</p></li><li><p>What the module covers: CloudWatch metric namespace for the service, log group with structured format and retention policy, X-Ray tracing enabled, error rate alarm, p99 latency alarm, dashboard template for the service&#8217;s health view</p></li><li><p>Adoption metric: Percentage of services in production with all five required components present: metric namespace, structured log group, X-Ray tracing, error rate alarm, and p99 latency alarm</p></li><li><p>Rollout sequence: Release alongside Path 1 and Path 2 so new service deployments include observability by default</p></li></ul><p><strong>Path 6: Secrets Management Pattern (AWS Secrets Manager)</strong></p><p>Hardcoded credentials and environment variables containing secrets are among the most common audit findings on cloud platforms. This path eliminates the decision: secrets go in Secrets Manager, with rotation enabled, and the application accesses them via a scoped IAM policy.</p><ul><li><p>Owner: Platform Engineering, security chapter</p></li><li><p>What the module covers: Secrets Manager secret provisioning with rotation enabled, rotation Lambda for supported secret types, IAM policy granting read access scoped to the specific secret, resource policy blocking cross-account access by default</p></li><li><p>Adoption metric: Percentage of application secrets accessed via Secrets Manager vs. 
environment variables or Parameter Store without rotation</p></li><li><p>Rollout sequence: Pilot with one team migrating hardcoded credentials, document the migration steps, then open as the default secret provisioning path</p></li></ul><div><hr></div><h2>Artifact in This Issue</h2><p>The artifact is the Golden Path Backlog Template: a structured planning table for sequencing your golden path program.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://drive.google.com/file/d/1Uc8PteJrY2IewUiNJtgr-E4CKLdc3knc/view?usp=sharing&quot;,&quot;text&quot;:&quot;Download&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://drive.google.com/file/d/1Uc8PteJrY2IewUiNJtgr-E4CKLdc3knc/view?usp=sharing"><span>Download</span></a></p><p>Each row in the template contains:</p><ul><li><p>Path name and type (deployment, provisioning, configuration, observability)</p></li><li><p>Problem solved: the specific manual process this path replaces</p></li><li><p>Frequency score: monthly provisioning events across the engineering org</p></li><li><p>Friction score: estimated hours of manual work eliminated per use</p></li><li><p>Risk score: incident and audit risk rating from the manual approach</p></li><li><p>Reach score: number of existing services that can retroactively adopt the path</p></li><li><p>Total priority score and recommended sequencing position</p></li><li><p>Owner team and implementation scope summary</p></li><li><p>Adoption metric definition</p></li><li><p>Rollout sequence steps (pilot, feedback, iterate, release)</p></li></ul><p>Use this template in your next platform engineering planning session to build your own prioritized golden path backlog. Score each candidate path against your actual provisioning data, not estimates. 
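</p><p>As a minimal sketch of that scoring step, the Python below ranks a hypothetical backlog. It assumes each dimension is rated on a 1 to 5 scale and that the total priority score is an unweighted sum of the four ratings. The issue does not prescribe a formula, and the names CandidatePath and sequence, like the ratings themselves, are illustrative rather than measured data.</p>

```python
from dataclasses import dataclass

# Assumption: each dimension is rated 1-5 by the planning group, and the
# total priority score is a plain unweighted sum. Both are hypothetical.
RATING_RANGE = range(1, 6)


@dataclass(frozen=True)
class CandidatePath:
    name: str
    frequency: int  # monthly provisioning events across the org
    friction: int   # manual hours eliminated per use
    risk: int       # incident and audit risk of the manual approach
    reach: int      # existing services that can adopt retroactively

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-5 scale.
        for field_name in ("frequency", "friction", "risk", "reach"):
            if getattr(self, field_name) not in RATING_RANGE:
                raise ValueError(f"{self.name}: {field_name} must be rated 1-5")

    @property
    def priority(self) -> int:
        return self.frequency + self.friction + self.risk + self.reach


def sequence(candidates: list[CandidatePath]) -> list[tuple[int, str, int]]:
    """Rank candidates by total score, highest first.

    sorted() stays stable with reverse=True, so ties keep their input order.
    """
    ranked = sorted(candidates, key=lambda c: c.priority, reverse=True)
    return [(pos, c.name, c.priority) for pos, c in enumerate(ranked, start=1)]


# Illustrative ratings only, not data from a real environment.
backlog = [
    CandidatePath("Service mesh configuration", frequency=1, friction=3, risk=2, reach=1),
    CandidatePath("ECS on Fargate deployment", frequency=5, friction=4, risk=3, reach=4),
    CandidatePath("RDS provisioning", frequency=3, friction=4, risk=5, reach=3),
]

for position, name, score in sequence(backlog):
    print(position, name, score)
```

<p>If audit risk matters more than reach in your environment, multiply each rating by a weight before summing; the ranking step stays the same. Because Python&#8217;s sorted() is stable, tied paths keep the order in which you listed them.</p><p>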
A team that deploys 20 Lambda functions per month contributes ten times as many provisioning events to Path 2&#8217;s frequency score as a team that deploys two.</p><div><hr></div><h2>What to Measure and When to Review</h2><ul><li><p><strong>Adoption rate per path.</strong> Percentage of new deployments using the golden path module in the trailing 30 days. Track per path, not as a single aggregate. Paths with adoption below 60 percent within 90 days of release need a friction audit.</p></li><li><p><strong>Drift rate.</strong> Percentage of existing services that were compliant with the path standard at release but have since drifted. High drift indicates the path lacks enforcement mechanisms: it is documentation, not a system.</p></li><li><p><strong>Time saved per deployment.</strong> Estimate the manual configuration time eliminated by each path use. Accumulate monthly. This number converts golden path investment into engineering hours returned to product work, which is the right currency for the conversation with the VP of Engineering.</p></li></ul><p>Review path adoption metrics monthly in the platform team&#8217;s operating review. Review the prioritized backlog quarterly to add candidates, retire unused paths, and promote complex paths to simpler defaults as adoption matures.</p><div><hr></div><h2>What the First Six Paths Actually Build</h2><p>The six paths above are not primarily about technical consistency, though they produce it.</p><p>They are about demonstrating that the platform team solves real problems fast, makes the right choice the easy choice, and delivers value that app teams can measure in time and incidents.</p><p>When the first six paths land well, app teams stop routing around the platform. They start requesting paths 7 through 12. The platform team stops defending its investment and starts negotiating for the capacity to expand it.</p><p>The path program that scales is not the one with the most sophisticated architecture. 
It is the one that earned trust early by solving the problems teams had every day.</p><div><hr></div><p>Use the backlog template as your agenda for next quarter&#8217;s planning. Score your top ten candidate paths before the session so the team is debating real data, not intuition. Forward this issue to the senior engineers and EMs who own the highest-friction provisioning workflows on their teams. They are the ones who know where the manual time is going.</p><div><hr></div><h2><strong>That&#8217;s it for today!</strong></h2><p>Did you enjoy this newsletter issue?</p><p>Share with your friends, colleagues, and your favorite social media platform.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share The Cloud Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share The Cloud Playbook</span></a></p><p><strong>Until next week &#8212; Amrut</strong></p><div><hr></div><h2><strong>Get in touch</strong></h2><p>You can find me on <a href="https://www.linkedin.com/in/patilamrut/">LinkedIn</a> or <a href="https://twitter.com/realamrutpatil">X</a>.</p><p>If you would like to request a topic to read, please feel free to contact me directly via LinkedIn or X.</p>]]></content:encoded></item><item><title><![CDATA[TCP #123: Golden paths fail when they require engineers to choose them]]></title><description><![CDATA[The difference between documentation and a system, and why one scales while the other doesn't.]]></description><link>https://www.thecloudplaybook.com/p/golden-paths-documentation-not-systems</link><guid 
isPermaLink="false">https://www.thecloudplaybook.com/p/golden-paths-documentation-not-systems</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Sun, 17 May 2026 14:30:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4b58!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43fd147-24b1-476b-a763-95ea6b1e26ee_1254x1254.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The platform team publishes the golden path. </p><p>A Confluence page. A README. A wiki entry explaining the approved way to deploy a new service, provision a database, or configure observability.</p><p>Adoption is strong in the first two weeks. Engineers read it during onboarding. A few teams follow it on their next service.</p><p>Six months later, the platform team runs an audit. Half the services in production do not comply with the standard. Some never did. Engineers who read the doc once are deploying from memory. Teams under a deadline skipped the path entirely and built what they already knew.</p><p>The platform team calls it an adoption problem. It is not. 
It is a design problem.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4b58!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43fd147-24b1-476b-a763-95ea6b1e26ee_1254x1254.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4b58!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43fd147-24b1-476b-a763-95ea6b1e26ee_1254x1254.png 424w, https://substackcdn.com/image/fetch/$s_!4b58!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43fd147-24b1-476b-a763-95ea6b1e26ee_1254x1254.png 848w, https://substackcdn.com/image/fetch/$s_!4b58!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43fd147-24b1-476b-a763-95ea6b1e26ee_1254x1254.png 1272w, https://substackcdn.com/image/fetch/$s_!4b58!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43fd147-24b1-476b-a763-95ea6b1e26ee_1254x1254.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4b58!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43fd147-24b1-476b-a763-95ea6b1e26ee_1254x1254.png" width="1254" height="1254" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d43fd147-24b1-476b-a763-95ea6b1e26ee_1254x1254.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1254,&quot;width&quot;:1254,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1467493,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/197147610?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43fd147-24b1-476b-a763-95ea6b1e26ee_1254x1254.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4b58!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43fd147-24b1-476b-a763-95ea6b1e26ee_1254x1254.png 424w, https://substackcdn.com/image/fetch/$s_!4b58!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43fd147-24b1-476b-a763-95ea6b1e26ee_1254x1254.png 848w, https://substackcdn.com/image/fetch/$s_!4b58!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43fd147-24b1-476b-a763-95ea6b1e26ee_1254x1254.png 1272w, https://substackcdn.com/image/fetch/$s_!4b58!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd43fd147-24b1-476b-a763-95ea6b1e26ee_1254x1254.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>The Business Case for Getting Golden Paths Right</h2><p>Every decision an engineer makes without a golden path is a chance to deviate from the standard.</p><p>Manual decisions slow delivery. An engineer building a new Lambda function from scratch spends two hours figuring out the right IAM role scope, log retention policy, and error alerting configuration. A golden path that wraps this into a validated module turns that into a 15-minute task.</p><p>Manual decisions create inconsistency. Inconsistency compounds. 
Twelve teams deploying services twelve different ways means twelve different log formats, twelve different tagging conventions, and twelve different security postures to audit against.</p><p>Inconsistency produces three outcomes that appear on the platform team&#8217;s desk: incidents from configurations that deviated from a tested baseline, AWS cost anomalies from resource patterns that bypassed the approved cost controls, and audit findings from services that were never reviewed against the security standard.</p><p>The golden path does not eliminate these problems by existing. It eliminates them by being used.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">The Cloud Playbook is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div></div></div><div><hr></div><h2>Why Documentation Fails Under Pressure</h2><p>A Confluence page requires the engineer to do five things before the standard applies: know the page exists, find it, read it, remember the relevant parts, and choose to follow it when under deadline pressure with a closing release window.</p><p>That is five failure points. Under a deadline, discipline competes with speed. 
Speed wins.</p><p>Documentation-based golden paths suffer from three structural weaknesses that no amount of engineering culture can fix.</p><ul><li><p><strong>They require active adoption</strong></p><p>The engineer must choose the golden path each time. In low-pressure conditions, most do. Under a production incident, a Friday release, or a late-quarter launch push, most don&#8217;t. The path that requires a decision is the path that gets bypassed.</p></li><li><p><strong>They drift from the standard without detection</strong></p><p>A Confluence page can describe the right pattern for provisioning a database. It cannot prevent an engineer from provisioning a different pattern. The documentation and the deployed state diverge silently. The platform team discovers the gap at the next audit.</p></li><li><p><strong>They have no adoption signal</strong></p><p>You cannot tell from a Confluence page how many teams followed it, which teams ignored it, or when the last adoption happened. Without a signal, the platform team cannot distinguish between &#8220;the path is working&#8221; and &#8220;the path was abandoned months ago.&#8221;</p></li></ul><div><hr></div><h2>Four Axes for Evaluating Whether Your Golden Path Is a System</h2><p>A golden path becomes a system when it reduces the cost of the right choice to below that of any alternative. Evaluate your existing golden paths against four dimensions:</p><ul><li><p><strong>Friction.</strong> Does using the golden path take less time than not using it? If the golden path requires more steps than the manual alternative, engineers will take the shorter route. A Terraform module that provisions a correctly configured RDS instance in five minutes beats a doc that explains how to configure one correctly in forty-five.</p></li><li><p><strong>Decision elimination.</strong> Does the golden path remove choices, or does it document the right choice and leave the decision to the engineer? The distinction matters. 
A module that enforces the encryption setting eliminates the need for a decision. A doc that says &#8220;enable encryption&#8221; only records the decision. One applies the standard automatically. The other relies on the engineer remembering to do so.</p></li><li><p><strong>Drift resistance.</strong> Can the golden path drift from the standard over time, or does it enforce the standard automatically? A Terraform module pinned to a tested version can still drift when someone modifies it. A policy-enforced guardrail that rejects non-compliant resources does not drift. Drift resistance is a function of the enforcement mechanism, not documentation quality.</p></li><li><p><strong>Adoption measurement.</strong> Do you know who is using the golden path, how often, and which services have adopted it? An internal module registry that tracks download counts, a pipeline template that logs invocations, or a tag applied automatically on module use all produce adoption signals. A Confluence page produces none.</p></li></ul><p>A golden path that scores well across all four dimensions does not require marketing effort. It gets used because it is the lowest-friction option available.</p><div><hr></div><h2>What Platform Teams That Get This Right Actually Build</h2><p>When golden paths are systems, three things change.</p><p>First, adoption is automatic. Engineers do not choose the golden path. They use it because it is the default, the fastest option, and the only path that meets compliance requirements without additional configuration.</p><p>Second, drift detection becomes operational. When services must pass through a validated module or pipeline template, deviations appear in the deployment log rather than in the audit. The platform team catches the gap before the auditor does.</p><p>Third, investment becomes measurable. Adoption metrics show which paths are in active use, which teams are using them, and what the alternative deployment time would have been. 
The platform team can quantify the hours saved, the incidents prevented, and the reduced audit exposure. That evidence changes the conversation with the CTO from &#8220;justify the platform investment&#8221; to &#8220;where should we build the next path.&#8221;</p><div><hr></div><p><em>On Wednesday, paid subscribers get the first 6 golden paths every AWS platform team should build, including owner, implementation scope, adoption metric, and rollout sequence for each. The issue prioritizes paths by the combination of daily usage frequency, manual time cost, and compliance risk if skipped.</em></p><h2>Upgrade If You Need Implementation, Not Just Ideas</h2><p>If you&#8217;re using these emails to guide real decisions on your platform, you&#8217;ll get more leverage from the paid version of The Cloud Playbook.</p><p>The free newsletter gives you patterns and language.</p><p>The paid newsletter turns those patterns into implementation kits you can ship inside a quarter:</p><ul><li><p>Concrete rollout plans (90&#8209;day roadmaps for each pattern)</p></li><li><p>Templates and checklists (policies, runbooks, tagging schemes, review checklists)</p></li><li><p>Real examples from high&#8209;stakes AWS environments (what we actually shipped and why)</p></li></ul><p>If the paid side doesn&#8217;t save you more than the subscription in <strong>one</strong> incident, audit cycle, or bad migration you avoid, you should cancel and keep the playbooks.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade to the Paid Cloud Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thecloudplaybook.com/subscribe"><span>Upgrade to the Paid Cloud Playbook</span></a></p><div><hr></div><h2><strong>That&#8217;s it for today!</strong></h2><p>Did you enjoy this newsletter issue?</p><p>Share with your friends, 
colleagues, and your favorite social media platform.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share The Cloud Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share The Cloud Playbook</span></a></p><p><strong>Until next week &#8212; Amrut</strong></p><div><hr></div><h2><strong>Get in touch</strong></h2><p>You can find me on <a href="https://www.linkedin.com/in/patilamrut/">LinkedIn</a> or <a href="https://twitter.com/realamrutpatil">X</a>.</p><p>If you would like to request a topic to read, please feel free to contact me directly via LinkedIn or X.</p>]]></content:encoded></item><item><title><![CDATA[TCP #122: Your approval process needs a classification model, not just a queue]]></title><description><![CDATA[Three-tier framework, change matrix, and checklist for what platform must own, review, or release.]]></description><link>https://www.thecloudplaybook.com/p/platform-approval-model-change-classification</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/platform-approval-model-change-classification</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Wed, 13 May 2026 13:02:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5NND!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa369f3a6-2ba5-4b0d-9ec2-dbb743829cd2_1254x1254.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When platform teams grow their approval scope without a classification model, two things happen simultaneously.</p><p>App teams wait for approvals they do not actually 
need. And high-risk infrastructure changes move through the same queue as routine additions, reviewed at the same depth and with the same SLA.</p><p>The result is that the platform is slow, and the risks it was built to catch are still getting through.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5NND!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa369f3a6-2ba5-4b0d-9ec2-dbb743829cd2_1254x1254.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5NND!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa369f3a6-2ba5-4b0d-9ec2-dbb743829cd2_1254x1254.png 424w, https://substackcdn.com/image/fetch/$s_!5NND!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa369f3a6-2ba5-4b0d-9ec2-dbb743829cd2_1254x1254.png 848w, https://substackcdn.com/image/fetch/$s_!5NND!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa369f3a6-2ba5-4b0d-9ec2-dbb743829cd2_1254x1254.png 1272w, https://substackcdn.com/image/fetch/$s_!5NND!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa369f3a6-2ba5-4b0d-9ec2-dbb743829cd2_1254x1254.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5NND!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa369f3a6-2ba5-4b0d-9ec2-dbb743829cd2_1254x1254.png" width="1254" height="1254" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a369f3a6-2ba5-4b0d-9ec2-dbb743829cd2_1254x1254.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1254,&quot;width&quot;:1254,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1481193,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/197146000?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa369f3a6-2ba5-4b0d-9ec2-dbb743829cd2_1254x1254.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5NND!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa369f3a6-2ba5-4b0d-9ec2-dbb743829cd2_1254x1254.png 424w, https://substackcdn.com/image/fetch/$s_!5NND!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa369f3a6-2ba5-4b0d-9ec2-dbb743829cd2_1254x1254.png 848w, https://substackcdn.com/image/fetch/$s_!5NND!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa369f3a6-2ba5-4b0d-9ec2-dbb743829cd2_1254x1254.png 1272w, https://substackcdn.com/image/fetch/$s_!5NND!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa369f3a6-2ba5-4b0d-9ec2-dbb743829cd2_1254x1254.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>Why Classification Determines Platform Scale</h2><p>Without a classification model, every change lands in the same queue. The platform engineer reviews it, asks the same baseline questions, and applies the same scrutiny regardless of actual risk.</p><p>This works for ten engineers. At forty, the queue is full before Tuesday morning. App teams submit on Monday and receive responses on Thursday. They learn to route around the process. They provision what they need outside the intake channel, outside tagging standards, outside the platform&#8217;s field of view.</p><p>The CTO sees slow delivery and a platform team that cannot explain its own backlog. The CFO sees an AWS bill with no clear attribution. 
The platform team answers both questions without controlling either outcome.</p><p>The root cause is not throughput. It is classification. The platform is reviewing decisions that should never require review while inadequately scrutinizing those that genuinely do.</p><div><hr></div><h2>The Two Failures That Keep Teams Here</h2><div class="paywall-jump" data-component-name="PaywallToDOM"></div><p>The first failure is binary thinking: every change either requires platform approval or it does not. Binary classification turns every edge case into a judgment call. Reviewers decide inconsistently. App teams learn which answer to expect and route requests accordingly. The approval process becomes a negotiation rather than a standard.</p><p>The second failure is calibrating risk on surface features rather than consequences. A complex Terraform file looks risky. A tag update looks trivial. But a missing cost allocation tag on a production RDS instance can result in months of misattributed spend, while a large change built from a well-validated Terraform module for a standard ECS service requires no review.</p><p>Risk lives in three variables: blast radius, reversibility, and compliance scope. Not in file complexity or line count. 
A classification model built on those three variables outperforms any heuristic based on surface appearance.</p><div><hr></div><h2>A Three-Tier Change Classification Model</h2><p>Tier classification sorts every change into one of three buckets based on risk profile, not technical complexity.</p><p><strong>Tier 1: Self-Service</strong></p><p>Changes that fall within established standards, affect only the requesting team&#8217;s scope, and are easily reversible if incorrect. No platform review required. Examples:</p><ul><li><p>Scaling compute within an approved instance family</p></li><li><p>Adding resources using approved Terraform modules with required cost allocation tags applied</p></li><li><p>Modifying application configuration for services already in scope</p></li><li><p>Updating routing rules within team-owned load balancers</p></li></ul><p>App teams document Tier 1 changes in their own change log. Platform audits a sample monthly, not every instance.</p><p><strong>Tier 2: Platform Review Required</strong></p><p>Changes that introduce new patterns, cross team boundaries, affect shared infrastructure, or touch controls within compliance scope. Examples:</p><ul><li><p>New VPC configurations or subnet additions</p></li><li><p>IAM roles with cross-account trust or elevated permissions</p></li><li><p>S3 bucket creation in production accounts</p></li><li><p>RDS instance provisioning above the defined size thresholds</p></li><li><p>Security group modifications affecting shared services</p></li><li><p>Any change to networking or compute that modifies a control in scope for SOC 2, FedRAMP, HIPAA, or ISO 27001</p></li></ul><p>Platform reviews Tier 2 changes within a defined SLA: one business day for standard configurations, three business days for requests requiring compliance assessment. 
The sensitive-change checklist governs what reviewers verify before approving.</p><p><strong>Tier 3: Platform-Owned Changes</strong></p><p>Changes that must never leave the platform&#8217;s hands. These are decisions with a wide blast radius, direct ownership of a compliance control, or irreversible consequences if wrong. Examples:</p><ul><li><p>Changes to account-level SCPs</p></li><li><p>Modifications to centralized logging or audit trail infrastructure</p></li><li><p>VPC peering, Transit Gateway, or Direct Connect configuration</p></li><li><p>Encryption key management and rotation policy</p></li><li><p>Organization-level IAM or identity federation configuration</p></li><li><p>Backup and disaster recovery configuration for shared infrastructure</p></li></ul><p>App teams do not submit Tier 3 items as requests. They describe the business need. Platform engineers own the implementation.</p><p>The key question for deciding whether a change belongs in Tier 1 or Tier 2: if this configuration is wrong, how long will it take for someone to detect the impact, and how hard will it be to reverse? That question drives tier placement more reliably than any checklist.</p><div><hr></div><h2>Building the Classification System for Your Platform</h2><p>Follow this sequence to implement change classification:</p><p><strong>1. List your accountability scope explicitly.</strong> What does your platform team actually own? Reliability SLAs, cloud cost reporting, compliance posture, networking, identity, observability? Write it as a named list. This list becomes the anchor for the entire model.</p><p><strong>2. Map resource types to accountability areas.</strong> For each area, identify which AWS resource types and configuration choices directly affect the outcome. Cost accountability maps to compute sizing, storage provisioning, and cost allocation tagging. Compliance scope maps to encryption settings, access control, logging destinations, and network exposure.</p><p><strong>3. 
Score each resource type for blast radius and reversibility.</strong> Blast radius: if this resource is misconfigured, what breaks? Reversibility: if the error is caught after deployment, how quickly can it be corrected without impacting production? High blast radius plus low reversibility equals Tier 3. Low blast radius plus high reversibility equals Tier 1. Mixed scores land in Tier 2.</p><p><strong>4. Define the self-service boundary explicitly.</strong> Publish the list of resources and configurations that app teams can provision without review. Be specific. &#8220;EC2 instances using approved AMIs within approved instance families, tagged with required cost allocation tags, deployed via approved Terraform modules into pre-approved subnets&#8221; is a self-service definition. &#8220;Standard EC2 instances&#8221; is not.</p><p><strong>5. Build the sensitive-change checklist.</strong> For each Tier 2 resource category, define the specific items a reviewer verifies before approving. The checklist for IAM role creation differs from the checklist for RDS provisioning. Keep each checklist under 10 items. More than that signals the category needs further decomposition.</p><p><strong>6. Name the escalation path.</strong> When a reviewer is uncertain about tier placement for an ambiguous request, who makes the final call? 
Name that person and document the process before the first edge case arrives.</p><div><hr></div><h2>Artifact in This Issue</h2><p>The artifact is the <strong>Platform Change Classification Matrix</strong>: a structured reference table for categorizing infrastructure changes by tier, risk profile, and review requirements.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://drive.google.com/file/d/1SHX2S_9M83yDxue3TDmFWfwM6NzmUZ4F/view?usp=sharing&quot;,&quot;text&quot;:&quot;Download&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://drive.google.com/file/d/1SHX2S_9M83yDxue3TDmFWfwM6NzmUZ4F/view?usp=sharing"><span>Download</span></a></p><p>The matrix contains:</p><ul><li><p>Resource type or change category (IAM role creation, S3 bucket provisioning, SCP modification, and 20 additional common resource types)</p></li><li><p>Risk indicators for each: blast radius score, reversibility score, compliance scope flag, shared infrastructure flag</p></li><li><p>Tier assignment: Self-Service, Platform Review, or Platform-Owned</p></li><li><p>Review owner and review SLA for every Tier 2 entry</p></li><li><p>Sensitive-change checklist for each Tier 2 category, listing the specific items a reviewer verifies before approving</p></li></ul><p>The matrix is structured as a flat reference table. Sort by tier to give reviewers a quick-reference view. Sort by resource type to give app teams a self-service lookup.</p><p>Use this matrix as the starting point for your next platform ops review session. Walk through the resource types in your environment, assign each to a tier, and resolve disagreements using the blast radius and reversibility scoring. 
Publish the completed version in your internal wiki and reference it as the first step in your intake form before any request enters the queue.</p><div><hr></div><h2>What to Measure and When to Review</h2><p><strong>Classification distribution.</strong> Track the percentage of intake requests landing in each tier each week. A healthy distribution: 60 to 70 percent Tier 1, 25 to 35 percent Tier 2, 5 to 10 percent Tier 3. If Tier 1 is below 50 percent, the self-service boundary is too narrow. If Tier 2 exceeds 50 percent, audit whether reviewers are escalating ambiguous requests rather than approving or rejecting them.</p><p><strong>Review cycle time for Tier 2.</strong> Mean time from submission to approval decision. Set a target SLA and track it weekly. If cycle time consistently exceeds the SLA, investigate whether the sensitive-change checklist is calibrated correctly or whether submitters are arriving with incomplete context.</p><p>Review the matrix quarterly. As your approved Terraform module library grows and app teams become familiar with standards, Tier 2 items should migrate to Tier 1. The model should become less restrictive over time as platform maturity increases.</p><div><hr></div><h2>What Changes When Classification Is Right</h2><p>When the classification model is working, the platform team is no longer the one slowing delivery. App teams run self-service for their own decisions. Reviewers focus on the changes that actually need review. Platform-owned decisions stay in the platform's hands.</p><p>Platform engineers stop triaging an undifferentiated backlog. They start doing architecture review work worth doing.</p><p>The compliance evidence improves. Tier 3 changes leave a clean ownership trail. Tier 2 approvals are documented against a checklist. Tier 1 changes are auditable via sampling rather than exhaustive review.</p><p>The platform team builds a reputation for predictable, fast responses. 
That reputation is more durable than any SLA commitment made without a supporting model.</p><div><hr></div><p>Use this matrix as the agenda for your next platform ops session. </p><p>Walk your team through the resource types in your environment and assign tiers. Share the completed matrix with the EMs and senior engineers who submit intake requests. </p><p>Run a calibration pass after the first 30 days: sort the previous month&#8217;s requests by tier and verify the classifications held. </p><p>If you manage a platform for multiple product teams, forward this to the EM or senior engineer who will be the primary intake submitter on each team.</p><div><hr></div><h2><strong>That&#8217;s it for today!</strong></h2><p>Did you enjoy this newsletter issue?</p><p>Share with your friends, colleagues, and your favorite social media platform.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share The Cloud Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share The Cloud Playbook</span></a></p><p><strong>Until next week &#8212; Amrut</strong></p><div><hr></div><h2><strong>Get in touch</strong></h2><p>You can find me on <a href="https://www.linkedin.com/in/patilamrut/">LinkedIn</a> or <a href="https://twitter.com/realamrutpatil">X</a>.</p><p>If you would like to request a topic to read, please feel free to contact me directly via LinkedIn or X.</p>]]></content:encoded></item><item><title><![CDATA[TCP #121: Accountability Without Authority Is How Platform Teams Fail]]></title><description><![CDATA[When platform is judged on reliability, cost, and compliance without approval rights 
over infrastructure, failure is structural, not personal.]]></description><link>https://www.thecloudplaybook.com/p/platform-accountability-without-authority-failure</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/platform-accountability-without-authority-failure</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Sun, 10 May 2026 14:30:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_OTe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0d94943-f544-4e00-9422-58a937be63ad_1254x1254.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The most common dysfunction I see in platform engineering is not technical.</p><p>It is organizational.</p><p>The platform team is accountable for reliability, cost, and compliance readiness. Simultaneously, app teams provision their own infrastructure, configure their own environments, and make their own architecture choices. The platform has no approval rights over those decisions.</p><p>When something breaks, the platform team explains the incident. When the AWS bill is too high, the platform team presents the cost review. When the auditor finds a misconfigured S3 bucket provisioned by an app team, the platform team answers for it.</p><p>This is not a people problem. 
It is a structural mismatch: accountability without authority.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_OTe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0d94943-f544-4e00-9422-58a937be63ad_1254x1254.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_OTe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0d94943-f544-4e00-9422-58a937be63ad_1254x1254.png 424w, https://substackcdn.com/image/fetch/$s_!_OTe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0d94943-f544-4e00-9422-58a937be63ad_1254x1254.png 848w, https://substackcdn.com/image/fetch/$s_!_OTe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0d94943-f544-4e00-9422-58a937be63ad_1254x1254.png 1272w, https://substackcdn.com/image/fetch/$s_!_OTe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0d94943-f544-4e00-9422-58a937be63ad_1254x1254.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_OTe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0d94943-f544-4e00-9422-58a937be63ad_1254x1254.png" width="1254" height="1254" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0d94943-f544-4e00-9422-58a937be63ad_1254x1254.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1254,&quot;width&quot;:1254,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1487684,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/196371584?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0d94943-f544-4e00-9422-58a937be63ad_1254x1254.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_OTe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0d94943-f544-4e00-9422-58a937be63ad_1254x1254.png 424w, https://substackcdn.com/image/fetch/$s_!_OTe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0d94943-f544-4e00-9422-58a937be63ad_1254x1254.png 848w, https://substackcdn.com/image/fetch/$s_!_OTe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0d94943-f544-4e00-9422-58a937be63ad_1254x1254.png 1272w, https://substackcdn.com/image/fetch/$s_!_OTe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0d94943-f544-4e00-9422-58a937be63ad_1254x1254.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>The Business Cost of Structural Mismatch</h2><p>Structural mismatch in platform engineering manifests in three ways, all of which are costly.</p><ul><li><p><strong>Reliability incidents you cannot prevent</strong></p></li></ul><p>App teams deploy changes outside the platform&#8217;s change management process. Those changes introduce instability. Platform is on call for the fallout. The platform team cannot stop the root cause; they can only respond to it after the fact.</p><ul><li><p><strong>Cloud cost variance you cannot explain</strong></p></li></ul><p>App teams create resources without tagging standards. The platform team reports total cloud spend to the CTO, but they cannot attribute it to teams, products, or tenants with confidence. Cost reviews become estimates. 
Budget conversations become defensive.</p><ul><li><p><strong>Audit findings you cannot remediate</strong></p></li></ul><p>A control requires that all S3 buckets use the approved encryption configuration. An app team creates a bucket without it. The auditor finds it. The platform team owns the control, but it never owned the bucket.</p><p>Each of these is a variation of the same root problem. The platform owns the outcome without owning the inputs.</p><div><hr></div><h2>How Smart Teams End Up Here</h2><p>This structure does not happen by accident. It usually develops through a reasonable sequence of decisions.</p><p>The company starts small. Developers provision their own infrastructure. It works. Ownership is clear because ownership is total: each team owns everything they build.</p><p>A platform team forms. Its first mandate is to centralize shared services: CI/CD, networking, and account management. It takes those over. But app teams keep their existing infrastructure provisioning rights. Nobody wants to slow them down.</p><p>The platform team grows. It takes on reliability objectives. It takes on a compliance scope. It takes on cost accountability. 
Each expansion is reasonable in isolation.</p><p>What nobody updates is the boundary. App teams still have full autonomy over their own infrastructure. The platform now has accountability for the consequences of that autonomy.</p><p>By the time this becomes visible, it is embedded. The platform team is measured against outcomes it cannot fully control. Its performance reviews include metrics that depend on decisions it has no input into.</p><div><hr></div><h2>The Operating Principle That Resolves This</h2><p>Accountability must match authority. This is not organizational theory. It is an operating principle that can be implemented in weeks.</p><p>The practical form: <strong>platform teams should have approval rights over any infrastructure decision that touches their accountability scope.</strong></p><p>That means:</p><ul><li><p>If the platform owns the cloud cost review, the platform approves compute and storage provisioning above defined thresholds</p></li><li><p>If the platform owns the compliance posture, the platform reviews and approves infrastructure changes that affect controls in scope</p></li><li><p>If the platform owns the reliability objective, the platform sets the change management policy that governs deployments to production</p></li></ul><p>This does not require the platform to do all the work. App teams still build. They still deploy. The platform provides the standards, the guardrails, and, in defined cases, the approval gate.</p><p>The goal is not control. The goal is alignment between who owns the risk and who influences the decisions that create it.</p><div><hr></div><h2>What Improves When You Get This Right</h2><p>When accountability matches authority, three things stabilize quickly.</p><p>Incidents become attributable. When platform standards govern the infrastructure, post-incident reviews identify gaps in the standard, not just the team that missed it. The root cause analysis becomes systemic. 
The fix improves the platform, not just the individual response.</p><p>Cost reviews become credible. When the platform controls tagging policy and provisioning standards, attribution improves. You can present AWS spend by team, by product, or by tenant. The CFO gets a useful number. The CTO can make decisions from it.</p><p>Audit prep becomes predictable. When the platform owns the change classification model and the approval gates, control coverage is tracked continuously. Evidence collection becomes operational rather than reactive.</p><p>The platform team stops being the team that explains what went wrong. It becomes the team that designed the system that prevented it.</p><div><hr></div><p><em>On Wednesday, paid subscribers get the full operating model for implementing this: the approval decision tree, the change classification model, and the sensitive-change checklist I would use to define which infrastructure decisions need platform review, which can be self-service, and which must be blocked.</em></p><h2><strong>That&#8217;s it for today!</strong></h2><p>Did you enjoy this newsletter issue?</p><p>Share with your friends, colleagues, and your favorite social media platform.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share The Cloud Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share The Cloud Playbook</span></a></p><p><strong>Until next week &#8212; Amrut</strong></p><div><hr></div><h2><strong>Get in touch</strong></h2><p>You can find me on <a href="https://www.linkedin.com/in/patilamrut/">LinkedIn</a> or <a 
href="https://twitter.com/realamrutpatil">X</a>.</p><p>If you would like to request a topic to read, please feel free to contact me directly via LinkedIn or X.</p>]]></content:encoded></item><item><title><![CDATA[TCP #120: The Infrastructure Ownership Matrix For Platform And App Teams]]></title><description><![CDATA[A practical way to decide who can provision which AWS resources, under what conditions, and with whose approval.]]></description><link>https://www.thecloudplaybook.com/p/infrastructure-ownership-matrix-platform-app-teams-aws</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/infrastructure-ownership-matrix-platform-app-teams-aws</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Wed, 06 May 2026 11:00:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!QX_P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488e5be-47fd-437b-bec0-b260eafc96cb_1086x1448.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Sunday&#8217;s issue named the problem: infrastructure provisioned without an ownership model creates reliability gaps, cost exposure, and audit risk.</p><p>This issue gives you the system.</p><p><strong>The Infrastructure Ownership Matrix</strong> defines who can provision what, under what conditions, and who approves what changes. 
It replaces the informal agreements that break down as soon as a team grows or a new engineer joins.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QX_P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488e5be-47fd-437b-bec0-b260eafc96cb_1086x1448.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QX_P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488e5be-47fd-437b-bec0-b260eafc96cb_1086x1448.png 424w, https://substackcdn.com/image/fetch/$s_!QX_P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488e5be-47fd-437b-bec0-b260eafc96cb_1086x1448.png 848w, https://substackcdn.com/image/fetch/$s_!QX_P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488e5be-47fd-437b-bec0-b260eafc96cb_1086x1448.png 1272w, https://substackcdn.com/image/fetch/$s_!QX_P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488e5be-47fd-437b-bec0-b260eafc96cb_1086x1448.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QX_P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488e5be-47fd-437b-bec0-b260eafc96cb_1086x1448.png" width="1086" height="1448" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2488e5be-47fd-437b-bec0-b260eafc96cb_1086x1448.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1448,&quot;width&quot;:1086,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1426294,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/196369637?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488e5be-47fd-437b-bec0-b260eafc96cb_1086x1448.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QX_P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488e5be-47fd-437b-bec0-b260eafc96cb_1086x1448.png 424w, https://substackcdn.com/image/fetch/$s_!QX_P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488e5be-47fd-437b-bec0-b260eafc96cb_1086x1448.png 848w, https://substackcdn.com/image/fetch/$s_!QX_P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488e5be-47fd-437b-bec0-b260eafc96cb_1086x1448.png 1272w, https://substackcdn.com/image/fetch/$s_!QX_P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488e5be-47fd-437b-bec0-b260eafc96cb_1086x1448.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>Why This Problem Gets Expensive Before Anyone Notices</h2>
      <p>
          <a href="https://www.thecloudplaybook.com/p/infrastructure-ownership-matrix-platform-app-teams-aws">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[TCP #119: Platform Teams Do Not Scale by Saying Yes Faster]]></title><description><![CDATA[Most platform bottlenecks come from unclear intake, routing, approvals, and ownership, not a lack of headcount.]]></description><link>https://www.thecloudplaybook.com/p/platform-team-bottlenecks-intake-routing-approvals</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/platform-team-bottlenecks-intake-routing-approvals</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Mon, 04 May 2026 11:01:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RGNt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736ef2b7-1141-480a-90bf-8980fb64472b_1492x1054.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Requests pile up. Developers escalate to their managers, who escalate to platform leadership.</p><p>The SLA misses compound. Engineers work hard and still fall behind.</p><p>Every VP who sees this situation reaches the same conclusion: the platform team needs more headcount.</p><p>That conclusion is almost always wrong.</p><p>Platform team bottlenecks do not come from teams that are too small. 
They come from work that arrives through unclear intake channels, gets routed to ambiguous owners, and waits for approvals nobody documented.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RGNt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736ef2b7-1141-480a-90bf-8980fb64472b_1492x1054.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RGNt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736ef2b7-1141-480a-90bf-8980fb64472b_1492x1054.png 424w, https://substackcdn.com/image/fetch/$s_!RGNt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736ef2b7-1141-480a-90bf-8980fb64472b_1492x1054.png 848w, https://substackcdn.com/image/fetch/$s_!RGNt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736ef2b7-1141-480a-90bf-8980fb64472b_1492x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!RGNt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736ef2b7-1141-480a-90bf-8980fb64472b_1492x1054.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RGNt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736ef2b7-1141-480a-90bf-8980fb64472b_1492x1054.png" width="1456" height="1029" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/736ef2b7-1141-480a-90bf-8980fb64472b_1492x1054.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1029,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1454431,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/196368219?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736ef2b7-1141-480a-90bf-8980fb64472b_1492x1054.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RGNt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736ef2b7-1141-480a-90bf-8980fb64472b_1492x1054.png 424w, https://substackcdn.com/image/fetch/$s_!RGNt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736ef2b7-1141-480a-90bf-8980fb64472b_1492x1054.png 848w, https://substackcdn.com/image/fetch/$s_!RGNt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736ef2b7-1141-480a-90bf-8980fb64472b_1492x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!RGNt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F736ef2b7-1141-480a-90bf-8980fb64472b_1492x1054.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>Why Faster Ticket Response Does Not Fix the Platform Engineering Bottleneck</h2><p>The reflexive response to a growing platform team backlog is to optimize throughput.</p><p>Run intake meetings twice a week instead of once. Add a triage rotation. Write SLA targets. Bring in a TPM to route requests. Some leaders introduce a tiered priority system: P0 gets a 24-hour response, P1 gets a five-day response, and P2 gets a two-week response.</p><p>Each change makes the intake process marginally more efficient. None of them fixes what actually causes requests to stall.</p><p>They do not tell a developer where to submit a request when their Slack message from two weeks ago went unanswered. They do not clarify which approval an engineer needs to unblock a security exception. 
They do not identify who owns an ambiguous request when it lands in the queue with no routing context.</p><p>Adding a priority label to an unrouted request does not route it.</p><p>Faster throughput into an unclear structure is still unclear.</p><div><hr></div><h2>Four Structural Gaps That Make Platform Teams a Bottleneck</h2><p>Platform engineering scalability problems follow a consistent pattern. Four things are missing at once.</p><ol><li><p><strong>Intake clarity.</strong> There is no single, well-defined path to request platform work. Some teams submit tickets. Others use Slack. Some skip both and go straight to a platform engineer.</p><p>Because the platform team's intake process is informal, everything arrives marked urgent. The team cannot distinguish a genuine blocker from a request that can wait two weeks.</p></li><li><p><strong>Routing clarity.</strong> Once a request lands, no one is certain who will handle it. The team is large enough that ownership is ambiguous.</p><p>Requests get forwarded, sit in limbo, or wait for whoever happens to know the most about that area. 
No routing logic is written down anywhere.</p></li><li><p><strong>Approval clarity.</strong> New infrastructure, security exceptions, and networking changes: each requires sign-off. But the approval chain is not documented.</p><p>Requests stall while engineers chase the right approver. Without a defined process, there is no predictable SLA for anything requiring sign-off, and every blocked request becomes a separate escalation path.</p></li><li><p><strong>Ownership clarity.</strong> When something breaks or a decision needs to be made, &#8220;Who owns this?&#8221; takes too long to answer.</p><p>If platform ownership is ambiguous during normal operations, that ambiguity becomes a crisis under pressure. Every incident starts with a 20-minute conversation that should take 90 seconds.</p></li></ol><p>From the outside, these four gaps look like a capacity problem. From the inside, everyone is working hard, but nothing is moving.</p><p>Adding engineers to this structure does not fix it. It replicates it. Each new hire spends their first months navigating the same ambiguity the current team has learned to live with.</p><div><hr></div><h2>The Four Questions That Confirm a Structure Problem</h2><p>Before approving a headcount requisition, run this diagnostic.</p><ol><li><p>If a developer needs a new service account today, do they know exactly where to submit the request? 
Or does the answer depend on who they know?</p></li><li><p>When a request arrives, can a platform engineer identify the owner in under five minutes without asking three colleagues?</p></li><li><p>For a security exception request, can you name the approver and the expected response time right now, without looking it up?</p></li><li><p>If you ask five engineers on your platform team, &#8220;Who owns the API gateway?&#8221; do you get the same answer from all five?</p></li></ol><p>One &#8220;it depends&#8221; in those answers means you have a platform team structure problem, not a headcount problem. Hiring more engineers will not change those answers.</p><div><hr></div><h2>How to Make Platform Team Structure Explicit Before You Hire</h2><p>These structural fixes cost less than a single hire and last longer than any retrospective.</p><p><strong>Define one intake channel.</strong> One Slack channel or one ticket form: a single entry point for all requests.</p><p>Not &#8220;it depends on the request type.&#8221; One place. This makes the queue visible and eliminates the parallel-path problem, where the same work gets started twice by two people who each received a slightly different version of the request.</p><p><strong>Build a routing matrix.</strong> For each request category, define who handles it by role, not by name.</p><p>New service account: Platform Infrastructure team, reviewed Mondays. Security exception: Security guild plus Platform lead, SLA of 5 business days. The matrix need not be complex. It needs to exist.</p><p><strong>Document the approval chain.</strong> For every request type requiring sign-off, name the role and the expected turnaround. Post it in your intake channel.</p><p>Approvals do not need to be fast. They need to be predictable.</p><p><strong>Assign single owners.</strong> Every platform component, every shared service, every critical decision needs one named person, not a team. Ownership rotates on a schedule. 
The clarity does not.</p><p>The goal is not to eliminate judgment from the platform team. It is to remove the structural overhead that consumes judgment before real work begins. When intake, routing, approvals, and ownership are clear, engineers spend more time engineering.</p><div><hr></div><h2>Run This Check This Week</h2><p>Pull the last five platform requests that missed your SLA.</p><p>For each one, trace how it entered the system, how it was routed, who needed to approve it, and the step at which it stopped moving.</p><p>That step is your structural gap. Fix it before opening a headcount requisition.</p><p>Teams that define intake, routing, and ownership before their next hire can recover 30 to 40 percent of effective capacity without adding a single engineer. That is the capacity the structural ambiguity was absorbing.</p><p>Every time I have traced a chronic platform team backlog to its root cause, the issue was structural: a missing routing matrix, an undocumented approval chain, or no one who could answer &#8220;who owns the API gateway&#8221; in under thirty seconds.</p><p>The team was not too small. 
The structure was invisible.</p><p><em>On Wednesday, paid subscribers get the full Platform Intake Operating Model: a routing matrix template, approval tiers, rollout checklist, and metrics you can implement with your team.</em></p><div><hr></div><h2>Upgrade If You Need Implementation, Not Just Ideas</h2><p>If you&#8217;re using these emails to guide real decisions on your platform, you&#8217;ll get more leverage from the paid version of The Cloud Playbook.</p><p>The free newsletter gives you patterns and language.</p><p>The paid newsletter turns those patterns into implementation kits you can ship inside a quarter:</p><ul><li><p>Concrete rollout plans (90&#8209;day roadmaps for each pattern)</p></li><li><p>Templates and checklists (policies, runbooks, tagging schemes, review checklists)</p></li><li><p>Real examples from high&#8209;stakes AWS environments (what we actually shipped and why)</p></li></ul><p>If the paid side doesn&#8217;t save you more than the subscription in <strong>one</strong> incident, audit cycle, or bad migration you avoid, you should cancel and keep the playbooks.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade to the Paid Cloud Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thecloudplaybook.com/subscribe"><span>Upgrade to the Paid Cloud Playbook</span></a></p><div><hr></div><h2><strong>That&#8217;s it for today!</strong></h2><p>Did you enjoy this newsletter issue?</p><p>Share with your friends, colleagues, and your favorite social media platform.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share The Cloud 
Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share The Cloud Playbook</span></a></p><p><strong>Until next week &#8212; Amrut</strong></p><div><hr></div><h2><strong>Get in touch</strong></h2><p>You can find me on <a href="https://www.linkedin.com/in/patilamrut/">LinkedIn</a> or <a href="https://twitter.com/realamrutpatil">X</a>.</p><p>If you would like to request a topic to read, please feel free to contact me directly via LinkedIn or X.</p>]]></content:encoded></item><item><title><![CDATA[TCP #118: The Platform Intake Operating Model For Scaling Platform Teams]]></title><description><![CDATA[How to replace Slack chaos with a routing matrix, approval tiers, and rollout plan your platform team can actually live with]]></description><link>https://www.thecloudplaybook.com/p/platform-intake-operating-model-routing-matrix-approvals-slas</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/platform-intake-operating-model-routing-matrix-approvals-slas</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Thu, 30 Apr 2026 15:03:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!VXPY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee53e291-8bf3-4c38-9f0b-a6de7dc69e68_1491x1055.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://open.substack.com/pub/thecloudplaybook/p/platform-team-bottleneck?r=ainou&amp;utm_campaign=post&amp;utm_medium=web">Sunday&#8217;s newsletter issue</a> made the case. Saying yes faster does not scale a platform team. Structure does.</p><p>When intake is informal, platform teams drown in tickets, senior engineers become human routers, and every request feels urgent. 
Reliability suffers because the work that prevents incidents gets displaced by whoever shouts the loudest.</p><p>A platform team that cannot control intake cannot control its outcomes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VXPY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee53e291-8bf3-4c38-9f0b-a6de7dc69e68_1491x1055.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VXPY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee53e291-8bf3-4c38-9f0b-a6de7dc69e68_1491x1055.png 424w, https://substackcdn.com/image/fetch/$s_!VXPY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee53e291-8bf3-4c38-9f0b-a6de7dc69e68_1491x1055.png 848w, https://substackcdn.com/image/fetch/$s_!VXPY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee53e291-8bf3-4c38-9f0b-a6de7dc69e68_1491x1055.png 1272w, https://substackcdn.com/image/fetch/$s_!VXPY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee53e291-8bf3-4c38-9f0b-a6de7dc69e68_1491x1055.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VXPY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee53e291-8bf3-4c38-9f0b-a6de7dc69e68_1491x1055.png" width="1456" height="1030" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ee53e291-8bf3-4c38-9f0b-a6de7dc69e68_1491x1055.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1030,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1777148,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/195580288?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee53e291-8bf3-4c38-9f0b-a6de7dc69e68_1491x1055.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VXPY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee53e291-8bf3-4c38-9f0b-a6de7dc69e68_1491x1055.png 424w, https://substackcdn.com/image/fetch/$s_!VXPY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee53e291-8bf3-4c38-9f0b-a6de7dc69e68_1491x1055.png 848w, https://substackcdn.com/image/fetch/$s_!VXPY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee53e291-8bf3-4c38-9f0b-a6de7dc69e68_1491x1055.png 1272w, https://substackcdn.com/image/fetch/$s_!VXPY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee53e291-8bf3-4c38-9f0b-a6de7dc69e68_1491x1055.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2><strong>The Human Router Platform</strong></h2><p>Most teams start with good intent.</p>
      <p>
          <a href="https://www.thecloudplaybook.com/p/platform-intake-operating-model-routing-matrix-approvals-slas">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[TCP #117: Your platform team doesn’t have a capacity problem.]]></title><description><![CDATA[4 structure checks to recover 30&#8211;40% of their time without hiring.]]></description><link>https://www.thecloudplaybook.com/p/platform-team-bottleneck</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/platform-team-bottleneck</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Sun, 26 Apr 2026 14:21:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xxQU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d86b01-1368-4adf-8e0b-e116e611887d_1122x1402.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Requests pile up. Developers escalate to their managers, who escalate to platform leadership.</p><p>The SLA misses compound. Engineers work hard and still fall behind.</p><p>Every VP who sees this situation reaches the same conclusion: the platform team needs more headcount.</p><p>That conclusion is almost always wrong.</p><p>Platform team bottlenecks do not come from teams that are too small. 
They come from work that arrives through unclear intake channels, gets routed to ambiguous owners, and waits for undocumented approvals.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xxQU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d86b01-1368-4adf-8e0b-e116e611887d_1122x1402.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xxQU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d86b01-1368-4adf-8e0b-e116e611887d_1122x1402.png 424w, https://substackcdn.com/image/fetch/$s_!xxQU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d86b01-1368-4adf-8e0b-e116e611887d_1122x1402.png 848w, https://substackcdn.com/image/fetch/$s_!xxQU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d86b01-1368-4adf-8e0b-e116e611887d_1122x1402.png 1272w, https://substackcdn.com/image/fetch/$s_!xxQU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d86b01-1368-4adf-8e0b-e116e611887d_1122x1402.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xxQU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d86b01-1368-4adf-8e0b-e116e611887d_1122x1402.png" width="1122" height="1402" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4d86b01-1368-4adf-8e0b-e116e611887d_1122x1402.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1402,&quot;width&quot;:1122,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1492691,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/193221322?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d86b01-1368-4adf-8e0b-e116e611887d_1122x1402.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xxQU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d86b01-1368-4adf-8e0b-e116e611887d_1122x1402.png 424w, https://substackcdn.com/image/fetch/$s_!xxQU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d86b01-1368-4adf-8e0b-e116e611887d_1122x1402.png 848w, https://substackcdn.com/image/fetch/$s_!xxQU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d86b01-1368-4adf-8e0b-e116e611887d_1122x1402.png 1272w, https://substackcdn.com/image/fetch/$s_!xxQU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4d86b01-1368-4adf-8e0b-e116e611887d_1122x1402.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>Why Faster Ticket Response Does Not Fix the Platform Engineering Bottleneck</h2><p>The reflexive response to a growing platform team backlog is to optimize throughput.</p><p>Run intake meetings twice a week instead of once. Add a triage rotation. Write SLA targets. Bring in a TPM to route requests. Some leaders introduce a tiered priority system: P0 gets a 24-hour response, P1 gets a five-day response, and P2 gets a two-week response.</p><p>Each change makes the intake process marginally more efficient. None of them fixes what actually causes requests to stall.</p><p>They do not tell a developer where to submit a request when their Slack message from two weeks ago went unanswered. They do not clarify which approval an engineer needs to unblock a security exception.
They do not identify who owns an ambiguous request when it lands in the queue with no routing context.</p><p>Adding a priority label to an unrouted request does not route it.</p><p>Faster throughput into an unclear structure is still unclear structure.</p><div><hr></div><h2>Four Structural Gaps That Make Platform Teams a Bottleneck</h2><p>Platform engineering scalability problems follow a consistent pattern: four structural elements are usually missing at once.</p><ol><li><p><strong>Intake clarity.</strong> There is no single, well-defined path to request platform work. Some teams submit tickets. Others use Slack. Some skip both and corner a platform engineer directly.</p><p>Because the platform team intake process is informal, everything arrives marked urgent. The team cannot distinguish a genuine blocker from a request that can wait two weeks.</p></li><li><p><strong>Routing clarity.</strong> Once a request lands, no one is certain who will handle it. The team is large enough that ownership is ambiguous.</p><p>Requests get forwarded, sit in limbo, or wait for whoever happens to know the most about that area.
There is no platform team request routing logic written down anywhere.</p></li><li><p><strong>Approval clarity.</strong> New infrastructure, security exceptions, and networking changes: each requires sign-off. But the approval chain is not documented.</p><p>Requests stall while engineers chase the right approver. Without a defined process, there is no predictable SLA for anything requiring sign-off, and every blocked request becomes a separate escalation path.</p></li><li><p><strong>Ownership clarity.</strong> When something breaks or a decision needs to be made, &#8220;Who owns this?&#8221; takes too long to answer. If developer platform ownership is ambiguous during normal operations, it becomes a crisis under pressure. Every incident starts with a 20-minute conversation that should take 90 seconds.</p></li></ol><p>From the outside, these four gaps look like a capacity problem. From the inside, they feel like everyone is working hard and nothing is moving.</p><p>Adding engineers to this structure does not fix it. It replicates it. Each new hire spends their first months navigating the same ambiguity the current team has learned to live with.</p><div><hr></div><h2>The Four Questions That Confirm a Structure Problem</h2><p>Before approving a headcount requisition, run this diagnostic.</p><ol><li><p>If a developer needs a new service account today, do they know exactly where to submit the request?
Or does the answer depend on who they know?</p></li><li><p>When a request arrives, can your platform engineer identify the owner in under five minutes without asking three colleagues?</p></li><li><p>For a security exception request, can you name the approver and the expected response time right now, without looking it up?</p></li><li><p>If you ask five engineers on your platform team, &#8220;Who owns the API gateway?&#8221; do you get the same answer within five minutes?</p></li></ol><p>Even one &#8220;it depends&#8221; in those answers means you have a platform team structure problem, not a headcount problem. Hiring more engineers will not change those answers.</p><div><hr></div><h2>How to Make Platform Team Structure Explicit Before You Hire</h2><p>These structural fixes cost less than a single hire and last longer than any retrospective.</p><ul><li><p><strong>Define one intake channel.</strong> One Slack channel. One ticket form. One entry point for all requests.</p><p>Not &#8220;it depends on the request type.&#8221; One place. This makes the queue visible and eliminates the parallel-path problem where the same work gets started twice by two people who each received a slightly different version of the request.</p></li><li><p><strong>Build a routing matrix.</strong> For each request category, define who handles it by role, not name.</p><p>New service account: Platform Infrastructure team, reviewed Mondays. Security exception: Security guild plus Platform lead, SLA 5 business days. The matrix need not be complex. It needs to exist.</p></li><li><p><strong>Document the approval chain.</strong> For every request type requiring sign-off, name the role and the expected turnaround. Post it in your intake channel.</p><p>Approvals do not need to be fast. They need to be predictable.</p></li><li><p><strong>Assign single owners.</strong> Every platform component, every shared service, every critical decision needs one named person, not a team. Ownership rotates on a schedule.
The clarity does not.</p></li></ul><p>The goal is not to eliminate judgment from the platform team. It is to remove the structural overhead that consumes judgment before real work begins. When intake, routing, approvals, and ownership are clear, engineers spend more time engineering.</p><div><hr></div><h3>Run this check this week</h3><p>Pull the last five platform requests that missed your SLA.</p><p>For each one, trace how it entered the system, how it was routed, who needed to approve it, and the step where it stopped moving.</p><p>That step is your structural gap. Fix it before opening a headcount requisition.</p><p>Teams that define intake, routing, and ownership before their next hire recover 30 to 40 percent of effective capacity without adding a single engineer. That is the capacity the structural ambiguity was absorbing.</p><p>Every time I have traced a chronic platform team backlog to its root cause, the issue was structural: a missing routing matrix, an undocumented approval chain, and no one who could answer &#8220;who owns the API gateway&#8221; in under thirty seconds.</p><p>The team was not too small.
The structure was invisible.</p><div><hr></div><h3>Upgrade If You Need Implementation, Not Just Ideas</h3><p>If you&#8217;re using these emails to guide real decisions on your platform, you&#8217;ll get more leverage from the paid version of The Cloud Playbook.</p><p>The free newsletter gives you patterns and language.</p><p>The paid newsletter turns those patterns into implementation kits you can ship inside a quarter:</p><ul><li><p>Concrete rollout plans (90&#8209;day roadmaps for each pattern)</p></li><li><p>Templates and checklists (policies, runbooks, tagging schemes, review checklists)</p></li><li><p>Real examples from high&#8209;stakes AWS environments (what we actually shipped and why)</p></li></ul><p>If the paid side doesn&#8217;t save you more than the subscription in <strong>one</strong> incident, audit cycle, or bad migration you avoid, you should cancel and keep the playbooks.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade to the Paid Cloud Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/subscribe"><span>Upgrade to the Paid Cloud Playbook</span></a></p><div><hr></div><h2><strong>That&#8217;s it for today!</strong></h2><p>Did you enjoy this newsletter issue?</p><p>Share with your friends, colleagues, and your favorite social media platform.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share The Cloud Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share The Cloud Playbook</span></a></p><p><strong>Until next week &#8212; Amrut</strong></p><div><hr></div><h2><strong>Get in touch</strong></h2><p>You can find me on <a href="https://www.linkedin.com/in/patilamrut/">LinkedIn</a> or <a href="https://twitter.com/realamrutpatil">X</a>.</p><p>If you would like to request a topic to read, please feel free to contact me directly via LinkedIn or X.</p>]]></content:encoded></item><item><title><![CDATA[TCP #116: Most teams don't have a technical debt problem.]]></title><description><![CDATA[They have a decision debt problem. The distinction changes what you measure, build, and protect.]]></description><link>https://www.thecloudplaybook.com/p/technical-debt-vs-decision-debt-platform-engineering</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/technical-debt-vs-decision-debt-platform-engineering</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Sun, 12 Apr 2026 14:30:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Zf9E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff09ac1b8-33af-4d9f-a2c6-8d00d230521e_1948x544.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most engineering leaders talk about technical debt as if it is a coding problem.</p><p>It is not.</p><p>The systems that break expensively, the ones that consume quarters of remediation work, delay IPOs, and create the audit findings nobody can explain, almost never break because the code was bad.</p><p>They break because the decisions behind the code were never documented.</p><p>That is a different problem. 
And it requires a different fix.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Zf9E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff09ac1b8-33af-4d9f-a2c6-8d00d230521e_1948x544.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Zf9E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff09ac1b8-33af-4d9f-a2c6-8d00d230521e_1948x544.png 424w, https://substackcdn.com/image/fetch/$s_!Zf9E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff09ac1b8-33af-4d9f-a2c6-8d00d230521e_1948x544.png 848w, https://substackcdn.com/image/fetch/$s_!Zf9E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff09ac1b8-33af-4d9f-a2c6-8d00d230521e_1948x544.png 1272w, https://substackcdn.com/image/fetch/$s_!Zf9E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff09ac1b8-33af-4d9f-a2c6-8d00d230521e_1948x544.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Zf9E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff09ac1b8-33af-4d9f-a2c6-8d00d230521e_1948x544.png" width="1456" height="407" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f09ac1b8-33af-4d9f-a2c6-8d00d230521e_1948x544.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:407,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1787664,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/190867678?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff09ac1b8-33af-4d9f-a2c6-8d00d230521e_1948x544.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Zf9E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff09ac1b8-33af-4d9f-a2c6-8d00d230521e_1948x544.png 424w, https://substackcdn.com/image/fetch/$s_!Zf9E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff09ac1b8-33af-4d9f-a2c6-8d00d230521e_1948x544.png 848w, https://substackcdn.com/image/fetch/$s_!Zf9E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff09ac1b8-33af-4d9f-a2c6-8d00d230521e_1948x544.png 1272w, https://substackcdn.com/image/fetch/$s_!Zf9E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff09ac1b8-33af-4d9f-a2c6-8d00d230521e_1948x544.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>How Most Engineering Leaders Frame The Problem</h2><p>Technical debt is the dominant frame for platform problems.</p><p>It is a useful shortcut. But it points to the wrong layer.</p><p>When leaders say &#8220;we have technical debt,&#8221; they usually mean one of three things:</p><ul><li><p>The codebase is harder to change than it should be.</p></li><li><p>The system is harder to reason about than it should be.</p></li><li><p>The architecture does not match the current scale of the organization.</p></li></ul><p>All of those things can be true.</p><p>But they are usually symptoms of a deeper problem: the team cannot explain why the system works the way it does, because the decisions that produced it were made informally, without documentation, by people who may no longer be at the company.</p><p>The technical debt is real.
But it is downstream of the decision debt that created it.</p><h2>Why Framing This As Technical Debt Produces The Wrong Fix</h2><p>The technical debt frame leads to a predictable response: allocate engineering time to refactor, migrate, or modernize. That work often creates genuine value.</p><p>It does not prevent the same problem from recurring.</p><p>Teams that pay down technical debt without addressing the decision practices that created it are running a maintenance loop. They clean up the current accumulation. New decision debt creates new technical debt. The cycle repeats.</p><p>The root issue is structural: most engineering organizations do not treat decisions as artifacts that need to be created, stored, and maintained. They treat decisions as conversations that produce outcomes.</p><p><strong>Decisions made in conversations evaporate. Their outcomes persist: the code, the architecture, the policy.</strong></p><p>The reasoning does not.</p><p>Three months later, a new engineer inherits the system and asks why it works the way it does.
The answer is: nobody knows.</p><p>That is decision debt.</p><h2>The Better Frame: Decisions Are Durable Artifacts</h2><p>Decision debt is the accumulation of choices that were made but not documented: the AWS account structure rationale, the secrets management approach, the deployment ownership model, and the trade-offs accepted under time pressure.</p><p>Unlike technical debt, decision debt is invisible.</p><p>You cannot run a linter against it. It does not surface in code reviews. It shows up during an audit, a compliance review, a post-incident retrospective, or a due diligence process, when someone needs to understand why the platform works the way it does and no one can answer.</p><p>Reframing from technical debt to decision debt changes what you measure, what you build, and what you protect.</p><h2>What Changes When You See It This Way</h2><p>When you treat decision debt as the primary problem, the fix shifts from code to documentation, but not the kind of documentation most teams write.</p><p>Not README files. Not wiki pages that go stale in 90 days.</p><p>The artifact that matters is a decision record: a brief, durable document that captures what was decided, what the alternatives were, why this option was chosen, and what conditions would cause you to revisit it.</p><p><strong>Architecture Decision Records (ADRs)</strong> are the most common format. The format matters less than the habit.</p><p>Platform teams that practice decision documentation accumulate something more valuable than clean code: they accumulate institutional reasoning.</p><p>When an auditor asks why the platform has three separate IAM policies, the team with decision records can answer in 10 minutes. The team without them spends six weeks reconstructing the rationale.</p><p>When a new CTO joins and asks why the organization chose multi-account over single-account AWS, the team with decision records shows them the 2023 evaluation.
The team without them shrugs.</p><p>The gap compounds at every leadership transition, every compliance review, and every architecture evolution.</p><h2>One Action: Start The Record</h2><p>Identify the five most consequential platform decisions made in the last 24 months.</p><p>For each, write a single paragraph capturing what was decided, what was rejected, why, and what would cause you to revisit it. Date it. Name the decision owner.</p><p>Store it where the next engineer and the next auditor can find it.</p><p>That is your starting point for a decision debt practice. It will not eliminate the backlog overnight. But it will stop the accumulation.</p><div><hr></div><h2>What to do this week</h2><p>Pull your last three post-incident retrospectives.</p><p>For each incident, identify whether the root cause was a technical failure or a decision that was made without documentation.</p><p>If you cannot answer that question, the decision record does not exist, and the same incident will recur under different conditions.</p><p>Platform reliability is not a code quality problem. It is a decision quality problem. The documentation is the practice.</p><p><em>Every time I&#8217;ve worked through a platform audit or due diligence process, the hardest questions to answer are not technical. </em></p><p><em>They are: &#8220;Why does this work the way it does?&#8221; and &#8220;Who decided this?&#8221; </em></p><p><em>Teams with decision records answer in minutes. Teams without them answer in months.</em></p><div><hr></div><p>Tools make noise.
Boundaries create signal.</p><p>I build platforms by drawing the right lines between teams, not by adding more stacks.</p><div><hr></div><p><strong>Upgrade If You Need Implementation, Not Just Ideas</strong></p><p>If you&#8217;re using these emails to guide real decisions on your platform, you&#8217;ll get more leverage from the paid version of The Cloud Playbook.</p><p>The free newsletter gives you patterns and language.</p><p>The paid newsletter turns those patterns into implementation kits you can ship inside a quarter:</p><ul><li><p>Concrete rollout plans (90&#8209;day roadmaps for each pattern)</p></li><li><p>Templates and checklists (policies, runbooks, tagging schemes, review checklists)</p></li><li><p>Real examples from high&#8209;stakes AWS environments (what we actually shipped and why)</p></li></ul><p>If the paid side doesn&#8217;t save you more than the subscription in <strong>one</strong> incident, audit cycle, or bad migration you avoid, you should cancel and keep the playbooks.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade to the Paid Cloud Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thecloudplaybook.com/subscribe"><span>Upgrade to the Paid Cloud Playbook</span></a></p><div><hr></div><h2><strong>That&#8217;s it for today!</strong></h2><p>Did you enjoy this newsletter issue?</p><p>Share with your friends, colleagues, and your favorite social media platform.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share The Cloud Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" 
href="https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share The Cloud Playbook</span></a></p><p><strong>Until next week &#8212; Amrut</strong></p><div><hr></div><h2><strong>Get in touch</strong></h2><p>You can find me on <a href="https://www.linkedin.com/in/patilamrut/">LinkedIn</a> or <a href="https://twitter.com/realamrutpatil">X</a>.</p><p>If you would like to request a topic to read, please feel free to contact me directly via LinkedIn or X.</p>]]></content:encoded></item><item><title><![CDATA[TCP #115: You don't have a platform ROI problem.]]></title><description><![CDATA[You have a translation problem. Here's the framework for making platform investment visible in terms executives actually use.]]></description><link>https://www.thecloudplaybook.com/p/platform-investment-roi-engineering-leaders</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/platform-investment-roi-engineering-leaders</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Wed, 08 Apr 2026 16:30:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!eOEm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d47098f-6854-40c5-9d5b-03685b976e37_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You can also read my newsletters from the Substack mobile app and be notified when a new issue is available.</p>
<div><hr></div><p>Most platform engineering leaders know their work creates value.</p><p>Most cannot explain it in terms that survive a budget review.</p><p>When a CFO asks, &#8220;What is the ROI of our platform team?&#8221;, the answer most platform leaders give is a list of what the team built.</p><p>That is not an answer to the question asked.</p><p>The question is not: &#8220;What did you ship?&#8221;</p><p><strong>The question is: &#8220;What changed in the business because you shipped it?&#8221;</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eOEm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d47098f-6854-40c5-9d5b-03685b976e37_2816x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eOEm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d47098f-6854-40c5-9d5b-03685b976e37_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!eOEm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d47098f-6854-40c5-9d5b-03685b976e37_2816x1536.png 848w,
https://substackcdn.com/image/fetch/$s_!eOEm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d47098f-6854-40c5-9d5b-03685b976e37_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!eOEm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d47098f-6854-40c5-9d5b-03685b976e37_2816x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eOEm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d47098f-6854-40c5-9d5b-03685b976e37_2816x1536.png" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d47098f-6854-40c5-9d5b-03685b976e37_2816x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7700970,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/190867653?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d47098f-6854-40c5-9d5b-03685b976e37_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eOEm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d47098f-6854-40c5-9d5b-03685b976e37_2816x1536.png 424w, 
https://substackcdn.com/image/fetch/$s_!eOEm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d47098f-6854-40c5-9d5b-03685b976e37_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!eOEm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d47098f-6854-40c5-9d5b-03685b976e37_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!eOEm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d47098f-6854-40c5-9d5b-03685b976e37_2816x1536.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h2>The Real Fork in the Road for Platform Investment</h2><p>Platform engineering ROI justification typically arrives in one of three situations: annual budget cycles, headcount requests, or post-incident reviews after something expensive broke.</p><p>Each creates a different conversation. All three expose the same gap.</p><p>The decision most platform leaders avoid making explicitly: are we measuring platform investment as engineering spend or as a business capability?</p><p>Engineering spend framing produces a cost conversation.</p><p>Business capability framing produces an investment conversation.</p><p>These are not the same conversation. The frame you choose determines the outcome you get.</p><h2>Platform Investment Arguments: What Works and What Doesn&#8217;t</h2><h3>Option 1: Technical output metrics</h3>
      <p>
          <a href="https://www.thecloudplaybook.com/p/platform-investment-roi-engineering-leaders">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[TCP #114: Buy vs. Build for Compliance Automation: The Decision That Stalls Most Platform Teams]]></title><description><![CDATA[There are three options, not two. Here's which tradeoffs your org can actually absorb.]]></description><link>https://www.thecloudplaybook.com/p/compliance-automation-buy-vs-build</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/compliance-automation-buy-vs-build</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Sun, 05 Apr 2026 14:25:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tsDt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18bfe628-7126-418a-a705-ca17fa9d8257_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every engineering team in a regulated environment eventually hits the same moment.</p><p>Compliance evidence is piling up. Auditors want controls you haven&#8217;t mapped yet. </p><p>Someone says, &#8220;We should automate this.&#8221; And then the room splits: buy a compliance automation platform, or build the tooling yourselves.</p><p>The decision seems tactical. It isn&#8217;t. 
</p><p>The wrong call costs six to eighteen months of engineering time and still leaves gaps in your audit readiness.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tsDt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18bfe628-7126-418a-a705-ca17fa9d8257_2816x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tsDt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18bfe628-7126-418a-a705-ca17fa9d8257_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!tsDt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18bfe628-7126-418a-a705-ca17fa9d8257_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!tsDt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18bfe628-7126-418a-a705-ca17fa9d8257_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!tsDt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18bfe628-7126-418a-a705-ca17fa9d8257_2816x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tsDt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18bfe628-7126-418a-a705-ca17fa9d8257_2816x1536.png" width="1456" height="794" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18bfe628-7126-418a-a705-ca17fa9d8257_2816x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7630173,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/190867299?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18bfe628-7126-418a-a705-ca17fa9d8257_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tsDt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18bfe628-7126-418a-a705-ca17fa9d8257_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!tsDt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18bfe628-7126-418a-a705-ca17fa9d8257_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!tsDt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18bfe628-7126-418a-a705-ca17fa9d8257_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!tsDt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18bfe628-7126-418a-a705-ca17fa9d8257_2816x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><p><strong>Building an IDP from Scratch &#8212; Live 2-day Workshop</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MXLt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f6441a-57b0-456c-bfa8-d7292e22d87d_1280x640.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MXLt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f6441a-57b0-456c-bfa8-d7292e22d87d_1280x640.png 424w, 
https://substackcdn.com/image/fetch/$s_!MXLt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f6441a-57b0-456c-bfa8-d7292e22d87d_1280x640.png 848w, https://substackcdn.com/image/fetch/$s_!MXLt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f6441a-57b0-456c-bfa8-d7292e22d87d_1280x640.png 1272w, https://substackcdn.com/image/fetch/$s_!MXLt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f6441a-57b0-456c-bfa8-d7292e22d87d_1280x640.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MXLt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f6441a-57b0-456c-bfa8-d7292e22d87d_1280x640.png" width="1280" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a7f6441a-57b0-456c-bfa8-d7292e22d87d_1280x640.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:193957,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/190867299?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f6441a-57b0-456c-bfa8-d7292e22d87d_1280x640.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!MXLt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f6441a-57b0-456c-bfa8-d7292e22d87d_1280x640.png 424w, https://substackcdn.com/image/fetch/$s_!MXLt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f6441a-57b0-456c-bfa8-d7292e22d87d_1280x640.png 848w, https://substackcdn.com/image/fetch/$s_!MXLt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f6441a-57b0-456c-bfa8-d7292e22d87d_1280x640.png 1272w, https://substackcdn.com/image/fetch/$s_!MXLt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7f6441a-57b0-456c-bfa8-d7292e22d87d_1280x640.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>Design and build an Internal Developer Platform that scales and gets adopted. This hands-on, 2-day workshop led by Ajay Chankramath (Founder of Platformetrics, former ThoughtWorks leader, author of Platform Engineer&#8217;s Handbook) covers platform-as-a-product thinking, cloud-native architecture, Infrastructure as Code, automation patterns, and production readiness.<br><br>Ideal for platform engineers, DevOps teams, and engineering leaders building or stabilizing IDPs.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.eventbrite.com/e/building-an-internal-developer-platform-from-scratch-tickets-1978960034736?aff=cloudplaybook&quot;,&quot;text&quot;:&quot;Sign-up today&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.eventbrite.com/e/building-an-internal-developer-platform-from-scratch-tickets-1978960034736?aff=cloudplaybook"><span>Sign-up today</span></a></p><p>Use discount code <strong>CLOUD40</strong> during sign-up to get <strong>40% off</strong></p><div><hr></div><h2>Three Paths On The Table. Most Teams Only See Two.</h2><p>There are three real options for compliance automation.</p><ol><li><p><strong>Buy a compliance automation platform.</strong> Products like Vanta, Drata, or Secureframe sit on top of your cloud infrastructure. They provide automated evidence collection, control mapping, and audit readiness dashboards. You configure integrations. 
They handle framework updates when the standard changes.</p></li><li><p><strong>Build compliance tooling internally.</strong> Your platform team writes custom scripts or a compliance-as-code layer that pulls evidence from your environment, structures it for auditors, and stores it with full version history. You own every line. You control every integration.</p></li><li><p><strong>Hybrid.</strong> Buy a platform for standard controls and automated evidence collection. Build custom tooling only for what the vendor doesn&#8217;t cover: proprietary systems, non-standard integrations, or regulatory requirements outside the vendor&#8217;s framework mappings.</p></li></ol><p>Most engineering leaders treat this as a binary. It isn&#8217;t.</p><div><hr></div><h2>What Each Option Actually Costs You</h2><p><strong>Buying costs flexibility.</strong></p><p>Commercial compliance automation tools map well to recognized frameworks: SOC 2, ISO 27001, HIPAA, and FedRAMP Moderate. They also handle common integrations well, such as AWS, GitHub, Okta, and Jira. 
But if your architecture diverges from what the vendor built for, expect gaps.</p><p>When your regulatory requirements don&#8217;t align with the vendor&#8217;s control library, you spend significant time on manual evidence uploads and exception management. Platform engineers end up maintaining the GRC platform instead of building the product.</p><p>That hidden maintenance cost rarely appears in any vendor&#8217;s ROI calculator.</p><p><strong>Building costs time you don&#8217;t have &#8212; and creates a new risk surface.</strong></p><p>Internal compliance tooling gives you full control. You can map exactly to your control set, pull compliance evidence from any system, and structure the output precisely the way your auditors want it.</p><p>But there is a cost most teams don&#8217;t price in: the moment you build your own compliance automation, that codebase becomes part of your risk surface. You need to validate, secure, and audit it as you would any other production system. Your internal compliance tooling is itself a compliance artifact.</p><p>And a credible internal automated evidence collection pipeline, with versioning, access controls, and audit trails, takes three to six months to build, and after that it has to be maintained as a first-class capability. Most platform teams don&#8217;t have that capacity during an active audit cycle.</p><p><strong>Hybrid costs coordination.</strong></p><p>You get coverage where the vendor is strong and control where you need custom logic. The cost is maintaining two systems and keeping compliance evidence consistent across both.</p><p>Gaps appear at the seams. Auditors notice when evidence from your internal tooling doesn&#8217;t align with what the compliance platform reports. Reconciling those gaps during an audit is expensive and avoidable.</p><p>No option is clean. 
The question is which tradeoffs your organization can absorb right now.</p><div><hr></div><h2>Why I Default To Buy First And Build Only At The Edges</h2><p>My recommendation is hybrid, weighted toward buy for the first two years.</p><p>Buy a commercial platform for standard framework controls. You are not going to out-engineer a vendor&#8217;s SOC 2 compliance automation mappings. Their mappings reflect thousands of audits. Yours reflect one, maybe two. Get automated evidence collection running quickly. Let the vendor handle framework updates when the standard changes.</p><p><strong>Build for the gaps.</strong> If you run on-premises infrastructure, proprietary data pipelines, or a regulated data classification system that vendors don&#8217;t support, build a lightweight evidence collector for those specific controls. Keep it narrow. Keep it maintainable. Resist the urge to expand scope.</p><p>The reason I lean toward buying early is simple: compliance automation is not a core differentiator. Your platform team&#8217;s time is finite.</p><p>Spending six months building an internal evidence pipeline is six months not spent on developer experience, deployment infrastructure, or reliability tooling. Buy the commodity. Build only what the market doesn&#8217;t cover.</p><p>The exception is scale. Once you have more than 500 engineers, or operate across multiple regulatory frameworks simultaneously, vendor licensing costs and integration maintenance overhead can exceed what a well-resourced internal team would spend.</p><p>Multi-framework compliance at scale is where commercial platforms often show their seams. 
At that point, the build becomes economically rational and strategically worth resourcing.</p><div><hr></div><h2>Two Signals That Flip The Answer Towards Build</h2><p>This framework holds when you are in your first or second audit cycle and your scope maps to a recognized framework.</p><p>Two signals change the answer.</p><p><strong>1/ Your architecture is genuinely non-standard.</strong></p><p>Air-gapped environments, on-premises data processing, proprietary protocols. No compliance automation tool covers these well. You will spend more time managing exceptions than you would spend building your own tooling. In these environments, a commercial platform becomes a workaround, not a solution. Build.</p><p><strong>2/ Compliance is your product.</strong></p><p>If customers buy from you because of your compliance posture, continuous compliance is a core platform capability, not overhead. Build it. Own it. Treat it as a product with its own roadmap and dedicated engineering investment. When compliance is what you sell, the compliance platform decision is a competitive question.</p><p>Everything else defaults to buy.</p><div><hr></div><p><strong>Run this check before your next vendor conversation:</strong></p><ol><li><p>List every system in your environment that generates compliance evidence. Flag the ones your shortlisted compliance automation tools do not cover natively.</p></li><li><p>Price the gap: estimate how many engineering hours per quarter you would spend on manual evidence collection for uncovered controls.</p></li><li><p>Calculate the build cost for a lightweight internal evidence collector for just those controls, and factor in the ongoing security and maintenance overhead of owning that codebase. Compare it against the manual overhead.</p></li></ol><p>That comparison tells you whether you are buying a solution or renting a workaround. 
The answer changes how you negotiate, what integrations you demand, and whether you sign at all.</p><p>Teams that run this analysis before signing with a compliance automation vendor reduce their post-implementation integration work by 40 to 60 percent.</p><p>Every team that skips it ends up in the same place: one engineer maintaining a patchwork of scripts and manual uploads six months after go-live, during an active audit.</p><div><hr></div><p><em>If you want the implementation details, I go one level deeper in the <strong>paid Cloud Playbook tier</strong>:</em></p><ul><li><p><em>The exact RFP checklist I use to pressure&#8209;test compliance automation vendors</em></p></li><li><p><em>A build&#8209;vs&#8209;buy spreadsheet you can plug your own engineer costs and audit scope into</em></p></li><li><p><em>Example &#8220;hybrid&#8221; reference architectures that keep vendors in their lane and your team focused on product</em></p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade Here&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/subscribe"><span>Upgrade Here</span></a></p><div><hr></div><h2><strong>Whenever you&#8217;re ready, there are 2 ways I can help you:</strong></h2><ol><li><p><strong>Free guides and helpful resources: </strong><a href="https://thecloudplaybook.gumroad.com/">https://thecloudplaybook.gumroad.com/</a></p></li><li><p>Get certified as an <strong>AWS AI Practitioner</strong> in 2026. Sign up today to elevate your cloud skills. 
(<em><a href="https://www.udemy.com/course/aws-certified-ai-practitioner-practice-exams-aif-c01/">link</a></em>)</p></li></ol><div><hr></div><h2><strong>That&#8217;s it for today!</strong></h2><p>Did you enjoy this newsletter issue?</p><p>Share with your friends, colleagues, and your favorite social media platform.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share The Cloud Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share The Cloud Playbook</span></a></p><p><strong>Until next week &#8212; Amrut</strong></p><div><hr></div><h2><strong>Get in touch</strong></h2><p>You can find me on <a href="https://www.linkedin.com/in/patilamrut/">LinkedIn</a> or <a href="https://twitter.com/realamrutpatil">X</a>.</p><p>If you would like to request a topic to read, please feel free to contact me directly via LinkedIn or X.</p>]]></content:encoded></item><item><title><![CDATA[TCP #113: Why your platform team is a bottleneck (and why hiring won’t fix it)]]></title><description><![CDATA[Six engineers, eighty tickets, twelve product teams queued. The problem isn&#8217;t headcount. 
It&#8217;s the org structure nobody touches.]]></description><link>https://www.thecloudplaybook.com/p/platform-team-bottleneck-organizational-structure</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/platform-team-bottleneck-organizational-structure</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Wed, 01 Apr 2026 16:31:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!40sM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F767f4413-4c84-46f3-847a-fcc3db9c3f2a_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>SIX ENGINEERS. EIGHTY OPEN TICKETS. ONE STRUCTURAL FAILURE.</h2><p>One platform team. Twelve product teams queued behind them.</p><p>Every deploy request was a ticket. Every new AWS account required platform sign-off. Every tool decision got routed through a weekly sync.</p><p>The team had six engineers and eighty open tickets. Leadership called it a headcount problem.</p><p>It wasn&#8217;t.</p>
      <p>
          <a href="https://www.thecloudplaybook.com/p/platform-team-bottleneck-organizational-structure">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[TCP #112: The single question that predicts platform team success]]></title><description><![CDATA[Why &#8220;can a stranger deploy without help?&#8221; predicts adoption better than NPS or toil charts.]]></description><link>https://www.thecloudplaybook.com/p/single-question-predicts-platform-team-success</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/single-question-predicts-platform-team-success</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Sun, 29 Mar 2026 14:21:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Y9T4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40c162d-0443-4fd6-9c65-5318b533edcc_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div><hr></div><p>If you run a platform or infra team, you only need one question:</p><p><strong>Can a developer I&#8217;ve never met deploy a 
production service without asking anyone for help?</strong></p><p>That is it.</p><p>Not &#8220;what is our deployment frequency?&#8221; </p><p>Not &#8220;what is our platform NPS score?&#8221; </p><p>Not &#8220;how much toil have we eliminated?&#8221;</p><p>Those metrics matter. But they are lagging indicators. They tell you what your platform did last quarter.</p><p>This question tells you what your platform is, right now.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y9T4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40c162d-0443-4fd6-9c65-5318b533edcc_2816x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y9T4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40c162d-0443-4fd6-9c65-5318b533edcc_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!Y9T4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40c162d-0443-4fd6-9c65-5318b533edcc_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!Y9T4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40c162d-0443-4fd6-9c65-5318b533edcc_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!Y9T4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40c162d-0443-4fd6-9c65-5318b533edcc_2816x1536.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Y9T4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40c162d-0443-4fd6-9c65-5318b533edcc_2816x1536.png" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b40c162d-0443-4fd6-9c65-5318b533edcc_2816x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6622315,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/190867254?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40c162d-0443-4fd6-9c65-5318b533edcc_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y9T4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40c162d-0443-4fd6-9c65-5318b533edcc_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!Y9T4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40c162d-0443-4fd6-9c65-5318b533edcc_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!Y9T4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40c162d-0443-4fd6-9c65-5318b533edcc_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!Y9T4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb40c162d-0443-4fd6-9c65-5318b533edcc_2816x1536.png 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h2>WHERE THIS CAME FROM</h2><p>I started asking this question after a specific pattern repeated itself across three different organizations.</p><p>Each team had strong DORA metrics. Deployment frequency was high. Lead times were short. On the surface, the platform was working.</p><p>Then we&#8217;d onboard a new team and watch what happened. 
Engineers would read the documentation, hit a wall, open a ticket, wait, get an answer, try again, hit another wall, open another ticket.</p><p>Two weeks of this and they&#8217;d either give up on the platform or build their own path around it.</p><p>The metrics hadn&#8217;t lied. They reflected what existing users could do after months of accumulating tribal knowledge.</p><p>They did not reflect what the platform actually offered to someone arriving cold.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">The Cloud Playbook is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>WHY THIS QUESTION PREDICTS WHAT METRICS MISS</h2><p>The deployment frequency metric measures how often your most experienced teams ship.</p><p>The platform NPS score measures whether developers who already use the platform are satisfied with it.</p><p>The toil reduction metric measures how much manual work you eliminated for teams that were doing it manually before.</p><p>All three measure existing behavior in existing users.</p><p>The question &#8220;can a developer I&#8217;ve never met deploy without asking anyone for help&#8221; measures something different. </p><p>It measures the platform&#8217;s legibility to a stranger.</p><p><strong>Legibility is the real test. Not performance. 
Not satisfaction. Not throughput.</strong></p><p><strong>If a new engineer can&#8217;t even find the path, it doesn&#8217;t matter how fast your existing teams can run it.</strong></p><p>A platform that requires tribal knowledge to operate has a known failure mode that it hasn&#8217;t solved. Every new team that joins the organization will pay the onboarding tax. Every engineer who moves between teams will pay it again.</p><p>The tax compounds quietly. It shows up as a two-week onboarding period that should take two days. It shows up as a Slack message that interrupts a senior engineer at 2 pm. It shows up as a ticket queue that the platform team treats as normal operational load, when it is actually evidence that the platform has not done its job.</p><div><hr></div><h2>WHAT TO DO WITH THIS INSIGHT</h2><p>Run the test literally.</p><p>Find a developer who has not used your platform before. Give them your documentation and nothing else. Watch where they stop. Watch what they search for. Watch what they eventually ask a human to explain.</p><p>Every stopping point is a design failure, not a user failure.</p><p>The goal is not a platform that experienced engineers find fast. The goal is a platform that a new engineer can navigate to a first successful deployment without human intervention.</p><p>That bar is higher than most platform teams think it is. Most teams build for their current users, optimizing for the paths they already know. 
New users are left to discover the path themselves.</p><div><hr></div><h2>RUN THIS CHECK</h2><p>What to do this week (30&#8211;60 minutes):</p><p>&#8226; Identify one engineer who joined your organization in the last 60 days.</p><p>&#8226; Ask them to walk you through their first deployment experience: where they got stuck and whom they had to ask for help.</p><p>&#8226; Count the number of human interventions between &#8220;first commit&#8221; and &#8220;service running in production.&#8221; Each intervention is a platform gap, not a developer gap.</p><p>&#8226; Set a target of <strong>zero human interventions for a standard service deployment</strong>, and track it alongside your DORA metrics.</p><div><hr></div><p>Reducing new-engineer onboarding time from two weeks to two days returns compounding capacity across every hiring cycle. Teams that run this test regularly report that the gaps it surfaces are more actionable than any NPS survey: they are specific, reproducible, and owned by the platform team rather than by the developers who hit them.</p><p>Every platform I have seen struggle with adoption had the same root cause. It was built for the people who built it. The documentation assumed knowledge that the documentation was supposed to provide. The golden path was golden for the people who paved it.</p><div><hr></div><p><em>If you&#8217;re serious about how your platform serves new users, this is exactly what the <strong>Paid</strong> version of this newsletter is for.</em></p><p><em>Paid subscribers get the full <strong>Platform Team Scorecard</strong>: nine capability areas, a copy&#8209;paste audit worksheet, and example questions you can run with your teams to find the gaps that block new engineers from shipping. 
You also unlock the back catalog of scorecards and implementation guides, so you&#8217;re not inventing your own framework from scratch.</em></p><p><em>If you want to go beyond reading and actually instrument your platform, <strong>upgrade to Paid</strong> and run the Scorecard with your team in the next week.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade Here&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/subscribe"><span>Upgrade Here</span></a></p><div><hr></div><h2><strong>Whenever you&#8217;re ready, there are 2 ways I can help you:</strong></h2><ol><li><p><strong>Free guides and helpful resources: </strong><a href="https://thecloudplaybook.gumroad.com/">https://thecloudplaybook.gumroad.com/</a></p></li><li><p>Get certified as an <strong>AWS AI Practitioner</strong> in 2026. Sign up today to elevate your cloud skills. 
(<em><a href="https://www.udemy.com/course/aws-certified-ai-practitioner-practice-exams-aif-c01/">link</a></em>)</p></li></ol><div><hr></div><h2><strong>That&#8217;s it for today!</strong></h2><p>Did you enjoy this newsletter issue?</p><p>Share with your friends, colleagues, and your favorite social media platform.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share The Cloud Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share The Cloud Playbook</span></a></p><p><strong>Until next week &#8212; Amrut</strong></p><div><hr></div><h2><strong>Get in touch</strong></h2><p>You can find me on <a href="https://www.linkedin.com/in/patilamrut/">LinkedIn</a> or <a href="https://twitter.com/realamrutpatil">X</a>.</p><p>If you would like to request a topic to read, please feel free to contact me directly via LinkedIn or X.</p>]]></content:encoded></item><item><title><![CDATA[TCP #111: One standardized policy that killed 4 months of IAM escalations]]></title><description><![CDATA[How we went from recurring tickets to zero and made audit evidence boring again.]]></description><link>https://www.thecloudplaybook.com/p/standardized-iam-policy-cross-team-conflict</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/standardized-iam-policy-cross-team-conflict</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Wed, 25 Mar 2026 16:31:07 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!I1CL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb01b23-7035-45d8-94e1-49e0f6bc6e06_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You can also read my newsletters from the Substack mobile app and be notified when a new issue is available.</p><div class="install-substack-app-embed install-substack-app-embed-web" data-component-name="InstallSubstackAppToDOM"><img class="install-substack-app-embed-img" src="https://substackcdn.com/image/fetch/$s_!7MI5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b1ca555-b578-4ae3-8dda-a03cbc6b1d18_500x500.png"><div class="install-substack-app-embed-text"><div class="install-substack-app-header">Get more from Amrut Patil in the Substack app</div><div class="install-substack-app-text">Available for iOS and Android</div></div><a href="https://substack.com/app/app-store-redirect?utm_campaign=app-marketing&amp;utm_content=author-post-insert&amp;utm_source=thecloudplaybook" target="_blank" class="install-substack-app-embed-link"><button class="install-substack-app-embed-btn button primary">Get the app</button></a></div><div><hr></div><p>If three app teams share an S3 bucket in your AWS org, you probably have three different IAM policies and a hidden audit problem.</p><p>Here&#8217;s how we eliminated four months of IAM permission escalations with a single customer-managed policy in a single sprint.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!I1CL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb01b23-7035-45d8-94e1-49e0f6bc6e06_2816x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!I1CL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb01b23-7035-45d8-94e1-49e0f6bc6e06_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!I1CL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb01b23-7035-45d8-94e1-49e0f6bc6e06_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!I1CL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb01b23-7035-45d8-94e1-49e0f6bc6e06_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!I1CL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb01b23-7035-45d8-94e1-49e0f6bc6e06_2816x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!I1CL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb01b23-7035-45d8-94e1-49e0f6bc6e06_2816x1536.png" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ffb01b23-7035-45d8-94e1-49e0f6bc6e06_2816x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5875188,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/190867223?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb01b23-7035-45d8-94e1-49e0f6bc6e06_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!I1CL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb01b23-7035-45d8-94e1-49e0f6bc6e06_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!I1CL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb01b23-7035-45d8-94e1-49e0f6bc6e06_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!I1CL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb01b23-7035-45d8-94e1-49e0f6bc6e06_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!I1CL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb01b23-7035-45d8-94e1-49e0f6bc6e06_2816x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><div><hr></div><h2>THE PERMISSION CONFLICT THAT KEPT ESCALATING</h2><p>Before the change, each team had written its own inline IAM policy for accessing the same shared S3 bucket.</p><p>Team A had scoped their policy by prefix. Team B had scoped by action, then added a wildcard when something broke in a hurry. Team C had copied Team B&#8217;s policy six months earlier, before Team B&#8217;s wildcard was added, and was missing two actions their pipeline now needed.</p><p>Every two to three weeks, one of the three teams would open a ticket. A deployment would fail. An access denied error would appear in CloudTrail. Someone would ping the platform team. The platform engineer on rotation would spend 45 minutes reading three different policy documents to figure out which one was the source of the conflict.</p>
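<p>To make that drift concrete, here is a minimal Python sketch of the comparison the on-call engineer was doing by hand. It is not the fix described in the full post. It collects the S3 actions each team&#8217;s policy allows, flags wildcards, and lists the explicitly named actions that a team&#8217;s policy does not cover. The team names, bucket ARN, and policy contents are hypothetical. The sketch also ignores Deny statements, NotAction, resource scoping, and conditions, so treat its output as a starting point for review, not an authorization decision.</p>

```python
"""Sketch: find drift between IAM policies that grant access to one shared S3 bucket.

Hypothetical example data; only "Allow" statements and their Action lists are compared.
"""
from fnmatch import fnmatchcase


def allowed_actions(policy: dict) -> set:
    """Return lower-cased Action patterns from every Allow statement in a policy."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    actions = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements are ignored in this sketch
        raw = stmt.get("Action", [])
        if isinstance(raw, str):  # Action may be a string or a list of strings
            raw = [raw]
        actions.update(a.lower() for a in raw)  # IAM action names are case-insensitive
    return actions


def is_wildcard(action: str) -> bool:
    return "*" in action or "?" in action


def covers(patterns: set, action: str) -> bool:
    """True if any pattern, wildcards included, matches the concrete action."""
    return any(fnmatchcase(action, p) for p in patterns)


def find_drift(policies: dict) -> dict:
    """Per team: wildcard patterns in use, and explicitly named actions it lacks."""
    per_team = {team: allowed_actions(doc) for team, doc in policies.items()}
    # Union of concrete (non-wildcard) actions that any team has spelled out.
    named = {a for acts in per_team.values() for a in acts if not is_wildcard(a)}
    return {
        team: {
            "wildcards": sorted(a for a in acts if is_wildcard(a)),
            "missing": sorted(a for a in named if not covers(acts, a)),
        }
        for team, acts in per_team.items()
    }


# Hypothetical policies for a shared bucket named "shared-bucket".
SAMPLE_POLICIES = {
    "team-a": {  # scoped by prefix
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::shared-bucket/team-a/*",
        }],
    },
    "team-b": {  # scoped by action, plus a wildcard added under pressure
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:*"],
            "Resource": "arn:aws:s3:::shared-bucket/*",
        }],
    },
    "team-c": {  # an older copy that never picked up newer actions
        "Version": "2012-10-17",
        "Statement": {"Effect": "Allow", "Action": "s3:GetObject",
                      "Resource": "arn:aws:s3:::shared-bucket/*"},
    },
}

if __name__ == "__main__":
    for team, result in find_drift(SAMPLE_POLICIES).items():
        print(team, result)
```

<p>On the sample data, the report shows team-c lacking s3:deleteobject and s3:putobject, team-a lacking s3:deleteobject, and team-b covering everything through s3:*. That illustrates how a wildcard can mask drift in one policy while older copies of it quietly fall behind. Matching uses fnmatchcase on lower-cased names as an approximation of IAM&#8217;s * and ? action wildcards. That approximation is safe here only because IAM action names never contain the brackets fnmatch also treats as special.</p>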
      <p>
          <a href="https://www.thecloudplaybook.com/p/standardized-iam-policy-cross-team-conflict">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[TCP# 110: The question that quietly kills your incident response]]></title><description><![CDATA[Platform teams don&#8217;t fail because of bad tools. They fail because, at 2 am, nobody can answer one question: who owns this service?]]></description><link>https://www.thecloudplaybook.com/p/platform-team-failure-unclear-ownership</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/platform-team-failure-unclear-ownership</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Sun, 22 Mar 2026 14:28:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nDHA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54656ef4-ae59-4c57-8280-34b194997879_1888x2238.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You can also read my newsletters from the Substack mobile app and be notified when a new issue is available.</p><div class="install-substack-app-embed install-substack-app-embed-web" data-component-name="InstallSubstackAppToDOM"><img class="install-substack-app-embed-img" src="https://substackcdn.com/image/fetch/$s_!7MI5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b1ca555-b578-4ae3-8dda-a03cbc6b1d18_500x500.png"><div class="install-substack-app-embed-text"><div class="install-substack-app-header">Get more from Amrut Patil in the Substack app</div><div class="install-substack-app-text">Available for iOS and Android</div></div><a href="https://substack.com/app/app-store-redirect?utm_campaign=app-marketing&amp;utm_content=author-post-insert&amp;utm_source=thecloudplaybook" target="_blank" class="install-substack-app-embed-link"><button class="install-substack-app-embed-btn button primary">Get the app</button></a></div><div><hr></div><p>Platform teams don&#8217;t fail because of bad tools.</p><p>They fail because nobody can answer one 
question: who owns this?</p><p>Not &#8220;who built it.&#8221; Not &#8220;who is on-call for it this week.&#8221; </p><p><strong>Who is accountable for its behavior, its health, and its evolution over time?</strong></p><p>That question sounds simple. In most organizations, it has no clean answer.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nDHA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54656ef4-ae59-4c57-8280-34b194997879_1888x2238.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nDHA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54656ef4-ae59-4c57-8280-34b194997879_1888x2238.png 424w, https://substackcdn.com/image/fetch/$s_!nDHA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54656ef4-ae59-4c57-8280-34b194997879_1888x2238.png 848w, https://substackcdn.com/image/fetch/$s_!nDHA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54656ef4-ae59-4c57-8280-34b194997879_1888x2238.png 1272w, https://substackcdn.com/image/fetch/$s_!nDHA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54656ef4-ae59-4c57-8280-34b194997879_1888x2238.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nDHA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54656ef4-ae59-4c57-8280-34b194997879_1888x2238.png" width="1456" height="1726" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54656ef4-ae59-4c57-8280-34b194997879_1888x2238.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1726,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5474221,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/190864181?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54656ef4-ae59-4c57-8280-34b194997879_1888x2238.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nDHA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54656ef4-ae59-4c57-8280-34b194997879_1888x2238.png 424w, https://substackcdn.com/image/fetch/$s_!nDHA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54656ef4-ae59-4c57-8280-34b194997879_1888x2238.png 848w, https://substackcdn.com/image/fetch/$s_!nDHA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54656ef4-ae59-4c57-8280-34b194997879_1888x2238.png 1272w, https://substackcdn.com/image/fetch/$s_!nDHA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54656ef4-ae59-4c57-8280-34b194997879_1888x2238.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><div><hr></div><h2>THE SIGNAL MOST LEADERS MISS UNTIL IT&#8217;S TOO LATE</h2><p>The pattern shows up the same way every time.</p><p>An incident fires at 2 am. The alert lands in a shared channel. Engineers from three teams join the call. The first 15 minutes are spent not on the fix, but on the question: whose service is this?</p><p>Nobody is lying. Nobody is avoiding the work. The system was just never designed to answer that question clearly.</p><p>This is not a tooling gap. No amount of better observability surfaces an owner. No dashboard tells you who is responsible for making a decision. That is an organizational design problem.</p><p>The cost is not just the 15 minutes of confusion at 2 am. 
</p><p>The cost compounds: a slower mean time to resolution, higher engineer burnout, recurring incidents because no one has a clear mandate to fix the root cause, and audit findings where evidence collection stalls because nobody owns the system that should automatically produce it.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">The Cloud Playbook is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>WHAT UNCLEAR OWNERSHIP ACTUALLY LOOKS LIKE IN PRODUCTION</h2><p>It does not look like chaos. It looks like reasonable ambiguity.</p><p>A service was built by one team, migrated to another, and is now consumed by six. The original team documented it two years ago. The documentation is stale. The consuming teams have added workarounds. No one has updated the service catalog entry because nobody feels like the owner.</p><p>During normal operations, this is invisible. The service runs. Nobody asks questions.</p><p>During an incident, a compliance audit, or a cost-optimization review, ambiguity becomes costly. Three teams each spend four hours gathering evidence for the same control because none of them is sure who should do it. You pay for that coordination overhead every single time.</p><p>The research backs this up. 
Organizations with explicit ownership models, where every service has a named team and that assignment is enforced in the deployment pipeline, resolve incidents measurably faster. The ownership metadata is not just a cultural artifact. It is operational infrastructure.</p><div><hr></div><h2>THE OPERATING PRINCIPLE AT WORK</h2><p>Ownership is not a feeling. It is a system-level declaration that must be maintained like code.</p><p>The fix is not a new tool. It is a new invariant: no service runs in production without a named owner encoded in its configuration, linked to an on-call rotation, and tied to a support tier. That invariant is enforced at deploy time, not suggested in a wiki.</p><p>Here is what that looks like concretely.</p><p>Every resource in your service catalog carries three fields: owning team, support tier, and deprecation status. Deployments that do not carry valid owner labels are rejected at the infrastructure layer, not flagged afterward. The ownership registry is queried automatically at incident time, so the first alert includes the owning team, not just the service name.</p><p>This is not complex to build. It is a webhook, a registry, and a policy. Most teams already have the skills to build it in a few sprints.</p><p>What they lack is the mandate. Ownership enforcement feels bureaucratic until the first incident in which it saves 45 minutes of confusion. After that, engineers stop objecting to it.</p><p>The deeper shift is cultural. When ownership is enforced at the infrastructure layer, it stops being a conversation and starts being a constraint. Constraints are honest. They tell you exactly what the system expects of you. That honesty reduces the cognitive overhead that ambiguous shared ownership creates for everyone.</p><div><hr></div><h2>RUN THIS CHECK</h2><p>What to do this week:</p><p>Pull your current service catalog. Count the services with no owner, or with an owner field pointing to a team that no longer exists. 
If those services make up more than 10% of your catalog, you have a structural problem, not a documentation problem.</p><p>Pick one critical service with ambiguous ownership. Assign it to a named team, add the assignment to the deployment configuration, and run a tabletop incident drill where that team is the first call. Track resolution time.</p><p>Write down the three services that caused the most escalation overhead in the last quarter. In each case, determine whether the escalation was driven by unclear ownership. It almost always is.</p><div><hr></div><p>Teams that enforce ownership at the infrastructure layer cut incident mean time to resolution by removing the ownership-discovery step entirely. That step costs more time than most engineering leaders realize, because it rarely appears in post-incident reviews as a distinct line item.</p><p>Every time I&#8217;ve seen a platform team struggle with recurring incidents in the same service area, the root cause has been the same: the on-call team did not feel accountable because they did not feel like the owner. Ownership clarity is not a nice-to-have. 
It is the precondition for accountability.</p><div><hr></div><p><em><strong>Free tool: Score your AWS platform&#8217;s predictability in 5 minutes</strong><br>If this hit close to home, you probably have other places where ownership and accountability are fuzzy but invisible until something breaks.</em></p><p><em>To make this practical, I put together a free <strong>AWS Platform Predictability Starter Kit</strong> for readers:</em></p><ul><li><p><em>5&#8209;minute predictability checklist</em></p></li><li><p><em>&#8220;Where are we bleeding?&#8221; team scorecard</em></p></li><li><p><em>Platform risk radar with 12 early&#8209;warning signals</em></p></li><li><p><em>10 executive questions with weak vs strong answers + debrief worksheet</em></p></li></ul><p><em>Most leaders run through these in a week of normal meetings and come away with a clear &#8220;top 3&#8221; to fix next.</em></p><p><em>&#128073; Grab the free PDF here: <a href="https://thecloudplaybook.gumroad.com/l/aws-platform-predictability-check">https://thecloudplaybook.gumroad.com/l/aws-platform-predictability-check</a></em></p><p><em>If you run an AWS platform in production, <strong>do this before your next incident review</strong> so you&#8217;re not guessing which part of the platform will bite you next.</em></p><p><em>In the paid Cloud Playbook tier, I share the exact &#8220;Platform vs Team Contract&#8221; template and review checklist I use to draw these boundaries without starting a turf war.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade Here&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/subscribe"><span>Upgrade Here</span></a></p><div><hr></div><h2><strong>Whenever you&#8217;re ready, there are 2 ways I can help you:</strong></h2><ol><li><p><strong>Free 
guides and helpful resources: </strong><a href="https://thecloudplaybook.gumroad.com/">https://thecloudplaybook.gumroad.com/</a></p></li><li><p>Get certified as an <strong>AWS AI Practitioner</strong> in 2026. Sign up today to elevate your cloud skills. (<em><a href="https://www.udemy.com/course/aws-certified-ai-practitioner-practice-exams-aif-c01/">link</a></em>)</p></li></ol><div><hr></div><h2><strong>That&#8217;s it for today!</strong></h2><p>Did you enjoy this newsletter issue?</p><p>Share with your friends, colleagues, and your favorite social media platform.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share The Cloud Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share The Cloud Playbook</span></a></p><p><strong>Until next week &#8212; Amrut</strong></p><div><hr></div><h2><strong>Get in touch</strong></h2><p>You can find me on <a href="https://www.linkedin.com/in/patilamrut/">LinkedIn</a> or <a href="https://twitter.com/realamrutpatil">X</a>.</p><p>If you would like to request a topic to read, please feel free to contact me directly via LinkedIn or X.</p>]]></content:encoded></item><item><title><![CDATA[TCP #109: The trust event that killed your platform adoption]]></title><description><![CDATA[Platforms rarely &#8220;fade.&#8221; One unannounced breaking change quietly trains teams to avoid you. 
This issue shows you how to surface that moment.]]></description><link>https://www.thecloudplaybook.com/p/developer-platform-adoption-trust-internal-developer-platform</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/developer-platform-adoption-trust-internal-developer-platform</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Wed, 18 Mar 2026 16:31:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HpeI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71e13cf-93d5-45b6-8e18-9b973134da23_1888x2238.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You can also read my newsletters from the Substack mobile app and be notified when a new issue is available.</p><div><hr></div><p>When developers stop using your platform, the problem is not adoption. It is trust.</p><p>Adoption is a behavior. Trust is what drives it. 
</p><p>When developers route around your platform, open tickets in the wrong queue, or write their own Terraform instead of using your modules, they are telling you something. </p><p>Most platform teams respond by addressing the behavior. The right response is to diagnose the trust failure underneath it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HpeI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71e13cf-93d5-45b6-8e18-9b973134da23_1888x2238.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HpeI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71e13cf-93d5-45b6-8e18-9b973134da23_1888x2238.png 424w, https://substackcdn.com/image/fetch/$s_!HpeI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71e13cf-93d5-45b6-8e18-9b973134da23_1888x2238.png 848w, https://substackcdn.com/image/fetch/$s_!HpeI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71e13cf-93d5-45b6-8e18-9b973134da23_1888x2238.png 1272w, https://substackcdn.com/image/fetch/$s_!HpeI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71e13cf-93d5-45b6-8e18-9b973134da23_1888x2238.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HpeI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71e13cf-93d5-45b6-8e18-9b973134da23_1888x2238.png" width="1456" height="1726" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d71e13cf-93d5-45b6-8e18-9b973134da23_1888x2238.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1726,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5649274,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/190239745?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71e13cf-93d5-45b6-8e18-9b973134da23_1888x2238.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HpeI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71e13cf-93d5-45b6-8e18-9b973134da23_1888x2238.png 424w, https://substackcdn.com/image/fetch/$s_!HpeI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71e13cf-93d5-45b6-8e18-9b973134da23_1888x2238.png 848w, https://substackcdn.com/image/fetch/$s_!HpeI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71e13cf-93d5-45b6-8e18-9b973134da23_1888x2238.png 1272w, https://substackcdn.com/image/fetch/$s_!HpeI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd71e13cf-93d5-45b6-8e18-9b973134da23_1888x2238.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<div><hr></div><h2>WHAT PLATFORM TEAMS SEE FIRST</h2><p>The symptom surfaces quietly.</p><p>Golden path usage drops. A team submits a request to bypass the CI pipeline for a one-off deployment. Another team forks the shared Terraform module instead of requesting a change. A senior engineer casually mentions that the platform adds a step that wasn&#8217;t there before.</p><p>Nobody files a ticket saying, &#8220;I do not trust this platform.&#8221; They just stop using it.</p><p>The metrics follow a few weeks later. Deployment frequency drops on platform-managed services. </p><p>Support requests thin out, which looks like a good thing until you realize it means teams stopped asking and started working around the platform. Usage of the internal developer portal flatlines.</p><p>The signal is not loud. 
That is what makes it easy to misread.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">The Cloud Playbook is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div></div></div><div><hr></div><h2>THE REFLEXIVE RESPONSE THAT MAKES IT WORSE</h2><p>Most platform teams respond to dropping adoption the same way.</p>
      <p>
          <a href="https://www.thecloudplaybook.com/p/developer-platform-adoption-trust-internal-developer-platform">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[TCP #108: Who owns this service? (If you need Slack to answer, you have a problem)]]></title><description><![CDATA[Turning ownership from a stale spreadsheet into an enforced AWS constraint wired to tags, on-call, and cost.]]></description><link>https://www.thecloudplaybook.com/p/service-ownership-enforcement-platform-engineering</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/service-ownership-enforcement-platform-engineering</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Sun, 15 Mar 2026 14:29:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rMd4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f56a653-7c8b-414c-8648-2f7a3319f8ad_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You can also read my newsletters from the Substack mobile app and be notified when a new issue is available.</p><div><hr></div><p>Average teams document ownership. 
Great teams enforce it.</p><p>Most engineering teams have a spreadsheet, a Confluence page, or a service catalog entry that says who owns what. That document gets created during a planning cycle, reviewed once, and then ignored. </p><p>Nobody updates it when engineers leave, when services get refactored, or when a new team inherits an old codebase.</p><p>The document exists. Ownership does not.</p><p>This gap is not a knowledge problem. <strong>It is a mechanism problem.</strong> </p><p><em>In the paid version of this issue, I include the exact AWS policies, Config rules, and Terraform patterns I use to enforce ownership in regulated environments. This free version explains the pattern so you can decide if it&#8217;s worth wiring into your platform.</em></p><p>Until you treat it as a mechanism problem, your incidents will keep revealing owners who did not know they were owners, your audits will surface gaps nobody saw coming, and your cost anomalies will have no clear path to resolution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rMd4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f56a653-7c8b-414c-8648-2f7a3319f8ad_2816x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rMd4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f56a653-7c8b-414c-8648-2f7a3319f8ad_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!rMd4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f56a653-7c8b-414c-8648-2f7a3319f8ad_2816x1536.png 848w, 
https://substackcdn.com/image/fetch/$s_!rMd4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f56a653-7c8b-414c-8648-2f7a3319f8ad_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!rMd4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f56a653-7c8b-414c-8648-2f7a3319f8ad_2816x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rMd4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f56a653-7c8b-414c-8648-2f7a3319f8ad_2816x1536.png" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f56a653-7c8b-414c-8648-2f7a3319f8ad_2816x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6484041,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/190236661?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f56a653-7c8b-414c-8648-2f7a3319f8ad_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rMd4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f56a653-7c8b-414c-8648-2f7a3319f8ad_2816x1536.png 424w, 
https://substackcdn.com/image/fetch/$s_!rMd4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f56a653-7c8b-414c-8648-2f7a3319f8ad_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!rMd4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f56a653-7c8b-414c-8648-2f7a3319f8ad_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!rMd4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f56a653-7c8b-414c-8648-2f7a3319f8ad_2816x1536.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<div><hr></div><h2>THE CATALOG THAT DRIFTS WHILE YOUR ORG MOVES</h2><p>Most teams treat ownership as a one-time labeling exercise.</p><p>They build a service catalog. They assign owners. They add a field in Backstage or a column in a spreadsheet. A few quarters later, engineers rotate, services split, and the catalog drifts from reality.</p><p>The result is a document that looks complete but carries no accountability.</p><p>When an incident fires at 2 a.m., the on-call engineer finds an owner who left eight months ago and starts pinging Slack channels. </p><p>Resolution time climbs. Post-mortems list &#8220;unclear ownership&#8221; as a contributing factor. Nobody changes the underlying system.</p><p>The same pattern appears in compliance reviews. The team points to documentation. The documentation points to a person who no longer holds that role. The audit finding comes from the gap: the document said one thing, and the organization did another.</p><p>Documentation without enforcement is not governance. It is the appearance of governance.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you&#8217;re on the free tier, you&#8217;ll get the concepts. 
Paid subscribers get the concrete templates and walkthroughs to ship this inside your AWS org.</p></div></div></div><div><hr></div><h2>OWNERSHIP WIRED TO FUNCTION, NOT IDENTITY</h2><p>High-performing teams make ownership structural, not declarative.</p><p>They do not ask teams to record who owns a service. They build systems that make it impossible to provision a resource without declaring ownership. </p><p>They tie that declaration to the on-call rotation, cost attribution, and access controls. When ownership changes, the system reflects it within one sprint, or the pipeline breaks.</p><p>The mechanism starts at provisioning. In AWS Organizations, Service Control Policies (SCPs) can deny resource creation when mandatory ownership tags are missing from the request, and Tag Policies standardize which tag keys and values teams may use. </p><p>For paid subscribers, I break down the specific tag keys, example SCPs, and the safe rollout sequence I&#8217;ve used so you don&#8217;t brick existing pipelines.</p><p>The tags include service name, team alias, on-call contact, environment, and cost center. A resource without those tags cannot be deployed. The enforcement is not a reminder. It is a gate.</p><p>The owner field points to a team alias or rotation, not a person&#8217;s name. When an engineer leaves, the alias stays active. The team updates the rotation. The on-call system stays intact.</p><p>Compliance reporting then runs against the live state, not the catalog. AWS Config rules continuously evaluate tagging compliance. </p><p>Drift gets surfaced to platform dashboards weekly. Teams see their own compliance score. Remediation is self-service.</p><p>This approach shifts ownership from an administrative task to an engineering constraint. 
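</p><p>The per-team compliance score can be sketched in a few lines. This is a minimal illustration, not an AWS API: it assumes resources arrive as records with an <code>arn</code> and a <code>tags</code> mapping (for example, flattened from an AWS Config or Tag Editor export), and the keys in <code>REQUIRED_TAGS</code> are hypothetical stand-ins for the service name, team alias, on-call contact, environment, and cost center tags.</p><pre><code>
```python
"""Per-team tagging compliance score, in the spirit of a portal scorecard.

Assumptions: each resource is a dict with an "arn" and a "tags" mapping of string
values; REQUIRED_TAGS uses hypothetical key names, not an AWS-defined schema.
"""
from collections import defaultdict

# Hypothetical tag keys standing in for service, team alias, on-call, env, cost center.
REQUIRED_TAGS = ("service", "team", "oncall", "environment", "cost-center")


def missing_tags(tags, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or blank on one resource."""
    return [key for key in required if not (tags.get(key) or "").strip()]


def compliance_by_team(resources, required=REQUIRED_TAGS):
    """Map each team to the fraction of its resources that carry every required tag.

    Resources without a team tag are grouped under "(unowned)" so they stay in
    the report instead of silently dropping out of it.
    """
    totals = defaultdict(int)
    compliant = defaultdict(int)
    for resource in resources:
        tags = resource.get("tags", {})
        team = (tags.get("team") or "").strip() or "(unowned)"
        totals[team] += 1
        if not missing_tags(tags, required):
            compliant[team] += 1
    return {team: compliant[team] / totals[team] for team in totals}


if __name__ == "__main__":
    # Hypothetical inventory; bucket names and tag values are illustrative only.
    inventory = [
        {"arn": "arn:aws:s3:::invoices", "tags": {
            "service": "billing", "team": "payments", "oncall": "payments-primary",
            "environment": "prod", "cost-center": "cc-100"}},
        {"arn": "arn:aws:s3:::invoice-exports", "tags": {
            "service": "billing", "team": "payments", "environment": "prod"}},
        {"arn": "arn:aws:s3:::old-reports", "tags": {}},
    ]
    for team, score in sorted(compliance_by_team(inventory).items()):
        print(f"{team}: {score:.0%}")
```
</code></pre><p>Grouping resources that lack a team tag under <code>(unowned)</code> keeps them visible on the dashboard instead of silently dropping them from every team&#8217;s score.</p><p>What matters is that a check like this runs continuously, not on request. 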
It does not rely on discipline. It relies on design.</p><div><hr></div><h2>FASTER INCIDENTS. CLEANER AUDITS. NO ARCHAEOLOGY</h2><p>The operational gap between these two approaches is not theoretical.</p><p>Teams that enforce ownership at provisioning resolve incidents faster. The on-call contact is visible in the resource tag, surfaced by the monitoring tool, and reachable in under two minutes. There is no Slack archaeology.</p><p>Ownership is queryable. It is attached to the resource, not stored in someone&#8217;s memory. When auditors ask about access controls or cost attribution, the answer is a tag report.</p><p>Cost anomalies get routed to the right team without a platform team investigation.</p><p>A platform team that enforces ownership is running governance as a system. A platform team that documents ownership is running governance as a hope. The first team gets the budget. The second team gets audit findings.</p><div><hr></div><h2>VISIBILITY FIRST. ENFORCEMENT SECOND. IN THAT ORDER.</h2><p>The path from documentation to enforcement does not require rebuilding your platform. It requires picking the right starting point.</p><p>Pull an AWS Config report or use Resource Groups Tag Editor to identify every resource missing an owner tag. Show that report to your engineering leads. Not as a compliance finding. As a shared problem.</p><p>Then apply SCP-based tag enforcement to new accounts first. Existing accounts get a remediation window of 30 to 60 days. The Terraform modules in your golden path should include required tags by default, so new services deploy compliant from day one.</p><p>Wire ownership to function next. Link your on-call rotation to a team alias that appears in the service&#8217;s owner tag. When PagerDuty fires, the routing is automatic.</p><p>Surface each team&#8217;s ownership compliance score in your developer portal. Teams respond to scorecards they can see. 
They do not respond to spreadsheets they cannot find.</p><p>This shift takes one to two quarters. It is a governance layer applied to the infrastructure you already own.</p><div><hr></div><h2>RUN THIS CHECK THIS WEEK</h2><p>Run an AWS Resource Groups Tag Editor search in each account, across all Regions. Filter for resources missing an &#8220;owner&#8221; or &#8220;team&#8221; tag.</p><p>If more than 20% of resources lack an owner declaration, you have a gap that will surface in your next audit or incident.</p><p>Pick one account. Apply SCP-based tag enforcement only to new resources. Update your Terraform module defaults to include owner, team, environment, and cost-center tags. Ship that in the next sprint.</p><p>If your team does not have a standard Terraform module or a defined set of required tags, that is the starting point.</p><div><hr></div><p><em>If this resonated, you&#8217;re probably already picturing where your own ownership model would crack during an incident or audit.</em></p><p><em>You can close that gap in two ways:</em></p><ul><li><p><em>Assemble the mechanisms yourself from this essay, AWS docs, and a few painful incidents, or</em></p></li><li><p><em>Start from a baseline that has already survived real FedRAMP / HIPAA / ISO environments.</em></p></li></ul><p><em>The paid version of The Cloud Playbook takes essays like this and turns them into implementation kits.</em></p><p><em>For this issue, paid subscribers get:</em></p><ul><li><p><em>A 90&#8209;day rollout plan for enforcing ownership at provisioning</em></p></li><li><p><em>Example AWS Tag Policies and SCPs to block untagged resources</em></p></li><li><p><em>AWS Config rules to surface ownership drift weekly</em></p></li><li><p><em>An ownership scorecard spec you can drop into your portal</em></p></li></ul><p><em>If enforcing ownership like this doesn&#8217;t save you more than the subscription in one incident or one audit cycle, you should cancel.</em></p><p><em>If you want that kit, upgrade to the 
paid newsletter here</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade Here&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/subscribe"><span>Upgrade Here</span></a></p><div><hr></div><h2><strong>Whenever you&#8217;re ready</strong></h2><p>There are two ways I can help you further:</p><ol><li><p><strong>Get the AWS Platform Predictability Starter Kit (Free)</strong><br>Four short tools to baseline where your platform is strong vs where it&#8217;s bleeding:</p><ul><li><p>5&#8209;minute predictability checklist</p></li><li><p>&#8220;Where are we bleeding?&#8221; team scorecard</p></li><li><p>Platform risk radar with 12 early&#8209;warning signals</p></li><li><p>10 executive questions with weak vs strong answers + debrief worksheet</p></li></ul><p>Most leaders run through these in a week of normal meetings and come away with a clear &#8220;top 3&#8221; to fix.</p><p>&#8594; <strong>Grab the Starter Kit</strong>: <a href="https://thecloudplaybook.gumroad.com/l/aws-platform-predictability-starter-kit">https://thecloudplaybook.gumroad.com/l/aws-platform-predictability-starter-kit</a></p></li><li><p><strong>Keep getting essays like this every week</strong><br>Stay on the free list, apply one check per week, and share this with your platform peers so you&#8217;re solving the same problems with the same language.</p></li></ol><div><hr></div><h2><strong>Get in touch</strong></h2><p>You can find me on <a href="https://www.linkedin.com/in/patilamrut/">LinkedIn</a> or <a href="https://twitter.com/realamrutpatil">X</a>.</p><p>If you would like to request a topic to read, please feel free to contact me directly via LinkedIn or X.</p>]]></content:encoded></item><item><title><![CDATA[TCP #107: Database Architecture for 
Multi-Tenant Platforms: The Tradeoffs Nobody Explains Well]]></title><description><![CDATA[What I would build differently and the one rule I enforce at tenant onboarding now.]]></description><link>https://www.thecloudplaybook.com/p/multi-tenant-database-architecture-tradeoffs-pooled-siloed-schema</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/multi-tenant-database-architecture-tradeoffs-pooled-siloed-schema</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Wed, 11 Mar 2026 12:03:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HEg0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bbc967b-3269-4151-8336-42fdfe60648d_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You can also read my newsletters from the Substack mobile app and be notified when a new issue is available.</p><div><hr></div><p>When we onboarded our ninth external tenant, we ran into a wall.</p><p>A compliance audit required per-tenant evidence of data isolation. 
Our pooled RDS instance, with row-level security as the only enforcement layer, could not produce that evidence cleanly.</p><p>We spent six weeks generating audit documentation that a siloed architecture would have produced automatically.</p><p>The database architecture decision I made at tenant two was still costing us at tenant nine.</p><p>This is how I evaluate it now.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HEg0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bbc967b-3269-4151-8336-42fdfe60648d_2816x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HEg0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bbc967b-3269-4151-8336-42fdfe60648d_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!HEg0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bbc967b-3269-4151-8336-42fdfe60648d_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!HEg0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bbc967b-3269-4151-8336-42fdfe60648d_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!HEg0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bbc967b-3269-4151-8336-42fdfe60648d_2816x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HEg0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bbc967b-3269-4151-8336-42fdfe60648d_2816x1536.png" width="1456" height="794" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7bbc967b-3269-4151-8336-42fdfe60648d_2816x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7424181,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/188976623?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bbc967b-3269-4151-8336-42fdfe60648d_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HEg0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bbc967b-3269-4151-8336-42fdfe60648d_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!HEg0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bbc967b-3269-4151-8336-42fdfe60648d_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!HEg0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bbc967b-3269-4151-8336-42fdfe60648d_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!HEg0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bbc967b-3269-4151-8336-42fdfe60648d_2816x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<div><hr></div><h3>THREE DATABASE ISOLATION MODELS. ONE CHOICE. NO EASY UNDO.</h3><p>Every multi-tenant platform on AWS eventually faces the same fork: how do you store tenant data?</p><p>Three models dominate the decision.</p>
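<p>To make the fork concrete, here is a minimal Python sketch, assuming the three models are the pooled, schema-per-tenant, and siloed variants named in this post&#8217;s URL. The endpoint names, the <code>tenant_{id}</code> schema convention, and the <code>locate_tenant_data</code> helper are hypothetical, for illustration only. What it shows is where the isolation boundary sits in each model: in the pooled model it is a filter that every query must carry (the row-level security setup described above), while in the siloed model it is the infrastructure itself, which is why a siloed design can produce per-tenant isolation evidence more or less automatically.</p>
<pre><code>
```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class IsolationModel(Enum):
    POOLED = "pooled"  # all tenants share tables; each row carries a tenant_id
    SCHEMA = "schema"  # one shared instance, one schema per tenant
    SILOED = "siloed"  # one dedicated database per tenant


@dataclass(frozen=True)
class DataLocation:
    host: str
    schema: str
    # (column, value) that every query must be scoped by, or None when the
    # boundary is enforced by the schema or the instance instead.
    row_filter: Optional[Tuple[str, str]]


# Hypothetical shared endpoint, for illustration only.
SHARED_HOST = "shared-db.example.internal"


def locate_tenant_data(tenant_id: str, model: IsolationModel) -> DataLocation:
    """Return where a tenant's data lives and what keeps it apart from other tenants."""
    if model is IsolationModel.POOLED:
        # Isolation holds only if every query (or an RLS policy) applies this filter.
        return DataLocation(SHARED_HOST, "public", ("tenant_id", tenant_id))
    if model is IsolationModel.SCHEMA:
        # Isolation is a namespace boundary inside a shared instance.
        return DataLocation(SHARED_HOST, f"tenant_{tenant_id}", None)
    # SILOED: isolation is a property of the infrastructure itself
    # (hypothetical per-tenant hostname pattern).
    return DataLocation(f"{tenant_id}-db.example.internal", "public", None)


if __name__ == "__main__":
    for model in IsolationModel:
        print(model.value, locate_tenant_data("acme", model))
```
</code></pre>
<p>In the pooled case, one query that forgets the filter is a cross-tenant read; in the siloed case there is no shared table to read across. The schema variant sits in between: the namespace boundary is real, but the instance and its capacity are still shared.</p>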
      <p>
          <a href="https://www.thecloudplaybook.com/p/multi-tenant-database-architecture-tradeoffs-pooled-siloed-schema">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[TCP #106: Developers want autonomy. Platform wants consistency. Both are wrong.]]></title><description><![CDATA[Not because either position is bad. Because neither one is a strategy.]]></description><link>https://www.thecloudplaybook.com/p/developer-autonomy-vs-platform-consistency-where-both-go-wrong</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/developer-autonomy-vs-platform-consistency-where-both-go-wrong</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Sun, 08 Mar 2026 12:02:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Qja7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dce921a-530e-44fe-8d99-529235884356_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This tension is as old as platform engineering itself.</p><p>Developers want to move fast. 
They want to choose the tools they know. They want to deploy without filing a ticket or waiting for a pipeline they did not build.</p><p>Platform teams want predictability. They want one observability stack, one deployment model, and one set of guardrails that applies across every team.</p><p>Both are right.</p><p>Both, taken to their conclusion, produce organizations that cannot scale.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Qja7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dce921a-530e-44fe-8d99-529235884356_2816x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qja7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dce921a-530e-44fe-8d99-529235884356_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!Qja7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dce921a-530e-44fe-8d99-529235884356_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!Qja7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dce921a-530e-44fe-8d99-529235884356_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!Qja7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dce921a-530e-44fe-8d99-529235884356_2816x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qja7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dce921a-530e-44fe-8d99-529235884356_2816x1536.png" width="1456" 
height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2dce921a-530e-44fe-8d99-529235884356_2816x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7038885,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/188972957?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dce921a-530e-44fe-8d99-529235884356_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Qja7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dce921a-530e-44fe-8d99-529235884356_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!Qja7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dce921a-530e-44fe-8d99-529235884356_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!Qja7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dce921a-530e-44fe-8d99-529235884356_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!Qja7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dce921a-530e-44fe-8d99-529235884356_2816x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<div><hr></div><h3>AUTONOMY IS RIGHT UNTIL IT ISN&#8217;T</h3><p>The case for developer autonomy in platform engineering is real.</p><p>Engineers closest to the problem understand the constraints better than anyone else.</p><p>A team building a real-time data pipeline knows what latency profile they need. A team managing a compliance-critical workflow knows where the edge cases are.</p><p>Giving them the tools and the freedom to solve their problem without platform overhead produces faster decisions and better systems for that specific problem.</p><p>Autonomy also drives platform adoption.</p><p>Teams that feel constrained by a platform find workarounds. They build shadow infrastructure. They use unapproved tooling. 
They create exactly the inconsistency the platform was designed to prevent, except now it is invisible to the platform team.</p><p>Developer autonomy, when it works, is not chaos. It is trust.</p><p>It signals that the platform team respects engineering judgment and is not building for control.</p><p>The failure mode is not autonomy itself. It is autonomy without any shared foundation.</p><p>Twelve teams. Eight deployment mechanisms. Six observability stacks. No consistent tagging. No shared on-call model. No golden path anyone actually walks.</p><p>At that point, developer autonomy has produced a system nobody can operate at scale.</p><p>Incidents cross service boundaries that no single engineer understands. Compliance audits require twelve different evidence formats. New engineers spend months learning the local conventions of each team before they can contribute.</p>
<div><hr></div><h3>CONSISTENCY IS RIGHT UNTIL IT ISN&#8217;T</h3><p>The case for platform consistency is equally real.</p><p>When every team deploys through the same pipeline, tags resources consistently, and emits metrics in the same format, the platform team can support them all.</p><p>Incidents become diagnosable because the signal looks the same across every service.</p><p>Compliance audits become repeatable because the evidence structure does not change between tenants.</p><p>Cost attribution becomes automatic because the tagging model is enforced rather than aspirational.</p><p>Consistency is what lets a platform team of seven support seventy-five developers.</p><p>Without it, every team becomes its own operational burden.</p><p>The failure mode is not consistency itself. It is consistency applied to the wrong layer.</p><p>When a platform mandates a specific logging library, test framework, or database client, it has crossed from enforcing operational standards into controlling engineering decisions that belong to the team.</p><p>That is where platform adoption collapses.</p><p>Engineers do not fight the pipeline. They route around it. 
They build locally and push to production via the path with the least friction, which is now outside the platform&#8217;s visibility.</p><p>The platform team has achieved consistency on paper and lost it in practice.</p><div><hr></div><h3>WHERE BOTH GO WRONG AT THE SAME TIME</h3><p>The trap is not picking the wrong side.</p><p>The trap is treating this as a values conflict between platform control and developer freedom, then oscillating between them based on whoever complained most recently.</p><p>Platform teams that get burned by inconsistency clamp down. They add mandatory steps. They restrict tool choices. Developers push back. The platform team softens the requirements. Inconsistency returns.</p><p>The cycle repeats.</p><p>Neither position is wrong. Both are responding to real failure modes.</p><p>The problem is that swinging between them is not a strategy. It is a symptom of not having a clear model for where consistency is non-negotiable and where developer autonomy is not just acceptable but preferable.</p><div><hr></div><h3>HOW TO HOLD THE TENSION</h3><p>The resolution is not a compromise. It is a boundary.</p><p>Define what the platform owns and what the team owns. State it explicitly. Enforce the platform&#8217;s layer. Leave the team&#8217;s layer genuinely open.</p><p>The platform owns: tagging, account structure, network topology, deployment gates, security baselines, compliance controls, and observability standards.</p><p>These are non-negotiable because variation here creates systemic risk. One team&#8217;s non-standard deployment gate becomes a compliance gap that blocks the entire organization&#8217;s certification.</p><p>The team owns: language, framework, internal libraries, database choice within approved types, caching strategy, and service architecture.</p><p>These decisions belong to the engineers closest to the problem. The platform&#8217;s job is not to make these decisions for them. 
It is to make sure those decisions do not create operational or compliance risk at the system level.</p><p>The golden path in platform engineering is not a mandate. It is an offer.</p><p>Here is the fastest way to build and ship something that is secure, compliant, and observable. Use it if it fits.</p><p>If it does not fit, tell the platform team why, and we will decide together whether the standard needs to change or the exception needs guardrails.</p><p>That conversation is what separates a platform developers trust from one they tolerate.</p><p>When developers can see exactly where the boundary is, and it is set at the right layer, the autonomy vs. consistency tension stops being a conflict.</p><p>It becomes a design.</p><p>The platform&#8217;s job is not to eliminate developer judgment. It is to make sure that judgment operates within boundaries that the whole organization can rely on.</p><div><hr></div><p>This week, write a one&#8209;page &#8220;platform vs team&#8221; RACI:</p><ul><li><p>Platform owns: tagging, accounts, network, deploy gates, security, observability</p></li><li><p>Team owns: language, framework, DB within approved list, caching, service architecture</p></li></ul><p>Circle which ones are fuzzy today. Those fuzzy boundaries are where your incidents and platform fights are coming from.</p><div><hr></div><p><em><strong>Free tool: Score your AWS platform&#8217;s predictability in 5 minutes</strong><br>I just shipped a new free tool for you: a 6&#8209;page, 18&#8209;question checklist to score how predictable your AWS platform really is across deployments, incidents, onboarding, cost, compliance, and throughput.</em></p><p><em>It takes 5 minutes and tells you if you&#8217;re in Reactive, Stabilizing, or Predictable territory, plus what to fix first.</em></p><p><em>&#128073; Grab the free PDF here: <a href="https://thecloudplaybook.gumroad.com/l/aws-platform-predictability-check">https://thecloudplaybook.gumroad.com/l/aws-platform-predictability-check</a></em></p><p><em>If you run an AWS platform in production, <strong>do this before your next incident review</strong> so you&#8217;re not guessing which part of the platform will bite you next.</em></p><p><em>In the paid Cloud Playbook tier, I share the exact &#8220;Platform vs Team Contract&#8221; template and review checklist I use to draw these boundaries without starting a turf war.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/subscribe&quot;,&quot;text&quot;:&quot;Upgrade Here&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/subscribe"><span>Upgrade Here</span></a></p><div><hr></div><h2><strong>Whenever you&#8217;re ready, there are 2 ways I can help you:</strong></h2><ol><li><p><strong>Free guides and helpful resources: </strong><a href="https://thecloudplaybook.gumroad.com/">https://thecloudplaybook.gumroad.com/</a></p></li><li><p>Get certified as an <strong>AWS AI Practitioner</strong> in 2026. Sign up today to elevate your cloud skills. 
(<em><a href="https://www.udemy.com/course/aws-certified-ai-practitioner-practice-exams-aif-c01/">link</a></em>)</p></li></ol><div><hr></div><h2><strong>That&#8217;s it for today!</strong></h2><p>Did you enjoy this newsletter issue?</p><p>Share with your friends, colleagues, and your favorite social media platform.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share The Cloud Playbook&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.thecloudplaybook.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share The Cloud Playbook</span></a></p><p><strong>Until next week &#8212; Amrut</strong></p><div><hr></div><h2><strong>Get in touch</strong></h2><p>You can find me on <a href="https://www.linkedin.com/in/patilamrut/">LinkedIn</a> or <a href="https://twitter.com/realamrutpatil">X</a>.</p><p>If you would like to request a topic to read, please feel free to contact me directly via LinkedIn or X.</p>]]></content:encoded></item><item><title><![CDATA[TCP# 105: The Multi-Tenant Architecture I'd Never Build Again]]></title><description><![CDATA[Nine tenants. Eleven services. One pooled model. 
This is what we got wrong.]]></description><link>https://www.thecloudplaybook.com/p/multi-tenant-architecture-mistakes-lessons-aws-platform-engineering</link><guid isPermaLink="false">https://www.thecloudplaybook.com/p/multi-tenant-architecture-mistakes-lessons-aws-platform-engineering</guid><dc:creator><![CDATA[Amrut Patil]]></dc:creator><pubDate>Wed, 04 Mar 2026 13:04:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!EL2x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe7feed-4165-4fa6-8fd6-f7741eae3533_3168x1344.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We built a shared-everything multi-tenant platform on AWS.</p><p>One database per service. Tenant data separated by row-level filters. One deployment pipeline. One observability stack. 
One set of IAM roles scoped to the service, not the tenant.</p><p>It looked clean on a whiteboard.</p><p>It did not survive contact with production.</p><p>This is what we got wrong, what it cost us, and what I would build instead.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EL2x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe7feed-4165-4fa6-8fd6-f7741eae3533_3168x1344.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EL2x!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe7feed-4165-4fa6-8fd6-f7741eae3533_3168x1344.png 424w, https://substackcdn.com/image/fetch/$s_!EL2x!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe7feed-4165-4fa6-8fd6-f7741eae3533_3168x1344.png 848w, https://substackcdn.com/image/fetch/$s_!EL2x!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe7feed-4165-4fa6-8fd6-f7741eae3533_3168x1344.png 1272w, https://substackcdn.com/image/fetch/$s_!EL2x!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe7feed-4165-4fa6-8fd6-f7741eae3533_3168x1344.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EL2x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe7feed-4165-4fa6-8fd6-f7741eae3533_3168x1344.png" width="1456" height="618" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6fe7feed-4165-4fa6-8fd6-f7741eae3533_3168x1344.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:618,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7827201,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thecloudplaybook.com/i/188755684?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe7feed-4165-4fa6-8fd6-f7741eae3533_3168x1344.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EL2x!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe7feed-4165-4fa6-8fd6-f7741eae3533_3168x1344.png 424w, https://substackcdn.com/image/fetch/$s_!EL2x!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe7feed-4165-4fa6-8fd6-f7741eae3533_3168x1344.png 848w, https://substackcdn.com/image/fetch/$s_!EL2x!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe7feed-4165-4fa6-8fd6-f7741eae3533_3168x1344.png 1272w, https://substackcdn.com/image/fetch/$s_!EL2x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe7feed-4165-4fa6-8fd6-f7741eae3533_3168x1344.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<div><hr></div><h3>THE INCIDENT THAT EXPOSED EVERYTHING</h3><p>Eighteen months after launch, a single tenant&#8217;s batch job consumed enough database connection pool capacity to degrade response times for every other tenant on the platform.</p><p>No data breach. No data loss.</p><p>Just one tenant&#8217;s workload bleeding into every other tenant&#8217;s experience.</p><p>Leadership called it a performance issue.</p>
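<p>The mechanism behind this kind of incident is easy to reproduce in miniature. The Python sketch below is a toy model, not the platform&#8217;s actual code: it assumes a pool of 10 connections shared by every tenant, a batch tenant that requests far more connections than the pool holds, and then checks whether an interactive tenant can still get one. The pool size, tenant names, <code>SharedConnectionPool</code> class, and per-tenant cap are all hypothetical. The cap is shown as one common mitigation, not necessarily what was built afterwards.</p>
<pre><code>
```python
from collections import Counter
from typing import Optional


class SharedConnectionPool:
    """Toy model of a database connection pool shared by all tenants.

    Connections are only counted, never opened; acquire() is non-blocking
    and returns False when no connection is available to the caller.
    """

    def __init__(self, size: int, per_tenant_cap: Optional[int] = None) -> None:
        self.size = size
        self.per_tenant_cap = per_tenant_cap  # None means no per-tenant limit
        self.in_use = Counter()  # tenant_id -> connections currently held

    def acquire(self, tenant_id: str) -> bool:
        if sum(self.in_use.values()) >= self.size:
            return False  # pool exhausted for every tenant
        if self.per_tenant_cap is not None and self.in_use[tenant_id] >= self.per_tenant_cap:
            return False  # this tenant hit its quota; other tenants are unaffected
        self.in_use[tenant_id] += 1
        return True

    def release(self, tenant_id: str) -> None:
        if self.in_use[tenant_id] > 0:
            self.in_use[tenant_id] -= 1


def simulate(per_tenant_cap: Optional[int]) -> bool:
    """Batch tenant grabs connections first; can an interactive tenant still get one?"""
    pool = SharedConnectionPool(size=10, per_tenant_cap=per_tenant_cap)
    for _ in range(50):  # the batch job asks for far more than the pool holds
        pool.acquire("batch-tenant")
    return pool.acquire("interactive-tenant")


print(simulate(per_tenant_cap=None))  # False: one tenant holds every connection
print(simulate(per_tenant_cap=4))     # True: the cap leaves headroom for others
```
</code></pre>
<p>Without a cap, the batch job holds every connection and the interactive tenant is refused, even though nothing is broken and no data crosses tenants. With the cap, the batch job is throttled at its own quota while other tenants keep working.</p>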
      <p>
          <a href="https://www.thecloudplaybook.com/p/multi-tenant-architecture-mistakes-lessons-aws-platform-engineering">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>