<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.2">Jekyll</generator><link href="https://blog.victorsilva.com.uy/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.victorsilva.com.uy/" rel="alternate" type="text/html" /><updated>2026-04-16T11:51:12+00:00</updated><id>https://blog.victorsilva.com.uy/feed.xml</id><title type="html">$blogTopics | % {echo $_}</title><subtitle>Technical blog about development, ai, security, and cloud engineering by Victor Silva.</subtitle><author><name>Victor Silva</name><email>info@victorsilva.com.uy</email></author><entry><title type="html">Kubernetes Audit Logging: Policy, Fluent Bit, and Alerting</title><link href="https://blog.victorsilva.com.uy/kubernetes-audit-logging/" rel="alternate" type="text/html" title="Kubernetes Audit Logging: Policy, Fluent Bit, and Alerting" /><published>2026-04-13T07:37:23+00:00</published><updated>2026-04-13T07:37:23+00:00</updated><id>https://blog.victorsilva.com.uy/kubernetes-audit-logging</id><content type="html" xml:base="https://blog.victorsilva.com.uy/kubernetes-audit-logging/"><![CDATA[<p>An incident happens. A secret is read, a ClusterRoleBinding is modified, someone runs <code class="language-plaintext highlighter-rouge">kubectl exec</code> into a production pod. You start the post-mortem and reach for the audit trail — and it is either missing, incomplete, or buried under so much noise that the relevant events are invisible. That is the exact situation audit logging is supposed to prevent, and it is surprisingly common because most teams configure it as an afterthought.</p>

<p>Kubernetes audit logging is built into <code class="language-plaintext highlighter-rouge">kube-apiserver</code> and gives you a structured JSON record of every API call made against your cluster: who did what, to which resource, when, and what the server returned. Done right, it is the forensic backbone of your cluster security posture. Done wrong, it either floods your log storage with garbage or silently drops the events you actually care about.</p>

<p>This post covers the full picture: how the audit pipeline works, how to write a production policy that suppresses noise first and captures high-value events at maximum fidelity, and how to ship those logs with Fluent Bit to Elasticsearch or Loki so your SIEM can alert on them.</p>

<h2 id="how-kubernetes-audit-logging-works">How Kubernetes Audit Logging Works</h2>

<p>Every request to the Kubernetes API server moves through a defined lifecycle. The audit subsystem can emit an event at each of the following stages, and your policy decides which of them are recorded:</p>

<ul>
  <li><strong>RequestReceived</strong> — emitted the moment <code class="language-plaintext highlighter-rouge">kube-apiserver</code> receives the request, before any authorization or processing</li>
  <li><strong>ResponseStarted</strong> — emitted when the response headers are sent but before the body is streamed; generated only for long-running requests such as watch</li>
  <li><strong>ResponseComplete</strong> — emitted when the full response is sent; this is the stage with the most useful context</li>
  <li><strong>Panic</strong> — emitted when <code class="language-plaintext highlighter-rouge">kube-apiserver</code> encounters an internal error handling the request</li>
</ul>

<p>The audit subsystem supports two backends simultaneously: a <strong>log file backend</strong> (<code class="language-plaintext highlighter-rouge">--audit-log-path</code>) that writes newline-delimited JSON to a file on the control-plane node, and a <strong>webhook backend</strong> (<code class="language-plaintext highlighter-rouge">--audit-webhook-config-file</code>) that POSTs events to an external HTTP endpoint. Most production setups use the file backend as the primary and ship from there.</p>
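
<p>The webhook configuration file uses the kubeconfig format. A minimal sketch, assuming a hypothetical collector at <code class="language-plaintext highlighter-rouge">audit-collector.example.internal</code> and hypothetical certificate paths; adjust both to your environment:</p>

<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: Config
clusters:
  - name: audit-webhook
    cluster:
      # Hypothetical endpoint that receives the POSTed audit events
      server: https://audit-collector.example.internal/k8s-audit
      certificate-authority: /etc/kubernetes/audit/webhook-ca.crt
users:
  - name: kube-apiserver
    user:
      # Hypothetical client certificate used to authenticate to the collector
      client-certificate: /etc/kubernetes/audit/webhook-client.crt
      client-key: /etc/kubernetes/audit/webhook-client.key
contexts:
  - name: audit-webhook
    context:
      cluster: audit-webhook
      user: kube-apiserver
current-context: audit-webhook</code></pre></figure>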

<p>The policy file (<code class="language-plaintext highlighter-rouge">--audit-policy-file</code>) controls what gets recorded and at what verbosity. Without a policy file, nothing is logged. The policy is evaluated top-to-bottom and the first matching rule wins, which is why rule order matters enormously.</p>
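
<p>To make the first-match behavior concrete, here is a minimal sketch of a policy fragment (illustrative only, not the policy used later in this post) in which rule order changes the outcome. The broad rule is listed first, so the more specific rule below it never matches:</p>

<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Matches every resource in the core API group, including secrets.
  - level: Metadata
    resources:
      - group: ""
  # Never reached for secrets: the rule above has already matched them.
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["secrets"]</code></pre></figure>

<p>Swapping the two rules would record secrets at <code class="language-plaintext highlighter-rouge">RequestResponse</code> and everything else in the core group at <code class="language-plaintext highlighter-rouge">Metadata</code>.</p>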

<p>The data flow looks like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> kubectl / CI pipeline / controller
           |
           v
    kube-apiserver
           |
     Audit pipeline
           |
    Policy evaluation
    (first match wins)
           |
      +----+----+
      |         |
  File backend  Webhook backend
  (audit.log)   (external endpoint)
      |
  Fluent Bit (DaemonSet on control-plane)
      |
  +---+---+
  |       |
  ES     Loki
</code></pre></div></div>

<p>Each audit event is a JSON object. The fields you will query most in a security context are:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">verb</code> — the HTTP verb mapped to a Kubernetes action: <code class="language-plaintext highlighter-rouge">get</code>, <code class="language-plaintext highlighter-rouge">list</code>, <code class="language-plaintext highlighter-rouge">watch</code>, <code class="language-plaintext highlighter-rouge">create</code>, <code class="language-plaintext highlighter-rouge">update</code>, <code class="language-plaintext highlighter-rouge">patch</code>, <code class="language-plaintext highlighter-rouge">delete</code></li>
  <li><code class="language-plaintext highlighter-rouge">user.username</code> — the authenticated identity; for service accounts this is <code class="language-plaintext highlighter-rouge">system:serviceaccount:&lt;namespace&gt;:&lt;name&gt;</code></li>
  <li><code class="language-plaintext highlighter-rouge">objectRef.resource</code> — the resource type being acted on: <code class="language-plaintext highlighter-rouge">secrets</code>, <code class="language-plaintext highlighter-rouge">pods</code>, <code class="language-plaintext highlighter-rouge">clusterrolebindings</code>, etc.</li>
  <li><code class="language-plaintext highlighter-rouge">objectRef.name</code> — the specific object name</li>
  <li><code class="language-plaintext highlighter-rouge">sourceIPs</code> — the originating IP addresses</li>
  <li><code class="language-plaintext highlighter-rouge">responseStatus.code</code> — the HTTP response code; <code class="language-plaintext highlighter-rouge">401</code> and <code class="language-plaintext highlighter-rouge">403</code> are particularly useful for security alerting</li>
  <li><code class="language-plaintext highlighter-rouge">stage</code> — which pipeline stage emitted this event</li>
</ul>

<h2 id="audit-policy-levels">Audit Policy Levels</h2>

<p>The policy file assigns one of four recording levels to each matched request. Choosing the right level per resource type is the difference between a useful audit trail and a storage bill you cannot explain.</p>

<table>
  <thead>
    <tr>
      <th>Level</th>
      <th>What is recorded</th>
      <th>When to use it</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>None</td>
      <td>Nothing</td>
      <td>High-volume noise: health checks, watch loops, controller heartbeats</td>
    </tr>
    <tr>
      <td>Metadata</td>
      <td>Request metadata only (verb, user, resource, timestamp)</td>
      <td>Routine operations where you need the who-did-what but not the payload</td>
    </tr>
    <tr>
      <td>Request</td>
      <td>Metadata + request body</td>
      <td>Mutations where you want to see exactly what was sent</td>
    </tr>
    <tr>
      <td>RequestResponse</td>
      <td>Metadata + request body + response body</td>
      <td>Secret reads, RBAC changes, exec — anything where the payload itself is evidence</td>
    </tr>
  </tbody>
</table>

<p>The <code class="language-plaintext highlighter-rouge">RequestResponse</code> level on a resource like <code class="language-plaintext highlighter-rouge">configmaps</code> with a <code class="language-plaintext highlighter-rouge">list</code> verb will include the full response body for every list call, which means every value in every ConfigMap in the response ends up in your audit log. That is both a storage problem and a security problem if the audit log destination is not properly secured. Be precise about which verbs you apply <code class="language-plaintext highlighter-rouge">RequestResponse</code> to.</p>
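
<p>One way to apply that precision is to split a resource into a read rule and a write rule. The fragment below is a sketch under the assumption that you want ConfigMap payloads only for mutations; the levels are a judgment call, not a recommendation from the Kubernetes project:</p>

<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">rules:
  # Reads: metadata only, so list responses do not copy every ConfigMap value into the log.
  - level: Metadata
    verbs: ["get", "list", "watch"]
    resources:
      - group: ""
        resources: ["configmaps"]
  # Writes: record the request and response payloads.
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: ""
        resources: ["configmaps"]</code></pre></figure>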

<h2 id="writing-a-production-audit-policy">Writing a Production Audit Policy</h2>

<p>The right approach is noise suppression first. Start by silencing the internal system traffic that would otherwise dominate your log volume — API server self-calls, kube-proxy watch loops, node status reconciliation, controller manager polls — and then escalate the recording level only for resources that carry security significance.</p>

<p>Here is the full production policy:</p>

<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">audit.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Policy</span>
<span class="na">omitStages</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">RequestReceived"</span>
<span class="na">rules</span><span class="pi">:</span>
  <span class="c1"># --- Noise suppression ---</span>
  <span class="pi">-</span> <span class="na">level</span><span class="pi">:</span> <span class="s">None</span>
    <span class="na">users</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">system:apiserver"</span><span class="pi">]</span>
    <span class="na">verbs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">get"</span><span class="pi">]</span>
    <span class="na">resources</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">group</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
        <span class="na">resources</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">endpoints"</span><span class="pi">]</span>
  <span class="pi">-</span> <span class="na">level</span><span class="pi">:</span> <span class="s">None</span>
    <span class="na">users</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">system:kube-proxy"</span><span class="pi">]</span>
    <span class="na">verbs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">watch"</span><span class="pi">]</span>
    <span class="na">resources</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">group</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
        <span class="na">resources</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">endpoints"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">services"</span><span class="pi">]</span>
  <span class="pi">-</span> <span class="na">level</span><span class="pi">:</span> <span class="s">None</span>
    <span class="na">userGroups</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">system:nodes"</span><span class="pi">]</span>
    <span class="na">verbs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">get"</span><span class="pi">]</span>
    <span class="na">resources</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">group</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
        <span class="na">resources</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">nodes"</span><span class="pi">]</span>
  <span class="pi">-</span> <span class="na">level</span><span class="pi">:</span> <span class="s">None</span>
    <span class="na">users</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">system:kube-controller-manager"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">system:kube-scheduler"</span><span class="pi">]</span>
    <span class="na">verbs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">get"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">list"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">watch"</span><span class="pi">]</span>
    <span class="na">resources</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">group</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
        <span class="na">resources</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">endpoints"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">configmaps"</span><span class="pi">]</span>
  <span class="pi">-</span> <span class="na">level</span><span class="pi">:</span> <span class="s">None</span>
    <span class="na">nonResourceURLs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">/healthz*"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">/readyz*"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">/livez*"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">/metrics"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">/version"</span><span class="pi">]</span>

  <span class="c1"># --- High-fidelity security captures ---</span>
  <span class="pi">-</span> <span class="na">level</span><span class="pi">:</span> <span class="s">RequestResponse</span>
    <span class="na">verbs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">get"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">list"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">watch"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">create"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">update"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">patch"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">delete"</span><span class="pi">]</span>
    <span class="na">resources</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">group</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
        <span class="na">resources</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">secrets"</span><span class="pi">]</span>
  <span class="pi">-</span> <span class="na">level</span><span class="pi">:</span> <span class="s">RequestResponse</span>
    <span class="na">verbs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">create"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">update"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">patch"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">delete"</span><span class="pi">]</span>
    <span class="na">resources</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">group</span><span class="pi">:</span> <span class="s2">"</span><span class="s">rbac.authorization.k8s.io"</span>
        <span class="na">resources</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">roles"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">clusterroles"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">rolebindings"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">clusterrolebindings"</span><span class="pi">]</span>
  <span class="pi">-</span> <span class="na">level</span><span class="pi">:</span> <span class="s">RequestResponse</span>
    <span class="na">verbs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">create"</span><span class="pi">]</span>
    <span class="na">resources</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">group</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
        <span class="na">resources</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">serviceaccounts/token"</span><span class="pi">]</span>
  <span class="pi">-</span> <span class="na">level</span><span class="pi">:</span> <span class="s">RequestResponse</span>
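    <span class="c1"># Streaming subresources can arrive as POST (create) or, over WebSockets, GET (get)</span>
    <span class="na">verbs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">create"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">get"</span><span class="pi">]</span>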
    <span class="na">resources</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">group</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
        <span class="na">resources</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">pods/exec"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">pods/attach"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">pods/portforward"</span><span class="pi">]</span>

  <span class="c1"># --- Mutation capture ---</span>
  <span class="pi">-</span> <span class="na">level</span><span class="pi">:</span> <span class="s">Request</span>
    <span class="na">verbs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">create"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">update"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">patch"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">delete"</span><span class="pi">]</span>
    <span class="na">resources</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">group</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
        <span class="na">resources</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">configmaps"</span><span class="pi">]</span>
  <span class="pi">-</span> <span class="na">level</span><span class="pi">:</span> <span class="s">RequestResponse</span>
    <span class="na">verbs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">create"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">delete"</span><span class="pi">]</span>
    <span class="na">resources</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">group</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
        <span class="na">resources</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">namespaces"</span><span class="pi">]</span>
  <span class="pi">-</span> <span class="na">level</span><span class="pi">:</span> <span class="s">Request</span>
    <span class="na">verbs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">create"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">update"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">patch"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">delete"</span><span class="pi">]</span>
    <span class="na">resources</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">group</span><span class="pi">:</span> <span class="s2">"</span><span class="s">apps"</span>
        <span class="na">resources</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">deployments"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">statefulsets"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">daemonsets"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">replicasets"</span><span class="pi">]</span>

  <span class="c1"># --- Metadata-level for pod lifecycle ---</span>
  <span class="pi">-</span> <span class="na">level</span><span class="pi">:</span> <span class="s">Metadata</span>
    <span class="na">verbs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">create"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">delete"</span><span class="pi">]</span>
    <span class="na">resources</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">group</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
        <span class="na">resources</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">pods"</span><span class="pi">]</span>

  <span class="c1"># --- Catch-all ---</span>
  <span class="pi">-</span> <span class="na">level</span><span class="pi">:</span> <span class="s">Metadata</span>
    <span class="na">omitStages</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s2">"</span><span class="s">ResponseStarted"</span></code></pre></figure>

<p>A few design decisions worth explaining.</p>

<p>The <code class="language-plaintext highlighter-rouge">omitStages: [RequestReceived]</code> at the top of the policy applies globally. <code class="language-plaintext highlighter-rouge">RequestReceived</code> fires before the request is processed, so it carries no response status, and it adds an extra event for every request on top of the <code class="language-plaintext highlighter-rouge">ResponseComplete</code> event that records what actually happened. Omitting it cluster-wide roughly halves event volume at almost no cost in forensic value.</p>

<p>The noise suppression rules at the top silence internal system identities doing routine reconciliation work. Without them, <code class="language-plaintext highlighter-rouge">system:kube-proxy</code> watch calls and <code class="language-plaintext highlighter-rouge">system:kube-controller-manager</code> list operations generate tens of thousands of events per hour on a busy cluster.</p>

<p>Secrets get <code class="language-plaintext highlighter-rouge">RequestResponse</code> on all verbs including reads. This is intentional. If an attacker or a misconfigured service account reads a secret, you want the full response in the log — including the base64-encoded values — so you can confirm exactly which credentials were exposed. This means the audit log destination must be treated as a sensitive data store, not a general-purpose logging endpoint.</p>
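
<p>If the log destination cannot be held to that standard, one possible alternative (a sketch, not part of the policy above) is to record secrets at <code class="language-plaintext highlighter-rouge">Metadata</code>. That still shows who accessed which secret but keeps the values out of the log:</p>

<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">  # Would take the place of the RequestResponse rule for secrets, in the same position
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]</code></pre></figure>

<p>The trade-off is that the audit log alone can no longer confirm which values were returned to the caller.</p>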

<p>The RBAC section captures <code class="language-plaintext highlighter-rouge">create</code>, <code class="language-plaintext highlighter-rouge">update</code>, <code class="language-plaintext highlighter-rouge">patch</code>, and <code class="language-plaintext highlighter-rouge">delete</code> on all four RBAC resource types. Privilege escalation through RBAC changes is a common technique in compromised clusters, and you want full request and response fidelity when it happens.</p>

<p>The catch-all <code class="language-plaintext highlighter-rouge">Metadata</code> rule at the bottom ensures that any API group or resource not explicitly matched by an earlier rule still gets recorded at the metadata level. Without this rule, new custom resource types or API extensions introduced to your cluster would be silently dropped from the audit log.</p>

<h3 id="applying-the-policy">Applying the Policy</h3>

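<p>On a kubeadm-managed cluster, <code class="language-plaintext highlighter-rouge">kube-apiserver</code> runs as a static pod, so the policy file and the log directory must exist on the control-plane node and be mounted into the pod as hostPath volumes. Place the policy file on the node and reference it in the static pod manifest:</p>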

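<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Create the policy and log directories, then copy the policy to the control-plane node</span>
<span class="nb">sudo mkdir</span> <span class="nt">-p</span> /etc/kubernetes/audit /var/log/kubernetes/audit
<span class="nb">sudo cp </span>audit-policy.yaml /etc/kubernetes/audit/audit-policy.yaml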

<span class="c"># Add these flags to /etc/kubernetes/manifests/kube-apiserver.yaml</span>
<span class="c"># under spec.containers[0].command:</span>
<span class="c">#   - --audit-policy-file=/etc/kubernetes/audit/audit-policy.yaml</span>
<span class="c">#   - --audit-log-path=/var/log/kubernetes/audit/audit.log</span>
<span class="c">#   - --audit-log-maxage=30</span>
<span class="c">#   - --audit-log-maxbackup=10</span>
<span class="c">#   - --audit-log-maxsize=100</span></code></pre></figure>
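
<p>A minimal sketch of the hostPath mounts that the static pod needs for these flags to work, assuming the paths used above. The volume names <code class="language-plaintext highlighter-rouge">audit-policy</code> and <code class="language-plaintext highlighter-rouge">audit-log</code> are arbitrary, and the entries should be merged into the lists that already exist in the manifest:</p>

<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"># Fragment of /etc/kubernetes/manifests/kube-apiserver.yaml
spec:
  containers:
    - name: kube-apiserver
      volumeMounts:
        # Policy file: read-only inside the container
        - name: audit-policy
          mountPath: /etc/kubernetes/audit/audit-policy.yaml
          readOnly: true
        # Log directory: must be writable by the API server
        - name: audit-log
          mountPath: /var/log/kubernetes/audit
          readOnly: false
  volumes:
    - name: audit-policy
      hostPath:
        path: /etc/kubernetes/audit/audit-policy.yaml
        type: File
    - name: audit-log
      hostPath:
        path: /var/log/kubernetes/audit
        type: DirectoryOrCreate</code></pre></figure>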

<p>The kubelet will restart <code class="language-plaintext highlighter-rouge">kube-apiserver</code> automatically when it detects a change to the static pod manifest. Verify the API server restarted cleanly and picked up the policy:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">kubectl get pods <span class="nt">-n</span> kube-system <span class="nt">-l</span> <span class="nv">component</span><span class="o">=</span>kube-apiserver
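kubectl logs <span class="nt">-n</span> kube-system kube-apiserver-&lt;node-name&gt; | <span class="nb">grep</span> <span class="nt">-i</span> audit
<span class="c"># On the control-plane node: confirm that events are actually being written</span>
<span class="nb">sudo tail</span> <span class="nt">-n</span> 1 /var/log/kubernetes/audit/audit.log</code></pre></figure>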

<p>On managed Kubernetes services (AKS, EKS, GKE), the control plane is not directly accessible. Each provider exposes audit logs through its own mechanism: AKS via Azure Monitor / Log Analytics, EKS via CloudWatch Logs, GKE via Cloud Logging. The policy configuration interface varies by provider, and managed audit log delivery can lag 5–15 minutes on AKS — it is not a real-time feed.</p>

<h2 id="shipping-logs-with-fluent-bit">Shipping Logs with Fluent Bit</h2>

<p>Now that <code class="language-plaintext highlighter-rouge">kube-apiserver</code> is writing structured JSON to <code class="language-plaintext highlighter-rouge">/var/log/kubernetes/audit/audit.log</code> on your control-plane nodes, you need to get those logs into a queryable destination. Fluent Bit is the right tool here: it is lightweight, runs as a DaemonSet with tolerations, and has native output plugins for both Elasticsearch and Loki.</p>

<p>The key constraint is that audit logs only exist on control-plane nodes. Your Fluent Bit DaemonSet needs tolerations for the <code class="language-plaintext highlighter-rouge">control-plane</code> taint and a <code class="language-plaintext highlighter-rouge">nodeSelector</code> to target those nodes specifically.</p>

<h3 id="fluent-bit-configmap">Fluent Bit ConfigMap</h3>

<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ConfigMap</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">fluent-bit-audit-config</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">logging</span>
<span class="na">data</span><span class="pi">:</span>
  <span class="na">fluent-bit.conf</span><span class="pi">:</span> <span class="pi">|</span>
    <span class="s">[SERVICE]</span>
        <span class="s">Flush         5</span>
        <span class="s">Daemon        Off</span>
        <span class="s">Log_Level     info</span>
        <span class="s">Parsers_File  parsers.conf</span>

    <span class="s">[INPUT]</span>
        <span class="s">Name              tail</span>
        <span class="s">Path              /var/log/kubernetes/audit/audit.log</span>
        <span class="s">Parser            json</span>
        <span class="s">Tag               kube.audit</span>
        <span class="s">Refresh_Interval  5</span>
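        <span class="s">Mem_Buf_Limit     50MB</span>
        <span class="s"># RequestResponse events can exceed the 32k default line buffer and would be skipped</span>
        <span class="s">Buffer_Chunk_Size 256k</span>
        <span class="s">Buffer_Max_Size   4MB</span>
        <span class="s">Skip_Long_Lines   On</span>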

    <span class="s">[FILTER]</span>
        <span class="s">Name   record_modifier</span>
        <span class="s">Match  kube.audit</span>
        <span class="s">Record cluster prod-cluster-01</span>
        <span class="s">Record log_type kubernetes_audit</span>

    <span class="s">[OUTPUT]</span>
        <span class="s">Name            es</span>
        <span class="s">Match           kube.audit</span>
        <span class="s">Host            elasticsearch.logging.svc.cluster.local</span>
        <span class="s">Port            9200</span>
        <span class="s">Index           kubernetes-audit</span>
        <span class="s">Type            _doc</span>
        <span class="s">Logstash_Format On</span>
        <span class="s">Logstash_Prefix kubernetes-audit</span>
        <span class="s">Retry_Limit     5</span>

  <span class="na">parsers.conf</span><span class="pi">:</span> <span class="pi">|</span>
    <span class="s">[PARSER]</span>
        <span class="s">Name        json</span>
        <span class="s">Format      json</span>
        <span class="s">Time_Key    requestReceivedTimestamp</span>
        <span class="s">Time_Format %Y-%m-%dT%H:%M:%S.%LZ</span></code></pre></figure>

<p>If you are forwarding to Loki instead of Elasticsearch, replace the <code class="language-plaintext highlighter-rouge">[OUTPUT]</code> block:</p>

<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml">    <span class="pi">[</span><span class="nv">OUTPUT</span><span class="pi">]</span>
        <span class="s">Name            loki</span>
        <span class="s">Match           kube.audit</span>
        <span class="s">Host            loki.logging.svc.cluster.local</span>
        <span class="s">Port            </span><span class="m">3100</span>
        <span class="s">Labels          job=kubernetes-audit,cluster=prod-cluster-01</span>
        <span class="s">Label_Keys      $verb,$user['username'],$objectRef['resource']</span>
        <span class="s">Retry_Limit     5</span></code></pre></figure>

<h3 id="fluent-bit-daemonset">Fluent Bit DaemonSet</h3>

<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">apps/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">DaemonSet</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">fluent-bit-audit</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">logging</span>
  <span class="na">labels</span><span class="pi">:</span>
    <span class="na">app</span><span class="pi">:</span> <span class="s">fluent-bit-audit</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">selector</span><span class="pi">:</span>
    <span class="na">matchLabels</span><span class="pi">:</span>
      <span class="na">app</span><span class="pi">:</span> <span class="s">fluent-bit-audit</span>
  <span class="na">template</span><span class="pi">:</span>
    <span class="na">metadata</span><span class="pi">:</span>
      <span class="na">labels</span><span class="pi">:</span>
        <span class="na">app</span><span class="pi">:</span> <span class="s">fluent-bit-audit</span>
    <span class="na">spec</span><span class="pi">:</span>
      <span class="na">serviceAccountName</span><span class="pi">:</span> <span class="s">fluent-bit</span>
      <span class="na">tolerations</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">key</span><span class="pi">:</span> <span class="s">node-role.kubernetes.io/control-plane</span>
          <span class="na">operator</span><span class="pi">:</span> <span class="s">Exists</span>
          <span class="na">effect</span><span class="pi">:</span> <span class="s">NoSchedule</span>
        <span class="pi">-</span> <span class="na">key</span><span class="pi">:</span> <span class="s">node-role.kubernetes.io/master</span>
          <span class="na">operator</span><span class="pi">:</span> <span class="s">Exists</span>
          <span class="na">effect</span><span class="pi">:</span> <span class="s">NoSchedule</span>
      <span class="na">nodeSelector</span><span class="pi">:</span>
        <span class="na">node-role.kubernetes.io/control-plane</span><span class="pi">:</span> <span class="s2">"</span><span class="s">"</span>
      <span class="na">containers</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">fluent-bit</span>
          <span class="na">image</span><span class="pi">:</span> <span class="s">fluent/fluent-bit:3.2</span>
          <span class="na">resources</span><span class="pi">:</span>
            <span class="na">requests</span><span class="pi">:</span>
              <span class="na">cpu</span><span class="pi">:</span> <span class="s">50m</span>
              <span class="na">memory</span><span class="pi">:</span> <span class="s">64Mi</span>
            <span class="na">limits</span><span class="pi">:</span>
              <span class="na">cpu</span><span class="pi">:</span> <span class="s">200m</span>
              <span class="na">memory</span><span class="pi">:</span> <span class="s">256Mi</span>
          <span class="na">volumeMounts</span><span class="pi">:</span>
            <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">audit-log</span>
              <span class="na">mountPath</span><span class="pi">:</span> <span class="s">/var/log/kubernetes/audit</span>
              <span class="na">readOnly</span><span class="pi">:</span> <span class="no">true</span>
            <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">config</span>
              <span class="na">mountPath</span><span class="pi">:</span> <span class="s">/fluent-bit/etc</span>
      <span class="na">volumes</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">audit-log</span>
          <span class="na">hostPath</span><span class="pi">:</span>
            <span class="na">path</span><span class="pi">:</span> <span class="s">/var/log/kubernetes/audit</span>
            <span class="na">type</span><span class="pi">:</span> <span class="s">DirectoryOrCreate</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">config</span>
          <span class="na">configMap</span><span class="pi">:</span>
            <span class="na">name</span><span class="pi">:</span> <span class="s">fluent-bit-audit-config</span>
<span class="nn">---</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ServiceAccount</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">fluent-bit</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">logging</span>
<span class="nn">---</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">rbac.authorization.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ClusterRole</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">fluent-bit-audit</span>
<span class="na">rules</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">apiGroups</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">"</span><span class="pi">]</span>
    <span class="na">resources</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">namespaces"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">pods"</span><span class="pi">]</span>
    <span class="na">verbs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">get"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">list"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">watch"</span><span class="pi">]</span>
<span class="nn">---</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">rbac.authorization.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ClusterRoleBinding</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">fluent-bit-audit</span>
<span class="na">roleRef</span><span class="pi">:</span>
  <span class="na">apiGroup</span><span class="pi">:</span> <span class="s">rbac.authorization.k8s.io</span>
  <span class="na">kind</span><span class="pi">:</span> <span class="s">ClusterRole</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">fluent-bit-audit</span>
<span class="na">subjects</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">kind</span><span class="pi">:</span> <span class="s">ServiceAccount</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">fluent-bit</span>
    <span class="na">namespace</span><span class="pi">:</span> <span class="s">logging</span></code></pre></figure>

<p>Apply the ConfigMap and DaemonSet to your cluster:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">kubectl create namespace logging <span class="nt">--dry-run</span><span class="o">=</span>client <span class="nt">-o</span> yaml | kubectl apply <span class="nt">-f</span> -
kubectl apply <span class="nt">-f</span> fluent-bit-audit-config.yaml
kubectl apply <span class="nt">-f</span> fluent-bit-audit-daemonset.yaml</code></pre></figure>

<p>Verify that the DaemonSet pods land only on control-plane nodes and are reading the audit log:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">kubectl get pods <span class="nt">-n</span> logging <span class="nt">-l</span> <span class="nv">app</span><span class="o">=</span>fluent-bit-audit <span class="nt">-o</span> wide
kubectl logs <span class="nt">-n</span> logging daemonset/fluent-bit-audit | <span class="nb">grep</span> <span class="nt">-E</span> <span class="s2">"audit|flush|chunk"</span></code></pre></figure>

<p>You should see one pod per control-plane node and log lines indicating it is tailing the audit log file.</p>

<h2 id="what-to-alert-on">What to Alert On</h2>

<p>Collecting audit logs is only half the work. The value comes from the alerts you build on top of them. Here are the five security patterns that should have active alerts in any production cluster.</p>

<h3 id="1-secret-reads-and-lists">1. Secret reads and lists</h3>

<p>Any access to <code class="language-plaintext highlighter-rouge">secrets</code> outside your expected service accounts deserves investigation. In Elasticsearch:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Kibana KQL</span>
objectRef.resource: <span class="s2">"secrets"</span> AND verb: <span class="o">(</span><span class="s2">"get"</span> OR <span class="s2">"list"</span><span class="o">)</span> AND NOT user.username: system\:serviceaccount\:*</code></pre></figure>

<p>In Loki (LogQL):</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="o">{</span><span class="nv">job</span><span class="o">=</span><span class="s2">"kubernetes-audit"</span><span class="o">}</span> | json | <span class="nv">objectRef_resource</span><span class="o">=</span><span class="s2">"secrets"</span> and <span class="nv">verb</span><span class="o">=</span>~<span class="s2">"get|list"</span> | line_format <span class="s2">"{{.user_username}} accessed secret {{.objectRef_name}} in {{.objectRef_namespace}}"</span></code></pre></figure>

<h3 id="2-pod-exec-attach-and-portforward">2. Pod exec, attach, and portforward</h3>

<p>An exec into a running pod is either legitimate debugging or a sign of active intrusion. Either way, you want to know about it. The response code <code class="language-plaintext highlighter-rouge">101</code> (Switching Protocols) means the connection was upgraded to a streaming protocol (SPDY or WebSocket, depending on the client and cluster version), so the session was actually established:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># KQL</span>
objectRef.resource: <span class="s2">"pods"</span> AND objectRef.subresource: <span class="o">(</span><span class="s2">"exec"</span> OR <span class="s2">"attach"</span> OR <span class="s2">"portforward"</span><span class="o">)</span> AND responseStatus.code: 101</code></pre></figure>

<h3 id="3-rbac-mutations">3. RBAC mutations</h3>

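<p>Any create, update, patch, or delete against a ClusterRoleBinding, or against a RoleBinding in a sensitive namespace, should alert immediately. Rewriting bindings is a common post-compromise privilege escalation path. The query below matches bindings in every namespace, so add an <code class="language-plaintext highlighter-rouge">objectRef.namespace</code> filter if you want to narrow it:</p>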

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># KQL</span>
objectRef.resource: <span class="o">(</span><span class="s2">"clusterrolebindings"</span> OR <span class="s2">"rolebindings"</span><span class="o">)</span> AND verb: <span class="o">(</span><span class="s2">"create"</span> OR <span class="s2">"update"</span> OR <span class="s2">"patch"</span> OR <span class="s2">"delete"</span><span class="o">)</span></code></pre></figure>

<p>Correlate these events with the <code class="language-plaintext highlighter-rouge">requestObject</code> field, which is recorded at the <code class="language-plaintext highlighter-rouge">Request</code> and <code class="language-plaintext highlighter-rouge">RequestResponse</code> levels and contains the full binding definition, including the subjects being granted access.</p>
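
<p>To see who was granted what without opening each event by hand, the subjects can be pulled out of <code class="language-plaintext highlighter-rouge">requestObject</code> on the command line. This is only a sketch: it assumes <code class="language-plaintext highlighter-rouge">jq</code> is installed and that the audit log is in JSON-lines format, and <code class="language-plaintext highlighter-rouge">summarize_binding_changes</code> is a hypothetical helper name, not something defined earlier in this post.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># Print "user, verb, binding name, subjects" (tab-separated) for RBAC binding mutations.
# Reads JSON-lines audit events on stdin; requires jq (an assumption).
summarize_binding_changes() {
  jq -r 'select(.objectRef.resource == "clusterrolebindings"
                or .objectRef.resource == "rolebindings")
         | select(.verb == "create" or .verb == "update"
                  or .verb == "patch" or .verb == "delete")
         | [ .user.username, .verb, (.objectRef.name // "-"),
             ([ .requestObject.subjects[]? | "\(.kind)/\(.name)" ] | join(",")) ]
         | @tsv'
}</code></pre></figure>

<p>Feed it the audit file from a control-plane node on stdin. The subjects column stays empty for events that carry no request body, such as deletes or events logged at <code class="language-plaintext highlighter-rouge">Metadata</code> level.</p>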

<h3 id="4-failed-authentication-and-authorization">4. Failed authentication and authorization</h3>

<p>A burst of <code class="language-plaintext highlighter-rouge">401</code> or <code class="language-plaintext highlighter-rouge">403</code> responses is either a misconfigured service account or credential scanning. Either warrants investigation:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># KQL — rate alert: more than 10 in 5 minutes from the same sourceIP</span>
responseStatus.code: <span class="o">(</span>401 OR 403<span class="o">)</span> AND sourceIPs: <span class="k">*</span></code></pre></figure>

<p>Set this as a count-based alert in your SIEM rather than alerting on individual events — some 403s in normal operations are expected. A spike is the signal.</p>
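
<p>Before settling on a threshold, it can help to look at how many failures each source IP normally produces. The sketch below approximates the count against a raw audit file; it assumes <code class="language-plaintext highlighter-rouge">jq</code> is installed and the log is in JSON-lines format, and the function name <code class="language-plaintext highlighter-rouge">count_auth_failures</code> is illustrative.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># Count 401/403 audit events per source IP, highest count first.
# Reads JSON-lines audit events on stdin; requires jq (an assumption).
count_auth_failures() {
  jq -r 'select(.responseStatus.code == 401 or .responseStatus.code == 403)
         | .sourceIPs[0] // "unknown"' \
    | sort | uniq -c | sort -rn
}</code></pre></figure>

<p>This counts over the whole input rather than a five-minute window, so treat it as a way to learn your baseline, not as a replacement for the SIEM rule.</p>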

<h3 id="5-anonymous-requests">5. Anonymous requests</h3>

<p>Any request authenticated as <code class="language-plaintext highlighter-rouge">system:anonymous</code> should be treated as a configuration error at minimum and a probing attempt at worst:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># KQL</span>
user.username: <span class="s2">"system:anonymous"</span></code></pre></figure>

<p>If anonymous authentication is disabled on your cluster (<code class="language-plaintext highlighter-rouge">--anonymous-auth=false</code> on the API server), this alert should never fire. If it does, something is wrong.</p>

<h2 id="best-practices">Best Practices</h2>

<p><strong>Always omit <code class="language-plaintext highlighter-rouge">RequestReceived</code> globally.</strong> This stage fires before authorization and carries no additional information over <code class="language-plaintext highlighter-rouge">ResponseComplete</code> for security purposes. Keeping it doubles your audit log volume without any investigation value. Set it in <code class="language-plaintext highlighter-rouge">omitStages</code> at the top of your policy file, not per rule.</p>

<p><strong>Never apply <code class="language-plaintext highlighter-rouge">RequestResponse</code> to <code class="language-plaintext highlighter-rouge">list</code> verbs on high-volume resources.</strong> A <code class="language-plaintext highlighter-rouge">RequestResponse</code> audit event for a <code class="language-plaintext highlighter-rouge">list secrets</code> call includes the full response body — every secret in the namespace in base64. On a namespace with 50 secrets being listed every 30 seconds by a controller, that is a significant storage and security exposure. Scope <code class="language-plaintext highlighter-rouge">RequestResponse</code> to specific verbs (<code class="language-plaintext highlighter-rouge">get</code>, <code class="language-plaintext highlighter-rouge">create</code>, <code class="language-plaintext highlighter-rouge">update</code>, <code class="language-plaintext highlighter-rouge">patch</code>, <code class="language-plaintext highlighter-rouge">delete</code>) and use <code class="language-plaintext highlighter-rouge">Metadata</code> or <code class="language-plaintext highlighter-rouge">Request</code> for <code class="language-plaintext highlighter-rouge">list</code> and <code class="language-plaintext highlighter-rouge">watch</code> on non-secret resources.</p>

<p><strong>Treat the audit log destination as a sensitive data store.</strong> At <code class="language-plaintext highlighter-rouge">RequestResponse</code> level, audit events for secret reads contain base64-encoded secret values. Your Elasticsearch index or Loki stream for audit logs needs the same access controls as the secrets themselves. Restrict read access, enable encryption at rest, and do not route audit events through a general-purpose logging pipeline with broad access.</p>

<p><strong>Always include a catch-all rule at the bottom of your policy.</strong> Without it, any API group or resource not explicitly matched by your rules is silently dropped. Custom resource definitions, new API groups added by operators, and future Kubernetes API additions all fall through the gap. The <code class="language-plaintext highlighter-rouge">Metadata</code> catch-all at the bottom of the production policy above ensures nothing is silently ignored.</p>

<p><strong>Account for managed Kubernetes audit log latency.</strong> On AKS, audit logs delivered through Azure Monitor can lag 5–15 minutes. This means your audit-based alerts are not real-time — they are delayed. Design your incident response process with this in mind and do not rely on audit log alerts as your only detection layer for active incidents. Complement them with runtime security tools like Falco for real-time detection.</p>

<p><strong>Rotate and archive audit log files on the control-plane node.</strong> The <code class="language-plaintext highlighter-rouge">--audit-log-maxage</code>, <code class="language-plaintext highlighter-rouge">--audit-log-maxbackup</code>, and <code class="language-plaintext highlighter-rouge">--audit-log-maxsize</code> flags on <code class="language-plaintext highlighter-rouge">kube-apiserver</code> control local rotation. Set them explicitly: 30 days retention, 10 backup files, 100MB per file is a reasonable starting point. Without these flags, a single audit log file can grow until it fills the control-plane root volume, which will crash <code class="language-plaintext highlighter-rouge">kube-apiserver</code>.</p>
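
<p>Written out as flags, that starting point looks roughly like the fragment below. The log file name is an assumption (only the directory appears in the DaemonSet mount above), and how the flags reach <code class="language-plaintext highlighter-rouge">kube-apiserver</code> depends on how your control plane is installed.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># kube-apiserver rotation flags with the starting values suggested above.
# The file name audit.log is an assumption; the directory matches the hostPath mount.
--audit-log-path=/var/log/kubernetes/audit/audit.log
--audit-log-maxage=30      # days to keep rotated files
--audit-log-maxbackup=10   # number of rotated files to keep
--audit-log-maxsize=100    # megabytes per file before rotation</code></pre></figure>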

<h2 id="conclusion">Conclusion</h2>

<p>Kubernetes audit logging is not a checkbox. Without a thoughtful policy, you either have silence where you need evidence or noise that makes the evidence unreachable. The approach in this post — suppress system traffic first, escalate to <code class="language-plaintext highlighter-rouge">RequestResponse</code> only for resources that carry security value, ship with Fluent Bit to a secured destination, and alert on the five patterns that actually indicate malicious activity — gives you a forensic trail you can actually use.</p>

<p>The policy and the Fluent Bit configuration are both starting points. Your first week of running them in production will surface internal system accounts you need to suppress and resources you want to escalate. Tune from there. Version your policy file in git alongside the rest of your cluster configuration.</p>

<p>Happy scripting!</p>]]></content><author><name>Victor Silva</name></author><category term="Security" /><category term="Kubernetes" /><category term="kubernetes" /><category term="observability" /><category term="fluent-bit" /><category term="kube-apiserver" /><category term="devsecops" /><summary type="html"><![CDATA[Kubernetes audit logging records every kube-apiserver call in your cluster. Learn to write a production audit policy and ship logs with Fluent Bit.]]></summary></entry><entry><title type="html">OCI Vault: Secrets Management with Terraform</title><link href="https://blog.victorsilva.com.uy/oci-vault-secrets-management-terraform/" rel="alternate" type="text/html" title="OCI Vault: Secrets Management with Terraform" /><published>2026-04-06T09:00:00+00:00</published><updated>2026-04-06T09:00:00+00:00</updated><id>https://blog.victorsilva.com.uy/oci-vault-secrets-management-terraform</id><content type="html" xml:base="https://blog.victorsilva.com.uy/oci-vault-secrets-management-terraform/"><![CDATA[<p>If you’ve ever opened a Terraform repository and found something like <code class="language-plaintext highlighter-rouge">db_password = "Sup3rS3cr3t!"</code> hardcoded in a <code class="language-plaintext highlighter-rouge">.tfvars</code> file — or worse, in <code class="language-plaintext highlighter-rouge">main.tf</code> itself — you already know exactly what problem we’re talking about. Hardcoded credentials are one of the most common vulnerabilities in infrastructure-as-code projects, and the risk doesn’t stop there: even when secrets are passed correctly as variables, certain Terraform data sources write the secret value directly into the state file, which often lives in an S3 bucket or a remote backend without additional encryption.</p>

<p>OCI Vault solves this problem at the root. It’s Oracle Cloud’s managed service for key and secret storage, backed by HSM, with granular access control via IAM and native support in the Terraform provider for OCI. In this post we’ll build the complete infrastructure from scratch: vault, master encryption key, secrets with expiration and rotation rules, IAM policies for teams and for workloads via Instance Principal, and the verification commands to confirm everything works before trusting the system in production.</p>

<p>We’ll also be explicit about the state file problem and how to avoid it, because it’s the most dangerous gotcha when working with secrets in Terraform.</p>

<h2 id="architecture-and-key-concepts">Architecture and key concepts</h2>

<p>OCI Vault has an architecture with two separate planes:</p>

<ul>
  <li><strong>Management Endpoint (control plane):</strong> used for administrative operations — creating vaults, keys, and secrets, rotating versions. All Terraform calls go here.</li>
  <li><strong>Cryptographic Endpoint (data plane):</strong> used for actual cryptographic operations — encrypt, decrypt, sign. Applications that need direct encryption point here.</li>
</ul>

<p>This separation is not cosmetic. It means you can restrict access to the data plane independently from the control plane, which is relevant for IAM policy design.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────────────────────────────────────────────────────┐
│  OCI Tenancy                                                │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Compartment: production                              │  │
│  │                                                       │  │
│  │  ┌─────────────────┐    ┌───────────────────────────┐ │  │
│  │  │   OCI Vault     │    │   Compute Instance        │ │  │
│  │  │  ┌───────────┐  │    │   (Instance Principal)    │ │  │
│  │  │  │  MEK Key  │  │    │                           │ │  │
│  │  │  └─────┬─────┘  │    │   oci secrets             │ │  │
│  │  │        │ encrypts│   │   secret-bundle get ───►  │ │  │
│  │  │  ┌─────▼─────┐  │◄───┤                           │ │  │
│  │  │  │  Secrets  │  │    │                           │ │  │
│  │  │  └───────────┘  │    └───────────────────────────┘ │  │
│  │  └─────────────────┘                                  │  │
│  │         ▲                                             │  │
│  │   Management Endpoint   Cryptographic Endpoint        │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
</code></pre></div></div>

<h3 id="vault-types-an-irreversible-decision">Vault types: an irreversible decision</h3>

<p>This is the first point where you need to think before running <code class="language-plaintext highlighter-rouge">terraform apply</code>, because <strong>the vault type cannot be changed after creation</strong>. The options are:</p>

<table>
  <thead>
    <tr>
      <th>Type</th>
      <th>HSM</th>
      <th>Key auto-rotation</th>
      <th>Cost</th>
      <th>Recommended use</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">DEFAULT</code></td>
      <td>Shared</td>
      <td>No</td>
      <td>Lower</td>
      <td>Development, staging</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">VIRTUAL_PRIVATE</code></td>
      <td>Dedicated</td>
      <td>Yes (GA since Feb 2024)</td>
      <td>Higher</td>
      <td>Production</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">EXTERNAL</code></td>
      <td>External (BYOK)</td>
      <td>No</td>
      <td>Variable</td>
      <td>Strict regulations</td>
    </tr>
  </tbody>
</table>

<p>For production, <code class="language-plaintext highlighter-rouge">VIRTUAL_PRIVATE</code> is the right answer: dedicated HSM, support for automatic key rotation, and full isolation. For development and testing environments, <code class="language-plaintext highlighter-rouge">DEFAULT</code> works well and is considerably more economical.</p>

<p>In this post we’ll use <code class="language-plaintext highlighter-rouge">DEFAULT</code> to keep the example deployable in any tenancy, but in the best practices section we’ll look at when and how to migrate to <code class="language-plaintext highlighter-rouge">VIRTUAL_PRIVATE</code>.</p>

<h3 id="keys-aes-for-secrets-rsaecdsa-for-signing">Keys: AES for secrets, RSA/ECDSA for signing</h3>

<p>OCI Vault supports symmetric keys (AES) and asymmetric keys (RSA, ECDSA). The important constraint: <strong>only AES keys can encrypt secrets</strong>. RSA and ECDSA keys are for signing and asymmetric encryption, not for the vault secrets service. If you try to associate an RSA key with a secret, the operation fails.</p>

<p>The Terraform gotcha that bites almost everyone the first time: <strong>key length is specified in bytes, not bits</strong>. AES-256 = <code class="language-plaintext highlighter-rouge">length = 32</code>. If you set <code class="language-plaintext highlighter-rouge">length = 256</code> you’re requesting a 2048-bit key, which isn’t even a valid AES size.</p>
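
<p>The conversion is a division by eight, which is easy to check before writing the value into <code class="language-plaintext highlighter-rouge">key_shape</code>. A minimal sketch in shell; <code class="language-plaintext highlighter-rouge">aes_length_bytes</code> is a hypothetical helper, not part of the OCI CLI or Terraform:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># Convert an AES key size in bits to the byte count that key_shape.length expects.
# Returns a non-zero status for anything that is not a valid AES size.
aes_length_bytes() {
  case "$1" in
    128|192|256) echo $(( $1 / 8 )) ;;
    *) return 1 ;;
  esac
}

aes_length_bytes 256   # prints 32</code></pre></figure>

<p>Passing <code class="language-plaintext highlighter-rouge">2048</code> (what <code class="language-plaintext highlighter-rouge">length = 256</code> would mean in bits) returns a non-zero status, which mirrors the rejection described above.</p>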

<h2 id="prerequisites">Prerequisites</h2>

<p>To follow this post you’ll need:</p>

<ul>
  <li>OCI CLI installed and configured (<code class="language-plaintext highlighter-rouge">oci setup config</code> or API key in <code class="language-plaintext highlighter-rouge">~/.oci/config</code>)</li>
  <li>Terraform &gt;= 1.3</li>
  <li>Provider <code class="language-plaintext highlighter-rouge">oracle/oci</code> &gt;= 8.0</li>
  <li>A compartment OCID where you have <code class="language-plaintext highlighter-rouge">manage vaults</code>, <code class="language-plaintext highlighter-rouge">manage keys</code>, and <code class="language-plaintext highlighter-rouge">manage secret-family</code> permissions</li>
  <li>Your tenancy OCID (needed to create Dynamic Groups, which are tenancy-level)</li>
</ul>

<p>Verify access before starting:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Verify the CLI is configured correctly</span>
oci iam user get <span class="nt">--user-id</span> <span class="si">$(</span>oci iam user list <span class="nt">--query</span> <span class="s1">'data[0].id'</span> <span class="nt">--raw-output</span><span class="si">)</span>

<span class="c"># Verify you have access to the compartment</span>
oci iam compartment get <span class="nt">--compartment-id</span> <span class="nv">$COMPARTMENT_ID</span>

<span class="c"># Verify the provider version in your project</span>
terraform providers</code></pre></figure>

<h2 id="step-by-step-implementation">Step-by-step implementation</h2>

<h3 id="provider-configuration">Provider configuration</h3>

<p>We start with the provider configuration block. Nothing special here, but it’s important to pin the provider version because the OCI Vault API changed between major versions:</p>

<figure class="highlight"><pre><code class="language-hcl" data-lang="hcl"><span class="nx">terraform</span> <span class="p">{</span>
  <span class="nx">required_providers</span> <span class="p">{</span>
    <span class="nx">oci</span> <span class="p">=</span> <span class="p">{</span>
      <span class="nx">source</span>  <span class="p">=</span> <span class="s2">"oracle/oci"</span>
      <span class="nx">version</span> <span class="p">=</span> <span class="s2">"~&gt; 8.0"</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">}</span>

<span class="nx">provider</span> <span class="s2">"oci"</span> <span class="p">{</span>
  <span class="nx">region</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">region</span>
<span class="p">}</span></code></pre></figure>

<p>The variables we’ll need throughout the example:</p>

<figure class="highlight"><pre><code class="language-hcl" data-lang="hcl"><span class="nx">variable</span> <span class="s2">"region"</span> <span class="p">{</span>
  <span class="nx">description</span> <span class="p">=</span> <span class="s2">"OCI region"</span>
  <span class="nx">type</span>        <span class="p">=</span> <span class="nx">string</span>
<span class="p">}</span>

<span class="nx">variable</span> <span class="s2">"compartment_id"</span> <span class="p">{</span>
  <span class="nx">description</span> <span class="p">=</span> <span class="s2">"OCID of the compartment where resources are deployed"</span>
  <span class="nx">type</span>        <span class="p">=</span> <span class="nx">string</span>
<span class="p">}</span>

<span class="nx">variable</span> <span class="s2">"tenancy_ocid"</span> <span class="p">{</span>
  <span class="nx">description</span> <span class="p">=</span> <span class="s2">"OCID of the tenancy (required for Dynamic Groups)"</span>
  <span class="nx">type</span>        <span class="p">=</span> <span class="nx">string</span>
<span class="p">}</span>

<span class="nx">variable</span> <span class="s2">"db_password"</span> <span class="p">{</span>
  <span class="nx">description</span> <span class="p">=</span> <span class="s2">"Database admin password"</span>
  <span class="nx">type</span>        <span class="p">=</span> <span class="nx">string</span>
  <span class="nx">sensitive</span>   <span class="p">=</span> <span class="kc">true</span>
<span class="p">}</span></code></pre></figure>

<h3 id="creating-the-vault-and-master-encryption-key">Creating the Vault and Master Encryption Key</h3>

<p>The vault and key are created with two separate resources. <code class="language-plaintext highlighter-rouge">oci_kms_key</code> requires the vault’s <code class="language-plaintext highlighter-rouge">management_endpoint</code>, and it should be a reference to the vault resource’s attribute rather than a hardcoded URL. That reference already gives Terraform an implicit dependency, so the key is never created before the vault exists.</p>

<figure class="highlight"><pre><code class="language-hcl" data-lang="hcl"><span class="nx">resource</span> <span class="s2">"oci_kms_vault"</span> <span class="s2">"app_vault"</span> <span class="p">{</span>
  <span class="nx">compartment_id</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">compartment_id</span>
  <span class="nx">display_name</span>   <span class="p">=</span> <span class="s2">"app-production-vault"</span>
  <span class="nx">vault_type</span>     <span class="p">=</span> <span class="s2">"DEFAULT"</span>

  <span class="nx">freeform_tags</span> <span class="p">=</span> <span class="p">{</span>
    <span class="s2">"Environment"</span> <span class="p">=</span> <span class="s2">"production"</span>
    <span class="s2">"ManagedBy"</span>   <span class="p">=</span> <span class="s2">"terraform"</span>
  <span class="p">}</span>
<span class="p">}</span>

<span class="nx">resource</span> <span class="s2">"oci_kms_key"</span> <span class="s2">"app_key"</span> <span class="p">{</span>
  <span class="nx">compartment_id</span>      <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">compartment_id</span>
  <span class="nx">display_name</span>        <span class="p">=</span> <span class="s2">"app-secrets-key"</span>
  <span class="nx">management_endpoint</span> <span class="p">=</span> <span class="nx">oci_kms_vault</span><span class="err">.</span><span class="nx">app_vault</span><span class="err">.</span><span class="nx">management_endpoint</span>

  <span class="nx">key_shape</span> <span class="p">{</span>
    <span class="nx">algorithm</span> <span class="p">=</span> <span class="s2">"AES"</span>
    <span class="nx">length</span>    <span class="p">=</span> <span class="mi">32</span>   <span class="c1"># 32 bytes = AES-256 (Terraform uses bytes, not bits)</span>
  <span class="p">}</span>

  <span class="nx">protection_mode</span> <span class="p">=</span> <span class="s2">"HSM"</span>

  <span class="nx">depends_on</span> <span class="p">=</span> <span class="p">[</span><span class="nx">oci_kms_vault</span><span class="err">.</span><span class="nx">app_vault</span><span class="p">]</span>
<span class="p">}</span></code></pre></figure>

<p>Two details in this block are worth calling out:</p>

<p><code class="language-plaintext highlighter-rouge">protection_mode = "HSM"</code> means the key material never leaves the HSM: OCI cannot export it and neither can you. If you use <code class="language-plaintext highlighter-rouge">protection_mode = "SOFTWARE"</code>, the key can be exported, which expands the attack surface. For production, always HSM.</p>

<p>The explicit <code class="language-plaintext highlighter-rouge">depends_on</code> is redundant for ordering, because the <code class="language-plaintext highlighter-rouge">management_endpoint</code> reference already makes Terraform wait for the vault. It is harmless and documents the relationship, but it does not add a readiness check. If the management endpoint is still unreachable for a few seconds after the API reports the vault as created, re-running <code class="language-plaintext highlighter-rouge">terraform apply</code> is typically enough.</p>

<h3 id="creating-the-secret-with-expiration-rules">Creating the secret with expiration rules</h3>

<p>Now for the secret itself. The secret content must be base64-encoded — OCI Vault does not accept plain text in the API. Terraform has the <code class="language-plaintext highlighter-rouge">base64encode()</code> function that does exactly that:</p>

<figure class="highlight"><pre><code class="language-hcl" data-lang="hcl"><span class="nx">resource</span> <span class="s2">"oci_vault_secret"</span> <span class="s2">"db_password"</span> <span class="p">{</span>
  <span class="nx">compartment_id</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">compartment_id</span>
  <span class="nx">vault_id</span>       <span class="p">=</span> <span class="nx">oci_kms_vault</span><span class="err">.</span><span class="nx">app_vault</span><span class="err">.</span><span class="nx">id</span>
  <span class="nx">key_id</span>         <span class="p">=</span> <span class="nx">oci_kms_key</span><span class="err">.</span><span class="nx">app_key</span><span class="err">.</span><span class="nx">id</span>
  <span class="nx">secret_name</span>    <span class="p">=</span> <span class="s2">"app-db-password"</span>
  <span class="nx">description</span>    <span class="p">=</span> <span class="s2">"Database admin password for app-production"</span>

  <span class="nx">secret_content</span> <span class="p">{</span>
    <span class="nx">content_type</span> <span class="p">=</span> <span class="s2">"BASE64"</span>
    <span class="nx">content</span>      <span class="p">=</span> <span class="nx">base64encode</span><span class="err">(</span><span class="nx">var</span><span class="err">.</span><span class="nx">db_password</span><span class="err">)</span>
    <span class="nx">stage</span>        <span class="p">=</span> <span class="s2">"CURRENT"</span>
  <span class="p">}</span>

  <span class="nx">secret_rules</span> <span class="p">{</span>
    <span class="nx">rule_type</span>                                     <span class="p">=</span> <span class="s2">"SECRET_EXPIRY_RULE"</span>
    <span class="nx">secret_version_expiry_interval</span>                <span class="p">=</span> <span class="s2">"P90D"</span>
    <span class="nx">is_secret_content_retrieval_blocked_on_expiry</span> <span class="p">=</span> <span class="kc">true</span>
  <span class="p">}</span>

  <span class="nx">secret_rules</span> <span class="p">{</span>
    <span class="nx">rule_type</span>                              <span class="p">=</span> <span class="s2">"SECRET_REUSE_RULE"</span>
    <span class="nx">is_enforced_on_deleted_secret_versions</span> <span class="p">=</span> <span class="kc">true</span>
  <span class="p">}</span>
<span class="p">}</span></code></pre></figure>

<p>The <code class="language-plaintext highlighter-rouge">secret_rules</code> blocks are the component most often overlooked in basic implementations, and the one that makes the biggest difference in production:</p>

<p><strong>SECRET_EXPIRY_RULE</strong> with <code class="language-plaintext highlighter-rouge">P90D</code> makes the secret expire after 90 days. The critical part is <code class="language-plaintext highlighter-rouge">is_secret_content_retrieval_blocked_on_expiry = true</code>. By default this field is <code class="language-plaintext highlighter-rouge">false</code>, meaning that even when the secret expires, applications can still read it. That makes expiration decorative. With <code class="language-plaintext highlighter-rouge">true</code>, OCI blocks access to the secret bundle once it expires, forcing real rotation.</p>

<p><strong>SECRET_REUSE_RULE</strong> with <code class="language-plaintext highlighter-rouge">is_enforced_on_deleted_secret_versions = true</code> prevents reuse of a previous secret value, even in deleted versions. This is a compliance control relevant in regulated environments.</p>

<h3 id="outputs-for-later-reference">Outputs for later reference</h3>

<p>Outputs are important both for verification and so that other Terraform modules can reference these resources:</p>

<figure class="highlight"><pre><code class="language-hcl" data-lang="hcl"><span class="nx">output</span> <span class="s2">"vault_id"</span> <span class="p">{</span>
  <span class="nx">description</span> <span class="p">=</span> <span class="s2">"Vault OCID"</span>
  <span class="nx">value</span>       <span class="p">=</span> <span class="nx">oci_kms_vault</span><span class="err">.</span><span class="nx">app_vault</span><span class="err">.</span><span class="nx">id</span>
<span class="p">}</span>

<span class="nx">output</span> <span class="s2">"vault_management_endpoint"</span> <span class="p">{</span>
  <span class="nx">description</span> <span class="p">=</span> <span class="s2">"Vault management endpoint (required for key operations)"</span>
  <span class="nx">value</span>       <span class="p">=</span> <span class="nx">oci_kms_vault</span><span class="err">.</span><span class="nx">app_vault</span><span class="err">.</span><span class="nx">management_endpoint</span>
<span class="p">}</span>

<span class="nx">output</span> <span class="s2">"key_id"</span> <span class="p">{</span>
  <span class="nx">description</span> <span class="p">=</span> <span class="s2">"Master encryption key OCID"</span>
  <span class="nx">value</span>       <span class="p">=</span> <span class="nx">oci_kms_key</span><span class="err">.</span><span class="nx">app_key</span><span class="err">.</span><span class="nx">id</span>
<span class="p">}</span>

<span class="nx">output</span> <span class="s2">"db_secret_id"</span> <span class="p">{</span>
  <span class="nx">description</span> <span class="p">=</span> <span class="s2">"Database secret OCID"</span>
  <span class="nx">value</span>       <span class="p">=</span> <span class="nx">oci_vault_secret</span><span class="err">.</span><span class="nx">db_password</span><span class="err">.</span><span class="nx">id</span>
<span class="p">}</span></code></pre></figure>

<h2 id="the-state-file-problem">The state file problem</h2>

<p>This is the point where most projects fail silently, and it’s worth pausing.</p>

<p>OCI Vault exposes two data sources for reading secrets in Terraform:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">oci_vault_secret</code>: returns <strong>only metadata</strong> about the secret (OCID, name, state, dates). The secret value never appears in the state.</li>
  <li><code class="language-plaintext highlighter-rouge">oci_secrets_secretbundle</code>: returns the <strong>actual content</strong> of the secret, base64-encoded, which anyone can decode. <strong>This value is stored in the state file.</strong></li>
</ul>

<p>Terraform itself does not encrypt the state file. If your backend is an AWS S3 bucket or an OCI Object Storage bucket without additional encryption and tight access control, the secret sits in plain text in the state, and anyone who can read the backend can read the secret.</p>

<p>The safest pattern is to never read the secret value from Terraform. Applications should retrieve it at runtime using the OCI SDK or CLI with Instance Principal, not during apply. If you need to reference a secret’s OCID in another resource, use <code class="language-plaintext highlighter-rouge">oci_vault_secret</code> (metadata only) or directly reference the output of the resource that created it.</p>

<p>If for some operational reason you need to read the bundle in Terraform, there are three mitigations:</p>

<p><strong>1. KMS-encrypted backend.</strong> If you use OCI Object Storage as a Terraform backend, you can configure it with an OCI Vault key so the state file is encrypted at rest. The secret is still in the state, but the state is encrypted with a key whose access you control with IAM.</p>

<p><strong>2. Automatic secret generation.</strong> Some secret types support <code class="language-plaintext highlighter-rouge">enable_auto_generation = true</code> in <code class="language-plaintext highlighter-rouge">oci_vault_secret</code>. In that case, OCI generates the value internally and it never goes through Terraform — the state only contains the OCID, never the value. This is ideal for database passwords that you don’t need to know yourself, only the application does.</p>

<p><strong>3. Separate provisioning.</strong> The vault and keys are managed with Terraform. Secret values are loaded with the CLI or a separate pipeline with limited access. Terraform manages the infrastructure, not the sensitive data.</p>

<p>The recommended posture: use Terraform to create the vault infrastructure (vault, key, secret resource with a placeholder value or with auto-generation), and leave injecting the real value for a separate step outside the Terraform state.</p>
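<p>As a rough illustration of the auto-generation option, the sketch below asks OCI to generate the password so that no value passes through a Terraform variable. The resource label <code class="language-plaintext highlighter-rouge">db_password_generated</code>, the secret name, and the chosen template and length are assumptions for illustration. The <code class="language-plaintext highlighter-rouge">secret_generation_context</code> arguments follow the OCI provider schema as best understood here, so verify them against the provider version you run.</p>

<figure class="highlight"><pre><code class="language-hcl" data-lang="hcl">resource "oci_vault_secret" "db_password_generated" {
  compartment_id = var.compartment_id
  vault_id       = oci_kms_vault.app_vault.id
  key_id         = oci_kms_key.app_key.id
  secret_name    = "app-db-password-generated" # hypothetical name
  description    = "Database password generated by OCI, never supplied by Terraform"

  # OCI generates the value, so there is no secret_content block carrying it.
  enable_auto_generation = true

  secret_generation_context {
    generation_type     = "PASSPHRASE"
    generation_template = "DBAAS_DEFAULT_PASSWORD" # assumption: pick the template that fits your database
    passphrase_length   = 32                       # assumption
  }

  # Same expiry rule as the manually supplied secret.
  secret_rules {
    rule_type                                     = "SECRET_EXPIRY_RULE"
    secret_version_expiry_interval                = "P90D"
    is_secret_content_retrieval_blocked_on_expiry = true
  }
}</code></pre></figure>

<p>The application still reads the value at runtime through the secret bundle. Only the provisioning path changes.</p>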

<h2 id="iam-policies-granular-access-control">IAM Policies: granular access control</h2>

<p>This is the component that’s hardest to get right, because OCI IAM has a verb matrix that’s not immediately obvious.</p>

<h3 id="the-verb-matrix-for-secrets">The verb matrix for secrets</h3>

<table>
  <thead>
    <tr>
      <th>Verb</th>
      <th>Operation</th>
      <th>Who needs it</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">read secret-bundles</code></td>
      <td>GetSecretBundle — retrieve the secret value</td>
      <td>App workloads, production instances</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">read secrets</code></td>
      <td>GetSecret — view secret metadata</td>
      <td>Audit, CI/CD pipelines that only reference OCIDs</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">use secrets</code></td>
      <td>ListSecretVersions and rotation operations</td>
      <td>Automated rotation tools</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">manage secret-family</code></td>
      <td>Full control — create, delete, rotate, modify</td>
      <td>Security administrators only</td>
    </tr>
  </tbody>
</table>

<p>The golden rule: <strong>never grant <code class="language-plaintext highlighter-rouge">manage secret-family</code> to an application workload</strong>. With that verb, the application can delete secrets, create versions with arbitrary values, and modify expiration rules. The blast radius if the application is compromised extends to the entire vault.</p>

<h3 id="policies-for-the-administrator-team">Policies for the administrator team</h3>

<figure class="highlight"><pre><code class="language-hcl" data-lang="hcl"><span class="nx">resource</span> <span class="s2">"oci_identity_policy"</span> <span class="s2">"vault_admin_policy"</span> <span class="p">{</span>
  <span class="nx">compartment_id</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">compartment_id</span>
  <span class="nx">name</span>           <span class="p">=</span> <span class="s2">"vault-admin-policy"</span>
  <span class="nx">description</span>    <span class="p">=</span> <span class="s2">"Allow SecurityAdmins group to fully manage vault resources"</span>

  <span class="nx">statements</span> <span class="p">=</span> <span class="p">[</span>
    <span class="s2">"Allow group SecurityAdmins to manage vaults in compartment id ${var.compartment_id}"</span><span class="p">,</span>
    <span class="s2">"Allow group SecurityAdmins to manage keys in compartment id ${var.compartment_id}"</span><span class="p">,</span>
    <span class="s2">"Allow group SecurityAdmins to manage secret-family in compartment id ${var.compartment_id}"</span><span class="p">,</span>
  <span class="p">]</span>
<span class="p">}</span></code></pre></figure>

<h3 id="dynamic-groups-for-instance-principal">Dynamic Groups for Instance Principal</h3>

<p>Dynamic Groups are OCI’s mechanism for compute instances to authenticate with IAM without static credentials. An instance becomes a member by satisfying the group’s matching rule (in this post, membership in a specific compartment), and as a member it receives the policies you assign to the group.</p>

<p>An important operational detail: <strong>Dynamic Groups are created at the tenancy level, not the compartment level</strong>. The <code class="language-plaintext highlighter-rouge">compartment_id</code> of the <code class="language-plaintext highlighter-rouge">oci_identity_dynamic_group</code> resource must be the tenancy OCID, even if the matching rule filters instances from a specific compartment. If you use a child compartment OCID, the OCI provider will return an error.</p>

<figure class="highlight"><pre><code class="language-hcl" data-lang="hcl"><span class="nx">resource</span> <span class="s2">"oci_identity_dynamic_group"</span> <span class="s2">"app_instances"</span> <span class="p">{</span>
  <span class="nx">compartment_id</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">tenancy_ocid</span>   <span class="c1"># Always tenancy, not compartment</span>
  <span class="nx">name</span>           <span class="p">=</span> <span class="s2">"app-compute-instances"</span>
  <span class="nx">description</span>    <span class="p">=</span> <span class="s2">"Compute instances in the app production compartment"</span>
  <span class="nx">matching_rule</span>  <span class="p">=</span> <span class="s2">"All {instance.compartment.id = '${var.compartment_id}'}"</span>
<span class="p">}</span>

<span class="nx">resource</span> <span class="s2">"oci_identity_policy"</span> <span class="s2">"instance_secret_policy"</span> <span class="p">{</span>
  <span class="nx">compartment_id</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">compartment_id</span>
  <span class="nx">name</span>           <span class="p">=</span> <span class="s2">"instance-vault-access-policy"</span>
  <span class="nx">description</span>    <span class="p">=</span> <span class="s2">"Allow app instances to retrieve secrets from vault"</span>

  <span class="nx">statements</span> <span class="p">=</span> <span class="p">[</span>
    <span class="s2">"Allow dynamic-group app-compute-instances to read secret-bundles in compartment id ${var.compartment_id}"</span><span class="p">,</span>
  <span class="p">]</span>
<span class="p">}</span></code></pre></figure>

<p>If you want to narrow access to a specific secret rather than the entire compartment, OCI supports conditions in IAM statements:</p>

<figure class="highlight"><pre><code class="language-hcl" data-lang="hcl"><span class="nx">resource</span> <span class="s2">"oci_identity_policy"</span> <span class="s2">"instance_specific_secret_policy"</span> <span class="p">{</span>
  <span class="nx">compartment_id</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">compartment_id</span>
  <span class="nx">name</span>           <span class="p">=</span> <span class="s2">"instance-specific-secret-policy"</span>
  <span class="nx">description</span>    <span class="p">=</span> <span class="s2">"Allow app instances to retrieve only the db password secret"</span>

  <span class="nx">statements</span> <span class="p">=</span> <span class="p">[</span>
    <span class="s2">"Allow dynamic-group app-compute-instances to read secret-bundles in compartment id ${var.compartment_id} where target.secret.name = 'app-db-password'"</span><span class="p">,</span>
  <span class="p">]</span>
<span class="p">}</span></code></pre></figure>

<p>This granularity is particularly useful in multi-application environments where different services need access to different secrets within the same compartment.</p>
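<p>A variation on the same idea, sketched under assumptions: instead of hardcoding the group and secret names, the statement can interpolate the Terraform resources defined earlier, which also gives Terraform an implicit dependency on both. The policy name below is hypothetical, and <code class="language-plaintext highlighter-rouge">target.secret.id</code> is used as the condition variable on the assumption that pinning to the OCID suits you better than matching by name.</p>

<figure class="highlight"><pre><code class="language-hcl" data-lang="hcl">resource "oci_identity_policy" "instance_secret_by_id_policy" {
  compartment_id = var.compartment_id
  name           = "instance-secret-by-id-policy" # hypothetical name
  description    = "Allow app instances to retrieve only the db password secret, matched by OCID"

  statements = [
    # Group name and secret OCID come from the resources created earlier,
    # so this policy is only created after both of them exist.
    "Allow dynamic-group ${oci_identity_dynamic_group.app_instances.name} to read secret-bundles in compartment id ${var.compartment_id} where target.secret.id = '${oci_vault_secret.db_password.id}'",
  ]
}</code></pre></figure>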

<h2 id="testing-and-verification">Testing and verification</h2>

<p>With the infrastructure applied, verification has four steps: vault and key state, secret retrieval from your local machine, retrieval from an instance using Instance Principal, and a check of the expiration rules.</p>

<h3 id="verify-vault-and-key-state">Verify vault and key state</h3>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Verify the vault is ACTIVE</span>
oci kms management vault get <span class="se">\</span>
  <span class="nt">--vault-id</span> <span class="s2">"</span><span class="si">$(</span>terraform output <span class="nt">-raw</span> vault_id<span class="si">)</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--query</span> <span class="s1">'data."lifecycle-state"'</span> <span class="nt">--raw-output</span>

<span class="c"># Verify the key is ENABLED</span>
oci kms management key get <span class="se">\</span>
  <span class="nt">--key-id</span> <span class="s2">"</span><span class="si">$(</span>terraform output <span class="nt">-raw</span> key_id<span class="si">)</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--endpoint</span> <span class="s2">"</span><span class="si">$(</span>terraform output <span class="nt">-raw</span> vault_management_endpoint<span class="si">)</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--query</span> <span class="s1">'data."lifecycle-state"'</span> <span class="nt">--raw-output</span></code></pre></figure>

<p>The expected vault state is <code class="language-plaintext highlighter-rouge">ACTIVE</code>. The expected key state is <code class="language-plaintext highlighter-rouge">ENABLED</code>. If either resource is still in <code class="language-plaintext highlighter-rouge">CREATING</code>, wait a few seconds and query again.</p>

<h3 id="retrieve-and-verify-the-secret">Retrieve and verify the secret</h3>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Retrieve the secret and decode the base64</span>
oci secrets secret-bundle get <span class="se">\</span>
  <span class="nt">--secret-id</span> <span class="s2">"</span><span class="si">$(</span>terraform output <span class="nt">-raw</span> db_secret_id<span class="si">)</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--query</span> <span class="s1">'data."secret-bundle-content".content'</span> <span class="se">\</span>
  <span class="nt">--raw-output</span> | <span class="nb">base64</span> <span class="nt">--decode</span></code></pre></figure>

<p>If the output matches the value you passed in <code class="language-plaintext highlighter-rouge">var.db_password</code>, the complete cycle works: Terraform created the secret, OCI encrypted it with the MEK, and the CLI retrieved it correctly.</p>

<h3 id="verify-access-from-an-instance-with-instance-principal">Verify access from an instance with Instance Principal</h3>

<p>From an instance that belongs to the compartment configured in the Dynamic Group’s matching rule:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># On the compute instance — no static credentials needed</span>
oci secrets secret-bundle get <span class="se">\</span>
  <span class="nt">--secret-id</span> <span class="s2">"ocid1.vaultsecret.oc1.xxx"</span> <span class="se">\</span>
  <span class="nt">--auth</span> instance_principal <span class="se">\</span>
  <span class="nt">--query</span> <span class="s1">'data."secret-bundle-content".content'</span> <span class="nt">--raw-output</span> | <span class="nb">base64</span> <span class="nt">--decode</span></code></pre></figure>

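<p>If this command returns the secret value without needing API keys configured on the instance, Instance Principal is working correctly. If it returns an authorization error, verify that the instance is in the correct compartment, that the Dynamic Group has the appropriate matching rule, and that the policy granting <code class="language-plaintext highlighter-rouge">read secret-bundles</code> covers the compartment where the secret lives.</p>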

<h3 id="verify-expiration-rules">Verify expiration rules</h3>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># View secret metadata including rules and expiration date</span>
oci vault secret get <span class="se">\</span>
  <span class="nt">--secret-id</span> <span class="s2">"</span><span class="si">$(</span>terraform output <span class="nt">-raw</span> db_secret_id<span class="si">)</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--query</span> <span class="s1">'data.{name:"secret-name", state:"lifecycle-state", rules:"secret-rules"}'</span></code></pre></figure>

<h2 id="best-practices">Best Practices</h2>

<p><strong>Don’t create a <code class="language-plaintext highlighter-rouge">VIRTUAL_PRIVATE</code> vault in the same apply as its keys and secrets if you’re just starting.</strong> A <code class="language-plaintext highlighter-rouge">VIRTUAL_PRIVATE</code> vault takes several minutes to provision its dedicated HSM. If Terraform tries to create keys and secrets before the vault is fully operational, the apply fails. Separating vault creation into its own module with a prior <code class="language-plaintext highlighter-rouge">terraform apply</code> avoids this problem.</p>

<p><strong>Use <code class="language-plaintext highlighter-rouge">protection_mode = "HSM"</code> in production, always.</strong> With <code class="language-plaintext highlighter-rouge">SOFTWARE</code>, the key material can be exported. That means with the right permissions, someone can extract the key from the vault. With <code class="language-plaintext highlighter-rouge">HSM</code>, the material never leaves the hardware. The additional cost of HSM is marginal compared to the risk of an exportable key.</p>

<p><strong>The vault type is immutable after creation.</strong> If you need to migrate from <code class="language-plaintext highlighter-rouge">DEFAULT</code> to <code class="language-plaintext highlighter-rouge">VIRTUAL_PRIVATE</code>, the process is: create a new <code class="language-plaintext highlighter-rouge">VIRTUAL_PRIVATE</code> vault, create new keys, rotate all secrets to the new vault, and delete the old one. There’s no in-place upgrade. Plan your vault type before the first deploy.</p>

<p><strong>Enable <code class="language-plaintext highlighter-rouge">is_secret_content_retrieval_blocked_on_expiry = true</code> in all expiration rules.</strong> The default is <code class="language-plaintext highlighter-rouge">false</code>, which turns expiration into a toothless alert. With <code class="language-plaintext highlighter-rouge">true</code>, OCI blocks access to the secret once it expires, forcing rotation. Without this, a secret “expired” six months ago is still accessible.</p>

<p><strong>Separate key management from secret management.</strong> Keys (MEK) are the responsibility of the security team. Individual secrets can be the responsibility of application teams, with the constraint that they can only use pre-approved keys. This is modeled in IAM by separating groups and policies: <code class="language-plaintext highlighter-rouge">SecurityAdmins</code> has <code class="language-plaintext highlighter-rouge">manage keys</code>, application teams have <code class="language-plaintext highlighter-rouge">use keys</code> and <code class="language-plaintext highlighter-rouge">manage secret-family</code> in their compartment.</p>

<p><strong>Use encrypted backends for Terraform state.</strong> If your Terraform backend is in OCI Object Storage, configure server-side encryption with an OCI Vault key. This doesn’t eliminate the risk of secrets being in the state, but adds a layer of at-rest protection with auditable access control.</p>

<p><strong>Prefer auto-generation or separate provisioning over reading secrets in Terraform.</strong> The safest pattern is for Terraform to never know the actual value of the secrets it manages. For database passwords, enable <code class="language-plaintext highlighter-rouge">enable_auto_generation</code>. For secrets you need to control, load them with the CLI in a separate pipeline step with reduced permissions.</p>
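<p>For the encrypted-backend practice, a minimal sketch of a state bucket encrypted with a Vault key could look like the following. The bucket name, the resource labels, and the <code class="language-plaintext highlighter-rouge">state_key_ocid</code> variable are hypothetical. The sketch also assumes that the Object Storage service in your region has already been granted permission to use the key through an IAM policy.</p>

<figure class="highlight"><pre><code class="language-hcl" data-lang="hcl"># Hypothetical input: OCID of a Vault key reserved for state encryption.
variable "state_key_ocid" {
  type = string
}

# Look up the tenancy's Object Storage namespace.
data "oci_objectstorage_namespace" "this" {
  compartment_id = var.compartment_id
}

resource "oci_objectstorage_bucket" "tf_state" {
  compartment_id = var.compartment_id
  namespace      = data.oci_objectstorage_namespace.this.namespace
  name           = "terraform-state-prod" # hypothetical name
  access_type    = "NoPublicAccess"
  versioning     = "Enabled"          # keeps earlier state versions recoverable
  kms_key_id     = var.state_key_ocid # customer-managed key instead of the Oracle-managed default
}</code></pre></figure>

<p>Creating this bucket from a small bootstrap configuration, separate from the workloads whose state it stores, avoids a circular dependency between the backend and the resources it tracks.</p>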

<h2 id="conclusion">Conclusion</h2>

<p>We built the complete OCI Vault infrastructure with Terraform: vault with correctly configured type and protection mode, AES-256 master encryption key in HSM, secrets with expiration rules that actually block access, and the correct IAM policies for both administrators and workloads via Instance Principal.</p>

<p>The most important point is not the code itself, but the gotchas you need to know before going to production: the vault type is irreversible, key length is in bytes, <code class="language-plaintext highlighter-rouge">oci_secrets_secretbundle</code> writes the value to the state, and <code class="language-plaintext highlighter-rouge">is_secret_content_retrieval_blocked_on_expiry</code> is <code class="language-plaintext highlighter-rouge">false</code> by default. With that clear, the rest is configuration.</p>

<p>The natural next step is integrating this vault with CI/CD pipelines using OCI DevOps or GitHub Actions with OIDC, so pipelines retrieve secrets at runtime without static credentials. That’s material for another post.</p>

<p>Happy scripting!</p>]]></content><author><name>Victor Silva</name></author><category term="Oracle" /><category term="Terraform" /><category term="OCI" /><category term="vault" /><category term="terraform" /><category term="secrets-management" /><category term="iam" /><category term="security" /><summary type="html"><![CDATA[Learn how to manage secrets in OCI Vault with Terraform: vault, keys, IAM policies, and the right pattern to prevent your secrets from ending up in the state file.]]></summary></entry><entry><title type="html">Oracle Cloud Security Zones: Custom Recipes, Terraform, and Day-2 Operations</title><link href="https://blog.victorsilva.com.uy/oci-security-zones-part2/" rel="alternate" type="text/html" title="Oracle Cloud Security Zones: Custom Recipes, Terraform, and Day-2 Operations" /><published>2026-03-25T01:18:34+00:00</published><updated>2026-03-25T01:18:34+00:00</updated><id>https://blog.victorsilva.com.uy/oci-security-zones-part2</id><content type="html" xml:base="https://blog.victorsilva.com.uy/oci-security-zones-part2/"><![CDATA[<p><a href="https://blog.victorsilva.com.uy/oci-security-zones/">Part 1 of this series</a> covered the conceptual foundation of OCI Security Zones: what they are, how they enforce policy by denying API calls outright, the relationship with Cloud Guard and Security Advisor, and what the Maximum Security Recipe actually blocks. If you haven’t read it, start there.</p>

<p>This post picks up where Part 1 left off. It answers the next set of questions practitioners ask after they understand the concept: <em>How do I build a custom recipe that fits my workload? How do I automate this with Terraform instead of clicking through the console? And what are the operational surprises waiting on day two?</em></p>

<h2 id="custom-recipes-vs-maximum-security-a-decision-framework">Custom Recipes vs. Maximum Security: A Decision Framework</h2>

<p>The Maximum Security Recipe is Oracle’s nuclear option — it enables every available Security Zone policy simultaneously and cannot be modified. In practice, most production workloads cannot tolerate it without significant architectural changes, because it blocks things like internet gateways, NAT gateways, public load balancers, volume detachment, instance termination, and OKE cluster operations.</p>

<p>Custom recipes let you select which Oracle-authored policies to include. You cannot write your own policy logic — the library is curated by Oracle — but you can assemble a policy set appropriate to each environment.</p>

<p>The most useful mental model for custom recipe construction comes from OCI’s own Landing Zone framework, which maps policies to <strong>CIS Benchmark levels</strong>:</p>

<table>
  <thead>
    <tr>
      <th>CIS Level</th>
      <th>Policies Included</th>
      <th>Best For</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Level 1</strong></td>
      <td>Deny public buckets, public subnets, internet gateway; deny databases without backup; require customer-managed encryption keys</td>
      <td>Most production workloads</td>
    </tr>
    <tr>
      <td><strong>Level 2</strong></td>
      <td>All Level 1 + data confinement policies, Oracle-approved configurations, port restriction</td>
      <td>Regulated data: PHI, PCI, classified</td>
    </tr>
  </tbody>
</table>

<p>Start with CIS Level 1 for standard production compartments. Apply Level 2 selectively to compartments holding your most sensitive data. Avoid Maximum Security unless you are in a greenfield environment specifically designed around its constraints — or you are onboarding to OCI via a Landing Zone that has already pre-validated compatibility.</p>

<p>The categories you most frequently need to reason about when customizing:</p>

<p><strong>Deny Public Access</strong> — the most operationally impactful category. Blocking <code class="language-plaintext highlighter-rouge">internet_gateway</code>, <code class="language-plaintext highlighter-rouge">NAT_gateway</code>, and <code class="language-plaintext highlighter-rouge">public_subnets</code> means your VCN topology must be private-only. For environments that legitimately need outbound internet access (to pull container images, reach OCI service endpoints, etc.), this either requires a shared services VCN with a NAT gateway outside the zone, or removing the NAT gateway policy from your custom recipe.</p>

<p><strong>Require Customer-Managed Encryption Keys</strong> — the four vault key policies (<code class="language-plaintext highlighter-rouge">deny block_volume_without_vault_key</code>, <code class="language-plaintext highlighter-rouge">deny boot_volume_without_vault_key</code>, <code class="language-plaintext highlighter-rouge">deny file_system_without_vault_key</code>, <code class="language-plaintext highlighter-rouge">deny buckets_without_vault_key</code>) require OCI Vault to be set up and a Master Encryption Key provisioned <em>before</em> applying the zone. Vault is not part of Always Free — you need a standard or virtual private vault. The vault should be in the same zone or a parent compartment to avoid key access itself violating zone policies.</p>

<p><strong>Oracle-Approved Configurations</strong> — this category includes policies that block compute instance termination (<code class="language-plaintext highlighter-rouge">deny terminate_instance</code>), volume detachment (<code class="language-plaintext highlighter-rouge">deny detach_volume</code>), and OKE operations (<code class="language-plaintext highlighter-rouge">deny manage_oke_service</code>). These are frequently too restrictive for teams that use autoscaling or perform routine maintenance. Exclude them from custom recipes unless you have a specific operational reason to include them.</p>

<h2 id="building-a-custom-recipe-via-cli">Building a Custom Recipe via CLI</h2>

<p>The CLI workflow has three steps: list available policies and collect the OCIDs you want, create a recipe from those OCIDs, then create the zone with the recipe.</p>

<h3 id="step-1-list-and-filter-available-policies">Step 1: List and filter available policies</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># List all available security policies in the tenancy</span>
oci cloud-guard security-policy-collection list-security-policies <span class="se">\</span>
  <span class="nt">--compartment-id</span> <span class="nv">$COMPARTMENT_ID</span> <span class="se">\</span>
  <span class="nt">--all</span>

<span class="c"># Filter to find a specific policy's OCID</span>
oci cloud-guard security-policy-collection list-security-policies <span class="se">\</span>
  <span class="nt">--compartment-id</span> <span class="nv">$COMPARTMENT_ID</span> <span class="se">\</span>
  <span class="nt">--display-name</span> <span class="s2">"deny public_buckets"</span> <span class="se">\</span>
  <span class="nt">--lifecycle-state</span> ACTIVE
</code></pre></div></div>

<p><strong>Important:</strong> Policy OCIDs are region-specific. You must look them up in the target tenancy and region — you cannot hardcode them from documentation or another environment.</p>

<h3 id="step-2-create-the-recipe">Step 2: Create the recipe</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oci cloud-guard security-recipe create <span class="se">\</span>
  <span class="nt">--compartment-id</span> <span class="nv">$COMPARTMENT_ID</span> <span class="se">\</span>
  <span class="nt">--display-name</span> <span class="s2">"prod-cis-level-1-recipe"</span> <span class="se">\</span>
  <span class="nt">--security-policies</span> <span class="s1">'["ocid1.securityzonepolicy.oc1..aaa...xyz", "ocid1.securityzonepolicy.oc1..aaa...abc"]'</span>
</code></pre></div></div>

<h3 id="step-3-create-the-zone">Step 3: Create the zone</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oci cloud-guard security-zone create <span class="se">\</span>
  <span class="nt">--compartment-id</span> <span class="nv">$COMPARTMENT_ID</span> <span class="se">\</span>
  <span class="nt">--display-name</span> <span class="s2">"production-security-zone"</span> <span class="se">\</span>
  <span class="nt">--security-zone-recipe-id</span> <span class="nv">$RECIPE_OCID</span>
</code></pre></div></div>

<p>To update an existing zone to use a different recipe:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oci cloud-guard security-zone update <span class="se">\</span>
  <span class="nt">--security-zone-id</span> <span class="nv">$ZONE_OCID</span> <span class="se">\</span>
  <span class="nt">--security-zone-recipe-id</span> <span class="nv">$NEW_RECIPE_OCID</span>
</code></pre></div></div>

<h2 id="terraform-automation">Terraform Automation</h2>

<p>The CLI workflow above does not scale. For any environment managed as code, there are two Terraform paths: the Oracle Landing Zone module, or native provider resources.</p>

<h3 id="path-1-oracle-landing-zone-security-module">Path 1: Oracle Landing Zone Security Module</h3>

<p>Oracle publishes and maintains a <code class="language-plaintext highlighter-rouge">terraform-oci-modules-security</code> module (github.com/oci-landing-zones/terraform-oci-modules-security) that abstracts away the policy OCID lookup problem. You specify a <code class="language-plaintext highlighter-rouge">cis_level</code> and it selects the appropriate policies automatically:</p>

<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">module</span> <span class="s2">"security_zones"</span> <span class="p">{</span>
  <span class="nx">source</span>       <span class="p">=</span> <span class="s2">"github.com/oci-landing-zones/terraform-oci-modules-security//security-zones"</span>
  <span class="nx">tenancy_ocid</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">tenancy_ocid</span>

  <span class="nx">security_zones_configuration</span> <span class="p">=</span> <span class="p">{</span>
    <span class="nx">reporting_region</span> <span class="p">=</span> <span class="s2">"us-ashburn-1"</span>

    <span class="nx">recipes</span> <span class="p">=</span> <span class="p">{</span>
      <span class="nx">CIS</span><span class="err">-</span><span class="nx">L1</span><span class="err">-</span><span class="nx">RECIPE</span> <span class="p">=</span> <span class="p">{</span>
        <span class="nx">name</span>           <span class="p">=</span> <span class="s2">"prod-cis-level-1-recipe"</span>
        <span class="nx">description</span>    <span class="p">=</span> <span class="s2">"CIS Level 1 recipe for production workloads"</span>
        <span class="nx">compartment_id</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">compartment_id</span>
        <span class="nx">cis_level</span>      <span class="p">=</span> <span class="s2">"1"</span>
      <span class="p">}</span>
      <span class="nx">CIS</span><span class="err">-</span><span class="nx">L2</span><span class="err">-</span><span class="nx">RECIPE</span> <span class="p">=</span> <span class="p">{</span>
        <span class="nx">name</span>           <span class="p">=</span> <span class="s2">"sensitive-cis-level-2-recipe"</span>
        <span class="nx">description</span>    <span class="p">=</span> <span class="s2">"CIS Level 2 recipe for sensitive data compartments"</span>
        <span class="nx">compartment_id</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">compartment_id</span>
        <span class="nx">cis_level</span>      <span class="p">=</span> <span class="s2">"2"</span>
      <span class="p">}</span>
    <span class="p">}</span>

    <span class="nx">security_zones</span> <span class="p">=</span> <span class="p">{</span>
      <span class="nx">PROD</span><span class="err">-</span><span class="nx">ZONE</span> <span class="p">=</span> <span class="p">{</span>
        <span class="nx">name</span>           <span class="p">=</span> <span class="s2">"production-security-zone"</span>
        <span class="nx">compartment_id</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">prod_compartment_id</span>
        <span class="nx">recipe_key</span>     <span class="p">=</span> <span class="s2">"CIS-L1-RECIPE"</span>
      <span class="p">}</span>
      <span class="nx">SENSITIVE</span><span class="err">-</span><span class="nx">ZONE</span> <span class="p">=</span> <span class="p">{</span>
        <span class="nx">name</span>           <span class="p">=</span> <span class="s2">"sensitive-data-security-zone"</span>
        <span class="nx">compartment_id</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">sensitive_compartment_id</span>
        <span class="nx">recipe_key</span>     <span class="p">=</span> <span class="s2">"CIS-L2-RECIPE"</span>
      <span class="p">}</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>Prerequisites before applying:</strong></p>
<ul>
  <li>Terraform &gt;= 1.3.0</li>
  <li>Cloud Guard must be enabled in the tenancy</li>
  <li>IAM policy: <code class="language-plaintext highlighter-rouge">allow group &lt;SecurityAdmins&gt; to manage cloud-guard-family in tenancy</code></li>
</ul>

<p>This module approach is the recommended path for teams using OCI Core or Zero Trust Landing Zones, because the module is tested against Oracle’s own reference architectures.</p>

<h3 id="path-2-native-provider-resources">Path 2: Native Provider Resources</h3>

<p>For teams that prefer direct resource control without the module abstraction:</p>

<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Fetch policy OCIDs from the tenancy — required because they are region-specific</span>
<span class="nx">data</span> <span class="s2">"oci_cloud_guard_security_policies"</span> <span class="s2">"all_policies"</span> <span class="p">{</span>
  <span class="nx">compartment_id</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">tenancy_ocid</span>
<span class="p">}</span>

<span class="c1"># Locals to extract specific policy OCIDs by name</span>
<span class="nx">locals</span> <span class="p">{</span>
  <span class="nx">policy_map</span> <span class="p">=</span> <span class="p">{</span>
    <span class="nx">for</span> <span class="nx">p</span> <span class="nx">in</span> <span class="nx">data</span><span class="err">.</span><span class="nx">oci_cloud_guard_security_policies</span><span class="err">.</span><span class="nx">all_policies</span><span class="err">.</span><span class="nx">security_policy_collection</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="err">.</span><span class="nx">items</span> <span class="err">:</span>
    <span class="nx">p</span><span class="err">.</span><span class="nx">display_name</span> <span class="p">=</span><span class="err">&gt;</span> <span class="nx">p</span><span class="err">.</span><span class="nx">id</span>
  <span class="p">}</span>
<span class="p">}</span>

<span class="nx">resource</span> <span class="s2">"oci_cloud_guard_security_recipe"</span> <span class="s2">"custom_recipe"</span> <span class="p">{</span>
  <span class="nx">compartment_id</span> <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">compartment_id</span>
  <span class="nx">display_name</span>   <span class="p">=</span> <span class="s2">"custom-security-recipe"</span>
  <span class="nx">description</span>    <span class="p">=</span> <span class="s2">"Custom recipe for production workloads"</span>
  <span class="nx">security_policies</span> <span class="p">=</span> <span class="p">[</span>
    <span class="nx">local</span><span class="err">.</span><span class="nx">policy_map</span><span class="p">[</span><span class="s2">"deny public_buckets"</span><span class="p">],</span>
    <span class="nx">local</span><span class="err">.</span><span class="nx">policy_map</span><span class="p">[</span><span class="s2">"deny public_subnets"</span><span class="p">],</span>
    <span class="nx">local</span><span class="err">.</span><span class="nx">policy_map</span><span class="p">[</span><span class="s2">"deny internet_gateway"</span><span class="p">],</span>
    <span class="nx">local</span><span class="err">.</span><span class="nx">policy_map</span><span class="p">[</span><span class="s2">"deny block_volume_without_vault_key"</span><span class="p">],</span>
    <span class="nx">local</span><span class="err">.</span><span class="nx">policy_map</span><span class="p">[</span><span class="s2">"deny boot_volume_without_vault_key"</span><span class="p">],</span>
    <span class="nx">local</span><span class="err">.</span><span class="nx">policy_map</span><span class="p">[</span><span class="s2">"deny database_without_backup"</span><span class="p">],</span>
  <span class="p">]</span>
<span class="p">}</span>

<span class="nx">resource</span> <span class="s2">"oci_cloud_guard_security_zone"</span> <span class="s2">"production_zone"</span> <span class="p">{</span>
  <span class="nx">compartment_id</span>          <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">compartment_id</span>
  <span class="nx">display_name</span>            <span class="p">=</span> <span class="s2">"production-security-zone"</span>
  <span class="nx">description</span>             <span class="p">=</span> <span class="s2">"Security zone for production workloads"</span>
  <span class="nx">security_zone_recipe_id</span> <span class="p">=</span> <span class="nx">oci_cloud_guard_security_recipe</span><span class="err">.</span><span class="nx">custom_recipe</span><span class="err">.</span><span class="nx">id</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Using the <code class="language-plaintext highlighter-rouge">data</code> source and <code class="language-plaintext highlighter-rouge">locals</code> to map policy names to OCIDs avoids hardcoded OCID strings that would break across regions and environments.</p>

<p>The OCI Console also provides a <strong>“Save as Stack”</strong> button during recipe and zone creation wizards. This exports a Terraform configuration to Oracle Resource Manager — useful for teams bootstrapping an IaC workflow from a console-based starting point.</p>

<h2 id="operational-lifecycle-what-happens-after-day-one">Operational Lifecycle: What Happens After Day One</h2>

<h3 id="the-cloud-guard-target-side-effect">The Cloud Guard Target Side Effect</h3>

<p>This is the most commonly missed operational detail: <strong>when you create a security zone on a compartment, OCI deletes any existing Cloud Guard target for that compartment and replaces it with a security zone target</strong>.</p>

<p>If you had a manually configured Cloud Guard target with custom detector recipes on that compartment, those configurations are gone. The replacement target gets the default Oracle-managed detector recipe.</p>

<p>Audit your Cloud Guard targets before applying security zones to existing compartments. If you have custom detector configuration you want to preserve, document it before creating the zone and reapply it to the new target afterward.</p>

<h3 id="subcompartment-hierarchy">Subcompartment Hierarchy</h3>

<p>When a security zone is applied to a parent compartment, all subcompartments are automatically included. Subcompartments can have their own separate security zones (which creates a distinct Cloud Guard target for the subcompartment). A subcompartment can also be removed from the parent zone entirely, either from the Security Zones console or with the CLI:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oci cloud-guard security-zone remove <span class="se">\</span>
  <span class="nt">--security-zone-id</span> <span class="nv">$ZONE_OCID</span> <span class="se">\</span>
  <span class="nt">--compartment-id</span> <span class="nv">$SUBCOMPARTMENT_OCID</span>
</code></pre></div></div>

<p>The hard constraint remains: <strong>each compartment can belong to exactly one security zone</strong>. You cannot layer multiple recipes on a single compartment. If your workload needs different policy profiles within the same parent, the answer is separate child compartments with separate zones.</p>

<p>You also <strong>cannot move a compartment</strong> using the standard IAM console once it is part of a security zone. Use the Security Zones console for compartment operations.</p>

<h3 id="existing-resources-and-policy-violations">Existing Resources and Policy Violations</h3>

<p>Applying a security zone to a compartment that already contains non-compliant resources does not delete or modify those resources. Cloud Guard detects and reports the violations, but remediation is the operator’s responsibility.</p>

<p>The key constraint: when the recipe includes movement-restriction policies, you cannot resolve a violation by moving the non-compliant resource out of the compartment, because the zone denies the move itself. You must bring the resource into compliance in place (e.g., by encrypting an unencrypted block volume with a Vault key) before the zone will treat it as fully compliant.</p>

<p>For the same reason, you cannot move a non-compliant resource <em>into</em> a security zone compartment. All policies must be satisfied before the move is permitted.</p>

<h3 id="name-immutability">Name Immutability</h3>

<p>Once a security zone is created, <strong>its name cannot be changed</strong>. Only the description and recipe assignment can be updated. Establish a naming convention before deployment — <code class="language-plaintext highlighter-rouge">&lt;env&gt;-&lt;purpose&gt;-security-zone</code> works well — and document it. Renaming requires deleting and recreating the zone, which resets the Cloud Guard target again.</p>
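<p>Since a bad name can only be fixed by recreating the zone, it can help to check names before anything is created. The sketch below is one possible check in Python. The allowed environment values and the lowercase-alphanumeric rule for the purpose segment are assumptions, and <code class="language-plaintext highlighter-rouge">is_valid_zone_name</code> is a hypothetical helper.</p>

```python
import re

# Assumed environment prefixes; adjust to your own convention.
ENVIRONMENTS = ("dev", "stg", "prod")

# env, then one or more lowercase purpose segments, then "security-zone"
ZONE_NAME = re.compile(
    r"(?:%s)-[a-z0-9]+(?:-[a-z0-9]+)*-security-zone" % "|".join(ENVIRONMENTS)
)


def is_valid_zone_name(name: str) -> bool:
    """Return True when name follows env-purpose-security-zone."""
    return ZONE_NAME.fullmatch(name) is not None


for candidate in ("prod-payments-security-zone", "prod-security-zone"):
    print(candidate, is_valid_zone_name(candidate))
```

<p>A check like this fits naturally in a pre-apply validation step, where a rejected name costs nothing.</p>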

<h2 id="common-gotchas">Common Gotchas</h2>

<p><strong>Root compartment warning.</strong> Oracle’s documentation explicitly cautions against assigning a security zone to the root (tenancy) compartment. Doing so applies zone policies to every resource across the entire tenancy, which blocks a wide range of routine administrative operations. Apply zones at the workload compartment level, not the root.</p>

<p><strong>Database compatibility.</strong> Not all database configurations are compatible with Security Zones. Always Free Autonomous Databases and Autonomous Databases with public endpoints are incompatible with the Maximum Security Recipe. Paid configurations with private endpoints are compatible: Autonomous AI Database, Bare Metal DB systems, Virtual Machine DB systems, and Exadata Cloud DB systems. Both ends of a Data Guard association must sit in compartments of the same security zone; cross-zone Data Guard is blocked.</p>

<p><strong>Vault must exist before encryption policies apply.</strong> The four <code class="language-plaintext highlighter-rouge">deny *_without_vault_key</code> policies will cause resource creation to fail unless you have an OCI Vault with a Master Encryption Key already provisioned and accessible. If you include encryption policies in your recipe, provision the vault as part of the same Terraform apply (with correct ordering) or as a prerequisite stack. The vault should be in the same or a parent compartment to avoid key access itself triggering zone violations.</p>

<p><strong>Policy OCIDs are region-specific.</strong> Do not copy OCID values from one region’s recipe to another. Always look up policy OCIDs in the target region, either via CLI or the <code class="language-plaintext highlighter-rouge">data</code> source in Terraform. The module approach avoids this problem entirely by resolving OCIDs internally.</p>

<h2 id="what-to-cover-in-part-3">What to Cover in Part 3</h2>

<p>The natural next topic in this series is Cloud Guard in depth: how detector recipes work, when to use the Oracle-managed recipe vs. a custom one, how auto-remediation is configured, and how to interpret Cloud Guard’s risk score output in the context of Security Zone policy violations. Zero Trust Packet Routing (ZPR) — Oracle’s newer, attribute-based network control layer — is also worth its own post as a complement to Security Zones for teams building on OCI’s security architecture.</p>

<p><em>References: <a href="https://docs.oracle.com/iaas/security-zone/using/security-zone-policies.htm">Security Zone Policies — Oracle Docs</a> · <a href="https://github.com/oci-landing-zones/terraform-oci-modules-security">terraform-oci-modules-security</a> · <a href="https://www.ateam-oracle.com/safeguard-your-tenancy-with-custom-security-zones">Safeguard Your Tenancy With Custom Security Zones — Oracle A-Team</a> · <a href="https://registry.terraform.io/providers/oracle/oci/latest/docs/resources/cloud_guard_security_zone">oci_cloud_guard_security_zone — Terraform Registry</a></em></p>

<p>Happy scripting!</p>]]></content><author><name>Victor Silva</name></author><category term="Oracle" /><category term="Security" /><category term="Oracle Cloud" /><category term="Security Zones" /><category term="Security" /><summary type="html"><![CDATA[The Maximum Security Recipe is Oracle's nuclear option — it enables every available Security Zone policy simultaneously and cannot be modified. In practice, most production workloads cannot tolerate it without significant architectural changes, because it blocks things like internet gateways, NAT gateways, public load balancers, volume detachment, instance termination, and OKE cluster operations.]]></summary></entry><entry><title type="html">Building a Compliance-as-Code agent</title><link href="https://blog.victorsilva.com.uy/compliance-as-code-agent/" rel="alternate" type="text/html" title="Building a Compliance-as-Code agent" /><published>2025-12-30T06:15:31+00:00</published><updated>2025-12-30T06:15:31+00:00</updated><id>https://blog.victorsilva.com.uy/az-compliance-as-code-agent</id><content type="html" xml:base="https://blog.victorsilva.com.uy/compliance-as-code-agent/"><![CDATA[<p>Manual compliance reviews are the bottleneck nobody talks about. Your infrastructure code sits in a pull request, waiting for someone to verify naming conventions, check security policies, and ensure resource configurations align with company standards. Hours or even days pass before deployment can proceed.</p>

<p>There’s a better way: intelligent automation that understands your policies and validates infrastructure code before it ever reaches production.</p>

<h2 id="the-challenge-with-traditional-compliance">The Challenge with Traditional Compliance</h2>

<p>Most organizations handle infrastructure compliance through one of two approaches, both flawed:</p>

<p><strong>Manual code reviews</strong> consume significant engineering time and introduce human error. Reviewers might miss subtle violations or apply policies inconsistently across teams.</p>

<p><strong>Static linting tools</strong> catch syntax issues but lack contextual understanding. They can’t interpret nuanced business rules or explain <em>why</em> something violates policy.</p>

<p>What we need is something that combines the intelligence of human review with the consistency and speed of automation.</p>

<h2 id="enter-policy-aware-ai-agents">Enter: Policy-Aware AI Agents</h2>

<p>The solution leverages an AI agent grounded in your organization’s compliance documentation. Rather than relying on generic best practices, this agent evaluates infrastructure code against your actual internal policies.</p>

<p>Here’s what makes this approach powerful:</p>

<p><strong>Context-aware analysis</strong> - The agent understands not just Terraform syntax, but your specific requirements around naming, tagging, regions, and resource configurations.</p>

<p><strong>Structured output</strong> - Every compliance check returns a clear verdict with detailed violation descriptions and policy references.</p>

<p><strong>Fewer hallucinations</strong> - By constraining the AI to reference only the provided documentation through RAG, you sharply reduce unreliable suggestions based on general internet knowledge.</p>

<h2 id="architecture-overview">Architecture Overview</h2>

<p>The system consists of two primary components working together:</p>

<h3 id="the-compliance-agent">The Compliance Agent</h3>

<p>Built on <a href="https://ai.azure.com/">Microsoft Foundry</a>, this agent serves as your automated auditor. It receives Terraform code as input and returns structured compliance verdicts.</p>

<p>The agent’s behavior is controlled through a carefully designed system prompt that:</p>

<ul>
  <li>Defines its role as a compliance auditor</li>
  <li>Restricts it to only using provided policy documents</li>
  <li>Enforces a specific JSON output format</li>
  <li>Handles edge cases like invalid input or missing rules</li>
</ul>

<p>Here’s a sample of what the policy documentation might include:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Resource Naming Standards:
Format: &lt;type&gt;-&lt;identifier&gt;-&lt;environment&gt;
Example: rg-webapp-prod

Required Tags:
- Environment: must be dev, stg, or prod
- Cost-center: must match approved list

Approved Regions:
- Primary: eastus
- Secondary: westus
</code></pre></div></div>

<h3 id="the-cicd-integration">The CI/CD Integration</h3>

<p>The agent plugs directly into your deployment pipeline. When developers push Terraform code, the pipeline automatically:</p>

<ol>
  <li>Extracts the infrastructure definitions</li>
  <li>Sends them to the compliance agent</li>
  <li>Receives a structured verdict</li>
  <li>Blocks or approves the deployment based on results</li>
</ol>

<p>This happens in seconds, providing immediate feedback to developers while maintaining consistent policy enforcement.</p>
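<p>The first pipeline step, extracting the infrastructure definitions, could look like the Python sketch below. It assumes the definitions are plain <code class="language-plaintext highlighter-rouge">.tf</code> files under a single directory and that the agent accepts them as one text payload. The <code class="language-plaintext highlighter-rouge">collect_terraform</code> helper and the <code class="language-plaintext highlighter-rouge"># file:</code> marker format are hypothetical choices, not part of any SDK.</p>

```python
import tempfile
from pathlib import Path


def collect_terraform(root: str) -> str:
    """Concatenate every .tf file under root into one payload with file markers."""
    parts = []
    for path in sorted(Path(root).rglob("*.tf")):
        rel = path.relative_to(root)
        # Skip modules and providers cached by `terraform init`.
        if ".terraform" in rel.parts:
            continue
        parts.append(f"# file: {rel.as_posix()}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)


# Demo against a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    Path(workdir, "main.tf").write_text(
        'resource "null_resource" "demo" {}\n', encoding="utf-8"
    )
    Path(workdir, ".terraform").mkdir()
    Path(workdir, ".terraform", "cached.tf").write_text("# cached\n", encoding="utf-8")
    payload = collect_terraform(workdir)

print(payload)
```

<p>Keeping the file path next to each fragment lets the agent’s violation descriptions point back to a specific file.</p>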

<h2 id="implementation-walkthrough">Implementation Walkthrough</h2>

<h3 id="setting-up-the-ai-agent">Setting Up the AI Agent</h3>

<p>Start by creating a new AI agent project in Microsoft Foundry. Select a language model suited to code analysis; gpt-4.1 suffices for simpler use cases.</p>

<p>The critical step is crafting your system prompt. This prompt must be explicit about:</p>

<ul>
  <li>What constitutes valid input</li>
  <li>How to structure responses</li>
  <li>What to do when rules are ambiguous</li>
  <li>How to cite policy violations</li>
</ul>

<p>Your prompt should enforce a consistent output schema. Something like:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"verdict"</span><span class="p">:</span><span class="w"> </span><span class="s2">"COMPLIANT | NON-COMPLIANT | UNKNOWN | INVALID_INPUT"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"analysis"</span><span class="p">:</span><span class="w"> </span><span class="s2">"detailed explanation here"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"violations"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"what's wrong"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"policy_source"</span><span class="p">:</span><span class="w"> </span><span class="s2">"which rule was violated"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
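<p>A model can still drift from the requested format, so it is worth validating the reply before the pipeline acts on it. The Python sketch below checks the schema shown above using only the standard library. The <code class="language-plaintext highlighter-rouge">parse_verdict</code> name is hypothetical, and the rule that a <code class="language-plaintext highlighter-rouge">COMPLIANT</code> verdict must not list violations is an added assumption rather than something the schema itself states.</p>

```python
import json

VERDICTS = {"COMPLIANT", "NON-COMPLIANT", "UNKNOWN", "INVALID_INPUT"}


def parse_verdict(raw: str) -> dict:
    """Parse the agent's reply; raise ValueError if it strays from the schema."""
    try:
        result = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(result, dict):
        raise ValueError("reply must be a JSON object")
    if result.get("verdict") not in VERDICTS:
        raise ValueError("unexpected verdict value")
    if not isinstance(result.get("analysis"), str):
        raise ValueError("analysis must be a string")
    violations = result.get("violations", [])
    if not isinstance(violations, list):
        raise ValueError("violations must be a list")
    for item in violations:
        if not isinstance(item, dict) or not {"description", "policy_source"}.issubset(item):
            raise ValueError("each violation needs description and policy_source")
    # Assumption: a clean verdict and a non-empty violation list contradict each other.
    if result["verdict"] == "COMPLIANT" and violations:
        raise ValueError("a COMPLIANT verdict cannot list violations")
    return result


reply = json.dumps({
    "verdict": "NON-COMPLIANT",
    "analysis": "Location is not an approved region.",
    "violations": [
        {"description": "location is centralus", "policy_source": "Required location"}
    ],
})
checked = parse_verdict(reply)
print(checked["verdict"], len(checked["violations"]))
```

<p>Treating a malformed reply as a failure, rather than as a pass, keeps the pipeline gate fail-closed.</p>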

<h3 id="connecting-policy-documents">Connecting Policy Documents</h3>

<p>The agent needs access to your compliance documentation. Azure AI Search provides the retrieval layer for this in a RAG (retrieval-augmented generation) setup.</p>

<p>Upload your policy documents—security guidelines, naming conventions, network topology requirements—to Azure AI Search. These become the knowledge base the agent queries when evaluating code.</p>

<p>The beauty of RAG is that updating policies is straightforward. Add new documents or modify existing ones, and once the index refreshes the agent picks up those changes without any model retraining or prompt rewrite.</p>

<p>You can also upload files directly from the tools section. For this example, we can use the following content as a policy:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. Naming convention for resources

All resources must follow this format: `&lt;type&gt;-&lt;uniqueId&gt;-&lt;env&gt;`
Examples:
rg-core-dev (Resource Group for development)
sa-1234-prod (Storage Account for production)
Supported types:
- rg: Resource Group
- sa: Storage Account
- vnet: Virtual Network
- sn: Subnet
- vm: Virtual Machine
- nic: Network Interface
Supported environments:
- dev, stg, prod

2. Tags must be applied

All resources must include this tag in Terraform:
tags = {
  env = "dev" | "stg" | "prod"
}

3. Required location

All resources must be deployed to: `eastus`
Example:
location = "eastus"
</code></pre></div></div>

<h3 id="testing-and-validation">Testing and Validation</h3>

<p>Before integrating into production pipelines, thoroughly test your agent. Create Terraform examples that intentionally violate various policies:</p>

<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">resource</span> <span class="s2">"azurerm_storage_account"</span> <span class="s2">"example"</span> <span class="p">{</span>
  <span class="nx">name</span> <span class="p">=</span> <span class="s2">"storageaccount123"</span>
  <span class="nx">resource_group_name</span> <span class="p">=</span> <span class="s2">"default-rg"</span>
  <span class="nx">location</span> <span class="p">=</span> <span class="s2">"centralus"</span>
  <span class="nx">account_tier</span> <span class="p">=</span> <span class="s2">"Standard"</span>
  <span class="nx">account_replication_type</span> <span class="p">=</span> <span class="s2">"LRS"</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The agent should catch:</p>
<ul>
  <li>Naming convention violations</li>
  <li>Incorrect region usage</li>
  <li>Missing mandatory tags</li>
</ul>

<p>Verify that it correctly references your specific policies in its violation descriptions.</p>
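<p>To make that verification repeatable, one option is a small deterministic oracle that computes the violations you expect for each test fixture, which you then compare against the agent’s verdict. The Python sketch below encodes the three rules of the example policy for a resource represented as a plain dictionary. The <code class="language-plaintext highlighter-rouge">expected_violations</code> helper, the dictionary shape, and the alphanumeric rule for the unique ID segment are assumptions made for illustration.</p>

```python
import re

TYPES = ("rg", "sa", "vnet", "sn", "vm", "nic")
ENVS = ("dev", "stg", "prod")

# type-uniqueId-env; assumes the unique ID is alphanumeric
NAME = re.compile(r"(?:%s)-[A-Za-z0-9]+-(?:%s)" % ("|".join(TYPES), "|".join(ENVS)))


def expected_violations(resource: dict) -> list[str]:
    """Return the example-policy rules that a resource breaks."""
    found = []
    if not NAME.fullmatch(resource.get("name", "")):
        found.append("naming")
    if resource.get("tags", {}).get("env") not in ENVS:
        found.append("tags")
    if resource.get("location") != "eastus":
        found.append("location")
    return found


# Mirrors the non-compliant storage account used as a test case.
sample = {"name": "storageaccount123", "location": "centralus"}
print(expected_violations(sample))
```

<p>The oracle only covers rules that are mechanical to check. Its value is as a regression baseline for the agent, not as a replacement for it.</p>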

<p><img src="/assets/images/postsImages/AZ_Agent_01.png" alt="Agent Compliance-as-Code output" /></p>

<h3 id="pipeline-integration">Pipeline Integration</h3>

<p>Add a compliance stage to your Azure DevOps pipeline:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="pi">-</span> <span class="na">stage</span><span class="pi">:</span> <span class="s">InfrastructureCompliance</span>
  <span class="na">jobs</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">job</span><span class="pi">:</span> <span class="s">ValidateCompliance</span>
    <span class="na">steps</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">task</span><span class="pi">:</span> <span class="s">UsePythonVersion@0</span>
      <span class="na">inputs</span><span class="pi">:</span>
        <span class="na">versionSpec</span><span class="pi">:</span> <span class="s1">'</span><span class="s">3.x'</span>
    
    <span class="pi">-</span> <span class="na">script</span><span class="pi">:</span> <span class="pi">|</span>
        <span class="s">pip install azure-ai-projects requests</span>
      <span class="na">displayName</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Install</span><span class="nv"> </span><span class="s">dependencies'</span>
    
    <span class="pi">-</span> <span class="na">script</span><span class="pi">:</span> <span class="pi">|</span>
        <span class="s">python scripts/run_compliance_check.py \</span>
          <span class="s">--terraform-path ./infrastructure</span>
      <span class="na">displayName</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Execute</span><span class="nv"> </span><span class="s">compliance</span><span class="nv"> </span><span class="s">validation'</span>
    
    <span class="pi">-</span> <span class="na">script</span><span class="pi">:</span> <span class="pi">|</span>
        <span class="s">RESULT=$(cat compliance_result.json | jq -r '.verdict')</span>
        <span class="s">if [ "$RESULT" != "COMPLIANT" ]; then</span>
          <span class="s">echo "Compliance check failed"</span>
          <span class="s">exit 1</span>
        <span class="s">fi</span>
      <span class="na">displayName</span><span class="pi">:</span> <span class="s1">'</span><span class="s">Evaluate</span><span class="nv"> </span><span class="s">verdict'</span>
</code></pre></div></div>

<p>This stage runs before any actual infrastructure deployment, catching issues early.</p>

<h2 id="getting-started">Getting Started</h2>

<p>If you’re interested in implementing something similar:</p>

<ol>
  <li>Start small with a single, well-defined policy (like resource naming)</li>
  <li>Test extensively with both compliant and non-compliant examples</li>
  <li>Integrate into a non-production pipeline first</li>
  <li>Gather feedback from developers</li>
  <li>Gradually expand to additional policies</li>
</ol>

<p>The goal isn’t perfection from day one, but rather continuous improvement of your infrastructure governance.</p>

<h2 id="closing-thoughts">Closing Thoughts</h2>

<p>Infrastructure compliance doesn’t have to be a manual slog. By combining AI capabilities with structured policy documentation and automated pipelines, you can create a system that’s both more reliable and more efficient than traditional approaches.</p>

<p>The technology is mature enough for production use today. The real challenge is organizational: clearly documenting your policies, building trust in automated systems, and changing team workflows to embrace this new approach.</p>

<p>For teams shipping infrastructure changes daily, this investment pays dividends quickly. The alternative—scaling manual review processes—simply doesn’t work at modern deployment velocities.</p>

<p>Happy scripting!</p>]]></content><author><name>Victor Silva</name></author><category term="Azure" /><category term="Azure" /><summary type="html"><![CDATA[Manual compliance reviews are the bottleneck nobody talks about. Your infrastructure code sits in a pull request, waiting for someone to verify naming conventions, check security policies, and ensure resource configurations align with company standards. Hours or even days pass before deployment can proceed.]]></summary></entry><entry><title type="html">GCP Agent Development Kit: Automated Compliance Reporter</title><link href="https://blog.victorsilva.com.uy/gcp-adk-compliance-reporter/" rel="alternate" type="text/html" title="GCP Agent Development Kit: Automated Compliance Reporter" /><published>2025-12-15T21:34:29+00:00</published><updated>2025-12-15T21:34:29+00:00</updated><id>https://blog.victorsilva.com.uy/gcp-adk-compliance-reporter-audit-logs</id><content type="html" xml:base="https://blog.victorsilva.com.uy/gcp-adk-compliance-reporter/"><![CDATA[<p>Compliance audits have a standard ritual: gather evidence, review it for policy violations, classify the findings, write a report, and hand it to someone who will ask you to do it again next quarter. The gather-and-review steps are the bottleneck nobody talks about. In a GCP environment with half a dozen services generating audit events, even a 24-hour window can produce thousands of log entries. Manually grepping Cloud Logging for <code class="language-plaintext highlighter-rouge">SetIamPolicy</code> calls, checking whether <code class="language-plaintext highlighter-rouge">secretmanager.googleapis.com</code> data access logs are even enabled, and then formatting the results into something an auditor can read is the kind of work that takes an afternoon, introduces human error, and is immediately out of date by the time it is filed. The GCP Agent Development Kit gives you a better option.</p>

<p>The Agent Development Kit (ADK) lets you wire a Gemini model to plain Python functions and give it a compliance mandate as natural language instructions. The agent reasons over what it needs to do, calls your tools to pull audit data, analyzes the results, and produces a structured JSON report without manual intervention. This post builds exactly that: a compliance reporter that queries Cloud Audit Logs, classifies IAM mutations, secret access events, and auth failures by risk level, and delivers the report to Pub/Sub or Cloud Storage. Deployed to Cloud Run and triggered by Cloud Scheduler, it runs daily without anyone touching a console.</p>

<h2 id="what-is-gcp-agent-development-kit">What is GCP Agent Development Kit</h2>

<p>ADK was announced at Google Cloud Next 2025 as an open-source Python framework for building agents powered by Gemini. The core idea is deliberately minimal: you write plain Python functions, document them with docstrings, and pass them to an <code class="language-plaintext highlighter-rouge">Agent</code>. The framework handles the Gemini API calls, the tool dispatch loop, and session state. You focus on what the agent should do and what tools it has access to.</p>

<p>The four concepts you need to hold in your head are:</p>

<ul>
  <li><strong>Agent</strong> — the central object. It holds the model name, a natural language instruction that defines its role and behaviour, and the list of tools it can call.</li>
  <li><strong>Tools</strong> — ordinary Python functions. The function’s name, type-hinted parameters, and docstring become the tool schema that Gemini uses to decide when and how to call the function. No decorators, no registration step.</li>
  <li><strong>Runner</strong> — executes the agent against a session. <code class="language-plaintext highlighter-rouge">InMemoryRunner</code> is the development-time runner; for production you would use a persistent session backend.</li>
  <li><strong>Sessions</strong> — track conversation state across multiple turns. For a compliance reporter that runs a single audit pass, one session per run is all you need.</li>
</ul>
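
<p>To make the Tools idea concrete, the sketch below uses only the standard library to show how much information a plain function already carries: its name, its type-hinted parameters, and its docstring. This is an illustration, not ADK’s actual schema builder; <code class="language-plaintext highlighter-rouge">describe_tool</code> and <code class="language-plaintext highlighter-rouge">count_failed_calls</code> are hypothetical names invented for the example.</p>

```python
import inspect
from typing import get_type_hints


def describe_tool(func) -> dict:
    """Build a simplified, illustrative description of a function-as-tool.

    Hypothetical helper: this is NOT how ADK builds its schema internally.
    """
    hints = get_type_hints(func)
    parameters = {}
    for name, param in inspect.signature(func).parameters.items():
        parameters[name] = {
            # Fall back to "any" when a parameter has no type hint.
            "type": getattr(hints.get(name), "__name__", "any"),
            # A parameter without a default value must be supplied by the caller.
            "required": param.default is inspect.Parameter.empty,
        }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": parameters,
    }


def count_failed_calls(project_id: str, hours_back: int = 24) -> dict:
    """Count audit entries with a non-zero status code in the given window."""
    # Stub body: a real tool would query Cloud Logging here.
    return {"status": "success", "failed_calls": 0}


schema = describe_tool(count_failed_calls)
print(schema["name"], sorted(schema["parameters"]))
```

<p>The practical takeaway is that clear parameter names, type hints, and a precise docstring are the whole interface the model gets, so they deserve the same care as a public API.</p>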

<p>Install the framework with:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">pip <span class="nb">install </span>google-adk</code></pre></figure>

<p>The package includes the <code class="language-plaintext highlighter-rouge">google.adk.agents</code>, <code class="language-plaintext highlighter-rouge">google.adk.runners</code>, and supporting modules. It depends on <code class="language-plaintext highlighter-rouge">google-genai</code> for the underlying model calls, which is pulled in automatically.</p>

<h2 id="cloud-audit-logs-as-a-compliance-data-source">Cloud Audit Logs as a compliance data source</h2>

<p>Cloud Audit Logs are the authoritative record of who did what in your GCP environment and when. There are four log types, and understanding which are enabled by default is the first compliance gap to close.</p>

<p><strong>Admin Activity logs</strong> record API calls that create, modify, or delete resources — <code class="language-plaintext highlighter-rouge">CreateBucket</code>, <code class="language-plaintext highlighter-rouge">SetIamPolicy</code>, <code class="language-plaintext highlighter-rouge">CreateServiceAccountKey</code>. They are always on, cannot be disabled, and carry no additional cost. These are your primary source for IAM and RBAC mutation events.</p>

<p><strong>Data Access logs</strong> record API calls that read resource configurations or metadata, and calls that create, modify, or read user-provided data. They are disabled by default for every service except BigQuery. This is the most common compliance gap: teams assume that secret reads or KMS decrypt operations are logged, but they are not unless Data Access logging has been explicitly enabled for each service. Enabling them generates significant log volume and cost on active projects, so you enable them selectively for high-value services such as Secret Manager and KMS.</p>

<p><strong>System Event logs</strong> record GCP system actions that modify resources, such as live migration of a VM. They are always on, generated by Google systems rather than user activity, and are rarely the primary focus of a compliance audit.</p>

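<p><strong>Policy Denied logs</strong> record when a Google Cloud service denies access to a user or service account because of a security policy violation, for example a VPC Service Controls perimeter. They are generated by default and are the source this post uses for authentication failure and unauthorized access events.</p>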

<p>Key fields in an audit log entry that matter for compliance analysis:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">protoPayload.methodName</code> — the API method that was called (e.g., <code class="language-plaintext highlighter-rouge">google.iam.v1.IAMPolicy.SetIamPolicy</code>)</li>
  <li><code class="language-plaintext highlighter-rouge">protoPayload.authenticationInfo.principalEmail</code> — the identity that made the call</li>
  <li><code class="language-plaintext highlighter-rouge">protoPayload.resourceName</code> — the full resource path the call targeted</li>
  <li><code class="language-plaintext highlighter-rouge">protoPayload.status.code</code> — a non-zero value indicates a failed or denied call</li>
  <li><code class="language-plaintext highlighter-rouge">protoPayload.requestMetadata.callerIp</code> — the source IP address</li>
  <li><code class="language-plaintext highlighter-rouge">timestamp</code> — when the event occurred</li>
</ul>
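
<p>As a rough illustration of where those fields sit in a raw entry, the sketch below pulls them out of a dict shaped like the JSON form of a <code class="language-plaintext highlighter-rouge">LogEntry</code>. The sample entry is hypothetical and trimmed to the fields listed above; every value in it is made up, and <code class="language-plaintext highlighter-rouge">extract_compliance_fields</code> is an illustrative helper name.</p>

```python
def extract_compliance_fields(entry: dict) -> dict:
    """Pull the compliance-relevant fields out of a raw audit LogEntry dict."""
    proto = entry.get("protoPayload", {})
    # Successful calls usually carry an empty status, so default the code to 0.
    status_code = proto.get("status", {}).get("code", 0)
    return {
        "timestamp": entry.get("timestamp"),
        "method_name": proto.get("methodName"),
        "principal_email": proto.get("authenticationInfo", {}).get("principalEmail"),
        "resource_name": proto.get("resourceName"),
        "status_code": status_code,
        "caller_ip": proto.get("requestMetadata", {}).get("callerIp"),
        "failed": status_code != 0,
    }


# Hypothetical entry for illustration only; all values are invented.
sample_entry = {
    "timestamp": "2025-12-15T03:10:00Z",
    "protoPayload": {
        "methodName": "google.iam.v1.IAMPolicy.SetIamPolicy",
        "authenticationInfo": {"principalEmail": "dev@example.com"},
        "resourceName": "projects/example-project",
        "status": {"code": 7, "message": "PERMISSION_DENIED"},
        "requestMetadata": {"callerIp": "198.51.100.7"},
    },
}

fields = extract_compliance_fields(sample_entry)
print(fields["method_name"], fields["failed"])
```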

<h2 id="architecture">Architecture</h2>

<p>The compliance reporter follows a straightforward data flow. The ADK agent sits at the center, using Gemini to reason over which audit log queries to run, what the results mean, and where to send the report.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Cloud Audit Logs (Logging API)
         |
         | query_audit_logs()
         v
  +--------------------+
  |   ADK Agent        |
  |   (Gemini 2.5      |
  |    Flash)          |
  |                    |
  |  - query logs      |
  |  - analyze         |
  |  - classify risk   |
  |  - build report    |
  +--------------------+
         |
    +---------+
    |         |
    v         v
 Pub/Sub    Cloud
 Topic      Storage
            (GCS)
</code></pre></div></div>

<p>The agent issues multiple tool calls in sequence: query <code class="language-plaintext highlighter-rouge">activity</code> logs for IAM mutations, query <code class="language-plaintext highlighter-rouge">data_access</code> logs for secret access events, query <code class="language-plaintext highlighter-rouge">policy</code> logs for auth failures. After each query it accumulates findings. When all queries are complete it calls the delivery tool to publish the report. Gemini drives the sequencing — the instruction defines the compliance checks, not the orchestration code.</p>
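
<p>For comparison, this is roughly what the same three-query sequence looks like when written as explicit orchestration code. It is a sketch for contrast, not part of the agent: <code class="language-plaintext highlighter-rouge">run_checks</code>, <code class="language-plaintext highlighter-rouge">CHECKS</code>, and <code class="language-plaintext highlighter-rouge">fake_query</code> are hypothetical names, and the <code class="language-plaintext highlighter-rouge">query</code> argument stands in for the <code class="language-plaintext highlighter-rouge">query_audit_logs</code> tool defined later in this post.</p>

```python
from typing import Callable

# The three compliance checks, expressed as the arguments each query needs.
CHECKS = [
    {"name": "iam_mutations", "log_type": "activity", "service_name": None},
    {
        "name": "secret_access",
        "log_type": "data_access",
        "service_name": "secretmanager.googleapis.com",
    },
    {"name": "auth_failures", "log_type": "policy", "service_name": None},
]


def run_checks(project_id: str, query: Callable[..., dict]) -> dict:
    """Run every check in order and collect the returned entries by check name."""
    findings = {}
    for check in CHECKS:
        result = query(
            project_id=project_id,
            log_type=check["log_type"],
            service_name=check["service_name"],
        )
        findings[check["name"]] = result["entries"]
    return findings


def fake_query(project_id: str, log_type: str, service_name=None) -> dict:
    """Stand-in for the real tool so the sketch runs without GCP access."""
    entry = {"log_type": log_type, "service_name": service_name}
    return {"status": "success", "entry_count": 1, "entries": [entry]}


collected = run_checks("example-project", fake_query)
print(list(collected))
```

<p>A deterministic loop like this can also serve as a baseline when you want to check that the agent actually ran all three queries.</p>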

<h2 id="prerequisites">Prerequisites</h2>

<p>You will need:</p>

<ul>
  <li>Python 3.11 or later</li>
  <li><code class="language-plaintext highlighter-rouge">gcloud</code> CLI installed and authenticated to the target project</li>
  <li>The Cloud Logging, Pub/Sub, Cloud Storage, and Vertex AI APIs enabled</li>
  <li>A GCP project with audit log data to query</li>
</ul>

<p>Verify your active project and authentication:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gcloud config get-value project
gcloud auth application-default login</code></pre></figure>

<p>Enable the required APIs:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gcloud services <span class="nb">enable</span> <span class="se">\</span>
  logging.googleapis.com <span class="se">\</span>
  pubsub.googleapis.com <span class="se">\</span>
  storage.googleapis.com <span class="se">\</span>
  aiplatform.googleapis.com <span class="se">\</span>
  <span class="nt">--project</span><span class="o">=</span>PROJECT_ID</code></pre></figure>

<h3 id="iam-roles">IAM roles</h3>

<p>Create a dedicated service account with minimum required permissions:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Create the service account</span>
gcloud iam service-accounts create compliance-reporter-sa <span class="se">\</span>
  <span class="nt">--project</span><span class="o">=</span>PROJECT_ID

<span class="c"># Assign minimum roles</span>
<span class="k">for </span>role <span class="k">in </span>roles/logging.privateLogViewer roles/pubsub.publisher roles/storage.objectUser<span class="p">;</span> <span class="k">do
  </span>gcloud projects add-iam-policy-binding PROJECT_ID <span class="se">\</span>
    <span class="nt">--member</span><span class="o">=</span><span class="s2">"serviceAccount:compliance-reporter-sa@PROJECT_ID.iam.gserviceaccount.com"</span> <span class="se">\</span>
    <span class="nt">--role</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">role</span><span class="k">}</span><span class="s2">"</span>
<span class="k">done</span></code></pre></figure>

<p><code class="language-plaintext highlighter-rouge">roles/logging.privateLogViewer</code> includes everything in <code class="language-plaintext highlighter-rouge">roles/logging.viewer</code> and adds read access to Data Access audit logs, which the secret access check depends on. <code class="language-plaintext highlighter-rouge">roles/logging.viewer</code> alone covers only the Admin Activity, System Event, and Policy Denied logs. <code class="language-plaintext highlighter-rouge">roles/pubsub.publisher</code> allows publishing to a topic. <code class="language-plaintext highlighter-rouge">roles/storage.objectUser</code> allows reading and writing objects in a GCS bucket; if the reporter only ever creates new objects, <code class="language-plaintext highlighter-rouge">roles/storage.objectCreator</code> is a narrower alternative.</p>

<h3 id="enabling-data-access-logs-for-secret-manager">Enabling Data Access logs for Secret Manager</h3>

<p>By default, Secret Manager does not log data access events. To add secret access to your compliance audit coverage, enable Data Access logs for the service. The cleanest approach is to export the project IAM policy, add the audit configuration, and re-apply it:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gcloud projects get-iam-policy PROJECT_ID <span class="nt">--format</span><span class="o">=</span>json <span class="o">&gt;</span> policy.json</code></pre></figure>

<p>Add the following <code class="language-plaintext highlighter-rouge">auditConfigs</code> block to <code class="language-plaintext highlighter-rouge">policy.json</code> alongside the existing <code class="language-plaintext highlighter-rouge">bindings</code> array:</p>

<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
  </span><span class="nl">"auditConfigs"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"service"</span><span class="p">:</span><span class="w"> </span><span class="s2">"secretmanager.googleapis.com"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"auditLogConfigs"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"logType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"DATA_READ"</span><span class="w"> </span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"logType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"DATA_WRITE"</span><span class="w"> </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span></code></pre></figure>

<p>Apply the updated policy:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gcloud projects set-iam-policy PROJECT_ID policy.json</code></pre></figure>

<p>Secret Manager data access events will now appear in <code class="language-plaintext highlighter-rouge">cloudaudit.googleapis.com/data_access</code> logs within a few minutes of the policy change taking effect.</p>
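
<p>If you would rather script the edit to <code class="language-plaintext highlighter-rouge">policy.json</code> than make it by hand, a small merge helper along these lines is one option. It is a minimal sketch under the assumption that the policy has the shape exported by <code class="language-plaintext highlighter-rouge">get-iam-policy</code>; <code class="language-plaintext highlighter-rouge">enable_data_access_logs</code> is a hypothetical helper name, and the example policy is a made-up stand-in for the file contents.</p>

```python
import json


def enable_data_access_logs(policy: dict, service: str) -> dict:
    """Return a copy of the policy with DATA_READ and DATA_WRITE enabled for service."""
    updated = json.loads(json.dumps(policy))  # deep copy via a JSON round-trip
    audit_configs = updated.setdefault("auditConfigs", [])

    # Reuse the existing entry for this service if there is one.
    config = next((c for c in audit_configs if c.get("service") == service), None)
    if config is None:
        config = {"service": service, "auditLogConfigs": []}
        audit_configs.append(config)

    log_configs = config.setdefault("auditLogConfigs", [])
    existing = {c.get("logType") for c in log_configs}
    for log_type in ("DATA_READ", "DATA_WRITE"):
        if log_type not in existing:
            log_configs.append({"logType": log_type})
    return updated


# Hypothetical stand-in for the contents of policy.json.
policy = {"bindings": [], "etag": "example-etag"}
updated = enable_data_access_logs(policy, "secretmanager.googleapis.com")
print(json.dumps(updated["auditConfigs"], indent=2))
```

<p>In practice you would load <code class="language-plaintext highlighter-rouge">policy.json</code> with <code class="language-plaintext highlighter-rouge">json.load</code>, pass it through the helper, and write the result back before running <code class="language-plaintext highlighter-rouge">set-iam-policy</code>. The merge leaves <code class="language-plaintext highlighter-rouge">bindings</code> and <code class="language-plaintext highlighter-rouge">etag</code> untouched and is safe to run twice.</p>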

<h2 id="building-the-compliance-tools">Building the compliance tools</h2>

<p>ADK tools are plain Python functions. Gemini reads the function name, the parameter names and types, and the docstring to understand when to call the function and what arguments to pass. The pattern requires nothing beyond a well-written docstring — no decorators, no schema definitions.</p>

<h3 id="query_audit_logs">query_audit_logs</h3>

<p>This is the primary data-gathering tool. It wraps the Cloud Logging Python client and returns a normalized list of audit entries:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">google.cloud</span> <span class="kn">import</span> <span class="n">logging</span> <span class="k">as</span> <span class="n">gcp_logging</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timedelta</span><span class="p">,</span> <span class="n">timezone</span>
<span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Optional</span>


<span class="k">def</span> <span class="nf">query_audit_logs</span><span class="p">(</span>
    <span class="n">project_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span>
    <span class="n">log_type</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span>
    <span class="n">hours_back</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">24</span><span class="p">,</span>
    <span class="n">max_results</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">200</span><span class="p">,</span>
    <span class="n">service_name</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">str</span><span class="p">]</span> <span class="o">=</span> <span class="bp">None</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">:</span>
    <span class="s">"""Query Cloud Audit Logs for compliance-relevant events.

    Args:
        project_id: GCP project ID to query.
        log_type: One of 'activity', 'data_access', 'system_event', or 'policy'.
        hours_back: How many hours back to search from now.
        max_results: Maximum number of log entries to return.
        service_name: Optional GCP service name to filter on (e.g.,
            'secretmanager.googleapis.com').

    Returns:
        dict with 'status', 'entry_count', and 'entries' list.
    """</span>
    <span class="n">client</span> <span class="o">=</span> <span class="n">gcp_logging</span><span class="p">.</span><span class="n">Client</span><span class="p">(</span><span class="n">project</span><span class="o">=</span><span class="n">project_id</span><span class="p">)</span>
    <span class="n">end_time</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">(</span><span class="n">timezone</span><span class="p">.</span><span class="n">utc</span><span class="p">)</span>
    <span class="n">start_time</span> <span class="o">=</span> <span class="n">end_time</span> <span class="o">-</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">hours</span><span class="o">=</span><span class="n">hours_back</span><span class="p">)</span>

    <span class="n">log_name</span> <span class="o">=</span> <span class="p">(</span>
        <span class="sa">f</span><span class="s">"projects/</span><span class="si">{</span><span class="n">project_id</span><span class="si">}</span><span class="s">/logs/cloudaudit.googleapis.com%2F</span><span class="si">{</span><span class="n">log_type</span><span class="si">}</span><span class="s">"</span>
    <span class="p">)</span>
    <span class="n">filter_parts</span> <span class="o">=</span> <span class="p">[</span>
        <span class="sa">f</span><span class="s">'logName="</span><span class="si">{</span><span class="n">log_name</span><span class="si">}</span><span class="s">"'</span><span class="p">,</span>
        <span class="sa">f</span><span class="s">'timestamp&gt;="</span><span class="si">{</span><span class="n">start_time</span><span class="p">.</span><span class="n">strftime</span><span class="p">(</span><span class="s">"%Y-%m-%dT%H</span><span class="si">:</span><span class="o">%</span><span class="n">M</span><span class="si">:</span><span class="o">%</span><span class="n">SZ</span><span class="s">")</span><span class="si">}</span><span class="s">"'</span><span class="p">,</span>
        <span class="sa">f</span><span class="s">'timestamp&lt;="</span><span class="si">{</span><span class="n">end_time</span><span class="p">.</span><span class="n">strftime</span><span class="p">(</span><span class="s">"%Y-%m-%dT%H</span><span class="si">:</span><span class="o">%</span><span class="n">M</span><span class="si">:</span><span class="o">%</span><span class="n">SZ</span><span class="s">")</span><span class="si">}</span><span class="s">"'</span><span class="p">,</span>
    <span class="p">]</span>
    <span class="k">if</span> <span class="n">service_name</span><span class="p">:</span>
        <span class="n">filter_parts</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="sa">f</span><span class="s">'protoPayload.serviceName="</span><span class="si">{</span><span class="n">service_name</span><span class="si">}</span><span class="s">"'</span><span class="p">)</span>

    <span class="n">entries</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">client</span><span class="p">.</span><span class="n">list_entries</span><span class="p">(</span>
        <span class="n">resource_names</span><span class="o">=</span><span class="p">[</span><span class="sa">f</span><span class="s">"projects/</span><span class="si">{</span><span class="n">project_id</span><span class="si">}</span><span class="s">"</span><span class="p">],</span>
        <span class="n">filter_</span><span class="o">=</span><span class="s">" AND "</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">filter_parts</span><span class="p">),</span>
        <span class="n">order_by</span><span class="o">=</span><span class="s">"timestamp desc"</span><span class="p">,</span>
        <span class="n">max_results</span><span class="o">=</span><span class="n">max_results</span><span class="p">,</span>
    <span class="p">):</span>
        <span class="n">proto</span> <span class="o">=</span> <span class="n">entry</span><span class="p">.</span><span class="n">payload</span> <span class="ow">or</span> <span class="p">{}</span>
        <span class="n">entries</span><span class="p">.</span><span class="n">append</span><span class="p">({</span>
            <span class="s">"timestamp"</span><span class="p">:</span> <span class="n">entry</span><span class="p">.</span><span class="n">timestamp</span><span class="p">.</span><span class="n">isoformat</span><span class="p">()</span> <span class="k">if</span> <span class="n">entry</span><span class="p">.</span><span class="n">timestamp</span> <span class="k">else</span> <span class="bp">None</span><span class="p">,</span>
            <span class="s">"method_name"</span><span class="p">:</span> <span class="n">proto</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"methodName"</span><span class="p">),</span>
            <span class="s">"principal_email"</span><span class="p">:</span> <span class="n">proto</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"authenticationInfo"</span><span class="p">,</span> <span class="p">{}).</span><span class="n">get</span><span class="p">(</span><span class="s">"principalEmail"</span><span class="p">),</span>
            <span class="s">"resource_name"</span><span class="p">:</span> <span class="n">proto</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"resourceName"</span><span class="p">),</span>
            <span class="s">"status_code"</span><span class="p">:</span> <span class="n">proto</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"status"</span><span class="p">,</span> <span class="p">{}).</span><span class="n">get</span><span class="p">(</span><span class="s">"code"</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
            <span class="s">"caller_ip"</span><span class="p">:</span> <span class="n">proto</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"requestMetadata"</span><span class="p">,</span> <span class="p">{}).</span><span class="n">get</span><span class="p">(</span><span class="s">"callerIp"</span><span class="p">),</span>
        <span class="p">})</span>

    <span class="k">return</span> <span class="p">{</span><span class="s">"status"</span><span class="p">:</span> <span class="s">"success"</span><span class="p">,</span> <span class="s">"entry_count"</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">entries</span><span class="p">),</span> <span class="s">"entries"</span><span class="p">:</span> <span class="n">entries</span><span class="p">}</span></code></pre></figure>

<p>The <code class="language-plaintext highlighter-rouge">log_type</code> parameter maps directly to the audit log path segment: <code class="language-plaintext highlighter-rouge">activity</code>, <code class="language-plaintext highlighter-rouge">data_access</code>, <code class="language-plaintext highlighter-rouge">system_event</code>, or <code class="language-plaintext highlighter-rouge">policy</code>. The payload is a <code class="language-plaintext highlighter-rouge">proto_struct</code> dict when the log entry carries a <code class="language-plaintext highlighter-rouge">protoPayload</code>, which all audit log entries do. The function normalizes the fields Gemini will reason over into a flat dict per entry, which keeps the context window clean.</p>
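
<p>Because the model supplies <code class="language-plaintext highlighter-rouge">log_type</code>, it can be worth validating it before it reaches the Logging API. The sketch below factors the filter construction out of <code class="language-plaintext highlighter-rouge">query_audit_logs</code> and rejects unknown log types. The names <code class="language-plaintext highlighter-rouge">build_audit_filter</code> and <code class="language-plaintext highlighter-rouge">VALID_LOG_TYPES</code> and the <code class="language-plaintext highlighter-rouge">now</code> parameter are assumptions added for illustration and testability; they are not part of the original tool.</p>

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

VALID_LOG_TYPES = {"activity", "data_access", "system_event", "policy"}
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_audit_filter(
    project_id: str,
    log_type: str,
    hours_back: int = 24,
    service_name: Optional[str] = None,
    now: Optional[datetime] = None,
) -> str:
    """Build a Cloud Logging filter string for one audit log type."""
    if log_type not in VALID_LOG_TYPES:
        raise ValueError(f"Unknown audit log type: {log_type!r}")

    # 'now' is injectable so the output is reproducible in tests.
    end_time = now or datetime.now(timezone.utc)
    start_time = end_time - timedelta(hours=hours_back)
    start = start_time.strftime(TIMESTAMP_FORMAT)
    end = end_time.strftime(TIMESTAMP_FORMAT)

    log_name = f"projects/{project_id}/logs/cloudaudit.googleapis.com%2F{log_type}"
    parts = [
        f'logName="{log_name}"',
        f'timestamp>="{start}"',
        f'timestamp<="{end}"',
    ]
    if service_name:
        parts.append(f'protoPayload.serviceName="{service_name}"')
    return " AND ".join(parts)


fixed_now = datetime(2025, 12, 15, 12, 0, 0, tzinfo=timezone.utc)
audit_filter = build_audit_filter("example-project", "policy", now=fixed_now)
print(audit_filter)
```

<p>Returning an error dict instead of raising is another reasonable choice for a tool, since it gives the model a chance to correct the argument and retry.</p>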

<h3 id="publish_report_to_pubsub">publish_report_to_pubsub</h3>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">google.cloud</span> <span class="kn">import</span> <span class="n">pubsub_v1</span>


<span class="k">def</span> <span class="nf">publish_report_to_pubsub</span><span class="p">(</span>
    <span class="n">project_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span>
    <span class="n">topic_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span>
    <span class="n">report</span><span class="p">:</span> <span class="nb">dict</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">:</span>
    <span class="s">"""Publish a compliance report as a JSON message to a Pub/Sub topic.

    Args:
        project_id: GCP project ID that owns the topic.
        topic_id: Pub/Sub topic ID (not the full resource name).
        report: The compliance report dict to publish.

    Returns:
        dict with 'status' and 'message_id'.
    """</span>
    <span class="n">publisher</span> <span class="o">=</span> <span class="n">pubsub_v1</span><span class="p">.</span><span class="n">PublisherClient</span><span class="p">()</span>
    <span class="n">topic_path</span> <span class="o">=</span> <span class="n">publisher</span><span class="p">.</span><span class="n">topic_path</span><span class="p">(</span><span class="n">project_id</span><span class="p">,</span> <span class="n">topic_id</span><span class="p">)</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">report</span><span class="p">).</span><span class="n">encode</span><span class="p">(</span><span class="s">"utf-8"</span><span class="p">)</span>
    <span class="n">future</span> <span class="o">=</span> <span class="n">publisher</span><span class="p">.</span><span class="n">publish</span><span class="p">(</span><span class="n">topic_path</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">data</span><span class="p">)</span>
    <span class="n">message_id</span> <span class="o">=</span> <span class="n">future</span><span class="p">.</span><span class="n">result</span><span class="p">(</span><span class="n">timeout</span><span class="o">=</span><span class="mi">30</span><span class="p">)</span>
    <span class="k">return</span> <span class="p">{</span><span class="s">"status"</span><span class="p">:</span> <span class="s">"published"</span><span class="p">,</span> <span class="s">"message_id"</span><span class="p">:</span> <span class="n">message_id</span><span class="p">}</span></code></pre></figure>

<h3 id="upload_report_to_gcs">upload_report_to_gcs</h3>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timezone</span>
<span class="kn">from</span> <span class="nn">google.cloud</span> <span class="kn">import</span> <span class="n">storage</span>


<span class="k">def</span> <span class="nf">upload_report_to_gcs</span><span class="p">(</span>
    <span class="n">bucket_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span>
    <span class="n">report</span><span class="p">:</span> <span class="nb">dict</span><span class="p">,</span>
    <span class="n">prefix</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s">"compliance-reports"</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">:</span>
    <span class="s">"""Upload a compliance report as a JSON object to Cloud Storage.

    Args:
        bucket_name: Name of the GCS bucket.
        report: The compliance report dict to upload.
        prefix: Object path prefix inside the bucket.

    Returns:
        dict with 'status' and 'gcs_uri'.
    """</span>
    <span class="n">client</span> <span class="o">=</span> <span class="n">storage</span><span class="p">.</span><span class="n">Client</span><span class="p">()</span>
    <span class="n">timestamp</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">(</span><span class="n">timezone</span><span class="p">.</span><span class="n">utc</span><span class="p">).</span><span class="n">strftime</span><span class="p">(</span><span class="s">"%Y%m%dT%H%M%SZ"</span><span class="p">)</span>
    <span class="n">blob_name</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">prefix</span><span class="si">}</span><span class="s">/</span><span class="si">{</span><span class="n">timestamp</span><span class="si">}</span><span class="s">-report.json"</span>
    <span class="n">bucket</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">bucket</span><span class="p">(</span><span class="n">bucket_name</span><span class="p">)</span>
    <span class="n">blob</span> <span class="o">=</span> <span class="n">bucket</span><span class="p">.</span><span class="n">blob</span><span class="p">(</span><span class="n">blob_name</span><span class="p">)</span>
    <span class="n">blob</span><span class="p">.</span><span class="n">upload_from_string</span><span class="p">(</span>
        <span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">report</span><span class="p">,</span> <span class="n">indent</span><span class="o">=</span><span class="mi">2</span><span class="p">),</span>
        <span class="n">content_type</span><span class="o">=</span><span class="s">"application/json"</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">gcs_uri</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"gs://</span><span class="si">{</span><span class="n">bucket_name</span><span class="si">}</span><span class="s">/</span><span class="si">{</span><span class="n">blob_name</span><span class="si">}</span><span class="s">"</span>
    <span class="k">return</span> <span class="p">{</span><span class="s">"status"</span><span class="p">:</span> <span class="s">"uploaded"</span><span class="p">,</span> <span class="s">"gcs_uri"</span><span class="p">:</span> <span class="n">gcs_uri</span><span class="p">}</span></code></pre></figure>

<p>The appeal of the ADK tool pattern is that Gemini sees these three functions as a complete toolkit: one for gathering data, two for delivering results. It calls them in whatever order the instruction demands and infers the arguments from context: the project ID and topic ID come from the user message, and the report dict is built from its own analysis of the log entries.</p>

<h2 id="defining-the-agent">Defining the agent</h2>

<p>Now that we have the tools, the agent definition is the interesting part. The <code class="language-plaintext highlighter-rouge">instruction</code> field is not just a description — it is the compliance mandate that drives the agent’s multi-step reasoning:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">google.adk.agents</span> <span class="kn">import</span> <span class="n">Agent</span>

<span class="n">COMPLIANCE_INSTRUCTION</span> <span class="o">=</span> <span class="s">"""
You are a GCP compliance auditing agent. Your job is to:
1. Query Cloud Audit Logs for compliance-relevant events.
2. Analyze the retrieved entries for policy violations.
3. Produce a structured JSON report with HIGH/MEDIUM/LOW risk findings.
4. Deliver the report to Pub/Sub or Cloud Storage.

Compliance checks to perform:
- IAM/RBAC mutations: query 'activity' logs for SetIamPolicy and
  CreateServiceAccountKey method names.
- Secret access: query 'data_access' logs filtering on
  service_name='secretmanager.googleapis.com'.
- Auth failures: query 'policy' logs for entries with status_code != 0.

Report format: JSON with two top-level keys:
- 'summary': object with counts keyed by risk level
  (HIGH, MEDIUM, LOW, total_entries_reviewed)
- 'findings': array of objects, each with
  'risk_level', 'category', 'description', 'principal_email',
  'resource_name', 'timestamp'

Risk classification:
- HIGH: CreateServiceAccountKey calls, Policy Denied events from external IPs,
  SetIamPolicy granting roles/owner or roles/editor
- MEDIUM: SetIamPolicy calls that do not match HIGH criteria,
  data access to secrets outside business hours
- LOW: all other audit events surfaced during the checks

Always complete all three compliance checks before building the report.
Deliver the report using the available delivery tool based on user instructions.
"""</span>

<span class="n">root_agent</span> <span class="o">=</span> <span class="n">Agent</span><span class="p">(</span>
    <span class="n">name</span><span class="o">=</span><span class="s">"gcp_compliance_reporter"</span><span class="p">,</span>
    <span class="n">model</span><span class="o">=</span><span class="s">"gemini-2.5-flash"</span><span class="p">,</span>
    <span class="n">description</span><span class="o">=</span><span class="s">"Queries GCP Cloud Audit Logs and produces structured compliance reports."</span><span class="p">,</span>
    <span class="n">instruction</span><span class="o">=</span><span class="n">COMPLIANCE_INSTRUCTION</span><span class="p">,</span>
    <span class="n">tools</span><span class="o">=</span><span class="p">[</span><span class="n">query_audit_logs</span><span class="p">,</span> <span class="n">publish_report_to_pubsub</span><span class="p">,</span> <span class="n">upload_report_to_gcs</span><span class="p">],</span>
<span class="p">)</span></code></pre></figure>

<p>A few things to notice here. The instruction spells out the sequence of checks (IAM mutations, then secret access, then auth failures) because Gemini tends to follow the stated ordering when it reasons about what to do next. The risk classification rules are concrete enough that the model can apply them consistently across runs. And the delivery instruction is left conditional (“based on user instructions”) so the runner message controls where the report goes without changing the agent definition.</p>
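
<p>Because the risk rules are that concrete, they can also be mirrored in plain Python and used to spot-check the model's classifications. The following is a minimal sketch, not part of the agent: <code class="language-plaintext highlighter-rouge">classify_risk</code> and its parameter names are hypothetical, and it assumes the caller has already extracted the method name and the relevant flags from each log entry.</p>

```python
# Hypothetical deterministic mirror of the risk rules in COMPLIANCE_INSTRUCTION.
# The function and parameter names are illustrative assumptions.
HIGH_RISK_ROLES = frozenset({"roles/owner", "roles/editor"})


def classify_risk(
    method_name: str,
    granted_roles: frozenset = frozenset(),
    policy_denied: bool = False,
    external_ip: bool = False,
    secret_access: bool = False,
    outside_business_hours: bool = False,
) -> str:
    """Return HIGH, MEDIUM, or LOW following the rules in the instruction."""
    # HIGH: any service account key creation.
    if method_name.endswith("CreateServiceAccountKey"):
        return "HIGH"
    # HIGH: Policy Denied events that originate from an external IP.
    if policy_denied and external_ip:
        return "HIGH"
    is_set_iam = method_name.endswith("SetIamPolicy")
    # HIGH: SetIamPolicy that grants roles/owner or roles/editor.
    if is_set_iam and granted_roles & HIGH_RISK_ROLES:
        return "HIGH"
    # MEDIUM: every other SetIamPolicy call.
    if is_set_iam:
        return "MEDIUM"
    # MEDIUM: secret access outside business hours.
    if secret_access and outside_business_hours:
        return "MEDIUM"
    # LOW: everything else surfaced by the checks.
    return "LOW"
```

<p>Comparing this function's output with the <code class="language-plaintext highlighter-rouge">risk_level</code> the agent assigns to the same event is one way to detect drift in the model's classifications between runs.</p>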

<p><code class="language-plaintext highlighter-rouge">gemini-2.5-flash</code> is a good fit for this workload. It handles long context windows efficiently, which matters when you are passing hundreds of log entries into the reasoning loop. It is also one of the lower-latency Gemini models at the time of writing, which keeps the per-run latency reasonable for a scheduled compliance job.</p>

<h2 id="running-the-agent">Running the agent</h2>

<p>The runner wires the agent to a session and drives the async event loop. <code class="language-plaintext highlighter-rouge">InMemoryRunner</code> is suitable for development and single-instance deployments:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">google.adk.runners</span> <span class="kn">import</span> <span class="n">InMemoryRunner</span>
<span class="kn">from</span> <span class="nn">google.genai</span> <span class="kn">import</span> <span class="n">types</span>
<span class="kn">import</span> <span class="nn">asyncio</span>


<span class="k">async</span> <span class="k">def</span> <span class="nf">run_compliance_check</span><span class="p">(</span>
    <span class="n">project_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span>
    <span class="n">pubsub_topic_id</span><span class="p">:</span> <span class="nb">str</span> <span class="o">|</span> <span class="bp">None</span> <span class="o">=</span> <span class="bp">None</span><span class="p">,</span>
    <span class="n">gcs_bucket</span><span class="p">:</span> <span class="nb">str</span> <span class="o">|</span> <span class="bp">None</span> <span class="o">=</span> <span class="bp">None</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="n">runner</span> <span class="o">=</span> <span class="n">InMemoryRunner</span><span class="p">(</span>
        <span class="n">agent</span><span class="o">=</span><span class="n">root_agent</span><span class="p">,</span>
        <span class="n">app_name</span><span class="o">=</span><span class="s">"gcp_compliance_reporter"</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">session</span> <span class="o">=</span> <span class="k">await</span> <span class="n">runner</span><span class="p">.</span><span class="n">session_service</span><span class="p">.</span><span class="n">create_session</span><span class="p">(</span>
        <span class="n">app_name</span><span class="o">=</span><span class="s">"gcp_compliance_reporter"</span><span class="p">,</span>
        <span class="n">user_id</span><span class="o">=</span><span class="s">"scheduler"</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="n">delivery_instruction</span> <span class="o">=</span> <span class="s">""</span>
    <span class="k">if</span> <span class="n">pubsub_topic_id</span><span class="p">:</span>
        <span class="n">delivery_instruction</span> <span class="o">=</span> <span class="sa">f</span><span class="s">" Publish to Pub/Sub topic: </span><span class="si">{</span><span class="n">pubsub_topic_id</span><span class="si">}</span><span class="s">."</span>
    <span class="k">elif</span> <span class="n">gcs_bucket</span><span class="p">:</span>
        <span class="n">delivery_instruction</span> <span class="o">=</span> <span class="sa">f</span><span class="s">" Upload to GCS bucket: </span><span class="si">{</span><span class="n">gcs_bucket</span><span class="si">}</span><span class="s">."</span>

    <span class="n">message</span> <span class="o">=</span> <span class="n">types</span><span class="p">.</span><span class="n">Content</span><span class="p">(</span>
        <span class="n">role</span><span class="o">=</span><span class="s">"user"</span><span class="p">,</span>
        <span class="n">parts</span><span class="o">=</span><span class="p">[</span><span class="n">types</span><span class="p">.</span><span class="n">Part</span><span class="p">.</span><span class="n">from_text</span><span class="p">(</span>
            <span class="n">text</span><span class="o">=</span><span class="p">(</span>
                <span class="sa">f</span><span class="s">"Run a full compliance audit for project </span><span class="si">{</span><span class="n">project_id</span><span class="si">}</span><span class="s"> "</span>
                <span class="sa">f</span><span class="s">"covering the last 24 hours."</span>
                <span class="o">+</span> <span class="n">delivery_instruction</span>
            <span class="p">)</span>
        <span class="p">)],</span>
    <span class="p">)</span>

    <span class="k">async</span> <span class="k">for</span> <span class="n">event</span> <span class="ow">in</span> <span class="n">runner</span><span class="p">.</span><span class="n">run_async</span><span class="p">(</span>
        <span class="n">user_id</span><span class="o">=</span><span class="s">"scheduler"</span><span class="p">,</span>
        <span class="n">session_id</span><span class="o">=</span><span class="n">session</span><span class="p">.</span><span class="nb">id</span><span class="p">,</span>
        <span class="n">new_message</span><span class="o">=</span><span class="n">message</span><span class="p">,</span>
    <span class="p">):</span>
        <span class="k">if</span> <span class="n">event</span><span class="p">.</span><span class="n">is_final_response</span><span class="p">()</span> <span class="ow">and</span> <span class="n">event</span><span class="p">.</span><span class="n">content</span> <span class="ow">and</span> <span class="n">event</span><span class="p">.</span><span class="n">content</span><span class="p">.</span><span class="n">parts</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">event</span><span class="p">.</span><span class="n">content</span><span class="p">.</span><span class="n">parts</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">text</span>

    <span class="k">return</span> <span class="s">""</span>


<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span>
    <span class="kn">import</span> <span class="nn">os</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">run</span><span class="p">(</span>
        <span class="n">run_compliance_check</span><span class="p">(</span>
            <span class="n">project_id</span><span class="o">=</span><span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"GCP_PROJECT_ID"</span><span class="p">],</span>
            <span class="n">pubsub_topic_id</span><span class="o">=</span><span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"PUBSUB_TOPIC_ID"</span><span class="p">),</span>
            <span class="n">gcs_bucket</span><span class="o">=</span><span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"GCS_BUCKET"</span><span class="p">),</span>
        <span class="p">)</span>
    <span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span></code></pre></figure>

<p><code class="language-plaintext highlighter-rouge">runner.run_async</code> returns an async iterator of events. Most events are intermediate — tool call requests, tool call results, model tokens. <code class="language-plaintext highlighter-rouge">event.is_final_response()</code> is true only on the last event, which carries the agent’s final text output. For a compliance reporter this is either a confirmation that the report was delivered, or an error explanation if something failed.</p>

<p>To test locally before containerizing:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">export </span><span class="nv">GCP_PROJECT_ID</span><span class="o">=</span>your-project-id
<span class="nb">export </span><span class="nv">PUBSUB_TOPIC_ID</span><span class="o">=</span>compliance-reports
<span class="nb">export </span><span class="nv">GOOGLE_CLOUD_PROJECT</span><span class="o">=</span>your-project-id
python main.py</code></pre></figure>

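<p>Application Default Credentials handle authentication locally. The Cloud Logging and Pub/Sub clients pick up ADC automatically, so no API key or service account JSON file is needed in development. The Gemini call uses ADC as well, provided <code class="language-plaintext highlighter-rouge">GOOGLE_GENAI_USE_VERTEXAI=true</code> and <code class="language-plaintext highlighter-rouge">GOOGLE_CLOUD_LOCATION</code> are set in your shell; without them the SDK expects an API key.</p>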

<h2 id="testing-and-validation">Testing and Validation</h2>

<p>Before deploying to Cloud Run, validate that each tool works in isolation and that the agent’s reasoning produces the expected report structure.</p>
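
<p>The report structure itself can be checked without calling the model. The sketch below validates a JSON string against the format described in the agent instruction (a <code class="language-plaintext highlighter-rouge">summary</code> object and a <code class="language-plaintext highlighter-rouge">findings</code> array). <code class="language-plaintext highlighter-rouge">validate_report</code> is a hypothetical helper, and it assumes you have captured the report payload as text, for example from the Pub/Sub message or the GCS object.</p>

```python
import json

RISK_LEVELS = ("HIGH", "MEDIUM", "LOW")
FINDING_KEYS = {
    "risk_level", "category", "description",
    "principal_email", "resource_name", "timestamp",
}


def validate_report(raw: str) -> list:
    """Return a list of problems; an empty list means the report looks well-formed."""
    try:
        report = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(report, dict):
        return ["top level is not a JSON object"]

    problems = []
    # 'summary' must carry a count for each risk level plus the total.
    summary = report.get("summary")
    if not isinstance(summary, dict):
        problems.append("missing or non-object 'summary'")
    else:
        for key in (*RISK_LEVELS, "total_entries_reviewed"):
            if key not in summary:
                problems.append(f"summary is missing '{key}'")

    # 'findings' must be an array of objects with the documented keys.
    findings = report.get("findings")
    if not isinstance(findings, list):
        problems.append("missing or non-array 'findings'")
        return problems
    for index, finding in enumerate(findings):
        if not isinstance(finding, dict):
            problems.append(f"finding {index} is not an object")
            continue
        missing = FINDING_KEYS - set(finding)
        if missing:
            problems.append(f"finding {index} is missing {sorted(missing)}")
        if finding.get("risk_level") not in RISK_LEVELS:
            problems.append(f"finding {index} has an unknown risk_level")
    return problems
```

<p>An empty list means the payload matches the documented shape. Anything else is a concrete pointer to where the instruction and the model's output disagree.</p>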

<h3 id="testing-tool-functions-directly">Testing tool functions directly</h3>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">main</span> <span class="kn">import</span> <span class="n">query_audit_logs</span><span class="p">,</span> <span class="n">publish_report_to_pubsub</span>

<span class="c1"># Test log query: should return a dict with 'status', 'entry_count', and 'entries'
</span><span class="n">result</span> <span class="o">=</span> <span class="n">query_audit_logs</span><span class="p">(</span>
    <span class="n">project_id</span><span class="o">=</span><span class="s">"your-project-id"</span><span class="p">,</span>
    <span class="n">log_type</span><span class="o">=</span><span class="s">"activity"</span><span class="p">,</span>
    <span class="n">hours_back</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
    <span class="n">max_results</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
<span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"entry_count: </span><span class="si">{</span><span class="n">result</span><span class="p">[</span><span class="s">'entry_count'</span><span class="p">]</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">result</span><span class="p">[</span><span class="s">"entries"</span><span class="p">][:</span><span class="mi">3</span><span class="p">]:</span>
    <span class="k">print</span><span class="p">(</span><span class="n">entry</span><span class="p">)</span></code></pre></figure>

<p>If <code class="language-plaintext highlighter-rouge">entry_count</code> is 0 for <code class="language-plaintext highlighter-rouge">activity</code> logs, either no admin activity occurred in the time window or the service account running the script lacks <code class="language-plaintext highlighter-rouge">roles/logging.viewer</code>. If <code class="language-plaintext highlighter-rouge">data_access</code> queries consistently return 0 entries, either Data Access logs are not enabled for the target service or the caller lacks <code class="language-plaintext highlighter-rouge">roles/logging.privateLogViewer</code>, which is required to read Data Access audit logs in the <code class="language-plaintext highlighter-rouge">_Default</code> bucket.</p>

<h3 id="inspecting-the-agents-reasoning-trace">Inspecting the agent’s reasoning trace</h3>

<p>ADK events expose the intermediate reasoning steps. Add a loop that prints every event to see the full tool call sequence:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">async</span> <span class="k">def</span> <span class="nf">run_with_trace</span><span class="p">(</span><span class="n">project_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="n">runner</span> <span class="o">=</span> <span class="n">InMemoryRunner</span><span class="p">(</span><span class="n">agent</span><span class="o">=</span><span class="n">root_agent</span><span class="p">,</span> <span class="n">app_name</span><span class="o">=</span><span class="s">"gcp_compliance_reporter"</span><span class="p">)</span>
    <span class="n">session</span> <span class="o">=</span> <span class="k">await</span> <span class="n">runner</span><span class="p">.</span><span class="n">session_service</span><span class="p">.</span><span class="n">create_session</span><span class="p">(</span>
        <span class="n">app_name</span><span class="o">=</span><span class="s">"gcp_compliance_reporter"</span><span class="p">,</span> <span class="n">user_id</span><span class="o">=</span><span class="s">"debug"</span>
    <span class="p">)</span>
    <span class="n">message</span> <span class="o">=</span> <span class="n">types</span><span class="p">.</span><span class="n">Content</span><span class="p">(</span>
        <span class="n">role</span><span class="o">=</span><span class="s">"user"</span><span class="p">,</span>
        <span class="n">parts</span><span class="o">=</span><span class="p">[</span><span class="n">types</span><span class="p">.</span><span class="n">Part</span><span class="p">.</span><span class="n">from_text</span><span class="p">(</span>
            <span class="n">text</span><span class="o">=</span><span class="sa">f</span><span class="s">"Run a full compliance audit for project </span><span class="si">{</span><span class="n">project_id</span><span class="si">}</span><span class="s"> "</span>
                 <span class="sa">f</span><span class="s">"covering the last 1 hour."</span>
        <span class="p">)],</span>
    <span class="p">)</span>
    <span class="k">async</span> <span class="k">for</span> <span class="n">event</span> <span class="ow">in</span> <span class="n">runner</span><span class="p">.</span><span class="n">run_async</span><span class="p">(</span>
        <span class="n">user_id</span><span class="o">=</span><span class="s">"debug"</span><span class="p">,</span> <span class="n">session_id</span><span class="o">=</span><span class="n">session</span><span class="p">.</span><span class="nb">id</span><span class="p">,</span> <span class="n">new_message</span><span class="o">=</span><span class="n">message</span>
    <span class="p">):</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"[</span><span class="si">{</span><span class="n">event</span><span class="p">.</span><span class="n">__class__</span><span class="p">.</span><span class="n">__name__</span><span class="si">}</span><span class="s">] is_final=</span><span class="si">{</span><span class="n">event</span><span class="p">.</span><span class="n">is_final_response</span><span class="p">()</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">event</span><span class="p">,</span> <span class="s">"content"</span><span class="p">)</span> <span class="ow">and</span> <span class="n">event</span><span class="p">.</span><span class="n">content</span><span class="p">:</span>
            <span class="k">for</span> <span class="n">part</span> <span class="ow">in</span> <span class="n">event</span><span class="p">.</span><span class="n">content</span><span class="p">.</span><span class="n">parts</span><span class="p">:</span>
                <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">part</span><span class="p">,</span> <span class="s">"text"</span><span class="p">)</span> <span class="ow">and</span> <span class="n">part</span><span class="p">.</span><span class="n">text</span><span class="p">:</span>
                    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"  text: </span><span class="si">{</span><span class="n">part</span><span class="p">.</span><span class="n">text</span><span class="p">[</span><span class="si">:</span><span class="mi">200</span><span class="p">]</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
                <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">part</span><span class="p">,</span> <span class="s">"function_call"</span><span class="p">)</span> <span class="ow">and</span> <span class="n">part</span><span class="p">.</span><span class="n">function_call</span><span class="p">:</span>
                    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"  tool_call: </span><span class="si">{</span><span class="n">part</span><span class="p">.</span><span class="n">function_call</span><span class="p">.</span><span class="n">name</span><span class="si">}</span><span class="s">(</span><span class="si">{</span><span class="n">part</span><span class="p">.</span><span class="n">function_call</span><span class="p">.</span><span class="n">args</span><span class="si">}</span><span class="s">)"</span><span class="p">)</span></code></pre></figure>

<p>Running this against a project with recent audit activity shows you the exact sequence of tool calls Gemini chose — which log types it queried, in what order, and what arguments it passed. This is the fastest way to catch instruction ambiguities before they appear in a production report.</p>
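
<p>Reading the trace by eye works for one run. For repeated checks, the same expectation can be written as a small assertion helper. This is a sketch under assumptions: <code class="language-plaintext highlighter-rouge">check_tool_sequence</code> is a hypothetical function, and it expects you to have collected the printed tool calls into a list of <code class="language-plaintext highlighter-rouge">(tool_name, args)</code> pairs yourself.</p>

```python
REQUIRED_LOG_TYPES = {"activity", "data_access", "policy"}
DELIVERY_TOOLS = {"publish_report_to_pubsub", "upload_report_to_gcs"}


def check_tool_sequence(calls: list, expect_delivery: bool = True) -> list:
    """Check recorded (tool_name, args) pairs against the agent instruction."""
    problems = []
    queried = set()
    delivered_at = None
    for position, (name, args) in enumerate(calls):
        if name == "query_audit_logs":
            queried.add(args.get("log_type"))
            # The instruction says all checks finish before the report is built.
            if delivered_at is not None:
                problems.append(f"query at position {position} ran after delivery")
        elif name in DELIVERY_TOOLS and delivered_at is None:
            delivered_at = position
    missing = REQUIRED_LOG_TYPES - queried
    if missing:
        problems.append(f"log types never queried: {sorted(missing)}")
    if expect_delivery and delivered_at is None:
        problems.append("no delivery tool was called")
    return problems
```

<p>Pass <code class="language-plaintext highlighter-rouge">expect_delivery=False</code> for trace runs whose message contains no delivery instruction, since the agent has no destination to deliver to in that case.</p>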

<h2 id="deploying-to-cloud-run-and-cloud-scheduler">Deploying to Cloud Run and Cloud Scheduler</h2>

<p>Package the agent as a Cloud Run job. Jobs are the right Cloud Run primitive for batch workloads: they run to completion, exit cleanly, and integrate with Cloud Scheduler for recurring execution.</p>

<p>Create a <code class="language-plaintext highlighter-rouge">Dockerfile</code> at the project root:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">cat</span> <span class="o">&gt;</span> Dockerfile <span class="o">&lt;&lt;</span> <span class="sh">'</span><span class="no">EOF</span><span class="sh">'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]
EOF</span></code></pre></figure>

<p>And a <code class="language-plaintext highlighter-rouge">requirements.txt</code>:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">cat</span> <span class="o">&gt;</span> requirements.txt <span class="o">&lt;&lt;</span> <span class="sh">'</span><span class="no">EOF</span><span class="sh">'
google-adk&gt;=1.28.0
google-cloud-logging&gt;=3.10.0
google-cloud-pubsub&gt;=2.21.0
google-cloud-storage&gt;=2.17.0
EOF</span></code></pre></figure>

<p>Build and push the container image, then create the Cloud Run job:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">export </span><span class="nv">PROJECT_ID</span><span class="o">=</span>your-project-id
<span class="nb">export </span><span class="nv">REGION</span><span class="o">=</span>us-central1

<span class="c"># Build and push</span>
gcloud builds submit <span class="se">\</span>
  <span class="nt">--tag</span><span class="o">=</span><span class="s2">"gcr.io/</span><span class="k">${</span><span class="nv">PROJECT_ID</span><span class="k">}</span><span class="s2">/compliance-reporter:latest"</span> <span class="se">\</span>
  <span class="nt">--project</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">PROJECT_ID</span><span class="k">}</span><span class="s2">"</span>

<span class="c"># Create the Cloud Run job</span>
gcloud run <span class="nb">jobs </span>create compliance-reporter <span class="se">\</span>
  <span class="nt">--image</span><span class="o">=</span><span class="s2">"gcr.io/</span><span class="k">${</span><span class="nv">PROJECT_ID</span><span class="k">}</span><span class="s2">/compliance-reporter:latest"</span> <span class="se">\</span>
  <span class="nt">--service-account</span><span class="o">=</span><span class="s2">"compliance-reporter-sa@</span><span class="k">${</span><span class="nv">PROJECT_ID</span><span class="k">}</span><span class="s2">.iam.gserviceaccount.com"</span> <span class="se">\</span>
  <span class="nt">--set-env-vars</span><span class="o">=</span><span class="s2">"GCP_PROJECT_ID=</span><span class="k">${</span><span class="nv">PROJECT_ID</span><span class="k">}</span><span class="s2">,PUBSUB_TOPIC_ID=compliance-reports,GOOGLE_GENAI_USE_VERTEXAI=true,GOOGLE_CLOUD_LOCATION=</span><span class="k">${</span><span class="nv">REGION</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--region</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">REGION</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--project</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">PROJECT_ID</span><span class="k">}</span><span class="s2">"</span></code></pre></figure>

<p><code class="language-plaintext highlighter-rouge">GOOGLE_GENAI_USE_VERTEXAI=true</code> switches the ADK backend from the Gemini Developer API (API key) to Vertex AI (ADC). In Cloud Run the job’s service account identity is used automatically — no API key, no secret management overhead for the model credentials.</p>
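
<p>A missing environment variable in a scheduled job tends to surface as an opaque failure several steps later. A startup check along the following lines can fail fast instead. It is a sketch: <code class="language-plaintext highlighter-rouge">missing_settings</code> is a hypothetical helper, the variable names are the ones used in this post, and the set of values treated as "true" for <code class="language-plaintext highlighter-rouge">GOOGLE_GENAI_USE_VERTEXAI</code> is an assumption.</p>

```python
import os

REQUIRED_VARS = ("GCP_PROJECT_ID",)
VERTEX_VARS = ("GOOGLE_CLOUD_LOCATION",)


def missing_settings(env=os.environ) -> list:
    """Return the names of required settings that are unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    # Assumption: "true" or "1" (case-insensitive) enables the Vertex AI backend.
    use_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip().lower() in {"true", "1"}
    if use_vertex:
        missing += [name for name in VERTEX_VARS if not env.get(name)]
    # At least one delivery destination must be configured.
    if not (env.get("PUBSUB_TOPIC_ID") or env.get("GCS_BUCKET")):
        missing.append("PUBSUB_TOPIC_ID or GCS_BUCKET")
    return missing
```

<p>Calling this at the top of <code class="language-plaintext highlighter-rouge">main.py</code> and exiting with a non-zero status when the list is non-empty makes a misconfigured execution show up as failed in the Cloud Run job history.</p>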

<p>Run a manual execution to validate the deployment:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gcloud run <span class="nb">jobs </span>execute compliance-reporter <span class="se">\</span>
  <span class="nt">--region</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">REGION</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--project</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">PROJECT_ID</span><span class="k">}</span><span class="s2">"</span>

<span class="c"># List executions to confirm the run completed</span>
gcloud run <span class="nb">jobs </span>executions list <span class="se">\</span>
  <span class="nt">--job</span><span class="o">=</span>compliance-reporter <span class="se">\</span>
  <span class="nt">--region</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">REGION</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--project</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">PROJECT_ID</span><span class="k">}</span><span class="s2">"</span></code></pre></figure>

<p>Once the manual execution completes successfully, create the Cloud Scheduler job for daily execution:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gcloud scheduler <span class="nb">jobs </span>create http compliance-reporter-daily <span class="se">\</span>
  <span class="nt">--location</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">REGION</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--schedule</span><span class="o">=</span><span class="s2">"0 6 * * *"</span> <span class="se">\</span>
  <span class="nt">--uri</span><span class="o">=</span><span class="s2">"https://</span><span class="k">${</span><span class="nv">REGION</span><span class="k">}</span><span class="s2">-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/</span><span class="k">${</span><span class="nv">PROJECT_ID</span><span class="k">}</span><span class="s2">/jobs/compliance-reporter:run"</span> <span class="se">\</span>
  <span class="nt">--oauth-service-account-email</span><span class="o">=</span><span class="s2">"compliance-reporter-sa@</span><span class="k">${</span><span class="nv">PROJECT_ID</span><span class="k">}</span><span class="s2">.iam.gserviceaccount.com"</span> <span class="se">\</span>
  <span class="nt">--project</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">PROJECT_ID</span><span class="k">}</span><span class="s2">"</span></code></pre></figure>

<p>The schedule <code class="language-plaintext highlighter-rouge">0 6 * * *</code> runs at 06:00 UTC daily. Adjust to suit your team’s working hours — running it before the business day starts means the report is waiting in Pub/Sub or GCS when people begin work. The service account needs <code class="language-plaintext highlighter-rouge">roles/run.invoker</code> to trigger the job via the Cloud Run API:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gcloud projects add-iam-policy-binding <span class="s2">"</span><span class="k">${</span><span class="nv">PROJECT_ID</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--member</span><span class="o">=</span><span class="s2">"serviceAccount:compliance-reporter-sa@</span><span class="k">${</span><span class="nv">PROJECT_ID</span><span class="k">}</span><span class="s2">.iam.gserviceaccount.com"</span> <span class="se">\</span>
  <span class="nt">--role</span><span class="o">=</span><span class="s2">"roles/run.invoker"</span></code></pre></figure>
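
<p>When adjusting the 06:00 UTC schedule to your team's working hours, it helps to see what the next run looks like on a local clock. The sketch below does that conversion with the standard library only. <code class="language-plaintext highlighter-rouge">next_run_local</code> is a hypothetical helper, and it assumes a once-a-day schedule at a whole UTC hour.</p>

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def next_run_local(now: datetime, local_tz: tzinfo, hour_utc: int = 6) -> datetime:
    """Return the next daily hour_utc:00 UTC run, expressed in local_tz.

    `now` must be timezone-aware; a naive datetime would be read as local time.
    """
    now_utc = now.astimezone(timezone.utc)
    run = datetime.combine(now_utc.date(), time(hour_utc), tzinfo=timezone.utc)
    if run <= now_utc:
        run += timedelta(days=1)  # today's slot has already passed
    return run.astimezone(local_tz)
```

<p>Passing a fixed offset such as <code class="language-plaintext highlighter-rouge">timezone(timedelta(hours=-3))</code> is enough for zones without daylight saving time. For zones that observe it, a <code class="language-plaintext highlighter-rouge">zoneinfo.ZoneInfo</code> instance can be passed instead.</p>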

<h2 id="best-practices">Best Practices</h2>

<p><strong>Data Access logs are disabled by default — and that is your biggest blind spot.</strong> Admin Activity and Policy Denied logs are always on, but if you are not explicitly enabling Data Access logs for Secret Manager, KMS, and BigQuery, you have no record of secret reads, key decryption operations, or data queries. The compliance reporter will query for those events and return 0 entries, which in a report looks exactly the same as genuine zero-activity. Enable Data Access logs for your high-value services and document which services have coverage. A finding of “0 data access events for secretmanager.googleapis.com” is only meaningful if you know logging is on.</p>
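
<p>One way to keep a zero from being misread is to label it with what is known about logging coverage. The following is a minimal sketch: <code class="language-plaintext highlighter-rouge">interpret_zero_results</code> is a hypothetical helper, and the <code class="language-plaintext highlighter-rouge">logging_enabled</code> mapping is assumed to come from your own documentation of which services have Data Access logs turned on.</p>

```python
def interpret_zero_results(entry_counts: dict, logging_enabled: dict) -> dict:
    """Label each service's Data Access result so a zero is not misread."""
    verdicts = {}
    for service, count in entry_counts.items():
        coverage = logging_enabled.get(service)  # True, False, or None (undocumented)
        if count > 0:
            verdicts[service] = "events found"
        elif coverage is True:
            verdicts[service] = "no activity (logging confirmed on)"
        elif coverage is False:
            verdicts[service] = "no visibility (logging off)"
        else:
            verdicts[service] = "unknown coverage"
    return verdicts
```

<p>Only the "logging confirmed on" verdict supports a claim of zero activity. The other two zero cases are gaps in evidence and belong in the report as such.</p>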

<p><strong>Do not use <code class="language-plaintext highlighter-rouge">InMemoryRunner</code> for multi-instance deployments.</strong> <code class="language-plaintext highlighter-rouge">InMemoryRunner</code> stores session state in process memory. If you scale the Cloud Run job to more than one instance, or if the job is retried after a failure, sessions are not shared across instances and state is lost on restart. For production use a persistent session service backed by Firestore or Cloud SQL. ADK’s session service interface is pluggable; swapping the backend is a constructor argument change.</p>

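<p><strong>Require <code class="language-plaintext highlighter-rouge">google-adk&gt;=1.28.0</code> in your requirements.</strong> ADK 1.28.0 included a fix for a prompt injection vulnerability in tool docstring handling. Setting that version as the minimum ensures the fix is present, so a malicious string in an audit log entry cannot use that path to manipulate the tool schema seen by Gemini. This is particularly relevant for compliance workloads where the agent processes untrusted log data as part of its reasoning context. Note that <code class="language-plaintext highlighter-rouge">&gt;=</code> sets a floor and is not an exact pin; use <code class="language-plaintext highlighter-rouge">==</code> with a specific version if you need reproducible builds.</p>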

<p><strong>Use Vertex AI with Application Default Credentials in production, not API keys.</strong> The <code class="language-plaintext highlighter-rouge">GOOGLE_GENAI_USE_VERTEXAI=true</code> environment variable routes model calls through Vertex AI, which uses the service account’s ADC identity rather than a static API key. This means no secret to rotate, no risk of the key being logged, and IAM-based access control over which identities can invoke the Gemini models. On Cloud Run this is zero-configuration — the job’s service account identity is used automatically.</p>

<p><strong>Scope the service account to minimum roles.</strong> The compliance reporter needs <code class="language-plaintext highlighter-rouge">roles/logging.viewer</code> to read Admin Activity and Policy Denied logs, <code class="language-plaintext highlighter-rouge">roles/logging.privateLogViewer</code> to read Data Access logs, <code class="language-plaintext highlighter-rouge">roles/pubsub.publisher</code> to publish reports, and <code class="language-plaintext highlighter-rouge">roles/storage.objectUser</code> to write to GCS. With Vertex AI as the model backend it also needs <code class="language-plaintext highlighter-rouge">roles/aiplatform.user</code>. It does not need any broader project-level permissions, and it does not need any IAM mutation capabilities. A service account with owner or editor permissions running an agent that processes audit log data is a significant risk surface: if the agent’s reasoning is manipulated, it could take destructive actions. Keep the service account scoped to exactly what the query and delivery tools require.</p>
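
<p>Role creep is easy to miss, so it can be worth checking the project's IAM policy against an explicit allowlist. The sketch below assumes the standard IAM policy shape, a list of bindings that each carry a <code class="language-plaintext highlighter-rouge">role</code> and a list of <code class="language-plaintext highlighter-rouge">members</code>. <code class="language-plaintext highlighter-rouge">excess_roles</code> is a hypothetical helper, and the allowlist is whatever your deployment actually requires.</p>

```python
def excess_roles(bindings: list, member: str, allowed: set) -> list:
    """Return roles granted to `member` that are not in the allowlist."""
    granted = {
        binding["role"]
        for binding in bindings
        if member in binding.get("members", [])
    }
    return sorted(granted - allowed)
```

<p>A non-empty result is a prompt to either remove the binding or justify adding the role to the allowlist.</p>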

<h2 id="conclusion">Conclusion</h2>

<p>What we built is an autonomous compliance reporter that replaces an afternoon of manual log review with a scheduled agent run. Cloud Audit Logs provide the raw evidence — Admin Activity for IAM mutations, Data Access for secret operations, Policy Denied for auth failures. ADK connects Gemini’s reasoning to plain Python tool functions, letting the agent drive the query sequence, analyze the results, classify findings by risk level, and deliver a structured JSON report to Pub/Sub or GCS. Cloud Run jobs and Cloud Scheduler handle the operational side: containerized, daily, no console required.</p>

<p>The agent instruction is where the compliance logic lives, which means extending coverage is a matter of adding new checks to the instruction and a new tool if the data source requires one. Adding KMS key usage analysis or VPC firewall mutation detection follows the same pattern: describe the check, describe the risk classification, add it to the instruction. The delivery infrastructure does not change.</p>

<p>Happy scripting!</p>]]></content><author><name>Victor Silva</name></author><category term="GCP" /><category term="gcp" /><category term="google-adk" /><category term="agent-development-kit" /><category term="gcp-adk-compliance" /><category term="cloud-audit-logs" /><category term="gemini-2.5" /><category term="python" /><category term="cloud-run" /><category term="gcp-compliance" /><category term="iam-security" /><summary type="html"><![CDATA[GCP ADK compliance reporter: query Cloud Audit Logs, classify IAM and secret access findings by risk level, and publish JSON reports to Pub/Sub or GCS.]]></summary></entry><entry><title type="html">Azure Private DNS zone fallback to internet</title><link href="https://blog.victorsilva.com.uy/az-private-dns-fallback/" rel="alternate" type="text/html" title="Azure Private DNS zone fallback to internet" /><published>2025-09-28T21:34:27+00:00</published><updated>2025-09-28T21:34:27+00:00</updated><id>https://blog.victorsilva.com.uy/az-private-dns-fallback</id><content type="html" xml:base="https://blog.victorsilva.com.uy/az-private-dns-fallback/"><![CDATA[<p>When working with Azure Private Endpoints across multiple regions, you’ve likely encountered a common problem: how do you access a resource with a private endpoint from a different region that isn’t interconnected with your current virtual network? Microsoft’s “Fallback to Internet” feature for Azure Private DNS zones solves this challenge elegantly.</p>

<h2 id="understanding-the-challenge">Understanding the Challenge</h2>
<p>Private Endpoints provide secure, private connectivity to Azure services by mapping them to private IP addresses within your virtual network. This works seamlessly within a single region or interconnected networks. However, in multi-region scenarios with isolated networks, DNS resolution fails when trying to access a private endpoint from a different region.</p>

<h3 id="how-private-endpoint-dns-resolution-works">How Private Endpoint DNS Resolution Works</h3>
<p>When you create a Private Endpoint for an Azure resource (like a Key Vault or Storage Account), the DNS resolution flow typically works like this:</p>

<ol>
  <li>A DNS query for resource-name.vault.azure.net reaches Azure’s DNS service</li>
  <li>The query resolves to a CNAME: resource-name.privatelink.vaultcore.azure.net</li>
  <li>The Private DNS zone resolves this to the private IP address (e.g., 10.0.0.7)</li>
  <li>The client receives the private IP and connects through the private endpoint</li>
</ol>

<p>This works perfectly when the client and the private endpoint are in the same region or connected networks. But what happens when they’re not?</p>

<h2 id="the-problem-isolated-multi-region-architectures">The Problem: Isolated Multi-Region Architectures</h2>
<p>Consider this scenario:</p>

<ul>
  <li>Region A has a virtual network with a Private DNS zone for Key Vault</li>
  <li>Region B has a separate virtual network with its own Private DNS zone</li>
  <li>A VM in Region B needs to access a Key Vault in Region A that’s behind a private endpoint</li>
  <li>The networks are not interconnected (due to security policies, overlapping IP ranges, or architectural decisions)</li>
</ul>

<p>Without Fallback to Internet, the DNS resolution in Region B fails because:</p>

<ul>
  <li>The query reaches the Private DNS zone in Region B</li>
  <li>No record exists for the Key Vault in Region A</li>
  <li>The DNS query returns empty (NXDOMAIN)</li>
  <li>Access is blocked</li>
</ul>

<p>Previously, you’d need to implement complex solutions like cross-region VNet peering or custom DNS forwarding. The Fallback to Internet feature provides a much simpler alternative.</p>

<h2 id="introducing-fallback-to-internet">Introducing Fallback to Internet</h2>
<p>The Fallback to Internet feature adds a new DNS resolution policy: NxDomainRedirect. When enabled on a Virtual Network Link in your Private DNS zone, it changes the behavior when a DNS query doesn’t find a match:</p>

<ul>
  <li>Without Fallback: DNS query fails → NXDOMAIN error → Access denied</li>
  <li>With Fallback: DNS query fails → Fallback to public DNS → Resolves to public endpoint → Access via internet (if allowed by firewall)</li>
</ul>

<p>This allows you to:</p>

<ul>
  <li>Keep private endpoint connectivity for resources in the same region</li>
  <li>Allow public endpoint access (via firewall rules) for cross-region scenarios</li>
  <li>Avoid complex network peering infrastructure</li>
</ul>
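
<p>The two behaviors can be sketched as a toy lookup in shell. This is only a model of the decision, not how Azure DNS is implemented: the <code class="language-plaintext highlighter-rouge">ZONE_RECORDS</code> list, the record names, and the <code class="language-plaintext highlighter-rouge">resolve</code> function are hypothetical, and only the policy names <code class="language-plaintext highlighter-rouge">Default</code> and <code class="language-plaintext highlighter-rouge">NxDomainRedirect</code> come from the feature itself.</p>

```shell
#!/bin/sh
# Toy model of a Private DNS zone lookup; NOT how Azure DNS is implemented.
# ZONE_RECORDS is a hypothetical space-separated list of "name=ip" pairs
# standing in for the A records of the linked privatelink zone.
ZONE_RECORDS="kv-region-b=10.1.0.7"

# resolve NAME POLICY
#   POLICY is "Default" or "NxDomainRedirect".
#   Prints the private IP on a zone hit, "public-dns" when the query
#   falls back to public resolution, or "NXDOMAIN" when it fails.
resolve() (
  name=$1
  policy=$2
  for record in $ZONE_RECORDS; do
    if [ "${record%%=*}" = "$name" ]; then
      echo "${record#*=}"
      exit 0
    fi
  done
  if [ "$policy" = "NxDomainRedirect" ]; then
    echo "public-dns"
  else
    echo "NXDOMAIN"
  fi
)

resolve kv-region-b Default            # zone hit: private IP
resolve kv-region-a Default            # zone miss, no fallback
resolve kv-region-a NxDomainRedirect   # zone miss, fallback to public DNS
```

<p>Note that a zone hit returns the private IP under either policy; the policy only matters when the zone has no matching record.</p>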

<h2 id="real-world-use-case">Real-World Use Case</h2>
<p>Let’s say you have:</p>

<ul>
  <li>A centralized Key Vault in Region A with sensitive secrets</li>
  <li>Application VMs in multiple isolated regions (B, C, D) that need occasional access</li>
  <li>Security requirements that prevent full network interconnection</li>
</ul>

<p>With Fallback to Internet:</p>

<ol>
  <li>Configure Private Endpoint for the Key Vault in Region A</li>
  <li>Enable Fallback to Internet on Private DNS zones in Regions B, C, D</li>
  <li>Whitelist the public IP/NAT Gateway IPs from Regions B, C, D on the Key Vault firewall</li>
  <li>VMs in Region A access via private endpoint (secure, no internet)</li>
  <li>VMs in other regions access via public endpoint (controlled by firewall)</li>
</ol>
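
<p>The firewall step could be scripted along these lines. It is a minimal sketch that assumes the Azure CLI is installed and logged in; the vault name and the egress IPs are placeholders (documentation address ranges), and the <code class="language-plaintext highlighter-rouge">DRY_RUN</code> switch and <code class="language-plaintext highlighter-rouge">allow_ip</code> helper are hypothetical conveniences, not part of any tool.</p>

```shell
#!/bin/sh
# Sketch: allow the egress IPs of regions B, C and D on the Key Vault firewall.
# VAULT_NAME and the IPs below are placeholders, not real values.
VAULT_NAME="kv-demo-xxxxx"
REGION_EGRESS_IPS="203.0.113.10 198.51.100.20 192.0.2.30"

# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# allow_ip IP: add one /32 rule to the vault firewall, or print the command.
allow_ip() (
  ip=$1
  if [ "$DRY_RUN" = "1" ]; then
    echo "az keyvault network-rule add --name $VAULT_NAME --ip-address $ip/32"
  else
    az keyvault network-rule add --name "$VAULT_NAME" --ip-address "$ip/32"
  fi
)

for ip in $REGION_EGRESS_IPS; do
  allow_ip "$ip"
done
```

<p>Review the printed commands first, then run with <code class="language-plaintext highlighter-rouge">DRY_RUN=0</code> to apply them.</p>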

<h2 id="implementation-with-terraform">Implementation with Terraform</h2>
<p>Let’s implement a simple example using Terraform with the AzAPI provider, since at the time of writing the AzureRM provider didn’t support this feature.</p>

<h3 id="prerequisites">Prerequisites</h3>

<figure class="highlight"><pre><code class="language-terraform" data-lang="terraform"><span class="k">terraform</span> <span class="p">{</span>
  <span class="nx">required_providers</span> <span class="p">{</span>
    <span class="nx">azurerm</span> <span class="p">=</span> <span class="p">{</span>
      <span class="nx">source</span>  <span class="p">=</span> <span class="s2">"hashicorp/azurerm"</span>
      <span class="nx">version</span> <span class="p">=</span> <span class="s2">"~&gt; 3.0"</span>
    <span class="p">}</span>
    <span class="nx">azapi</span> <span class="p">=</span> <span class="p">{</span>
      <span class="nx">source</span>  <span class="p">=</span> <span class="s2">"azure/azapi"</span>
      <span class="nx">version</span> <span class="p">=</span> <span class="s2">"~&gt; 1.0"</span>
    <span class="p">}</span>
    <span class="nx">random</span> <span class="p">=</span> <span class="p">{</span>
      <span class="nx">source</span>  <span class="p">=</span> <span class="s2">"hashicorp/random"</span>
      <span class="nx">version</span> <span class="p">=</span> <span class="s2">"~&gt; 3.0"</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">}</span>

<span class="k">provider</span> <span class="s2">"azurerm"</span> <span class="p">{</span>
  <span class="nx">features</span> <span class="p">{}</span>
<span class="p">}</span>

<span class="k">provider</span> <span class="s2">"azapi"</span> <span class="p">{}</span></code></pre></figure>

<h3 id="step-1-create-the-private-dns-zone">Step 1: Create the Private DNS Zone</h3>

<figure class="highlight"><pre><code class="language-terraform" data-lang="terraform"><span class="k">resource</span> <span class="s2">"azurerm_private_dns_zone"</span> <span class="s2">"keyvault"</span> <span class="p">{</span>
  <span class="nx">name</span>                <span class="p">=</span> <span class="s2">"privatelink.vaultcore.azure.net"</span>
  <span class="nx">resource_group_name</span> <span class="p">=</span> <span class="nx">azurerm_resource_group</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">name</span>
<span class="p">}</span></code></pre></figure>

<h3 id="step-2-create-virtual-network-link-with-fallback-enabled">Step 2: Create Virtual Network Link with Fallback Enabled</h3>
<p>Here’s where we use the AzAPI provider to enable the Fallback to Internet feature:</p>

<figure class="highlight"><pre><code class="language-terraform" data-lang="terraform"><span class="k">resource</span> <span class="s2">"azapi_resource"</span> <span class="s2">"vnet_link"</span> <span class="p">{</span>
  <span class="nx">type</span>      <span class="p">=</span> <span class="s2">"Microsoft.Network/privateDnsZones/virtualNetworkLinks@2024-06-01"</span>
  <span class="nx">name</span>      <span class="p">=</span> <span class="s2">"vnet-link-with-fallback"</span>
  <span class="nx">parent_id</span> <span class="p">=</span> <span class="nx">azurerm_private_dns_zone</span><span class="p">.</span><span class="nx">keyvault</span><span class="p">.</span><span class="nx">id</span>
  <span class="nx">location</span>  <span class="p">=</span> <span class="s2">"global"</span>

  <span class="nx">body</span> <span class="p">=</span> <span class="nx">jsonencode</span><span class="p">({</span>
    <span class="nx">properties</span> <span class="p">=</span> <span class="p">{</span>
      <span class="nx">registrationEnabled</span> <span class="p">=</span> <span class="kc">false</span>
      <span class="nx">resolutionPolicy</span>    <span class="p">=</span> <span class="s2">"NxDomainRedirect"</span>  <span class="c1"># Enable fallback</span>
      <span class="nx">virtualNetwork</span> <span class="p">=</span> <span class="p">{</span>
        <span class="nx">id</span> <span class="p">=</span> <span class="nx">azurerm_virtual_network</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">id</span>
      <span class="p">}</span>
    <span class="p">}</span>
  <span class="p">})</span>
<span class="p">}</span></code></pre></figure>

<h3 id="step-3-create-the-private-endpoint">Step 3: Create the Private Endpoint</h3>

<figure class="highlight"><pre><code class="language-terraform" data-lang="terraform"><span class="k">resource</span> <span class="s2">"azurerm_private_endpoint"</span> <span class="s2">"keyvault"</span> <span class="p">{</span>
  <span class="nx">name</span>                <span class="p">=</span> <span class="s2">"pe-keyvault"</span>
  <span class="nx">location</span>            <span class="p">=</span> <span class="nx">azurerm_resource_group</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">location</span>
  <span class="nx">resource_group_name</span> <span class="p">=</span> <span class="nx">azurerm_resource_group</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">name</span>
  <span class="nx">subnet_id</span>           <span class="p">=</span> <span class="nx">azurerm_subnet</span><span class="p">.</span><span class="nx">private_endpoints</span><span class="p">.</span><span class="nx">id</span>

  <span class="nx">private_service_connection</span> <span class="p">{</span>
    <span class="nx">name</span>                           <span class="p">=</span> <span class="s2">"psc-keyvault"</span>
    <span class="nx">private_connection_resource_id</span> <span class="p">=</span> <span class="nx">azurerm_key_vault</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">id</span>
    <span class="nx">is_manual_connection</span>           <span class="p">=</span> <span class="kc">false</span>
    <span class="nx">subresource_names</span>              <span class="p">=</span> <span class="p">[</span><span class="s2">"vault"</span><span class="p">]</span>
  <span class="p">}</span>

  <span class="nx">private_dns_zone_group</span> <span class="p">{</span>
    <span class="nx">name</span>                 <span class="p">=</span> <span class="s2">"default"</span>
    <span class="nx">private_dns_zone_ids</span> <span class="p">=</span> <span class="p">[</span><span class="nx">azurerm_private_dns_zone</span><span class="p">.</span><span class="nx">keyvault</span><span class="p">.</span><span class="nx">id</span><span class="p">]</span>
  <span class="p">}</span>
<span class="p">}</span></code></pre></figure>

<h3 id="complete-example">Complete Example</h3>
<p>Here’s a minimal working example:</p>

<figure class="highlight"><pre><code class="language-terraform" data-lang="terraform"><span class="c1"># Resource Group</span>
<span class="k">resource</span> <span class="s2">"azurerm_resource_group"</span> <span class="s2">"main"</span> <span class="p">{</span>
  <span class="nx">name</span>     <span class="p">=</span> <span class="s2">"rg-dns-fallback-demo"</span>
  <span class="nx">location</span> <span class="p">=</span> <span class="s2">"East US"</span>
<span class="p">}</span>

<span class="c1"># Virtual Network</span>
<span class="k">resource</span> <span class="s2">"azurerm_virtual_network"</span> <span class="s2">"main"</span> <span class="p">{</span>
  <span class="nx">name</span>                <span class="p">=</span> <span class="s2">"vnet-demo"</span>
  <span class="nx">address_space</span>       <span class="p">=</span> <span class="p">[</span><span class="s2">"10.0.0.0/16"</span><span class="p">]</span>
  <span class="nx">location</span>            <span class="p">=</span> <span class="nx">azurerm_resource_group</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">location</span>
  <span class="nx">resource_group_name</span> <span class="p">=</span> <span class="nx">azurerm_resource_group</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">name</span>
<span class="p">}</span>

<span class="c1"># Subnet for Private Endpoints</span>
<span class="k">resource</span> <span class="s2">"azurerm_subnet"</span> <span class="s2">"private_endpoints"</span> <span class="p">{</span>
  <span class="nx">name</span>                 <span class="p">=</span> <span class="s2">"snet-private-endpoints"</span>
  <span class="nx">resource_group_name</span>  <span class="p">=</span> <span class="nx">azurerm_resource_group</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">name</span>
  <span class="nx">virtual_network_name</span> <span class="p">=</span> <span class="nx">azurerm_virtual_network</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">name</span>
  <span class="nx">address_prefixes</span>     <span class="p">=</span> <span class="p">[</span><span class="s2">"10.0.1.0/24"</span><span class="p">]</span>
<span class="p">}</span>

<span class="c1"># Key Vault</span>
<span class="k">resource</span> <span class="s2">"azurerm_key_vault"</span> <span class="s2">"main"</span> <span class="p">{</span>
  <span class="nx">name</span>                       <span class="p">=</span> <span class="s2">"kv-demo-</span><span class="k">${</span><span class="nx">random_string</span><span class="p">.</span><span class="nx">suffix</span><span class="p">.</span><span class="nx">result</span><span class="k">}</span><span class="s2">"</span>
  <span class="nx">location</span>                   <span class="p">=</span> <span class="nx">azurerm_resource_group</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">location</span>
  <span class="nx">resource_group_name</span>        <span class="p">=</span> <span class="nx">azurerm_resource_group</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">name</span>
  <span class="nx">tenant_id</span>                  <span class="p">=</span> <span class="k">data</span><span class="p">.</span><span class="nx">azurerm_client_config</span><span class="p">.</span><span class="nx">current</span><span class="p">.</span><span class="nx">tenant_id</span>
  <span class="nx">sku_name</span>                   <span class="p">=</span> <span class="s2">"standard"</span>
  
  <span class="c1"># Allow public access with firewall rules</span>
  <span class="nx">public_network_access_enabled</span> <span class="p">=</span> <span class="kc">true</span>
  
  <span class="nx">network_acls</span> <span class="p">{</span>
    <span class="nx">bypass</span>         <span class="p">=</span> <span class="s2">"AzureServices"</span>
    <span class="nx">default_action</span> <span class="p">=</span> <span class="s2">"Deny"</span>
    <span class="nx">ip_rules</span>       <span class="p">=</span> <span class="p">[</span><span class="s2">"YOUR_PUBLIC_IP/32"</span><span class="p">]</span>  <span class="c1"># Add your IPs here</span>
  <span class="p">}</span>
<span class="p">}</span>

<span class="c1"># Private DNS Zone</span>
<span class="k">resource</span> <span class="s2">"azurerm_private_dns_zone"</span> <span class="s2">"keyvault"</span> <span class="p">{</span>
  <span class="nx">name</span>                <span class="p">=</span> <span class="s2">"privatelink.vaultcore.azure.net"</span>
  <span class="nx">resource_group_name</span> <span class="p">=</span> <span class="nx">azurerm_resource_group</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">name</span>
<span class="p">}</span>

<span class="c1"># Virtual Network Link with Fallback</span>
<span class="k">resource</span> <span class="s2">"azapi_resource"</span> <span class="s2">"vnet_link"</span> <span class="p">{</span>
  <span class="nx">type</span>      <span class="p">=</span> <span class="s2">"Microsoft.Network/privateDnsZones/virtualNetworkLinks@2024-06-01"</span>
  <span class="nx">name</span>      <span class="p">=</span> <span class="s2">"vnet-link-fallback"</span>
  <span class="nx">parent_id</span> <span class="p">=</span> <span class="nx">azurerm_private_dns_zone</span><span class="p">.</span><span class="nx">keyvault</span><span class="p">.</span><span class="nx">id</span>
  <span class="nx">location</span>  <span class="p">=</span> <span class="s2">"global"</span>

  <span class="nx">body</span> <span class="p">=</span> <span class="nx">jsonencode</span><span class="p">({</span>
    <span class="nx">properties</span> <span class="p">=</span> <span class="p">{</span>
      <span class="nx">registrationEnabled</span> <span class="p">=</span> <span class="kc">false</span>
      <span class="nx">resolutionPolicy</span>    <span class="p">=</span> <span class="s2">"NxDomainRedirect"</span>
      <span class="nx">virtualNetwork</span> <span class="p">=</span> <span class="p">{</span>
        <span class="nx">id</span> <span class="p">=</span> <span class="nx">azurerm_virtual_network</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">id</span>
      <span class="p">}</span>
    <span class="p">}</span>
  <span class="p">})</span>
<span class="p">}</span>

<span class="c1"># Private Endpoint</span>
<span class="k">resource</span> <span class="s2">"azurerm_private_endpoint"</span> <span class="s2">"keyvault"</span> <span class="p">{</span>
  <span class="nx">name</span>                <span class="p">=</span> <span class="s2">"pe-keyvault"</span>
  <span class="nx">location</span>            <span class="p">=</span> <span class="nx">azurerm_resource_group</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">location</span>
  <span class="nx">resource_group_name</span> <span class="p">=</span> <span class="nx">azurerm_resource_group</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">name</span>
  <span class="nx">subnet_id</span>           <span class="p">=</span> <span class="nx">azurerm_subnet</span><span class="p">.</span><span class="nx">private_endpoints</span><span class="p">.</span><span class="nx">id</span>

  <span class="nx">private_service_connection</span> <span class="p">{</span>
    <span class="nx">name</span>                           <span class="p">=</span> <span class="s2">"psc-keyvault"</span>
    <span class="nx">private_connection_resource_id</span> <span class="p">=</span> <span class="nx">azurerm_key_vault</span><span class="p">.</span><span class="nx">main</span><span class="p">.</span><span class="nx">id</span>
    <span class="nx">is_manual_connection</span>           <span class="p">=</span> <span class="kc">false</span>
    <span class="nx">subresource_names</span>              <span class="p">=</span> <span class="p">[</span><span class="s2">"vault"</span><span class="p">]</span>
  <span class="p">}</span>

  <span class="nx">private_dns_zone_group</span> <span class="p">{</span>
    <span class="nx">name</span>                 <span class="p">=</span> <span class="s2">"default"</span>
    <span class="nx">private_dns_zone_ids</span> <span class="p">=</span> <span class="p">[</span><span class="nx">azurerm_private_dns_zone</span><span class="p">.</span><span class="nx">keyvault</span><span class="p">.</span><span class="nx">id</span><span class="p">]</span>
  <span class="p">}</span>
<span class="p">}</span>

<span class="c1"># Helper resources</span>
<span class="k">data</span> <span class="s2">"azurerm_client_config"</span> <span class="s2">"current"</span> <span class="p">{}</span>

<span class="k">resource</span> <span class="s2">"random_string"</span> <span class="s2">"suffix"</span> <span class="p">{</span>
  <span class="nx">length</span>  <span class="p">=</span> <span class="mi">8</span>
  <span class="nx">special</span> <span class="p">=</span> <span class="kc">false</span>
  <span class="nx">upper</span>   <span class="p">=</span> <span class="kc">false</span>
<span class="p">}</span></code></pre></figure>

<h3 id="testing-the-configuration">Testing the Configuration</h3>
<p>To verify that Fallback to Internet is working:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># From a VM in the same VNet (should resolve to private IP)</span>
nslookup kv-demo-xxxxx.vault.azure.net

<span class="c"># From a different region/VNet with fallback enabled (should resolve to public IP)</span>
nslookup kv-demo-xxxxx.vault.azure.net</code></pre></figure>

<p>You can also use dig for more detailed DNS information:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">dig kv-demo-xxxxx.vault.azure.net</code></pre></figure>
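
<p>When the check runs from a script rather than by eye, it helps to classify the resolved address automatically. The sketch below tests an IPv4 address against the RFC 1918 private ranges; the <code class="language-plaintext highlighter-rouge">is_private_ip</code> and <code class="language-plaintext highlighter-rouge">classify</code> helpers and the sample addresses are illustrative assumptions, and the sketch assumes the private endpoint uses RFC 1918 address space.</p>

```shell
#!/bin/sh
# is_private_ip IP: succeeds when IP falls in an RFC 1918 private range.
is_private_ip() {
  case "$1" in
    10.*) return 0 ;;
    192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# classify IP: prints which access path the resolved address implies.
classify() {
  if is_private_ip "$1"; then
    echo "private endpoint"
  else
    echo "public endpoint (fallback)"
  fi
}

# Typical use (needs dig and network access, so it is left commented out):
#   classify "$(dig +short kv-demo-xxxxx.vault.azure.net | tail -n 1)"
classify 10.0.0.7     # sample private address
classify 20.50.1.2    # sample public address (hypothetical)
```

<p>The commented-out line takes the last line of the <code class="language-plaintext highlighter-rouge">dig +short</code> output, which is normally the final address after any CNAME records.</p>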

<h3 id="important-considerations">Important Considerations</h3>

<ul>
  <li>Security: Always configure firewall rules on your resources when using Fallback to Internet. The feature allows DNS resolution to succeed, but network access still needs to be explicitly allowed.</li>
  <li>Cost: Traffic going through the public endpoint may incur data transfer costs, unlike private endpoint traffic within the same region.</li>
  <li>Preview Feature: As of this writing, this feature is still in preview. Check Microsoft’s documentation for GA status before using in production.</li>
</ul>

<h3 id="conclusion">Conclusion</h3>
<p>The Fallback to Internet feature for Azure Private DNS zones provides an elegant solution for multi-region scenarios where full network interconnection isn’t feasible or desired. By allowing DNS resolution to fall back to public endpoints when private resolution fails, it maintains the security benefits of Private Endpoints while providing flexibility for cross-region access patterns.</p>

<p>This feature is particularly valuable when:</p>

<ul>
  <li>Operating isolated regions with centralized resources</li>
  <li>Dealing with overlapping IP address spaces</li>
  <li>Simplifying network architecture without compromising security</li>
  <li>Implementing gradual migrations to fully private networking</li>
</ul>

<p>Combined with proper firewall configuration, it offers a practical middle ground between fully private and fully public access patterns.</p>

<p>Happy scripting!</p>]]></content><author><name>Victor Silva</name></author><category term="Azure" /><category term="Azure" /><summary type="html"><![CDATA[When working with Azure Private Endpoints across multiple regions, you've likely encountered a common problem: how do you access a resource with a private endpoint from a different region that isn't interconnected with your current virtual network? Microsoft's "Fallback to Internet" feature for Azure Private DNS zones solves this challenge elegantly.]]></summary></entry><entry><title type="html">OCI Cloud Guard: Excepting with custom tags [English]</title><link href="https://blog.victorsilva.com.uy/oci-cloud-guard-detection-rules/" rel="alternate" type="text/html" title="OCI Cloud Guard: Excepting with custom tags [English]" /><published>2025-08-31T23:14:33+00:00</published><updated>2025-08-31T23:14:33+00:00</updated><id>https://blog.victorsilva.com.uy/oci-cloud-guard-detection-rules</id><content type="html" xml:base="https://blog.victorsilva.com.uy/oci-cloud-guard-detection-rules/"><![CDATA[<p>In Oracle Cloud Infrastructure (OCI) environments, it’s common to encounter scenarios where public datasets are hosted in Object Storage to facilitate access for researchers, open-source communities, or partners. However, Cloud Guard, OCI’s automated security service, can generate constant alerts about these intentionally public buckets, creating noise in the monitoring system and making it difficult to identify real threats.</p>

<h2 id="the-challenge-security-vs-public-access">The Challenge: Security vs. Public Access</h2>
<p>Cloud Guard is designed to identify insecure configurations and potential vulnerabilities in your OCI tenancy. One of its most sensitive detectors identifies Object Storage buckets with public access, as these represent a potential risk of sensitive data exposure.</p>

<p>But what happens when your buckets must be public by design?</p>

<h2 id="the-right-solution-modify-detection-rules">The Right Solution: Modify Detection Rules</h2>
<p>Of the available options for handling this scenario, the best one is to modify the Cloud Guard Detection Rules configuration to exclude known public buckets from security scans.</p>

<h3 id="why-is-this-the-best-option">Why is this the best option?</h3>

<ul>
  <li><strong>Granularity</strong>: Allows you to keep Cloud Guard active for all other resources</li>
  <li><strong>Security</strong>: Doesn’t compromise the overall security posture of your tenancy</li>
  <li><strong>Flexibility</strong>: You can apply specific exceptions without disabling important protections</li>
  <li><strong>Scalability</strong>: Easy to maintain as your infrastructure grows</li>
</ul>

<h3 id="why-arent-the-other-options-suitable">Why aren’t the other options suitable?</h3>

<ul>
  <li><strong>Disable Cloud Guard completely</strong>: Would remove protection from your entire tenancy</li>
  <li><strong>Convert buckets to private</strong>: Contradicts the purpose of sharing data publicly</li>
  <li><strong>Create a separate compartment without Cloud Guard</strong>: Leaves a segment of your infrastructure unmonitored, creating a security blind spot</li>
</ul>

<h2 id="implementing-exceptions-with-custom-tags">Implementing Exceptions with Custom Tags</h2>
<p>The most elegant and maintainable way to implement this solution is by using custom tags combined with Detection Rules configuration.</p>

<h3 id="step-1-create-a-tag-namespace-and-tag">Step 1: Create a Tag Namespace and Tag</h3>
<p>First, create a tag namespace and a specific tag to identify authorized public resources:</p>

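<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create Tag Namespace</span>
oci iam tag-namespace create <span class="se">\</span>
  <span class="nt">--compartment-id</span> &lt;compartment-ocid&gt; <span class="se">\</span>
  <span class="nt">--name</span> <span class="s2">"SecurityExceptions"</span> <span class="se">\</span>
  <span class="nt">--description</span> <span class="s2">"&lt;namespace-description&gt;"</span>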

<span class="c"># Create Tag</span>
oci iam tag create <span class="se">\</span>
  <span class="nt">--tag-namespace-id</span> &lt;tag-namespace-ocid&gt; <span class="se">\</span>
  <span class="nt">--name</span> <span class="s2">"exception"</span> <span class="se">\</span>
  <span class="nt">--description</span> <span class="s2">"Indicates that the resource is an authorized exception"</span>
</code></pre></div></div>

<h3 id="step-2-apply-the-tag-to-public-buckets">Step 2: Apply the Tag to Public Buckets</h3>
<p>Tag the buckets that are intentionally public. In our case, we’ll tag the publicBucket in the ocilabs compartment:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oci os bucket update <span class="se">\</span>
  <span class="nt">--bucket-name</span> publicBucket <span class="se">\</span>
  <span class="nt">--namespace</span> &lt;namespace&gt; <span class="se">\</span>
  <span class="nt">--freeform-tags</span> <span class="s1">'{"exception":"true"}'</span>
</code></pre></div></div>

<p>Or using Defined Tags:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oci os bucket update <span class="se">\</span>
  <span class="nt">--bucket-name</span> publicBucket <span class="se">\</span>
  <span class="nt">--namespace</span> &lt;namespace&gt; <span class="se">\</span>
  <span class="nt">--defined-tags</span> <span class="s1">'{"SecurityExceptions":{"exception":"true"}}'</span>
</code></pre></div></div>

<h3 id="step-3-configure-cloud-guard-detection-rules">Step 3: Configure Cloud Guard Detection Rules</h3>

<p>Now comes the crucial part: modifying the Detection Rule that detects public buckets so it ignores those with the appropriate tag.</p>

<h4 id="option-a-using-the-oci-console">Option A: Using the OCI Console</h4>

<ol>
  <li>Navigate to <strong>Security</strong> → <strong>Cloud Guard</strong> → <strong>Configuration</strong></li>
  <li>Select the <strong>Detector Recipe</strong> you’re using</li>
  <li>Find the “Public Bucket” rule (typically <code class="language-plaintext highlighter-rouge">OBJECT_STORE_PUBLIC_BUCKET</code>)</li>
  <li>Click <strong>Edit Rule</strong></li>
  <li>In the <strong>Condition</strong> section, add a condition to exclude resources with the tag:</li>
</ol>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>resource.type = 'Bucket' 
AND resource.publicAccessType IN ('ObjectRead', 'ObjectReadWithoutList')
AND NOT (resource.freeformTags.exception = 'true')
</code></pre></div></div>
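
<p>The boolean logic of that condition can be checked with a small shell model. This only mirrors the three clauses of the condition and is not Cloud Guard’s own evaluation engine; the <code class="language-plaintext highlighter-rouge">should_alert</code> and <code class="language-plaintext highlighter-rouge">report</code> functions are hypothetical names used for illustration.</p>

```shell
#!/bin/sh
# Toy evaluation of the rule condition; it mirrors the boolean logic only.
# should_alert TYPE PUBLIC_ACCESS_TYPE EXCEPTION_TAG
#   Succeeds (exit 0) when the resource should raise a problem.
should_alert() {
  # Clause 1: the resource must be a bucket.
  [ "$1" = "Bucket" ] || return 1
  # Clause 2: the bucket must be publicly readable.
  case "$2" in
    ObjectRead|ObjectReadWithoutList) ;;
    *) return 1 ;;
  esac
  # Clause 3: the exception tag must not be set to "true".
  [ "$3" != "true" ]
}

# report ...: prints the outcome for one resource.
report() {
  if should_alert "$@"; then echo "ALERT"; else echo "ignored"; fi
}

report Bucket ObjectRead ""        # public bucket without the tag
report Bucket ObjectRead true      # public bucket tagged as an exception
report Bucket NoPublicAccess ""    # private bucket
```

<p>Only the first call raises an alert: a public bucket is ignored solely when it carries the exception tag, so untagged public buckets are still reported.</p>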

<h4 id="option-b-using-oci-cli">Option B: Using OCI CLI</h4>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Get the current detector recipe</span>
oci cloud-guard detector-recipe get <span class="se">\</span>
  <span class="nt">--detector-recipe-id</span> &lt;detector-recipe-ocid&gt; <span class="se">\</span>
  <span class="o">&gt;</span> detector-recipe.json

<span class="c"># Edit the detector-recipe.json file to update the condition</span>
<span class="c"># Then update the detector recipe</span>
oci cloud-guard detector-recipe update <span class="se">\</span>
  <span class="nt">--detector-recipe-id</span> &lt;detector-recipe-ocid&gt; <span class="se">\</span>
  <span class="nt">--from-json</span> file://detector-recipe.json
</code></pre></div></div>

<h3 id="practical-example-configuring-publicbucket-in-ocilabs">Practical Example: Configuring publicBucket in ocilabs</h3>
<p>Let’s walk through a complete example for the publicBucket in the ocilabs compartment:</p>
<ol>
  <li>Tag the Bucket</li>
</ol>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># First, get your namespace</span>
<span class="nb">export </span><span class="nv">NAMESPACE</span><span class="o">=</span><span class="si">$(</span>oci os ns get <span class="nt">--query</span> <span class="s1">'data'</span> <span class="nt">--raw-output</span><span class="si">)</span>

<span class="c"># Tag the publicBucket</span>
oci os bucket update <span class="se">\</span>
  <span class="nt">--bucket-name</span> publicBucket <span class="se">\</span>
  <span class="nt">--namespace</span> <span class="nv">$NAMESPACE</span> <span class="se">\</span>
  <span class="nt">--freeform-tags</span> <span class="s1">'{"exception":"true"}'</span> <span class="se">\</span>
  <span class="nt">--compartment-id</span> &lt;ocilabs-compartment-ocid&gt;
</code></pre></div></div>

<ol start="2">
  <li>Verify the Tag</li>
</ol>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oci os bucket get <span class="se">\</span>
  <span class="nt">--bucket-name</span> publicBucket <span class="se">\</span>
  <span class="nt">--namespace</span> <span class="nv">$NAMESPACE</span> <span class="se">\</span>
  <span class="nt">--query</span> <span class="s1">'data."freeform-tags"'</span>
</code></pre></div></div>

<p>Expected output:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"exception"</span><span class="p">:</span><span class="w"> </span><span class="s2">"true"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<ol start="3">
  <li>Update Cloud Guard Detector Recipe</li>
</ol>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># List detector recipes to find the one you're using</span>
oci cloud-guard detector-recipe list <span class="se">\</span>
  <span class="nt">--compartment-id</span> &lt;root-compartment-ocid&gt; <span class="se">\</span>
  <span class="nt">--lifecycle-state</span> ACTIVE
</code></pre></div></div>

<p>Clone the Oracle-managed recipe if you haven’t already:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oci cloud-guard detector-recipe create <span class="se">\</span>
  <span class="nt">--compartment-id</span> &lt;ocilabs-compartment-ocid&gt; <span class="se">\</span>
  <span class="nt">--display-name</span> <span class="s2">"Custom Detector Recipe - Public Buckets Exception"</span> <span class="se">\</span>
  <span class="nt">--source-detector-recipe-id</span> &lt;oracle-detector-recipe-ocid&gt;
</code></pre></div></div>

<h2 id="best-practices">Best Practices</h2>

<ol>
  <li>Documentation
Maintain a record of all resources tagged as exceptions:</li>
</ol>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># Cloud Guard Exceptions</span>

| Resource | Compartment | Tag | Justification | Date | Approved by |
|----------|-------------|-----|---------------|------|-------------|
| publicBucket | ocilabs | exception:true | Public datasets for research | 2025-10-31 | Security Team |
</code></pre></div></div>

<ol>
  <li>Periodic Review
Implement a quarterly review process to validate that exceptions are still necessary:</li>
</ol>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># List all buckets with the exception tag</span>
oci search resource structured-search <span class="se">\</span>
  <span class="nt">--query-text</span> <span class="s2">"query bucket resources where (freeformTags.key = 'exception' &amp;&amp; freeformTags.value = 'true')"</span>
</code></pre></div></div>

<ol>
  <li>Custom Alerts
Configure alerts when new public buckets are created without the appropriate tag:</li>
</ol>

<pre><code class="language-hcl">resource "oci_events_rule" "public_bucket_without_tag" {
  compartment_id = var.compartment_ocid
  display_name   = "Alert on untagged public bucket"
  is_enabled     = true
  
  condition = &lt;&lt;-EOT
    {
      "eventType": ["com.oraclecloud.objectstorage.createbucket", "com.oraclecloud.objectstorage.updatebucket"],
      "data": {
        "additionalDetails": {
          "publicAccessType": ["ObjectRead", "ObjectReadWithoutList"]
        }
      }
    }
  EOT
  
  actions {
    actions {
      action_type = "ONS"
      is_enabled  = true
      topic_id    = oci_ons_notification_topic.security_alerts.id
      description = "Notify security team of untagged public bucket"
    }
  }
}

# Create ONS topic for alerts
resource "oci_ons_notification_topic" "security_alerts" {
  compartment_id = var.compartment_ocid
  name           = "security-alerts"
  description    = "Security alerts for Cloud Guard exceptions"
}

# Subscribe to the topic
resource "oci_ons_subscription" "security_team_email" {
  compartment_id = var.compartment_ocid
  endpoint       = "security-team@example.com"
  protocol       = "EMAIL"
  topic_id       = oci_ons_notification_topic.security_alerts.id
}
</code></pre>

<ol>
  <li>Principle of Least Privilege
Ensure that only authorized users can apply the exception tag:</li>
</ol>

<pre><code class="language-hcl">resource "oci_identity_policy" "tag_management" {
  compartment_id = var.tenancy_ocid
  name           = "security-exceptions-tag-policy"
  description    = "Control security exception tags"

  statements = [
    "Allow group SecurityAdmins to manage buckets in compartment ocilabs where request.user.name != 'unauthorized-user'",
    "Allow group SecurityAdmins to use tag-namespaces in tenancy where target.tag-namespace.name='SecurityExceptions'"
  ]
}
</code></pre>

<ol>
  <li>Automation Script
Create a script to automate the tagging process for multiple buckets:</li>
</ol>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/bash
# tag-public-buckets.sh

NAMESPACE=$(oci os ns get --query 'data' --raw-output)
COMPARTMENT_ID="&lt;ocilabs-compartment-ocid&gt;"

# Array of public buckets that should be tagged
PUBLIC_BUCKETS=("publicBucket" "research-data" "open-datasets")

for bucket in "${PUBLIC_BUCKETS[@]}"; do
  echo "Tagging bucket: $bucket"
  oci os bucket update \
    --bucket-name "$bucket" \
    --namespace "$NAMESPACE" \
    --freeform-tags '{"exception":"true"}' \
    --compartment-id "$COMPARTMENT_ID" \
    --force

  if [ $? -eq 0 ]; then
    echo "✓ Successfully tagged $bucket"
  else
    echo "✗ Failed to tag $bucket"
  fi
done
</code></pre></div></div>

<h3 id="monitoring-and-auditing">Monitoring and Auditing</h3>
<p>Implement logging to track changes to Detection Rules and tagged resources:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Enable Cloud Guard logging</span>
oci logging log create <span class="se">\</span>
  <span class="nt">--display-name</span> <span class="s2">"cloudguard-config-changes"</span> <span class="se">\</span>
  <span class="nt">--log-group-id</span> &lt;log-group-ocid&gt; <span class="se">\</span>
  <span class="nt">--log-type</span> SERVICE <span class="se">\</span>
  <span class="nt">--configuration</span> <span class="s1">'{
    "source": {
      "sourceType": "OCISERVICE",
      "service": "cloudguard",
      "resource": "&lt;target-ocid&gt;",
      "category": "write"
    },
    "archiving": {
      "isEnabled": true
    }
  }'</span>

<span class="c"># Enable Object Storage logging for bucket updates</span>
oci logging log create <span class="se">\</span>
  <span class="nt">--display-name</span> <span class="s2">"bucket-modification-logs"</span> <span class="se">\</span>
  <span class="nt">--log-group-id</span> &lt;log-group-ocid&gt; <span class="se">\</span>
  <span class="nt">--log-type</span> SERVICE <span class="se">\</span>
  <span class="nt">--configuration</span> <span class="s1">'{
    "source": {
      "sourceType": "OCISERVICE",
      "service": "objectstorage",
      "resource": "publicBucket",
      "category": "write"
    }
  }'</span>
</code></pre></div></div>

<h2 id="conclusion">Conclusion</h2>
<p>Modifying Cloud Guard Detection Rules to exclude specific resources through custom tags is a maintainable, auditable way to manage intentionally public buckets in OCI. This strategy allows you to:</p>

<ul>
  <li>Maintain a robust security posture</li>
  <li>Reduce false positive alert noise</li>
  <li>Scale your infrastructure without compromising security</li>
  <li>Maintain visibility and control over exceptions</li>
</ul>

<p>For the specific case of <code class="language-plaintext highlighter-rouge">publicBucket</code> in the <code class="language-plaintext highlighter-rouge">ocilabs</code> compartment with the <code class="language-plaintext highlighter-rouge">exception:true</code> tag, this approach ensures that your research datasets remain accessible while Cloud Guard continues to protect the rest of your infrastructure.</p>

<p>Remember that security is an ongoing process. Establish clear procedures for exception management, document all decisions, and regularly review your configuration to ensure it remains aligned with your organization’s needs.</p>

<p>Happy scripting!</p>]]></content><author><name>Victor Silva</name></author><category term="Oracle" /><category term="Security" /><category term="Oracle Cloud" /><category term="Cloud Guard" /><category term="Detection Rules" /><category term="Security" /><summary type="html"><![CDATA[In Oracle Cloud Infrastructure (OCI) environments, it's common to encounter scenarios where public datasets are hosted in Object Storage to facilitate access for researchers, open-source communities, or partners. However, Cloud Guard, OCI's automated security service, can generate constant alerts about these intentionally public buckets, creating noise in the monitoring system and making it difficult to identify real threats.]]></summary></entry><entry><title type="html">GCP Chronicle SIEM Detection Rules with YARA-L 2.0</title><link href="https://blog.victorsilva.com.uy/gcp-chronicle-siem-detection-rules/" rel="alternate" type="text/html" title="GCP Chronicle SIEM Detection Rules with YARA-L 2.0" /><published>2025-08-11T10:00:00+00:00</published><updated>2025-08-11T10:00:00+00:00</updated><id>https://blog.victorsilva.com.uy/gcp-chronicle-siem-detection-rules</id><content type="html" xml:base="https://blog.victorsilva.com.uy/gcp-chronicle-siem-detection-rules/"><![CDATA[<p>When working with GCP at any meaningful scale, you quickly realize that logs are not your problem — you have plenty of them. Cloud Audit Logs capture every IAM mutation, every API call to Secret Manager, every VPC firewall change. VPC Flow Logs record every accepted and rejected connection. Cloud Armor logs tell you what traffic was blocked at the edge. The problem is that none of those logs, on their own, will tell you that something is wrong. 
A single <code class="language-plaintext highlighter-rouge">SetIamPolicy</code> event binding <code class="language-plaintext highlighter-rouge">roles/owner</code> to an external email address looks exactly like a routine administrative change unless something correlates it, evaluates it against a policy, and raises an alert.</p>

<p>That gap between raw log data and actionable detection is where a SIEM lives. GCP’s native answer is Chronicle — a petabyte-scale security analytics platform built on Google’s infrastructure, with a normalized data model and a purpose-built detection rule language called YARA-L 2.0. This post covers the full path: ingesting GCP logs into Chronicle, understanding the Unified Data Model (UDM) that normalizes those logs, and writing three concrete detection rules that cover the most common GCP security incidents.</p>

<h2 id="what-chronicle-is-and-how-it-differs">What Chronicle Is and How It Differs</h2>

<p>Chronicle started as an internal Google project called Backstory before becoming a generally available GCP service. It is not a log aggregation tool with a search UI bolted on — it is built specifically for security analytics, and that design decision shows up in a few important ways.</p>

<p><strong>Unified Data Model (UDM)</strong> is the normalization layer at the center of everything. When a Cloud Audit Log entry arrives in Chronicle, it is parsed and mapped to a standardized schema. An IAM change becomes a <code class="language-plaintext highlighter-rouge">USER_RESOURCE_UPDATE_PERMISSIONS</code> event with a <code class="language-plaintext highlighter-rouge">principal</code>, a <code class="language-plaintext highlighter-rouge">target</code>, and structured <code class="language-plaintext highlighter-rouge">security_result</code> fields. A network connection becomes a <code class="language-plaintext highlighter-rouge">NETWORK_CONNECTION</code> event with <code class="language-plaintext highlighter-rouge">network.ip_protocol</code>, <code class="language-plaintext highlighter-rouge">principal.ip</code>, and <code class="language-plaintext highlighter-rouge">target.port</code> fields. Every event type, regardless of the originating product, maps to the same field names. This is what makes detection rules portable and readable — you write rules against UDM fields, not against the raw JSON structure of a specific product’s log format.</p>

<p><strong>Petabyte-scale retention</strong> at a flat rate is the other structural difference. Chronicle’s default retention is one year with no per-GB ingestion cost for a set of natively supported log types, including Cloud Audit Logs. The cost model is per-user rather than per-volume, which changes the calculus around what you can afford to keep searchable.</p>

<p><strong>Google Threat Intelligence</strong> is built in. Chronicle can automatically correlate IOCs — IPs, domains, file hashes — against Google’s threat intelligence feed without a separate connector or add-on license.</p>

<p>For readers coming from other SIEMs, here is a quick orientation:</p>

<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>Chronicle</th>
      <th>Splunk</th>
      <th>Microsoft Sentinel</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Query language</td>
      <td>YARA-L 2.0 (rules) + UDM Search</td>
      <td>SPL</td>
      <td>KQL</td>
    </tr>
    <tr>
      <td>Data model</td>
      <td>UDM (normalized)</td>
      <td>Raw + CIM</td>
      <td>ASIM + raw</td>
    </tr>
    <tr>
      <td>Retention</td>
      <td>1 year default, petabyte-scale</td>
      <td>License-dependent</td>
      <td>Log Analytics workspace</td>
    </tr>
    <tr>
      <td>GCP log integration</td>
      <td>Native</td>
      <td>Via HEC/syslog</td>
      <td>Via connector</td>
    </tr>
    <tr>
      <td>Threat intel</td>
      <td>Google TI built-in</td>
      <td>ThreatIntelligence add-on</td>
      <td>MDTI connector</td>
    </tr>
  </tbody>
</table>

<p>YARA-L 2.0 is closer in feel to a structured rule language (like Sigma) than to a query language (like SPL or KQL). You declare what events you are looking for, define how to group them over time, and specify the condition under which the rule fires. If you have written Sigma rules or Snort/Suricata rules before, the pattern will be familiar. If you come from Splunk, the shift from “search and transform” to “declare and match” takes a little adjustment but makes rules easier to audit and version-control.</p>

<h2 id="architecture-getting-gcp-logs-into-chronicle">Architecture: Getting GCP Logs into Chronicle</h2>

<p>The ingestion path from Cloud Logging to Chronicle has two options. The newer path is a direct Chronicle export configured in the Chronicle UI under Settings &gt; Feeds, using Google Cloud Pub/Sub as the transport. The older (and still fully supported) path uses a Log Router sink to push to Pub/Sub, and then a Chronicle Pub/Sub feed pulls from that topic. Both paths land log data in Chronicle as UDM events within a few minutes of the original API call.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Cloud Logging (Cloud Audit Logs, VPC Flow Logs, Cloud Armor)
         |
         | Log Router sink
         v
     Pub/Sub topic (chronicle-gcp-logs)
         |
         | Chronicle Pub/Sub feed
         v
    Chronicle (UDM normalization + retention)
         |
         +--- YARA-L 2.0 detection rules
         |
         +--- Alerts / Findings
</code></pre></div></div>

<p>The Log Router sink approach gives you the most control over which log entries flow to Chronicle, because the sink filter is a full Cloud Logging filter expression. You can scope it to specific services, specific resource types, or specific severity levels, and you can tune it later without touching Chronicle’s configuration.</p>
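
<p>As a rough illustration of scoping the sink by service, the filter expression can be assembled from a list of service names. This is a minimal sketch in plain shell: <code class="language-plaintext highlighter-rouge">build_sink_filter</code> is a hypothetical helper, not part of <code class="language-plaintext highlighter-rouge">gcloud</code>, and it only produces the <code class="language-plaintext highlighter-rouge">protoPayload.serviceName</code> clause of a filter.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">#!/bin/bash
# build_sink_filter is a hypothetical helper (an assumption for illustration).
# It joins the service names passed as arguments into one Cloud Logging
# filter clause of the form: protoPayload.serviceName=("a" OR "b")
build_sink_filter() {
  local filter="" svc
  for svc in "$@"; do
    # Separate entries with OR once the first one has been added
    if [ -n "$filter" ]; then
      filter="$filter OR "
    fi
    filter="$filter\"$svc\""
  done
  printf 'protoPayload.serviceName=(%s)' "$filter"
}

# Example: scope the sink to two services
build_sink_filter \
  "cloudresourcemanager.googleapis.com" \
  "secretmanager.googleapis.com"</code></pre></figure>

<p>The printed string is meant to be passed as the <code class="language-plaintext highlighter-rouge">--log-filter</code> value of the sink, which keeps the list of monitored services in one place as you add rules.</p>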

<h2 id="prerequisites">Prerequisites</h2>

<p>You will need:</p>

<ul>
  <li>A GCP project with Owner or Security Admin access</li>
  <li><code class="language-plaintext highlighter-rouge">gcloud</code> CLI installed and authenticated</li>
  <li>A Chronicle tenant provisioned (Chronicle is a separate license — contact your Google Cloud rep or check the Chronicle trial program)</li>
  <li>Cloud Audit Logs enabled for the services you want to monitor (Admin Activity is always on; Data Access must be explicitly enabled)</li>
</ul>

<p>Verify your active project and authentication:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gcloud config get-value project
gcloud auth list</code></pre></figure>

<p>Enable the required APIs:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gcloud services <span class="nb">enable</span> <span class="se">\</span>
  logging.googleapis.com <span class="se">\</span>
  pubsub.googleapis.com <span class="se">\</span>
  cloudresourcemanager.googleapis.com <span class="se">\</span>
  secretmanager.googleapis.com <span class="se">\</span>
  compute.googleapis.com <span class="se">\</span>
  <span class="nt">--project</span><span class="o">=</span>PROJECT_ID</code></pre></figure>

<h2 id="setting-up-log-ingestion">Setting Up Log Ingestion</h2>

<h3 id="creating-the-pubsub-topic-and-log-router-sink">Creating the Pub/Sub Topic and Log Router Sink</h3>

<p>The sink filter below covers the three services we will write detection rules for: IAM (via Cloud Resource Manager), Secret Manager, and Compute Engine (for VPC firewall rules). You can extend the filter to include additional services as you add rules.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Create the Pub/Sub topic that Chronicle will pull from</span>
gcloud pubsub topics create chronicle-gcp-logs <span class="se">\</span>
  <span class="nt">--project</span><span class="o">=</span>PROJECT_ID</code></pre></figure>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Create the Log Router sink with a filter scoped to the services we care about</span>
gcloud logging sinks create chronicle-sink <span class="se">\</span>
  pubsub.googleapis.com/projects/PROJECT_ID/topics/chronicle-gcp-logs <span class="se">\</span>
  <span class="nt">--log-filter</span><span class="o">=</span><span class="s1">'protoPayload.serviceName=("cloudresourcemanager.googleapis.com" OR "secretmanager.googleapis.com" OR "compute.googleapis.com")'</span> <span class="se">\</span>
  <span class="nt">--project</span><span class="o">=</span>PROJECT_ID</code></pre></figure>

<p>The sink creates a dedicated service account (<code class="language-plaintext highlighter-rouge">serviceAccount:...@gcp-sa-logging.iam.gserviceaccount.com</code>) that needs publish rights on the topic. Retrieve it and grant the permission:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Get the sink's writer identity</span>
<span class="nv">SINK_SA</span><span class="o">=</span><span class="si">$(</span>gcloud logging sinks describe chronicle-sink <span class="se">\</span>
  <span class="nt">--project</span><span class="o">=</span>PROJECT_ID <span class="se">\</span>
  <span class="nt">--format</span><span class="o">=</span><span class="s1">'value(writerIdentity)'</span><span class="si">)</span>

<span class="nb">echo</span> <span class="s2">"Sink service account: </span><span class="k">${</span><span class="nv">SINK_SA</span><span class="k">}</span><span class="s2">"</span>

<span class="c"># Grant publish rights on the topic</span>
gcloud pubsub topics add-iam-policy-binding chronicle-gcp-logs <span class="se">\</span>
  <span class="nt">--member</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">SINK_SA</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--role</span><span class="o">=</span><span class="s2">"roles/pubsub.publisher"</span> <span class="se">\</span>
  <span class="nt">--project</span><span class="o">=</span>PROJECT_ID</code></pre></figure>

<h3 id="configuring-the-chronicle-pubsub-feed">Configuring the Chronicle Pub/Sub Feed</h3>

<p>With the topic receiving log data, open the Chronicle UI and navigate to <strong>Settings &gt; Feeds &gt; Add Feed</strong>. Select <strong>Google Cloud Pub/Sub</strong> as the source type, choose <strong>Google Cloud Audit Logs</strong> as the log type, and enter your project ID and the topic name <code class="language-plaintext highlighter-rouge">chronicle-gcp-logs</code>. Chronicle will use its own service account to subscribe to the topic — copy the Chronicle service account email shown in the UI and grant it the subscriber role:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Replace CHRONICLE_SA with the service account shown in the Chronicle feed UI</span>
gcloud pubsub topics add-iam-policy-binding chronicle-gcp-logs <span class="se">\</span>
  <span class="nt">--member</span><span class="o">=</span><span class="s2">"serviceAccount:CHRONICLE_SA"</span> <span class="se">\</span>
  <span class="nt">--role</span><span class="o">=</span><span class="s2">"roles/pubsub.subscriber"</span> <span class="se">\</span>
  <span class="nt">--project</span><span class="o">=</span>PROJECT_ID</code></pre></figure>

<p>Once the feed is saved and active, Cloud Audit Log entries will begin arriving in Chronicle within a few minutes. You can validate ingestion in the Chronicle UI via <strong>UDM Search</strong> — search for <code class="language-plaintext highlighter-rouge">metadata.product_name = "Cloud Audit Logs"</code> and confirm events are appearing.</p>

<h2 id="yara-l-20-detection-rules">YARA-L 2.0 Detection Rules</h2>

<p>Now that we have log data flowing, let’s implement the detection logic. YARA-L 2.0 rules have a fixed structure with five sections. Understanding each section before looking at full rules makes the syntax click faster.</p>

<h3 id="rule-structure">Rule Structure</h3>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">rule rule_name <span class="o">{</span>
  meta:
    author <span class="o">=</span> <span class="s2">"Victor Silva"</span>
    description <span class="o">=</span> <span class="s2">"What this rule detects"</span>
    severity <span class="o">=</span> <span class="s2">"HIGH"</span>      // CRITICAL, HIGH, MEDIUM, LOW, INFORMATIONAL
    priority <span class="o">=</span> <span class="s2">"HIGH"</span>
    <span class="nb">type</span> <span class="o">=</span> <span class="s2">"ALERT"</span>         // ALERT fires <span class="k">in </span>the Alerts view
                           // RULE_TYPE_UNSPECIFIED creates informational findings

  events:
    // UDM field predicates: all must match <span class="k">for </span>the rule to consider an event
    <span class="nv">$e</span>.metadata.event_type <span class="o">=</span> <span class="s2">"USER_RESOURCE_UPDATE_PERMISSIONS"</span>
    <span class="nv">$e</span>.target.resource.type <span class="o">=</span> <span class="s2">"GCP_IAM_POLICY"</span>
    // Placeholder variable that the match section uses as its grouping key
    <span class="nv">$e</span>.principal.user.userid <span class="o">=</span> <span class="nv">$user</span>

  match:
    // Optional: used <span class="k">for </span>multi-event rules to define the grouping key and
    // <span class="nb">time </span>window. For single-event rules this section is omitted.
    <span class="nv">$user</span> over 1h

  outcome:
    // Variables available <span class="k">in </span>the alert details. These surface <span class="k">in </span>the alert
    // and can be used <span class="k">for </span>triage without opening the raw log.
    // With a match section present, outcome values must be aggregated.
    <span class="nv">$risk_score</span> <span class="o">=</span> max<span class="o">(</span>85<span class="o">)</span>
    <span class="nv">$principal_email</span> <span class="o">=</span> array_distinct<span class="o">(</span><span class="nv">$e</span>.principal.user.userid<span class="o">)</span>

  condition:
    // Must come after outcome. Specifies when the rule fires. <span class="s2">"</span><span class="nv">$e</span><span class="s2">"</span> means <span class="s2">"at least one matching event"</span><span class="nb">.</span>
    // For multi-event rules you can write <span class="s2">"#e &gt; 5"</span> or combine variables.
    <span class="nv">$e</span>
<span class="o">}</span></code></pre></figure>

<p>A few UDM field namespaces you will use constantly:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">$e.metadata</code> — event type, product name, log type, timestamps</li>
  <li><code class="language-plaintext highlighter-rouge">$e.principal</code> — who initiated the action (user, service account, IP)</li>
  <li><code class="language-plaintext highlighter-rouge">$e.target</code> — what resource was acted on</li>
  <li><code class="language-plaintext highlighter-rouge">$e.network</code> — network connection details (protocol, ports, IPs)</li>
  <li><code class="language-plaintext highlighter-rouge">$e.security_result</code> — outcome, threat indicators, verdict</li>
</ul>

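<p>Single-event rules match on one event at a time: the <code class="language-plaintext highlighter-rouge">match</code> section is omitted and <code class="language-plaintext highlighter-rouge">condition</code> is just <code class="language-plaintext highlighter-rouge">$e</code>. Multi-event rules correlate multiple events within a time window, grouping them by a key. For example, <code class="language-plaintext highlighter-rouge">$user over 1h</code>, where <code class="language-plaintext highlighter-rouge">$user</code> is a placeholder variable bound to <code class="language-plaintext highlighter-rouge">$e.principal.user.userid</code> in the <code class="language-plaintext highlighter-rouge">events</code> section, groups matching events per user per hour, and a condition such as <code class="language-plaintext highlighter-rouge">#e &gt; 5</code> fires when a single user matches the event predicates more than five times in that window.</p>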
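
<p>To make the grouping idea concrete outside Chronicle, here is a toy model in plain shell. It is only an analogy for how a multi-event rule groups and counts events: the helper name <code class="language-plaintext highlighter-rouge">users_over_threshold</code> and the sample user IDs are made up, and the time window is not modeled at all.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">#!/bin/bash
# Toy model of a multi-event rule: group events by user (the match key) and
# fire when one user exceeds a threshold (the condition).
# users_over_threshold is a hypothetical helper: it reads one user ID per
# line on stdin and reports users seen more than "$1" times.
users_over_threshold() {
  local threshold="$1"
  # sort + uniq -c yields "COUNT USER" per distinct user
  sort | uniq -c | while read -r count user; do
    if [ "$count" -gt "$threshold" ]; then
      echo "rule fires for $user ($count events)"
    fi
  done
}

# Made-up sample events: alice appears three times, bob once
printf '%s\n' \
  "alice@example.com" \
  "bob@example.com" \
  "alice@example.com" \
  "alice@example.com" |
  users_over_threshold 2</code></pre></figure>

<p>With a threshold of 2, only <code class="language-plaintext highlighter-rouge">alice@example.com</code> is reported, which mirrors how a per-user grouping key keeps one noisy principal from being hidden among unrelated events.</p>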

<h3 id="rule-1-iam-privilege-escalation">Rule 1: IAM Privilege Escalation</h3>

<p>This is the highest-priority rule to have active. Granting <code class="language-plaintext highlighter-rouge">roles/owner</code> or <code class="language-plaintext highlighter-rouge">roles/editor</code> to any principal — especially an external one or a service account that should not have project-wide permissions — is one of the most reliable signals of a compromised account or an insider threat.</p>

<p>The rule fires on a single event, because even one such IAM binding change is worth immediate investigation.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">rule gcp_iam_privilege_escalation_owner_editor <span class="o">{</span>
  meta:
    author <span class="o">=</span> <span class="s2">"Victor Silva"</span>
    description <span class="o">=</span> <span class="s2">"Detects when owner or editor role is granted to any principal"</span>
    severity <span class="o">=</span> <span class="s2">"HIGH"</span>
    priority <span class="o">=</span> <span class="s2">"HIGH"</span>
    <span class="nb">type</span> <span class="o">=</span> <span class="s2">"ALERT"</span>

  events:
    <span class="nv">$e</span>.metadata.event_type <span class="o">=</span> <span class="s2">"USER_RESOURCE_UPDATE_PERMISSIONS"</span>
    <span class="nv">$e</span>.target.resource.type <span class="o">=</span> <span class="s2">"GCP_IAM_POLICY"</span>
    <span class="o">(</span>
      re.regex<span class="o">(</span><span class="nv">$e</span>.target.resource.attribute.labels[<span class="s2">"role"</span><span class="o">]</span>, <span class="sb">`</span>roles/owner<span class="sb">`</span><span class="o">)</span> or
      re.regex<span class="o">(</span><span class="nv">$e</span>.target.resource.attribute.labels[<span class="s2">"role"</span><span class="o">]</span>, <span class="sb">`</span>roles/editor<span class="sb">`</span><span class="o">)</span>
    <span class="o">)</span>

  outcome:
    <span class="nv">$principal_email</span> <span class="o">=</span> <span class="nv">$e</span>.principal.user.userid
    <span class="nv">$project</span> <span class="o">=</span> <span class="nv">$e</span>.target.resource.name
    <span class="nv">$role_granted</span> <span class="o">=</span> <span class="nv">$e</span>.target.resource.attribute.labels[<span class="s2">"role"</span><span class="o">]</span>

  condition:
    <span class="nv">$e</span>
<span class="o">}</span></code></pre></figure>

<p>The <code class="language-plaintext highlighter-rouge">re.regex()</code> function is used here rather than a direct equality check because the role value in the UDM label may contain additional context in some log formats. Because the pattern <code class="language-plaintext highlighter-rouge">roles/owner</code> is unanchored, it matches anywhere in the label value, so the rule catches the binding regardless of surrounding characters.</p>

<p>The <code class="language-plaintext highlighter-rouge">outcome</code> variables <code class="language-plaintext highlighter-rouge">$principal_email</code>, <code class="language-plaintext highlighter-rouge">$project</code>, and <code class="language-plaintext highlighter-rouge">$role_granted</code> will appear directly in the Chronicle alert details, giving the analyst the three facts they need to start triage without having to dig into raw log data.</p>

<h3 id="rule-2-secret-manager-anomalous-access">Rule 2: Secret Manager Anomalous Access</h3>

<p>Secret Manager access patterns are a reliable detection surface. In a well-governed project, the set of service accounts that legitimately read secrets is small and known. Any access from outside that approved set warrants investigation — it could indicate a compromised application service account, lateral movement, or exfiltration of credentials.</p>

<p>This rule uses a <code class="language-plaintext highlighter-rouge">not re.regex()</code> predicate to implement a simple allowlist approach. You will customize the regex to match your project’s naming convention for approved service accounts.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">rule gcp_secret_manager_anomalous_access <span class="o">{</span>
  meta:
    author <span class="o">=</span> <span class="s2">"Victor Silva"</span>
    description <span class="o">=</span> <span class="s2">"Detects secret access from service accounts not in the approved list"</span>
    severity <span class="o">=</span> <span class="s2">"MEDIUM"</span>
    priority <span class="o">=</span> <span class="s2">"MEDIUM"</span>
    <span class="nb">type</span> <span class="o">=</span> <span class="s2">"ALERT"</span>

  events:
    <span class="nv">$e</span>.metadata.product_name <span class="o">=</span> <span class="s2">"Secret Manager"</span>
    <span class="nv">$e</span>.metadata.event_type <span class="o">=</span> <span class="s2">"USER_RESOURCE_ACCESS"</span>
    re.regex<span class="o">(</span><span class="nv">$e</span>.metadata.product_event_type, <span class="sb">`</span>AccessSecretVersion<span class="sb">`</span><span class="o">)</span>
    not re.regex<span class="o">(</span><span class="nv">$e</span>.principal.user.userid, <span class="sb">`</span>approved-sa@my-project<span class="se">\.</span>iam<span class="se">\.</span>gserviceaccount<span class="se">\.</span>com<span class="sb">`</span><span class="o">)</span>
    not re.regex<span class="o">(</span><span class="nv">$e</span>.principal.user.userid, <span class="sb">`</span>another-approved-sa@my-project<span class="se">\.</span>iam<span class="se">\.</span>gserviceaccount<span class="se">\.</span>com<span class="sb">`</span><span class="o">)</span>

  outcome:
    <span class="nv">$principal_email</span> <span class="o">=</span> <span class="nv">$e</span>.principal.user.userid
    <span class="nv">$secret_name</span> <span class="o">=</span> <span class="nv">$e</span>.target.resource.name

  condition:
    <span class="nv">$e</span>
<span class="o">}</span></code></pre></figure>

<p>A few implementation notes for this rule in practice. First, make sure Data Access logs are enabled for Secret Manager: Admin Activity logs do not capture <code class="language-plaintext highlighter-rouge">AccessSecretVersion</code> calls, only Data Access logs do. To enable them, start by exporting the current IAM policy:</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Export current IAM policy</span>
gcloud projects get-iam-policy PROJECT_ID <span class="nt">--format</span><span class="o">=</span>json <span class="o">&gt;</span> policy.json</code></pre></figure>

<p>Add the Secret Manager Data Access audit config to <code class="language-plaintext highlighter-rouge">policy.json</code> and re-apply:</p>

<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="w">
  </span><span class="nl">"auditConfigs"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"service"</span><span class="p">:</span><span class="w"> </span><span class="s2">"secretmanager.googleapis.com"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"auditLogConfigs"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w"> </span><span class="nl">"logType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"DATA_READ"</span><span class="w"> </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">gcloud projects set-iam-policy PROJECT_ID policy.json</code></pre></figure>

<p>Second, the allowlist in the rule above is a starting point. As you expand the rule to cover multiple projects or a more complex service account naming scheme, consider using <code class="language-plaintext highlighter-rouge">re.regex()</code> with a pattern that matches your entire approved namespace (for example, <code class="language-plaintext highlighter-rouge">^(app-backend|app-worker)-sa@my-project\.iam\.gserviceaccount\.com$</code>) rather than listing each approved account individually.</p>
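
<p>Before putting an allowlist pattern into a rule, it can help to sanity-check it locally against a few sample identities. The sketch below uses <code class="language-plaintext highlighter-rouge">grep -E</code>, whose POSIX extended regex dialect is not identical to the engine Chronicle uses, so treat it as a quick check only. <code class="language-plaintext highlighter-rouge">is_approved_sa</code> is a hypothetical helper and the sample accounts are invented.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">#!/bin/bash
# Pattern copied from the paragraph above. The ^ and $ anchors matter:
# without them a lookalike address with extra characters would also match.
APPROVED_SA_PATTERN='^(app-backend|app-worker)-sa@my-project\.iam\.gserviceaccount\.com$'

# is_approved_sa is a hypothetical helper (an assumption for illustration).
# It exits 0 when the given account matches the allowlist pattern.
is_approved_sa() {
  printf '%s\n' "$1" | grep -Eq "$APPROVED_SA_PATTERN"
}

# Invented sample identities: two approved, one unknown, one lookalike
for sa in \
  "app-backend-sa@my-project.iam.gserviceaccount.com" \
  "app-worker-sa@my-project.iam.gserviceaccount.com" \
  "ci-runner@my-project.iam.gserviceaccount.com" \
  "app-backend-sa@my-project.iam.gserviceaccount.com.evil.example"; do
  if is_approved_sa "$sa"; then
    echo "allowed: $sa"
  else
    echo "ALERT:   $sa"
  fi
done</code></pre></figure>

<p>The first two accounts are reported as allowed and the last two as alerts. The lookalike case is the reason to anchor the pattern when you consolidate the allowlist into a single expression.</p>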

<h3 id="rule-3-overly-permissive-vpc-firewall-rule">Rule 3: Overly Permissive VPC Firewall Rule</h3>

<p>Firewall rules allowing ingress from <code class="language-plaintext highlighter-rouge">0.0.0.0/0</code> are a routine audit finding that rarely gets caught at creation time. By the time a security reviewer looks at the firewall configuration, the rule has been in place for weeks and removing it requires coordination with application teams. This rule catches the problem the moment the firewall rule is created.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash">rule gcp_vpc_firewall_open_ingress <span class="o">{</span>
  meta:
    author <span class="o">=</span> <span class="s2">"Victor Silva"</span>
    description <span class="o">=</span> <span class="s2">"Detects VPC firewall rules allowing ingress from 0.0.0.0/0"</span>
    severity <span class="o">=</span> <span class="s2">"HIGH"</span>
    priority <span class="o">=</span> <span class="s2">"HIGH"</span>
    <span class="nb">type</span> <span class="o">=</span> <span class="s2">"ALERT"</span>

  events:
    <span class="nv">$e</span>.metadata.event_type <span class="o">=</span> <span class="s2">"USER_RESOURCE_CREATION"</span>
    <span class="nv">$e</span>.target.resource.type <span class="o">=</span> <span class="s2">"GCP_VPC_FIREWALL_RULE"</span>
    <span class="nv">$e</span>.target.resource.attribute.labels[<span class="s2">"direction"</span><span class="o">]</span> <span class="o">=</span> <span class="s2">"INGRESS"</span>
    <span class="nv">$e</span>.target.resource.attribute.labels[<span class="s2">"source_ranges"</span><span class="o">]</span> <span class="o">=</span> <span class="s2">"0.0.0.0/0"</span>

  condition:
    <span class="nv">$e</span>

  outcome:
    <span class="nv">$principal_email</span> <span class="o">=</span> <span class="nv">$e</span>.principal.user.userid
    <span class="nv">$firewall_rule</span> <span class="o">=</span> <span class="nv">$e</span>.target.resource.name
    <span class="nv">$network</span> <span class="o">=</span> <span class="nv">$e</span>.target.resource.attribute.labels[<span class="s2">"network"</span><span class="o">]</span>
<span class="o">}</span></code></pre></figure>

<p>This rule matches only <code class="language-plaintext highlighter-rouge">USER_RESOURCE_CREATION</code> events, so it fires when a firewall rule is first created but not when an existing rule is later modified. If your environment has existing open ingress rules that you want to detect in historical data, the retroactive search approach covered below in “Testing Rules with Retroactive Search” will surface them without waiting for a new creation event.</p>

<p>One refinement worth considering: if your environment legitimately uses <code class="language-plaintext highlighter-rouge">0.0.0.0/0</code> ingress for certain ports (like port 80/443 for public-facing load balancers), add a predicate to exclude those specific port combinations, or adjust the severity to <code class="language-plaintext highlighter-rouge">MEDIUM</code> and route it to an informational finding queue for human review rather than an automated alert.</p>

<h2 id="deploying-rules-in-chronicle">Deploying Rules in Chronicle</h2>

<p>With the rule text ready, deploying to Chronicle takes a few steps in the UI.</p>

<p>Navigate to <strong>Detection Engine &gt; Rules</strong> and click <strong>New Rule</strong>. Paste the rule text into the YARA-L editor. Chronicle validates the syntax inline — if any field names or function calls are incorrect, the editor highlights the error and shows the expected format. Fix any validation errors before saving.</p>

<p>Once the rule validates cleanly, configure two settings:</p>

<p><strong>Alert vs. Informational</strong>: Rules with <code class="language-plaintext highlighter-rouge">type = "ALERT"</code> in the <code class="language-plaintext highlighter-rouge">meta</code> section create entries in the Alerts view, trigger notification integrations, and are tracked through Chronicle’s case management workflow. Rules with <code class="language-plaintext highlighter-rouge">type = "RULE_TYPE_UNSPECIFIED"</code> create informational findings that appear in the Rules view but do not create alerts. Start new rules as informational until you have validated them against real traffic, then promote them to alert.</p>

<p><strong>Enabled vs. Disabled</strong>: Rules do not evaluate incoming events until they are explicitly enabled. After saving, toggle the rule to <strong>Enabled</strong> using the status switch in the Rules list.</p>

<p>Chronicle evaluates enabled rules against incoming UDM events in near-real-time — new events that match an enabled rule create findings within a few minutes of the original log event.</p>

<h2 id="testing-rules-with-retroactive-search">Testing Rules with Retroactive Search</h2>

<p>One of Chronicle’s most practical features for detection engineering is the ability to run a rule against historical data. This lets you validate that a new rule would have fired on past events (useful for confirming it catches real threats) and estimate its alert volume before enabling it on live data.</p>

<p>To run a retroactive search, open the rule in the Rules editor and click <strong>Run Retroactive Search</strong>. Set the time range (up to the retention window — one year by default) and submit. Chronicle processes the historical UDM events against the rule and shows you a list of matches with timestamps and outcome variable values.</p>

<p>This workflow is where the <code class="language-plaintext highlighter-rouge">outcome</code> variables pay off. A retroactive search on the IAM privilege escalation rule will show you every <code class="language-plaintext highlighter-rouge">$principal_email</code>, <code class="language-plaintext highlighter-rouge">$project</code>, and <code class="language-plaintext highlighter-rouge">$role_granted</code> value from the past year — you can immediately see whether the rule would have caught real events or whether it is firing on expected administrative activity that needs to be excluded.</p>

<p>For the Secret Manager rule, run a retroactive search over a 30-day window and review the <code class="language-plaintext highlighter-rouge">$principal_email</code> values in the results. Any service account identity you do not recognize should be investigated; any known-good identity that appeared should be added to the allowlist in the rule before you enable it on live data.</p>

<h2 id="best-practices">Best Practices</h2>

<p><strong>Start with the UDM field reference, not trial and error.</strong> Chronicle’s documentation includes a complete UDM field reference that lists every available field, its type, and which event types populate it. Before writing a new rule, look up which UDM event type corresponds to the action you want to detect and which fields are populated for that event type. Writing rules against unpopulated fields produces rules that silently never match — the field predicate evaluates as false because the field does not exist in the event, not because events are not arriving.</p>

<p><strong>Manage alert fatigue before it becomes a problem.</strong> A detection rule that fires 200 times per day for expected activity is worse than no rule at all — it trains analysts to ignore the alert queue. Before enabling a rule in alert mode, run a retroactive search over two weeks of historical data and count the matches. If the volume is too high, tune the rule with additional predicates, add an allowlist for known-good identities, or run it as an informational finding for a week to collect baseline data before deciding on the right threshold.</p>
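
<p>Counting the matches can be as simple as a shell pipeline. This sketch assumes you have saved the match timestamps from the retroactive search to a text file, one ISO-8601 timestamp per line. That export format is an assumption for illustration, not a Chronicle feature, and <code class="language-plaintext highlighter-rouge">matches_per_day</code> is a hypothetical helper name.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># Reads one ISO-8601 timestamp per line on stdin (a hypothetical export of
# match times) and prints "date count" for each day, busiest day first.
matches_per_day() {
  cut -c1-10 | sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $2, $1 }'
}</code></pre></figure>

<p>Feed it the file with <code class="language-plaintext highlighter-rouge">cat matches.txt | matches_per_day</code>. A handful of days far above the rest usually points at a batch job or an administrative window that belongs in the allowlist.</p>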

<p><strong>Version-control your rules.</strong> YARA-L rule text is plain text — store it in a Git repository alongside your other infrastructure code. Chronicle’s API allows programmatic rule management (create, update, enable, disable) via the Chronicle REST API, so you can integrate rule deployment into a CI/CD pipeline with peer review and change tracking. Treat detection rules with the same engineering discipline as Terraform modules: they have the same blast radius when they go wrong.</p>
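
<p>As one example of what a CI check might look like before any deployment step, the sketch below verifies that every rule file contains the sections used by the rules in this post. The <code class="language-plaintext highlighter-rouge">rules/</code> directory and the <code class="language-plaintext highlighter-rouge">.yaral</code> extension are hypothetical conventions, and the check is a plain text match, not a YARA-L parser.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># Hypothetical layout: one rule per file under a directory, with a .yaral extension.
# Reports every file that is missing one of the expected sections and
# returns a non-zero status if any file fails.
lint_rules() {
  dir="$1"
  status=0
  for file in "$dir"/*.yaral; do
    # Skip the unexpanded glob when the directory has no rule files.
    [ -e "$file" ] || continue
    for section in "meta:" "events:" "condition:"; do
      if ! grep -q "^[[:space:]]*$section" "$file"; then
        echo "$file: missing $section section"
        status=1
      fi
    done
  done
  return "$status"
}</code></pre></figure>

<p>Calling <code class="language-plaintext highlighter-rouge">lint_rules rules</code> in the pipeline fails the job on a malformed file before it reaches Chronicle. It does not replace the validation the YARA-L editor performs.</p>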

<p><strong>Use <code class="language-plaintext highlighter-rouge">outcome</code> variables to make alerts self-contained.</strong> Every field you expose in <code class="language-plaintext highlighter-rouge">outcome</code> appears directly in the Chronicle alert details without requiring the analyst to open the raw log. The more context you surface in outcomes — principal identity, resource name, affected project, IP address — the faster triage goes. Think of <code class="language-plaintext highlighter-rouge">outcome</code> variables as the executive summary of the alert.</p>

<p><strong>Separate rule type from severity.</strong> A rule can have <code class="language-plaintext highlighter-rouge">severity = "HIGH"</code> and <code class="language-plaintext highlighter-rouge">type = "RULE_TYPE_UNSPECIFIED"</code> — high severity, informational mode. Use this during the tuning phase for rules that detect genuinely high-risk behaviors but have not yet been validated against your specific environment’s baseline. It gives you visibility into the events without generating alert noise while you tune.</p>

<h2 id="conclusion">Conclusion</h2>

<p>What we built here is the foundation of a detection engineering practice on GCP. Cloud Audit Logs flowing through a Log Router sink into Chronicle give you a normalized, petabyte-scale searchable corpus of everything happening in your GCP environment. YARA-L 2.0 rules — one for IAM privilege escalation, one for Secret Manager anomalous access, one for overly permissive firewall creation — give you concrete detections for three of the most common GCP security incidents. Retroactive search lets you validate those rules against real historical data before they start generating live alerts.</p>

<p>The next step is expanding coverage. The same YARA-L pattern applies to Cloud Storage public bucket access, GKE workload identity escalation, service account key creation, and Cloud Run deployments from unverified container images. Each new rule follows the same structure: understand the UDM event type, identify the fields that distinguish the suspicious behavior from normal activity, write the predicate, validate with retroactive search, and enable.</p>

<p>If you are working on the underlying infrastructure security that feeds into these detections, the posts on <a href="/gcp-secret-manager-terraform/">GCP Secret Manager with Terraform</a> and <a href="/gcp-vpc-service-controls-terraform/">GCP VPC Service Controls with Terraform</a> cover the preventive controls that reduce the surface area these detection rules are monitoring. For runtime threat detection at the workload layer, <a href="/falco-runtime-security-kubernetes/">Falco runtime security for Kubernetes</a> complements Chronicle’s API-level visibility with in-cluster syscall-level detection. For the GKE workload identity escalation scenario mentioned above, <a href="/gcp-binary-authorization-gke-terraform/">GCP Binary Authorization for GKE with Terraform</a> provides the admission-time control that pairs with Chronicle’s post-deployment detection.</p>

<p>Happy scripting!</p>]]></content><author><name>Victor Silva</name></author><category term="GCP" /><category term="Security" /><category term="GCP" /><category term="Chronicle SIEM" /><category term="YARA-L 2.0" /><category term="Detection Engineering" /><category term="Chronicle detection engineering" /><category term="GCP SIEM" /><category term="Cloud Audit Logs" /><category term="Security" /><summary type="html"><![CDATA[Raw GCP audit logs don't surface threats alone. This post builds Chronicle SIEM detection rules in YARA-L 2.0 for IAM escalation, Secret Manager access, and open firewall creation.]]></summary></entry><entry><title type="html">Oracle Cloud Security Zones [English]</title><link href="https://blog.victorsilva.com.uy/oci-security-zones/" rel="alternate" type="text/html" title="Oracle Cloud Security Zones [English]" /><published>2025-08-05T22:36:18+00:00</published><updated>2025-08-05T22:36:18+00:00</updated><id>https://blog.victorsilva.com.uy/oci-security-zones</id><content type="html" xml:base="https://blog.victorsilva.com.uy/oci-security-zones/"><![CDATA[<p>In today’s cloud-first world, security isn’t just about monitoring threats—it’s about preventing them from happening in the first place. Oracle Cloud Infrastructure (OCI) Security Zones provide exactly this capability: proactive, policy-driven security enforcement that prevents misconfigurations before they can become vulnerabilities.
This guide walks through setting up Security Zones from the OCI CLI: listing the available policies, building a Security Recipe, and confirming that a non-compliant request is rejected.</p>

<h3 id="what-are-oci-security-zones-a-technical-overview">What Are OCI Security Zones? A Technical Overview</h3>

<p>Security Zones in OCI are compartment-level security boundaries that enforce predefined security policies. They act as a “security firewall” for your infrastructure-as-code deployments, automatically validating every resource creation request against established security rules.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────────────────────────────────────────────────┐
│      OCI Tenancy                                        │
│  ┌─────────────────────────────────────────────────┐    │
│  │      Compartment                                │    │
│  │  ┌─────────────────────────────────────────┐    │    │
│  │  │      Security Zone                      │    │    │
│  │  │  ┌─────────────────────────────────┐    │    │    │
│  │  │  │      Security Recipe            │    │    │    │
│  │  │  │       • Network Rules           │    │    │    │
│  │  │  │       • Storage Rules           │    │    │    │
│  │  │  │       • Compute Rules           │    │    │    │
│  │  │  │       • IAM Rules               │    │    │    │
│  │  │  └─────────────────────────────────┘    │    │    │
│  │  └─────────────────────────────────────────┘    │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
</code></pre></div></div>

<h3 id="setting-up-your-development-environment">Setting Up Your Development Environment</h3>

<p>Before we dive into code examples, let’s set up the necessary tools:</p>

<ul>
  <li>
    <p>OCI CLI installed and configured:
To install the OCI CLI, follow the official documentation: <a href="https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm">Installing the CLI</a></p>
  </li>
  <li>
    <p>Cloud Guard enabled in your OCI compartment:</p>
    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># To check the status of Cloud Guard</span>
oci cloud-guard configuration get <span class="nt">--compartment-id</span> &lt;compartmentId&gt;
</code></pre></div>    </div>
  </li>
</ul>

<p>We can obtain the compartment ID using the CLI:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#Replace the compartment name before running the command</span>
<span class="nv">COMPARTMENT_ID</span><span class="o">=</span><span class="si">$(</span>oci iam compartment list <span class="se">\</span>
                  <span class="nt">--name</span> <span class="s2">"ocilabs"</span> <span class="se">\</span>
                  <span class="nt">--query</span> <span class="s2">"data[?contains(</span><span class="se">\"</span><span class="s2">id</span><span class="se">\"</span><span class="s2">,'compartment')].id | [0]"</span> <span class="se">\</span>
                  <span class="nt">--raw-output</span><span class="si">)</span>
</code></pre></div></div>

<p>First, let’s see all the policies available in the Security Zone. We can do this using the OCI CLI:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oci cloud-guard security-policy-collection list-security-policies <span class="se">\</span>
  <span class="nt">--compartment-id</span> <span class="nv">$COMPARTMENT_ID</span> <span class="se">\</span>
  <span class="nt">--query</span> <span class="s2">"data.items[*]"</span>.<span class="o">{</span><span class="s2">"category:category,name:</span><span class="se">\"</span><span class="s2">display-name</span><span class="se">\"</span><span class="s2">"</span><span class="o">}</span> <span class="se">\</span>
  <span class="nt">--output</span> table
</code></pre></div></div>

<p>With the prerequisites in place, we can start building the Security Zone in Oracle Cloud Infrastructure (OCI). A Security Zone is a compartment that enforces security policies to ensure compliance with best practices. The first step is to find the policy that denies public subnets:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oci cloud-guard security-policy-collection list-security-policies <span class="nt">--compartment-id</span> <span class="nv">$COMPARTMENT_ID</span> <span class="nt">--query</span> <span class="s2">"data.items[?contains(</span><span class="se">\"</span><span class="s2">display-name</span><span class="se">\"</span><span class="s2">, 'public_subnets')]"</span>
</code></pre></div></div>

<p>Then refine the query so that it returns only the policy ID:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">DENY_PUBLIC_SUBNET_POLICY_ID</span><span class="o">=</span><span class="si">$(</span>oci cloud-guard security-policy-collection list-security-policies <span class="se">\</span>
  <span class="nt">--compartment-id</span> <span class="nv">$COMPARTMENT_ID</span> <span class="se">\</span>
  <span class="nt">--query</span> <span class="s2">"data.items[?contains(</span><span class="se">\"</span><span class="s2">display-name</span><span class="se">\"</span><span class="s2">, 'public_subnets')].id | [0]"</span><span class="si">)</span>
</code></pre></div></div>

<p>The last step is to create a Security Recipe that uses the policy we just found. A Security Recipe is a collection of security policies that define the security posture for resources created within a Security Zone.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>oci cloud-guard security-recipe create <span class="se">\</span>
  <span class="nt">--compartment-id</span> <span class="nv">$COMPARTMENT_ID</span> <span class="se">\</span>
  <span class="nt">--display-name</span> <span class="s2">"fromCLI"</span> <span class="se">\</span>
  <span class="nt">--security-policies</span> <span class="s1">'['</span><span class="nv">$DENY_PUBLIC_SUBNET_POLICY_ID</span><span class="s1">']'</span>
</code></pre></div></div>
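
<p>A recipe enforces nothing until a Security Zone references it. The following is a minimal sketch of that step, assuming the <code class="language-plaintext highlighter-rouge">security-recipe list</code> and <code class="language-plaintext highlighter-rouge">security-zone create</code> subcommands accept the flags shown (confirm with <code class="language-plaintext highlighter-rouge">--help</code> on your CLI version). The function name and the zone name <code class="language-plaintext highlighter-rouge">fromCLI-zone</code> are hypothetical.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch only: looks up the recipe created above by its display name and
# creates a zone on the same compartment that uses it.
# "fromCLI-zone" is a hypothetical name.
create_security_zone() {
  recipe_id=$(oci cloud-guard security-recipe list \
    --compartment-id "$COMPARTMENT_ID" \
    --query "data.items[?\"display-name\"=='fromCLI'].id | [0]" \
    --raw-output)

  oci cloud-guard security-zone create \
    --compartment-id "$COMPARTMENT_ID" \
    --display-name "fromCLI-zone" \
    --security-zone-recipe-id "$recipe_id"
}
</code></pre></div></div>

<p>Call <code class="language-plaintext highlighter-rouge">create_security_zone</code> once <code class="language-plaintext highlighter-rouge">COMPARTMENT_ID</code> is set as shown earlier.</p>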

<p>Now, I’ll try to create a public subnet in the VCN.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Get the Virtual Cloud Network (VCN) ID</span>
<span class="nv">VCN_ID</span><span class="o">=</span><span class="si">$(</span>oci network vcn list <span class="se">\</span>
          <span class="nt">--compartment-id</span> <span class="nv">$COMPARTMENT_ID</span> <span class="se">\</span>
          <span class="nt">--query</span> <span class="s2">"data[?contains(</span><span class="se">\"</span><span class="s2">id</span><span class="se">\"</span><span class="s2">,'vcn')].id | [0]"</span> <span class="se">\</span>
          <span class="nt">--raw-output</span><span class="si">)</span>

<span class="c"># Try to create a public subnet</span>
oci network subnet create <span class="se">\</span>
  <span class="nt">--cidr-block</span> <span class="s2">"10.0.1.0/24"</span> <span class="se">\</span>
  <span class="nt">--compartment-id</span> <span class="nv">$COMPARTMENT_ID</span> <span class="se">\</span>
  <span class="nt">--vcn-id</span> <span class="nv">$VCN_ID</span>
</code></pre></div></div>

<p>After that, we will see the following error message:</p>

<p><img src="/assets/images/postsImages/OCI_0.png" alt="Error creating public subnet in OCI Security Zone" /></p>

<p>Perfect! The command returns an error, which is exactly the behavior we expect. The Security Zone prevents the creation of a public subnet, as it violates the security policies defined in the Security Recipe.</p>

<p>The same action in the web console returns this message:</p>

<p><img src="/assets/images/postsImages/OCI_1.png" alt="Error creating public subnet in OCI Security Zone using web portal" /></p>

<p>In summary, Security Zones in OCI are a powerful feature that helps enforce security best practices across your cloud environment. By defining Security Recipes, you can ensure that resources created within a Security Zone comply with your organization’s security policies.</p>

<p>Happy scripting!</p>]]></content><author><name>Victor Silva</name></author><category term="Oracle" /><category term="Security" /><category term="Oracle Cloud" /><category term="Security Zones" /><category term="Security" /><summary type="html"><![CDATA[In today's cloud-first world, security isn't just about monitoring threats—it's about preventing them from happening in the first place. Oracle Cloud Infrastructure (OCI) Security Zones provide exactly this capability: proactive, policy-driven security enforcement that prevents misconfigurations before they can become vulnerabilities.]]></summary></entry><entry><title type="html">Azure Functions development on macOS [English]</title><link href="https://blog.victorsilva.com.uy/azure-functions-macos-dev/" rel="alternate" type="text/html" title="Azure Functions development on macOS [English]" /><published>2025-07-16T23:51:48+00:00</published><updated>2025-07-16T23:51:48+00:00</updated><id>https://blog.victorsilva.com.uy/PowerShell-Azure-Functions-MacOS</id><content type="html" xml:base="https://blog.victorsilva.com.uy/azure-functions-macos-dev/"><![CDATA[<p>As cloud development continues to evolve, more developers are embracing cross-platform solutions. While Azure Functions traditionally felt more at home in Windows environments, macOS has become a first-class citizen for serverless development. Whether you’re a Mac user diving into Azure or a Windows developer switching platforms, this guide will get you up and running with Azure Functions on macOS.</p>

<p>The beauty of serverless computing lies in its platform-agnostic nature. With Azure Functions, you can write code in PowerShell, Python, C#, Java, and JavaScript, and deploy it without worrying about the underlying infrastructure. But what about the development experience on macOS? Let’s explore how to set up a productive Azure Functions development environment on your Mac.</p>

<h2 id="prerequisites-setting-up-your-mac">Prerequisites: Setting up your Mac</h2>

<p>Before we dive into Azure Functions, we need to ensure our environment is properly configured. The good news is that Microsoft has invested heavily in cross-platform tooling, making the experience quite seamless.</p>

<p>First, let’s install the essential tools:</p>

<p><strong>Azure CLI</strong>
The Azure CLI is your gateway to managing Azure resources from the command line. Install it using <a href="https://brew.sh/">Homebrew</a>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew <span class="nb">install </span>azure-cli
</code></pre></div></div>

<p><strong>PowerShell</strong>
Yes, PowerShell has run natively on macOS since 2018. Install it with Homebrew:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew <span class="nb">install</span> <span class="nt">--cask</span> powershell
</code></pre></div></div>

<p><strong>Azure Functions Core Tools</strong>
This toolkit provides the runtime and templates for creating, debugging, and deploying Azure Functions:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew tap azure/functions
brew <span class="nb">install </span>azure-functions-core-tools@4
</code></pre></div></div>

<p><strong>Visual Studio Code</strong>
While not mandatory, VS Code provides an excellent development experience for Azure Functions:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew <span class="nb">install</span> <span class="nt">--cask</span> visual-studio-code
</code></pre></div></div>

<p>After installation, add the <a href="https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-azurefunctions">Azure Functions extension for VS Code</a> to enhance your development workflow.</p>

<h2 id="creating-your-first-powershell-azure-function-on-macos">Creating your first PowerShell Azure Function on macOS</h2>

<p>Now that we have our tools ready, let’s create our first Azure Function. We’ll use PowerShell as our runtime since it’s particularly powerful for automation and Azure management tasks.</p>

<p>First, let’s authenticate with Azure:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Connect to Azure</span><span class="w">
</span><span class="n">Connect-AzAccount</span><span class="w">

</span><span class="c"># List available subscriptions</span><span class="w">
</span><span class="n">Get-AzSubscription</span><span class="w">

</span><span class="c"># Select your target subscription</span><span class="w">
</span><span class="n">Select-AzSubscription</span><span class="w"> </span><span class="nt">-SubscriptionId</span><span class="w"> </span><span class="s2">"your-subscription-id"</span><span class="w">
</span></code></pre></div></div>

<p>Create a new function app locally:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create a new directory for our function and move into it</span>
<span class="nb">mkdir </span>PoShFunction <span class="o">&amp;&amp;</span> <span class="nb">cd</span> <span class="nv">$_</span>

<span class="c"># Initialize a new function app with PowerShell runtime</span>
func init <span class="nt">--worker-runtime</span> powershell
</code></pre></div></div>

<p>This command creates the basic structure for a PowerShell-based function app, including the <code class="language-plaintext highlighter-rouge">host.json</code>, <code class="language-plaintext highlighter-rouge">local.settings.json</code>, and other configuration files.</p>

<p><img src="/assets/images/postsImages/Mac_PoSh_Function_0.png" alt="Initialize PowerShell Function App" /></p>

<p>Let’s create our first HTTP-triggered function:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>func new <span class="nt">--name</span> HttpTriggerDemo <span class="nt">--template</span> <span class="s2">"HTTP trigger"</span>
</code></pre></div></div>

<p>This generates a new folder called <code class="language-plaintext highlighter-rouge">HttpTriggerDemo</code> with the function code. Let’s examine and modify the generated PowerShell script:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># HttpTriggerDemo/run.ps1</span><span class="w">
</span><span class="kr">using</span><span class="w"> </span><span class="kr">namespace</span><span class="w"> </span><span class="n">System.Net</span><span class="w">

</span><span class="c"># Input bindings are passed in via param block.</span><span class="w">
</span><span class="kr">param</span><span class="p">(</span><span class="nv">$Request</span><span class="p">,</span><span class="w"> </span><span class="nv">$TriggerMetadata</span><span class="p">)</span><span class="w">

</span><span class="c"># Write to the Azure Functions log stream.</span><span class="w">
</span><span class="n">Write-Host</span><span class="w"> </span><span class="s2">"PowerShell HTTP trigger function processed a request on macOS."</span><span class="w">

</span><span class="c"># Interact with query parameters or the request body</span><span class="w">
</span><span class="nv">$name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">$Request</span><span class="o">.</span><span class="nf">Query</span><span class="o">.</span><span class="nf">Name</span><span class="w">
</span><span class="kr">if</span><span class="w"> </span><span class="p">(</span><span class="o">-not</span><span class="w"> </span><span class="nv">$name</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nv">$name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">$Request</span><span class="o">.</span><span class="nf">Body</span><span class="o">.</span><span class="nf">Name</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="nv">$body</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Hello, </span><span class="nv">$name</span><span class="s2">! This Azure Function was developed on macOS and is powered by PowerShell."</span><span class="w">

</span><span class="c"># Associate values to output bindings by calling 'Push-OutputBinding'.</span><span class="w">
</span><span class="n">Push-OutputBinding</span><span class="w"> </span><span class="nt">-Name</span><span class="w"> </span><span class="nx">Response</span><span class="w"> </span><span class="nt">-Value</span><span class="w"> </span><span class="p">([</span><span class="n">HttpResponseContext</span><span class="p">]@{</span><span class="w">
    </span><span class="nx">StatusCode</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="n">HttpStatusCode</span><span class="p">]</span><span class="err">::</span><span class="nx">OK</span><span class="w">
    </span><span class="nx">Body</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">$body</span><span class="w">
</span><span class="p">})</span><span class="w">
</span></code></pre></div></div>

<h2 id="testing-and-debugging-locally">Testing and debugging locally</h2>

<p>One of the great advantages of the Azure Functions Core Tools is the ability to run and test functions locally. This works seamlessly on macOS:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Start the function runtime locally</span>
func start
</code></pre></div></div>

<p>You’ll see output similar to this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Azure Functions Core Tools
Core Tools Version:       4.0.5030 Commit hash: N/A  (64-bit)
Function Runtime Version: 4.21.3.20404

Functions:
        HttpTriggerDemo: [GET,POST] http://localhost:7071/api/HttpTriggerDemo
</code></pre></div></div>

<p>Test your function using curl or your browser:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="s2">"http://localhost:7071/api/HttpTriggerDemo?name=MacOS"</span>
</code></pre></div></div>
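
<p>The function also falls back to the request body when no query parameter is supplied. This is a small sketch of that path, assuming the local runtime from <code class="language-plaintext highlighter-rouge">func start</code> is still listening on port 7071; <code class="language-plaintext highlighter-rouge">post_name</code> is a hypothetical helper name.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sends the name in a JSON body instead of the query string.
# The URL is the local endpoint printed by func start above.
post_name() {
  curl -s -X POST "http://localhost:7071/api/HttpTriggerDemo" \
    -H "Content-Type: application/json" \
    -d "{\"name\": \"$1\"}"
}
</code></pre></div></div>

<p>Running <code class="language-plaintext highlighter-rouge">post_name "MacOS"</code> exercises the <code class="language-plaintext highlighter-rouge">$Request.Body.Name</code> branch of the script.</p>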

<h2 id="advanced-powershell-scenarios">Advanced PowerShell scenarios</h2>

<p>Let’s create a more practical example: a function that manages Azure resources using PowerShell. This showcases the real power of combining PowerShell with Azure Functions on macOS.</p>

<p>Create a new timer-triggered function:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>func new <span class="nt">--name</span> ResourceMonitor <span class="nt">--template</span> <span class="s2">"Timer trigger"</span>
</code></pre></div></div>

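<p>Replace the generated <code class="language-plaintext highlighter-rouge">run.ps1</code> with the following script, which counts the resources in each resource group and logs a summary. It needs the Az module to be available to the function app, and when run locally it assumes you have already signed in with <code class="language-plaintext highlighter-rouge">Connect-AzAccount</code>:</p>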

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># ResourceMonitor/run.ps1</span><span class="w">
</span><span class="c"># Input bindings are passed in via param block.</span><span class="w">
</span><span class="kr">param</span><span class="p">(</span><span class="nv">$Timer</span><span class="p">)</span><span class="w">

</span><span class="c"># Get the current universal time in the default string format.</span><span class="w">
</span><span class="nv">$currentUTCtime</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">Get-Date</span><span class="p">)</span><span class="o">.</span><span class="nf">ToUniversalTime</span><span class="p">()</span><span class="w">

</span><span class="c"># Write an information log with the current time.</span><span class="w">
</span><span class="n">Write-Host</span><span class="w"> </span><span class="s2">"PowerShell timer trigger function started at: </span><span class="nv">$currentUTCtime</span><span class="s2">"</span><span class="w">

</span><span class="kr">try</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="c"># Connect using Managed Identity (when deployed) or local credentials</span><span class="w">
    </span><span class="kr">if</span><span class="w"> </span><span class="p">(</span><span class="nv">$</span><span class="nn">env</span><span class="p">:</span><span class="nv">MSI_ENDPOINT</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="n">Connect-AzAccount</span><span class="w"> </span><span class="nt">-Identity</span><span class="w">
    </span><span class="p">}</span><span class="w"> </span><span class="kr">else</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="c"># For local development, use stored credentials</span><span class="w">
        </span><span class="n">Write-Host</span><span class="w"> </span><span class="s2">"Using local Azure credentials for development"</span><span class="w">
    </span><span class="p">}</span><span class="w">

    </span><span class="c"># Get all resource groups</span><span class="w">
    </span><span class="nv">$resourceGroups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Get-AzResourceGroup</span><span class="w">
    
    </span><span class="nv">$report</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">@()</span><span class="w">
    
    </span><span class="kr">foreach</span><span class="w"> </span><span class="p">(</span><span class="nv">$rg</span><span class="w"> </span><span class="kr">in</span><span class="w"> </span><span class="nv">$resourceGroups</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="c"># Get resources in each resource group</span><span class="w">
        </span><span class="nv">$resources</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Get-AzResource</span><span class="w"> </span><span class="nt">-ResourceGroupName</span><span class="w"> </span><span class="nv">$rg</span><span class="o">.</span><span class="nf">ResourceGroupName</span><span class="w">
        
        </span><span class="nv">$rgInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="n">PSCustomObject</span><span class="p">]@{</span><span class="w">
            </span><span class="nx">ResourceGroupName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">$rg</span><span class="err">.</span><span class="nx">ResourceGroupName</span><span class="w">
            </span><span class="nx">Location</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">$rg</span><span class="err">.</span><span class="nx">Location</span><span class="w">
            </span><span class="nx">ResourceCount</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">$resources</span><span class="err">.</span><span class="nx">Count</span><span class="w">
            </span><span class="nx">CreatedTime</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">$rg</span><span class="err">.</span><span class="nx">Tags</span><span class="err">.</span><span class="nx">CreatedTime</span><span class="w">
            </span><span class="nx">LastChecked</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">$currentUTCtime</span><span class="w">
        </span><span class="p">}</span><span class="w">
        
        </span><span class="nv">$report</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nv">$rgInfo</span><span class="w">
    </span><span class="p">}</span><span class="w">
    
    </span><span class="c"># Log the summary</span><span class="w">
    </span><span class="n">Write-Host</span><span class="w"> </span><span class="s2">"Resource Group Summary:"</span><span class="w">
    </span><span class="nv">$report</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">ForEach-Object</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="n">Write-Host</span><span class="w"> </span><span class="s2">"  - </span><span class="si">$(</span><span class="bp">$_</span><span class="o">.</span><span class="nf">ResourceGroupName</span><span class="si">)</span><span class="s2">: </span><span class="si">$(</span><span class="bp">$_</span><span class="o">.</span><span class="nf">ResourceCount</span><span class="si">)</span><span class="s2"> resources in </span><span class="si">$(</span><span class="bp">$_</span><span class="o">.</span><span class="nf">Location</span><span class="si">)</span><span class="s2">"</span><span class="w">
    </span><span class="p">}</span><span class="w">
    
    </span><span class="c"># In a real scenario, you might want to:</span><span class="w">
    </span><span class="c"># - Send this data to Azure Monitor</span><span class="w">
    </span><span class="c"># - Store it in a database</span><span class="w">
    </span><span class="c"># - Send alerts for specific conditions</span><span class="w">
    
</span><span class="p">}</span><span class="w"> </span><span class="kr">catch</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="n">Write-Error</span><span class="w"> </span><span class="s2">"Error monitoring resources: </span><span class="si">$(</span><span class="bp">$_</span><span class="o">.</span><span class="nf">Exception</span><span class="o">.</span><span class="nf">Message</span><span class="si">)</span><span class="s2">"</span><span class="w">
    </span><span class="kr">throw</span><span class="w">
</span><span class="p">}</span><span class="w">

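</span><span class="n">Write-Host</span><span class="w"> </span><span class="s2">"PowerShell timer trigger function completed at: </span><span class="si">$((</span><span class="n">Get-Date</span><span class="p">)</span><span class="o">.</span><span class="nf">ToUniversalTime</span><span class="p">()</span><span class="si">)</span><span class="s2">"</span><span class="w">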
</span></code></pre></div></div>
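
<p>The “send alerts for specific conditions” idea from the comments could start as a simple filter over <code class="language-plaintext highlighter-rouge">$report</code>. The sketch below is self-contained: it uses invented sample data in place of the real report, and the function name <code class="language-plaintext highlighter-rouge">Get-EmptyResourceGroup</code> is hypothetical, not part of the Az module.</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: returns report entries for resource groups with no resources.
function Get-EmptyResourceGroup {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]]$Report
    )
    $Report | Where-Object { $_.ResourceCount -eq 0 }
}

# Sample data shaped like the $report entries built in run.ps1 (values are invented).
$sampleReport = @(
    [PSCustomObject]@{ ResourceGroupName = 'rg-app';     Location = 'eastus'; ResourceCount = 4 }
    [PSCustomObject]@{ ResourceGroupName = 'rg-sandbox'; Location = 'westus'; ResourceCount = 0 }
)

# Wrap in @() so the result is always an array, even with zero or one match.
$emptyGroups = @(Get-EmptyResourceGroup -Report $sampleReport)

foreach ($group in $emptyGroups) {
    # Write-Warning surfaces as a warning-level entry in the function logs.
    Write-Warning "Resource group '$($group.ResourceGroupName)' in $($group.Location) is empty"
}
</code></pre></div></div>

<p>In the real function you would pass <code class="language-plaintext highlighter-rouge">$report</code> instead of the sample data, and swap the warning for whatever notification channel you use.</p>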

<h2 id="deployment-from-macos">Deployment from macOS</h2>

<p>Deploying your Azure Function from macOS is straightforward. First, create the necessary Azure resources using PowerShell:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Variables for our deployment</span><span class="w">
</span><span class="nv">$resourceGroupName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rg-functions-macos-demo"</span><span class="w">
</span><span class="nv">$functionAppName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"func-macos-demo-</span><span class="si">$(</span><span class="n">Get-Random</span><span class="p">)</span><span class="s2">"
</span><span class="nv">$location</span><span class="s2"> = "</span><span class="n">East</span><span class="w"> </span><span class="nx">US</span><span class="s2">"
</span><span class="nv">$storageAccountName</span><span class="s2"> = "</span><span class="nx">stamacosdemfunc</span><span class="err">$</span><span class="p">(</span><span class="n">Get-Random</span><span class="w"> </span><span class="nt">-Maximum</span><span class="w"> </span><span class="mi">100000</span><span class="p">)</span><span class="s2">"

# Create resource group
New-AzResourceGroup -Name </span><span class="nv">$resourceGroupName</span><span class="s2"> -Location </span><span class="nv">$location</span><span class="s2">

# Create storage account (required for Azure Functions)
</span><span class="nv">$storageParams</span><span class="s2"> = @{
    ResourceGroupName = </span><span class="nv">$resourceGroupName</span><span class="s2">
    Name = </span><span class="nv">$storageAccountName</span><span class="s2">
    Location = </span><span class="nv">$location</span><span class="s2">
    SkuName = "</span><span class="n">Standard_LRS</span><span class="s2">"
    Kind = "</span><span class="nx">StorageV2</span><span class="s2">"
}
New-AzStorageAccount @storageParams

# Create the function app
</span><span class="nv">$functionParams</span><span class="s2"> = @{
    ResourceGroupName = </span><span class="nv">$resourceGroupName</span><span class="s2">
    Name = </span><span class="nv">$functionAppName</span><span class="s2">
    StorageAccountName = </span><span class="nv">$storageAccountName</span><span class="s2">
    Location = </span><span class="nv">$location</span><span class="s2">
    Runtime = "</span><span class="nx">PowerShell</span><span class="s2">"
    RuntimeVersion = "</span><span class="nx">7.2</span><span class="s2">"
    FunctionsVersion = "</span><span class="nx">4</span><span class="s2">"
}
New-AzFunctionApp @functionParams
</span></code></pre></div></div>
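
<p>Storage account names must be 3 to 24 characters long and contain only lowercase letters and digits, so a random suffix of unbounded length can push a generated name over the limit. Here is a minimal sketch of a helper that keeps generated names within those rules; the function name <code class="language-plaintext highlighter-rouge">New-StorageAccountName</code> and its parameters are hypothetical, not an Az cmdlet.</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: builds a storage account name that satisfies Azure's naming rules
# (3-24 characters, lowercase letters and digits only).
function New-StorageAccountName {
    param(
        [Parameter(Mandatory)]
        [string]$Prefix,
        [ValidateRange(1, 9)]
        [int]$SuffixDigits = 5
    )

    # Zero-padded random suffix, e.g. 5 digits gives 00000-99999.
    $max = [int][math]::Pow(10, $SuffixDigits)
    $suffix = (Get-Random -Maximum $max).ToString("D$SuffixDigits")

    $name = $Prefix.ToLowerInvariant() + $suffix

    # Case-sensitive match so uppercase letters are rejected as well.
    if ($name -cnotmatch '^[a-z0-9]{3,24}$') {
        throw "'$name' is not a valid storage account name (3-24 lowercase letters and digits)."
    }
    $name
}

$storageAccountName = New-StorageAccountName -Prefix 'stamacosdemfunc'
Write-Host "Generated storage account name: $storageAccountName"
</code></pre></div></div>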

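<p>Deploy your function using the Azure Functions Core Tools. Run the command from the project folder in the same PowerShell session so that <code class="language-plaintext highlighter-rouge">$functionAppName</code> still holds the generated name; in a bash or zsh shell, substitute the actual app name:</p>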

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Deploy to Azure</span>
func azure functionapp publish <span class="nv">$functionAppName</span>
</code></pre></div></div>

<p>Whether you’re automating infrastructure tasks, building APIs, or creating scheduled jobs, Azure Functions on macOS with PowerShell gives you the flexibility to work in your preferred environment while leveraging the power of Azure’s serverless platform.</p>

<p>Ready to start building? The tools are installed, the examples are tested, and Azure is waiting for your next serverless creation!</p>

<p>Happy scripting!</p>]]></content><author><name>Victor Silva</name></author><category term="Azure" /><category term="PowerShell" /><category term="Azure" /><category term="PowerShell" /><category term="Development" /><category term="macOS" /><summary type="html"><![CDATA[As cloud development continues to evolve, more developers are embracing cross-platform solutions. While Azure Functions traditionally felt more at home in Windows environments, macOS has become a first-class citizen for serverless development. Whether you're a Mac user diving into Azure or a Windows developer switching platforms, this guide will get you up and running with Azure Functions on macOS.]]></summary></entry></feed>