competitive-intelligence · 9 min read

Inside CommonWealth Ops: How Our Weekly Intelligence Reports Are Built

Last updated: June 2026

Why publish a how-it-works post

Most ad-intelligence tools describe their pipeline in vague marketing copy — proprietary algorithms, AI-powered classification, real-time updates. Operators reading this language get no useful information about what the tool actually does.

This post is the technically honest version for CommonWealth Ops. The exact steps, in order, with the real failure modes. The two gaps the pipeline does not solve — we name them. The Apex rules that govern non-blocking failures — we cite them. An operator reading this should be able to explain CommonWealth Ops's pipeline to their team in three sentences and decide whether the structure fits their need.

The single cron line

The entire weekly pipeline starts from one crontab entry on a single VPS:

```
0 23 * * 1 bash /home/kobicraft/projects/CommonWealth-Ops/working/deploy/cron-weekly-intelligence.sh
```

Mondays at 23:00 UTC. The script that follows runs five steps in sequence, fails open on individual step errors, and writes a lastrunstatus.json artifact at the end so observability tools can read whether the run succeeded. No human is in the loop — Apex 34 mandates that weekly intelligence runs without human intervention.
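The fail-open behaviour can be illustrated with a minimal sketch. The real deploy script is bash; this Python version is only an illustration, and the step names and the `run_fail_open` helper are hypothetical.

```python
from typing import Callable


def run_fail_open(steps: list[tuple[str, Callable[[], None]]]) -> dict[str, str]:
    """Run each step in order; record a failure instead of aborting the run."""
    results: dict[str, str] = {}
    for name, step in steps:
        try:
            step()
            results[name] = "ok"
        except Exception as exc:  # non-blocking: note the failure and move on
            results[name] = f"skipped: {exc}"
    return results


def broken_ingest() -> None:
    # Hypothetical stand-in for a step that exits non-zero.
    raise RuntimeError("ingest exited non-zero")


outcome = run_fail_open([
    ("transcribe", lambda: None),
    ("ingest", broken_ingest),
    ("generate", lambda: None),
])
print(outcome)
```

The point of the pattern is that a failure in one step is recorded and later steps still run.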

Step 0 — Whisper ASR transcription of new TikTok video ads

Before the ingester reads scraped_ads, the pipeline runs a Whisper transcription step. The scraped_ads table holds TikTok video ad rows where the public library exposed only the video URL and no on-screen caption; the spoken hook is invisible to the rest of the pipeline unless we listen to it.

The step bounds itself to 25 ads per run. faster-whisper (a community-optimised reimplementation of Whisper, MIT licence) runs inside the docker-worker container. For each video the worker re-probes the live TikTok Ad Library detail page, downloads the video file, runs ASR, and writes the resulting transcript into scraped_ads.hook_text.

Failure here is non-blocking (Apex 37). If the worker container is not running, the step skips. If the kobiispy module is not importable, the step skips. If Whisper OOMs on a particular video, that one video is skipped and the rest continue. The pipeline never crashes on an ASR failure because the downstream trending_score can still operate on the on-screen and caption hooks alone.
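A minimal sketch of the bounded, skip-on-failure loop follows. It assumes the bound counts attempted videos rather than successful transcripts, and it takes the transcription function as a parameter so no real ASR call is made; `transcribe_batch`, `fake_transcribe`, and the example URLs are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable

MAX_ADS_PER_RUN = 25  # the per-run bound stated in this post


def transcribe_batch(
    video_urls: Iterable[str],
    transcribe: Callable[[str], str],
    limit: int = MAX_ADS_PER_RUN,
) -> dict[str, str]:
    """Attempt at most `limit` videos; a failure skips only that one video."""
    transcripts: dict[str, str] = {}
    for url in islice(video_urls, limit):
        try:
            transcripts[url] = transcribe(url)
        except (MemoryError, RuntimeError):
            continue  # skip this video, keep going with the rest
    return transcripts


def fake_transcribe(url: str) -> str:
    # Hypothetical stand-in for the ASR call; simulates an OOM on one video.
    if url.endswith("/3"):
        raise MemoryError("simulated OOM")
    return f"spoken hook for {url}"


urls = [f"https://example.invalid/ad/{i}" for i in range(30)]
result = transcribe_batch(urls, fake_transcribe)
print(len(result))
```

With 30 candidate videos, only the first 25 are attempted, and the one simulated failure is dropped without affecting the others.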

What this means in practice: the TikTok hook archive includes ads whose hook only existed as audio narration. Without Whisper those would be classified as no-hook ads — false-negative archive entries. With Whisper they enter the search and pattern surface like any other hook.

Step 1 — Ingest into adperformancesignals

The Meta Ad Library and TikTok Ad Library are scraped separately by the KobiiSpy scrape system (out of scope for this post). Step 1 of the weekly pipeline takes the freshest output and writes structured rows to the adperformancesignals PostgreSQL table inside the cw-postgres container.

Each row captures: advertiser identity, ad creative identifier, platform (meta or tiktok), niche label, captured-at timestamp, on-screen hook text, classified hook archetype (identity / result / problem / social proof), classified trigger combination, run duration in days, persistence flag, and (for TikTok) views_band.
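As a rough illustration, the row shape described above could be modelled like this. The field names, types, and example values are assumptions made for the sketch; only the list of captured attributes, the platform values, the archetype labels, and `views_band` come from this post.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional

PLATFORMS = {"meta", "tiktok"}
ARCHETYPES = {"identity", "result", "problem", "social proof"}


@dataclass(frozen=True)
class AdSignalRow:
    advertiser: str
    creative_id: str
    platform: str              # "meta" or "tiktok"
    niche: str
    captured_at: datetime
    hook_text: str
    hook_archetype: str        # one of ARCHETYPES
    trigger_combination: str
    run_duration_days: int
    is_persistent: bool
    views_band: Optional[str] = None  # populated for TikTok rows

    def __post_init__(self) -> None:
        if self.platform not in PLATFORMS:
            raise ValueError(f"unknown platform: {self.platform}")
        if self.hook_archetype not in ARCHETYPES:
            raise ValueError(f"unknown archetype: {self.hook_archetype}")


# All values below are made up for illustration.
row = AdSignalRow(
    advertiser="Example Brand",
    creative_id="creative-001",
    platform="tiktok",
    niche="fitness",
    captured_at=datetime(2026, 6, 1, 23, 0, tzinfo=timezone.utc),
    hook_text="example spoken hook",
    hook_archetype="result",
    trigger_combination="example-trigger-a + example-trigger-b",
    run_duration_days=21,
    is_persistent=True,
    views_band="example-band",
)

try:
    replace(row, platform="snapchat")  # not a covered platform
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```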

The ingester is invoked per niche so the filter is clean. Failure of the ingester is non-blocking — if the ingest exits non-zero, the pipeline logs the warning and continues to generation. Existing rows in adperformancesignals from previous weeks remain usable; the generator does not require this week's ingest to have succeeded to produce a meaningful post (this matters when a single bad week of scrape data would otherwise block the whole pipeline).

Step 2 — Generate per-niche, per-language posts

For each (niche, language) pair the pipeline calls scripts.generateintelligencepost inside the cw-api container with three arguments: niche, language, ISO week. The generator queries adperformancesignals for the current ISO week and produces a Markdown-with-frontmatter (MDX) post.
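The ISO-week argument deserves one note: ISO weeks belong to an ISO year that can differ from the calendar year around New Year. A small sketch, assuming a `YYYY-Www` label format (the exact format the generator expects is not stated here):

```python
from datetime import date


def iso_week_label(day: date) -> str:
    """Return an ISO week label such as 2026-W23 for the given date."""
    iso_year, iso_week, _weekday = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


print(iso_week_label(date(2026, 6, 1)))   # a Monday in June 2026
print(iso_week_label(date(2027, 1, 1)))   # still counted in ISO year 2026
```

Using the ISO year from `isocalendar()` rather than `day.year` avoids mislabelling the days at the turn of the year.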

Two design choices govern the generator. First — and this is load-bearing — the generator hard-fails with exit code 1 if there are zero ad-signal rows for the (niche, week) tuple. No empty intelligence post is ever generated, no synthetic data is ever inserted to fill a gap. Apex 9 (honest signal) and Apex 33 (hard fail on zero signal) are cited inline in the deploy script. Per-pair failure is swallowed at the bash layer — the loop continues to the next pair — but the individual file never lands if there is no real data behind it.
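The zero-signal rule and the per-pair loop can be sketched as follows. This is an illustration under assumptions: the real loop lives in the bash deploy script, and `generate_or_fail`, `run_all_pairs`, and the sample data are hypothetical.

```python
import sys


def generate_or_fail(rows: list[dict], niche: str, language: str, week: str) -> int:
    """Return a process-style exit code: 1 when there is no signal, else 0."""
    if not rows:
        print(f"no ad-signal rows for ({niche}, {week}); not generating", file=sys.stderr)
        return 1
    # Rendering is elided; a real generator would write the post here.
    return 0


def run_all_pairs(
    rows_by_niche: dict[str, list[dict]],
    pairs: list[tuple[str, str]],
    week: str,
) -> list[tuple[str, str]]:
    """Try every (niche, language) pair; a failing pair does not stop the loop."""
    generated = []
    for niche, language in pairs:
        code = generate_or_fail(rows_by_niche.get(niche, []), niche, language, week)
        if code == 0:
            generated.append((niche, language))
    return generated


pairs = [("fitness", "en"), ("fitness", "es"), ("skincare", "en"), ("skincare", "es")]
rows_by_niche = {"fitness": [{"advertiser": "Example Brand"}]}  # skincare has no rows
generated = run_all_pairs(rows_by_niche, pairs, "2026-W23")
print(generated)
```

In this run the two skincare pairs fail with code 1 and produce nothing, while both fitness pairs are generated.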

Second — the generator produces deterministic output for a given input. Re-running it on the same adperformancesignals state produces a byte-identical MDX file. This is what makes the next step's idempotency real.
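One way to get byte-identical output is to sort rows on a stable key and keep anything time-dependent or random out of the rendered text. The sketch below assumes that approach; the post does not say how the real generator achieves determinism, and `render_post` and the sample rows are hypothetical.

```python
import hashlib


def render_post(rows: list[dict], week: str) -> bytes:
    """Render rows to bytes; output depends only on the set of rows and the week."""
    ordered = sorted(rows, key=lambda r: (r["advertiser"], r["creative_id"]))
    lines = [f"# Intelligence report {week}", ""]
    lines += [f"- {r['advertiser']} ({r['creative_id']}): {r['hook_text']}" for r in ordered]
    return ("\n".join(lines) + "\n").encode("utf-8")


rows = [
    {"advertiser": "Brand B", "creative_id": "c-2", "hook_text": "example hook two"},
    {"advertiser": "Brand A", "creative_id": "c-1", "hook_text": "example hook one"},
]
first = render_post(rows, "2026-W23")
second = render_post(list(reversed(rows)), "2026-W23")  # same rows, different order
print(first == second)
print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
```

Because the query order no longer affects the bytes, a re-run over unchanged data leaves nothing for git to commit.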

The generated MDX includes: ISO-week title, lead summary, top patterns table (hook archetypes + trigger combinations + persistence flags), top advertisers, sample ad creatives, and a methodology citation. Every claim in the post is traceable back to a row in adperformancesignals.

Step 3 + 4 — Git clone, copy, commit, push

The generated MDX files exist inside the docker container. To reach production they must traverse to the host filesystem, then into the production bare repo. The pipeline does this via:

  1. `docker cp` each MDX file out of the cw-api container into a temporary host directory.
  2. `git clone` the production bare repo at `/home/kobicraft/projects/CommonWealth-Ops/repo.git` into a different temporary directory.
  3. Copy each MDX into its target path inside the cloned working tree (frontend/content/blog/{lang}/...).
  4. Pathspec-scoped `git add` — only the listed MDX files reach the index. Documented project-wide doctrine after a pane-collision incident in May; this script was an early adopter.
  5. `git diff --cached --quiet` — if no actual content change, skip commit. Re-running the cron on the same week is idempotent.
  6. `git commit` with a bot identity (CW Ops Intelligence Bot, intelligence@cw.infinityops.ai) and a structured commit message.
  7. `git push` to the bare repo. The post-receive hook on the bare repo deploys the Next.js site.
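Steps 4 to 7 of the list above can be sketched in Python. The real script is bash, so this is an illustration only: the `commit_if_changed` helper, the `origin HEAD` push target, the example file name, and the commit message are assumptions. The command runner is injected so the demo records commands instead of touching a repository.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]  # runs a command, returns its exit code


def subprocess_runner(cwd: str) -> Runner:
    """Build a runner that executes commands inside the cloned working tree."""
    def run(cmd: Sequence[str]) -> int:
        return subprocess.run(list(cmd), cwd=cwd).returncode
    return run


def commit_if_changed(run: Runner, paths: list[str], message: str) -> bool:
    """Stage only `paths`, then commit and push if the index actually changed."""
    # Pathspec-scoped add: only the listed files reach the index.
    if run(["git", "add", "--", *paths]) != 0:
        raise RuntimeError("git add failed")
    # Exit code 0 means nothing is staged, so the re-run is a no-op.
    if run(["git", "diff", "--cached", "--quiet"]) == 0:
        return False
    commit = [
        "git",
        "-c", "user.name=CW Ops Intelligence Bot",
        "-c", "user.email=intelligence@cw.infinityops.ai",
        "commit", "-m", message,
    ]
    if run(commit) != 0:
        raise RuntimeError("git commit failed")
    if run(["git", "push", "origin", "HEAD"]) != 0:
        raise RuntimeError("git push failed")
    return True


calls: list[list[str]] = []


def fake_runner(cmd: Sequence[str]) -> int:
    # Records each command; pretends the staged content differs from HEAD.
    calls.append(list(cmd))
    return 1 if list(cmd[:3]) == ["git", "diff", "--cached"] else 0


committed = commit_if_changed(
    fake_runner,
    ["frontend/content/blog/en/example-post.mdx"],  # hypothetical file name
    "example commit message",
)
print(committed, len(calls))
```

The idempotency check hinges on `git diff --cached --quiet`, which exits 0 when the index matches HEAD and 1 when there are staged changes.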

Within minutes of step 7, the new post URLs return 200 on the production site. The sitemap regenerates automatically (it reads the MDX directory at build time). The EN and ES hreflang alternates pair correctly because both posts use the same ISO-week slug pattern.

Step 5 — Notify Google's Indexing API

After the push completes, the pipeline calls `scripts/notifynewposts.py` with both the EN and ES blog sitemaps. The script discovers any URL in the sitemaps that has not been notified before (tracking lives in `vault/seo/notified-posts.csv`) and submits a URL Notification to Google's Indexing API.
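The discovery part of this step can be sketched with the standard library. This is an illustration, not the contents of the real script: it assumes the tracking CSV has a `url` column, uses made-up example.com URLs, and stops before any call to the Indexing API.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value from a sitemap document, in document order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def pending_urls(sitemaps: list[str], notified_csv: str) -> list[str]:
    """URLs present in any sitemap that are not yet in the tracking CSV."""
    seen = {row["url"] for row in csv.DictReader(io.StringIO(notified_csv))}
    pending = []
    for xml_text in sitemaps:
        for url in sitemap_urls(xml_text):
            if url not in seen:
                seen.add(url)  # also de-duplicates across sitemaps
                pending.append(url)
    return pending


def make_sitemap(urls: list[str]) -> str:
    body = "".join(f"<url><loc>{u}</loc></url>" for u in urls)
    return f'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">{body}</urlset>'


en = make_sitemap(["https://example.com/blog/en/week-22", "https://example.com/blog/en/week-23"])
es = make_sitemap(["https://example.com/blog/es/week-23"])
tracking = "url\nhttps://example.com/blog/en/week-22\n"

to_notify = pending_urls([en, es], tracking)
print(to_notify)
```

After a successful submission, appending each URL to the CSV is what keeps the next run from notifying it again.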

Honest scope here: Google's Indexing API documentation says the API only acts on pages with JobPosting markup or with BroadcastEvent markup embedded in a VideoObject. Article and FAQPage URLs receive 200 OK responses, but Google's documented policy is that the crawl signal is ignored for any other schema. We wired step 5 anyway for two reasons. First, if Google ever expands the API's actionable surface, we are already plumbed in. Second, the tracking CSV captures URL coverage for downstream manual GSC Request-Indexing actions; operators on our side use this CSV.

If the OAuth token at `~/.claude-google-accounts/<email>.json` is missing, the step exits cleanly with code 3 and the pipeline continues. Apex 37 non-blocking again.

The end-of-run status emitter

The script's EXIT trap writes `/home/kobicraft/kobiispy/lastrunstatus.json` atomically (mv-on-same-fs) before the temp directory cleanup runs. The JSON includes: timestamp, ISO week, success-or-failure status, run duration in seconds, posts generated count, next scheduled run timestamp, and the realpath of the script itself.
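The write-then-rename pattern and the next-run timestamp can be sketched in Python. The real emitter is a bash EXIT trap; the JSON key names, the helper names, and the sample values below are assumptions made for the illustration.

```python
import json
import os
import tempfile
from datetime import datetime, timedelta, timezone


def next_monday_2300_utc(now: datetime) -> datetime:
    """Next occurrence of Monday 23:00 UTC strictly after `now`."""
    candidate = now.replace(hour=23, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(0 - now.weekday()) % 7)  # Monday is weekday 0
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


def write_status_atomically(path: str, status: dict) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(status, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


now = datetime(2026, 6, 1, 23, 5, tzinfo=timezone.utc)
status = {
    "timestamp": now.isoformat(),
    "iso_week": "2026-W23",
    "status": "success",
    "duration_seconds": 300,
    "posts_generated": 4,
    "next_run": next_monday_2300_utc(now).isoformat(),
}

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "status.json")
    write_status_atomically(target, status)
    with open(target, encoding="utf-8") as handle:
        loaded = json.load(handle)
    leftovers = os.listdir(tmp)
print(loaded["next_run"], leftovers)
```

Creating the temp file in the target's own directory is what keeps the rename on one filesystem, so a reader sees either the old file or the complete new one.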

Observability tools downstream — including the audit posts authored in subsequent sprints — read this file to verify the pipeline is healthy. A missing or stale lastrunstatus.json is a structural alarm; it means the cron failed to fire (a crontab regression) or the script crashed before the EXIT trap could run (a deeper failure).
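A health check along these lines might look like the sketch below. The key names (`timestamp`, `status`), the `check_status` helper, and the staleness threshold of one week plus two hours are assumptions, not the behaviour of any specific tool.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(days=7, hours=2)  # hypothetical: one weekly cycle plus slack


def check_status(raw: Optional[str], now: datetime) -> str:
    """Classify the status file content; `raw` is None when the file is missing."""
    if raw is None:
        return "alarm: status file missing"
    status = json.loads(raw)
    finished = datetime.fromisoformat(status["timestamp"])
    if now - finished > MAX_AGE:
        return "alarm: status file stale"
    if status["status"] != "success":
        return "warning: last run reported failure"
    return "healthy"


raw = json.dumps({"timestamp": "2026-06-01T23:04:10+00:00", "status": "success"})
fresh = check_status(raw, datetime(2026, 6, 3, 12, 0, tzinfo=timezone.utc))
stale = check_status(raw, datetime(2026, 6, 10, 2, 0, tzinfo=timezone.utc))
missing = check_status(None, datetime(2026, 6, 3, 12, 0, tzinfo=timezone.utc))
print(fresh, stale, missing, sep=" | ")
```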

What the pipeline captures well

Three things the pipeline does that are hard to replicate.

Whisper-transcribed TikTok hooks. The spoken-only hook of a TikTok video ad is invisible to anyone reading the public TikTok Ad Library by hand. The pipeline turns audio into searchable text, weekly, at scale. Niches dominated by spoken-narration video (fitness coaching, skincare tutorial) are where this lever pays the most.

Emerging-market ad capture. The scrape surface includes India (Flipkart, BigMuscles, HK Vitals, Plix franchise, Mamaearth franchise, BEARDO for Men, Lotus Botanicals, Clinikally, Purplle, Pilgrim), LATAM (Brazilian Portuguese fitness brands, Spanish-language skincare brands like Angê and Tori Repa), and other geographies that Western-focused tools structurally under-index.

Honest empty-result handling. A zero-signal niche-week tuple does not produce an empty post. The generator exits with code 1 for that pair and the rest of the pipeline continues. No synthetic data, no AI-hallucinated content. Operators reading a CommonWealth Ops intelligence post can trust that every claim has at least one underlying ad-signal row.

What the pipeline does NOT solve

Two structural gaps an honest post must name.

Gap 1 — Meta and TikTok advertiser spend. As covered in the FAQ, neither platform discloses commercial advertiser spend. The pipeline uses observable proxies (run duration, advertiser concentration, ad-count) for ranking but does not claim spend numbers. Operators wanting competitor spend need to accept the platform-policy wall or pay for third-party panel data with wide error bands.

Gap 2 — Cross-platform attribution. The pipeline reads what advertisers ran on Meta and what they ran on TikTok independently. It does not stitch a single advertiser's cross-platform creative strategy into one view. An advertiser running synchronised Meta-and-TikTok campaigns appears as two separate signals; the pattern of synchronisation is left to the reader.

How operators use this in practice

Three real workflows.

The Monday-morning brief. Subscribers read the per-niche intelligence post Monday morning local time. The lead pattern, top advertisers, and top trigger combinations inform that week's creative roadmap. Time-to-decision after reading: typically under thirty minutes.

The competitor watchlist. Operators add specific advertisers to their internal watchlist. When a watched advertiser appears in the persistent-ads section of the weekly post, that is a signal the advertiser scaled a new creative. The operator's response is a creative-test pull or a strategy review.

The pattern-archive search. Operators searching backward for hooks in their niche use the post archive as a structured search surface. The Whisper-transcribed TikTok hooks are particularly load-bearing here — they would not exist as text anywhere else.

What changes next quarter

Two pipeline upgrades are on the roadmap. First, expanding niche coverage beyond fitness and skincare requires consistent weekly ad-signal volume in the candidate niches; supplements is the leading candidate based on existing scrape volume. Second, the Whisper step is currently bounded to 25 ads per run; the bound exists because the ASR step adds wall-clock time and we want the cron to finish inside its window. Raising the bound is a single config change, but it requires an acceptable per-run wall-clock budget.

For the full glossary of terms used inside the pipeline, see our 50-term ad intelligence glossary. For the honest comparison against other tools in this category, see our AdSpy comparison.


Frequently asked questions

How often is CommonWealth Ops intelligence updated?
Once per week, every Monday at 23:00 UTC. The cron triggers the pipeline; the pipeline ingests, classifies, generates, commits, pushes, and notifies Google's Indexing API. The post URL is live within minutes of the cron's commit. If a niche has zero ad-signal rows for a week, that niche-language pair is skipped cleanly (no empty post is ever published). This is documented behaviour, not a defect.

What platforms does the pipeline cover?
Meta Ad Library (Facebook, Instagram, Messenger, Audience Network) and TikTok Ad Library (surfaced via TikTok Creative Center). Snapchat, Pinterest, YouTube, and LinkedIn ad surfaces are NOT covered. For LinkedIn-heavy or B2B operators, we discuss the gap in our B2B competitive intelligence post.

How does Whisper ASR fit in?
Many TikTok video ads have audio narration but no on-screen caption, and the public TikTok Ad Library does not surface the spoken hook. The pipeline downloads up to 25 such ads per week, runs faster-whisper (community-optimised Whisper) inside the docker-worker container, and writes the transcript into the scraped_ads.hook_text column. The transcribed hooks then enter the trending_score calculation alongside on-screen hooks. Apex 37 governs the step: if Whisper OOMs or the model is missing, the pipeline logs and continues; ranking falls back to non-Whisper signals.

Does CommonWealth Ops surface competitor ad spend?
No. Meta discloses spend in coarse bands ONLY for political ads; commercial advertiser spend is not exposed. TikTok does not disclose spend at all. Any tool claiming to surface competitor commercial spend is inferring from third-party panel data with wide error bands. We document the honest scope of what we capture in our scope-honesty post.

What niches does the pipeline currently cover?
Currently fitness and skincare are the actively-scraped niches with weekly cron output. Supplements, beauty, food and beverage, and home goods are read on demand for editorial intelligence posts but not via the weekly cron. Adding a niche to the weekly cron requires expanding the NICHES array in the deploy script. That is straightforward operationally, but it requires consistent ad-signal volume in the target niche to be worthwhile.

Is the pipeline open source?
The pipeline itself is not open source. The Whisper model we use (faster-whisper, MIT licence) is open source. The intelligence content shipped from the pipeline is published openly under cw.infinityops.ai/blog. Operators who want the structured signal use the dashboard; operators who want the editorial pattern read the public posts.

Written by CommonWealth Ops Intelligence · Editorial, 2026-06-01
