Real data, zero estimated spend
product · 8 min read
How CommonWealth Ops Collects and Processes Ad Intelligence (Technical Overview)
Last updated: June 2026
How does CommonWealth Ops collect ad intelligence technically?
CommonWealth Ops runs a weekly cron on Mondays at 23:00 UTC that (1) scrapes Meta Ad Library and TikTok Ad Library for a fixed niche set, (2) downloads TikTok ad videos and transcribes them with Whisper small int8 ASR, (3) normalizes the captured set into a structured database, and (4) generates per-niche intelligence posts at /blog. Current scale: ~50-100 commercial ads captured per week across the fitness and skincare niches, with India-heavy geographic coverage. Spend data is structurally walled by platform policy.
The architecture at a glance
The system has four layers, each a separate concern:
- KobiiSpy: the scraper layer. Two Playwright-based adapters: one for Meta Ad Library, one for TikTok Ad Library. Scoped per-niche and per-country.
- Whisper transcription: TikTok ad videos go through faster-whisper small int8 CPU ASR to extract `hook_text` from spoken audio. Failed transcriptions are explicitly marked '[audio_unclear]' or '[system_voice]' rather than fabricated.
- CommonWealth Ops ingester: pulls KobiiSpy's `scraped_ads` table into the `ad_performance_signals` table on the CW Ops Postgres database, with idempotent UPSERT semantics so re-running the same niche-week never produces duplicates.
- Weekly post generator: reads `ad_performance_signals`, classifies creatives by archetype, applies a trending_score formula, and emits a per-(niche, lang) MDX file under `frontend/content/blog/`.
The intelligence posts at /blog are the user-facing surface. Everything above them is plumbing.
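The idempotent UPSERT in the ingester layer can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in for the CW Ops Postgres (both accept `INSERT ... ON CONFLICT ... DO UPDATE`). The `platform` and `ad_id` columns and the choice of unique key are assumptions made for illustration; the real table's key is not documented here.

```python
import sqlite3

# Hypothetical, trimmed-down schema. The (platform, ad_id) key is an
# assumption; the production table lives in Postgres.
DDL = """
CREATE TABLE ad_performance_signals (
    platform       TEXT NOT NULL,
    ad_id          TEXT NOT NULL,
    niche          TEXT NOT NULL,
    hook_text      TEXT,
    date_last_seen TEXT,
    PRIMARY KEY (platform, ad_id)
)
"""

# Insert a new ad, or refresh the mutable columns if the key already exists.
UPSERT = """
INSERT INTO ad_performance_signals
    (platform, ad_id, niche, hook_text, date_last_seen)
VALUES (:platform, :ad_id, :niche, :hook_text, :date_last_seen)
ON CONFLICT (platform, ad_id) DO UPDATE SET
    niche = excluded.niche,
    hook_text = excluded.hook_text,
    date_last_seen = excluded.date_last_seen
"""


def ingest(conn, rows):
    """Write a batch in one transaction; re-running it adds no duplicates."""
    with conn:
        conn.executemany(UPSERT, rows)


conn = sqlite3.connect(":memory:")
conn.execute(DDL)

batch = [
    {"platform": "meta", "ad_id": "a1", "niche": "fitness",
     "hook_text": "Example hook", "date_last_seen": "2026-06-01"},
    {"platform": "tiktok", "ad_id": "t1", "niche": "fitness",
     "hook_text": "[audio_unclear]", "date_last_seen": "2026-06-01"},
]

ingest(conn, batch)
ingest(conn, batch)  # same niche-week again: row count must not change

row_count = conn.execute(
    "SELECT COUNT(*) FROM ad_performance_signals"
).fetchone()[0]
print(row_count)
```

Keying the conflict on a stable ad identifier, rather than deleting and re-inserting a niche-week, is what lets a re-run refresh `date_last_seen` without ever creating a second row for the same ad.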
What CommonWealth Ops captures
For Meta ads, the captured fields per row are:
- `advertiser` (the Page name)
- `niche` (assigned by the scraper based on which niche-query found the ad)
- `hook_text` (the first segment of the ad's primary text)
- `visual_format` (image / video / carousel / unknown)
- `cta_type` (Shop Now / Learn More / Sign Up / etc.)
- `country`
- `date_first_seen` and `date_last_seen` (mirroring the Ad Library's First/Last shown)
- `scraped_at` (when CommonWealth Ops captured the row)
For TikTok ads, the captured fields add `views_band` (the categorical reach band: 0-1K, 1K-5K, 5K-10K, 10K-100K, 100K-1M, 1M-10M, 10M+). The `hook_text` for TikTok ads comes from Whisper transcription of the ad video, NOT from the public DOM (TikTok's library exposes minimal text per ad).
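The captured fields can be pictured as a single record type. The sketch below is illustrative only: the class name `CapturedAd`, the helper `hook_from_transcript`, the string-typed dates, and the rule "take the first non-empty transcript segment as the hook" are all assumptions, not the production schema. The field names and the reach bands come from the lists above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# The categorical reach bands listed for TikTok ads.
VIEWS_BANDS = ("0-1K", "1K-5K", "5K-10K", "10K-100K", "100K-1M", "1M-10M", "10M+")
AUDIO_UNCLEAR = "[audio_unclear]"


@dataclass(frozen=True)
class CapturedAd:
    """One captured row (hypothetical shape, dates kept as ISO strings)."""
    advertiser: str
    niche: str
    hook_text: str
    visual_format: str
    cta_type: str
    country: str
    date_first_seen: str
    date_last_seen: str
    scraped_at: str
    views_band: Optional[str] = None  # TikTok only; stays None for Meta rows

    def __post_init__(self):
        # The band is categorical, so reject anything outside the known set.
        if self.views_band is not None and self.views_band not in VIEWS_BANDS:
            raise ValueError(f"unknown views_band: {self.views_band!r}")


def hook_from_transcript(segments: Sequence[str]) -> str:
    """Return the first non-empty transcript segment, or an explicit marker.

    Assumption: an empty transcript is treated as unclear audio. The hook is
    never invented when nothing usable was transcribed.
    """
    for segment in segments:
        cleaned = segment.strip()
        if cleaned:
            return cleaned
    return AUDIO_UNCLEAR


print(hook_from_transcript(["", "  "]))
```

Validating `views_band` at construction time keeps a mis-scraped band from silently entering the dataset as if it were a real category.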
What CommonWealth Ops does NOT capture
Structural walls bound the dataset:
- Spend for commercial ads on either platform. Meta exposes spend only for political ads under EU DSA + US election rules. TikTok exposes no spend at all in the public Ad Library. CommonWealth Ops documents this in dedicated vault wall docs rather than filling in estimates.
- Audience targeting. Neither platform exposes this publicly. CommonWealth Ops does not infer it.
- Performance metrics (CTR, conversion rate, ROAS). Both platforms keep these private; CommonWealth Ops does not estimate them.
When CommonWealth Ops publishes a B-type (data-led) post, every cited value comes from the captured set. When the data has a structural wall, the post acknowledges it explicitly rather than substituting a fabricated number. This is codified in our internal Apex 45 rule.
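The "acknowledge the wall, never substitute a number" rule can be sketched as a small guard around every cited value. This is a hypothetical illustration, not the post generator's code: the function name `cite`, the metric keys, and the wording of the note are assumptions, and the sketch covers commercial ads only.

```python
# Metrics the text above describes as walled for commercial ads.
WALLED_METRICS = {"spend", "audience_targeting", "ctr", "conversion_rate", "roas"}

WALL_NOTE = "not exposed by the public Ad Library"


def cite(metric: str, captured: dict) -> str:
    """Return a citable string for `metric`, never an estimate."""
    if metric in WALLED_METRICS:
        # Acknowledge the wall explicitly instead of substituting a number.
        return f"{metric}: {WALL_NOTE}"
    if metric not in captured:
        # No fallback guess: a value missing from the captured set is an error.
        raise KeyError(f"{metric!r} is not in the captured set")
    return f"{metric}: {captured[metric]}"


row = {"advertiser": "Example Page", "cta_type": "Shop Now", "country": "IN"}
print(cite("cta_type", row))
print(cite("spend", row))
```

Raising on a missing value, rather than returning a default, means a gap in the captured set surfaces as a failure at generation time instead of as a plausible-looking figure in a published post.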
The weekly cadence in concrete steps
Monday 23:00 UTC:
- Whisper ASR enrichment runs first — re-processes any rows with NULL hook_text where the source video is still accessible.
- The CommonWealth Ops ingester pulls fresh `scraped_ads` rows from KobiiSpy into `ad_performance_signals`.
- The intelligence post generator runs for each (niche, lang) pair: fitness/en, fitness/es, skincare/en, skincare/es.
- Generated MDX files commit to the repo's main branch via a pathspec-scoped commit (frontend/content/blog/* only).
- The deploy lane's post-receive hook rebuilds the Next.js production container.
- The Google Indexing API notify step runs for any new URLs. Our internal UT-19 doctrine acknowledges that Google only acts on Indexing API notifications for JobPosting and BroadcastEvent pages: blog Articles receive a 200 OK, but Google ignores the crawl signal, so we track the attempt without claiming successful indexing.
The full cycle takes 5-15 minutes depending on Whisper transcription queue depth. The intelligence post is live on /blog by Tuesday 00:00 UTC.
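The schedule and the per-(niche, lang) fan-out described above can be sketched in a few lines. The function names `next_weekly_run` and `post_targets` are hypothetical; the production job is a cron entry, and this sketch only shows how the Monday 23:00 UTC slot and the four generator targets follow from the text.

```python
from datetime import datetime, timedelta, timezone
from itertools import product

NICHES = ("fitness", "skincare")
LANGS = ("en", "es")


def next_weekly_run(now: datetime) -> datetime:
    """Return the next Monday 23:00 UTC strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    days_ahead = (0 - now.weekday()) % 7  # Monday is weekday 0
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=23, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # Already at or past this week's slot, so move to next Monday.
        candidate += timedelta(days=7)
    return candidate


def post_targets():
    """One generator run per (niche, lang) pair, in a stable order."""
    return list(product(NICHES, LANGS))


wednesday = datetime(2026, 6, 3, 12, 0, tzinfo=timezone.utc)
print(next_weekly_run(wednesday).isoformat())
print(post_targets())
```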
Current scale
For the May-June 2026 window:
- Fitness niche: 47 commercial ads captured on Meta in 30 days, 10 on TikTok. India-heavy coverage.
- Skincare niche: 15+ commercial ads captured on Meta in 30 days, very thin TikTok coverage.
- Total `scraped_ads` rows in the kobiispy database: 108 across both niches and both platforms.
- Geographic coverage: India dominant, Brazil secondary, with thinner representation from Spain, Thailand, Indonesia, Mexico.
The scale is modest by design. CommonWealth Ops captures the LIVE niche state rather than a multi-year archive. For multi-year search, paid scrapers (AdSpy, Minea, Foreplay) cover that surface; CommonWealth Ops covers the weekly-delta surface.
How CommonWealth Ops is different from raw-data tools
Compared to paid scrapers, CommonWealth Ops is a processed-signal product, not a raw-data product. The difference is what the operator does at receipt:
- Paid scraper: the operator searches, filters, tags, and summarizes. The tool supplies raw ads; the operator extracts the pattern.
- CommonWealth Ops: the operator reads the report. Pattern extraction happens server-side in the post generator.
The right tool depends on whether the operator wants to spend time on pattern extraction (paid scraper) or on acting on patterns (CommonWealth Ops). Most operators we talk to prefer the latter at the EUR 49/month price point.
Where to learn more
The pricing page covers the EUR 49/month subscription plus 20% of net profit: a single rate with no threshold, and a EUR 0 share in any month you do not profit. The what-is-CommonWealth-Ops-intelligence guide gives the broader framing of competitive intelligence as a discipline. The TikTok-vs-Meta comparison covers platform-specific asymmetries in what each library exposes.
Frequently asked questions
- What data exactly does CommonWealth Ops collect?
- For Meta ads: advertiser Page name, ad creative (image/video/carousel reference), headline + primary text + CTA label, first-shown and last-shown dates, platforms (FB/IG/Messenger/Audience Network), and country. For TikTok ads: advertiser name (when available), ad video URL, first-shown and last-shown dates, the categorical reach band (0-1K through 10M+), and country. CommonWealth Ops does NOT collect: spend (structurally walled by both platforms for commercial ads), audience targeting, performance metrics, or any private data. Everything captured is from the public Ad Library surfaces.
- How frequently is the data updated?
- The full scrape runs Monday 23:00 UTC weekly. TikTok video transcription via Whisper runs the same Monday before scraping completes. The intelligence posts at /blog publish the same Monday. Daily ad-library scraping also runs (separate process at 20:20 UTC daily) to catch new entrants between weekly publishes. The /blog content reflects the weekly snapshot; the underlying database reflects daily state.
- How does CommonWealth Ops ensure accuracy?
- Three layers. (1) The scraper reads only public Ad Library surfaces: no private data, no inference. What we publish is what the platforms themselves expose. (2) TikTok video transcription uses Whisper small int8 with an explicit '[audio_unclear]' marker for unreadable audio rather than fabricated text. (3) Where data has structural walls (Meta commercial spend, TikTok numeric spend), CommonWealth Ops documents the wall in a dedicated vault doc rather than filling in estimates. Honest scope acknowledgment is built into the methodology.
- How is CommonWealth Ops different from AdSpy, Minea, or Foreplay?
- Those tools cache ad-library data and provide search/filter interfaces over raw ads. CommonWealth Ops produces a per-niche intelligence REPORT each week — pattern observations, longevity tracking, new-entrant deltas — instead of just raw ad search. The tools answer 'show me ads matching X.' CommonWealth Ops answers 'what's the actionable pattern in your niche this week.' Different products for different operators; the underlying data substrate (public Ad Libraries) is the same.