Using Claude to Analyze Google Ads Performance for Small Businesses

The B&B owners I work with aren’t marketers. They’re people who built guest rooms, learned to cook breakfast for strangers, and somewhere along the way ended up managing Google Ads campaigns because that’s what you do now if you want bookings. They run ads. They don’t really know which keywords are working. They keep paying because stopping feels risky.

That’s the problem this system is meant to solve.

I manage Google Ads for two small family-run B&Bs in Taiwan — the kind of places with no IT department, no marketing team, and no one who knows what a Quality Score is. For a while, my weekly routine was: pull keyword reports, stare at numbers, make gut-call adjustments. It worked okay, but “okay” meant I was the bottleneck. If I got busy, the keywords went unreviewed. If I missed a bad keyword eating through budget, nobody caught it until the monthly bill arrived.

So I built bnb-ads-manager. The core idea: automate the data gathering, feed it to Claude, and let Claude do the reasoning while a human still makes the final call.

The Weekly Flow

Every week the system does this:

1. Pull keyword and campaign performance from the Google Ads API
2. Read inventory context from Google Sheets (room types, pricing, availability notes)
3. Send all of it to Claude with business context about each property
4. Claude returns structured JSON with recommendations
5. A Gmail notification goes out with a review link
6. Human approves (or edits)
7. Execute the approved changes via adGroupCriteria:mutate
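
To make the last step concrete, here is a minimal sketch of turning approved removals into a request body for adGroupCriteria:mutate. It is an illustration, not the actual bnb-ads-manager code: the helper names and the customer ID are hypothetical, and the resource-name format and the pause-via-update shape reflect my understanding of the Google Ads REST API, so verify them against the API version you target. It only builds the body; sending it is left out.

```python
from typing import Any


def criterion_resource_name(customer_id: str, ad_group_id: str, criterion_id: str) -> str:
    # Assumed format: customers/{customer}/adGroupCriteria/{adGroup}~{criterion}
    return f"customers/{customer_id}/adGroupCriteria/{ad_group_id}~{criterion_id}"


def build_pause_body(customer_id: str, approved: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a mutate body that pauses each approved keyword (hypothetical helper)."""
    operations = []
    for item in approved:
        operations.append({
            # Only the status field is touched by this update.
            "updateMask": "status",
            "update": {
                "resourceName": criterion_resource_name(
                    customer_id, item["adGroupId"], item["criterionId"]
                ),
                "status": "PAUSED",
            },
        })
    return {"operations": operations}
```

Pausing rather than deleting keeps the keyword's history around, which helps if a later review brings it back.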

The key thing I wanted to preserve was step 6. The system suggests, humans decide. Before this existed, keyword decisions were made by feel — or not made at all. Now there’s a weekly report: here’s what’s working, here’s what’s burning budget, here’s what to watch. The B&B owner doesn’t execute the changes — I do, after review — but they can actually see the reasoning. That shift from “I trust Wayne” to “I understand what’s happening” matters for a small business owner.

I’ve seen too many automations where “human in the loop” is just a checkbox, not a real gate. I didn’t want to build that.
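
One way to keep that gate real is to make approval opt-in per suggestion, so anything the reviewer didn't explicitly approve is dropped. The sketch below assumes decisions arrive as a mapping from criterionId to "approve" or "reject"; that shape and the function name are hypothetical, not how the review link actually stores its results.

```python
from typing import Any


def apply_review(
    suggestions: list[dict[str, Any]],
    decisions: dict[str, str],
) -> list[dict[str, Any]]:
    """Return only the suggestions a human explicitly approved.

    A missing decision counts as "not approved", so an unreviewed
    report can never trigger changes by default.
    """
    approved = []
    for suggestion in suggestions:
        if decisions.get(suggestion["criterionId"]) == "approve":
            approved.append(suggestion)
    return approved
```

The default-deny behavior is the point: if nobody opens the review link, nothing gets executed.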

What Claude Actually Sees

The prompt includes more than just performance numbers. I pass in accommodation type, location, typical guest profile, and seasonality notes. Something like:

Business context:
- Property: Mountain cabin, sleeps 6, near hiking trails
- Location: Nantou, Taiwan
- Seasonality: High season Oct-Feb (cool weather hikers), low season Jun-Aug
- Current inventory: 3 rooms available next 2 weeks

Keyword performance (last 30 days):
[...CSV data...]

Return JSON in this format:
{
  "mvp": {"keyword": "...", "stats": "..."},
  "monitor": {"keyword": "...", "stats": "..."},
  "add_suggestions": [{"keyword": "...", "reason": "..."}],
  "remove_suggestions": [{"keywordText": "...", "adGroupId": "...", "criterionId": "...", "reason": "..."}]
}
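
A prompt in that layout could be assembled roughly like this. This is a sketch under assumptions: the function name, the context dict and the row fields are made up for illustration, and the real system may pass a structured payload rather than one string.

```python
import csv
import io


def build_user_prompt(
    context: dict[str, str],
    rows: list[dict[str, str]],
    json_format: str,
) -> str:
    """Assemble business context, keyword CSV and the expected JSON format."""
    lines = ["Business context:"]
    lines += [f"- {label}: {value}" for label, value in context.items()]

    # Serialize keyword rows as CSV, using the first row's keys as the header.
    buf = io.StringIO()
    if rows:
        writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)

    lines += ["", "Keyword performance (last 30 days):", buf.getvalue().rstrip("\n")]
    lines += ["", "Return JSON in this format:", json_format]
    return "\n".join(lines)
```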

The mvp object was something I added after a few weeks. It holds the single most valuable keyword, the one performing well enough to be worth flagging, so the client can see where their budget is actually working. Before this, the report was just a list of suggested changes with no clear “here’s the good news” anchor. monitor covers keywords still in the new-keyword protection window (the first 14 days after being added), where the LLM explains the status rather than making a removal call. add_suggestions and remove_suggestions carry the actual actionable changes.
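
Since everything downstream depends on that shape, it's worth checking it before anything else reads the response. Here's a minimal sketch of such a check, assuming the four keys and field names from the format above; the function name and the error wording are my own, not the project's.

```python
from typing import Any

ADD_FIELDS = ("keyword", "reason")
REMOVE_FIELDS = ("keywordText", "adGroupId", "criterionId", "reason")


def validate_report(report: dict[str, Any]) -> list[str]:
    """Return a list of shape problems; an empty list means the report looks usable."""
    problems = []

    # mvp and monitor are single objects with keyword and stats.
    for key in ("mvp", "monitor"):
        section = report.get(key)
        if not isinstance(section, dict):
            problems.append(f"{key}: expected an object")
            continue
        for field in ("keyword", "stats"):
            if field not in section:
                problems.append(f"{key}: missing {field}")

    # The two suggestion lists hold objects with their own required fields.
    for key, fields in (("add_suggestions", ADD_FIELDS), ("remove_suggestions", REMOVE_FIELDS)):
        items = report.get(key)
        if not isinstance(items, list):
            problems.append(f"{key}: expected a list")
            continue
        for i, item in enumerate(items):
            if not isinstance(item, dict):
                problems.append(f"{key}[{i}]: expected an object")
                continue
            for field in fields:
                if field not in item:
                    problems.append(f"{key}[{i}]: missing {field}")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to see everything that went wrong in a single run.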

The 14-Day Protection Rule

Early on I noticed a problem: the system would recommend removing a keyword, we’d pause it, then a week later recommend adding it back. Oscillation. Annoying and probably bad for Quality Score.

The underlying issue is that newly added keywords don’t have enough data to judge. Three days of impressions doesn’t tell you much. Recommending removal based on that data is noise, not signal — and for a B&B owner with a limited budget, acting on that noise wastes money.

I added a simple rule: don’t touch any keyword that was added in the last 14 days.

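from datetime import date

def is_protected(row: dict) -> bool:
    """Return True if the keyword was added within the last 14 days."""
    date_added = row.get("Date_Added", "")
    if not date_added:
        # No recorded add date: treat it as an established keyword.
        return False
    try:
        added = date.fromisoformat(date_added)
    except ValueError:
        # Unparseable date: don't protect rather than protect forever.
        return False
    # Inclusive window: a keyword added exactly 14 days ago is still protected.
    return (date.today() - added).days <= 14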

It’s not sophisticated, but it stopped the back-and-forth. Claude still sees these keywords in the data, but the execution layer skips them even if they’re in the recommendation output.
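
The skip in the execution layer can be sketched like this. It's a simplified stand-in rather than the real code: it assumes a lookup from criterionId to the date the keyword was added (hypothetical; the real data lives in sheet rows with a Date_Added column) and takes today's date as a parameter so the behavior is easy to test.

```python
from datetime import date
from typing import Any


def split_by_protection(
    removals: list[dict[str, Any]],
    added_on: dict[str, date],
    today: date,
    window_days: int = 14,
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split recommended removals into (to_execute, skipped_as_protected)."""
    to_execute, skipped = [], []
    for item in removals:
        added = added_on.get(item["criterionId"])
        # Keywords with no known add date are treated as unprotected.
        if added is not None and (today - added).days <= window_days:
            skipped.append(item)
        else:
            to_execute.append(item)
    return to_execute, skipped
```

Keeping the skipped list around, instead of silently dropping it, means the weekly report can still say which recommendations were held back and why.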

Swappable LLM Provider

I didn’t want to be locked into Anthropic forever, so I built a thin abstraction using Python’s Protocol:

from typing import Any, Protocol

class LLMClient(Protocol):
    """Any class with a matching analyze() method satisfies this interface."""
    def analyze(self, system_prompt: str, user_payload: dict[str, Any]) -> dict[str, Any]:
        ...

# Config, AnthropicLLMClient and OpenAILLMClient are defined elsewhere in the project.
def build_llm_client(cfg: Config) -> LLMClient:
    # Pick the concrete client from the configured provider name.
    if cfg.llm_provider == "anthropic":
        return AnthropicLLMClient(cfg.anthropic_api_key, cfg.llm_model)
    if cfg.llm_provider == "openai":
        return OpenAILLMClient(cfg.openai_api_key, cfg.llm_model)
    raise RuntimeError(f"unknown LLM_PROVIDER: {cfg.llm_provider}")

In practice I’ve only used Claude (Opus at the time). But having the Protocol interface means I can compare providers on the same prompts if I ever want to — or swap by changing the LLM_PROVIDER env var without touching any other code.
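
A side benefit of the Protocol approach is that a test double needs no inheritance, just a matching analyze() method. The sketch below repeats the interface so it runs on its own; CannedLLMClient, run_analysis and the system prompt string are hypothetical names for illustration.

```python
from typing import Any, Protocol


class LLMClient(Protocol):
    def analyze(self, system_prompt: str, user_payload: dict[str, Any]) -> dict[str, Any]:
        ...


class CannedLLMClient:
    """Test double: returns a fixed report and records what it was asked."""

    def __init__(self, canned: dict[str, Any]) -> None:
        self.canned = canned
        self.calls: list[tuple[str, dict[str, Any]]] = []

    def analyze(self, system_prompt: str, user_payload: dict[str, Any]) -> dict[str, Any]:
        self.calls.append((system_prompt, user_payload))
        return self.canned


def run_analysis(client: LLMClient, payload: dict[str, Any]) -> dict[str, Any]:
    # Depends only on the Protocol, so any provider (or the stub) works here.
    return client.analyze("You are a Google Ads analyst.", payload)
```

With a stub like this, the rest of the pipeline can be exercised without spending API credits.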

What Surprised Me

Claude is genuinely decent at seasonal reasoning when you give it context. Without context, it would just look at raw CTR and CPC and make generic suggestions. With context, it would flag things like: “This ‘mountain hiking October’ keyword has low volume now but historically peaks in 6 weeks — consider watching instead of removing.”

That’s the kind of call a B&B owner can’t make on their own — not because they’re not smart, but because they don’t have time to cross-reference keyword data with their own booking history and the local hiking calendar. Giving Claude that context closes the gap.

It’s not magic. It’s pattern matching on text I provided. But it saved me from making a shortsighted call a couple of times, and more importantly, it gave clients a weekly artifact they could actually read — not a dashboard with metrics they don’t understand, but a short report that says “this keyword earned its keep, this one didn’t.”

Google Sheets as the audit log turned out to be the right call too. The clients can open a spreadsheet and see what the AI recommended vs. what was actually executed. No dashboard to learn, no login to manage. They already live in Sheets.
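
For a sense of what such a log can look like, here is a small sketch that builds audit rows from the recommended removals and the set that was actually executed. The column layout and names are hypothetical, and appending the rows to the spreadsheet is left out.

```python
from datetime import date
from typing import Any

AUDIT_HEADER = ["Date", "Action", "Keyword", "Reason", "Executed"]


def audit_rows(
    run_date: date,
    recommended: list[dict[str, Any]],
    executed_ids: set[str],
) -> list[list[str]]:
    """One row per recommended removal, marking whether it was executed."""
    rows = []
    for item in recommended:
        rows.append([
            run_date.isoformat(),
            "remove",
            item["keywordText"],
            item["reason"],
            "yes" if item["criterionId"] in executed_ids else "no",
        ])
    return rows
```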

The thing I’d do differently: budget more time for prompt engineering, which took longer than I expected. Getting Claude to return consistently structured JSON, especially when performance data is sparse or weird, required a few iterations of output validation and prompt tweaking.
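
One common failure mode is the model wrapping its JSON in a code fence or a sentence of preamble. Below is a minimal sketch of a tolerant parser for that case; it's an illustration of the idea, not the validation the project actually runs, and the function name is hypothetical.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_llm_json(text: str) -> dict[str, Any]:
    """Extract a JSON object from model output that may include a fence or prose."""
    match = _FENCED.search(text)
    candidate = match.group(1) if match else text

    # Keep only the outermost {...} span in case prose surrounds the object.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")

    # json.JSONDecodeError is a ValueError subclass, so callers catch one type.
    return json.loads(candidate[start:end + 1])
```

If parsing fails, the caller can retry the request or flag the run for manual review instead of executing anything.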