Case studies

Real systems. Real results.

A selection of systems we've designed and deployed. Client names are anonymized — metrics and technical details are exact.

E-commerce / Fashion

⏱ 6 weeks to first campaign live

#01

From 40 hours to 4: AI creative pipeline for fashion e-commerce

The challenge

The brand's creative team needed 6–8 weeks and €25k/month in agency fees to produce one campaign's worth of assets. A/B testing was minimal — only 3–5 variants per campaign. The creative director spent 40% of their time in briefing meetings with no guarantee of brand consistency across deliveries.

Our approach

We fine-tuned FLUX.1 on 3 years of brand photography, establishing style, color, and composition consistency at the model level. A ComfyUI pipeline handles variant generation at scale — product images, backgrounds, copy overlays, and format resizing across 6 aspect ratios. An n8n workflow routes approved variants directly into Meta and Google ad accounts with naming conventions for structured A/B test reporting.
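
A minimal sketch of how a naming convention for structured A/B reporting could work. The field order, the underscore delimiter, the example values, and the six aspect ratios listed are illustrative assumptions, not the client's actual scheme.

```python
from dataclasses import dataclass
from itertools import product

# Assumption: the case study only says "6 aspect ratios"; these are examples.
ASPECT_RATIOS = ["1x1", "4x5", "9x16", "16x9", "2x3", "1.91x1"]
FIELDS = ["campaign", "product", "background", "copy", "ratio"]


@dataclass(frozen=True)
class Variant:
    campaign: str
    product: str
    background: str
    copy: str
    ratio: str

    def ad_name(self) -> str:
        # Underscore-delimited so reports can split a name back into fields.
        # Field values must not contain underscores themselves.
        return "_".join(
            [self.campaign, self.product, self.background, self.copy, self.ratio]
        )


def build_variants(campaign, products, backgrounds, copies):
    """Return one Variant per product x background x copy x aspect ratio."""
    return [
        Variant(campaign, p, b, c, r)
        for p, b, c, r in product(products, backgrounds, copies, ASPECT_RATIOS)
    ]


def parse_ad_name(name: str) -> dict:
    """Recover the test dimensions from an ad name for A/B reporting."""
    return dict(zip(FIELDS, name.split("_")))


variants = build_variants("ss25", ["dress01"], ["studio", "street"], ["c1", "c2", "c3"])
print(len(variants))            # 1 * 2 * 3 * 6 combinations
print(variants[0].ad_name())
print(parse_ad_name(variants[0].ad_name()))
```

Because every test dimension is encoded in the ad name, a report exported from an ad platform can be grouped by background, copy, or format without a separate lookup table.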

Results

340

Monthly ad variants

+750% vs prior 40

€4,200/mo

Creative production cost

-83% vs €25k

48 hours

Time to campaign launch

-95% vs 6 weeks

+31%

Click-through rate

vs prior campaign baseline

FLUX.1 (Black Forest Labs) · HuggingFace Inference Endpoints · ComfyUI · n8n · Meta Ads API · Google Ads API · GPT-4o (copy)

Software / B2B

⏱ 8 weeks including voice training and CRM integration

#02

24/7 AI voice qualification — 67% of leads handled without human involvement

The challenge

The SDR team was handling 800+ inbound leads per month. Average first-response time was 4 hours, rising to 12+ hours after 6 pm and on weekends. 60% of leads were unqualified but consumed the same SDR time as high-value prospects. The team was burning out on repetitive discovery calls and losing deals to faster-responding competitors.

Our approach

We built a voice agent on Vapi with an ElevenLabs voice tuned to match brand tone. The agent handles initial qualification across four dimensions (budget, timeline, use case, decision authority), books meetings directly into Calendly, and enriches lead records in HubSpot before handing off. Complex or high-score leads are escalated to human SDRs with a structured context brief.
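
The qualification and hand-off logic can be sketched roughly as follows. The 0 to 3 scale per dimension, the escalation threshold, and all function and field names are hypothetical; the case study does not state how scores are computed.

```python
from dataclasses import dataclass

# The four qualification dimensions named in the case study.
DIMENSIONS = ("budget", "timeline", "use_case", "decision_authority")

# Assumption: each dimension is scored 0-3, so the maximum total is 12.
ESCALATION_THRESHOLD = 9  # hypothetical cut-off for a "high-score" lead


@dataclass
class Qualification:
    budget: int
    timeline: int
    use_case: int
    decision_authority: int
    complex_request: bool = False  # set when the call goes beyond the script

    def score(self) -> int:
        return sum(getattr(self, d) for d in DIMENSIONS)


def route(q: Qualification) -> str:
    """Complex or high-score leads go to a human SDR; the rest stay with the agent."""
    if q.complex_request or q.score() >= ESCALATION_THRESHOLD:
        return "escalate_to_sdr"
    return "agent_continues"


def context_brief(lead_id: str, q: Qualification) -> dict:
    """Structured summary handed to the SDR (or written to the CRM record)."""
    return {
        "lead_id": lead_id,
        "score": q.score(),
        "route": route(q),
        "dimensions": {d: getattr(q, d) for d in DIMENSIONS},
        "complex_request": q.complex_request,
    }


print(context_brief("lead-001", Qualification(3, 3, 2, 2)))
print(route(Qualification(1, 1, 1, 1)))
```

Keeping the routing rule in one small function makes the threshold easy to tune as the share of calls handled autonomously is monitored.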

Results

<2 min

First response time

-99% vs 4h average

67%

Calls handled autonomously

no human involved

3.2×

Qualified pipeline

same SDR headcount

+180%

SDR time on top-tier leads

freed from triage calls

Vapi · ElevenLabs Conversational AI · Claude 3.5 Sonnet · HubSpot API · Calendly API · n8n · Deepgram STT

Financial Technology

⏱ 5 weeks to production pipeline

#03

Multi-agent content system: 4× output, 85% less effort, 3 markets

The challenge

A two-person marketing team was managing content for three markets with compliance requirements. Each article took ~5 hours: briefing, research, drafting, SEO optimization, compliance review, CMS upload. At a capacity of 4 posts/week, they could not keep pace with competitor content velocity or capitalize on trending topics within the news cycle.

Our approach

A 5-agent CrewAI pipeline handles the full workflow: Research Agent (Perplexity API + Tavily), Outline Agent, Writer Agent (Claude 3.5 Sonnet with compliance-aware system prompt), SEO Optimizer (Ahrefs API), and Publisher (Sanity CMS API). Human review is a single checkpoint at final draft — typically 20–30 minutes per article.
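
The shape of this workflow, five sequential stages with one human checkpoint, can be sketched in plain Python. This is not CrewAI code: every stage below is a placeholder function, and placing the review step just before publishing is our reading of "checkpoint at final draft".

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Stage = Callable[[State], State]


# Placeholder stages: in the real system each would call an agent or an API.
def research(state: State) -> State:
    return {**state, "sources": [f"placeholder source on {state['topic']}"]}


def outline(state: State) -> State:
    return {**state, "outline": ["intro", "body", "conclusion"]}


def write(state: State) -> State:
    return {**state, "draft": " / ".join(state["outline"])}


def optimize_seo(state: State) -> State:
    return {**state, "keywords": [str(state["topic"]).lower()]}


def publish(state: State) -> State:
    return {**state, "published": True}


PRE_REVIEW: List[Stage] = [research, outline, write, optimize_seo]


def run_pipeline(topic: str, approve: Callable[[State], bool]) -> State:
    """Run the automated stages, then publish only if the reviewer approves."""
    state: State = {"topic": topic}
    for stage in PRE_REVIEW:
        state = stage(state)
    if not approve(state):  # the single human checkpoint
        return {**state, "published": False}
    return publish(state)


result = run_pipeline("Open Banking", approve=lambda s: True)
print(result["published"], result["keywords"])
```

Passing the reviewer in as a callable keeps the checkpoint explicit: nothing reaches the publishing stage without an approval decision.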

Results

16 posts

Weekly content output

+300% vs prior 4

45 min

Production time per post

-85% vs 5h

+67%

Organic traffic

in 4 months post-launch

3h/week

Team hours on content ops

-85% vs 20h
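
For reference, the percentage changes in this case study follow from a simple before/after calculation. The helper name is ours; the inputs are the figures quoted above.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) * 100 / before


print(f"{percent_change(4, 16):+.0f}%")     # weekly posts, 4 -> 16: +300%
print(f"{percent_change(300, 45):+.0f}%")   # minutes per post, 5h -> 45 min: -85%
print(f"{percent_change(20, 3):+.0f}%")     # weekly team hours, 20h -> 3h: -85%
```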

CrewAI · Claude 3.5 Sonnet · Perplexity API · Tavily · Ahrefs API · Sanity CMS API · n8n

Consumer / Wellness

⏱ 10 weeks (data prep, fine-tuning, evaluation, deploy)

#04

Fine-tuned ad copy LLM — on-brand in <30 seconds, CVR +28%

The challenge

Generic LLM output sounded nothing like the brand — every piece of copy required heavy editorial passes before it was usable. A/B testing was slow: 20 copy variants tested per quarter, with no systematic way to predict winners before spending on distribution. Three years of performance data sat entirely unused.

Our approach

We used the brand's full historical ad copy corpus plus annotated conversion data to fine-tune Llama 3.3 70B with LoRA on HuggingFace. A secondary ranking model, trained on historical CTR/CVR pairs, scores generated variants before they enter paid testing. This compresses the path from brief to spend decision into one loop: brief → generate 20 variants → rank → test top 5 → iterate.
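
The generate, rank, and select steps can be sketched as below. Both `generate_variants` and `placeholder_score` are hypothetical stand-ins for the fine-tuned model and the ranking model; the scoring here is a deterministic dummy with no predictive value.

```python
from typing import Callable, List, Tuple


def generate_variants(brief: str, n: int = 20) -> List[str]:
    # Stand-in for a call to the fine-tuned copy model.
    return [f"{brief} (variant {i + 1})" for i in range(n)]


def placeholder_score(text: str) -> float:
    # Stand-in for the ranking model's predicted conversion score.
    # Deterministic dummy value in [0, 1); NOT a real predictor.
    return (sum(ord(ch) for ch in text) % 100) / 100


def select_for_testing(
    variants: List[str],
    score: Callable[[str], float],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score every variant and keep the top_k for paid testing."""
    scored = [(v, score(v)) for v in variants]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


variants = generate_variants("Spring sleep bundle")
shortlist = select_for_testing(variants, placeholder_score)
for text, predicted in shortlist:
    print(f"{predicted:.2f}  {text}")
```

Only the shortlist enters paid testing, so spend goes to the variants the ranker rates highest rather than to all 20.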

Results

<30 sec

Copy generation time

per full variant set

200+

Variants tested per quarter

+900% vs prior 20

+28%

Top-line CVR

vs pre-model baseline

5× faster

Winner discovery speed

vs random variant testing

Llama 3.3 70B · HuggingFace Transformers · PEFT / LoRA fine-tuning · HuggingFace Inference Endpoints · Custom CVR ranking model · Meta Ads API

Start here

Tell us what you're trying to build

Not sure which service fits? Describe the bottleneck — we'll map the right system and scope a solution.
