Scrappling

Paste a URL, get clean data.

Live · Built solo
Live demo: scrappling-ui.vercel.app

What it is

Paste a URL and Scrappling renders it with a stealth fetcher, then hands back clean JSON and Markdown side by side. It is built on Scrapling behind a small FastAPI service, with agent-readable endpoints so AI tools can discover and call it on their own.

System design

A thin Next.js display client on Vercel proxies to a FastAPI scraper on Boltic, so no Python or headless browser ever runs in the Vercel lambda. The backend offers three fetcher tiers: plain HTTP for speed, a Camoufox stealth mode that clears Cloudflare and JS challenges, and a Playwright dynamic mode. It reuses the launched browser context across requests, so only the first request after a cold start pays the boot cost. Every surface, UI and API alike, publishes llms.txt and agents.md so an agent can discover and call the scraper on its own.
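
The browser reuse described above can be sketched as a small pool that launches each browser-backed tier at most once. This is a minimal illustration, not the service's actual code: `BrowserPool`, `fake_launcher`, and the tier names `"stealth"` and `"dynamic"` are hypothetical, and the launchers are stand-ins that only count calls instead of starting Camoufox or Playwright.

```python
import threading
from typing import Callable, Dict


class BrowserPool:
    """Launch each browser-backed tier at most once, then reuse it."""

    def __init__(self, launchers: Dict[str, Callable[[], object]]) -> None:
        self._launchers = launchers
        self._contexts: Dict[str, object] = {}
        self._lock = threading.Lock()

    def get(self, tier: str) -> object:
        # The lock keeps two concurrent first requests from both launching.
        with self._lock:
            if tier not in self._contexts:
                self._contexts[tier] = self._launchers[tier]()
            return self._contexts[tier]


# Stand-in launchers (assumption): a real one would start a browser here.
launch_counts = {"stealth": 0, "dynamic": 0}


def fake_launcher(tier: str) -> Callable[[], object]:
    def launch() -> object:
        launch_counts[tier] += 1
        return f"{tier}-context"
    return launch


pool = BrowserPool({tier: fake_launcher(tier) for tier in launch_counts})

for _ in range(3):
    pool.get("stealth")   # only the first call launches
pool.get("dynamic")

print(launch_counts)
```

Holding the lock during launch is the simplest correct choice for a sketch; it also means a slow boot of one tier briefly blocks requests for the other, which a per-tier lock would avoid.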

What I got wrong, then fixed.

  1. 01 · the problem

    A scrape of a paywalled or login-walled page came back as a clean 200, but the content was just the wall: 'subscribe to read', cookie banners, login prompts. The scraper was treating access-control text as the page.

    what I did

    Added wall detection that flags auth, paywall, subscribe, and cookie-gate text, labels the result blocked or partial, and returns quality metadata, so a successful status with junk content no longer reads as a win.

  2. 02 · the problem

    The Jina Reader fallback and all the HTML cleanup lived in the frontend, duplicating logic the backend should own and bloating a client that was meant to just display results.

    what I did

    Moved the fallback and the content-cleaning pipeline into the FastAPI backend, keeping the frontend a thin display client and giving every caller, not just the UI, the cleaned output.

  3. 03 · the problem

    The first stealth or dynamic request launched its browser from cold, a 10 to 20 second wait, and Vercel's lambda cannot run a headless browser at all.

    what I did

    Kept the scraping on a Boltic service sized for it (1 vCPU, 1.5 GB, 120s timeout) and reused the launched browser context across requests, so only the first request pays the boot cost while Vercel just proxies.
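
The first two fixes can be sketched together as one backend function: fetch, fall back if the primary fetch returns nothing, clean the text, then label its quality. This is a rough illustration under stated assumptions, not the project's implementation. The phrase lists in `WALL_PATTERNS`, the `min_words` threshold, the rule that fallback triggers only on an empty primary result, and the names `scrape`, `clean`, `detect_walls`, `label`, and `Result` are all hypothetical. The fetchers are injected as plain callables so the sketch runs without network access; the only external detail used is that Jina Reader is addressed by prefixing the target URL with `https://r.jina.ai/`.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional

JINA_READER_PREFIX = "https://r.jina.ai/"

# Hypothetical phrase lists; the real detector's patterns are not shown here.
WALL_PATTERNS = {
    "paywall": [r"subscribe to (read|continue)", r"become a (member|subscriber)"],
    "auth": [r"(log|sign) ?in to (continue|read|view)"],
    "cookie_gate": [r"accept (all )?cookies", r"we use cookies"],
}

Fetch = Callable[[str], Optional[str]]


@dataclass
class Result:
    url: str
    source: str    # "primary" or "fallback"
    content: str
    status: str    # "ok", "partial", or "blocked"
    walls: List[str] = field(default_factory=list)
    word_count: int = 0


def clean(text: str) -> str:
    """Strip each line and collapse runs of blank lines into one."""
    out: List[str] = []
    for line in (raw.strip() for raw in text.splitlines()):
        if line or (out and out[-1]):
            out.append(line)
    return "\n".join(out).strip()


def detect_walls(text: str) -> List[str]:
    """Return the kinds of access-control text found in the content."""
    lowered = text.lower()
    return [kind for kind, patterns in WALL_PATTERNS.items()
            if any(re.search(p, lowered) for p in patterns)]


def label(walls: List[str], word_count: int, min_words: int = 150) -> str:
    """Assumed rule: wall text plus little else is blocked, otherwise partial."""
    if word_count == 0:
        return "blocked"
    if not walls:
        return "ok"
    return "blocked" if word_count < min_words else "partial"


def scrape(url: str, primary: Fetch, fallback: Fetch) -> Result:
    text, source = primary(url), "primary"
    if not text:
        text, source = fallback(JINA_READER_PREFIX + url), "fallback"
    content = clean(text or "")
    walls = detect_walls(content)
    words = len(content.split())
    return Result(url, source, content, label(walls, words), walls, words)
```

Because cleaning and labeling happen inside `scrape`, any caller gets the same cleaned content and quality metadata, whether it is the UI or an agent hitting the API directly.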
