Build your first Python Apify actor

A step-by-step walkthrough that takes you from zero to a working, deployable Python actor. The finished actor reads a list of URLs, fetches each one, and pushes the page title to the dataset.

What you'll build

A minimal Apify actor in plain Python that:

  • Takes a startUrls array as input.
  • Fetches each URL using httpx.
  • Extracts the <title> of the page.
  • Pushes { "url": ..., "title": ... } to the run's dataset.

No Crawlee, no Playwright — just the Apify Python SDK and one HTTP client. Once it works, you'll know exactly where to plug in heavier tools.

Prerequisites

  • Python 3.9 or newer. Check with python3 --version. Older versions won't work with the modern Apify SDK.
  • Node.js (for the Apify CLI). The CLI is distributed via npm, which is the only reason you need Node.js. The actor itself runs on Python.
  • An Apify account. The free tier is enough — sign up at console.apify.com.
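
If you would rather check the interpreter version from Python than from the shell, a minimal sketch could look like this. The helper name python_is_supported is made up for illustration; the 3.9 minimum is the one stated in the list above.

```python
import sys

# Minimum version stated in the prerequisites above.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True when the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} is new enough")
    else:
        print(f"Python {current} is too old; need {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or newer")
```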

1. Install the Apify CLI and log in

The CLI scaffolds the project, runs it locally with the right env vars, and deploys it to the platform.

npm install -g apify-cli
apify --version
apify login

apify login opens a browser for OAuth and writes a token into ~/.apify/auth.json. After this the CLI can act on your behalf.

2. Scaffold the actor

apify create interactively walks you through picking a language and a template.

apify create my-first-actor
# Pick "Python" → "Empty project"
cd my-first-actor

Pick Python when asked, then the Empty project template — it gives you the smallest possible starting point.

Project structure

my-first-actor/
├── .actor/
│   ├── actor.json          # Actor metadata (name, version, build options).
│   ├── Dockerfile          # Used when the platform builds your actor.
│   └── input_schema.json   # Defines the input fields shown in the UI.
├── src/
│   ├── __main__.py         # Boots the asyncio event loop and calls main().
│   └── main.py             # Your actor code lives here.
├── requirements.txt
└── README.md

Three files matter for now: src/main.py (the code), .actor/input_schema.json (the inputs), and requirements.txt (the deps).
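
To confirm the scaffold produced those three files, a small check along these lines may help. It is only a sketch: missing_key_files is a hypothetical helper, and the paths are the ones listed in the tree above.

```python
from pathlib import Path

# The three files this guide edits, relative to the project root.
KEY_FILES = ("src/main.py", ".actor/input_schema.json", "requirements.txt")


def missing_key_files(project_dir: str = ".") -> list[str]:
    """Return the key files that do not exist under project_dir."""
    root = Path(project_dir)
    return [rel for rel in KEY_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_key_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All key files are in place")
```

Run it from inside my-first-actor/; an empty result means the layout matches the tree above.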

Set up a virtual environment

Optional but strongly recommended — keeps the actor's deps separate from your system Python.

python3 -m venv .venv
source .venv/bin/activate    # Windows: .venv\Scripts\activate
pip install -r requirements.txt
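
If you are unsure whether the virtual environment is actually active, one common check is to compare sys.prefix with sys.base_prefix, which differ inside a venv. A minimal sketch (in_virtualenv is an illustrative name, not part of any Apify tooling):

```python
import sys


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    """Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix."""
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; packages would install into the base Python")
```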

3. Add httpx to requirements

The empty template doesn't include an HTTP client, so add one. Replace requirements.txt with:

apify
httpx

Then install it inside your virtualenv:

pip install -r requirements.txt

4. Define the input schema

The input schema does double duty: it powers the form in the Apify Console and validates input on every run. Replace the contents of .actor/input_schema.json with:

{
  "title": "Page Title Scraper input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "Start URLs",
      "type": "array",
      "description": "List of URLs to fetch the <title> of.",
      "editor": "requestListSources",
      "prefill": [
        { "url": "https://apify.com" },
        { "url": "https://news.ycombinator.com" }
      ]
    }
  },
  "required": ["startUrls"]
}

The prefill array gives the form a sensible default so users can hit Run without filling anything in.
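
To make the shape this schema expects concrete, here is a rough local check in plain Python. It is a sketch, not the platform's validator: validate_input is a hypothetical helper, and it only accepts the { "url": ... } entries used in this guide.

```python
def validate_input(actor_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    start_urls = actor_input.get("startUrls")
    if start_urls is None:
        return ["startUrls is required"]
    if not isinstance(start_urls, list):
        return ["startUrls must be an array"]

    problems = []
    for index, entry in enumerate(start_urls):
        # Each entry should be an object carrying a string "url" field.
        if not isinstance(entry, dict) or not isinstance(entry.get("url"), str):
            problems.append(f"startUrls[{index}] needs a string 'url' field")
    return problems


if __name__ == "__main__":
    sample = {"startUrls": [{"url": "https://apify.com"}, {"link": "oops"}]}
    for problem in validate_input(sample):
        print(problem)
```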

5. Write the actor

Replace src/main.py with:

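import re

import httpx

from apify import Actor


async def main() -> None:
    # The context manager initializes the actor on entry and exits it cleanly.
    async with Actor:
        # Fall back to an empty dict when the run has no input at all.
        actor_input = await Actor.get_input() or {}
        start_urls = actor_input.get("startUrls", [{"url": "https://apify.com"}])

        # Reuse one client (and its connection pool) for every request.
        async with httpx.AsyncClient() as client:
            for entry in start_urls:
                url = entry["url"]
                Actor.log.info(f"Fetching {url}")

                try:
                    response = await client.get(url, follow_redirects=True)
                except httpx.HTTPError as exc:
                    # Skip a URL that fails at the network level instead of
                    # aborting the whole run.
                    Actor.log.warning(f"Skipping {url}: {exc}")
                    continue

                # Capture the text between <title ...> and </title>, if any.
                match = re.search(
                    r"<title[^>]*>([^<]+)</title>",
                    response.text,
                    re.IGNORECASE,
                )
                title = match.group(1).strip() if match else ""

                await Actor.push_data({"url": url, "title": title})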

A quick tour:

  • async with Actor: is the SDK context manager. It calls Actor.init() on entry, Actor.exit() on exit, and handles platform signals in between.
  • Actor.get_input() reads the JSON object provided on the Run screen (or storage/key_value_stores/default/INPUT.json locally).
  • Actor.push_data() appends a row to the run's default dataset, which becomes the downloadable result.
  • Actor.log writes to the run log shown in the Console — use it instead of print() so structured fields are preserved.
  • httpx.AsyncClient is an async-native HTTP client. follow_redirects=True matters because many sites redirect bare hosts to their www. subdomain.
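
The title extraction is the one piece of logic you can try without the SDK or the network. The sketch below lifts the same regex from main.py into a standalone function; the name extract_title is ours, not part of the Apify SDK.

```python
import re

# Same pattern as in main.py: optional attributes on <title>, then text up to </title>.
TITLE_RE = re.compile(r"<title[^>]*>([^<]+)</title>", re.IGNORECASE)


def extract_title(html: str) -> str:
    """Return the stripped <title> text, or an empty string when none is found."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else ""


if __name__ == "__main__":
    page = "<html><head><TITLE> Example Domain </TITLE></head><body></body></html>"
    print(extract_title(page))
```

Note that an empty <title></title> yields an empty string, because the pattern requires at least one character between the tags.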

6. Run it locally

apify run --purge

The --purge flag wipes the local storage/ directory before the run so you start with a clean slate. After it finishes you can inspect the output at storage/datasets/default/.
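
To look at the results from Python rather than opening files by hand, something like the following could work. It assumes the local dataset is stored as one JSON file per item, with bookkeeping files prefixed by two underscores; that layout is an assumption and may differ between SDK versions. load_local_dataset is a hypothetical helper.

```python
import json
from pathlib import Path


def load_local_dataset(dataset_dir: str = "storage/datasets/default") -> list[dict]:
    """Load every item file from a local dataset directory, in filename order."""
    directory = Path(dataset_dir)
    if not directory.is_dir():
        return []

    items = []
    for path in sorted(directory.glob("*.json")):
        # Assumption: files starting with "__" hold metadata, not items.
        if path.name.startswith("__"):
            continue
        items.append(json.loads(path.read_text(encoding="utf-8")))
    return items


if __name__ == "__main__":
    for item in load_local_dataset():
        print(item.get("url"), "->", item.get("title"))
```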

7. Push it to the Apify platform

apify push

This zips your project, uploads it, and triggers a build. When the build finishes, your actor is live at console.apify.com/actors. The first run from the Console will use the input schema you defined.

Where to go next

  • Crawl more than a list of URLs. Swap the for loop for Crawlee for Python — you get retries, concurrency, and a request queue for free.
  • Parse HTML properly. A regex is fine for one tag, but for real scrapes use beautifulsoup4 or parsel (CSS/XPath selectors).
  • Monetize. Set a Pay-Per-Result price on your actor and then add free-tier limits so non-paying users hit a ceiling.
  • Detect paying users. The APIFY_USER_IS_PAYING env var lets you ship richer output to paying users — see how to tell if a user is paying.
  • Price it. Run the Apify Pricing Calculator to translate your costs into the bundle string for your listing.
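
As a stepping stone toward the "Parse HTML properly" item above, here is a sketch that swaps the regex for Python's built-in html.parser. It is not beautifulsoup4 or parsel, just the standard library, and the class and function names are illustrative. Unlike the regex, it decodes entities such as &amp; and ignores any <title> after the first one.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element in a document."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self._in_title = False
        self._done = False
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; only the first <title> is recorded.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self) -> str:
        return "".join(self._parts).strip()


def extract_title_with_parser(html: str) -> str:
    """Return the first <title> text, or an empty string when there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


if __name__ == "__main__":
    print(extract_title_with_parser("<title>Tom &amp; Jerry</title>"))
```

In main.py you would call extract_title_with_parser(response.text) in place of the re.search block.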
