Build your first Python Apify actor
A step-by-step walkthrough that takes you from zero to a working, deployable Python actor. The finished actor reads a list of URLs, fetches each one, and pushes the page title to the dataset.
What you'll build
A minimal Apify actor in plain Python that:
- Takes a startUrls array as input.
- Fetches each URL using httpx.
- Extracts the <title> of the page.
- Pushes { "url": ..., "title": ... } to the run's dataset.
No Crawlee, no Playwright — just the Apify Python SDK and one HTTP client. Once it works, you'll know exactly where to plug in heavier tools.
Prerequisites
- Python 3.9 or newer. Check with python3 --version. Older versions won't work with the modern Apify SDK.
- Node.js (for the Apify CLI). The CLI is distributed via npm, and that's the only reason you need it. The actor itself runs on Python.
- An Apify account. The free tier is enough — sign up at console.apify.com.
1. Install the Apify CLI and log in
The CLI scaffolds the project, runs it locally with the right env vars, and deploys it to the platform.
npm install -g apify-cli
apify --version
apify login
apify login opens a browser for OAuth and writes a token into ~/.apify/auth.json. After this the CLI can act on your behalf.
2. Scaffold the actor
apify create interactively walks you through picking a language and a template.
apify create my-first-actor
# Pick "Python" → "Empty project"
cd my-first-actor
Pick Python when asked, then the Empty project template. It gives you the smallest possible starting point.
Project structure
my-first-actor/
├── .actor/
│ ├── actor.json # Actor metadata (name, version, build options).
│ ├── Dockerfile # Used when the platform builds your actor.
│ └── input_schema.json # Defines the input fields shown in the UI.
├── src/
│ ├── __main__.py # Boots the asyncio event loop and calls main().
│ └── main.py # Your actor code lives here.
├── requirements.txt
└── README.md
Three files matter for now: src/main.py (the code), .actor/input_schema.json (the inputs), and requirements.txt (the deps).
Set up a virtual environment
Optional but strongly recommended — keeps the actor's deps separate from your system Python.
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
3. Add httpx to requirements
The empty template doesn't include an HTTP client, so add one. Replace requirements.txt with:
apify
httpx
Then install it inside your virtualenv:
pip install -r requirements.txt
4. Define the input schema
The input schema does double duty: it powers the form in the Apify Console and validates input on every run. Replace the contents of .actor/input_schema.json with:
{
    "title": "Page Title Scraper input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "List of URLs to fetch the <title> of.",
            "editor": "requestListSources",
            "prefill": [
                { "url": "https://apify.com" },
                { "url": "https://news.ycombinator.com" }
            ]
        }
    },
    "required": ["startUrls"]
}
The prefill array gives the form a sensible default so users can hit Run without filling anything in.
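The schema above hands the actor a list of objects shaped like { "url": "..." }. As a minimal sketch of how that input could be normalized before fetching, the helper below (normalize_start_urls is a hypothetical name, not part of the Apify SDK) keeps only well-formed http(s) URLs and drops duplicates. It assumes the input is the plain dict that Actor.get_input() returns.

```python
from typing import Optional


def normalize_start_urls(actor_input: Optional[dict]) -> list:
    """Return a de-duplicated list of URL strings from the actor input.

    Hypothetical helper: assumes entries look like {"url": "..."},
    matching the prefill in the input schema.
    """
    entries = (actor_input or {}).get("startUrls") or []
    urls = []
    for entry in entries:
        # Ignore anything that is not an object with a string "url" field.
        url = entry.get("url") if isinstance(entry, dict) else None
        if not isinstance(url, str):
            continue
        url = url.strip()
        # Keep only http(s) URLs and skip ones we have already seen.
        if url.startswith(("http://", "https://")) and url not in urls:
            urls.append(url)
    return urls
```

Calling it on the raw input before the fetch loop means a malformed entry is skipped instead of raising a KeyError mid-run.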
5. Write the actor
Replace src/main.py with:
import re

import httpx
from apify import Actor


async def main() -> None:
    async with Actor:
        # Read the run input; fall back to a single default URL.
        actor_input = await Actor.get_input() or {}
        start_urls = actor_input.get("startUrls", [{"url": "https://apify.com"}])

        # Reuse one HTTP client for all requests.
        async with httpx.AsyncClient() as client:
            for entry in start_urls:
                url = entry["url"]
                Actor.log.info(f"Fetching {url}")
                response = await client.get(url, follow_redirects=True)

                # Grab the text between <title> and </title>, if present.
                match = re.search(
                    r"<title[^>]*>([^<]+)</title>",
                    response.text,
                    re.IGNORECASE,
                )
                title = match.group(1).strip() if match else ""

                await Actor.push_data({"url": url, "title": title})
A quick tour:
- async with Actor: is the SDK context manager. It calls Actor.init() on entry, Actor.exit() on exit, and handles platform signals in between.
- Actor.get_input() reads the JSON object provided on the Run screen (or storage/key_value_stores/default/INPUT.json locally).
- Actor.push_data() appends a row to the run's default dataset, which becomes the downloadable result.
- Actor.log writes to the run log shown in the Console. Use it instead of print() so structured fields are preserved.
- httpx.AsyncClient is an async-native HTTP client. follow_redirects=True matters because many sites redirect bare hosts to www.
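The title extraction inside the loop can be pulled into a small function so it is easy to test without running the actor. The sketch below is one possible variant, not the guide's exact code: extract_title is a hypothetical helper, and it goes slightly beyond the inline regex by decoding HTML entities and collapsing whitespace.

```python
import html
import re

# Non-greedy match so only the first <title> element is captured;
# DOTALL lets the title span several lines.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(page_html: str) -> str:
    """Return the page title, or an empty string if none is found."""
    match = TITLE_RE.search(page_html)
    if not match:
        return ""
    # Decode entities such as &amp; and collapse runs of whitespace.
    return " ".join(html.unescape(match.group(1)).split())
```

In main(), the re.search(...) call and the title assignment could then be replaced with title = extract_title(response.text).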
6. Run it locally
apify run --purge
The --purge flag wipes the local storage/ directory before the run so you start with a clean slate. After it finishes you can inspect the output at storage/datasets/default/.
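To check the results without opening files one by one, a short script can load everything from the local dataset directory. This is a rough sketch that assumes each pushed item is stored as its own .json file in storage/datasets/default/ and that any bookkeeping files start with a double underscore; load_local_dataset is a hypothetical helper name, and the exact on-disk layout may differ between SDK versions.

```python
import json
from pathlib import Path


def load_local_dataset(dataset_dir: str = "storage/datasets/default") -> list:
    """Load every item file from a local dataset directory, in filename order."""
    items = []
    directory = Path(dataset_dir)
    if not directory.is_dir():
        return items
    for path in sorted(directory.glob("*.json")):
        # Assumption: files starting with "__" are metadata, not items.
        if path.name.startswith("__"):
            continue
        items.append(json.loads(path.read_text(encoding="utf-8")))
    return items


if __name__ == "__main__":
    for item in load_local_dataset():
        print(f'{item.get("url")}: {item.get("title")}')
```

Run it from the project root after apify run finishes. It prints one url: title line per dataset item.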
7. Push it to the Apify platform
apify push
This zips your project, uploads it, and triggers a build. When the build finishes, your actor is live at console.apify.com/actors. The first run from the Console will use the input schema you defined.
Where to go next
- Crawl more than a list of URLs. Swap the for loop for Crawlee for Python, which gives you retries, concurrency, and a request queue for free.
- Parse HTML properly. A regex is fine for one tag, but for real scrapes use beautifulsoup4 or parsel (CSS/XPath selectors).
- Monetize. Set a Pay-Per-Result price on your actor and then add free-tier limits so non-paying users hit a ceiling.
- Detect paying users. The APIFY_USER_IS_PAYING env var lets you ship richer output to paying users. See how to tell if a user is paying.
- Price it. Run the Apify Pricing Calculator to translate your costs into the bundle string for your listing.
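As a rough illustration of the paying-user check mentioned above, the sketch below reads APIFY_USER_IS_PAYING from the environment. It assumes the platform sets the variable to "1" for paying users, so verify that against the current Apify documentation. Both function names and the extra titleLength field are hypothetical and exist only for this example.

```python
import os
from typing import Mapping, Optional


def user_is_paying(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the run was started by a paying user.

    Assumption: the platform sets APIFY_USER_IS_PAYING to "1" for paying users.
    """
    env = os.environ if env is None else env
    return env.get("APIFY_USER_IS_PAYING") == "1"


def shape_item(url: str, title: str, paying: bool) -> dict:
    """Build the dataset row, adding a hypothetical extra field for paying users."""
    item = {"url": url, "title": title}
    if paying:
        item["titleLength"] = len(title)  # illustrative "richer output" only
    return item
```

Passing the environment in as a parameter keeps the check easy to test locally, where the variable is normally unset.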
Spotted a bug, or want a guide on something else?
support@mail.apifyhub.com