# Serverless Web Scraper with GitHub Actions

## Overview
This project demonstrates a serverless, automated web scraping system built with Python and GitHub Actions. It is designed to scrape large-scale datasets (3,000+ records per run) from internal-style websites and execute daily without any server infrastructure.
The project reflects a real-world scraping workflow commonly requested by clients on platforms like Upwork.
## Key Features

- Daily automated scraping using GitHub Actions (cron jobs)
- Serverless architecture (no VPS, no cloud server)
- Pagination support for large datasets (3,000+ records)
- Modular Python codebase (auth, parser, runner)
- CSV data export
- Production-ready project structure
- Authentication-ready (simulated internal website access)
## Tech Stack

- Python 3.10
- Requests
- BeautifulSoup4
- GitHub Actions
- Cron scheduling
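A minimal `requirements.txt` for this stack could look like the following (the version pins are illustrative, not taken from the repository):

```text
requests>=2.31
beautifulsoup4>=4.12
```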
## Project Structure

```
.
├── .github/
│   └── workflows/
│       └── scrape.yml     # GitHub Actions workflow
├── scraper/
│   ├── __init__.py
│   ├── auth.py            # Authentication/session handling
│   ├── parser.py          # Pagination scraping logic
│   └── scraper.py         # Main entry point
├── data/
│   └── output.csv         # Scraped data output
├── requirements.txt
└── README.md
```
## How It Works

1. GitHub Actions triggers the workflow daily (or manually).
2. A Python environment is set up automatically.
3. The scraper:
   - initializes a session (simulating internal website access),
   - iterates through paginated pages,
   - collects structured data.
4. Results are saved as a CSV file.
5. Updated data is committed back to the repository.
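The core loop described above can be sketched as follows. This is a hedged sketch, not the project's actual code: `fetch_page` is a placeholder for whatever callable wraps the Requests session and BeautifulSoup parsing, returning the records on a page plus a flag indicating whether another page exists.

```python
import csv
from typing import Callable, Iterator, List, Tuple

def scrape_all(fetch_page: Callable[[int], Tuple[List[dict], bool]],
               max_pages: int = 500) -> Iterator[dict]:
    """Walk paginated pages, yielding records until the site reports no more.

    fetch_page is a hypothetical callable: page number -> (records, has_next).
    max_pages is a safety cap so a broken "next" signal cannot loop forever.
    """
    page = 1
    while page <= max_pages:
        records, has_next = fetch_page(page)
        yield from records
        if not has_next:
            break
        page += 1

def save_csv(rows: List[dict], path: str) -> None:
    """Write collected records to CSV, using the first row's keys as the header."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

Keeping `scrape_all` a generator means records stream into the CSV writer without holding hundreds of pages in memory at once.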
## Data Scale & Pagination

- The scraper supports hundreds of pages via pagination.
- The logic is designed to handle 3,000+ records per execution.
- The demo website limits the number of pages available, but the pagination logic itself is not tied to that limit and scales to production-sized datasets.
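Two helpers illustrate the kind of pagination logic this refers to. Both are hypothetical sketches: `page_url` assumes a `?page=N` query parameter (a common pattern; the real site's parameter name may differ), and `has_next_page` detects the end of the dataset by looking for a `rel="next"` link in the HTML.

```python
import re
from urllib.parse import urlencode

# Matches an anchor tag advertising rel="next" (illustrative heuristic;
# a real parser.py would use BeautifulSoup for this).
NEXT_RE = re.compile(r'<a[^>]+rel=["\']next["\']', re.IGNORECASE)

def page_url(base: str, page: int) -> str:
    """Build the URL for a given page number, assuming a ?page=N scheme."""
    return f"{base}?{urlencode({'page': page})}"

def has_next_page(html: str) -> bool:
    """Return True if the page links to a successor via rel="next"."""
    return bool(NEXT_RE.search(html))
```

Stopping on the absence of a `rel="next"` link (rather than hard-coding a page count) is what lets the same loop run against a small demo site or a 3,000-record production dataset unchanged.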
## GitHub Actions Automation

The workflow runs automatically on a cron schedule:

```yaml
on:
  schedule:
    - cron: "0 6 * * *"   # every day at 06:00 UTC
```
This ensures hands-free daily scraping without maintaining any server.
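A full `scrape.yml` along these lines might look like the sketch below. This is illustrative, not the repository's actual workflow: the job name, entry-point command, and commit step are assumptions layered on the structure described above.

```yaml
name: Daily scrape

on:
  schedule:
    - cron: "0 6 * * *"    # every day at 06:00 UTC
  workflow_dispatch:        # allow manual runs from the Actions tab

jobs:
  scrape:
    runs-on: ubuntu-latest
    permissions:
      contents: write       # needed to push the updated CSV back
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - run: python -m scraper.scraper   # assumed entry point
      - name: Commit updated data
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add data/output.csv
          git diff --cached --quiet || git commit -m "chore: daily scrape"
          git push
```

The `git diff --cached --quiet ||` guard skips the commit when a run produces no changes, so the history stays clean on days the source data is unchanged.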
## Client / Portfolio Use Case

This project is ideal for:

- Internal website scraping
- Scheduled data collection
- Serverless automation
- Upwork & freelance portfolio demonstration

Example client description:

> Built a serverless Python scraping system using GitHub Actions to automatically collect large datasets on a daily schedule without any server infrastructure.
This repository uses a public demo website for demonstration purposes only. The architecture and logic are intended to represent internal or authenticated website scraping workflows.
## Contact

If you're looking for:

- Automated web scraping
- GitHub Actions automation
- Serverless data pipelines

feel free to reach out.