Price Tracking Service (like CamelCamelCamel)¶
Quick Reference Guide for System Design Interviews
Problem Statement¶
Design a price tracking service that monitors product prices across e-commerce sites, stores historical pricing data, and alerts users when prices drop to their desired levels.
Requirements¶
Functional Requirements¶
- Track prices for millions of products
- Historical price graphs
- Price drop alerts (email, push)
- Support multiple retailers (Amazon, Walmart, etc.)
- User watchlists
Non-Functional Requirements¶
- Freshness: Prices updated every 1-6 hours
- Availability: 99.9%
- Scale: 100M products, 10M users
- Accuracy: Correct price extraction
High-Level Architecture¶
Crawler Service¶
Price Extraction¶
Price History Storage¶
Alert System¶
Data Models¶
-- Products
CREATE TABLE products (
product_id UUID PRIMARY KEY,
name VARCHAR(500),
brand VARCHAR(200),
category VARCHAR(100),
image_url VARCHAR(500),
current_price DECIMAL(10,2),
lowest_price DECIMAL(10,2),
highest_price DECIMAL(10,2),
avg_price DECIMAL(10,2),
last_updated TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW()
);
-- Product URLs (same product on multiple retailers)
CREATE TABLE product_urls (
product_id UUID REFERENCES products(product_id),
retailer VARCHAR(50),
url VARCHAR(2000),
external_id VARCHAR(100), -- ASIN for Amazon
crawl_priority INT DEFAULT 1,
last_crawled TIMESTAMP,
next_crawl TIMESTAMP,
PRIMARY KEY (product_id, retailer)
);
-- User watchlists
CREATE TABLE watchlists (
user_id UUID,
product_id UUID REFERENCES products(product_id),
target_price DECIMAL(10,2),
notify_on_drop BOOLEAN DEFAULT TRUE,
added_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (user_id, product_id)
);
-- Price history (TimescaleDB hypertable)
CREATE TABLE price_history (
time TIMESTAMPTZ NOT NULL,
product_id UUID NOT NULL,
retailer VARCHAR(50),
price DECIMAL(10,2),
in_stock BOOLEAN,
currency VARCHAR(3) DEFAULT 'USD'
);
API Endpoints¶
Interview Discussion Points¶
- How do you crawl millions of products?
- Priority-based scheduling
- Distributed workers
-
Rate limiting per domain
-
How do you handle anti-scraping?
- Rotate user agents/IPs
- Respect robots.txt
-
Use APIs when available
-
How do you store price history efficiently?
- TimescaleDB for time-series
- Compression and downsampling
-
Retention policies
-
How do you check alerts efficiently?
- Event-driven on price update
- Indexed lookup by product_id
-
Batch processing
-
How do you handle layout changes?
- Monitor extraction success rate
- Alert on anomalies
-
Quick parser updates
-
How do you ensure freshness?
- Higher priority for watched items
- Adaptive crawl frequency
- Real-time for high-value products