
Reporting System for Payment Platform

Problem Statement

Design a reporting system for a payment platform that generates financial reports, transaction summaries, and analytics exports. The system should handle both on-demand and scheduled reports, support multiple export formats, and process large datasets efficiently.


Requirements

Functional Requirements

  • Generate financial reports (revenue, payouts, fees)
  • Transaction history exports
  • Custom date range reports
  • Scheduled/recurring reports
  • Multiple export formats (CSV, PDF, Excel)
  • Report filtering and customization
  • Email delivery of reports
  • Report history and re-download

Non-Functional Requirements

  • Scalability: Handle reports with millions of rows
  • Performance: Generate most reports within minutes
  • Reliability: No data loss, accurate calculations
  • Availability: 99.9% uptime for report downloads
  • Storage: Efficient storage for generated reports

High-Level Architecture

Reporting System - High-Level Architecture


Core Components

1. Report API Service

  • Accepts report requests
  • Validates parameters
  • Creates report jobs
  • Returns job status
  • Provides download URLs

2. Report Job Queue

  • Decouples request from processing
  • Handles backpressure
  • Enables retry on failure
  • Priority queuing
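
Priority queuing can be sketched with a plain in-memory queue; a real deployment would more likely use separate Kafka topics or SQS queues per priority tier. The `ReportJob` record and priority values below are illustrative, not from the codebase:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Minimal in-memory sketch of priority queuing; production systems
// typically use separate queues/topics per priority tier instead
public class ReportJobQueue {

    // Lower number = higher priority (e.g. 0 = interactive, 9 = bulk export)
    public record ReportJob(String reportId, int priority, long enqueuedAtMillis) {}

    private final PriorityQueue<ReportJob> queue = new PriorityQueue<>(
        Comparator.comparingInt(ReportJob::priority)
                  .thenComparingLong(ReportJob::enqueuedAtMillis)); // FIFO within a tier

    public void enqueue(ReportJob job) {
        queue.add(job);
    }

    public ReportJob dequeue() {
        return queue.poll(); // null when empty
    }
}
```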

3. Report Workers

  • Process report jobs
  • Query data sources
  • Generate output files
  • Handle large datasets with streaming

4. Data Sources

  • Read replicas for OLTP data
  • OLAP database for analytics
  • Pre-aggregated tables
  • Data warehouse integration

5. File Storage

  • Store generated reports
  • Signed URLs for downloads
  • Retention policies
  • Encryption at rest

6. Notification Service

  • Email delivery
  • Webhook notifications
  • In-app notifications

Report Types

Report Categories


Data Models

Report Request

CREATE TABLE report_requests (
    id                  UUID PRIMARY KEY,
    account_id          UUID NOT NULL,

    -- Report definition
    report_type         VARCHAR(50) NOT NULL,
    parameters          JSONB NOT NULL,
    /*
    {
        "date_range": {
            "start": "2024-01-01",
            "end": "2024-01-31"
        },
        "filters": {
            "status": ["succeeded"],
            "currency": "usd"
        },
        "columns": ["id", "amount", "created", "customer"],
        "format": "csv"
    }
    */

    -- Output
    format              VARCHAR(20) NOT NULL,       -- csv, pdf, xlsx
    file_url            VARCHAR(500),
    file_size_bytes     BIGINT,
    row_count           BIGINT,

    -- Status
    status              VARCHAR(20) NOT NULL,       -- pending, processing, completed, failed
    progress            INT DEFAULT 0,              -- 0-100
    error_message       TEXT,

    -- Scheduling
    scheduled           BOOLEAN DEFAULT false,
    schedule_cron       VARCHAR(50),
    next_run_at         TIMESTAMP,

    -- Delivery
    email_recipients    TEXT[],
    webhook_url         VARCHAR(500),

    -- Timestamps
    created_at          TIMESTAMP NOT NULL,
    started_at          TIMESTAMP,
    completed_at        TIMESTAMP,
    expires_at          TIMESTAMP,                  -- When file will be deleted

    -- Requestor
    requested_by        UUID,                       -- User ID

    FOREIGN KEY (account_id) REFERENCES accounts(id)
);

CREATE INDEX idx_reports_account ON report_requests(account_id, created_at DESC);
CREATE INDEX idx_reports_status ON report_requests(status, created_at);
CREATE INDEX idx_reports_scheduled ON report_requests(scheduled, next_run_at)
    WHERE scheduled = true;
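
Validating the `date_range` in `parameters` before a job is enqueued is much cheaper than failing mid-generation. A hedged sketch of that check (the 366-day cap is an illustrative limit, not part of the schema):

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Validates the date_range portion of report parameters before enqueueing
public final class DateRangeValidator {

    private static final long MAX_RANGE_DAYS = 366; // illustrative cap

    private DateRangeValidator() {}

    public static void validate(LocalDate start, LocalDate end) {
        if (start == null || end == null) {
            throw new IllegalArgumentException("date_range.start and date_range.end are required");
        }
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("date_range.end must not precede date_range.start");
        }
        long days = ChronoUnit.DAYS.between(start, end) + 1; // inclusive bounds
        if (days > MAX_RANGE_DAYS) {
            throw new IllegalArgumentException("date_range exceeds " + MAX_RANGE_DAYS + " days");
        }
    }
}
```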

Report Template

CREATE TABLE report_templates (
    id                  UUID PRIMARY KEY,
    account_id          UUID NOT NULL,

    -- Template definition
    name                VARCHAR(255) NOT NULL,
    description         TEXT,
    report_type         VARCHAR(50) NOT NULL,
    default_parameters  JSONB NOT NULL,
    format              VARCHAR(20) NOT NULL,

    -- Scheduling
    schedule_enabled    BOOLEAN DEFAULT false,
    schedule_cron       VARCHAR(50),
    email_recipients    TEXT[],

    -- Metadata
    created_at          TIMESTAMP NOT NULL,
    updated_at          TIMESTAMP NOT NULL,
    created_by          UUID,

    FOREIGN KEY (account_id) REFERENCES accounts(id)
);

Report Generation Flow

On-Demand Report

On-Demand Report Flow

Polling vs Webhooks

Client Notification Options
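
For clients that poll for report status, a capped exponential backoff keeps request volume reasonable while a report is still processing. A minimal sketch of the delay schedule (base and cap values are illustrative):

```java
// Capped exponential backoff for status polling: 1s, 2s, 4s, ... up to 30s
public final class PollingBackoff {

    private static final long BASE_DELAY_MS = 1_000;
    private static final long MAX_DELAY_MS = 30_000;

    private PollingBackoff() {}

    /** Delay before poll attempt n (0-based). */
    public static long delayMillis(int attempt) {
        // Clamp the exponent so the shift cannot overflow for large attempt counts
        long exp = Math.min(attempt, 30);
        long delay = BASE_DELAY_MS << exp;
        return Math.min(delay, MAX_DELAY_MS);
    }
}
```

In practice a small random jitter is usually added so many waiting clients do not poll in lockstep.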


Report Worker

Worker Implementation

@Component
public class ReportWorker {

    private final DataSourceRouter dataSource;
    private final ReportGeneratorFactory generatorFactory;
    private final FileStorage fileStorage;
    private final NotificationService notificationService;
    private final ReportRequestRepository reportRepository;

    @KafkaListener(topics = "report-jobs")
    public void processReport(ReportJobMessage message) {
        ReportRequest request = reportRepository.findById(message.getReportId())
            .orElseThrow(() -> new IllegalStateException("Unknown report: " + message.getReportId()));

        try {
            // Update status
            request.setStatus(ReportStatus.PROCESSING);
            request.setStartedAt(Instant.now());
            reportRepository.save(request);

            // Generate report
            ReportResult result = generateReport(request);

            // Upload to storage
            String fileUrl = uploadReport(request, result);

            // Update completion
            request.setStatus(ReportStatus.COMPLETED);
            request.setFileUrl(fileUrl);
            request.setFileSizeBytes(result.getFileSize());
            request.setRowCount(result.getRowCount());
            request.setCompletedAt(Instant.now());
            request.setExpiresAt(Instant.now().plus(Duration.ofDays(7)));
            reportRepository.save(request);

            // Notify
            sendNotification(request);

        } catch (Exception e) {
            request.setStatus(ReportStatus.FAILED);
            request.setErrorMessage(e.getMessage());
            reportRepository.save(request);

            // Send failure notification
            sendFailureNotification(request, e);
        }
    }

    private ReportResult generateReport(ReportRequest request) {
        ReportGenerator generator = generatorFactory.getGenerator(request.getReportType());
        ReportConfig config = ReportConfig.from(request.getParameters());

        // Stream to a temp file so the full report is never held in memory
        File tempFile = createTempFile();
        try (OutputStream output = new BufferedOutputStream(new FileOutputStream(tempFile))) {
            ReportWriter writer = createWriter(request.getFormat(), output);

            // Write header
            writer.writeHeader(generator.getColumns(config));

            // Stream rows
            long rowCount = 0;
            try (Stream<ReportRow> rows = generator.streamData(request.getAccountId(), config)) {
                Iterator<ReportRow> iterator = rows.iterator();
                while (iterator.hasNext()) {
                    writer.writeRow(iterator.next());
                    rowCount++;

                    // Update progress periodically
                    if (rowCount % 10000 == 0) {
                        updateProgress(request, rowCount);
                    }
                }
            }

            writer.finish();

            // Return the temp file (not the closed stream) so it can be uploaded
            return new ReportResult(tempFile, rowCount);
        }
    }
}

Streaming Large Reports

public class TransactionReportGenerator implements ReportGenerator {

    private final NamedParameterJdbcTemplate jdbc;

    @Override
    public Stream<ReportRow> streamData(String accountId, ReportConfig config) {
        String sql = buildQuery(accountId, config);
        MapSqlParameterSource params = new MapSqlParameterSource()
            .addValue("accountId", accountId)
            .addValue("startDate", config.getStartDate())
            .addValue("endDate", config.getEndDate())
            .addValue("livemode", config.isLivemode());

        // Cursor-based streaming avoids loading all rows into memory; the query
        // uses named parameters (:accountId etc.), so a NamedParameterJdbcTemplate
        // is required to bind them. The caller must close the stream to release
        // the underlying connection.
        return jdbc.queryForStream(sql, params, (rs, rowNum) -> mapRow(rs));
    }

    private String buildQuery(String accountId, ReportConfig config) {
        return """
            SELECT
                p.id,
                p.amount,
                p.currency,
                p.status,
                p.created_at,
                c.email as customer_email
            FROM payments p
            LEFT JOIN customers c ON p.customer_id = c.id
            WHERE p.account_id = :accountId
            AND p.created_at BETWEEN :startDate AND :endDate
            AND p.livemode = :livemode
            ORDER BY p.created_at DESC
            """;
    }
}

Export Formats

CSV Writer

public class CsvReportWriter implements ReportWriter {

    private final CSVPrinter printer;

    public CsvReportWriter(OutputStream output) throws IOException {
        // The header row is written explicitly in writeHeader, so the
        // format itself must not also emit one
        this.printer = new CSVPrinter(
            new OutputStreamWriter(output, StandardCharsets.UTF_8),
            CSVFormat.DEFAULT
        );
    }

    @Override
    public void writeHeader(List<String> columns) throws IOException {
        printer.printRecord(columns);
    }

    @Override
    public void writeRow(ReportRow row) throws IOException {
        printer.printRecord(row.getValues());
    }

    @Override
    public void finish() throws IOException {
        printer.flush();
        printer.close();
    }
}

PDF Writer

public class PdfReportWriter implements ReportWriter {

    private final Document document;
    private Table table;

    public PdfReportWriter(OutputStream output) {
        // iText 7: Document wraps a PdfDocument, which wraps the low-level writer
        this.document = new Document(new PdfDocument(new PdfWriter(output)));
    }

    @Override
    public void writeHeader(List<String> columns) {
        // Add report title
        document.add(new Paragraph("Payment Report")
            .setFontSize(18)
            .setBold());

        document.add(new Paragraph("Generated: " + LocalDateTime.now()));

        // Create table
        table = new Table(columns.size());
        for (String column : columns) {
            table.addHeaderCell(new Cell()
                .add(new Paragraph(column).setBold())
                .setBackgroundColor(ColorConstants.LIGHT_GRAY));
        }
    }

    @Override
    public void writeRow(ReportRow row) {
        for (Object value : row.getValues()) {
            table.addCell(new Cell().add(new Paragraph(String.valueOf(value))));
        }
    }

    @Override
    public void finish() {
        document.add(table);
        document.close();
    }
}

Scheduled Reports

Scheduler

@Component
public class ReportScheduler {

    private final ReportRequestRepository reportRepository;
    private final ReportJobQueue jobQueue;
    private final ObjectMapper objectMapper;

    @Scheduled(fixedDelay = 60000)  // Check every minute
    public void checkScheduledReports() {
        List<ReportRequest> dueReports = reportRepository.findScheduledDue(Instant.now());

        for (ReportRequest template : dueReports) {
            // Create new report instance from template
            ReportRequest newReport = ReportRequest.builder()
                .accountId(template.getAccountId())
                .reportType(template.getReportType())
                .parameters(calculateParameters(template))
                .format(template.getFormat())
                .emailRecipients(template.getEmailRecipients())
                .status(ReportStatus.PENDING)
                .build();

            reportRepository.save(newReport);
            jobQueue.enqueue(newReport);

            // Update next run time
            template.setNextRunAt(calculateNextRun(template.getScheduleCron()));
            reportRepository.save(template);
        }
    }

    private JsonNode calculateParameters(ReportRequest template) {
        // For scheduled reports, calculate date range relative to now
        // e.g., "last 7 days", "last month"
        JsonNode params = template.getParameters().deepCopy();

        if (params.has("relative_date_range")) {
            String relative = params.get("relative_date_range").asText();
            DateRange absolute = calculateAbsoluteDateRange(relative);
            ((ObjectNode) params).set("date_range", objectMapper.valueToTree(absolute));
        }

        return params;
    }
}

Cron Configuration

# Example schedule configurations
schedules:
  daily:
    cron: "0 6 * * *"           # 6 AM daily
    relative_date_range: "yesterday"

  weekly:
    cron: "0 6 * * MON"         # 6 AM every Monday
    relative_date_range: "last_7_days"

  monthly:
    cron: "0 6 1 * *"           # 6 AM on 1st of month
    relative_date_range: "last_month"

  quarterly:
    cron: "0 6 1 1,4,7,10 *"    # 6 AM on 1st of Jan, Apr, Jul, Oct
    relative_date_range: "last_quarter"
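
The `relative_date_range` tokens above can be resolved to absolute bounds at enqueue time. A sketch with `java.time` (the `DateRange` record here is an assumption, not from the codebase):

```java
import java.time.LocalDate;

// Resolves relative_date_range tokens to absolute [start, end] bounds
public final class RelativeDateRange {

    public record DateRange(LocalDate start, LocalDate end) {}

    private RelativeDateRange() {}

    public static DateRange resolve(String relative, LocalDate today) {
        return switch (relative) {
            case "yesterday" -> new DateRange(today.minusDays(1), today.minusDays(1));
            case "last_7_days" -> new DateRange(today.minusDays(7), today.minusDays(1));
            case "last_month" -> {
                LocalDate firstOfThisMonth = today.withDayOfMonth(1);
                yield new DateRange(firstOfThisMonth.minusMonths(1), firstOfThisMonth.minusDays(1));
            }
            case "last_quarter" -> {
                // First day of the current quarter, then step back one quarter
                int quarterMonth = ((today.getMonthValue() - 1) / 3) * 3 + 1;
                LocalDate firstOfQuarter = LocalDate.of(today.getYear(), quarterMonth, 1);
                yield new DateRange(firstOfQuarter.minusMonths(3), firstOfQuarter.minusDays(1));
            }
            default -> throw new IllegalArgumentException("Unknown range: " + relative);
        };
    }
}
```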

File Storage & Downloads

Upload & Signed URL

@Service
public class ReportFileStorage {

    private final S3Client s3;
    private final S3Presigner s3Presigner;
    private final String bucket = "reports-bucket";

    public String uploadReport(ReportRequest request, InputStream data, long contentLength) {
        String key = buildKey(request);

        // InputStream.available() is not a reliable content length,
        // so the caller passes the known file size explicitly
        s3.putObject(PutObjectRequest.builder()
            .bucket(bucket)
            .key(key)
            .contentType(getContentType(request.getFormat()))
            .serverSideEncryption(ServerSideEncryption.AES256)
            .build(),
            RequestBody.fromInputStream(data, contentLength)
        );

        return key;
    }

        return key;
    }

    public String getDownloadUrl(ReportRequest request) {
        GetObjectRequest getRequest = GetObjectRequest.builder()
            .bucket(bucket)
            .key(request.getFileUrl())
            .build();

        // Generate pre-signed URL valid for 1 hour
        PresignedGetObjectRequest presigned = s3Presigner.presignGetObject(
            GetObjectPresignRequest.builder()
                .signatureDuration(Duration.ofHours(1))
                .getObjectRequest(getRequest)
                .build()
        );

        return presigned.url().toString();
    }

    private String buildKey(ReportRequest request) {
        return String.format("reports/%s/%s/%s.%s",
            request.getAccountId(),
            LocalDate.now(),
            request.getId(),
            request.getFormat()
        );
    }
}

Retention Policy

@Scheduled(cron = "0 0 2 * * *")  // 2 AM daily
public void cleanupExpiredReports() {
    List<ReportRequest> expired = reportRepository.findExpired(Instant.now());

    for (ReportRequest report : expired) {
        // Delete file from S3
        if (report.getFileUrl() != null) {
            fileStorage.delete(report.getFileUrl());
        }

        // Update record (keep metadata, clear file reference)
        report.setFileUrl(null);
        report.setStatus(ReportStatus.EXPIRED);
        reportRepository.save(report);
    }

    log.info("Cleaned up {} expired reports", expired.size());
}

Performance Optimization

Query Optimization

-- Use read replica for report queries
-- Use appropriate indexes

CREATE INDEX idx_payments_report ON payments(account_id, created_at, livemode)
    INCLUDE (amount, currency, status, customer_id);

-- Partition large tables by time
CREATE TABLE payments (
    ...
) PARTITION BY RANGE (created_at);

-- Pre-aggregate for common reports
CREATE MATERIALIZED VIEW daily_payment_summary AS
SELECT
    account_id,
    date_trunc('day', created_at) as date,
    currency,
    COUNT(*) as transaction_count,
    SUM(amount) as gross_volume,
    SUM(CASE WHEN status = 'succeeded' THEN amount ELSE 0 END) as net_volume
FROM payments
WHERE livemode = true
GROUP BY account_id, date_trunc('day', created_at), currency;

-- CONCURRENTLY requires a unique index on the materialized view
CREATE UNIQUE INDEX idx_daily_summary ON daily_payment_summary(account_id, date, currency);

REFRESH MATERIALIZED VIEW CONCURRENTLY daily_payment_summary;

Chunked Processing

public class ChunkedReportProcessor {

    private static final int CHUNK_SIZE = 50000;

    public void processLargeReport(ReportRequest request, ReportWriter writer) {
        long offset = 0;
        long total = countTotalRows(request);

        while (offset < total) {
            // Process in chunks
            List<ReportRow> chunk = fetchChunk(request, offset, CHUNK_SIZE);
            if (chunk.isEmpty()) {
                break;  // Guard against rows deleted since the initial count
            }

            for (ReportRow row : chunk) {
                writer.writeRow(row);
            }

            offset += chunk.size();

            // Update progress
            int progress = (int) ((offset * 100) / total);
            updateProgress(request.getId(), progress);
        }
    }
}
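
OFFSET-based chunking degrades as the offset grows, because the database must scan and discard all prior rows on every fetch. Keyset (cursor) pagination keeps each chunk fetch cheap by starting strictly after the last key seen. A self-contained sketch, with an in-memory list standing in for the SQL query `WHERE id > :lastId ORDER BY id LIMIT :n`:

```java
import java.util.List;

// Keyset pagination sketch: each fetch starts strictly after the last seen key,
// mirroring SQL "WHERE id > :lastId ORDER BY id LIMIT :n"
public final class KeysetPager {

    private KeysetPager() {}

    public static List<Long> fetchAfter(List<Long> sortedIds, long lastId, int limit) {
        return sortedIds.stream()
            .filter(id -> id > lastId)
            .limit(limit)
            .toList();
    }

    /** Drains all rows chunk by chunk, returning the total rows visited. */
    public static long drain(List<Long> sortedIds, int chunkSize) {
        long lastId = Long.MIN_VALUE;
        long count = 0;
        while (true) {
            List<Long> chunk = fetchAfter(sortedIds, lastId, chunkSize);
            if (chunk.isEmpty()) {
                break;
            }
            count += chunk.size();
            lastId = chunk.get(chunk.size() - 1); // cursor advances to last key seen
        }
        return count;
    }
}
```

The trade-off: keyset pagination requires a stable sort key (here `id`) and cannot jump to an arbitrary page, which is fine for sequential report generation.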

API Design

Report Request Endpoint

POST /v1/reports
Content-Type: application/json

{
    "report_type": "payments",
    "parameters": {
        "date_range": {
            "start": "2024-01-01",
            "end": "2024-01-31"
        },
        "filters": {
            "status": ["succeeded", "refunded"],
            "currency": "usd"
        },
        "columns": ["id", "amount", "status", "created", "customer_email"]
    },
    "format": "csv",
    "email_recipients": ["[email protected]"]
}

Response

{
    "id": "report_abc123",
    "object": "report",
    "report_type": "payments",
    "status": "pending",
    "progress": 0,
    "created_at": 1705312800,
    "parameters": { ... }
}

Check Status

GET /v1/reports/report_abc123
{
    "id": "report_abc123",
    "object": "report",
    "report_type": "payments",
    "status": "completed",
    "progress": 100,
    "download_url": "https://...",
    "file_size": 1048576,
    "row_count": 15234,
    "created_at": 1705312800,
    "completed_at": 1705312920,
    "expires_at": 1705917720
}

Monitoring

Key Metrics

Metric                  Description            Alert Threshold
Report generation time  P95 time to complete   > 10 minutes
Queue depth             Pending reports        > 100
Failure rate            % of failed reports    > 5%
Worker utilization      CPU/memory usage       > 80%

Dashboard

Monitoring Dashboard


Technology Choices

Component        Technology Options
Job Queue        Kafka, SQS, RabbitMQ
Data Source      PostgreSQL replicas, ClickHouse
File Storage     S3, GCS, Azure Blob
PDF Generation   iText, Apache PDFBox
CSV Processing   Apache Commons CSV
Scheduling       Kubernetes CronJob, Quartz

Interview Discussion Points

  1. How do you handle very large reports (millions of rows)?
     Streaming, chunked processing, file compression, async generation.

  2. How do you ensure data accuracy in reports?
     Use read replicas with consistent snapshots and validation checksums.

  3. How do you handle concurrent report requests?
     Job queue with priority, worker pool, rate limiting per account.

  4. How do you optimize query performance?
     Pre-aggregated tables, proper indexes, query optimization, OLAP database.

  5. How do you handle report failures?
     Retry logic, dead letter queue, partial results, error notifications.

  6. How do you manage file storage costs?
     Retention policies, tiered storage, compression.
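
Report failure handling usually distinguishes transient errors, which are retried with backoff, from permanent ones, which go straight to a dead letter queue. A sketch of that decision (the classification of errors and the attempt cap are illustrative):

```java
// Decides whether a failed report job should be retried or dead-lettered
public final class FailurePolicy {

    public enum Action { RETRY, DEAD_LETTER }

    private static final int MAX_ATTEMPTS = 3;

    private FailurePolicy() {}

    public static Action decide(int attempt, boolean transientError) {
        // Permanent errors (bad parameters, missing account) never succeed on retry
        if (!transientError) {
            return Action.DEAD_LETTER;
        }
        // Transient errors (timeouts, replica lag) retry up to the cap
        return attempt < MAX_ATTEMPTS ? Action.RETRY : Action.DEAD_LETTER;
    }
}
```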