Reporting System for Payment Platform¶
Problem Statement¶
Design a reporting system for a payment platform that generates financial reports, transaction summaries, and analytics exports. The system should handle on-demand reports, scheduled reports, and support various export formats while processing large datasets efficiently.
Requirements¶
Functional Requirements¶
- Financial report generation (revenue, payouts, fees)
- Transaction history exports
- Custom date range reports
- Scheduled/recurring reports
- Multiple export formats (CSV, PDF, Excel)
- Report filtering and customization
- Email delivery of reports
- Report history and re-download
Non-Functional Requirements¶
- Scalability: Handle reports with millions of rows
- Performance: Generate most reports within minutes
- Reliability: No data loss, accurate calculations
- Availability: 99.9% uptime for report downloads
- Storage: Efficient storage for generated reports
High-Level Architecture¶
Core Components¶
1. Report API Service¶
- Accepts report requests
- Validates parameters
- Creates report jobs
- Returns job status
- Provides download URLs
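A minimal sketch of this service, assuming Spring Web; `CreateReportDto`, the `X-Account-Id` header, and the repository/queue types are illustrative stand-ins for the authentication, persistence, and queueing pieces described elsewhere in this document:

import java.util.UUID;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/v1/reports")
public class ReportController {

    private final ReportRequestRepository reportRepository; // persists report_requests rows
    private final ReportJobQueue jobQueue;                  // hands jobs to the report workers

    public ReportController(ReportRequestRepository reportRepository, ReportJobQueue jobQueue) {
        this.reportRepository = reportRepository;
        this.jobQueue = jobQueue;
    }

    @PostMapping
    public ResponseEntity<ReportRequest> create(@RequestHeader("X-Account-Id") UUID accountId,
                                                @RequestBody CreateReportDto body) {
        // Validate parameters here (date range, allowed columns, format) before accepting the job
        ReportRequest report = ReportRequest.builder()
                .accountId(accountId)
                .reportType(body.getReportType())
                .parameters(body.getParameters())
                .format(body.getFormat())
                .emailRecipients(body.getEmailTo())
                .status(ReportStatus.PENDING)
                .build();
        reportRepository.save(report);
        jobQueue.enqueue(report);
        // 202 Accepted: generation is asynchronous; the body carries the job id and current status
        return ResponseEntity.accepted().body(report);
    }

    @GetMapping("/{id}")
    public ReportRequest status(@PathVariable UUID id) {
        // Clients poll this endpoint (or register a webhook) to learn when the file is ready
        return reportRepository.findById(id).orElseThrow();
    }
}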
2. Report Job Queue¶
- Decouples request from processing
- Handles backpressure
- Enables retry on failure
- Priority queuing
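A sketch of the enqueue side, assuming Spring Kafka. Kafka has no built-in message priority, so one common workaround (topic names illustrative) is a topic per priority tier, with on-demand reports routed ahead of scheduled ones and a separate worker pool subscribed to the high-priority topic:

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class ReportJobQueue {

    private final KafkaTemplate<String, ReportJobMessage> kafka; // assumes a JSON-serializing producer

    public ReportJobQueue(KafkaTemplate<String, ReportJobMessage> kafka) {
        this.kafka = kafka;
    }

    public void enqueue(ReportRequest report) {
        // Scheduled reports go to the normal topic; on-demand requests get the higher-priority topic
        String topic = report.isScheduled() ? "report-jobs" : "report-jobs-high";
        // Key by account so one account's burst of reports stays on a single partition
        // (ReportJobMessage is assumed to carry the report id read by the worker)
        kafka.send(topic, report.getAccountId().toString(), new ReportJobMessage(report.getId()));
    }
}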
3. Report Workers¶
- Process report jobs
- Query data sources
- Generate output files
- Handle large datasets with streaming
4. Data Sources¶
- Read replicas for OLTP data
- OLAP database for analytics
- Pre-aggregated tables
- Data warehouse integration
5. File Storage¶
- Store generated reports
- Signed URLs for downloads
- Retention policies
- Encryption at rest
6. Notification Service¶
- Email delivery
- Webhook notifications
- In-app notifications
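A sketch of the email path, assuming Spring's JavaMailSender; recipients come from the `email_recipients` column (mapped here as a `List<String>`), and the body links to a short-lived signed URL (see File Storage & Downloads) rather than attaching the potentially huge file:

import org.springframework.mail.SimpleMailMessage;
import org.springframework.mail.javamail.JavaMailSender;
import org.springframework.stereotype.Service;

@Service
public class ReportNotificationService {

    private final JavaMailSender mailSender;
    private final ReportFileStorage fileStorage;

    public ReportNotificationService(JavaMailSender mailSender, ReportFileStorage fileStorage) {
        this.mailSender = mailSender;
        this.fileStorage = fileStorage;
    }

    public void notifyCompleted(ReportRequest report) {
        if (report.getEmailRecipients() == null || report.getEmailRecipients().isEmpty()) {
            return;
        }
        SimpleMailMessage message = new SimpleMailMessage();
        message.setTo(report.getEmailRecipients().toArray(new String[0]));
        message.setSubject("Your " + report.getReportType() + " report is ready");
        // Link to a pre-signed download URL; the file itself stays in object storage
        message.setText("Download your report here: " + fileStorage.getDownloadUrl(report)
                + "\nThe link expires in 1 hour; the file is retained until " + report.getExpiresAt() + ".");
        mailSender.send(message);
    }
}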
Report Types¶
Data Models¶
Report Request¶
CREATE TABLE report_requests (
id UUID PRIMARY KEY,
account_id UUID NOT NULL,
-- Report definition
report_type VARCHAR(50) NOT NULL,
parameters JSONB NOT NULL,
/*
{
"date_range": {
"start": "2024-01-01",
"end": "2024-01-31"
},
"filters": {
"status": ["succeeded"],
"currency": "usd"
},
"columns": ["id", "amount", "created", "customer"],
"format": "csv"
}
*/
-- Output
format VARCHAR(20) NOT NULL, -- csv, pdf, xlsx
file_url VARCHAR(500),
file_size_bytes BIGINT,
row_count BIGINT,
-- Status
status VARCHAR(20) NOT NULL, -- pending, processing, completed, failed, expired
progress INT DEFAULT 0, -- 0-100
error_message TEXT,
-- Scheduling
scheduled BOOLEAN DEFAULT false,
schedule_cron VARCHAR(50),
next_run_at TIMESTAMP,
-- Delivery
email_recipients TEXT[],
webhook_url VARCHAR(500),
-- Timestamps
created_at TIMESTAMP NOT NULL,
started_at TIMESTAMP,
completed_at TIMESTAMP,
expires_at TIMESTAMP, -- When file will be deleted
-- Requestor
requested_by UUID, -- User ID
FOREIGN KEY (account_id) REFERENCES accounts(id)
);
CREATE INDEX idx_reports_account ON report_requests(account_id, created_at DESC);
CREATE INDEX idx_reports_status ON report_requests(status, created_at);
CREATE INDEX idx_reports_scheduled ON report_requests(scheduled, next_run_at)
WHERE scheduled = true;
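A sketch of the repository backing these indexes, assuming Spring Data JPA and a `ReportRequest` entity whose property names mirror the columns; `findScheduledDue` and `findExpired` are the lookups used by the scheduler and retention job later in this document:

import java.time.Instant;
import java.util.List;
import java.util.UUID;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

public interface ReportRequestRepository extends JpaRepository<ReportRequest, UUID> {

    // Report history for an account, served by idx_reports_account
    List<ReportRequest> findByAccountIdOrderByCreatedAtDesc(UUID accountId);

    // Scheduled reports whose next run is due, served by the partial idx_reports_scheduled
    @Query("SELECT r FROM ReportRequest r WHERE r.scheduled = true AND r.nextRunAt <= :now")
    List<ReportRequest> findScheduledDue(@Param("now") Instant now);

    // Reports whose file retention window has passed and still have a stored file
    @Query("SELECT r FROM ReportRequest r WHERE r.fileUrl IS NOT NULL AND r.expiresAt <= :now")
    List<ReportRequest> findExpired(@Param("now") Instant now);

    // Used for queue-depth monitoring (see Monitoring section)
    long countByStatus(ReportStatus status);
}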
Report Template¶
CREATE TABLE report_templates (
id UUID PRIMARY KEY,
account_id UUID NOT NULL,
-- Template definition
name VARCHAR(255) NOT NULL,
description TEXT,
report_type VARCHAR(50) NOT NULL,
default_parameters JSONB NOT NULL,
format VARCHAR(20) NOT NULL,
-- Scheduling
schedule_enabled BOOLEAN DEFAULT false,
schedule_cron VARCHAR(50),
email_recipients TEXT[],
-- Metadata
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP NOT NULL,
created_by UUID,
FOREIGN KEY (account_id) REFERENCES accounts(id)
);
Report Generation Flow¶
On-Demand Report¶
Polling vs Webhooks¶
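Clients can either poll the report status endpoint until the job reaches a terminal state, or supply a `webhook_url` and let the Notification Service call them when the file is ready, which avoids polling entirely for long-running reports. A minimal polling sketch, assuming the endpoint shape from the API Design section below and a fixed retry interval (production clients should back off and eventually give up):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ReportPoller {

    private final HttpClient http = HttpClient.newHttpClient();
    private final ObjectMapper mapper = new ObjectMapper();

    /** Polls GET /v1/reports/{id} until the report completes or fails, then returns the full payload. */
    public JsonNode waitForReport(String baseUrl, String apiKey, String reportId) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/v1/reports/" + reportId))
                .header("Authorization", "Bearer " + apiKey)
                .GET()
                .build();

        while (true) {
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            JsonNode report = mapper.readTree(response.body());
            String status = report.path("status").asText();
            if ("completed".equals(status) || "failed".equals(status)) {
                return report; // caller reads download_url or the error details
            }
            Thread.sleep(5_000); // fixed 5-second interval for the sketch
        }
    }
}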
Report Worker¶
Worker Implementation¶
@Component
public class ReportWorker {
private final ReportRequestRepository reportRepository;
private final DataSourceRouter dataSource;
private final ReportGeneratorFactory generatorFactory;
private final FileStorage fileStorage;
private final NotificationService notificationService;
@KafkaListener(topics = "report-jobs")
public void processReport(ReportJobMessage message) {
ReportRequest request = reportRepository.findById(message.getReportId()).orElseThrow();
try {
// Update status
request.setStatus(ReportStatus.PROCESSING);
request.setStartedAt(Instant.now());
reportRepository.save(request);
// Generate report
ReportResult result = generateReport(request);
// Upload to storage
String fileUrl = uploadReport(request, result);
// Update completion
request.setStatus(ReportStatus.COMPLETED);
request.setFileUrl(fileUrl);
request.setFileSizeBytes(result.getFileSize());
request.setRowCount(result.getRowCount());
request.setCompletedAt(Instant.now());
request.setExpiresAt(Instant.now().plus(Duration.ofDays(7)));
reportRepository.save(request);
// Notify
sendNotification(request);
} catch (Exception e) {
request.setStatus(ReportStatus.FAILED);
request.setErrorMessage(e.getMessage());
reportRepository.save(request);
// Send failure notification
sendFailureNotification(request, e);
}
}
private ReportResult generateReport(ReportRequest request) throws IOException {
ReportGenerator generator = generatorFactory.getGenerator(request.getReportType());
ReportConfig config = ReportConfig.from(request.getParameters());
// Stream data to a temp file to avoid holding the whole report in memory
Path tempFile = createTempFile();
try (OutputStream output = Files.newOutputStream(tempFile)) {
ReportWriter writer = createWriter(request.getFormat(), output);
// Write header
writer.writeHeader(generator.getColumns(config));
// Stream rows
long rowCount = 0;
try (Stream<ReportRow> rows = generator.streamData(request.getAccountId(), config)) {
Iterator<ReportRow> iterator = rows.iterator();
while (iterator.hasNext()) {
writer.writeRow(iterator.next());
rowCount++;
// Update progress periodically
if (rowCount % 10000 == 0) {
updateProgress(request, rowCount);
}
}
}
writer.finish();
// Carry the temp file, its size, and the row count back to the caller
return new ReportResult(tempFile, Files.size(tempFile), rowCount);
}
}
}
Streaming Large Reports¶
public class TransactionReportGenerator implements ReportGenerator {
private final NamedParameterJdbcTemplate jdbc;
@Override
public Stream<ReportRow> streamData(String accountId, ReportConfig config) {
// Bind the named parameters used in the query (ReportConfig accessor names are illustrative)
MapSqlParameterSource params = new MapSqlParameterSource()
.addValue("accountId", accountId)
.addValue("startDate", config.getStartDate())
.addValue("endDate", config.getEndDate())
.addValue("livemode", config.isLivemode());
// Use cursor-based streaming to avoid loading all data in memory
return jdbc.queryForStream(buildQuery(), params, (rs, rowNum) -> mapRow(rs));
}
private String buildQuery() {
return """
SELECT
p.id,
p.amount,
p.currency,
p.status,
p.created_at,
c.email as customer_email
FROM payments p
LEFT JOIN customers c ON p.customer_id = c.id
WHERE p.account_id = :accountId
AND p.created_at BETWEEN :startDate AND :endDate
AND p.livemode = :livemode
ORDER BY p.created_at DESC
""";
}
}
Export Formats¶
CSV Writer¶
public class CsvReportWriter implements ReportWriter {
private final CSVPrinter printer;
public CsvReportWriter(OutputStream output) throws IOException {
// The header row is written explicitly in writeHeader(), so the plain DEFAULT format suffices
this.printer = new CSVPrinter(
new OutputStreamWriter(output, StandardCharsets.UTF_8),
CSVFormat.DEFAULT
);
}
@Override
public void writeHeader(List<String> columns) throws IOException {
printer.printRecord(columns);
}
@Override
public void writeRow(ReportRow row) throws IOException {
printer.printRecord(row.getValues());
}
@Override
public void finish() throws IOException {
printer.flush();
printer.close();
}
}
PDF Writer¶
public class PdfReportWriter implements ReportWriter {
private final PdfWriter writer;
private final Document document;
private Table table;
public PdfReportWriter(OutputStream output) {
// iText 7: the layout Document wraps a PdfDocument, which writes through the low-level PdfWriter
this.writer = new PdfWriter(output);
this.document = new Document(new PdfDocument(writer));
}
@Override
public void writeHeader(List<String> columns) {
// Add report title
document.add(new Paragraph("Payment Report")
.setFontSize(18)
.setBold());
document.add(new Paragraph("Generated: " + LocalDateTime.now()));
// Create table
table = new Table(columns.size());
for (String column : columns) {
table.addHeaderCell(new Cell()
.add(new Paragraph(column).setBold())
.setBackgroundColor(ColorConstants.LIGHT_GRAY));
}
}
@Override
public void writeRow(ReportRow row) {
for (Object value : row.getValues()) {
table.addCell(new Cell().add(new Paragraph(String.valueOf(value))));
}
}
@Override
public void finish() {
document.add(table);
document.close();
}
}
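The xlsx format listed in the data model can be produced the same way. A sketch using Apache POI's streaming SXSSFWorkbook (POI is an assumption here; it is not part of the technology choices listed below), which keeps only a small window of rows in memory:

import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class XlsxReportWriter implements ReportWriter {

    private final OutputStream output;
    private final SXSSFWorkbook workbook = new SXSSFWorkbook(100); // keep 100 rows in memory, spill the rest to disk
    private final Sheet sheet = workbook.createSheet("Report");
    private int rowIndex = 0;

    public XlsxReportWriter(OutputStream output) {
        this.output = output;
    }

    @Override
    public void writeHeader(List<String> columns) {
        Row header = sheet.createRow(rowIndex++);
        for (int i = 0; i < columns.size(); i++) {
            header.createCell(i).setCellValue(columns.get(i));
        }
    }

    @Override
    public void writeRow(ReportRow row) {
        Row excelRow = sheet.createRow(rowIndex++);
        int col = 0;
        for (Object value : row.getValues()) {
            excelRow.createCell(col++).setCellValue(String.valueOf(value));
        }
    }

    @Override
    public void finish() throws IOException {
        workbook.write(output);
        workbook.dispose(); // remove the temporary files SXSSF spilled to disk
        workbook.close();
    }
}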
Scheduled Reports¶
Scheduler¶
@Component
public class ReportScheduler {
private final ReportRequestRepository reportRepository;
private final ReportJobQueue jobQueue;
private final ObjectMapper objectMapper;
@Scheduled(fixedDelay = 60000) // Check every minute
public void checkScheduledReports() {
List<ReportRequest> dueReports = reportRepository.findScheduledDue(Instant.now());
for (ReportRequest template : dueReports) {
// Create new report instance from template
ReportRequest newReport = ReportRequest.builder()
.accountId(template.getAccountId())
.reportType(template.getReportType())
.parameters(calculateParameters(template))
.format(template.getFormat())
.emailRecipients(template.getEmailRecipients())
.status(ReportStatus.PENDING)
.build();
reportRepository.save(newReport);
jobQueue.enqueue(newReport);
// Update next run time
template.setNextRunAt(calculateNextRun(template.getScheduleCron()));
reportRepository.save(template);
}
}
private JsonNode calculateParameters(ReportRequest template) {
// For scheduled reports, calculate date range relative to now
// e.g., "last 7 days", "last month"
JsonNode params = template.getParameters().deepCopy();
if (params.has("relative_date_range")) {
String relative = params.get("relative_date_range").asText();
DateRange absolute = calculateAbsoluteDateRange(relative);
((ObjectNode) params).set("date_range", objectMapper.valueToTree(absolute));
}
return params;
}
}
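A sketch of the `calculateAbsoluteDateRange` helper for the relative ranges used in the schedule configuration below; the `DateRange` shape (a start and end `LocalDate`) is assumed:

import java.time.LocalDate;
import java.time.temporal.IsoFields;

public final class RelativeDateRanges {

    // Matches the DateRange referenced by the scheduler above (shape assumed)
    public record DateRange(LocalDate start, LocalDate end) {}

    /** Resolves a relative range such as "last_7_days" against today's date. */
    public static DateRange calculateAbsoluteDateRange(String relative) {
        LocalDate today = LocalDate.now();
        return switch (relative) {
            case "yesterday" -> new DateRange(today.minusDays(1), today.minusDays(1));
            case "last_7_days" -> new DateRange(today.minusDays(7), today.minusDays(1));
            case "last_month" -> {
                LocalDate firstOfThisMonth = today.withDayOfMonth(1);
                yield new DateRange(firstOfThisMonth.minusMonths(1), firstOfThisMonth.minusDays(1));
            }
            case "last_quarter" -> {
                LocalDate firstOfThisQuarter = today.with(IsoFields.DAY_OF_QUARTER, 1);
                yield new DateRange(firstOfThisQuarter.minusMonths(3), firstOfThisQuarter.minusDays(1));
            }
            default -> throw new IllegalArgumentException("Unknown relative range: " + relative);
        };
    }
}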
Cron Configuration¶
# Example schedule configurations
schedules:
  daily:
    cron: "0 6 * * *"            # 6 AM daily
    relative_date_range: "yesterday"
  weekly:
    cron: "0 6 * * MON"          # 6 AM every Monday
    relative_date_range: "last_7_days"
  monthly:
    cron: "0 6 1 * *"            # 6 AM on 1st of month
    relative_date_range: "last_month"
  quarterly:
    cron: "0 6 1 1,4,7,10 *"     # 6 AM on 1st of Jan, Apr, Jul, Oct
    relative_date_range: "last_quarter"
File Storage & Downloads¶
Upload & Signed URL¶
@Service
public class ReportFileStorage {
private final S3Client s3;
private final S3Presigner s3Presigner;
private final String bucket = "reports-bucket";
public String uploadReport(ReportRequest request, Path file) {
String key = buildKey(request);
s3.putObject(PutObjectRequest.builder()
.bucket(bucket)
.key(key)
.contentType(getContentType(request.getFormat()))
.serverSideEncryption(ServerSideEncryption.AES256)
.build(),
// Uploading from the generated temp file gives the SDK an exact content length
RequestBody.fromFile(file)
);
return key;
}
public String getDownloadUrl(ReportRequest request) {
GetObjectRequest getRequest = GetObjectRequest.builder()
.bucket(bucket)
.key(request.getFileUrl())
.build();
// Generate pre-signed URL valid for 1 hour
PresignedGetObjectRequest presigned = s3Presigner.presignGetObject(
GetObjectPresignRequest.builder()
.signatureDuration(Duration.ofHours(1))
.getObjectRequest(getRequest)
.build()
);
return presigned.url().toString();
}
private String buildKey(ReportRequest request) {
return String.format("reports/%s/%s/%s.%s",
request.getAccountId(),
LocalDate.now(),
request.getId(),
request.getFormat()
);
}
}
Retention Policy¶
@Scheduled(cron = "0 0 2 * * *") // 2 AM daily
public void cleanupExpiredReports() {
List<ReportRequest> expired = reportRepository.findExpired(Instant.now());
for (ReportRequest report : expired) {
// Delete file from S3
if (report.getFileUrl() != null) {
fileStorage.delete(report.getFileUrl());
}
// Update record (keep metadata, clear file reference)
report.setFileUrl(null);
report.setStatus(ReportStatus.EXPIRED);
reportRepository.save(report);
}
log.info("Cleaned up {} expired reports", expired.size());
}
Performance Optimization¶
Query Optimization¶
-- Use read replica for report queries
-- Use appropriate indexes
CREATE INDEX idx_payments_report ON payments(account_id, created_at, livemode)
INCLUDE (amount, currency, status, customer_id);
-- Partition large tables by time
CREATE TABLE payments (
...
) PARTITION BY RANGE (created_at);
-- Pre-aggregate for common reports
CREATE MATERIALIZED VIEW daily_payment_summary AS
SELECT
account_id,
date_trunc('day', created_at) as date,
currency,
COUNT(*) as transaction_count,
SUM(amount) as gross_volume,
SUM(CASE WHEN status = 'succeeded' THEN amount ELSE 0 END) as net_volume
FROM payments
WHERE livemode = true
GROUP BY account_id, date_trunc('day', created_at), currency;
-- CONCURRENTLY requires a unique index on the materialized view
CREATE UNIQUE INDEX idx_daily_summary_key ON daily_payment_summary (account_id, date, currency);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_payment_summary;
Chunked Processing¶
public class ChunkedReportProcessor {
private static final int CHUNK_SIZE = 50000;
public void processLargeReport(ReportRequest request, ReportWriter writer) throws IOException {
long offset = 0;
long total = countTotalRows(request);
while (offset < total) {
// Process in chunks
List<ReportRow> chunk = fetchChunk(request, offset, CHUNK_SIZE);
if (chunk.isEmpty()) {
break; // guard against looping forever if rows disappear after counting
}
for (ReportRow row : chunk) {
writer.writeRow(row);
}
offset += chunk.size();
// Update progress
int progress = (int) ((offset * 100) / total);
updateProgress(request.getId(), progress);
}
}
}
API Design¶
Report Request Endpoint¶
POST /v1/reports
Content-Type: application/json
{
"report_type": "payments",
"parameters": {
"date_range": {
"start": "2024-01-01",
"end": "2024-01-31"
},
"filters": {
"status": ["succeeded", "refunded"],
"currency": "usd"
},
"columns": ["id", "amount", "status", "created", "customer_email"]
},
"format": "csv",
"email_to": ["[email protected]"]
}
Response¶
{
"id": "report_abc123",
"object": "report",
"report_type": "payments",
"status": "pending",
"progress": 0,
"created_at": 1705312800,
"parameters": { ... }
}
Check Status¶
{
"id": "report_abc123",
"object": "report",
"report_type": "payments",
"status": "completed",
"progress": 100,
"download_url": "https://...",
"file_size": 1048576,
"row_count": 15234,
"created_at": 1705312800,
"completed_at": 1705312920,
"expires_at": 1705917720
}
Monitoring¶
Key Metrics¶
| Metric | Description | Alert Threshold |
|---|---|---|
| Report generation time | P95 time to complete | > 10 minutes |
| Queue depth | Pending reports | > 100 |
| Failure rate | % of failed reports | > 5% |
| Worker utilization | CPU/memory usage | > 80% |
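One way to emit these metrics, assuming Micrometer; metric and tag names are illustrative, and queue depth is derived from the count of PENDING rows via the repository sketched earlier:

import java.time.Duration;

import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

@Component
public class ReportMetrics {

    private final MeterRegistry registry;
    private final Timer generationTimer;

    public ReportMetrics(MeterRegistry registry, ReportRequestRepository reportRepository) {
        this.registry = registry;
        // Report generation time; the alert fires on the P95 of this timer
        this.generationTimer = Timer.builder("report.generation.time")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(registry);
        // Queue depth: number of reports still waiting for a worker
        Gauge.builder("report.queue.depth", reportRepository,
                        repo -> repo.countByStatus(ReportStatus.PENDING))
                .register(registry);
    }

    public void recordCompletion(String reportType, Duration elapsed, boolean success) {
        generationTimer.record(elapsed);
        // Failure rate = failed completions / total completions per report type
        registry.counter("report.completed", "type", reportType, "outcome", success ? "success" : "failure")
                .increment();
    }
}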
Dashboard¶
Technology Choices¶
| Component | Technology Options |
|---|---|
| Job Queue | Kafka, SQS, RabbitMQ |
| Data Source | PostgreSQL replicas, ClickHouse |
| File Storage | S3, GCS, Azure Blob |
| PDF Generation | iText, Apache PDFBox |
| CSV Processing | Apache Commons CSV |
| Scheduling | Kubernetes CronJob, Quartz |
Interview Discussion Points¶
- How do you handle very large reports (millions of rows)?
  - Streaming, chunked processing, file compression, async generation
- How do you ensure data accuracy in reports?
  - Use read replicas with consistent snapshots, validation checksums
- How do you handle concurrent report requests?
  - Job queue with priority, worker pool, rate limiting per account
- How do you optimize query performance?
  - Pre-aggregated tables, proper indexes, query optimization, OLAP database
- How do you handle report failures?
  - Retry logic, dead letter queue, partial results, error notifications
- How do you manage file storage costs?
  - Retention policies, tiered storage, compression