Payment Processing Architecture: Lessons Learned
After building payment systems that process millions of transactions daily, I've learned that successful payment architecture is 80% about handling failures gracefully and 20% about the happy path. This post shares the key architectural patterns and lessons learned from building resilient payment systems.
The Payment System Landscape
Modern payment processing involves multiple parties:
- Payment Service Providers (PSPs): Stripe, Adyen, Square
- Card Networks: Visa, Mastercard, American Express
- Acquiring Banks: Process merchant transactions
- Issuing Banks: Issue cards to consumers
- Regulators and standards bodies: PCI Security Standards Council (PCI DSS), regional financial authorities
Each of these introduces potential failure points, latency, and complexity.
Core Architecture Patterns
1. Command Query Responsibility Segregation (CQRS)
Separate your read and write models for payment data:
// Command side - optimized for writes
interface PaymentCommand {
  execute(command: ProcessPaymentCommand): Promise<PaymentResult>;
}
class PaymentProcessor implements PaymentCommand {
  async execute(command: ProcessPaymentCommand): Promise<PaymentResult> {
    // Validate payment
    // Process through PSP
    // Store events
    // Return result
  }
}
// Query side - optimized for reads
interface PaymentQuery {
  getPaymentStatus(paymentId: string): Promise<PaymentStatus>;
  getPaymentHistory(accountId: string): Promise<PaymentHistory>;
}
class PaymentQueryService implements PaymentQuery {
  // Read from optimized read models
  // Aggregate data from multiple sources
  // Cache frequently accessed data
}
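To make the split concrete, here is a minimal in-memory sketch rather than a production design. The write side appends events to a log, and a projection folds those events into the small read model that status queries need. The names `PaymentLogEvent`, `InMemoryEventLog`, and `StatusProjection` are hypothetical and exist only for this illustration.

```typescript
// Hypothetical event shape; a real system would carry more fields.
type PaymentLogEvent =
  | { type: "PaymentInitiated"; paymentId: string; amountMinor: number }
  | { type: "PaymentSucceeded"; paymentId: string }
  | { type: "PaymentFailed"; paymentId: string; reason: string };

type ReadModelStatus = "PROCESSING" | "SUCCEEDED" | "FAILED";

// Write side: an append-only log that pushes each new event to subscribers.
class InMemoryEventLog {
  private readonly events: PaymentLogEvent[] = [];
  private readonly subscribers: Array<(e: PaymentLogEvent) => void> = [];

  subscribe(handler: (e: PaymentLogEvent) => void): void {
    this.subscribers.push(handler);
  }

  append(entry: PaymentLogEvent): void {
    this.events.push(entry);
    this.subscribers.forEach((handler) => handler(entry));
  }
}

// Read side: keeps only what status lookups need.
class StatusProjection {
  private readonly statusById = new Map<string, ReadModelStatus>();

  apply(entry: PaymentLogEvent): void {
    switch (entry.type) {
      case "PaymentInitiated":
        this.statusById.set(entry.paymentId, "PROCESSING");
        break;
      case "PaymentSucceeded":
        this.statusById.set(entry.paymentId, "SUCCEEDED");
        break;
      case "PaymentFailed":
        this.statusById.set(entry.paymentId, "FAILED");
        break;
    }
  }

  getPaymentStatus(paymentId: string): ReadModelStatus | undefined {
    return this.statusById.get(paymentId);
  }
}

const eventLog = new InMemoryEventLog();
const statusProjection = new StatusProjection();
eventLog.subscribe((entry) => statusProjection.apply(entry));

eventLog.append({ type: "PaymentInitiated", paymentId: "pay_1", amountMinor: 1999 });
eventLog.append({ type: "PaymentSucceeded", paymentId: "pay_1" });
console.log(statusProjection.getPaymentStatus("pay_1")); // SUCCEEDED
```

In a real deployment the projection would usually be updated asynchronously from a durable log, so reads can lag writes slightly. That trade-off is the price of scaling the two sides independently.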
2. Asynchronous Processing
Never make external API calls synchronously in the request path:
class PaymentOrchestrator {
  async initiatePayment(request: PaymentRequest): Promise<PaymentInitiationResponse> {
    // 1. Immediate validation and response
    const paymentId = await this.validateAndCreatePayment(request);
    // 2. Async processing
    await this.messageQueue.publish('payment.process', {
      paymentId,
      request
    });
    // 3. Return immediately with tracking ID
    return {
      paymentId,
      status: 'PROCESSING',
      estimatedCompletion: new Date(Date.now() + 30000) // 30 seconds
    };
  }
  @MessageHandler('payment.process')
  async processPayment(message: PaymentProcessMessage) {
    try {
      const result = await this.pspClient.processPayment(message.request);
      await this.updatePaymentStatus(message.paymentId, result);
      // Report the actual outcome: a PSP response can be a decline, not only a success
      await this.notifyUser(message.paymentId, result.status);
    } catch (error) {
      await this.handlePaymentFailure(message.paymentId, error);
    }
  }
}
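The key property of this flow is that the caller sees `PROCESSING` before any worker has run. The sketch below demonstrates that ordering with a hypothetical in-process queue (`InMemoryQueue` is an assumption standing in for a real broker, and the worker does not call a PSP).

```typescript
type QueueMessage = { paymentId: string };
type QueueHandler = (message: QueueMessage) => Promise<void>;

// Hypothetical in-process stand-in for a message broker.
class InMemoryQueue {
  private readonly handlers = new Map<string, QueueHandler>();
  private readonly deliveries: Promise<void>[] = [];

  subscribe(topic: string, handler: QueueHandler): void {
    this.handlers.set(topic, handler);
  }

  async publish(topic: string, message: QueueMessage): Promise<void> {
    const handler = this.handlers.get(topic);
    if (!handler) throw new Error(`No handler registered for ${topic}`);
    // Deliver on a later tick so the publisher never waits for the consumer.
    this.deliveries.push(
      new Promise<void>((resolve, reject) => {
        setTimeout(() => handler(message).then(resolve, reject), 0);
      })
    );
  }

  // Demo helper: wait for every delivery published so far.
  async drain(): Promise<void> {
    await Promise.all(this.deliveries);
  }
}

async function demoAsyncFlow(): Promise<{ atReturn: string; afterWorker: string }> {
  const statuses = new Map<string, string>();
  const queue = new InMemoryQueue();
  queue.subscribe("payment.process", async ({ paymentId }) => {
    // A real worker would call the PSP here.
    statuses.set(paymentId, "SUCCEEDED");
  });

  // Request path: record the payment, enqueue the work, return.
  statuses.set("pay_1", "PROCESSING");
  await queue.publish("payment.process", { paymentId: "pay_1" });
  const atReturn = statuses.get("pay_1") ?? "UNKNOWN";

  // Later: the worker has picked up the message.
  await queue.drain();
  const afterWorker = statuses.get("pay_1") ?? "UNKNOWN";
  return { atReturn, afterWorker };
}

demoAsyncFlow().then((outcome) => console.log(outcome));
```

A real broker adds durability and redelivery, which is why the handler itself has to be idempotent.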
3. Idempotency
Payment operations must be idempotent to handle retries safely:
class IdempotentPaymentProcessor {
  async processPayment(
    idempotencyKey: string,
    paymentRequest: PaymentRequest
  ): Promise<PaymentResult> {
    // Check if we've already processed this request
    const existingResult = await this.getExistingResult(idempotencyKey);
    if (existingResult) {
      return existingResult;
    }
    // Process payment with distributed lock
    const lockKey = `payment:${idempotencyKey}`;
    return await this.withLock(lockKey, async () => {
      // Double-check after acquiring lock
      const result = await this.getExistingResult(idempotencyKey);
      if (result) return result;
      // Process the payment
      const newResult = await this.doProcessPayment(paymentRequest);
      // Store result with idempotency key
      await this.storeResult(idempotencyKey, newResult);
      return newResult;
    });
  }
}
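The check, lock, re-check sequence is easiest to trust once you see two concurrent requests with the same key collapse into one charge. The sketch below reproduces it in a single process. The promise-chain mutex and the `Map` store are assumptions standing in for a distributed lock and a database, and `InMemoryIdempotentProcessor` is a hypothetical name.

```typescript
type ChargeResult = { chargeId: string; amountMinor: number };

class InMemoryIdempotentProcessor {
  private readonly results = new Map<string, ChargeResult>();
  private readonly lockTails = new Map<string, Promise<void>>();
  chargeCount = 0;

  async processPayment(idempotencyKey: string, amountMinor: number): Promise<ChargeResult> {
    // Fast path: this key has already been processed.
    const existing = this.results.get(idempotencyKey);
    if (existing) return existing;

    return this.withLock(`payment:${idempotencyKey}`, async () => {
      // Re-check: another caller may have finished while we waited for the lock.
      const stored = this.results.get(idempotencyKey);
      if (stored) return stored;
      const result = await this.charge(amountMinor);
      this.results.set(idempotencyKey, result);
      return result;
    });
  }

  // Per-key mutex built by chaining promises. Single-process only, and the
  // map is never pruned, so this is a demo aid rather than a real lock.
  private async withLock<T>(lockKey: string, fn: () => Promise<T>): Promise<T> {
    const previous = this.lockTails.get(lockKey) ?? Promise.resolve();
    let release!: () => void;
    const current = new Promise<void>((resolve) => {
      release = resolve;
    });
    this.lockTails.set(lockKey, previous.then(() => current));
    await previous;
    try {
      return await fn();
    } finally {
      release();
    }
  }

  // Simulated PSP call with a little latency.
  private async charge(amountMinor: number): Promise<ChargeResult> {
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    this.chargeCount += 1;
    return { chargeId: `ch_${this.chargeCount}`, amountMinor };
  }
}

async function demoIdempotency(): Promise<{ sameResult: boolean; charges: number }> {
  const processor = new InMemoryIdempotentProcessor();
  const [first, second] = await Promise.all([
    processor.processPayment("order-42", 1999),
    processor.processPayment("order-42", 1999),
  ]);
  return { sameResult: first.chargeId === second.chargeId, charges: processor.chargeCount };
}

demoIdempotency().then((outcome) => console.log(outcome)); // { sameResult: true, charges: 1 }
```

Without the re-check inside the lock, the second caller would charge again as soon as the first released it.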
Failure Handling Strategies
1. Retry Logic with Exponential Backoff
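class PaymentRetryService {
  // Only wrap operations that are idempotent; retrying a non-idempotent charge can double-bill
  async processWithRetry<T>(
    operation: () => Promise<T>,
    retryConfig: RetryConfig = DEFAULT_RETRY_CONFIG
  ): Promise<T> {
    let lastError: unknown;
    for (let attempt = 1; attempt <= retryConfig.maxAttempts; attempt++) {
      try {
        return await operation();
      } catch (error) {
        lastError = error;
        // Give up on non-retryable errors or once attempts are exhausted
        if (!this.isRetryableError(error) || attempt === retryConfig.maxAttempts) {
          throw error;
        }
        const delay = this.calculateBackoff(attempt, retryConfig);
        await this.sleep(delay);
      }
    }
    // Only reached when maxAttempts < 1, i.e. the operation never ran
    throw lastError ?? new Error('maxAttempts must be at least 1');
  }
  private calculateBackoff(attempt: number, config: RetryConfig): number {
    const exponentialDelay = Math.pow(2, attempt - 1) * config.baseDelayMs;
    const jitter = Math.random() * config.jitterMs;
    return Math.min(exponentialDelay + jitter, config.maxDelayMs);
  }
  private isRetryableError(error: unknown): boolean {
    // Network timeouts, 5xx errors, etc.
    if (!(error instanceof Error)) return false;
    const status = (error as { status?: number }).status;
    return error.name === 'TimeoutError' ||
      error.name === 'NetworkError' ||
      (status !== undefined && status >= 500);
  }
  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}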
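To see what the backoff formula produces, here is the same calculation as a standalone function with the random source injectable so the output is deterministic. The config values (200 ms base, 5 s cap, no jitter) are illustrative assumptions, not recommendations.

```typescript
interface BackoffConfig {
  baseDelayMs: number;
  maxDelayMs: number;
  jitterMs: number;
}

// Same formula as calculateBackoff, with the random source passed in.
function backoffDelay(
  attempt: number,
  config: BackoffConfig,
  random: () => number = Math.random
): number {
  const exponentialDelay = Math.pow(2, attempt - 1) * config.baseDelayMs;
  const jitter = random() * config.jitterMs;
  return Math.min(exponentialDelay + jitter, config.maxDelayMs);
}

// Assumed example values; jitter is disabled to make the schedule readable.
const exampleBackoffConfig: BackoffConfig = { baseDelayMs: 200, maxDelayMs: 5000, jitterMs: 0 };
const backoffSchedule = [1, 2, 3, 4, 5, 6].map((attempt) =>
  backoffDelay(attempt, exampleBackoffConfig)
);
console.log(backoffSchedule); // [ 200, 400, 800, 1600, 3200, 5000 ]
```

The delay doubles each attempt until the cap takes over at attempt 6 (6400 ms is clamped to 5000 ms). In production you would keep jitter on so that many clients failing at once do not retry in lockstep.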
2. Dead Letter Queue Pattern
class PaymentDeadLetterHandler {
  @MessageHandler('payment.process.dlq')
  async handleFailedPayment(message: PaymentProcessMessage) {
    const payment = await this.paymentRepo.findById(message.paymentId);
    // Analyze failure pattern
    const failureAnalysis = await this.analyzeFailure(payment);
    switch (failureAnalysis.category) {
      case 'TEMPORARY_PSP_ISSUE':
        // Retry with different PSP
        await this.retryWithAlternatePSP(payment);
        break;
      case 'INSUFFICIENT_FUNDS':
        // Notify user and mark payment as failed
        await this.handleInsufficientFunds(payment);
        break;
      case 'FRAUD_DETECTED':
        // Escalate to fraud team
        await this.escalateToFraud(payment);
        break;
      default:
        // Manual investigation required
        await this.createManualReviewTask(payment);
    }
  }
}
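The handler above depends on `analyzeFailure` producing a category. One way that step might look is a pure function over the last recorded failure, as sketched below. The `FailureRecord` shape and the decline-code strings are hypothetical. Real PSPs each define their own codes, so treat the mapping as a placeholder.

```typescript
type FailureCategory =
  | "TEMPORARY_PSP_ISSUE"
  | "INSUFFICIENT_FUNDS"
  | "FRAUD_DETECTED"
  | "UNKNOWN";

// Hypothetical record of the last failed attempt.
interface FailureRecord {
  httpStatus?: number;
  declineCode?: string;
}

function categorizeFailure(failure: FailureRecord): FailureCategory {
  // Explicit decline reasons win over transport-level signals.
  if (failure.declineCode === "insufficient_funds") return "INSUFFICIENT_FUNDS";
  if (failure.declineCode === "suspected_fraud") return "FRAUD_DETECTED";
  // 5xx responses suggest the PSP itself is struggling.
  if (failure.httpStatus !== undefined && failure.httpStatus >= 500) {
    return "TEMPORARY_PSP_ISSUE";
  }
  // Anything else falls through to manual review.
  return "UNKNOWN";
}

console.log(categorizeFailure({ httpStatus: 503 })); // TEMPORARY_PSP_ISSUE
console.log(categorizeFailure({ httpStatus: 402, declineCode: "insufficient_funds" })); // INSUFFICIENT_FUNDS
console.log(categorizeFailure({})); // UNKNOWN
```

Keeping the classifier pure makes it cheap to unit test against a table of real failure samples as you collect them.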
Monitoring and Observability
Key Metrics to Track
class PaymentMetrics {
  // Business metrics
  @Counter('payments_total', ['status', 'psp', 'payment_method'])
  paymentsTotal = new Counter();
  // Amounts are recorded in the payment's own currency (see the label),
  // so the metric name must not imply USD unless you convert first
  @Histogram('payment_amount', ['currency'])
  paymentAmounts = new Histogram();
  @Gauge('payment_success_rate', ['psp'])
  successRate = new Gauge();
  // Technical metrics
  @Histogram('payment_processing_duration_ms', ['psp'])
  processingDuration = new Histogram();
  @Counter('payment_retries_total', ['reason'])
  retries = new Counter();
  @Gauge('active_payment_workers')
  activeWorkers = new Gauge();
  recordPayment(payment: Payment, duration: number) {
    this.paymentsTotal.inc({
      status: payment.status,
      psp: payment.psp,
      payment_method: payment.method
    });
    this.paymentAmounts.observe(
      { currency: payment.currency },
      payment.amount
    );
    this.processingDuration.observe(
      { psp: payment.psp },
      duration
    );
  }
}
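The `payment_success_rate` gauge has to be fed from somewhere. One option, sketched here under assumed names, is to derive it from per-PSP outcome counts rather than tracking it separately. `OutcomeCounts` and `successRates` are hypothetical helpers, not part of any metrics library.

```typescript
interface OutcomeCounts {
  succeeded: number;
  failed: number;
}

// Derive a success rate per PSP from raw outcome counts.
function successRates(countsByPsp: Map<string, OutcomeCounts>): Map<string, number> {
  const rates = new Map<string, number>();
  countsByPsp.forEach((counts, psp) => {
    const total = counts.succeeded + counts.failed;
    // Skip PSPs with no traffic instead of reporting a misleading 0% or 100%.
    if (total > 0) rates.set(psp, counts.succeeded / total);
  });
  return rates;
}

const exampleCounts = new Map<string, OutcomeCounts>([
  ["psp_a", { succeeded: 95, failed: 5 }],
  ["psp_b", { succeeded: 0, failed: 0 }],
]);
const exampleRates = successRates(exampleCounts);
console.log(exampleRates.get("psp_a")); // 0.95
console.log(exampleRates.has("psp_b")); // false
```

Many monitoring backends can compute this ratio at query time from the `payments_total` counter, which avoids maintaining a second source of truth.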
Alerting Strategy
class PaymentAlerting {
  // Critical alerts - page immediately
  @Alert({
    severity: 'CRITICAL',
    message: 'Payment success rate below 95%',
    condition: 'payment_success_rate < 0.95',
    duration: '2m'
  })
  lowSuccessRate() {}
  @Alert({
    severity: 'CRITICAL',
    message: 'Payment processing queue backing up',
    condition: 'payment_queue_depth > 1000',
    duration: '1m'
  })
  queueBacklog() {}
  // Warning alerts - investigate during business hours
  @Alert({
    severity: 'WARNING',
    message: 'High payment retry rate',
    condition: 'payment_retry_rate > 0.1',
    duration: '5m'
  })
  highRetryRate() {}
}
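The `duration` field is what keeps a single bad sample from paging someone: the condition has to hold continuously for that long. A minimal sketch of that evaluation follows. It assumes samples arrive sorted by timestamp, and `breachSustained` is a hypothetical helper rather than the API of any alerting system.

```typescript
interface MetricSample {
  timestampMs: number;
  value: number;
}

// True when `value < threshold` has held continuously for at least
// `durationMs`, judged from samples sorted by ascending timestamp.
function breachSustained(
  samples: MetricSample[],
  threshold: number,
  durationMs: number,
  nowMs: number
): boolean {
  let breachStartMs: number | undefined;
  for (const sample of samples) {
    if (sample.value < threshold) {
      // Remember when the current run of bad samples began.
      if (breachStartMs === undefined) breachStartMs = sample.timestampMs;
    } else {
      // A healthy sample resets the clock.
      breachStartMs = undefined;
    }
  }
  return breachStartMs !== undefined && nowMs - breachStartMs >= durationMs;
}

const MINUTE_MS = 60000;
const successRateSamples: MetricSample[] = [
  { timestampMs: 0, value: 0.99 },
  { timestampMs: 1 * MINUTE_MS, value: 0.93 },
  { timestampMs: 2 * MINUTE_MS, value: 0.92 },
  { timestampMs: 3 * MINUTE_MS, value: 0.94 },
];

// Below 0.95 since minute 1; at minute 3 that is two full minutes.
console.log(breachSustained(successRateSamples, 0.95, 2 * MINUTE_MS, 3 * MINUTE_MS)); // true
// At minute 2 the breach is only one minute old.
console.log(breachSustained(successRateSamples.slice(0, 3), 0.95, 2 * MINUTE_MS, 2 * MINUTE_MS)); // false
```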
Security Considerations
1. PCI DSS Compliance
// Never log sensitive payment data
class SecureLogger {
  logPaymentEvent(event: PaymentEvent) {
    const sanitized = {
      ...event,
      cardNumber: this.maskCardNumber(event.cardNumber),
      cvv: '[REDACTED]',
      expiryDate: '[REDACTED]'
    };
    this.logger.info('Payment event', sanitized);
  }
  private maskCardNumber(cardNumber: string): string {
    if (!cardNumber || cardNumber.length < 4) return '[REDACTED]';
    return '*'.repeat(cardNumber.length - 4) + cardNumber.slice(-4);
  }
}
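As a quick check of what actually reaches the log, here is the same masking logic as standalone functions applied to a sample event. The `REDACTED_FIELDS` deny-list and the flat string-only event shape are assumptions for the example, and the card number is a widely used test number, not a real one.

```typescript
function maskCardNumber(cardNumber: string): string {
  if (!cardNumber || cardNumber.length < 4) return "[REDACTED]";
  return "*".repeat(cardNumber.length - 4) + cardNumber.slice(-4);
}

// Assumed deny-list; extend it to match your own event schema.
const REDACTED_FIELDS = new Set(["cvv", "expiryDate"]);

function sanitizeForLog(fields: Record<string, string>): Record<string, string> {
  const sanitized: Record<string, string> = {};
  for (const [fieldName, fieldValue] of Object.entries(fields)) {
    if (fieldName === "cardNumber") {
      sanitized[fieldName] = maskCardNumber(fieldValue);
    } else if (REDACTED_FIELDS.has(fieldName)) {
      sanitized[fieldName] = "[REDACTED]";
    } else {
      sanitized[fieldName] = fieldValue;
    }
  }
  return sanitized;
}

const sanitizedEvent = sanitizeForLog({
  paymentId: "pay_1",
  cardNumber: "4242424242424242",
  cvv: "123",
  expiryDate: "12/30",
});
console.log(sanitizedEvent.cardNumber); // ************4242
console.log(sanitizedEvent.cvv); // [REDACTED]
```

A deny-list only protects the fields you remembered to list. For new event types, an allow-list of fields known to be safe is the more defensive default.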
2. Encryption at Rest and in Transit
class PaymentDataEncryption {
  async storePaymentData(data: SensitivePaymentData): Promise<void> {
    const encrypted = await this.encryption.encrypt(
      JSON.stringify(data),
      this.getEncryptionKey()
    );
    await this.database.store(encrypted);
  }
  async retrievePaymentData(id: string): Promise<SensitivePaymentData> {
    const encrypted = await this.database.retrieve(id);
    const decrypted = await this.encryption.decrypt(
      encrypted,
      this.getEncryptionKey()
    );
    return JSON.parse(decrypted);
  }
}
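The `encryption.encrypt` and `encryption.decrypt` calls above are left abstract. One possible implementation, sketched here with Node's built-in `crypto` module and AES-256-GCM, is shown below. The `EncryptedRecord` shape and the helper names are assumptions, and the randomly generated key is for demonstration only. A real system would fetch keys from a KMS or HSM and rotate them.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical storage shape: everything needed to decrypt except the key.
interface EncryptedRecord {
  iv: string;
  authTag: string;
  ciphertext: string;
}

function encryptJson(value: unknown, key: Buffer): EncryptedRecord {
  const iv = randomBytes(12); // 96-bit nonce; must be unique per record under a given key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(value), "utf8"),
    cipher.final(),
  ]);
  return {
    iv: iv.toString("base64"),
    authTag: cipher.getAuthTag().toString("base64"),
    ciphertext: ciphertext.toString("base64"),
  };
}

function decryptJson<T>(record: EncryptedRecord, key: Buffer): T {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(record.iv, "base64"));
  decipher.setAuthTag(Buffer.from(record.authTag, "base64"));
  // final() throws if the ciphertext or tag was tampered with.
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(record.ciphertext, "base64")),
    decipher.final(),
  ]);
  return JSON.parse(plaintext.toString("utf8")) as T;
}

// Demo key only; never generate and hold production keys like this.
const demoKey = randomBytes(32);
const storedRecord = encryptJson({ accountRef: "acct_123", last4: "4242" }, demoKey);
const restored = decryptJson<{ accountRef: string; last4: string }>(storedRecord, demoKey);
console.log(restored.accountRef); // acct_123
```

GCM is an authenticated mode, so a modified record fails loudly on decryption instead of yielding garbage that downstream code might act on.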
Testing Strategies
1. Chaos Engineering
class PaymentChaosTests {
  @Test('PSP timeout simulation')
  async testPSPTimeout() {
    // Simulate a slow PSP: 30 seconds of latency, longer than the client-side timeout
    this.mockPSP.setLatency(30000);
    const result = await this.paymentService.processPayment(testPayment);
    expect(result.status).toBe('FAILED');
    expect(result.errorCode).toBe('PSP_TIMEOUT');
  }
  @Test('Network partition simulation')
  async testNetworkPartition() {
    // Simulate network partition during payment processing
    this.networkSimulator.createPartition(['payment-service', 'psp']);
    const result = await this.paymentService.processPayment(testPayment);
    // Should gracefully handle partition
    expect(result.status).toBe('PENDING');
  }
}
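The timeout test only works if the service enforces a deadline shorter than the injected latency. Below is a self-contained sketch of that mechanism using `Promise.race`, scaled down to milliseconds so it runs quickly. `FakePsp`, `PspTimeoutError`, `withTimeout`, and `chargeWithTimeout` are hypothetical names, and mapping a timeout to `FAILED` mirrors the test above rather than prescribing a policy.

```typescript
class PspTimeoutError extends Error {
  constructor(timeoutMs: number) {
    super(`PSP did not respond within ${timeoutMs} ms`);
    this.name = "PspTimeoutError";
  }
}

// Rejects with PspTimeoutError if `promise` has not settled within `timeoutMs`.
async function withTimeout<T>(promise: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new PspTimeoutError(timeoutMs)), timeoutMs);
  });
  try {
    return await Promise.race([promise, deadline]);
  } finally {
    clearTimeout(timer);
  }
}

// Hypothetical PSP double with adjustable latency.
class FakePsp {
  private latencyMs = 0;

  setLatency(latencyMs: number): void {
    this.latencyMs = latencyMs;
  }

  async charge(): Promise<"SUCCEEDED"> {
    await new Promise<void>((resolve) => setTimeout(resolve, this.latencyMs));
    return "SUCCEEDED";
  }
}

async function chargeWithTimeout(
  psp: FakePsp,
  timeoutMs: number
): Promise<{ status: string; errorCode?: string }> {
  try {
    return { status: await withTimeout(psp.charge(), timeoutMs) };
  } catch (error) {
    if (error instanceof Error && error.name === "PspTimeoutError") {
      return { status: "FAILED", errorCode: "PSP_TIMEOUT" };
    }
    throw error;
  }
}

async function demoTimeout(): Promise<{ status: string; errorCode?: string }> {
  const psp = new FakePsp();
  psp.setLatency(50); // slower than the 10 ms deadline below
  return chargeWithTimeout(psp, 10);
}

demoTimeout().then((outcome) => console.log(outcome)); // { status: 'FAILED', errorCode: 'PSP_TIMEOUT' }
```

Note that the PSP call keeps running after the deadline fires, so the charge may still have gone through. A timeout is an unknown outcome, which is why reconciliation and idempotent retries matter more than the status returned here.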
Key Takeaways
- Design for Failure: Assume every external service will fail and design accordingly
- Asynchronous by Default: Never block user requests on external API calls
- Idempotency is Critical: All payment operations must be safely retryable
- Monitor Everything: Track both business and technical metrics
- Security First: PCI compliance isn't optional—build it in from day one
- Test Failure Scenarios: Use chaos engineering to validate your failure handling
Building payment systems is complex, but following these patterns and principles will help you build systems that are both robust and compliant. Remember: in payments, reliability and compliance are features, not afterthoughts.