Fintech · 2024-03-10 · 12 min read

Payment Processing Architecture: Lessons Learned

Deep dive into payment processing systems, reliability patterns, and handling failures at scale.

After building payment systems that process millions of transactions daily, I've learned that successful payment architecture is 80% about handling failures gracefully and 20% about the happy path. This post shares the key architectural patterns and lessons learned from building resilient payment systems.

The Payment System Landscape

Modern payment processing involves multiple parties:

  • Payment Service Providers (PSPs): Stripe, Adyen, Square
  • Card Networks: Visa, Mastercard, American Express
  • Acquiring Banks: Process merchant transactions
  • Issuing Banks: Issue cards to consumers
  • Regulators and standards bodies: regional financial authorities, the PCI Security Standards Council (PCI DSS)

Each of these introduces potential failure points, latency, and complexity.

Core Architecture Patterns

1. Command Query Responsibility Segregation (CQRS)

Separate your read and write models for payment data:

// Command side - optimized for writes
interface PaymentCommand {
  execute(command: ProcessPaymentCommand): Promise<PaymentResult>;
}

class PaymentProcessor implements PaymentCommand {
  async execute(command: ProcessPaymentCommand): Promise<PaymentResult> {
    // Validate payment
    // Process through PSP
    // Store events
    // Return result
  }
}

// Query side - optimized for reads
interface PaymentQuery {
  getPaymentStatus(paymentId: string): Promise<PaymentStatus>;
  getPaymentHistory(accountId: string): Promise<PaymentHistory>;
}

class PaymentQueryService implements PaymentQuery {
  // Read from optimized read models
  // Aggregate data from multiple sources
  // Cache frequently accessed data
}
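
One way the two sides might be connected is through events: the command side emits them, and a projection folds them into a denormalized read model. The sketch below assumes that setup. `PaymentEvent`, `PaymentView`, `PaymentReadModel`, and the in-memory `Map` are hypothetical names for illustration, not part of any particular framework.

```typescript
// Hypothetical event shapes emitted by the command side.
type PaymentEvent =
  | { type: "PaymentInitiated"; paymentId: string; accountId: string; amountMinor: number }
  | { type: "PaymentSucceeded"; paymentId: string }
  | { type: "PaymentFailed"; paymentId: string; reason: string };

interface PaymentView {
  paymentId: string;
  accountId: string;
  amountMinor: number; // amount in minor units (e.g. cents)
  status: "PROCESSING" | "SUCCEEDED" | "FAILED";
  failureReason?: string;
}

// Read model: a denormalized view kept current by applying events in order.
class PaymentReadModel {
  private readonly byId = new Map<string, PaymentView>();

  apply(event: PaymentEvent): void {
    switch (event.type) {
      case "PaymentInitiated":
        this.byId.set(event.paymentId, {
          paymentId: event.paymentId,
          accountId: event.accountId,
          amountMinor: event.amountMinor,
          status: "PROCESSING",
        });
        break;
      case "PaymentSucceeded": {
        const view = this.byId.get(event.paymentId);
        if (view) view.status = "SUCCEEDED";
        break;
      }
      case "PaymentFailed": {
        const view = this.byId.get(event.paymentId);
        if (view) {
          view.status = "FAILED";
          view.failureReason = event.reason;
        }
        break;
      }
    }
  }

  getPaymentStatus(paymentId: string): PaymentView["status"] | undefined {
    const view = this.byId.get(paymentId);
    return view ? view.status : undefined;
  }

  getPaymentHistory(accountId: string): PaymentView[] {
    return Array.from(this.byId.values()).filter((v) => v.accountId === accountId);
  }
}
```

In a real system the projection would typically be fed from a durable event log and would lag the write side slightly, so callers of the query side should expect eventual consistency.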

2. Asynchronous Processing

Avoid calling the PSP synchronously in the user-facing request path. Accept the request, enqueue the work, and return a tracking ID right away:

class PaymentOrchestrator {
  async initiatePayment(request: PaymentRequest): Promise<PaymentInitiationResponse> {
    // 1. Immediate validation and response
    const paymentId = await this.validateAndCreatePayment(request);

    // 2. Async processing
    await this.messageQueue.publish('payment.process', {
      paymentId,
      request
    });

    // 3. Return immediately with tracking ID
    return {
      paymentId,
      status: 'PROCESSING',
      estimatedCompletion: new Date(Date.now() + 30000) // 30 seconds
    };
  }

  @MessageHandler('payment.process')
  async processPayment(message: PaymentProcessMessage) {
    try {
      const result = await this.pspClient.processPayment(message.request);
      await this.updatePaymentStatus(message.paymentId, result);
      // Report the actual outcome: a PSP decline can come back as a result, not an exception
      await this.notifyUser(message.paymentId, result.status);
    } catch (error) {
      await this.handlePaymentFailure(message.paymentId, error);
    }
  }
}
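
Because the handler runs asynchronously, and most queues deliver messages at least once, a status update can arrive twice or after the payment has already reached a final state. A minimal sketch of a transition guard that a method like `updatePaymentStatus` could apply is shown below. The status names and the rule that `SUCCEEDED` and `FAILED` are terminal are assumptions for illustration; a system that retries through another PSP would need a richer state machine.

```typescript
type PaymentStatus = "PROCESSING" | "SUCCEEDED" | "FAILED";

// Assumed transition table: only PROCESSING may move to a final state.
const ALLOWED_TRANSITIONS: Record<PaymentStatus, readonly PaymentStatus[]> = {
  PROCESSING: ["SUCCEEDED", "FAILED"],
  SUCCEEDED: [],
  FAILED: [],
};

function nextStatus(current: PaymentStatus, requested: PaymentStatus): PaymentStatus {
  // A redelivered message asking for the state we are already in is a no-op.
  if (current === requested) return current;

  if (ALLOWED_TRANSITIONS[current].indexOf(requested) === -1) {
    throw new Error(`Illegal payment status transition: ${current} -> ${requested}`);
  }
  return requested;
}
```

Rejecting illegal transitions loudly, rather than silently overwriting, makes out-of-order messages visible in logs and metrics.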

3. Idempotency

Payment operations must be idempotent to handle retries safely:

class IdempotentPaymentProcessor {
  async processPayment(
    idempotencyKey: string,
    paymentRequest: PaymentRequest
  ): Promise<PaymentResult> {

    // Check if we've already processed this request
    const existingResult = await this.getExistingResult(idempotencyKey);
    if (existingResult) {
      return existingResult;
    }

    // Process payment with distributed lock
    const lockKey = `payment:${idempotencyKey}`;
    return await this.withLock(lockKey, async () => {

      // Double-check after acquiring lock
      const result = await this.getExistingResult(idempotencyKey);
      if (result) return result;

      // Process the payment
      const newResult = await this.doProcessPayment(paymentRequest);

      // Store result with idempotency key
      await this.storeResult(idempotencyKey, newResult);

      return newResult;
    });
  }
}
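
To make the effect concrete, here is a single-process sketch in which a `Map` of in-flight promises stands in for both the distributed lock and the durable result store. `InMemoryIdempotencyStore` is a hypothetical name, and the choice to forget failed attempts so the same key can be retried is an assumption; a production system would persist results and decide per error type whether a failure is final.

```typescript
class InMemoryIdempotencyStore<T> {
  private readonly results = new Map<string, Promise<T>>();

  // Runs `operation` at most once per key; concurrent and later callers
  // with the same key receive the same promise.
  run(key: string, operation: () => Promise<T>): Promise<T> {
    const existing = this.results.get(key);
    if (existing) return existing;

    const pending = operation().catch((error) => {
      // Assumption: failed attempts are forgotten so the caller may retry.
      this.results.delete(key);
      throw error;
    });
    this.results.set(key, pending);
    return pending;
  }
}

// Usage: two concurrent requests with the same key trigger one charge.
async function demoIdempotency(): Promise<number> {
  const store = new InMemoryIdempotencyStore<string>();
  let chargeCalls = 0;
  const charge = async () => {
    chargeCalls += 1;
    return "charged";
  };
  await Promise.all([store.run("order-42", charge), store.run("order-42", charge)]);
  return chargeCalls;
}
```

Storing the promise rather than the resolved value is what closes the race between the first check and the write: the second caller finds the in-flight work instead of starting its own.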

Failure Handling Strategies

1. Retry Logic with Exponential Backoff

class PaymentRetryService {
  async processWithRetry<T>(
    operation: () => Promise<T>,
    retryConfig: RetryConfig = DEFAULT_RETRY_CONFIG
  ): Promise<T> {
    let lastError: Error;

    for (let attempt = 1; attempt <= retryConfig.maxAttempts; attempt++) {
      try {
        return await operation();
      } catch (error) {
        lastError = error as Error; // catch variables are `unknown` under strict TypeScript settings

        if (!this.isRetryableError(error) || attempt === retryConfig.maxAttempts) {
          throw error;
        }

        const delay = this.calculateBackoff(attempt, retryConfig);
        await this.sleep(delay);
      }
    }

    throw lastError!;
  }

  private calculateBackoff(attempt: number, config: RetryConfig): number {
    const exponentialDelay = Math.pow(2, attempt - 1) * config.baseDelayMs;
    const jitter = Math.random() * config.jitterMs;
    return Math.min(exponentialDelay + jitter, config.maxDelayMs);
  }

  private isRetryableError(error: Error): boolean {
    // Network timeouts, 5xx errors, etc.
    return error.name === 'TimeoutError' ||
           error.name === 'NetworkError' ||
           (error as any).status >= 500;
  }
}
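
The retry service refers to `RetryConfig` and `DEFAULT_RETRY_CONFIG` without defining them. A plausible shape is sketched below, along with the backoff formula pulled out as a pure function so the schedule can be inspected. The numeric values are illustrative assumptions, not recommendations, and the injectable `random` parameter is added here only to make the output deterministic in tests.

```typescript
interface RetryConfig {
  maxAttempts: number;
  baseDelayMs: number;
  jitterMs: number;
  maxDelayMs: number;
}

// Illustrative values only; tune against your PSP's timeouts and rate limits.
const DEFAULT_RETRY_CONFIG: RetryConfig = {
  maxAttempts: 5,
  baseDelayMs: 200,
  jitterMs: 100,
  maxDelayMs: 5000,
};

// Same formula as calculateBackoff above: base * 2^(attempt - 1) plus jitter, capped.
function calculateBackoff(
  attempt: number,
  config: RetryConfig,
  random: () => number = Math.random
): number {
  const exponentialDelay = Math.pow(2, attempt - 1) * config.baseDelayMs;
  const jitter = random() * config.jitterMs;
  return Math.min(exponentialDelay + jitter, config.maxDelayMs);
}

// With jitter disabled, the waits after attempts 1..4 are 200, 400, 800, 1600 ms.
const schedule = [1, 2, 3, 4].map((attempt) =>
  calculateBackoff(attempt, DEFAULT_RETRY_CONFIG, () => 0)
);
```

Retrying a timed-out charge is only safe when the request carries an idempotency key, since a timeout does not tell you whether the PSP processed the original attempt.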

2. Dead Letter Queue Pattern

class PaymentDeadLetterHandler {
  @MessageHandler('payment.process.dlq')
  async handleFailedPayment(message: PaymentProcessMessage) {
    const payment = await this.paymentRepo.findById(message.paymentId);

    // Analyze failure pattern
    const failureAnalysis = await this.analyzeFailure(payment);

    switch (failureAnalysis.category) {
      case 'TEMPORARY_PSP_ISSUE':
        // Retry with different PSP
        await this.retryWithAlternatePSP(payment);
        break;

      case 'INSUFFICIENT_FUNDS':
        // Notify user and mark payment as failed
        await this.handleInsufficientFunds(payment);
        break;

      case 'FRAUD_DETECTED':
        // Escalate to fraud team
        await this.escalateToFraud(payment);
        break;

      default:
        // Manual investigation required
        await this.createManualReviewTask(payment);
    }
  }
}
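
The handler depends on `analyzeFailure` returning a category. A minimal sketch of that classification step follows, assuming failures have already been normalized into an HTTP status and a decline code. The `FailureRecord` shape and the decline code strings are hypothetical; each PSP has its own codes, which would need mapping first.

```typescript
type FailureCategory =
  | "TEMPORARY_PSP_ISSUE"
  | "INSUFFICIENT_FUNDS"
  | "FRAUD_DETECTED"
  | "UNKNOWN";

interface FailureRecord {
  httpStatus?: number;  // status of the last PSP response, if any
  declineCode?: string; // hypothetical normalized decline code
}

function categorizeFailure(failure: FailureRecord): FailureCategory {
  // Explicit decline reasons take priority over transport-level symptoms.
  if (failure.declineCode === "insufficient_funds") return "INSUFFICIENT_FUNDS";
  if (failure.declineCode === "suspected_fraud") return "FRAUD_DETECTED";

  // 5xx responses suggest the PSP itself is struggling.
  if (failure.httpStatus !== undefined && failure.httpStatus >= 500) {
    return "TEMPORARY_PSP_ISSUE";
  }

  // Anything else falls through to manual review in the switch above.
  return "UNKNOWN";
}
```

Keeping the classifier a pure function makes it cheap to unit test against recorded failures before wiring it into the queue consumer.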

Monitoring and Observability

Key Metrics to Track

class PaymentMetrics {
  // Business metrics
  @Counter('payments_total', ['status', 'psp', 'payment_method'])
  paymentsTotal = new Counter();

  @Histogram('payment_amount_usd', ['currency'])
  paymentAmounts = new Histogram();

  @Gauge('payment_success_rate', ['psp'])
  successRate = new Gauge();

  // Technical metrics
  @Histogram('payment_processing_duration_ms', ['psp'])
  processingDuration = new Histogram();

  @Counter('payment_retries_total', ['reason'])
  retries = new Counter();

  @Gauge('active_payment_workers')
  activeWorkers = new Gauge();

  recordPayment(payment: Payment, duration: number) {
    this.paymentsTotal.inc({
      status: payment.status,
      psp: payment.psp,
      payment_method: payment.method
    });

    this.paymentAmounts.observe(
      { currency: payment.currency },
      payment.amount
    );

    this.processingDuration.observe(
      { psp: payment.psp },
      duration
    );
  }
}
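
The `payment_success_rate` gauge has to be computed from something. One simple approach, sketched here under the assumption that recent outcomes are available as a list, is to derive a per-PSP ratio. `PaymentOutcome` and `successRateByPsp` are hypothetical names; many metrics backends can compute the same ratio from the `payments_total` counter at query time instead.

```typescript
interface PaymentOutcome {
  psp: string;
  status: "SUCCEEDED" | "FAILED";
}

// Returns succeeded / total for each PSP that has at least one outcome.
function successRateByPsp(outcomes: PaymentOutcome[]): Map<string, number> {
  const totals = new Map<string, { succeeded: number; total: number }>();

  for (const outcome of outcomes) {
    const entry = totals.get(outcome.psp) ?? { succeeded: 0, total: 0 };
    entry.total += 1;
    if (outcome.status === "SUCCEEDED") entry.succeeded += 1;
    totals.set(outcome.psp, entry);
  }

  const rates = new Map<string, number>();
  totals.forEach((entry, psp) => rates.set(psp, entry.succeeded / entry.total));
  return rates;
}
```

PSPs with no traffic are absent from the result rather than reported as 0 or 1, which avoids false alerts during quiet periods.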

Alerting Strategy

class PaymentAlerting {
  // Critical alerts - page immediately
  @Alert({
    severity: 'CRITICAL',
    message: 'Payment success rate below 95%',
    condition: 'payment_success_rate < 0.95',
    duration: '2m'
  })
  lowSuccessRate() {}

  @Alert({
    severity: 'CRITICAL',
    message: 'Payment processing queue backing up',
    condition: 'payment_queue_depth > 1000',
    duration: '1m'
  })
  queueBacklog() {}

  // Warning alerts - investigate during business hours
  @Alert({
    severity: 'WARNING',
    message: 'High payment retry rate',
    condition: 'payment_retry_rate > 0.1',
    duration: '5m'
  })
  highRetryRate() {}
}
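
The `duration` field means the condition must hold continuously before the alert fires, which filters out brief dips. A simplified sketch of that evaluation is below. `Sample` and `breachedForDuration` are hypothetical names, and the rule used here (every sample inside the window is below the threshold, and at least one sample exists) is an assumption; real alerting systems also account for missing data and evaluation intervals.

```typescript
interface Sample {
  timestampMs: number;
  value: number;
}

// True when every sample in [nowMs - durationMs, nowMs] is below the threshold.
function breachedForDuration(
  samples: Sample[],
  threshold: number,
  durationMs: number,
  nowMs: number
): boolean {
  const windowStart = nowMs - durationMs;
  const recent = samples.filter(
    (s) => s.timestampMs >= windowStart && s.timestampMs <= nowMs
  );
  // No data in the window is treated as "do not fire" in this sketch.
  return recent.length > 0 && recent.every((s) => s.value < threshold);
}
```

For the success-rate alert above, this would be called with a threshold of 0.95 and a duration of 120000 ms.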

Security Considerations

1. PCI DSS Compliance

// Never log sensitive payment data
class SecureLogger {
  logPaymentEvent(event: PaymentEvent) {
    const sanitized = {
      ...event,
      cardNumber: this.maskCardNumber(event.cardNumber),
      cvv: '[REDACTED]',
      expiryDate: '[REDACTED]'
    };

    this.logger.info('Payment event', sanitized);
  }

  private maskCardNumber(cardNumber: string): string {
    if (!cardNumber || cardNumber.length < 4) return '[REDACTED]';
    return '*'.repeat(cardNumber.length - 4) + cardNumber.slice(-4);
  }
}
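
The logger above sanitizes top-level fields only. Events often nest card data inside other objects, so a recursive pass can serve as a safety net. The sketch below assumes sensitive fields can be identified by key name; `SENSITIVE_KEYS` and `redact` are hypothetical, and the key list would need to match your own event schema.

```typescript
// Assumed set of field names that must never reach the logs.
const SENSITIVE_KEYS = new Set(["cardNumber", "cvv", "expiryDate"]);

// Returns a deep copy of `value` with sensitive fields replaced.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const copy: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      copy[key] = SENSITIVE_KEYS.has(key) ? "[REDACTED]" : redact(source[key]);
    }
    return copy;
  }
  // Primitives pass through unchanged.
  return value;
}
```

Name-based redaction cannot catch a card number pasted into a free-text field, so it complements, rather than replaces, keeping card data out of application events in the first place.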

2. Encryption at Rest and in Transit

class PaymentDataEncryption {
  async storePaymentData(data: SensitivePaymentData): Promise<void> {
    const encrypted = await this.encryption.encrypt(
      JSON.stringify(data),
      this.getEncryptionKey()
    );

    await this.database.store(encrypted);
  }

  async retrievePaymentData(id: string): Promise<SensitivePaymentData> {
    const encrypted = await this.database.retrieve(id);
    const decrypted = await this.encryption.decrypt(
      encrypted,
      this.getEncryptionKey()
    );

    return JSON.parse(decrypted);
  }
}
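
The `encryption` dependency is left abstract above. For the at-rest half, one possible implementation is authenticated encryption with AES-256-GCM from Node's built-in `crypto` module, sketched below. The function names and the payload layout (IV, then auth tag, then ciphertext) are assumptions made for this example; key storage and rotation, typically handled by a KMS or HSM, are out of scope here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_LENGTH = 12;  // 96-bit nonce, the size recommended for GCM
const TAG_LENGTH = 16; // default GCM authentication tag length in Node

// Layout of the returned buffer (an assumption of this sketch): iv | tag | ciphertext
function encryptField(plaintext: string, key: Buffer): Buffer {
  const iv = randomBytes(IV_LENGTH); // must be unique per encryption under the same key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
}

function decryptField(payload: Buffer, key: Buffer): string {
  const iv = payload.subarray(0, IV_LENGTH);
  const tag = payload.subarray(IV_LENGTH, IV_LENGTH + TAG_LENGTH);
  const ciphertext = payload.subarray(IV_LENGTH + TAG_LENGTH);

  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the ciphertext or tag has been tampered with.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

The key must be exactly 32 bytes for AES-256. An authenticated mode is used so that a modified record fails to decrypt instead of silently yielding corrupted data.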

Testing Strategies

1. Chaos Engineering

class PaymentChaosTests {
  @Test('PSP timeout simulation')
  async testPSPTimeout() {
    // Simulate PSP timeout
    this.mockPSP.setLatency(30000); // inject 30 s of latency to force a client-side timeout

    const result = await this.paymentService.processPayment(testPayment);

    expect(result.status).toBe('FAILED');
    expect(result.errorCode).toBe('PSP_TIMEOUT');
  }

  @Test('Network partition simulation')
  async testNetworkPartition() {
    // Simulate network partition during payment processing
    this.networkSimulator.createPartition(['payment-service', 'psp']);

    const result = await this.paymentService.processPayment(testPayment);

    // Should gracefully handle partition
    expect(result.status).toBe('PENDING');
  }
}
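
The timeout test only passes if the payment service enforces its own deadline on PSP calls. A minimal sketch of such a deadline wrapper follows. `withTimeout` and this `TimeoutError` class are hypothetical helpers written for the example; the error name matches the `'TimeoutError'` check in the retry logic earlier.

```typescript
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`Operation timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Rejects with TimeoutError if `operation` has not settled within `ms`.
function withTimeout<T>(operation: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
    operation.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}
```

Note that the wrapper stops waiting but does not cancel the underlying call. The PSP may still complete the charge, which is why a timed-out payment is better treated as unknown and reconciled later than assumed to have failed.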

Key Takeaways

  1. Design for Failure: Assume every external service will fail and design accordingly
  2. Asynchronous by Default: Never block user requests on external API calls
  3. Idempotency is Critical: All payment operations must be safely retryable
  4. Monitor Everything: Track both business and technical metrics
  5. Security First: PCI compliance isn't optional—build it in from day one
  6. Test Failure Scenarios: Use chaos engineering to validate your failure handling

Building payment systems is complex, but following these patterns and principles will help you build systems that are both robust and compliant. Remember: in payments, reliability and compliance are features, not afterthoughts.

Discussion (2)

Alex Chen

Great insights on distributed systems! The Saga pattern explanation was particularly helpful.

Sarah Williams

This article helped me understand event sourcing better. Do you have any recommendations for implementing this in Node.js?