E-Commerce Public Cloud Architecture (No Kubernetes) — Availability, Latency, Security, and Cost Optimization

Based on the diagram below, this is a reference architecture for running an e-commerce service on the public cloud without Kubernetes. The goals are fourfold: high availability, low latency, strong security, and cost optimization—especially under burst traffic (e.g., flash sales). The examples assume AWS.


1) High-Level Overview

Network

  • A single VPC with public/private subnet pairs across multiple AZs.
  • Public subnets: ALB, NAT Gateway, optional Bastion.
    Private subnets: EC2 (ASG), RDS Multi-AZ, internal services, VPC Endpoints.
  • Site-to-Site VPN from the on-prem management network to a VGW (Virtual Private Gateway). Ops/DB admin traffic stays on private paths only.

Traffic Entry

  • CloudFront (CDN) in front with AWS WAF.
  • Static assets use S3 as the origin. Dynamic requests route to ALB (EC2 app) as the origin.
  • Selected APIs (webhooks/event-driven) take a serverless side path: API Gateway → Lambda → SQS/DynamoDB to absorb bursts.
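
As a minimal sketch of the Lambda side of that burst path (the queue URL comes from a QUEUE_URL environment variable; all names here are placeholders, not part of the original design):

    import json
    import os

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = os.environ["QUEUE_URL"]  # placeholder: URL of the buffering SQS queue

    def handler(event, context):
        """API Gateway (proxy) -> Lambda: parse the payload, enqueue it, return fast."""
        body = json.loads(event.get("body") or "{}")
        # Enqueue the raw order/webhook payload; the heavy work happens in the consumers.
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps(body),
            MessageAttributes={
                "source": {"DataType": "String", "StringValue": "webhook"}
            },
        )
        return {"statusCode": 202, "body": json.dumps({"accepted": True})}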

Application

  • ALB → Auto Scaling Group (EC2). Artifact/image delivery via ECR, CodeDeploy, or SSM; any standard deployment method works.
  • Stateful data in RDS Multi-AZ (MySQL/PostgreSQL). Add read replicas when read traffic spikes.

Security & Observability

  • WAF, GuardDuty, Macie, Inspector, Security Hub.
  • CloudTrail, CloudWatch, VPC Flow Logs, AWS Config, and Systems Manager (SSM).

Endpoints

  • Use Gateway/Interface VPC Endpoints (S3, DynamoDB, CloudWatch Logs, SSM, ECR, etc.) so internal calls avoid NAT and reduce cost.
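
For instance, a Gateway endpoint for S3 can be attached to the private route tables roughly like this (region, VPC, and route-table IDs are placeholders):

    import boto3

    ec2 = boto3.client("ec2", region_name="ap-northeast-2")  # placeholder region

    # Gateway endpoint for S3: no hourly or per-GB endpoint charge, and S3 traffic
    # from the private subnets no longer needs the NAT Gateway.
    ec2.create_vpc_endpoint(
        VpcId="vpc-0123456789abcdef0",                    # placeholder VPC
        ServiceName="com.amazonaws.ap-northeast-2.s3",
        VpcEndpointType="Gateway",
        RouteTableIds=["rtb-aaaa1111", "rtb-bbbb2222"],   # private route tables
    )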

2) Request Flows (Two Paths)

2-1. Static / Cacheable Content

Client → CloudFront (+WAF) → S3 (static) | ALB (cacheable dynamic)

  • CloudFront caching, compression, HTTP/2/3, TLS 1.3, and Origin Shield reduce TTFB and origin load.
  • Serve images/JS/CSS with long-lived cache headers and hashed filenames; with Origin Shield, even cache misses from regional edges are largely absorbed before reaching the origin.
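
A sketch of how those hashed static assets might be pushed to the S3 origin with long-lived cache headers (bucket and file names are illustrative):

    import boto3

    s3 = boto3.client("s3")

    # Hashed filename => safe to cache "forever"; a new deploy ships a new hash.
    s3.upload_file(
        "dist/app.3f9c2b.js",
        "example-static-assets-bucket",        # placeholder bucket behind CloudFront
        "assets/app.3f9c2b.js",
        ExtraArgs={
            "ContentType": "application/javascript",
            "CacheControl": "public, max-age=31536000, immutable",
        },
    )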

2-2. Dynamic / Event-Driven

  1. Regular dynamic: Client → CloudFront → ALB → EC2 (ASG) → RDS
  2. Burst / async: Client or external webhook → API Gateway → Lambda → SQS (queue) → backend consumers (EC2 or Lambda) → DynamoDB/RDS
  • SQS buffers spikes (sales, pushes, batch settlements). Consumers scale out horizontally and drain safely.
  • Heavy read pages (rankings/inventory) can move to DynamoDB + DAX (optional).
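
A minimal consumer sketch for path 2 above (queue URL and table name are placeholders; a real worker would add error handling and graceful shutdown):

    import json

    import boto3

    sqs = boto3.client("sqs")
    orders = boto3.resource("dynamodb").Table("orders")   # placeholder table
    QUEUE_URL = "https://sqs.ap-northeast-2.amazonaws.com/123456789012/orders"  # placeholder

    def drain_once():
        """Pull up to 10 messages, persist them, delete only on success."""
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            order = json.loads(msg["Body"])
            orders.put_item(Item=order)        # overwrite-by-key keeps retries idempotent
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])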

3) Availability Design

  • Multi-AZ everywhere: ALB, distributed ASG, and RDS Multi-AZ (synchronous standby).
  • Health checks & auto-recovery: ALB targets, EC2 status checks, automatic RDS failover.
  • NAT Gateway per AZ, so the loss of one AZ doesn't take out egress for the others and traffic avoids cross-AZ hops.
    If cost is a concern, reducing NAT dependence matters even more (see §6).
  • S3/CloudFront are inherently highly available. Route 53 can add health checks/failover records.
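
The Multi-AZ and health-check points above translate into ASG settings along these lines (the launch template, subnets, and target-group ARN are placeholders):

    import boto3

    autoscaling = boto3.client("autoscaling")

    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="web-asg",                        # placeholder
        LaunchTemplate={"LaunchTemplateName": "web-lt", "Version": "$Latest"},
        MinSize=2,
        MaxSize=12,
        DesiredCapacity=2,
        # Private app subnets in two AZs: instances are spread automatically.
        VPCZoneIdentifier="subnet-aaaa1111,subnet-cccc3333",
        TargetGroupARNs=[
            "arn:aws:elasticloadbalancing:ap-northeast-2:123456789012:targetgroup/web/0123456789abcdef"  # placeholder
        ],
        HealthCheckType="ELB",          # replace instances the ALB marks unhealthy
        HealthCheckGracePeriod=120,
    )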

4) Latency Optimization

  • CDN first: cache hit rate drives both performance and cost.
  • ALB + keep-alive: connection reuse, tuned timeouts; compression via CloudFront (gzip/Brotli).
  • Database is the slowest tier: cache reads and offload writes to async (SQS) to smooth peaks.
  • VPC Endpoints keep internal calls close and off NAT.
  • App autoscaling: step/predictive policies and pre-warm before early-morning sales.
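
For the pre-warm point, a scheduled action can raise the capacity floor before a known sale window; a sketch with illustrative names and times (cron is evaluated in the given time zone):

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Scale out ahead of a 09:30 UTC flash sale, then fall back afterwards.
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName="web-asg",          # placeholder
        ScheduledActionName="prewarm-flash-sale",
        Recurrence="0 9 * * *",
        TimeZone="Etc/UTC",
        MinSize=6,
        DesiredCapacity=8,
    )
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName="web-asg",
        ScheduledActionName="scale-back-after-sale",
        Recurrence="0 12 * * *",
        TimeZone="Etc/UTC",
        MinSize=2,
        DesiredCapacity=2,
    )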

5) Security Design

  • Edge: CloudFront WAF rules, rate limiting, IP allow/deny, bot filtering.
  • Network: strict public/private subnet separation; least-privilege security groups; DBs private only.
  • Identity/Audit: least-privilege IAM; CloudTrail enabled in all regions; immutable log retention in S3 (e.g., Object Lock).
  • Detection/Assessment: GuardDuty (threats), Inspector (vulns), Macie (sensitive data), unified in Security Hub.
  • Ops access: over the Site-to-Site VPN, gated by IAM (SSM sessions where possible).
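
As one example of least privilege, the app instances' role might be scoped to exactly the resources this design touches; a sketch with placeholder bucket, queue, and parameter names:

    import json

    import boto3

    iam = boto3.client("iam")

    # Placeholder resources: the app only ever needs these three permissions.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:GetObject"],
             "Resource": "arn:aws:s3:::example-static-assets-bucket/*"},
            {"Effect": "Allow", "Action": ["sqs:SendMessage"],
             "Resource": "arn:aws:sqs:ap-northeast-2:123456789012:orders"},
            {"Effect": "Allow", "Action": ["ssm:GetParameter"],
             "Resource": "arn:aws:ssm:ap-northeast-2:123456789012:parameter/app/*"},
        ],
    }

    iam.put_role_policy(
        RoleName="web-app-instance-role",       # placeholder EC2 instance role
        PolicyName="least-privilege-app-access",
        PolicyDocument=json.dumps(policy),
    )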

6) Cost Optimization — Especially for Bursts

The levers are NAT reduction, cache hit rate, async design, and pricing mix.

6-1. Cut NAT Spend

  • Use VPC Endpoints aggressively: S3/DynamoDB (Gateway) and CloudWatch Logs/SSM/ECR (Interface)
    → high-volume service traffic no longer traverses NAT (less NAT-processed volume, and Gateway endpoints add no data charge).
  • Minimize outbound calls. For third-party calls, batch/async through Lambda where possible.
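
A rough back-of-the-envelope for why this lever matters, using assumed, illustrative rates (NAT data processing around $0.045/GB, Gateway endpoints free; check current pricing for your region):

    # Hypothetical month: 40 TB of S3 traffic originating in private subnets.
    gb_per_month = 40 * 1024

    nat_data_processing = 0.045      # $/GB, illustrative NAT Gateway processing rate
    gateway_endpoint_rate = 0.0      # S3/DynamoDB Gateway endpoints add no data charge

    via_nat = gb_per_month * nat_data_processing
    via_endpoint = gb_per_month * gateway_endpoint_rate

    print(f"S3 via NAT:      ${via_nat:,.0f}/month")   # ~ $1,843
    print(f"S3 via endpoint: ${via_endpoint:,.0f}/month")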

6-2. Protect Origins with CDN

  • Static uses S3 origin; for dynamic, use short TTL + stale-while-revalidate patterns.
  • Image resizing/serving at the edge via Lambda@Edge (with CloudFront Functions for lightweight URL/header rewrites) reduces origin storage and data transfer.
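
For the short-TTL dynamic pattern, the origin only has to emit the right Cache-Control header; a sketch shown as a Lambda-proxy-style response (values are illustrative, and the CloudFront cache policy is assumed to honor these directives):

    import json

    def product_page_response(payload: dict) -> dict:
        """Build a proxy-style response that the edge can cache briefly and then
        serve stale while it revalidates in the background."""
        return {
            "statusCode": 200,
            "headers": {
                "Content-Type": "application/json",
                # Edge caches for 30 s, may serve stale for 60 s while refreshing.
                "Cache-Control": "public, s-maxage=30, stale-while-revalidate=60",
            },
            "body": json.dumps(payload),
        }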

6-3. Absorb Bursts with Queues

  • Orders/notifications/settlement/webhooks: API GW → Lambda → SQS to flatten spikes.
  • Consumers scale (ASG/Lambda) and prevent sudden DB pressure.
  • DynamoDB: on-demand capacity to absorb write bursts, plus TTL and careful GSI design to keep costs in check; cacheable reads via DAX.
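
A sketch of that DynamoDB setup: on-demand billing plus a TTL attribute so burst-time records expire on their own (table, key, and attribute names are placeholders):

    import boto3

    dynamodb = boto3.client("dynamodb")

    dynamodb.create_table(
        TableName="flash-sale-events",                       # placeholder
        AttributeDefinitions=[{"AttributeName": "event_id", "AttributeType": "S"}],
        KeySchema=[{"AttributeName": "event_id", "KeyType": "HASH"}],
        BillingMode="PAY_PER_REQUEST",    # on-demand: no capacity planning for bursts
    )
    dynamodb.get_waiter("table_exists").wait(TableName="flash-sale-events")

    # Items carry an epoch-seconds "expires_at" attribute; DynamoDB deletes them at no cost.
    dynamodb.update_time_to_live(
        TableName="flash-sale-events",
        TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
    )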

6-4. Compute Pricing Mix

  • Cover 50–70% steady load with Savings Plans/Reserved Instances.
  • Serve peaks with On-Demand plus async paths.
  • Spot isn’t ideal for front-end web (interruption risk). Use it for async consumers/batch only.
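
A toy calculation of the mix, with made-up numbers purely to show the shape of the trade-off (real Savings Plans discounts vary by term, family, and region):

    on_demand_rate = 0.10          # $/instance-hour, illustrative
    sp_discount = 0.30             # ~30% off for a 1-year Savings Plan, illustrative

    baseline_instances = 6         # steady 24/7 load covered by the Savings Plan
    peak_extra_instances = 14      # extra capacity during sale hours
    peak_hours_per_month = 60

    hours = 730
    baseline_cost = baseline_instances * hours * on_demand_rate * (1 - sp_discount)
    peak_cost = peak_extra_instances * peak_hours_per_month * on_demand_rate

    print(f"baseline (Savings Plan): ${baseline_cost:,.0f}/month")   # ~ $307
    print(f"peaks (On-Demand):       ${peak_cost:,.0f}/month")       # ~ $84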

6-5. RDS Cost/Performance Balance

  • Start with RDS Multi-AZ on general-purpose (gp3) storage. Add Provisioned IOPS (or Aurora I/O-Optimized) only when you truly hit IO limits.
  • Offload reads to read replicas and in-app/CloudFront caching to avoid DB over-sizing.
  • For critical write spikes, buffer via SQS and commit in batches—often the biggest saver.
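
A sketch of that buffer-then-batch pattern, assuming a generic DB-API connection obtained elsewhere (e.g., pymysql) and placeholder queue, table, and column names; a real worker would add retries and dead-letter handling (MySQL flavor shown):

    import json

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.ap-northeast-2.amazonaws.com/123456789012/orders"  # placeholder

    def commit_batch(conn) -> None:
        """Drain one SQS batch and commit it to RDS as a single transaction."""
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10,
                                   WaitTimeSeconds=10)
        messages = resp.get("Messages", [])
        if not messages:
            return
        rows = [(json.loads(m["Body"])["order_id"], m["Body"]) for m in messages]
        with conn.cursor() as cur:
            # One round trip and one commit for the whole batch instead of ten.
            cur.executemany(
                "INSERT IGNORE INTO orders (order_id, payload) VALUES (%s, %s)", rows
            )
        conn.commit()
        for m in messages:
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=m["ReceiptHandle"])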

6-6. Observability Spend

  • Log sampling, short hot retention, and lifecycle to Glacier.
  • Enable VPC Flow Logs only where needed and filter before storage.
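
The lifecycle-to-Glacier point might look like this for a central log bucket (bucket name and day counts are illustrative):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="example-central-logs",            # placeholder log bucket
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "logs-to-glacier",
                    "Filter": {"Prefix": "logs/"},
                    "Status": "Enabled",
                    # Keep 30 days hot, archive for roughly a year, then delete.
                    "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                    "Expiration": {"Days": 395},
                }
            ]
        },
    )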

7) Component Checklists

VPC/Subnets

  • At least two AZs (e.g., an A/C pair). Public subnets (NAT/ALB), private subnets (EC2/RDS/Endpoints).
  • Routing: private subnets prefer endpoints; use NAT only for true external egress.

CloudFront + WAF

  • Cache policies, compression, Origin Shield, managed rules + custom rate limits.

ALB + ASG (EC2)

  • Health checks, scale-out on CPU/latency/queue length, warm up before deployments.
  • Golden AMIs and minimal UserData to shorten instance boot time.

API Gateway + Lambda + SQS/DynamoDB (optional path)

  • Separate the burst path. Set DLQs/retries and idempotency keys.
  • DynamoDB on-demand with solid partition-key design.
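
The DLQ/retry item in code form (queue names, timeouts, and the max receive count are placeholders):

    import json

    import boto3

    sqs = boto3.client("sqs")

    dlq = sqs.create_queue(QueueName="orders-dlq")                       # placeholder
    dlq_arn = sqs.get_queue_attributes(
        QueueUrl=dlq["QueueUrl"], AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]

    # Main queue: after 5 failed receives a message moves to the DLQ for inspection.
    sqs.create_queue(
        QueueName="orders",
        Attributes={
            "VisibilityTimeout": "120",
            "RedrivePolicy": json.dumps(
                {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
            ),
        },
    )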

RDS Multi-AZ

  • Backups/snapshots, parameter groups, connection pooling, slow query metrics.
  • If splitting reads/writes, handle routing/driver configuration explicitly.
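
If reads are split out, keeping the routing explicit can be as simple as picking the endpoint per operation; a sketch with placeholder endpoints and driver (pooling, failover, and secret retrieval are left out):

    import pymysql   # placeholder driver; use whatever matches the RDS engine

    WRITER_HOST = "shop-db.cluster-xxxx.ap-northeast-2.rds.amazonaws.com"   # placeholder
    READER_HOST = "shop-db-replica.xxxx.ap-northeast-2.rds.amazonaws.com"   # placeholder

    def connect(read_only: bool = False):
        """Route writes to the primary and read-only queries to a replica."""
        return pymysql.connect(
            host=READER_HOST if read_only else WRITER_HOST,
            user="app", password="***", database="shop",   # from Secrets Manager in practice
            connect_timeout=3,
        )

    # Usage: reads that tolerate replica lag go to the reader.
    # with connect(read_only=True) as conn: ...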

Security/Operations

  • Least-privilege IAM. SSM sessions. Secrets in Secrets Manager/Parameter Store.
  • Pipe GuardDuty/Inspector/Macie into Security Hub; alert via CloudWatch Alarms.

8) Why “No Kubernetes”?

  • ALB + ASG plus serverless achieves strong availability and elasticity without extra platform complexity.
  • Queue-based async handles bursts more predictably and cheaply.
  • Standardized ops (SSM/Inspector/Config/CloudTrail) and clean cost visibility are simpler to maintain.

9) Growth Path (as Traffic Matures)

  • More reads: stronger global caching (CloudFront), DB replicas, or DynamoDB adoption.
  • Bigger write spikes: expand SQS, scale consumers, batch commits.
  • Lower DB latency / higher resilience: consider Aurora (incl. Serverless v2).
  • More global users: CloudFront regional optimizations; Route 53 latency-based routing.

Summary

  • Static via S3 + CloudFront, dynamic via ALB + ASG, and bursts via API GW + Lambda + SQS.
  • RDS Multi-AZ for write availability/consistency; read replicas/caches for read scale.
  • VPC Endpoints reduce NAT dependence; higher CDN cache cuts origin and transfer costs.
  • Use Savings Plans/RI for steady load; handle peaks with async + On-Demand.
  • Keep security strong at the edge (WAF) and with account-level detection/audit.

This design meets availability, latency, security, and cost goals while safely absorbing flash-sale–style burst traffic. Operations stay simple, costs remain predictable, and the scale path is clear.

ⓒ 2025 엉뚱한 녀석의 블로그 [quirky guy's Blog]. All rights reserved. Unauthorized copying or redistribution of the text and images is prohibited. When sharing, please include the original source link.

🛠 Last updated: 2025.10.14