Everyone’s building AI apps. And whether you actually need a GPU or just think you do, demand has exploded. 

Originally built for graphics rendering, GPUs have proven exceptionally effective for compute-hungry workloads like machine learning, AI, blockchain, and gaming. 

Luckily, running these workloads just got cheaper: AWS announced up to a 45% price reduction on its P4 and P5 instances (June 2025). In this guide, we’ll cover everything you need to know about GPU instances and how to optimize costs with them.

What Are GPU-Accelerated Workloads?

GPUs are built to process thousands of operations in parallel, making them ideal for workloads where throughput and concurrency matter more than single-thread speed. Unlike CPUs with a few powerful cores, GPUs contain thousands of smaller cores optimized for high-volume, repetitive tasks. This makes them essential for compute-heavy workloads like deep learning and simulations. As demand has surged, AWS has invested heavily—securing long-term NVIDIA contracts and building one of the largest GPU fleets in the world.
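
To see the difference in practice, here’s a minimal sketch (assuming PyTorch on a CUDA-capable instance, e.g. any P- or G-family type) timing the same large matrix multiply on CPU and GPU:

```python
# Minimal sketch (assumes PyTorch + a CUDA GPU): the same large matrix
# multiply on CPU vs. GPU. The GPU wins because the millions of
# multiply-adds are independent and run in parallel across its cores.
import time
import torch

x = torch.randn(8192, 8192)

start = time.time()
_ = x @ x                       # CPU: a handful of powerful cores
print(f"CPU: {time.time() - start:.2f}s")

if torch.cuda.is_available():
    xg = x.cuda()
    torch.cuda.synchronize()    # wait for the host-to-device copy
    start = time.time()
    _ = xg @ xg                 # GPU: thousands of smaller cores in parallel
    torch.cuda.synchronize()    # GPU kernels launch asynchronously
    print(f"GPU: {time.time() - start:.2f}s")
```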

What is the EC2 P Family?

The EC2 P family is AWS’s high-performance line of GPU-powered instances built for compute-intensive workloads like machine learning training, large-scale inference, and scientific simulations. These instances are backed by NVIDIA’s most powerful data center GPUs and are designed to deliver maximum throughput and scalability.

Each generation brings significant improvements in GPU architecture, memory bandwidth, and networking. Later generations (P4 and P5) include support for features like Elastic Fabric Adapter (EFA) and GPUDirect RDMA, which reduce latency and improve performance in distributed workloads.
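
As a rough sketch of what requesting EFA looks like, here’s a hedged boto3 example launching a P4d instance with an EFA network interface. The AMI, subnet, and security group IDs are placeholders you’d replace with your own:

```python
# Hedged sketch: launch a p4d.24xlarge with an EFA interface via boto3.
# All resource IDs below are placeholders, not real values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder: e.g. a Deep Learning AMI
    InstanceType="p4d.24xlarge",
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "SubnetId": "subnet-0123456789abcdef0",  # placeholder
        "InterfaceType": "efa",                  # request an EFA interface
        "Groups": ["sg-0123456789abcdef0"],      # placeholder
    }],
)
print(response["Instances"][0]["InstanceId"])
```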

Here’s a quick overview of the P family generations:

| Generation | GPU | Notes |
|---|---|---|
| P2 | NVIDIA K80 | Legacy generation |
| P3 | NVIDIA V100 | Earlier-generation training workloads |
| P4 (p4d/p4de) | NVIDIA A100 | EFA and GPUDirect RDMA support |
| P5 (p5/p5en) | NVIDIA H100/H200 | Highest memory bandwidth and networking |
| P6 (p6-b200) | NVIDIA B200 (Blackwell) | Launched May 2025 via EC2 Capacity Blocks |

These instances are typically used in clusters for high-throughput training, inference pipelines, and simulation-heavy HPC workloads. If your job is bottlenecked by GPU throughput or memory bandwidth, the P family is likely where you’ll land.

What is the EC2 G Family?

The EC2 G family is AWS’s line of GPU instances designed for graphics rendering, media streaming, and lightweight machine learning inference. Compared to the P family, G instances are optimized for workloads that benefit from GPU acceleration but don’t require the full compute power of high-end data center GPUs.

G instances are built on NVIDIA’s T4, L4, and L40S GPUs, which are more power-efficient and cost-effective than the A100 or H100 chips in the P family. They support virtualization technologies like NVIDIA GRID, making them a strong choice for virtual desktops, game streaming, and 3D visualization.

These instances are ideal when you need GPU acceleration without the full cost or scale of the P family. Common use cases include video processing, virtual workstations, graphics-intensive applications, and real-time inference workloads.

EC2 P Family vs G Family

Both the P and G families offer GPU acceleration, but they’re optimized for very different use cases. Let’s compare across key dimensions: 

 

| Category | EC2 P Family (ML, HPC, Compute-Intensive) | EC2 G Family (Graphics, Media, Inference) |
|---|---|---|
| Best For | Deep learning training, scientific simulations, large-scale inference, high-throughput AI pipelines | Real-time inference, media encoding, game streaming, virtual desktops |
| GPU Models | V100 (P3), A100 (P4), H100/H200 (P5), B200 (P6) | T4 (G4dn), A10G (G5), L4 (G6), L40S (G6e) |
| GPU Memory & Throughput | 16–141 GB; up to 400+ TFLOPS (mixed precision) | 16–48 GB; ~65–130 TFLOPS |
| Instance Sizing | 8–192 vCPUs; large-memory, multi-GPU configs (e.g. p4d.24xlarge) | 4–64 vCPUs; flexible sizing for lighter GPU needs |
| Networking & Scale | Up to 400 Gbps (P4) / 3,200 Gbps (P5) in UltraClusters; supports EFA and GPUDirect RDMA for low-latency distributed training | Up to 25 Gbps; no EFA; suitable for single-node or lightly scaled workloads |
| Storage Options | EBS-only or EBS + NVMe (P4de) | Local NVMe + EBS support (especially in G4dn/G5) |
| Pricing (On-Demand) | High: from ~$3/hr (P3) to well over $30/hr for the largest multi-GPU sizes | Lower: ~$0.50 to $8/hr depending on instance type |
| Scalability | Designed for multi-node training, tightly coupled HPC, massive GPU clusters | Scales well for graphics or parallel batch jobs, but not optimized for distributed training |
| Key Advantages | Maximum GPU compute density, memory bandwidth, and interconnect performance for production-scale AI | Cost-efficient GPU access for media, visualization, and low-latency inference |
| Common Instance Types | p3.8xlarge, p4d.24xlarge, p5.48xlarge | g4dn.12xlarge, g5.2xlarge, g5.16xlarge |
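
You don’t have to take a table’s word for these specs: the EC2 DescribeInstanceTypes API returns GPU details directly. A minimal boto3 sketch comparing a few types from each family:

```python
# Pull GPU specs straight from the EC2 API to compare families.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.describe_instance_types(
    InstanceTypes=["p4d.24xlarge", "p5.48xlarge", "g5.2xlarge", "g4dn.12xlarge"]
)
for it in resp["InstanceTypes"]:
    gpu = it["GpuInfo"]["Gpus"][0]
    print(
        it["InstanceType"],
        f'{gpu["Count"]}x {gpu["Manufacturer"]} {gpu["Name"]},',
        f'{it["GpuInfo"]["TotalGpuMemoryInMiB"] // 1024} GiB total GPU memory',
    )
```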

What are the use cases of EC2 P Family?

AWS offers multiple GPU instance families, but most compute-intensive production workloads will benefit from the performance and memory capacity of the P family. Below are the most common use cases and which instance family to use for each; a rough decision heuristic in code follows the list.

  1. Training Large Machine Learning Models: Training complex models like BERT, LLaMA, or ResNet requires massive GPU throughput, fast interconnects, and large memory capacity. Features like NVLink, multi-GPU configurations, and EFA on P4 and P5 instances make them the clear choice for scalable training.
    Use P family (P4 or P5) for all but the smallest experiments.

  2. Fine-Tuning and Serving Large Language Models (LLMs): P4 and P5 provide the GPU memory and tensor compute needed for serving multi-billion parameter models or fine-tuning with large batch sizes. If you’re doing light, low-latency inference with smaller LLMs, G5 can be a cost-effective option.
    Use P for high-traffic or large model deployments.
    Use G for lightweight inference of small transformer models.

  3. Generative AI and Diffusion Models: Models like Stable Diffusion and StyleGAN require fast matrix math, high memory bandwidth, and multi-GPU scaling—making P4 and P5 ideal for both training and production pipelines.
    Use P family for image/video/audio generation at scale.

  4. High-Performance Computing (HPC): Scientific simulations, fluid dynamics, and genome analysis benefit from high FP64 performance and low-latency inter-node networking. P-family instances with EFA and GPUDirect RDMA are purpose-built for these workloads.
    Use P family for any HPC use case with tight coupling or precision requirements.

  5. Media Rendering and Streaming: Video encoding, game streaming, and 3D visualization typically don’t need the compute power of high-end GPUs. G-family instances are optimized for graphics acceleration and support NVIDIA GRID virtualization.
    Use G family (G4dn or G5) for rendering, encoding, and VDI workloads.

  6. Batch Inference at Scale: If you’re classifying millions of records or running offline inference on massive datasets, P4 and P5 deliver better throughput and support larger batch sizes.
    Use P family when you care about throughput over latency.
    Use G family for smaller or cost-sensitive batch jobs.

  7. Enterprise AI Pipelines: When your pipeline spans preprocessing, training, and deployment, P-family instances offer the consistency and scalability needed for end-to-end automation.
    Use P family to avoid switching infrastructure between pipeline stages.
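
To make the guidance above concrete, here’s a purely illustrative heuristic (not an official AWS sizing rule; the thresholds are assumptions) for a first-pass family choice:

```python
# Illustrative heuristic only: a first-pass mapping from workload type and
# model scale to an instance family, mirroring the list above.
def pick_gpu_family(workload: str, params_billions: float = 0) -> str:
    if workload in ("training", "hpc", "generative"):
        return "P family (p4d/p5): multi-GPU, NVLink, EFA"
    if workload == "inference":
        if params_billions >= 10:   # assumed cutoff for "large" models
            return "P family (p4d/p5): memory for multi-billion-param models"
        return "G family (g4dn/g5): cost-efficient low-latency inference"
    return "G family (g4dn/g5): graphics, media, virtual desktops"

print(pick_gpu_family("training", 70))   # -> P family
print(pick_gpu_family("inference", 7))   # -> G family
```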

EC2 GPU Pricing

GPU instances on AWS are priced based on a combination of GPU type, instance size, and region. In general, P-family instances are significantly more expensive than G-family instances due to their higher-end GPUs, larger memory footprints, and advanced networking capabilities. That said, their performance per dollar can be favorable for large-scale training and inference—especially when combined with Savings Plans or Spot Instances.

[Screenshot: current P-family and G-family pricing from the AWS EC2 pricing page]

To reduce costs, many teams use Savings Plans, which offer up to 45% off in exchange for a 1- or 3-year commitment. Spot Instances are also available for some P and G instance types, often at steep discounts—but with interruption risk. We’ll cover both strategies in more detail in the cost optimization section.

Recent Changes to AWS GPU Pricing

In June 2025, AWS announced significant price reductions across its most powerful NVIDIA GPU instances. The update includes up to 45% savings on P4 and P5 instances, covering both On-Demand and Savings Plan pricing across all available regions.

Key Reductions (vs May 31, 2025 baseline):

| Instance Type | GPU | On-Demand | EC2 Savings Plan | Compute Savings Plan |
|---|---|---|---|---|
| P4d | A100 | 33% | 31% (1yr) / 31% (3yr) | 25% (1yr) / – |
| P4de | A100 | 33% | 31% (1yr) / 31% (3yr) | 25% (1yr) / – |
| P5 | H100 | 44% | 45% (1yr) / 44% (3yr) | 25% (1yr) / – |
| P5en | H200 | 25% | 26% (1yr) / 25% (3yr) | |

Expanded Availability

To help customers take advantage of the new pricing, AWS also expanded On-Demand GPU capacity in multiple regions:

  • P4d: Seoul, Sydney, Canada (Central), London
  • P4de: Northern Virginia
  • P5: Mumbai, Tokyo, Jakarta, São Paulo
  • P5en: Mumbai, Tokyo, Jakarta

P6 Capacity Update

AWS also confirmed that P6-B200 (Blackwell) instances are now available through Savings Plans, following their launch via EC2 Capacity Blocks for ML in May 2025. These instances are designed for large-scale AI training and inference using the latest NVIDIA Blackwell architecture.

With these changes, AWS continues to scale its GPU offerings and pass along cost efficiencies to customers building the next generation of AI and HPC workloads.

How to Cost Optimize with EC2 P Family

GPU-backed workloads can quickly become one of the most expensive components of your AWS bill—but there are proven strategies to reduce that cost without sacrificing performance. Here’s how engineering teams can optimize spend when using the EC2 P family.

1. Use Savings Plans for Steady Workloads

If you have long-running training pipelines, model development environments, or predictable batch jobs, EC2 Instance Savings Plans offer the best discount. As of June 2025, AWS provides up to 45% savings on P4 and P5 instances with a 1- or 3-year commitment. This is ideal for foundational workloads that require consistent compute.
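
As a back-of-the-envelope check, here’s what a commitment discount means for an always-on instance. Both numbers below are illustrative assumptions; verify current rates on the AWS pricing page for your region and term:

```python
# Illustrative only: annual cost of an always-on GPU instance under an
# assumed On-Demand rate and Savings Plan discount.
HOURS_PER_YEAR = 8760
on_demand_rate = 32.77   # assumed USD/hr (roughly a p4d.24xlarge); verify
sp_discount = 0.45       # "up to 45%" per the June 2025 announcement

on_demand_cost = on_demand_rate * HOURS_PER_YEAR
sp_cost = on_demand_cost * (1 - sp_discount)
print(f"On-Demand:    ${on_demand_cost:,.0f}/yr")
print(f"Savings Plan: ${sp_cost:,.0f}/yr (saves ${on_demand_cost - sp_cost:,.0f})")
```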

2. Leverage Spot Instances for Flexible or Interruptible Jobs

Spot capacity is available for many P-family instances—especially older generations like P3 and P2. These can offer up to 70–90% discounts compared to On-Demand. Use Spot for:

  • Preemptible training runs
  • Experiments and tuning jobs
  • Non-time-sensitive batch inference

Be sure to configure checkpoints or use managed services (e.g. SageMaker Managed Spot Training) to handle interruptions gracefully.
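
If you’re rolling your own interruption handling, here’s a hedged sketch that watches the EC2 instance metadata service for the Spot interruption notice (AWS sets the spot/instance-action document about two minutes before reclaiming the instance). It assumes IMDSv1 is enabled (IMDSv2 additionally requires a session token), and save_checkpoint() is a placeholder for your own persistence logic:

```python
# Poll the instance metadata service for a Spot interruption notice and
# checkpoint before shutdown. save_checkpoint() is a placeholder.
import time
import requests

URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def save_checkpoint():
    ...  # placeholder: persist model/optimizer state to S3 or EBS

while True:
    try:
        if requests.get(URL, timeout=1).status_code == 200:  # 404 = no notice yet
            save_checkpoint()
            break
    except requests.exceptions.RequestException:
        pass
    time.sleep(5)
```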

3. Choose the Right Generation for Your Workload

Not every workload needs a P5. For smaller models or less memory-intensive tasks, older-generation instances like p3.8xlarge or p4d.24xlarge may deliver better performance per dollar. Match instance type to the following (see the Spot price comparison sketch after this list):

  • Model size and batch size
  • Memory bandwidth requirements
  • Cost sensitivity vs speed of completion
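
One input to that decision is current market pricing. A minimal boto3 sketch comparing recent Spot prices across generations with the DescribeSpotPriceHistory API:

```python
# Compare recent Spot prices for two P-family generations before committing.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.describe_spot_price_history(
    InstanceTypes=["p3.8xlarge", "p4d.24xlarge"],
    ProductDescriptions=["Linux/UNIX"],
    MaxResults=10,
)
for p in resp["SpotPriceHistory"]:
    print(p["InstanceType"], p["AvailabilityZone"], f'${p["SpotPrice"]}/hr')
```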

4. Consolidate Training into Fewer, Larger Jobs

When scaling out, fewer, larger instances (e.g. p4d.24xlarge) are often more cost-efficient than many smaller ones. Larger nodes offer NVLink and GPUDirect, which reduce inter-GPU latency and speed up multi-GPU training, cutting overall job duration and total cost.
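
For example, here’s a minimal single-node distributed data parallel sketch (assuming PyTorch) that spans every GPU on one large instance, so gradient all-reduce rides NVLink rather than crossing the network:

```python
# Minimal single-node DDP sketch: one process per GPU on a multi-GPU
# instance (e.g. 8 GPUs on a p4d.24xlarge); NCCL syncs gradients over NVLink.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank: int, world_size: int):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    model = DDP(torch.nn.Linear(1024, 1024).cuda(rank), device_ids=[rank])
    # ... your training loop here; backward() triggers the all-reduce ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```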

5. Use nOps to Automate GPU Cost Optimization

nOps Compute Optimization constantly manages the scheduling and scaling of your workloads for the best price and stability, automatically implementing these best practices for you. 

  • Optimize your RI, SP and Spot automatically for 50%+ savings — Copilot analyzes your organizational usage and market pricing to ensure you’re always on the best options.
  • Reliably run business-critical workloads on Spot. Our ML model predicts Spot terminations 60 minutes in advance.
  • All-in-one solution — get all the essential cloud cost optimization features (cost allocation, reporting, scheduling, rightsizing, & more) 

Copilot is entrusted with over 2 billion dollars of cloud spend. Join our satisfied customers who recently named us #1 in G2’s cloud cost management category by booking a demo today!