NVIDIA® Tesla® P40 and P4 GPU accelerators are built to deliver the highest throughput and most responsive experiences for deep learning inference workloads. Powered by the NVIDIA Pascal™ architecture, they provide over 60X faster inference performance than CPUs, enabling real-time responsiveness in even the most complex deep learning models.
MAXIMUM DEEP LEARNING INFERENCE THROUGHPUT
The Tesla P40 is purpose-built to deliver maximum throughput for deep learning inference. With 47 TOPS (Tera-Operations Per Second) of inference performance per GPU, a single server with eight Tesla P40s can replace over 100 CPU servers.
ULTRA-EFFICIENT DEEP LEARNING IN SCALE-OUT SERVERS
The Tesla P4 accelerates any scale-out server, offering an incredible 40X higher energy efficiency compared to CPUs.
DEEP LEARNING ACCELERATOR FEATURES AND BENEFITS
These GPUs power faster predictions that enable amazing user experiences for AI applications.
100X Higher Throughput to Keep Up with Expanding Data
The data generated every day in the form of sensor logs, images, videos, and records is economically impractical to process on CPUs. Pascal-powered GPUs give data centers a dramatic boost in throughput for deep learning deployment workloads, helping them extract intelligence from this tsunami of data. A server with eight Tesla P40s can replace over 100 CPU-only servers for deep learning workloads, so you get higher throughput at lower acquisition cost.
Unprecedented Efficiency for Low-Power Scale-out Servers
The ultra-efficient Tesla P4 GPU accelerates density-optimized scale-out servers with a small form factor and a 50 W or 75 W power footprint. It delivers an incredible 40X better energy efficiency than CPUs for deep learning inference workloads. This lets hyperscale customers scale within their existing infrastructure and serve the exponential growth in demand for AI-based applications.
A Dedicated Decode Engine for New AI-based Video Services
Tesla P4 and P40 GPUs can analyze up to 39 HD video streams in real time, powered by a dedicated hardware-accelerated decode engine that works in parallel with the NVIDIA CUDA® cores performing inference. By integrating deep learning into the video pipeline, customers can offer new levels of smart, innovative video services to users.
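The decode-then-infer flow above can be pictured as a producer/consumer pipeline: the decode engine keeps feeding frames while the CUDA cores run inference concurrently. The sketch below illustrates that structure only; `decode_frame` and `run_inference` are hypothetical stand-ins (on real hardware, decode runs on the dedicated NVDEC engine and inference on the CUDA cores, typically via NVIDIA's SDKs rather than Python threads).

```python
import queue
import threading

# Hypothetical stand-in: on the GPU this would run on the dedicated
# hardware decode engine (NVDEC).
def decode_frame(stream_id, frame_no):
    return {"stream": stream_id, "frame": frame_no}

# Hypothetical stand-in: on the GPU this would run on the CUDA cores,
# in parallel with decoding.
def run_inference(frame):
    return {"stream": frame["stream"], "frame": frame["frame"], "label": "car"}

def pipeline(num_streams=4, frames_per_stream=3):
    frames = queue.Queue()
    results = []

    def decoder():
        # Producer: decode frames from every stream, then signal completion.
        for s in range(num_streams):
            for f in range(frames_per_stream):
                frames.put(decode_frame(s, f))
        frames.put(None)  # sentinel: no more frames

    def inferencer():
        # Consumer: run inference on frames as soon as they are decoded.
        while True:
            frame = frames.get()
            if frame is None:
                break
            results.append(run_inference(frame))

    t_dec = threading.Thread(target=decoder)
    t_inf = threading.Thread(target=inferencer)
    t_dec.start(); t_inf.start()
    t_dec.join(); t_inf.join()
    return results

results = pipeline()
print(len(results))  # 4 streams x 3 frames = 12 analyzed frames
```

Because the two stages overlap rather than alternate, neither the decoder nor the inference engine sits idle, which is what lets a single GPU keep up with many simultaneous streams.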
Faster Deployment with NVIDIA TensorRT™ and DeepStream SDK
NVIDIA TensorRT is a high-performance neural network inference engine for production deployment of deep learning applications. It includes a library created to optimize deep learning models for production deployment, taking trained neural nets, usually in 32-bit or 16-bit floating-point precision, and optimizing them for reduced-precision INT8 operations. NVIDIA DeepStream SDK taps into the power of Pascal GPUs to simultaneously decode and analyze video streams.
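To make the reduced-precision idea concrete, here is a minimal sketch of symmetric linear quantization from floating point to the INT8 range. This is only an illustration of the general technique; TensorRT's actual INT8 path uses calibration over representative data and per-layer scaling, which this toy example does not attempt.

```python
def quantize_int8(values):
    """Map float values onto the INT8 range [-128, 127] with a single
    symmetric scale factor (a simplified sketch of INT8 quantization)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]

# Example: quantize a few weights and check the round-trip error is small.
weights = [0.5, -1.2, 0.03, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Storing and computing in 8-bit integers quarters the memory footprint versus FP32 and lets the hardware execute far more operations per cycle, which is where much of the inference throughput gain comes from.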