Spuler David. Generative AI in C++: Coding Transformers and LLMs

Yoryck AI Pty Ltd., 2024. — 947 p. — ISBN-13 979-8871928684.

Do you know C++ but not AI? Do you dream of writing your own AI engine in C++? From beginner to advanced, this book covers the internals of AI engines in C++, with real source code examples and research paper citations.

As a programmer, your job is to harness the power of your AI platform and offer it up to your many users in top-level features. Whether your AI project is about writing sports content or auto-diagnosing X-ray images, your work as an AI developer is based on fundamentally the same architecture. And to do this at a scale that matches the capability of your workhorse models, you need a programming language to match their power. I'll give you three guesses which one I recommend.

C++ is on the inside of all AI engines. Whereas Python is often on the outside, wrapping around the various models, C++ is always closer to the machine and its hardware. PyTorch and TensorFlow have lots of Python code on the top layers, but the grunt work underneath runs in highly optimized C++ code. The main advantage of C++ is that it is super-fast and has low-level capabilities, so its operations map closely onto hardware instructions. This is a perfect match because AI engines need to run blazingly fast, with hardware-acceleration integrations directly to the GPU to handle billions of arithmetic calculations. And yet, C++ is also a high-level programming language with support for advanced features like classes and modularity, so it's great for programmer productivity.

Key Features

Transformer components in C++.
Faster and smarter AI.
Play with an AI engine on your desktop.
Cutting-edge research optimizations.
Just C++ code without all the math.

Part I: AI Projects in C++
Introduction to AI in C++.
Transformers & LLMs.
AI Phones.
AI on Your Desktop.
Design Choices & Architectures.
Training, Fine-Tuning & RAG.
Deployment Architecture.

Part II: Basic C++ Optimizations
Bitwise Operations.
Floating Point Arithmetic.
Arithmetic Optimizations.
Compile-Time Optimizations.
Pointer Arithmetic.
Algorithm Speedups.
Memory Optimizations.

Part III: Parallel C++ Optimizations
Loop Vectorization.
Hardware Acceleration.
AVX Intrinsics.
Parallel Data Structures.

Part IV: Transformer Components in C++
Encoders & Decoders.
Attention.
Activation Functions.
Vector Algorithms.
Tensors.
Normalization.
Softmax.
Decoding Algorithms.
Tokenizer and Vocabulary.

Part V: Optimizing Transformers in C++
Deslugging AI Engines.
Caching Optimizations.
Vectorization.
Kernel Fusion.
Quantization.
Pruning.
MatMul/GEMM.
Lookup Tables & Precomputation.
AI Memory Optimizations.

Part VI: Enterprise AI in C++
Tuning, Profiling & Benchmarking.
Platform Portability.
Quality.
Reliability.
Self-Testing Code.
Debugging.

Part VII: Research on AI Optimization
Overview of AI Research.
Advanced Quantization.
Knowledge Distillation.
Structured Pruning.
Early Exit and Layer Pruning.
Width Pruning.
Length Pruning.
Adaptive Inference.
Zero-Multiplication Models.
Logarithmic Models.
Arithmetic Optimization Research.
Ensemble Multi-Model Architectures.
Advanced Number Systems.
Neural Architecture Search.

Appendix 1: C++ Slug Catalog.