Independent edition, 2023. — 601 p.
I Artificial neural networks (ANNs).
Basics on ANNs.
Fully connected feedforward ANNs (vectorized description).
Affine functions.
Vectorized description of fully connected feedforward ANNs.
Weight and bias parameters of fully connected feedforward ANNs.
Activation functions.
Multidimensional versions.
Single hidden layer fully connected feedforward ANNs.
Rectified linear unit (ReLU) activation.
Clipping activation.
Softplus activation.
Gaussian error linear unit (GELU) activation.
Standard logistic activation.
Swish activation.
Hyperbolic tangent activation.
Softsign activation.
Leaky rectified linear unit (leaky ReLU) activation.
Exponential linear unit (ELU) activation.
Rectified power unit (RePU) activation.
Sine activation.
Heaviside activation.
Softmax activation.
Fully connected feedforward ANNs (structured description).
Structured description of fully connected feedforward ANNs.
Realizations of fully connected feedforward ANNs.
On the connection to the vectorized description.
Convolutional ANNs (CNNs).
Discrete convolutions.
Structured description of feedforward CNNs.
Realizations of feedforward CNNs.
Residual ANNs (ResNets).
Structured description of fully connected ResNets.
Realizations of fully connected ResNets.
Recurrent ANNs (RNNs).
Description of RNNs.
Vectorized description of simple fully connected RNNs.
Long short-term memory (LSTM) RNNs.
Further types of ANNs.
ANNs with encoder-decoder architectures: autoencoders.
Transformers and the attention mechanism.
Graph neural networks (GNNs).
Neural operators.
ANN calculus.
Compositions of fully connected feedforward ANNs.
Compositions of fully connected feedforward ANNs.
Elementary properties of compositions of fully connected feedforward ANNs.
Associativity of compositions of fully connected feedforward ANNs.
Powers of fully connected feedforward ANNs.
Parallelizations of fully connected feedforward ANNs.
Parallelizations of fully connected feedforward ANNs with the same length.
Representations of the identities with ReLU activation functions.
Extensions of fully connected feedforward ANNs.
Parallelizations of fully connected feedforward ANNs with different lengths.
Scalar multiplications of fully connected feedforward ANNs.
Affine transformations as fully connected feedforward ANNs.
Scalar multiplications of fully connected feedforward ANNs.
Sums of fully connected feedforward ANNs with the same length.
Sums of vectors as fully connected feedforward ANNs.
Concatenation of vectors as fully connected feedforward ANNs.
Sums of fully connected feedforward ANNs.
II Approximation.
One-dimensional ANN approximation results.
Linear interpolation of one-dimensional functions.
On the modulus of continuity.
Linear interpolation of one-dimensional functions.
Linear interpolation with fully connected feedforward ANNs.
Activation functions as fully connected feedforward ANNs.
Representations for ReLU ANNs with one hidden neuron.
ReLU ANN representations for linear interpolations.
ANN approximation results for one-dimensional functions.
Constructive ANN approximation results.
Convergence rates for the approximation error.
Multi-dimensional ANN approximation results.
Approximations through supremal convolutions.
ANN representations.
ANN representations for the 1-norm.
ANN representations for maxima.
ANN representations for maximum convolutions.
ANN approximation results for multi-dimensional functions.
Constructive ANN approximation results.
Covering number estimates.
Convergence rates for the approximation error.
Refined ANN approximation results for multi-dimensional functions.
Rectified clipped ANNs.
Embedding ANNs in larger architectures.
Approximation through ANNs with variable architectures.
Refined convergence rates for the approximation error.
III Optimization.
Optimization through gradient flow (GF) trajectories.
Introductory comments for the training of ANNs.
Basics for GFs.
GF ordinary differential equations (ODEs).
Direction of negative gradients.
Regularity properties for ANNs.
On the differentiability of compositions of parametric functions.
On the differentiability of realizations of ANNs.
Loss functions.
Absolute error loss.
Mean squared error loss.
Huber error loss.
Cross-entropy loss.
Kullback–Leibler divergence loss.
GF optimization in the training of ANNs.
Lyapunov-type functions for GFs.
Gronwall differential inequalities.
Lyapunov-type functions for ODEs.
On Lyapunov-type functions and coercivity-type conditions.
Sufficient and necessary conditions for local minimum points.
On a linear growth condition.
Optimization through flows of ODEs.
Approximation of local minimum points through GFs.
Existence and uniqueness of solutions of ODEs.
Approximation of local minimum points through GFs revisited.
Approximation error concerning the objective function.
Deterministic gradient descent (GD) optimization methods.
GD optimization.
GD optimization in the training of ANNs.
Euler discretizations for GF ODEs.
Lyapunov-type stability for GD optimization.
Error analysis for GD optimization.
Explicit midpoint GD optimization.
Explicit midpoint discretizations for GF ODEs.
GD optimization with classical momentum.
Representations for GD optimization with momentum.
Bias-adjusted GD optimization with momentum.
Error analysis for GD optimization with momentum.
Numerical comparisons for GD optimization with and without momentum.
GD optimization with Nesterov momentum.
Adagrad GD optimization (Adagrad).
Root mean square propagation GD optimization (RMSprop).
Representations of the mean square terms in RMSprop.
Bias-adjusted root mean square propagation GD optimization.
Adadelta GD optimization.
Adaptive moment estimation GD optimization (Adam).
Stochastic gradient descent (SGD) optimization methods.
Introductory comments for the training of ANNs with SGD.
SGD optimization.
SGD optimization in the training of ANNs.
Non-convergence of SGD for not appropriately decaying learning rates.
Convergence rates for SGD for quadratic objective functions.
Convergence rates for SGD for coercive objective functions.
Explicit midpoint SGD optimization.
SGD optimization with classical momentum.
Bias-adjusted SGD optimization with classical momentum.
SGD optimization with Nesterov momentum.
Simplified SGD optimization with Nesterov momentum.
Adagrad SGD optimization (Adagrad).
Root mean square propagation SGD optimization (RMSprop).
Bias-adjusted root mean square propagation SGD optimization.
Adadelta SGD optimization.
Adaptive moment estimation SGD optimization (Adam).
Backpropagation.
Backpropagation for parametric functions.
Backpropagation for ANNs.
Kurdyka–Łojasiewicz (KL) inequalities.
Standard KL functions.
Convergence analysis using standard KL functions (regular regime).
Standard KL inequalities for monomials.
Standard KL inequalities around non-critical points.
Standard KL inequalities with increased exponents.
Standard KL inequalities for one-dimensional polynomials.
Power series and analytic functions.
Standard KL inequalities for one-dimensional analytic functions.
Standard KL inequalities for analytic functions.
Counterexamples.
Convergence analysis for solutions of GF ODEs.
Abstract local convergence results for GF processes.
Abstract global convergence results for GF processes.
Convergence analysis for GD processes.
One-step descent property for GD processes.
Abstract local convergence results for GD processes.
On the analyticity of realization functions of ANNs.
Standard KL inequalities for empirical risks in the training of ANNs with analytic activation functions.
Fréchet subdifferentials and limiting Fréchet subdifferentials.
Non-smooth slope.
Generalized KL functions.
ANNs with batch normalization.
Batch normalization (BN).
Structured description of fully connected feedforward ANNs with BN (training).
Realizations of fully connected feedforward ANNs with BN (training).
Structured description of fully connected feedforward ANNs with BN (inference).
Realizations of fully connected feedforward ANNs with BN (inference).
On the connection between BN for training and BN for inference.
Optimization through random initializations.
Analysis of the optimization error.
The complementary distribution function formula.
Estimates for the optimization error involving complementary distribution functions.
Strong convergence rates for the optimization error.
Properties of the gamma and the beta function.
Product measurability of continuous random fields.
Strong convergence rates for the optimization error.
Strong convergence rates for the optimization error involving ANNs.
Local Lipschitz continuity estimates for the parametrization functions of ANNs.
Strong convergence rates for the optimization error involving ANNs.
IV Generalization.
Probabilistic generalization error estimates.
Concentration inequalities for random variables.
Markov's inequality.
A first concentration inequality.
Moment-generating functions.
Chernoff bounds.
Hoeffding's inequality.
A strengthened Hoeffding's inequality.
Covering number estimates.
Entropy quantities.
Inequalities for packing entropy quantities in metric spaces.
Inequalities for covering entropy quantities in metric spaces.
Inequalities for entropy quantities in finite dimensional vector spaces.
Empirical risk minimization.
Concentration inequalities for random fields.
Uniform estimates for the statistical learning error.
Strong generalization error estimates.
Monte Carlo estimates.
Uniform strong error estimates for random fields.
Strong convergence rates for the generalization error.
V Composed error analysis.
Overall error decomposition.
Bias-variance decomposition.
Risk minimization for measurable functions.
Overall error decomposition.
Composed error estimates.
Full strong error analysis for the training of ANNs.
Full strong error analysis with optimization via SGD with random initializations.
VI Deep learning for partial differential equations (PDEs).
Physics-informed neural networks (PINNs).
Reformulation of PDE problems as stochastic optimization problems.
Derivation of PINNs and deep Galerkin methods (DGMs).
Implementation of PINNs.
Implementation of DGMs.
Deep Kolmogorov methods (DKMs).
Stochastic optimization problems for expectations of random variables.
Stochastic optimization problems for expectations of random fields.
Feynman–Kac formulas.
Feynman–Kac formulas providing existence of solutions.
Feynman–Kac formulas providing uniqueness of solutions.
Reformulation of PDE problems as stochastic optimization problems.
Derivation of DKMs.
Implementation of DKMs.
Further deep learning methods for PDEs.
Deep learning methods based on strong formulations of PDEs.
Deep learning methods based on weak formulations of PDEs.
Deep learning methods based on stochastic representations of PDEs.
Error analyses for deep learning methods for PDEs.
List of source codes.