Paolo Frasconi, DINFO, via di S. Marta 3, 50139 Firenze
email: .
Monday, 10:45-12:45.
The course aims to provide an overview of classic and some current deep learning methodologies. It will tentatively cover the topics listed in the schedule below.
Starting this year, I will experiment with a blended form of teaching, with 6 online hours (less than 1 credit) consisting of six 30-minute video lectures, mainly covering practical deep learning programming aspects, with a focus on PyTorch and TensorFlow. There will be a question-answering session in class one week after each video lecture is posted. Please email me in advance with the questions that you would like to discuss in class.
You will be able to understand and apply state-of-the-art algorithms and architectures, to understand the relevant methodological details, and to operate according to current practices. Deep learning is a fast-moving field. To be successful in your future career, you will need to develop sufficient skills to competently read and understand a large fraction of the current and future literature (yes, a form of meta-learning). Thus, after successfully completing this course, you should be able to understand, reimplement, and evaluate on your own many novel algorithms, with limited help or guidance from a supervisor.
Multivariate calculus and linear algebra are essential. Elementary numerical optimization, algorithms and data structures, and proficiency in scientific computing with a modern programming language (e.g., NumPy with Python) will be useful.
Starting this year, I will try to use textbook material whenever possible. There are two reasons for this change. First, the course has been moved to the very beginning of the AI program and many of you are still not comfortable studying from papers. Second, two very useful and up-to-date books have been published this year. Still, some material is better covered by reading relevant papers.
There is a single oral final exam with an associated project. You can choose the topic of your project, but you should discuss it with me during office hours, and I will give you the details of what should be done.
Typically, you will be assigned one or more papers to read and will be asked to work at home to reproduce some (simplified) experimental results or to apply the same method to different data or in a slightly different setting. You are responsible for studying the relevant methodological and theoretical prerequisites of these papers (in some cases, studying the references covered in class may be sufficient, but in other cases, especially when dealing with the details of the experimental procedures, reading other ancestors in the citation graph may be necessary).
There is no need to submit a report for your work, but you are asked to share with me the code (not the data! --- for that a link is sufficient) you have developed, with some short instructions for reproducing your results. Small zip files can be shared by email (please send a https://0x0.st/ link if the zip is over a MByte), but if you prefer to share a git repository, please create a private one on https://codeberg.org/ and share it with me by inviting the user dl-unifi as a member.
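For convenience, here is a minimal Python sketch of one way to package and upload a small project; it assumes 0x0.st accepts a plain multipart POST with a `file` field and replies with the download URL as plain text (check the site for its current usage and size limits), and the directory and file names are just placeholders.

```python
# Hypothetical helper: zip a project directory and upload it to 0x0.st.
# Assumes 0x0.st accepts a multipart POST with a "file" field; verify on the site.
import shutil
import requests

# Create my-dl-project.zip from the contents of the (placeholder) directory.
archive = shutil.make_archive("my-dl-project", "zip", root_dir="my-dl-project")

# Upload the archive; the response body is assumed to be the shareable link.
with open(archive, "rb") as f:
    resp = requests.post("https://0x0.st", files={"file": f})
resp.raise_for_status()
print("Shareable link:", resp.text.strip())
```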
You will be required to give a short presentation during the exam. Please ensure that during your presentation you introduce and motivate the problem being addressed in the context of the relevant literature, explain the technical derivation of the methods, and describe in detail the experimental work and the results. You are allowed (but not required) to use multimedia tools to prepare your presentation. You should be prepared to answer general questions about the background literature supporting your paper(s) (for example if the method uses an optimizer, which happens with overwhelming probability, you are supposed to know how it works) and about the details of your experimental work.
You can work in groups of two to carry out the experimental work (three is an exceptional number that you must motivate clearly). If you do so, please ensure that personal contributions to the overall work are clearly identifiable. In any case, during the exam you will have to answer questions individually.
Relevant papers and/or sections of the textbook(s) are listed on the right side. [Sections of] papers in the "required" list have been covered in class and should be studied while preparing for the exam. Papers listed as "optional" may be useful to get a better picture of the class topic but you do not need to study them, unless they are directly related to the topic of your project.
Date | Topics | Readings/Handouts |
---|---|---|
2024-09-17 | Administrivia. Most common forms of learning: supervised, unsupervised, reinforcement. Outline of the course. Historical remarks. | |
2024-09-20 | Supervised learning and empirical risk minimization. Optimal (binary) classifier. Modeling: the generative and the discriminative direction. Single-layer networks. Role of the logistic function. | |
2024-09-24 | Setting up deep learning frameworks. Working remotely. Tensors. | |
2024-09-24 | Linking MLE and ERM. Loss functions for regression: square, Huber. The least squares problem and its solution. Loss functions for classification: 0-1, hinge, log, exp. Generalized linear models: Bernoulli and binary classification. | |
2024-09-27 | Logistic regression as a special case of a generalized linear model. Multiclass classification and softmax regression. Gradient descent. | |
2024-10-01 | Logistic regression in pure TensorFlow. TensorBoard. Logistic regression in PyTorch. | |
2024-10-01 | Stochastic gradient descent. Convergence rates of GD and SGD. The tradeoffs of large-scale learning. Optimization, estimation, and approximation errors. | |
2024-10-04 | Minibatches. Effects of batch size. SGD with momentum. Adagrad. RMSProp. | |
2024-10-08 | Basis functions. RBF networks. Feature engineering and its limitations. Biologically inspired features. Sparse coding. Feature learning and end-to-end learning. Compositionality and deep representations. | |
2024-10-11 | Layerwise training of deep networks. Denoising autoencoders. Multilayered perceptrons. Rectifiers. | |
2024-10-15 | More activation functions: softplus, leaky ReLU, parametric ReLU. MLPs of ReLUs are piecewise linear functions. Inductive bias. Automatic differentiation in forward and reverse mode. | |
2024-10-18 | Automatic differentiation in TensorFlow and in PyTorch. The multilayered perceptron in TensorFlow/Keras. | |
2024-10-18 | Explicit regularization by penalties. Effects of ridge (L2) and L1 regularizers. Weight decay. Bias-variance tradeoff and double descent. Bayesian interpretation and maximum a posteriori. | |
2024-10-22 | The Adam and AdamW optimizers. Weight sharing. Early stopping. Dropout. | |
2024-10-25 | More activation units (GELU, SiLU, Swish) and their relationship to dropout. Weight initialization (LeCun, Glorot, He). Batch and layer normalization. | |
2024-10-29 | Convolutional networks for N-dimensional signals and their inductive bias. Basic concepts and some variants. Translational equivariance. Stacking convolutional layers. Strides and pooling. | |
2024-11-05 | Dilated and transposed convolutions. Bottlenecks (1x1 convolutions). Normalization for CNNs: batch, layer, instance, group. Gates. Mixtures of experts. Skip connections: highway and residual networks. Sketch of DenseNet. EfficientNet. | |
2024-11-08 | Semantic segmentation and U-Nets. Fully convolutional structures. Class imbalance in segmentation problems and the Dice loss. Sequence processing problems. Recurrent neural networks. Vanishing/exploding gradients when storing long-term information. Gates in RNNs: LSTM and GRU. | |
2024-11-12 | No class today. Also: office hours moved to Friday, Nov 15, 14:15-16:15. | |
2024-11-14 | Dataloaders in PyTorch. Convolutional networks and DenseNet in PyTorch. | |
2024-11-15 | Bidirectional RNNs. The general sequence-to-sequence learning problem. Encoder-decoder architectures. Decoding: greedy; optimal decoding with Viterbi; beam search; sampling. Recurrent language models with attention. Embedding layers. | |
2024-11-19 | Introduction to transformers. Soft dictionaries. Self-attention layers. Complexity. Handling minibatches: masking and batch matrix multiplication (see the illustrative sketch after the schedule). | |
2024-11-22 | Use of einsum and einops. Multihead attention. Positional encoding. | |
2024-11-26 | Vision Transformers. BERT. The pretraining/fine-tuning strategy. Transfer learning. Fine-tuning. Self-supervised learning. Pretext tasks. | |
2024-11-29 | The hyperparameter optimization problem. Elementary algorithms. An introduction to Gaussian processes. | |
2024-12-03 | Triplet loss. Siamese networks. Contrastive learning. SimCLR. Model-based hyperparameter optimization. | |
2024-12-06 | Acquisition functions: probability of improvement, expected improvement. Multi-fidelity approaches to hyperparameter optimization. Successive halving. Hyperband. Brief mention of ASHA. | |
2024-12-10 | Overview of settings involving several distributions. Multi-task learning. Meta-learning. Out-of-distribution data and anomaly detection. Algorithms for domain adaptation. Reweighting examples. Mixup. | |
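To accompany the lecture on transformers and self-attention (2024-11-19), here is a minimal, purely illustrative PyTorch sketch of masked scaled dot-product self-attention over a minibatch; the module name, tensor shapes, and the padding-mask convention are my own assumptions, not taken from any handout.

```python
# Minimal sketch of masked scaled dot-product self-attention on a minibatch.
# Purely illustrative; x has shape (batch, seq_len, d_model), mask has shape
# (batch, seq_len) with 1 for real tokens and 0 for padding.
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        Q, K, V = self.q(x), self.k(x), self.v(x)
        # Batch matrix multiplication: (B, L, d) @ (B, d, L) -> (B, L, L).
        scores = torch.bmm(Q, K.transpose(1, 2)) / math.sqrt(x.size(-1))
        # Mask out padded key positions before the softmax.
        scores = scores.masked_fill(mask[:, None, :] == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        return torch.bmm(attn, V)  # (B, L, d_model)

x = torch.randn(2, 5, 16)
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
out = SelfAttention(16)(x, mask)
print(out.shape)  # torch.Size([2, 5, 16])
```

The mask is broadcast over the key dimension, so padded positions receive zero attention weight after the softmax, which is the point made in class about handling minibatches of variable-length sequences.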
Full text of linked papers is normally accessible when connecting from a UNIFI IP address. Use proxy-auth.unifi.it:8888 (with your credentials) if you are connecting from outside the campus network.
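If you prefer to fetch papers from a script rather than configuring the proxy in your browser, something along the following lines should work with the `requests` library; the paper URL and the USER/PASS placeholders are hypothetical and must be replaced with your own.

```python
# Illustrative only: fetch a paywalled PDF through the UNIFI authenticated proxy.
# Replace USER/PASS with your credentials and the URL with the paper you need.
import requests

proxies = {
    "http": "http://USER:PASS@proxy-auth.unifi.it:8888",
    "https": "http://USER:PASS@proxy-auth.unifi.it:8888",
}

resp = requests.get("https://example-publisher.org/paper.pdf",
                    proxies=proxies, timeout=60)
resp.raise_for_status()

with open("paper.pdf", "wb") as f:
    f.write(resp.content)
```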