Large-Scale Optimization

Veranstaltung	Wochentag	Termin	Ort
042531 (Übung: 042532)	Montag (Übung: Mittwoch)	10.15 - 12.00	Otto-Hahn-Str. 12 - R 1.056 Campus Nord

Lecture Information

Module description: pdf
Language: English
Lecturer: Dr. Sangkyun Lee
Attendance to practices (Übung) is REQUIRED.
Office Hour (lecturer): Thursday 13:00-14:00, LS8, OH 12, Raum 4.023 (or by appointments)

Content:

This course focuses on optimization techniques to find solutions of large-scale problems that typically appear in statistical learning / data analysis tasks with big data. Topics would include widely adopted methods in modern research literature such as projected gradient methods, accelerated first order algorithms, conjugate gradient methods, quasi-Newton methods, block coordinate descent, proximal point methods, stochastic sub-gradient algorithms, alternating direction method of multipliers, and semi-definite programming. Efficiency of these methods in terms of convergence rate and overall time complexity will be discussed, so that one can see how to choose a suitable method for his/her own research problems. Separable (approximate) reformulations of optimization problems will also be discussed that are suitable for parallel computation compared to their original formulation. Discussions in the course will be problem- oriented, so that methods will be presented in the context of specific problem instances whenever possible. Homework assignments will be given to provide background knowledge for the course or to check understanding of techniques.

Aims:

The aim of this course is to provide students with understanding of modern optimization techniques suitable for large-scale/big-data problems, so that students see how they can choose, apply, and/or modify appropriate methods for their own research problems.

Organization:

Lectures: we focus on methods to understand the structure of their target problem, algorithms, and convergence properties in theory.
Practices: we will see real-world examples where optimization methods (discussed in lectures) are used.

Lectures:

This lecture will be based on recent publications that will be assigned as reading homeworks. No textbook is required. Another lecture of mine will be helpful if you need background knowledge:

Some of my lecture materials are from:

EE236C (UCLA) by L. Vendenberghe, link
10-725 (CMU) by G. Gordon R. Tibshirani, link
EE364b (Stanford) by S. Boyd, link

The following is a tentative schedule, and may subject to change.

Lecture 1 (08,13.04): Introduction to large-scale optimization
Lecture 2 (15.04): Background for convex optimization
Lecture 3 (20.04): Gradient descent
Lecture 4 (22.04): Subgradient method
Lecture 5 (27.04): Stochastic Subgradient descent 1
Lecture 6 (29.04): Stochastic Subgradient descent 2
Lecture 7-8 (4-6.05): SVM and KKT conditions
Lecture 9 (11.05): Pegasos SVM solver
Lecture 10 (13.05): Proximal gradient descent
Lecture 11 (18.05): Separable optimization / Compressed sensing
Lecture 12 (20.05): Compressed sensing 2
Lecture 13 (27.05): Accelerated First-Order Method
Lecture 14 (01-08.06): Matrix Factorization
Lecture 15 (15.06): ADMM 1
Lecture 16 (17.06): ADMM 2
Lecture 17 (22.06): ADMM 3
Lecture 18 (24.06): Optimization modeling languages
Lecture 19 (29.06): Block coordinate descent 1
Lecture 20 (01.07): Block coordinate descent 2
Lecture 21 (06.07): Duality 1
Lecture 22 (08.07): Duality 2
Lecture 23 (13.07): Duality 3
Lecture 24 (15.07): Duality 4

Reading assignments:

Read the following papers as for your homework.

(discussion begins on 22.04)Nemirovski et al., Robust Stochastic Approximation Approach to Stochastic Programming, SIOPT 2009 Sections 1~2.2, 4~5 (inclusive)
(4-6.5) Shalev-Shwartz, Singer, and Srebro, Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, ICML 2007
(13.5, 27.5-1.6) FISTA: A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems
(18-20.5, optional) L1-L2 Optimization in Signal and Image Processing (This provides a nice overview of some applications/methods we discuss. PDF is accessible within TU Dortmund)
(8-17.6) Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, FnT ML, 2010 Chapters: 2 and 3 (ADMM), 7 (distributed optimization), and 8 (ML examples: optional)
(29.6-8.7) On the Convergence of Block Coordinate Descent Type Methods, SIOPT, 2013

Final Exam: Mini-Project

> Proposal (due by 17.07): a successful proposal is required for the qualification of the final exam. Check the requirements discussed on 14.01 pdf
> Presentation (18.09): oral presentation of a mini-project (20min: 1-person group or 30min: 2-people group)
> Final report (due by 24.09): up to 7 pages (extra pages allowed only for large tables and plots)
Final grade will be based on the presentation (50\%) and the final report (50\%).
Latex template for proposal and the final report: link

Topics:

There is no restriction on the choice of topics, but it should be motivating for the student and fit the time limit of the lecture. You can focus on either method or application, but not restricted to.

Method focus: e.g. modification of a method for better speed for a certain problem. Need to implement the method. Adaptation from an existing code is allowed. Benchmarking several algorithms for one problem setting is possible.
Application focus: e.g. apply an optimization method for your research problem. Don't need to implement the method by yourself, if you can find an off-the-shelf software.

Datasets:

These data sets can be useful for your project.

Collections of "standard" data sets:

> UCI Machine Learning Repository: a common source of data sets link

> LIBSVM datasets: link
Computer vision and image datasets:

> A collection at UT link

> Face recognition databases link

> Visual SLAM for driverless cars link
Movie rating data sets:

> NF dataset (right clink -> download link as a file. Password required to unzip. About 500 MB): link

> MovieLens dataset (smaller than NF) link
Bioinformatics:

> Gene expression omnibus (GEO): link

> The cancer genome atlas (TCGA): link
ECML/PKDD 2015 Discovery Challenge: link

Grading Criteria:

Motivation: why is this an interesting problem for you and/or for the audience?
Understanding: the method(s) of choice should be explained in a compact but precise manner
Creativity: any good reason behind: the choice of a method / methods, the choice of a problem / problems, etc.
Evaluation: how well a hypothesis (e.g. an algorithm will get faster by a modification) is validated by theoretic analysis or experiments?
Presentation: how good things are organized and delivered in the presentation?
Repeatability: does it provide a script and executables to generate (a part of) experiments and results?

Lecture Schedule:

The schedule may change, so please check often.

Hauptnavigation

General

Research

Teaching

Staff