| Course (Veranstaltung) | Weekday | Time | Location |
| --- | --- | --- | --- |
| 042531 (exercise: 042532) | Monday (exercise: Wednesday) | 10:15 - 12:00 | Otto-Hahn-Str. 12, R 1.056, Campus Nord |

- Module description: pdf
- Language: English
- Lecturer: Dr. Sangkyun Lee
- Attendance at the practices (Übung) is STRONGLY RECOMMENDED.
- Office hour (lecturer): by appointment only, LS8, OH 12, Raum 4.023

This course focuses on optimization techniques for solving the large-scale problems that typically arise in statistical learning and data analysis tasks with big data. Topics include methods widely adopted in the modern research literature, such as projected gradient methods, accelerated first-order algorithms, conjugate gradient methods, quasi-Newton methods, block coordinate descent, proximal point methods, stochastic sub-gradient algorithms, the alternating direction method of multipliers (ADMM), and semi-definite programming. The efficiency of these methods, in terms of convergence rate and overall time complexity, will be discussed so that one can choose a suitable method for one's own research problem. We will also discuss separable (approximate) reformulations of optimization problems that are better suited for parallel computation than their original formulations. Discussions in the course will be problem-oriented, so methods will be presented in the context of specific problem instances whenever possible. Homework assignments will be given to provide background knowledge for the course and to check understanding of the techniques.
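As a small taste of the first-order methods named above, here is a minimal gradient descent sketch in Python (the practices themselves use Julia). The least-squares objective, the step size 1/L, and the iteration count are illustrative assumptions, not course material.

```python
import numpy as np

# Minimize f(x) = 0.5 * ||A x - b||^2 by gradient descent.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

def grad(x):
    # gradient of 0.5 * ||A x - b||^2 is A^T (A x - b)
    return A.T @ (A @ x - b)

# step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient
step = 1.0 / np.linalg.norm(A, 2) ** 2

x = np.zeros(5)
for _ in range(500):
    x = x - step * grad(x)

# compare against the closed-form least-squares solution
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_star, atol=1e-6))
```

With a well-conditioned problem like this, the 1/L step converges linearly, which is the kind of convergence-rate statement the lectures make precise.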

The aim of this course is to provide students with an understanding of modern optimization techniques suitable for large-scale/big-data problems, so that they can choose, apply, and/or modify appropriate methods for their own research problems.

- Lectures: understanding of optimization algorithms and their convergence properties.
- Practices: application of optimization to machine learning / data analysis problems.

This lecture will be based on recent publications, which will be assigned as reading homework. No textbook is required. Another lecture of mine may be helpful if you need background knowledge:

Some of my lecture materials are adapted from:

- EE236C (UCLA) by L. Vandenberghe, link
- 10-725 (CMU) by G. Gordon and R. Tibshirani, link
- EE364b (Stanford) by S. Boyd, link

- Introduction (11.04)
- Background for convex optimization (18.04)
- Gradient descent (25.04)
- Stochastic gradient descent 1
- Stochastic gradient descent 2
- Proximal gradient descent
- Accelerated gradient descent
- ADMM 1/2
- ADMM 2/2
- Block coordinate descent 1
- Block coordinate descent 2
- Duality 1
- Duality 2
- Duality 3
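To connect the stochastic gradient lectures to code, here is a minimal SGD sketch in Python for a least-squares objective. The data, the constant step size, and the iteration count are my own illustrative choices; a small constant step works here only because the noiseless data let the model interpolate exactly.

```python
import numpy as np

# SGD on f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(1)
n, d = 200, 5
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true                    # noiseless labels: the model interpolates

x = np.zeros(d)
step = 0.05                       # small constant step, safe for these row norms
for _ in range(20000):
    i = rng.integers(n)           # sample one data point uniformly
    g = (A[i] @ x - b[i]) * A[i]  # stochastic gradient of the i-th term
    x -= step * g

print(np.linalg.norm(x - x_true))
```

In the noisy case treated in the lectures, a decaying step size (e.g. proportional to 1/sqrt(t)) or iterate averaging would be needed instead.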

Due to the nature of the practices, only some parts of the Übung materials will be available here. We'll use the Julia language.

- Intro to Julia
- Intro to JuMP, L1-SVM
- SVM
- Parallel processing in Julia (by Raoul Lefmann)
- Proximal GD and compressed sensing
- PGD, AGD and ADMM (Fused Lasso)
- Performance Tips for Julia
- LADMM on logistic regression, and BCGD
- BCGD, Acceleration, BCD
- Gaussian MRF
- Plotting in Julia (by Mirko Bunse)
- Workflow considerations (by Mirko Bunse)
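The proximal gradient practice boils down to a few lines; here is a minimal ISTA sketch in Python for the Lasso (the problem sizes, the sparse ground truth, and the regularization weight `lam` are illustrative assumptions).

```python
import numpy as np

def soft_threshold(v, tau):
    # prox of tau * ||.||_1: componentwise shrinkage toward zero
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# ISTA for the Lasso: min_x 0.5 * ||A x - b||^2 + lam * ||x||_1
rng = np.random.default_rng(2)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]        # sparse ground-truth signal
b = A @ x_true
lam = 0.1

L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth part
x = np.zeros(20)
for _ in range(2000):
    g = A.T @ (A @ x - b)            # gradient step on the smooth part
    x = soft_threshold(x - g / L, lam / L)  # prox step on the L1 part

print(np.count_nonzero(np.abs(x) > 1e-3))
```

Accelerating this loop with a momentum term gives FISTA, covered in the accelerated gradient lecture.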

*** !!! To register for the final exam, bring this form, filled in, by the last lecture date (21.07) to get my signature !!! ***

Final exam registration form

The basic idea is that you pick a paper, either from a suggested list or of your own choosing, and implement the algorithm in it. You give two presentations.

- Presentation 1 (30-45 min including Q&A; qualification for the final): show your understanding of the ideas and algorithms in the paper (tentative dates: **July 6, 13, 20**).
  - Include a list of tasks you plan to do in the month between the end of lectures and the final presentation.

- Presentation 2 (45 min including Q&A; the final exam): briefly review the idea, then show that your implementation is correct and works well (tentative dates: **August 17 and 18, 10:00 - 13:00**).

**Grading criteria:**
- 1. Understanding of the method in your chosen paper.
- 2. Correct implementation of the method. Correctness can be shown directly (e.g., by comparing against a JuMP solution) or indirectly (e.g., by comparing your prediction performance to that of another implementation on benchmark datasets).
- 3. (bonus) Making your code readable, modular, and open source with some documentation.
- (note) If you choose a paper already discussed in the lectures (e.g., FISTA), extra work is expected: e.g., an interesting application, or an in-depth discussion of the method beyond what was covered in the lectures.
- (note) If you go application-centric (using already-discussed methods), your discussion can focus on the application itself (motivation, challenges, etc.) while fulfilling criterion 1 above.

The list will be provided soon and will keep growing. Discussing your choice with the lecturer is highly recommended (at minimum, send me an email about it). More than one person can work on the same paper, and you can focus on a part of a paper if you have a good reason. Finding more recent papers on similar topics is welcome. Choose one you can implement in about **2 -- 3 weeks**. Implementation in the Julia language is strongly recommended. Some papers may be accessible only from the university network.

>> Methodology-focused (this is the preferred way for a mini-project):

- [ SGD and variants ]
- Basis paper: Nemirovski et al., Robust Stochastic Approximation Approach to Stochastic Programming, SIOPT 2009
- ADAGRAD, JMLR 2011
- SAG, NIPS 2012
- SDCA, JMLR 2013
- SVRG, NIPS 2013 [Chosen by Mirko]
- ADAM, ICLR 2015

- [Stochastic second-order]
- Hessian-free second order, ICML 2011 [Chosen by Michael]
- K-FAC, ICML 2015
- Stochastic quasi-Newton, AISTATS 2007
- SFO, ICML 2014

- [Proximal gradient and acceleration]
- Basis paper: FISTA: A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems
- A geometric alternative to Nesterov's accelerated gradient descent
- A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
- Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent [Chosen by Kevin]

- [ADMM]
- Basis paper: Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, FnT ML, 2010
- Online ADM, ICML 2012
- Fast Stochastic ADMM, ICML 2014
- Accelerated ADMM, KDD 2015
- Sentence Regularization by ADMM, ICML 2014
- Bregman ADMM
- Accelerated Linearized ADMM, SIAM J. Imaging Sci, 2015

- [BCD]
- Basis paper: On the Convergence of Block Coordinate Descent Type Methods, SIOPT, 2013
- SPDC, ICML 2015
- more to be added

>> Application-centric (an interesting application with a basic method is possible; finding newer/better papers is recommended)

- [SVM]
- Shalev-Shwartz, Singer, and Srebro, Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, ICML 2007
- SVM

- [Collaborative filtering]
- ALS for collaborative filtering (not a paper, but you can get some ideas for your project)
- NMF: non-negative matrix factorization (methods and papers are referenced in the package description)
- Learning the parts of objects by non-negative matrix factorization, Nature, 1999

- [Sparse coding]
- An overview of problems: L1-L2 Optimization in Signal and Image Processing
- Lasso, J. Royal Stat. Soc. 2003
- Generalized Lasso, Ann. Stat. 2011
- Fused Lasso, AAAI 2014
- Blind source separation
- Blind source separation-based dictionary learning
- Inpainting, Computer Journal, 2007
- Inpainting, IASC 2008
- Sparse component analysis
- more to be added

- [Sparse coding in probabilistic graphical models]
- GMRF, JMLR 2008
- Graphical Lasso, Biostatistics 2008

Data sets that might be useful for your project:

- Collections of "standard" data sets:
- Computer vision and image datasets:
- Movie rating data sets:
- Bioinformatics:
- ECML/PKDD 2015 Discovery Challenge: link