Computer Science 294
Practical Machine Learning
(Spring 2008)
Prof. Michael Jordan (jordan-AT-cs)
GSI Percy Liang (pliang-AT-cs)
Lecture: Tuesday 5-7pm, Soda 306
Office hours of Percy Liang: Monday 5-6pm (Soda 511 alcove)
Office hours of the lecturer of the week: Thursday 5-6pm (Soda 511 alcove)
This course introduces core statistical machine learning algorithms in a
(relatively) non-mathematical way, emphasizing applied problem-solving. The
prerequisites are light; some prior exposure to basic probability and to linear
algebra will suffice. A list of topics can be found here.
Here's the course website from Fall 2006.
Announcements
- Feb 25: Homework 2 is now posted.
Please submit your homework on bSpace (under Assignments).
- Feb 25: If you haven't started thinking about your project, start now;
you will need to submit a brief paragraph about your project by March 21.
- Feb 18: I will not hold office hours because of the holiday,
but if you would like to meet, email me to set up a time. -Percy
- Jan 22: Please take the following quick survey so we can tailor the class for you.
Administrivia
- Course prerequisites: some prior exposure to probability and to linear algebra.
- Coursework and grading:
Students will be required to complete bi-weekly homework assignments,
which must be turned in on time to receive credit. There will also
be a final project, which requires a written report and a presentation
at an end-of-term poster session. The homework will count for 60% of
the grade and the project for 40%.
- bSpace: use the forum group there to discuss homework and project topics,
ask questions about the class, etc.
To access bSpace, visit https://bspace.berkeley.edu and log in using your CalNet ID.
If you don't have a CalNet ID, send an email to pliang-AT-cs to request a guest account.
If you're not registered for the class or the tab for the course doesn't show up,
you can add it by going to My Workspace | Membership, clicking 'Joinable Sites',
and searching for 'CS294-34 Spring 2008'.
Homework
There will be bi-weekly homework assignments, worth a total of 60% of your grade.
Each homework is due at the beginning of class.
Please keep your responses succinct and clear; there is no need to attach code.
Turn in your homework on bSpace (click Assignments in the left menu).
Project
The project counts for roughly 40% of your grade.
We will use the same guidelines as last year's CS281A
(though with a less theoretical flavor); please read them here.
The guidelines contain examples of project write-ups and posters
to give you an idea of what one can do.
The main idea is to have you apply a concept from the class in your own research,
or explore it further through experimentation.
The evaluation of the project will be based on the following three deliverables:
- Submit on bSpace one paragraph describing your project plan or ideas by Friday, March 21.
The idea is to have you start working on the project before
May... Feel free to come to office hours to discuss project ideas, to email the lecturers,
or to use the wiki/discussion group on bSpace to brainstorm ideas.
- Present a poster about your project on Tuesday, May 13 from 2-4pm on the 6th floor.
- Submit your project write-up on bSpace by Tuesday, May 20.
Readings
Readings for specific topics will be provided later. Several good resources
contain general information:
- Hastie, Tibshirani and Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. (Book's web site)
- Witten and Frank. Data Mining: Practical Machine Learning Tools and Techniques. (Book's web site)
- Andrew Moore's Tutorials: a collection of PDF tutorials on many of the topics that will be covered in the class.
Software
A wide variety of free data mining and machine learning software is available.
You may find these tools useful for the homework or the final project.
- Weka is a large
Java package implementing many learning algorithms.
- RapidMiner (formerly known as YALE)
is an alternative (and complementary) Java package. It includes a GUI that
automates the entire data pipeline, from feature normalization through
feature selection, learning, and cross-validation.
- SVM-Light and LibSVM are two popular implementations of
various SVM algorithms.
- R is an interactive programming
language designed for statistics. Many very useful libraries are available.
Last updated Apr. 22, 2008.