CS 194-10, Fall 2011
Introduction to Machine Learning



Overview

CS 194-10 is a new undergraduate machine learning course designed to complement CS 188, which covers all areas of AI. Eventually it will become CS 189. The main prerequisite is CS 188 or consent of the instructor; students are assumed to have lower-division mathematical preparation including CS 70 and Math 54. The course will be a mixture of theory, algorithms, and hands-on projects with real data. The goal is to enable students to understand and use machine learning methods across a wide range of settings. As this is a new course, the ride may be a little bumpy, so enrollment is limited (see below).

Announcements

10/21/11 Assignment 7 posted, due 11/20.
10/21/11 Assignment 6 posted, due 11/9.
10/28/11 Corrected Assignment 5 posted (removed requirement for fixed weights in 1(c) and (d)).
10/21/11 Assignment 5 posted, due 10/30.
10/14/11 Assignment 4 due date extended to 10/23.
10/14/11 Revised Assignment 4 posted (including testing code, new submission instructions), due 10/21.
10/14/11 Assignment 4 posted (including training data), due 10/21.
10/6/11 Midterm solutions posted.
9/9/11 Assignment 0 solutions and Assignment 2 solutions posted.
9/9/11 Assignment 3 posted, due 10/3.
9/11/11 Corrected version of Assignment 2 posted (fixes typos in Q2 and Q4(c)).
9/9/11 Assignment 2 posted (including training data), due 9/19.
9/1ish/11 Assignment 1 posted, due 9/9.
9/1/11 submit is not working, due to 194-10 being just a section of 194; for the time being, email your solutions as firstname.lastname.tar.gz or firstname.lastname.zip to Avital at cs194-tc@imail.eecs.berkeley.edu.
8/25/11 Corrected version of Assignment 0 posted, fixes typos in Q.3.
8/22/11 Assignment 0 posted, due 9/2.
8/16/11 Discussion sections WILL be held in Week 1, i.e., on Aug 24 before the first class; they will be in 310 Soda instead of the usual rooms.



Instructor Stuart Russell
748 Sutardja Dai Hall, russell AT cs.berkeley.edu; (510) 642 4964
Office hours Mon 10-12 and Wed 9-10 in 748 Sutardja Dai Hall.

GSIs
Lecture TuTh 3:30-5, 390 Hearst Mining
Discussion sections
101, Wed 10-11am, 75 Evans (Avital)
102, Wed 2-3pm, 3109 Etcheverry (Mert)
103, Wed 3-4pm, 87 Evans (Mert)

Final Exam Friday Dec 16th, 7:00-10:00pm, location TBD.


Prerequisites
Enrollment in the Course
Course Requirements and Grading
Reading List
Syllabus, Lecture Slides/Notes, Readings, Due Dates
Assignments
Computer Accounts and Course Software

Prerequisites

The prerequisite for the course is CS 188 or consent of the instructor. I will assume familiarity with logic, elementary probability theory, elementary linear algebra, and multivariable calculus. The overall technical level will be similar to that in 188. It will help to know Python; if you don't, it can be picked up quickly. As usual, ask me if you're not sure about whether to take the course.

Enrollment in the Course

Enrollment is limited to 60 for this offering; the primary criterion for enrollment will be performance in 188. At present, it is likely the cutoff will be around B+. (This will apply to those who are already enrolled as well as to those on the wait list.) Since we are not much above 60 at present, it is likely that students on the wait list with a B+ or better will be able to enroll. If you are not yet enrolled as of the first day of lecture and still want to be considered for participation in the course, you should attend class and file an appeal form by the end of the first week of classes. Appeal forms are available from the CS Office on the 3rd floor of Soda Hall.

Course Requirements and Grading (Preliminary Draft)

Late policy: NO LATE HOMEWORKS WILL BE ACCEPTED, with the following exception: Over the semester, you have a total of 5 FREE LATE DAYS to cover for emergencies. If you wish to use one or more of these, indicate it clearly on your homework. Assignments are to be turned in by midnight on the due date.

Grading policy: the class is not graded on a curve. Grade is based on total percentage as follows:

A+    [90 -- 100]%
A     [85 -- 90)%
A-    [80 -- 85)%
B+    [75 -- 80)%
B     [70 -- 75)%
B-    [65 -- 70)%
C+    [60 -- 65)%
C     [55 -- 60)%
C-    [50 -- 55)%
D+    [45 -- 50)%
D     [40 -- 45)%
D-    [35 -- 40)%
F     [0 -- 35)%

These boundaries are sharp, i.e., no rounding up. Some assignments and exam questions may offer extra credit; good performance on extra credit questions may result in an improved grade, at the instructor's discretion.
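
As an illustration of how the sharp boundaries work, here is a minimal Python sketch that maps a total percentage to a letter grade using the grade boundaries listed above. The names letter_grade, LOWER_BOUNDS, and LETTERS are made up for this example and are not part of any course software; the sketch also ignores extra credit and the rule about skipped exams, which remain at the instructor's discretion.

```python
from bisect import bisect_right

# Lower bound (in percent) of each grade band, lowest band first,
# taken from the grading table above. These names are illustrative only.
LOWER_BOUNDS = [0, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90]
LETTERS = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]


def letter_grade(percent):
    """Map a total percentage in [0, 100] to a letter grade, with no rounding."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # bisect_right counts the lower bounds that are <= percent, so a score
    # exactly on a boundary falls in the higher band, while a score just
    # below the boundary stays in the lower band.
    return LETTERS[bisect_right(LOWER_BOUNDS, percent) - 1]
```

For example, letter_grade(89.99) falls in the [85 -- 90) band and is not rounded up to the next one, while letter_grade(90) falls in the top band.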

A course grade of F will be assigned if the midterm or final is skipped.


Reading List (Preliminary Draft)

The first two books are very helpful, and are available online, so those (in addition to AIMA) will be the primary sources. Bishop has a wide range of solid mathematical derivations, while Witten and Frank focus much more on the practical side of applied machine learning and on the Weka package (a Java library and interface for machine learning).

Reading assignments for each week (to prepare for lecture, or review for assignments) appear here.


Assignments

Assignments are due by midnight on the day indicated.

Each assignment will include a combination of problems to solve and programs to write and test. Assignments should be turned in using the submit program from an instructional (named or class) account, as described here.

If necessary, solutions to the homework problems can be turned in on paper in the homework box in 283 Soda, or they may be turned in online (e.g., as PDFs produced from LaTeX) using submit, as part of your overall submission.

Except for Assignment 0, which must be done individually, assignments can be done individually or in pairs. (This goes for both problem-solving and programming parts.) If done in pairs, each partner should be involved in all the work!! The usual rule about free-riding applies: the more you free-ride, the lower will be your score on the midterm and final.

Discussion of assignments among students is permitted and encouraged, but solutions and programs may not be copied. I would recommend NOT mixing inter-group discussion with writing up of solutions or code. See the EECS Department Policy on Academic Dishonesty and Kris Pister's policy for further explanation and examples.

Finding solutions on the web: It is becoming increasingly difficult to give homework problems whose solutions are not already available in some form on the web. This does not mean that your first response to any homework should be to type the question into Google. The EECS policy begins "Copying all or part of another person's work, or using reference material not specifically allowed, are forms of cheating." For the purposes of this course, the allowed reference materials are the reading materials listed on the course web page and any additional materials specified in the homework; in addition, you may use Wikipedia for background reference.

It is a good idea to start your programming assignments as soon as you can; computers have a tendency to go down the night before an assignment is due. There is evidence from past courses that students who start working well before the due date take about one third the time to complete their work compared to students who wait until the last minute. In general, it will be worth your while to spend more time away from the screen thinking about programs than struggling with them on-line.


Computing Facilities

You will have access to department UNIX workstations for this course. If you already have a "named" account or are enrolled in another EECS course this semester that provides named accounts, you can use that account for this class. Otherwise, you will get a "class" account specifically for CS 194-10; see Information for New Instructional Users as well as the departmental policies.

Course Software

The primary programming language for the course will be Python, including the numpy, scipy, and matplotlib packages. In addition we may be using special-purpose machine learning software packages.

Class newsgroup

The class newsgroup will be on Piazza; students who are enrolled or on the waitlist will receive an email inviting them to access the newsgroup.

The class newsgroup is suitable for asking general questions about what the homework questions mean, how the course software works, etc. Do not ask or answer specific questions about homework solutions, e.g., "What's the right answer for number 2?" One of the course GSIs will be checking the newsgroup fairly regularly, but for "official" answers to important questions you might want to email your own GSI directly, AFTER you have checked to see if the question has already been answered on the newsgroup!