The course explores two complementary roles for humans in interactive data analytics. In the first, humans are the analysts performing or supervising the analysis; here, the emphasis is on building usable tools for these analysts. In the second, humans are the crowdsourced workers assisting with the computation and analysis; here, the emphasis is on having humans process as little data as possible while gaining maximum benefit.
Students will read a number of papers, both landmark papers and cutting-edge ones; act as a discussant for a paper at least once; and complete a semester-long implementation project. Familiarity with basic databases, machine learning, and algorithms is expected.
Crowd-Powered Analytics: An IBM study estimated that 80% of the data recorded every day is unstructured: i.e., it consists of images, videos and text. Fully automated processing of unstructured data is not yet a solved problem. Humans, on the other hand, are very good at understanding, interpreting, and processing unstructured data. How do we use humans to effectively process large volumes of unstructured data?
Interactive Analytics: A McKinsey Big Data Study estimated that tens of millions of new data analysts will be needed by 2017. With so many novice data analysts interacting with data, how do we enable them to quickly get valuable insights? "Quickly" could mean generating the same results faster, but approximately; it could mean showing them visualizations instead of raw data; it could mean helping users "guess" the query or insight they have in mind.
You must use the following link to submit your list of top-5 papers: Link.
The papers you provide can be from the list given below. You are also free to list a paper of your choice, but it must match the themes of the class. This list must be submitted by midnight on September 6.
You must use the following link to submit class reviews: Link.
Remember to cover the 5 key questions: what is the problem, why is it important, what sets it apart from previous work, what are the key technical ideas, what are the key flaws and open issues, all within 500 words.
The class reviews must be submitted by midnight the day before class.
|Date||Paper||Presenter||Notes|
|8/28/2017||VLDB Conference--No Lecture|
|8/30/2017||VLDB Conference--No Lecture|
|9/6/2017||Introduction to course content||Aditya||Send list of paper preferences by midnight September 6th.|
|9/11/2017||CrowdScreen: Algorithms for Filtering Data Using Humans||Aditya|
|9/13/2017||Human-Powered Sorts and Joins||Aditya||First class review due: send it by midnight the day before.|
|9/18/2017||So Who Won: Dynamic Max Discovery with the Crowd||Assma||Student Presentations Start|
|9/20/2017||CrowdDB: Answering Queries Using Crowdsourcing||Fareedah|
|9/25/2017||Deco: Declarative Crowdsourcing||Litian|
|9/27/2017||Dremel: Interactive Analysis Of Web-Scale Datasets||Dipannita|
|10/2/2017||Spark SQL: Relational Data Processing in Spark||Junting Lou|
|10/4/2017||BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data||Subham De|
|10/9/2017||Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee||Silu Huang, Ph.D. Student, DAIS|
|10/11/2017||Trust Me, I’m Partially Right: Incremental Visualization Lets Analysts Explore Large Datasets Faster||Aditya|
|10/16/2017||Incvisage: I’ve Seen “Enough”: Incrementally Improving Visualizations to Support Rapid Decision Making||Sajjadur, Ph.D. Student, DAIS|
|10/18/2017||ImMens: Real-time Visual Querying of Big Data||Tao Mo|
|10/23/2017||Polaris: A System for Query, Analysis, and Visualization of Multidimensional Relational Databases||Chi-Hsien Yen|
|10/25/2017||Effortless Data Exploration with zenvisage: An Expressive and Interactive Visual Analytics System||Edward Xue|
|10/30/2017||dbTouch: Analytics at your Fingertips||Peter|
|11/1/2017||Gestural Query Specification||Saar Kuzi||Midterm project report due on the 3rd at midnight|
|11/6/2017||DataPlay: Interactive Tweaking and Example-driven Correction of Graphical Database Queries||Doris|
|11/8/2017||Data-Spread: Unifying Databases and Spreadsheets||Mangesh, Ph.D. Student, DAIS|
|11/13/2017||Making Database Systems Usable||Assma|
|11/15/2017||MLbase: A Distributed Machine-learning System||Jialin|
|11/27/2017||Guest Lecture: Leveraging data and people to accelerate data science||Laura Haas, IBM Research||No class|
|11/29/2017||MAD Skills: New Analysis Practices for Big Data||Yue|
|12/4/2017||GraphLab: A New Framework For Parallel Machine Learning||Siyu|
|12/6/2017||OrpheusDB: Bolt-on Versioning for Relational Databases||Liqi, Ph.D. Student, DAIS|
|12/11/2017||Project Presentations||Presentation due prior to class|
|12/13/2017||Project Presentations||Presentation due prior to class; report due midnight|
As part of this course, you need to complete a semester-long project. See the instructor for ideas. Alternatively, you are free to look for ideas in your domain of expertise: for instance, if you work in computational journalism, building a new way to browse and manage large collections of textual archives could be a perfectly reasonable project. Either way, you must speak to the instructor to verify that the project is indeed "challenging" enough.