CSE 5800 Advanced Topics in CS:
Learning/Mining and the Internet
MW 6:30-7:45pm, Link 255

Philip Chan
322 Harris Center, 674-7280

Office Hours: MW 1-3pm (or by appointment)

Syllabus

Schedule

Week Dates Monday Wednesday
1 Aug 17 & 19 Introduction
A Mining/Learning Algorithm [Classification]
Fast Effective Rule Induction. W. Cohen. Proc. ICML, pp. 115-123, 1995.
FOIL Gain
description length: pages 44-45 (typo in third term of Eq 4.9: cover -> uncover)
2 Aug 24 & 26 Syskill & Webert: Identifying interesting web sites. M. Pazzani, J. Muramatsu & D. Billsus. Proc. AAAI, pp. 54-61, 1996.
Data Mining Methods for Detection of New Malicious Executables. M. Schultz, E. Eskin, E. Zadok & S. Stolfo. Proc. IEEE Security & Privacy Symp., pp. 38-49, 2001.
3 Aug 31 & Sep 2 An Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam Filtering with Personal E-mail Messages. I. Androutsopoulos, J. Koutsias, K. Chandrinos & C. Spyropoulos. Proc. SIGIR, pp. 160-167, 2000.
Anomaly Detection
Learning Rules for Anomaly Detection of Hostile Network Traffic. M. Mahoney & P. Chan. Proc. ICDM, pp. 601-604, 2003.
[More details in FIT Tech Report CS-2003-16.]
4 Sep 7 & 9 Labor Day holiday
HW 1 demo
5 Sep 14 & 16 A comparative study of anomaly detection schemes in network intrusion detection. A. Lazarevic, L. Ertoz, A. Ozgur, J. Srivastava & V. Kumar. Proc. SDM, pp. 25-36, 2003.
Using Generalization and Characterization Techniques in the Anomaly-based Detection of Web Attacks. W. Robertson, G. Vigna, C. Kruegel & R. Kemmerer. Proc. NDSS, 2006.
6 Sep 21 & 23 Association Rules (handout, 6.1-6.3)
Correction of Algorithms 6.2 & 6.3
[More details in Fast Algorithms for Mining Association Rules. R. Agrawal & R. Srikant. Proc. VLDB, pp. 487-499, 1994.]
Mining Web Logs for Prediction Models in WWW Caching and Prefetching. Q. Yang, H. Zhang & T. Li. Proc. KDD, pp. 473-478, 2001.
Video mentioning ML for prefetching in the cloud with Amazon's Silk browser, 2011.
7 Sep 28 & 30 Clustering
A Comparison of Document Clustering Techniques. M. Steinbach, G. Karypis & V. Kumar. U. Minnesota Tech Report 00-034, 2000.
[shorter version: A Comparison of Document Clustering Techniques. M. Steinbach, G. Karypis & V. Kumar. KDD Workshop on Text Mining, 2000.]
HW 2 demo
8 Oct 5 & 7 A New Suffix Tree Similarity Measure for Document Clustering. H. Chim & X. Deng. Proc. WWW, pp. 121-130, 2007.
Detecting Malicious Flux Service Networks through Passive Analysis of Recursive DNS Traces. R. Perdisci, I. Corona, D. Dagon & W. Lee. Proc. ACSAC, pp. 311-320, 2009.
9 Oct 12 & 14 Columbus Day holiday
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. M. Ester, H. Kriegel, J. Sander & X. Xu. Proc. KDD, pp. 226-231, 1996.
10 Oct 19 & 21 Graphs
Bridging Centrality: Graph Mining from Element Level to Group Level. W. Hwang, T. Kim, M. Ramanathan & A. Zhang. Proc. KDD, pp. 336-344, 2008.
HW 3 demo
11 Oct 26 & 28 Human mobility, social ties, and link prediction. D. Wang et al. Proc. KDD, pp. 1100-1108, 2011.
On Flow Authority Discovery in Social Networks. C. Aggarwal, A. Khan & X. Yan. Proc. SDM, pp. 522-533, 2011.
12 Nov 2 & 4 Meta path-based collective classification in heterogeneous information networks. X. Kong et al. Proc. CIKM, pp. 1567-1571, 2012.
Detecting influenza epidemics using search engine query data. J. Ginsberg, M. Mohebbi, R. Patel, L. Brammer, M. Smolinski & L. Brilliant. Nature, 457:1012-1014, 2009.
13 Nov 9 & 11 HW 4 demo
Veterans Day holiday
14 Nov 16 & 18 Google News Personalization: Scalable Online Collaborative Filtering. A. Das, M. Datar, A. Garg & S. Rajaram. Proc. WWW, pp. 271-280, 2007.
Information cartography: creating zoomable, large-scale maps of information. D. Shahaf et al. Proc. KDD, pp. 1097-1105, 2013.
15 Nov 23 & 25 Coupled Semi-Supervised Learning for Information Extraction. A. Carlson, J. Betteridge, R. Wang, E. Hruschka Jr. & T. Mitchell. Proc. ACM Intl. Conf. Web Search and Data Mining (WSDM), pp. 101-110, 2010.
Thanksgiving holiday
16 Nov 30 & Dec 2 Learning First-Order Horn Clauses from Web Text. S. Schoenmackers, O. Etzioni, D. Weld & J. Davis. Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), pp. 1088-1098, 2010.
Term Project presentation and demo

Abbreviation/acronym of research conferences

Assignments (Submit Server)

Resources