Day |
Time |
Room |
Monday |
10:00-11:00 |
CS102 |
Wednesday |
10:00-11:00 |
CS102 |
Thursday |
10:00-11:00 |
CS102 |
Lecture No. & Topic |
Reading Material |
1. Overview |
The KDD process for extracting
useful knowledge from volumes of data Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, Communications of the ACM, Vol. 39(11), Nov. 1996, Pages: 27 - 34 |
2. Data
Warehousing & OLAP |
An overview of data warehousing and OLAP
technology Surajit Chaudhuri, Umeshwar
Dayal, ACM SIGMOD Record,
Vol. 26, Mar. 1997.
Data cube: A relational
aggregation operator generalizing group-by, cross-tab, and sub-totals.
J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. In Proc. 12th ICDE, pages 152--159, New Orleans,
March 1996.
|
3. Data
Preprocessing: Summary Data Structures, AD-Tree, KD-Tree |
Cached
Sufficient Statistics for Efficient Machine Learning with Large
Datasets Andrew Moore and Mary Soon Lee, Journal of Artificial Intelligence
Research, Vol. 8, Mar. 1998, pp. 67-91. A tutorial on kd-trees, Andrew Moore, University of Cambridge Computer Laboratory Technical Report No. 209, 1991. |
4. Data Preprocessing:
Attribute Selection |
Pattern
Classification, Duda, Hart & Stork, Prentice Hall India,
2002,
Call No. 006.4 D86P2 (Chapters on Feature Selection and Extraction) |
5. Association
Rules, Apriori Algorithm |
Mining association rules between sets
of items in large databases, Rakesh Agrawal, Tomasz Imieliński,
Arun Swami, ACM SIGMOD Record , Proceedings of the
1993 ACM SIGMOD international conference on Management of data, Vol. 22(2)
Fast Algorithms for Mining Association Rules, R. Agrawal, R. Srikant, Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, Sept. 1994. Chapter 6 of Han & Kamber book. |
6. Association Rules, FP-growth algorithm, Generalized Association Rules | Mining Frequent Patterns
without Candidate Generation, (Slides),
J. Han, J. Pei, and Y. Yin, Proc. 2000 ACM-SIGMOD Int. Conf.
on Management of Data (SIGMOD'00), 200
Mining
Generalized Association Rules, R. Srikant, R. Agrawal,
Proc. of the 21st Int'l Conference on Very Large Databases,
1995 Chapter 6 of Han & Kamber book. |
7. Clustering:
Overview |
Data clustering: a review,
A. K. Jain, M. N. Murty, P. J. Flynn, ACM Computing
Surveys, Vol. 31, Issue 3, Pages: 264 - 323, 1999 Chapter 8 of Han & Kamber book. |
8. Partitional Clustering: ScaleKM, CLARANS |
Scaling Clustering
Algorithms to Large Databases, P.S. Bradley, U. Fayyad
and C. Reina, Knowledge Discovery and Data Mining, 1998, pp. 9-15. |
9. Hierarchical Clustering: BIRCH, CURE |
BIRCH: an efficient data clustering
method for very large databases , Tian Zhang, Raghu
Ramakrishnan, Miron Livny, ACM SIGMOD, Vol. 25(2), Pages 103-114,
1996 CURE: An efficient clustering algorithm for large databases, |
10. Density based clustering: DBSCAN, OPTICS |
A
Density-Based Algorithm for Discovering Clusters in Large Spatial
Databases with Noise, Ester M., Kriegel H.-P., Sander J.,
Xu X., Proc. 2nd int. Conf. on Knowledge Discovery and Data Mining
(KDD ‘96), Portland, Oregon, 1996, AAAI Press, 1996. OPTICS: Ordering Points To Identify the Clustering Structure , Ankerst M., Breunig M. M., Kriegel H.-P., Sander J., Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'99), Philadelphia, PA, 1999, pp. 49-60. |
11. Grid based clustering: STING,
CLIQUE |
STING: A Statistical
Information Grid Approach to Spatial Data Mining, W. Wang,
J. Yang, and R.R. Muntz, Proceedings of the 23rd VLDB Conference,
Athens, Greece, Aug 1997. Automatic subspace clustering of high dimensional data for data mining applications, Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan, Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, Seattle, WA, Pages: 94 - 105 |
12. Model based clustering: Expectation
Maximization, Clustering categorical data: ROCK |
Scaling EM algorithm to large
database, P.S. Bradley, U. Fayyad, C. Reina, Microsoft Technical
Report, 1999. ROCK: A robust clustering algorithm for categorical attributes, S. Guha, R. Rastogi and K. Shim, Information Systems, 2002 |
13. Outlier Detection: Statistical,
Distance based, exception based, discovery driven exploration of
data cubes. |
Chapter 8 of Han and Kamber
book Distance-based outliers: algorithms and applications, |
14. Classification:
Overview, Decision trees |
Chapter 7 of Han and Kamber book |
15. Scalable Decision Trees: SLIQ |
SLIQ:
A Fast Scalable Classifier for Data Mining, M. Mehta,
R. Agrawal and J. Rissanen, Proc. of the Fifth Int'l Conference
on Extending Database Technulogy, Avignon, France, March
1996. |
16. Scalable Decision Trees: SPRINT,
RainForest |
SPRINT:
A Scalable Parallel Classifier for Data Mining, J.C.
Shafer, R. Agrawal, M. Mehta, Proc. of the 22th Int'l Conference
on Very Large Databases, Mumbai (Bombay), India, Sept. 1996. RAINFOREST - A Framework for Fast Decision Tree Construction of Large Datasets. J. E. Gehrke, Raghu Ramakrishnan, and Venkatesh Ganti, Proceedings of the Twenty-fourth International Conference on Very Large Data Bases, New York, New York, 1998. |
17. Support Vector Machines |
A Tutorial on
Support Vector Machines for Pattern Recognition, C. J. C.
Burges. Knowledge Discovery and Data Mining, 2(2), 1998.
|
18. Bayes Classifier, k-nearest neighbor
rule |
Pattern Classification, Duda,
Hart & Stork, Prentice Hall India, 2002, Call No. 006.4 D86P2 (Chapters on Bayes Decision Rule and k-nearest neighbor rule) |
19. Metaclassifiers: Boosting, Error Correcting
Output Codes, Query by Committee |
Ensemble Learning. T. G.
Dietterich, In The Handbook of Brain Theory and Neural Networks,
Second edition, (M.A. Arbib, Ed.), Cambridge, MA: The MIT Press,
2002. Postscript
Preprint. A short introduction to boosting, Y. Freund and R.E. Schapire.Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September 1999. Solving Multiclass Learning Problems via Error-Correcting Output Codes. Dietterich, T. G., Bakiri, G. Journal of Artificial Intelligence Research 2: 263-286, 1995. Postscript file. Query by committee, H. S. Seung, M. Opper, H. Sompolinsky, Proceedings of the fifth annual workshop on Computational learning theory, Pages: 295 - 302 |
20. Overview of Statistical and Computational
Learning Theory |
An
overview of statistical learning theory, Vapnik, V.N. IEEE Trans. Neural
Networks, Vol. 10, No. 5, Page(s):
988-999, 1999, PDF
Full-Text A theory of the learnable, L. G. Valiant, Communications of the ACM, Volume 27, Issue 11, Pages: 1134 - 1142, 1984. |
21. Sequence Mining: Sequential Pattern
Mining |
Mining
Sequential Patterns, R. Agrawal, R. Srikant, Proc.
of the Int'l Conference on Data Engineering (ICDE), Taipei,
Taiwan, March 1995. |
22. Motif discovery, Clustering, Similarity
Search in Time Series Data |
Finding Motifs in
Time Series, J. Lin, E. Keogh, P. Patel, and S. Lonardi,
In proceedings of the 2nd Workshop
on Temporal Data Mining, at the 8th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining. Edmonton,
Alberta, Canada. July 23-26, 2002. Efficient Similarity Search in Sequence Databases, R. Agrawal, C. Faloutsos, A. Swami, Proc. of the 4th Int'l Conference on Foundations of Data Organization and Algorithms, Chicago, Oct. 1993, Also in Lecture Notes in Computer Science 730, Springer Verlag, 1993, 69-84. An enhanced representation of time series which allows fast and accurate classification clustering and relevance feedback. Keogh, E. and Pazzani, M., In 4th International Conference on Knowledge Discovery and Data Mining. New York, NY, Aug 27-31. 1998, pp 239-243. [pdf] |
23. Streaming Data Mining: Computing
model, Sampling, Histogram |
Querying
and Mining Data Streams: You Only Get One Look, Minos Garofalakis,
Johannes Gehrke, and Rajeev Rastogi, Tutorial at 2002
ACM SIGMOD International Conference on Management of Data (SIGMOD'2002),
Madison, Wisconsin, June 2002. |
24. Sketches |
Processing
Complex Aggregate Queries over Data Streams. Alin Dobra, Minos Garofalakis,
J. E. Gehrke, and Rajeev Rastogi. In Proceedings of the 2002 ACM
Sigmod International Conference on Management of Data, Madison, Wisconsin,
June 2002. |
25. Clustering Data Streams |
Clustering
Data Streams: Theory and Practice, Sudipto Guha, Nina Mishra, Adam
Meyerson, Rajeev Motwani and Liadan O'Callaghan. IEEE Transactions on
Knowledge and Data Engineering, 15(3): 515-528, May/June 2003. |
26. Knowledge evaluation: Interestingness |
What makes patterns interesting in knowledge discovery
systems, Silberschatz,
A.; Tuzhilin, A.; Knowledge and Data Engineering, IEEE Transactions
on ,Vol: 8(6) , 1996 Pages:970 - 974, [PDF
Full-Text (708KB)] |
27. Visualization: Data Visualization |
Multidimensional Information Visualization:
A Survey, P. E. Hoffman and G. G. Grinstein, In Information Visualization
in Data Mining and Knowledge Discovery, U Fayyad, G. G. Grinstein and
A. Wierse (Eds.) Morgan Kauffman, 2001, Link: http://ivpr.cs.uml.edu/~phoffman/aviz/program.htm#url1 |
28. Model Visualization |
Visualizing Data Mining Models,
Kurt Thearling, Barry Becker, Dennis DeCoste, Bill Mawby, Michel
Pilote, and Dan Sommerfield, In Information Visualization
in Data Mining and Knowledge Discovery, U Fayyad, G. G. Grinstein and A.
Wierse (Eds.) Morgan Kauffman, 2001, Link: http://www.thearling.com/text/dmviz/modelviz.htm |
29. Web Mining: Overview |
Web mining research: a survey,
Raymond Kosala, Hendrik Blockeel, ACM SIGKDD Explorations, Vol. 2(1), pp.
1-15, June 2000. |
30. Document Models, Latent Semantic Indexing,
Hyperlink analysis: PageRank |
Indexing by latent semantic analysis, Deerwester,
S., Dumais, S. T., Landauer, T. K., Furnas, G. W. and Harshman, R. A.,
Journal
of the Society for Information Science, 41(6), 391-407, 1990, PDF version.
The PageRank Citation Ranking: Bringing Order to the Web, Page, Lawrence; Brin, Sergey; Motwani, Rajeev; Winograd, Terry, Proceedings of the SIGCHI conference on Human factors in computing systems, The Hague, Netherlands, pp. 145-152, 2000. pdf |
31. HITS, Search Engine Architecture |
Authoritative sources
in a hyperlinked environment. J. Kleinberg, Proc. 9th ACM-SIAM Symposium
on Discrete Algorithms, 1998. Extended version in Journal of the ACM 46(1999). The Anatomy of a Large-Scale Hypertextual Web Search Engine. S. Brin and L. Page,WWW7 / Computer Networks 30(1-7): 107-117 (1998), http://www-db.stanford.edu/~backrub/google.html |
32. Web Page Classification, Web Usage Mining |
Mining the Web, Soumen Chakrabarty,
Morgan Kauffman, 2003 Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data, Jaideep Srivastava,Robert Cooley, Mukund Deshpande,Pang-Ning Tan, SIGKDD Explorations, Vol. 1, Issue 2, 2000. (Postscript) |
Conclusion |