Study Guide@lith
 

Linköping Institute of Technology

 
 
Valid for year : 2017
 
TDDE16 Text Mining, 6 ECTS credits.
/Text Mining/

For:   CS   D   DAV   IT   U  


Note!

The course is not available for exchange students.

 

Prel. scheduled hours: 34
Rec. self-study hours: 126

  Area of Education: Technology

Main field of studies: Computer Engineering, Computer Science, Information Technology

  Advancement level (G1, G2, A): A

Aim:
The overall aim of the course is to provide an introduction to quantitative analysis of text, with a special focus on applying machine learning methods to text documents. The student will learn the main steps of working with text: i) efficient extraction of text, ii) natural language processing that brings the text into a form suitable for iii) statistical machine learning methods, which are subsequently used for iv) text prediction.
After completing the course, the student should be able to:
  • use basic methods for information extraction and retrieval of textual data
  • apply text processing techniques to prepare documents for statistical modelling
  • apply relevant machine learning models to analyze textual data and correctly interpret the results
  • use machine learning models for text prediction
  • evaluate the performance of machine learning models for textual data


Prerequisites: (valid for students admitted to programmes within which the course is offered)
Mathematical Analysis; Linear Algebra; Probability and Statistics; Machine Learning; Basic Programming.

Note: Admission requirements for non-programme students usually also include the admission requirements for the programme and the threshold requirements for progression within the programme, or equivalent.

Supplementary courses:
Bayesian Learning, Natural Language Processing.

Organisation:
The course consists of lectures, computer laboratory work and an individual project. The lectures introduce concepts and theories that students then use in problem solving at the computer labs and in the project work.

Course contents:
Introduction and overview of quantitative text analysis and its applications. Information extraction. Web crawling. Information retrieval. Tf-idf. Vector space models. Text preprocessing. Bag of words. N-grams. Sparsity and smoothing for text. Document classification. Sentiment analysis. Model evaluation. Topic models.

Course literature:
Bird, S., Klein, E., and Loper, E., Natural Language Processing with Python, O'Reilly, 2009.
Jurafsky, D., and Martin, J. H., Speech and Language Processing, 2nd international edition, Pearson, 2008.


Examination:
UPG1   Laboratory exercises   3 ECTS
UPG2   Project                3 ECTS
 
UPG1 consists of computer exercises that test the students' ability to translate theoretical knowledge into practical problem solving in machine learning.
UPG2 is an individual project in which the student solves a real-world problem involving text. The project is documented in a written project report, on which it is evaluated.



Course language is English.
Department offering the course: IDA.
Director of Studies: Ann-Charlotte Hallberg
Examiner:


 


Contact: TFK, val@tfk.liu.se
Last updated: 04/26/2017