| TDDE16 | Text Mining, 6 ECTS credits. /Text Mining/
 
 
			For: CS, D, DAV, IT, U
 
 | 
		 
         
          | Note!
 | The course is not available for exchange students
 
 
 | 
         
        
		
		  |  | Prel. scheduled hours: 34
 Rec. self-study hours: 126
 
 
 | 
		 
		
			|  | Area of Education: Technology 
 Main field of studies: Computer Engineering, Computer Science, Information Technology
 
 
 | 
         
          |  | Advancement level 
(G1, G2, A):   A 
 
 | 
         
          |  | Aim: The overall aim of the course is to provide an introduction to quantitative analysis of text, with a special focus on applying machine learning methods to text documents. The student will learn the main steps of working with text: i) efficient extraction of text, ii) natural language processing of the text into a form suitable for iii) statistical machine learning methods, which are subsequently used for iv) text prediction.
 After completing the course the student should be able to:
 
use basic methods for information extraction and retrieval of textual data
apply text processing techniques to prepare documents for statistical modelling
apply relevant machine learning models to analyze textual data and correctly interpret the results
use machine learning models for text prediction
evaluate the performance of machine learning models for textual data
 
 
 | 
         
          |  | Prerequisites: (valid for students admitted to programmes within which the course is offered) Mathematical analysis; Linear Algebra; Probability and Statistics; Machine Learning; Basic programming.
 
 Note: Admission requirements for non-programme students usually also include admission requirements for the programme and threshold requirements for progression within the programme, or corresponding.
 
 
 | 
         
         
          |  | Supplementary courses: Bayesian Learning, Natural Language Processing.
 
 
 | 
         
         
          |  | Organisation: The course consists of lectures, computer laboratory work and an individual project. The lectures introduce concepts and theories that students then use in problem solving at the computer labs and in the project work.
 
 
 | 
         
          |  | Course contents: Introduction and overview of quantitative text analysis and its applications. Information extraction. Web crawling. Information retrieval. Tf-idf. Vector space models. Text preprocessing. Bag of words. N-grams. Sparsity and smoothing for text. Document classification. Sentiment analysis. Model evaluation. Topic models.
 
 
 | 
         
          |  | Course literature: Bird, S., Klein, E., and Loper, E., Natural Language Processing with Python, O'Reilly, 2009.
 Jurafsky, D., Martin, J. H., Speech and Language Processing, 2nd international edition. Pearson, 2008.
 
 
 | 
         
          |  | Examination: | 
        
				
			|  | UPG1: Laboratory exercises, 3 ECTS
 UPG2: Project, 3 ECTS
 
 | 
        
		    |  | 
         
          |  | UPG1 consists of computer exercises that test the students' ability to translate theoretical knowledge into practical problem solving in machine learning. UPG2 is an individual project in which the student solves a real-world problem involving text. The project is documented in, and evaluated through, a written project report.
 | 
 
         
          | 
 
 Course language is English.
 Department offering the course: IDA.
 Director of Studies: Ann-Charlotte Hallberg
 Examiner:
 
 
 |