SYLLABUS
Text Mining, 6 ECTS Credits
 
COURSE CATEGORY   Master's Programme in Statistics and Data Mining
MAIN FIELD OF STUDY   Statistics - STA
SUBJECT AREA   Statistics – ST1
COURSE CODE   732A47
AIM OF THE COURSE
After completing the course, the student should, at an advanced level, be able to:
- account for and use the principles for storing and accessing textual information
- account for techniques for information extraction and information retrieval
- apply text processing techniques to prepare documents for statistical modelling
- apply relevant statistical models for analyzing textual data and correctly interpret the results
- use statistical models for prediction of textual information
- evaluate the performance of statistical models for textual data
CONTENTS
The course aims to show how textual data can be retrieved, linguistically pre-processed and subsequently analyzed quantitatively using formal statistical methods and models. The course brings together expertise from the areas of database methodology, computational linguistics and statistics.
The course proceeds in four stages:
1. Introductory modules
- Introduction to Python programming
- Introduction to statistical modeling
- Introduction to computational linguistics
2. Data models and information retrieval for textual data
3. Statistical models for textual data
4. Text mining project
TEACHING
The course consists of lectures, lab exercises and a text mining project. The lectures present concepts and methods. The computer lab exercises are devoted to the practical application of text mining tools. In the project, the student gains hands-on experience in solving a text mining problem. Homework and independent study are a necessary complement to the course. Language of instruction: English.
EXAMINATION
Written and oral report on the text mining project. Written reports on lab assignments. Detailed information about the examination can be found in the course's study guide.

Students who have failed an examination covering either the entire course or part of the course twice are entitled to have a new examiner appointed for the re-examination.

Students who have passed an examination may not retake it in order to improve their grades.
ADMISSION REQUIREMENTS

For acceptance to the course, the student must have a bachelor’s degree with a total of at least 90 ECTS credits (1.5 years of full-time studies) in mathematics, applied mathematics, statistics, and computer science. The undergraduate courses in mathematics should include both calculus and linear algebra. Basic undergraduate courses in statistics and computer science are also required.
Documented knowledge of English equivalent to Engelska B/Engelska 6: an internationally recognized test, e.g. TOEFL (minimum scores: paper-based 575 + TWE score 4.5, or Internet-based 90 + TWE score 20), IELTS Academic (minimum overall band 6.5 and no band below 5.5), or equivalent.
GRADING
The course is graded according to the ECTS grading scale A-F.
CERTIFICATE
A course certificate is issued by the Faculty Board on request. The Department provides a special form, which should be submitted to the Student Affairs Division.
COURSE LITERATURE
The course literature is decided upon by the department in question.
OTHER INFORMATION
Planning and implementation of a course must take the wording of the syllabus as its starting point. The course evaluation included in each course must therefore address the question of how well the course agrees with the syllabus.

The course is carried out in such a way that both men's and women's experience and knowledge are made visible and developed.
 
Text Mining
 
Department responsible for the course or equivalent:
IDA - Department of Computer and Information Science
           
Registrar No: 2012-01070   Course Code: 732A47      
Exam codes: see Local Computer System
Subject/Subject Area: Statistics - STA
           
Level   Education level     Subject Area Code   Field of Education  
A1X   Advanced level       TE  
The syllabus was approved by the Board of Faculty of Arts and Science 2013-10-18