Mining Students Data To Analyze Learning Behavior: A Case Study

This document discusses mining student data from a database course to analyze learning behavior. It collected data from 151 students, including personal records, academic records, course records, and data from the course management system Moodle. After preprocessing the data, it applied data mining techniques including association rule mining, classification, clustering, and outlier detection. Association rule mining identified rules describing relationships between student attributes and final grades. Classification and clustering analyzed student behavior, while outlier detection found unusual students.


MINING STUDENTS DATA TO ANALYZE LEARNING BEHAVIOR: A CASE STUDY
ALAA EL-HALEES
Department of Computer Science, Islamic University of Gaza, P.O. Box 108, Gaza, Palestine

ABSTRACT
Educational data mining is concerned with developing methods for discovering knowledge from data that come from educational environments. In this paper we used educational data mining to analyze learning behavior. In our case study, we collected students' data from a database course. After preprocessing the data, we applied data mining techniques to discover association, classification, clustering and outlier detection rules. In each of these four tasks, we extracted knowledge that describes students' behavior.

Keywords: Educational Data Mining, E-Learning, Learning Management Systems.

1. INTRODUCTION
There is increasing research interest in using data mining in education. This emerging field, called Educational Data Mining, is concerned with developing methods that discover knowledge from data that come from educational environments [15]. The data can be collected from historical and operational data residing in the databases of educational institutes. The student data can be personal or academic. It can also be collected from e-learning systems, which hold a vast amount of information and are used by most institutes [8][13].
Educational data mining uses many techniques such as decision trees, neural networks, k-nearest neighbor, Naive Bayes, support vector machines and many others. Using these methods, many kinds of knowledge can be discovered, such as association rules, classifications and clusterings. The discovered knowledge can be used to better understand students' behavior, to assist instructors, to improve teaching, to evaluate and improve e-learning systems, to improve curricula, and for many other benefits [14][15].
Romero and Ventura in [14] concluded that work should be oriented towards the educational domain of data mining. This paper investigates the educational domain of data mining using a case study from a database class. It shows what kind of data can be collected, how we can preprocess the data, how to apply data mining methods to the data, and finally how we can benefit from the discovered knowledge. Many kinds of knowledge can be discovered from data. In this work we investigated the most common ones, which are association, classification, clustering and outlier detection.
The rest of the paper is organized as follows: Section 2 summarizes related work in educational data mining. Section 3 gives a general description of the data we used in our case study. Section 4 describes the preprocessing stage of the data used. Section 5 reports our experiments on applying data mining methods to the educational data. Finally, we conclude this paper with a summary and an outlook for future work.

2. RELATED WORK
Although using data mining in higher education is a recent research field, there are many works in this area, because of its potential for educational institutes. [14] surveyed educational data mining between 1995 and 2005. They concluded that educational data mining is a promising area of research and that it has specific requirements not present in other domains. Thus, work should be oriented towards the educational domain of data mining.
[10] gave a case study that used educational data mining to identify the behavior of failing students in order to warn students at risk before the final exam. [15] gave another case study of using educational data mining in the Moodle course management system. They applied each step of the data mining process to e-learning data. Also, educational data mining was used by [11] to predict students' final grades using data collected from a Web-based system. [3] used educational data mining to identify and then enhance the educational process in higher educational systems, which can improve their decision-making process. Finally, [17] used data mining to assist in the development of new curricula, and to help engineering students select an appropriate major.

3. DATA COLLECTION
In our case study we collected the students' data from a database management system course held at the Islamic University of Gaza in the first semester of 2007/2008. The number of students was 151. The sources of the collected data were: personal records and academic records of students, course records, and data that came from the e-learning system. For e-learning, the course used Moodle, which is a well-known open source course management system [12]. From Moodle, first, we collected information about students' access to e-learning, where it appeared that some students did not access the system at all. Then, we got information about how much students benefited from resources, such as using ebooks, research papers and old exams available on the system. Also, we got the students' grades in solving exercises available in the system.
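The sources described above (personal, academic, course and Moodle records) have to be combined per student before they can be analyzed together. The following is a minimal sketch of one way to do that, not the procedure used in the study: the function name, the field names and the sample values are all hypothetical. It joins the sources on a student identifier and gives students who never accessed Moodle explicit zero-activity defaults instead of dropping them.

```python
def integrate_records(personal, academic, course, moodle):
    """Join per-student records from four sources into one flat row per student."""
    # Assumed defaults for students who never accessed the e-learning system.
    no_usage = {"moodle_logins": 0, "resources_viewed": 0, "exercise_grade": None}
    rows = []
    for student_id in sorted(personal):
        row = {"student_id": student_id}
        row.update(personal[student_id])
        row.update(academic.get(student_id, {}))
        row.update(course.get(student_id, {}))
        row.update(moodle.get(student_id, no_usage))
        rows.append(row)
    return rows


# Made-up example records; none of these values come from the study.
personal = {"s1": {"gender": "F"}, "s2": {"gender": "M"}}
academic = {"s1": {"gpa": 85.0}, "s2": {"gpa": 67.5}}
course = {
    "s1": {"midterm": 78, "attendance": 0.95},
    "s2": {"midterm": 52, "attendance": 0.60},
}
moodle = {"s1": {"moodle_logins": 14, "resources_viewed": 9, "exercise_grade": 8.5}}

rows = integrate_records(personal, academic, course, moodle)
```

Keeping the students without Moodle activity in the integrated table matters here, because their absence from the e-learning system is itself a piece of behavior that later analysis may want to use.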
4. DATA PREPARATION AND PREPROCESSING
To get better input data for data mining techniques, we did some preprocessing of the collected data. After integrating the data into one file, we discretized the numerical attributes into categorical ones to improve interpretation and comprehensibility. For example, we grouped all grades into five groups: excellent, very good, good, poor and failure. In the same way, we discretized other attributes such as attendance and resource access.

Figure 4.1: Visualizing the data used in the case study using the Knime data mining system

By using some preprocessing techniques, such as visualization, we can get some preliminary useful knowledge. For example, using Knime, which is an open source data mining system from the University of Konstanz, Germany, we visualized the students' data [6]. From this visualization, some useful knowledge was drawn about the attributes before applying data mining methods. Using a histogram of the data as in figure 4.1, we discovered that attendance, students' GPAs, and lab grades have a positive relationship with the final grade. However, e-learning facilities such as e-resources used by students, exercises and assignments hardly affected the final grade.

5. DATA MINING TASKS IN EDUCATIONAL SYSTEMS
Data mining uses advanced techniques to discover patterns in data. The data mining tasks are the kinds of patterns that can be mined. There are many tasks in data mining; the most common ones are association, classification, clustering and outlier detection. The following sections describe the results of applying data mining techniques to the data of our case study for each of the four tasks.

5.1 ASSOCIATION RULES
Mining association rules searches for interesting relationships among items in a given dataset [1]. It allows finding rules of the form "If antecedent then (likely) consequent", where antecedent and consequent are itemsets [7]. Itemsets are sets of one or more items. In our dataset an example of an item is: attendance = good. Because we are looking for items that characterize the final grade of students, the consequent has one item, which is final_grade = z, where z is one value of the final grade such as excellent, very good, etc. Figure 5.1 is a sample of the association rules discovered from the data for students with an excellent final grade.

Figure 5.1: Association rules for student data

These rules are sorted by the lift metric. The lift value is the ratio of the confidence of the rule to the expected confidence of the rule [16]. Equivalently, lift is the ratio of the probability of the antecedent and consequent occurring together to the probability expected if they occurred independently (the product of their individual probabilities). A lift value greater than 1 indicates a positive correlation between antecedent and consequent. For example, the first rule, with a lift of 9.313, means there is a high positive correlation between the antecedent (good attendance, doing exercises in e-learning and a good midterm grade) and the consequent (final grade excellent). With the lift value, we can interpret the importance of a rule. The first rule, with the highest lift and hence the highest correlation, is the most important, and so on.
For a better understanding of the association rules, a graph can be constructed. For example, figure 5.2 represents rules for the final grade fail. From the graph we can see that some attributes occur more frequently than others, such as failing the midterm.

Figure 5.2: Association rules graph for students with grade fail using Arviewer [2]

5.2 CLASSIFICATION
Classification is a data mining task that predicts group membership for data instances [7]. In educational data mining, given the work of a student, one may predict his/her final grade [15]. In our case study we used a J48 decision tree to represent logical rules for the student final grade. The resulting tree is large; some of the strong rules in the tree are:
_________________________________________
If midterm = good and attendance = good then final grade = excellent
If midterm = fail and lab = fail then final grade = fail
If midterm = average and gpa = pass then final grade = pass
If midterm = average and gpa = good and e-homework = one then final grade = pass
If midterm = average and gpa = good and e-homework = two and attendance = average then final grade = good
If midterm = average and gpa = very good then final grade = very good

Figure 5.3: Clustering students into five groups using EM-Clustering Algorithm
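The classification rules listed above can be read as a small decision procedure. The sketch below encodes only those six rules as a Python function. It is an illustration, not the J48 tree itself, which the paper does not show in full. The dictionary keys (`midterm`, `attendance`, `lab`, `gpa`, `e_homework`) and the choice to return `None` when no listed rule applies are assumptions made for this example.

```python
from typing import Optional


def predict_final_grade(student: dict) -> Optional[str]:
    """Apply the six listed rules in order; return None if no rule fires."""
    midterm = student.get("midterm")
    attendance = student.get("attendance")
    gpa = student.get("gpa")
    e_homework = student.get("e_homework")

    if midterm == "good" and attendance == "good":
        return "excellent"
    if midterm == "fail" and student.get("lab") == "fail":
        return "fail"
    if midterm == "average":
        if gpa == "pass":
            return "pass"
        if gpa == "good" and e_homework == "one":
            return "pass"
        if gpa == "good" and e_homework == "two" and attendance == "average":
            return "good"
        if gpa == "very good":
            return "very good"
    # The full tree has more branches; they are not listed in the text.
    return None
```

Because every condition uses only attributes known before the final exam, a function like this could be evaluated mid-semester for each student.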

The benefit of this method is that it can predict low grades in time. For example, the instructor can identify students who are likely to fail before the end of the semester and work with them to improve their performance before the final exam.
Note that classification rules differ from the rules generated by association mining. Association rules are characteristic rules (they describe the current situation), whereas classification rules are prediction rules (they describe a future situation).

5.3 CLUSTERING
Clustering finds groups of objects such that the objects in one group are similar to one another and different from the objects in other groups [7]. In educational data mining, clustering has been used to group students according to their behavior. For example, Romero in [18] used clustering to distinguish active students from non-active ones according to their performance in activities. Based on this clustering, the instructor can group active students with non-active students to improve students' performance.
In our case study we used the Expectation-Maximization algorithm (EM-clustering) to cluster the given data. An EM algorithm [5][4] is a mixture-based algorithm that finds maximum likelihood estimates of parameters in probabilistic models. We used EM-clustering to group students according to their performance. Figure (5.3) gives the mean of each cluster for each attribute. Using these results we can divide students into five groups and guide them according to their behavior.

5.4 OUTLIER DETECTION
Outlier detection discovers data points that are significantly different from the rest of the data [9]. In educational data mining, outlier analysis can be used to detect students with learning problems [14]. In our case study, we used outlier analysis to detect outliers in the student data. The system detected 37 outliers in our data. Figure (5.4) is a sample of the instances that were detected as outliers and the attribute where the outlier occurred. For each case, the instructor can look at the outlier behavior of the student, try to understand why the irregularity happened, and then resolve the problem if there is any.

Figure 5.4: Outlier analysis of student data
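The text does not say which outlier detection method or tool produced these results. As an illustration of the general idea of reporting both the instance and the attribute where the irregularity occurs, the following sketch uses a simple per-attribute z-score test on numeric values. The technique, the threshold of 2.0, the function name and the sample records are all assumptions for this example, not the study's method or data.

```python
from statistics import mean, stdev


def find_outliers(records, attributes, threshold=2.0):
    """Return (record_index, attribute, z_score) for values far from the attribute mean."""
    flagged = []
    for attr in attributes:
        values = [record[attr] for record in records]
        mu = mean(values)
        sigma = stdev(values)
        if sigma == 0:
            continue  # all values identical, nothing can stand out
        for index, value in enumerate(values):
            z = (value - mu) / sigma
            if abs(z) > threshold:
                flagged.append((index, attr, round(z, 2)))
    return flagged


# Made-up records: ten students, one with unusually low attendance.
attendance = [90, 88, 92, 91, 89, 90, 93, 87, 90, 20]
midterm = [70, 72, 68, 71, 69, 70, 73, 67, 70, 70]
records = [{"attendance": a, "midterm": m} for a, m in zip(attendance, midterm)]

flagged = find_outliers(records, ["attendance", "midterm"])
```

In this made-up data only the last student's attendance is flagged, which mirrors the kind of instance-plus-attribute report an instructor could follow up on.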

6. CONCLUSION AND FUTURE WORK
In this paper, we gave a case study in educational data mining. It showed how useful data mining can be in higher education, particularly for improving student performance. We used students' data from a database course. We collected all available data, including their usage of the Moodle e-learning facility. We applied data mining techniques to discover knowledge. In particular, we discovered association rules, sorted the rules using the lift metric, and then visualized the rules. Then we discovered classification rules using a decision tree. We also clustered the students into groups using EM-clustering. Finally, using outlier analysis we detected all outliers in the data. Each of these kinds of knowledge can be used to improve the performance of students.
For future work, the study could be generalized to more diverse courses to get more accurate results. Also, experiments could be done using more data mining techniques such as neural nets, genetic algorithms, k-nearest neighbor, Naive Bayes, support vector machines and others. Finally, the preprocessing and data mining algorithms used could be embedded into the e-learning system so that anyone using the system can benefit from the data mining techniques.

REFERENCES:
[1] Agrawal, R., Imielinski, T. and Swami, A., "Mining Association Rules between Sets of Items in Large Databases". In Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 207-216, Washington, D.C., May 1993.

[2] Arviewer, http://www2.lifl.fr/~jourdan/download/arv.html, 2008.

[3] Beikzadeh, M. and Delavari, N., "A New Analysis Model for Data Mining Processes in Higher Educational Systems". In Proceedings of the 6th Information Technology Based Higher Education and Training, 7-9 July 2005.

[4] Bradley, P., Fayyad, U. and Reina, C., "Scaling EM clustering to large databases". Technical Report, Microsoft Research, 1999.

[5] Dempster, A., Laird, N. and Rubin, D., "Maximum Likelihood from Incomplete Data via the EM Algorithm". Journal of the Royal Statistical Society, 39(1): 1-38, 1977.

[6] Knime, <www.Knime.com>, 2008.

[7] Han, J. and Kamber, M., "Data Mining: Concepts and Techniques", 2nd edition. The Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series Editor. 2006.

[8] Machado, L. and Becker, K., "Distance Education: A Web Usage Mining Case Study for the Evaluation of Learning Sites". Third IEEE International Conference on Advanced Learning Technologies (ICALT'03), 2003.

[9] Mansur, M. O. and Sap, M. Noor M., "Outlier Detection Technique in Data Mining: A Research Perspective". In Postgraduate Annual Research Seminar, 2005.

[10] Merceron, A. and Yacef, K., "Educational Data Mining: a Case Study". In Proceedings of the 12th International Conference on Artificial Intelligence in Education AIED 2005, Amsterdam, The Netherlands, IOS Press, 2005.

[11] Minaei-Bidgoli, B., Kashy, D., Kortemeyer, G. and Punch, W., "Predicting Student Performance: An Application of Data Mining Methods with an Educational Web-Based System". In Proceedings of the 33rd ASEE/IEEE Conference on Frontiers in Education, 2003.

[12] Moodle, <www.Moodle.com>, 2008.

[13] Mostow, J. and Beck, J., "Some useful tactics to modify, map and mine data from intelligent tutors". Natural Language Engineering 12(2), 195-208, 2006.

[14] Romero, C. and Ventura, S., "Educational Data Mining: A Survey from 1995 to 2005". Expert Systems with Applications (33) 135-146, 2007.

[15] Romero, C., Ventura, S. and Garcia, E., "Data mining in course management systems: Moodle case study and tutorial". Computers & Education, Vol. 51, No. 1, pp. 368-384, 2008.

[16] Sheikh, L., Tanveer, B. and Hamdani, S., "Interesting Measures for Mining Association Rules". IEEE-INMIC Conference, December 2004.

[17] Waiyamai, K., "Improving Quality of Graduate Students by Data Mining". Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand, 2003.