Performance of Student Academics By K-Mean Clustering Algorithm


Mrs. Bhawna Janghel, Dr. Asha Ambhaikar

Department of Computer Science, Kalinga university, Raipur, India




Data clustering is the process of grouping a set of objects so that objects in the same group are more similar to each other than to those in other groups. In this paper, k-means clustering is used to evaluate student performance based on the results of quarterly, half-yearly, and final exams. On the basis of academic performance, the results of govt. schools and private schools can be compared, which helps to identify the more effective education system.


KEYWORDS: Data clustering, k-means, academic performance.



Data Mining:

Data mining is the process of extracting useful information from large databases. It is a powerful tool that helps organizations retrieve useful information from available data warehouses. Data mining can be applied to relational databases, object-oriented databases, data warehouses, and structured or unstructured databases. It is used in numerous areas, such as banking, insurance, and pharmaceuticals.[1]


Patterns in Data Mining:


1. Association:

Items or objects in relational databases, transactional databases, or other information repositories are examined to find associations or correlations among them.

2. Classification:

The goal of classification is to construct a model from historical data that can accurately predict the class of new data. It maps the data into predefined groups or classes and searches for new patterns.

3. Regression:

Regression creates predictive models. Regression analysis makes predictions from existing data by fitting a formula to it. It is useful for predicting unknown values on the basis of previously known information.

4. Cluster analysis:

It is the process of partitioning a set of data into a set of meaningful subclasses, called clusters. It is used to place data elements into related groups without advance knowledge of the group definitions.


5. Forecasting:

Forecasting is concerned with the discovery of knowledge or information patterns in data that can lead to reasonable predictions about the future.
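The regression pattern listed above can be illustrated with a minimal sketch of a least-squares line fit. The function names `fit_line` and `predict` and the marks used here are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x with y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical marks (illustration only): half-yearly vs. final exam
half_yearly = [40.0, 50.0, 60.0, 70.0]
final = [45.0, 55.0, 65.0, 75.0]

model = fit_line(half_yearly, final)
print(predict(model, 80.0))  # predicted final mark for a half-yearly mark of 80
```

The fitted line plays the role of the "formula" applied to existing data; any other regression model could be substituted.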


Technologies used in data mining:

Many techniques are used in the development of data mining methods. Some of them are described below:

a)     Statistics:

Statistics uses mathematical analysis to represent, model, and summarize empirical data or real-world observations. Statistical analysis comprises a collection of methods, applicable to large amounts of data, for drawing conclusions and reporting trends.

b)     Machine learning:

Arthur Samuel defined machine learning as a field of study that gives computers the ability to learn without being explicitly programmed. As new data is supplied, machine learning algorithms adapt their models to it. In machine learning, an algorithm is constructed to make predictions from the available data (predictive analysis). It is closely related to computational statistics.


The four types of machine learning are:

1. Supervised learning:

Supervised learning is based on classification and is also called inductive learning. In this method, the desired outputs are included in the training dataset.


2. Unsupervised learning:

Unsupervised learning is based on clustering. Clusters are formed on the basis of similarity measures, and the desired outputs are not included in the training dataset.


3. Semi-supervised learning:

Semi-supervised learning includes some desired outputs in the training dataset to generate the appropriate functions. This method generally avoids the need for a large number of labeled examples (i.e. desired outputs).

4. Active learning:

Active learning is a powerful approach for analyzing data efficiently. The algorithm itself decides which examples it needs desired outputs for and asks the user to supply them, so the user plays an important role in this type of learning.

c)     Information retrieval :

Information retrieval deals with uncertain representations of the semantics of objects (text, images).


d)     Database systems and data warehouse:

Databases are used for recording data as well as for data warehousing. Online Transaction Processing (OLTP) uses databases for day-to-day transactions. To remove redundant data and save storage space, data is normalized and stored in the form of tables. Entity-Relationship modeling techniques are used for relational database management system design. Data warehouses are used to store historical data, which helps in taking strategic business decisions. They are used for Online Analytical Processing (OLAP), which helps to analyze the data.

e)    Decision support system:

A decision support system is a category of information system. It is very useful in decision making for organizations. It is an interactive, software-based system that helps decision makers extract useful information from data and documents in order to make decisions.


KDD Data mining:

The process of discovering knowledge in data by applying data mining techniques is referred to as Knowledge Discovery in Databases (KDD). KDD draws on various fields such as artificial intelligence, pattern recognition, machine learning, and data visualization. The main goal of KDD is to extract knowledge from large databases with the help of data mining methods.

The different steps of KDD are as given below:

1. Data cleaning:

In this step, noise and irrelevant data are removed from the database.

2. Data integration:

In this step, heterogeneous data sources are merged into a single data source.

3. Data selection:

In this step, the data relevant to the analysis is retrieved from the database.

4. Data transformation:

In this step, the selected data is transformed into forms suitable for data mining.

5. Data mining:

In this step, various techniques are applied to extract data patterns.

6. Pattern evaluation:

In this step, the extracted data patterns are evaluated.

7. Knowledge representation:

This is the final step of KDD, in which the discovered knowledge is presented to the user.
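The seven KDD steps above can be sketched as a small pipeline. This is only an illustrative sketch: the records, field names (`student`, `school`, `final`), function names, and the choice of "mean final mark per school type" as the mined pattern are all assumptions made for this example, not the procedure or data of this study.

```python
# Hypothetical raw records from two separate sources (illustration only)
source_a = [
    {"student": "S1", "school": "govt", "final": 62},
    {"student": "S2", "school": "govt", "final": None},  # missing mark
]
source_b = [
    {"student": "S3", "school": "private", "final": 80},
    {"student": "S4", "school": "private", "final": 70},
]


def clean(records):
    """Step 1: drop records whose mark is missing."""
    return [r for r in records if r.get("final") is not None]


def integrate(*sources):
    """Step 2: merge several sources into one list."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged


def select(records, fields):
    """Step 3: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records, max_mark=100.0):
    """Step 4: rescale marks to the range 0..1."""
    return [{**r, "final": r["final"] / max_mark} for r in records]


def mine(records):
    """Step 5: extract a simple pattern, (mean mark, count) per school type."""
    groups = {}
    for r in records:
        groups.setdefault(r["school"], []).append(r["final"])
    return {s: (sum(v) / len(v), len(v)) for s, v in groups.items()}


def evaluate(patterns, min_count=1):
    """Step 6: keep patterns supported by at least min_count records."""
    return {s: p for s, p in patterns.items() if p[1] >= min_count}


def represent(patterns):
    """Step 7: present the knowledge as readable lines."""
    return [f"{s}: mean={m:.2f} (n={n})" for s, (m, n) in sorted(patterns.items())]


def kdd(*sources):
    cleaned = [clean(s) for s in sources]
    merged = integrate(*cleaned)
    selected = select(merged, ["school", "final"])
    return represent(evaluate(mine(transform(selected))))


for line in kdd(source_a, source_b):
    print(line)
```

Each function corresponds to one numbered step, so a different mining technique (for example clustering) could replace `mine` without changing the rest of the pipeline.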



Based on the literature review, the problem identified is to build a data mining model that uses clustering to assess student performance. In most of the reviewed papers, prediction is based on students' results, but such predictions may sometimes be wrong because they rely only on previous results. More dataset instances will therefore be collected, and the results will be compared and analyzed with other data mining techniques such as association and clustering.




k-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure partitions a given data set into a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centers, one for each cluster. These centers should be placed carefully, because different locations lead to different results. A good choice is therefore to place them as far away from each other as possible. The next step is to take each point belonging to the data set and associate it with the nearest center. When no point is pending, the first step is complete and an initial grouping is done. At this point, k new centroids are recalculated as the barycenters of the clusters resulting from the previous step. With these k new centroids, a new binding is made between the same data set points and the nearest new center. This produces a loop in which the k centers change their location step by step until no more changes occur, in other words until the centers no longer move. Finally, the algorithm aims at minimizing an objective function known as the squared error function, given by:[2]


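$$J(V) = \sum_{i=1}^{c} \sum_{j=1}^{c_i} \left( \lVert x_j - v_i \rVert \right)^2$$

where,

 ‘||xj - vi||’ is the Euclidean distance between the jth data point xj of the ith cluster and its center vi.

 ‘ci’ is the number of data points in the ith cluster.

 ‘c’ is the number of cluster centers.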
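As a minimal sketch, the squared error objective described above can be computed by summing, for every cluster, the squared Euclidean distance from each of its points to its center. The function name `squared_error` and the sample points are hypothetical and used only for illustration.

```python
def squared_error(clusters, centers):
    """Sum of squared Euclidean distances from each point to its cluster center.

    clusters: list of clusters, each a list of points (tuples of numbers)
    centers:  list of centers, centers[i] belonging to clusters[i]
    """
    total = 0.0
    for points, center in zip(clusters, centers):
        for point in points:
            # Squared Euclidean distance between one point and its center
            total += sum((a - b) ** 2 for a, b in zip(point, center))
    return total


# Hypothetical example: two clusters in two dimensions
clusters = [[(1.0, 1.0), (3.0, 1.0)], [(10.0, 10.0)]]
centers = [(2.0, 1.0), (10.0, 10.0)]
print(squared_error(clusters, centers))
```

A lower value means the points lie closer to their assigned centers, which is what each k-means iteration tries to achieve.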


Algorithmic steps for k-means clustering

Let X = {x1, x2, x3, …….., xn} be the set of data points and V = {v1, v2, ……., vc} be the set of centers.

1) Randomly select ‘c’ cluster centers.

2) Calculate the distance between each data point and cluster centers.

3) Assign each data point to the cluster center nearest to it, i.e. the one at minimum distance among all the cluster centers.

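4) Recalculate the new cluster center using:

$$v_i = \frac{1}{c_i} \sum_{j=1}^{c_i} x_j$$

where, ‘ci’ represents the number of data points in the ith cluster and the sum runs over the data points xj assigned to that cluster.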

5) Recalculate the distance between each data point and the newly obtained cluster centers.

6) If no data point was reassigned then stop, otherwise repeat from step (3).
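The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names, the optional `initial_centers` argument (added so that a run can be made reproducible), the `max_iter` safeguard, and the rule of keeping the old center when a cluster becomes empty are choices made for this sketch and are not part of the original description.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean (barycenter) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, c, initial_centers=None, seed=0, max_iter=100):
    # Step 1: select c cluster centers (randomly unless given explicitly)
    if initial_centers is None:
        centers = random.Random(seed).sample(points, c)
    else:
        centers = list(initial_centers)

    assignment = None
    for _ in range(max_iter):
        # Steps 2, 3 and 5: compute distances and assign each point
        # to its nearest center
        new_assignment = [
            min(range(c), key=lambda i: squared_distance(p, centers[i]))
            for p in points
        ]
        # Step 6: stop when no data point was reassigned
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 4: recalculate each center as the mean of its points
        for i in range(c):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # assumption: an empty cluster keeps its old center
                centers[i] = mean_point(members)
    return centers, assignment


# Hypothetical two-dimensional points forming two visible groups
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centers, labels = k_means(points, 2, initial_centers=[(1, 1), (9, 9)])
print(labels)  # one cluster label per input point
```

Because the result depends on the starting centers, running the sketch with several different seeds and keeping the run with the lowest squared error is a common precaution.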



For this study, data were taken from private and govt. schools in Pathalgaon (dist. Jashpur) and Bhilai (dist. Durg). The data cover 10 private schools and 10 govt. schools. The results of the govt. and private schools will be compared to find out which schools' techniques are more effective, which will help to improve the present education system.
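One possible way to carry out such a comparison, once every student has been given a cluster label, is to tabulate what share of each school type falls into each cluster. The sketch below is an assumption about how the comparison could be done, not the procedure of this study; the function name and the labels are hypothetical and are not data from the schools named above.

```python
from collections import Counter


def cluster_share_by_school_type(school_types, cluster_labels):
    """Return {(school_type, cluster): share of that school type in the cluster}."""
    totals = Counter(school_types)
    pairs = Counter(zip(school_types, cluster_labels))
    return {(s, c): n / totals[s] for (s, c), n in pairs.items()}


# Hypothetical labels for illustration only; not data from this study
school_types = ["govt", "govt", "govt", "govt",
                "private", "private", "private", "private"]
cluster_labels = [0, 0, 1, 1, 1, 1, 1, 0]

shares = cluster_share_by_school_type(school_types, cluster_labels)
for key in sorted(shares):
    print(key, shares[key])
```

If one cluster corresponds to high marks, a larger share of a school type in that cluster would suggest better results for that type, subject to the size and quality of the data.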



1.      Datamining Tutorial.


3.      Oyelade, O. J., Oladipupo, O. O., Obagbuwa, I. C., Application of k-Means Clustering algorithm for prediction of Students’ Academic Performance, (IJCSIS) Vol. 7, No. 1, 2010

4.      Sunita B Aher, Mr. LOBO L.M.R.J., Data Mining in Educational System using WEKA, (ICETT) 2011

5.      Bindiya M Varghese, Jose Tomy J, Unnikrishnan A, Poulose Jacob K, Clustering Student Data to Characterize Performance Patterns, (IJACSA)

6.      Mohammed M. Abu Tair, Alaa M. El-Halees, Mining Educational Data to Improve Students’ Performance: Volume 2 No. 2, February 2012

7.      Dorina Kabakchieva, Student Performance Prediction by Using Data Mining Classification Algorithms, Vol 1 Issue 4 November 2012

8.      Surjeet Kumar Yadav, Saurabh Pal, Data Mining: A Prediction for Performance Improvement of Engineering Students using Classification, Vol. 2, No. 2, 51-56, 2012

9.      Dr. Sudhir B. Jagtap, Dr. Kodge B. G., Census Data Mining and Data Analysis using WEKA, (ICETSTM – 2013)

10.   P.Veeramuthu, Dr.R.Periyasamy, V.Sugasini, Analysis of Student Result Using Clustering Techniques, (IJCSIT) Vol. 5 (4), 2014

11.   M. Durairaj, C. Vijitha, Educational Data mining for Prediction of Student Performance Using Clustering Algorithms, (IJCSIT) Vol. 5 (4), 2014

12.   Kashish Kohli, Shiivong Birla, Data Mining on Student Database to Improve Future Performance, 15, July 2016

13.   Mr. Shashikant Pradip Borgavakar, Mr. Amit Shrivastava, Evaluating Student’s Performance using K-Means Clustering, Vol. 6 Issue 05, May – 2017

14.   Dr. K. Karthikeyan, P. Kavipriya, On Improving Student Performance Prediction in Education Systems using Enhanced Data Mining Techniques, Volume 7, Issue 5, May 2017

15.   Hilal Almarabeh, Analysis of Students' Performance by Using Different Data Mining Classifiers, 2017.08.02

16.   Hafez Mousa, Ashraf Maghari, School Students' performance Predication Using Data Mining Classification, Vol. 6, Issue 8, August 2017

17.   K. Govindasamy and T. Velmurugan, A Study on Classification and Clustering Data Mining Algorithms based on Students Academic Performance Prediction, Volume 10, Number 23, 2017

18.   Abdelbaset Al-Masri, Experiences in Mining Educational Data to Analyze Teacher's Performance: A Case Study with High Educational Teachers, 2017.10.12.01




Received on 23.05.2020            Accepted on 21.06.2020     

© All Right Reserved

Int. J. Tech. 2020; 10(1):58-61.

DOI: 10.5958/2231-3915.2020.00011.5