Student Grade Prediction

A student’s academic performance, shaped by factors such as study habits, family background, and school environment, directly affects their future career prospects, earning potential, and overall well-being. By identifying these influences and applying targeted strategies, educators and policymakers can improve learning outcomes and better support students in reaching their full potential.
The objective is to equip educational policymakers with predictive tools and actionable insights to enhance student performance and promote academic success.
In this project, we will develop a model to predict which students need intervention before they fail their course, using the UCI Student Performance dataset from two Portuguese secondary schools.
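One way to make "needs intervention" concrete is to derive a binary label from the final grade, which the dataset records on a 0 to 20 scale. The sketch below assumes a pass mark of 10; that threshold and the name `needs_intervention` are illustrative assumptions, not a definition fixed by this project.

```python
PASS_MARK = 10  # assumed pass mark on the 0-20 scale; adjust to the project's definition


def needs_intervention(final_grade, pass_mark=PASS_MARK):
    """Return True when the final grade falls below the pass mark."""
    if not 0 <= final_grade <= 20:
        raise ValueError(f"final grade must be between 0 and 20, got {final_grade}")
    return final_grade < pass_mark


# Hypothetical final grades for three students.
grades = [8, 10, 15]
labels = [needs_intervention(g) for g in grades]
print(labels)  # [True, False, False]
```

A label like this turns the task into binary classification; predicting the grade itself as a number would be a regression task instead.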
Data
The dataset used in this project is the publicly available Student Performance dataset from the UCI Machine Learning Repository. Each row represents a unique student from one of two Portuguese secondary schools, and each column describes a student characteristic (demographic, social, or school-related) or a grade, including the final grade.
The following describes the key columns in the data:
- school: Student’s school (binary: ‘GP’ - Gabriel Pereira or ‘MS’ - Mousinho da Silveira)
- sex: Student’s sex (binary: ‘F’ - female or ‘M’ - male)
- age: Student’s age (numeric: from 15 to 22)
- address: Student’s home address type (binary: ‘U’ - urban or ‘R’ - rural)
- famsize: Family size (binary: ‘LE3’ - less than or equal to 3 or ‘GT3’ - greater than 3)
- Pstatus: Parents’ cohabitation status (binary: ‘T’ - living together or ‘A’ - apart)
- Medu: Mother’s education (numeric: 0 - none, 1 - primary education, 2 - 5th to 9th grade, 3 - secondary education, or 4 - higher education)
- Fedu: Father’s education (numeric: 0 - none, 1 - primary education, 2 - 5th to 9th grade, 3 - secondary education, or 4 - higher education)
- studytime: Weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
- failures: Number of past class failures (numeric: n if 1≤n<3, else 4)
- schoolsup: Extra educational support (binary: yes or no)
- famsup: Family educational support (binary: yes or no)
- activities: Extra-curricular activities (binary: yes or no)
- internet: Internet access at home (binary: yes or no)
- G1: First period grade (numeric: from 0 to 20)
- G2: Second period grade (numeric: from 0 to 20)
- G3: Final grade (numeric: from 0 to 20, output target)
- Mjob: Mother’s job (nominal: “teacher”, “health” care related, civil “services” (e.g. administrative or police), “at_home” or “other”)
- guardian: Student’s guardian (nominal: “mother”, “father” or “other”)
- Fjob: Father’s job (nominal: “teacher”, “health” care related, civil “services” (e.g. administrative or police), “at_home” or “other”)
- famrel: Quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
- reason: Reason for choosing this school (nominal: close to “home”, school “reputation”, “course” preference or “other”)
- traveltime: Home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
- paid: Extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
- nursery: Attended nursery school (binary: yes or no)
- higher: Wants to take higher education (binary: yes or no)
- romantic: In a romantic relationship (binary: yes or no)
- freetime: Free time after school (numeric: from 1 - very low to 5 - very high)
- goout: Going out with friends (numeric: from 1 - very low to 5 - very high)
- Walc: Weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
- Dalc: Workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
- health: Current health status (numeric: from 1 - very bad to 5 - very good)
- absences: Number of school absences (numeric: from 0 to 93)
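The documented value ranges can double as a lightweight sanity check before modelling. The sketch below encodes a few of the columns described above and reports rows that fall outside them; the names `CATEGORICAL`, `NUMERIC_RANGES`, and `validate_row` are hypothetical helpers, and the two sample rows are made up for illustration.

```python
# Allowed values for a few columns, copied from the descriptions above.
CATEGORICAL = {
    "school": {"GP", "MS"},
    "sex": {"F", "M"},
    "address": {"U", "R"},
    "famsize": {"LE3", "GT3"},
    "Pstatus": {"T", "A"},
    "internet": {"yes", "no"},
}
NUMERIC_RANGES = {
    "age": (15, 22),
    "Medu": (0, 4),
    "Fedu": (0, 4),
    "studytime": (1, 4),
    "G1": (0, 20),
    "G2": (0, 20),
    "G3": (0, 20),
}


def validate_row(row):
    """Return a list of problems found in one row; an empty list means it passed."""
    problems = []
    for column, allowed in CATEGORICAL.items():
        if column in row and row[column] not in allowed:
            problems.append(f"{column}: unexpected value {row[column]!r}")
    for column, (low, high) in NUMERIC_RANGES.items():
        if column not in row:
            continue  # columns missing from the row are skipped, not reported
        try:
            value = int(row[column])
        except ValueError:
            problems.append(f"{column}: {row[column]!r} is not an integer")
            continue
        if not low <= value <= high:
            problems.append(f"{column}: {value} outside {low}-{high}")
    return problems


# Hypothetical rows for illustration only.
good_row = {"school": "GP", "sex": "F", "age": "18", "studytime": "2", "G3": "6"}
bad_row = {"school": "XX", "sex": "F", "age": "30", "studytime": "2", "G3": "21"}

print(validate_row(good_row))  # []
for problem in validate_row(bad_row):
    print(problem)
```

Only a subset of columns is covered here; the remaining ones can be added to the two dictionaries in the same way.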
The dataset is available in CSV format from the UCI Machine Learning Repository.
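As a minimal sketch of reading the data, the example below parses a small in-memory sample with Python's standard `csv` module and converts the numeric columns to integers. It assumes the fields are separated by semicolons, so check the downloaded file and change `delimiter` if needed. The sample rows and the helper name `load_students` are made up for illustration.

```python
import csv
import io

# Hypothetical three-row sample standing in for the downloaded file.
# Assumption: fields are separated by semicolons; change `delimiter` if not.
SAMPLE = """school;sex;age;studytime;failures;G1;G2;G3
GP;F;18;2;0;5;6;6
GP;M;16;3;0;15;14;15
MS;F;17;1;1;8;9;9
"""

NUMERIC_COLUMNS = ("age", "studytime", "failures", "G1", "G2", "G3")


def load_students(handle, delimiter=";"):
    """Read student rows from an open text file, converting numeric columns to int."""
    rows = []
    for record in csv.DictReader(handle, delimiter=delimiter):
        for column in NUMERIC_COLUMNS:
            record[column] = int(record[column])
        rows.append(record)
    return rows


students = load_students(io.StringIO(SAMPLE))
mean_g3 = sum(s["G3"] for s in students) / len(students)
print(len(students), mean_g3)  # 3 10.0
```

For a file on disk, the same function can be called with a handle from `open(path, newline="")`, where `path` points to wherever the CSV was saved.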