Evaluate Student Summaries
Overview
This project aims to automatically assess the quality of summaries written by students in grades 3-12. The goal is to build a model that evaluates how well a summary captures the main idea and details of the source text, as well as the clarity, precision, and fluency of its language.
Project Notebook
For detailed code and analysis, check out the Main Notebook.
Dataset
The dataset for this project comes from the CommonLit - Evaluate Student Summaries competition on Kaggle. It includes a collection of real student summaries, each annotated with quality scores for content and wording.
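A minimal loading sketch with pandas, assuming the file and column names listed on the competition's data page (summaries_train.csv and prompts_train.csv, joined on prompt_id); verify these against your local copy:

```python
import pandas as pd

# Assumed file names from the Kaggle competition data page.
summaries = pd.read_csv("summaries_train.csv")  # student_id, prompt_id, text, content, wording
prompts = pd.read_csv("prompts_train.csv")      # prompt_id, prompt_question, prompt_title, prompt_text

# Attach each summary to its source text so content features can compare the two.
df = summaries.merge(prompts, on="prompt_id", how="left")
print(df.shape)
print(df[["text", "content", "wording"]].head())
```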
Objectives
- Predict the quality of student summaries based on various linguistic and content features.
- Identify key factors that contribute to a high-quality summary.
- Assist educators by providing an automated assessment tool for student summaries.
Methodology
- Data Preprocessing: Clean and prepare the student summaries dataset for analysis.
- Exploratory Data Analysis: Investigate relationships and trends in the data.
- Feature Engineering: Create relevant features that could influence summary quality predictions.
- Modeling: Build predictive models to evaluate the quality of student summaries (a minimal baseline sketch follows this list).
- Evaluation: Assess model performance and interpret the results.
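To make the feature engineering, modeling, and evaluation steps concrete, here is a minimal baseline sketch, not the project's final pipeline: two hand-crafted features (summary length and word overlap with the source text) feeding a scikit-learn ridge regressor, scored with mean columnwise RMSE (MCRMSE), the competition's metric. It assumes the merged df frame and the content/wording targets from the loading sketch above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def featurize(summary: str, source: str) -> list[float]:
    """Two toy features: summary length and lexical overlap with the source text."""
    s_words = set(summary.lower().split())
    src_words = set(source.lower().split())
    overlap = len(s_words & src_words) / max(len(s_words), 1)
    return [len(summary.split()), overlap]

X = np.array([featurize(t, src) for t, src in zip(df["text"], df["prompt_text"])])
y = df[["content", "wording"]].to_numpy()

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = Ridge().fit(X_tr, y_tr)  # Ridge handles both targets at once
pred = model.predict(X_te)

# MCRMSE: RMSE computed per target column, then averaged across the two targets.
rmse_per_target = np.sqrt(((pred - y_te) ** 2).mean(axis=0))
print("MCRMSE:", rmse_per_target.mean())
```

Swapping the toy features for richer engineered ones (or the ridge model for a stronger learner) keeps the same train-and-score scaffold.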
Tools Used
- Python
- Pandas, NumPy for data manipulation
- Scikit-learn, TensorFlow, or PyTorch for modeling
- NLTK, spaCy for natural language processing (see the example after this list)
- Matplotlib, Seaborn for data visualization
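As a small example of the NLP tooling, the sketch below extracts a few fluency-oriented features with spaCy. It assumes the en_core_web_sm model is installed (python -m spacy download en_core_web_sm), and the feature names are illustrative rather than the project's actual feature set.

```python
import spacy

# Small English pipeline; install with: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def linguistic_features(text: str) -> dict:
    """Simple fluency proxies: sentence count, mean sentence length, lemma diversity."""
    doc = nlp(text)
    sents = list(doc.sents)
    tokens = [t for t in doc if t.is_alpha]
    lemmas = {t.lemma_.lower() for t in tokens}
    return {
        "n_sentences": len(sents),
        "mean_sent_len": len(tokens) / max(len(sents), 1),
        "lemma_diversity": len(lemmas) / max(len(tokens), 1),
    }

print(linguistic_features("The water cycle moves water through the air. It rains and evaporates."))
```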
Contributing
Contributions are welcome! Fork the repository and submit a pull request with your enhancements.
Authors
- Atharva Kulkarni
Acknowledgments
- CommonLit and Kaggle for providing the dataset and hosting the competition.