Evaluate Student Summaries
Overview
This project aims to automatically assess the quality of summaries written by students in grades 3-12. The goal is to build a model that evaluates how well a summary captures the main idea and details of the source text, as well as the clarity, precision, and fluency of its language.
Project Notebook
For detailed code and analysis, check out the Main Notebook.
Dataset
The dataset for this project comes from the CommonLit - Evaluate Student Summaries competition on Kaggle. It includes a collection of real student summaries, each annotated with quality scores for content and wording.
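A minimal loading sketch with pandas, assuming the file and column names listed on the competition's data page (summaries_train.csv and prompts_train.csv, joined on prompt_id); verify these against your local copy:

```python
import pandas as pd

# Assumed file names from the Kaggle competition data page.
summaries = pd.read_csv("summaries_train.csv")  # student_id, prompt_id, text, content, wording
prompts = pd.read_csv("prompts_train.csv")      # prompt_id, prompt_question, prompt_title, prompt_text

# Attach each summary to its source text so content features can compare the two.
df = summaries.merge(prompts, on="prompt_id", how="left")
print(df.shape)
print(df[["text", "content", "wording"]].head())
```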
Objectives
- Predict the quality of student summaries based on various linguistic and content features.
- Identify key factors that contribute to a high-quality summary.
- Assist educators by providing an automated assessment tool for student summaries.
Methodology
- Data Preprocessing: Clean and prepare the student summaries dataset for analysis.
- Exploratory Data Analysis: Investigate relationships and trends in the data.
- Feature Engineering: Create relevant features that could influence summary quality predictions.
- Modeling: Build predictive models to evaluate the quality of student summaries (a minimal baseline sketch follows this list).
- Evaluation: Assess model performance and interpret the results.
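To make the feature engineering, modeling, and evaluation steps concrete, here is a minimal baseline sketch, not the project's final pipeline: two hand-crafted features (summary length and word overlap with the source text) feeding a scikit-learn ridge regressor, scored with mean columnwise RMSE (MCRMSE), the competition's metric. It assumes the merged df frame and the content/wording targets from the loading sketch above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def featurize(summary: str, source: str) -> list[float]:
    """Two toy features: summary length and lexical overlap with the source text."""
    s_words = set(summary.lower().split())
    src_words = set(source.lower().split())
    overlap = len(s_words & src_words) / max(len(s_words), 1)
    return [len(summary.split()), overlap]

X = np.array([featurize(t, src) for t, src in zip(df["text"], df["prompt_text"])])
y = df[["content", "wording"]].to_numpy()

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = Ridge().fit(X_tr, y_tr)  # Ridge handles both targets at once
pred = model.predict(X_te)

# MCRMSE: RMSE computed per target column, then averaged across the two targets.
rmse_per_target = np.sqrt(((pred - y_te) ** 2).mean(axis=0))
print("MCRMSE:", rmse_per_target.mean())
```

Swapping the toy features for richer engineered ones (or the ridge model for a stronger learner) keeps the same train-and-score scaffold.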
Tools Used
- Python
- Pandas, NumPy for data manipulation
- Scikit-learn, TensorFlow, or PyTorch for modeling
- NLTK, spaCy for natural language processing (see the example after this list)
- Matplotlib, Seaborn for data visualization
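As a small example of the NLP tooling, the sketch below extracts a few fluency-oriented features with spaCy. It assumes the en_core_web_sm model is installed (python -m spacy download en_core_web_sm), and the feature names are illustrative rather than the project's actual feature set.

```python
import spacy

# Small English pipeline; install with: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def linguistic_features(text: str) -> dict:
    """Simple fluency proxies: sentence count, mean sentence length, lemma diversity."""
    doc = nlp(text)
    sents = list(doc.sents)
    tokens = [t for t in doc if t.is_alpha]
    lemmas = {t.lemma_.lower() for t in tokens}
    return {
        "n_sentences": len(sents),
        "mean_sent_len": len(tokens) / max(len(sents), 1),
        "lemma_diversity": len(lemmas) / max(len(tokens), 1),
    }

print(linguistic_features("The water cycle moves water through the air. It rains and evaporates."))
```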
Contributing
Contributions are welcome! Fork the repository and submit a pull request with your enhancements.
Authors
- Atharva Kulkarni
Acknowledgments
- CommonLit and Kaggle for providing the dataset and hosting the competition.