Vocabulary knowledge is essential to educational progress. High-quality vocabulary instruction requires supportive contextual examples that teach word meaning and proper usage. Identifying such contexts by hand for a large number of words is difficult. In this work, we take a statistical learning approach to engineer a system that predicts the informativeness of a context for target words that span the range of difficulty from middle school to college level. Our database (released open source) includes 1,000 hand-selected words associated with approximately 70,000 contextual examples gathered from the Internet. In our training data, each context was rated by 10 individuals on a four-point informativeness scale. We process the text of each context into a novel collection of approximately 600 numerical features that captures diverse linguistic information. We then fit a nonparametric regression model using Random Forests and compute out-of-sample prediction performance using cross-validation. Our system performs well enough that it can replace a human judge: for a target word not found in our dataset, we can provide curated contexts to a student learner such that most of the contexts (54 percent) feature rich contextual clues and confusing contexts are rare (< 1 percent). The quality of our curated contexts was validated by an independent panel of high school language arts teachers.
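The evaluation setup described in the abstract can be illustrated with a minimal sketch. It makes two assumptions that the abstract does not confirm: that the 10 ratings per context are averaged into a single regression target, and that cross-validation folds are grouped by target word so that held-out performance reflects words not seen in training. The function names `mean_rating`, `word_grouped_folds`, and `train_test_indices` are hypothetical, and the feature extraction and Random Forest fit are omitted.

```python
from statistics import mean


def mean_rating(ratings):
    """Collapse one context's rater scores into a single regression target.

    Assumption: ratings are averaged; the abstract does not say how the
    10 ratings per context are combined.
    """
    return mean(ratings)


def word_grouped_folds(words, n_folds):
    """Assign every context to a fold so all contexts of a word share a fold.

    `words` holds the target word of each context, in order. Words are
    assigned to folds round-robin in order of first appearance.
    Returns one fold index per context.
    """
    fold_of_word = {}
    for word in words:
        if word not in fold_of_word:
            fold_of_word[word] = len(fold_of_word) % n_folds
    return [fold_of_word[word] for word in words]


def train_test_indices(folds, held_out):
    """Split context indices into training and held-out sets for one fold."""
    train = [i for i, fold in enumerate(folds) if fold != held_out]
    test = [i for i, fold in enumerate(folds) if fold == held_out]
    return train, test
```

Grouping folds by word, rather than splitting contexts at random, keeps every context of a held-out word out of training, which matches the stated use case of curating contexts for a target word not found in the dataset.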