Introduction published in:
Natural language processing for learner corpus research
Edited by Kristopher Kyle
[International Journal of Learner Corpus Research 7:1] 2021
pp. 1–16


Natural language processing for learner corpus research


Alexopoulou, T., Michel, M., Murakami, A., & Meurers, D.
(2017) Task Effects on Linguistic Complexity and Accuracy: A Large-Scale Learner Corpus Analysis Employing Natural Language Processing Techniques. Language Learning, 67(S1), 180–208.
Anthony, L.
(2014) AntWordProfiler (Version 1.4.1) [Computer software]. Tokyo, Japan: Waseda University.
(2019) AntConc (Version 3.5.8) [Computer software]. Tokyo, Japan: Waseda University.
Bauer, L., & Nation, I. S. P.
(1993) Word families. International Journal of Lexicography, 6(4), 253–279.
Berzak, Y., Kenney, J., Spadine, C., Wang, J. X., Lam, L., Mori, K. S., Garza, S., & Katz, B.
(2016) Universal dependencies for learner English. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (pp. 737–746). Stroudsburg: Association for Computational Linguistics.
Bestgen, Y., & Granger, S.
(2014) Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing, 26, 28–41.
Biber, D.
(1988) Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, D., Gray, B., & Staples, S.
(2014) Predicting Patterns of Grammatical Complexity Across Language Exam Task Types and Proficiency Levels. Applied Linguistics, 37(5), 639–668.
Chen, D., & Manning, C. D.
(2014) A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 740–750). Stroudsburg: Association for Computational Linguistics.
Choi, J. D., Tetreault, J., & Stent, A.
(2015) It depends: Dependency parser comparison using a web-based evaluation tool. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 387–396). Stroudsburg: Association for Computational Linguistics.
Cobb, T.
(2018) Web VocabProfile (WebVP) [Computer software].
Crossley, S. A., Kyle, K., & Dascalu, M.
(2019) The Tool for the Automatic Analysis of Cohesion 2.0: Integrating semantic similarity and text overlap. Behavior Research Methods, 51(1), 14–27.
Crossley, S. A., & McNamara, D. S.
(2012) Predicting second language writing proficiency: The roles of cohesion and linguistic sophistication. Journal of Research in Reading, 35(2), 115–135.
Díez-Bedmar, M. B., & Pérez-Paredes, P.
(2020) Noun phrase complexity in young Spanish EFL learners’ writing: Complementing syntactic complexity indices with corpus-driven analyses. International Journal of Corpus Linguistics, 25(1), 4–35.
Explosion AI
(2018) spaCy language models. Retrieved from https://spacy.io/models/en#en_core_web_sm
Garside, R., Leech, G. N., & McEnery, T.
(1997) Corpus annotation: Linguistic information from computer text corpora. Harlow: Longman.
Geertzen, J., Alexopoulou, T., & Korhonen, A.
(2013) Automatic linguistic annotation of large scale L2 databases: The EF-Cambridge Open Language Database (EFCAMDAT). In R. T. Miller, K. I. Martin, C. M. Eddington, A. Henery, N. Marcos Miguel, A. M. Tseng, A. Tuninetti, & D. Walter (Eds.), Selected Proceedings of the 2012 Second Language Research Forum (pp. 240–254). Somerville, MA: Cascadilla Proceedings Project.
Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z.
(2004) Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36(2), 193–202.
Granger, S., & Bestgen, Y.
(2017) Using collgrams to assess L2 phraseological development: A replication study. In P. Haan, R. de Vries, & S. van Vuuren (Eds.), Language, Learners and Levels: Progression and Variation (pp. 385–408). Louvain-la-Neuve: Presses universitaires de Louvain.
Green, C.
(2019) Enriching the academic wordlist and Secondary Vocabulary Lists with lexicogrammar: Toward a pattern grammar of academic vocabulary. System, 87, 102158.
Heatley, A., & Nation, I. S. P.
(1994) Range [Computer software]. Wellington, NZ: Victoria University of Wellington. Retrieved from http://www.vuw.ac.nz/lals/
Huang, Y., Murakami, A., Alexopoulou, T., & Korhonen, A.
(2018) Dependency parsing of learner English. International Journal of Corpus Linguistics, 23(1), 28–54.
Jurafsky, D., & Martin, J. H.
(2008) Speech and language processing: An introduction to natural language processing, speech recognition, and computational linguistics (2nd ed.). Upper Saddle River: Prentice-Hall.
Jurafsky, D., & Martin, J. H.
(2019) Speech and language processing (Unpublished manuscript, October 2019). Retrieved from https://web.stanford.edu/~jurafsky/slp3/
Khushik, G. A., & Huhta, A.
(2020) Investigating Syntactic Complexity in EFL Learners’ Writing across Common European Framework of Reference Levels A1, A2, and B1. Applied Linguistics, 41(4), 506–532.
Kitaev, N., & Klein, D.
(2018) Constituency parsing with a self-attentive encoder. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 2676–2686). Stroudsburg: Association for Computational Linguistics.
Klein, D., & Manning, C. D.
(2003) Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (pp. 423–430). Stroudsburg: Association for Computational Linguistics.
Kyle, K.
(2016) Measuring Syntactic Development in L2 Writing: Fine Grained Indices of Syntactic Complexity and Usage-Based Indices of Syntactic Sophistication (Unpublished doctoral dissertation). Georgia State University, Atlanta. http://scholarworks.gsu.edu/alesl_diss/35/
Kyle, K., & Crossley, S. A.
(2017) Assessing syntactic sophistication in L2 writing: A usage-based approach. Language Testing, 34(4), 513–535.
(2018) Measuring Syntactic Complexity in L2 Writing Using Fine-Grained Clausal and Phrasal Indices. The Modern Language Journal, 102(2), 333–349.
Kyle, K., Crossley, S. A., & Verspoor, M.
(in press) Measuring longitudinal writing development using indices of syntactic complexity and VAC sophistication. Studies in Second Language Acquisition.
Kyle, K., & Eguchi, M.
(in press) Automatically assessing lexical sophistication using word, bigram, and dependency indices. In S. Granger (Ed.), Perspectives on the second language phrasicon: The view from learner corpora. Bristol: Multilingual Matters.
(in progress) A gold standard part-of-speech tagged and dependency parsed corpus of L2 speech.
Levy, R., & Andrew, G.
(2006) Tregex and Tsurgeon: Tools for querying and manipulating tree data structures. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06) (pp. 2231–2234). European Language Resources Association (ELRA).
Lu, X.
(2010) Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4), 474–496.
Lu, X., & Ai, H.
(2015) Syntactic complexity in college-level English writing: Differences among writers with diverse L1 backgrounds. Journal of Second Language Writing, 29, 16–27.
McNamara, D. S., Graesser, A. C., McCarthy, P. M., & Cai, Z.
(2014) Automated evaluation of text and discourse with Coh-Metrix. Cambridge: Cambridge University Press.
Meurers, D., & Dickinson, M.
(2017) Evidence and interpretation in language learning research: Opportunities for collaboration with computational linguistics. Language Learning, 67(S1), 66–95.
Nivre, J., Hall, J., & Nilsson, J.
(2006) MaltParser: A Data-Driven Parser-Generator for Dependency Parsing. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06) (pp. 2216–2219). European Language Resources Association (ELRA).
Paquot, M.
(2018) Phraseological Competence: A Missing Component in University Entrance Language Tests? Insights From a Study of EFL Learners’ Use of Statistical Collocations. Language Assessment Quarterly, 15(1), 29–43.
(2019) The phraseological dimension in interlanguage complexity research. Second Language Research, 35(1), 121–145.
Paquot, M., Naets, H., & Gries, S. T.
(in press) Using syntactic co-occurrences to trace phraseological complexity development in learner writing: Verb + object structures in LONGDALE. In B. Le Bruyn & M. Paquot (Eds.), Learner corpus research meets second language acquisition. Cambridge: Cambridge University Press.
Pinchbeck, G. G.
(2017) Vocabulary Use in Academic-Track High-School English Literature Diploma Exam Essay Writing and its Relationship to Academic Achievement (Unpublished doctoral dissertation). University of Calgary, Calgary.
Polio, C., & Yoon, H.
(2018) The reliability and validity of automated tools for examining variation in syntactic complexity across genres. International Journal of Applied Linguistics, 28(1), 165–188.
Schmid, H.
(1994) Probabilistic part-of-speech tagging using decision trees. In International Conference on New Methods in Language Processing (pp. 44–49). Manchester, UK.
(1995) TreeTagger: A language independent part-of-speech tagger [Computer software]. Stuttgart: Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.
Scott, M.
(2020) WordSmith Tools (Version 8.0) [Computer software]. Liverpool: Lexical Analysis Software.
Toutanova, K., Klein, D., Manning, C. D., & Singer, Y.
(2003) Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology – Volume 1 (pp. 173–180). Stroudsburg: Association for Computational Linguistics.
van den Bosch, A., Busser, B., Canisius, S., & Daelemans, W.
(2007) An efficient memory-based morphosyntactic tagger and parser for Dutch. In P. Dirix, I. Schuurman, V. Vandeghinste, & F. Van Eynde (Eds.), Proceedings of the 17th meeting of Computational Linguistics in the Netherlands (pp. 191–206).
van Noord, G.
(2006) At last parsing is now operational. In TALN 2006 (pp. 20–42).
Weischedel, R., Palmer, M., Marcus, M., Hovy, E., Pradhan, S., Ramshaw, L., Xue, N., Taylor, A., Kaufman, J., & Franchini, M.
(2013) OntoNotes release 5.0. Philadelphia: Linguistic Data Consortium. Retrieved from https://catalog.ldc.upenn.edu/LDC2013T19
Yannakoudakis, H., Briscoe, T., & Medlock, B.
(2011) A new dataset and method for automatically grading ESOL texts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 180–189). Stroudsburg: Association for Computational Linguistics.