As part of my work updating the Text and Corpus Linguistics class (LING 4886) at the University of Georgia, I re-encoded the Digital Archive of Southern Speech (DASS) for the Open Corpus Workbench (CWB) and its query processor, CQP, preserving the structural attributes from the original transcription files and joining demographic information to the corpus. Previous encodings of the corpus had no structural attributes and instead attached the demographic information to every word. The new encoding is better suited to analysis, especially in R with the new polmineR package.
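
To make the difference concrete, here is a minimal Python sketch (not the actual encoding scripts) of the general idea: demographic values are written once on an XML-style region tag in CWB's one-token-per-line vertical format, rather than repeated as extra columns on each token. The `speaker` element, the attribute names, and the sample record are all hypothetical and chosen only for illustration; the real corpus may use different regions and fields.

```python
from xml.sax.saxutils import quoteattr


def to_vrt(speaker_meta, tokens):
    """Render one speaker's tokens as CWB-style vertical text.

    Demographics are placed on the opening region tag once, instead of
    being repeated as additional columns on every token line.
    """
    # quoteattr() escapes and quotes each value so the tag stays well-formed.
    attrs = " ".join(
        f"{key}={quoteattr(str(value))}" for key, value in speaker_meta.items()
    )
    lines = [f"<speaker {attrs}>"]  # hypothetical region name
    lines.extend(tokens)            # one token per line
    lines.append("</speaker>")
    return "\n".join(lines)


# Hypothetical speaker record and tokens, invented for this example.
meta = {"id": "S001", "sex": "F", "age": "72"}
vrt = to_vrt(meta, ["well", "I", "reckon", "so"])
print(vrt)
```

When a file in this shape is encoded, the region tag would be declared as a structural attribute, so the demographics describe a whole stretch of speech and can be used to subset the corpus without being stored on each word.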

The scripts and the compiled CWB/CQP version of the corpus are available on my GitLab: DASS-CWB @ GitLab