Corpora and datasets of pathological speech are hard to obtain, simply because they are hard to share. In this webinar we will present and explore several alternatives for sharing such sensitive data. The webinar is aimed at anyone who struggles with sharing or obtaining similar types of data.
Topics for discussion include:
- Progress achieved by the DELAD initiative for sharing corpora of speech disorders (CSD) and the role of the CLARIN Knowledge Centre on Atypical Communication Expertise
- GDPR and the ethics of special category data relevant for collecting and sharing CSD
- How storing and sharing CSD is arranged in a GDPR-compliant way at The Language Archive of the Max Planck Institute for Psycholinguistics and the collaboration with TalkBank at CMU
- Infrastructure requirements for secure remote access to sensitive research data with diverse legal (e.g. social media terms of service), ethical (e.g. children as subjects), and technical (e.g. audio and video) challenges, and assessment of several existing platforms
- The CAVA audio-visual human communication archive project - a digital video repository to support the work of the international human communication research community. This resource enhances the discoverability and re-usability of expensively created, specialist video content
- The curation and disclosure of pathological speech corpora: how CSD can be found through one organisation and made accessible through another - includes a demonstration using the example of the Polish Cued Speech Corpus of Hearing-Impaired Children