Digital corpus building for humanities research: from data collection, to annotation, exploitation and sharing

The workshop will address all the steps needed to create an annotated textual corpus, paying special attention to the use of different standards in order to enhance the sustainability and interoperability of this type of resource.

Topics:

  • Web scraping techniques and the use of regular expressions for cleaning up the texts and carrying out an initial mark-up
  • Discussion of different annotation pipelines according to specific research interests
  • XML mark-up and introduction to the Text Encoding Initiative Guidelines
  • Incorporating Natural Language Processing annotations into TEI editions (both for linguistic and literary research)
  • Introduction to Semantic Web technologies and how to increase the interoperability of our corpora
  • Available resources that enable the publication and exploitation of TEI corpora

Please note: The event will take place from 30 May to 1 June 2023, 9:00 am to 1:00 pm.

Zoom-Link: https://uni-rostock-de.zoom.us/j/63047472241?pwd=MENUUFdma3Q3K0lGUDBzeWdEbGNPQT09
Meeting ID: 630 4747 2241
Passcode: 430211

Target audience: Researchers and early-career researchers interested in Corpus Linguistics, Computational Literary Studies, and/or Philology
Number of participants: max. 15 on site and 15 online
Language: English
Registration: required for on-site participation

Contact & Registration

Jun.-Prof. Ulrike Henny-Krahmer (host scientist)
Faculty of Arts and Humanities
Institute for German Studies
Tel.: +49 381 498 2555
ulrike.henny-krahmer@uni-rostock.de

