OPEN REPRODUCIBLE DATA SCIENCE AND STATISTICS
Open Reproducible Data Science and Statistics (ORDS) is a scientific network at the Graduate Academy of the University of Rostock. Its goal is to bring together regional expertise in data analysis and statistics with open and reproducible science in the Rostock science region. The focus is on the exchange of expertise between doctoral candidates and postdocs, but all other interested scientists are also welcome.
Due to the interdisciplinary character of modern data-driven science, the network explicitly addresses all disciplines. Besides general questions of data analysis, statistics and reproducibility, the network also focuses on programming environments such as R and Python. While R is mainly used for statistical data analysis and data visualization, Python is especially common for applying machine learning methods. Beyond the data analysis itself, another central topic is the management and versioning of data and source code. Modern tools such as the Git version control system, Jupyter notebooks and Docker containers can be used for this purpose.
We want to meet at regular intervals to inform each other about current projects and methods in data analysis and statistics. We also have the opportunity to invite external experts, and seminars and workshops at the Graduate Academy will be dedicated to ORDS topics.
Virtual kick-off meeting of the ORDS network on 1 December 2020
Morning: 9:00 - 11:30 a.m.
Opening & Words of Welcome
Vice Rector for Research and Transfer of Knowledge, Udo Kragl
What is the ORDS network?
Anja Eggert and Frank Krüger
Keynote: Let's become (Open) Science Champions!
Project: Let’s organize a ReproHack in Rostock!
Workshop: Reproducible statistical data analysis with R and RMarkdown
Afternoon: 1:00 - 4:00 p.m.
Keynote: Reproducibility and Peer Review
Workshop: Reproducible Data Science with Jupyter Notebooks
Frank Krüger and Max Schröder
Future Perspectives & Closing Remarks
Anja Eggert and Frank Krüger
KEYNOTE: Let's become (Open) Science Champions!
What was your reason to start working in science? To produce new knowledge? To make the world a better place? If that is the case, you do not want to be part of the problem of scientific misconduct or the reproducibility crisis. Rather, you want to be part of the solution and become a real science champion. In this talk I will discuss whom I see as science champions (spoiler: open scientists) and which steps each of us can take on the road to becoming one.
Heidi Seibold is a group leader at Helmholtz AI. Her group "Open AI in Health" works on improving practices of open and reproducible research in artificial intelligence and health research. Dr. Seibold studied statistics at LMU Munich and did her PhD in computational biostatistics at the University of Zurich, where she developed tree-based methods for personalized medicine and discovered her interest in open and reproducible research.
KEYNOTE: Reproducibility and Peer Review
Reproducible research and peer review are cornerstones of science today, but do they get along? In this talk, Daniel presents challenges and opportunities of executing code-based workflows as part of the peer review process. Learn how he thinks you can, and should, change your habits today, what institutions and communities can do, and what the future looks like once research compendia become the norm in scholarly communication.
Daniel Nüst is a researcher at the Institute for Geoinformatics, University of Münster, Germany. He completed his studies in Münster with a Diploma in Geoinformatics and worked at the 52°North Initiative for Geospatial Open Source Software as a consultant and software developer. Since 2016, Daniel has been developing tools for the creation and execution of research compendia in geography and the geosciences in the project Opening Reproducible Research. He is Reproducibility Chair of the AGILE Conference 2020, Co-PI of CODECHECK and vice chair of the German Society for Research Software.
PROJECT: Let’s organize a ReproHack in Rostock!
ReproHacks are one-day reproducibility hackathons where participants attempt to reproduce papers from the associated published code and data. The events act as a sandbox for practicing reproducible research, giving authors an opportunity to practice generating and publishing reproducible papers and participants an opportunity to reproduce, reuse and review other researchers' work! In this talk, we'll describe in more detail how the events work, the various flavours of ReproHacks available and some initial findings from events so far. We'll also give tips for folks considering running their own events, especially for this group here in Rostock!
Daniela Gawehns is a PhD student in computer science at Leiden University. Her research focuses on the integration of data from diverse sources for data science applications in the health sciences. Before her doctoral studies, she obtained a Master's degree in Clinical Neuropsychology and a Master's degree in Statistical Sciences. She is interested in reproducible research practices for machine learning research and promotes making computer science an open, diverse and welcoming field of research. Within the ReproHack core team, she is in charge of most of the outreach via Twitter and of designing training materials for future ReproHack organizers.
WORKSHOP: Reproducible statistical data analysis with R and RMarkdown
This tutorial introduces the first steps towards a reproducible workflow for your data and statistical analysis. Using real-world data, we will write analysis code and graphics code in R in the RStudio interface. Setting up an R project also makes it possible to use R together with the version control system Git. Finally, we will knit a so-called "dynamic report" in R Markdown, which ensures that each report is consistent with the actual statistical results. The example data set will be available on GitHub.
Anja Eggert is a researcher and statistical advisor at the Leibniz Institute of Farm Animal Biology, Dummerstorf, where she promotes open and reproducible data science and statistics. She studied Marine Biology at the University of Bremen and did her PhD on the effects of climate change on the global distribution of seaweeds at the University of Groningen. She continued working in phycology at the University of Rostock. With her strong affinity for big data, she changed her focus to programming numerical ocean models in research projects at the Leibniz Institute for Baltic Sea Research, Warnemünde.
WORKSHOP: Reproducible Data Science with Jupyter Notebooks
Publishing not only the results but also the source code and data is central to the discussion about open science and the FAIR principles. Literate programming is the concept of interweaving documentation, code and data; it thus fosters the publication of a comprehensive document containing not only the results of a research analysis but the analysis itself. Jupyter notebooks are one implementation of this concept. This workshop uses Jupyter notebook examples to illustrate these aspects.
Frank Krüger is a researcher at the Institute of Communications Engineering, University of Rostock, with interests in research data management, natural language processing and provenance modelling. He works in the Infrastructure Support project of the CRC 1270 ELAINE and investigates how techniques of automatic information extraction and machine learning can be used to document research processes and the resulting data. Frank studied computer science and did his PhD on human activity and plan recognition from noisy sensor data.
Max Schröder is a doctoral researcher in the Infrastructure project of the CRC 1270 ELAINE with interests in provenance and semantic modelling, virtual research environments, and research data management in collaborative and interdisciplinary research projects. Before that, he studied computer science at the University of Rostock with a specialization in Smart Computing. Besides his research, he promotes open and reproducible science in order to foster high-quality research.