OPEN REPRODUCIBLE DATA SCIENCE AND STATISTICS
Open Reproducible Data Science and Statistics (ORDS) is a scientific network at the Graduate Academy of the University of Rostock. Its goal is to bundle regional expertise in data analysis and statistics with open and reproducible science in the Rostock science region. The focus is on the exchange of expertise between doctoral candidates and postdocs, but all other interested scientists are also welcome.
Due to the interdisciplinary character of modern data-driven science, the network explicitly addresses all disciplines. Besides general questions of data analysis, statistics and reproducibility, the network also focuses on programming environments such as R and Python. While R is mainly used for statistical data analysis and data visualization, Python is especially common in the application of machine learning methods. Besides the actual data analysis, another central point is the management and versioning of data and source code. Modern tools such as the Git version control system, Jupyter notebooks and Docker containers can be used for this purpose.
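As a tool-independent illustration of what versioning of data means in practice, the following minimal Python sketch records a SHA-256 checksum of a data file so that a later change can be detected. The file name `measurements.csv` and the helpers `file_checksum` and `demo` are hypothetical and not part of the ORDS material; a version control system such as Git tracks such changes far more comprehensively.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file (hypothetical helper)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo():
    """Checksum a made-up data file before and after a small edit."""
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "measurements.csv"  # hypothetical file name
        data_file.write_text("id,value\n1,0.5\n", encoding="utf-8")
        before = file_checksum(data_file)
        data_file.write_text("id,value\n1,0.6\n", encoding="utf-8")
        after = file_checksum(data_file)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("data changed:", before != after)
```

Storing such a checksum next to the analysis code is one simple way to state exactly which version of a data set a result was computed from.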
We want to meet at regular intervals and inform each other about current projects and methods in the field of data analysis and statistics. We also have the possibility to invite external experts. In addition, seminars and workshops at the Graduate Academy will be dedicated to the topics of ORDS.
Next event: ORDS ReproHack on 11th May 2021
With a ReproHack, a reproducibility hackathon, we aim to reproduce published scientific results. During the workshop, you will work either as part of a group or individually to reproduce results from published code and open data. The aim of a ReproHack is not to discredit researchers or their work but to understand the importance of careful documentation of the entire analysis. You will have the opportunity to work with real-world data and research software and will be asked to analyse different aspects of reproducibility. Our aim is to bring together researchers of all disciplines and levels of technical expertise to join forces in the Rostock science region. This workshop gives you the opportunity to learn from other researchers from different domains and levels of experience.
Thuringia as a guest in Rostock
Together with the Thuringian Competence Network for Research Data Management, we invite you to participate in the first Rostock ReproHack. Together, we will analyse an original research paper with respect to openness and reproducibility. The authors voluntarily offered their work as part of our ReproHack and will get feedback based on our report-back session.
A program for different levels of technical expertise
We propose the following working groups for in-depth analysis of the article at hand:
- Beginners re-run the analysis and inspect the different aspects according to the ReproHack Checklist.
- Advanced users re-implement the entire analysis in a different programming language, e.g. Python, R or Julia.
- Expert users build a re-usable computational environment for the above groups with Docker or similar container techniques.
- 9:30 Opening and virtual get-together
- 9:45 Introduction of the Article including Code and Data
- 10:00 1st part of the workshop
- 12:00 Lunch break
- 13:00 2nd part of the workshop
- 14:30 Evaluation and Goodbye
Registration for the ORDS ReproHack on 11th May 2021
Summary of the kick off meeting
On 1st December 2020, the new scientific network "Open Reproducible Data Science and Statistics" (ORDS) of the Graduate Academy of the University of Rostock celebrated its kick off with about 150 participants from the Rostock science region and beyond. A program of workshops and keynote talks around reproducible and open data analysis was presented, see below. The results of two polls regarding expectations and potential future topics of the ORDS scientific network are shown on the right. The great interest in the event made clear how many scientists, particularly young scientists, care about the topic. The lively discussions showed that regional networking and thematic training are very much desired, as all aspects of reproducible data analysis can be very challenging. This is exactly where the ORDS network would like to pick up.
As the primary communication channel, the ORDS channel in the institutional chat system of the University of Rostock was presented. Members of the University of Rostock can join the channel by following this invitation. Interested external researchers are invited to get in contact via the Graduate Academy and to join the ORDS mailing list.
As a first result of the kick off, planning of new events has already started. For example, there will be a ReproHack event where we try to reproduce an already published study. We invite everyone to propose publications for this event. To do so, please feel free to use the discussion board in the ORDS channel or just send an email. Likewise, we are pleased to announce that a Python learning group has been formed. For more information, please have a look at the ORDS channel.
Kick off material
Finally, in the spirit of Open Science, we would like to draw your attention to the published material of the kick off event.
- Slides of the keynote talk Reproducibility and Peer Review by Daniel Nüst
- Slides and video of the keynote talk Let's become (Open) Science Champions! by Heidi Seibold
- Code of the tutorial on Reproducible statistical data analysis with R and RMarkdown by Anja Eggert
- Code of the tutorial on Reproducible Data Science with Jupyter Notebooks by Max Schröder and Frank Krüger
Virtual kick off meeting of the ORDS network on 1st December 2020
In the morning: 9:00 - 11:30 a.m.
Opening & Words of Welcome
Vice Rector for Research and Transfer of Knowledge, Udo Kragl
What is the ORDS network?
Anja Eggert and Frank Krüger
Keynote: Let's become (Open) Science Champions!
Project: Let’s organize a ReproHack in Rostock!
Workshop: Reproducible statistical data analysis with R and RMarkdown
Afternoon: 1:00 - 4:00 p.m.
Keynote: Reproducibility and Peer Review
Workshop: Reproducible Data Science with Jupyter Notebooks
Frank Krüger and Max Schröder
Future Perspectives & Closing Remarks
Anja Eggert and Frank Krüger
KEYNOTE: Let's become (Open) Science Champions!
What was your reason to start working in science? To produce new knowledge? To make the world a better place? If that is the case, you do not want to be part of the problem of scientific misconduct or the reproducibility crisis. Even more, you want to be part of the solution and be a real science champion. In this talk I will discuss which people I see as science champions (spoiler: open scientists) and which steps each of us can take on the road to becoming a science champion.
Heidi Seibold is a group leader at Helmholtz AI. Her group "Open AI in Health" works on improving practices of open and reproducible research in artificial intelligence and health research. Dr. Seibold studied statistics at LMU Munich and did her PhD in computational biostatistics at the University of Zurich, where she developed tree methods for personalized medicine and developed her interest in open and reproducible research.
KEYNOTE: Reproducibility and Peer Review
Reproducible research and peer review are cornerstones of science today, but are they getting along? In this talk, Daniel presents challenges and opportunities of executing code-based workflows as part of peer review processes. Learn how he thinks you can, and should, change your habits today, what institutions and communities can do, and what the future looks like once research compendia become the norm in scholarly communication.
Daniel Nüst is a researcher at the Institute for Geoinformatics, University of Münster, Germany. He completed his studies in Münster with a Diploma in Geoinformatics and worked at 52°North Initiative for Geospatial Open Source Software as a consultant and software developer. Since 2016, Daniel has been developing tools for the creation and execution of research compendia in geography and geosciences in the project Opening Reproducible Research. He is Reproducibility Chair at the AGILE Conference 2020, Co-PI of CODECHECK and vice chair of the German Society for Research Software.
PROJECT: Let’s organize a ReproHack in Rostock!
ReproHacks are one-day reproducibility hackathons where participants attempt to reproduce papers from associated published code and data. The events act as a sandbox for practicing reproducible research, providing an opportunity for authors to practice generating and publishing reproducible papers and for participants to reproduce, reuse and review other researchers' work! In this talk, we’ll describe in more detail how the events work, the various flavours of ReproHacks available and some initial findings from events so far. We’ll also give tips for folks considering running their own events, especially for this group here in Rostock!
Daniela Gawehns is a PhD student in computer science at Leiden University. Her research focuses on the integration of data from diverse data sources for data science applications in the Health Sciences. Before her doctoral studies, she obtained a Master's degree in Clinical Neuropsychology and a Master's degree in Statistical Sciences. She is interested in reproducible research practices for machine learning research and promotes making computer science an open, diverse and welcoming field of research. Within the ReproHack core team, she is in charge of most of the outreach via Twitter and of designing training materials for future ReproHack organizers.
WORKSHOP: Reproducible statistical data analysis with R and RMarkdown
This tutorial provides you with the first steps towards creating a reproducible workflow for your data and statistical analysis. Using real-world data, we will write analysis code and graphics code in R in the RStudio interface. Setting up an R project even allows you to use R in combination with the version control system Git. Finally, we will knit a so-called “dynamic report” in R Markdown, which ensures that each report is consistent with the actual statistical results. The example data set will be available on GitHub.
Anja Eggert is a researcher and statistical advisor at the Leibniz Institute of Farm Animal Biology, Dummerstorf, where she promotes open and reproducible data science and statistics. She studied Marine Biology at the University of Bremen and did her PhD on aspects of climate change on the global distribution of seaweeds at the University of Groningen. She continued working in phycology at the University of Rostock. With her strong affinity for big data, she changed her focus to programming numerical ocean models in research projects at the Leibniz Institute of Baltic Sea Research, Warnemünde.
WORKSHOP: Reproducible Data Science with Jupyter Notebooks
Publishing not only the results, but also source code and data is central to the discussion about open science and the FAIR principles. Literate programming is the concept of interweaving documentation, code, and data and, thus, fosters the publication of a comprehensive document containing not only the results of a research analysis, but the analysis itself. Jupyter notebooks are one implementation of this concept. This workshop employs Jupyter notebook examples to illustrate these aspects.
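To make the idea of interweaving documentation and code concrete, here is a minimal Python sketch that assembles a tiny notebook document by hand, assuming the version 4 notebook JSON layout (a list of cells, each marked as `markdown` or `code`). The helper `build_notebook` and the cell contents are hypothetical and are not taken from the workshop material; in practice, notebooks are created in the Jupyter interface rather than written this way.

```python
import json


def build_notebook(title: str, explanation: str, code: str) -> dict:
    """Build a minimal notebook with one documentation cell and one code cell.

    Hypothetical helper; the layout is assumed to follow notebook format 4.
    """
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {
                # Documentation lives in markdown cells ...
                "cell_type": "markdown",
                "metadata": {},
                "source": [f"# {title}\n", "\n", explanation],
            },
            {
                # ... and sits directly next to the code it describes.
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": [code],
            },
        ],
    }


notebook = build_notebook(
    title="Mean of example values",
    explanation="We compute the mean of three made-up values.",
    code="values = [1.0, 2.0, 3.0]\nprint(sum(values) / len(values))",
)

# Serialise to the JSON text that would be stored in an .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json)
```

Because explanation and code are stored in the same document, publishing the notebook publishes the analysis together with its description.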
Frank Krüger is a researcher at the Institute of Communications Engineering, University of Rostock, with interests in research data management, natural language processing and provenance modelling. He works in the Infrastructure Support project of the CRC 1270 ELAINE and investigates how techniques of automatic information extraction and machine learning can be used for the documentation of research processes and the resulting data. Frank studied computer science and did his PhD on human activity and plan recognition from noisy sensor data.
Max Schröder is a doctoral researcher in the Infrastructure project of the CRC 1270 ELAINE with interests in provenance and semantic modeling, virtual research environments, and research data management in collaborative and interdisciplinary research projects. Before that, he studied computer science at the University of Rostock with a specialization in Smart Computing. Besides his research, he promotes open and reproducible science in order to foster high-quality research.