Learning to Read Data

The Göttingen Campus represents a network of universities and university-affiliated institutions recognised for rigour and robustness in research. As such, it is at the forefront of development in the field of Data Science. By drawing on established expertise and infrastructural resources, the project “Learning to Read Data” is aimed at promoting data-processing skills throughout the Campus as a whole. On this basis, the purpose of the project initiated by the Göttingen Campus is to offer broadly based and generally available tools for acquiring basic data-handling skills in all bachelor's degree courses. The project is based on a three-pronged concept: offering a purpose-designed lecture course entitled “Data Literacy Basics”, setting up a DataLab and, to round things off, editing and providing “Open Educational Resources”. This concept was submitted by the University of Göttingen as part of its participation in the Data Literacy Education project sponsored by the “Stifterverband” [Donors’ Association for the Advancement of Science] and the Heinz Nixdorf Foundation.

The current course materials can be found at the following link (GitHub).

Project-related OER materials are available on the following page (Link).

The term “data competence” refers to a variety of skills, such as the ability to explore data, examine it with the assistance of a computer and interpret the results gained. As such, data competence goes beyond the traditional boundaries of subjects, combining skills acquired in computer science, statistics and mathematics not only with related ethical issues and aspects of the humanities, but also with subject-specific knowledge. To date, the curricula at most German universities have catered inadequately, if at all, for teaching data handling skills to a substantial number of students.

This project, as initiated by the Göttingen Campus, is thus aimed at offering broadly based and readily available tools for teaching basic data handling skills to all students working for a bachelor’s degree. Three building blocks are essential to achieving this:
  1. the interactive lecture course on Data Literacy Basics, available to students of any subject doing a bachelor’s degree, teaching the fundamentals of data competence, practice-focused and research-related,

  2. setting up a DataLab that acts as an interface between not only the different fields of study from which the students come, but also regional business and industry as well as social actors, and, in addition, the research scientists working together with CIDAS [Campus Institute Data Science], their aim being to apply the theory of the lecture course to practical projects,

  3. Open Educational Resources, accordingly selected and checked for high standards, complement the lecture course and DataLab in teaching data handling skills. Adapted to the examination requirements of the degree programme in question, they can contribute towards the overall assessment.


The overarching aim in the teaching concept is that “all graduates may acquire skills in data handling that are important to them for their studies, research, job and social participation”. In a peer-to-peer assessment procedure, promoted by the “Stifterverband” and CHE [Centre for the Development of Universities], heads of universities, lecturers, students and ancillary services together have analysed the strengths and weaknesses and identified priority areas for implementation.



The “Data Literacy Basics” course

The summer term of 2019 will see the start of “Data Literacy Basics”, a joint course offered by the Göttingen Campus and spearheaded by the Centre for Statistics with accompanying tutorials. As a new course, it teaches the fundamentals of data handling skills for students studying for a bachelor’s degree. It will take place once a week with two hours of lectures and tutorials in the DataLab so that the students will be able to learn the relevant essentials of practical application by analysing suitable datasets.

To ensure that this course, designed for undergraduate students of any subject, is recognised as an option within the curriculum, it will be offered from the summer term of 2019 as part of the university-wide core competence courses. From the summer term of 2020, the course will be available to the Faculty of Arts, the Faculty of Social Sciences and the Faculty of Economics, and in the content area of teacher training. The special focus here is on the subjects offered by the Faculty of Arts and the Faculty of Social Sciences, where teaching data handling skills has particularly great potential for innovation. The intention is to include further faculties by the summer term of 2021.

In terms of content, the planned course focuses especially on

  • a lecture with a method-based approach,
  • exercise material in the tutorials that is subject-specific,
  • providing learning content and suitable datasets for practice in the form of modules that are open and accessible online, and complementary to the lecture,
  • an integrated, hands-on approach, directly applying the methods taught,
  • involving, at an early stage, local businesses, research institutions and social actors not only in the planning of the course content but also in providing practical exercise material and examination projects.

The lecture is supported interactively by a browser-based programming environment that students can readily access by logging into their university account on a laptop.




Learning a scripting language

So that students are able to use tools and libraries in handling data, they need to learn a scripting language. The decision as to which language is suitable depends not only on how easy it is to learn but also on how broadly it can be applied. The aim is to give students the tools with which they can subsequently work in any research area, whatever the subject matter. Python offers great advantages here because of its widespread use in many disciplines and because of its open-source nature. With Jupyter Notebooks and JupyterHub, the Python ecosystem also offers two browser-based tools that greatly facilitate teaching and learning, and allow a prompt start.


Collecting, reading, writing and editing data

In this part of the lecture the focus is not only on introducing the student to collecting and managing data but also on facilitating the (automated) reading, writing and editing of data in standard formats. For this purpose, students require an introduction to the nature of data that is short and practical, and that puts what is learned into a greater social and scientific context. Next, the lecture shows how to deal with questions on organising, manipulating and converting data with the help of established and generally used tools from the Python “toolkit” (Python Standard Library, Pandas). Finally, the lecture discusses the issue of incomplete or corrupted entries in datasets and the effects that incomplete datasets can have on analytical results.
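As a rough illustration of the kind of exercise this part of the lecture describes, the following sketch reads a small CSV table with Pandas, counts the missing entries and shows how the handling of incomplete rows changes a simple result. The dataset, the column names, the "unknown" placeholder and the `load_survey` helper are all invented for this example and are not taken from the course materials.

```python
import io

import pandas as pd

# Hypothetical survey data (invented for illustration).
# An empty field and the word "unknown" both stand for missing entries.
RAW_CSV = """respondent,age,weekly_hours
1,23,38
2,,40
3,31,unknown
4,27,20
"""


def load_survey(text):
    """Read CSV text into a DataFrame, treating 'unknown' as a missing value."""
    return pd.read_csv(io.StringIO(text), na_values=["unknown"])


df = load_survey(RAW_CSV)

# How many entries are missing in each column?
missing_per_column = df.isna().sum()

# Option 1: use every available value in the column (mean() skips missing values).
mean_hours_available = df["weekly_hours"].mean()

# Option 2: keep only rows in which every column is filled in.
complete_rows = df.dropna()
mean_hours_complete = complete_rows["weekly_hours"].mean()

# Write the cleaned table back out in a standard format (here: CSV text).
csv_out = complete_rows.to_csv(index=False)

print(missing_per_column)
print(mean_hours_available, mean_hours_complete)
```

The two averages differ because the second one silently discards a respondent whose working hours were known but whose age was not. Which choice is appropriate depends on the question being asked.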


Exploring data

This part of the lecture deals in more detail with the question of how a library and its methods can be developed and used. Generally accessible libraries (Pandas, Matplotlib) that facilitate dealing with data are used to calculate and visualise simple statistical quantities. With the help of these tools students can learn how to interpret the results.
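A minimal sketch of such an exploration step might look as follows, assuming a handful of invented temperature readings (the values and the name `temperature_c` are hypothetical). It computes a few simple statistical quantities with Pandas and draws a histogram with Matplotlib.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch also runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical measurements (invented for illustration).
temps = pd.Series([12.1, 14.3, 13.8, 15.2, 11.9, 14.0], name="temperature_c")

# Simple statistical quantities.
summary = {
    "mean": temps.mean(),
    "median": temps.median(),
    "std": temps.std(),  # sample standard deviation (pandas default, ddof=1)
}

# Histogram of the values, with the mean marked as a dashed line.
fig, ax = plt.subplots()
ax.hist(temps, bins=4)
ax.axvline(summary["mean"], linestyle="--", label="mean")
ax.set_xlabel("temperature (°C)")
ax.set_ylabel("count")
ax.legend()

print(summary)
```

In a Jupyter notebook the `matplotlib.use("Agg")` line can be dropped, since the figure is then displayed inline below the cell.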


Statistical analysis

Here again, the focus is not so much on acquiring in-depth theoretical knowledge of statistics as on the ability to apply methods. With the help of libraries (SciPy, Scikit-learn) linear regressions and simple methods from the subject area of machine learning can be quickly applied to accessed datasets. In doing so, it is possible to draw attention to the problems that may occur using the methods or interpreting the results. The aim here is to give guidance in questions relating to

  • what methods are suitable to deal with the respective questions and data,
  • how the results gained can be used to find answers to the original questions and where the weaknesses of different methods lie.
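One way to make both points concrete is to fit a straight line with SciPy and then see how a single corrupted entry changes the result. The sketch below uses invented data that follow y = 2x + 1 with a small alternating disturbance; the numbers and variable names are assumptions made for this illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical data (invented for illustration): y = 2x + 1 plus a small
# alternating disturbance of +0.5 / -0.5.
x = np.arange(10, dtype=float)
noise = 0.5 * (-1.0) ** np.arange(10)
y = 2.0 * x + 1.0 + noise

# Ordinary least-squares fit of a straight line.
clean_fit = stats.linregress(x, y)

# The same data with one corrupted entry (for example, a typing error).
y_outlier = y.copy()
y_outlier[-1] = 60.0

outlier_fit = stats.linregress(x, y_outlier)

print("clean:   slope =", clean_fit.slope, " R^2 =", clean_fit.rvalue ** 2)
print("outlier: slope =", outlier_fit.slope, " R^2 =", outlier_fit.rvalue ** 2)
```

A single implausible value roughly doubles the estimated slope here, which shows why least-squares regression should be combined with a look at the raw data before the result is interpreted.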


Ethics, data protection and publishing data

As part of a broad review, the lecture on “Data Literacy Basics” puts the discussed contents into the overall context of working with data and underlines the significance of aspects of data ethics, data protection and publishing. In this way, students are encouraged to extend and enhance their successfully acquired skills according to their individual needs in further course modules. For this purpose, they can choose from a collection of suitable Open Educational Resources, which forms the third pillar of this project.



Setting up a DataLab

The second integral part of our initiative involves setting up a DataLab that is closely linked to the lecture and can be practically applied in the different disciplines. The DataLab offers an opportunity to hold tutorials and supervise examination projects connected with the lecture described above. It also incorporates the existing advice services for undergraduate and PhD students at the University of Göttingen, and uses these for the purpose of teaching. In addition, the DataLab benefits from the systematic involvement of young academics, that is, more than 3,000 PhD students at the University of Göttingen and the Göttingen Campus, and also an expected 100 or so students per academic year studying Data Science degree courses. A large number of PhD students have already acquired well-grounded expertise in subject-related data analysis and would be able to pass this on to participants of the course described. Students from the Data Science degree courses working as tutors, on the other hand, can contribute their skills on the technical side of data handling. In this way, the DataLab is able to teach participating students the subject-specific application of data skills within their own area of study. In addition, the DataLab is a focal point for regional businesses to promote cooperation in all aspects of data analysis.

Tutorials and project papers

Weekly tutorials are offered to complement the lecture on “Data Literacy Basics”. In the first phase of these tutorials, students learn how to work independently using their data-handling skills with the help of practical and subject-specific examples. The emphasis here is on help in solving subject-specific problems and on acquiring further independent skills in applying the scripting language and the tools to deal with data-related problems. The following are three practical examples taken from the Faculties of Economics, Social Sciences and Arts respectively:

  • In tutorials for students of the Faculty of Economics the complex relationships between socio-economic variables such as working hours, wages and unemployment can be analysed with the help of comprehensive panel databases and by using regression techniques.
  • In a tutorial for the Faculty of Social Sciences the online, freely available election manifestos of the parliamentary parties can be uploaded and compared with the help of corpus-based research methods.
  • The acquired data-handling skills can be used in the Faculty of Arts, for example, to elucidate topometric tests carried out in connection with the 3D campus laboratory in the Department of Classical Archaeology. The computer-based comparison and analysis here can give the students more detailed insight.

In addition to the tutorials, the examination projects at the end of the lecture on “Data Literacy Basics” are also supervised by the DataLab. Students work on a data-related problem that is relevant to their subject and thus demonstrate that they can apply the acquired skills to a practical situation. Projects are allotted to small groups of 3-5 undergraduate students, with each group supervised by a tutor. This intensive supervision ensures all-round support for the students.


Producing practical examples on the basis of data consulting

Since 2012 the Centre for Statistics has been offering advice in statistics for those completing a bachelor’s thesis in any area of study at the University of Göttingen. This is complemented by a range of subject-specific advice options offered to undergraduate and postgraduate students by seven different faculties. With the DataLab in place, a central advisory facility can give undergraduate and postgraduate students at the Göttingen Campus an opportunity to follow up any questions about data handling. In this way, existing facilities are being improved and extended. A policy of central and focused information raises awareness among students of the DataLab as a source of support.

Part of the work of the advice service is to systematically collect relevant questions and datasets for the purpose of providing, where possible, feedback for the university teaching programme. In particular, questions, datasets and insights gained are to be prepared in such a way that they can be used as practical examples in the introductory lecture and the corresponding tutorials, the aim being to cover as many areas of data-handling skills as possible. In addition, our Data Consulting also helps tutors to deal with questions in examination projects relating to the planned course.


A centre for social actors and regional industry

Based on existing networks within the Göttingen Campus, the SüdNiedersachsenInnovationsCampus [Southern Lower Saxony Innovation Campus] and the Measurement Valley, the DataLab is intended to function not only as a focal point for businesses and further social actors but also as a platform for an exchange of questions between students, researchers, businesses and non-university institutions addressing specific issues relating to data analysis. This feeds into the planned project, since data-specific issues raised by non-university partners can become topics for examination projects.



Open Educational Resources (OER)

In addition to the lecture and DataLab, there are further courses offered by other universities, representing a third pillar of the project and providing students with freely available study material. This could involve issues like “data organisation”, “quoting databases”, or in-depth study of data ethics, computerised decision-making or the publication of data.

Content and scope

The emphasis here is primarily on less demanding courses that are also viable for students from the Faculties of Arts and Social Sciences. Within the framework of the project only a limited number of innovatory ideas relating directly to the planned lecture are to be included (for example, recordings of the lecture); instead, with the target group in mind, material already available from other providers is to be compiled and catalogued. The project staff are responsible for this on the one hand, but on the other hand students and teachers may make relevant suggestions and state specific needs. Modules varying in length of time and depth of content can meet the different needs.


Recognition of course credits

In collaboration with the relevant external and internal advisory boards, a project-related regulatory procedure is in place that determines the overall framework and practical issues of recognition of credits within the curricula. For the purpose of cataloguing, module descriptions are compiled and approved on the basis of this. Similar modules that are offered at the same time are systematically compared as to their suitability and recommended accordingly.


Accessing material

The assembled information on the overall procedure and the catalogued OER is coordinated with partners in the eCULT network and published online. Based on this arrangement, students and course coordinators at other universities can also access data that is linked to the respective OER material. In this way, relevant information about corresponding courses can be made available beyond the University of Göttingen.




Evaluation and quality management of courses on offer

The evaluation of new courses teaching broad-ranging data-handling skills is subject to different procedures. The EvaSys evaluation system, already a standard procedure at the university, allows an initial evaluation of the individual courses, especially of the lecture on “Data Literacy Basics” and the DataLab. The intention is to make increased use of this possibility for the purpose of receiving detailed feedback from the students, especially in newly introduced courses. Evaluation of the lecture on “Data Literacy Basics” and the DataLab is also undergoing a formative process which includes social media channels.

At the same time, a board of advisors will ensure that the idea of the project, teaching data-handling skills to undergraduate students from all the faculties, is implemented. The board will consist of researchers from universities and university-affiliated research institutions, stakeholders in industry and society, and student representatives who, once a year, will together evaluate the courses and the goals achieved so far, offering advice to the project team for changes and improvements. The evaluation will be based on an annual report that contains indicators of course attendance and capacity, the evaluation results of the EvaSys evaluation system and further relevant information.


Why is learning data handling skills so important?

The increasing digitalisation of industry, science and society is making the handling of data and understanding the information obtained from data a core competence for participation in society. This is reflected on the one hand in the increasing importance of the corresponding qualifications required for job applications in many fields of work. On the other hand, there has been a significant increase in the number of specialised data science courses that are focused on computer science, statistics or mathematics. As a result, there is a critical imbalance between the training of data scientists and the great demand for young professionals with data handling skills, but who are not specialised data scientists.


What are the credentials of the Göttingen Campus for the “Data Literacy Basics” course?

On the Göttingen Campus, the University of Göttingen, the Faculty of Medicine, five Max Planck Institutes, the Deutsches Zentrum für Luft- und Raumfahrt [German Aerospace Centre], the Deutsches Primatenzentrum [German Primate Centre] and other non-university research institutions have collaborated for many years. These are excellent credentials for creating a cross-faculty, campus-wide teaching programme for data handling skills. The support programme “Data Literacy Education” gives the Göttingen Campus an opportunity to establish a programme of this kind, targeted at undergraduate students, and to secure its place permanently in the curriculum.

As members of the Board of Trustees at the university, the Vice President for Study and Teaching and the full-time Vice President for Facilities, Infrastructure and Operations are responsible for promoting the digital teaching programme. The Göttingen Centre for Statistics, with a staff of academics from seven faculties, has longstanding experience in interdisciplinary and cross-faculty tuition not only in the master's degree course in Applied Statistics but also in the doctoral and certificate programme in Applied Statistics and Empirical Methods. The Centre for Statistics also offers statistical advice in various forms (for undergraduate and PhD students, and research institutions) and is therefore well equipped to assess the current demand for data competence in various subject areas.

With the support of the eResearch Alliance the university has also, over the past few years, gained substantial experience and competence in the area of data management covering the whole cycle of data processing (designing and collecting, editing and analysing, storing long-term, publishing, and reusing existing data). The Service für Digitales Lernen und Lehren [Service for Digital Learning and Teaching] acts as a partner to teachers, offering didactic advice on questions of digital skills and the use of new media. There is an intensive exchange between the Lower Saxony network eCult+, promoted in the second phase by BMBF, and twelve other universities on the subject of learning and teaching digital technology. The Society for Scientific Data Processing (GWDG) also offers professional support for advanced IT infrastructure. In addition, the GWDG and the eResearch Alliance together with other partners like the Centre for Statistics are already highly successful in organising annual, interdisciplinary Data Science Summer Schools. This underscores how well the various institutions are interlinked and points to their shared focus on data science. The Göttingen Campus has excellent contacts with regional industry, management and society, especially through the SüdNiedersachsenInnovationsCampus and the Measurement Valley.

The Campus Institute for Data Science (CIDAS) is at present in the process of establishing itself as a partner institution of the Göttingen Campus. The CIDAS is at the interface between computer science, statistics and mathematics, and applied disciplines, especially within the fields of economics, social sciences and the natural sciences. Its purpose is to combine the most recent developments in scientific methods in various areas of data science (e.g., machine learning, artificial intelligence, simulation-based methods and applied statistics) with internationally competitive research in the profile fields of the University of Göttingen and its associated research institutions. This involves not only scientific topics that are currently being researched in various scientific centres but also the integration of new key areas of work by way of four professorships yet to be announced (in the fields of data science, artificial intelligence and machine learning). The CIDAS will also host the bachelor's degree courses Applied Data Science and Mathematical Data Science, to commence in the winter semester of 2018/2019.

The Göttingen Campus offers an excellent environment for research and teaching in data science for students specialising in this area. Especially because of the way it is organised, the CIDAS is set to give ongoing impetus to future projects and to continue its dynamic development. The three components of the project “Learning to Read Data” make it possible to add an important dimension to these developments and teach data handling skills covering the whole spectrum of university education for undergraduate students in all fields of study. The planned concept also allows for flexibility in adapting and scaling course contents and methods for a large number of students; it makes it possible to link theory and practice on a broad basis and to involve stakeholders in the process of development. In the medium term, there will be a review of whether the measures in place can be brought together and complemented to produce a Data Literacy Certificate, giving students formal confirmation of their acquired data handling skills, over and above recognition of credits within the curriculum.


Structures and Team

The project team consists of two groups: a board of directors, which accompanies and guides the implementation of the planned goals, and a group of individuals whose task is to carry out the concrete implementation of the underlying project goals. The board of directors consists of the following persons:

  • Prof. Dr. Andrea Bührmann (Vice-President of the University of Göttingen for Studies, Teaching, Equality, and Diversity)
  • Prof. Dr. Norbert Lossau (Vice-President of the University of Göttingen for research and information infrastructure)
  • Prof. Dr. Thomas Kneib (Speaker of the Centre for Statistics, Professor for Statistics)
  • Prof. Dr. Ramin Yahyapour (Managing Director of the GWDG, Speaker of the Center for Applied Computer Science, Professor for Practical Informatics)
  • Prof. Dr. Stephan Herminghaus (Max Planck Institute for Dynamics and Self-Organization)
  • Prof. Dr. Stefan Halverscheid (Professor for Mathematics Education)
  • Prof. Dr. Albert Busch (Dean of Studies of the Faculty of Philosophy)
  • Prof. Dr. Stefan Dierkes (Dean of Studies of the Faculty of Economic Sciences)
  • Prof. Dr. Timo Weishaupt (Dean of Studies of the Faculty of Social Sciences)
  • Claudia Trepte (Manager Measurement Valley)

The management board thus combines scientific expertise with the necessary decision-making authority with regard to university teaching, so that the course can be successfully organised and promptly and broadly anchored in the university.

The team for implementation consists of

  • Dr. Benjamin Säfken (scientific coordinator at the Centre for Statistics with extensive experience in establishing quantitative courses in university curricula)
  • Dr. Alexander Silbersdorff (Statistical consulting at the Centre for Statistics with extensive experience in the design and implementation of statistical courses for heterogeneous audiences)
  • Jana Lasser and Dr. Debsankha Manik (MPI for Dynamics and Self-Organization with experience in delivering multidisciplinary Data Literacy courses using Python and Open Educational Resources)
  • Dr. Wolfgang Radenbach (Head of the digitization department for studies and teaching with extensive experience in Open Educational Resources)
  • Lea M. Dammann (Research Assistant and Doctoral Candidate at the Chair of Statistics)
  • René-M. Kruse (Research Assistant and Doctoral Candidate at the Chair of Statistics)