How scientists want to share data in the future

By Christian J Meier

In experimental hall L05-03 at the Technical University of Darmstadt, Grigorios Hatzissawidis shows a video he recorded with a high-speed camera: water flows around the profile of a boat’s hydrofoil, and vortices and bubbles form. “The camera takes up to 20,000 pictures per second,” says the research assistant, an example of how data-intensive research can be. At the Institute for Fluid Systems Technology, Hatzissawidis and his colleagues want to make this flood of data accessible to others.

Research data can benefit more than just those who collect it. Others can use it, compare it with their own results, and draw new conclusions from it, even years later. For example, the data from the high-speed camera can be used to train an artificial intelligence that predicts the formation of bubbles. This makes it possible to analyze recordings from simple cameras that record at normal speed.

Refining research data into common knowledge for science is the idea behind the National Research Data Infrastructure (NFDI). Around a thousand researchers from various disciplines are currently setting up this network.

“It shouldn’t just have been a flash in the pan”

York Sure-Vetter, director of the NFDI, describes the situation as follows: “We are drowning in data, but we cannot find it.” Science lacks interconnected data rooms, he says, meaning protected virtual places that facilitate the exchange of data across disciplines. For example, if art historians want to determine the age of a picture from its colors, they could find materials-science data there on the pigments used.

Such data rooms have been under construction since 2020, initially within individual subject areas, and 26 consortia have been formed. They are less concerned with hardware than with the data and the competent handling of it. The data get a kind of identity card, says Sure-Vetter. In addition to a unique ID number, this includes so-called meta information. “Data has a context without which another researcher cannot understand it,” says Sure-Vetter. “Similar to an Excel column without a heading.” Metadata describe this context with additional information, such as the type and serial number of the measuring device, the uncertainty of measured values or the objective of a sociological survey.
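The “identity card” idea can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the class name, the metadata keys and the list of required keys are invented for this example and are not taken from any NFDI schema.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """A data set with a unique ID and metadata (all names are hypothetical)."""
    values: list[float]
    metadata: dict[str, str] = field(default_factory=dict)
    # A random UUID stands in for the unique ID number mentioned in the text.
    dataset_id: str = field(default_factory=lambda: str(uuid.uuid4()))


# Assumed minimum context; a real consortium would define its own schema.
REQUIRED_KEYS = ("device_type", "device_serial", "uncertainty")


def missing_context(record: DatasetRecord) -> list[str]:
    """Return the required metadata keys that the record does not provide."""
    return [key for key in REQUIRED_KEYS if not record.metadata.get(key)]


# Made-up measurement values with incomplete metadata.
record = DatasetRecord(
    values=[0.12, 0.15, 0.11],
    metadata={"device_type": "high-speed camera", "uncertainty": "0.01"},
)
print(record.dataset_id)
print(missing_context(record))
```

In this sketch the record lacks a device serial number, so `missing_context` flags it: the digital counterpart of the Excel column without a heading.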

Reality is still far from that. Laboratory notebooks, for example, are often still handwritten. They contain important information without which the results of an experiment can hardly be understood. For chemists, the “NFDI4Chem” consortium has developed an electronic laboratory notebook in which the experimental data is supplemented with comments, images, diagrams or the composition of the samples.

Although the advantages are obvious, sharing data is currently not very attractive for scientists, which is why they hardly prepare it for this purpose. A researcher’s reputation depends above all on publications in classic specialist journals. Researchers also identify with their data, often calling it their “own ideas” or “heart and soul,” as the sociologist Eva Barlösius from the University of Hanover said at an NFDI conference in April. This, too, dampens the eagerness to share data.

Sure-Vetter calls the willingness to share the “biggest hurdle”. A culture change is needed. Part of that is for a shared data set to contribute to a researcher’s reputation just as much as a conventional specialist publication.

But data security, data protection, licensing and copyright are also key if researchers are to develop trust in the new infrastructure, emphasizes Canan Hastik, an expert on research data at TU Darmstadt. “The principle of freedom of research requires that nobody with a commercial interest may have access to the data,” says the data manager, who is involved in two NFDI consortia. Personal data, for example from social science surveys, would only be made accessible via “complex access models” that take into account anonymization and other data protection guidelines, explains Hastik.

Sure-Vetter emphasizes that it is just as important to give researchers the digital skills to prepare research data in such a way that it can be found. TU Darmstadt starts with its bachelor’s degree courses. In a “digitization internship”, students build a digital 3D model of a Lego car. Every digital component is enriched with data: what does it cost, how much does it weigh, what color is it? “The students learn how to generate added value by linking this data,” explains research assistant Philipp Wetterich. For example, how to put together a lightweight vehicle that is as inexpensive as possible.
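How linking such component data can answer a design question might look like the following Python sketch. It is not the course material: the catalogue, part names, prices and weights are invented, and the approach (trying every combination and keeping the cheapest one under a weight limit) is just one simple way to frame the task.

```python
from itertools import product

# Hypothetical catalogue: every part carries cost (EUR), weight (g) and color.
CATALOGUE = {
    "chassis": [
        {"name": "chassis_a", "cost": 4.0, "weight": 60, "color": "grey"},
        {"name": "chassis_b", "cost": 6.0, "weight": 40, "color": "black"},
    ],
    "wheels": [
        {"name": "wheels_a", "cost": 2.0, "weight": 30, "color": "black"},
        {"name": "wheels_b", "cost": 3.0, "weight": 20, "color": "white"},
    ],
    "body": [
        {"name": "body_a", "cost": 5.0, "weight": 50, "color": "red"},
        {"name": "body_b", "cost": 8.0, "weight": 30, "color": "blue"},
    ],
}


def cheapest_build(catalogue, max_weight):
    """Return the cheapest combination of one part per slot within max_weight."""
    best = None
    # product() yields every way of picking exactly one part from each slot.
    for parts in product(*catalogue.values()):
        weight = sum(part["weight"] for part in parts)
        cost = sum(part["cost"] for part in parts)
        if weight <= max_weight and (best is None or cost < best["cost"]):
            best = {
                "parts": [part["name"] for part in parts],
                "cost": cost,
                "weight": weight,
            }
    return best  # None if no combination satisfies the weight limit


build = cheapest_build(CATALOGUE, max_weight=120)
print(build)
```

With these invented numbers, the cheapest car that weighs no more than 120 grams combines the lighter chassis with the cheaper wheels and body. Only because each part carries its cost and weight can the question be answered at all.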

But the culture change is profound and is unlikely to happen within a few years. This could become a problem for the NFDI, because the federal and state governments are initially funding the project only until 2028. The researchers who are building the infrastructure are worried about whether it will continue. “The consortia have done enormous development work in the first three years,” says Sure-Vetter. “It shouldn’t just have been a flash in the pan.”
