From research data to data-centric research
Claudia Draxl
Physics Department and Iris Adlershof, Humboldt-Universität zu Berlin
The vast amounts of research data generated daily in materials science are considered a gold mine of the 21st century. How can we turn this resource into knowledge and value? Data-centric approaches, namely machine learning (ML) and, more generally, artificial intelligence (AI) algorithms, have entered many scientific fields. They complement traditional research and have already been used successfully to predict new materials with improved properties. One drawback remains, however: almost all of these investigations rely on data sets that were created or adapted for a specific purpose. As a result, ML results are mainly interpolations rather than out-of-the-box predictions. To change this situation, data from different sources must be brought together. Here, a comprehensive FAIR (Findable, Accessible, Interoperable, and Reusable) data infrastructure plays a critical role. Leveraging the knowledge created by the entire community promises major breakthroughs for AI in materials research, but it also comes with challenges. I will discuss how the FAIRmat consortium is dealing with all these aspects, address the issue of sustainable research, and show examples of how AI can use data to generate insight that cannot be gained from single investigations.