Computational social science (CSS) is located at the interface of social science and computer science. Social phenomena thus remain the main focus of our research; they are, however, analyzed using "new" types of data.
In particular, this includes process-generated data such as minutes of plenary proceedings, scientific texts and collaborations, or messages on social media channels (such as tweets). Methodologically, CSS combines inferential statistics with algorithmic procedures or Bayesian probability classifications. These can be used, for example, to identify topics within large text corpora, to predict economic growth, or to investigate the coevolution of social relationships and attributes.
In addition to data issues and the intertwined development of innovative methods, our group also focuses on their "social science fit": we are committed to ensuring the validity and reliability of the data, and we also want to ensure the theoretical applicability of the methods.
Research foci of the department at a glance:
Machine learning generally refers to a process in which computers learn from data and can be understood as one of the most important methods of artificial intelligence. A distinction is drawn between "supervised" and "unsupervised" learning algorithms:
- Supervised machine learning requires both input and output data to "learn" (e.g. Naive Bayes classifiers). The training data provides the program with examples of classifications, i.e., it links certain features of the data to an outcome. An algorithm then tries to derive the best possible classification rule from the training data set, which can then be applied to classify unknown data.
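The supervised workflow described above can be sketched as follows; this is a minimal illustration using scikit-learn's Naive Bayes classifier, with invented example sentences and labels.

```python
# A minimal sketch of supervised learning: a Naive Bayes text classifier
# trained on labeled examples, then applied to unseen data.
# The sentences and labels below are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Training data: texts paired with known outcomes ("politics" vs. "sports").
texts = [
    "the parliament passed a new budget law",
    "the minister announced a coalition agreement",
    "the striker scored twice in the final match",
    "the team won the championship game",
]
labels = ["politics", "politics", "sports", "sports"]

# The pipeline turns texts into word counts and fits the classifier on them.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# The trained model can now classify a previously unseen text.
print(model.predict(["the chancellor proposed a new law"]))
```

The classifier has linked word features to outcome labels during training, so a new sentence sharing vocabulary with the "politics" examples is assigned to that class.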
- Unsupervised machine learning tries to recognize patterns in data without training data, i.e. using unlabeled data, and to group observations (e.g. cluster analysis, topic modeling). The groups are formed on the basis of statistical similarities and differences. Rather than relying on existing classifications as supervised learning does, it classifies directly "from the data" by exploiting its statistical properties.
Social Network Analysis (SNA) describes relations between actors and examines the significance of network structures (e.g. gatekeeper or broker positions) for social integration, economic or political processes, or general social developments. The rapidly expanding field of research is driven by relational measures for actor-based network positions (e.g. the "centrality" of actors) and structural measures for describing whole networks (e.g. the identification of "communities"). In the last decade, numerous advances have been made in understanding the dynamics and modelling of networks (e.g. through "Exponential Random Graph" or "Stochastic Actor-Oriented" models).
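Actor-level measures such as centrality can be computed in a few lines; the following sketch uses the networkx library on a small invented friendship network in which one actor bridges two otherwise separate parts.

```python
# A minimal sketch of actor-based network measures with networkx.
# The small undirected network is invented for illustration:
# node "c" connects the triangle a-b-c to the pair d-e.
import networkx as nx

g = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])

# Degree centrality: the share of other actors each actor is tied to.
degree = nx.degree_centrality(g)

# Betweenness centrality: how often an actor lies on shortest paths
# between others -- a common operationalization of broker or
# gatekeeper positions.
betweenness = nx.betweenness_centrality(g)

# The actor with the highest betweenness is the broker of this network.
print(max(betweenness, key=betweenness.get))
```

Here the bridging actor scores highest on betweenness even though others have the same number of ties, which is exactly why relational measures capture positions that attribute data alone cannot.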
Natural Language Processing (NLP) is located at the interface of computer science and linguistics. With the help of computers, it attempts to record ("natural language understanding"), classify ("speech recognition"), and analyze natural language. The method has numerous applications in the social sciences: large text corpora, for example, can be summarized "automatically" and reduced to their central dimensions ("topic models"). This allows broad discourses to be mapped quantitatively and over long periods of time. Fields of application include the analysis of process-generated data such as plenary protocols of the German Bundestag, tweets, and newspaper articles.
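Reducing a corpus to its central dimensions can be sketched with latent Dirichlet allocation (LDA), one common topic-modeling approach; this minimal scikit-learn example uses an invented miniature "corpus".

```python
# A minimal sketch of topic modeling with latent Dirichlet allocation
# (LDA). The four-document "corpus" is invented for illustration;
# real applications use thousands of documents.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "parliament debated the climate bill",
    "the minister defended the climate bill in parliament",
    "the striker scored in the cup final",
    "fans celebrated the cup final victory",
]

# Turn the documents into a document-term count matrix.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Fit a two-topic model; each document is represented as a
# probability distribution over the latent topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

print(doc_topics.shape)  # one row per document, one column per topic
```

Each row of the resulting matrix sums to one: a document's mixture over the latent topics, which is the quantitative summary that makes mapping large discourses over time feasible.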