About

My name is David Wille. I grew up and went to school in northern Germany, in a small Lower Saxon town close to Bremen. Currently, I work as a computer scientist with a focus on software development in the automotive sector at IAV GmbH in Gifhorn, Germany.

I am interested in traveling, photography, sports (working out at the gym and tennis), good food, cooking and meeting my friends.

Below you can find my curriculum vitae and a list of my publications.

You can also find me on the following platforms:

LinkedIn

Curriculum Vitae

Dr. David Wille
Braunschweig, Germany
Professional Experience
Computer Scientist
since 05/2019
IAV GmbH, Gifhorn
Research Associate / Project Leader
09/2018 - 04/2019
Working Group Model-Based Software Engineering, Institute of Software Engineering and Automotive Informatics, Technische Universität Carolo-Wilhelmina Braunschweig

Functional specification of a system to analyze the compatibility of systems in cars after updating / replacing software or hardware. Project coordination and personnel responsibility.

Research Associate / PhD Student
01/2015 - 11/2018
Working Group Model-Based Software Engineering, Institute of Software Engineering and Automotive Informatics, Technische Universität Carolo-Wilhelmina Braunschweig

Research on algorithms to automatically reverse-engineer variability between related artifact variants (e.g., statecharts, MATLAB/Simulink models, source code, technical architecture descriptions) using model-based techniques. The identified variability information is used to visualize the variability relations between the systems, to improve the analyzed systems (e.g., their maintainability) by restructuring them, or to transfer the artifacts to a software product line.

Research Assistant
10/2012 - 07/2013, 09/2013 - 12/2014
Institute of Software Engineering and Automotive Informatics, Technische Universität Carolo-Wilhelmina Braunschweig

Research on algorithms to automatically reverse-engineer variability between related MATLAB/Simulink model variants.

Professional Training / University
PhD in Computer Science, Graduation with Honors
01/2015 - 11/2018
Technische Universität Carolo-Wilhelmina Braunschweig
Master of Science in Computer Science, Graduation with Honors
10/2012 - 11/2014
Technische Universität Carolo-Wilhelmina Braunschweig

Focus on Software Engineering, Software Architectures, Software Product Lines and Model-based Software Engineering

Bachelor of Science in Computer Science
10/2009 - 08/2012
Technische Universität Carolo-Wilhelmina Braunschweig

Minor subject: Communication Networks

Awards
IBM Center for Advanced Studies Best Paper Award
10/2018
Awarded by the IBM Center for Advanced Studies for the paper "Reducing Variability of Technically Related Software Systems in Large-Scale IT Landscapes" at the International Conference on Computer Science and Software Engineering (CASCON) 2018 in Toronto (Canada)
HITACHI Young Best Paper Award - Research Track
09/2017
Awarded by HITACHI for the paper "Variability Mining of Technical Architectures" at the International Systems and Software Product Line Conference (SPLC) 2017 in Seville (Spain)
AutoVision Graduate Award
11/2015
Awarded by the Carl-Friedrich-Gauß-Faculty of Technische Universität Carolo-Wilhelmina Braunschweig for Outstanding Accomplishments at Master's Level
ACM Travel Award
11/2014
Awarded by the Association for Computing Machinery (ACM) to participate in the Student Research Competition of the International Symposium on Foundations of Software Engineering (FSE) 2014 in Hong Kong
ZSB//Student Award
05/2014
Awarded by the Student Counselling of Technische Universität Carolo-Wilhelmina Braunschweig for Particularly Good Accomplishments at Bachelor's Level
International Experience
Presentations at International Conferences & Workshops
2013 - 2018
Among others in Japan, Hong Kong, South Africa, Canada, Greece, and the Netherlands

Presentation of research results (cf. list of publications and the corresponding post) in front of scientific audiences at well-recognized international conferences and workshops.

National Institute of Informatics Japan (NII) Shonan School
03/2017
Kamiyamaguchi (Japan), Mining Software Repositories: Accomplishments, Challenges and Future Trends
Winter School
10/2016
Ede (The Netherlands), Big Software on the Run: Where Software Meets Data
Intensive English Course
08/2010 - 09/2010
International House Toronto (Canada)
Work & Travel
04/2009 - 08/2009
Australia

Publications

2019

  • D. Wille, “Custom-Tailored Product Line Extraction,” PhD Thesis, Technische Universität Carolo-Wilhelmina Braunschweig, Germany, 2019.

    Industry faces an increasing number of challenges regarding the functionality, efficiency and reliability of developed software systems. Especially in domains with safety-critical systems, such as the automotive domain or the aviation domain, these challenges increase the complexity and costs during development. A common approach to reduce the complexity and involved development effort are model-based languages, such as Matlab/Simulink and statecharts. Starting on a high level of abstraction, these languages allow to refine the functionality to be developed step-by-step.

    While this approach helps companies during development of single systems, the customers’ high demand for software that is specifically configured towards their requirements is an increasing challenge. As a result, companies have to efficiently develop variants that have slightly varying functionality, but still are highly similar. As reimplementation of complex functionality for each variant is no option, a common solution to cope with this problem is copying existing solutions and modifying them to changed needs. In the short-run, this so-called clone-and-own approach allows to save development costs and enables developers to easily reuse existing solutions by adding or removing parts and modifying existing functionality for a specific customer. However, this approach also involves high risks for future maintenance as the relations between the copied systems are rarely documented. For example, fixing bugs becomes a tedious task as the systems have to be maintained in isolation, and developers have to manually identify and fix them in all variants. Thus, with a growing number of such potentially large system copies, the resulting effort can become a limiting factor for a company’s productivity as increasing capacity is spent on maintaining existing systems.

    For a more structured reuse strategy, academia promotes to apply software product lines. This approach allows to develop and automatically generate variants from a set of reusable artifacts constituting their common and varying parts. While the approach was successfully adopted by many companies, its adoption bares initial risks as large manual effort is required to identify the common and varying parts of existing variants and to encode them in a future software product line.

    To overcome these problems, this thesis contributes a variability mining algorithm for existing model variants to allow their semi-automatic migration to a software product line. As industry uses a variety of different modeling languages, the focus of the approach lies on an easy adaptation for different languages. Furthermore, the approach can be custom-tailored to include domain knowledge or language-specific details in the mining process and, thus, to identify relations expected by domain experts. The first step of the approach performs a high-level analysis of variants to identify outliers (e.g., variants that diverged too much from the rest) and clusters of variants relevant for a detailed variability analysis. The second step executes variability mining to identify low-level variability relations for these clusters supporting developers during maintenance by showing common and varying parts of the corresponding variants. The third step uses these detailed variability relations to allow automatic migration of the compared variants to a delta-oriented software product line. The proposed approach is evaluated using publicly available case studies with industrial background as well as model variants provided by an industry partner.

    @phdthesis{Wil19,
    author = {Wille, David},
    title = {{Custom-Tailored Product Line Extraction}},
    institution = {Technische Universit{\"{a}}t Carolo-Wilhelmina Braunschweig, Germany},
    year = {2019},
    doi = {https://dx.doi.org/10.24355/dbbs.084-201902041142-0},
    }
  • C. Seidl, D. Wille, and I. Schaefer, “Software Reuse: From Cloned Variants to Managed Software Product Lines,” in Automotive Systems and Software Engineering: State of the Art and Future Trends, Springer International Publishing, 2019, pp. 77–108.

    Many software systems are available in similar, yet different variants to accommodate specific customer requirements. Even though sophisticated techniques exist to manage this variability, industrial practice mainly is to copy and modify existing products to create variants in an ad hoc manner. This clone-and-own practice loses variability information as no explicit connection between the variants is kept. This causes significant cost in the long term with a large set of variants as each software system has to be maintained individually. Software product line (SPL) engineering remedies this problem by allowing to develop and maintain large sets of software systems as a software family. In this chapter, we give an overview of variability realization mechanisms in the state of practice in the industry and the state of the art in SPL engineering. Furthermore, we describe a procedure for variability mining to retrieve previously unavailable variability information from a set of cloned variants and to generate an SPL from cloned variants. Finally, we demonstrate our tool suite DeltaEcore to manage the resulting SPL and to extend it with new functionality or different realization artifacts. We illustrate the entire procedure and our tool suite with an example from the automotive industry.

    @inbook{SWS19,
    author = {Seidl, Christoph and Wille, David and Schaefer, Ina},
    title = {{Software Reuse: From Cloned Variants to Managed Software Product Lines}},
    booktitle = {{Automotive Systems and Software Engineering: State of the Art and Future Trends}},
    year = {2019},
    pages = {77--108},
    publisher = {Springer International Publishing},
    doi = {https://dx.doi.org/10.1007/978-3-030-12157-0_5},
    }

2018

  • D. Wille, Ö. Babur, L. Cleophas, C. Seidl, M. van den Brand, and I. Schaefer, “Improving Custom-Tailored Variability Mining Using Outlier and Cluster Detection,” Science of Computer Programming, vol. 163, pp. 62–84, 2018.

    To satisfy demand for customized software solutions, companies commonly use so-called clone-and-own approaches to reuse functionality by copying existing realization artifacts and modifying them to create new product variants. Lacking clear documentation about the variability relations (i.e., the common and varying parts), the resulting variants have to be developed, maintained and evolved in isolation. In previous work, we introduced a semi-automatic mining algorithm allowing custom-tailored identification of distinct variability relations for block-based model variants (e.g., MATLAB/Simulink models or statecharts) using user-adjustable metrics. However, variants completely unrelated with other variants (i.e., outliers) can negatively influence the usefulness of the generated variability relations for developers maintaining the variants (e.g., erroneous relations might be identified). In addition, splitting the compared models into smaller sets (i.e., clusters) can be sensible to provide developers separate view points on different variable system features. In further previous work, we proposed statistical clustering capable of identifying such outliers and clusters. The contribution of this paper is twofold. First, we present guidelines and a generic implementation that both ease adaptation of our variability mining algorithm for new languages. Second, we integrate our clustering approach as a preprocessing step to the mining. This allows users to remove outliers prior to executing variability mining on suggested clusters. Using models from two industrial case studies, we show feasibility of the approach and discuss how our clustering can support our variability mining in identifying sensible variability information.

    @article{WBC+18,
    author = {Wille, David and Babur, {\"{O}}nder and Cleophas, Loek and Seidl, Christoph and van den Brand, Mark and Schaefer, Ina},
    title = {{Improving Custom-Tailored Variability Mining Using Outlier and Cluster Detection}},
    journal = {{Science of Computer Programming}},
    year = {2018},
    publisher = {Elsevier},
    pages = {62--84},
    volume = {163},
    doi = {https://dx.doi.org/10.1016/j.scico.2018.04.002},
    }
  • K. Wehling, D. Wille, C. Seidl, and I. Schaefer, “Reducing Variability of Technically Related Software Systems in Large-scale IT Landscapes,” in Proc. of the Intl. Conference on Computer Science and Software Engineering (CASCON), 2018, pp. 224–235. Toronto, Canada. IBM Center for Advanced Studies Best Paper Award.

    The number of software systems in a company typically grows with the business requirements. Therefore, IT landscapes in large companies can consist of hundreds or thousands of different software systems. As the evolution of such large-scale landscapes is often uncoordinated, they commonly comprise different groups of related software systems using a common core technology (e.g., Java Web-Application) implemented by a variety of architectural components (e.g., different application servers or databases). This leads to increased costs and higher effort for maintaining and evolving these software systems and the entire IT landscape. To alleviate these problems, the variability of such technically related software systems has to be reduced. For this purpose, experts have to assess and evaluate restructuring potentials in order to take appropriate restructuring decisions. As a manual analysis requires high effort and is not feasible for large-scale IT landscapes, experts face a major challenge. To overcome this challenge, we introduce a novel approach to automatically support experts in taking reasonable restructuring decisions. By providing automated methods for assessing, evaluating and simulating restructuring potentials, experts are capable of reducing the variability of related software systems in large-scale IT landscapes. We show suitability of our approach by expert interviews and an industrial case study with architectures of real-world software systems.

    @inproceedings{WWS+18,
    author = {Wehling, Kenny and Wille, David and Seidl, Christoph and Schaefer, Ina},
    title = {{Reducing Variability of Technically Related Software Systems in Large-scale IT Landscapes}},
    booktitle = {{Proc. of the Intl. Conference on Computer Science and Software Engineering (CASCON)}},
    year = {2018},
    pages = {224--235},
    publisher = {IBM Corporation},
    note = {IBM Center for Advanced Studies Best Paper Award},
    location = {Toronto, Canada},
    weblink = {https://dl.acm.org/citation.cfm?id=3291291.3291314},
    }

2017

  • D. Wille, K. Wehling, C. Seidl, M. Pluchator, and I. Schaefer, “Variability Mining of Technical Architectures,” in Proc. of the Intl. Systems and Software Product Line Conference (SPLC), 2017, pp. 39–48. Seville, Spain. HITACHI Young Best Paper Award – Research Track.

    Technical architectures (TAs) represent the computing infrastructure of a company with all its hardware and software components. Over the course of time, the number of TAs grows with the companies’ requirements and usually a large variety of TAs has to be maintained. Core challenge is the missing information on relations between the existing variants of TAs, which complicates reuse of solutions across systems. However, identifying these relations is an expensive task as architects have to manually analyze each TA individually. Restructuring the existing TAs poses severe risks as often sufficient information is not available (e.g., due to time constraints). To avoid failures in productive systems and resulting loss of profit, companies continue to create new solutions without restructuring existing ones. This increased variability in TAs represents technical debt. In this paper, we adapt the idea of variability mining from the software product line domain and present an efficient and automatic mining algorithm to identify the common and varying parts of TAs by analyzing a potentially arbitrary number of TAs in parallel. Using the identified variability information, architects are capable of analyzing the relations of TAs, identifying reuse potential, and making well-founded maintenance decisions. We show the feasibility and scalability of our approach by applying it to a real-world industrial case study with large sets of TAs.

    @inproceedings{WWS+17c,
    author = {Wille, David and Wehling, Kenny and Seidl, Christoph and Pluchator, Martin and Schaefer, Ina},
    title = {{Variability Mining of Technical Architectures}},
    booktitle = {{Proc. of the Intl. Systems and Software Product Line Conference (SPLC)}},
    year = {2017},
    doi = {https://dx.doi.org/10.1145/3106195.3106202},
    pages = {39--48},
    publisher = {ACM},
    location = {Seville, Spain},
    note = {HITACHI Young Best Paper Award - Research Track},
    }
  • D. Wille, T. Runge, C. Seidl, and S. Schulze, “Extractive Software Product Line Engineering Using Model-Based Delta Module Generation,” in Proc. of the Intl. Workshop on Variability Modeling in Software-intensive Systems (VaMoS), 2017, pp. 36–43. Eindhoven, Netherlands.

    To satisfy demand for customized products, companies commonly apply so-called clone-and-own strategies by copying functionality from existing products and modifying it to create product variants that have to be developed, maintained, and evolved in isolation. In previous work, we introduced a variability mining technique to identify variability information (commonalities and differences) in block-based model variants (e.g., MATLAB/Simulink models), which can be used to guide manual transition from clone-and-own to managed reuse of a software product line (SPL). In this paper, we present a procedure that uses the extracted variability information to generate a transformational delta-oriented SPL fully automatically. We generate a delta language specifically tailored to transforming models in the analyzed modeling language and utilize it to generate delta modules expressing variation of the SPL’s implementation artifacts. The procedure seamlessly integrates with our variability mining technique and allows to fully adopt a managed reuse strategy (i.e., generation of products from a single code base) without manual overhead. We show the feasibility of the procedure by applying it to state chart and MATLAB/Simulink model variants from two industrial case studies.

    @inproceedings{WRS+17,
    author = {Wille, David and Runge, Tobias and Seidl, Christoph and Schulze, Sandro},
    title = {{Extractive Software Product Line Engineering Using Model-Based Delta Module Generation}},
    booktitle = {{Proc. of the Intl. Workshop on Variability Modeling in Software-intensive Systems (VaMoS)}},
    year = {2017},
    doi = {https://dx.doi.org/10.1145/3023956.3023957},
    pages = {36--43},
    publisher = {ACM},
    location = {Eindhoven, Netherlands},
    }
  • A. Schlie, D. Wille, S. Schulze, L. Cleophas, and I. Schaefer, “Detecting Variability in MATLAB/Simulink Models: An Industry-Inspired Technique and Its Evaluation,” in Proc. of the Intl. Systems and Software Product Line Conference (SPLC), 2017, pp. 215–224. Seville, Spain.

    Model-based languages such as MATLAB/Simulink play an essential role in the model-driven development of software systems. To comply with new requirements, it is common practice to create new variants by copying existing systems and modifying them. Commonly referred to as clone-and-own, severe problems arise in the long-run when no dedicated variability management is installed. To allow for a documented and structured reuse of systems, their variability information needs to be reverse-engineered. In this paper, we propose an advanced comparison procedure, the Matching Window Technique, and a customizable metric. Both allow us to overcome structural alterations commonly performed during clone-and-own. We analyze related MATLAB/Simulink models and determine, classify and represent their variability information in an understandable way. With our technique, we assist model engineers in maintaining and evolving existing variants. We provide three feasibility studies with real-world models from the automotive domain and show our technique to be fast and precise. Furthermore, we perform semi-structured interviews with domain experts to assess the potential applicability of our technique in practice.

    @inproceedings{SWS+17,
    author = {Schlie, Alexander and Wille, David and Schulze, Sandro and Cleophas, Loek and Schaefer, Ina},
    title = {{Detecting Variability in MATLAB/Simulink Models: An Industry-Inspired Technique and Its Evaluation}},
    booktitle = {{Proc. of the Intl. Systems and Software Product Line Conference (SPLC)}},
    year = {2017},
    doi = {https://dx.doi.org/10.1145/3106195.3106225},
    pages = {215--224},
    publisher = {ACM},
    location = {Seville, Spain},
    abstract = {Model-based languages such as MATLAB/Simulink play an essential role in the model-driven development of software systems. To comply with new requirements, it is common practice to create new variants by copying existing systems and modifying them. Commonly referred to as clone-and-own, severe problems arise in the long-run when no dedicated variability management is installed. To allow for a documented and structured reuse of systems, their variability information needs to be reverse-engineered. In this paper, we propose an advanced comparison procedure, the Matching Window Technique, and a customizable metric. Both allow us to overcome structural alterations commonly performed during clone-and-own. We analyze related MATLAB/Simulink models and determine, classify and represent their variability information in an understandable way. With our technique, we assist model engineers in maintaining and evolving existing variants. We provide three feasibility studies with real-world models from the automotive domain and show our technique to be fast and precise. Furthermore, we perform semi-structured interviews with domain experts to assess the potential applicability of our technique in practice.}
    }
  • A. Schlie, D. Wille, L. Cleophas, and I. Schaefer, “Clustering Variation Points in MATLAB/Simulink Models Using Reverse Signal Propagation Analysis,” in Proc. of the Intl. Conference on Software Reuse (ICSR), 2017, p. 77–94. Salvador, Brazil.

    Model-based languages such as MATLAB/Simulink play an essential role in the model-driven development of software systems. During their development, these systems can be subject to modification numerous times. For large-scale systems, to manually identify performed modifications is infeasible. However, their precise identification and subsequent validation is essential for the evolution of model-based systems. If not fully identified, modifications may cause unaccountable behavior as the system evolves and their redress can significantly delay the entire development process. In this paper, we propose a fully automated technique called Reverse Signal Propagation Analysis, which identifies and clusters variations within evolving MATLAB/Simulink models. With each cluster representing a clearly delimitable variation point between models, we allow model engineers not only to specifically focus on single variations, but by using their domain knowledge, to also relate and verify them. By identifying variation points, we assist model engineers in validating the respective parts and reduce the risk of improper system behavior as the system evolves. To assess the applicability of our technique, we present a feasibility study with real-world models from the automotive domain and show our technique to be very fast and highly precise.

    @inproceedings{SWC+17,
    author = {Schlie, Alexander and Wille, David and Cleophas, Loek and Schaefer, Ina},
    title = {{Clustering Variation Points in MATLAB/Simulink Models Using Reverse Signal Propagation Analysis}},
    booktitle = {{Proc. of the Intl. Conference on Software Reuse (ICSR)}},
    year = {2017},
    series = {Lecture Notes in Computer Science},
    volume = {10221},
    doi = {https://dx.doi.org/10.1007/978-3-319-56856-0_6},
    pages = {77--94},
    publisher = {Springer},
    location = {Salvador, Brazil},
    abstract = {Model-based languages such as MATLAB/Simulink play an essential role in the model-driven development of software systems. During their development, these systems can be subject to modification numerous times. For large-scale systems, to manually identify performed modifications is infeasible. However, their precise identification and subsequent validation is essential for the evolution of model-based systems. If not fully identified, modifications may cause unaccountable behavior as the system evolves and their redress can significantly delay the entire development process. In this paper, we propose a fully automated technique called Reverse Signal Propagation Analysis, which identifies and clusters variations within evolving MATLAB/Simulink models. With each cluster representing a clearly delimitable variation point between models, we allow model engineers not only to specifically focus on single variations, but by using their domain knowledge, to also relate and verify them. By identifying variation points, we assist model engineers in validating the respective parts and reduce the risk of improper system behavior as the system evolves. To assess the applicability of our technique, we present a feasibility study with real-world models from the automotive domain and show our technique to be very fast and highly precise.}
    }
  • K. Wehling, D. Wille, C. Seidl, and I. Schaefer, “Automated Recommendations for Reducing Unnecessary Variability of Technology Architectures,” in Proc. of the Intl. Workshop on Feature-Oriented Software Development (FOSD), 2017, p. 1–10. Vancouver, Canada.

    A technology architecture (TA) represents the technical infrastructure of a company and consists of hardware and software components. As the evolution of such TAs is typically uncoordinated, their complexity often grows with a company’s requirements. As a consequence, a variety of redundancies and architectural variants exist, which are not necessary for the architecture’s purpose. This leads to increased costs and higher effort for evolving and maintaining the entire IT landscape. To alleviate these problems, unnecessary variability has to be identified and reduced. As a manual approach requires high effort and is not feasible for large-scale analysis, experts face a major challenge. For this purpose, we propose an automated approach, which provides experts with recommendations for restructurings of related TAs in order to reduce unnecessary variability. We show the suitability of our approach by expert interviews and an industrial case study with real-world TAs.

    @inproceedings{WWS+17a,
    author = {Wehling, Kenny and Wille, David and Seidl, Christoph and Schaefer, Ina},
    title = {{Automated Recommendations for Reducing Unnecessary Variability of Technology Architectures}},
    booktitle = {{Proc. of the Intl. Workshop on Feature-Oriented Software Development (FOSD)}},
    year = {2017},
    doi = {https://dx.doi.org/10.1145/3141848.3141849},
    pages = {1--10},
    publisher = {ACM},
    location = {Vancouver, Canada},
    abstract = {A technology architecture (TA) represents the technical infrastructure of a company and consists of hardware and software components. As the evolution of such TAs is typically uncoordinated, their complexity often grows with a company’s requirements. As a consequence, a variety of redundancies and architectural variants exist, which are not necessary for the architecture’s purpose. This leads to increased costs and higher effort for evolving and maintaining the entire IT landscape. To alleviate these problems, unnecessary variability has to be identified and reduced. As a manual approach requires high effort and is not feasible for large-scale analysis, experts face a major challenge. For this purpose, we propose an automated approach, which provides experts with recommendations for restructurings of related TAs in order to reduce unnecessary variability. We show the suitability of our approach by expert interviews and an industrial case study with real-world TAs.}
    }
  • K. Wehling, D. Wille, C. Seidl, and I. Schaefer, “Decision Support for Reducing Unnecessary IT Complexity of Application Architectures,” in Proc. of the Intl. Workshop on decision Making in Software ARCHitecture (MARCH), 2017, p. 161–168. Gothenburg, Sweden.

    As many companies exist for decades, their software systems and IT architectures grow massively with the companies’ requirements. To avoid failures due to changes to a productive system, new demands often lead to new solutions while neglecting to restructure existing software systems despite a similar purpose of use. As a consequence, the IT complexity of such software systems and IT architectures increases sharply accompanied by higher costs, reduced adaptability and increased effort for evolving and maintaining the entire IT landscape. Although there are applications that fulfill a company’s requirements, there are also software solutions and variants of them that seem to be redundant. This causes unnecessary IT complexity, which is not essential for a company’s goals and requirements. To identify and reduce this unnecessary part of IT complexity, we introduce an approach to support experts in decision making regarding these redundant artifacts. We provide a method to identify variability of application architectures (AAs) and an iterative decision process to determine and remove artifacts that are not required, which enable experts to reduce unnecessary IT complexity of given AAs. We show the feasibility of our approach by applying it to industrial data in a preliminary case study.

    @inproceedings{WWS+17b,
    author = {Wehling, Kenny and Wille, David and Seidl, Christoph and Schaefer, Ina},
    title = {{Decision Support for Reducing Unnecessary IT Complexity of Application Architectures}},
    booktitle = {{Proc. of the Intl. Workshop on decision Making in Software ARCHitecture (MARCH)}},
    year = {2017},
    doi = {https://dx.doi.org/10.1109/ICSAW.2017.47},
    pages = {161--168},
    publisher = {IEEE},
    location = {Gothenburg, Sweden},
    abstract = {As many companies exist for decades, their software systems and IT architectures grow massively with the companies' requirements. To avoid failures due to changes to a productive system, new demands often lead to new solutions while neglecting to restructure existing software systems despite a similar purpose of use. As a consequence, the IT complexity of such software systems and IT architectures increases sharply accompanied by higher costs, reduced adaptability and increased effort for evolving and maintaining the entire IT landscape. Although there are applications that fulfill a company's requirements, there are also software solutions and variants of them that seem to be redundant. This causes unnecessary IT complexity, which is not essential for a company's goals and requirements. To identify and reduce this unnecessary part of IT complexity, we introduce an approach to support experts in decision making regarding these redundant artifacts. We provide a method to identify variability of application architectures (AAs) and an iterative decision process to determine and remove artifacts that are not required, which enable experts to reduce unnecessary IT complexity of given AAs. We show the feasibility of our approach by applying it to industrial data in a preliminary case study.}
    }

2016

  • D. Wille, S. Schulze, C. Seidl, and I. Schaefer, “Custom-Tailored Variability Mining for Block-Based Languages,” in Proc. of the Intl. Conference on Software Analysis, Evolution, and Reengineering (SANER), 2016, p. 271–282. Osaka, Japan.

    Block-based modeling languages, such as MATLAB/Simulink or state charts, reduce the complexity inherent to developing large-scale software systems. When creating variants for largely similar yet different software systems, the common practice is to copy models and modify them to different requirements. While this allows companies to save costs in the short-term, these so-called clone-and-own approaches cause problems regarding long-term evolution and system quality as the relation between the variants of the resulting software family is lost so that the variants have to be maintained in isolation. To recreate information regarding the variants’ relations, variability mining identifies common and varying parts of cloned variants but, currently, the respective algorithms have to be created for each target language individually. In this paper, we present a generalized method to instantiate variability mining for arbitrary block-based modeling languages. The identified variability information allows developers to understand the variability of their grown software family. This knowledge helps efficiently maintaining the variants and allows migrating from clone-and-own approaches to more elaborate reuse strategies, such as software product lines. We demonstrate the feasibility of our method by instantiating variability mining techniques for two block-based languages.

    @inproceedings{WSS+16,
    author = {Wille, David and Schulze, Sandro and Seidl, Christoph and Schaefer, Ina},
    title = {{Custom-Tailored Variability Mining for Block-Based Languages}},
    booktitle = {{Proc. of the Intl. Conference on Software Analysis, Evolution, and Reengineering (SANER)}},
    year = {2016},
    volume = {1},
    pages = {271--282},
    doi = {https://dx.doi.org/10.1109/SANER.2016.13},
    publisher = {IEEE},
    location = {Osaka, Japan},
    abstract = {Block-based modeling languages, such as MATLAB/Simulink or state charts, reduce the complexity inherent to developing large-scale software systems. When creating variants for largely similar yet different software systems, the common practice is to copy models and modify them to different requirements. While this allows companies to save costs in the short-term, these so-called clone-and-own approaches cause problems regarding long-term evolution and system quality as the relation between the variants of the resulting software family is lost so that the variants have to be maintained in isolation. To recreate information regarding the variants' relations, variability mining identifies common and varying parts of cloned variants but, currently, the respective algorithms have to be created for each target language individually. In this paper, we present a generalized method to instantiate variability mining for arbitrary block-based modeling languages. The identified variability information allows developers to understand the variability of their grown software family. This knowledge helps efficiently maintaining the variants and allows migrating from clone-and-own approaches to more elaborate reuse strategies, such as software product lines. We demonstrate the feasibility of our method by instantiating variability mining techniques for two block-based languages.}
    }
  • D. Wille, S. Schulze, and I. Schaefer, “Variability Mining of State Charts,” in Proc. of the Intl. Workshop on Feature-Oriented Software Development (FOSD), 2016, p. 63–73. Amsterdam, Netherlands.

    Companies commonly use state charts to reduce the complexity of software development. To create variants with slightly different functionality from existing products, it is common practice to copy the corresponding state charts and modify them to changed requirements. Even though these so-called clone-and-own approaches save money in the short-term, they introduce severe risks for software evolution and product quality in the long term as the relation between the software variants is lost so that all products have to be maintained separately. In previous work, we introduced variability mining algorithms to identify the relations between related MATLAB/Simulink model variants regarding their common and varying parts. In this paper, we adapt these algorithms for state charts by applying guidelines from previous work to make them available for developers to better understand the relations between a set of state chart variants. Using this knowledge, maintenance of related variants can be improved and migration from clone-and-own based single variant development to more elaborate reuse strategies is possible to increase maintainability and the overall product quality. We demonstrate the feasibility of variability mining for state charts by means of a case study with models of realistic size.

    @inproceedings{WSS16,
    author = {Wille, David and Schulze, Sandro and Schaefer, Ina},
    title = {{Variability Mining of State Charts}},
    booktitle = {{Proc. of the Intl. Workshop on Feature-Oriented Software Development (FOSD)}},
    year = {2016},
    pages = {63--73},
    doi = {https://dx.doi.org/10.1145/3001867.3001875},
    publisher = {ACM},
    location = {Amsterdam, Netherlands},
    abstract = {Companies commonly use state charts to reduce the complexity of software development. To create variants with slightly different functionality from existing products, it is common practice to copy the corresponding state charts and modify them to changed requirements. Even though these so-called clone-and-own approaches save money in the short-term, they introduce severe risks for software evolution and product quality in the long term as the relation between the software variants is lost so that all products have to be maintained separately. In previous work, we introduced variability mining algorithms to identify the relations between related MATLAB/Simulink model variants regarding their common and varying parts. In this paper, we adapt these algorithms for state charts by applying guidelines from previous work to make them available for developers to better understand the relations between a set of state chart variants. Using this knowledge, maintenance of related variants can be improved and migration from clone-and-own based single variant development to more elaborate reuse strategies is possible to increase maintainability and the overall product quality. We demonstrate the feasibility of variability mining for state charts by means of a case study with models of realistic size.}
    }
  • D. Wille, M. Tiede, S. Schulze, C. Seidl, and I. Schaefer, “Identifying Variability in Object-Oriented Code Using Model-Based Code Mining,” in Proc. of the Intl. Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA), 2016, p. 547–562. Corfu, Greece.

    A large set of object-oriented programming (OOP) languages exists to realize software for different purposes. Companies often create variants of their existing software by copying and modifying them to changed requirements. While these so-called clone-and-own approaches allow companies to save money in the short term, they expose the company to severe risks regarding long-term evolution and product quality. The main reason is the high manual maintenance effort which is needed due to the unknown relations between variants. In this paper, we introduce a model-based approach to identify variability information for OOP code, allowing companies to better understand and manage variability between their variants. This information makes it possible to improve maintenance of the variants and to transition from single variant development to more elaborate reuse strategies such as software product lines. We demonstrate the applicability of our approach by means of a case study analyzing variants generated from an existing software product line and comparing our findings to the managed reuse strategy.

    @inproceedings{WTS+16,
    author = {Wille, David and Tiede, Michael and Schulze, Sandro and Seidl, Christoph and Schaefer, Ina},
    title = {{Identifying Variability in Object-Oriented Code Using Model-Based Code Mining}},
    booktitle = {{Proc. of the Intl. Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA)}},
    year = {2016},
    series = {Lecture Notes in Computer Science},
    volume = {9953},
    publisher = {Springer},
    doi = {https://dx.doi.org/10.1007/978-3-319-47169-3_43},
    pages = {547--562},
    location = {Corfu, Greece},
    abstract = {A large set of object-oriented programming (OOP) languages exists to realize software for different purposes. Companies often create variants of their existing software by copying and modifying them to changed requirements. While these so-called clone-and-own approaches allow companies to save money in the short term, they expose the company to severe risks regarding long-term evolution and product quality. The main reason is the high manual maintenance effort which is needed due to the unknown relations between variants. In this paper, we introduce a model-based approach to identify variability information for OOP code, allowing companies to better understand and manage variability between their variants. This information makes it possible to improve maintenance of the variants and to transition from single variant development to more elaborate reuse strategies such as software product lines. We demonstrate the applicability of our approach by means of a case study analyzing variants generated from an existing software product line and comparing our findings to the managed reuse strategy.}
    }
  • K. Wehling, D. Wille, M. Pluchator, and I. Schaefer, “Towards Reducing the Complexity of Enterprise Architectures by Identifying Standard Variants Using Variability Mining,” in 1. Automobil Symposium Wildau: Tagungsband, 2016, p. 37–43. Wildau, Germany.

    For decades, Enterprise Architectures (EAs) of car manufacturers have been constantly evolved to respond to growing requirements. As a consequence, EAs have often reached a very high level of complexity, which leads to problems in adapting EAs to new environmental conditions. Such a new condition is, for instance, digitalization of society (e.g., social media, Internet of Things) which has a huge effect on the automotive industry and the grown EA. Resulting changes in complex EAs have long implementation cycles, require enormous communication efforts, and lead to high development costs. To alleviate these problems, in this paper, we present a concept to reduce the complexity of grown EAs by adapting the Family Mining approach. This approach is originally used to compare block-oriented models, such as MATLAB/Simulink models, and to identify commonalities and differences between these models. In our concept, we utilize the Family Mining approach to analyze the variability of a particular EA and to identify the contained variants. All information about the variability and the variants will be used to derive standard variants representing default solutions for different issues. Using these standard variants, the existing EA will be restructured involving economic considerations (e.g., which standard variant yields best benefits under certain circumstances). Hence, applying this concept to a complex EA should allow reducing the complexity of the EA, alleviating related problems and making suitable design decisions for future extensions.

    @inproceedings{WWP+16,
    author = {Wehling, Kenny and Wille, David and Pluchator, Martin and Schaefer, Ina},
    title = {{Towards Reducing the Complexity of Enterprise Architectures by Identifying Standard Variants Using Variability Mining}},
    booktitle = {{1. Automobil Symposium Wildau: Tagungsband}},
    pages = {37--43},
    doi = {https://dx.doi.org/10.15771/ASW_2016_6},
    year = {2016},
    location = {Wildau, Germany},
    abstract = {For decades, Enterprise Architectures (EAs) of car manufacturers have been constantly evolved to respond to growing requirements. As a consequence, EAs have often reached a very high level of complexity, which leads to problems in adapting EAs to new environmental conditions. Such a new condition is, for instance, digitalization of society (e.g., social media, Internet of Things) which has a huge effect on the automotive industry and the grown EA. Resulting changes in complex EAs have long implementation cycles, require enormous communication efforts, and lead to high development costs. To alleviate these problems, in this paper, we present a concept to reduce the complexity of grown EAs by adapting the Family Mining approach. This approach is originally used to compare block-oriented models, such as MATLAB/Simulink models, and to identify commonalities and differences between these models. In our concept, we utilize the Family Mining approach to analyze the variability of a particular EA and to identify the contained variants. All information about the variability and the variants will be used to derive standard variants representing default solutions for different issues. Using these standard variants, the existing EA will be restructured involving economic considerations (e.g., which standard variant yields best benefits under certain circumstances). Hence, applying this concept to a complex EA should allow reducing the complexity of the EA, alleviating related problems and making suitable design decisions for future extensions.}
    }

2014

  • D. Wille, “Managing Lots of Models: The FaMine Approach,” in Proc. of the Intl. Symposium on the Foundations of Software Engineering (FSE), 2014, p. 817–819. Hong Kong.

    In this paper we present recent developments in reverse engineering variability for block-based data-flow models.

    @inproceedings{Wil14,
    author = {Wille, David},
    title = {{Managing Lots of Models: The FaMine Approach}},
    booktitle = {{Proc. of the Intl. Symposium on the Foundations of Software Engineering (FSE)}},
    year = {2014},
    pages = {817--819},
    doi = {https://dx.doi.org/10.1145/2635868.2661681},
    publisher = {ACM},
    location = {Hong Kong},
    abstract = {In this paper we present recent developments in reverse engineering variability for block-based data-flow models.}
    }
  • S. Holthusen, D. Wille, C. Legat, S. Beddig, I. Schaefer, and B. Vogel-Heuser, “Family Model Mining for Function Block Diagrams in Automation Software,” in Proc. of the Intl. Workshop on Reverse Variability Engineering (REVE), 2014, p. 36–43. Florence, Italy.

    Automation systems are mostly individual highly customized system variants, consisting both of hardware and software. In order to reduce development effort, it is a common practice to use a clone-and-own approach by modifying an existing variant to fit the changed requirements of a new variant. The information about the commonalities and differences between those variants is usually not well documented and leads to problems in maintenance, testing and evolution. To alleviate these problems, in this paper, we present an improved version of a family mining approach for automatically discovering commonality and variability between related system variants. We apply this approach to function block diagrams used to develop automation software and show its feasibility by a manufacturing case study.

    @inproceedings{HWL+14,
    author = {Holthusen, S{\"{o}}nke and Wille, David and Legat, Christoph and Beddig, Simon and Schaefer, Ina and Vogel-Heuser, Birgit},
    title = {{Family Model Mining for Function Block Diagrams in Automation Software}},
    booktitle = {{Proc. of the Intl. Workshop on Reverse Variability Engineering (REVE)}},
    year = {2014},
    pages = {36--43},
    doi = {https://dx.doi.org/10.1145/2647908.2655965},
    publisher = {ACM},
    location = {Florence, Italy},
    abstract = {Automation systems are mostly individual highly customized system variants, consisting both of hardware and software. In order to reduce development effort, it is a common practice to use a clone-and-own approach by modifying an existing variant to fit the changed requirements of a new variant. The information about the commonalities and differences between those variants is usually not well documented and leads to problems in maintenance, testing and evolution. To alleviate these problems, in this paper, we present an improved version of a family mining approach for automatically discovering commonality and variability between related system variants. We apply this approach to function block diagrams used to develop automation software and show its feasibility by a manufacturing case study.}
    }

2013

  • D. Wille, S. Holthusen, S. Schulze, and I. Schaefer, “Interface Variability in Family Model Mining,” in Proc. of the Intl. Workshop on Model-Driven Approaches in Software Product Line Engineering (MAPLE), 2013, p. 44–51. Tokyo, Japan.

    Model-driven development of software gains more and more importance, especially in domains with high complexity. In order to develop differing but still similar model-based systems, these models are often copied and modified according to the changed requirements. As the variability between these different models is not documented, issues arise during maintenance. For example, applying patches becomes a tedious task because errors have to be fixed in all of the created models and no information about modified and unchanged parts exists. In this paper, we present an approach to analyze related models and determine the variability between them. This analysis provides crucial information about the variability (i.e., changed parts, additional parts, and parts without any modification) between the models in order to create family models. The particular focus is the analysis of models containing components with differing interfaces.

    @inproceedings{WHS+13,
    author = {Wille, David and Holthusen, S{\"{o}}nke and Schulze, Sandro and Schaefer, Ina},
    title = {{Interface Variability in Family Model Mining}},
    booktitle = {{Proc. of the Intl. Workshop on Model-Driven Approaches in Software Product Line Engineering (MAPLE)}},
    year = {2013},
    pages = {44--51},
    doi = {https://dx.doi.org/10.1145/2499777.2500708},
    publisher = {ACM},
    location = {Tokyo, Japan},
    abstract = {Model-driven development of software gains more and more importance, especially in domains with high complexity. In order to develop differing but still similar model-based systems, these models are often copied and modified according to the changed requirements. As the variability between these different models is not documented, issues arise during maintenance. For example, applying patches becomes a tedious task because errors have to be fixed in all of the created models and no information about modified and unchanged parts exists. In this paper, we present an approach to analyze related models and determine the variability between them. This analysis provides crucial information about the variability (i.e., changed parts, additional parts, and parts without any modification) between the models in order to create family models. The particular focus is the analysis of models containing components with differing interfaces.}
    }