Author | Title | Year | Journal/Proceedings | Reftype | DOI/URL
Ferrucci, F., Salza, P. and Sarro, F. Using Hadoop MapReduce for Parallel Genetic Algorithms: A Comparison of the Global, Grid and Island Models To appear Evolutionary Computation Journal  article DOI  
Abstract: The need to improve the scalability of Genetic Algorithms (GAs) has motivated the research on Parallel Genetic Algorithms (PGAs), and different technologies and approaches have been used. Hadoop MapReduce represents one of the most mature technologies to develop parallel algorithms. Based on the fact that parallel algorithms introduce communication overhead, the aim of the present work is to understand if, and possibly when, the parallel GA solutions using Hadoop MapReduce show better performance than sequential versions in terms of execution time. Moreover, we are interested in understanding which PGA model can be more effective among the global, grid and island models. We empirically assessed the performance of these three parallel models with respect to a sequential GA on a software engineering problem, evaluating the execution time and the achieved speedup. We also analysed the behaviour of the parallel models in relation to the overhead produced by the use of Hadoop MapReduce and the GAs' computational effort, which gives a more machine-independent measure of these algorithms. We exploited three problem instances to differentiate the computation load and three cluster configurations based on 2, 4 and 8 parallel nodes. Moreover, we estimated the costs of the execution of the experimentation on a potential cloud infrastructure, based on the pricing of the major commercial cloud providers. The empirical study revealed that the use of PGA based on the island model outperforms the other parallel models and the sequential GA for all the considered instances and clusters. Using 2, 4 and 8 nodes, the island model achieves an average speedup over the three datasets of 1.8×, 3.4× and 7.0×, respectively. Hadoop MapReduce has a set of different constraints that need to be considered during the design and the implementation of parallel algorithms. The overhead of data store (i.e., HDFS) accesses, communication and latency requires solutions that reduce data store operations. For this reason, the island model is more suitable for PGAs than the global and grid models, also in terms of costs when executed on a commercial cloud provider.
BibTeX:
@article{FerrucciSS,
  author = {Filomena Ferrucci and Pasquale Salza and Federica Sarro},
  title = {Using Hadoop MapReduce for Parallel Genetic Algorithms: A Comparison of the Global, Grid and Island Models},
  journal = {Evolutionary Computation Journal},
  year = {To appear},
  doi = {http://dx.doi.org/10.1162/evco_a_00213}
}
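Illustrative note: the island model described in the abstract above can be sketched in a few lines of plain Python. This is a toy, self-contained illustration (independent subpopulations with periodic ring migration of best individuals, on a stand-in OneMax fitness), not the Hadoop MapReduce implementation evaluated in the paper; all parameter values are assumptions.
import random

def onemax(bits):
    # Toy fitness: number of ones (stands in for the software engineering problem).
    return sum(bits)

def evolve(pop, generations, mutation_rate=0.02):
    # Simple (mu+mu) GA step on one island: bit-flip mutation plus truncation selection.
    for _ in range(generations):
        offspring = [[b ^ (random.random() < mutation_rate) for b in ind] for ind in pop]
        pop = sorted(pop + offspring, key=onemax, reverse=True)[:len(pop)]
    return pop

def island_model(n_islands=4, island_size=20, n_bits=100, epochs=10, epoch_gens=25):
    islands = [[[random.randint(0, 1) for _ in range(n_bits)] for _ in range(island_size)]
               for _ in range(n_islands)]
    for _ in range(epochs):
        islands = [evolve(pop, epoch_gens) for pop in islands]      # independent evolution
        migrants = [max(pop, key=onemax) for pop in islands]        # best of each island
        for i, island in enumerate(islands):                        # ring migration
            island[-1] = migrants[(i - 1) % n_islands]               # replace the worst individual
    return max((ind for pop in islands for ind in pop), key=onemax)

best = island_model()
print(onemax(best))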
Li, M., Grosan, C., Yang, S., Liu, X. and Yao, X. Multi-Line Distance Minimization: A Visualized Many-Objective Test Problem Suite To appear IEEE Transactions on Evolutionary Computation  article DOI  
Abstract: Studying the search behavior of evolutionary many-objective optimization is an important, but challenging issue. Existing studies rely mainly on the use of performance indicators which, however, not only encounter increasing difficulties with the number of objectives, but also fail to provide the visual information of the evolutionary search. In this paper, we propose a class of scalable test problems, called the multi-line distance minimization problem (ML-DMP), which is used to visually examine the behavior of many-objective search. Two key characteristics of the ML-DMP problem are: 1) its Pareto optimal solutions lie in a regular polygon in the two-dimensional decision space, and 2) these solutions are similar (in the sense of Euclidean geometry) to their images in the high-dimensional objective space. This allows a straightforward understanding of the distribution of the objective vector set (e.g., its uniformity and coverage over the Pareto front) via observing the solution set in the two-dimensional decision space. Fifteen well-established algorithms have been investigated on three types of 10 ML-DMP problem instances. Weaknesses have been revealed across classic multi-objective algorithms (such as Pareto-based, decomposition-based and indicator-based algorithms) and even state-of-the-art algorithms designed especially for many-objective optimization. This, together with some interesting observations from the experimental studies, suggests that the proposed ML-DMP may also be used as a benchmark function to challenge the search ability of optimization algorithms.
BibTeX:
@article{LiGYLY,
  author = {Miqing Li and Crina Grosan and Shengxiang Yang and Xiaohui Liu and Xin Yao},
  title = {Multi-Line Distance Minimization: A Visualized Many-Objective Test Problem Suite},
  journal = {IEEE Transactions on Evolutionary Computation},
  year = {To appear},
  doi = {http://dx.doi.org/10.1109/TEVC.2017.2655451}
}
Martin, W., Sarro, F., Jia, Y., Zhang, Y. and Harman, M. A Survey of App Store Analysis for Software Engineering To appear IEEE Transactions on Software Engineering  article DOI  
Abstract: App Store Analysis studies information about applications obtained from app stores. App stores provide a wealth of information derived from users that would not exist had the applications been distributed via previous software deployment methods. App Store Analysis combines this non-technical information with technical information to learn trends and behaviours within these forms of software repositories. Findings from App Store Analysis have a direct and actionable impact on the software teams that develop software for app stores, and have led to techniques for requirements engineering, release planning, software design, security and testing. This survey describes and compares the areas of research that have been explored thus far, drawing out common aspects, trends and directions future research should take to address open problems and challenges.
BibTeX:
@article{MartinSJZH,
  author = {William Martin and Federica Sarro and Yue Jia and Yuanyuan Zhang and Mark Harman},
  title = {A Survey of App Store Analysis for Software Engineering},
  journal = {IEEE Transactions on Software Engineering},
  year = {To appear},
  doi = {http://dx.doi.org/10.1109/TSE.2016.2630689}
}
Ochoa, G., Verel, S., Daolio, F. and Tomassini, M. Book on Fitness Landscapes To appear   inbook  
BibTeX:
@inbook{OchoaVDT,
  author = {G. Ochoa and S. Verel and F. Daolio and M. Tomassini},
  title = {Book on Fitness Landscapes},
  publisher = {Springer},
  year = {To appear}
}
Qian, C., Yu, Y., Tang, K., Jin, Y., Yao, X. and Zhou, Z.-H. On the Effectiveness of Sampling for Evolutionary Optimization in Noisy Environments To appear Evolutionary Computation  article DOI  
Abstract: In real-world optimization tasks, the objective (i.e., fitness) function evaluation is often disturbed by noise due to a wide range of uncertainties. Evolutionary algorithms are often employed in noisy optimization, where reducing the negative effect of noise is a crucial issue. Sampling is a popular strategy for dealing with noise: to estimate the fitness of a solution, it evaluates the fitness multiple times independently and then uses the sample average to approximate the true fitness. Obviously, sampling can make the fitness estimation closer to the true value, but it also increases the estimation cost. Previous studies mainly focused on empirical analysis and design of efficient sampling strategies, while the impact of sampling is unclear from a theoretical viewpoint. In this article, we show that sampling can speed up noisy evolutionary optimization exponentially via rigorous running time analysis. For the (1+1)-EA solving the OneMax and the LeadingOnes problems under prior (e.g., one-bit) or posterior (e.g., additive Gaussian) noise, we prove that, under a high noise level, the running time can be reduced from exponential to polynomial by sampling. The analysis also shows that a gap of one in the sample size can lead to an exponential difference in the expected running time, cautioning for a careful selection of the sample size. We further prove by using two illustrative examples that sampling can be more effective for noise handling than parent populations and threshold selection, two strategies that have been shown to be robust to noise. Finally, we also show that sampling can be ineffective when noise does not bring a negative impact.
BibTeX:
@article{QianYTJYZ,
  author = {Chao Qian and Yang Yu and Ke Tang and Yaochu Jin and Xin Yao and Zhi-Hua Zhou},
  title = {On the Effectiveness of Sampling for Evolutionary Optimization in Noisy Environments},
  journal = {Evolutionary Computation},
  year = {To appear},
  doi = {http://dx.doi.org/10.1162/EVCO_a_00201}
}
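Illustrative note: the sampling strategy analysed above is simple to state in code. The sketch below estimates a solution's fitness as the average of k independent noisy evaluations inside a toy (1+1)-EA on OneMax under one-bit prior noise; the value of k, the noise probability and the evaluation budget are illustrative assumptions, not the paper's settings.
import random

def onemax(bits):
    return sum(bits)

def noisy_onemax(bits):
    # Prior (one-bit) noise: with probability 0.9 a random bit is flipped before evaluation.
    if random.random() < 0.9:
        i = random.randrange(len(bits))
        bits = bits[:i] + [1 - bits[i]] + bits[i + 1:]
    return onemax(bits)

def sampled_fitness(bits, k):
    # Sampling: the average of k independent noisy evaluations approximates the true fitness.
    return sum(noisy_onemax(bits) for _ in range(k)) / k

def one_plus_one_ea(n=50, k=10, max_evals=100000):
    current = [random.randint(0, 1) for _ in range(n)]
    evals = 0
    while evals < max_evals and onemax(current) < n:
        offspring = [1 - b if random.random() < 1.0 / n else b for b in current]
        if sampled_fitness(offspring, k) >= sampled_fitness(current, k):
            current = offspring
        evals += 2 * k
    return current, evals

solution, used = one_plus_one_ea()
print(onemax(solution), used)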
Sarro, F., Ferrucci, F., Harman, M., Manna, A. and Ren, J. Adaptive Multi-objective Evolutionary Algorithms for Overtime Planning in Software Projects To appear IEEE Transactions on Software Engineering  article DOI  
Abstract: Software engineering and development is well-known to suffer from unplanned overtime, which causes stress and illness in engineers and can lead to poor quality software with higher defects. Recently, we introduced a multi-objective decision support approach to help balance project risks and duration against overtime, so that software engineers can better plan overtime. This approach was empirically evaluated on six real world software projects and compared against state-of-the-art evolutionary approaches and currently used overtime strategies. The results showed that our proposal comfortably outperformed all the benchmarks considered. This paper extends our previous work by investigating adaptive multi-objective approaches to meta-heuristic operator selection, thereby extending and (as the results show) improving algorithmic performance. We also extended our empirical study to include two new real world software projects, thereby enhancing the scientific evidence for the technical performance claims made in the paper. Our new results, over all eight projects studied, showed that our adaptive algorithm outperforms the considered state-of-the-art multi-objective approaches in 93% of the experiments (with large effect size). The results also confirm that our approach significantly outperforms current overtime planning practices in 100% of the experiments (with large effect size).
BibTeX:
@article{SarroFHMR,
  author = {Federica Sarro and Filomena Ferrucci and Mark Harman and Alessandra Manna and Jian Ren},
  title = {Adaptive Multi-objective Evolutionary Algorithms for Overtime Planning in Software Projects},
  journal = {IEEE Transactions on Software Engineering},
  year = {To appear},
  doi = {http://dx.doi.org/10.1109/TSE.2017.2650914}
}
Tang, K., Wang, J., Li, X. and Yao, X. A Scalable Approach to Capacitated Arc Routing Problems Based on Hierarchical Decomposition To appear IEEE Transactions on Cybernetics  article DOI  
Abstract: The capacitated arc routing problem (CARP) is a challenging optimization problem with lots of applications in the real world. Numerous approaches have been proposed to tackle this problem. Most of these methods, albeit showing good performance on CARP instances of small and median sizes, do not scale well to large-scale CARPs, e.g., taking at least a few hours to achieve a satisfactory solution on a CARP instance with thousands of tasks. In this paper, an efficient and scalable approach is proposed for CARPs. The key idea of the proposed approach is to hierarchically decompose the tasks involved in a CARP instance into subgroups and solve the induced subproblems recursively. The output of the subproblems at the lower layer in the hierarchy is treated as virtual tasks and new subproblems are formulated based on these virtual tasks using clustering techniques. By this means, the number of tasks (or virtual tasks) decreases rapidly from the bottom to the top layers of the hierarchy, and the sizes of all subproblems at each layer can be kept tractable even for very large-scale CARPs. Empirical studies are conducted on CARP instances with up to 3584 tasks, which are an order of magnitude larger than the number of tasks involved in all CARP instances investigated in the literature. The results show that the proposed approach significantly outperforms existing methods in terms of scalability. Since the proposed hierarchical decomposition scheme is designed to obtain a good permutation of tasks in a CARP instance, it may also be generalized to other hard optimization problems that can be formulated as permutation-based optimization problems.
BibTeX:
@article{TangWLY,
  author = {Ke Tang and Juan Wang and Xiaodong Li and Xin Yao},
  title = {A Scalable Approach to Capacitated Arc Routing Problems Based on Hierarchical Decomposition},
  journal = {IEEE Transactions on Cybernetics},
  year = {To appear},
  doi = {http://dx.doi.org/10.1109/TCYB.2016.2590558}
}
Yang, P., Tang, K. and Yao, X. Turning High-dimensional Optimization into Computationally Expensive Optimization To appear IEEE Transactions on Evolutionary Computation  article DOI  
Abstract: Divide-and-Conquer (DC) is conceptually well suited to deal with high-dimensional optimization problems by decomposing the original problem into multiple low-dimensional sub-problems, and tackling them separately. Nevertheless, the dimensionality mismatch between the original problem and the sub-problems makes it non-trivial to precisely assess the quality of a candidate solution to a sub-problem, which has been a major hurdle for applying the idea of DC to non-separable high-dimensional optimization problems. In this paper, we suggest that searching for a good solution to a sub-problem can be viewed as a computationally expensive problem and can be addressed with the aid of meta-models. As a result, a novel approach, namely Self-Evaluation Evolution (SEE), is proposed. Empirical studies show that the advantages of SEE over four representative algorithms increase with the problem size on the CEC2010 large scale global optimization benchmark. The weakness of SEE is also analysed in the empirical studies.
BibTeX:
@article{YangTY,
  author = {Peng Yang and Ke Tang and Xin Yao},
  title = {Turning High-dimensional Optimization into Computationally Expensive Optimization},
  journal = {IEEE Transactions on Evolutionary Computation},
  year = {To appear},
  doi = {http://dx.doi.org/10.1109/TEVC.2017.2672689}
}
Bazargani, M. and Lobo, F.G. Parameter-less Late Acceptance Hill-Climbing 2017 Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '17), pp. 219-226  inproceedings DOI  
Abstract: The Late Acceptance Hill-Climbing (LAHC) algorithm has been recently introduced by Burke and Bykov. It is a simple, general purpose, one-point search metaheuristic that has similarities with Simulated Annealing (SA) in the sense that worsening moves on a current solution can be accepted. One of its advantages relative to Simulated Annealing is that no cooling schedule is required and its sole parameter, the so-called history length, has a more meaningful interpretation from the application point of view and is therefore easier to specify by a user. In this paper we show that even this single parameter can be eliminated, making LAHC simpler to apply in practice. The validity of the method is shown with computational experiments on a number of instances of the Travelling Salesman Problem.
BibTeX:
@inproceedings{BazarganiL17,
  author = {Mosab Bazargani and Fernando G. Lobo},
  title = {Parameter-less Late Acceptance Hill-Climbing},
  booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '17)},
  publisher = {ACM},
  year = {2017},
  pages = {219-226},
  doi = {http://dx.doi.org/10.1145/3071178.3071225}
}
Benlic, U., Burke, E. and Woodward, J.R. Breakout Local Search for the Multi-objective Gate Allocation Problem 2017 Computers & Operations Research
Vol. 78 
article DOI  
Abstract: The problem of assigning gates to arriving and departing flights is one of the most important problems in airport operations. We take into account the real multi-criteria nature of the problem by optimizing a total of nine gate allocation objectives that are oriented both on convenience for airport/airline services and passenger comfort. As far as we are aware, this is the largest number of objectives jointly optimized in the GAP literature. Given the complexity of the considered problem, we propose a heuristic approach based on the Breakout Local Search (BLS) framework. BLS is a recent variant of the Iterated Local Search (ILS) with a particular focus on the perturbation strategy. Based on some relevant information on search history, it tries to introduce an appropriate degree of diversification by determining adaptively the number and type of moves for the next perturbation phase. Moreover, we use a new memory-based greedy constructive heuristic to generate a starting point for BLS. Benchmark instances used for our experiments and comparisons are based on information provided by Manchester Airport.
BibTeX:
@article{BenlicBW17,
  author = {Una Benlic and Edmund Burke and John R. Woodward},
  title = {Breakout Local Search for the Multi-objective Gate Allocation Problem},
  journal = {Computers & Operations Research},
  year = {2017},
  volume = {78},
  doi = {http://dx.doi.org/10.1016/j.cor.2016.08.010}
}
Benlic, U., Epitropakis, M.G. and Burke, E.K. A Hybrid Breakout Local Search and Reinforcement Learning Approach to the Vertex Separator Problem 2017 European Journal of Operational Research
Vol. 261(3), pp. 803-818 
article DOI  
Abstract: The Vertex Separator Problem (VSP) is an NP-hard problem which emerges from a variety of important domains and applications. In this paper, we present an improved Breakout Local Search for VSP (named BLS-RLE). The distinguishing feature of this approach is a new parameter control mechanism that draws upon ideas from reinforcement learning theory to reach an interdependent decision on the number and on the type of perturbation moves. The mechanism complies with the principle of first carrying out intensification and then employing minimal diversification only if needed, and it uses a dedicated sampling strategy for a rapid convergence towards a limited set of parameter values that appear to be the most convenient for the given state of search. Extensive experimental evaluations and statistical comparisons on a wide range of benchmark instances show significant improvement in performance of the proposed algorithm over the existing BLS algorithm for VSP. Indeed, out of the 422 tested instances, BLS-RLE was able to attain the best-known solution in 93.8% of the cases, which is around 20% higher compared to the existing BLS. In addition, we provide detailed analyses to evaluate the importance of the key elements of the proposed method and to justify the degree of diversification introduced during perturbation.
BibTeX:
@article{BenlicEB17,
  author = {Una Benlic and Michael G. Epitropakis and Edmund K. Burke},
  title = {A Hybrid Breakout Local Search and Reinforcement Learning Approach to the Vertex Separator Problem},
  journal = {European Journal of Operational Research},
  year = {2017},
  volume = {261},
  number = {3},
  pages = {803-818},
  doi = {http://dx.doi.org/10.1016/j.ejor.2017.01.023}
}
Burke, E.K. and Bykov, Y. The Late Acceptance Hill-Climbing Heuristic 2017 European Journal of Operational Research
Vol. 258(1), pp. 70-78 
article DOI  
Abstract: This paper introduces a new and very simple search methodology called Late Acceptance Hill-Climbing (LAHC). It is a local search algorithm, which accepts non-improving moves when a candidate cost function is better than it was a number of iterations before. This number appears as a single algorithmic input parameter which determines the total processing time of the search procedure. The properties of this method are experimentally studied in this paper with a range of Travelling Salesman and Exam Timetabling benchmark problems. Also, we present a fair comparison of LAHC with well-known search techniques, which employ a cooling schedule: Simulated Annealing (SA), Threshold Accepting (TA) and the Great Deluge Algorithm (GDA). In addition, we discuss the success of the method in winning an international competition to automatically solve the Magic Square problem. Our experiments have shown that the LAHC approach is simple, easy to implement and yet is an effective search procedure. For most of the studied instances (especially for the large sized ones), its average performance is better than competitor methods. Moreover, the LAHC approach has an additional advantage (in contrast to the above cooling schedule based methods) in its scale independence. We present an example where the rescaling of a cost function in the Travelling Salesman Problem dramatically deteriorates the performance of three cooling schedule based techniques, but has absolutely no influence upon the performance of LAHC.
BibTeX:
@article{BurkeB17,
  author = {Edmund K. Burke and Yuri Bykov},
  title = {The Late Acceptance Hill-Climbing Heuristic},
  journal = {European Journal of Operational Research},
  year = {2017},
  volume = {258},
  number = {1},
  pages = {70-78},
  doi = {http://dx.doi.org/10.1016/j.ejor.2016.07.012}
}
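Illustrative note: the LAHC acceptance rule described in the abstract above fits in a few lines. The sketch below applies it to a generic minimisation problem: a candidate is accepted if it is no worse than the current solution or no worse than the cost recorded a fixed number of iterations earlier. The history length, iteration budget and the toy neighbour function are illustrative placeholders, not values from the paper.
import random

def lahc(initial, cost, neighbour, history_length=50, max_iters=100000):
    # Late Acceptance Hill-Climbing for minimisation (sketch).
    current, current_cost = initial, cost(initial)
    history = [current_cost] * history_length   # costs recorded over the last Lfa iterations
    best, best_cost = current, current_cost
    for it in range(max_iters):
        candidate = neighbour(current)
        candidate_cost = cost(candidate)
        v = it % history_length
        # Late acceptance: compare against the cost recorded history_length iterations ago.
        if candidate_cost <= history[v] or candidate_cost <= current_cost:
            current, current_cost = candidate, candidate_cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
        history[v] = current_cost
    return best, best_cost

# Toy usage: minimise the number of ones in a bit string by single bit flips.
def flip_one(bits):
    i = random.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

best, best_cost = lahc([random.randint(0, 1) for _ in range(100)], sum, flip_one)
print(best_cost)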
Chen, T. and Bahsoon, R. Self-Adaptive and Online QoS Modeling for Cloud-Based Software Services 2017 IEEE Transactions on Software Engineering
Vol. 43(5), pp. 453-475 
article DOI  
Abstract: In the presence of scale, dynamism, uncertainty and elasticity, cloud software engineers face several challenges when modeling Quality of Service (QoS) for cloud-based software services. These challenges can be best managed through self-adaptivity because engineers' intervention is difficult, if not impossible, given the dynamic and uncertain QoS sensitivity to the environment and control knobs in the cloud. This is especially true for the shared infrastructure of the cloud, where unexpected interference can be caused by co-located software services running on the same virtual machine, and by co-hosted virtual machines within the same physical machine. In this paper, we describe the related challenges and present a fully dynamic, self-adaptive and online QoS modeling approach, grounded in information theory and machine learning algorithms, to create a QoS model that is capable of predicting the QoS value as output over time using information on environmental conditions, control knobs and interference as inputs. In particular, we report an in-depth analysis of the correlations of selected inputs to the accuracy of the QoS model in the cloud. To dynamically select inputs to the models at runtime and tune accuracy, we design self-adaptive hybrid dual-learners that partition the possible input space into two sub-spaces, each of which applies a different symmetric-uncertainty-based selection technique; the results of the sub-spaces are then combined. Subsequently, we propose the use of adaptive multi-learners for building the model. These learners simultaneously allow several learning algorithms to model the QoS function, permitting the best model for prediction to be selected dynamically on the fly. We experimentally evaluate our models in the cloud environment using the RUBiS benchmark and the realistic FIFA 98 workload. The results show that our approach is more accurate and effective than state-of-the-art modeling approaches.
BibTeX:
@article{ChenB17,
  author = {Tao Chen and Rami Bahsoon},
  title = {Self-Adaptive and Online QoS Modeling for Cloud-Based Software Services},
  journal = {IEEE Transactions on Software Engineering},
  year = {2017},
  volume = {43},
  number = {5},
  pages = {453-475},
  doi = {http://dx.doi.org/10.1109/TSE.2016.2608826}
}
Drake, J.H., Swan, J., Neumann, G. and Özcan, E. Sparse, Continuous Policy Representations for Uniform Online Bin Packing via Regression of Interpolants 2017 Proceedings of the 17th European Conference on Evolutionary Computation in Combinatorial Optimisation (EvoCOP '17), pp. 189-200  inproceedings DOI  
Abstract: Online bin packing is a classic optimisation problem, widely tackled by heuristic methods. In addition to human-designed heuristic packing policies (e.g. first- or best- fit), there has been interest over the last decade in the automatic generation of policies. One of the main limitations of some previously-used policy representations is the trade-off between locality and granularity in the associated search space. In this article, we adopt an interpolation-based representation which has the jointly-desirable properties of being sparse and continuous (i.e. exhibits good genotype-to-phenotype locality). In contrast to previous approaches, the policy space is searchable via real-valued optimization methods. Packing policies using five different interpolation methods are comprehensively compared against a range of existing methods from the literature, and it is determined that the proposed method scales to larger instances than those in the literature.
BibTeX:
@inproceedings{DrakeSNO17,
  author = {John H. Drake and Jerry Swan and Geoff Neumann and Ender Özcan},
  title = {Sparse, Continuous Policy Representations for Uniform Online Bin Packing via Regression of Interpolants},
  booktitle = {Proceedings of the 17th European Conference on Evolutionary Computation in Combinatorial Optimisation (EvoCOP '17)},
  publisher = {Springer},
  year = {2017},
  pages = {189-200},
  doi = {http://dx.doi.org/10.1007/978-3-319-55453-2_13}
}
Epitropakis, M.G. and Burke, E.K. Hyper-Heuristics 2017 Handbook of Heuristics, pp. in print  incollection  
BibTeX:
@incollection{EpitropakisB17,
  author = {Michael G. Epitropakis and Edmund K. Burke},
  title = {Hyper-Heuristics},
  booktitle = {Handbook of Heuristics},
  publisher = {Springer},
  year = {2017},
  pages = {in print},
  note = {Invited Chapter}
}
Gretsista, A. and Burke, E.K. An Iterated Local Search Framework with Adaptive Operator Selection for Nurse Rostering 2017 Proceedings of the 11th Learning and Intelligent Optimization Conference (LION 11)  inproceedings  
BibTeX:
@inproceedings{GretsistaB17,
  author = {Angeliki Gretsista and Edmund K. Burke},
  title = {An Iterated Local Search Framework with Adaptive Operator Selection for Nurse Rostering},
  booktitle = {Proceedings of the 11th Learning and Intelligent Optimization Conference (LION 11)},
  year = {2017}
}
Langdon, W.B. and Davidson, B. BarraCUDA in the Cloud 2017 IEEE Software blog  misc URL 
BibTeX:
@misc{LangdonD17,
  author = {William B. Langdon and Bob Davidson},
  title = {BarraCUDA in the Cloud},
  year = {2017},
  url = {http://www0.cs.ucl.ac.uk/staff/w.langdon/ftp/papers/langdon_2017_ieeeblog.pdf}
}
Langdon, W.B., Lam, B.Y.H., Modat, M., Petke, J. and Harman, M. Genetic Improvement of GPU Software 2017 Genetic Programming and Evolvable Machines
Vol. 18(1), pp. 5-44 
article DOI  
Abstract: We survey genetic improvement (GI) of general purpose computing on graphics cards. We summarise several experiments which demonstrate four themes. Experiments with the gzip program show that genetic programming can automatically port sequential C code to parallel code. Experiments with the StereoCamera program show that GI can upgrade legacy parallel code for new hardware and software. Experiments with NiftyReg and BarraCUDA show that GI can make substantial improvements to current parallel CUDA applications. Finally, experiments with the pknotsRG program show that with semi-automated approaches, enormous speed ups can sometimes be had by growing and grafting new code with genetic programming in combination with human input.
BibTeX:
@article{LangdonLMPH17,
  author = {William B. Langdon and Brian Yee Hong Lam and Marc Modat and Justyna Petke and Mark Harman},
  title = {Genetic Improvement of GPU Software},
  journal = {Genetic Programming and Evolvable Machines},
  year = {2017},
  volume = {18},
  number = {1},
  pages = {5-44},
  doi = {http://dx.doi.org/10.1007/s10710-016-9273-9}
}
Langdon, W.B., Veerapen, N. and Ochoa, G. Visualising the Search Landscape of the Triangle Program 2017 Proceedings of the 20th European Conference on Genetic Programming (EuroGP '17), pp. 96-113  inproceedings DOI  
Abstract: High order mutation analysis of a software engineering benchmark, including schema and local optima networks, suggests program improvements may not be as hard to find as is often assumed. (1) Bit-wise genetic building blocks are not deceptive and can lead to all global optima. (2) There are many neutral networks, plateaux and local optima, nevertheless in most cases near the human written C source code there are hill climbing routes including neutral moves to solutions.
BibTeX:
@inproceedings{LangdonVO17,
  author = {William B. Langdon and Nadarajen Veerapen and Gabriela Ochoa},
  title = {Visualising the Search Landscape of the Triangle Program},
  booktitle = {Proceedings of the 20th European Conference on Genetic Programming (EuroGP '17)},
  publisher = {Springer},
  year = {2017},
  pages = {96-113},
  doi = {http://dx.doi.org/10.1007/978-3-319-55696-3_7}
}
Langdon, W.B., Yoo, S. and Harman, M. Inferring Automatic Test Oracles 2017 Proceedings of the 10th International Workshop on Search-Based Software Testing (SBST '17), pp. 5-6  inproceedings DOI  
Abstract: We propose the use of search based learning from existing open source test suites to automatically generate partially correct test oracles. We argue that mutation testing and n-version computing (augmented by deep learning and other soft computing techniques) will be able to predict whether a program's output is correct sufficiently accurately to be useful.
BibTeX:
@inproceedings{LangdonYH17,
  author = {William B. Langdon and Shin Yoo and Mark Harman},
  title = {Inferring Automatic Test Oracles},
  booktitle = {Proceedings of the 10th International Workshop on Search-Based Software Testing (SBST '17)},
  publisher = {IEEE},
  year = {2017},
  pages = {5-6},
  doi = {http://dx.doi.org/10.1109/SBST.2017.1}
}
Li, K., Deb, K., Altinoz, T. and Yao, X. Empirical Investigations of Reference Point Based Methods When Facing a Massively Large Number of Objectives: First Results 2017 Proceedings of the 9th International Conference on Evolutionary Multi-Criterion Optimization (EMO '17), pp. 390-405  inproceedings DOI  
Abstract: Multi-objective optimization with more than three objectives has become one of the most active topics in evolutionary multi-objective optimization (EMO). However, most existing studies limit their experiments up to 15 or 20 objectives, although they claimed to be capable of handling as many objectives as possible. To broaden the insights in the behavior of EMO methods when facing a massively large number of objectives, this paper presents some preliminary empirical investigations on several established scalable benchmark problems with 25, 50, 75 and 100 objectives. In particular, this paper focuses on the behavior of the currently pervasive reference point based EMO methods, although other methods can also be used. The experimental results demonstrate that the reference point based EMO method can be viable for problems with a massively large number of objectives, given an appropriate choice of the distance measure. In addition, sufficient population diversity should be given on each weight vector or a local niche, in order to provide enough selection pressure. To the best of our knowledge, this is the first time an EMO methodology has been considered to solve a massively large number of conflicting objectives.
BibTeX:
@inproceedings{LiDAY17,
  author = {Ke Li and Kalyanmoy Deb and Tolga Altinoz and Xin Yao},
  title = {Empirical Investigations of Reference Point Based Methods When Facing a Massively Large Number of Objectives: First Results},
  booktitle = {Proceedings of the 9th International Conference on Evolutionary Multi-Criterion Optimization (EMO '17)},
  publisher = {Springer},
  year = {2017},
  pages = {390-405},
  doi = {http://dx.doi.org/10.1007/978-3-319-54157-0_27}
}
Li, L., Harman, M., Wu, F. and Zhang, Y. The Value of Exact Analysis in Requirements Selection 2017 IEEE Transactions on Software Engineering
Vol. 43(6), pp. 580-596 
article DOI  
Abstract: Uncertainty is characterised by incomplete understanding. It is inevitable in the early phase of requirements engineering, and can lead to unsound requirement decisions. Inappropriate requirement choices may result in products that fail to satisfy stakeholders' needs, and might cause loss of revenue. To overcome uncertainty, requirements engineering decision support needs uncertainty management. In this research, we develop a decision support framework METRO for the Next Release Problem (NRP) to manage algorithmic uncertainty and requirements uncertainty. An exact NRP solver (NSGDP) lies at the heart of METRO. NSGDP's exactness eliminates interference caused by approximate existing NRP solvers. We apply NSGDP to three NRP instances, derived from a real world NRP instance, RALIC, and compare with NSGA-II, a widely-used approximate (inexact) technique. We find the randomness of NSGA-II results in decision makers missing up to 99.95 percent of the optimal solutions and obtaining up to 36.48 percent inexact requirement selection decisions. The chance of getting an inexact decision using existing approximate approaches is negatively correlated with the implementation cost of a requirement (Spearman r up to -0.72). Compared to the inexact existing approach, NSGDP saves 15.21 percent lost revenue, on average, for the RALIC dataset.
BibTeX:
@article{LiHWZ17,
  author = {Lingbo Li and Mark Harman and Fan Wu and Yuanyuan Zhang},
  title = {The Value of Exact Analysis in Requirements Selection},
  journal = {IEEE Transactions on Software Engineering},
  year = {2017},
  volume = {43},
  number = {6},
  pages = {580-596},
  doi = {http://dx.doi.org/10.1109/TSE.2016.2615100}
}
Li, J. and Kendall, G. A Hyperheuristic Methodology to Generate Adaptive Strategies for Games 2017 IEEE Transactions on Computational Intelligence and AI in Games
Vol. 9(1), pp. 1-10 
article DOI  
Abstract: Hyperheuristics have been successfully applied in solving a variety of computational search problems. In this paper, we investigate a hyperheuristic methodology to generate adaptive strategies for games. Based on a set of low-level heuristics (or strategies), a hyperheuristic game player can generate strategies which adapt to both the behavior of the co-players and the game dynamics. By using a simple heuristic selection mechanism, a number of existing heuristics for specialized games can be integrated into an automated game player. As examples, we develop hyperheuristic game players for three games: iterated prisoner's dilemma, repeated Goofspiel and the competitive traveling salesmen problem. The results demonstrate that a hyperheuristic game player outperforms the low-level heuristics, when used individually in game playing and it can generate adaptive strategies even if the low-level heuristics are deterministic. This methodology provides an efficient way to develop new strategies for games based on existing strategies.
BibTeX:
@article{LiK17,
  author = {Jiawei Li and Graham Kendall},
  title = {A Hyperheuristic Methodology to Generate Adaptive Strategies for Games},
  journal = {IEEE Transactions on Computational Intelligence and AI in Games},
  year = {2017},
  volume = {9},
  number = {1},
  pages = {1-10},
  doi = {http://dx.doi.org/10.1109/TCIAIG.2015.2394780}
}
Li, W., Özcan, E., John, R., Drake, J.H., Neumann, A. and Wagner, M. A Modified Indicator-based Evolutionary Algorithm (mIBEA) 2017 Proceedings of the IEEE Congress on Evolutionary Computation (CEC '17)  inproceedings  
Abstract: Multi-objective evolutionary algorithms (MOEAs) based on the concept of Pareto-dominance have been successfully applied to many real-world optimisation problems. Recently, research interest has shifted towards indicator-based methods to guide the search process towards a good set of trade-off solutions. One commonly used approach of this nature is the indicator-based evolutionary algorithm (IBEA). In this study, we highlight the solution distribution issues within IBEA and propose a modification of the original approach by embedding an additional Pareto-dominance based component for selection. The improved performance of the proposed modified IBEA (mIBEA) is empirically demonstrated on the well-known DTLZ set of benchmark functions. Our results show that mIBEA achieves comparable or better hypervolume indicator values and epsilon approximation values in the vast majority of our cases (13 out of 14 under the same default settings) on DTLZ1-7. The modification also results in an over 8-fold speed-up for larger populations.
BibTeX:
@inproceedings{LiOJDNW17,
  author = {Wenwen Li and Ender Özcan and Robert John and John H. Drake and Aneta Neumann and Markus Wagner},
  title = {A Modified Indicator-based Evolutionary Algorithm (mIBEA)},
  booktitle = {Proceedings of the IEEE Congress on Evolutionary Computation (CEC '17)},
  year = {2017}
}
Mao, K., Capra, L., Harman, M. and Jia, Y. A Survey of the use of Crowdsourcing in Software Engineering 2017 Journal of Systems and Software
Vol. 126, pp. 57-84 
article DOI  
Abstract: The term ‘crowdsourcing’ was initially introduced in 2006 to describe an emerging distributed problem-solving model by online workers. Since then it has been widely studied and practiced to support software engineering. In this paper we provide a comprehensive survey of the use of crowdsourcing in software engineering, seeking to cover all literature on this topic. We first review the definitions of crowdsourcing and derive our definition of Crowdsourcing Software Engineering together with its taxonomy. Then we summarise industrial crowdsourcing practice in software engineering and corresponding case studies. We further analyse the software engineering domains, tasks and applications for crowdsourcing and the platforms and stakeholders involved in realising Crowdsourced Software Engineering solutions. We conclude by exposing trends, open issues and opportunities for future research on Crowdsourced Software Engineering.
BibTeX:
@article{MaoCHJ17,
  author = {Ke Mao and Licia Capra and Mark Harman and Yue Jia},
  title = {A Survey of the use of Crowdsourcing in Software Engineering},
  journal = {Journal of Systems and Software},
  year = {2017},
  volume = {126},
  pages = {57-84},
  doi = {http://dx.doi.org/10.1016/j.jss.2016.09.015}
}
Minku, L.L. and Yao, X. Which models of the past are relevant to the present? A software effort estimation approach to exploiting useful past models 2017 Automated Software Engineering
Vol. 24(3), pp. 499-542 
article DOI  
Abstract: Software Effort Estimation (SEE) models can be used for decision-support by software managers to determine the effort required to develop a software project. They are created based on data describing projects completed in the past. Such data could include past projects from within the company that we are interested in (WC projects) and/or from other companies (cross-company, i.e., CC projects). In particular, the use of CC data has been investigated in an attempt to overcome limitations caused by the typically small size of WC datasets. However, software companies operate in non-stationary environments, where changes may affect the typical effort required to develop software projects. Our previous work showed that both WC and CC models of the past can become more or less useful over time, i.e., they can sometimes be helpful and sometimes misleading. So, how can we know if and when a model created based on past data represents well the current projects being estimated? We propose an approach called Dynamic Cross-company Learning (DCL) to dynamically identify which WC or CC past models are most useful for making predictions to a given company at the present. DCL automatically emphasizes the predictions given by these models in order to improve predictive performance. Our experiments comparing DCL against existing WC and CC approaches show that DCL is successful in improving SEE by emphasizing the most useful past models. A thorough analysis of DCL’s behaviour is provided, strengthening its external validity.
BibTeX:
@article{MinkuY17,
  author = {Leandro L. Minku and Xin Yao},
  title = {Which models of the past are relevant to the present? A software effort estimation approach to exploiting useful past models},
  journal = {Automated Software Engineering},
  year = {2017},
  volume = {24},
  number = {3},
  pages = {499-542},
  doi = {http://dx.doi.org/10.1007/s10515-016-0209-7}
}
Shen, Y., Li, J. and Peng, K. An Estimation of Distribution Algorithm for Public Transport Driver Scheduling 2017 International Journal of Operational Research
Vol. 28(2), pp. 245-262 
article DOI  
Abstract: Public transport driver scheduling is a process of selecting a set of duties for the drivers of vehicles to form a number of legal driver shifts. The problem usually has two objectives, which are minimising both the total number of shifts and the total shift cost, while taking into account some constraints related to labour and company rules. A commonly used approach is firstly to generate a large set of feasible shifts by domain-specific heuristics, and then to select a subset to form the final schedule by an integer programming method. This paper presents an estimation of distribution algorithm (EDA) to deal with the subset selection problem, which is NP-hard. To obtain candidate schedules, the EDA applies a number of rules, with each rule corresponding to a particular way of selecting a shift. Computational results from some real-world instances of driver scheduling demonstrate the viability of this approach.
BibTeX:
@article{ShenLP17,
  author = {Yindong Shen and Jingpeng Li and Kunkun Peng},
  title = {An Estimation of Distribution Algorithm for Public Transport Driver Scheduling},
  journal = {International Journal of Operational Research},
  year = {2017},
  volume = {28},
  number = {2},
  pages = {245-262},
  doi = {http://dx.doi.org/10.1504/IJOR.2017.10002083}
}
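Illustrative note: a hedged sketch of the general estimation-of-distribution loop behind the approach described above, applied to an abstract subset-selection task: the probability that each shift is selected is re-estimated from the best schedules of the current generation and then sampled to produce new candidates. The shift data, feasibility handling and rule set used by the authors are not reproduced; all numbers are assumptions.
import random

def eda_subset_selection(n_shifts, evaluate, pop_size=100, elite_frac=0.3, generations=50):
    # evaluate(selection) -> cost to minimise; selection is a 0/1 list over candidate shifts.
    probs = [0.5] * n_shifts                          # marginal selection probabilities
    best, best_cost = None, float("inf")
    for _ in range(generations):
        pop = [[1 if random.random() < p else 0 for p in probs] for _ in range(pop_size)]
        pop.sort(key=evaluate)
        elite = pop[:max(1, int(elite_frac * pop_size))]
        if evaluate(pop[0]) < best_cost:
            best, best_cost = pop[0], evaluate(pop[0])
        # Re-estimate the distribution from the elite schedules.
        probs = [sum(ind[i] for ind in elite) / len(elite) for i in range(n_shifts)]
        probs = [min(max(p, 0.05), 0.95) for p in probs]   # keep some exploration
    return best, best_cost

# Toy usage: prefer schedules that select roughly a quarter of 40 dummy shifts.
cost = lambda sel: abs(sum(sel) - 10)
print(eda_subset_selection(40, cost))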
Soria-Alcaraz, J.A., Ochoa, G., Sotelo-Figeroa, M.A. and Burke, E.K. A Methodology for Determining an Effective Subset of Heuristics in Selection Hyper-heuristics 2017 European Journal of Operational Research
Vol. 260(3), pp. 972-983 
article DOI  
Abstract: We address the important step of determining an effective subset of heuristics in selection hyper-heuristics. Little attention has been devoted to this in the literature, and the decision is left at the discretion of the investigator. The performance of a hyper-heuristic depends on the quality and size of the heuristic pool. Using more than one heuristic is generally advantageous; however, an unnecessarily large pool can decrease the performance of adaptive approaches. Our goal is to bring methodological rigour to this step. The proposed methodology uses non-parametric statistics and fitness landscape measurements from an available set of heuristics and benchmark instances, in order to produce a compact subset of effective heuristics for the underlying problem. We also propose a new iterated local search hyper-heuristic using multi-armed bandits coupled with a change detection mechanism. The methodology is tested on two real-world optimization problems: course timetabling and vehicle routing. The proposed hyper-heuristic with a compact heuristic pool outperforms state-of-the-art hyper-heuristics and competes with problem-specific methods in course timetabling, even producing new best-known solutions in 5 out of the 24 studied instances.
BibTeX:
@article{Soria-AlcarazOSB17,
  author = {Jorge A. Soria-Alcaraz and Gabriela Ochoa and Marco A. Sotelo-Figeroa and Edmund K. Burke},
  title = {A Methodology for Determining an Effective Subset of Heuristics in Selection Hyper-heuristics},
  journal = {European Journal of Operational Research},
  year = {2017},
  volume = {260},
  number = {3},
  pages = {972-983},
  doi = {http://dx.doi.org/10.1016/j.ejor.2017.01.042}
}
Wang, H., He, S. and Yao, X. Nadir Point Estimation for Many-objective Optimization Problems based on Emphasized Critical Regions 2017 Soft Computing
Vol. 21(9), pp. 2283-2295 
article DOI  
Abstract: Nadir points play an important role in many-objective optimization problems, which describe the ranges of their Pareto fronts. Using nadir points as references, decision makers may obtain their preference information for many-objective optimization problems. As the number of objectives increases, nadir point estimation becomes a more difficult task. In this paper, we propose a novel nadir point estimation method based on emphasized critical regions for many-objective optimization problems. It maintains the non-dominated solutions near extreme points and critical regions after an individual number assignment to different critical regions. Furthermore, it eliminates similar individuals by a novel self-adaptive ε-clearing strategy. Our approach has been shown to perform better on many-objective optimization problems (between 10 objectives and 35 objectives) than two other state-of-the-art nadir point estimation approaches.
BibTeX:
@article{WangHY17,
  author = {Handing Wang and Shan He and Xin Yao},
  title = {Nadir Point Estimation for Many-objective Optimization Problems based on Emphasized Critical Regions},
  journal = {Soft Computing},
  year = {2017},
  volume = {21},
  number = {9},
  pages = {2283-2295},
  doi = {http://dx.doi.org/10.1007/s00500-015-1940-x}
}
Wang, H., Jin, Y. and Yao, X. Diversity Assessment in Many-Objective Optimization 2017 IEEE Transactions on Cybernetics
Vol. 47(6), pp. 1510-1522 
article DOI  
Abstract: Maintaining diversity is one important aim of multiobjective optimization. However, diversity for many-objective optimization problems is less straightforward to define than for multiobjective optimization problems. Inspired by measures for biodiversity, we propose a new diversity metric for many-objective optimization, which is an accumulation of the dissimilarity in the population, where an Lp-norm-based (p < 1) distance is adopted to measure the dissimilarity of solutions. Empirical results demonstrate our proposed metric can more accurately assess the diversity of solutions in various situations. We compare the diversity of the solutions obtained by four popular many-objective evolutionary algorithms using the proposed diversity metric on a large number of benchmark problems with two to ten objectives. The behaviors of different diversity maintenance methodologies in those algorithms are discussed in depth based on the experimental results. Finally, we show that the proposed diversity measure can also be employed for enhancing diversity maintenance or reference set generation in many-objective optimization.
BibTeX:
@article{WangJY17,
  author = {Handing Wang and Yaochu Jin and Xin Yao},
  title = {Diversity Assessment in Many-Objective Optimization},
  journal = {IEEE Transactions on Cybernetics},
  year = {2017},
  volume = {47},
  number = {6},
  pages = {1510-1522},
  doi = {http://dx.doi.org/10.1109/TCYB.2016.2550502}
}
White, D.R., Joffe, L., Bowles, E. and Swan, J. Deep Parameter Tuning of Concurrent Divide and Conquer Algorithms in Akka 2017 Proceedings of the 20th European Conference on the Applications of Evolutionary Computation (EvoApplications '17), pp. 35-48  inproceedings DOI  
Abstract: Akka is a widely-used high-performance and distributed computing toolkit for fine-grained concurrency, written in Scala for the Java Virtual Machine. Although Akka elegantly simplifies the process of building complex parallel software, many crucial decisions that affect system performance are deferred to the user. Employing the method of Deep Parameter Tuning to extract embedded ‘magic numbers’ from source code, we use the CMA-ES evolutionary computation algorithm to optimise the concurrent implementation of three widely-used divide-and-conquer algorithms within the Akka toolkit: Quicksort, Strassen’s matrix multiplication, and the Fast Fourier Transform.
BibTeX:
@inproceedings{WhiteJBS17,
  author = {David R. White and Leonid Joffe and Edward Bowles and Jerry Swan},
  title = {Deep Parameter Tuning of Concurrent Divide and Conquer Algorithms in Akka},
  booktitle = {Proceedings of the 20th European Conference on the Applications of Evolutionary Computation (EvoApplications '17)},
  publisher = {Springer},
  year = {2017},
  pages = {35-48},
  doi = {http://dx.doi.org/10.1007/978-3-319-55792-2_3}
}
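Illustrative note: a hedged sketch of the deep parameter tuning setup described above, in which exposed "magic numbers" become a real-valued vector that CMA-ES optimises against a measured cost. It assumes the third-party cma package (pip install cma) and a hypothetical run_with_parameters harness with made-up weights; it is not the authors' Akka/Scala pipeline.
import time
import cma  # third-party CMA-ES package (assumed available)

def run_with_parameters(params):
    # Hypothetical harness: write the exposed magic numbers back into the target
    # program, run its benchmark, and return (elapsed_seconds, output_error).
    start = time.time()
    error = sum((p - 3.0) ** 2 for p in params)   # stand-in for a real measurement
    return time.time() - start, error

def tuning_objective(params):
    elapsed, error = run_with_parameters(params)
    return elapsed + 10.0 * error                 # single-objective trade-off (assumed weights)

# Start from the values currently hard-coded in the source, with initial step size 1.0.
es = cma.CMAEvolutionStrategy([1.0, 1.0, 1.0], 1.0)
while not es.stop():
    candidates = es.ask()
    es.tell(candidates, [tuning_objective(c) for c in candidates])
print(es.result.xbest)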
Wu, F., Nanavati, J., Harman, M., Jia, Y. and Krinke, J. Memory Mutation Testing 2017 Information and Software Technology
Vol. 81, pp. 97-111 
article DOI  
Abstract: Context: Three decades of mutation testing development have given software testers a rich set of mutation operators, yet relatively few operators can target memory faults (as we demonstrate in this paper). Objective: To address this shortcoming, we introduce Memory Mutation Testing, proposing 9 Memory Mutation Operators, each of which targets common forms of memory fault. We compare Memory Mutation Operators with traditional Mutation Operators, while handling equivalent and duplicate mutants. Method: We extend our previous workshop paper, which introduced Memory Mutation Testing, with a more extensive and precise analysis of 18 open source programs, including 2 large real-world programs, all of which come with well-designed unit test suites. Specifically, our empirical study makes use of recent results on Trivial Compiler Equivalence (TCE) to identify both equivalent and duplicate mutants. Though the literature on mutation testing has previously deployed various techniques to cater for equivalent mutants, no previous study has catered for duplicate mutants. Results: Catering for such extraneous mutants improves the precision with which claims about mutation scores can be interpreted. We also report the results of a new empirical study that compares Memory Mutation Testing with traditional Mutation Testing, providing evidence to support the claim that traditional mutation testing inadequately captures memory faults; 2% of the memory mutants are TCE-duplicates of traditional mutants and average test suite effectiveness drops by 44% when the target shifts from traditional mutants to memory mutants. Conclusions: Introducing Memory Mutation Operators will cost only a small portion of the overall testing effort, yet generate higher quality mutants compared with traditional operators. Moreover, the TCE technique not only helps to reduce testing effort, but also improves the precision of the assessment of test quality, and therefore should be considered in other Mutation Testing studies.
BibTeX:
@article{WuNHJK17,
  author = {Fan Wu and Jay Nanavati and Mark Harman and Yue Jia and Jens Krinke},
  title = {Memory Mutation Testing},
  journal = {Information and Software Technology},
  year = {2017},
  volume = {81},
  pages = {97-111},
  doi = {http://dx.doi.org/10.1016/j.infsof.2016.03.002}
}
Yang, M., Omidvar, M.N., Li, C., Li, X., Cai, Z., Kazimipour, B. and Yao, X. Efficient Resource Allocation in Cooperative Co-evolution for Large-scale Global Optimization 2017 IEEE Transactions on Evolutionary Computation
Vol. 21(4), pp. 493-505 
article DOI  
Abstract: Cooperative co-evolution (CC) is an explicit means of problem decomposition in multipopulation evolutionary algorithms for solving large-scale optimization problems. For CC, subpopulations representing subcomponents of a large-scale optimization problem co-evolve, and are likely to have different contributions to the improvement of the best overall solution to the problem. Hence, it makes sense that more computational resources should be allocated to the subpopulations with greater contributions. In this paper, we study how to allocate computational resources in this context and subsequently propose a new CC framework named CCFR to efficiently allocate computational resources among the subpopulations according to their dynamic contributions to the improvement of the objective value of the best overall solution. Our experimental results suggest that CCFR can make efficient use of computational resources and is a highly competitive CC framework for solving large-scale optimization problems.
BibTeX:
@article{YangOLLCKY17,
  author = {Ming Yang and Mohammad Nabi Omidvar and Changhe Li and Xiaodong Li and Zhihua Cai and Borhan Kazimipour and Xin Yao},
  title = {Efficient Resource Allocation in Cooperative Co-evolution for Large-scale Global Optimization},
  journal = {IEEE Transactions on Evolutionary Computation},
  year = {2017},
  volume = {21},
  number = {4},
  pages = {493-505},
  doi = {http://dx.doi.org/10.1109/TEVC.2016.2627581}
}
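Illustrative note: a hedged sketch of contribution-based resource allocation in cooperative co-evolution, as described in the abstract above: the decision vector is split into subcomponents and the subcomponent whose recent updates improved the best overall solution the most is optimised next. The decomposition, the underlying optimiser and the bookkeeping details of CCFR are deliberately simplified to a crude random step on a sphere function.
import random

def sphere(x):
    return sum(v * v for v in x)

def cc_with_contribution_allocation(dim=20, group_size=5, cycles=200):
    groups = [list(range(i, i + group_size)) for i in range(0, dim, group_size)]
    best = [random.uniform(-5, 5) for _ in range(dim)]
    contribution = [float("inf")] * len(groups)       # optimistic start: try every group once
    for _ in range(cycles):
        g = contribution.index(max(contribution))     # allocate effort to the biggest contributor
        before = sphere(best)
        # One crude improvement step on the chosen subcomponent (stands in for a subpopulation).
        candidate = best[:]
        for i in groups[g]:
            candidate[i] += random.gauss(0, 0.5)
        if sphere(candidate) < before:
            best = candidate
        contribution[g] = before - sphere(best)        # measured improvement of the overall best
    return best, sphere(best)

print(cc_with_contribution_allocation()[1])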
Adair, J., Brownlee, A. and Ochoa, G. Evolutionary Algorithms with Linkage Information for Feature Selection in Brain Computer Interfaces 2016 Proceedings of the 16th UK Workshop on Computational Intelligence, pp. 287-307  inproceedings DOI  
Abstract: Brain Computer Interfaces are an essential technology for the advancement of prosthetic limbs, but current signal acquisition methods are hindered by a number of factors, not least, noise. In this context, Feature Selection is required to choose the important signal features and improve classifier accuracy. Evolutionary algorithms have proven to outperform filtering methods (in terms of accuracy) for Feature Selection. This paper applies a single-point heuristic search method, Iterated Local Search (ILS), and compares it to a genetic algorithm (GA) and a memetic algorithm (MA). It then further attempts to utilise Linkage between features to guide search operators in the algorithms stated. The GA was found to outperform ILS. Counter-intuitively, linkage-guided algorithms resulted in higher classification error rates than their unguided alternatives. Explanations for this are explored.
BibTeX:
@inproceedings{AdairBO16,
  author = {Jason Adair and Alexander Brownlee and Gabriela Ochoa},
  title = {Evolutionary Algorithms with Linkage Information for Feature Selection in Brain Computer Interfaces},
  booktitle = {Proceedings of the 16th UK Workshop on Computational Intelligence},
  publisher = {Springer},
  year = {2016},
  pages = {287-307},
  doi = {http://dx.doi.org/10.1007/978-3-319-46562-3_19}
}
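Illustrative note: a minimal sketch of the Iterated Local Search scheme applied to feature selection as described above: bit-flip local search over a feature mask, a random multi-flip perturbation, and acceptance of non-worsening restarts. The classifier-accuracy evaluation is a placeholder, and the linkage-guided operators studied in the paper are not shown.
import random

def iterated_local_search(n_features, accuracy, iterations=20, perturb_strength=3):
    # accuracy(mask) -> score of a classifier trained on the features selected by mask.
    def local_search(mask):
        improved = True
        while improved:
            improved = False
            for i in range(n_features):
                candidate = mask[:i] + [1 - mask[i]] + mask[i + 1:]
                if accuracy(candidate) > accuracy(mask):
                    mask, improved = candidate, True
        return mask

    current = local_search([random.randint(0, 1) for _ in range(n_features)])
    for _ in range(iterations):
        perturbed = current[:]
        for i in random.sample(range(n_features), perturb_strength):
            perturbed[i] = 1 - perturbed[i]           # perturbation: flip a few random bits
        candidate = local_search(perturbed)
        if accuracy(candidate) >= accuracy(current):  # accept if not worse
            current = candidate
    return current

# Toy usage: pretend the first ten features are informative and the rest add noise.
toy_accuracy = lambda mask: sum(mask[:10]) - 0.1 * sum(mask[10:])
print(iterated_local_search(30, toy_accuracy))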
Al-Subaihin, A.A., Sarro, F., Capra, L., Harman, M., Jia, Y. and Zhang, Y. Clustering Mobile Apps Based on Mined Textual Features 2016 Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '16)  inproceedings DOI  
Abstract: Context: Categorising software systems according to their functionality yields many benefits to both users and developers. Goal: In order to uncover the latent clustering of mobile apps in app stores, we propose a novel technique that measures app similarity based on claimed behaviour. Method: Features are extracted using information retrieval augmented with ontological analysis and used as attributes to characterise apps. These attributes are then used to cluster the apps using agglomerative hierarchical clustering. We empirically evaluate our approach on 17,877 apps mined from the BlackBerry and Google app stores in 2014. Results: The results show that our approach dramatically improves the existing categorisation quality for both the BlackBerry (from 0.02 to 0.41 on average) and Google (from 0.03 to 0.21 on average) stores. We also find a strong Spearman rank correlation (ρ = 0.96 for Google and ρ = 0.99 for BlackBerry) between the number of apps and the ideal granularity within each category, indicating that ideal granularity increases with category size, as expected. Conclusions: The current categorisations in the app stores studied do not exhibit a good classification quality in terms of the claimed feature space. However, a better quality can be achieved using a good feature extraction technique and a traditional clustering method.
BibTeX:
@inproceedings{Al-SubaihinSCHJZ16,
  author = {Afnan A. Al-Subaihin and Federica Sarro and Licia Capra and Mark Harman and Yue Jia and Yuanyuan Zhang},
  title = {Clustering Mobile Apps Based on Mined Textual Features},
  booktitle = {Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '16)},
  year = {2016},
  doi = {http://dx.doi.org/10.1145/2961111.2962600}
}
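Illustrative note: a hedged sketch of the clustering step described above, using standard scikit-learn components: app descriptions are turned into TF-IDF feature vectors and grouped with agglomerative hierarchical clustering. The ontological feature extraction and the granularity analysis used in the paper are not reproduced; the descriptions and cluster count are toy stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

# Toy stand-ins for mined app descriptions.
descriptions = [
    "track your runs and calories with gps",
    "fitness workout planner with step counter",
    "scan documents to pdf and share them",
    "pdf reader with annotation and export",
]

# Represent each app by TF-IDF weights over the words in its claimed features.
vectors = TfidfVectorizer(stop_words="english").fit_transform(descriptions).toarray()

# Agglomerative hierarchical clustering into a chosen number of groups.
labels = AgglomerativeClustering(n_clusters=2, linkage="average").fit_predict(vectors)
print(labels)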
Bowes, D., Hall, T., Harman, M., Jia, Y., Sarro, F. and Wu, F. Mutation-Aware Fault Prediction 2016 Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA '16), pp. 330-341  inproceedings DOI  
Abstract: We introduce mutation-aware fault prediction, which leverages additional guidance from metrics constructed in terms of mutants and the test cases that cover and detect them. We report the results of 12 sets of experiments, applying 4 different predictive modelling techniques to 3 large real-world systems (both open and closed source). The results show that our proposal can significantly (p ≤ 0.05) improve fault prediction performance. Moreover, mutation-based metrics lie in the top 5% most frequently relied upon fault predictors in 10 of the 12 sets of experiments, and provide the majority of the top ten fault predictors in 9 of the 12 sets of experiments.
BibTeX:
@inproceedings{BowesHHJSW16,
  author = {David Bowes and Tracy Hall and Mark Harman and Yue Jia and Federica Sarro and Fan Wu},
  title = {Mutation-Aware Fault Prediction},
  booktitle = {Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA '16)},
  publisher = {ACM},
  year = {2016},
  pages = {330-341},
  doi = {http://dx.doi.org/10.1145/2931037.2931039}
}
Brownlee, A.E. Mining Markov Network Surrogates for Value-Added Optimisation 2016 Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO'16 Companion), pp. 1267-1274  inproceedings DOI  
Abstract: Surrogate fitness functions are a popular technique for speeding up metaheuristics, replacing calls to a costly fitness function with calls to a cheap model. However, surrogates also represent an explicit model of the fitness function, which can be exploited beyond approximating solution fitness. This paper proposes that mining surrogate fitness models can yield useful additional information on the problem to the decision maker, adding value to the optimisation process. An existing fitness model based on Markov networks is presented and applied to the optimisation of glazing on a building facade. Analysis of the model reveals how its parameters point towards the global optima of the problem after only part of the optimisation run, and reveals useful properties like the relative sensitivities of the problem variables.
BibTeX:
@inproceedings{Brownlee16,
  author = {Alexander E.I. Brownlee},
  title = {Mining Markov Network Surrogates for Value-Added Optimisation},
  booktitle = {Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO'16 Companion)},
  publisher = {ACM},
  year = {2016},
  pages = {1267-1274},
  doi = {http://dx.doi.org/10.1145/2908961.2931711}
}
Bruce, B.R., Aitken, J.M. and Petke, J. Deep Parameter Optimisation for Face Detection Using the Viola-Jones Algorithm in OpenCV 2016 Proceedings of the 8th International Symposium on Search Based Software Engineering (SSBSE '16), pp. 238-243  inproceedings DOI  
Abstract: OpenCV is a commonly used computer vision library containing a wide variety of algorithms for the AI community. This paper uses deep parameter optimisation to investigate improvements to face detection using the Viola-Jones algorithm in OpenCV, allowing a trade-off between execution time and classification accuracy. Our results show that execution time can be decreased by 48 % if a 1.80 % classification inaccuracy is permitted (compared to 1.04 % classification inaccuracy of the original, unmodified algorithm). Further execution time savings are possible depending on the degree of inaccuracy deemed acceptable by the user.
BibTeX:
@inproceedings{BruceAP16,
  author = {Bobby R. Bruce and Jonathan M. Aitken and Justyna Petke},
  title = {Deep Parameter Optimisation for Face Detection Using the Viola-Jones Algorithm in OpenCV},
  booktitle = {Proceedings of the 8th International Symposium on Search Based Software Engineering (SSBSE '16)},
  publisher = {Springer},
  year = {2016},
  pages = {238-243},
  doi = {http://dx.doi.org/10.1007/978-3-319-47106-8_18}
}
Burke, E.K. and Bykov, Y. An Adaptive Flex-Deluge Approach to University Exam Timetabling 2016 INFORMS Journal on Computing
Vol. 28(4), pp. 781-794 
article DOI  
Abstract: This paper presents a new methodology for university exam timetabling problems, which draws upon earlier work on the Great Deluge metaheuristic. The new method introduces a “flexible” acceptance condition. Even a simple variant of this technique (with fixed flexibility) outperforms the original Great Deluge algorithm. Moreover, it enables a run-time adaptation of an acceptance condition for each particular move. We investigate the adaptive mechanism where the algorithm accepts the movement of exams in a way that is dependent upon the difficulty of assigning that exam. The overall motivation is to encourage the exploration of a wider region of the search space. We present an analysis of the results of our tests of this technique on two international collections of benchmark exam timetabling problems. We show that 9 of 16 solutions in the first collection and 11 of 12 solutions in the second collection produced by our technique have a higher level of quality than previously published methodologies.
BibTeX:
@article{BurkeB16,
  author = {Edmund K. Burke and Yuri Bykov},
  title = {An Adaptive Flex-Deluge Approach to University Exam Timetabling},
  journal = {INFORMS Journal on Computing},
  year = {2016},
  volume = {28},
  number = {4},
  pages = {781-794},
  doi = {http://dx.doi.org/10.1287/ijoc.2015.0680}
}
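A minimal Python sketch of a Great Deluge variant with a flexible acceptance level, loosely following the Burke and Bykov entry above. The mixed level-versus-current-cost rule, the linear decay schedule and the fixed coefficient kf are illustrative assumptions, not the paper's exact Flex-Deluge formulation.

import random

def flex_deluge(initial, neighbour, cost, kf=0.3, iterations=20000):
    # Great Deluge with a 'flexible' acceptance level (illustrative variant).
    # kf is an assumed flexibility coefficient in [0, 1]: kf = 0 behaves like
    # plain Great Deluge; larger kf relaxes the level towards the current cost.
    current = best = initial
    c_cur = c_best = cost(initial)
    level = c_cur                      # initial water level
    decay = level / iterations         # linear decay towards 0 (assumed schedule)
    for _ in range(iterations):
        cand = neighbour(current)
        c_cand = cost(cand)
        # flexible acceptance: non-worsening moves, or moves under a mixed level
        if c_cand <= c_cur or c_cand <= (1 - kf) * level + kf * c_cur:
            current, c_cur = cand, c_cand
            if c_cand < c_best:
                best, c_best = cand, c_cand
        level -= decay                 # lower the water level every iteration
    return best, c_best

def flip_one_bit(s):
    t = s[:]
    t[random.randrange(len(t))] ^= 1
    return t

if __name__ == "__main__":
    random.seed(1)
    # toy demo: minimise the number of 1s in a bit vector
    solution, value = flex_deluge([1] * 50, flip_one_bit, sum)
    print(value)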
Consoli, P.A., Mei, Y., Minku, L.L. and Yao, X. Dynamic Selection of Evolutionary Operators based on Online Learning and Fitness Landscape Analysis 2016 Soft Computing
Vol. 20(10), pp. 3889-3914 
article DOI  
Abstract: Self-adaptive mechanisms for the identification of the most suitable variation operator in evolutionary algorithms rely almost exclusively on the measurement of the fitness of the offspring, which may not be sufficient to assess the optimality of an operator (e.g., in a landscape with a high degree of neutrality). This paper proposes a novel adaptive operator selection mechanism which uses a set of four fitness landscape analysis techniques and an online learning algorithm, dynamic weighted majority, to provide more detailed information about the search space to better determine the most suitable crossover operator. Experimental analysis on the capacitated arc routing problem has demonstrated that different crossover operators behave differently during the search process, and selecting the proper one adaptively can lead to more promising results.
BibTeX:
@article{ConsoliMMY16,
  author = {Pietro A. Consoli and Yi Mei and Leandro L. Minku and Xin Yao},
  title = {Dynamic Selection of Evolutionary Operators based on Online Learning and Fitness Landscape Analysis},
  journal = {Soft Computing},
  year = {2016},
  volume = {20},
  number = {10},
  pages = {3889-3914},
  doi = {http://dx.doi.org/10.1007/s00500-016-2126-x}
}
Drake, J.H., Özcan, E. and Burke, E.K. A Case Study of Controlling Crossover in a Selection Hyper-Heuristic Framework using the Multidimensional Knapsack Problem 2016 Evolutionary Computation
Vol. 24(1), pp. 113-141 
article DOI  
Abstract: Hyper-heuristics are high-level methodologies for solving complex problems that operate on a search space of heuristics. In a selection hyper-heuristic framework, a heuristic is chosen from an existing set of low-level heuristics and applied to the current solution to produce a new solution at each point in the search. The use of crossover low-level heuristics is possible in an increasing number of general-purpose hyper-heuristic tools such as HyFlex and Hyperion. However, little work has been undertaken to assess how best to utilise it. Since a single-point search hyper-heuristic operates on a single candidate solution, and two candidate solutions are required for crossover, a mechanism is required to control the choice of the other solution. The frameworks we propose maintain a list of potential solutions for use in crossover. We investigate the use of such lists at two conceptual levels. First, crossover is controlled at the hyper-heuristic level where no problem-specific information is required. Second, it is controlled at the problem domain level where problem-specific information is used to produce good-quality solutions to use in crossover. A number of selection hyper-heuristics are compared using these frameworks over three benchmark libraries with varying properties for an NP-hard optimisation problem: the multidimensional 0-1 knapsack problem. It is shown that allowing crossover to be managed at the domain level outperforms managing crossover at the hyper-heuristic level in this problem domain.
BibTeX:
@article{DrakeOB16,
  author = {John H. Drake and Ender Özcan and Edmund K. Burke},
  title = {A Case Study of Controlling Crossover in a Selection Hyper-Heuristic Framework using the Multidimensional Knapsack Problem},
  journal = {Evolutionary Computation},
  year = {2016},
  volume = {24},
  number = {1},
  pages = {113-141},
  doi = {http://dx.doi.org/10.1162/EVCO_a_00145}
}
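The following Python sketch illustrates the core mechanism discussed in the Drake, Özcan and Burke entry above: a single-point selection hyper-heuristic that maintains a small list of good solutions to supply the second parent whenever a crossover low-level heuristic is applied. The random heuristic selection, the fixed crossover rate and the accept-if-not-worse criterion are assumed simplifications of the frameworks studied in the paper.

import random

def selection_hh_with_crossover_memory(initial, mutation_heuristics, crossover_heuristics,
                                       fitness, memory_size=5, iterations=10000,
                                       p_crossover=0.3):
    # Single-point selection hyper-heuristic (maximisation) keeping a small list
    # of good solutions as potential crossover partners.
    current, f_cur = initial, fitness(initial)
    memory = [initial]                         # candidate second parents
    for _ in range(iterations):
        if crossover_heuristics and random.random() < p_crossover:
            partner = random.choice(memory)
            candidate = random.choice(crossover_heuristics)(current, partner)
        else:
            candidate = random.choice(mutation_heuristics)(current)
        f_cand = fitness(candidate)
        if f_cand >= f_cur:                    # accept non-worsening moves
            current, f_cur = candidate, f_cand
            memory.append(candidate)
            memory = sorted(memory, key=fitness, reverse=True)[:memory_size]
    return current, f_cur

if __name__ == "__main__":
    random.seed(0)
    n = 40
    def flip(s):
        t = s[:]
        t[random.randrange(n)] ^= 1
        return t
    def uniform_crossover(a, b):
        return [x if random.random() < 0.5 else y for x, y in zip(a, b)]
    best, f = selection_hh_with_crossover_memory([0] * n, [flip], [uniform_crossover], sum)
    print(f)   # ONEMAX-style toy objective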
Gargantini, A., Petke, J., Radavelli, M. and Vavassori, P. Validation of Constraints Among Configuration Parameters Using Search-Based Combinatorial Interaction Testing 2016 Proceedings of the 8th International Symposium on Search Based Software Engineering (SSBSE '16), pp. 49-63  inproceedings DOI  
Abstract: The appeal of highly-configurable software systems lies in their adaptability to users’ needs. Search-based Combinatorial Interaction Testing (CIT) techniques have been specifically developed to drive the systematic testing of such highly-configurable systems. In order to apply these, it is paramount to devise a model of parameter configurations which conforms to the software implementation. This is a non-trivial task. Therefore, we extend traditional search-based CIT by devising 4 new testing policies able to check if the model correctly identifies constraints among the various software parameters. Our experiments show that one of our new policies is able to detect faults both in the model and the software implementation that are missed by the standard approaches.
BibTeX:
@inproceedings{GargantiniPRV16,
  author = {Angelo Gargantini and Justyna Petke and Marco Radavelli and Paolo Vavassori},
  title = {Validation of Constraints Among Configuration Parameters Using Search-Based Combinatorial Interaction Testing},
  booktitle = {Proceedings of the 8th International Symposium on Search Based Software Engineering (SSBSE '16)},
  publisher = {Springer},
  year = {2016},
  pages = {49-63},
  doi = {http://dx.doi.org/10.1007/978-3-319-47106-8_4}
}
He, J. and Yao, X. Average Drift Analysis and Population Scalability 2016 IEEE Transactions on Evolutionary Computation
Vol. 21(3), pp. 426-439 
article DOI  
Abstract: This paper aims to study how the population size affects the computation time of evolutionary algorithms (EAs) in a rigorous way. The computation time of EAs can be measured by either the number of generations (hitting time) or the number of fitness evaluations (running time) to find an optimal solution. Population scalability is the ratio of the expected hitting time between a benchmark algorithm and an algorithm using a larger population size. Average drift analysis is introduced to compare the expected hitting time of two algorithms and to estimate lower and upper bounds on the population scalability. Several intuitive beliefs are rigorously analyzed. It is proven that: 1) using a population sometimes increases rather than decreases the expected hitting time; 2) using a population cannot shorten the expected running time of any elitist EA on any unimodal function on the time-fitness landscape, however, this statement is not true in terms of the distance-based fitness landscape; and 3) using a population cannot always reduce the expected running time on deceptive functions, which depends on whether the benchmark algorithm uses elitist selection or random selection.
BibTeX:
@article{HeY16,
  author = {Jun He and Xin Yao},
  title = {Average Drift Analysis and Population Scalability},
  journal = {IEEE Transactions on Evolutionary Computation},
  year = {2016},
  volume = {21},
  number = {3},
  pages = {426-439},
  doi = {http://dx.doi.org/10.1109/TEVC.2016.2608420}
}
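The population scalability described in the He and Yao entry above can be stated compactly; a sketch in LaTeX, where the symbols are assumed names: T^{(1)} is the hitting time (in generations) of the benchmark algorithm and T^{(\mu)} that of the same EA run with population size \mu.

\[
  \mathrm{PS}(\mu) \;=\; \frac{\mathbb{E}\!\left[T^{(1)}\right]}{\mathbb{E}\!\left[T^{(\mu)}\right]}
\]

A value \(\mathrm{PS}(\mu) > 1\) means the larger population reduces the expected number of generations needed to hit an optimum, while the expected running time measured in fitness evaluations grows roughly as \(\mu \cdot \mathbb{E}\!\left[T^{(\mu)}\right]\).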
Hong, L., Drake, J.H., Woodward, J.R. and Özcan, E. Automatically Designing More General Mutation Operators of Evolutionary Programming for Groups of Function Classes Using a Hyper-Heuristic 2016 Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '16), pp. 725-732  inproceedings DOI  
Abstract: In this study we use Genetic Programming (GP) as an offline hyper-heuristic to evolve a mutation operator for Evolutionary Programming. This is done using the Gaussian and uniform distributions as the terminal set, and arithmetic operators as the function set. The mutation operators are automatically designed for a specific function class. The contribution of this paper is to show that a GP can not only automatically design a mutation operator for Evolutionary Programming (EP) on functions generated from a specific function class, but also can design more general mutation operators on functions generated from groups of function classes. In addition, the automatically designed mutation operators also show good performance on new functions generated from a specific function class or a group of function classes.
BibTeX:
@inproceedings{HongDWO16,
  author = {Libin Hong and John H. Drake and John R. Woodward and Ender Özcan},
  title = {Automatically Designing More General Mutation Operators of Evolutionary Programming for Groups of Function Classes Using a Hyper-Heuristic},
  booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '16)},
  publisher = {ACM},
  year = {2016},
  pages = {725-732},
  doi = {http://dx.doi.org/10.1145/2908812.2908958}
}
Soria-Alcaraz, J.A., Özcan, E., Swan, J., Kendall, G. and Carpio, M. Iterated Local Search Using an Add and Delete Hyper-heuristic for University Course Timetabling 2016 Applied Soft Computing
Vol. 40(40), pp. 581-593 
article DOI  
Abstract: Hyper-heuristics are (meta-)heuristics that operate at a higher level to choose or generate a set of low-level (meta-)heuristics in an attempt to solve difficult optimization problems. Iterated local search (ILS) is a well-known approach for discrete optimization, combining perturbation and hill-climbing within an iterative framework. In this study, we introduce an ILS approach, strengthened by a hyper-heuristic which generates heuristics based on a fixed number of add and delete operations. The performance of the proposed hyper-heuristic is tested across two different problem domains using real-world benchmark course timetabling instances from the second International Timetabling Competition Tracks 2 and 3. The results show that mixing add and delete operations within an ILS framework yields an effective hyper-heuristic approach.
BibTeX:
@article{ILSADHH16,
  author = {Jorge A. Soria-Alcaraz and Ender Özcan and Jerry Swan and Graham Kendall and Martin Carpio},
  title = {Iterated Local Search Using an Add and Delete Hyper-heuristic for University Course Timetabling},
  journal = {Applied Soft Computing},
  year = {2016},
  volume = {40},
  number = {40},
  pages = {581-593},
  doi = {http://dx.doi.org/10.1016/j.asoc.2015.11.043}
}
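A minimal Python sketch of the add/delete idea in the Soria-Alcaraz et al. entry above: an ILS whose perturbation is a randomly generated sequence of a fixed number of 'add' and 'delete' moves, followed by local search. The sequence length, the identity local search in the toy demo and the accept-if-not-worse criterion are illustrative assumptions, not the paper's generated heuristics.

import random

def ils_add_delete(initial, add_move, delete_move, local_search, cost,
                   perturbation_length=3, iterations=200):
    # ILS whose perturbation is a random sequence of 'add' and 'delete' moves.
    current = local_search(initial)
    c_cur = cost(current)
    for _ in range(iterations):
        candidate = current
        for op in random.choices([add_move, delete_move], k=perturbation_length):
            candidate = op(candidate)
        candidate = local_search(candidate)
        c_cand = cost(candidate)
        if c_cand <= c_cur:                   # accept improving or equal solutions
            current, c_cur = candidate, c_cand
    return current, c_cur

if __name__ == "__main__":
    random.seed(3)
    universe, target = list(range(20)), set(range(10))
    add = lambda s: s | {random.choice(universe)}
    delete = lambda s: s - {random.choice(list(s))} if s else s
    cost = lambda s: len(target ^ s)          # toy stand-in for a timetable penalty
    best, c = ils_add_delete(set(), add, delete, lambda s: s, cost)
    print(c)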
Jahangirova, G., Clark, D., Harman, M. and Tonella, P. Test oracle assessment and improvement 2016 Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA'16), pp. 247-258  inproceedings DOI  
Abstract: We introduce a technique for assessing and improving test oracles by reducing the incidence of both false positives and false negatives. We prove that our approach can always result in an increase in the mutual information between the actual and perfect oracles. Our technique combines test case generation to reveal false positives and mutation testing to reveal false negatives. We applied the decision support tool that implements our oracle improvement technique to five real-world subjects. The experimental results show that the fault detection rate of the oracles after improvement increases, on average, by 48.6% (86% over the implicit oracle). Three actual, exposed faults in the studied systems were subsequently confirmed and fixed by the developers.
BibTeX:
@inproceedings{JahangirovaCHT16,
  author = {Gunel Jahangirova and David Clark and Mark Harman and Paolo Tonella},
  title = {Test oracle assessment and improvement},
  booktitle = {Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA'16)},
  publisher = {ACM},
  year = {2016},
  pages = {247-258},
  doi = {http://dx.doi.org/10.1145/2931037.2931062}
}
Kocsis, Z.A., Drake, J.H., Carson, D. and Swan, J. Automatic Improvement of Apache Spark Queries using Semantics-preserving Program Reduction 2016 Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO '16 Companion), pp. 1141-1146  inproceedings DOI  
Abstract: Apache Spark is a popular framework for large-scale data analytics. Unfortunately, Spark's performance can be difficult to optimise, since queries freely expressed in source code are not amenable to traditional optimisation techniques. This article describes Hylas, a tool for automatically optimising Spark queries embedded in source code via the application of semantics-preserving transformations. The transformation method is inspired by functional programming techniques of "deforestation", which eliminate intermediate data structures from a computation. This contrasts with approaches defined entirely within structured query formats such as Spark SQL. Hylas can identify certain computationally expensive operations and ensure that performing them creates no superfluous data structures. This optimisation leads to significant improvements in execution time, with over 10,000 times improvement observed in some cases.
BibTeX:
@inproceedings{KocsisDCS16,
  author = {Zoltan A. Kocsis and John H. Drake and Douglas Carson and Jerry Swan},
  title = {Automatic Improvement of Apache Spark Queries using Semantics-preserving Program Reduction},
  booktitle = {Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO '16 Companion)},
  publisher = {ACM},
  year = {2016},
  pages = {1141-1146},
  doi = {http://dx.doi.org/10.1145/2908961.2931692}
}
Langdon, W.B. Benchmarking BarraCUDA on Epigenetic DNA and nVidia Pascal GPUs 2016 (RN/16/10)  techreport DOI  
Abstract: Typically BarraCUDA uses CUDA graphics cards to map DNA reads to the human genome. Previously its software source code was genetically improved for short paired-end next-generation sequences. On longer (150 base) paired-end noisy Cambridge Epigenetix data, a Pascal GTX 1080 processes about 10,000 strings per second, comparable with twin nVidia Tesla K40s.
BibTeX:
@techreport{Langdon16,
  author = {William B. Langdon},
  title = {Benchmarking BarraCUDA on Epigenetic DNA and nVidia Pascal GPUs},
  year = {2016},
  number = {RN/16/10},
  doi = {http://dx.doi.org/10.1101/095075}
}
Langdon, W.B. The Genetic Improvement Fitness Landscape 2016 (RN/16/04)  techreport URL 
Abstract: Trying all simple changes (first order mutations) to executed source code shows software engineering artefacts are more robust than is often assumed. Of those that compile, up to 89% run without error. Indeed, a few non-equivalent mutants are improvements.
BibTeX:
@techreport{Langdon16b,
  author = {William B. Langdon},
  title = {The Genetic Improvement Fitness Landscape},
  year = {2016},
  number = {RN/16/04},
  url = {http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/research/Research_Notes/rn1604.pdf}
}
Langdon, W.B. and Harman, M. Fitness Landscape of the Triangle Program 2016 (RN/16/05)  techreport URL 
Abstract: Trying all hopeful high order mutations to source code shows none of the first order schemata of the triangle software engineering benchmark are deceptive. Indeed, these unit building blocks lead to all global optima, suggesting that program improvements may not be as hard to find as is often assumed.
BibTeX:
@techreport{LangdonH16,
  author = {William B. Langdon and Mark Harman},
  title = {Fitness Landscape of the Triangle Program},
  year = {2016},
  number = {RN/16/05},
  url = {http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/research/Research_Notes/rn1605.pdf}
}
Langdon, W.B. and Ochoa, G. Genetic Improvement: A Key Challenge for Evolutionary Computation 2016 Proceedings of IEEE Congress on Evolutionary Computation (CEC '16), pp. 3068-3075  inproceedings DOI  
Abstract: Automatic Programming has long been a sub-goal of Artificial Intelligence (AI). It is feasible in limited domains. Genetic Improvement (GI) has expanded these dramatically to more than 100 000 lines of code by building on human written applications. Further scaling may need key advances in both Search Based Software Engineering (SBSE) and Evolutionary Computation (EC) research, particularly on representations, genetic operations, fitness landscapes, fitness surrogates, multi objective search and co-evolution.
BibTeX:
@inproceedings{LangdonO16,
  author = {William B. Langdon and Gabriela Ochoa},
  title = {Genetic Improvement: A Key Challenge for Evolutionary Computation},
  booktitle = {Proceedings of IEEE Congress on Evolutionary Computation (CEC '16)},
  publisher = {IEEE},
  year = {2016},
  pages = {3068-3075},
  doi = {http://dx.doi.org/10.1109/CEC.2016.7744177}
}
Langdon, W.B., Vilella, A., Lam, B.Y.H., Petke, J. and Harman, M. Benchmarking Genetically Improved BarraCUDA on Epigenetic Methylation NGS datasets and nVidia GPUs 2016 Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO '16 Companion), pp. 1131-1132  inproceedings DOI  
Abstract: BarraCUDA uses CUDA graphics cards to map DNA reads to the human genome. Previously its software source code was genetically improved for short paired end next generation sequences. On longer noisy epigenetics strings using nVidia Titan and twin Tesla K40 the same GI-ed code is more than 3 times faster than bwa-meth on an 8 core CPU.
BibTeX:
@inproceedings{LangdonVLPH16,
  author = {William B. Langdon and Albert Vilella and Brian Yee Hong Lam and Justyna Petke and Mark Harman},
  title = {Benchmarking Genetically Improved BarraCUDA on Epigenetic Methylation NGS datasets and nVidia GPUs},
  booktitle = {Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO '16 Companion)},
  publisher = {ACM},
  year = {2016},
  pages = {1131-1132},
  doi = {http://dx.doi.org/10.1145/2908961.2931687}
}
Li, B., Qian, C., Li, J., Tang, K. and Yao, X. Search based recommender system using many-objective evolutionary algorithm 2016 Proceedings of IEEE Congress on Evolutionary Computation (CEC'16), pp. 120-126  inproceedings DOI  
Abstract: With the explosive increase of information and products, recommender systems have played an increasingly important role in recent years. Various recommendation algorithms, such as content-based methods and collaborative filtering methods, have been proposed. There are a number of performance metrics for evaluating recommender systems, and considering only precision or diversity might be inappropriate. However, to the best of our knowledge, no existing work has considered recommendation with many objectives. In this paper, we model a many-objective search-based recommender system and adopt a recently proposed many-objective evolutionary algorithm to optimize it. Experimental results on the MovieLens data set demonstrate that our algorithm performs better in terms of Generational Distance (GD), Inverted Generational Distance (IGD) and Hypervolume (HV) on most test cases.
BibTeX:
@inproceedings{LiQLTY16,
  author = {Bingdong Li and Chao Qian and Jinlong Li and Ke Tang and Xin Yao},
  title = {Search based recommender system using many-objective evolutionary algorithm},
  booktitle = {Proceedings of IEEE Congress on Evolutionary Computation (CEC'16)},
  publisher = {IEEE},
  year = {2016},
  pages = {120-126},
  doi = {http://dx.doi.org/10.1109/CEC.2016.7743786}
}
Li, B., Tang, K., Li, J. and Yao, X. Stochastic Ranking Algorithm for Many-Objective Optimization Based on Multiple Indicators 2016 IEEE Transactions on Evolutionary Computation
Vol. 20(6), pp. 924-938 
article DOI  
Abstract: Traditional multiobjective evolutionary algorithms face a great challenge when dealing with many objectives. This is due to a high proportion of nondominated solutions in the population and low selection pressure toward the Pareto front. In order to tackle this issue, a series of indicator-based algorithms have been proposed to guide the search process toward the Pareto front. However, a single indicator might be biased and lead the population to converge to a subregion of the Pareto front. In this paper, a multi-indicator-based algorithm is proposed for many-objective optimization problems. The proposed algorithm, namely stochastic ranking-based multi-indicator Algorithm (SRA), adopts the stochastic ranking technique to balance the search biases of different indicators. Empirical studies on a large number (39 in total) of problem instances from two well-defined benchmark sets with 5, 10, and 15 objectives demonstrate that SRA performs well in terms of inverted generational distance and hypervolume metrics when compared with state-of-the-art algorithms. Empirical studies also reveal that, in the case a problem requires the algorithm to have strong convergence ability, the performance of SRA can be further improved by incorporating a direction-based archive to store well-converged solutions and maintain diversity.
BibTeX:
@article{LiTLY16,
  author = {Bingdong Li and Ke Tang and Jinlong Li and Xin Yao},
  title = {Stochastic Ranking Algorithm for Many-Objective Optimization Based on Multiple Indicators},
  journal = {IEEE Transactions on Evolutionary Computation},
  year = {2016},
  volume = {20},
  number = {6},
  pages = {924-938},
  doi = {http://dx.doi.org/10.1109/TEVC.2016.2549267}
}
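The stochastic ranking idea in the Li, Tang, Li and Yao entry above can be sketched as a bubble-sort-like pass in which each adjacent comparison uses a randomly chosen indicator. Using exactly two indicators, both to be minimised, and the fixed probability pc below are illustrative simplifications of the multi-indicator scheme (SRA) described in the abstract.

import random

def stochastic_ranking(population, indicator_a, indicator_b, pc=0.5, sweeps=None):
    # Bubble-sort-like stochastic ranking: each adjacent comparison uses
    # indicator_a with probability pc and indicator_b otherwise (both minimised).
    pop = list(population)
    n = len(pop)
    sweeps = n if sweeps is None else sweeps
    for _ in range(sweeps):
        swapped = False
        for j in range(n - 1):
            key = indicator_a if random.random() < pc else indicator_b
            if key(pop[j]) > key(pop[j + 1]):
                pop[j], pop[j + 1] = pop[j + 1], pop[j]
                swapped = True
        if not swapped:
            break
    return pop      # pop[0] is the 'best' solution under the mixed ranking

if __name__ == "__main__":
    random.seed(2)
    points = [(random.random(), random.random()) for _ in range(8)]
    ranked = stochastic_ranking(points, lambda p: p[0], lambda p: p[1])
    print(ranked[0])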
Mao, K., Harman, M. and Jia, Y. Sapienz: Multi-objective Automated Testing for Android Applications 2016 Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA '16), pp. 94-105  inproceedings DOI  
Abstract: We introduce Sapienz, an approach to Android testing that uses multi-objective search-based testing to automatically explore and optimise test sequences, minimising length, while simultaneously maximising coverage and fault revelation. Sapienz combines random fuzzing, systematic and search-based exploration, exploiting seeding and multi-level instrumentation. Sapienz significantly outperforms (with large effect size) both the state-of-the-art technique Dynodroid and the widely-used tool, Android Monkey, in 7/10 experiments for coverage, 7/10 for fault detection and 10/10 for fault-revealing sequence length. When applied to the top 1,000 Google Play apps, Sapienz found 558 unique, previously unknown crashes. So far we have managed to make contact with the developers of 27 crashing apps. Of these, 14 have confirmed that the crashes are caused by real faults. Of those 14, six already have developer-confirmed fixes.
BibTeX:
@inproceedings{MaoHJ16,
  author = {Ke Mao and Mark Harman and Yue Jia},
  title = {Sapienz: Multi-objective Automated Testing for Android Applications},
  booktitle = {Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA '16)},
  publisher = {ACM},
  year = {2016},
  pages = {94-105},
  doi = {http://dx.doi.org/10.1145/2931037.2931054}
}
Martin, W. Causal Impact for App Store Analysis 2016 ICSE 2016 ACM Student Research Competition (SRC)  inproceedings
BibTeX:
@inproceedings{Martin16,
  author = {William Martin},
  title = {Causal Impact for App Store Analysis},
  booktitle = {ICSE 2016 ACM Student Research Competition (SRC)},
  year = {2016}
}
Martin, S., Ouelhadj, D., Beullens, P., Özcan, E., Juan, A.A. and Burke, E.K. A Multi-agent based Cooperative Framework for Scheduling and Routing 2016 European Journal of Operational Research
Vol. 254(1), pp. 169-178 
article DOI  
Abstract: In this paper, we propose a general agent-based distributed framework where each agent is implementing a different metaheuristic/local search combination. Moreover, an agent continuously adapts itself during the search process using a direct cooperation protocol based on reinforcement learning and pattern matching. Good patterns that make up improving solutions are identified and shared by the agents. This agent-based system aims to provide a modular flexible framework to deal with a variety of different problem domains. We have evaluated the performance of this approach using the proposed framework which embodies a set of well known metaheuristics with different configurations as agents on two problem domains, Permutation Flow-shop Scheduling and Capacitated Vehicle Routing. The results show the success of the approach yielding three new best known results of the Capacitated Vehicle Routing benchmarks tested, whilst the results for Permutation Flow-shop Scheduling are commensurate with the best known values for all the benchmarks tested.
BibTeX:
@article{MartinOBOJB16,
  author = {Simon Martin and Djamila Ouelhadj and Patrick Beullens and Ender Özcan and Angel A. Juan and Edmund K. Burke},
  title = {A Multi-agent based Cooperative Framework for Scheduling and Routing},
  journal = {European Journal of Operational Research},
  year = {2016},
  volume = {254},
  number = {1},
  pages = {169-178},
  doi = {http://dx.doi.org/10.1016/j.ejor.2016.02.045}
}
Martino, S.D., Ferrucci, F., Gravino, C. and Sarro, F. Web Effort Estimation: Function Point Analysis vs. COSMIC 2016 Information and Software Technology
Vol. 72, pp. 90-109 
article DOI  
Abstract: Context: software development effort estimation is a crucial management task that critically depends on the adopted size measure. Several Functional Size Measurement (FSM) methods have been proposed. COSMIC is considered a 2nd generation FSM method, to differentiate it from Function Point Analysis (FPA) and its variants, considered as 1st generation ones. In the context of Web applications, few investigations have been performed to compare the effectiveness of the two generations. Software companies could benefit from this analysis to evaluate if it is worth to migrate from a 1st generation method to a 2nd one.
Objective: the main goal of the paper is to empirically investigate if COSMIC is more effective than FPA for Web effort estimation. Since software companies using FPA cannot build an estimation model based on COSMIC as long as they do not have enough COSMIC data, the second goal of the paper is to investigate if conversion equations can be exploited to support the migration from FPA to COSMIC.
Method: two empirical studies have been carried out by employing an industrial data set. The first one compared the effort prediction accuracy obtained with Function Points (FPs) and COSMIC, using two estimation techniques (Simple Linear Regression and Case-Based Reasoning). The second study assessed the effectiveness of a two-step strategy that first exploits a conversion equation to transform historical FPs data into COSMIC, and then builds a new prediction model based on those estimated COSMIC sizes.
Results: the first study revealed that, on our data set, COSMIC was significantly more accurate than FPs in estimating the development effort. The second study revealed that the effectiveness of the analyzed two-step process critically depends on the employed conversion equation.
Conclusion: for Web effort estimation COSMIC can be significantly more effective than FPA. Nevertheless, additional research must be conducted to identify suitable conversion equations so that the two-step strategy can be effectively employed for a smooth migration from FPA to COSMIC.
BibTeX:
@article{MartinoFGS16,
  author = {Sergio Di Martino and Filomena Ferrucci and Carmine Gravino and Federica Sarro},
  title = {Web Effort Estimation: Function Point Analysis vs. COSMIC},
  journal = {Information and Software Technology},
  year = {2016},
  volume = {72},
  pages = {90-109},
  doi = {http://dx.doi.org/10.1016/j.infsof.2015.12.001}
}
Martin, W., Sarro, F. and Harman, M. Causal Impact Analysis for App Releases in Google Play 2016 Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE '16), pp. 435-446  inproceedings DOI  
Abstract: App developers would like to understand the impact of their own and their competitors’ software releases. To address this we introduce Causal Impact Release Analysis for app stores, and our tool, CIRA, that implements this analysis. We mined 38,858 popular Google Play apps, over a period of 12 months. For these apps, we identified 26,339 releases for which there was adequate prior and posterior time series data to facilitate causal impact analysis. We found that 33% of these releases caused a statistically significant change in user ratings. We use our approach to reveal important characteristics that distinguish causal significance in Google Play. To explore the actionability of causal impact analysis, we elicited the opinions of app developers: 56 companies responded, 78% concurred with the causal assessment, of which 33% claimed that their company would consider changing its app release strategy as a result of our findings.
BibTeX:
@inproceedings{MartinSH16,
  author = {William Martin and Federica Sarro and Mark Harman},
  title = {Causal Impact Analysis for App Releases in Google Play},
  booktitle = {Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE '16)},
  publisher = {ACM},
  year = {2016},
  pages = {435-446},
  doi = {http://dx.doi.org/10.1145/2950290.2950320}
}
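A toy Python stand-in for the kind of release impact analysis described in the Martin, Sarro and Harman entry above: the post-release counterfactual is simply the pre-release mean and significance is assessed with a permutation test, rather than the Bayesian structural time-series modelling used by CIRA. All names and the rating series below are hypothetical.

import random
import statistics

def naive_release_impact(ratings, release_index, n_permutations=1000):
    # Effect estimate: post-release mean minus pre-release mean; the pre-release
    # mean acts as a (very naive) counterfactual for the post-release window.
    pre, post = ratings[:release_index], ratings[release_index:]
    observed = statistics.mean(post) - statistics.mean(pre)
    extreme = 0
    for _ in range(n_permutations):
        shuffled = ratings[:]
        random.shuffle(shuffled)
        diff = (statistics.mean(shuffled[release_index:]) -
                statistics.mean(shuffled[:release_index]))
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations   # effect size, permutation p-value

if __name__ == "__main__":
    random.seed(4)
    series = ([4.0 + random.gauss(0, 0.1) for _ in range(30)] +
              [4.3 + random.gauss(0, 0.1) for _ in range(30)])
    effect, p_value = naive_release_impact(series, 30)
    print(round(effect, 2), p_value)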
Mei, Y., Omidvar, M.N., Li, X. and Yao, X. A Competitive Divide-and-Conquer Algorithm for Unconstrained Large-Scale Black-Box Optimization 2016 ACM Transactions on Mathematical Software
Vol. 42(2) 
article DOI  
Abstract: This article proposes a competitive divide-and-conquer algorithm for solving large-scale black-box optimization problems for which there are thousands of decision variables and the algebraic models of the problems are unavailable. We focus on problems that are partially additively separable, since this type of problem can be further decomposed into a number of smaller independent subproblems. The proposed algorithm addresses two important issues in solving large-scale black-box optimization: (1) the identification of the independent subproblems without explicitly knowing the formula of the objective function and (2) the optimization of the identified black-box subproblems. First, a Global Differential Grouping (GDG) method is proposed to identify the independent subproblems. Then, a variant of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is adopted to solve the subproblems resulting from its rotation invariance property. GDG and CMA-ES work together under the cooperative co-evolution framework. The resultant algorithm, named CC-GDG-CMAES, is then evaluated on the CEC’2010 large-scale global optimization (LSGO) benchmark functions, which have a thousand decision variables and black-box objective functions. The experimental results show that, on most test functions evaluated in this study, GDG manages to obtain an ideal partition of the index set of the decision variables, and CC-GDG-CMAES outperforms the state-of-the-art results. Moreover, the competitive performance of the well-known CMA-ES is extended from low-dimensional to high-dimensional black-box problems.
BibTeX:
@article{MeiOLY16,
  author = {Yi Mei and Mohammad Nabi Omidvar and Xiaodong Li and Xin Yao},
  title = {A Competitive Divide-and-Conquer Algorithm for Unconstrained Large-Scale Black-Box Optimization},
  journal = {ACM Transactions on Mathematical Software},
  year = {2016},
  volume = {42},
  number = {2},
  doi = {http://dx.doi.org/10.1145/2791291}
}
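The variable-interaction test at the heart of differential grouping, as used by the Global Differential Grouping step in the Mei et al. entry above, can be sketched in a few lines of Python: measure the change in f caused by perturbing x_i before and after x_j has been changed, and compare the two. The step size delta and threshold eps are illustrative choices.

def interacts(f, x, i, j, delta=1.0, eps=1e-6):
    # Compare the change in f caused by perturbing x_i, measured before and
    # after x_j has been perturbed; a difference signals an interaction.
    def perturbed(vec, idx, amount):
        out = list(vec)
        out[idx] += amount
        return out

    delta_1 = f(perturbed(x, i, delta)) - f(x)
    x_j = perturbed(x, j, delta)
    delta_2 = f(perturbed(x_j, i, delta)) - f(x_j)
    return abs(delta_1 - delta_2) > eps

if __name__ == "__main__":
    # x0*x1 couples variables 0 and 1, while variable 2 is separable
    f = lambda v: v[0] * v[1] + v[2] ** 2
    x = [0.0, 0.0, 0.0]
    print(interacts(f, x, 0, 1), interacts(f, x, 0, 2))   # True False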
Ochoa, G. and Veerapen, N. Deconstructing the Big Valley Search Space Hypothesis 2016 Proceedings of the 16th European Conference on Evolutionary Computation in Combinatiorial Optimisation (EvoCOP '16), pp. 58-73  inproceedings DOI  
Abstract: The big valley hypothesis suggests that, in combinatorial optimisation, local optima of good quality are clustered and surround the global optimum. We show here that the idea of a single valley does not always hold. Instead the big valley seems to de-construct into several valleys, also called 'funnels' in theoretical chemistry. We use the local optima networks model and propose an effective procedure for extracting the network data. We conduct a detailed study on four selected TSP instances of moderate size and observe that the big valley decomposes into a number of sub-valleys of different sizes and fitness distributions. Sometimes the global optimum is located in the largest valley, which suggests an easy to search landscape, but this is not generally the case. The global optimum might be located in a small valley, which offers a clear and visual explanation of the increased search difficulty in these cases. Our study opens up new possibilities for analysing and visualising combinatorial landscapes as complex networks.
BibTeX:
@inproceedings{OchoaV16,
  author = {Gabriela Ochoa and Nadarajen Veerapen},
  title = {Deconstructing the Big Valley Search Space Hypothesis},
  booktitle = {Proceedings of the 16th European Conference on Evolutionary Computation in Combinatiorial Optimisation (EvoCOP '16)},
  publisher = {Springer},
  year = {2016},
  pages = {58-73},
  doi = {http://dx.doi.org/10.1007/978-3-319-30698-8_5}
}
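A small Python sketch of sampling a local optima network of the kind analysed in the Ochoa and Veerapen entry above, on bit strings rather than TSP tours: nodes are local optima of a bit-flip hill climber and edge weights count observed transitions after a perturbation. The sampling-by-ILS procedure and the toy two-attractor fitness are assumptions for illustration, not the extraction procedure proposed in the paper.

import random
from collections import defaultdict

def sample_lon(fitness, n_bits=10, runs=30, steps=200, kick=2):
    # Nodes are local optima of a single bit-flip hill climber (minimisation);
    # a directed edge (a, b) counts how often kicking a and re-climbing led to b.
    def hill_climb(s):
        improved = True
        while improved:
            improved = False
            for i in range(n_bits):
                t = s[:i] + ('1' if s[i] == '0' else '0') + s[i + 1:]
                if fitness(t) < fitness(s):
                    s, improved = t, True
                    break
        return s

    edges = defaultdict(int)
    for _ in range(runs):
        current = hill_climb(''.join(random.choice('01') for _ in range(n_bits)))
        for _ in range(steps):
            kicked = current
            for i in random.sample(range(n_bits), kick):
                kicked = kicked[:i] + ('1' if kicked[i] == '0' else '0') + kicked[i + 1:]
            nxt = hill_climb(kicked)
            if nxt != current:
                edges[(current, nxt)] += 1
            if fitness(nxt) <= fitness(current):
                current = nxt                 # ILS-style acceptance
    return edges

if __name__ == "__main__":
    random.seed(6)
    # toy fitness with two attractors: all-zeros (global) and 1111100000 (local)
    f = lambda s: min(sum(a != b for a, b in zip(s, '0' * 10)),
                      sum(a != b for a, b in zip(s, '1' * 5 + '0' * 5)) + 1)
    lon = sample_lon(f)
    print(len({node for edge in lon for node in edge}), 'local optima sampled')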
Ochoa, G. and Veerapen, N. Additional Dimensions to the Study of Funnels in Combinatorial Landscapes 2016 Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '16), pp. 373-380  inproceedings DOI  
Abstract: The global structure of travelling salesman's fitness landscapes has recently revealed the presence of multiple 'funnels'. This implies that local optima are organised into several clusters, so that a particular local optimum largely belongs to a particular funnel. Such a global structure can increase search difficulty, especially, when the global optimum is located in a deep, narrow funnel. Our study brings more precision (and dimensions) to the notion of funnels with a data-driven approach using Local Optima Networks and the Chained Lin-Kernighan heuristic. We start by exploring the funnel `floors', characterising them using the notion of communities from complex networks. We then analyse the more complex funnel `basins'. Since their depth is relevant to search, we visualise them in 3D. Our study, across a set of TSP instances, reveals a multi-funnel structure in most of them. However, the specific topology varies across instances and relates to search difficulty. Finally, including a stronger perturbation into Chained Lin-Kernighan proved to smooth the funnel structure, reducing the number of funnels and enlarging the valley leading to global optima.
BibTeX:
@inproceedings{OchoaV16b,
  author = {Gabriela Ochoa and Nadarajen Veerapen},
  title = {Additional Dimensions to the Study of Funnels in Combinatorial Landscapes},
  booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '16)},
  publisher = {ACM},
  year = {2016},
  pages = {373-380},
  doi = {http://dx.doi.org/10.1145/2908812.2908820}
}
Özcan, E., Drake, J.H., Altıntaş, C. and Asta, S. A Self-adaptive Multimeme Memetic Algorithm Co-evolving Utility Scores to Control Genetic Operators and Their Parameter Settings 2016 Applied Soft Computing
Vol. 49, pp. 81-93 
article DOI  
Abstract: Memetic algorithms are a class of well-studied metaheuristics which combine evolutionary algorithms and local search techniques. A meme represents contagious piece of information in an adaptive information sharing system. The canonical memetic algorithm uses a fixed meme, denoting a hill climbing operator, to improve each solution in a population during the evolutionary search process. Given global parameters and multiple parameterised operators, adaptation often becomes a crucial constituent in the design of MAs. In this study, a self-adaptive self-configuring Steady-state Multimeme Memetic Algorithm (SSMMA) variant is proposed. Along with the individuals (solutions), SSMMA co-evolves memes, encoding the utility score for each algorithmic component choice and relevant parameter setting option. An individual uses tournament selection to decide which operator and parameter setting to employ at a given step. The performance of the proposed algorithm is evaluated on six combinatorial optimisation problems from a cross-domain heuristic search benchmark. The results indicate the success of SSMMA when compared to the static MAs as well as widely used self-adaptive Multimeme Memetic Algorithm from the scientific literature.
BibTeX:
@article{OzcanDAA16,
  author = {Ender Özcan and John H. Drake and Cevriye Altıntaş and Shahriar Asta},
  title = {A Self-adaptive Multimeme Memetic Algorithm Co-evolving Utility Scores to Control Genetic Operators and Their Parameter Settings},
  journal = {Applied Soft Computing},
  year = {2016},
  volume = {49},
  pages = {81-93},
  doi = {http://dx.doi.org/10.1016/j.asoc.2016.07.032}
}
Papadakis, M., Henard, C., Harman, M., Jia, Y. and le Traon, Y. Threats to the validity of mutation-based test assessment 2016 Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA'16), pp. 254-265  inproceedings DOI  
Abstract: Much research on software testing and test techniques relies on experimental studies based on mutation testing. In this paper we reveal that such studies are vulnerable to a potential threat to validity, leading to possible Type I errors; incorrectly rejecting the Null Hypothesis. Our findings indicate that Type I errors occur, for arbitrary experiments that fail to take countermeasures, approximately 62% of the time. Clearly, a Type I error would potentially compromise any scientific conclusion. We show that the problem derives from such studies’ combined use of both subsuming and subsumed mutants. We collected articles published in the last two years at three leading software engineering conferences. Of those that use mutation-based test assessment, we found that 68% are vulnerable to this threat to validity.
BibTeX:
@inproceedings{PapadakisHHJT16,
  author = {Mike Papadakis and Christopher Henard and Mark Harman and Yue Jia and Yves le Traon},
  title = {Threats to the validity of mutation-based test assessment},
  booktitle = {Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA'16)},
  publisher = {ACM},
  year = {2016},
  pages = {254-265},
  doi = {http://dx.doi.org/10.1145/2931037.2931040}
}
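The subsumption relation underlying the Papadakis et al. entry above can be illustrated directly from a kill map. In the Python sketch below, mutant a is taken to subsume mutant b when a is killed and every test that kills a also kills b; treating the non-subsumed killed mutants as the 'subsuming' set is one common reading of the definition and is used here only for illustration.

def subsuming_mutants(kill_map):
    # kill_map maps mutant id -> set of tests that kill it. Mutant a subsumes
    # mutant b when a is killed and every test killing a also kills b.
    killed = {m: tests for m, tests in kill_map.items() if tests}
    subsumed = set()
    for a, tests_a in killed.items():
        for b, tests_b in killed.items():
            if a != b and tests_a < tests_b:
                subsumed.add(b)          # b is strictly easier to kill than a
    return set(killed) - subsumed

if __name__ == "__main__":
    kill_map = {'m1': {'t1'}, 'm2': {'t1', 't2'}, 'm3': {'t3'}, 'm4': set()}
    print(sorted(subsuming_mutants(kill_map)))   # ['m1', 'm3']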
Petke, J. Genetic Improvement for Code Obfuscation 2016 Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO '16 Companion), pp. 1135-1136  inproceedings DOI  
Abstract: Genetic improvement (GI) is a relatively new area of software engineering and thus the extent of its applicability is yet to be explored. Although a growing interest in GI in recent years started with the work on automatic bug fixing, the area flourished when results on optimisation of non-functional software properties, such as efficiency and energy consumption, were published. Further success of GI in transplanting functionality from one program to another leads to a question: what other software engineering areas can benefit from the use of genetic improvement techniques? We propose to utilise GI for code obfuscation.
BibTeX:
@inproceedings{Petke16,
  author = {Justyna Petke},
  title = {Genetic Improvement for Code Obfuscation},
  booktitle = {Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO '16 Companion)},
  publisher = {ACM},
  year = {2016},
  pages = {1135-1136},
  doi = {http://dx.doi.org/10.1145/2908961.2931689}
}
Ragkhitwetsagul, C., Paixao, M., Adham, M., Busari, S., Krinke, J. and Drake, J.H. Searching for Configurations in Clone Evaluation - A Replication Study 2016 Proceedings of the 8th International Symposium on Search Based Software Engineering (SSBSE '16), pp. 250-256  inproceedings DOI  
Abstract: Clone detection is the process of finding duplicated code within a software code base in an automated manner. It is useful in several areas of software development such as code quality analysis, bug detection, and program understanding. We replicate a study of a genetic-algorithm based framework that optimises parameters for clone agreement (EvaClone). We apply the framework to 14 releases of Mockito, a Java mocking framework. We observe that the optimised parameters outperform the tools’ default parameters in terms of clone agreement by 19.91 % to 66.43 %. However, the framework gives undesirable results in terms of clone quality. EvaClone either maximises or minimises the number of clones in order to achieve the highest agreement, consequently introducing more false positives or false negatives.
BibTeX:
@inproceedings{RagkhitwetsagulPABKD16,
  author = {Chaiyong Ragkhitwetsagul and Matheus Paixao and Manal Adham and Saheed Busari and Jens Krinke and John H. Drake},
  title = {Searching for Configurations in Clone Evaluation - A Replication Study},
  booktitle = {Proceedings of the 8th International Symposium on Search Based Software Engineering (SSBSE '16)},
  publisher = {Springer},
  year = {2016},
  pages = {250-256},
  doi = {http://dx.doi.org/10.1007/978-3-319-47106-8_20}
}
Ryser-Welch, P., Miller, J.F., Swan, J. and Trefzer, M.A. Iterative Cartesian Genetic Programming: Creating general algorithms for solving Travelling Salesman Problems 2016
Vol. 9594, EuroGP 2016: Proceedings of the 19th European Conference on Genetic Programming, pp. 286-301 
inproceedings  
Abstract: Evolutionary algorithms have been widely used to optimise or design search algorithms, however, very few have considered evolving iterative algorithms. In this paper, we introduce a novel extension to Cartesian Genetic Programming that allows it to encode iterative algorithms. We apply this technique to the Traveling Salesman Problem to produce human-readable solvers which can be then be independently implemented. Our experimental results demonstrate that the evolved solvers scale well to much larger TSP instances than those used for training.
BibTeX:
@inproceedings{Ryser-WelchMST16,
  author = {Patricia Ryser-Welch and Julian F. Miller and Jerry Swan and Martin A. Trefzer},
  title = {Iterative Cartesian Genetic Programming: Creating general algorithms for solving Travelling Salesman Problems},
  booktitle = {EuroGP 2016: Proceedings of the 19th European Conference on Genetic Programming},
  publisher = {Springer},
  year = {2016},
  volume = {9594},
  pages = {286-301}
}
Sarro, F., Petrozziello, A. and Harman, M. Multi-Objective Software Effort Estimation 2016 Proceedings of the 38th International Conference on Software Engineering (ICSE '16), pp. 619-630  inproceedings DOI  
Abstract: We introduce a bi-objective effort estimation algorithm that combines Confidence Interval Analysis and assessment of Mean Absolute Error. We evaluate our proposed algorithm on three different alternative formulations, baseline comparators and current state-of-the-art effort estimators applied to five real-world datasets from the PROMISE repository, involving 724 different software projects in total. The results reveal that our algorithm outperforms the baseline, state-of-the-art and all three alternative formulations, statistically significantly (p < 0.001) and with large effect size (Â12 ≥ 0.9) over all five datasets. We also provide evidence that our algorithm creates a new state-of-the-art, which lies within currently claimed industrial human-expert-based thresholds, thereby demonstrating that our findings have actionable conclusions for practicing software engineers.
BibTeX:
@inproceedings{SarroPH16,
  author = {Federica Sarro and Alessio Petrozziello and Mark Harman},
  title = {Multi-Objective Software Effort Estimation},
  booktitle = {Proceedings of the 38th International Conference on Software Engineering (ICSE '16)},
  publisher = {ACM},
  year = {2016},
  pages = {619-630},
  doi = {http://dx.doi.org/10.1145/2884781.2884830}
}
Shen, X.-N., Minku, L.L., Bahsoon, R. and Yao, X. Dynamic Software Project Scheduling through a Proactive-rescheduling Method 2016 IEEE Transactions on Software Engineering
Vol. 42(7), pp. 658-686 
article DOI  
Abstract: Software project scheduling in dynamic and uncertain environments is of significant importance to real-world software development. Yet most studies schedule software projects by considering static and deterministic scenarios only, which may cause performance deterioration or even infeasibility when facing disruptions. In order to capture more dynamic features of software project scheduling than the previous work, this paper formulates the project scheduling problem by considering uncertainties and dynamic events that often occur during software project development, and constructs a mathematical model for the resulting Multi-objective Dynamic Project Scheduling Problem (MODPSP), where the four objectives of project cost, duration, robustness and stability are considered simultaneously under a variety of practical constraints. In order to solve MODPSP appropriately, a multi-objective evolutionary algorithm (MOEA) based proactive-rescheduling method is proposed, which generates a robust schedule predictively and adapts the previous schedule in response to critical dynamic events during the project execution. Extensive experimental results on 21 problem instances, including three instances derived from real-world software projects, show that our novel method is very effective. By introducing the robustness and stability objectives, and incorporating the dynamic optimization strategies specifically designed for MODPSP, our proactive-rescheduling method achieves a very good overall performance in a dynamic environment.
BibTeX:
@article{ShenMBY16,
  author = {Xiao-Ning Shen and Leandro L. Minku and Rami Bahsoon and Xin Yao},
  title = {Dynamic Software Project Scheduling through a Proactive-rescheduling Method},
  journal = {IEEE Transactions on Software Engineering},
  year = {2016},
  volume = {42},
  number = {7},
  pages = {658-686},
  doi = {http://dx.doi.org/10.1109/TSE.2015.2512266}
}
Sosa-Ascencio, A., Ochoa, G., Terashima-Marin, H. and Conant-Pablos, S.E. Grammar-based Generation of Variable-selection Heuristics for Constraint Satisfaction Problems 2016 Genetic Programming and Evolvable Machines
Vol. 17(2), pp. 119-144 
article DOI  
Abstract: We propose a grammar-based genetic programming framework that generates variable-selection heuristics for solving constraint satisfaction problems. This approach can be considered as a generation hyper-heuristic. A grammar to express heuristics is extracted from successful human-designed variable-selection heuristics. The search is performed on the derivation sequences of this grammar using a strongly typed genetic programming framework. The approach brings two innovations to grammar-based hyper-heuristics in this domain: the incorporation of if-then-else rules to the function set, and the implementation of overloaded functions capable of handling different input dimensionality. Moreover, the heuristic search space is explored using not only evolutionary search, but also two alternative simpler strategies, namely, iterated local search and parallel hill climbing. We tested our approach on synthetic and real-world instances. The newly generated heuristics have an improved performance when compared against human-designed heuristics. Our results suggest that the constrained search space imposed by the proposed grammar is the main factor in the generation of good heuristics. However, to generate more general heuristics, the composition of the training set and the search methodology played an important role. We found that increasing the variability of the training set improved the generality of the evolved heuristics, and the evolutionary search strategy produced slightly better results.
BibTeX:
@article{Sosa-AscencioOTC16,
  author = {Alejandro Sosa-Ascencio and Gabriela Ochoa and Hugo Terashima-Marin and Santiago Enrique Conant-Pablos},
  title = {Grammar-based Generation of Variable-selection Heuristics for Constraint Satisfaction Problems},
  journal = {Genetic Programming and Evolvable Machines},
  year = {2016},
  volume = {17},
  number = {2},
  pages = {119-144},
  doi = {http://dx.doi.org/10.1007/s10710-015-9249-1}
}
Sun, Y., Minku, L.L., Wang, S. and Yao, X. Online Ensemble Learning of Data Streams with Gradually Evolved Classes 2016 IEEE Transactions on Knowledge and Data Engineering
Vol. 28(6), pp. 1532-1545 
article DOI  
BibTeX:
@article{SunMWY16,
  author = {Y. Sun and Leandro L. Minku and Shuo Wang and Xin Yao},
  title = {Online Ensemble Learning of Data Streams with Gradually Evolved Classes},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  year = {2016},
  volume = {28},
  number = {6},
  pages = {1532-1545},
  doi = {http://dx.doi.org/10.1109/TKDE.2016.2526675}
}
Swan, J., Causmaecker, P.D., Martin, S. and Özcan, E. A Re-characterization of Hyper-heuristics 2016 Recent Developments of Metaheuristics  incollection  
BibTeX:
@incollection{SwanCMO16,
  author = {Jerry Swan and Patrick De Causmaecker and Simon Martin and Ender Özcan},
  title = {A Re-characterization of Hyper-heuristics},
  booktitle = {Recent Developments of Metaheuristics},
  publisher = {Springer},
  year = {2016}
}
Tang, K., Yang, P. and Yao, X. Negatively Correlated Search 2016 IEEE Journal on Selected Areas in Communications
Vol. 34(3), pp. 542-550 
article DOI  
Abstract: Evolutionary algorithms (EAs) have been shown to be powerful tools for complex optimization problems, which are ubiquitous in both communication and big data analytics. This paper presents a new EA, namely negatively correlated search (NCS), which maintains multiple individual search processes in parallel and models the search behaviors of individual search processes as probability distributions. NCS explicitly promotes negatively correlated search behaviors by encouraging differences among the probability distributions (search behaviors). By this means, individual search processes share information and cooperate with each other to search diverse regions of a search space, which makes NCS a promising method for nonconvex optimization. The co-operation scheme of NCS could also be regarded as a novel diversity preservation scheme that, different from other existing schemes, directly promotes diversity at the level of search behaviors rather than merely trying to maintain diversity among candidate solutions. Empirical studies showed that NCS is competitive to well-established search methods in the sense that NCS achieved the best overall performance on 20 multimodal (nonconvex) continuous optimization problems. The advantages of NCS over state-of-the-art approaches are also demonstrated with a case study on the synthesis of unequally spaced linear antenna arrays.
BibTeX:
@article{TangYY16,
  author = {Ke Tang and Peng Yang and Xin Yao},
  title = {Negatively Correlated Search},
  journal = {IEEE Journal on Selected Areas in Communications},
  year = {2016},
  volume = {34},
  number = {3},
  pages = {542-550},
  doi = {http://dx.doi.org/10.1109/JSAC.2016.2525458}
}
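A very small Python sketch inspired by the Tang, Yang and Yao entry above: several parallel (1+1) search processes, each favouring offspring that are both good and far from the other processes' current points. The Euclidean distance-based diversity term and the acceptance rule below are simplifying assumptions and do not reproduce the paper's probabilistic-model formulation of negatively correlated search behaviours.

import math
import random

def ncs_sketch(fitness, dim=10, n_proc=5, sigma=0.5, lam=1.0, iterations=2000):
    # Several parallel (1+1) searches; each may accept a worse offspring if the
    # gain in distance to the other searches outweighs the loss in fitness
    # (minimisation). lam trades off fitness against diversity (assumed rule).
    xs = [[random.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_proc)]
    fs = [fitness(x) for x in xs]

    def min_dist(point, others):
        return min(math.dist(point, o) for o in others)

    for _ in range(iterations):
        for i in range(n_proc):
            child = [v + random.gauss(0.0, sigma) for v in xs[i]]
            others = [xs[j] for j in range(n_proc) if j != i]
            f_child = fitness(child)
            diversity_gain = min_dist(child, others) - min_dist(xs[i], others)
            if f_child < fs[i] or lam * diversity_gain > f_child - fs[i]:
                xs[i], fs[i] = child, f_child
    best = min(range(n_proc), key=fs.__getitem__)
    return xs[best], fs[best]

if __name__ == "__main__":
    random.seed(7)
    sphere = lambda x: sum(v * v for v in x)   # toy test function
    _, value = ncs_sketch(sphere)
    print(round(value, 3))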
Tayarani-N, M.-H., Prügel-Bennett, A., Xu, H. and Yao, X. Improving the performance of evolutionary engine calibration algorithms with principal component analysis 2016 Proceedings of IEEE Congress on Evolutionary Computation (CEC'16), pp. 5128-5137  inproceedings DOI  
Abstract: By studying the fitness landscape properties of the engine calibration problem, we propose a new Principal Component Analysis (PCA) based optimisation algorithm for the problem. The engine calibration problem in this paper is to minimise the fuel consumption, gas emission and particle emission of a Jaguar car engine. To evaluate the fuel consumption and emissions of the engine, a model of the engine developed at the University of Birmingham was used. A strength Pareto method is used to convert the three objectives into one fitness value. Then a local search algorithm is used to find local optima. We then study these local optima to find the properties of good solutions in the landscape. Our studies show that the best solutions in the landscape exhibit certain patterns. We perform Principal Component Analysis (PCA) on the good solutions and show that these components present certain properties, which can be exploited to develop new exploration operators for evolutionary algorithms. We use the newly proposed operator on some well-known algorithms and show that the performance of the algorithms can be improved significantly.
BibTeX:
@inproceedings{Tayarani-NBXY16,
  author = {Mohammad-H. Tayarani-N and Adam Prügel-Bennett and Hongming Xu and Xin Yao},
  title = {Improving the performance of evolutionary engine calibration algorithms with principal component analysis},
  booktitle = {Proceedings of IEEE Congress on Evolutionary Computation (CEC'16)},
  publisher = {IEEE},
  year = {2016},
  pages = {5128-5137},
  doi = {http://dx.doi.org/10.1109/CEC.2016.7748340}
}
Turner, A.J., White, D.R. and Drake, J.H. Multi-objective Regression Test Suite Minimisation for Mockito 2016 Proceedings of the 8th International Symposium on Search Based Software Engineering (SSBSE '16), pp. 244-249  inproceedings DOI  
Abstract: Regression testing is applied after modifications are performed to large software systems in order to verify that the changes made do not unintentionally disrupt other existing components. When employing regression testing it is often desirable to reduce the number of test cases executed in order to achieve a certain objective; a process known as test suite minimisation. We use multi-objective optimisation to analyse the trade-off between code coverage and execution time for the test suite of Mockito, a popular framework used to create mock objects for unit tests in Java. We show that a large reduction can be made in terms of execution time at the expense of only a small reduction in code coverage and discuss how the described methods can be easily applied to many projects that utilise regression testing.
BibTeX:
@inproceedings{TurnerWD16,
  author = {Andrew J. Turner and David R. White and John H. Drake},
  title = {Multi-objective Regression Test Suite Minimisation for Mockito},
  booktitle = {Proceedings of the 8th International Symposium on Search Based Software Engineering (SSBSE '16)},
  publisher = {Springer},
  year = {2016},
  pages = {244-249},
  doi = {http://dx.doi.org/10.1007/978-3-319-47106-8_19}
}
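As a concrete illustration of the coverage/time trade-off analysed in the entry above, the sketch below enumerates subsets of a made-up five-test suite and keeps the non-dominated (coverage, execution time) points. The test data and the exhaustive enumeration are illustrative only; a study of this kind on a real suite would use a multi-objective search such as NSGA-II instead.

# Toy multi-objective test suite minimisation: coverage vs. execution time.
import itertools

tests = {  # test name -> (covered lines, execution time in seconds); invented data
    "t1": ({1, 2, 3, 4}, 0.9),
    "t2": ({3, 4, 5},    0.2),
    "t3": ({6, 7},       0.4),
    "t4": ({1, 6, 7, 8}, 1.1),
    "t5": ({2, 5, 8},    0.3),
}
all_lines = set().union(*(cov for cov, _ in tests.values()))

def evaluate(subset):
    covered = set().union(*(tests[t][0] for t in subset)) if subset else set()
    time = sum(tests[t][1] for t in subset)
    return len(covered) / len(all_lines), time  # maximise coverage, minimise time

def dominates(a, b):  # a, b are (coverage, time) pairs
    return a[0] >= b[0] and a[1] <= b[1] and a != b

# Exhaustive enumeration is feasible for this toy suite only.
candidates = [frozenset(c) for r in range(len(tests) + 1)
              for c in itertools.combinations(tests, r)]
scored = {c: evaluate(c) for c in candidates}
front = [c for c in candidates
         if not any(dominates(scored[o], scored[c]) for o in candidates)]
for c in sorted(front, key=lambda s: scored[s][1]):
    print(sorted(c), "coverage=%.2f time=%.2f" % scored[c])
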
Ullah, A., Li, J., Hussain, A. and Yang, E. Towards a Biologically Inspired Soft Switching Approach for Cloud Resource Provisioning 2016 Cognitive Computation
Vol. 8(5), pp. 992-1005 
article DOI  
Abstract: Cloud elasticity augments applications to dynamically adapt to changes in demand by acquiring or releasing computational resources on the fly. Recently, we developed a framework for cloud elasticity utilizing multiple feedback controllers simultaneously, wherein, each controller determines the scaling action with different intensity, and the selection of an appropriate controller is realized with a fuzzy inference system. In this paper, we aim to identify the similarities between cloud elasticity and action selection mechanism in the animal brain. We treat each controller in our previous framework as an action, and propose a novel bioinspired, soft switching approach. The proposed methodology integrates a basal ganglia computational model as an action selection mechanism. Initial experimental results demonstrate the improved potential of the basal ganglia-based approach by enhancing the overall system performance and stability.
BibTeX:
@article{UllahLHY16,
  author = {Amjad Ullah and Jingpeng Li and Amir Hussain and Erfu Yang},
  title = {Towards a Biologically Inspired Soft Switching Approach for Cloud Resource Provisioning},
  journal = {Cognitive Computation},
  year = {2016},
  volume = {8},
  number = {5},
  pages = {992-1005},
  doi = {http://dx.doi.org/10.1007/s12559-016-9391-y}
}
Wang, S., Minku, L.L. and Yao, X. Dealing with Multiple Classes in Online Class Imbalance Learning 2016 Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI'16), pp. 2118-2124  inproceedings DOI  
Abstract: Online class imbalance learning deals with data streams having very skewed class distributions in a timely fashion. Although a few methods have been proposed to handle such problems, most of them focus on two-class cases. Multi-class imbalance imposes additional challenges in learning. This paper studies the combined challenges posed by multiclass imbalance and online learning, and aims at a more effective and adaptive solution. First, we introduce two resampling-based ensemble methods, called MOOB and MUOB, which can process multi-class data directly and strictly online with an adaptive sampling rate. Then, we look into the impact of multi-minority and multi-majority cases on MOOB and MUOB in comparison to other methods under stationary and dynamic scenarios. Both multi-minority and multi-majority make a negative impact. MOOB shows the best and most stable G-mean in most stationary and dynamic cases.
BibTeX:
@inproceedings{WangMY16,
  author = {Shuo Wang and Leandro L. Minku and Xin Yao},
  title = {Dealing with Multiple Classes in Online Class Imbalance Learning},
  booktitle = {Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI'16)},
  year = {2016},
  pages = {2118-2124},
  doi = {https://dl.acm.org/citation.cfm?id=3060917}
}
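To make the resampling idea above concrete, here is a deliberately small sketch of online oversampling for multi-class streams: each arriving example is learned a Poisson-distributed number of times, with the rate inflated for classes that are currently rare. The nearest-centroid base learner and the rate formula are illustrative stand-ins, not the MOOB/MUOB ensembles themselves.

# Sketch of online learning with class-imbalance-aware oversampling.
import numpy as np
from collections import defaultdict

class OnlineOversamplingLearner:
    def __init__(self, n_features, classes, seed=0):
        self.rng = np.random.default_rng(seed)
        self.counts = defaultdict(int)
        # one running mean ("centroid") per class as a deliberately simple base model
        self.centroids = {c: np.zeros(n_features) for c in classes}
        self.seen = {c: 0 for c in classes}

    def partial_fit(self, x, y):
        self.counts[y] += 1
        total = sum(self.counts.values())
        rate = total / (len(self.counts) * self.counts[y])   # >1 for minority classes
        for _ in range(self.rng.poisson(rate)):
            self.seen[y] += 1
            self.centroids[y] += (x - self.centroids[y]) / self.seen[y]

    def predict(self, x):
        return min(self.centroids, key=lambda c: np.linalg.norm(x - self.centroids[c]))

# Toy stream in which class "b" is rare.
learner = OnlineOversamplingLearner(n_features=2, classes=["a", "b"])
rng = np.random.default_rng(1)
for _ in range(500):
    y = "b" if rng.random() < 0.05 else "a"
    x = rng.normal(loc=(0.0 if y == "a" else 3.0), scale=0.5, size=2)
    learner.partial_fit(x, y)
print(learner.predict(np.array([2.8, 3.1])))   # expected: "b"
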
Wang, H. and Yao, X. Objective Reduction Based on Nonlinear Correlation Information Entropy 2016 Soft Computing
Vol. 20(6), pp. 2393–2407 
article DOI  
Abstract: It is hard to obtain the entire solution set of a many-objective optimization problem (MaOP) by multi-objective evolutionary algorithms (MOEAs) because of the difficulties brought by the large number of objectives. However, the redundancy of objectives exists in some problems with correlated objectives (linearly or nonlinearly). Objective reduction can be used to decrease the difficulties of some MaOPs. In this paper, we propose a novel objective reduction approach based on nonlinear correlation information entropy (NCIE). It uses the NCIE matrix to measure the linear and nonlinear correlation between objectives and a simple method to select the most conflicting objectives during the execution of MOEAs. We embed our approach into both Pareto-based and indicator-based MOEAs to analyze the impact of our reduction method on the performance of these algorithms. The results show that our approach significantly improves the performance of Pareto-based MOEAs on both reducible and irreducible MaOPs, but does not much help the performance of indicator-based MOEAs.
BibTeX:
@article{WangY16,
  author = {Handing Wang and Xin Yao},
  title = {Objective Reduction Based on Nonlinear Correlation Information Entropy},
  journal = {Soft Computing},
  year = {2016},
  volume = {20},
  number = {6},
  pages = {2393–2407},
  doi = {http://dx.doi.org/10.1007/s00500-015-1648-y}
}
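The sketch below illustrates the correlation-driven selection step described in the entry above, using plain Pearson correlation as a stand-in for the NCIE measure (which also captures nonlinear correlation): the most redundant objective, i.e. the one most positively correlated with another, is dropped until the requested number of conflicting objectives remains. The synthetic objective data is invented for the example.

# Correlation-based objective reduction (Pearson correlation as a stand-in for NCIE).
import numpy as np

def select_conflicting_objectives(F, keep):
    """F: (n_solutions, n_objectives) objective values; returns indices of retained objectives."""
    corr = np.corrcoef(F, rowvar=False)
    selected = list(range(F.shape[1]))
    while len(selected) > keep:
        sub = corr[np.ix_(selected, selected)].copy()
        np.fill_diagonal(sub, np.nan)
        # the most strongly positively correlated pair contains the most redundancy
        i, j = np.unravel_index(np.nanargmax(sub), sub.shape)
        # drop whichever member of the pair is, on average, more correlated with the rest
        drop = selected[i] if np.nanmean(sub[i]) >= np.nanmean(sub[j]) else selected[j]
        selected.remove(drop)
    return selected

rng = np.random.default_rng(1)
base = rng.random((200, 2))
F = np.column_stack([base[:, 0],
                     2 * base[:, 0] + 0.01 * rng.random(200),  # nearly redundant with objective 0
                     base[:, 1],                               # independent objective
                     1 - base[:, 0]])                          # conflicts with objective 0
print("retained objectives:", select_conflicting_objectives(F, keep=3))
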
Wang, H., Zhang, Q., Jiao, L. and Yao, X. Regularity Model for Noisy Multiobjective Optimization 2016 IEEE Transactions on Cybernetics
Vol. 46(9), pp. 1997-2009 
article DOI  
Abstract: Regularity models have been used in dealing with noise-free multiobjective optimization problems. This paper studies the behavior of a regularity model in noisy environments and argues that it is very suitable for noisy multiobjective optimization. We propose to embed the regularity model in an existing multiobjective evolutionary algorithm for tackling noise. The proposed algorithm works well in terms of both convergence and diversity. In our experimental studies, we have compared several state-of-the-art algorithms with our proposed algorithm on benchmark problems with different levels of noise. The experimental results showed the effectiveness of the regularity model on noisy problems, but a degraded performance on some noise-free problems.
BibTeX:
@article{WangZJY16,
  author = {Handing Wang and Qingfu Zhang and Licheng Jiao and Xin Yao},
  title = {Regularity Model for Noisy Multiobjective Optimization},
  journal = {IEEE Transactions on Cybernetics},
  year = {2016},
  volume = {46},
  number = {9},
  pages = {1997-2009},
  doi = {http://dx.doi.org/10.1109/TCYB.2015.2459137}
}
White, D.R. Guiding Unconstrained Genetic Improvement 2016 Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO '16), pp. 1133-1134  inproceedings DOI  
Abstract: This paper argues that the potential for arbitrary transformation is what differentiates GI from other program transformation work. With great expressive power comes great responsibility, and GI has had mixed success finding effective program repairs and optimisations. The search must be better guided in order to improve solution quality.
BibTeX:
@inproceedings{White16,
  author = {David R. White},
  title = {Guiding Unconstrained Genetic Improvement},
  booktitle = {Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO '16)},
  publisher = {ACM},
  year = {2016},
  pages = {1133-1134},
  doi = {http://dx.doi.org/10.1145/2908961.2931688}
}
Woodward, J.R., Brownlee, A. and Johnson, C.G. Evals is Not Enough: Why We Should Report Wall-clock Time 2016 Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO'16 Companion), pp. 1157-1158  inproceedings DOI  
Abstract: Have you ever noticed that your car never achieves the fuel economy claimed by the manufacturer? Does this seem unfair? Unscientific? Would you like the same situation to occur in Genetic Improvement? Comparison will always be difficult [9], however, guidelines have been discussed [3, 5, 4]. With two GP [8] approaches, comparing the number of evaluations of the fitness function is reasonably fair. This means you are comparing the GP systems, and not how well they are implemented or how fast the language is. However, the situation with GI [6, 1] is unique. With GI we will typically compare systems which are applied to the same application written in the same language (i.e. a GI system targeted at Java may not even be applied to C). Thus, wall-clock time becomes more relevant. This paper therefore asks if reporting the number of evaluations is enough, or if wall-clock time is also important, particularly in the context of GI. It argues that reporting time is even more important when doing GI than when doing traditional GP.
BibTeX:
@inproceedings{WoodwardBJ16,
  author = {John R. Woodward and Alexander Brownlee and Colin G. Johnson},
  title = {Evals is Not Enough: Why We Should Report Wall-clock Time},
  booktitle = {Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO'16 Companion)},
  publisher = {ACM},
  year = {2016},
  pages = {1157-1158},
  doi = {http://dx.doi.org/10.1145/2908961.2931695}
}
Woodward, J.R., Johnson, C.G. and Brownlee, A. Connecting Automatic Parameter Tuning, Genetic Programming as a Hyper-heuristic, and Genetic Improvement Programming 2016 Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO'16 Companion), pp. 1357-1358  inproceedings DOI  
Abstract: Automatically designing algorithms has long been a dream of computer scientists. Early attempts, which generated computer programs from scratch, failed to meet this goal. However, in recent years a number of different technologies have emerged with the alternative goal of taking existing programs and attempting to improve them.

These methods form a range of methodologies, from the "limited" ability to change (for example only the parameters) to the "complete" ability to change the whole program. These include: automatic parameter tuning (APT), using GP as a hyper-heuristic (GPHH), and GI, which we briefly review. Part of research is building links between existing strands of work, and the aim of this paper is to bring together these currently separate approaches.

BibTeX:
@inproceedings{WoodwardJB16,
  author = {John R. Woodward and Colin G. Johnson and Alexander Brownlee},
  title = {Connecting Automatic Parameter Tuning, Genetic Programming as a Hyper-heuristic, and Genetic Improvement Programming},
  booktitle = {Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO'16 Companion)},
  publisher = {ACM},
  year = {2016},
  pages = {1357-1358},
  doi = {http://dx.doi.org/10.1145/2908961.2931728}
}
Woodward, J.R., Johnson, C.G. and Brownlee, A. GP vs GI: If You Can't Beat Them, Join Them 2016 Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO'16 Companion), pp. 1155-1156  inproceedings DOI  
Abstract: Genetic Programming (GP) has been criticized for targeting irrelevant problems [12], and the same is true of the wider machine learning community [11], which has become detached from the source of the data it is using to drive the field forward. However, GI has recently provided a fresh perspective on automated programming. In contrast to GP, GI begins with existing software, and therefore immediately has the aim of tackling real software. As evolution is the main approach in GI to manipulating programs, this connection with real software should persuade the GP community to confront the issues around what it originally set out to tackle, i.e. evolving real software.
BibTeX:
@inproceedings{WoodwardJB16b,
  author = {John R. Woodward and Colin G. Johnson and Alexander Brownlee},
  title = {GP vs GI: If You Can't Beat Them, Join Them},
  booktitle = {Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO'16 Companion)},
  publisher = {ACM},
  year = {2016},
  pages = {1155-1156},
  doi = {http://dx.doi.org/10.1145/2908961.2931694}
}
Wu, X., Consoli, P.A., Minku, L.L., Ochoa, G. and Yao, X. An Evolutionary Hyper-heuristic for the Software Project Scheduling Problem 2016 Proceedings of the 14th International Conference on Parallel Problem Solving from Nature (PPSN '16), pp. 37-47  inproceedings DOI  
Abstract: Software project scheduling plays an important role in reducing the cost and duration of software projects. It is an NP-hard combinatorial optimization problem that has been addressed based on single and multi-objective algorithms. However, such algorithms have always used fixed genetic operators, and it is unclear which operators would be more appropriate across the search process. In this paper, we propose an evolutionary hyper-heuristic to solve the software project scheduling problem. Our novelties include the following: (1) this is the first work to adopt an evolutionary hyper-heuristic for the software project scheduling problem; (2) this is the first work for adaptive selection of both crossover and mutation operators; (3) we design different credit assignment methods for mutation and crossover; and (4) we use a sliding multi-armed bandit strategy to adaptively choose both crossover and mutation operators. The experimental results show that the proposed algorithm can solve the software project scheduling problem effectively.
BibTeX:
@inproceedings{WuCMOY16,
  author = {Xiuli Wu and Pietro A. Consoli and Leandro L. Minku and Gabriela Ochoa and Xin Yao},
  title = {An Evolutionary Hyper-heuristic for the Software Project Scheduling Problem},
  booktitle = {Proceedings of the 14th International Conference on Parallel Problem Solving from Nature (PPSN '16)},
  publisher = {Springer},
  year = {2016},
  pages = {37-47},
  doi = {http://dx.doi.org/10.1007/978-3-319-45823-6_4}
}
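The adaptive operator selection described above can be pictured with the small sliding-window bandit below: the window holds the most recent (operator, reward) pairs, and a UCB-style score balances each operator's average recent credit against how rarely it has been tried. The window length, reward definition and exploration constant are illustrative choices, not the paper's sliding multi-armed bandit settings.

# Sliding-window bandit for adaptive crossover/mutation operator selection.
import math, random
from collections import deque

class SlidingBandit:
    def __init__(self, operators, window=50, c=0.7):
        self.operators = operators
        self.history = deque(maxlen=window)   # (operator index, reward) pairs
        self.c = c

    def select(self):
        counts = [sum(1 for i, _ in self.history if i == k) for k in range(len(self.operators))]
        if any(n == 0 for n in counts):                  # try every operator at least once
            return counts.index(0)
        total = len(self.history)
        means = [sum(r for i, r in self.history if i == k) / counts[k]
                 for k in range(len(self.operators))]
        ucb = [means[k] + self.c * math.sqrt(2 * math.log(total) / counts[k])
               for k in range(len(self.operators))]
        return max(range(len(self.operators)), key=lambda k: ucb[k])

    def update(self, k, reward):
        self.history.append((k, reward))

# Toy usage: operator 1 yields larger simulated fitness improvements on average.
random.seed(0)
bandit = SlidingBandit(operators=["one-point-xover", "uniform-xover", "swap-mutation"])
for _ in range(300):
    k = bandit.select()
    reward = random.gauss(0.2 if k == 1 else 0.1, 0.05)   # simulated fitness gain
    bandit.update(k, max(0.0, reward))
print("final operator choice:", bandit.operators[bandit.select()])
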
Wu, F., Harman, M., Jia, Y. and Krinke, J. HOMI: Searching Higher Order Mutants for Software Improvement 2016 Proceedings of the 8th International Symposium on Search Based Software Engineering (SSBSE '16), pp. 18-33  inproceedings DOI  
Abstract: This paper introduces HOMI, a Higher Order Mutation based approach for Genetic Improvement of software, in which the code modification granularity is finer than in previous work while scalability remains. HOMI applies the NSGA-II algorithm to search for higher order mutants that improve the non-functional properties of a program while passing all its regression tests. Experimental results on four real-world C programs show that up to 14.7 % improvement in execution time and 19.7 % in memory use are found using only First Order Mutants. By combining these First Order Mutants, HOMI found further improvement in Higher Order Mutants, giving an 18.2 % improvement in time performance while keeping the memory improvement. A further manual analysis suggests that 88 % of the mutation changes cannot be generated using line based ‘plastic surgery’ Genetic Improvement approaches.
BibTeX:
@inproceedings{WuHJK16,
  author = {Fan Wu and Mark Harman and Yue Jia and Jens Krinke},
  title = {HOMI: Searching Higher Order Mutants for Software Improvement},
  booktitle = {Proceedings of the 8th International Symposium on Search Based Software Engineering (SSBSE '16)},
  publisher = {Springer},
  year = {2016},
  pages = {18-33},
  doi = {http://dx.doi.org/10.1007/978-3-319-47106-8_2}
}
Xue, B., Zhang, M., Browne, W.N. and Yao, X. A Survey on Evolutionary Computation Approaches to Feature Selection 2016 IEEE Transactions on Evolutionary Computation
Vol. 20(4), pp. 606-626 
article DOI  
Abstract: Feature selection is an important task in data mining and machine learning to reduce the dimensionality of the data and increase the performance of an algorithm, such as a classification algorithm. However, feature selection is a challenging task due mainly to the large search space. A variety of methods have been applied to solve feature selection problems, where evolutionary computation (EC) techniques have recently gained much attention and shown some success. However, there are no comprehensive guidelines on the strengths and weaknesses of alternative approaches. This leads to a disjointed and fragmented field with ultimately lost opportunities for improving performance and successful applications. This paper presents a comprehensive survey of the state-of-the-art work on EC for feature selection, which identifies the contributions of these different algorithms. In addition, current issues and challenges are also discussed to identify promising areas for future research.
BibTeX:
@article{XueZBY16,
  author = {Bing Xue and Mengjie Zhang and Will N. Browne and Xin Yao},
  title = {A Survey on Evolutionary Computation Approaches to Feature Selection},
  journal = {IEEE Transactions on Evolutionary Computation},
  year = {2016},
  volume = {20},
  number = {4},
  pages = {606-626},
  doi = {http://dx.doi.org/10.1109/TEVC.2015.2504420}
}
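Most EC approaches surveyed above share a simple core: candidate feature subsets are encoded as bit strings and scored by a wrapper fitness that trades classification accuracy against subset size. The sketch below shows that encoding and fitness with a nearest-centroid classifier and random search standing in for a full evolutionary loop; all of it is illustrative rather than any specific surveyed method.

# Bit-string encoding and wrapper fitness for EC-style feature selection.
import numpy as np

rng = np.random.default_rng(0)
n, n_features, n_informative = 200, 10, 3
X = rng.normal(size=(n, n_features))
y = (X[:, :n_informative].sum(axis=1) > 0).astype(int)   # only the first 3 features matter

def fitness(mask):
    """Wrapper fitness: nearest-centroid accuracy on the selected features minus a size penalty."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = (np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)).astype(int)
    accuracy = (pred == y).mean()
    return accuracy - 0.01 * mask.sum()

# Random search stands in for the selection/crossover/mutation loop of a real EC method.
best_mask, best_fit = None, -np.inf
for _ in range(500):
    mask = rng.random(n_features) < 0.5
    f = fitness(mask)
    if f > best_fit:
        best_mask, best_fit = mask, f
print("selected features:", np.flatnonzero(best_mask), "fitness: %.3f" % best_fit)
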
Yang, Z., Sendhoff, B., Tang, K. and Yao, X. Target shape design optimization by evolving B-splines with cooperative coevolution 2016 Applied Soft Computing
Vol. 48, pp. 672-682 
article  
Abstract: With high reputation in handling non-linear and multi-model problems with little prior knowledge, evolutionary algorithms (EAs) have successfully been applied to design optimization problems as robust optimizers. Since real-world design optimization is often computationally expensive, target shape design optimization problems (TSDOPs) have been frequently used as efficient miniature model to check algorithmic performance for general shape design. There are at least three important issues in developing EAs for TSDOPs, i.e., design representation, fitness evaluation and evolution paradigm. Existing work has mainly focused on the first two issues, in which (1) an adaptive encoding scheme with B-spline has been proposed as a representation, and (2) a symmetric Hausdorff distance based metric has been used as a fitness function. But for the third issue, off-the-shelf EAs were used directly to evolve B-spline control points and/or knot vector. In this paper, we first demonstrate why it is unreasonable to evolve the control points and knot vector simultaneously. And then a new coevolutionary paradigm is proposed to evolve the control points and knot vector of B-spline separately in a cooperative manner. In the new paradigm, an initial population is generated for both the control points, and the knot vector. The two populations are evolved mostly separately in a round-robin fashion, with only cooperation at the fitness evaluation phase. The new paradigm has at least two significant advantages over conventional EAs. Firstly, it provides a platform to evolve both the control points and knot vector reasonably. Secondly, it reduces the difficulty of TSDOPs by decomposing the objective vector into two smaller subcomponents (i.e., control points and knot vector). To evaluate the efficacy of the proposed coevolutionary paradigm, an algorithm named CMA-ES-CC was formulated. Experimental studies were conducted based on two target shapes. The comparison with six other EAs suggests that the proposed cooperative coevolution paradigm is very effective for TSDOPs.
BibTeX:
@article{YangSTXY16,
  author = {Zhenyu Yang and Bernhard Sendhoff and Ke Tang and Xin Yao},
  title = {Target shape design optimization by evolving B-splines with cooperative coevolution},
  journal = {Applied Soft Computing},
  year = {2016},
  volume = {48},
  pages = {672-682}
}
Yuan, B., Li, B., Chen, H. and Yao, X. Defect- and Variation-Tolerant Logic Mapping in Nanocrossbar using Bipartite Matching and Memetic Algorithm 2016 IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Vol. 24(9), pp. 2813-2826 
article DOI  
Abstract: High defect density and extreme parameter variation make it very difficult to implement reliable logic functions in crossbar-based nanoarchitectures. It is a major design challenge to tolerate defects and variations simultaneously for such architectures. In this paper, a method based on a bipartite matching and memetic algorithm is proposed for defect- and variation-tolerant logic mapping (D/VTLM) problem in crossbar-based nanoarchitectures. In the proposed method, the search space of the D/VTLM problem can be dramatically reduced through the introduction of the min-max weight maximum-bipartite-matching (MMW-MBM) and a related heuristic bipartite matching method. MMW-MBM is defined on a weighted bipartite graph as an MBM, where the maximal weight of the edges in the matching has a minimal value. In addition, a defect- and variation-aware local search (D/VALS) operator is proposed for D/VTLM and embedded in a global search framework. The D/VALS operator is able to utilize the domain knowledge extracted from problem instances and, thus, has the potential to search the solution space more efficiently. Compared with the state-of-the-art heuristic and recursive algorithms, and a simulated annealing algorithm, the good performance of our proposed method is verified on a 3-bit adder and a large set of random benchmarks of various scales.
BibTeX:
@article{YuanLCY16,
  author = {Bo Yuan and Bin Li and Huanhuan Chen and Xin Yao},
  title = {Defect- and Variation-Tolerant Logic Mapping in Nanocrossbar using Bipartite Matching and Memetic Algorithm},
  journal = {IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
  year = {2016},
  volume = {24},
  number = {9},
  pages = {2813-2826},
  doi = {http://dx.doi.org/10.1109/TVLSI.2016.2530898}
}
Yuan, Y., Xu, H., Wang, B. and Yao, X. A New Dominance Relation-Based Evolutionary Algorithm for Many-Objective Optimization 2016 IEEE Transactions on Evolutionary Computation
Vol. 20(1), pp. 16-37 
article DOI  
Abstract: Many-objective optimization has posed a great challenge to the classical Pareto dominance-based multiobjective evolutionary algorithms (MOEAs). In this paper, an evolutionary algorithm based on a new dominance relation is proposed for many-objective optimization. The proposed evolutionary algorithm aims to enhance the convergence of the recently suggested nondominated sorting genetic algorithm III by exploiting the fitness evaluation scheme in the MOEA based on decomposition, but still inherit the strength of the former in diversity maintenance. In the proposed algorithm, the nondominated sorting scheme based on the introduced new dominance relation is employed to rank solutions in the environmental selection phase, ensuring both convergence and diversity. The proposed algorithm is evaluated on a number of well-known benchmark problems having 3-15 objectives and compared against eight state-of-the-art algorithms. The extensive experimental results show that the proposed algorithm can work well on almost all the test functions considered in this paper, and it is compared favorably with the other many-objective optimizers. Additionally, a parametric study is provided to investigate the influence of a key parameter in the proposed algorithm.
BibTeX:
@article{YuanXWY16,
  author = {Yuan Yuan and Hua Xu and Bo Wang and Xin Yao},
  title = {A New Dominance Relation-Based Evolutionary Algorithm for Many-Objective Optimization},
  journal = {IEEE Transactions on Evolutionary Computation},
  year = {2016},
  volume = {20},
  number = {1},
  pages = {16-37},
  doi = {http://dx.doi.org/10.1109/TEVC.2015.2420112}
}
Yuan, Y., Xu, H., Wang, B., Zhang, B. and Yao, X. Balancing Convergence and Diversity in Decomposition-Based Many-Objective Optimizers 2016 IEEE Transactions on Evolutionary Computation
Vol. 20(2), pp. 180-198 
article DOI  
Abstract: The decomposition-based multiobjective evolutionary algorithms generally make use of aggregation functions to decompose a multiobjective optimization problem into multiple single-objective optimization problems. However, due to the nature of contour lines for the adopted aggregation functions, they usually fail in preserving the diversity in high-dimensional objective space even by using diverse weight vectors. To address this problem, we propose to maintain the desired diversity of solutions in their evolutionary process explicitly by exploiting the perpendicular distance from the solution to the weight vector in the objective space, which achieves better balance between convergence and diversity in many-objective optimization. The idea is implemented to enhance two well-performing decomposition-based algorithms, i.e., multiobjective evolutionary algorithms based on decomposition and ensemble fitness ranking. The two enhanced algorithms are compared to several state-of-the-art algorithms, and a series of comparative experiments are conducted on a number of test problems from two well-known test suites. The experimental results show that the two proposed algorithms are generally more effective than their predecessors in balancing convergence and diversity, and they are also very competitive against other existing algorithms for solving many-objective optimization problems.
BibTeX:
@article{YuanXWZY16,
  author = {Yuan Yuan and Hua Xu and Bo Wang and Bo Zhang and Xin Yao},
  title = {Balancing Convergence and Diversity in Decomposition-Based Many-Objective Optimizers},
  journal = {IEEE Transactions on Evolutionary Computation},
  year = {2016},
  volume = {20},
  number = {2},
  pages = {180-198},
  doi = {http://dx.doi.org/10.1109/TEVC.2015.2443001}
}
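The diversity measure at the heart of the two enhanced algorithms above is the perpendicular distance from a solution's objective vector to the line spanned by a weight vector, as in this short worked example (the vectors are made up):

# Perpendicular distance from an objective vector to a weight vector's search direction.
import numpy as np

def perpendicular_distance(f, w):
    """Distance from objective vector f to the line spanned by weight vector w."""
    f = np.asarray(f, dtype=float)
    w = np.asarray(w, dtype=float)
    projection = (np.dot(f, w) / np.dot(w, w)) * w
    return float(np.linalg.norm(f - projection))

w = np.array([0.5, 0.5])
print(perpendicular_distance([1.0, 1.0], w))   # 0.0  -> lies exactly on the weight vector
print(perpendicular_distance([2.0, 0.0], w))   # ~1.41 -> far from this search direction
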
Azzeh, M., Nassif, A.B. and Minku, L.L. An Empirical Evaluation of Ensemble Adjustment Methods for Analogy-Based Effort Estimation 2015 Journal of Systems and Software
Vol. 103, pp. 36-52 
article DOI  
Abstract: Context: Effort adjustment is an essential part of analogy-based effort estimation, used to tune and adapt nearest analogies in order to produce more accurate estimations. Currently, there are plenty of adjustment methods proposed in literature, but there is no consensus on which method produces more accurate estimates and under which settings.
Objective: This paper investigates the potential of ensemble learning for variants of adjustment methods used in analogy-based effort estimation. The number k of analogies to be used is also investigated.
Method: We perform a large scale comparison study where many ensembles constructed from n out of 40 possible valid variants of adjustment methods are applied to eight datasets. The performance of each method was evaluated based on standardized accuracy and effect size.
Results: The results have been subjected to statistical significance testing, and show reasonable significant improvements on the predictive performance where ensemble methods are applied.
Conclusion: Our conclusions suggest that ensembles of adjustment methods can work well and achieve good performance, even though they are not always superior to single methods. We also recommend constructing ensembles from only linear adjustment methods, as they have shown better performance and were frequently ranked higher.
BibTeX:
@article{AzzehNM15,
  author = {Mohammad Azzeh and Ali Bou Nassif and Leandro L. Minku},
  title = {An Empirical Evaluation of Ensemble Adjustment Methods for Analogy-Based Effort Estimation},
  journal = {Journal of Systems and Software},
  year = {2015},
  volume = {103},
  pages = {36-52},
  doi = {http://dx.doi.org/10.1016/j.jss.2015.01.028}
}
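For readers unfamiliar with analogy-based effort estimation, the sketch below shows one simple adjustment variant of the kind the study ensembles: retrieve the k most similar historical projects and scale each analogy's effort by the size ratio before averaging. The single-feature similarity, the linear size adjustment and the toy dataset are illustrative choices, not any specific one of the 40 variants evaluated in the paper.

# Analogy-based effort estimation with a linear size adjustment of the k nearest analogies.
import numpy as np

def estimate_effort(target_size, projects, k=3):
    """projects: list of (size, effort) pairs from a historical dataset."""
    sizes = np.array([p[0] for p in projects], dtype=float)
    efforts = np.array([p[1] for p in projects], dtype=float)
    nearest = np.argsort(np.abs(sizes - target_size))[:k]
    # Linear adjustment: scale each analogy's effort by the size ratio, then average.
    adjusted = efforts[nearest] * (target_size / sizes[nearest])
    return float(np.mean(adjusted))

history = [(120, 900), (300, 2500), (80, 610), (150, 1200), (500, 4300)]
print("estimated effort:", estimate_effort(target_size=200, projects=history))
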
Barr, E.T., Harman, M., Jia, Y., Marginean, A. and Petke, J. Automated Software Transplantation 2015 Proceedings of the 2015 International Symposium on Software Testing and Analysis (ISSTA '15), pp. 257-269  inproceedings DOI  
Abstract: Automated transplantation would open many exciting avenues for software development: suppose we could autotransplant code from one system into another, entirely unrelated, system. This paper introduces a theory, an algorithm, and a tool that achieve this. Leveraging lightweight annotation, program analysis identifies an organ (interesting behavior to transplant); testing validates that the organ exhibits the desired behavior during its extraction and after its implantation into a host. While we do not claim automated transplantation is now a solved problem, our results are encouraging: we report that in 12 of 15 experiments, involving 5 donors and 3 hosts (all popular real-world systems), we successfully autotransplanted new functionality and passed all regression tests. Autotransplantation is also already useful: in 26 hours computation time we successfully autotransplanted the H.264 video encoding functionality from the x264 system to the VLC media player; compare this to upgrading x264 within VLC, a task that we estimate, from VLC's version history, took human programmers an average of 20 days of elapsed, as opposed to dedicated, time.
BibTeX:
@inproceedings{BarrHJMP15,
  author = {Earl T. Barr and Mark Harman and Yue Jia and Alexandru Marginean and Justyna Petke},
  title = {Automated Software Transplantation},
  booktitle = {Proceedings of the 2015 International Symposium on Software Testing and Analysis (ISSTA '15)},
  publisher = {ACM},
  year = {2015},
  pages = {257-269},
  doi = {http://dx.doi.org/10.1145/2771783.2771796}
}
Bruce, B.R. Energy Optimisation via Genetic Improvement: A SBSE technique for a new era in Software Development 2015 Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15) - the 1st International Genetic Improvement Workshop (GI '15), pp. 819-820  inproceedings DOI  
BibTeX:
@inproceedings{Bruce15,
  author = {Bobby R. Bruce},
  title = {Energy Optimisation via Genetic Improvement: A SBSE technique for a new era in Software Development},
  booktitle = {Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15) - the 1st International Genetic Improvement Workshop (GI '15)},
  publisher = {ACM},
  year = {2015},
  pages = {819-820},
  doi = {http://dx.doi.org/10.1145/2739482.2768420}
}
Bruce, B.R., Petke, J. and Harman, M. Reducing Energy Consumption using Genetic Improvement 2015 Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15), pp. 1327-1334  inproceedings DOI  
Abstract: Genetic Improvement (GI) is an area of Search Based Software Engineering which seeks to improve software's non-functional properties by treating program code as if it were genetic material which is then evolved to produce more optimal solutions. Hitherto, the majority of focus has been on optimising program's execution time which, though important, is only one of many non-functional targets. The growth in mobile computing, cloud computing infrastructure, and ecological concerns are forcing developers to focus on the energy their software consumes. We report on investigations into using GI to automatically find more energy efficient versions of the MiniSAT Boolean satisfiability solver when specialising for three downstream applications. Our results find that GI can successfully be used to reduce energy consumption by up to 25%.
BibTeX:
@inproceedings{BrucePH15,
  author = {Bobby R. Bruce and Justyna Petke and Mark Harman},
  title = {Reducing Energy Consumption using Genetic Improvement},
  booktitle = {Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15)},
  publisher = {ACM},
  year = {2015},
  pages = {1327-1334},
  doi = {http://dx.doi.org/10.1145/2739480.2754752}
}
Burles, N., Bowles, E., Brownlee, A.E.I., Kocsis, Z.A., Swan, J. and Veerapen, N. Object-Oriented Genetic Improvement for Improved Energy Consumption in Google Guava 2015 Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE ’15), pp. 255-261  inproceedings DOI  
Abstract: In this work we use metaheuristic search to improve Google’s Guava library, finding a semantically equivalent version of com.google.common.collect.ImmutableMultimap with reduced energy consumption. Semantics-preserving transformations are found in the source code, using the principle of subtype polymorphism. We introduce a new tool, Opacitor, to deterministically measure the energy consumption, and find that a statistically significant reduction to Guava’s energy consumption is possible. We corroborate these results using Jalen, and evaluate the performance of the metaheuristic search compared to an exhaustive search—finding that the same result is achieved while requiring almost 200 times fewer fitness evaluations. Finally, we compare the metaheuristic search to an independent exhaustive search at each variation point, finding that the metaheuristic has superior performance.
BibTeX:
@inproceedings{BurlesBBKSV15,
  author = {Nathan Burles and Edward Bowles and Alexander E. I. Brownlee and Zoltan A. Kocsis and Jerry Swan and Nadarajen Veerapen},
  title = {Object-Oriented Genetic Improvement for Improved Energy Consumption in Google Guava},
  booktitle = {Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE ’15)},
  publisher = {Springer},
  year = {2015},
  pages = {255-261},
  doi = {http://dx.doi.org/10.1007/978-3-319-22183-0_20}
}
Burles, N., Bowles, E., Bruce, B.R. and Srivisut, K. Specialising Guava’s Cache to Reduce Energy Consumption 2015 Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE ’15), pp. 276-281  inproceedings DOI  
Abstract: In this article we use a Genetic Algorithm to perform parameter tuning on Google Guava’s Cache library, specialising it to OpenTripPlanner. A new tool, Opacitor, is used to deterministically measure the energy consumed, and we find that the energy consumption of OpenTripPlanner may be significantly reduced by tuning the default parameters of Guava’s Cache library. Finally we use Jalen, which uses time and CPU utilisation as a proxy to calculate energy consumption, to corroborate these results.
BibTeX:
@inproceedings{BurlesBBS15,
  author = {Nathan Burles and Edward Bowles and Bobby R. Bruce and Komsan Srivisut},
  title = {Specialising Guava’s Cache to Reduce Energy Consumption},
  booktitle = {Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE ’15)},
  publisher = {Springer},
  year = {2015},
  pages = {276-281},
  doi = {http://dx.doi.org/10.1007/978-3-319-22183-0_23}
}
Burles, N., Swan, J., Bowles, E., Brownlee, A.E.I., Kocsis, Z.A. and Veerapen, N. Embedded Dynamic Improvement 2015 Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15) - the 1st International Genetic Improvement Workshop (GI '15), pp. 831-832  inproceedings DOI  
Abstract: We discuss the useful role that can be played by a subtype of improvement programming, which we term `Embedded Dynamic Improvement'. In this approach, developer-specified variation points define the scope of improvement. A search framework is embedded at these variation points, facilitating the creation of adaptive software that can respond online to changes in its execution environment.
BibTeX:
@inproceedings{BurlesSBBKV15,
  author = {Nathan Burles and Jerry Swan and Edward Bowles and Alexander E. I. Brownlee and Zoltan A. Kocsis and Nadarajen Veerapen},
  title = {Embedded Dynamic Improvement},
  booktitle = {Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15) - the 1st International Genetic Improvement Workshop (GI '15)},
  publisher = {ACM},
  year = {2015},
  pages = {831-832},
  doi = {http://dx.doi.org/10.1145/2739482.2768423}
}
Du, X., Ni, Y., Ye, P., Yao, X., Minku, L.L. and Xiao, R. An Evolutionary Algorithm for Performance Optimization at Software Architecture Level 2015 Proceedings of IEEE Congress on Evolutionary Computation (CEC '15), pp. 2129 - 2136  inproceedings DOI  
Abstract: Architecture-based software performance optimization can not only significantly save time but also reduce cost. A few rule-based performance optimization approaches at software architecture (SA) level have been proposed in recent years. However, in these approaches, the number of rules being used and the order of application of each rule are uncertain in the optimization process and these uncertainties have not been fully considered so far. As a result, the search space for performance improvement is limited, possibly excluding optimal solutions. Aiming to solve this problem, we propose an evolutionary algorithm for rule-based performance optimization at SA level named EA4PO. First, the rule-based software performance optimization at SA level is abstracted into a mathematical model called RPOM. RPOM can precisely characterize the mathematical relation between the usage of rules and the optimal solution in the performance improvement space. Then, a framework named RSEF is designed to support the execution of rule sequences. Based on RPOM and RSEF, EA4PO is proposed to find the optimal performance improvement solution. In EA4PO, an adaptive mutation operator is designed to guide the search direction by fully considering heuristic information of rule usage during the evolution. Finally, the effectiveness of EA4PO is validated by comparing EA4PO with a typical rule-based approach. The results show that EA4PO can explore a relatively larger space and get better solutions.
BibTeX:
@inproceedings{DuNYYMX15,
  author = {Xin Du and Youcong Ni and Peng Ye and Xin Yao and Leandro L. Minku and Ruliang Xiao},
  title = {An Evolutionary Algorithm for Performance Optimization at Software Architecture Level},
  booktitle = {Proceedings of IEEE Congress on Evolutionary Computation (CEC '15)},
  publisher = {IEEE},
  year = {2015},
  pages = {2129 - 2136},
  doi = {http://dx.doi.org/10.1109/CEC.2015.7257147}
}
Fu, H., Sendhoff, B., Tang, K. and Yao, X. Robust Optimization Over Time: Problem Difficulties and Benchmark Problems 2015 IEEE Transactions on Evolutionary Computation
Vol. 19(5), pp. 731-745 
article DOI  
Abstract: The focus of most research in evolutionary dynamic optimization has been tracking moving optimum (TMO). Yet, TMO does not capture all the characteristics of real-world dynamic optimization problems (DOPs), especially in situations where a solution's future fitness has to be considered. To account for a solution's future fitness explicitly, we propose to find robust solutions to DOPs, which are formulated as the robust optimization over time (ROOT) problem. In this paper we analyze two robustness definitions in ROOT and then develop two types of benchmark problems for the two robustness definitions in ROOT, respectively. The two types of benchmark problems are motivated by the inappropriateness of existing DOP benchmarks for the study of ROOT. Additionally, we evaluate four representative methods from the literature on our proposed ROOT benchmarks, in order to gain a better understanding of ROOT problems and their relationship to more popular TMO problems. The experimental results are analyzed, which show the strengths and weaknesses of different methods in solving ROOT problems with different dynamics. In particular, the real challenges of ROOT problems have been revealed for the first time by the experimental results on our proposed ROOT benchmarks.
BibTeX:
@article{FuSTY15,
  author = {Haobo Fu and Bernhard Sendhoff and Ke Tang and Xin Yao},
  title = {Robust Optimization Over Time: Problem Difficulties and Benchmark Problems},
  journal = {IEEE Transactions on Evolutionary Computation},
  year = {2015},
  volume = {19},
  number = {5},
  pages = {731-745},
  doi = {http://dx.doi.org/10.1109/TEVC.2014.2377125}
}
Haraldsson, S.O. and Woodward, J.R. Genetic Improvement of Energy Usage is only as Reliable as the Measurements are Accurate 2015 Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15) - the 1st International Genetic Improvement Workshop (GI '15), pp. 821-822  inproceedings DOI  
Abstract: Energy has recently become an objective for Genetic Improvement. Measuring software energy use is complicated, which might tempt us to use simpler measurements. However, if we base GI on inaccurate measurements we cannot expect improvements. This paper seeks to highlight important issues when evaluating the energy use of programs.
BibTeX:
@inproceedings{HaraldssonW15,
  author = {Saemundur O. Haraldsson and John R. Woodward},
  title = {Genetic Improvement of Energy Usage is only as Reliable as the Measurements are Accurate},
  booktitle = {Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15) - the 1st International Genetic Improvement Workshop (GI '15)},
  publisher = {ACM},
  year = {2015},
  pages = {821-822},
  doi = {http://dx.doi.org/10.1145/2739482.2768421}
}
Harman, M., Jia, Y. and Zhang, Y. Achievements, Open Problems and Challenges for Search Based Software Testing (keynote) 2015 Proceeding of the 8th IEEE International Conference on Software Testing, Verification and Validation (ICST '15), pp. 1-12  inproceedings DOI  
Abstract: Search Based Software Testing (SBST) formulates testing as an optimisation problem, which can be attacked using computational search techniques from the field of Search Based Software Engineering (SBSE). We present an analysis of the SBST research agenda, focusing on the open problems and challenges of testing non-functional properties, in particular a topic we call 'Search Based Energy Testing' (SBET), Multi-objective SBST and SBST for Test Strategy Identification. We conclude with a vision of FIFIVERIFY tools, which would automatically find faults, fix them and verify the fixes. We explain why we think such FIFIVERIFY tools constitute an exciting challenge for the SBSE community that already could be within its reach.
BibTeX:
@inproceedings{HarmanJZ15,
  author = {Mark Harman and Yue Jia and Yuanyuan Zhang},
  title = {Achievements, Open Problems and Challenges for Search Based Software Testing (keynote)},
  booktitle = {Proceeding of the 8th IEEE International Conference on Software Testing, Verification and Validation (ICST '15)},
  publisher = {IEEE},
  year = {2015},
  pages = {1-12},
  doi = {http://dx.doi.org/10.1109/ICST.2015.7102580}
}
Harman, M. and Petke, J. GI4GI: Improving Genetic Improvement Fitness Functions 2015 Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15) - the 1st International Genetic Improvement Workshop (GI '15), pp. 793-794  inproceedings DOI  
Abstract: Genetic improvement (GI) has been successfully used to optimise non-functional properties of software, such as execution time, by automatically manipulating a program's source code. Measurement of non-functional properties, however, is a non-trivial task; energy consumption, for instance, is highly dependent on the hardware used. Therefore, we propose the GI4GI framework (and two illustrative applications). GI4GI first applies GI to improve the fitness function for the particular environment within which software is subsequently optimised using traditional GI.
BibTeX:
@inproceedings{HarmanP15,
  author = {Mark Harman and Justyna Petke},
  title = {GI4GI: Improving Genetic Improvement Fitness Functions},
  booktitle = {Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15) - the 1st International Genetic Improvement Workshop (GI '15)},
  publisher = {ACM},
  year = {2015},
  pages = {793-794},
  doi = {http://dx.doi.org/10.1145/2739482.2768415}
}
He, J., Chen, T. and Yao, X. On the Easiest and Hardest Fitness Functions 2015 IEEE Transactions on Evolutionary Computation
Vol. 19(2), pp. 295-305 
article DOI  
Abstract: The hardness of fitness functions is an important research topic in the field of evolutionary computation. In theory, this paper can help with understanding the ability of evolutionary algorithms (EAs). In practice, this paper may provide a guideline to the design of benchmarks. The aim of this paper is to answer the following research questions. Given a fitness function class, which functions are the easiest with respect to an EA? Which are the hardest? How are these functions constructed? This paper provides theoretical answers to these questions. The easiest and hardest fitness functions are constructed for an elitist (1 + 1) EA to maximize a class of fitness functions with the same optima. It is demonstrated that the unimodal functions are the easiest and deceptive functions are the hardest in terms of the time-based fitness landscape. This paper also reveals that in a fitness function class, the easiest function to one algorithm may become the hardest to another algorithm, and vice versa.
BibTeX:
@article{HeCY15,
  author = {Jun He and Tianshi Chen and Xin Yao},
  title = {On the Easiest and Hardest Fitness Functions},
  journal = {IEEE Transactions on Evolutionary Computation},
  year = {2015},
  volume = {19},
  number = {2},
  pages = {295-305},
  doi = {http://dx.doi.org/10.1109/TEVC.2014.2318025}
}
Jia, Y. Hyperheuristic Search for SBST 2015 Proceedings of IEEE/ACM 8th International Workshop on Search-Based Software Testing (SBST '15), pp. 15-16  inproceedings DOI  
Abstract: This paper argues that incorporating hyper heuristic techniques into existing SBST approaches could help to increase their applicability and generality. We propose a general two layer selective hyper heuristic approach for SBST and provide an example of its use for Combinatorial Interaction Testing (CIT).
BibTeX:
@inproceedings{Jia15,
  author = {Yue Jia},
  title = {Hyperheuristic Search for SBST},
  booktitle = {Proceedings of IEEE/ACM 8th International Workshop on Search-Based Software Testing (SBST '15)},
  year = {2015},
  pages = {15-16},
  doi = {http://dx.doi.org/10.1109/SBST.2015.10}
}
Jia, Y., Cohen, M.B., Harman, M. and Petke, J. Learning Combinatorial Interaction Test Generation Strategies using Hyperheuristic Search 2015 Proceedings of IEEE/ACM 37th IEEE International Conference on Software Engineering (ICSE ‘15), pp. 540-550  inproceedings DOI  
Abstract: The surge of search based software engineering research has been hampered by the need to develop customized search algorithms for different classes of the same problem. For instance, two decades of bespoke Combinatorial Interaction Testing (CIT) algorithm development, our exemplar problem, has left software engineers with a bewildering choice of CIT techniques, each specialized for a particular task. This paper proposes the use of a single hyperheuristic algorithm that learns search strategies across a broad range of problem instances, providing a single generalist approach. We have developed a Hyperheuristic algorithm for CIT, and report experiments that show that our algorithm competes with known best solutions across constrained and unconstrained problems: For all 26 real-world subjects, it equals or outperforms the best result previously reported in the literature. We also present evidence that our algorithm's strong generic performance results from its unsupervised learning. Hyperheuristic search is thus a promising way to relocate CIT design intelligence from human to machine.
BibTeX:
@inproceedings{JiaCHP15,
  author = {Yue Jia and Myra B. Cohen and Mark Harman and Justyna Petke},
  title = {Learning Combinatorial Interaction Test Generation Strategies using Hyperheuristic Search},
  booktitle = {Proceedings of IEEE/ACM 37th IEEE International Conference on Software Engineering (ICSE ‘15)},
  publisher = {IEEE},
  year = {2015},
  pages = {540-550},
  doi = {http://dx.doi.org/10.1109/ICSE.2015.71}
}
Jia, Y., Harman, M., Langdon, W.B. and Marginean, A. Grow and Serve: Growing Django Citation Services using SBSE 2015 Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE ’15), pp. 269-275  inproceedings DOI  
Abstract: We introduce a ‘grow and serve’ approach to Genetic Improvement (GI) that grows new functionality as a web service running on the Django platform. Using our approach, we successfully grew and released a citation web service. This web service can be invoked by existing applications to introduce a new citation counting feature. We demonstrate that GI can grow genuinely useful code in this way, so we deployed the SBSE-grown web service into widely-used publications repositories, such as the GP bibliography. In the first 24 hours of deployment alone, the service was used to provide GP bibliography citation data 369 times from 29 countries.
BibTeX:
@inproceedings{JiaHLM15,
  author = {Yue Jia and Mark Harman and William B. Langdon and Alexandru Marginean},
  title = {Grow and Serve: Growing Django Citation Services using SBSE},
  booktitle = {Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE ’15)},
  publisher = {Springer},
  year = {2015},
  pages = {269-275},
  doi = {http://dx.doi.org/10.1007/978-3-319-22183-0_22}
}
Jia, Y., Merayo, M. and Harman, M. Introduction to the special issue on Mutation Testing 2015 Journal of Software: Testing, Verification and Reliability
Vol. 25(5-7), pp. 461-463 
article DOI  
BibTeX:
@article{JiaMH15,
  author = {Yue Jia and Mercedes Merayo and Mark Harman},
  title = {Introduction to the special issue on Mutation Testing},
  journal = {Journal of Software: Testing, Verification and Reliability},
  year = {2015},
  volume = {25},
  number = {5-7},
  pages = {461-463},
  doi = {http://dx.doi.org/10.1002/stvr.1582}
}
Jia, Y., Wu, F., Harman, M. and Krinke, J. Genetic Improvement using Higher Order Mutation 2015 Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15) - the 1st International Genetic Improvement Workshop (GI '15), pp. 803-804  inproceedings DOI  
Abstract: This paper presents a brief outline of a higher-order mutation-based framework for Genetic Improvement (GI). We argue that search-based higher-order mutation testing can be used to implement a form of genetic programming (GP) to increase the search granularity and testability of GI.
BibTeX:
@inproceedings{JiaWHK15,
  author = {Yue Jia and Fan Wu and Mark Harman and Jens Krinke},
  title = {Genetic Improvement using Higher Order Mutation},
  booktitle = {Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15) - the 1st International Genetic Improvement Workshop (GI '15)},
  publisher = {ACM},
  year = {2015},
  pages = {803-804},
  doi = {http://dx.doi.org/10.1145/2739482.2768417}
}
Johnson, C.G. and Woodward, J.R. Fitness as Task-relevant Information Accumulation 2015 Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation (GECCO '15), pp. 855-856  inproceedings DOI  
Abstract: "If you cannot measure it, you cannot improve it. Lord Kelvin Fitness in GP/GI is usually a short-sighted greedy fitness function counting the number of satisfied test cases (or some other score based on error). If GP/GI is to be extended to successfully tackle "full software systems", which is the stated domain of Genetic Improvement, with loops, conditional statements and function calls, then this kind of fitness will fail to scale. One alternative approach is to measure the fitness gain in terms of the accumulated information at each executed step of the program. This paper discusses methods for measuring the way in which programs accumulate information relevant to their task as they run, by building measures of this information gain based on information theory and model complexity.
BibTeX:
@inproceedings{JohnsonW15,
  author = {Colin G. Johnson and John R. Woodward},
  title = {Fitness as Task-relevant Information Accumulation},
  booktitle = {Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation (GECCO '15)},
  publisher = {ACM},
  year = {2015},
  pages = {855-856},
  doi = {http://dx.doi.org/10.1145/2739482.2768428}
}
Kocsis, Z.A., Brownlee, A.E.I., Swan, J. and Senington, R. Haiku - a Scala Combinator Toolkit for Semi-automated Composition of Metaheuristics 2015 Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE ’15), pp. 125-140  inproceedings DOI  
Abstract: There is an emerging trend towards the automated design of metaheuristics at the software component level. In principle, metaheuristics have a relatively clean decomposition, where well-known frameworks such as ILS and EA are parametrised by variant components for acceptance, perturbation etc. Automated generation of these frameworks is not so simple in practice, since the coupling between components may be implementation specific. Compositionality is the ability to freely express a space of designs ‘bottom up’ in terms of elementary components: previous work in this area has used combinators, a modular and functional approach to componentisation arising from foundational Computer Science. In this article, we describe Haiku, a combinator tool-kit written in the Scala language, which builds upon previous work to further automate the process by automatically composing the external dependencies of components. We provide examples of use and give a case study in which a programmatically generated heuristic is applied to the Travelling Salesman Problem within an Evolutionary Strategies framework.
BibTeX:
@inproceedings{KocsisBSS15,
  author = {Zoltan A. Kocsis and Alexander E. I. Brownlee and Jerry Swan and Richard Senington},
  title = {Haiku - a Scala Combinator Toolkit for Semi-automated Composition of Metaheuristics},
  booktitle = {Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE ’15)},
  publisher = {Springer},
  year = {2015},
  pages = {125-140},
  doi = {http://dx.doi.org/10.1007/978-3-319-22183-0_9}
}
Krawiec, K., Swan, J. and O’Reilly, U.-M. Behavioral Program Synthesis: Insights and Prospects 2015 Genetic Programming Theory and Practice XIII  incollection URL 
Abstract: Genetic programming (GP) is a stochastic, iterative generate-and-test approach to synthesizing programs from tests, i.e. examples of the desired input-output mapping. The number of passed tests, or the total error in continuous domains, is a natural objective measure of a program’s performance and a common yardstick when experimentally comparing algorithms. In GP, it is also by default used to guide the evolutionary search process. An assumption that an objective function should also be an efficient ‘search driver’ is common for all metaheuristics, such as the evolutionary algorithms which GP is a member of. Programs are complex combinatorial structures that exhibit even more complex input-output behavior, and in this chapter we discuss why this complexity cannot be effectively reflected by a single scalar objective. In consequence, GP algorithms are systemically ‘underinformed’ about the characteristics of programs they operate on, and pay for this with unsatisfactory performance and limited scalability. This chapter advocates behavioral program synthesis, where programs are characterized by informative execution traces that enable multifaceted evaluation and substantially change the roles of components in an evolutionary infrastructure. We provide a unified perspective on past work in this area, discuss the consequences of the behavioral viewpoint, outlining the future avenues for program synthesis and the wider application areas that lie beyond.
BibTeX:
@incollection{KrawiecSO15,
  author = {Krzysztof Krawiec and Jerry Swan and Una-May O’Reilly},
  title = {Behavioral Program Synthesis: Insights and Prospects},
  booktitle = {Genetic Programming Theory and Practice XIII},
  publisher = {Springer},
  year = {2015},
  url = {http://www.cs.put.poznan.pl/kkrawiec/wiki/uploads/Research/2015GPTP.pdf}
}
Langdon, W.B. Genetic Improvement of Software for Multiple Objectives 2015 Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE ’15), pp. 12-28  inproceedings DOI  
Abstract: Genetic programming (GP) can increase computer programs’ functional and non-functional performance. It can automatically port or refactor legacy code written by domain experts. Working with programmers, it can grow and graft (GGGP) new functionality into legacy systems and parallel Bioinformatics GPGPU code. We review Genetic Improvement (GI) and SBSE research on evolving software.
BibTeX:
@inproceedings{Langdon15,
  author = {William B. Langdon},
  title = {Genetic Improvement of Software for Multiple Objectives},
  booktitle = {Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE ’15)},
  publisher = {Springer},
  year = {2015},
  pages = {12-28},
  doi = {http://dx.doi.org/10.1007/978-3-319-22183-0_2}
}
Langdon, W.B. Performance of Genetic Programming Optimised Bowtie2 on Genome Comparison and Analytic Testing (GCAT) Benchmarks 2015 BioData Mining
Vol. 8(1), pp. 1-7 
article DOI  
Abstract: Background: Genetic studies are increasingly based on short noisy next generation scanners. Typically complete DNA sequences are assembled by matching short NextGen sequences against reference genomes. Despite considerable algorithmic gains since the turn of the millennium, matching both single ended and paired end strings to a reference remains computationally demanding. Further tailoring Bioinformatics tools to each new task or scanner remains highly skilled and labour intensive. With this in mind, we recently demonstrated a genetic programming based automated technique which generated a version of the state-of-the-art alignment tool Bowtie2 which was considerably faster on short sequences produced by a scanner at the Broad Institute and released as part of The Thousand Genome Project. Results: Bowtie2GP and the original Bowtie2 release were compared on bioplanet’s GCAT synthetic benchmarks. Bowtie2GP enhancements were also applied to the latest Bowtie2 release (2.2.3, 29 May 2014) and retained both the GP and the manually introduced improvements. Conclusions: On both single ended and paired-end synthetic next generation DNA sequence GCAT benchmarks Bowtie2GP runs up to 45% faster than Bowtie2. The loss in accuracy can be as little as 0.2–0.5% but up to 2.5% for longer sequences.
BibTeX:
@article{Langdon15b,
  author = {William B. Langdon},
  title = {Performance of Genetic Programming Optimised Bowtie2 on Genome Comparison and Analytic Testing (GCAT) Benchmarks},
  journal = {BioData Mining},
  year = {2015},
  volume = {8},
  number = {1},
  pages = {1-7},
  doi = {http://dx.doi.org/10.1186/s13040-014-0034-0}
}
Langdon, W.B. Genetically Improved Software 2015, pp. 181-220  inbook DOI  
Abstract: Genetic programming (GP) can dramatically increase computer programs’ performance. It can automatically port or refactor legacy code written by domain experts and specialist software engineers. After reviewing SBSE research on evolving software we describe an open source parallel StereoCamera image processing application in which GI optimisation gave a seven fold speedup on nVidia Tesla GPU hardware not even imagined when the original state-of-the-art CUDA GPGPU C++ code was written.
BibTeX:
@inbook{Langdon15c,
  author = {William B. Langdon},
  title = {Genetically Improved Software},
  publisher = {Springer},
  year = {2015},
  pages = {181-220},
  doi = {http://dx.doi.org/10.1007/978-3-319-20883-1_8}
}
Langdon, W.B. and Harman, M. Grow and Graft a Better CUDA pknotsRG for RNA Pseudoknot Free Energy Calculation 2015 Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15) - the 1st International Genetic Improvement Workshop (GI '15), pp. 805-810  inproceedings DOI  
Abstract: Grow and graft genetic programming greatly improves GPGPU dynamic programming software for predicting the minimum binding energy for folding of RNA molecules. The parallel code inserted into the existing CUDA version of pknots was grown using a BNF grammar. On an nVidia Tesla K40 GPU GGGP gives a speed up of up to 10000 times.
BibTeX:
@inproceedings{LangdonH15,
  author = {William B. Langdon and Mark Harman},
  title = {Grow and Graft a Better CUDA pknotsRG for RNA Pseudoknot Free Energy Calculation},
  booktitle = {Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15) - the 1st International Genetic Improvement Workshop (GI '15)},
  publisher = {ACM},
  year = {2015},
  pages = {805-810},
  doi = {http://dx.doi.org/10.1145/2739482.2768418}
}
Langdon, W.B. and Harman, M. Optimizing Existing Software With Genetic Programming 2015 IEEE Transactions on Evolutionary Computation
Vol. 19(1), pp. 118-135 
article DOI  
Abstract: We show that the genetic improvement of programs (GIP) can scale by evolving increased performance in a widely-used and highly complex 50000 line system. Genetic improvement of software for multiple objective exploration (GISMOE) found code that is 70 times faster (on average) and yet is at least as good functionally. Indeed, it even gives a small semantic gain.
BibTeX:
@article{LangdonH15b,
  author = {William B. Langdon and Mark Harman},
  title = {Optimizing Existing Software With Genetic Programming},
  journal = {IEEE Transactions on Evolutionary Computation},
  year = {2015},
  volume = {19},
  number = {1},
  pages = {118-135},
  doi = {http://dx.doi.org/10.1109/TEVC.2013.2281544}
}
Langdon, W.B. and Lam, B.Y.H. Genetically Improved BarraCUDA 2015 (RN/15/03)  techreport URL 
Abstract: BarraCUDA is a C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. The genetically improved (GI) code is up to three times faster on short paired end reads from The 1000 Genomes Project and 60% more accurate on a short BioPlanet.com GCAT alignment benchmark. GPGPU Barracuda running on a single K80 Tesla GPU can align short paired end nextgen sequences up to ten times faster than bwa on a 12 core CPU.
BibTeX:
@techreport{LangdonL15,
  author = {William B. Langdon and Brian Yee Hong Lam},
  title = {Genetically Improved BarraCUDA},
  year = {2015},
  number = {RN/15/03},
  url = {http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/research/Research_Notes/rn-15-03.pdf}
}
Langdon, W.B., Lam, B.Y.H., Petke, J. and Harman, M. Improving CUDA DNA Analysis Software with Genetic Programming 2015 Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15), pp. 1063-1070  inproceedings DOI  
Abstract: We genetically improve BarraCUDA using a BNF grammar incorporating C scoping rules with GP. Barracuda maps next generation DNA sequences to the human genome using the Burrows-Wheeler algorithm (BWA) on nVidia Tesla parallel graphics hardware (GPUs). GI using phenotypic tabu search with manually grown code can graft new features giving more than 100 fold speed up on a performance critical kernel without loss of accuracy.
BibTeX:
@inproceedings{LangdonLPH15,
  author = {William B. Langdon and Brian Yee Hong Lam and Justyna Petke and Mark Harman},
  title = {Improving CUDA DNA Analysis Software with Genetic Programming},
  booktitle = {Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15)},
  publisher = {ACM},
  year = {2015},
  pages = {1063-1070},
  doi = {http://dx.doi.org/10.1145/2739480.2754652}
}
Li, J., Bai, R., Shen, Y. and Qu, R. Search with Evolutionary Ruin and Stochastic Rebuild: A Theoretic Framework and A Case Study on Exam Timetabling 2015 European Journal of Operational Research
Vol. 242(3), pp. 798-806 
article DOI  
Abstract: This paper presents a state transition based formal framework for a new search method, called Evolutionary Ruin and Stochastic Recreate, which tries to learn and adapt to the changing environments during the search process. It improves the performance of the original Ruin and Recreate principle by embedding an additional phase of Evolutionary Ruin to mimic the survival-of-the-fittest mechanism within single solutions. This method executes a cycle of Solution Decomposition, Evolutionary Ruin, Stochastic Recreate and Solution Acceptance until a certain stopping condition is met. The Solution Decomposition phase first uses some problem-specific knowledge to decompose a complete solution into its components and assigns a score to each component. The Evolutionary Ruin phase then employs two evolutionary operators (namely Selection and Mutation) to destroy a certain fraction of the solution, and the next Stochastic Recreate phase repairs the “broken” solution. Last, the Solution Acceptance phase selects a specific strategy to determine the probability of accepting the newly generated solution. Hence, optimisation is achieved by an iterative process of component evaluation, solution disruption and stochastic constructive repair. From the state transitions point of view, this paper presents a probabilistic model and implements a Markov chain analysis on some theoretical properties of the approach. Unlike the theoretical work on genetic algorithm and simulated annealing which are based on state transitions within the space of complete assignments, our model is based on state transitions within the space of partial assignments. The exam timetabling problems are used to test the performance in solving real-world hard problems.
BibTeX:
@article{LiBSQ15,
  author = {Jingpeng Li and Ruibin Bai and Yindong Shen and Rong Qu},
  title = {Search with Evolutionary Ruin and Stochastic Rebuild: A Theoretic Framework and A Case Study on Exam Timetabling},
  journal = {European Journal of Operational Research},
  year = {2015},
  volume = {242},
  number = {3},
  pages = {798-806},
  doi = {http://dx.doi.org/10.1016/j.ejor.2014.11.002}
}
Li, L., Harman, M., Wu, F. and Zhang, Y. SBSelector: Search Based Component Selection for Budget Hardware 2015 Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE '15), pp. 289-294  inproceedings DOI  
Abstract: Determining which functional components should be integrated into a large system is a challenging task, when hardware constraints, such as available memory, are taken into account. We formulate this problem as a multi-objective component selection problem, which searches for feature subsets that balance the provision of maximal functionality at minimal memory resource cost. We developed a search-based component selection tool, and applied it to the KDE-based application, Kate, to find a set of Kate instantiations that balance functionalities and memory consumption. Our results report that, compared to the best attainment of random search, our approach can reduce memory consumption by up to 23.70% with respect to the same number of components. Compared to greedy search, the memory reduction can be up to 19.04%. SBSelector finds an instantiation of Kate that provides 16 more components, while only increasing memory by 1.7%.
BibTeX:
@inproceedings{LiHWZ15,
  author = {Lingbo Li and Mark Harman and Fan Wu and Yuanyuan Zhang},
  title = {SBSelector: Search Based Component Selection for Budget Hardware},
  booktitle = {Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE '15)},
  publisher = {Springer},
  year = {2015},
  pages = {289-294},
  doi = {http://dx.doi.org/10.1007/978-3-319-22183-0_25}
}
Li, B., Li, J., Tang, K. and Yao, X. Many-Objective Evolutionary Algorithms: A Survey 2015 ACM Computing Surveys
Vol. 48(1) 
article DOI  
Abstract: Multiobjective evolutionary algorithms (MOEAs) have been widely used in real-world applications. However, most MOEAs based on Pareto-dominance handle many-objective problems (MaOPs) poorly due to a high proportion of incomparable and thus mutually nondominated solutions. Recently, a number of many-objective evolutionary algorithms (MaOEAs) have been proposed to deal with this scalability issue. In this article, a survey of MaOEAs is reported. According to the key ideas used, MaOEAs are categorized into seven classes: relaxed dominance based, diversity-based, aggregation-based, indicator-based, reference set based, preference-based, and dimensionality reduction approaches. Several future research directions in this field are also discussed.
BibTeX:
@article{LiLTY15,
  author = {Bingdong Li and Jinlong Li and Ke Tang and Xin Yao},
  title = {Many-Objective Evolutionary Algorithms: A Survey},
  journal = {ACM Computing Surveys},
  year = {2015},
  volume = {48},
  number = {1},
  doi = {http://dx.doi.org/10.1145/2792984}
}
Marginean, A., Barr, E.T., Harman, M. and Jia, Y. Automated Transplantation of Call Graph and Layout Features into Kate 2015 Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE ’15), pp. 262-268  inproceedings DOI  
Abstract: We report the automated transplantation of two features currently missing from Kate: call graph generation and automatic layout for C programs, which have been requested by users on the Kate development forum. Our approach uses a lightweight annotation system with Search Based techniques augmented by static analysis for automated transplantation. The results are promising: on average, our tool requires 101 min of standard desktop machine time to transplant the call graph feature, and 31 min to transplant the layout feature. We repeated each experiment 20 times and validated the resulting transplants using unit, regression and acceptance test suites. In 34 of 40 experiments conducted, our search-based autotransplantation tool, μScalpel, was able to successfully transplant the new functionality, passing all tests.
BibTeX:
@inproceedings{MargineanBHJ15,
  author = {Alexandru Marginean and Earl T. Barr and Mark Harman and Yue Jia},
  title = {Automated Transplantation of Call Graph and Layout Features into Kate},
  booktitle = {Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE ’15)},
  publisher = {Springer},
  year = {2015},
  pages = {262-268},
  doi = {http://dx.doi.org/10.1007/978-3-319-22183-0_21}
}
Minku, L., Sarro, F., Mendes, E. and Ferrucci, F. How to Make Best Use of Cross-Company Data for Web Effort Estimation? 2015 Proceedings of ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '15), pp. 1-10  inproceedings DOI  
Abstract: [Context]: The numerous challenges that can hinder software companies from gathering their own data have motivated over the past 15 years research on the use of cross-company (CC) datasets for software effort prediction. Part of this research focused on Web effort prediction, given the large increase worldwide in the development of Web applications. Some of these studies indicate that it may be possible to achieve better performance using CC models if some strategy to make the CC data more similar to the within-company (WC) data is adopted. [Goal]: This study investigates the use of a recently proposed approach called Dycom to assess to what extent Web effort predictions obtained using CC datasets are effective in relation to the predictions obtained using WC data when explicitly mapping the CC models to the WC context. [Method]: Data on 125 Web projects from eight different companies, part of the Tukutuku database, were used to build prediction models. We benchmarked these models against baseline models (mean and median effort) and a WC base learner that does not benefit from the mapping. We also compared Dycom against a competitive CC approach from the literature (NN-filtering). We report a company-by-company analysis. [Results]: Dycom usually managed to achieve similar or better performance than a WC model while using only half of the WC training data. These results are also an improvement over previous studies that investigated the use of different strategies to adapt CC models to the WC data for Web effort estimation. [Conclusions]: We conclude that the use of Dycom for Web effort prediction is quite promising and in general supports previous results when applying Dycom to conventional software datasets.
BibTeX:
@inproceedings{MinkuSMF15,
  author = {Leandro Minku and Federica Sarro and Emilia Mendes and Filomena Ferrucci},
  title = {How to Make Best Use of Cross-Company Data for Web Effort Estimation?},
  booktitle = {Proceedings of ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '15)},
  publisher = {IEEE},
  year = {2015},
  pages = {1-10},
  doi = {http://dx.doi.org/10.1109/ESEM.2015.7321199}
}
Ochoa, G., Chicano, F., Tinos, R. and Whitley, D. Tunnelling Crossover Networks 2015 Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation (GECCO '15), pp. 449-456  inproceedings DOI  
Abstract: Local optima networks are a recent model of fitness landscapes. They compress the landscape by representing local optima as nodes, and search transitions among them as edges. Previous local optima networks considered transitions based on mutation; this study looks instead at transitions based on deterministic recombination. We define and analyse networks based on the recently proposed partition crossover for k-bounded pseudo-Boolean functions, using NKq landscapes as a case study. Partition crossover was initially proposed for the travelling salesman problem, where it was found to ‘tunnel’ between local optima, i.e., jump from local optimum to local optimum. Our network analysis shows that this also happens for NK landscapes: local optima are densely connected via partition crossover. We found marked differences between the adjacent and random interaction NK models. Surprisingly, with the random model, instances have a lower number of local optima on average, but their networks are more sparse and decompose into several clusters. There is also large variability in the size and pattern of connectivity of instances coming from the same landscape parameter values. These network features offer new insight into why some instances are harder to solve than others.
BibTeX:
@inproceedings{OchoaCTW15,
  author = {Gabriela Ochoa and Francisco Chicano and Renato Tinos and Darrell Whitley},
  title = {Tunnelling Crossover Networks},
  booktitle = {Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation (GECCO '15)},
  publisher = {ACM},
  year = {2015},
  pages = {449-456},
  doi = {http://dx.doi.org/10.1145/2739480.2754657}
}
Paixao, M., Harman, M. and Zhang, Y. Multi-objective Module Clustering for Kate 2015 Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE '15), pp. 282-288  inproceedings DOI  
Abstract: This paper applies multi-objective search based software remodularization to the program Kate, showing how this can improve cohesion and coupling, and investigating differences between weighted and unweighted approaches and between equal-size and maximising clusters approaches. We also investigate the effects of considering omnipresent modules. Overall, we provide evidence that search based modularization can benefit Kate developers.
BibTeX:
@inproceedings{PaixaoHZ15,
  author = {Matheus Paixao and Mark Harman and Yuanyuan Zhang},
  title = {Multi-objective Module Clustering for Kate},
  booktitle = {Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE '15)},
  publisher = {Springer},
  year = {2015},
  pages = {282-288},
  doi = {http://dx.doi.org/10.1007/978-3-319-22183-0_24}
}
Paixao, M., Harman, M. and Zhang, Y. Improving the Module Clustering of a C/C++ Editor using a Multi-objective Genetic Algorithm 2015 (RN/15/02)  techreport URL 
Abstract: This Technical Report applies multi-objective search based software remodularization to a C/C++ editor called Kate, showing how this can improve cohesion and coupling, and investigating differences between weighted and unweighted approaches and between equal-size and maximising clusters approaches. We also investigate the effects of considering omnipresent modules. Overall, we provide evidence that search based modularization can benefit Kate developers.
BibTeX:
@techreport{PaixaoHZ15b,
  author = {Matheus Paixao and Mark Harman and Yuanyuan Zhang},
  title = {Improving the Module Clustering of a C/C++ Editor using a Multi-objective Genetic Algorithm},
  year = {2015},
  number = {RN/15/02},
  url = {http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/research/Research_Notes/rn-15-02-matheus_paixao.pdf}
}
Pérez-Ortiz, M., Gutiérrez, P.A., Hervás-Martínez, Cé. and Yao, X. Graph-Based Approaches for Over-sampling in the context of Ordinal Regression 2015 IEEE Transactions on Knowledge and Data Engineering
Vol. 27(5), pp. 1233 - 1245 
article DOI  
Abstract: The classification of patterns into naturally ordered labels is referred to as ordinal regression or ordinal classification. Usually, this classification setting is by nature highly imbalanced, because there are classes in the problem that are a priori more probable than others. Although standard over-sampling methods can improve the classification of minority classes in ordinal classification, they tend to introduce severe errors in terms of the ordinal label scale, given that they do not take the ordering into account. A specific ordinal over-sampling method is developed in this paper for the first time in order to improve the performance of machine learning classifiers. The method proposed includes ordinal information by approaching over-sampling from a graph-based perspective. The results presented in this paper show the good synergy of a popular ordinal regression method (a reformulation of support vector machines) with the proposed graph-based algorithms, and the possibility of improving both the classification and the ordering of minority classes. A cost-sensitive version of the ordinal regression method is also introduced and compared with the over-sampling proposals, showing in general lower performance for minority classes.
BibTeX:
@article{PerezOrtizGHY15,
  author = {María Pérez-Ortiz and Pedro Antonio Gutiérrez and César Hervás-Martínez and Xin Yao},
  title = {Graph-Based Approaches for Over-sampling in the context of Ordinal Regression},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  year = {2015},
  volume = {27},
  number = {5},
  pages = {1233 - 1245},
  doi = {http://dx.doi.org/10.1109/TKDE.2014.2365780}
}
Sarro, F., Al-Subaihin, A.A., Harman, M., Jia, Y., Martin, W. and Zhang, Y. Feature Lifecycles as They Spread, Migrate, Remain, and Die in App Stores 2015 Proceedings of IEEE 23rd International Requirements Engineering Conference (RE '15), pp. 76-85  inproceedings DOI  
Abstract: We introduce a theoretical characterisation of feature lifecycles in app stores, to help app developers to identify trends and to find undiscovered requirements. To illustrate and motivate app feature lifecycle analysis, we use our theory to empirically analyse the migratory and non-migratory behaviours of 4,053 non-free features from two App Stores (Samsung and BlackBerry). The results reveal that, in both stores, intransitive features (those that neither migrate nor die out) exhibit significantly different behaviours with regard to important properties, such as their price. Further correlation analysis also highlights differences between trends relating price, rating, and popularity. Our results indicate that feature lifecycle analysis can yield insights that may also help developers to understand feature behaviours and attribute relationships.
BibTeX:
@inproceedings{SarroAHJMZ15,
  author = {Federica Sarro and Afnan A. Al-Subaihin and Mark Harman and Yue Jia and William Martin and Yuanyuan Zhang},
  title = {Feature Lifecycles as They Spread, Migrate, Remain, and Die in App Stores},
  booktitle = {Proceedings of IEEE 23rd International Requirements Engineering Conference (RE '15)},
  publisher = {IEEE},
  year = {2015},
  pages = {76-85},
  doi = {http://dx.doi.org/10.1109/RE.2015.7320410}
}
Shen, X.-N. and Yao, X. Mathematical Modeling and Multi-objective Evolutionary Algorithms applied to Dynamic Flexible Job Shop Scheduling Problems 2015 Information Sciences
Vol. 298, pp. 198-224 
article DOI  
Abstract: Dynamic flexible job shop scheduling is of significant importance to the implementation of real-world manufacturing systems. In order to capture the dynamic and multi-objective nature of flexible job shop scheduling, and provide different trade-offs among objectives, this paper develops a multi-objective evolutionary algorithm (MOEA)-based proactive–reactive method. The novelty of our method is that it is able to handle multiple objectives including efficiency and stability simultaneously, adapt to the new environment quickly by incorporating heuristic dynamic optimization strategies, and deal with two scheduling policies of machine assignment and operation sequencing together. Besides, a new mathematical model for the multi-objective dynamic flexible job shop scheduling problem (MODFJSSP) is constructed. With the aim of selecting one solution that fits into the decision maker’s preferences from the trade-off solution set found by MOEA, a dynamic decision making procedure is designed. Experimental results in a simulated dynamic flexible job shop show that our method can achieve much better performances than combinations of existing scheduling rules. Three MOEA-based rescheduling methods are compared. The modified ɛ-MOEA has the best overall performance in dynamic environments, and its computational time is much less than two others (i.e., NSGA-II and SPEA2). Utilities of introducing the stability objective, heuristic initialization strategies and the decision making approach are also validated.
BibTeX:
@article{ShenY15,
  author = {Xiao-Ning Shen and Xin Yao},
  title = {Mathematical Modeling and Multi-objective Evolutionary Algorithms applied to Dynamic Flexible Job Shop Scheduling Problems},
  journal = {Information Sciences},
  year = {2015},
  volume = {298},
  pages = {198-224},
  doi = {http://dx.doi.org/10.1016/j.ins.2014.11.036}
}
Soria-Alcaraz, J.A., Ochoa, G., Goeffon, A., Lardeux, F. and Saubion, F. Combining Mutation and Recombination to Improve a Distributed Model of Adaptive Operator Selection 2015 Artificial Evolution, Proceedings of the Conference on Artificial Evolution (EA '15)  inproceedings URL 
BibTeX:
@inproceedings{Soria-AlcarazOGLS15,
  author = {Jorge A. Soria-Alcaraz and Gabriela Ochoa and Adrien Goeffon and Frederic Lardeux and Frederic Saubion},
  title = {Combining Mutation and Recombination to Improve a Distributed Model of Adaptive Operator Selection},
  booktitle = {Proceedings of the Conference on Artificial Evolution (EA '15)},
  journal = {Artificial Evolution},
  year = {2015},
  url = {http://www.cs.stir.ac.uk/~goc/papers/dimaosxEA2015.pdf}
}
Swan, J., Adriaensen, S., Bishr, M., Burke, E.K., Clark, J.A., Causmaecker, P.D., Durillo, J., Hammond, K., Hart, E., Johnson, C.G., Kocsis, Z.A., Kovitz, B., Krawiec, K., Martin, S., Merelo, J.J., Minku, L.L., Özcan, E., Pappa, G.L., Pesch, E., Garcia-Sànchez, P., Schaerf, A., Sim, K., Smith, J., Stützle, T., Voß, S., Wagner, S. and Yao, X. A Research Agenda for Metaheuristic Standardization 2015 Proceedings of the 11th Metaheuristics International Conference (MIC '15)  inproceedings URL 
BibTeX:
@inproceedings{Swan15,
  author = {Jerry Swan and Steven Adriaensen and Mohamed Bishr and Edmund K. Burke and John A. Clark and Patrick De Causmaecker and Juanjo Durillo and Kevin Hammond and Emma Hart and Colin G. Johnson and Zoltan A. Kocsis and Ben Kovitz and Krzysztof Krawiec and Simon Martin and J. J. Merelo and Leandro L. Minku and Ender Özcan and Gisele L. Pappa and Erwin Pesch and Pablo Garcia-Sànchez and Andrea Schaerf and Kevin Sim and Jim Smith and Thomas Stützle and Stefan Voß and Stefan Wagner and Xin Yao},
  title = {A Research Agenda for Metaheuristic Standardization},
  booktitle = {Proceedings of the 11th Metaheuristics International Conference (MIC '15)},
  year = {2015},
  url = {http://www.cs.nott.ac.uk/~pszeo/docs/publications/research-agenda-metaheuristic.pdf}
}
Swan, J. and Burles, N. Templar – A Framework for Template-Method Hyper-Heuristics 2015 Proceedings of the 18th European Conference on Genetic Programming (EuroGP '15), pp. 205-216  inproceedings DOI  
Abstract: In this work we introduce Templar, a software framework for customising algorithms via the generative technique of template-method hyper-heuristics. We first discuss the need for such an approach, presenting Quicksort as an example. We provide a functional definition of template-method hyper-heuristics, describe how this is implemented by Templar, and show how Templar may be invoked using simple client-code. Finally, we describe experiments using Templar to define a ‘hyper-quicksort’ with the aim of reducing power consumption—the results demonstrate that the generated algorithm has significantly improved performance on the test set.
BibTeX:
@inproceedings{SwanB15,
  author = {Jerry Swan and Nathan Burles},
  title = {Templar – A Framework for Template-Method Hyper-Heuristics},
  booktitle = {Proceedings of the 18th European Conference on Genetic Programming (EuroGP '15)},
  publisher = {Springer},
  year = {2015},
  pages = {205-216},
  doi = {http://dx.doi.org/10.1007/978-3-319-16501-1_17}
}
Veerapen, N., Ochoa, G., Harman, M. and Burke, E.K. An Integer Linear Programming approach to the single and bi-objective Next Release Problem 2015 Information and Software Technology
Vol. 65, pp. 1-13 
article DOI  
Abstract: Context: The Next Release Problem involves determining the set of requirements to implement in the next release of a software project. When the problem was first formulated in 2001, Integer Linear Programming, an exact method, was found to be impractical because of large execution times. Since then, the problem has mainly been addressed by employing metaheuristic techniques. Objective: In this paper, we investigate if the single-objective and bi-objective Next Release Problem can be solved exactly and how to better approximate the results when exact resolution is costly. Methods: We revisit Integer Linear Programming for the single-objective version of the problem. In addition, we integrate it within the Epsilon-constraint method to address the bi-objective problem. We also investigate how the Pareto front of the bi-objective problem can be approximated through an anytime deterministic Integer Linear Programming-based algorithm when results are required within strict runtime constraints. Comparisons are carried out against NSGA-II. Experiments are performed on a combination of synthetic and real-world datasets. Findings: We show that a modern Integer Linear Programming solver is now a viable method for this problem. Large single objective instances and small bi-objective instances can be solved exactly very quickly. On large bi-objective instances, execution times can be significant when calculating the complete Pareto front. However, good approximations can be found effectively. Conclusion: This study suggests that (1) approximation algorithms can be discarded in favor of the exact method for the single-objective instances and small bi-objective instances, (2) the Integer Linear Programming-based approximate algorithm outperforms the NSGA-II genetic approach on large bi-objective instances, and (3) the run times for both methods are low enough to be used in real-world situations.
BibTeX:
@article{VeerapenOHB15,
  author = {Nadarajen Veerapen and Gabriela Ochoa and Mark Harman and Edmund K. Burke},
  title = {An Integer Linear Programming approach to the single and bi-objective Next Release Problem},
  journal = {Information and Software Technology},
  year = {2015},
  volume = {65},
  pages = {1-13},
  doi = {http://dx.doi.org/10.1016/j.infsof.2015.03.008}
}
Wang, P., Emmerich, M., Li, R., Tang, K., Back, T. and Yao, X. Convex Hull-Based Multi-objective Genetic Programming for Maximizing Receiver Operating Characteristic Performance 2015 IEEE Transactions on Evolutionary Computation
Vol. 19(2), pp. 188-200 
article DOI  
Abstract: The receiver operating characteristic (ROC) is commonly used to analyze the performance of classifiers in data mining. An important topic in ROC analysis is the ROC convex hull (ROCCH), which is the least convex majorant (LCM) of the empirical ROC curve and covers potential optima for a given set of classifiers. ROCCH maximization problems have been taken as multiobjective optimization problems (MOPs) in some previous work. However, the special characteristics of the ROCCH maximization problem make it different from traditional MOPs. In this paper, the difference will be discussed in detail and a new convex hull-based multiobjective genetic programming (CH-MOGP) is proposed to solve ROCCH maximization problems. Specifically, convex hull-based without redundancy sorting (CWR-sorting) is introduced, which is an indicator-based selection scheme that aims to maximize the area under the convex hull. A novel selection procedure is also proposed based on the proposed sorting scheme. It is hypothesized that by using a tailored indicator-based selection, CH-MOGP becomes more efficient for ROC convex hull approximation than algorithms that compute all Pareto optimal points. Empirical studies are conducted to compare CH-MOGP to both existing machine learning approaches and multiobjective genetic programming (MOGP) methods with classical selection schemes. Experimental results show that CH-MOGP outperforms the other approaches significantly.
BibTeX:
@article{WangELTBY15,
  author = {Pu Wang and Michael Emmerich and Rui Li and Ke Tang and Thomas Back and Xin Yao},
  title = {Convex Hull-Based Multi-objective Genetic Programming for Maximizing Receiver Operating Characteristic Performance},
  journal = {IEEE Transactions on Evolutionary Computation},
  year = {2015},
  volume = {19},
  number = {2},
  pages = {188-200},
  doi = {http://dx.doi.org/10.1109/TEVC.2014.2305671}
}
Wang, H., Jiao, L. and Yao, X. Two_Arch2: An Improved Two-Archive Algorithm for Many-Objective Optimization 2015 IEEE Transactions on Evolutionary Computation
Vol. 19(4), pp. 524-541 
article DOI  
Abstract: Many-objective optimization problems (ManyOPs) refer, usually, to those multiobjective problems (MOPs) with more than three objectives. Their large numbers of objectives pose challenges to multiobjective evolutionary algorithms (MOEAs) in terms of convergence, diversity, and complexity. Most existing MOEAs can only perform well in one of those three aspects. In view of this, we aim to design a more balanced MOEA on ManyOPs in all three aspects at the same time. Among the existing MOEAs, the two-archive algorithm (Two_Arch) is a low-complexity algorithm with two archives focusing on convergence and diversity separately. Inspired by the idea of Two_Arch, we propose a significantly improved two-archive algorithm (i.e., Two_Arch2) for ManyOPs in this paper. In our Two_Arch2, we assign different selection principles (indicator-based and Pareto-based) to the two archives. In addition, we design a new Lp-norm-based (p < 1) diversity maintenance scheme for ManyOPs in Two_Arch2. In order to evaluate the performance of Two_Arch2 on ManyOPs, we have compared it with several MOEAs on a wide range of benchmark problems with different numbers of objectives. The experimental results show that Two_Arch2 can cope with ManyOPs (up to 20 objectives) with satisfactory convergence, diversity, and complexity.
BibTeX:
@article{WangJY15,
  author = {Handing Wang and Licheng Jiao and Xin Yao},
  title = {Two_Arch2: An Improved Two-Archive Algorithm for Many-Objective Optimization},
  journal = {IEEE Transactions on Evolutionary Computation},
  year = {2015},
  volume = {19},
  number = {4},
  pages = {524-541},
  doi = {http://dx.doi.org/10.1109/TEVC.2014.2350987}
}
Wang, S., Minku, L.L. and Yao, X. Resampling-Based Ensemble Methods for Online Class Imbalance Learning 2015 IEEE Transactions on Knowledge and Data Engineering
Vol. 27(5), pp. 1356 - 1368 
article DOI  
Abstract: Online class imbalance learning is a new learning problem that combines the challenges of both online learning and class imbalance learning. It deals with data streams having very skewed class distributions. This type of problems commonly exists in real-world applications, such as fault diagnosis of real-time control monitoring systems and intrusion detection in computer networks. In our earlier work, we defined class imbalance online, and proposed two learning algorithms OOB and UOB that build an ensemble model overcoming class imbalance in real time through resampling and time-decayed metrics. In this paper, we further improve the resampling strategy inside OOB and UOB, and look into their performance in both static and dynamic data streams. We give the first comprehensive analysis of class imbalance in data streams, in terms of data distributions, imbalance rates and changes in class imbalance status. We find that UOB is better at recognizing minority-class examples in static data streams, and OOB is more robust against dynamic changes in class imbalance status. The data distribution is a major factor affecting their performance. Based on the insight gained, we then propose two new ensemble methods that maintain both OOB and UOB with adaptive weights for final predictions, called WEOB1 and WEOB2. They are shown to possess the strength of OOB and UOB with good accuracy and robustness.
BibTeX:
@article{WangMY15,
  author = {Shuo Wang and Leandro L. Minku and Xin Yao},
  title = {Resampling-Based Ensemble Methods for Online Class Imbalance Learning},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  year = {2015},
  volume = {27},
  number = {5},
  pages = {1356 - 1368},
  doi = {http://dx.doi.org/10.1109/TKDE.2014.2345380}
}
White, D.R. and Singer, J. Rethinking Genetic Improvement Programming 2015 Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15) - the 1st International Genetic Improvement Workshop (GI '15), pp. 845-846  inproceedings DOI  
Abstract: We re-examine the central motivation behind Genetic Improvement Programming (GIP), and argue that the most important insight is the concept of applying Genetic Programming to existing software. This viewpoint allows us to make several observations about potential directions for GIP research.
BibTeX:
@inproceedings{WhiteS15,
  author = {David R. White and Jeremy Singer},
  title = {Rethinking Genetic Improvement Programming},
  booktitle = {Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15) - the 1st International Genetic Improvement Workshop (GI '15)},
  publisher = {ACM},
  year = {2015},
  pages = {845-846},
  doi = {http://dx.doi.org/10.1145/2739482.2768426}
}
Wu, F., Weimer, W., Harman, M., Jia, Y. and Krinke, J. Deep Parameter Optimisation 2015 Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15), pp. 1375-1382  inproceedings DOI  
Abstract: We introduce a mutation-based approach to automatically discover and expose ‘deep’ (previously unavailable) parameters that affect a program's runtime costs. These discovered parameters, together with existing (‘shallow’) parameters, form a search space that we tune using search-based optimisation in a bi-objective formulation that optimises both time and memory consumption. We implemented our approach and evaluated it on four real-world programs. The results show that we can improve execution time by 12% or achieve a 21% memory consumption reduction in the best cases. In three subjects, our deep parameter tuning results in a significant improvement over the baseline of shallow parameter tuning, demonstrating the potential value of our deep parameter extraction approach.
BibTeX:
@inproceedings{WuWHJK15,
  author = {Fan Wu and Westley Weimer and Mark Harman and Yue Jia and Jens Krinke},
  title = {Deep Parameter Optimisation},
  booktitle = {Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference (GECCO '15)},
  publisher = {ACM},
  year = {2015},
  pages = {1375-1382},
  doi = {http://dx.doi.org/10.1145/2739480.2754648}
}
Yang, X., Tang, K. and Yao, X. A Learning-to-Rank Approach to Software Defect Prediction 2015 IEEE Transactions on Reliability
Vol. 64(1), pp. 234 - 246 
article DOI  
Abstract: Software defect prediction can help to allocate testing resources efficiently through ranking software modules according to their defects. Existing software defect prediction models that are optimized to predict explicitly the number of defects in a software module might fail to give an accurate order because it is very difficult to predict the exact number of defects in a software module due to noisy data. This paper introduces a learning-to-rank approach to construct software defect prediction models by directly optimizing the ranking performance. In this paper, we build on our previous work, and further study whether the idea of directly optimizing the model performance measure can benefit software defect prediction model construction. The work includes two aspects: one is a novel application of the learning-to-rank approach to real-world data sets for software defect prediction, and the other is a comprehensive evaluation and comparison of the learning-to-rank method against other algorithms that have been used for predicting the order of software modules according to the predicted number of defects. Our empirical studies demonstrate the effectiveness of directly optimizing the model performance measure for the learning-to-rank approach to construct defect prediction models for the ranking task.
BibTeX:
@article{YangTY15,
  author = {Xiaoxing Yang and Ke Tang and Xin Yao},
  title = {A Learning-to-Rank Approach to Software Defect Prediction},
  journal = {IEEE Transactions on Reliability},
  year = {2015},
  volume = {64},
  number = {1},
  pages = {234 - 246},
  doi = {http://dx.doi.org/10.1109/TR.2014.2370891}
}
Zhang, Y., Harman, M., Jia, Y. and Sarro, F. Inferring Test Models from Kate's Bug Reports using Multi-objectives Search 2015 Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE '15), pp. 301-307  inproceedings DOI  
Abstract: Models inferred from system execution logs can be used to test general system behaviour. In this paper, we infer test models from user bug reports that are written in natural language. The inferred models can be used to derive new tests which further exercise the buggy features reported by users. Our search-based model inference approach considers three objectives: (1) to reduce the number of invalid user events generated (over approximation), (2) to reduce the number of unrecognised user events (under approximation), (3) to reduce the size of the model (readability). We apply our approach to 721 of Kate’s bug reports which contain the information required to reproduce the bugs. We compare our results to the state-of-the-art KLFA tool. Our results show that our inferred models require 19 tests to reveal a bug on average, which is 98 times fewer than the models inferred by KLFA.
BibTeX:
@inproceedings{ZhangHJS15,
  author = {Yuanyuan Zhang and Mark Harman and Yue Jia and Federica Sarro},
  title = {Inferring Test Models from Kate's Bug Reports using Multi-objectives Search},
  booktitle = {Proceedings of the 7th International Symposium on Search-Based Software Engineering (SSBSE '15)},
  publisher = {Springer},
  year = {2015},
  pages = {301-307},
  doi = {http://dx.doi.org/10.1007/978-3-319-22183-0_27}
}
Barr, E.T., Brun, Y., Devanbu, P., Harman, M. and Sarro, F. The Plastic Surgery Hypothesis 2014 Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE '14), pp. 306-317  inproceedings DOI  
Abstract: Recent work on genetic-programming-based approaches to automatic program patching has relied on the insight that the content of new code can often be assembled out of fragments of code that already exist in the code base. This insight has been dubbed the plastic surgery hypothesis; successful, well-known automatic repair tools such as GenProg rest on this hypothesis, but it has never been validated. We formalize and validate the plastic surgery hypothesis and empirically measure the extent to which raw material for changes actually already exists in projects. In this paper, we mount a large-scale study of several large Java projects, and examine a history of 15,723 commits to determine the extent to which these commits are graftable, i.e., can be reconstituted from existing code, and find an encouraging degree of graftability, surprisingly independent of commit size and type of commit. For example, we find that changes are 43% graftable from the exact version of the software being changed. With a view to investigating the difficulty of finding these grafts, we study the abundance of such grafts in three possible sources: the immediately previous version, prior history, and other projects. We also examine the contiguity or chunking of these grafts, and the degree to which grafts can be found in the same file. Our results are quite promising and suggest an optimistic future for automatic program patching methods that search for raw material in already extant code in the project being patched.
BibTeX:
@inproceedings{BarrBDHS14,
  author = {Earl T. Barr and Yuriy Brun and Premkumar Devanbu and Mark Harman and Federica Sarro},
  title = {The Plastic Surgery Hypothesis},
  booktitle = {Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE '14)},
  publisher = {ACM},
  year = {2014},
  pages = {306-317},
  doi = {http://dx.doi.org/10.1145/2635868.2635898}
}
Brownlee, A.E., Swan, J., Özcan, E. and Parkes, A.J. Hyperion2: A Toolkit for Meta-, Hyper- Heuristic Research 2014 Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (GECCO '14), pp. 1133-1140  inproceedings DOI  
Abstract: In order for appropriate meta-heuristics to be chosen and tuned for specific problems, it is critical that we better understand the problems themselves and how algorithms solve them. This is particularly important as we seek to automate the process of choosing and tuning algorithms and their operators via hyper-heuristics. If meta-heuristics are viewed as sampling algorithms, they can be classified by the trajectory taken through the search space. We term this trajectory a trace. In this paper, we present Hyperion2, a Java™ framework for meta- and hyper- heuristics which allows the analysis of the trace taken by an algorithm and its constituent components through the search space. Built with the principles of interoperability, generality and efficiency, we intend that this framework will be a useful aid to scientific research in this domain.
BibTeX:
@inproceedings{BrownleeSOP14,
  author = {Alexander E.I. Brownlee and Jerry Swan and Ender Özcan and Andrew J. Parkes},
  title = {Hyperion2: A Toolkit for Meta-, Hyper- Heuristic Research},
  booktitle = {Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (GECCO '14)},
  publisher = {ACM},
  year = {2014},
  pages = {1133-1140},
  doi = {http://dx.doi.org/10.1145/2598394.2605687}
}
Consoli, P.A., Minku, L.L. and Yao, X. Dynamic Selection of Evolutionary Algorithm Operators Based on Online Learning and Fitness Landscape Metrics 2014 Proceedings of the 10th International Conference on Simulated Evolution And Learning (SEAL'14), pp. 359-370  inproceedings DOI  
Abstract: Self-adaptive mechanisms for the identification of the most suitable variation operator in Evolutionary meta-heuristics rely almost exclusively on the measurement of the fitness of the offspring, which may not be sufficient to assess the optimality of an operator (e.g., in a landscape with a high degree of neutrality). This paper proposes a novel Adaptive Operator Selection mechanism which uses a set of four Fitness Landscape Analysis techniques and an online learning algorithm, Dynamic Weighted Majority, to provide more detailed information about the search space in order to better determine the most suitable crossover operator on a set of Capacitated Arc Routing Problem (CARP) instances. Extensive comparison with a state-of-the-art approach has proved that this technique is able to produce comparable results on the set of benchmark problems.
BibTeX:
@inproceedings{ConsoliMY14,
  author = {Pietro A. Consoli and Leandro L. Minku and Xin Yao},
  title = {Dynamic Selection of Evolutionary Algorithm Operators Based on Online Learning and Fitness Landscape Metrics},
  booktitle = {Proceedings of the 10th International Conference on Simulated Evolution And Learning (SEAL'14)},
  publisher = {Springer},
  year = {2014},
  pages = {359-370},
  doi = {http://dx.doi.org/10.1007/978-3-319-13563-2_31}
}
Epitropakis, M.G., Caraffini, F., Neri, F. and Burke, E.K. A Separability Prototype for Automatic Memes with Adaptive Operator Selection 2014 Proceedings of IEEE Symposium on Foundations of Computational Intelligence (FOCI '14), pp. 70-77  inproceedings DOI  
Abstract: One of the main challenges in algorithmics in general, and in Memetic Computing in particular, is the automatic design of search algorithms. A recent advance in this direction (in terms of continuous problems) is the development of a software prototype that builds up an algorithm based upon a problem analysis of its separability. This prototype has been called the Separability Prototype for Automatic Memes (SPAM). This article modifies the SPAM by incorporating within it an adaptive model used in hyper-heuristics for tackling optimization problems. This model, namely Adaptive Operator Selection (AOS), rewards at run time the most promising heuristics/memes so that they are more likely to be used in the following stages of the search process. The resulting framework, here referred to as SPAM-AOS, has been tested on various benchmark problems and compared with modern algorithms representing the state of the art in search for continuous problems. Numerical results show that the proposed SPAM-AOS is a promising framework that outperforms the original SPAM and other modern algorithms. Most importantly, this study shows how certain areas of Memetic Computing and Hyper-heuristics are very closely related topics and it also shows that their combination can lead to the development of powerful algorithmic frameworks.
BibTeX:
@inproceedings{EpitropakisCNB14,
  author = {Michael G. Epitropakis and Fabio Caraffini and Ferrante Neri and Edmund K. Burke},
  title = {A Separability Prototype for Automatic Memes with Adaptive Operator Selection},
  booktitle = {Proceedings of IEEE Symposium on Foundations of Computational Intelligence (FOCI '14)},
  publisher = {IEEE},
  year = {2014},
  pages = {70-77},
  doi = {http://dx.doi.org/10.1109/FOCI.2014.7007809}
}
Ferrucci, F., Harman, M. and Sarro, F. Search-Based Software Project Management, in Software Project Management in a Changing World 2014, pp. 373-399  inbook DOI  
Abstract: Project management presents the manager with a complex set of related optimisation problems. Decisions made can more profoundly affect the outcome of a project than any other activity. In this chapter, we provide an overview of Search-Based Software Project Management, in which search-based software engineering (SBSE) is applied to problems in software project management. We show how SBSE has been used to attack the problems of staffing, scheduling, risk, and effort estimation. SBSE can help to solve the optimisation problems the manager faces, but it can also yield insight. SBSE therefore provides both decision making and decision support. We provide a comprehensive survey of search-based software project management and give directions for the development of this subfield of SBSE.
BibTeX:
@inbook{FerrucciHS14,
  author = {Filomena Ferrucci and Mark Harman and Federica Sarro},
  title = {Search-Based Software Project Management, in Software Project Management in a Changing World},
  publisher = {Springer},
  year = {2014},
  pages = {373-399},
  doi = {http://dx.doi.org/10.1007/978-3-642-55035-5_15}
}
Fu, H., Lewis, P.R., Sendhoff, B., Tang, K. and Yao, X. What Are Dynamic Optimization Problems? 2014 Proceedings of the IEEE Congress on Evolutionary Computation (CEC '14), pp. 1550-1557  inproceedings URL 
Abstract: Dynamic Optimization Problems (DOPs) have been widely studied using Evolutionary Algorithms (EAs). Yet, a clear and rigorous definition of DOPs is lacking in the Evolutionary Dynamic Optimization (EDO) community. In this paper, we propose a unified definition of DOPs based on the idea of multiple-decision-making discussed in the Reinforcement Learning (RL) community. We draw a connection between EDO and RL by arguing that both of them are studying DOPs according to our definition of DOPs. We point out that existing EDO or RL research has been mainly focused on some types of DOPs. A conceptualized benchmark problem, which is aimed at the systematic study of various DOPs, is then developed. Some interesting experimental studies on the benchmark reveal that EDO and RL methods are specialized in certain types of DOPs and more importantly new algorithms for DOPs can be developed by combining the strength of both EDO and RL methods.
BibTeX:
@inproceedings{FuLSTY14,
  author = {Haobo Fu and Peter R. Lewis and Bernhard Sendhoff and Ke Tang and Xin Yao},
  title = {What Are Dynamic Optimization Problems?},
  booktitle = {Proceedings of the IEEE Congress on Evolutionary Computation (CEC '14)},
  publisher = {IEEE},
  year = {2014},
  pages = {1550-1557},
  url = {http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6900316}
}
Haraldsson, S.O. and Woodward, J.R. Automated Design of Algorithms and Genetic Improvement: Contrast and Commonalities 2014 Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation (GECCO '14), pp. 1373-1380  inproceedings DOI  
Abstract: Automated Design of Algorithms (ADA) and Genetic Improvement (GI) are two relatively young fields of research that have been receiving more attention in recent years. Both methodologies can improve programs using evolutionary search methods and successfully produce human competitive programs. ADA and GI are used for improving functional properties such as quality of solution and non-functional properties, e.g. speed, memory, and energy consumption. Only GI of the two has been used to fix bugs, probably because it is applied globally on the whole source code while ADA typically replaces a function or a method locally. While GI is applied directly to the source code ADA works ex-situ, i.e. as a separate process from the program it is improving. Although the methodologies overlap in many ways they differ on some fundamentals and for further progress to be made researchers from both disciplines should be aware of each other's work.
BibTeX:
@inproceedings{HaraldssonW14,
  author = {Saemundur O. Haraldsson and John R. Woodward},
  title = {Automated Design of Algorithms and Genetic Improvement: Contrast and Commonalities},
  booktitle = {Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation (GECCO '14)},
  publisher = {ACM},
  year = {2014},
  pages = {1373-1380},
  doi = {http://dx.doi.org/10.1145/2598394.2609874}
}
Harman, M., Islam, S., Jia, Y., Minku, L.L., Sarro, F. and Srivisut, K. Less is More: Temporal Fault Predictive Performance over Multiple Hadoop Releases 2014
Vol. 8636, Proceedings of the 6th International Symposium on Search-Based Software Engineering (SSBSE '14), pp. 240-246 
inproceedings DOI  
Abstract: We investigate search based fault prediction over time based on 8 consecutive Hadoop versions, aiming to analyse the impact of chronology on fault prediction performance. Our results confound the assumption, implicit in previous work, that additional information from historical versions improves prediction; though G-mean tends to improve, Recall can be reduced.
BibTeX:
@inproceedings{HarmanIJMSS14,
  author = {Mark Harman and Syed Islam and Yue Jia and Leandro L. Minku and Federica Sarro and Komsan Srivisut},
  title = {Less is More: Temporal Fault Predictive Performance over Multiple Hadoop Releases},
  booktitle = {Proceedings of the 6th International Symposium on Search-Based Software Engineering (SSBSE '14)},
  publisher = {Springer},
  year = {2014},
  volume = {8636},
  pages = {240-246},
  doi = {http://dx.doi.org/10.1007/978-3-319-09940-8_18}
}
Harman, M., Jia, Y., Krinke, J., Langdon, W.B., Petke, J. and Zhang, Y. Search Based Software Engineering for Software Product Line Engineering: A Survey and Directions for Future Work 2014 Proceedings of the 18th International Software Product Line Conference (SPLC ’14), pp. 5-18  inproceedings DOI  
Abstract: This paper presents a survey of work on Search Based Software Engineering (SBSE) for Software Product Lines (SPLs). We have attempted to be comprehensive, in the sense that we have sought to include all papers that apply computational search techniques to problems in software product line engineering. Having surveyed the recent explosion in SBSE for SPL research activity, we highlight some directions for future work. We focus on suggestions for the development of recent advances in genetic improvement, showing how these might be exploited by SPL researchers and practitioners: Genetic improvement may grow new products with new functional and non-functional features and graft these into SPLs. It may also merge and parameterise multiple branches to cope with SPL branchmania.
BibTeX:
@inproceedings{HarmanJKLPZ14,
  author = {Mark Harman and Yue Jia and Jens Krinke and William B. Langdon and Justyna Petke and Yuanyuan Zhang},
  title = {Search Based Software Engineering for Software Product Line Engineering: A Survey and Directions for Future Work},
  booktitle = {Proceedings of the 18th International Software Product Line Conference (SPLC ’14)},
  publisher = {ACM},
  year = {2014},
  pages = {5-18},
  doi = {http://dx.doi.org/10.1145/2648511.2648513}
}
Harman, M., Jia, Y. and Langdon, W.B. Babel Pidgin: SBSE Can Grow and Graft Entirely New Functionality into a Real World System 2014
Vol. 8636, Proceedings of the 6th International Symposium on Search-Based Software Engineering (SSBSE '14), pp. 247-252 
inproceedings DOI  
Abstract: Adding new functionality to an existing, large, and perhaps poorly-understood system is a challenge, even for the most competent human programmer. We introduce a ‘grow and graft’ approach to Genetic Improvement (GI) that transplants new functionality into an existing system. We report on the trade-offs between varying degrees of human guidance to the GI transplantation process. Using our approach, we successfully grew and transplanted a new ‘Babel Fish’ linguistic translation feature into the Pidgin instant messaging system, creating a genetically improved system we call ‘Babel Pidgin’. This is the first time that SBSE has been used to evolve and transplant entirely novel functionality into an existing system. Our results indicate that our grow and graft approach requires surprisingly little human guidance.
BibTeX:
@inproceedings{HarmanJL14,
  author = {Mark Harman and Yue Jia and William B. Langdon},
  title = {Babel Pidgin: SBSE Can Grow and Graft Entirely New Functionality into a Real World System},
  booktitle = {Proceedings of the 6th International Symposium on Search-Based Software Engineering (SSBSE '14)},
  publisher = {Springer},
  year = {2014},
  volume = {8636},
  pages = {247-252},
  doi = {http://dx.doi.org/10.1007/978-3-319-09940-8_19}
}
Harman, M., Jia, Y., Langdon, W.B., Petke, J., Moghadam, I.H., Yoo, S. and Wu, F. Genetic Improvement for Adaptive Software Engineering (keynote) 2014 Proceedings of the 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS '14), pp. 1-4  inproceedings DOI  
Abstract: This paper presents a brief outline of an approach to online genetic improvement. We argue that existing progress in genetic improvement can be exploited to support adaptivity. We illustrate our proposed approach with a 'dreaming smart device' example that combines online and offline machine learning and optimisation.
BibTeX:
@inproceedings{HarmanJLPMYW14,
  author = {Mark Harman and Yue Jia and William B. Langdon and Justyna Petke and Iman Hemati Moghadam and Shin Yoo and Fan Wu},
  title = {Genetic Improvement for Adaptive Software Engineering (keynote)},
  booktitle = {Proceedings of the 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS '14)},
  publisher = {ACM},
  year = {2014},
  pages = {1-4},
  doi = {http://dx.doi.org/10.1145/2593929.2600116}
}
Kocsis, Z.A., Neumann, G., Swan, J., Epitropakis, M.G., Brownlee, A.E.I., Haraldsson, S.O. and Bowles, E. Repairing and Optimizing Hadoop hashCode Implementations 2014 Proceedings of the 6th International Symposium on Search-Based Software Engineering (SSBSE '14), pp. 259-264  inproceedings DOI  
Abstract: We describe how contract violations in Java hashCode methods can be repaired using a novel combination of semantics-preserving and generative methods, the latter being achieved via Automatic Improvement Programming. The method described is universally applicable. When applied to the Hadoop platform, it was established that it produces hashCode functions that are at least as good as the original, broken method, as well as those produced by a widely-used alternative method from the ‘Apache Commons’ library.
BibTeX:
@inproceedings{KocsisNSEBHB14,
  author = {Z. A. Kocsis and G. Neumann and J. Swan and M. G. Epitropakis and A. E. I. Brownlee and S. O. Haraldsson and E. Bowles},
  title = {Repairing and Optimizing Hadoop hashCode Implementations},
  booktitle = {Proceedings of the 6th International Symposium on Search-Based Software Engineering (SSBSE '14)},
  publisher = {Springer},
  year = {2014},
  pages = {259-264},
  doi = {http://dx.doi.org/10.1007/978-3-319-09940-8_22}
}
Kocsis, Z.A. and Swan, J. Asymptotic Genetic Improvement Programming with Type Functors and Catamorphisms 2014 Proceedings of Semantic Methods in Genetic Programming (SMGP) at Parallel Problem Solving from Nature (PPSN XIV)  inproceedings URL 
BibTeX:
@inproceedings{KocsisS14,
  author = {Zoltan A. Kocsis and Jerry Swan},
  title = {Asymptotic Genetic Improvement Programming with Type Functors and Catamorphisms},
  booktitle = {Proceedings of Semantic Methods in Genetic Programming (SMGP) at Parallel Problem Solving from Nature (PPSN XIV)},
  year = {2014},
  url = {http://www.cs.bham.ac.uk/~wbl/biblio/cache/cache/.hidden_13-jun_1511629145/http___www.cs.put.poznan.pl_kkrawiec_smgp2014_uploads_Site_Kocsis.pdf}
}
Kovitz, B. and Swan, J. Structural Stigmergy: A Speculative Pattern Language for Metaheuristics 2014 Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (GECCO '14), pp. 1407-1410  inproceedings DOI  
Abstract: To construct graphs whose quality results from complicated relationships that pervade the entire graph, especially relationships at multiple scales, follow a strategy of repeatedly making local patches to a single graph. Look for small, easily recognized flaws in local areas of the graph and fix them. Add tags to the graph to represent non-local relationships and higher-level structures as individual nodes. The tags then have easily recognized flaws that relate to non-local and higher-level concerns, enabling local patching to set off cascades of local fixes that address those concerns.
BibTeX:
@inproceedings{KovitzS14,
  author = {Ben Kovitz and Jerry Swan},
  title = {Structural Stigmergy: A Speculative Pattern Language for Metaheuristics},
  booktitle = {Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (GECCO '14)},
  publisher = {ACM},
  year = {2014},
  pages = {1407-1410},
  doi = {http://dx.doi.org/10.1145/2598394.2609845}
}
Kovitz, B. and Swan, J. Tagging in Metaheuristics 2014 Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (GECCO '14), pp. 1411-1414  inproceedings DOI  
Abstract: Could decisions made during some search iterations use information discovered by other search iterations? Then store that information in tags: data that persist between search iterations.
BibTeX:
@inproceedings{KovitzS14b,
  author = {Ben Kovitz and Jerry Swan},
  title = {Tagging in Metaheuristics},
  booktitle = {Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (GECCO '14)},
  publisher = {ACM},
  year = {2014},
  pages = {1411-1414},
  doi = {http://dx.doi.org/10.1145/2598394.2609844}
}
Langdon, W.B. Genetic Improvement of Programs 2014 Proceedings of the 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC '14), pp. 14-19  inproceedings DOI  
Abstract: Genetic programming can optimise software, including: evolving test benchmarks, generating hyper-heuristics by searching meta-heuristics, generating communication protocols, composing telephony systems and web services, generating improved hashing and C++ heap managers, redundant programming and even automatic bug fixing. Particularly in embedded real-time or mobile systems, there may be many ways to trade off expenses (such as time, memory, energy, power consumption) vs. functionality. Human programmers cannot try them all. Also, the best multi-objective Pareto trade-off may change with time, underlying hardware and network connection or user behaviour. It may be that GP can automatically suggest different trade-offs for each new market. Recent results include substantial speed up by evolving a new version of a program customised for a special case.
BibTeX:
@inproceedings{Langdon14,
  author = {William B. Langdon},
  title = {Genetic Improvement of Programs},
  booktitle = {Proceedings of the 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC '14)},
  publisher = {IEEE},
  year = {2014},
  pages = {14-19},
  doi = {http://dx.doi.org/10.1109/SYNASC.2014.10}
}
Langdon, W.B. Improved CUDA 3D Medical Image Registration 2014 Summary of presentation at 6th UKMAC  misc URL 
Abstract: A combination of manual and genetic improvement (GI) can optimise a critical component of NiftyReg healthcare industry software across a diverse range of six nVidia graphics processing units (GPUs). The improved K20c kernel gives a speed up > 2000 fold compared to released code on a 3GHz CPU.
BibTeX:
@misc{Langdon14b,
  author = {William B. Langdon},
  title = {Improved CUDA 3D Medical Image Registration},
  year = {2014},
  url = {http://www0.cs.ucl.ac.uk/staff/W.Langdon/ftp/papers/Langdon_2014_ukmac.pdf}
}
Langdon, W.B. and Harman, M. Genetically Improved CUDA C++ Software 2014 Proceedings of the 17th European Conference on Genetic Programming (EuroGP '14), pp. 87-99  inproceedings DOI  
Abstract: Genetic Programming (GP) may dramatically increase the performance of software written by domain experts. GP and autotuning are used to optimise and refactor legacy GPGPU C code for modern parallel graphics hardware and software. Speed ups of more than six times on recent nVidia GPU cards are reported compared to the original kernel on the same hardware.
BibTeX:
@inproceedings{LangdonH14,
  author = {William B. Langdon and Mark Harman},
  title = {Genetically Improved CUDA C++ Software},
  booktitle = {Proceedings of the 17th European Conference on Genetic Programming (EuroGP '14)},
  publisher = {Springer},
  year = {2014},
  pages = {87-99},
  doi = {http://dx.doi.org/10.1007/978-3-662-44303-3_8}
}
Langdon, W.B., Modat, M., Petke, J. and Harman, M. Improving 3D Medical Image Registration CUDA Software with Genetic Programming 2014 Proceedings of the 2014 Conference on Genetic and Evolutionary Computation (GECCO '14), pp. 951-958  inproceedings DOI  
Abstract: Genetic Improvement (GI) is shown to optimise, in some cases by more than 35%, a critical component of healthcare industry software across a diverse range of six nVidia graphics processing units (GPUs). GP and other search based software engineering techniques can automatically optimise the current rate-limiting CUDA parallel function in the NiftyReg open source C++ project used to align or register high resolution nuclear magnetic resonance (NMRI) and other diagnostic NIfTI images. Future neurosurgery techniques will require hardware acceleration, such as GPGPU, to enable real-time comparison of three dimensional in-theatre images with earlier patient images and reference data. With millimetre resolution brain scan measurements comprising more than ten million voxels, the modified kernel can process in excess of 3 billion active voxels per second.
BibTeX:
@inproceedings{LangdonMPH14,
  author = {William B. Langdon and Marc Modat and Justyna Petke and Mark Harman},
  title = {Improving 3D Medical Image Registration CUDA Software with Genetic Programming},
  booktitle = {Proceedings of the 2014 Conference on Genetic and Evolutionary Computation (GECCO '14)},
  publisher = {ACM},
  year = {2014},
  pages = {951-958},
  doi = {http://dx.doi.org/10.1145/2576768.2598244}
}
Li, L., Harman, M., Letier, E. and Zhang, Y. Robust Next Release Problem: Handling Uncertainty during Optimization 2014 Proceedings of the 2014 Conference on Genetic and Evolutionary Computation (GECCO '14), pp. 1247-1254  inproceedings DOI  
Abstract: Uncertainty is inevitable in real world requirement engineering. It has a significant impact on the feasibility of proposed solutions and thus brings risks to the software release plan. This paper proposes a multi-objective optimization technique, augmented with Monte-Carlo Simulation, that optimizes requirement choices for the three objectives of cost, revenue, and uncertainty. The paper reports the results of an empirical study over four data sets derived from a single real world data set. The results show that the robust optimal solutions obtained by our approach are conservative compared to their corresponding optimal solutions produced by the traditional Multi-Objective Next Release Problem. We obtain a robustness improvement of at least 18% at a small cost (a maximum 0.0285 shift in the 2D Pareto-front in the unit space). Surprisingly, we found that, though a requirement's cost is correlated with inclusion on the Pareto-front, a requirement's expected revenue is not.
BibTeX:
@inproceedings{LiHLZ14,
  author = {Lingbo Li and Mark Harman and Emmanuel Letier and Yuanyuan Zhang},
  title = {Robust Next Release Problem: Handling Uncertainty during Optimization},
  booktitle = {Proceedings of the 2014 Conference on Genetic and Evolutionary Computation (GECCO '14)},
  publisher = {ACM},
  year = {2014},
  pages = {1247-1254},
  doi = {http://dx.doi.org/10.1145/2576768.2598334}
}
Minku, L.L., Sudholt, D. and Yao, X. Improved Evolutionary Algorithm Design for the Project Scheduling Problem Based on Runtime Analysis 2014 IEEE Transactions on Software Engineering
Vol. 40(1), pp. 83-102 
article DOI  
Abstract: Several variants of evolutionary algorithms (EAs) have been applied to solve the project scheduling problem (PSP), yet their performance highly depends on design choices for the EA. It is still unclear how and why different EAs perform differently. We present the first runtime analysis for the PSP, gaining insights into the performance of EAs on the PSP in general, and on specific instance classes that are easy or hard. Our theoretical analysis has practical implications: based on it, we derive an improved EA design. This includes normalizing employees' dedication for different tasks to ensure they are not working overtime; a fitness function that requires fewer pre-defined parameters and provides a clear gradient towards feasible solutions; and an improved representation and mutation operator. Both our theoretical and empirical results show that our design is very effective. Combining the use of normalization with a population gave the best results in our experiments, and normalization was a key component for the practical effectiveness of the new design. Not only does our paper offer a new and effective algorithm for the PSP, it also provides a rigorous theoretical analysis to explain the efficiency of the algorithm, especially for increasingly large projects.
BibTeX:
@article{MinkuSY14,
  author = {Leandro L. Minku and Dirk Sudholt and Xin Yao},
  title = {Improved Evolutionary Algorithm Design for the Project Scheduling Problem Based on Runtime Analysis},
  journal = {IEEE Transactions on Software Engineering},
  year = {2014},
  volume = {40},
  number = {1},
  pages = {83-102},
  doi = {http://dx.doi.org/10.1109/TSE.2013.52}
}
Minku, L.L. and Yao, X. How to Make Best Use of Cross-company Data in Software Effort Estimation? 2014 Proceedings of the 36th International Conference on Software Engineering (ICSE'14), pp. 446-456  inproceedings DOI  
Abstract: Previous works using Cross-Company (CC) data for making Within-Company (WC) Software Effort Estimation (SEE) try to use CC data or models directly to provide predictions in the WC context. So, these data or models are only helpful when they match the WC context well. When they do not, a fair amount of WC training data, which are usually expensive to acquire, are still necessary to achieve good performance. We investigate how to make best use of CC data, so that we can reduce the amount of WC data while maintaining or improving performance in comparison to WC SEE models. This is done by proposing a new framework to learn the relationship between CC and WC projects explicitly, allowing CC models to be mapped to the WC context. Such mapped models can be useful even when the CC models themselves do not match the WC context directly. Our study shows that a new approach instantiating this framework is able not only to use substantially less WC data than a corresponding WC model, but also to achieve similar/better performance. This approach can also be used to provide insight into the behaviour of a company in comparison to others.
BibTeX:
@inproceedings{MinkuY14,
  author = {Leandro L. Minku and Xin Yao},
  title = {How to Make Best Use of Cross-company Data in Software Effort Estimation?},
  booktitle = {Proceedings of the 36th International Conference on Software Engineering (ICSE'14)},
  publisher = {ACM},
  year = {2014},
  pages = {446-456},
  doi = {http://dx.doi.org/10.1145/2568225.2568228}
}
Neumann, G., Swan, J., Harman, M. and Clark, J.A. The Executable Experimental Template Pattern for the Systematic Comparison of Metaheuristics: Extended Abstract 2014 Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (GECCO '14), pp. 1427-1430  inproceedings DOI  
BibTeX:
@inproceedings{NeumannSHC14,
  author = {Geoffrey Neumann and Jerry Swan and Mark Harman and John A. Clark},
  title = {The Executable Experimental Template Pattern for the Systematic Comparison of Metaheuristics: Extended Abstract},
  booktitle = {Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (GECCO '14)},
  publisher = {ACM},
  year = {2014},
  pages = {1427-1430},
  doi = {http://dx.doi.org/10.1145/2598394.2609850}
}
Petke, J., Harman, M., Langdon, W.B. and Weimer, W. Using Genetic Improvement & Code Transplants to Specialise a C++ Program to a Problem Class 2014 Proceedings of the 17th European Conference on Genetic Programming (EuroGP '14), pp. 137-149  inproceedings DOI  
Abstract: Genetic Improvement (GI) is a form of Genetic Programming that improves an existing program. We use GI to evolve a faster version of a C++ program, a Boolean satisfiability (SAT) solver called MiniSAT, specialising it for a particular problem class, namely Combinatorial Interaction Testing (CIT), using automated code transplantation. Our GI-evolved solver achieves an overall 17% improvement, making it comparable with average expert human performance. Additionally, this automatically evolved solver is faster than any of the human-improved solvers for the CIT problem.
BibTeX:
@inproceedings{PetkeHLW14,
  author = {Justyna Petke and Mark Harman and William B. Langdon and Westley Weimer},
  title = {Using Genetic Improvement & Code Transplants to Specialise a C++ Program to a Problem Class},
  booktitle = {Proceedings of the 17th European Conference on Genetic Programming (EuroGP '14)},
  publisher = {Springer},
  year = {2014},
  pages = {137-149},
  doi = {http://dx.doi.org/10.1007/978-3-662-44303-3_12}
}
Song, L., Minku, L.L. and Yao, X. The Potential Benefit of Relevance Vector Machine to Software Effort Estimation 2014 Proceedings of the 10th International Conference on Predictive Models in Software Engineering (PROMISE '14), pp. 52-61  inproceedings DOI  
Abstract: Three key challenges faced by the task of software effort estimation (SEE) when using predictive models are: (1) in order to support decision-making, software managers should have access not only to the effort estimation given by the predictive model, but also to how confident this model is in estimating a given project and how likely other effort values could be the real efforts required to develop this project, (2) SEE data is likely to contain noise, due to the participation of humans in the data collection, and this noise can hinder predictions if not catered for, and (3) data collection is an expensive task, and guidelines on when new data need to be collected would be helpful for reducing the cost associated with data collection. However, even though SEE has been studied for decades and many predictors have been proposed, few methods focus on these issues. In this work, we show that relevance vector machine (RVM) is a promising predictive method for addressing these three challenges. More specifically, it explicitly handles noise, it provides probabilistic predictions of effort, and can be used to identify when the required efforts of new projects should be collected for using them as training examples. With that in mind, this work provides the first step in exploiting RVM's potential for SEE by validating both its point prediction and prediction intervals. It then explains in detail future directions in terms of how RVMs can be further exploited for addressing the above-mentioned challenges. Our systematic experiments show that RVM is very competitive compared with state-of-the-art SEE approaches, usually being ranked first or second in terms of mean absolute error across 7 of the 11 data sets. We also demonstrate how RVM can be used to judge the amount of noise present in the data. In summary, we show that RVM is a very promising predictor for SEE and should be further exploited.
BibTeX:
@inproceedings{SongMY14,
  author = {Liyan Song and Leandro L. Minku and Xin Yao},
  title = {The Potential Benefit of Relevance Vector Machine to Software Effort Estimation},
  booktitle = {Proceedings of the 10th International Conference on Predictive Models in Software Engineering (PROMISE '14)},
  publisher = {ACM},
  year = {2014},
  pages = {52-61},
  doi = {http://dx.doi.org/10.1145/2639490.2639510}
}
Swan, J., Edjvet, M. and Özcan, E. Augmenting Metaheuristics with Rewriting Systems 2014 (CSM-197)  techreport URL 
Abstract: We describe the use of a rewriting system to determine equivalence classes over the search-space of optimisation problems. These equivalence classes may be used to derive redundant subsequences in the search-space for incorporation into metaheuristics such as backtracking, genetic algorithms and tabu-search. We use this approach as the basis for a new tabu-search variant, 'Equational-TS', and apply it to the Quadratic Assignment Problem, yielding significant results in terms of the number of iterations to convergence.
BibTeX:
@techreport{SwanEO14,
  author = {Jerry Swan and Martin Edjvet and Ender Özcan},
  title = {Augmenting Metaheuristics with Rewriting Systems},
  year = {2014},
  number = {CSM-197},
  url = {http://www.cs.stir.ac.uk/~jsw/metaheuristic-atp-TR.pdf}
}
Swan, J., Epitropakis, M.G. and Woodward, J. Gen-O-Fix: An Embeddable Framework for Dynamic Adaptive Genetic Improvement Programming 2014 (CSM-195)  techreport URL 
Abstract: Genetic Improvement Programming (GIP) is concerned with automating the burden of software maintenance, the most costly phase of the software life cycle. We describe Gen-O-Fix, a GIP framework which allows a software system hosted on the Java Virtual Machine to be continually improved (e.g. make better predictions; pass more regression tests; reduce power consumption). It is the first exemplar of a dynamic adaptive GIP framework, i.e. it can improve a system as it runs. It is written in the Scala programming language and uses reflection to yield source-to-source transformation. One of the design goals for Gen-O-Fix was to create a tool that is user-centric rather than researcher-centric: the end-user is required only to provide a measure of system quality and the URL of the source code to be improved. We discuss potential applications to predictive, embedded and high-performance systems.
BibTeX:
@techreport{SwanEW14,
  author = {Jerry Swan and Michael G Epitropakis and John Woodward},
  title = {Gen-O-Fix: An Embeddable Framework for Dynamic Adaptive Genetic Improvement Programming},
  year = {2014},
  number = {CSM-195},
  url = {http://www.cs.stir.ac.uk/~jrw/publications/genofix-TR.pdf}
}
Swan, J. and Kocsis, Z.A. Automatic Improvement Programming with Hylomorphisms 2014   techreport  
BibTeX:
@techreport{SwanK14,
  author = {Jerry Swan and Zoltan A. Kocsis},
  title = {Automatic Improvement Programming with Hylomorphisms},
  year = {2014}
}
Swan, J., Kocsis, Z.A. and Lisitsa, A. The 'Representative' Metaheuristic Design Pattern 2014 Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (GECCO '14), pp. 1435-1436  inproceedings DOI  
BibTeX:
@inproceedings{SwanKL14,
  author = {Jerry Swan and Zoltan A. Kocsis and Alexei Lisitsa},
  title = {The 'Representative' Metaheuristic Design Pattern},
  booktitle = {Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (GECCO '14)},
  publisher = {ACM},
  year = {2014},
  pages = {1435-1436},
  doi = {http://dx.doi.org/10.1145/2598394.2609842}
}
Swan, J., Woodward, J., Özcan, E., Kendall, G. and Burke, E. Searching the Hyper-heuristic Design Space 2014 Cognitive Computation
Vol. 6(6), pp. 66-73 
article DOI  
Abstract: We extend a previous mathematical formulation of hyper-heuristics to reflect the emerging generalization of the concept. We show that this leads naturally to a recursive definition of hyper-heuristics and to a division of responsibility that is suggestive of a blackboard architecture, in which individual heuristics annotate a shared workspace with information that may also be exploited by other heuristics. Such a framework invites consideration of the kind of relaxations of the domain barrier that can be achieved without loss of generality. We give a concrete example of this architecture with an application to the 3-SAT domain that significantly improves on a related token-ring hyper-heuristic.
BibTeX:
@article{SwanWOKB14,
  author = {Jerry Swan and John Woodward and Ender Özcan and Graham Kendall and Edmund Burke},
  title = {Searching the Hyper-heuristic Design Space},
  journal = {Cognitive Computation},
  year = {2014},
  volume = {6},
  number = {6},
  pages = {66-73},
  doi = {http://dx.doi.org/10.1007/s12559-013-9201-8}
}
Tang, K., Peng, F., Chen, G. and Yao, X. Population-based Algorithm Portfolios with Automated Constituent Algorithms Selection 2014 Information Sciences
Vol. 279, pp. 94-104 
article DOI  
Abstract: Population-based Algorithm Portfolios (PAP) is an appealing framework for integrating different Evolutionary Algorithms (EAs) to solve challenging numerical optimization problems. In particular, PAP has shown significant advantages over single EAs when a number of problems need to be solved simultaneously. Previous investigation on PAP reveals that choosing appropriate constituent algorithms is crucial to the success of PAP. However, no method has been developed for this purpose. In this paper, an extended version of PAP, namely PAP based on Estimated Performance Matrix (EPM-PAP), is proposed. EPM-PAP is equipped with a novel constituent algorithm selection module, which is based on the EPM of each candidate EA. Empirical studies demonstrate that the EPM-based selection method can successfully identify appropriate constituent EAs, and thus EPM-PAP outperformed all single EAs considered in this work.
BibTeX:
@article{TangPCY14,
  author = {Ke Tang and Fei Peng and Guoliang Chen and Xin Yao},
  title = {Population-based Algorithm Portfolios with Automated Constituent Algorithms Selection},
  journal = {Information Sciences},
  year = {2014},
  volume = {279},
  pages = {94-104},
  doi = {http://dx.doi.org/10.1016/j.ins.2014.03.105}
}
Woodward, J.R. and Swan, J. Template Method Hyper-heuristics 2014 Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (GECCO '14), pp. 1437-1438  inproceedings DOI  
Abstract: The optimization literature is awash with metaphorically-inspired metaheuristics and their subsequent variants and hybridizations. This results in a plethora of methods, with descriptions that are often polluted with the language of the metaphor which inspired them [8]. Within such a fragmented field, the traditional approach of manual 'operator tweaking' makes it difficult to establish the contribution of individual metaheuristic components to the overall success of a methodology. Irrespective of whether it happens to best the state-of-the-art, such 'tweaking' is so labour-intensive that it does relatively little to advance scientific understanding. In order to introduce further structure and rigour, it is therefore desirable not only to be able to specify entire families of metaheuristics (rather than individual metaheuristics), but also to be able to generate and test them. In particular, the adoption of a model agnostic approach towards the generation of metaheuristics would help to establish which metaheuristic components are useful contributors to a solution.
BibTeX:
@inproceedings{WoodwardS14,
  author = {John R. Woodward and Jerry Swan},
  title = {Template Method Hyper-heuristics},
  booktitle = {Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (GECCO '14)},
  publisher = {ACM},
  year = {2014},
  pages = {1437-1438},
  doi = {http://dx.doi.org/10.1145/2598394.2609843}
}
Woodward, J., Swan, J. and Martin, S. The ‘Composite’ Design Pattern in Metaheuristics 2014 Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (GECCO '14), pp. 1439-1444  inproceedings DOI  
BibTeX:
@inproceedings{WoodwardSM14,
  author = {John Woodward and Jerry Swan and Simon Martin},
  title = {The ‘Composite’ Design Pattern in Metaheuristics},
  booktitle = {Proceedings of the 2014 Conference Companion on Genetic and Evolutionary Computation Companion (GECCO '14)},
  publisher = {ACM},
  year = {2014},
  pages = {1439-1444},
  doi = {http://dx.doi.org/10.1145/2598394.2609848}
}
Yao, X., Harman, M. and Jia, Y. A Study of Equivalent and Stubborn Mutation Operators using Human Analysis of Equivalence 2014 Proceedings of the 36th International Conference on Software Engineering (ICSE'14), pp. 919-930  inproceedings DOI  
Abstract: Though mutation testing has been widely studied for more than thirty years, the prevalence and properties of equivalent mutants remain largely unknown. We report on the causes and prevalence of equivalent mutants and their relationship to stubborn mutants (those that remain undetected by a high quality test suite, yet are non-equivalent). Our results, based on manual analysis of 1,230 mutants from 18 programs, reveal a highly uneven distribution of equivalence and stubbornness. For example, the ABS class and half UOI class generate many equivalent and almost no stubborn mutants, while the LCR class generates many stubborn and few equivalent mutants. We conclude that previous test effectiveness studies based on fault seeding could be skewed, while developers of mutation testing tools should prioritise those operators that we found generate disproportionately many stubborn (and few equivalent) mutants.
BibTeX:
@inproceedings{YaoHJ14,
  author = {Xiangjuan Yao and Mark Harman and Yue Jia},
  title = {A Study of Equivalent and Stubborn Mutation Operators using Human Analysis of Equivalence},
  booktitle = {Proceedings of the 36th International Conference on Software Engineering (ICSE'14)},
  publisher = {ACM},
  year = {2014},
  pages = {919-930},
  doi = {http://dx.doi.org/10.1145/2568225.2568265}
}
Zhang, Y., Harman, M., Ochoa, G., Ruhe, G. and Brinkkemper, S. An Empirical Study of Meta- and Hyper-Heuristic Search for Multi-Objective Release Planning 2014 (RN/14/07)  techreport URL 
Abstract: A variety of meta-heuristic search algorithms have been introduced for optimising software release planning. However, there has been no comprehensive empirical study of different search algorithms across multiple different real world datasets. In this paper we present an empirical study of global, local and hybrid meta- and hyper-heuristic search based algorithms on 10 real world datasets. We find that the hyper-heuristics are particularly effective. For example, the hyper-heuristic genetic algorithm significantly outperformed the other six approaches (and with high effect size) for solution quality 85% of the time, and was also faster than all others 70% of the time. Furthermore, correlation analysis reveals that it scales well as the number of requirements increases.
BibTeX:
@techreport{ZhangHORB14,
  author = {Yuanyuan Zhang and Mark Harman and Gabriela Ochoa and Guenther Ruhe and Sjaak Brinkkemper},
  title = {An Empirical Study of Meta- and Hyper-Heuristic Search for Multi-Objective Release Planning},
  year = {2014},
  number = {RN/14/07},
  url = {http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/research/Research_Notes/RN_14_07.pdf}
}
Burke, E.K., Gendreau, M., Hyde, M., Kendall, G., Ochoa, G., Özcan, E. and Qu, R. Hyper-heuristics: A Survey of the State of the Art 2013 Journal of the Operational Research Society
Vol. 64(12), pp. 1695-1724 
article DOI  
Abstract: Hyper-heuristics comprise a set of approaches that are motivated (at least in part) by the goal of automating the design of heuristic methods to solve hard computational search problems. An underlying strategic research challenge is to develop more generally applicable search methodologies. The term hyper-heuristic is relatively new; it was first used in 2000 to describe heuristics to choose heuristics in the context of combinatorial optimisation. However, the idea of automating the design of heuristics is not new; it can be traced back to the 1960s. The definition of hyper-heuristics has been recently extended to refer to a search method or learning mechanism for selecting or generating heuristics to solve computational search problems. Two main hyper-heuristic categories can be considered: heuristic selection and heuristic generation. The distinguishing feature of hyper-heuristics is that they operate on a search space of heuristics (or heuristic components) rather than directly on the search space of solutions to the underlying problem that is being addressed. This paper presents a critical discussion of the scientific literature on hyper-heuristics including their origin and intellectual roots, a detailed account of the main types of approaches, and an overview of some related areas. Current research trends and directions for future research are also discussed.
BibTeX:
@article{Burke13,
  author = {Edmund K. Burke and Michel Gendreau and Matthew Hyde and Graham Kendall and Gabriela Ochoa and Ender Özcan and Rong Qu},
  title = {Hyper-heuristics: A Survey of the State of the Art},
  journal = {Journal of the Operational Research Society},
  year = {2013},
  volume = {64},
  number = {12},
  pages = {1695-1724},
  doi = {http://dx.doi.org/10.1057/jors.2013.71}
}
Burton, F.R. and Poulding, S. Complementing Metaheuristic Search with Higher Abstraction Techniques 2013 Proceedings of the 1st International Workshop on Combining Modelling and Search-Based Software Engineering (CMSBSE '13), pp. 45-48  inproceedings DOI  
Abstract: Search-Based Software Engineering and Model-Driven Engineering are both innovative approaches to software engineering. The premise of Search-Based Software Engineering is to reformulate engineering tasks as optimisation problems that can be solved using metaheuristic search techniques. Model-Driven Engineering aims to apply greater levels of abstraction to software engineering problems. In this paper, it is argued that these two approaches are complementary and that both research fields can make further progress by applying techniques from the other. We suggest ways in which synergies between the fields can be exploited.
BibTeX:
@inproceedings{BurtonP13,
  author = {Frank R. Burton and Simon Poulding},
  title = {Complementing Metaheuristic Search with Higher Abstraction Techniques},
  booktitle = {Proceedings of the 1st International Workshop on Combining Modelling and Search-Based Software Engineering (CMSBSE '13)},
  publisher = {IEEE},
  year = {2013},
  pages = {45-48},
  doi = {http://dx.doi.org/10.1109/CMSBSE.2013.6604436}
}
Epitropakis, M.G., Li, X. and Burke, E.K. A Dynamic Archive Niching Differential Evolution Algorithm for Multimodal Optimization 2013 Proceedings of the 2013 IEEE Congress on Evolutionary Computation (CEC '13), pp. 79-86  inproceedings DOI  
Abstract: Highly multimodal landscapes with multiple local/global optima represent common characteristics in real-world applications. Many niching algorithms have been proposed in the literature which aim to search such landscapes in an attempt to locate as many global optima as possible. However, to locate and maintain a large number of global solutions, these algorithms are substantially influenced by their parameter values, such as a large population size. Here, we propose a new niching Differential Evolution algorithm that attempts to overcome the population size influence and produce good performance almost independently of its population size. To this end, we incorporate two mechanisms into the algorithm: a control parameter adaptation technique and an external dynamic archive along with a reinitialization mechanism. The first mechanism is designed to efficiently adapt the control parameters of the algorithm, whilst the second one is responsible for enabling the algorithm to investigate unexplored regions of the search space and simultaneously keep the best solutions found by the algorithm. The proposed approach is compared with two Differential Evolution variants on a recently proposed benchmark suite. Empirical results indicate that the proposed niching algorithm is competitive and very promising. It exhibits a robust and stable behavior, whilst the incorporation of the dynamic archive seems to tackle the population size influence effectively. Moreover, it alleviates the problem of having to fine-tune the population size parameter in a niching algorithm.
BibTeX:
@inproceedings{EpitropakisLB13,
  author = {Michael G. Epitropakis and Xiaodong Li and Edmund K. Burke},
  title = {A Dynamic Archive Niching Differential Evolution Algorithm for Multimodal Optimization},
  booktitle = {Proceedings of the 2013 IEEE Congress on Evolutionary Computation (CEC '13)},
  publisher = {IEEE},
  year = {2013},
  pages = {79-86},
  doi = {http://dx.doi.org/10.1109/CEC.2013.6557556}
}
Feldt, R. and Poulding, S. Finding Test Data with Specific Properties via Metaheuristic Search 2013 Proceedings of the 24th IEEE International Symposium on Software Reliability Engineering (ISSRE '13), pp. 4-7 November  inproceedings DOI  
Abstract: For software testing to be effective the test data should cover a large and diverse range of the possible input domain. Boltzmann samplers were recently introduced as a systematic method to randomly generate data with a range of sizes from combinatorial classes, and there are a number of automated testing frameworks that serve a similar purpose. However, size is only one of many possible properties that data generated for software testing should exhibit. For the testing of realistic software systems we also need to trade off between multiple different properties or search for specific instances of data that combine several properties. In this paper we propose a general search-based framework for finding test data with specific properties. In particular, we use a metaheuristic, differential evolution, to search for stochastic models for the data generator. Evaluation of the framework demonstrates that it is more general and flexible than existing solutions based on random sampling.
BibTeX:
@inproceedings{FeldtP13,
  author = {Robert Feldt and Simon Poulding},
  title = {Finding Test Data with Specific Properties via Metaheuristic Search},
  booktitle = {Proceedings of the 24th IEEE International Symposium on Software Reliability Engineering (ISSRE '13)},
  publisher = {IEEE},
  year = {2013},
  pages = {4-7 November},
  doi = {http://dx.doi.org/10.1109/ISSRE.2013.6698888}
}
Ferrucci, F., Harman, M., Ren, J. and Sarro, F. Not Going to Take this Anymore: Multi-Objective Overtime Planning for Software Engineering Projects 2013 Proceedings of the 35th International Conference on Software Engineering (ICSE '13), pp. 462-471  inproceedings URL 
Abstract: Software Engineering and development is well-known to suffer from unplanned overtime, which causes stress and illness in engineers and can lead to poor quality software with higher defects. In this paper, we introduce a multi-objective decision support approach to help balance project risks and duration against overtime, so that software engineers can better plan overtime. We evaluate our approach on 6 real world software projects, drawn from 3 organisations using 3 standard evaluation measures and 3 different approaches to risk assessment. Our results show that our approach was significantly better (p < 0.05) than standard multi-objective search in 76% of experiments (with high Cohen effect size in 85% of these) and was significantly better than currently used overtime planning strategies in 100% of experiments (with high effect size in all). We also show how our approach provides actionable overtime planning results and investigate the impact of the three different forms of risk assessment.
BibTeX:
@inproceedings{FerrucciHRS13,
  author = {Filomena Ferrucci and Mark Harman and Jian Ren and Federica Sarro},
  title = {Not Going to Take this Anymore: Multi-Objective Overtime Planning for Software Engineering Projects},
  booktitle = {Proceedings of the 35th International Conference on Software Engineering (ICSE '13)},
  publisher = {IEEE},
  year = {2013},
  pages = {462-471},
  url = {http://www0.cs.ucl.ac.uk/staff/mharman/icse13.pdf}
}
Harman, M. Software Engineering: An Ideal Set of Challenges for Evolutionary Computation 2013 Proceedings of The 15th International Conference Companion on Genetic and Evolutionary Computation (GECCO '13), pp. 1759-1760  inproceedings DOI  
Abstract: Software is an engineering material to be optimised. Until comparatively recently many computer scientists doubted this; why would one want to optimise something that could be made perfect by pure logical reasoning? However, the wider community has come to realise that, while very small programs may be perfect in isolation, larger software systems may never be (because the world in which they operate is not perfect). Once we accept this, we soon arrive at evolutionary computation as a means of optimising software. However, software is not merely some other engineering material to be optimised. Software is virtual and inherently adaptive, making it better suited to evolutionary computation than any other engineering material. This is leading to breakthroughs at the interface of software engineering and evolutionary computation, though there are still many exciting open problems for evolutionary computation researchers to get their teeth into. This talk will cover some of these recent developments in Search Based Software Engineering (SBSE) and Dynamic Adaptive SBSE.
BibTeX:
@inproceedings{Harman2013,
  author = {Mark Harman},
  title = {Software Engineering: An Ideal Set of Challenges for Evolutionary Computation},
  booktitle = {Proceedings of The 15th International Conference Companion on Genetic and Evolutionary Computation (GECCO '13)},
  year = {2013},
  pages = {1759-1760},
  doi = {http://dx.doi.org/10.1145/2464576.2480770}
}
Harman, M., Langdon, W.B. and Weimer, W. Genetic Programming for Reverse Engineering 2013 Proceedings of the 20th Working Conference on Reverse Engineering (WCRE '13) - Keynote, pp. 1-10  inproceedings DOI  
Abstract: This paper overviews the application of Search Based Software Engineering (SBSE) to reverse engineering with a particular emphasis on the growing importance of recent developments in genetic programming and genetic improvement for reverse engineering. This includes work on SBSE for re-modularisation, refactoring, regression testing, syntax-preserving slicing and dependence analysis, concept assignment and feature location, bug fixing, and code migration. We also explore the possibilities for new directions in research using GP and GI for partial evaluation, amorphous slicing, and product lines, with a particular focus on code transplantation.
BibTeX:
@inproceedings{HarmanLW13,
  author = {Mark Harman and William B. Langdon and Westley Weimer},
  title = {Genetic Programming for Reverse Engineering},
  booktitle = {Proceedings of the 20th Working Conference on Reverse Engineering (WCRE '13) - Keynote},
  publisher = {IEEE},
  year = {2013},
  pages = {1-10},
  doi = {http://dx.doi.org/10.1109/WCRE.2013.6671274}
}
Jia, Y., Cohen, M.B., Harman, M. and Petke, J. Learning Combinatorial Interaction Testing Strategies using Hyperheuristic Search 2013 (RN/13/17)  techreport URL 
Abstract: Two decades of bespoke Combinatorial Interaction Testing (CIT) algorithm development have left software engineers with a bewildering choice of configurable system testing techniques. This paper introduces a single hyperheuristic algorithm that learns CIT strategies, providing a single generalist approach. We report experiments that show that our algorithm competes with known best solutions across constrained and unconstrained problems. For all 26 real world subjects and 29 of the 30 constrained benchmark problems studied, it equals or improves upon the best known result. We also present evidence that our algorithm's strong generic performance is caused by its effective unsupervised learning. Hyperheuristic search is thus a promising way to relocate CIT design intelligence from human to machine.
BibTeX:
@techreport{JiaCHP13,
  author = {Yue Jia and Myra B. Cohen and Mark Harman and Justyna Petke},
  title = {Learning Combinatorial Interaction Testing Strategies using Hyperheuristic Search},
  year = {2013},
  number = {RN/13/17},
  url = {http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/research/Research_Notes/RN_13_17.pdf}
}
Krawiec, K. and Swan, J. Pattern-Guided Genetic Programming 2013 Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation (GECCO '13), pp. 949-956  inproceedings DOI  
Abstract: Online progress in search and optimization is often hindered by neutrality in the fitness landscape, when many genotypes map to the same fitness value. We propose a method for imposing a gradient on the fitness function of a metaheuristic (in this case, Genetic Programming) via a metric (Minimum Description Length) induced from patterns detected in the trajectory of program execution. These patterns are induced via a decision tree classifier. We apply this method to a range of integer and boolean-valued problems, significantly outperforming the standard approach. The method is conceptually straightforward and applicable to virtually any metaheuristic that can be appropriately instrumented.
BibTeX:
@inproceedings{KrawiecS13,
  author = {Krzysztof Krawiec and Jerry Swan},
  title = {Pattern-Guided Genetic Programming},
  booktitle = {Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation (GECCO '13)},
  publisher = {ACM},
  year = {2013},
  pages = {949-956},
  doi = {http://dx.doi.org/10.1145/2463372.2463496}
}
Krawiec, K. and Swan, J. Guiding Evolutionary Learning by Searching for Regularities in Behavioral Trajectories: A Case for Representation Agnosticism 2013 AAAI Fall Symposium: How Should Intelligence be Abstracted in AI Research  inproceedings URL 
BibTeX:
@inproceedings{KrawiecS13b,
  author = {Krzysztof Krawiec and Jerry Swan},
  title = {Guiding Evolutionary Learning by Searching for Regularities in Behavioral Trajectories: A Case for Representation Agnosticism},
  booktitle = {AAAI Fall Symposium: How Should Intelligence be Abstracted in AI Research},
  year = {2013},
  url = {http://www.cs.stir.ac.uk/~jsw/2013AAAIFallSymposium.pdf}
}
López-Camacho, E., Ochoa, G., Terashima-Marín, H. and Burke, E.K. An Effective Heuristic for the Two-dimensional Irregular Bin Packing Problem 2013 Annals of Operations Research
Vol. 206(1), pp. 241-264 
article DOI  
Abstract: This paper proposes an adaptation, to the two-dimensional irregular bin packing problem, of the Djang and Finch heuristic (DJD), originally designed for the one-dimensional bin packing problem. In the two-dimensional case, not only is the piece’s size important, but its shape also has a significant influence. Therefore, DJD as a selection heuristic has to be paired with a placement heuristic to completely construct a solution to the underlying packing problem. A successful adaptation of the DJD requires a routine to reduce computational costs, which is also proposed and successfully tested in this paper. Results, on a wide variety of instance types with convex polygons, are found to be significantly better than those produced by more conventional selection heuristics.
BibTeX:
@article{Lopez-CamachoOTB13,
  author = {Eunice López-Camacho and Gabriela Ochoa and Hugo Terashima-Marín and Edmund K. Burke},
  title = {An Effective Heuristic for the Two-dimensional Irregular Bin Packing Problem},
  journal = {Annals of Operations Research},
  year = {2013},
  volume = {206},
  number = {1},
  pages = {241-264},
  doi = {http://dx.doi.org/10.1007/s10479-013-1341-4}
}
López-Camacho, E., Terashima-Marín, H., Ochoa, G. and Conant-Pablos, S.E. Understanding the Structure of Bin Packing Problems through Principal Component Analysis 2013 International Journal of Production Economics
Vol. 145(2), pp. 488-499 
article DOI  
Abstract: This paper uses a knowledge discovery method, Principal Component Analysis (PCA), to gain a deeper understanding of the structure of bin packing problems and how this relates to the performance of heuristic approaches to solve them. The study considers six heuristics and their combination through an evolutionary hyper-heuristic framework. A wide set of problem instances is considered, including one-dimensional and two-dimensional regular and irregular problems. A number of problem features are considered, which are reduced to the subset of nine features that more strongly relate with heuristic performance. PCA is used to further reduce the dimensionality of the instance features and produce 2D maps. The performance of the heuristics and hyper-heuristics is then super-imposed into these maps to visually reveal relationships between problem features and heuristic behavior. Our analysis indicates that some instances are clearly harder to solve than others for all the studied heuristics and hyper-heuristics. The PCA maps give a valuable indication of the combination of features characterizing easy and hard to solve instances. We indeed found correlations between instance features and heuristic performance. The so-called DJD heuristics are able to best solve a large proportion of instances, but simpler and faster heuristics can outperform them in some cases, in particular when solving 1D instances with a low number of pieces, and, more surprisingly, when solving some difficult 2D instances with small areas with low variability. This analysis can be generalized to other problem domains where a set of features characterize instances and several problem solving heuristics are available.
BibTeX:
@article{Lopez-CamachoTOC13,
  author = {Eunice López-Camacho and Hugo Terashima-Marín and Gabriela Ochoa and Santiago Enrique Conant-Pablos},
  title = {Understanding the Structure of Bin Packing Problems through Principal Component Analysis},
  journal = {International Journal of Production Economics},
  year = {2013},
  volume = {145},
  number = {2},
  pages = {488-499},
  doi = {http://dx.doi.org/10.1016/j.ijpe.2013.04.041}
}
Menzies, T., Kocaguneli, E., Turhan, B., Minku, L.L. and Peters, F. Data Science for Software Engineering 2013 Proceedings of the 35th International Conference on Software Engineering (ICSE '13) - ICSE 2013 Tutorials Track, pp. 1484-1486  inproceedings URL 
Abstract: Target audience: Software practitioners and researchers wanting to understand the state of the art in using data science for software engineering (SE). Content: In the age of big data, data science (the knowledge of deriving meaningful outcomes from data) is an essential skill that software engineers should be equipped with. It can be used to predict useful information on new projects based on completed projects. This tutorial offers core insights about the state-of-the-art in this important field. What participants will learn: Before data science: this tutorial discusses the tasks needed to deploy machine-learning algorithms to organizations (Part 1: Organization Issues). During data science: from discretization to clustering to dichotomization and statistical analysis. And the rest: When local data is scarce, we show how to adapt data from other organizations to local problems. When privacy concerns block access, we show how to privatize data while still being able to mine it. When working with data of dubious quality, we show how to prune spurious information. When data or models seem too complex, we show how to simplify data mining results. When data is too scarce to support intricate models, we show methods for generating predictions. When the world changes, and old models need to be updated, we show how to handle those updates. When the effect is too complex for one model, we show how to reason across ensembles of models. Pre-requisites: This tutorial makes minimal use of maths or advanced algorithms and would be understandable by developers and technical managers.
BibTeX:
@inproceedings{MenziesKTMP13,
  author = {Tim Menzies and Ekrem Kocaguneli and Burak Turhan and Leandro L. Minku and Fayola Peters},
  title = {Data Science for Software Engineering},
  booktitle = {Proceedings of the 35th International Conference on Software Engineering (ICSE '13) - ICSE 2013 Tutorials Track},
  publisher = {IEEE},
  year = {2013},
  pages = {1484-1486},
  url = {http://delivery.acm.org/10.1145/2490000/2487048/p1484-menzies.pdf?ip=128.16.11.133&id=2487048&acc=ACTIVE%20SERVICE&key=BF07A2EE685417C5%2ED93309013A15C57B%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=757811719&CFTOKEN=67050471&__acm__=1456843848_930aa81a346ebfcdc1c81ddd5c714ffd}
}
Minku, L.L. and Yao, X. An Analysis of Multi-objective Evolutionary Algorithms for Training Ensemble Models Based on Different Performance Measures in Software Effort Estimation 2013 Proceedings of the 9th International Conference on Predictive Models in Software Engineering (PROMISE '13)  inproceedings DOI  
Abstract: Background: Previous work showed that Multi-objective Evolutionary Algorithms (MOEAs) can be used for training ensembles of learning machines for Software Effort Estimation (SEE) by optimising different performance measures concurrently. Optimisation based on three measures (LSD, MMRE and PRED(25)) was analysed and led to promising results in terms of performance on these and other measures.

Aims: (a) It is not known how well ensembles trained on other measures would behave for SEE, and whether training on certain measures would improve performance particularly on these measures. (b) It is also not known whether it is best to include all SEE models created by the MOEA into the ensemble, or solely the models with the best training performance in terms of each measure being optimised. Investigating (a) and (b) is the aim of this work.

Method: MOEAs were used to train ensembles by optimising four different sets of performance measures, involving a total of nine different measures. The performance of all ensembles was then compared based on all these nine performance measures. Ensembles composed of different sets of models generated by the MOEAs were also compared.

Results: (a) Ensembles trained on LSD, MMRE and PRED (25) obtained the best results in terms of most performance measures, being considered more successful than the others. Optimising certain performance measures did not necessarily lead to the best test performance on these particular measures probably due to overfitting. (b) There was no inherent advantage in using ensembles composed of all the SEE models generated by the MOEA in comparison to using solely the best SEE model according to each measure separately.

Conclusions: Care must be taken to prevent overfitting on the performance measures being optimised. Our results suggest that concurrently optimising LSD, MMRE and PRED (25) promoted more ensemble diversity than other combinations of measures, and hence performed best. Low diversity is more likely to lead to overfitting.

BibTeX:
@inproceedings{MinkuY13,
  author = {Leandro L. Minku and Xin Yao},
  title = {An Analysis of Multi-objective Evolutionary Algorithms for Training Ensemble Models Based on Different Performance Measures in Software Effort Estimation},
  booktitle = {Proceedings of the 9th International Conference on Predictive Models in Software Engineering (PROMISE '13)},
  publisher = {ACM},
  year = {2013},
  doi = {http://dx.doi.org/10.1145/2499393.2499396}
}
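Illustrative sketch: LSD, MMRE and PRED(25) are standard accuracy measures in the effort estimation literature. The following minimal Python sketch (not code from the cited paper; the function names and toy data are invented for the example) shows how MMRE and PRED(25) are typically computed from actual and predicted efforts; LSD is omitted because its exact formulation varies between papers.

def mmre(actual, predicted):
    # Mean Magnitude of Relative Error: mean of |actual - predicted| / actual.
    mres = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return sum(mres) / len(mres)

def pred(actual, predicted, level=0.25):
    # PRED(25): proportion of estimates whose relative error is at most 25%.
    mres = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return sum(1 for m in mres if m <= level) / len(mres)

# Toy example: three projects with actual and estimated effort (person-hours).
actual = [100.0, 250.0, 400.0]
predicted = [110.0, 200.0, 390.0]
print(mmre(actual, predicted), pred(actual, predicted))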
Minku, L.L. and Yao, X. Ensembles and Locality: Insight on Improving Software Effort Estimation 2013 Information and Software Technology (Special Issue on Best Papers from PROMISE 2011)
Vol. 55(8), pp. 1512-1528 
article DOI  
Abstract: Context: Ensembles of learning machines and locality are considered two important topics for the next research frontier on Software Effort Estimation (SEE).

Objectives: We aim at (1) evaluating whether existing automated ensembles of learning machines generally improve SEEs given by single learning machines and which of them would be more useful; (2) analysing the adequacy of different locality approaches; and getting insight on (3) how to improve SEE and (4) how to evaluate/choose machine learning (ML) models for SEE.

Method: A principled experimental framework is used for the analysis and to provide insights that are not based simply on intuition or speculation. A comprehensive experimental study of several automated ensembles, single learning machines and locality approaches, which present features potentially beneficial for SEE, is performed. Additionally, an analysis of feature selection and regression trees (RTs), and an investigation of two tailored forms of combining ensembles and locality are performed to provide further insight on improving SEE.

Results: Bagging ensembles of RTs show to perform well, being highly ranked in terms of performance across different data sets, being frequently among the best approaches for each data set and rarely performing considerably worse than the best approach for any data set. They are recommended over other learning machines should an organisation have no resources to perform experiments to choose a model. Even though RTs have been shown to be more reliable locality approaches, other approaches such as k-Means and k-Nearest Neighbours can also perform well, in particular for more heterogeneous data sets.

Conclusion: Combining the power of automated ensembles and locality can lead to competitive results in SEE. By analysing such approaches, we provide several insights that can be used by future research in the area.
BibTeX:
@article{MinkuY13b,
  author = {Leandro L. Minku and Xin Yao},
  title = {Ensembles and Locality: Insight on Improving Software Effort Estimation},
  journal = {Information and Software Technology (Special Issue on Best Papers from PROMISE 2011)},
  year = {2013},
  volume = {55},
  number = {8},
  pages = {1512-1528},
  doi = {http://dx.doi.org/10.1016/j.infsof.2012.09.012}
}
Minku, L.L. and Yao, X. Software Effort Estimation as a Multi-objective Learning Problem 2013 ACM Transactions on Software Engineering and Methodology
Vol. 22(4) 
article DOI  
Abstract: Ensembles of learning machines are promising for software effort estimation (SEE), but need to be tailored for this task to have their potential exploited. A key issue when creating ensembles is to produce diverse and accurate base models. Depending on how differently different performance measures behave for SEE, they could be used as a natural way of creating SEE ensembles. We propose to view SEE model creation as a multi-objective learning problem. A multi-objective evolutionary algorithm (MOEA) is used to better understand the trade-off among different performance measures by creating SEE models through the simultaneous optimisation of these measures. We show that the performance measures behave very differently, presenting sometimes even opposite trends. They are then used as a source of diversity for creating SEE ensembles. A good trade-off among different measures can be obtained by using an ensemble of MOEA solutions. This ensemble performs similarly to, or better than, a model that does not consider these measures explicitly. Besides, MOEA is also flexible, allowing emphasis of a particular measure if desired. In conclusion, MOEA can be used to better understand the relationship among performance measures and has shown to be very effective in creating SEE models.
BibTeX:
@article{MinkuY13c,
  author = {Leandro L. Minku and Xin Yao},
  title = {Software Effort Estimation as a Multi-objective Learning Problem},
  journal = {ACM Transactions on Software Engineering and Methodology},
  year = {2013},
  volume = {22},
  number = {4},
  doi = {http://dx.doi.org/10.1145/2522920.2522928}
}
Ochoa, G. Search Methodologies in Real-world Software Engineering 2013 Proceeding of The 15th Annual Conference Companion on Genetic and Evolutionary Computation (GECCO '13), pp. 1085-1088  inproceedings DOI  
Abstract: One of the aims of software engineering is to reduce overall software costs. Optimisation is, therefore, relevant to the process of software development. This article describes recent case studies on the application of modern search methodologies to challenging real-world problems in software engineering. It also describes a recent research initiative: Dynamic Adaptive Automated Software Engineering (DAASE), whose goal is to embed optimisation into deployed software to create self-optimising adaptive systems. The article accompanies an invited talk for the Workshop on Bridging the Gap between Industry and Academia in Optimisation to be held as part of GECCO 2013.
BibTeX:
@inproceedings{Ochoa13,
  author = {Gabriela Ochoa},
  title = {Search Methodologies in Real-world Software Engineering},
  booktitle = {Proceeding of The 15th Annual Conference Companion on Genetic and Evolutionary Computation (GECCO '13)},
  year = {2013},
  pages = {1085-1088},
  doi = {http://dx.doi.org/10.1145/2464576.2482687}
}
Ochoa, G. and Villasana, M. Population-based Optimization of Cytostatic/cytotoxic Combination Cancer Chemotherapy 2013 Soft Computing
Vol. 17(6), pp. 913-924 
article DOI  
Abstract: This article studies the suitability of modern population based algorithms for designing combination cancer chemotherapies. The problem of designing chemotherapy schedules is expressed as an optimization problem (an optimal control problem) where the objective is to minimize the tumor size without compromising the patient’s health. Given the complexity of the underlying mathematical model describing the tumor’s progression (considering two types of drugs, the cell cycle and the immune system response), analytical and classical optimization methods are not suitable, instead, stochastic heuristic optimization methods are the right tool to solve the optimal control problem. Considering several solution quality and performance metrics, we compared three powerful heuristic algorithms for real-parameter optimization, namely, CMA evolution strategy, differential evolution, and particle swarm pattern search method. The three algorithms were able to successfully solve the posed problem. However, differential evolution outperformed its counterparts both in quality of the obtained solutions and efficiency of search.
BibTeX:
@article{OchoaV13,
  author = {Gabriela Ochoa and Minaya Villasana},
  title = {Population-based Optimization of Cytostatic/cytotoxic Combination Cancer Chemotherapy},
  journal = {Soft Computing},
  year = {2013},
  volume = {17},
  number = {6},
  pages = {913-924},
  doi = {http://dx.doi.org/10.1007/s00500-013-1043-5}
}
Ochoa, G., Verel, S., Daolio, F. and Tomassini, M. Local Optima Networks: A New Model of Combinatorial Fitness Landscapes 2013, pp. 1-33  inbook URL 
Abstract: This chapter overviews a recently introduced network-based model of combinatorial landscapes: Local Optima Networks (LON). The model compresses the information given by the whole search space into a smaller mathematical object that is a graph having as vertices the local optima and as edges the possible weighted transitions between them. Two definitions of edges have been proposed: basin-transition and escape-edges, which capture relevant topological features of the underlying search spaces. This network model brings a new set of metrics to characterize the structure of combinatorial landscapes, those associated with the science of complex networks. These metrics are described, and results are presented of local optima network extraction and analysis for two selected combinatorial landscapes: NK landscapes and the quadratic assignment problem. Network features are found to correlate with and even predict the performance of heuristic search algorithms operating on these problems.
BibTeX:
@inbook{OchoaVDT13,
  author = {Gabriela Ochoa and Sebastien Verel and Fabio Daolio and Marco Tomassini},
  title = {Local Optima Networks: A New Model of Combinatorial Fitness Landscapes},
  publisher = {Springer},
  year = {2013},
  pages = {1-33},
  url = {http://www.cs.stir.ac.uk/~goc/papers/LONChapter2013.pdf}
}
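Illustrative sketch: the graph construction described in the chapter above can be made concrete on a tiny landscape. The Python below is a rough approximation written for this bibliography, not code from the chapter; in particular, the edge definition is simplified to counting adjacent solutions that fall into different basins of attraction. It enumerates the local optima of a 6-bit landscape by best-improvement hill climbing and records weighted transitions between their basins.

from itertools import product
from collections import defaultdict

def neighbours(s):
    # One-bit-flip neighbourhood of a bitstring (tuple of 0/1 values).
    for i in range(len(s)):
        yield s[:i] + (1 - s[i],) + s[i+1:]

def hill_climb(s, fitness):
    # Deterministic best-improvement climb; returns the local optimum reached from s.
    while True:
        best = max(neighbours(s), key=fitness)
        if fitness(best) <= fitness(s):
            return s
        s = best

def local_optima_network(n, fitness):
    # Vertices: local optima; edges: weighted transitions between basins of attraction.
    basin_of = {s: hill_climb(s, fitness) for s in product((0, 1), repeat=n)}
    optima = set(basin_of.values())
    edges = defaultdict(int)
    for s, opt in basin_of.items():
        for t in neighbours(s):
            if basin_of[t] != opt:
                edges[(opt, basin_of[t])] += 1
    return optima, dict(edges)

# Toy fitness over 6-bit strings with a small deceptive dip.
fit = lambda s: sum(s) if sum(s) != 3 else 0
optima, edges = local_optima_network(6, fit)
print(len(optima), "local optima,", len(edges), "directed edges")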
Pappa, G.L., Ochoa, G., Hyde, M.R., Freitas, A.A., Woodward, J. and Swan, J. Contrasting Meta-learning and Hyper-heuristic Research: the Role of Evolutionary Algorithms 2013 Genetic Programming and Evolvable Machines  article DOI  
Abstract: The fields of machine meta-learning and hyper-heuristic optimisation have developed mostly independently of each other, although evolutionary algorithms (particularly genetic programming) have recently played an important role in the development of both fields. Recent work in both fields shares a common goal, that of automating as much of the algorithm design process as possible. In this paper we first provide a historical perspective on automated algorithm design, and then we discuss similarities and differences between meta-learning in the field of supervised machine learning (classification) and hyper-heuristics in the field of optimisation. This discussion focuses on the dimensions of the problem space, the algorithm space and the performance measure, as well as clarifying important issues related to different levels of automation and generality in both fields. We also discuss important research directions, challenges and foundational issues in meta-learning and hyper-heuristic research. It is important to emphasize that this paper is not a survey, as several surveys on the areas of meta-learning and hyper-heuristics (separately) have been previously published. The main contribution of the paper is to contrast meta-learning and hyper-heuristics methods and concepts, in order to promote awareness and cross-fertilisation of ideas across the (by and large, non-overlapping) different communities of meta-learning and hyper-heuristic researchers. We hope that this cross-fertilisation of ideas can inspire interesting new research in both fields and in the new emerging research area which consists of integrating those fields.
BibTeX:
@article{PappaOHFWS,
  author = {Gisele L. Pappa and Gabriela Ochoa and Matthew R. Hyde and Alex A. Freitas and John Woodward and Jerry Swan},
  title = {Contrasting Meta-learning and Hyper-heuristic Research: the Role of Evolutionary Algorithms},
  journal = {Genetic Programming and Evolvable Machines},
  year = {2013},
  doi = {http://dx.doi.org/10.1007/s10710-013-9186-9}
}
Patrick, M., Alexander, R., Oriol, M. and Clark, J.A. Efficient Subdomains for Random Testing 2013
Vol. 8084, Proceedings of the 5th International Symposium on Search Based Software Engineering (SSBSE '13), pp. 251-256 
inproceedings DOI  
Abstract: Opinion is divided over the effectiveness of random testing. It produces test cases cheaply, but struggles with boundary conditions and is labour intensive without an automated oracle. We have created a search-based testing technique that evolves multiple sets of efficient subdomains, from which small but effective test suites can be randomly sampled. The new technique handles boundary conditions by targeting different mutants with each set of subdomains. It achieves an average 230% improvement in mutation score over conventional random testing.
BibTeX:
@inproceedings{PatrickAOC13,
  author = {Matthew Patrick and Rob Alexander and Manuel Oriol and John A. Clark},
  title = {Efficient Subdomains for Random Testing},
  booktitle = {Proceedings of the 5th International Symposium on Search Based Software Engineering (SSBSE '13)},
  publisher = {Springer},
  year = {2013},
  volume = {8084},
  pages = {251-256},
  doi = {http://dx.doi.org/10.1007/978-3-642-39742-4_20}
}
Petke, J., Langdon, W.B. and Harman, M. Applying Genetic Improvement to MiniSAT 2013
Vol. 8084, Proceedings of the 5th International Symposium on Search Based Software Engineering (SSBSE '13), pp. 257-262 
inproceedings DOI  
Abstract: Genetic Programming (GP) has long been applied to several SBSE problems. Recently there has been much interest in using GP and its variants to solve demanding problems in which the code evolved by GP is intended for deployment. This paper investigates the application of genetic improvement to a challenging problem of improving a well-studied system: a Boolean satisfiability (SAT) solver called MiniSAT. Many programmers have tried to make this very popular solver even faster and a separate SAT competition track has been created to facilitate this goal. Thus genetically improving MiniSAT poses a great challenge. Moreover, due to a wide range of applications of SAT solving technologies any improvement could have a great impact. Our initial results show that there is some room for improvement. However, a significantly more efficient version of MiniSAT is yet to be discovered.
BibTeX:
@inproceedings{PetkeLH13,
  author = {Justyna Petke and William B. Langdon and Mark Harman},
  title = {Applying Genetic Improvement to MiniSAT},
  booktitle = {Proceedings of the 5th International Symposium on Search Based Software Engineering (SSBSE '13)},
  publisher = {Springer},
  year = {2013},
  volume = {8084},
  pages = {257-262},
  doi = {http://dx.doi.org/10.1007/978-3-642-39742-4_21}
}
Petke, J., Yoo, S., Cohen, M.B. and Harman, M. Efficiency and Early Fault Detection with Lower and Higher Strength Combinatorial Interaction Testing 2013 Proceedings of the 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE '13), pp. 26-36  inproceedings DOI  
Abstract: Combinatorial Interaction Testing (CIT) is important because it tests the interactions between the many features and parameters that make up the configuration space of software systems. However, in order to be practically applicable, it must be able to cater for soft and hard real-world constraints and should, ideally, report a test priority order that maximises earliest fault detection. We show that we can achieve the highest strength CIT in 5.65 minutes on average. This was previously thought to be too computationally expensive to be feasible. Furthermore, we show that higher strength suites find more faults, while prioritisations using lower strengths are no worse at achieving early fault revelation.
BibTeX:
@inproceedings{PetkeYCH13,
  author = {Justyna Petke and Shin Yoo and Myra B. Cohen and Mark Harman},
  title = {Efficiency and Early Fault Detection with Lower and Higher Strength Combinatorial Interaction Testing},
  booktitle = {Proceedings of the 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE '13)},
  publisher = {ACM},
  year = {2013},
  pages = {26-36},
  doi = {http://dx.doi.org/10.1145/2491411.2491436}
}
Poskitt, C.M. and Poulding, S. Using Contracts to Guide the Search-Based Verification of Concurrent Programs 2013
Vol. 8084, Proceedings of the 5th International Symposium on Search Based Software Engineering (SSBSE '13), pp. 263-268 
inproceedings DOI  
Abstract: Search-based techniques can be used to identify whether a concurrent program exhibits faults such as race conditions, deadlocks, and starvation: a fitness function is used to guide the search to a region of the program’s state space in which these concurrency faults are more likely occur. In this short paper, we propose that contracts specified by the developer as part of the program’s implementation could be used to provide additional guidance to the search. We sketch an example of how contracts might be used in this way, and outline our plans for investigating this verification approach.
BibTeX:
@inproceedings{PoskittP13,
  author = {Christopher M. Poskitt and Simon Poulding},
  title = {Using Contracts to Guide the Search-Based Verification of Concurrent Programs},
  booktitle = {Proceedings of the 5th International Symposium on Search Based Software Engineering (SSBSE '13)},
  publisher = {Springer},
  year = {2013},
  volume = {8084},
  pages = {263-268},
  doi = {http://dx.doi.org/10.1007/978-3-642-39742-4_22}
}
Poulding, S., Alexander, R., Clark, J.A. and Hadley, M.J. The Optimisation of Stochastic Grammars to Enable Cost-effective Probabilistic Structural Testing 2013 Proceeding of The 15th Annual Conference on Genetic and Evolutionary Computation (GECCO '13), pp. 1477-1484  inproceedings DOI  
Abstract: The effectiveness of probabilistic structural testing depends on the characteristics of the probability distribution from which test inputs are sampled at random. Metaheuristic search has been shown to be a practical method of optimising the characteristics of such distributions. However, the applicability of the existing search-based algorithm is limited by the requirement that the software's inputs must be a fixed number of numeric values. In this paper we relax this limitation by means of a new representation for the probability distribution. The representation is based on stochastic context-free grammars but incorporates two novel extensions: conditional production weights and the aggregation of terminal symbols representing numeric values. We demonstrate that an algorithm which combines the new representation with hill-climbing search is able to efficiently derive probability distributions suitable for testing software with structurally-complex input domains.
BibTeX:
@inproceedings{PouldingACH13,
  author = {Simon Poulding and Robert Alexander and John A. Clark and Mark J. Hadley},
  title = {The Optimisation of Stochastic Grammars to Enable Cost-effective Probabilistic Structural Testing},
  booktitle = {Proceeding of The 15th Annual Conference on Genetic and Evolutionary Computation (GECCO '13)},
  publisher = {ACM},
  year = {2013},
  pages = {1477-1484},
  doi = {http://dx.doi.org/10.1145/2463372.2463550}
}
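Illustrative sketch: to give a flavour of the representation this paper builds on, the Python below samples strings from a stochastic context-free grammar whose production weights could, in principle, be tuned by search. The grammar and function names are purely illustrative, and the paper's extensions (conditional production weights and aggregation of numeric terminals) are not shown.

import random

# Production rules: nonterminal -> list of (expansion, weight).
GRAMMAR = {
    "EXPR": [(["NUM"], 2.0), (["EXPR", "+", "EXPR"], 1.0), (["(", "EXPR", ")"], 0.5)],
    "NUM":  [([str(d)], 1.0) for d in range(10)],
}

def sample(symbol, depth=0, max_depth=8):
    # Expand a symbol by sampling productions in proportion to their weights.
    if symbol not in GRAMMAR:
        return symbol                      # terminal symbol
    rules = GRAMMAR[symbol]
    if depth >= max_depth:                 # crude depth cap to keep strings finite
        rules = [rules[0]]
    expansions, weights = zip(*rules)
    chosen = random.choices(expansions, weights=weights)[0]
    return "".join(sample(s, depth + 1, max_depth) for s in chosen)

random.seed(0)
print([sample("EXPR") for _ in range(3)])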
Rose, L.M. and Poulding, S. Efficient Probabilistic Testing of Model Transformations using Search 2013 Proceedings of the 1st International Workshop on Combining Modelling and Search-Based Software Engineering (CMSBSE '13), pp. 16-21  inproceedings DOI  
Abstract: Checking the output of a test case for correctness (applying a test oracle) is challenging for many types of software, including model transformations. Decreasing the number of test cases that are executed during testing will therefore reduce the costs involved in testing a model transformation. However, there is a trade-off: to be confident that the transformation fulfils its specification requires the execution of sufficient test cases to fully exercise the transformation. In this paper, we demonstrate a process that derives a method of sampling random models which enables the test engineer to balance testing cost and testing efficacy. The output of the process is an optimised probability distribution over the models on which the transformation acts; test sets that efficiently exercise the transformation may then be derived by sampling models from the optimised distribution. Furthermore, we describe benefits and challenges of combining model-driven engineering and search-based software engineering tools and techniques, which include conflating metamodels with grammars to enable grammar-based search techniques over a set of models, and the need to increase the scalability of model-driven engineering tools to make them more amenable to search.
BibTeX:
@inproceedings{RoseP13,
  author = {Louis M. Rose and Simon Poulding},
  title = {Efficient Probabilistic Testing of Model Transformations using Search},
  booktitle = {Proceedings of the 1st International Workshop on Combining Modelling and Search-Based Software Engineering (CMSBSE '13)},
  publisher = {IEEE},
  year = {2013},
  pages = {16-21},
  doi = {http://dx.doi.org/10.1109/CMSBSE.2013.6604431}
}
Rose, L., Poulding, S., Feldt, R. and Paige, R. Towards A Scalable Cloud Platform for Search-Based Probabilistic Testing 2013 Proceedings of the 29th IEEE International Conference on Software Maintenance (ICSM '13), pp. 480-483  inproceedings DOI  
Abstract: Probabilistic testing techniques that sample input data at random from a probability distribution can be more effective at detecting faults than deterministic techniques. However, if overly large (and therefore expensive) test sets are to be avoided, the probability distribution from which the input data is sampled must be optimised to the particular software-under-test. Such an optimisation process is often resource-intensive. In this paper, we present a prototypical cloud platform (and architecture) that permits the optimisation of such probability distributions in a scalable, distributed and robust manner, and thereby enables cost-effective probabilistic testing.
BibTeX:
@inproceedings{RosePFP13,
  author = {Louis Rose and Simon Poulding and Robert Feldt and Richard Paige},
  title = {Towards A Scalable Cloud Platform for Search-Based Probabilistic Testing},
  booktitle = {Proceedings of the 29th IEEE International Conference on Software Maintenance (ICSM '13)},
  publisher = {IEEE},
  year = {2013},
  pages = {480-483},
  doi = {http://dx.doi.org/10.1109/ICSM.2013.77}
}
Burke, E., Swan, J. and Woodward, J. Hyper-heuristics generate heuristics for problem classes 2013 OR55 Conference, Exeter  misc  
BibTeX:
@misc{SwanOR55b,
  author = {Edmund Burke and Jerry Swan and John Woodward},
  title = {Hyper-heuristics generate heuristics for problem classes},
  year = {2013}
}
Wang, T., Harman, M., Jia, Y. and Krinke, J. Searching for Better Configurations: A Rigorous Approach to Clone Evaluation 2013 Proceedings of the 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE '13), pp. 455-465  inproceedings DOI  
Abstract: Clone detection finds application in many software engineering activities such as comprehension and refactoring. However, the confounding configuration choice problem poses a widely-acknowledged threat to the validity of previous empirical analyses. We introduce desktop and parallelised cloud-deployed versions of a search based solution that finds suitable configurations for empirical studies. We evaluate our approach on 6 widely used clone detection tools applied to the Bellon suite of 8 subject systems. Our evaluation reports the results of 9.3 million total executions of a clone tool; the largest study yet reported. Our approach finds significantly better configurations (p < 0.05) than those currently used, providing evidence that our approach can ameliorate the confounding configuration choice problem.
BibTeX:
@inproceedings{WangHJK13,
  author = {Tiantian Wang and Mark Harman and Yue Jia and Jens Krinke},
  title = {Searching for Better Configurations: A Rigorous Approach to Clone Evaluation},
  booktitle = {Proceedings of the 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE '13)},
  publisher = {ACM},
  year = {2013},
  pages = {455-465},
  doi = {http://dx.doi.org/10.1145/2491411.2491420}
}
Wang, S., Minku, L.L., Ghezzi, D., Caltabiano, D., Tino, P. and Yao, X. Concept Drift Detection for Online Class Imbalance Learning 2013 Proceedings of International Joint Conference on Neural Networks (IJCNN '13), pp. 1-10  inproceedings DOI  
Abstract: Concept drift detection methods are crucial components of many online learning approaches. Accurate drift detections allow prompt reaction to drifts and help to maintain high performance of online models over time. Although many methods have been proposed, no attention has been given to data streams with imbalanced class distributions, which commonly exist in real-world applications, such as fault diagnosis of control systems and intrusion detection in computer networks. This paper studies the concept drift problem for online class imbalance learning. We look into the impact of concept drift on single-class performance of online models based on three types of classifiers, under seven different scenarios with the presence of class imbalance. The analysis reveals that detecting drift in imbalanced data streams is a more difficult task than in balanced ones. Minority-class recall suffers from a significant drop after the drift involving the minority class. Overall accuracy is not suitable for drift detection. Based on the findings, we propose a new detection method DDM-OCI derived from the existing method DDM. DDM-OCI monitors minority-class recall online to capture the drift. The results show a quick response of the online model working with DDM-OCI to the new concept.
BibTeX:
@inproceedings{WangMGCTY13,
  author = {Shuo Wang and Leandro L. Minku and Davide Ghezzi and Daniele Caltabiano and Peter Tino and Xin Yao},
  title = {Concept Drift Detection for Online Class Imbalance Learning},
  booktitle = {Proceedings of International Joint Conference on Neural Networks (IJCNN '13)},
  publisher = {IEEE},
  year = {2013},
  pages = {1-10},
  doi = {http://dx.doi.org/10.1109/IJCNN.2013.6706768}
}
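Illustrative sketch: the central idea reported above, monitoring minority-class recall online and signalling drift when it degrades, can be approximated as follows. This is an assumption-laden sketch written for illustration, not the authors' DDM-OCI implementation; the decay factor and the three-sigma style threshold are borrowed from the spirit of the original DDM and are not taken from the paper.

class MinorityRecallDriftDetector:
    # Signals drift when the time-decayed minority-class recall falls well below its best level.
    def __init__(self, decay=0.9, threshold=3.0):
        self.decay = decay
        self.threshold = threshold
        self.recall = 1.0            # optimistic start
        self.n = 0
        self.best_recall = 0.0
        self.best_sd = 0.0

    def update(self, is_minority_example, predicted_minority):
        if not is_minority_example:
            return False             # recall only changes on minority-class examples
        hit = 1.0 if predicted_minority else 0.0
        self.recall = self.decay * self.recall + (1.0 - self.decay) * hit
        self.n += 1
        sd = (max(self.recall * (1.0 - self.recall), 1e-12) / self.n) ** 0.5
        if self.recall - sd >= self.best_recall - self.best_sd:
            self.best_recall, self.best_sd = self.recall, sd   # remember the best state seen
        # Drift: recall has dropped 'threshold' standard deviations below the best state.
        return self.recall + self.threshold * sd < self.best_recall - self.best_sd

det = MinorityRecallDriftDetector()
# Feed (true class is minority?, prediction is minority?) pairs from a toy stream.
for true_min, pred_min in [(True, True)] * 50 + [(True, False)] * 30:
    if det.update(true_min, pred_min):
        print("drift signalled")
        break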
Wang, S., Minku, L.L. and Yao, X. A Learning Framework for Online Class Imbalance Learning 2013 Proceedings of IEEE Symposium Series on Computational Intelligence (SSCI '13)  inproceedings URL 
Abstract: Online learning has been showing to be very useful for a large number of applications in which data arrive continuously and a timely response is required. In many online cases, the data stream can have very skewed class distributions, known as class imbalance, such as fault diagnosis of real-time control monitoring systems and intrusion detection in computer networks. Classifying imbalanced data streams poses new challenges, which have attracted very little attention so far. As the first work that formally addresses this problem, this paper looks into the underlying issues, clarifies the research questions, and proposes a framework for online class imbalance learning that decomposes the learning task into three modules. Within the framework, we use a time decay function to capture the imbalance rate dynamically. Then, we propose a class imbalance detection method, in order to decide the current imbalance status in data streams. According to this information, two resampling-based online learning algorithms are developed to tackle class imbalance in data streams. Three basic types of class imbalance change are discussed in our studies. The results suggest the usefulness of the learning framework. The proposed methods are shown to be effective on both minority-class accuracy and overall performance in all three cases we considered.
BibTeX:
@inproceedings{WangMY13,
  author = {Shuo Wang and Leandro L. Minku and Xin Yao},
  title = {A Learning Framework for Online Class Imbalance Learning},
  booktitle = {Proceedings of IEEE Symposium Series on Computational Intelligence (SSCI '13)},
  year = {2013},
  url = {http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6613138}
}
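Illustrative sketch: the time decay function mentioned in the abstract above can be illustrated with a small approximation (not the authors' code). Each class's size percentage is updated incrementally with a decay factor so that recent examples weigh more, and the current imbalance status can then be read off from these percentages.

def update_class_rates(rates, class_index, decay=0.99):
    # Time-decayed per-class size percentages for a data stream.
    # rates: list of current percentages (sums to ~1); class_index: class of the new example.
    return [decay * r + (1 - decay) * (1.0 if k == class_index else 0.0)
            for k, r in enumerate(rates)]

# Toy stream: class 1 is rare at first, then becomes more frequent.
rates = [0.5, 0.5]
stream = [0] * 95 + [1] * 5 + [1] * 50
for y in stream:
    rates = update_class_rates(rates, y)
print([round(r, 3) for r in rates])   # class 1's rate rises as the stream changes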
Wang, S., Minku, L.L. and Yao, X. Online Class Imbalance Learning and Its Applications in Fault Detection 2013 International Journal of Computational Intelligence and Applications
Vol. 12(4) 
article DOI  
Abstract: Although class imbalance learning and online learning have been extensively studied in the literature separately, online class imbalance learning that considers the challenges of both fields has not drawn much attention. It deals with data streams having very skewed class distributions, such as fault diagnosis of real-time control monitoring systems and intrusion detection in computer networks. To fill in this research gap and contribute to a wide range of real-world applications, this paper first formulates online class imbalance learning problems. Based on the problem formulation, a new online learning algorithm, sampling-based online bagging (SOB), is proposed to tackle class imbalance adaptively. Then, we study how SOB and other state-of-the-art methods can benefit a class of fault detection data under various scenarios and analyze their performance in depth. Through extensive experiments, we find that SOB can balance the performance between classes very well across different data domains and produce stable G-mean when learning constantly imbalanced data streams, but it is sensitive to sudden changes in class imbalance, in which case SOB's predecessor undersampling-based online bagging (UOB) is more robust.
BibTeX:
@article{WangMY13b,
  author = {Shuo Wang and Leandro L. Minku and Xin Yao},
  title = {Online Class Imbalance Learning and Its Applications in Fault Detection},
  journal = {International Journal of Computational Intelligence and Applications},
  year = {2013},
  volume = {12},
  number = {4},
  doi = {http://dx.doi.org/10.1142/S1469026813400014}
}
Wang, S. and Yao, X. Using Class Imbalance Learning for Software Defect Prediction 2013 IEEE Transactions on Reliability
Vol. 62(2), pp. 434-443 
article DOI  
Abstract: To facilitate software testing, and save testing costs, a wide range of machine learning methods have been studied to predict defects in software modules. Unfortunately, the imbalanced nature of this type of data increases the learning difficulty of such a task. Class imbalance learning specializes in tackling classification problems with imbalanced distributions, which could be helpful for defect prediction, but has not been investigated in depth so far. In this paper, we study the issue of if and how class imbalance learning methods can benefit software defect prediction with the aim of finding better solutions. We investigate different types of class imbalance learning methods, including resampling techniques, threshold moving, and ensemble algorithms. Among those methods we studied, AdaBoost.NC shows the best overall performance in terms of the measures including balance, G-mean, and Area Under the Curve (AUC). To further improve the performance of the algorithm, and facilitate its use in software defect prediction, we propose a dynamic version of AdaBoost.NC, which adjusts its parameter automatically during training. Without the need to pre-define any parameters, it is shown to be more effective and efficient than the original AdaBoost.NC.
BibTeX:
@article{WangY13,
  author = {Shuo Wang and Xin Yao},
  title = {Using Class Imbalance Learning for Software Defect Prediction},
  journal = {IEEE Transactions on Reliability},
  year = {2013},
  volume = {62},
  number = {2},
  pages = {434-443},
  doi = {http://dx.doi.org/10.1109/TR.2013.2259203}
}
Xie, X., Kuo, F.-C., Chen, T.Y., Yoo, S. and Harman, M. Provably Optimal and Human-Competitive Results in SBSE for Spectrum Based Fault Localisation 2013
Vol. 8084, Proceedings of the 5th International Symposium on Search Based Software Engineering (SSBSE '13), pp. 224-238 
inproceedings DOI  
Abstract: Fault localisation uses so-called risk evaluation formulæ to guide the localisation process. For more than a decade, the design and improvement of these formulæ has been conducted entirely manually through iterative publication in the fault localisation literature. However, recently we demonstrated that SBSE could be used to automatically design such formulæ by recasting this as a problem for Genetic Programming (GP). In this paper we prove that our GP has produced four previously unknown globally optimal formulæ. Though other human competitive results have previously been reported in the SBSE literature, this is the first SBSE result, in any application domain, for which human competitiveness has been formally proved. We also show that some of these formulæ exhibit counter-intuitive characteristics, making them less likely to have been found solely by further human effort.
BibTeX:
@inproceedings{XieKCYH13,
  author = {Xiaoyuan Xie and Fei-Ching Kuo and Tsong Yueh Chen and Shin Yoo and Mark Harman},
  title = {Provably Optimal and Human-Competitive Results in SBSE for Spectrum Based Fault Localisation},
  booktitle = {Proceedings of the 5th International Symposium on Search Based Software Engineering (SSBSE '13)},
  publisher = {Springer},
  year = {2013},
  volume = {8084},
  pages = {224-238},
  doi = {http://dx.doi.org/10.1007/978-3-642-39742-4_17}
}
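Illustrative sketch: for readers unfamiliar with the risk evaluation formulæ analysed above, the Python below computes per-statement spectrum counts and two classic formulas, Tarantula and Ochiai. These are well-known baselines shown purely for orientation; they are not the GP-evolved maximal formulæ proved optimal in the paper, and the toy spectrum is invented.

import math

def spectrum_counts(coverage, failed):
    # coverage[i][s] = 1 if test i executes statement s; failed[i] = True if test i fails.
    # Returns per-statement counts (ef, ep, nf, np).
    total_f = sum(failed)
    total_p = len(failed) - total_f
    counts = []
    for s in range(len(coverage[0])):
        ef = sum(1 for i, row in enumerate(coverage) if row[s] and failed[i])
        ep = sum(1 for i, row in enumerate(coverage) if row[s] and not failed[i])
        counts.append((ef, ep, total_f - ef, total_p - ep))
    return counts

def tarantula(ef, ep, nf, np_):
    f = ef / (ef + nf) if ef + nf else 0.0
    p = ep / (ep + np_) if ep + np_ else 0.0
    return f / (f + p) if f + p else 0.0

def ochiai(ef, ep, nf, np_):
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

# Toy spectrum: 3 statements, 4 tests, the last two tests fail.
cov = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 1]]
failed = [False, False, True, True]
for s, c in enumerate(spectrum_counts(cov, failed)):
    print("statement", s, "tarantula", round(tarantula(*c), 3), "ochiai", round(ochiai(*c), 3))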
Xie, X., Kuo, F.-C., Chen, T.Y., Yoo, S. and Harman, M. Theoretical Analysis of GP-Evolved Risk Evaluation Formulas for Spectrum Based Fault Localisation 2013 (RN/13/06)  techreport URL 
Abstract: Risk evaluation formulae convert program spectrum data from test executions into suspiciousness scores, according to which statements are ranked to aid debugging activities. Designing such formulas remained largely a manual task until Genetic Programming was recently applied: the resulting formulae showed promising performance in empirical evaluation. We investigate the GP-evolved formulae theoretically and prove that GP has produced four maximal formulae that had not been known before. More interestingly, some of the newly found maximal formulae show characteristics that may seem inconsistent with human intuition. This is the first SBSE result with provable human competitiveness.
BibTeX:
@techreport{XieKCYH13b,
  author = {Xiaoyuan Xie and Fei-Ching Kuo and Tsong Yueh Chen and Shin Yoo and Mark Harman},
  title = {Theoretical Analysis of GP-Evolved Risk Evaluation Formulas for Spectrum Based Fault Localisation},
  year = {2013},
  number = {RN/13/06},
  url = {http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/research/rn-13-06__2_.pdf}
}
Yao, X. Some Recent Work on Multi-objective Approaches to Search-Based Software Engineering 2013
Vol. 8084, Proceedings of the 5th International Symposium on Search Based Software Engineering (SSBSE '13) - Keynote paper, pp. 4-15 
inproceedings DOI  
Abstract: Multi-objective algorithms have been used to solve difficult software engineering problems for a long time. This article summarises some selected recent work of applying latest meta-heuristic optimisation algorithms and machine learning algorithms to software engineering problems, including software module clustering, testing resource allocation in modular software system, protocol tuning, Java container testing, software project scheduling, software project effort estimation, and software defect prediction. References will be given, from which the details of such application of computational intelligence techniques to software engineering problems can be found.
BibTeX:
@inproceedings{Yao13,
  author = {Xin Yao},
  title = {Some Recent Work on Multi-objective Approaches to Search-Based Software Engineering},
  booktitle = {Proceedings of the 5th International Symposium on Search Based Software Engineering (SSBSE '13) - Keynote paper},
  publisher = {Springer},
  year = {2013},
  volume = {8084},
  pages = {4-15},
  doi = {http://dx.doi.org/10.1007/978-3-642-39742-4_2}
}
Yoo, S., Harman, M. and Clark, D. Fault Localization Prioritization: Comparing Information-theoretic and Coverage-based Approaches 2013 ACM Transactions on Software Engineering and Methodology
Vol. 22(3) 
article DOI  
Abstract: Test case prioritization techniques seek to maximize early fault detection. Fault localization seeks to use test cases already executed to help find the fault location. There is a natural interplay between the two techniques; once a fault is detected, we often switch focus to fault fixing, for which localization may be a first step. In this article we introduce the Fault Localization Prioritization (FLP) problem, which combines prioritization and localization. We evaluate three techniques: a novel FLP technique based on information theory, FLINT (Fault Localization using INformation Theory), that we introduce in this article, a standard Test Case Prioritization (TCP) technique, and a “test similarity technique” used in previous work. Our evaluation uses five different releases of four software systems. The results indicate that FLP and TCP can statistically significantly reduce fault localization costs for 73% and 76% of cases, respectively, and that FLINT significantly outperforms similarity-based localization techniques in 52% of the cases considered in the study.
BibTeX:
@article{YooHC13,
  author = {Shin Yoo and Mark Harman and David Clark},
  title = {Fault Localization Prioritization: Comparing Information-theoretic and Coverage-based Approaches},
  journal = {ACM Transactions on Software Engineering and Methodology},
  year = {2013},
  volume = {22},
  number = {3},
  doi = {http://dx.doi.org/10.1145/2491509.2491513}
}
Zhang, Y., Harman, M. and Lim, S.L. Empirical Evaluation of Search Based Requirements Interaction Management 2013 Information and Software Technology
Vol. 55(1), pp. 126-152 
article DOI  
Abstract: Context: Requirements optimization has been widely studied in the Search Based Software Engineering (SBSE) literature. However, previous approaches have not handled requirement interactions, such as the dependencies that may exist between requirements, and, or, precedence, cost- and value-based constraints.

Objective: To introduce and evaluate a Multi-Objective Search Based Requirements Selection technique, using chromosome repair and to evaluate it on both synthetic and real world data sets, in order to assess its effectiveness and scalability. The paper extends and improves upon our previous conference paper on requirements interaction management.

Method: The popular multi-objective evolutionary algorithm NSGA-II was used to produce baseline data for each data set in order to determine how many solutions on the Pareto front fail to meet five different requirement interaction constraints. The results for this baseline data are compared to those obtained using the archive based approach previously studied and the repair based approach introduced in this paper.

Results: The repair based approach was found to produce more solutions on the Pareto front and better convergence and diversity of results than the previously studied NSGA-II and archive-based NSGA-II approaches based on the Kruskal–Wallis test in most cases. The repair based approach was also found to scale almost as well as the previous approach.

Conclusion: There is evidence to indicate that the repair based algorithm introduced in this paper is a suitable technique for extending previous work on requirements optimization to handle the requirement interaction constraints inherent in requirement interactions arising from dependencies, and, or, precedence, cost- and value-based constraints.
BibTeX:
@article{ZhangHL13,
  author = {Yuanyuan Zhang and Mark Harman and Soo Ling Lim},
  title = {Empirical Evaluation of Search Based Requirements Interaction Management},
  journal = {Information and Software Technology},
  year = {2013},
  volume = {55},
  number = {1},
  pages = {126-152},
  doi = {http://dx.doi.org/10.1016/j.infsof.2012.03.007}
}
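Illustrative sketch: a minimal example of the kind of chromosome repair discussed above (illustrative only; the constraint type, data structures and names are assumptions, not the paper's implementation). After variation, a candidate bit vector of selected requirements is repaired so that precedence/"and" style dependencies are satisfied by forcing prerequisite requirements into the selection.

def repair(selection, depends_on):
    # selection: list of 0/1 flags per requirement.
    # depends_on: dict mapping a requirement index to the indices it requires.
    repaired = list(selection)
    changed = True
    while changed:                      # iterate until transitive prerequisites are included
        changed = False
        for req, prereqs in depends_on.items():
            if repaired[req]:
                for p in prereqs:
                    if not repaired[p]:
                        repaired[p] = 1
                        changed = True
    return repaired

# Requirement 3 needs 1, and 1 needs 0.
print(repair([0, 0, 0, 1], {3: [1], 1: [0]}))   # -> [1, 1, 0, 1]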
Aitken, J.M., Alexander, R., Kelly, T. and Poulding, S. Evolving Robust Networks for Systems-of-Systems 2012
Vol. 7515, Proceedings of the 4th International Symposium on Search Based Software Engineering (SSBSE '12), pp. 30-44 
inproceedings DOI  
Abstract: Software systems that rely on ad-hoc networks are becoming increasingly complex and increasingly prevalent. Some of these systems provide vital functionality to military operations, emergency services and disaster relief; such systems may have significant impact on the safety of people involved in those operations. It is therefore important that those networks support critical software requirements, including those for latency of packet transfer. If a network ceases to meet the software’s requirements (e.g. due to a link failure) then engineers must be able to understand it well enough to reconfigure the network and restore it to a requirement-satisfying state. Given a complex network, it is difficult for a human to do this under time pressure. In this paper we present a search-based tool which takes a network defined using the Network Description Language (NDL), annotated with a set of network-hosted applications and a set of latency requirements between each. We then evolve variants of the network configuration which meet the requirements and are robust to single link failures. We use network calculus tools to get a fast, conservative evaluation of whether a given network meets its requirements. We demonstrate that this approach is viable, designs networks much faster than a human engineer could, and is superior to a random generate-and-test approach.
BibTeX:
@inproceedings{AitkenAKP12,
  author = {Jonathan M. Aitken and Rob Alexander and Tim Kelly and Simon Poulding},
  title = {Evolving Robust Networks for Systems-of-Systems},
  booktitle = {Proceedings of the 4th International Symposium on Search Based Software Engineering (SSBSE '12)},
  publisher = {Springer},
  year = {2012},
  volume = {7515},
  pages = {30-44},
  doi = {http://dx.doi.org/10.1007/978-3-642-33119-0_4}
}
Burton, F.R., Paige, R.F., Rose, L.M., Kolovos, D.S., Poulding, S. and Smith, S. Solving Acquisition Problems Using Model-Driven Engineering 2012
Vol. 7349, Proceedings of 8th European Conference on Modelling Foundations and Applications (ECMFA '12), pp. 428-443 
inproceedings DOI  
Abstract: An acquisition problem involves the identification, procurement and management of resources that allow an organisation to achieve goals. Examples include through-life capability management (in the defense domain), and planning for the next release of a software system. The latter is representative of the challenges of acquisition, as solving the problem involves the assessment of the very many ways in which the different requirements of multiple heterogeneous customers may be satisfied. We present a novel approach to modelling acquisition problems, based on the use of Model-Driven Engineering principles and practices. The approach includes domain-specific modelling languages for acquisition problems, and uses model transformation to automatically generate potential solutions to the acquisition problem. We outline a prototype tool, built using the Epsilon model management framework. We illustrate the approach and tool on an example of the next release acquisition problem.
BibTeX:
@inproceedings{BurtonPRKPS12,
  author = {Frank R. Burton and Richard F. Paige and Louis M. Rose and Dimitrios S. Kolovos and Simon Poulding and Simon Smith},
  title = {Solving Acquisition Problems Using Model-Driven Engineering},
  booktitle = {Proceedings of 8th European Conference on Modelling Foundations and Applications (ECMFA '12)},
  publisher = {Springer},
  year = {2012},
  volume = {7349},
  pages = {428-443},
  doi = {http://dx.doi.org/10.1007/978-3-642-31491-9_32}
}
Chicano, F., Daolio, F., Ochoa, G., Vérel, Sé., Tomassini, M. and Alba, E. Local Optima Networks, Landscape Autocorrelation and Heuristic Search Performance 2012
Vol. 7492, Proceedings of the 12th international conference on Parallel Problem Solving from Nature (PPSN '12), pp. 337-347 
inproceedings DOI  
Abstract: Recent developments in fitness landscape analysis include the study of Local Optima Networks (LON) and applications of the Elementary Landscapes theory. This paper represents a first step at combining these two tools to explore their ability to forecast the performance of search algorithms. We base our analysis on the Quadratic Assignment Problem (QAP) and conduct a large statistical study over 600 generated instances of different types. Our results reveal interesting links between the network measures, the autocorrelation measures and the performance of heuristic search algorithms.
BibTeX:
@inproceedings{ChicanoDOVTA12,
  author = {Francisco Chicano and Fabio Daolio and Gabriela Ochoa and Sébastien Vérel and Marco Tomassini and Enrique Alba},
  title = {Local Optima Networks, Landscape Autocorrelation and Heuristic Search Performance},
  booktitle = {Proceedings of the 12th international conference on Parallel Problem Solving from Nature (PPSN '12)},
  publisher = {Springer},
  year = {2012},
  volume = {7492},
  pages = {337-347},
  doi = {http://dx.doi.org/10.1007/978-3-642-32964-7_34}
}
Ó Cinnéide, M., Tratt, L., Harman, M., Counsell, S. and Moghadam, I.H. Experimental Assessment of Software Metrics Using Automated Refactoring 2012 Proceedings of the ACM-IEEE 6th International Symposium on Empirical Software Engineering and Measurement (ESEM '12), pp. 49-58  inproceedings DOI  
Abstract: A large number of software metrics have been proposed in the literature, but there is little understanding of how these metrics relate to one another. We propose a novel experimental technique, based on search-based refactoring, to assess software metrics and to explore relationships between them. Our goal is not to improve the program being refactored, but to assess the software metrics that guide the automated refactoring through repeated refactoring experiments. We apply our approach to five popular cohesion metrics using eight real-world Java systems, involving 300,000 lines of code and over 3,000 refactorings. Our results demonstrate that cohesion metrics disagree with each other in 55% of cases, and show how our approach can be used to reveal novel and surprising insights into the software metrics under investigation.
BibTeX:
@inproceedings{CinneideTHCM12,
  author = {Mel Ó Cinnéide and Laurence Tratt and Mark Harman and Steve Counsell and Iman Hemati Moghadam},
  title = {Experimental Assessment of Software Metrics Using Automated Refactoring},
  booktitle = {Proceedings of the ACM-IEEE 6th International Symposium on Empirical Software Engineering and Measurement (ESEM '12)},
  publisher = {ACM},
  year = {2012},
  pages = {49-58},
  doi = {http://dx.doi.org/10.1145/2372251.2372260}
}
Daolio, F., Vérel, Sé., Ochoa, G. and Tomassini, M. Local Optima Networks and the Performance of Iterated Local Search 2012 Proceedings of The 14th International Conference on Genetic and Evolutionary Computation (GECCO '12), pp. 369-376  inproceedings DOI  
Abstract: Local Optima Networks (LONs) have been recently proposed as an alternative model of combinatorial fitness landscapes. The model compresses the information given by the whole search space into a smaller mathematical object that is the graph having as vertices the local optima and as edges the possible weighted transitions between them. A new set of metrics can be derived from this model that capture the distribution and connectivity of the local optima in the underlying configuration space. This paper departs from the descriptive analysis of local optima networks, and actively studies the correlation between network features and the performance of a local search heuristic. The NK family of landscapes and the Iterated Local Search metaheuristic are considered. With a statistically-sound approach based on multiple linear regression, it is shown that some LONs' features strongly influence and can even partly predict the performance of a heuristic search algorithm. This study validates the expressive power of LONs as a model of combinatorial fitness landscapes.
BibTeX:
@inproceedings{DaolioVOT12,
  author = {Fabio Daolio and Sébastien Vérel and Gabriela Ochoa and Marco Tomassini},
  title = {Local Optima Networks and the Performance of Iterated Local Search},
  booktitle = {Proceedings of The 14th International Conference on Genetic and Evolutionary Computation (GECCO '12)},
  publisher = {ACM},
  year = {2012},
  pages = {369-376},
  doi = {http://dx.doi.org/10.1145/2330163.2330217}
}
Harman, M. Overview of TASE 2012 talk on Search Based Software Engineering - Keynote 2012 Proceedings of the 6th International Symposium on Theoretical Aspects of Software Engineering (TASE '12), pp. 3-4  inproceedings DOI  
Abstract: This is an overview of the keynote presentation on SBSE at the Sixth IEEE International Symposium on Theoretical Aspects of Software Engineering (TASE 2012), held on the 4th-6th July 2012 in Beijing, China.
BibTeX:
@inproceedings{Harman12,
  author = {Mark Harman},
  title = {Overview of TASE 2012 talk on Search Based Software Engineering - Keynote},
  booktitle = {Proceedings of the 6th International Symposium on Theoretical Aspects of Software Engineering (TASE '12)},
  publisher = {IEEE},
  year = {2012},
  pages = {3-4},
  doi = {http://dx.doi.org/10.1109/TASE.2012.24}
}
Harman, M. An Introduction to Search Based Software Engineering - Keynote 2012 Proceedings of the 1st Chinese Search Based Software Engineering Workshop (CSBSE '12)  inproceedings  
BibTeX:
@inproceedings{Harman12b,
  author = {Mark Harman},
  title = {An Introduction to Search Based Software Engineering - Keynote},
  booktitle = {Proceedings of the 1st Chinese Search Based Software Engineering Workshop (CSBSE '12)},
  year = {2012}
}
Harman, M. The Role of Artificial Intelligence in Software Engineering 2012 Proceedings of the 1st International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE '12), pp. 1-6  inproceedings DOI  
Abstract: There has been a recent surge in interest in the application of Artificial Intelligence (AI) techniques to Software Engineering (SE) problems. The work is typified by recent advances in Search Based Software Engineering, but also by long established work in Probabilistic reasoning and machine learning for Software Engineering. This paper explores some of the relationships between these strands of closely related work, arguing that they have much in common and sets out some future challenges in the area of AI for SE.
BibTeX:
@inproceedings{Harman12c,
  author = {Mark Harman},
  title = {The Role of Artificial Intelligence in Software Engineering},
  booktitle = {Proceedings of the 1st International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE '12)},
  publisher = {IEEE},
  year = {2012},
  pages = {1-6},
  doi = {http://dx.doi.org/10.1109/RAISE.2012.6227961}
}
Harman, M., Burke, E., Clark, J. and Yao, X. Dynamic Adaptive Search Based Software Engineering 2012 Proceedings of the 6th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '12) - Keynote Paper, pp. 1-8  inproceedings DOI  
Abstract: Search Based Software Engineering (SBSE) has proved to be a very effective way of optimising software engineering problems. Nevertheless, its full potential as a means of dynamic adaptivity remains under explored. This paper sets out the agenda for Dynamic Adaptive SBSE, in which the optimisation is embedded into deployed software to create self-optimising adaptive systems. Dynamic Adaptive SBSE will move the research agenda forward to encompass both software development processes and the software products they produce, addressing the long-standing, and as yet largely unsolved, grand challenge of self-adaptive systems.
BibTeX:
@inproceedings{HarmanBCY12,
  author = {Mark Harman and Edmund Burke and John Clark and Xin Yao},
  title = {Dynamic Adaptive Search Based Software Engineering},
  booktitle = {Proceedings of the 6th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '12) - Keynote Paper},
  publisher = {ACM},
  year = {2012},
  pages = {1-8},
  doi = {http://dx.doi.org/10.1145/2372251.2372253}
}
Harman, M., Langdon, B., Jia, Y., White, D., Arcuri, A. and Clark, J. The GISMOE Challenge: Constructing the Pareto Program Surface Using Genetic Programming to Find Better Programs 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE '12) - Keynote Paper, pp. 1-14  inproceedings DOI  
Abstract: Optimising programs for non-functional properties such as speed, size, throughput, power consumption and bandwidth can be demanding; pity the poor programmer who is asked to cater for them all at once! We set out an alternate vision for a new kind of software development environment inspired by recent results from Search Based Software Engineering (SBSE). Given an input program that satisfies the functional requirements, the proposed programming environment will automatically generate a set of candidate program implementations, all of which share functionality, but each of which differ in their non-functional trade offs. The software designer navigates this diverse Pareto surface of candidate implementations, gaining insight into the trade offs and selecting solutions for different platforms and environments, thereby stretching beyond the reach of current compiler technologies. Rather than having to focus on the details required to manage complex, inter-related and conflicting, non-functional trade offs, the designer is thus freed to explore, to understand, to control and to decide rather than to construct.
BibTeX:
@inproceedings{HarmanLJWAC12,
  author = {Mark Harman and Bill Langdon and Yue Jia and David White and Andrea Arcuri and John Clark},
  title = {The GISMOE Challenge: Constructing the Pareto Program Surface Using Genetic Programming to Find Better Programs},
  booktitle = {Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE '12) - Keynote Paper},
  publisher = {ACM},
  year = {2012},
  pages = {1-14},
  doi = {http://dx.doi.org/10.1145/2351676.2351678}
}
Harman, M., Mansouri, S.A. and Zhang, Y. Search-based Software Engineering: Trends, Techniques and Applications 2012 ACM Computing Surveys (CSUR)
Vol. 45(1) 
article DOI  
Abstract: In the past five years there has been a dramatic increase in work on Search-Based Software Engineering (SBSE), an approach to Software Engineering (SE) in which Search-Based Optimization (SBO) algorithms are used to address problems in SE. SBSE has been applied to problems throughout the SE lifecycle, from requirements and project planning to maintenance and reengineering. The approach is attractive because it offers a suite of adaptive automated and semiautomated solutions in situations typified by large complex problem spaces with multiple competing and conflicting objectives. This article provides a review and classification of literature on SBSE. The work identifies research trends and relationships between the techniques applied and the applications to which they have been applied and highlights gaps in the literature and avenues for further research.
BibTeX:
@article{HarmanMZ12,
  author = {Mark Harman and S. Afshin Mansouri and Yuanyuan Zhang},
  title = {Search-based Software Engineering: Trends, Techniques and Applications},
  journal = {ACM Computing Surveys (CSUR)},
  year = {2012},
  volume = {45},
  number = {1},
  doi = {http://dx.doi.org/10.1145/2379776.2379787}
}
Langdon, W.B. and Harman, M. Genetically Improving 50000 Lines of C++ 2012 (RN/12/09)  techreport URL 
Abstract: There has been much recent interest in genetic improvement of programs. However, genetic improvement has yet to scale beyond toy laboratory programs. We seek to overcome this scalability barrier. We evolved a widely-used and highly complex 50000 line system, seeking improved versions that are faster than the original, yet at least as good semantically. Our approach found a version that is 70 times faster (on average) and is also a small semantic improvement on the original.
BibTeX:
@techreport{LangdonH12,
  author = {William B. Langdon and Mark Harman},
  title = {Genetically Improving 50000 Lines of C++},
  year = {2012},
  number = {RN/12/09},
  url = {http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/research/RN_12_09.pdf}
}
Millard, A.G., White, D.R. and Clark, J.A. Searching for Pareto-optimal Randomised Algorithms 2012 Vol. 7515, Proceedings of the 4th International Symposium on Search Based Software Engineering (SSBSE '12), pp. 183-197  inproceedings DOI  
Abstract: Randomised algorithms traditionally make stochastic decisions based on the result of sampling from a uniform probability distribution, such as the toss of a fair coin. In this paper, we relax this constraint, and investigate the potential benefits of allowing randomised algorithms to use non-uniform probability distributions. We show that the choice of probability distribution influences the non-functional properties of such algorithms, providing an avenue of optimisation to satisfy non-functional requirements. We use Multi-Objective Optimisation techniques in conjunction with Genetic Algorithms to investigate the possibility of trading-off non-functional properties, by searching the space of probability distributions. Using a randomised self-stabilising token circulation algorithm as a case study, we show that it is possible to find solutions that result in Pareto-optimal trade-offs between non-functional properties, such as self-stabilisation time, service time, and fairness.
BibTeX:
@inproceedings{MillardWC12,
  author = {Alan G. Millard and David R. White and John A. Clark},
  title = {Searching for Pareto-optimal Randomised Algorithms},
  booktitle = {Proceedings of the 4th International Symposium on Search Based Software Engineering (SSBSE '12)},
  publisher = {Springer},
  year = {2012},
  volume = {7515},
  pages = {183-197},
  doi = {http://dx.doi.org/10.1007/978-3-642-33119-0_14}
}
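A minimal sketch of the core idea, using only Python's standard library; it illustrates replacing a fair coin with a tunable non-uniform distribution whose weights a multi-objective search could optimise, and is not the paper's self-stabilisation case study:
import random

def biased_choice(options, weights):
    # random.choices (Python 3.6+) samples according to the given weights.
    return random.choices(options, weights=weights, k=1)[0]

# A fair coin, as in a conventional randomised algorithm...
fair = [biased_choice(["heads", "tails"], [0.5, 0.5]) for _ in range(10)]
# ...and a biased one; the weights become decision variables that a
# multi-objective GA can tune to trade off non-functional properties.
biased = [biased_choice(["heads", "tails"], [0.9, 0.1]) for _ in range(10)]
print(fair)
print(biased)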
Minku, L.L., Sudholt, D. and Yao, X. Evolutionary Algorithms for the Project Scheduling Problem: Runtime Analysis and Improved Design 2012 Proceedings of the 14th International Conference on Genetic and Evolutionary Computation (GECCO '12), pp. 1221-1228  inproceedings DOI  
Abstract: Even though genetic algorithms (GAs) have been used for solving the project scheduling problem (PSP), it is not well understood which problem characteristics make it difficult or easy for GAs. We present the first runtime analysis for the PSP, revealing which problem features can make the PSP easy or hard. This allows us to assess the performance of GAs and to make informed design choices. Our theory has inspired a new evolutionary design, including normalisation of employees' dedication to different tasks to eliminate the problem of exceeding their maximum dedication. Theoretical and empirical results show that our design is very effective in terms of hit rate and solution quality.
BibTeX:
@inproceedings{MinkuSY12,
  author = {Leandro L. Minku and Dirk Sudholt and Xin Yao},
  title = {Evolutionary Algorithms for the Project Scheduling Problem: Runtime Analysis and Improved Design},
  booktitle = {Proceedings of the 14th International Conference on Genetic and Evolutionary Computation (GECCO '12)},
  publisher = {ACM},
  year = {2012},
  pages = {1221-1228},
  doi = {http://dx.doi.org/10.1145/2330163.2330332}
}
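A minimal sketch of the dedication-normalisation idea mentioned in the abstract, under the usual PSP formulation (a dedication matrix with values in [0, 1] and a maximum dedication of 1.0 per employee); the function and data names are illustrative, not the authors' code:
def normalise_dedications(x, active_tasks, max_dedication=1.0):
    # x: dict employee -> dict task -> dedication in [0, 1].
    # If an employee's dedication summed over the concurrently active tasks
    # exceeds the maximum, scale those dedications down proportionally so
    # overwork cannot occur by construction.
    normalised = {}
    for employee, dedications in x.items():
        total = sum(dedications.get(t, 0.0) for t in active_tasks)
        scale = max_dedication / total if total > max_dedication else 1.0
        normalised[employee] = {t: (d * scale if t in active_tasks else d)
                                for t, d in dedications.items()}
    return normalised

# Hypothetical dedication matrix for two employees and two concurrent tasks.
x = {"alice": {"t1": 0.8, "t2": 0.7}, "bob": {"t1": 0.3, "t2": 0.4}}
print(normalise_dedications(x, active_tasks={"t1", "t2"}))
# alice's total of 1.5 is rescaled to 1.0; bob's 0.7 is left untouched.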
Minku, L.L. and Yao, X. Can Cross-company Data Improve Performance in Software Effort Estimation? 2012 Proceedings of the 8th International Conference on Predictive Models in Software Engineering (PROMISE '12), pp. 69-78  inproceedings DOI  
Abstract: Background: There has been a long debate in the software engineering literature concerning how useful cross-company (CC) data are for software effort estimation (SEE) in comparison to within-company (WC) data. Studies indicate that models trained on CC data obtain either similar or worse performance than models trained solely on WC data. Aims: We aim to investigate whether CC data can help to increase performance and under what conditions. Method: The work concentrates on the fact that SEE is a class of online learning tasks which operate in changing environments, even though most work so far has neglected this. We conduct an analysis based on the performance of different approaches considering CC and WC data. These are: (1) an approach not designed for changing environments, (2) approaches designed for changing environments and (3) a new online learning approach able to identify when CC data are helpful or detrimental. Results: Interesting features of data sets commonly used in the SEE literature are revealed, showing that different subsets of CC data can be beneficial or detrimental depending on the moment in time. The newly proposed approach is able to benefit from this, successfully using CC data to improve performance over WC models. Conclusions: This work not only shows that CC data can help to increase performance for SEE tasks, but also demonstrates that the online nature of software prediction tasks should be exploited, an important issue to be considered in the future.
BibTeX:
@inproceedings{MinkuY12,
  author = {Leandro L. Minku and Xin Yao},
  title = {Can Cross-company Data Improve Performance in Software Effort Estimation?},
  booktitle = {Proceedings of the 8th International Conference on Predictive Models in Software Engineering (PROMISE '12)},
  publisher = {ACM},
  year = {2012},
  pages = {69-78},
  doi = {http://dx.doi.org/10.1145/2365324.2365334}
}
Ochoa, G., Walker, J., Hyde, M. and Curtois, T. Adaptive Evolutionary Algorithms and Extensions to the HyFlex Hyper-heuristic Framework 2012 Proceedings of the 12th International Conference on Parallel Problem Solving from Nature (PPSN '12), pp. 418-427  inproceedings DOI  
Abstract: HyFlex is a recently proposed software framework for implementing hyper-heuristics and domain-independent heuristic optimisation algorithms [13]. Although it was originally designed to implement hyper-heuristics, it provides a population and a set of move operators of different types. This enables the implementation of adaptive versions of other heuristics such as evolutionary algorithms and iterated local search. The contributions of this article are twofold. First, a number of extensions to the HyFlex framework are proposed and implemented that enable the design of more effective adaptive heuristics. Second, it is demonstrated that adaptive evolutionary algorithms can be implemented within the framework, and that the use of crossover and a diversity metric produced improved results, including a new best-known solution, on the studied vehicle routing problem.
BibTeX:
@inproceedings{OchoaWHC12,
  author = {Gabriela Ochoa and James Walker and Matthew Hyde and Tim Curtois},
  title = {Adaptive Evolutionary Algorithms and Extensions to the HyFlex Hyper-heuristic Framework},
  booktitle = {Proceedings of the 12th International Conference on Parallel Problem Solving from Nature (PPSN '12)},
  publisher = {Springer},
  year = {2012},
  pages = {418-427},
  doi = {http://dx.doi.org/10.1007/978-3-642-32964-7_42}
}
Poulding, S. Tutorial: High Performance SBSE using Commodity Graphics Cards 2012 Vol. 7515, Proceedings of the 4th International Symposium on Search Based Software Engineering (SSBSE '12), pp. 29  inproceedings DOI  
Abstract: In contrast to manual software engineering techniques, search-based software engineering (SBSE) is able to exploit high performance computing resources in order to improve scalability and to solve hitherto intractable engineering problems. This is an increasingly viable approach: affordable high performance computing is readily available using cloud services, server clusters, and—perhaps surprisingly—commodity graphics cards that may already be in your laptop or PC.
BibTeX:
@inproceedings{Poulding12,
  author = {Simon Poulding},
  title = {Tutorial: High Performance SBSE using Commodity Graphics Cards},
  booktitle = {Proceedings of the 4th International Symposium on Search Based Software Engineering (SSBSE '12)},
  publisher = {Springer},
  year = {2012},
  volume = {7515},
  pages = {29},
  doi = {http://dx.doi.org/10.1007/978-3-642-33119-0_3}
}
Swan, J., Harman, M., Ochoa, G. and Burke, E. Generic Software Subgraph Isomorphism 2012 Proceedings of the 4th International Symposium on Search Based Software Engineering (SSBSE '12) - Fast Abstracts  inproceedings URL 
BibTeX:
@inproceedings{SwanHOB12,
  author = {Jerry Swan and Mark Harman and Gabriela Ochoa and Edmund Burke},
  title = {Generic Software Subgraph Isomorphism},
  booktitle = {Proceedings of the 4th International Symposium on Search Based Software Engineering (SSBSE '12) - Fast Abstracts},
  year = {2012},
  url = {http://selab.fbk.eu/ssbse2012/documents/SSBSE_2012_Fast_Abstracts.pdf#page=53}
}
Swan, J., Burke, E., Kendall, G. and Özcan, E. Heuristic Function Resynthesis 2012 OR54 Conference, Edinburgh  misc  
BibTeX:
@misc{SwanOR54,
  author = {Jerry Swan and Edmund Burke and Graham Kendall and Ender Özcan},
  title = {Heuristic Function Resynthesis},
  year = {2012}
}
Swan, J., Özcan, E. and Kendall, G. Co-evolving add and delete hyper-heuristics 2012 Proceedings of the 9th International Conference on the Practice and Theory of Automated Timetabling (PATAT'12), pp. 395–399  inproceedings URL 
BibTeX:
@inproceedings{SwanPATAT2012,
  author = {Jerry Swan and Ender Özcan and Graham Kendall},
  title = {Co-evolving add and delete hyper-heuristics},
  booktitle = {Proceedings of the 9th International Conference on the Practice and Theory of Automated Timetabling (PATAT'12)},
  year = {2012},
  pages = {395–399},
  url = {http://www.cs.stir.ac.uk/~jsw/coevolutionary-add-delete-hyperheuristics.pdf}
}
Yang, X., Tang, K. and Yao, X. A Learning-to-Rank Algorithm for Constructing Defect Prediction Models 2012 Vol. 7435, Proceedings of the 13th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL '12), pp. 167-175  inproceedings DOI  
Abstract: This paper applies the learning-to-rank approach to software defect prediction. Ranking software modules in order of defect-proneness is important to ensure that testing resources are allocated efficiently. However, prediction models that are optimized to explicitly predict the number of defects often fail to correctly predict rankings based on those defect numbers. We show in this paper that model construction methods which include the ranking performance measure in the objective function perform better in predicting defect-proneness rankings of multiple modules. We present experimental results in which our method is compared against three other methods from the literature, using five publicly available data sets.
BibTeX:
@inproceedings{YangTY12,
  author = {Xiaoxing Yang and Ke Tang and Xin Yao},
  title = {A Learning-to-Rank Algorithm for Constructing Defect Prediction Models},
  booktitle = {Proceedings of the 13th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL '12)},
  publisher = {Springer},
  year = {2012},
  volume = {7435},
  pages = {167-175},
  doi = {http://dx.doi.org/10.1007/978-3-642-32639-4_21}
}
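A minimal sketch of why rank-based evaluation matters for defect prediction, using Spearman's rank correlation on toy data; this illustrates the motivation only and is not the paper's learning-to-rank algorithm:
def ranks(xs):
    # Rank positions (1 = smallest); the toy values below have no ties.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman(xs, ys):
    # Spearman's rank correlation (no tie correction needed here).
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

actual_defects    = [12, 0, 3, 7, 1]           # defects per module (toy data)
predicted_model_a = [9.5, 0.2, 2.1, 6.0, 0.8]  # preserves the ordering
predicted_model_b = [4.0, 3.9, 4.1, 4.2, 3.8]  # flattens the ordering
print(spearman(actual_defects, predicted_model_a))  # 1.0
print(spearman(actual_defects, predicted_model_b))  # 0.6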
Yoo, S. Evolving Human Competitive Spectra-Based Fault Localisation Techniques 2012 Vol. 7515, Proceedings of the 4th International Symposium on Search Based Software Engineering (SSBSE '12), pp. 244-258  inproceedings DOI  
Abstract: Spectra-Based Fault Localisation (SBFL) aims to assist debugging by applying risk evaluation formulæ (sometimes called suspiciousness metrics) to program spectra and ranking statements according to the predicted risk. Designing a risk evaluation formula is often an intuitive process done by a human software engineer. This paper presents a Genetic Programming (GP) approach for evolving risk assessment formulæ. The empirical evaluation using 92 faults from four Unix utilities produces promising results. Equations evolved by Genetic Programming can consistently outperform many of the human-designed formulæ, such as Tarantula, Ochiai, Jaccard, Ample, and Wong1/2, by up to 6 times. More importantly, they can perform as well as Op2, which was recently proved to be optimal against the If-Then-Else-2 (ITE2) structure, or even outperform it against other program structures.
BibTeX:
@inproceedings{Yoo12,
  author = {Shin Yoo},
  title = {Evolving Human Competitive Spectra-Based Fault Localisation Techniques},
  booktitle = {Proceedings of the 4th International Symposium on Search Based Software Engineering (SSBSE '12)},
  publisher = {Springer},
  year = {2012},
  volume = {7515},
  pages = {244-258},
  doi = {http://dx.doi.org/10.1007/978-3-642-33119-0_18}
}
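For reference, a minimal sketch of the kind of human-designed suspiciousness formulæ the evolved equations are compared against (Tarantula, Ochiai and Op2), computed from per-statement spectra counts; the spectra values are hypothetical and this is an illustration of SBFL scoring, not the paper's GP system:
import math

# Per-statement spectra counts: ef/ep = failing/passing tests that execute
# the statement, nf/np = failing/passing tests that do not.
def tarantula(ef, ep, nf, np):
    total_fail, total_pass = ef + nf, ep + np
    fail_ratio = ef / total_fail if total_fail else 0.0
    pass_ratio = ep / total_pass if total_pass else 0.0
    return fail_ratio / (fail_ratio + pass_ratio) if fail_ratio + pass_ratio else 0.0

def ochiai(ef, ep, nf, np):
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

def op2(ef, ep, nf, np):
    # Naish et al.'s Op2: ef - ep / (ep + np + 1).
    return ef - ep / (ep + np + 1)

# Hypothetical spectra for three statements; rank by descending suspiciousness.
spectra = {"s1": (4, 1, 0, 5), "s2": (2, 3, 2, 3), "s3": (0, 4, 4, 2)}
ranking = sorted(spectra, key=lambda s: ochiai(*spectra[s]), reverse=True)
print(ranking)  # ['s1', 's2', 's3']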
Zliobaite, I., Bifet, A., Gaber, M., Gabrys, B., Gama, J., Minku, L. and Musial, K. Next Challenges for Adaptive Learning Systems 2012 ACM SIGKDD Explorations Newsletter, Vol. 14(1), pp. 48-55  article DOI  
Abstract: Learning from evolving streaming data has become a 'hot' research topic in the last decade and many adaptive learning algorithms have been developed. This research was stimulated by rapidly growing amounts of industrial, transactional, sensor and other business data that arrive in real time and need to be mined in real time. Under such circumstances, constant manual adjustment of models is inefficient and, with increasing amounts of data, is becoming infeasible. Nevertheless, adaptive learning models are still rarely employed in business applications in practice. In the light of rapidly growing structurally rich 'big data', a new generation of parallel computing solutions and cloud computing services, as well as recent advances in portable computing devices, this article aims to identify the current key research directions to be taken to bring adaptive learning closer to application needs. We identify six forthcoming challenges in designing and building adaptive learning (prediction) systems: making adaptive systems scalable, dealing with realistic data, improving usability and trust, integrating expert knowledge, taking into account various application needs, and moving from adaptive algorithms towards adaptive tools. These challenges are critical for evolving stream settings, as the process of model building needs to be fully automated and continuous.
BibTeX:
@article{ZliobaiteBGGGMM12,
  author = {Indre Zliobaite and Albert Bifet and Mohamed Gaber and Bogdan Gabrys and Joao Gama and Leandro Minku and Katarzyna Musial},
  title = {Next Challenges for Adaptive Learning Systems},
  journal = {ACM SIGKDD Explorations Newsletter},
  year = {2012},
  volume = {14},
  number = {1},
  pages = {48-55},
  doi = {http://dx.doi.org/10.1145/2408736.2408746}
}