EVOSTAR2024: EVOSTAR
PROGRAM FOR FRIDAY, APRIL 5TH

09:30-11:35 Session 12A: EvoMUSART 5 - Natural Language-Driven Audio-Visual Generation
Location: Room A
09:30
AI-Driven Meditation: Personalization for Inner Peace

ABSTRACT. Meditation is a mindful practice known for its difficulties, requiring focused attention despite distractions. Many people have traditionally relied on meditation apps with calming audio for support. This paper introduces an innovative AI-driven system aimed at improving the meditation experience, personalized to each user's needs. This system comprises three core parts. First, it uses a language model to create meditation scripts that match user preferences. Second, it converts these scripts into audio, accompanied by selected background music to create a serene atmosphere. Lastly, a Compositional Pattern-Producing Network (CPPN) generates visually appealing videos featuring intricate patterns influenced by sentiment analysis and input audio. One of the system's strengths is its adaptability since users can choose from various text generation options, and the system adjusts the video color based on the type of meditation selected. An experiment, involving 14 participants, demonstrated comparable content and audio quality with traditional methods. Participants perceived the system as more personalized, expressing a preference for tailored meditation practices and indicating potential for increased user engagement. In summary, this AI-powered meditation system represents a significant advancement in the field, providing personalized, immersive experiences that integrate text, audio, and visuals. Its ability to adapt to users' preferences holds promise for enhancing meditation outcomes and fostering inner peace.

09:55
No Longer Trending on Artstation: Prompt Analysis of Generative AI Art

ABSTRACT. Image generation using generative AI is rapidly becoming a major new source of visual media, with billions of AI generated images created using diffusion models such as Stable Diffusion and Midjourney over the last few years. In this paper we collect and analyse over 3 million prompts and the images they generate. Using natural language processing, topic analysis and visualisation methods we aim to understand collectively how people are using text prompts and the impact of these systems on artists, and more broadly on visual culture. Our study shows that prompting focuses largely on surface aesthetics, reinforcing cultural norms and popular conventional representations and imagery. We also find that many users focus on popular topics (such as making colouring books, fantasy art, or Christmas cards), suggesting that the dominant use for the systems analysed is recreational rather than artistic.

10:20
MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains

ABSTRACT. The recent advances in language-based generative models have paved the way for the orchestration of multiple generators of different artefact types (text, image, audio, etc.) into one system. Presently, many open-source pre-trained models combine text with other modalities, enabling shared vector embeddings to be compared across different generators. Within this context we propose a novel approach to handle multimodal creative tasks using Quality Diversity evolution. Our contribution is a variation of the MAP-Elites algorithm, MAP-Elites with Transverse Assessment (MEliTA), which is tailored for multimodal creative tasks and leverages deep learned models that assess coherence across modalities. MEliTA decouples the artefacts' modalities and promotes cross-pollination between elites. As a test bed for this algorithm, we generate text descriptions and cover images for a hypothetical video game and assign each artefact a unique modality-specific behavioural characteristic. Results indicate that MEliTA can improve text-to-image mappings within the solution space, compared to a baseline MAP-Elites algorithm that strictly treats each image-text pair as one solution. Our approach represents a significant step forward in multimodal bottom-up orchestration and lays the groundwork for more complex systems coordinating multimodal creative agents in the future.
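
For orientation, the baseline that the abstract compares against can be sketched as a plain MAP-Elites loop. This is a minimal illustration on a toy two-dimensional problem, not MEliTA and not the authors' code: the grid resolution, the fitness function, the mutation operator and every function name below are assumptions made for the example.

```python
import random

GRID = 5  # hypothetical archive resolution per behavioural dimension


def random_solution(rng):
    # A toy "artefact": two numbers in [0, 1].
    return [rng.random(), rng.random()]


def mutate(solution, rng):
    # Gaussian perturbation, clipped back into [0, 1].
    return [min(1.0, max(0.0, x + rng.gauss(0.0, 0.1))) for x in solution]


def evaluate(solution):
    # Toy fitness (assumption): closeness to the centre of the unit square.
    x, y = solution
    return -((x - 0.5) ** 2 + (y - 0.5) ** 2)


def describe(solution):
    # Behavioural characteristic: the grid cell the solution falls into.
    return tuple(min(GRID - 1, int(x * GRID)) for x in solution)


def map_elites(iterations=2000, initial=50, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, solution)
    for step in range(iterations):
        if step < initial:
            candidate = random_solution(rng)
        else:
            # Pick a random elite and vary it.
            _, parent = archive[rng.choice(list(archive))]
            candidate = mutate(parent, rng)
        cell = describe(candidate)
        fitness = evaluate(candidate)
        # Keep the candidate only if its cell is empty or it beats the elite.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, candidate)
    return archive


archive = map_elites()
```

In this baseline each archive entry is a single indivisible solution, which corresponds to the strict image-text pairing the abstract attributes to standard MAP-Elites.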

10:45
The Chordinator: Modeling Music Harmony By Implementing Transformer Networks and Token Strategies

ABSTRACT. This paper compares two tokenization strategies for modeling chord progressions using the transformer architecture trained with a large dataset of chord progressions, exemplifying styles such as jazz, rock, pop, blues, or music for cinema. The first strategy includes a tokenization method treating all different chords as unique elements, which results in a vocabulary of 5202 independent tokens. The second strategy expresses the chords as a dynamic tuple describing root, nature (e.g., major, minor, diminished, etc.), and extensions (e.g., additions or alterations) producing a specific vocabulary of 59 tokens related to chords and 75 tokens for style, bars, form, and format. In the second approach, MIDI embeddings are added into the positional embedding layer of the transformer architecture, with an array of eight values related to the notes forming the chords. We propose a trigram analysis addition to the dataset to compare the generated chord progressions with the training dataset, which reveals common progressions and the extent to which a sequence is duplicated. We analyze progressions generated by the models comparing HITS@k metrics and human evaluation by 10 participants, rating the plausibility of the progressions as potential music compositions from a musical perspective. The second model reported lower validation loss, better metrics, and more musical consistency in the suggested progressions. The contributions of the proposed system are a dataset with 70,812 chord progressions (songs) from several popular music styles, a tokenization method that reduces the token vocabulary without compromising the chord vocabulary, and a positional encoding strategy with domain-specific embeddings for the transformer network to better suit music information.
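
The second tokenization strategy (root, nature, extensions) can be illustrated with a small sketch. The chord-symbol syntax, the regular expressions and the token names below are assumptions chosen for the example; the abstract does not specify the paper's actual token set or parsing rules.

```python
import re

# Hypothetical chord-symbol grammar: root, optional nature, then extensions.
CHORD_RE = re.compile(r"^([A-G][#b]?)(min|m(?!aj)|dim|aug)?(.*)$")
EXTENSION_RE = re.compile(r"maj\d+|add\d+|sus\d*|[#b]?\d+")
NATURE_NAMES = {
    None: "major",
    "m": "minor",
    "min": "minor",
    "dim": "diminished",
    "aug": "augmented",
}


def tokenize_chord(symbol):
    """Split one chord symbol into [root, nature, *extensions]."""
    match = CHORD_RE.match(symbol)
    if match is None:
        raise ValueError(f"unrecognised chord symbol: {symbol!r}")
    root, nature, rest = match.groups()
    # Each addition or alteration becomes its own token.
    return [root, NATURE_NAMES[nature]] + EXTENSION_RE.findall(rest)


def tokenize_progression(chords):
    """Flatten a progression into one token sequence."""
    return [token for chord in chords for token in tokenize_chord(chord)]


tokens = tokenize_progression(["Am7", "Dm7", "G7", "Cmaj7"])
```

Because roots, natures and extensions are shared across chords, the vocabulary grows with the number of components rather than with the number of distinct chord symbols, which is the effect the abstract reports for the second strategy.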

11:10
Deep Learning Approaches for Sung Vowel Classification

ABSTRACT. Phoneme classification is an important part of automatic speech recognition systems. However, attempting to classify phonemes during singing has been far less studied. In this work, we investigate sung vowel classification, a subset of the phoneme classification problem. Many prior approaches that attempt to classify spoken or sung vowels rely upon spectral feature extraction, such as formants or Mel-frequency cepstral coefficients. We explore classifying sung vowels with deep neural networks trained directly on raw audio. Using a dataset of singing excerpts, we compare three neural models and two spectral models for classifying five sung vowels performed in a variety of different vocal techniques. We find that our neural models achieved accuracies between 68.4% and 79.6%, and our spectral models failed to discern vowels. Of the neural models, we find that a fine-tuned transformer outperformed other models; however, a convolutional or recurrent model may provide satisfactory results in resource-limited scenarios. This result implies that neural approaches trained directly on raw audio, without spectral features, are viable approaches for singing phoneme classification and deserve further exploration.

09:30-11:35 Session 12B: EvoCOP 3 - Operators and theory
Location: Room B
09:30
A memetic algorithm with adaptive operator selection for graph coloring

ABSTRACT. This paper presents a memetic algorithm with adaptive operator selection for the k-coloring problem and the weighted vertex coloring problem. Our method uses online selection to adaptively determine the crossover and local search operators to apply during the search to improve the efficiency of the algorithm. This leads to better results than without the operator selection and allows us to find a new coloring with 404 colors for C2000.9, one of the largest and densest instances of the classical DIMACS coloring benchmarks. The proposed method also finds three new best solutions for the weighted vertex coloring problem. We investigate the impacts of the different algorithmic variants on both problems.

09:55
Reduction-Based MAX-3SAT with Low Nonlinearity and Lattices Under Recombination

ABSTRACT. A new construction is introduced for creating random MAX-3SAT instances with low nonlinearity. Instead of generating random clauses, we generate random SAT expressions over 3 variables and then convert these into SAT clauses. This yields more structured problems with lower nonlinearity. Lower nonlinearity is usually associated with real-world MAX-SAT applications. We also introduce a new method for weighting MAX-SAT clauses that preserves low nonlinearity and also breaks up plateaus. We evaluate these new problems by enumeration of instances with n = 30 variables. We enumerate all local optima, and all tunnels between local optima found using partition crossover. One unexpected result is that partition crossover creates more tunnels on these semi-structured MAX-SAT problems compared to results on random or adjacent NK landscapes. We show that partition crossover induces hypercube lattices over subsets of local optima; all of the local optima which appear in a lattice can be evaluated with a single linear equation.
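
The idea of turning a random Boolean expression over three variables into clauses can be sketched with the textbook truth-table-to-CNF conversion: each falsifying row of the truth table yields one 3-literal clause. This is an assumption about how such a conversion could look, not the paper's construction; the sampling probability and all names are hypothetical.

```python
import itertools
import random


def random_function_to_clauses(variables, rng, p_false=0.5):
    """Sample a random truth table over three variables and return CNF clauses.

    A literal is (variable_index, polarity); it is satisfied when the
    assignment gives that variable the value `polarity`.
    """
    clauses = []
    for row in itertools.product([False, True], repeat=3):
        if rng.random() < p_false:
            # This row falsifies the function: emit the clause that is
            # false on exactly this row and true on the other seven.
            clauses.append(tuple((v, not value) for v, value in zip(variables, row)))
    return clauses


def count_satisfied(clauses, assignment):
    """MAX-SAT objective: number of clauses with at least one true literal."""
    return sum(
        any(assignment[v] == polarity for v, polarity in clause)
        for clause in clauses
    )


rng = random.Random(1)
clauses = random_function_to_clauses((0, 1, 2), rng)
```

Each emitted clause excludes exactly one of the eight assignments, so the clause set is satisfied by precisely the rows on which the sampled function is true.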

10:20
Where the Really Hard Quadratic Assignment Problems Are: the QAP-SAT instances

ABSTRACT. The Quadratic Assignment Problem (QAP) is one of the major domains in the field of evolutionary computation, and more widely in combinatorial optimization. This paper studies the phase transition of the QAP, which can be described as a dramatic change in the problem's computational complexity and satisfiability, within a narrow range of the problem parameters. To approach this phenomenon, we introduce a new QAP-SAT design of the initial problem based on submodularity to capture its difficulty with new features. This decomposition is studied experimentally using branch-and-bound and tabu search solvers. A phase transition parameter is then proposed. The critical parameter of phase transition satisfaction and that of the solving effort are shown to be highly correlated for tabu search, thus allowing the prediction of difficult instances.

10:45
Experimental and Theoretical Analysis of Local Search Optimising OBDD Variable Orderings

ABSTRACT. Building on recent interest in the analysis of the performance of randomised search heuristics for permutation problems, we investigate the performance of local search when applied to the classical combinatorial optimisation problem of finding an optimal variable ordering for ordered binary decision diagrams, a data structure for Boolean functions. This brings theory-oriented analysis towards a practically relevant combinatorial optimisation problem. We investigate a class of benchmark functions as well as the leading bit of binary addition, both Boolean functions where the variable ordering makes the difference between linear and exponential size. We present experiments with two local search variants using five different operators for permutations from the literature. These experiments, as well as theoretical results, show which operators and local search variants perform best, improving our understanding of the operators and local search in combinatorial optimisation.

11:10
A Theoretical Investigation Of Termination Criteria For Evolutionary Algorithms

ABSTRACT. We take a theoretical approach to analysing conditions for terminating evolutionary algorithms. After looking at situations where much is known about the particular algorithm and problem class, we consider a more generic approach. Schemes that depend purely on the previous time to improvement are shown not to work. An alternative criterion, the $\lambda$-parallel scheme, does terminate correctly (with high probability) for any randomised search heuristic algorithm on any problem, provided certain conditions on the improvement probabilities are met. A more natural and less costly approach is then presented based on the runtime so far. This is shown to work for the classes of monotonic and path problems (for Randomised Local Search). It remains an open question whether it works in a more general setting.

09:30-11:35 Session 12C: EML 3 - Applied EML
Location: Room C
09:30
Hybrid Surrogate Assisted Evolutionary Multiobjective Reinforcement Learning for Continuous Robot Control
PRESENTER: Atanu Mazumdar

ABSTRACT. Many real world reinforcement learning (RL) problems consist of multiple conflicting objective functions to be optimized simultaneously. Finding these optimal policies (known as Pareto optimal policies) for different preferences of objectives requires extensive state space exploration. Thus, obtaining a dense set of Pareto optimal policies is challenging and often reduces the sample efficiency. In this paper, we propose a hybrid multiobjective policy optimization approach for solving multiobjective reinforcement learning (MORL) problems with continuous actions. Our approach combines the faster convergence of multiobjective policy gradient (MOPG) and a surrogate assisted multiobjective evolutionary algorithm (MOEA) to produce a dense set of Pareto optimal policies. The solutions found by the MOPG algorithm are utilized to build computationally inexpensive surrogate models in the parameter space of the policies that approximate the return of policies. An MOEA is executed that utilizes the surrogates' mean prediction and uncertainty in the prediction to find approximate optimal policies. The final solution policies, which are quite few, are later evaluated using the simulator. Tests on multiobjective continuous action RL benchmarks show that a hybrid surrogate assisted multiobjective evolutionary optimizer with robust selection criterion produces a dense set of Pareto optimal policies without extensively exploring the state space. We also apply the proposed approach to train Pareto optimal agents for autonomous driving, where the hybrid approach produced superior results compared to a state-of-the-art MOPG algorithm.
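
The notion of Pareto optimal policies used in this abstract can be made concrete with a short non-dominated filter over vectors of returns. This is a generic sketch assuming every objective is maximised; the sample return vectors are made up for illustration and do not come from the paper.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective returns of four policies.
returns = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0)]
front = pareto_front(returns)
```

Here the third policy is dropped because the second is better on both objectives, while the remaining three represent different trade-offs.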

09:55
Genetic Programming with Aggregate Channel Features for Flower Localization Using Limited Training Data

ABSTRACT. Flower localization is a crucial image pre-processing step for subsequent classification/recognition that confronts challenges with diverse flower species, varying imaging conditions, and limited data. Existing flower localization methods face limitations, including reliance on color information, low model interpretability, and a large demand for training data. This paper proposes a new genetic programming (GP) approach called ACFGP with a novel representation for automated flower localization with limited training data. The novel GP representation enables ACFGP to evolve effective programs for generating aggregate channel features and achieving flower localization in diverse scenarios. Comparative evaluations against the baseline benchmark algorithm and YOLOv8 demonstrate ACFGP's superior performance. Further analysis highlights the effectiveness of the aggregate channel features generated by ACFGP programs, demonstrating the superiority of ACFGP in addressing challenging flower localization tasks.

10:20
Progressive Self-Supervised Multi-Objective NAS for Image Classification

ABSTRACT. We present a novel methodology to search for convolutional neural networks (CNNs) with improved accuracy and reduced complexity within a self-supervised framework. Our aim is to search for competitive, yet simple, generic architectures that can be used for multiple tasks (i.e., as a pretrained model). This is achieved through cartesian genetic programming (CGP) for neural architecture search (NAS). Our approach integrates self-supervised learning with a progressive architecture search process. This synergy unfolds within the continuous domain which is tackled via multi-objective evolutionary algorithms (MOEAs). To empirically validate our proposal, we adopted a rigorous evaluation using the non-dominated sorting genetic algorithm II (NSGA-II) for the CIFAR-100, CIFAR-10, SVHN and CINIC-10 datasets. The experimental results showcase the competitiveness of our approach in relation to state-of-the-art proposals concerning both classification performance and model complexity. Additionally, the effectiveness of this method in achieving strong generalization can be inferred.

10:45
Hindsight Experience Replay with Evolutionary Decision Trees for Curriculum Goal Generation

ABSTRACT. Reinforcement learning (RL) algorithms often require a significant number of experiences to learn a policy capable of achieving desired goals in multi-goal robot manipulation tasks with sparse rewards. Hindsight Experience Replay (HER) is an existing method that improves learning efficiency by using failed trajectories and replacing the original goals with hindsight goals that are uniformly sampled from the visited states. However, HER has a limitation: the hindsight goals are mostly near the initial state, which hinders solving tasks efficiently if the desired goals are far from the initial state. To overcome this limitation, we introduce a curriculum learning method called HERDT (HER with Decision Trees). HERDT uses binary DTs to generate curriculum goals that guide a robotic agent progressively from an initial state toward a desired goal. During the warm-up stage, DTs are optimized using the Grammatical Evolution algorithm. In the training stage, curriculum goals are then sampled by DTs to help the agent navigate the environment. Since binary DTs generate discrete values, we fine-tune these curriculum points by incorporating a feedback value (i.e., Q value). This fine-tuning enables us to adjust the difficulty level of the generated curriculum points, ensuring that they are neither overly simplistic nor excessively challenging. In other words, these points are precisely tailored to match the robot's ongoing learning policy. We evaluate our proposed approach on different sparse reward robotic manipulation tasks and compare it with the state-of-the-art HER approach. Our results demonstrate that our method consistently outperforms or matches the existing approach in all the tested tasks.
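
The hindsight relabeling step that the abstract attributes to HER can be sketched as follows. This is a minimal illustration with one-dimensional states; the transition layout, the tolerance, the number of relabels per transition and all names are assumptions, and the curriculum goal generation of HERDT is not shown.

```python
import random


def sparse_reward(achieved, goal, tolerance=0.05):
    # Sparse reward: 0 when the goal is reached, -1 otherwise.
    return 0.0 if abs(achieved - goal) <= tolerance else -1.0


def relabel_with_hindsight(trajectory, rng, k=2):
    """Replay each transition k times with goals drawn from visited states."""
    visited = [t["next_state"] for t in trajectory]
    relabeled = []
    for t in trajectory:
        for _ in range(k):
            goal = rng.choice(visited)  # uniform over visited states
            reward = sparse_reward(t["next_state"], goal)
            relabeled.append({**t, "goal": goal, "reward": reward})
    return relabeled


# A failed toy trajectory: the agent never reaches the desired goal 1.0.
trajectory = [
    {"state": 0.0, "action": 1, "next_state": 0.1, "goal": 1.0, "reward": -1.0},
    {"state": 0.1, "action": 1, "next_state": 0.2, "goal": 1.0, "reward": -1.0},
    {"state": 0.2, "action": 1, "next_state": 0.3, "goal": 1.0, "reward": -1.0},
]
replay = relabel_with_hindsight(trajectory, random.Random(0))
```

All hindsight goals in this sketch lie among the states the agent actually visited, which illustrates the limitation the abstract points out: when those states cluster near the start, so do the relabeled goals.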