UCL App Store Analysis Group


On this page you can find information about our work.

Causal Impact Analysis for App Releases in Google Play

Abstract. App developers would like to understand the impact of their own and their competitors’ software releases. To address this we introduce Causal Impact Release Analysis for app stores, and our tool, CIRA, that implements this analysis. We mined 38,858 popular Google Play apps, over a period of 12 months. For these apps, we identified 26,339 releases for which there was adequate prior and posterior time series data to facilitate causal impact analysis. We found that 33% of these releases caused a statistically significant change in user ratings. We use our approach to reveal important characteristics that distinguish causal significance in the Google Play store. To explore the actionability of causal impact analysis, we elicited the opinions of 52 developers: 75% concurred with the causal assessment, of which x% claimed that their company would consider changing their app release str


This work has been accepted at FSE 2016.
Download: full paper.

Clustering Mobile Apps Based on Mined Textual Descriptions

Abstract. Context: Categorising software systems according to their functionality yields many benefits to both users and developers. Objective: In order to uncover the latent clustering of mobile apps in app stores, we propose a novel technique that measures app similarity based on claimed behaviour. Method: Features are extracted using information retrieval augmented with ontological analysis and used as attributes to characterise apps. These attributes are then used to cluster the apps using agglomerative hierarchical clustering. We empirically evaluate our approach on 17,877 apps mined from the BlackBerry and Google app stores in 2014. Results: The results show that our approach dramatically improves the existing categorisation quality for both Blackberry (from 0.02 to 0.41 on average) and Google (from 0.03 to 0.21 on average) stores. We also find a strong Spearman rank correlation (ρ = 0.96 for Google and ρ = 0.99 for BlackBerry) between the number of apps and the ideal granularity within each category, indicating that ideal granularity increases with category size, as expected. Conclusions: Current categorisation in the app stores studied do not exhibit a good classification quality in terms of the claimed feature space. However, a better quality can be achieved using a good feature extraction technique and a traditional clustering method.


This work has been accepted at ESEM 2016.
Download: full paper.
Data: The data used in this study is available here.
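
Illustration: the abstract above describes clustering apps by their claimed features with agglomerative hierarchical clustering. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the features have already been extracted (the paper's information retrieval and ontological analysis steps are not reproduced), and the Jaccard distance, the average-linkage rule, the function names, and the app data are all assumptions made up for this example.

```python
def jaccard_distance(a, b):
    """Distance between two feature sets: 1 - |intersection| / |union|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative_clusters(features_by_app, k):
    """Merge the two closest clusters (average linkage) until k remain.

    features_by_app maps a hypothetical app name to its set of claimed features.
    """
    if not 1 <= k <= len(features_by_app):
        raise ValueError("k must be between 1 and the number of apps")

    # Start with every app in its own cluster.
    clusters = [[name] for name in sorted(features_by_app)]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        dists = [
            jaccard_distance(features_by_app[a], features_by_app[b])
            for a in c1
            for b in c2
        ]
        return sum(dists) / len(dists)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest average distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)


# Invented apps and features, purely for illustration.
apps = {
    "chat_a": {"send message", "share photo", "group chat"},
    "chat_b": {"send message", "group chat", "video call"},
    "fit_a": {"track run", "count steps", "heart rate"},
    "fit_b": {"track run", "count steps", "sleep log"},
}
clusters = agglomerative_clusters(apps, 2)
print(clusters)  # [['chat_a', 'chat_b'], ['fit_a', 'fit_b']]
```

Stopping at a fixed k is a simplification chosen for brevity; the paper instead studies the ideal granularity within each category.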

Mobile App and App Store Analysis, Testing and Optimisation

Abstract. App stores are not merely disrupting traditional software deployment practice, but also offer considerable potential benefit to scientific research. Software engineering researchers have never had available, a more rich, wide and varied source of information about software products. There is some source code availability, supporting scientific investigation as it does with more traditional open source systems. However, what is important and different about app stores, is the other data available. Researchers can access user perceptions, expressed in rating and review data. Information is also available on app popularity (typically expressed as the number or rank of downloads). For more traditional applications, this data would simply be too commercially sensitive for public release. Pricing information is also partially available, though at the time of writing, this is sadly submerging beneath a more opaque layer of in-app purchasing. This talk will review research trends in the nascent field of App Store Analysis, presenting results from the UCL app Analysis Group (UCLappA) and others, and will give some directions for future work.


Keynote paper at MobileSoft2016.
Download: full paper.

Feature Lifecycles as They Spread, Migrate, Remain, and Die in App Stores

Abstract. We introduce a theoretical characterisation of feature lifecycles in app stores, to help app developers to identify trends and to find undiscovered requirements. To illustrate and motivate app feature lifecycle analysis, we used our theory to empirically analyse the migratory and non-migratory behaviours of 4,192 features from two App Stores (Samsung and Blackberry). The results reveal that, in both stores, intransitive features (those that neither migrate nor die out) exhibit statistically significantly different behaviours with regard to important properties, such as their price. Further correlation analysis also highlights differences between behaviours relating price, rating and popularity. Our results indicate that feature lifecycle analysis can yield insights that may also help developers to understand feature requirement behaviours and attribute relationships.


This work has been accepted at RE 2015.
Download: full paper, report containing all the results obtained in our analysis.
Data: The data used in this study can be requested here.

The App Sampling Problem for App Store Mining

Abstract. Many papers on App Store Mining are susceptible to the App Sampling Problem, which exists when only a subset of apps are studied, resulting in potential sampling bias. We introduce the App Sampling Problem, and study its effects on sets of user review data. We investigate the effects of sampling bias, and techniques for its amelioration in App Store Mining and Analysis, where sampling bias is often unavoidable. We mine 106,891 requests from 2,729,103 user reviews and investigate the properties of apps and reviews from 3 different partitions: the sets with fully complete review data, partially complete review data, and no review data at all. We find that app metrics such as price, rating, and download rank are significantly different between the three completeness levels. We show that correlation analysis can find trends in the data that prevail across the partitions, offering one possible approach to App Store Analysis in the presence of sampling bias.


This work has been accepted at MSR 2015.
Download: full paper.
Data: The data used in this study can be requested here.

App Store Mining and Analysis (MSR2012)

Abstract. This paper introduces app store mining and analysis as a form of software repository mining. Unlike other software repositories traditionally used in MSR work, app stores usually do not provide source code. However, they do provide a wealth of other information in the form of pricing and customer reviews. Therefore, we use data mining to extract feature information, which we then combine with more readily available information to analyse apps’ technical, customer and business aspects. We applied our approach to the 32,108 non-zero priced apps available in the Blackberry app store in September 2011. Our results show that there is a strong correlation between customer rating and the rank of app downloads, though perhaps surprisingly, there is no correlation between price and downloads, nor between price and rating. More importantly, we show that these correlation findings carry over to (and are even occasionally enhanced within) the space of data mined app features, providing evidence that our ‘App store MSR’ approach can be valuable to app developers.


This work has been presented at MSR2012.
Download: full paper.
Data: we provide figures from our analysis. The data used in this study can be requested here.
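
Illustration: the abstract above reports a correlation between customer rating and the rank of app downloads. The abstract does not say which coefficient was used; the sketch below assumes Spearman's rank correlation, a common choice when one variable is itself a rank. The function names and the five data points are invented for this example and are not taken from the study.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented data: mean rating and download rank (1 = most downloaded).
ratings = [4.8, 4.1, 3.9, 3.2, 2.5]
download_rank = [1, 2, 4, 3, 5]
rho = spearman_rho(ratings, download_rank)
print(round(rho, 3))  # -0.9
```

The sign is negative here only because a numerically smaller rank means more downloads; in this toy data, better-rated apps tend to be downloaded more.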

App Store Analysis: Relationships between Customer, Business and Technical Characteristics

Abstract. This paper argues that App Store Analysis can be used to understand the rich interplay between app customers and their developers. We use data mining to extract price and popularity information and natural language processing and data mining to elicit each app’s claimed features from the Blackberry App Store, revealing strong correlations between customer rating and popularity (rank of app downloads). We found evidence for a mild correlation between the number of features claimed for an app and its price, but we found little evidence for any other correlations in which price was a participant. We also found that free apps have significantly (p-value < 0.001) higher rating than non-free apps, with a moderately high effect size (Â12 = 0.68). We also provide initial evidence that extracted claimed features are meaningful to developers (precision = 0.71, recall = 0.77). All data from our experiments and analysis are made available on-line to support further analysis.


This work is currently under review. You can download the technical report here.
Data: we provide a report containing all the correlation graphs for each category used in our analysis. If the paper is accepted we will add the data from the paper to support replication.
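
Illustration: the abstract above reports an effect size of Â12 = 0.68 for the ratings of free versus non-free apps. Â12 is the Vargha-Delaney statistic: the probability that a value drawn at random from the first group exceeds one drawn from the second, with ties counted as half. The sketch below shows the standard definition; the function name and the ratings are invented for this example and are not data from the study.

```python
def vargha_delaney_a12(xs, ys):
    """Estimate P(x > y) + 0.5 * P(x == y) over all pairs (x in xs, y in ys)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))


# Invented ratings, purely for illustration.
free_ratings = [4.5, 4.0, 3.5, 5.0]
paid_ratings = [3.0, 4.0, 2.5, 3.5]
a12 = vargha_delaney_a12(free_ratings, paid_ratings)
print(a12)  # 0.875
```

A value of 0.5 means neither group tends to score higher; values above 0.5 favour the first group.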

Mining App Stores: Extracting Technical, Business and Customer Rating Information for Analysis and Prediction

Abstract. App development is an increasingly innovative and lucrative software industry. However, determining a suitable market price of an App is both demanding and critical; the comparatively low unit price, but considerable volume of sales dramatically increases the impact of mispricing. In this paper we leverage app store repository mining and machine learning, to automatically construct predictive models for this prediction problem. We implement and evaluate our approach on 9,588 non-free Apps from the Blackberry App Store, demonstrating that our approach statistically significantly outperforms existing approaches with at least medium effect size in 15 out of 17 (88%) of Blackberry App Store categories.


This paper is currently in progress. More details can be found in our technical report.
Download: technical report.
Data: we provide figures from our analysis. If the paper is accepted we will add the data from the paper to support replication.