Projects

Master Thesis: Conditional Normalising Flows for Interpretability

This work aims to close a gap in two existing interpretability methods for global feature importance, Relative Feature Importance (RFI) and Shapley Additive Global Importance (SAGE), by using a deep conditional density estimator and sampler: the Conditional Normalising Flow. With noise regularisation, these flows do not require a rigorous hyperparameter search and can be used in a vanilla setting. We carry out an extensive empirical evaluation on synthetic and real datasets with the help of a self-designed evaluation benchmark. Ground-truth feature importances are generally intractable, so we adopt the concepts of strong and weak feature relevance, which correspond one-to-one to the causal structure of the data-generating mechanism. By using continuous datasets with a known causal graph, we can reason about the validity of the estimated importances. Additionally, we provide a use case of RFI for detecting the influence of sensitive attributes, both when they are included in predictive modelling and when they are ignored completely. This thesis extends the existing Python library for Relative Feature Importance with a Conditional Normalising Flow and a Mixture Density Network, as well as with the synthetic benchmark.
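To make the role of the conditional sampler concrete, here is a minimal sketch of how a Relative Feature Importance estimate could be computed. It is not the thesis code or the library's API: the function names, the toy data and the fixed model are all assumptions for illustration, and a closed-form Gaussian conditional stands in for the Conditional Normalising Flow. The importance of a feature relative to a conditioning set is taken as the increase in loss when that feature is replaced by a draw from its conditional distribution.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of `model` on the given rows."""
    return sum((y - model(r)) ** 2 for r, y in zip(rows, targets)) / len(rows)


def relative_feature_importance(model, rows, targets, j, sampler, rng):
    """Loss increase when feature j is replaced by a draw from `sampler`.

    `sampler(row, rng)` is a hypothetical stand-in for a learned
    conditional sampler such as a conditional normalising flow.
    """
    perturbed = []
    for row in rows:
        new_row = list(row)
        new_row[j] = sampler(row, rng)
        perturbed.append(new_row)
    return mse(model, perturbed, targets) - mse(model, rows, targets)


# Toy data (assumption): x2 is a noisy copy of x1, and y depends on x1 only.
rng = random.Random(0)
NOISE_SD = 0.5
rows, targets = [], []
for _ in range(5000):
    x1 = rng.gauss(0, 1)
    x2 = x1 + rng.gauss(0, NOISE_SD)
    rows.append([x1, x2])
    targets.append(2 * x1 + rng.gauss(0, 0.1))


def model(row):
    """Fixed predictor, assumed to be already trained."""
    return 2 * row[0]


def sample_marginal(row, rng):
    """Empty conditioning set: draw x1 from its marginal N(0, 1)."""
    return rng.gauss(0, 1)


def sample_given_x2(row, rng):
    """Draw x1 from P(x1 | x2), known in closed form for this Gaussian toy."""
    var = NOISE_SD ** 2
    mean = row[1] / (1 + var)
    sd = (var / (1 + var)) ** 0.5
    return rng.gauss(mean, sd)


rfi_marginal = relative_feature_importance(model, rows, targets, 0, sample_marginal, rng)
rfi_given_x2 = relative_feature_importance(model, rows, targets, 0, sample_given_x2, rng)
print(rfi_marginal, rfi_given_x2)
```

In this toy setting the importance of x1 relative to x2 should come out smaller than its importance relative to the empty set, because x2 already carries much of the information in x1. On real data the conditional distribution is unknown, which is where a learned conditional density estimator comes in.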

Research Project: Certain Pseudo-Labeling for Semi-Supervised Image Classification

This project employs uncertainty estimation in pseudo-labelling for semi-supervised image classification methods (FixMatch). As vanilla FixMatch can suffer from overconfident pseudo-labels, we propose to employ active learning techniques to choose more confidently pseudo-labelled images from the unlabelled pool. We experiment with different uncertainty estimators (MC Dropout, and Spectral Normalisation with GMMs) and uncertainty strategies.
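As a rough illustration of uncertainty-aware pseudo-label selection, the sketch below filters unlabelled images using several stochastic forward passes, as MC Dropout would produce. It is a simplified assumption of how such a filter might look, not the project's implementation: the function names, the thresholds and the hard-coded probabilities are all hypothetical.

```python
import math


def predictive_entropy(mean_probs):
    """Entropy of the averaged class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in mean_probs if p > 0)


def select_pseudo_labels(mc_probs, conf_threshold=0.95, max_entropy=0.2):
    """Keep images whose averaged prediction is both confident and low-entropy.

    mc_probs[i][t] is the class-probability vector for image i in
    stochastic forward pass t (e.g. with dropout kept active).
    Returns a list of (image_index, pseudo_label) pairs.
    """
    selected = []
    for idx, passes in enumerate(mc_probs):
        n_classes = len(passes[0])
        mean = [sum(p[c] for p in passes) / len(passes) for c in range(n_classes)]
        label = max(range(n_classes), key=mean.__getitem__)
        if mean[label] >= conf_threshold and predictive_entropy(mean) <= max_entropy:
            selected.append((idx, label))
    return selected


# Hypothetical softmax outputs for two unlabelled images, two passes each.
mc_probs = [
    [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],    # passes agree
    [[0.99, 0.005, 0.005], [0.05, 0.90, 0.05]],  # passes disagree
]
selected = select_pseudo_labels(mc_probs)
print(selected)
```

The second image would pass a plain confidence threshold on its first forward pass alone, but the averaged prediction exposes the disagreement between passes, so it is left out of the pseudo-labelled set.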

Study Project: Patent Similarity Siemens

Filing patents is a messy process. Applicants need to ensure that their patent does not infringe on any of the tens of millions of existing patents. This project aims to automate the search for similar patents.
We measure, visualise and compare patent similarities using metadata such as IPC tags. We aim to explore multiple similarity measures (textual, IPC tags, x-citations, graphical, etc.), assess the correlations between them, and visualise our findings to draw conclusions.
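Two of the simplest similarity measures of this kind can be sketched as follows: Jaccard overlap on IPC tag sets and cosine similarity on bag-of-words counts. This is only an illustration under assumed inputs. The example patents, their tags and the whitespace tokenisation are made up, and the measures actually used in the project may differ.

```python
import math
from collections import Counter


def jaccard(tags_a, tags_b):
    """Share of IPC tags that two patents have in common."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine_text(text_a, text_b):
    """Cosine similarity between simple word-count vectors."""
    ca = Counter(text_a.lower().split())
    cb = Counter(text_b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up example records; tags and abstracts are illustrative values only.
patent_a = {"ipc": ["G06N 3/08", "G06F 16/33"],
            "abstract": "neural network for document retrieval"}
patent_b = {"ipc": ["G06F 16/33", "H04L 9/00"],
            "abstract": "encrypted document retrieval system"}

ipc_similarity = jaccard(patent_a["ipc"], patent_b["ipc"])
text_similarity = cosine_text(patent_a["abstract"], patent_b["abstract"])
print(ipc_similarity, text_similarity)
```

Computing both scores for many patent pairs gives paired values whose correlation can then be assessed and plotted.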

Study Project: Linear Regression on Homomorphic Encrypted Data

The aim of this project is to run a Machine Learning (ML) algorithm, namely Linear Regression (LR), on ciphertext and to evaluate its performance and usability. We use gradient descent to find the LR coefficient estimates, as this algorithm does not require division.
We used the Microsoft SEAL C++ library and its Python wrapper, PySEAL. SEAL uses the Fan-Vercauteren homomorphic encryption scheme, a somewhat homomorphic scheme.
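The reason gradient descent suits this setting can be seen in a plaintext sketch: every operation inside the training loop is an addition or a multiplication, which are the operations a homomorphic scheme supports on ciphertexts. The code below runs on ordinary floats and deliberately uses no SEAL or PySEAL calls. The function name, learning rate and data are assumptions for illustration.

```python
def fit_linear_regression(xs, ys, lr=0.1, steps=2000):
    """Fit y = w * x + b by gradient descent using only + and * in the loop."""
    n = len(xs)
    # The only division: a plaintext constant computed once, outside the
    # part of the computation that would operate on encrypted values.
    scale = lr * (1.0 / n)
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w, grad_b = 0.0, 0.0
        for x, y in zip(xs, ys):
            err = w * x + b - y   # multiply and add
            grad_w += err * x     # multiply and add
            grad_b += err         # add
        w -= scale * grad_w       # multiply by a plaintext constant, subtract
        b -= scale * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
print(w, b)
```

The closed-form least-squares solution, by contrast, needs a matrix inverse and therefore division. On real ciphertexts the number of iterations would also be limited by the multiplicative depth the encryption parameters allow, which this plaintext sketch does not model.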

Study Project: Multimodal Emotion Messaging Estimator

In this project, we introduce the concept of the Multimodal Emotion Messaging Estimator (MEME), a human computation and recommendation system. MEME has two main objectives:
1. To recommend emojis on instant messaging platforms that are relevant to the context of the ongoing conversation between users.
2. To implicitly create a coherent, massive multimodal (text, audio, images) dataset for multimodal sentiment analysis.