My current research lies in the areas of Data Science and Artificial Intelligence (Natural Language Processing). I try to answer these questions:
1) How can we deliver better personalization and recommendations in the era of Big Data? Examples include explainable, conversational, and privacy-preserving Recommender Systems.
2) How can we discover new knowledge from complex, large-scale, noisy, diverse, and dynamic data? Examples include social data (e.g., social media posts on Twitter), pervasive data (e.g., exercise data from smartphones), text data, image data, environmental data, and climate data.
3) What is intelligence, and how can findings and theories of biological learning be modelled, tested, and applied to machines?
Applications from students and collaborations with other disciplines and with industry are very welcome! My email address is firstname.lastname@newcastle.ac.uk
- Grants:
- Co-I of HEIF of Lincoln University, £3,000. “GaitLLM: An AI-Powered App for Integrated Running Gait Analysis and Clinical Reporting”, Industry Partner: Great Northern Physiotherapy, 01/26-06/26
- Co-I of NHRP small grants, £5000 each: a) “Foundational Prototype for a Conversational AI to Assist Collateral History Gathering in ADHD/ASD Pathways.” b) “Enhancing Human-AI Interaction for Pre-Clinical Dementia Assessment: Software Optimisation of the LUMEN Tool”
- Co-I of ENT UK Foundation Research Grant 2025, £1,200. “Patient-focused implementation of multi-modality AI models for thyroid nodule diagnosis, communication and surveillance”
- £217,166 (PI) Innovate UK funding for a 2-year KTP project titled “To accelerate product innovation and increase business productivity by automating the construction of an enterprise-level knowledge management and discovery platform via the use of novel Generative AI, Large Language Models, and human-AI collaboration techniques”. Industry partner: P&G. 2025-2027
- £30,912 (PI) funding from P&G for a 3-month project on “Probabilistic Modelling and Large Language Modelling”. 4/2025-8/2025. Industry partner: P&G
- £10K (Co-I) award for a joint project on a Large Language Model-based chatbot for dementia patient data collection, from the NIHR Newcastle BRC’s Dementia, Mental Health and Neurodegeneration Theme Accelerator Project Call
- EPSRC CASE studentship (PI) 2024. The industry collaborator is Sage.
- 05/2024-08/2024 (PI) P&G and Newcastle University co-funded project: Feasibility Study of constructing a Knowledge and data management platform based on Large Language Models and domain knowledge graphs. Project in total: £67,507, Co-I: Stephen McGough.
- 08/2024-08/2026 (Co-I) Innovate UK KTP project: Transforming Ryder Architecture: empowering business productivity and services with a novel Large Language Model and Artificial Intelligence System, project in total £189,212. PI: Prof. Mohamad Kassem, Engineering School.
- 05/2023-09/2023 (PI) Innovate UK Professional & Financial Service project entitled “An AI-powered tool for enhancing consumer trust in financial promotions and marketing” (PI: L. Liang, Co-I: A Karim Aldohni). Industry partner: start-up company Deriskly
- Collaborative Innovation Fund of Royal Berkshire NHS Foundation Trust and University of Reading project entitled: Improving preoperative diagnosis of thyroid nodules by developing ultrasound artificial intelligence (AI) decision support system. Collaborators: Royal Berkshire Hospital Trust Foundation, NHS, 1/1/2021-30/12/2022 (PI)
- EIT Food/Horizon2020 project entitled: Developing a Digital Toolkit to Enhance the Communication of Scientific Health Claims (Co-I, Project in total €495,204 for phase 1, €708,622 for phase 2, €466,000 for phase 3), collaborators: Department of English Language and Applied Linguistics, Department of Design, the School of Agriculture, Policy and Development of University of Reading, Technical University of Munich (TUM), British Nutrition Foundation, start-up company Food Maestro, food company Maspex, Institute of Animal Reproduction and Food Research of the Polish Academy of Sciences. 1/1/2019-31/12/2021
- Vacancies/Scholarships
- I have multiple PhD studentships available in generative models (e.g., Large Language Models), NLP, recommender systems, and knowledge graphs. Please contact me first (myfirstname.lastname@newcastle.ac.uk) before submitting the formal application form.
1) University-level competitive studentships. 2) CSC PhD studentships for Chinese students. Deadline: end of January 2026. Full scholarship.
- Available PhD Student Projects
User Profiling for Personalisation. This project aims to develop scalable and effective explicit and implicit user profiling approaches to discover new knowledge about users’ individual interests, preferences, emotional states, and information needs. It will use advanced natural language understanding, Large Language Models, reinforcement learning, and deep learning techniques to construct user profiles from both explicit and implicit multi-modal user behaviour data (text, images, audio, temporal data, graphs). The proposed user profiling techniques will be applied to recommender systems to make personalised recommendations.
Hashing Techniques for High-Dimensional Data. Hashing is a key technique for analysing big data and has been widely used for dimensionality reduction and data size reduction. This project aims to develop novel hashing algorithms to sample, compress, and index big data, such as social and climate data, to support effective and efficient information retrieval and recommendation. It will also explore machine learning-based hashing techniques. The project will contribute new solutions for making better use of big data and processing it more efficiently.
Other directions, such as Responsible Recommender Systems (e.g., trustworthy, sustainable), Conversational Recommender Systems, LLM-based Recommender Systems, and Scalable Recommender Systems, are also available. Note: good programming skills and theoretical modelling skills are required for all projects.
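To illustrate how hashing can reduce dimensionality, the sketch below uses feature hashing (the "hashing trick"). It is a standard, well-known technique and is not one of the novel algorithms this project would develop. An arbitrarily large vocabulary of tokens is mapped into a fixed-length vector. The function names, the use of BLAKE2b as the hash, and the signed-bucket variant are illustrative assumptions.

```python
import hashlib
from collections import Counter


def stable_hash(token: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def hash_features(tokens, dim=16):
    """Feature hashing: project a bag of tokens into a dim-length vector.

    h % dim chooses the bucket and the top bit of h chooses a +1/-1 sign,
    so that hash collisions do not bias inner products in expectation.
    """
    vec = [0.0] * dim
    for token, count in Counter(tokens).items():
        h = stable_hash(token)
        index = h % dim
        sign = -1.0 if (h >> 63) & 1 else 1.0
        vec[index] += sign * count
    return vec


if __name__ == "__main__":
    # Hypothetical example: a short social media post compressed to 8 dimensions.
    tokens = "flood warning issued flood risk high".split()
    print(hash_features(tokens, dim=8))
```

The output length is fixed by `dim` whatever the vocabulary size. The trade-off is occasional collisions between unrelated tokens.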
- Selected Recent Research Work
1. User Profiling and Recommender Systems.
Personalisation aims to help users cope with information overload. It is the ability to provide content and services tailored to individuals, based on knowledge about their preferences and behaviours. User profiling is the foundation of personalisation. I proposed novel user profiling approaches to discover knowledge about users, such as their interests, preferences, and information needs, from massive social data containing user-generated content and behaviour information.
Social tags. I investigated the distinctive features and multiple relationships of social tags, and explored novel approaches to solve the tag quality problem and profile users accurately.
Social Tags & Item Taxonomy. I proposed an approach to integrate social tags from community users and the standard taxonomy information provided by experts to profile users and make recommendations.
Social Media. I modelled the recency phenomenon and the implicit information network among users, topics, and micro-blogs. I proposed using the temporal factor and the implicit information network in social media to profile users and recommend topics to them.
Ratings. Inspired by neural language models, I proposed a probabilistic rating auto-encoder to perform unsupervised feature learning and generate latent user feature profiles from large-scale user rating data.
Images. The traditional implicit rating information network is augmented with visual factors based on item images to make recommendations.
Heterogeneous Information Network. I proposed a deep reinforcement learning-based approach to profile users in heterogeneous information networks for recommender systems.
Some example figures are shown below. (For the related publications, please see G1 on the Publications page.)
2. Big Data Processing Techniques.
To address the challenges of big and high-dimensional data, I proposed parallel user profiling approaches and hashing-based indexing/blocking techniques.
Parallel user profiling. I proposed a parallel user profiling implementation based on advanced cloud computing techniques such as Hadoop, MapReduce and Cascading. The experiments were conducted on a 7GB delicious.com dataset with 420 million tag assignments.
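The MapReduce pattern behind a parallel tag-based profiler can be sketched on a single machine as follows. The sketch assumes that each tag assignment is a `(user, item, tag)` triple and that a profile is simply a user's normalised tag frequencies. That is a hypothetical simplification for illustration, not the published approach. On Hadoop or Cascading, the map calls would run on separate workers and the shuffle would be performed by the framework.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: one (user, item, tag) assignment -> [((user, tag), 1)]."""
    user, _item, tag = record
    return [((user, tag.lower()), 1)]


def shuffle(pairs):
    """Group mapper output by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: total number of times a user applied a tag."""
    return key, sum(values)


def build_profiles(assignments):
    """Run map -> shuffle -> reduce, then normalise counts into per-user profiles."""
    mapped = chain.from_iterable(map(map_phase, assignments))
    reduced = (reduce_phase(k, v) for k, v in shuffle(mapped).items())
    counts = defaultdict(dict)
    for (user, tag), n in reduced:
        counts[user][tag] = n
    return {
        user: {tag: n / sum(tags.values()) for tag, n in tags.items()}
        for user, tags in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical tag assignments in the (user, item, tag) format assumed above.
    data = [("u1", "i1", "Python"), ("u1", "i2", "python"), ("u1", "i3", "ml"), ("u2", "i1", "ml")]
    print(build_profiles(data))
```

The mappers share no state, so the map step scales out across machines. Only the grouped counts for a given key need to reach the same reducer.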
Hashing-based indexing/blocking. I proposed noise-tolerant hashing-based indexing and two-stage similarity-aware indexing techniques for noisy large-scale datasets, with applications in real-time entity resolution and real-time social recommender systems. In joint work with students and colleagues, we proposed a semantic-aware blocking approach that efficiently unifies textual and semantic features to map large noisy data into small data blocks.
Some example figures are shown below. (For the related publications, please see G2 on the Publications page.)
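As a rough illustration of hashing-based blocking, the sketch below uses standard MinHash locality-sensitive hashing (LSH) over character bigrams. It is a generic textbook technique and not the specific noise-tolerant or two-stage indexing methods described above. The choices of 32 hash functions split into 16 bands of 2 rows, and all function names, are illustrative assumptions. Because character q-grams are used, a typo changes only a few grams. As a result, records such as "John Smith" and "Jon Smith" keep a high Jaccard similarity s, and they share at least one block with probability 1 - (1 - s^r)^b for r rows and b bands.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def qgrams(text, q=2):
    """Padded character q-grams; a small typo changes only a few of them."""
    padded = f"#{text.lower().strip()}#"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def minhash_signature(grams, num_perm=32):
    """MinHash: for each seeded hash function keep the minimum value over the set."""
    return [
        min(int.from_bytes(hashlib.blake2b(f"{seed}:{g}".encode("utf-8"), digest_size=8).digest(), "big")
            for g in grams)
        for seed in range(num_perm)
    ]


def lsh_blocks(records, num_perm=32, bands=16):
    """Banded LSH: records whose signatures agree on all rows of a band share a block."""
    if num_perm % bands:
        raise ValueError("num_perm must be divisible by bands")
    rows = num_perm // bands
    blocks = defaultdict(set)
    for record_id, text in records.items():
        sig = minhash_signature(qgrams(text), num_perm)
        for b in range(bands):
            blocks[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(record_id)
    return blocks


def candidate_pairs(blocks):
    """Only pairs that share at least one block go on to detailed comparison."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    # Hypothetical records with a spelling variation between r1 and r2.
    records = {"r1": "John Smith", "r2": "Jon Smith", "r3": "Mary Jones"}
    print(candidate_pairs(lsh_blocks(records)))
```

Blocking replaces all-pairs comparison with comparisons inside small blocks. More rows per band make blocks stricter, and more bands make them more forgiving.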
3. Natural Language Processing: Sentiment Analysis and Question Answering Systems.
We applied deep learning techniques to sentiment analysis and question answering systems. The sentiment analysis model was trained on 1 million tweets randomly selected from a 5.3TB Twitter dataset. Some example figures are shown below. (For the related publications, please see G3 on the Publications page.)
- Postdocs and Research Assistants
- Emmanuel (5/2024-8/2024)
- Manish (5/2024-8/2024)
- Nicolay Rusnachenko (12/2022-12/2023)
- Zehao Liu (4/2020-12/2021)
- Xiao Li (7/2020-12/2021)
- Current PhD Student Projects
- Jianfei Xu (2025 – ) Multi-modal Generative AI models for Personalized Healthcare
- Tian Li (2025 – ) Dialogue Agents with Human Level Attributes
- Liting Huang (2025 – ) Securing Large Language Models against Knowledge Attacks
- Shu Li (2024 – ) Multi-modal Semantic Enrichment of Time Series for Explainable Forecasting
- Ting Zhu (2024 – ) Emotion-Aware Personalized Multi-modal Conversational Dialogue Systems
- Alex Robertson (2024 – ) Mitigating Hallucinations of Large Language Models using Knowledge Graph-based Recommendation Techniques. EPSRC CASE studentship, co-funded by Sage
- Zehao Liu (2020 – ) Hashing Techniques for Recommender Systems
- Selected Past Student Projects
- Thanet Markchom (2019 – 2023), PhD student Project. Explainable Visually-Aware Recommender Systems Based on Heterogeneous Information Networks (Thesis)
- Banda Ramadan, Co-supervisor of PhD Student Project, Indexing Techniques for Real-time Entity Resolution, 2012-2015, ANU (Thesis)
- Shanthini Ramu (2021). Master student project. Multimodal Sentiment Analysis based on Video, Audio, and Text
- Yu Zhou (2021). Master student project. Automatic Question Answer Generation Bots (github)
- Umarani Ganeshbabu (2019). A Dynamic Bayesian Network Approach for Analysing Topic-sentiment Evolution
- Bhuvana Madhusudana (2019). Deep Learning for Bot Detection
- Selected final-year undergraduate student projects: Conversational Chatbot in 3D Models (2020, demo), Twitter (2021, github)
- XingYi Xu, Co-supervisor of Master Student Project, Deep Learning for Sentiment Analysis, 2015-2016, The University of Melbourne (Paper)
- Haifan (Tony) Wu, Co-supervisor of Master Student Project, Recommender Systems based on Social Media in Health, 2015-2016, The University of Melbourne (Report)
- Mingyuan Cui, Co-supervisor of Master Student Project, Towards a Scalable and Robust Entity Resolution Framework: Blocking under Relational Constraints, 2014, ANU (Paper | Report)
- Haoran Du, Co-supervisor of Master Student Project, Big Data Analysis: A Case Study for Recommender Systems, 2014, ANU (PDF | Report | Slides)
- Honours Student Project, Noise-Tolerant Approximate Blocking for Dynamic Real-time Entity Resolution, 2013, ANU (Paper | Thesis)
- Shouheng Li, Primary supervisor of Master Student Project, Two-stage Similarity-aware Indexing for Real-time Entity Resolution, 2013, ANU (Paper | Report)
- Chrislyn Braganza, Primary supervisor of Summer Research Scholar project, Real-time Social Recommender System, 2012, ANU
- Primary supervisor of Undergraduate IT Capstone Project, A Personalized Recommender System based on Microblogs, 2011, QUT