Hi, I'm Md Kamrul Islam 👋

I specialize in Big Data Management, Deep Learning, and Foundation Models, with hands-on experience building scalable pipelines and intelligent systems in domains such as healthcare and cybersecurity.

I'm currently pursuing the Erasmus Mundus Joint Master's in Big Data Management & Analytics (BDMA) at Université Paris-Saclay, CentraleSupélec. Before this, I earned my Bachelor of Engineering in Software Engineering from Sichuan University, China. My academic journey has taken me across China, Belgium, Spain, and now France. I speak English and Chinese fluently, and a bit of French.

Research Interests

Foundation Models, Deep Learning, Medical Imaging, Big Data Analytics

Hobbies

Tennis, Cycling, Traveling, Hiking

Kamrul Islam - AI Researcher & Data Scientist

Education

Erasmus Mundus Masters in Big Data Management and Analytics (BDMA)

2023 – Present
  • Semester 1: Université libre de Bruxelles (ULB)
    Sept 2023 – Jan 2024 | Brussels, Belgium
    Master of Science in Computer Science and Engineering
  • Semester 2: Universitat Politècnica de Catalunya (UPC)
    Feb 2024 – Jun 2024 | Barcelona, Spain
    Master Erasmus Mundus in Big Data Management and Analytics
  • Semester 3 & 4: CentraleSupélec (CS), Université Paris-Saclay
    Sept 2024 – Present | Paris, France
    Master of Science in Engineering
Sichuan University

2018 – 2022 | Chengdu, China
Bachelor of Engineering in Software Engineering
Average Grade: 87/100
Bachelor Thesis: Brain Tumor Detection and Classification using CNN
Best Bachelor Thesis Award

Research & Professional Experience

My research focuses on AI-driven solutions for real-world challenges, from medical image analysis to security automation.

Laboratoire Images, Signaux et Systèmes Intelligents (LISSI)

Research Engineer Intern

Research
May 2024 – Present | Université Paris-Est Créteil | Paris, France

Research Focus: Cybersecurity, Business Process Modeling, and LLMs

Problem: Manual security requirement extraction from multimodal documents is time-consuming and error-prone, leading to incomplete security specifications in business process workflows.
Solution: Developing an LLM-assisted solution that automatically extracts security and data-sharing requirements from multimodal inputs to generate BPMN workflows with semantically valid SecBPMN annotations.
  • Research Focus: Automated security requirement extraction using multimodal AI for business process modeling
  • Novel Approach: Integration of LLM-based text analysis for security-annotated BPMN workflow generation
  • Technical Innovation: Development of SecBPMN-compliant workflow generation with context-aware data protection mechanisms
  • Technologies: Python, React-JS, LangChain, Retrieval-Augmented Generation, SecBPMN, Knowledge Graphs

Laboratoire Interdisciplinaire des Sciences du Numérique (LISN)

Graduate Research Assistant

Research
Oct 2024 – Present | CentraleSupélec | Gif-sur-Yvette, France

Research Focus: Deep Learning, Medical Image Analysis, and Self-Supervised Learning

Problem: Traditional deep clustering methods lack geometric invariance, leading to poor performance on medical images with varying orientations and requiring extensive data augmentation.
Solution: Developed a novel deep clustering architecture that integrates Group Equivariant CNNs to encode geometric symmetries directly in the network architecture, eliminating the need for explicit data augmentation.
  • Research Project: 'Enhancing Self-Supervised Learning for Image Clustering Using Geometric Deep Learning'
  • Novel Architecture: Developed a Group Equivariant CNN-based clustering model that preserves geometric invariances in medical image analysis
  • Performance Results: 15% improvement in clustering accuracy on NIH chest X-ray datasets compared to baseline methods
  • Technical Innovation: Eliminated need for explicit data augmentation while improving clustering performance and generalization
  • Scalability: Built optimized training pipeline using PyTorch DistributedDataParallel and automatic mixed precision for multi-GPU systems
  • Technologies: PyTorch, OpenCV, LaTeX, HPC, NIH Chest X-ray Dataset

Chengdu Suncape Data Co., Ltd

Software Engineer Intern

Industry
December 2020 – May 2021 | Chengdu, China
  • Data Engineering: Developed and optimized Apache Spark pipelines for large-scale data processing and analysis
  • Performance Improvement: Enhanced data preprocessing workflows, achieving a 20% increase in predictive model accuracy
  • Collaboration: Worked with cross-functional teams using Agile Scrum methodology for code quality and version control
  • Technologies: Python, PySpark, scikit-learn, Agile Scrum, Jira, Git

Publications

Navigating the AI Frontier: A Critical Literature Review on Integrating Artificial Intelligence into Software Engineering Education

C. K. Sah, L. Xiaoli, M. M. Islam and M. K. Islam
2024 36th International Conference on Software Engineering Education and Training (CSEE&T)
2024 | Würzburg, Germany | pp. 1–5

A comprehensive literature review examining the integration of artificial intelligence into software engineering education, analyzing current trends, challenges, and future directions in this emerging field.

Featured Projects

Academic Project
Completed

DigiScan360

Big Data Analytics, ML, LLMs, Knowledge Graphs

February 2024 – June 2024

DigiScan360 is a visual tool for competitive intelligence based on various data sources such as e-commerce, expert reviews and social media. It enables comprehensive analysis and insights through a combination of data collection, processing, and visualization.

Technologies

PySpark, LLMs, SQL Server, Microsoft Fabric, Azure Data Factory, Power BI, GraphDB, SPARQL

Key Achievements

  • Developed a competitive intelligence platform and pitched it as a startup prototype at UPC's entrepreneurship initiative
  • Built an end-to-end data pipeline processing large-scale datasets using PySpark and Azure Data Factory, with real-time Power BI dashboards
  • Implemented LLaMA-3 for sentiment analysis and trend extraction, uncovering actionable business insights from multiple data sources
  • Constructed a knowledge graph in GraphDB with SPARQL queries for advanced graph-based analytics

Get in touch

Have a project in mind? Contact me here.

Find Me

Connect with me

Available for opportunities

Open to PhD opportunities and applied research roles in industry

Send me a message