Mohamed Abbas

AI Researcher & Software Engineer

Computer Science student at AUC (GPA: 3.93) specializing in Natural Language Processing and Machine Translation. Applied Scientist Intern at Microsoft Egypt, focusing on identity resolution and large-scale data mining. Passionate about Arabic NLP and building scalable AI systems.

Education

American University in Cairo (AUC)

B.Sc. in Computer Science, Minor in Mathematics

Sep 2021 – Jan 2026 | Cairo, Egypt

Major GPA: 3.96 | Overall GPA: 3.93

  • USAID Scholarship Recipient
  • Top-5 ranking in the Egyptian Collegiate Programming Contest

University of Nebraska

Exchange Semester

Sep 2023 – Jan 2024 | Nebraska, US

  • Computer Vision, Artificial Intelligence
  • Databases, Programming Concepts

Work Experience

Microsoft Egypt

Applied Scientist Intern

Jul 2025 – Present | Clarity Team, Identity Scope

  • Expanded identity-resolution coverage by mining UET Edge datasets, yielding 4× more MUID-MUID pairs
  • Enriched signal features by parsing 34 new URL parameters, discovering 18M additional cross-device pairs in 10 days
  • Authored Azure Data Factory pipeline with daily monitoring scripts for same-day detection of failing extractors

Money Fellows

Software Engineer Intern

Jul 2024 – Sep 2025 | Cairo, Egypt

  • Integrated into team as Junior Software Engineer, participating in Agile ceremonies
  • Refactored legacy codebases to .NET 8, applying clean architecture and microservices patterns
  • Leveraged Entity Framework Core, LINQ, MediatR, CQRS, Redis caching, and FluentValidation

MawGood

Founder and CEO

Aug 2023 – Present | Cairo, Egypt

  • Founded platform supporting employment of individuals with disabilities
  • Collaborated with AUC Venture Lab

Research Experience

MBZUAI Research Internship

MBZUAI - UGRIP Research Intern

Tokenization-free Machine Translation for Dialectal Arabic

Jun 2025 | Abu Dhabi, UAE

  • Co-developed tokenization-free Arabic→English MT system: Pixel-M4 encoder + GPT-2 decoder
  • Pre-trained on UNPC ar-en (73M sentence pairs) using 2 × NVIDIA A100 80GB GPUs
  • Benchmarked AraBERT, ARBERT, and CAMeLBERT encoders
  • Pixel-M4 doubled average BLEU on 24/25 MADAR dialects vs fine-tuned AraBERT baseline
  • Supervised by Dr. Bashar Alhafni & Dr. Yova Kementchedjhieva

American University in Cairo

Natural Language to SQL Query

Aug 2024 – Jan 2025

  • Developed model to transform natural language queries into SQL using NSText2SQL dataset
  • Raised model accuracy to 22% through 24 days of hyperparameter optimization and RAG integration
  • Applied QLoRA and 4-bit quantization for hardware-constrained fine-tuning

Projects

🤖 AI & Machine Learning

ArabianGPT2-124M-FromScratch

Engineered and trained a GPT-2 model (124M parameters) from scratch in PyTorch for Arabic text generation

Arabic Meter AI with BERT

BERT-based model for classification of Arabic poetic meters, trained on 1.9M+ lines of poetry using Hugging Face Transformers

Transformer-EN2AR-From-Scratch

Transformer machine translation model built in PyTorch with self-attention for real-time English-to-Arabic translation

Brain Tumor Detection (BCM-CNN)

Deep learning model for MRI scan analysis achieving 98% accuracy with VGG19 architecture using TensorFlow

Cell Detection Computer Vision

Integrated MATLAB with YOLOv5 for image analysis and object detection with custom scripts for circle detection and distance calculation

Text Generation with RNNs

Explored LSTM networks for dynamic text generation with bidirectional architectures, optimizing sequence modeling

Twitter Sentiment Analysis

Analyzed 1.6 million tweets using Multinomial Naïve Bayes classifier achieving 96% accuracy in sentiment classification

Poem Generation with Bigram Language Model

Implemented Bigram Language Model using PyTorch trained on Arabic text dataset for Arabic poetry generation

AutoPricePro

Car price prediction using RandomForestRegressor achieving R-squared score of ~0.935 with feature engineering and data preprocessing

💻 Software Engineering

School Management System (ElSaher In Math)

Full-stack system with website, dashboard, and mobile app using ASP.NET Core 7, Blazor WebAssembly, and Blazor-MAUI. Deployed for 6 months with teacher collaboration

Rate-AUC-Professors

Student-driven rating platform for AUC professors with anonymous ratings and material sharing. Built with ASP.NET Core 7 backend and React frontend

Sabeel's Organization Website

Website for Sabeel's organization using ASP.NET Core 7 backend and Angular frontend with Entity Framework and Microsoft SQL Server

Examify

End-to-end exam creation and management solution with Angular frontend and ASP.NET Core 7 backend, featuring RESTful API integration

MawGood Platform

Platform enhancing employment opportunities for individuals with disabilities. Built with Angular and ASP.NET Core with JWT authentication

⚙️ Systems & Computer Architecture

RISC-V Pipelined Processor (Verilog)

Designed pipelined RISC-V processor in Verilog supporting 40 RV32I instructions, deployed on Nexys A7-100T FPGA using Vivado

RISC-V Simulator

Simulator supporting all 40 instructions of the RV32I base integer instruction set, following the RISC-V ISA specification. Implemented in C++

Signed Sequential 8-bit Multiplier (Verilog)

8-bit signed multiplier designed in Logisim-Evolution and implemented in Verilog on a Basys 3 board (Xilinx Artix-7 FPGA)

Tomasulo Algorithm Simulation

C++ simulation of Tomasulo algorithm demonstrating dynamic scheduling and out-of-order execution in modern processors

Plagiarism Detection Algorithms

Suite of string matching algorithms (Rabin-Karp, KMP, Boyer-Moore, Hamming Distance) in C++ for plagiarism detection

Quine-McCluskey Logic Minimization

Boolean minimizer algorithm in C++ with truth table generation, expression conversion, and canonical SOP/POS computation

Technical Skills

Languages

Python, C/C++, C#, JavaScript, SQL

ML/AI Frameworks

PyTorch, TensorFlow, Hugging Face, OpenCV, Scikit-learn

Web Frameworks

ASP.NET Core, Entity Framework, Angular, React, Blazor

Tools & Technologies

Git, Docker, Linux, Azure, CI/CD