
AI & Data Architect

16+ years building scalable data/AI ecosystems — from real-time streaming pipelines and data lakehouses to LLM-powered agentic systems.


About

I'm Ahmed Sayed, an AI & Data Architect based in Cairo, Egypt. I design and build production-grade agentic AI systems, scalable data platforms, and real-time streaming architectures. I'm currently at Gymshark, deploying LLM-powered agents that automate business-critical workflows across marketing, DevOps, HR, and engineering.

  • 16+ years in Data & AI
  • 6+ AI agents in production
  • 5 countries worked across
  • 20+ large-scale projects

Experience

AI and Data Architect · Gymshark
Remote, United Kingdom · 05/2025 → present
Designing and deploying agentic AI systems in production using LLMs, RAG pipelines, and orchestration frameworks. Building end-to-end AI solutions on GCP with LangChain, Gemini, and Google ADK.

Lead Data Engineer & Architect · Gymshark
Remote, United Kingdom · 10/2023 → 05/2025
Led the Snowflake → BigQuery migration. Architected an Apache Iceberg data lakehouse on GCS. Built real-time streaming pipelines with Dataflow and Apache Beam. Established gold-standard CI/CD pipelines.

Principal Data Engineer · Curenta
Remote, Los Angeles, United States · 03/2023 → 09/2023
Built a modern data lake on Azure, AI scraping bots for healthcare automation, and a ChatGPT-powered chatbot for medical order processing.

Data Engineering Manager · Megamind IT Solutions
Riyadh, Saudi Arabia · 02/2022 → 03/2023
Built and led the data engineering team. Delivered a centralized SQL Server data warehouse integrating 40+ hospitals. Developed ML models for clinic utilization.

Data Engineer Supervisor · Telecom Egypt
Cairo, Egypt · 10/2019 → 03/2022
Built a streaming platform handling 10M+ events/sec with Kafka and Spark, and a multi-tenant Hadoop analytics platform serving 10,000+ users.

Senior Data Engineer / Team Lead · Telecom Egypt
Cairo, Egypt · 06/2017 → 10/2019
Built a CVM (Customer Value Management) web app for marketing segmentation. Migrated data from 20+ legacy systems to Teradata, covering 60M+ customers with zero data loss.

Data Engineer · Telecom Egypt
Cairo, Egypt · 06/2016 → 06/2017
Built a Customer 360 app with Python Django, a Hive data warehouse, and HBase. Delivered SQL Server data warehousing with SSIS and SSAS.

Key Projects

AI & Agents

🤖 Customer Care Agent

Production agentic system that automates returns handling and customer queries. Built with Google ADK and integrated with BigQuery and Intercom for autonomous decision-making.

Python · Google ADK · BigQuery · Intercom

✍️ Copymate — Marketing AI

Content generation platform with a RAG pipeline that ingests marketing examples from Notion into an AlloyDB vector store, powering autonomous content creation.

LangChain · Gemini · RAG · AlloyDB

⚙️ Data Engineer Mate

Autonomous coding agent trained on 100+ curated tasks to generate production-quality data pipelines with zero human intervention.

LangChain · Python · CI/CD · GCP

🛡️ Gatekeeper DevOps Agent

Autonomous DevOps agent handling CI/CD orchestration, unit testing, error detection, and self-healing remediation on GCP infrastructure.

GCP · Terraform · CI/CD · Python

👥 People Ops Bot

Internal AI assistant that answers HR and policy questions through a RAG pipeline over embedded company PDFs, with Gemini handling natural language understanding.

Gemini · RAG · Vector DB · Python

🏥 AI Scraping Bot — Healthcare

Intelligent bot that extracts patient and medication data from HospiceMD and JNCloud and synchronizes orders with Curenta APIs in real time, with zero interference.

Python · AI/ML · REST APIs · OCR

💊 Medical Order Chatbot

ChatGPT-powered chatbot for medication orders via text and voice. Validates inputs, requests missing details, and auto-generates orders in real time.

ChatGPT · NLP · Voice · Python

📠 Fax Automation (OCR)

OCR-based pipeline converting faxed prescriptions into structured digital orders in real time. Eliminated manual processing with error-free automation.

OCR · Python · Automation
Data Engineering

🏔️ Iceberg Data Lakehouse

Architected a modern lakehouse on GCS with Apache Iceberg and PySpark on Dataproc, providing unified analytics, ACID transactions, and time-travel queries at scale.

Iceberg · PySpark · Dataproc · GCS

🔄 Snowflake → BigQuery Migration

Led large-scale cloud migration with zero data loss. Optimized cost and query performance while maintaining full data lineage and governance.

Snowflake · BigQuery · GCP · Migration

⚡ Real-time Streaming Platform

High-performance event streaming platform handling 10M+ events/sec. Powers real-time analytics, alerting, and downstream ML model serving.

Kafka · Spark Streaming · 10M+ events/sec

🌊 Azure Delta Lake Platform

Enterprise-grade data lake on Azure Blob Storage built with Delta Lake and Databricks, unifying ingestion from microservices and SaaS platforms (Xero, HubSpot, Salesforce).

Delta Lake · Databricks · Azure · Spark

📡 EventBridge → GCP Streaming

Low-latency, real-time pipeline that ingests AWS EventBridge events into GCP using Dataflow and the Apache Beam Java SDK, enabling cross-cloud analytics.

Dataflow · Beam · EventBridge · Java

🏗️ Terraform Infrastructure Platform

Infrastructure as Code across GCP, providing standardized provisioning, consistent environments, and automated deployments with quality gates and monitoring.

Terraform · GCP · IaC · CI/CD · DevOps

🏥 Centralized Hospital DW

Unified data warehouse integrating 40+ hospitals across the Middle East. Harmonized clinical, financial, and operational data for cross-facility analytics.

SQL Server · SSIS · NiFi · Airflow

🔀 Legacy → Teradata Migration

Migrated 60M+ customer records from 20+ legacy systems to Teradata Vantage Cloud. Zero data loss, full integrity validation, seamless cutover.

Teradata · ETL · Migration · 60M+ records

🗄️ Multi-tenant Hadoop Analytics

Enterprise analytics platform serving 10,000+ users with tenant isolation and secure, segregated data access. Reduced report generation time by 75%.

Hadoop · Hive · Multi-tenant · 10K+ users

🔌 Kafka Event Hub Streaming

Real-time data synchronization with Kafka, Azure Event Hubs, and Azure Stream Analytics. Low-latency event processing powers operational dashboards and alerts.

Kafka · Event Hubs · Stream Analytics · Azure
ML & Analytics

🔗 Product Recommendation Engine

Graph-based personalization using Neo4j. Leverages CRM data and behavioral interactions to deliver context-aware product suggestions at scale.

Neo4j · Graph AI · CRM · Python

🏥 Clinic Utilization Predictor

ML model forecasting clinic utilization to optimize doctor scheduling and resource allocation. Improved operational efficiency through data-driven workforce distribution.

TensorFlow · Scikit-learn · Pandas · SQL Server

👤 Customer 360 Platform

Unified customer view enabling marketing, product, and support teams to deliver personalized, data-driven engagement across all touchpoints.

BigQuery · Python · Analytics · Segmentation

🎯 CVM — Customer Value Management

No-code, drag-and-drop web app that gives marketing teams customer segmentation, campaign creation, and deployment in one tool, bridging tech and business.

Django · Segmentation · No-Code · Marketing

🩺 Talina.ai — AI Candidate Screening

AI-powered HR platform automating resume reviews, interview analysis, and psychometric testing. Streamlined recruiter decision-making at scale.

AI/ML · NLP · Python · HR Tech

🩻 Doctor 360 Portal

Self-service web portal where doctors can view their financial statements, shareholding, and historical data through a no-code interface built with Python Django.

Django · Analytics · Healthcare · Self-service

Tech Stack

Languages

  • Python
  • SQL
  • Java

Cloud

  • GCP / BigQuery
  • Azure / Databricks
  • Dataflow / Dataproc
  • AlloyDB / AKS

AI / ML

  • LangChain
  • Gemini / LLMs
  • RAG Pipelines
  • Google ADK
  • TensorFlow

Data

  • Kafka / Spark
  • Snowflake / Teradata
  • Neo4j / Hive / HBase
  • Delta Lake / Iceberg

DevOps

  • Terraform
  • Docker
  • CI/CD Pipelines
  • Airflow

Web

  • Python Django
  • .NET Core

Let's Connect ✦

Open to collaborations, consulting, and interesting conversations about AI, data, and building things that matter.