Mohamed Shawky Sabae

Computer Vision Engineer at Rembrand. Teaching Assistant in Computer Engineering at Cairo University. Working at the intersection of computer vision and computer graphics, with research interests in 3D computer vision, inverse rendering, differentiable rendering, and implicit neural representations.

Computer Vision Engineer

Rembrand

March 2023 - Present

Developing computer vision and generative AI solutions for virtual product placement. Working on inverse rendering methods for material and light decomposition from RGB images. Working on 3D reconstruction pipelines to recover camera parameters and scene geometry from videos.

Teaching Assistant

Faculty of Engineering, Cairo University

December 2022 - Present

Handling course recitations and labs in the Computer Engineering department. Courses: Computer Vision, Computer Graphics, Machine Learning, Neural Networks, and Cognitive Robotics.

Computer Vision Engineer

Anovate.ai

August 2021 - February 2023

Worked on deep learning applications for 3D computer vision, including 3D scene understanding and 3D object reconstruction. Developed a pipeline for 3D mesh processing and wireframe parsing. Worked on instance segmentation for satellite imagery. Deployed high-performance deep learning models in production using tools such as NVIDIA Triton Inference Server. Managed machine learning projects and communication with their clients.

Student Developer at RoboComp

Google Summer of Code 2020

May 2020 - August 2020

Project title: DNNs for precise manipulation of household objects. Implemented and optimized a segmentation-driven 6D pose estimation neural network architecture. Used the CoppeliaSim simulator to test pose estimation performance and augment training data. Integrated and tested pose estimation components with the new software architecture (based on Deep State Representation (DSR) and implemented using CRDT and RTPS) for precise manipulation of household objects.

Deep Learning Research Intern

Valeo

July 2019 - September 2019

Studied the effect of neural style transfer in modeling real sensor noise and experimented with offline style transfer methods, including single, multiple, and arbitrary style transfer. Applied generative adversarial networks (GANs), such as CycleGAN, and neural style transfer to domain translation of LiDAR data from simulated (CARLA) to real (KITTI). Improved the performance of YOLO object detection on LiDAR data by 6% by augmenting the training data with domain-translated data. Built an end-to-end architecture in which the YOLO object detector loss is combined with CycleGAN to improve training.

Master of Science - Computer Engineering

Faculty of Engineering, Cairo University

October 2022 - January 2025

GPA: 3.97. Thesis: Neural Implicit Camera and Geometry Representations for Multiview 3D Reconstruction Without Camera Parameters.

Bachelor of Science - Computer Engineering

Faculty of Engineering, Cairo University

September 2016 - May 2021

Grade: Distinction with Honors. Cumulative Percentage: 91%. GPA: 4.00. Rank: 4th (out of 71). Graduation Thesis: Face Generation from Text using StyleGAN2.

NoPose-NeuS: Jointly Optimizing Camera Poses with Neural Implicit Surfaces for Multi-view Reconstruction

NeurIPS 2023 UniReps Workshop · Dec 15, 2023

Abstract: Learning neural implicit surfaces from volume rendering has become popular for multi-view reconstruction. Neural surface reconstruction approaches can recover complex 3D geometry that is difficult for classical Multi-view Stereo (MVS) approaches, such as non-Lambertian surfaces and thin structures. However, one key assumption for these methods is knowing accurate camera parameters for the input multi-view images, which are not always available. In this paper, we present NoPose-NeuS, a neural implicit surface reconstruction method that extends NeuS to jointly optimize camera poses with the geometry and color networks. We encode the camera poses as a multi-layer perceptron (MLP) and introduce two additional losses, which are multi-view feature consistency and rendered depth losses, to constrain the learned geometry for better estimated camera poses and scene surfaces. Extensive experiments on the DTU dataset show that the proposed method can estimate relatively accurate camera poses, while maintaining a high surface reconstruction quality with 0.89 mean Chamfer distance.

StyleT2F: Generating Human Faces from Textual Description Using StyleGAN2

arXiv preprint · Apr 17, 2022

Abstract: AI-driven image generation has improved significantly in recent years. Generative adversarial networks (GANs), like StyleGAN, are able to generate high-quality realistic data and have artistic control over the output, as well. In this work, we present StyleT2F, a method of controlling the output of StyleGAN2 using text, in order to be able to generate a detailed human face from textual description. We utilize StyleGAN's latent space to manipulate different facial features and conditionally sample the required latent code, which embeds the facial features mentioned in the input text. Our method proves to capture the required features correctly and shows consistency between the input text and the output images. Moreover, our method guarantees disentanglement on manipulating a wide range of facial features that sufficiently describes a human face.

Unsupervised Neural Sensor Models for Synthetic LiDAR Data Augmentation

NeurIPS 2019 Machine Learning for Autonomous Driving Workshop · Dec 1, 2019

Abstract: Data scarcity is a bottleneck to machine learning-based perception modules, usually tackled by augmenting real data with synthetic data from simulators. Realistic models of the vehicle perception sensors are hard to formulate in closed form, and at the same time, they require the existence of paired data to be learned. In this work, we propose two unsupervised neural sensor models based on unpaired domain translations with CycleGANs and Neural Style Transfer techniques. We employ CARLA as the simulation environment to obtain simulated LiDAR point clouds, together with their annotations for data augmentation, and we use KITTI dataset as the real LiDAR dataset from which we learn the realistic sensor model mapping. Moreover, we provide a framework for data augmentation and evaluation of the developed sensor models, through extrinsic object detection task evaluation using YOLO network adapted to provide oriented bounding boxes for LiDAR Bird-eye-View projected point clouds. Evaluation is performed on unseen real LiDAR frames from KITTI dataset, with different amounts of simulated data augmentation using the two proposed approaches, showing improvement of 6% mAP for the object detection task, in favor of the augmenting LiDAR point clouds adapted with the proposed neural sensor models over the raw simulated LiDAR.

Generative Data Augmentation for Semantic Segmentation

A research project for enhancing semantic segmentation methods using synthetic data from generative image-to-image translation in an end-to-end approach. End-to-end training workflow of Pix2Pix image translation guided by UNet segmentation.

Neural Malware Signature Generation

A research project for generating malware signatures using end-to-end neural networks. Training autoencoders on image-encoded malware samples to generate latent representations used as signatures.

Retratista (Graduation Project)

A web application for human face generation and manipulation from speech and textual description using generative adversarial networks (GANs). Designed and implemented the generation of face embedding and morphing using StyleGAN2.

Artistic Style Transfer Using Texture Synthesis

A Python implementation of the "Style-Transfer via Texture-Synthesis" paper, which uses classical methods. Implemented the main stylization loop and learning algorithms.

In the Middle of Nowhere (OpenGL Game)

A simple 3D game built with OpenGL and C++ for the Computer Graphics course, in which I used only simple 3D models and wrote GLSL shaders for some effects.