
Welcome,

I'm Tathya, a robotics & perception engineer bridging AI & vision for autonomous systems.

02

Work Experience


AI Research Engineer

@Collective Dynamics & Controls Lab x DARPA

| Sept 2024 - Present

  • Fine-tuned vision-language models (ViT & spatial VAEs) using PEFT & LoRA for casualty detection/classification; improved mAP by 22% while cutting false positives by 30%, meeting DARPA Triage Challenge benchmarks (a minimal LoRA sketch follows this entry).

  • Curated & augmented large-scale datasets using Stable Diffusion & Labelbox to improve generalization across varied scenarios. Ran large-scale fine-tuning on multi-GPU clusters and deployed the resulting models on a Spot UGV.

  • Developed Nav2 Behavior Trees for autonomous navigation, integrating custom plugins for obstacle avoidance and mission-specific planning.

  • Demonstrated proficiency in PyTorch, computer vision, ONNX, OpenCV, ROS 2, Docker, fine-tuning, Transformers, CUDA & optimization.

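For reference, a minimal sketch of the PEFT/LoRA recipe described above, using Hugging Face transformers and peft. The checkpoint, label count, and target modules are illustrative assumptions, not the lab's exact configuration.

```python
# Minimal LoRA fine-tuning sketch (checkpoint, labels & ranks are assumptions).
import torch
from transformers import ViTForImageClassification
from peft import LoraConfig, get_peft_model

# Hypothetical 3-way casualty-classification head on a stock ViT backbone.
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=3,
)

# LoRA adapters on the attention projections; only the adapters and the new
# classifier head are trained, keeping multi-GPU fine-tuning cheap.
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],
    modules_to_save=["classifier"],
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically <1% of weights are trainable
```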

Software Intern, Computer Vision

@Mowito Robotics, PA

| Jul 2024 - Aug 2024

  • Achieved a 30% reduction in MAE for encoder diagnostics by optimizing a vision engine built on OpenCV & ROS 2 Behavior Trees & Services, validating it against recorded ROS bags.

  • Deployed the vision-based predictive-maintenance pipeline on a UR10 robotic arm, integrating MoveIt for motion planning.

  • Led research on CLIP-based vision-language agents, integrating FoundationPose for 6D pose estimation and SAM for segmentation to improve object localization in dynamic environments; built an interactive evaluation framework with LangChain & Streamlit for natural-language task instructions via LLMs (see the CLIP sketch below).

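A hedged sketch of the CLIP-based scoring idea behind the localization work: rank candidate object crops against a natural-language query. The model name, crop files, and query are illustrative; the full agent (FoundationPose + SAM integration) is not shown.

```python
# Rank candidate object crops against a text query with CLIP.
# In practice crop proposals would come from SAM; files here are hypothetical.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

crops = [Image.open(p) for p in ["crop_0.png", "crop_1.png"]]  # placeholder crops
query = "a red gearbox housing"                                # placeholder query

inputs = processor(text=[query], images=crops, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

scores = out.logits_per_image.squeeze(-1)  # (num_crops,) image-text similarities
best = int(scores.argmax())
print(f"best crop: {best}, score: {scores[best]:.2f}")
```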

Graduate Research Assistant

@Maryland Robotics Center, UMD

| Dec 2023 - Mar 2024

  • Enhanced autonomous-driving perception by integrating Swin Transformer & ViT-B/32 backbones with UNet, optimizing feature extraction for ADAS scene segmentation. Implemented scalable training pipelines on distributed GPU clusters and used ONNX Runtime for real-time inference (see the export sketch below); also worked extensively with RNN/LSTM & CNN models.

  • Developed a medical-imaging pipeline using a 3D UNet for Type-B aortic dissection segmentation on the CUDA runtime. Automated dataset-augmentation pipelines to improve generalization and deployed on AWS SageMaker.

  • Built high-fidelity radar and LiDAR simulations for the DARPA Cognisense Project, supporting autonomous vehicle decision-making in complex environments.

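A minimal sketch of the PyTorch-to-ONNX-Runtime path mentioned above; the tiny stand-in network and input shape are placeholders, not the lab's segmentation model.

```python
# Export a placeholder PyTorch model to ONNX and run it with ONNX Runtime.
import torch
import onnxruntime as ort

model = torch.nn.Sequential(          # stand-in for the segmentation network
    torch.nn.Conv2d(3, 8, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(8, 1, 1),
).eval()

dummy = torch.randn(1, 3, 224, 224)   # placeholder input shape
torch.onnx.export(model, dummy, "seg.onnx",
                  input_names=["image"], output_names=["mask"])

# CUDAExecutionProvider can be requested instead when a GPU is available.
sess = ort.InferenceSession("seg.onnx", providers=["CPUExecutionProvider"])
mask = sess.run(["mask"], {"image": dummy.numpy()})[0]
print(mask.shape)  # (1, 1, 224, 224)
```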

01

Featured Projects


CLIPGain: Evaluating Temporal Consistency of MFMs

Developed evaluation methods to assess temporal coherence in multimodal foundation models (MFMs) for video-language understanding. Focused on video question answering and captioning tasks, using metrics such as BERTScore, CLIPScore, and a new metric, CLIPGain, to measure temporal reasoning and semantic alignment. The framework surfaces gradual improvements in temporal understanding, offering insights beyond binary correctness, and lays a foundation for building temporally coherent multimodal models (see the scoring sketch below).

Deep Learning

VLM

Multimodal Models

OpenCV

PyTorch

CUDA

View Project

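For the scoring side, a sketch of CLIPScore-style image-text similarity. The CLIPGain delta below is an assumed formulation (a candidate caption's score minus a baseline's), labeled as such; the project's exact definition is not reproduced here.

```python
# CLIPScore sketch: cosine similarity between an image (e.g., a video frame)
# and a caption. clip_gain() is an ASSUMED formulation, not necessarily the
# project's exact metric.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image: Image.Image, caption: str) -> float:
    inputs = processor(text=[caption], images=[image],
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

def clip_gain(image: Image.Image, candidate: str, baseline: str) -> float:
    # Assumed: how much a candidate caption improves on a baseline caption.
    return clip_score(image, candidate) - clip_score(image, baseline)
```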

TransUNet-based Autonomous Scene Segmentation

This project built a robust semantic segmentation model for autonomous driving by integrating a traditional UNet with the Swin Transformer to capture both local and long-range dependencies. The SwinUNet architecture achieved a mean IoU of 0.79, a Dice coefficient of 0.87, and a segmentation loss of 0.29, while delivering real-time inference (26.63 ms) on standard hardware. This fusion of CNN and Transformer architectures improves the segmentation of small and distant objects in complex urban scenes and demonstrates its potential for efficient, accurate deployment in autonomous vehicle systems (see the metric sketch below).

PyTorch

Deep Learning

Transformer Models

OpenCV

ROS

CUDA

View Project

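For reference, a small sketch of how the reported mean IoU and Dice numbers are typically computed from predicted and ground-truth label maps; shapes and class count are illustrative.

```python
# Per-class IoU and Dice from integer label maps (2D masks assumed).
import numpy as np

def iou_dice(pred: np.ndarray, target: np.ndarray, num_classes: int):
    ious, dices = [], []
    for c in range(num_classes):
        p, t = pred == c, target == c
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        if union == 0:                 # class absent in both: skip it
            continue
        ious.append(inter / union)
        dices.append(2 * inter / (p.sum() + t.sum()))
    return float(np.mean(ious)), float(np.mean(dices))

pred = np.random.randint(0, 3, (256, 256))    # placeholder prediction
target = np.random.randint(0, 3, (256, 256))  # placeholder ground truth
miou, dice = iou_dice(pred, target, num_classes=3)
print(f"mIoU={miou:.3f}, Dice={dice:.3f}")
```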

TD3-Powered Reinforcement Learning for AGV Autonomous Navigation

This project implements a deep reinforcement learning (DRL) approach for autonomous mobile robot navigation, comparing a deep neural network-based TD3 with a shallow TD3 model. Trained in Gazebo with LiDAR perception, the deep TD3 model achieved faster convergence, more stable Q-values, and lower network loss, enabling better obstacle avoidance and decision-making. This work highlights advances in policy-gradient methods, continuous action-space optimization, and DRL for real-world robotics (see the TD3 sketch below).

Reinforcement Learning

PyTorch

Scikit

Keras

OpenAI Gym

ROS

View Project

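A hedged sketch of TD3 training; the project trains in Gazebo with LiDAR observations, so the stable-baselines3 library and the stand-in Gym task below are illustrative assumptions, not the project's setup.

```python
# TD3 on a stand-in continuous-control task via stable-baselines3.
# Library choice and environment are assumptions; the real project uses
# a Gazebo simulation with LiDAR perception.
import gymnasium as gym
import numpy as np
from stable_baselines3 import TD3
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make("Pendulum-v1")
n_actions = env.action_space.shape[0]
action_noise = NormalActionNoise(mean=np.zeros(n_actions),
                                 sigma=0.1 * np.ones(n_actions))

model = TD3("MlpPolicy", env, action_noise=action_noise,
            learning_rate=1e-3, verbose=1)
model.learn(total_timesteps=10_000)
model.save("td3_agv_standin")  # hypothetical output path
```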


Precision 3D Aorta Segmentation with Attention-Guided V-Net

This project introduces a deep learning model for 3D medical image segmentation, targeting Type-B Aortic Dissection (TBAD) in MRI & CT scans. Built on V-Net, it integrates attention mechanisms to improve segmentation accuracy of the true lumen (TL) and false lumen (FL). Trained on the ImageTBAD dataset, the model uses attention gates for region refinement and self-attention for long-range dependencies (see the attention-gate sketch below). Optimized with Dice-coefficient and cross-entropy losses, it outperforms standard V-Net models, demonstrating skills in deep learning, 3D CNNs, attention-based segmentation, and medical-AI applications.

TorchIO

PyTorch

3D Segmentation

U-Net

Self-Attention

CNNs

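Finally, a compact sketch of the attention-gate idea used for region refinement, in the style of Attention U-Net; channel sizes are placeholders, and this is not the project's exact module.

```python
# Additive attention gate (Attention U-Net style) on a skip connection.
# 3D convolutions to match volumetric segmentation; sizes are placeholders.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, g_ch: int, x_ch: int, inter_ch: int):
        super().__init__()
        self.w_g = nn.Conv3d(g_ch, inter_ch, kernel_size=1)  # gating signal
        self.w_x = nn.Conv3d(x_ch, inter_ch, kernel_size=1)  # skip features
        self.psi = nn.Conv3d(inter_ch, 1, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, g: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # alpha = sigmoid(psi(relu(W_g g + W_x x))); downweights irrelevant regions
        alpha = self.sigmoid(self.psi(self.relu(self.w_g(g) + self.w_x(x))))
        return x * alpha

gate = AttentionGate(g_ch=64, x_ch=32, inter_ch=16)
g = torch.randn(1, 64, 16, 16, 16)  # decoder gating signal
x = torch.randn(1, 32, 16, 16, 16)  # encoder skip features (same spatial size)
print(gate(g, x).shape)  # torch.Size([1, 32, 16, 16, 16])
```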