Portfolio

Featured Work

ICML 2026 First Author

Distributional Active Inference

Best average rank across 19 continuous control tasks from three benchmark suites, with up to 62% higher sample efficiency than the next-best baseline. Integrates Active Inference into distributional RL without a world model. ICML 2026.
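For readers unfamiliar with the metric, an average rank can be computed as in the minimal sketch below. The task names, method names, and scores are made-up placeholders, not results from the paper, and the sketch assumes higher scores are better and that no ties occur.

```python
def average_ranks(scores):
    """Return each method's mean rank across tasks (rank 1 = best score)."""
    totals = {}
    for task_scores in scores.values():
        # Order methods from best to worst score on this task.
        ordered = sorted(task_scores, key=task_scores.get, reverse=True)
        for rank, method in enumerate(ordered, start=1):
            totals[method] = totals.get(method, 0) + rank
    return {method: total / len(scores) for method, total in totals.items()}


# Placeholder scores, purely illustrative.
scores = {
    "task_a": {"m1": 3.0, "m2": 2.0, "m3": 1.0},
    "task_b": {"m1": 1.0, "m2": 5.0, "m3": 2.0},
    "task_c": {"m1": 9.0, "m2": 4.0, "m3": 7.0},
}
ranks = average_ranks(scores)
best_method = min(ranks, key=ranks.get)
```

A lower average rank is better; a method that wins every task would score exactly 1.0.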

MOMBO: Deterministic Uncertainty Propagation for Offline RL

NeurIPS 2024 First Author

arXiv Code Video

Best convergence rate (average area-under-learning-curve, AULC, rank of 1.2) across all 12 D4RL offline benchmarks. Deterministic moment matching replaces Monte Carlo Bellman targets, with provably tighter suboptimality bounds. NeurIPS 2024.
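To illustrate the general idea of replacing sampling with moment matching, the sketch below propagates a Gaussian through a single ReLU in closed form and compares the result with a Monte Carlo estimate. It is a generic textbook example, not MOMBO's implementation; the function names and the single-unit setup are assumptions made for illustration.

```python
import math
import random


def relu_moments(mu, sigma):
    """Closed-form mean and variance of max(0, X) for X ~ N(mu, sigma^2)."""
    alpha = mu / sigma
    pdf = math.exp(-0.5 * alpha ** 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(alpha / math.sqrt(2)))
    mean = mu * cdf + sigma * pdf
    second_moment = (mu ** 2 + sigma ** 2) * cdf + mu * sigma * pdf
    return mean, second_moment - mean ** 2


def relu_moments_mc(mu, sigma, n_samples, seed=0):
    """Monte Carlo estimate of the same two moments, for comparison."""
    rng = random.Random(seed)
    samples = [max(0.0, rng.gauss(mu, sigma)) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    var = sum((s - mean) ** 2 for s in samples) / n_samples
    return mean, var


exact_mean, exact_var = relu_moments(0.0, 1.0)
mc_mean, mc_var = relu_moments_mc(0.0, 1.0, n_samples=10_000)
```

The closed-form version is deterministic and needs no samples, whereas the Monte Carlo estimate carries sampling noise that shrinks only with more samples.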

EPPO: Evidential Proximal Policy Optimization

TMLR 2025 First Author

arXiv Code

State-of-the-art in non-stationary control: average rank 1.5 across 10+ environments. Evidential critic simultaneously preserves plasticity and drives directed exploration from a single probabilistic framework. TMLR 2025.

CDDP: Continual Learning of Multi-modal Dynamics

L4DC 2024 First Author

arXiv Code

Learns new dynamical modes sequentially without catastrophic forgetting or mode labels, outperforming parameter-transfer baselines on 4 out of 5 datasets. Neural episodic memory with a Dirichlet Process prior for automatic mode discovery. L4DC 2024.
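As a rough illustration of how a Dirichlet Process prior allows the number of modes to grow with the data, the sketch below computes assignment probabilities under the Chinese restaurant process view of the prior. The counts and concentration value are hypothetical, and how CDDP combines this prior with its neural episodic memory is not shown here.

```python
def crp_assignment_probs(mode_counts, concentration):
    """Prior probabilities of joining each existing mode or opening a new one."""
    total = sum(mode_counts) + concentration
    existing = [count / total for count in mode_counts]
    new_mode = concentration / total
    return existing, new_mode


# Hypothetical state: two modes seen so far, with 6 and 3 assigned sequences.
existing, new_mode = crp_assignment_probs([6, 3], concentration=1.0)
```

Popular modes attract new data in proportion to their counts, while the concentration parameter keeps a nonzero probability of discovering a new mode.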

ICLR 2022 2nd Author

arXiv Code Video

Evidential Turing Processes

The only model achieving top-tier performance on calibration, class overlap, and OOD detection simultaneously across five real-world benchmarks. External memory unifies global and local uncertainty in a single principled framework. ICLR 2022.

iS-QL: Bridging Target-free and Target-based Reinforcement Learning

ICLR 2026 4th Author

arXiv Code

Closes the 10–60% performance gap between target-free and target-based RL by sharing all parameters except the final linear head. Matches target-based stability at near target-free memory cost across Atari, DMC, and language modeling. ICLR 2026.
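The memory argument can be checked with simple parameter counting. The sketch below compares a target-free network, a full target-network copy, and a shared trunk with two linear heads. The layer sizes and action count are hypothetical values chosen for illustration, not the architectures used in the paper.

```python
def mlp_param_count(sizes):
    """Number of weights and biases in a fully connected stack."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


trunk_sizes = [64, 256, 256]  # hypothetical shared feature layers
n_actions = 6                 # hypothetical action count

trunk_params = mlp_param_count(trunk_sizes)
head_params = mlp_param_count([trunk_sizes[-1], n_actions])

target_free = trunk_params + head_params          # one network, no target copy
target_based = 2 * (trunk_params + head_params)   # online network plus full copy
shared_trunk = trunk_params + 2 * head_params     # only the linear head is duplicated
```

With these sizes the shared-trunk variant adds only one extra linear head on top of the target-free count, while a full target network doubles it.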

PAC4SAC: PAC-Bayesian Soft Actor-Critic Learning

AABI 2024 2nd Author

arXiv Code

2–3x higher sample efficiency on high-dimensional tasks (Ant) and the best cumulative regret across all four PyBullet environments. The first actor-critic method to use a PAC-Bayesian generalization bound as the critic training objective. AABI 2024.

ObjectRL: An Object-Oriented Reinforcement Learning Codebase

arXiv 2025 2nd Author

arXiv Code

Extending SAC to a new algorithm takes roughly 5 lines: just override the two methods that change. Full OOP codebase where encapsulation, inheritance, and polymorphism map directly to RL algorithm components. arXiv 2025.
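The override pattern can be sketched with a toy class hierarchy. Everything below is a hypothetical stand-in written for illustration: the class names, method names, and scalar losses are assumptions and do not reflect ObjectRL's actual API.

```python
class ToySAC:
    """Hypothetical stand-in for a SAC agent (not ObjectRL's real classes)."""

    def __init__(self, gamma=0.99, alpha=0.2):
        self.gamma = gamma  # discount factor
        self.alpha = alpha  # entropy temperature

    def critic_target(self, reward, next_q, next_log_prob):
        # Soft Bellman target: the entropy bonus enters via -alpha * log_prob.
        return reward + self.gamma * (next_q - self.alpha * next_log_prob)

    def actor_loss(self, q, log_prob):
        # SAC actor objective for a single sample.
        return self.alpha * log_prob - q

    def losses(self, reward, q, next_q, log_prob, next_log_prob):
        # Shared update logic; subclasses inherit it unchanged.
        target = self.critic_target(reward, next_q, next_log_prob)
        return {"critic": (q - target) ** 2, "actor": self.actor_loss(q, log_prob)}


class ToyNoEntropySAC(ToySAC):
    """Variant that drops the entropy terms by overriding only two methods."""

    def critic_target(self, reward, next_q, next_log_prob):
        return reward + self.gamma * next_q

    def actor_loss(self, q, log_prob):
        return -q


args = dict(reward=1.0, q=2.0, next_q=2.0, log_prob=-1.0, next_log_prob=-1.0)
base = ToySAC(gamma=0.5, alpha=0.1).losses(**args)
variant = ToyNoEntropySAC(gamma=0.5, alpha=0.1).losses(**args)
```

The subclass reuses `losses` untouched; polymorphism routes the calls to the overridden methods, which is the design idea the blurb describes.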

BFL: Aggregating Variational Bayesian Networks in Federated Learning

NeurIPS 2022 Workshop 3rd Author

arXiv Code

Low-spread aggregation rules match or beat deterministic FedAvg while providing significantly better calibration across clients. First systematic study of aggregation strategies for Variational Bayesian Neural Networks in federated learning. NeurIPS 2022 Workshop.
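To make the notion of aggregating variational posteriors concrete, the sketch below contrasts two generic ways of combining per-client Gaussian posteriors over a single weight: averaging the parameters, and multiplying the densities (precision-weighted), which yields a narrower result. Whether these correspond to the specific rules studied in the paper is an assumption; the client values are made up.

```python
def average_aggregate(means, variances):
    """Arithmetic mean of the clients' Gaussian parameters."""
    n = len(means)
    return sum(means) / n, sum(variances) / n


def product_aggregate(means, variances):
    """Normalised product of Gaussians: precisions add, means are precision-weighted."""
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    return mean, 1.0 / total_precision


# Made-up posteriors from two clients for one weight.
client_means = [0.0, 2.0]
client_vars = [1.0, 1.0]

avg_mean, avg_var = average_aggregate(client_means, client_vars)
prod_mean, prod_var = product_aggregate(client_means, client_vars)
```

Both rules agree on the mean here, but the product rule halves the variance, which shows how the choice of aggregation rule controls the spread of the global posterior.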

Thesis

Probabilistic Methods for Sample-Efficient Reinforcement Learning

Ph.D. Thesis 2026 First Author

Paper

Doctoral thesis presenting six peer-reviewed algorithms published at NeurIPS, ICML, ICLR, TMLR, and UAI, unified by one claim: probabilistic uncertainty representations make reinforcement learning agents faster, more adaptive, and more data-efficient.

Memory-based Approaches to Problems in Probabilistic Modeling

Master's Thesis 2022 First Author

Paper

Master's thesis at Istanbul Technical University demonstrating that external memory solves two open problems in probabilistic ML: total calibration of neural networks (ETP, ICLR 2022) and continual learning of multi-modal dynamical systems (CDDP, L4DC 2024).

Industry Experience

ML Engineer

Signature Verification for Fraud Detection

Siamese CNN trained on handwritten signatures and deployed at Vakifbank R&D for cheque fraud detection, reaching 95% accuracy on internal data and 88% on the public CEDAR benchmark.
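The verification step of a Siamese model can be sketched as follows: the same encoder embeds both signatures, and the pair is accepted when the embeddings are close. The deployed model's details are not given here, so the CNN is replaced by a tiny linear stand-in, and the weights, feature vectors, and threshold are all hypothetical.

```python
import math


def embed(features, weights):
    """Stand-in encoder: one linear layer followed by L2 normalisation."""
    out = [sum(w * x for w, x in zip(row, features)) for row in weights]
    norm = math.sqrt(sum(v * v for v in out)) or 1.0
    return [v / norm for v in out]


def same_writer(sig_a, sig_b, weights, threshold=0.5):
    """Accept the pair when the shared encoder maps both inputs close together."""
    emb_a, emb_b = embed(sig_a, weights), embed(sig_b, weights)
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))
    return distance < threshold


# Hypothetical shared weights and toy feature vectors.
weights = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
reference = [0.9, 0.1, 0.2]
genuine = [0.8, 0.2, 0.2]
forged = [0.1, 0.9, 0.3]

accepts_genuine = same_writer(reference, genuine, weights)
accepts_forged = same_writer(reference, forged, weights)
```

Because both inputs pass through identical weights, the model compares signatures in a common embedding space rather than classifying each one separately.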