About
I am a machine learning research engineer building retrieval, representation-learning, and evaluation systems for scientific discovery.
I’m currently CTO & Co-Founder at Deep MedChem (Prague), where I lead hands-on R&D work across:
- large model training + inference pipelines,
- scalable vector retrieval of molecules (2D/3D similarity),
- evaluation harnesses and benchmarking,
- and product-grade scientific software (APIs + UI + deployment).
Research interests
- representation learning
- retrieval / ANN / indexing
- evaluation and benchmark design
- scientific ML
- reasoning / synthetic datasets
- product-grade research systems
Selected work
CHEESE — Chemical Embeddings Search Engine (first author)
CHEESE reformulates ligand-based screening with expensive 3D metrics into scalable vector search. It supports 2D fingerprints + 3D shape + 3D electrostatics similarity, and is shipped as a product suite (Search / Explorer / Modeller / Electrostatics).
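As a rough illustration of the reformulation described above, the sketch below replaces an expensive pairwise metric with distances between precomputed vectors. It is a toy example, not the CHEESE implementation: the embeddings, the molecule IDs, and the `top_k` helper are made up for illustration, and a brute-force scan stands in for a real approximate-nearest-neighbour index.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, index, k=2):
    """Return the k (distance, molecule_id) pairs closest to the query.

    Brute force for clarity; a large-scale system would use an ANN index.
    """
    scored = ((euclidean(query, vec), mol_id) for mol_id, vec in index.items())
    return heapq.nsmallest(k, scored)


# Hypothetical precomputed embeddings (toy 3-D vectors, made-up IDs).
index = {
    "mol_a": [0.0, 0.0, 0.0],
    "mol_b": [1.0, 0.0, 0.0],
    "mol_c": [0.0, 3.0, 4.0],
}

hits = top_k([0.9, 0.0, 0.0], index, k=2)
for distance, mol_id in hits:
    print(f"{mol_id}: {distance:.3f}")
```

The point of the design is that the costly similarity computation happens once, when embeddings are built, so each query reduces to cheap vector distance lookups.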
Public metrics (from the paper mirror + product docs):
- Reported up to 10^3x speedup and 10^6x lower cost per query than state-of-the-art methods on established benchmark suites.
- Systems: I built a custom disk-based vector DB indexing 40B+ isometric embeddings.
Links:
- Paper landing: /publications/cheese-paper
- CHEESE Search: https://cheese.deepmedchem.com
- Supplementary repo: https://github.com/Deep-MedChem/cheese-paper
SynthonGPT (first author)
SynthonGPT is a compact synthon-conditioned transformer for navigating makeable chemical space (grounded in vendor enumerations rather than hallucinated SMILES).
Public metrics (from the report):
- Count-matched benchmarks show up to 3.1x higher unique scaffold recovery vs F‑Trees and 1.76x vs SpaceLight while maintaining higher diversity (lower mean similarity).
- ~90M parameters, trained in ~10 hours on a single RTX 4090; sub-second inference on CPU and GPU.
CellARC (first author)
CellARC is a synthetic benchmark for abstraction/reasoning built from multicolour 1D cellular automata, with reproducible dataset generation, baselines, and a public leaderboard.
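For readers unfamiliar with multicolour 1D cellular automata, a minimal sketch follows. It is not the CellARC generator: the function names, the random rule table, the periodic boundary, and the parameter values are all assumptions chosen for illustration.

```python
import itertools
import random


def random_rule(n_colours, radius, seed):
    """Map every neighbourhood of width 2*radius+1 to a random output colour."""
    rng = random.Random(seed)
    width = 2 * radius + 1
    return {
        nbhd: rng.randrange(n_colours)
        for nbhd in itertools.product(range(n_colours), repeat=width)
    }


def step(state, rule, radius):
    """Apply the rule once, using periodic (wrap-around) boundaries."""
    n = len(state)
    return [
        rule[tuple(state[(i + d) % n] for d in range(-radius, radius + 1))]
        for i in range(n)
    ]


def rollout(initial, rule, radius, steps):
    """Return the states [initial, after 1 step, ..., after `steps` steps]."""
    states = [list(initial)]
    for _ in range(steps):
        states.append(step(states[-1], rule, radius))
    return states


# Hypothetical settings: 4 colours, radius 1, fixed seed for reproducibility.
rule = random_rule(n_colours=4, radius=1, seed=0)
history = rollout([0, 1, 2, 3, 0, 1, 2, 3], rule, radius=1, steps=5)
for row in history:
    print("".join(str(cell) for cell in row))
```

Seeding the rule generator keeps every rollout reproducible, which is the property a benchmark built on generated data depends on.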
Links:
- Paper: https://arxiv.org/abs/2511.07908
- Repo: https://github.com/mireklzicar/cellarc
- Leaderboard: https://cellarc.mireklzicar.com
BitBIRCH-Lean (co-author)
Co-authored BitBIRCH-Lean, a memory-efficient implementation of the BitBIRCH clustering algorithm for very large molecular libraries. I contributed the bit-packing and optimization work that helped the implementation use 8x less memory while running 2x faster.
BitBIRCH-Lean uses compressed fingerprint representations inside the clustering tree and supports optional C++ acceleration, enabling high-throughput clustering workflows on workstation-scale hardware rather than requiring specialized infrastructure.
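To show why bit-packing saves memory, here is a minimal pure-Python sketch. It is not the BitBIRCH-Lean code: `pack_bits` and `tanimoto` are hypothetical helpers, the fingerprints are toy data, and attributing the reported figure to this one-byte-per-bit versus one-bit-per-bit comparison is an assumption.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, 8 fingerprint bits per byte."""
    padded = list(bits) + [0] * (-len(bits) % 8)  # pad to a multiple of 8
    out = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for bit in padded[i:i + 8]:
            byte = (byte << 1) | bit  # most significant bit first
        out.append(byte)
    return bytes(out)


def tanimoto(a, b):
    """Tanimoto similarity computed directly on two packed fingerprints."""
    inter = sum(bin(x & y).count("1") for x, y in zip(a, b))
    union = sum(bin(x | y).count("1") for x, y in zip(a, b))
    return inter / union if union else 1.0


# Toy 2048-bit fingerprints (made-up bit patterns).
fp_a = [1, 0, 1, 1, 0, 0, 0, 0] * 256
fp_b = [1, 1, 0, 1, 0, 0, 0, 0] * 256

unpacked_size = len(bytes(fp_a))   # one byte per bit
packed_a = pack_bits(fp_a)
packed_b = pack_bits(fp_b)

print(unpacked_size, len(packed_a), unpacked_size // len(packed_a))
print(tanimoto(packed_a, packed_b))
```

Because the similarity is computed with bitwise operations on the packed bytes, the fingerprints never need to be expanded back to one value per bit.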
Experience snapshot
CTO & Co-Founder, Deep MedChem
Foundational models for large-scale molecular search, evaluation, and deployed scientific software (cloud/on‑prem).
Research Scientist in Machine Learning, The MAMA AI
R&D; model training; production ML pipelines; enterprise client projects.
Machine Learning in Bioinformatics, Biodviser
Neural alignment-free sequence analysis and representation learning.
Python Software Developer, Charles University
Built software used by the Central Library.
Research internships and freelancing
Scientific computing, data analysis, mathematical methods...
Background
My background combines hands-on systems building with coursework in bioinformatics, computer science, mathematics, and philosophy.
Mathematics, Open University
Coursework in mathematics.
Bioinformatics, Charles University
Coursework in computer science, biology, chemistry.
Philosophy, Charles University
Coursework in philosophy.