Yang Azzollini
DPhil, University of Oxford · yang.maths@gmail.com · GitHub
I work on honest, finite-sample uncertainty quantification for multivariate processes observed on asynchronous grids — the problem of giving a calibrated variance bound for a joint estimate when the data sources report on irregular, source-specific schedules. The setting arises wherever heterogeneous streams of measurement need to be combined under a rigorous reliability claim: autonomous-vehicle sensor fusion, longitudinal medical monitoring, production ML telemetry, high-frequency financial econometrics. The central commitment is that the bound should hold for the realised sampling pattern, in finite samples, without distributional assumptions on the timestamp-generating process.
Current direction
The 2016 thesis worked out the exact finite-sample variance of a covariance estimator on asynchronously-sampled bivariate data. The 2026 paper extends this to a best-linear-unbiased optimality statement with closed-form weights, a Cramér–Rao identity at the independence point, and a two-step feasible construction whose plug-in penalty is bounded by a value-function envelope argument. The central theorems are mechanically verified in Lean 4 — a small structural axiomatisation captures the cross-product moment relations needed for the variance decomposition, and the algebraic content of the BLUE result is then verified by the Lean kernel.
The next direction is to extend the framework from bivariate to multipartite — multiple heterogeneous streams on a single system, of the kind that arise in sensor fusion or in longitudinal multi-instrument clinical data. The mathematical structure generalises; the open work is in the multipartite Lean development, the empirical demonstrations on non-financial data, and a methodology paper on the structural-axiomatisation pattern as a general technique for machine-verified finite-sample statistics. I'm also writing a book-length treatment of the bivariate framework, currently in advanced draft.
Research
Lean 4 formalisation
The five central theorems of the 2026 paper — the variance decomposition for asynchronously-observed cross-products, the Cauchy–Schwarz lower bound on the diagonal contribution, the closed-form BLUE weight, the asynchronous Gauss–Markov theorem, and the Cramér–Rao identity at the independence point — are mechanically verified against a five-field structural axiomatisation of the cross-product moment relations. Five Lean modules, zero remaining sorry statements. The structural-axiomatisation pattern is, to my knowledge, among the first applications of this design to applied statistics in Lean, and the verification establishes the algebraic content of the central theorems as machine-checked given the moment-relation axioms.
2026 paper (in preparation)
Optimal weighted Zhou–Hayashi–Yoshida estimators for asynchronous covariance. A finite-sample BLUE characterisation for the integrated-covariance estimator class on asynchronously-observed bivariate semimartingales, with closed-form optimal weights, a Cramér–Rao identity at the independence point, and a two-step feasible construction. The variance bound is exact in finite samples, conditional on the realised sampling pattern, and does not assume a parametric arrival model. Target venue: Journal of Financial Econometrics. arXiv link to follow.
2016 DPhil thesis
Correlation methods in the statistical analysis of financial trading data. University of Oxford. Supervised by Peter Clifford and Brian Ripley. The thesis established the exact finite-sample variance of the extended Zhou–Hayashi–Yoshida estimator class, closing a finite-sample question that the asymptotic literature had left open.
Writing and talks
This section is intentionally empty for now. Pieces will appear here as they're written.
About
I stumbled on these problems on a trading desk at a top investment bank, which also sponsored my doctorate. Volatility, asynchronous correlation, and lead-lag looked like three different questions at the time; they turned out to share the same mathematical structure, and that structure has been worth decades of work to understand carefully. I currently live in Silicon Valley, still working on problems that no one really understands yet.
The pacing of independent research is unusual. The 2016 thesis sat largely unpublished for nearly a decade while the trading-desk work paid for the time to think about it; the 2026 paper is the result of that thinking, returned to and finished without external deadlines. The Lean formalisation came out of a parallel interest in formal verification as a methodology for making applied statistical claims auditable in a way the field doesn't yet take for granted.
Availability
Based in Silicon Valley, with regular access to the Stanford and Berkeley campuses, San Francisco, and the Peninsula. I'm open to research collaborations, visiting positions, and conversations with researchers working in adjacent areas — finite-sample uncertainty quantification, distribution-free inference, formal methods for applied statistics. Bay Area coffees welcome; email is the best way to reach me.