The workshop runs over two days, July 15–16, 2026. All times are local (Paris, CEST).

Day 1 (July 15, 2026)

| Time | Activity | Location |
|---|---|---|
| 08:15 – 09:00 | Breakfast & Registration | Club de la Chasse ↗ |
| 09:00 – 09:15 | Opening Remarks | Club de la Chasse ↗ |
| 09:15 – 10:00 | Talk: Ruta Mehta. Title: Fairness and Incentives in Federated Learning. Abstract: With the advent of generative AI, the paradigms of data sharing become crucially important for both economic and welfare reasons. Federated learning (FL) offers an effective paradigm for sharing rich, distributed data while protecting data privacy. Nonetheless, the heterogeneous nature of distributed data makes it challenging to define and ensure fairness among local agents, creating incentive issues. For instance, intuitively, if not compensated properly, an agent with high-quality data may not be incentivized to participate if the data of others is of low quality. Furthermore, on the one hand, agents benefit from the global model trained on shared data. On the other hand, by participating in federated learning, they may also incur costs (related to privacy and communication) due to data sharing. In this talk, I will attempt to take a social choice and game-theoretic perspective to address these fairness and incentive issues. In this process, I will show how FL and social choice theory (SCT) can inform each other, leading to newer insights and avenues. | Club de la Chasse ↗ |
| 10:00 – 10:45 | Talk: Patrick Loiseau. Title: Geometry of Relaxed Fair Regression: A Unified Framework for Aware and Unaware Settings. Abstract: Fairness-accuracy trade-offs are a central concern in the deployment of fairness-aware machine learning methods. When sensitive attributes are unavailable at inference time (the so-called unawareness setting), principled methods for obtaining accurate predictions under relaxed fairness constraints are largely missing. In this work, we address this gap by formulating regression under a demographic parity penalty as an optimal transport problem. Our framework unifies both the aware and unaware settings and characterizes optimal prediction functions via optimal transport maps, under both squared Wasserstein-2 and Total Variation penalties. These results reveal that the choice of penalty reflects fundamentally different fairness philosophies: the Wasserstein penalty induces a smooth, population-wide compromise, while Total Variation enforces exact parity for a subset of individuals. Building on these theoretical characterizations, we propose an algorithm that is simple to implement, computationally efficient, and consistently matches or outperforms state-of-the-art baselines on real-world benchmarks. Joint work with Marie Generali Lince, Vincent Divol, Rémi Flamary, Solenne Gaucher. | Club de la Chasse ↗ |
| 10:45 – 11:15 | Morning Coffee Break | Club de la Chasse ↗ |
| 11:15 – 12:00 | Talk: Raul Castro Fernandez. Title: Data Ecology: Understanding and Designing Data Ecosystems. Abstract: Data shapes our world not only through personal data, but through every dataflow that determines what governments see, what firms model, what AI systems learn, and what organizations can do. Yet we lack a systematic account of what data does in these complex ecosystems. Without one, interventions remain partial or poorly targeted, while beneficial arrangements (rare-disease consortia, accountable government data access, compensation for data contributors) often fail to form. This talk introduces data ecology: a research program that studies and designs data ecosystems as systems of agents interacting through dataflows, drawing on tools from computer science, economics, law, and philosophy. The first part asks what data does, presenting the potential-effect function and the structural properties it implies: correlated spillovers, integration hubs, and dataflow dependence. The second part turns to design, focusing on three classes of data ecosystems: intra-organizational systems, illustrated by Pneuma, an agentic system for relational data work; cross-organizational systems, illustrated by a data escrow that makes data sharing programmable; and data-sharing markets, illustrated by consortia protocols for multi-party pooling. I will close by sketching the larger program and several pieces of in-progress work. | Club de la Chasse ↗ |
| 12:00 – 13:45 | Lunch Break (on your own) | — |
| 13:45 – 14:30 | Talk: Federico Echenique. Title and abstract: TBD | Club de la Chasse ↗ |
| 14:30 – 15:15 | Talk: Kai Hao Yang. Title: Non-Discriminatory Personalized Pricing. Abstract: A monopolist offers personalized prices to consumers with unit demand. Consumers differ in their values, costs, and protected characteristics, such as race or gender. The seller is subject to a non-discrimination constraint: consumers with the same cost but different protected characteristics must face identical price distributions. Such regulations are present in markets like credit or insurance. We characterize the optimal pricing rule. Under this rule, surplus accrues to both protected groups, but only to those with intermediate values. Strengthening the constraint to cover transaction prices redistributes surplus, harming the low-value group and benefiting the high-value group. Meanwhile, prohibiting the use of protected characteristics as pricing inputs instead of regulating outputs harms the low-value group. | Club de la Chasse ↗ |
| 15:15 – 15:45 | Afternoon Coffee Break | Club de la Chasse ↗ |
| 15:45 – 16:30 | Talk: Ali Makhdoumi. Title and abstract: TBD | Club de la Chasse ↗ |
| 16:30 – 17:15 | Talk: Juba Ziani. Title: Data Sharing with Endogenous Choices over Differential Privacy Levels. Abstract: Motivated by the rapid push to decentralize sharing of data, we study whether large-scale data sharing coalitions can form in a decentralized manner under differential privacy when players have heterogeneous privacy preferences. We first consider a fully decentralized data-sharing mechanism in which each player decides whether to participate and how much privacy noise to add locally to their sensitive data before sharing. Privacy choices induce a fundamental trade-off: higher privacy lowers individual privacy costs but reduces data utility and statistical accuracy for the coalition. These choices generate externalities across players, making both participation and privacy levels strategic. Our goal is to understand which coalitions are stable, how privacy choices shape equilibrium outcomes, and how fully decentralized data-sharing compares to a centralized, socially optimal benchmark when the number of players is large. We provide a comprehensive analysis across multiple privacy-cost regimes corresponding to different attack/observation models in differential privacy, showing that full decentralization is highly inefficient in terms of both social welfare and estimator accuracy. Surprisingly, we find that a simple partially decentralized mechanism (where players still retain participation agency, but a central designer chooses a fixed privacy noise level for everyone) closes this efficiency gap down to constant factors across all privacy-cost regimes. | Club de la Chasse ↗ |
| 17:15 – 17:30 | Walk to Rice Global Paris Center | — |
| 17:30 – 19:00 | Poster Session & Reception (note: change of location) | Rice Global Paris Center ↗ |

Day 2 (July 16, 2026)

| Time | Activity | Location |
|---|---|---|
| 08:15 – 09:00 | Breakfast | Club de la Chasse ↗ |
| 09:00 – 09:45 | Talk: Rasmus Pagh. Title: Consistent Release of Hierarchical Data Under Differential Privacy. Abstract: Statistical data such as census data is often released in hierarchical form, with counts reported at multiple geographic levels such as blocks, municipalities, regions, states, and the nation as a whole. Differential privacy provides strong protection for individuals represented in such data by adding random noise to released counts. However, independent noise addition leads to inconsistencies: the reported count for a region need not equal the sum of the reported counts for its subregions. In joint work with Lebeda and Sejer, we show that consistency and improved accuracy can be achieved simultaneously through a direct recursive approach based on optimal matrix factorizations. Compared with the Gaussian mechanism at the same privacy guarantee, our method can reduce variance by up to a factor of three while producing consistent releases by construction. This improves upon previous “consistency by post-processing” approaches. We also present lower bounds suggesting that the method is optimal at least for some hierarchies, and show how it extends efficiently to sparse vector data. Finally, we discuss implications for practical deployments of differential privacy, including the release of U.S. Census statistics. | Club de la Chasse ↗ |
| 09:45 – 10:30 | Talk: Edwige Cyffers. Title and abstract: TBD | Club de la Chasse ↗ |
| 10:30 – 11:00 | Morning Coffee Break | Club de la Chasse ↗ |
| 11:00 – 11:45 | Talk: Antti Honkela. Title: Towards Privacy Standards for AI in Health. Abstract: Privacy is widely regarded as an essential requirement for AI in health applications, yet implementing it in practice at scale is far from trivial. In my talk, I will discuss challenges in the implementation of privacy in the context of secondary use of health data in the European Health Data Space (EHDS) as well as steps toward possible solutions. These include the calibration of the privacy–utility tradeoff, quantifying and communicating the privacy of common algorithms, and external verification of privacy promises. | Club de la Chasse ↗ |
| 11:45 – 13:45 | Lunch Break (on your own) | — |
| 13:45 – 14:30 | Talk: Ravi Kumar. Title and abstract: TBD | Club de la Chasse ↗ |
| 14:30 – 15:15 | Talk: Eric Mazumdar. Title and abstract: TBD | Club de la Chasse ↗ |
| 15:15 – 15:45 | Afternoon Coffee Break | Club de la Chasse ↗ |
| 15:45 – 16:30 | Talk: Steven Wu. Title: The Agentic Garden of Forking Paths. Abstract: AI agents are reshaping data science by automating both machine learning research and data analysis in empirical research. This talk examines what happens when agents enter two classical sources of statistical concern: adaptive reuse of held-out data and the garden of forking paths. In the first part, we use autonomous research agents to explore why repeated benchmark use often does not lead to severe overfitting, showing that successful agent-discovered strategies can be compressed into very short prompts and reproduced by fresh agents. In the second part, we use many AI analysts to explore how different defensible analytical choices can lead to different conclusions, while also revealing how easily agentic analysis can be steered toward selective reporting. Together, these results suggest that agents can help map and stress-test scientific workflows, but also demand new tools, audits, and standards for validity in the agentic era. | Club de la Chasse ↗ |
| 16:30 – 17:15 | Talk: Clément Canonne. Title: Verification of Statistical Properties from Sensitive Data. Abstract: Analyzing (very) large datasets to build accurate models is the workhorse of machine learning and underlies most of the advances in AI/ML over the past decades. These datasets are increasingly seen as valuable assets, e.g., due to the difficulty in obtaining them (sensitive, regulated, or carefully curated user data), generating them (compute-heavy processes), or trusting them (poisoning attacks). For companies owning such datasets, this leads to a thorny issue: how to convince interested customers that a dataset is reliable and has the application-specific statistical properties they need, while revealing as little data as possible? In this talk, I will discuss a recent line of work, from the theoretical computer science community, aimed at designing principled approaches to address this problem. | Club de la Chasse ↗ |
| 17:15 – 17:30 | Closing Remarks | Club de la Chasse ↗ |