JupyterCon returned to San Diego this November, and I’m genuinely grateful I had the chance to be there. The energy of the community—educators, researchers, maintainers, enterprise teams, and so many others—was contagious. Every conversation felt like a reminder of why this ecosystem matters and how lucky we are to be part of it.
For those of us working on Berkeley DataHub, the conference was more than a series of talks. It was an opportunity to reconnect with the people behind the tools we rely on, to learn from their stories, and to reflect on how our own service can grow. What stood out wasn’t one big announcement, but a set of deeper themes that resonated with the challenges and hopes we carry on campus.
Teaching and Infrastructure Must Scale Together
One of my favorite moments was hearing the Berkeley team present Teaching Data Engineering at Scale With Jupyter Notebooks at UC Berkeley (Rebecca Dang, Jonathan M. Ferrari, Christy Quang, Michael Ball, and Lisa Yan). They showed how students can write PostgreSQL and MongoDB queries directly in Jupyter, using familiar tools like JupySQL and pandas, all without the usual headaches of local setup.
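To make that concrete, here is a minimal sketch of the kind of in-notebook SQL workflow the talk described. This is illustrative, not the course's actual materials: in class the students use JupySQL's `%sql` magic, while this self-contained version uses Python's built-in `sqlite3` with pandas so it runs anywhere.

```python
# Illustrative sketch of querying a database from a notebook and getting a
# familiar DataFrame back -- no local database setup required. The table and
# data are invented for the example.
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrollments (student TEXT, course TEXT, units INTEGER);
    INSERT INTO enrollments VALUES
        ('ana', 'DATA 101', 4),
        ('ben', 'DATA 101', 4),
        ('ana', 'CS 186', 4);
""")

# Students write plain SQL; the result comes back as a pandas DataFrame,
# ready for the tools they already know.
df = pd.read_sql("""
    SELECT course, COUNT(*) AS n_students
    FROM enrollments
    GROUP BY course
    ORDER BY n_students DESC
""", conn)
print(df)
```

The appeal is exactly what the speakers emphasized: the query lives next to the analysis, and nobody has to install or configure a database client first.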
For DataHub, this reinforced something I’ve felt for a while: we can’t scale just by tuning clusters. We have to scale alongside instructors, designing workflows that are modular, reproducible, and aligned with how students learn best. When teaching and infrastructure grow together, both scale more smoothly.

Image: Teaching Data Engineering at Scale With Jupyter Notebooks at UC Berkeley
Real Users Are Messier Than We Think
Another highlight was Adam Thornton’s talk, How 500 Real Users Are Worse Than 3000 Bot Users. It was funny, honest, and very real. Synthetic tests showed platforms handling thousands of sessions without issue—but the moment real humans joined, new problems appeared: unpredictable CPU bursts, memory spikes, restart storms, big data pulls, and a wild mix of notebook workloads.
Talking to others who run large notebook platforms made me feel both seen and motivated. Real users are messy—and that’s not a bug, it’s human nature.
For DataHub, this means we need observability that reflects actual human behavior, not idealized load curves. And our testing should replay real (anonymized) past usage, not just synthetic traffic. If we want resilient infrastructure, we need to understand how people really interact with it.
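As a rough illustration of why replayed traces beat synthetic load, here is a hypothetical sketch with invented event data (not real DataHub logs): real traffic arrives as uneven per-user bursts, restart storms, and occasional big reads, none of which a uniform bot workload reproduces.

```python
# Hypothetical sketch: summarizing an anonymized usage trace. Each event is
# (seconds_offset, user, action). The trace below is invented to show the
# "messy" shapes real users produce.
from collections import Counter

trace = [
    (0, "u1", "start"), (1, "u2", "start"),
    (2, "u1", "run_cell"), (2, "u1", "run_cell"), (2, "u1", "run_cell"),  # CPU burst
    (3, "u3", "start"), (3, "u3", "restart"), (4, "u3", "restart"),       # restart storm
    (5, "u2", "big_read"),                                                # large data pull
]

# Per-user activity is wildly uneven -- one user dominates, others idle.
actions_per_user = Counter(user for _, user, _ in trace)
restarts = sum(1 for _, _, action in trace if action == "restart")
print(actions_per_user, restarts)
```

A replay harness would feed events like these back at the platform on their original (or accelerated) schedule, preserving exactly the burstiness that synthetic load smooths away.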
Sustaining Jupyter Means Sustaining Its People
One of the most meaningful parts of the conference came from community-focused sessions about the lived experiences of JupyterHub maintainers. The Voices of JupyterHub session was honest in a way that stayed with me: contributor pathways are still hard to navigate, maintainers carry a heavy emotional and cognitive load, and the “bus factor” is uncomfortably real.
I felt incredibly grateful for the chance to talk to maintainers directly—people whose work shapes everything we do on DataHub. It reminded me how much of what we rely on comes from a small number of deeply committed individuals.
For DataHub, this is a wake-up call. We need to contribute upstream, welcome student developers into JupyterOps work, and create our own contributor pathways. If we benefit from this ecosystem, we have a responsibility to help sustain it.
Sustainability isn’t just technical. It’s relational.
Enterprise Teams Are Already Living Our Future
I loved chatting with people running long-standing enterprise Jupyter deployments—some supporting thousands of daily users. These teams were incredibly generous with their time and insights, and it felt like peeking into a possible future for DataHub.
They’re already solving challenges we’re starting to face:
- multi-tenant isolation
- image governance and compliance
- resource quotas and guardrails
- large-scale environment management
Their architectures—Kubernetes-native patterns, role-based access, automated image pipelines—offered a glimpse of how DataHub might evolve as more departments and research groups adopt it.
We don’t have to become an enterprise platform. But we can borrow the best ideas:
- automated environment pipelines
- consistent container governance
- tenant-level isolation
- robust platform health metrics
- reproducible cloud-native designs
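For the “quotas and guardrails” idea, here is a hedged sketch of what it can look like in a JupyterHub deployment’s `jupyterhub_config.py`. The option names are real JupyterHub/Spawner settings; the specific values are illustrative, not DataHub’s actual policy.

```python
# Fragment of a jupyterhub_config.py (not runnable standalone) -- per-user
# resource guardrails plus platform-level caps. Values are examples only.
c.Spawner.mem_guarantee = "1G"    # scheduler reserves at least this much per user
c.Spawner.mem_limit = "2G"        # hard ceiling before the kernel is reclaimed
c.Spawner.cpu_guarantee = 0.5
c.Spawner.cpu_limit = 2.0
c.Spawner.start_timeout = 120            # fail fast on stuck spawns
c.JupyterHub.active_server_limit = 500   # cap concurrent user servers
```

Limits like these are the unglamorous half of multi-tenancy: they keep one runaway notebook from degrading the experience for an entire class.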
I came away excited about how much we can learn from teams living “our future,” and grateful for their openness in sharing what works (and what doesn’t).
Where This Leaves Berkeley DataHub
Stepping back, the themes of JupyterCon—thoughtful pedagogy, real-world load, community care, and mature platform design—combine into a clear picture: scaling Jupyter isn’t a single technical challenge but a multidimensional one.
For DataHub, I’m taking home several priorities:
1. Realistic Load Testing
Test based on real human behavior—including replayed workloads and anomaly analysis.
2. Stronger Observability
Build dashboards that reflect user experience: session churn, memory spikes, I/O storms, restart loops—not just cluster metrics.
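As one small example of user-experience-level observability, here is a hypothetical sketch (invented helper and data, not an existing DataHub tool) that flags “restart loops” from per-user event logs, a signal that raw cluster metrics tend to miss.

```python
# Hypothetical sketch: flag users who restart their server more than
# `threshold` times within any `window`-second span.
from collections import defaultdict

def restart_loops(events, window=300, threshold=3):
    """events: list of (timestamp_seconds, user) restart events, any order."""
    by_user = defaultdict(list)
    for ts, user in events:
        by_user[user].append(ts)
    flagged = set()
    for user, times in by_user.items():
        times.sort()
        for i in range(len(times)):
            # count restarts in the window starting at times[i]
            j = i
            while j < len(times) and times[j] - times[i] <= window:
                j += 1
            if j - i > threshold:
                flagged.add(user)
                break
    return flagged

events = [(0, "a"), (60, "a"), (120, "a"), (180, "a"),  # 4 restarts in 3 minutes
          (0, "b"), (4000, "b")]                        # two isolated restarts
print(restart_loops(events))
```

Signals like this one sit much closer to what a frustrated student actually experiences than a node-level CPU graph does.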
3. Community Investment
Lean into our responsibility as part of the JupyterHub ecosystem: contribute upstream, collaborate with maintainers, and grow new contributors at Berkeley.
DataHub exists to make teaching, learning, and discovery easier at scale—and the conference reminded me just how many people around the world are working toward the same goal. Scaling isn’t just about bigger clusters. It’s about thoughtful pedagogy, resilient communities, understanding real user behavior, and learning from those who’ve already built these systems at massive scale. The future of DataHub is bright—and JupyterCon 2025 made me feel lucky to be part of the community building it.