Looking back over the past two decades, what do you see as the most significant milestones in perfSONAR’s evolution?
Mark: Three milestones stand out. First is perfSONAR coming into existence. The idea that you could ask systems across multiple organizations — often those you don’t control — to participate in a measurement was novel 20 years ago. It’s still rare outside of R&E.
Second is the establishment of the development collaboration. Having multiple organizations commit full-time, professional staff to the project gives it long-term support and a lot of accumulated institutional wisdom about how perfSONAR works and how best to apply it.
Third is the expansion of what perfSONAR could do that began in 2017 with the release of 4.0. The modular architecture of pScheduler made it possible to perform more types of measurements (approximately two dozen) with more tools (over 40) and to send the results to other systems for storage or further processing in more ways (about a dozen). The API that came with it turned perfSONAR into a measurement building block for other systems, making it more flexible and extensible. We’re now at version 5.2.2 and continue to reap the benefits of that shift.
How has perfSONAR shaped the way R&E networks think about and approach performance monitoring and troubleshooting?
Mark: In many ways, it goes the other way around:
R&E’s approach has shaped perfSONAR. At my first Internet2 Technology Exchange in
2015, I heard a lot from our members about what they wanted. The development team went on to
check off almost every box on that wishlist.
R&E networks differ from those in
other sectors because they’re operated by multiple organizations, both alone and in concert. The open
collection and sharing of performance data across networks is a side effect of everyone pursuing
the same goal: making these networks valuable tools for moving scientific data quickly
and efficiently.
To return to your original question, perfSONAR has influenced R&E’s
thinking through its by-the-community, for-the-community approach. Network operators have the freedom
to deploy it widely without worrying about any cost beyond the hardware, and even that can come
off the surplus pile. Our organizational structure has even been used as a template for other
community-led projects.
How is the implementation of perfSONAR on Internet2’s network benefiting the community?
Mark: perfSONAR’s mantra has always been “the more,
the merrier,” because more measurement points mean a finer-grained view of exactly where a
problem is occurring. Historically, Internet2 had perfSONAR nodes at every point of presence,
with many supporting internal visibility and operational needs and a few open to the community.
With the launch of the Next Generation Infrastructure in 2021, open-to-the-community perfSONAR
now exists everywhere on the Internet2 network, giving members end-to-end visibility:
from their devices, to the nearest edge of the Internet2 backbone, to the points that handle
their data as it travels.
Can you share a story or example where perfSONAR made a real difference?
Mark: One of my favorite examples happened during
SC18, where I ran into a network engineer from a small university in New England who was
struggling with dropped packets that disrupted measurements with other institutions. He
suspected one specific device, but didn’t have a good way to prove it.
We talked a
bit about his test infrastructure and found two of his perfSONAR nodes that were well-placed to
narrow it down. We ran some ad hoc tests from my laptop, which we perched atop the only surface
we could find, a stray stool. The first ones replicated the problem, and we did some follow-ons
to see where it stopped. It turned out that the suspect device was dropping otherwise-valid ICMP
packets above a certain size but smaller than the MTUs on the interfaces. The device vendor
confirmed the bug, issued a patch, and the problem was solved.
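A size-dependent drop like this one can be narrowed down by bisecting on packet size. The following is a minimal sketch, not a record of what was run at SC18: the `probe` callable is hypothetical and would in practice wrap a ping or a round-trip-time measurement, and the sketch assumes that every size above the threshold fails while every size at or below it succeeds.

```python
from typing import Callable


def find_drop_threshold(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Return the smallest payload size in (low, high] at which probe fails.

    probe(size) is a hypothetical callable that sends a packet with the given
    payload size and returns True if a reply came back. The search assumes
    probe(low) succeeds, probe(high) fails, and failures are monotonic in size.
    """
    if not probe(low):
        raise ValueError("smallest size already fails; nothing to bisect")
    if probe(high):
        raise ValueError("largest size succeeds; no threshold in this range")

    # Invariant: probe(low) succeeds and probe(high) fails.
    while high - low > 1:
        mid = (low + high) // 2
        if probe(mid):
            low = mid
        else:
            high = mid
    return high


def simulated_probe(size: int) -> bool:
    """Stand-in for a real measurement: pretend sizes above 1200 are dropped."""
    return size <= 1200


if __name__ == "__main__":
    threshold = find_drop_threshold(simulated_probe, low=64, high=1472)
    print(f"Packets start failing at a payload of {threshold} bytes")
```

Running the same search between successive pairs of measurement points shows which segment of the path introduces the threshold.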
How is perfSONAR preparing for the emerging challenges and opportunities facing the R&E community?
Mark: The development team has spent the last decade
changing nearly every corner of perfSONAR from something with limited capabilities to something
we can expand easily. The current architecture doesn’t do absolutely everything we or anyone
else in the community has dreamed up, but it has handled nearly all of the curveballs thrown at
it. That flexibility gives me a lot of confidence that, as R&E comes up with new things to
measure or ways to report the results, we’ll be able to integrate them.
Looking to the next perfSONAR release, what new capabilities can the user community expect?
Mark: The theme for version 5.x, which has been
underway for two years, has been overhauling storage and visualization. A lot of that software
was developed in-house, which was the right thing to do at the time, but we’ve since been able to
build on well-understood, open-source software like Logstash, OpenSearch, and Grafana. That
puts a lot of power to analyze and display results in the hands of the community, who can do
that on their own terms.
pSConfig, our program for centrally
managing fleets of perfSONAR nodes, was overhauled in an earlier release and will continue to be
part of the ecosystem. Its companion tool, the pSConfig Web Administrator, which allows
building up mesh configurations without having to handwrite JSON, will see a significant upgrade
in a future release. The new tool, called pSCompose, will interface directly with pScheduler, providing immediate access to new tests,
tools, and archivers.
Imagine perfSONAR 10 years from now – what will it be doing that it doesn’t do today?
Mark: perfSONAR primarily resides at Layer 3 because
that’s where science data lives. As boring as it may sound, I expect the emphasis to remain
there for quite some time. perfSONAR’s architecture will allow integration with new tools to
measure new protocols as they evolve.
What I think will be most interesting is less
about new protocols and tools and more about unusual applications. For example, I’ve been
thinking about ways to use perfSONAR as a unit-testing framework to verify that network security
policies have been properly implemented, maybe even cooperatively between institutions. Doing
this as part of a regular testing regime would shine a light on problems that might have
remained hidden until an incident happened.
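One way to picture the unit-testing idea is as a table of expectations checked against observed reachability. This is a sketch under stated assumptions, not an existing perfSONAR feature: `PolicyExpectation`, `check_policies`, and the `can_connect` callable are hypothetical names, and `can_connect` stands in for whatever measurement would actually test the path.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class PolicyExpectation:
    """One line of security policy: should source reach dest on port?"""
    source: str
    dest: str
    port: int
    should_connect: bool


def check_policies(
    expectations: Iterable[PolicyExpectation],
    can_connect: Callable[[str, str, int], bool],
) -> List[PolicyExpectation]:
    """Return the expectations whose observed result contradicts the policy.

    can_connect(source, dest, port) is a hypothetical callable that runs a
    measurement from source to dest and reports whether the port was reachable.
    """
    return [
        e for e in expectations
        if can_connect(e.source, e.dest, e.port) != e.should_connect
    ]


if __name__ == "__main__":
    # Hypothetical hosts and a fake measurement, for illustration only.
    policy = [
        PolicyExpectation("node-a.example.edu", "db.example.edu", 5432, False),
        PolicyExpectation("node-a.example.edu", "www.example.edu", 443, True),
    ]

    def fake_measurement(source: str, dest: str, port: int) -> bool:
        return True  # pretend everything is reachable

    for violation in check_policies(policy, fake_measurement):
        print(f"Policy violation: {violation}")
```

Run on a schedule, a check like this would surface a port that was opened by mistake as soon as the next test cycle completes.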
Is there anything else happening with perfSONAR this year that the community should be aware of?
Mark: perfSONAR will once again be part of SCinet at SC25, Nov. 16-21 in St. Louis, its 20th
appearance since the prototype debut in 2005. Like the rest of SCinet, the deployment there has
become a testing ground for new ways to run it. Much of what we do there has cross-pollinated what we
do at Internet2 and what the project recommends to perfSONAR administrators.
The 2025 Technology Exchange, Dec. 8-12 in Denver, will offer
a full-day perfSONAR workshop on Dec. 8. It will cover everything from installation to using
pScheduler’s command-line interface to customizing Grafana dashboards. The development team will
be giving a talk about the current and future state of the project in the Advanced Networking
track on Dec. 10.
We have a few other things in the pipeline that aren’t ready to be
shared yet, so keep your ears open.