5 Questions for Karl Newell and Chris Wilkinson on Embracing Network Automation at Internet2
By Amber Rasche - Senior Communications Specialist, Internet2
Let’s make network automation a priority in 2023. We’ve heard the community loud and clear: It’s time to move the conversation from “if” to “when and how” the research and education (R&E) community can embrace automation for the betterment of all.
In this “Embracing Network Automation” blog series, we’re gathering insights from several R&E network organizations that are paving the way for progress in this space.
Karl Newell is the network software architect and Chris Wilkinson is the senior director of infrastructure engineering and architecture for Internet2 Network Services. In this Q&A, Karl and Chris discuss how Internet2 got started on its network automation journey and unpack some of the biggest wins and hardest lessons learned along the way. They also share what’s next for Internet2 in the automation space and offer advice on how others in the R&E community can embrace network automation to benefit their organizations and the people they serve.
When did Internet2 first embark on its network automation journey?
Chris Wilkinson: Arguably, Internet2 has been using automation in various forms for well over a decade. One of the first examples is OESS, or the Open Exchange Software Suite, a web interface and API that enabled our members to self-provision layer 2 and layer 3 services on the Internet2 network, including Cloud Connect. OESS created active configuration elements on Internet2 switches, and it maintained its own source of truth. Before the Internet2 Next Generation Infrastructure (NGI) project, the GlobalNOC developed a series of tools that allowed engineers to test and deploy groups of configuration elements to Internet2 switches. These show a progression of increasingly complex, community-developed software packages combining APIs, or application programming interfaces, and manual interfaces to provision services – I think by most standards those examples meet the definition of automation as we understand it today.
Karl Newell: In those earlier examples, I think it’s important to clarify that automation facilitated service provisioning but did not cover all of the device configurations. So there was still a lot of manual configuration work that was done by engineers. During the planning stages for NGI in 2018, we realized we were going to double the number of devices in the network and needed to automate the provisioning and management of network configuration, too. We set a goal to automate 100% of the network configuration and 100% of the service provisioning, and we achieved that goal with NGI.
What are some of the biggest wins Internet2 has had in the network automation space and how are those successes benefiting the community you serve?
Karl Newell: One of the first automation benefits we saw with NGI was during the migration to the new network. We migrated over 1,200 BGP peers in 30 days – up to 250 BGP peers in one night – a much more rapid pace than we had ever achieved before.
Since then, configuration consistency – and the stability it provides – has been another big win. Manual configuration can result in configuration drift: engineers copy and paste elements from old devices and make changes, and any mistakes carry over from device to device. Over time, your configuration slowly drifts and becomes inconsistent across devices, which can cause huge problems when engineers are troubleshooting, mitigating security vulnerabilities, or deploying updates. As a large service provider network, we have a lot of devices and our route policy is extremely complex, so consistency through automation and orchestration alleviates these problems and ultimately results in a more stable, resilient network.
Chris Wilkinson: And that’s not to say that our engineers never log into network devices; there are cases where they make manual changes on the fly. But well-executed automation and orchestration work to minimize such actions and force an acceptance process. For example, if our solution detects manual entries, it prompts us to review those and bring them into standardization. Automation provides checks and balances to prevent that configuration drift that Karl mentioned.
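Editor's illustration: the review step described here can be pictured as a comparison between the intended configuration held in a source of truth and what is actually running on a device. The sketch below is a minimal Python example under stated assumptions. It is not Internet2's implementation and does not use the NSO API; the function name `detect_drift`, the sample configuration lines, and the AS numbers are hypothetical, and a flat line-by-line comparison ignores configuration hierarchy.

```python
def detect_drift(intended: str, running: str) -> dict[str, list[str]]:
    """Report lines that differ between intended and running config.

    Hypothetical helper: a flat, line-based comparison that ignores
    indentation and blank lines. Real tools compare structured data.
    """
    def normalize(text: str) -> list[str]:
        return [line.strip() for line in text.splitlines() if line.strip()]

    intended_lines = normalize(intended)
    running_lines = normalize(running)
    intended_set = set(intended_lines)
    running_set = set(running_lines)
    return {
        # On the device but not in the source of truth (likely manual entries).
        "unexpected": [l for l in running_lines if l not in intended_set],
        # In the source of truth but absent from the device.
        "missing": [l for l in intended_lines if l not in running_set],
    }


# Illustrative data only: documentation addresses and private AS numbers.
INTENDED = """
router bgp 64512
 neighbor 192.0.2.1 remote-as 64513
 neighbor 192.0.2.1 description member-a
"""

RUNNING = """
router bgp 64512
 neighbor 192.0.2.1 remote-as 64513
 neighbor 192.0.2.9 remote-as 64999
"""

report = detect_drift(INTENDED, RUNNING)
for kind, lines in report.items():
    for line in lines:
        print(f"{kind}: {line}")
```

In this sketch, anything listed under `unexpected` would be a candidate for review: either it is adopted into the standard configuration or it is removed from the device.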
I would also add simpler and faster service provisioning as a clear benefit. We use Cisco Network Services Orchestrator (NSO) and, once an engineer is trained to use NSO, it’s more of a cookbook-type provisioning exercise. We have all the ingredients – that is, valid and consistent configuration elements – already gathered in NSO. Engineers no longer have to be in the weeds dealing with 50-plus lines of configuration code; they only need to deal with the 5 lines that actually require their input. So it’s much more straightforward.
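Editor's illustration: the "cookbook" idea, where an engineer supplies a handful of inputs and the rest of the configuration comes from a vetted template, can be sketched as follows. This is a simplified Python example, not an NSO service model. The class `PeerRequest`, the function `render_peer`, the constant `LOCAL_AS`, the policy names, and the configuration syntax are all assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass

LOCAL_AS = 64512  # hypothetical local AS number (private range)


@dataclass(frozen=True)
class PeerRequest:
    """The few values an engineer actually has to provide (hypothetical)."""
    member: str
    neighbor_ip: str
    remote_as: int
    max_prefixes: int


def render_peer(req: PeerRequest) -> str:
    """Expand a short request into a longer, standardized config stanza."""
    # Validate inputs before any configuration is generated.
    ipaddress.ip_address(req.neighbor_ip)  # raises ValueError if invalid
    if not 1 <= req.remote_as <= 4294967295:
        raise ValueError(f"invalid AS number: {req.remote_as}")

    ip = req.neighbor_ip
    lines = [
        f"router bgp {LOCAL_AS}",
        f" neighbor {ip} remote-as {req.remote_as}",
        f" neighbor {ip} description {req.member}",
        f" neighbor {ip} maximum-prefix {req.max_prefixes}",
        # Standard elements come from the template, not from the engineer.
        f" neighbor {ip} route-policy {req.member.upper()}-IN in",
        f" neighbor {ip} route-policy STANDARD-OUT out",
    ]
    return "\n".join(lines)


config = render_peer(PeerRequest("member-a", "192.0.2.1", 64513, 1000))
print(config)
```

The design point is that validation and the standard policy lines live in one place, so every peer rendered this way is consistent by construction.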
Karl Newell: Along those same lines, we can also deploy mass updates quickly and easily. For example, if we found an issue in our security policy or route policy, we could very quickly redeploy configurations network-wide. That’s in stark contrast to the previous state where we had to log into each router, which is time-consuming.
One last benefit I’ll mention is being able to provide our membership with visibility into their services. Through NSO and the work we did to prepare for automation, we’ve got our data and service models in a very normalized and structured state. As an example, we know that a specific BGP peer is associated with a particular member, so we can make that service visible to them within the new Internet2 Insight Console, where they can also manage and troubleshoot it. The consistency required up front and maintained through automation and orchestration makes this possible.
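Editor's illustration: once every service record carries its owning member, showing members their own services becomes a simple lookup. The Python sketch below assumes a hypothetical record type `BgpPeer` and helper `services_by_member`; the device names, member names, and addresses are invented, and none of this reflects the actual data model behind the Internet2 Insight Console.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpPeer:
    """A normalized service record (hypothetical fields)."""
    member: str
    device: str
    neighbor_ip: str
    remote_as: int


def services_by_member(peers: list[BgpPeer]) -> dict[str, list[BgpPeer]]:
    """Index peers by owning member so each member sees only its own."""
    index: dict[str, list[BgpPeer]] = defaultdict(list)
    for peer in peers:
        index[peer.member].append(peer)
    return dict(index)


# Illustrative records only.
PEERS = [
    BgpPeer("member-a", "router-1", "192.0.2.1", 64513),
    BgpPeer("member-b", "router-1", "192.0.2.5", 64514),
    BgpPeer("member-a", "router-2", "198.51.100.1", 64513),
]

index = services_by_member(PEERS)
for peer in index["member-a"]:
    print(f"{peer.device}: {peer.neighbor_ip} (AS{peer.remote_as})")
```

This only works when the member association is recorded consistently for every peer, which is the up-front normalization the answer describes.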
What are two of the hardest lessons Internet2 has learned about network automation along the way?
Chris Wilkinson: Automation is a bit of a paradigm shift for network engineers, and training is essential for helping them get accustomed to using tools like NSO. All of the automation and orchestration tools on the market, including NSO, are Swiss army knives meant to be adapted to your particular use case. So generic network automation training isn’t an option.
Karl Newell: I agree. Our software development and network engineering teams are tightly integrated, but we saw gaps in training and knowledge around our automation framework on the operations side. And to Chris’s point, we needed to develop training tailored to our use case while building the automation solution. The great part about that is, once engineers were trained in how to use NSO, they quickly adapted and started using it in ways we never imagined because it made their lives easier. Those new insights fed into our development work and helped us quickly respond to changes in design and scope.
Chris Wilkinson: Automation also requires substantial sources of truth to be successful. So there’s a lot of upfront work required to generate those inputs and then develop the automation software and underlying orchestration tools.
Karl Newell: Yes, and because of that, I think false starts and mistakes are inevitable. So the sooner you can get started the better because it’s going to be slower than expected and needs to be iterative. We rewrote our initial NSO service models more than once as we figured out the best way to leverage the tool. One hard but important lesson we learned is that it’s okay to throw away work – it’s more productive than trying to hold onto something that doesn’t meet your needs. It’s not a waste of effort. We learned a lot along the way.
Chris Wilkinson: Absolutely. Having a solid project plan with adequate time and resources is ideal. But automation isn’t easy, it doesn’t deliver instant results, and there’s never enough time – your team has to make compromises and be agile to get the job done.
What’s next for Internet2 in the network automation space?
Chris Wilkinson: With our orchestration tools reaching a level of maturity, we’re very much focused on the member-facing interfaces and tools that are web-based and API-based.
Karl Newell: That’s right. Through the Internet2 Insight Console, we are empowering community members to visualize, manage, and troubleshoot Internet2 network services. Looking Glass, which is an enhanced replacement for the Router Proxy service, is the first feature we made available to the community last month. We will soon begin piloting Virtual Networks, a replacement for OESS based on modern frameworks and design principles. Future releases will address routing integrity issues by providing more access to services like BGP peering configurations and prefix management, as well as visibility into telemetry and monitoring, routes announced and rejected, and RPKI Route Origin Authorization status.
We also want to help the community make progress with automation. How can we help members get started or keep moving forward? How do we all work together? The Internet2 NTAC Network Automation Special Interest Group convenes community members monthly to talk about automation ideas, challenges, and progress. (Internet2 community members who are interested in joining can complete this form.) We also have some great sessions on automation taking place at the 2023 Internet2 Community Exchange, including two workshops: “Get Started with Network Automation” and “Automation with Cisco Network Services Orchestrator.” We hope to see you there!
What advice would you give to peers across the R&E community on how to embrace network automation at their organizations?
Karl Newell: Whenever I talk to community members about network automation, I always stress the importance of integrating and cross-training your network engineering and software development staff. In other words, break down the silos and ensure they’re working together.
Also, don’t boil the ocean. The biggest impediment to getting started in network automation is trying to do too much at once. It’s easy to get overwhelmed. You may not know where to start or get hung up with unreasonable expectations looking for the easy button that doesn’t exist. Instead, identify a well-defined process – something you do frequently – and then figure out how to automate it.
Chris Wilkinson: To bolster what Karl advised, make sure your organization also has executive support and cross-team buy-in for automation, including the resources you need to get the job done. There is no doubt that organizations are almost always resource-constrained, but having executive support and knowing automation is a priority for the community made it much easier for Internet2 to have success in this space.