When Microsoft launched its Copilot+ PC range almost a year ago, it announced that it would deliver the Copilot Runtime, a set of tools to help developers take advantage of the devices’ built-in AI accelerators, in the shape of neural processing units (NPUs). Instead of massive cloud-hosted models, this new class of hardware would encourage the use of smaller, local AI, keeping users’ personal information where it belonged.
NPUs are key to this promise, delivering at least 40 trillion operations per second (TOPS). They’re designed to support modern machine learning models, providing dedicated compute for the neural networks that underpin much of today’s AI. An NPU is a massively parallel device with an architecture similar to a GPU’s, but its instruction set is focused purely on the requirements of AI, supporting the feedback loops a deep learning neural network needs.
The slow arrival of the Copilot Runtime
It’s taken nearly a year for the first tools to arrive, many of them still in preview. To be fair, that’s not surprising, considering the planned breadth of the Copilot Runtime and the need to deliver a set of reliable tools and services. Still, it’s taken longer than Microsoft initially promised.
Why PyTorch?
PyTorch provides a set of abstractions and features that help you build more complex models, with support for tensors and neural networks. Tensors make it easy to work with large multidimensional arrays, a key tool for neural network–based machine learning. At the same time, PyTorch provides basic neural network building blocks for defining and training your machine learning models, managing the forward passes through the network.
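To make that concrete, here’s a minimal sketch of those abstractions, a toy model rather than anything from Microsoft’s sample code, with purely illustrative layer sizes:

```python
# A minimal sketch: tensors plus a small nn.Module, not Microsoft's sample code.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """A two-layer network showing how PyTorch manages forward passes."""
    def __init__(self, in_features=4, hidden=8, classes=3):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, classes),
        )

    def forward(self, x):
        return self.layers(x)

model = TinyClassifier()
batch = torch.rand(16, 4)   # a tensor: 16 samples with 4 features each
logits = model(batch)       # a forward pass through the network
print(logits.shape)         # torch.Size([16, 3])
```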
It’s a useful tool, not least because it’s the framework behind many of the models shared by open source AI communities such as Hugging Face. With PyTorch you can quickly write code to experiment with models, seeing how changes in parameters, tuning, or training data affect outputs.
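For example, a toy training loop like the following (the data, model, and learning rate are all illustrative) lets you change a hyperparameter on one line and rerun to see how the loss responds:

```python
# A toy training loop; data, model sizes, and learning rate are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(64, 4)            # stand-in training data
y = torch.randint(0, 3, (64,))   # stand-in labels

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # edit lr and rerun to compare
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()              # backward pass to compute gradients
    optimizer.step()

print(f"final loss: {loss.item():.4f}")
```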
Bringing PyTorch to Arm
With Copilot+ PCs at the heart of Microsoft’s endpoint AI development strategy, they need to be as much a developer platform as an end-user device. As a result, Microsoft has been delivering more and more Arm-based developer tools. The latest is a set of Arm-native builds of PyTorch and its LibTorch libraries. Sadly, these builds don’t yet support Qualcomm’s Hexagon NPUs, but the Snapdragon X processors in Arm-based Copilot+ PCs are more than capable of running even relatively complex generative AI models.
Installing PyTorch on Windows on Arm
I tried it out using a seventh-generation Surface Laptop with a 12-core Qualcomm Snapdragon X Elite processor and 16GB of RAM. (Although it worked, it showed an interesting gap in Microsoft’s testing: The chipset I used was not in the headers for the code used to compile PyTorch.) As with most development platforms, it’s a matter of getting your toolchain in place before you start coding, so be sure to follow the directions in the announcement blog post.
As PyTorch depends on compiling many of its modules as part of installation, you need to have installed the Visual Studio Build Tools, with support for C++, before installing Python. If you’re using Visual Studio, make sure you’ve enabled Desktop Development with C++ and installed the latest Arm64 build tools. Next, install Rust, using the standard Rust installer. This will automatically detect the Arm processor and ensure you have the right version.
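With the toolchain in place, a quick sanity check, not part of Microsoft’s instructions but a reasonable habit, confirms that you’re running an Arm64 Python interpreter and a working native PyTorch build:

```python
# A quick sanity check (an assumption on my part, not from the blog post):
# confirm the interpreter and PyTorch build are native Arm64.
import platform
import torch

print(platform.machine())    # expect 'ARM64' on Windows on Arm
print(torch.__version__)     # the Arm-native build you just installed

a = torch.rand(512, 512)
b = torch.rand(512, 512)
print((a @ b).sum().item())  # a matrix multiply exercises the native kernels
```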
Running AI models in PyTorch on Windows
You’re now ready to start experimenting with PyTorch to build, train, or test models. Microsoft provided some sample code as part of its announcement, but I found its formatting didn’t copy cleanly into Visual Studio Code, so I downloaded the files from a linked GitHub repository. This turned out to be the right choice, as the blog post didn’t include the requirements.txt file needed to install the necessary dependencies.
The sample code downloads a pretrained Stable Diffusion model from Hugging Face and then sets up an inferencing pipeline around PyTorch, implementing a simple web server and a UI that takes a prompt, lets you tune the number of inference passes, and sets the seed used. Generating an image takes around 30 seconds on a 12-core Snapdragon X Elite, with the only real constraint being available memory. You can get details of the operations (and launch the application) from the Visual Studio Code terminal.
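Microsoft’s sample wraps this in a web server and UI, but the core of a Stable Diffusion inferencing pipeline built on Hugging Face’s diffusers library looks roughly like this; the model ID, prompt, step count, and seed here are illustrative and may not match the sample’s:

```python
# A hedged sketch of the core inferencing pipeline; the model ID, prompt,
# step count, and seed are illustrative, not taken from Microsoft's sample.
import torch
from diffusers import StableDiffusionPipeline

model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"   # assumed model ID
pipe = StableDiffusionPipeline.from_pretrained(model_id)   # runs on the CPU by default here

prompt = "a watercolor painting of a lighthouse at dawn"
generator = torch.Generator().manual_seed(42)              # fixed seed for repeatable output

image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]
image.save("lighthouse.png")
```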
Source: https://www.infoworld.com/article/3980180/running-pytorch-on-an-arm-copilot-pc.html