European startup Pruna AI has announced the launch of its open-source framework for optimizing AI models through compression methods such as caching, pruning, quantization, and distillation. The framework aims to make models more efficient while standardizing how compressed models are saved and loaded.
John Rachwan, co-founder and CTO of Pruna AI, explained in an interview with TechCrunch: “We are standardizing the call, save, and load processes for compression methods much like Hugging Face did for transformers and diffusion models.” The framework also evaluates whether a compressed model suffers significant quality loss and measures the performance gains it delivers.
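To make the idea concrete, here is a minimal sketch of the kind of unified compress/save/load interface being described. The `compress` and `save` helpers and the method names are hypothetical, not Pruna AI’s actual API; the underlying operations borrow PyTorch’s built-in quantization and pruning utilities.

```python
# Illustrative only: a hypothetical unified interface over several
# compression methods, in the spirit of what the framework standardizes.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def compress(model: nn.Module, methods: list[str]) -> nn.Module:
    """One call applies a pipeline of compression methods (hypothetical API)."""
    for method in methods:
        if method == "prune":
            # Zero out the 30% smallest-magnitude weights in each linear layer.
            for module in model.modules():
                if isinstance(module, nn.Linear):
                    prune.l1_unstructured(module, name="weight", amount=0.3)
                    prune.remove(module, "weight")  # bake the mask into the weight
        elif method == "quantize":
            # Dynamic int8 quantization of linear layers (PyTorch's real API).
            model = torch.quantization.quantize_dynamic(
                model, {nn.Linear}, dtype=torch.qint8
            )
    return model

def save(model: nn.Module, path: str) -> None:
    """Standardized save, regardless of which methods were applied."""
    torch.save(model.state_dict(), path)

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
compressed = compress(model, methods=["prune", "quantize"])
save(compressed, "compressed_model.pt")
```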
Major AI labs already use a variety of compression techniques. OpenAI, for instance, applied distillation to create GPT-4 Turbo, a faster version of GPT-4. Distillation, which trains a smaller, more efficient model to reproduce the outputs of a larger one, has proven effective elsewhere too, as shown by Black Forest Labs’ Flux.1-schnell, a distilled version of its Flux.1 image model.
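The recipe is easy to see in miniature. Below is a self-contained sketch of the classic distillation loss (Hinton et al.’s softened-logits objective) using stand-in models and random data; it illustrates the general technique, not how OpenAI or Black Forest Labs actually trained their models.

```python
# Knowledge distillation sketch: the small student mimics the large
# teacher's softened output distribution. Models and data are stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens both distributions

for _ in range(100):
    x = torch.randn(32, 128)               # stand-in training batch
    with torch.no_grad():
        teacher_logits = teacher(x)         # the larger model's "knowledge"
    student_logits = student(x)
    # KL divergence between softened distributions; T**2 rescales gradients.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```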
Rachwan pointed out that while large companies often build such tooling internally, Pruna AI seeks to provide an all-in-one solution in the open-source world. “What you typically find in open-source is based on individual methods, such as one method of quantization for LLMs or one caching method for diffusion models,” he said. “Pruna combines them all, simplifying usage and integration.”
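As a concrete example of that fragmentation, quantizing an LLM today typically means reaching for one specific library; the snippet below shows one such single-method path, 4-bit quantization through bitsandbytes as exposed by Hugging Face transformers (the checkpoint name is just a placeholder). Caching for a diffusion model would require a different tool entirely.

```python
# One method, one tool: 4-bit quantization of a causal LM via bitsandbytes.
# Requires `pip install transformers bitsandbytes` and a CUDA GPU.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder; any causal LM checkpoint works
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```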
The framework supports a wide range of model types, though the company’s current focus is on generative image and video models. Clients already using it include Scenario and PhotoRoom.
In addition to the open-source version, Pruna AI offers an enterprise solution featuring an optimization agent that automatically finds the best compression parameters. “You upload your model and specify that you want more speed but with no accuracy loss greater than 2%. The agent will then determine the best combination of techniques,” Rachwan said.
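A rough sketch of what such an agent could do under the hood follows. Everything here is hypothetical: the method list, the `apply_fn`/`accuracy_fn`/`latency_fn` callbacks, and the exhaustive loop, which a production agent would replace with a far smarter search.

```python
# Hypothetical agent logic: find the fastest compression combination
# whose accuracy drop stays within the user's budget (e.g. 2%).
import itertools

METHODS = ["quantization", "pruning", "caching", "distillation"]

def find_best_config(model, apply_fn, accuracy_fn, latency_fn, max_drop=0.02):
    baseline = accuracy_fn(model)
    best_combo, best_latency = None, float("inf")
    # Exhaustively try every non-empty combination of methods.
    for r in range(1, len(METHODS) + 1):
        for combo in itertools.combinations(METHODS, r):
            candidate = apply_fn(model, combo)
            if baseline - accuracy_fn(candidate) > max_drop:
                continue  # violates the "no more than 2% loss" constraint
            latency = latency_fn(candidate)
            if latency < best_latency:
                best_combo, best_latency = combo, latency
    return best_combo
```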
For the professional version, Pruna AI uses a pay-as-you-go model, similar to renting GPUs from a cloud provider. Rachwan emphasized the financial upside: a properly optimized model can cut computing costs substantially, and in one case Pruna AI made a Llama model eight times smaller with negligible quality loss.
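For a sense of scale, some back-of-envelope arithmetic (our assumptions about model size and precision, not figures from Pruna AI):

```python
# Assumption: a 7B-parameter Llama model with an fp16 baseline. An 8x size
# reduction roughly matches dropping from 16-bit to 2-bit weights.
params = 7e9
fp16_gb = params * 2 / 1e9        # 2 bytes per weight  -> ~14.0 GB
compressed_gb = fp16_gb / 8       # eight times smaller -> ~1.75 GB
print(f"fp16: {fp16_gb:.1f} GB -> compressed: {compressed_gb:.2f} GB")
```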
Pruna AI recently raised $6.5 million in seed funding from investors including EQT Ventures, Daphni, Motier Ventures, and Kima Ventures. The open-source framework is set to be released this Thursday.
