When Less Means More: How Jevons Paradox Applies to Our Post-DeepSeek World
DeepSeek R1's radical efficiency marks a paradigm shift in AI, dramatically reducing training costs and resource needs. Enterprises must prepare for increased AI experimentation, deployment and demand, leveraging AI for maximum strategic advantage.
Efficiency unlocks the next AI revolution
For the past several years, large language models (LLMs) have followed a predictable trajectory: scale, scale and scale some more. More parameters, more compute and more GPUs — a brute-force approach that has delivered state-of-the-art performance but at an astronomical cost.
Then, DeepSeek R1 happened.
DeepSeek's radical efficiency — using just 2,048 GPUs instead of more than 100,000 to train a cutting-edge model — proves the AI arms race is shifting from sheer size to intelligent optimization.
DeepSeek isn't just an incremental step forward: it's a fundamental shift in how AI is built, deployed and commercialized.
Here's where things get interesting. Whenever a technology becomes dramatically more efficient, its use doesn't plateau — it explodes. This phenomenon, known as Jevons Paradox, suggests that as AI models become cheaper to train and infer, several things are likely to happen:
- Hardware providers: High-performance hardware demand will continue to increase
- Cloud providers: GPUaaS and related AI PaaS demand will increase
- Frontier model providers: Models will be iterated and improved at a much more rapid pace
- AI App providers: New capabilities will be released more frequently
- Enterprises: There will be an explosion of internal experimentation, AI development and AI app vendors to choose from
For enterprises, it will be key to prepare for this in three ways:
- Have a plan to manage cloud and/or infrastructure costs
- Develop a method for judiciously selecting which AI application vendors to "hitch your wagon" to and how to continue to experiment with others
- Create a governance process to determine which use cases are worth building custom AI capabilities for
This will allow enterprises to efficiently separate the signal from the noise and direct their most precious resources toward the use cases with the highest value potential.
In this post-DeepSeek world, businesses that invest in AI efficiency now will own the future of their industries.
From brute-force scaling to smarter scaling
The chart below highlights a striking trend: GenAI research has skyrocketed from a handful of papers in 2017 to nearly 20,000 published in 2024.
This surge in AI performance isn't only about raw computing power or massive capital investment. While scaling compute has enabled larger, more capable models, significant breakthroughs have resulted from the ingenuity of AI researchers. Advances in novel architectures and training strategies — like those behind DeepSeek R1 — show that fresh ideas can drive performance leaps beyond what brute-force scaling alone can achieve.
To appreciate the magnitude of this shift, let's compare the old and new paradigms of AI development:
| Feature | Pre-DeepSeek Scaling (GPT-4, Gemini, LLaMA 2/3, etc.) | Post-DeepSeek R1 Scaling (New Paradigm) |
| --- | --- | --- |
| Parameter Utilization | All parameters (trillions) loaded and used at once | Mixture of Experts (MoE): only a subset (e.g., 37B of 671B) activated per query |
| Precision Optimization | FP32 (full precision) or FP16 (half precision) | Reduced precision (potentially 8-bit or lower) for a smaller memory footprint |
| Inference Efficiency | Large memory footprint required for all queries | Dynamic activation of specialized modules for efficiency |
| Training Compute Needs | Tens of thousands of NVIDIA A100/H100 GPUs (e.g., 100,000+ GPUs for GPT-4) | ~2,000 GPUs (e.g., 2,048 H800 GPUs for DeepSeek R1) |
| Training Cost | $100M–$500M per training run | Estimated $5M–$20M per training run |
| Inference Speed | Slower due to full model activation | Faster token processing through modular computation |
| Hardware Requirements | High-end NVIDIA GPUs (A100/H100) with proprietary optimizations | Mix of commodity chips (e.g., RISC-V) and GPUs |
| Memory Usage | High memory usage per token | Over 75% memory reduction via MoE and precision reduction |
| Accessibility for Startups | Prohibitively expensive for new entrants | Drastically lower cost barrier, enabling smaller players |
| Scalability Bottlenecks | GPU scarcity and high costs | Reduced compute needs ease the GPU shortage |
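As a rough sanity check on the cost row: DeepSeek's V3 technical report cites about 2.788 million H800 GPU-hours for pretraining at an assumed rental rate of $2 per GPU-hour, which works out to roughly $5.6M (2.788M × $2). R1 builds on the V3 base model, so this figure is indicative rather than a complete accounting, but it is consistent with the low end of the estimate above.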
DeepSeek's model shows that frontier model companies no longer need hyperscale infrastructure to develop competitive AI. Instead, mixture-of-experts (MoE) architectures, lower-precision computing and selective activation of model parameters have slashed costs while improving efficiency.
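To make "selective activation" concrete, below is a minimal, toy mixture-of-experts layer in PyTorch. It is an illustrative sketch only; the class name, dimensions and top-k routing are simplified stand-ins for what production systems like DeepSeek R1 do at far larger scale.

```python
# Toy mixture-of-experts (MoE) layer: each token is routed to only top_k of
# n_experts, so most parameters stay idle per query. Illustrative only; not
# DeepSeek's actual architecture.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, dim: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)  # router: scores every expert per token
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (n_tokens, dim)
        scores = self.gate(x)                             # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # keep only the top-k experts
        weights = weights.softmax(dim=-1)                 # normalize routing weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = ToyMoE()
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64]); only 2 of 8 experts ran per token
```

The key point is in the routing loop: each token touches only `top_k` of the experts, so compute and memory per query scale with the active subset rather than the full parameter count.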
Jevons Paradox in AI: The more efficient, the more demand
At first glance, reducing AI training costs might seem like it would reduce the need for compute overall — after all, fewer resources are needed per training run. But Jevons Paradox tells us the opposite happens: when technology becomes more efficient, consumption skyrockets.
Historically, this paradox has played out across multiple industries:
- Coal and the Industrial Revolution: When James Watt improved steam engine efficiency, coal demand soared because more industries could afford to use steam power.
- Semiconductors and computing: As Moore's Law drove down the cost per transistor, computing became ubiquitous, leading to today's AI-driven world.
- Cloud computing: As AWS, Azure and Google Cloud lowered computing costs, enterprises didn't spend less on IT — they spent more but received exponentially more value.
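The mechanism behind these examples can be captured in a few lines. Below is a toy constant-elasticity demand model (an illustration, not data from any of the cases above): when demand elasticity exceeds 1, a price drop increases total spend.

```python
# Illustrative rebound-effect model: quantity demanded Q = k * P**(-elasticity),
# so total spend = P * Q. The parameter values are hypothetical.
def total_spend(price: float, elasticity: float, k: float = 1.0) -> float:
    quantity = k * price ** (-elasticity)
    return price * quantity

for eps in (0.5, 1.0, 1.5):
    change = total_spend(0.1, eps) / total_spend(1.0, eps)  # price drops 10x
    print(f"elasticity={eps}: total spend changes {change:.2f}x")
# elasticity=0.5 -> 0.32x (spend falls); 1.0 -> 1.00x; 1.5 -> 3.16x (the Jevons regime)
```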
AI is following the same trajectory. Now that LLMs are dramatically cheaper to train and run, enterprises will:
- Experiment more with fine-tuning: Why rely solely on OpenAI or Google when you can fine-tune your own open-weight model for a fraction of the cost? (A minimal sketch follows this list.)
- Deploy AI deeper into business operations: As inference costs drop, AI can expand from chatbots and search to supply chains, manufacturing and R&D.
- Have more off-the-shelf AI applications to choose from: AI application vendors will get more reps with real-life use cases and package them into easy-to-deploy software.
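Here is that minimal fine-tuning sketch, assuming the Hugging Face transformers and peft libraries; the model ID is a placeholder, and a real project would add data preparation and a training loop.

```python
# Minimal sketch of low-cost fine-tuning of an open-weight model with LoRA adapters.
# Assumes the `transformers` and `peft` libraries; the model ID is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "your-org/open-weight-7b"  # hypothetical open-weight checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_id)  # used to tokenize training data
model = AutoModelForCausalLM.from_pretrained(base_id)

# LoRA trains small low-rank adapter matrices instead of all base weights,
# which is a large part of why enterprise fine-tuning costs a fraction of full training.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of base parameters
# ...train with your usual Trainer/loop on enterprise data, then model.save_pretrained(...)
```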
Rather than slowing AI adoption, DeepSeek's efficiency breakthroughs will accelerate it, making AI a must-have for every major enterprise.
In 2022, ChatGPT demonstrated that LLMs could improve enterprise productivity. Since then, many organizations have focused on the "buy" quadrant of the Build vs. Buy matrix (below).
For many, the safest and least costly option was to buy – or, more precisely, rent – AI tools from foundation model developers like OpenAI, Anthropic, Google, Microsoft and other AI providers. During this time, early adopters had the opportunity to determine their most strategic use cases; gain organizational adoption; cultivate internal skills to tune, develop, deploy, secure and manage AI models; assemble their own AI Centers of Excellence (COE); and determine the ROI from these efforts.
DeepSeek's model makes it clear that now is the time for enterprises to break through and use AI to accelerate growth and transform the delivery of products or services. Cost and infrastructure are no longer the greatest barriers, although skills shortages and data-quality issues may still delay the development of the AI needed to serve strategic business objectives.
Boards and C-suites should reevaluate their readiness to develop and deploy AI, revisit their business opportunities and objectives, and ask, "How can AI accelerate the success of each objective?"
Important considerations for enterprise boards and senior executives in the post-DeepSeek world:
Frontier model companies will adopt DeepSeek's techniques, which are open source, and will iterate on new models rapidly, increasing the fidelity of applications that leverage these frontier models. A likely consequence of faster adoption of cloud-hosted AI applications is higher cloud bills (unless falling prices race to the bottom) or the need for on-premises infrastructure to affordably scale growing inference workloads.
- Recommendation: Enterprises need to prepare for higher cloud bills and/or consider scaling their on-premises infrastructure faster to handle the greater adoption of AI tools.
AI product companies (vertical industry or horizontal workflow applications) now have more choices of higher-fidelity AI reasoning models to leverage when creating and improving their applications.
- Recommendation: Enterprises (other than HPC companies and GPUaaS providers) need to prepare for the wave of AI startups pitching new point solutions, AI assistants and AI agents. Enterprises need a well-organized governance process to decide when to build or buy based on use cases, business value and strategic roadmaps.
For high-performance computing (HPC) companies and GPU-as-a-Service (GPUaaS) providers, GPU demand will continue to expand even as the compute needed to train each model declines, just as Jevons Paradox predicts.
- Recommendation: HPC companies and GPUaaS providers need to prepare for an increase in demand for their services. This entails not only an expansion of compute resources but also the energy required to power and cool their data centers in order to meet AI demand.
As AI demand continues to grow, enterprises are driven to increase the development, testing, deployment, and management of their AI applications. However, the skills needed to customize or optimize AI models are not growing fast enough to meet demand. The skills gap is real and will persist.
- Recommendation: Enterprises should assess their readiness for AI, including the array of skills needed across the organization to help scale their AI. Where possible, consider consolidating skilled AI resources into an AI Center of Excellence that can manage data, provide MLOps assistance, and assist with governance.
Enterprise AI security: Mitigating risks with open models
To deploy open-weight models effectively and mitigate their inherent security risks, organizations should use a structured risk-mitigation framework:
Data security and privacy
Risk: Unknown training data sources and potential privacy violations.
Mitigation: Audit model provenance, enforce enterprise-controlled fine-tuning and ensure all inference occurs on U.S., EU or trusted-host infrastructure.
Model security and integrity
Risk: Backdoors, adversarial attacks and compromised model weights.
Mitigation: Conduct cryptographic integrity checks, implement AI firewalls and run adversarial testing to detect vulnerabilities.
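As one concrete example of such checks, here is a minimal sketch of verifying downloaded weight files against published SHA-256 digests. The file name and expected digest are placeholders; a real deployment would pull them from the publisher's signed manifest.

```python
# Minimal cryptographic integrity check on downloaded model weights.
# Assumes the publisher ships SHA-256 digests; names and digests are placeholders.
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):  # stream in 1 MiB chunks
            digest.update(block)
    return digest.hexdigest()

manifest = {  # placeholder: obtain real digests from the publisher's signed manifest
    "model-00001-of-00002.safetensors": "<expected sha256 hex digest>",
}
for name, expected in manifest.items():
    actual = sha256_of(Path("weights") / name)
    if actual != expected:
        raise RuntimeError(f"Integrity check failed for {name}: got {actual}")
print("All weight files passed integrity checks")
```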
Compliance and legal risks
Risk: Non-compliance with evolving AI regulations.
Mitigation: Align with frameworks like NIST AI Risk Management and sector-specific compliance guidelines.
National security and geopolitical risks
Risk: AI models trained outside the U.S. could have hidden biases or vulnerabilities.
Mitigation: Limit deployment in critical sectors, enforce U.S.-based cloud hosting and conduct rigorous third-party audits.
By implementing these safeguards, enterprises can confidently adopt open AI models while mitigating security, compliance and geopolitical risks.
Final thoughts: Efficiency unlocks the next AI boom
DeepSeek R1 has signaled the dawn of an AI efficiency revolution. The days of 100,000 GPU clusters and billion-dollar training runs appear to be numbered – replaced by an era where the cost to iterate on model capabilities drops significantly.
Jevons Paradox tells us that making AI more efficient won't slow its adoption — it will accelerate it. The enterprises that recognize this shift early and invest in their own AI capabilities will gain a significant competitive edge.
In the post-DeepSeek world, AI isn't just for the tech giants. It's for every enterprise bold enough to envision and own its future.
The DeepSeek example has helped to democratize access to AI.
This report may not be copied, reproduced, distributed, republished, downloaded, displayed, posted or transmitted in any form or by any means, including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior express written permission of WWT Research. It consists of the opinions of WWT Research and as such should not be construed as statements of fact. WWT provides the Report "AS-IS", although the information contained in the Report has been obtained from sources that are believed to be reliable. WWT disclaims all warranties as to the accuracy, completeness or adequacy of the information.