China’s most ambitious open-source model may have been quietly fed by one of its Western rivals. But if the product is an open-source LLM better than GPT-4, does anyone really care?
A couple of months ago we posted about DeepSeek training AI for pennies on the dollar, and later about the growing DeepSeek ecosystem. While OpenAI might’ve set the pace for LLMs, DeepSeek is sprinting hard to catch up. The company just dropped an upgraded version of its R1 large language model, a reasoning-first AI built to tackle the logic and math gaps where models like GPT-4 sometimes fumble.
On paper, DeepSeek R1 is serious muscle. It was trained on 7 trillion tokens, handles a 128,000-token context window, and, here’s the kicker, it’s open-source. All of it. No waitlists, no API traps, just raw access. That alone makes it stand out in a world where GPT-4 and Gemini still live behind velvet ropes. But the real story? A quiet controversy bubbling underneath.
Reports suggest DeepSeek may have used outputs from Google’s Gemini to help train its latest model. So while DeepSeek is publicly challenging OpenAI, it may have taken a detour through Mountain View to get there. Ambitious? Yes. Ethical? That’s a bit murkier.
Built to reason, trained at scale, and released into the wild

DeepSeek isn’t new to the game. Its original R1 model made waves for outperforming GPT-4 on several logic and math-heavy benchmarks. But the latest version ups the stakes. According to the company’s own Hugging Face listing, DeepSeek R1-32B is trained on a whopping 7 trillion tokens, one of the largest training sets ever made public. It uses a custom tokenizer, supports a 128,000-token context window, and boasts performance boosts in reasoning tasks, coding, and long-context understanding.
In short, it’s aiming squarely at GPT-4’s weak spots: multi-step reasoning, instruction following, and math. It even claims “GPT-4-level performance” across several industry-standard benchmarks. But DeepSeek isn’t just flexing numbers. It’s open-sourcing everything, from the model weights to the training framework, allowing researchers, startups, and competitors to poke, prod, and build off it.
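For the curious, here’s roughly what that raw access looks like in practice: a minimal Python sketch that pulls the open weights straight from Hugging Face with the transformers library. The repo id, precision, and hardware settings below are our assumptions for illustration, not DeepSeek’s documentation, and a model this size still needs serious GPU memory.

```python
# Minimal sketch: load DeepSeek R1 weights from Hugging Face and run a
# reasoning prompt. The repo id is an assumed, illustrative name; check the
# official model card for the exact listing and license terms.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-R1"  # assumption: replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # half precision to keep memory use sane
    device_map="auto",           # spread layers across whatever GPUs are available
)

# The kind of multi-step reasoning task R1 is pitched at.
prompt = "A train leaves at 09:15 and covers 180 km at 60 km/h. When does it arrive?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

No API key, no waitlist: if you have the hardware (or grab a quantized variant), the weights are yours to poke, prod, and fine-tune.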
That transparency? It’s a strategic flex in itself, especially as OpenAI and Google double down on closed models and enterprise licensing deals.
But that open-source badge comes with a footnote. According to a TechCrunch report, DeepSeek may have trained R1 using outputs from Google’s Gemini models. The details are hazy and there’s no official confirmation from DeepSeek, but the implication is eyebrow-raising. If true, it means China’s most ambitious open-source model was quietly fed by one of its Western rivals. This isn’t unprecedented.
Many LLMs scrape the internet and pick up content generated by other models. But when you’re training a competitor to GPT-4 and touting your model’s originality, relying on Gemini’s outputs starts to blur the ethical lines. Did DeepSeek just get clever with public data, or did it tiptoe past the spirit of open-source integrity?
Either way, it’s a reminder that in 2025’s AI race, the line between inspiration and ingestion is getting thinner by the day.
The Proverbial Center of Gravity

Now the real question isn’t whether DeepSeek R1 is impressive. It clearly is. The question is whether it shifts the “center of gravity” in the global AI race. By combining large-scale training, reasoning-first architecture, and open access, DeepSeek is carving out a new lane, one that appeals to researchers, developers, and even countries looking for alternatives to Western AI gatekeepers.
China’s open-source push isn’t just about democratizing tech. It’s also about decentralizing influence. When DeepSeek posts a model that rivals GPT-4 and releases it freely while the West keeps its best locked down, it changes the narrative. Suddenly, Silicon Valley isn’t the only place building next-gen reasoning machines. And that’s a big deal.
Still, DeepSeek R1 isn’t perfect. It’s not as well-integrated across platforms as GPT-4. It lacks some of the conversational finesse of Gemini. And if the Gemini-training claims hold water, it may raise long-term questions about originality and dependence. But none of that cancels out the big takeaway: DeepSeek has moved from fast follower to credible challenger.
Whether that challenge is built entirely from scratch, or patched together with borrowed code and clever data scraping, is almost secondary. In 2025, the AI landscape doesn’t reward purity. It rewards performance. And right now, DeepSeek’s R1 is performing loud and clear.
Open Source AI
In conclusion, DeepSeek’s R1 isn’t just another LLM; it’s a signpost. One that says the AI game is now global, messy, and wide open. Whether it truly matches GPT-4 in practice, or just looks good on benchmarks, almost doesn’t matter. It’s open. It’s fast. And it’s coming from a country that’s rapidly proving it doesn’t need Silicon Valley’s permission to play catch-up.
The Gemini rumour? It adds spice. Maybe even suspicion. But let’s be honest, AI training in 2025 is a soup of gray zones and scraped data. What matters is that DeepSeek released something real, scalable, and ready to be tested by anyone!
In case you missed it:
- Training AI for Pennies on the Dollar: Are DeepSeek’s Costs Being Undersold?
- DeepSeek’s AI Revolution: Creating an Entire AI Ecosystem
- China launches world’s first AI-powered underwater data centre!
- Samsung’s new Android XR Headset all set to crush Apple’s Vision Pro
- China’s just created an AI-powered Inspector for Nuclear Disarmament!
- This Prosthetic Hand can detach and crawl across the floor. How cool is that?
- AI’s Inner Demons: Hallucinations or just a look in the Mirror?
- NVIDIA’s Isaac GR00T N1: From Lab Prototype to Real-World Robot Brain
- PVC Meta Coins: The Next Big Thing in Crypto?
- CES 2025: NVIDIA’s Cosmos Just Gave Robots a ‘ChatGPT Moment’!