In December 2025, Zhipu AI released GLM-4.7. A 355-billion-parameter model. Open weights. MIT license. Benchmarks that rivaled or beat some of the best models in the world. Tech media ran the story the same way they always do. Big numbers, breathless headlines, zero due diligence.

If this feels familiar, it should. DeepSeek ran the same playbook in January 2025. Drop a model. Flood the press. Generate FOMO. Get mass adoption before anyone asks where the capabilities actually came from.

GLM-4.7 is running the same play. And if you are building on it, feeding it your code, or deploying it in your stack, you need to understand what you are actually dealing with.

They Got Caught Harvesting Data

China’s own National Cyber Security Information Centre flagged Zhipu AI’s chatbot, Qingyan (also known as ChatGLM), for collecting user information beyond what users authorized. This was not an allegation from a Western government trying to score geopolitical points. This was China’s own cybersecurity authority saying Zhipu was collecting too much data.

Think about that. In a country not known for aggressive privacy enforcement, the regulators still found Zhipu’s data practices excessive.

Now consider what happens when you send your proprietary code to their API. Zhipu operates under China’s data governance framework, including the Personal Information Protection Law, the Cybersecurity Law, and the National Intelligence Law. Under these laws, the Chinese government can compel any domestic company to provide access to data for national security purposes. This is not a hypothetical. It is the law. If your code touches Zhipu’s servers, it is subject to Chinese state access. Period.

We wrote about the same structural problem with US servers and the FISA Court’s reach into American companies. The difference is that the US system at least has a court process, however flawed. China’s system does not require even that formality.

The Benchmarks Do Not Mean What You Think

Zhipu reports that GLM-4.7 scores 73.8% on SWE-bench Verified. 95.7% on AIME 2025. 84.9% on LiveCodeBench. 42.8% on Humanity’s Last Exam. These are strong numbers. They are also self-reported.

That distinction matters more than most people realize. These are not independent audits. They are marketing materials published by the company that built the model. Zhipu controls the evaluation environment. They choose which benchmarks to highlight. They choose which configurations to report. And in at least one case, they modified the test conditions. Their own documentation notes that for the τ²-Bench evaluation, they added additional prompts to avoid failure modes and applied fixes from Anthropic’s Opus 4.5 release report. That is not cheating. But it is not an apples-to-apples comparison either.

The broader Chinese AI ecosystem has a documented pattern here. ETH Zurich’s SRI Lab found a 50% data-contamination overlap between Kimi K2’s benchmark results and the Omni-Math test set. IQuest-Coder claimed 81.4% on SWE-bench, but independent testing put the real number closer to 76%. Self-reported scores in this space should be treated as claims, not facts.

Meanwhile, Claude Opus 4.6 scores 80.8% on SWE-bench Verified. Independently evaluated. No modified test prompts. No asterisks. Claude Sonnet 4.5 hit 100% on AIME 2025 with tool use. The gap between reported and real narrows when a third party is doing the measuring.

Who Built This and Who Backs It

Zhipu AI was spun out of Tsinghua University in 2019. Tsinghua is not just any Chinese university. It is the primary feeder institution for the Chinese Communist Party’s technology leadership pipeline and a central node in China’s national AI strategy. Zhipu’s founding and growth are directly tied to state priorities, backed by both national and private investment aligned with government goals.

The US Department of Commerce placed Zhipu AI on the Entity List in 2025. That is the trade blacklist. The same list that includes Huawei. The US government made a formal determination that Zhipu poses a national security concern serious enough to restrict American companies from doing business with them.

About 85% of Zhipu’s 2024 revenue came from on-premise deployments for government and enterprise clients. Their primary business is selling AI infrastructure to state entities and large organizations that demand data control. They also accelerated global expansion by selling “AI-in-a-Box” servers to governments in Southeast Asia and the Middle East. This is not a startup trying to democratize AI. This is an arm of China’s technology export strategy.

Their newest model, GLM-5, was trained entirely on Huawei Ascend chips using the MindSpore framework. That is not a technical footnote. That is a deliberate demonstration that they no longer need American hardware. Every chip export control the US put in place? They are telling you those controls no longer apply to them.

The Distillation Pipeline

In February 2026, Anthropic told Congress that DeepSeek, Moonshot AI, and MiniMax used roughly 24,000 fraudulent accounts to generate over 16 million interactions with Claude. The purpose was distillation. Feed massive volumes of prompts to a frontier model, harvest the outputs, and use them to train a cheaper model that replicates capabilities the company never independently developed.

The same month, OpenAI alleged that DeepSeek employees used third-party routers and masking techniques to bypass geographic restrictions and systematically extract outputs from ChatGPT. Google’s Threat Intelligence Group reported observing campaigns using over 100,000 prompts aimed at replicating Gemini’s reasoning abilities.

Three of America’s largest AI companies, independently, reported the same thing happening to them at the same time. This is not a coincidence. It is a strategy.

A former Google engineer, Linwei Ding, was arrested and charged with four counts of trade secret theft for stealing AI technology while simultaneously employed by two China-based companies. The Department of Justice has brought charges against multiple Chinese nationals for IP theft spanning electric vehicles, semiconductors, autonomous driving, and AI. US companies lose an estimated quarter to half a trillion dollars annually through IP theft. The FBI calls it the greatest transfer of wealth in history.

Zhipu was not named in Anthropic’s specific complaint. But Zhipu operates in the same ecosystem, under the same government, drawing from the same talent pool, and benefiting from the same state-directed technology acquisition strategy. The distillation campaigns that hit Claude, ChatGPT, and Gemini did not happen in a vacuum. They are part of a systematic effort to close the capability gap without doing the foundational research.

The Open-Source Play Is Strategic, Not Generous

GLM-4.7 is released under the MIT license. Permissive. Barely any restrictions beyond attribution. Run it locally. Modify it. Deploy it commercially. Sounds great. And that framing is exactly the point.

Open-source AI from a US Entity List company backed by Chinese state investment is not a gift to the developer community. It is a distribution strategy. Get the model into as many stacks as possible. Build dependency. Normalize the toolchain. Once developers are building on your architecture, you own the ecosystem regardless of what the license says.

Zhipu is not alone in this. The broader Chinese AI open-source push follows the same pattern. Release open weights at a fraction of the cost of American alternatives. Undercut on price. Flood developer communities. Capture adoption. The $3 per month API pricing is not sustainable generosity. It is market capture subsidized by state backing.

The MIT license does allow you to self-host, air-gap, and run the model without any data leaving your infrastructure. That is a real mitigation if you do it correctly. But most developers will not. Most will hit the API. Most will send their code, their data, their proprietary logic to Zhipu’s servers because it is easier. And that is the play.
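If you do self-host, belt-and-braces isolation is cheap to add in-process. The sketch below is a minimal Python guard, offered as an illustration only: it replaces `socket.socket` before any model code loads, so any library that tries to phone home fails loudly instead of silently. The model-loading step is a placeholder, and this guard is no substitute for real isolation at the firewall or container level.

```python
import socket

def disable_network() -> None:
    """Replace socket.socket so any attempted connection raises immediately.

    Call this before importing or loading any model code: a library that
    tries to phone home hits a RuntimeError instead of the network.
    """
    def _blocked(*args, **kwargs):
        raise RuntimeError("outbound network access is blocked: air-gapped inference only")
    socket.socket = _blocked  # type: ignore[assignment]

disable_network()

# From here on, load and run your locally stored model weights.
# (Model-loading code is environment-specific and omitted here.)
```

Pair this with container- or VM-level isolation (for example, running inference with networking disabled entirely) rather than relying on it alone: an in-process guard can be bypassed by native extensions that open sockets below the Python layer.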

Claude Versus GLM-4.7: What the Numbers Actually Say

Here is the honest comparison. No modified prompts. No self-reported claims where independent data exists.

SWE-bench Verified (real-world software engineering):

  • Claude Opus 4.6: 80.8%
  • Claude Sonnet 4.6: 79.6%
  • GLM-4.7: 73.8% (self-reported; highest among open-source models)

AIME 2025 (mathematical reasoning):

  • Claude Sonnet 4.5: 100% (with tool use), 87% without
  • Claude Opus 4: 90.0% (high-compute)
  • GLM-4.7: 95.7% (self-reported)

GPQA Diamond (graduate-level reasoning):

  • Claude Opus 4: 83.3% (high-compute)
  • GLM-4.7: not prominently reported (telling in itself)

Bottom line: on SWE-bench, the benchmark that matters most for professional software engineering, Claude leads by 7 full points. That is not a rounding error. That is a generation gap. And it sits between independently verified scores on one side and self-reported numbers, produced with modified test prompts and no third-party validation, on the other. The real delta is likely larger.

GLM-4.7 also has a documented implementation problem. Developers report that many popular third-party frontends and libraries fail to handle its reasoning tokens correctly, causing the model to fail at tasks it theoretically benchmarks well on. Benchmark performance that does not survive contact with real toolchains is not performance. It is a number on a slide.

The Threat Model You Are Ignoring

If you run a business and you are evaluating AI tools, here is the threat model.

Scenario one: you use GLM-4.7 via the API. Your prompts, your code, your proprietary logic hits Zhipu’s servers in China. That data is subject to Chinese law. The Chinese government can compel access. You have no legal recourse in a Chinese court as a foreign entity. Your intellectual property is exposed to a state that has a documented, decades-long strategy of acquiring foreign technology by any means available.

Scenario two: you self-host GLM-4.7 on your own hardware, fully air-gapped. The data stays local. But you are building your infrastructure on a model whose capabilities may have been derived, in part, from distilled outputs of American AI systems. You are building on a foundation whose provenance you cannot verify.

Scenario three: you use Claude. Your data is processed by Anthropic, a US company subject to US law. Is US law perfect? No. We have written extensively about FISA Court overreach and the structural problems in the security industry. But there is a court process. There is a legal framework you can engage with. There are compliance mechanisms. And Anthropic has publicly committed to AI safety and responsible development in ways that Zhipu, operating under CCP oversight, structurally cannot.

What You Should Actually Do

Apply the same threat modeling to your AI vendors that you apply to the rest of your stack. At instem.ai, we run pfSense firewalls, hard VLANs, and LuLu to block unauthorized telemetry. We moved our servers to Hetzner in Germany specifically because of US jurisdictional exposure. That posture does not stop at the network edge. It extends to every tool that touches your data, including your AI.

Concrete steps:

  • Audit your AI supply chain. If any tool, plugin, or API in your stack calls a Chinese AI model, you need to know about it. Some coding assistants and open-source tools quietly route to models like GLM without making it obvious.
  • Treat benchmark claims as marketing. Demand independent verification. If a vendor will not submit to third-party evaluation, the numbers are not trustworthy.
  • Use Claude for anything sensitive. Not because Anthropic is perfect. Because the legal framework, the safety commitments, and the independently verified performance are categorically better than the alternative.
  • If you must use open-source Chinese models, air-gap them completely. No API calls. No telemetry. Local inference on hardware you control. Treat the model the way you would treat code from an untrusted source: run it in a sandbox with no network access.
  • Read the data governance laws of the country your AI vendor operates in. Not their privacy policy. The actual law. If the government can compel access to your data, the privacy policy is meaningless.

This Is Not Xenophobia. This Is Threat Modeling.

Criticizing a specific company’s data practices, state ties, and the documented IP theft patterns of the ecosystem it operates in is not anti-Chinese sentiment. It is the same analysis we apply to American companies. We called out Apple for running undisclosed surveillance on every Mac. We documented the FISA Court’s ability to compel US companies to hand over data in secret. We do not give American companies a pass and neither should you.

But the facts are the facts. Zhipu AI is on the US Entity List. China’s own regulators caught them harvesting excess data. The Chinese AI ecosystem has been caught running large-scale distillation campaigns against every major American AI company. The IP theft numbers are in the hundreds of billions annually. And the model they are asking you to build on was developed under the oversight of a government that views technology acquisition as a national security priority.

Claude is faster, scores higher on the benchmarks that matter, is independently verified, and operates under a legal framework you can actually engage with. The performance gap is real. The security gap is wider.

instem.ai is built on zero-trust from the ground up. That starts with knowing exactly where your data goes and who has access to it.

One more thing, because this matters. instem.ai is not Claude wrapped in a bow. It is not a wrapper around any model. We are model agnostic. We built our own orchestration layer, our own enforcement infrastructure, and we are building our own models in-house. Claude, GPT, open-source, proprietary, whatever scores highest and performs best for a given task at a given time is what we run. Today that might be Claude for code. Tomorrow it might be something else entirely. The moment a better option exists, we switch. No loyalty to any vendor. No dependency on any single company's roadmap. No one else's limitations become ours.

That is the whole point. If you hitch your entire operation to one model, one company, one API, you are one pricing change, one policy shift, one acquisition away from rebuilding everything. We do not operate that way. We evaluate, we verify, we benchmark, and we control the stack end to end. The model is a component. It is not the product. instem.ai is the product. We own the stack.

Choose your AI the way you choose your infrastructure. Know where your data goes. Know who can access it. Know the law that governs it. Everything else is marketing.