
How to Test AI Extraction on Similar Chinese Company Names

Similar names are a hard case for supplier matching, so test sets should include realistic near-matches.

Supplier matching often fails on names that differ by one location word, industry term, or legal suffix. A model that handles clearly distinct names may still merge two separate companies when their names look alike.

Build a test set with near-matches. Include companies with similar English trade names, related group names, different Chinese legal names, and shared addresses. Add invoices and certificates where the certificate holder differs from the seller named on the invoice.
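
As a rough illustration, a near-match test set could be stored as labeled name pairs, each tagged with the kind of difference it probes. This is a minimal sketch, not a prescribed schema: the NamePair class, its field names, the variation tags, and every company name below are invented for the example. A fuller set would also carry English trade names, group relationships, and addresses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NamePair:
    """One labeled test case (hypothetical schema)."""
    name_a: str
    name_b: str
    same_entity: bool  # ground truth assigned by a human reviewer
    variation: str     # which kind of near-match this case probes


# All names are invented for illustration only.
TEST_PAIRS = [
    # Same company, full-width vs. ASCII parentheses.
    NamePair("宏达贸易（上海）有限公司", "宏达贸易(上海)有限公司", True, "punctuation"),
    # Different location word.
    NamePair("上海宏达贸易有限公司", "苏州宏达贸易有限公司", False, "location"),
    # Different industry term.
    NamePair("上海宏达贸易有限公司", "上海宏达科技有限公司", False, "industry_term"),
    # Different legal suffix.
    NamePair("上海宏达贸易有限公司", "上海宏达贸易股份有限公司", False, "legal_suffix"),
]


def coverage_by_variation(pairs):
    """Count test cases per variation type to spot gaps in the set."""
    return Counter(pair.variation for pair in pairs)
```

Calling coverage_by_variation(TEST_PAIRS) gives a quick view of which near-match types are thin or missing before any model is run against the set.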

Measure false merges and false splits separately. A false merge can hide a risky mismatch. A false split can waste analyst time. Each error type needs a different fix, and a single combined accuracy figure hides which one is driving the failures.
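
One way to keep the two error types apart is to compute a rate for each against its own denominator. The sketch below assumes the ground truth and the model output are parallel lists of booleans, where True means "same company"; the function name and the returned keys are made up for this example.

```python
def merge_split_rates(labels, predictions):
    """Return false-merge and false-split rates as separate figures.

    labels, predictions: parallel sequences of booleans,
    where True means the two names refer to the same company.
    """
    pairs = list(zip(labels, predictions))

    # False merge: model says "same" for companies that are distinct.
    false_merges = sum(1 for truth, pred in pairs if pred and not truth)
    # False split: model says "different" for the same company.
    false_splits = sum(1 for truth, pred in pairs if truth and not pred)

    distinct_total = sum(1 for truth, _ in pairs if not truth)
    same_total = sum(1 for truth, _ in pairs if truth)

    return {
        "false_merge_rate": false_merges / distinct_total if distinct_total else 0.0,
        "false_split_rate": false_splits / same_total if same_total else 0.0,
    }


# Two same-company pairs and three distinct pairs.
labels = [True, True, False, False, False]
predictions = [True, False, True, False, False]
rates = merge_split_rates(labels, predictions)
```

In this small example one of the three distinct pairs is merged and one of the two same-company pairs is split, so the two rates differ even though each error occurs once.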

Keep the original Chinese text in the evaluation. Translating all names before testing may remove the very differences you need to measure.
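
The loss can be shown with two names that differ by a single character but romanize identically, because 宏 and 鸿 are both read "hong". In the sketch below, a hand-written lookup table stands in for whatever translation or romanization step a pipeline might use. The table, the function name, and both company names are assumptions made for illustration.

```python
# Hypothetical English renderings; a real pipeline would produce these
# with a translation or romanization step rather than a fixed table.
ENGLISH_RENDERING = {
    "上海宏达贸易有限公司": "Shanghai Hongda Trading Co., Ltd.",
    "上海鸿达贸易有限公司": "Shanghai Hongda Trading Co., Ltd.",
}


def collapsed_by_translation(name_a, name_b, renderings):
    """True when two distinct Chinese names become identical in English."""
    return name_a != name_b and renderings[name_a] == renderings[name_b]


name_a = "上海宏达贸易有限公司"
name_b = "上海鸿达贸易有限公司"
lost = collapsed_by_translation(name_a, name_b, ENGLISH_RENDERING)
```

A test run only on the English renderings would score this pair as an exact match, so the case can only be evaluated if the Chinese strings stay in the test data.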

Use the test results to tune review rules. If the model cannot separate similar names reliably, require human confirmation for first orders and beneficiary comparisons.
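
A review rule of this kind could be expressed as a small gate driven by the measured false-merge rate. This is only a sketch: the function name, its parameters, and the 2% tolerance are placeholders to be replaced with values that fit the actual risk policy.

```python
def requires_human_confirmation(
    false_merge_rate,
    is_first_order,
    is_beneficiary_comparison,
    max_false_merge_rate=0.02,  # assumed tolerance, not a recommended value
):
    """Decide whether a match must be confirmed by a person.

    Review is required when the matcher is weak on near-match names
    and the transaction is a first order or a beneficiary comparison.
    """
    matcher_is_weak = false_merge_rate > max_false_merge_rate
    high_risk_step = is_first_order or is_beneficiary_comparison
    return matcher_is_weak and high_risk_step


# Matcher merged 8% of distinct near-match pairs in testing.
first_order_check = requires_human_confirmation(0.08, True, False)
repeat_order_check = requires_human_confirmation(0.08, False, False)
```

With these inputs the first order is routed to a reviewer and the repeat order is not. Lowering the tolerance widens the set of matches that need confirmation.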

Working checklist

  • Use near-match test cases.
  • Measure false merges.
  • Measure false splits.
  • Keep original Chinese names.
  • Require review where matching is weak.
