
SQL数据校验规则的核心是“用查询说话”,不是靠人工肉眼比对,而是通过可执行、可复用的SQL语句主动暴露不一致。重点在于:明确校验目标(字段级?关联级?业务逻辑级?)、选择合适检查方式(COUNT对比、LEFT JOIN找缺失、NOT EXISTS找异常、聚合+CASE做条件统计),并让结果直接返回问题记录或汇总指标。
字段值一致性校验(同一表内)
检查某字段是否符合预设取值范围、非空约束或格式规范。例如:用户表中status只能是'active'、'inactive'、'pending',且不能为空:
- 查非法值:SELECT user_id, status FROM users WHERE status NOT IN ('active', 'inactive', 'pending') OR status IS NULL;
- 查空值占比(用于评估严重性):SELECT COUNT(*) FILTER (WHERE status IS NULL) * 100.0 / COUNT(*) AS null_pct FROM users;
- 查常见格式问题(如手机号含字母):SELECT phone FROM users WHERE phone !~ '^[0-9]{11}$';(PostgreSQL正则写法,MySQL可用REGEXP)
主外键关联一致性校验(跨表)
验证子表记录是否都在父表中有对应主键,避免“孤儿数据”。例如:订单表orders的customer_id必须存在于客户表customers中:
- 找所有无效外键:SELECT o.order_id, o.customer_id FROM orders o LEFT JOIN customers c ON o.customer_id = c.customer_id WHERE c.customer_id IS NULL;
- 快速统计不一致数量:SELECT COUNT(*) FROM orders o WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.customer_id = o.customer_id);
- 反向检查(哪些客户无订单):SELECT c.customer_id FROM customers c WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);(属于完整性补全类校验)
业务逻辑一致性校验(多条件组合)
反映真实业务规则,往往需JOIN+聚合+条件判断。例如:“已发货订单的发货日期不能早于下单日期”:
- 直接定位违规记录:SELECT order_id, order_date, ship_date FROM orders WHERE status = 'shipped' AND ship_date
- 按天统计异常量(便于监控趋势):SELECT DATE(ship_date) AS day, COUNT(*) AS bad_ships FROM orders WHERE status = 'shipped' AND ship_date
- 检查金额平衡(如订单总金额 = 各明细行金额之和):SELECT o.order_id FROM orders o JOIN (SELECT order_id, SUM(amount) AS line_total FROM order_items GROUP BY order_id) i ON o.order_id = i.order_id WHERE o.total_amount != i.line_total;
校验结果可读 & 可运维
生产环境的校验SQL不是跑一次就丢,要能被定时任务调用、被监控系统识别、被业务方看懂:
- 每条校验SQL开头加注释说明目的,例如:-- 校验:订单状态为'shipped'时,shipping_address_id 必须非空
- 返回结构统一:至少包含check_name、error_count、sample_ids(用STRING_AGG或GROUP_CONCAT取前5个ID)
- 避免SELECT *,只查关键定位字段;大表加WHERE限制时间范围(如AND create_time >= '2024-01-01')提升响应速度
- 敏感字段(如身份证、手机号)在校验结果中脱敏显示,例如CONCAT(LEFT(id_card, 3), '****', RIGHT(id_card, 4))










