How To Read The Dashboard
- Coverage shows how many departments use a field pattern.
- Variants show how many different labels map to the same standardized category.
- Confidence shows how defensible a category currently is for planning use.
- Trusted Core is another name for the High Confidence subset; the dashboard uses both labels for the strongest rows currently suitable for decision-support discussion.
What Coverage and Inconsistency Mean
These two columns are calculated automatically from the data — not assigned by hand.
- Universal — the field appears across 10 or more of the 27 tracked departments. It is essentially a cross-university pattern within the current project scope.
- Common — the field appears in 5 to 9 of the 27 tracked departments. Widely shared but not universal.
- Specialized — the field appears in 2 to 4 departments. Present in multiple places but not broadly adopted.
- Limited — the field appears in only 1 department. May be department-specific or a one-off.
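The coverage tiers above reduce to a simple threshold function. The sketch below is illustrative only; the function name and signature are not part of the actual analysis workflow.

```python
def coverage_tier(department_count: int) -> str:
    """Map a field's department count (out of the 27 tracked
    departments) to its coverage tier, using the dashboard's
    stated thresholds. Illustrative sketch, not production code."""
    if department_count >= 10:
        return "Universal"
    if department_count >= 5:
        return "Common"
    if department_count >= 2:
        return "Specialized"
    if department_count == 1:
        return "Limited"
    raise ValueError("a tracked field appears in at least one department")
```

For example, a field seen in 7 departments lands in the Common tier, while a field seen in a single department is Limited.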
Inconsistency is based on how many different raw label variants map to the same standardized field.
- High inconsistency — 15 or more different labels were found for the same concept across forms. The field is widely used but departments are not naming it the same way.
- Medium inconsistency — 5 to 14 variants. Some fragmentation but not extreme.
- Low inconsistency — fewer than 5 variants. Departments are already using relatively consistent naming for this field.
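The inconsistency tiers follow the same pattern: a count of raw label variants mapped through fixed cutoffs. Again, this is a sketch of the stated rules, not the workflow's actual code.

```python
def inconsistency_tier(variant_count: int) -> str:
    """Map the number of distinct raw label variants that were
    standardized into one field to its inconsistency tier,
    using the dashboard's stated cutoffs (15+ high, 5-14 medium)."""
    if variant_count >= 15:
        return "High"
    if variant_count >= 5:
        return "Medium"
    return "Low"
```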
How Confidence Was Assigned
Confidence was assigned by the analysis workflow after cleanup and category remapping. It is not a manual score on every row, and it is not a claim that a row is permanently final. It is a practical decision-support signal based on how reliable a category currently is.
- High confidence was assigned to stronger identity, contact, department, course, signature, and core administrative fields that were repeatedly observed and cleaned into stable categories.
- Medium confidence was assigned to categories that are useful but still broader, more context-dependent, or more likely to contain mixed variants.
- Low confidence was assigned to residual noise, unmatched labels, artifact-like fields, and office-use buckets that still need more review.
Document classification used a mix of rule-based filtering and targeted AI review for ambiguous files. The confidence labels themselves were produced by the analysis workflow after those cleanup steps, not by a person hand-tagging each row.
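The mixed classification approach can be pictured as a rule-based first pass that routes obvious files and flags everything else for targeted review. All of the cues below (file extensions, keyword checks, the category names) are hypothetical stand-ins, not the workflow's actual rules.

```python
def classify_document(filename: str, text: str) -> str:
    """Hypothetical rule-based first pass: route clear-cut files
    by cheap filename/content cues, and flag ambiguous files so a
    targeted AI review step can handle them separately."""
    name = filename.lower()
    body = text.lower()
    if name.endswith((".xlsx", ".csv")):
        return "data-export"        # structured data, not a form
    if "signature" in body and "date" in body:
        return "form"               # strong cue for a fillable form
    return "needs-review"           # ambiguous: send to targeted AI review
```

Keeping the rules cheap and conservative means the expensive review step only sees the files the rules could not confidently place.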
Why This Is Useful
Even without building or owning the implementation, this analysis provides a structured view of field inconsistency and data quality that can help future modernization work start from evidence instead of guesswork.
The strongest value of this work is not claiming that every field is already perfect. It is making the current forms landscape measurable, explainable, and easier to prioritize.
Current Limits
- Residual low-confidence rows remain and still need targeted review.
- Some extracted labels depend on source quality, especially older PDFs and OCR-heavy documents.
- Not every standardized category is final; some are still practical analysis buckets rather than settled enterprise definitions.