When measuring readability, researchers have noted that very simple baseline metrics, such as average sentence length or average word length, yield relatively plausible results that also correlate with more granular measures.
This suggests that broader metrics are sometimes enough to get a basic sense of the direction in which the data points.
For example, Martinc et al. found that, in an unsupervised setting, their deep neural language models could not outperform an average sentence length (ASL) baseline (r = .906) on the Newsela corpus.
@olney2022, p. 308
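
To make the two baseline metrics mentioned above concrete, here is a minimal Python sketch of average sentence length (in words) and average word length (in characters). The function names, the regex-based sentence splitter, and the letters-only tokenizer are illustrative assumptions, not the procedure used in the cited studies; a real pipeline would likely use a proper sentence segmenter and tokenizer.

```python
import re


def split_sentences(text):
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    # This is a naive heuristic (it breaks on abbreviations such as "e.g.").
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text):
    # Assumption: a word is a run of ASCII letters or apostrophes.
    return re.findall(r"[A-Za-z']+", text)


def average_sentence_length(text):
    # Mean number of words per sentence; sentences without words are skipped.
    counts = [len(tokenize(s)) for s in split_sentences(text)]
    counts = [c for c in counts if c > 0]
    return sum(counts) / len(counts) if counts else 0.0


def average_word_length(text):
    # Mean number of characters per word.
    words = tokenize(text)
    return sum(len(w) for w in words) / len(words) if words else 0.0


if __name__ == "__main__":
    sample = "The cat sat. The dog ran fast."  # toy text, not corpus data
    print(average_sentence_length(sample))  # 3.5 (sentences of 3 and 4 words)
    print(average_word_length(sample))
```

Scores like these could then be correlated with human-assigned difficulty levels to obtain a baseline of the kind described above, although the exact evaluation setup in the cited work may differ.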