Machine Learning Models Exceeding Human Reading Comprehension

I was recently asked to answer this question:

What are your thoughts on the result of the accuracy rate for the machine reading comprehension of Alibaba iDST’s NLP and Microsoft Research Asia surpassing human beings for the first time in the recent SQuAD competition?

This is more proof of the accelerating advances in machine learning (ML). A third model, Hybrid AoA Reader, also exceeded human performance on exact match (EM) and pushed the F1 score to within 2% of human performance. AoA is just one of several promising new approaches. Improvements will continue, and by the end of this year it is likely that both EM and F1 scores will exceed human performance.
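
For readers unfamiliar with the two metrics, here is a minimal sketch of how EM and F1 are commonly computed for SQuAD-style answers. It is my own illustration, not the official evaluation script: the function names and the normalization steps (lowercasing, dropping punctuation and English articles) are assumptions modeled on the usual approach. It also compares against a single reference answer, whereas real evaluations typically take the best score over several human references.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

EM is all-or-nothing, while F1 gives partial credit for overlapping words. An answer such as "in Paris, France" against the reference "Paris" scores zero on EM but earns partial F1 credit, which is why the two numbers can sit on opposite sides of the human baseline.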

Note that these models are still operating on a tightly bounded use case. It is important not to leap to the conclusion that these models comprehend the text in the same way humans do. The models still operate on surface features and do not respond with the richness and variety that humans do. The correct intuition is that when a valuable use case can be simplified to a limited set of alternatives, ML models are very powerful.

We have only begun to explore the space of potential reading comprehension models, and applying these models to other comprehension tasks has only just started. More generally, the application of validated lab models to commercial applications is in its infancy. This is another core intuition for the near-term promise of ML: there is a substantial backlog of powerful ML models that businesses have not yet explored.