Learning What’s in a Name with Graphical Models
Abstract
“The UK” is a country, but “The UK Department of Transport” is an organization within that country. In a named entity recognition (NER) task, where we want to label each word with a name tag (organization/person/location/other/not a name), how can a computer model know one from the other?
In this article, we’ll explore three model families that are remarkably successful at NER: Hidden Markov Models (HMMs), Maximum-Entropy Markov Models (MEMMs), and Conditional Random Fields (CRFs). We’ll use interactive visualizations to explain the graphical structure of each. Our overarching goal is to demonstrate how visualizations can be effective tools for communicating and clarifying complex, abstract concepts. The visualizations will allow us to compare and contrast the model families and to see how each builds on its predecessors and addresses their key weaknesses.
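A quick preview of the contrast the article develops: the three families differ in what probability they model and where they normalize it. In standard linear-chain notation (following the Sutton and McCallum tutorial listed under References; x is the word sequence, y the tag sequence, n the sentence length, and y_0 a fixed start state):

HMM (generative; models words and tags jointly):
$p(\mathbf{x}, \mathbf{y}) = \prod_{t=1}^{n} p(y_t \mid y_{t-1})\, p(x_t \mid y_t)$

MEMM (discriminative; normalized locally at each step, so any word in x may inform any tag):
$p(\mathbf{y} \mid \mathbf{x}) = \prod_{t=1}^{n} p(y_t \mid y_{t-1}, \mathbf{x})$

Linear-chain CRF (discriminative; normalized globally over entire tag sequences):
$p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{t=1}^{n} \exp\Big(\sum_{k} \theta_k\, f_k(y_{t-1}, y_t, \mathbf{x}, t)\Big)$

The move from the MEMM’s per-step normalization to the CRF’s sequence-level partition function Z(x) is what resolves the label bias problem (Lafferty et al. 2001 and Hannun 2019, under References), one of the key issues alluded to above.

To make the HMM line concrete, here is a minimal sketch of Viterbi decoding, the classic algorithm for recovering the most probable tag sequence (Viterbi 1967; see also the Jurafsky and Martin appendix under References). The tag set, vocabulary, and probability tables are hypothetical toy values chosen for illustration, not the article’s model.

import numpy as np

# Toy HMM tagger: tags are hidden states, words are observations.
# All numbers below are hypothetical placeholders.
tags = ["LOC", "ORG", "O"]
words = ["the", "uk", "department"]

pi = np.array([0.2, 0.2, 0.6])        # p(y_1): initial tag probabilities
A = np.array([[0.5, 0.2, 0.3],        # A[i, j] = p(y_t = tag j | y_{t-1} = tag i)
              [0.1, 0.7, 0.2],
              [0.3, 0.3, 0.4]])
B = np.array([[0.1, 0.8, 0.1],        # B[i, k] = p(x_t = word k | y_t = tag i)
              [0.2, 0.3, 0.5],
              [0.7, 0.2, 0.1]])

def viterbi(obs):
    """Most probable tag sequence for a list of word indices, in log space."""
    n, S = len(obs), len(tags)
    delta = np.zeros((n, S))            # best log-probability of any path ending in each tag
    back = np.zeros((n, S), dtype=int)  # backpointers to recover that path
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, n):
        # scores[i, j]: best path ending in tag i at t-1, then moving to tag j
        scores = delta[t - 1][:, None] + np.log(A) + np.log(B[:, obs[t]])[None, :]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    path = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [tags[i] for i in reversed(path)]

print(list(zip(words, viterbi([0, 1, 2]))))  # decode the toy sentence "the uk department"

Swapping in different probability tables changes which tag sequence wins; the same dynamic-programming idea underlies decoding in MEMMs and CRFs, with transition scores in place of probabilities.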
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
References
Probabilistic Graphical Models: Principles and Techniques (Adaptive Computation and Machine Learning series). Daphne Koller and Nir Friedman. 2009. The MIT Press.
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Judea Pearl. 1988. Morgan Kaufmann Publishers Inc, San Francisco, CA, USA. https://doi.org/10.1016/C2009-0-27609-4
Genes, Themes and Microarrays: Using Information Retrieval for Large-Scale Gene Analysis. Hagit Shatkay, Stephen Edwards, W. John Wilbur, and Mark Boguski. 2000. In Proceedings of the International Conference on Intelligent Systems for Molecular Biology, 317–328.
Information Extraction Using Hidden Markov Models. Timothy Robert Leek. 1997. Master’s Thesis, UC San Diego.
Information Extraction with HMMs and Shrinkage. Dayne Freitag and Andrew McCallum. 1999. In Papers from the AAAI-99 Workshop on Machine Learning for Information Extraction (AAAI Technical Report WS-99-11), 31–36.
A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Lawrence R. Rabiner. 1989. Proceedings of the IEEE 77, 2: 257–286. https://doi.org/10.1109/5.18626
An Algorithm that Learns What’s in a Name. Daniel M. Bikel, Richard Schwartz, and Ralph M. Weischedel. 1999. Machine Learning 34, 1: 211–231. https://doi.org/10.1023/A:1007558221122
Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. Erik F. Tjong Kim Sang and Fien De Meulder. 2003. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, 142–147. https://doi.org/10.3115/1119176.1119195
Appendix A.5 — HMM Training: The Forward-Backward Algorithm. Daniel Jurafsky and James H. Martin. 2021. In Speech and Language Processing. 8–10.
Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm. A. Viterbi. 1967. IEEE Transactions on Information Theory 13, 2: 260–269. https://doi.org/10.1109/TIT.1967.1054010
Appendix A.4 — Decoding: The Viterbi Algorithm. Daniel Jurafsky and James H. Martin. 2021. In Speech and Language Processing. 8–10.
Maximum Entropy Markov Models for Information Extraction and Segmentation. Andrew McCallum, Dayne Freitag, and Fernando C. N. Pereira. 2000. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML ’00), 591–598.
Maximum Entropy Models for Antibody Diversity. Thierry Mora, Aleksandra M. Walczak, William Bialek, and Curtis G. Callan. 2010. Proceedings of the National Academy of Sciences 107, 12: 5405–5410. https://doi.org/10.1073/pnas.1001705107
Human Behavior Modeling with Maximum Entropy Inverse Optimal Control. Brian Ziebart, Andrew Maas, J. Bagnell, and Anind Dey. 2009. In Papers from the 2009 AAAI Spring Symposium, Technical Report SS-09-04, Stanford, California, USA, 92–97.
On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes. Andrew Ng and Michael Jordan. 2001. In Advances in Neural Information Processing Systems.
Inducing Features of Random Fields. S. Della Pietra, V. Della Pietra, and J. Lafferty. 1997. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 4: 380–393. https://doi.org/10.1109/34.588021
Une Approche théorique de l’Apprentissage Connexionniste: Applications à la Reconnaissance de la Parole [A Theoretical Approach to Connectionist Learning: Applications to Speech Recognition]. Léon Bottou. 1991. PhD thesis, Université de Paris XI.
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML ’01), 282–289.
The Label Bias Problem. Awni Hannun. 2019. Awni Hannun — Writing About Machine Learning.
Discriminative Probabilistic Models for Relational Data. Ben Taskar, Pieter Abbeel, and Daphne Koller. 2013. arXiv. https://doi.org/10.48550/arXiv.1301.0604
Accurate Information Extraction from Research Papers using Conditional Random Fields. Fuchun Peng and Andrew McCallum. 2004. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004, 329–336. https://doi.org/10.1016/j.ipm.2005.09.002
Discriminative Fields for Modeling Spatial Dependencies in Natural Images. Sanjiv Kumar and Martial Hebert. 2003. In Advances in Neural Information Processing Systems.
Multiscale Conditional Random Fields for Image Labeling. Xuming He, R.S. Zemel, and M.A. Carreira-Perpinan. 2004. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), II–II. https://doi.org/10.1109/CVPR.2004.1315232
Conditional Random Fields as Recurrent Neural Networks. Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, and Philip H. S. Torr. 2015. In 2015 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/ICCV.2015.179
Convolutional CRFs for Semantic Segmentation. Marvin T. T. Teichmann and Roberto Cipolla. 2018. arXiv. https://doi.org/10.48550/arXiv.1805.04777
RNA Secondary Structural Alignment with Conditional Random Fields. Kengo Sato and Yasubumi Sakakibara. 2005. Bioinformatics 21: ii237–ii242. https://doi.org/10.1093/bioinformatics/bti1139
Protein Fold Recognition Using Segmentation Conditional Random Fields (SCRFs). Yan Liu, Jaime Carbonell, Peter Weigele, and Vanathi Gopalakrishnan. 2006. Journal of Computational Biology 13, 2: 394–406. https://doi.org/10.1089/cmb.2006.13.394
Introduction to Markov Random Fields. Andrew Blake and Pushmeet Kohli. 2011. In Markov Random Fields for Vision and Image Processing. The MIT Press. https://doi.org/10.7551/mitpress/8579.001.0001
An Introduction to Conditional Random Fields. Charles Sutton and Andrew McCallum. 2010. arXiv. https://doi.org/10.48550/ARXIV.1011.4088