Welcome to Abydos's documentation!
- Introduction
- FAQ
- Why is the library licensed under GPL3+? Can you change the license?
- What is the purpose of this library?
- Can you add this new feature?
- Can I contribute to the project?
- Will you add Metaphone 3?
- Why have you included algorithm X when it is already a part of NLTK/SciPy/...?
- Are there similar projects for languages other than Python?
- What is the process for adding a new class to the library?
- Are these really Frequently Asked Questions?
- abydos
- Release History
Indices
- AZvanGemund07
Rui Abreu, Peter Zoeteweij, and Arjan J. C. van Gemund. An evaluation of similarity coefficients for software fault localization. In 2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06). 2007. doi:10.1109/PRDC.2006.18.
- Ada17
Jason Adams. Ruby port of uealite stemmer. 2017. URL: https://github.com/ealdent/uea-stemmer.
- Ain73
William A. Ainsworth. A system for converting text into speech. IEEE Transactions on Audio and Electroacoustics, AU-21(3):288–290, June 1973. doi:10.1109/TAU.1973.1162452.
- AmonME12
Iván Amón, Francisco Moreno, and Jaime Echeverri. Algoritmo fonético para detección de cadenas de texto duplicadas en el idioma español. Revista Ingenierías Universidad de Medellín, 11(20):127–138, June 2012. URL: http://www.scielo.org.co/scielo.php?pid=S1692-33242012000100011&script=sci_abstract&tlng=es.
- And73
Michael R. Anderberg. Cluster Analysis for Applications. Academic Press, New York, 1973. doi:10.1016/C2013-0-06161-0.
- AM04
Marti J. Anderson and Russell B. Millar. Spatial variation and effects of habitat on temperate reef fish assemblages in northeastern new zealand. Journal of Experimental Marine Biology and Ecology, 305:191–221, 2004. doi:10.1016/j.jembe.2003.12.011.
- AndresM04
A. Martín Andrés and P. Femia Marzo. Delta: a new measure of agreement between two raters. British Journal of Mathematical and Statistical Psychology, 57(1):1–20, May 2004. doi:10.1348/000711004849268.
- AC77
Brian Austin and Rita R. Colwell. Evaluation of some coefficients for use in numerical taxonomy of microorganisms. International Journal of Systematic Bacteriology, 27(3):204–210, July 1977. doi:10.1099/00207713-27-3-204.
- Axe09
Pål Axelsson. Sfinxbis. Technical Report, Swedish Alliance for Middleware Infrastructure, April 2009. URL: http://www.swami.se/download/18.248ad5af12aa8136533800091/SfinxBis.pdf.
- BUB76
Cesare Baroni-Urbani and Mauro W. Buser. Similarity of binary data. Systematic Biology, 25(3):251–259, September 1976. doi:10.2307/2412493.
- BCP02
Ilaria Bartolini, Paolo Ciaccia, and Marco Patella. String matching with metric trees using an approximate distance. In Alberto H. F. Laender and Arlindo L. Oliveira, editors, SPIRE 2002: String Processing and Information Retrieval, 271–283. Berlin, Heidelberg, 2002. Springer Berlin Heidelberg. URL: http://www-db.disi.unibo.it/research/papers/SPIRE02.pdf, doi:10.1007/3-540-45735-6_24.
- BB95
Vladimir Batagelj and Matevž Bren. Comparing resemblance measures. Journal of Classification, 12(1):73–90, March 1995. doi:10.1007/BF01202268.
- Bau89
Forrest B. Baulieu. A classification of presence/absence based dissimilarity coefficients. Journal of Classification, 6(1):233–246, 1989. doi:10.1007/BF01908601.
- Bau97
Forrest B. Baulieu. Two variant axiom systems for presence/absence based dissimilarity coefficients. Journal of Classification, 14(1):159–170, 1997. doi:10.1007/s003579900009.
- BM08
Alexander Beider and Stephen P. Morse. Beider-morse phonetic matching: an alternative to soundex with fewer false hits. International Review of Jewish Genealogy, Summer 2008. URL: https://stevemorse.org/phonetics/bmpm.htm.
- Ben01
Rudolfo Benini. Principii di Demografia. Number 29 in Manuali Barbera di Scienze Giuridiche Sociali e Politiche. G. Barbera, Firenze, 1901. URL: http://www.archive.org/stream/principiididemo00benigoog.
- BAG54
E. M. Bennet, R. Alpert, and A. C. Goldstein. Communications through limited-response questioning. Public Opinion Quarterly, 18(3):303–308, 1954. doi:10.1086/266520.
- Bha46
Anil Kumar Bhattacharyya. On a measure of divergence between two multinomial populations. Sankhyā: The Indian Journal of Statistics (1933-1960), 7(4):401–406, July 1946. doi:10.2307/25047882.
- BP80
Gerard Bouchard and Christian Pouyez. Name variations and computerized record linkage. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 13(2):119–125, 1980. doi:10.1080/01615440.1980.10594037.
- BBL81
Gérard Bouchard, Patrick Brard, and Yolande Lavoie. Fonem: un code de transcription phonétique pour la reconstitution automatique des familles saguenayennes. Population, 1981. URL: http://www.persee.fr/doc/pop_0032-4663_1981_num_36_6_17248, doi:10.2307/1532326.
- Boy98
Carolyn B. Boyce. Information on the refined soundex algorithm. November 1998. URL: https://web.archive.org/web/20010513121003/http://www.bluepoof.com:80/Soundex/info2.html.
- Boy11
Leonid Boytsov. Indexing methods for approximate dictionary searching: comparative analysis. Journal of Experimental Algorithmics, 16:1.1:1.1–1.1:1.91, May 2011. doi:10.1145/1963190.1963191.
- Bra51
George W. Brainerd. The place of chronological ordering in archaeological analysis. American Antiquity, 16(4):301–313, April 1951. doi:10.2307/276979.
- BB32
Josias Braun-Blanquet. Plant Sociology: The Study of Plant Communities. McGraw-Hill Book Company, New York, 1932. URL: https://archive.org/details/plantsociologyst00brau.
- BC57
J. Roger Bray and John T. Curtis. An ordination of upland forest communities of southern wisconsin. Ecological Monographs, 27(4):325–349, February 1957. URL: http://cescos.fau.edu/gawliklab/papers/BrayJRandJTCurtis1957.pdf, doi:10.2307/1942268.
- Bro97
Andrei Z. Broder. On the resemblance and containment of documents. In Compression and Complexity of Sequences: Proceedings, Positano, Amalfitan Coast, Salerno, Italy, June 11-13, 1997, 21–29. 1997. doi:10.1109/SEQUEN.1997.666900.
- BW94
Michael Burrows and David J. Wheeler. A block sorting lossless data compression algorithm. SRC Research Report 124, Digital Equipment Corporation, Palo Alto, May 1994. URL: http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-124.html.
- CBW97
Yong Cao, Anthony W. Bark, and W. Peter Williams. Similarity measure bias in river benthic aufwuchs community analysis. Water Environment Research, 69(1):95–106, 1997. doi:10.2175/106143097x125227.
- Cau99
Jörg Caumanns. A fast and simple stemming algorithm for german words. Technical Report, Free University of Berlin, 1999. URL: https://refubium.fu-berlin.de/bitstream/handle/fub188/18405/tr-b-99-16.pdf.
- Cha08
Sung-Hyuk Cha. Taxonomy of nominal type histogram distance measures. In Proceedings of the American Conference on Applied Mathematics (MATH '08). 2008. URL: http://www.wseas.us/e-library/conferences/2008/harvard/math/49-577-887.pdf.
- CTY06
Sung-Hyuk Cha, Charles C. Tappert, and Sungsoo Yoon. Enhancing binary feature vector similarity measures. Journal of Pattern Recognition Research, 1(1):63–77, 2006. doi:10.13176/11.20.
- CCCS04
Anne Chao, Robin L. Chazdon, Robert K. Colwell, and Tsung-Jen Shen. A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecology Letters, 8(2):148–159, 2004. doi:10.1111/j.1461-0248.2004.00707.x.
- CCT10
Seung-Seok Choi, Sung-Hyuk Cha, and Charles C. Tappert. A survey of binary similarity and distance measures. Systemics, Cybernetics and Informatics, 8(1):43–48, 2010.
- Chr06
Peter Christen. A comparison of personal name matching: techniques and practical issues. Technical Report TR-CS-06-02, Australian National University, Canberra, Australia, 2006. URL: https://openresearch-repository.anu.edu.au/bitstream/1885/44521/3/TR-CS-06-02.pdf.
- Chr11
Peter Christen. Febrl (freely extensible biomedical record linkage) – encode.py. December 2011. URL: https://sourceforge.net/projects/febrl/.
- CGHH91
Kenneth Church, William Gale, Patrick Hanks, and Donald Hindle. Using statistics in lexical analysis. In Lexical Acquisition: Exploiting On-Line Resources to Build up a Lexicon, pages 115–164. Lawrence Erlbaum, Hillsdale, NJ, 1991.
- Chu
Richard Churchill. Ueastem.java. URL: http://lemur.cmp.uea.ac.uk/Research/stemmer/UEAstem.java.
- CV05
Rudi Cilibrasi and Paul Michael Béla Vitanyi. Clustering by compression. IEEE Transactions on Information Theory, 51(4):1523–1545, April 2005. URL: https://ieeexplore.ieee.org/document/1412045, doi:10.1109/TIT.2005.844059.
- CislakG17
Aleksander Cisłak and Szymon Grabowski. Lightweight fingerprints for fast approximate keyword matching using bitwise operations. CoRR, 2017. URL: http://arxiv.org/abs/1711.08475.
- Cla52
Philip J. Clark. An extension of the coefficient of divergence for use with multiple characters. Copeia, 1952(2):61–64, June 1952. doi:10.2307/1438532.
- Cle76
Paul W. Clement. A formula for computing inter-observer agreement. Psychological Reports, 39(1):257–258, 1976. doi:10.2466/pr0.1976.39.1.257.
- Cod18a
Rosetta Code. Longest common subsequence. 2018. URL: http://rosettacode.org/wiki/Longest_common_subsequence#Dynamic_Programming_6.
- Cod18b
Rosetta Code. Run-length encoding. 2018. URL: https://rosettacode.org/wiki/Run-length_encoding#Python.
- Coh11
Adam Cohen. Fuzzywuzzy: fuzzy string matching in python. July 2011. URL: https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/.
- Coh60
Jacob Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46, 1960. doi:10.1177/001316446002000104.
- CRF03
William W. Cohen, Pradeep Ravikumar, and Stephen E. Fienberg. A comparison of string distance metrics for name-matching tasks. In IIWEB'03 Proceedings of the 2003 International Conference on Information, 73–78. 2003. URL: http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf.
- CRFR03
William W. Cohen, Pradeep Ravikumar, Stephen E. Fienberg, and Kathryn Rivard. Secondstring. 2003. URL: https://github.com/TeamCohen/secondstring.
- Col49
Lamont C. Cole. The measurement of interspecific association. Ecology, 30(4):411–424, 1949. doi:10.2307/1932444.
- CT12
Viviana Consonni and Roberto Todeschini. New similarity coefficients for binary data. MATCH Communications in Mathematical and in Computer Chemistry, 68:581–592, 2012.
- Cor03
Graham Cormode. Sequence Distance Embeddings. PhD thesis, The University of Warwick, 2003. URL: http://wrap.warwick.ac.uk/61310/7/WRAP_THESIS_Cormode_2003.pdf.
- CPSV00
Graham Cormode, Mike Paterson, Süleyman Cenk Sahinalp, and Uzi Vishkin. Communication complexity of document exchange. In SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, 197–200. 2000.
- Cor73
IBM Corporation. Alpha Search Inquiry System, General Information Manual. White Plains, NY, 1973.
- Cor17
IBM Corporation. IBM SPSS Statistics Algorithms. IBM Corporation, 25 edition, 2017. URL: ftp://public.dhe.ibm.com/software/analytics/spss/documentation/statistics/subscription/en/client/Manuals/IBM_SPSS_Statistics_Algorithms.pdf.
- Cov96
Michael A. Covington. An algorithm to align words for historical comparison. Computational Linguistics, 22(4):481–496, December 1996.
- Cro51
Lee J. Cronbach. Coefficient alpha and the internal structure of tests. Psychometrika, 16(3):297–334, September 1951. doi:10.1007/BF02310555.
- C+69
Jay L. Cunningham and others. A study of the organization and search of bibliographic holdings in on-line computer systems: phase i. Technical Report, University of California, Berkeley, Institute of Library Research, March 1969. URL: https://files.eric.ed.gov/fulltext/ED029679.pdf.
- Cze09
Jan Czekanowski. Zur differentialdiagnose der neandertalgruppe. Korrespondenz-Blatt der Deutschen Gesellschaft für Anthropologie, Ethnologie und Urgeschichte, 40:44–47, 1909.
- DLP99
Ido Dagan, Lillian Lee, and Fernando C. N. Pereira. Similarity-based models of word cooccurrence probabilities. Machine Learning, 34(1–3):43–69, February 1999. doi:10.1023/A:1007537716579.
- Dal05
Andrew Dalke. Arithmetic coder (python recipe). 2005. URL: http://code.activestate.com/recipes/306626/.
- DLZ05
Valentin Dallmeier, Christian Lindig, and Andreas Zeller. Lightweight. In ECOOP'05 Proceedings of the 19th European conference on Object-Oriented Programming. 2005. URL: https://www.st.cs.uni-saarland.de/papers/dlz2004/dlz2004.pdf, doi:10.1007/11531142_23.
- Dam64
Fred J. Damerau. A technique for computer detection and correction of spelling errors. Communications of the ACM, 7(3):171–176, March 1964. doi:10.1145/363958.363994.
- Dav62
Leon Davidson. Retrieval of misspelled names in an airlines passenger record system. Communications of the ACM, 5(3):169–171, March 1962. doi:10.1145/366862.366913.
- dcm4che
dcm4che. DICOM toolkit & library: phonem.java. URL: https://github.com/dcm4che/dcm4che/blob/master/dcm4che-soundex/src/main/java/org/dcm4che3/soundex/Phonem.java.
- Den65
Sally F. Dennis. The construction of a thesaurus automatically from a sample of text. In Mary Elizabeth Stevens, Vincent E. Giuliano, and Laurence B. Heilprin, editors, Statistical Association Techniques for Mechanized Documentation: Symposium Proceedings, number 269 in National Bureau of Standards Miscellaneous Publication, 61–148. Washington, D.C., December 1965. United States Department of Commerce. URL: https://archive.org/details/statisticalassoc269stev.
- DD16
Michel Marie Deza and Elena Deza. Encyclopedia of Distances. Springer-Verlag, Berlin, 4 edition, 2016.
- Dic45
Lee R. Dice. Measures of the amount of ecologic association between species. Ecology, 26(3):297–302, 1945. URL: https://www.jstor.org/stable/1932409, doi:10.2307/1932409.
- Dig83
P. G. N. Digby. Approximating the tetrachoric correlation coefficient. Biometrics, 39(3):753–757, September 1983. doi:10.2307/2531104.
- Dol70
James L. Dolby. An algorithm for variable-length proper-name compression. Journal of Library Automation, 3(4):257–275, 1970. URL: https://ejournals.bc.edu/ojs/index.php/ital/article/download/5259/4734, doi:10.6017/ital.v3i4.5259.
- Doo84
Mayrick H. Doolittle. The verification of predictions. The American Meteorological Journal, 2:327–329, 1884. URL: https://books.google.com/books?id=2f0wAQAAMAAJ&pg=PA327.
- DHC+08
Sean S. Downey, Brian Hallmark, Murray P. Cox, Peter Norquest, and J. Stephen Lansing. Computational feature-sensitive reconstruction of language relationships: developing the aline distance for comparative historical linguistic reconstruction. Journal of Quantitative Linguistics, 15(4):340–369, November 2008. doi:10.1080/09296170802326681.
- DK32
Harold E. Driver and Alfred L. Kroeber. Quantitative expression of cultural relationships. University of California Publications in American Archaeology and Ethnology, 31(4):211–256, 1932. URL: http://digitalassets.lib.berkeley.edu/anthpubs/ucb/text/ucp031-005.pdf.
- Dun93
Ted Dunning. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1):61–74, 1993. URL: http://www.aclweb.org/anthology/J93-1003.
- EH88
Andrzej Ehrenfeucht and David Haussler. A new distance metric on strings computable in linear time. Discrete Applied Mathematics, 20(3):191–203, 1988. doi:10.1016/0166-218X(88)90076-5.
- Eid14
Horst Eidenberger. Categorization and Machine Learning: The Modeling of Human Understanding in Computers. atpress, 2014.
- Ell56
Heinz Ellenberg. Grundlagen Der Vegetationsgliederung. Teil 1. Aufgaben Und Methoden Der Vegetationskunde. Verlag Eugen Ulmer, Stuttgart, 1956.
- EJMS76
Honey S. Elovitz, Rodney W. Johnson, Astrid McHugh, and John E. Shore. Automatic translation of english text to phonetics by means of letter-to-sound rules. NRL Report 7948, document AD/A021 929, Naval Research Laboratory, Washington, D.C., 1976.
- Eri97
Klas Erikson. Approximate swedish name matching - survey and test of different algorithms. Nada report TRITA-NA-E9721, KTH, Royal Institute of Technology, Stockholm, Sweden, 1997. URL: ftp://ftp.nada.kth.se/pub/documents/Theory/Viggo-Kann/NADA-E9721.pdf.
- Eyr38
Henri Eyraud. Les principes de la mesure des corrélations. Annales de l'Université de Lyon, III Series, Section A, 1:30–47, 1938.
- Fag57
Edward W. Fager. Determination and analysis of recurrent groups. Ecology, 38(4):586–595, October 1957. doi:10.2307/1943124.
- FM63
Edward W. Fager and John A. McGowan. Zooplankton species groups in the north pacific. Science, 140(3566):453–460, 1963. doi:10.1126/science.140.3566.453.
- Fai83
Daniel P. Faith. Asymmetric binary similarity measures. Oecologia, 57(3):287–290, March 1983. doi:10.1007/BF00377169.
- Fle75
Joseph L. Fleiss. Measuring agreement between two judges on the presence or absence of a trait. Biometrics, 31(3):651–659, 1975. doi:10.2307/2529549.
- FLP03
Joseph L. Fleiss, Bruce Levin, and Myunghee Cho Paik. Statistical Methods for Rates and Proportions. Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken, 3rd edition, 2003.
- For07
Stephen A. Forbes. On the local distribution of certain illinois fishes: an essay in statistical ecology. Bulletin of the Illinois State Laboratory of Natural History, 7:273–303, 1907.
- For25
Stephen A. Forbes. Method of determining and measuring the associative relations of species. Science, 61(1585):518–524, 1925.
- FK66
Earl G. Fossum and Gilbert Kaskey. Optimization and standardization of information retrieval language and systems. Technical Report, Directorate of Information Sciences, Air Force Office of Scientific Research, Office of Aerospace Research, United States Air Force, Washington, D.C., 1966. URL: https://archive.org/details/DTIC_AD0630797.
- FM83
E. B. Fowlkes and Colin L. Mallows. A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78(383):553–569, 1983. doi:10.1080/01621459.1983.10478008.
- FurnrohrRvR02
Michael Fürnrohr, Birgit Rimmelspacher, and Tilman von Roncador. Zusammenführung von datenbeständen ohne numerische identifikatoren: ein verfahren im rahmen der testuntersuchungen zu einem registergestützten zensus. Bayern in Zahlen, 2002(7):308–321, 2002. URL: https://www.statistik.bayern.de/medien/statistik/zensus/zusammenf__hrung_von_datenbest__nden_ohne_numerische_identifikatoren.pdf.
- Gad90
T. N. Gadd. Phonix: the algorithm. Program, 24(4):363–366, 1990. doi:10.1108/eb047069.
- Gar15
Lars Marius Garshol. Norphone comparator. 2015. URL: https://github.com/larsga/Duke/blob/master/duke-core/src/main/java/no/priv/garshol/duke/comparators/NorphoneComparator.java.
- GM88
Georg Wilde and Carsten Meyer. Nicht wörtlich genommen, 'schreibweisentolerante' suchroutine in dbase implementiert. c't Magazin für Computer Technik, pages 126–131, October 1988.
- GW66
N. Gilbert and Terry C. E. Wells. Analysis of quadrat data. Journal of Ecology, 54(3):675–685, November 1966. doi:10.2307/2257810.
- Gil84
Grove K. Gilbert. Finley's tornado predictions. American Meteorological Journal, 1:166–172, 1884.
- Gil97
Leicester E. Gill. Ox-link: the oxford medical record linkage system. In Record Linkage Techniques. Washington, D.C., March 1997. Federal Committee on Statistical Methodology, Office of Management and Budget. URL: https://pdfs.semanticscholar.org/fff7/02a3322e05c282a84064ee085e589ef74584.pdf.
- Gin12
Corrado Gini. Variabilità e mutabilità. Contributo allo Studio delle Distribuzioni e delle Relazioni Statistiche. C. Cuppini, Bologna, 1912.
- Gin15
Corrado Gini. Nuovi contributi alla teoria delle relazioni statistiche. Atti del Reale Istituto Veneto di Scienze, Lettere ed Arti, Series 8, 74(2):1903–1942, 1915.
- Gle20
Henry Allan Gleason. Some applications of the quadrat method. Bulletin of the Torrey Botanical Club, 47(1):21–33, January 1920. doi:10.2307/2480223.
- Goo67
David W. Goodall. The distribution of the matching coefficient. Biometrics, 23(4):647–656, December 1967. doi:10.2307/2528419.
- GK54
Leo A. Goodman and William H. Kruskal. Measures of association for cross classification i. Journal of the American Statistical Association, 49(268):732–764, 1954. doi:10.2307/2281536.
- GK59
Leo A. Goodman and William H. Kruskal. Measures of association for cross classification ii: further discussion and references. Journal of the American Statistical Association, 54(285):123–163, March 1959. doi:10.2307/2282143.
- Got82
Osamu Gotoh. An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162(3):705–708, 1982. URL: http://www.sciencedirect.com/science/article/pii/0022283682903989, doi:10.1016/0022-2836(82)90398-9.
- Gow71
John C. Gower. A general coefficient of similarities and some of its properties. Biometrics, 27(4):857–871, December 1971. doi:10.2307/2528823.
- GL86
John C. Gower and Pierre Legendre. Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification, 3(1):5–48, February 1986. doi:10.1007/BF01896809.
- GIJ+01
Luis Gravano, Panagiotis G. Ipeirotis, H. V. Jagadish, Nick Koudas, S. Muthukrishnan, and Divesh Srivastava. Approximate string joins in a database (almost) for free. In Proceedings of the 27th VLDB Conference, Roma, Italy, 2001. 2001.
- Gro91
Aaron D. Gross. Getty synoname: the development of software for personal name pattern matching. In Intelligent Text and Image Handling - Volume 2, RIAO '91, 754–763. Paris, France, 1991. LE CENTRE DE HAUTES ETUDES INTERNATIONALES D'INFORMATIQUE DOCUMENTAIRE. URL: http://dl.acm.org/citation.cfm?id=3171004.3171021.
- Guirk
J. P. Guilford. Fundamental Statistics in Psychology and Education. McGraw-Hill Book Company, New York, New York. URL: https://archive.org/details/in.ernet.dli.2015.228996.
- Gut76
Gloria J. A. Guth. Surname spellings and computerized record linkage. Historical Methods Newsletter, 10(1):10–19, 1976. doi:10.1080/00182494.1976.10112645.
- Gut41
Louis Guttman. An outline of the statistical theory of prediction. In Paul Horst, editor, The Prediction of Personal Adjustment, number 48, pages 253–311. Social Science Research Council, 1941. URL: https://babel.hathitrust.org/cgi/pt?id=uc1.b4579784;view=1up;seq=271.
- Gwe08
Kilem Li Gwet. Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61(1):29–48, 2008. doi:10.1348/000711006X126600.
- HH00
Martin Haase and Kai Heitmann. Die erweiterte kölner phonetik. 2000.
- Ham61
Ulrich Hamann. Merkmalbestand und verwandtschaftsbeziehungen der farinosae: ein beitrag zum system der monokotyledonen. Willdenowia, 2:639–768, 1961.
- Ham50
R. W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2):147–160, April 1950. URL: https://ieeexplore.ieee.org/document/6772729/, doi:10.1002/j.1538-7305.1950.tb00463.x.
- Har91
Donna Harman. How effective is stemming? Journal of the American Society for Information Science, 42(1):7–15, 1991. URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.9828&rep=rep1&type=pdf, doi:10.1002/(SICI)1097-4571(199101)42:1%3C7::AID-ASI2%3E3.0.CO;2-P.
- HL78
Francis C. Harris and Benjamin B. Lahey. A method for combining occurrence and nonoccurrence interobserver agreement scores. Journal of Applied Behavior Analysis, 11(4):523–527, 1978. doi:10.1901/jaba.1978.11-523.
- Has14
Ahmad Basheer Hassanat. Dimensionality invariant similarity measure. Journal of American Science, 10(8):221–226, 2014. URL: https://arxiv.org/abs/1409.0923.
- HD73
Robert P. Hawkins and Victor A. Dotson. Reliability scores that delude: an alice in wonderland trip through the misleading characteristics of inter-observer agreement scores in interval recording. Technical Report, Western Michigan University, 1973. URL: https://eric.ed.gov/?id=ED094277.
- Hel09
Ernst Hellinger. Neue begründung der theorie quadratischer formen von unendlichvielen veränderlichen. Journal Für Die Reine Und Angewandte Mathematik, 1909(136):210–271, 1909. doi:10.1515/crll.1909.136.210.
- HH77
Robert A. Henderson and Malcolm L. Heron. A probabilistic method of paleobiogeographic analysis. Lethaia, 10(1):1–15, 1977. doi:10.1111/j.1502-3931.1977.tb00584.x.
- Hen76
Louis Henry. Projet de transcription phonétique des noms de famille. Annales de Démographie Historique, 1976:201–214, 1976. URL: https://www.persee.fr/doc/adh_0066-2062_1976_num_1976_1_1313.
- HBD76
Theodore Hershberg, Alan Burstein, and Robert Dockhorn. Record linkage. Historical Methods Newsletter, 9(2–3):137–163, 1976. doi:10.1080/00182494.1976.10112639.
- HBD79
Theodore Hershberg, Alan Burstein, and Robert Dockhorn. Verkettung von daten: record linkage am beispiel des philadelphia social history project. In Wilhelm Heinz Schröder, editor, Moderne Stadtgeschichte, volume 8, pages 35–73. Klett-Cotta, 1979. URL: https://www.ssoar.info/ssoar/handle/document/32782.
- HM02
David Holmes and M. Catherine McCabe. Improving precision and recall for soundex retrieval. In Proceedings. International Conference on Information Technology: Coding and Computing, 22–26. April 2002. URL: https://ieeexplore.ieee.org/document/1000354/, doi:10.1109/ITCC.2002.1000354.
- Hoo02
David Hood. Caverphone: phonetic matching algorithm. Technical Report CTP060902, University of Otago, Dunedin, New Zealand, September 2002. URL: https://caversham.otago.ac.nz/files/working/ctp060902.pdf.
- Hoo04
David Hood. Caverphone revisited. Technical Report CTP150804, University of Otago, Dunedin, New Zealand, December 2004. URL: https://caversham.otago.ac.nz/files/working/ctp150804.pdf.
- Hor66
Henry S. Horn. Measurement of "overlap" in comparative ecological studies. The American Naturalist, 100(914):419–424, September 1966. doi:10.2307/2459242.
- Hubalek08
Zdenek Hubálek. Coefficients of association and similarity, based on binary (presence-absence) data: an evaluation. Biological Reviews, 57(4):669–689, February 2008. doi:10.1111/j.1469-185X.1982.tb00376.x.
- Hur69
Stuart H. Hurlbert. A coefficient of interspecific association. Ecology, 50(1):1–9, January 1969. doi:10.2307/1934657.
- Jac01
Paul Jaccard. Distribution de la flore alpine dans le bassin des dranses et dans quelques régions voisines. Bulletin de la Société Vaudoise des Sciences Naturelles, 37:241–272, 1901. URL: https://core.ac.uk/download/pdf/20654241.pdf.
- Jar89
Matthew A. Jaro. Advances in record linkage methodology as applied to the 1985 census of tampa florida. Journal of the American Statistical Association, 84(406):414–420, 1989. doi:10.1080/01621459.1989.10478785.
- JS05
Marie-Claire Jenkins and Dan Smith. Conservative stemming for search and indexing. Technical Report, University of East-Anglia, Norwich, UK, 2005. URL: http://lemur.cmp.uea.ac.uk/Research/stemmer/stemmer25feb.pdf.
- JBG13
Sergio Jimenez, Claudio Becerra, and Alexander Gelbukh. SOFTCARDINALITY-CORE: improving text overlap with distributional measures for semantic textual similarity. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task, 194–201. Atlanta, GA, June 2013. Association for Computational Linguistics. URL: http://www.aclweb.org/anthology/S13-1028.
- Joh67
Stephen C. Johnson. Hierarchical clustering schemes. Psychometrika, 32(3):241–254, September 1967. doi:10.1007/BF02289588.
- JH05
James A. Jones and Mary Jean Harrold. Empirical evaluation of the tarantula automatic fault-localization technique. In ASE '05 Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering, 273–282. New York, November 2005. ACM. doi:10.1145/1101908.1101949.
- Kem05
Sebastian Kempken. Bewertung historischer und regionaler schreibvarianten mit hilfe von abstandsmaßen. Master's thesis, Universität Duisburg-Essen, December 2005. URL: https://duepublico.uni-duisburg-essen.de/servlets/DerivateServlet/Derivate-17252/BewertungSchreibvarianten.pdf.
- Ken38
Maurice G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, June 1938. doi:10.2307/2332226.
- KF77
Ronald N. Kent and Sharon L. Foster. Direct observational procedure: methodological issues in naturalistic settings. In Anthony R. Ciminero, Karen S. Calhoun, and Henry E. Adams, editors, Handbook of Behavioral Assessment, chapter 9, pages 279–328. John Wiley & Sons, New York, 1977. URL: https://archive.org/details/handbookofbehavi00cimi.
- Knu98
Donald E. Knuth. The Art of Computer Programming: Volume 3, Sorting and Searching, page 394. Addison-Wesley, 1998.
- Kollar
Maroš Kollár. Text::phonetic::phonix. URL: https://github.com/maros/Text-Phonetic/blob/master/lib/Text/Phonetic/Phonix.pm.
- Kon00
Grzegorz Kondrak. A new algorithm for the alignment of phonetic sequences. In NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference. 2000. doi:10.0000/dl.acm.org/974343.
- Kon02
Grzegorz Kondrak. Algorithms for Language Reconstruction. PhD thesis, University of Toronto, 2002. URL: https://webdocs.cs.ualberta.ca/~kondrak/papers/thesis.pdf.
- KD03
Grzegorz Kondrak and Bonnie J. Dorr. A similarity-based approach and evaluation methodology for reduction of drug name confusion. Technical Report, University of Maryland, Institute for Advanced Computer Studies, 2003. URL: https://apps.dtic.mil/dtic/tr/fulltext/u2/a452242.pdf.
- KV17
Keerthi Koneru and Cihan Varol. Privacy preserving record linkage using metasoundex algorithm. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), 443–447. December 2017. URL: https://ieeexplore.ieee.org/document/8260671/, doi:10.1109/ICMLA.2017.0-121.
- KR37
G. Frederic Kuder and Marion Webster Richardson. The theory of the estimation of test reliability. Psychometrika, 2(3):151–160, September 1937. doi:10.1007/bf02288391.
- Kuh95
Michael Kuhn. Metaphone searches. November 1995. URL: http://aspell.net/metaphone/metaphone-kuhn.txt.
- Kuh64
John L. Kuhns. The continuum of coefficients of association. In Mary Elizabeth Stevens, Vincent E. Giuliano, and Laurence B. Heilprin, editors, Statistical Association Methods for Mechanized Documentation, number 269 in National Bureau of Standards Miscellaneous Publication, 33–40. 1964.
- Kul15
Maciej Kula. Simple minhash implementation in python. June 2015. URL: https://maciejkula.github.io/2015/06/01/simple-minhash-implementation-in-python/.
- Kulczynski27
Stanisław Kulczyński. Die pflanzenassoziationen der pieninen. Bulletin International de l'Academie Polonaise des Sciences et des Lettres, Classe des Sciences Mathematiques et Naturelles, B (Sciences Naturelles), pages 57–203, 1927.
- Koppen70
Wladimir Köppen. Die aufeinanderfolge der periodischen witterungserscheinungen nach den grundsätzen der wahrscheinlichkeitsrechnung. In Repertorium für Meteorologie, volume 2, pages 189–238. Akademiia Nauk, 1870. URL: https://books.google.com/books?id=1ww0AQAAMAAJ&pg=RA1-PA187#v=onepage&q&f=false.
- LR96
Andrew J. Lait and Brian Randell. An assessment of name matching algorithms. Technical Report, University of Newcastle upon Tyne, Newcastle upon Tyne, UK, 1996. URL: http://homepages.cs.ncl.ac.uk/brian.randell/Genealogy/NameMatching.pdf.
- LW66
Godfrey N. Lance and William T. Williams. Computer programs for hierarchical polythetic classification ("similarity analysis"). Computer Journal, 1966. doi:10.1093/comjnl/9.1.60.
- LW67a
Godfrey N. Lance and William T. Williams. A general theory of classificatory sorting strategies. ii. clustering systems. Computer Journal, 10(3):271–277, January 1967. URL: https://academic.oup.com/comjnl/article-pdf/10/3/271/1333425/100271.pdf, doi:10.1093/comjnl/10.3.271.
- LW67b
Godfrey N. Lance and William T. Williams. Mixed-data classificatory programs i. agglomerative systems. Australian Computer Journal, 1:15–20, 1967.
- Lan13
Joerg Lang. Inner workings of the german analyzer in lucene. November 2013. URL: http://www.evelix.ch/unternehmen/Blog/evelix/2013/11/11/inner-workings-of-the-german-analyzer-in-lucene.
- LL98
Pierre Legendre and Louis Legendre. Numerical Ecology. Number 20 in Developments in Environmental Modelling. Elsevier, Amsterdam, 2nd edition, 1998.
- Lev65
Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Doklady Akademii Nauk SSSR, 163(4):845–848, 1965. URL: http://mi.mathnet.ru/dan31411.
- Lev66
Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710, February 1966. URL: https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf.
- Lin04
Chin-Yew Lin. Rouge: a package for automatic evaluation of summaries. In Text Summarization Branches Out. 2004. URL: http://aclweb.org/anthology/W04-1013.
- LSShaweTaylor+02
Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, and Chris Watkins. Text classification using string kernels. Journal of Machine Learning Research, 2:419–444, 2002. doi:10.1162/153244302760200687.
- Lov68
Julie Beth Lovins. Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11(1–2):22–31, June 1968. URL: http://www.mt-archive.info/MT-1968-Lovins.pdf.
- LA77
Billy T. Lynch and William L. Arends. Selection of a surname coding procedure for the srs record linkage system. Technical Report, Statistical Reporting Service, US Department of Agriculture, Washington, D.C., February 1977. URL: https://naldc.nal.usda.gov/download/27833/PDF.
- LegareLC72
Jacques Légaré, Yolande Lavoie, and Hubert Charbonneau. The early canadian population: problems in automatic record linkage. Canadian Historical Review, 53(4):427–442, December 1972. doi:10.3138/CHR-053-04-03.
- Mar15
Daniel Marcelino. Soundexbr: soundex (phonetic) algorithm for Brazilian portuguese. July 2015. URL: https://github.com/danielmarcelino/SoundexBR.
- Mat75
Brian W. Matthews. Comparison of the predicted and observed secondary structure of t4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure, 405(2):442–451, 1975.
- Mat55
Kameo Matusita. Decision rules, based on the distance, for problems of fit, two samples, and estimation. The Annals of Mathematical Statistics, 26(4):631–640, December 1955. doi:10.2307/2236376.
- MP68
A. E. Maxwell and A. E. G. Pilliner. Deriving coefficients of reliability and agreement for ratings. The British Journal of Mathematical and Statistical Psychology, 21(1):105–116, May 1968. doi:10.1111/j.2044-8317.1968.tb00401.x.
- McC64
Bayard H. McConnaughey. The determination and analysis of plankton communities. Lembaga Penelitian Laut, pages 1–40, 1964.
- Mic99
Jörg Michael. Doppelgänger gesucht – ein programm für die kontextsensitive phonetische stringumwandlung. c't Magazin für Computer Technik, page 252, 1999. URL: http://www.heise.de/ct/ftp/99/25/252/.
- Mic07
Jörg Michael. Phonet.c. August 2007. URL: ftp://ftp.heise.de/pub/ct/listings/phonet.zip.
- Mic20
Ellis L. Michael. Marine ecology and the coefficient of association: a plea in behalf of quantitative biology. The Journal of Ecology, 8(1):54–59, 1920. doi:10.2307/2255213.
- Min10
Hermann Minkowski. Geometrie der Zahlen. B. G. Teubner, Leipzig, 1910. URL: https://archive.org/stream/geometriederzahl00minkrich.
- Mok97
Gary Mokotoff. Soundexing and genealogy. 1997. URL: http://www.avotaynu.com/soundex.htm.
- ME96
Alvaro E. Monge and Charles P. Elkan. The field matching problem: algorithms and applications. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD'96, 267–270. AAAI Press, 1996. URL: http://dl.acm.org/citation.cfm?id=3001460.3001516.
- MKTM77
Gwendolyn B. Moore, John L. Kuhns, Jeffrey L. Trefftzs, and Christine A. Montgomery. Accessing Individual Records from Personal Data Files Using Non-Unique Identifiers. Number 500-2 in Special Publication. National Bureau of Standards, Washington, D.C., February 1977. URL: https://archive.org/details/accessingindivid00moor.
- MYCappe08
Erwan Moreau, François Yvon, and Olivier Cappé. Robust similarity measures for named entities matching. In COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, 593–600. August 2008.
- Mor59
Masaaki Morisita. Measuring of interspecific association and similarity between communities. In Memoirs of the Faculty of Science, volume 3 of Series E (Biology), pages 65–80. Kyushu University, 1959.
- Mor12
James F. Morris. A Quantitative Method for Vetting "Dark Network" Intelligence Sources for Social Network Analysis. PhD thesis, Air Force Institute of Technology, 2012. URL: https://apps.dtic.mil/dtic/tr/fulltext/u2/a561702.pdf.
- MLM12
Alejandro Mosquera, Elena Lloret, and Paloma Moreda. Towards facilitating the accessibility of web 2.0 Texts through text normalisation. In Proceedings of the LREC workshop: Natural Language Processing for Improving Textual Accessibility (NLP4ITA), Istanbul, Turkey, 9–14. 2012. URL: http://www.taln.upf.edu/pages/nlp4ita/pdfs/mosquera-nlp4ita2012.pdf.
- MDobrzanskiZ50
J. Motyka, B. Dobrzański, and S. Zawadzki. Wstępne badania nad łąkami południowo-wschodniej Lubelszczyzny (preliminary studies on meadows in the south-east of the province lublin). Annales Universitatis Mariae Curie-Skłodowska, Sectio E, 5(13):367–447, 1950.
- Mou62
M. D. Mountford. An index of similarity and its application to classificatory problems. In P. W. Murphy, editor, Progress in Soil Zoology: Papers from a Colloquium on Research Methods Organized by the Soil Zoology Committee of the International Society of Soil Science, 43–50. London, July 1962. Butterworths. URL: https://openlibrary.org/books/OL5908681M/Progress_in_soil_zoology.
- Moz36
Alan Mozley. The statistical analysis of the distribution of pond molluscs in western Canada. The American Naturalist, 1936. doi:10.1086/280660.
- NMM11
Rashid Naseem, Onaiza Maqbool, and Siraj Muhammad. Improved similarity measures for software clustering. In Proceedings of the Euromicro Conference on Software Maintenance and Reengineering, CSMR. March 2011. doi:10.1109/CSMR.2011.9.
- NW70
Saul B. Needleman and Christian D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443–453, 1970. URL: http://www.sciencedirect.com/science/article/pii/0022283670900574, doi:10.1016/0022-2836(70)90057-4.
- Och57
Akira Ochiai. Zoogeographical studies on the soleoid fishes found in Japan and its neighbouring regions-ii. Bulletin of the Japanese Society of Scientific Fisheries, 22(9):526–530, 1957. URL: https://www.jstage.jst.go.jp/article/suisan1932/22/9/22_9_526/_pdf/-char/en, doi:10.2331/suisan.22.526.
- oC13
Library of Congress. Classification and Shelflisting Manual. Library of Congress, 2013. URL: https://www.loc.gov/aba/publications/FreeCSM/freecsm.html.
- Ope12
OpenRefine. Clustering in depth. 2012. URL: https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth.
- Orloci67
Laszlo Orlóci. An agglomerative method for classification of plant communities. The Journal of Ecology, 55(1):193–206, March 1967. doi:10.2307/2257725.
- Ots36
Yanosuke Otsuka. The faunal character of the Japanese pleistocene marine mollusca, as evidence of the climate having become colder during the pleistocene in Japan. Bulletin of the Biogeographical Society of Japan, 6(16):165–170, 1936.
- Ozb15
Hakan Ozbay. Ozbay metric. 2015. URL: https://github.com/hakanozbay/ozbay-metric.
- Pai90
Chris D. Paice. Another stemmer. In ACM SIGIR Forum, volume 24, 56–61. Fall 1990. URL: https://dl.acm.org/citation.cfm?id=101310, doi:10.1145/101306.101310.
- PRWZ02
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, 311–318. 2002. URL: https://www.aclweb.org/anthology/P02-1040.pdf.
- PK14
Vimal P. Parmar and CK Kumbharana. Study existing various phonetic algorithms and designing and development of a working model for the new developed algorithm and comparison by implementing it with existing algorithm(s). International Journal of Computer Applications, 98(19):45–49, 2014. doi:10.5120/17295-7795.
- Pas06
Rebecca Passonneau. Measuring agreement on set-valued items (masi) for semantic and pragmatic annotation. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06), 831–836. May 2006.
- Pea00
Karl Pearson. Mathematical contributions to the theory of evolution. vii. on the correlation of characters not quantitatively measurable. Philosophical Transactions of the Royal Society, 195 A:1–47, 1900. doi:10.1098/rsta.1900.0022.
- PH13
Karl Pearson and David Heron. On theories of association. Biometrika, 9(1/2):159–315, 1913. doi:10.2307/2331805.
- Pec10
Pavel Pecina. Lexical association measures and collocation extraction. Language Resources & Evaluation, 44(1/2):137–158, 2010. doi:10.2307/40666353.
- Pei84
Charles S. Peirce. The numerical measure of the success of predictions. Science, 4(93):453–454, 1884. doi:10.1126/science.ns-4.93.453-a.
- Pen52
Lionel S. Penrose. Distance, size and shape. Annals of Eugenics, 17(1):337–343, January 1952. doi:10.1111/j.1469-1809.1952.tb02527.x.
- Pfe00
Ulrich Pfeifer. Wait 1.8 - soundex.c. 2000. URL: https://fastapi.metacpan.org/source/ULPFR/WAIT-1.800/soundex.c.
- Phi90a
Lawrence Philips. Hanging on the metaphone. Computer Language, 7(12):39–44, December 1990.
- Phi90b
Lawrence Philips. Metaphone. December 1990. URL: http://aspell.net/metaphone/metaphone.basic.
- Phi00
Lawrence Philips. The double metaphone search algorithm. C/C++ Users Journal, 18(6):38–43, June 2000.
- Pli18
Guillaume Plique. Talisman. 2018. URL: https://github.com/Yomguithereal/talisman.
- PZ84
Joseph J. Pollock and Antonio Zamora. Automatic spelling correction in scientific and scholarly text. Communications of the ACM, 27(4):358–368, April 1984. URL: http://dl.acm.org/citation.cfm?id=358048, doi:10.1145/358027.358048.
- Por80
Martin F. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, July 1980. URL: http://snowball.tartarus.org/algorithms/porter/stemmer.html, doi:10.1108/eb046814.
- Por02
Martin F. Porter. The english (porter2) stemming algorithm. September 2002. URL: http://snowball.tartarus.org/algorithms/english/stemmer.html.
- Pos69
Hans Joachim Postel. Die kölner phonetik: ein verfahren zur identifizierung von personennamen auf der grundlage der gestaltanalyse. IBM-Nachrichten, 19:925–931, 1969.
- Pra15
Jörg Prante. Elasticsearch – haasephonetik.java. 2015. URL: https://github.com/elastic/elasticsearch/blob/master/plugins/analysis-phonetic/src/main/java/org/elasticsearch/index/analysis/phonetic/HaasePhonetik.java.
- Rruvzivcka58
M. Růžička. Anwendung mathematische-statistischer methoden in der geobotanik (synthetische bearbeitung von aufnahmen). Biologia, Bratislava, 13:647–661, 1958.
- RTS+01
Dragomir Radev, Simone Teufel, Horacio Saggion, Wai Lam, John Blitzer, Arda Çelebi, Hong Qi, Elliott Drabek, and Danyu Liu. Evaluation of text summarization in a cross-lingual information retrieval framework. Technical Report, Johns Hopkins, 2001. URL: https://pdfs.semanticscholar.org/44a1/df62a1c815fc84aa42788283655a38c85550.pdf.
- Ran71
William M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846–850, December 1971. doi:10.2307/2284239.
- RM88
John W. Ratcliff and David E. Metzener. Pattern matching: the gestalt approach. Dr. Dobbs Journal, 1988. URL: http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970.
- RC79
David M. Raup and Rex E. Crick. Measurement of faunal similarity in paleontology. Journal of Paleontology, 53(5):1213–1227, September 1979. doi:10.2307/1304099.
- RaissouliLC09
Mustapha Raïssouli, Fatima Leazizi, and Mohamed Chergui. Arithmetic-geometric-harmonic mean of three positive operators. Journal of Inequalities in Pure and Applied Mathematics, 2009. URL: http://www.emis.de/journals/JIPAM/images/014_08_JIPAM/014_08.pdf.
- Ree14
Tony Rees. Taxamatch, an algorithm for near ('fuzzy') matching on scientific names in taxonomic databases. PLoS ONE, 9(9):1–27, September 2014. doi:10.1371/journal.pone.0107510.
- RB13
Tony Rees and Barbara Boehmer. The mdld (modified damerau-levenshtein distance) algorithm. November 2013. URL: https://confluence.csiro.au/public/taxamatch/the-mdld-modified-damerau-levenshtein-distance-algorithm.
- Rep13
Dominic John Repici. Understanding classic soundex algorithms. 2013. URL: http://creativyst.com/Doc/Articles/SoundEx1/SoundEx1.htm#SoundExAndCensus.
- RU09
Nicholas Ring and Alexandra L. Uitdenbogerd. Finding 'lucy in disguise': the misheard lyric matching problem. In Gary Geunbae Lee, Dawei Song, Chin-Yew Lin, Akiko Aizawa, Kazuko Kuriyama, Masaharu Yoshioka, and Tetsuya Sakai, editors, Information Retrieval Technology, 157–167. Berlin, Heidelberg, 2009. Springer Berlin Heidelberg. doi:10.1007/978-3-642-04769-5_14.
- Rob86
David W. Roberts. Ordination on the basis of fuzzy set theory. Vegetatio, 66(3):123–131, 1986. doi:10.1007/BF00039905.
- RC67
A. H. Robinson and Colin Cherry. Results of a prototype television bandwidth compression scheme. In Proceedings of the IEEE, volume 55, 356–364. IEEE, 1967. doi:10.1109/PROC.1967.5493.
- Rob51
W. S. Robinson. A method for chronologically ordering archaeological deposits. American Antiquity, 16(4):293–301, April 1951. doi:10.2307/276978.
- RT60
David J. Rogers and Taffee T. Tanimoto. A computer program for classifying plants. Science, 132(3434):1115–1118, October 1960. doi:10.1126/science.132.3434.1115.
- RG66
Eugene Rogot and Irving D. Goldberg. A proposed index for measuring agreement in test-retest studies. Journal of Chronic Diseases, 1966. doi:10.1016/0021-9681(66)90032-4.
- RY05
Gong Ruibin and Chan Kai Yun. An adaptive model for phonetic string search. In Knowledge-Based Intelligent Information and Engineering Systems, 9th International Conference, KES 2005 Melbourne, Australia, September 14-16, 2005 Proceedings, Part III, volume 3683 of Lecture Notes in Artificial Intelligence, 915–921. 2005.
- Ruk18
Dorothea Rukasz. Pprl – privacy preserving record linkage. 2018. URL: https://github.com/cran/PPRL.
- RHJF14
Daniel E. Russ, Kwan-Yuet Ho, Calvin A. Johnson, and Melissa C. Friesen. Computer-based coding of occupation codes for epidemiological analysis. In 2014 IEEE 27th International Symposium on Computer-Based Medical Systems, 347–350. 2014. doi:10.1109/CBMS.2014.79.
- RR40
Paul F. Russell and T. Ramachandra Rao. On habitat and association of species of anopheline larvae in south-eastern madras. Journal of the Malaria Institute of India, 3(1):153–178, 1940.
- Rus18
Robert C. Russell. Index. 1918. URL: https://patentimages.storage.googleapis.com/31/35/a1/f697a3ab85ced6/US1261167.pdf.
- Sav05
Jacques Savoy. IR multilingual resources at unine. 2005. URL: http://members.unine.ch/jacques.savoy/clef/.
- Schurer07
Kevin Schürer. Creating a nationally representative individual and household sample for great britain, 1851 to 1901 - the victorian panel study (vps). Historical Social Research / Historische Sozialforschung, 32(2):211–331, 2007. doi:10.2307/20762213.
- SGRW96
Robyn Schinke, Mark Greengrass, Alexander M. Robertson, and Peter Willett. A stemming algorithm for latin text databases. Journal of Documentation, 52(2):172–187, 1996. doi:10.1108/eb026966.
- SBB04
Rainer Schnell, Tobias Bachteler, and Stefan Bender. A toolbox for record linkage. Australian Journal of Statistics, 33(1-2):125–133, 2004. URL: https://pdfs.semanticscholar.org/2353/21c24ed0401cd05d7752c2c8a8da5b7a4dc0.pdf.
- Sco55
William A. Scott. Reliability of content analysis: the case of nominal scale coding. Public Opinion Quarterly, 19(3):321–325, 1955. doi:10.1086/266577.
- Sei93
Heinz-Jürgen Seiffert. Problem 887. Nieuw Archief voor Wiskunde, 11(4):176, 1993.
- Seq18
SequentiX. Distance measures. 2018. URL: https://www.sequentix.de/gelquest/help/distance_measures.htm.
- SA10
Boumedyen A. N. Shannaq and Victor V. Alexandrov. Using product similarity for adding business. Global Journal of Computer Science and Technology, 10(12):2–8, October 2010. URL: https://www.sial.iias.spb.su/files/386-386-1-PB.pdf.
- SS07
Dana Shapira and James A. Storer. Edit distance with move operations. Journal of Discrete Algorithms, 5(2):380–392, June 2007. doi:10.1016/j.jda.2005.01.010.
- Shi93
Guang R. Shi. Multivariate data analysis in palaeoecology and palaeobiogeography—a review. Palaeogeography, Palaeoclimatology, Palaeoecology, 105(3-4):199–234, 1993. doi:10.1016/0031-0182(93)90084-v.
- SGGomezAP14
Grigori Sidorov, Alexander Gelbukh, Helena Gómez-Adorno, and David Pinto. Soft similarity and soft cosine measure: similarity of features in vector space model. Computación y Sistemas, 2014. URL: http://www.scielo.org.mx/pdf/cys/v18n3/v18n3a7.pdf, doi:10.13053/CyS-18-3-2043.
- Sim49
Edward H. Simpson. Measurement of diversity. Nature, 163:688, April 1949. URL: https://www.nature.com/articles/163688a0, doi:10.1038/163688a0.
- Sjoo09
Allan Sjöö. Swamisfinxbis. 2009. URL: http://www.swami.se/download/18.248ad5af12aa8136533800093/swamiSfinxBis.java.txt.
- SW81
Temple F. Smith and Michael S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195–197, 1981. URL: http://www.sciencedirect.com/science/article/pii/0022283681900875, doi:10.1016/0022-2836(81)90087-5.
- SD02
Chakkrit Snae and Bernard Diaz. An interface for mining genealogical nominal data using the concept of linkage and a hybrid name matching algorithm. Journal of 3D-Forum Society, 16(1):142–147, 2002. URL: https://web.archive.org/web/20050329140715/www.csc.liv.ac.uk/~chakkrit/Publications/hc2001_Journal.pdf.
- SM58
Robert R. Sokal and Charles D. Michener. A statistical method for evaluating systematic relationships. The University of Kansas Science Bulletin, 38, part 2(22):1409–1438, March 1958. URL: https://archive.org/details/cbarchive_133648_astatisticalmethodforevaluatin1902.
- SS63
Robert R. Sokal and Peter H. A. Sneath. Principles of Numerical Taxonomy. W. H. Freeman and Company, San Francisco, 1963.
- Son11
Wayne Song. Typo-distance. 2011. URL: https://github.com/wsong/Typo-Distance.
- Sor58
Theodor Sorgenfrei. Molluscan Assemblages from the Marine Middle Miocene of South Jutland and Their Environments. Number 79 in 2. Danmarks Geologiske Undersøgelse, 1–503, 1958.
- Sta97
United States. Using the Census Soundex. Number 55 in General Information Leaflet. National Archives and Records Administration, Washington, D.C., 1997. URL: https://hdl.handle.net/2027/pur1.32754067050041.
- Sta07
United States. Soundex system: the soundex indexing system. 2007. URL: https://www.archives.gov/research/census/soundex.html.
- Ste34
J. F. Steffensen. On certain measures of dependence between statistical variables. Biometrika, 26(1/2):251–255, May 1934. doi:10.2307/2332058.
- SLaclavik15
Sam Steingold and Michal Laclavík. An information theoretic metric for multi-class categorization. Technical Report, Magnetic Media Online, 2015. URL: https://github.com/Magnetic/proficiency-metric/blob/master/paper/predeval.pdf.
- Ste14
Kevin L. Stern. Dameraulevenshteinalgorithm.java. 2014. URL: https://github.com/KevinStern/software-and-algorithms/blob/master/src/main/java/blogspot/software_and_algorithms/stern_library/string/DamerauLevenshteinAlgorithm.java.
- Sti61
H. Edmund Stiles. The association factor in information retrieval. Journal of the ACM, 8(2):271–279, April 1961. doi:10.1145/321062.321074.
- SSK05
Giorgos Stoilos, Giorgos Stamou, and Stefanos Kollias. A string metric for ontology alignment. In ISWC'05 Proceedings of the 4th international conference on The Semantic Web, 624–637. Galway, Ireland, November 2005. doi:10.1007/11574620_45.
- Stu53
A. Stuart. The estimation and comparison of strengths of association in contingency tables. Biometrika, 40(1/2):105–110, June 1953. doi:10.2307/2333101.
- Szy34
Dezydery Szymkiewicz. Une contribution statistique à la géographie floristique. Acta Societatis Botanicorum Poloniae, 11(3):249–265, 1934. URL: https://pbsociety.org.pl/journals/index.php/asbp/article/download/asbp.1934.012/6710, doi:10.5586/asbp.1934.012.
- Sorensen48
Thorvald Sørensen. A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on danish commons. Kongelige Danske Videnskabernes Selskab, 5(4):1–34, 1948. URL: http://www.royalacademy.dk/Publications/High/295_S%C3%B8rensen,%20Thorvald.pdf.
- Taf70
Robert L. Taft. Name Search Techniques. Special report (New York State Identification and Intelligence System). Bureau of Systems Development, New York State Identification and Intelligence System, 1970.
- Tan58
T. T. Tanimoto. An elementary mathematical theory of classification and prediction. Technical Report, IBM, 1958.
- Tar60
Kazimierz Tarwid. Szacowanie zbieznosci nisz ekologicznych gatunkow droga oceny prawdopodobienstwa spotykania sie ich w polowach. Ekologia Polska, Seria B, pages 115–130, 1960.
- Tic84
Walter F. Tichy. The string-to-string correction problem with block moves. ACM Transactions on Computer Systems, 2(4):309–321, November 1984. doi:10.1145/357401.357404.
- Tic
Ticki. Eudex: a blazingly fast phonetic reduction/hashing algorithm. URL: https://github.com/ticki/eudex.
- Tic16
Ticki. The eudex algorithm. December 2016. URL: http://ticki.github.io/blog/the-eudex-algorithm/.
- Tul97
Rodham E. Tulloss. Assessment of similarity indices for undesirable properties and a new tripartite similarity index based on cost functions. In Mary E. Palm and Ignacio H. Chapela, editors, Mycology in Sustainable Development: Expanding Concepts, Vanishing Borders, pages 122–143. Parkway Publishers, Inc., Boone, NC, 1997.
- TCLM88
W. A. Turner, G. Charton, F. Laville, and B. Michelet. Packaging information for peer review: new co-word analysis techniques. In Handbook of Quantitative Studies of Science and Technology. New Holland, 1988.
- Tve77
Amos Tversky. Features of similarity. Psychological Review, 84(4):327–352, 1977. URL: http://www.cogsci.ucsd.edu/~coulson/203/tversky-features.pdf, doi:10.1037/0033-295x.84.4.327.
- Ukk92
Esko Ukkonen. Approximate string-matching with q-grams and maximal matches. Theoretical Computer Science, 92(1):191–211, 1992. doi:10.1016/0304-3975(92)90143-4.
- Uph77
William B. Upholt. Estimation of DNA sequence divergence from comparison of restriction endonuclease digests. Nucleic Acids Research, 4(5):1257–1265, January 1977. doi:10.1093/nar/4.5.1257.
- VB12
Cihan Varol and Coskun Bayrak. Hybrid matching algorithm for personal names. Journal of Data and Information Quality, 3(4):8:1–8:18, September 2012. doi:10.1145/2348828.2348830.
- WF74
Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. Journal of the ACM, 21(1):168–173, January 1974. doi:10.1145/321796.321811.
- War08
Matthijs J. Warrens. Similarity Coefficients for Binary Data: Properties of Coefficients, Coefficient Matrices, Multi-way Metrics and Multivariate Coefficients. PhD thesis, Universiteit Leiden, Leiden, June 2008. URL: https://openaccess.leidenuniv.nl/bitstream/handle/1887/12987/Full_thesis.pdf.
- Whid.
Simon White. How to strike a match. Web, Nd. The oldest version on Internet Archive was archived in 2004. URL: http://www.catalysoft.com/articles/StrikeAMatch.html.
- Whi52
R. H. Whittaker. A study of summer foliage insect communities in the great smoky mountains. Ecological Monographs, 22(1):1–44, January 1952. doi:10.2307/1948527.
- Whi82
Robert H. Whittaker. Ordination of Plant Communities. Volume 5 of Handbook of Vegetation Science. Springer Netherlands, 1982.
- Wik18
Wikibooks. Algorithm implementation/strings/longest common substring. 2018. URL: https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring#Python.
- Wil05
Martin Wilz. Aspekte der kodierung phonetischer Ähnlichkeiten in deutschen eigennamen. Master's thesis, Universität zu Köln, Köln, 2005. URL: http://ifl.phil-fak.uni-koeln.de/sites/linguistik/Phonetik/import/Phonetik_Files/Allgemeine_Dateien/Martin_Wilz.pdf.
- Win90
William E. Winkler. String comparator metrics and enhanced decision rules in the fellegi-sunter model of record linkage. Technical Report, U.S. Bureau of the Census, Statistical Research Division, Washington, D.C., 1990. URL: https://files.eric.ed.gov/fulltext/ED325505.pdf.
- WMJL94
William E. Winkler, George McLaughlin, Matthew A. Jaro, and Maureen Lynch. Strcmp95.c. January 1994. URL: https://web.archive.org/web/20110629121242/http://www.census.gov/geo/msb/stand/strcmp.c.
- Xia13
Hua Xiang. Similarity-based Virtual Screening: Effect of the Choice of Similarity Measure. PhD thesis, The University of Sheffield, 2013. URL: http://etheses.whiterose.ac.uk/5662/1/Thesis_Final.pdf.
- YJH+16
Ruiyu Yang, Yuxiang Jiang, Matthew W. Hahn, Elizabeth A. Houseworth, and Predrag Radivojac. New metrics for learning and inference on sets, ontologies, and functions. March 2016. URL: https://arxiv.org/abs/1603.06846v1.
- Yat34
Frank Yates. Contingency tables involving small numbers and the χ² Test. Supplement to the Journal of the Royal Statistical Society, 1(2):217–235, 1934. doi:10.2307/2983604.
- You50
William John Youden. Index for rating diagnostic tests. Cancer, 3(1):32–35, 1950. doi:10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3.
- YB07
Li Yujian and Liu Bo. A normalized levenshtein distance metric. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):1091–1095, 2007. doi:10.1109/TPAMI.2007.1078.
- Yul12
G. Udny Yule. On the methods of measuring association between two attributes. Journal of the Royal Statistical Society, 1912. doi:10.2307/2340126.
- YK68
G. Udny Yule and Maurice G. Kendall. An Introduction to the Theory of Statistics. Griffin, London, 14th edition, 1968.
- Zac14
Siderite Zackwehdex. Super fast and accurate string distance algorithm: sift4. 2014. URL: https://siderite.blogspot.com/2014/11/super-fast-and-accurate-string-distance.html.
- Zed15
Jesper Zedlitz. Phonet4java phonet.java. 2015. URL: https://github.com/jze/phonet4java/blob/master/src/main/java/de/zedlitz/phonet4java/Phonet.java.
- ZD96
Justin Zobel and Philip Dart. Phonetic string matching: lessons from information retrieval. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '96, 166–172. New York, NY, USA, 1996. ACM. doi:10.1145/243199.243258.
- delHigueraMico08
Colin de la Higuera and Luisa Micó. A contextual normalised edit distance. In First International Workshop on Similarity Search and Applications (sisap 2008). 2008. doi:10.1109/SISAP.2008.17.
- delPAngelesBailonM16
María del Pilar Angeles and Noemi Bailón-Miguel. Performance of spanish encoding functions during record linkage. In DATA ANALYTICS 2016: The Fifth International Conference on Data Analysis, 1–7. 2016. URL: https://core.ac.uk/download/pdf/55855695.pdf#page=14.
- delPAngelesEGGM15
María del Pilar Angeles, Adrián Espino-Gamez, and Jonathan Gil-Moncada. Comparison of a modified spanish phonetic, soundex, and phonex coding functions during data matching process. In 2015 International Conference on Informatics, Electronics Vision (ICIEV), 1–5. June 2015. URL: https://www.researchgate.net/publication/285589803_Comparison_of_a_Modified_Spanish_Phonetic_Soundex_and_Phonex_coding_functions_during_data_matching_process, doi:10.1109/ICIEV.2015.7334028.
- JPGTrust91
The J. Paul Getty Trust. Synoname. 1991. URL: http://www.cs.cmu.edu/Groups/AI/areas/nlp/misc/synoname/synoname.zip.
- vandMaarel69
Eddy van der Maarel. On the use of ordination model in phytosociology. Vegetatio Acta Geobotanica, 19(1–6):21–46, January 1969.
- vonRethS77
Hans-Peter von Reth and Hans-Jörg Schek. Eine zugriffsmethode für die phonetische Ähnlichkeitssuche. Technical Report 77.03.002, IBM Deutschland GmbH., 1977.