NEW 3.0
pdNickname first name and nickname database
MORE NAMES AND NEW FEATURES

First Name and Nickname Database

Matching and merging first names and nicknames can be tricky. How do you relate William Smith with Bill Smith or Billy Smith? The answer is the new pdNickname 3.0. It is an easy-to-use, comprehensive, and up-to-date database designed to facilitate matching names that are dissimilar because one is a given name while another is a nickname or other variation. It is available in Pro and Standard editions.

Coverage includes hundreds of thousands of names and the package employs the best matching algorithms designed for this process. As an added benefit, languages of origin and use have also been researched and included along with additional features never before available on this scale.

Pro and Standard

Both editions include the same names and features, except that only the Pro edition comes equipped with fuzzy logic. Fuzzy logic allows matching when lists have typographical errors or stylized spellings.

A HIGHLY REGARDED NAME AND NICKNAME PACKAGE

pdNickname is a proprietary resource not duplicated elsewhere. For more than 20 years our software has been utilized by businesses and organizations around the world in applications you use every day.

Comprehensive first name and nickname database:
  • 397,000 first name and nickname formations
  • 40 million standard first name and nickname variation records
First name types and relationships identified:
  • Short form nickname
  • Diminutive nickname
  • Close variant
  • Near variant
  • Distant variant
  • Phonetic match
  • Fuzzy logic match (Pro only)
  • Opposite gender match
Most advanced phonetic matching algorithms
Match quality scored on a 1 to 99 scale for exceptional ordering of results
More than 500 languages of first name origin and use identified
Excellent for genealogical and scholarly research

Pro only features

Almost 10 million fuzzy logic first name variation records
Other benefits:
  • Designed to be fully compatible with pdSurname and pdGender
  • Comes in multiple file formats: Comma Delimited (CSV), Fixed Length, and DBF
  • Full documentation
  • Perpetual Site License—allowing installation on all computers in the same building within a single company or organization
  • Available for immediate download

SPECIFICATIONS


SKU
Pro: 1NN300P | Standard: 1NN300S
Product Name
pdNickname Pro | pdNickname Standard
Version number
3.0
Description
Name and nickname software
Total records*
Pro: 51,789,529 | Standard: 42,051,769
Zipped size**
Pro: 417.7 MB | Standard: 339.8 MB
Extracted size**
Pro: 11.6 GB | Standard: 9.5 GB
File formats included
Comma Delimited (CSV), Fixed Length, and DBF
Availability
Immediate download
List price
Pro: $495 | Standard: $399

*The record count is the total number of records contained in each of the three included file formats.

**The zipped and extracted sizes show the combined total size of all product files.

Compatibility
pdNickname uses only the ANSI character set (ASCII values 0 to 127 and extended values 128 to 255) and comes in multiple file formats to ensure compatibility. The software has also been developed to be fully compatible with pdSurname and pdGender. The name-pair format in pdNickname is very similar to that of the pdSurname database, except that pdNickname is used to match given names and nicknames while pdSurname matches last names. pdGender is based on the first name database and is designed to apply gender identification to first name records. Note that pdSurname and pdGender are not required to use pdNickname, but the three products are designed to work together.
Optional developer license
Available

DOCUMENTATION

To make our software easy to use, we write precise documentation with examples, so you don’t have to worry. The user guide includes detailed instructions, file layouts, the site license, and additional information useful for both business applications and those employing the product for research.

To view the PDF user guide you will need Adobe Acrobat Reader version 4.05 or higher installed on your computer or device. This is a free program downloadable from the Adobe website.


SAMPLE

A random sample of the database is available for download. It contains related variations and nicknames for the given names “Hillary”, “Donald”, “Susanna”, and “Alexander”, with 100 name-pair examples for each name, along with a separate names database that provides additional information, such as languages of origin and use, name rank in the United States, and any other special characteristics. The sample also includes the product documentation and other information. It is extracted from the Pro edition; the Standard edition does not include fuzzy logic records.

The database comes in three file formats to ensure compatibility with any database system. Each format contains the same data. The formats are comma delimited (CSV), fixed length, and DBF.

The written documentation, including the site license, comes in Adobe Acrobat PDF format. To view these documents you will need Adobe Acrobat Reader version 4.05 or higher installed on your computer or device. This is a free program downloadable from the Adobe website.


What is fuzzy logic?

If you typed “Garfeild” into a word processor, it would probably be underlined with a squiggly red line signifying a misspelling. It is the name “Garfield” with the “IE” reversed to “EI”—a common mistake.

The fuzzy logic technology in the Pro edition of this software allows matching name data that has typographical errors. If you look at the fuzzy logic examples provided below, you are likely to see errors you have repeatedly made or seen. In many cases you will have to look closely to see the difference, but the spellings are different.

Fuzzy logic attempts to duplicate real errors created while entering names into databases. The most likely typographical errors are determined based on the number of letters, the characters involved, where they are located in the name, the language, and other factors.

The biggest advantage of our technology is its ability to work with language rules that indicate how individuals of various nationalities may hear and spell names.

Some fuzzy logic spellings have one typographical error while others have multiple issues, so the technology is suited for even the worst typists and transcribers. The algorithms have five layers:

Phonetic misspellings

These algorithms look at digraphs, trigraphs, tetragraphs, pentagraphs, hexagraphs, and even a German heptagraph, “SCHTSCH”, used to transliterate Russian words with the “SHCHA” or “SHCH” (romanized) sound. These are, respectively, two- to seven-letter sequences that form one phoneme or distinct sound. Most of the sequences of three or more letters are Irish, a language with more spelling rules than you can shake a stick at.

Many misspellings occur as transcribers enter the sounds they hear. The character sequences and the sounds they produce are different for each language and situation, such as before, after, or between certain vowels and consonants, so our substitutions are language-rule based. Furthermore, our algorithms consider both how a name may sound to someone who speaks English as well as how it may sound to someone who speaks Spanish, which is often different. Take the digraph “SC”. Before the vowels “E” or “I” it is most likely to be misspelled by an English speaker as “SHE” or “SHI” while a Spanish speaker may hear “CHE” or “CHI” and sometimes “YE” or “YI”. Our library includes over 80,000 language-based letter sequence phonetic rules. Phonetic misspelling examples:

Example 1 | Real: BARTHOLOMEW | Fuzzy: BARTHOLOMUE
Example 2 | Real: DAWNETTE | Fuzzy: DAUNETTE
Example 3 | Real: NATHANIEL | Fuzzy: NATHANAIL
Example 4 | Real: PHYLLIS | Fuzzy: FYLLIS
Example 5 | Real: SIGOURNEY | Fuzzy: SIGOURNI
Example 6 | Real: XAVIER | Fuzzy: XAVAR
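To make language-rule-based substitution concrete, here is a minimal Python sketch. The rule table is a hypothetical handful of rules chosen to reproduce some of the examples above plus the “SC” case described earlier; it is not the product's library of 80,000 rules, and `PhoneticRule` and `phonetic_variants` are illustrative names. Each rule can require a following letter (the “before E or I” context) or require that the sequence end the name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhoneticRule:
    """One hypothetical letter-sequence substitution rule."""
    pattern: str          # letter sequence to look for, e.g. "PH"
    replacement: str      # how a listener might write that sound, e.g. "F"
    language: str         # language of the listener the rule models
    before: frozenset = frozenset()  # if non-empty, the next letter must be in it
    at_end: bool = False  # if True, the pattern must end the name


# Illustrative rules only; not the product's proprietary rule library.
RULES = [
    PhoneticRule("PH", "F", "English"),
    PhoneticRule("AW", "AU", "English"),
    PhoneticRule("EW", "UE", "English", at_end=True),
    PhoneticRule("EY", "I", "English", at_end=True),
    PhoneticRule("SC", "SH", "English", before=frozenset("EI")),
    PhoneticRule("SC", "CH", "Spanish", before=frozenset("EI")),
]


def phonetic_variants(name, rules=RULES, language=None):
    """Return the spellings produced by applying one rule at one position."""
    name = name.upper()
    variants = set()
    for rule in rules:
        if language is not None and rule.language != language:
            continue
        start = name.find(rule.pattern)
        while start != -1:
            end = start + len(rule.pattern)
            next_letter = name[end:end + 1]  # "" when the pattern ends the name
            context_ok = not rule.before or next_letter in rule.before
            position_ok = not rule.at_end or end == len(name)
            if context_ok and position_ok:
                variants.add(name[:start] + rule.replacement + name[end:])
            start = name.find(rule.pattern, start + 1)
    return sorted(variants)


print(phonetic_variants("Phyllis"))                     # ['FYLLIS']
print(phonetic_variants("Scipio"))                      # ['CHIPIO', 'SHIPIO']
print(phonetic_variants("Scipio", language="Spanish"))  # ['CHIPIO']
```

Filtering by `language` mirrors the idea that an English-speaking and a Spanish-speaking transcriber may write the same sound differently.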

Reversed digraphs

These algorithms look for misspellings due to reversed digraphs (two-letter sequences that form one phoneme or distinct sound), which are a common typographical issue, such as “IE” substituted for “EI”. The character sequences and the sounds they produce are different for each language and situation, such as before, after, or between certain vowels and consonants, so our substitutions are language-rule based. Reversed digraph examples:

Example 7 | Real: ANNABETH | Fuzzy: ANNABEHT
Example 8 | Real: CAETLIN | Fuzzy: CEATLIN
Example 9 | Real: EUGENE | Fuzzy: UEGENE
Example 10 | Real: FRIEDRICH | Fuzzy: FREIDRICH
Example 11 | Real: RAQUEL | Fuzzy: RAUQEL
Example 12 | Real: VICKTOR | Fuzzy: VIKCTOR
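The mechanical part of a reversed digraph is simply swapping two adjacent letters. The Python sketch below generates those swaps; the optional `allowed` set is a crude, hypothetical stand-in for the language rules the product applies to decide which swaps are plausible. The function name is illustrative.

```python
def reversed_digraph_variants(name, allowed=None):
    """Swap one pair of adjacent letters, e.g. "IE" -> "EI".

    If `allowed` is given, only pairs in that set are swapped; this stands in
    (very loosely) for language rules that limit which reversals are likely.
    """
    name = name.upper()
    variants = set()
    for i in range(len(name) - 1):
        pair = name[i:i + 2]
        if pair[0] == pair[1]:
            continue  # swapping identical letters changes nothing
        if allowed is not None and pair not in allowed:
            continue
        variants.add(name[:i] + pair[::-1] + name[i + 2:])
    return variants


print(sorted(reversed_digraph_variants("Friedrich", allowed={"IE"})))  # ['FREIDRICH']
print(sorted(reversed_digraph_variants("Raquel", allowed={"QU"})))     # ['RAUQEL']
```

Without `allowed`, every adjacent swap is produced, which over-generates; a rule filter is what keeps the candidate list realistic.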

Double letter misspellings

These algorithms look for misspellings due to double letters typed as single letters and single letters that are doubled. The most common typographical issues occur with the characters, in order of frequency, “SS”, “EE”, “TT”, “FF”, “LL”, “MM”, and “OO”. Double-letter misspelling examples:

Example 13 | Real: EMANNUEL | Fuzzy: EMMANNUEL
Example 14 | Real: KASSANDREA | Fuzzy: KASANDREA
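A minimal Python sketch of the two double-letter operations, assuming the letters listed above (S, E, T, F, L, M, O) as the default set of characters to consider. The function name and the default are illustrative choices, not the product's implementation.

```python
def double_letter_variants(name, letters="SETFLMO"):
    """Undouble a doubled letter or double a single one.

    `letters` defaults to the characters listed above as most often affected;
    pass a longer string to consider more letters.
    """
    name = name.upper()
    variants = set()
    for i, ch in enumerate(name):
        if ch not in letters:
            continue
        prev_same = i > 0 and name[i - 1] == ch
        next_same = i + 1 < len(name) and name[i + 1] == ch
        if next_same:
            variants.add(name[:i] + name[i + 1:])   # "SS" -> "S"
        elif not prev_same:
            variants.add(name[:i] + ch + name[i:])  # "S" -> "SS"
    return variants


print("EMMANNUEL" in double_letter_variants("Emannuel"))   # True
print("KASANDREA" in double_letter_variants("Kassandrea"))  # True
```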

Missed letters

These algorithms look for missed keystrokes and provide fuzzy logic matches with missing letters. Unlike the other algorithms, these are not language specific. Keystrokes can be missed in any language. Missed letter examples:

Example 15 | Real: ABDUL | Fuzzy: ADUL
Example 16 | Real: MARGARET | Fuzzy: MRGARET
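Because missed keystrokes are language-independent, they are simple to generate. The Python sketch below produces every spelling with one letter missing and builds a reverse index from each misspelling back to the real names it could come from. Function names are hypothetical; this illustrates the idea only and is not the product's algorithm.

```python
def missed_letter_variants(name):
    """Every spelling with exactly one keystroke missing."""
    name = name.upper()
    return {name[:i] + name[i + 1:] for i in range(len(name))}


def build_fuzzy_index(real_names):
    """Map each missed-letter spelling back to the real names it could come from."""
    index = {}
    for real in real_names:
        for variant in missed_letter_variants(real):
            index.setdefault(variant, set()).add(real.upper())
    return index


index = build_fuzzy_index(["Abdul", "Margaret"])
print(index["ADUL"])     # {'ABDUL'}
print(index["MRGARET"])  # {'MARGARET'}
```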

String manipulations

These algorithms change letters and syllables in a variety of ways. They are guided less by language rules and more by randomness. String manipulation examples:

Example 17 | Real: CYNTHIA | Fuzzy: CYNTTHA
Example 18 | Real: GERALD | Fuzzy: GERLLD

Comprehensive first name and nickname database

There are many uses for pdNickname. Traditionally, the chief use is by businesses and organizations trying to merge database records and remove duplicates from their computerized name lists. Often the same person is entered under different versions of a name, leaving that person's information in multiple locations. The software is used to fix this.

The package has two data sets: a names database and a relationship file.

The names database lists all 397,000 given names and nicknames provided with the software, complete with type of name, languages of origin and use, name rank in the United States, and other demographics.

The relationship file provides related name pairs, such as “Elizabeth | Beth” and “Thomas | Tom”. There are 40 million records in this file: half the file has the name pairs ordered one way, and the other half has the same pairs reversed, such as “Jason | Jase” compared to “Jase | Jason”. Additionally, all the names from the United States are organized in one section, and all the international names in another. Not all parts may be needed for a particular project, and the divisions make it easy to build a custom database from selected sections.
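Because each pair is stored in both directions, an index keyed on the NAME1 side alone can answer a lookup that starts from either name. The Python sketch below assumes the pairs have already been read from one of the file formats into (NAME1, NAME2) tuples; the sample rows and function names are hypothetical, and the real field layout is documented in the user guide.

```python
from collections import defaultdict

# Hypothetical rows (NAME1, NAME2), stored in both directions like the file.
PAIRS = [
    ("ELIZABETH", "BETH"), ("BETH", "ELIZABETH"),
    ("THOMAS", "TOM"), ("TOM", "THOMAS"),
    ("JASON", "JASE"), ("JASE", "JASON"),
]


def build_index(pairs):
    """Index every pair by its NAME1 side."""
    index = defaultdict(set)
    for name1, name2 in pairs:
        index[name1].add(name2)
    return index


def may_be_same_first_name(a, b, index):
    """True if the two first names are identical or listed as a related pair."""
    a, b = a.upper(), b.upper()
    return a == b or b in index.get(a, set())


index = build_index(PAIRS)
print(may_be_same_first_name("Tom", "Thomas", index))  # True
print(may_be_same_first_name("Thomas", "Tom", index))  # True
print(may_be_same_first_name("Tom", "Beth", index))    # False
```

If only one half of the file were loaded, the same check would need to look up both `index[a]` and `index[b]`; loading both halves trades disk space for a single lookup.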

One name in each name pair may be a given first name and the other a nickname or variation. Here are more examples:

Example 1 | Name1: BEATRICE | Name2: BEA | Short form nickname
Example 2 | Name1: GABRIEL | Name2: GABE | Short form nickname
Example 3 | Name1: ANTHONY | Name2: TONY | Diminutive nickname
Example 4 | Name1: NICHOLLE | Name2: NIKKI | Diminutive nickname
Example 5 | Name1: DELORES | Name2: DELORIS | Close variation
Example 6 | Name1: MATTHEW | Name2: MATTHIEU | Near variation
Example 7 | Name1: BENJAMIN | Name2: BINYAMIN | Distant variation
Example 8 | Name1: JULIA | Name2: JOLEAH | Phonetic match
Example 9 | Name1: OLIVIER | Name2: OLIVEIR | Fuzzy logic match—Pro only
Example 10 | Name1: DANIEL | Name2: DANIELLA | Opposite gender match

Using this software results in reduced long-term costs, improved customer service, and better marketing data.

First name and nickname types and relationships

The software is composed of given first names and nicknames from the United States and around the world. Name-pair records have two sets of name information, a NAME1 side and a NAME2 side. The relationship between each name pair is classified as a short form nickname; or a diminutive nickname; or a close, near, or distant onomastic variation; or a phonetic match; or an opposite gender match; or, in the Pro edition only, a fuzzy logic match.

The onomastic distance of true variations is rated on a scale from 1 (closest) to 3 (most distant). The value is determined by tabulating or estimating the number of lines separating the names on a name tree.
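One way to picture “lines separating the names on a name tree” is as the number of edges on the path between two nodes. The Python sketch below counts those edges with a breadth-first search over a toy tree whose node names are placeholders, then applies an assumed mapping (1 line gives 1, 2 lines give 2, 3 or more give 3). Both the tree and the mapping are illustrative assumptions, not the product's method.

```python
from collections import deque

# Toy name tree as child -> parent links. Node names are placeholders.
PARENT = {"A": "ROOT", "B": "ROOT", "A1": "A", "A2": "A", "A1X": "A1"}


def build_adjacency(parent):
    """Turn child -> parent links into an undirected adjacency map."""
    adjacency = {}
    for child, par in parent.items():
        adjacency.setdefault(child, set()).add(par)
        adjacency.setdefault(par, set()).add(child)
    return adjacency


def lines_between(a, b, adjacency):
    """Breadth-first search: number of edges from a to b, or None if unconnected."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def variation_rating(a, b, adjacency):
    """Assumed mapping from line count to the 1 (closest) to 3 rating."""
    lines = lines_between(a, b, adjacency)
    if not lines:  # None (unconnected) or 0 (same name)
        return None
    return min(lines, 3)


adjacency = build_adjacency(PARENT)
print(variation_rating("A1", "A", adjacency))   # 1 (parent and child)
print(variation_rating("A1", "A2", adjacency))  # 2 (siblings)
print(variation_rating("A1X", "B", adjacency))  # 3 (four lines, capped)
```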

Records are coded as shown in the following examples of related names:

Example 1 | Name1: PHILLIP | Name2: PHILL | Short form nickname (S)
Example 2 | Name1: REBECCA | Name2: BECCA | Short form nickname (S)
Example 3 | Name1: IGNACIO | Name2: NACHO | Diminutive nickname (D)
Example 4 | Name1: OKSANA | Name2: OKSANOCHKA | Diminutive nickname (D)
Example 5 | Name1: ALAN | Name2: ALLEN | Close variation (1)
Example 6 | Name1: MONIQUE | Name2: MONIQUA | Near variation (2)
Example 7 | Name1: WILLAMINA | Name2: WILHELMINA | Distant variation (3)
Example 8 | Name1: TERENCE | Name2: TORENCE | Phonetic match (P)
Example 9 | Name1: FRANCISCA | Name2: FANCISCA | Fuzzy logic match—Pro only (F)
Example 10 | Name1: RASHEED | Name2: RASHEEDA | Opposite gender match (X)
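When matching records for a single person, you may want to exclude some relationship types, such as opposite gender matches. A minimal Python sketch, assuming the records have been loaded as (NAME1, NAME2, code) tuples; the tuple layout, sample rows (taken from the examples above), and function name are hypothetical.

```python
# Relationship codes, taken from the examples above.
RELATIONSHIP_CODES = {
    "S": "Short form nickname",
    "D": "Diminutive nickname",
    "1": "Close variation",
    "2": "Near variation",
    "3": "Distant variation",
    "P": "Phonetic match",
    "F": "Fuzzy logic match (Pro only)",
    "X": "Opposite gender match",
}

# Hypothetical in-memory records (NAME1, NAME2, code).
RECORDS = [
    ("PHILLIP", "PHILL", "S"),
    ("IGNACIO", "NACHO", "D"),
    ("ALAN", "ALLEN", "1"),
    ("WILLAMINA", "WILHELMINA", "3"),
    ("TERENCE", "TORENCE", "P"),
    ("RASHEED", "RASHEEDA", "X"),
]


def related_names(name, records, allowed_codes):
    """Return (NAME2, description) for records whose code is in allowed_codes."""
    name = name.upper()
    return [
        (name2, RELATIONSHIP_CODES[code])
        for name1, name2, code in records
        if name1 == name and code in allowed_codes
    ]


SAME_GENDER = set(RELATIONSHIP_CODES) - {"X"}
print(related_names("Alan", RECORDS, SAME_GENDER))     # [('ALLEN', 'Close variation')]
print(related_names("Rasheed", RECORDS, SAME_GENDER))  # []
```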

Most advanced phonetic matching algorithms

A major benefit of the software is its advanced system for indexing first names based on sound and spelling, such as “Garry” and “Gerry” or “Lana” and “Lona”. Our structure analyzes hundreds of phonetic lines to make comparisons. These include proprietary algorithms designed for specific languages, language families, and dialects, along with special situations. They are the most advanced algorithms ever designed for names.

The system is designed to pick out similar-sounding names that are not onomatologically related. It also matches many names that are not listed as related in onomastic documentation, often because of the rarity of the spelling, but that are doubtlessly derived from the same name formation. The phonetic algorithms pick up many thousands of these unlisted variations.

Here are more examples of phonetic matches:

Example 1 | Name1: JULIA | Name2: JOLEAH
Example 2 | Name1: TERENCE | Name2: TORENCE

Additionally, as part of our phonetic indexing process, we include matches from six open source algorithms most data engineers are familiar with:

Soundex

This is the original phonetic algorithm. It was developed by Robert C. Russell and Margaret King Odell and patented in 1918 and 1922. It was the first process to index names by sound, as pronounced in English. The algorithm mainly encodes consonants; a vowel is not encoded unless it is the first letter.
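For readers who have not seen it, here is a compact Python version of the commonly published American Soundex rules: keep the first letter, code consonants 1 to 6, collapse letters with the same code (even when separated by H or W, but not when separated by a vowel), and pad or truncate to four characters. It is a standard textbook rendering for illustration, not code taken from pdNickname.

```python
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """American Soundex code, e.g. "Robert" -> "R163"."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    code = first
    prev = SOUNDEX_CODES.get(first, "")  # the first letter's code also collapses
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit:
            if digit != prev:
                code += digit
            prev = digit
        elif c not in "HW":
            prev = ""  # a vowel (A, E, I, O, U, Y) separates equal codes
        # H and W leave prev unchanged, so equal codes around them collapse
    return (code + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Garry"), soundex("Gerry"))    # G600 G600
```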

Metaphone

This is considered the first advanced phonetic algorithm. It was published in 1990 by Lawrence Philips and improved on Soundex by using information about variations and inconsistencies in English spelling and pronunciation to produce more accurate coding.

Double Metaphone

This algorithm, also published by Lawrence Philips, is called “Double” because it can return both a primary and a secondary code for a name string. The algorithm takes into account spelling peculiarities of a number of languages in addition to English.

New York State Identification and Intelligence System (NYSIIS)

This algorithm was developed in 1970 and is similar to Soundex except it maintains relative vowel positioning and handles some phonemes and sequential letters better. The accuracy increase over Soundex has been cited as 2.7 percent.

Caverphone

This algorithm was first developed by David Hood in the Caversham Project at the University of Otago in New Zealand in 2002 and revised in 2004. It was created to assist in data matching between late 19th century and early 20th century New Zealand electoral rolls.

Daitch–Mokotoff Soundex

This algorithm was developed in 1985 by Jewish genealogists Gary Mokotoff and Randy Daitch. It is a refinement of Soundex algorithms designed to allow greater accuracy in matching of Eastern European and Ashkenazi Jewish names with similar pronunciation but differences in spelling. While specifically developed for matching surnames, it is often useful for matching first names and other words as well.

Match quality scored on a 1 to 99 scale for exceptional ordering of results

The overall quality of each name-pair match is scored on a scale of 01 (best) to 99. A query can sometimes return a very large number of matches, and the score is effective for ordering the output for filtering. Users will find this a major advantage of our system. The scoring considers several factors:

Note that some archaic matches are included for their onomastic significance. The score of “99” is reserved for these matches only.

Also note that opposite gender matches and fuzzy logic matches (Pro edition only) are not scored because not enough of the criteria necessary for scoring are present for these matches.
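Putting the scoring rules together, a query result can be ordered best-first, with archaic matches (99) dropped by default and unscored matches appended only on request. The Python sketch below assumes matches are held as (NAME2, score) pairs with `None` for unscored matches; the names, scores, and function name are hypothetical and for illustration only.

```python
from typing import NamedTuple, Optional


class Match(NamedTuple):
    name2: str
    score: Optional[int]  # 1 (best) to 99; None when the match is not scored


def order_matches(matches, max_score=98, include_unscored=False):
    """Sort scored matches best-first, dropping scores above max_score.

    The default max_score of 98 leaves out the archaic matches scored 99.
    Unscored matches (opposite gender, fuzzy logic) are appended last if requested.
    """
    ordered = sorted(
        (m for m in matches if m.score is not None and m.score <= max_score),
        key=lambda m: (m.score, m.name2),
    )
    if include_unscored:
        ordered.extend(m for m in matches if m.score is None)
    return ordered


# Hypothetical matches and scores for illustration only.
matches = [Match("TOMMY", 12), Match("TOM", 3), Match("TOMKIN", 99), Match("THOMASINA", None)]
print([m.name2 for m in order_matches(matches)])                         # ['TOM', 'TOMMY']
print([m.name2 for m in order_matches(matches, include_unscored=True)])  # ['TOM', 'TOMMY', 'THOMASINA']
```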

Languages of origin and use

Language coverage is extensive. The list exceeds 500 languages, language families, and dialects. Some languages refer to ethnic groups. None of the languages were derived algorithmically, and the provided information represents years of extensive onomastic research. When different sources list different origins and usages, they may be combined depending on the reliability of the source and the reasonableness of the information. Differently styled names can have different language values.

Top 30 languages

The following are the top 30 languages with the number of occurrences in the names database. The language count is one for each unique name formation and not one for each relationship (which would be many more):

1. English (225,000)
2. Arabic (46,700)
3. Turkish (6,700)
4. Punjabi (6,700)
5. French (5,900)
6. Iranian (5,800)
7. Urdu (4,900)
8. Afghan Arabic (4,400)
9. Swedish (4,100)
10. Finnish (3,600)

11. Spanish (3,300)
12. Italian (3,100)
13. Bengali (3,100)
14. German (3,100)
15. Pashto (3,000)
16. Norwegian (3,000)
17. Danish (2,900)
18. Korean (2,700)
19. Egyptian Arabic (2,400)
20. Czech (2,200)

21. Czech (2,000)
22. Russian (1,900)
23. Dutch (1,800)
24. Hungarian (1,800)
25. Portuguese (1,700)
26. Malaysian Malay (1,700)
27. Albanian (1,700)
28. Japanese (1,700)
29. Bosniak Bosnian (1,500)
30. Icelandic (1,400)

Note that the counts are rounded down to the nearest 100.

Also note that the Arabic and Muslim name section is very large due to the many different variations and ways of writing these names. These include theophoric combination names such as those with the religious prefix “Abdul”. Both common and uncommon possibilities are included, and the use of Sun Letters in Arabic and Maltese is accounted for.

A list of all the identified languages with counts is included with the software as a Microsoft Excel (XLSX) file. The language names chosen are detailed and easy to search for.

Excellent for genealogical and scholarly research

In addition to being a powerful resource for businesses and organizations working with lists of names, the software is recommended for ancestry researchers, students, teachers, and scholars. Attention has been paid to accurately and precisely representing the origin and history of given first names (also known as personal names and forenames) and nicknames (including short forms, diminutives, and even hypocoristics) and the relationships between them. It is of particular benefit in the following fields:

Special and unique origins

Of interest to those studying names, many records provide information about special and unique origins: