General

We are about quality. Quality software products, quality customer support, and most importantly, quality results for our clients. Our offerings are designed to make database systems more useful, more accurate, and easier to manage.

For more than 20 years our software has been utilized around the world in applications you use every day. Our customers are businesses, organizations, churches, schools, researchers, government agencies, and individuals who know Peacock Data is committed to total client satisfaction, state-of-the-art technology, innovation, and fair pricing.

Imagine the possibilities.

We offer a full line of database software products. You will find unique packages relating to ZIP Codes, United States Census and ACS demographics, GeoCoding, first names and nicknames, last names, gender coding, countries of the world, and more, and the list is growing.

Names: pdNickname | pdSurname | pdGender | pdSuite Names

GeoCoding: pdGeoTIGER

Demographics: pdCensus2010 | pdCensus2000

International: pdCountry

ZIP Codes: pdZIP

Everything: pdSuite Master Collection

It is easy to purchase Peacock Data software products using the online cart system. Simply add the products you want using the BUY buttons and check out using PayPal or your credit card.

Products are delivered electronically and you will be able to download your databases immediately upon successful completion of our online order form. Once your payment is processed, you will be directed to a webpage where you can download your products. Links will also be provided in an email so you can download at a later time. Our downloads are powered by Digital Product Delivery (DPD).

Software is delivered in Zip file format. Inside the archive you will find one or more product files along with full documentation and the license agreement.

Database files are provided in multiple formats, including Comma Delimited (CSV), Fixed Length, and DBF. For smaller products, all formats are in the same zip archive. For multi-gigabyte products, we archive each format in a separate zip file for the convenience of users and due to operating system file-size limitations.

Some products also incorporate bonuses. For example, we include our 300,000-record pdGeoSupplement geographic reference file free with purchase of pdGeoTIGER, pdCensus2010, or pdSuite Master Collection.

Our software packages are designed to be compatible with any operating system or database platform.

Our products utilize only the ANSI character set. This includes ASCII values 0 to 127 and extended values 128 to 255. These are also known as the extended Latin alphabet. Some users may need to configure their database system to import the extended values. In many cases the option will be labeled the “Latin-1” character set.
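As a minimal sketch of the import setting described above, the following Python function reads a comma-delimited file using the Latin-1 codec so that byte values 128 to 255 are decoded rather than rejected. The function name and the file path passed to it are hypothetical and are not part of any Peacock Data package.

```python
import csv


def read_latin1_csv(path):
    """Return all rows of a comma-delimited file decoded as Latin-1.

    Hypothetical helper for illustration; the path is supplied by the caller.
    """
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="latin-1") as handle:
        return list(csv.reader(handle))
```

If characters in the 128 to 159 range look wrong after import, the file may use the Windows-1252 code page, in which case `encoding="cp1252"` is worth trying instead.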

All our database products come with a perpetual multi-seat Site License granting the right to install the product on all computers in the same building within a single company or organization. Separate licenses are needed for each building the information is used in. The full text of the Site License is provided with the product or can be read anytime in the Support section of our website.

Yes, an optional Developer License is available granting the right to incorporate the data in for-profit services and products. Online forms are available if you have questions about or want to apply for a Developer License.

Questions about developer licenses | Apply for a developer license

Yes, version upgrades are available. Currently they are offered for the newest editions of pdNickname, pdGender, pdGeoTIGER, and pdCensus2010. But the list changes as products are updated. Some upgrades are free.

To upgrade to a higher version, you must own a site license to the lower version. You will need your 5-character Peacock Data customer ID, invoice number, or transaction number to verify eligibility.

Go to upgrade order form

On the web our home page is: https://www.peacockdata.com

Our physical address and contact information are:

901 W Columbus St, Ste 207, Bakersfield, CA 93301, USA

(800) 609-9231 (toll free)

(818) 480-4391 (fax)

Go to our contact form

We began operation as Peacock Data, Inc. over twelve years ago, on January 1, 2003. Our President/CEO has been in the database business for more than twenty years and in related fields since 1984. From the beginning, his motto has been, “For us it’s the service AFTER the sale that counts!”

Some make better mousetraps. We make better software.

Extensive R&D, fresh information sources, data provided in multiple ways so users have choices, and well-constructed user guides are among the benefits from Peacock Data.

Our products are unique, have years of development and field testing behind them, and are fashioned to be the easiest to use regardless of experience.

We are committed to total client satisfaction, state-of-the-art technology, innovation, and fair pricing.

Imagine the possibilities.

We are a FoxPro/Visual FoxPro house and it is used in all our development. Our database products, however, are compatible with any database platform.

We take the privacy of your personal information very seriously and will use your information only in accordance with the terms of our Privacy Policy.

View the full privacy policy

We have established reasonable terms to make working with purchasers of our database products a productive, agreeable, and fair enterprise for all parties involved.

View the full terms of service

Yes, our affiliates program offers a unique way for your website or app to link to the Peacock Data product line. You will be provided with all of the tools necessary to convert your existing traffic into sales along with full support from dedicated affiliate managers. Apply now to join the program and earn substantial rewards!

Peacock Data Affiliates Program

pdNickname

Matching and merging first names and nicknames can be tricky. How do you relate William Smith with Bill Smith or Billy Smith? The answer is pdNickname. It is an easy-to-use, comprehensive, and up-to-date database designed to facilitate matching names that are dissimilar because one is a given name while another is a nickname or other variation. It is available in Pro and Standard editions.

Coverage includes hundreds of thousands of names and the package employs the best matching algorithms designed for this process. The software is a one-of-a-kind proprietary resource that for more than 20 years has been utilized by businesses and organizations around the world in applications you use every day.

There are many uses for pdNickname. Traditionally the chief use is by businesses and organizations trying to merge database records and remove duplications from their computerized name lists. Often the same person is entered under different versions of a name leading to information in multiple locations. The software is used to fix this.

The package has two data sets—a names database and relationship file.

The names database lists all 397,000 given names and nicknames provided with the software, complete with type of name, languages of origin and use, name rank in the United States, and other demographics.

The relationship file provides related name pairs, such as “Elizabeth | Beth” and “Thomas | Tom”. There are 40 million records in this file, but half the file has the name pairs ordered one way and the other half has the name pairs reversed, such as “Jason | Jase” compared to “Jase | Jason”. Additionally, all the names from the United States are organized in one section, and all the international names in another section. Not every part may be needed for a particular project, and the divisions make it easy to build a custom database from selected sections.

One name in each name pair may be a given first name and the other a nickname or variation. Here are more examples:

Example 1 | Name1: BEATRICE | Name2: BEA | Short form nickname
Example 2 | Name1: GABRIEL | Name2: GABE | Short form nickname
Example 3 | Name1: ANTHONY | Name2: TONY | Diminutive nickname
Example 4 | Name1: NICHOLLE | Name2: NIKKI | Diminutive nickname
Example 5 | Name1: DELORES | Name2: DELORIS | Close variation
Example 6 | Name1: MATTHEW | Name2: MATTHIEU | Near variation
Example 7 | Name1: BENJAMIN | Name2: BINYAMIN | Distant variation
Example 8 | Name1: JULIA | Name2: JOLEAH | Phonetic match
Example 9 | Name1: OLIVIER | Name2: OLIVEIR | Fuzzy logic match—Pro only
Example 10 | Name1: DANIEL | Name2: DANIELLA | Opposite gender match

Using this software results in reduced long-term costs, improved customer service, and better marketing data.

The software is composed of given first names and nicknames from the United States and around the world. Name-pair records have two sets of name information, a NAME1 side and a NAME2 side. The relationship between each name pair is classified as a short form nickname; or a diminutive nickname; or a close, near, or distant onomastic variation; or a phonetic match; or an opposite gender match; or, in the Pro edition only, a fuzzy logic match.

The onomastic distance of true variations is rated on a 1 (closest) to 3 scale. The value is determined by tabulating or estimating the number of lines separating the names on a name tree.

Records are coded as follows:

  • 1 = Close onomastic variant
  • 2 = Near onomastic variant
  • 3 = Distant onomastic variant
  • S = Short form nickname
  • D = Diminutive nickname
  • P = Phonetic match
  • X = Opposite gender match
  • F = Fuzzy logic match (Pro only)

Here are some examples of related names and how they are coded:

Example 1 | Name1: PHILLIP | Name2: PHILL | Short form nickname (S)
Example 2 | Name1: REBECCA | Name2: BECCA | Short form nickname (S)
Example 3 | Name1: IGNACIO | Name2: NACHO | Diminutive nickname (D)
Example 4 | Name1: OKSANA | Name2: OKSANOCHKA | Diminutive nickname (D)
Example 5 | Name1: ALAN | Name2: ALLEN | Close variation (1)
Example 6 | Name1: MONIQUE | Name2: MONIQUA | Near variation (2)
Example 7 | Name1: WILLAMINA | Name2: WILHELMINA | Distant variation (3)
Example 8 | Name1: TERENCE | Name2: TORENCE | Phonetic match (P)
Example 9 | Name1: FRANCISCA | Name2: FANCISCA | Fuzzy logic match—Pro only (F)
Example 10 | Name1: RASHEED | Name2: RASHEEDA | Opposite gender match (X)
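The coding scheme above lends itself to a simple lookup. The following is a rough sketch, assuming the name pairs have already been loaded as (NAME1, NAME2, code) tuples. The function names and the tuple layout are assumptions made for illustration and do not describe the actual file layout, which is covered in the product documentation.

```python
from collections import defaultdict

# Relationship codes as listed above.
RELATIONSHIP_CODES = {
    "1": "Close onomastic variant",
    "2": "Near onomastic variant",
    "3": "Distant onomastic variant",
    "S": "Short form nickname",
    "D": "Diminutive nickname",
    "P": "Phonetic match",
    "X": "Opposite gender match",
    "F": "Fuzzy logic match (Pro only)",
}


def build_index(pairs):
    """Map each NAME1 to a list of (NAME2, code) tuples (hypothetical layout)."""
    index = defaultdict(list)
    for name1, name2, code in pairs:
        if code not in RELATIONSHIP_CODES:
            raise ValueError(f"unknown relationship code: {code!r}")
        index[name1.upper()].append((name2.upper(), code))
    return dict(index)


def related(index, name, allowed=frozenset("123SD")):
    """Return the names related to `name` whose code is in `allowed`."""
    return [n for n, code in index.get(name.upper(), []) if code in allowed]
```

With the pairs `("PHILLIP", "PHILL", "S")` and `("RASHEED", "RASHEEDA", "X")` loaded, `related(index, "Phillip")` returns `["PHILL"]`, while `related(index, "Rasheed")` returns an empty list unless `"X"` is added to `allowed`.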

Yes, a major benefit of the software is the advanced system for indexing first names based on sound and spelling, such as “Garry” and “Gerry” or “Lana” and “Lona”. Our structure analyzes hundreds of phonetic lines to make comparisons. These include proprietary algorithms designed for specific languages, language families, and dialects along with special situations. They are the most advanced algorithms ever designed for names.

The system is designed to pick out similar names that are not onomatologically related, but it also matches many names that are not listed as related names in onomastic documentation, often due to the rarity of the spelling, but are doubtlessly derived from the same name formation. Many thousands of unlisted variations are picked up by the phonetic algorithms.

Here are more examples of phonetic matches:

Example 1 | Name1: JULIA | Name2: JOLEAH
Example 2 | Name1: TERENCE | Name2: TORENCE

Additionally, as part of our phonetic indexing process, we include matches from six open source algorithms most data engineers are familiar with:

Soundex

This is the original phonetic algorithm. It was developed by Robert C. Russell and Margaret King Odell and patented in 1918 and 1922. The process was the first to index names by sound, as pronounced in English. The algorithm mainly encodes consonants. A vowel is not encoded unless it is the first letter.
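For readers who want to see the idea in code, here is a minimal sketch of the commonly described American Soundex variant in Python. It is an illustration only: published Soundex variants differ in small details (notably the handling of H and W), and nothing here is meant to represent how Peacock Data applies the algorithm internally.

```python
# Consonant groups and their Soundex digits.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(name):
    """Return the four-character Soundex code for `name` ('' if no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    # Keep the first letter, then pad or truncate to three digits.
    return (first + "".join(digits) + "000")[:4]
```

Under this sketch, “Garry” and “Gerry” both encode to G600, and “Lana” and “Lona” both encode to L500, which is why a Soundex index groups them together.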

Metaphone

This is considered the first advanced phonetic algorithm. It was published in 1990 by Lawrence Philips and improved on Soundex by using information about variations and inconsistencies in English spelling and pronunciation to produce more accurate coding.

Double Metaphone

This algorithm, also published by Lawrence Philips, is called “Double” because it can return both a primary and a secondary code for a name string. The algorithm takes into account spelling peculiarities of a number of languages in addition to English.

New York State Identification and Intelligence System (NYSIIS)

This algorithm was developed in 1970 and is similar to Soundex except it maintains relative vowel positioning and handles some phonemes and sequential letters better. The accuracy increase over Soundex has been cited as 2.7 percent.

Caverphone

This algorithm was first developed by David Hood in the Caversham Project at the University of Otago in New Zealand in 2002 and revised in 2004. It was created to assist in data matching between late 19th century and early 20th century New Zealand electoral rolls.

Daitch–Mokotoff Soundex

This algorithm was developed in 1985 by Jewish genealogists Gary Mokotoff and Randy Daitch. It is a refinement of Soundex algorithms designed to allow greater accuracy in matching of Eastern European and Ashkenazi Jewish names with similar pronunciation but differences in spelling. While specifically developed for matching surnames, it is often useful for matching first names and other words as well.

Yes, language coverage is extensive. The list exceeds 500 languages, language families, and dialects. Some languages refer to ethnic groups. None of the languages were derived algorithmically and the provided information represents years of extensive onomastic research. When different sources list different origins and usages they may be combined depending on the reliability of the source and the reasonability of the information.

Top 30 languages

The following are the top 30 languages with the number of occurrences in the names database. The language count is one for each unique name formation and not one for each relationship (which would be many more):

  1. English (225,000)
  2. Arabic (46,700)
  3. Turkish (6,700)
  4. Punjabi (6,700)
  5. French (5,900)
  6. Iranian (5,800)
  7. Urdu (4,900)
  8. Afghan Arabic (4,400)
  9. Swedish (4,100)
  10. Finnish (3,600)
  11. Spanish (3,300)
  12. Italian (3,100)
  13. Bengali (3,100)
  14. German (3,100)
  15. Pashto (3,000)
  16. Norwegian (3,000)
  17. Danish (2,900)
  18. Korean (2,700)
  19. Egyptian Arabic (2,400)
  20. Polish (2,000)
  21. Czech (2,000)
  22. Russian (1,900)
  23. Dutch (1,800)
  24. Hungarian (1,800)
  25. Portuguese (1,700)
  26. Malaysian Malay (1,700)
  27. Albanian (1,700)
  28. Japanese (1,700)
  29. Bosniak Bosnian (1,500)
  30. Icelandic (1,400)

Note that the counts are rounded down to the nearest 100.

Also note that the Arabic and Muslim name section is very large due to the many different variations and ways of writing these names. These include theophoric combination names such as those with the religious prefix “Abdul”. Both common and uncommon possibilities are included, and the use of Sun Letters in Arabic and Maltese is accounted for.

A list of all the identified languages with counts is included with the software as a Microsoft Excel (XLSX) file. The language names chosen are detailed and easy to search for.

The overall quality of each name-pair match is scored on a scale of 01 (best) to 99. The number of matches from a query can sometimes be very numerous, and the score is effective in ordering the output for filtering. Users will find this a major advantage with our system. The scoring considers several factors:

  • How closely the names are onomatologically linked
  • If the match is a nickname or given name variant—nicknames are generally scored higher, but not always
  • If a nickname match is a short form or diminutive—short forms are generally scored higher, but not always, such as when a diminutive is known to be very popular
  • If a nickname matches the beginning syllable of an associated name or another part of the name—matches to the beginning are generally scored higher, but not always, such as when a nickname matched to another part is known to be very popular
  • How closely the languages match
  • How closely the names are spelled and pronounced
  • The popularity of the names involved in the match

Note that some archaic matches are included for their onomastic significance. The score of “99” is reserved for these matches and only these matches.

Also note that opposite gender matches and fuzzy logic matches (Pro edition only) are not scored because not enough of the criteria necessary for scoring are present for these matches.
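To illustrate how the score can order and filter query output, here is a short Python sketch. It assumes each match has been loaded as a (name, score) tuple, with `None` standing in for the unscored opposite gender and fuzzy logic matches. That representation and the function name are assumptions for illustration, not the product's record layout.

```python
def rank_matches(matches, max_score=98):
    """Return scored matches ordered best first (01 is best).

    `matches` is an iterable of (name, score) tuples; score is an int from
    1 to 99, or None for unscored matches (hypothetical representation).
    The default max_score of 98 drops the archaic matches scored 99.
    """
    scored = [(name, score) for name, score in matches
              if score is not None and score <= max_score]
    return sorted(scored, key=lambda match: match[1])
```

Lowering `max_score` tightens the filter, and passing `max_score=99` keeps the archaic matches at the end of the list.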

Basic gender coding can be performed with pdNickname, but pdGender is specifically designed for the task. It has multiple gender coding fields filtered for languages, rare unisex usage by one gender, archaic names, and nicknames. Some extras pdGender can do include:

  • When names are different genders in different languages and nationalities, users can choose which languages and nationalities to take precedence
  • When names are one gender now in current times, but were the opposite gender in a previous era, the system automatically applies the modern usage
  • When names are one gender when used as a proper given name and unisex when used as a nickname, users can choose to have the given name usage applied
  • When unisex names are only rarely used by one gender and much more common in the opposite gender, users can choose to ignore the rare instances

Yes, pdNickname, pdSurname, and pdGender make excellent partners. They have been developed to be fully compatible. The name pair format in pdNickname is very similar to the pdSurname database except pdNickname is used to match given names and nicknames while pdSurname matches last names. pdGender is based on the first name database and is designed to apply gender identification to first name records. Note that pdSurname and pdGender are not required to use pdNickname but they are highly attuned to work together.

Yes, in addition to being a powerful resource for businesses and organizations working with lists of names, the software is recommended for ancestry researchers, students, teachers, and scholars. Attention has been paid to accurately and precisely representing the origin and history of given first names (also known as personal names and forenames) and nicknames (including short forms, diminutives, and even hypocoristics) and the relationships between them. It is of particular benefit in the following fields:

  • Genealogy
  • Anthroponymy
  • Onomatology
  • Ethnology
  • Linguistics
  • Related fields

Special and unique origins

Of interest to those studying names, many records provide information about special and unique origins:

  • Names from religion:
    • Biblical
    • Quranic
    • Sanskrit
  • Bynames: a familiar name for a person, similar to a nickname, that is often used as a replacement for a personal name—for example, “Rocky” is a common byname for boxers
  • History: names that became known through historical events
  • Literature: literary names created by authors, composers, and poets
  • Names from mythology and legend:
    • Arthurian Legend
    • Egyptian Mythology
    • Greek Mythology
    • Irish Mythology
    • Judeo-Christian Legend
    • Norse Mythology
    • Roman Mythology
    • Many other mythologies and legends
  • Roman names:
    • Roman cognomina: originally nicknames that were later utilized to augment family names to identify a particular branch within a family or family within a clan
    • Roman gentes: identified a family consisting of all those individuals who shared the same nomen and claimed descent from a common ancestor
    • Roman nomina: hereditary surnames that identified a person as a member of a distinct gens
    • Roman praenomina: early personal names chosen by the parents of a Roman child, originally bestowed on the eighth day after the birth of a girl or the ninth day after the birth of a boy; the praenomen would then be formally conferred a second time when girls married, or when boys reached manhood and assumed the toga virilis (which in the case of Roman boys was about age 14 or 15)
  • Surnames: given names and nicknames that are also surnames

If you typed “Garfeild” into a word processor, it would probably be underlined with a squiggly red line signifying a misspelling. It is the name “Garfield” with the “IE” reversed to “EI”—a common mistake.

The fuzzy logic technology in the Pro edition of this software allows matching name data that has typographical errors. If you look at the fuzzy logic examples we have provided below, you are likely to see errors you have repeatedly made or seen. In many cases you will have to look closely to see the difference, but they are different.

Fuzzy logic attempts to duplicate real errors created while entering names into databases. The most likely typographical errors are determined based on the number of letters, the characters involved, where they are located in the name, the language, and other factors.

The biggest advantage of our technology is its ability to work with language rules that indicate how individuals of various nationalities may hear and spell names.

Some fuzzy logic spellings have one typographical error while others have multiple issues, so the technology is suited for even the worst typists and transcribers. The algorithms have five layers:

Phonetic misspellings

These algorithms look at digraphs, trigraphs, tetragraphs, pentagraphs, hexagraphs, and even a German heptagraph, “SCHTSCH”, used to transliterate Russian words with the “SHCHA” or “SHCH” (romanized) sound. These are, respectively, two- to seven-letter sequences that form one phoneme or distinct sound. Most of the letter sequences of trigraph length and above are Irish, a language with more rules than you can shake a stick at.

Many misspellings occur as transcribers enter the sounds they hear. The character sequences and the sounds they produce are different for each language and situation, such as before, after, or between certain vowels and consonants, so our substitutions are language-rule based. Furthermore, our algorithms consider both how a name may sound to someone who speaks English as well as how it may sound to someone who speaks Spanish, which is often different. Take the digraph “SC”. Before the vowels “E” or “I” it is most likely to be misspelled by an English speaker as “SHE” or “SHI” while a Spanish speaker may hear “CHE” or “CHI” and sometimes “YE” or “YI”. Our library includes over 80,000 language-based letter sequence phonetic rules. Phonetic misspelling examples:

Example 1 | Real: BARTHOLOMEW | Fuzzy: BARTHOLOMUE
Example 2 | Real: DAWNETTE | Fuzzy: DAUNETTE
Example 3 | Real: NATHANIEL | Fuzzy: NATHANAIL
Example 4 | Real: PHYLLIS | Fuzzy: FYLLIS
Example 5 | Real: SIGOURNEY | Fuzzy: SIGOURNI
Example 6 | Real: XAVIER | Fuzzy: XAVAR
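The “SC” rule described above can be sketched in a few lines of Python. This covers only that single rule as stated in the text. The rule table, the function name, and the sample name used in the usage note are assumptions for illustration, and the real library of more than 80,000 rules is proprietary and not reproduced here.

```python
import re

# What a listener may hear for "SC" before E or I, per the description above.
SC_RULE = {"english": ["SH"], "spanish": ["CH", "Y"]}


def sc_misspellings(name, listener):
    """Return candidate misspellings of `name` for one listener language.

    Only "SC" followed by E or I is rewritten; names without that
    sequence produce an empty list.
    """
    upper = name.upper()
    variants = [re.sub(r"SC(?=[EI])", repl, upper) for repl in SC_RULE[listener]]
    return [v for v in variants if v != upper]
```

For an illustrative name such as “Crescencio”, the English rule yields CRESHENCIO, while the Spanish rule yields CRECHENCIO and CREYENCIO.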

Reversed digraphs

These algorithms look for misspellings due to reversed digraphs (two letter sequences that form one phoneme or distinct sound) which are a common typographical issue, such as “IE” substituted for “EI”. The character sequences and the sounds they produce are different for each language and situation, such as before, after, or between certain vowels and consonants, so our substitutions are language-rule based. Reversed digraph examples:

Example 7 | Real: ANNABETH | Fuzzy: ANNABEHT
Example 8 | Real: CAETLIN | Fuzzy: CEATLIN
Example 9 | Real: EUGENE | Fuzzy: UEGENE
Example 10 | Real: FRIEDRICH | Fuzzy: FREIDRICH
Example 11 | Real: RAQUEL | Fuzzy: RAUQEL
Example 12 | Real: VICKTOR | Fuzzy: VIKCTOR
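A simplified sketch of the reversed-digraph idea follows. The digraph set below is a small hand-picked assumption chosen to cover the examples above. The product's language-based rules are not public, so this version ignores language and context entirely.

```python
# Small illustrative digraph set (an assumption, not the product's rule list).
DIGRAPHS = frozenset({"AE", "CK", "EI", "EU", "IE", "QU", "TH"})


def reversed_digraph_variants(name, digraphs=DIGRAPHS):
    """Return spellings of `name` with one known digraph reversed."""
    upper = name.upper()
    variants = []
    for i in range(len(upper) - 1):
        pair = upper[i:i + 2]
        if pair in digraphs:
            # Swap the two letters of the digraph and keep the rest intact.
            variants.append(upper[:i] + pair[1] + pair[0] + upper[i + 2:])
    return variants
```

With this set, `reversed_digraph_variants("Friedrich")` returns `["FREIDRICH"]` and `reversed_digraph_variants("Annabeth")` returns `["ANNABEHT"]`, matching two of the examples above.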

Double letter misspellings

These algorithms look for misspellings due to double letters typed as single letters and single letters that are doubled. The most common typographical issues occur with the characters, in order of frequency, “SS”, “EE”, “TT”, “FF”, “LL”, “MM”, and “OO”. Double-letter misspelling examples:

Example 13 | Real: EMANNUEL | Fuzzy: EMMANNUEL
Example 14 | Real: KASSANDREA | Fuzzy: KASANDREA
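The double-letter idea can be sketched as follows, limited to the seven letters named above. This is an unranked illustration under that assumption: the function name is hypothetical, and the frequency ordering and language factors the product uses are not modeled.

```python
# Letters whose doubled forms are named above as the most common issues.
COMMON_DOUBLES = "SETFLMO"


def double_letter_variants(name, letters=COMMON_DOUBLES):
    """Return spellings with one double collapsed or one single doubled."""
    upper = name.upper()
    variants = []
    for i, ch in enumerate(upper):
        if ch not in letters:
            continue
        prev_same = i > 0 and upper[i - 1] == ch
        next_same = i + 1 < len(upper) and upper[i + 1] == ch
        if next_same and not prev_same:
            # Start of a doubled run: drop one letter.
            variants.append(upper[:i] + upper[i + 1:])
        elif not prev_same and not next_same:
            # Single letter: type it twice.
            variants.append(upper[:i] + ch + upper[i:])
    return variants
```

The output for “Kassandrea” includes KASANDREA, and the output for “Emannuel” includes EMMANNUEL, alongside other candidates that a real system would rank or discard.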

Missed letters

These algorithms look for missed keystrokes and provide fuzzy logic matches with missing letters. Unlike the other algorithms, these are not language specific. Keystrokes can be missed in any language. Missed letter examples:

Example 15 | Real: ABDUL | Fuzzy: ADUL
Example 16 | Real: MARGARET | Fuzzy: MRGARET
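Because missed keystrokes are not language specific, the basic idea is easy to sketch: drop each letter in turn. The function name is hypothetical, and the sketch makes no attempt to rank which deletions are most likely.

```python
def missed_letter_variants(name):
    """Return the distinct spellings of `name` with exactly one letter missing."""
    upper = name.upper()
    seen = set()
    variants = []
    for i in range(len(upper)):
        candidate = upper[:i] + upper[i + 1:]
        # Skip empty results and repeats (deleting either of two equal
        # adjacent letters gives the same spelling).
        if candidate and candidate not in seen:
            seen.add(candidate)
            variants.append(candidate)
    return variants
```

For “Abdul” this produces BDUL, ADUL, ABUL, ABDL, and ABDU, which includes the ADUL example above.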

String manipulations

These algorithms change letters and syllables in a variety of ways. They are less guided by language rules and more guided by randomness. String manipulation examples:

Example 17 | Real: CYNTHIA | Fuzzy: CYNTTHA
Example 18 | Real: GERALD | Fuzzy: GERLLD

Both editions include the same names and features except the Pro version comes equipped with fuzzy logic. Fuzzy logic allows matching when lists have typographical errors. The Standard edition has everything except fuzzy logic.

pdSurname

How do you match and merge last names on your lists? The answer is the new pdSurname. It is a one-of-a-kind proprietary resource that does for surnames what our highly regarded pdNickname software does for first names. The package is designed to facilitate matching last names that are not exactly the same but are close in relationship, spelling, or sound. It is available in Pro and Standard editions.

Regardless of the version, it contains a large set of last names and variations covering more than 600 languages and all races along with a host of additional features never before available on this scale. The software is also recommended for genealogical and scholarly research.

Each record has two sets of name information, a NAME1 side and a NAME2 side. The relationship between each name pair is identified as a close, near, or distant onomastic variation, as a phonetic variation or, in the Pro edition only, as a fuzzy logic variation.

The onomastic distance of true variations is rated on a 1 (closest) to 3 scale. The value is determined by tabulating or estimating the number of lines separating the names on a name tree.

Records are coded as follows:

  • 1 = Close onomastic variant
  • 2 = Near onomastic variant
  • 3 = Distant onomastic variant
  • P = Phonetic match
  • F = Fuzzy logic match (Pro only)

Here are some examples of related names:

Example 1 | Name1: ACKERMAN | Name2: AKERMAN | Close onomastic variant (1)
Example 2 | Name1: MANCILL | Name2: MONSELL | Near onomastic variant (2)
Example 3 | Name1: WILLIAMSON | Name2: WILMSEN | Distant onomastic variant (3)
Example 4 | Name1: CORREY | Name2: CURIE | Phonetic match (P)
Example 5 | Name1: SANTILLA | Name2: SANTOLLA | Phonetic match (P)
Example 6 | Real: GUALTIERREZ | Fuzzy: GUALTIEREZ | Fuzzy logic match (Pro only) (F)
Example 7 | Real: AAGARD | Fuzzy: OUGHGARD | Fuzzy logic match (Pro only) (F)

A major benefit of the software is the advanced system for indexing last names based on sound and spelling. Our structure analyzes hundreds of phonetic lines to make comparisons. These include proprietary algorithms designed for specific languages, language families, and dialects along with special situations. They are the most advanced algorithms ever designed for last names.

Additionally, as part of our phonetic indexing process, we include matches from six open source algorithms most data engineers are familiar with:

  • Soundex
  • Metaphone
  • Double Metaphone
  • New York State Identification and Intelligence System (NYSIIS)
  • Caverphone
  • Daitch-Mokotoff Soundex

The overall quality of each name-pair match is quantified on a scale of 01 (best) to 99. The scoring considers several factors:

  • Phonetic points from our proprietary algorithm
  • How many open source algorithms were matched
  • How closely the languages and races match
  • If the name pairs are onomatologically linked

The number of matches from a query can sometimes be very numerous, and the score is effective in ordering the output for filtering. Users will find this a major advantage with our system.

Our algorithms are specially tuned to work effectively with names that have prefixes such as MC, MAC, O, DE, LA, VAN, AL, ST, and many others. Traditionally phonetic algorithms have difficulty with these names because the prefixes create numerous false matches and miss true matches. Our algorithm greatly reduces this problem by separately measuring both the full name and the main part of the name following the prefix. Knowing the language of the name is key to our technique. Users will find this a major advantage with our system. Here are examples:

Example 1 | Name1: MCARTHUR | Name2: MCDALE | FALSE MATCH: Matched by open sources but not by our proprietary algorithms
Example 2 | Name1: DEGARCIA | Name2: GARCIA | TRUE MATCH: Matched only by our proprietary algorithms
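The idea of measuring the main part of a name separately from its prefix can be sketched as below. This is a deliberately naive illustration using only the prefixes named above. The function names and the minimum stem length are assumptions, and the sketch has none of the language knowledge that the text identifies as key.

```python
# Prefixes named above, longest first so MAC is tried before MC.
PREFIXES = ("MAC", "VAN", "MC", "DE", "LA", "AL", "ST", "O")


def split_prefix(name, min_stem=3):
    """Split `name` into (prefix, stem); prefix is '' when none applies."""
    upper = name.upper()
    for prefix in PREFIXES:
        if upper.startswith(prefix) and len(upper) - len(prefix) >= min_stem:
            return prefix, upper[len(prefix):]
    return "", upper


def stems_match(name_a, name_b):
    """Compare two names on the part that follows any recognized prefix."""
    return split_prefix(name_a)[1] == split_prefix(name_b)[1]
```

Here `stems_match("DEGARCIA", "GARCIA")` is true and `stems_match("MCARTHUR", "MCDALE")` is false, in line with the two examples above. The sketch also shows why language matters: it splits “STONE” into “ST” and “ONE”, a false prefix that a language-aware rule would avoid.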

Yes, language coverage is extensive. The list exceeds 600 languages, language families, and dialects. Some languages refer to ethnic groups. None of the languages were derived algorithmically and the provided information represents years of extensive onomastic research. When different sources list different origins and usages they may be combined depending on the reliability of the source and the reasonability of the information. Differently styled names can have different language values.

Top 30 languages

The following are the top 30 languages with the number of occurrences in the names database. The language count is one for each unique name formation and not one for each relationship (which would be many more):

  1. Polish Jewish (36,900)
  2. Irish (35,500)
  3. Czech Jewish (30,200)
  4. German Jewish (22,500)
  5. German (20,200)
  6. Spanish (16,700)
  7. French (14,600)
  8. English (14,500)
  9. Italian (12,800)
  10. Scottish (12,400)
  11. Other Jewish (7,700)
  12. Dutch (7,100)
  13. Russian (5,400)
  14. Polish (5,000)
  15. Catalan (3,500)
  16. Armenian (2,800)
  17. Arabic (2,500)
  18. Native American Dakota (2,500)
  19. Japanese (2,200)
  20. Czech (2,200)
  21. Hindi (2,000)
  22. Swedish (1,700)
  23. Hungarian (1,700)
  24. Middle English (1,600)
  25. Norwegian (1,400)
  26. Indian (1,300)
  27. Turkish (1,300)
  28. Anglicized Irish (1,200)
  29. Ukrainian (1,100)
  30. Welsh (1,100)

Note that the counts are rounded down to the nearest 100.

A list of all the identified languages is included with the software as a Microsoft Excel (XLSX) file. The language names chosen are detailed and easy to search for.

Yes, the race usage of each name is identified in a series of fields which provide an actual or estimated percentage of use for each race. Differently styled names can have different race values. Race coverage includes:

  • White
  • Black
  • Hispanic/Latino
  • Asian/Pacific
  • Native American/Alaskan
  • Multirace

Yes, pdSurname, pdNickname, and pdGender make excellent partners. They have been developed to be fully compatible. The name pair format in pdSurname is very similar to the pdNickname database except pdSurname is used to match last names while pdNickname matches first names. pdGender is based on the first name database and is designed to apply gender identification to first name records. Note that pdNickname and pdGender are not required to use pdSurname but they are highly attuned to work together.

Yes, the software is recommended for ancestry researchers, students, teachers, and scholars. Attention has been paid to accurately and precisely representing the origin and history of last names and the relationships between them. It is of particular benefit in the following fields:

  • Genealogy
  • Anthroponymy
  • Onomatology
  • Ethnology
  • Linguistics
  • Related fields

In addition to the onomastic research, during each development cycle certain aspects are emphasized. In this version of pdSurname special attention was paid to the following:

  • Hispanic/Latino/Iberian names: including Spanish, Basque, Catalan, Galician, and Portuguese
  • Native American names: in fact we designed this to be the definitive collection
  • Irish names: due to the excellent records maintained by the National Library of Ireland
  • English names: including Anglo-Saxon, Middle English, and Modern English
  • Ashkenazi Jewish names: particularly Polish Jewish, German Jewish, and Czech Jewish
  • Prefix names: our algorithms are specially tuned to work effectively with names that have prefixes such as MC, MAC, O, DE, LA, VAN, AL, ST, and many others

The fuzzy logic technology in the Pro edition of this software allows matching data that has typographical errors. If users look at the fuzzy logic records, they are likely to see errors they have repeatedly made or seen. In many cases you will have to look closely to see the difference, but they are different. There are more than 28 million fuzzy logic records.

The most likely typographical errors are determined based on the number of letters, the characters involved, where they are located in the name, the language, and other factors. None of the fuzzy spellings duplicates a real name already in the database. Such overlaps would otherwise sometimes occur, because a fuzzy spelling can already be a real variation of the same name.

Some fuzzy logic matches have one typographical error while others have multiple issues, so the technology is suited for even the worst typists and transcribers. The algorithms have five layers:

Phonetic misspellings

These algorithms look at digraphs, trigraphs, tetragraphs, pentagraphs, hexagraphs, and even a German heptagraph, “SCHTSCH”, used to transliterate Russian words with the “SHCHA” or “SHCH” (romanized) sound. These are, respectively, two to seven letter sequences that form one phoneme or distinct sound. Most of the letter sequences of trigraph length and above are Irish, a language with more spelling rules than you can shake a stick at.

Many misspellings occur as transcribers enter the sounds they hear. The character sequences and the sounds they produce are different for each language and situation, such as before, after, or between certain vowels and consonants, so our substitutions are language-rule based. Furthermore, our algorithms consider both how a name may sound to someone who speaks English as well as how it may sound to someone who speaks Spanish, which is often different. Take the digraph “SC”. Before the vowels “E” or “I” it is most likely to be misspelled by an English speaker as “SHE” or “SHI” while a Spanish speaker may hear “CHE” or “CHI” and sometimes “YE” or “YI”. Our library includes over 80,000 language-based letter sequence phonetic rules. Phonetic misspelling examples:

Example 1 | Real: AGLIANO | Fuzzy: ALLANO
Example 2 | Real: GUALTIERREZ | Fuzzy: GUALTIEREZ
Example 3 | Real: HEATHFIELD | Fuzzy: HEATHFALD
Example 4 | Real: AAGARD | Fuzzy: OUGHGARD
Example 5 | Real: YOUNGMAN | Fuzzy: YONGMAN
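
The “SC” rule described above can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the product's actual rule library: the `SC_RULE` table, the `sc_variants` function, and the sample name are hypothetical, and only the single rule quoted in the text is modeled.

```python
import re

# Hypothetical rule table: how the digraph "SC" before "E" or "I" may be
# heard, keyed by the listener's language (taken from the text above).
SC_RULE = {
    "english": ["SH"],
    "spanish": ["CH", "Y"],
}

# "SC" only when it is followed by "E" or "I" (lookahead keeps the vowel).
SC_BEFORE_E_OR_I = re.compile(r"SC(?=[EI])")


def sc_variants(name: str, listener: str) -> set[str]:
    """Return phonetic misspellings of `name` produced by the SC rule."""
    if not SC_BEFORE_E_OR_I.search(name):
        return set()  # the rule does not apply to this name
    return {SC_BEFORE_E_OR_I.sub(rep, name) for rep in SC_RULE[listener]}


print(sc_variants("CRESCENZO", "english"))  # {'CRESHENZO'}
```

A real rule set would hold many such context-dependent substitutions per language; the sketch shows why the same name can yield different fuzzy spellings for English and Spanish listeners.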

Reversed digraphs

These algorithms look for misspellings due to reversed digraphs (two letter sequences that form one phoneme or distinct sound) which are a common typographical issue, such as “IE” substituted for “EI”. The character sequences and the sounds they produce are different for each language and situation, such as before, after, or between certain vowels and consonants, so our substitutions are language-rule based. Reversed digraph examples:

Example 6 | Real: ANGLES | Fuzzy: ANLGES
Example 7 | Real: DIELEMAN | Fuzzy: DEILEMAN
Example 8 | Real: OLEARY | Fuzzy: OLAERY
Example 9 | Real: RODREGUEZ | Fuzzy: RODREUGEZ
Example 10 | Real: SCHUMACHER | Fuzzy: SCHUMAHCER
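
As a rough illustration of the reversal idea, the Python sketch below generates every spelling obtained by swapping one adjacent pair of letters. It is a simplification: the function name `adjacent_swaps` is hypothetical, and the language rules that decide which swaps are plausible are ignored here.

```python
def adjacent_swaps(name: str) -> set[str]:
    """Return all spellings made by swapping one adjacent pair of letters."""
    variants = set()
    for i in range(len(name) - 1):
        if name[i] != name[i + 1]:  # swapping identical letters changes nothing
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    return variants


print("DEILEMAN" in adjacent_swaps("DIELEMAN"))  # True
```

Each fuzzy spelling in the examples above appears among the swaps generated for its real name, alongside many swaps a language-rule filter would presumably discard.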

Double letter misspellings

These algorithms look for misspellings due to double letters typed as single letters and single letters that are doubled. The most common typographical issues occur with the characters, in order of frequency, “SS”, “EE”, “TT”, “FF”, “LL”, “MM”, and “OO”. Double-letter misspelling examples:

Example 11 | Real: HUMBER | Fuzzy: HUMBEER
Example 12 | Real: ZWOLLE | Fuzzy: ZWOLE
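
A minimal Python sketch of both directions of this error follows. The function name `double_letter_variants` is hypothetical, and the frequency ordering of letters mentioned above is not modeled; every letter is treated alike.

```python
def double_letter_variants(name: str) -> set[str]:
    """Collapse each doubled letter to one, and double each single letter."""
    variants = set()
    for i, ch in enumerate(name):
        prev_same = i > 0 and name[i - 1] == ch
        next_same = i + 1 < len(name) and name[i + 1] == ch
        if next_same:
            # Doubled letter typed once: drop one copy.
            variants.add(name[:i] + name[i + 1:])
        elif not prev_same:
            # Single letter typed twice: insert a second copy.
            variants.add(name[:i] + ch + name[i:])
    return variants


print("ZWOLE" in double_letter_variants("ZWOLLE"))    # True
print("HUMBEER" in double_letter_variants("HUMBER"))  # True
```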

Missed letters

These algorithms look for missed keystrokes and provide fuzzy logic matches with missing letters. Unlike the other algorithms, these are not language specific. Keystrokes can be missed in any language. Missed letter examples:

Example 13 | Real: HUNTER | Fuzzy: UNTER
Example 14 | Real: TAMERON | Fuzzy: TAMRON
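
Because this layer is not language specific, it is the simplest to picture. The Python sketch below, with the hypothetical function name `missed_letter_variants`, drops each letter in turn; how the actual software chooses which omissions are most likely is not described here.

```python
def missed_letter_variants(name: str) -> set[str]:
    """Return every spelling of `name` with exactly one letter missing."""
    return {name[:i] + name[i + 1:] for i in range(len(name))}


print("UNTER" in missed_letter_variants("HUNTER"))    # True
print("TAMRON" in missed_letter_variants("TAMERON"))  # True
```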

String manipulations

Because so many of our algorithms are language-rule based, additional name string manipulations are provided for the relatively small number of names without language applied. Most of these are similar to the reversed digraph substitutions. String manipulation examples:

Example 15 | Real: ELWORTHY | Fuzzy: ELWROTHY
Example 16 | Real: PEOPLE | Fuzzy: POEPLE

More about fuzzy logic

Both editions include the same names and features except the Pro version comes equipped with fuzzy logic. Fuzzy logic allows matching when lists have typographical errors. The Standard edition has everything except fuzzy logic.

pdGender

Male and female identification is essential for businesses and organizations. It allows you to send mail with a personal touch. Gender Coding also allows you to filter, map, and analyze your data based on this critical demographic. pdGender lets you accomplish this in ways not before possible on this scale. It is available in Pro and Standard editions.

Coverage includes hundreds of thousands of names and the package employs the best matching algorithms designed for this process. As an added benefit, languages of origin and use have also been researched and included along with additional features never before available on this scale.

The product is like no other gender coding database. It is essentially very simple, but with a lot of power. Users match the first names in their database lists and the software provides male or female gender identification.

The main gender database lists 397,000 given names and nicknames, complete with gender, languages of origin and use, name rank in the United States, and other demographics.

For unisex names, there are special filters allowing users to tweak the output based on languages and nationalities, usage by gender, and other factors.

The database contains all the first name spellings gathered and published by the U.S. Census Bureau and Social Security Administration between 1800 and the present time, related nicknames, and ethnic given names and nicknames not found in the United States. About 75 percent of the given names and nicknames can be found in the United States, and the remainder are found only outside the United States.

Using this software results in reduced long-term costs, improved customer service, and better marketing data.

Yes, language coverage is extensive. The list exceeds 500 languages, language families, and dialects. Some languages refer to ethnic groups. None of the languages were derived algorithmically and the provided information represents years of extensive onomastic research. When different sources list different origins and usages they may be combined depending on the reliability of the source and the reasonability of the information.

Top 30 languages

The following are the top 30 languages with the number of occurrences in the names database. The language count is one for each unique name formation and not one for each relationship (which would be many more):

  1. English (225,000)
  2. Arabic (46,700)
  3. Turkish (6,700)
  4. Punjabi (6,700)
  5. French (5,900)
  6. Iranian (5,800)
  7. Urdu (4,900)
  8. Afghan Arabic (4,400)
  9. Swedish (4,100)
  10. Finnish (3,600)
  11. Spanish (3,300)
  12. Italian (3,100)
  13. Bengali (3,100)
  14. German (3,100)
  15. Pashto (3,000)
  16. Norwegian (3,000)
  17. Danish (2,900)
  18. Korean (2,700)
  19. Egyptian Arabic (2,400)
  20. Polish (2,000)
  21. Czech (2,000)
  22. Russian (1,900)
  23. Dutch (1,800)
  24. Hungarian (1,800)
  25. Portuguese (1,700)
  26. Malaysian Malay (1,700)
  27. Albanian (1,700)
  28. Japanese (1,700)
  29. Bosniak Bosnian (1,500)
  30. Icelandic (1,400)

Note that the counts are rounded down to the nearest 100.
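
The rounding used for these counts can be expressed with integer division. In this small Python sketch the function name and the input value 6,789 are made up for illustration; the unrounded counts are not published here.

```python
def round_down_to_hundred(count: int) -> int:
    """Round a non-negative count down to the lower multiple of 100."""
    return (count // 100) * 100


print(round_down_to_hundred(6789))  # 6700
```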

Also note that the Arabic and Muslim name section is very large due to the many different variations and ways of writing these names. These include theophoric combination names such as those with the religious prefix “Abdul”. Both common and uncommon possibilities are included, and the use of Sun Letters in Arabic and Maltese is accounted for.

A list of all the identified languages with counts is included with the software as a Microsoft Excel (XLSX) file. The language names chosen are detailed and easy to search for.

This software identifies names as male (“M”), female (“F”) or, when the name is both male and female, unisex (“U”).

The WORLD field is the first in a series of 141 gender coding fields. Notably, it is the only gender coding field without filters of any kind. It is called “world” because it defines the basic international usage of each name. It can be utilized like the standard unfiltered gender coding fields most users are familiar with in other products. It derives the largest number of unisex identifications because it gives equal weight to all languages and nationalities. If a name is male in the United States and female in Vietnam, this field will flag the name as unisex.
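
The equal-weight behavior of the WORLD field can be sketched in Python as follows. The function `world_gender` and the dictionary of per-nationality codes are hypothetical; they only illustrate the rule that any disagreement between usages yields “U”.

```python
def world_gender(gender_by_locale: dict[str, str]) -> str:
    """Combine per-locale codes ("M", "F", "U") with equal weight."""
    codes = set(gender_by_locale.values())
    if not codes:
        raise ValueError("at least one usage is required")
    # One agreed code is kept; any mix of codes means the name is unisex.
    return codes.pop() if len(codes) == 1 else "U"


print(world_gender({"United States": "M", "Vietnam": "F"}))  # U
```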

Following this field are a series of 140 filtered gender coding fields which are the heart of the pdGender matching system. They allow filtering the gender coding output for languages and nationalities, rare usage by one gender, archaic names, and nicknames. Here are examples:

  • When names are different genders in different languages and nationalities, users can choose which languages and nationalities to take precedence
  • When names are one gender in current times but were the opposite gender in a previous era, the system automatically applies the modern usage
  • When names are one gender when used as a proper given name and unisex when used as a nickname, users can choose to have the given name usage applied
  • When unisex names are only rarely used by one gender and much more common in the opposite gender, users can choose to ignore the rare instances

The field names are designed to indicate what filters are applied. Here are examples:

Example 1 | Field: WORLD_XA | Description: Gives equal weight to all languages and nationalities and reduces precedence of archaic names in gender determination
Example 2 | Field: USA_XAN | Description: Prioritizes United States names and reduces precedence of archaic names and nicknames in gender determination
Example 3 | Field: EN_FR_XAR | Description: Prioritizes English and French names and reduces precedence of archaic names and rare usages in gender determination
Example 4 | Field: HISP_XANR | Description: Prioritizes Hispanic names and reduces precedence of archaic names, nicknames, and rare usages in gender determination

Prioritizing Languages and Nationalities

Because names can have different genders in different languages and nationalities, a filter is provided allowing the choice of which languages and nationalities to take precedence. There are 35 options which are indicated in the prefix of each gender coding field name. The choices and field name prefixes are:

Prefix | Filter
WORLD_ | All languages and nationalities are given equal weight
USA_ | United States names are prioritized
US_ES_ | United States and Spanish (Español) names are prioritized
US_HS_ | United States and Hispanic names are prioritized
US_FR_ | United States and French names are prioritized
ENG_ | English names are prioritized
EN_AA_ | English and African American names are prioritized
EN_ES_ | English and Spanish (Español) names are prioritized
EN_HS_ | English and Hispanic names are prioritized
EN_FR_ | English and French names are prioritized
AFRAM_ | African American names are prioritized
SPA_ | Spanish names are prioritized
HISP_ | Hispanic names are prioritized
FRA_ | French names are prioritized
AFR_ | African (non-Muslim) names are prioritized
BRIT_ | British names are prioritized
CEL_ | Celtic (language family) names are prioritized
EASIA_ | East Asian names are prioritized
EA_PI_ | East Asian and Pacific Islander names are prioritized
GAEL_ | Gaelic (Goidelic language family) names are prioritized
DEU_ | German (Deutsch) names are prioritized
GEM_ | Germanic (language family) names are prioritized
HAW_ | Oceania Hawaiian names are prioritized
IND_ | Indian (South Asia) names are prioritized
ITA_ | Italian names are prioritized
JW_ | Jewish, Yiddish, and Hebrew names are prioritized
MUS_ | Muslim names are prioritized
NATAM_ | Native American names are prioritized
PISLR_ | Pacific Islander names are prioritized
ROA_ | Romance (language family) names are prioritized
SCAND_ | Scandinavian names are prioritized
SLA_ | Slavic (language family) names are prioritized
CYM_ | Welsh (Cymraeg) names are prioritized
WEST_ | Western World names are prioritized
NWEST_ | Non-Western World names are prioritized

Adding Other Filters

The suffix of each gender coding field name indicates any additional filters that are applied. They all begin with an “X”, indicating eXclusion, followed by up to three characters (“A”, “N”, and/or “R”, in respective order) showing what filters are applied. There are four possible suffixes:

Suffix | Filter
_XA | Reduces precedence of archaic names in gender determination
_XAN | Reduces precedence of archaic names and nicknames in gender determination
_XAR | Reduces precedence of archaic names and rare usages in gender determination
_XANR | Reduces precedence of archaic names, nicknames, and rare usages in gender determination
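
To show how the prefix and suffix tables combine, the Python sketch below builds the full set of field names. The prefixes and suffixes are taken from the tables above; the list layout and variable names are assumptions made for the example.

```python
# Prefixes from the language/nationality table (trailing underscore kept).
PREFIXES = [
    "WORLD_", "USA_", "US_ES_", "US_HS_", "US_FR_", "ENG_", "EN_AA_",
    "EN_ES_", "EN_HS_", "EN_FR_", "AFRAM_", "SPA_", "HISP_", "FRA_",
    "AFR_", "BRIT_", "CEL_", "EASIA_", "EA_PI_", "GAEL_", "DEU_",
    "GEM_", "HAW_", "IND_", "ITA_", "JW_", "MUS_", "NATAM_", "PISLR_",
    "ROA_", "SCAND_", "SLA_", "CYM_", "WEST_", "NWEST_",
]
# Suffixes from the filter table, without the joining underscore.
SUFFIXES = ["XA", "XAN", "XAR", "XANR"]

# 35 prefixes x 4 suffixes = 140 filtered fields, plus the unfiltered WORLD.
filtered_fields = [prefix + suffix for prefix in PREFIXES for suffix in SUFFIXES]
all_fields = ["WORLD"] + filtered_fields

print(len(filtered_fields), len(all_fields))  # 140 141
```

The counts agree with the 140 filtered fields and 141 total gender coding fields described earlier.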

One of the most useful features of the software is that rare usages of names are identified by language. These flags are applied to unisex names and show when a name is used less than 20 percent of the time in the cited language and gender. This indicator allows filtering out rare usages in gender coding.

Note that rare usage indicators should be compared only within the same language, not across languages. A name usage labeled rare in Spanish but not in English is not necessarily used less in Spanish than in English; rather, it is rare in Spanish compared to the usage by the opposite gender in Spanish.
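
The 20 percent threshold can be illustrated with a short Python sketch. The function `rare_genders` and the usage counts are hypothetical; the point is that each share is computed against the opposite gender within one language only.

```python
RARE_THRESHOLD = 0.20  # "less than 20 percent of the time"


def rare_genders(male_count: int, female_count: int) -> set[str]:
    """Return the gender codes whose usage is rare within one language."""
    total = male_count + female_count
    if total == 0:
        return set()
    rare = set()
    if male_count / total < RARE_THRESHOLD:
        rare.add("M")
    if female_count / total < RARE_THRESHOLD:
        rare.add("F")
    return rare


# Made-up counts for one unisex name in a single language.
print(rare_genders(male_count=950, female_count=50))  # {'F'}
```

Because the shares are relative within a language, a usage flagged rare in one language says nothing about how often the name occurs in another.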

Basic gender coding can be performed with pdNickname, but pdGender is specifically designed for the task. It has multiple gender coding fields filtered for languages, rare unisex usage by one gender, archaic names, and nicknames. Some extras pdGender can do include:

  • When names are different genders in different languages and nationalities, users can choose which languages and nationalities to take precedence
  • When names are one gender in current times but were the opposite gender in a previous era, the system automatically applies the modern usage
  • When names are one gender when used as a proper given name and unisex when used as a nickname, users can choose to have the given name usage applied
  • When unisex names are only rarely used by one gender and much more common in the opposite gender, users can choose to ignore the rare instances

Yes, pdGender, pdNickname, and pdSurname make excellent partners. They have been developed to be fully compatible. The name pair format in pdNickname is very similar to the pdSurname database except pdNickname is used to match given names and nicknames while pdSurname matches last names. pdGender is based on the first name database and is designed to apply gender identification to first name records. Note that pdNickname and pdSurname are not required to use pdGender but they are highly attuned to work together.

Yes, in addition to being a powerful resource for businesses and organizations working with lists of names, the software is recommended for ancestry researchers, students, teachers, and scholars. Attention has been paid to accurately and precisely representing the origin and history of given first names (also known as personal names and forenames) and nicknames (including short forms, diminutives, and even hypocoristics) and the relationships between them. It is of particular benefit in the following fields:

  • Genealogy
  • Anthroponymy
  • Onomatology
  • Ethnology
  • Linguistics
  • Related fields

Special and unique origins

Of interest to those studying names, many records provide information about special and unique origins:

  • Names from religion:
    • Biblical
    • Quranic
    • Sanskrit
  • Bynames: a familiar name for a person, similar to a nickname, that is often used as a replacement for a personal name—for example, “Rocky” is a common byname for boxers
  • History: names that became known through historical events
  • Literature: literary names created by authors, composers, and poets
  • Names from mythology and legend:
    • Arthurian Legend
    • Egyptian Mythology
    • Greek Mythology
    • Irish Mythology
    • Judeo-Christian Legend
    • Norse Mythology
    • Roman Mythology
    • Many other mythologies and legends
  • Roman names:
    • Roman cognomina: originally nicknames that were later utilized to augment family names to identify a particular branch within a family or family within a clan
    • Roman gentes: identified a family consisting of all those individuals who shared the same nomen and claimed descent from a common ancestor
    • Roman nomina: hereditary surnames that identified a person as a member of a distinct gens
    • Roman praenomina: early personal names chosen by the parents of a Roman child, originally bestowed on the eighth day after the birth of a girl or the ninth day after the birth of a boy; the praenomen would then be formally conferred a second time when girls married, or when boys reached manhood and assumed the toga virilis (which in the case of Roman boys was about age 14 or 15)
  • Surnames: given names and nicknames that are also surnames

If you typed “Garfeild” into a word processor, it would probably be underlined with a squiggly red line signifying a misspelling. It is the name “Garfield” with the “IE” reversed to “EI”—a common mistake.

The fuzzy logic technology in the Pro edition of this software allows matching name data that has typographical errors. If you look at the fuzzy logic examples we have provided below, you are likely to see errors you have repeatedly made or seen. In many cases you will have to look closely to see the difference, but they are different.

Fuzzy logic attempts to duplicate real errors created while entering names into databases. The most likely typographical errors are determined based on the number of letters, the characters involved, where they are located in the name, the language, and other factors.

The biggest advantage of our technology is its ability to work with language rules that indicate how individuals of various nationalities may hear and spell names.

Some fuzzy logic spellings have one typographical error while others have multiple issues, so the technology is suited for even the worst typists and transcribers. The algorithms have five layers:

Phonetic misspellings

These algorithms look at digraphs, trigraphs, tetragraphs, pentagraphs, hexagraphs, and even a German heptagraph, “SCHTSCH”, used to transliterate Russian words with the “SHCHA” or “SHCH” (romanized) sound. These are, respectively, two to seven letter sequences that form one phoneme or distinct sound. Most of the letter sequences of trigraph length and above are Irish, a language with more spelling rules than you can shake a stick at.

Many misspellings occur as transcribers enter the sounds they hear. The character sequences and the sounds they produce are different for each language and situation, such as before, after, or between certain vowels and consonants, so our substitutions are language-rule based. Furthermore, our algorithms consider both how a name may sound to someone who speaks English as well as how it may sound to someone who speaks Spanish, which is often different. Take the digraph “SC”. Before the vowels “E” or “I” it is most likely to be misspelled by an English speaker as “SHE” or “SHI” while a Spanish speaker may hear “CHE” or “CHI” and sometimes “YE” or “YI”. Our library includes over 80,000 language-based letter sequence phonetic rules. Phonetic misspelling examples:

Example 1 | Real: BARTHOLOMEW | Fuzzy: BARTHOLOMUE
Example 2 | Real: DAWNETTE | Fuzzy: DAUNETTE
Example 3 | Real: NATHANIEL | Fuzzy: NATHANAIL
Example 4 | Real: PHYLLIS | Fuzzy: FYLLIS
Example 5 | Real: SIGOURNEY | Fuzzy: SIGOURNI
Example 6 | Real: XAVIER | Fuzzy: XAVAR

Reversed digraphs

These algorithms look for misspellings due to reversed digraphs (two letter sequences that form one phoneme or distinct sound) which are a common typographical issue, such as “IE” substituted for “EI”. The character sequences and the sounds they produce are different for each language and situation, such as before, after, or between certain vowels and consonants, so our substitutions are language-rule based. Reversed digraph examples:

Example 7 | Real: ANNABETH | Fuzzy: ANNABEHT
Example 8 | Real: CAETLIN | Fuzzy: CEATLIN
Example 9 | Real: EUGENE | Fuzzy: UEGENE
Example 10 | Real: FRIEDRICH | Fuzzy: FREIDRICH
Example 11 | Real: RAQUEL | Fuzzy: RAUQEL
Example 12 | Real: VICKTOR | Fuzzy: VIKCTOR

Double letter misspellings

These algorithms look for misspellings due to double letters typed as single letters and single letters that are doubled. The most common typographical issues occur with the characters, in order of frequency, “SS”, “EE”, “TT”, “FF”, “LL”, “MM”, and “OO”. Double-letter misspelling examples:

Example 13 | Real: EMANNUEL | Fuzzy: EMMANNUEL
Example 14 | Real: KASSANDREA | Fuzzy: KASANDREA

Missed letters

These algorithms look for missed keystrokes and provide fuzzy logic matches with missing letters. Unlike the other algorithms, these are not language specific. Keystrokes can be missed in any language. Missed letter examples:

Example 15 | Real: ABDUL | Fuzzy: ADUL
Example 16 | Real: MARGARET | Fuzzy: MRGARET

String manipulations

These algorithms change letters and syllables in a variety of ways. They are less guided by language rules and more guided by randomness. String manipulation examples:

Example 17 | Real: CYNTHIA | Fuzzy: CYNTTHA
Example 18 | Real: GERALD | Fuzzy: GERLLD

Both editions include the same names and features except the Pro version comes equipped with fuzzy logic. Fuzzy logic allows matching when lists have typographical errors. The Standard edition has everything except fuzzy logic.

pdGeoTIGER

Location data plays a central role in decision making, from market analysis and risk assessment to targeting and customer management, and it is important that precise GeoCoding information is employed from the start. pdGeoTIGER was developed to provide exactly this. It is available in Pro and Standard editions.

These easy-to-use, comprehensive, and up-to-date packages permit exceptionally precise assignment of United States latitude and longitude coordinates, area size data, urban and rural indicators, legal and statistical area identifiers and indicators, and other geographic information.

Yes, an excellent 31-million record ZIP+4 database is included in all editions of the software.

The included address ranges point to a sequential line of potential addresses and not individual addresses. All possible structure numbers are included in the range, from the first structure to the last, and all structure numbers of the same parity (odd, even, or both) in between, regardless of whether the actual structure currently exists.
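
The way a range and its parity imply a set of potential structure numbers can be sketched in Python. The function `structure_numbers` and the parity codes “O”, “E”, and “B” are hypothetical stand-ins chosen for this example, not the field values used in the database.

```python
def structure_numbers(first: int, last: int, parity: str) -> list[int]:
    """List potential structure numbers in a range.

    parity: "O" = odd only, "E" = even only, "B" = both (assumed codes).
    """
    if parity not in ("O", "E", "B"):
        raise ValueError("parity must be 'O', 'E', or 'B'")
    low, high = sorted((first, last))
    numbers = range(low, high + 1)
    if parity == "B":
        return list(numbers)
    wanted_remainder = 1 if parity == "O" else 0
    return [n for n in numbers if n % 2 == wanted_remainder]


print(structure_numbers(101, 109, "O"))  # [101, 103, 105, 107, 109]
```

Every number returned is only a potential address; whether a structure exists at that number is not implied.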

The following legal and statistical areas are identified in all editions of the software:

  • ANRC
  • AIANNH
  • AITSUB
  • Tract
  • Block Group
  • Block
  • Block Suffix
  • Tribal Tract
  • Tribal Block Group
  • ZCTA5
  • Combined-NECTA
  • CSA
  • Congressional District
  • Consolidated City
  • County
  • County Subdivision
  • Division
  • CBSA
  • Metropolitan Division
  • NECTA
  • NECTA-Division
  • Place
  • PUMA
  • Region
  • School District
  • State
  • State Legislative District
  • Subbarrio
  • Urban Area
  • UGA
  • VTD
  • Historical areas

Yes, the database provides United States Census Bureau internal point latitude and longitude coordinates for census blocks and they are presented in multiple formats.

Any location on Earth can be described with two numbers—its latitude and its longitude. If a pilot or a ship’s captain wants to specify a position on a map, these are the coordinates they would use. In actuality, these coordinates are angles, measured in degrees, minutes, and seconds of arc.

Internal point latitude and longitude coordinates are a calculated point that is at or near the geographic center of the entity. For some irregularly shaped entities (such as those shaped like a crescent), the calculated geographic center may be located outside the boundaries of the area. In such instances, the internal point is identified as a point inside the entity boundaries nearest or near the calculated geographic center.

The following are the three formats provided for each set of coordinates; the examples are for the same latitude and longitude in Apache County, Arizona.

  • Degrees: seven decimal places; examples, +34.0874945, -109.3283640
  • Converted to radians: 15 numeric places; for trigonometry functions; examples, 0.594939012780458, -1.908139917618838
  • Degrees/Minutes/Seconds: for printing coordinates in documents and on websites; examples, 34° 5' 15'' N, 109° 19' 42'' W
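
The relationship between the three formats can be checked with a few lines of Python. This is a sketch for illustration only; the function `to_dms` and its rounding to whole seconds are assumptions, and the database already supplies all three formats so no conversion is required in practice.

```python
import math


def to_dms(degrees: float, is_latitude: bool) -> str:
    """Format decimal degrees as degrees/minutes/seconds with a hemisphere."""
    if is_latitude:
        hemisphere = "N" if degrees >= 0 else "S"
    else:
        hemisphere = "E" if degrees >= 0 else "W"
    total_seconds = round(abs(degrees) * 3600)  # whole seconds of arc
    whole_degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{whole_degrees}° {minutes}' {seconds}'' {hemisphere}"


latitude, longitude = 34.0874945, -109.3283640

# Radians, as used by trigonometry functions.
print(math.radians(latitude), math.radians(longitude))

# Degrees/Minutes/Seconds for display.
print(to_dms(latitude, is_latitude=True))    # 34° 5' 15'' N
print(to_dms(longitude, is_latitude=False))  # 109° 19' 42'' W
```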

Yes, census block total area size is entered in whole square meters, and the block's land and water characteristics are identified with the following values:

  • G = Glacier
  • I = Intermittent Water
  • L = Land
  • P = Permanent Water
  • S = Swamp/Marsh

Urban and rural characteristics are provided for census blocks. For the 2010 Census, the United States Census Bureau classified as urban all territory, population, and housing units located within urbanized areas (UA) and urban clusters (UC), both defined using the same criteria. The bureau delineates UA and UC boundaries that represent densely developed territory, encompassing residential, commercial, and other nonresidential urban land uses. In general, this territory consists of areas of high population density and urban land use resulting in a representation of the “urban footprint”. Rural areas consist of all territory, population, and housing units located outside UAs and UCs.

For the 2010 Census, the urban and rural classification was applied to the 50 states; the District of Columbia (federal district); and the Commonwealth of Puerto Rico, American Samoa, Guam, the Commonwealth of the Northern Mariana Islands, and the U.S. Virgin Islands insular areas.

Urban/Rural coding is as follows:

  • U = Urban
  • R = Rural

It is drawn from the most recent edition of the United States Census Bureau TIGER/Line® Shapefiles, United States Postal Service (USPS) data, and other proprietary information.

Both editions include an accurate 31 million record ZIP+4 GeoCoding database and a bonus pdGeoSupplement reference file. The Pro version adds a 60 million record address range GeoCoding database for even greater precision. The Standard edition has everything except the address range information.

pdCensus2010

United States census demographics are an indispensable tool for businesses, organizations, schools, researchers, students, and government. pdCensus2010 provides 150 of the most important 2010 Census variables along with additional useful information. It is available in Pro and Standard editions.

Tabulated at multiple summary levels and geographic components, these packages offer an easy-to-use, comprehensive, and up-to-date United States demographics database.

The population, household, group quarter, and housing unit demographics provided are the most important part of the package. The following subject areas are covered:

  • Population:
    • Total count
    • Urban/Rural
    • Gender
    • Age
    • Median age
    • Race
    • Hispanic or Latino
  • Households and Group Quarter Population:
    • Total count
    • Family households
    • Non-family households
    • Average household size
    • Average family size
    • Population in group quarters
  • Housing Units:
    • Total count
    • Urban/Rural
    • Occupied housing units
    • Race
    • Hispanic or Latino
    • Vacant housing units

Geographic areas are selected using a system of stratification levels made up of up to 164 summary levels and 96 geographic components, along with other legal and statistical area identification, characteristics, and special indicator fields.

Summary levels specify the linear geographical hierarchy of the areas being tabulated or analyzed. Some summary levels are tabulated at a state or equivalent entity level while others are tabulated at the United States nation level (not bound by state borders at the top of the summary level hierarchy).

Summary levels are tied to geographic components which provide a facility to restrict summary levels to specific elements of the geographic area such as urban areas, rural areas, tribal areas, metropolitan or micropolitan areas, principal cities, and other like elements. Geographic components are most often used in combination with summary levels but can be employed independent of them.

The largest summary level is United States. The smallest available summary level is the Census Block in the Pro edition and Census Block Group in the Standard edition.

Economic variables were covered in the old United States Census Bureau Summary File 3 (SF3). In 2010 the bureau replaced SF3 with the American Community Survey (ACS), which collects responses continuously instead of every ten years, so pdCensus2010 does not include economic data.

It is drawn from United States Census Bureau 2010 Summary File 1 (SF1) data, including updates, and other proprietary information.

The Pro edition includes tabulations to the smallest Census geographic level, the Census Block. The Standard version includes everything except the Census Block summation, and instead tabulates to the second smallest Census geographic level, the Census Block Group. Both include a bonus pdGeoSupplement reference file.

pdGeoSupplement

The bonus software provides additional reference information for legal and statistical census areas covered in pdGeoTIGER and pdCensus2010.

The following United States Census Bureau coding is provided:

  • American National Standards Institute (ANSI) identification codes
  • Metropolitan/Micropolitan Statistical Area Principal City Indicator
  • New England City and Town Area Principal City Indicator
  • American Indian Area/Alaska Native Area/Native Hawaiian Home Land Federal/State Recognition Indicator
  • Metropolitan/Micropolitan Statistical Area Status Indicator
  • New England City and Town Area Status Indicator
  • Urban Area Type Indicator
  • Urban Growth Area Type Indicator
  • Congressional Session
  • State Legislative Year
  • Voting District Indicator
  • School District Type Indicator
  • School District Low Grade Indicator
  • School District High Grade Indicator

It is provided as a free bonus with pdGeoTIGER and pdCensus2010.

No, copyright requirements only allow us to license the bonus software as part of pdGeoTIGER and pdCensus2010.

pdCensus2000

United States census data from 2000 continues to be an essential resource, and pdCensus2000 offers more than 150 of the most indispensable Census 2000 variables in an easy-to-use format.

Akin to its newer sibling pdCensus2010, this package is a comprehensive United States national demographics database built from Census 2000 data and tabulated at multiple summary levels.

The demographics provided are the most important part of the package. The following subject areas are covered:

  • Geography
  • Population
  • Age
  • Race
  • Housing Units
  • Income
  • Employment

Geographic areas are selected using a system of stratification levels made up of seven summary levels, along with other legal and statistical area identification, characteristics, and special indicator fields. Summary levels specify the linear geographical hierarchy of the areas being tabulated or analyzed. The following summary levels are included:

  • State
  • County
  • County Subdivision
  • Census Tract
  • Census Block Group
  • Census 3-digit ZIP Code Tabulation Area (ZCTA3)
  • Census 5-digit ZIP Code Tabulation Area (ZCTA5)

The largest summary level is State. The smallest available summary level is the Census Block Group.

It is drawn from United States Census Bureau 2000 Summary File 1 (SF1) and 2000 Summary File 3 (SF3) data, including updates, and other proprietary information.

The American Community Survey (ACS) replaces the long-form Summary File 3 (SF3) data available in previous decennial censuses and shares many similarities with it. However, there are many differences. The chief advantage of ACS data is its far more frequent release: responses are collected continuously instead of every ten years. This gives planners at all levels of government, business, and the general public far more current data than the decennial long form, and provides for the first time information about temporary populations, such as beach and ski communities.

But this advantage is also a disadvantage. While the ACS is timelier, its information is also smoothed (flattened) out and less accurate because it is collected over a span of years instead of at a single point in time. The effect is most pronounced for small geographic areas, which must pool three or five years of data to accumulate a large enough sample for reliable estimates.
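The smoothing effect can be illustrated with a small Python sketch. The yearly figures below are hypothetical, and the plain average is an assumption made only for illustration; the actual ACS estimation methodology is more involved.

```python
from statistics import mean

# Hypothetical yearly values for one small area (made up, not real data).
yearly = {2006: 100, 2007: 100, 2008: 100, 2009: 160, 2010: 100}

# A single-year measurement would capture the 2009 spike in full.
peak_year = max(yearly, key=yearly.get)

# Pooling five years into one figure flattens that spike.
pooled = mean(yearly.values())

print(f"Point-in-time value for {peak_year}: {yearly[peak_year]}")
print(f"Five-year pooled average: {pooled}")
```

In this sketch the one-year jump is still visible in the pooled figure, but at a fraction of its size, which is the trade-off described above.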

There are also changes in residence rules, boundaries and definitions of geographic areas, how and which questions are asked, and survey methodology.

pdCountry

The world is becoming a smaller place and a handy collection of key country data is invaluable. pdCountry fits the bill, covering the entire globe. This easy-to-use, comprehensive, and up-to-date reference package provides core country information, GeoCoding data, and a host of useful demographic variables. It is available in Pro and Standard editions.

Uses are innumerable, and no company or organization that does international business should be without it. Financial companies, travel agents, webmasters, news agencies, research institutions, schools, students, and government will find it of particular value.

It is a comprehensive global database covering 211 current countries, 29 regions (including the World), and seven former countries (two former countries are only in the Pro edition).

It includes the following core information about countries and regions:

  • ISO numeric code
  • Regions
  • Name
  • Abbreviations
  • National capital
  • Language
  • Citizenry (noun and adjective)
  • National currency
  • Calling code
  • Internet portals

The demographics provided are among the most important parts of the package. 117 variables are available, and statistics are calculated in multiple ways, including in the national currency, US dollars, current prices, constant 2005 prices, rates, and/or shares. The following subject areas are covered:

  • Population
  • GDP (and its breakdown)
  • Value added by economic activity
  • Implicit price deflator
  • GNI
  • Exchange rate

It is drawn from the most recent United Nations (UN), International Organization for Standardization (ISO), International Olympic Committee (IOC), International Telecommunication Union (ITU), and top-level domain (TLD) data, and other proprietary information.

Both editions include the same core country information and GeoCoding data. The difference is the Pro version comes equipped with 43 years of demographics (1970–2012) while the Standard edition has the most recent ten years (2003–2012).

pdZIP

There are more than 41,000 United States Postal Service (USPS) 5-digit ZIP Codes, and more than 46 million USPS ZIP+4 records, in the 50 U.S. states, the District of Columbia, military posts, and island areas. pdZIP provides core USPS information about them, along with time zones, area codes, GeoCoding data, a host of useful demographic variables, and some new twists on the concept of ZIP Code databases. It is available in Pro and Standard editions.

These easy-to-use, comprehensive, and up-to-date packages are designed for those who want to create custom databases or applications, stylize the address information on their mailings, or go beyond what is available from USPS address cleaning services. The software also includes an alternate places reference file.

The following geographies are covered:

  • The 50 U.S. states
  • District of Columbia (federal district)
  • Overseas military areas:
    • U.S. Armed Forces Americas (except Canada)
    • U.S. Armed Forces Europe (which serves Europe, Canada, Africa, and the Middle East)
    • U.S. Armed Forces Pacific (which serves Asia and the Pacific)
  • Insular areas:
    • American Samoa
    • Commonwealth of the Northern Mariana Islands
    • Commonwealth of Puerto Rico
    • Guam
    • Midway Islands (also known as Midway Atoll; now inhabited only by caretakers)
    • U.S. Virgin Islands
    • Wake Island (also known as Wake Atoll; now inhabited only by civilian contractors)
  • Associated island areas:
    • Republic of the Marshall Islands
    • Federated States of Micronesia
    • Republic of Palau

The following core 5-digit area information is provided:

  • 5-digit ZIP Code
  • State
  • City
  • ZIP Classification
  • City-Delivery Carrier Routes
  • Bulk Mail Sort/Merge
  • Finance Number

The following core ZIP+4 area information is provided:

  • Plus4 Add-on
  • ZIP+4 address range
  • Carrier Route
  • Delivery Type
  • Street alias
  • Alternate record
  • LACS
  • Moves
  • Company
  • Puerto Rican urbanization

A postal carrier route is the group of addresses to which the USPS assigns the same code to aid in mail delivery or collection. Carrier route codes have four characters, one letter for the carrier route type followed by a three-digit carrier route number. Carrier route types are:

  • B### = PO box delivery
  • C### = City delivery
  • G### = General Delivery
  • H### = Highway contract
  • L### = Landmark area
  • R### = Rural route
  • V### = Void area (non-delivery)
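As a minimal sketch of the format described above, the following Python splits a carrier route code into its type and route number. The function and variable names are hypothetical and are not part of pdZIP.

```python
import re

# Route type letters as listed above.
CARRIER_ROUTE_TYPES = {
    "B": "PO box delivery",
    "C": "City delivery",
    "G": "General Delivery",
    "H": "Highway contract",
    "L": "Landmark area",
    "R": "Rural route",
    "V": "Void area (non-delivery)",
}

# One type letter followed by exactly three digits.
_CODE_PATTERN = re.compile(r"([BCGHLRV])([0-9]{3})")


def parse_carrier_route(code):
    """Return (type description, route number) for a four-character code."""
    match = _CODE_PATTERN.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"not a valid carrier route code: {code!r}")
    letter, number = match.groups()
    return CARRIER_ROUTE_TYPES[letter], number


print(parse_carrier_route("C012"))
print(parse_carrier_route("R003"))
```

The route number is kept as a string so that leading zeros are preserved.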

The demographics provided are among the most important parts of the package. 53 variables are available and they encompass all 50 states, the District of Columbia (federal district), and the Commonwealth of Puerto Rico (insular area). The following subject areas are covered:

  • Population:
    • Total count
    • Gender
    • Median age
    • Race
    • Hispanic or Latino
    • Population in group quarters
  • Households:
    • Total count
    • Average household size
    • Average family size
    • Median family income
    • Median non-family income
  • Housing Units:
    • Total count
    • Median number of rooms
    • Median year built
    • Occupied housing units
    • Race
    • Hispanic or Latino
    • Median home value
    • Median rent
    • Vacant housing units

It is drawn from United States Postal Service (USPS) and United States Census Bureau data, and other proprietary information.

This bonus file is included with all editions of pdZIP. It lists preferred place names and acceptable and unacceptable alternate place names for United States Postal Service (USPS) 5-digit ZIP Codes.

Preferred cities are selected for use in mailings based on local mailing customs and USPS standards. For example, the city name “Hollywood” is desired by certain businesses in some Los Angeles USPS 5-digit ZIP Codes. As another example, if a five-digit locale has a large number of towns and villages, one may be chosen as the preferred city name.

When the five-digit codes were first implemented in 1963, each five-digit delivery area had only one preferred city for use in mailing addresses. Now addresses in the same five-digit zone can have different preferred cities, and ZIP+4 processing is required to precisely determine the correct preferred city for each individual address.

In the 5-digit ZIP Code files, a general preferred city is given for each five-digit delivery area, and the alternate places reference database is provided to assist in selecting the best preferred city for individual addresses. In the Pro edition ZIP+4 files, the correct preferred city is identified for each address range.
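One way such a reference file might be used is sketched below in Python. The record layout, the status labels, the placeholder ZIP Code, and the sample rows are all hypothetical and are not taken from the product files; the fallback rule is likewise an assumption.

```python
# Hypothetical sample rows: (5-digit ZIP Code, place name, status).
ALTERNATE_PLACES = [
    ("99999", "Los Angeles", "preferred"),
    ("99999", "Hollywood", "acceptable"),
    ("99999", "Tinseltown", "unacceptable"),
]


def resolve_city(zip5, written_city):
    """Keep the written city if it is preferred or acceptable for the ZIP Code;
    otherwise fall back to the general preferred city (None if unknown)."""
    rows = [row for row in ALTERNATE_PLACES if row[0] == zip5]
    wanted = written_city.strip().lower()
    for _, name, status in rows:
        if name.lower() == wanted and status in ("preferred", "acceptable"):
            return name
    for _, name, status in rows:
        if status == "preferred":
            return name
    return None


print(resolve_city("99999", "hollywood"))
print(resolve_city("99999", "Tinseltown"))
```

Keeping an acceptable name as written respects local mailing customs, while an unacceptable name is replaced with the general preferred city.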

Both editions include a 41,000-record 5-digit ZIP Code database along with an alternate places reference file. The Pro version adds 46 million ZIP+4 records. The Standard edition has everything except the ZIP+4 information.