Posted to alt.home.repair
micky
People's Court today, Friday, 12/18

On Mon, 21 Dec 2015 23:32:56 -0700, rbowman
wrote:

On 12/21/2015 08:40 PM, Micky wrote:
The summer after my first year in college, I worked at the US Army
Finance Center** in Indianapolis. This is not what Wikipedia or Google
says, but what they told me there was that Soundex was invented by a
guy who worked there. Since he invented it partly, largely, or entirely
on government time, or in accordance with his employment contract, he
offered it to the army, but they didn't want it. I'm not sure if they
gave up their rights for no money or what. But then he pursued it
and ended up selling it back to the army for mucho money, enough that
he was rich.


It's been around a long time, but there have been a lot of new, improved
versions. I've used it for address validation, and it works fairly well
for police dispatchers who can't spell. It can get weird though. BEACH,
BEECH, BIRCH are okay, but sometimes the encoding gives you unexpected
candidates. Explainable, but still unexpected.


[More detail than necessary because I'm emailing this to someone]

Having more names than expected for the same code is not a real
problem. The goal is to cure the "opposite" problem: to be able to
find a name despite multiple spellings and misspellings, and multiple
correct, slurred, or mistaken pronunciations of it.

Wikipedia doesn't make this clear when it says "so that they can be
matched despite minor differences in spelling." It's far broader than
that, because vowels after the first letter aren't coded at all, and,
as you know, consonants are grouped together so that any that can be
mistaken aurally for another get the same numeric code. So what most
people would consider major differences in spelling get the same code.
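
To make the grouping concrete, here is a rough Python sketch of the usual American Soundex rules as I understand them from the Wikipedia article: keep the first letter, turn later consonants into group digits, skip vowels, collapse neighboring letters from the same group, and pad to four characters. The names `CODES` and `soundex` are just my own labels, and this is not necessarily how the Army's version or any of the newer variants work.

```python
# Consonant groups from the standard American Soundex table.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a 4-character code: first letter plus up to three digits."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")  # the first letter's group counts for collapsing
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters of the same group
        code = CODES.get(ch, "")  # vowels and Y have no code
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so the same group can be coded again
    return ("".join(out) + "000")[:4]


for pair in [("Robert", "Rupert"), ("Smith", "Smyth")]:
    print(pair, [soundex(n) for n in pair])
```

Robert and Rupert both come out as R163, and Smith and Smyth both as S530, which is the "major differences in spelling get the same code" effect.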

The goal is to prevent misfiling and to be able to find whatever has
been filed, even if one doesn't know how to spell or clearly pronounce
the name.

For example, in Latin American Spanish, b and v sound so close to each
other that they have to name them b-burro and v-vaca. (In print they
look different, but in speech the b and v can be indistinguishable,
even to a Latino, unless maybe he knows how the word or name is
normally spelled and it's spelled normally.) But a Latino or an Anglo
using Soundex doesn't have to know whether it's a b or a v, because
they both get the same code. They don't have to know when filing or
when retrieving. Not that Spanish was the driving force. Even in
English the sounds are similar. You and I might not notice, but
people who get names to spell all day long do.
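
A small illustration of the b/v point, assuming the standard consonant groups. The helper names `letter_class` and `rough_key` are made up for this example, and `rough_key` is deliberately simplified (no collapsing of repeats, no padding), so it is not full Soundex. One caveat: the standard scheme keeps the first letter as-is, so a b/v mix-up in the first letter still produces different keys.

```python
# Standard Soundex consonant groups; b and v both sit in group 1.
CLASSES = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}


def letter_class(letter):
    """Group digit for one consonant, or "" for vowels, H, W, Y and non-letters."""
    letter = letter.upper()
    for members, digit in CLASSES.items():
        if len(letter) == 1 and letter in members:
            return digit
    return ""


def rough_key(name):
    """Simplified key: first letter kept, later consonants replaced by group digits."""
    name = name.strip()
    return name[:1].upper() + "".join(letter_class(c) for c in name[1:])


print(letter_class("b"), letter_class("v"))
print(rough_key("Cordoba"), rough_key("Cordova"))
print(rough_key("Baca"), rough_key("Vaca"))
```

Cordoba and Cordova get the same key because the b and v fall in the same group, while Baca and Vaca differ only because the first letter is kept literally.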

https://en.wikipedia.org/wiki/Soundex

Although putting s and z with c, g, j, k, q, and x might be a
counterexample, because it seems to me the first two couldn't be
confused with the other six.