From January until July, I am serving a Knight International Journalism Fellowship in Ukraine. I am working with the Journalists' Initiative Association, based in Kharkiv in eastern Ukraine. We are helping promote a strong, independent media system, which we believe is crucial to democracy.

Sunday, April 29, 2007

The power of the tilde ( ~ )



Yulia and I did a workshop yesterday for journalists at a Kharkiv television news operation, ATN. The station has a new newsroom, and I sensed a lot of energy and commitment. I covered advanced Internet search techniques (aka "stupid Google pet tricks") and introductory Excel skills.

Most of the Google strategies were skills I had covered before -- they're listed on a handout I've posted online. Things like: Put phrases you're searching for in quotes; if you're looking for an exact match on a word or phrase, put a plus sign (+) directly in front of it; if you want to search a specific domain, use "site:", followed by a snippet of the Web address (like "gov.ua" for Ukrainian government sites).
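For the programmers in the audience, here is a rough Python sketch of how those three tricks combine into one search string. The function name build_query and its parameters are my own invention for illustration, not anything Google provides, and the sample phrase is made up; the search URL at the end assumes the usual "google.com/search?q=" address format.

```python
from urllib.parse import quote_plus


def build_query(phrases=(), exact_terms=(), site=None):
    """Combine the handout's tricks into one search string (hypothetical helper)."""
    # Put each phrase in quotes so it is searched as a phrase.
    parts = [f'"{p}"' for p in phrases]
    # Put a plus sign directly in front of each exact-match word.
    parts += [f"+{t}" for t in exact_terms]
    # Restrict the search to one domain with site:
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


query = build_query(phrases=["press freedom"], exact_terms=["Kharkiv"], site="gov.ua")
print(query)  # "press freedom" +Kharkiv site:gov.ua

# Assumption: the standard search address; quote_plus escapes the quotes, plus sign and colon.
print("https://www.google.com/search?q=" + quote_plus(query))
```

You could just as easily type the string into the search box yourself; the point is only that the operators stack.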

But in preparing for the workshop, I discovered an undocumented Google tip -- one of those oddities that make software users wonder: Is it a feature or a flaw? It's hard to describe, but here goes:

In Google's English-language interface, if you precede a search term with a tilde (the technical term, of course, is "a squiggly"!), Google will look for synonyms of that term. Example: If I search for ...




... I get hits for newspaper, media and related terms. (Likewise, if you search for "~food", you get hits for terms like cooking, recipes and nutrition.) It's a cool way to do a "fuzzy" search -- to broaden your search but still keep it focused.

This works in English; I guess Google has created (or taps into) a database of synonymistic (that's probably not a word) terms.

Yulia and I tried this in Russian, and it doesn't work the same way; it doesn't search for synonyms of the word you're looking for. (My guess is, Google doesn't have the synonymistic database for other languages.)

However, Google does something else with the tilde-tagged words in Russian searches: It gives you spelling variations based on that word -- and this can be crucial for languages like Russian (in which the same word has different spellings depending on its case -- the way it's used in a sentence: as a subject, as an object, as the object of a preposition, etc.).

So check this out:

If I search for one variation of the spelling of Kharkiv (Харькова; the variation you might use if you said, "This is a Kharkiv map"), and I put a plus sign in front of it (meaning I want an exact match), here's what I get:



But if I change the + to a ~ (meaning I want a fuzzy search), I get:


That's about 1 million more hits. As I browse them, I see that Google has returned pages with such alternative spellings as Харькове, харьковского, Харьковский.
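If you want to tally spelling variations like these in a block of text of your own, here is a crude Python sketch. To be clear, this is not what Google does (I don't know what Google does); it simply counts words that begin with the same letters. The function name count_forms and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def count_forms(text, stem):
    """Count words in text that start with stem, ignoring case (hypothetical helper)."""
    # \w+ matches Cyrillic letters too, so this splits Russian text into words.
    words = re.findall(r"\w+", text.lower())
    return Counter(w for w in words if w.startswith(stem.lower()))


# Made-up sample sentence, not taken from any search result.
sample = "Карта Харькова. В Харькове работает харьковский телеканал."
forms = count_forms(sample, "харьков")
for word, count in forms.items():
    print(word, count)
```

Matching on the first few letters is a blunt instrument: it would miss any form whose beginning changes, and it would catch unrelated words that happen to start the same way.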

Another example: Suppose you're doing research about whether computer games have led people to commit suicide.

If you search for exact matches on the two key words (+самоубийство +компьютеры), you get 170,000 hits in Google. If you "fuzzy up" the search (another highly technical term) by changing your syntax to "~самоубийство ~компьютеры", you get 1.1 million hits -- because Google will return Web pages that have the word computer as singular or plural, as a noun or an adjective, as a subject or an object ... all the myriad possibilities.
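"Fuzzying up" a search is mechanical enough to sketch in a few lines of Python: swap each leading plus sign for a tilde and leave everything else alone. The function name fuzzy_up is my own, and this only rewrites the text of the query; what Google then does with the tilde is the open question.

```python
def fuzzy_up(query):
    """Replace a leading + on each search term with ~ (hypothetical helper)."""
    terms = query.split()
    # Only terms that begin with + are changed; site: filters and plain words pass through.
    return " ".join("~" + t[1:] if t.startswith("+") else t for t in terms)


exact = "+самоубийство +компьютеры"
print(fuzzy_up(exact))  # ~самоубийство ~компьютеры
```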

Any ideas about what Google is doing? This trick doesn't always seem to work. It seems to work best with long words. Any Google experts out there?
