Wiktionary:Frequency lists/TV/2006/explanation

TV/movie frequency lists

This is a word frequency count over a collection of TV and movie scripts/transcripts, taken primarily from the Internet.

  • The total number of words counted is: 29,213,800.
  • Most stage directions and other cruft were stripped out of the scripts. What's left is (mostly) the actual words you'd hear coming out of your speakers.
  • "Words" were divided on any character not in [A-Z], [a-z], or the ISO-Latin-1 range [À-ÿ]. This includes a hyphen (since so many of the transcribers couldn't tell the difference between a hyphen and a dash). So the compound "happy-juice" would have been counted as "happy" and "juice". I may eventually get around to generating a separate list of common hyphenated compounds.
  • Apostrophes were included only if completely surrounded by letters. So don't was counted as don't, but goin', 'cause, and 'cool' would have had their apostrophes stripped.
  • All words were converted to lowercase before counting, hence entries for i, jessica, etc.
  • Especially when you get to the lower-frequency words, don't expect all entries to actually make sense. Some of the not terribly useful (for Wiktionary) things you'll see are:
    • "creative" spellings by transcribers
    • attempts to write non-linguistic behaviour, like "mrmph!"
    • partial words. When Giles says "bu-bu-bu-but", that's counted as 3 "bu"s and one "but". You probably don't want to rush out and add an English section to the article for bu. (But we're now accepting pool bets for when somebody actually does.)
      • I just checked, and someone has. No, it wasn't me. Did anyone bet on 02:31, 3 March 2006? - Rissa
    • gibberish created by occasional malfunctions of the transcriber's closed-caption capture card. A not atypical line of a soap-opera transcript that you would not want to base an entry for tdodo or ógc on:
      No, I have tdodo this, jack. I have to. ç|ógc1sss @ How long will we be inaris?
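
The counting rules described above can be sketched roughly as follows. This is only an illustration, not the script that was actually used to build the lists: the names `LETTERS`, `TOKEN_RE` and `count_words` are made up for the example, and it assumes that only the plain ASCII apostrophe (') counts as an apostrophe.

```python
import re
from collections import Counter

# Characters treated as letters: [A-Z], [a-z] and the ISO-Latin-1 range [À-ÿ].
# (Taken literally, that range also contains × and ÷.)
LETTERS = "A-Za-zÀ-ÿ"

# A word is a run of letters; an apostrophe is kept only when it has
# letters on both sides. Everything else (hyphens, digits, spaces,
# punctuation) acts as a separator.
TOKEN_RE = re.compile(rf"[{LETTERS}]+(?:'[{LETTERS}]+)*")


def count_words(text):
    """Return a Counter of lowercased words found in text."""
    return Counter(m.group(0).lower() for m in TOKEN_RE.finditer(text))


counts = count_words("Bu-bu-bu-but I don't know, 'cause happy-juice is 'cool'.")
# Here counts["bu"] is 3 and counts["but"] is 1; "don't" keeps its
# apostrophe, while 'cause and 'cool' are counted as "cause" and "cool".
```

Under the same rules, a garbled caption fragment such as "ç|ógc1sss" splits into "ç", "ógc" and "sss", which is how entries like ógc end up in the list.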

Sincerest thanks to all the fans in the world who meticulously wrote down all this data. May they soon discover Wiktionary! Keffy 01:04, 17 February 2006 (UTC)

    • I’m not sure if this is the right place for this, but anyway: The link on #9021 "jed" goes to the lower-case version, which doesn’t exist in English. Can I change that, to make it jump to "Jed", the abbreviation of "Jedidiah"? How were those links created, automatically? --Geke (talk) 12:03, 18 April 2015 (UTC)