Викиречник:Frequency lists/TV/2006/explanation

TV / movie frequency lists

This is a word frequency count over a collection of TV and movie scripts / transcripts, taken primarily from the Internet.

  • The total number of words counted is: 29,213,800.
  • Most stage directions and other cruft were stripped out of the scripts. What's left is (mostly) the actual words you'd hear coming out of your speakers.
  • "Words" were divided on any character not in [A-Z], [a-z], or the ISO-Latin-1 range [À-ÿ]. This includes a hyphen (since so many of the transcribers couldn't tell the difference between a hyphen and a dash). So the compound "happy-juice" would have been counted as "happy" and "juice". I may eventually get around to generating a separate list of common hyphenated compounds.
  • Apostrophes were included only if they were completely surrounded by letters. So don't was counted as don't, but goin', 'cause, and 'cool' would have had their apostrophes removed.
  • All words were converted to lowercase before counting, hence entries for i, jessica, etc.
  • Especially when you get to the lower-frequency words, don't expect all entries to actually make sense. Some of the not terribly useful (for Wiktionary) things you'll see are:
    • "creative" spellings by transcribers
    • attempts to write non-linguistic behaviour, like "mrmph!"
    • partial words. When Giles says "bu-bu-bu-but", that's counted as 3 "bu"s and one "but". You probably don't want to rush out and add an English section to the article for bu. (But we're now accepting pool bets for when somebody actually does.)
      • I just checked, and someone has. No, it wasn't me. Did anyone bet on 02:31, 3 March 2006? - Rissa
    • gibberish created by occasional malfunctions of the transcriber's closed-caption capture card. A not atypical line of a soap-opera transcript that you would not want to base an entry for tdodo or ógc on:
      No, I have tdodo this, jack. I have to. ç|ógc1sss @ How long will we be inaris?
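
The splitting rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script that actually produced these lists (which isn't shown here). The names `WORD_RE`, `tokenize`, and `count_words` are made up for the example, and the regular expression takes the [À-ÿ] range literally, so it also admits × and ÷.

```python
import re
from collections import Counter

# Letters as described above: A-Z, a-z, and the ISO-Latin-1 range À-ÿ.
# (Assumption: the range is taken literally, U+00C0 through U+00FF.)
LETTER = "A-Za-z\u00C0-\u00FF"

# A word is a run of letters; an apostrophe is kept only when it has
# a letter on both sides. Everything else (hyphens, digits, spaces,
# leading or trailing apostrophes) acts as a separator.
WORD_RE = re.compile(rf"[{LETTER}]+(?:'[{LETTER}]+)*")


def tokenize(text):
    """Return the lowercased words of `text`, in order of appearance."""
    return [match.group(0).lower() for match in WORD_RE.finditer(text)]


def count_words(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    return Counter(tokenize(text))


if __name__ == "__main__":
    sample = "Don't stop goin', 'cause happy-juice is 'cool'. Bu-bu-bu-but I..."
    for word, freq in count_words(sample).most_common():
        print(freq, word)
```

With this sketch, "bu-bu-bu-but" comes out as three bu and one but, "happy-juice" splits into happy and juice, and goin' loses its apostrophe while don't keeps it.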

Sincerest thanks to all the fans in the world who meticulously wrote down all this data. May they soon discover Wiktionary! Keffy 01:04, 17 February 2006 (UTC)

    • I’m not sure if this is the right place for this, but anyway: The link on #9021 "jed" goes to the lower-case version, which doesn’t exist in English. Can I change that, to make it jump to "Jed", the abbreviation of "Jedidiah"? How were those links created, automatically? --Geke (talk) 12:03, 18 April 2015 (UTC)