
Модул:data consistency check/док

This is the documentation subpage for Модул:data consistency check.

This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Категорија:Језички модули података as well as Модул:scripts/data.

Output


Discrepancies detected:

  • Literary Chinese, the canonical name for the code lzh-lit, is wrong; it should be Literary Chinese.
  • The code nds-lpr and the canonical name Low Prussian should be removed; they are not found in Модул:etymology languages/data.
  • Literary Chinese, the canonical name for the code lzh-lit, is wrong; it should be Literary Chinese.
  • Literary Chinese језик (lzh-lit) has a canonical name that is not unique; it is also used by the code lzh.
  • The data key preprocess_links for ??? (th-new) is invalid.
  • The canonical name North Germanic (gmq) is missing.
  • Северно германски, the canonical name for the code gmq, is wrong; it should be North Germanic.
  • The code ira-mid and the canonical name Middle Iranian should be removed; they are not found in Module:families/data.
  • The code ira-old and the canonical name Old Iranian should be removed; they are not found in Module:families/data.
  • The canonical name Northern Ryukyuan (jpx-nry) is missing.
  • The canonical name Southern Ryukyuan (jpx-sry) is missing.
  • Indo-Aryan, the canonical name for the code inc, is wrong; it should be Индо-Аријан.
  • Indo-European, the canonical name for the code ine, is wrong; it should be Индо-Европски.
  • Balto-Slavic, the canonical name for the code ine-bsl, is wrong; it should be Балтословенски.
  • The code ira-mid and the canonical name Middle Iranian should be removed; they are not found in Module:families/data.
  • The code ira-old and the canonical name Old Iranian should be removed; they are not found in Module:families/data.
  • Slavic, the canonical name for the code sla, is wrong; it should be Словенски.
  • East Slavic, the canonical name for the code zle, is wrong; it should be Источнословенски.
  • Southern Amami Ōshima, the canonical name for the code ams, is wrong; it should be Southern Amami-Oshima.
  • The canonical name Southern Amami-Oshima (ams) is missing.
  • The canonical name Амерички знаковни језик (ase) is missing.
  • American Sign Language, the canonical name for the code ase, is wrong; it should be Амерички знаковни језик.
  • The canonical name Dhundhari (dhd) is missing.
  • Proto-West Germanic, the canonical name for the code gmw-pro, is wrong; it should be Пра-Западно Германски.
  • The canonical name Пра-Западно Германски (gmw-pro) is missing.
  • The canonical name Proto-Indo-European (ine-pro) is missing.
  • Пра-Индо-Европски, the canonical name for the code ine-pro, is wrong; it should be Proto-Indo-European.
  • Aiwoo, the canonical name for the code nfl, is wrong; it should be Äiwoo.
  • The canonical name Äiwoo (nfl) is missing.
  • Moabite, the canonical name for the code obm, is wrong; it should be Моавски.
  • The canonical name Моавски (obm) is missing.
  • Пра-Семитски, the canonical name for the code sem-pro, is wrong; it should be Proto-Semitic.
  • The canonical name Proto-Semitic (sem-pro) is missing.
  • The canonical name Кантонски (yue) is missing.
  • Cantonese, the canonical name for the code yue, is wrong; it should be Кантонски.
  • Afar, the canonical name for the code aa, is wrong; it should be Афарски.
  • Afrikaans, the canonical name for the code af, is wrong; it should be Африкански.
  • Amharic, the canonical name for the code am, is wrong; it should be Амхарски.
  • Southern Amami Ōshima, the canonical name for the code ams, is wrong; it should be Southern Amami-Oshima.
  • Old English, the canonical name for the code ang, is wrong; it should be Стари Енглески.
  • Arabic, the canonical name for the code ar, is wrong; it should be Арапски.
  • Aramaic, the canonical name for the code arc, is wrong; it should be Арамејски.
  • American Sign Language, the canonical name for the code ase, is wrong; it should be Амерички знаковни језик.
  • Azerbaijani, the canonical name for the code az, is wrong; it should be Азербејџански.
  • Belarusian, the canonical name for the code be, is wrong; it should be Белоруски.
  • Bulgarian, the canonical name for the code bg, is wrong; it should be Бугарски.
  • Braj, the canonical name for the code bra, is wrong; it should be Брај.
  • Catalan, the canonical name for the code ca, is wrong; it should be Каталонски.
  • Mandarin, the canonical name for the code cmn, is wrong; it should be Мандарин.
  • Corsican, the canonical name for the code co, is wrong; it should be Корзички.
  • Czech, the canonical name for the code cs, is wrong; it should be Чешки.
  • Welsh, the canonical name for the code cy, is wrong; it should be Велшки.
  • Danish, the canonical name for the code da, is wrong; it should be Дански.
  • German, the canonical name for the code de, is wrong; it should be Немачки.
  • Dungan, the canonical name for the code dng, is wrong; it should be Дунган.
  • Greek, the canonical name for the code el, is wrong; it should be Грчки.
  • English, the canonical name for the code en, is wrong; it should be Енглески.
  • Middle English, the canonical name for the code enm, is wrong; it should be Средњи Енглески.
  • Esperanto, the canonical name for the code eo, is wrong; it should be Есперанто.
  • Spanish, the canonical name for the code es, is wrong; it should be Шпански.
  • Basque, the canonical name for the code eu, is wrong; it should be Баскијски.
  • Finnish, the canonical name for the code fi, is wrong; it should be Фински.
  • French, the canonical name for the code fr, is wrong; it should be Француски.
  • Old French, the canonical name for the code fro, is wrong; it should be Стари Француски.
  • Irish, the canonical name for the code ga, is wrong; it should be Ирски.
  • Proto-West Germanic, the canonical name for the code gmw-pro, is wrong; it should be Пра-Западно Германски.
  • Gothic, the canonical name for the code got, is wrong; it should be Готски.
  • Ancient Greek, the canonical name for the code grc, is wrong; it should be Антички Грчки.
  • Gujarati, the canonical name for the code gu, is wrong; it should be Гуџарати.
  • Hawaiian, the canonical name for the code haw, is wrong; it should be Хавајски.
  • Hebrew, the canonical name for the code he, is wrong; it should be Хебрејски.
  • Hindi, the canonical name for the code hi, is wrong; it should be Хинди.
  • Hungarian, the canonical name for the code hu, is wrong; it should be Мађарски.
  • Armenian, the canonical name for the code hy, is wrong; it should be Јерменски.
  • Ido, the canonical name for the code io, is wrong; it should be Идо.
  • Italian, the canonical name for the code it, is wrong; it should be Италијански.
  • Japanese, the canonical name for the code ja, is wrong; it should be Јапански.
  • Korean, the canonical name for the code ko, is wrong; it should be Корејски.
  • Latin, the canonical name for the code la, is wrong; it should be Латински.
  • Ladino, the canonical name for the code lad, is wrong; it should be Ладино.
  • Macedonian, the canonical name for the code mk, is wrong; it should be Македонски.
  • Malayalam, the canonical name for the code ml, is wrong; it should be Малајалам.
  • Mongolian, the canonical name for the code mn, is wrong; it should be Монголски.
  • Marathi, the canonical name for the code mr, is wrong; it should be Марати.
  • Malay, the canonical name for the code ms, is wrong; it should be Малајски.
  • Maltese, the canonical name for the code mt, is wrong; it should be Малтешки.
  • Translingual, the canonical name for the code mul, is wrong; it should be Међународни.
  • Nepali, the canonical name for the code ne, is wrong; it should be Непали.
  • Dutch, the canonical name for the code nl, is wrong; it should be Холандски.
  • Norwegian, the canonical name for the code no, is wrong; it should be Норвешки.
  • Moabite, the canonical name for the code obm, is wrong; it should be Моавски.
  • Okinoerabu, the canonical name for the code okn, is wrong; it should be Oki-No-Erabu.
  • Old Marathi, the canonical name for the code omr, is wrong; it should be Стари Марати.
  • Old Tamil, the canonical name for the code oty, is wrong; it should be Стари Тамилски.
  • Pali, the canonical name for the code pi, is wrong; it should be Пали.
  • Polish, the canonical name for the code pl, is wrong; it should be Пољски.
  • Portuguese, the canonical name for the code pt, is wrong; it should be Португалски.
  • Romanian, the canonical name for the code ro, is wrong; it should be Румунски.
  • Russian, the canonical name for the code ru, is wrong; it should be Руски.
  • Sanskrit, the canonical name for the code sa, is wrong; it should be Санскрт.
  • Scots, the canonical name for the code sco, is wrong; it should be Шкотски.
  • Serbo-Croatian, the canonical name for the code sh, is wrong; it should be Српскохрватски.
  • Slovak, the canonical name for the code sk, is wrong; it should be Словачки.
  • Slovene, the canonical name for the code sl, is wrong; it should be Словенски.
  • Proto-Slavic, the canonical name for the code sla-pro, is wrong; it should be Пра-Словенски.
  • Albanian, the canonical name for the code sq, is wrong; it should be Албански.
  • Swedish, the canonical name for the code sv, is wrong; it should be Шведски.
  • Thai, the canonical name for the code th, is wrong; it should be Тајски.
  • Tokunoshima, the canonical name for the code tkn, is wrong; it should be Toku-No-Shima.
  • Tagalog, the canonical name for the code tl, is wrong; it should be Тагалог.
  • Tok Pisin, the canonical name for the code tpi, is wrong; it should be Ток Писин.
  • Turkish, the canonical name for the code tr, is wrong; it should be Турски.
  • Ukrainian, the canonical name for the code uk, is wrong; it should be Украјински.
  • Vietnamese, the canonical name for the code vi, is wrong; it should be Вијетнамски.
  • Yiddish, the canonical name for the code yi, is wrong; it should be Јидиш.
  • Cantonese, the canonical name for the code yue, is wrong; it should be Кантонски.
  • Southern Amami-Oshima, the canonical name for ams, is repeated in the table of aliases.
  • Panyi Bai, the canonical name for bfc, is repeated in the table of otherNames.
  • Daakaka, the canonical name for bpa, is repeated in the table of otherNames.
  • Äiwoo, the canonical name for nfl, is repeated in the table of otherNames.
  • Toku-No-Shima, the canonical name for tkn, is repeated in the table of aliases.
  • Ura (Papua New Guinea), the canonical name for uro, is repeated in the table of otherNames.
  • Wiradjuri, the canonical name for wrh, is repeated in the table of otherNames.
  • Арапски, the canonical name for the code Arab, is wrong; it should be Arabic.
  • Armenian (Armn) is missing
  • Јерменски, the canonical name for the code Armn, is wrong; it should be Armenian.
  • Old Cyrillic (Cyrs) is missing
  • Стара Ћирилица, the canonical name for the code Cyrs, is wrong; it should be Old Cyrillic.
  • Готски, the canonical name for the code Goth, is wrong; it should be Gothic.
  • Gothic (Goth) is missing
  • Грчки, the canonical name for the code Grek, is wrong; it should be Greek.
  • Гуџарати, the canonical name for the code Gujr, is wrong; it should be Gujarati.
  • Gujarati (Gujr) is missing
  • Хангул, the canonical name for the code Hang, is wrong; it should be Hangul.
  • Hangul (Hang) is missing
  • Han (Hani) is missing
  • Хан, the canonical name for the code Hani, is wrong; it should be Han.
  • Hebrew (Hebr) is missing
  • Хебрејски, the canonical name for the code Hebr, is wrong; it should be Hebrew.
  • Јапански, the canonical name for the code Jpan, is wrong; it should be Japanese.
  • Japanese (Jpan) is missing
  • Каннада, the canonical name for the code Knda, is wrong; it should be Kannada.
  • Kannada (Knda) is missing
  • Korean (Kore) is missing
  • Корејски, the canonical name for the code Kore, is wrong; it should be Korean.
  • Латиница (Latn) is missing
  • Латински, the canonical name for the code Latn, is wrong; it should be Латиница.
  • Малајалам, the canonical name for the code Mlym, is wrong; it should be Malayalam.
  • Malayalam (Mlym) is missing
  • Феничански (Phnx) is missing
  • Phoenician, the canonical name for the code Phnx, is wrong; it should be Феничански.
  • Tamil (Taml) is missing
  • Тамилски, the canonical name for the code Taml, is wrong; it should be Tamil.
  • Телугу, the canonical name for the code Telu, is wrong; it should be Telugu.
  • Telugu (Telu) is missing
  • Тибетски, the canonical name for the code Tibt, is wrong; it should be Tibetan.
  • Tibetan (Tibt) is missing
  • Adlam, the canonical name for the code Adlm, is wrong; it should be Адлам.
  • Cyrillic, the canonical name for the code Cyrl, is wrong; it should be Ћирилица.
  • Devanagari, the canonical name for the code Deva, is wrong; it should be Деванагари.
  • Hiragana, the canonical name for the code Hira, is wrong; it should be Хирагана.
  • Katakana, the canonical name for the code Kana, is wrong; it should be Катакана.
  • Phoenician, the canonical name for the code Phnx, is wrong; it should be Феничански.
  • Thai, the canonical name for the code Thai, is wrong; it should be Тајски.
  • Code: aa. Saw name: Afar. Expected name: Афарски.
  • Code: af. Saw name: Afrikaans. Expected name: Африкански.
  • Code: als. Saw name: Albanian. Expected name: Албански.
  • Code: ams. Saw name: Southern Amami Ōshima. Expected name: Southern Amami-Oshima.
  • Code: ang. Saw name: Old English. Expected name: Стари Енглески.
  • Code: ar. Saw name: Arabic. Expected name: Арапски.
  • Code: arc. Saw name: Aramaic. Expected name: Арамејски.
  • Code: az. Saw name: Azerbaijani. Expected name: Азербејџански.
  • Code: be. Saw name: Belarusian. Expected name: Белоруски.
  • Code: bg. Saw name: Bulgarian. Expected name: Бугарски.
  • Code: ca. Saw name: Catalan. Expected name: Каталонски.
  • Code: cmn. Saw name: Mandarin. Expected name: Мандарин.
  • Code: cmn-ear. Saw name: Mandarin. Expected name: Мандарин.
  • Code: co. Saw name: Corsican. Expected name: Корзички.
  • Code: cs. Saw name: Czech. Expected name: Чешки.
  • Code: cy. Saw name: Welsh. Expected name: Велшки.
  • Code: da. Saw name: Danish. Expected name: Дански.
  • Code: de. Saw name: German. Expected name: Немачки.
  • Code: dng. Saw name: Dungan. Expected name: Дунган.
  • Code: el. Saw name: Greek. Expected name: Грчки.
  • Code: en. Saw name: English. Expected name: Енглески.
  • Code: enm. Saw name: Middle English. Expected name: Средњи Енглески.
  • Code: eo. Saw name: Esperanto. Expected name: Есперанто.
  • Code: es. Saw name: Spanish. Expected name: Шпански.
  • Code: eu. Saw name: Basque. Expected name: Баскијски.
  • Code: fi. Saw name: Finnish. Expected name: Фински.
  • Code: fr. Saw name: French. Expected name: Француски.
  • Code: fr-CA. Saw name: French. Expected name: Француски.
  • Code: frk. Saw name: Proto-West Germanic. Expected name: Пра-Западно Германски.
  • Code: fro. Saw name: Old French. Expected name: Стари Француски.
  • Code: fro-nor. Saw name: Old French. Expected name: Стари Француски.
  • Code: ga. Saw name: Irish. Expected name: Ирски.
  • Code: gem. Saw name: Germanic. Expected name: Германски.
  • Code: gem-pro. Saw name: Proto-Germanic. Expected name: Пра-Германски.
  • Code: gkm. Saw name: Ancient Greek. Expected name: Антички Грчки.
  • Code: gmw-pro. Saw name: Proto-West Germanic. Expected name: Пра-Западно Германски.
  • Code: got. Saw name: Gothic. Expected name: Готски.
  • Code: grc. Saw name: Ancient Greek. Expected name: Антички Грчки.
  • Code: gu. Saw name: Gujarati. Expected name: Гуџарати.
  • Code: haw. Saw name: Hawaiian. Expected name: Хавајски.
  • Code: he. Saw name: Hebrew. Expected name: Хебрејски.
  • Code: hi. Saw name: Hindi. Expected name: Хинди.
  • Code: hu. Saw name: Hungarian. Expected name: Мађарски.
  • Code: hy. Saw name: Armenian. Expected name: Јерменски.
  • Code: io. Saw name: Ido. Expected name: Идо.
  • Code: it. Saw name: Italian. Expected name: Италијански.
  • Code: itc-ola. Saw name: Latin. Expected name: Латински.
  • Code: ja. Saw name: Japanese. Expected name: Јапански.
  • Code: ko. Saw name: Korean. Expected name: Корејски.
  • Code: la. Saw name: Latin. Expected name: Латински.
  • Code: lad. Saw name: Ladino. Expected name: Ладино.
  • Code: mk. Saw name: Macedonian. Expected name: Македонски.
  • Code: ml. Saw name: Malayalam. Expected name: Малајалам.
  • Code: mn. Saw name: Mongolian. Expected name: Монголски.
  • Code: mr. Saw name: Marathi. Expected name: Марати.
  • Code: ms. Saw name: Malay. Expected name: Малајски.
  • Code: ms-cla. Saw name: Malay. Expected name: Малајски.
  • Code: ms-old. Saw name: Malay. Expected name: Малајски.
  • Code: mt. Saw name: Maltese. Expected name: Малтешки.
  • Code: mul. Saw name: Translingual. Expected name: Међународни.
  • Code: ne. Saw name: Nepali. Expected name: Непали.
  • Code: nl. Saw name: Dutch. Expected name: Холандски.
  • Code: no. Saw name: Norwegian. Expected name: Норвешки.
  • Code: okn. Saw name: Okinoerabu. Expected name: Oki-No-Erabu.
  • Code: pi. Saw name: Pali. Expected name: Пали.
  • Code: pl. Saw name: Polish. Expected name: Пољски.
  • Code: pt. Saw name: Portuguese. Expected name: Португалски.
  • Code: ro. Saw name: Romanian. Expected name: Румунски.
  • Code: ru. Saw name: Russian. Expected name: Руски.
  • Code: sa. Saw name: Sanskrit. Expected name: Санскрт.
  • Code: sa-ved. Saw name: Sanskrit. Expected name: Санскрт.
  • Code: sco. Saw name: Scots. Expected name: Шкотски.
  • Code: sh. Saw name: Serbo-Croatian. Expected name: Српскохрватски.
  • Code: sk. Saw name: Slovak. Expected name: Словачки.
  • Code: sl. Saw name: Slovene. Expected name: Словенски.
  • Code: sla. Saw name: Slavic. Expected name: Словенски.
  • Code: sla-pro. Saw name: Proto-Slavic. Expected name: Пра-Словенски.
  • Code: sq. Saw name: Albanian. Expected name: Албански.
  • Code: sv. Saw name: Swedish. Expected name: Шведски.
  • Code: ta. Saw name: Tamil. Expected name: Тамил.
  • Code: th. Saw name: Thai. Expected name: Тајски.
  • Code: tl. Saw name: Tagalog. Expected name: Тагалог.
  • Code: tpi. Saw name: Tok Pisin. Expected name: Ток Писин.
  • Code: tr. Saw name: Turkish. Expected name: Турски.
  • Code: uk. Saw name: Ukrainian. Expected name: Украјински.
  • Code: vi. Saw name: Vietnamese. Expected name: Вијетнамски.
  • Code: xno. Saw name: Old French. Expected name: Стари Француски.
  • Code: yi. Saw name: Yiddish. Expected name: Јидиш.
  • Code: yue. Saw name: Cantonese. Expected name: Кантонски.
  • Code: zh. Saw name: Chinese. Expected name: Кинески.

Checks performed


For multiple data modules:

  • Codes for languages, families and etymology-only languages must be unique and cannot clash with one another.
  • Canonical names for languages, families, and etymology-only languages must not be found in the list of other names.
  • Each name in the list of other names must appear only once.
  • otherNames, if present, must be an array.
  • Each Wikidata item ID must be either a positive integer or a string starting with Q and ending with decimal digits.
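
The Wikidata ID and other-names rules above can be sketched as follows. The module itself is a Lua module; this Python version is only an illustration, and the function names `is_valid_wikidata_id` and `check_other_names` are hypothetical. The sketch assumes that "starting with Q and ending with decimal digits" means Q followed by digits and nothing else.

```python
import re


def is_valid_wikidata_id(value):
    """Accept a positive integer or a string such as "Q1860"."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        return False
    if isinstance(value, int):
        return value > 0
    if isinstance(value, str):
        # Assumption: "Q" followed by decimal digits and nothing else.
        return re.fullmatch(r"Q[0-9]+", value) is not None
    return False


def check_other_names(canonical_name, other_names):
    """Return a list of problems found in an otherNames value."""
    if not isinstance(other_names, list):
        return ["otherNames must be an array"]
    problems = []
    seen = set()
    for name in other_names:
        if name == canonical_name:
            problems.append(f"{name} is the canonical name and is repeated in otherNames")
        if name in seen:
            problems.append(f"{name} appears more than once in otherNames")
        seen.add(name)
    return problems
```

For example, `check_other_names("Äiwoo", ["Äiwoo", "Aiwo"])` reports one problem, mirroring the "is repeated in the table of otherNames" messages in the output above.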

The following must be true of the data used by Module:languages:

  • Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
  • The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
  • If field 2 is not nil, it must be a valid Wikidata item ID.
  • If field 3 or family is given and not nil, it must be a valid family code.
  • If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
  • If family is given, it must be a valid family code.
  • If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
  • If entry_name is given, it must be a table that contains either two arrays (from and to) or a string (remove_diacritics) or both.
  • If sort_key is given, it must be either a string or a table that contains either two arrays (from and to) or a string (remove_diacritics).
  • If entry_name or sort_key is given, the from array must be at least as long as the to array.
  • If standardChars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
  • If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
  • If link_tr is present, it must be true.
  • There must be no data keys besides these: 1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr".
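
A few of the per-language rules above can be sketched as follows. This is a Python illustration, not the module's Lua code: the function `check_language_entry`, the dict representation of a language's data, and the message wording are all assumptions made for the example. Only the key whitelist, the from/to length rule, and the override_translit and link_tr rules are covered.

```python
# Keys permitted in a language's data table, copied from the list above.
ALLOWED_KEYS = {
    1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases",
    "varieties", "type", "scripts", "ancestors", "wikimedia_codes",
    "wikipedia_article", "standardChars", "translit", "override_translit",
    "link_tr",
}


def check_language_entry(code, data):
    """Return a list of problems for one language's data (hypothetical helper)."""
    problems = []
    for key in data:
        if key not in ALLOWED_KEYS:
            problems.append(f"The data key {key} for {code} is invalid.")
    for field in ("entry_name", "sort_key"):
        value = data.get(field)
        if isinstance(value, dict):
            from_list = value.get("from", [])
            to_list = value.get("to", [])
            # The from array must be at least as long as the to array.
            if len(from_list) < len(to_list):
                problems.append(f"{field} for {code} has a from array shorter than its to array.")
    if data.get("override_translit") and not data.get("translit"):
        problems.append(f"{code} sets override_translit without translit.")
    if "link_tr" in data and data["link_tr"] is not True:
        problems.append(f"link_tr for {code} must be true.")
    return problems
```

With this sketch, an entry such as `{1: "Thai", "preprocess_links": True}` yields a single message about the unknown key, similar in spirit to the preprocess_links discrepancy listed in the output above.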

Checks not performed:

  • If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
  • If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
  • If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation ([^...]).

These are not checked here, because module errors will quickly crop up in entries if these conditions are not met, assuming that Module:utilities attempts to generate a sortkey for a category pertaining to the language in question, or full_link attempts to use the transliteration module.

Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.
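
The "all the codes and names, and no more" requirement amounts to comparing the lookup tables with the data in both directions. The Python sketch below illustrates this under stated assumptions: `languages` maps each code to its canonical name, `code_to_name` and `name_to_code` stand in for the two lookup modules, and the function name `check_lookup_tables` is hypothetical. The message wording loosely follows the output shown earlier and is not taken from the module's source.

```python
def check_lookup_tables(languages, code_to_name, name_to_code):
    """Compare two lookup tables against the language data (illustrative only)."""
    problems = []
    # Everything in the data must be present, with the right name, in both tables.
    for code, name in languages.items():
        listed = code_to_name.get(code)
        if listed is None:
            problems.append(f"The code {code} is missing.")
        elif listed != name:
            problems.append(
                f"{listed}, the canonical name for the code {code}, is wrong; "
                f"it should be {name}."
            )
        if name_to_code.get(name) != code:
            problems.append(f"The canonical name {name} ({code}) is missing.")
    # Nothing may be present in the tables that is not in the data.
    for code, name in code_to_name.items():
        if code not in languages:
            problems.append(
                f"The code {code} and the canonical name {name} should be removed."
            )
    for name, code in name_to_code.items():
        if languages.get(code) != name:
            problems.append(f"The canonical name {name} ({code}) should be removed.")
    return problems
```

A single mismatched name therefore tends to produce several messages (one "is wrong", one "is missing", and one "should be removed"), which is consistent with the paired entries visible in the output section.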

The following must be true of the data used by Module:etymology languages:

  • canonicalName must be given.
  • parent must be given and must be a valid language, family or etymology-only language code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
  • There must be no data keys besides these: "canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".

Codes in Module:families data must:

  • Have canonicalName, which must not be the same as the canonical name of another family.
  • If family is given, it must be a valid family code.
  • Have at least one language or subfamily belonging to it.
  • Have no data keys besides these: "canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
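
The family rules above (unique canonical names, valid parent family, at least one member) can be illustrated with a short Python sketch. It is not the module's Lua implementation: `check_families`, the `language_families` mapping from language code to family code, and the message wording are assumptions made for the example.

```python
def check_families(families, language_families):
    """Return problems found in family data (illustrative sketch).

    families: family code -> dict with "canonicalName" and optional "family".
    language_families: language code -> family code (hypothetical input shape).
    """
    problems = []
    # A family has members if a language or another family points to it.
    has_members = set(language_families.values())
    for data in families.values():
        parent = data.get("family")
        if parent is not None:
            has_members.add(parent)

    seen_names = {}
    for code, data in families.items():
        name = data.get("canonicalName")
        if name is None:
            problems.append(f"{code} has no canonicalName.")
        elif name in seen_names:
            problems.append(
                f"{code} shares the canonical name {name} with {seen_names[name]}."
            )
        else:
            seen_names[name] = code
        parent = data.get("family")
        if parent is not None and parent not in families:
            problems.append(f"{code} has an invalid family code {parent}.")
        if code not in has_members:
            problems.append(f"{code} has no languages or subfamilies.")
    return problems
```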

Codes in Module:scripts data must:

  • Have canonicalName.
  • Have at least one language that lists it as one of its scripts.
  • Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
  • Have no data keys besides these: "canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".