Module:data consistency check

Source: Wiktionary

This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Category:Language data modules as well as Module:scripts/data.

Checks performed

For multiple data modules:

  • Codes for languages, families and etymology-only languages must be unique and cannot clash with one another.
  • Canonical names for languages, families, and etymology-only languages must not be found in the list of other names.
  • Each name in the list of other names must appear only once.
  • otherNames, if present, must be an array.
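
The cross-module checks above can be pictured with a small Python sketch. The real module is written in Lua; the function name `check_names` and the dict layout used here are hypothetical and chosen only for illustration.

```python
def check_names(modules):
    """Return discrepancy messages for the cross-module checks.

    `modules` maps a module name to {code: data}. This layout is an
    assumption made for the sketch, not the real Lua data format.
    """
    problems = []
    seen_codes = {}  # code -> module where it was first defined
    for module_name, entries in modules.items():
        for code, data in entries.items():
            # Codes must be unique across all data modules.
            if code in seen_codes:
                problems.append(
                    f"Code {code} is not unique; "
                    f"it is also defined in {seen_codes[code]}."
                )
            else:
                seen_codes[code] = module_name

            other_names = data.get("otherNames")
            if other_names is None:
                continue
            # otherNames, if present, must be an array.
            if not isinstance(other_names, list):
                problems.append(f"otherNames of {code} is not an array.")
                continue
            # The canonical name must not reappear among the other names.
            if data.get("canonicalName") in other_names:
                problems.append(
                    f"Canonical name of {code} is repeated in otherNames."
                )
            # Each other name may appear only once.
            if len(set(other_names)) != len(other_names):
                problems.append(f"otherNames of {code} contains a duplicate.")
    return problems
```

Collecting messages in a list instead of raising on the first failure mirrors the way the page reports every discrepancy at once.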

Codes in Module:languages data must:

  • Be defined in the correct submodule according to whether the code is two-letter, three-letter or exceptional.
  • Have canonicalName, which must not be the same as the canonical name of another language.
  • If scripts is given, then it must be an array, and each string in the array must be a valid script code.
  • If family is given, it must be a valid family code.
  • If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
  • If entry_name is given, it must contain two arrays (from and to).
  • If sort_key is given, it must be either a string or a table containing two arrays (from and to).
  • If entry_name or sort_key is given as a table, the from array must be at least as long as the to array.
  • If standardChars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
  • Have no data keys besides these: "canonicalName", "entry_name", "sort_key", "otherNames", "type", "scripts", "family", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit_module", "override_translit", "link_tr", "wikidata_item".
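
As a rough illustration of the per-language checks listed above, here is a Python sketch. It is not the module's Lua code: the function names, the message wording, and the submodule naming scheme in `expected_submodule` are assumptions. The uniqueness of canonical names and the standardChars check are left out, since the latter depends on Lua pattern syntax.

```python
import re

ALLOWED_KEYS = {
    "canonicalName", "entry_name", "sort_key", "otherNames", "type",
    "scripts", "family", "ancestors", "wikimedia_codes",
    "wikipedia_article", "standardChars", "translit_module",
    "override_translit", "link_tr", "wikidata_item",
}
VALID_TYPES = {"regular", "reconstructed", "appendix-constructed"}


def expected_submodule(code):
    # Assumed naming scheme for the data submodules.
    if re.fullmatch(r"[a-z]{2}", code):
        return "languages/data2"
    if re.fullmatch(r"[a-z]{3}", code):
        return "languages/data3/" + code[0]
    return "languages/datax"


def check_replacements(code, key, value, problems):
    # A replacement table needs "from" and "to" arrays, and "from"
    # must be at least as long as "to".
    if not isinstance(value, dict):
        problems.append(f"{code}: {key} must be a table.")
        return
    src, dst = value.get("from"), value.get("to")
    if not isinstance(src, list) or not isinstance(dst, list):
        problems.append(f"{code}: {key} must contain from and to arrays.")
    elif len(src) < len(dst):
        problems.append(f"{code}: {key}.from is shorter than {key}.to.")


def check_language(code, data, script_codes, family_codes):
    problems = []
    if "canonicalName" not in data:
        problems.append(f"{code}: canonicalName is missing.")
    scripts = data.get("scripts")
    if scripts is not None:
        if not isinstance(scripts, list):
            problems.append(f"{code}: scripts must be an array.")
        else:
            problems += [f"{code}: invalid script code {s}."
                         for s in scripts if s not in script_codes]
    if "family" in data and data["family"] not in family_codes:
        problems.append(f"{code}: invalid family code {data['family']}.")
    if "type" in data and data["type"] not in VALID_TYPES:
        problems.append(f"{code}: invalid type {data['type']}.")
    if "entry_name" in data:
        check_replacements(code, "entry_name", data["entry_name"], problems)
    sort_key = data.get("sort_key")
    if sort_key is not None and not isinstance(sort_key, str):
        # A string sort_key names a module and is not inspected here.
        check_replacements(code, "sort_key", sort_key, problems)
    problems += [f"{code}: unknown key {k}."
                 for k in sorted(data.keys() - ALLOWED_KEYS)]
    return problems
```

A caller would compare `expected_submodule(code)` with the submodule in which the code was actually found, and pass the sets of known script and family codes to `check_language`.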

Checks not performed:

  • If translit_module is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
  • If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.

These conditions are not checked here because violations quickly surface as module errors in entries: Module:utilities fails when it tries to generate a sortkey for a category pertaining to the language in question, and full_link fails when it tries to use the transliteration module.

Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.
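
The "all the codes and no more" requirement amounts to a two-way comparison between a lookup table and the language data. The following Python sketch shows the idea for the code-to-name direction; `compare_lookup` is a hypothetical name, and the message strings merely imitate the wording that appears in the output on this page.

```python
def compare_lookup(code_to_name, languages):
    """Compare a code -> canonical name table with the language data.

    `languages` maps each code to its data table, which is assumed
    to contain a "canonicalName" entry.
    """
    problems = []
    # Every code in the data must be present with the right name.
    for code, data in languages.items():
        expected = data["canonicalName"]
        actual = code_to_name.get(code)
        if actual is None:
            problems.append(f"The code {code} ({expected}) is missing.")
        elif actual != expected:
            problems.append(
                f"{actual}, the canonical name for the code {code}, "
                f"is wrong; it should be {expected}."
            )
    # The lookup table must not contain anything extra.
    for code, name in code_to_name.items():
        if code not in languages:
            problems.append(
                f"The code {code} and the canonical name {name} should be "
                "removed; they are not found in a submodule of Module:languages."
            )
    return problems
```

The reverse table (canonical name to code) can be checked the same way with keys and values swapped.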

Codes in Module:etymology languages data must:

  • Have canonicalName.
  • Have parent, which must be a valid language, family or etymology-only language code.
  • Have no data keys besides these: "canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".
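
The parent check for etymology-only languages can be sketched in Python as follows. The function name `check_parents` and the data layout are assumptions for illustration; the actual module is written in Lua.

```python
def check_parents(etym_langs, language_codes, family_codes):
    """Report etymology-only languages whose parent code is unknown.

    A valid parent is a language code, a family code, or the code of
    another etymology-only language.
    """
    valid = set(language_codes) | set(family_codes) | set(etym_langs)
    problems = []
    for code, data in etym_langs.items():
        parent = data.get("parent")
        if parent not in valid:
            name = data.get("canonicalName")
            problems.append(
                f"Etymology-only language {name} ({code}) has invalid "
                f"parent language or family code {parent}."
            )
    return problems
```

Because a missing parent yields `None`, which is never a valid code, the same test also covers the requirement that parent be present.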

Codes in Module:families data must:

  • Have canonicalName, which must not be the same as the canonical name of another family.
  • If family is given, it must be a valid family code.
  • Have at least one language or subfamily belonging to it.
  • Have no data keys besides these: "canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
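
The requirement that every family has at least one member can be sketched in Python as below. `empty_families` is a hypothetical helper, and the rule that a family naming itself as its own parent does not count as a subfamily is an assumption of this sketch.

```python
def empty_families(families, languages):
    """Return the family codes that no language or subfamily belongs to."""
    # Families referenced by at least one language.
    used = {lang["family"] for lang in languages.values() if "family" in lang}
    # Families referenced as the parent of another family.
    used |= {
        fam["family"]
        for code, fam in families.items()
        if "family" in fam and fam["family"] != code
    }
    return sorted(code for code in families if code not in used)
```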

Codes in Module:scripts data must:

  • Have canonicalName.
  • Have at least one language that lists it as one of its scripts.
  • Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
  • Have no data keys besides these: "canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".
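
A Python sketch of the script checks follows. The name `check_scripts` and the messages are hypothetical. The sketch only tests that a characters pattern is present; whether it is a valid Lua pattern is not tested, because Lua pattern syntax differs from Python regular expressions.

```python
def check_scripts(scripts, languages):
    """Return discrepancy messages for the script data."""
    # Every script code that some language lists among its scripts.
    used = {sc for lang in languages.values() for sc in lang.get("scripts", [])}
    problems = []
    for code, data in scripts.items():
        if "canonicalName" not in data:
            problems.append(f"{code}: canonicalName is missing.")
        if code not in used:
            problems.append(f"{code}: no language lists this script.")
        if not data.get("characters"):
            problems.append(f"{code}: characters pattern is missing.")
    return problems
```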

Output

Discrepancies detected:

Module:etymology languages/data

  • Etymology-only Ashtiani language (atn) has invalid parent language or family code ira-ker.
  • Etymology-only Zoroastrian Dari language (gbz) has invalid parent language or family code ira-ker.
  • Etymology-only Gazi language (gzi) has invalid parent language or family code ira-ker.
  • Etymology-only Abyanehi language (ker-aby) has invalid parent language or family code ira-ker.
  • Etymology-only Abuzeydabadi language (ker-abz) has invalid parent language or family code ira-ker.
  • Etymology-only Anaraki language (ker-ana) has invalid parent language or family code ira-ker.
  • Etymology-only Ardestani language (ker-ard) has invalid parent language or family code ira-ker.
  • Etymology-only Ashtiani language (ker-ast) has invalid parent language or family code ira-ker.
  • Etymology-only Badrudi language (ker-bdr) has invalid parent language or family code ira-ker.
  • Etymology-only Bidhandi language (ker-bid) has invalid parent language or family code ira-ker.
  • Etymology-only Bijagani language (ker-bij) has invalid parent language or family code ira-ker.
  • Etymology-only Chimehi language (ker-cim) has invalid parent language or family code ira-ker.
  • Etymology-only Zoroastrian Dari language (ker-dar) has invalid parent language or family code ira-ker.
  • Etymology-only Delijani language (ker-del) has invalid parent language or family code ira-ker.
  • Etymology-only Farizandi language (ker-far) has invalid parent language or family code ira-ker.
  • Etymology-only Gazi language (ker-gaz) has invalid parent language or family code ira-ker.
  • Etymology-only Hamadani language (ker-ham) has invalid parent language or family code ira-ker.
  • Etymology-only Hanjani language (ker-han) has invalid parent language or family code ira-ker.
  • Etymology-only Isfahani language (ker-isf) has invalid parent language or family code ira-ker.
  • Etymology-only Jowshaqani language (ker-jow) has invalid parent language or family code ira-ker.
  • Etymology-only Kafroni language (ker-kaf) has invalid parent language or family code ira-ker.
  • Etymology-only Kashani language (ker-kas) has invalid parent language or family code ira-ker.
  • Etymology-only Kermani language (ker-ker) has invalid parent language or family code ira-ker.
  • Etymology-only Kesehi language (ker-kes) has invalid parent language or family code ira-ker.
  • Etymology-only Komjani language (ker-kom) has invalid parent language or family code ira-ker.
  • Etymology-only Mahallati language (ker-mah) has invalid parent language or family code ira-ker.
  • Etymology-only Meymehi language (ker-mey) has invalid parent language or family code ira-ker.
  • Etymology-only Naraqi language (ker-nar) has invalid parent language or family code ira-ker.
  • Etymology-only Nashalji language (ker-nas) has invalid parent language or family code ira-ker.
  • Etymology-only Natanzi language (ker-nat) has invalid parent language or family code ira-ker.
  • Etymology-only Nayini language (ker-nay) has invalid parent language or family code ira-ker.
  • Etymology-only Qalhari language (ker-qal) has invalid parent language or family code ira-ker.
  • Etymology-only Qohrudi language (ker-qoh) has invalid parent language or family code ira-ker.
  • Etymology-only Sedehi language (ker-sed) has invalid parent language or family code ira-ker.
  • Etymology-only Soi language (ker-soi) has invalid parent language or family code ira-ker.
  • Etymology-only Tari language (ker-tar) has invalid parent language or family code ira-ker.
  • Etymology-only Varani language (ker-var) has invalid parent language or family code ira-ker.
  • Etymology-only Vonishuni language (ker-von) has invalid parent language or family code ira-ker.
  • Etymology-only Varzenehi language (ker-vrz) has invalid parent language or family code ira-ker.
  • Etymology-only Khunsari language (ker-xun) has invalid parent language or family code ira-ker.
  • Etymology-only Khuri language (ker-xur) has invalid parent language or family code ira-ker.
  • Etymology-only Yarandi language (ker-yar) has invalid parent language or family code ira-ker.
  • Etymology-only Yazdi language (ker-yaz) has invalid parent language or family code ira-ker.
  • Etymology-only Zefrehi language (ker-zef) has invalid parent language or family code ira-ker.
  • Etymology-only Zori language (ker-zor) has invalid parent language or family code ira-ker.
  • Etymology-only Khunsari language (kfm) has invalid parent language or family code ira-ker.
  • Code ntz is not unique; it is also defined in Module:languages/data3/n.
  • Etymology-only Natanzi language (ntz) has invalid parent language or family code ira-ker.
  • Code nyq is not unique; it is also defined in Module:languages/data3/n.
  • Etymology-only Nayini language (nyq) has invalid parent language or family code ira-ker.
  • Etymology-only Soi language (soj) has invalid parent language or family code ira-ker.

Module:families/data

Module:languages/canonical names

  • The canonical name Пра-Austroasiatic (aav-pro) is missing.
  • Proto-Austroasiatic, the canonical name for the code aav-pro, is wrong; it should be Пра-Austroasiatic.
  • The canonical name Mount Iriga Agta (agz) is missing.
  • Mt. Iriga Agta, the canonical name for the code agz, is wrong; it should be Mount Iriga Agta.
  • The code ais and the canonical name Nataoran Amis should be removed; they are not found in a submodule of Module:languages.
  • The canonical name Пра-Cangin (alv-cng-pro) is missing.
  • Proto-Edoid, the canonical name for the code alv-edo-pro, is wrong; it should be Пра-Edoid.
  • The canonical name Пра-Edoid (alv-edo-pro) is missing.
  • The canonical name Пра-Gbe (alv-gbe-pro) is missing.
  • The canonical name Пра-Central Togo (alv-gtm-pro) is missing.
  • The canonical name Пра-Igboid (alv-igb-pro) is missing.
  • The canonical name Пра-Kwa (alv-kwa-pro) is missing.
  • The canonical name Пра-Nupoid (alv-nup-pro) is missing.
  • Proto-Atlantic-Congo, the canonical name for the code alv-pro, is wrong; it should be Пра-Atlantic-Congo.
  • The canonical name Пра-Atlantic-Congo (alv-pro) is missing.
  • The canonical name Пра-Yoruboid (alv-yor-pro) is missing.
  • Арамејски, the canonical name for the code arc, is wrong; it should be Арaмејски.
  • The canonical name Арaмејски (arc) is missing.
  • The code asd and the canonical name Asas should be removed; they are not found in a submodule of Module:languages.
  • The canonical name Mount Iraya Agta (atl) is missing.
  • Mt. Iraya Agta, the canonical name for the code atl, is wrong; it should be Mount Iraya Agta.
  • The code atn and the canonical name Ashtiani should be removed; they are not found in a submodule of Module:languages.
  • Proto-Central New South Wales, the canonical name for the code aus-cww-pro, is wrong; it should be Пра-Central New South Wales.
  • The canonical name Пра-Central New South Wales (aus-cww-pro) is missing.
  • The canonical name Пра-Daly (aus-dal-pro) is missing.
  • Proto-Daly, the canonical name for the code aus-dal-pro, is wrong; it should be Пра-Daly.
  • Proto-Nyulnyulan, the canonical name for the code aus-nyu-pro, is wrong; it should be Пра-Nyulnyulan.
  • The canonical name Пра-Nyulnyulan (aus-nyu-pro) is missing.
  • Proto-Pama-Nyungan, the canonical name for the code aus-pam-pro, is wrong; it should be Пра-Pama-Nyungan.
  • The canonical name Пра-Pama-Nyungan (aus-pam-pro) is missing.
  • The canonical name Proto-Amuesha-Chamicuro (awd-amc-pro) is missing.
  • The canonical name Proto-Kampa (awd-kmp-pro) is missing.
  • The canonical name Proto-Arawak (awd-pro) is missing.
  • Пра-Arawakan, the canonical name for the code awd-pro, is wrong; it should be Proto-Arawak.
  • The canonical name Proto-Paresi-Waura (awd-prw-pro) is missing.
  • Proto-Ta-Arawakan, the canonical name for the code awd-taa-pro, is wrong; it should be Proto-Ta-Arawak.
  • The canonical name Proto-Ta-Arawak (awd-taa-pro) is missing.
  • Middle Armenian, the canonical name for the code axm, is wrong; it should be Средњи Јерменски.
  • The canonical name Средњи Јерменски (axm) is missing.
  • The canonical name Proto-Uto-Aztecan (azc-pro) is missing.
  • Пра-Uto-Aztecan, the canonical name for the code azc-pro, is wrong; it should be Proto-Uto-Aztecan.
  • The canonical name Proto-Sotho-Tswana (bnt-sts-pro) is missing.
  • The canonical name Бошњачки (bs) is missing.
  • Пра-Abkhaz-Abaza, the canonical name for the code cau-abz-pro, is wrong; it should be Proto-Abkhaz-Abaza.
  • The canonical name Proto-Abkhaz-Abaza (cau-abz-pro) is missing.
  • Cafundo Creole, the canonical name for the code ccd, is wrong; it should be Cafundó.
  • The canonical name Cafundó (ccd) is missing.
  • The code cdg and the canonical name Chamari should be removed; they are not found in a submodule of Module:languages.
  • Цебуано, the canonical name for the code ceb, is wrong; it should be Cebuano.
  • The canonical name Cebuano (ceb) is missing.
  • Chocangacakha, the canonical name for the code cgk, is wrong; it should be Chocangaca.
  • The canonical name Chocangaca (cgk) is missing.
  • The canonical name црногорски (cnr) is missing.
  • The code dgu and the canonical name Degaru should be removed; they are not found in a submodule of Module:languages.
  • The canonical name Далматински (dlm) is missing.
  • Dalmatian, the canonical name for the code dlm, is wrong; it should be Далматински.
  • The canonical name Proto-Western Mande (dmn-mdw-pro) is missing.
  • The canonical name Proto-Mande (dmn-pro) is missing.
  • The code dud and the canonical name Duka should be removed; they are not found in a submodule of Module:languages.
  • The canonical name Dewas Rai (dwz) is missing.
  • The canonical name Proto-Inuit (esx-inu-pro) is missing.
  • Пра-Inuit, the canonical name for the code esx-inu-pro, is wrong; it should be Proto-Inuit.
  • North Frisian, the canonical name for the code frr, is wrong; it should be Севернофризски.
  • The canonical name Севернофризски (frr) is missing.
  • The code gbz and the canonical name Zoroastrian Dari should be removed; they are not found in a submodule of Module:languages.
  • The canonical name Proto-West Germanic (gmw-pro) is missing.
  • The canonical name Proto-Hellenic (grk-pro) is missing.
  • Пра-Hellenic, the canonical name for the code grk-pro, is wrong; it should be Proto-Hellenic.
  • The code gzi and the canonical name Gazi should be removed; they are not found in a submodule of Module:languages.
  • Пра-Hmong, the canonical name for the code hmn-pro, is wrong; it should be Proto-Hmong.
  • The canonical name Proto-Hmong (hmn-pro) is missing.
  • The canonical name Proto-Hmong-Mien (hmx-pro) is missing.
  • Пра-Hmong-Mien, the canonical name for the code hmx-pro, is wrong; it should be Proto-Hmong-Mien.
  • The canonical name хрватски (hr) is missing.
  • The canonical name Proto-Armenian (hyx-pro) is missing.
  • Пра-Armenian, the canonical name for the code hyx-pro, is wrong; it should be Proto-Armenian.
  • Herero, the canonical name for the code hz, is wrong; it should be Хереро.
  • The canonical name Хереро (hz) is missing.
  • The canonical name Proto-Ijoid (ijo-pro) is missing.
  • Inupiak, the canonical name for the code ik, is wrong; it should be Inupiaq.
  • The canonical name Inupiaq (ik) is missing.
  • The canonical name Proto-Indo-Aryan (inc-pro) is missing.
  • Пра-Индо-Aryan, the canonical name for the code inc-pro, is wrong; it should be Proto-Indo-Aryan.
  • The canonical name Proto-Anatolian (ine-ana-pro) is missing.
  • Пра-Anatolian, the canonical name for the code ine-ana-pro, is wrong; it should be Proto-Anatolian.
  • The code ira-azr and the canonical name Old Azari should be removed; they are not found in a submodule of Module:languages.
  • ira-ker, the code for the canonical name Kermanic, is wrong; it should be xme-ker.
  • ira-kls, the code for the canonical name Kalasuri, is wrong; it should be xme-kls.
  • ira-klt, the code for the canonical name Kilit, is wrong; it should be xme-klt.
  • The canonical name Proto-Komisenian (ira-kms-pro) is missing.
  • The canonical name Proto-Munji-Yidgha (ira-mny-pro) is missing.
  • The canonical name Proto-Medo-Parthian (ira-mpr-pro) is missing.
  • The canonical name Proto-Pathan (ira-pat-pro) is missing.
  • Пра-Ирански, the canonical name for the code ira-pro, is wrong; it should be Proto-Iranian.
  • The canonical name Proto-Iranian (ira-pro) is missing.
  • ira-sak-pro, the code for the canonical name Proto-Saka, is wrong; it should be xsc-sak-pro.
  • The canonical name Proto-Sanglechi-Ishkashimi (ira-sgi-pro) is missing.
  • The canonical name Proto-Shughni-Roshani (ira-shr-pro) is missing.
  • The canonical name Proto-Shughni-Yazghulami (ira-shy-pro) is missing.
  • The canonical name Proto-Shughni-Yazghulami-Munji (ira-sym-pro) is missing.
  • The canonical name Vanji (ira-wnj) is missing.
  • The canonical name Proto-Zaza-Gorani (ira-zgr-pro) is missing.
  • The canonical name Erie (iro-ere) is missing.
  • The canonical name Proto-Iroquoian (iro-pro) is missing.
  • Пра-Iroquoian, the canonical name for the code iro-pro, is wrong; it should be Proto-Iroquoian.
  • The canonical name Пра-Јапански (jpx-pro) is missing.
  • Пра-Japonic, the canonical name for the code jpx-pro, is wrong; it should be Пра-Јапански.
  • The code kfm and the canonical name Khunsari should be removed; they are not found in a submodule of Module:languages.
  • The canonical name Когношки (kg) is missing.
  • Kongo, the canonical name for the code kg, is wrong; it should be Когношки.
  • The canonical name Proto-Khoe (khi-kho-pro) is missing.
  • The canonical name Kanu (khx) is missing.
  • The canonical name Eastern Pwo (kjp) is missing.
  • Eastern Pwo Karen, the canonical name for the code kjp, is wrong; it should be Eastern Pwo.
  • The canonical name East Kewa (kjs) is missing.
  • Phrae Pwo Karen, the canonical name for the code kjt, is wrong; it should be Phrae Pwo.
  • The canonical name Phrae Pwo (kjt) is missing.
  • The canonical name Гренладски (kl) is missing.
  • Greenlandic, the canonical name for the code kl, is wrong; it should be Гренладски.
  • The canonical name Proto-Kru (kro-pro) is missing.
  • The canonical name Southern Kissi (kss) is missing.
  • Southern Kisi, the canonical name for the code kss, is wrong; it should be Southern Kissi.
  • The code lic-pro and the canonical name Пра-Hlai should be removed; they are not found in a submodule of Module:languages.
  • Пра-Atayalic, the canonical name for the code map-ata-pro, is wrong; it should be Proto-Atayalic.
  • The canonical name Proto-Atayalic (map-ata-pro) is missing.
  • The canonical name Proto-Aslian (mkh-asl-pro) is missing.
  • The code mkh-law and the canonical name Lawi should be removed; they are not found in a submodule of Module:languages.
  • The canonical name Middle Mon (mkh-mmn) is missing.
  • Пра-Vietic, the canonical name for the code mkh-vie-pro, is wrong; it should be Proto-Vietic.
  • The canonical name Proto-Vietic (mkh-vie-pro) is missing.
  • The canonical name Proto-Kalapuyan (nai-klp-pro) is missing.
  • The canonical name Proto-Tsimshianic (nai-tsi-pro) is missing.
  • Пра-Utian, the canonical name for the code nai-utn-pro, is wrong; it should be Proto-Utian.
  • The canonical name Proto-Utian (nai-utn-pro) is missing.
  • The canonical name Proto-Trans-New Guinea (ngf-pro) is missing.
  • The canonical name Proto-Eastern Oti-Volta (nic-eov-pro) is missing.
  • Пра-Gur, the canonical name for the code nic-gur-pro, is wrong; it should be Proto-Gur.
  • The canonical name Proto-Gur (nic-gur-pro) is missing.
  • The canonical name Proto-Ogoni (nic-ogo-pro) is missing.
  • The canonical name Proto-Oti-Volta (nic-ovo-pro) is missing.
  • The canonical name Proto-Plateau (nic-plt-pro) is missing.
  • The canonical name Proto-Upper Cross River (nic-ucr-pro) is missing.
  • The canonical name Proto-Volta-Congo (nic-vco-pro) is missing.
  • Old Dutch, the canonical name for the code odt, is wrong; it should be Стари Холандски.
  • The canonical name Стари Холандски (odt) is missing.
  • The code oht and the canonical name Old Hittite should be removed; they are not found in a submodule of Module:languages.
  • The canonical name Proto-Chatino (omq-cha-pro) is missing.
  • The canonical name Proto-Oto-Pamean (omq-otp-pro) is missing.
  • The canonical name Teojomulco Chatino (omq-teo) is missing.
  • The canonical name Proto-Zapotecan (omq-zap-pro) is missing.
  • The canonical name Proto-Zapotec (omq-zpc-pro) is missing.
  • Onin Based Pidgin, the canonical name for the code onx, is wrong; it should be Pidgin Onin.
  • The canonical name Pidgin Onin (onx) is missing.
  • The canonical name Proto-Ossetic (os-pro) is missing.
  • The canonical name Proto-Otomi (oto-otm-pro) is missing.
  • The canonical name Proto-Otomian (oto-pro) is missing.
  • Пра-Kalamian, the canonical name for the code phi-kal-pro, is wrong; it should be Proto-Kalamian.
  • The canonical name Proto-Kalamian (phi-kal-pro) is missing.
  • The canonical name Proto-Halmahera-Cenderawasih (poz-hce-pro) is missing.
  • Пра-Halmahera-Cenderawasih, the canonical name for the code poz-hce-pro, is wrong; it should be Proto-Halmahera-Cenderawasih.
  • The canonical name Proto-Great Andamanese (qfa-adm-pro) is missing.
  • The canonical name Proto-Hurro-Urartian (qfa-hur-pro) is missing.
  • Пра-Hurro-Urartian, the canonical name for the code qfa-hur-pro, is wrong; it should be Proto-Hurro-Urartian.
  • The canonical name Proto-Kadu (qfa-kad-pro) is missing.
  • The canonical name Proto-Hlai (qfa-lic-pro) is missing.
  • The canonical name Proto-Ongan (qfa-ong-pro) is missing.
  • The canonical name Proto-Kra-Dai (qfa-tak-pro) is missing.
  • Proto-Tai-Kadai, the canonical name for the code qfa-tak-pro, is wrong; it should be Proto-Kra-Dai.
  • Пра-Yeniseian, the canonical name for the code qfa-yen-pro, is wrong; it should be Proto-Yeniseian.
  • The canonical name Proto-Yeniseian (qfa-yen-pro) is missing.
  • The canonical name Proto-Yukaghir (qfa-yuk-pro) is missing.
  • Пра-Yukaghir, the canonical name for the code qfa-yuk-pro, is wrong; it should be Proto-Yukaghir.
  • The canonical name Proto-Jê (sai-jee-pro) is missing.
  • The canonical name Proto-Nyima (sdv-nyi-pro) is missing.
  • The canonical name Proto-Taman (sdv-tmn-pro) is missing.
  • The canonical name Proto-Siouan (sio-pro) is missing.
  • Пра-Siouan, the canonical name for the code sio-pro, is wrong; it should be Proto-Siouan.
  • The canonical name Proto-Hrusish (sit-hrs-pro) is missing.
  • Пра-Sino-Tibetan, the canonical name for the code sit-pro, is wrong; it should be Proto-Sino-Tibetan.
  • The canonical name Proto-Sino-Tibetan (sit-pro) is missing.
  • Sinsauru, the canonical name for the code snz, is wrong; it should be Kou.
  • The canonical name Kou (snz) is missing.
  • The code soj and the canonical name Soi should be removed; they are not found in a submodule of Module:languages.
  • The canonical name Sambalpuri (spv) is missing.
  • The canonical name Proto-Nilo-Saharan (ssa-pro) is missing.
  • The canonical name Сумерски (sux) is missing.
  • Sumerian, the canonical name for the code sux, is wrong; it should be Сумерски.
  • The canonical name Slavomolisano (svm) is missing.
  • Molise Croatian, the canonical name for the code svm, is wrong; it should be Slavomolisano.
  • The canonical name Sakizaya (szy) is missing.
  • Proto-Tai, the canonical name for the code tai-pro, is wrong; it should be Прото-Тај.
  • The canonical name Прото-Тај (tai-pro) is missing.
  • The canonical name Proto-Bodo-Garo (tbq-bdg-pro) is missing.
  • The canonical name Proto-Lolo-Burmese (tbq-lob-pro) is missing.
  • The code tbq-pro and the canonical name Proto-Tibeto-Burman should be removed; they are not found in a submodule of Module:languages.
  • Tswana, the canonical name for the code tn, is wrong; it should be Тцвана.
  • The canonical name Тцвана (tn) is missing.
  • The code trk-ogz-pro and the canonical name Proto-Oghuz should be removed; they are not found in a submodule of Module:languages.
  • The canonical name Пра-Турски (trk-pro) is missing.
  • Proto-Turkic, the canonical name for the code trk-pro, is wrong; it should be Пра-Турски.
  • Tsonga, the canonical name for the code ts, is wrong; it should be Тцонга.
  • The canonical name Тцонга (ts) is missing.
  • The code tut-pro and the canonical name Пра-Altaic should be removed; they are not found in a submodule of Module:languages.
  • The canonical name Gong (ugo) is missing.
  • Ugong, the canonical name for the code ugo, is wrong; it should be Gong.
  • Пра-Uralic, the canonical name for the code urj-pro, is wrong; it should be Proto-Uralic.
  • The canonical name Proto-Uralic (urj-pro) is missing.
  • The canonical name Proto-Ugric (urj-ugr-pro) is missing.
  • Пра-Ugric, the canonical name for the code urj-ugr-pro, is wrong; it should be Proto-Ugric.
  • The canonical name Saare (uss) is missing.
  • The canonical name Hun (uth) is missing.
  • The code vaf and the canonical name Vafsi should be removed; they are not found in a submodule of Module:languages.
  • Venda, the canonical name for the code ve, is wrong; it should be Венда.
  • The canonical name Венда (ve) is missing.
  • The canonical name Middle Median (xme-mid) is missing.
  • The canonical name Old Median (xme-old) is missing.
  • The canonical name Old Tati (xme-ott) is missing.
  • The canonical name Tafreshi (xme-taf) is missing.
  • The canonical name Proto-Tatic (xme-ttc-pro) is missing.
  • The canonical name Proto-Scythian (xsc-pro) is missing.
  • The canonical name Proto-Saka-Wakhi (xsc-skw-pro) is missing.
  • The canonical name Proto-Yupik (ypk-pro) is missing.
  • Пра-Yupik, the canonical name for the code ypk-pro, is wrong; it should be Proto-Yupik.
  • The canonical name Old Czech (zlw-ocs) is missing.
  • Стари Чешки, the canonical name for the code zlw-ocs, is wrong; it should be Old Czech.
  • The canonical name Old Polish (zlw-opl) is missing.
  • Стари Пољски, the canonical name for the code zlw-opl, is wrong; it should be Old Polish.

Module:languages/code to canonical name

  • Proto-Afro-Asiatic, the canonical name for the code afa-pro, is wrong; it should be Пра-Afro-Asiatic.
  • Proto-Algonquian, the canonical name for the code alg-pro, is wrong; it should be Пра-Algonquian.
  • Proto-Cangin, the canonical name for the code alv-cng-pro, is wrong; it should be Пра-Cangin.
  • Proto-Edoid, the canonical name for the code alv-edo-pro, is wrong; it should be Пра-Edoid.
  • Proto-Gbe, the canonical name for the code alv-gbe-pro, is wrong; it should be Пра-Gbe.
  • Proto-Central Togo, the canonical name for the code alv-gtm-pro, is wrong; it should be Пра-Central Togo.
  • Proto-Igboid, the canonical name for the code alv-igb-pro, is wrong; it should be Пра-Igboid.
  • Proto-Kwa, the canonical name for the code alv-kwa-pro, is wrong; it should be Пра-Kwa.
  • Proto-Nupoid, the canonical name for the code alv-nup-pro, is wrong; it should be Пра-Nupoid.
  • Proto-Atlantic-Congo, the canonical name for the code alv-pro, is wrong; it should be Пра-Atlantic-Congo.
  • Proto-Yoruboid, the canonical name for the code alv-yor-pro, is wrong; it should be Пра-Yoruboid.
  • Proto-Apachean, the canonical name for the code apa-pro, is wrong; it should be Пра-Apachean.
  • Proto-Algic, the canonical name for the code aql-pro, is wrong; it should be Пра-Algic.
  • Арамејски, the canonical name for the code arc, is wrong; it should be Арaмејски.
  • Proto-Athabaskan, the canonical name for the code ath-pro, is wrong; it should be Пра-Athabaskan.
  • Proto-Arawa, the canonical name for the code auf-pro, is wrong; it should be Пра-Arawa.
  • Proto-Arnhem, the canonical name for the code aus-arn-pro, is wrong; it should be Пра-Arnhem.
  • Proto-Central New South Wales, the canonical name for the code aus-cww-pro, is wrong; it should be Пра-Central New South Wales.
  • Proto-Daly, the canonical name for the code aus-dal-pro, is wrong; it should be Пра-Daly.
  • Proto-Nyulnyulan, the canonical name for the code aus-nyu-pro, is wrong; it should be Пра-Nyulnyulan.
  • Proto-Pama-Nyungan, the canonical name for the code aus-pam-pro, is wrong; it should be Пра-Pama-Nyungan.
  • Proto-Iwaidjan, the canonical name for the code aus-wdj-pro, is wrong; it should be Пра-Iwaidjan.
  • Middle Armenian, the canonical name for the code axm, is wrong; it should be Средњи Јерменски.
  • Aymara, the canonical name for the code ay, is wrong; it should be Ајмара.
  • Bashkir, the canonical name for the code ba, is wrong; it should be Башкир.
  • Bihari, the canonical name for the code bh, is wrong; it should be Бихари.
  • Bislama, the canonical name for the code bi, is wrong; it should be Бислама.
  • Bengali, the canonical name for the code bn, is wrong; it should be Бенгали.
  • The code bs (Бошњачки) is missing.
  • Kamkata-viri, the canonical name for the code bsh, is wrong; it should be Kati.
  • Чечен, the canonical name for the code ce, is wrong; it should be Чеченски.
  • The code cnr (црногорски) is missing.
  • Corsican, the canonical name for the code co, is wrong; it should be Корзички.
  • Czech, the canonical name for the code cs, is wrong; it should be Чешки.
  • Old Church Slavonic, the canonical name for the code cu, is wrong; it should be Старословенски.
  • Dungan, the canonical name for the code dng, is wrong; it should be Дунган.
  • Ewe, the canonical name for the code ee, is wrong; it should be Еве.
  • Египатски, the canonical name for the code egy, is wrong; it should be Egyptian.
  • Esperanto, the canonical name for the code eo, is wrong; it should be Есперанто.
  • Spanish, the canonical name for the code es, is wrong; it should be Шпански.
  • Estonian, the canonical name for the code et, is wrong; it should be Естонски.
  • Basque, the canonical name for the code eu, is wrong; it should be Баскијски.
  • Persian, the canonical name for the code fa, is wrong; it should be Персијски.
  • Fang (Beboid), the canonical name for the code fak, is wrong; it should be Fang (Cameroon).
  • Fang (Bantu), the canonical name for the code fan, is wrong; it should be Fang (Guinea).
  • Fula, the canonical name for the code ff, is wrong; it should be Фула.
  • Finnish, the canonical name for the code fi, is wrong; it should be Фински.
  • Fijian, the canonical name for the code fj, is wrong; it should be Фиџи.
  • Faroese, the canonical name for the code fo, is wrong; it should be Фарски.
  • Middle French, the canonical name for the code frm, is wrong; it should be Средњи Француски.
  • Old French, the canonical name for the code fro, is wrong; it should be Стари Француски.
  • North Frisian, the canonical name for the code frr, is wrong; it should be Севернофризски.
  • West Frisian, the canonical name for the code fy, is wrong; it should be Западни Фризијски.
  • Gan, the canonical name for the code gan, is wrong; it should be Ган.
  • Scottish Gaelic, the canonical name for the code gd, is wrong; it should be Шкотски Галски.
  • Ge'ez, the canonical name for the code gez, is wrong; it should be Ги'из.
  • The code gok (Gowli) is missing.
  • Gothic, the canonical name for the code got, is wrong; it should be Готски.
  • Ancient Greek, the canonical name for the code grc, is wrong; it should be Антички Грчки.
  • Gujarati, the canonical name for the code gu, is wrong; it should be Гуџарати.
  • The code gyo and the canonical name Gyalsumdo should be removed; they are not found in a submodule of Module:languages.
  • Hausa, the canonical name for the code ha, is wrong; it should be Хауса.
  • Hawaiian, the canonical name for the code haw, is wrong; it should be Хавајски.
  • Humburi Senni, the canonical name for the code hmb, is wrong; it should be Humburi Senni Songhay.
  • The code hr (хрватски) is missing.
  • The code htx (Middle Hittite) is missing.
  • Hungarian, the canonical name for the code hu, is wrong; it should be Мађарски.
  • Herero, the canonical name for the code hz, is wrong; it should be Хереро.
  • Interlingua, the canonical name for the code ia, is wrong; it should be Интерлингва.
  • The code ibh and the canonical name Bih should be removed; they are not found in a submodule of Module:languages.
  • Indonesian, the canonical name for the code id, is wrong; it should be Индонезијски.
  • Interlingue, the canonical name for the code ie, is wrong; it should be Интерлингве.
  • Igbo, the canonical name for the code ig, is wrong; it should be Игбо.
  • Pidgin Iha, the canonical name for the code ihb, is wrong; it should be Iha Based Pidgin.
  • Proto-Indo-Iranian, the canonical name for the code iir-pro, is wrong; it should be Пра-Индо-Ирански.
  • Ido, the canonical name for the code io, is wrong; it should be Идо.
  • Icelandic, the canonical name for the code is, is wrong; it should be Исландски.
  • Old Latin, the canonical name for the code itc-ola, is wrong; it should be Стари Латински.
  • Proto-Italic, the canonical name for the code itc-pro, is wrong; it should be Пра-Италијански.
  • The code jbw and the canonical name Yawijibaya should be removed; they are not found in a submodule of Module:languages.
  • The code jgk and the canonical name Gwak should be removed; they are not found in a submodule of Module:languages.
  • The code jjr and the canonical name Zhár should be removed; they are not found in a submodule of Module:languages.
  • Proto-Japonic, the canonical name for the code jpx-pro, is wrong; it should be Пра-Јапански.
  • Javanese, the canonical name for the code jv, is wrong; it should be Јавански.
  • Kongo, the canonical name for the code kg, is wrong; it should be Когношки.
  • Kazakh, the canonical name for the code kk, is wrong; it should be Казашки.
  • Greenlandic, the canonical name for the code kl, is wrong; it should be Гренладски.
  • Khmer, the canonical name for the code km, is wrong; it should be Кмерски.
  • Kanuri, the canonical name for the code kr, is wrong; it should be Канури.
  • Kashmiri, the canonical name for the code ks, is wrong; it should be Кашмирски.
  • Kurdish, the canonical name for the code ku, is wrong; it should be Курдски.
  • Cornish, the canonical name for the code kw, is wrong; it should be Корнски.
  • Kyrgyz, the canonical name for the code ky, is wrong; it should be Киргиски.
  • Latin, the canonical name for the code la, is wrong; it should be Латински.
  • Luxembourgish, the canonical name for the code lb, is wrong; it should be Луксембуршки.
  • The code lba (Lui) is missing.
  • Luganda, the canonical name for the code lg, is wrong; it should be Лугандски.
  • Limburgish, the canonical name for the code li, is wrong; it should be Лимбуршки.
  • Laboya, the canonical name for the code lmy, is wrong; it should be Lamboya.
  • Lingala, the canonical name for the code ln, is wrong; it should be Лингала.
  • Lao, the canonical name for the code lo, is wrong; it should be Лао.
  • Lithuanian, the canonical name for the code lt, is wrong; it should be Литвански.
  • Middle Chinese, the canonical name for the code ltc, is wrong; it should be Средњи Кинески.
  • Latvian, the canonical name for the code lv, is wrong; it should be Летонски.
  • The code lvi and the canonical name Lawi should be removed; they are not found in a submodule of Module:languages.
  • Malagasy, the canonical name for the code mg, is wrong; it should be Малгашки.
  • Marshallese, the canonical name for the code mh, is wrong; it should be Маршалски.
  • Maori, the canonical name for the code mi, is wrong; it should be Маорски.
  • Malayalam, the canonical name for the code ml, is wrong; it should be Малајалам.
  • Mongolian, the canonical name for the code mn, is wrong; it should be Монголски.
  • Malay, the canonical name for the code ms, is wrong; it should be Малајски.
  • Maltese, the canonical name for the code mt, is wrong; it should be Малтешки.
  • Translingual, the canonical name for the code mul, is wrong; it should be Међујезички.
  • Burmese, the canonical name for the code my, is wrong; it should be Бурмански.
  • The code myd (Maramba) is missing.
  • Min Nan, the canonical name for the code nan, is wrong; it should be Мин Нан.
  • Numana, the canonical name for the code nbr, is wrong; it should be Numana-Nunku-Gbantu-Numbu.
  • Nepali, the canonical name for the code ne, is wrong; it should be Непалски.
  • The code nei (Neo-Hittite) is missing.
  • Newar, the canonical name for the code new, is wrong; it should be Newari.
  • Dutch, the canonical name for the code nl, is wrong; it should be Холандски.
  • The code nns (Ningye) is missing.
  • Norwegian, the canonical name for the code no, is wrong; it should be Норвешки.
  • The code npg and the canonical name Ponyo should be removed; they are not found in a submodule of Module:languages.
  • The code nql and the canonical name Ngendelengo should be removed; they are not found in a submodule of Module:languages.
  • The code nqy and the canonical name Akyaung Ari should be removed; they are not found in a submodule of Module:languages.
  • The code ntz (Natanzi) is missing.
  • Navajo, the canonical name for the code nv, is wrong; it should be Навахо.
  • Classical Newar, the canonical name for the code nwc, is wrong; it should be Classical Newari.
  • Chichewa, the canonical name for the code ny, is wrong; it should be Чичева.
  • The code nyq (Nayini) is missing.
  • Moabite, the canonical name for the code obm, is wrong; it should be Моавски.
  • Old Dutch, the canonical name for the code odt, is wrong; it should be Стари Холандски.
  • Old Japanese, the canonical name for the code ojp, is wrong; it should be Стари Јапански.
  • Pali, the canonical name for the code pi, is wrong; it should be Пали.
  • Polish, the canonical name for the code pl, is wrong; it should be Пољски.
  • The code pnd and the canonical name Mpinda should be removed; they are not found in a submodule of Module:languages.
  • Pashto, the canonical name for the code ps, is wrong; it should be Пашто.
  • Русински, the canonical name for the code rue, is wrong; it should be Rusyn.
  • Scots, the canonical name for the code sco, is wrong; it should be Шкотски.
  • Proto-Semitic, the canonical name for the code sem-pro, is wrong; it should be Пра-Семитски.
  • Српскохрватски, the canonical name for the code sh, is wrong; it should be Српски.
  • Sumerian, the canonical name for the code sux, is wrong; it should be Сумерски.
  • Swedish, the canonical name for the code sv, is wrong; it should be Шведски.
  • Swahili, the canonical name for the code sw, is wrong; it should be Свахили.
  • Пра-Tai, the canonical name for the code tai-pro, is wrong; it should be Прото-Тај.
  • Пра-Southwestern Tai, the canonical name for the code tai-swe-pro, is wrong; it should be Proto-Southwestern Tai.
  • Turks and Caicos Creole English, the canonical name for the code tch, is wrong; it should be Turks And Caicos Creole English.
  • Telugu, the canonical name for the code te, is wrong; it should be Телугу.
  • Tajik, the canonical name for the code tg, is wrong; it should be Таџик.
  • Thavung, the canonical name for the code thm, is wrong; it should be Aheu.
  • Tigrinya, the canonical name for the code ti, is wrong; it should be Тигриња.
  • Turkmen, the canonical name for the code tk, is wrong; it should be Туркмен.
  • Ramandi, the canonical name for the code tks, is wrong; it should be Takestani.
  • Tagalog, the canonical name for the code tl, is wrong; it should be Тагалог.
  • Tswana, the canonical name for the code tn, is wrong; it should be Тцвана.
  • Taíno, the canonical name for the code tnq, is wrong; it should be Taino.
  • Tongan, the canonical name for the code to, is wrong; it should be Тонган.
  • Turkish, the canonical name for the code tr, is wrong; it should be Турски.
  • Proto-Turkic, the canonical name for the code trk-pro, is wrong; it should be Пра-Турски.
  • Tsonga, the canonical name for the code ts, is wrong; it should be Тцонга.
  • Tatar, the canonical name for the code tt, is wrong; it should be Татарски.
  • The code tvx and the canonical name Taivoan should be removed; they are not found in a submodule of Module:languages.
  • Tahitian, the canonical name for the code ty, is wrong; it should be Тахићански.
  • Uyghur, the canonical name for the code ug, is wrong; it should be Ујгур.
  • Ukrainian, the canonical name for the code uk, is wrong; it should be Украјински.
  • Urdu, the canonical name for the code ur, is wrong; it should be Урду.
  • Uzbek, the canonical name for the code uz, is wrong; it should be Узбек.
  • Venda, the canonical name for the code ve, is wrong; it should be Венда.
  • Veps, the canonical name for the code vep, is wrong; it should be Вепски.
  • Walloon, the canonical name for the code wa, is wrong; it should be Валун.
  • Wolof, the canonical name for the code wo, is wrong; it should be Волоф.
  • Wu, the canonical name for the code wuu, is wrong; it should be Ву.
  • Old Armenian, the canonical name for the code xcl, is wrong; it should be Стари Јерменски.
  • Khwarezmian, the canonical name for the code xco, is wrong; it should be Chorasmian.
  • The code xdo and the canonical name Kwandu should be removed; they are not found in a submodule of Module:languages.
  • Xhosa, the canonical name for the code xh, is wrong; it should be Хоса.
  • Khoini, the canonical name for the code xkc, is wrong; it should be Kho'ini.
  • The code xme (Median) is missing.
  • The code xsc (Scythian) is missing.
  • The code xsj and the canonical name Subi should be removed; they are not found in a submodule of Module:languages.
  • The code xvi (Kamviri) is missing.
  • The code xyt and the canonical name Mayi-Thakurti should be removed; they are not found in a submodule of Module:languages.
  • Yazghulami, the canonical name for the code yah, is wrong; it should be Yazgulyam.
  • Cantonese, the canonical name for the code yue, is wrong; it should be Кантонски.
  • Zhuang, the canonical name for the code za, is wrong; it should be Џуанг.
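
The three kinds of message in this list (a wrong canonical name, a missing code, an entry that should be removed) come from comparing the code-to-name lookup table with the language data in both directions. The following is a minimal sketch of that comparison in plain Lua. The tables `language_data` and `code_to_name` and the function `compare` are hypothetical stand-ins, not the real data modules, and the name check is simplified relative to the module's own logic.

```lua
-- Hypothetical sample data; the real modules are much larger.
local language_data = {
	fi = { "Фински" },  -- canonical name stored at index 1
	gok = { "Gowli" },
}
local code_to_name = {
	fi = "Finnish",     -- stale name
	gyo = "Gyalsumdo",  -- code no longer present in the language data
}

-- Simplified two-way comparison (an assumption, not the module's exact logic).
local function compare(languages, lookup)
	local report = {}
	-- Pass 1: every language code should appear in the lookup table.
	for code, data in pairs(languages) do
		if not lookup[code] then
			table.insert(report,
				("The code %s (%s) is missing."):format(code, data[1]))
		end
	end
	-- Pass 2: every lookup entry should match the language data.
	for code, name in pairs(lookup) do
		local data = languages[code]
		if not data then
			table.insert(report,
				("The code %s and the canonical name %s should be removed.")
					:format(code, name))
		elseif data[1] ~= name then
			table.insert(report,
				("%s, the canonical name for the code %s, is wrong; it should be %s.")
					:format(name, code, data[1]))
		end
	end
	table.sort(report)
	return report
end

local report = compare(language_data, code_to_name)
for _, line in ipairs(report) do
	print(line)
end
```

With this sample data the sketch reports one message of each kind, which mirrors how a stale lookup table produces the long list above.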

Модул:languages/data2

  • Бошњачки, the canonical name for bs, is repeated in the table of aliases.
  • ??? (cnr) does not have a two-letter code.
  • црногорски, the canonical name for cnr, is repeated in the table of aliases.
  • Српски језик (sh) has a canonical name that is not unique; it is also used by the code sr.
  • Српски, the canonical name for sr, is repeated in the table of aliases.
  • Туркмен језик (tk) lists an invalid language code trk-ogz-pro as ancestor.

Модул:languages/data3/e

Модул:languages/data3/g

Модул:languages/data3/h

Модул:languages/data3/i

Модул:languages/data3/j

Модул:languages/data3/k

Модул:languages/data3/l

Модул:languages/data3/m

Модул:languages/data3/n

Модул:languages/data3/o

Модул:languages/data3/p

Модул:languages/data3/r

Модул:languages/data3/s

Модул:languages/data3/t

Модул:languages/data3/v

Модул:languages/data3/w

Модул:languages/data3/x

Модул:languages/data3/y

Модул:languages/datax

Модул:scripts/by name

  • Cherokee (Cher) is missing
  • Chorasmian (Chrs) is missing
  • Грузијски (Geor) is missing
  • Глагољица (Glag) is missing
  • Готица (Goth) is missing
  • Гуџарати (Gujr) is missing
  • Old Italic (Ital) is missing
  • Малајалам (Mlym) is missing

Модул:scripts/code to canonical name

  • Chrs (Chorasmian) is missing

Модул:scripts/data
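
The report sections above are grouped by data module, and within each section the messages are ordered by the first code they mention. A minimal sketch of that bookkeeping in plain Lua, without the MediaWiki libraries, might look as follows. The names `new_messages`, `report` and `by_code` are hypothetical, and the sample messages are invented for illustration.

```lua
-- Sketch only: plain-Lua stand-ins for the message bookkeeping.

-- A table that creates an empty list the first time a module name is used.
local function new_messages()
	return setmetatable({}, {
		__index = function(self, modname)
			local list = {}
			rawset(self, modname, list)
			return list
		end,
	})
end

-- Format a message and file it under the given module name.
local function report(messages, modname, ...)
	table.insert(messages[modname], string.format(...))
end

-- The first <code>...</code> span in a message serves as its sort key.
local function find_code(message)
	return string.match(message, "<code>([^<]+)</code>")
end

local function by_code(a, b)
	local code_a, code_b = find_code(a), find_code(b)
	if code_a and code_b then
		return code_a < code_b
	end
	return a < b
end

local messages = new_messages()
report(messages, "languages/data2",
	"%s (<code>%s</code>) has no scripts listed.", "Example", "zz")
report(messages, "languages/data2",
	"Code <code>%s</code> has no canonical name specified.", "aa")
table.sort(messages["languages/data2"], by_code)
print(messages["languages/data2"][1])
```

Sorting on the embedded code rather than on the whole message keeps entries for the same language next to each other even when their wording differs.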


local export = {}

local m_language_data = require("Module:languages/alldata")
local m_language_codes = require('Module:languages/code to canonical name')
local m_language_canonical_names = require('Module:languages/canonical names')
local m_etym_language_data = require("Module:etymology languages/data")
local m_family_data = require('Module:families/data')
local m_script_data = require('Module:scripts/data')

local m_table = require("Module:table")
local Array = require("Module:array")

local messages

local function discrepancy(modname, ...)
	messages[modname]:insert(string.format(...))
end

local all_codes = {}

local language_names = {}
local family_names = {}
local script_names = {}

local nonempty_families = {}
local allowed_empty_families = {tbq = true}
local nonempty_scripts = {}
	
local function link(name)
	if not name then
		return "???"
	elseif name:find("[Јј]език$") then
		return "[[:Категорија:" .. name .. "|" .. name .. "]]"
	else
		return "[[:Категорија:" .. name .. " језик|" .. name .. " језик]]"
	end
end
	
local function link_script(name)
	if not name then
		return "???"
	elseif name:find("[Cc]ode$") or name:find("[Ss]emaphore$") then
		return "[[:Категорија:" .. name:gsub("^%l", string.upper) .. "|" .. name .. "]]"
	else
		return "[[:Категорија:" .. name .. " текст|" .. name .. " текст]]"
	end
end

local function invalid_keys_message(modname, code, data, invalid_keys, is_script)
	local plural = #invalid_keys ~= 1
	discrepancy(modname, "The data key%s %s for %s (<code>%s</code>) %s invalid.",
		plural and "s" or "",
		invalid_keys
			:map(
				function(key)
					return '<code>' .. key .. '</code>'
				end)
			:concat(", "),
		(is_script and link_script or link)(data.canonicalName or data[1]),
		code,
		plural and "are" or "is")
end

local function check_data_keys(valid_keys, is_script)
	valid_keys = Array(valid_keys):to_set()
	
	return function (modname, code, data)
		local invalid_keys
		for k in pairs(data) do
			if not valid_keys[k] then
				invalid_keys = invalid_keys or Array()
				invalid_keys:insert(k)
			end
		end
		if invalid_keys then
			invalid_keys_message(modname, code, data, invalid_keys, is_script)
		end
	end
end

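-- Modification of isArray in [[Module:table]].
-- Returns the first index i at which t[i] is nil although t holds at least
-- i keys, or nil if the keys of t are exactly 1..n. A non-sequence key
-- (for example a string key) therefore also counts as a gap.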
local function find_gap(t)
	local i = 0
	for _ in pairs(t) do
		i = i + 1
		if t[i] == nil then
			return i
		end
	end
end

local function check_array(modname, code, data, array_name)
	local gap = find_gap(data[array_name])
	if gap then
		discrepancy(modname, "The %s array in the data table for %s (<code>%s</code>) has a gap at index %d.",
			array_name, data.canonicalName or data[1], code, gap)
	end
end

local function check_other_names_or_aliases(modname, code, canonical_name, data, data_key, allow_nested)
	local array = data[data_key]
	if not array then
		return
	end
	check_array(modname, code, data, data_key)

	local names = {}
	local function check_other_name(other_name)
		if other_name == canonical_name then
			discrepancy(modname,
				"%s, the canonical name for <code>%s</code>, is repeated in the table of <code>%s</code>.",
				canonical_name, code, data_key)
		end
		if names[other_name] then
			discrepancy(modname,
				"The name %s is found twice or more in the list of <code>%s</code> for %s (<code>%s</code>).",
				other_name, data_key, canonical_name, code)
		end
		names[other_name] = true
	end

	for _, other_name in ipairs(array) do
		if type(other_name) == "table" then
			if not allow_nested then
				discrepancy(modname,
					"A nested table is found in the list of <code>%s</code> for %s (<code>%s</code>), but isn't allowed.",
					data_key, canonical_name, code)
			else
				for _, on in ipairs(other_name) do
					check_other_name(on)
				end
			end
		else
			check_other_name(other_name)
		end
	end
end

local function check_other_names_aliases_varieties(modname, code, canonical_name, data)
	if data.otherNames then
		check_other_names_or_aliases(modname, code, canonical_name, data, "otherNames")
	end
	if data.aliases then
		check_other_names_or_aliases(modname, code, canonical_name, data, "aliases")
	end
	if data.varieties then
		check_other_names_or_aliases(modname, code, canonical_name, data, "varieties", true)
	end
end

local get_codepoint = mw.ustring.codepoint
local function validate_pattern(pattern, modname, code, data, standardChars)
	if type(pattern) ~= "string" then
		discrepancy(modname, '"%s", the %spattern for %s (<code>%s</code>), is not a string.',
			tostring(pattern), standardChars and 'standard character ' or '',
			data[1] or data.canonicalName or "???", code)
		-- The remaining checks need a string pattern.
		return
	end
	local ranges
	for lower, higher in mw.ustring.gmatch(pattern, "(.)%-(.)") do
		if get_codepoint(lower) >= get_codepoint(higher) then
			ranges = ranges or Array()
			table.insert(ranges, { lower, higher })
		end
	end
	if ranges and ranges[1] then
		local plural = #ranges ~= 1 and "s" or ""
		discrepancy(modname, '%s (<code>%s</code>) specifies an invalid pattern ' ..
			'for %scharacter detection: <code>"%s"</code>. The first codepoint%s ' ..
			'in the range%s %s must be less than the second.',
			link(data[1] or data.canonicalName), code, standardChars and 'standard ' or '', pattern, plural, plural,
			ranges
				:map(
					function(range)
						return range[1] .. "-" .. range[2] .. (" (U+%X, U+%X)")
							:format(get_codepoint(range[1]), get_codepoint(range[2]))
					end)
				:concat(", "))
	end
	if not pcall(mw.ustring.find, "", "[" .. pattern .. "]") then
		discrepancy(modname, '%s (<code>%s</code>) specifies an invalid pattern for ' ..
			(standardChars and 'standard ' or '') .. 'character detection: <code>"%s"</code>',
			link(data[1] or data.canonicalName), code, pattern)
	end
end

local function check_entry_name_or_sortkey(modname, code, data, replacements_name)
	local replacements = data[replacements_name]
	-- Language data stores the canonical name at index 1.
	local canonical_name = data[1] or data.canonicalName or "???"
	if type(replacements) == "string" then
		if replacements_name ~= "sort_key" then
			discrepancy(modname, "The %s field in the data table for %s (<code>%s</code>) must be a table.",
				replacements_name, canonical_name, code)
		end
		return
	end
	
	if (replacements.from ~= nil) ~= (replacements.to ~= nil) then
		discrepancy(modname,
			"The <code>from</code> and <code>to</code> arrays in the <code>%s</code> table for %s (<code>%s</code>) are not both defined or both undefined.",
			replacements_name, canonical_name, code)
	elseif replacements.from then
		for _, key in ipairs { "from", "to" } do
			local gap = find_gap(replacements[key])
			if gap then
				discrepancy(modname,
					"The %s array in the %s table for %s (<code>%s</code>) has a gap at index %d.",
					key, replacements_name, canonical_name, code, gap)
			end
		end
	end
	
	if replacements.remove_diacritics and type(replacements.remove_diacritics) ~= "string" then
		discrepancy(modname,
			"The <code>remove_diacritics</code> field in the <code>%s</code> table for %s (<code>%s</code>) must be a string.",
			replacements_name, canonical_name, code)
	end
	
	-- The from array may not be shorter than the to array.
	if replacements.from and replacements.to
			and m_table.length(replacements.to) > m_table.length(replacements.from) then
		discrepancy(modname,
			"The <code>from</code> array in the <code>%s</code> table for %s (<code>%s</code>) must be longer or the same length as the <code>to</code> array.",
			replacements_name, canonical_name, code)
	end
end

local function has_regular_language_child(parent_code)
	for code, data in pairs(m_language_data) do
		local ancestors = data.ancestors
		if ancestors then
			for _, ancestor in pairs(ancestors) do
				if ancestor == parent_code then
					return true
				end
			end
		end
	end
	return false
end

local function check_ancestors(modname, code, data, ancestors, is_etymology_language)
	check_array(modname, code, data, "ancestors")
	
	local canonical_name = data[1] or data.canonicalName
	if is_etymology_language then
		if not has_regular_language_child(code) then
			discrepancy(modname,
				"The etymology language %s (<code>%s</code>) has an <code>ancestors</code> field, "
				.. "but no regular languages list it as an ancestor.",
				link(canonical_name), code)
		end
	end
	
	for _, ancestor_code in ipairs(ancestors) do
		if not (m_language_data[ancestor_code] or m_etym_language_data[ancestor_code]) then
			discrepancy(modname,
				"%s (<code>%s</code>) lists an invalid language code <code>%s</code> as ancestor.",
				link(canonical_name), code, ancestor_code)
		end
	end
end

local function check_languages()
	local check_language_data_keys = check_data_keys{
		1, 2, 3, -- canonical name, wikidata item, family
		"entry_name", "sort_key", "otherNames", "aliases", "varieties",
		"type", "scripts", "ancestors",
		"wikimedia_codes", "wikipedia_article", "standardChars",
		"translit_module", "override_translit", "link_tr",
	}
	
	local function check_language(modname, code, data)
		local canonical_name, wikidata_item, lang_type = data[1], data[2], data.type
		
		check_language_data_keys(modname, code, data)
		
		if all_codes[code] then
			discrepancy(modname, "Code <code>%s</code> is not unique; it is also defined in [[Модул:%s]].", code, all_codes[code])
		else
			if not m_language_codes[code] then
				discrepancy("languages/code to canonical name", "The code <code>%s</code> (%s) is missing.", code, canonical_name)
			end
			all_codes[code] = modname
		end
		
		if not canonical_name then
			discrepancy(modname, "Code <code>%s</code> has no canonical name specified.", code)
		elseif language_names[canonical_name] then
			discrepancy(modname,
				"%s (<code>%s</code>) has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
				link(canonical_name), code, language_names[canonical_name])
		else
			if not m_language_canonical_names[canonical_name] then
				discrepancy("languages/canonical names", "The canonical name %s (<code>%s</code>) is missing.", canonical_name, code)
			end
			language_names[canonical_name] = code
		end
		
		if wikidata_item then
			if not wikidata_item:match '^Q%d+$' then
				discrepancy(modname,
					"%s (<code>%s</code>) has a Wikidata item with an invalid form: <code>%s</code>.",
					canonical_name, code, wikidata_item)
			end
		end

		check_other_names_aliases_varieties(modname, code, canonical_name, data)
		
		if lang_type and not (lang_type == "regular" or lang_type == "reconstructed" or lang_type == "appendix-constructed") then
			discrepancy(modname, "%s (<code>%s</code>) is of an invalid type <code>%s</code>.", link(canonical_name), code, data.type)
		end
		
		if data.scripts then
			check_array(modname, code, data, "scripts")
			if not data.scripts[1] then
				discrepancy(modname, "%s (<code>%s</code>) has no scripts listed.", link(canonical_name), code)
			else
				for _, sccode in ipairs(data.scripts) do
					if not m_script_data[sccode] then
						discrepancy(modname,
							"%s (<code>%s</code>) lists an invalid script code <code>%s</code>.",
							link(canonical_name), code, sccode)
					end
		
					nonempty_scripts[sccode] = true
				end
			end
		end
		
		if data.ancestors then
			check_ancestors(modname, code, data, data.ancestors, false)
		end
		
		if data[3] then
			local family = data[3]
			if not m_family_data[family] then
				discrepancy(modname,
					"%s (<code>%s</code>) has an invalid family code <code>%s</code>.",
					link(canonical_name), code, family)
			end
			
			nonempty_families[family] = true
		end
		
		if data.sort_key then
			check_entry_name_or_sortkey(modname, code, data, "sort_key")
		end
		
		if data.entry_name then
			check_entry_name_or_sortkey(modname, code, data, "entry_name")
		end

		if data.standardChars then
			validate_pattern(data.standardChars, modname, code, data, true)
		end
	end
	
	-- Check two-letter codes
	local modname = "languages/data2"
	local data2 = require("Модул:" .. modname)
	
	for code, data in pairs(data2) do
		if not code:find("^[a-z][a-z]$") then
			discrepancy(modname, '%s (<code>%s</code>) does not have a two-letter code.', link(data[1] or data.canonicalName), code)
		end
		
		check_language(modname, code, data)
	end
	
	-- Check three-letter codes
	for i = string.byte('a'), string.byte('z') do
		local letter = string.char(i)
		local modname = "languages/data3/" .. letter
		local data3 = require("Модул:" .. modname)
		local code_pattern = "^" .. letter .. "[a-z][a-z]$"
		
		for code, data in pairs(data3) do
			if not code:find(code_pattern) then
				discrepancy(modname,
					'%s (<code>%s</code>) does not have a three-letter code starting with "<code>%s</code>".',
					link(data[1] or data.canonicalName), code, letter)
			end
			
			check_language(modname, code, data)
		end
	end
	
	-- Check exceptional codes
	modname = "languages/datax"
	local datax = require("Модул:" .. modname)
	
	for code, data in pairs(datax) do
		if code:find("^[a-z][a-z][a-z]?$") then
			discrepancy(modname, '%s (<code>%s</code>) has a two- or three-letter code.', link(data[1] or data.canonicalName), code)
		end
		
		check_language(modname, code, data)
	end
	
	-- These checks must be done while all_codes only contains language codes:
	-- that is, after language data modules have been processed, but before
	-- etymology languages, families, and scripts have.
	local function check_code_and_name(modname, code, canonical_name)
		if not all_codes[code] then
			if not language_names[canonical_name] then
				discrepancy(modname,
					"The code <code>%s</code> and the canonical name %s should be removed; they are not found in a submodule of [[Module:languages]].",
					code, canonical_name)
			else
				discrepancy(modname,
					"<code>%s</code>, the code for the canonical name %s, is wrong; it should be <code>%s</code>.",
					code, canonical_name, language_names[canonical_name])
			end
		elseif not language_names[canonical_name] then
			local data_table = require("Модул:" .. all_codes[code])[code]
			discrepancy(modname,
				"%s, the canonical name for the code <code>%s</code>, is wrong; it should be %s.",
				canonical_name, code, data_table[1] or data_table.canonicalName)
		end
	end
	
	for code, canonical_name in pairs(m_language_codes) do
		check_code_and_name("languages/code to canonical name", code, canonical_name)
	end
	
	for canonical_name, code in pairs(m_language_canonical_names) do
		check_code_and_name("languages/canonical names", code, canonical_name)
	end		
end

local function check_etym_languages()
	local modname = "etymology languages/data"
	
	local check_etymology_language_data_keys = check_data_keys{
		"canonicalName", "otherNames", "aliases", "varieties", "parent",
		"wikipedia_article", "wikidata_item", "ancestors"
	}
	
	local function link(name)
		if not name then
			return "???"
		elseif name:find("[Јј]език$") then
			return name
		else
			return name .. " језик"
		end
	end
	
	for code, data in pairs(m_etym_language_data) do
		local canonical_name, parent, ancestors =
			data.canonicalName, data.parent, data.ancestors
		check_etymology_language_data_keys(modname, code, data)
		
		if all_codes[code] then
			discrepancy(modname, "Code <code>%s</code> is not unique; it is also defined in [[Модул:%s]].", code, all_codes[code])
		else
			all_codes[code] = modname
		end
		
		if not canonical_name then
			discrepancy(modname, "Code <code>%s</code> has no canonical name specified.", code)
		elseif language_names[canonical_name] then
			--[=[
			discrepancy(modname,
				"%s (<code>%s</code>) has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
				link(data.names[1]), code, language_names[data.names[1]])
			--]=]
		else
			language_names[canonical_name] = code
		end
		
		check_other_names_aliases_varieties(modname, code, canonical_name, data)
		
		if parent then
			if type(parent) ~= "string" then
				discrepancy(modname,
					"Etymology-only %s (<code>%s</code>) has a parent language or family code that is %s rather than a string.",
					link(canonical_name), code, parent == nil and "nil" or "a " .. type(parent))
			elseif not (m_language_data[parent] or m_family_data[parent] or m_etym_language_data[parent]) then
				discrepancy(modname,
					"Etymology-only %s (<code>%s</code>) has invalid parent language or family code <code>%s</code>.",
					link(canonical_name), code, parent)
			end
			
			nonempty_families[parent] = true
		else
			discrepancy(modname,
				"Etymology-only %s (<code>%s</code>) has no parent language or family code.",
				link(canonical_name), code)
		end
		
		if ancestors then
			check_ancestors(modname, code, data, ancestors, true)
		end
	end

	local checked = {}
	for code, data in pairs(m_etym_language_data) do
		local stack = {}

		while data do
			if checked[data] then
				break	
			end
			if stack[data] then
				discrepancy(modname, "%s (<code>%s</code>) has a cyclic parental relationship to %s (<code>%s</code>)",
					link(data[1] or data.canonicalName), code,
					link(m_etym_language_data[data.parent].canonicalName), data.parent
				)
				break
			end
			stack[data] = true
			code, data = data.parent, data.parent and m_etym_language_data[data.parent]
		end
		
		for data in pairs(stack) do
			checked[data] = true	
		end
	end
end

local function check_families()
	local modname = "families/data"
	
	local check_family_data_keys = check_data_keys{
		"canonicalName", "otherNames", "aliases", "varieties", "family",
		"protoLanguage", "wikidata_item"
	}

	local function link(name)
		if not name then
			return "???"
		elseif name:find("[Ll]anguages$") then
			return "[[:Category:" .. name .. "|" .. name .. " family]]"
		else
			return "[[:Category:" .. name .. " languages|" .. name .. " family]]"
		end
	end
	
	for code, data in pairs(m_family_data) do
		check_family_data_keys(modname, code, data)
		
		if all_codes[code] then
			discrepancy(modname, "Code <code>%s</code> is not unique; it is also defined in [[Модул:%s]].", code, all_codes[code])
		else
			all_codes[code] = modname
		end
		
		if not data.canonicalName then
			discrepancy(modname, "<code>%s</code> has no canonical name specified.", code)
		elseif family_names[data.canonicalName] then
			discrepancy(modname,
				"%s (<code>%s</code>) has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
				link(data.canonicalName), code, family_names[data.canonicalName])
		else
			family_names[data.canonicalName] = code
		end
		
		check_other_names_aliases_varieties(modname, code, data.canonicalName, data)
		
		if data.family then
			if data.family == code and code ~= "qfa-not" then
				discrepancy(modname,
					"%s (<code>%s</code>) has itself as its family.",
					link(data.canonicalName), code)
			elseif not m_family_data[data.family] then
				discrepancy(modname,
					"%s (<code>%s</code>) has an invalid parent family code <code>%s</code>.",
					link(data.canonicalName), code, data.family)
			end
			
			nonempty_families[data.family] = true
		end
	end
	
	for code, data in pairs(m_family_data) do
		if not (nonempty_families[code] or allowed_empty_families[code]) then
			discrepancy(modname, "%s (<code>%s</code>) has no child families or languages.", link(data.canonicalName), code)
		end
	end

	local checked = { ['qfa-not'] = true }
	for code, data in pairs(m_family_data) do
		local stack = {}

		while data do
			if checked[code] then
				break	
			end
			if stack[code] then
				discrepancy(modname, "%s (<code>%s</code>) has a cyclic parental relationship to %s (<code>%s</code>)",
					link(data.canonicalName), code,
					link(m_family_data[data.family].canonicalName), data.family
				)
				break
			end
			stack[code] = true
			-- Family data keeps the parent family under the "family" key, not at index 3.
			code, data = data.family, data.family and m_family_data[data.family]
		end
		
		for code in pairs(stack) do
			checked[code] = true	
		end
	end
end

local function check_scripts()
	local modname = "scripts/data"
	
	local check_script_data_keys = check_data_keys({
		"canonicalName", "otherNames", "aliases", "varieties", "parent",
		"systems", "wikipedia_article", "characters", "direction",
		"character_category",
	}, true)
	
	local m_script_codes = require('Модул:scripts/code to canonical name')
	local m_script_canonical_names = require('Модул:scripts/by name')
	
	for code, data in pairs(m_script_data) do
		local canonical_name = data.canonicalName
		if not m_script_codes[code] and #code == 4 then
			discrepancy('scripts/code to canonical name', '<code>%s</code> (%s) is missing', code, canonical_name)
		end
		
		check_script_data_keys(modname, code, data)
		
		if not canonical_name then
			discrepancy(modname, "Code <code>%s</code> has no canonical name specified.", code)
		elseif script_names[canonical_name] then
			--[=[
			discrepancy(modname,
				"%s (<code>%s</code>) has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
				link_script(canonical_name), code, script_names[canonical_name])
			--]=]
		else
			if not m_script_canonical_names[canonical_name] and #code == 4 then
				discrepancy('scripts/by name', '%s (<code>%s</code>) is missing', canonical_name, code)
			end
			script_names[canonical_name] = code
		end
		
		check_other_names_aliases_varieties(modname, code, canonical_name, data)
		
		if not nonempty_scripts[code] then
			discrepancy(modname,
				"%s (<code>%s</code>) is not used by any language%s.",
				link_script(canonical_name), code, data.characters and ""
					or " and has no characters listed for auto-detection")
		--[[
		elseif not data.characters then
			discrepancy(modname, "%s (<code>%s</code>) has no characters listed for auto-detection.", link_script(canonical_name), code)
		--]]
		end

		if data.characters then
			validate_pattern(data.characters, modname, code, data, false)
		end
	end
end

-- Warning: cannot be called twice in the same module invocation because
-- some module-global variables are not reset between calls.
function export.do_checks()
	messages = setmetatable({}, {
		__index = function (self, k)
			local val = Array()
			self[k] = val
			return val
		end
	})
	
	check_languages()
	check_etym_languages()

	-- families and scripts must be checked AFTER languages; languages checks fill out
	-- the nonempty_families and nonempty_scripts tables, used for testing if a family/script
	-- is ever used in the data
	check_families()
	check_scripts()
	
	setmetatable(messages, nil)
	
	local function find_code(message)
		return string.match(message, "<code>([^<]+)</code>")
	end
	
	find_code = require("Модул:fun").memoize(find_code)
	
	local function comp(message1, message2)
		local code1, code2 = find_code(message1), find_code(message2)
		if code1 and code2 then
			return code1 < code2
		else
			return message1 < message2
		end
	end
	
	for modname, msglist in pairs(messages) do
		msglist:sort(comp)
	end
	
	local ret = messages
	messages = nil
	return ret
end

function export.format_message(modname, msglist)
	return '===[[Модул:' .. modname .. ']]==='
		.. msglist
			:map(
				function(msg)
					return "\n* " .. msg
				end)
			:concat()
end
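
-- Illustrative sketch of the wikitext produced by format_message above, assuming
-- modname is "scripts/data" and msglist holds two messages (the message
-- texts here are placeholders, not real discrepancies):
--
--   ===[[Модул:scripts/data]]===
--   * first message
--   * second message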

function export.check_modules(...)
	local ret = Array()
	local messages = export.do_checks()
	for _, modname in ipairs {...} do
		local msglist = messages[modname]
		if msglist then
			ret:insert(export.format_message(modname, msglist))
		end
	end
	return ret:concat("\n")
end

function export.check_modules_t(frame)
	local args = m_table.shallowcopy(frame.args)
	return export.check_modules(unpack(args))
end
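
-- A minimal usage sketch for check_modules_t above, assuming this module is
-- saved under the title "data consistency check" (taken from the page header,
-- not verified here). Each positional argument names a data module, such as
-- the "scripts/data" key used by check_scripts:
--
--   {{#invoke:data consistency check|check_modules_t|scripts/data}}
--
-- The call should return one "===[[Модул:...]]===" section for each named
-- module that has discrepancies, and an empty string if none of them has any.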

function export.perform(frame)
	local messages = export.do_checks()
	
	-- Format the messages
	local ret = Array()
	for modname, msglist in m_table.sortedPairs(messages) do
		ret:insert(export.format_message(modname, msglist))
	end
	
	-- Are there any messages?
	if #ret == 0 then
		return '<b class="success">Glory to Arstotzka.</b>'
	else
		ret:insert(1, '<b class="warning">Discrepancies detected:</b>')
		
		return ret:concat('\n')
	end
end

return export