AutoLEX: An Automatic Framework for Linguistic Exploration

AutoLEX is a tool for exploring language structure. It provides an automated framework for extracting a first-pass grammatical specification from raw text in a concise, human- and machine-readable format.

Alongside the rules describing language structure, we provide rules to help with vocabulary learning, which are also extracted automatically.

We apply our framework to all languages of the Syntactic Universal Dependencies project.
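
To give a feel for the kind of signal a first-pass agreement rule could be built from, here is a minimal sketch that counts how often a dependent and its head share the value of a morphological feature in a CoNLL-U sentence. It is not AutoLEX's implementation: the function names (`parse_feats`, `read_tokens`, `agreement_counts`) and the hand-written sample sentence are assumptions made for illustration, and the sketch handles a single sentence only.

```python
from collections import Counter

# Hand-written toy sentence in CoNLL-U format (illustrative, not from a treebank).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE = """\
1\tLa\tel\tDET\t_\tGender=Fem|Number=Sing\t2\tdet\t_\t_
2\tcasa\tcasa\tNOUN\t_\tGender=Fem|Number=Sing\t4\tnsubj\t_\t_
3\tes\tser\tAUX\t_\tNumber=Sing|Person=3\t4\tcop\t_\t_
4\tblanca\tblanco\tADJ\t_\tGender=Fem|Number=Sing\t0\troot\t_\t_
"""


def parse_feats(field):
    """Turn 'Gender=Fem|Number=Sing' into a dict; '_' means no features."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def read_tokens(conllu):
    """Read one sentence into {token id: token info}."""
    tokens = {}
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        tokens[int(cols[0])] = {
            "feats": parse_feats(cols[5]),
            "head": int(cols[6]),
            "deprel": cols[7],
        }
    return tokens


def agreement_counts(tokens, feature):
    """Count (deprel, agrees?) pairs where both dependent and head carry the feature."""
    counts = Counter()
    for tok in tokens.values():
        head = tokens.get(tok["head"])  # None for the root (HEAD = 0)
        if head is None:
            continue
        if feature in tok["feats"] and feature in head["feats"]:
            agrees = tok["feats"][feature] == head["feats"][feature]
            counts[(tok["deprel"], agrees)] += 1
    return counts


if __name__ == "__main__":
    print(agreement_counts(read_tokens(SAMPLE), "Gender"))
```

Aggregated over a whole treebank, counts like these indicate for which relations a feature tends to match between head and dependent; turning them into concise rules is the part the framework automates.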

Here are the languages (and treebanks) we currently support.

ISO   Language             Treebank      Linguistic Analysis
en    English              EWT           General Information, Agreement, WordOrder, CaseMarking
el    Greek                GDT           General Information, Agreement, WordOrder, CaseMarking, Learn Vocab
es    Spanish              GSD           General Information, Agreement, WordOrder, CaseMarking, Learn Vocab
mr    Marathi              SAM-EN        General Information, Learn Vocab, WordOrder, Suffix Usage, Agreement
kn    Kannada              SAM-EN        General Information, Learn Vocab, WordOrder, Suffix Usage, Agreement
fr    French               GSD           General Information, Agreement, WordOrder, CaseMarking
hi    Hindi                HDTB          General Information, Agreement, WordOrder, CaseMarking
it    Italian              VIT           General Information, Agreement, WordOrder, CaseMarking
tr    Turkish              IMST          General Information, Agreement, WordOrder, CaseMarking
ru    Russian              SynTagRus     General Information, Agreement, WordOrder, CaseMarking
no    Norwegian            Nynorsk       General Information, Agreement, WordOrder, CaseMarking
ug    Uyghur               UDT           General Information, Agreement, WordOrder, CaseMarking
ro    Romanian             Nonstandard   General Information, Agreement, WordOrder, CaseMarking
bg    Bulgarian            BTB           General Information, Agreement, WordOrder, CaseMarking
gl    Galician             CTG           General Information, Agreement, WordOrder, CaseMarking
cs    Czech                PUD           General Information, Agreement, WordOrder, CaseMarking
fi    Finnish              TDT           General Information, Agreement, WordOrder, CaseMarking
pl    Polish               PDB           General Information, Agreement, WordOrder, CaseMarking
la    Latin                ITTB          General Information, Agreement, WordOrder, CaseMarking
sl    Slovenian            SST           General Information, Agreement, WordOrder, CaseMarking
zh    Chinese              GSD           General Information, Agreement, WordOrder, CaseMarking
sms   Skolt Sami           Giellagas     General Information, Agreement, WordOrder, CaseMarking
nl    Dutch                Alpino        General Information, Agreement, WordOrder, CaseMarking
ca    Catalan              AnCora        General Information, Agreement, WordOrder, CaseMarking
bho   Bhojpuri             BHTB          General Information, Agreement, WordOrder, CaseMarking
ur    Urdu                 UDTB          General Information, Agreement, WordOrder, CaseMarking
am    Amharic              ATT           General Information, Agreement, WordOrder, CaseMarking
pcm   Naija                NSC           General Information, Agreement, WordOrder, CaseMarking
kpv   Komi Zyrian          IKDP          General Information, Agreement, WordOrder, CaseMarking
id    Indonesian           GSD           General Information, WordOrder, CaseMarking
sr    Serbian              SET           General Information, Agreement, WordOrder, CaseMarking
eu    Basque               BDT           General Information, Agreement, WordOrder, CaseMarking
lt    Lithuanian           ALKSNIS       General Information, Agreement, WordOrder, CaseMarking
vi    Vietnamese           VTB           General Information, Agreement, WordOrder, CaseMarking
tl    Tagalog              TRG           General Information, Agreement, WordOrder, CaseMarking
fa    Persian              Seraji        General Information, Agreement, WordOrder, CaseMarking
sme   North Sami           Giella        General Information, Agreement, WordOrder, CaseMarking
da    Danish               DDT           General Information, Agreement, WordOrder, CaseMarking
ar    Arabic               NYUAD         General Information, Agreement, WordOrder, CaseMarking
gd    Scottish Gaelic      ARCOSG        General Information, Agreement, WordOrder, CaseMarking
grc   Ancient Greek        PROIEL        General Information, Agreement, WordOrder, CaseMarking
mdf   Moksha               JR            General Information, Agreement, WordOrder, CaseMarking
te    Telugu               MTG           General Information, Agreement, WordOrder, CaseMarking
mt    Maltese              MUDT          General Information, Agreement, WordOrder, CaseMarking
wo    Wolof                WTB           General Information, Agreement, WordOrder, CaseMarking
ja    Japanese             GSD           General Information
aii   Assyrian             AS            General Information, Agreement, WordOrder, CaseMarking
he    Hebrew               HTB           General Information, Agreement, WordOrder, CaseMarking
pt    Portuguese           GSD           General Information, Agreement, WordOrder, CaseMarking
cu    Old Church Slavonic  PROIEL        General Information, Agreement, WordOrder, CaseMarking
olo   Livvi                KKPP          General Information, Agreement, WordOrder, CaseMarking
myv   Erzya                JR            General Information, Agreement, WordOrder, CaseMarking
cy    Welsh                CCG           General Information, Agreement, WordOrder, CaseMarking
be    Belarusian           HSE           General Information, Agreement, WordOrder, CaseMarking
ko    Korean               Kaist         General Information
sv    Swedish              LinES         General Information, Agreement, WordOrder, CaseMarking
de    German               GSD           General Information, Agreement, WordOrder, CaseMarking
uk    Ukrainian            IU            General Information, Agreement, WordOrder, CaseMarking
bxr   Buryat               BDT           General Information, Agreement, WordOrder, CaseMarking
ta    Tamil                TTB           General Information, Agreement, WordOrder, CaseMarking
ga    Irish                IDT           General Information, Agreement, WordOrder, CaseMarking
sk    Slovak               SNK           General Information, Agreement, WordOrder, CaseMarking
hu    Hungarian            Szeged        General Information, Agreement, WordOrder, CaseMarking
got   Gothic               PROIEL        General Information, Agreement, WordOrder, CaseMarking
hr    Croatian             SET           General Information, Agreement, WordOrder, CaseMarking
akk   Akkadian             PISANDUB      General Information, Agreement, WordOrder, CaseMarking
lzh   Classical Chinese    Kyoto         General Information, Agreement, WordOrder, CaseMarking
cop   Coptic               Scriptorium   General Information, Agreement, WordOrder, CaseMarking
lv    Latvian              LVTB          General Information, Agreement, WordOrder, CaseMarking
wbp   Warlpiri             UFAL          General Information, Agreement, WordOrder, CaseMarking
sa    Sanskrit             UFAL          General Information, Agreement, WordOrder, CaseMarking
gun   Mbya Guarani         Thomas        General Information, Agreement, WordOrder, CaseMarking
kk    Kazakh               KTB           General Information, Agreement, WordOrder, CaseMarking
et    Estonian             EDT           General Information, Agreement, WordOrder, CaseMarking
fro   Old French           SRCMF         General Information, WordOrder, CaseMarking
hsb   Upper Sorbian        UFAL          General Information
bm    Bambara              CRB           General Information, WordOrder, CaseMarking
af    Afrikaans            AfriBooms     General Information, Agreement, WordOrder, CaseMarking
yue   Cantonese            HK            General Information, WordOrder, CaseMarking
hy    Armenian             ArmTDP        General Information, WordOrder, CaseMarking
cs    Czech                PDT           General Information, Agreement, WordOrder, CaseMarking