AutoLEX: An Automatic Framework for Linguistic Exploration

AutoLEX is a tool for exploring language structure. It provides an automated framework for extracting a first-pass grammatical specification from raw text, in a concise format that is both human- and machine-readable.

In addition to the language structure, we provide rules that support vocabulary learning; these rules are also extracted automatically.

We apply our framework to all languages of the Surface-Syntactic Universal Dependencies (SUD) project.
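AutoLEX's own extraction pipeline is not described in this section. As a rough illustration of the kind of word-order statistic that can be read off a treebank, the Python sketch below counts how often subjects precede or follow their syntactic head in CoNLL-U data. The helper name `subject_head_order` and the choice to treat the UD label `nsubj` and the SUD label `subj` as "subject" are assumptions made for this example; it is not AutoLEX's API or method.

```python
from collections import Counter

# Subject labels to look for: 'nsubj' is the UD label, 'subj' the SUD label.
# Treating exactly these as "subjects" is an assumption of this sketch.
SUBJECT_LABELS = {"nsubj", "subj"}


def subject_head_order(conllu_text: str) -> Counter:
    """Count how often subjects precede or follow their syntactic head.

    Hypothetical helper for illustration only; it is not part of AutoLEX.
    Input is treebank text in CoNLL-U format (10 tab-separated columns).
    """
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip sentence separators and comment lines such as "# text = ...".
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        token_id, head, deprel = cols[0], cols[6], cols[7]
        # Multiword-token ranges ("1-2") and empty nodes ("1.1") have
        # non-integer IDs and no basic head, so skip them.
        if not token_id.isdigit() or not head.isdigit():
            continue
        # Assumed label handling: drop subtypes after ':' and any
        # extension after '@', e.g. "nsubj:pass" -> "nsubj".
        base = deprel.split(":")[0].split("@")[0]
        if base not in SUBJECT_LABELS or head == "0":
            continue
        order = "subj-head" if int(token_id) < int(head) else "head-subj"
        counts[order] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        "# text = Dogs bark.",
        "1\tDogs\tdog\tNOUN\t_\t_\t2\tsubj\t_\t_",
        "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_",
        "",
        "# text = Said she.",
        "1\tSaid\tsay\tVERB\t_\t_\t0\troot\t_\t_",
        "2\tshe\tshe\tPRON\t_\t_\t1\tnsubj\t_\t_",
    ])
    counts = subject_head_order(sample)
    print(counts["subj-head"], counts["head-subj"])  # 1 1
```

Note that in SUD an auxiliary, rather than the lexical verb, can head the subject, so these counts describe subject-versus-head order and not strictly subject-versus-verb order.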

Here are the languages (and treebanks) we currently support.

ISO | Language | Treebank | Linguistic Analysis
----|----------|----------|--------------------
en | English | EWT | General Information, Agreement, Word Order, Case Marking
el | Greek | GDT | Agreement, Word Order, Case Marking, Learn Vocab
es | Spanish | GSD | General Information, Agreement, Word Order, Case Marking, Learn Vocab
mr | Marathi | SAM-EN | General Information, Learn Vocab, Word Order, Suffix Usage, Agreement
kn | Kannada | SAM-EN | General Information, Learn Vocab, Word Order, Suffix Usage, Agreement
fr | French | GSD | Agreement, Word Order, Case Marking
hi | Hindi | HDTB | General Information, Agreement, Word Order, Case Marking
ta | Tamil | TTB | Agreement, Word Order, Case Marking
tr | Turkish | IMST | Agreement, Word Order, Case Marking
ca | Catalan | AnCora | Agreement, Word Order, Case Marking
ur | Urdu | UDTB | Word Order
id | Indonesian | GSD | Word Order, Case Marking
sr | Serbian | SET | Word Order, Case Marking
eu | Basque | BDT | Word Order, Case Marking
lt | Lithuanian | ALKSNIS | Word Order, Case Marking
vi | Vietnamese | VTB | Word Order, Case Marking
fa | Persian | Seraji | Word Order, Case Marking
sme | North Sami | Giella | Word Order, Case Marking
da | Danish | DDT | Word Order, Case Marking
swl | Swedish Sign Language | SSLC | Word Order, Case Marking
ar | Arabic | NYUAD | Word Order, Case Marking
gd | Scottish Gaelic | ARCOSG | Word Order, Case Marking
grc | Ancient Greek | PROIEL | Word Order, Case Marking
de | German | HDT | Word Order, Case Marking
it | Italian | VIT | Word Order, Case Marking
no | Norwegian | Nynorsk | Word Order, Case Marking
ug | Uyghur | UDT | Word Order, Case Marking
ro | Romanian | Nonstandard | Word Order, Case Marking
bg | Bulgarian | BTB | Word Order, Case Marking
gl | Galician | CTG | Word Order, Case Marking
cs | Czech | PDT | Word Order, Case Marking
fi | Finnish | TDT | Word Order, Case Marking
pl | Polish | PDB | Word Order, Case Marking
la | Latin | ITTB | Word Order, Case Marking
zh | Chinese | GSD | Word Order, Case Marking
nl | Dutch | Alpino | Word Order, Case Marking
te | Telugu | MTG | Word Order, Case Marking
mt | Maltese | MUDT | Word Order, Case Marking
wo | Wolof | WTB | Word Order, Case Marking
ja | Japanese | BCCWJ | Word Order, Case Marking
orv | Old Russian | TOROT | Word Order, Case Marking
he | Hebrew | HTB | Word Order, Case Marking
pt | Portuguese | GSD | Word Order, Case Marking
cu | Old Church Slavonic | PROIEL | Word Order, Case Marking
olo | Livvi | KKPP | Word Order, Case Marking
be | Belarusian | HSE | Word Order, Case Marking
ko | Korean | Kaist | Word Order, Case Marking
sv | Swedish | LinES | Word Order, Case Marking
uk | Ukrainian | IU | Word Order, Case Marking
bxr | Buryat | BDT | Word Order, Case Marking
kmr | Kurmanji | MG | Word Order, Case Marking
ga | Irish | IDT | Word Order, Case Marking
sk | Slovak | SNK | Word Order, Case Marking
hu | Hungarian | Szeged | Word Order, Case Marking
got | Gothic | PROIEL | Word Order, Case Marking
hr | Croatian | SET | Word Order, Case Marking
lzh | Classical Chinese | Kyoto | Word Order, Case Marking
cop | Coptic | Scriptorium | Word Order, Case Marking
lv | Latvian | LVTB | Word Order, Case Marking
kk | Kazakh | KTB | Word Order, Case Marking
et | Estonian | EDT | Word Order, Case Marking
fro | Old French | SRCMF | Word Order, Case Marking
hsb | Upper Sorbian | UFAL | Word Order, Case Marking
af | Afrikaans | AfriBooms | Word Order, Case Marking
hy | Armenian | ArmTDP | Word Order, Case Marking