We all know that structural features of languages exhibit non-random areal patterning that is often quite independent of genetic affiliation. On the model of the Balkan Sprachbund, quite a few other linguistic areas have been discovered in other parts of the world on the basis of phonological, morphosyntactic and perhaps also lexical features. As we have learned more recently, such patterning may also be found at the global level. We believe that linguists have now assembled enough information, both in the form of good reference grammars and in the form of global cross-linguistic studies, that we can begin thinking about making this information available to a wider public in the form of an atlas.
This Atlas will show structural features of languages in much the same way as linguistic data are displayed in dialect atlases. It will, so to speak, show us the isoglosses of the dialects of Human Language. We envisage an Atlas with about 100 structural features, each shown on a two-page global map and accompanied by a two-page description and discussion of the feature. To make areal patterns visible, each feature needs to be mapped for at least 150 languages, and ideally more than 200. In addition to the printed version, we envisage a fully searchable CD-ROM version.
The World Atlas of Language Structures will provide a research tool for typologists and linguists with areal interests. It is hoped that it will also make linguists (and interested lay persons) aware of the structural/grammatical/typological diversity of human languages on a global scale. Although most linguists probably know in principle that thousands of languages are (still) spoken, many would profit enormously from a publication that makes many of the widely discussed typological parameters salient in colored maps.
Examples of features that we hope will be represented are:
-- number of consonants
-- number of vowels
-- front rounded vowels
-- clicks
-- interdentals
-- number of lexical tones
-- position of lexical stress
-- syllable structure (CV, CVC, CCV, etc.)
-- suffixing vs. prefixing
-- root-and-pattern morphology
-- reduplication
-- echo compounds
-- articles
-- gender: number of distinct genders
-- nominal number (plural, dual, etc.)
-- numeral classifiers
-- numeral systems
-- 1st person plural exclusive pronoun
-- negation: morphological/periphrastic
-- order of S,V,O
-- noun-adjective order
-- noun-genitive order
-- noun-relative clause order
-- noun-adposition order
-- number of cases
-- clause alignment: accusative/ergative/active
-- overt WH movement
-- adjectives: nouny/verby
-- instrumental/comitative syncretism
-- alienable/inalienable possession
-- copula with nominal/adjectival/locative predication
-- question particles: YNQ, others
-- color terms
Features should include some that are traditionally considered interesting by lay people (e.g. tone, gender). In addition, features that are known to have a restricted distribution (root-and-pattern morphology, clicks) should be included, to make their anomalous patterning even more striking.
The scale of the project, with at least 150 languages required for a map, means that the features considered will have to be rather shallow. For instance, it will hardly be possible at this stage of our knowledge to include maps on long-distance reflexives, parasitic gaps and intonation patterns. But even the more shallow features that can be mapped will often be highly relevant to current theoretical concerns, and the conspicuous gaps of the Atlas will hopefully engender more enthusiasm for filling the gaps in our knowledge by writing detailed grammars of little-known languages all over the world.
Preliminary list of features and authors
List of language experts who have agreed to assist contributors
1. Introduction
The World Atlas of Language Structures (WALS) will consist of world-wide maps showing the geographical distribution of approximately 100 structural features of languages (phonological, morphosyntactic, lexical), plus an Introduction and a General Section. Each feature will be the responsibility of a single author (or a single set of co-authors), and accordingly each contribution will be called a "Chapter". (Of course, authors can contribute several chapters to the Atlas.)
The regular chapters will follow a uniform format. Each will take up four pages in the Atlas: two for the world-wide map, and two text pages that describe the structural feature and discuss the emerging areal pattern(s) (if any).
In addition to the printed version, there will be a CD-ROM version of the Atlas.
2. Language sample
Ideally, the Atlas would describe the features in a set of languages that is uniform for all the chapters. Since this is very difficult to achieve, we have opted for a compromise model:
There will be a core sample of 100 languages, listed in the Appendix of these Guidelines. We strongly recommend that authors provide data for these 100 languages, if at all possible. Beyond that, individual authors' samples may diverge.
However, individual samples must be sufficiently large and representative. Authors should strive for a sample size of about 200 languages, though the absolute minimum, which may be acceptable for a few particularly difficult but interesting features, is 100 languages. More than 200 languages are of course always possible and welcome.
In selecting the additional languages of their sample, authors should strive to choose languages more or less evenly from different families and geographical areas. The following numbers of languages from different areas in the 100-language core sample may provide a general idea of what seem to be suitable proportions: Africa: 18; Europe and Mainland Asia: 25; Insular Southeast Asia and Pacific, excluding New Guinea and Australia: 8; New Guinea: 9; Australia: 7; North America, including Mexico and Central America: 19; South America: 14.
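As a rough illustration of how the core-sample proportions might guide a larger sample, the following Python sketch scales the seven area counts listed above to an arbitrary sample size. It is only a sketch: the function name `scaled_area_targets`, the shortened area labels, and the largest-remainder rounding are assumptions made here, not part of the Guidelines, and the resulting numbers are meant as orientation rather than quotas.

```python
# Area counts in the 100-language core sample, as listed in the Guidelines.
# The area labels are shortened here for readability.
CORE_AREA_COUNTS = {
    "Africa": 18,
    "Europe and Mainland Asia": 25,
    "Insular Southeast Asia and Pacific": 8,
    "New Guinea": 9,
    "Australia": 7,
    "North America": 19,
    "South America": 14,
}


def scaled_area_targets(sample_size, core_counts=CORE_AREA_COUNTS):
    """Scale the core-sample proportions to a sample of `sample_size` languages.

    Uses largest-remainder rounding (an assumption of this sketch) so that
    the per-area targets always add up to exactly `sample_size`.
    """
    total = sum(core_counts.values())
    targets, remainders = {}, {}
    for area, count in core_counts.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        targets[area], remainders[area] = divmod(count * sample_size, total)
    leftover = sample_size - sum(targets.values())
    # Hand the remaining slots to the areas with the largest remainders.
    for area in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        targets[area] += 1
    return targets


if __name__ == "__main__":
    for area, target in scaled_area_targets(200).items():
        print(f"{area}: {target}")
```

For a 200-language sample this simply doubles each core count (e.g. 36 languages for Africa); for sizes that do not divide evenly, the rounding rule decides which areas receive the extra languages.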
The most important requirement is that no large area should be seriously underrepresented. For instance, a map with no Australian or no Mesoamerican language is unacceptable, even if it contains dozens of Papuan and Amazonian languages. Overrepresentation is acceptable in areas with a low degree of linguistic or phylogenetic diversity, e.g. in northern Eurasia and Bantu Africa. A map with data for (say) Swahili, Makua, Zulu, Herero, Kinyarwanda, Duala and Shona is acceptable even though it overrepresents Bantu, because otherwise southern Africa would look fairly empty. However, authors should try not to include too many languages that are spoken in close vicinity, because these are difficult to represent on a world-wide map, and in such cases some languages may have to be excluded from the printed version. (For the CD-ROM version, no such limitation exists.)
Geographical closeness need not be a major consideration for the six areas of the Caucasus, California, southern Mexico, southeastern Colombia, New Guinea and northern Australia, because inset maps will probably be necessary for these areas anyway.
In addition to the 100-language sample, the editors propose an additional 100 languages that might be included in a 200-language sample for the Atlas, with the understanding that this additional list is only being included to assist authors who do not already have samples and who could benefit from suggestions as to what languages to include beyond the basic 100 languages.
3. Nature of structural features
3.1. Number of feature values. The simplest feature has two possible feature values, e.g. "language L has/lacks X", "L has word order XV or VX", etc. However, many maps will describe features with more possible values. For instance, a map on polar questions may have the feature values "Only intonation", "Initial question particle", "Final question particle", "Verb fronting", "Special verbal mood". If the feature values are just an unstructured list, then not more than four or five different feature values should be distinguished. (This depends to some extent on the geographical distribution: if all five values occur scattered over the map, then the map will be very difficult to read; if two of the five feature values are restricted to very specific regions, then five feature values may be quite OK.)
3.2. Relationship between feature values. If the feature values can be ordered on a scale (e.g. 2 vowels < 3 vowels < 4 vowels < 5-6 vowels < more than 7 vowels), they will be coded in an iconic way by the map-makers, e.g. different shades of blue, with the darkest blue representing the greatest number of vowels, and light blue representing the smallest number. In such cases it may be possible to have more than 4-5 feature values.
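The iconic coding of an ordered scale can be pictured with a small Python sketch that assigns each value a shade between a light and a dark blue. This is only an illustration of the idea: the function name `blue_shades` and the two endpoint colors are assumptions made here, and the actual colors will be chosen by the map-makers.

```python
def blue_shades(ordered_values):
    """Map scale-ordered feature values to hex colors, light blue to dark blue.

    The first value gets the lightest shade and the last value the darkest.
    The two RGB endpoints are arbitrary choices made for this sketch.
    """
    light = (198, 219, 239)  # assumed light blue
    dark = (8, 48, 107)      # assumed dark blue
    steps = max(len(ordered_values) - 1, 1)
    shades = {}
    for i, value in enumerate(ordered_values):
        # Linear interpolation between the two endpoints, channel by channel.
        rgb = tuple(round(lo + (hi - lo) * i / steps) for lo, hi in zip(light, dark))
        shades[value] = "#{:02x}{:02x}{:02x}".format(*rgb)
    return shades


# The vowel-inventory scale used as an example in the text.
VOWEL_SCALE = ["2 vowels", "3 vowels", "4 vowels", "5-6 vowels", "more than 7 vowels"]

if __name__ == "__main__":
    for value, color in blue_shades(VOWEL_SCALE).items():
        print(f"{value}: {color}")
```

Because the shades follow the order of the scale, a reader can compare neighboring languages at a glance, which is why more than four or five values may remain legible in this case.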
In many cases a feature is apparently relevant only to a subset of languages. For instance, a map on the dual may have the feature values "language has dual in pronouns" and "language has dual in pronouns and full nouns". This feature does not apply to dual-less languages, but they will be shown on the map as well, and assigned the feature value "language has no dual at all". Thus, all features will be broken up into different values in such a way that all languages can be assigned a feature value.
4. Text pages
The two text pages consist of a text comprising at least two main parts: a Descriptive Part and an Analytical Part. The Descriptive Part describes the feature and gives examples from a handful of languages. It provides precise definitions of the feature values and discusses difficulties in identifying the correct feature value for a given language. In many cases, a full description and discussion will not be possible, so bibliographical references to the theoretical and typological literature (often to the author's own work published elsewhere) are important here. This part should not be too technical and should in principle be accessible to readers who are not specialists in grammatical typology.
The Analytical Part discusses the results of the geographical mapping. It points out random distributions and areal clusterings, tries to separate genetic effects from true Sprachbund-like patterns, attempts explanations of observed patterns, mentions possible correlations (e.g. large vowel systems in Africa correlating with vowel harmony systems), discusses earlier claims in the literature, etc.
There should also be space here for blow-up maps of smaller areas (such as the Balkans, or Mesoamerica, or any other area that is particularly interesting for a particular feature, or that the author happens to be interested in), tables, or other visual material that makes the text pages look less technical than an average journal paper.
5. Database
The database must consist of at least three fields: Language Name, Feature Value, and Bibliographical Reference (author/year and, if possible, page number). For some feature values it may be realistic to have a fourth field: Example (e.g. the shape of the definite article in a map of the feature "Existence of definite articles"). This fourth field is optional; the option will primarily be taken by authors who already have the relevant data.
The information from the third (and fourth) field is relevant only for the CD-ROM version.
The database must be submitted both as a computer copy (on a floppy disk or as an e-mail message) and as a hard copy. It should not be a document in a database program, but either a text (ASCII) document or a Microsoft Word document. Thus, data in a database program should be exported as a text document before they are submitted. For the fourth field, the hard copy may be the only useful version because of nonstandard characters.
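The three required fields and the optional fourth can be pictured as one line of plain text per language. The Python sketch below assumes a tab-delimited export, which the Guidelines do not prescribe; the names `Record` and `parse_records` and the sample rows (including the placeholder references) are purely illustrative.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """One database row: three required fields plus the optional Example."""
    language: str
    feature_value: str
    reference: str
    example: Optional[str] = None


def parse_records(text):
    """Parse a plain-text export with one tab-separated record per line.

    The tab delimiter is an assumption of this sketch. Blank lines are
    skipped; a line with a missing required field raises ValueError.
    """
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) not in (3, 4) or not all(fields[:3]):
            raise ValueError(f"line {line_no}: expected 3 or 4 tab-separated fields")
        records.append(Record(*fields))
    return records


# Hypothetical rows; the references are placeholders, not real citations.
SAMPLE = (
    "English\tDefinite article present\tAuthor (year): page\tthe\n"
    "Language B\tNo definite article\tAuthor (year): page\n"
)

if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record)
```

A check of this kind, run before submission, would catch rows where the Feature Value or the Bibliographical Reference has been left out.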
6. Deadlines
6.1. The Proposal. Coordinating about 100 chapters from many different authors and turning them into a formally coherent Atlas is a formidable task. To help them in this task, the editors need some advance information from the authors: a provisional outline of the content of their chapter(s), called the "Proposal". The Proposal should contain:
(i) a brief description of the feature to be mapped in the chapter
(ii) a list of the feature values that will be used along with preliminary definitions
In the case of potential authors who are not starting from scratch, a brief description of existing work (published or unpublished) would be very useful for the editors.
Authors should send the Proposal as soon as possible. The editors will examine the received Proposals and discuss possible problems with the authors. If there are two Proposals for the same feature, the Proposal that was sent in first will generally be accepted.
6.2. The Chapter. The deadline for the first draft of Chapters is 31 December 2000. Authors who are not starting from scratch are encouraged to send in their Chapter(s) before this date, because it is not easy for the editors to process everything simultaneously.
The first draft minimally contains the database (i.e. languages paired with feature values) and the Descriptive Part of the text pages. Authors are not required to create a map on their own on the basis of their data.
The editors will create a map from the data provided by the author and send it to the author as soon as possible. When the author gets this map and the editors' comments, s/he can complete his/her Chapter, i.e. finish the text pages (add the Analytical Part) and add or correct data. The final deadline is 31 December 2001.
Appendix: The samples
100-Language Sample:
Abkhaz, Acoma, Alamblak, Amele, Apurina, Asmat, Bagirmi, Barasano, Basque, Bukiyip, Burmese, Burushaski, Canela-Kraho, Chalcatongo Mixtec, Chamorro, Chukchi, Copainala Zoque, Cree, Daga, Egyptian Arabic, English, Fijian, Finnish, French, Georgian, German, Gooniyandi, Grebo, Greek (Modern), Guarani, Harar Oromo, Hausa, Hebrew, Hindi, Hixkaryana, Hmong Njua, Imbabura Quechua, Imonda, Indonesian, Jakaltek, Japanese, Koyraboro Senni Songhay, Kannada, Karok, Kayardild, Kewa, Khalkha, Kiowa, Koasati, Korean, Krongo, Kutenai, Lakota, Lango, Lavukaleve, Lezgian, Lower Grand Valley Dani, Luvale, Makah, Malagasy, Mandarin Chinese, Mangarayi, Mapuche, Maricopa, Martuthunira, Mataco, Maung, Maybrat, Meithei, Nama (Khoekhoe), Ngiyambaa, Oneida, Otomi, Paiwan, Persian, Piraha, Rama, Rapanui, Russian, Sango, Sanuma, Slave, Spanish, Supyire, Swahili, Tagalog, Tamazight (Ayt Ndhir dialect), Thai, Tiwi, Tukang Besi, Turkish, Vietnamese, Warao, Wari, West Greenlandic, Wichita, Yagua, Yaqui, Yoruba, Zulu.
Additional languages in 200-Language Sample:
Abipon, Ainu, Araona, Armenian, Awa Pit, Aymara, Bambara, Bawm, Beja, Brahui, Bribri, Cahuilla, Cambodian, Carib, Cayuvava, Coast Tsimshian, Comanche, Dehu, Diola-Fogny, Dongolese Nubian, Ekari, Epena Pedee, Evenki, Ewe, Fur, Garo, Haida, Hanis Coos, Hungarian, Hunzib, Igbo, Ika, Ingush, Iraqw, Irish, Kanuri, Kapau, Karo Batak, Kawesqar, Kayah Li, Kera, Ket, Khasi, Khmu, Kilivila, Kiribatese, Kobon, Kongo, Koromfe, Kunama, Ladakhi, Lak, Latvian, Lealao Chinantec, Lepcha, Maba, Maori, Maranungku, Marind, Mundari, Murle, Navajo, Ndyuka, Nenets, Nez Perce, Ngiti, Nivkh, Nkore-Kiga, Nunggubuyu, Paamese, Passamaquoddy, Paumari, Pitjantjatjara, Selknam, Semelai, Sentani, Shipibo-Konibo, Sierra Miwok, Southeastern Pomo, Squamish, Suena, Taba, Tetelcingo Nahuatl, Tlingit, Trumai, Tunica, Una, Ungarinjin, Urubu-Kaapor, Usan, Wambaya, Wardaman, Witoto, Yidiny, Yimas, Yuchi, Yukaghir, Yup'ik, Yurok, !Xu (Ju/'hoan).