Thursday, August 07, 2014

Superdialects Identified From Twitter Data

A dialect is a particular form of language limited to a specific region or social group. Linguists are fascinated by dialects because they reveal social classes, patterns of immigration and how groups have influenced each other in the past.

But studying dialects is hard work. Traditionally, linguists do this by interviewing a relatively small number of people, typically a few hundred, and asking them to fill out questionnaires. Researchers then use the results to create linguistic atlases, but these are naturally limited by the choice of locations and individuals studied.

Today, Bruno Gonçalves at the University of Toulon in France and David Sánchez at the Institute for Cross-Disciplinary Physics and Complex Systems on the island of Majorca, Spain, say they have found a new way to study dialects on a global scale using messages posted on Twitter. The results reveal a major surprise about the way dialects are distributed around the world and provide a fascinating snapshot of how they are evolving under various new pressures, such as global communication mechanisms like Twitter.

Gonçalves and Sánchez began by sampling all of the tweets written in Spanish over two years that also contained geolocation information. That gave them a database of 50 million geolocated tweets, most of them from Spain, Spanish America, and the United States.

They then searched these tweets for word variations that are indicative of specific dialects. For example, the word for car in Spanish can be auto, automóvil, carro, coche, concho, or movi, with each being more common in different dialects. Different words for bra include ajustador, ajustadores, brasiel, brassiere, corpiño, portaseno, sostén, soutien, sutién, sujetador, and tallador, while variations on computer include computador, computadora, microcomputador, microcomputadora, ordenador, PC, and so on.
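
To make the search step concrete, here is a minimal Python sketch of counting dialect variants in tweet text. It is not the researchers' actual pipeline, which the article does not describe in detail. The function names `build_pattern` and `count_variants` are hypothetical, and the approach (whole-word, case-insensitive regular expression matching) is an assumption made for illustration.

```python
import re
from collections import Counter

# Variants of "car" listed in the article.
CAR_VARIANTS = ["auto", "automóvil", "carro", "coche", "concho", "movi"]

def build_pattern(variants):
    """Compile a whole-word, case-insensitive pattern for the given variants."""
    # Longest first so a longer variant is tried before a shorter prefix of it.
    ordered = sorted(variants, key=len, reverse=True)
    alternatives = "|".join(re.escape(v) for v in ordered)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

def count_variants(tweets, variants):
    """Count how often each variant appears across an iterable of tweet texts."""
    pattern = build_pattern(variants)
    counts = Counter()
    for text in tweets:
        for match in pattern.findall(text):
            counts[match.lower()] += 1
    return counts

# Made-up example tweets, for illustration only.
sample = [
    "Mi coche es rojo",
    "Compré un carro nuevo",
    "El automóvil y el auto",
    "Cochera grande",  # contains "coche" only as part of a longer word
]
car_counts = count_variants(sample, CAR_VARIANTS)
```

Whole-word matching keeps a word such as "cochera" from being counted as "coche". A real study would also have to deal with words that have other meanings, which this sketch ignores.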

They then plotted where in the world these different words were being used, producing a map of their distribution. This map clearly shows how different words are commonly used in certain parts of the world.
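
The mapping step can be sketched in the same spirit. The Python example below groups geolocated tweets into grid cells and reports the most frequent variant in each cell. Everything specific here is an assumption for illustration: the `(lat, lon, text)` record layout, the one-degree cell size, the simple whitespace tokenizer, and the function names `cell_of` and `dominant_variant_by_cell`. The article does not say how the researchers binned their data.

```python
import math
from collections import Counter, defaultdict

# A few of the "computer" variants listed in the article.
COMPUTER_VARIANTS = {"computador", "computadora", "ordenador"}

def cell_of(lat, lon, size_deg=1.0):
    """Map a coordinate to a grid cell index (cell size is an assumed parameter)."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))

def dominant_variant_by_cell(records, variants, size_deg=1.0):
    """Return {cell: most frequent variant} for (lat, lon, text) records."""
    per_cell = defaultdict(Counter)
    for lat, lon, text in records:
        for token in text.lower().split():
            # Strip common punctuation so "ordenador!" still matches.
            word = token.strip(".,;:!?¡¿\"'()")
            if word in variants:
                per_cell[cell_of(lat, lon, size_deg)][word] += 1
    return {cell: counts.most_common(1)[0][0] for cell, counts in per_cell.items()}

# Made-up records, for illustration only.
records = [
    (40.4, -3.7, "Mi ordenador no funciona"),
    (40.9, -3.2, "Nuevo ordenador!"),
    (40.5, -3.5, "la computadora"),
    (19.4, -99.1, "La computadora es lenta."),
]
dominant = dominant_variant_by_cell(records, COMPUTER_VARIANTS)
```

Coloring each cell by its dominant variant would give a map of the kind described above. Cells with very few tweets would need some minimum-count threshold before being trusted, which this sketch leaves out.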

The researchers also looked at the environments in which the words were used, distinguishing large cities from rural locations. And that revealed a major surprise.
