MACIEJ KARPIŃSKI

Statistics

These texts, links, tasks, etc. are meant to compensate for the classes on Wednesdays, March 11th and 18th. Please note that there are some tasks to complete! Please send me the first one via e-mail no later than March 20th, and the second one no later than March 27th 🙂

INTRO

Linguistic data may be obtained or collected in different ways. There are different types of linguistic data, and these types may correspond to different types of statistical variables. When we deal with data, we should know how they were collected, what their basic properties are, and what types of variables we can use to represent them. We should also be able to judge the quality of the data, which depends on their sources, methods of collection and initial processing.

TOPIC 1: LINGUISTIC DATA COLLECTION

This part extends the discussion from our previous meeting. Please read the texts carefully and follow the instructions below.

Rose, H., McKinley, J., & Baffoe-Djan, J. (2019). Data collection research methods in applied linguistics.

On observational data:

Nivedita Lakhar: Linguistic data collection: A field observation

On interviewing:

Alshenqeeti, H. (2014). Interviewing as a data collection method: A critical review. English Linguistics Research, 3(1), 39-45.

An appropriate approach to sampling is crucial for obtaining high-quality data. Please take a look at these materials and think of some examples of good and bad approaches to sampling linguistic data.

Sampling methods review (web page – a very brief intro to sampling)

Finally, a more complex but important text which bridges some of our topics in statistics with language documentation:

Himmelmann, N. P. (2012). Linguistic data types and the interface between language documentation and description. Language Documentation & Conservation, 6, 187-207.

A web page – straightforward and easy to follow

Kinds and sources of linguistic data

This is what you should know and be able to do after reading/viewing the materials above:

  1. Briefly discuss data sources and ways of data collection in linguistics (query, interview, experiment, observation, etc.)
  2. Describe various approaches to sampling and the issue of sampling errors.
  3. Characterize some peculiar properties of linguistic data (e.g., compared to data in other fields).
  4. Recognize the factors that influence the quality of the data (e.g., representativeness, precision, adequate amount).
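To see what sampling error means in practice, you can play with a minimal Python sketch like the one below. It is only an illustration under invented assumptions: a hypothetical population of 10,000 speakers, about 30% of whom use some linguistic variant, from which we repeatedly draw simple random samples of 10, 100 and 1000 speakers and measure how far the sample proportion lands from the true one. The function names and all the numbers are made up for this example.

```python
import random
import statistics


def make_population(size: int, share_variant: float, seed: int = 1) -> list[int]:
    """Hypothetical population: 1 = speaker uses the variant, 0 = does not."""
    rng = random.Random(seed)
    return [1 if rng.random() < share_variant else 0 for _ in range(size)]


def sampling_error(population: list[int], sample_size: int,
                   repetitions: int = 500, seed: int = 2) -> float:
    """Mean absolute difference between the sample proportion and the true
    population proportion over repeated simple random samples."""
    rng = random.Random(seed)
    true_share = sum(population) / len(population)
    errors = []
    for _ in range(repetitions):
        sample = rng.sample(population, sample_size)  # drawn without replacement
        errors.append(abs(sum(sample) / sample_size - true_share))
    return statistics.mean(errors)


population = make_population(10_000, 0.30)
for n in (10, 100, 1000):
    print(f"sample size {n:>4}: average error {sampling_error(population, n):.3f}")
```

The average error shrinks as the sample grows (roughly in proportion to one over the square root of the sample size), but a larger sample does not protect you from bias: if the sample itself is drawn badly (e.g., only from your friends or from a single village), more data of the same kind will not fix it.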

Finally, there are two small tasks for you. Please select ONE of the two topics and address it in no more than two pages:

  • There is a small, isolated community of ca. 30 people that you are allowed to visit, probably as the first linguist. They live in the mountains, in an area difficult to access. You’ll have a chance to stay there for 2-3 days.
    • What would you like to know before you go there about the language, the people and the place?
    • What kind of data would you decide to collect first (especially if this is a “once in a lifetime” opportunity)?
    • What approach(es) to data collection would you use?
    • What would be the major factors determining the quality and credibility of your data?
  • You are asked to find the most popular words in Polish (German, French, etc. – select one language) newspapers from the past year. You have very limited time for this task.
    • How would you decide on the selection of newspapers? Random? Titles? Political profiles? Issues? Share of the market? Something else?
    • How would you decide on the selection of texts in the newspapers? Size of the articles? (e.g. take only small/large ones) Only titles? Only text body? Balance topics (sport, weather, politics)?
    • Of course, you could take EVERYTHING you can get, but if you are short of time and resources, you may want to sample. Thanks to “conscious” sampling, you will be able to answer some more detailed questions, not only about the “general” frequency of a given word. Moreover, well-thought-out sampling may increase the quality of your data and help you avoid some biases. Give some examples of such advantages.
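One way to picture “conscious” (here: stratified) sampling is the following Python sketch. It assumes a hypothetical, tiny list of articles, each tagged with a newspaper title and a section; it draws the same number of articles at random from every title-section stratum and then counts word frequencies both overall and per section. The corpus, the field layout and the per-stratum quota are invented for illustration, and the tokenizer is deliberately naive (real work, especially on Polish, would also need lemmatization).

```python
import random
import re
from collections import Counter, defaultdict

# Hypothetical mini-corpus: (newspaper title, section, article text).
ARTICLES = [
    ("Paper A", "sport", "The team won the match after a long season"),
    ("Paper A", "sport", "The coach praised the team"),
    ("Paper A", "politics", "The government announced a new budget"),
    ("Paper A", "politics", "Parliament debated the budget"),
    ("Paper B", "sport", "A record crowd watched the match"),
    ("Paper B", "sport", "The season ended with a win"),
    ("Paper B", "politics", "The minister resigned after the debate"),
    ("Paper B", "politics", "Voters discussed the new government"),
]


def stratified_sample(articles, per_stratum, seed=0):
    """Draw up to `per_stratum` articles at random from every (title, section) stratum."""
    strata = defaultdict(list)
    for title, section, text in articles:
        strata[(title, section)].append((title, section, text))
    rng = random.Random(seed)
    sample = []
    for key in sorted(strata):  # fixed order keeps the result reproducible
        group = strata[key]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


def tokenize(text):
    """Very naive tokenizer: lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(sample):
    """Return overall word counts and word counts per section."""
    overall = Counter()
    by_section = defaultdict(Counter)
    for _, section, text in sample:
        tokens = tokenize(text)
        overall.update(tokens)
        by_section[section].update(tokens)
    return overall, by_section


sample = stratified_sample(ARTICLES, per_stratum=1)
overall, by_section = word_frequencies(sample)
print(overall.most_common(3))
print(by_section["sport"].most_common(3))
```

With a fixed quota per stratum, small sections and small newspapers are not swamped by large ones, and you can compare frequencies across sections or titles. Keep in mind that such a design no longer mirrors market proportions; if you need those, you would have to weight the strata accordingly.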

=============================================================

TOPIC 2: DATA TYPES AND VARIABLES IN LANGUAGE STUDIES

From the materials above, you probably already know that there are various types of data to deal with in language studies. It is extremely important to recognize the characteristics of the data you deal with, as they determine (or at least influence) your choice of methods and put some limitations on the analyses you can perform.

You may want to listen to a brief (15 min) lecture on the types of statistical variables:

https://www.youtube.com/watch?v=ZxV-kf0yBss

If you prefer reading, try THIS text.

Please read THIS and watch THAT carefully! It’s on measurement scales.

Please run SPSS (I assume you’ve already installed it). Close the “Welcome” window with its numerous options and file lists. You’ll see something like a spreadsheet. In the lower left corner, you’ll find two tabs: Dane (Data View) and Zmienne (Variable View). You’re probably in the data sheet. Click on the Zmienne tab. Here you can define the variables you will work with. For now, focus only on the second and the penultimate columns: Typ (Type) and Poziom pomiaru (Measure). As for the Type, browse the available choices. You can return to the Dane tab and type some numbers in the first column. Then, each time you change the data type (experiment with it!), look at how these numbers change.

Once you are familiar with this, take a look at Poziom pomiaru (Measure, i.e., the measurement level). For each variable you define, there are three options (Scale, Ordinal, Nominal). Test them as well.

Here is some support (SPSS also offers an extensive help system; please use it whenever in doubt):

https://www.spss-tutorials.com/spss-variable-types-and-formats/

Additional material:

An interesting interview on the evolution and importance of linguistic data

What you should know and be able to do after this class:

  1. What are the properties and major categories of linguistic data?
  2. What are the four measurement scales?
  3. What types of variables are used in statistics?
  4. You should be able to categorize the type of data you deal with and represent it with an adequate type of variable (e.g., reaction time measurements are continuous, positive numbers, so a numerical variable can be used).
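The link between the four measurement scales, the three SPSS measurement levels and what you may compute can be summarized in a small Python sketch. It is a simplification under stated assumptions: it maps each scale to the SPSS level you would normally choose (SPSS merges interval and ratio into Scale) and allows only the central-tendency measures conventionally considered meaningful for that scale. The example variables, their values and the helper names are hypothetical.

```python
import statistics
from enum import Enum


class Scale(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


# SPSS offers three measurement levels; interval and ratio both become "Scale".
SPSS_LEVEL = {
    Scale.NOMINAL: "Nominal",
    Scale.ORDINAL: "Ordinal",
    Scale.INTERVAL: "Scale",
    Scale.RATIO: "Scale",
}

# Each scale keeps the statistics of the weaker scales and adds new ones
# (a common textbook simplification, not a complete rule set).
PERMITTED = {
    Scale.NOMINAL: {"mode"},
    Scale.ORDINAL: {"mode", "median"},
    Scale.INTERVAL: {"mode", "median", "mean"},
    Scale.RATIO: {"mode", "median", "mean", "ratio of values"},
}


def describe(values, scale):
    """Compute only the central-tendency measures permitted for the scale."""
    result = {"mode": statistics.mode(values)}
    if "median" in PERMITTED[scale]:
        # For ordinal data, median_low returns an actual category instead of
        # averaging two neighbouring categories.
        if scale is Scale.ORDINAL:
            result["median"] = statistics.median_low(values)
        else:
            result["median"] = statistics.median(values)
    if "mean" in PERMITTED[scale]:
        result["mean"] = statistics.mean(values)
    return result


# Hypothetical variables from a language study.
native_language = ["Polish", "German", "Polish", "French"]  # nominal
proficiency = [1, 3, 2, 3, 3]                              # ordinal: 1 = A, 2 = B, 3 = C level
word_length = [4, 7, 4, 5]                                 # ratio: letters per word, true zero

for name, values, scale in [
    ("native_language", native_language, Scale.NOMINAL),
    ("proficiency", proficiency, Scale.ORDINAL),
    ("word_length", word_length, Scale.RATIO),
]:
    print(name, SPSS_LEVEL[scale], describe(values, scale))
```

Deciding which scale a variable belongs to is the hard part; the code only shows how that decision constrains what you can sensibly compute.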

Of course, you’ll find more encyclopaedic information on Wikipedia and on many statistical portals. Try, for example, THIS one.

Don’t worry if you don’t understand some of the theoretical or practical parts of this material. Send me a question via e-mail and I’ll find more reading for you or try to answer your questions directly. Be active! There are tons of materials on the web; the main question is their quality. Please look for reliable materials from renowned companies, institutions and researchers.