Neuroling Resources

a playground for aspiring neurolinguists

This project is maintained by jessb0t

Welcome!

This website is intended to provide students of neurolinguistics, novice and experienced alike, with useful resources for exploring data of various types. 🤓

Scroll down to find open datasets that you can download and explore. These are followed by a small smattering of practical tutorials in neuroscience, programming, and statistics.

Datasets

The public release of datasets is a cornerstone of open science. In this collection, I include only datasets that could be relevant to linguistics-related questions (including auditory processing) and that contain neural data (EEG, MEG, FFR, ECoG, fMRI, etc.). They are in no particular order. All papers are linked via DOI or arXiv for ease of reference, but many are also freely available via PubMed Central.

ERP CORE | EEG

The ERP CORE dataset includes two experiments of particular interest to neurolinguists: the N400 and the mismatch negativity (MMN). The N400 experiment will especially appeal to those investigating lexical semantics (in English), while the MMN is an important ERP component in auditory processing. Participants are 40 English speakers.
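For readers new to ERPs, the core computation behind both components is simple: average the epochs within each condition, then subtract conditions to get a difference wave (for the MMN, deviant minus standard). The sketch below is a minimal illustration on synthetic NumPy data with a hypothetical single-channel shape of (trials, timepoints); it is not the ERP CORE analysis pipeline, and real pipelines also involve filtering, artifact rejection, and re-referencing. The latency window is an illustrative choice.

```python
import numpy as np


def erp(epochs: np.ndarray) -> np.ndarray:
    """Average single-trial epochs (trials x timepoints) into one ERP waveform."""
    return epochs.mean(axis=0)


def difference_wave(deviant: np.ndarray, standard: np.ndarray) -> np.ndarray:
    """Deviant-minus-standard difference wave, the usual way the MMN is isolated."""
    return erp(deviant) - erp(standard)


def mean_amplitude(wave: np.ndarray, times: np.ndarray, t_start: float, t_end: float) -> float:
    """Mean amplitude of `wave` within the latency window [t_start, t_end] (seconds)."""
    mask = (times >= t_start) & (times <= t_end)
    return float(wave[mask].mean())


# Synthetic single-channel demo: 500 Hz sampling, epochs from -0.2 to 0.8 s.
rng = np.random.default_rng(0)
fs = 500
times = np.arange(-0.2, 0.8, 1 / fs)
# Hypothetical MMN-like negativity (microvolts) peaking near 0.2 s, added to deviants only.
mmn_shape = -2.0 * np.exp(-((times - 0.2) ** 2) / (2 * 0.03 ** 2))
standard = rng.normal(0.0, 5.0, size=(400, times.size))
deviant = rng.normal(0.0, 5.0, size=(100, times.size)) + mmn_shape

diff = difference_wave(deviant, standard)
print(f"Mean amplitude, 0.125-0.225 s: {mean_amplitude(diff, times, 0.125, 0.225):.2f} microvolts")
```

Averaging works because activity that is not time-locked to the stimulus tends to cancel across trials, while the stimulus-locked response survives; the difference wave then removes whatever the two conditions share.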

OSF:N400 –– OSF:MMN –– Website –– Paper

Llanos/Reetzke Audiobook Listening | EEG

A dataset of 30 participants listening to an English audiobook, read by an American male speaker and interlaced with distractor tones. Participants are monolingual English L1 speakers (n=15) and multilingual English L2, Mandarin L1 speakers (n=15).

Zenodo –– Paper

Broderick Natural Listening Datasets | EEG

A collection of datasets that includes naturalistic listening, RSVP reading, and speech-in-noise experiments. The paper offers details on stimuli, procedures, and participants. All three auditory datasets (natural speech, reversed speech, and cocktail-party speech) are also available in CND format from the CNSP.

Dryad –– CNSP –– Paper

MEG-MASC | MEG

An MEG dataset in which 27 participants (all but one of whom were English L1) listened to several English stories produced by a text-to-speech synthesizer with three distinct synthetic voices and multiple speech rates.

GitHub –– OSF –– Paper

LeBel Podcast Listening | fMRI

Deep data for 8 participants (language background undisclosed) who listened to multiple hours of natural, English-language stories from The Moth Radio Hour and the NYT Modern Love podcast during fMRI scanning. Extensive annotations are provided for the podcast stimuli.

GitHub –– OpenNeuro –– Paper

‘Narratives’ (Nastase Natural Listening) | fMRI

A large dataset (345 participants!) of people listening to naturalistic audio during fMRI scanning. Participants include both English L1 and English L2 speakers; stimuli vary but are all naturalistic English speech. Extensive annotations are available for the stimulus stories.

OpenNeuro –– GitHub –– Datalad –– Paper

Tang Pitch Listening | ECoG

ECoG recordings from the superior temporal gyrus (STG) of 10 participants who listened to single English sentences that varied in pitch contour, phonetic content, and synthesized speaker.

Zenodo –– Paper

Brennan ‘Alice in Wonderland’ Listening | EEG

An EEG dataset of 49 participants listening to the first chapter of Alice’s Adventures in Wonderland read in English. Participant language background was not disclosed. Dataset is also available in CND format from the CNSP.

UMich Deep Blue –– CNSP –– Paper

Das Auditory Attention | EEG

An EEG dataset of 16 participants who simultaneously listened to six-minute snippets of two stories read in Dutch. The competing stories were presented either separately to each ear or spatialized using head-related transfer functions (HRTFs). Participant language background was not disclosed. Dataset is also available in CND format from the CNSP.

Zenodo –– CNSP –– Paper

Cavanagh Oddballs | EEG

Two EEG datasets using a 3-stimulus oddball paradigm with two clinical populations: Parkinson’s disease (PD) and mild traumatic brain injury (mTBI).

Data:PD –– Paper:PD –– Data:mTBI –– Paper:mTBI
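A 3-stimulus oddball paradigm interleaves frequent standards with rare targets (which participants respond to) and rare, unexpected novel distractors. The sketch below generates such a trial sequence; the 70/15/15 proportions and the constraints that the sequence starts with a standard and that rare stimuli never occur back-to-back are illustrative assumptions, not the parameters used in the Cavanagh datasets.

```python
import random


def oddball_sequence(n_trials=200, p_target=0.15, p_novel=0.15, seed=None):
    """Return a list of 'standard' / 'target' / 'novel' labels.

    Constraints (illustrative assumptions): the sequence starts with a
    standard, and rare stimuli (targets or novels) never occur back-to-back.
    """
    rng = random.Random(seed)
    n_target = round(n_trials * p_target)
    n_novel = round(n_trials * p_novel)
    n_standard = n_trials - n_target - n_novel
    rare = ["target"] * n_target + ["novel"] * n_novel
    if len(rare) > n_standard:
        raise ValueError("Too many rare trials to keep them non-adjacent.")
    rng.shuffle(rare)

    # Each standard opens a gap after itself; pick distinct gaps for the rare trials,
    # so every rare trial is preceded by a standard and no two rare trials touch.
    gaps = set(rng.sample(range(n_standard), len(rare)))
    rare_iter = iter(rare)
    sequence = []
    for i in range(n_standard):
        sequence.append("standard")
        if i in gaps:
            sequence.append(next(rare_iter))
    return sequence


seq = oddball_sequence(seed=1)
print({label: seq.count(label) for label in ("standard", "target", "novel")})
# {'standard': 140, 'target': 30, 'novel': 30}
```

Placing rare trials into gaps after standards, rather than shuffling everything and rejecting bad orders, guarantees the constraints in a single pass.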

Ershaid Cortical Tracking in Noise | EEG+Pupillometry

A dataset that includes EEG, pupillometry, self-reports of listening effort, and phonological and reading assessments for 49 Spanish-speaking participants. Stimulus sentences were presented in quiet or embedded in either multitalker babble (“cafeteria” noise) or reverberation noise.

OSF –– Paper

Issa Measures of Cortical Tracking of Speech | ECoG/EEG/MEG

A compilation of existing datasets, spanning different recording methods and stimulus languages, which the authors use to compare five methods for extracting the speech envelope.
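As a point of reference, one widely used way to extract a broadband speech envelope is to take the magnitude of the analytic signal (via the Hilbert transform), low-pass filter it, and downsample it to a neural sampling rate. The sketch below assumes SciPy and uses a synthetic amplitude-modulated tone in place of real speech; the 8 Hz cutoff and 100 Hz output rate are illustrative choices, and this is a generic illustration rather than necessarily any of the five methods Issa and colleagues compare.

```python
import numpy as np
from scipy.signal import butter, hilbert, resample_poly, sosfiltfilt


def speech_envelope(audio, fs_audio, fs_out=100, cutoff_hz=8.0):
    """Broadband envelope: |analytic signal| -> zero-phase low-pass -> downsample.

    fs_audio and fs_out must be integers (resample_poly takes integer factors).
    """
    env = np.abs(hilbert(audio))                       # instantaneous amplitude
    sos = butter(4, cutoff_hz, btype="low", fs=fs_audio, output="sos")
    env = sosfiltfilt(sos, env)                        # keep slow (< cutoff) fluctuations
    return resample_poly(env, fs_out, fs_audio)        # anti-aliased rational resampling


# Synthetic "speech": a 1 kHz tone amplitude-modulated at 4 Hz (a syllable-like rate).
fs_audio = 16000
t = np.arange(2 * fs_audio) / fs_audio
audio = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

env = speech_envelope(audio, fs_audio)
print(env.shape)  # (200,): 2 s at 100 Hz
```

Second-order sections (`output="sos"`) are used because a very low cutoff relative to an audio sampling rate can make a transfer-function (`b, a`) Butterworth filter numerically unstable. The first and last few output samples are affected by edge padding, so it is common to trim them.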

OSF –– Paper

‘Baby Rhythms’ Cortical Tracking in Infants of Nursery Rhymes | EEG

EEG was recorded across 3 sessions from a longitudinal cohort of 50 infants born in the United Kingdom, some raised in monolingual homes and others in bilingual homes, as they listened to eighteen nursery rhymes. Dataset is also available in CND format from the CNSP.

OSF –– CNSP –– Paper

Kösem Speech Rhythms | MEG

An MEG dataset of 33 Dutch speakers listening to carrier sentences at two speeds (fast: 5.5 Hz; slow: 3 Hz) followed by a stable target window containing a target word that belongs to a minimal pair contrasting in vowel length.

RadboudU –– Paper

Kösem Vocoder Training | MEG

MEG and intelligibility data for 31 Dutch speakers who listened to 2-band and 4-band vocoded speech twice: once in a pre-training phase and then again after they were trained to listen to 4-band vocoded speech (post-training).
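Noise vocoding splits speech into N frequency bands, extracts each band’s amplitude envelope, uses it to modulate noise restricted to the same band, and sums the bands. Fewer bands means less spectral detail and lower intelligibility. The sketch below assumes SciPy; the log-spaced band edges (100 to 7000 Hz), filter orders, 30 Hz envelope cutoff, and white-noise carrier are illustrative assumptions, not the settings used to create this dataset’s stimuli.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def noise_vocode(audio, fs, n_bands=4, f_lo=100.0, f_hi=7000.0, env_cutoff=30.0, seed=0):
    """Noise-vocode `audio` with `n_bands` log-spaced analysis bands (illustrative settings)."""
    if f_hi >= fs / 2:
        raise ValueError("f_hi must be below the Nyquist frequency.")
    rng = np.random.default_rng(seed)
    audio = np.asarray(audio, dtype=float)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)          # log-spaced band edges (Hz)
    env_sos = butter(4, env_cutoff, btype="low", fs=fs, output="sos")
    noise = rng.standard_normal(audio.size)                 # white-noise carrier
    out = np.zeros_like(audio)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, audio)                 # analysis band
        env = sosfiltfilt(env_sos, np.abs(hilbert(band)))   # smoothed band envelope
        env = np.clip(env, 0.0, None)                       # low-pass ringing can dip below 0
        carrier = sosfiltfilt(band_sos, noise)              # noise limited to the same band
        out += env * carrier
    # Rescale so the vocoded signal has the same RMS level as the input.
    out *= np.sqrt(np.mean(audio ** 2) / np.mean(out ** 2))
    return out


fs = 16000
t = np.arange(fs) / fs
speech_like = (1 + np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 500 * t)
vocoded_2 = noise_vocode(speech_like, fs, n_bands=2)
vocoded_4 = noise_vocode(speech_like, fs, n_bands=4)
```

The two calls at the end loosely mirror 2-band and 4-band conditions. Published vocoder studies vary in band spacing (Greenwood-based spacing is common), carrier type, and envelope cutoff, so check each paper’s methods before reproducing its stimuli.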

RadboudU –– Paper

MacIntyre Vocoded Sherlock Story | EEG

Data from 38 English speakers (with no experience of Dutch) who heard a Sherlock Holmes story in both English and Dutch across three vocoding conditions: unprocessed, 16-band, and 16-band with a decay slope.

OSF –– Paper

Reetzke Mandarin Tone Learning | EEG (FFR)

Behavioral and neural data from 20 English L1 listeners trained to identify Mandarin lexical tones over 4-13 days, until their identification accuracy rivaled that of 13 Mandarin L1 listeners. They were then trained for a further ten days and tested again eight weeks post-training.

Mendeley-Behavior –– Mendeley-Neural –– Paper

Tutorials

Many amazing scientists have created useful (and free!) tutorials to do, well, just about anything. This is a collection of tutorials that I have found particularly useful for learning skills that support my neurolinguistic research.

Programming with Software Carpentry

The most common programming languages for neurolinguistic data analysis are MATLAB and Python. Python and R are widely used in acoustics. R is popular for statistical modeling, while Python is considered best-in-class for machine learning. With these three languages, a neurolinguist has a broad foundation on which to explore and analyze data. (Note: R and Python are both free to download and install. For MATLAB, you may have access through your institution. The actual tutorials below are free.)

R –– Python –– MATLAB

Signal Processing with Mike X. Cohen

Cohen is the master of time-series data analysis. He has multiple books and Udemy courses, as well as a plethora of freely available tutorials on YouTube (with accompanying code available on GitHub!).

YouTube –– GitHub

EEG Preprocessing + Simple ERPs

A series of informal trainings led by Dr. George Buzzell on using the MADE preprocessing pipeline to prepare EEG data for analysis. Work through the READMEs for 2021 and then 2022 to access video links and materials.

GitHub

Linear Mixed Effects Modeling for Linguistics

Two very straightforward tutorials on the basics of mixed effects modeling by cognitive linguist Dr. Bodo Winter. An excellent place to start for mixed effects modeling.

Paper + Tutorial

Neuromatch Computational Neuroscience

A thousand tutorials in one! The Neuromatch team makes their annual, three-week intensive course freely available, with videos and hands-on tutorials using Jupyter notebooks in either Google Colab or Kaggle. Ignore the “Projects” part (which is only for summer participants), and start with W0D0 (Week 0, Day 0).

Website

StatQuest

Dr. Josh Starmer’s hilarious, accessible videos can help you work through any statistical confusion.

YouTube –– Website

3Blue1Brown

Sometimes, you just need a visual. Grant Sanderson’s videos will change how you view linear algebra and calculus forever.

Website

HarvardX Fundamentals of Neuroscience

This is a series of three online courses (free if you select the Audit Track) that offers an excellent introduction to the basics of neuroscience, including the biochemistry of how neurons work, how they function as ensembles, and how they combine to make the brain.

Website



Got suggestions? Head over to the GitHub repo (red button on the left) and create a new issue!