November 6, 2003 - American National Corpus
AA: I'm Avi Arditti with Rosanne Skirble, and this week on Wordmaster -- the American National Corpus!
RS: In linguistics, a corpus is a body of words. It's language collected in context, from books, poems, recorded conversations, newspapers, broadcasts -- any way language is used. Researchers, dictionary publishers, advertising writers and other users can easily search the collection by computer to find out how people naturally use language.
AA: There's a British National Corpus. And now there's one for American English. Developers of the American National Corpus want to collect one-hundred-million core words -- and perhaps several hundred million more, to provide the broadest possible selection of texts and genres.
RS: Paulo Quaglio is a Brazilian-born research assistant for the American National Corpus.
QUAGLIO: "One of the things that we have learned is that native speaker intuition is very often inaccurate. So I'm going to give you an example from my personal research. So let me ask you a question. If you had to choose between these two sentences that I'm going to give you, which one would you choose. And there's no right or wrong, OK?"
RS: "All right."
QUAGLIO: "And the first is, 'I haven't seen you in two years.' And the second is, 'I haven't seen you for two years.' What would you say, in or for?"
AA: "I would say 'in,' and I would think it sounds more British English to say 'for.'"
RS: "'I haven't seen you for two years,' 'I haven't seen you in two years.' To me, it doesn't make a big difference."
AA: "Well ... "
QUAGLIO: "OK, that's interesting. So that is a question that corpus linguistics can answer. So what do we do, we go to this huge body of language and we search for those occurrences and we try to find patterns of occurrence. Let me just guess one thing -- this is not always true, but Avi, would you be younger than Rosanne?"
AA: "Well, yes."
QUAGLIO: "OK, so what our research has shown is that in American English this construction is used with 'in' 75 percent of the time, and it seems to be correlated with speaker age."
AA: "Wait, so the sentence 'I haven't seen you in two years' -- 'IN two years' rather than 'FOR two years' -- is more likely to be used by younger Americans, than 'for two years.'"
RS: "Maybe it's because I've just been with my 85-year-old mother. [laughter]"
QUAGLIO: "Both are used, but in British English 'for' occurs 98 percent of the time."
RS: "That's interesting, because I was recently in England. Maybe all these influences of being around my 85-year-old mother and being in England ... "
AA: "You were only there for a week!"
QUAGLIO: "Or it could also be the region that you're from here in the United States."
AA: "You know, I'm curious, with the influences of instant media and television and the Internet and so forth, and I know you've just released the first installment of 10 million words -- is it safe to assume that by now, maybe some of those words aren't used as frequently as maybe they were five or 10 years ago or whenever they were first compiled?"
QUAGLIO: "Well, language is constantly changing, and I can give you an example -- for example, the use of the adverbial intensifier 'so.' So when you take a look at grammars for example, you see there that 'so' modifies an adjective or an adverb. So, for example, 'she's so beautiful,' 'oh that was so beautifully done.' But then when you take a look at a more recent corpus, you see that 'so' is modifying a verb and it's also modifying a noun. Examples are 'oh I so want to do that.' Right, you've probably heard that before?"
QUAGLIO: "So this is really new. So we heard that a lot here on campus and we said 'oh, this is kind of interesting,' and then we saw that on this particular television show that I am analyzing. And so somebody says 'hey, you're so the man!' Hey, 'man' is a noun."
AA: "Or my daughter will say 'that's so not right.'"
QUAGLIO: "Exactly. This is the second point. Why not say, for example, 'this is so untrue'? What is the difference between that and 'this is so not true'?"
RS: Paulo Quaglio is a research assistant for the American National Corpus, and a doctoral candidate at Northern Arizona University in Flagstaff, Arizona. The corpus will be updated and distributed freely for non-commercial research purposes.
AA: Commercial use will be limited at first to members of a consortium. These include publishers, software companies and academic members. The Web site for the project is americannationalcorpus -- all one word -- dot o-r-g. We'll post a link at our site.
RS: Our address is voanews.com/wordmaster. And don't forget our e-mail address. It's firstname.lastname@example.org. With Avi Arditti, I'm Rosanne Skirble.