Perl for Linguists

Michael Hammond

University of Arizona

Perl for Linguists – p.1/38


Perl for linguists
Why programming?
Why Perl?
Learn a little Perl.
More advanced Perl...


Why programming?
Collect data
Analyze data
Model theory
General professional skills


Code for this tutorial
All the code for this presentation is available over
the web at the following URL:

http://linguistics.arizona.edu/~hammond/berkeley.html



Unzipping the programs
1. Download the file programfiles.zip by
right-clicking on the link, selecting ‘save target
as’, and selecting your desktop as the
destination.
2. Unzip the file by double-clicking on the
downloaded file on your desktop and moving
all the files there to a new directory on the
desktop.
3. Open the MS-DOS prompt in the new
directory by double-clicking on the file
doswindow.bat.

Collecting data
Running experiments locally (expprog.pl)
Running experiments locally with a GUI (tkexp.pl)
Running experiments remotely (Bailey & Hahn replication, bhrep.cgi)
Assembling corpora from local static resources (makecorpus.pl)
Assembling corpora from nonlocal dynamic resources: the web (websearch.pl)


Analyzing data
Looking for patterns (visgrep.pl)
Counting things (neightk.pl)
Finding verbs (verbs.pl)


Modeling theory
Optimality Theory (web interface, sylpars.pl)
N-gram models (a bunch of examples from a course on Statistical NLP that I did recently)


General professional skills
General programming skills
Web programming


Why Perl?
Free
Multi-platform
Easy
Multiple dialects
Powerful regular expression tools
Written by a “linguist”
Perl poetry
Obfuscated perl, “japhs”, etc.


Learn a little perl...
Windows basics
Perl syntax
IO
Regular expressions
Where to find out more


Windows basics
It is easiest to invoke perl programs from the DOS prompt. (If configured properly, you can just double-click them, though.)
perl myprogram.pl: invokes the perl interpreter with a program myprogram.pl.
ctrl-c: stops the current program


A goal
We will build our efforts around an example program: a program that parses a text file in English into sentences and then tries to find all the verbs (verbs.pl).
This exemplifies simple data and control structures and what we might call “computational linguistic reasoning”.
We won’t get to the full version of this program (it’s infinitely expandable, actually), but we can get a good start.


Programming overview
Implement some idea as perl code (write the program).
Convert code into something the computer can execute (run perl on your code).
Program execution (run perl on your code).
Results (what happens as a consequence).


Perl syntax
A perl program is a series of statements.
Statements can be organized into groups.
Statements are operations on some bit of data, e.g. “print this string”, “add these numbers”, etc.
Groups of statements can apply:
1. only when specific conditions hold, or
2. more than once, etc.


Statements
A statement includes some operation (typically marked with following parentheses), ends with a semicolon, and may contain more. For example:
print("Hello!"); (prints the string “Hello!”)
rand(5); (gets a random number from 0 up to, but not including, 5)
localtime(); (gets the current time)


Try it
1. Open the DOS window (through the start menu or by clicking on the doswindow.bat icon).
2. Type edit myprogram.pl
3. Type print("I am a linguist!");
4. Select Save and then Exit from the File menu.
5. Type perl myprogram.pl at the prompt.


Variables
Notice how rand() and localtime() don’t seem to do anything.
What they do is return a value (a number for rand(), a string for localtime() here). You can save that result and then print it:

$myvariable = localtime();
print($myvariable);


Variables
Simple “scalar” variables: $myvar, $aVariable, $x.
A value is assigned to a variable with the assignment operator =.
Variables can hold numbers, strings, etc.
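A minimal sketch of scalar variables holding different kinds of values. The variable names and the helper describe() are made up for illustration. The sketch also uses "my" and "use strict", which these slides do not cover: "my" declares a variable, and "use strict" makes perl insist on such declarations.

```perl
use strict;
use warnings;

# A scalar can hold a number or a string; = assigns the value.
my $count = 3;
my $word  = "linguist";
my $roll  = rand(5);        # a random number, at least 0 and less than 5
my $now   = localtime();    # the current time as a readable string

# Hypothetical helper: build a one-line description of a value.
sub describe {
    my ($name, $value) = @_;
    return "$name holds $value\n";
}

print(describe('$count', $count));
print(describe('$word',  $word));
print(describe('$roll',  $roll));
print(describe('$now',   $now));
```

Saved as, say, vars.pl, this can be run with perl vars.pl at the prompt. The last two lines of output will differ from run to run.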


Reading a file
First use open() to open a file and associate it with a handle, e.g. open(F, "myfile.txt");.
Read a line from the file with the record-reading operator: <F>.
When you’re done, close the handle: close(F);.


Sample code
# open the file, stopping with a message if that fails
open(F, "myfile.txt") or die("Cannot open myfile.txt: $!");
$line = <F>;     # read the first line
print($line);
close(F);


Control structures
if: conditional application
while: iteration as long as some condition is true
for: iteration for a specific number of times
foreach: iteration for every member of a list
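The four control structures above might look like this in practice. This is only a sketch: the word list and the helper size_label() are invented for illustration, and "my" (which declares a variable, as "use strict" requires) is not covered in these slides.

```perl
use strict;
use warnings;

# Hypothetical helper: label a word by its length (shows if/else).
sub size_label {
    my ($word) = @_;
    if (length($word) > 5) {
        return "long";
    } else {
        return "short";
    }
}

# foreach: once for every member of a list
foreach my $word ("cat", "phoneme", "syntax", "of") {
    print("$word is ", size_label($word), "\n");
}

# for: a specific number of times (here, three)
for (my $i = 0; $i < 3; $i++) {
    print("pass number $i\n");
}

# while: as long as the condition is true
my $n = 8;
while ($n > 1) {
    $n = $n / 2;
    print("n is now $n\n");
}
```

Each loop body is a group of statements inside curly braces, which is how perl marks the "groups" mentioned under Perl syntax.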


More sample code
open(F, "myfile.txt") or die("Cannot open myfile.txt: $!");
# read and print one line at a time until the file runs out
while ($line = <F>) {
    print($line);
}
close(F);


Arrays
Array variables: a list of variables grouped together: @myarray, @thisBigArray, @them.
Adding an element to the end of an array: push(@myarray, "hat");.
Removing and retrieving an element from the end of an array: $it = pop(@myarray);.
Retrieving a specific element: print($myarray[4]); (array indices start at 0).

Sample code again
open(F, "myfile.txt") or die("Cannot open myfile.txt: $!");
# store every line of the file in an array
while ($line = <F>) {
    push(@mylines, $line);
}
close(F);
# print the stored lines back out in order
foreach $theline (@mylines) {
    print($theline);
}
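The array operations from the Arrays slide (push, pop, and indexing) can also be tried without any file. A small sketch, with made-up words; "my" declares each variable, which "use strict" requires and which these slides do not cover.

```perl
use strict;
use warnings;

my @myarray = ("hat", "cat", "mat");   # made-up words for illustration

push(@myarray, "bat");                 # now: hat cat mat bat
my $it = pop(@myarray);                # removes "bat" and returns it

print("popped: $it\n");
print("first element: $myarray[0]\n"); # indices start at 0
print("number of elements: ", scalar(@myarray), "\n");

foreach my $item (@myarray) {
    print("item: $item\n");
}
```

Note that a single element is written with $ rather than @ ($myarray[0]), because one element is a scalar.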


Regular expressions
“Regular expression”: a restrictive way of indicating string patterns.
$myvar =~ /regexp/: does $myvar contain a match for the regular expression?
$myvar =~ s/regexp/string/: if $myvar contains a match for the regular expression, replace the first match with the string.
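A short sketch of both operators at work. The helper names ends_in_ing() and americanize() and the example strings are invented for illustration; "my" declares a variable, as "use strict" requires.

```perl
use strict;
use warnings;

# Hypothetical helper: does the word end in -ing?
sub ends_in_ing {
    my ($word) = @_;
    if ($word =~ /ing$/) {      # $ anchors the pattern to the end
        return 1;
    }
    return 0;
}

# Hypothetical helper: replace the first "colour" with "color".
sub americanize {
    my ($text) = @_;
    $text =~ s/colour/color/;   # without a trailing g, only the first match changes
    return $text;
}

print(ends_in_ing("walking"), "\n");
print(ends_in_ing("walked"), "\n");
print(americanize("the colour red"), "\n");
```

Adding g after the final slash (s/colour/color/g) would replace every match rather than just the first.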


The verb-finding program
Recall that the program breaks an English text file into sentences and does its best at finding all the verbs.
An hour is not enough time to figure out how to do all of this, but just the bits we’ve covered are a lot of it (verbs.pl).
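To give a feel for how the pieces covered so far combine, here is a rough sketch of the general idea. It is not verbs.pl: the helper names, the sample text, and the suffix heuristic (treat words ending in -ed or -ing as verbs) are all assumptions made for illustration, and the heuristic will wrongly catch words like "red" or "thing" while missing most other verb forms.

```perl
use strict;
use warnings;

# Split text into sentences at ., ! or ? followed by whitespace.
# (?<=...) checks for the punctuation without removing it.
sub split_sentences {
    my ($text) = @_;
    return split(/(?<=[.!?])\s+/, $text);
}

# Guess verbs: words ending in -ed or -ing (a crude, made-up heuristic).
sub guess_verbs {
    my ($sentence) = @_;
    my @verbs;
    foreach my $word (split(/\s+/, $sentence)) {
        $word =~ s/[^A-Za-z]//g;          # strip punctuation
        if ($word =~ /(ed|ing)$/) {
            push(@verbs, $word);
        }
    }
    return @verbs;
}

my $text = "The dog barked. It was chasing a cat! The cat escaped.";
foreach my $sentence (split_sentences($text)) {
    my @verbs = guess_verbs($sentence);
    print("$sentence => @verbs\n");
}
```

A fuller program would presumably read the text from a file with open() and <F> and use far better evidence than two suffixes, which is where the “computational linguistic reasoning” comes in.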


Where to find out more
The free ActiveState Perl we’re using comes with extensive web-based documentation.
In any perl implementation the perldoc command can be used to find out lots and lots of stuff.
The main archive of perl modules is www.cpan.org, but see also www.perl.org, the official perl website.


Advanced stuff
Let’s now do some more advanced stuff.
‘Advanced’ in terms of perl.
‘Advanced’ in terms of linguistics.


Advanced perl
Object-oriented programming and public perl modules
Tk (editperl.pl, visgrep.pl)
Remote computing (bhexp.cgi, dbiex.pl, websearch.pl)


Object-oriented programming
Object-oriented (OO) programming is another style of programming.
OO programs are a network of things or objects, not a list of commands.
Objects have their own data and specialized functions for dealing with their own data.
Objects can refer to other objects or inherit the properties of other objects.
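A minimal sketch of these ideas in perl. The Word and Verb classes are hypothetical and are not taken from verbsOO.pl; they only illustrate an object with its own data, functions (methods) that work on that data, and inheritance.

```perl
use strict;
use warnings;

# A hypothetical Word class: each object carries its own data.
package Word;

sub new {
    my ($class, $form) = @_;
    my $self = { form => $form };      # the object's own data
    return bless($self, $class);       # bless ties the data to the class
}

sub form {
    my ($self) = @_;
    return $self->{form};
}

sub describe {
    my ($self) = @_;
    return "word: " . $self->form();
}

# A hypothetical Verb class that inherits from Word.
package Verb;
our @ISA = ('Word');

sub describe {                         # overrides Word's version
    my ($self) = @_;
    return "verb: " . $self->form();
}

package main;

my $w = Word->new("hat");
my $v = Verb->new("walk");             # new() and form() are inherited
print($w->describe(), "\n");
print($v->describe(), "\n");
```

The arrow (->) calls a function that belongs to an object or class. Verb defines only describe(), so everything else comes from Word.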


Translating to OO
Original verb-finding program: verbs.pl.
OO verb-finding program: verbsOO.pl.


Caveats
OO programs are longer.
OO programs are slower.
OO programming in perl is not orthodox OO (lots of “oddments”).
OO programming isn’t intuitive for most folks.
Unfortunately, OO programming is essential for some modules.


Tk
Tk is an independent toolkit (originally developed for the Tcl language) for making graphical user interfaces.
There is a special perl module so that you can build GUIs indirectly using Tk.
This module is widely available for Unix/Linux and Windows, but it is not clear whether it’s available for Macs.
If you want to write programs in Perl that really look like modern computer programs, then you need to use Tk.

GUI coding
Tk programs use OO programming style; buttons, windows, etc. are all “objects”.
You create these objects and then your program waits for the user to interact with them.
makecorpus.pl makecorpusTk.pl


Remote computing
CGI (“Common Gateway Interface”): programs that run remotely (generating JavaScript and HTML: bhexp.cgi)
Interacting with local or remote databases (generating SQL: dbiex.pl)
Interacting with the web generally (websearch.pl)
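To make the CGI idea concrete: a CGI program is an ordinary program whose printed output is sent to the browser by the web server. The sketch below is not bhexp.cgi; the helper make_page() and its text are invented for illustration, and it leaves out things a real experiment would need, such as reading form input and escaping special characters in the HTML.

```perl
use strict;
use warnings;

# Hypothetical helper: build a complete CGI response as one string.
sub make_page {
    my ($title, $message) = @_;
    my $page = "Content-Type: text/html\n\n";   # header, then a blank line
    $page .= "<html><head><title>$title</title></head>\n";
    $page .= "<body><p>$message</p></body></html>\n";
    return $page;
}

# A web server that runs this script passes whatever it prints to the browser.
print(make_page("Experiment", "Welcome to the experiment."));
```

Run from the prompt, the script simply prints the header and the HTML. Installed on a web server configured for CGI, the same output becomes a web page.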


Modeling
Perl not good for computational clarity; not well-defined
Perl exceptionally good at string processing
Computational tasks as string processing: sylpars.pl


What now?
Programming and perl hopefully demystified...
Some ideas about what you can do with it if you’re thinking that you need to program.
If you already know some perl, perhaps some other ideas about what to do with what you already know.
