Eprints-2.2-Docs - (EPrints 2.2 Documentation)
Christopher Gutteridge
Contents
1 Introduction
1.1 What is GNU EPrints?
1.2 Should I be installing EPrints 2, how much effort will it take?
1.3 What will it run on?
1.4 This Documentation
2 Required Software
2.1 What Additional Software does EPrints Require?
2.2 MySQL
2.3 Apache/mod_perl
2.4 Perl 5.6 and Perl Modules
2.5 Optional GDOME support
6 Configuring an Archive
6.1 EPrints Archive Configuration
6.2 XML Config Files in EPrints
6.3 The Primary Archive Configuration File
6.4 ArchiveConfig.pm
6.5 ArchiveMetadataFieldsConfig.pm
6.6 ArchiveOAIConfig.pm
6.7 ArchiveRenderConfig.pm
6.8 ArchiveTextIndexingConfig.pm
6.9 ArchiveValidateConfig.pm
6.10 citations-languageid.xml
6.11 metadata-types.xml
6.12 phrases-languageid.xml
6.13 ruler.xml
6.14 The static/ directory
6.15 subjects
6.16 template-languageid.xml
7 Troubleshooting
7.1 Trouble Shooting
7.2 Installation of EPrints and Required Software
7.3 Setting Up and Configuring a New Archive
7.4 General
8 How-To Guides
8.1 HOW TO: Set up a Complex Custom View
8.2 HOW TO: Add a New Field
8.3 HOW TO: Remove a Field
8.4 HOW TO: Add a new eprint type
8.5 HOW TO: Remove an eprint type
8.6 HOW TO: Add a new document type
8.7 HOW TO: Add a Discussion Forum for Each EPrint
8.8 HOW TO: Make the latest additions to your archive appear on your main website
8.9 HOW TO: Add full text searching
8.10 HOW TO: Make the referencetext field link to the items referenced
8.11 HOW TO: Make the password controlled parts of the site use HTTPS
8.12 HOW TO: Customise the way the search results are formatted
Introduction
Required Software
2.2 MySQL
Tested on: 3.23.29a-gamma
Install a recent version of MySQL 3. You will need the .h and library files
later to install the MySQL perl module. MySQL 4 is due soon, but we are not
making plans to support it yet (if you try EPrints with MySQL 4 and it works,
please let us know).
If installing from RPMs, you require the mysql-server, mysql-devel and mysql
RPMs.
2.3 Apache/mod_perl
Apache is the most commonly used webserver in the world, and it's free!
EPrints requires Apache to be configured with mod_perl, which allows Apache
modules to be written entirely in perl and run inside the webserver, making
them much more efficient.
Get Apache from http://httpd.apache.org/dist/httpd/
EPrints requires that the apache modules mod_perl and mod_rewrite be
enabled.
% make
% make install
(mod_perl should have already run the apache ./configure script for us.)
% gunzip FOO-5.23.tar.gz
% tar xf FOO-5.23.tar
% cd FOO-5.23
% perl Makefile.PL
% make
% make test
% make install
Data::ShowTable
MySQL Interface Module requires this.
DBI
Tested with: v1.14
MySQL Interface Module requires this.
Msql-Mysql Module
Tested with: v1.2215
This one can be tricky. It requires access to the .h and library files from
MySQL. I install MySQL from source first, but some installs of MySQL
don't put the lib and include dirs where this module expects. When the
installer asks its first question, answer that you only need MySQL support.
Under Red Hat's GNU/Linux distribution, the zlib-devel RPM should
be installed before you install this module.
MIME::Base64
Tested with: v2.11
Unicode::String requires this.
Unicode::String
Used for Unicode support. No known problems. Tested with v2.06.
XML::Parser
Tested with v2.30
Used to parse XML files. Requires the expat library. A .tar.gz and an
RPM are available in the tools dir on eprints.org.
Apache
The perl Apache.pm module is actually part of mod_perl - installing
mod_perl as part of Apache should also have installed the perl Apache
module.
http://rpmfind.net/linux/rpm2html/search.php?query=libxml2
http://rpmfind.net/linux/rpm2html/search.php?query=libxml2-devel
http://gdome2.cs.unibo.it/#downloads
You may either use the RPMs (gdome2 and gdome2-devel) or the tarball.
File uploads
wget, tar, gunzip and unzip are required to allow users to upload files as
.tar.gz or .zip, or to capture them from a URL.
These all come installed with most modern versions of linux. If you can't
get them working, you can remove the option by editing "archive_formats" in
SystemSettings.pm.
Tested with wget 1.6.
If there are problems you may need to tweak how these are invoked in
SystemSettings.pm.
Latex Tools
There is an optional feature which allows you to set eprints to look in certain
fields (eg. title and abstract) for text which looks like latex equations and
display it as an image of that equation instead. These tools are only required if
you want to use this feature.
The tools needed are latex, dvips and convert (convert is part of "imagemagick").
These all ship with Red Hat's GNU/Linux distribution but you may have to install
them yourself on other systems.
This is a "cosmetic" feature; it only affects the rendering of information, so
you can always add it later if you want to save time initially.
Chapter 3
How to Install EPrints (and Get Started)
3.1 Installation
(If you are upgrading an existing installation of eprints please see the section
on upgrading elsewhere in this manual.)
EPrints must be installed as the same user that the apache webserver runs
as. We suggest you install it as user "eprints" and group "eprints". Under some
UNIX platforms, a user and group can be created using the "adduser"
command. Otherwise refer to your operating system documentation.
Unpack the eprints tar.gz file:
% gunzip eprints-2.something.tar.gz
% tar xf eprints-2.something.tar
Now run the ”configure” script. This is a /bin/sh script which will attempt
to locate various parts of your system such as the perl binary. It will also check
your system for required components.
% cd eprints-2.something
% ./configure
By default the system installs as user and group ”eprints”. You will need to
change this if you are not installing as either ”root” or ”eprints”.
The configure script accepts a number of options. All are optional. The
most important are:
--help
List all the options (many are intended for compiled software and are
ignored).
--prefix=PREFIX
Where to install EPrints (or look for a version to upgrade). By default
/opt/eprints2/
--with-perl=[PATH]
Path of perl interpreter (in case configure can't find it, or you have more
than one and want to use a specific one).
--with-user=[USER]
Install eprints to run as USER. By default "eprints".
--with-group=[GROUP]
Install eprints to run as GROUP. By default "eprints".
--with-virtualhost=[VIRTUALHOST]
Use VIRTUALHOST rather than * for apache VirtualHost directives.
--disable-diskfree
Disable disk free space calls. This will be automatically set if configure
fails its tests for the df call.
--with-toolpath=[PATH]
An alternate path to search for the required binaries.
Once you are happy with your configuration you may install eprints by
running install.pl:
% ./install.pl
Now you should edit the configuration file for your copy of apache. This is
often /usr/local/apache/conf/httpd.conf or /etc/httpd/conf/httpd.conf
Add this line (if you didn't install eprints in /opt/eprints2/, replace that
with the location on your system):
Include /opt/eprints2/cfg/apache.conf
You may also wish to change the user and group apache runs as. The user
must be the same as the user you installed eprints as. We recommend:
User eprints
Group eprints
% bin/erase_archive ARCHIVEID
The following commands will generate the initial database tables, the initial
website and the apache configuration files to run this archive:
% bin/generate_apacheconf ARCHIVEID
% bin/create_tables ARCHIVEID
% bin/import_subjects ARCHIVEID
% bin/generate_static ARCHIVEID
% bin/create_user ARCHIVEID USERID EMAIL admin PASSWORD
% bin/generate_views ARCHIVEID
Where USERID, EMAIL and PASSWORD are your choice for the initial
administration account. Once you have made this account you can create new
accounts via the web interface.
For more information on what these commands do, see the last section of
this documentation or use the --man option.
After running generate_apacheconf or modifying the configuration you must
restart your webserver for the changes to take effect. The example below, which
stops and starts Apache, might not work on your system - if you have a problem
consult the apache documentation.
% /etc/rc.d/init.d/httpd stop
% /etc/rc.d/init.d/httpd start
Do not just use 'reload' or 'restart' as these do not force mod_perl to reload
the perl modules, and EPrints currently only reads the configuration when the
perl modules are loaded.
Backups
You should also make sure that the system is being properly backed up.
This is covered in more detail elsewhere in the documentation.
OAI
We would also encourage you to configure the OAI support for your archive and
register it. It's quite easy - pretty much fill in the blanks in the
ArchiveOAIConfig.pm file in the archive configuration directory.
EPrints 2.1 supports OAI versions 1 and 2 at URL paths /perl/oai and
/perl/oai2.
Once you register your archive (at http://www.openarchives.org) various
search systems will be able to collect the metadata (titles, authors, abstract
etc.) and allow more people to find records in your archive.
See http://www.openarchives.org/ for more information on the OAI protocol.
For more information on setting up the OAI interface for your archive, see the
section in this documentation about Configuring an Archive.
Setting it up
This is best done by using the UNIX "cron" command (as user "eprints"). Cron
will email "eprints" on that machine with the output, so it is best to use the
--quiet option so it only bothers you with errors.
How often you want to run this depends on the size of your archive, and
how fast the contents change. The running time is roughly order "n", which means
that if you double the number of items in your archive then you (roughly) double
the time it takes to run.
Once an hour would seem a good starting point. If your archive gets very
big, say more than 10000 records, then maybe once a day is more realistic - the
one thing that you don't want to happen is for a new generate_views to start
before the old one finishes, as they will mess up each other's output.
Run generate_views on the command line to find out how long it takes,
then edit the crontab and add the line
23 * * * * /opt/eprints2/bin/generate_views ARCHIVEID
This runs at 23 minutes past each hour. If you have more than one archive,
don't make them all start rebuilding at the same time; stagger them. Otherwise
once an hour everything will slow down as it fights to run several intensive scripts
at once.
See the crontab man page (man 5 crontab) for more information on using
cron.
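One way to make sure two runs never overlap is to have cron call a small wrapper that takes a lock before starting generate_views. The following is a minimal sketch, not something shipped with EPrints: the wrapper itself, the sub name run_exclusively and the lock file location under /tmp are assumptions made for illustration. Only the generate_views path and the --quiet option come from the text above.

```perl
#!/usr/bin/perl
# Hypothetical cron wrapper (not part of EPrints): skips this run if the
# previous generate_views for the same archive is still going.
use strict;
use warnings;
use Fcntl qw(:flock);

# Run @command while holding an exclusive lock on $lockfile.
# Returns 0 on success, 1 if the command failed, or nothing (undef)
# if another process already holds the lock.
sub run_exclusively {
    my ($lockfile, @command) = @_;
    open(my $lock, '>>', $lockfile) or die "cannot open $lockfile: $!\n";
    return unless flock($lock, LOCK_EX | LOCK_NB);   # previous run still busy
    my $rc = system(@command);
    close($lock);                                    # releases the lock
    return $rc == 0 ? 0 : 1;
}

# Usage: wrapper ARCHIVEID   (does nothing when no archive ID is given)
my ($archiveid) = @ARGV;
if (defined $archiveid) {
    my $result = run_exclusively(
        "/tmp/generate_views.$archiveid.lock",       # assumed lock location
        '/opt/eprints2/bin/generate_views', '--quiet', $archiveid,
    );
    exit(defined $result ? $result : 0);             # a skipped run is not an error
}
```

The crontab line would then call this wrapper with the archive ID instead of calling generate_views directly. A skipped run exits quietly, so cron sends no mail for it.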
3.5 Subscriptions
Subscriptions provide a way in which users of your system can receive regu-
lar updates, via email, when new items are added which match a search they
specified.
To automate sending out these subscriptions you must add some entries in
the crontab (as for views). You need one set of these per archive.
For example:
Note the spacing out so that all 3 don’t start at once and hammer the
database. You may wish to change the times, but we recommend early morning
as the best time to send them (midnight-6am).
4.1 Terms
This is a definition of some terms used in the eprints documentation and
comments. Many of these are "objects" within the code, and the perl module which
handles them is listed.
archive
EPrints::Archive
session
EPrints::Session
A session is created every time a cgi script or a bin script is executed, and
terminated afterwards.
eprint
EPrints::EPrint
An eprint is a record in the system which has one or more documents and
some metadata. Usually, more than one document is used to provide the same
information in multiple formats, although this is not compulsory.
document
EPrints::Document
user
EPrints::User
A user registered with the system (NOT necessarily the author of the
eprints they deposit).
subject
EPrints::Subject
A subject has an id and a list of its parents. There is a built-in
subject with the id "ROOT" to act as the top level. A subject can have
more than one parent to allow you to create a rich lattice, rather than just
a tree, but loops are not allowed.
dataobj or item
EPrints::DataObj
The ”super class” of subjects, users, eprints and documents. In the very
core of the system these are all treated identically and much of the con-
figuration and methods of these classes of ”thing” are identical. We use
the term item to speak about the general case.
dataset
EPrints::DataSet
database
EPrints::Database
The connection to the MySQL back end. datasets are stored in the MySQL
system, but you do not have to address it directly.
fields or metadata fields
EPrints::MetaField
A single field in a dataset. Each dataset has a few ”system” fields which
eprints uses to manage the system and then any number of archive specific
fields which you may configure.
subscriptions (sometimes called alerts in other archives)
EPrints::Subscription
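The subject structure described above (several parents allowed, loops forbidden) can be pictured with a small stand-alone sketch. This does not use EPrints::Subject: the %parents table, the subject ids other than "ROOT" and both sub names are hypothetical. It only illustrates the rule that a new parent link must be refused when the child is already an ancestor of the proposed parent.

```perl
use strict;
use warnings;

# Hypothetical subject table: subject id => list of parent ids.
my %parents = (
    ROOT       => [],
    subjects   => ['ROOT'],
    physics    => ['subjects'],
    biology    => ['subjects'],
    biophysics => ['physics', 'biology'],   # two parents: a lattice, not a tree
);

# Collect every ancestor of $id into the hash ref $seen and return it.
sub ancestors {
    my ($id, $seen) = @_;
    $seen ||= {};
    for my $parent (@{ $parents{$id} || [] }) {
        next if $seen->{$parent}++;         # already visited via another path
        ancestors($parent, $seen);
    }
    return $seen;
}

# Would making $parent a parent of $child create a loop?
sub would_create_loop {
    my ($child, $parent) = @_;
    return 1 if $child eq $parent;
    my $above = ancestors($parent);
    return $above->{$child} ? 1 : 0;
}

print would_create_loop('subjects', 'biophysics') ? "loop\n" : "ok\n";
print would_create_loop('biology',  'physics')    ? "loop\n" : "ok\n";
```

The first call reports a loop, because "subjects" is already above "biophysics". The second is fine: "physics" would simply gain a second parent.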
[Figure: class diagram (Logical View / Main, from H:\eprints.mdl, 12 February 2002)
for Chapter 4, EPrints Structure and Terms. It relates Archive, Session, Database,
DataSet, MetaField (system fields and archive fields), DataObj, EPrint, Document,
User, SearchExpression and SearchField, with "Apache & mod_perl" and the "Apache
subprocess" shown as external applications and "Item" as the actual data in the
database. Diagram notes: each apache subprocess only handles one session at a
given time, but may handle many sequentially; the database version of an item's
data can be represented by more than one DataObj at a time in multiple Sessions.]
Chapter 5
Configuring the System
5.1 EPrints General Configuration
apache.conf
This file is generated by generate_apacheconf. See the documentation
of generate_apacheconf for more information.
auto-apache.conf
This file is generated and overwritten by generate_apacheconf. Do not
edit it directly. See the documentation of generate_apacheconf for more
information.
auto-apache-includes.conf
This file is generated and overwritten by generate_apacheconf. Do not
edit it directly. See the documentation of generate_apacheconf for more
information.
languages.xml
This XML file contains an (exhaustive) list of all ISO language IDs and
their names.
system-phrases-languageid.xml
There is one of these files per language needed for any archive in this system.
These files contain the phrases needed to render the website and email in each
language, not counting names of things like metadata fields which vary
between archives. They should not be edited by hand, but may be overridden.
See the instructions on phrase files in the archive config documentation.
SystemSettings.pm
Described below.
SystemSettings.pm
This is a perl module which is created and edited by the eprints installer script
when installing or upgrading EPrints. It's found in perl-lib/EPrints/
SystemSettings contains system specific things:
base_path
The root directory of your eprints install. Normally /opt/eprints2/
executables
A hash of the paths of various external commands such as sendmail and
wget.
invocation
A hash of how eprints is to invoke various external commands. The
variables with uppercase names - $(FOO) - are replaced with parameters from
eprints; the lowercase names - $(sendmail) - are replaced with the strings
in executables.
archive_formats
An array of IDs of archive formats offered in the upload document page.
For each there must be an entry in archive_extension and in invocation.
In the invocation, $(DIR) is where eprints wants the contents of the
archive and $(ARC) is the archive file.
version_id
The id of the current eprints version.
version
The human readable version number.
user
The UNIX user eprints will run as. Usually "eprints".
group
The UNIX group eprints will run as. Usually "eprints".
virtualhost (Since v2.1)
If this is set, it is used for the VirtualHostName in the Apache
configuration files. (By default EPrints uses "*").
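The placeholder substitution described for "invocation" can be sketched as follows. This is an illustration under stated assumptions, not the EPrints implementation: the sub name expand_invocation, the unzip path and the example template are all hypothetical, and the real entries in SystemSettings.pm will differ on your system.

```perl
use strict;
use warnings;

# Hypothetical values, for illustration only.
my %executables = ( unzip => '/usr/bin/unzip' );
my %invocation  = ( zip   => '$(unzip) $(ARC) -d $(DIR)' );

# Replace $(name) placeholders: lowercase names come from %executables,
# uppercase names from the parameters supplied for this call.
sub expand_invocation {
    my ($template, %params) = @_;
    $template =~ s{\$\((\w+)\)}{
        my $name  = $1;
        my $value = ($name eq lc $name) ? $executables{$name} : $params{$name};
        defined $value ? $value : die "no value for \$($name)\n";
    }ge;
    return $template;
}

print expand_invocation(
    $invocation{zip},
    ARC => '/tmp/upload.zip',     # the archive file
    DIR => '/tmp/unpacked',       # where the contents should end up
), "\n";
```

With these example values the script prints "/usr/bin/unzip /tmp/upload.zip -d /tmp/unpacked". A placeholder with no matching entry stops the script rather than producing a half-expanded command.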
Configuring an Archive
apache.conf
This file is generated by generate_apacheconf. See the documentation
of generate_apacheconf for more information.
ArchiveConfig.pm
The general configuration items which don't fit anywhere else are in this
perl module. It is described fully later in this section of documentation.
This module "requires" the other 5 perl modules. They are in separate
files to make them easier to get to grips with.
ArchiveMetadataFieldsConfig.pm
This module configures the metadata fields and the default values.
ArchiveOAIConfig.pm
This module configures how the archive exports itself via the Open Archives
protocol.
ArchiveRenderConfig.pm
This module contains subroutines which handle rendering the data into
XHTML (mostly) for display as webpages.
ArchiveTextIndexingConfig.pm
This module handles turning UTF8 text strings into lists of index words
for free text searches.
ArchiveValidateConfig.pm
This module contains subroutines which check the metadata for problems.
auto-apache.conf
This file is generated and overwritten by generate_apacheconf. Do not
edit it directly. See the documentation of generate_apacheconf for more
information.
citations-languageid.xml
There is one of these files for each languageid supported by this archive. These
XML files describe how to turn metadata for an item into a citation (with
markup). They are described fully later in this section of documentation.
entities-languageid.dtd
There is one of these files for each languageid supported by this archive. These
DTD files are generated automatically just before eprints loads the archive's
configuration and should not be edited directly.
metadata-types.xml
This XML file describes the various types of eprints, users etc. and which
metadata fields are required or relevant to each. It is described fully later
in this section of documentation.
phrases-languageid.xml
There is one of these files for each languageid supported by this archive. These
XML files contain all the phrases which are specific to this archive, such as
the titles of metadata fields. They are described fully later in this section
of documentation.
ruler.xml
This XML file just contains the horizontal divider used in webpages cre-
ated by the system. It is described fully later in this section of documen-
tation.
static/
This directory contains the data needed to create the static webpages such
as the homepage, and about page. It is described fully later in this section
of documentation.
subjects
This file contains the initial subjects for the system. It is described fully
in the documentation for import_subjects.
template-languageid.xml
One of these files for each languageid supported by this archive. These
XML/XHTML files describe the outline for webpages for this system.
They are described fully later in this section of documentation.
XHTML
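These files use HTML elements (and other elements too). XHTML is a fairly new
version of HTML which is backward compatible with HTML 4 but written using
XML, not SGML. This means that it is much stricter, but less ambiguous and
easier to parse and modify. Assuming you know HTML, the main differences
are that element and attribute names must be lowercase, every attribute must
have a quoted value, and every element must be closed. For example, the
following HTML is not valid XHTML:
<img SRC=someurl>
<hr NOSHADE WIDTH=2>
<P>Foo bar</P>
In XHTML these are written as:
<img src="someurl" />
<hr noshade="noshade" width="2" />
<p>Foo bar</p>
And that's more or less it. See http://www.w3c.org/ for a complete
description.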
Extra Entities
The XML files all use a DTD which defines a few extra entities. Entities are
items in XML (or HTML) which start with "&" and end with ";" like &amp;.
These additional entities come from the entities DTD file created by
generate_entities. One DTD is created per language, although currently the only
variation is the archive name.
&archivename;
The name of the archive in the current language.
&adminemail;
The administrators email address.
&base_url;
The base URL of the system (without a trailing slash)
&perl_url;
The base URL of the CGI directory (without a trailing slash)
&frontpage;
The URL of the system homepage.
&userhome;
The URL of the user homepage.
&version;
The current EPrints version.
&ruler;
The XHTML of the standard divider.
None of these entities are available in the citations file or the ruler file.
<host>stoatprints.org</host>
<host>
The hostname of this archive.
<alias redirect="yes-or-no">
This is optional and may be repeated. It has the attribute "redirect" which
may be set to yes or no. This controls which virtual hosts are supported
and whether they should redirect to the main <host>.
<language>
The ISO id of a language supported by this archive. Repeatable. One of
these should also be the defaultlanguage. See below.
<port>
The port number that the server is running on. Usually 80.
<urlpath>
The directory from the root of the server name. Usually /
<archiveroot>
The filesystem path of the rest of the archive configuration.
<configmodule>
The path to the perl module which does the main configuration (Archive-
Config.pm)
<dbname>
The name of the MySQL database. Usually the same as the archive ID.
<dbhost>
The host on which MySQL is running. Usually localhost.
<dbport>
An optional MySQL port, if it’s not the standard one. Should be empty
if we are to use the default.
<dbsock>
An optional MySQL socket. Should be empty if we are to use the default.
<dbuser>
The username to use when connecting to MySQL, usually ”eprints”.
<dbpass>
The password to use to connect to MySQL.
<defaultlanguage>
One of the supported languages. This is the default for this archive.
<adminemail>
The email address of the archive administrator. I strongly suggest that
this is an alias rather than a personal email address. If all your webpages
contain bob's personal address and bill takes over from bob, you would have to
regenerate every page with bill's address. Much better to set up an
email alias or forward for the archive administrator and point it at
bob (for now). Heed these words spoken from grim experience!
<archivename language="langcode">
The name of the archive. This has an attribute "language", the value of
which is an ISO language id. There should be one of these archivename
elements per supported language. eg.
6.4 ArchiveConfig.pm
This module imports the other 5 perl modules. It allows lots of little tweaks to
the system, which are all commented in the file.
It includes options to hide various features you may not want and to
customise the browse, search and subscription functions.
You can also customise what each type of user can and can't do, and how
their passwords are authenticated.
This configuration file contains perl methods which are called when a session
starts and ends, to log things, to generate the entities for the entities file, and
to handle security on non-public files.
Browse Views
The browse views are generated by the script "generate_views", and what that
script does is configured by the "browse_views" item in the config.
It is a reference to a perl array [], each item of which is a hash {}.
The hash has 3 required properties and a number of optional ones.
id (required)
The ID of this view - the view will be placed in a subdirectory of /view/
of this name. The ID is also used to identify the full name of this view in
the phrase file. id=>"foo" would find its title in the phrase
"viewname_eprint_foo"
fields (required)
The list of the names of the fields to browse, separated by a slash "/".
This should normally be a single field unless you want to merge the values
of two fields. The id part of a field may be specified by appending ".id"
to the fieldname.
order (required)
A list of fields to sort by in order of priority, separated by slashes "/".
A minus sign "-" prefixing the fieldname indicates reverse sorting on that
field.
allow_null
Should we make a page for the ”unset” condition? A page for items which
do not have a year set may be useful. But for other fields this may be
meaningless. Set it to 1 for true.
include
Generate a file for every value, ending in ”.include” which contains the
XHTML of the citations of records and the number of records, but without
wrapping the site standard template around it.
nohtml
Normally the system generates a page like that described for ”include”
with a .html suffix and the site template. If nohtml is set to 1 then it
won’t.
citation
Normally the citation used is that for the "type" of eprint. If this is set
then that citation (from the citations file) will be used for all items. This
allows for some clever stuff if you want to make a page which can be pulled
into another website.
Normally the system puts a paragraph tag around each citation, but if
you use a custom citation this will not happen.
nocount
Do not include the count of how many items at the top of the page.
nolink
The system generates an index.html in /view/ with a list of all the
browse views available. Setting nolink to 1 will hide this view from that list.
noindex
Do not generate an index.html file in /view/foo/ listing all the values
of the view and linking to their respective pages.
notimestamp (since v2.2)
Do not add the timestamp at the bottom of the view page.
hideempty (since v2.2)
Only applicable to subjects. This option will suppress subjects which do
not have any records in them. This is useful on "young" archives which look
very empty if you have a large subject tree and only a few records, and
those clustered in 3 or 4 subjects.
The most common view is to browse by subject:
{ id=>"subject", allow_null=>0, fields=>"subjects",
order=>"title/authors", hideempty=>1 }
A more complex view generates a view on author & editor IDs which is
not advertised but may be captured by some other software to build staff CV
pages.
{ id=>"person", allow_null=>0, fields=>"authors.id/editors.id",
nohtml=>1, nolink=>1, noindex=>1, include=>1,
order=>"-year/title" }
For my example person id "wh" this will generate a webpage called
/view/person/wh.include (and one for each other value of the author or editor
IDs) which can be captured by an external automated system.
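To show how such hashes sit together in the array, here is a minimal sketch of a "browse_views" setting with two views. The variable $c is a hypothetical stand-in for the archive configuration hash, and the "year" view is an example made up from the properties described above rather than a quotation of the shipped defaults.

```perl
use strict;
use warnings;

# $c is a hypothetical stand-in for the archive configuration hash.
my $c = {};

$c->{browse_views} = [
    # Browse by year. allow_null => 1 also builds a page for items
    # which have no year set.
    { id => "year", allow_null => 1, fields => "year",
      order => "title/authors" },

    # Browse by subject, newest items first within each subject, and
    # leave out subjects which contain no records.
    { id => "subject", allow_null => 0, fields => "subjects",
      order => "-year/title", hideempty => 1 },
];

# List each view with the phrase id that holds its title and the
# directory its pages are written to.
for my $view (@{ $c->{browse_views} }) {
    print "$view->{id}: phrase viewname_eprint_$view->{id}, ",
          "pages under /view/$view->{id}/\n";
}
```

Each view needs a matching title phrase in the phrase file, so a view added here without one would have no proper name on the /view/ index page.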
User Privs
The user permission configuration allows you to set what each type of user can
and can't do. The user home page will only show a user the options which they
are allowed to use.
New types of user, and which data about themselves they can edit, are set in
metadata-types.xml.
Permissions are set by "type" of user. By default there are 3 kinds of user:
"user", "editor" and "admin".
Admin can, by default, do everything.
6.5 ArchiveMetadataFieldsConfig.pm
Fields Configuration
Metadata is data about data: the information which we store to describe each
record (eprint) in the system. Users also have metadata.
This module is the configuration for the metadata. This is probably the
most important part of the system.
The system automatically assigns some fields to each dataset (users, eprints,
etc.) such as "type" to eprints and "username" to users. The majority of the
fields are optional, and configured in this module.
Fields have a number of properties. The only required properties are "name"
and "type". Name is the name of the field, which is used to identify it
throughout the system. The other properties depend on what type the field is.
When you add a field you need to add the "human readable" version in
the phrase file. This separation allows you to change the description without
changing the field itself. When you add a field named "foo" to the "eprint"
metadata you should add "eprint_typename_foo" to the phrases. You may also
wish to add "eprint_typehelp_foo", which is the explanation given to the user on
the metadata input page.
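As a rough picture of what a set of field definitions might look like, here is a minimal sketch. The $fields container and the individual fields are hypothetical, and the variable names used in the real ArchiveMetadataFieldsConfig.pm may differ. The property names are the ones documented for each field type in this section.

```perl
use strict;
use warnings;

# Hypothetical container: dataset id => list of field definitions.
my $fields = {};

$fields->{eprint} = [
    { name => "title",    type => "longtext", input_rows => 3 },
    { name => "abstract", type => "longtext", input_rows => 10 },
    { name => "pages",    type => "pagerange" },
    { name => "foo",      type => "text", maxlength => 100 },
];

# "name" and "type" are the only required properties, so check that
# every definition has both before going any further.
for my $dataset (sort keys %$fields) {
    for my $field (@{ $fields->{$dataset} }) {
        for my $required (qw(name type)) {
            die "a field in '$dataset' is missing '$required'\n"
                unless defined $field->{$required};
        }
        print "$dataset.$field->{name} ($field->{type})\n";
    }
}
```

A check like this catches a mistyped property name early, before the definitions are used to build database tables.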
The following types of field are supported, along with their special property
options.
int
Optional properties: digits
This type describes a positive integer. Stored as an INT in the database.
year
This type describes a year. It works pretty much like ”int” but is always
4 digits long. Stored as an INT in the database.
longtext
Optional properties: input_rows, input_cols, search_cols
This type describes an unlimited length text field. Used for things like
titles and abstracts. It can't be efficiently searched as a single value, so the
system indexes the words. See the "free text indexing" section. Stored in
MySQL as a TEXT field.
date
This type describes a date, always expressed as yyyy-mm-dd, eg. 1969-
05-23. It is stored as a DATE in the database.
boolean
Optional properties: input_style
This is a simple yes/no field which is stored in the database as SET(
'TRUE','FALSE' ). It can be rendered as a menu, a check box or radio
buttons. (See input_style)
name
Optional properties: input_name_cols, search_cols
This type is used to store names of people (eg. authors). It is split into 4
parts: honourific, given names, family name and lineage. This may seem
over fussy but it avoids people putting "Reverend" in the given names or
"Junior" in the family name. If you dislike this you can hide honourific
and lineage (See ArchiveConfig.pm).
We use "family name" rather than "last name" in the hope of avoiding
international confusion (some countries list family name first, so their last
name is what I would call their "christian", or "first", name).
Names are stored using 4 SQL fields. The name field "supervisor" would
be stored as supervisor_honourific, supervisor_given, supervisor_family,
supervisor_lineage. Each is a VARCHAR(255).
set
Required properties: options
Optional properties: input_rows, search_rows
This type is a limited set of options. The list of options must be specified.
Each option must also be added to the phrase file. Option "foo" of field
"bar" in the "user" dataset will have the phrase id "user_fieldopt_bar_foo".
Stored in the database as a VARCHAR(255), containing the id of the option.
text
Optional properties: input_cols, maxlength, search_cols
This is a simple text field. It normally has a maximum length of 255
ASCII characters, less if non-ASCII characters are used as these are UTF-
8 encoded.
Stored in the database as a VARCHAR(255).
secret
Identical to ”text” except that the input field is a starred-out password
input field, and the value is only ever written to the database; it can’t be read
back. Writing an empty value will NOT change the previous value.
url
Identical to ”text” except it is rendered and validated differently.
email
Identical to ”text” except it is rendered and validated differently.
subject
Optional properties: top, showtop, showall, input_rows, search_rows
This is a hierarchical subject tree. At first glance it works like sets, but
it can be searched for all items in or below a given subject. Subjects may
be added to the live system.
The subject tree starts at a subject with the id ”ROOT”, but by default a subject
field only offers the items below the subject with the id ”subjects”.
This can be changed using the ”top” property, so that you can have two
fields whose options are different parts of the same tree.
Subjects may have more than one parent. eg. biophysics can appear under
both physics and biology, while still being the same subject.
See the bin/import_subjects manpage for more information on setting up
the initial subjects.
You may have more than one ”subject” field, eg. Subject and Department,
with unrelated parts of the subject tree as their ”top”.
A later version of eprints2 will have a feature which allows an admin user
to limit an editor user to a certain subject (and things below it), so that in
the above example you can declare an editor of either a Subject (capital-S)
or a Department.
pagerange
A range of pages, eg 1-44. Currently not searchable.
Stored in the database as a VARCHAR(255).
datatype
Required properties: datasetid
Optional properties: input_rows, search_rows
This field works like a set, but gets its options from the types of the dataset
specified.
For example, if you specified the datasetid ”user” then, unless you’ve
changed the defaults, this would give the options ”user”, ”editor” and ”admin”,
which are the types of user specified in metadata-types.xml.
Options are:
user
The types of user.
document
The types of document.
eprint
The types of eprint.
security
Security levels of a document (probably not very useful).
language
All the languages specified in languages.xml
arclanguage
The languages supported by this archive. Configured in ArchiveConfig.pm.
Stored in the database as a VARCHAR(255).
langid
This is used internally, it contains an ISO language ID. You probably
don’t want to use it. Stored as a CHAR(16).
id
This is also used internally, it contains the ID part of a field with the
hasid property. Don’t use it! Stored in the database as a VARCHAR(255).
Field Properties:
”status” indicates either ”system”, ”cosmetic” or ”other”. ”system” properties
cannot be changed without erasing and recreating your archive. ”cosmetic”
properties only affect the display of data and can be safely changed. ”other”
is explained in the description.
name
Status: system
Required by: all
Default: NO DEFAULT
The name of the field. It is strongly recommended that this contains only
lowercase a-z.
type
Status: system
Required by: all
Default: NO DEFAULT
The type of field. One of the list described above.
browse_link
Status: cosmetic
Optional on: all
Default: undef
This is the id of a ”browse” view. When the value is rendered it is
hyperlinked to the browse page for that value.
confid
Status: cosmetic
Internal use only. Sets the confid if a field is being created without a
dataset. The confid is used as a fake dataset for generating phrase ids.
datasetid
Status: other
Required by: datatype
Default: NO DEFAULT
Used to set which dataset’s types are this field’s options.
Changing this on a live system could cause some confusion, as values in
the old dataset may exist.
digits
Status: cosmetic
Optional on: int
Default: 20
Maximum number of digits for this number.
input_rows
Status: cosmetic
Optional on: longtext, set, subject, datatype
Default: set in ArchiveConfig.pm
The number of input rows in a text area, or options to display at once in a
menu. Setting to 1 will make a pull down menu (unless this is a ”multiple”
field).
search_cols
Status: cosmetic
Optional on: text, longtext, url, email, name, id
Default: set in ArchiveConfig.pm
The width of the search field. If searching multiple fields at once then the
value is taken from the first field in the list.
search_rows
Status: cosmetic
Optional on: datatype, set, subject
Default: set in ArchiveConfig.pm
The number of items to display in a search field list. If searching multiple
fields at once then the value is taken from the first field in the list.
input_cols
Status: cosmetic
Optional on: text, longtext, url, email
Default: set in ArchiveConfig.pm
The width of the input field.
input_id_cols
Status: cosmetic
Optional on: fields with ”hasid” set.
Default: set in ArchiveConfig.pm
Sets the width of the ID input field on a field with an ID.
input_boxes
Status: cosmetic
Optional on: fields with ”multiple” set.
Default: set in ArchiveConfig.pm
How many boxes to initially show on a multiple field.
input_style
Status: cosmetic
Optional on: boolean
Default: undef
By default booleans render as a check box. These other formats look a
bit clearer on the input field:
menu
Display as a pull-down menu. You will need to set the phrases
dataset_fieldopt_fieldname_TRUE and dataset_fieldopt_fieldname_FALSE
(where dataset and fieldname are the ids of the dataset and field).
These are the menu options.
radio
Display as radio buttons (ones which deselect when you select another
one). You will need to set the phrase dataset_radio_fieldname. This
phrase should have two ”pin” elements: true and false, which are the
positions to place the radio buttons.
fromform
Status: cosmetic
Optional to: all
Default: undef
A reference to a Perl function which will process the value from the form
before storing it. The function will be passed ($value, $session) where
$value is the value from the form and $session is the current EPrints::Session.
It should return the processed value.
This could be used, for example, to turn a username ”moj199” into a
userid ”312” for internal use.
toform
Status: cosmetic
Optional to: all
Default: undef
A reference to a perl function which will process the value just before it
is displayed in the form. The function will be passed ($value, $session)
where value is the value from the database and session is the current
EPrints::Session. It should return the processed value.
This could be used, for example, to turn a userid ”312” being used internally
by your systems into the more human-friendly username ”moj199”.
If you use toform then you should probably set fromform to change your
values back again.
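A toform/fromform pair might be sketched as below. This is only an illustration under stated assumptions: the lookup table, the function names and the ”owner” field are hypothetical, and a real archive would look the mapping up in its own user records. Only the ($value, $session) argument list is taken from the descriptions above.

```perl
use strict;
use warnings;

# Hypothetical lookup table; a real archive would query its own user records.
my %USERNAME_TO_ID = ( moj199 => 312 );
my %ID_TO_USERNAME = reverse %USERNAME_TO_ID;

# fromform: receives the value typed into the form, returns the value to store.
sub username_to_userid
{
	my( $value, $session ) = @_;
	return undef unless defined $value;
	# Unknown usernames are passed through unchanged in this sketch.
	return exists $USERNAME_TO_ID{$value} ? $USERNAME_TO_ID{$value} : $value;
}

# toform: receives the stored value, returns the value to show in the form.
sub userid_to_username
{
	my( $value, $session ) = @_;
	return undef unless defined $value;
	return exists $ID_TO_USERNAME{$value} ? $ID_TO_USERNAME{$value} : $value;
}

# Hypothetical field definition using both properties.
my $field = {
	name     => "owner",
	type     => "text",
	fromform => \&username_to_userid,
	toform   => \&userid_to_username,
};

print $field->{fromform}->( "moj199", undef ), "\n";
print $field->{toform}->( 312, undef ), "\n";
```

Keeping the two functions as exact inverses of each other means a value survives a round trip through the edit form unchanged.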
maxlength
Status: cosmetic
Optional to: text, email, url, secret
Default: 255
The maximum length of the value.
hasid
Status: system
Optional to: all
Default: 0
This adds an additional ”ID” property to the field. This is most useful on
a ”name” field which is ”multiple”. It associates an additional value with
the name, for example a username, or email address, which can be used
to uniquely identify that person. If you want to get an accurate list of all
of someone’s papers then their name is NOT good enough.
You might also wish to make a ”publication” text field have an ID which
is an optional ISSN, but it makes more sense in ”multiple” fields.
multilang
Status: system
Optional to: all (but silly for date, year, int, boolean)
Default: 0
If set this makes the field ”multilingual”. That is to say it can have more
than one value, one value per language.
For example, the ”canadian stuff” archive may wish to make its title
and abstract multilang so that authors can enter them in both French and
English.
This is more useful than having title_en and title_fr as eprints understands
it and can render the version of the field appropriate to the viewer (if they
set a language preference).
multiple
Status: system
Optional to: all (but silly for date, year, int, boolean)
Default: 0
If set this property makes the field a LIST rather than one value and
handles rendering it as a list and inputting it. The input field will appear
with a default of 3 inputs and a ”more spaces” button which will reload
the page with more if you need more than 3.
This causes the field to be stored in a separate SQL table.
options
Status: other
Required by: set
Default: NO DEFAULT
This should be an array of options. eg.
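A ”set” field with its options might look something like the sketch below. The field name and option ids are made up, and it is assumed that options is written as a reference to a Perl array in the same way as the requiredlangs list.

```perl
# Hypothetical "set" field; the field name and option ids are illustrative only.
my $field = {
	name       => "refereed",
	type       => "set",
	input_rows => 1,    # a single row is shown as a pull-down menu
	options    => [ "yes", "no", "unknown" ],
};
```

Following the phrase id pattern described for the set type, a field like this in the ”eprint” dataset would need phrases such as eprint_fieldopt_refereed_yes in the archive phrase file.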
required
Status: system
Optional to: all
Default: 0
This indicates that this field is always required. It is not recommended to
set this, but rather to indicate the requiredness of fields by type in the
metadata-types.xml file.
Either way you set it, required fields will cause the item they are in to fail
to validate unless the field has a value.
requiredlangs
Status: other
Optional to: fields with ”multilang” property
Default: []
A list of languages which are required for this multilang field. eg. you
can force an ”en” (english) entry, while allowing them to optionally add
others.
eg. [ ”en”, ”fr” ]
A list of codes can be found in languages.xml
Adding more requiredlangs does not magically give you values for these
languages in existing data.
showall
Status: cosmetic
Optional to: subject
Default: 0
By default subjects are only shown if they are ”depositable”. This option
makes all subjects, depositable or not, options.
showtop
Status: cosmetic
Optional to: subject
Default: 0
If set then the topmost item in the subject is shown. Usually this is a
container, eg. ”subjects”, and should remain hidden.
top
Status: cosmetic
Optional to: subject
Default: ”subjects”
Sets the top node in the tree. The options are all the children (and their
children).
idpart
Used internally.
mainpart
Used internally.
render_single_value
Status: cosmetic
Optional to: all
Default: undef
This overrides the rendering of a single item. In a multiple, multilang field
it will be called on each value of the language to display.
This is a reference to a function which takes ( $session, $field, $value )
and returns a XHTML DOM fragment.
Set this to \&EPrints::Latex::render_string to make eprints try to spot
latex in this field’s values and render it as images instead!
(Since EPrints v2.1) Set this to \&EPrints::Utils::render_xhtml_field to
make eprints read this field as XML and place that XML right in the
XHTML web page. (Normally the system would escape all the greater-than
and less-than characters.)
render_value
Status: cosmetic
Optional to: all
Default: undef
This is a reference to a function which will render the entire value of the
field, overriding eprints’ own renderer. It should take as parameters: (
$session, $field, $value, $alllangs, $nolink )
The function should return an XHTML DOM fragment.
If $alllangs is set then the function should render all values on a multilang
field, rather than just the ”best” one.
If $nolink is set then no HTML anchor links should be used, eg. to link a
URL.
export_as_xml
Status: cosmetic
Optional to: all
Default: 1
If this attribute is set to zero then this field will be omitted from the
output of the XML export script.
$value =~ m/^(.)([0-9]+)$/;
return sprintf( "%s%08d", $1, $2 );
This would turn a2 into a00000002 and a11 into a00000011 which will sort
correctly alphabetically. Don’t worry - these values are only ever used for
sorting, they should never get output.
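The padding idea can be tried out on its own with a sketch like the one below. The function name is hypothetical and it takes only the value, since the arguments EPrints passes are not shown here. Returning non-matching values unchanged is an assumption of this sketch; it also avoids reusing stale $1 and $2 when the pattern does not match.

```perl
use strict;
use warnings;

# Pad the numeric part of ids such as "a2" so that plain string
# comparison puts them in numeric order.
sub pad_id_for_sorting
{
	my( $value ) = @_;
	if( $value =~ m/^(.)([0-9]+)$/ )
	{
		return sprintf( "%s%08d", $1, $2 );
	}
	# Anything that is not "one character then digits" is left alone.
	return $value;
}

my @ids = ( "a11", "a2", "a100" );
my @sorted = sort { pad_id_for_sorting( $a ) cmp pad_id_for_sorting( $b ) } @ids;
print join( ",", @sorted ), "\n";    # prints a2,a11,a100
```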
You should probably use the bin/reindex command on the dataset in question
(probably ”archive” or ”user”) after adding or changing this property
on a field. This may take a significant amount of time.
fieldnames
Status: cosmetic(ish)
Required by: search
Default: NO DEFAULT
This should be a reference to an array of field names - exactly like the
ones used in ArchiveConfig.pm to configure search, advanced search and
subscriptions.
Adding fields to this will cause no problem. Removing fields will mean
that those fields are ignored when turning values of this field back into
searches.
Defaults
This section of the file contains subroutines which are called to set default values
for Users, Documents and EPrints.
Automatics
These functions let you set automatic fields. This allows you to make fields
which are updated automatically each time the item (User/EPrint/Document)
is committed to the database.
This allows you to create ”compound” fields. Such fields are created by
processing the values of other fields rather than being edited directly.
For example, if you wanted to make an automatic int field which contains the
number of authors, you could add the following to set_eprint_automatic_fields:
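A minimal sketch of such a function is given here under several assumptions: ”authors” is taken to be a multiple field, ”num_authors” is a hypothetical int field you would have to define yourself, and get_value and set_value are assumed to be the accessors on the item object (check the perldoc for your version). A small stand-in object is included so the sketch runs without EPrints installed.

```perl
use strict;
use warnings;

# Sketch only: count the authors and store the count in another field.
sub set_eprint_automatic_fields
{
	my( $eprint ) = @_;

	my $authors = $eprint->get_value( "authors" );
	my $count = ( ref( $authors ) eq "ARRAY" ) ? scalar @{$authors} : 0;
	$eprint->set_value( "num_authors", $count );
}

# Minimal stand-in for an eprint object, for trying the sketch out.
package MockEPrint;
sub new { my( $class, %data ) = @_; return bless { %data }, $class; }
sub get_value { my( $self, $name ) = @_; return $self->{$name}; }
sub set_value { my( $self, $name, $value ) = @_; $self->{$name} = $value; }

package main;
my $eprint = MockEPrint->new( authors => [ "Smith", "Jones" ] );
set_eprint_automatic_fields( $eprint );
print $eprint->get_value( "num_authors" ), "\n";
```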
6.6 ArchiveOAIConfig.pm
This module configures how the archive exports its data via the OAI protocol.
For more information on the how and why of OAI see http://www.openarchives.org/
OAI allows a harvester to request the metadata from your archive and other
archives to provide a federated search. The next time the harvester harvests
your archive it only has to ask for items which have changed or been added
since the last time it asked.
The current version of EPrints supports both OAI 1.1 and OAI 2.0.
The base URL for your OAI v1.1 interface will be http://archivepath/perl/oai
The base URL for your OAI v2.0 interface will be http://archivepath/perl/oai2
If you want to use the OAI system then you need to fill in the blanks, such
as policy and the OAI-id of the archive.
You may create OAI sets in a similar manner to ”browse views” in
ArchiveConfig.pm.
If you want to change the way that an EPrint is mapped into Dublin Core
then edit the make_metadata_oai_dc function, which returns a DOM XML object.
To add a new metadata type you need to add a new mapping function and
add entries to the namespaces, schemas and functions items near the top of the
file.
6.7 ArchiveRenderConfig.pm
This module contains functions which turn data into XHTML for displaying on
the web.
If you want to change the way a user info page, or an eprint ”abstract” page
is rendered then here’s the place to do it.
There are also ”full” versions of these functions which display all the internal
variables and things. These are the views which the editors and admin see.
The XHTML is generated using DOM (Document Object Model), but eprints
provides some functions for easily generating XHTML DOM. The only method
of DOM you should need to use is appendChild - which adds an element to this
element.
% perldoc /opt/eprints2/perl_lib/EPrints/Archive.pm
The functions most useful for extracting and rendering information are
documented here:
$session->render_ruler();
Returns the default ruler for the archive (from ruler.xml).
$session->render_link( $uri, $target )
Returns the XHTML element (with URI properly escaped):
<a href="uri"></a>
Which you can appendChild stuff into. If $target is specified then a target
attribute is included - to make it pop up a new window.
$item->render_value( $fieldname, $showall )
$item is either an EPrint, a User or a Document.
$fieldname is the name of the field you want to render. If $showall is 1
then ALL values are rendered in a multilang field.
$item->render_citation( $style )
Renders the citation of the item using the citation for the item’s type from
the citation file.
If $style is set then it uses the citation with that id instead.
6.8 ArchiveTextIndexingConfig.pm
You probably won’t need to change this module unless you want to modify how
eprints searches for words in strings.
When a record is added to the system eprints uses this module to turn a
string into a list of values which are indexed. By default these are words with 3
letters or more, except some predefined stop words. It also turns latin characters
with accents into their plain ASCII (no acute/grave) versions.
It then does the same with the search string and looks for these keys.
Example:
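The general idea can be sketched in a few lines of Perl. This is not the module’s actual code: the stop word list and accent table here are tiny illustrative stand-ins, and lowercasing and dropping duplicate words are assumptions of this sketch.

```perl
use strict;
use warnings;

# Illustrative stop list only; the real list lives in the module.
my %STOP = map { $_ => 1 } qw( the and for with from );

# Illustrative accent table covering just a few latin-1 characters.
my %PLAIN = (
	"\x{e9}" => "e", "\x{e8}" => "e",
	"\x{e1}" => "a", "\x{e0}" => "a",
	"\x{f3}" => "o", "\x{fc}" => "u",
);
my $ACCENTED = join( "", keys %PLAIN );

# Turn a string into the list of words which would be indexed.
sub extract_index_words
{
	my( $text ) = @_;
	$text = lc $text;
	$text =~ s/([$ACCENTED])/$PLAIN{$1}/g;

	my %seen;
	my @words;
	foreach my $word ( split /[^a-z0-9]+/, $text )
	{
		next if length( $word ) < 3;    # too short to index
		next if $STOP{$word};           # stop word
		next if $seen{$word}++;         # already listed
		push @words, $word;
	}
	return @words;
}

print join( " ", extract_index_words( "The caf\x{e9} and the r\x{e9}sum\x{e9} of an Eprint" ) ), "\n";
```

Running the search string through the same function is what makes a search for ”resume” find a title containing the accented spelling.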
6.9 ArchiveValidateConfig.pm
This module handles validating data entered by users. Each subroutine is de-
scribed in more detail in the module itself.
Each subroutine returns a list of DOM elements, each of which describes a
single problem. Any problems will prevent the user from continuing with editing
until they are corrected.
As with the rendering functions, if you don’t care about making this work
in more than one language then you can just make the DOM items by calling
$session->make_text( "problem explanation" )
The eprint and document validation routines have a flag $for_archive which,
if true, indicates that the item is being checked before going into the actual
archive. You can use this to force an editor to enter fields which the user may
leave blank.
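A validation routine using this flag might be sketched as follows. The field names ”isbn”, ”issn” and ”keywords” are hypothetical, the argument order and the get_value accessor are assumptions, and the two small stand-in packages exist only so the sketch can be run without EPrints; in a real archive make_text returns a DOM text node rather than a plain string.

```perl
use strict;
use warnings;

# Sketch of a metadata check which returns a list of problems.
sub validate_eprint_meta
{
	my( $eprint, $session, $for_archive ) = @_;
	my @problems = ();

	# Either field "A" or field "B" must be set.
	unless( defined( $eprint->get_value( "isbn" ) )
		|| defined( $eprint->get_value( "issn" ) ) )
	{
		push @problems, $session->make_text( "Please give an ISBN or an ISSN." );
	}

	# Stricter rule applied only when the item is going into the archive.
	if( $for_archive && !defined( $eprint->get_value( "keywords" ) ) )
	{
		push @problems, $session->make_text( "Keywords are required." );
	}

	return @problems;
}

# Minimal stand-ins so the sketch runs on its own.
package MockSession;
sub new { return bless {}, shift; }
sub make_text { my( $self, $text ) = @_; return $text; }

package MockEPrint;
sub new { my( $class, %data ) = @_; return bless { %data }, $class; }
sub get_value { my( $self, $name ) = @_; return $self->{$name}; }

package main;
my $session = MockSession->new;
my $eprint = MockEPrint->new( issn => "1234-5678" );
my @for_user = validate_eprint_meta( $eprint, $session, 0 );
my @for_editor = validate_eprint_meta( $eprint, $session, 1 );
print scalar( @for_user ), " ", scalar( @for_editor ), "\n";
```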
Validation Functions
validate_field
Called for all fields. Use it to check individual field values. By default
it checks that URLs look OK.
validate_eprint_meta
Check the metadata of an eprint. Use this to test dependencies between
fields, eg. if you have a requirement that field ”A” OR field ”B” must be
set.
validate_eprint
Validate the whole eprint. The last part of the validation of an eprint.
6.10 citations-languageid.xml
The citations file describes how to render an item (eprint/user/whatever) into
a short piece of XHTML. Each citation has a ”type”. There are 3 kinds of
citation:
default citation
This is a very short description of the item. Usually ”the title or failing
that, the id”. The type id is just the name of the dataset. eg. ”eprint”
type citation
These are richer descriptions which vary between type of eprint, user or
document. The type id is dataset_type eg. eprint_preprint.
other citation
Used by custom browse views. Any name you like.
The citation file contains a list of citation elements:
<ep:citation type="...">
Each one may contain text and tags. The text may also include the names of
fields in the record being rendered. These names should be between @ symbols.
eg. @authors@ or @title@. These will be replaced with a rendered version of
the value in that field. (if you need an actual @ symbol for some reason two
@@ with nothing inside will be rendered as a single @).
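The substitution rule can be illustrated with plain strings. This is only a sketch of the rule, not how EPrints does it: the real system works on XML and inserts rendered DOM values, and the function names here are hypothetical.

```perl
use strict;
use warnings;

# Return the text to put in place of one @...@ marker.
sub _citation_value
{
	my( $name, $values ) = @_;
	return "\@" if $name eq "";    # "@@" stands for a literal @
	return defined $values->{$name} ? $values->{$name} : "UNSPECIFIED";
}

# Replace every @fieldname@ in the text with that field's value.
sub expand_citation_text
{
	my( $text, $values ) = @_;
	$text =~ s/\@([a-z_]*)\@/_citation_value( $1, $values )/ge;
	return $text;
}

my %values = ( authors => "Smith, J.", year => "2002", title => "On Lemurs" );
print expand_citation_text( '@authors@ (@year@) @title@. Contact admin@@example.org', \%values ), "\n";
```

The template is written in single quotes so that Perl does not try to interpolate @authors as an array.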
Note. The @title@ style was introduced in EPrints 2.2. Before that this file
used XML entities such as &title; but this caused problems and didn’t solve
any. Use of entities is still supported, but deprecated.
In addition you may use XHTML elements and the following elements in the
eprints namespace. These elements are always removed but they control whether
their contents are kept or not. Conditional elements may be placed inside each
other since v2.2.
<ep:linkhere>
This element is replaced with an XHTML anchor linking to the item. If
this citation is being rendered without a link then it is just removed (but
not the contents).
<ep:iflink>
The contents of this element are only preserved if we are rendering this
citation as a link. Maybe an icon which you don’t want if it’s not a link.
<ep:ifnotlink>
The opposite of iflink.
<ep:ifset ref="fieldname">
The contents of this element are only preserved if the field ”fieldname”
has a value.
<ep:ifnotset ref="fieldname">
The contents of this element are only preserved if the field ”fieldname”
does not have a value.
name
This is the name of one or more fields, specified as in the search fields
configuration. eg. ”title/abstract”
value
This is a value to search for. Treated like the value entered in a
search field.
merge (optional)
Can be ANY or ALL. Works like the match all? in a search form.
match (optional)
Can be IN, EQ, or EX: In, Equal or Exact. Exact on subjects means
that subject, but not any below it in the hierarchy.
For example:
This will render (approx) after years before 1950. Neat eh?
6.11 metadata-types.xml
This file allows you to configure the types of eprint, user, document and docu-
ment security level.
When you add a new type you should add its name to the archive phrases
file(s). The phraseid is ”dataset_typename_typename” eg. ”document_typename_pdf”,
and you should add a new citation to the citations file. Any fields which are
not required but appear in the citation should probably be inside a <ep:ifset>
so that you don’t see ”UNSPECIFIED” if they are not, er, specified.
The main element is ”metadatatypes”. This contains a list of ”dataset”
elements each of which has a name attribute.
The ”type” elements in user and eprint ”dataset”s should contain a list of
”field” elements. This describes the fields which may be edited for this type and
the order that they appear on the form.
You may include system fields in this list, but be careful if you do.
Attributes for ”field” are:
name
The name of the metadata field.
required
If set to ”yes” then this field may not be left blank. Some system fields
are always required no matter how this is set.
staffonly
This field only appears on the ”editor” edit eprint form, not the user one.
Or, in the case of the user dataset, the staff edit-user page.
6.12 phrases-languageid.xml
This file contains a list of XML ”phrases”. Everything eprints ”says” to users
is stored in this file and its system-level counterpart. If you want the site to run
in more than one language, you need one phrase file per language.
The phrase file is XML and contains a toplevel ”phrases” element. This
contains the list of phrases.
Each phrase has a ”ref” attribute to identify it and contains text and optionally
some XHTML tags. It may also contain eprints entities such as &archivename;
and some phrases should also contain ”pin” elements, described below.
The phrases in the archive phrase file are specific to that archive, the system
phrase file contains non-archive specific phrases. The id’s of most of the phrases
in the archive phrases are generated from the id’s of the fields, datasets, types
etc.
The archive phrase file contains: names of dataset types, names of metadata
fields, help on entering each metadata field, the names of options in ”set” fields,
the description of different search ordering options, names of browse views,
phrases used in the render and validation routines, mail which eprints sends out
and phrases which override those in the system file.
pins
Some phrases need some ”pin” elements to show eprints where to insert values.
Usually pins don’t contain any elements but occasionally they do when they
represent what to place a link around.
Emails
EPrints sends out emails when a user registers/changes their password, when a
user changes their email, when a deposited item is rejected/deleted by an editor
and when the system is low on resources. These mails can be customised in the
phrase file.
Make sure you wrap your text in paragraph <p> tags. EPrints will auto-
matically word wrap these in the email. <hr /> elements in a mail are turned
into a line of dashes.
When eprints sends a mail it will send it as plain ASCII text, unless it
contains latin-1 elements, in which case it will be latin-1 encoded. If it contains
unicode characters not in the latin-1 charset then it will be utf-8 encoded.
6.13 ruler.xml
This file configures the horizontal divider which eprints uses, which is inserted
in place of &ruler;
If you have no great dislike of <hr> horizontal rulers then you can leave
it alone.
You can’t use entities like &frontpage; in ruler.
6.15 subjects
This file is not used by the core eprints system. It is used by import_subjects
to set up the initial subjects. For more information see the instructions for
import_subjects.
6.16 template-languageid.xml
This file is the shell of every page in the system. It is more or less a normal
XHTML page but you can use the eprints &foo; entities in it and it should
contain ”pin” elements like a phrase. The pins it should contain are:
Troubleshooting
Possible cause: Missing the ndbm library which is required (for some reason).
Solution: It comes as part of gdbm which is free. If working from a package
you need gdbm-devel to get the header files (.h files).
64 CHAPTER 7. TROUBLESHOOTING
7.4 General
Solution: Build apache following the detailed instructions in the ”required
software” section of the documentation.
How-To Guides
66 CHAPTER 8. HOW-TO GUIDES
This will generate a view which is NOT listed on the /view/ page and it
will not skip making the .html file and make a .include file per value. This will
only contain the ”count” of items and the XHTML of their citations. This can
be used as part of a page on the http://www.foobaars.ac.uk/ site; either by
using php and capturing it on-the-fly using readfile or scripting it with perl
or NFS exporting the filesystem onto the main server (or just doing it all on
one computer) and using server side includes to place it in a page.
<ep:citation type="project_table"><tr><td>&title;</td><td>&authors;</td>
<td>&year;</td><td><ep:linkhere><img
src="http://www.foobars.ac.uk/images/paperlink.png" alt="view" width="32"
height="64" border="0" /></ep:linkhere></td></tr></ep:citation>
That should generate for the manticore project, in the ”.include” file (I’ve
cut the contents of the ”img” tag for readability):
This will generate both .html pages and .include pages. A member of your
organisation can get a list of their records either by linking to /views/by_person/alice.html
(where alice is their username) or by snarfing the URL /views/by_person/alice.include
into their own homepage.
8.2. HOW TO: ADD A NEW FIELD 67
Setting input_rows to one will make it appear as a pull-down menu.
Add the Field to metadata-types.xml
If you want the user to be able to edit this field for any or all types of
eprint/user then you need to add it to each appropriate type in
metadata-types.xml (this can be changed on a live system without any serious
consequences).
<ep:citation type="eprint_techreport"><ep:linkhere><span
class="citation">&authors; <ep:ifset name="year">(&year;)
</ep:ifset>&title;. Technical Report<ep:ifset name="reportno">
&reportno;</ep:ifset><ep:ifset name="department">, &department;
</ep:ifset><ep:ifset name="institution">, &institution;</ep:ifset>.
&local;.</span></ep:linkhere></ep:citation>
All we’ve done is add &local;. to the end. It’s not inside <ep:ifset
name="local"> as it is a required field and will (should) always be set.
Add it to the Abstract (or View-User) page.
This is also optional. If you want it to appear on the web page for this item
then edit ArchiveRenderConfig.pm and select the appropriate function,
either eprint_render or user_render.
In our example we only want to mention items if an item was not produced
locally. We’ll add it below the documents and above the abstract...
Single language example:
Multiple-language example:
If you want to make it handle more than one language then we’ll need to use
the archive phrase file - we would add something like this to each language’s
file:
You may prefer to use this method even if you are only using a single
language.
If you add a field you will need to run erase_archive and create_tables before
you will see a change. EPrints will fail to run if you change the fields and do
not rebuild the tables.
Using d3eprints.open.ac.uk
Just add the following code to the ArchiveRenderConfig.pm file just before the
if( $has_multiple_versions ) bit.
Please note that this code is not internationalised.
#####################################
# Begin D3Eprints links
$table->appendChild( _render_row(
$session,
$session->make_text( "D3Eprints discussion" ),
$ol ) );
readfile( "http://eprints.foo.org/perl/latest?mainonly=yes" );
mkdir /opt/eprints2/cgi/local
and place your scripts in there, they will have URLs under http://yoursite.com/perl/
<form action="/perl/google_site">
<p>Use Google to search this site:
<input type="text" name="q" value="" />
<input type="submit" name="go" value="Search" /></p>
</form>
For other search engines; see their documentation for how to make a form
to search only your site.
?xuversion=1.0&locspec=charrange:offset/length
76 CHAPTER 9. VLIT TRANSCLUSION SUPPORT
http://www.weebleprints.co.uk/archive/00000543/01/notfalldown.txt
?xuversion=1.0&locspec=charrange:1403/130
(All one URL, only split to fit on the page) This will return characters from
offset 1403 to 1533.
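How a charrange locspec maps onto the text can be sketched as below. The function names are hypothetical and this is not the VLit.pm code; it also assumes that offsets count from zero, which the description above does not state.

```perl
use strict;
use warnings;

# Split "charrange:offset/length" into its two numbers.
# Returns an empty list if the locspec is not a character range.
sub parse_charrange
{
	my( $locspec ) = @_;
	if( $locspec =~ m/^charrange:([0-9]+)\/([0-9]+)$/ )
	{
		return ( $1, $2 );
	}
	return ();
}

# Return the requested characters, or undef for a bad locspec.
sub extract_charrange
{
	my( $text, $locspec ) = @_;
	my( $offset, $length ) = parse_charrange( $locspec );
	return undef unless defined $offset;
	return "" if $offset > length( $text );    # nothing there to return
	return substr( $text, $offset, $length );
}

my( $offset, $length ) = parse_charrange( "charrange:1403/130" );
print $offset + $length, "\n";                                  # prints 1533
print extract_charrange( "0123456789", "charrange:3/4" ), "\n"; # prints 3456
```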
Human Mode
An optional ”mode” parameter may be used. The ”human” mode returns the
character range as HTML with characters like & properly escaped and new line
characters turned into HTML ”br” break tags. It will place two links before the
text: a (c) link which will link to an explanation of transcopyright - If you want
to change this URL you’ll have to hack VLit.pm - and a TRANS link which will
take you to the context of the quote - 1024 extra characters before and after
but with the quote highlighted in red. Clicking TRANS on the context view
will take you to the full raw document.
XML-Entity Mode
You may also set mode to be xml-entity, eg:
http://lemur1.ecs.soton.ac.uk/archive/00000134/01/xuDation-d18.txt?
locspec=charrange:10429/488&xuversion=1.0&mode=xml-entity
where startx,endx,starty,endy and n are all positive integers. Any parts may be
omitted: To specify the first 50 rows of page 3 locspec=area:page=3/vrange=,50.
Chapter 10
Why Backup?
It is almost certain that you will be storing valuable information in your EPrints
server. Even assuming that the eprints code is 100% bug free and that you will
never delete 8000 records when you run the wrong script at 3am, you still need
to back up! Drives and fans break. Computers get stolen. Server rooms get
flooded (that happened to us!). Without proper backups this could be a disaster.
What to Backup
You need to backup two things.
The /opt/eprints2/ directory (or whatever you called it). Not all the
subdirectories have to be backed up, but it is much easier to backup the whole
thing. Make sure that you back up any symbolically linked directories too.
Each MySQL database which your archives use. See the MySQL manual for
more information on backing up MySQL databases. The mysqldump command
will dump the whole of a database as a big list of SQL commands to re-create
it.
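The two parts of a backup could be scripted along the lines below. This is a sketch only: the destination directory, database name and MySQL user are placeholders, and the function just builds the tar and mysqldump command lines so you can inspect them before running anything.

```perl
use strict;
use warnings;

# Build (but do not run) the backup commands for one archive.
sub backup_commands
{
	my( %opts ) = @_;

	# Archive the whole EPrints directory.
	my @tar = ( "tar", "-czf", "$opts{dest}/eprints2-files.tar.gz",
		$opts{eprints_dir} );

	# Dump the archive's database as SQL; -p makes mysqldump ask for
	# the password rather than putting it on the command line.
	my @dump = ( "mysqldump", "--user=$opts{db_user}", "-p",
		"--result-file=$opts{dest}/$opts{db_name}.sql", $opts{db_name} );

	return ( \@tar, \@dump );
}

my( $tar, $dump ) = backup_commands(
	dest        => "/backup",          # placeholder
	eprints_dir => "/opt/eprints2",
	db_name     => "foobar",           # placeholder database name
	db_user     => "eprints",          # placeholder MySQL user
);
print join( " ", @{$tar} ), "\n";
print join( " ", @{$dump} ), "\n";
# To run a command for real: system( @{$tar} ) == 0 or die "tar failed";
```

Note that tar stores symbolic links as links by default, so symbolically linked directories need to be backed up separately or with GNU tar’s -h option.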
Best Practice
We strongly recommend that you:
* Regularly backup your EPrints archive and database.
* Keep multiple sets of backups.
* Keep a recent backup physically separate from the archive - either in
another room or ideally another site.
* Regularly check that you can actually restore from your backup. It’s not
uncommon for people to produce a daily backup for years without checking it.
When they come to need it, they discover that something has gone wrong and
the backup is useless.
80 CHAPTER 10. BACKING-UP YOUR SYSTEM
* Assume that you will be restoring to different hardware - the tape drive
may be stolen or melted too, and you’ll be unable to get one just the same ’cus
they stopped making them! Check that your backups work on hardware other
than that used to create them.
* Decide who is responsible for backups. Their responsibilities should include
making sure that the above policies are implemented even if they are ill or
unavailable and making sure that someone else knows how to take over making
and restoring the backups if they leave or are hit by a bus.
If you can’t do all of these, which is admittedly a lot of extra work, then do
as many as you can.
Fortune favours the backed-up. It always seems to be the un-backed-up
systems that have disk crashes. Life’s like that...
Chapter 11
eprints-tech
eprints-tech is the technical mailing list for people who want to argue over XML
processing and which libraries to use. UNIXy people.
To subscribe send an email to [email protected] containing the
text
subscribe eprints-tech
82 CHAPTER 11. PROBLEMS, QUESTIONS AND FEEDBACK
eprints-underground
eprints-underground is the non-technical mailing list for people who actually
care what the system does, not how. This is for discussion of metadata, politics
(how to get people to fill the damn data in) and what people actually plan to
use the eprints for.
To subscribe send an email to [email protected] containing the
text
subscribe eprints-underground
OAI Lists
The Open Archives Protocol has some mailing lists of its own; see the
http://www.openarchives.org/ site for information on these.
Chapter 12
Generally speaking, when upgrading EPrints v2 you should unpack and install
EPrints to the same path as your current version. The installer will detect the
existing version and upgrade it. Existing files which you have altered
should be automatically backed up so they don’t get lost. But if your hacks are
important then you should probably back them up by hand before upgrading.
Always stop Apache before upgrading.
Always make sure your system is fully backed up before upgrading.
Then you should follow the specific instructions for each stage from your old
version to your new version...
84 CHAPTER 12. UPDATING FROM PREVIOUS VERSIONS
Remove /opt/eprints2/archives/foobar/cfg/
Now install eprints 2.0 and agree to upgrade when it asks.
Copy the new default configuration into your archive dir:
% cp -R /opt/eprints2/defaultcfg/ /opt/eprints2/archives/foobar/cfg/
Work through the ”diff” you produced, and re-apply the changes to the
contents of /opt/eprints2/archives/foobar/cfg/.
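The step that produces the "diff" is not shown in this section. One plausible way to produce such a diff, assuming it compares the stock configuration with your customised copy, is sketched below. The directories and the example.conf file are temporary stand-ins created for the example, not real EPrints paths.

```shell
#!/bin/sh
# Sketch only: temporary directories stand in for defaultcfg/ and your archive's cfg/.
work=$(mktemp -d)
mkdir -p "$work/defaultcfg" "$work/cfg"
printf 'name = default\n'    > "$work/defaultcfg/example.conf"
printf 'name = my archive\n' > "$work/cfg/example.conf"

# diff exits with status 1 when the trees differ, which is expected here.
diff -ru "$work/defaultcfg" "$work/cfg" > "$work/local-changes.diff" || true

# Count the lines our local copy added relative to the default.
changed=$(grep -c '^+name' "$work/local-changes.diff")
echo "Local changes recorded: $changed"
```

Keeping the resulting file outside the cfg directory means it survives when that directory is removed and recreated.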
You also need to execute the following SQL command.
• (This is not essential) Also in the citations file, for the "eprint" and "user"
citation types: add a <ep:linkhere> .... </ep:linkhere> just inside the
<ep:citation> tag.
• (This is not essential) The title for the abstract block never gets added to
the page. Edit ArchiveRenderConfig.pm - search for eprint_fieldname_abstract
and add the following line as the next line:
$page->appendChild( $h2 );
12.4. UPDATING FROM EPRINTS 2.0.1 TO EPRINTS 2.1 85
% bin/upgrade ARCHIVEID
This script will explain the changes it is making. It will require the MySQL
root password.
If you want an extra level of protection, you may want to run mysqldump
to back up the database.
• If you are not using the OAI system then just copy the new
ArchiveOAIConfig.pm configuration file over your old one. The new one can be found in
/opt/eprints2/defaultcfg/ArchiveOAIConfig.pm.
If you have already configured the OAI system then you need to add the
required configuration for OAI 2.0. This can be copied from
/opt/eprints2/defaultcfg/ArchiveOAIConfig.pm.
The two relevant sections are the block titled "OAI-PMH 2.0" and the
subroutine make_metadata_oai_dc_oai2. Copy these into your archive’s
ArchiveOAIConfig.pm and modify them if needed.
• Look at the section on setting up subscriptions in the "Installation"
chapter.
You need to add some more items to the crontab to mail out the
subscriptions. One set of new cron entries per archive.
• The call for EPrints::EPrint->new() has changed the order of the
parameters (to standardise it with Subject, User, etc.).
In ArchiveRenderConfig.pm edit:
to be:
• Upgrade the SQL tables - EPrints 2.2 needs to make some small changes
to the database. Run:
% bin/upgrade ARCHIVEID
• cfg/ArchiveOAIConfig.pm
In the sub eprint_to_unqualified_dc change this line:
to this:
• cfg/ArchiveRenderConfig.pm (optional)
You may wish to add the following to "eprint_render", after the
commentary section. It will add ’type’ to the abstracts page (although not
until you run generate_abstracts):
$table->appendChild( _render_row(
$session,
$session->html_phrase( "eprint_fieldname_type" ),
$eprint->render_value( "type" ) ) );
• cfg/ArchiveConfig.pm (optional)
In the 2.1 default configuration a user could not view their own secure
documents if they were not an editor. This was silly. Fix it in sub
can_user_view_document:
Change the block:
to:
12.5. UPDATING FROM EPRINTS 2.1 (OR 2.1.1) TO EPRINTS 2.2 87
• cfg/ArchiveRenderConfig.pm (optional)
You may wish to add subscriptions to the user_render_full method so staff
can see what subscriptions a user has. Just before
$info->appendChild( $table );
Add:
my @subs = $user->get_subscriptions;
my $subs_ds = $session->get_archive->get_dataset( "subscription" );
foreach my $subscr ( @subs )
{
my $rowright = $session->make_doc_fragment;
foreach( "frequency","spec","mailempty" )
{
my $strong;
$strong = $session->make_element( "strong" );
$strong->appendChild( $session->make_text(
$subs_ds->get_field( $_ )->display_name( $session ) ) );
$strong->appendChild( $session->make_text( ": " ) );
$rowright->appendChild( $strong );
$rowright->appendChild( $subscr->render_value( $_ ) );
$entities{ruler} = $archive->get_ruler()->toString;
to
$entities{ruler} = EPrints::XML::to_string( $archive->get_ruler() );
$metafield->get_dataset()
MetaFields no longer know what dataset they belong to.
$metafield->set_dataset()
MetaFields no longer know what dataset they belong to.
$metafield->get_values( $session )
Now use: $metafield->get_values( $session, $dataset )
XML functions
All the XML handling now uses EPrints::XML as an abstraction to
the differences between XML::GDOME and XML::DOM.
Please use EPrints::XML::to_string( $node ) rather than $node->toString,
as in GDOME toString does not work as expected on
DocumentFragments.
EPrints::Config::parse_xml()
Removed. Use EPrints::XML methods instead.
92 CHAPTER 13. EPRINTS HISTORY (AND FUTURE PLANS)
Jan 2002
EPrints 2 Alpha-2 (Pepperoni) released.
Feb 14 2002
EPrints 2.0 (Olive) released.
Apr 17 2002
EPrints 2.0.1 (Tuna) released. Mostly bugfixes.
July 1 2002
EPrints officially joins GNU Project.
July 4 2002
GNU EPrints 2.1 (Pineapple) released. Added subscriptions and OAI 2.0
support.
October 31 2002
GNU EPrints 2.2 (Pumpkin) released. Added subject editors and GDOME
support.
Mirroring
Being able to run the system from two machines, e.g. USA with a European
mirror. One system would be the ”root” and all editing and user based
functions would be done there, but searching, browsing and downloading
can be done from a mirror.
”Peer Review”
A more complex approach to the ”buffer” which allows items to be as-
signed to reviewers who can add comments, or scores, or what-have-you,
before the item is accepted or rejected.
Citation Scanning/Linking
Software to scan the full texts of documents looking for citations and
attempt to link them to (a) other items in the archive and (b) use a third
party system to link to external items. We do now have the paracite plugin
but it’s not enough yet.
How-To’s
These may appear on eprints.org rather than part of the package. These
will be trails through the (admittedly large) configuration for performing
specific tasks like adding a new field.
Super Configurer
A configuration tool which can do really complex stuff like add and remove
fields.
Chat Forums
A slashdot style chat at the bottom of each abstract page. Possibly using
a separate system such as a PHP bulletin board. Possibly using the d3e
system developed at open.ac.uk.
Web Log Munger
Something which takes the logs from the website and produces nicely
styled information on how many hits various documents get. Could
possibly be a "contributed" feature rather than part of the core system.
Multiple Sites, one archive
Kind of like the multilanguage support. This would allow more than one
site to have a single back-end database. For example you may want to
present a subset of your archive on a separate URL (in addition to the
normal one) with different branding. The multi-lingual site options
currently available would be replaced (neatly) with this more comprehensive
approach.
Fully Customisable Workflow
So that every stage of submission can be configured. This would allow
complex approval mechanisms and peer review etc. Creating a configu-
ration would be very hard but changing an existing one would be quite
easy. We could supply multiple configurations for basic variations; subject
archive, online journal, institutional archive etc.
Interoperability with Similar Projects
There are other open source and free projects with things in common
with GNU EPrints. These include D-Space (from MIT and HP) and
Greenstone (University of Waikato, NZ). We have spoken with members
of both projects and hope to make ways to share data between these
systems. I’m also hoping to make it possible for GNU EPrints to use
Greenstone’s plugin system for document converters.
Collection of contributed tools on software.eprints.org
This is where I ask for help! If you have some interesting code, useful
scripts, unusual or innovative configuration, translation, subject config-
uration for a specific subject area... etc. Please send it to us at sup-
[email protected] with an explanation of what it is, who wrote it, where,
and who has copyright on it.
96 CHAPTER 14. THE EPRINTS LOGO
SYNOPSIS
configure_archive [archiveid] [options]
DESCRIPTION
Create a new EPrint Archive, or edit the basic options of an existing archive.
This script will prompt the user for various basic information about the
archive and then will:
After running this script you should edit the files created in the archive’s cfg
directory, especially ArchiveConfig.pm. Then you will need to run a number
of setup scripts, starting with create_tables.
ARGUMENTS
archiveid
The ID of the EPrint archive to modify (or create).
98 CHAPTER 15. COMMAND LINE TOOLS
OPTIONS
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
This option does not do anything.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.
AUTHOR
This is part of this EPrints 2 system. EPrints 2 is developed by Christopher
Gutteridge.
VERSION
EPrints Version: 2.2
CONTACT
For more information go to http://www.eprints.org/ which gives information
on mailing lists and the like.
Chris Gutteridge may be contacted at [email protected]
Should you need a real world address for some reason, EPrints can be con-
tacted in the real world at
COPYRIGHT
This file is part of EPrints 2.
Copyright (c) 2000,2001,2002 University of Southampton, UK. SO17 1BJ.
EPrints 2 is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later version.
15.2. CREATE TABLES COMMAND 99
SYNOPSIS
create_tables archiveid [options]
DESCRIPTION
Create the SQL tables which the EPrints software will use.
You should run import_subjects after this; the script will remind you to
unless you pass --quiet.
ARGUMENTS
archiveid
The ID of the EPrint archive to affect.
OPTIONS
--force
Go ahead and try to create the tables, even if the script encounters things
it thinks are wrong, like too many indexed fields or the database not
being empty.
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
Be vewwy vewwy quiet. This option will suppress all output unless an error
occurs.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.
SYNOPSIS
create_user archiveid [options] username email usertype [password]
DESCRIPTION
Create a new user in an EPrints archive.
This is handy for making the initial admin user at the very least.
ARGUMENTS
archiveid
The ID of the EPrint archive to add a user to.
username
The requested username for the new user. If a user with this name already
exists then the script will abort with an error.
email
The email address of the new user.
usertype
The type of the new user. The type of a user sets how much they can
do in the system. The default EPrints configuration provides 3 types of
users:
user
Normal everyday joe public users.
editor
Editors may approve eprints for addition, browse the submission
buffer, and check the archive status.
admin
Administrators may do everything. That’s probably what you want for yourself
if you are the person setting up the archive.
password
The initial password for this user. You don’t have to specify it here if
you don’t want to. It will be encrypted, by EPrints, using UNIX crypt.
This should not be a problem unless you are using a different method to
authenticate users.
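To make the argument order concrete, the sketch below assembles a create_user command line for an initial administrator and checks the user type against the three default types first. The archive ID foobar, the username and the email address are placeholders, and the two helper functions are hypothetical wrappers written for this example; they are not part of EPrints.

```shell
#!/bin/sh
# Sketch only: helper functions and all argument values are hypothetical.

valid_usertype() {
    # The default EPrints configuration provides these three user types.
    case "$1" in
        user|editor|admin) return 0 ;;
        *) return 1 ;;
    esac
}

build_create_user() {
    # $1 = archiveid, $2 = username, $3 = email, $4 = usertype
    valid_usertype "$4" || { echo "unknown usertype: $4" >&2; return 1; }
    printf 'bin/create_user %s %s %s %s\n' "$1" "$2" "$3" "$4"
}

# The password is left off here, as the synopsis marks it as optional.
cmd=$(build_create_user foobar admin admin@example.org admin)
echo "$cmd"
```

The printed line is meant to be run from the EPrints install directory, in the same way as the other bin/ commands in this chapter.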
OPTIONS
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
Be vewwy vewwy quiet. This option will suppress all output unless an error
occurs.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.
SYNOPSIS
erase_archive archiveid [options]
DESCRIPTION
This script completely erases the archive contents, including all database tables
and eprint document files and the web site. After running this, the metadata
configuration can be safely updated and the creation scripts run again.
Without the --force option, this script asks for confirmation before actually
erasing anything.
After this script you will need to run create_tables before you can use the
archive.
ARGUMENTS
archiveid
The ID of the eprint archive to use.
OPTIONS
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
Be vewwy vewwy quiet. This option will suppress all output unless an error
occurs.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.
--force
Don’t ask before making the changes.
--noerasedb
Don’t delete the database tables.
--noerasefiles
Don’t erase the document and website files.
--rootpass password
This option allows you to specify the MySQL root password, which will
skip the bit where you get asked for it. It’s not very secure to put a root
password on the command line. Consider yourself warned...
BUGS
This script assumes that the root MySQL user is called root.
You may run into problems if your MySQL install is not on the same machine
as EPrints. If this happens use --noerasedb and delete and recreate the
database yourself.
You can specify --noerasedb and --noerasefiles at the same time, but that’s
really stupid, as then nothing will happen.
SYNOPSIS
export_hashes archiveid [options] [filename]
DESCRIPTION
Every time a document in eprints is modified a new .xsh file is generated
containing a hash of each file. This script creates a hash of each of these .xsh files
and creates a super .xsh file containing each of those hashes.
If no filename is given this script outputs to standard out.
The XML file produced may then be archived safely. You can then create an
MD5 of that file and do something to prove you had it on the date you created
it.
For example, publish it in a small ad in a national paper.
Now you can prove you had that MD5 on that day, which proves you had
the file output by export_hashes that day. The MD5s in that file prove that
you had the .xsh file of a given document. Those files should prove that you
had a given file.
This all assumes that nobody works out a way to do MD5s in reverse. And
there’s no legal precedent yet.
That’s why this is an experimental feature.
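The "create an MD5 of that file" step might look something like the sketch below. It assumes the GNU md5sum utility is available, and it uses a small stand-in file named hashes.xml in a temporary directory in place of real export_hashes output.

```shell
#!/bin/sh
# Sketch only: hashes.xml is a stand-in for a file written by something like
#   bin/export_hashes foobar hashes.xml      (foobar is a placeholder archive ID)
work=$(mktemp -d)
printf '<hashes>example only</hashes>\n' > "$work/hashes.xml"

# md5sum prints "<digest>  <filename>"; keep only the digest.
digest=$(md5sum "$work/hashes.xml" | cut -d' ' -f1)
echo "MD5 to publish: $digest"

# Running it again on the unchanged file gives the same digest,
# which is what lets a later check match the published value.
again=$(md5sum "$work/hashes.xml" | cut -d' ' -f1)
```

The digest is short enough to publish, while the archived file itself stays private.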
ARGUMENTS
archiveid
The ID of the EPrint archive to use.
filename
A filename to write to. If omitted this will write to stdout.
OPTIONS
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
Be vewwy vewwy quiet. This option will suppress all output unless an error
occurs.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.
--all
Output hashes of ALL .xsh files for each document, not just the most
recent. This takes longer but should be logged periodically.
SYNOPSIS
force_config_reload archiveid [options]
DESCRIPTION
This command forces the webserver to reload the config files for the given
archive.
This is very useful, but the webserver will only reload the config files in the
"forked" versions of itself, not the original. This will generate a major extra
load each time the webserver forks off a process.
This command offers a temporary solution to the problem only. You should
still restart Apache at some point.
All this script really does is create an empty file named .changed in the archive’s
own cfg/ directory. The main system, running in the webserver’s mod_perl, looks
at the time this file was last modified. If this file has changed since the archive
configuration was loaded then it forces a reload.
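The .changed mechanism described above can be imitated in a few lines of shell. This is only a sketch of the idea: the real check happens inside mod_perl, and the loaded-marker file used here to represent "the time the configuration was loaded" is an assumption made for the example.

```shell
#!/bin/sh
# Sketch only: a temporary directory stands in for the archive's cfg/ directory.
work=$(mktemp -d)
loaded="$work/loaded-marker"   # hypothetical: its mtime represents the config load time
changed="$work/.changed"

needs_reload() {
    # Reload when .changed exists and was modified after the config was loaded.
    [ -e "$changed" ] && [ "$changed" -nt "$loaded" ]
}

touch -t 200201010000 "$loaded"    # pretend the config was loaded on 1 Jan 2002
if needs_reload; then before=yes; else before=no; fi

touch "$changed"                   # in effect, what force_config_reload does
if needs_reload; then after=yes; else after=no; fi

echo "reload needed before: $before, after: $after"
```

Because only a modification time is compared, the file can stay empty.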
ARGUMENTS
archiveid
The ID of the eprint archive to force a reload on.
OPTIONS
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
Be vewwy vewwy quiet. This option will suppress all output unless an error
occurs.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.
SYNOPSIS
generate_abstracts archiveid [options] [eprintid]
DESCRIPTION
This script recreates every static abstract page for an eprints archive. To save
load on the database, as archived data should not change, EPrints creates static
webpages containing the summary of each eprint. If you change the way the
abstracts are rendered or change the site template then you will want to run
this script.
ARGUMENTS
archiveid
The ID of the eprint archive to use.
eprintid
An optional integer indicating that only the abstract page for record
eprintid should be updated. Handy for testing new configurations.
OPTIONS
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
Be vewwy vewwy quiet. This option will suppress all output unless an error
occurs.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.
AUTHOR
This is part of this EPrints 2 system. EPrints 2 is developed by Christopher
Gutteridge.
VERSION
EPrints Version: 2.2
CONTACT
For more information go to http://www.eprints.org/ which gives information
on mailing lists and the like.
Chris Gutteridge may be contacted at [email protected]
Should you need a real world address for some reason, EPrints can be con-
tacted in the real world at
EPrints c/o Christopher Gutteridge
Department of Electronics and Computer Science
University of Southampton
SO17 1BJ
United Kingdom
15.9. GENERATE APACHECONF COMMAND 111
COPYRIGHT
This file is part of EPrints 2.
Copyright (c) 2000,2001,2002 University of Southampton, UK. SO17 1BJ.
EPrints 2 is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later version.
EPrints 2 is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABIL-
ITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
You should have received a copy of the GNU General Public License along
with EPrints 2; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
SYNOPSIS
generate_apacheconf [options]
DESCRIPTION
This script generates the Apache config files which will be used by EPrints. In
the simple case all you need to do is run this script then add a line to your main
Apache configuration file - often, but not always, /usr/local/apache/conf/httpd.conf:
Include /opt/eprints2/cfg/apache.conf
Or elsewhere if you installed EPrints somewhere other than /opt/eprints2.
This file then uses the ”Include” directive to include all relevant apache config
files from this EPrints installation.
By default the virtualhost directives are
<VirtualHost *>
But the * can be changed to something different by editing the <virtualhost>
option in SystemSettings.pm.
ARGUMENTS
OPTIONS
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
Be vewwy vewwy quiet. This option will suppress all output unless an error
occurs.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.
FILES
EPRINTS/cfg/apache.conf
This file is not updated if it already exists, so you can add system-wide
apache configuration directives here. By default it just includes the two
other system wide files: auto-apache-includes.conf and auto-apache.conf
EPRINTS/cfg/auto-apache-includes.conf
This file is updated with Include lines to each of the archive specific apache
config files. This file should not be edited by hand.
EPRINTS/cfg/auto-apache.conf
This file contains the system wide apache directives required by EPrints.
This file should not be edited by hand.
EPRINTS/archives/ARCHIVEDIR/cfg/apache.conf
This file is not updated if it already exists, so you can add archive-specific
apache configuration directives here. By default it just includes the
automatically generated archive specific file: auto-apache.conf
EPRINTS/archives/ARCHIVEDIR/cfg/auto-apache.conf
This file contains all the configuration directives needed for an archive.
This is where the bulk of the configuration appears, the clever stuff, if you
will. This file should not be edited by hand.
EPRINTS/archives/ARCHIVEDIR/cfg/apachevhost.conf
This file is not updated if it already exists. It is included into the
virtualhost in auto-apache.conf so that you can add a couple of additional directives
if you need to. For example, redirects or additional log directives.
generate_static - Generate static pages of an EPrint archive using the
template.
SYNOPSIS
DESCRIPTION
This script creates the static web site for EPrints (or, if you are running in
multiple languages, it generates the websites).
It processes every file in EPRINTS/archives/ARCHIVE/cfg/static/LANGID/.
For each language it processes all the files in /LANGID/ and /generic/ into
EPRINTS/archives/ARCHIVE/html/LANGID. If that sounds
confusing, don’t worry, it’s not that bad, just put your webpage outlines in static/en/
and your image files and the like in static/generic/ and run this script and see
what happens.
Most files are copied into the target directory as is and directory structure
is preserved.
Files with a .xpage or .xhtml suffix are processed as they are copied.
.xpage
This is an XML file with the following structure:
The resulting file will be a .html file (foo.xpage becomes foo.html). It will
take the template for this archive and insert the title and body from the
appropriate places. It will also cause the special EPrints entities to be
converted as it is copied. See the main documentation.
.xhtml
This is a normal XHTML file but with the following XML header:
ARGUMENTS
archiveid
The ID of the eprint archive to use.
OPTIONS
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
Be vewwy vewwy quiet. This option will suppress all output unless an error
occurs.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.
SYNOPSIS
generate_views archiveid [options]
DESCRIPTION
This script renders static ”browse views” for an EPrint Archive.
What this does is generate browse pages for each field configured as
browsable in ArchiveConfig.pm. It creates a static web page for each value of that
field, and index pages to navigate to them.
For example, if we make "year" browsable then this script will generate one
page for each unique value of the year field. So a user can then view the 1995
page and see links to all the 1995 eprints.
The advantage is that this puts less load on the database than user
searches, assuming you pick two or three sensible fields to make browsable.
This script should be run every hour or so, but once a day or
even once a week may be better on large archives, as the more eprints the longer it will take
to run. The rough length of time to run this is of the order of O( languages *
eprints * browsable fields ). You can automate running this with the cron
system.
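To get a feel for the O( languages * eprints * browsable fields ) estimate and for what a cron entry might look like, consider the sketch below. The numbers are invented, the "work units" it prints are only a relative measure, and the crontab line (install path, archive ID foobar and schedule) is a hypothetical example.

```shell
#!/bin/sh
# Sketch only: all figures and names below are made up for illustration.

estimate_work() {
    # $1 = languages, $2 = eprints, $3 = browsable fields
    echo $(( $1 * $2 * $3 ))
}

small=$(estimate_work 1 500 3)      # a small single-language archive
large=$(estimate_work 2 50000 3)    # a large two-language archive
echo "small archive: $small work units, large archive: $large work units"

# A crontab line for a nightly run at 02:15 might look like this
# (hypothetical path and archive ID):
#   15 2 * * * /opt/eprints2/bin/generate_views foobar --quiet
```

With these invented figures the large archive needs 200 times the work of the small one, which is why a nightly or weekly schedule suits it better than an hourly one.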
ARGUMENTS
archiveid
The ID of the eprint archive to use.
15.11. GENERATE VIEWS COMMAND 117
OPTIONS
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
Be vewwy vewwy quiet. This option will suppress all output unless an error
occurs.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.
SYNOPSIS
import_subjects archiveid [options] [subjectfile]
DESCRIPTION
Import a set of subjects into an EPrints archive. The subjects are the
hierarchical tree of options for "subject" type metadata fields in an eprint archive.
Use the staff admin subject editor for little tweaks. Use this command for
the initial setup or for bulk editing of subjects. Use the exporter to dump the current
subjects if you (an administrator) have edited them online.
This script should also be run after create_tables.
ARGUMENTS
archiveid
The ID of the EPrint archive to use.
subjectfile
This is the file to import the subjects from. If you do not specify it then
the system will use "subjects" from the given archive's cfg directory (or
"subjects.xml" if you used --xml).
OPTIONS
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
Be vewwy vewwy quiet. This option will suppress all output unless an error
occurs.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.
--xml
Instead of the colon-separated subject file, the input file is in the XML
format which the exporter uses.
--nopurge
Do not purge the existing records from the subject table before importing
this file. Rather than do this, it’s probably easier to export the current
subjects as XML, then combine in your new file and reimport it.
FILE FORMAT
There are two different file formats accepted: the default colon-separated file
and XML in the eprints export format.
The colon-separated ASCII format is easier to edit, but is more limited. It is
not intended for UTF-8 encoded characters and can only specify subject names in
the default language.
The XML format can contain any Unicode characters and also allows multiple
languages for the names of subjects. You may wish to dump the current subjects
out of eprints as XML, edit the file, then re-import it. The downside is that
this format is far more verbose.
The fields of the colon-separated ASCII format are:
subjectid
An ASCII string which is a unique ID for this subject.
name
The name of this subject, in the default language of the archive.
parents
A comma-separated list of the parents of this subject. Be careful not to
cause loops. The top level subject id is ROOT and should not be imported
as it always exists.
depositable
A boolean value ( 1 or 0 ) which indicates if this subject may have eprints
associated with it.
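The field descriptions above can be sketched as a small Python parser. This is only an illustration, under the assumption that each line holds the four fields in the order listed (subjectid:name:parents:depositable) and that names contain no colons. The function names and the sample lines are hypothetical and are not part of EPrints.

```python
from typing import Dict, List


def parse_subject_line(line: str) -> Dict[str, object]:
    """Split one 'subjectid:name:parents:depositable' line into its fields."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(fields)}")
    subjectid, name, parents, depositable = fields
    if depositable not in ("0", "1"):
        raise ValueError(f"depositable must be 0 or 1, got {depositable!r}")
    return {
        "subjectid": subjectid,
        "name": name,
        "parents": parents.split(","),   # comma-separated list of parent ids
        "depositable": depositable == "1",
    }


def has_loop(subjects: List[Dict[str, object]]) -> bool:
    """Return True if following parent links ever leads back to the same subject."""
    parents_of = {s["subjectid"]: s["parents"] for s in subjects}

    def revisits(subjectid, seen):
        if subjectid in seen:
            return True
        # ROOT and any id not in the file have no listed parents, so the walk ends.
        return any(revisits(p, seen | {subjectid})
                   for p in parents_of.get(subjectid, []))

    return any(revisits(sid, frozenset()) for sid in parents_of)


# Hypothetical sample lines, assuming the field order described above.
lines = [
    "subjects:Subjects:ROOT:0",
    "phys:Physical Sciences & Mathematics:subjects:0",
    "optics:Optics:phys:1",
]
subjects = [parse_subject_line(line) for line in lines]
print(subjects[1]["parents"])
print(has_loop(subjects))
```

Running a check like has_loop over a subject file before importing it is one way to follow the advice about not causing loops.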
The XML File Format
This is the standard eprints export format. It looks like this:
<eprintsdata>
<record>
<field name="subjectid">phys</field>
<field name="name"><lang id="en">Physical Sciences &amp; Mathematics</lang></field>
<field name="parents">subjects</field>
<field name="depositable">FALSE</field>
</record>
.
.
.
</eprintsdata>
The fields have the same meaning as described for the ASCII format, with
the following variations. The name field can (and should) have a name for each
language supported by the archive. Multiple parents are indicated by multiple
<field name="parents"> elements. Depositable should be either TRUE or
FALSE.
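To make those variations concrete, here is a minimal Python sketch that builds records in the shape shown above using the standard xml.etree.ElementTree module. The helper name, the "optics" and "eng" subjects and the French name are hypothetical. Whether the real export file needs an XML declaration or further attributes is not covered here, so compare the result against a file produced by the exporter before importing it.

```python
import xml.etree.ElementTree as ET


def subject_record(subjectid, names, parents, depositable):
    """Build one <record>; `names` maps a language id to the subject name."""
    record = ET.Element("record")
    ET.SubElement(record, "field", name="subjectid").text = subjectid
    name_field = ET.SubElement(record, "field", name="name")
    for lang_id, text in names.items():
        # One <lang> element per language supported by the archive.
        ET.SubElement(name_field, "lang", id=lang_id).text = text
    for parent in parents:
        # Multiple parents become multiple <field name="parents"> elements.
        ET.SubElement(record, "field", name="parents").text = parent
    flag = ET.SubElement(record, "field", name="depositable")
    flag.text = "TRUE" if depositable else "FALSE"
    return record


root = ET.Element("eprintsdata")
root.append(subject_record(
    "phys", {"en": "Physical Sciences & Mathematics"}, ["subjects"], False))
root.append(subject_record(
    "optics", {"en": "Optics", "fr": "Optique"}, ["phys", "eng"], True))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Note that the library escapes the ampersand in the subject name as &amp;amp; which a hand-edited file must also do to stay well-formed.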
AUTHOR
This is part of the EPrints 2 system. EPrints 2 is developed by Christopher
Gutteridge.
VERSION
EPrints Version: 2.2
CONTACT
For more information go to http://www.eprints.org/ which gives information
on mailing lists and the like.
Chris Gutteridge may be contacted at [email protected]
Should you need a real world address for some reason, EPrints can be
contacted in the real world at
COPYRIGHT
This file is part of EPrints 2.
Copyright (c) 2000,2001,2002 University of Southampton, UK. SO17 1BJ.
EPrints 2 is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later version.
EPrints 2 is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
You should have received a copy of the GNU General Public License along
with EPrints 2; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
SYNOPSIS
rehash_documents archiveid [options] [documentid]
DESCRIPTION
This command regenerates the hash values for all documents. This will, by
default, create an MD5 hash of all the files in the document in alphabetic order
of their path. This value is then stored as a string of hexadecimal characters in
the metadata for the document.
This value is automatically re-calculated if more files are uploaded or any
files are removed.
It is useful as a quick, reliable way of seeing if the document contents have
changed. If you generate other formats of a document on-the-fly the MD5 can
be useful as part of caching, or you may wish to pass the hashes to a third party
to allow you to later verify that an item was in the archive since a given date.
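The idea of one hash over all the files of a document can be sketched in Python as follows. This is not the EPrints implementation: exactly how EPrints feeds the files into the hash is not specified here, so the sketch simply assumes the file contents are hashed one after another in sorted order of relative path. The function name and the demo files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def document_md5(doc_dir: str) -> str:
    """MD5 over the contents of every file under doc_dir, in sorted path order."""
    root = Path(doc_dir)
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    digest = hashlib.md5()
    for path in files:
        with path.open("rb") as handle:
            # Read in chunks so large files do not have to fit in memory.
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
    return digest.hexdigest()  # string of hexadecimal characters


# Demo with two throwaway files; "a.txt" is hashed before "b.txt".
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "b.txt").write_bytes(b"second")
    (Path(tmp) / "a.txt").write_bytes(b"first")
    demo_hash = document_md5(tmp)

print(demo_hash)
```

Because the files are always visited in the same order, the same set of files gives the same hash, and adding, removing or changing a file gives a different one.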
ARGUMENTS
archiveid
The ID of the eprint archive whose documents are to be rehashed.
OPTIONS
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
Be vewwy vewwy quiet. This option will suppress all output unless an error
occurs.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.
SYNOPSIS
reindex archiveid [options] [dataset]
DESCRIPTION
This script rebuilds the indexes and ordering information of a dataset, e.g.
"archive" or "user". The time it takes will depend on the number of records in
the dataset.
This script should be run if you change the way that the free text indexing
function works or change the descriptions of subjects or sets (and want them to
sort correctly).
reindex should not be run lightly on a large archive; a back-of-the-envelope
estimate of running time is 1 second per record.
So 300 records will take about 5 minutes, 3000 records will take about an
hour and 30000 records will take more than 8 hours!
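The back-of-the-envelope figures can be reproduced with a few lines of Python. The function name is made up for illustration and the default of 1 second per record is the rough estimate quoted above, not a measured value. At exactly that rate the largest case comes to a little over 8 hours, which is the same order as the figure given.

```python
def reindex_estimate_seconds(records: int, seconds_per_record: float = 1.0) -> float:
    """Rough running time of a reindex, assuming a fixed cost per record."""
    return records * seconds_per_record


for records in (300, 3000, 30000):
    seconds = reindex_estimate_seconds(records)
    print(f"{records} records: about {seconds / 60:.0f} minutes "
          f"({seconds / 3600:.1f} hours)")
```

If you time a reindex on a small dataset of your own, pass the measured seconds per record as the second argument to get a better estimate for a larger one.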
ARGUMENTS
archiveid
The ID of the EPrint archive to use.
dataset
This is the name of the dataset to reindex, one of: subject, user, archive,
buffer, inbox, deletion, document
OPTIONS
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
Be vewwy vewwy quiet. This option will suppress all output unless an error
occurs.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.
--force
Don’t ask before running. If you want to automate this script, e.g. run it
once every 6 months, you don’t want it interactively checking if you want
to continue!
SYNOPSIS
send_subscriptions archiveid frequency [options]
DESCRIPTION
This script sends out all the subscription emails for the specified archive and
frequency. frequency must be one of daily|weekly|monthly.
This script should probably be called from your "cron" system, soon after
midnight, with one entry for each frequency. Space the start times of the
three entries out so that they don’t all start at once and hammer the
database.
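A sketch of what such staggered cron entries could look like, generated here with a short Python function. Everything specific in it is an assumption: the install path /opt/eprints2/bin, the script name send_subscriptions, the archive ID "myarchive" and the chosen start times are illustrative only, so adjust them to your own installation.

```python
def subscription_crontab(
    archiveid: str,
    script: str = "/opt/eprints2/bin/send_subscriptions",  # assumed install path
) -> str:
    """Return three crontab lines, one per frequency, with spaced-out start times."""
    entries = [
        ("15 0 * * *", "daily"),    # every day at 00:15
        ("45 0 * * 0", "weekly"),   # Sundays at 00:45
        ("15 1 1 * *", "monthly"),  # first day of the month at 01:15
    ]
    return "\n".join(
        f"{when} {script} {archiveid} {frequency}" for when, frequency in entries
    )


print(subscription_crontab("myarchive"))
```

The printed lines can be pasted into the crontab of the user that EPrints runs as. The half-hour gaps are a guess at a reasonable spacing; widen them if the daily run takes longer on your archive.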
ARGUMENTS
archiveid
The ID of the eprint archive to use.
frequency
Which ”frequency” of subscriptions to send - the daily, weekly or monthly
ones.
OPTIONS
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
Be vewwy vewwy quiet. This option will suppress all output unless an error
occurs.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.
SYNOPSIS
upgrade archiveid [options]
DESCRIPTION
Some versions of eprints require modifications to be made to the database tables
used by earlier versions.
Run this script on each archive after upgrading the eprints software.
ARGUMENTS
archiveid
The ID of the EPrint archive to affect.
OPTIONS
--help
Print a brief help message and exit.
--man
Print the full manual page and then exit.
--quiet
This option doesn’t do anything. You REALLY don’t want to run this
script without knowing what’s happening.
--verbose
Explain in detail what is going on. May be repeated for greater effect.
--version
Output version information and exit.