A Word Trend Analysis Tool For News Content

This software gathers a corpus of news content from RSS feeds over time, and processes it so that a web client can be used to analyse trends.

An example can be seen in the graph below, showing the fortunes of the candidates in the 2015 British Labour leadership race.

This project was started in about 2007 and went through a series of iterations. What you see here is the result of a 2025 tidy-up of some old code, removing a load of unnecessary stuff originally developed for the website it was part of.

Components

The three parts of this project are split into directories at the top level.

processing

The processing directory contains the PHP scripts which harvest RSS feeds and process them into the corpus.

client

The client directory contains a web client for analysing the data.

data

The data directory contains the processed corpus data. This project does not ship with a corpus; you will have to collect your own. Be warned that a corpus is a huge tree of small JSON files and can quickly become unwieldy, so it is suggested that this directory not be put on your system disk. The production version mounted a dedicated HDD for the corpus at this path.
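
As a quick way to check the advice above, the following is a minimal sketch (not part of the repository) that warns when ./data shares a filesystem with the system root. The function name same_filesystem is hypothetical, and it assumes a POSIX df whose mount points contain no spaces.

```shell
# Succeeds when both paths live on the same mounted filesystem.
# df -P prints one POSIX-format line per path; field 6 is the mount point.
# Assumption: device names and mount points contain no spaces.
same_filesystem() {
    mount_a=$(df -P "$1" | awk 'NR==2 {print $6}')
    mount_b=$(df -P "$2" | awk 'NR==2 {print $6}')
    [ -n "$mount_a" ] && [ "$mount_a" = "$mount_b" ]
}

# Run from the repository root; ./data is the corpus directory described above.
if [ -d ./data ] && same_filesystem ./data /; then
    echo "warning: ./data is on the same filesystem as /" >&2
fi
```

If the warning appears, mounting a separate disk at ./data before the first harvest avoids having to move a large tree of small files later.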

Getting started

Requirements

This software requires PHP to be installed.
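
A small sketch for confirming that PHP is available before going further. The helper name have_command is hypothetical; the only assumption is that the php binary, if installed, is on your PATH.

```shell
# Succeeds when the named command can be found on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command php; then
    # Print the first line of the version banner.
    php --version | head -n 1
else
    echo "php not found on PATH" >&2
fi
```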

Creating a corpus

You will need to configure and run the ./processing/processing.sh script to create your corpus. Here follows a quick run-down of the requirements and startup.

Configuring and running processing.php

There is a set of variable definitions for paths at the top of ./processing/processing.php. With the default directory structure they should work without modification, but it is suggested you cast your eye over them.

You will need to create your own feed list in ./processing/feedlist/feedlist.txt. The distributed version contains a couple of example feeds as well as instructions. I have a set of lists of feeds if you need somewhere to start.

With everything configured, you should be able to open a terminal in ./processing and run processing.sh.

Once you are happy with the operation, you can add this script to a cron job to run every few hours and build your corpus over time. Alternatively for testing you can use the every command to schedule it every few hours.
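
For the cron route, here is a minimal sketch that prints a crontab entry for a given checkout path and interval. The function name cron_entry and the path /srv/word-trend-analysis are illustrative assumptions, not part of the project. The entry changes into ./processing first, matching the instruction above to run the script from that directory.

```shell
# Print a crontab entry that runs processing.sh every N hours, on the hour.
# $1: path to your checkout of this repository (hypothetical in the example)
# $2: interval in hours
cron_entry() {
    repo_dir=$1
    hours=$2
    printf '0 */%s * * * cd %s/processing && ./processing.sh\n' "$hours" "$repo_dir"
}

# Example: every 3 hours for a checkout at /srv/word-trend-analysis
cron_entry /srv/word-trend-analysis 3
```

The printed line can be pasted into the editor opened by crontab -e. You may also want to redirect the script's output to a log file of your choosing.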

Using the client

The client requires a web server to run. You can of course set one up, but for testing purposes the simplest way to do this is to use PHP's built-in web server. The ./start-web-server.sh script does that, allowing you to go to localhost:8080/client in your browser to run the client.
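
The contents of ./start-web-server.sh are not reproduced here, but a rough equivalent using PHP's built-in server might look like the sketch below. The function name serve_client is hypothetical. It assumes php is on your PATH and that the repository root is used as the document root, so that /client resolves to ./client/index.php.

```shell
# Start PHP's built-in development server.
# $1: document root (defaults to the current directory)
# $2: port (defaults to 8080, matching the URL above)
serve_client() {
    repo_dir=${1:-.}
    port=${2:-8080}
    php -S "localhost:$port" -t "$repo_dir"
}

# From the repository root:
#   serve_client .
# then browse to http://localhost:8080/client
```

The built-in server is intended for development and testing only; for anything public-facing, use a proper web server.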

The client will require some configuration. Towards the top of ./client/index.php is a group of configuration variables. Most of them will not need changing with the default directory structure.

You will need to change $startweekdate to reflect the start of your corpus; this is the week the client will display at start.

$noisewords is an array of noise words that reflects my use of the system for British politics. You may need to make a few edits.

Licence

The code in this repository is copyright (c) Jenny List, 2007-2025, and licensed under the MIT licence, EXCEPT for the following components which are here under their own licences:
