Loghub

Loghub maintains a collection of system logs that are freely accessible for research purposes. Some of the logs are production data released by previous studies; others were collected from real systems in our lab environment. Wherever possible, the logs are NOT sanitized, anonymized, or modified in any way. Together the logs exceed 87GB in size, so we host only a small sample (2k lines) of each dataset on GitHub.

How to get the data?

If you are interested in these datasets, please request the raw logs at Zenodo or visit the source links wherever applicable.

Logs currently available:

| Software System | Dataset Name | Time Span | #Messages | Size | Compressed (.tar.gz) | Source Link |
|---|---|---|---|---|---|---|
| Distributed systems | | | | | | |
| HDFS | HDFS-1 | 38.7 hours | 11,175,629 | 1.54GB | 152.01MB | Link |
| HDFS | HDFS-2 | N.A. | 71,118,073 | 16.84GB | 877.38MB | |
| Hadoop | Hadoop | N.A. | 394,308 | 49.78MB | 2.50MB | |
| Spark | Spark | N.A. | 33,236,604 | 2.88GB | 179.18MB | |
| Zookeeper | Zookeeper | 26.7 days | 74,380 | 10.18MB | 452KB | |
| OpenStack | OpenStack | N.A. | 207,820 | 60.02MB | 5.27MB | Link |
| Operating systems | | | | | | |
| Windows | Windows | 226.7 days | 114,608,388 | 27.36GB | 1.63GB | |
| Linux | Linux | 263.9 days | 25,567 | 2.30MB | 228KB | |
| Mac | Mac | 7.0 days | 117,283 | 16.48MB | 1.46MB | |
| Server applications | | | | | | |
| Apache Web server | Apache | 263.9 days | 56,481 | 5.02MB | 260KB | |
| OpenSSH | OpenSSH | 28.4 days | 655,146 | 71.70MB | 4.49MB | |
| Mobile systems | | | | | | |
| Android | Android | N.A. | 63,042,037 | 7.00GB | 825.57MB | |
| Health App | HealthApp | 10.5 days | 253,395 | 22.98MB | 2.24MB | |
| Supercomputers | | | | | | |
| Blue Gene/L | BGL | 214.7 days | 4,747,963 | 725.77MB | 61.46MB | Link |
| HPC | HPC | N.A. | 433,489 | 32.77MB | 3.21MB | |
| Thunderbird | Thunderbird | 244 days | 211,212,192 | 31.04GB | 1.97GB | |
| On-premises software | | | | | | |
| Proxifier | Proxifier | N.A. | 21,329 | 2.48MB | 172KB | |
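
To give a feel for download sizes, the raw and compressed figures in the table can be turned into compression ratios. The following is only a rough sketch: the helper names `size_in_mb` and `compression_ratio` are hypothetical, and it assumes binary units (1GB = 1024MB), which this README does not specify.

```python
import re

# A few (dataset, raw size, compressed size) rows copied from the table above.
SIZES = [
    ("HDFS-1", "1.54GB", "152.01MB"),
    ("Zookeeper", "10.18MB", "452KB"),
    ("Windows", "27.36GB", "1.63GB"),
    ("Thunderbird", "31.04GB", "1.97GB"),
]

# Assumption: binary units. The README does not state which convention it uses.
UNIT_TO_MB = {"KB": 1.0 / 1024.0, "MB": 1.0, "GB": 1024.0}


def size_in_mb(text):
    """Convert a size string such as '1.54GB' or '452KB' to megabytes."""
    match = re.fullmatch(r"([\d.]+)(KB|MB|GB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNIT_TO_MB[match.group(2)]


def compression_ratio(raw, compressed):
    """Return how many times smaller the .tar.gz archive is than the raw log."""
    return size_in_mb(raw) / size_in_mb(compressed)


if __name__ == "__main__":
    for name, raw, compressed in SIZES:
        print(f"{name}: {compression_ratio(raw, compressed):.1f}x smaller when compressed")
```

Running the script prints one approximate ratio per listed dataset; the exact values depend on the unit convention assumed above.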

Publications using these datasets

Organizations that request these datasets

Industry
Huawei, VMware Research, HSBC
Academia
CUHK, University of Utah, Trinity College Dublin, University of La Rochelle, Simula Research Lab, Indraprastha University, University of Hamburg

Feedback

For any questions or feedback, please post to our issue page.

License

The log datasets are freely available ONLY for research purposes.

LogPAI Team, 2018