Assignment 2 - Capstone

This document describes a data flow for networking metrics from various school boards in Ontario. Logging devices in each school board collect data on provisioned capacity and usage every 10 minutes and send it to log collectors. The data is then formatted into a report and sent to the EDU Broadband compliance and insight manager portal for validation and analysis. Key data entities tracked include uptime, bandwidth usage, packet errors, and service coverage. Data accuracy, completeness, integration, security, and governance present challenges when analyzing information from different sources.

Uploaded by

SubhanjanDas

HUMBER INSTITUTE OF TECHNOLOGY

AND ADVANCED LEARNING

(HUMBER COLLEGE)

Capstone Assignment 2
Capstone Course - BIA-5450-0GA

Group - 1

Networking Metrics

Supporting Sustainment & Continuous Improvements

Submitted by:

Last Name    First Name    Student Number
Shah         Divyansh      N01472284
Patel        Hardi         N01480409
Das          Subhanjan     N01431473
Shand        Yuvraj        N01479401
Tyagi        Abhi          N01474042

Submitted to: Prof. Salam Ismaeel

Submission Date: 2023-03-


Table of Contents

Data sources and key data entities and flows
Data Sources and Flow Diagram
Key Data Entities
Data Dictionary
Data Cleaning
Data Storage
Data Output
Challenges
References

Data sources and key data entities and flows 

Data Sources and Flow Diagram

Figure 1: Data Flow Diagram

The data used in this report originates from the logging devices installed in each school board,
which are responsible for transmitting provisioned capacity and usage data to the Ministry of
Education in Ontario. The logging devices under consideration are the K-12 Edge Device (SD-WAN)
and LAN/WLAN. They gather data at 10-minute intervals and send it to log collectors that use
protocols such as NetFlow, IPFIX, or SNMP. These collectors can gather data from a variety of
sources.

The collected data is then passed through a report generator that produces a report in a
pre-defined template. The templated data is passed on to the EDU Broadband compliance and
insight manager (BCIM) portal, which is the next stop for the monthly report. There the data
undergoes validation and analysis to ensure that the report is a true representation of the
data obtained.
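
As a rough illustration of the report-generation step, the sketch below reduces a set of 10-minute samples to the average and peak values that a report template could carry. The function name `summarize_samples` and the sample readings are hypothetical; the report does not describe how the actual report generator is implemented.

```python
from statistics import mean


def summarize_samples(samples: list[float]) -> dict[str, float]:
    """Reduce raw throughput samples to an average and a peak value."""
    if not samples:
        raise ValueError("no samples collected for this period")
    return {"avg": mean(samples), "peak": max(samples)}


# Hypothetical download readings for one interface (units are not given in the report).
readings = [5.0, 7.0, 9.0, 7.0]
summary = summarize_samples(readings)
```

In this sketch an empty period raises an error instead of reporting zeros, so that a missing feed is not mistaken for an idle network.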

The overall process therefore has several stages: data is collected every 10 minutes,
transformed and transferred through the report generator, and finally validated and analyzed
at the BCIM portal.

Key Data Entities


To track network availability at each school board, we must consider the key business data
entities that we currently receive from the network logging tools at each school. The Ministry
is trying to understand and keep track of the following entities:

• Uptime and Downtime: The time during which each school's network is or is not active.
• Broadband Provisioned Capacity: The highest data transfer rate permitted over each
school's broadband connection.
• Utilization Over Time: A key indicator that shows the history of network traffic and
data transfer rates over time.
• Packet Errors: A key indicator of the effectiveness of the network and the quality of the
services being offered.
• Service Coverage: The degree of network availability and connectivity at each school.

It should be emphasized that not every school will have access to all the above-mentioned
data entities. Due to a variety of factors, some of the data may be missing.

Data Dictionary

Field Name            | Data Type  | Description                                                            | Example
DSB                   | String     | District School Board names                                            | Algonquin & Lakeshore Catholic District School Board
Final Acronym         | String     | Calculated field using DSB names to create an acronym                  | 20.DSBn
School Name           | String     | Schools under each school board                                        | Applewood Public School
Monitoring Tool       | String     | Tool used to track networks                                            | SolarWinds
Device                | String     | Monitoring tool name and Unique ID                                     | APP-FGT100E.dsbn.edu.on.ca
Interface             | String     | Networking split method                                                | vlan_internet
Key                   | String     | Calculated unique id concatenating Final Acronym + Device + Interface  | 20.DSBn:APP-FGT100E.dsbn.edu.on.ca:vlan_internet
Avg_rec               | Integer    | Average Download Speed                                                 | 7
Peak_rec              | Integer    | Maximum Download Speed                                                 | 253
Avg_trans             | Integer    | Average Upload Speed                                                   | 1
Peak_trans            | Integer    | Maximum Upload Speed                                                   | 29
Provisioned Downloads | Integer    | Provisioned Download Speed                                             | 250000
Provisioned Upload    | Integer    | Provisioned Upload Speed                                               | 250000
Avg_avail             | Percentage | Average Uptime percentage every month                                  | 100.00%
Pkt_err               | Percentage | Average packet error every month                                       | 0.00%
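
The calculated Key field and the percentage fields can be illustrated with a short sketch. The `:` separator is inferred from the example value in the data dictionary, and the helper names `build_key` and `parse_percentage` are hypothetical, not taken from the original tooling.

```python
def build_key(final_acronym: str, device: str, interface: str) -> str:
    """Join the three identifying fields with ':' as in the dictionary's example."""
    return ":".join([final_acronym, device, interface])


def parse_percentage(text: str) -> float:
    """Convert a value such as '100.00%' to the float 100.0."""
    return float(text.strip().rstrip("%"))


key = build_key("20.DSBn", "APP-FGT100E.dsbn.edu.on.ca", "vlan_internet")
availability = parse_percentage("100.00%")
```

Storing percentages as numbers rather than strings makes later threshold checks (for example, on Avg_avail) straightforward.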

Data Cleaning

Data Storage
Data in flight (in transit) is data that is actively moving from one location to another, or
between systems or devices.
Data at rest is data that has reached its destination and is not actively moving between
networks or devices.
In our data flow diagram, in-flight data is held in temporary buffers or caches while it is
being transferred, for instance, from the LAN/WLAN network to the DSP data collector. Once the
data reaches its destination (the DSP data collector), it is usually stored in a more
permanent location.
When data is at rest in our process flow, it is stored in a database. This database can be
queried to derive insights or to create reports that adhere to pre-defined business rules.
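
A minimal sketch of data at rest, assuming a relational store: the example uses Python's built-in `sqlite3` module with an in-memory database. The table name `network_metrics`, the subset of columns, and the 99% availability rule are assumptions for illustration; the report does not name the database product or the business rules.

```python
import sqlite3

# In-memory database for illustration; a real deployment would use persistent storage.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE network_metrics (
        key        TEXT PRIMARY KEY,  -- Final Acronym + Device + Interface
        avg_rec    INTEGER,
        peak_rec   INTEGER,
        avg_avail  REAL,              -- percentage, e.g. 100.0
        pkt_err    REAL               -- percentage, e.g. 0.0
    )
    """
)

# Example values taken from the data dictionary.
record = ("20.DSBn:APP-FGT100E.dsbn.edu.on.ca:vlan_internet", 7, 253, 100.0, 0.0)
conn.execute("INSERT INTO network_metrics VALUES (?, ?, ?, ?, ?)", record)
conn.commit()

# Hypothetical business rule: list interfaces whose availability is below 99%.
low_availability = conn.execute(
    "SELECT key FROM network_metrics WHERE avg_avail < ?", (99.0,)
).fetchall()
stored_rows = conn.execute("SELECT COUNT(*) FROM network_metrics").fetchone()[0]
```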

The storage size for different types of data depends on the type and format of the data. The
different types of data are:
• Text Data - Text data can range from a few kilobytes (KB) to several megabytes (MB) in size.
• Audio Data - Audio data can range from a few KB to several MB in size depending on the
length of the audio.
• Image Data - Image data can range from a few KB to several MB in size depending on the
resolution of the image.
• Video Data - Video data can range from a few MB to several gigabytes (GB) in size depending
on the length and the resolution of the video.

We are only concerned with the text data as the networking data flow contains only text data.
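
To give a feel for the text volume involved, the sketch below estimates the raw monthly storage for samples collected every 10 minutes. The record size of 200 bytes and the count of 1,000 monitored interfaces are assumptions chosen only to make the arithmetic concrete; neither figure comes from the report.

```python
SAMPLES_PER_HOUR = 6     # one sample every 10 minutes, as stated in the report
BYTES_PER_RECORD = 200   # assumption: size of one text record


def monthly_storage_bytes(interfaces: int, days: int = 30) -> int:
    """Estimated raw text volume for `interfaces` monitored interfaces."""
    samples = SAMPLES_PER_HOUR * 24 * days  # 4,320 samples per interface in 30 days
    return interfaces * samples * BYTES_PER_RECORD


estimate = monthly_storage_bytes(interfaces=1000)
estimate_mb = estimate / 1_000_000
```

Under these assumptions the raw feed comes to 864 MB per month, which is consistent with treating the networking data as text of modest size.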

The following security rules have been put in place at both source and output for accessing
the data:
• Authentication: Users may need to provide credentials such as usernames and
passwords to access the data. Multi-factor authentication (MFA) can also be used to
provide an additional layer of security.

• Authorization: Users may only be allowed to access certain data based on their roles or
permissions. This can be controlled through access control lists or other access
management systems.

• Encryption: Data can be encrypted both in transit and at rest to prevent unauthorized
access. Encryption keys can be managed through a key management system.

• Monitoring and auditing: Access to data can be monitored and logged to detect any
unauthorized access or suspicious activity.
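
The authorization and auditing rules above can be sketched as a role-based check that logs every access attempt. The role names, the permitted fields, and the function names are hypothetical; a production system would rely on its own access management and logging infrastructure.

```python
import logging

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("audit")

# Hypothetical roles and the data dictionary fields each role may read.
ROLE_PERMISSIONS = {
    "ministry_analyst": {"Key", "Avg_rec", "Peak_rec", "Avg_avail", "Pkt_err"},
    "board_viewer": {"Key", "Avg_avail"},
}


def is_authorized(role: str, fields: list[str]) -> bool:
    """True only if every requested field is permitted for the role."""
    return set(fields) <= ROLE_PERMISSIONS.get(role, set())


def read_fields(user: str, role: str, record: dict, fields: list[str]) -> dict:
    """Return the requested fields, logging each attempt for later auditing."""
    if not is_authorized(role, fields):
        audit_log.warning("DENIED user=%s role=%s fields=%s", user, role, fields)
        raise PermissionError(f"role {role!r} may not read {fields}")
    audit_log.info("GRANTED user=%s role=%s fields=%s", user, role, fields)
    return {name: record[name] for name in fields}
```

Unknown roles receive an empty permission set, so access is denied by default.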

Data Output

Challenges
(Source for all points: https://www.orioninc.com/blog/the-four-pillars-of-data-management/)

Data accuracy, completeness, integration, security, and governance are some of the challenges
faced when dealing with data from different sources. In this report, the data comes from logging
devices such as the K-12 Edge Device (SD-WAN) and LAN/WLAN, which transmit data about
the provisioned capacity and usage to the Ministry of Education in Ontario. The following
challenges are encountered when working with such data:

Data Accuracy: Accurate data is crucial for decision-making. Using different devices, software,
and monitoring tools can produce discrepancies between the values collected from different
sources, which can lead to decisions based on erroneous conclusions. Even slight variations can
have a significant impact. For example, if one institution uses a different logging device than
another, their results may be inconsistent, and comparing them directly can introduce bias
toward certain groups because the data patterns being compared are not actually comparable.
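
One simple way to surface such discrepancies is to compare readings of the same metric from two tools against a tolerance. The sketch below is only an illustration: the 5% default tolerance and the function names are assumptions, not values taken from the report.

```python
def relative_difference(a: float, b: float) -> float:
    """Relative gap between two readings, as a fraction of the larger one."""
    largest = max(abs(a), abs(b))
    return 0.0 if largest == 0 else abs(a - b) / largest


def readings_agree(a: float, b: float, tolerance: float = 0.05) -> bool:
    """True when two tools report values within `tolerance` of each other."""
    return relative_difference(a, b) <= tolerance
```

For instance, `readings_agree(100.0, 98.0)` holds under the default tolerance, while `readings_agree(100.0, 80.0)` does not and would flag the pair for review.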

Data Completeness: Not all schools have access to all the key business data entities (uptime
and downtime, broadband provisioned capacity, utilization over time, packet errors, and service
coverage) that the Ministry of Education in Ontario uses to understand and keep track of
network availability at each school board. Incomplete data can make it hard for the Ministry to
make informed decisions.
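
A completeness check can make these gaps visible before analysis. In the sketch below, the mapping from key data entities to data dictionary fields is an assumption made for illustration (service coverage is omitted because the dictionary lists no obvious field for it), and the function names are hypothetical.

```python
# Assumed mapping from the key data entities to data dictionary fields.
REQUIRED_FIELDS = {
    "Uptime and Downtime": "Avg_avail",
    "Broadband Provisioned Capacity": "Provisioned Downloads",
    "Utilization Over Time": "Avg_rec",
    "Packet Errors": "Pkt_err",
}


def missing_entities(record: dict) -> list[str]:
    """Names of the key data entities for which the record has no value."""
    return [
        entity
        for entity, field in REQUIRED_FIELDS.items()
        if record.get(field) in (None, "")
    ]


def completeness_ratio(records: list[dict]) -> float:
    """Share of records that contain every required field."""
    if not records:
        return 0.0
    complete = sum(1 for r in records if not missing_entities(r))
    return complete / len(records)
```

A legitimate value of zero (for example, a packet error rate of 0.0) is treated as present; only absent or empty values count as missing.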

Data Integration: Merging data from different sources and formats can be difficult,
particularly when large volumes of data are involved. Mismatches between sources and formats
can cause delays or inaccuracies when the data is processed and analyzed. For example, log
collectors that use NetFlow, IPFIX, or SNMP to retrieve information from logging devices may
not align cleanly with the report generator, which complicates the steps in which data is
transformed and transferred.
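
A common way to reduce this friction is to map each collector's output onto one shared schema before report generation. The sketch below assumes two collectors with different field names; `collector_a`, `collector_b`, and their raw field names are invented for illustration and do not correspond to real NetFlow, IPFIX, or SNMP field names.

```python
# Hypothetical field names; real collectors will use their own schemas.
FIELD_MAPS = {
    "collector_a": {"iface": "Interface", "dl_avg": "Avg_rec", "ul_avg": "Avg_trans"},
    "collector_b": {"port_name": "Interface", "rx_mean": "Avg_rec", "tx_mean": "Avg_trans"},
}


def normalize(source: str, raw: dict) -> dict:
    """Rename a collector-specific record to the data dictionary field names."""
    if source not in FIELD_MAPS:
        raise ValueError(f"no field mapping defined for source {source!r}")
    mapping = FIELD_MAPS[source]
    return {target: raw[name] for name, target in mapping.items() if name in raw}


record_a = normalize("collector_a", {"iface": "vlan_internet", "dl_avg": 7, "ul_avg": 1})
record_b = normalize("collector_b", {"port_name": "vlan_internet", "rx_mean": 7, "tx_mean": 1})
```

After normalization both records carry the same field names, so the report generator only has to handle one format.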

Data Security: Data security is critical when handling sensitive data such as network
availability and usage. Data breaches, data theft, or unauthorized access can compromise the
integrity and confidentiality of the data. For instance, if the monitoring tool used to track
the networks is not secure, it can expose the collected data to these risks.

Data Governance: Managing data governance is not easy, especially when dealing with
information derived from many sources. It requires a clearly articulated strategy and
established policies to safeguard the integrity, completeness, and consistency of the recorded
data. If these measures are not followed and accuracy is not maintained across all datasets,
users who rely on that data may draw false conclusions or make poor decisions.
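
Governance policies of this kind are often enforced as explicit validation rules applied to every record. The sketch below shows what such rules might look like for the fields in the data dictionary; the specific rules (non-negative speeds, percentages between 0 and 100, average not exceeding peak) are assumptions, not policies stated in the report.

```python
def validate_record(record: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    problems = []
    for field in ("Avg_rec", "Peak_rec", "Avg_trans", "Peak_trans"):
        value = record.get(field)
        if not isinstance(value, int) or value < 0:
            problems.append(f"{field} must be a non-negative integer")
    for field in ("Avg_avail", "Pkt_err"):
        value = record.get(field)
        if not isinstance(value, (int, float)) or not 0 <= value <= 100:
            problems.append(f"{field} must be a percentage between 0 and 100")
    avg, peak = record.get("Avg_rec"), record.get("Peak_rec")
    if isinstance(avg, int) and isinstance(peak, int) and avg > peak:
        problems.append("Avg_rec cannot exceed Peak_rec")
    return problems
```

Returning every violation, rather than stopping at the first one, lets a data steward see all the problems with a record in one pass.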

References
