Anthony Abou-Jaoude’s story with Continent 8 continues as he re-joins the Group in a key role as Director of Product Engineering. Anthony, who was the first software engineer ever hired at Continent 8, shares his personal data journey with us. Read more about Anthony’s professional highlights, the current unprecedented volume of data and how companies across the globe can ensure they get value from it.

By Anthony Abou-Jaoude 

I am sure everyone has been hearing the term Big Data more often in the past few years, along with discussions addressing its importance. This has never been as true and as relevant as during these pandemic times, when we are constantly being bombarded by statistics and charts that we must make sense of.

In these ever-challenging times, data is enabling decision makers to act fast and make choices based on facts and statistics. Even more, this pandemic is highlighting the importance of visualizing data and extracting intelligence from it. At the same time, it gives the general public a way of understanding the impact of this virus in a more familiar format, whether that is different forms of graphs and charts, or infographic pictorial maps.

First, let me start with a few data points about myself:  I am a returning employee of Continent 8, where I was the first software engineer ever hired. It was a great way to start my professional career right out of university.  In the spirit of data and visualization, I can share some personal and professional highlights with you using a diagram:

Personal and professional highlights

I am so glad and privileged to return to such a success story and familiar faces at Continent 8.  From my first week back, I dove into the data and visualized how the company has been experiencing exponential growth across multiple verticals. For example, as nothing depicts it better than a chart, the data flowing within our network shows no sign of slowing down:

Data flow within Continent 8’s network from 2012 – 2020

In the same vein, we, as a population in general, are creating unprecedented amounts of data. In fact, every two days we generate as much data as we did from the beginning of time until 2003. It is estimated that by 2018 we had accumulated 33 zettabytes (ZB) of data, and by 2025 that number is expected to grow by roughly 430% to a total of 175 ZB.
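For those who like to check the arithmetic, the 430% figure follows directly from those two estimates. A quick, purely illustrative calculation in Python:

```python
# Quick sanity check of the quoted growth figure (illustrative only).
data_2018_zb = 33    # estimated global datasphere in 2018, in zettabytes
data_2025_zb = 175   # projected global datasphere in 2025, in zettabytes

growth_pct = (data_2025_zb - data_2018_zb) / data_2018_zb * 100
print(f"Projected growth 2018 -> 2025: {growth_pct:.0f}%")  # ~430%
```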

This growth of data is challenging companies across the globe to get value out of their data and to optimize its collection, storage and usage. The best big data applications are those that unlock value from both structured and unstructured data, and achieving that requires a combination of tools, data scientists and leadership, all coming together on a data journey to deal with this massive influx of data.

Just like Maslow’s pyramid, there exists a data science hierarchy of needs where the first and most basic of requirements must be met in order to have a proper and successful data strategy:

The Data Science Hierarchy of Needs Pyramid (Source: “The AI Hierarchy of Needs”, Monica Rogati)

An organization must start by looking at what data it has, how it is collected and used, its quality, what is missing and where it can be optimized. Once this is understood, it can move on to the next step: enabling the personnel who need the data to query it themselves, by empowering them with the right tools and training. The ability to slice and dice the data and securely share it with multiple groups in the organization is the difference between collecting data and leaving it idle versus operationalizing it by extracting value from it and reducing risk.
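As a minimal sketch of what that self-service slicing and dicing can look like, assuming a small hypothetical traffic dataset (the columns and figures below are invented for illustration, not Continent 8 data):

```python
import pandas as pd

# Hypothetical traffic records; in practice this would come from a governed
# data store that teams are trained and authorised to query themselves.
records = pd.DataFrame({
    "region":  ["EU", "EU", "NA", "NA", "ASIA"],
    "service": ["hosting", "ddos", "hosting", "cloud", "hosting"],
    "gbps":    [12.4, 3.1, 20.7, 8.9, 5.2],
})

# Slice and dice: aggregate traffic by region and service.
summary = records.pivot_table(index="region", columns="service",
                              values="gbps", aggfunc="sum", fill_value=0)
print(summary)
```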

After setting a proper data foundation, a company can get into the final stages of the journey, which are what really give the competitive edge. With proper correlation and data sets in place, we start getting into forecasting and machine learning use cases. Using a multitude of tools, techniques and algorithms, we can pre-empt future needs, or prevent failures, with a reasonable level of confidence.
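As a simple illustration of the forecasting idea, the sketch below fits a linear trend to hypothetical monthly traffic figures and projects a few months ahead; a real deployment would use richer models (seasonality, ML regressors) and real data, but the principle of learning from history to pre-empt future capacity needs is the same:

```python
import numpy as np

# Hypothetical monthly traffic totals (Gbps) -- illustrative numbers only.
months  = np.arange(1, 13)
traffic = np.array([40, 42, 45, 47, 50, 54, 57, 61, 66, 70, 76, 81])

# Fit a simple linear trend and project three months ahead.
slope, intercept = np.polyfit(months, traffic, deg=1)
for future_month in (13, 14, 15):
    forecast = slope * future_month + intercept
    print(f"Month {future_month}: ~{forecast:.0f} Gbps expected")
```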

The data journey is a never-ending one, always affected by factors outside a dataset and in need of continuous improvement, with the human component remaining important at every stage.

Without realizing it, many of us are already working together on this data journey. Let us know where Continent 8 can help as we continue to unlock the power of data for ourselves and our customers.

At 10:03 UTC on Sunday 30th August, Continent 8’s monitoring systems began to notice something was awry with traffic traversing our links to and from CenturyLink/Level(3).

The events which followed were relatively simple for Continent 8 and service providers in a similar position, but many others globally (including some large names) were unaware that they were set for a global outage of over 4 hours. Casualties of the issue to varying degrees included Reddit, Hulu, AWS, Blizzard, Steam, Cloudflare, Xbox Live, Discord, and dozens more.

Continent 8 is connected to CenturyLink/Level(3) among a large and diverse set of network providers. In simple terms, when we see an increase in errors or an outage from one network provider, our systems automatically switch all the affected traffic across alternative providers. Given the number of providers we have access to, we are able to continue to route all traffic even when one provider has a global issue.
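A minimal sketch of that failover logic, with invented provider names and an assumed error-rate threshold; real implementations act at the routing layer (for example by deprioritising or withdrawing routes learned from the unhealthy provider), but the decision step looks broadly like this:

```python
# Illustrative failover sketch (not Continent 8's actual tooling): if a
# provider's error rate crosses a threshold, mark it unhealthy and carry
# traffic over the remaining providers.
ERROR_THRESHOLD = 0.05  # assumed threshold: 5% errored probes

providers = {
    "provider_a": {"error_rate": 0.01, "healthy": True},
    "provider_b": {"error_rate": 0.62, "healthy": True},  # misbehaving link
    "provider_c": {"error_rate": 0.00, "healthy": True},
}

for name, status in providers.items():
    status["healthy"] = status["error_rate"] < ERROR_THRESHOLD

healthy = [name for name, status in providers.items() if status["healthy"]]
print("Routing traffic via:", ", ".join(healthy))
```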

In addition to automated route and peering provider management via specific endpoint monitoring set up across the C8 internet peering, the NOC is also able to manually manipulate peering relationships and priorities should we identify an issue with the ability to reach source or destination locations. Providers relying solely on automated protocols such as BGP with upstream peering or exchange providers can amplify an issue when they continue to treat a provider as “good” when in reality it isn’t.
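To illustrate the endpoint-monitoring side, here is a small sketch of the kind of end-to-end reachability probe that can flag a path problem even while BGP still considers a provider healthy; the hostnames are placeholders, not Continent 8’s actual monitoring targets:

```python
import socket

# Placeholder endpoints to probe through the path under test.
ENDPOINTS = [("example.com", 443), ("example.org", 443)]

def reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

failures = [f"{host}:{port}" for host, port in ENDPOINTS
            if not reachable(host, port)]
if failures:
    print("Escalate to NOC -- unreachable via current path:", failures)
else:
    print("All monitored endpoints reachable.")
```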

The specific behaviour of the issue on Sunday was such that automatic redirection of all traffic away from CenturyLink wasn’t triggered by the routing protocol (it was the same situation for all ISPs), and so manual intervention was required. By 10:35 UTC the Continent 8 NOC had identified the problem and acted to move all traffic away from CenturyLink to other providers, and all services remained fully available to our customers.

The same action was also then taken by those providers fortunate to have similar options. Some had services available again within the hour, some longer. Interestingly, some large ISPs took much longer to manually switch, and the online feedback of when services were eventually reinstated is interesting.

So why did some individuals, businesses and service providers suffer for over 4 hours? Well, CenturyLink/Level(3) is among the largest network providers in the world and also happens to provide some of the lowest latency routing. As a result, many hosting providers rely solely on its connectivity to the Internet, especially in some of the most densely connected locations where CenturyLink/Level(3) operates or peers in local exchanges. Because this outage appeared to take the entire CenturyLink/Level(3) network offline, both ISPs solely reliant on it and individuals who are CenturyLink customers could not reach any other Internet provider until the issue was resolved, over 4 hours later. They literally had no option but to wait.

In addition, a “knock-on” effect is that peering providers or exchanges not migrating away from CenturyLink/Level(3) can compound the issue if they sit between a source and a destination. This means that having broader peering and a manual failover capability is the best way to minimise the impact, if not avoid it completely.

As a provider, CenturyLink/Level(3) responded and communicated well throughout and did everything they could during what was clearly a globally significant event. However, if ever there was a case study for carrier redundancy, and a reminder that even the most reliable, robust and highest-capacity networks can go down, this was it. Stacking the odds in your favour and investing in services which are truly carrier redundant may at times seem like overkill, but it clearly pays off when events like this inevitably happen.

In terms of the details around what happened, CenturyLink released a statement advising that the root cause was an incorrect Flowspec announcement that effectively prevented BGP routes from establishing correctly. Regardless of the detail, even if that specific issue were prevented in future, there are many other issues which could take the largest providers offline again.

ISPs and businesses across the globe should again be asking themselves this week: “Do we, and our service providers, have sufficient carrier redundancy and incident management capability?”
