Whilst working on a new product that we have recently launched, I've spent quite a lot of time speaking to customers about timing, and specifically understanding how they synchronise the clocks within their business. The answers have been fairly standard: they take time from one of the many public NTP servers that are available over the internet. Each server they have is often configured to get its time independently, or sometimes they may have a master clock which takes the public NTP time in and then redistributes it across the network.
With finance customers who have an interest in trading, where having an accurate time is often mandated by regulators, the answer has been a little different. Typically, they will have a more sophisticated setup comprising a Grand Master clock which receives its input via an antenna picking up a GNSS (Global Navigation Satellite System) signal – GPS is one of several GNSS systems in operation, which provide timing and positioning information when you receive signals from multiple satellites together. The time is then distributed from the Grand Master to devices on their network via one of two timing protocols, NTP or PTP.
Being able to receive Time of Day information via GNSS is great. Firstly, it's cheap to implement – you can get a GNSS receiver for under $50 – and secondly, it is accurate to tens of nanoseconds (a nanosecond is a billionth of a second), far more accurate than the vast majority of people need.
Utilising NTP is fairly straightforward: you simply tell each device which NTP server to get its time from – this can often be controlled by central group policies, so you don't even need to make a configuration change on each device. With a Grand Master setup, once it's in and running it is often left alone with little or no ongoing administration.
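If you ever want to sanity-check what a device is actually getting from the server it has been pointed at, a few lines of Python will do it. The sketch below uses the third-party ntplib package and a placeholder server name – both are my own illustrative choices, not part of any setup described here.

```python
# Query an NTP server and report how far the local clock is from it.
# Requires the third-party "ntplib" package (pip install ntplib).
# "time.example.com" is a placeholder - substitute your own NTP server.
import ntplib

client = ntplib.NTPClient()
response = client.request("time.example.com", version=3, timeout=5)

print(f"offset from server : {response.offset * 1000:.3f} ms")
print(f"round-trip delay   : {response.delay * 1000:.3f} ms")
print(f"server stratum     : {response.stratum}")
```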
There is one point, though, that is often overlooked: how do you know that this time is accurate?
The time that each public timing source declares can differ wildly from one source to the next.
This graph shows four public NTP sources over a 24-hour period:
- time.google.com – Google’s public NTP service (Yellow)
- time.apple.com – macOS and iOS default time source (Orange)
- time.windows.com – the default NTP Service for Windows devices (Blue)
- time.facebook.com – Facebook’s public NTP Service (Purple)
You can see that the Microsoft service shows a large, regular wave with a delta of 1.5 milliseconds (1.5 thousandths of a second) in its level of accuracy, commonly referred to as drift – it certainly isn't what you are aiming for when consuming a time signal. It's very different to the Facebook source, which has an ultra-steady line with a minimal level of jitter across the 24-hour period. With the Google and Apple sources you will observe several clearly definable steps in the plots; this is where the provider, Google or Apple in this case, is balancing the time service across different servers within its network to spread the load, but each server is reporting a slightly different time.
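The graphs in this post come from TimeKeeper, but if you want a rough, do-it-yourself feel for how these public sources compare, a simple poller along these lines will show the same kind of divergence. It is only a sketch: it assumes the third-party ntplib package, and the one-minute polling interval is an arbitrary choice to stay polite to the public servers.

```python
# Periodically poll a set of public NTP servers and log each one's
# offset from the local clock, plus the round-trip delay, in milliseconds.
# Uses the third-party "ntplib" package; interval is an arbitrary choice.
import time
import ntplib

SERVERS = ["time.google.com", "time.apple.com",
           "time.windows.com", "time.facebook.com"]
POLL_INTERVAL_S = 60

client = ntplib.NTPClient()

while True:
    for server in SERVERS:
        try:
            response = client.request(server, version=3, timeout=5)
            print(f"{time.strftime('%H:%M:%S')} {server:18s} "
                  f"offset {response.offset * 1000:+8.3f} ms  "
                  f"delay {response.delay * 1000:7.3f} ms")
        except (ntplib.NTPException, OSError) as exc:
            print(f"{time.strftime('%H:%M:%S')} {server:18s} failed: {exc}")
    time.sleep(POLL_INTERVAL_S)
```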
The Time Map shows that, over a four-hour period, the address behind the time.windows.com source was itself fed by five separate time sources.
This goes some way to explaining the wavy nature of the plot for this source. The Frequency graph provides further explanation. Frequency is the rate of correction that is measured every time we sync with the server; here we are set to sync once per second (1 Hz). Note the level of change in the correction rate in comparison to Facebook.
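To make the frequency idea concrete: if you record the offset at each one-second sync, the slope of those offsets is the rate at which the clock is drifting and being corrected. A back-of-the-envelope version, using made-up sample values, might look like this:

```python
# Estimate a clock's frequency error (drift rate) from successive offsets.
# The offsets below are made-up illustrative values, one sample per second.
offsets_ms = [0.120, 0.135, 0.149, 0.166, 0.181]  # offset at each 1 Hz sync
sync_interval_s = 1.0

# Average change in offset per second = the clock's frequency error.
deltas = [b - a for a, b in zip(offsets_ms, offsets_ms[1:])]
drift_ms_per_s = sum(deltas) / len(deltas) / sync_interval_s

# 1 ms of drift per second is 1,000 parts per million (ppm).
print(f"drift rate ~ {drift_ms_per_s:.3f} ms/s ({drift_ms_per_s * 1000:.1f} ppm)")
```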
Who would have thought Facebook would be the place to point your public NTP configuration at!
Other factors can also have an impact on accuracy, such as the distance to the timing source or the utilisation of the internet link that the time signal travels over.
This screenshot shows how the accuracy of both internal and external time sources can be affected when an organisation's nightly backup runs. During the backup, timing accuracy drifts by as much as 50 ms for the internet time sources and close to 30 ms for the internal sources. QoS (Quality of Service) markings on the internet connection have prioritised inter-site traffic, but they have not been able to guarantee the bandwidth and quality of the connection needed to maintain a reasonable level of accuracy over a period lasting several hours.
Many companies have deployed, or are in the process of deploying, SD-WAN, which utilises non-dedicated WAN circuits for connectivity between sites. There are numerous cost benefits to doing this, but such circuits are far more susceptible to variable delay (latency) and jitter, both of which have an impact on timing accuracy.
GNSS-based systems, which don't rely on remote network sources to establish the time, are nevertheless vulnerable: the signal can easily be blocked, either maliciously by a signal jammer or by accident, for example by a bird sitting on your antenna.
But in the end, why is all of this important? There are many answers, but I am going to focus on two.
Firstly, if you are investigating a cyber-attack, or you have systems which proactively identify and attempt to block attacks, the log information being analysed is reviewed in order based upon the timestamps of each log entry. If each system in your network believes the time is different from other systems on the network by even a few milliseconds (thousandths of a second), how can you, or the systems doing the analysis, correctly correlate an event and understand the order in which every sub-event occurred? Servers often only sync their clocks once an hour or less – the Windows Server default is to sync every 130 minutes – and between syncs clocks can drift by hundreds of milliseconds. For context, 100 milliseconds is roughly how long it takes network traffic to travel 10,000 miles – the distance from London to San Francisco and back, or New York to São Paulo and back. You can travel a long way in your network, in and out of servers, in that time!
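To put a rough number on that drift claim: an ordinary, uncorrected crystal oscillator might run a few tens of parts per million fast or slow – 20 ppm is purely an illustrative assumption here – and over a 130-minute sync interval that adds up quickly.

```python
# How far can a clock drift between syncs?
# 20 ppm is an illustrative assumption for an ordinary crystal oscillator;
# 130 minutes is the sync interval mentioned above.
frequency_error_ppm = 20
sync_interval_s = 130 * 60

drift_ms = frequency_error_ppm * 1e-6 * sync_interval_s * 1000
print(f"drift between syncs: {drift_ms:.0f} ms")  # ~156 ms
```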
The second reason is down to regulations, particularly those related to financial trading, where you are required to ensure clocks are accurate to UTC (Coordinated Universal Time) to levels as low as 100 microseconds (100 millionths of a second). It is easy enough to add up the delay that each component in the path contributes, but the question comes back to accuracy: how do you know the time is accurate, despite your best efforts in ensuring that you have a resilient architecture?
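As a trivial illustration of how quickly a 100-microsecond budget gets eaten, here is the kind of back-of-the-envelope sum you might do – the component names and figures below are made up for illustration, not measurements from any real network:

```python
# Back-of-the-envelope timing error budget against a 100 microsecond limit.
# Component names and contributions are illustrative assumptions only.
budget_us = 100

contributions_us = {
    "grand master vs UTC":      1,
    "boundary clock / switch":  5,
    "network path asymmetry":  25,
    "server timestamping":     20,
    "application latency":     30,
}

total_us = sum(contributions_us.values())
status = "within" if total_us <= budget_us else "over"
print(f"estimated worst-case error: {total_us} us ({status} the {budget_us} us budget)")
```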
TimeKeeper, the product I've used to produce the graphs in this blog, has recently been introduced by Keysight. It measures the accuracy of your timing inputs and signals, detects any anomalies, and steers the host system's clock based upon the most accurate time source you have available. TimeKeeper can also distribute time and produce compliance reports that show whether all of your clients are correctly syncing, how accurate they are, and whether there have been any breaks in synchronisation.