A Road Usage Charging invoicing system
ClearRoad had won a US project that required a system to correctly invoice cars for their road usage based on actual GPS location. The aim was thus, using Automatic® OBU devices, to retrieve the GPS position of a car over a given period of time and generate a valid invoice based on the regulatory road-usage tariffs. We built such a system in a little under six months, and it ran in production for a year with very few issues.
We used ERP5 as the ERP backend for invoicing and data-flow management, and the Python scientific stack (mostly Shapely) to process the raw GPS data and compute the distance driven on the different types of roads in the different US states. We hosted our own OpenStreetMap database in an Azure VM for road identification and categorization. (In case of disputes, we had to be able to mark a road as "private", so that road users would not be charged for driving on it.)
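To give an idea of the approach (this is not ClearRoad's actual code), here is a minimal sketch of how GPS fixes can be snapped to Shapely road geometries and the driven distance accumulated per road category; the road data, coordinate system and snapping tolerance are made up for illustration.

# Minimal sketch: accumulate driven distance per road category from GPS fixes.
# Assumes road segments have already been extracted from OpenStreetMap as
# Shapely LineStrings in a projected CRS (metres), each tagged with a category.
from collections import defaultdict
from shapely.geometry import LineString, Point

# Hypothetical road data: (geometry, category) pairs.
roads = [
    (LineString([(0, 0), (1000, 0)]), "public"),
    (LineString([(1000, 0), (1000, 500)]), "private"),
]

def distance_per_category(gps_points, roads, max_snap=30.0):
    """Snap consecutive GPS fixes to the nearest road and sum distance per category."""
    totals = defaultdict(float)
    previous = None
    for x, y in gps_points:
        point = Point(x, y)
        # Pick the road segment closest to this fix (naive linear scan; a real
        # map-matcher would use a spatial index and the trajectory history).
        geom, category = min(roads, key=lambda road: road[0].distance(point))
        if geom.distance(point) > max_snap:
            previous = None  # off the known road network, skip this fix
            continue
        position = geom.project(point)  # distance along the matched segment
        if previous is not None and previous[0] is geom:
            totals[category] += abs(position - previous[1])
        previous = (geom, position)
    return dict(totals)

print(distance_per_category([(0, 2), (400, -3), (990, 1)], roads))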
This project shows how, with minimal resources (two to three FTEs and a relatively low budget), RES can deliver quite performant systems on time.
Improve Ad Viewability
Artnet runs a news website that generates significant advertisement revenue. High-end advertisers (mostly luxury brands) were becoming more and more demanding about the quality of the ads on the Artnet news website. One metric on which Artnet performed poorly was viewability: the ratio of ads actually viewed by users to the number of ads loaded into users' browsers.
We started with a detailed analysis of the reasons for this poor performance. The cause was clear: the CMS loaded entire articles at once, including all ads, even if the user never scrolled all the way down the article. So we implemented a lazy-loading solution that triggered the loading of an ad only when the user scrolled close to the ad's placeholder. The main challenge was to integrate this into the existing, heavily customized CMS and to respect all the ad-loading rules (e.g. which ad can be loaded where, in what order, etc.).
We improved viewability (as measured by Google) from below 50% to around 70%.
Here, RES's expertise in the low-level details of how browsers execute JavaScript and how the DOM is loaded proved very helpful.
Commodity price simulator
As a large energy company, ENGIE had developed over many years a quite feature-rich price simulator for the various commodities it trades as part of its energy production and sales activities. Over time the code base had grown considerably: various subcomponents of the simulator were poorly packaged, depended on many unmaintained and sometimes poorly documented or tested libraries, and made little use of source control tools. The deployment process was increasingly tedious, and running the simulator was a real issue, as analysts ran local versions of the code with hacky patches and no real control over which versions of the subcomponents they were using. All of this was also still running on Python 2, which was soon to disappear.
So we first created a proper Python package for the project, folding in all dependencies that were not well separated from the main project and porting them to Python 3 at the same time. We started versioning with Git and resolved the installation dependency hell, by trial and error, using Anaconda Python. Once we had a running version of the code with its dependencies, we created a proper DevOps build procedure that could install the package and all its dependencies on any compatible machine. We stabilized the code, added some long-awaited features, and moved towards production on Azure.
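As an illustration of the packaging step (all names here are hypothetical, not the actual ENGIE package), a minimal setup.py along these lines is enough to turn the simulator and its formerly loose subcomponents into one installable, versioned artifact:

# Hypothetical packaging sketch, not ENGIE's actual configuration.
from setuptools import setup, find_packages

setup(
    name="price-simulator",            # made-up package name
    version="1.0.0",
    packages=find_packages(),          # includes the folded-in subcomponents
    python_requires=">=3.6",           # Python 2 support dropped
    install_requires=[
        # Explicit lower bounds instead of the previous ad-hoc local installs.
        "numpy>=1.16",
        "pandas>=0.24",
    ],
    entry_points={
        "console_scripts": [
            # One reproducible entry point instead of hacky local scripts.
            "run-simulation=price_simulator.cli:main",
        ],
    },
)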
The first step in production was to set up periodic runs, configured with cron on Ubuntu. Results were published with an Apache web server on the internal network, so the teams that depended on the weekly results could automate their retrieval. The second step was to create a real-time querying service for the simulator. ENGIE had developed an in-house tool for automated task execution; we integrated the simulator into it so that users could simply upload a JSON file configuring their specific run and then query the results on the Apache server. This task executor also made it possible to run very feature-rich price simulations that take weather dependency into account, via the ENGIE weather tool written in R.
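For illustration, here is a sketch of how a downstream user could interact with such a setup; the JSON fields, hostname and paths are invented for the example and do not reflect the actual ENGIE schema.

# Sketch of a user-side interaction with the setup described above.
# All field names, hostnames and paths are hypothetical.
import json
import urllib.request

# 1. Build a run configuration to upload to the in-house task executor.
run_config = {
    "commodity": "gas",          # made-up fields
    "horizon_weeks": 52,
    "scenarios": 500,
    "use_weather_model": True,   # would trigger the R-based weather tool
}
with open("run_config.json", "w") as f:
    json.dump(run_config, f, indent=2)

# 2. Later, fetch the published results from the internal Apache server.
RESULTS_URL = "http://simulator.internal.example/results/latest/prices.csv"
with urllib.request.urlopen(RESULTS_URL) as response, open("prices.csv", "wb") as out:
    out.write(response.read())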
Finally, we ported the entire production setup into an ENGIE DevOps tool, so that many more instances of these services can be deployed if needed.
Again, this shows that with proper development practices, good open-source tools and lean cloud infrastructure, we can create great analytics systems in very little time.
Research and development projects
At RES, we invest heavily in our own R&D. To experiment quickly with large datasets, it is very convenient to use the huge compute resources available today. Indeed, the largest recent commodity servers can host around 1 TB of RAM, which is very handy for inspecting large datasets, quickly trying out models on them, and so on. This is impossible on typical commodity laptops.
In 2017 and 2019, RES purchased refurbished Facebook infrastructure and thus has access to very large compute servers at a fraction of the cost of the cloud. We simply run Linux, Anaconda Python and Jupyter for R&D and experimentation with notebooks, and Postgres with TimescaleDB for storage. Although the infrastructure is quite old (2011-2012, first-generation Intel Xeon E5), performance is almost equivalent to that of today's machines; the only downside is that power consumption is 30-40% higher.
Here is a benchmark of floating-point performance on a RES compute instance, compared to a recent public-cloud machine:
RES:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
Stepping: 7
CPU MHz: 1624.778
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4399.78
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
$ free -m
total used free shared buff/cache available
Mem: 257932 25974 150552 193 81405 229958
Swap: 262116 0 262116
$ time python -c 'import numpy; A = numpy.random.random((10000, 10000)); test = numpy.dot(A,A)'
real 0m11.342s
user 2m28.529s
sys 0m2.392s

Public cloud:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Stepping: 2
CPU MHz: 2500.000
CPU max MHz: 2500.0000
CPU min MHz: 1200.0000
BogoMIPS: 4993.87
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb invpcid_single intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
$ free -m
total used free shared buff/cache available
Mem: 386673 5468 280256 514 100948 377879
Swap: 1023 0 1023
$ time python -c 'import numpy; A = numpy.random.random((10000, 10000)); test = numpy.dot(A,A)'
real 0m6.125s
user 3m0.645s
sys 0m12.072s
The older Xeon holds up well: 11.3 seconds versus 6.1 seconds on the newer v3, bearing in mind that the RES system is a lower-end Xeon with only 32 logical CPUs (versus 48 on the cloud machine) and a lower clock speed.
We typically run our data-science R&D on these machines, and we have a dedicated GPU machine for very intensive 32-bit floating-point computations (neural networks). When we need to train a really huge neural network, our GPUs are too small, so we sometimes use Google Cloud TPUs, which perform very well and are quite cheap for on-demand training of a model.