-
Toto: Time Series Optimized Transformer for Observability
Authors:
Ben Cohen,
Emaad Khwaja,
Kan Wang,
Charles Masson,
Elise Ramé,
Youssef Doubli,
Othmane Abou-Amal
Abstract:
This technical report describes the Time Series Optimized Transformer for Observability (Toto), a new state of the art foundation model for time series forecasting developed by Datadog. In addition to advancing the state of the art on generalized time series benchmarks in domains such as electricity and weather, this model is the first general-purpose time series forecasting foundation model to be…
▽ More
This technical report describes the Time Series Optimized Transformer for Observability (Toto), a new state of the art foundation model for time series forecasting developed by Datadog. In addition to advancing the state of the art on generalized time series benchmarks in domains such as electricity and weather, this model is the first general-purpose time series forecasting foundation model to be specifically tuned for observability metrics.
Toto was trained on a dataset of one trillion time series data points, the largest among all currently published time series foundation models. Alongside publicly available time series datasets, 75% of the data used to train Toto consists of fully anonymous numerical metric data points from the Datadog platform.
In our experiments, Toto outperforms existing time series foundation models on observability data. It does this while also excelling at general-purpose forecasting tasks, achieving state-of-the-art zero-shot performance on multiple open benchmark datasets.
△ Less
Submitted 11 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
FlipHash: A Constant-Time Consistent Range-Hashing Algorithm
Authors:
Charles Masson,
Homin K. Lee
Abstract:
Consistent range-hashing is a technique used in distributed systems, either directly or as a subroutine for consistent hashing, commonly to realize an even and stable data distribution over a variable number of resources. We introduce FlipHash, a consistent range-hashing algorithm with constant time complexity and low memory requirements. Like Jump Consistent Hash, FlipHash is intended for applica…
▽ More
Consistent range-hashing is a technique used in distributed systems, either directly or as a subroutine for consistent hashing, commonly to realize an even and stable data distribution over a variable number of resources. We introduce FlipHash, a consistent range-hashing algorithm with constant time complexity and low memory requirements. Like Jump Consistent Hash, FlipHash is intended for applications where resources can be indexed sequentially. Under this condition, it ensures that keys are hashed evenly across resources and that changing the number of resources only causes keys to be remapped from a removed resource or to an added one, but never shuffled across persisted ones. FlipHash differentiates itself with its low computational cost, achieving constant-time complexity. We show that FlipHash beats Jump Consistent Hash's cost, which is logarithmic in the number of resources, both theoretically and in experiments over practical settings.
△ Less
Submitted 27 February, 2024;
originally announced February 2024.
-
Sampling Space-Saving Set Sketches
Authors:
Homin K. Lee,
Charles Masson
Abstract:
Large, distributed data streams are now ubiquitous. High-accuracy sketches with low memory overhead have become the de facto method for analyzing this data. For instance, if we wish to group data by some label and report the largest counts using fixed memory, we need to turn to mergeable heavy hitter sketches that can provide highly accurate approximate counts. Similarly, if we wish to keep track…
▽ More
Large, distributed data streams are now ubiquitous. High-accuracy sketches with low memory overhead have become the de facto method for analyzing this data. For instance, if we wish to group data by some label and report the largest counts using fixed memory, we need to turn to mergeable heavy hitter sketches that can provide highly accurate approximate counts. Similarly, if we wish to keep track of the number of distinct items in a single set spread across several streams using fixed memory, we can turn to mergeable count distinct sketches that can provide highly accurate set cardinalities.
If we were to try to keep track of the cardinality of multiple sets and report only on the largest ones, maintaining individual count distinct sketches for each set can grow unwieldy, especially if the number of sets is not known in advance. We consider the natural combination of the heavy hitters problem with the count distinct problem, the heavy distinct hitters problem: given a stream of $(\ell, x)$ pairs, find all the labels $\ell$ that are paired with a large number of distinct items $x$ using only constant memory.
No previous work on heavy distinct hitters has managed to be of practical use in the large, distributed data stream setting. We propose a new algorithm, the Sampling Space-Saving Set Sketch, which combines sketching and sampling techniques and has all the desired properties for size, speed, accuracy, mergeability, and invertibility. We compare our algorithm to several existing solutions to the heavy distinct hitters problem, and provide experimental results across several data sets showing the superiority of the new sketch.
△ Less
Submitted 13 February, 2024;
originally announced February 2024.
-
Swapping Semantic Contents for Mixing Images
Authors:
Rémy Sun,
Clément Masson,
Gilles Hénaff,
Nicolas Thome,
Matthieu Cord
Abstract:
Deep architecture have proven capable of solving many tasks provided a sufficient amount of labeled data. In fact, the amount of available labeled data has become the principal bottleneck in low label settings such as Semi-Supervised Learning. Mixing Data Augmentations do not typically yield new labeled samples, as indiscriminately mixing contents creates between-class samples. In this work, we in…
▽ More
Deep architecture have proven capable of solving many tasks provided a sufficient amount of labeled data. In fact, the amount of available labeled data has become the principal bottleneck in low label settings such as Semi-Supervised Learning. Mixing Data Augmentations do not typically yield new labeled samples, as indiscriminately mixing contents creates between-class samples. In this work, we introduce the SciMix framework that can learn to generator to embed a semantic style code into image backgrounds, we obtain new mixing scheme for data augmentation. We then demonstrate that SciMix yields novel mixed samples that inherit many characteristics from their non-semantic parents. Afterwards, we verify those samples can be used to improve the performance semi-supervised frameworks like Mean Teacher or Fixmatch, and even fully supervised learning on a small labeled dataset.
△ Less
Submitted 20 May, 2022;
originally announced May 2022.
-
Towards efficient feature sharing in MIMO architectures
Authors:
Rémy Sun,
Alexandre Ramé,
Clément Masson,
Nicolas Thome,
Matthieu Cord
Abstract:
Multi-input multi-output architectures propose to train multiple subnetworks within one base network and then average the subnetwork predictions to benefit from ensembling for free. Despite some relative success, these architectures are wasteful in their use of parameters. Indeed, we highlight in this paper that the learned subnetwork fail to share even generic features which limits their applicab…
▽ More
Multi-input multi-output architectures propose to train multiple subnetworks within one base network and then average the subnetwork predictions to benefit from ensembling for free. Despite some relative success, these architectures are wasteful in their use of parameters. Indeed, we highlight in this paper that the learned subnetwork fail to share even generic features which limits their applicability on smaller mobile and AR/VR devices. We posit this behavior stems from an ill-posed part of the multi-input multi-output framework. To solve this issue, we propose a novel unmixing step in MIMO architectures that allows subnetworks to properly share features. Preliminary experiments on CIFAR-100 show our adjustments allow feature sharing and improve model performance for small architectures.
△ Less
Submitted 20 May, 2022;
originally announced May 2022.
-
DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees
Authors:
Charles Masson,
Jee E. Rim,
Homin K. Lee
Abstract:
Summary statistics such as the mean and variance are easily maintained for large, distributed data streams, but order statistics (i.e., sample quantiles) can only be approximately summarized. There is extensive literature on maintaining quantile sketches where the emphasis has been on bounding the rank error of the sketch while using little memory. Unfortunately, rank error guarantees do not precl…
▽ More
Summary statistics such as the mean and variance are easily maintained for large, distributed data streams, but order statistics (i.e., sample quantiles) can only be approximately summarized. There is extensive literature on maintaining quantile sketches where the emphasis has been on bounding the rank error of the sketch while using little memory. Unfortunately, rank error guarantees do not preclude arbitrarily large relative errors, and this often occurs in practice when the data is heavily skewed. Given the distributed nature of contemporary large-scale systems, another crucial property for quantile sketches is mergeablility, i.e., several combined sketches must be as accurate as a single sketch of the same data. We present the first fully-mergeable, relative-error quantile sketching algorithm with formal guarantees. The sketch is extremely fast and accurate, and is currently being used by Datadog at a wide-scale.
△ Less
Submitted 28 August, 2019;
originally announced August 2019.
-
Toward Automatic Understanding of the Function of Affective Language in Support Groups
Authors:
Amit Navindgi,
Caroline Brun,
Cécile Boulard Masson,
Scott Nowson
Abstract:
Understanding expressions of emotions in support forums has considerable value and NLP methods are key to automating this. Many approaches understandably use subjective categories which are more fine-grained than a straightforward polarity-based spectrum. However, the definition of such categories is non-trivial and, in fact, we argue for a need to incorporate communicative elements even beyond su…
▽ More
Understanding expressions of emotions in support forums has considerable value and NLP methods are key to automating this. Many approaches understandably use subjective categories which are more fine-grained than a straightforward polarity-based spectrum. However, the definition of such categories is non-trivial and, in fact, we argue for a need to incorporate communicative elements even beyond subjectivity. To support our position, we report experiments on a sentiment-labelled corpus of posts taken from a medical support forum. We argue that not only is a more fine-grained approach to text analysis important, but simultaneously recognising the social function behind affective expressions enable a more accurate and valuable level of understanding.
△ Less
Submitted 6 October, 2016;
originally announced October 2016.
-
Gated networks: an inventory
Authors:
Olivier Sigaud,
Clément Masson,
David Filliat,
Freek Stulp
Abstract:
Gated networks are networks that contain gating connections, in which the outputs of at least two neurons are multiplied. Initially, gated networks were used to learn relationships between two input sources, such as pixels from two images. More recently, they have been applied to learning activity recognition or multi-modal representations. The aims of this paper are threefold: 1) to explain the b…
▽ More
Gated networks are networks that contain gating connections, in which the outputs of at least two neurons are multiplied. Initially, gated networks were used to learn relationships between two input sources, such as pixels from two images. More recently, they have been applied to learning activity recognition or multi-modal representations. The aims of this paper are threefold: 1) to explain the basic computations in gated networks to the non-expert, while adopting a standpoint that insists on their symmetric nature. 2) to serve as a quick reference guide to the recent literature, by providing an inventory of applications of these networks, as well as recent extensions to the basic architecture. 3) to suggest future research directions and applications.
△ Less
Submitted 10 December, 2015;
originally announced December 2015.