-
Fourier Neural Operator Surrogate Model to Predict 3D Seismic Waves Propagation
Authors:
Fanny Lehmann,
Filippo Gatti,
Michaël Bertin,
Didier Clouteau
Abstract:
With the recent rise of neural operators, scientific machine learning offers new solutions to quantify uncertainties associated with high-fidelity numerical simulations. Traditional neural networks, such as Convolutional Neural Networks (CNN) or Physics-Informed Neural Networks (PINN), are restricted to the prediction of solutions in a predefined configuration. With neural operators, one can learn the general solution of Partial Differential Equations, such as the elastic wave equation, with varying parameters. There have been very few applications of neural operators in seismology. All of them were limited to two-dimensional settings, although the importance of three-dimensional (3D) effects is well known.
In this work, we apply the Fourier Neural Operator (FNO) to predict ground motion time series from a 3D geological description. We used a high-fidelity simulation code, SEM3D, to build an extensive database of ground motions generated by 30,000 different geologies. With this database, we show that the FNO can produce accurate ground motion even when the underlying geology exhibits large heterogeneities. Intensity measures at moderate and large periods are especially well reproduced.
We present the first seismological application of Fourier Neural Operators in 3D. Thanks to the generalizability of our database, we believe that our model can be used to assess the influence of geological features such as sedimentary basins on ground motion, which is paramount to evaluating site effects.
Submitted 20 April, 2023;
originally announced April 2023.
-
A variance reduction strategy for numerical random homogenization based on the equivalent inclusion method
Authors:
Sebastien Brisard,
Michael Bertin,
Frederic Legoll
Abstract:
Using the equivalent inclusion method (a method strongly related to the Hashin-Shtrikman variational principle) as a surrogate model, we propose a variance reduction strategy for the numerical homogenization of random composites made of inclusions (or rather inhomogeneities) embedded in a homogeneous matrix. The efficiency of this strategy is demonstrated within the framework of two-dimensional, linear conductivity. Significant computational gains over full-field simulations are obtained even for high contrast values. We also show that our strategy makes it possible to investigate the influence of microstructure parameters on the macroscopic response. Our strategy readily extends to three-dimensional problems and to linear elasticity. Attention is paid to the computational cost of the surrogate model. In particular, an inexpensive approximation of the so-called influence tensors (which are used to compute the surrogate model) is proposed.
Submitted 29 March, 2023;
originally announced April 2023.
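The idea of using a cheap surrogate for variance reduction, as described in the abstract above, can be illustrated with a generic control-variate estimator. This is only a sketch under stated assumptions: `full_model` and `surrogate` are hypothetical toy functions standing in for a full-field simulation and the equivalent inclusion method, the surrogate's expectation is assumed to be known exactly, and the estimator shown is the textbook control variate, which may differ from the authors' actual scheme.

```python
import random
import statistics


def full_model(x: float) -> float:
    """Hypothetical stand-in for an expensive full-field simulation."""
    return x * x + 0.2 * x ** 3


def surrogate(x: float) -> float:
    """Hypothetical cheap surrogate, strongly correlated with full_model."""
    return x * x


# Assumption: the surrogate's mean is known (E[x^2] = 1/3 for x ~ U(0, 1)).
SURROGATE_MEAN = 1.0 / 3.0


def plain_estimate(samples):
    """Standard Monte Carlo estimate of E[full_model]."""
    return statistics.fmean(full_model(x) for x in samples)


def control_variate_estimate(samples):
    """Monte Carlo estimate corrected by the surrogate's known mean."""
    y = [full_model(x) for x in samples]
    s = [surrogate(x) for x in samples]
    y_bar = statistics.fmean(y)
    s_bar = statistics.fmean(s)
    # Empirical coefficient c = cov(Y, S) / var(S).
    cov = sum((yi - y_bar) * (si - s_bar) for yi, si in zip(y, s)) / (len(y) - 1)
    c = cov / statistics.variance(s)
    return y_bar - c * (s_bar - SURROGATE_MEAN)


def spread(estimator, n_repeats=200, n_samples=50, seed=0):
    """Standard deviation of an estimator over repeated random draws."""
    rng = random.Random(seed)
    estimates = [
        estimator([rng.random() for _ in range(n_samples)])
        for _ in range(n_repeats)
    ]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    print("plain Monte Carlo spread:", spread(plain_estimate))
    print("control variate spread:  ", spread(control_variate_estimate))
```

The closer the surrogate tracks the full model, the smaller the residual variance, which is why the choice of surrogate matters more than the number of samples in this kind of scheme.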
-
Improvement of algebraic attacks for solving superdetermined MinRank instances
Authors:
Magali Bardet,
Manon Bertin
Abstract:
The MinRank (MR) problem is a computational problem that arises in many cryptographic applications. In Verbel et al. (PQCrypto 2019), the authors introduced a new way to solve superdetermined instances of the MinRank problem, starting from the bilinear Kipnis-Shamir (KS) modeling. They use linear algebra on specific Macaulay matrices, considering only multiples of the initial equations by one block of variables, the so-called "kernel" variables. Later, Bardet et al. (Asiacrypt 2020) introduced a new Support Minors (SM) modeling that considers the Plücker coordinates associated with the kernel variables, i.e. the maximal minors of the kernel matrix in the KS modeling. In this paper, we give a complete algebraic explanation of the link between the KS and SM modelings (for any instance). We then show that superdetermined MinRank instances can be seen as easy instances of the SM modeling. In particular, we show that performing computation at the smallest possible degree (the "first degree fall") and the smallest possible number of variables is not always the best strategy. We give complexity estimates of the attack for generic random instances. We apply those results to the DAGS cryptosystem, which was submitted to the first round of the NIST standardization process. We show that the algebraic attack from Barelli and Couvreur (Asiacrypt 2018), improved in Bardet et al. (CBC 2019), is a particular superdetermined MinRank instance. Here, the instances are not generic, but we show that it is possible to analyse the particular instances from DAGS and provide a way to select the optimal parameters (number of shortened positions) to solve a particular instance.
Submitted 2 August, 2022;
originally announced August 2022.
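For readers unfamiliar with the modelings named in the abstract above, one common way to state the MinRank problem and the Kipnis-Shamir system is sketched below. The symbols ($M_i$, $x_i$, $K$, $C$, $r$, $m$, $n$) are notation chosen here for illustration, not taken from the listing, and the systematic form of the kernel matrix is an assumption that holds for generic instances; the paper itself may use a different convention.

```latex
\textbf{MinRank.} Given matrices $M_0, M_1, \dots, M_K \in \mathbb{F}_q^{m \times n}$
and a target rank $r$, find $x_1, \dots, x_K \in \mathbb{F}_q$ such that
\[
  \operatorname{rank}\Bigl( M_0 + \sum_{i=1}^{K} x_i M_i \Bigr) \le r .
\]

\textbf{Kipnis--Shamir modeling.} A matrix of rank at most $r$ has a right kernel
of dimension at least $n - r$. Assuming a kernel basis in systematic form, one solves
\[
  \Bigl( M_0 + \sum_{i=1}^{K} x_i M_i \Bigr)
  \begin{pmatrix} I_{n-r} \\ C \end{pmatrix} = 0_{m \times (n-r)},
  \qquad C \in \mathbb{F}_q^{r \times (n-r)} ,
\]
which gives $m(n-r)$ equations, bilinear in the $K$ variables $x_i$ and the
$r(n-r)$ kernel variables (the entries of $C$).
```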
-
Creating optimal conditions for reproducible data analysis in R with 'fertile'
Authors:
Audrey M. Bertin,
Benjamin S. Baumer
Abstract:
The advancement of scientific knowledge increasingly depends on ensuring that data-driven research is reproducible: that two people with the same data obtain the same results. However, while the necessity of reproducibility is clear, there are significant behavioral and technical challenges that impede its widespread implementation, and no clear consensus on standards of what constitutes reproducibility in published research. We present fertile, an R package that focuses on a series of common mistakes programmers make while conducting data science projects in R, primarily through the RStudio integrated development environment. fertile operates in two modes: proactively (to prevent reproducibility mistakes from happening in the first place), and retroactively (analyzing code that is already written for potential problems). Furthermore, fertile is designed to educate users on why their mistakes are problematic and how to fix them.
Submitted 18 August, 2020;
originally announced August 2020.
-
Practical Algebraic Attack on DAGS
Authors:
Magali Bardet,
Manon Bertin,
Alain Couvreur,
Ayoub Otmani
Abstract:
The DAGS scheme is a key encapsulation mechanism (KEM) based on quasi-dyadic alternant codes that was submitted to the NIST standardization process for a quantum-resistant public key algorithm. Recently, an algebraic attack was devised by Barelli and Couvreur (Asiacrypt 2018) that efficiently recovers the private key. It shows that DAGS can be totally cryptanalysed by solving a system of bilinear polynomial equations. However, some sets of DAGS parameters were not broken in practice. In this paper we improve the algebraic attack by showing that the original approach was not optimal in terms of the ratio of the number of equations to the number of variables. Contrary to the common belief that reducing the number of variables in a polynomial system at any cost is always beneficial, we actually observed that, provided that the ratio is increased and up to a threshold, the solving can be heavily improved by adding variables to the polynomial system. This enables us to recover the private keys in a few seconds. Furthermore, our experiments also show that the maximum degree reached during the computation of the Gröbner basis is an important parameter that explains the efficiency of the attack. Finally, the authors of DAGS updated the parameters to take into account the algebraic cryptanalysis of Barelli and Couvreur. In the present article, we propose a hybrid approach that performs an exhaustive search on some variables and computes a Gröbner basis on the polynomial system involving the remaining variables. We then show that the updated set of parameters corresponding to 128-bit security can be broken with 2^83 operations.
Submitted 9 May, 2019;
originally announced May 2019.
-
Bridging the Generalization Gap: Training Robust Models on Confounded Biological Data
Authors:
Tzu-Yu Liu,
Ajay Kannan,
Adam Drake,
Marvin Bertin,
Nathan Wan
Abstract:
Statistical learning on biological data can be challenging due to confounding variables in sample collection and processing. Confounders can cause models to generalize poorly and result in inaccurate prediction performance metrics if models are not validated thoroughly. In this paper, we propose methods to control for confounding factors and further improve prediction performance. We introduce OrthoNormal basis construction In cOnfounding factor Normalization (ONION) to remove confounding covariates and use the Domain-Adversarial Neural Network (DANN) to penalize models for encoding confounder information. We apply the proposed methods to simulated and empirical patient data and show significant improvements in generalization.
Submitted 11 December, 2018;
originally announced December 2018.
-
On the Composition of Scientific Abstracts
Authors:
Iana Atanassova,
Marc Bertin,
Vincent Larivière
Abstract:
Scientific abstracts contain what is considered by the author(s) as information that best describes documents' content. They represent a compressed view of the informational content of a document and allow readers to evaluate the relevance of the document to a particular information need. However, little is known about their composition. This paper contributes to the understanding of the structure of abstracts by comparing similarity between scientific abstracts and the text content of research articles. More specifically, using sentence-based similarity metrics, we quantify the phenomenon of text re-use in abstracts and examine the positions of the sentences that are similar to sentences in abstracts in the IMRaD structure (Introduction, Methods, Results and Discussion), using a corpus of over 85,000 research articles published in the seven PLOS journals. We provide evidence that 84% of abstracts have at least one sentence in common with the body of the article. Our results also show that the sections of the paper from which abstract sentences are taken are invariant across the PLOS journals, with sentences mainly coming from the beginning of the introduction and the end of the conclusion.
Submitted 9 April, 2016;
originally announced April 2016.
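The sentence-level comparison described in the abstract above can be sketched as follows. The abstract does not say which similarity metrics were used, so Jaccard similarity over word sets, the regex-based sentence splitter, and the 0.8 threshold are all assumptions made here for illustration; the function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Lower-cased set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity between the word sets of two sentences."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def reused_sentences(abstract, body, threshold=0.8):
    """For each abstract sentence, report its best match in the body.

    Returns (abstract_sentence, body_position, score) tuples for matches
    whose score reaches the (assumed) threshold.
    """
    body_sentences = split_sentences(body)
    matches = []
    for a in split_sentences(abstract):
        best_pos, best_score = None, 0.0
        for pos, b in enumerate(body_sentences):
            score = jaccard(a, b)
            if score > best_score:
                best_pos, best_score = pos, score
        if best_score >= threshold:
            matches.append((a, best_pos, best_score))
    return matches


if __name__ == "__main__":
    abstract = "We study text re-use. Results are shown."
    body = "Introduction goes here. We study text re-use. Other things follow."
    for sentence, position, score in reused_sentences(abstract, body):
        print(position, round(score, 2), sentence)
```

Mapping each `body_position` to its enclosing section would then give the kind of positional distribution across the IMRaD structure that the abstract reports.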
-
Editorial for the First Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics
Authors:
Iana Atanassova,
Marc Bertin,
Philipp Mayr
Abstract:
The workshop "Mining Scientific Papers: Computational Linguistics and Bibliometrics" (CLBib 2015), co-located with the 15th International Society of Scientometrics and Informetrics Conference (ISSI 2015), brought together researchers in Bibliometrics and Computational Linguistics in order to study the ways Bibliometrics can benefit from large-scale text analytics and sense mining of scientific papers, thus exploring the interdisciplinarity of Bibliometrics and Natural Language Processing (NLP). The goals of the workshop were to answer questions like: How can we enhance author network analysis and Bibliometrics using data obtained by text analytics? What insights can NLP provide on the structure of scientific writing, on citation networks, and on in-text citation analysis? This workshop is a first step toward fostering reflection on this interdisciplinarity and on the benefits that the two disciplines, Bibliometrics and Natural Language Processing, can derive from it.
Submitted 17 June, 2015;
originally announced June 2015.
-
Mining Scientific Papers for Bibliometrics: a (very) Brief Survey of Methods and Tools
Authors:
Iana Atanassova,
Marc Bertin,
Philipp Mayr
Abstract:
The Open Access movement in scientific publishing and search engines like Google Scholar have made scientific articles more broadly accessible. During the last decade, the availability of scientific papers in full text has become more and more widespread thanks to the growing number of publications on online platforms such as ArXiv and CiteSeer. The efforts to provide articles in machine-readable formats and the rise of Open Access publishing have resulted in a number of standardized formats for scientific papers (such as NLM-JATS, TEI, DocBook). Our aim is to stimulate research at the intersection of Bibliometrics and Computational Linguistics in order to study the ways Bibliometrics can benefit from large-scale text analytics and sense mining of scientific papers, thus exploring the interdisciplinarity of Bibliometrics and Natural Language Processing.
Submitted 6 May, 2015;
originally announced May 2015.