ABSTRACT

This workshop is part of the CNPq (Brazil)-Inria (France) collaborative project, which involves Brazilian and French researchers in the fields of computational science and scientific computing. The general objective of the workshop is to set up a Brazil-France collaborative effort for taking full advantage of future high-performance massively parallel architectures in the context of very large-scale datasets and numerical simulations. To this end, the workshop proposes multidisciplinary lectures, ranging from exploiting massively parallel architectures with high-performance programming languages, software components, and libraries, to devising numerical schemes and scalable solvers for systems of differential equations.

SUMMARY OF THE PROJECT

The prevalence of modern multicore technologies has made massively parallel computing ubiquitous and offers a huge theoretical potential for solving a multitude of scientific and technological challenges. Nevertheless, most applications and algorithms are not yet ready to utilize available architecture capabilities. Developing large-scale scientific computing tools that efficiently exploit these capabilities will be even more challenging with future exascale systems. To this end, a multi-disciplinary approach is required to tackle the obstacles in manycore computing, with contributions from computer science, applied mathematics, and engineering disciplines.

Such is the framework of the CNPq-Inria collaborative project, which involves Brazilian and French researchers in the fields of computational science and scientific computing. The general objective of the project is to set up a Brazil-France collaborative effort for taking full advantage of future high-performance massively parallel architectures in the context of very large-scale datasets and numerical simulations. To this end, the project has a multidisciplinary team: computer scientists who aim at exploiting massively parallel architectures with high-performance programming languages, software components, and libraries, and numerical mathematicians who aim at devising numerical schemes and scalable solvers for systems of Partial Differential Equations (PDEs). The driving applications are related to scientific questions of great importance to society in the following four areas: (i) Resource Prospection, (ii) Reservoir Simulation, (iii) Cardiovascular System, and (iv) Astronomy.

The researchers are organized into three fundamental groups within this project: (i) Numerical schemes for PDE models; (ii) Scientific data management; (iii) High-performance software systems.

Beyond its research goals, the project aims at making the scientific results it produces available to the Brazilian and French scientific communities as well as to graduate students, and at establishing long-term collaborations beyond the current project. To this end, another objective is the integration of these scientific results within a common, user-friendly computational platform deployed over the partners' HPC facilities and tailored to the four aforementioned applications.

PRACTICAL INFORMATION

The Fourth Brazil-France workshop will take place at the Serra Azul Hotel in the pleasant city of Gramado, in the state of Rio Grande do Sul, Brazil. Gramado is about 100 km from Porto Alegre's international airport, Salgado Filho. The workshop will be organized by the Computer Science Department of the Federal University of Rio Grande do Sul.

PARTICIPANTS


Brazilian Participants


Resource Prospection

Pedro Dias (LNCC)
Alvaro Coutinho (High Performance Computing Center and Department of Civil Engineering, COPPE/UFRJ)
Renato Elias (High Performance Computing Center and Department of Civil Engineering, COPPE/UFRJ)
Marta Mattoso (Computer Science Department, COPPE/UFRJ)
Philippe Navaux (UFRGS)
Rodrigo Kassik (Ph.D. Student at UFRGS)
Francieli Zanon Boito (Ph.D. Student at UFRGS)
Danilo Costa (Ph.D. Student at the High Performance Computing Center and Department of Civil Engineering, COPPE/UFRJ)
Nicolas Maillard (UFRGS)
Lucas Schnorr (UFRGS)
Flavio Alles (MSc. Student at UFRGS)
Adiel Seffrin (MSc. Student at UFRGS)
Marcio Castro (UFSC)
Rafael Keller Tesser (Ph.D. Student at UFRGS)
Victor Eduardo Martinez Abaunza (Ph.D. Student at UFRGS)




Reservoir Simulation

Alexandre Madureira (LNCC)
Frédéric Valentin (LNCC)
Diego Paredes (Universidad Catolica de Valparaiso, Chile)
Felipe Horta (MSc student at Computer Science Department, COPPE/UFRJ)
Vitor Silva (Ph.D. Student at Computer Science Department, COPPE/UFRJ)
Flavio Costa (Ph.D. Student at Computer Science Department, COPPE/UFRJ)




Cardiovascular System

Antonio Tadeu Gomes (LNCC)
João Marcelo Uchôa (UFC)
Felipe Maciel Anderson (CENAPAD-UFC)
André Novotny (LNCC)
Lucas Mueller (LNCC)




Astronomy

Vinicius Freire (Ph.D. student at UFC)
Fabio Porto (LNCC)
Amir Khatibi Mogadan (Ph.D. student at LNCC)
Hermano Lustosa (MSc student at LNCC)
Daniel Gaspar (Ph.D. student at LNCC)




Guests

Rodrigo Gandra (PROMOB-3D, PETROBRAS/CENPES/PDGEO)





French Participants



Researchers

Reza Akbarinia, Inria Sophia Antipolis - Méditerranée, Zenith project-team
Konstantin Brenner, Inria Sophia Antipolis - Méditerranée, Coffee project-team
Luc Giraud, Inria Bordeaux - Sud-Ouest, Hiepacs project-team
Cédric Lachat, Inria Bordeaux - Sud-Ouest, Bacchus project-team
Stéphane Lanteri, Inria Sophia Antipolis - Méditerranée, Nachos project-team
Raphaël Léger, Inria Sophia Antipolis - Méditerranée, Nachos project-team
Miguel Liroz Gistau, Inria Sophia Antipolis - Méditerranée, Zenith project-team
Pierre Ramet, Inria Bordeaux - Sud-Ouest, Hiepacs project-team




Representatives from Inria's Management Teams

Tania Castro, International Relations Department
Hélène Kirchner, Director of International Relations Department
Jean Roman, Director of the Inria Bordeaux - Sud-Ouest research centre




Guests

Jean-François Méhaut, NANOSIM team, Joseph Fourier University, CEA and Laboratoire d'Informatique de Grenoble




PROGRAM

Thursday 18/09




LIST OF ABSTRACTS

One of the goals of the Mont-Blanc project is to use real HPC applications to evaluate the feasibility of building exascale architectures out of off-the-shelf hardware commonly used in the embedded world. The porting of those applications is thus of paramount importance for the project. But, while getting scientific applications to run on the target platform is not very difficult, obtaining good performance portability is challenging. Indeed, HPC software is often hand-tuned for the most frequently encountered architectures, and those optimizations can prove harmful when applied to a very different architecture. One way to alleviate this problem is to use task-based runtimes to obtain applications that are adaptive from a load-balancing and network point of view. Unfortunately, this solves only part of the problem. Individual tasks also have to be optimized, and they can be very sensitive to many parameters that are often not clearly exposed in the source code. We thus propose BOAST, a meta-programming tool aimed at generating parametrized source code. Several output languages are supported, and an expressive DSL is defined to help express the optimizations. An integrated compilation and execution framework is also supplied, which allows the generated kernels to be tested directly inside BOAST. This talk will present BOAST and how we used it to port parts of two HPC applications:
- the Daubechies wavelet kernels of BigDFT, a quantum physics code that computes the electronic density around atoms and molecules;
- a port from CUDA to OpenCL of SPECFEM3D-GLOBE, a wave propagation code based on spectral finite element methods.
Performance results will also be presented.
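The code-generation idea can be sketched in miniature (in Python rather than BOAST's actual Ruby-based DSL; all names here are illustrative): a tuning parameter drives the source that gets generated, "compiled", and tested in one place.

```python
# Hypothetical sketch of BOAST's idea: generate kernel source from a
# tuning parameter, compile it (here: exec), and test each variant
# inside the same framework.

def generate_saxpy(unroll):
    """Emit Python source for y[i] += a*x[i], unrolled by `unroll`."""
    body = "\n".join(
        f"        y[i + {k}] += a * x[i + {k}]" for k in range(unroll)
    )
    return (
        "def saxpy(a, x, y):\n"
        f"    for i in range(0, len(x) - len(x) % {unroll}, {unroll}):\n"
        f"{body}\n"
        f"    for i in range(len(x) - len(x) % {unroll}, len(x)):\n"
        "        y[i] += a * x[i]\n"
    )

def build(unroll):
    ns = {}
    exec(generate_saxpy(unroll), ns)   # "compile" the generated variant
    return ns["saxpy"]

x = [1.0] * 10
y = [2.0] * 10
build(4)(3.0, x, y)      # run the unroll-by-4 variant
print(y[0], y[9])        # -> 5.0 5.0
```

A real autotuner would enumerate such parameters (unroll factor, vector width, target language) and time each generated variant.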


Among other research subjects, the Coffee team of Inria Sophia Antipolis - Méditerranée develops an expertise in the numerical analysis and modeling of complex flows in porous media. The presentation will begin with a short overview of the recent activities of our team in this field. We will next focus on a particular hybrid-dimensional model of two-phase flow in fractured porous media, which represents the fractures as highly conductive media of co-dimension one. We will present the Vertex Approximate Gradient (VAG) discretization of the problem, which may be written in the form of a conservative nodal scheme. Compared to the more classical Control Volume Finite Element approach, the VAG discretization avoids mixing heterogeneities within a single control volume. As a result, it is well suited to the highly contrasted flow rates which arise in fractured porous media.


MapReduce has established itself as one of the most popular alternatives for big data processing due to the simplicity of its programming model and its automatic management of parallel execution on clusters of machines. However, the original proposal has several limitations that have been pointed out and addressed in the literature, including the lack of database optimizations such as indexing or columnar data layouts, load-balancing problems, and the overhead of big data transfers. In this presentation, we first discuss some of the limitations of MapReduce. Then, we focus on optimizing data transfer between machines during the execution of MapReduce jobs by employing data partitioning techniques.
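The role of partitioning can be sketched in a toy word-count job (an illustrative sketch, not the system discussed in the talk): each map output record is shipped to exactly one reducer, so the partitioning function controls which data crosses machines during the shuffle.

```python
import zlib
from collections import defaultdict

# Toy MapReduce word count with an explicit partitioner. Placing related
# keys in the same partition is what controls cross-machine transfer.

def map_phase(chunk):
    return [(word, 1) for word in chunk.split()]

def partition(key, n_reducers):
    # deterministic hash partitioner (Python's built-in hash() is salted)
    return zlib.crc32(key.encode()) % n_reducers

def reduce_phase(pairs):
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

def run_job(chunks, n_reducers=2):
    shuffled = [[] for _ in range(n_reducers)]   # one bucket per reducer
    for chunk in chunks:                          # "map + shuffle"
        for key, value in map_phase(chunk):
            shuffled[partition(key, n_reducers)].append((key, value))
    result = {}
    for bucket in shuffled:                       # "reduce"
        result.update(reduce_phase(bucket))
    return result

print(run_job(["a b a", "b c"]))   # {'a': 2, 'b': 2, 'c': 1} (key order may vary)
```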


Mining Probabilistic Frequent Itemsets (PFI) is very important for many applications, particularly for scientific applications that need to deal with probabilistic data. The problem is challenging since algorithms designed for deterministic data are not applicable to probabilistic data. The problem is even more difficult for probabilistic data streams, where massive numbers of frequent updates need to be taken into account while respecting data stream constraints. In this talk, we present FEMP (Fast and Exact Mining of Probabilistic data streams), the first solution for exact PFI mining in data streams with sliding windows. FEMP allows updating the frequentness probability of an itemset whenever a transaction is added to or removed from the observation window. Using these update operations, we are able to extract PFI in sliding windows with very low response times. Furthermore, our method is exact, meaning that we are able to discover the exact probabilistic frequentness distribution function for any monitored itemset, at any time. We implemented FEMP and conducted an extensive experimental evaluation over synthetic and real-world data sets; the results illustrate its very good performance.
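As a sketch of the quantity being maintained (under the common assumption that transactions are independent; this is not the FEMP algorithm itself), the frequentness probability of one itemset can be updated exactly as transactions enter and leave a sliding window:

```python
# Transaction t contains the itemset with probability p_t, so the support
# count follows a Poisson-binomial distribution, which can be updated by
# convolution (add) and deconvolution (remove).

def add_transaction(dist, p):
    """Convolve the support distribution with one Bernoulli(p) transaction."""
    new = [0.0] * (len(dist) + 1)
    for k, q in enumerate(dist):
        new[k] += q * (1.0 - p)
        new[k + 1] += q * p
    return new

def remove_transaction(dist, p):
    """Inverse of add_transaction, for sliding a window (requires p < 1)."""
    new = [0.0] * (len(dist) - 1)
    for k in range(len(new)):
        prev = new[k - 1] if k else 0.0
        new[k] = (dist[k] - prev * p) / (1.0 - p)
    return new

def frequentness(dist, minsup):
    """P(support >= minsup)."""
    return sum(dist[minsup:])

dist = [1.0]                       # empty window: support is 0
for p in (0.9, 0.5, 0.5):          # three probabilistic transactions
    dist = add_transaction(dist, p)
print(round(frequentness(dist, 2), 4))   # -> 0.7
dist = remove_transaction(dist, 0.9)     # slide the window
print(round(frequentness(dist, 1), 4))   # -> 0.75
```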


Nanophotonics is a branch of optical engineering concerned with the behavior of light at the nanometer scale in interaction with sub-wavelength particles or devices. Because of its numerous scientific and technological applications (e.g. in telecommunication, energy production, and biomedicine), nanophotonics currently represents an active field of research, increasingly relying on computer simulation alongside experimental studies. The numerical study of electromagnetic wave propagation in interaction with nanometer-scale structures generally relies on the solution of the system of time-domain Maxwell equations, taking into account an appropriate physical dispersion model, such as the Drude or Drude-Lorentz models, for characterizing the material properties of the involved nanostructures at optical frequencies. We present a discontinuous finite element time-domain solver for the computer simulation of the interaction of light with nanometer-scale structures. The method relies on a compact-stencil high-order interpolation of the electromagnetic field components within each cell of an unstructured tetrahedral mesh. This piecewise polynomial numerical approximation is allowed to be discontinuous from one mesh cell to another, and the consistency of the global approximation is obtained thanks to the definition of appropriate numerical traces of the fields on a face shared by two neighboring cells. Time integration is achieved using an explicit scheme, and no global mass matrix inversion is required to advance the solution at each time step. Moreover, the resulting time-domain solver is particularly well adapted to parallel computing.


Hybrid architectures are commonly supported by Graphics Processing Units (GPUs) from NVIDIA. Since the Fermi architecture, GPUs support concurrent kernel execution, where different kernels of the same application context can execute on the GPU at the same time; since the Kepler architecture, they also implement dynamic parallelism, spawning new threads that adapt to the data without going back to the host CPU. Ondes3D is a seismic model implemented in CUDA that solves a fourth-order spatial operator: the thread that handles the computation of a grid point needs to know the fields (and therefore access the arrays) at that point and at 12 neighboring points, which means 13 memory accesses. In this work, I analyze the load on a GPU cluster to find which factors affect performance when the number of GPUs is increased.
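The 13-point access pattern described above can be made concrete with a small sketch (the coefficients are the standard fourth-order second-derivative weights; the grid and indexing are illustrative, not Ondes3D's actual code): a fourth-order 3D stencil reads the centre point plus two neighbours on each side along each axis.

```python
# Fourth-order 3D Laplacian stencil: 1 centre + 12 neighbours = 13 points.
# Weights for d2/dx2 at fourth order: -1/12, 4/3, -5/2, 4/3, -1/12 (h = 1).

C = [-1.0 / 12, 4.0 / 3, -5.0 / 2, 4.0 / 3, -1.0 / 12]

def laplacian_4th(u, i, j, k):
    acc = 0.0
    for off, c in zip((-2, -1, 0, 1, 2), C):
        acc += c * u[i + off][j][k]      # x-axis: 4 neighbours + centre
        acc += c * u[i][j + off][k]      # y-axis: 4 neighbours
        acc += c * u[i][j][k + off]      # z-axis: 4 neighbours
    return acc                           # 13 distinct points read

# u = x^2 + y^2 + z^2 has Laplacian 6 everywhere, and the fourth-order
# stencil is exact on polynomials of this degree.
n = 8
u = [[[float(x * x + y * y + z * z) for z in range(n)]
      for y in range(n)]
     for x in range(n)]
print(round(laplacian_4th(u, 4, 4, 4), 6))   # -> 6.0
```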


Seismic wave simulations are used to predict the consequences of future earthquakes. The modeling of the propagation of these waves is based on a set of elastodynamics equations, usually solved with the Finite-Difference Time-Domain method. When modeling a restricted region, as the physical phenomenon is unbounded, we also need absorbing boundary conditions. These conditions model the absorption of the energy that propagates outside the simulated region. To parallelize this simulation, we usually decompose the domain into smaller sub-domains. This gives rise to load imbalance, because sub-domains on the borders have larger absorbing boundary condition regions than those in the center. Besides that, load imbalance can also arise from variations in the composition of the geological layers and from the propagation of the wave itself. As a way to mitigate this load imbalance, we propose the use of dynamic load balancing. To evaluate the effectiveness of this solution, we ported a seismic wave propagation simulator to AMPI. This way, we were able to use the load balancing framework provided by the Charm++ runtime system. We evaluated the performance of the application with different load balancers, including a set of topology-aware load balancers. Our results show performance gains of up to 23.85% on a small system (64 cores), and up to 36.41% on a larger one (288 cores).
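The border effect described above can be quantified with the usual imbalance metric max/mean - 1 (the loads below are made-up numbers, not measurements from the simulator): sub-domains at the edges carry the extra absorbing-layer work, so the whole run waits for them.

```python
# max/mean - 1 is a common load-imbalance measure: 0 means perfectly
# balanced, 0.13 means the slowest sub-domain is 13% above the average.

def imbalance(loads):
    mean = sum(loads) / len(loads)
    return max(loads) / mean - 1.0

# 4 sub-domains in a row: the two outer ones carry extra boundary work
interior, boundary_extra = 100.0, 30.0
loads = [interior + boundary_extra, interior, interior, interior + boundary_extra]
print(round(imbalance(loads), 3))   # -> 0.13
```

Dynamic load balancers such as those in Charm++ migrate work to drive this metric toward zero at runtime.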


Flood events in urban areas are becoming more frequent as a consequence of several factors, such as population growth, climate change (which has magnified the intensity of rainfall), the rise of sea levels which threatens coastal areas, and decaying or poor infrastructure. These flood events can leave entire cities with houses underwater and cause severe damage to the inhabitants. To assist in the prediction of flood events, the scientific community has developed environmental models to simulate several processes, including meteorology, seismic activity, and hydrology. Hydrological modeling is a valuable tool to avoid the huge consequences of disasters: with an efficient model, it is possible to simulate future damage in advance and take action before the event. To make this possible, fast and efficient models are needed. With the advance of technology and the improvement of computer architectures, modeling is today hundreds of times faster than forty years ago. The use of multicore systems and stream processors such as GPUs has enabled some hydrological models to simulate in a few minutes events that used to take hours or days with traditional serial CPU codes, with the same precision and reliability of results, and without a high cost of rewriting the model code. Here, we explore some programming paradigms and present a comparison of GPU and serial implementations of the kernel of models based on the quasi-linearized one-dimensional Saint-Venant (shallow water) equations. This comparison is used to analyze the behavior of the new features of current GPGPU cards in those models, their benefits, and the questions that need to be handled in the model implementation.
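As an illustration of the kind of kernel being compared (a minimal sketch, not the models' actual code), a first-order Lax-Friedrichs step for the 1D shallow-water equations on a flat bed looks like this; each grid point depends only on its neighbours, which is exactly what makes the loop body a good GPU kernel.

```python
# 1D shallow water (Saint-Venant), conservative variables (h, hu),
# flat bed, no friction. Grid spacing and time step are illustrative.

g = 9.81

def flux(h, hu):
    u = hu / h
    return hu, hu * u + 0.5 * g * h * h

def step(h, hu, dx, dt):
    n = len(h)
    new_h, new_hu = h[:], hu[:]
    for i in range(1, n - 1):            # embarrassingly parallel loop
        fl = flux(h[i - 1], hu[i - 1])
        fr = flux(h[i + 1], hu[i + 1])
        new_h[i] = 0.5 * (h[i - 1] + h[i + 1]) - 0.5 * dt / dx * (fr[0] - fl[0])
        new_hu[i] = 0.5 * (hu[i - 1] + hu[i + 1]) - 0.5 * dt / dx * (fr[1] - fl[1])
    return new_h, new_hu

# dam-break initial condition: deep water on the left
h = [2.0] * 20 + [1.0] * 20
hu = [0.0] * 40
for _ in range(10):
    h, hu = step(h, hu, dx=1.0, dt=0.05)
print(1.0 < h[20] < 2.0)   # -> True: the front has started to move right
```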


Large-scale simulation of seismic wave propagation is an active research topic. Its high demand for processing power makes it a good match for High Performance Computing (HPC). Although we have observed a steady increase in the processing capabilities of HPC platforms, their energy efficiency is still lagging behind. In this talk, I will discuss the use of a low-power manycore processor, the MPPA-256, for seismic wave propagation simulations. First, I present its peculiar characteristics, such as the limited amount of on-chip memory, and describe the intricate solution we brought forth to deal with this processor's idiosyncrasies. Next, I discuss the performance and energy efficiency results obtained on MPPA-256 and on other commonplace platforms such as general-purpose processors and a GPU. The results show that even if MPPA-256 presents an increased software development complexity, it can indeed be used as an energy-efficient alternative to current HPC platforms, consuming up to 71% less energy than a GPU and 5.18x less than a general-purpose processor.


In this talk we consider multiphysics applications that our group has been working on in recent years, from algorithmic, software, and hardware perspectives. These encompass engineering applications such as extreme wave-structure interaction phenomena in floating structures, and coupled multi-material flow with heat transfer and polydisperse mixtures, the latter with applications in geology. We also discuss stochastic forecasting, quantifying the propagation of uncertainty in initial conditions. We will show that these different applications present several algorithmic similarities, which lead to common implementation approaches on modern high-performance computers. Topics such as parallel linear and nonlinear equation solving, mesh generation, and visualization will also be addressed.


This study examines the changes that the finite-difference order and weights may cause in the memory demands and computational costs of the reverse time migration kernel. Orders from fourth to sixteenth were considered, for both Taylor-based derivative weights and weights optimized with the Scalar Binomial Window. Moreover, the performance effects of stencil optimizations on the Intel Xeon Phi and Xeon E5 architectures were also considered. Ultimately, the results attained showed that optimized high-order weights provided savings of up to 70% of the computational cost and 90% of the memory demand.
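Where the Taylor-based weights come from can be sketched by solving the moment conditions of a symmetric stencil exactly (this reproduces the classical coefficients; the optimized windowed weights discussed in the talk would modify this construction):

```python
from fractions import Fraction

# Matching Taylor expansions on a symmetric stencil of half-width w gives
# a linear system for second-derivative weights: sum_j c_j * off_j^m equals
# 2 for m = 2 and 0 for every other m up to the stencil size.

def taylor_weights_d2(half_width):
    offs = list(range(-half_width, half_width + 1))
    n = len(offs)
    A = [[Fraction(o) ** m for o in offs] for m in range(n)]
    b = [Fraction(2) if m == 2 else Fraction(0) for m in range(n)]
    # Gauss-Jordan elimination with exact rationals
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
                b[r] = b[r] - f * b[col]
    return [b[r] / A[r][r] for r in range(n)]

# fourth-order weights: -1/12, 4/3, -5/2, 4/3, -1/12
print(taylor_weights_d2(2))
```

Raising the order widens the stencil, which is exactly what drives the memory-demand growth the study measures.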


Numerical simulations, which aim at providing reliable information, are typically hard to manage and subject to different sources of uncertainty. Uncertainty Quantification (UQ) can be used to measure the reliability of experiments that run simulations involving complex numerical models. UQ enables the estimation of confidence intervals in predictions by stressing the numerical model over the variability of the uncertain inputs, which leads to very large data exploration. The choice of which slice of the input parameter space to explore in a UQ workflow affects the amount of data produced, which can be very large. Scientists usually start the UQ workflow with a modest configuration. If the outcome falls below a given quality criterion, they change the data and run the workflow again. These quality criteria are complex to evaluate and are sensitive to the runtime characteristics of the dataflow, making them impossible to specify precisely in advance. Thus, scientists resubmit the execution of a UQ workflow, changing the input configuration until the result meets their expectations. Such manual iteration makes it hard to manage the evolution of the scientific data analysis. We will present our algebraic, data-centric iteration approach to support the execution of UQ in parallel simulations, with iterative constructs and user steering. Results on UQ show impressive execution time savings, from 2.5 to 24 days, compared to non-iterative workflow execution.
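The iterate-until-quality-met loop can be sketched with a plain Monte Carlo study, where the quality criterion is the half-width of a 95% confidence interval (the `model` function is a hypothetical stand-in for a real simulation, not the approach presented in the talk):

```python
import random
import statistics

# Keep enlarging the sample until the 95% confidence-interval half-width
# of the estimated mean falls under a tolerance; this mirrors the
# resubmit-until-quality-met loop described above.

def model(x):
    return x * x          # placeholder for an expensive simulation run

def uq_loop(tol, batch=200, seed=42):
    rng = random.Random(seed)
    samples = []
    while True:
        samples += [model(rng.gauss(0.0, 1.0)) for _ in range(batch)]
        mean = statistics.fmean(samples)
        half = 1.96 * statistics.stdev(samples) / len(samples) ** 0.5
        if half < tol:    # quality criterion met: stop iterating
            return mean, half, len(samples)

mean, half, n = uq_loop(tol=0.05)
print(f"{n} runs, mean {mean:.3f} +/- {half:.3f}")
```

The point of the iterative, data-centric approach is that such loops become part of the workflow itself, steerable at runtime, instead of manual resubmissions.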


The maturity reached by current models of the cardiovascular system poses new challenges from the point of view of applicability and the exploration of model capabilities. Relevant examples are: parameter identification, uncertainty quantification and sensitivity analysis, patient-specific modeling, and even model improvement by coupling with other physiological systems (e.g. control mechanisms). All these challenges have in common the need for either significantly long (in time) simulations or very many simulation instances running in parallel. As a consequence, there is a need for the development of new numerical methods, which should be accurate, robust, and computationally efficient, with parallelism and scalability being of the utmost importance in order to make proper use of high-performance computing resources. The above-mentioned challenges involve dealing with thousands of simulations, which makes data management a crucial aspect. In this talk we will present the state of the art in computational models for the simulation of the systemic interactions taking place in the cardiovascular system at different levels of vascular organization, as well as open problems and strategies to tackle some of them by resorting to large-scale scientific computing.


The inverse gravimetry problem consists of determining the mass density distribution in a certain geometrical domain from partial boundary measurements of the exterior gravity field. There are two main applications of the inverse gravimetry problem:

- Reconstruction of the mass density distribution in the whole Earth, with the purpose of obtaining information on the dynamical processes taking place in the interior of the Earth.
- Reconstruction of the mass density distribution of certain small regions of the Earth, located close to Earth's surface.
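For reference, the forward problem underlying both settings can be stated in standard notation (this generic formulation is ours, not quoted from the references): the Newtonian potential of a density $\rho$ supported in a domain $\Omega$ and the associated field are

```latex
u(x) \;=\; G \int_{\Omega} \frac{\rho(y)}{\lvert x - y \rvert}\,\mathrm{d}y,
\qquad
g(x) \;=\; \nabla u(x),
\qquad
\Delta u \;=\; -\,4\pi G \rho \ \text{ in } \mathbb{R}^3,
```

and the inverse problem is to recover $\rho$, or its support, from $g$ measured on an accessible part of the boundary.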

Both reconstruction problems are known to be severely ill-posed [2]. In fact, the first problem violates each of Hadamard's criteria for a well-posed problem [6]. To deal with the problem of uniqueness, strong assumptions, which usually do not hold, are made on the class of measures to be reconstructed. For instance, it is well known that uniqueness holds in the class of measures corresponding to single star-shaped domains. The stability of the reconstruction can be obtained by introducing a regularization of the inverse operator, as in the Tikhonov or total variation regularizations, but numerical difficulties usually prevent the reconstruction algorithms from attaining good resolution. Additional practical issues, such as partial measurement, render the second reconstruction problem even more difficult.

On the other hand, the gravitational field can be measured with high precision, and it is the most stationary and stable of all known physical fields of the Earth. In addition, potentially important applications have fueled research into new numerical reconstruction algorithms [3]. In this work we deal with the second application of the inverse gravimetry problem. More specifically, we consider the two- and three-dimensional inverse gravimetry problems with partial measurement. To deal with these problems we follow the ideas presented in [1]. In that paper, the inverse problem is reformulated as a topology optimization problem, where the support of the measure is the unknown variable. The Kohn-Vogelius functional, which measures the misfit between the solutions of two auxiliary problems [5], one containing information on the boundary measurement and the other containing information on the boundary excitation, is minimized over a class of measures consisting of a certain finite number of ball-shaped anomalies. The resulting topology optimization algorithm is based on the total variation of the Kohn-Vogelius functional. Although very simple in conception and implementation, this algorithm has been shown to be effective in the solution of the two-dimensional inverse potential problem with total boundary measurement.

In this work we extend the ideas in [1] to cover the two- and three-dimensional cases with partial boundary measurement. The additional difficulty that arises from the partial boundary measurement is directly addressed by modifying the auxiliary problems considered in [1], where the Newtonian potential is used to complement the unavailable information on the hidden boundary. In contrast to existing approaches [4], the proposed method does not need a numerical continuation approach to obtain fictitious boundary measurements of the potential. In addition, the resulting reconstruction algorithm is non-iterative and very robust with respect to noisy data. Finally, some numerical results are presented in order to show the effectiveness of the devised reconstruction algorithm.

References

[1] A. Canelas, A. Laurain and A.A. Novotny. A new reconstruction method for the inverse potential problem. Journal of Computational Physics, 268:417-431, 2014.
[2] V. Isakov. Inverse source problems, volume 34 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 1990.
[3] V. Isakov. Inverse problems for partial differential equations. Springer, New York, 1998.
[4] V. Isakov, S. Leung and J. Qian. A fast local level set method for inverse gravimetry. Commun. Comput. Phys., 10(4):1044-1070, 2011.
[5] R. Kohn and M. Vogelius. Determining conductivity by boundary measurements. Comm. Pure Appl. Math., 37(3):289-298, 1984.
[6] V. Michel. Regularized wavelet-based multiresolution recovery of the harmonic mass density distribution from data of the Earth's gravitational field at satellite height. Inverse Problems, 21(3):997-1025, 2005.


This work proposes a Multiscale Hybrid-Mixed (MHM) method for the Maxwell equations in the time domain. The MHM method is a consequence of a hybridization procedure, and emerges as a method that naturally incorporates multiple scales while providing solutions with high-order precision. The computation of the local problems is embedded in the upscaling procedure; these problems are completely independent and thus may be naturally solved using parallel computing facilities. We present some preliminary results obtained within the collaboration with the NACHOS project-team at Inria Sophia Antipolis, France. This may be seen as the first version of the MHM method for time-dependent problems. As such, we study the impact of choosing implicit or explicit time schemes within the MHM framework. The presentation of the results is split into two parts. In the first one, we motivate the work and propose a new theoretical framework in which the MHM method is built. Preliminary theoretical results are also addressed. The second part of this presentation will be dedicated to the numerical validation of the MHM method. We conclude that the MHM method is naturally shaped to be used in parallel computing environments and appears to be a highly competitive option for handling realistic multiscale hyperbolic boundary value problems with precision on coarse meshes.

References

[1] C. Harder, D. Paredes and F. Valentin. A family of multiscale hybrid-mixed finite element methods for the Darcy equation with rough coefficients. Journal of Computational Physics, Vol. 245, pp. 107-130, 2013.
[2] R. Araya, C. Harder, D. Paredes and F. Valentin. Multiscale hybrid-mixed method. SIAM Journal on Numerical Analysis, Vol. 51, No. 6, pp. 3505-3531, 2013.


This work proposes a Multiscale Hybrid-Mixed (MHM) method for the Maxwell equations in the time domain. The MHM method is a consequence of a hybridization procedure, and emerges as a method that naturally incorporates multiple scales while providing solutions with high-order precision. The computation of the local problems is embedded in the upscaling procedure; these problems are completely independent and thus may be naturally solved using parallel computing facilities.

We present some preliminary results obtained within the collaboration with the NACHOS project-team at Inria Sophia Antipolis, France. This may be seen as the first version of the MHM method for time-dependent problems. As a result, we study the impact of choosing implicit or explicit time schemes within the MHM framework.

The presentation of the results is divided into two parts. In this second part, we present an extensive numerical validation of the MHM method for the two-dimensional Maxwell equations in the time domain (the TM model), adopting different time discretization schemes. Numerical comparisons with the Discontinuous Galerkin method are also addressed.

We conclude that the MHM method is naturally shaped to be used in parallel computing environments and appears to be a highly competitive option for handling realistic multiscale hyperbolic boundary value problems with precision on coarse meshes.

References

[1] C. Harder, D. Paredes and F. Valentin. A family of multiscale hybrid-mixed finite element methods for the Darcy equation with rough coefficients. Journal of Computational Physics, Vol. 245, pp. 107-130, 2013.
[2] R. Araya, C. Harder, D. Paredes and F. Valentin. Multiscale hybrid-mixed method. SIAM Journal on Numerical Analysis, Vol. 51, No. 6, pp. 3505-3531, 2013.


In this talk I'll present some applications of high performance computing to neuroscience that are under development. The problems under consideration involve incorporating detailed aspects of neuronal physiology. The resulting models are given by nonlinear, time dependent partial differential equations that are discretized by numerical methods of multiscale type. Applications include inverse problems and stochastic modeling.


With the emergence of large-scale parallel architectures (peta- and exascale), it is important to ensure the scalability of the applications that will run on them, eliminating bottlenecks that adversely affect their performance. Applications characterized by handling large amounts of data (Big Data), such as weather and climate applications, geophysical simulations, and others, are highly dependent on the performance of I/O operations, which makes data handling a bottleneck. This talk will present and characterize some challenges of I/O operations and introduce recent strategies employed by our research team at GPPD/UFRGS in order to improve the scalability of data management.


Considering the scale to which parallel systems are headed, and the scale these systems have already reached today, performance analysis must scale along with them in order to remain informative and relevant for parallel application programmers. Scaling to the necessary levels is only possible through the automation of analysis. Automatic performance analysis of parallel applications consists in detecting performance issues and patterns in a completely automated fashion, that is, with no input from an expert human analyst. Thus, the amount of information that can be parsed and analyzed is no longer limited by the ability of human beings to assess and comprehend the collected data. With automated analysis, the available computational resources, which are far more powerful than what human beings possess, dictate the amount of data collected and the analysis that can be performed on it. Automatic performance analysis is therefore a crucial technique for analyzing current and future parallel applications. In this talk, we will present a survey of tools and techniques for the automatic performance analysis of parallel applications. The roadmap of our research, the use of Machine Learning algorithms and techniques for automatic performance analysis, will also be presented and discussed.


Performance analysis is widely used in high performance computing to better understand application and system behavior. The first step in a performance analysis is to collect behavioral information during application execution. Because of the scale of today's supercomputers, the collected data may easily reach many gigabytes. There are many techniques to analyze all this information, from trace visualization to statistical analysis and outlier detection. They all suffer from different levels of scalability problems. In this talk, we will present our efforts to address the large volume of trace data obtained from parallel applications. These efforts are implemented in PajeNG, a scalable performance analysis framework with associated tools, which will also be presented in this talk.
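One of the statistical techniques mentioned above, in miniature: a robust outlier test (|x - median| > 3 MAD) applied to event durations (the trace below is illustrative, not PajeNG output) flags anomalous events without visualizing the whole trace.

```python
import statistics

# Median absolute deviation (MAD) outlier test: robust to the outliers
# it is trying to find, unlike mean/stddev-based z-scores.

def outliers(durations):
    med = statistics.median(durations)
    mad = statistics.median(abs(d - med) for d in durations)
    return [d for d in durations if abs(d - med) > 3 * mad]

trace = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 47.5, 10.0]   # one slow event
print(outliers(trace))   # -> [47.5]
```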


Computational Modelling is a science wherein scientists conceive mathematical models that strive to reproduce the behavior of a studied phenomenon. By means of simulations, predicted variables are computed, usually over a multidimensional space-time frame. State-of-the-art simulations use a class of software named solvers to solve equations and compute the values of the predicted variables. Moreover, to guide the computation over the physical domain, polygonal meshes pinpoint the spots where values are computed, and the time dimension brings in the system dynamics by indicating successive values for the same mesh spot. Finally, in order to test the model with different parameter sets, the scientist may run thousands of simulations. Despite the huge amount of data produced by such simulations, the process is basically not supported by an efficient data management solution. A typical implementation stores parameters and simulated data in standard files organized in a sort of directory structure, with no support for a high-level query language or distributed query processing.

In this context, this talk investigates the adoption of the Multidimensional Array Data Model over the SciDB DBMS to manage multidimensional numerical simulation data. We model the 3D spatio-temporal dimensions and the simulation as indexes in the Multidimensional Array Data Model, and the predicted variables as values in cells. We present a new strategy to map unstructured 3D spatial meshes into arrays. An orchestrated set of spatial transformations maps the original spatial model into a dense multidimensional array, radically reducing the number of sparse chunks produced by a naive mapping. Our strategy is particularly interesting for large queries that would otherwise retrieve a huge number of sparsely loaded data chunks. We have run a series of experiments over a real case scenario for the simulation of the cardiovascular system, developed at LNCC. We show that for some queries we obtain an improvement in query elapsed time of approximately 25 times, compared to the standard SciDB implementation.
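To illustrate the kind of mapping discussed above, the sketch below bins unstructured 3D mesh points into the cells of a dense regular grid of array indexes. The bounding box, grid resolution, and function names are illustrative assumptions; the orchestrated set of spatial transformations used in our strategy is more elaborate than this single step.

```python
# Hypothetical sketch: mapping unstructured 3D mesh points onto a
# dense regular grid of array cells, in the spirit of loading mesh
# data into a multidimensional array DBMS such as SciDB. The bounding
# box and grid shape below are illustrative assumptions.

def to_array_index(point, bbox_min, bbox_max, grid_shape):
    """Map an (x, y, z) point to integer (i, j, k) cell coordinates."""
    idx = []
    for p, lo, hi, n in zip(point, bbox_min, bbox_max, grid_shape):
        # normalize into [0, 1] and scale to the grid resolution
        t = (p - lo) / (hi - lo)
        idx.append(min(int(t * n), n - 1))  # clamp the upper boundary
    return tuple(idx)

bbox_min, bbox_max = (0.0, 0.0, 0.0), (10.0, 10.0, 10.0)
grid = (4, 4, 4)
print(to_array_index((2.5, 9.9, 0.1), bbox_min, bbox_max, grid))  # (1, 3, 0)
```

Points that fall in the same cell end up in the same array chunk, which is what makes a dense layout possible in the first place.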


Partitioning data from astronomy catalogues is paramount in order to scale processing for continuously increasing datasets. Data partitioning strategies basically adopt one of two approaches, which differ on whether the workload is known in advance. Unfortunately, many of the scientific workflows that evaluate data in catalogues require reading a huge percentage of the data, making workload-based strategies useless. In this context, multidimensional histogram techniques such as equi-depth may be interesting, as they produce equal-size partitions. In this talk, we will present our first results in adapting this technique for the partitioning of astronomy catalogue data.
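The one-dimensional core of the equi-depth idea can be sketched as follows: split points are chosen so that each partition holds roughly the same number of objects, however skewed the coordinate distribution. This is a simplified illustration, not the multidimensional technique presented in the talk.

```python
# Hedged sketch of 1-D equi-depth partitioning: boundaries are chosen
# so that each partition holds (roughly) the same number of objects,
# regardless of how skewed the coordinates are. The talk's technique
# is multidimensional; this shows only the one-dimensional idea.

def equi_depth_boundaries(values, n_parts):
    """Return the n_parts - 1 split points of an equi-depth histogram."""
    xs = sorted(values)
    step = len(xs) / n_parts
    return [xs[int(round(i * step))] for i in range(1, n_parts)]

# Skewed coordinates: dense near zero, sparse above
coords = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 2.0, 7.5]
print(equi_depth_boundaries(coords, 4))  # [0.2, 0.4, 2.0]
```

Note how the split points crowd into the dense region, unlike an equi-width histogram, which would leave most partitions nearly empty.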


Current astronomy surveys present important challenges for the cross-matching of sky catalogues based on spatial positioning. The cross-matching process attempts to identify sky objects registered in different catalogues that correspond to the same real object but whose coordinate position suffered slight displacement among catalogues. Positioning displacement is a known issue due to telescope calibration. Cross-matching among catalogues is usually applied in a pairwise fashion, between two different catalogues, and generates a single output catalogue identifying common merged objects. The algorithm selects matches considering the shortest distance between objects within a spatial radius x defined by the user. However, when we want to compute a matching among three or more catalogues, a more careful process must be applied: matches must not be considered transitively, and the order in which catalogues are chosen may produce different results. In order to tackle these issues we propose NACluster, an unsupervised clustering algorithm for matching sky objects from multiple catalogues under the restriction that each cluster contains only objects from different catalogues. We evaluate NACluster on real and synthetic catalogues and show that its results present better accuracy than state-of-the-art solutions. We are now developing a parallel version of the algorithm.
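The basic pairwise matching step can be sketched as below. This is an illustrative greedy matcher over flat (x, y) coordinates with Euclidean distances, not the NACluster algorithm itself; real catalogues use angular separations on the celestial sphere.

```python
# Illustrative sketch (NOT NACluster): pairwise cross-matching of two
# small catalogues by shortest Euclidean distance within a
# user-defined radius. Coordinates are treated as flat (x, y)
# positions for simplicity.

from math import dist  # Python >= 3.8

def cross_match(cat_a, cat_b, radius):
    """Greedily match each object in cat_a to its nearest unmatched
    neighbour in cat_b, if that neighbour lies within `radius`."""
    matched_b, pairs = set(), []
    for ia, pa in enumerate(cat_a):
        best = min(
            (ib for ib in range(len(cat_b)) if ib not in matched_b),
            key=lambda ib: dist(pa, cat_b[ib]),
            default=None,
        )
        if best is not None and dist(pa, cat_b[best]) <= radius:
            matched_b.add(best)
            pairs.append((ia, best))
    return pairs

cat_a = [(0.00, 0.00), (5.00, 5.00)]
cat_b = [(0.01, 0.01), (9.00, 9.00)]
print(cross_match(cat_a, cat_b, radius=0.1))  # [(0, 0)]
```

Chaining such pairwise matchers transitively across three or more catalogues is exactly what produces order-dependent results, which motivates treating the multi-catalogue case as a constrained clustering problem instead.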


Large-scale scientific computing often relies on intensive tasks chained through a workflow running on a high-performance environment. In this context, scientists need to: model their workflows for later submission to dedicated execution environments; keep track of partial result files, commonly spread over a distributed architecture, for runtime or final analysis; and, usually, steer the experiment execution at particular points while looking for convergence. Such a process, which is complex and laborious, hides potential issues in the interface between scientists, workflow models, experiment data, and the execution environment.

When using a scientific workflow system, provenance data keeps track of every step of the execution. If provenance data can be traversed at runtime, it becomes easier to monitor and analyze partial results. However, the visualization of partial results needs to be done in sync with the workflow provenance.

In this work, we propose a scientific gateway platform named Proteus for data integration and visualization, supporting this laborious process that spans different systems and interfaces. Proteus allows the management of large-scale workflows based on runtime provenance queries that organize and aggregate data for enriched visualization. The project aims to provide support from the beginning of data modeling through to data visualization, also providing uncoupled integration with external visualization environments.

Proteus helps scientists follow the steps of a running workflow and visualize the partial results produced. This is innovative because several systems that execute workflows online do not allow runtime analysis and workflow steering. Proteus also supports reproducibility by persisting the workflow submission history, enabling users to re-execute experiments with other provenance or execution environments.

To evaluate Proteus, this preliminary work proposes a finite element computational fluid dynamics workflow that runs on a supercomputer through the workflow engine Chiron [1]. Proteus is thus evaluated while integrating workflow modeling, submission, steering, and visualization - on a tiled-wall display - of several simulation steps and different views based on runtime provenance queries.

References

[1] Ogasawara, E., Dias, J., Silva, V., Chirigati, F., de Oliveira, D., Porto, F., Valduriez, P. and Mattoso, M., Chiron: a parallel engine for algebraic scientific workflows. Concurrency and Computation: Practice and Experience, Vol. 25, pp. 2327-2341, 2013.


In this talk we discuss the implementation of a library specifically crafted for the new family of Multiscale Hybrid-Mixed (MHM) finite element methods. The MHM method allows solving (global) problems on coarse meshes while providing solutions with high-order precision, by exploring a loosely-coupled strategy that embeds independent (local) subproblems in the upscaling procedure. Sticking to the high-productivity philosophy of the SPiNMe platform, we have adopted Erlang as the base implementation language for the communicating processes of the new library. In this talk we present the advantages of adopting Erlang for such processes, not only in terms of productivity but also with regard to fault tolerance. We show how the Erlang processes are loosely integrated with the numerical computing processes (implemented in C++) that ultimately solve the global and local problems, indicating the potential for the Erlang implementation to be adopted in other contexts. We present a preliminary performance evaluation of the overall (Erlang plus C++) implementation, taking into account its speed-up and load-balancing properties. We then conclude the talk by pointing out the intended future work.
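The coordination pattern described above can be sketched as follows. In the library this role is played by Erlang processes supervising native C++ solver processes; here a Python thread pool stands in purely as an analogy, and `solve_local` is a trivial placeholder for a local MHM solve.

```python
# Minimal sketch of the coordination pattern: a coordinator farms out
# independent local subproblems and then assembles a global result.
# This is an analogy only - in the actual library, Erlang processes
# coordinate C++ solver processes, and solve_local below is a
# placeholder, not a real local solve.

from concurrent.futures import ThreadPoolExecutor

def solve_local(element_id):
    # Placeholder for an independent local solve on one coarse element.
    return element_id ** 2

def solve_global(num_elements):
    # The local subproblems are independent, so they run concurrently...
    with ThreadPoolExecutor() as pool:
        local_solutions = list(pool.map(solve_local, range(num_elements)))
    # ...and the global (upscaling) step combines their contributions.
    return sum(local_solutions)

print(solve_global(4))  # 0 + 1 + 4 + 9 = 14
```

The independence of the local solves is what makes both the parallel speed-up and the fault-tolerance story possible: a failed local solve can simply be re-dispatched without disturbing the others.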


The Discontinuous Galerkin time-domain method (DGTD) can be seen a finite element type method in which the continuity between elements has been released [1]. This leads to very good properties (stencil compactness, possible local h-p adaptation) in view of dealing with complex wave-propagation problems in heterogeneous media [2, 3]. One of the fruits of the research conducted at the Nachos project-team on the formulation and implementation of numerical methodologies based on the DGTD method is a parallel solver of 3D Maxwells equations on unstructured meshes which we refer to as MAXW-DGTD. In this talk, we will start by recalling the idea of the method. Then, we will give an overview of recent progress on the implementation, in MAXW-DGTD, of a hybrid coarse grain (MPI) / fine grain (OpenMp) parallelization strategy. One of the motivations behind this is to achieve good scalability results on the heterogeneous cluster/accelerator architecture proposed in the DEEP-ER exascale european project, on which MAXW-DGTD is currently being adapted. Finally, we will also briefly discuss future plans of including MAXW-DGTD as a low-level solver embedded in the Multiscale Hybrid Method (MHM) framework developed at LNCC by Harder et al. [4].
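For orientation, MAXW-DGTD solves the time-domain Maxwell curl equations, which in a standard source-free form (notation ours, not taken from [1]) read:

```latex
\varepsilon(\mathbf{x})\,\frac{\partial \mathbf{E}}{\partial t} = \nabla \times \mathbf{H},
\qquad
\mu(\mathbf{x})\,\frac{\partial \mathbf{H}}{\partial t} = -\nabla \times \mathbf{E},
```

where the permittivity ε and permeability μ vary in space in heterogeneous media. The DGTD method approximates E and H by piecewise polynomials on each mesh element, coupling neighbouring elements only through numerical fluxes at their interfaces, which is what keeps the stencil compact.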

References

[1] Fezoui, L., Lanteri, S., Lohrengel, S. and Piperno, S., Convergence and stability of a discontinuous Galerkin time-domain method for the 3D heterogeneous Maxwell equations on unstructured meshes. ESAIM: Math. Model. Numer. Anal., Vol. 39, No. 6, pp. 1149-1176, 2005.
[2] Fahs, H., Hadjem, A., Lanteri, S., Wiart, J. and Wong, M.F., Calculation of the SAR induced in head tissues using a high order DGTD method and triangulated geometrical models. IEEE Trans. Ant. Propag., Vol. 59, No. 12, pp. 4669-4678, 2011.
[3] Léger, R., Viquerat, J., Durochat, C., Scheid, C. and Lanteri, S., A parallel non-conforming multi-element DGTD method for the simulation of electromagnetic wave interaction with metallic nanoparticles. Journal of Computational and Applied Mathematics, http://dx.doi.org/10.1016/j.cam.2013.12.042.
[4] Harder, C., Paredes, D. and Valentin, F., A family of multiscale hybrid-mixed finite element methods for the Darcy equation with rough coefficients. Journal of Computational Physics, Vol. 245, pp. 107-130, 2013.


SPONSORS