Moving (parts of) the Cling REPL in Clang

Motivation
===

Over the last decade we have developed an interactive, interpreted
C++ environment (a REPL) as part of the high-energy physics (HEP) data
analysis project -- ROOT [1-2]. We invested significant effort to
replace the CINT C++ interpreter with a newly implemented REPL based on
llvm -- cling [3]. The cling infrastructure is a core component of the
ROOT data analysis framework and has been running in production for
approximately 5 years.

Cling is also a standalone tool with a growing community outside
of our field. Cling’s user community includes users in finance, biology
and a few companies with proprietary software. For example, there is
a xeus-cling jupyter kernel [4]. One of the major challenges we face in
fostering that community is that our cling-related patches live in forks
of llvm and clang. The benefits of using the LLVM community standards
for code reviews, release cycles and integration have been mentioned a
number of times by our "external" users.

Last year we were awarded an NSF grant to improve cling's sustainability 
and make it a standalone tool. We thank the LLVM Foundation Board for 
supporting us with a non-binding letter of collaboration which was 
essential for getting this grant.


Background
===

Cling is a C++ interpreter built on top of clang and llvm. In a
nutshell, it uses clang's incremental compilation facilities to process
code chunk-by-chunk by assuming an ever-growing translation unit [5].
Each chunk is then lowered to llvm IR and run by the llvm jit. Cling
implements some language "extensions" such as executing statements at
global scope and error recovery. Cling is at the core of HEP -- it
is heavily used during data analysis of exabytes of particle physics
data coming from the Large Hadron Collider (LHC) and other particle
physics experiments.
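
To make the "ever-growing translation unit" idea concrete, here is a
minimal toy sketch. It is not cling or clang code: the class
ToyIncrementalUnit, its methods, and the "name=value" chunk format are
hypothetical and exist only for illustration. It shows the two
properties described above: a later chunk can see what earlier chunks
defined, and a chunk that fails leaves the accumulated state untouched.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model only (hypothetical names, not cling's API). A real
// interpreter would hand each chunk to clang and the llvm jit; here a
// chunk is just "name=value", where value is a small non-negative
// integer literal or a name defined by an earlier chunk.
class ToyIncrementalUnit {
public:
  // Process one chunk. On success the definition becomes visible to all
  // later chunks. On failure nothing is modified, which mimics error
  // recovery: a bad input does not corrupt the growing unit.
  bool process(const std::string &chunk) {
    const std::size_t eq = chunk.find('=');
    if (eq == std::string::npos || eq == 0 || eq + 1 == chunk.size())
      return false; // malformed chunk
    const std::string name = chunk.substr(0, eq);
    const std::string rhs = chunk.substr(eq + 1);
    if (symbols_.count(name) != 0)
      return false; // redefinition of an existing name

    int value = 0;
    if (isNumber(rhs)) {
      value = std::stoi(rhs);
    } else {
      const auto it = symbols_.find(rhs);
      if (it == symbols_.end())
        return false; // refers to a name no earlier chunk defined
      value = it->second;
    }
    symbols_[name] = value;
    accepted_.push_back(chunk);
    return true;
  }

  // Number of chunks that have become part of the unit so far.
  std::size_t acceptedChunks() const { return accepted_.size(); }

  // Look up a name defined by any previously accepted chunk.
  bool lookup(const std::string &name, int &out) const {
    const auto it = symbols_.find(name);
    if (it == symbols_.end())
      return false;
    out = it->second;
    return true;
  }

private:
  // Accept 1 to 9 decimal digits so std::stoi cannot overflow.
  static bool isNumber(const std::string &s) {
    if (s.empty() || s.size() > 9)
      return false;
    for (unsigned char c : s)
      if (!std::isdigit(c))
        return false;
    return true;
  }

  std::map<std::string, int> symbols_;
  std::vector<std::string> accepted_;
};
```

For example, feeding the chunks "x=41" and then "y=x" succeeds because
the second chunk resolves x against the state left by the first, while
a following "z=undefined_name" is rejected and the unit still holds
exactly two chunks.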


Plans
===

The project foresees three main directions -- move parts of cling
upstream along with the clang and llvm features that enable them; extend
and generalize the language interoperability layer around cling; and
extend and generalize the OpenCL/CUDA support in cling. We are at the
early stages of the project and this email is intended as an RFC for the
first part -- upstreaming parts of cling. Please do share your thoughts
on the rest, too.


Moving Parts of Cling Upstream
---

Over the years we have slowly moved some patches upstream. However, we
still have around 100 patches in the clang fork. Most of them extend
the incremental compilation support in clang. Incremental compilation
poses some challenges for the clang infrastructure. For example, we
need to tune CodeGen to work with multiple llvm::Module instances and
to finalize at each end of translation unit (we have multiple of them).
Other changes include small adjustments to the FileManager's caching
mechanism and bug fixes in the SourceManager (in code which is reached
mostly from within our setup). One conclusion we can draw from our
research is that the clang infrastructure fits amazingly well to
something which was not its main use case. The grand total of our diffs
against clang-9 is: `62 files changed, 1294 insertions(+), 231
deletions(-)`. Cling is currently being upgraded from llvm-5 to llvm-9.

A major weakness of cling's infrastructure is that it does not work with
the clang Action infrastructure due to the lack of an
IncrementalAction. A possible way forward would be to implement a
clang::IncrementalAction as a starting point. This way we should be able
to reduce the amount of setup necessary to use the incremental
infrastructure in clang. However, this will be a bit of a testing
challenge -- cling lives downstream and some of the new code may be
impossible to pick up and use straight away. Building a mainline example
tool such as clang-repl, which would give us a way to test the
incremental case, or repurposing the already existing clang-interpreter
may address the issue. The major risk of the task is ending up with code
in the clang mainline which is untested by the HEP production
environment.

There are several other types of patches to the ROOT fork of Clang,
including ones in the context of performance, C++ modules support
(D41416), and storage (does not have a patch yet but has an open
projects entry and somebody working on it). These patches can be
considered in parallel, independently of the rest.

Extend and Generalize the Language Interoperability Layer Around Cling
---

HEP has extensive experience with on-demand python interoperability
using cppyy [6], which is built around the type information provided by
cling. Unlike tools with custom parsers such as swig and sip and tools
built on top of C-APIs such as boost.python and pybind11, cling can
provide information about memory management patterns (e.g. refcounting)
and instantiate templates on the fly. We feel that this functionality
may not be of general interest to the llvm community, but we will
prepare another RFC and send it here later on to gather feedback.


Extend and Generalize the OpenCL/CUDA Support in Cling
---

Cling can incrementally compile CUDA code [7-8], allowing easier setup
and enabling some interesting use cases. There are a number of planned
improvements, including talking to HIP [9] and SYCL to support more
hardware architectures.



The primary focus of our work is to upstream the functionality required
to build an incremental compiler and to rework cling to build against
vanilla clang and llvm. The last two points are there to give the scope
of the work which we will be doing over the next 2-3 years. We will send
RFCs for both of them here to trigger technical discussion if there is
interest in pursuing this direction.


Collaboration
===

Open source development nowadays relies on reviewers. LLVM is no
different and we will probably disturb a good number of people in the
community ;) We would like to invite anybody interested in joining our
incremental C++ activities to our open calls, held every second week.
Announcements will be made via the google group
compiler-research-announce
(https://groups.google.com/g/compiler-research-announce).



Many thanks!


David & Vassil
