Language Selection

English French German Italian Portuguese Spanish

Moving (parts of) the Cling REPL in Clang

Filed under
Development

Motivation
===

Over the last decade we have developed an interactive, interpretative 
C++ (aka REPL) as part of the high-energy physics (HEP) data analysis 
project -- ROOT [1-2]. We invested a significant  effort to replace the 
CINT C++ interpreter with a newly implemented REPL based on llvm -- 
cling [3]. The cling infrastructure is a core component of the data 
analysis framework of ROOT and runs in production for approximately 5 
years.

Cling is also  a standalone tool, which has a growing community outside 
of our field. Cling’s user community includes users in finance, biology 
and in a few companies with proprietary software. For example, there is 
a xeus-cling jupyter kernel [4]. One of the major challenges we face to 
foster that community is  our cling-related patches in llvm and clang 
forks. The benefits of using the LLVM community standards for code 
reviews, release cycles and integration has been mentioned a number of 
times by our "external" users.

Last year we were awarded an NSF grant to improve cling's sustainability 
and make it a standalone tool. We thank the LLVM Foundation Board for 
supporting us with a non-binding letter of collaboration which was 
essential for getting this grant.


Background
===

Cling is a C++ interpreter built on top of clang and llvm. In a 
nutshell, it uses clang's incremental compilation facilities to process 
code chunk-by-chunk by assuming an ever-growing translation unit [5]. 
Then code is lowered into llvm IR and run by the llvm jit. Cling has 
implemented some language "extensions" such as execution statements on 
the global scope and error recovery. Cling is in the core of HEP -- it 
is heavily used during data analysis of exabytes of particle physics 
data coming from the Large Hadron Collider (LHC) and other particle 
physics experiments.


Plans
===

The project foresees three main directions -- move parts of cling 
upstream along with the clang and llvm features that enable them; extend 
and generalize the language interoperability layer around cling; and 
extend and generalize the OpenCL/CUDA support in cling. We are at the 
early stages of the project and this email intends to be an RFC for the 
first part -- upstreaming parts of cling. Please do share your thoughts 
on the rest, too.


Moving Parts of Cling Upstream
---

Over the years we have slowly moved some patches upstream. However we 
still have around 100 patches in the clang fork. Most of them are in the 
context of extending the incremental compilation support for clang. The 
incremental compilation poses some challenges in the clang 
infrastructure. For example, we need to tune CodeGen to work with 
multiple llvm::Module instances, and finalize per each 
end-of-translation unit (we have multiple of them). Other changes 
include small adjustments in the FileManager's caching mechanism, and 
bug fixes in the SourceManager (code which can be reached mostly from 
within our setup). One conclusion we can draw from our research is that 
the clang infrastructure fits amazingly well to something which was not 
its main use case. The grand total of our diffs against clang-9 is: `62 
files changed, 1294 insertions(+), 231 deletions(-)`. Cling is currently 
being upgraded from llvm-5 to llvm-9.

A major weakness of cling's infrastructure is that it does not work with 
the clang Action infrastructure due to the lack of an 
IncrementalAction.  A possible way forward would be to implement a 
clang::IncrementalAction as a starting point. This way we should be able 
to reduce the amount of setup necessary to use the incremental 
infrastructure in clang. However, this will be a bit of a testing 
challenge -- cling lives downstream and some of the new code may be 
impossible to pick straight away and use. Building a mainline example 
tool such as clang-repl which gives us a way to test that incremental 
case or repurpose the already existing clang-interpreter may  be able to 
address the issue. The major risk of the task is avoiding code in the 
clang mainline which is untested by its HEP production environment.
There are several other types of patches to the ROOT fork of Clang, 
including ones  in the context of performance,towards  C++ modules 
support (D41416), and storage (does not have a patch yet but has an open 
projects entry and somebody working on it). These patches can be 
considered in parallel independently on the rest.

Extend and Generalize the Language Interoperability Layer Around Cling
---

HEP has extensive experience with on-demand python interoperability 
using cppyy[6], which is built around the type information provided by 
cling. Unlike tools with custom parsers such as swig and sip and tools 
built on top of C-APIs such as boost.python and pybind11, cling can 
provide information about memory management patterns (eg refcounting) 
and instantiate templates on the fly.We feel that functionality may not 
be of general interest to the llvm community but we will prepare another 
RFC and send it here later on to gather feedback.


Extend and Generalize the OpenCL/CUDA Support in Cling
---

Cling can incrementally compile CUDA code [7-8] allowing easier set up 
and enabling some interesting use cases. There are a number of planned 
improvements including talking to HIP [9] and SYCL to support more 
hardware architectures.



The primary focus of our work is to upstreaming functionality required 
to build an incremental compiler and rework cling build against vanilla 
clang and llvm. The last two points are to give the scope of the work 
which we will be doing the next 2-3 years. We will send here RFCs for 
both of them to trigger technical discussion if there is interest in 
pursuing this direction.


Collaboration
===

Open source development nowadays relies on reviewers. LLVM is no 
different and we will probably disturb a good number of people in the 
community ;)We would like to invite anybody interested in joining our 
incremental C++ activities to our open every second week calls. 
Announcements will be done via google group: compiler-research-announce 
(https://groups.google.com/g/compiler-research-announce).



Many thanks!


David & Vassil

Read more

Also: Cling C++ Interpreter Looking To Upstream More Code Into LLVM

More in Tux Machines

digiKam 7.7.0 is released

After three months of active maintenance and another bug triage, the digiKam team is proud to present version 7.7.0 of its open source digital photo manager. See below the list of most important features coming with this release. Read more

Dilution and Misuse of the "Linux" Brand

Samsung, Red Hat to Work on Linux Drivers for Future Tech

The metaverse is expected to uproot system design as we know it, and Samsung is one of many hardware vendors re-imagining data center infrastructure in preparation for a parallel 3D world. Samsung is working on new memory technologies that provide faster bandwidth inside hardware for data to travel between CPUs, storage and other computing resources. The company also announced it was partnering with Red Hat to ensure these technologies have Linux compatibility. Read more

today's howtos

  • How to install go1.19beta on Ubuntu 22.04 – NextGenTips

    In this tutorial, we are going to explore how to install go on Ubuntu 22.04 Golang is an open-source programming language that is easy to learn and use. It is built-in concurrency and has a robust standard library. It is reliable, builds fast, and efficient software that scales fast. Its concurrency mechanisms make it easy to write programs that get the most out of multicore and networked machines, while its novel-type systems enable flexible and modular program constructions. Go compiles quickly to machine code and has the convenience of garbage collection and the power of run-time reflection. In this guide, we are going to learn how to install golang 1.19beta on Ubuntu 22.04. Go 1.19beta1 is not yet released. There is so much work in progress with all the documentation.

  • molecule test: failed to connect to bus in systemd container - openQA bites

    Ansible Molecule is a project to help you test your ansible roles. I’m using molecule for automatically testing the ansible roles of geekoops.

  • How To Install MongoDB on AlmaLinux 9 - idroot

    In this tutorial, we will show you how to install MongoDB on AlmaLinux 9. For those of you who didn’t know, MongoDB is a high-performance, highly scalable document-oriented NoSQL database. Unlike in SQL databases where data is stored in rows and columns inside tables, in MongoDB, data is structured in JSON-like format inside records which are referred to as documents. The open-source attribute of MongoDB as a database software makes it an ideal candidate for almost any database-related project. This article assumes you have at least basic knowledge of Linux, know how to use the shell, and most importantly, you host your site on your own VPS. The installation is quite simple and assumes you are running in the root account, if not you may need to add ‘sudo‘ to the commands to get root privileges. I will show you the step-by-step installation of the MongoDB NoSQL database on AlmaLinux 9. You can follow the same instructions for CentOS and Rocky Linux.

  • An introduction (and how-to) to Plugin Loader for the Steam Deck. - Invidious
  • Self-host a Ghost Blog With Traefik

    Ghost is a very popular open-source content management system. Started as an alternative to WordPress and it went on to become an alternative to Substack by focusing on membership and newsletter. The creators of Ghost offer managed Pro hosting but it may not fit everyone's budget. Alternatively, you can self-host it on your own cloud servers. On Linux handbook, we already have a guide on deploying Ghost with Docker in a reverse proxy setup. Instead of Ngnix reverse proxy, you can also use another software called Traefik with Docker. It is a popular open-source cloud-native application proxy, API Gateway, Edge-router, and more. I use Traefik to secure my websites using an SSL certificate obtained from Let's Encrypt. Once deployed, Traefik can automatically manage your certificates and their renewals. In this tutorial, I'll share the necessary steps for deploying a Ghost blog with Docker and Traefik.