Planet KDE - http://planetKDE.org/

Cloud providers and telemetry via Qt MQTT

Thursday 29th of August 2019 07:01:57 AM

This is a follow up to my previous posts about using Qt MQTT to connect to the cloud. MQTT is a prominent standard for telemetry, especially in the IoT scenario.

 

We are often asked by Qt customers and users how to connect to a variety of cloud providers, preferably keeping the requirements list short.

With this post I would like to provide some more information on how to create a connection by just using Qt, without any third-party dependency. For this comparison we have chosen the following cloud providers: Amazon IoT Core, Microsoft Azure IoT Hub, Google Cloud IoT Core, and Alibaba Cloud IoT Platform.

The ultimate summary can be viewed in this table

 

The source code to locally test the results is available here.

However, if you are interested in this topic, I recommend preparing a pitcher of coffee and continue reading…

And if you want to jump to a specific topic, use these shortcuts:

Preface
Getting connected
Standard deviations (limitations)
Available (custom) topics
Communication routes
Other / references
Additional notes
How can I test this myself?
Closing words


Preface / Setting expectations

Before getting into the details I would like to emphasize a few points.

First, the focus is on getting devices connected to the cloud. Being able to send and receive messages is the prime target. This post will not talk about the services, features, or costs of the cloud providers themselves once messages are in the cloud.

Furthermore, the idea is to only use Qt and/or Qt MQTT to establish a connection. Most, if not all, vendors provide SDKs for either devices or monitoring (web and native) applications. However, using these SDKs increases the number of additional dependencies, leading to higher storage and memory requirements.

The order in which the providers are being evaluated in this article is based on public usage according to this article.

Getting connected

The very first steps for sending messages are to create a solution for each vendor and then establish a TCP connection.

Amazon IoT Core

We assume that you have created an AWS account and an IoT Core service from your AWS console in the browser.

The dashboard of this service looks like this:

The create button opens a wizard which helps with setting up the first device.

The only required information is the name of the device. All other items can be left empty.

The service allows you to automatically create a certificate to be used for a connection later.

Store the certificates (including the root CA) and keep them available to be used in an application.

For now, no policy is required. But we will get into this at a later stage.

The last missing piece to start implementing an example is the hostname to connect to. AWS provides a list of endpoints here. Please note that for MQTT you must use the account-specific prefix. You can also find this information on the settings page of the AWS IoT dashboard.

Using Qt MQTT, a connection is then established with those few lines:

const QString host = QStringLiteral("<your-endpoint>.amazonaws.com");
const QString rootCA = QStringLiteral("root-CA.crt");
const QString local = QStringLiteral("<device>.cert.pem");
const QString key = QStringLiteral("<device>.private.key");

QMqttClient client;
client.setKeepAlive(10000);
client.setHostname(host);
client.setPort(8883);
client.setClientId("basicPubSub");

QSslConfiguration conf;
conf.setCaCertificates(QSslCertificate::fromPath(rootCA));
conf.setLocalCertificateChain(QSslCertificate::fromPath(local));
QSslKey sslkey(readKey(key), QSsl::Rsa);
conf.setPrivateKey(sslkey);

client.connectToHostEncrypted(conf);

A couple of details are important for a successful connection:

  • The keepalive value needs to be within a certain threshold; 10 seconds seems to be a good value.
  • Port 8883 is the standardized port for encrypted MQTT connections.
  • The ClientID must be basicPubSub. This is a valid ID auto-generated during the creation of the IoT Core instance.

 

Microsoft Azure IoT Hub

First, an account for the Azure Portal needs to be created. From the dashboard you need to create a new “IoT Hub” resource.

The dashboard can be overwhelming initially, as Microsoft puts many cloud services and features at the forefront. As the focus is on getting a first device connected, the simplest way is to go to Shared access policies and create a new access policy with all rights enabled.

This is highly discouraged in a production environment for security reasons.

Selecting the freshly created policy, we can copy the connection string.

Next, we will use the Azure Device Explorer application, which can be downloaded here. This application is perfectly suited for testing purposes. After launching it, enter the connection string from above into the connection text edit and click Update.

The Management tab allows for creating new test devices, specifying authentication either via X509 or security keys. Security keys are the preselected standard method, which is what we aim at as well.

Lastly, the Device Explorer allows us to create a SAS token, which will be needed to configure the MQTT client. A token has the following shape:

HostName=<yourIoTHub>.azure-devices.net;DeviceId=<yourDeviceName>;SharedAccessSignature=SharedAccessSignature sr=<yourIoTHub>.azure-devices.net%2F…..

We only need this part for authentication:

SharedAccessSignature sr=<yourIoTHub>.azure-devices.net%2F…..

The Azure IoT Hub uses TLS for the connection as well. To obtain the root CA, you can clone the Azure IoT C SDK located here or get the DigiCert Baltimore Root Certificate manually. Neither the web interface nor the Device Explorer provides it.

To establish a connection from a Qt application using Qt MQTT, the code looks like this:

const QString iotHubName = QStringLiteral("<yourIoTHub>");
const QString iotHubHostName = iotHubName + QStringLiteral(".azure-devices.net");
const QString deviceId = QStringLiteral("<yourDeviceName>");

QMqttClient client;
client.setPort(8883);
client.setHostname(iotHubHostName);
client.setClientId(deviceId);
client.setUsername(iotHubHostName + QStringLiteral("/") + deviceId + QStringLiteral("/?api-version=2018-06-30"));
client.setPassword(QLatin1String("SharedAccessSignature sr=<yourIoTHub>.azure-devices.net%2Fdevices…"));

auto caCerts = QSslCertificate::fromData(QByteArray(certificates));
QSslConfiguration sslConf;
sslConf.setCaCertificates(caCerts);
client.connectToHostEncrypted(sslConf);

Google Cloud IoT Core

Once you have created an account for the Google Cloud Platform, the web interface provides a wizard to get your first project running using Cloud IoT Core.

Once the project has been created, it might be hard to find your registry. A registry stores all information on devices, communication, rules, etc.

Similar to Microsoft Azure, all available services are placed on the dashboard. You will find the IoT Core item in the Big Data section on the left side.

After using the Google Cloud Platform for a while, you will find the search very useful to get to your target page.

From the registry itself you can now add new devices.

The interface asks you to provide the keys/certificates for your device, but it does not offer a way to create them from the service itself. Documentation exists on how to create these, and at the production stage those steps will probably be automated in a different manner. However, for getting started these are additional required steps, which can become a hurdle.

Once your device is entered into the registry, you can start with the client side implementation.

Contrary to other providers, Google Cloud IoT Core does not use the device certificate while creating a connection. Instead, the private key is used for the password creation. The password itself needs to be generated as a JSON Web Token. While JSON Web Tokens are an open industry standard, this adds another dependency to your project. Something needs to be able to create these tokens. Google provides some sample code here, but adaptations to include it into an application are required.
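For illustration, a minimal sketch of what such a token helper could look like in a Qt application follows, using OpenSSL for the RS256 signature. The helper name createJwt, the one-hour token lifetime and the error handling are assumptions and not Google's sample code; verify the claim set (iat, exp, aud) against the Cloud IoT Core documentation.

// Hedged sketch of a JWT generator for Google Cloud IoT Core. The helper name
// (createJwt), the one-hour lifetime and the error handling are illustrative
// assumptions; verify the claim set against the Cloud IoT Core documentation.
#include <QByteArray>
#include <QDateTime>
#include <QFile>
#include <QJsonDocument>
#include <QJsonObject>
#include <openssl/evp.h>
#include <openssl/pem.h>

static QByteArray base64Url(const QByteArray &data)
{
    return data.toBase64(QByteArray::Base64UrlEncoding | QByteArray::OmitTrailingEquals);
}

QByteArray createJwt(const QString &privateKeyPath, const QString &projectId)
{
    // Header and claim set: RS256 signature, audience is the (lower-case) project ID.
    const QByteArray header = base64Url(QJsonDocument(QJsonObject{
        {"alg", "RS256"}, {"typ", "JWT"}}).toJson(QJsonDocument::Compact));
    const qint64 now = QDateTime::currentSecsSinceEpoch();
    const QByteArray claims = base64Url(QJsonDocument(QJsonObject{
        {"iat", now}, {"exp", now + 3600}, {"aud", projectId}}).toJson(QJsonDocument::Compact));
    const QByteArray signingInput = header + '.' + claims;

    // Load the device's RSA private key (rsa_private.pem).
    QFile keyFile(privateKeyPath);
    if (!keyFile.open(QIODevice::ReadOnly))
        return QByteArray();
    const QByteArray pem = keyFile.readAll();
    BIO *bio = BIO_new_mem_buf(pem.constData(), pem.size());
    EVP_PKEY *pkey = PEM_read_bio_PrivateKey(bio, nullptr, nullptr, nullptr);
    BIO_free(bio);
    if (!pkey)
        return QByteArray();

    // Sign "header.claims" with SHA-256 / RSA.
    QByteArray signature;
    EVP_MD_CTX *ctx = EVP_MD_CTX_new();
    size_t sigLen = 0;
    if (EVP_DigestSignInit(ctx, nullptr, EVP_sha256(), nullptr, pkey) == 1
        && EVP_DigestSignUpdate(ctx, signingInput.constData(), size_t(signingInput.size())) == 1
        && EVP_DigestSignFinal(ctx, nullptr, &sigLen) == 1) {
        signature.resize(int(sigLen));
        if (EVP_DigestSignFinal(ctx, reinterpret_cast<unsigned char *>(signature.data()), &sigLen) == 1)
            signature.resize(int(sigLen));
        else
            signature.clear();
    }
    EVP_MD_CTX_free(ctx);
    EVP_PKEY_free(pkey);

    if (signature.isEmpty())
        return QByteArray();
    return signingInput + '.' + base64Url(signature);
}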

The client ID for the MQTT connection is constructed of multiple parameters and has the following form:

projects/PROJECT_ID/locations/REGION/registries/REGISTRY_ID/devices/DEVICE_ID

From personal experience, be aware of case sensitivity. Everything but the project ID keeps the same capitalization that you used when creating your project, registry and device; the project ID, however, is stored in all lower-case.

Having considered all of this, the simplest implementation to establish a connection looks like this:

const QString rootCAPath = QStringLiteral("root_ca.pem");
const QString deviceKeyPath = QStringLiteral("rsa_private.pem");
const QString clientId = QStringLiteral("projects/PROJECT_ID/locations/REGION/registries/REGISTRY_ID/devices/DEVICE_ID");
const QString googleiotHostName = QStringLiteral("mqtt.googleapis.com");
const QString password = CreateJwt(deviceKeyPath, "<yourprojectID>", "RS256");

QMqttClient client;
client.setKeepAlive(60);
client.setPort(8883);
client.setHostname(googleiotHostName);
client.setClientId(clientId);
client.setPassword(password);

QSslConfiguration sslConf;
sslConf.setCaCertificates(QSslCertificate::fromPath(rootCAPath));
client.connectToHostEncrypted(sslConf);

Alibaba Cloud IoT Platform

The Alibaba Cloud IoT Platform is the only product which comes in multiple variants, a basic and a pro version. As of the writing of this article this product structure seems to have changed. As far as we can tell, this does not have an influence on the MQTT-related items investigated here.

After creating an account for the Alibaba Cloud, the web dashboard allows you to create a new IoT Platform instance.

Following the instantiation, a wizard interface allows you to create a product and a device.

From these we need a couple of details to establish an MQTT connection:

  • Product Key
  • Product Secret
  • Device Name
  • Device Secret

The implementation requires a couple of additional steps. To acquire all MQTT-specific properties, the client ID, username and password are created by concatenation and signing. This procedure is fully documented here. For convenience, the documentation also includes example source code to handle this. If the concern is to not introduce external code, the instructions in the first link have to be followed.
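For illustration only, the concatenate-and-sign step could be sketched in plain Qt roughly as follows. The field order, the HMAC-SHA1 sign method and the helper name signMqttPassword are assumptions and must be verified against the Alibaba documentation mentioned above.

// Rough, hedged illustration of the concatenate-and-sign idea; verify field
// order and sign method against the official Alibaba documentation.
#include <QCryptographicHash>
#include <QMessageAuthenticationCode>
#include <QString>

QByteArray signMqttPassword(const QByteArray &deviceSecret,
                            const QString &clientId,
                            const QString &deviceName,
                            const QString &productKey,
                            const QString &timestamp)
{
    // Concatenate the connection parameters into the string to be signed.
    const QByteArray content = (QStringLiteral("clientId") + clientId
                                + QStringLiteral("deviceName") + deviceName
                                + QStringLiteral("productKey") + productKey
                                + QStringLiteral("timestamp") + timestamp).toUtf8();
    // HMAC-SHA1 with the device secret; the hex digest is used as the MQTT password.
    return QMessageAuthenticationCode::hash(content, deviceSecret,
                                            QCryptographicHash::Sha1).toHex();
}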

To connect a QMqttClient instance, this is sufficient:

iotx_dev_meta_info_t deviceInfo;
qstrcpy(deviceInfo.product_key, "<yourproductkey>");
qstrcpy(deviceInfo.product_secret, "<yourproductsecret>");
qstrcpy(deviceInfo.device_name, "<yourdeviceID>");
qstrcpy(deviceInfo.device_secret, "<yourdeviceSecret>");

iotx_sign_mqtt_t signInfo;
int32_t result = IOT_Sign_MQTT(IOTX_CLOUD_REGION_GERMANY, &deviceInfo, &signInfo);

QMqttClient client;
client.setKeepAlive(10000);
client.setHostname(QString::fromLocal8Bit(signInfo.hostname));
client.setPort(signInfo.port);
client.setClientId(QString::fromLocal8Bit(signInfo.clientid));
client.setUsername(QString::fromLocal8Bit(signInfo.username));
client.setPassword(QString::fromLocal8Bit(signInfo.password));
client.connectToHost();

 

You might notice that we are not using QMqttClient::connectToHostEncrypted() as for all other providers. The Alibaba Cloud IoT Platform is the only vendor which uses a non-TLS connection by default. It is documented that a TLS connection is possible and that a root CA can be obtained. Still, the fact that an unencrypted connection is possible at all is surprising.

Standard deviations (limitations)

So far, we have established an MQTT connection to each of the IoT vendors. Each uses a slightly different approach to identify and authenticate a device, but all of these services follow the MQTT 3.1.1 standard.

However, for the next steps developers need to be aware of certain limitations or variations to the standard. These will be discussed next.

None of the providers has built-in support for quality-of-service (QoS) level 2. To some extent that makes sense, as telemetry information does not require multiple steps to verify message delivery. Whether a message is processed and validated is not of interest in this scenario. A developer should be aware of this limitation though.

To refresh our memory on terminology, let us briefly recap retained and will messages.

Retained messages are stored on the server side for future subscribers to receive the last information available on a topic. Will messages are embedded in the connection request and are only propagated in case of an unexpected disconnect of the client.
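As a reminder of how these two concepts map to the Qt MQTT API, here is a small illustrative snippet (the topic name and payloads are made up):

// Illustrative only: expressing a will message and a retained publish with Qt MQTT.
QMqttClient client;
// The will message is part of the connection request and is published by the
// broker only if the client disconnects unexpectedly.
client.setWillTopic(QStringLiteral("devices/device1/status"));
client.setWillMessage(QByteArrayLiteral("offline"));
client.setWillQoS(1);
client.setWillRetain(false);

// A retained publish: the broker keeps the last payload for future subscribers.
client.publish(QMqttTopicName(QStringLiteral("devices/device1/status")),
               QByteArrayLiteral("online"), 1, true);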

 

Amazon IoT Core

The client ID is used to identify a device. If a second device uses the same ID during a connection attempt, then the first device will be disconnected without any notice. The second device will connect successfully. If your application code contains some sort of automatic reconnect, this can cause all devices with the same client ID to be unavailable.

Retained messages are not supported by AWS and trying to send a retained message will cause the connection to be closed.

AWS IoT Core supports will messages within the given allowed topics.

A full description of standard deviations can be viewed here.

 

Microsoft Azure IoT Hub

The client ID is used to identify a device. The behavior of two devices with the same ID is the same as for Amazon IoT Core.

Retained messages are not supported on the IoT Hub. However, the documentation states that the Hub will internally append a flag to let the backend know that the message was intended to be retained.

Will messages are allowed and supported, given the topic restrictions which will be discussed below.

A full description of standard deviations can be viewed here.

 

Google Cloud IoT Core

This provider uses the client ID and the password to successfully identify a device.

Messages flagged as retained seem to lose this option during delivery. According to the debug logs they are forwarded as regular messages. We have not found any documentation about whether it might behave similarly to the Azure IoT Hub, which forwards this request to its internal message queue.

Will messages do not seem to be supported. While it is possible to store a will message in the connect statement, it gets ignored in case of an irregular disconnect.

 

Alibaba Cloud IoT Platform

The triplet of client ID, username and password is used to identify a device within a product.

Both the retain flag and will messages are ignored on the server side. A message with retain specified is forwarded as a regular message and lost after delivery. Will messages are ignored and not stored anywhere during a connection.

Available (custom) Topics

MQTT uses a topic hierarchy to create a fine-grained context for messages. Topics are similar to a directory structure, going from generic to device-specific. One example of a topic hierarchy would be:

Sensors/Europe/Germany/Berlin/device_xyz/temperature
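For reference, this is how standard MQTT wildcard subscriptions on such a hierarchy look with Qt MQTT ('+' matches a single level, '#' matches all remaining levels); the provider-specific restrictions follow below:

// Standard MQTT wildcard subscriptions (before any provider-specific restrictions apply).
client.subscribe(QMqttTopicFilter(QStringLiteral("Sensors/Europe/+/+/+/temperature")), 1);
client.subscribe(QMqttTopicFilter(QStringLiteral("Sensors/Europe/Germany/#")), 1);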

Each IoT provider handles topics differently, so developers need to be very careful in this area.

 

Amazon IoT Core

First, one needs to check which topics can be used by default. From the dashboard, browse to Secure -> Policies and select the default created policy. It should look like this:

AWS IoT Core specifies policies in JSON format, and you will find some of the previous details specified in this document. For instance, the available client IDs are specified in the Connect resource. The policy also allows you to declare which topics are valid for publishing, subscribing and receiving. It is possible to have multiple policies in place, and devices need to have a policy attached. That way, it allows for a fine-grained security model where different device groups have different access rights.

Note that the topic description also allows wildcards. Those should not be confused with the wildcards in the MQTT standard. Meaning, you must use * instead of # to enable all subtopics.

Once you have created a topic hierarchy based on your needs, the code itself is simple:

client.publish(QStringLiteral("topic_1"), "{\"message\":\"Somecontent\"}", 1);
client.subscribe(QStringLiteral("topic_1"), 1);

 

Microsoft Azure IoT Hub

The IoT Hub merely acts as an interface to connect existing MQTT solutions to the Hub. A user is not allowed to specify any custom topic, nor is it possible to introduce a topic hierarchy.

A message can only be published in the following shape:

const QString topic = QStringLiteral("devices/") + deviceId + QStringLiteral("/messages/events/");
client.publish(topic, "{id=123}", 1);

 

For subscriptions, similar limitations exist:

client.subscribe(QStringLiteral("devices/") + deviceId + QStringLiteral("/messages/devicebound/#"), 1);

 

The wildcard for the subscription is used for additional information that the IoT Hub might add to a message, for instance a message ID. To combine multiple properties the subtopic itself is url-encoded. An example message sent from the IoT Hub has this topic:

devices/TestDevice01/messages/devicebound/%24.mid=7493c5cc-d783-4ecd-8129-d3c87590b544&%24.to=%2Fdevices%2FTestDevice01%2Fmessages%2FdeviceBound&iothub-ack=full

 

Google Cloud IoT Core

By default, an MQTT client should use this topic for publication:

/devices/<deviceID>/events

But it is also possible to add additional topics using the Google Cloud Shell or other APIs.

In this case a topic customCross has been created. Those additional topics are reflected as subtopics on the MQTT side, meaning that to publish a message to this topic, the full topic would be:

/devices/<deviceID>/events/customCross

For subscriptions, custom topics are not available; there are only two topics a client can subscribe to:

/devices/<deviceID>/commands/#
/devices/<deviceID>/config/

Config messages are retained messages from the cloud. They will be sent every time a client connects to keep the device in sync.
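Putting this together, a hedged sketch of the publish and subscribe calls for these topics could look like this (placeholders kept from the text above):

// Publishing to the default topic plus a custom subtopic, and subscribing to the
// two fixed topics mentioned above (placeholders kept from the text).
const QString deviceTopicBase = QStringLiteral("/devices/<deviceID>");
client.publish(deviceTopicBase + QStringLiteral("/events/customCross"),
               "{\"temperature\": 21.5}", 1);
client.subscribe(QMqttTopicFilter(deviceTopicBase + QStringLiteral("/commands/#")), 1);
client.subscribe(QMqttTopicFilter(deviceTopicBase + QStringLiteral("/config/")), 1);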

 

Alibaba Cloud IoT Platform

Topics can easily be managed in the Topic Categories tab of the product dashboard.

Each topic can be configured for receive-only, send-only or bidirectional communication. Furthermore, a couple of additional topics are generated by default to help create a scalable structure.

Note that the topic always contains the device ID. This has implications on communication routes as mentioned below.

Communication routes

Communication in the IoT context can be split into three different categories:

  1. Device to Cloud (D2C)
  2. Cloud to Device (C2D)
  3. Device to Device (D2D)

The first category is the most common one. Devices provide information about their state, sensor data or any other kind of information. Talking in the other direction happens when providing behavior instructions, managing debug levels or sending any generic instruction.

Regarding device-to-device communication, we need to be a bit more verbose on the definition inside this context. A typical example can be taken from home automation: given a certain light intensity, the sensor propagates the information and the blinds automatically react to it by going down (something which never seems to work properly in office spaces). Here, all logic is handled on the devices and no cloud intelligence is needed. Also, no additional rules or filters need to be created in the cloud instance itself. Surely, all tested providers can instantiate a method running in the cloud and then forward a command to another device. But that process is not part of this investigation.

 

Amazon IoT Core

In the previous section we already covered the D2C and C2D cases. Once a topic hierarchy has been specified, a client can publish to these topics and also subscribe to them.

To verify that the C2D connection works, select the Test tab on the left side of the dashboard. The browser will show a minimal interface which allows you to send a message with a specified topic.

The device-to-device case is also handled nicely by subscribing and publishing to a topic as specified in the policy.

Microsoft Azure IoT Hub

It is possible to send messages from a device to the cloud and vice-versa. However, a user is not free to choose a topic.

For sending messages, the Device Explorer is a good utility, especially for testing the property bag feature.

Device-to-device communication as per our definition is not possible using Azure IoT Hub.

During the creation of this post, this article popped up. It talks about this exact use case using the Azure SDKs instead of plain MQTT. The approach there is to place the Service SDK on the recipient device. So for bidirectional communication this would be needed on all devices, with the advantage of not routing through any server.

Google Cloud IoT Core

Sending messages from a device to the cloud is possible, allowing further granularity with subtopics for publication. Messages are received on the two available topics discussed in the section above.

As the custom topics still include the device ID, it is not possible to use a Google Cloud IoT Core instance as standard broker to propagate messages between devices (D2D).

The dashboard for a device allows you to send a command, as well as a configuration, from the cloud interface to the device itself.

 

Alibaba Cloud IoT Platform

Publishing and subscribing can be done in a flexible manner using the IoT Platform. (Sub-)topics can be created to provide more structure.

To test sending a message from the cloud to a device, the Topic List in the device dashboard includes a dialog.

Device-to-device communication is also possible. Topics for this cannot be freely specified; they must reside exactly one level below

/broadcast/<yourProductName>/

The topic on this sub-level can be chosen freely.
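As an illustration, with a made-up last topic level named all:

// Illustrative D2D broadcast: both devices use the same topic one level below
// /broadcast/<yourProductName>/ (the last level, "all", is freely chosen here).
const QString broadcastTopic = QStringLiteral("/broadcast/<yourProductName>/all");
client.subscribe(QMqttTopicFilter(broadcastTopic), 1);
client.publish(broadcastTopic, "{\"cmd\":\"ping\"}", 1);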

Other / References

Amazon IoT Core
Microsoft Azure IoT Hub
Google Cloud IoT Core
Alibaba Cloud IoT Platform

 

Additional notes

MQTT version 5 seems to be too young for the biggest providers to have adopted it. This is very unfortunate, given that the latest standard adds a couple of features specifically useful in the IoT world. Shared subscriptions would allow for automatic balancing of tasks, the new authentication command allows for higher flexibility when registering devices, connection and message properties enable cloud connectivity to be more performant and easier to restrict/configure, etc. But at this point in time, we will have to wait for its adoption.

Again, I want to emphasize that we have not looked into any of the features the above IoT solutions provide to handle messages once received. This is part of a completely different study and we would be very interested in hearing about your results in that field.

 

Additionally, we have not included RPC utilization of the providers. Some have hard-coded topics to handle RPC, like Google differentiating between commands and configuration. Alibaba even uses default topics to handle firmware update notifications via MQTT. Trend Micro has released a study on security-related concerns in MQTT, and RPC has a prominent spot in there; a must-read for anyone setting up an MQTT architecture from scratch.

How can I test this myself?

I’ve created a sample application which allows you to connect to any of the above cloud vendors when the required details are available. The interface itself is rather simple:





 

You can find the source code here on GitHub.

Closing words

For any of the broader IoT and cloud providers it is possible to connect a telemetry-based application using MQTT (and Qt MQTT). Each has different variations on connection details, and on the extent to which the standard is fully available to developers.

Personally, I look forward to the adoption of MQTT version 5. The AUTH command allows for better integration of authentication methods, and other features like topic aliases and properties bring in further use-cases for the IoT world. Additionally, shared subscriptions are beneficial for creating a data-worker relationship between devices. This last point however might step on the toes of cloud vendors, as their purpose is to handle that load inside the cloud.

I would like to close this post with questions to you.

  • What is your experience with those cloud solutions?
  • Is there anything in the list I might have missed?
  • Should other vendors or companies be included as well?

Looking forward to your feedback…

The post Cloud providers and telemetry via Qt MQTT appeared first on Qt Blog.

Little Trouble in Big Data – Part 3

Wednesday 28th of August 2019 12:22:54 PM

In the previous two blogs in this series I showed how solving an apparently simple problem about loading a lot of data into RAM using mmap() also turned out to require a solution that improved CPU use across cores.

In this blog, I’ll show how we dealt with the bottleneck problems that ensued, and finally, how we turned to coarse threading to utilize the available cores as well as possible whilst keeping the physical memory usage doable.

These are the stages we went through:

  1. Preprocessing
  2. Loading the Data
  3. Fine-grained Threading

Now we move to Stage 4, where we tackle the bottleneck problem:

4 Preprocessing Reprise

So far so good, right? Well, yes and no. We’ve improved things to use multithreading and SIMD, but profiling in VTune still showed bottlenecks, specifically in the IO subsystem: paging the data from disk into system memory (via mmap). The access pattern through the data is the classic thing we see with textures in OpenGL when the texture data doesn’t all fit into GPU memory: it ends up thrashing the texture cache with the typical throw-out-the-least-recently-used behavior, as of course we need the oldest stuff again on the next iteration of the outer loop.

This is where the expanded, preprocessed data is biting us in the backside. We saved runtime cost at the expense of disk and RAM usage and this is now the biggest bottleneck to the point where we can’t feed the data from disk (SSD) to the CPU fast enough to keep it fully occupied.

The obvious thing would be to reduce the data size, but how? We can’t use the old BED file format, as the quantization used is too coarse for the offset + scaled data. We can’t use lower-precision floats, as that only reduces the size by a small constant factor. Inspecting the data of some columns in the matrix, I noticed that there are very many repeated values, which makes total sense given the highly quantized input data. So we tried compressing each column using zlib. This worked like magic – the preprocessed data came out only 5% larger than the quantized original BED data file!

Because we are compressing each column of the matrix independently, and the compression ratio varies depending upon the needed dictionary size and the distribution of repeated elements throughout the column, we need a way to be able to find the start and end of each column in the compressed preprocessed bed file. So, whilst preprocessing, we also write out a binary index companion file which, for each column, stores the offset of the column start in the main file and its byte size.

So when wanting to process a column of data in the inner loop, we look up in the index file the extent of the column’s compressed representation in the mmap()‘d file, decompress that into a buffer of the right size (we know how many elements each column has: it’s the number of people) and then wrap that up in the Eigen Map helper.
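A minimal sketch of that lookup-decompress-wrap step is shown below. The names ColumnIndexEntry and decompressColumn as well as the float element type are made up for illustration; the real code may use a different element type or zlib's streaming API.

// Hedged sketch of the per-column lookup and decompression described above.
// ColumnIndexEntry, decompressColumn and the float element type are made up
// for illustration; the real code may differ.
#include <cstdint>
#include <vector>
#include <zlib.h>
#include <Eigen/Dense>

struct ColumnIndexEntry {
    uint64_t offset;          // start of the compressed column in the mmap()'d file
    uint64_t compressedSize;  // size of the compressed column in bytes
};

Eigen::Map<Eigen::VectorXf> decompressColumn(const unsigned char *mappedData,
                                             const ColumnIndexEntry &entry,
                                             std::size_t numPeople,
                                             std::vector<float> &buffer)
{
    // One element per individual; the count is known up front.
    buffer.resize(numPeople);
    uLongf destLen = static_cast<uLongf>(buffer.size() * sizeof(float));

    // One-shot zlib decompression straight out of the memory-mapped file.
    const int rc = uncompress(reinterpret_cast<Bytef *>(buffer.data()), &destLen,
                              mappedData + entry.offset,
                              static_cast<uLong>(entry.compressedSize));
    if (rc != Z_OK)
        buffer.assign(numPeople, 0.0f); // illustrative error handling only

    // Wrap the decompressed values without copying so Eigen expressions can use them.
    return Eigen::Map<Eigen::VectorXf>(buffer.data(),
                                       static_cast<Eigen::Index>(numPeople));
}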

Using zlib like this really helped in reducing the storage and memory needed. However, now profiling showed that the bottleneck had shifted to the decompression of the column data. Once again, we have improved things, but we still can’t keep the CPU fed with enough data to occupy it for that inner loop workload.

5 Coarse Threading

How to proceed from here? What we need is a way to balance the CPU threads and cycles used for decompressing the column data with the threads and cycles used to then analyze each column. Remember that we are already using SIMD vectorization and parallel_for and parallel_reduce for the inner loop workload.

After thinking over this problem for a while, I decided to have a go at solving it with another feature of Intel TBB: the flow graph. The flow graph is a high-level, data-driven way to construct parallel algorithms. Once again, behind the scenes this eventually gets decomposed into the threadpool + tasks as used by parallel_for and friends.

The idea is that you construct a graph from various pre-defined node types into which you can plug lambdas for performing certain operations. You can then set options on the different node types and connect them with edges to form a data flow graph. Once set up, you send data into the graph via a simple message class/struct and it flows through until the results fall out of the bottom of the graph.

There are many node types available but for our needs just a few will do:

  • Function node: Use this along with a provided lambda to perform some operation on your data e.g. decompress a column of data or perform the doStuff() inner loop work. This node type can be customized as to how many parallel instantiations of tasks it can make from serial behavior to any positive number. We will have need for both as we shall see.
  • Sequencer node: Use this node to ensure that data arrives at later parts of the flow graph in the correct order. Internally it buffers incoming messages and uses a provided comparison functor to re-order the messages ready for output to successor nodes.
  • Limiter node: Use this node type to throttle the throughput of the graph. We can tell it a maximum number of messages to buffer from predecessor nodes. Once it reaches this limit it blocks any more input messages until another node triggers it to continue.

I’ve made some very simple test cases of the flow graph in case you want to see how it works and how I built up to the final graph we used in practice.

The final graph used looks like this:

A few things to note here:

  1. We have a function node to perform the column decompression. This is allowed to use multiple parallel tasks as each column can be decompressed independently of the others due to the way we compressed the data at preprocess time.
  2. To stop this from decompressing the entire data set as fast as possible and blowing up our memory usage, we limit this with a limiter node set to some small number roughly equal to the number of cores.
  3. We have a second function node limited to sequential behavior that calls back to our algorithm class to do the actual work on each decompressed column of data.

Then we have the two ordering nodes. Why do we need two of them? The latter one ensures that the data coming out of the decompression node tasks arrives in the order that we expect (as queued up by the inner loop). This is needed because, due to the kernel time-slicing the CPU threads, they may finish in a different order to which they were enqueued.

The requirement for the first ordering node is a little more subtle. Without it, the limiter node may select messages from the input in an order such that it fills up its internal buffer but without picking up the first message which it needs to send as an output. Without the ordering node up front, the combination of the second ordering node and the limiter node may cause the graph to effectively deadlock. The second ordering node would be waiting for the nth message, but the limiter node is already filled up with messages which do not include the nth one.

Finally, the last function node which processes the “sequential” (but still SIMD and parallel_for pimped up) part of the work uses a graph edge to signal back to the limiter node when it is done so that the limiter node can then throw the next column of data at the decompressor function node.
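To make the shape of the graph concrete, here is a hedged, stripped-down sketch of how such a graph can be wired up with Intel TBB. The message type, the node bodies and the column count are placeholders, not the project's actual code (which is linked below).

// Hedged sketch of the flow graph described above; ColumnMessage, the node
// bodies and the column count are placeholders, not the actual project code.
#include <tbb/flow_graph.h>
#include <cstddef>
#include <vector>

struct ColumnMessage {
    std::size_t index;                 // column number, used by the sequencer nodes
    std::vector<float> decompressed;   // filled in by the decompression node
};

int main()
{
    using namespace tbb::flow;
    const std::size_t parallelColumns = 8; // roughly the number of cores
    graph g;

    // First ordering node: keeps columns flowing into the limiter in submission order.
    sequencer_node<ColumnMessage> ordering1(g, [](const ColumnMessage &m) { return m.index; });

    // Throttles how many columns are decompressed (and held in memory) at once.
    limiter_node<ColumnMessage> limiter(g, parallelColumns);

    // Decompression may run on several columns in parallel.
    function_node<ColumnMessage, ColumnMessage> decompress(g, unlimited,
        [](ColumnMessage m) {
            m.decompressed = std::vector<float>(10, 1.0f); // placeholder for decompressColumn(...)
            return m;
        });

    // Second ordering node: re-establishes the original order after parallel decompression.
    sequencer_node<ColumnMessage> ordering2(g, [](const ColumnMessage &m) { return m.index; });

    // The serial "doStuff()" stage; internally it can still use parallel_for and SIMD.
    function_node<ColumnMessage, continue_msg> process(g, serial,
        [](const ColumnMessage &m) {
            // ... work on m.decompressed ...
            return continue_msg();
        });

    make_edge(ordering1, limiter);
    make_edge(limiter, decompress);
    make_edge(decompress, ordering2);
    make_edge(ordering2, process);
    // Feedback edge: releases the next column once one has been processed
    // (recent TBB; older versions expose this as the 'decrement' member instead).
    make_edge(process, limiter.decrementer());

    for (std::size_t i = 0; i < 100; ++i)
        ordering1.try_put(ColumnMessage{i, {}});
    g.wait_for_all();
    return 0;
}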

With this setup, we have a high level algorithm which is self-balancing between the decompression steps and the sequential doStuff() processing! That is actually really nice, plus it is super simple to express in just a few lines of code and it remains readable for future maintenance. The code to setup this graph and to queue up the work for each iteration is available at github.

The resulting code now uses 100% of all available cores and balances the work of decompression and processing of the data. Meanwhile the data processing also utilizes all cores well. The upside of moving the inner loop to be represented by the flow graph is that the decompression + column processing went from 12.5s per iteration (on my hexacore i7) to 3s. The 12.5s was measured with the sequential workload already using parallel_for and SIMD. So this is another very good saving.

Summary

We have shown how a simple “How do I use mmap()?” mentoring project has grown beyond its initial scope and how we have used mmap, Eigen, parallel_for/parallel_reduce, flow graphs and zlib to make the problem nicely tractable. This has brought a nice set of performance improvements whilst at the same time keeping the disk and RAM usage within feasible limits.

  • Shifted work that can be done once to a preprocessing step
  • Kept the preprocessed data size down as low as possible with compression
  • Managed to load even large datasets into memory at once with mmap
  • Parallelized the inner loop operations at a low level with parallel_for
  • Parallelized the high-level loop using the flow graph and made it self-balancing
  • Fairly optimally utilized the available cores whilst keeping the physical memory usage down (roughly the number of threads used * column size).

Thanks for reading this far! I hope this helps reduce your troubles, when dealing with big data issues.

The post Little Trouble in Big Data – Part 3 appeared first on KDAB.

GSoC ’19 comes to an end

Wednesday 28th of August 2019 12:08:39 PM

GSoC period is officially over and here is a final report of my work in the past 3 months.

QmlRenderer library

The library will be doing the heavy lifting by rendering QML templates to QImage frames using QQuickRenderControl in the new MLT QML producer. Parameters that can be manipulated are:

  • FPS
  • Duration
  • DPI
  • Image Format

The library can be tested using QmlRender (a CLI executable).

Example:

./QmlRender -i "/path/to/input/QML/file.qml" -o "/path/to/output/directory/for/frames"

./QmlRender --help reveals all the available options that may be manipulated.

MLT QML Producer

What has been done so far?

  • A working and tested QmlRenderer library
  • Basic code to the QML MLT producer

What work needs to be done?

  • Full-fledged MLT QML producer
  • Basic titler on Kdenlive side to test

Check out the full in-depth GSoC report here.

The whole experience over the last 8 months, right from the first patch to the titler project, has been great, with a steep learning curve, but I have thoroughly enjoyed the whole process. I seek to continue improving Kdenlive and I’m really thankful to all the Kdenlive developers and the community for presenting me with this fine opportunity to work on the revamp of an important feature in our beloved editor.

Although GSoC is “officially” over, the new Titler as a project in whole is far from done and I will continue working on it. So nothing really changes. 

The next update will be when we get a working backend set up – until then!

polkit-qt-1 0.113.0 Released

Tuesday 27th of August 2019 07:09:59 PM

Some 5 years after the previous release KDE has made a new release of polkit-qt-1, versioned 0.113.0.

Polkit (formerly PolicyKit) is a component for controlling system-wide privileges in Unix-like operating systems. It provides an organized way for non-privileged processes to communicate with privileged ones.   Polkit has an authorization API intended to be used by privileged programs (“MECHANISMS”) offering service to unprivileged programs (“CLIENTS”).

Polkit Qt provides Qt bindings and UI.

This release was done ahead of additions to KIO to support Polkit.

SHA-256:
5b866a2954ef10ffb66156e2fe8ad0321b5528a8df2e4a91b02f5041ce5563a7
GPG fingerprint:
D81C0CB38EB725EF6691C385BB463350D6EF31EF

Notable changes since 0.112.0
———————————————————
– Add support for passing details to polkit
– Remove support for Qt4

https://download.kde.org/stable/polkit-qt-1/

Thanks to Heiko Becker for his work on this release.

Full changelog

  •  Bump version for release
  •  Don’t set version numbers as INT cache entries
  •  Move cmake_minimum_required to the top of CMakeLists.txt
  •  Remove support for Qt4
  •  Remove unneded documentation
  •  authority: add support for passing details to polkit
    https://phabricator.kde.org/D18845
  •  Fix typo in comments
  •  polkitqtlistener.cpp – pedantic
  •  Fix build with -DBUILD_TEST=TRUE
  •  Allow compilation with older polkit versions
  •  Fix compilation with Qt5.6
  •  Drop use of deprecated Qt functions REVIEW: 126747
  •  Add wrapper for polkit_system_bus_name_get_user_sync
  •  Fix QDBusArgument assertion
  • do not use global static systembus instance

 

Day 92 – The last day

Monday 26th of August 2019 12:08:00 PM

After the second coding period, I was at the beginning of the backend development. I’ll list and explain what was done in this period. After GSoC, I’ll still work on Khipu to move it out of Beta soon; then, I’ll fix the bugs and try to implement the things that are missing and new features.

GitHub link: https://github.com/KarinaPassos/khipu3

Finished or almost finished tasks:

Plot Dictionaries: an old Khipu feature, it gives the user examples of valid inputs and of what the old program can do. The old version was a window with premade examples. I tried to make it simpler, and it now only creates a 2D and a 3D space with some examples. I changed the name to “Plot Examples”.

Save/Load files: the save option will save your spaces and plots in a JSON file, and the load option will load them, and it only accepts JSON files.

Edit space name: when the user creates a new space, the default name is “2D Space” or “3D Space”. This option allows the user to rename the space.

Edit plot dialog: it’s currently a dialog that should provide the user the options to set the expression, visibility and color of a plot in some space. It’s currently crashing the program in some situations.

Search box: it should provide the user the option to find a space. It’s working, but there’s a bug when the user clicks on a search result. For example: if the user clicks on the third search result, it will open the third space in the original list, not the third result.

Menubar visibility: I chose the F1 key to show/hide the menubar, but it’s not working and I still don’t know why.

Old features that still were not implemented:

Help menu: the menu with information about KDE, Khipu’s developers, documentation, bug report.

Grid settings: it allows the user to select which kind of grid (normal or polar) will be used and set the grid color.

Cylindrical surfaces, parametric surfaces, spatial curves, parametric curves, polar curves: I used KAlgebra mobile as a reference for my code, so I only created a simple box which receives a simple expression. You still can’t set intervals or create other types of functions. But you can plot simple and implicit functions, you can plot lines and planes like “x=3”, and solve equations like “sin(x) = 1”.

Snapshot: a photo of each space to appear with the space information.

Open Source is more than licenses

Monday 26th of August 2019 10:15:26 AM

A few weeks ago I was honored to deliver the keynote of the Open Source Awards in Edinburgh. I decided to talk about a subject that I wanted to talk about for quite some time but never found the right opportunity for. There is no video recording of my talk but several people asked me for a summary. So I decided to use some spare time in a plane to summarize it in a blog post.

I started to use computers and write software in the early 80s when I was 10 years old. This was also the time when Richard Stallman wrote the 4 freedoms, started the GNU project, founded the FSF and created the GPL. His idea was that users and developers should be in control of the computer they own which requires Free Software. At the time the computing experience was only the personal computer in front of you and the hopefully Free and Open Source software running on it.

The equation was (Personal Hardware) + (Free Software) = (Digital Freedom)

In the meantime the IT world has changed and evolved a lot. Now we have ubiquitous internet access, computer in cars, TVs, watches and other IoT devices. We have the full mobile revolution. We have cloud computing where the data storage and compute are distributed over different data centers owned and controlled by different people and organizations all over the world. We have strong software patents, DRM, code signing and other crypto, software as a service, more closed hardware, social networking and the power of the network effect.

Overall the world has changed a lot since the 80s. Most of the Open Source and Free Software community still focuses mainly on software licenses. I’m asking myself if we are not missing the bigger picture by limiting the Free Software and Open Source movement to licensing questions only.

Richard Stallman wanted to be in control of his computer. Let’s go through some of the current big questions regarding control in IT and let’s see how we are doing:

Facebook

Facebook is lately under a lot of attack for countless violations of user privacy, being involved in election meddling, triggering a genocide in Myanmar, threatening democracy and many other things. Let’s see if Free Software would solve this problem:

If Facebook would release all the code tomorrow as Free and Open Source software, our community would be super happy. WE have won. But would it really solve any problems? I can’t run Facebook on my own computer because I don’t have a Facebook server cluster. And even if I could, it would be very lonely there, because I would be the only user. So Free Software is important and great, but it actually doesn’t give users any freedom or control in the Facebook case. More is needed than Free Software licenses.

Microsoft

I hear from a lot of people in the Free and Open Source community that Microsoft is good now. They changed under the latest CEO and are no longer the evil empire. They now ship a Linux kernel in Windows 10 and provide a lot of Free and Open Source tools in their Linux containers in the Azure Cloud. I think it’s definitely a nice step in the right direction, but their cloud solutions still have the strongest vendor lock-in, and Windows 10 is neither free in price nor gives you freedom. In fact they don’t have an Open Source business model anywhere. They just USE Linux and Open Source. So the fact that more software in the Microsoft ecosystem is now available under Free Software licenses doesn’t give any more freedom to the users.

Machine Learning

Machine Learning is an important new technology that can be used for many things, from picture recognition to voice recognition to self-driving cars. The interesting thing is that the hardware and the software alone are useless. What is also needed for a working machine learning system is the data to train the neural network. This training data is often the secret ingredient which is super valuable. So if Tesla would release all their software tomorrow as Free Software and you would buy a Tesla to have access to the hardware, then you would still be unable to study, build and improve the self-driving car functionality. You would need the millions of hours of video recordings and driver data to make your neural network useful. So Free Software alone is not enough to give users control.

5G

There is a lot of discussion in the western world about whether 5G infrastructure can be trusted. Do we know if there are back doors in cell towers if they are bought from Huawei or other Chinese companies? The Free and Open Source community answers that the software should be licensed under a Free Software license and then all is good. But can we actually check that the software running on the infrastructure is the same we have as source code? For that we would need reproducible builds, access to all the code signing and encryption keys, and the infrastructure should fetch new software updates from our update server and not the one provided by the manufacturer. So the software license is important but doesn’t give you full control and freedom.

Android

Android is a very popular mobile OS in the Free Software community. The reason is that it’s released under a Free Software license. I know a lot of Free Software activists who run a custom build of Android on their phone and only install Free Software from app stores like F-Droid. Unfortunately 99% of normal users out there don’t get these freedoms, because their phones can’t be unlocked, or they lack the technical knowledge of how to do it, or they rely on software that is only available in the Google Play Store. Users are trapped in the classic vendor lock-in. So the fact that the Android core is Free Software actually doesn’t give much freedom to 99% of all its users.

So what is the conclusion?

I think the part of the Open Source and Free Software community that cares about Stallman’s 4 freedoms, about being in control of their digital lives and about user freedom has to expand its scope. Free Software licenses are needed, but are by far not enough anymore to fight for user freedom and to guarantee that users are in control of their digital life. The formula (Personal Hardware) + (Free Software) = (Digital Freedom) is not valid anymore. There are more ingredients needed. I hope that the Free Software community can and will reform itself to focus on more topics than licenses alone. The world needs people who fight for digital rights and user freedoms now more than ever.

[GSoC – 6] Achieving consistency between SDDM and Plasma

Sunday 25th of August 2019 07:16:21 PM

Previously: 1st GSoC post 2nd GSoC post 3rd GSoC post 4th GSoC post 5th GSoC post Roughly a year ago I made a post titled How I'd improve KDE Plasma - a user's point of view. I never shared the post publicly, but revisiting the first topic of the post — "my biggest pet peeve"...... Continue Reading →

Pay another respect to kritacommand--which we are going beyond

Sunday 25th of August 2019 06:34:27 PM

Your work is gonna make Krita significantly different. – Wolthera, Krita developer and digital artist

Krita’s undo system, namely kritacommand, was added 8 years ago to Calligra under the name of kundo2, as a fork of Qt’s undo framework. The use of undo commands, however, might have an even longer history. Undo commands provide a way to revert individual actions. Up to now, most (though not all) undo commands do this by providing two sets of code that do and undo the actions, respectively. Drawbacks of this system include: (1) it is not very easy to manage; (2) it may introduce duplicated code; and (3) it makes it hard to access a previous document state without actually going back to that state. What I am doing is starting to get rid of this situation.

The plan for a new system is to use shallow copies to store documents at different states. Dmitry said “it was something we really want to do and allows us to make historical brushes (fetch content from earlier document states).” And according to him, he spent years implementing copy-on-write on paint layers. He suggested I start from vector layers, which he thought would be easier since they do not need to be very thread-safe.

I completely understood that it was a challenge, but did not realize where the difficult part was until I came here. Copy-on-write is not the challenging part. We have QSharedDataPointer and almost all the work is to routinely replace the same code. Porting the tools is more difficult. The old flake tools run in the GUI thread, which makes no requirement on thread-safety. Technically we do not need to run them in a stroke / in the image thread, but with no multithreading the tools run too slowly on some computers (read as “my Thinkpad laptop”), so I am willing to take on this extra challenge. In previous posts I described how the strokes work and the problems I encountered. Besides that there are still some problems I need to face.

the HACK code in the stroke strategy

At the end of the strokes post, I proposed a fix to the crash when deleting KisNode, which was messy. After testing with Dmitry at the sprint, we discovered that the real problem lies in KoShapeManager‘s updateTreeCompressor. It is used to schedule updates of its R-tree. However, since it is run at the beginning of every other operation, Dmitry says it is no longer needed. After the compressor was removed we are safe to delete the node normally, so there is no need for such hack code.

Path tool crashing when editing calligraphic shapes

Calligraphic shapes, coming from Karbon, are shapes created by hand-drawing. They have many path points, and editing them using the path tool usually leads to a crash. Dmitry tested it with ASan and discovered the problem occurs because the path points, which are fetched in the GUI thread to paint the canvas, could be deleted while editing the shape. He suggests applying a lock to the canvas, not allowing the image and GUI threads to access the shapes concurrently.

Keeping selections after undo/redoing

This challenge is a smaller one. The shape selections were not kept, since they are not part of the layer. They were owned by the layer’s shape manager, though, but a cloned layer takes a brand-new shape manager. In addition, undo() and redo() will now replace the whole layer, so pointers to the original shapes are no longer valid. This means merely keeping the selections from the shape manager would not work. The solution is to map the selected shapes to the cloned layer, which is kept in the undo command. The strategy I use is similar to what we have done for layers: go through the whole hierarchy of the old layer and push everything into a queue; go through the hierarchy of the cloned layer in the same order and each time take the first shape in the queue; if the popped shape is in the selection, we add its counterpart in the cloned layer to our new selection.
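A generic sketch of this mapping strategy, using a placeholder Shape type instead of Krita's actual classes; it is only meant to illustrate the traversal-order trick, not the real implementation.

// Generic sketch with a placeholder Shape type; Krita's real classes differ.
#include <QList>
#include <QQueue>
#include <QSet>

struct Shape {
    QList<Shape *> children;
};

// Push a shape hierarchy into the queue in a fixed traversal order.
static void flatten(Shape *root, QQueue<Shape *> &out)
{
    out.enqueue(root);
    for (Shape *child : root->children)
        flatten(child, out);
}

// Map the selection from the old layer onto the cloned layer by walking both
// hierarchies in the same order and pairing up the shapes position by position.
QSet<Shape *> mapSelection(Shape *oldRoot, Shape *clonedRoot, const QSet<Shape *> &oldSelection)
{
    QQueue<Shape *> oldShapes;
    QQueue<Shape *> clonedShapes;
    flatten(oldRoot, oldShapes);
    flatten(clonedRoot, clonedShapes);

    QSet<Shape *> newSelection;
    while (!oldShapes.isEmpty() && !clonedShapes.isEmpty()) {
        Shape *oldShape = oldShapes.dequeue();
        Shape *clonedShape = clonedShapes.dequeue();
        if (oldSelection.contains(oldShape))
            newSelection.insert(clonedShape);
    }
    return newSelection;
}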

For now the tools should be working and the merge request is prepared for final review. Hopefully it would make its way to master soon.

KSyntaxHighlighting - Over 300 Highlightings...

Sunday 25th of August 2019 12:06:00 PM

I worked yesterday again on the Perl script that creates the highlighting update site used by e.g. Qt Creator.

I thought it would be perhaps a good idea to create some simple human readable overview with all existing highlighting definitions, too.

The result is this auto-generated Syntax Highlightings page.

Astonishingly enough, at the moment the script counts 307 highlighting definitions. I wasn’t aware that we had already crossed the 300 mark.

Still, it seems people miss some highlighting definitions, take a look at the bug list of KSyntaxHighlighting.

The bugs with requests for new definition requests got marked with [New Syntax].

I am actually not sure if we should keep bugs for such requests at all, if no patch is provided to add such a highlighting. Obviously, we want to have proper highlighting for all the stuff people use.

But, as we have these bugs at the moment, if you feel you have the time to help us, take a look. Some Perl 6 highlighting would be appreciated, or any of the others there ;=) Or perhaps you have your own itch to scratch and will provide something completely different!

Our documentation provides hints how to write a highlighting definition. Or just take a look at the existing XML files in our KSyntaxHighlighting repository.

Patches are welcome on the KDE Phabricator.

Otherwise, if you provide a new definition XML + one test case, you can attach them to the bug, too (or send them to kwrite-devel@kde.org).

As a test case, an example file in the new language is wanted, under some liberal license. This will be used to store reference results of the highlighting, to avoid later regressions and to judge the quality of the highlighting and later improvements.

We would prefer MIT licensed new files, if they are not derived from older files that enforce a different license, thanks! (in that case, it would be good to mention in some XML comment which file was used as base)

KDE Usability & Productivity: Week 85

Sunday 25th of August 2019 02:12:46 AM

I’m not dead yet! KDE’s new goal proposals have been announced, and the voting has started. But in the meantime, the Usability & Productivity initiative continues, and we’re onto week 85! We’ve got some nice stuff, so have a look:

New Features
Bugfixes & Performance Improvements
User Interface Improvements

Next week, your name could be in this list! Not sure how? Just ask! I’ve helped mentor a number of new contributors recently and I’d love to help you, too! You can also check out https://community.kde.org/Get_Involved, and find out how you can help be a part of something that really matters. You don’t have to already be a programmer. I wasn’t when I got started. Try it, you’ll like it! We don’t bite!

If you find KDE software useful, consider making a tax-deductible donation to the KDE e.V. foundation.

Kate - Document Preview Plugin - Maintainer Wanted!

Saturday 24th of August 2019 04:46:00 PM

At the moment the Document Preview plugin, which e.g. allows you to preview Markdown or other documents rendered via an embedded matching KPart, is no longer maintained.

You can find more information about why the plugin got abandoned in this phabricator ticket.

If you want to step up and keep that plugin alive and kicking, now is your chance!

Even if you don’t want to maintain it, you can help out with taking care of existing bugs for this plugin.

Just head over to the KDE Bugzilla bugs for this plugin.

Any help with this is welcome!

Preparing for KDE Akademy 2019

Saturday 24th of August 2019 08:15:00 AM

Less than two weeks to go until Akademy 2019! Quite excited to go there again, for the 16th time in a row now. Until then there are quite a few things I still need to finish.

Talks

I got three talks accepted this time, which is very nice of course, but it also implies quite some preparation work. The topics are rather diverse; all three cover aspects I ended up looking into for various reasons during the past year, and where I learned interesting or surprising things that seemed worthwhile to share.

Secure HTTP Usage - How hard can it be?

Day 1, 15:00 in U4-01, details.

This is the hardest one of the three talks to prepare, as I’m least familiar with the full depth of that subject. I did look into this topic during the past year as part of the KDE Privacy Goal, and I wasn’t too happy with what I found. So this talk will cover our current means of talking to remote services (QNetworkAccessManager, KIO, KTcpSocket, QSslSocket) and their issues (spoiler: there are more than you’d wish for), and how we could possibly move forward in this area.

To illustrate the problems, I’ve written a little demo application which can be found on KDE’s Gitlab, as well as the first patches to actually address the current shortcomings.

KPublicTransport - Real-time transport data in KDE Itinerary

Day 2, 14:00 in U4-08, details.

In last year’s presentation about KDE Itinerary I assumed that access to public transport real-time data would be a big blocker. I was wrong: after the talk I was contacted by people from the Navitia team, which led me to discover all the existing work the Open Transport community has already done in this field, one of the biggest (positive) surprises for me last year. So here I’m going to show what’s available there for us, and our interface to this, the KPublicTransport framework.

KDE Frameworks on Android

Day 2, 15:00 in U4-08, details.

While Android isn’t really my area of expertise either, I ended up digging into this as part of the work of bringing KDE Itinerary to that platform, and again I was not entirely happy with the current state. Platform-specific code in applications, even more so error-prone string-based JNI code, isn’t really something I like to maintain. Pushing more of that into the libraries that are supposed to isolate applications from this (Qt and KDE Frameworks) is the logical step here.

This one looks like it’s going to be the most challenging one to squeeze into its given time slot. I also couldn’t stop myself here from implementing a prototype to address some of the issues mentioned above.

BoFs

Due to popular demand I’ll also be hosting a BoF about creating custom KItinerary extractors, currently planned for Thursday, 9:30 in U2-02. Since most people will have had to travel to Akademy everyone should have sample data for train or plane tickets, and for accommodation bookings that we can look at to get supported. Having a very recent version of KItinerary Workbench installed will be useful for this (which as a result of preparing this is receiving inline editing capabilities for the extractor scripts, and is getting close to being able to reload even the extractor meta data without requiring an application restart). My goal for this would be to get more people into contributing new or improved custom extractor scripts for so far not supported booking data.

I’ll also try to bring the latest version of the Plasma Mobile Yocto demo, in case we want to do an embedded device BoF again.

Itinerary

While we have been fixing a few Akademy-related KDE Itinerary bugs already, one thing that doesn’t work yet is automatically integrating the Akademy events into the itinerary (that’ll need the browser integration).

Until then, there is a manually created file that can be imported into KDE Itinerary here. At the moment this only contains the conference days, for the welcome and social event there are no locations available yet. Having the events in the itinerary will make the navigation features more useful, e.g. to find your way from your hotel to the event.

See you in Milan :)

I am going to Akademy

Friday 23rd of August 2019 10:12:28 PM

One more edition of KDE Akademy approaches, and here I am waiting for the day to pack my bags and embark on a 15-hour adventure from Brazil to Milan.

As always I am excited to meet my friends, and have some fun with all people of the community. Discussing our present and future.

And this year I am bringing a pack of KDE 3D-printed key holders.

So, if you want one or more of these, hurry because I will only have around 50 units.

See you all very soon!

That’s all folks!

Final Days of GSoC 2019

Friday 23rd of August 2019 08:53:00 PM
Hello Friends! The final evaluation is coming, and this brings the GSoC project to an end. I believe it is my duty to let you know all about my contributions to LabPlot during this project. I will try to make this post self-contained and cover every detail. Let's try to answer some general questions. If something is left out, please feel free to comment and I will get back to you as soon as possible.
Which statistical tests got added? This is the final list of statistical tests added during the course of the project:
  • T-Test
    • Two-Sample Independent
    • Two Sample Paired
    • One Sample
  • Z-Test
    • Two-Sample Independent
  • ANOVA
    • One Way ANOVA
    • Two Way ANOVA
  • Levene Test: To check for the assumption of homogeneity of variance between populations
  • Correlation Coefficient
    • Pearson's R
    • Kendall's Tau
    • Spearman Rank
    • Chi-Square Test for Independence
So, as many of you must have noticed, I have added almost all the features promised in the proposal. For the one or two that are left (noted down as TODOs), all the basic structure is already created and it will not be difficult to complete them in the future.
These features are tested using automatic unit tests. How can these statistical tests be selected? You can choose a test using these beautiful docks.
Hypothesis Test dock and Correlation Coefficient dock: for the T-Test, Z-Test and ANOVA you have to go to the Hypothesis Test dock, and for the correlation coefficients you have to go to the Correlation Coefficient dock.
Here is a live demo. It shows how to reach these docks and which tests are available.

What should the data source type be to run these tests? Currently, you can perform the tests on data contained in a spreadsheet; LabPlot has its own spreadsheet.

You can import data from a file (CSV and other formats) or from an SQL database directly into this spreadsheet.

Now, you just have to choose the columns on which the test is to be performed, along with the other options in the dock.


Note that only columns valid for the selected test will be shown.

There can be cases where you don't have access to the whole data set, but you do have its statistics, such as the number of observations, the sample means and the sample standard deviations, or you have the data in the form of a contingency table. These cases mostly occur for the Z-Test (as the data is huge) or for the Chi-Square Test for Independence, hence this second alternative is currently available only for these two tests. For the other tests, the implementation can be extended in the future.
So, the second alternative is the statistic table. In the statistic table you can fill in the data and then continue with the test.

Here are the Statistic Tables for Z-Test and Chi-Square Test for Independence.

Chi-Square Test for Independence Statistic Table

Z-Test Statistic Table

You can enter data in these empty cells.

For the Z-Test statistic table: the row header (vertical header) is editable, so you can give the rows your own names.

For the Chi-Square statistic table: both row and column headers are editable. Moreover, you can change the number of rows and columns of this table dynamically from the dock:

Reducing the number of rows/columns erases the data in the removed rows/columns; increasing it again creates new empty cells.

Using Export to Spreadsheet, you can export this contingency table to a spreadsheet. Three columns named Independent Var.1, Independent Var2 and Dependent Var will be appended to the spreadsheet. You can then save the spreadsheet, and next time simply open it and perform the test without having to fill in the contingency table again.


You can clear all the cells of the table using the clear button at the end of the table. Note that none of the headers' contents will be erased by this.



What are these extra options in the docks? You can uncheck this checkbox to remove the assumption of equal variances between the two populations. Press the Levene's Test button to check for homogeneity of variance. Equality of variance is assumed by default. For the Two-Sample Independent T-Test you can choose whether or not to make this assumption, but for One-Way ANOVA this assumption must hold.
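
For reference (this is standard statistics, not quoted from the post): when equal variances are assumed, the two-sample t statistic pools the sample variances, which is exactly the assumption Levene's test checks:

t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad \mathrm{df} = n_1 + n_2 - 2

When the assumption is dropped, the usual alternative is the Welch statistic t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2} with Welch–Satterthwaite degrees of freedom.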

Change Significance level from here. The default value is 0.05.

Set the null and alternate hypothesis from here.
For two-sample tests:
μ = population mean of Independent Var. 1
μₒ = population mean of Independent Var. 2
For one-sample tests:
μ = population mean of the independent variable
μₒ = assumed population mean

For  One-Sample Tests, you can set the assumed population mean from here.

This option is visible when the statistical test can treat variable 1 as categorical, so that variable 2 becomes the dependent variable. The checkbox is checked automatically when the column mode of Independent Variable 1 is TEXT. You can also check this option manually if the column contains class labels in numeric form.
Using this you can finally perform a statistical test.

So finally: how do you perform statistical tests on the data? I will show you various live demos. For the first demo I will import data from a file, but for the later demos I will start directly from a pre-filled spreadsheet.
These examples are taken from various websites and open-source applications like JASP.
Demo of the Two-Sample Independent T-Test. Here I will also show the use of the checkbox Independent Var. 1 Categorical.

Now, I will give you a demo example of calculating Two-Sample Independent Z-Test using Statistic Table:


This is a demo example of calculating a Two-Way ANOVA:


This is a demo example of calculating the Pearson Correlation Coefficient Test:


This demo shows how to calculate the Chi-Square Test for Independence if you have a contingency table, and how to convert that table into a spreadsheet.


What does the result look like? Hopefully you have watched the video demos by now; there you will have seen many result tables. I have also tried to explain what each element in the result view/table means.
The result view can be divided into these three parts: the Title, the Summary View and the Result View.

In the summary view, you can see the summary of the data in the form of tables and text. It is a QTextEdit widget. The additional feature I have added is the ability to show different tooltips for separate words. There is no direct method for such a feature, so I had to subclass ToolTipTextEdit from QTextEdit. The tables are actually HTML tables, automatically generated by the code; since QTextEdit is HTML-aware, using HTML tables gives the user a feature-rich experience.
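
To illustrate the idea, here is a minimal sketch (not LabPlot's actual ToolTipTextEdit) of how a QTextEdit subclass can map the word under the mouse cursor to a tooltip; the m_wordToolTips map and setWordToolTip() are assumptions made for this example:

#include <QHash>
#include <QHelpEvent>
#include <QTextCursor>
#include <QTextEdit>
#include <QToolTip>

class ToolTipTextEdit : public QTextEdit
{
public:
    using QTextEdit::QTextEdit;

    // register a tooltip for a single word (illustrative API)
    void setWordToolTip(const QString &word, const QString &tip)
    {
        m_wordToolTips.insert(word, tip);
    }

protected:
    bool event(QEvent *e) override
    {
        if (e->type() == QEvent::ToolTip) {
            auto *helpEvent = static_cast<QHelpEvent *>(e);
            // find the word under the mouse cursor (positions are viewport coordinates)
            QTextCursor cursor = cursorForPosition(viewport()->mapFromGlobal(helpEvent->globalPos()));
            cursor.select(QTextCursor::WordUnderCursor);
            const QString word = cursor.selectedText();
            if (m_wordToolTips.contains(word))
                QToolTip::showText(helpEvent->globalPos(), m_wordToolTips.value(word), this);
            else
                QToolTip::hideText();
            return true;
        }
        return QTextEdit::event(e);
    }

private:
    QHash<QString, QString> m_wordToolTips; // word -> tooltip text
};
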
The result view shows the final result of the statistical test. It will also show errors in red if it encounters an error while performing the test. Here you can also see tooltips, but here the tooltip is per line rather than per word.
Here are the screenshots of some results. 

What is left to do? 
  • Add more tooltips in Result and Summary
  • Check for assumptions using various tests (like Levene's Test).
  • Reimplement above features for data source type: Database.
  • Integrate various tests in one workbook to show a summary to the user in a few clicks.
  • All other minor TODOs are already written as comments in source code itself.
What are the future goals? We aim to generate a single self-contained report for the data currently analysed by the user. This report will show the statistical analysis summary and graphs in one place, at a single click, without the user needing to explicitly select or instruct anything unless they feel the need to do so. The idea is to make the task of data analysis easy for the user and give them the freedom to play around with the data while keeping track of the changes occurring in the different statistical parameters.

The Conclusion: So finally, you made it to the end of the post! Kudos to you.
This also brings me to the end of the GSoC project. It was a very pleasant journey and I learnt a lot during these 3-4 months. I wouldn't have reached this point without all the help and support of my mentors (Stefan Gerlach and Alexander Semke). They are the most chilled-out people I have worked with; they calmed me down and helped me a lot in difficult times, and never got frustrated while correcting my silliest mistakes. A huge clap for both of them. 👏👏👏

Google Summer of Code 2019 may be ending but this is a start for me in this huge and friendly open source community. I will try to be as active here as possible and will not stop working on this project. 
There are many things left to be done, but I think the basic structure was already built during this project, and in the future these features can be extended very nicely.
Thank you all for reading this till the very end. Will meet you all soon with new blog posts till then take care, bubye... Alvida, Shabba Khair, Tschüss

The Sprint

Friday 23rd of August 2019 06:17:25 PM

Hi -)) haven’t posted for some time, because I was busy travelling and coding for the first half of the month. From Aug 5 to Aug 9, I went to the Krita Sprint in Deventer, Netherlands.

According to Boud, I was the first person to arrive. My flight took a transit via Hong Kong where some flights were affected due to natural and social factors, but fortunately mine was not one of them. Upon arrival in Amsterdam I got a ticket for the Intercity to Deventer. Railway constructions made me take a transfer via Utrecht Centraal, but that was not a problem at all: the station has escalators going both up to the hall, and down to the platforms (in China you can only go to the hall by stairs or elevator (which is often crowded after you get off)). When I got out of Deventer Station, Boud immediately recognized me (how?!). It was early in the morning, and the street’s quietness was broken by the sound of me dragging my suitcase. Boud led me through Deventer’s crooked streets and alleys to his house.

For the next two days people gradually arrived. I met my main mentor Dmitry (a magician!) and his tiger, Sagoskatt, which I (and many others) have mistaken for a giraffe. He was even the voice actor for Sago. He has quite a lot of insight into the code base (according to Boud, “80%”) and has solved a number of bugs in Krita (but he said he introduced a lot of bugs, ha!). I also met David Revoy (my favourite painter!), the author of Pepper and Carrot. And Tiar, our developer who started to work full-time on Krita this year; she had always volunteered to support other Krita users and was always on IRC and Reddit. And two of the other three GSoC students of the year: Blackbeard (just as his face) and Hellozee. Sh_zam could not come and lost communications due to political issues, which was really unfortunate (eh, at least now he can be reached). It feels so good to be able to see so many people in the community – they are so nice! And it is such an experience to hack in a basement church.

On Aug 7 we went to the Open Air Museum. It displays a large part of the history of the Netherlands and how its people lived. After a really delicious lunch we went out and started to do paintings. I was going to paint on my Surface using Krita, but unfortunately it ran out of battery, so I had to give up and painted on a postcard instead. The tram in the museum is my favourite (I am always fond of transit), and they even have a carhouse where lots of old vehicles stood. Except for my head hitting the ceiling of the coach three times, everything that day was wonderful.

The next day was the main meeting. In the morning we discussed the development plans for Krita. Bugs. Stability. New features. David Revoy brought up the docker size problem again, which Boud simply called “a Qt problem.” He said, “Yes I do know what to do with that, but new users probably don’t and thus we gotta address it and not solely blame Qt.” (Yeah, it troubled me a lot as well!) Another thing closely related to me was building on Windows, which has been largely neglected by KDE. In the afternoon the focus shifted to marketing. I do not know much about it, but it is a fact that we cannot produce electricity out of love. We spent quite a lot of time on the painting competition for Krita. Where it should be held. How to collect the paintings. How to filter out good pictures. Krita promotes new artists. They promote our software.

For the next two days people started leaving. I left on the 10th and then slept for a whole day when I got back to Nanjing (so tired…). On Aug 14th I left again, for Toronto, and then got back to writing code and debugging. I finally found the time to write this post today, as I finally fixed a crash in my project. It is almost finished, and soon another post will be made about it.

Cantor and the support for Jupyter notebooks at the finish line

Friday 23rd of August 2019 09:13:49 AM
Hello everyone! It's been almost three weeks since my last post and this is going to be my final post in this blog. So, I want to summarize all the work done in this GSoC project. Just to recap, the goal of the project was to add support for Jupyter notebooks to Cantor. This format is widely used in the scientific and education areas, mostly by the Jupyter application, and there is a lot of content available on the internet in this format (for example, here). By adding support for this format to Cantor we allow Cantor users to access this content. This is a short description; if you are more interested, you can find more details in my proposal.

In the previous post, I described the "maximum plan" of the Jupyter support in Cantor as mostly finished. What this means in practice for Cantor is (a small sketch of what the notebook format looks like on disk follows the list):
  • you can open Jupyter notebooks
  • you can modify Jupyter notebooks
  • you can save modified Jupyter notebooks without losing any information
  • you can save native Cantor worksheets in Jupyter notebook format
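
For context (this is not Cantor's code): on disk, a Jupyter notebook is a JSON document with a top-level "cells" array, where each cell carries a "cell_type" ("code", "markdown", …) and its "source". A minimal, hypothetical sketch of walking that structure with Qt's JSON classes could look like this:

#include <QCoreApplication>
#include <QDebug>
#include <QFile>
#include <QJsonArray>
#include <QJsonDocument>
#include <QJsonObject>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    QFile file(QStringLiteral("notebook.ipynb")); // hypothetical file name
    if (!file.open(QIODevice::ReadOnly))
        return 1;

    const QJsonObject notebook = QJsonDocument::fromJson(file.readAll()).object();
    const QJsonArray cells = notebook.value(QStringLiteral("cells")).toArray();

    for (const QJsonValue &value : cells) {
        const QJsonObject cell = value.toObject();
        const QString type = cell.value(QStringLiteral("cell_type")).toString();

        // "source" may be stored as a single string or as an array of lines
        const QJsonValue source = cell.value(QStringLiteral("source"));
        QString text;
        if (source.isArray()) {
            const QJsonArray lines = source.toArray();
            for (const QJsonValue &line : lines)
                text += line.toString();
        } else {
            text = source.toString();
        }
        qDebug() << type << ":" << text.left(40);
    }
    return 0;
}
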
To test the implemented code I used a couple of notebooks mentioned in „link to the earlier post“. But the Jupyter world doesn’t consist of only this small number of notebooks, of course. So, it was interesting to confront the code with more notebooks available out there in the wild.

I recently discovered a nice repository of Jupyter notebooks about Biomechanics and Motor Control with 70 notebooks. I hadn’t used these notebooks for testing and validation before and didn’t know anything about their content. 70 notebooks is quite a number, and my assumption was that these notebooks, without knowing them in detail, would cover many different parts and details of the Jupyter notebook format specification and would challenge my implementation to an extent that was not possible during my previous testing activities. So, this new set of notebooks was supposed to be good new test content for further and stricter validation of Cantor.

I was not disappointed. After the first round of manual testing based on this content, I found issues in 7 notebooks (63 notebooks already worked correctly!), which I addressed. Now, Cantor handles all 70 notebooks from this repository correctly.

Looking back at what was achieved this summer, the following list summarizes the project:
  • the scope for mandatory features described in the project proposal was fully realized
  • the biggest part of optional features was finalized
  • some other new features which were needed for the realization of the project were added to Cantor, like new result types, support for embedded mathematical expressions and attachments in Markdown cells, etc.
  • the new implementation was tested and considered stable enough to be merged into master and we plan to release this with Cantor 19.12
  • new dedicated tests were written to cover the new code and to avoid regressions in future, the testing framework was extended to handle project load and save steps
I prepared some screenshots of Jupyter notebooks that show the final result in Cantor:

Even though the initial goal of the project was achieved, there are still some problems and limitations in the current implementation:
  • for Markdown entries containing text with images where certain alignment properties were set, or after image size manipulations, the visualization of the content is not always correct, which is potentially a bug in Qt
  • because of small differences in syntax between the MathJax used in Jupyter notebooks and the LaTeX used for the actual rendering in Cantor, the rendering of embedded mathematical expressions is not always successful. At the moment Cantor shows an error message in such cases, but this message is often not very clear and helpful for the user
  • the Qt classes used by Cantor (without involving the full web engine) provide only limited and basic support for HTML. More complex cases like embedded YouTube videos and JavaScript don’t work at all.
That is all for the limitations, I think. Let's talk about future plans and perspectives. In my opinion, this project has reached its initial goals; it is finished now and will only need maintenance and support in terms of bug fixing and adjustments to potential format changes in the future.

Speaking more generally, this project is part of the current overall development activities in Cantor to improve the usability and stability of the application and to extend the feature set, in order to enable more workflows and to reach a bigger audience. See the 19.08 and 18.12 release announcements to read more about the developments in the recent releases of Cantor. Support for the Jupyter notebook format is a big step in this direction, but this is not all. We already have many other items in our backlog going in this direction, such as UX improvements and plot-integration improvements. Some of these items will be addressed soon. Some of them might be something for the next GSoC project next year, maybe?

I think, that's all for now. Thank you for reading this blog and thank you for your interest in my project. Working on this project was a very interesting and pleasant period of my life. I am happy that I had this opportunity and was able to contribute to KDE and especially to Cantor with the support of my mentor Alexander Semke.
So, Bye.

KDE ISO Image Writer – Release Announcement

Thursday 22nd of August 2019 08:28:08 PM

My GSoC project comes to an end and I am going to conclude this series of articles by announcing the release of a beta version of KDE ISO Image Writer.
https://download.kde.org/unstable/isoimagewriter/0.8/
https://binary-factory.kde.org/job/KDE%20ISO%20Image%20Writer_Nightly_mingw64/

Highlights of the changes

User Interface Revamp

The user interface of KDE ISO Image Writer has been revamped according to designs made by the KDE community (https://phabricator.kde.org/M113).

Windows Build

In addition to the user interface changes, code changes have been made to allow KDE ISO Image Writer to compile and run on Windows.

N.B. The Windows build currently crashes when ISO images of certain distributions are used because of a segmentation fault caused by QGpgME on Windows.

Making Sink(ed) contacts accessible to Plasma-Phonebook App

Thursday 22nd of August 2019 08:08:31 PM

Plasma-phonebook is a contacts-managing application for Plasma Mobile. The app gets its data, i.e. the contacts, from the KPeople library, which acts as the backend for the plasma-phonebook app.
The task is to expose Sink contacts, i.e. contacts synced using the KDE Sink API, to KPeople. In other words, we need to make Sink's contacts database a KPeople data source. Sink is an offline-caching, synchronization and indexing system for calendars, contacts, mails, etc.

** “How contacts are synced using sink?” will be discussed in another blog. **

Here’s what happens: when the plasma-phonebook app is started, an instance of KPeople (the backend) is created, following which all the data sources of KPeople are called upon to serve their master.
In this case, KPeopleSinkDataSource.

class KPeopleSinkDataSource : public KPeople::BasePersonsDataSource
{
public:
    KPeopleSinkDataSource(QObject *parent, const QVariantList &data);
    virtual ~KPeopleSinkDataSource();
    QString sourcePluginId() const override;
    KPeople::AllContactsMonitor* createAllContactsMonitor() override;
};

QString KPeopleSinkDataSource::sourcePluginId() const
{
    return QStringLiteral("sink");
}

AllContactsMonitor* KPeopleSinkDataSource::createAllContactsMonitor()
{
    return new KPeopleSink();
}

All the logic of the KPeople-Sink plugin lies in the class KPeopleSink, an instance of which is returned by createAllContactsMonitor() above. This class inherits from KPeople::AllContactsMonitor, which needs to be subclassed by each data source of KPeople.

All that is to be done is :
1. Fetch the list of addressbooks synced by sink.
2. Get resource-id of these address-books.
3. Fetch the list of contacts for that addressbook
4. Assign a unique URI to every contact
5. Create an object of a class inherited from AbstractContact. AbstractContact is the class through which backends provide the data of a given contact. Its virtual function customProperty, a generic method to access an arbitrary contact property, needs to be defined in the subclass (a sketch of such a subclass follows the code listing below).
6. Emit contactAdded signal.
Contacts are now added to KPeople and hence accessible to plasma-phonebook! Simple \o/

This is what code looks like:

void KPeopleSink::initialSinkContactstoKpeople()
{
    //fetch all the addressbooks synced by sink
    const QList<Addressbook> sinkAdressbooks = Sink::Store::read<Addressbook>(Sink::Query());

    Q_FOREACH(const Addressbook sinkAddressbook, sinkAdressbooks) {
        //to get resourceId
        QByteArray resourceId = sinkAddressbook.resourceInstanceIdentifier();

        //fetch all the contacts synced by sink
        const QList<Contact> sinkContacts = Sink::Store::read<Contact>(Sink::Query().resourceFilter(resourceId));

        Q_FOREACH (const Contact sinkContact, sinkContacts) {
            //get uri
            const QString uri = getUri(sinkContact, resourceId);

            //add uri of contact to set
            m_contactUriHash.insert(uri, sinkContact);

            KPeople::AbstractContact::Ptr contact(new SinkContact(sinkContact));
            Q_EMIT contactAdded(uri, contact);
        }
    }
}
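
The SinkContact class used above is not shown in the original snippet. Purely as an illustration (not the plugin's actual code), a minimal AbstractContact subclass could look like the following; note that this sketch's constructor takes already-extracted values instead of the Sink contact object passed above, and which property keys it answers is an assumption:

// header location may be <KPeopleBackend/AbstractContact> or <KPeople/AbstractContact>,
// depending on the KPeople version
#include <KPeopleBackend/AbstractContact>
#include <QStringList>
#include <QVariant>

class SinkContact : public KPeople::AbstractContact
{
public:
    SinkContact(const QString &name, const QStringList &phoneNumbers)
        : m_name(name), m_phoneNumbers(phoneNumbers) {}

    // generic accessor every KPeople backend has to implement
    QVariant customProperty(const QString &key) const override
    {
        if (key == NameProperty)
            return m_name;
        if (key == PhoneNumberProperty)
            return m_phoneNumbers.value(0);
        return QVariant();
    }

private:
    QString m_name;             // display name extracted from the Sink contact
    QStringList m_phoneNumbers; // phone numbers extracted from the Sink contact
};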

Now, suppose the plasma-phonebook app is running and in the meanwhile Sink syncs an updated addressbook from the server. In that case, the next task becomes making these changes visible in the phonebook. We set up a notifier for each resource-id that notifies us every time a contact/addressbook with that resource-id is updated.

m_notifier = new Notifier(resourceId);
m_notifier->registerHandler([=] (const Sink::Notification &notification) {
    if (notification.type == Notification::Info
        && notification.code == SyncStatus::SyncSuccess) {
        //Add program logic for updated addressbook/contact
    }
});

A hash table can be maintained to keep track of new contacts being added. Fetch the list of contacts synced by Sink; each time a contact in the list is not present in the hash table, it means a new contact has been added on the server, and hence the contactAdded signal is emitted. The new contact is then added to KPeople and hence accessible in plasma-phonebook.

const QList<Contact> sinkContacts = Sink::Store::read<Contact>(Sink::Query().resourceFilter(resourceId));
Q_FOREACH (const Contact sinkContact, sinkContacts) {
    const QString uri = getUri(sinkContact, resourceId);
    if (!m_contactUriHash.contains(uri)) {
        m_contactUriHash.insert(uri, sinkContact);
        KPeople::AbstractContact::Ptr contact(new SinkContact(sinkContact));
        Q_EMIT contactAdded(uri, contact);
    }
}

Similarly, we can code for the cases where a contact is updated or deleted; a sketch of the deleted-contact case follows below.
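
For instance, a hedged sketch (not the plugin's actual code; processRemovedContacts is a hypothetical name) of detecting contacts that disappeared from Sink after a sync, reusing the m_contactUriHash bookkeeping from above and the contactRemoved signal provided by KPeople::AllContactsMonitor:

void KPeopleSink::processRemovedContacts(const QByteArray &resourceId)
{
    const QList<Contact> sinkContacts =
        Sink::Store::read<Contact>(Sink::Query().resourceFilter(resourceId));

    // collect the URIs that still exist on the Sink side
    QSet<QString> currentUris;
    for (const Contact &sinkContact : sinkContacts)
        currentUris.insert(getUri(sinkContact, resourceId));

    // anything we know about but Sink no longer has was deleted on the server
    const QStringList knownUris = m_contactUriHash.keys();
    for (const QString &uri : knownUris) {
        if (!currentUris.contains(uri)) {
            m_contactUriHash.remove(uri);
            Q_EMIT contactRemoved(uri);
        }
    }
}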

And with this, Tada!
Our system for Nextcloud CardDav Integration on Plasma Mobile is here \o/

You can follow the similar procedure if you want to create your own contact datasource for KPeople

KPeople-Sink plugin’s code is available at: https://invent.kde.org/rpatwal/kpeople-sink
Contributions to the project are welcome.

Day 88

Thursday 22nd of August 2019 03:00:42 AM

Today, I’ll talk about my GSoC experience and won’t focus so much on Khipu, but in the next days I’ll publish a post about Khipu and what I’ve done.

As I said in earlier posts, the beginning was the most complicated part for me. I made a project plan thinking that I'd be able to complete it, and I started studying the code and the things I'd have to build many weeks before the start. But I couldn't understand the code, and I think that was my fault. I even lost three weeks after the start stuck in this situation. It was hard for me, because I was really scared about failing while at the same time dealing with my college work: in Brazil our summer (and our summer vacation) is in December-February, in July we have a three-week vacation, but GSoC lasts three months. I wasn't having a good time at college either, but with the help of my mentors I found a way to deal with both things, and everything went well.

After this complicated start, to avoid failing, my mentor suggested that I could change my project. My initial project was to create new features for Khipu and Analitza (Khipu's main library) to make it a better application and move it out of beta. My new project, then, was to refactor Khipu (using C++ and QML). I was scared because I didn't know if I'd be able to complete it, but the simplicity of QML helped me a lot, and before the first evaluation (approximately two weeks after I decided on my new project) I had finished the interface, or at least most of it.

At the start of the second period I slowed down my coding activities because I was at the end of my college semester and had to focus on my final tests. But after that, I started to work on the backend and learned about models and connections between C++ and QML, and most importantly, I improved my programming skills. I needed to learn to build a real program, and not just make small patches as I used to do in my open-source contribution history. By the end of this period, the screen was already working, showing 2D and 3D spaces and plotting functions.

So here I am, at the end of the third and last period, writing about this experience. In this period I worked on the program's buttons and options, created dialogs, improved the readability of my code, documented my functions and files, and did the technical work that I'll explain in the next post.
This period has been nice, because I gained a lot of maturity, not only for coding, but for my life. I feel I'm able to fix the bugs that are still there and to deal with new situations in my professional and even my personal life.

The purpose of this post is to share my experience with the students who will participate in the next years. So I'd like to give two pieces of advice based on my experience:
1- Don't freak out; ask for help. The time that you would lose being scared and nervous, use it to think rationally about what you can do to solve the problem.
2- Make a weekly activities schedule. It may sound obvious, but after I started doing it my productivity increased a lot, because it is easier to control how much time I'm working and to allocate time to my other activities.
And, of course, I’d like to say to KDE, Google and my mentors: thanks for this opportunity.

Second Evaluation

Wednesday 21st of August 2019 08:00:00 PM
I would say the second evaluation period, June 28, 2019 - July 22, 2019, was one of the most exciting periods of my whole GSoC project. I had passed the first evaluation, received the payment and gone partying with my brother. Things also got easier and I started enjoying coding in Qt.

During this period, I would say that I faced these two major problems.

1) For the first evaluation, I had not done unit testing; I had only tested my features manually. So, it was time to create unit tests for each feature. I started reading about Qt unit tests, analysed the source code of existing unit tests and wrote my own for the previously implemented features. These unit tests were just not good enough, as I was getting a huge relative error of around 0.1. At that time my mentor pointed it out, but I wasn't able to understand it properly. So we left it for a while and started with new features. After completing the new features (almost everything that was intended to be done according to the proposal), I started creating unit tests for these features as well. My relative error was huge again, but this time I didn't leave it for the future. I discussed it with my mentor again and then we caught the issue: I was not using enough precision in the reference answers against which the computed values had to be compared. I recalculated all the answers using a calculator and then placed them in the code. Now I am getting an accuracy of around 1.e-5, which I think is decent enough. This recalculation has only been done for the newly implemented features; the ones implemented for the first evaluation still need to be recalculated. This is explained in my report for the second evaluation.
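
To make the precision point concrete, here is a minimal sketch of a Qt unit test comparing a computed statistic against a full-precision reference value using a relative-error bound; this is not the actual LabPlot test code, and StatisticsTest, computeTValue and the numbers are made up for the example:

#include <QtTest>
#include <cmath>

class StatisticsTest : public QObject
{
    Q_OBJECT

private:
    // hypothetical helper; the real tests run the implemented statistical
    // test on fixed sample columns and return the computed statistic
    double computeTValue() const { return -2.3980511; }

private slots:
    void twoSampleIndependentTTest()
    {
        const double computed = computeTValue();
        const double expected = -2.3980512; // reference value, kept at full precision
        const double relError = std::abs((computed - expected) / expected);
        QVERIFY2(relError < 1.e-5,
                 qPrintable(QStringLiteral("relative error %1 is too large").arg(relError)));
    }
};

QTEST_MAIN(StatisticsTest)
#include "statisticstest.moc" // assumes this file is named statisticstest.cpp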

2) The second problem was the size of the source code files. My portion of the code had become large by now. The problem was that introducing new features would increase the size of the source file by a huge amount; it is not good practice to put everything in one file, and we also didn't want to repeat and rewrite the same portions of code everywhere. So we used inheritance: I created a base class (GeneralTest) for all the classes created so far and subclassed HypothesisTest and CorrelationCoefficient from it. This is also explained in my report for the second evaluation.
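
A minimal sketch of what such a hierarchy can look like (not LabPlot's actual class layout; the member names are illustrative only), with the shared input data and options in the base class and the concrete test families as subclasses:

#include <QString>
#include <QVector>

class GeneralTest
{
public:
    virtual ~GeneralTest() = default;
    void setColumns(const QVector<QVector<double>> &columns) { m_columns = columns; }
    void setSignificanceLevel(double alpha) { m_alpha = alpha; }
    virtual void performTest() = 0;           // each test family implements this

protected:
    QVector<QVector<double>> m_columns;       // shared input data
    double m_alpha = 0.05;                    // shared options
    QString m_resultHtml;                     // shared result/summary output
};

class HypothesisTest : public GeneralTest
{
public:
    enum class Type { TTest, ZTest, Anova };
    void setType(Type t) { m_type = t; }
    void performTest() override { /* t-test / z-test / ANOVA logic */ }

private:
    Type m_type = Type::TTest;
};

class CorrelationCoefficient : public GeneralTest
{
public:
    enum class Type { Pearson, Kendall, Spearman, ChiSquare };
    void performTest() override { /* correlation computations */ }
};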

For more technical details of all that was done in this period, you can refer to my report:
https://docs.google.com/document/d/1qgss0AssIb3HJIDeAYIos2ig37tk_8UWqDsn4OwDPrQ/edit?usp=sharing


Thank you so much for your interest and reading so patiently. Bye, take care.. see you soon.


More in Tux Machines

SUSE/OpenSUSE: Ceph and OpenSUSE's Tumbleweed Progress

  • Can I deploy Ceph on older hardware?

    You just retired a bunch of servers and disk arrays, but before you place hundreds of thousands or millions of dollars’ worth of equipment on the curb, you’re wondering if you can use it for a Ceph-based storage solution like SUSE Enterprise Storage. The answer is: maybe. SUSE prides itself on supporting a wide range of hardware, from blades to retail terminals to IoT devices. In fact, SUSE makes it possible to easily deploy a wide range of software on that hardware and certify it will work through the SUSE YES Certification Program. SUSE Yes Certification assures your IHV equipment is fully compatible with SUSE software, including SUSE Enterprise Storage.

  • openSUSE Tumbleweed – Review of the week 2019/42

    Another week has passed with again four snapshots published. This pace seems to be holding pretty solid and I think it’s not the worst speed there is. During this week, we have released the snapshots 1011, 1012, 1014 and 1016. As usual, some were smaller, some were bigger.

EPA and EPAAR

  • EPA Rule Will Make Its Custom Code Open Source By Default

    The Environmental Protection Agency is getting ready to default to making all its custom code open source, finally meeting an Office of Management and Budget policy instituted during the last administration.

    The EPA will publish a notice Friday in the Federal Register soliciting public comment on a new open-source policy that will be added to the agency’s acquisition regulations. The clause—which will be added to all EPA contracts that include the use of open-source software or the development of custom code that may or may not be shared widely—will require contractors to provide the agency with all “underlying source code, license file, related files, build instructions, software user’s guides, automated test suites and other associated documentation as applicable,” according to the notice.

  • Environmental Protection Agency Acquisition Regulation (EPAAR); Open Source Software

    A Proposed Rule by the Environmental Protection Agency on 10/18/2019

    [...]

    The EPA is writing a new EPAAR clause to address open source software requirements at EPA, so that the EPA can share custom-developed code as open source code developed under its procurements, in accordance with Office of Management and Budget's (OMB) Memorandum M-16-21, Federal Source Code Policy: Achieving Efficiency, Transparency, and Innovation through Reusable and Open Source Software. In meeting the requirements of Memorandum M-16-21 the EPA will be providing an enterprise code inventory indicating if the new code (source code or code) was custom-developed for, or by, the agency; or if the code is available for Federal reuse; or if the code is available publicly as open source code; or if the code cannot be made available due to specific exceptions.

Samsung discontinues ‘Linux on DeX’ program

  • Samsung discontinues ‘Linux on DeX’ program, removing support w/ Android 10

    Late last year, Samsung and Canonical partnered on an app that allowed select Galaxy phones to run a full Linux desktop on top of Android. Less than a year later, Samsung has announced that they’re discontinuing the Linux on DeX program, coinciding with the update to Android 10. One of the sci-fi-style dreams that many of us have had since the onset of smartphones is the idea of plugging your phone into a desktop-size monitor to get a desktop-style experience. Through the years, many have attempted it in earnest, and the latest offering from Samsung brought an interesting approach.

  • Samsung Calls It Quits on the ‘Linux on DeX’ Project

    Samsung DeX, if you have heard of it, allows the users to turn their Galaxy phones into desktop PCs simply by connecting a monitor and other peripherals. The company made DeX more welcoming and useful for Galaxy flagship users by partnering with Canonical earlier last year. It made it possible for users to run a full Linux desktop instance on its DeX-supported flagship phones. This was an amazing feature for developers and users who didn’t really like carrying a laptop with them. They could rely on their Galaxy flagship (including the Galaxy S and Note-series) for a desktop-like experience, running Ubuntu on the move. However, the response to Linux on DeX seems to have been lackluster and Samsung has decided to shutter this project.

  • Samsung is discontinuing Linux support on Dex

Samsung goes on to explain that, starting with its Android 10 beta ROMs already rolling out on certain devices, Linux support will be removed from DeX altogether. This does make us wonder if, perhaps, the third-party OS emulation setup Samsung was employing to get Linux to work in the first place somehow breaks certain rules or security policies Google implemented with the latest Android version. Regardless of whether or not this is the case, if you are currently using Linux on DeX, you definitely want to start keeping regular backups of your data, since, given current developments, even staying on Android 9 and not updating your phone's Android OS might not be a sure-fire way to keep the feature running.

Android Leftovers