today's howtos
-
Bash scripting quirks & safety tips
-
grep vs AWK vs Ruby, and a uniq disappointment
In my data-cleaning work I often make up tallies of selected individual characters from big, UTF-8-encoded data files. What's the best way to do this? As shown below, I've tried grep/sort/uniq, AWK and Ruby, and AWK's the fastest. The trials also revealed an unexpected problem with the uniq program in GNU coreutils.
My testbed was a humongous data table called reference.txt. The file contains 233 million characters, some of which are strange Unicode items.
-
How to Setup Docker Private Registry on CentOS 7.x / RHEL 7.x
-
How to Install a DHCP Server in Ubuntu and Debian
-
A Python script for fixing smart quotes in text
-
Protect Your Online Privacy With The Tor Browser Bundle
-
Technical Workshop Guidelines
-
issue #73: OpenSSL, Fossjobs, bcachefs, tmuxp, Gitlab, netbox, udocker, iptables & more
- Login or register to post comments
- Printer-friendly version
- 1175 reads
- PDF version
More in Tux Machines
- Highlights
- Front Page
- Latest Headlines
- Archive
- Recent comments
- All-Time Popular Stories
- Hot Topics
- New Members
digiKam 7.7.0 is releasedAfter three months of active maintenance and another bug triage, the digiKam team is proud to present version 7.7.0 of its open source digital photo manager. See below the list of most important features coming with this release. |
Dilution and Misuse of the "Linux" Brand
|
Samsung, Red Hat to Work on Linux Drivers for Future TechThe metaverse is expected to uproot system design as we know it, and Samsung is one of many hardware vendors re-imagining data center infrastructure in preparation for a parallel 3D world. Samsung is working on new memory technologies that provide faster bandwidth inside hardware for data to travel between CPUs, storage and other computing resources. The company also announced it was partnering with Red Hat to ensure these technologies have Linux compatibility. |
today's howtos
|
Recent comments
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago
1 year 11 weeks ago