Sven R. Kunze: 2016

Thursday, September 29, 2016

PostgreSQL 9.6 Released

We are finally there: there is a new major version of PostgreSQL, 9.6.

Here is the news, so go ahead and read what they've done for the community. It's just great!

Cheers!

Wednesday, September 7, 2016

PostgreSQL: Optimizing Aggregates

Aggregation of vast energy: our sun.

It's time to write about PostgreSQL 9.6 again. With Release Candidate 1 out in the wild, we are slowly approaching the end of a very interesting development cycle. Here I would like to talk about developments in the field of aggregates. Two commits, 9/552 and 9/435, will improve the performance of queries using aggregate functions and GROUP BY clauses:

Followup: systemd user instances

In my last post, I wrote about how to fix systemd user instances for older/broken systemd versions. Here, I'd like to explain how we managed to roll out the solution to more than a single host, where you can't make those changes by hand (at least not while keeping yourself sane and your customers happy).

In order to keep track here, we need to do the following:

What do you do when you need more systemd instances?

You wanna make it there? You need to go there!

Today, we had a very interesting problem. We needed systemd to run additional instances of itself to manage custom daemons. This works as follows.

PostgreSQL: Index-Only Scans with Partial Indexes

Partial sun lurking out of the water.

Another post in my PostgreSQL 9.6 series. This time, I am talking about commit 9/299. The complete discussion can be found here.

PostgreSQL: Parallel Aggregate

With PostgreSQL 9.6 looming on the horizon, I went out to sift through some of PostgreSQL's commitfests to find some interesting bits and pieces. This post is the start of a series covering commits of the next generation of the venerable database management system.

Outflow made parallel.

What is a path?

Wisecracker.

pathlib is a provisional stdlib module. However, as the current threads (here, here, here and here) on python-ideas show, it is not as easy to work with as originally intended. Once you have a Path object, it's quite easy to use what Path offers, which is a lot.

One big problem, though, is the interaction of Path objects with existing stdlib functions. Most of the latter are string-consuming functions, whereas the former are not strings at all. As far as I can see, and many agree, this is one reason why pathlib lacks broader adoption. This situation leads to the following possible resolutions:

make Path objects compatible with strings (basically make them inherit from str)
make existing stdlib functions accept Path objects (basically make them accept both and convert if needed)
do both, but that seems superfluous

Solution 2 would also affect third-party libraries as noted here.
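Solution 2 can be sketched as a helper that accepts both strings and path objects and converts at the function boundary. `read_first_line` is a hypothetical function made up for illustration; `os.fspath()` and `os.PathLike` are real, but they only exist from Python 3.6 on, so the sketch assumes 3.6 or newer.

```python
import os
from pathlib import Path
from typing import Union


def read_first_line(path: Union[str, os.PathLike]) -> str:
    """Hypothetical string-consuming helper that also accepts Path objects."""
    # os.fspath() returns str (or bytes) unchanged and calls __fspath__()
    # on path-like objects such as pathlib.Path, so everything below can
    # keep working with a plain string.
    path_str = os.fspath(path)
    with open(path_str, encoding="utf-8") as f:
        return f.readline().rstrip("\n")
```

Callers can then pass either `"notes.txt"` or `Path("notes.txt")`. Every string-consuming function, in the stdlib or in a third-party library, would need a conversion step like this.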

In order to decide appropriately, it becomes necessary to answer the following question:

p-strings

Currently, there is an interesting debate on python-ideas on the topic of "Would we like to add so-called p-strings to Python?". The p-string idea basically extends the f-string syntax which will be released in the upcoming Python 3.6.
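For comparison, this is roughly what a p-string would presumably save you from writing today: an f-string wrapped in an explicit `Path()` call. The variable names are made up, and the sketch does not reproduce any proposed syntax, only the kind of object it would produce.

```python
from pathlib import Path

user = "sven"      # illustrative values only
project = "xheap"

# What a hypothetical p"/home/{user}/{project}" might evaluate to,
# spelled out with an f-string (Python 3.6+) and Path().
project_dir = Path(f"/home/{user}/{project}")

# Path objects already support joining components with the / operator.
setup_file = project_dir / "setup.py"
print(setup_file)
```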

The "p" in p-string stands for path, and one of the alternative proposals is to add the following syntactic sugar to Python:

Python makes you a worse programmer

Thanks Luke for this interesting read: http://lukeplant.me.uk/blog/posts/why-learning-haskell-python-makes-you-a-worse-programmer/

It reminds me of English, which is substantially simpler than most other languages. What I've heard (from native speakers themselves) is that most native English speakers are not easily motivated to learn a second language, even though they know all the corresponding advantages like healthier brains, a better command of their first language, more interesting travel, etc.

A good article about why to learn a second language: http://www.omniglot.com/language/articles/benefitsoflearningalanguage.htm

Tuesday, March 22, 2016

Safe Cache Invalidation

Caches - as fragile as bubbles.

There are only two hard things in Computer Science: cache invalidation and naming things.

– Phil Karlton

And right he is. Both are true for the package that I would like to present in this post. Based on functools.lru_cache, it allows you to specify when the caches should be invalidated. In the absence of a proper name for this kind of functionality, I called it xcache, analogous to xheap and xfork.
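xcache's own API is not shown in this post. As a rough sketch of the underlying idea only, the hypothetical decorator `invalidate_on` below wraps `functools.lru_cache` and calls its real `cache_clear()` method whenever a user-supplied version function returns a new value. The names `invalidate_on`, `price_of` and the version counter are assumptions for illustration.

```python
import functools


def invalidate_on(get_version, maxsize=128):
    """Hypothetical decorator (not xcache's API): an lru_cache that is
    cleared whenever get_version() differs from the last value seen.
    Not thread-safe; a real implementation would need a lock."""
    def decorator(func):
        cached = functools.lru_cache(maxsize=maxsize)(func)
        last_version = get_version()

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal last_version
            current = get_version()
            if current != last_version:
                # The underlying data changed: drop every cached result.
                cached.cache_clear()
                last_version = current
            return cached(*args, **kwargs)

        wrapper.cache_info = cached.cache_info
        return wrapper
    return decorator


prices = {"apple": 1.0}
version = [0]  # bumped by whoever modifies `prices`


@invalidate_on(lambda: version[0])
def price_of(item):
    return prices[item]


first = price_of("apple")   # computed and cached
prices["apple"] = 2.0
stale = price_of("apple")   # still the cached 1.0: nobody signalled the change
version[0] += 1
fresh = price_of("apple")   # cache cleared, recomputed as 2.0
```

The hard part, as the quote suggests, is not clearing the cache but knowing when to do it; here that knowledge is pushed onto whoever bumps the version.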

Even Faster Heaps

An ambulance rushing by.

Heaps are about performance. So, it is time to make xheap faster again. After realizing that the actual slowdown of RemovalHeap and XHeap does not simply stem from the general overhead but from NOT using the C implementation at all, I decided to change that.
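To get a feel for how big that gap can be, here is a rough sketch that times `heapq.heappush` (implemented in C in CPython) against a straightforward pure-Python push. `py_heappush` is written only for this comparison and is not taken from xheap; timings depend on machine and interpreter, so no numbers are quoted.

```python
import heapq
import random
import timeit


def py_heappush(heap, item):
    """Pure-Python heap push: append, then sift the new item up."""
    heap.append(item)
    pos = len(heap) - 1
    while pos > 0:
        parent = (pos - 1) >> 1
        if heap[parent] <= item:
            break
        heap[pos] = heap[parent]  # move the larger parent down one level
        pos = parent
    heap[pos] = item


data = [random.random() for _ in range(10_000)]


def fill(push):
    heap = []
    for x in data:
        push(heap, x)
    return heap


c_time = timeit.timeit(lambda: fill(heapq.heappush), number=10)
py_time = timeit.timeit(lambda: fill(py_heappush), number=10)
print(f"heapq (C): {c_time:.3f}s  pure Python: {py_time:.3f}s")
```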

Raymond Tomlinson, the inventor of email, died

Raymond Tomlinson invented one of the most famous technologies of today: email.

He died on Friday.

Read more about him on Ars: http://arstechnica.com/business/2016/03/e-mail-inventor-ray-tomlinson-who-popularized-symbol-dies-at-74/

Wednesday, March 2, 2016

LRU Caches

Precious little pieces preserving the balance of nature.

Python features LRU caches via the decorator @functools.lru_cache. You can configure the size of the cache as well as whether arguments that compare equal but have different types (like 3 and 3.0) should be cached separately.

LRU stands for "least recently used", i.e. if the maximum size of the cache has been reached and a new item is to be inserted, the item that was accessed longest ago is discarded to make room for the new resident. The cache size can also be unlimited, which is especially useful for short-running scripts.
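The eviction rule can be observed with `cache_info()`, which `functools.lru_cache` attaches to the decorated function. `square` and `describe` are just illustrative functions: with `maxsize=2`, each new argument pushes out whichever entry was used least recently, and `typed=True` keeps `3` and `3.0` apart.

```python
import functools

calls = []  # records which arguments actually triggered a computation


@functools.lru_cache(maxsize=2)
def square(n):
    calls.append(n)
    return n * n


square(1)  # miss: cache holds {1}
square(2)  # miss: cache holds {1, 2}
square(1)  # hit: 1 becomes the most recently used entry
square(3)  # miss: cache is full, so 2 (least recently used) is evicted
square(2)  # miss again: 2 must be recomputed, which evicts 1
print(square.cache_info())


@functools.lru_cache(maxsize=None, typed=True)  # unlimited size
def describe(x):
    return f"{type(x).__name__}: {x}"


describe(3)
describe(3.0)  # cached separately because typed=True
```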

Let's get our hands dirty:

DROWN—Yet Another Vulnerability of TLS

Yet another vulnerability of TLS has been discovered, even affecting the latest version, 1.2, as Ars wrote:

http://arstechnica.com/security/2016/03/more-than-13-million-https-websites-imperiled-by-new-decryption-attack/

Best,
Sven

Designing xfork

Recently, I came to know a small team working on a problem which they were trying to solve with threads. As expected, problems popped up soon and development slowed down considerably. So, based on the previous post, I would like to lay out my intentions and design decisions regarding xfork, a module I've written and actively maintain as part of my analysis of the newly introduced async/await syntax.

Concurrency is a hard engineering problem.

Take it seriously and even consider not being concurrent a valid option.

Design Assumptions

I created xfork from the following observations based on my own experience. Developers usually:

Concurrency in Python

More speed by having two rails. Not always true.

Last year, PEP 0492 got accepted, which introduced coroutines and async/await to Python. Around that time, I started subscribing to some Python mailing lists and have participated in discussions ever since. I wondered how ordinary Python developers can write code that can be executed in parallel or at least concurrently. Specifically regarding asyncio (coroutines) and concurrency in general, we compiled a survey, which I want to record here.
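For readers who have not used the PEP 492 syntax yet, here is a minimal sketch of two coroutines running concurrently (not in parallel) on a single thread with asyncio. `fetch` and the delays are made up, `asyncio.sleep` merely stands in for real I/O, and `asyncio.run` assumes Python 3.7 or newer.

```python
import asyncio
import time


async def fetch(name, delay):
    # Stand-in for an I/O-bound operation; awaiting hands control back
    # to the event loop so other coroutines can make progress meanwhile.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather() runs both coroutines concurrently and returns their
    # results in the order the coroutines were passed in.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Both waits overlap, so the total is close to 0.2 s rather than 0.4 s, even though only one thread is involved.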

My Python IDE Journey

Pick one.

This post is not intended as advertising but to illustrate my journey to the Python IDE I currently use. I have tried several in recent years due to educational and professional needs as well as to satisfy my curiosity.

First Stop

Everything starts with gedit, nano and vim, right? They are not quite full IDEs, but it's a start. You can at least write code and have some syntax highlighting available. To this day, a colleague of mine uses vim with tons of plugins featuring "go to definition", "find usages", "code completion", "project nav tree", etc. So, it's quite possible to work with simple editors and enhance them indefinitely.

As you can imagine, I was looking for something that goes beyond the venerable terminal. So, I started looking for an alternative with the following properties (in order of priority):

Let's go down the rabbit hole!

Things can be topsy-turvy when considered upside down.

As mentioned in the previous post, there is an interesting and at the same time weird piece of code duplication in RemovalHeap and XHeap that is necessary to make them work properly. This post will cover this oddity in depth.

Imagine you want to count how many times items are set in a list. So, instead of providing a native list object, you write your own class like this:
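A minimal sketch of such a class, assuming the goal is to count item assignments by overriding `__setitem__` on a `list` subclass. The name `CountingList` is made up for illustration.

```python
import heapq


class CountingList(list):
    """Illustrative list subclass counting assignments via obj[i] = x."""

    def __init__(self, *args):
        super().__init__(*args)
        self.set_count = 0

    def __setitem__(self, index, value):
        self.set_count += 1
        super().__setitem__(index, value)


items = CountingList([5, 3, 8])
items[0] = 1            # goes through __setitem__
print(items.set_count)  # 1

heapq.heappush(items, 0)  # sifts 0 to the front, moving other items around
print(items.set_count)
```

In CPython, the second print still shows 1: the C implementation of heapq accepts list subclasses but writes into the list's internal storage directly, so the overridden `__setitem__` is never called, even though items were moved.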

The xheap Benchmark

These are the inlets of a steam engine. That means it's time to perform some serious measurements!

We are going to compare xheap and heapq. The benchmark suite can be found right next to the source.

The Competitors

heapq - collection of heap functions from Python's stdlib, written in C
xheap - object-oriented wrappers for heapq
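The suite itself is not reproduced here. As a rough illustration of what such a comparison measures, the sketch below times bare `heapq` calls against a thin object-oriented wrapper. `ThinHeap` is a hypothetical stand-in, not xheap's actual class, and the numbers will vary by machine.

```python
import heapq
import random
import timeit


class ThinHeap:
    """Hypothetical minimal OO wrapper: heap operations as methods."""

    def __init__(self):
        self._data = []

    def push(self, item):
        heapq.heappush(self._data, item)

    def pop(self):
        return heapq.heappop(self._data)

    def __len__(self):
        return len(self._data)


data = [random.random() for _ in range(10_000)]


def with_heapq():
    heap = []
    for x in data:
        heapq.heappush(heap, x)
    return [heapq.heappop(heap) for _ in range(len(heap))]


def with_wrapper():
    heap = ThinHeap()
    for x in data:
        heap.push(x)
    return [heap.pop() for _ in range(len(heap))]


print("heapq:  ", timeit.timeit(with_heapq, number=10))
print("wrapper:", timeit.timeit(with_wrapper, number=10))
```

Both versions do the same heap work in C; the difference is the extra Python-level method call per operation.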

Fast Object-Oriented Heap Implementation

There comes light to the darkness of your heaps.

This is the third post in a series of heap-related ones. See here and here for the backstory.

In the last post, we found the heapq module lacking important features. Average Joe Dev doesn't want to clutter up his source code and re-implement the same features all over the place to rectify the shortcomings of heapq. Understandably, Python core devs don't want to compromise on the performance of heapq either: being fast is the mission of a heap.
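One commonly missed feature is custom ordering. Here is a minimal sketch of how an object-oriented wrapper can add it while leaving the actual heap work to heapq's C functions. `KeyHeap` and its methods are hypothetical, not xheap's API. Each item is stored as a `(key, counter, item)` tuple; the unique counter breaks ties, so the items themselves never need to be comparable.

```python
import heapq
import itertools


class KeyHeap:
    """Hypothetical heap ordered by key(item), backed by heapq."""

    def __init__(self, items=(), key=lambda x: x):
        self._key = key
        self._counter = itertools.count()  # tie-breaker, keeps insertion order
        self._data = [(key(item), next(self._counter), item) for item in items]
        heapq.heapify(self._data)

    def push(self, item):
        heapq.heappush(self._data, (self._key(item), next(self._counter), item))

    def pop(self):
        return heapq.heappop(self._data)[2]

    def peek(self):
        return self._data[0][2]

    def __len__(self):
        return len(self._data)


tasks = KeyHeap(
    [{"name": "deploy", "prio": 2}, {"name": "fix", "prio": 1}],
    key=lambda t: t["prio"],
)
tasks.push({"name": "test", "prio": 1})
order = [tasks.pop()["name"] for _ in range(len(tasks))]
print(order)  # ['fix', 'test', 'deploy']
```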

Another Pleasant Announcement!

"Oracle deprecates the Java browser plugin, prepares for its demise" says it all. Thanks, Ars, for bringing this to my attention.

Tuesday, January 26, 2016

heapq and Missing Features

You sometimes need a bit more convenience than a barrel to store your water.

Recently, I wrote about heaps in Python and promised a sequel. Here you are. This time, we ponder the shortcomings of Python's heap module, heapq.

So, what's wrong with it? Nothing at all if you have basic needs: fast push and pop onto a heap. However, as usual, at a certain point you want more features in your program, which in turn raise the requirements for the heap implementation you use. If that happens, you usually

Heaps in Python

This time, we dig into the matter of how we can use heaps in Python. For starters, you want heaps when there's a job to do like the following:

queue = []
while necessary:
    ...
    queue.append(an_item)    # or two
    ...
    ...
    next_item = min(queue)   # what's next (scans the whole list)
    ...
    queue.remove(next_item)  # we're done; remove it (scans again)
    ...
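With heapq, the same loop avoids both linear scans: `heappush` and `heappop` run in O(log n), whereas `min()` and `list.remove()` each walk the whole list. The concrete items and the loop condition below are placeholders chosen for illustration.

```python
import heapq

queue = []
for item in [5, 1, 4]:            # placeholder for "append an item (or two)"
    heapq.heappush(queue, item)

processed = []
while queue:                      # placeholder for "while necessary"
    next_item = heapq.heappop(queue)  # smallest item, found and removed in one step
    processed.append(next_item)
    if next_item == 1:
        heapq.heappush(queue, 3)  # new work may show up while processing

print(processed)  # [1, 3, 4, 5]
```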

PostgreSQL 9.5 released

Congratulations to the PostgreSQL Global Development Group for finishing another release of a great piece of software. There is a pretty massive set of release notes online, of which I'd like to cover the most notable changes and additions.

This is one for the books!

The title says it all.

Microsoft readies kill switch for Internet Explorer 8, 9 and 10

You made my day!

Tuesday, January 5, 2016

Boon and Bane of IDEs

I have come to know many different people and engineers in recent years. Many of them use IDEs for their day-to-day work. So far, so good. However, I noticed that some of them are not able to help themselves out of weird situations where an IDE corrupted some project config files or generated other strange output. Some of them don't even know how to use Subversion or git without a GUI.

Next Programming Language

Most engineers have a preferred programming language. If you are new to programming, you might ask what the best language to learn first is. If you are already familiar with some languages, you might ponder which one to learn next. Depending on the situation, you have many options to choose from.

Existing Projects
When joining an existing company/team, you will not reinvent the whole product line but instead improve existing software step by step. So, the choice of language has basically already been made for you. You can acquire missing knowledge via tutorials, cookbooks or blogs from around the Web to get started. Deeper knowledge is gained by practicing, reading documentation, filing bug reports and submitting pull requests.

A New Project
If you start your own project, you can pick from the following options in descending order of priority:

research what is necessary for a particular application
research what others use for a particular field
use Python

Option 1 should be your primary driver for learning new languages. There are various fields of computer science out there, so let me illustrate my point with two examples. If you want to develop Web applications, there is no way around learning HTML, CSS and JavaScript right now. If you need to process data through relational database management systems, there is no way around learning SQL. These are de facto standards in their own fields, and you definitely need to master them.

If there is no single established way of solving problems in a particular application domain, we go to option 2.

Option 2 basically builds on the assumption that more people can help faster. Quantity does not guarantee a quality answer, but there had better be an answer at all. As somebody new to a language, you definitely want quick answers and involvement. A big and lively community can help you out of misery almost instantly. I recommend Stack Overflow for this purpose. It covers almost every field related to software and hardware.

Still no idea what to use? Then we go to the last resort, option 3.

Option 3 is my fallback for Turing-complete languages when I want things done (so take it with a grain of salt). There is basically nothing Python, its standard library and the bunch of openly available third-party libraries cannot handle (except women, maybe), be it Web and database work (Django), dataset processing (pandas), strings, dicts/maps, dates and times, and much more.

Executable pseudocode is a win-win for everybody involved. You can communicate ideas clearly, and even non-programmers can understand short passages of Python well enough to tell things apart. More and more universities are even switching from Java to Python for teaching programming. Furthermore, I don't like solving already solved problems. I consider that a waste of time for business applications (need job done) and teaching (get idea across).

Beware, there is more than programming when it comes to computers in general. Read on.

Broaden Your Horizon

You definitely should give different languages a try over your lifetime as an engineer. I can tell you it will give you a huge amount of satisfaction in the long run. It helps you master computers and increases your value tremendously, even if you never use some of the languages you learned for a real project.

Not all languages are the same. They differ in several key aspects, in part influenced by their history. The following list gives you a glimpse of how diverse computer languages are. Each time you pick a new language, make sure it falls into different categories than the ones you already know. Thus, you develop a deeper understanding of when and where to use each kind of language and its surrounding ecosystem (libs, IDEs, communities, etc.).

Purpose (programming, data, matching, etc.)
Style (imperative, declarative, functional, logical, etc.)
Chomsky hierarchy (Turing-completeness is only a part of it)
Old vs. New
Syntax and Formatting

Example List:

C (programming, fast+unsafe)
YAML (data)
Regular Expression (matching, declarative)
Haskell (purely functional)
Prolog (logical)

I wish you all the best at learning your first or next computer language.

You might ask which language I am learning and practicing currently: it's Go. Why? Because one project I am contributing to uses it as its primary programming language. You'll see a post about that soon.

Best,
Sven

Saturday, January 2, 2016

Happy New Year!

Welcome to 2016! I hope you are well and made it injury-free into the new year.

Back in the lab, this post probes the suitability of blogger.com for developers. We first need some way of presenting code, preferably in a monospaced font on a gray background:

>>> print('Hello New Year')
Hello New Year
