The Obligation of The Programmer

Robert C. Martin, of Clean Code fame, has something to say about the role of us programmers in today’s society.

We rule the world.

We don’t quite understand this yet. More importantly, the world doesn’t quite understand it yet. Our civilization doesn’t quite realize how dependent it has become on software — on us.

He goes as far as suggesting a programmer’s code of conduct of sorts. Food for thought I guess, although I suspect we’re too much of a wild and scattered bunch for something like that to really stick.

Nonetheless, he raises a very good point about the predominance of software in our society and the risk that sooner or later someone will wake up and attempt to impose some sort of regulation on the profession.

Read it all at the Clean Coder Blog.

Open Source and Code Responsibility

Last week I was speaking at an Open Source panel at Better Software 2014, and one of the topics that we touched was code responsibility. This is an important topic for anyone who is maintaining an open source project, especially when it comes to the process of reviewing and accepting code contributions.

At some point during the debate, I argued that when maintainers merge a pull request, they implicitly agree to be responsible for that code. That seemed to surprise quite a few attendees.

Yes, in theory any contributor is just a ping away, so if trouble arises one can always reach out to them. Unfortunately this is not always the case. While some contributors will fully embrace your project and keep helping after their initial contribution, the truth is that a good number of them will just move on, never to be seen again.

There’s nothing wrong with that. Not everyone has spare time to devote to your project, and that’s perfectly fine. It is natural for most people to contribute what they need to a project and then go on their way. Actually, one could argue that most projects grow and prosper precisely thanks to this kind of contribution.

However, this dynamic can become a burden when big chunks of code get merged, usually as new (big) features. Good practice advises against merging huge pull requests; they are rare and, when they do come, it is a good idea to ask for them to be split into smaller ones. But no matter the format, a huge contribution is likely to hit a project one day or another. It might even come from more than one person: a disconnected, distributed team of contributors who have been patiently tinkering on a side branch or a fork, for example. When this happens, and provided the contribution is worth merging, the maintainer should ask the obvious question: am I willing to deal with the consequences of this merge?

In fact this is the exact scenario I’m dealing with right now. The Eve project has always been a MongoDB shop. Since its inception, however, people have been asking if (and when) SQL support was going to be added. I think we were in Alpha when someone started contributing SQL code. Over time I ended up devoting a specific branch to this feature. Several people have been hacking at it since then, and what a splendid job they did! To say that it was a huge commitment is an understatement but, in time, they managed to deliver. So now we have this awesome sqlalchemy branch which is feature-complete and ready to be merged ahead of the new Eve release. We’re talking 4K+ lines of code and 44 files changed. Code quality is not in question. I know that several companies and individuals have been using that branch in production with good success, even when it was still in its early stages.

This is very exciting, as adding SQL support has a good chance of greatly broadening the project’s audience. At the same time, however, I’m a little bit nervous, if not scared, and I have been for a while. Am I ready to deal with the consequences of this supermerge? Inevitably, SQLAlchemy tickets will be opened and Stack Overflow questions will be asked. SQL-related pull requests will come in and mailing list posts will flock. To be honest, I don’t think I can handle that, let alone allocate more of my free time to the project. Also, I’m not very confident with SQLAlchemy itself, so I would not be the best person to deal with that code anyway. While recently discussing SQLAlchemy support on the mailing list, I have been very clear about my concerns, so much so that I probably scared a few people away. What worries me the most, I suspect, is the risk of the new code becoming stale one day or another. In time that would probably hurt the reputation of the whole project.

Come to think of it, we already had something similar happen in the past, although with a smaller feature. The Document Versioning pull request, contributed by the amazing Josh Villbrandt, had me wrestling with similar thoughts. The new code was going to be quite intrusive, adding a good deal of complexity to an otherwise relatively simple codebase. Everything went amazingly well though. Josh is still an active contributor. He helps with improving his own feature and, even more importantly, other contributors are now helping with Document Versioning as we speak. Overall, the Eve project as a whole has been enjoying a growing number of skilled contributors and adopters. It’s been a joy to see people commenting on open tickets and offering support on the mailing list and even on Stack Overflow. So that should be encouraging.

So here I sit, with 4K LOCs ready to be merged. What do I do with them? I considered a few options. One is leaving the SQL feature in a separate branch. Another is to ask the team to refactor the whole thing into an external extension (we have a few already). With either of these, Eve core would remain MongoDB-only and I could keep managing it on my own. But then again, neither option would add native SQL support to Eve. Also, an extension or a branch would run an even greater risk of becoming stale.

At some point, I guess, even mildly successful projects like Eve have to decide whether they want to outgrow their author. I strongly believe that growing and trusting communities is what open source is all about. You release your work out there and, even at that early stage, you are already trusting people. You trust that they will take notice and that they will validate your project (or not). Eventually, someone will review your code, adopt it and, in time, contribute. The project then grows to a point where its community becomes so predominant that you, as the author and maintainer, just have to let some control go.

So yes, SQL support is coming to Eve, and as a native feature. I trust that the contributors to the SQLAlchemy backend will stay around and, if they don’t, that someone else will step up and carry the torch. I am also confident that the community as a whole will adopt the feature, make it grow and, well… we’ll see what happens next.

If you want to get in touch, I’m @nicolaiarocci on Twitter.

Feature Overview: The Eve OpLog

The operations log, or OpLog, is a new Eve feature that I’m currently developing on the oplog experimental branch. It is meant to address a subtle issue that we’ve been dealing with, but I believe it can also grow into a very useful all-around tool. I am posting about it in the hope of gathering some feedback from Eve contributors and users, so that I can pin down the design and implementation before merging it into the main development branch.

What is the OpLog?

The OpLog is a special resource that keeps a record of the operations that modify the data stored by the API. Every POST, PATCH, PUT and DELETE operation can optionally be recorded by the oplog.

At its core the oplog is simply a server log, something that has always been on the Eve roadmap. What makes it a little bit different is its ability to be exposed as a read-only API endpoint, which in turn allows clients to query it as they would any other standard endpoint.

Every oplog entry contains a few important pieces of information about the document involved in the operation:

  • URL of the endpoint hit by the operation.
  • Kind of operation performed.
  • Unique ID of the document.
  • Date when the document was updated.
  • Date when the document was created.
  • User token, if User Restricted Resource Access is enabled for the endpoint.

Like all other API-maintained documents, oplog entries also contain:

  • Unique ID
  • ETag
  • HATEOAS meta fields, if enabled.

A typical oplog entry would look something like this:

{
    "o": "DELETE", 
    "r": "people", 
    "i": "542d118938345b614ea75b3c",
    "_updated": "Fri, 03 Oct 2014 08:16:52 GMT", 
    "_created": "Fri, 03 Oct 2014 08:16:52 GMT",
    "_id": "542e5b7438345b6dadf95ba5", 
    "_etag": "e17218fbca41cb0ee6a5a5933fb9ee4f4ca7e5d6"
    "_links": {...},
}

To save a little storage space, field names have been shortened where possible (what can I say, I’m a MongoDB guy): o stands for operation, r stands for resource, i stands for unique ID and c stands for changes. The other keys are defined by the configuration settings, and their default names are shown here.

How is the oplog operated?

Four new settings keywords are available:

  • OPLOG
    Sets the oplog name and defaults to oplog. This is the name of the collection on the database and also the default url for the oplog endpoint.

  • OPLOG_METHODS
    A list of HTTP methods for which oplog entries are to be recorded. Defaults to ['DELETE'].

  • OPLOG_ENDPOINT
    Set it to True if an oplog endpoint should be made available by the API. Defaults to False.

  • OPLOG_AUDIT
    If enabled, IP addresses and changes introduced with PATCH and PUT methods are also logged. Defaults to True.

So by default the oplog is stored in a conveniently named oplog collection, and it only records information about deleted documents.
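For the sake of illustration, a non-default configuration might look something like this in an Eve settings file (the values below are illustrative, not the defaults):

# settings.py (illustrative values, not the defaults)
OPLOG = 'oplog'                      # collection name and default endpoint URL
OPLOG_METHODS = ['DELETE', 'PATCH']  # also record edits, not just deletions
OPLOG_ENDPOINT = True                # expose the read-only oplog endpoint
OPLOG_AUDIT = True                   # log client IPs and PATCH/PUT changes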

Since the optional oplog endpoint is a standard API endpoint, when it is enabled the API maintainer can fiddle with its settings as they would with any other Eve endpoint. This allows for setting custom authentication (you probably want this resource to be accessible only for administrative purposes), changing the URL, and so on. Just add an oplog entry to the API domain, like so:

'oplog': {
    'url': 'log',
    'auth': my_custom_auth_class,
    'datasource': {'source': 'myapilog'}
}

Note that while you can change most settings, the endpoint will always be read-only, so setting either resource_methods or item_methods to something other than ['GET'] will serve no purpose. Also, adding an oplog entry to the API domain is not required unless you need to customize it, as one will be added automatically for you.

Why the OpLog?

Clients have always been able to retrieve changes by simply querying an endpoint with an If-Modified-Since request. So why do we need an operations log? Of course because server-side logging is cool, and so is auditing, but it’s not only about that.

Single entry point for all API updates

From the client’s perspective, and for most use cases, logging inserted, edited and replaced documents is probably a waste of both space and time; this is the main reason why only DELETE operations are logged by default. However, I believe there are scenarios where remote access to a full activity log can be useful.

Imagine an API which is accessed by multiple apps (say phone, tablet, web and desktop applications), all of which need to stay in sync with each other and with the server. Instead of hitting every single endpoint with an IMS request, they could just access the oplog. That’s a single request instead of several, and since the oplog itself is a standard API endpoint, they could also perform IMS requests against it for optimal gains:

Server, please send me all the changes that have occurred to the API since my last access. Sincerely, Your Client.

Again this is not always the best approach a client could take. Sometimes it is probably better to only query for changes when they are really needed, but it seems cool to have both approaches available (and remember, the oplog endpoint is disabled by default).
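As a rough illustration, a client doing exactly that might look like the following sketch, which uses the requests library; the base URL and the stored sync date are of course placeholders:

# hypothetical client keeping in sync through the oplog endpoint
import requests

API = 'https://api.example.com'                # placeholder base URL
last_sync = 'Fri, 03 Oct 2014 08:16:52 GMT'    # saved from the previous run

response = requests.get(API + '/oplog',
                        headers={'If-Modified-Since': last_sync})

if response.status_code == 304:
    print('nothing changed since the last sync')
else:
    for entry in response.json()['_items']:
        # 'o', 'r' and 'i' are the operation, resource and document ID fields
        print('%s on %s (%s)' % (entry['o'], entry['r'], entry['i']))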

Fixing 304s

And then there are deleted documents, which are a completely different beast. Without an oplog we have no way to tell if and when a document has been deleted, let alone inform clients about it. Actually, there is an open ticket about precisely this, and it’s been sitting there for a while.

When an If-Modified-Since request is received, the API is expected to respond with a 304 Not Modified status if no changes occurred, so that clients can conveniently fall back to cached data. Up to version 0.4 (the official release at the time of this writing) Eve has been doing exactly that, with one caveat: deleted documents were being ignored as, in the context of the IMS request, there was no way to know about them.

The operations log will allow Eve-powered APIs to take deleted documents into account, returning proper 304 codes as needed. The impact on performance should be minimal, as the oplog will only be queried when no changes have been detected on the target collection.
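Conceptually, the decision looks something like the sketch below. This is not Eve’s actual implementation; the two helper functions are purely hypothetical stand-ins for the datastore and oplog queries.

def changes_since(resource, ims_date):
    # hypothetical stand-in: documents updated after the If-Modified-Since date
    return []

def deletions_since(resource, ims_date):
    # hypothetical stand-in: oplog DELETE entries recorded after the IMS date
    return []

def respond_to_ims(resource, ims_date):
    changes = changes_since(resource, ims_date)
    # the oplog is only consulted when the collection itself reports no changes
    deleted = [] if changes else deletions_since(resource, ims_date)
    if not changes and not deleted:
        return 304, None
    # something changed or was deleted: a 200 is due, and how deletions end up
    # in the response payload is the open question discussed below
    return 200, changes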

This solves only one half of the problem, however. What happens when an IMS request comes in and deleted documents are found in the backlog? How do we report them back to the client? Three options come to mind which would address this scenario:

  1. Respond with a 200 OK and the usual “changes since IMS date” payload, which might happen to be empty if only deletions occurred in the time window. The client can then go and query the oplog endpoint with the same IMS date, finally getting the list of deleted document IDs.
  2. Include deleted document IDs in the standard payload (within the _items list), maybe with a deleted status tag. This status tag is something new though, and for consistency we should probably add it to other objects in the payload.

  3. Add support for a new _deleted meta field in resource payloads. When deleted documents are spotted in the backlog the response payload will include them in their own list. Something like this:

{
    "_items": [<list of edited and added documents>],
    "_deleted": [<list of deleted document IDs>]
    ...
}

The first option is so bad that I should probably not be listing it at all. It would take two round trips to get the whole update down. Also, it would effectively force API maintainers to open the oplog endpoint to their clients.

I’m not convinced #2 would be a good idea either, as objects in the _items list would not be homogeneous anymore and we would have to add support for a new meta field anyway (the status tag).

Option #3 on the other hand looks quite good to me. It does not require multiple requests to handle the case of deleted documents on IMS requests, and it is still easy and clean for clients to process. I am going to go with #3 unless feedback is negative and for good reasons, so let your opinion be heard.
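For what it’s worth, processing such a payload on the client should be trivial. A minimal sketch, assuming the client keeps a local cache keyed by document ID:

def apply_sync(cache, payload):
    # insert or update the documents that changed since the last sync
    for doc in payload.get('_items', []):
        cache[doc['_id']] = doc
    # drop the documents that were deleted server-side
    for deleted_id in payload.get('_deleted', []):
        cache.pop(deleted_id, None)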

Closing concerns

I am slightly concerned about the performance impact, not so much on IMS requests but rather on write operations, especially when a complete, all-operations log is being recorded.

In the MongoDB world the oplog is probably an ideal candidate for a capped collection. I’m not entirely convinced though, as by its nature a capped collection is bound to lose data over time, which again might lead to inaccurate 304 handling.

I am not implementing the OpLog at the data layer level, however: it is a business-layer feature, so that other engines can take advantage of it too. Nothing prevents the MongoDB admin from making the oplog a capped collection anyway. Also, keep in mind that, as with all other resources maintained by the API, indexes are not handled by Eve itself, so you will have to do your homework in that department too.
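If you do want to go the capped-collection route, and take care of the index homework while you are at it, something along these lines should do. This is just a sketch assuming pymongo, a placeholder database name and the default oplog collection name:

from pymongo import MongoClient

db = MongoClient()['apidb']    # 'apidb' is a placeholder database name

# cap the log at ~100 MB so it cannot grow unbounded (see the caveat above)
db.create_collection('oplog', capped=True, size=100 * 2 ** 20)

# index the field that If-Modified-Since lookups filter on
db['oplog'].create_index('_updated')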

So there you have it. I’m done with both the configuration and logging parts, and I will be working on 304 handling and response payloads in the coming days so that all of this can be included in the upcoming version 0.5. Be warned that at the moment the develop branch has no support for IMS requests on resource endpoints; it has been disabled to avoid providing clients with inaccurate responses (see the ticket above).

If you have any comments or feedback, please let me know in the comments below. I’d really appreciate it.

PS. In case you are wondering yes, the Eve OpLog is heavily inspired by the awesome MongoDB OpLog.

Ordered Dictionaries with Python 2.4-2.6

OrderedDict is a super handy data structure.

An OrderedDict is a dict that remembers the order that keys were first inserted. If a new entry overwrites an existing entry, the original insertion position is left unchanged. Deleting an entry and reinserting it will move it to the end.
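A quick demonstration of that behavior:

from collections import OrderedDict

d = OrderedDict()
d['b'] = 1
d['a'] = 2
d['b'] = 3        # overwriting 'b' leaves its position unchanged
print(list(d))    # ['b', 'a']

del d['b']
d['b'] = 4        # deleting and reinserting moves 'b' to the end
print(list(d))    # ['a', 'b']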

The problem is, this has only been available in the standard library since Python 2.7, while my project also needs to support Python 2.6. Fortunately there’s a back-port available, and it is only a pip install away:

# make OrderedDict available on Python 2.6-2.4
$ pip install ordereddict

ordereddict is based on the awesome recipe by Raymond Hettinger, works with Python 2.4-2.6 and, most importantly, is a drop-in replacement for OrderedDict.

However, if you want your code to run seamlessly on all Python versions, there’s still some work to be done. First of all, you want to make sure that the appropriate OrderedDict is imported: either the standard library version (for Python 2.7 and above) or the back-port.

This is easily accomplished, and in a very pythonic way:

try:
    # try with the standard library
    from collections import OrderedDict
except ImportError:
    # fallback to Python 2.6-2.4 back-port
    from ordereddict import OrderedDict

Fixing setup.py

If you are shipping your code as a package, then you also want to make sure that setup.py properly handles the different Python versions. Since setup.py itself is nothing but a standard Python module, we can make it more dynamic by applying the same technique as above.

#!/usr/bin/env python
from setuptools import setup, find_packages

# your standard required modules
install_requires = [
    'simplejson',
    'cerberus',
    'events',
    ...
]

try:
    from collections import OrderedDict
except ImportError:
    # add backport to list of required modules
    install_requires.append('ordereddict')

setup(
    name='appname',
    version='0.1',
    packages=find_packages(),
    ...
    # no matter which Python, we're now good to go
    install_requires=install_requires,
    ...
)

Handling requirements.txt

When it comes to pip’s requirements.txt, what I think works best is to simply add a second file that targets old Python versions, like so:

# py26-requirements.txt
# install from 'canonical' requirements.txt first (DRY)
-r requirements.txt
# add specific Python 2.6 dependencies
ordereddict

A developer using Python 2.6 would then go with

$ pip install -r py26-requirements.txt

Whereas someone on a recent Python would simply run

$ pip install -r requirements.txt

Since py26-requirements.txt lists only the Python 2.6-specific dependencies and then pulls in requirements.txt, you will most likely only need to update the main requirements file when new dependencies are added.

You can check out the commit where Python 2.6 support for OrderedDict was introduced.

If you want to get in touch, I’m @nicolaiarocci on Twitter.

Taming Portable Class Libraries and .NET Framework 4

If your project is a Portable Class Library and you want it to run on the .NET Framework 4, well, you are in for a few surprises, especially if you are using InstallShield to build your deployment package. We went through this a few days ago and it’s been kind of a wild ride. I thought I would pin the whole thing down so that others might enjoy a painless journey through all this mess.

Portable Class Libraries and .NET Framework 4

The first thing you should know is that while the .NET Framework 4 nominally supports PCLs, it won’t actually run them without a patch. For whatever reason, Microsoft decided that PCL compatibility wasn’t worth a 4.0.4 update. That leaves us needing to make sure not only that target machines are running the up-to-date .NET 4 release (v4.0.3), but also that they’ve been updated with KB2468871.

You might be wondering why this is an issue in the first place. We could simply install the .NET Framework 4.5, which is backward compatible with .NET 4 and includes the aforementioned KB2468871. Even better, we could just target .NET 4.5 in our PCL. The problem is that besides iOS, Android, Windows Phone and Silverlight, we also want our libraries to run seamlessly on as many Windows editions as possible, Windows XP included. Here is the catch: .NET 4 is the last framework version to run on Windows XP. And yes, we got the memo: Microsoft officially abandoned Windows XP a while ago, so why bother? Well, it turns out that millions of users are still running XP, especially in the enterprise and SMB space. These PCLs target exactly that market, the accounting software segment to be precise, and believe me, there’s a huge number of users happily invoicing and accounting on their old-fart-but-still-splendidly-doing-its-job-for-cheap boxes. Oh, and the .NET Framework 3.5 is not an option as it doesn’t support Portable Class Libraries at all.

All things considered, it’s still good news. We can build PCLs that run everywhere, Windows XP included. We only need to make sure that both the .NET Framework 4 and KB2468871 are installed on target machines. Easy enough, right?

The strange story of the KB2468871 InstallShield Prerequisite

We rely on InstallShield for building our distributions, so I was delighted to find that it comes with a KB2468871 Setup Prerequisite out of the box. All we had to do was add the prerequisite to our setup and we would be done. In fact, our first test was encouraging. We ran the setup on a pristine Windows XP. It installed the .NET Framework 4, then the KB patch and then, finally, our own application, which included the PCL libraries. Everything ran smoothly. We then moved on to test the very same build on a fresh Windows 7 machine. Again, it installed .NET 4 just fine… and then it crashed. Actually, the setup itself did not crash: it was the KB2468871 installer that crashed, while the main setup sat idle, waiting for the KB install to complete. So, what was going on there?

CPU Architecture Does Matter

To make a long story short, after a lot of investigation and an embarrassingly high number of tests, we found that our setup was only crashing on 64-bit editions of Windows. It turned out that the issue was with the InstallShield Prerequisite itself. It was broken. In a bad way.

The KB2468871 comes in three flavors:

  • NDP40-KB2468871-v2-x86.exe
  • NDP40-KB2468871-v2-IA64.exe
  • NDP40-KB2468871-v2-x64.exe

Three executables, each one targeting a different architecture. Upon inspecting the stock prerequisite, however, we discovered that it launched the x86 executable no matter which edition of Windows it was running on. That explained the crashes on x64 systems.

The solution was to create a new custom prerequisite which would download and launch the x64 KB edition on 64-bit systems. We then had to update the stock prerequisite too, so that it would only run on x86 systems. We now had two specialized KB2468871 prerequisites, one for 32-bit and another for 64-bit systems, launched alternatively depending on the target system. We added them both to our InstallShield project, rebuilt it, and tested it against a freshly installed Windows. It installed the .NET Framework 4, then completely skipped the KB (as if the prerequisite didn’t even exist) and finally proceeded to install both the PCLs and the main application – which of course crashed on execution, as the KB was not on the system.

Beaten, but not ready to admit defeat yet, we went back to the drawing board.

Execution Order Does Matter. Or Not.

Our custom launch conditions for both KB prerequisites were there, and they looked good. Then there was this other conditional triplet. It validated two registry keys and then made sure that mscorlib.dll existed in the .NET 4 folder. The idea was that the KB installation should only run if .NET 4 was on the target system, which sounded perfectly reasonable. Since we could configure the order in which prerequisites were executed, we just had to make sure that the .NET 4 prerequisite was assigned a higher priority, so it would run before the KB prerequisites. The prerequisite order was fixed, a new setup was built and… nothing changed. The KB was still not being installed.

Prerequisite order did not seem to matter. In fact, if we removed that check-if-.NET4-is-there conditional triplet, the KB would install. However, that was not an acceptable solution, because then the KB would always be installed, causing a reinstall (and a waste of time and resources) on most systems.

Then I had my epiphany. Maybe launch conditions were being evaluated all together when the setup started, for all prerequisites, before any of them was installed and regardless of their execution order? Nonsense, you might think (I did). Why allow me to set an execution order in the first place if launch conditions for each item are not evaluated in that order? Luckily, validating this theory was quick and easy: we just had to reboot the system after the faulty installation completed, then run the setup again. And guess what? On the second run, after the reboot, the KB installation executed without a glitch. Bingo. Since .NET 4 had been installed on the previous run, the registry keys were now there and mscorlib.dll was in the right place, so the KB launch conditions were finally met.

We ended up replacing that bogus conditional triplet completely and changing the validation logic. Instead of checking whether .NET 4 was installed (on pristine XP and Windows 7 systems, which don’t ship with the .NET Framework 4, it simply could not be there yet), we now simply checked whether the KB itself was installed. After all, that is the only thing that needs to be true, no matter the Windows or framework version (on .NET 4.5+ the KB is already included).

So now, armed with our brand new custom KB2468871 x64 prerequisite and a totally reimagined set of launch conditions, we were finally able to build a setup that delivers a fully functional Portable Class Library running on every Windows edition: XP, Windows 7, Windows 8… independently of the CPU architecture. Victory!

If you read this far you probably noticed that I didn’t include instructions on how to apply these changes to the stock prerequisite, let alone create a new one. You can find those instructions on the InstallShield website, or you can simply use our modified prerequisites. Of course we are providing them without any guarantee that they will work for you. They did for us, and that’s it.

Too long; didn’t read

  1. Portable Class Libraries won’t run on a plain vanilla .NET Framework 4.0 unless KB2468871 is installed;
  2. If you are using the InstallShield KB2468871 Setup Prerequisite, you’re in for a wild ride;
  3. However, until an official fix is released, you can opt to use our modified prerequisites instead.
  4. Get the custom KB2468871 (x64) InstallShield prerequisite.
  5. Get the custom KB2468871 (x86) prerequisite.

Conclusion

The stock KB2468871 InstallShield prerequisite has been out for a good while, so I’m baffled that to this day I still cannot find any reference to these issues on the internet. Portable Class Libraries are probably still a niche, and the fact that most of Macrovision’s KB resources are hidden behind a wall does not help. Sooner or later an official prerequisite update will be released. Until then, feel free to use our mods.

I’ll just add that if we were dealing with an Open Source project, we’d just open a pull request and be done with it.

If you want to get in touch, I’m @nicolaiarocci on Twitter.

Microsoft’s New Running Shoes

When Ballmer famously said, “Linux is a cancer that attaches itself in an intellectual property sense to everything it touches,” it was fair to characterize Microsoft’s approach to open source as hostile. But over time, forces within Microsoft pushed to change this attitude. Many groups inside of Microsoft continue to see the customer and business value in fostering, rather than fighting, OSS.

via Microsoft’s New Running Shoes.

10 Most Common Python Mistakes

Python’s simple, easy-to-learn syntax can mislead Python developers – especially those who are newer to the language – into missing some of its subtleties and underestimating the power of the language.

With that in mind, this article presents a “top 10” list of somewhat subtle, harder-to-catch mistakes that can bite even the most advanced Python developer in the rear.

via 10 Most Common Python Mistakes.