Wednesday, May 8, 2013

A Programmer By Any Other Name

If you follow economics at all, you have seen the news awash in a sizable scandal involving a pair of economists whose bold claim - that a specific level of government debt causes economic slowdowns - turned out to have several serious problems. One of these problems was a critical error in their Excel spreadsheet.

That's right. They were using Excel for high-level academic work that had direct policy impacts. This is the cue for everyone technical in the audience to feel smug.



It turns out neither the use of Excel for this kind of work nor the errors are uncommon. In another instance, a massive trade by JPMorgan Chase blew up spectacularly, in part because of a formula error. In this instance, too, the flaw appeared to (temporarily) benefit the party who wrote the formula.

I would argue using Excel for this kind of work isn't necessarily the biggest problem. That would be...

They Don't Think They Are Programmers, But It Is Likely They Are.

Sufficiently complex Excel formulas/models are indistinguishable from a lot of other programming. There is some question as to whether the formula language by itself is actually Turing complete, but the presence of VBA makes it trivially so anyway. Other complex business analysis/design tools are in a similar boat.

Given this, such users should be considered programmers. But since they are not considered programmers, there is little encouragement or recognition that they should aspire toward the semblance of rigor that some software developers actually practice. Which isn't a lot, truth be told.

What We Actually Know About Software Development, and Why We Believe It's True

Shadow Programmers

About 40% of those employed in computer and mathematical occupations are actually software developers, but looking at the other job descriptions in that category, I would be shocked if most of the rest weren't programming too - building elaborate models in Excel or writing maddening, cursor-heavy SQL business logic.

And this category doesn't include economists. Undoubtedly there are even more occupations whose members act as shadow programmers. It is an open question whether all white-collar/academic occupations are, or will be, writing code. Certainly anyone doing anything like a proof already is.

Given the numbers, just as with the shadow economy or the shadow banking system, the number of shadow programmers almost certainly exceeds the number of people who hold the title.

Possible Solutions

Professional certification/licensing

This is a stupid idea for a variety of reasons. First, the excessive number and nature of professional certifications are already a problem in the United States. They raise the barrier to entry and are used to kneecap potential competition. It is questionable whether they do anything to prevent catastrophes. And even if they did, there isn't a compelling state interest for a lot of programming to be of very high quality. A bug in a photo sharing app that trashes the user experience will have an appropriate market effect.

Outreach

This might actually work. If you see someone who is clearly programming, e.g.:
"Oh no, it's just this simple query… yes, it spans 1,000 lines and updates based on business logic with a trigger, why do you ask?"
You call it out.
"Yes, that describes more or less the maximum complexity of any business/enterprise procedure. Let me give you some books. Let's write tests. Let's have reviews. Let's use version control."
Then you convince management. Either they bite the bullet and (1) train this person to program in a responsible manner, which is not terribly expensive, or (2) have a software developer take on the task, which can be expensive as heck if you need more of them; or (3) they deal with the fact that their business is built on bad code. That last one is cheap now, and potentially crippling later.
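To make "Let's write tests" concrete, here is a minimal sketch of what that can look like once spreadsheet logic is ported to Python. Everything in it is hypothetical: the function names, the data, and the "truncated" variant, which mimics a formula range that silently stops short of the last rows of data.

```python
from statistics import mean


def average_growth(growth_by_country: dict[str, float]) -> float:
    """Average growth across every country supplied (hypothetical model)."""
    if not growth_by_country:
        raise ValueError("no data supplied")
    return mean(growth_by_country.values())


def average_growth_truncated(growth_by_country: dict[str, float]) -> float:
    """Same calculation with a spreadsheet-style range error: like writing
    =AVERAGE(B2:B4) when the data actually runs to B6, it drops the last two rows."""
    return mean(list(growth_by_country.values())[:-2])


def check_average_uses_every_row(model) -> bool:
    """A test with a hand-computed expectation: (1 + 2 + 3 + 4 + 5) / 5 = 3.0."""
    data = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0}
    return model(data) == 3.0
```

The correct model passes the check; the truncated one averages only the first three rows, returns 2.0, and fails. A check this small is exactly the kind of thing that never gets written when nobody thinks of the spreadsheet as code.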

Wednesday, April 18, 2012

AI/Machine Learning Python Samples

I have a new repository on GitHub that demonstrates a couple of basic machine learning and AI techniques, principally picked up from Udacity's CS 373 and Stanford's Introduction to AI. It's all explained there, and I intend to add to it as I continue my education in the field.

Machine learning is something I rarely hear talked about in the spatial developer field. This is unfortunate, as machine learning can be an effective means of analyzing, managing, and generating spatial information.

Another cool thing is the documentation I put together for this. I used Pycco, a very easy-to-use annotated-code documentation generator (a Python port of Docco). Here is a particle filter. Here is a Kalman filter.
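For readers who haven't taken the class, a Kalman filter in one dimension comes down to two steps: a measurement update that multiplies two Gaussians, and a motion update that shifts the estimate and adds uncertainty. The following is a minimal sketch of that idea, not the code from the repository; the function names are illustrative.

```python
def update(mean: float, var: float, meas: float, meas_var: float) -> tuple[float, float]:
    """Measurement update: combine the prior Gaussian with a noisy measurement.
    The result is pulled toward whichever of the two is more certain."""
    new_mean = (meas_var * mean + var * meas) / (var + meas_var)
    new_var = 1.0 / (1.0 / var + 1.0 / meas_var)
    return new_mean, new_var


def predict(mean: float, var: float, motion: float, motion_var: float) -> tuple[float, float]:
    """Motion update: shift the estimate by the motion and grow its uncertainty."""
    return mean + motion, var + motion_var


def kalman_1d(measurements, motions, meas_var, motion_var, mean=0.0, var=10000.0):
    """Alternate measurement and motion updates, starting from a very uncertain prior."""
    for z, u in zip(measurements, motions):
        mean, var = update(mean, var, z, meas_var)
        mean, var = predict(mean, var, u, motion_var)
    return mean, var
```

With measurements `[5, 6, 7, 9, 10]`, motions `[1, 1, 2, 1, 1]`, a measurement variance of 4 and a motion variance of 2, the estimate ends up very close to 11 with a variance of about 4, even though the starting guess was 0 with a variance of 10,000.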


Posted via email from The Pragmatic Geographer

Tuesday, February 7, 2012

Spatial correction using a particle filter (applications of the AI class)

The Problem - old data, important data

Some of the most important spatial data is old. It was built up and maintained over decades by paper and early computer systems, and it represents power lines, roads, water pipes, and property lines. It would be good to know the precise location of this stuff.

The PLC power system was designed on lotlines drawn in-house. Today, the difference between those lotlines and the actual parcel locations is as much as 100 ft, and in no consistent direction. What follows are attempts to correct the locations of more than 20,000 structures without doing a significant portion by hand, using some techniques picked up in Stanford's free AI class.

The correct location is the "Hidden" bit

Education in some very advanced and useful algorithms is now within the grasp of anyone with an internet connection and a decade-old computer. More than a hundred thousand people participated in the recently completed Stanford AI course, including myself. One particular technique caught my eye: using a particle filter to localize a robot.

The problem being solved is one of location: position is the hidden variable that needs to be estimated in continuous space. Why couldn't I do something similar for static assets like poles and underground vaults? With enough control points, I could then move everything else relative to them (inverse distance weighted rubbersheeting) and vastly improve the data.
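The rubbersheeting step can be sketched as follows, assuming each control point is a pair of (old, corrected) coordinates: every other feature is moved by a weighted average of the control points' displacements, with nearer control points counting more. The function name and the default power of 2 are assumptions for illustration, not a description of any particular GIS tool.

```python
import math

Point = tuple[float, float]


def idw_shift(point: Point, controls: list[tuple[Point, Point]], power: float = 2.0) -> Point:
    """Move `point` by the inverse-distance-weighted average of the control-point
    displacements. `controls` is a list of (old_location, corrected_location) pairs."""
    if not controls:
        return point  # nothing to correct against
    x, y = point
    sum_dx = sum_dy = total_weight = 0.0
    for (ox, oy), (nx, ny) in controls:
        d = math.hypot(x - ox, y - oy)
        if d == 0.0:
            # The point sits exactly on a control point: take its displacement as-is.
            return (x + (nx - ox), y + (ny - oy))
        w = 1.0 / d ** power
        sum_dx += w * (nx - ox)
        sum_dy += w * (ny - oy)
        total_weight += w
    return (x + sum_dx / total_weight, y + sum_dy / total_weight)
```

For example, with one control point at (0, 0) that moves 10 ft east and another at (100, 0) that moves 20 ft north, a pole halfway between them is equidistant from both and moves 5 ft east and 10 ft north. Higher powers make the correction more local; lower powers smooth it out across the whole area.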

A Naive Approach

I wanted to start with the simplest possible implementation. I loaded the lotlines (old, hand-drawn), parcel polygons, and poles into PostGIS. I then converted the lines and polygons to points and used the total summed distance as the measure for comparing candidate particles to the poles.

Again, this is very naive (and the data is too noisy for it to work), but it served a purpose: getting everything set up for my next iteration, which compares candidates based on tangent and distance, as the robot's sensors in the class example undoubtedly do.
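Here is a minimal sketch of the naive approach in plain Python rather than PostGIS, under my own assumptions about the details: each particle is a candidate (dx, dy) offset for the old lotline points near a pole, its score is the total distance from each shifted lotline point to the nearest parcel point, and lower totals earn exponentially higher weights before resampling. The function names, the exponential weighting, and every parameter value are hypothetical choices for illustration.

```python
import math
import random


def total_distance(offset, lot_pts, parcel_pts):
    """Sum, over the shifted lotline points, of the distance to the nearest parcel point."""
    dx, dy = offset
    return sum(
        min(math.hypot(x + dx - px, y + dy - py) for px, py in parcel_pts)
        for x, y in lot_pts
    )


def estimate_offset(lot_pts, parcel_pts, n=500, iterations=20,
                    spread=100.0, noise=2.0, scale=10.0, seed=0):
    """Particle filter over candidate offsets; returns the mean of the final particles."""
    rng = random.Random(seed)
    # Start with candidate offsets scattered uniformly over +/- `spread` (e.g. feet).
    particles = [(rng.uniform(-spread, spread), rng.uniform(-spread, spread))
                 for _ in range(n)]
    for _ in range(iterations):
        costs = [total_distance(p, lot_pts, parcel_pts) for p in particles]
        best = min(costs)
        # Lower total distance -> higher weight; subtracting `best` avoids underflow.
        weights = [math.exp(-(c - best) / scale) for c in costs]
        # Resample in proportion to weight, then jitter so the particles keep exploring.
        particles = rng.choices(particles, weights=weights, k=n)
        particles = [(x + rng.gauss(0.0, noise), y + rng.gauss(0.0, noise))
                     for x, y in particles]
    return (sum(p[0] for p in particles) / n, sum(p[1] for p in particles) / n)
```

On clean synthetic data, where the parcel points are just the lotline points shifted by a fixed amount, this recovers the shift to within a few feet. Real lotlines and parcels don't line up point for point, which is why a summed-distance score alone is too noisy and a richer comparison, such as the tangent-and-distance one mentioned above, is the natural next step.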


Friday, January 27, 2012

TileMill - what it does and some reasons to try it

We were promised jetpacks, but I'll take Tilemill as a temporary replacement: http://mapbox.com/tilemill/


The MapBox/DevelopmentSeed team has created one of the last pieces needed for mainstream open source GIS to gain really massive appeal.

TileMill is used for making web maps - or, more specifically, for generating the tiles that make up the now-ubiquitous slippy maps we see online.
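Slippy maps are addressed with the z/x/y tile scheme used by OpenStreetMap-style web maps: at zoom level z, the spherical Web Mercator world is cut into a 2^z by 2^z grid of tiles. As a sketch of that addressing, not of TileMill's own code, this converts a latitude/longitude into the tile that contains it. The function name and the clamping behavior are my own choices.

```python
import math

# Web Mercator cannot represent the poles; tiles stop at roughly +/-85.0511 degrees.
MAX_LAT = 85.0511287798


def latlon_to_tile(lat: float, lon: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the slippy-map tile containing (lat, lon) at `zoom`."""
    n = 2 ** zoom  # tiles per side at this zoom level
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so edge cases such as lon = 180 still land on a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

At zoom 0 the whole world is the single tile (0, 0); at zoom 1 the point at latitude 0, longitude 0 falls on the corner shared by all four tiles and is assigned to (1, 1). A tile server or a tile file then only has to answer "give me tile z/x/y".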

There are other desktop applications that do this, the most notable being ArcGIS Desktop. But Desktop was built for other things first: advanced analysis tools, some pretty powerful editing capabilities, and authoring paper maps.

TileMill does one thing and it does it well. It costs nothing (compared to several thousand dollars for some flavor of ArcMap), and it outputs an open tile format (MBTiles) that you can wire up to a web map or iPad in less time than it takes to install ArcMap.

And it is smooth. The user experience is the best I have had with a desktop application in a long while.

It also has sane, plain-text, CSS-like styling (MSS). This may sound like a no-brainer, but your options before this were basically a proprietary binary format from ESRI (not extensible, difficult to automate, limiting, vendor-specific) or SLD, which is an open standard but widely regarded as something of a mess for other reasons.

There is also the training issue. ArcMap is giant and powerful - and extremely complex. The market for "GIS Analysts" is still strong in large part because of this complexity. Less experienced users will find TileMill easier to pick up, and web designers (of whom there is a large pool of talent) will find it very easy.

It is out for every operating system of note.


Seriously, go give it a try.

What else is needed

Conversion - minimal, well-documented, mostly automated steps to get from ESRI to TileMill/FOSS

Samples like crazy - more or less emulate the ESRI samples, complete with documentation
