Tuesday, May 4, 2010

The Security Exchange Commission, Python, Excel and Functional Programming

What does the SEC have to to do with Python, Excel and Functional Programming? Well, the title of this blog is Random Thoughts, so here are a few random thoughts.

Two weeks ago the US SEC Security and Exchange Commission proposed significant revisions to the rules governing the offering process, disclosure and reporting for ABS (Asset Backed Securities).

One of the proposals is to require ABS issuers to file Python computer programs describing the flow of funds (called waterfall) in ABS transactions. I think that's pretty cool!

That's cool for investors

Waterfall provisions in modern securitizations and CDOs (Collateralized Debt Obligations) are complicated. They're described in 'prospectus' documents written in plain english. That's verbose, error prone, and makes it difficult for investors to create credit and cash flow models and conduct their own evaluations of ABS. The last financial crisis revealed that many investors relied on ratings provided by third party credit rating agencies instead of doing their own evaluation homework, and were not fully aware of the risk in the underlying mortgages within the pools of securitized assets.

Providing a Python program modeling the waterfall for each ABS would help investors conduct their own evaluations and make more informed investment decisions. The Python program source code would be filed on EDGAR (Electronic Data-Gathering, Analysis, and Retrieval). Investors could just download the program for an ABS, input their data assumptions and model the waterfall right away.

That's cool for Python and programming

I've dissected a few ABS prospectus documents and I'm starting to think that these Python programs could look just like Excel spreadsheets (which most investors are familiar with), with user defined formulas written as Python functions, in a functional programming style.

I did some intense Excel programming long time ago for a project at France Telecom. I needed to model the characteristics of their applications, middleware and network. I was a C coder at the time and had to switch to Excel programming for 4 months. What a shock! :) My first reaction was to go back to my programmer comfort zone and write Visual Basic instead of Excel formulas. Then I decided to try harder with Excel formulas, and I loved it! After a few weeks I was amazed by the power of such a simple programming model: cells, values and formulas that you can just compose... and how easily I could translate my specific business and network engineering rules to Excel formula 'code'. I guess I'm stating the obvious to the millions of Excel users out there :).

More recently I've become interested in functional programming, and I'm starting to see a lot of similarities between how I approach functional programs and Excel worksheets. If Excel supported recursion, the ability to define reusable formulas and treat them as data, I would dare to say that it's the most successful functional programming language ever! After all, a spreadsheet is just a functional program made of formulas/functions evaluated as their values are needed.

I've also done quite a bit of programming in Python, and find it pretty natural to write Python programs in a functional programming style (mostly inspired from Scheme). Purists will argue that Python is not a real functional programming language like Haskell. I agree, but I've found enough functional constructs in Python to get close to the happy functional feeling I get with Scheme. Python is also less cryptic than Haskell and simpler than F# or Scala.

Plain english, functional programs and Excel programs

One of the things I like about functional programming is how I can translate a plain english description of a real world process or rule into a functional program. Let's take a simple example and write a plain english description of a function (or formula) that calculates the sum of a list of numbers:

"The sum of a list of numbers is equal to the first number in the list plus the sum of the numbers in the rest of the list. If the list is empty the sum is 0."

Let's compare that to a typical imperative programming implementation of "sum" (with assignments, state and side effects):

In Java:
static int sum(int[] numbers) {
    int total = 0;
    int i = 0;
    while (i < numbers.length) {
        total = total + numbers[i];
        i = i + 1;
    return total;
In Python:
def sum(numbers):
    total = 0
    i = 0
    while i < len(numbers):
        total = total + numbers[i]
        i = i + 1
    return total

That's quite different from our original english description, right?

It'd be interesting to reflect on what a programmer needs to go through to come up with that code. There seems to be a lot of invention and concepts to grasp here: several variables, a loop, access to values in a list using an index.

How would you you explain the existence of variable "i" and its relationship with the original english description of "sum"? What's a variable by the way? Do we call it variable because it varies over time? Is "numbers" a variable too? But it doesn't vary, right? Is it just a value then? What does it mean to loop over a list? What if I don't know the length of the list upfront?

Here comes incidental complexity... Incidental complexity is different from the inherent complexity of the real world process you're trying to model in a computer program. It is the complexity baggage that comes with a particular way to model that process and the programming language used to do it.

You don't want incidental complexity. It gets in your way as you're just trying to calculate that sum...

Now, let's write "sum" in Scheme:
(define (sum numbers)
    (if (empty? numbers)
        (+ (first numbers) (sum (rest numbers))

Then in Python again, using a functional style inspired by the Scheme version:
(using simple Python implementations of "empty", "first" and "rest")
def sum(numbers):
    if empty(numbers):
        return 0
    return first(numbers) + sum(rest(numbers))

Isn't that functional version of "sum" much closer to the original plain english description? To me, that looks like a straight translation of the plain english description to Python, more precise, and executable. No more incidental complexity here!

Wouldn't that be a good way to translate an ABS waterfall prospectus to a computer program? Could an Excel user working at a financial institution write that piece of Python code?

To try to understand that, let's try to get in the mind of an Excel power user and see how he'd model "sum" in Excel if it wasn't already a builtin function.

1  A       B
2  Numbers Sum
4  1       6 = A4 + B5 = first number in the list (A4) + sum of the rest (B5)
5  2       5 = A5 + B6
6  3       3 = A6 + B7

Surprise! That's almost the same thing as the Python or Scheme sum() functions!

"= A4 + B5" is "the first number in the list" (A4) + "the sum of the rest" (B5).

The Excel version introduces a few subtle differences, however:
  • No recursion, the Excel model is flat and the application of a formula to a series of values is realized by copying/pasting the formula over a series of cells (B4:B6).

    I'm not sure what to think about that yet. Recursion is a powerful functional programmer tool, sometimes difficult to master in computer programs, but used extensively in financial formulas.

  • Formulas have no name and are reused by copy/paste instead of references to their name.

    That's interesting. Most people will agree that copying the same formula over many cells makes it more difficult to change it. But naming a function forces the spreadsheet developer to learn that name, like a new word in a language. Once you have 100 functions, that's 100 words, and you've invented a new ad-hoc language (called a Domain Specific Language these days). That new language is often more complex than Python or Excel, and you're forcing people to learn it. Is that better?

  • Function and function application are mixed. B4 doesn't contain the sum() function, it contains an application of sum() to live data (A4 and B5). In other words B4 contains an instance of sum() loaded with the values from A4 and B5.

    Is that bad? No, I think it's actually great as you get instant feedback with the results of your formulas as you shape your code and data. Think of it like a  builtin TDD (Test Driven Design), which you wouldn't have to worry about. Think about writing code, testing it at the same time without having to explicitly write tests, and using the test results to guide you as you code. Sounds like a programmer's paradise? That's what you do with Excel, without even thinking about it :)

Off to the next financial crisis

Going back to ABS waterfall Python programs... If the SEC proposal is adopted, I think they'll present a great opportunity to marry the Excel-like worksheet paradigm, Python and functional programming.

Pushing that dream a little further, here would be my wish list for a Python-based ABS waterfall development environment:
  • Feels like an Excel worksheet, tables, cells, workbooks, the ability to mix presentation, values and formulas.
  • Takes formulas as simple Python functions.
  • Allows me to define my own reusable named formulas and just copy/paste their name around.
  • Like Excel, loads my Python functions with live data and shows me the results all the time.
  • Tolerates an incomplete worksheet (or program).
  • Allows several people to collaborate on a worksheet.
  • Is open-source (as the SEC proposal mentions the open source nature of Python).
  • Runs in the cloud off a bunch of Web pages -- what doesn't these days :)

Anyone would like to try to build that environment?

Just imagine what millions of Python-enabled-functional-spreadsheet-in-the-cloud power users could do to the next financial crisis :)

Let me know what you think...

1 comment:

Richard Careaga said...

A simplified example of a waterfall program can be seen at http://www.pylaw.org/demonstration.txt

The postings on this site are my own and don’t necessarily represent positions, strategies or opinions of my employer IBM.