atmlog

A log. To chronicle my progress, observations and, hopefully, useful lessons.

POSTS

A stream of consciousness. Not targeted specifically at public consumption, but it leaves my thoughts, workflow and goals open and welcome to critique.

This approach was inspired by Mark Levin and his public SEO journal.

Journal

04-10/12 Sunday

So I started this entry 6 days back. And then work interrupted. Let us continue.

My first entry in almost two weeks. A large reason for that is I haven’t felt the compulsion to use this medium to help me tackle any issues. Once I had recovered from the mental slump I was experiencing, the need and desire to… self-medicate through this journal was alleviated. So I think we can safely say that the largest benefit of this journal was for working through that period, but I still have some lingering feelings that I can make use of it in a more periodic fashion now!

The reason I have been drawn back is to help me frame the backlog of work I have on my plate and how I can begin to tackle it. Let’s start by listing all the projects I feel I have an active role in right now:

Work:
  • ETL
  • Internal library system
  • Team documentation

Outside/Personal:
  • Remin
  • Talespace

Misc.:
  • Python meetup
  • Javascript meetup

23/11 Tuesday

Mojo is back in its place and I’ve been having an exceptional time blasting through a few stories and refactoring the ETLv2 implementation. Though I am left with the question - would slowing myself down a little during these high periods to write and reflect here be valuable in the long run? Would the dialogue still yield meaning? My hunch is yes, provided I can hone the dialogue I have with myself, perhaps making it more structured and targeted, rather than these rather free journal entries.

Today I have been working on a validation/early warning system for the ETL project. This is to provide us with timely feedback upon failure of any part of the system, in the form of messages over email, Slack and log entries. We have also discussed a ‘heartbeat’ report, a daily email showing bird’s-eye metrics of the data that has come through the system - just another avenue for catching issues before they affect the business.
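
Roughly the shape I have in mind for the notification side. This is only a sketch; the webhook URL, logger name and function names are placeholders rather than our real setup:

import logging

import requests  # assuming failures are posted to a Slack incoming webhook

logger = logging.getLogger("etl.validation")

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX"  # placeholder

def notify_failure(step_name, error):
    # Fan a failure out to the log and Slack; email would hang off the same hook.
    message = "ETL step '%s' failed: %s" % (step_name, error)
    logger.error(message)
    try:
        requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=5)
    except requests.RequestException:
        logger.exception("Could not deliver the Slack notification")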

This evening, for a short time, I will be going over my react learnings and planning how I can finish this toy implementation before I go for a version 2 with redux.

Hoozah, finally got a google map rendering, now I just need to figure out how to drop and re-render my own markers.

Goodnight!

22/11 Monday

Finally got a little mojo back last night. The CidTranformer came together quite nicely but a little testing nuisance caused a setback that I was not awake enough to fix. That will be the first item to take care of. Weirdly, when I was working through a little refactoring some tests suddenly broke. It appears to be something to do with pandas not being happy with None-valued fields. I need to drill down and figure out precisely which fields are causing the issue.

Remember: if you assign the result of a function call to a variable, make sure that bugger returns something sensible!
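
A two-line illustration of the trap, with made-up names:

def clean_cid(value):
    value.strip()          # oops: no return statement, so the caller gets None

cid = clean_cid(" 123 ")   # cid is now None and quietly poisons the dataframe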

20/11 Sunday

Emacs therapy has been remarkably refreshing. It is an editor I always find myself coming back to. Perhaps this is merely another outlet for my procrastination but emacs leaves me with a feeling that trumps all others.

This evening I am cracking on with the signup cleaner steps for the ETL, endeavouring to make up the time I have lost. I’m feeling good. Let’s get into it.

First intention at hand is the CidTranformer step for the ETL system. These steps mostly require extraction from the previous implementation of the ETL but, due to their highly coupled nature, they require a good deal of rethinking which essentially amounts to a completely new implementation. This particular step is mostly concerned with an edge case for one of the company’s products.

19/11 Saturday

Note the gap between entries. Frustratingly, life dealt me a cheap shot once again. A debilitating, accident-borne ankle injury rendered me out of intellectual commission for at least a couple of days. Sadly, the start of this journalling effort has been marred by rather uncharacteristic ups and downs - I hope there is no causality there.

So after a little emacs therapy, it is time to proceed with my work. Let us set an intention, or continue from my previous one; though, moving forward, I wish to make these intentions more granular, which means I will need to restate them more frequently.

16/11 Wednesday

A busy day lined up:

  • An hour to cram work
  • Client/Freelance meeting at 10.30
  • Grind out some more work after lunch
  • Grab dinner with the wife
  • Meet a friend to crack on with an NGO project this evening

Today’s work objective is to finish the signup cleaning steps for the ETL system. Fortunately this mostly requires pulling code from the original ETL and injecting it into this new project in a clean manner.

15/11 Tuesday

Bi-weekly Tuesdays are our sprint planning days so, unfortunately, very little productivity tends to take place. Though this was partly driven by being stuck at home having to deal with some household matters.

Home bound once more and I can continue my dive into React. Though let us consider an actionable intention to drive my efforts. In the current pet project I am grinding through I have gotten to a point where I can -almost- simulate the exact behaviour that I had in the angular version. However, I must say that a pure React implementation leaves a lot to be desired. I really miss the concrete structures that angular provides; they act as a scaffold and guide. Though the simplicity of React is quite decent - you’re either creating more components and passing info via props, or manipulating state by firing functions that are passed down from their parents. The latter part I dislike immensely. It is difficult to determine the most appropriate place to dump state. Though this freedom, I suspect, will become a great asset once I am more fluent and confident with React - and I can definitely see how Redux makes this tremendously more beautiful.

So for now I am working on getting the dynamic filtering working. I have figured it out for generating buttons but now I need to register the changed/filtered locations somewhere. I suppose it can go in the first common parent, but considering the app is so small it would be equally fair to put it in the App/Root which will enable me to share it with the Map component later - thus enabling the state to be persisted when the route is changed between List/Map views.

So yes, let us proceed with that notion. Keep all the state in one place. Settled. A pure React app’s data flow is extremely reminiscent of Angular, though I am extremely satisfied that React compels/encourages discrete componentisation. Angular does this to a degree but only within the confines of the structures it provides, which can be… stuffed full of any logic you desire. React components, however, alongside ES6/Webpack syntax, quickly become cumbersome when you try to stuff too much into one file.

Ok, two more things to think about: 1) Using google maps, 2) Styling. Let us begin with google maps. The beautiful part now is that I can use a great deal of the components I already wrote in the List component. This is in contrast to the Angular style where it was simply convenient to copy and paste the whole element. Now, if I copy and paste the React component, I will have very little code duplication (in fact none…) because the modules will be imported and shared across both components, and the unique parts of the List component I will simply remove entirely as they have no place in the Map component.

14/11 Monday

Alrighty, a new work week dawns. Feeling much better this week and I anticipate a greater degree of productivity. In part I think this renewal is due to the observations I made on Saturday.

With the refactoring done for the ETL project I can now safely move on and complete the transformer steps. Most steps are done, I only need to consider the signup merging and cleaning. So rather than starting with a transformer I think I need another service/utility to merge dataframes based on a common attribute.
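
A minimal sketch of what I imagine that utility looking like - really just a thin wrapper around the pandas merge function, with invented column names:

import pandas as pd

def merge_on(left, right, key, how="inner"):
    # Join two dataframes on a shared column, e.g. a customer id or email.
    return pd.merge(left, right, on=key, how=how)

signups = pd.DataFrame({"email": ["a@x.com", "b@x.com"], "source": ["ppc", "organic"]})
sales = pd.DataFrame({"email": ["a@x.com"], "amount": [49.0]})
paid = merge_on(sales, signups, key="email", how="left")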

Tip of the day: learn postgres window functions - https://www.postgresql.org/docs/current/static/functions-window.html | https://www.postgresql.org/docs/current/static/tutorial-window.html They consistently save my ass.

The merge utility was simple enough, simply an interface to the pandas dataframe merge function. With that in place I also implemented some test helpers for signup data, which helped to identify a mistake in the signup SQL query: it should only return one row per customer (the latest signup based on the date passed), not every instance. The interesting part now is that I have complete freedom regarding how to move forward due to the extremely flexible architecture we have opted for. Either knock out the signup transformers or another utility that I have identified to union the datasets I get from different databases.

13/11 Sunday

Not necessarily unhappy about the lack of productivity today, my time was filled with family and household necessities.

12/11 Saturday

The slump persists. Granted it has been quite a refreshing change of pace and enabled some nice family time to work its way into my usual grind. Today, however, a new tactic is being deployed in the Battle of the Slump: figuring that perhaps the home work environment is contributing to my complacency, I am making a desperate attempt to move myself away from the frustrating feedback loop I am feeling (“Frustrated because I can’t budge the slump, frustrated that I’m frustrated that I can’t budge the slump”).

I watched a fantastic lecture on procrastination by a fellow called Professor Timothy Pychyl which resonated with me a great deal. While I enjoy referring to these slumps as some sort of mental routine, I suspect they are more akin to a severe hit of procrastination. According to Professor Pychyl, when we focus on giving ourselves small, concrete, actionable intentions (rather than vague goals) we are many times more likely to fulfil those intentions - and feel good about doing so. This, I think, explains my relative success over procrastination since marrying, as the need to prioritise and manage my time more acutely became important, which naturally led to seeking smaller, time-sensitive implementation intentions. I also believe this is something that this journalling format can help with but I had been struggling to put into words. Intriguingly, an observation to support this hypothesis lies in this very journal! If we examine the current goals I have set for myself I think we can see a potential contributor to this present slump:

Work: ETL
  • Goal: V2

Personal: React!
  • Goal: Let’s learn it for real.

Learning: What to Expect When You’re Expecting!
  • Goal: Read selectively. Read that which applies to us now and that which I anticipate might.

What we can see is two very vague and broad goals, 1) ‘ETL V2’, 2) ‘React’, and one that is rather more concrete: how to read ‘What to Expect When You’re Expecting’. The latter, reading ‘What to Expect..’, has been going tremendously well; I have not skipped anything because my goal is clear. Learning React and finishing the ETL system are massively unclear. Not unclear in their distant deliverable but in the steps required to get there. So I propose an additional field to keep track of the ‘actionable intention’ I should be working on at that moment, or replacing the goal altogether with this ‘actionable intention’. I think the former would be most appropriate because the ‘actionable intention’ should be guided by, and help bring me closer to, the goal.

My current action is to finish the refactoring and changes brought up in my last code review. The next is to move the SCP interface into the utility directory and make the specific SCP implementation the same format as the zuora downloader. Simple enough. Simple enough turned into a little more beautification than anticipated! I’ve been frustrated several times because the system wasn’t smart enough to auto-create directories for me, so a quick bit of os library magic has cleared up that frustration; I will do the same for the zuora downloader module too.
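
The os library magic amounts to little more than this (the path is illustrative):

import os

def ensure_dir(path):
    # Create the target directory (and any parents) if it does not already exist.
    if not os.path.exists(path):
        os.makedirs(path)

ensure_dir("data/downloads/zuora")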

11/11 Friday

Frustratingly empty.

10/11 Thursday

This morning I am giving React a little more of my love. The componentised workflow is compelling to me, especially since this was how I worked with Angular anyway, with a directive-first approach. React offers a simpler interface for constructing components, which I certainly find more desirable; the lack of structure, however, is not. The paradigm Angular enforces may feel strict initially, but it brings a measure of consistency and predictability to your work. This is perhaps the only negative element I foresee with moving towards a reactive/functional paradigm - sometimes unrestricted freedom and creativity leads to chaos. For sure the C++ and Java era project managers felt the same. Hopefully with modern tools and testing these concerns can be mitigated.

I have conquered the React event and state system so now my workflow becomes remarkably reminiscent of the Angular implementation I did for this little toy project. At this stage I am wiring up click handling to render each of the filtering buttons appropriately, and the React component composability is beginning to shine through.

09/11 Wednesday

My pace feels slow this week. I’ve been slogging my way through this downloader module without reaching a point where I am satisfied. Now, due to time constraints, I need to move on. Perhaps this represents a new peak in my evolution as a developer: I am able to write working, well-tested features that nonetheless retain a great deal of refactorable warts.

At the end of yesterday I had knocked out an SCPInterface class using the paramiko library. However, I inlined a bunch of configuration details so I will quickly apply Extract Subclass so that I have a more specific instance of this interface. Managed to squeeze in a few more transformer use cases and pull out a BaseTransformer super class to centralise some of the common features. While I find the abstractions of OOP quite confusing (not in day-to-day use, but their relevance: why not just use functions… everywhere!), I often appreciate the shape and structure they provide my code base.
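
A rough sketch of the Extract Subclass move I have in mind, with the previously inlined connection details pushed down into a specific subclass. Host, credentials and class names here are made up:

import paramiko

class SCPInterface(object):
    # Generic SFTP/SCP download wrapper around paramiko.

    def __init__(self, host, username, password):
        self.host = host
        self.username = username
        self.password = password

    def download(self, remote_path, local_path):
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(self.host, username=self.username, password=self.password)
        try:
            sftp = client.open_sftp()
            sftp.get(remote_path, local_path)
        finally:
            client.close()

class ReportServerSCP(SCPInterface):
    # The specific instance, holding the configuration that used to be inlined.

    def __init__(self):
        super(ReportServerSCP, self).__init__(
            host="reports.internal", username="etl", password="secret")  # placeholders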

Code review of my latest changes was mostly positive, I just need to work on my naming and break some classes out into their own files, which is what I shall work on now. Refactoring has progressed smoothly and I am quite pleased with the way this project is shaping up. Code reviewing with a senior is proving to be a tremendous boost in helping me discover those refactorable warts in my code that I mentioned earlier in the day.

08/11 Tuesday

Logging is already proving its worth. I find the runtime feedback invaluable, even during testing or manually executing my code in the terminal. Observing the execution was a lesson I learned from ETL batch jobs so I’m pleased to see the value is transferable.

Yesterday we mostly completed the zuora downloader module. There is one more element that needs to be put in place, however, so that is what I’ll work on this morning. I am very pleased with the shape that TDD helps me to form. It really does act as a catalyst for design, enabling you to treat your code in the way an artist might a block of clay. This mostly comes in the form of safe refactoring.

07/11 Monday

Inevitability struck and this self-dialogue became neglected. Coincidentally, or causatively, I have been hit with what I usually attribute to a routine slump in my intellectual capacity. Usually, after a period of working close to peak performance, my brain and body go into a fierce, but brief, period of exhausted molasses. During this period I find rest to be the only cure. However, for complete transparency, these periods used to be very routine, maybe once a quarter; since getting married I am pleased to say that the respite and downtime provided by family time seems to have provided sufficient rest for my mind to continue working at an optimal state during work hours (and beyond, when family obligations don’t beckon!). So this calls into question the root of this recent slump that has seen at least two weekends of lacklustre productivity. While this technique is too new to be attributed as a cause or saviour, it has led me to observe a flow that I appreciate, and in these few days without it, I have made another observation that leads me to think this journal will be a sensible way to proceed. The latter observation is the way in which I attempt to tackle new problems and learnings. My usual tactic, prior to this journal, was to rely on my mind, to expect it to make connections and lead me to the next steps towards solutions and memory retention. Though, with effort, this worked during my university years, I have never seen it as an ideal approach, thus my pursuit of all things productivity enhancing! Anyway, enough of this for today, let’s proceed:

What am I working on? ETL, in short. After a very fruitful and enlightening pair programming session with a senior the system has been reformatted into a more robust structure. Now I am working on the daily data downloaders. These consist of a simple SCP script to grab data from another server and an interface to the Zuora data export api. Fortunately, both of these items are already fleshed out, I only need to rework them into a class structure.

It’s only taken me a couple of years but I think I have my head wrapped around the Python standard logging library. I’ve always felt that the performance impact it represents would be more than worth the sacrifice, given what I believe will be invaluable insight into the runtime behaviour of any app I write. Keen to see how this pans out!
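
For the record, the setup that finally clicked for me, roughly:

import logging

# Configure once at the application entry point, one logger per module thereafter.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
logger = logging.getLogger(__name__)

logger.info("Starting zuora download")
try:
    raise ValueError("boom")
except ValueError:
    logger.exception("Download failed")  # logs the message plus the traceback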

02/11 Wednesday

In the flooow. Haven’t felt the compulsion to resort to this dialogue format while I have been porting old ETL system code to a new, more beautiful format. Very satisfied with my progress.

Alrighty, with the new ETL system off to a good start I feel happy to start diverting a little of my time back to mastering React, beginning with a reimplementation of the little map toy I made in angular recently.

01/11 Tuesday

Very satisfied with my progress test driving the ETL system. A nice benefit of testing first is that I am encouraged not only to isolate and make things composable (go dependency injection!) but also to design my objects and data flows to be test friendly.

So this morning I am working on some data dependencies for one of the transformation steps in the ETL system.

31/10 Monday

As of last night I had a change of heart regarding the direction of this ETL project. While it is a noble idea to try and implement a custom system, in this day and age it seems ultimately meaningless; it definitely feels like reinventing the wheel. I feel a more important sentiment in software development today is time and, of course, ‘agility’. Therefore, it is more sensible to take the rough-edged implementation we have now and smooth it out into a more beautiful shape. All is not lost though. The initial work put down for the ETLv2 implementation is isolated enough to fit nicely into a good workflow/data pipeline toolkit. Talking of libraries/toolkits, Luigi is a wonderful, but rigid, thing. It forces you to make pipelines that conform to a particular shape and direction. Last night, however, I came across SciLuigi, which is a simple wrapper/layer above the core implementation of Luigi that aims to promote code reuse and more creative composition of task logic.

What am I working on? Having mapped out the dependency tree for the SciLuigi tasks I envision, the steps I need to take to implement this workflow become very apparent. Currently the Zuora extractor is set and plugged into a task. Now I need the same for an AWC extractor task, followed by a task to merge both outputs into one file.

One quandary is irking me, though it mostly revolves around my lack of knowledge of Pandas. I need to discover a few things: 1) how to map over an entire data frame, 2) while mapping over a data frame, how to create a new series value as I go. My initial, naive approach would be to pre-create the new series, iterate over the entire data frame and simply use the initial values I need along with the function necessary to create the new series value. However, there must be a more optimised way to achieve this. Pandas data frames have a few methods for iterating/mapping over the entire collection, such as .map and .apply, and even permit being iterated over like one would a regular Python dictionary with .iterrows, where one value in the tuple takes the index and the other the whole row of series values.
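
What I have pieced together so far, on the assumption that .apply along axis=1 is the sane middle ground; the column names are invented:

import pandas as pd

df = pd.DataFrame({"sourceid": ["ppc-123", "org-9"], "cid": ["4", "7"]})

# Derive a new series from a single existing column.
df["is_paid"] = df["sourceid"].map(lambda s: s.startswith("ppc"))

# Derive a new series from several columns at once, row by row.
df["signup_key"] = df.apply(lambda row: "%s:%s" % (row["sourceid"], row["cid"]), axis=1)

# Plain iteration, dictionary style: the index plus the whole row of values.
for index, row in df.iterrows():
    print(index, row["sourceid"], row["cid"])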

So, at present I have the second database’s extractor function in place. Next I need to combine the two database tasks/extractor outputs into one merge task. Ah, another issue I am struggling to capture in my mind’s eye is how to permit running an entire backlog in one fell swoop, while also enabling flexibility for daily runs. Perhaps if the extractors are always striving to fill the entire backlog then they will default to filling in the data between run intervals, and also inherently provide a failsafe for days when API and network errors cause gaps in the update. However, I currently don’t know if this is compatible with the luigi way of doing things.
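
One hedged idea for squaring backlog and daily runs with Luigi: parameterise every extractor task by date and use a wrapper task to require the whole range, so a daily cron run and a full backfill become the same command with different bounds. Task and path names here are invented:

import datetime

import luigi

class ExtractSales(luigi.Task):
    date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget("data/sales/%s.csv" % self.date.isoformat())

    def run(self):
        with self.output().open("w") as handle:
            handle.write("")  # real extraction would write the day's data here

class Backfill(luigi.WrapperTask):
    start = luigi.DateParameter()
    end = luigi.DateParameter(default=datetime.date.today())

    def requires(self):
        day = self.start
        while day <= self.end:
            yield ExtractSales(date=day)
            day += datetime.timedelta(days=1)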

Turns out combining dataframes is a pretty trivial matter - go Pandas! .append and you’re away. However, it doesn’t make much sense when you have datasets that are not the same shape so I need to work on transformation functions before we can play with combining.
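
For my own reference, the trivial bit (assuming both frames have already been transformed into the same shape):

import pandas as pd

zuora = pd.DataFrame({"email": ["a@x.com"], "amount": [49.0]})
awc = pd.DataFrame({"email": ["b@x.com"], "amount": [19.0]})

all_sales = zuora.append(awc, ignore_index=True)
# equivalently: all_sales = pd.concat([zuora, awc], ignore_index=True)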

30/10 Sunday

After a pleasant lazy weekend I have a few hours to ponder my work. Though admittedly I have been interspersing my time with a little reading. I have been working my way through Douglas Crockford’s Javascript: The Good Parts which is a book I have had on my mind to read for quite some time now. It is not quite the book I imagined it to be since, so far, it is merely a discussion of fundamental building blocks of the language. Perhaps more insightful revelations will come later.

29/10 Saturday

Enjoyed a work free lazy day - not at all dissatisfied!

28/10 Friday

The journal commitment was broken yesterday but I would argue that it was for the right reasons - pair programming and learning react. These two activities consumed my day. I have really absorbed a good amount of React and its philosophies are becoming clearer to me. In the first tutorials I watched I was a bit confused about the unidirectional data flow that is supposed to be one of the hallmarks of the library (I don’t want to keep calling it a framework because it is, intentionally, too limited in scope). Most of the tutorials I watched explained event handling in React by passing functions down the hierarchy as props on components. This is a very simple model to understand. Add functions to the parents that alter state, send those functions to the child components, then they can alter state directly which can be propagated to the entire component tree. My first thought, however, was that this didn’t seem very intelligent! Especially for a library getting so much love from the functional crowd. Finally, however, I discovered Flux, or Redux, which makes the event management in React much more sensible! I won’t chatter on the topic, but I am excited!

Today, and for the next sprint or so, I will be banging on this new ETL implementation. My colleague and I have a good workflow in place and our mentalities are aligned towards making this a beautiful piece of software!

What am I working on: For the most part, moving scripts into objects as part of the first iteration of the ETL master plan. The plan essentially consists of smashing out a well separated, working implementation without too much thought regarding the design - a sort of Sandi Metz mentality. The idea being that once we have a bag of small, well tested objects, we can then begin pattern matching and refactoring the logic into a more robust architecture. So, task one for today is moving the cohort extractor logic into a few classes.

27/10 Thursday

26/10 Wednesday

Today was a fruitful day of pair programming with a colleague of mine. Mostly we put our minds together to work out a first draft plan of a new ETL system. Since the current version is mostly a mesh of scripts and luigi tasks it will be meaningful to weave them together into a neat, unified system.

So I spent a fair amount of time digesting and poking react tonight. Turns out there are some quirks that crop up when trying to feed ES6 syntax to react classes (and any object for that matter), definitely something to watch out for in the future. (https://derickbailey.com/2015/09/28/dog-es6-arrow-functions-really-solve-this-in-javascript/)

React appears to be less frightening now that I have gained some insight into the difference between, and importance of, props and state.

To my current mind we could equivalently refer to these concepts as:

  • state: data saved or set at component creation
  • props: data passed to the component, usually from a parent

Given these concepts, the ease of associating local functions with a component, and the well-defined lifecycle functions available, I can see working with the react philosophy becoming quite intuitive - vastly more so than working with Angular components! However, we must not forget that React, in its basic form, is only concerned with the UI structure and behaviour. Therefore, at this naive stage, Angular wins due to its comprehensive toolset and the harmony with which it works together. I am interested to see how I can mimic Angular services in React and also the convenience of testing.

25/10 Tuesday

Let’s endeavour to streamline the input to this journal. The dialogue on 23/10 felt quite productive, centered around framing problems/solutions, lessons learned, and being able to answer “What am I working on?”.

What am I working on? Additional validations for the bulk zuora extractor. A strange, untraceable error caused the report updates to fail yesterday. This was easily remedied by running the tasks manually, but unfortunately we had to hear about the error from a stakeholder rather than hearing about it directly from the task. This can be solved by adding some sort of before and after checks to ensure data has entered the database. Additional things to fix are proper scheduling times for the cronjobs; it appears my UTC calculations were off and the tasks are being run during the Malaysian work day! There is also one lingering issue with the task runner itself: it should be run from within the virtualenv session to ensure all packages are accessible, rather than relying on directly executing the files hidden within the virtualenv dir.

I think I will work through the above backwards, as starting with fixing the task runner seems the logical place to begin! PR submitted, reviewed and merged. Also brought the zuora extractor query to a neat conclusion and had a meeting about the overview for the new ETL system.

The new ETL system will be a simple, well designed, OO heavy, Uncle Bob approved affair. Essentially we want to craft a simple system of independent units/services that enable constructing tasks of the form:

Reader (IO) -> Processing -> Writer (IO)

With scheduling, success state, reporting and validation as other independent units that provide additional functionality to bring the ETL system together as a whole. So, to preempt the usual question: I am working on the architecture of this system, first to draw an outline, then to solidify it in a simple UML bird’s-eye view.
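
To make the outline concrete to myself, the kind of composition I am picturing - very much a sketch with invented names, not the final design:

class CsvReader(object):
    def __init__(self, path):
        self.path = path

    def read(self):
        with open(self.path) as handle:
            return handle.readlines()

class StripProcessor(object):
    def process(self, rows):
        return [row.strip() for row in rows]

class CsvWriter(object):
    def __init__(self, path):
        self.path = path

    def write(self, rows):
        with open(self.path, "w") as handle:
            handle.write("\n".join(rows))

class Task(object):
    # Reader (IO) -> Processing -> Writer (IO), each unit independently testable.

    def __init__(self, reader, processor, writer):
        self.reader = reader
        self.processor = processor
        self.writer = writer

    def run(self):
        self.writer.write(self.processor.process(self.reader.read()))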

A depressing and enlightening read from the Wait But Why author: http://waitbutwhy.com/2015/12/the-tail-end.html. It succinctly sums up my biggest sadness and fear regarding living so far from my family. Damn. Additionally, in a mini post, he outlines the fact that the average day can be broken up into approximately 100 chunks of ten minutes and that evaluating how we utilise this handful of time could be extremely valuable - and I think could make a very neat productivity/self reflection app! Something to consider once I am satisfied with the state of Remin. http://waitbutwhy.com/2016/10/100-blocks-day.html

Alright, a slight tangent after hearing about a remarkable lady popular in the Ruby community called Sandi Metz. She has a tremendous presentation style and pace which makes her an excellent communicator of difficult topics. Of particular note is a talk about concentrating on populating your code base with many, many small, independent objects. Humorously this seems very reminiscent of functional programming; perhaps this is the natural order: good object-oriented code will forever gravitate towards a functional style.

So I began my approach to the new ETL system endeavouring to, well, waterfall my way through, by thinking about as much of the app as possible and jamming it into a diagram. This has some merits of its own but detracts from an agile philosophy. Instead let’s try a… stone skimming analogy.

Skim Driven Design

Skim Driven Design. Start somewhere, anywhere that shouts at you as the most appropriate, or the only place that you can think of. Don’t get tied down by architecture. Most importantly, remember a stone being skimmed across water requires energy to ensure momentum can carry it across! Tests are your code’s energy, without them you will never hit your mark. With tests we can give momentum to the implementation and anticipate our work being carried to its next logical step.

For me, especially with such a data centric piece of software, I want to start with the data. Dogma says no, the purist will complain and say “No, no, no, start at the core of your app and work out”. Screw that, if a baby starts with a cell and prospers from there, my app can come to fruition an object at a time.

Object 1: DataSource.

First, however, I need a test runner. py.test is the winner as far as I am concerned. I’ll start in a new virtual environment (conda environment) - clean slate.
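
So the first skim of the stone might look something like this: just enough of a py.test case to force a DataSource object into existence. The module and interface are invented and do not exist yet - which is rather the point:

# test_data_source.py, run with: py.test
from etl.data_source import DataSource  # hypothetical module, to be written

def test_data_source_yields_rows():
    source = DataSource(rows=[{"email": "a@x.com"}])
    assert list(source.rows()) == [{"email": "a@x.com"}]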

24/10 Monday

A rather empty work day, finished the Angular test project which I am very happy with, joined a couple of work meetings, finished a book on Javascript promises.

23/10 Sunday

Alright, last day of wrap up for the Angular test project. I am very satisfied with the progress thus far and there is only a limited number of things to wrap up before I can begin plugging in some beautification.

As per yesterday’s round-off, we are working on the groupLocations service which should provide some functionality to limit the scope of locations retrieved from the database. In particular we should be able to limit by country and city. This is a very easy thing to implement if the limits are hard coded but it might be nice to have a more generic solution so that if, in some hypothetical scenario, the JSON value obtained from the server is extended, the filtering functionality would immediately permit the new value to be included. So rather than hard coding the accepted limit variables, we should hard code that which should not be a valid limit value, in this case: lat, lng, id and name. This would mean a continent value could be added to the location object and the filter mechanics would immediately accept it into the filtering menu.

Here’s how I imagine this working. Next to the list or map object there will be a set of check boxes, or perhaps a select menu which will be populated by all the unique first level options for filtering, like city, country. When one is selected a second menu should appear with unique options for the 1st level selection, like a list of cities or countries.

So we will essentially build a conditional/decision tree. Though under the hood this can be easily implemented with filter, this should also make it relatively nice to test.

Let’s begin with some tests to get the service set up. So here’s a conundrum. Should I avoid a dependency within this grouping service and pass in the locations from the controller, or allow the grouping service to grab the locations from the same http interface service? There are arguments, in my head, for both options:

  1. if we pass in the locations from the controller then the controller has more to do and is responsible for performing that action.

  2. if the group service gets a dependency then it is tied to this apps implementation, it knows about the source of the locations.

Now that I have framed the options in my head I see that the former option is preferable because:

  1. it is the controllers job to wire things together, so long as it is not performing the complex logic itself.

  2. the grouping service becomes more portable and more generic without the need for dependencies.

Perhaps this is a neat metric for Angular development: fewer dependencies, more data flow. Time to pull the dependency out of the service and permit each function to accept arguments. Splendid, remarkable how much less config and start-up testing is required with this simplification.

What interface should this service have? As a first stab I would say:

  1. getUniqueKeys/Groups

  2. filterByGroup

I foresee a problem if the groups become nested beyond… 2 levels. How should the state of the grouped location list be maintained, yet remain flexible enough to permit rolling back as filters are applied and removed? Let’s test drive the initial implementation and flesh out the problems.

Randomly, a test that was working yesterday began failing. Definitely due to the Jasmine async test done feature that I don’t understand. Fortunately the function called within the init method is already covered in another test so I shall settle for that rather than spending more of my time there.

More unexpected test failures. I think something to do with the order of tests and/or not tearing down jasmine spies might be causing the issue. I will investigate upon completion of the service I am working on.

A simple implementation of the grouping service is done. It is quite dirty and not nearly as generic as I would like but that can come later if the requirements demand it. So, what am I working on? Two options: figure out why the async tests are failing or move on to permit filtering on the frontend. I think I will drive forward and get the app done so I can at least have it in a finalised state. Tidy up can come later. First, I would love to have Jade templating available to me! Working with raw HTML is tiresome, therefore, let’s get another little gulp task/watch in place. However, the danger of this is that the app directory could get messy, so I may need to implement a proper build/release pipeline. Hum hum, I will skip it for now, I don’t want to over-complicate things so close to completion. BrowserSync, however, is not something I am willing to skip!

BrowserSync is set up. I could spend a long time tinkering with my setup but now is the time to get things wired together and beautify this bad boy. Buttons are in place, and working with protractor is quite pleasurable - when you’re dealing with simple behaviour! What am I working on? Buttons: once a group button has been clicked it should reveal a new set of buttons to limit the locations by that sub-choice. Done, quite a chore but testing my way through proved invaluable as it revealed flaws and led me to sensible resolutions at the same time.

With a minimal, dirty-looking, but feature-complete list view in place I can now move the same implementation to the map view. The redundancy between the list and map view is screaming at me but I do not wish to merge them at this time. What am I working on now? Bringing the map view up to the same feature set as the list view.

22/10 Saturday

The weekend arrives and I can finally dedicate a sufficient time window to grind on the little angular example I am putting together. So far it is moving in a very fluid manner. Jasmine and protractor are in place and guiding the implementation beautifully. I had frankly forgotten how much I enjoy developing once I have a comprehensive test suite to guide the way. I will definitely try to replicate this with backend development by initially testing the interface to the database and then moving on towards the center of the app using the abstracted data model as a guide.

To the usual question: What am I working on? With protractor in place and working I can now proceed with having the List view render the ‘database’ json properly. Easily achieved with a neat ng-repeat! First, however, let’s write a protractor test to guide the implementation - hoozah!

Turns out testing promises is a giant bitch - until you discover the async handling functionality within Jasmine… which I don’t understand at all, but if you’re trying to play with promises then it is necessary to utilise the optional done parameter and function in Jasmine:

it("test a function that resolves a promise", (done) => {
        functionWithPromiseInside()
        done()
        expect(...)
    })

Now that I have the promise handling settled for the list controller we can consider plugging in an angular map module. Settled on ngMap. This is a library I have at least investigated before (due to a previous project). The API is very simple and seems very consistent with the angular philosophy and design. With a simple map implementation complete, time to ask myself that question again as a prompt for tomorrow:

What am I working on now? The only thing remaining on the little architecture diagram is the groupLocations which will handle some filtering over the locations and dynamically update the list and map.

21/10 Friday

Today will consist of another day of collaboration as we finalise an ETL feature. I have been noticing a few more benefits of using this self-dialogue/journaling format.

Alrighty, work has slowed down for Friday, which grants me a little freedom to work on some of the Angular test project. I also took some time to indulge in some videos - as usual. I am a big fan of this fellow and his videos; he has a very calm and humble approach to his presentations which is incredibly refreshing compared to other rather more… excitable speakers! Of particular note is his Supercell project which is a Gulp build/asset management system that is very reminiscent of a system I pieced together centered around HarpJS, though it suffered from some awkward issues when trying to pair it with BrowserSync. I suspect my life would have been simpler by taking Mr. Longie’s approach and keeping everything inside simple Gulp tasks.

The test app is taking good shape now that I have conquered the annoying Jasmine setup issues. The database server interface service is in place and tested to the extent it needs to be. Next I’ll plug it into some controllers before moving on to the templates. I quite like the idea of this implementation mentality, as if I am moving naturally from the server to the next step in the sequence of data flow events:

request => server => response => service => controller => view

Let’s begin by defining each controller, test-driven by its respective test set.

20/10 Thursday

  • Unknown applied_to_invoice_item_id
  • rate_plan_charge -> unit_price should reflect value before charge
  • -> charge_amount should have discounts etc. applied before reaching InvoicePayment
  • Difference/meaning between: dmrc, dtcv, tcv, mrr
  • Meaning of invoice.status

Today saw another day of collaborative work which, as noted previously, tends to be a productive thing. With the evening here I turn my attention back to yesterday’s Angular app. I finished with a small UML diagram outlining the structure of the app; next up I want to set up a small dev environment so I can test my way through the implementation of the app’s services. Gulp is my task runner of choice so let’s get that set up with jasmine/karma as my test runner.

Recalling and implementing the relationship between Angular/Jasmine/Karma took up a significant portion of my night but I know the payoff is unparalleled.

19/10 Wednesday

Good news for the morning: the ETL automation/validator is mostly working as expected. Got a couple of things to iron out but mostly I am happy. Things to fix include: some scripts pointing to the incorrect database, and the fact that testing the truthiness of a Pandas dataframe is not as Pythonic as it should be, which is causing one validator test to fail.
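
The gotcha, for the record: a dataframe refuses to be evaluated as a plain boolean, so the validator has to say what it actually means:

import pandas as pd

df = pd.DataFrame({"rows": [1, 2, 3]})

# if df: ...           # raises ValueError: the truth value of a DataFrame is ambiguous
if not df.empty:        # what the validator actually means
    print(len(df), "rows arrived")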

This morning, indulging in a new programming YouTuber - funfunfunction. Meanwhile, fixing the issues mentioned above, the immediate conundrum is that I don’t properly understand a particular library in the project that I inherited: the Python cron library. A small issue arose because I was inappropriately trying to locate bash in my virtualenv bin directory! So a short effort to pore over the docs is in order.

It didn’t take too long to discover that my attempt at creating a custom job type didn’t work out as expected - trying to find the bash command in a specific file is not smart!

With all immediate problems resolved I have been tasked with piecing together an ERD diagram to represent where the data that is squeezed into the report dataset comes from; I suspect this will take up a good portion of my remaining time today.

The ERD diagram came out quite nicely; very pleased with the workflow Atom and PlantUML offer.

A small… personal project to consider, here’s the specs:

Build a single page app. Given a json file with city and country coordinates

  1. Group those location by country and city.
  2. Show the location with address, city and post with its location in map.
  3. Provide documentation about technology used and flow.

Naturally, with a simple spec like this, I turn to my go-to JS tool - Angular. I’d like to experiment with using UML to guide the construction and planning of the app. It doesn’t need to be deep and thorough, just enough to detail the main components of the app and possibly data flow.

An Angular app, given the spec above, should consist of:

  • Home View
  • Search View
    • to inspect the data
  • Map
    • defaults to showing all points grouped by country and city

First up, let’s get the json data behind the great json-server package to provide a convenient API. Got a nodeenv environment set up, so npm install away.

18/10 Tuesday

Paid signups

  1. sourceid
  2. cid
  3. utm_source
  4. utm_campaign
  data =>
    1. sourceid
    2. cid
    3. utm_source
    4. utm_campaign
  • if sourceid == int() AND IN legacy paid codes
  • if sourceid contains any current paid codes
  • if cid contains any current paid codes
  • if utm_source contains any current paid codes
  • if utm_campaign contains any current paid codes
  • Same approach for data

Business Determination

  1. business_unit
  2. sourceid
  3. cid
  • if business_unit != null then use it
  • if sourceid regex 4th bracket
  • if cid regex 4th bracket

Another day of work that saw the journal neglected; however, today consisted of a rather more legitimate set of circumstances that couldn’t be avoided. Primarily: pair programming. The journal format, I suppose, could be considered a poor substitute for pair programming. It emphasises a continuous dialogue to promote problem solving and self reflection. This is something I feel pair programming excels at and improves upon; frankly, there is no comparison in terms of productivity! So I have not ended my work day feeling as though I was impeded by the lack of dialogue/productivity/problem solving.

This evening I have a few things to tackle to round off the day: 1) set up Clojure on my machine, at least to the point where the REPL is working, 2) flesh out my thoughts regarding the new ETL system, 3) email Remin users and promote an uptick in activity.

Clojure

Turns out I got this far before, or perhaps MacOS is shipping with Clojure now? Brew install gets us rocking:

brew install leiningen

Though leiningen is the… pip of Clojure, which installs the Clojure language as a dependency when you try to install it. Very smart. Attempting to upgrade leiningen via its command line utility prompts you to upgrade it via the package manager used to install it initially. This is the kind of foresight and planning that Clojure’s author seems to embody - excellent. Ground through a simple introductory web server tutorial which was enlightening enough. Nothing particularly special to note, it just felt like a normal web framework! It would just require a little practice to develop familiarity with the syntax and get some autocomplete on the go, but after that I can see it becoming intuitive. One exciting element was the fact I could define html in raw Clojure, which felt a bit like React; very much in favour of this - maybe I should write a similar templating engine in Python….

Can’t see much more happening tonight, sleep calls.

17/10 Monday

Last night got a little lost as I tried to shoehorn a large SQL query into a Pandas dataframe. It didn’t seem reasonable to me that 800k rows was too large, though it may have been that the server running jupyter wasn’t happy with the load I was expecting it to store in memory.

Excitingly this is the first day at work where I will be leveraging this new journaling technique for focus and working through problems. I have already blasted through an hour of work without falling back on the journal, which implies this is going to require a little more deliberation on my part. Frankly I am already feeling the benefits of this approach - my weekend was possibly the most productive I have had all year!

Priorities for today: fix some bugs that have emerged in the etl report generator. They crept in during an update that was merged last week but, due to some error handling, their influence on the dataset was not only minimised but also silenced - definitely not desirable. This raises a good point - occasionally observing your report output is valuable, especially if you are pumping errors to STDOUT.

Changes look good, sprint reviews are up-to-date, time to push a branch and wait on review. I really enjoy leveraging jupyter notebooks for this kind of work. It is trivial to load an entire file into a notebook and run it within the context of the whole project, which enables very quick, independent, non-destructive changes to that part of the code base. Additionally, with a remote notebook working on the staging server I can even run it in an environment that best mimics production. Even cooler, if I find that a file, referenced by the unit of code I am interested in, is broken, I can simply drop that unit into an adjacent cell and make similarly quick changes. Finally, upon completion, I can quickly copy all the code and their respective changes back into an editor, preserving source control’s authority over the changes I have made. The only disappointment is that this does not permit a very dynamic relationship with TDD. I will have to experiment with the Atom editor and its Hydrogen plugin to see if this workflow can be improved.

So let’s cast attention back to the issue of generating a neat set of raw cohort data. Last time I got stumped by unresponsive Pandas dataframes on jupyter. This time let’s try moving to my local development environment and using a more limited dataset to perform the same actions. Unfortunately, as I was about to push my changes I ran into some failing tests. Turns out a lapse in protocol saw me add a new database query that slows the whole test suite down a great deal. Unbearably slow!

Coach bound again. This was the first day at work trying to leverage this new journaling technique. I can’t say it went particularly well. Either I simply need more discipline to interweave my work with pouring internal monologue into this text file, or perhaps my goals are well defined enough in the workplace that I don’t need to rely so heavily on it. However, I wound up feeling the day was not as smooth as it ought to be, but I cannot determine whether this was because I was trying to use the journal or because I didn’t use it rigorously enough!

I’ve been undergoing a rather committed series of productive procrastination rounds. It started with a recommendation by Uncle Bob to check out some videos by the author of the Clojure programming language. His recommendation was on point, the man oozes sense and clarity. More than this, though, his topics, which don’t necessarily reference Clojure, definitely highlight the strengths of functional programming. This in turn led to splurging on more and more of his videos; there aren’t too many, I just watched the last one, which was an introduction to ClojureScript, a dialect of Clojure which compiles down to Javascript, smart. I have had a fairly distant affection for Lisp ever since discovering Emacs but it eluded me up until this point, mostly because I couldn’t justify the time investment. Clojure, a dialect of Lisp, appears to provide a rather broadly applicable (being compatible with both Javascript and Java runtimes!) Lisp that could be worth the investment! Something to explore this weekend.

For now, I will give a little more attention to my raw cohort generator. The main blocker currently is simply the feedback cycle. Trying to juggle so much database interaction causes a very laborious wait time. So the thinking goes: if I can have my generator work on a subset of data, it ought to work on the entire dataset! This will require some heavy testing but I think that is an acceptable, and desirable, compromise. Therefore, the first thing to consider is how I can sensibly limit the amount of data I am playing with while maintaining confidence over the results.

I have constrained my… data window in two ways, 1) only using one source of sales data and 2) limiting the scope to 1000 sales. This has greatly improved response time and I can now begin applying logic to extract the ‘latest paid signup’ for each sale.
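
The rough pandas shape of ‘latest paid signup per sale’ I am aiming for; the column names are stand-ins for whatever the real schema calls them:

import pandas as pd

signups = pd.DataFrame({
    "sale_id": [1, 1, 2],
    "signup_date": pd.to_datetime(["2016-01-01", "2016-03-01", "2016-02-01"]),
    "source": ["organic", "ppc", "ppc"],
})

# Keep only the most recent signup for each sale.
latest = (signups.sort_values("signup_date")
                 .drop_duplicates(subset="sale_id", keep="last"))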

16/10 Sunday

Mall trawling (and sleeping!) taking up much of my day. Mall time is usually put to relatively good use for reading and thinking about any problems I have at hand. So numerous articles were tidied off. One little project that has been creeping in and out of my consciousness is a rewrite of this little diamond: Choreographer. I would like to get it rewritten in Typescript, both for my own interest and the longevity of the open source package as a whole.

Anyway, key thing on the agenda for today: finish my reading of Clean Code. 2/3 of the remaining chapters were completed before leaving the house this morning. They are intriguing reads as we get to follow Uncle Bob through some real world examples where he tidies up old libraries. The final chapter, which I am most keen to absorb, is his suggestions for ‘code smells’, patterns within source code that may represent places to start thinking about tidying up and better abstracting your logic. Interestingly, his list builds on Martin Fowler’s own list of things to watch out for, which will make for a nice bit of reinforcement!

Uncle Bob, his books, his videos, his articles, ooze with sense and it’s a pleasure to read his content and be blessed to be part of the generation who can learn from a wise set of pioneers who waded through the early days to bring us their stories and save us from strife.

Raw cohort generator

With that book settled and a new one chosen (Pregnancy material, hoorah) it is time to turn my attention back to a work goal. I’ve been grinding away trying to streamline the cohort report generator by organising functions and database queries that can serve to create a ‘base’ cohort dataset, pre-transformation and creation of custom fields. If that can be achieved in a performant manner then generation of the full cohort dataset becomes trivial - and damn fast. The latest blocker was data tucked away in a replication database that prevented bulk exports due to the way replication streams work in Postgres. Maybe the setup could be altered to permit large queries from certain sources but I don’t know nearly enough about Postgres replication setup to gauge that! Therefore, a little negotiation scored me an additional export script run by the usual server’s backup run. Now I can easily access the two tables I need by scp’ing that data across to our Business Intelligence server each day. It’s dirty, it could be better, but this is the cheapest and easiest to swallow option from a business perspective.

First task: get the newly localised data queries working. One benefit, and pain, of working on a ‘hand-me-down’ project is that you uncover nuggets that end up being remarkably useful. Upon taking over this project it was our plan to consolidate all processing for the primary report into one streamlined process. Meaning, in one step, in one function call, we could generate the entire report data set. It so turned out that this resulted in a lot of iteration and subqueries within that one call. Ultimately, this resulted in a particularly non-performant system. The original author had the wise idea of precompiling a lot of the data beforehand, leaving the final output fairly easy to produce through a few SQL joins. Initially it seemed reasonable to smash all the disparate logic together into one smooth process resulting in one output, rather than a multitude of intermediate steps. I think this was a mistake. Therefore, the next step I need to give my attention to is producing a new, cleaner, framework-free signup associator pipeline.

The current system is bound to a python library called Luigi. This is a task management library, written by the folks over at Spotify, that has a number of fantastic features that make it a smart choice for building ETL systems, especially if you’re playing with ‘big data’. Our use case, however, is rather more manageable. By manageable, I mean we can and do execute every step of our ETL pipeline on a single server. Therefore, one of my first observations was that this was overkill for our requirements and now, with the rewrite in progress, it was one of the first happy sacrifices made. Personally I am in favour of an extremely simple model reliant on SQLAlchemy and Pandas to coerce the data into the expected format.

So, let’s think about the problem at hand. I need to gather all ‘last paid signups’ into a table. This means associating one of potentially many user signups with a particular purchase. In the last year we have migrated to a new commerce platform, Zuora, which means we are in the difficult phase of needing to manage two sources of new (and old) sales: Zuora and an internal system called AWC. So, we need to extract purchase information from both sources and attempt to pair it with a single signup that can be found in another distinct data source called OK. Using email addresses we can make a simple association to the correct user/signup set, but there is a variety of complicated and obscure legacy business rules that need to be satisfied in order to correctly pick out the signup most likely to carry the signup information and marketing source that we can assign to that purchase.

The first simple step I foresee - extract all order ids from both sales/purchase sources along with their associated ‘order items’, meaning each order will potentially have many items within it, much like a shopping cart. A little juggling with SQLAlchemy syntax to discover the appropriate DISTINCT ON equivalent (session.query().distinct()) and I now have a couple of queries to get all orders/invoices.
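
For future me, the DISTINCT ON shape that took some digging. The model and column names are illustrative, not the real ones:

from sqlalchemy import Column, DateTime, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class OrderItem(Base):  # stand-in for the real model
    __tablename__ = "order_items"
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer)
    created_at = Column(DateTime)

engine = create_engine("postgresql://localhost/etl")  # placeholder DSN
session = sessionmaker(bind=engine)()

# Renders DISTINCT ON (order_id) under the Postgres dialect:
# one row per order, the most recent item winning.
latest_items = (session.query(OrderItem)
                .distinct(OrderItem.order_id)
                .order_by(OrderItem.order_id, OrderItem.created_at.desc())
                .all())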

15/10 Saturday

Fix remin memory ordering

The cohort validator has been pushed and awaits judgement. Next on the work agenda is to complete an all-in-one script to generate the basic set of cohort data pre-transformation with as few calls to the database as possible. For today, however, I will turn my attention to Remin.

I am satisfied with the look and feel of the website for now; there are mainly a few minor issues that need to be resolved. Chief among them is the order in which added ‘memories’ appear in the stream of user uploads. The root of the issue lies here:

div(ng-repeat="memory in vm.memories | orderBy:'-$id'" ng-cloak)
  gz-memory-brick-directive(memory="memory")

An Angular template iterator that references one of vm.memories’ properties to order the whole list by. First let’s do a quick inspection on the firebase dashboard to see if I can spot some differences between out-of-place images and those uploaded before. Immediately, but correctly, I side tracked myself to investigate the console output (which displays all memories obtained from the database). This showed me my error fairly quickly. Filtering by $id seems smart, and if we were talking about a standard database then it would be the wise choice, but the AngularFire library creates a seemingly arbitrary string id like -KRbDBdxwISnsKNb5oU4 which does not lend itself well to ordering! A quick switch to ordering based on timestamp settled it.

Next, there was a small issue when people first signed up where their profile was not saved correctly. I need to manually update a few users to fix this. Manual administration complete. A small observation: email signup users are having their profiles created correctly (after my fix) but google/facebook signups may not be, which warrants investigation. Yes, definitely an issue for social signup users. It appears the redirect causes the registration callback to be skipped, which is where profile creation was taking place. Two actions are necessary: 1) move the createProfile function to its logical home within the profile service and 2) permit creation of a profile - if it does not exist - at sign in time. This will better capture users at a point post registration.

Clean Code

After successfully, and satisfyingly, cracking on the Remin issue earlier I was healthily side tracked by family matters and - Rich Hickey. Mr. Hickey, a tremendous communicator, has many intriguing sound bites to feed you. Very much recommend checking out his talks.

Now that my two (work/personal) project priorities have been tidied away I have turned my attention to finally absorbing some Clean Code. It turns out I was quite close to finishing the book all along. I had been a bit neglectful, having left the book to fester in my bag for some time, but after refreshing my memory with the book’s table of contents I realised I have already digested (and remarkably retained) a good portion of the content I felt applies to me. All that remains is to walk through the practical chapters where Uncle Bob steps through a couple of examples of modules he has refactored. Let’s see if I can tick this one off my list tomorrow!

14/10 Friday

Alright, priority today is to get ETL validation deployed. All that is left to do is testing out the script server side and adding in some slack notifications if anything fails.

Fix blog scrolling

Meanwhile, however, I wish to wrap up a dirty issue with this blog. Unfortunately the theme designer did not account for funny individuals like me wanting to use the home page for lengthy content! Therefore, the view you’re reading here needs to become scrollable. Stack Overflow (SO) hunting yields a few suggestions. Essentially, the container needs to permit text overflow using overflow: auto or overflow: scroll. My problem is twofold: 1) testing the site locally does not render the split view correctly - it appears to be something to do with the viewport size not being set appropriately:

Chrome View Renders strangely

Interestingly, using the Canary (Chrome developer browser), the site renders correctly!

Canary renders properly

The myriad of class style dependencies is not working in my favour, though a small win was obtained by removing one style, which allows this journal content to occupy much more horizontal space. It appears to be one thing to apply the option for scrolling on an element, but if that element does not reflect the height of your content then it is probably not the appropriate target. Yes! Applying the overflow: scroll property to the immediate parent of where this journal is inserted (<div class='about-info'>) finally gets us a scrollbar. The visual presence of a scrollbar is fixed, but the journal content container still doesn’t recognise/respect the actual height of its content. Initial SO suggestions pointed towards making the container display: block, which a quick Mozilla docs search tells me, very usefully, turns an element into a ‘block element’… My naive understanding of CSS tells me that a div is already a block element (meaning it can contain other inline elements).

Alrighty, finally sat down again. A tiny addition to the _site.scss saw my rendering woes settled; I think it can be safely added to the repo. For the next eventual update I would like to see the .block-left portion collapse as the user scrolls down to read more content - we’ll see!

README’s

While in the car I read a neat piece concerning project README files, their importance and some decent guidelines for how they should be structured. The crux of it for me was:

README files should be organised and sufficiently succinct such that the prospective user can reject your package as soon as possible.

This may appear quite counter intuitive but it actually makes sense. We write and publish open source packages altruistically, for the benefit of others. Therefore, we care not for download numbers or how many big players are using the package. The package is either useful or not, and we should help the developer determine whether our package fits their use case with as little reading as possible. Find the article here: Art of the README

ETL Validation notifications

First things first, I need to confirm this little bash script will execute as I plan. My effort of late has been steered towards integrating Jupyter Notebooks more seamlessly into my work flow and delivery. A very simple script for my validator will take a directory of notebooks, convert them to python scripts and run them.

#!/usr/bin/env bash

# Remove all old, potentially outdated scripts.
rm scripts/notebook_scripts/*.py

# Convert all notebooks in the directory to scripts.
jupyter nbconvert --to script scripts/notebook_scripts/*.ipynb

# Run all python scripts found.
for f in scripts/notebook_scripts/*.py; do ipython "$f"; done

I’m really excited about this. Notebooks feel as though they bring me closer to that literate programming dream. A more exciting and streamlined script discovered recently threatens to take this to the next level! This current implementation is exciting because I can seamlessly add more test cases to my validator, update existing cases or directly inspect the output of ones that have failed - all from the comfort of my browser!

Alas, I must move on. The script above does what I intend, though I had a small issue yesterday with it dropping the converted scripts in the wrong place. Aside from a few typos and an unnecessary recursive wildcard (/**/*.py), everything looks great.

The script is running now - if it works out happily, we can move on to Slack integration. The SlackClient library is a simple thing to include; we already use it liberally to stream information from our servers to Slack, which makes for a wonderful, seamless stream of insight right inline with our primary mode of communication.

import os
from slackclient import SlackClient

# Client setup assumed; the token is read from the environment (variable name is my own).
slack_client = SlackClient(os.environ['SLACK_API_TOKEN'])

def send_message(message, channel_id='#bi_logger'):
    # Post a plain text message to the given channel as the 'Jupyter' bot.
    slack_client.api_call(
        "chat.postMessage",
        channel=channel_id,
        text=message,
        username='Jupyter',
        icon_emoji=':robot_face:'
    )

With the script looking good (locally) I feel confident enough to convert the assortment of assert statements into some sort of try => except => SlackSpam
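
Something along these lines, reusing the send_message helper above; the checks and the toy frame are hypothetical, just to show the shape of it:

import pandas as pd

def run_validations(frame):
    # A couple of placeholder checks; the real assertions live in the notebooks.
    assert not frame.empty, 'validator produced no rows'
    assert frame['order_id'].is_unique, 'duplicate order ids detected'

frame = pd.DataFrame({'order_id': [1, 2, 2]})  # toy data that trips the second check

try:
    run_validations(frame)
except AssertionError as err:
    # Turn a failed assertion into Slack spam rather than a silent stack trace.
    send_message('*ValidatorError* {}'.format(err))
    raise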

The bulk extractor is happy but I am wrestling a little with the SlackClient, which doesn’t want to post properly to Slack. A nice-to-have at this point, no need to blast any more time on it.

Good progress. Wiring up the Slack messages is trivial but having them appear in an informative and digestible format is not easy. Passing rudimentary markdown syntax is possible from the function above, e.g. *ValidatorError*, but this is about the limit. Ideally I would like to colour code the messages I pass over so that the reader’s eyes are automatically drawn to the highest priority messages. Another issue encountered was a path issue with the notebooks. Unfortunately the notebooks live in a nested directory outside of the primary app, making module lookup difficult; when running scripts this is not a problem since they are looked up from the root of the app. Therefore, I got to have a little fun exploring lndir and other options for symlinking an entire directory tree - this worked out to be a convenient solution.
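
For the colour coding itself, Slack message attachments accept a color field, so a variant of the function above could look roughly like this (send_alert is my own sketch, reusing the slack_client defined earlier, and untested here):

import json

def send_alert(message, color='#d00000', channel_id='#bi_logger'):
    # Same chat.postMessage call as send_message, but wrapped in a coloured attachment.
    slack_client.api_call(
        "chat.postMessage",
        channel=channel_id,
        username='Jupyter',
        icon_emoji=':robot_face:',
        attachments=json.dumps([{'color': color, 'text': message}])
    )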

13/10 Thursday

So here goes. Couch bound. Spent part of an evening updating this neglected blog. It has been beautified with the Halves theme and adapted a little to include this blog feature. A little dissatisfied to discover Markdown’s lack of subscript support, having to resort to something like this:

<sub>
*Work:* ETL
</sub>

The intent of this journal will be a little obscure to the outsider. It is not meant to be an emotional or purely psychological reflective dialogue, but rather to have a more focused and work driven orientation.

Some benefits I foresee are:

  • Effective self dialogue for working through problems and planning implementation.
  • A filler to occupy my prolific procrastination windows. My hope is that this stream of thought will help guide me back to the work at hand.
  • Since this journal does not have the polish of a public facing blog (despite it being public!), I am free to litter it with code snippets that I find interesting. This will serve a potential future benefit of being easily searchable in this giant text file!