Wednesday, 3 June 2015

Principles of Reactive Programming - Coursera MOOC - Review

The recently concluded Principles of Reactive Programming on Coursera was a good introduction to the paradigm of Reactive Programming in the Scala programming language. It was a kind of sequel to "Functional Programming Principles in Scala" from last year. I say "kind of" because you can still take this course without taking the first one, provided you are familiar with Scala and functional programming ideas.

In a nutshell, here is what I think about the course.

It's an introduction to a different mode of concurrent programming, to reactive principles, all using Scala libraries. It does not go into much depth (which is probably a drawback of most MOOCs) but provides a foundation on which one can build. For example, I can dive deeper into Actor programming now that I know the fundamentals.

- Great introduction to Reactive Programming
- Instructors are experts in their fields (Martin Odersky, Erik Meijer, Roland Kuhn)
- Assignments corresponding to every week's topic

- Differences in teaching styles and video content among the three instructors make the ride jumpy. Or maybe I am just spoilt after taking Martin Odersky's Functional Programming course, which was superb. For the lectures on Actors, I would recommend reading the chapter on Actors in Learning Concurrent Programming in Scala before viewing the lectures. The same is true for Futures.
- Assignments are completely test driven. That is good for grading, but passing the test is just the first step. Ensuring that your code is written using the finer points of the principles taught is up to you. You might get 10/10 using the automated test grader but your code might not be "correct". I had this experience in the final assignment. This has been pointed out by many in the forums too.

Overall, it's a must-take course if you plan to learn about Reactive Programming.

Tuesday, 2 December 2014

Effective email communication

Communication and its various nuances always fascinate me. There are times when I realize, not always too late, that I have failed in communicating what I wanted to convey. It always ends up being a learning experience for me.

For most people, the word "communication" seems to remain confined to what one says or writes. But it's far, far more than that.

I wanted to share a few tips I have learned about effective email communication over the years. I've picked these up from observation as well as from friends and colleagues. I still commit some of these mistakes when I'm in a hurry but I hope I am getting better.
  •  Know your recipients. Tailor your email accordingly. Put yourself in their situation
    • Their awareness of what you're talking about. Do they have prior context and how much? 
    • Their environment e.g. Sharing a URL in your email that works only on Chrome (and they use Firefox), or sending URLs that don't work outside your office network.
    • Their focus e.g. Are they likely to single out one out of multiple points in the email and downplay the rest? How do you address any concerns that the recipient might have? Thinking about these beforehand might save you an email iteration or more.
  • Make your intentions clear. If there are actionables, point them out. If you know the owner of the action, point him/her out. If you don't, ask. If it's not an actionable email, mention it (FYI, JFYI) and explain why you are sending the email. 
  • Use a meaningful subject line
  • Use To, Cc and Bcc carefully
    • If you're addressing one or more people in the email body, you can put them in the To field
    • Be careful while Bcc'ing. If the Bcc'ed person does not realize she is Bcc'ed, she might respond to all and then everybody will know, which you might not have intended. If you're the Bcc'ed person, it's up to you to check the email headers and be cognizant of this.
    • Be careful while clicking Reply. You might have meant Reply-All. Gmail/Google Apps Mail have a setting where you can set Reply All as the default.
  • If the email thread has been going on for some time, it's helpful to summarize everything, including repeating what has been already said, when a conclusion has been reached.
  • Don't clear the previous content when you respond. People often have to look at the whole thread to regain context.
  • If the thread has forked off to another topic, or you want to do the forking, change the subject to something appropriate that suits the new topic.

Somebody said "Communication is about the receiver". If my recipient does not get what I'm trying to convey, I have failed, and not the recipient. This might sound extreme but it's an effective ideal to work towards.

Saturday, 23 November 2013

Graphite Tip: Disabling data averaging while viewing graphs

Graphite, the superb graphing tool, has gained a lot of popularity lately and with good reason. It's flexible, fairly easy to set up, very easy to use and has a thriving community with plugins for many monitoring systems. It can store any kind of numeric data over time.

By default, Graphite stores data in Whisper, a fixed-size database with configurable retention periods for various resolutions. What this means is that you can store higher resolution data (say data for every 5 seconds) for a shorter period of time (e.g. 1 month) and then store the same data at a lower resolution (say for every hour) beyond that time period. The data will be consolidated based on the method you configure (sum, average). This behaviour of Graphite is well known.
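
As a rough illustration of the retention arithmetic described above, here is a minimal Python sketch. It is not Graphite or Whisper code; the function names are made up for this example, and the one-year retention for the hourly archive is an assumed value, since the text only says "beyond that time period".

```python
# Illustrative retention arithmetic; not taken from Graphite or Whisper.
SECONDS_PER_DAY = 24 * 60 * 60


def points_in_archive(seconds_per_point, retention_seconds):
    """Number of slots a fixed-size archive needs for a given resolution."""
    return retention_seconds // seconds_per_point


def roll_up(values, method="average"):
    """Consolidate several high-resolution values into one low-resolution value."""
    known = [v for v in values if v is not None]
    if not known:
        return None
    if method == "sum":
        return sum(known)
    if method == "average":
        return sum(known) / len(known)
    raise ValueError("unsupported method: " + method)


# 5-second data kept for 30 days, then hourly data kept for one year (assumed).
high_res_points = points_in_archive(5, 30 * SECONDS_PER_DAY)
low_res_points = points_in_archive(3600, 365 * SECONDS_PER_DAY)
points_per_hourly_value = 3600 // 5

print(high_res_points, low_res_points, points_per_hourly_value)
```

The point of the sketch is that each hourly value stands in for 720 five-second values, so the choice between sum and average decides what the long-term graph actually means.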

What is not so well known is that Graphite also does consolidation when you view the graphs. This happens when the number of data points is more than the number of pixels. In such cases, the Graphite graph renderer will consolidate the data points that share a pixel into one point using an aggregation function. The default aggregation function is average. So you might end up seeing smaller peaks than the values that were actually recorded.
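
To make the effect concrete, here is a simplified Python sketch of render-time consolidation. It is not Graphite's renderer; the bucketing scheme and the `consolidate` function are assumptions made for illustration, and the series is made-up data with a single spike of 200.

```python
def consolidate(points, max_points, func="average"):
    """Squeeze a series into at most max_points values by bucketing neighbours."""
    if len(points) <= max_points:
        return list(points)
    per_bucket = -(-len(points) // max_points)  # ceiling division
    consolidated = []
    for start in range(0, len(points), per_bucket):
        bucket = points[start:start + per_bucket]
        if func == "max":
            consolidated.append(max(bucket))
        else:
            consolidated.append(sum(bucket) / len(bucket))
    return consolidated


# 200 data points with one spike, drawn on a graph that has room for only 50.
series = [10] * 99 + [200] + [10] * 100
averaged = consolidate(series, 50)

print(max(series))    # the recorded peak
print(max(averaged))  # the peak that survives averaging
```

In this toy series the spike shares a bucket with three values of 10, so the averaged peak comes out well below 200, which is the same kind of flattening seen in the graphs.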

Here's an example of a graph where there are more data points than pixels. The actual peak value was a little over 200, but you cannot see it here due to averaging.

Here is the same graph (same data for the time span) where the image width has been increased* (== more pixels). You can see the peak is almost 200.


Sometimes this behaviour may not be what you want. To see the "actual" data points irrespective of what size your image is, Graphite's URL API provides a parameter called minXStep. To use it, simply add it as a request parameter (with value 0) to the graph URL. From the documentation:
To disable render-time point consolidation entirely, set this to 0 though note that series with more points than there are pixels in the graph area (e.g. a few month’s worth of per-minute data) will look very ‘smooshed’ as there will be a good deal of line overlap.

The same graph with minXStep=0 now looks like this:

A bit "smooshed" but with the exact data that was collected.
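
For reference, a graph URL with these parameters can be assembled like this. This is a minimal sketch: the host name and the metric path are placeholders, and `render_url` is a helper written for this example, not part of Graphite.

```python
from urllib.parse import urlencode


def render_url(base_url, target, params):
    """Build a Graphite render URL for one target plus extra request parameters."""
    query = [("target", target)] + sorted(params.items())
    return base_url + "/render?" + urlencode(query)


# Placeholder host and metric name; substitute your own.
url = render_url(
    "http://graphite.example.com",
    "servers.web01.requests",
    {"from": "-24h", "width": 1200, "minXStep": 0},
)
print(url)
```

Dropping `minXStep` from the dictionary gives back the default averaged view, which makes it easy to compare the two graphs side by side.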

* Pass width=x as a request parameter to the graph URL, x in pixels

Monday, 30 September 2013

Revoking private key access to EC2 instances, and other random tips

Consider the following scenario

  • You have many EC2 instances running production code
  • Access to those instances is using a passphrase-protected key
  • A member of your operations team who has access to the key leaves so you have to change the key. Or, you need to change the existing key as a matter of some internal security policy.

How do you do it?

  • Generate a new keypair
  • Add the public key to the EC2 instances' <login user's home dir>/.ssh/authorized_keys
  • Remove the old public key from the same authorized_keys file
  • Done. The old key is useless now.
  • This is not actually revocation
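
The edit to authorized_keys in the steps above can be sketched in Python as follows. This is only an illustration under simple assumptions: each entry is matched on its base64 key field, the key strings shown are placeholders rather than real keys, and `rotate_authorized_keys` is a name invented for this example. It works on the file's text, so reading and writing the file on each instance is left to you.

```python
def key_blob(public_key_line):
    """Return the base64 field of an OpenSSH public key line ("type blob [comment]")."""
    return public_key_line.split()[1]


def rotate_authorized_keys(text, old_public_key, new_public_key):
    """Return authorized_keys content with the old key removed and the new key added."""
    old_blob = key_blob(old_public_key)
    new_blob = key_blob(new_public_key)
    kept = []
    for line in text.splitlines():
        fields = line.split()
        if old_blob in fields:
            continue  # drop every entry that carries the old key
        if new_blob in fields:
            continue  # avoid a duplicate; the new key is appended below
        kept.append(line)
    kept.append(new_public_key.strip())
    return "\n".join(kept) + "\n"


# Placeholder keys, shortened for readability.
current = "ssh-rsa AAAAold alice@laptop\nssh-rsa AAAAother bob@desk\n"
updated = rotate_authorized_keys(
    current,
    "ssh-rsa AAAAold alice@laptop",
    "ssh-rsa AAAAnew ops@newkey",
)
print(updated)
```

It is probably wise to keep an existing session open and confirm that a login with the new key works before relying on the updated file.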

Some things to note about AWS keypairs
  • EC2 metadata for the instance(s) will continue to show the original keypair name it was created with, whatever keys you add or remove from authorized_keys. The original public key may not even exist on the instance anymore, if you have gone through the steps above, but the metadata will still show it. This is because AWS has no way of knowing that you changed the authorized_keys file.
  • You can upload keys generated by yourself to the AWS console and they will be available for use while launching EC2 instances. Your generated keys have to be RSA keys of 1024, 2048 or 4096 bits.
  • AWS keypairs are said to be confined to a single region. This is true only if you consider the default state of affairs. You can get around it.
    • For keys that you generate, you can import them to all the regions you want using the AWS console or the CLI tools.
    • For keys that AWS generates, you can take the public key from an EC2 instance launched with that key, and import that in a similar manner to all the regions you want. The private key is available for download when you generate the key.
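
Importing the same public key into several regions can also be scripted. The sketch below assumes the boto3 library, whose EC2 client has an `import_key_pair` call; the key name, the region list and the `client_for_region` factory are illustrative choices, and the function takes the factory as an argument so that it does not depend on boto3 directly.

```python
def import_key_to_regions(key_name, public_key_material, regions, client_for_region):
    """Import one public key under the same name in every listed region."""
    fingerprints = {}
    for region in regions:
        ec2 = client_for_region(region)
        response = ec2.import_key_pair(
            KeyName=key_name,
            PublicKeyMaterial=public_key_material,
        )
        fingerprints[region] = response["KeyFingerprint"]
    return fingerprints


# Possible usage, assuming boto3 is installed and AWS credentials are configured:
#
#   import boto3
#   with open("ops-key.pub", "rb") as f:   # hypothetical file name
#       material = f.read()
#   import_key_to_regions(
#       "ops-key",
#       material,
#       ["us-east-1", "eu-west-1"],
#       lambda region: boto3.client("ec2", region_name=region),
#   )
```

The returned fingerprints should be identical across regions, which gives a quick check that the same key was imported everywhere.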

Friday, 27 September 2013

Private Cloud Options with Amazon Web Services - Part 1

Amazon Web Services is the largest IaaS provider, according to this Gartner report, in terms of compute capacity. AWS also has a wider geographical presence than other similar companies.  

AWS offers an option to have a private cloud inside their public cloud. You can run this as a small personal cloud, or use one of Amazon's connectivity offerings to connect it securely to your existing infrastructure. This is an overview of the private cloud options with AWS, followed by an overview of the various connectivity options.

Private Cloud Options
When you launch a regular EC2 instance, it has a public IP address. It is always reachable from the public internet whether you want it or not. You can configure the instance's AWS security group (the inbuilt firewall) to allow access to specific ports only, but this may not serve your security needs. You might want traffic to flow only between your instances and not from the internet.

The obvious way to do this is to not have IP addresses which are reachable from the internet, i.e., use private IP addresses. This is exactly what Amazon Virtual Private Cloud (VPC) offers.

IP Ranges
A VPC is like a private network inside Amazon's cloud where you can create smaller subnets and instances inside them. While creating a VPC, you'll need to define the range of IP addresses that the VPC will cover.

The basic unit of a VPC is a subnet - a logical network where you can create instances, define the range of private IP addresses that the instances inside it will have and create routing tables to define how traffic is routed to and from the subnet.
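
A quick way to plan the address ranges is Python's standard ipaddress module. The sketch below is only an illustration: the 10.0.0.0/16 range for the VPC and the /24 size for the subnets are assumed values, not something AWS requires.

```python
import ipaddress

# Assumed range for the whole VPC.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Carve the VPC range into /24 blocks and pick two of them as subnets.
subnets = list(vpc.subnets(new_prefix=24))
public_subnet = subnets[0]
private_subnet = subnets[1]

print(public_subnet, private_subnet)
print(private_subnet.subnet_of(vpc))  # subnet ranges must fall inside the VPC range
print(public_subnet.overlaps(private_subnet))  # and must not overlap each other
```

Working the ranges out up front helps because the subnets have to fit inside the VPC range you define when creating it.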

Kinds of Subnets
There are two kinds of subnets you can create inside a VPC:
  • Private: EC2 instances created inside it cannot talk to the internet and vice versa.
  • Public: EC2 instances created inside it can access the internet and can also be made accessible from the internet.

Private and public are just names and not inbuilt properties. What actually makes them "private" and "public" are the routing tables you create and assign to the subnets. So you must first create the subnets, then create the tables, assign them to the subnets and finally give them descriptive names. If you use the VPC wizard, it will do this for you. You can create multiple subnets of each type.
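
The idea that the routing table alone decides whether a subnet is public or private can be modelled in a few lines of Python. This is a toy model, not an AWS API call: the dictionaries and the `igw-example` identifier are made up, and the rule it encodes is that a subnet counts as public when its table has a default route pointing at an internet gateway.

```python
def is_public(route_table):
    """A subnet is 'public' if its table sends default traffic to an internet gateway."""
    return any(
        route["destination"] == "0.0.0.0/0" and route["target"].startswith("igw-")
        for route in route_table
    )


# Both tables can reach the rest of the VPC through the "local" route.
public_routes = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "igw-example"},  # hypothetical gateway id
]
private_routes = [
    {"destination": "10.0.0.0/16", "target": "local"},
]

print(is_public(public_routes), is_public(private_routes))
```

Swapping the two tables between the subnets would swap which one is public, whatever the subnets happen to be called.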

Communication between a public subnet and the internet
If you want your instances to access the internet, you have an option of adding an "internet gateway" to a subnet. The internet gateway here is an AWS abstraction. You would add this to your public subnet (or subnets). Once you assign a gateway, you must assign an elastic IP to an instance inside that subnet. This instance is the one that would be able to communicate with the outside world.

VPC places a limit on the number of elastic IPs (5). If you have many instances which need to access the internet, you would put all of them behind a single instance with an EIP instead of assigning each an EIP, and use NAT to access the internet from the "hidden" instances.

Communication between a private and a public subnet
Setting up communication between a private and a public subnet is a straightforward configuration in the routing table.

A typical example of using both private and public subnets in a VPC is from the AWS documentation:

Here, the database servers are extra-secure inside a private subnet, while the webservers are in the public subnet, as they have to serve traffic to end users. 

The "private" nature of a VPC is not limited to the network alone. Inside a VPC, you have the option of launching a regular EC2 instance, which is a virtual machine on a host shared with other guest VMs. You can also choose to launch a dedicated instance - which is a truly dedicated machine used only by your instance, giving you isolation at the hardware level as well. Costs are slightly higher for dedicated instances.

A VPC lets you set up your own private cloud with isolation at the hardware and the network levels. I'll explore the various connectivity options between VPCs and your own datacenter in the next post.