A couple of weeks ago I was on a walk with a friend of mine, and he was complaining about the speed of change in the software development world. In his words, every week something new appears: a new methodology, a new framework, a new programming language and so on, and we developers need to make a choice about each one.

Is it worth my time? Is it promising or not?

And even after we have made these choices, every year we have to make them again. Maybe a hot technology has become obsolete, or a technology that didn't look promising is now the hottest in our local market and we need to learn it as soon as possible.

My answer to him was that the software development world is like a stock market. Let me explain my analogy.


Software Development, a stock market




For me, the characteristics of software development that my friend was complaining about are very similar to those of a stock market: new companies (technologies) appear every day, and investors (developers) need to choose where to put their money (time). There is only one uncomfortable difference for us. While an investor can spread their money across a large number of companies, we developers can hardly learn more than 2 or 3 new things at the same time. This puts a lot of pressure on our shoulders, since 2 or 3 bad decisions in a row can leave us in a bad position in the market.

But, following my analogy, how should a young investor choose their stocks in the software market?


Choosing stocks




There are three ways that people make money in a stock market. They can invest in the true value of a company, which is called fundamental analysis and is used by perhaps the greatest investor of all time, Warren Buffett. They can speculate on stock prices, trying to buy when stocks appear cheap and sell when they are expensive. Or they can invest in small caps, companies that have a low market capitalization and are therefore riskier. In the software market we can't speculate, since we don't buy and sell technologies: once we learn something, we can't forget it, and all we can do is avoid investing more time in it. So our young investor has two options for investing their money (time): fundamental analysis and small caps.

As a young developer, which is best? Bet on a new technology, or choose a well-established and already profitable one?


Time Travel

Disclaimer: The languages used below are only examples to explain the idea; I don't want to provoke a flame war.

Going back to 2005: Rails is rising, Ruby is as old as Java but has far fewer developers, and Java and .NET are already established. What would have happened if our young developer had chosen Ruby to start his career? And what if he had chosen Java or .NET?

Assuming that he had put all his effort into the chosen language, as a Ruby developer with 8 years of experience he would be one of the most experienced Ruby developers in the world. Maybe he would also be a world-class speaker, and his position would be very comfortable: a lot of startups would be hunting for him. After all, he would be so rare!

As a .NET or Java developer with 8 years of experience, he would probably have a good job, but he wouldn't be one of the most experienced developers in the world. After all, there are developers who have been using Java for 15 years, and they will be the first choice when a company needs a true specialist.

We can make the same comparison for other periods, for example the rise of iOS and Android, of Big Data and Machine Learning, Cloud Computing, Node, Clojure, Scala and so on. Of course, some of these bets can be, or could have been, bad choices, as happened, in my opinion, with VB.NET, Windows Mobile, JME and Silverlight, but for a small-cap investor that's part of the game.


My opinion

IMHO, I would advise our young developer to bet on new technologies and methodologies. If the chosen technology achieves a good market share, there is a big chance that he will be a world-class specialist. Otherwise, he will be following the same path already taken by other developers, and those developers will always be more experienced.

Of course, we know that experience in one technology helps a developer learn another one. But for a big data job, I would still hire a developer with 8 years of experience in big data rather than one who was a web developer for 6 years and started developing big data solutions 2 years ago.


And you?


What advice would you give to a young developer?
Do you prefer to be an early adopter, or do you wait until a technology is mature enough?
Do you have any experience with your own choices to share with us?

Last weekend I gave a talk at a local event about the C#/.NET ecosystem. The audience was mostly students, and I tried to show them how to analyze a programming language and which points they should look at; then I analyzed C# against those points. The post below is a compilation of that talk.

Updated – Discussions

A huge discussion is happening on HackerNews about this theme:

Disclaimer: All the points below reflect my and only my opinion. If you agree with me, cool; if not, give your opinion and we can start a discussion. After all, that's what blogs are for. Welcome, and have a good read!

Post Outline

  1. Introduction and Motivation
  2. Language Features
  3. Generalist/Niche
  4. Tools
  5. Costs Involved
  6. Community/Open Source
  7. Future

Introduction and Motivation

Over the last two years I have become very involved in the local software community (Salvador, Brazil): I gave talks, organized meetings and even founded two user groups (.NET Salvador and Dev In Bahia).

As a result, I met a lot of people and made a lot of friends too. But these new friends have different backgrounds from me and my colleagues. While we work with C# and the .NET stack, they come from PHP, Ruby, Python, Java, Scala and so on; we assembled a group where diversity is a strong characteristic. In a scenario like this, we have two options.

First, we could become enemies and start long flame wars like Java vs .NET or Ruby vs Python, the kind of thing we see children arguing about.

The other option is to try to become better developers and learn how other programming languages work, how they solve problems, what their market is like, what their tools are and how prepared they are for the future.

The next sections are the result of a reflection on my toolset (strongly based on C#/.NET), how it compares to other toolsets and how prepared it is for the future.

Language Features

In this section, I want to analyze the features the language has and how it has evolved in recent years. IMHO, C# has a lot of cool features, and Microsoft has done an excellent job over the last years of keeping the language on the cutting edge. Below are some of the features/characteristics that I like in C#.

Static Typing

Programming languages can have dynamic or static types, and they can also have weak or strong types (a good explanation of this here). C# has strong, static types. This means that we can rely on the compiler to catch type errors such as a misspelled property name, that it generally performs better since it's compiled, and that it has good tools for code completion, code analysis and refactoring.

It's a very controversial topic, and I know we can be less bureaucratic and more productive with dynamic languages. But most of my work is developing software for enterprises, and in this environment software normally lasts for 5, 10, 15 years, a lot of developers with different skill levels work on these projects, and they normally don't write tests (unit or integration). So at least with static typing I'm guaranteed that they won't introduce this kind of error.

    // JavaScript code with dynamic typing
    var student = {
        Age: 17,
        Name: 'Paulo Ortins'
    };

    function printAdultStudent(student) {
        if (student.Age > 18) {
            console.log(student.Name);
        }
    }

    // The same code now in C#, more bureaucratic but with type checking
    public class Student
    {
        public int Age { get; set; }
        public string Name { get; set; }
    }

    Student student = new Student()
    {
        Age = 17,
        Name = "Paulo Ortins"
    };

    public void PrintAdultStudent(Student student)
    {
        if (student.Age > 18)
        {
            Console.WriteLine(student.Name);
        }
    }

Type Inference/Anonymous Types

Despite being statically typed, C# keeps gaining syntactic improvements that make statically typed code easier to write. One of these improvements is type inference, which saves us from typing a variable's type twice. Another is anonymous types, which let us create objects without a predefined type and still rely on the compiler to catch type errors.

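    // Type Inference
    Student student = new Student();   // explicit type, written twice
    var student = new Student();       // same type, inferred by the compiler

    // The generic arguments below are illustrative
    Dictionary<string, Student> students = new Dictionary<string, Student>();
    var students = new Dictionary<string, Student>();

    // Anonymous Type (its properties are read-only)
    var student = new {Name = "Paulo Ortins", Age = 23};

    Console.WriteLine(student.Age); // ok, with IntelliSense
    Console.WriteLine(student.Aeg); // compiler error: no such property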

Extension Methods

Extension methods, a feature added in C# 3.0 (released in late 2007 with Visual Studio 2008), enable us to add new methods to existing types without using inheritance or modifying the original type, adding power and expressiveness to our code.

Let me give some examples of how extension methods can be used:

1. Replace the ! operator. The ! operator doesn’t add expressiveness to our code; in fact, it’s sometimes misread by developers. I prefer the way Python and VB.NET handle negation, which is through the not operator. Below is how we can get something similar in C# with a Not extension method:

    class BoolExtensions
    {
        static void Main(string[] args)
        {
            // for me, .Not() is more elegant than the '!' operator
            Console.WriteLine(false.Not()); // True
            Console.WriteLine(!false);      // True
        }
    }

    static class MyExtension
    {
        public static bool Not(this bool flag)
        {
            return !flag;
        }
    }

2. Add more power and expressiveness when working with lists.

    class Person
    {
        public int Age { get; set; }
        public decimal Grade { get; set; }
    }

    class PersonExtensions
    {
        static void Main(string[] args)
        {
            var people = new List<Person>();
            var approved = people.Approved();
            var adults = people.Adults();
            var approvedAdults = people.Adults().Approved();
        }
    }

    static class MyExtension
    {
        public static IEnumerable<Person> Adults(this IEnumerable<Person> people)
        {
            return people.Where(x => x.Age >= 18);
        }

        public static IEnumerable<Person> Approved(this IEnumerable<Person> people)
        {
            return people.Where(x => x.Grade >= 7.0M);
        }
    }

LINQ

Language-Integrated Query (LINQ) is a set of features added in C# 3.0, based on extension methods and functional programming, that adds powerful query capabilities to C# as part of the language. Traditionally, queries are expressed as strings (for example, a SQL command) without any type checking or code completion support and, even worse, we have to learn a different query language for each data source (SQL, XML and so on). LINQ provides a single, simple way to write queries: we can query our typed collections using C# code, relying on code completion and compile errors to guide us and avoid misspelled fields. You can use LINQ to query SQL databases, XML, DataSets and collections. Below are some LINQ samples; you can find more examples here.

    public void LinqWhere()
    {
        int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

        var lowNums =
            from n in numbers
            where n < 5
            select n;

        Console.WriteLine("Numbers < 5:");
        foreach (var x in lowNums)
        {
            Console.WriteLine(x);
        }
    }

    /*
    Numbers < 5:
    4
    1
    3
    2
    0
    */

    public void LinqGroup()
    {
        string[] words = { "blueberry", "chimpanzee",
                           "abacus", "banana", "apple", "cheese" };

        var wordGroups =
            from w in words
            group w by w[0] into g
            select new { FirstLetter = g.Key, Words = g };

        foreach (var g in wordGroups)
        {
            Console.WriteLine("Words that start with the letter" +
                              " '{0}':", g.FirstLetter);
            foreach (var w in g.Words)
            {
                Console.WriteLine(w);
            }
        }
    }

    /*
    Words that start with the letter 'b':
    blueberry
    banana
    Words that start with the letter 'c':
    chimpanzee
    cheese
    Words that start with the letter 'a':
    abacus
    apple
    */

Functional Programming Support

C#, like Ruby, Python, JavaScript and Scala, also gives us functional programming support. We can pass functions as parameters, store functions in variables and return functions from other functions. This enables us to write more declarative code with fewer lines, which is consequently less error-prone. Below are some examples.

    class FunctionalExamples
    {
        class Person
        {
            public int Age { get; set; }
            public decimal Grade { get; set; }
        }

        private static void Main(string[] args)
        {
            // Action
            var person = new Person() {Age = 19, Grade = 7.0M};
            Action printAge = () => Console.WriteLine(person.Age);
            Action printGrade = () => Console.WriteLine(person.Grade);
            PrintSomething(printAge);   // 19
            PrintSomething(printGrade); // 7.0

            // Functions
            DoAndPrintMathOperation((num1,num2) => num1 * num2); // 50
            DoAndPrintMathOperation((num1,num2) => num1 - num2); // 5
            DoAndPrintMathOperation((num1,num2) => num1 + num2); // 15
            DoAndPrintMathOperation((num1,num2) => num1 / num2); // 2

        }

        public static void PrintSomething(Action printFunction)
        {
            printFunction();
        }

        public static void DoAndPrintMathOperation(Func<int, int, int> mathOperation)
        {
            const int num1 = 10;
            const int num2 = 5;
            var result = mathOperation(num1, num2);
            Console.WriteLine(result);
        }
    }

Async and Await

Asynchronous programming is essential when we are handling activities that are potentially blocking, for example accessing a database, fetching a web page or working with files. These activities are normally slow and, when executed synchronously, they can stop the entire execution. Asynchronous programming allows us to execute these tasks without forcing the application to wait for them, improving responsiveness and scalability.

C# 5.0 brought two new keywords, async and await, that make it easier to write async methods. Below is an example comparing a sync method with an async method. You can see how easy it is to write async methods with these new keywords and how, because the caller is not blocked while the page downloads, the total running time drops.

private static void Main(string[] args)
{
    Action runAsync = () =>
    {
        var siteLength = GetSiteContentLengthAsync("http://msdn.microsoft.com");
        siteLength.ContinueWith(x => Console.WriteLine("COMPLETED"));
        Thread.Sleep(1000);
    };

    Action runSync = () =>
    {
        var siteLength = GetSiteContentLengthSync("http://msdn.microsoft.com");
        Console.WriteLine("COMPLETED");
        Thread.Sleep(1000);
    };

    Profile(runAsync);
    Profile(runSync);
}

public static void Profile(Action action)
{
    var initialTime = DateTime.Now;

    for (var i = 0; i < 10; i++)
    {
        action();
    }

    var endTime = DateTime.Now;
    Console.WriteLine("{0} ms", (endTime - initialTime).TotalMilliseconds);
}

async static Task<int> GetSiteContentLengthAsync(string webSite)
{
    var urlContents = await new HttpClient().GetStringAsync(webSite);
    return urlContents.Length;
}

static int GetSiteContentLengthSync(string webSite)
{
    var urlContents = new HttpClient().GetStringAsync(webSite).Result;
    return urlContents.Length;
}

/*
COMPLETED
...
COMPLETED
10114,1591 ms

COMPLETED
...
COMPLETED
23363,9914 ms
*/

Generalist/Niche

One of the points I consider when choosing a new language to learn is how much it expands my toolset, and C# expands it a lot. With C# we can develop web, mobile and desktop applications, and we can also develop Kinect applications and big data solutions. In other words, it delivers solutions to almost every kind of problem we face nowadays.

Like Java, C# runs on top of a VM (virtual machine), called the CLR (Common Language Runtime). The compilation process is also similar. The compiler converts a .NET language into an intermediate language called CIL (Common Intermediate Language), which plays the same role as Java bytecode, and at runtime the CLR converts this intermediate language into native code. The Mono project offers an open source implementation of the .NET Framework, C# and the CLR, which enables C# to be used for different purposes and on different platforms.


Figure 1: .NET compilation pipeline.

Web

C# has several frameworks to help us develop web applications. The most used are ASP.NET MVC and ASP.NET WebForms, both developed by Microsoft. The community has also developed its own frameworks, for example NancyFx, FubuMVC, OpenRasta and others.

Desktop

With C# you can, obviously, develop applications for Windows using WPF (Windows Presentation Foundation); you can also develop Metro-style applications for Windows 8.

Mono also adds versatility to .NET when it comes to desktop applications: there are ports for Linux and for Mac OS that enable us to develop C# applications for these platforms.


Figure 2: TouchDraw, a MacOS application built with Mono for Mac.

Mobile

Developing for mobile devices is becoming a key skill for developers, as the mobile market is growing its share by large steps. C# offers solutions for the three main operating systems: Windows Phone, Android and iOS.

Windows Phone, since Microsoft bought Nokia, is increasing its sales and consequently its market share, and some people are already betting that, in the next few years, Windows Phone will beat iOS as the second biggest player in the market. C# developers today have the opportunity to learn this technology while it isn’t mainstream and, if Windows Phone really becomes a major platform, to be in a good position in the future.


Figure 3: Mobile Market Share Prediction – 2015.

When we talk about the two biggest players today, we can have C# code running on Android and on iOS through Mono for Android and Mono for iOS. Even better, unlike hybrid applications written in JavaScript, which don’t have native behaviour, with Mono we can use native controls and native interface builders and give our application native behaviour. And better still, we can do it while sharing almost 75% of the code base between the different versions.


Figure 4: The same application with native controls and sharing almost 75% of the code base.

Cloud/Big Data

Another trend in application development is, first, the capacity to handle thousands and thousands of users and, second, the capacity to gather and analyze all the data produced by these users and discover patterns that help companies make decisions.

According to a survey conducted by Gartner Inc., by 2015 big data demand will reach 4.4 million jobs globally, but only one-third of these jobs will be filled. Almost 3 million jobs will be waiting for those developers who start developing big data solutions in the next few years.

As C# developers, we can develop cloud solutions using AWS and Azure, from Amazon and Microsoft respectively, and many others. These platforms also let us run MapReduce jobs through Elastic MapReduce and Azure HDInsight.


Figure 5: HDInsight, Hadoop on Windows Azure.

Natural User Interfaces/Kinect

Natural User Interface, or NUI, is the name given to user interfaces that are almost invisible and based on natural movements. NUIs gained traction when devices like Kinect started to replace game joysticks, and they have also started to gain traction in other areas, like remote physiotherapy, surgery support, home monitoring and so on.

The most used tool for developing NUIs has been the Microsoft Kinect, and one of the languages used to develop these applications is C#.


Figure 6: People are using Kinect to support physiotherapy sessions.

Tools

C# has mature tools that help developers be productive and write maintainable code. The most used are:

Visual Studio – By far the most used tool among C# developers, and also considered by a lot of people to be the best IDE available. It has good support for code completion, refactoring, web design, debugging, unit tests, code analysis, deployment, version control and much more.

MonoDevelop – Visual Studio only runs on Windows, but remember that C# also runs on Linux and on Mac; MonoDevelop is the choice for writing C# code on these platforms. Unlike Visual Studio, MonoDevelop is free and open source.

Xamarin Studio – Xamarin Studio is the IDE used to write C# mobile applications for iOS and Android. It was developed by Xamarin, the company behind Mono.

Costs Involved

This is the point where C# really sucks. Almost everything (OS, IDEs and plugins) is paid. Below are the costs at the time of writing.

Windows 8

  • Standard: $120
  • Pro: $200

Visual Studio 2013

  • Express: Free, but you lose some features like plugin support
  • Upgrade from 2012: $99 until 12/31, $299 after that
  • Full Price: $499

Xamarin Studio

  • Indie: $299/year, by platform
  • Business: $999/year, by platform

At least for students, both Microsoft and Xamarin offer huge discounts or even free versions of their software. To get Microsoft products for free, you need to register with a valid educational email in their DreamSpark program. For Xamarin, you need to contact them by email.

For those with startups or companies less than five years old, Microsoft has a program called BizSpark that offers free licenses and free support for 3 years.


Figure 7: DreamSpark, support for students.

Open Source/Community

I will start this topic with one recommendation: read my disclaimer. Then read it again. After these two years involved with people from other communities, I have to say that the C#/.NET community is not good when compared with communities like Java, Ruby, Python and now Node.js, where people develop their own languages, tools, IDEs and so on.

There are a few open source projects, but they never become very popular. People are always waiting for Microsoft to develop new software/tools/IDEs, which results in delay and dependency when it comes to adopting new technologies like SASS, LESS and CoffeeScript. Even when a non-Microsoft tool is available, people tend to wait until a Microsoft solution is released before they start using these new technologies.

When good non-Microsoft tools are released (ReSharper, CodeRush, Redgate tools, NCrunch and others), they are usually paid. It looks like the community culture is: if people pay for new software, people will sell new software. IMHO, this somewhat breaks the concept of community, which is everybody helping each other to build better and better tools.

Despite this, Microsoft is trying to change this scenario a little. Recently, Microsoft has been releasing some of its frameworks as open source projects and inviting developers to contribute. For example, the following frameworks were created by Microsoft and are now also open source:

Microsoft is also recognizing people for their efforts in open source projects and in community support. Those who contribute the most in these areas in a given year can become a Microsoft MVP and receive incentives from MS to keep contributing. You can read more about Microsoft MVP here.

Recently, other people have also discussed the .NET community and how it’s evolving. If you want to read more, check the links below:

Future

Is .NET worth betting on for the next few years?

IMHO, yes, and I’m betting on it. The platform is evolving fast and is generalist enough to let us solve problems in a variety of fields, as I wrote in the Generalist/Niche section. It’s not perfect, but people are putting in the effort to solve the problems and keep pushing it forward.

Is there a future where we will have a top language like C# running on a free Visual Studio version with free plugins, built on top of a vibrant community? I hope so!

Want to learn more about C#? Check out these resources.

November 2, 2013 @ 12:56 pm

Six months ago I started my master's degree, in which I'm researching software engineering and the mining of code repositories. In the coming months I intend to write not only about C#, JavaScript and programming in general, but also about the subjects I'm researching, the tools I'm developing and the papers I'm reading. In this post, I will talk about the results I obtained by mining and gathering information from the Apache Httpd repository.


Introduction

When developing software, developers are always adding, changing and removing software artifacts. These artifacts can be code, documentation, config files and so on. To manage these changes, developers use a VCS (version control system); good examples are CVS, SVN, Git and Mercurial. These VCSs and the changes they manage end up being an important source of information about a piece of software and everything related to it. Through mining we can answer a lot of questions about the software being developed:

  • How many developers are working on the software?
  • Where do they come from?
  • At what times do they work on the project?
  • Who are the committers working on each piece of the software?
  • Who introduces more bugs?
  • Who produces the best code?


About Apache Httpd

The Apache HTTP Server Project is an effort to develop and maintain an open-source HTTP server for modern operating systems including UNIX and Windows NT. The goal of this project is to provide a secure, efficient and extensible server that provides HTTP services in sync with the current HTTP standards.

Apache Httpd has been developed since 1996 and is currently at version 2.4.6, released in July 2013. During this time, more than 100 developers have made more than 55k commits. Because of its size, Apache Httpd is frequently studied by computer scientists and is the target of various academic studies. Httpd's artifacts are managed in an SVN (Subversion) repository, which can be found at this link; you can also find more information about the project here.


The Research

In this research, I was interested in gathering information about the Apache Httpd developers. The following questions were answered:

  • Where do developers come from?
  • When (time of day and weekday) do developers make commits?
  • Which file types are edited?


Mining Apache Httpd Repository - Getting Commits

I didn't know anything about Python, so I chose Python (I'm not crazy, I was just trying to add one more tool to my belt) to extract data from the SVN repository. Honestly, I don't know if there are others, but I found a very good library for extracting data from SVN repositories called PySVN. I extracted all the data I needed using the following code:

"""Documentation Link:
 http://pysvn.tigris.org/docs/pysvn_prog_ref.html#pysvn_client_log"""
import pysvn


class SvnService(object):
    """Thin wrapper around pysvn.Client for a single repository URL."""

    def __init__(self, repository_url):
        self.repository_url = repository_url

    def get_info(self):
        """Return the log entries, including the paths changed by each commit."""
        client = pysvn.Client()
        data = client.log(self.repository_url, discover_changed_paths=True)
        return data
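
Once the log is retrieved, a natural next step is to tally commits per committer login. The sketch below is only illustrative: it uses plain dictionaries with an `author` key as stand-ins for the log entries (an assumption made so the example runs without PySVN or network access), and the logins and the helper name `commits_by_author` are hypothetical.

```python
from collections import defaultdict


def commits_by_author(log_entries):
    """Group log entries by committer login (the 'author' field)."""
    grouped = defaultdict(list)
    for entry in log_entries:
        # Entries without an author are filed under a placeholder login.
        grouped[entry.get("author", "unknown")].append(entry)
    return dict(grouped)


# Hypothetical sample data; real entries would come from SvnService.get_info().
sample_log = [
    {"author": "alice", "revision": 3},
    {"author": "bob", "revision": 2},
    {"author": "alice", "revision": 1},
]

grouped = commits_by_author(sample_log)
# Print logins from most to least active.
for login, commits in sorted(grouped.items(), key=lambda kv: -len(kv[1])):
    print(login, len(commits))
```

Keeping the whole entry in each group (rather than just a count) leaves the dates and changed paths available for later analysis.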


Mining Apache Httpd Repository - Getting Geolocations

When getting commits from a repository, we don't have any information about a developer besides his login. One of my goals was to draw a map with commits distributed by location. To get this information, I grouped all the commits by developer login and started to search for their locations manually. The Apache Httpd project has a web page with some developers' profiles, including their addresses; you can find this information here. After this step, I already had the location of a lot of developers, but I was still missing some of them. The Apache Foundation has another page where I could find a developer's name from his login. Here is the page. At this point I had all the developers' names, so I started to google them and, to my delight, most of them have an online profile (blog, GitHub, personal site and so on) with their address. The last step was to get their latitude and longitude from their address through the Google Maps API.

// Request
"http://maps.googleapis.com/maps/api/geocode/json?address=Brazil&sensor=false"

// Response
{
   "results" : [
      {
         // ...
            "location" : {
               "lat" : -14.235004,
               "lng" : -51.92528
            }
         // ...   
       }
   ],
   "status" : "OK"
}
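
Pulling the coordinates out of such a response takes only a few lines. This is a minimal sketch, not the code used in the research: it works on a trimmed sample string rather than a live request, it assumes the elided parts of the response nest `location` under a `geometry` object (which is how I understand the Geocoding API's JSON layout), and the helper name `extract_lat_lng` is hypothetical.

```python
import json

# Trimmed sample response; the "geometry" wrapper is an assumption about the
# elided parts, and the values are the ones shown above for Brazil.
SAMPLE_RESPONSE = """
{
  "results": [
    {"geometry": {"location": {"lat": -14.235004, "lng": -51.92528}}}
  ],
  "status": "OK"
}
"""


def extract_lat_lng(response_text):
    """Return (lat, lng) of the first result, or None if nothing was found."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


coordinates = extract_lat_lng(SAMPLE_RESPONSE)
print(coordinates)
```

Returning None for a non-OK status makes it easy to collect the addresses that still need to be looked up by hand.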


Mining Apache Httpd Repository - Adjusting the time zone

Two of my other stats depend on the commit time: the time of day of each commit, for an obvious reason, and the weekday. The relation between time zones and weekdays is a little trickier: if a developer makes a commit around midnight and we adjust the commit time to his time zone, the commit's date can change too, and with it the commit's weekday. The commit times are originally in UTC, so to adjust them I used a Google API again, the Google Time Zone API. Its use is very simple: given a request with a location (latitude and longitude), the API returns a JSON document with information about the location's time zone.

// Request
"https://maps.googleapis.com/maps/api/timezone/json?location=39.6034810,-119.6822510&timestamp=1331161200&sensor=false"

// Response
{
   "dstOffset" : 0.0,
   "rawOffset" : -28800.0,
   "status" : "OK",
   "timeZoneId" : "America/Los_Angeles",
   "timeZoneName" : "Pacific Standard Time"
}
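
To make the weekday issue concrete, here is a small sketch of how the two offsets in the response can be applied to a UTC commit time. The commit time is made up for illustration, the `dstOffset` of 3600 is a hypothetical summer value (the sample response above shows 0), and `to_local` is a hypothetical helper, not code from the original analysis.

```python
from datetime import datetime, timedelta, timezone


def to_local(commit_utc, raw_offset, dst_offset):
    """Shift a UTC datetime by the rawOffset and dstOffset (both in seconds)."""
    offset = timedelta(seconds=raw_offset + dst_offset)
    return commit_utc.astimezone(timezone(offset))


# A made-up commit at 02:30 UTC on Monday, 22 July 2013.
commit_utc = datetime(2013, 7, 22, 2, 30, tzinfo=timezone.utc)

# rawOffset as in the response above; dstOffset assumed to be one hour.
local = to_local(commit_utc, -28800.0, 3600.0)

print(commit_utc.isoformat())  # 2013-07-22T02:30:00+00:00
print(local.isoformat())       # 2013-07-21T19:30:00-07:00
```

The same commit falls on a Monday in UTC but on a Sunday evening for the committer, which is why the adjustment has to happen before counting commits per weekday.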


Results


Commits By Location

Figure: map of commits by location (open and closed views).

Most of Apache's commits come from the USA, England and Germany. Looking only at the top 20 committers, 12 come from the USA, 4 from Germany, 2 from England, 1 from Denmark and 1 from Canada. An interesting point here is Research Triangle Park, which contributed a lot to Apache Httpd with 7 committers (3 of them in the top 20).

Top 20 Committers and their Locations

1. William A. Rowe Jr. - Illinois, USA
2. Jim Jagielski - Maryland, USA
3. André L. Malo - Germany
4. Jeff Trawick - North Carolina, USA
5. Rich Bowen - Kentucky, USA
6. Stefan Fritsch - Germany
7. Rüdiger Plüm - Germany
8. Dean Gaudet - California, USA
9. Graham Leggett - England
10. Ryan Bloom - California, USA
11. Justin Erenkrantz - California, USA
12. Joe Orton - England
13. Joe Schaefer - Florida, USA
14. Daniel Gruno - Denmark
15. Joshua Slive - Canada
16. Ken Coar - North Carolina, USA
17. Doug MacEachern - California, USA
18. Bill Stoddard - North Carolina, USA
19. Ralf S. Engelschall - Germany
20. Roy T. Fielding - California, USA


Commits By Time and By Weekday

Figure: commits by weekday and by time of day.

Most of the commits were made during working hours and on working days. This suggests that committers made these commits while at their jobs or doing their research (like the developers from Research Triangle Park).
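
Counts like these can be produced with two counters over the local commit times. The following is a sketch with made-up timestamps, assuming the times were already converted to each committer's time zone; the variable names are hypothetical.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up local commit times, already adjusted to the committer's time zone.
local_commit_times = [
    datetime(2013, 7, 22, 10, 15),  # Monday
    datetime(2013, 7, 22, 14, 40),  # Monday
    datetime(2013, 7, 23, 10, 5),   # Tuesday
    datetime(2013, 7, 27, 21, 30),  # Saturday
]

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
by_weekday = Counter(WEEKDAYS[t.weekday()] for t in local_commit_times)
by_hour = Counter(t.hour for t in local_commit_times)

print(by_weekday)
print(by_hour)
```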


By Date

Figure: commits by date.

Looking at commits over the years, we can see that Apache Httpd is stable: it didn't have a boom (like Rails had in recent years), and the number of commits per year is, on average, almost the same.


By File Extensions

bubble-chart

As I expected for a C project, most of the commits touch C files, '.c' and '.h'. The NotSpecified extension actually means files without an extension, usually text files. There are also a lot of documentation files, written in HTML, and their respective translations: '.html.en', '.html.fr', '.html.de' and so on.


Top 5 Committers

As a curiosity, below are the same graphs by committer.


William A. Rowe Jr.

mapa-aberto

mapa-fechado

weekday

timeoftheday

bytime

files


Jim Jagielski

mapa-aberto

mapa-fechado

jim-week-day

jim-timeoftheday

jim-bydate

jim-byfile


André L. Malo

nd-mapa-aberto

nd-mapa-fechado

nd-weekday

nd-timeoftheday

nd-date

nd-fileextension


Jeff Trawick

trawick-mapa-aberto

trawick-mapa-fechado

trawick-weekday

trawick-timeoftheday

trawick-date

trawick-filextension


Rich Bowen

rbowen-mapaaberto

rbowen-mapafechado

rbowen-weekday

rbowen-timeoftheday

rbowen-date

rbowen-filextension

Continuing our road through C#, which has already covered a basic C# program and how types are used in C#, we will now work with numeric types, which, together with strings and arrays, are the most used types in C#.


Built-in numeric types

We saw in the last post that C# has some predefined types to represent numbers. To represent integers, C# has 8 types: sbyte, short, int, long and their respective unsigned versions, byte, ushort, uint and ulong. They come in the following sizes: 8 bits (sbyte, byte), 16 bits (short, ushort), 32 bits (int, uint) and 64 bits (long, ulong). When we need to represent real numbers, C# gives us 3 options: float and double are normally used for scientific calculations, while decimal is normally used for financial calculations. The table below describes each of these types, their sizes and their ranges.

Type      Size (bits)   Range
sbyte     8             -2^7 to 2^7 - 1
byte      8             0 to 2^8 - 1
short     16            -2^15 to 2^15 - 1
ushort    16            0 to 2^16 - 1
int       32            -2^31 to 2^31 - 1
uint      32            0 to 2^32 - 1
long      64            -2^63 to 2^63 - 1
ulong     64            0 to 2^64 - 1
float     32            -3.4 × 10^38 to 3.4 × 10^38
double    64            ±5.0 × 10^−324 to ±1.7 × 10^308
decimal   128           -7.9 × 10^28 to 7.9 × 10^28


Declaring Numbers

Numbers in C# can be declared using decimal, hexadecimal or exponential notation.


int decimalNotation = 10; // 10
int hexadecimalNotation = 0xA; // 10
double exponentialNotation = 1E2; // 100


Type Inference

When we declare a number, the compiler always tries to infer its type. If the number contains a decimal point or exponential notation (E), it is a double. Otherwise, the compiler looks for the first integer type that can represent the number, in the following order: int, uint, long and ulong.


1.0                  // double
1.25                 // double
1E2                  // double
1                    // int
10000000000000000000 // ulong

When declaring a number, we can also use numeric suffixes (U, L, UL, F, D, M) to force the compiler to infer a given type.


1U // uint
1L // long
1UL // ulong
1F // float
1D // double
1M // decimal


Type Conversions

Numeric conversions in C# can be implicit or explicit. If the destination type can represent every possible value of the source type, an implicit conversion can be performed. Otherwise, if the source type can represent more values than the destination type, a cast (explicit conversion) has to be performed.

For example, when converting from short to int, we can make an implicit conversion, since an int can represent every value that a short can. On the other hand, when converting from long to int, an explicit conversion is required, with possible loss of data, since a long can represent more values than an int.

The same thing happens between float and double: you can make an implicit conversion from a float to a double, but have to make an explicit conversion from a double to a float.

Decimal is a special case: all integer types can be converted implicitly to a decimal, while float and double require an explicit conversion.


short shortNum = 1;
int intNum = 1;
long bigLongNum = 1100000000000000000;
long smallLongNum = 1;

intNum = shortNum;                 // 1 - implicit conversion
intNum = (int)smallLongNum;        // 1 - explicit conversion
intNum = (int)bigLongNum;          // 82706432 - explicit conversion with loss

float floatNum = 1.0F; 
double smallDoubleNum = 2.0;
double bigDoubleNum = 10E200;

smallDoubleNum = floatNum;         // 1.0 - implicit conversion
floatNum = (float)smallDoubleNum;  // 1.0 - explicit conversion
floatNum = (float)bigDoubleNum;    // +Infinity - explicit conversion with loss

decimal decimalNum = 0;

decimalNum = 1U;                   // 1
decimalNum = 1;                    // 1
decimalNum = 1L;                   // 1
decimalNum = 1UL;                  // 1
decimalNum = (decimal) 1F;         // 1
decimalNum = (decimal) 1D;         // 1
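The value 82706432 in the long-to-int example is not arbitrary: the cast keeps only the low 32 bits of the long. A minimal sketch of that arithmetic in Python, assuming C#'s default unchecked context (the helper name `to_int32` is made up for illustration):

```python
def to_int32(value):
    """Keep the low 32 bits of value and reinterpret them as a signed integer."""
    low_bits = value & 0xFFFFFFFF
    # Bit 31 set means the 32-bit pattern represents a negative number.
    return low_bits - 0x100000000 if low_bits >= 0x80000000 else low_bits


print(to_int32(1))                    # 1
print(to_int32(1100000000000000000))  # 82706432
```

Small values survive the cast unchanged, while anything above the int range wraps around.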


Operators

C# has the same basic arithmetic operators as other languages (+, -, /, *) and, being a C-like language, it also has the unary increment and decrement operators in prefix and postfix form.


int a = 3, b = 2;

Console.WriteLine(a + b); // 5
Console.WriteLine(a - b); // 1
Console.WriteLine(a/b);   // 1 - divisions between integers are truncated
Console.WriteLine(a*b);   // 6
Console.WriteLine(a++);   // 3 - prints the number, then increments it
Console.WriteLine(++a);   // 5 - increments the number, then prints it
Console.WriteLine(b--);   // 2 - prints the number, then decrements it
Console.WriteLine(--b);   // 0 - decrements the number, then prints it


Double or Decimal

Decimal should be used when accuracy is important, for money for example. Otherwise, we can use double or float. A common mistake when learning C# is to use double to represent financial values. There is a reason to avoid it: float and double are represented internally in base 2, so they can only represent exactly the fractions that are sums of powers of 2, and a value like 0.1 is stored as an approximation. Decimal is represented in base 10, so it can represent decimal fractions exactly, but there is a cost: performance.


double doubleA = 0.1F;
double doubleB = 1F;

Console.WriteLine(doubleB - 10 * doubleA);   // -1,49011611938477E-08
Console.WriteLine(doubleB + 10 * doubleA);   // 2,00000001490116

decimal decimalA = 0.1M;
decimal decimalB = 1M;

Console.WriteLine(decimalB - 10 * decimalA); // 0,0
Console.WriteLine(decimalB + 10 * decimalA); // 2,0
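The same base-2 effect can be reproduced outside C#. A minimal sketch in Python, where float is a base-2 double and the standard decimal module plays the role of C#'s decimal (an analogy, not the C# type itself):

```python
from decimal import Decimal

# 0.1 and 0.2 have no exact base-2 representation, so the sum is slightly off.
binary_sum = 0.1 + 0.2
print(binary_sum == 0.3)              # False

# Decimal stores base-10 digits, so the same sum is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True
```

For money this difference matters, because small representation errors accumulate over many operations.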


See also other posts from this series

Getting CSharper #1: A short introduction to C#
Getting CSharper #2: Understanding a C# Program
Getting CSharper #3: Understanding C# Types

Motivation

Two weeks ago, my partners and I decided to start a new project using Team Foundation Service with Git and host it on Azure. As usual, we started by developing a full happy path, from project creation to deployment, to test all the infrastructure involved. Creating the TFS account was OK, and so was creating the Git repository, but when we tried to set up the build configuration to automatically publish our project after an automated build, we started to have some issues. We found this statement on MSDN.

Deploying from TFS's git repository to Windows Azure is not yet supported. It is on our radar tentatively mid of this year.
- Microsoft Employee

Until today, we can’t configure our Azure Website to deploy from a TFS Git repository through Azure’s Portal. With the help of other blogs, we ended up configuring a trigger on Team Foundation Service to publish our project after a successful commit. Community is all about sharing what we learn, so I decided to write a step-by-step tutorial that may help someone with the same difficulties we had, or bring us feedback to improve our solution even more.


Tutorial

First of all, I will assume that you already have a Team Foundation Service account and an Azure account. If you don't have them yet, you can create them at TFS and at Azure.


1. Creating a Team Foundation Service + Git project.

1.1 - Log in to your TFS account and click the "New Team Project + Git" button.

tfs-account



1.2 - Fill in the form. Remember that you can’t rename this project in the future.

creating-project



1.3 - Click on "Navigate to project" and you will see your project info.

project-created


2. Creating a WebApp

2.1 - Click on "Open new instance of Visual Studio" to open a Visual Studio instance linked to your project.

open-new-instance



2.2 - Clone your Git repository and choose your repository folder. Remember the folder's name, because later you will create your Web Application in this same folder.

clone-repo1


clone-repo-2



2.3 - Create an MVC 4 Web Application in your repository folder.

creating-app



2.4 - Commit it. Now we have a deployable project. The next step is to create our Azure WebSite.

commit-webapp


3. Creating the Azure WebSite

3.1 - Log in to your Azure Portal and create a new Azure website.

creating-web-site



3.2 - Download the publish profile.

download-publish-profile


4. Configuring the automated deploy

4.1 - Import the publish profile that we downloaded. It will create a publish profile folder in our MVC project.

importing-publish-profile



4.2 - Configure a new build definition.

Go to Team Explorer -> Builds -> New Build Definition and create a new build definition.

team-explorer-new-build

On the Trigger tab, select the “Continuous Integration” option, so the build will run after each check-in.

trigger-tab

On Source Settings Tab, select the managed branch.

source-settings

On the Process tab, fill in the solution to build field with 'MyApp.sln'.

Also fill in the MSBuild arguments field with the following value:

/p:DeployOnBuild=true /p:PublishProfile="pauloortinsblog - Web Deploy" /p:AllowUntrustedCertificate=true /p:UserName=$pauloortinsblog /p:Password=Re7ACb3g6By9idLoi6gsWGbdc8AK49HxqaEFNzxxxxxxxxxxxxxxx /p:VisualStudioVersion=11.0

You should get the UserName and the Password from the Publish Profile file.

publish-settings

process-tab



4.3 - Enable Build Services.

In the Build section of your TFS account, add the build services to your account as shown below.

enable-build


5. Testing the happy path

5.1 - Let's make a change in one of our files, Index.cshtml.

change



5.2 - Committing the change and pushing it to the repository.

testing-happy-path



pushing



5.3 - A new build was queued on Team Foundation Service

build queued



5.4 - After the build finished, the WebApp was deployed automatically.

website


Conclusion

That's it. I hope this post helps someone. Feel free to suggest improvements to this process!

Like everyone addicted to programming languages, I get excited when I see new ones. In the last few years I did at least a toy project in C#, Java, Javascript, Ruby, Python, Scala and Objective-C. Recently, I searched for new and exciting programming languages. Below are the languages that I find most interesting and most promising for the next years. They are open source and focused on writing concurrent software. I will definitely give one of them a try. The problem is choosing which one!


Go


package main

import "fmt"

func plus(a int, b int) int {
    return a + b
}

func main() {
    res := plus(1, 2)
    fmt.Println("1+2 =", res)
}

Go, also known as Golang, is an open source language developed by Rob Pike, Robert Griesemer and Ken Thompson at Google, and since 2009 it has been used in some of Google's production systems. Its main goal is to make it easy to write concurrent systems. Go is compiled like C and garbage collected like Java, and aims to have the efficiency of a statically typed compiled language with the ease of programming of a dynamic language.

Go's syntax is similar to C and Java: blocks are surrounded by curly braces and there are control flow statements like if, switch and for. Unlike C, line-ending semicolons are optional. Go doesn't include type inheritance, generics or method overloading, and it makes heavy use of interfaces.

Go's concurrency, the strongest point of the language, is implemented through goroutines, which look like lightweight threads. Goroutines are created with the go statement from anonymous or named functions. They are executed concurrently with other goroutines, including their caller. Execution control moves between them as they block when sending or receiving messages. Goroutines can also share data with other goroutines.

Current State

Go is now in version 1.1, and has been used by the following companies:

  • Google
  • Heroku
  • SoundCloud
  • Canonical
  • CloudFlare

Resources to learn Go

Official Site
Go By Example
Effective Go
Programming in Go: Creating Applications for the 21st Century (Developer's Library)


Elixir


defmodule Hello do
  IO.puts "Defining the function world"

  def world do
    IO.puts "Hello World"
  end

  IO.puts "Function world defined"
end

Hello.world

Created by José Valim (former Rails committer), Elixir is inspired by the best parts of scripting languages like Ruby and Python, but is built on top of the Erlang VM. Elixir's goal, like Go's, is to make it easy to write concurrent software: it runs on the Erlang VM, which is known to be very good at concurrency, and uses a Ruby-like syntax, which is known to be very expressive, instead of Erlang's syntax.

Elixir and Erlang share the same bytecode and data types. This means you can invoke Erlang code from Elixir (and vice versa) without any conversion or performance hit. This is good because Elixir, despite being a new language, benefits from code that is already maintained by the Erlang community.

Current State

Elixir is still in beta; the current version is 0.10.0, released on 07/13/2013.

Resources to learn Elixir

Official Site
Meet Elixir ScreenCast with José Valim
Programming Elixir


Clojure


(loop [i 0]
  (when (< i 5)
    (println "i:" i)
    (recur (inc i))))

Clojure is a dialect of the Lisp programming language created by Rich Hickey. It is a predominantly functional, dynamic language built on top of the JVM (with ports to the CLR, ClojureCLR, and to JavaScript engines, ClojureScript). It is designed to be a general-purpose language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multithreaded programming.

Clojure's syntax, like that of other Lisps, is built on S-expressions that are first parsed into data structures by a reader before being compiled.

Clojure's best part is that it lets us use the well-established JVM in a simpler way and, like other functional programming languages, it is expressive, reducing the number of lines of code that we need to type.

Current State

Clojure's current stable version is 1.5.1, and there is also a development version, currently 1.6. Clojure is being used by the following companies:

  • Amazon
  • Citigroup
  • BackType
  • Berico

Resources to learn Clojure

Official Site
Clojure Programming
The Joy of Clojure: Thinking the Clojure Way
Programming Clojure

If you read my previous posts, Intro to TDD and 12 Lessons I learned using unit tests/TDD, you know that I'm a huge TDD fan and I try to apply it whenever I can. This methodology has been my favorite way to guarantee that the software I'm developing works properly and that the code has good quality and is easy to maintain.



Build/Testing Time

Recently, I joined a relatively big project (it has been in development for 2 years with 12-14 developers) and one thing started to bother me and disturb my TDD flow: the build/testing time. This project has a solution with 20+ projects and 660 tests that cover only 7% of the code; if it had more tests, the performance would be even worse. My machine isn't a top machine, a Core i3 with 4 GB running Windows 8, but look at these times, they are not acceptable:

- Build Time: 3 minutes and 10 seconds
- Build + Tests Time: 5 minutes and 41 seconds
- Init Debug: 4 minutes and 9 seconds



Losses

Let's do some math to discover how much we lose due to this slow build.
Suppose that my team and I are using TDD and each person needs to run the tests 10 times per work hour, an acceptable number.

For each hour of productive work, we spend another 56 minutes and 50 seconds (10 x 5 minutes and 41 seconds) building or running tests, which cuts our productivity by almost 50%.

But we are 4 people and we work 8 hours per day, so out of every 32 work hours, we effectively work 16. We are losing 16 hours per day.

In an average month of 30 days, we lose 480 hours.

Now, let's talk about money. Suppose that my employer pays $30 per hour for each person on my team.
So we lose 480 hours per month, more than $14000 IN A MONTH! What a waste of time and money!
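The arithmetic above can be retraced with a short sketch in Python. All inputs come from the text; the variable names are made up for illustration, and the 50% figure is the same rounding the text uses.

```python
build_and_test_seconds = 5 * 60 + 41   # one Build + Tests run: 5 min 41 s
runs_per_work_hour = 10

# Time spent waiting for every hour of productive work.
waiting_seconds = build_and_test_seconds * runs_per_work_hour
minutes, seconds = divmod(waiting_seconds, 60)
print(f"{minutes} min {seconds} s")    # 56 min 50 s

# Share of the total time (work + waiting) that is spent waiting.
lost_share = waiting_seconds / (3600 + waiting_seconds)
print(f"{lost_share:.1%}")             # 48.6%

# The text rounds the share up to 50% for the team-level estimate.
team_hours_per_day = 4 * 8
lost_hours_per_day = team_hours_per_day * 0.5
lost_hours_per_month = lost_hours_per_day * 30
lost_dollars_per_month = lost_hours_per_month * 30   # $30 per hour
print(lost_hours_per_month, lost_dollars_per_month)  # 480.0 14400.0
```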



Possible Solutions

After doing these calculations, I started to search the web for possible solutions to reduce this waste of time. The first two solutions were discarded. They were:

- Reduce the number of projects

The first solution is to split the code into folders instead of projects. It will reduce the number of projects that have to be compiled in each build. It will work, but I don't know how much it will help. This option was discarded because we joined the project recently and we can't change the way the other developers are working.

- Use NCrunch

NCrunch is an automated test runner that runs tests in the background and concurrently. It's an amazing tool; I used it during the trial period and fell in love with it. For now, we decided not to spend money on tools, so I will have to wait a little longer to buy NCrunch.



The chosen solution

A lot of people suggested that, instead of relying on Visual Studio to build the solution and run the tests, I could do it through PowerShell using MSBuild.exe and VSTest.Console.exe. I did it. Let's talk about these tools:

- MSBuild

MSBuild is a build tool that helps to automate the process of compiling, testing and deployment. In my case, I executed the following command:

MSBuild.exe myfolder/mysolution.sln /t:build /m:4 /nr:true /property:Configuration=Debug

/*
* /t: sets which targets will be built
* /m: sets the maximum number of concurrent processes used to build
* /nr: enables or disables the re-use of MSBuild nodes
* /property: sets a project property, here the build configuration
*/

- VSTestConsole

VSTestConsole is, since VS2012, the default command-line application used to run tests. In my case, I created a .bat with these commands:

MSBuild.exe myfolder/mysolution.sln /t:build /m:4 /nr:true /p:OutputPath=c:\mydir
VSTest.Console.exe C:\mydir\testproject1.dll C:\mydir\testproject2.dll C:\mydir\testproject3.dll



Results

Build:
From 3 minutes and 10 seconds to 49s. 74.2% faster

Build + Tests:
From 5 minutes and 41 seconds to 1 minute and 10 seconds. 79.4% faster

Init Debug with building before from command-line:
From 4 minutes and 9 seconds to 1 minute and 8 seconds. 72.6% faster
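The percentages above are reductions in elapsed time, computed as (before - after) / before. A minimal sketch in Python that recomputes them from the timings listed; the function and dictionary names are made up for illustration, and since it rounds to one decimal place the last digit may differ slightly from the figures quoted above.

```python
def reduction_percent(before_seconds, after_seconds):
    """Percentage of the original time that was saved."""
    return (before_seconds - after_seconds) / before_seconds * 100


# (before, after) in seconds, taken from the timings in this post.
timings = {
    "Build": (3 * 60 + 10, 49),
    "Build + Tests": (5 * 60 + 41, 1 * 60 + 10),
    "Init Debug": (4 * 60 + 9, 1 * 60 + 8),
}

for name, (before, after) in timings.items():
    print(f"{name}: {reduction_percent(before, after):.1f}% less time")
```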

I'm very satisfied with these results right now. My development flow is faster and I'm being more productive.



Future Plans

I'm planning to write a Visual Studio extension that uses my script instead of the default build and, of course, give it to the community. What do you think about it? Is there anyone with the same problem? Would it be useful?

In the last few years, we have seen a lot of techniques and methodologies to develop software, each aiming to solve one of the problems that usually happen during development. For example:

  • TDD - has the goal of ensuring that the software is working properly
  • BDD - has the goal of producing live documentation, documentation that can be used to test the software, and of bringing business analysts closer to the code
  • Continuous Integration - resolves integration problems, automates the deployment and delivery phases and makes it easier to change production versions

These techniques/methodologies contribute to better software: more elegant, more stable and with fewer bugs. But they don't ensure that our software is working properly as a whole. After all, they only ensure that our functional requirements were implemented. Software is composed of functional and non-functional requirements and, unfortunately, non-functional requirements are constantly postponed until the deployment phase or even neglected. One of these non-functional requirements is performance. If neglected, performance can be dangerous and can bring financial loss to companies.

Some big companies have researched how performance impacts their business. Below are some of the results.



Look how our brain reacts to a slow page

Web-Stress-Infographic-500

Some interesting things happen with our users if our page performance sucks:

57% of them will leave the page before it loads
78% of them will feel stressed or angry
4% of them will throw their cellphones on the ground



The cost of a 1 second delay in page load time

One-second delay

In this research, covering 160 companies with revenues between 1 and 1.5 billion dollars, we can see how a seemingly insignificant 1 second delay can impact a business and generate big losses:

7% reduction in conversions
11% fewer page views
16% decrease in customer satisfaction
$2.5 million in lost sales per year



Cases



Google - Slower Searches - Fewer Searches

Google's experiments demonstrate that slowing down the search results page by 100 to 400 milliseconds has a measurable impact on the number of searches per user, a reduction of 0.2% to 0.6% (averaged over four or six weeks depending on the experiment). Moreover, this reduction tends to increase week after week.



Amazon - for each 100ms reduction in page load time, they got a 1% revenue increase

Amazon-Infographic



Microsoft - More Delay = Up to 5% Reduction in Key Metrics on Bing

bing-delayimpact

Microsoft discovered that as load time delay increases, Bing's key metrics, like Revenue Per Visitor, Satisfaction and so on, decrease.



Mozilla - Faster Pages = 60 million more downloads

Mozilla

Mozilla made small adjustments to their product's landing pages. This increased the number of downloads by 60 million per year.



Yahoo - 400ms faster = 9% more traffic

Yahoo

In 2008, Yahoo discovered that with a page load time 400ms faster, their traffic increased by 9%. This happens because impatient users close pages before they finish loading.



Shopzilla - Reduction in Load Time = More Revenue

Shopzilla

In 2009, Shopzilla reduced their page load times from 6s to 1.2s, which brought a 12% increase in revenue and 25% more page views.



AOL - Faster User Experiences = 50% more Page Views

Strangeloop-Infographic-AOL

In 2009, AOL did a study and discovered that visitors with the best user experience view an average of 7.5 pages/visit, while visitors with the worst user experience view only 5 pages/visit.



More Resources

Everything you wanted to know about web performance but were afraid to ask
The truth about performance related revenue statistics
More on how web performance impacts revenue…
The Last Word: Tide Tough Times with Performance-driven Development
Stuff that works - Performance Driven Development
One Second Delay
Case Shopzilla
Case Amazon
Case AOL
Case Yahoo
Case Mozilla Foundation

In my previous post, we saw our first C# program, we covered how it was built and what kind of structures we should use and so on. In this post, we will talk about the types that we can use when we are building a program.


What is a Type?

A type, in programming, can be defined as a classification of data that determines the possible values for a piece of data, the operations that can be performed on it and how it will be stored.

static void Main()
{
    string firstName = "Paulo";
    string lastName = "Ortins";
    int age = 23;
    int salary = 100;

    Console.WriteLine(firstName + lastName); // PauloOrtins
    Console.WriteLine(age + salary); // 123
    Console.WriteLine(firstName.ToUpper()); // PAULO
    Console.WriteLine(salary.ToUpper()); // compile error: int has no ToUpper method
}

In the example above, we are declaring four variables: two of type string and two of type int. Note that in both cases the type defines what we can store in each variable and how we can manipulate it. When we sum two strings, we are doing a concatenation; when we sum two integers, we are performing a math operation. We can capitalize every letter in a string with the ToUpper method, but this method doesn't exist in type int, so the last line doesn't compile.


Built-in Types

Built-in types are types that are specially supported by the compiler. They are also known as primitive types, we can group them to build other types. Below is a list of C# built-in types and how we can use each one.

Type Description Example
object The ultimate base type of all other types
object o = null;
string String type; a string is a sequence of Unicode characters
string s = "hello";
sbyte 8-bit signed integral type
sbyte val = 12;
short 16-bit signed integral type
short val = 12;
int 32-bit signed integral type
int val = 12;
long 64-bit signed integral type
long val1 = 12;
long val2 = 34L;
byte 8-bit unsigned integral type
byte val1 = 12;
ushort 16-bit unsigned integral type
ushort val1 = 12;
uint 32-bit unsigned integral type
uint val1 = 12;
uint val2 = 34U;
ulong 64-bit unsigned integral type
ulong val1 = 12;
ulong val2 = 34U;
ulong val3 = 56L;
ulong val4 = 78UL;
float Single-precision floating point type
float val = 1.23F;
double Double-precision floating point type
double val1 = 1.23;
double val2 = 4.56D;
bool Boolean type; a bool value is either true or false
bool val1 = true;
bool val2 = false;
char Character type; a char value is a Unicode character
char val = 'h';
decimal Precise decimal type with 28 significant digits
decimal val = 1.23M;


Custom Types

In C#, as in most languages, we can combine primitive types to build our own types. For example, let's create a type called Person that has two fields: a variable of type int called age and a variable of type string called name.

class Person
{
    public string name; // public so the examples below can read and change it
    public int age;
    public Person(string paramName, int paramAge)
    {
        name = paramName;
        age = paramAge;
    }
}


Initializing Data

string firstName = "Paulo";
int age = 23;
Person person = new Person("Paulo", 23);

Types are models for data. When we create data, we need to instantiate a type. Predefined types are specially supported by the compiler, so we can create data just by assigning a value to a variable. On the other hand, for custom types, we have to use the new operator. The new operator triggers a constructor call, which is like a method used to build an instance of a given type. If you look carefully at our custom type Person, you can find a constructor with two parameters, which is called when a new Person is created.


Conversions

Often, we will need to store a value of one type in a variable of another type. This is called a conversion. Conversions can be implicit or explicit. Implicit conversions are conversions where the compiler can guarantee that they will succeed and that no information will be lost. Conversely, explicit conversions are conversions where the compiler cannot guarantee success and information can be lost during the conversion.

int x = 123; // 32-bit integer
long y = x; // long is 64-bit, so it can store any int
int z = (int) y; // long is bigger than int, so an explicit conversion is required


Value Types x Reference Types

Types in C# can be:

  • Value Types
  • Reference Types

Value types comprise almost all built-in types, like char, bool, int, short, long and so on, as well as custom structs and enums.
Reference types comprise classes, interfaces, arrays and delegates, including the built-in string and object types.

They differ in the way they are handled in memory. A variable of a value type stores the value itself; for example, an int variable stores 32 bits of data. Conversely, a variable of a reference type stores a reference to an object, and the object itself lives elsewhere in memory. When we assign to a variable of a value type, we are modifying its value, but when we assign to a variable of a reference type, we are modifying a reference to an object. Let's play with these differences.

Person person = new Person("Paulo",23); // Person is a reference type
int age = 23; // int is a value type

Console.WriteLine(person.name); // Paulo
Console.WriteLine(age); // 23

int age2 = age;   // We are creating a new variable in memory with a value of 23
Person person2 = person; /* We are creating a new reference to the same object,
                          * now both variables point to the same place
                          */

age2 = age2 + 1;

// age and age2 are two different values

Console.WriteLine(age); // 23
Console.WriteLine(age2); // 24

person2.name = person.name + "123";

// person and person2 are references to the same object

Console.WriteLine(person.name); // Paulo123
Console.WriteLine(person2.name); // Paulo123


More Resources

Introduction

In the last years, relational databases have been the only option when we talk about data persistence. Our only choice has been which database to use. Should we use SQL Server? Should we use MySQL? Oracle? Even then, some choices come by default. E.g. if we are using .NET, we almost always work with SQL Server; if we are using Java, we almost always use Oracle; Ruby goes with MySQL, Python with MySQL/PostgreSQL and so on.

The reason is obvious: relational databases have been in the field for decades and have proved to be robust for most applications. We can rely on them to take care of concurrency, transactions and so on. But if relational databases are as reliable as I’m saying, why are they losing market to NoSQL databases? Relational databases have some problems that NoSQL databases are solving.



Problems with Relational Databases



Impedance Mismatch

We usually write software using Python, Ruby, Java, .NET. What do they have in common? They are object-oriented. But we persist the data using MySQL, PostgreSQL, Oracle and SQL Server. What do they have in common? They are relational databases. Can you spot the difference? Impedance mismatch is the name given to this difference. Our in-memory structures are object-oriented and our databases are relational, so every time we need to save or retrieve data we need to make a conversion. ORM (Object-Relational Mapping) frameworks, like Hibernate and Entity Framework, make it easier to map between objects and relational databases, but it’s still an issue, mainly when we need high-performance queries.
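To make the conversion step concrete, here is a minimal sketch in Python using the standard sqlite3 module. The Order class, the table names and the save/load helpers are all hypothetical; the point is only that one in-memory object with a nested list has to be split across two tables on the way in and reassembled on the way out.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Order:
    id: int
    customer: str
    items: list = field(default_factory=list)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, product TEXT)")


def save(order):
    # One object becomes one row in orders plus one row per item.
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order.id, order.customer))
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?)",
        [(order.id, product) for product in order.items],
    )


def load(order_id):
    # Two queries are needed to rebuild the single object.
    row = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    products = [
        r[0]
        for r in conn.execute(
            "SELECT product FROM order_items WHERE order_id = ? ORDER BY rowid",
            (order_id,),
        )
    ]
    return Order(row[0], row[1], products)


save(Order(1, "Paulo", ["book", "pen"]))
print(load(1))  # Order(id=1, customer='Paulo', items=['book', 'pen'])
```

An ORM automates exactly this kind of mapping code, but the extra queries and conversions are still there underneath.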



Applications are getting bigger

Web applications are increasing in scale. We have to store more data, serve more users and we need more computing capability. To handle this scenario we have to scale, and we can do it in two ways. We can scale up, that is, buy better machines, with more disk, more memory and so on. Or we can scale out, that is, buy a lot of small machines and use them in a cluster. For big applications, scaling up is not an option. Bigger machines are more expensive and they have a limit: there is no single machine that can handle the traffic of Google or Facebook. Given this context, we need new databases, since relational databases are not designed to run on clusters. Yes, there are clustered relational databases, but they work by sharing a disk, which isn’t the scenario we want when we’re building a cluster. Some of the companies that need to handle a lot of traffic, like Google, Facebook and Amazon, started to develop databases designed to run on clusters, and this was the beginning of the NoSQL era.



NoSQL Era

Nowadays, there are a lot of NoSQL databases: MongoDB, Redis, Riak, HBase, Cassandra and so on. Each one has at least one of these characteristics:

  • NoSQL databases don't use SQL, although some of them, like MongoDB and Cassandra, have their own query languages
  • Usually they are open-source projects
  • They were built to run on clusters
  • They are schemaless: there is no rigid schema defining the data structure



Types of NoSQL

NoSQL databases can be divided into 4 types: Key-Value, Document-Oriented, Column-Family and Graph-Oriented databases. Let's see what each of these types is, its characteristics and where we should use it.



Key-Value Databases

What are: A key-value store works like the simple hash table that we are used to in traditional languages. You can add, retrieve and delete data through keys. Since they use primary-key access, they tend to perform well and scale easily.

Examples: Riak, Redis, Memcached, Amazon's Dynamo, Project Voldemort

Who's using: GitHub (Riak), BestBuy (Riak), Twitter (Redis and Memcached), StackOverFlow (Redis), Instagram (Redis), Youtube (Memcached), Wikipedia (Memcached).

When we should use:

  • To store user information, like sessions, profiles, preferences, shopping carts and so on. This information is usually associated with an id (key), which is exactly the best scenario for a key-value database.

When we shouldn't use:

  • If we need to query the data by value instead of by key. In general, a key-value database offers no efficient way to query by value.
  • If we need to save relationships between data. We can't relate data stored under two or more keys in a key-value database.
  • If we need transactions. In a typical key-value database, we can't roll back an operation if a failure occurs.
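The trade-off above can be sketched with a toy in-memory store in Python. KeyValueStore is a hypothetical class written for this post, not the client of any real database. Access by key is a direct lookup, while a query by value has to scan every entry.

```python
class KeyValueStore:
    """Toy in-memory store; a hypothetical class, not a real database client."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def find_keys_by_value(self, predicate):
        # There is no index on values: every entry has to be inspected.
        return [k for k, v in self._data.items() if predicate(v)]


store = KeyValueStore()
store.put("session:42", {"user": "philip", "cart": ["book"]})
store.put("session:43", {"user": "roger", "cart": []})

print(store.get("session:42"))
print(store.find_keys_by_value(lambda v: v["user"] == "roger"))
store.delete("session:42")
```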



Document-Oriented Databases

What are: Document-oriented databases store data as documents. A document can be defined as a set of maps, collections and scalar values. Documents are like rows, but unlike rows, which all have to follow the same schema, documents can be totally different from each other. These documents can be stored as XML, JSON or BSON.
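As a small illustration, here are two documents with different shapes living side by side, written in Python with the standard json module. The field names and the two applications are made up for the example, and a plain list stands in for the database.

```python
import json

# Two log "documents" from different (hypothetical) applications.
# They share a collection but not a schema.
logs = [
    {"app": "billing", "level": "ERROR", "invoice_id": 981, "tags": ["retry"]},
    {"app": "web", "level": "INFO", "request": {"path": "/home", "ms": 12}},
]

# Serialize each document to JSON, as a document store might persist it.
stored = [json.dumps(doc) for doc in logs]

# Reading back: each document carries its own structure.
loaded = [json.loads(text) for text in stored]
errors = [doc for doc in loaded if doc["level"] == "ERROR"]
print(errors)
```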

Examples: MongoDB, CouchDB, RavenDB

Who's using: SAP (MongoDB), Codecademy (MongoDB), Foursquare (MongoDB), NBC News (RavenDB)

When we should use:

  • Logging. In an enterprise environment, each application has different logging info. Document-oriented databases don't have a fixed schema, so we can use them to store all this different info.
  • Analytics. Since they are schemaless, we can store different metrics and new metrics can be added without schema changes.

When we shouldn't use:

  • If we need transactions across documents. Document-oriented databases generally don't support transactions that span several documents, so if we need them, we shouldn't use a document database.



Column-Family Databases

What are: Column-family databases store data in column families. A column family can be defined as a group of related data that is often queried together. Let me give an example. When we have a Person class, we often access the name and age together, but not the salary. In this case, name and age belong to one column family and salary belongs to another one.
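The Person example can be pictured as nested maps: row key, then column family, then columns. The Python sketch below only shows the shape of the data. The family names "profile" and "payroll" and the values are assumptions for the illustration, and it is not how Cassandra or HBase are actually accessed.

```python
# Row key -> column family -> columns. All names here are illustrative.
people = {
    "person:1": {
        "profile": {"name": "Philip", "age": 23},   # usually read together
        "payroll": {"salary": 5000},                # read separately
    },
    "person:2": {
        "profile": {"name": "Roger", "age": 33},
        "payroll": {"salary": 6500},
    },
}


def read_family(table, row_key, family):
    # Fetch one family of one row without touching the other families.
    return table[row_key][family]


print(read_family(people, "person:1", "profile"))
```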

Examples: Cassandra, HBase

Who's using: Ebay (Cassandra), Instagram (Cassandra), NASA (Cassandra), Twitter (Cassandra and HBase), Facebook (HBase), Yahoo!(HBase)

When we should use:

  • Logging. Since we can store data with different columns, each application can write its info to its own column families.
  • Blogging Platforms. We can store each info in different column families. For example, tags in one family, categories in another one, posts in another one and so on.

When we shouldn't use:

  • If we need ACID transactions. Cassandra doesn't support transactions.
  • Prototyping. If we analyze the Cassandra data structure, we can see that it is based on the pattern in which we expect to retrieve the data. When we are designing a prototype, we can't predict what the query pattern will be, and once it changes we will have to change the column family design.



Graph-Oriented Databases

What are: Graph databases allow us to store data as graphs. Entities are represented as vertices and the relationships between these entities are represented as edges. For example, we could have 3 entities, Steve Jobs, Apple and Next, and two edges called "Founded by" that relate Apple to Steve Jobs and Next to Steve Jobs.
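The same example can be written down as a list of edges. This is only a sketch in Python of the data model. The neighbors and incoming helpers are hypothetical, and a real graph database gives you a query language for this kind of traversal.

```python
# Each edge is (source vertex, relationship, target vertex).
edges = [
    ("Apple", "Founded by", "Steve Jobs"),
    ("Next", "Founded by", "Steve Jobs"),
]


def neighbors(edges, source, relationship):
    # Follow the outgoing edges of one type from a vertex.
    return [dst for src, rel, dst in edges if src == source and rel == relationship]


def incoming(edges, target, relationship):
    # Traverse the same edges in the opposite direction.
    return [src for src, rel, dst in edges if dst == target and rel == relationship]


print(neighbors(edges, "Apple", "Founded by"))
print(incoming(edges, "Steve Jobs", "Founded by"))
```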

Examples: Neo4J, Infinite Graph, OrientDB

Who's using: Adobe (Neo4J), Cisco (Neo4J), T-Mobile (Neo4J)

When we should use:

  • Connected data. If we have data that is connected through relationships, we have a good case for a graph database. The vertices can be people, cities and companies, and the edges can be "lives in", "employed by" and so on.
  • Recommendation engines. If we represent data in a graph database, it can be used to make recommendations like "people who bought this item also bought these items", as Amazon and Netflix do.

When we shouldn't use:

  • Data model not suitable. Most of the cases are not suitable for graph databases since operations involving the whole graph are not trivial.

In the previous post we talked about the history of C#, how it's evolving and some of its characteristics. Now we will start to see code. We are going to see the C# keywords, blocks, how a C# program is structured and so on.

Understanding our First C# Program


using System; // Using directive

namespace ConsoleApplication1 // Namespace declaration
{
    /*
      This is my
      multiline comment
    */
    class Program // Class declaration
    {
        static void Main() // Main declaration
        {
            string name1 = "Philip";  // Statement/Variable Declaration
            int age1 = 23;            // Statement/Variable Declaration

            string name2 = "Roger";   // Statement/Variable Declaration
            int age2;                 // Statement/Variable Declaration
            age2 = age1 + 10;         // Statement/Variable Attribution

            Console.WriteLine(CreatePhrase(name1, age1)); // Statement/Method Call
            Console.WriteLine(CreatePhrase(name2, age2)); // Statement/Method Call
        }

        // Method Declaration
        static string CreatePhrase(string name, int age)
        {
            // Statement/Return
            return string.Format("Hi {0}, you are {1} years old.", name, age);
        }
    }
}

What does this program do? It prints a welcome message for two people. Now, let's use a bottom-up approach to figure out what each part means.

Statements

Statements are the smallest standalone elements of a language. A program is formed by a sequence of one or more statements. If we look at our program, we will see that it has 8 statements, and statements can be classified by type. Declaration statements are used to declare a variable. Assignment (attribution) statements are used to assign a value to a variable. Call statements are used to call a method (we will see more about it soon). The last kind of statement that we are using in our program is the return statement, which we use to finish a method and return a value.

Methods

Sometimes, to break our program into subroutines that we can reuse later, or to simplify our code, we group statements into methods. Methods can receive zero or more inputs, aka parameters, and may or may not return data to the caller. Our method CreatePhrase receives two parameters, name and age, and returns a welcome message. Then there is the Main method: when we are executing a console application (our program is one), C# recognizes a method called Main as the entry point of execution, and this method will be called to run the program.

Classes

In our example, we have a class called Program. A class is a unit of code that has state (data field members) and behaviors (methods). Our Program class has two methods, Main and CreatePhrase. A class is a kind of type, and we can combine several of them to design our programs.

Namespaces

When our project gets large, we feel the need to organize these types into other structures. They are called namespaces, and they are sets of types (a class is a type, an interface is a type and so on). In our program, we are declaring a namespace called ConsoleApplication1 which has only one class, Program. Now look at the Console.WriteLine statement. Sometimes in our program we need to use types that were already created by other people. It's a good practice, since we don't have to reinvent the wheel every time. To reuse a type already created we can import its namespace, and we do it through the using directive. In our example, we are using a class called Console that belongs to a namespace called System. The using directive is there for convenience: without it, we would have to type the fully qualified name 'System.Console.WriteLine' to call the method WriteLine.

Syntax

C# syntax is inspired by C, C++ and Java.

Identifiers

Identifiers are names that we, programmers, choose for our variables, methods, classes, namespaces and so on. In our program we are using the following identifiers: ConsoleApplication1, Program, Main, name1, age1, name2, age2, CreatePhrase. And the .NET Framework defines these: System, Console, WriteLine, Format (string is actually a keyword, an alias for the System.String type). Identifiers in C# must be a whole word, formed by letters, digits and underscores, and can't start with a digit. By convention, parameters, local variables and private fields should be in camel case (e.g. personName) and all other identifiers (namespaces, classes, methods) should be in Pascal case (e.g. WriteLine, Console, System).

Keywords

Keywords are names reserved by the language, so you can't use them as identifiers. In our program, using, class, namespace and static are examples of keywords.
Below there is a list of C# keywords:

[Image: list of C# keywords]

Braces and Semicolons

Like C, C++ and Java, C# uses braces to delimit statement blocks and semicolons to mark the end of a statement.

Comments

C# supports two different types of code comments: single-line comments and multi-line comments. A single-line comment starts with '//' and continues until the end of the line. If we look at our program, we will notice that most of its comments are single-line comments. Multi-line comments start with '/*', end with '*/' and can contain one or more comment lines.

Conclusion

I think we covered a little bit about C# structures and syntax. In the next posts we will start to go deeper into the language. There is a lot of ground to cover!

I started programming in C# three years ago. Coming from Java, it was love at first sight: the language offers a lot of shortcuts to accomplish tasks more easily than I was used to in Java. Recently, I felt the need to stop being only a language user and start to understand what is happening under the hood. This series of posts is the result of my experience trying to deepen my C# knowledge through books, online tutorials, videos and so on. I hope it can help other people who are trying to learn more about C# too.

What is C#?

First of all, it's necessary to talk about what C# is, what its history is, what its characteristics are and how we can compare C# with other languages.

A short history about C# and how it's evolving

1997 - Microsoft started a project that was internally known as Project Lightning (and also known as Project 42). The name "Project 42" was most likely chosen because DevDiv (the Microsoft Developer Division) is in Building 42.

1999 - Anders Hejlsberg, creator of Turbo Pascal and Delphi, and currently working on TypeScript, started to develop a language called Cool, which stood for "C-like Object Oriented Language". It was supposed to be a "clean-room" implementation of Java.

2002 - C# 1 was released by Microsoft. Its syntax is heavily inspired by Java and C/C++.

2005 - C# 2 was released, the following features were introduced:

2007 - C# 3 was released, the following features were introduced:

2010 - C# 4 was released, the following features were introduced:

2012 - C# 5 was released, the following features were introduced:

Characteristics

In a few words, we can define C# as an object-oriented programming language with support for functional programming, type-safe, with automatic memory management, that runs mainly on Windows platforms on top of the CLR (its runtime environment). Let's analyze these characteristics.

Object-Oriented

C# supports encapsulation (we can declare private variables/functions), inheritance and polymorphism. Unlike C++, C# doesn't support multiple inheritance of classes. Instead, we can use interfaces to make classes inherit characteristics from more than one source. The disadvantage is that interfaces only describe behaviors, which still have to be implemented in the classes that use them. In most object-oriented programming languages behavior is defined by functions. In C# there are more kinds of members, like properties, which are functions that encapsulate a piece of an object's state, for example age and name in a Person class, similar to getters and setters in Java.

Support for Functional Programming

C# treats functions as first-class citizens: we can store functions in variables and pass functions as parameters. There are LINQ, lambda expressions and so on. They make the code shorter and more powerful.

Type Safe

C# is statically typed, so it checks for type errors at compile time. Checking at compile time can catch a lot of errors before a program even runs. Static typing also allows tools like IntelliSense and refactoring tools to help us write the program, which makes it easier to maintain big projects.

Automatic Memory Management

Unlike C++, which requires developers to worry about manual memory management, and similar to Java, C# has an automatic garbage collector that releases the memory of objects that are not used anymore. It doesn't eliminate the possibility of doing manual memory management: if we want, we can use pointers, like in C++, to achieve even better performance.

Runs mainly on Windows but has ports to other OS’s

C# is designed to run on Windows platforms. There are some efforts to run C# on other platforms (see the Mono Project), but these efforts are small and they support only a subset of C#/.NET Framework.

CLR

CLR (Common Language Runtime) is the .NET Framework's virtual machine. It's a runtime that is used by the different programming languages of the .NET Framework (C#, VB.NET, IronRuby, IronPython and so on). It provides memory management, exception handling and thread synchronization. When we write a C# program, the code is compiled to an intermediate language called IL. This intermediate language is loaded by the CLR and compiled to native code by the CLR's just-in-time compiler.

IC15013

.NET Framework

C# is part of the .NET Framework and it includes a lot of libraries with different objectives. The .NET Framework's Base Class Library provides user interface, data access, database connectivity, cryptography, web application development, numeric algorithms, and network communications. See the image below.

IuOVp (1)

Continues

I will be writing other posts about C#, about syntax, about libraries and so on. Subscribe to the newsletter or the RSS feed to be notified! See you soon!

comment


DISCLAIMER: When I say 'avoid code comments', it doesn't mean that I never write comments. It means that I try to avoid them as much as I can, but sometimes I do write them, when I think it's worth it.

We spend more time reading software than writing it. I have never seen a scientific study proving it, but in the software field it's like a dogma or a common belief. Because of it, it's important to write software that is easy to read and to care about the readability of our code. There are some techniques that programmers can use to achieve it. One of them is writing code comments.

There is a big debate about code comments. Should we use comments to describe what our code does? Or should we focus on writing expressive code that doesn't require comments to be read? Joe Kunk wrote a blog post about this debate - To Comment or Not to Comment. There are those who say that good code must be well documented, and there are those who say that we should avoid comments because they are normally used to explain or hide bad code.

In my opinion, influenced by the books Clean Code and Refactoring, we should avoid writing comments unless we have a really good reason to write one (for example, in a mathematical algorithm) or we are obliged to do it by some company rule or process. Below, I list my 5 concerns about code comments.

Where I think that code comments fail

1. They tend to encourage bad code. There is an idea that commented code is good code, so people often write comments to make their code look better. If we need to explain our code by adding comments, it's already a sign that maybe we are writing bad code. Every time we start to write a comment we should think about whether we could be more expressive just by cleaning up our code.

2. We spend more time writing and maintaining them. Comments usually are a second version of the code. When we write a comment for a function we are repeating ourselves and breaking the DRY (Don't Repeat Yourself) principle. We are spending time and adding complexity. Software requirements change and the code has to change too. If we write comments, we have to maintain them too, so we can end up spending twice the time when we have to make a change. We could use this time to improve our code or to develop new features.

3. Comments are not testable/verifiable. When we are changing code we can rely on tools like compilers, IDEs and unit tests to help us. Comments don't have these tools. You can't rely on tools or unit tests to make sure comments are right, in the correct place and up to date. Once you write a comment, you have an untestable piece whose correctness you have to care about, and when it fails, it fails silently.

4. They are less reliable than the code they document. Comments often become obsolete and lose the connection with the code. Then programmers read them and are misled. Even if the comments are up to date, the only way to know if the code does what it should will always be reading the code. A practical example: if our boss asks us whether a change was made, where should we look? Code or comments?
Of course we will look at the code.

5. Some comment styles can fill a lot of screen space. Some comment standards (like the one below) use a lot of lines, which is a problem when you are trying to read as much code as you can.

/**
 *
 * @param title The title of the CD
 * @param author The author of the CD
 * @param tracks The number of tracks on the CD
 * @param durationInMinutes The duration of the CD in minutes
 */
public void addCD(String title, String author,
        int tracks, int durationInMinutes) {
    CD cd = new CD();
    cd.title = title;
    cd.author = author;
    cd.tracks = tracks;
    cd.duration = durationInMinutes;
    cdList.add(cd);
}
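
Going back to concern 1, here is a small sketch, in Python, of what cleaning the code instead of commenting it can look like. The functions and field names are made up for the illustration. In the first version the comment carries the meaning, in the second one the names do.

```python
# Before: the comment explains what the code hides.
def check(u):
    # user must be an adult and must have accepted the terms
    return u["age"] >= 18 and u["terms"]


# After: the names say the same thing, so the comment is not needed.
ADULT_AGE = 18


def is_adult(user):
    return user["age"] >= ADULT_AGE


def can_register(user):
    return is_adult(user) and user["accepted_terms"]


print(can_register({"age": 20, "accepted_terms": True}))
```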

Introduction

Two years ago, I was working on a project whose goal was to write an Excel-like web application to calculate product/service prices. The team was split into 3 parts: the development team, the requirements team and the QA team. The project became so big, and we used no automated tests at all (our QA team did manual tests), that it spent more time being tested than being developed. With each little change, the project spent hours and hours with the QA team. One day I went to a developer meeting and talked about my problem with other programmers. They suggested that I learn about unit tests, acceptance tests and TDD.

Lessons Learned

Below is a list of lessons that I have learned while applying unit tests/TDD since 2011.

1. Don't try to apply TDD for the first time, or teach TDD to your team, in a real project. It won't work. First it's necessary to know how the TDD flow works: how to mock objects, how to mock the framework internals, how to organize tests and so on. If your team is not ready, it will slow down the development and you will miss some deadlines.

2. Coding dojos are a good way to teach TDD. We do coding dojo sessions. It's the best way we have found to teach TDD to new developers and to keep our skills up to date.

3. Try to convince your whole team before applying TDD. There is nothing more frustrating than one or two developers ruining our test efforts by commenting out code, trying to commit with failing tests and so on. I have had bad experiences with developers who weren't committed. Explain the benefits: how tests help keep our software free of bugs, how we can refactor the code without worrying about breaking the software and so on.

4. Write sufficient tests. Building a test suite is like building a shield against bugs, and the team should be able to fully trust this shield when refactoring or evolving the software. If this shield has gaps, we increase the risk of creating undetected bugs when we change the code. You don't have to cover 100% of your code, which is almost impossible and would cost too much time, but covering the majority of your code is perfectly achievable. A good rule is to test everything that can possibly break.

5. Use a coverage tool. Coverage tools report gaps in our test suite. With these tools, it's easy to identify code that isn't being tested. Most of them give us a visual indication, coloring the lines that are being tested in blue/green and the lines that are not being tested in red. If you are a .NET programmer, Visual Studio Ultimate comes with this feature, and if you are a Java programmer you can use EclEmma.

6. Tests should be fast. Fast to run and fast to write. When we are building software we are always chasing a deadline. Our tests have to help us to achieve this goal and not be a distraction or a delay.

If our tests take too much time to write, the team will stop writing them when the deadlines become too tight.
If our tests take too much time to run, the team won't run them every time they change the code, or running them will decrease the team's productivity.

7. Don't comment out or ignore failing tests. Once your team becomes comfortable with the build failing due to 1 test, they will be comfortable with the build failing due to 2, 3, 4 tests and so on. At that point, the test suite feedback will be ignored and the tests won't be helpful anymore.

8. Pair programming helps the team to adopt TDD. When we are trying TDD for the first time or when our deadline is tight, we will be tempted to forget the tests and write only production code. Pair programming will prevent the team from cutting corners and will keep it writing tests.

9. Keep your test code clean. Once, to speed up our productivity, we decided that our test code didn't need to be as clean as our production code. At first sight it was a good decision, but software changes and tests have to change too. We ended up with tests that were difficult to maintain and with larger estimates due to the cost of maintaining them.

10. Tests should have one and only one reason to fail. Be careful if your test has a lot of assertions. If functions and classes should have only one responsibility, our tests should test only one concept. In this way, it will be easier to look at a failed test and figure out what is wrong.

11. Writing unit tests will save debugging time. A lot of time is spent debugging code, looking for bugs. Once you are writing unit tests, you will have real-time feedback on each piece of your code. It will be easier to find bugs and, consequently, it will reduce the time we spend debugging.

12. Keep pushing. Applying TDD is all about changing our mindset. It's difficult to start writing tests and it's even more difficult to write tests BEFORE writing production code. It's important to keep pushing and writing tests. One day, they will end up saving our lives. Also, once your team is fully comfortable with TDD, productivity tends to increase.
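
Lesson 10 is easier to see in code. Below is a minimal sketch in Python with the standard unittest module. The create_phrase function is hypothetical and exists only for the illustration. Each test checks a single concept, so a failure points at a single cause.

```python
import unittest


def create_phrase(name, age):
    # Hypothetical function under test.
    return "Hi {0}, you are {1} years old.".format(name, age)


class CreatePhraseTest(unittest.TestCase):
    # One concept per test: the test name already tells what broke.

    def test_phrase_includes_the_name(self):
        self.assertIn("Philip", create_phrase("Philip", 23))

    def test_phrase_includes_the_age(self):
        self.assertIn("23", create_phrase("Philip", 23))


# Run the suite programmatically so the example works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CreatePhraseTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If both checks lived in a single test, a failure in the first assertion would hide the state of the second one.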

What are Code Smells ?

The notion describing the "when" we should apply refactoring techniques in our code.
- Kent Beck

A code smell is a surface indication that usually corresponds to a deeper problem in the system.
- Martin Fowler

In computer programming, code smell is any symptom in the source code of a program that possibly indicates a deeper problem. Code smells are usually not bugs—they are not technically incorrect and don't currently prevent the program from functioning. Instead, they indicate weaknesses in design that may be slowing down development or increasing the risk of bugs or failures in the future.
- Wikipedia

Code smells are screams from our code trying to tell us that we should stop and think harder about what we're doing.
- Myself

Code smells can ruin our project

  • Code smells will slow down the team.
  • Code smells will increase the risk of bugs.
  • Code smells will make the code more and more complex.
  • Code smells will make it harder to add new programmers to the project.
  • Code smells will make it more difficult for programmers to make changes and, consequently, to generate value for our clients.
  • Code smells can sometimes get the project cancelled.

How can I discover a code smell ?

Several programmers and writers have been talking about code smells since the 90s, when Kent Beck coined the term. A lot of lists were created to catalog them and to be used as guidelines to fight code smells in our projects. Below is my list of what are, in my opinion, the worst code smells.

Duplicated Code

It's probably the most common code smell. If you see the same code block twice, it's already a sign that you should stop and refactor.

If the code is repeated in the same class, we should extract it into a method.
If the code is repeated in different classes, maybe we are missing an opportunity to create a super/base class. Or sometimes, if we analyze it better, the code belongs to only one class and the other should be calling a method of that class.
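
For the first case, a minimal sketch in Python could look like this. The Invoice class and its tax rate are invented for the example. The sum of the prices used to be written in both tax() and total(), and now it lives in one extracted method.

```python
class Invoice:
    TAX_RATE = 0.1  # illustrative value

    def __init__(self, items):
        self.items = items  # list of (description, price) pairs

    def _subtotal(self):
        # Extracted method: this sum used to be repeated
        # inside both tax() and total().
        return sum(price for _, price in self.items)

    def tax(self):
        return self._subtotal() * self.TAX_RATE

    def total(self):
        return self._subtotal() + self.tax()


invoice = Invoice([("book", 20.0), ("pen", 5.0)])
print(invoice.total())
```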

Commented Code

It sounds funny, but some programmers look like they don't trust their source control system. I have seen a lot of comments like this:

// Some commented code
// This code was commented due a random reason - 05/10/2005 - Programmer 1

Delete this commented code, your source control won't let you down!

Long Methods/Long classes

Sometimes we see code blocks known as Megazords or God objects, which are pieces of code with too many responsibilities. It's a violation of one of the SOLID principles - the Single Responsibility Principle.

Single responsibility principle
a code block should have only a single responsibility.

Take methods that do a lot of things, e.g. retrieving data from a source, processing it and writing it to a console. Such a method should clearly be split into 3 methods. Or take classes that hold behaviors and characteristics that should be in other classes, and so on. When writing a solution, a good sign is to look for comments: normally a piece of code where you wrote a comment could be extracted into another function. When designing classes we should look for duplicated code, large methods and bad design.
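
A minimal sketch of that split, in Python, could be the following. The three function names and the sample data are assumptions made for the illustration: one function retrieves, one processes and one writes.

```python
def read_prices(lines):
    # 1. Retrieve: turn raw text lines into numbers, skipping blank lines.
    return [float(line) for line in lines if line.strip()]


def average(prices):
    # 2. Process: pure calculation, no input or output.
    return sum(prices) / len(prices) if prices else 0.0


def report(value, out=print):
    # 3. Write: the only function that knows about the console.
    out("Average price: {0:.2f}".format(value))


raw_lines = ["10.0", "20.0", "", "30.0"]  # stands in for a file or a query result
report(average(read_prices(raw_lines)))
```

Each piece can now change, or be tested, without touching the other two.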

Divergent Change / Shotgun Surgery

Software will change, we know it. So we have to design our code to make it easy to change. We want a single point of change and for a single reason.

Divergent Change is the code smell that happens when a piece of code is changed for various reasons. If you look at a piece of code and think, "I will change it if my database changes, if my business rules change and if my view rules change", your code has this bad smell! The solution here is to identify the different responsibilities and extract them to different pieces of code.

If code that changes for different reasons is a Divergent Change, the opposite is also a smell. Shotgun Surgery is a code smell that happens when, for a single reason, you have to change several pieces of code. For example, when performing a database change, you shouldn't have to change your business rules or your view rules. While for a Divergent Change we should split responsibilities, for a Shotgun Surgery we should join them. In this case, database code shouldn't belong to business or view classes.

Bad Names

Names in software are 90 percent of what make software readable. You need to take the time to choose them wisely and keep them relevant. Names are too important to treat carelessly.
- Robert C. Martin aka Uncle Bob

Names play an important role in software development. We spend more time reading code than writing code, so we should care about our code's readability and about the names that we give to our variables, functions, classes and so on. Below are some naming guidelines that we can use to achieve better code.

Choose descriptive names. Look at the code below. What does it do? What is the software domain? We know nothing about it. For me, it's only a bunch of nonsense code.

public String getWinner(String p1, String p2) {
    if (p1.equals("r") && p2.equals("s")) {
        return "r";
    }

    if (p1.equals("s") && p2.equals("p")) {
        return "s";
    }

    if (p1.equals("p") && p2.equals("r")) {
        return "p";
    }

    return "d";
}

Now look at this code below. It’s exactly the same code but with better names. Now, we know what the code does and it's easier to evolve and maintain.

/*
Changed the code to adopt best practices - 07/07/2013.
*/

public enum PlaysTypes
{
    ROCK,PAPER, SCISSOR;
}

public enum WinnersTypes
{
    ROCK,PAPER, SCISSOR, DRAW;   
}      

public WinnersTypes judgeWinner(PlaysTypes play1, PlaysTypes play2) {
    if (play1 == PlaysTypes.ROCK && play2 == PlaysTypes.SCISSOR) {
        return WinnersTypes.ROCK;
    }
    
    if (play1 == PlaysTypes.SCISSOR && play2 == PlaysTypes.PAPER) {
        return WinnersTypes.SCISSOR;
    }
    
    if (play1 == PlaysTypes.PAPER && play2 == PlaysTypes.ROCK) {
        return WinnersTypes.PAPER;
    }
            
    return WinnersTypes.DRAW;
}

Use standard names when possible. It's easier to understand names that are based on conventions or standards. If we are using an MVC architecture, we should name our controllers like xxxController. If we are applying a design pattern, it's good to give names like xxxSingleton or xxxAdapter.
Your own project should have some standards. Let's respect them.

Avoid encoding and Hungarian Notation. Once I was working on a project where we had to use the following convention:

L for local variables
G for global variables
P for parameters
str for string
int for int
and so on.

We ended up with these terrible names.

Lstr_name
Gint_session_id
Pint_age

Today's development environments give us all the type and scope info that we need, so we can rely on them.

Names should describe everything that a piece of code does. Don't use simple names for functions that do more than one thing. Look at the code below.

public Connection getConnection() {
    if (connection == null) {
        connection = new Connection();
    }

    return connection;
}

This function does more than only get a connection. If connection is null, it also creates a connection. A better name could be createOrReturnConnection.

Code Smell Lists

Other people have also shared their lists of smells. You can check them out below.

Jeff Atwood's List
Cunningham's List

Books that cover Code Smells and Code Refactoring

Refactoring: Improving the Design of Existing Code
Clean Code: A Handbook of Agile Software Craftsmanship

Introduction

A few weeks ago I wrote a blog post showing how we can write command-line applications using argparse, you can check it out here. Some people suggested that I try docopt, and this post is the result of that experience.

Docopt, the Pythonic command line arguments parser, that will make you smile

What is it ?

Docopt, like optparse and argparse, is a small library that simplifies the task of writing command-line interfaces. But it comes with a different idea.

How do optparse and argparse work? You have to read the documentation and write code that tells the library how to parse the args. An example of this approach is the code that I wrote in my previous post. And to be honest, I looked at the documentation several times, trying to understand how each method works.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("operation",
    help="mathematical operation that will be performed",
    choices=['add', 'subtract', 'multiply', 'divide'])
parser.add_argument("num1", help="the first number", type=int)
parser.add_argument("num2", help="the second number", type=int)
args = parser.parse_args()

How does docopt work? When we design a command-line interface, we write (or should write) our application's documentation, right? Docopt uses this docstring to parse the args. Look at this example, taken from the documentation:

"""Naval Fate.

Usage:
  naval_fate.py ship new <name>...
  naval_fate.py ship <name> move <x> <y> [--speed=<kn>]
  naval_fate.py ship shoot <x> <y>
  naval_fate.py mine (set|remove) <x> <y> [--moored|--drifting]
  naval_fate.py -h | --help
  naval_fate.py --version

Options:
  -h --help     Show this screen.
  --version     Show version.
  --speed=<kn>  Speed in knots [default: 10].
  --moored      Moored (anchored) mine.
  --drifting    Drifting mine.

"""
from docopt import docopt


if __name__ == '__main__':
    arguments = docopt(__doc__, version='Naval Fate 2.0')
    print(arguments)

When I saw it for the first time, I had a "holy shit" moment and said "It won't work", but it works! You don't have to read documentation or write parsing code: all you have to do is write your application's documentation, and instantly you have your application running with all its options and help messages working properly!

Coming back to the calculator example

In my first post about command-line interfaces I built two command-line calculators, one using plain Python without libraries and the other using argparse. So, to compare these libraries, let's build another calculator, this time using docopt. This is our specification:

/*
We are going to create a command-line calculator. 
The calculator should be used in this way:

python calc.py add 2 2
> 4

python calc.py subtract 2 2
> 0

python calc.py multiply 2 2
> 4

python calc.py divide 2 2
> 1
   
*/

Let's transform it into the docopt pattern, which is the pattern used by almost all command-line applications.

"""Calculator using docopt

Usage:
  calc_docopt.py <operation> <num1> <num2>
  calc_docopt.py (-h | --help)

Arguments
  <operation> Math Operation
  <num1> First Number
  <num2> Second Number

Options:
  -h --help     Show this screen.

"""

Now, let's add the docopt code; don't be afraid of it. With 3-4 lines of code our parser is done.

"""Calculator using docopt

Usage:
  calc_docopt.py <operation> <num1> <num2>
  calc_docopt.py (-h | --help)

Arguments
  <operation> Math Operation
  <num1> First Number
  <num2> Second Number

Options:
  -h --help     Show this screen.

"""
from docopt import docopt

if __name__ == '__main__':
    arguments = docopt(__doc__, version='Calculator with docopt')
    print(arguments)

Help Messages

Let's see some of the messages that docopt provides for us.

Help message

python calc_docopt.py --help
>>> Calculator using docopt
>>>
>>> Usage:
>>>   calc_docopt.py <operation> <num1> <num2>
>>>   calc_docopt.py (-h | --help)
>>> 
>>> Arguments
>>>   <operation> Math Operation
>>>   <num1> First Number
>>>   <num2> Second Number
>>> 
>>> Options:
>>>   -h --help     Show this screen.

Invalid Arguments

python calc_docopt.py
>>> Usage:
>>>   calc_docopt.py <operation> <num1> <num2>
>>>   calc_docopt.py (-h | --help)

Operations

Our job now is to code the calculator operations. I will respect the DRY gods and copy the operations from the other post.

"""Calculator using docopt

Usage:
  calc_docopt.py <operation> <num1> <num2>
  calc_docopt.py (-h | --help)

Arguments
  <operation> Math Operation
  <num1> First Number
  <num2> Second Number

Options:
  -h --help     Show this screen.

"""
from docopt import docopt

if __name__ == '__main__':
    args = docopt(__doc__, version='Calculator with docopt')
    functions = {
        'add': lambda num1, num2: num1 + num2,
        'subtract': lambda num1, num2: num1 - num2,
        'multiply': lambda num1, num2: num1 * num2,
        # // keeps integer division on both Python 2 and 3
        'divide': lambda num1, num2: num1 // num2
    }

    print(functions[args['<operation>']](int(args['<num1>']), int(args['<num2>'])))

Our calculator now performs math operations as we expect.

Input Validation

If you look closely, you will see that our calculator lacks type parsing/checking and operation checking. That is easier to accomplish with argparse, because docopt doesn't have built-in input validation.

docopt does one thing and does it well: it implements your command-line interface. However it does not validate the input data. On the other hand there are libraries like python schema which make validating data a breeze.
- docopt documentation

You can get more info about python schema here.
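Even without an extra library, a small hand-written check goes a long way. The sketch below is only an illustration: OPERATIONS and validate() are names invented here, not part of docopt, and the dictionary merely mimics the keys we care about in what docopt returns for our usage pattern.

```python
# Sketch only: validate a docopt-style dictionary with plain Python.
# OPERATIONS and validate() are illustrative names, not part of docopt.
OPERATIONS = ('add', 'subtract', 'multiply', 'divide')


def validate(args):
    """Return (operation, num1, num2) or raise ValueError with a readable message."""
    operation = args['<operation>']
    if operation not in OPERATIONS:
        raise ValueError("invalid operation: %r (choose from %s)"
                         % (operation, ', '.join(OPERATIONS)))
    try:
        num1 = int(args['<num1>'])
        num2 = int(args['<num2>'])
    except ValueError:
        raise ValueError("<num1> and <num2> must be integers")
    if operation == 'divide' and num2 == 0:
        raise ValueError("cannot divide by zero")
    return operation, num1, num2


# docopt hands positional arguments back as strings, so this dictionary
# mimics its result for `python calc_docopt.py add 2 2`.
print(validate({'<operation>': 'add', '<num1>': '2', '<num2>': '2'}))
```

In the calculator, the call would sit right after docopt(__doc__, ...), and a ValueError could be turned into an error message plus a non-zero exit code.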

Conclusion

Docopt is a new approach to writing command-line apps, and it makes building them easy. We can only appreciate Vladimir Keleshev's work and hope that it keeps improving and, why not, earns a place in the Python standard library?

Today, I was reading some articles on the internet when I found this post by Jason Whaley, The Five Most Influential Books to Me as a Developer. I'm addicted to programming books; at least once a week I find myself searching for new programming books on Amazon. I decided to see what the programming superstars like to read.

David Heinemeier Hansson, the creator of Rails, shared this list on his blog:

Smalltalk Best Practice Patterns
Refactoring: Improving the Design of Existing Code
Patterns of Enterprise Application Architecture
Domain-Driven Design: Tackling Complexity in the Heart of Software
Are Your Lights On?: How to Figure Out What the Problem Really Is

Jeff Atwood, from StackOverflow and CodingHorror, recommended these books:

Code Complete: A Practical Handbook of Software Construction, Second Edition
Don't Make Me Think: A Common Sense Approach to Web Usability, 2nd Edition
Peopleware: Productive Projects and Teams (Second Edition)
The Pragmatic Programmer: From Journeyman to Master
Facts and Fallacies of Software Engineering

The StackOverflow community has also listed its top 100 most influential books in this post.

My Own List

When I started my programming career I did not read programming books. I thought that most of my learning would come from my daily work. I was wrong. Programming is a practical activity: you have to do the same thing once, twice, three times to become really good. But sometimes it is necessary to slow down and see what the real top performers think about coding. Often, a month with a book will teach you things that would take you years to learn on your own.

The books below helped me change my view of programming and improve and critique my own code.

Refactoring: Improving the Design of Existing Code

It was the first good book that I read. Before it, everything I read focused on specific technologies like Pascal, C or Java. This book explains what a code smell is, how you can identify one and how you can fix it. The last section contains a list of code smells and refactoring recipes. All developers should read this one.

Clean Code: A Handbook of Agile Software Craftsmanship

This book teaches how to write good code: how to create good names, good functions, good objects and good classes, how to format our code, how to write tests and so on. The second section is a case study in which the author shows bad code and transforms it into good code using good practices. I have read this book once, but it is definitely a book that I will read again, and again, and again, until I absorb all the principles.

Test Driven Development: By Example

OK, by this time I knew a lot about how to identify good code, how to write good functions, how to split my classes and so on, but one thing was frustrating me. I couldn't rewrite my code because I couldn't guarantee that it would still work. I needed a way to verify that my code's behavior wouldn't change. I found it in TDD. In this book Kent Beck teaches us how to write tests before writing production code and how that helps us achieve better code and better design.

Domain-Driven Design: Tackling Complexity in the Heart of Software

I started my career in a software factory. We normally used a design known as the anemic model, where our entities were copies of our database tables and, once created, were unlikely to change. In this book, Eric Evans teaches how we can use more OO in our software, how we can design it better, how we can bring in domain experts to increase our knowledge of the software's domain, and how to use iterations to refine our model. A lot of concepts are introduced, like Value Objects, Aggregates, Services and so on.

C# in Depth

This was the first book I read that talks about language design. Through this book I could understand how C# is evolving and why some decisions were taken. It also taught me some features that I had never used and that I could show to my team to improve our code quality.

Suggestions

And you? Which book do you consider the most influential in your programming career?

I started a series to share the resources that I have been using to learn some technologies. In this post, the subject is JavaScript. I have already written two posts covering JavaScript, here and here.

Why should we learn JavaScript?

Some years ago, I was using JavaScript only to perform validations or to build input masks, but today the use of JavaScript has grown. The modern web requires beautiful and responsive UIs that can only be achieved with a lot of JavaScript code. But JavaScript is going beyond that: it has reached the back end too, and today we can write a full application using only JavaScript. A good architecture could be:

Angular.js on Front-end
Node.js on Back-end
MongoDB as database with a javascript driver

Because of this, a couple of months ago I started to study JavaScript seriously, and I will share with you the resources that I have been using:

Books

JavaScript: The Good Parts by Douglas Crockford: Crockford is a Senior JavaScript Architect at Yahoo! and created and maintains the JSON format. In this book he proposes dividing JavaScript into good parts (that we should be using) and bad parts (that we should unlearn), and focuses on the features that make JavaScript awesome.

Test-Driven JavaScript Development by Christian Johansen: It uses real examples to teach how we can write more robust, maintainable and reliable JavaScript code using automated tests. It covers tools like Node.js, QUnit, Jasmine and so on.

Secrets of the JavaScript Ninja by John Resig: John Resig created jQuery, the most widely used JavaScript library. Written for JavaScript developers with intermediate-level skills, this book covers functions, closures, cross-browser development, object orientation, regular expressions and so on.

Eloquent JavaScript: A Modern Introduction to Programming by Marijn Haverbeke: An introduction to JavaScript that covers the basics, the DOM, functional programming and object-oriented programming. It has a free HTML version.

jQuery is a huge part of the JavaScript ecosystem, so I added these two books:

jQuery: Novice to Ninja: Clear and fun to read, this book covers the jQuery fundamentals.

Pro jQuery by Adam Freeman: He likes to write bible-like books. In this book he covers almost everything about jQuery: fundamentals, jQuery UI, jQuery Mobile, how to create plugins and so on.

Online Courses

Codecademy's JavaScript course: A free course that covers the basics of the language. It has programming exercises that we can answer online.

Code School's Courses: Code School has some JavaScript courses. Most of them are paid, but they offer a trial period.

JavaScript Fundamentals: Development for Absolute Beginners by Bob Tabor: A free course at Channel9 that teaches you the fundamentals of JavaScript programming.

Online Videos

Douglas Crockford's talk at Google Tech Talks about JavaScript: The Good Parts

John Resig's talk at Google Tech Talks about Best Practices

Brendan Eich's (JavaScript creator) talk about the history of JavaScript

Newsletter

JavaScript Weekly: A weekly newsletter about the JavaScript world: new frameworks, good blog posts and so on.

Good Code

Some writers say that to write a book it is necessary to read a lot of other books. Software is similar, so below is a good amount of good JavaScript code to read.

jQuery
jQuery UI
Node.js
D3.js
Angular.js

Call for Suggestions

This list is open to suggestions, so you are welcome to contribute!

For those who come from .NET or Java, one of the best things about Python or Ruby is the ability to write scripts. These little programs can solve a lot of problems, like moving files on a CI server, cleaning some data, running tests and so on. When we are learning Python, we also have to learn to use Python scripts. Everybody uses the following commands when installing a Python library:

python setup.py build
python setup.py install

Or we use the following command when we are programming with Django:

django-admin.py startproject mysite

Building a command-line application

In this post, we are going to learn how to build a command-line calculator. Below is our application backlog:

/*
We are going to create a command-line calculator. 
The calculator should be used in this way:

python calc.py add 2 2
> 4

python calc.py subtract 2 2
> 0

python calc.py multiply 2 2
> 4

python calc.py divide 2 2
> 1
   
*/

Using sys.argv

When we call a Python script, its arguments are stored in the sys.argv list in the following order: the script name first, then the arguments. So we can print each argument this way:

# calc.py

import sys

for arg in sys.argv:
    print(arg)

And here we are printing our arguments:

python calc.py 1 2 3
> calc.py
> 1
> 2
> 3

Building our calculator

We already know how to get our arguments, so now we can implement the operations:

Implementing Add

# calc.py

import sys

args = sys.argv[1:]

operation = args[0]
num1 = int(args[1])
num2 = int(args[2])

if operation == 'add':
    total = num1 + num2

print(total)

Other Operations

Implementing the other operations is straightforward:

# calc.py

import sys

args = sys.argv[1:]

operation = args[0]
num1 = int(args[1])
num2 = int(args[2])

if operation == 'add':
    total = num1 + num2
elif operation == 'subtract':
    total = num1 - num2
elif operation == 'multiply':
    total = num1 * num2
elif operation == 'divide':
    total = num1 // num2  # integer division on both Python 2 and 3

print(total)

A switch..case situation?

Python, unlike many OO languages, doesn't have a switch..case statement, and although that can look like a disadvantage at first, there is a reason! When we use switch..case, in most cases it indicates that there should be a polymorphic call there. But if..elif is as bad as switch..case. We can write it in a more elegant way using dictionaries and functional programming. So we end up with the following code:

# calc.py

import sys

args = sys.argv[1:]

operation = args[0]
num1 = int(args[1])
num2 = int(args[2])

functions = {
    'add': lambda num1, num2: num1 + num2,
    'subtract': lambda num1, num2: num1 - num2,
    'multiply': lambda num1, num2: num1 * num2,
    # // keeps integer division on both Python 2 and 3
    'divide': lambda num1, num2: num1 // num2
}

total = functions[operation](num1, num2)

print(total)

Much better!
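As a variation on the same idea, the standard operator module already provides these functions, so the lambdas are optional. This is just a sketch; calculate() is a helper name introduced here for illustration.

```python
# Sketch: the same dispatch table built from operator functions instead of lambdas.
import operator

functions = {
    'add': operator.add,
    'subtract': operator.sub,
    'multiply': operator.mul,
    'divide': operator.floordiv,  # integer division, so `divide 2 2` gives 1
}


def calculate(operation, num1, num2):
    """Look up the operation by name and apply it to both numbers."""
    return functions[operation](num1, num2)


print(calculate('multiply', 2, 3))
```

Adding a new operation is then a one-line change to the dictionary, with no new branch to write.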

Just a toy application

It's just a toy application. People don't write command-line applications this way. Real command-line applications have error checking, flags, help and other things. Instead, people use modules that help meet these requirements. Some of the existing options are:

Using argparse to write a more robust application

Argparse makes it easy to write command-line applications. You provide info about the parameters and argparse parses them out of sys.argv. It also generates help and usage messages and issues errors.

A basic program

import argparse

parser = argparse.ArgumentParser()
parser.parse_args()

This program does nothing yet. But we can already see the error messages, help and usage tips.

python calc_argparse.py
>>>

python calc_argparse.py --help
>>> usage: calc_argparse.py [-h]
>>>
>>> optional arguments: -h, --help  show this help message and exit

python calc_argparse.py --flag
>>> usage: calc_argparse.py [-h]
>>> calc_argparse.py: error: unrecognized arguments: --flag

python calc_argparse.py arg
>>> usage: calc_argparse.py [-h]
>>> calc_argparse.py: error: unrecognized arguments: arg

Adding arguments

Our command-line calculator does nothing, so let's start adding the parameters. The first parameter will be the operation.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("operation")
args = parser.parse_args()

print(args.operation)

Let's see how argparse parses arguments and shows error and help messages.

python calc_argparse.py
>>> usage: calc_argparse.py [-h] operation
>>> calc_argparse.py: error: too few arguments

python calc_argparse.py --help
>>> usage: calc_argparse.py [-h] operation
>>> positional arguments: operation
>>> optional arguments: -h, --help  show this help message and exit

python calc_argparse.py sum
>>> sum

Better help messages

Everything is working fine. But if we look carefully, what does operation mean? We should have a help message. With argparse we can provide one when we add an argument. Let's do it:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("operation", 
	help="mathematical operation that will be performed")
args = parser.parse_args()

print(args.operation)

And we will see a better help message.

python calc_argparse.py --help
>>> usage: calc_argparse.py [-h] operation
>>> positional arguments:
>>> 		operation mathematical operation that will be performed
>>> optional arguments: -h, --help  show this help message and exit

Write the calculator

Until now we have just been playing with argparse features. Now, we are going to write our calculator operations.
We have to add two more arguments (num1 and num2), and we will use the functions from the previous example to perform the operations. It should work.

import argparse

functions = {
    'add': lambda num1, num2: num1 + num2,
    'subtract': lambda num1, num2: num1 - num2,
    'multiply': lambda num1, num2: num1 * num2,
    'divide': lambda num1, num2: num1 // num2
}

parser = argparse.ArgumentParser()
parser.add_argument("operation", 
	help="mathematical operation that will be performed")
parser.add_argument("num1", help="the first number")
parser.add_argument("num2", help="the second number")
args = parser.parse_args()
print(functions[args.operation](args.num1, args.num2))

python calc_argparse.py multiply 1 2
>>> Traceback (most recent call last):
>>>  File "calc_argparse.py", line 16, in <module>
>>>    print(functions[args.operation](args.num1, args.num2))
>>>  File "calc_argparse.py", line 6, in <lambda>
>>>    'multiply': lambda num1, num2: num1 * num2,
>>> TypeError: can't multiply sequence by non-int of type 'str'

But it's not working. If we analyze the error, we can see that the numbers are being parsed as str: it's a TypeError. With argparse, we can pass the argument type; it will convert the value to the right type and check for type errors. Let's do it.

import argparse

functions = {
    'add': lambda num1, num2: num1 + num2,
    'subtract': lambda num1, num2: num1 - num2,
    'multiply': lambda num1, num2: num1 * num2,
    'divide': lambda num1, num2: num1 // num2
}

parser = argparse.ArgumentParser()
parser.add_argument("operation", 
	help="mathematical operation that will be performed")
parser.add_argument("num1", help="the first number", type=int)
parser.add_argument("num2", help="the second number", type=int)
args = parser.parse_args()

print(functions[args.operation](args.num1, args.num2))

Now everything is working fine, and we also have type error checking.

python calc_argparse.py add 2 2
>>> 4

python calc_argparse.py subtract 2 2
>>> 0

python calc_argparse.py multiply 2 2
>>> 4

python calc_argparse.py divide 2 2
>>> 1

python calc_argparse.py add '2' '2'
>>> usage: calc_argparse.py [-h] operation num1 num2
>>> calc_argparse.py: error: argument num1: invalid int value: "'2'"
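The type argument is not limited to built-ins like int: argparse accepts any callable that takes the raw string and returns the converted value. As a small sketch (positive_int is a hypothetical helper written for this example, not something the calculator needs), a custom check can be plugged in the same way:

```python
import argparse


def positive_int(text):
    """Convert text to int and reject values below 1 (hypothetical helper)."""
    value = int(text)  # a ValueError raised here is reported by argparse as an invalid value
    if value < 1:
        raise argparse.ArgumentTypeError("%r is not a positive integer" % text)
    return value


parser = argparse.ArgumentParser()
parser.add_argument("num1", help="the first number", type=positive_int)

# Passing a list to parse_args() avoids touching sys.argv, which is handy for experiments.
args = parser.parse_args(["7"])
print(args.num1)
```

When the value is rejected, argparse prints the usage line plus the error message and exits, just like in the invalid int example above.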

Operation Error Checking


Look what happens when we try to perform an invalid operation.

python calc_argparse.py invalid_operation 2 2
>>> Traceback (most recent call last):
>>>  File "calc_argparse.py", line 16, in <module>
>>>    print(functions[args.operation](args.num1, args.num2))
>>> KeyError: 'invalid_operation'

Obviously, there isn't an 'invalid_operation' operation in math, so we should limit the available operations. With argparse, we can do this using the choices argument when adding an argument:

import argparse

functions = {
    'add': lambda num1, num2: num1 + num2,
    'subtract': lambda num1, num2: num1 - num2,
    'multiply': lambda num1, num2: num1 * num2,
    'divide': lambda num1, num2: num1 // num2
}

parser = argparse.ArgumentParser()
parser.add_argument("operation", 
	help="mathematical operation that will be performed", 
	choices=['add', 'subtract', 'multiply', 'divide'])
parser.add_argument("num1", help="the first number", type=int)
parser.add_argument("num2", help="the second number", type=int)
args = parser.parse_args()

print(functions[args.operation](args.num1, args.num2))

Now, if we try a different operation we will see a better error.

python calc_argparse.py invalid_operation 2 2
>>> usage: calc_argparse.py [-h] {add,subtract,multiply,divide} num1 num2
>>> calc_argparse.py: error: argument operation: invalid choice: 'invalid_operation' (choose from 'add', 'subtract', 'multiply', 'divide')

This is it. With argparse, we can build a more robust and error-safe application with less code.
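One possible refinement, shown here only as a sketch: the list of choices repeats the keys of the functions dictionary, so it can be derived from it, and wrapping the parsing in a function that accepts an argument list makes the calculator easy to try without a shell. The names build_parser and main are my own choices for this example.

```python
import argparse

functions = {
    'add': lambda num1, num2: num1 + num2,
    'subtract': lambda num1, num2: num1 - num2,
    'multiply': lambda num1, num2: num1 * num2,
    'divide': lambda num1, num2: num1 // num2,  # integer division, so `divide 2 2` gives 1
}


def build_parser():
    parser = argparse.ArgumentParser()
    # sorted(functions) keeps the allowed choices in sync with the dictionary keys.
    parser.add_argument("operation", choices=sorted(functions),
                        help="mathematical operation that will be performed")
    parser.add_argument("num1", help="the first number", type=int)
    parser.add_argument("num2", help="the second number", type=int)
    return parser


def main(argv=None):
    """Parse argv (argparse falls back to sys.argv[1:] when it is None) and return the result."""
    args = build_parser().parse_args(argv)
    return functions[args.operation](args.num1, args.num2)


print(main(["multiply", "3", "4"]))
```

With this shape, adding an operation to the dictionary automatically makes it a valid choice on the command line.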

Hi folks, I started a series about resources to become a ninja in some technologies. It is a way that I found to share the resources that I have been reading recently. In this post I will show resources for learning machine learning.

Books

Machine Learning by Peter Flach: An introductory textbook. It starts by discussing how a spam filter works and then talks about the ingredients of machine learning (features, tasks and models). After that, it presents the existing kinds of models (rule-based, probabilistic, tree-based and so on).

Pattern Recognition and Machine Learning by Christopher M. Bishop: It covers a lot of machine learning techniques (regression, classification, neural networks and so on), focusing on the math behind each technique. No previous knowledge of pattern recognition or machine learning is required, but good math skills will help a lot.

MOOCs

There are some free Machine Learning courses that I recommend:

Coursera Machine Learning Course by Andrew Ng: This course provides an introduction to machine learning. It covers supervised learning (regression, classification, neural networks, support vector machines) and unsupervised learning (clustering, deep learning). It uses Octave for the exercises. The course is offered only a few times a year, so it's worth checking when the next session starts.

Caltech Machine Learning Course by Yaser Abu-Mostafa: It has 18 lectures of 60 minutes each, plus homework. It covers fewer topics than Coursera's course (only supervised learning).

Coursera Neural Networks Course by Geoffrey Hinton: A course about neural networks and how they are used in speech and object recognition, image segmentation, and modeling language and human motion.

Machine Learning with Python

I'm studying machine learning with Python. Below are the resources that I have been using.

Scikit-Learn: A Python library with machine learning techniques already implemented.

An Introduction to scikit-learn: Machine Learning in Python by Jake Vanderplas: A talk given at PyCon 2013. It is a 3-hour tutorial covering machine learning concepts and the scikit-learn package.

Advanced Machine Learning with scikit-learn by Olivier Grisel: Offers an in-depth experience of methods and tools for the Machine Learning practitioner through a selection of advanced features of scikit-learn and related projects. Scikit-learn/Machine Learning experience is required.

Practicing Machine Learning

Kaggle Competitions: Machine learning is like programming: you have to practice to become an expert. Kaggle offers a lot of machine learning puzzles that you can use for learning. If you are a top performer, there are competitions with cash prizes.

Call for Suggestions

The intent is to keep improving this list. If you know other resources for learning machine learning, you are welcome to contribute!