Working with Strings and Regular Expressions

Mar 9, 2016

An exchange at work got me thinking this last week. Someone said “I really hate regex.” For those of you not in the loop on the lingo, regex means regular expressions. After a little prying, it turned out the reason they didn’t like regular expressions was really rooted in the fact that they didn’t have a chance to use them very often, so the concept is unfamiliar.

In this post I am not going to go into some of the more advanced parts of regular expressions because, a: they complicate things significantly and, b: I’m not experienced enough with them to give good guidance. Nonetheless, foundational regular expressions are more than enough to discuss for a single blog post. For now, I am going to cover how to leverage a lot of power from a very small subset of the entire set of tools.

Along with this post I am also going to give screen casting another go and provide a bunch of examples in a video which, hopefully, will be a good supplement to the information I will present in my blog. In the video I am using approval testing to demonstrate the output from various regular expressions, which should make applying regex easier to do.

States and Strings

For the brave and the true, let’s start at the beginning. Regular expressions are a form of finite state machine with an accept state. What this means in English is, regular expressions go through a set of predefined states and, if enough of the characteristics are found in a string in the right order, then the pattern is considered a match.

A way to see this in action is the following:

var pattern = /abb/;
var testStr1 = 'abb';
var testStr2 = 'cabaabbac';
var testStr3 = 'ababcabadc';

testStr1.match(pattern) !== null; // true
testStr2.match(pattern) !== null; // true
testStr3.match(pattern) !== null; // false

We can see the pattern of characters in the first test string matches the pattern exactly. We expect this to pass. The second string also contains abb roughly in the middle. This means our pattern will also result in a match. The third string comes very close three times, but abb is not actually found anywhere in the entire string. Let’s take a look at the way the search is conducted in the second string since it is the more interesting of the passing two:

c
aba
b
aa
abb <- accept.

What this demonstration is showing is, the pattern matching mechanism starts verifying on a character and will continue to test until either a character does not pass one of the rules or a set of characters matches all of the rules and reaches what is referred to as an accept state. Regardless of how complex a regular expression ever gets, the underlying function always comes back to reaching an accept state or not. This means if something goes horribly awry and you can’t figure out what happened, simplify your expression and build piece by piece, extending the states in your pattern one by one until you get to the pattern you need.
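
For instance, if /abb/ were misbehaving, we could verify each piece in turn against the second test string:

var testStr = 'cabaabbac';

testStr.match(/a/) !== null; // true
testStr.match(/ab/) !== null; // true
testStr.match(/abb/) !== null; // true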

Special Case Behaviors

Regular expressions come with a whole array of special case definitions and characters which match those cases. For now we are going to look at some of the simpler parts which you can use to make your pattern matching more expressive as you go.

The Optional Pattern

The first special case is the optional modifier, ‘?’. By adding a question mark after a character or expression, you can set a rule that the preceding part of the pattern is optional and may or may not actually exist. A very common example for this is the American versus British spelling of color/colour. Here’s how it works:

var pattern = /colou?r/;

'color'.match(pattern) !== null; // true
'colour'.match(pattern) !== null; // true
'coolour'.match(pattern) !== null; // false

Obviously, the last string doesn’t match a valid spelling of color for any English standard defined, so no matches are returned, i.e. the result of the match expression is null. We can also see that both color and colour are considered valid matches, which means match will return correctly matched strings in the matches array.

Testing for Any Character

Sometimes you are less certain about the particular character which you want to match. For instance, let’s suppose we want to find any word which starts with ca, but we don’t care what the last letter is. We can use the ‘.’ pattern to match any word of this form, even if they aren’t valid words. Here’s an example:

var pattern = /ca./;

'car'.match(pattern) !== null; // true
'cam'.match(pattern) !== null; // true
'Occam\'s razor'.match(pattern) !== null; // true

The last string actually demonstrates the problem we run up against with regular expressions of this type. Regular expressions are not particularly discriminatory. The only rules we provided were that there must be the characters c and a, followed by another character. “Occam’s” satisfies this just fine even though cam is in the middle of a string of characters.

Initial and Terminating Patterns

Next let’s have a look at two more modifying patterns. We can actually express position of a string by using ‘^’ and ‘$’. The ‘^’ pattern expresses that anything which follows that character must be positioned at the very beginning of the string. ‘$’ is the polar opposite, saying that the string must match and terminate at that point. These are probably easiest seen in action.

var pattern1 = /^ca./;

'car mechanic'.match(pattern1) !== null; // true
'cam shaft'.match(pattern1) !== null; // true
'Occam\'s razor'.match(pattern1) !== null; // false

var pattern2 = /ca.$/;

'riding in a cab'.match(pattern2) !== null; // true
'a crow says caw'.match(pattern2) !== null; // true
'Occam\'s razor'.match(pattern2) !== null; // false

As you can see, pattern 1 only matched strings which started with ca followed by another character, and pattern 2 only matched strings ending with ca followed by another character. This can be really handy for testing that a string either starts or ends with an expected value. The initial and terminating patterns are also very useful in testing that a given string contains only characters which match an acceptable pattern. This is common for testing things like postal codes, phone numbers and email addresses.
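
As a quick sketch, a simple five-digit postal code check might combine both position patterns with a character range (the bracket pattern covered a little later in this post):

var zipPattern = /^[0-9][0-9][0-9][0-9][0-9]$/;

'90210'.match(zipPattern) !== null; // true
'90210-1234'.match(zipPattern) !== null; // false
'code 90210'.match(zipPattern) !== null; // false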

Multiple Patterns

Where ‘?’ will make a pattern optional, there are broader reaching patterns like ‘*’ and ‘+’. The ‘*’ pattern means 0 or more matches of the previous character or pattern are acceptable. Similarly, the ‘+’ pattern means 1 or more matches of the previous character or pattern are acceptable.

var pattern1 = /colou*r/;
var pattern2 = /colou+r/;

'color'.match(pattern1) !== null; // true
'color'.match(pattern2) !== null; // false

'colouuur'.match(pattern1) !== null; // true
'colouuur'.match(pattern2) !== null; // true

With two pattern characters which are so close in meaning, it seems reasonable to ask why both? As it turns out, the case of 1 or more is common enough it is actually quite valuable to address that situation directly.

Any In Matching

The last pattern we are going to cover, quickly, is the match-any-in-a-set case, specified with ‘[]’. Anything contained within the brackets will be used as an acceptable match. This means, if you have a very specific set of acceptable characters, it is easy to simply list them all and match against the definition. This is a common use case if numbers and letters are acceptable, but punctuation is not.

var pattern = /^ca[a-z]+$/;

'cabby'.match(pattern) !== null; // true
'cantina'.match(pattern) !== null; // true
'candl\'d'.match(pattern) !== null; // false

Since cabby and cantina both start with ca and contain only letters, they were proper matches for our pattern. Since candl’d (which is basically a nonsense spelling) has punctuation, it is not an acceptable match, so we get null as our match result.

Regex Flags

In Javascript (as well as Perl and potentially other languages) there are flags which accompany a regex search. Up to this point we have been, largely, looking at lower-case matches. In regular expressions, case matters. This means it is useful to be able to declare when case should or should not matter. For example Chris and chris are different strings, so a search for /chris/ won’t match Chris. That’s not helpful when case is unimportant. We can flag a pattern as case insensitive with ‘i’.

'Occam\'s razor'.match(/occam/) !== null; // false
'Occam\'s razor'.match(/occam/i) !== null; // true

The search string remains the same, but by adding the ‘i’ flag, we increase the allowable characters in our search and broaden the results which are returned.

In much the same way ‘i’ allows us to broaden our search by ignoring case, the ‘g’ flag will tell the regular expression engine to find all matches instead of just the first. Let’s have a look at that as well.

'Can we go to the can-can?'.match(/can/).length; // 1
'Can we go to the can-can?'.match(/can/g).length; // 2
'Can we go to the can-can?'.match(/can/ig).length; // 3

As you can see, as we add flags to our search, the results expand to capture more cases and instances. This is especially helpful when trying to replace values or split strings on complex patterns.
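
For instance, the same patterns and flags carry straight over to replace and split:

'Can we go to the can-can?'.replace(/can/ig, 'shall'); // 'shall we go to the shall-shall?'
'Can we go to the can-can?'.split(/can/i); // ['', ' we go to the ', '-', '?']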

Summary

Regular expressions are a fairly rich way to describe string patterns for performing all kinds of matching and checks. Along with matching and checks, regular expressions make it easy to perform complex find and replace operations and string tokenizing.

With particularly complex expressions, regex can look rather cryptic, but when small, focused checks and behaviors are being performed regular expressions can be a great benefit to the programmer for working with strings and reducing pain.

Learning Javascript Differently

Mar 2, 2016

On Thursday and Friday I was at a convention called Agile Open San Diego. The core idea is people getting together and having two full days of water cooler discussions about agile development and leadership. It’s pretty cool and you should go to one near you. Anyway, something happened on Friday morning: I realized we need a new way of approaching language learning.

Currently there are a number of ways people can learn languages including classes, videos, code schools, code katas and more. The most important thing I have noticed is this: none of these really do a good job of building the language into your brain quickly or permanently.

I have watched people who are new at programming struggle through code katas after working through videos and online learning processes. I think it’s time for something different. Borrowing from the ideas of code cooking and the way martial arts are learned, I created the first form in a series of forms for Javascript learning.

What a Form Is

In certain martial arts, a form (much like a kata) is a scripted set of motions which help to define a dictionary of movements the practitioner will use when sparring or in a fight situation. The form is important for developing muscle memory and deepening the mental relationship with the art.

When developing a relationship with a new language any experienced programmer picks a project to do which will force them to go through the motions of asking all the questions they need to ask to really understand how the new language ticks. This is the driving idea behind doing code katas. Katas force a programmer to look at the language they are working with and ask the same kinds of questions.

What if the new language is also the FIRST language?

A new programmer won’t necessarily even know the first questions to ask to get to the right questions. This is why katas help sharpen developers but don’t bring new developers up quickly. This is what a form will help to fix. Instead of placing a problem in front of a new developer, it places the code and asks them “what does this do?”

Here’s a quick start video I made to give a better idea:

Creating the First Form

In the language forms design I envisioned a total of (probably) six forms. Of course before there are six, there must be one. I wanted to emulate a system I already understood, which meant that the first form should be the gateway to greater understanding. I wanted to introduce some of the most common aspects of the language and use a problem which had a lot of room to grow.

Ideally, if the language student were to work through only first form, they would have enough knowledge of the language to start solving problems. The solutions may not be elegant and they would likely not seem refined to the veteran programmer, but solutions and doing are the path to greater understanding.

With all of this in mind, I chose a problem which covered interaction with vectors. Though this sounds pretty “mathy,” anyone who has completed an intermediate course in Algebra should be able to understand what is happening. Vectors provide just the right amount of leg room: the problem is understandable by the accomplished first form student and has plenty to grow on for the third form student.

After choosing a problem that met my needs, I started creating the code. I want first form to represent the way someone would actually work through the problem in an agile environment. This means every step the student takes, there is a test to validate their progress. It becomes far less important for them to sit and scrutinize their code to make sure they got every character right on the first pass since the process is: read the test, write the code, run the code. If the code runs and it looks like what the golden example presents, it must be right. This gives students early comfort through TDD and small-step development.

Leveling

Something I personally have a terrible time with when I am trying to teach someone something is setting aside what I know. I want to give the student everything and have them see why I take such joy in what I do. It doesn’t ever work that way. Instead, I end up overloading them and they have trouble absorbing what I am trying to share.

Lynn Langit encourages the idea of leveling in her Teaching Kids Programming process. Leveling is the idea of presenting a simple idea that does one thing; once the core idea is understood, it is enhanced with a greater concept while the original idea is reinforced. All programming builds on more foundational concepts. This means that any topic can be taught through a leveling approach.

Language forms should work this way. The first form actually opens with a greeting. This mirrors the kung fu tradition of performing a greeting or salute to Buddha. The greeting is, essentially, the foundational Hello World program, TDD style. We could call this a salute to the programming force, or something. I haven’t come up with a name yet, but it’s coming.
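
As a purely hypothetical sketch (the real greeting lives in the repository linked at the end of this post), it could be as small as a test expectation and the code which satisfies it:

    // Hypothetical sketch of a first form greeting, not the actual exercise
    function greet() {
        return 'Hello, World!';
    }

    greet() === 'Hello, World!'; // true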

After the greeting, the very next thing first form does is present the concept of squaring a number. We don’t need to understand the entire process yet and the full problem is largely academic, so the student can focus more on growing to solve the entire problem instead of trying to understand what it does. As we said before, katas are great for sharpening problem solving; this is all about developing a relationship with the language itself.

As we move through the process of solving the entire problem, the student will go through the process of small-step design, abstraction, TDD, conditionals, loops, variables, functions and so on. First form is a guided tour through the way a professional programmer works without the fear and intimidation that comes with getting thrown in with the sharks.

Later Forms

After the first form student has successfully grown to proficiency, they will need to move to the next form. Currently, only the first form is complete, but the second and third forms will actually make subsequent passes over the same code that was covered before. This process of enhancing existing code provides direct insight into identifying patterns, refactoring and smoothing out the edges on legacy systems.

Later forms will pull in concepts like closures, object orientation, higher order functions and so on. The process of creating custom types will be brought to the fore and, by the third form, the student will start learning ways to escape common pitfalls and smells by using well known patterns. They will be able to use what they have learned to improvise on solutions and find their way out of sticky situations.

Forms beyond the third are still in the embryonic stage in my mind, so they will come as I discover how to develop them. So far, I know they will likely not cover the same ground, but will, instead, dig into deeper topics like Gang of Four patterns, algorithms and data structures and greater language mastery. Understandably, not all students will want to go this far with a single language, but ideally, this would provide a means for the dedicated language student to open up greater discovery channels they can follow on their own.

In the end, all of the forms should create a comprehensive system for teaching language through immersion, muscle memory, leveling and, in the end, understanding. The forms are made, not to be done once, but to be repeated again and again to develop a comfort with the language without trying to force the student through solving puzzles and toy problems which they may not have the answers for yet.

Check out my progress creating Javascript forms on GitHub.

Types and Functions in Javascript

Feb 24, 2016

A while ago we talked about creating a custom type in Javascript using object inheritance. There were a couple of fundamental issues with this post: first it was fairly academic and was unlikely to be applicable to the real world; second we didn’t actually do anything with our resulting type.

I decided it was time to revisit the topic, spending less time on the hows of creating a new type and more time on the whys. I created a gist with a full definition of a Vector object so we could start looking at how we can interact with a type and why it’s valuable to isolate object-oriented patterns to type system related activities rather than bundling everything in a class because “it’s the way things are done.”

A First Look

Something you might note right away is we have done some fancy finger work with our definition and created a combination of inheritable behaviors and static functions. This gives us the ability to fall back to the factory pattern for our object instead of instantiating it directly in the middle of our code. This kind of action is similar to how someone writing Scala might handle an object.

In fact this very kind of behavior is precisely the reason I really, REALLY want to love Scala. I don’t, but I want to.

Vector also has both valueOf and toString methods which override the base object definition behaviors. This is really important since we don’t want some giant object output blob if we stringify our vector. Really, we want something akin to the mathematical representation, so we are looking for this kind of behavior: Vector.build(1, 2, 3).toString(); // <1,2,3>

In much the same way we want a sane output when we call toString, valueOf should also give us something useful. Instead of returning the whole vector object, which is not easy to interact with in code, it would be preferable if valueOf actually gave us a meaningful data structure like an array. ValueOf will be especially important as we get into interacting with our vector.

Finally, we want our vector to be something we can interact with directly if necessary. Hiding the data away into a list somewhere is far less useful than putting it somewhere predictable. By using numeric indices on our object, if we reference Vector.build(4, 5, 6)[1], we get 5, which is what we were hoping for. Our Vector object is looking less and less like a classical object and more like a real type with strong intention driving the API.
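
The full definition lives in the gist, but to keep this post self-contained, here is a rough sketch of the general shape, assuming a variadic build factory, numeric indices and a length property (the real implementation may differ in its details):

    function Vector() {
        var args = Array.prototype.slice.call(arguments);

        // Store each value under a numeric index so vector[n] works directly
        args.forEach(function (value, index) {
            this[index] = value;
        }, this);

        this.length = args.length;
    }

    // valueOf hands back a plain array so core functions like map and reduce are available
    Vector.prototype.valueOf = function () {
        return Array.prototype.slice.call(this);
    };

    // toString gives the mathematical representation instead of an object blob
    Vector.prototype.toString = function () {
        return '<' + this.valueOf().join(',') + '>';
    };

    // Static factory so calling code never has to use new directly
    Vector.build = function () {
        var vector = Object.create(Vector.prototype);

        Vector.apply(vector, arguments);

        return vector;
    };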

Writing Functions for Vectors

Vectors are a mathematical construct which means we have real actions we might want to do with them. In a classical approach to development, we would start adding methods to our Vector and extending the API through dumping a bunch of functionality on our final output data type.

The real world doesn’t work like that. A vector is simply a set of ordered values describing a mathematical idea. Vectors don’t actually do anything; they definitely don’t add themselves together or compute their own magnitude. At most it makes sense to ask a vector its length and the value at an index. Anything else is really an action we do TO a vector.

A common action to take with a vector is to figure out its size, better known as its magnitude. This is a pretty simple process and is directly related to the Pythagorean theorem we learned in grade school: a^2 + b^2 = c^2, or (a^2 + b^2)^(1/2) = c.

Let’s look at an implementation of a generic magnitude function for a vector.

    function addSquare(sum, value) {
        return sum + Math.pow(value, 2);
    }

    // (Vector) -> number
    function magnitude(vector) {
        var sumOfSquares = vector.valueOf().reduce(addSquare, 0);

        return Math.pow(sumOfSquares, 0.5);
    }

    // Magnitude example
    var vector = Vector.build(3, 4);
    magnitude(vector); // 5

I’d like to point out the annotation above the magnitude function. What this says is magnitude is a function which takes a vector and returns a number. The reason this annotation is so important is it describes an action we can take on a Vector type which will return a usable value. Our function is not interested in the object-orientedness of Vector, it assumes Vector is a type just like any other and acts upon it accordingly.

The other particularly important item is where we refer to vector.valueOf(). Vectors don’t get nice functions like reduce, since they don’t inherit from the Array prototype. Instead, valueOf gives us a core Javascript data type we can interact with. This means we can limit the amount of custom code we must write in order to actually accomplish work with our vector.

Expanding Our Vector Functions

A better example, even, than magnitude when working with vectors is the concept of adding vectors together. In a purely object-oriented world, adding vectors involves either creating an object which has no real relation to vectors aside from the purpose of housing functions which act on vectors, or adding an add method to our vector creating a syntax that looks like this: vector1.add(vector2).

At best this kind of syntax is kind of odd looking since it doesn’t read quite right, where we would probably say something like “add vector1 and vector2,” this says “vector1 add, vector2” which is kind of awkward to write let alone say. What’s worse is there is an implied order of operations here. Vector addition, much like regular addition, is commutative. This means whether we do vector1 + vector2 or vector2 + vector1, we get the same result. It’s a good thing too. Could you imagine if changing the order you added two things together actually changed the outcome? Bad news.

Let’s take a look at a functional implementation to add two vectors. In this case we will, again, make use of the valueOf method and we will also take advantage of the fact that our vectors are indexed appropriately, so we can capture values without needing to perform valueOf just to get an array. Let’s have a look at the code.

    function addParallelValue (vector, value, index){
        return vector[index] + value;
    }

    // (Vector, Vector) -> Vector
    function addVectors(vector1, vector2) {
        if (vector1.length !== vector2.length) {
            throw new Error('Cannot add two vectors of different lengths.');
        }
        
        var newPoints = vector1.valueOf().map(addParallelValue.bind(null, vector2));

        return Vector.build.apply(null, newPoints);
    }

    // Addition example
    var vector1 = Vector.build(1, 2, 3);
    var vector2 = Vector.build(4, 5, 6);
    addVectors(vector1, vector2).toString(); // <5,7,9>

Our annotation this time states addVectors is a function which takes two vectors and returns a vector. Adding vectors is actually a far more complex operation than taking the magnitude since we have to interact with two different vectors simultaneously, adding the values. Once we have the new values for our resulting vector, we must create a vector to return.

With the kind of variadic behavior our Vector constructor follows, performing this operation in a purely object oriented manner would be rather challenging, though not impossible. By building our Vector object with a factory, we get the added benefit of using the built-in apply function which all functions inherit. This makes creation of a new vector a trivial affair. In the end, we actually manage to accomplish the core computation in the same number of lines as our simpler magnitude computation.

Summary

Javascript’s blended object oriented/functional paradigm provides a lot of power when we redraw computation and type lines. Instead of bundling all functionality into objects and trying to force-fit a single solution, we get the greatest power object-orientation gives us, flexible type definitions, with the power functional programming provides, declarative, powerful computation syntax.

As you develop applications with Javascript take a look at what you are trying to accomplish and consider whether your current work is data- or computation-centric. Let this differentiating characteristic guide your hand and develop your apps to be powerful, clean and well defined.

Visual Studio Code Extensions: Editing the Document

Feb 17, 2016

I have been supporting an extension for Visual Studio Code for about a month now. In that time I have learned a lot about building extensions for an editor and static analysis of Javascript. Today is more about the former and less about the latter. Nevertheless, I have found that creating snippets, modifying the status bar and displaying messages is trivial, but modifying the current document is hard when you don’t know how to even get started.

The other important thing I have noted about writing extensions for VS Code is, although the documentation exists and is a, seemingly, exhaustive catalog of the extension API, it is quite lacking in examples and instructions. By the end of this post I hope to demystify one of the most challenging parts of document management I have found so far.

The VSCode Module

The first thing to note is anything you do in VS Code which interacts with the core API will require the vscode module. It might seem strange to bring this up, but it is rather less than obvious that you will have to interact with the vscode module often.

Under the hood, the vscode module contains pretty much everything you need to know about the editor and its current state. The module also contains all of the functions, objects and prototypes which you will need in order to make any headway in your code whatsoever. With that in mind, there are two ways you can get this module or any of its data into your current solution. The first option is to require it like any other node module.

var vscode = require('vscode');

As of the latest release of VS Code, you now have to explicitly specify the vscode module in the devDependencies section of your package.json file. Below is what my dependencies object looks like:

  "devDependencies": {
    "chai": "^3.4.1",
    "mocha": "^2.3.4",
    "mockery": "^1.4.0",
    "sinon": "^1.17.2",
    "vscode": "^0.11.x"
  },

For now, don’t sweat the other stuff I have required. It is all test library stuff, which we will look at in a future post. Anyway, that last line is my vscode requirement. By adding this dependency, vscode will be available in your development environment, which makes actually getting work done possible. To install and include vscode, copy and paste the following at the command line inside your extension project:

npm install vscode --save-dev

Making a Document Edit

The reason it was important to briefly cover the actual vscode module is, we are going to live on it for the rest of the post. It will be in just about every code sample from here to the end of the post.

So…

By reading the VS Code extension API it is really, really easy to get lost when trying to push an edit into the view. There are, in fact, 5 different object types which must be instantiated in order and injected into one another to create a rather large, deeply nested edit object hierarchy. As I was trying to figure it out, I had to take notes and keep bookmarks so I could cross-compare objects and sort out which goes where.

I will start off by looking at the last object in the sequence and then jump to the very first objects which need to be instantiated and work to our final implementation. When you want to make an edit to the code in the current document, you need to create a new WorkspaceEdit which will handle all of the internal workings for actually propagating the edit. This new WorkspaceEdit object will be passed, when ready, into an applyEdit function. Here’s what the final code will look like, so it is clear what we are working toward in the end:

    function applyEdit (vsEditor, coords, content){
        var vsDocument = getDocument(vsEditor);
        var edit = setEditFactory(vsDocument._uri, coords, content);
        vscode.workspace.applyEdit(edit);
    }

In this sample code, the _uri refers to the document we are interacting with, coords contains the start and end position for our edit and content contains the actual text content we want to put in our editor document. I feel we could almost spend an entire blog post just discussing what each of these pieces entails and how to construct each one. For now, however, let’s just assume the vsEditor is coming from outside our script and provided by the initial editor call, the coords are an object which we will dig into a little more soon, and content is just a block of text containing anything.
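
For reference, the coords value here is just a plain object of my own convention, not a vscode type; it simply carries a start and an end coordinate:

    // My own convention for passing coordinates around, not part of the VS Code API
    var coords = {
        start: { line: 5, char: 3 },
        end: { line: 5, char: 10 }
    };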

The Position Object

In our previous code sample, there is a function called setEditFactory. In VS Code there are two types of document edits, set and replace. So far I have only used a set edit and it seems to work quite nicely. With that in mind, however, there is a reason we are using a factory function to construct our edit. A document edit contains so many moving parts it is essential to limit exposure of the reusable pieces to the rest of the world since they largely illuminate nothing when you are in the middle of trying to simply add some text to the document.

Let’s actually dig into the other end of our edit manufacture process and look at the very first object which needs to be constructed in order to actually produce a change in our document: the position. Every edit must have a position specified. Without a position, the editor won’t know where to place the changes you are about to make.

In order to create a position object, you need two number values, specifically integers. I’m not going to tell you where, exactly, to get these numbers because that is really up to the logic of your specific extension, but I will say that a position requires a line number and a character number. If you are tinkering with code in your own extension, you can actually make up any two numbers you want as long as they exist as coordinates in your document. line 5, char 3 is a great one if it exists, so feel free to key the values in by hand to get an idea of how this works.

Anyway, once we have a line and a character number, we are ready to construct a position.

    function positionFactory(line, char) {
        return new vscode.Position(line, char);
    }

That’s actually all there is to it. If you new up a Position object with a line and character number, you will have a new position to work with in your extension.

The Range Object

The next object you will need to display your document change is a range object. The range object, one would think, would simply take bare coordinates. Sadly this is not the case. What range actually takes is a start and end position object. The range tells VS Code what lines and characters to overwrite with your new content, so it must go from an existing line and character to an existing line and character, otherwise known as a start position and end position. Here’s how we create a range object.

    function rangeFactory(start, end) {
        return new vscode.Range(start, end);
    }

So far our factories are nice and clean, which makes their intent pretty obvious. This is handy because it gets really strange and hard to follow quickly without some care and feeding of clean code. Anyway, our range takes two positions, so we provide them as named above, start and end.

The TextEdit Object

The TextEdit object is where things start to really come together. Now that we have a range made of two positions, we can pass our range and our text content through to create a new edit object. The edit object is one of the key objects we need to actually perform our document change. It contains almost all of the necessary information to actually make a document change. Let’s look at how to construct a TextEdit object.

    function textEditFactory(range, content) {
        return new vscode.TextEdit(range, content);
    }

Keep in mind, though we have only written a few short lines of code we have now constructed an object tree containing 4 nested objects. Are you still keeping up?

Building an Edit

Now that we have gotten through the nitty gritty of constructing individual objects for our tree, we are ready to actually build a full edit and pass it back to the caller. This next function will make use of our factories in order to construct an object containing all dependencies in the right nesting order.

Does anyone else feel like we are putting together a matryoshka doll?

Anyway our next function will also follow the factory pattern we have been using so we get a clean string of function calls all the way up and down our stack which will, hopefully, keep things easy to follow.

    function editFactory (coords, content){
        var start = positionFactory(coords.start.line, coords.start.char);
        var end = positionFactory(coords.end.line, coords.end.char);
        var range = rangeFactory(start, end);
        
        return textEditFactory(range, content);
    }
As you can see, we are assembling all of the pieces and then stacking them together to build our document edit. The fully constructed edit will contain all of the instructions VS Code needs to modify the selected document. This will be useful as we construct our next object to interact with.

Yes, there's more.

The WorkspaceEdit Object

In order to introduce our edit into a document, we need to build a workspace edit to interact with. This workspace edit is, you guessed it, another object. A workspace edit has no required dependencies up front, so we are safe to construct it bare and interact with it later. Here's our factory:

    function workspaceEditFactory() {
        return new vscode.WorkspaceEdit();
    }

This new workspace edit is where we will do our final setup before applying our edits to the document we started with originally. Once we have a workspace edit, we can perform behaviors like set and replace. Here's our last factory of the day, where we actually kick off the process. This brings us full circle, back to the edit application we looked at in the very first example. Let's look at the code.

    function setEditFactory(uri, coords, content) {
        var workspaceEdit = workspaceEditFactory();
        var edit = editFactory(coords, content);

        workspaceEdit.set(uri, [edit]);

        return workspaceEdit;
    }

Now we can see where all of our coordinates, content and that mysterious uri went. Our setEditFactory takes all of our bits and pieces and puts them together into a single edit which we can then apply to our workspace edit object, which is then passed back for the program to operate on.

Summary

Even after having figured this out from the VS Code documentation and implementing it in an extension, this is a lot to keep in your head. The bright side of all this work is, if done correctly, this can be wrapped up in a module and squirreled away to just be used outright. The surface area on this module is really only a single function, setEditFactory. This means, once you have it running correctly, it should be simple to call a single function with the right parts and get back a fully instantiated, usable object which can be applied to the editor.

Hopefully this is useful for someone. Identifying this and putting it all together with clear intent was a challenge with no examples. If there were ever one place I would complain about VS Code it is the documentation. I hope my post helps clear up the obscurities and makes it easier for people to dig into making their own extensions or contributing to an extension someone else has built.

Math for Programmers: Difference of Sets

Feb 10, 2016

Last post we discussed union and intersection of sets. These two functions are common and well used, so they are quite important to understand at a deep level, especially if you work with databases on the regular. It can also be helpful to understand these behaviors if you have lots of sets of simple data which need to be combined in a straightforward way.

Difference Explanation

One last, critical, function which is typically used on two or more sets is the difference operation. The difference of sets can be characterized as A - B where the outcome is the set A with all elements shared with set B removed. We can visualize the difference of sets with the following diagram.

Venn diagram of a difference of sets

Unlike union and intersection, the difference of two sets is not commutative. For union and intersection, A ∩ B and B ∩ A are the same operation, and the same can be said for A ∪ B and B ∪ A; the difference A - B, however, is not the same as B - A. We can create a concrete example as follows.

A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7}

A - B = {1, 2}
B - A = {6, 7}

Clearly these two differences are distinct and different. It is quite helpful to understand how order affects the result of a set difference when trying to apply functionality to it. We can create a function, called difference, and apply it to two sets. The action would behave like the following diagram:

Visualization of a difference function

In order to apply our difference function, we need to use a couple of helper functions we introduced in the last post: unique and buildSetMap. These will be important for isolating unique elements and eliminating or keeping them according to our difference functionality.
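
For anyone who missed that post, rough sketches of those two helpers might look something like this (the originals may differ slightly):

    // Build a map keyed by value so membership checks are cheap
    function buildSetMap(list) {
        return list.reduce(function (map, value) {
            map[value] = true;
            return map;
        }, {});
    }

    // Drop duplicate values while preserving the original values and their order
    function unique(list) {
        var seen = {};

        return list.filter(function (value) {
            var isNew = !seen[value];
            seen[value] = true;
            return isNew;
        });
    }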

Predicates as Sets

We can describe a set by way of a predicate function. Effectively, anything which matches the condition of the predicate can be viewed as part of the set while anything which does not match the predicate is not included. A good example of this kind of set partitioning would be the set of even numbers. We could pick values with a function called isEven, which would give us a set like this: {2, 4, 6, 8, …}.

We can describe a more general purpose predicate which simply tests if a value is contained in a set built by our buildSetMap function. The behavior would be simple: test that any value which is provided exists in our set and does not resolve to undefined. Let’s have a look.

    function isInSet (set, value){
        return typeof set[value] !== 'undefined';
    }

We can, now, easily test if a value is contained in a set. This simplifies our problem significantly. Now all we really care about when taking a difference is whether a given value is not contained in our set; if it is not, it is part of the difference.

Javascript has a function, filter, which is very closely related to both the intersection and difference of sets. If we choose a predicate which keeps all values contained in a set, filter will return the intersection of the set of values which match our predicate and the set of values we are testing. If, on the other hand, we use a predicate which only keeps values which are NOT in the set, we get the difference instead.
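
As a quick illustration of that relationship, using isInSet from above:

    var setB = buildSetMap([3, 4, 5, 6, 7]);

    // Keeping values which are in the set gives an intersection
    [1, 2, 3, 4, 5].filter(isInSet.bind(null, setB)); // [3, 4, 5]

    // Keeping values which are NOT in the set gives the difference
    [1, 2, 3, 4, 5].filter(function (value) { return !isInSet(setB, value); }); // [1, 2]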

We can use the factory pattern to keep our code clean and readable, while building a useful difference predicate function. Let’s have a look at the code that makes this work.

    function diffFactory(set) {
        return function (value) {
            return !isInSet(set, value);
        };
    }

Implementing Difference Function

Now we are ready to take the difference of two sets. We have a difference predicate factory, a unique function and a buildSetMap function. Bringing filter into the mix means the code almost writes itself. Let’s build our difference algorithm.

    function difference(lista, listb) {
        return unique(lista).filter(diffFactory(buildSetMap(listb)));
    }

This difference function will produce the difference of any two lists of primitive values. This means our original example of A - B can be done with a small, easy to read line of code: difference(A, B).
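
Here’s a quick usage example with our original sets:

    var A = [1, 2, 3, 4, 5];
    var B = [3, 4, 5, 6, 7];

    difference(A, B); // [1, 2]
    difference(B, A); // [6, 7]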

Symmetric Difference Explanation

There is another difference operation which has a special name in computer science, the symmetric difference. Mathematically a symmetric difference can be written one of two ways, (A - B) ∪ (B - A) or (A ∪ B) - (A ∩ B). Both of these operations resolve the same way, which can be visualized by the following diagram.

Venn diagram of a symmetric difference of sets

To make this more clear, let’s have a look at a concrete example of a symmetric difference using our original sets A and B.

A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7}

symmetricDifference(A, B) = {1, 2, 6, 7}

What all of this really means is the symmetric difference takes everything from sets A and B and excludes the elements they share. A common example is the group of students taking Math or English, excluding the students who are taking both. We can create a function symmetricDifference which performs this operation and lives between the definition of sets A and B and the resulting set.

Visualization of a symmetric difference function

Symmetric Difference Implementation

To create our implementation, I opted for the difference of the union and intersection of A and B. This means we will first compute the union and intersections of our two sets and then take the difference of the two resulting sets.

    function symmetricDifference(lista, listb) {
        return difference(union(lista, listb), intersect(listb, lista));
    }

Once again we see the code practically writes itself. Since we already have functions to perform each of these operations on their own, all we have to do is simply chain them together in a meaningful way and our output simply emerges with nearly no effort at all.
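
Assuming the union and intersect implementations from the last post return values in the order they are first encountered, usage looks like this:

    var A = [1, 2, 3, 4, 5];
    var B = [3, 4, 5, 6, 7];

    symmetricDifference(A, B); // [1, 2, 6, 7]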

Summary

This closes up the introduction of basic sets and operations including algorithms for acting on lists as sets of data. We discussed how lists and maps relate to sets, what a union and intersection are, as well as how to write linear implementations of union and intersection algorithms. We addressed how sets can be defined conceptually with predicate functions and how they interact with concrete sets of data in our programs. Finally, we looked at the difference and symmetric difference operations as well as functions which perform our difference operations in linear time.

With all of this closed up, we are ready to start performing more complex manipulations of data using higher order functions which we will discuss in upcoming posts. Until then, keep your code clean and use sets to make your programs better!

  • Web Designers Rejoice: There is Still Room

    I’m taking a brief detour and talking about something other than user tolerance and action on your site. I read a couple of articles, which you’ve probably seen yourself, and felt a deep need to say something. Smashing Magazine published Does The Future Of The Internet Have Room For Web Designers? and the rebuttal, I Want To Be A Web Designer When I Grow Up, but something was missing.

  • Anticipating User Action

    Congrats, you’ve made it to the third part of my math-type exploration of anticipated user behavior on the web. Just a refresher, the last couple of posts were about user tolerance and anticipating falloff/satisficing These posts may have been a little dense and really math-heavy, but it’s been worth it, right?

  • Anticipating User Falloff

    As we discussed last week, users have a predictable tolerance for wait times through waiting for page loading and information seeking behaviors. The value you get when you calculate expected user tolerance can be useful by itself, but it would be better if you could actually predict the rough numbers of users who will fall off early and late in the wait/seek process.

  • User Frustration Tolerance on the Web

    I have been working for quite a while to devise a method for assessing web sites and the ability to provide two things. First, I want to assess the ability for a user to perform an action they want to perform. Second I want to assess the ability for the user to complete a business goal while completing their own goals.

  • Google Geocoding with CakePHP

    Google has some pretty neat toys for developers and CakePHP is a pretty friendly framework to quickly build applications on which is well supported. That said, when I went looking for a Google geocoding component, I was a little surprised to discover that nobody had created one to do the hand-shakey business between a CakePHP application and Google.

  • Small Inconveniences Matter

    Last night I was working on integrating oAuth consumers into Noisophile. This is the first time I had done something like this so I was reading all of the material I could to get the best idea for what I was about to do. I came across a blog post about oAuth and one particular way of managing the information passed back from Twitter and the like.

  • Know Thy Customer

    I’ve been tasked with an interesting problem: encourage the Creative department to migrate away from their current project tracking tool and into Jira. For those of you unfamiliar with Jira, it is a bug tracking tool with a bunch of toys and goodies built in to help keep track of everything from hours to subversion check-in number. From a developer’s point of view, there are more neat things than you could shake a stick at. From an outsider’s perspective, it is a big, complicated and confusing system with more secrets and challenges than one could ever imagine.

  • When SEO Goes Bad

    My last post was about finding a healthy balance between client- and server-side technology. My friend sent me a link to an article about SEO and Google’s “reasonable surfer” patent. Though the information regarding Google’s methods for identifying and appropriately assessing useful links on a site was interesting, I am quite concerned about what the SEO crowd was encouraging because of this new revelation.

  • Balance is Everything

    Earlier this year I discussed progressive enhancement, and proposed that a web site should perform the core functions without any frills. Last night I had a discussion with a friend, regarding this very same topic. It came to light that it wasn’t clear where the boundaries should be drawn. Interaction needs to be a blend of server- and client-side technologies.

  • Coding Transparency: Development from Design Comps

    Since I am an engineer first and a designer second in my job, more often than not the designs you see came from someone else’s comp. Being that I am a designer second, it means that I know just enough about design to be dangerous but not enough to be really effective over the long run.

  • Usabilibloat or Websites Gone Wild

    It’s always great when you have the opportunity to build a site from the ground up. You have opportunities to design things right the first time, and set standards in place for future users, designers and developers alike. These are the good times.

  • Thinking in Pieces: Modularity and Problem Solving

    I am big on modularity. There are lots of problems on the web to fix and modularity applies to many of them. A couple of posts ago I talked about content and that it is all built on or made of objects. The benefits from working with objectified content is the ease of updating and the breadth and depth of content that can be added to the site.

  • Almost Pretty: URL Rewriting and Guessability

    Through all of the usability, navigation, design, various user-related laws and a healthy handful of information and hierarchical tricks and skills, something that continues to elude designers and developers is pretty URLs. Mind you, SEO experts would balk at the idea that companies don’t think about using pretty URLs in order to drive search engine placement. There is something else to consider in the meanwhile:

  • Content: It's All About Objects

    When I wrote my first post about object-oriented content, I was thinking in a rather small scope. I said to myself, “I need content I can place where I need it, but I can edit once and update everything at the same time.” The answer seemed painfully clear: I need objects.

  • It's a Fidelity Thing: Stakeholders and Wireframes

    This morning I read a post about wireframes and when they are appropriate. Though I agree, audience is important, it is equally important to hand the correct items to the audience at the right times. This doesn’t mean you shouldn’t create wireframes.

  • Developing for Delivery: Separating UI from Business

    With the advent of Ruby on Rails (RoR or Rails) as well as many of the PHP frameworks available, MVC has become a regular buzzword. Everyone claims they work in an MVC fashion though, much like Agile development, it comes in various flavors and strengths.

  • I Didn't Expect THAT to Happen

    How many times have you been on a website and said those very words? You click on a menu item, expecting to have content appear in much the same way everything else did. Then, BANG you get fifteen new browser windows and a host of chirping, talking and other disastrous actions.

  • Degrading Behavior: Graceful Integration

    There has been a lot of talk about graceful degradation. In the end it can become a lot of lip service. Often people talk a good talk, but when the site hits the web, let’s just say it isn’t too pretty.

  • Website Overhaul 12-Step Program

    Suppose you’ve been tasked with overhauling your company website. This has been the source of dread and panic for creative and engineering teams the world over.

  • Pretend that they're Users

    Working closely with the Creative team, as I do, I have the unique opportunity to consider user experience through the life of the project. More than many engineers, I work directly with the user. Developing wireframes, considering information architecture and user experience development all fall within my purview.

  • User Experience Means Everyone

    I’ve been working on a project for an internal client, which includes linking out to various medical search utilities. One of the sites we are using as a search provider offers pharmacy searches. The site was built on ASP.Net technology, or so I would assume as all the file extensions are ‘aspx.’ I bring this provider up because I was shocked and appalled by their disregard for the users that would be searching.

  • Predictive User Self-Selection

    Some sites, like this one, have a reasonably focused audience. It can become problematic, however, for corporate sites to sort out their users, and lead them to the path of enlightenment. In the worst situations, it may be a little like throwing stones into the dark, hoping to hit a matchstick. In the best, users will wander in and tell you precisely who they are.

  • Mapping the Course: XML Sitemaps

    I just read a short, relatively old blog post by David Naylor regarding why he believes XML sitemaps are bad. People involved with SEO probably know and recognize the name. I know I did. I have to disagree with his premise, but agree with his argument.

  • The Browser Clipping Point

    Today, at the time of this writing, Google posted a blog stating they were dropping support for old browsers. They stated:

  • Creativity Kills

    People are creative. It’s a fact of the state of humanity. People want to make things. It’s built into the human condition. But there is a difference between haphazard creation and focused, goal-oriented development.

  • Reactionary Navigation: The Sins of the Broad and Shallow

    When given a task of making search terms and frequently visited pages more accessible to users, the uninitiated fire and fall back. They leave in their wake broad, shallow sites with menus and navigation which look more like weeds than an organized system. Ultimately, these navigation schemes fail to do the one thing they were intended for, enhance findability.

  • OOC: Object Oriented Content

    Most content on the web is managed at the page level. Though I cannot say that all systems behave in one specific way, I do know that each system I’ve used behaves precisely like this. Content management systems assume that every new piece of content which is created is going to, ultimately, have a page that is dedicated to that piece of content. Ultimately all content is going to live autonomously on a page. Content, much like web pages, is not an island.

  • Party in the Front, Business in the Back

    Nothing like a nod to the reverse mullet to start a post out right. As I started making notes on a post about findability, something occurred to me. Though it should seem obvious, truly separating presentation from business logic is key in ensuring usability and ease of maintenance. Several benefits can be gained with the separation of business and presentation logic including wiring for a strong site architecture, solid, clear HTML with minimal outside code interfering and the ability to integrate a smart, smooth user experience without concern of breaking the business logic that drives it.

  • The Selection Correction

    User self selection is a mess. Let’s get that out in the open first and foremost. As soon as you ask the user questions about themselves directly, your plan has failed. User self selection, at best, is a mess of splash pages and strange buttons. The web has become a smarter place where designers and developers should be able to glean the information they need about the user without asking the user directly.

  • Ah, Simplicity

    Every time I wander the web I seem to find it more complicated than the last time I left it.  Considering this happens on a daily basis, the complexity appears to be growing monotonically.  It has been shown again and again that the attention span of people on the web is extremely short.  A good example of this is a post on Reputation Defender about the click-through rate on their search results.

  • It's Called SEO and You Should Try Some

    It’s been a while since I last posted, but this bears note. Search engine optimization, commonly called SEO, is all about getting search engines to notice you and people to come to your site. The important thing about good SEO is that it will do more than simply get eyes on your site, but it will get the RIGHT eyes on your site. People typically misunderstand the value of optimizing their site or they think that it will radically alter the layout, message or other core elements they hold dear.

  • Information and the state of the web

    I only post here occasionally and it has crossed my mind that I might almost be wise to just create a separate blog on my web server.  I have these thoughts and then I realize that I don’t have time to muck with that when I have good blog content to post, or perhaps it is simply laziness.  Either way, I only post when something strikes me.

  • Browser Wars

    It’s been a while since I have posted. I know. For those of you that are checking out this blog for the first time, welcome. For those of you who have read my posts before, welcome back. We’re not here to talk about the regularity (or lack thereof) that I post with. What we are here to talk about is supporting or not supporting browsers. So first, what inspired me to write this? Well… this:

  • Web Scripting and you

    If there is one thing that I feel can be best learned from programming for the internet it’s modularity.  Programmers preach modularity through encapsulation and design models but ultimately sometimes it’s really easy to just throw in a hacky fix and be done with the whole mess.  Welcome to the “I need this fix last week” school of code updating.  Honestly, that kind of thing happens to the best of us.

  • Occam's Razor

    I have a particular project that I work on every so often. It’s actually kind of a meta-project as I have to maintain a web-based project queue and management system, so it is a project for the sake of projects. Spiffy eh? Anyway, I haven’t had this thing break in a while which either means that I did such a nice, robust job of coding the darn thing that it is unbreakable (sure it is) or more likely, nobody has pushed this thing to the breaking point. Given enough time and enough monkeys. All of that aside, every so often, my boss comes up with new things that she would like the system to do, and I have to build them in. Fortunately, I built it in such a way that most everything just kind of “plugs in” not so much that I have an API and whatnot, but rather, I can simply build out a module and then just run an include and use it. Neat, isn’t it?

  • Inflexible XML data structures

    Happy new year! Going into the start of the new year, I have a project that has carried over from the moment I started my current job. I am working on the information architecture and interaction design of a web-based insurance tool. Something that I have run into recently is a document structure that was developed using XML containers. This, in and of itself, is not an issue. XML is a wonderful tool for dividing information up in a useful way. The problem lies in how the system is implemented. This, my friends, is where I ran into trouble with a particular detail in this project. Call it the proverbial bump in the road.

  • Accessibility and graceful degradation

    Something that I have learnt over time is how to make your site accessible for people that don’t have your perfect 20/20 vision, are working from a limited environment or just generally have old browsing capabilities. Believe it or not, people that visit my web sites still use old computers with old copies of Windows. Personally, I have made the Linux switch everywhere I can. That being said, I spend a certain amount of time surfing the web using Lynx. This is not due to the fact that I don’t have a GUI in Linux. I do. And I use firefox for my usual needs, but Lynx has a certain special place in my heart. It is in a class of browser that sees the web in much the same way that a screen reader does. For example, all of those really neat iframes that you use for dynamic content? Yeah, those come up as “iframe.” Totally unreadable. Totally unreachable. Iframe is an example of web technology that is web-inaccessible. Translate this as bad news.

  • Less is less, more is more. You do the math.

    By this I don’t mean that you should fill every pixel on the screen with text, information and blinking, distracting graphics. What I really mean is that you should give yourself more time to accomplish what you are looking to do on the web. Sure, your reaction to this is going to be “duh, of course you should spend time thinking about what you are going to do online. All good jobs take time.” I say, oh young one, are you actually spending time where it needs to be spent? I suspect you aren’t.

  • Note to self, scope is important.

    Being that this was an issue just last evening, I thought I would share something that I have encountered when writing Javascript scripts.  First of all, let me state that Javascript syntax is extremely forgiving.  You can do all kinds of  unorthodox declarations of variables as well as use variables in all kinds of strange ways.  You can take a variable, store a string in it, then a number, then an object and then back again.  Weakly typed would be the gaming phrase.  The one thing that I would like to note, as it was my big issue last evening, is scope of your variables.  So long as you are careful about defining the scope of any given variable then you are ok, if not, you could have a problem just like I did.  So, let’s start with scope and how it works.
