17 November 2009

FOODOPI - coming soon to a nation near you!

IP news from France. My translation probably isn't perfect. There are other things to worry about, too.

Watching the evilly smug faces of recording industry executives following France's recent adoption of Hadopi, restaurant owners have decided they deserved a slice of the IP pie, too. Watch out for the new law to be introduced later this year: FOODOPI!

Restaurant managers, owners, and chefs who have dedicated years of their talent, skill, and secret sauce to creating irresistible mouthwatering dishes have watched in dismay as food pirates illegally copy their ideas, recreating such classics as Ratatouille, Chicken Curry with Rice, and Spaghetti Bolognese with impunity in their own homes.

This is all going to change and the impoverished actors of the restauration industry will finally see their hard work rewarded and protected. The new FOODOPI law, if adopted, will allow restaurant industry executives to name individuals suspected of recipe pirating and, after three warnings, those individuals will be prohibited from cooking for a length of time varying from six months to ten years.

Although the lack of judicial review during the prohibition process has raised concerns among civil liberties groups, a spokesperson for a French restaurateurs' association observed "the justice system is already overstretched and it makes no sense to burden it even further. This law is a huge win for the public, ensuring continued innovation in the food service industry. It would be a disservice to the public and a drain on limited taxpayer resources to push this through the courts."

A spokesperson for a US-based organisation conducting cutting-edge research on genetic improvement of popular crops said while they would vigorously defend their IP in France under this new law, "currently we have no intention of prosecuting the most widespread violation of our intellectual property: the use of sodium chloride as a food additive for flavour enhancement". [A lawyer friend has advised me that this patent may not be applicable in France anyway as the use of "table salt" (to use pirate jargon) is a popular custom in this country, dating back centuries - ed.]

While the details of the new system have yet to be worked out, a leaked document obtained by this site indicates some of the strategies being considered -

  • Government-mandated cooking equipment for all new domestic kitchen installations with remote sensing equipment for ambient atmospheric analysis, allowing FOODOPI investigators to detect potential violations of food industry IP by comparing chemicals and food traces in the air with a database of protected recipes.
  • Food retailers will report purchases to a central database that will apply sophisticated pattern-matching algorithms to identify individuals who may be planning IP violations. As an example, the document describes a hypothetical shopper in the process of acquiring 500g basmati rice, 8 chicken thighs, unflavoured yoghurt, turmeric, ginger, and garlic. The proposed pattern-matching software would flag this shopper as a potential pirate about to prepare Chicken Curry with Rice for 4.
  • Repeat offenders would ultimately have all kitchen equipment confiscated and perhaps have a camera installed in their homes to deter future violations.

The document also noted some concerns of IP holders, including the threat of violations by picnickers and people using obsolete or camping equipment, where monitoring systems are less feasible.

Put that in your pipe and smoke it.

30 September 2009

mass file rename using mv and sed

I wanted to re-organise my config/locales directory because translation files for the same stuff were too far from each other. I had

config/locales
  en
    foo.yml
    bar.yml
  fr
    foo.yml
    bar.yml

It was too much of a pain to switch quickly between en/foo.yml and fr/foo.yml. So now I have

config/locales
  foo.en.yml
  foo.fr.yml
  bar.en.yml
  bar.fr.yml

except it's not four files, it's dozens. So I needed a mass-rename shell script to rename many files at once:

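#!/bin/sh
# usage: mvsed SED_EXPRESSION FILE...
PATTERN=$1
shift
for file in "$@"   # "$@" keeps file names containing spaces intact
do
  mv "$file" "$(echo "$file" | sed "$PATTERN")"
done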

It combines mv and sed to rename files based on patterns in their names. Save it as mvsed somewhere on your PATH, make it executable, and invoke thus:

cd config/locales/en
mvsed s/yml/en.yml/g *.yml

Repeat for each language, and then just move all the yml files up to the parent directory, and you're done. You can use this for lots of renaming operations, and as you will have noticed, you don't need to be a sed guru to figure out basic patterns. This script works with bash on my macos 10.5; I imagine it will work with little alteration on the sensible non-MSDOS shell you're using.
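Before letting a pattern loose on dozens of files, it can help to preview what it would do. Here is a minimal sketch of such a dry run, in JavaScript rather than shell; planRenames is a hypothetical helper, not part of the script above, and it assumes a simple substitution that means the same thing in a JavaScript regular expression as in sed.

```javascript
// Hypothetical helper: list the renames a sed-style substitution would perform,
// without touching the file system.
function planRenames(fileNames, pattern, replacement) {
  return fileNames
    .map(function (name) {
      return { from: name, to: name.replace(pattern, replacement) };
    })
    .filter(function (pair) {
      return pair.from !== pair.to; // skip names the pattern leaves unchanged
    });
}

// Rough equivalent of: mvsed s/yml/en.yml/g foo.yml bar.yml README
var plan = planRenames(['foo.yml', 'bar.yml', 'README'], /yml/g, 'en.yml');
plan.forEach(function (pair) {
  console.log(pair.from + ' -> ' + pair.to);
});
```

If the printed list looks right, run the real thing.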

Acknowledgements to linuxforums.org / dnielsen78 for the inspiration for this script, and to everyone else who wrote this first.

08 July 2009

Don't misunderestimate your children

I was reading a Wallace and Gromit story to my (now 5-year-old) son, and he energetically pointed out

son: Look, a grandfather clock!
me: Yes, that's a grandfather clock. It's about to fall on top of Wallace!
Wallace: That's just the thing, my lad, we'll have to turn back the clock ...
son: What's "turn back the clock" ?
me: Well, suppose something happened yesterday that you weren't happy about, so you decide to <blah blah long-winded explanation of expression with digression into the role of metaphor in language>
son: Oh, you mean he builds a time machine?

Duh. I should have realised time travel is a standard part of every 5-year-old's vocabulary.

28 March 2009

Chart in Javascript

There aren't enough bar-chart-drawing implementations out there yet, so this article will present the best one yet. It looks like this:

[image: a chart made with javascript]

Or several little charts in a row, like this:

[image: several bar charts in a row]

Features:

  • Really easy to use
  • mouse move highlights bar under mouse
  • pops up tooltip with information for the bar under the mouse
  • All colours are configurable
  • Scales data automatically to height of canvas
  • Calculates thickness of bars automatically so all fit in the width of the canvas
  • Redraw data as often as you want - for example with data from an ajax call

The tooltip is missing from this screenshot. My mac hides the tooltip before it takes the screenshot unfortunately.

Anyway, I'd like to convince you that it's really, really easy to use. Supply your data in JSON format like this:

var data = [
[ "week 1", "1000" ],
[ "week 2", "2000" ],
[ "week 3", "3000" ]
];

Declare a canvas element:

<canvas id="weekly_chart" width="200" height="80"></canvas>

Then just call the Chart(canvas_element, bar_colour, bg_colour, hilite_colour) constructor:

var chart = new Chart($('weekly_chart'), "rgb(128,128,255)", "rgb(0,0,0)", "rgb(255,255,128)");

Then, draw your data

chart.draw(data);

It couldn't be simpler!

Here's the code. Just include it somewhere in your page. It's free under a Creative Commons Attribution Share-Alike 3.0 license - you can use, re-use, modify, redistribute, as long as you link back to this page and (re)distributions carry the same or a compatible license. It has a slight dependency on Prototype (2 calls) but you jQuery people will fix that quickly (in fact, you might even leave a comment with the jQuery alternative). Theoretically, it will work on Internet Exploder using Google's excanvas, although I haven't tested this configuration.

(function() {
  function max(data) {
    var max = 0;
    for (var i = 0; i < data.length; i++) {
      var value = Number(data[i][1]); // values may arrive as strings, so compare numerically
      if (value > max) {
        max = value;
      }
    }
    return max;
  }

  function coords(event, element, f) {
    var offset = $(element).cumulativeOffset();   // $ from prototype
    var p = Event.pointer(event || window.event); // Event.pointer from prototype
    var y = p.y - offset.top;
    var x = p.x - offset.left;
    f(y, x);
  }

  window.Chart = function(canvas, fg, bg, hilite) {
    if (!canvas || !canvas.getContext) {
      return;
    }

    var cx = canvas.getContext('2d');

    this.draw = function(data) {
      cx.clearRect(0, 0, canvas.width, canvas.height);
      var thick = canvas.width / data.length;
      var scale = canvas.height / max(data);

      function highlightBars(index) {
        cx.lineWidth = 1;

        for (var i = 0; i < data.length; i++) {
          if (i == index || i == (index + 1)) {
            cx.strokeStyle = hilite;
          } else {
            cx.strokeStyle = bg;
          }
          cx.beginPath();
          cx.moveTo((i * thick) + 0.5, 0);
          cx.lineTo((i * thick) + 0.5, canvas.height);
          cx.stroke();
        }
      }

      cx.fillStyle = bg;
      cx.fillRect(0, 0, canvas.width, canvas.height);

      cx.fillStyle = fg;

      for (var i = 0; i < data.length; i++) {
        var h = data[i][1] * scale;
        if (!isNaN(h)) {
          cx.fillRect(i * thick, canvas.height - h, thick, h);
        }
      }

      highlightBars(-2);


      canvas.onmouseout = function(event) {
        highlightBars(-2);
      };

      canvas.onmousemove = function(event) {
        coords(event, canvas, function(y, x) {
          var index = Math.floor((x - 1) / thick); // Math.floor avoids relying on Prototype's Number#floor
          if (index < 0) {
            index = 0;
          }
          highlightBars(index);

          var bar = data[index];
          if (bar) {
            canvas.title = bar[0] + " : " + bar[1];
          }
        });
      };
    };
  };
})();

This is the simple version. Feel free to comment with improvements, I'll keep the code up to date with my favourite suggestions.
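As one possible improvement, here is a sketch of a coords helper that drops the Prototype dependency altogether. It assumes a browser that supports getBoundingClientRect and passes the event object to the handler, so unlike the original it has no window.event fallback; treat it as a starting point rather than a tested replacement.

```javascript
// Sketch of a Prototype-free coords(): reports the mouse position relative
// to the top-left corner of the element.
function coords(event, element, f) {
  var rect = element.getBoundingClientRect(); // element position relative to the viewport
  var x = event.clientX - rect.left;          // clientX/clientY are viewport-relative too
  var y = event.clientY - rect.top;
  f(y, x);                                    // same (y, x) argument order as the original
}
```

Swap it in for the coords function inside the closure; the rest of the chart code calls it the same way.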

26 March 2009

Spidering Internal Pages

A url within an anchor tag in html may be absolute or relative. Absolute links look like <a href='http://iconfu.com'>iconfu ...</a> - they start with a protocol. Relative links look like <a href='bar.html'>bar info ...</a>. When you link to bar.html from http://example.com/pages/foo.html, the browser constructs the full reference and requests http://example.com/pages/bar.html.

So far, so good.

Relative links may also be of the form <a href='?browse=arrow'>arrow icons</a>. A browser requesting this link from http://iconfu.com/tags/list/0.html will construct this url: http://iconfu.com/tags/list/0.html?browse=arrow

This can be convenient when the code or script that handles the requested page is separate from the code or script that handles the request parameters. This doesn't happen often, but when it does, it's useful to be able to construct the url without needing to know the originating page. For example, a login handler might be implemented as a filter before the page is rendered, so the login request would simply be ?username=foo&password=bar ... this gets expanded by the browser into http://example.com/pages/foo.html?username=foo&password=bar. On the server, your login filter handles the login parameters, and your example/page script handles the rest of the url.

The bad news is that some spidering implementations handle this incorrectly (google's works fine). Instead of requesting http://iconfu.com/tags/list/0.html?browse=arrow from the earlier example, they request http://iconfu.com/tags/list?browse=arrow - they chop out "0.html". My code doesn't like this, and returns an error. Dumb MF spider implementations.
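For reference, the resolution a well-behaved spider should perform can be sketched with the standard URL class available in modern browsers and Node. resolveLink is just an illustrative name; the URLs are the ones from the examples above.

```javascript
// Resolve a relative href against the URL of the page it appears on.
function resolveLink(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

console.log(resolveLink('bar.html', 'http://example.com/pages/foo.html'));
// prints "http://example.com/pages/bar.html"

console.log(resolveLink('?browse=arrow', 'http://iconfu.com/tags/list/0.html'));
// prints "http://iconfu.com/tags/list/0.html?browse=arrow"
```

A query-only reference replaces the query string and keeps the whole path, "0.html" included.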

So that was that. Well, here's another bit of news: about 95% of visitors who come to iconfu through search, come from google. There are two ways to explain this: (1) google is the world's dominant search engine, who uses yahoo/live/ask.com anyway; (2) the clever people behind google analytics use some clever reporting techniques to show that google is the world's dominant search engine so why bother with the others.

We can eliminate (2) because as you know googlers Do No Evil. But today, in a flash of insight, I realised (3) perhaps those other search engines are sending me no visitors because they think my site is full of bugs and holes and 500 Internal Server Errors.

I'll fix that today and I'll let you know if I get a little more love from those unloved search engines. And then you can add "be careful with relative urls containing only a query string" to your SEO toolkit.

Open Coffee Club Paris: For Sale

Open Coffee Club, Paris, 26 March 2009

A dude is making a speech about legal issues for startups. I don't like this. I come to OCC for peer-to-peer networking, not to get lectured at. This is hijacking an open, social event to allow one person to dominate and control the discussion. Not only that, but my conversation was interrupted! This wastes my time, because I am deprived of the ability to choose the people I want to converse with. I expect to meet people either because they are interesting or their service is useful to me. I have no respect for a dude who has effectively bought* OCC as a platform to market his services, and I have no respect for an OCC willing to sell itself in this way.

Some people are politely paying attention. Others look bored and are wondering when this abuse will be over. And I have nobody to talk to :((

* "bought" in the moral sense. I have no idea how the dude in question obtained authority to hijack the group.

20 March 2009

Scaling Testing

Check out TestSwarm by John Resig - it's like SETI@Home for distributed testing ... isn't that a totally awesome concept? If you're having difficulty scaling your tests, especially if browser and OS combinations are wearing out your head, this is worth a look ...

Updating the security budget

You might think in your cosy world of internal corporate web applications you don't need to worry about XSS and CSRF attacks (cross-site-scripting and cross-site request forgery) - after all, these are worries for public-facing web sites, not for us, surely?

Wrong!

Suppose your disgruntled employee leaves the project, and "in these times of crisis, you know", has nothing to do so gets scripting. Your disgruntled ex-employee, who previously worked on a precious sensitive internal system (aren't they all?), knows exactly which URLs will trigger a money transfer if accessed by persons with the right privileges, assuming they are logged in at the time.

Suppose that person is you - you're the manager of Whatsit and Whatnot after all, you have clearance for pretty much everything, and we used to do lunch together, so you'd happily open a link I sent you because it's sure to be interesting, entertaining, funny, informative, edifying ... you know, the kind of links I send to people.

But all I need to do is include this bit of code, which will be completely invisible to you:

<form id="maliciousForm" action="http://internal.bank.example.com/transferMoney" style="display:none;">
  <input type="hidden" name="from" value="myOldBossesAccount"/>
  <input type="hidden" name="to" value="myPersonalAccount"/>
</form>

<script>document.forms.maliciousForm.submit();</script>

If you happen to be logged in to http://internal.bank.example.com at the time (perhaps in another tab or window of your browser), that's all it takes to do the damage. To cover my tracks I'd need to be a lot smarter and more subtle, but the basic attack is trivial. And, more likely than not, your application isn't protected.

In fact, you don't even know what scripts are embedded in this page ... do you dare "view source" and check? What malicious clandestine script has executed while you were reading this paragraph?

Time to run to the program manager and get a security budget extension! While you're at it, please don't shoot the messenger: smarter and meaner people than I (yes, they exist) have already thought of this. And even if your disgruntled ex-employee isn't that smart or mean, he might well be willing to sell his knowledge to someone who is.

For more information, see Adam Barth, Collin Jackson, and John C. Mitchell, Robust Defenses for Cross-Site Request Forgery (pdf). The most reliable defence involves ensuring all potentially harmful requests are made via POST, not GET; and then requiring that any POST request includes a secret token in its body (possibly via a hidden input); the server validates the secret token and allows the request to proceed only if the token is valid. Ruby on Rails provides framework methods for simplifying this (seriously: you call the protect_from_forgery method).
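The token check described above can be sketched in a few lines. This is a minimal illustration for Node, not the Rails implementation: issueToken and isRequestAllowed are hypothetical names, and storing the token in the session and reading it back from the request body are left to whatever framework you use.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: create a random token to keep in the user's session
// and to embed in each form as a hidden input.
function issueToken() {
  return crypto.randomBytes(32).toString('hex');
}

// Hypothetical helper: allow a state-changing request only if it is a POST
// carrying the same token that was stored in the session.
function isRequestAllowed(method, sessionToken, submittedToken) {
  if (method !== 'POST') return false;
  if (typeof sessionToken !== 'string' || typeof submittedToken !== 'string') return false;
  const expected = Buffer.from(sessionToken);
  const actual = Buffer.from(submittedToken);
  // timingSafeEqual throws when lengths differ, so compare lengths first
  return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}
```

The hidden form in the attack above cannot know the token, so its request is refused even though the victim's session cookie is sent along with it.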

Asking your corporate users to log out when they're not using your precious sensitive application is like asking dogs to stop wagging their tails. It's not going to happen. If security isn't your problem, then it's a problem.

This post was inspired by the "Web application security horror stories" talk at FOWA Dublin 2009 by Simon Willison

17 March 2009

Jamendo & Magnatune

Thanks to Laurent I found Jamendo, a distributor of Creative-Commons-licensed music, and from there Magnatune, with similar goals. Magnatune artists get 50% of proceeds from sales of their work. Here's a Bach cantata from Magnatune:

JS Bach Cantatas - Volume IV - Early Cantatas for Holy Week by American Bach Soloists

The quality of the music is good, but there's an annoying voice between each track that tells you the album and artist name, which you knew already. I hope the paid version excludes this.

I'm going to explore these sites some more. It might be time to bid farewell to the iTunes Store, and all those Big Record Labels and their fascist anti-piracy methods with it.

10 March 2009

Ban Censorship Now!

An Irish ISP, eircom, has agreed to ban any website the IRMA (Irish Recorded Music Association) chooses to block.

I had thought censorship was one of those nasty deals you get in China and Iran, but now it's thriving in Ireland. Please support http://blackoutireland.com/ even if you're not Irish. I hate to fearmonger, but this is coming to an ISP near you, wherever you are, in the near future if it's not stopped now.

Personally, I'm not interested in illegal music (my musical preferences aren't the kind that go with P2P), but it is not acceptable for a private corporation to decide what sites I may or may not use. Now I just feel all icky and horrible when I enter a music store. All those shiny CDs are whispering "censorship, censorship" at me.

Go and prosecute the criminals, leave the rest of us alone!

03 March 2009

Blogger: break "convert line breaks"

Blogger's "convert line breaks" setting seems to cause a lot of pain, and what's more, it doesn't even seem to work. I get gratuitous <br/> in my code despite having set this setting to OFF. The issue and some workarounds are discussed on Rob on Programming, The Real Blogger Status, and MLA Wire.

The trouble is, if you've chosen to edit in "Edit Html" mode, it's reasonable to suppose that you know what you're doing, and you want the html code of your post to be exactly what you write. In other words, I can take care of my own <p> and <br> tags.

Blogger doesn't think so. And here's my revenge - I added this to the "style" section of my template:

  br { display: none; }
  br.forReal { display:inline; }

This way, I never have to think about Blogger's thoughtful, kind, but misguided insertion of line breaks again. And when I want a line break for real, which isn't very often (I'm more of a <p>...</p> person), I just

  <br class='forReal'/>

So I can make a table thusly*:

<table style="width:auto;" cellpadding="1" cellspacing="1" border="1">
  <tr>
    <td>foo</td>
    <td>bar</td>
  </tr>
  <tr>
    <td>toto</td>
    <td>titi</td>
  </tr>
</table>

Which looks like

foo bar
toto titi

Slight problem: it has probably gone and damaged all my old posts. What a pain!

* I know, "thusly" isn't a word. I don't care.

23 February 2009

Exceptions In Java [Must Read]

Exceptions were a big improvement over the old-fashioned C way of returning error codes, but then the debate raged over Checked vs Unchecked Exceptions, vs "Huh? Checked? Whazza?". Java offers all three varieties so you can never hope to port exceptional experience from one project to the next, because there are at least as many correct ways to do things as there are gurus. However, at least now there's a step in the right direction; with Björn Andersson's Java Exception Explanator a lot more of this makes sense. Check it out. Here's a teaser:

AccessControlException : You have lost control of Microsoft Access. If you cannot regain control or stop the program in some other way, you should cut the power to your computer as fast as possible.

ps. thanks reddit

Don't Forget To Flush

So, instead of being asleep like an honest citizen at 2 o'clock in the morning, here I am trying to make hibernate save my collection of Strings.

(I would hate not to be a developer: who else deals with Collections of Strings, I wonder?)

The mapping is perfect, straight from the documentation, no tricky stuff, no funny cases:

    <set name="categories" table="resources_categories">
      <key column="resource"/>
      <element column="category" type="string"/>
    </set>

Except, when I populate my category list, and try getSession().save(newResource) ... it doesn't save my Strings!

After a little more googling than I would have hoped, I bumped into this - http://forum.hibernate.org/viewtopic.php?t=951848&blah blah... where the hibernate team says you have to flush in this case. You don't have to flush to save most other things, but if it's a Collection of Strings, you have to flush.

So this works:

    getSession().save(newResource);
    getSession().flush();

- no changes to the mapping required.

22 February 2009

Another Smug Mac User

Today I got even smugger, if you can believe that. To cut a long, long story short, Sabrina, my beloved, a victim user of Windows Vista, was getting "Local Access Only" while connected to our trusty router that has always worked before, in our heterogeneous network of Mac, Ubuntu, and Various Windowses. The solution, in the end, was simple: "ipconfig /release" followed by "ipconfig /renew"*, but the path was labyrinthine. Innumerable fora record the despair and frustration of countless vista users facing the dreaded "Local Access Only" message.

At the risk of repeating myself: the Mac is the first computer I've had that Mostly Just Works. The biggest annoyance with this thing is the lack of a delete key, but there is a partial solution: KeyRemap4MacBook.

But Vista? How can they break DHCP - something that's been working for over twelve years? How do people who are not technical deal with this?

(*) In case you came to this post in despair and frustration: here's how to run ipconfig

  1. Hit Start -> Run
  2. Enter "cmd" and hit return. A "command prompt" window will appear
  3. type "ipconfig /release" and hit return - this should get rid of any out-of-date connection state
  4. type "ipconfig /renew" and hit return - this gets you new connection state
  5. close the "cmd" window
Theoretically, and with a bit of luck, and assuming nothing else is wrong, and possibly after a slight delay, you'll have your internet again.

31 January 2009

Homer vs The Bible

Many years ago I engaged in a weekly debate with some Christians on the Lewis Trilemma - the liar/lunatic/son of god question. Both sides fought nobly and I came out at the end of it with my atheism reinforced. The Christians (who were all charming people btw) probably came out with their religiosity reinforced, so it was pretty zero-sum in the end.

Many, many arguments were advanced by the Christian side to support their beliefs, but by the end of the year their reasoning boiled entirely down to this: When I read the Bible, I hear the Voice of God speaking to me.

That's all. The final, irreducible argument. It doesn't leave much hope for people, like me, with a little god-deafness problem. It also doesn't leave much hope for them, if they happen to hear the Voice of God saying different things to each of them, as appears to happen a little too often. The rascal!

Anyway, along the way we had some entertaining digressions. For example, if you consider ancient Greek texts, Homer's Iliad, or the New Testament, you will notice that there are only a few extant ancient copies of Homer, with a large number and variety of errors. The New Testament, on the other hand, has a larger number of extant ancient copies, with far fewer transcription errors. This, apparently, demonstrates the accuracy and reliability of the New Testament. See http://www.clemson.edu/spurgeon/books/apology/Chapter4.html for an example of this kind of reasoning.

Homer was around a long time before the NT, and his/her/their works were primarily transmitted orally. The surviving documents are older, so it is unsurprising that there are fewer of them. Most of all, part of the fun of reciting a good story is embellishing it, and the early transcribers of Homer doubtless succumbed to this desire. The transcribers of the Good News however had entirely different motives and risked hell were their labours imperfect.

In other words, both Homer and the Bible come with a Creative-Commons license - but the Bible invokes the "No Derivative Works" clause. It doesn't make the Bible right any more than a commercial license makes a software product superior.

26 January 2009

How to draw an ellipse

Pretty much every drawing tool and library comes with an ellipse-drawing function, so you never even need to think about how it's done. Until, one day, you're the one writing the drawing tool.

Formulae and sample code abound for calculating the outline of an ellipse. The problem comes when you want a nice-looking ellipse: the edge needs to be smooth, and this means half-colouring some of the pixels at the edge.

The first thing is to remember that a pixel doesn't represent a point, it represents a square. So there are two things to calculate: for any given ideal euclidean point, whether that point is inside the ellipse; and for any given square, its "insideness", ie how much of that square is covered by the ellipse. So a pixel is drawn 100% opaque if it is completely inside the ellipse, and 100% transparent if it is completely outside. Otherwise, it is drawn with a transparency proportional to its insideness.

So, if all four corners of a pixel are "inside" the ellipse, consider the pixel 100% covered by the ellipse. If none of the four corners is "inside", consider the pixel 0% covered. Otherwise, subdivide the pixel into four sub-pixels, and calculate the insideness of each sub-pixel, recursively. The percent coverage for any given pixel is the average coverage for all its sub-pixels. Once the sub-pixel size falls below a certain threshold, don't recurse; just use the average of the values for the four corner points.

To calculate whether a single point is inside the ellipse, use this function:


/**
 * determines whether a point p,q is inside an ellipse
 * specified by (x,y,a,b)
 *
 * returns 1 if inside, 0 if outside
 *
 * @param x x-coordinate of centre of ellipse
 * @param y y-coordinate of centre of ellipse
 * @param a horizontal radius
 * @param b vertical radius
 * @param p x-coordinate of point to consider
 * @param q y-coordinate of point to consider
 */
var insideEllipse = function(x, y, a, b, p, q) {
  var dx = (p - x) / a;
  var dy = (q - y) / b;
  var distance = dx * dx + dy * dy;
  return (distance < 1.0) ? 1 : 0;
};

To calculate the insideness of a square area, use this:


/**
 * determines what proportion of a square (p,q) - (p+side, q+side) is covered by this
 * ellipse. If side < threshold, returns an approximate result.
 *
 * returns: a value in the range [0.0, 1.0]
 *
 * @param p x-coordinate of point to consider
 * @param q y-coordinate of point to consider
 * @param side the length of the edge of the square to sample
 * @param threshold do not recurse if side is less than this value
 */
var insideness = function(x, y, a, b, p, q, side, threshold) {
  var i1 = insideEllipse(x, y, a, b, p, q);
  var i2 = insideEllipse(x, y, a, b, p + side, q);
  var i3 = insideEllipse(x, y, a, b, p + side, q + side);
  var i4 = insideEllipse(x, y, a, b, p, q + side);
  var total = i1 + i2 + i3 + i4;
  if (total == 4 || total == 0 || side < threshold) {
    return total / 4.0;
  }

  side = side / 2;
  var j1 = insideness(x, y, a, b, p, q, side, threshold);
  var j2 = insideness(x, y, a, b, p + side, q, side, threshold);
  var j3 = insideness(x, y, a, b, p + side, q + side, side, threshold);
  var j4 = insideness(x, y, a, b, p, q + side, side, threshold);
  return (j1 + j2 + j3 + j4) / 4.0;
};
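To show how the two functions fit together, here is a small sketch that computes the coverage of every pixel in a grid. coverageGrid is a hypothetical helper, the 20 x 20 grid and the ellipse parameters are arbitrary, and the two functions are repeated so the sketch runs on its own. As a rough sanity check, the summed coverage should come out close to the true area of the ellipse, pi * a * b.

```javascript
// Copies of the two functions from this post.
function insideEllipse(x, y, a, b, p, q) {
  var dx = (p - x) / a;
  var dy = (q - y) / b;
  return (dx * dx + dy * dy < 1.0) ? 1 : 0;
}

function insideness(x, y, a, b, p, q, side, threshold) {
  var total = insideEllipse(x, y, a, b, p, q) +
              insideEllipse(x, y, a, b, p + side, q) +
              insideEllipse(x, y, a, b, p + side, q + side) +
              insideEllipse(x, y, a, b, p, q + side);
  if (total == 4 || total == 0 || side < threshold) {
    return total / 4.0;
  }
  side = side / 2;
  return (insideness(x, y, a, b, p, q, side, threshold) +
          insideness(x, y, a, b, p + side, q, side, threshold) +
          insideness(x, y, a, b, p + side, q + side, side, threshold) +
          insideness(x, y, a, b, p, q + side, side, threshold)) / 4.0;
}

// Hypothetical helper: coverage (0.0 to 1.0) for each 1x1 pixel of a grid.
function coverageGrid(width, height, x, y, a, b, threshold) {
  var rows = [];
  for (var q = 0; q < height; q++) {
    var row = [];
    for (var p = 0; p < width; p++) {
      row.push(insideness(x, y, a, b, p, q, 1, threshold));
    }
    rows.push(row);
  }
  return rows;
}

// Ellipse centred at (10,10) with radii 6 and 4.
var grid = coverageGrid(20, 20, 10, 10, 6, 4, 0.1);
var covered = 0;
grid.forEach(function (row) {
  row.forEach(function (value) { covered += value; });
});
console.log('summed coverage: ' + covered.toFixed(2));
console.log('pi * a * b:      ' + (Math.PI * 6 * 4).toFixed(2));
```

When drawing, each coverage value would become the opacity of the corresponding pixel.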

Enjoy.

20 January 2009

Parisharing

If you live in Paris, or would like to visit Paris, you might be interested in Parisharing, a radical new authentic tourism concept. It's still new, and you can help by taking the relevant survey:

From the site:

30 million people dream of visiting Paris, if only for a few days. 30 thousand who live in Paris dream of traveling elsewhere, if only for a few days.

PariSharing is the meeting of those dreams.

If you hope to visit Paris, this will be your chance to rent a real Parisian apartment at an unbeatable price while enjoying an authentic Parisian lifestyle.

If you live in Paris, this is your chance to increase your earnings and offer yourself a nicer or longer vacation.

There you have it - Parisharing will offer visitors an authentic and affordable experience of Paris, and at the same time give residents of Paris an additional incentive to leave the city for vacation. Is that a win-win situation or what?

How Do You Find Me?

When you hit google's cache of a page it conveniently highlights all of your search terms. Some web sites have this neat trick of highlighting your search terms when you've come from a search, even though you're not in google any more. The "referer" (sic) http header gives them the necessary information. Thanks to the amazing technology presented in this article, you, too, can do the same thing and dazzle your visitors.

On top of that, if you're not using Google Analytics, you will surely want to know what search terms people are using to get to your site. Why not Google Analytics? Because They Know Too Much Already!

Here's a lump of java code you can stick into your web app for extracting google search term information. There are three important methods:

public String getSearchTerm(HttpServletRequest req) - return the URL-decoded search term. Given a referrer http://google.com/search?q=awesome+icon+editor, return "awesome icon editor". This String is what you would use for highlighting content on your page to give visitors the creepy feeling that you know what they're thinking.

public boolean isSearching(HttpServletRequest req) - true if getSearchTerm returns a non-empty String

public static Object[] google(String referrer) - returns an array of length 2; google()[0] is the same as getSearchTerm; google()[1] is the search results page number. This tells you how many times your user clicked "next" on google's search results before they got to your site. This is useful information - it tells you how desperately your user wants your app, and how irrelevant google considers your site. Yes, the truth hurts, but you need to know before you can do anything about it. I'm sorry. The page number comes from the "start" parameter. So, with this referrer string http://google.com/search?q=awesome+icon+editor&start=80, the page number would be 8.
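If you would rather do this in the browser or in Node, the same extraction can be sketched in JavaScript with the standard URL class instead of regular expressions. googleSearchInfo is a hypothetical name, and like the Java version it simply assumes that any host containing "google" is a Google search page.

```javascript
// Hypothetical helper mirroring google(referrer): returns [searchTerm, pageNumber],
// or [null, null] when the referrer is not a google search.
function googleSearchInfo(referrer) {
  var url;
  try {
    url = new URL(referrer);
  } catch (e) {
    return [null, null]; // missing or malformed referrer
  }
  if (url.hostname.indexOf('google') === -1) {
    return [null, null];
  }
  var term = url.searchParams.get('q'); // already URL-decoded, '+' becomes a space
  if (!term) {
    return [null, null];
  }
  var start = parseInt(url.searchParams.get('start'), 10);
  return [term, isNaN(start) ? 0 : Math.floor(start / 10)];
}

console.log(googleSearchInfo('http://google.com/search?q=awesome+icon+editor&start=80'));
// prints [ 'awesome icon editor', 8 ]
```

Unlike the regular expressions in the Java code, this does not care whether "start" comes before or after "q" in the query string.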

If you're using Freemarker, you can bind this object to a global variable, so unless you abhor globalisation you will know directly within your template whether you're being googled.

So, the code. Free for private and commercial use. Don't be afraid to link back here. Enjoy.

ExternalSearchHelper.java

/*
  copyright conan dalton 2009, license http://creativecommons.org/licenses/by-sa/3.0/ 
*/
import org.apache.commons.lang.StringUtils;

import javax.servlet.http.HttpServletRequest;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExternalSearchHelper {
  static final Pattern itsGoogle = Pattern.compile("http://[^/]*google[^/]+/.*[&\\?]q=([^&]+).*");
  static final Pattern itsGoogle2 = Pattern.compile("http://[^/]*google[^/]+/.*[&\\?]q=([^&]+).*&start=([^&]+).*");

  public String getSearchTerm(HttpServletRequest req) {
    String referrer = req.getHeader("Referer");
    if (referrer == null || referrer.length() == 0) {
      return "";
    }

    String term = (String) google(referrer)[0];
    return term == null ? "" : term; // a non-google referrer yields no search term
  }

  public boolean isSearching(HttpServletRequest req) {
    // re-implement this if you don't want to depend on apache-commons
    return StringUtils.isNotBlank(getSearchTerm(req)); 
  }

  public static Object[] google(String referrer) {
    Object[] result = new Object[2];
    if (referrer == null || referrer.length() == 0) {
      return result;
    }

    Matcher m2 = itsGoogle2.matcher(referrer);
    if (m2.matches()) {
      result[0] = decode(m2.group(1));
      result[1] = Integer.valueOf(Integer.parseInt(m2.group(2)) / 10);
      return result;
    }

    Matcher m = itsGoogle.matcher(referrer);
    if (m.matches()) {
      result[0] = decode(m.group(1));
      result[1] = 0;
    }

    System.out.println("search term " + result[0]);
    return result;
  }

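  private static String decode(String s) {
    try {
      return URLDecoder.decode(s, "UTF-8"); // the one-argument decode(s) is deprecated
    } catch (java.io.UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always supported
    }
  }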
}

15 January 2009

On the Latest Intellij

I downloaded Intellij 8 the other day and as usual it was full of wonderful goodies:

  • Freemarker support, at last - syntax colouring, and freemarker stacktrace navigation from the console: awesome! [UPDATE] and navigation between templates included with the #include directive
  • Git support, at last!
  • Better javascript syntax highlighting
  • Hibernate: uses reverse-data-flow to find some strings which will become HQL queries, and highlights appropriately. Wow. I mean: wow, that must have been hard work, especially for something that isn't really all that essential ...
  • Little "close" button on each tab (the way web browsers do it), so I don't have to right-click and choose "close tab" any more. This is great. One of those little details.
  • Recognises associated source code when I update a dependency and its source - which I do a lot because some code for some projects is split into a separate project. Previously it would get all confused and not find the source any more.
I'm sure there's heaps more stuff, this is just what I noticed from the subset of features I use daily. Still missing:
  • Option to silence "Open in new frame" dialog when opening a new project. This is really annoying: most annoying dialogs come with a "don't ask this again" option - but not this one. Go away, pesky dialog!
Anyway, bravo, jetbrains.