February 02, 2010

Lazyweb request: lists of "good" and "bad" words for sentiment analysis

ENGLISH: I'm looking for "positive" and "negative" word lists which can be used as reference material when analyzing text.

The idea is to classify text with a high frequency of positive words as content with positive sentiment, and vice versa.
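
To make the idea concrete, here is a minimal sketch in Python, assuming two tiny made-up word lists. `POSITIVE`, `NEGATIVE` and `sentiment` are hypothetical names used only for illustration; real reference lists would be far larger and language-specific.

```python
import re

# Hypothetical, tiny word lists for illustration only.
POSITIVE = {"good", "great", "happy", "pleasant", "excellent"}
NEGATIVE = {"bad", "poor", "sad", "awful", "terrible"}


def sentiment(text):
    """Classify text by comparing counts of positive and negative words."""
    # Split into lowercase words; the pattern also matches non-ASCII letters.
    words = re.findall(r"[^\W\d_]+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

A plain count like this ignores negation ("not good") and word order, so it is probably only a rough first pass.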

I imagine that such reference lists are already in use by applications which measure sentiment. Any ideas on where to find them in the public domain?

I'm primarily interested in these lists in (1) Dutch, (2) English, (3) Finnish.

NEDERLANDS: I'm looking for word lists of "positive" and "negative" words that I can use to analyze the sentiment of texts.

Texts with a high proportion of positive words can then be classified as having positive sentiment, and vice versa.

I imagine that such reference lists are already being used by existing applications that measure sentiment. Any idea where I could find something like that?

I'm primarily interested in (1) Dutch, (2) English and (3) Finnish.

SUOMEKSI: Right then, could somebody please help - that is, translate - thanks? :-)

January 30, 2010

'Madame Magazine' Raili Mäkinen, on the Dutch | Talouselämä 3/2010

Raili Mäkinen, departing managing director of Sanoma Magazines Finland, in an interview in Talouselämä, January 29, 2010, page 25:

"The Dutch have no fear of authority whatsoever. They are simply good at questioning everything. At first I thought, good heavens, these people know nothing about the Finnish market, and yet I had to keep explaining all the time. But when you give yourself over to a questioning conversation, you learn a lot."

A pity that the article can't be found on Talouselämä's website, otherwise I could have linked to it.

January 28, 2010

'So that people would find each other'

This .pdf file has page 10 of Länsi-Savo, the daily newspaper from Mikkeli, for January 18, 2010. The feature story, 'Jotta ihmiset löytäisivät toisensa' ('So that people would find each other') by reporter Kaisa Parta, is the result of her interviewing me at our home in Mäntyharju.

We talked about the differences between Finland and the Netherlands, how I've settled as an immigrant, and how moving to live among the woods and lakes of Eastern Finland wouldn't have been very likely for our family without the Internet.

Although she claims to be rather a novice when it comes to Internet technology, Kaisa is very perceptive, asked pertinent questions and managed to unpack Cluetail's core business idea - developing recommendation technologies to connect people to the people and information most relevant to them - in what I consider a pleasant read.

Yeah, well, I would, wouldn't I? :-)

January 27, 2010

[UPDATE: solution] Struggling to re-route my microblog posts and shared reading

[UPDATE, January 30, 2010: I think I found a solution. I created an additional notify.me account, so now I can handle two different flows.

(1) Shared reading:

Google Reader -> Yahoo! Pipe -> notify.me (1st account) -> Ping.fm -> Twitter, Hi5, Jaiku, Friendfeed, Identi.ca, Plurk

(2) (Other) status updates / microblog posts:

Twitter -> Yahoo! Pipe -> notify.me (2nd account) -> Ping.fm -> LinkedIn, Typepad, Hi5, Jaiku, Friendfeed, Plaxo, Facebook, Identi.ca, Plurk, Tumblr

(3) Posting from any other place (e.g. Skype) to Ping.fm, or from the Ping.fm web UI:

-> Ping.fm -> @tt in post -> Twitter -> (see 2 above)

I'm testing now. Hope it will work this way.

Tweet: http://ping.fm/txw1h [UPDATE: solution] Struggling to re-route my microblog posts and shared reading

Okay, method number (3), with the @tt prefix, seems to work. At least it seems to post to Twitter only...]
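
For clarity, the first two flows from the update above can be written down as data and checked for loops. This is only a sketch in Python: `FLOWS`, `DESTINATIONS`, `has_loop` and `describe` are hypothetical names, and none of the services involved is actually called.

```python
# Each flow from the update, as an ordered chain of services up to Ping.fm.
FLOWS = {
    "shared_reading": ["Google Reader", "Yahoo! Pipe", "notify.me (1st account)", "Ping.fm"],
    "status_updates": ["Twitter", "Yahoo! Pipe", "notify.me (2nd account)", "Ping.fm"],
}

# The services that Ping.fm posts to at the end of each flow.
DESTINATIONS = {
    "shared_reading": ["Twitter", "Hi5", "Jaiku", "Friendfeed", "Identi.ca", "Plurk"],
    "status_updates": ["LinkedIn", "Typepad", "Hi5", "Jaiku", "Friendfeed", "Plaxo",
                       "Facebook", "Identi.ca", "Plurk", "Tumblr"],
}


def has_loop(flow):
    """Return True if the flow posts back to the service it starts from."""
    return FLOWS[flow][0] in DESTINATIONS[flow]


def describe(flow):
    """Render a flow in the arrow notation used in this post."""
    return " -> ".join(FLOWS[flow]) + " -> " + ", ".join(DESTINATIONS[flow])
```

The point of `has_loop` is the one constraint that matters here: a flow that starts on Twitter must not post back to Twitter, which is why the second flow leaves Twitter out of its destinations.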

Twitter's implementation of "reply" and "retweet" functionality inside its web UI is compelling me to set it apart from other social networks that support status updates and microblog posts.

Where I used to input my microblog posts in Ping.fm in order to distribute them to virtually all my accounts on social web services (including Twitter), I now find it a better idea to input on Twitter first, and then have my tweets automatically route to the other services.

Why? Because I want to use Twitter's "reply" and "retweet" buttons whenever an interesting conversation unfolds on Twitter.

Until now, I would type in the @ or RE or RT syntax manually. This involved the same effort whether on the Twitter web UI or on the Ping.fm web UI. So I would usually go to Ping.fm in order to spread my tweet across services.

Twitter now adds useful metadata when e.g. replying to a tweet. Due to that metadata, you can actually see on Twitter to which tweet I was replying. This is very useful. Since that metadata does not travel with my message when I write it on Ping.fm, I am compelled to write every reply on Twitter itself.

One such compelling reason is enough for me to switch from Ping.fm to Twitter.

Current flow:

Ping.fm -> all my accounts on social web services

AND:

Google Reader -> Yahoo! Pipes -> notify.me -> Ping.fm -> all my accounts

Desired flow:

Twitter -> (notify.me?) -> Ping.fm -> all my accounts (except Twitter)

AND:

Google Reader -> Yahoo! Pipes -> (notify.me?) -> Twitter -> (notify.me?) -> Ping.fm -> all my accounts (except Twitter)

OR:

Google Reader -> Yahoo! Pipes -> (notify.me?) -> Ping.fm -> all my accounts (including Twitter)

The challenge that I've run into has to do with notify.me. As far as I can tell, I can set up notify.me to post to Ping.fm in one way only: either for Ping.fm to post to Twitter only, or for Ping.fm to post to all my social web accounts (including or excluding Twitter).

I'm wondering if there's a hack, or whether I will need to find another service, similar to notify.me, in order to create a different route.

I've been trying some syntax suggested by Ping.fm in order to specify to which services it should post - by including that syntax into the Yahoo! Pipes feed.

In particular, I've tried to include #T in the Yahoo! Pipe after I had created a posting group "#T" on Ping.fm which included only Twitter. To no effect.

I then tried to include @tt in the Yahoo! Pipe hoping that Ping.fm would post only to Twitter, but none of those posts seem to go through at all. Three of them were picked up by notify.me, but none appeared on my "recent posts" on Ping.fm.

(I do apologize for my messy language here. It's late and I should really be sleeping. But this is bugging me.)

LATER: Right, after I removed "@tt" from the Yahoo! Pipe, my Google Reader shared reading items do seem to go through again.

EVEN LATER: Well, maybe not. But I need to get some sleep now. Let's see how much has gone through by sunrise. In any case, seems like I need to find an additional grab-and-post service like notify.me in order to enable two out of three routes from the desired flow described above.

January 01, 2010

Happy New Year, everybody!


December 31, 2009

Police find killer dead after shooting spree in Finland, taking six lives

(Information based on Finnish media reports - see sources below)

Six people lost their lives today in a shooting spree in the Finnish city of Espoo, near the capital Helsinki.

Three men and a woman were shot dead in the Prisma supermarket at the Sello shopping center, around 10 am Finnish time (8 am UTC). All four were employees at the store.

A fifth victim, the ex-spouse of the killer, was found dead in her home in Espoo. She was an employee of the Prisma store, too.

In a live broadcast press conference which started at 14:30 Finnish time (12:30 UTC), police revealed that the shooter, Ibrahim Shkupolli, born in 1966, had killed himself in his own home in Espoo. Shkupolli is a native Kosovo Albanian.

The shooter killed his victims with a 9 mm handgun. A restraining order was in force against Shkupolli, to prevent him from approaching the Prisma store as well as the home of his ex-spouse.

He also had previous convictions, in 2003 and 2007, for illegal possession of firearms and ammunition.

The exact motive of the killings is still under investigation.

Finland has a history of public massacres in recent years. Eleven people, including the shooter Matti Juhani Saari, died in a massacre at a vocational school in Kauhajoki in September 2008. Nine people, including the shooter Pekka-Eric Auvinen, died in a shooting at Jokela High School in Tuusula in November 2007.

The following are my tweets, based on Finnish media reports. I'll copy-paste them here in chronological order:

Tweet: [Reading:] Sello Espoossa: Ainakin neljää ihmistä ammuttu - Suomi - Uutiset - Ilta-Sanomat http://ping.fm/vHvR8

Tweet: http://www.yle.fi Four people killed in Finnish shopping mail shooting #Finland #news #shooting

Tweet: Three men and a woman were killed in a shopping mall shooting in Finland this morning. #Finland #news #shooting

Tweet: Police know the identity of the shooter, male, born 1966. Motive as yet unknown. #Finland #news #shooting

Tweet: Shoot-out happened at Sello Prisma mall, city of Espoo near capital Helsinki #Finland #news #shooting

Tweet: Police were alarmed 10:08 am. Still looking for killer, who used a 9 mm hand gun. #Finland #news #shooting

Tweet: http://ping.fm/ET2pf Suspect, Ibrahim Shkupolli, is known to the police. Updated 11:30 UTC. #Finland #news #shooting

Tweet: http://ping.fm/zHaYg Blog post: 'Another shoot-out in Finland: four people killed in shopping mall' #Finland #news #shooting

Tweet: http://ping.fm/zHaYg Not 4, but 5 killed in Finnish shopping mall shooting; killer at large #Finland #news #shooting

Tweet: http://ping.fm/zHaYg 5th victim found dead in a private home in Espoo #Finland #news #shooting

Tweet: http://ping.fm/zHaYg Killer still at large, "armed and dangerous" (Police) #Finland #news #shooting

Tweet: http://ping.fm/zHaYg Shooting spree in Finland; five dead, killer at large #Finland #news #shooting

Tweet: http://ping.fm/zHaYg Police: possibly 6 dead (not 5), possibly including the killer. #Finland #news #shooting

Tweet: http://ping.fm/zHaYg Police have surrounded home of suspected killer #Finland #news #shooting

Tweet: http://ping.fm/zHaYg Killer convicted for illegal arms posession 2003, 2007; restraining order #Finland #news #shooting

Tweet: http://ping.fm/zHaYg Police have found the suspect killer, Ibrahim Shkupolli, dead #Finland #news #shooting

Tweet: http://ping.fm/zHaYg Police find killer dead after shooting spree in Finland, taking six lives #Finland #news #shooting

Tweet: http://ping.fm/zHaYg Gunman in Finnish massacre Ibrahim Shkupolli was Kosovo Albanian #Finland #news #shooting

Sources:

Neljä kuoli ammuskelussa Espoon Sellossa - tekijä on edelleen kateissa | YLE (national public broadcaster)

Tässä on poliisin etsimä ampuja | MTV3.fi (national commercial TV channel)

Police press conference broadcast via YLE Areena

...

December 27, 2009

Would we still call it journalism?

Tweet (by me): [Reading:] The End Of Hand Crafted Content http://ping.fm/kzv9s

Tweet (by Katri Lietsala): @josschuurmans What did you think about Arrington's article? In my opinion, he is so right: excellent journalists can have their own brand.

We're talking about Michael Arrington's article on TechCrunch, 'The End Of Hand Crafted Content', which by now has received 350 comments and, according to Topsy.com, was retweeted 1246 times.

Tweet: @katrilietsala http://ping.fm/lxNzC I agree that excellent journalists can have their own brand. There are several push factors at play.

For one, "disintermediation" and "sources going direct" imply that one no longer needs to be part of a news organization - the news industry - to conduct and share acts of journalism. We really are witnessing a revolution, with the means of production changing hands.

For another, the social web, particularly the blogosphere and the real-time web, appear to appreciate personal perspectives just as much as "objective reporting". This seems like an opportunity for individuals, including the "excellent journalists" that you mention, to build their personal brands.

Furthermore, I expect that journalists will be increasingly compelled to go solo if their news organizations keep hanging on to their model of "lecturing" rather than facilitating conversation.

Some freelancers have always been successful at franchising their personal brand across various channels and publications.

The Demonic Verses

Having said all that, the other interesting point that Mike Arrington made has to do with the advent of highly automated, "fast food" content production.

The illustrating example here is Demand Media, a company whose way of producing “content” was characterized by Jay Rosen as "demonic".

Demand Media immediately brought to my mind the animated video "EPIC 2015" (the Evolving Personal Information Construct), in which GoogleZon operates in a rather similar fashion and the New York Times finally goes off-line to continue as an elite newspaper for the rich and the elderly.

Yet, personally, I am not so afraid of this type of spam. As Doc Searls wrote:

"(...) Just as an aside, I’ve been hand-crafting (actually just typing) my “content” for about twenty years now, and I haven’t been destroyed by a damn thing. I kinda don’t think FFC is going to shut down serious writers (no matter where and how they write) any more than McDonald’s killed the market for serious chefs. (...)"

The web has always challenged us to distill signal from noise. The vast majority of content on Twitter, for example, is of no consequence to most of us.

If anything, should spammy businesses like Demand Media succeed in gaming the major search engines (which I doubt), it would only boost our reliance on social filters: if I know you and you have read something that you'd like to share (and possibly discuss) with me, I'd probably be interested and trust that it's not spam. We are already relying more and more on human filters this way.

The business model is still up in the air

The big question remains, how is "excellent journalism" going to be paid for in the future? Jeff Jarvis is exploring possible answers within the 'New Business Models for News' program at the City University of New York's CUNY Graduate School of Journalism.

Nikki Usher contends that the business model for news has always been broken - which, in my view, seems to imply that news provision may have to be subsidized. Looking at it that way, Arrington may be right in suggesting that for some, the only way to keep publishing may be pro bono.

Perhaps if we made a distinction between b2b and b2c journalism?

The revenue model for b2c journalism relies on sales and advertising. Selling journalistic content to consumers seems an increasingly difficult proposition. And on the flip side, Dave Winer casts some doubt on the future of advertising as well, in Rebooting the News #35:

"(...) advertising itself may go away. “In a way an ad is a query… They try to guess as to what I’m interested in. And the better they guess, the more it becomes information.” (...)"

The revenue model for b2b journalism is a content model. Businesses will always be willing to pay for timely and/or exclusive information as long as it's an essential part of their supply chain. The potential customer segment here is not limited to the media.

But what then, if the customer is not a media organisation? Let's say it's a mobile phone producer, or an insurance company instead. If journalists were to supply them with information which has been researched and packaged exactly as if it were supplied to the media, would we still call it journalism?

December 22, 2009

Amplification is the new circulation

When one quotes, forwards or retweets a reported fact (or opinion, for that matter), I believe it is considered good journalistic practice to try and reference a source as close as possible to the original event, observer or report.

David Weinberger's "transparency is the new objectivity" would support the suggestion that such practice is just as much required on the Net today as it has traditionally been in the press and public discourse.

(And BTW, just like professor Weinberger does, I should really apologize for the cliché of “x is the new y.”)

Dan Gillmor appears to support this principle as well by recommending that we should be skeptical of everything (while not equally skeptical of everything) we read and always consider the trustworthiness of the source and the verifiability of its claim.

And while I agree that the transparency and verifiability of a story's origin is an important attribute of its credibility, I also observe a dilemma here:

With the proliferating practice of reblogging and retweeting, it often seems increasingly cumbersome to track down the original source.

Amplification is the new circulation.

As we move away from the lecture model to the conversation model, facts and opinion spread through the social graph as by "word of post".

Tweet: http://ping.fm/F2TxI @jayrosen_nyu Is it reasonable to expect from everyone who amplifies a message that they link to the origin?

Jay Rosen, I believe that this is a challenge for the rebooted news system and I would love to learn your take on it.

Let me offer an example.

My wife, Minna Ojamies, is a native Finn, who follows the Finnish mainstream media closely. She serves as my "human filter" to the news in Finnish. She uses Google Reader to share the news reports which she considers most interesting. I subscribe to her shared reading on Google Reader.

Also I happen to share stuff I read; articles, posts and tweets which I think may be of interest to others and/or which I would like to capture for possible future reference.

What I share on Google Reader flows into an RSS feed (edited on Yahoo! Pipes to include the string "[Reading:]" in front of the headline), which is forwarded by notify.me via Ping.fm onto a number of "social" web services including my account on Twitter.
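
The title-prefixing step can be sketched in a few lines of Python. This is not how Yahoo! Pipes works internally; it is a stand-in under stated assumptions: a plain RSS 2.0 document, and the hypothetical names `prefix_titles`, `PREFIX` and `SAMPLE`.

```python
import xml.etree.ElementTree as ET

PREFIX = "[Reading:] "


def prefix_titles(rss_xml, prefix=PREFIX):
    """Return the RSS document with the prefix added to every item title."""
    root = ET.fromstring(rss_xml)
    # Only item titles are touched; the channel's own title is left alone.
    for title in root.iterfind("./channel/item/title"):
        text = title.text or ""
        if not text.startswith(prefix):
            title.text = prefix + text
    return ET.tostring(root, encoding="unicode")


# A made-up, minimal feed for illustration.
SAMPLE = """<rss version="2.0"><channel><title>Shared items</title>
<item><title>The End Of Hand Crafted Content</title></item>
</channel></rss>"""

print(prefix_titles(SAMPLE))
```

Checking `startswith` before prefixing keeps the step safe to run twice on the same feed.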

The other day, she shared this article published on Taloussanomat, reporting that the 100-dollar laptop, for which Nicholas Negroponte has been campaigning, had arrived.

I hadn't seen this news in any of the other RSS feeds that I subscribe to. Unfortunately, the article was rather poor on source references. Also, it didn't say much about when the laptop in question would become available, nor about its competition.

In other words, there was little transparency and verifiability to go by. Yet, when it comes to overall credibility as a news brand, Taloussanomat finds itself - in my perception at least - in positive territory. Therefore I shared it.

The topic interests me and if the report turns out to be "new and true", I will be happy that I captured and amplified it. If not, I will be disappointed in Taloussanomat and regret amplifying noise rather than signal.

I could have done my own background check, of course. A simple web search would probably have done the trick. And services like Techmeme are helpful, too.

But my point, really, is that it may not be realistic to expect "amplifiers" to routinely carry out verification checks.

Personally, when I am in "reading mode", catching up with my RSS subscriptions, I don't necessarily want to allocate much time to verification. My priority is to read, capture and share (and amplification is a by-product which serves the rebooted news system).

So, I'm kinda wondering if it would be acceptable that we simply link to where we read the news - in my case the article by Taloussanomat - and perhaps trust that the rebooted news system will somehow take care of verifying the origin itself.

That, across all these chains of amplification, some people will actually go back and refer to the origin of the story - especially when doubt or controversy (combined with a lack of transparency or verifiability) pass a certain threshold.

There's another remark or two that I wanted to make around amplification being the new circulation.

If we accept this framing of the new news system for a moment, it might lead us to believe that paywalls a la Rupert Murdoch constitute indeed an act of shooting oneself in the proverbial foot.

Let's assume for a moment that the way to reach people online is less about signing up subscribers and more about amplification.

In a sense, the newspaper sales model can be associated with "push" and the amplification model with "pull". Through subscription and sales outlets, stuff is pushed to people on certain terms, but only after receiving the package will they find out what they appreciate and what not. What they subsequently do like and decide to amplify is what they have pulled out as signal from the noise.

You can't put it back into the tube, Rupert!

In such a world, where pull trumps push and amplification trumps circulation, any content behind paywalls cannot be amplified.

Or rather, of course the message can be amplified - Washington Post readers also have Twitter accounts - but the paywall discourages the referencing of the original source.

So, if amplification is the new circulation, perhaps the amplifiers (that's us) won't always take the trouble of reading and verifying the original source, especially when it's made cumbersome to do so. If important enough, we'll do the fact-checking somehow routing around the paywall. Perhaps we'll find our own sources.

Hm, Dave Winer, perhaps it's not only that sources are going direct, (@davewiner, what would be the best link to this theme?), but also readers will go direct, namely directly to the source.

Tweet: http://ping.fm/F2TxI @davewiner Seems to me that not only sources, but also receivers go direct, namely to the source.

(When sources go direct, they become senders. And if senders can go direct, so can receivers or readers.)

Finally: what if the half-life of news is approaching zero, much like the cost of storing digital content is approaching zero?

In a variation to Chris Anderson, will it make best business sense to give the news away for free and sell something else? Some type of premium content? Live experiences?

In such a scenario, high-quality news, including investigative reporting, will merely be a brand builder, an investment rather than a business of its own.

December 21, 2009

links for 2009-12-21

  • "(...) MeFeedia is a video community, where you can watch videos from a variety of sources and easily share them with your friends via email or your favorite social network. (...)"

December 16, 2009

Interesting conversations at the MobileBrainBank startup scene in Tampere

Tweet: Thank you @PetraHelsinki for #MoBB! Here are my slides: http://ping.fm/T3ADP

I had the privilege of presenting my company, Cluetail Ltd. and demonstrating our Cluetail Lunch Date concept at the pikkujoulu edition of MobileBrainBank (#MoBB) at Demola in Tampere on Tuesday evening.

Thank you, Petra Soderling, for pulling this entrepreneurial network together! Thank you all participants for some very interesting conversations!

UPDATE, Wednesday, December 16, 2009:

Tweet: @PetraHelsinki has put yesterday's #MoBB video material out on YouTube. I just embedded #Cluetail's here: http://ping.fm/T3ADP

Petra has uploaded the video material she shot with her Nokia N97 to YouTube, as embedded here:

About Jos

  • After six years at Nokia HQ in Finland, Jos Schuurmans rebooted his own firm, Cluetail Ltd, in 2009 to help organizations and individuals extract more value from the conversations in which they engage online.

    He calls himself an "entrepreneur, participatory media strategist, blogger, journalist, aspiring coach". (See: LinkedIn profile)

    The content of www.josschuurmans.com is a reflection of anything and everything around which Jos cares to engage in conversation online. Other than that, the blog has no pre-defined theme.

    Jos can be reached:
    (business): jos.schuurmans[at]cluetail.com
    (private): jos[at]josschuurmans.com
    via Skype: jos.schuurmans.cluetail
    by phone: +358 50 59 33 006
    Twitter: twitter.com/josschuurmans
    FriendFeed: friendfeed.com/josschuurmans

    See www.josschuurmans.com/about.html for Jos's presence on other online social networks and media.
