Nobody's Papers
“Combating Spam in Tagging Systems” by Koutrika et al.

I’m interested in how spam is avoided in social networks when it comes to resource tagging.

By resource tagging I mean “OK, this resource is correctly associated with this tag”, which differs slightly from the usual behaviour of a user entering a URL in del.icio.us.

In my case I have a series of resources that are shown to the user when he/she asks for a specific tag (or query, if you prefer), and the user can confirm the association between the resource and the tag.

After searching a little I found

“Combating Spam in Tagging Systems” by Koutrika et al. (Georgia Koutrika, Frans Effendi, Zoltan Gyongyi, Paul Heymann and Hector Garcia-Molina) http://doi.acm.org/10.1145/1409220.1409225

which is available at the following URL: http://heymann.stanford.edu/tagspam.html

The paper is easy to read (not too much maths involved), well structured and very useful. It offers lots of data and interesting conclusions.

I’m a little disappointed, though, because the paper does not study any real, widely used tagging system. Instead, it proposes an ideal system in which ideal users follow previously defined behaviours (including spammer behaviours), and it analyzes the consequences of modifying some system parameters (e.g. how many taggings a user performs, how many bad users are in the system, how many resources are in the network, etc.).

Despite this, the paper is highly interesting, as it gives a glimpse of how the system degrades as bad users begin to hack it.

Some interesting thoughts for me:

  • The idea of a spam “ghetto”: a set of queries to the system where most of the spammers are confined and where they don’t disturb good users, as good users don’t search in these “ghettos”.
  • The interest of analyzing the time at which the taggings are performed. A possible line of research may appear if we apply some time series analysis.
  • The user profiles described in section 3 are interesting, as they allow an easy implementation, so a simulator can be built (in fact, they built one for their ideal system). I miss some bad behaviours though (e.g. the bad user who overemphasizes the relevance of a resource for a tag which is, in fact, valid).
  • Some of the conclusions are interesting, but they cannot be summarized here (you must read the paper :) ).
  • Most of the methods rely on user profiles. This is a pity, as I probably won’t have such profiles (and if I had them, a spammer could easily fake a new one).
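To make the simulator idea concrete, here is a minimal sketch in Python. It is not the model from the paper: the two user profiles (good users confirm only relevant resources, bad users confirm only irrelevant ones), the assumption that half of the resources are relevant, and the names `simulate` and `spam_in_top_k` are all mine, chosen only to show how changing the number of bad users or taggings per user could be explored.

```python
import random


def simulate(n_good, n_bad, n_resources, taggings_per_user, seed=0):
    """Simulate confirmations for a single tag (hypothetical toy model).

    Returns (confirmations, relevant), where `relevant` is the set of
    resource ids that truly match the tag and `confirmations` maps each
    resource id to the number of users who confirmed it.
    """
    rng = random.Random(seed)
    resources = list(range(n_resources))
    # Assumption: the first half of the resources truly match the tag.
    relevant_list = resources[: n_resources // 2]
    irrelevant_list = resources[n_resources // 2:]
    confirmations = {r: 0 for r in resources}

    # Good users confirm only resources that really match the tag.
    for _ in range(n_good):
        k = min(taggings_per_user, len(relevant_list))
        for r in rng.sample(relevant_list, k):
            confirmations[r] += 1

    # Bad users confirm only resources that do not match the tag.
    for _ in range(n_bad):
        k = min(taggings_per_user, len(irrelevant_list))
        for r in rng.sample(irrelevant_list, k):
            confirmations[r] += 1

    return confirmations, set(relevant_list)


def spam_in_top_k(confirmations, relevant, k):
    """Fraction of the k most-confirmed resources that are irrelevant."""
    # Ties are broken by resource id so the result is deterministic.
    ranked = sorted(confirmations, key=lambda r: (-confirmations[r], r))
    top = ranked[:k]
    return sum(1 for r in top if r not in relevant) / k
```

Running `simulate` with a growing `n_bad` and plotting `spam_in_top_k` would give a rough picture of how quickly the results for a tag degrade under this toy model.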

In summary, this paper was interesting, but it’s not exactly what I was looking for. That is not because of the lack of real-world analysis (in the end, it seems to be a research subject in progress, and they may perform this kind of analysis in the future) but because I’m more interested in a time-centered, anonymous analysis.
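As a rough illustration of what I mean by a time-centered, anonymous analysis, the sketch below looks only at the timestamps of the confirmations for one tag, with no user identity at all, and flags time windows whose activity is far above the typical level. The median-based threshold, the `factor` parameter and the function names are my own assumptions, not something taken from the paper.

```python
from collections import Counter
from statistics import median


def confirmations_per_window(timestamps, window_seconds):
    """Count confirmations per fixed-size time window (no user ids needed)."""
    return Counter(int(t // window_seconds) for t in timestamps)


def burst_windows(timestamps, window_seconds, factor=3.0):
    """Return the indices of windows with unusually many confirmations.

    A window is flagged when its count exceeds `factor` times the median
    count over all windows between the first and the last observed one.
    """
    counts = confirmations_per_window(timestamps, window_seconds)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    # Include empty windows so quiet periods pull the baseline down.
    series = [counts.get(w, 0) for w in range(first, last + 1)]
    baseline = max(median(series), 1)  # avoid a zero baseline
    return [first + i for i, c in enumerate(series) if c > factor * baseline]
```

For example, one confirmation per minute followed by twenty confirmations within a single minute would flag only that last window. A real attack could of course be spread over time to stay under such a threshold, which is exactly why a proper time series analysis would be needed.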

How does the network evolve while a spam attack is under way? How does the ant colony defend itself?
