This article was published on Monday, September 27, 2004. It is an interview with the founder of Copyscape, Gideon Greenspan.

Copyscape was a brand-new service in 2004 and not yet well known. As a publisher concerned about plagiarism, I had found Copyscape to be a useful product, so I contacted Gideon and interviewed him.

Are Content Thieves Copying your Website?

Many webmasters put great effort into producing quality content for the web. What they may not know is that tens of thousands of other webmasters are busy copying that content without permission. However you look at it, copying content is theft, and your content may be among the pages being copied.

Who is publishing your content and violating your copyright? And most importantly, what can you do about it? There’s a service called Copyscape.com that will help you find content thieves. You can fight back. We interviewed one of the principals of Copyscape.com, Gideon Greenspan, and what he reveals about content theft will open your eyes.

mb: For those who may not know what Copyscape.com is, can you explain in your own words what the website does, what kind of content it offers, and how it benefits the webmaster community?

Gideon: Copyscape finds copies of your content online. You simply enter the URL of your page, and Copyscape finds other pages on the web which have copied or quoted some of your content. Copyscape uses advanced text analysis techniques to ensure that it also finds copies which have been significantly modified from your original.

Judging from the feedback we have received from users, the benefits are significant. The ‘copy-and-paste’ method of website design is extremely widespread and it undermines the effort people put in to create good content for their site. Copyscape makes it quick and easy to protect your rights online by detecting these instances of theft.

mb: What inspired Copyscape (e.g., a personal battle with copyright infringement)? What are your motivations behind Copyscape?

Gideon: The original idea for Copyscape came from our Google Alert service (www.googlealert.com). Several webmasters told us they found instances of plagiarism from a Google Alert email.

In these cases, the site had stolen some of their text, which happened to contain a search term they were tracking.

But a simple search is not sufficient to detect many instances of plagiarism since the copied content is usually modified and therefore harder to find. That’s why we decided to develop a new search service aimed specifically at detecting content theft.
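To illustrate why an exact-phrase search misses modified copies, here is a minimal sketch of one common text-comparison approach: overlapping word sequences ("shingles") scored with Jaccard similarity. The interview does not describe Copyscape's actual techniques, so the function names, the shingle size of three words, and the sample sentences are all assumptions chosen for illustration.

```python
import re

def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text.

    Hypothetical helper: lowercases the text and ignores punctuation so that
    small cosmetic edits do not change the shingles.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Sample texts (assumed for illustration only).
original = ("It takes time, creativity and effort to develop "
            "original compelling content for a site.")
modified = ("It takes time, creativity and effort to develop "
            "truly compelling content for your website.")
unrelated = "We use a powerful server to handle the bandwidth and load."

if __name__ == "__main__":
    # An exact-phrase search for the full original sentence would not match
    # the modified copy, but the shingle overlap still flags it.
    print("modified: ", round(similarity(original, modified), 2))
    print("unrelated:", round(similarity(original, unrelated), 2))
```

The modified sentence no longer matches the original word for word, yet it still shares most of its three-word sequences with it, while the unrelated sentence shares none. A real service would also need to pick a score threshold, which is left open here.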

mb: What kind of feedback are you receiving from webmasters?

Gideon: The feedback has been overwhelmingly positive and appreciative. We keep hearing from webmasters that they are shocked at how many sites have copied their content, and that they had no idea that the problem was so widespread before they used Copyscape.

mb: Any thoughts of integrating other search engines, like Yahoo, which tends to spider dynamic URLs better and might show more results than Google alone?

Gideon: Copyscape uses the Google Web APIs, which allow Google to be legitimately queried in an automated fashion. As far as we are aware, no other search engine provides a similar API or permits automated querying.

mb: Do you have any suggestions for preventative actions for deterring content theft?

Gideon: Since the Web is an open medium, there is no easy way to prevent the theft occurring, so the best prevention comes from the risk of detection and subsequent legal action. As a deterrent to would-be content thieves, we provide banners that can be placed on a site to warn potential infringers that they will be found.

mb: How much content on the web do you believe is stolen?

Gideon: It’s hard to put a number on it. All I can tell you is that, as I mentioned before, we are constantly hearing from webmasters who are devastated when they find out how many other sites have plagiarized their content.

mb: Where do you see the Copyscape.com website going in the future?

Gideon: We are now developing a service which will regularly check the web for instances of plagiarism of any page on your site and send you an email alert when it happens. We’re also continuing to develop Copyscape into a central resource for online copyright issues by expanding the information on the site. We’ve recently introduced a list of “10 easy steps to protect your content online”.

mb: What kind of technical demands does it take to run Copyscape.com?

Gideon: We use a powerful server to handle the bandwidth and load. The front end of Copyscape runs on a standard Apache/PHP platform. Behind the scenes, some highly optimized compiled C code performs the page decoding and comparison.
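As a rough picture of what "page decoding" can involve, the sketch below reduces an HTML page to its visible text so that it can be compared with other pages. It uses Python's standard `html.parser` module rather than the compiled C code mentioned in the interview, and the class name `TextExtractor` and function `page_text` are hypothetical, not part of Copyscape.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self.parts.append(data)

def page_text(html):
    """Return the page's visible text with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())

if __name__ == "__main__":
    sample = ("<html><head><style>p{color:red}</style></head>"
              "<body><p>Hello &amp; <b>welcome</b></p>"
              "<script>var x=1;</script></body></html>")
    print(page_text(sample))
```

Joining text fragments with spaces keeps the sketch short, at the cost of occasionally splitting a word that is partly wrapped in an inline tag.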

mb: In your opinion, what is the mentality of the average content thief? Who is it that we are dealing with?

Gideon: In a word – laziness. It takes time, creativity and effort to develop original, compelling content for a site.

It is much quicker and easier to copy from someone else’s page and then make minor modifications to suit your needs. Before Copyscape, the risk of detection was very low.

mb: What are we up against in the war against content thieves? How sophisticated are they?

Gideon: I do not believe that most content thieves are particularly sophisticated. They visit a website in their browser, select some text, hit ‘Copy’ and then ‘Paste’ into their own page. If they were sophisticated, they probably would have written their own content instead.

mb: Do you see content theft going away or diminishing anytime in the future?

Gideon: In the short term, I’m sure it will continue to increase as the web forms a larger and more integral part of the global economy. It will only begin diminishing once tools for fighting content theft like Copyscape become integral to the web landscape.

mb: Anything you’d like to emphasize to webmasters?

Gideon: It’s important to be proactive. Keep an eye on what is happening on the Web and check regularly for any infringements of your rights.

With a few simple clicks, someone can copy your entire website and post it under their own domain name and identity. If you don’t act quickly, they will soon be taking away your traffic and sales.

mb: So there you have it. Visit Copyscape.com and plug in your URL. You’ll be surprised at what you find. Content thieves are out there, but now you have a tool to help you fight back.