Thursday, June 26, 2014

Cache busting

The practice of serving content or HTML in a way that minimizes or prevents browsers and proxies from serving it from their cache. This forces the user agent or proxy to fetch a fresh copy for each request. Among other reasons, cache busting is used to provide a more accurate count of the number of requests from users.
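As a concrete illustration, ad tags commonly append a random number (often called a cache buster) to the request URL so that each request looks unique to the browser or proxy. A minimal sketch; the endpoint and the parameter name `ord` are illustrative, not a specific ad server's API:

```python
import random
import urllib.parse

def bust_cache(url):
    """Append a random cache-busting parameter so browsers and proxies
    treat each request as unique and fetch a fresh copy."""
    # The parameter name 'ord' is illustrative; the exact name depends on the ad server.
    separator = "&" if urllib.parse.urlparse(url).query else "?"
    return "{0}{1}ord={2}".format(url, separator, random.randint(0, 10**12))

# Hypothetical ad-server URL, for illustration only.
print(bust_cache("https://adserver.example.com/impression?campaign=42"))
# -> https://adserver.example.com/impression?campaign=42&ord=803...
```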

What is Cache?

Memory used to temporarily store the most frequently requested content, files, or pages in order to speed their delivery to the user. Caches can be local (e.g., in a browser) or on the network. In the case of a local cache, most computers have both memory (RAM) and disk (hard drive) caches.
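At its simplest, a local cache is just a fast lookup keyed by the requested resource; only on a miss does the client go to the network. A minimal in-memory sketch (not a real browser cache):

```python
import urllib.request

_cache = {}  # in-memory store, keyed by URL

def fetch(url):
    """Return the cached copy if present; otherwise fetch it and cache it."""
    if url in _cache:
        return _cache[url]                         # cache hit: no network round trip
    body = urllib.request.urlopen(url).read()      # cache miss: go to the network
    _cache[url] = body
    return body
```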

Behavioral targeting

Using previous online user activity (e.g., pages visited, content viewed, searches, clicks and purchases) to generate a segment which is used to match advertising creative to users (sometimes also called Behavioral Profiling, Interest-based Advertising, or online behavioral advertising). Behavioral targeting uses anonymous, non-PII data.
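As a rough sketch of the idea, a user's recent activity can be mapped to an interest segment, and the segment then decides which creative is served. The segment names and rules below are purely illustrative, not taken from any real targeting system:

```python
# Purely illustrative segment rules; real systems use far richer signals.
SEGMENT_RULES = {
    "auto_intender": {"car reviews", "dealership locator", "auto loans"},
    "sports_fan": {"match highlights", "league table", "fantasy football"},
}

def assign_segments(page_topics):
    """Map anonymous activity (topics of pages viewed) to interest segments."""
    return [seg for seg, topics in SEGMENT_RULES.items() if topics & page_topics]

# Activity is keyed to an anonymous cookie ID, never to PII.
activity = {"car reviews", "match highlights"}
print(assign_segments(activity))  # -> ['auto_intender', 'sports_fan']
```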

Friday, June 20, 2014

What is a Web crawler?

A web crawler (also known as an automatic indexer, bot, Web spider, or Web robot) is a software program which visits Web pages in a methodical, automated manner.
This process is called Web crawling or spidering, and the resulting data is used for various purposes, including building indexes for search engines, validating that ads are being displayed in the appropriate context, and detecting malicious code on compromised web servers.
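A minimal sketch of that crawl loop: fetch a page, extract its links, and queue them for later visits. This uses only the Python standard library and omits politeness features such as rate limiting and robots.txt checks (covered below):

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.request

class LinkParser(HTMLParser):
    """Collect href attributes from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

def crawl(seed, max_pages=10):
    queue, seen = deque([seed]), set()
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except Exception:
            continue                      # skip pages that fail to fetch or decode
        parser = LinkParser()
        parser.feed(html)
        queue.extend(urljoin(url, link) for link in parser.links)
    return seen
```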
Many web crawlers politely identify themselves via their user-agent string, which provides a reliable way of excluding a significant amount of non-human traffic from advertising metrics. The IAB (in conjunction with ABCe) maintains a list of known user-agent strings, the Spiders and Bots list. However, web crawlers attempting to discover malicious code often must masquerade as human traffic, which requires secondary, behavioral filtering to detect.
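For example, an analytics pipeline might drop requests whose user-agent matches a known bot list before counting impressions. The substrings below are illustrative only; they are not the actual IAB/ABCe Spiders and Bots list:

```python
# Illustrative substrings only; the real IAB/ABCe list is far more extensive.
KNOWN_BOT_SUBSTRINGS = ["googlebot", "bingbot", "baiduspider", "crawler", "spider"]

def is_declared_bot(user_agent):
    """True if the user-agent string identifies itself as a known crawler."""
    ua = user_agent.lower()
    return any(bot in ua for bot in KNOWN_BOT_SUBSTRINGS)

requests = [
    {"ua": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36"},
    {"ua": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
]
human_requests = [r for r in requests if not is_declared_bot(r["ua"])]
print(len(human_requests))  # -> 1
```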
Most web crawlers will respect a file called robots.txt, hosted in the root of a web site. This file informs the web crawler which directories should and shouldn't be indexed, but it does not enforce any actual access restrictions.
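A polite crawler consults robots.txt before fetching a page. Python's standard library includes a parser for this; a minimal sketch against a hypothetical site and crawler name:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")  # hypothetical site
rp.read()  # fetch and parse the robots.txt file

# Check whether our (hypothetical) crawler's user-agent may fetch a given path.
if rp.can_fetch("MyCrawler/1.0", "https://www.example.com/private/page.html"):
    print("allowed to crawl")
else:
    print("disallowed by robots.txt; skipping")
```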
Technically, a web crawler is a specific type of bot, or software agent.