Web analytics
From Wikipedia, the free encyclopedia

Web analytics is the measurement, collection, analysis, and reporting of web data to understand and optimize web usage.[1] Web analytics is not just a process for measuring web traffic; it can also be used as a tool for business and market research and to assess and improve website effectiveness. Web analytics applications can also help companies measure the results of traditional print or broadcast advertising campaigns, for example by estimating how traffic to a website changes after a new advertising campaign is launched. Web analytics provides information about the number of visitors to a website and the number of page views, and can be used to create user behavior profiles.[2] It helps gauge traffic and popularity trends, which is useful for market research.

Basic steps of the web analytics process

[Figure: Basic steps of the web analytics process]

Most web analytics processes come down to four essential stages or steps,[3] which are:

Another essential function that analysts have developed for optimizing websites is experimentation:

The goal of A/B testing is to identify and suggest changes to web pages that increase or maximize a statistically tested outcome of interest.
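
Whether a variant's effect is statistically meaningful is often judged with a two-proportion z-test on conversion rates. The following Python sketch shows one such test; it is not a method prescribed by the sources above. It assumes independent visitors, a yes/no conversion outcome, and samples large enough for the normal approximation. The function name `two_proportion_z_test` and the visitor counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare conversion rates of variants A and B.

    Returns (z, p_value), where p_value is two-sided under the null
    hypothesis that both variants share the same conversion rate.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate: the best estimate if the null hypothesis holds.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 200 of 5,000 visitors converted on A, 260 of 5,000 on B.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (conventionally below 0.05) suggests the difference is unlikely to be due to chance alone. The test is undefined when no visitor, or every visitor, converts, because the standard error is then zero.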

Each stage affects, or can affect (that is, drive), the stage preceding or following it. Sometimes the data that is available for collection shapes the online strategy; at other times, the online strategy determines which data is collected.

Web analytics categories


There are at least two categories of web analytics: off-site and on-site web analytics.

In the past, the term web analytics referred to on-site visitor measurement. However, this meaning has become blurred, mainly because vendors are producing tools that span both categories. Many different vendors provide on-site web analytics software and services. There are two main technical ways of collecting the data. The first and traditional method, server log file analysis, reads the log files in which the web server records file requests by browsers. The second method, page tagging, uses JavaScript embedded in the web page to make image requests to a third-party analytics-dedicated server whenever a web page is rendered by a web browser or, if desired, when a mouse click occurs. Both methods collect data that can be processed to produce web traffic reports.

On-site web analytics


There are no globally agreed definitions within web analytics. Industry bodies have been trying for some time to agree on definitions that are useful and definitive, but tools and products from different companies may measure or count the same metric in different ways, so the same metric name may refer to different data. The main bodies that have had input in this area are the IAB (Interactive Advertising Bureau), JICWEBS (the Joint Industry Committee for Web Standards in the UK and Ireland), and the DAA (Digital Analytics Association), formerly known as the WAA (Web Analytics Association, US). However, many terms are used consistently from one major analytics tool to another, so the following list, based on those conventions, can be a useful starting point:

Off-site web analytics


Off-site web analytics is based on open data analysis, social media exploration, and share of voice on web properties. It is usually used to understand how to market a site by identifying the keywords tagged to this site, either from social media or from other websites.

Web analytics data sources


The fundamental goal of web analytics is to collect and analyze data related to web traffic and usage patterns. The data mainly comes from four sources:[8]

  1. Direct HTTP request data: comes directly from HTTP request messages (HTTP request headers).
  2. Network-level and server-generated data associated with HTTP requests: not part of an HTTP request, but required for the request to be transmitted successfully; for example, the IP address of a requester.
  3. Application-level data sent with HTTP requests: generated and processed by application-level programs (such as JavaScript, PHP, and ASP.NET), including sessions and referrals. These are usually captured by internal logs rather than public web analytics services.
  4. External data: can be combined with on-site data to augment the website behavior data described above and to help interpret web usage. For example, IP addresses are usually associated with geographic regions and internet service providers; other external data include e-mail open and click-through rates, direct mail campaign data, sales, lead history, or other data types as needed.
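
To make the distinction between the first two sources concrete, the Python sketch below splits a raw HTTP/1.1 request into fields carried in the message itself (source 1) and the client address that the server knows only from the network connection (source 2). It is an illustration under simplifying assumptions: the request text and peer address are supplied as strings, only the header block is parsed, and the function name `split_request_data` is hypothetical. A production server would normally expose these values through its own request object.

```python
def split_request_data(raw_request: str, peer_ip: str) -> dict:
    """Separate direct HTTP request data from network-level data for one request."""
    # The header block ends at the first blank line (CRLF CRLF); any body is ignored.
    head = raw_request.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header field names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return {
        # Source 1: values that travel inside the HTTP request message.
        "direct": {
            "method": method,
            "target": target,
            "version": version,
            "user_agent": headers.get("user-agent"),
            "referer": headers.get("referer"),
        },
        # Source 2: not in the message, but known to the server from the connection.
        "network": {"client_ip": peer_ip},
    }


raw = (
    "GET /products?id=7 HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: Mozilla/5.0\r\n"
    "Referer: https://www.example.org/\r\n"
    "\r\n"
)
print(split_request_data(raw, "203.0.113.9"))
```

If the visitor reaches the server through a proxy, the connection-level address is the proxy's, not the visitor's, which is one reason IP addresses identify visitors only approximately.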

Web server log file analysis


Web servers record some of their transactions in a log file. It was soon realized that these log files could be read by a program to provide data on the popularity of the website. Thus arose web log analysis software.

In the early 1990s, website statistics consisted primarily of counting the number of client requests (or hits) made to the web server. This was a reasonable method initially, since each website often consisted of a single HTML file. However, with the introduction of images in HTML, and of websites that spanned multiple HTML files, this count became less useful, because loading a single page could now generate several hits: one for the HTML file and one for each embedded image. The first true commercial log analyzer was released by IPRO in 1994.[9]
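
The Python sketch below illustrates why hit counts overstate page activity. It assumes log lines in the Common Log Format (host, identity, user, timestamp, quoted request line, status, size), which many web servers can write; the regular expression and the function name `count_hits` are illustrative, not taken from any particular log analyzer. In the sample, loading one HTML page that embeds two images produces three hits.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [timestamp] "METHOD path PROTOCOL" status bytes
CLF_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_hits(lines):
    """Count hits (every logged request, pages and images alike) per requested path."""
    hits = Counter()
    for line in lines:
        match = CLF_LINE.match(line)
        if match:  # skip malformed lines instead of failing
            hits[match.group("path")] += 1
    return hits


log = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:13:55:37 -0700] "GET /logo.gif HTTP/1.0" 200 4512',
    '192.0.2.1 - - [10/Oct/2000:13:55:37 -0700] "GET /photo.jpg HTTP/1.0" 200 9034',
]
hits = count_hits(log)
print(sum(hits.values()))  # 3 hits for a single page
```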

Two units of measure were introduced in the mid-1990s to gauge more accurately the amount of human activity on web servers. These were page views and visits (or sessions). A page view was defined as a request made to the web server for a page, as opposed to a graphic, while a visit was defined as a sequence of requests from a uniquely identified client that expired after a certain amount of inactivity, usually 30 minutes.
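
The two measures can be sketched in Python as follows. Several simplifying assumptions are not from the text above: requests arrive as (client identifier, Unix timestamp, path) tuples, any path ending in a common image extension counts as a graphic rather than a page, only page views keep a visit alive, and the names `page_views_and_visits`, `GRAPHIC_SUFFIXES`, and `SESSION_TIMEOUT` are hypothetical.

```python
from collections import defaultdict

GRAPHIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")  # assumed to mark non-page requests
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity after which a visit expires


def page_views_and_visits(requests):
    """requests: iterable of (client_id, unix_time, path). Returns (page_views, visits)."""
    times_by_client = defaultdict(list)
    page_views = 0
    for client_id, timestamp, path in requests:
        if path.lower().endswith(GRAPHIC_SUFFIXES):
            continue  # a graphic: not a page view, and not counted as activity here
        page_views += 1
        times_by_client[client_id].append(timestamp)

    visits = 0
    for times in times_by_client.values():
        times.sort()
        visits += 1  # a client's first page view always starts a visit
        for previous, current in zip(times, times[1:]):
            if current - previous > SESSION_TIMEOUT:
                visits += 1  # the gap expired the previous visit, so a new one begins
    return page_views, visits


sample = [
    ("192.0.2.1", 0, "/index.html"),
    ("192.0.2.1", 1, "/logo.gif"),
    ("192.0.2.1", 600, "/about.html"),
    ("192.0.2.1", 600 + 3600, "/index.html"),  # returns after an hour: a second visit
    ("198.51.100.7", 100, "/index.html"),
]
print(page_views_and_visits(sample))  # (4, 3)
```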

The emergence of search engine spiders and robots in the late 1990s, along with web proxies and dynamically assigned IP addresses for large companies and ISPs, made it more difficult to identify unique human visitors to a website. Log analyzers responded by tracking visits by cookies, and by ignoring requests from known spiders.[citation needed]
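
Ignoring known spiders is often done by matching the User-Agent request header against a list of robot signatures. The Python sketch below assumes each request record is a dict with a `user_agent` key; the marker list and the function names are hypothetical. Plain substring matching of this kind is crude: it misses robots that present a browser-like User-Agent and can wrongly flag some legitimate clients.

```python
# Hypothetical, deliberately short list of substrings seen in robot User-Agent strings.
KNOWN_ROBOT_MARKERS = ("bot", "spider", "crawl", "slurp")


def is_known_spider(user_agent: str) -> bool:
    """Return True if the User-Agent contains one of the known robot markers."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in KNOWN_ROBOT_MARKERS)


def without_spiders(records):
    """Drop records whose 'user_agent' value matches a known robot marker."""
    return [r for r in records if not is_known_spider(r.get("user_agent", ""))]


records = [
    {"path": "/", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    {"path": "/", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
]
print(len(without_spiders(records)))  # 1
```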

The extensive use of web caches also presented a problem for log file analysis. If a person revisits a page, the second request will often be retrieved from the browser's cache, and so no request will be received by the web server. This means that the person's path through the site is lost. Caching can be defeated by configuring the web server, but this can degrade performance for the visitor and increase the load on the servers.[10]

Page tagging


Concerns about the accuracy of log file analysis in the presence of caching, and the desire to be able to perform web analytics as an outsourced service, led to the second data collection method, page tagging or "web beacons".

In the mid-1990s, web counters were commonly seen: these were images included in a web page that showed the number of times the image had been requested, which was an estimate of the number of visits to that page. In the late 1990s, this concept evolved to use a small invisible image instead of a visible one and, by using JavaScript, to pass certain information about the page and the visitor along with the image request. This information can then be processed remotely by a web analytics company, and extensive statistics generated.
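
The exchange described above can be sketched in Python from both ends: the URL a tag might request for its invisible image, and the collection server recovering the fields from that request. The collector address and the parameter names `page`, `ref`, and `scr` are hypothetical; real vendors define their own, and in practice the request is built by JavaScript running in the browser rather than by Python.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_beacon_url(collector: str, page: str, referrer: str, screen: str) -> str:
    """URL of the invisible image; page and visitor details ride along in the query string."""
    return collector + "?" + urlencode({"page": page, "ref": referrer, "scr": screen})


def parse_beacon(url: str) -> dict:
    """Collection-server side: recover the fields from the image request URL."""
    # keep_blank_values=True so that, e.g., an empty referrer is kept rather than dropped.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # parse_qs returns a list of values per key; this beacon sends each key once.
    return {key: values[0] for key, values in query.items()}


url = build_beacon_url(
    "https://collector.example/pixel.gif",
    page="/products?id=7",
    referrer="https://www.example.org/",
    screen="1920x1080",
)
print(parse_beacon(url))
```

Because `urlencode` percent-encodes characters such as `/`, `?`, and `=`, a page path that carries its own query string survives the round trip intact.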

The web analytics service also manages the process of assigning a cookie to the user, which can uniquely identify them during their visit and in subsequent visits. Cookie acceptance rates vary significantly between websites and may affect the quality of data collected and reported.
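
A minimal sketch of assigning such a cookie, written in Python from the point of view of a server that sets a first-party cookie on its own domain (the third-party variant differs mainly in which domain sets and reads the cookie). The cookie name `visitor_id`, its roughly two-year lifetime, and the function name `identify_visitor` are hypothetical; the code uses only Python's standard `http.cookies` and `uuid` modules.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "visitor_id"             # hypothetical cookie name
MAX_AGE_SECONDS = 2 * 365 * 24 * 3600  # hypothetical lifetime of about two years


def identify_visitor(cookie_header: str) -> Tuple[str, Optional[str]]:
    """Return (visitor_id, set_cookie_value).

    set_cookie_value is None for a returning visitor who already carries the cookie;
    otherwise it is the value to send in a Set-Cookie response header.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        return jar[COOKIE_NAME].value, None  # returning visitor: keep the existing ID

    # First visit, or the visitor deleted the cookie: issue a new random ID.
    new_id = uuid.uuid4().hex
    response = SimpleCookie()
    response[COOKIE_NAME] = new_id
    response[COOKIE_NAME]["max-age"] = MAX_AGE_SECONDS
    response[COOKIE_NAME]["path"] = "/"
    return new_id, response[COOKIE_NAME].OutputString()


first_id, set_cookie = identify_visitor("")  # first request: no cookie yet
print(set_cookie)
returning_id, _ = identify_visitor(f"{COOKIE_NAME}={first_id}")
print(returning_id == first_id)  # True
```

If the visitor rejects or later deletes the cookie, the next request takes the "new ID" branch again, so the same person is counted as a new visitor.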

Collecting website data using a third-party data collection server (or even an in-house data collection server) requires an additional DNS lookup by the user's computer to determine the IP address of the collection server. On occasion, delays in completing successful or failed DNS lookups may result in data not being collected.

With the increasing popularity of Ajax-based solutions, an alternative to the use of an invisible image is to implement a call back to the server from the rendered page. In this case, when the page is rendered on the web browser, a piece of JavaScript code would call back to the server and pass information about the client that can then be aggregated by a web analytics company.

Logfile analysis vs page tagging


Both logfile analysis programs and page tagging solutions are readily available to companies that wish to perform web analytics. In some cases, the same web analytics company will offer both approaches. The question then arises of which method a company should choose. There are advantages and disadvantages to each approach.[11][12]

Advantages of logfile analysis


The main advantages of log file analysis over page tagging are as follows:

Advantages of page tagging


The main advantages of page tagging over log file analysis are as follows:

Economic factors


Logfile analysis is almost always performed in-house. Page tagging can be performed in-house, but it is more often provided as a third-party service. The economic difference between these two models can also be a consideration for a company deciding which to purchase.

Which solution is cheaper to implement depends on the amount of technical expertise within the company, the vendor chosen, the amount of activity seen on the websites, the depth and type of information sought, and the number of distinct websites needing statistics.

Regardless of the vendor solution or data collection method employed, the cost of web visitor analysis and interpretation should also be included, that is, the cost of turning raw data into actionable information. This may involve using third-party consultants, hiring an experienced web analyst, or training a suitable in-house person. A cost-benefit analysis can then be performed; for example, what revenue increase or cost savings can be gained by analyzing the web visitor data?

Hybrid methods


Some companies produce solutions that collect data through both log files and page tagging and can analyze both kinds. By using a hybrid method, they aim to produce more accurate statistics than either method on its own.[14]

Geolocation of visitors


With IP geolocation, it is possible to track visitors' locations. Using an IP geolocation database or API, visitors can be geolocated to city, region, or country level.[15]
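
A database lookup of this kind usually amounts to finding the address range that contains a given IP address. The Python sketch below uses the standard `ipaddress` and `bisect` modules over a sorted list of non-overlapping IPv4 ranges. The three rows are invented for illustration and use the documentation address blocks reserved by RFC 5737; real geolocation databases contain far more ranges, and the class name `GeoTable` is hypothetical.

```python
import bisect
import ipaddress

# Invented rows for illustration only, using the RFC 5737 documentation blocks.
SAMPLE_ROWS = [
    ("192.0.2.0/24", "NL", "Amsterdam"),
    ("198.51.100.0/24", "US", "Chicago"),
    ("203.0.113.0/24", "JP", "Tokyo"),
]


class GeoTable:
    """Hypothetical in-memory IPv4 range table answering 'which range holds this address?'."""

    def __init__(self, rows):
        ranges = []
        for cidr, country, city in rows:
            net = ipaddress.IPv4Network(cidr)
            # Store each network as an inclusive integer interval [first, last].
            ranges.append((int(net.network_address), int(net.broadcast_address), country, city))
        ranges.sort()  # sorted by first address; ranges are assumed not to overlap
        self._ranges = ranges
        self._starts = [first for first, _, _, _ in ranges]

    def lookup(self, ip: str):
        """Return (country, city) for an IPv4 address, or None if no range contains it."""
        addr = int(ipaddress.IPv4Address(ip))
        # Index of the last range whose first address is <= addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0 and addr <= self._ranges[i][1]:
            _, _, country, city = self._ranges[i]
            return country, city
        return None


geo = GeoTable(SAMPLE_ROWS)
print(geo.lookup("198.51.100.42"))  # ('US', 'Chicago')
print(geo.lookup("8.8.8.8"))        # None
```

Binary search keeps each lookup logarithmic in the number of ranges, which matters when a table holds millions of them.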

IP Intelligence, or Internet Protocol (IP) Intelligence, is a technology that maps the Internet and categorizes IP addresses by parameters such as geographic location (country, region, state, city and postcode), connection type, Internet Service Provider (ISP), proxy information, and more. The first generation of IP Intelligence was referred to as geotargeting or geolocation technology. This information is used by businesses for online audience segmentation in applications such as online advertising, behavioral targeting, content localization (or website localization), digital rights management, personalization, online fraud detection, localized search, enhanced analytics, global traffic management, and content distribution.

Click analytics

[Figure: Clickpath analysis, with referring pages on the left; arrows and rectangles of differing thickness and size represent the quantity of movement.]

Click analytics, also known as clickstream analysis, is a type of web analytics that gives special attention to clicks.

Commonly, click analytics focuses on on-site analytics. An editor of a website uses click analytics to determine the performance of their particular site with regard to where the site's users are clicking.

Click analytics may also happen in real time or "unreal" time, depending on the type of information sought. Typically, front-page editors on high-traffic news media sites will want to monitor their pages in real time to optimize the content. Editors, designers, or other stakeholders may analyze clicks over a wider time frame to help them assess the performance of writers, design elements, or advertisements.

Data about clicks may be gathered in at least two ways. Ideally, a click is "logged" when it occurs, and this method requires some functionality that picks up relevant information when the event occurs. Alternatively, one may institute the assumption that a page view is a result of a click, and therefore log a simulated click that led to that page view.
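
The second approach, inferring clicks from page views, can be sketched in Python using each page view's referrer: when the referrer is a page on the same site, the sketch assumes a link on that page was clicked. The site host name, the `(entry)` label for arrivals from outside the site, and the function name `inferred_clicks` are hypothetical. The inference misses clicks that do not load a new page, and it cannot attribute page views whose referrer the browser withholds.

```python
from collections import Counter
from urllib.parse import urlsplit

SITE_HOST = "www.example.com"  # hypothetical host name of the site being analyzed


def inferred_clicks(page_views):
    """page_views: iterable of (page_path, referrer_url or ""). Counts (from, to) transitions."""
    clicks = Counter()
    for page, referrer in page_views:
        ref = urlsplit(referrer) if referrer else None
        if ref is not None and ref.hostname == SITE_HOST:
            # Assume a link on the referring page was clicked to reach this page.
            clicks[(ref.path or "/", page)] += 1
        else:
            # No referrer, or one from another site: record an entry into the site.
            clicks[("(entry)", page)] += 1
    return clicks


views = [
    ("/news", "https://www.example.com/"),
    ("/news", "https://www.example.com/"),
    ("/sports", "https://www.example.com/news"),
    ("/", "https://search.example.net/?q=example"),
    ("/", ""),
]
for (source, target), count in inferred_clicks(views).most_common():
    print(f"{source} -> {target}: {count}")
```

The resulting (from, to) counts are the kind of data a clickpath diagram draws as arrows of varying thickness.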

Customer lifecycle analytics


Customer lifecycle analytics is a visitor-centric approach to measurement.[16] Page views, clicks and other events (such as API calls, access to third-party services, etc.) are all tied to an individual visitor instead of being stored as separate data points. Customer lifecycle analytics attempts to connect all the data points into a marketing funnel that can offer insights into visitor behavior and website optimization.[17] Common metrics used in customer lifecycle analytics include customer acquisition cost (CAC), customer lifetime value (CLV), customer churn rate, and customer satisfaction scores.[16]
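
Three of these metrics reduce to simple arithmetic. The Python sketch below uses one common simplified form of each, assuming a single reporting period, constant revenue, margin, and churn, and no discounting of future revenue. These formulas are textbook simplifications rather than definitions taken from the cited sources, and the figures in the usage example are hypothetical.

```python
def customer_acquisition_cost(acquisition_spend: float, new_customers: int) -> float:
    """CAC: what was spent on acquiring customers divided by how many were acquired."""
    return acquisition_spend / new_customers


def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Share of the customers present at the start of a period who left during it."""
    return customers_lost / customers_at_start


def simple_clv(revenue_per_period: float, gross_margin: float, churn: float) -> float:
    """Simplified CLV: margin per period divided by churn per period.

    With constant churn c, a customer's expected margin is m + m(1-c) + m(1-c)^2 + ...,
    a geometric series whose sum is m / c. No discounting is applied.
    """
    return revenue_per_period * gross_margin / churn


# Hypothetical figures for one month.
cac = customer_acquisition_cost(10_000.0, 250)  # 40.0 per customer
churn = churn_rate(1_000, 50)                   # 0.05, i.e. 5% per month
clv = simple_clv(30.0, 0.5, churn)              # about 300 per customer
print(f"CAC={cac:.2f}, churn={churn:.1%}, CLV={clv:.2f}, CLV/CAC={clv / cac:.1f}")
```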

Other methods


Other methods of data collection are sometimes used. Packet sniffing collects data by sniffing the network traffic passing between the web server and the outside world; it involves no changes to the web pages or web servers. Integrating web analytics into the web server software itself is also possible.[18] Proponents of both methods claim that they provide better real-time data than other methods.

Common sources of confusion in web analytics


The hotel problem


The hotel problem is generally the first problem encountered by a user of web analytics. The problem is that the unique visitors for each day in a month do not add up to the same total as the unique visitors for that month. To an inexperienced user, this appears to be a problem in whatever analytics software they are using. In fact, it is a simple property of the metric definitions.

The way to picture the situation is by imagining a hotel. The hotel has two rooms (Room A and Room B).

          Day 01   Day 02   Day 03   Total
Room A    John     John     Mark     2 unique users
Room B    Mark     Anne     Anne     2 unique users
Total     2        2        2        ?

As the table shows, the hotel has two unique users each day over three days. The sum of the totals with respect to the days is therefore six.

During the period each room has had two unique users. The sum of the totals with respect to the rooms is therefore four.

In fact, only three visitors (John, Mark, and Anne) have stayed in the hotel over this period. The problem is that a person who stays in a room for two nights is counted twice if they are counted once on each day, but only once if the total for the period is considered. Web analytics software counts unique visitors correctly for whichever time period is chosen, which leads to the apparent discrepancy when a user tries to compare the totals.
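
The arithmetic of the hotel example can be reproduced in a few lines of Python. The data structure (a mapping from each day to the occupant of each room) is an illustrative choice; the point it demonstrates is that unique counts must be recomputed from the underlying identifiers for each period, because they cannot be summed across sub-periods.

```python
# The hotel table: who occupied each room on each day.
occupancy = {
    "Day 01": {"Room A": "John", "Room B": "Mark"},
    "Day 02": {"Room A": "John", "Room B": "Anne"},
    "Day 03": {"Room A": "Mark", "Room B": "Anne"},
}


def uniques_per_day(occupancy):
    """Distinct guests on each day."""
    return {day: len(set(rooms.values())) for day, rooms in occupancy.items()}


def uniques_per_room(occupancy):
    """Distinct guests in each room over the whole period."""
    guests_by_room = {}
    for rooms in occupancy.values():
        for room, guest in rooms.items():
            guests_by_room.setdefault(room, set()).add(guest)
    return {room: len(guests) for room, guests in guests_by_room.items()}


def uniques_for_period(occupancy):
    """Distinct guests over the whole period, computed from the names themselves."""
    return len({guest for rooms in occupancy.values() for guest in rooms.values()})


print(sum(uniques_per_day(occupancy).values()))   # 6
print(sum(uniques_per_room(occupancy).values()))  # 4
print(uniques_for_period(occupancy))              # 3
```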

Analytics poisoning


As the internet has matured, the proliferation of automated bot traffic has become an increasing problem for the reliability of web analytics.[citation needed] As bots traverse the internet, they render web documents in ways similar to organic users and, as a result, may incidentally trigger the same code that web analytics use to count traffic. This incidental triggering of web analytics events affects the interpretability of the data and of the inferences drawn from it. IPM provided a proof of concept showing that Google Analytics and its competitors are easily triggered by common bot deployment strategies.[19]

Problems with third-party cookies


Historically, vendors of page-tagging analytics solutions have used third-party cookies sent from the vendor's domain instead of the domain of the website being browsed. Third-party cookies can handle visitors who cross multiple unrelated domains within the company's site, since the cookie is always handled by the vendor's servers.

However, third-party cookies in principle allow an individual user to be tracked across the sites of different companies, allowing the analytics vendor to collate the user's activity on sites where they provided personal information with their activity on other sites where they thought they were anonymous. Although web analytics companies deny doing this, other companies, such as companies supplying banner ads, have done so. Privacy concerns about cookies have therefore led a noticeable minority of users to block or delete third-party cookies. In 2005, some reports showed that about 28% of Internet users blocked third-party cookies and 22% deleted them at least once a month.[20] Most vendors of page tagging solutions have now moved to provide at least the option of using first-party cookies (cookies assigned from the client subdomain).

Another problem is cookie deletion. When web analytics depend on cookies to identify unique visitors, the statistics depend on a persistent cookie to hold a unique visitor ID. When users delete cookies, they usually delete both first- and third-party cookies. If this happens between interactions with the site, the user will appear as a first-time visitor at their next interaction point. Without a persistent and unique visitor ID, conversions, click-stream analysis, and other metrics that depend on the activities of a unique visitor over time cannot be accurate.

Cookies are used because IP addresses are not always unique to users and may be shared by large groups or proxies. In some cases, the IP address is combined with the user agent to identify a visitor more accurately when cookies are not available. However, this only partially solves the problem, because users behind a proxy server often have the same user agent. Other methods of uniquely identifying a user are technically challenging and would either limit the trackable audience or be considered suspicious. Cookies are the lowest common denominator that does not rely on technologies regarded as spyware, although having cookies enabled raises security concerns of its own.[21]
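
The partial fix described above, combining the IP address with the user agent when no cookie is present, can be sketched in Python as follows. The record layout and the function names are hypothetical, and hashing the pair only produces a compact key; it does not make the key more unique. The usage example reproduces the limitation noted in the text: three people behind one proxy with the same browser collapse into a single estimated visitor.

```python
import hashlib


def fallback_visitor_key(ip: str, user_agent: str) -> str:
    """Approximate visitor key for requests without a cookie: a hash of IP and User-Agent."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()


def estimate_unique_visitors(records):
    """records: dicts with 'ip', 'user_agent' and, if cookies are enabled, 'cookie_id'."""
    keys = set()
    for r in records:
        if r.get("cookie_id"):
            keys.add("cookie:" + r["cookie_id"])  # a persistent cookie ID is preferred
        else:
            keys.add("ipua:" + fallback_visitor_key(r["ip"], r["user_agent"]))
    return len(keys)


proxy_ip = "203.0.113.5"
same_browser = "Mozilla/5.0 (X11; Linux x86_64)"
# Three different people behind one proxy, all with cookies disabled.
no_cookies = [{"ip": proxy_ip, "user_agent": same_browser} for _ in range(3)]
print(estimate_unique_visitors(no_cookies))  # 1, although there are three people
```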

Secure analytics (metering) methods


Third-party information gathering is subject to whatever network limitations and security measures are in place. Countries, service providers, and private networks can prevent site visit data from reaching third parties. All the methods described above (and some other methods not mentioned here, such as sampling) share a central problem: they are vulnerable to manipulation (both inflation and deflation). This means these methods are imprecise and insecure (in any reasonable model of security). This issue has been addressed in several papers,[22][23][24][25] but to date the solutions suggested in these papers remain theoretical.

See also


References

  1. WAA Standards Committee. "Web analytics definitions." Washington DC: Web Analytics Association (2008).
  2. Nielsen, Janne (2021-04-27). "Using mixed methods to study the historical use of web beacons in web tracking". International Journal of Digital Humanities. 2 (1–3): 65–88. doi:10.1007/s42803-021-00033-4. ISSN 2524-7832. S2CID 233416836.
  3. Jansen, B. J. (2009). "Understanding user-web interactions via web analytics". Synthesis Lectures on Information Concepts, Retrieval, and Services. 1 (1): 1–102.
  4. Sng, Yun Fei (2016-08-22). "Study on Factors Associated With Bounce Rates on Consumer Product Websites". Business Analytics. World Scientific. pp. 526–546. doi:10.1142/9789813149311_0019. ISBN 978-981-314-929-8. Retrieved 2023-08-11.
  5. Menasalvas, Ernestina; Millán, Socorro; Peña, José M.; Hadjimichael, Michael; Marbán, Oscar (July 2004). "Subsessions: A granular approach to click path analysis". International Journal of Intelligent Systems. 19 (7): 619–637. doi:10.1002/int.20014.
  6. Chaffey, Dave; Patron, Mark (2012-07-01). "From web analytics to digital marketing optimization: Increasing the commercial value of digital analytics". Journal of Direct, Data and Digital Marketing Practice. 14 (1): 30–45. doi:10.1057/dddmp.2012.20. ISSN 1746-0174.
  7. "How a web session is defined in Universal Analytics - Analytics Help". support.google.com. Retrieved 2023-08-11.
  8. Zheng, G.; Peltsverger, S. (2015). "Web Analytics Overview". In Khosrow-Pour, Mehdi (ed.). Encyclopedia of Information Science and Technology, Third Edition. IGI Global.
  9. Clifton, Brian; Omega Digital Media Ltd. "Web Traffic Data Sources and Vendor Comparison".
  10. Jolibert, Alain; Dubois, Pierre-Louis; Mühlbacher, Hans; Flores, Laurent; Jolibert Dubois, Pierre-Louis (2012). Marketing Management: A Value-Creation Process (2nd ed.). p. 359.
  11. "Increasing Accuracy for Online Business Growth". A web analytics accuracy whitepaper.
  12. "Page Tagging vs. Log Analysis: An Executive White Paper" (PDF). Sawmill. 2008.
  13. "Revisiting Log File Analysis versus Page Tagging". McGill University Web Analytics blog (CMIS 530). Archived from the original on July 6, 2011. Retrieved February 26, 2010.
  14. "Page Tagging (cookies) vs. Log Analysis". Logaholic Web Analytics. 2018-04-25. Retrieved 2023-07-21.
  15. IPInfoDB (2009-07-10). "IP geolocation database". IPInfoDB. Retrieved 2009-07-19.
  16. Kitchens, Brent; Dobolyi, David; Li, Jingjing; Abbasi, Ahmed (2018-04-03). "Advanced Customer Analytics: Strategic Value Through Integration of Relationship-Oriented Big Data". Journal of Management Information Systems. 35 (2): 540–574. doi:10.1080/07421222.2018.1451957. ISSN 0742-1222. S2CID 49681142.
  17. Önder, Irem; Berbekova, Adiyukh (2022-08-10). "Web analytics: more than website performance evaluation?". International Journal of Tourism Cities. 8 (3): 603–615. doi:10.1108/IJTC-03-2021-0039. ISSN 2056-5607.
  18. Hu, Xiaohua; Cercone, Nick (1 July 2004). "A Data Warehouse/Online Analytic Processing Framework for Web Usage Mining and Business Intelligence Reporting". International Journal of Intelligent Systems. 19 (7): 585–606. doi:10.1002/int.v19:7.
  19. "Analytics Poisoning: A Short Review". IPM Corporation. 5 December 2020. Retrieved July 29, 2022.
  20. McGann, Rob (14 March 2005). "Study: Consumers Delete Cookies at Surprising Rate". Retrieved 3 April 2014.
  21. "Internet Cookies: Spyware or Neutral Technology?". CNET. February 2, 2005. Retrieved 24 April 2017.
  22. Naor, M.; Pinkas, B. (1998). "Secure and efficient metering". Advances in Cryptology – EUROCRYPT'98. Lecture Notes in Computer Science. Vol. 1403. p. 576. doi:10.1007/BFb0054155. ISBN 978-3-540-64518-4.
  23. Naor, M.; Pinkas, B. (1998). "Secure accounting and auditing on the Web". Computer Networks and ISDN Systems. 30 (1–7): 541–550. doi:10.1016/S0169-7552(98)00116-0.
  24. Franklin, M. K.; Malkhi, D. (1997). "Auditable metering with lightweight security". Financial Cryptography. Lecture Notes in Computer Science. Vol. 1318. p. 151. CiteSeerX 10.1.1.46.7786. doi:10.1007/3-540-63594-7_75. ISBN 978-3-540-63594-9.
  25. Johnson, R.; Staddon, J. (2007). "Deflation-secure web metering". International Journal of Information and Computer Security. 1: 39. CiteSeerX 10.1.1.116.3451. doi:10.1504/IJICS.2007.012244.
Bibliography

Retrieved from "https://en.wikipedia.org/w/index.php?title=Web_analytics&oldid=1226021144"
     



This page was last edited on 28 May 2024, at 03:36 (UTC).

Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply.