Wikipedia:Wikipedia Signpost/2012-08-27/Recent research








The Signpost


Recent research

New influence graph visualizations; NPOV and history; 'low-hanging fruit'

    By Piotr Konieczny, Sage Ross, Evan, Dario Taraborelli, Tilman Bayer and OrenBochman

    A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, edited jointly with the Wikimedia Research Committee and republished as the Wikimedia Research Newsletter.

    Wikipedia-based graphs visualize influences between thinkers, writers and musicians

    A visualization of musical genres related to psychedelic music, based on DBPedia data.

    In a blog post titled "Graphing the history of philosophy",[1] Simon Raper of the company MindShare UK describes how he constructed an influence graph of all philosophers using the "Influenced by" and "Influenced" fields of Template:Infobox philosopher (example: Plato). This information was retrieved using DBpedia with a simple SPARQL query. After some cleanup, the result, consisting of triplets in the form <Philosopher A, Philosopher B, Weight> was processed using the open source graph visualization package Gephi to create an impressive overview of the philosophers within their respective spheres of influence.
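    The triplet format described above can be sketched in a few lines. This is a minimal illustration, not Raper's actual pipeline: the function name `build_weighted_edges`, the sample pairs, and the rule that the weight counts how often a pair was extracted are all assumptions made for this example.

```python
from collections import Counter


def build_weighted_edges(pairs):
    """Collapse (influencer, influenced) pairs into sorted (A, B, weight) triplets.

    Assumption: the weight is simply the number of times a pair was extracted,
    e.g. once from an "Influenced" field and once from an "Influenced by" field.
    """
    cleaned = ((a.strip(), b.strip()) for a, b in pairs)
    # Drop empty names and self-loops, which are typical cleanup targets.
    counts = Counter((a, b) for a, b in cleaned if a and b and a != b)
    return sorted((a, b, w) for (a, b), w in counts.items())


def to_edge_csv(triplets):
    """Render triplets as a Source,Target,Weight edge list (hypothetical layout)."""
    lines = ["Source,Target,Weight"]
    lines.extend(f"{a},{b},{w}" for a, b, w in triplets)
    return "\n".join(lines)


# Illustrative input only; real data would come from the DBpedia query.
sample_pairs = [
    ("Socrates", "Plato"),
    ("Socrates ", "Plato"),   # duplicate with stray whitespace
    ("Plato", "Aristotle"),
    ("Plato", "Plato"),       # self-loop, removed during cleanup
]
edges = build_weighted_edges(sample_pairs)
print(to_edge_csv(edges))
```

    An edge list of this shape is the kind of table a graph tool can load as weighted, directed edges.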

    Brendan Griffen extended the idea to "everyone on Wikipedia. Well, everyone with an infobox containing ‘influences’ and/or ‘influenced by’", arriving at a huge, far more dense "Graph Of Ideas" including not only philosophers, but also novelists, fantasy and science fiction writers, and comedians.[2] In another blog post,[3] Griffen added transitive links as well – so that each person is considered to be influenced both directly and indirectly. The most connected people in the graph were ancient Greek thinkers, with Thales, Pythagoras and Zeno of Elea occupying the top three spots. Griffen remarks that this vindicates a statement in Bertrand Russell's History of Western Philosophy (1945): "Western Philosophy begins With Thales".
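    Adding transitive links amounts to counting everyone reachable from a person in the influence graph, not just direct neighbours. The sketch below assumes a plain breadth-first search over a toy graph with placeholder names; Griffen's posts may compute this differently.

```python
from collections import deque


def transitive_influence(graph, start):
    """Return everyone reachable from `start` by following influence edges.

    `graph` maps a person to the people they directly influenced.
    """
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        person = queue.popleft()
        if person in seen or person == start:
            continue  # skip repeats and cycles back to the start
        seen.add(person)
        queue.extend(graph.get(person, ()))
    return seen


def rank_by_reach(graph):
    """Rank people by how many others they influenced directly or indirectly."""
    reach = {person: len(transitive_influence(graph, person)) for person in graph}
    return sorted(reach.items(), key=lambda item: (-item[1], item[0]))


# Toy graph with placeholder names (not real influence data).
toy_graph = {"A": ["B"], "B": ["C", "D"], "C": [], "D": ["B"]}
ranking = rank_by_reach(toy_graph)
print(ranking)
```

    With transitive counting, early figures accumulate the reach of everyone downstream of them, which fits the observation that ancient thinkers come out on top.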

    Also inspired by Raper's posting, Tony Hirst posted a number of visualizations of the Wikipedia link and category structure (likewise using DBpedia and Gephi, queried via the Semantic Web Import plugin) to visualize related entries and influence graphs in the English Wikipedia. The blog posts (all of which include detailed step-by-step tutorials) examine the related graph of philosophers,[4] and also visualize an influence graph of programming languages[5] and one of musical genres related to psychedelic music.[6] All these visualizations and blog posts by Hirst are released under a Creative Commons Attribution license.

    Hirst also mentioned a related tool called "WikiMaps", the subject of a recent article in the International Journal of Organisational Design and Engineering.[7] As described in a press release, the tool provides a "map of what is 'important' on Wikipedia and the connections between different entries. The tool, which is currently in the 'alpha' phase of development, displays classic musicians, bands, people born in the 1980s, and selected celebrities, including Lady Gaga, Barack Obama, and Justin Bieber. A slider control, or play button, lets you move through time to see how a particular topic or group has evolved over the last 3 or 4 years." A demo version is available online.

    See also the recent coverage of a similar visualization, based on wikilinks instead of infoboxes: "The history of art mapped using Wikipedia"

    Information retrieval scientists turn their attention to Wikipedia's page view logs

    Found to be connected to the "#euro2012" hashtag by analyzing Wikipedia pageviews: Euro 2012 football championship

    The Time-aware Information Access workshop at this year's SIGIR (Special Interest Group on Information Retrieval) conference brought a wave of attention to Wikipedia's public page-view logs. Detailing the number of page views per hour for every Wikipedia project, these files figure prominently in a variety of open-source intelligence applications presented at the workshop.

    A group of researchers from ISLA, University of Amsterdam created an API providing access to this data and performing simple analysis tasks.[8] Though the site appears to be down at the time of writing, the API supports retrieving a particular article's page-view time series as well as searching for other Wikipedia articles based on the similarity of their time series. In addition to machine-readable JSON results, the API will supply simple plots in PNG format. While the idea of providing page-specific time series is not new, support for finding other pages with similar viewing patterns highlights a fascinating new use for Wikipedia page views.
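    Searching for pages with similar viewing patterns can be illustrated with a small similarity ranking. The summary above does not say which similarity measure the API uses, so the sketch below assumes Pearson correlation; the function names and the toy series are invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def most_similar(target, series_by_title, top_n=3):
    """Rank other pages by how closely their page views track the target's."""
    scored = [
        (pearson(series_by_title[target], series), title)
        for title, series in series_by_title.items()
        if title != target
    ]
    scored.sort(reverse=True)
    return [(title, round(score, 3)) for score, title in scored[:top_n]]


# Toy hourly page-view counts for placeholder page titles.
views = {
    "Page A": [1, 2, 3, 4],
    "Page B": [2, 4, 6, 8],   # rises together with Page A
    "Page C": [4, 3, 2, 1],   # falls while Page A rises
}
similar = most_similar("Page A", views)
print(similar)
```

    Correlation ignores absolute traffic levels, so a small page can match a large one if their shapes agree; whether that is desirable depends on the application.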

    Two other papers are combining Wikipedia page-view information with external time-series data sets. On the intuition that Wikipedia page views should have a strong correlation with real-world events, researchers from the University of Glasgow and Microsoft built a system to detect which hashtags frequently queried on Bing Social Search were event-related.[9] For example, the hashtag #thingsthatannoyme doesn't clearly correspond to an event, whereas a hashtag like "#euro2012" is about the UEFA European Football Championship. After tokenizing the hashtags into a list of words, the researchers queried Wikipedia for those terms and correlated the time series of hashtag search popularity with the page-view time series for the articles which are returned. This correlation score can be used to indicate which hashtags are likely to be about events, a useful feature for web searches and any other temporally aware zeitgeist application.
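    The tokenization step can be sketched as follows. How the paper splits hashtags is not described in this summary; the example assumes a dictionary-based segmentation with a tiny, hypothetical vocabulary, and keeps digit runs such as years as separate tokens.

```python
import re


def segment(text, vocab):
    """Split `text` into the fewest vocabulary words, or return None if impossible."""
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is not None and word in vocab:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


def tokenize_hashtag(tag, vocab):
    """Turn a hashtag into a list of query terms."""
    body = tag.lstrip("#").lower()
    tokens = []
    for chunk in re.findall(r"[a-z]+|\d+", body):
        if chunk.isdigit():
            tokens.append(chunk)  # keep numbers such as years intact
        else:
            # Fall back to the unsplit chunk when the vocabulary cannot cover it.
            tokens.extend(segment(chunk, vocab) or [chunk])
    return tokens


# Hypothetical vocabulary; a real system would use a much larger word list.
VOCAB = {"things", "thing", "that", "annoy", "me", "an", "no", "euro"}

print(tokenize_hashtag("#euro2012", VOCAB))
print(tokenize_hashtag("#thingsthatannoyme", VOCAB))
```

    The resulting terms would then be used to look up candidate articles whose page-view series are compared against the hashtag's popularity over time.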

    In a similar vein, researchers from the University of Edinburgh and University of Glasgow used the Wikipedia page-view stream to tackle the problem known as first-story detection (FSD), which aims to automatically pick out the first publication relating to a new topic of interest.[10] While traditional techniques primarily focus on newswire or Twitter, the authors used a combination of Twitter and Wikipedia page views to construct an improved FSD system. To improve on state-of-the-art Twitter-only FSD systems, the authors aimed to filter out false positives by checking that the Twitter-based first stories corresponded to a Wikipedia page that was also experiencing heightened traffic during the same period.

    Using a simple outlier detection method, the authors created a set of Wikipedia pages with unexpectedly high page views for each hour. Each Twitter-based first story (tweet) was then matched against the corresponding collection of Wikipedia outliers, employing an undisclosed metric of textual similarity that uses only the Wikipedia page titles. If the tweet failed to match any spiking Wikipedia page, it was down-weighted as a first story candidate. The authors showed that this combined approach improves FSD precision in comparison to a Twitter-only baseline for all but the most popular Twitter-based stories. Though this research makes advances on the difficult task of first-story detection, perhaps the most immediately useful finding is that Wikipedia page views appear to lag behind Twitter activity by roughly two hours. In general, we can expect to see an increasing number of joint models over various open-source intelligence streams as we learn exactly what each stream is useful for and the relationships between the streams.
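    The filtering idea can be sketched under stated assumptions. The paper's outlier method and similarity metric are not given here, so this example substitutes a z-score test for the outlier step and word overlap with the page title for the undisclosed metric; the thresholds, penalty factor, and sample data are all hypothetical.

```python
import re
import statistics


def spiking_pages(history, current, z_threshold=3.0):
    """Return titles whose current hourly views are far above their past mean.

    Assumption: a z-score test stands in for the paper's outlier method.
    """
    spikes = set()
    for title, past in history.items():
        if len(past) < 2:
            continue
        mean = statistics.mean(past)
        stdev = statistics.stdev(past)
        if stdev > 0 and (current.get(title, 0) - mean) / stdev > z_threshold:
            spikes.add(title)
    return spikes


def title_overlap(tweet, title):
    """Fraction of title words that also appear in the tweet (stand-in metric)."""
    tweet_words = set(re.findall(r"\w+", tweet.lower()))
    title_words = set(re.findall(r"\w+", title.lower()))
    return len(tweet_words & title_words) / len(title_words) if title_words else 0.0


def rescore(tweet, novelty, spikes, min_overlap=0.5, penalty=0.5):
    """Down-weight a first-story candidate that matches no spiking page."""
    if any(title_overlap(tweet, title) >= min_overlap for title in spikes):
        return novelty
    return novelty * penalty


# Invented page titles and counts, for illustration only.
history = {
    "Example Singer": [100, 110, 90, 105, 95],
    "Main Page": [1000, 1010, 990, 1005, 995],
}
current = {"Example Singer": 5000, "Main Page": 1002}
spikes = spiking_pages(history, current)
print(spikes)
print(rescore("Example singer cancels tour", 0.8, spikes))
print(rescore("lol my cat", 0.8, spikes))
```

    Given the reported lag of roughly two hours, a real system would presumably compare a tweet against page-view spikes from a later time window rather than the same hour.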

    See also the Signpost coverage of a small study of the highest hourly page views on the English Wikipedia during January-July 2010, and their likely causes: "Page view spikes"

    The limits of amateur NPOV history

    In "The inclusivity of Wikipedia and the drawing of expert boundaries: An examination of talk pages and reference lists"[11], information studies professor Brendan Luyt of Nanyang Technological University looks at History of the Philippines, a B-class article that had featured article status from October 2006 until it was delisted at the conclusion of its featured article review in January 2011.

    Luyt argues that talk-page discussions, the types of sources cited, and the organization of the article itself, all point to a very traditional view of what constitutes history: in short, great man history concerned mainly with political and military events, and the actions of elites. This style of history does not capture the breadth of approaches used by professional historians, so does not live up to the ideal of NPOV in which all significant viewpoints published in reliable sources are represented fairly and proportionately. In practice, Luyt shows, editors (lacking sufficient knowledge of the relevant professional historical literature) end up using arguments over bias and NPOV to construct a limited and conservative historical narrative—for this article at the least, although a similar pattern could be found for many broad historical topics.

    The sources cited are primarily what Luyt calls "textbookese" summaries, easily available online, which focus on bare facts without the historical debates that surround them. Between the valid sources and experts recognized by Wikipedia editors and the good-faith use of the NPOV principle to limit other viewpoints, Luyt concludes that—rather than being more inclusive of diverse views and sources than the typical "expert" community—Wikipedia in practice recognizes a considerably narrower set of viewpoints.

    Three new papers about Wikipedia class assignments

    An article titled "Assigning Students to edit Wikipedia: Four Case Studies"[12] presents the experiences of four professors who participated in the Wikipedia Education Program, in six courses in total (two of the four instructors taught two classes each). The lessons from the assignments included: 1) the importance of strict deadlines, even for graduate classes; 2) having a dedicated class for acquiring skills in editing and for understanding Wikipedia policies, or spreading this over segments of several classes; 3) the benefits of having students interact with the campus ambassadors and the wider Wikipedia community.

    Overall, the instructors saw that compared with their engagement in traditional assignments, students were more highly motivated, produced work of higher quality, and learned more skills (primarily, related to using Wikipedia, such as being able to better judge its reliability). Wikipedia itself benefited from several dozen created or improved articles, a number of which were featured as DYKs. The paper presents a useful addition to the emerging literature on teaching with Wikipedia, as one of the first serious and detailed discussions of specific cases of this new educational approach.

    "Integrating Wikipedia Projects into IT Courses: Does Wikipedia Improve Learning Outcomes?"[13] is another paper that discusses the experiences of instructors and students involved in the recent Wikipedia:Global Education Program. Like most existing research in this area, the paper is roughly positive in its description of this new educational approach, stressing the importance of deadlines, small introductory assignments familiarizing students with Wikipedia early in the course, and the importance of close interactions with the community. A poorly justified (or explained) deletion or removal of content can be quite a stressful experience to students (and the newbie editors are unlikely to realize that an explanation may be left in an edit summary or page-deletion log). A valuable suggestion in the paper was that instructors (professors) make edits themselves, so they would be able to discuss editing Wikipedia with students with first-hand experience instead of directing students to ambassadors and how-to manuals; and to dedicate some class time to discussing Wikipedia, the assignment, and collective editing.

    A four-page letter[14] in the Journal of Biological Rhythms by a team of 48 authors reported on a similar undergraduate class project in early 2011, where 46 students edited 15 Wikipedia articles in the field of chronobiology, aiming at good article status. After their first edits, they were systematically given feedback by one "Wikipedia editor and 6 experts in chronobiology" before continuing their edits (in the paper's acknowledgements the authors also thank "innumerable Wikipedia editors who critiqued student edits"). Because of the high visibility of the results – most of the articles were ranked top in Google results – students found the experience rewarding. Topics were selected collaboratively by the class, and because students came up with a relatively small number of suggestions, one concern was that the project might, if repeated, run out of article topics in the given subject area.

    A literature review presented at July's Worldcomp'12 conference in Las Vegas about "Wikipedia: How Instructors Can Use This Technology As A Tool In The Classroom"[15] also recommended having students actively edit Wikipedia (as well as practice reading it critically), and concluded that "it is time to embrace Wikipedia as an important information provider and one of the innovative learning tools in the educators' toolbox."

    Substantive and non-substantive contributors show different motivation and expertise

    "Investigating the determinants of contribution value in Wikipedia"[16] reports the results of a survey of Wikipedians who were asked their opinion about the "contribution value" of their edits (measured by agreement to statements such as "your contribution to Wikipedia is useful to others"), which was then related to various characteristics.

    The researchers used Google to obtain a list of 1976 Wikipedia users' email addresses (using keywords such as "gmail.com" or "hotmail.com"). They sent invitation emails that provided the URL to the online questionnaire. In six weeks, 234 editors completed all the questions. Of these, 205 – nine females and 196 males – supplied a valid user name and were considered in the rest of the analysis (anonymous editors were removed).

    A content analysis was performed of 50 randomly selected edits by each respondent (or all, if the user had fewer than 50 edits), classifying them as "substantive" changes (e.g. "add links, images, or delete inaccurate content") and "non-substantive changes" (e.g. "reorganizing existing content [or] correcting grammatical mistakes and formatting texts to improve the presentation"), corresponding to "two [proposed] new contributor types in Wikipedia to discriminate their editing patterns."

    An attempt was made to relate this to the "contribution value" the respondents assigned to their own edits, and to their responses in two other areas:

    The "breadth" of interests and resources was defined as the number of ratings above a certain threshold in each, and the "depth" as the highest rating assigned in each.
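    The two definitions are easy to state as code. The rating scale and the threshold value below are assumptions made for illustration; the summary above does not give the scale or the cut-off used in the paper.

```python
def breadth_and_depth(ratings, threshold):
    """Return (breadth, depth) for one respondent's ratings in one area.

    breadth: number of ratings strictly above `threshold`
    depth:   the highest rating assigned
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    breadth = sum(1 for rating in ratings if rating > threshold)
    depth = max(ratings)
    return breadth, depth


# Hypothetical ratings of five interest topics on an assumed 1-7 scale.
interest_ratings = [2, 5, 7, 6, 1]
print(breadth_and_depth(interest_ratings, threshold=4))
```

    A respondent with one very high rating and many low ones scores high on depth but low on breadth, which is the contrast the authors' recommendation relies on.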

    In an "important consideration for practitioners", the authors wrote that:

    "[T]o produce valuable contributions, users with high depth of interests and resources should be encouraged to concentrate their efforts on substantive changes. Meanwhile, for users with high breadth of interests and resources, wiki practitioners should advise them to pay more attention to nonsubstantive changes. The findings imply that practitioners can try to identify two distinct types of users. To achieve this objective, they may develop certain algorithms in wikis to automatically detect the frequencies of substantive/non-substantive changes of users. ... For example, notification messages about wiki articles that need substantive changes can be sent to users who have high levels of depth of interests and resources. Similarly, well-prepared messages about articles that need non-substantive changes can be delivered to users who have high levels of breadth of interests and resources."

    Is there systemic bias in Wikipedia's coverage of the Tiananmen protests?

    Remembrance of the 20th anniversary of the June 4 events in Hong Kong (replica of the "goddess of democracy" statue)
    Remembrance in the West (replica of the same statue at the University of British Columbia, Canada)

    Wikipedia: Remembering in the digital age[17] is a master's dissertation by Simin Michelle Chen, examining collective memories as represented on the English Wikipedia; she looked at how significant events are portrayed (remembered) on the project, focusing on the Tiananmen Square Protests of 1989. She compared how this event was framed in articles by The New York Times and Xinhua News Agency, and in Wikipedia, where she focused on a content analysis of Talk:Tiananmen Square protests of 1989 and its archives.

    Chen found that the way Wikipedia frames the event is much closer to that of The New York Times than the sources preferred by the Chinese government, which, she notes, were "not given an equal voice" (p. 152). This English Wikipedia article, she says, is of major importance to China, but is not easily influenced by Chinese people, due to language barriers, and discrimination against Chinese sources that are perceived by the English Wikipedia as unreliable – that is, more subject to censorship and other forms of government manipulation than Western sources. She notes that this leads to on-wiki conflicts between contributors with different points of view (she refers to them as "memories" throughout her work), and usually the contributors who support the Chinese government POV are "silenced" (p. 152). This leads her to conclude that different memories (POVs) are weighted differently on Wikipedia. While this finding is not revolutionary, her case study up to this point is a valuable contribution to the discussion of Wikipedia biases.

    While Chen makes interesting points about the existence of different national biases, which impact editors' very frames of reference, and different treatment of various sources, her subsequent critique of Wikipedia's NPOV policy is likely to raise some eyebrows (pp. 48–50). She argues that NPOV is flawed because "it is based on the assumption that facts are irrefutable" (p. 154), but that those facts are based on different memories and cultural viewpoints, and thus should be treated equally, instead of some (Western) being given preference. Subsequently, she concludes that Wikipedia contributes to "the broader structures of dominance and Western hegemony in the production of knowledge" (p. 161).

    While she acknowledges that official Chinese sources may be biased and censored, she does not discuss this in much detail, and instead seems to argue that the biases affecting those sources are comparable to those affecting Western sources. In other words, she is saying that while some claim Chinese sources are biased, others claim that Western sources are biased, and because the English Wikipedia is dominated by Western editors, their bias triumphs – whereas ideally, all sources should be acknowledged, to reduce the bias. The suggestion is that Wikipedia should reject NPOV and accept sources currently deemed unreliable. Her argument about the English Wikipedia having a Western bias is not controversial and was discussed by the community before (although Chen does not seem to be aware of it, and does not use the term "systemic bias" in her thesis), and reducing this bias (by improving our coverage of non-Western topics) is even a goal of the Wikimedia Foundation. However, while she does not say so directly, it appears to this reviewer that her argument is: "if there are no reliable non-Western sources, we should use the unreliable ones, as this is the only way to reduce the Western bias affecting non-Western topics". Her ending comment that Wikipedia fails to live up to its potential and to deliver a "postmodern approach to truth" brings to mind the community discussions about verifiability not truth (the existence of these debates she briefly acknowledges on p. 48).

    Overall, Chen's discussion of biases affecting Wikipedia in general, and of Tiananmen Square Protests in particular, is useful. The thesis however suffers from two major flaws. First, the discussion of Wikipedia's policies such as reliable sources and verifiability (not truth ...) seems too short, considering that their critique forms a major part of her conclusions. Second, the argumentation and accompanying value judgements that Wikipedia should stop discriminating against certain memories (POVs) are not convincing, lacking a proper explanation of the reasons why the Wikipedia community made those decisions favoring verifiability and reliable sources over inclusion of all viewpoints. Chen argues that Wikipedia sacrifices freedom and discriminates against some memories (contributors), which she seems to see as more of a problem than if Wikipedia were to accept unreliable sources and unverifiable claims.

    "Low-hanging fruit hypothesis" explains Wikipedia's slowed growth?

    A student paper titled "Wikipedia: nowhere to grow"[18] from a Stanford class about "Mining Massive Data Sets" argues for the "low-hanging fruit hypothesis" as one factor explaining the well-known observation that "since 2007, the growth of English Wikipedia has slowed, with fewer new editors joining, and fewer new articles created". The hypothesis is described as follows: "the larger [Wikipedia] becomes, and the more knowledge it contains, the more difficult it becomes for editors to make novel, lasting contributions. That is, all of the easy articles have already been created, leaving only more difficult topics to write about". The authors break this hypothesis into three smaller ones that are easier to test – that (1) there has been a slowing in edits across many languages with diverse characteristics; (2) older articles are more popular to edit; and (3) older articles are more popular to read. They find support for all three of the smaller hypotheses, which they argue supports their main low-hanging fruit hypothesis.

    While the overall study seems well-designed, the extrapolation from the three subhypotheses to the parent hypothesis seems problematic. The authors do not provide a proper operationalization of terms such as "novel", "lasting", and "easy/difficult", making it difficult to enter into a discourse without risking miscommunication. There may be at least four main issues in the work:

    Overall, the paper presents four hypotheses, three of which seem to be well supported by data, and contribute to our understanding of Wikipedia, but their main claim seems rather controversial and poorly supported by their data and argumentation.

    See also the coverage of a related paper in a precursor of this research report last year: "IEEE magazine summarizes research on sustainability and low-hanging fruit"

    Briefly

    References

    1. Raper, Simon: Graphing the history of philosophy. Drunks and Lampposts, June 13, 2012
    2. Brendan Griffen: The Graph Of Ideas. Griff's Graphs, July 3, 2012
    3. Brendan Griffen: The Graph Of Ideas 2.0. Griff's Graphs, July 20, 2012
    4. Hirst, Tony (2012). Visualising related entries in Wikipedia using Gephi. OUseful.Info, July 3, 2012
    5. Hirst, Tony (2012). Mapping how Programming Languages Influenced each other According to Wikipedia. OUseful.Info, July 3, 2012
    6. Hirst, Tony (2012). Mapping related Musical Genres on Wikipedia with Gephi. OUseful.Info, July 4, 2012
    7. "Wikimaps: dynamic maps of knowledge" in Int. J. Organisational Design and Engineering, 2012, 2, 204–224
    8. Peetz, M. H., Meij, E., & de Rijke, M. (2012). OpenGeist: Insight in the Stream of Page Views on Wikipedia. SIGIR 2012 Workshop on Time-aware Information Access (#TAIA2012).
    9. Whiting, S., Alonso, O., & View, M. (2012). Hashtags as Milestones in Time. SIGIR 2012 Workshop on Time-aware Information Access (#TAIA2012).
    10. Osborne, M., Petrovic, S., McCreadie, R., Macdonald, C., & Ounis, I. (2012). Bieber no more: First Story Detection using Twitter and Wikipedia. SIGIR 2012 Workshop on Time-aware Information Access (#TAIA2012).
    11. Luyt, B. (2012). The inclusivity of Wikipedia and the drawing of expert boundaries: An examination of talk pages and reference lists. Journal of the American Society for Information Science and Technology, 63(9), 1868–1878. doi:10.1002/asi.22671
    12. Carver, B., Davis, R., Kelley, R. T., Obar, J. A., & Davis, L. L. (2012). Assigning Students to Edit Wikipedia: four case studies. E-Learning and Digital Media, 9(3), 273–283.
    13. Patten, K., & Keane, L. (2012). Integrating Wikipedia Projects into IT Courses: Does Wikipedia Improve Learning Outcomes? AMCIS 2012 Proceedings.
    14. Chiang, C. D., Lewis, C. L., Wright, M. D. E., Agapova, S., Akers, B., Azad, T. D., Banerjee, K., et al. (2012). Learning chronobiology by improving Wikipedia. Journal of Biological Rhythms, 27(4), 333–36.
    15. Hogg, J. L. (2012). Wikipedia: How Instructors Can Use This Technology As A Tool In The Classroom. Worldcomp'12.
    16. Zhao, S. J., Zhang, K.Z.K., Wagner, C., & Chen, H. (2012). Investigating the determinants of contribution value in Wikipedia. International Journal of Information Management. doi:10.1016/j.ijinfomgt.2012.07.006
    17. Chen, Simin Michelle (2012): Wikipedia: Remembering in the digital age. University of Minnesota MA thesis. June 2012.
    18. Austin Gibbons, David Vetrano, Susan Biancani (2012). Wikipedia: Nowhere to grow
    19. Rienco Muilwijk: Trust in online information. A comparison among high school students, college students and PhD students with regard to trust in Wikipedia. University of Twente, February 2012
    20. von Muhlen, M., & Ohno-Machado, L. (2012). Reviewing social media use by clinicians. Journal of the American Medical Informatics Association: JAMIA, 19(5), 777–81. doi:10.1136/amiajnl-2012-000990
    21. Ferschke, O., Gurevych, I., & Rittberger, M. (2012). FlawFinder: A Modular System for Predicting Quality Flaws in Wikipedia. Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN) Workshop (PAN @CLEF 2012), Rome.
    22. Suzuki, Y., & Yoshikawa, M. (2012). QualityRank: assessing quality of Wikipedia articles by mutually evaluating editors and texts. 23rd ACM Conference on Hypertext and Social Media (HT 2012).

    Discuss this story


    Low-hanging fruit

    I totally buy the low-hanging fruit hypothesis. Maybe the authors of that paper could have done more to clarify their definitions but they still convince me. Signpost misses the point when it argues that it is easier to write about a relatively obscure area than expand an existing article in a well-known area. Yes. That is true. But in the early days:

    1. It wasn't so much about expanding an article; it was about creating a new one.
    2. There were fewer rules and much less bureaucracy to deal with when either creating or expanding.

    Some of the rules and bureaucracy are required, to protect what we have already got and enable such a large group of editors to work together. Maybe that is a missing piece of the puzzle that the Stanford authors could have mentioned, but it does actually fit in nicely with the concept of low-hanging fruit. When you have fewer editors and fewer, lower-quality articles you don't need so much bureaucracy, making it easier to pick that fruit.

    Yaris678 (talk) 11:44, 29 August 2012 (UTC)

    While you may be right, this is not what the authors said, and our (mine...) critique was more of what they actually said :) --Piotr Konieczny aka Prokonsul Piotrus| reply here 16:50, 30 August 2012 (UTC)

    Hi! I am one of the authors of the paper. Thanks for the responses Piotr and Yaris.

    Piotr, I agree that we did not define our terms very precisely - I think this is in part an artifact of this paper being coupled with a verbal presentation, but I will make efforts to define our terms more precisely. I also agree that we left some areas under-explored; the quarter ends unfortunately quickly. Our initial explorations included attempts at quantifying "deletionism" and its resulting influence, but we did not find anything promising, and were not able to come back to the topic in time. I am however opposed to your claim that the Missing Articles list offers support against the low-hanging fruit argument with respect to ability to create articles. Indeed, I would almost consider that list a supporting argument that expert or esoteric knowledge is required! I should like to quantify both people's ability and desire to approach these subjects to see what comes out. Ultimately, I do concede that our three supporting arguments don't conclusively prove our core hypothesis - personally I think they only lend weight to the view that amongst our initial three hypotheses the low-hanging fruit idea is the most likely. Nevertheless, I hope it can act as a starting point should anyone explore further.

    Yaris, we tried to quantify the effect of rules and bureaucracy in the following manner :

    1. Create features for editors (such as # posts added, # posts reverted/deleted, # topics touched, etc.)
    2. Cluster editors
    3. Assign labels to the clusters manually (the hope being there would be a cluster of "novices", of "reverters", etc.)
    4. Observe how these clusters change over time
    5. Ideally, Observe the reverters cluster grow
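    Steps 1 and 2 of this procedure can be sketched as follows. The comment does not say which features or clustering algorithm were tried, so the example assumes plain k-means with a deterministic start and two invented per-editor features (edits added, edits reverted).

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster feature vectors with a basic k-means loop.

    Assumption: the first k points seed the centroids, which keeps the
    sketch deterministic; real use would want a better initialisation.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each editor to the nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Invented features per editor: (edits added, edits reverted).
editor_features = [(1, 0), (50, 40), (2, 1), (60, 45), (1, 1)]
labels, centers = kmeans(editor_features, k=2)
print(labels)
```

    Labelling the clusters (step 3) and tracking their sizes over time (steps 4 and 5) would still be manual or require repeating the clustering per time window.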

    For whatever reason (feature selection, algorithm, who knows) we couldn't get good, consistent clusters. This led us to do the statistical analysis we did.

    I would like to see someone better quantify the effect of bureaucracy. I think it is there, even though I do consider it to play a smaller role than the low-hanging fruit hypothesis. I interviewed five editors, and while four of them did not think the culture was too terrible, one of them spoke vehemently against it, saying he never edited a talk page because it was filled with bickering and flame wars.

    Thanks again for the feedback. If anyone should care to contact me (or one of the other authors) I will come visit this page again or you can contact me by email by clicking on my user name below and then clicking on "Email this user" in the "toolbox" menu on the left.

    AustinGibbons (talk) 22:07, 30 August 2012 (UTC)

    One more thing - I strongly encourage anyone looking at Wikipedia trends to do so across many languages! We observed many more interesting patterns than just what we presented. Particularly with respect to those pesky Wikipedia bots!

    AustinGibbons (talk) 22:08, 30 August 2012 (UTC)

    Short comments

    NPOV History

    China #2

    Doctors

    I can hear it now: "I stayed at a Holiday Inn Express last night and I read about your surgery on Wikipedia. Now where am I supposed to start cutting?"--ukexpat (talk) 17:56, 30 August 2012 (UTC)

    Predicting quality flaws

    FlawFinder sounds very interesting. I can think of at least two ways it could be used:

    1. A WP:STiki-like system that suggests articles to look at and tag or improve.
    2. A system to look at the history of an article and how the probabilities of faults changed over time. This may give people a clue as to when an article got messed up.

    Yaris678 (talk) 21:02, 2 September 2012 (UTC)

    Late comment, but whatever: I cited the low-hanging fruit hypothesis in this op ed ages ago. ResMar 21:32, 9 February 2013 (UTC)



    Retrieved from "https://en.wikipedia.org/w/index.php?title=Wikipedia:Wikipedia_Signpost/2012-08-27/Recent_research&oldid=1193869844"

     



    This page was last edited on 6 January 2024, at 01:49 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.


