Why there isn’t an Apache Arrow article in Wikipedia

TallGuyShort · on Dec 18, 2019

Having worked on commercially-resold Apache projects, can't say I argue with Wikipedia a whole lot on this. It seems to me it should be let in, but it's a bit silly to go and call them out like this on a corporate blog, IMO.

Dremio does benefit from Apache Arrow publicity and notoriety, even if they don't profit directly. Having a de-facto standard data format and open-source engines is a selling point for some. That's why Dremio explicitly calls it out on their own website. It also never hurts in the recruiting department. (edit: there's a reason the article was submitted by someone working in marketing & strategy)

> I’m wondering if Wikipedia can continue to be considered a reliable source of information for technical folks who want to learn more about the vast system of Apache open source software projects.

Sign up for the Olympics, because that's a hell of a leap. You didn't get your page in; that's really not much of a reflection on the rest of Wikipedia. It's an open-source project. It should have its own freely available documentation that serves much the same purpose anyway. If I want to learn about Apache X, I go straight to x.apache.org. They concede that it's not an end-user product anyway, so I'd think their key audience knows how to find an open-source project website. Lower the bar too far the other way, and there are plenty of semi-open-source projects whose marketing departments would be all over using Wikipedia to their own ends - I've seen my own former employer do this for their Apache projects.

sitkack · on Dec 18, 2019

This was posted by the director of marketing, on their marketing blog... And the Wikipedia article mentions "efficient, effective, optimized" multiple times in the introduction paragraph. Compare the column-store article [1], which the OP's article links to: it uses that kind of language only once, at the end, and in weaker terms.

As it stands, the Apache Arrow entry reads like a press release. I would recommend that Justin have a non-marketing copy editor clean it up before pressing the case further.

[1] https://en.wikipedia.org/wiki/Column-oriented_DBMS

duskwuff · on Dec 18, 2019

> I’m wondering if Wikipedia can continue to be considered a reliable source of information for technical folks who want to learn more about the vast system of Apache open source software projects.

I'm confused why the writer thinks it should be!

The Apache Foundation is a big tent. There are some clearly notable projects in there (like Apache httpd), but there's also a lot of really obscure crap that basically nobody outside ASF cares about (like Apache Creadur or Apache Pony Mail). Expecting Wikipedia to document every Apache project is ridiculous.

Is this particular project notable enough for a Wikipedia article? I don't know a lot about it, so I can't say for sure. But the article drafts that I've seen don't convince me that it is.

lallysingh · on Dec 18, 2019

> Expecting Wikipedia to document every Apache project is ridiculous.

Wikipedia has a lot of rather obscure entries. Long, long lists of things that I think easily rank below an Apache project in notability.

I'm not saying you're wrong, but the bar for notability is kinda vague. Lists of every episode of a series, every kind of kimchi, etc.

Semiapies · on Dec 19, 2019

I would have said Wikipedia is a questionable source on "open source software projects", period.

The idea that Apache software projects are too obscure to include, while every single individual episode of Buffy the Vampire Slayer has a detailed article, is pretty typical of the site.

tshaddox · on Dec 19, 2019

On some occasions I would love to have articles about open source projects that are written in the high-quality, neutral-point-of-view tone that Wikipedia encourages. A project's own API docs are obviously going to be a better source for, well, the API, but a blunt description of what a project is is something Wikipedia can definitely provide, and I don't see why they don't.

I certainly use Wikipedia to understand what various companies are. There are companies whose own websites seem deliberately designed to obfuscate what the company is. I don't see why Wikipedia couldn't provide the same benefit for open source projects.

JohnFen · on Dec 18, 2019

> Sign up for the Olympics, because that's a hell of a leap.

I agree. Personally, I don't think I've ever used Wikipedia to learn about an OSS project.

wenc · on Dec 19, 2019

> Personally, I don't think I've ever used Wikipedia to learn about an OSS project.

I think I might be part of the silent majority that actually does -- I often use Wikipedia to learn about the origin story of an OSS project. (not random tiny OSS projects, but more established ones)

Project websites don't tell you stuff like the original author, key people, context, adjacent categories of software, the history, the original problem that it was trying to solve, the drama (fights, competitors, disagreements between folks involved), and evolution of the project over time. The Wikipedia article often does.

This type of intelligence is invaluable when evaluating projects/products. If you're not wiki'ing your OSS project, you'd have to google and wade through mailing lists and piece together the story from blogposts, tweets, etc.

Here are some examples:

https://en.wikipedia.org/wiki/Spanner_(database)

https://en.wikipedia.org/wiki/Cockroach_Labs

https://en.wikipedia.org/wiki/QEMU

https://en.wikipedia.org/wiki/SQLite

romanows · on Dec 18, 2019

FYI, there will sometimes be nice comparison tables to make comparing software easier. For example: https://en.wikipedia.org/wiki/Comparison_of_deep-learning_so...

bjourne · on Dec 18, 2019

There is a set of policies that Wikipedia is supposed to follow when it comes to deciding if a page should be in or not. Nothing in this set of policies disqualifies a page if it benefits a company. Or even if it was written by employees of that company.

Thus, Wikipedia is violating its own policies. It follows that decisions on whether a page should be created become arbitrary, which opens the door to corruption. Some companies pay Wikipedians and get their page(s) created; others don't, and don't get any page(s).

TallGuyShort · on Dec 18, 2019

I don't disagree with anything you said, but I mentioned the benefit because the blog explicitly denies any benefit. But the reviewers do call out a conflict of interest. They criticized it because it read like an ad, and I agree. I've seen other Apache projects (looking at Drill) that read like an ad, and it's annoying.

mxfh · on Dec 18, 2019

It was a pain to get GitLab in 5 years ago after a "controversial" deletion, so it wasn't available for simple undeletion. Domain-specific knowledge has it notoriously hard with wikilawyers who, at large, seemingly stopped adding new things to their worldview 15 years ago.

Then it becomes a game of jumping through hoops and hoping you end up with a kind wiki-landlord or knowing a friendly wikipedia admin.

Doing the latter by announcing your concern on social media and hoping a sympathetic admin picks it up might be the easiest on human time and resources: just let them copy your reasonably well-sourced article draft from your personal space and see what happens.

sien · on Dec 18, 2019

At one point about 10 years ago the entry for Atlassian was deleted for not being notable.

Geimfari · on Dec 18, 2019

It was deleted in 2005 for being a blatant advert, and again in 2010 for the same reason. I doubt it was actually a notable company in 2005.

[1] https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...

[2] https://en.wikipedia.org/wiki/Special:Log?type=delete&page=A...

mxfh · on Dec 18, 2019

Even so, clumsy self-promotion, or over-ambitious fans with no clue about Wikipedia's inner mechanics, shouldn't set back a legitimate interest in information about a given company or other entity by many years. After a deletion it's just orders of magnitude harder for anyone to get an article restored, compared to an entity which didn't have the "luck" of getting added to Wikipedia too early. Deletion history shouldn't have as much of a say over actively developing entities as it has now.

tptacek · on Dec 18, 2019

The exact opposite thing is true. If the article is bad, it needs to not be on the site. What's important are reliable articles, not how many articles there are. It's perfectly fine for a topic we know will be more obviously notable in the coming years to stall for an article until a decent one can be written.

This has been the ethos of the project practically since its inception. It's always startling to see people questioning Wikipedia's premises, since it seems pretty clearly to be one of the most successful volunteer projects in the entire history of the Internet.

CharlesColeman · on Dec 18, 2019

Wikipedia can actually be pretty schizophrenic on the issue. Depending on timing and the interest groups involved, it can go either way.

I've personally given up on editing Wikipedia (too many fanatics with infinite time), but IMHO it needs to be much more deletionist than it is now. There is value to its current wide scope, but its maintenance model has trouble with long-tail articles. It shouldn't have an article unless it can consistently gather a medium-sized quorum of active editors to watch over it.

bjourne · on Dec 18, 2019

That is not what Wikipedia's policies say. They say that if a topic fulfills the notability criteria there should be an article for it. They do not say that if an article is bad it should be deleted - rather the contrary - if an article is bad, improve it!

This was the ethos of the project in the beginning but is not the ethos anymore. People have realized how valuable it is for companies and other actors to have their own article on Wikipedia. Therefore Wikipedians have created a very bureaucratic system for deciding which articles should be created. And people like to wield power. For example, by rejecting perfectly good articles.

tptacek · on Dec 19, 2019

This article was struck for not meeting the notability criteria, which involve citing reliable sources that make a straightforward claim of notability. It's not a perfectly good article.

perl4ever · on Dec 19, 2019

If the problem is rejection of "perfectly good articles", why start by arguing there are no grounds for deleting bad articles? Seems like dancing around the point.

yellowapple · on Dec 18, 2019

2005 was 15 years ago, not 10.

zozbot234 · on Dec 18, 2019

> Doing the latter by announcing your concern on social media

Be careful about doing this. It's harmless if you're simply a concerned user, but once you're actually in a dispute with someone on wiki it can easily be in breach of their guidelines.

tptacek · on Dec 18, 2019

Open source projects are particularly tricky for Wikipedia. There are tens of thousands of them. Their owners are often passionate. They compete with each other, so there's incentive to write hard-to-adjudicate competing claims. Many have commercial backing, which further warps incentives. The projects themselves are highly technical; many, like Arrow, are software development tools and components. There are few authoritative sources that reliably track open source projects. Keeping up involves directly following bug trackers and message boards and then synthesizing a narrative, which is the definition of "original research", forbidden in the encyclopedia.

It's likely that Arrow does deserve a WP article. But Arrow's sponsors misunderstand more about Wikipedia than Wikipedia does about Arrow. Writing a defensible article about their project will require work; in particular, they're going to need to spend the time tracking down authoritative sources for why Arrow is notable, and those claims will probably need to be something more persuasive than "hundreds of companies use it"; hundreds of companies use all sorts of things that aren't, and shouldn't be, featured in their own encyclopedia articles.

I understand the impulse behind "this project is important; it should have a Wikipedia article". But when you take a step back and accept what Wikipedia actually is, rather than what you think it should be, you're left with the question: do we really need to feature this particular piece of software in its own encyclopedia article? 20 years from now, will people still be getting value from it? Whatever value that might be, will it outweigh the 20 years of other people's volunteer efforts to maintain the article, keeping it free of vandalism and ensuring that it doesn't surreptitiously turn into a promotion piece for some company or another?

The answers might be "yes". But I don't see much evidence that this piece considered the questions.

Lots of things that don't seem deserving have in-depth Wikipedia coverage. Many of those things probably really don't belong in an encyclopedia! But there are two sides to this problem: the merit of the topic, and the cost, in volunteer time, of including it. A marginal topic can be defensible if it's easy to reliably cover it. A seemingly important technical topic might not be if the only way to say anything interesting about it is to write original research directly into its article.

Late edit

A useful tip for getting your open source project covered in its own Wikipedia article: don't have the Chief Marketing Officer of the company that owns the project write the article.

Analemma_ · on Dec 18, 2019

This is a great comment; I'll just add one other thing, which is something I've mentioned before in arguments about Wikipedia: Wikipedia's goal is verifiability, NOT truth. "Truth" is explicitly a non-goal of the Wikipedia project. For any given subject, Wikipedia is not meant to provide the truth about that subject, it's meant to be a summary and distillation of the existing reliable sources about it. If there are none, that's neither Wikipedia's fault nor its problem.

You can take issue with this goal, but that's how it works, and it's also how encyclopedias have always worked.

ghaff · on Dec 18, 2019

>and it's also how encyclopedias have always worked.

Well... Hopefully verifiability and truth have some correlation. Otherwise I'd argue that verifiability isn't worth much. What is different from traditional encyclopedias is that they did make determinations about what was important (which is at least akin to notability) and would allocate articles and pages as appropriate. From today's perspective we might dispute the judgments of importance but they were there.

tptacek · on Dec 18, 2019

The Wikipedia meta article this is drawn from does a better job of answering this concern than any of us can.

https://en.wikipedia.org/wiki/Wikipedia:Verifiability,_not_t...

Argue with it if you must, but let's try not to make the thread tediously recapitulate it.

btilly · on Dec 18, 2019

> Hopefully verifiability and truth have some correlation.

Not as much as you would hope.

I have two sisters with Wikipedia articles. Let's pick https://en.wikipedia.org/wiki/Jennifer_Tilly for one of them. It claims that her mother was Irish and Finnish, and goes on to list how many siblings she has. Those statements are verifiable but false. You can find an article written by reporters that said those things.

She isn't Irish, her step-father (my father) was. She also has 2 more brothers than are listed in that article. That is true, but not verifiable. Nor will they ever be verifiable. And therefore Wikipedia will never be corrected.

The problem here is that the Gell-Mann Amnesia Effect (see https://www.goodreads.com/quotes/65213-briefly-stated-the-ge... for an explanation) guarantees that there will be lots of verifiable statements that aren't so. Wikipedia builds a coherent view of a subject on that sand, and it is very hard to find what it is mistaken about. But it is riddled with errors that will never get fixed because they were wrong in a verifiable primary source.

And information not captured in a verifiable primary source will never make it in. For example her grandfather was the T in https://www.cmtengr.com/. Good luck verifying that one!

ghaff · on Dec 18, 2019

>And information not captured in a verifiable primary source will never make it in

In theory. In general? I was just looking at an article where I have a lot of personal knowledge.

It's mostly True, as far as my first-hand knowledge can tell. And leave aside a couple of the random personal insertions that are definitely True, if out of all proportion to the rest of the article.

But there's one section in particular that goes into even more detail than I knew even as someone fairly in the depths of this particular thing. (But it's very plausible and consistent with what I do know.) It's certainly not something that's ever been written about publicly AFAIK and the actual references in the article are minimal.

Which comes back to the point that notability, verifiability, etc. are nice theories--and may even make sense in the abstract--but there's a huge amount of inconsistency depending upon whether someone has taken notice of an article or not. (And, in at least some cases, I'm happy with people not looking too hard.)

tptacek · on Dec 18, 2019

Which inconsistency is, of course, what you'd expect from an all-volunteer project.

ghaff · on Dec 18, 2019

Sure. I'm also not sure that the fact that Wikipedia's rules often fall through the cracks is entirely a bad thing. You end up with some unverified information. You also end up with maybe somewhat unreliable information that would never have been verifiable. Even if I can't fully endorse this sort of informal breaking of the rules, I'm not really opposed to it either.

lonelappde · on Dec 19, 2019

Wikipedia says "her mother was of Irish and Finnish ancestry."

Jennifer Tilly is your sister, but her mother's step father is your father?

Her grandfather is her brother's father?

btilly · on Dec 19, 2019

Are you seriously confused by my carelessness with pronouns?

Jennifer and I are siblings. Our mother's mother was Finnish. Our mother's father (the Tilly in CMT) was a complicated mix. Jennifer's father was Chinese. My father was Irish.

She was born Chan, I was born Ward, our names were changed to our mother's maiden name after her divorce from my father.

All clear?

tptacek · on Dec 18, 2019

The "Gell-Mann Amnesia Effect" being the banal fact that reporters are sometimes wrong about things?

Have you tried leaving a comment on the Talk page of the article saying that you're Jennifer Tilly's sister, linking to something about you (you're obviously bona fide), and asking for a correction? WP has special reliability rules (WP:BLP) for "Biographies Of Living Persons".

It doesn't look like CMT has a Wikipedia article at all. Should it?

btilly · on Dec 18, 2019

> The "Gell-Mann Amnesia Effect" being the banal fact that reporters are sometimes wrong about things?

Sometimes?

I've yet to read a feature article written by a reporter on a subject that I know well which didn't have multiple mistakes.

> Have you tried leaving a comment on the Talk page of the article saying that you're Jennifer Tilly's sister, linking to something about you (you're obviously bona fide), and asking for a correction? WP has special reliability rules (WP:BLP) for "Biographies Of Living Persons".

Actually I am one of the brothers that Wikipedia does not know about.

Back in the 2007-2008 period I decided to make some obvious corrections. They got rejected. I left some comments in talk. A couple of my comments are still there on Jennifer's talk page.

If you want to try to fix the page, you could use http://www.officialmegtilly.com/blog/megs_made_up_muffins/ and http://www.officialmegtilly.com/blog/hell_in_a_hand_basket/ as evidence that Meg has at least one brother that Wikipedia doesn't know about. Good luck getting it changed.

As for CMT, you tell me. It is a civil engineering company that has existed for decades and has a significant presence in multiple states. But there isn't much about them online other than the company website. Which, by definition, is not considered reliable.

tptacek · on Dec 18, 2019

I have it in for the "Gell-Mann Amnesia effect" (is there even evidence that Gell-Mann believed in it?), but your point is well taken: Wikipedia's rules do heavily privilege journalism, and journalism is merely the first draft of history, not the camera-ready final.

It's possible that Wikipedia has carefully balanced this; if they didn't privilege reporting, a lot fewer articles would get written, about a lot of things people actually do want to look up in the encyclopedia. Reliance on journalism means they'll routinely get some bad facts, but there's a bound on how bad things will be that there wouldn't be if they just got rid of WP:RS altogether.

It's much more likely that nobody has carefully thought about this, and it's just a shambolic volunteer project taking advantage of what they have to work with.

My basic take about Wikipedia is that it's hard to argue with the results. However obnoxious their policies are to nerds like us (and I commented upthread about obnoxious experiences I've had working on it --- I no longer contribute!), it's a tremendously successful project, perhaps one of the most successful in the history of the Internet.

It's bad when they have bad facts, more so when those facts pertain to living people, even more so when someone has the correct facts and can't get them accepted, and especially so when that person is a family member of the subject.

It's less bad, to me at least, that an encyclopedia happens to lack a page, for now, on Apache Arrow.

ghaff · on Dec 18, 2019

We're basically into the deletionist vs. inclusionist debate, which is at least somewhat orthogonal to what laypeople think of as notability. Is a Pokemon character notable? Not really. But because of the enthusiastic fan base, tons have been written about them.

On the other hand, whether you're talking open source projects beyond the big names, corporate executives, or just people who are reasonably well known within fairly large communities, there just isn't a lot of independently sourced published material about them, especially in mainstream pubs--which (somewhat both understandably and ironically) Wikipedia tends to prefer. You even have people with tons of hits on Google but there isn't a ton of info about them online.

tptacek · on Dec 18, 2019

What "debate"? This isn't a live debate. There is a faction of people, some of whom are involved with Wikipedia, that want it to be something other than a tertiary-source encyclopedia, just like there are people who want to be able to write blog posts as Stack Overflow comments. It's true that they will never stop advocating for these changes, but there's no evidence that the projects themselves are going to cave.

ghaff · on Dec 18, 2019

Maybe it's not a debate so much as a tension--and it's a real one. Personally, I haven't contributed anything to Wikipedia in years. It's useful, I see its flaws, but I certainly don't care enough to push on it for the most part.

tptacek · on Dec 18, 2019

I'm exactly the same way. For instance: I did some writing about macOS security in the macOS articles, way back when, and most of it got struck because I couldn't cite it properly. It was frustrating to write a straightforward statement, like "the macOS Seatbelt sandboxing mechanism uses s-expressions", and have it get struck.

But I came quickly to realize the project was right. Without a reliable secondary source, I was effectively conducting research in the pages of the encyclopedia. What I learned from that was: I shouldn't be writing encyclopedia articles; the technical writing I do tends not to be tertiary.

It's fine – good, in fact – if most people don't write much in Wikipedia. It's its own special thing. You can't argue with its success: it might be the most successful project in the history of the Internet, and a long-term contender for one of the most successful volunteer knowledge projects ever.

lonelappde · on Dec 19, 2019

The number of wigglypuff fans exceeds the number of Arrow fans by at least 10x. And the article is higher quality.

SkyBelow · on Dec 18, 2019

If a bulldog clip is notable, why wouldn't an open source project that hundreds of companies use be?

https://en.wikipedia.org/wiki/Bulldog_clip

s/bulldog clip/many other random office supply items/

Edit: Swapped to bulldog clip as a better example of a less notable office supply.

tptacek · on Dec 18, 2019

This seems like an argument that says that Apache Arrow is as important as the paper clip, which would be an extraordinary claim.

That paper clip article is itself extraordinary. Go look at it again. It delves into the history of the paper clip, covers different designs, has excerpts from paper-clip-making-machine patents, and describes an actual controversy(!) over its invention, all carefully illustrated (illustrating things on Wikipedia is a bitch, by the way, because of IPR rules). People went through a lot of effort to make a good paper clip article.

And Wikipedia considers the paper clip article to be a "C-class article" (C here means approximately what it means in school), and the topic of "low" importance. Just so we're clear on what the bar is here.

Compare that with the author's attempt at an Arrow article:

https://en.wikipedia.org/wiki/Draft:Apache_Arrow

It's a paragraph of promotional material, a brief comparison to other systems, and a citation to a blog post saying "I do not see any reason not to embrace the Arrow standard".

Come on.

I think there probably should be an Arrow article. The authors have found a bunch of reliable sources covering it; they just haven't distilled from them a defensible claim to Arrow's notability. I think it's a matter of putting the work in.

SkyBelow · on Dec 18, 2019

>This seems like an argument that says that Apache Arrow is as important as the paper clip, which would be an extraordinary claim.

I picked the first office supply object that came to mind. There are better examples.

For example, why have the bulldog clip as its own article when you already have binder clip?

https://en.wikipedia.org/wiki/Bulldog_clip

https://en.wikipedia.org/wiki/Binder_clip

I highly suspect that with some actual effort I could find an even less deserving office item.

And you may be right that Arrow needs to do more to be notable and ready for its own page. But setting aside any objective standard and looking instead at the relative standard set by other articles, it does feel like there are some unequal requirements in this regard.

tptacek · on Dec 18, 2019

The binder clip article has many of the same merits as the paper clip article. The bulldog clip article is more interesting: it's a "stub" article (its authors are explicit about the fact that it's not a complete article), and still it manages to track down some of its history and cite interesting uses from books – someone had to read those books and fish the bulldog clip cites out of them.

I think it's pretty clear to anyone why bulldog clips are in the encyclopedia, and it is only clear to subject matter experts with strong opinions why Arrow would be.

If your topic requires subject matter expertise in order to recognize its importance, the standards are unequal: you are going to need to do more work to establish its notability, because you cannot reasonably expect the layperson volunteers in the Wikipedia project to do that work for you.

username90 · on Dec 19, 2019

An item which almost every office worker has seen or used is definitely notable enough to get an article. Yet another data format among hundreds which has yet to reach a wider audience could be, but it is not obvious.

nemothekid · on Dec 18, 2019

I'm sure an order of magnitude more than "hundreds" of companies use paperclips.

kevindong · on Dec 18, 2019

> I understand the impulse behind "this project is important; it should have a Wikipedia article". But when you take a step back and accept what Wikipedia actually is, rather than what you think it should be, you're left with the question: do we really need to feature this particular piece of software in its own encyclopedia article? 20 years from now, will people still be getting value from it? Whatever value that might be, will it outweigh the 20 years of other people's volunteer efforts to maintain the article, keeping it free of vandalism and ensuring that it doesn't surreptitiously turn into a promotion piece for some company or another?

I really don't think a 20-year view is a good measure of whether or not an article should exist. Even if something is forgotten in the future, if it has relevance and importance today then that alone makes the article worth existing.

habitue · on Dec 19, 2019

For-profit businesses are particularly tricky for Wikipedia. There are tens of thousands of them. Their owners are often passionate. They compete with each other, so there's incentive to write hard-to-adjudicate competing claims. Many have commercial backing, which further warps incentives.

tptacek · on Dec 19, 2019

They are! Spend some time patrolling AfD. They're a huge problem; companies are constantly trying to get themselves into Wikipedia, because Wikipedia is heavily privileged in Google search results. But for-profit companies tend to present clearer cases for WP volunteers: they're either well-covered in reliable sources, in which case they're easy accepts, or they're not, in which case they're easy rejects.

The problem with OSS is that lots of projects probably do merit pages, but it's hard to see which ones.

jccalhoun · on Dec 18, 2019

>Arrow is designed to serve as a shared foundation for SQL execution engines, data analysis systems, storage systems, and more – think Pandas, Spark, Parquet, etc. Engineers across the community are working together to establish Arrow as a standard for columnar in-memory processing.

I like to think I'm fairly techy for a non-programmer, but I have no idea what that means. That might be part of their problem if that is the description in their Wikipedia entry.

ggggtez · on Dec 19, 2019

I believe it's what the kids call "buzzword bingo".

tetromino_ · on Dec 18, 2019

See https://en.wikipedia.org/wiki/Wikipedia:Notability - all you need to show is that Apache Arrow has received significant coverage in reliable sources that are independent of the subject.

So: find conference papers/talks by people not affiliated with Apache or the Apache Arrow project and that discuss Apache Arrow. Figure out how to incorporate the tidbits about Arrow from those papers into the article text. Add sources in footnotes. Done.

sb057 · on Dec 18, 2019

A version which was rejected included the following (non-Apache-affiliated [afaik]) references:

https://www.xenonstack.com/insights/what-is-apache-arrow/

https://link.springer.com/chapter/10.1007%2F978-1-4842-1311-...

https://www.biorxiv.org/content/biorxiv/early/2016/08/23/071...

http://delivery.acm.org/10.1145/3110000/3103003/p138-Maas.pd...

https://www.theregister.co.uk/2016/02/17/apache_arrow_toplev...

https://www.cio.com/article/3034279/big-data-gets-a-new-open...

https://www.infoworld.com/article/3033446/hadoop/apache-arro...

https://sdtimes.com/apache/guest-view-first-release-apache-a...

https://www.infoq.com/news/2016/12/le-dem-apache-arrow/

http://dbmsmusings.blogspot.com/2017/10/apache-arrow-vs-parq...

https://dbmsmusings.blogspot.com/2017/10/apache-arrow-vs-par...

tptacek · on Dec 18, 2019

The 1st source is a blog post on a consulting company website.

The 2nd mentions Arrow only in passing, after several pages of coverage of Spark; Arrow is covered only in relation to Spark. It's a reliable source but doesn't clearly establish notability.

The 3rd mentions Arrow hardly at all; it's an implementation detail, mentioned just once, in a paper about something else.

I can't fetch the 4th.

The 5th, a story in The Register, is reliable and probably does go towards notability, though it seems to sort of argue against it (the gist of the article is that it's surprising that Arrow has been made a top-level project at all).

The 6th, in CIO, is a recap of a press release. Trade press PR recaps shouldn't be WP:RS, but WP will often accept them, or would when I was patrolling AfD; it's luck-of-the-draw. The admins who shot down Arrow's page were smart enough not to accept it.

The 7th, in InfoWorld, is promotional as well, but it's at least written in some depth. It's a straightforward notability claim. The Arrow article should draw more clearly from it, in the opening paragraph.

The 8th, in SDTimes, is written by someone affiliated with the project itself; it's citable, but WP probably won't accept it independently as grounds for notability.

Same, in effect, for the 9th, which is just a recap of an interview with the project author.

The 10th and 11th are just blog posts. They're citable if they're not contentious, but they usually won't be acceptable as WP:RS for notability.

bjourne · on Dec 19, 2019

Blog posts are prima-facie evidence of notability. Same thing with mentions in published articles. From the book (second link):

"Recognizing that Value Vectors meet the needs of other data processing engines, in February 2016, the Apache Software Foundation announced Apache Arrow as a top-level project, bypassing the standard Incubator process. Committers to the project include developers from other Apache projects such as Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark and Storm.

Apache Arrow enables execution engines like Spark to take advantage of the latest operations included in modern processors, for fast analytical data processing. Columnar layout of data allows for better use of CPU caches by placing all data relevant to a column operation in as compact of a format as possible. ...

Apache Arrow software is available under the Apache License v2.0.

Dremio, a startup led by Jacques Nadeau, chair of the Apache Drill and Apache Arrow Project Management Committees, leads the development."

In the past, this and the other sources would have been more than enough to establish notability. I know that because I have created Wikipedia articles on subjects much less notable than that. The problem for Apache Arrow isn't that it isn't notable enough, it is that people have already tried four times to get it included in Wikipedia so the Wikipedians voting on new page inclusions are getting suspicious about it.

tptacek · on Dec 19, 2019

If you want to sum up something like 10 years of debate and consideration of the role of blogs as sources (it’s much more complicated than that they’re not allowed) by saying, in effect, “you’re all wrong”, well you do you.

bjourne · on Dec 19, 2019

I'm merely saying that you are wrong. Blogs are not always reliable sources in the Wikipedia world, but they can absolutely be used as evidence for notability.

tptacek · on Dec 19, 2019

Not routinely, and not most blogs. As you can clearly see from the admin comments on this Arrow post.

bjourne · on Dec 20, 2019

Yes, routinely. You can find plenty of articles which had much less support in sources when they were created here https://en.wikipedia.org/wiki/Category:AfC_submissions_by_da... That Wikipedians rejected the article is a moot point because the argument is that the rules are not applied consistently.

tptacek · on Dec 20, 2019

Blogs are not a consistently reliable source, particularly for notability claims. It depends on the subject and on the blog. I'm not making this up; I spent a year doing AfD patrol, and this was probably the most frequently debated point in AfD arguments.

Obviously, they can't always be WP:RS, because then literally everything would be "notable", since anyone can stand up a blog about anything. You can't even logically assemble the argument you're trying to make.

bjourne · on Dec 21, 2019

I didn't claim that blogs were consistently reliable sources. I claimed that they were routinely used as evidence of notability. Evidence of notability != Reputable source.

I'm not making anything up either; I have penned several articles on Wikipedia and gotten them through the AfC process with much less notability evidence than the Apache Arrow draft had. The difference was that I used to be an established contributor so the rules were not as harsh against me as they are against newbies and unknown contributors.

Also, you can look at the link I gave you and see that the notability rules are not uniformly applied.

wolfgang42 · on Dec 18, 2019

Of the 10 links you list (the dbmsmusings link appears twice), 5 are used to back up the claim that Arrow was “donated to the Apache Software Foundation[7] in 2016, where it has been maintained and extended since.[7][8][9][10][11]”, which doesn’t really seem like it needs that many sources.

Of the other half, one appears to be some sort of marketing blogspam, one is a paper that briefly mentions that they used Arrow, and two I can't access for various reasons. That leaves one blog post that actually discusses Arrow, and the sentence it's used as a reference for in the draft article isn't about Arrow specifically, but the tradeoffs of in-memory vs on-disk storage.

Yes, these links may be independent of the Arrow project, but I'm not convinced that they add anything of substance to the actual content of the article. Mostly it looks like they were added in an attempt to game the number of references.

chubot · on Dec 18, 2019

The blog post should have included these citations because I was left wondering what they did to support their claims. It sounds like they probably should have an article but that they also have misunderstandings of Wikipedia.

dajohnson89 · on Dec 18, 2019

And to be clear -- this is the job of the article author, not the editor who rejects the article without doing a little bit of homework first?

tetromino_ · on Dec 18, 2019

Of course. Similarly, when you are submitting a PR to an open source project or a manuscript to an academic journal, it's your job as the author to take note of contributor guidelines.

wolfgang42 · on Dec 18, 2019

Yes; Wikipedia's expectation is that authors have researched the topic about which they are writing, and therefore they are in the best position to provide the sources from which they got their information. The editors' job is to ensure that Wikipedia's standards are met, not to re-do the research that the author should already have performed.

If the case is that no research was performed because the author is already an expert in the area, they are still expected to provide citations so that the same standard can be applied to all authors.

skywhopper · on Dec 18, 2019

Yes. The author is supposed to cite the sources.

xibalba · on Dec 18, 2019

As a strategy for getting Dremio on the front page of HN and thus on the radar of a large group of tech people (i.e. Dremio's prospects), this article is very clever.

As a critique of Wikipedia, not so much.

jessaustin · on Dec 18, 2019

This seems like a good reason to flag it.

Ninjaneered · on Dec 18, 2019

Here's the link to the draft:

https://en.wikipedia.org/wiki/Draft:Apache_Arrow

And some possible additional sources:

* https://www.forbes.com/sites/forbestechcouncil/2019/09/24/dr...

* https://www.businesswire.com/news/home/20180906005114/en

* https://thesiliconreview.com/2016/02/apache-arrow-is-the-new...

tptacek · on Dec 18, 2019

The first article is a paid promotion piece, which WP won't accept as an RS.

The second is a press release by Arrow's sponsoring company, which, obviously, WP won't accept as an RS.

I have no idea what "The Silicon Review" is; this is the first time I've ever seen it. To the extent it's not a pay-to-play trade publication, it might qualify as a notability-establishing source. The fact that the "Review" does not itself have a WP page might make it harder to claim it's reliable, since it suggests nobody else knows what it is, either.

Ninjaneered · on Dec 19, 2019

Looks like my lateral reading was sub-par (actually I didn't even try, just a quick Google/post).

After further review, the "Silicon Review" one looks like pay-to-play as well. It's cited in a few other Wikipedia articles, but as far as I can tell, and going by some anecdotal stories, it doesn't look good.

* https://www.reddit.com/r/PublicRelations/comments/bha6hs/sil...

* https://arpr.com/blog/4-pay-for-play-scams/

Good catch, thanks for spending the time to review my links. Reading your comments above, I largely agree. It's a high bar (mostly) to get an article on Wikipedia, and that's a good thing. It allows us to read the majority of content on Wikipedia without too much suspicion.

SquishyPanda23 · on Dec 18, 2019

I read the draft.

Maybe this is an unpopular opinion, but it's obvious advertising and has no place on Wikipedia. Maybe a Medium post would be more appropriate.

Wikipedia already has a problem with bad software articles like this.

JohnFen · on Dec 18, 2019

I mostly agree. It is distinctly marketing-flavored, although not to a degree that I think should disqualify it alone.

What I think should disqualify it is that it's missing a lot of detail that would make the entry genuinely useful. As it is, it's as useful as a press release. Also, it does appear to have a problem with appropriate references.

Generally speaking, I have a hard time disagreeing with the reasons listed on that page for the rejections.

qwerty456127 · on Dec 18, 2019

Once I witnessed awesome articles [others added and I used with delight] on open source frameworks as well as some minor facts [I added] on other subjects deleted for being "insignificant" I decided I'm not donating to Wikipedia until this bullshit ends.

Wiki articles are not videos; they take little disk space to host, so I can't see any reason for dismissing "insignificant" information other than a stupid rule.

IMHO whatever can be considered a piece of knowledge should be there.

BTW nearly the same applies to StackOverflow - thanks to high reputation points I earnt during the early days I can see deleted questions and answers and I often see really interesting (having three-figure upvote scores and dozens of stars) questions and very informative (also heavily upvoted) answers deleted.

oefrha · on Dec 18, 2019

https://en.wikipedia.org/wiki/Draft:Apache_Arrow

> REVIEWERS: Please note that the submitting editor is the chief marketing officer and vice president of strategy at this company.

Yeah, sorry, big no-no there.

Disclosure: consider myself a Wikipedian to some extent, got a couple hundred edits on Wikipedia.

xeeeeeeeeeeenu · on Dec 18, 2019

As long as you abide by WP:N, WP:NOR and WP:NPOV, writing articles about yourself is perfectly acceptable on Wikipedia and doesn't break the rules.

oefrha · on Dec 18, 2019

You may want to review https://en.wikipedia.org/wiki/Wikipedia:Conflict_of_interest. Acceptable — sure, sometimes. Perfectly — no. It’s a strongly discouraged practice.

Geimfari · on Dec 18, 2019

It's strongly discouraged and looked down upon. Editors with a conflict of interest take up a disproportionate amount of other editors' time and are practically never able to write neutrally about themselves or their company.

foota · on Dec 18, 2019

This is wrong though, Apache Arrow is an open source spec, the fact that they happen to work at a company working on it seems secondary.

oefrha · on Dec 18, 2019

1. Conflict of interest has nothing to do with whether it’s open source. If I submit a Wikipedia article about my completely uncommercial personal project it’s still a huge conflict of interest. Rule of thumb: don’t submit Wikipedia articles about yourself, your project, your employer or your employer’s project. If it’s notable enough eventually someone without a conflict of interest will do it.

2. Tons of companies use open source for marketing. This one is no different as far as I can tell. Even had the chief marketing officer submit a Wikipedia article for their project.

foota · on Dec 18, 2019

But it's not _their_ project, it's a project they contribute to. Google contributing an article about k8s would be a completely different thing from Google contributing an article on say Hadoop.

oefrha · on Dec 18, 2019

> it's a project they contribute to.

One hell of an understatement. Search for Apache Arrow and look at the top results.

foota · on Dec 18, 2019

Of the ten results on the first mobile search results page for me only two (result 3 & 4) seem related to the company?

mxfh · on Dec 18, 2019

The problem with these rules is that they are so selectively enforced that it is a farce. They selectively assume bad faith where, by all objective measures, there is none, and brush over other cases as long as they are backed up by random, arguably non-neutral publications.

It's near impossible to put up articles about prolific female journalists, for example, because all of this can be enforced against them: the publications are all from the same source (the publisher they work for), or are interviews, or some sort of talk or award. It's almost never enforced for men, who get away* with linking to a podcast.

tldr: *WP:OTHERSTUFFEXISTS

mistrial9 · on Dec 18, 2019

sadly, I can jump in on the "Wikipedia fails" train here, also. In about five attempts to really change an article (different ones) in about five years, every single change was rejected, as far as I know. The changes were different, one was writing style and order of facts on a public historical event in this century; one was adding a lot of detail to the description of a popular fantasy fiction series; one was removing a controversial and provocative one-liner at the top of a page about people at the edge of (western) society; and another .. hmm I forget now, because I just gave up !

My aging colleague tells me, just keep making the changes, they can't stop everything. However, my direct (and limited) experience is.. they do stop everything (that I try). I was logged in twice and edited anonymously three times, and added a few citations, too.

To the point of the article, FOSS projects in wikipedia ? hmm maybe there could be a clear category for that ? software projects are proliferating rapidly.. dunno

zozbot234 · on Dec 18, 2019

The way around mindless reverts is to first detail the proposed change on the talk page, then wait for anyone interested to object. If no one does, you can make the change live, pointing out that no objections were raised. Even if someone does object, such objections should ultimately be made actionable, i.e. it should be made clear how to address them to the other party's satisfaction.

wffurr · on Dec 18, 2019

Did you read the reasons for rejection and try to modify subsequent submissions to better comply with Wikipedia's published guidelines?

mistrial9 · on Dec 18, 2019

yes, I did, and I feel that this revert behavior was more hazing/article control than substantive in all cases but one, and with that one I don't personally agree.

ErrantX · on Dec 18, 2019

Have you got an example?

pradn · on Dec 18, 2019

It looks like there aren't enough independent, non-commercial articles to use as references. This is somewhat common for many newish technical projects. Add some academic papers, some usage numbers, some summary blog posts that aren't related to the project. Wiki editors are very suspicious of people from companies editing articles related to their work.

PeterCorless · on Dec 18, 2019

I'm not so sure it's falling afoul of 'not-notable' so much as WP:COI.

mikl · on Dec 18, 2019

Why do you care about having a Wikipedia page for Arrow? Why is it important enough to whinge about on HN?

Wikipedia is much like Stack Overflow these days, the community has become hostile to newcomers who fail to meet their somewhat arbitrary but very exacting standards for what is allowed on their site.

Fortunately, you can just publish your own web site. No need to be bothered about not being on WP.

dredmorbius · on Dec 19, 2019

For those who think that edit wars, content disagreements, and inaccuracies are any special realm of Wikipedia, they're not.

One of the best examples I've encountered demonstrating this is a 19th century edit revision war between the British and American publishers of Chambers's Encyclopaedia, on the topics of Free Trade, Protection Duties, Slavery, and certain salacious particulars concerning His Royal Highness, the Prince of Wales.

https://old.reddit.com/r/dredmorbius/comments/4xe2k1/chamber...

What's novel concerning Wikipedia is that these disputes (as with those of free software vs. proprietary software) tend to occur, or at least leave significant evidence, in the open public record.

thrower123 · on Dec 18, 2019

The hard-line Wikipedia deletionists should be deleted themselves. The argument is always brought up, like StackOverflow, that they have to be ruthless or it turns into an Eternal September dumping ground of garbage, but the quality is already very uneven and gatekeeping like Cerberus doesn't help further that goal. There's already a toxic Dead Sea effect where the pedantry and politicking have chased out a lot of people that would contribute; who the hell wants to bother putting in some hours writing something up if it is just going to be summarily deleted?

Bandwidth and hard drives are cheap.

Just spitballing, but it'd be nice if Wikipedia worked a little more like Linux distro repositories. Keep the tightly curated articles in a "core", but leave room for "community" or "nonfree" collections if you want to turn them on.

CharlesColeman · on Dec 18, 2019

> Just spitballing, but it'd be nice if Wikipedia worked a little more like Linux distro repositories. Keep the tightly curated articles in a "core", but leave room for "community" or "nonfree" collections if you want to turn them on.

I think that's a fantastic idea, especially if it would lead to a drastic reduction in the number of articles served from the main Wikipedia domain (to a number that can meet some reasonable quality and maintenance standard, maybe 10 times the size of the most comprehensive print encyclopedia, or 1/6 of Wikipedia's current size) [1].

[1] https://newrepublic.com/article/101795/encyclopedia-britanni...: "The 2002 Britannica contained 65,000 articles and 44 million words. Wikipedia currently contains close to four million articles and over two billion words..."
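As a rough sanity check on those two targets, here is a minimal sketch that uses only the figures from the quote above (65,000 Britannica articles, "close to four million" Wikipedia articles). The constant and function names are made up for illustration, and the four-million count is the quoted article's number, not a current one.

```python
# Figures as given in the quoted New Republic passage (assumed, not re-verified).
BRITANNICA_ARTICLES = 65_000      # 2002 Britannica
WIKIPEDIA_ARTICLES = 4_000_000    # "close to four million", per the quote


def core_size_estimates(britannica: int, wikipedia: int) -> tuple[int, float]:
    """Return the two proposed 'core' sizes: 10x print, and 1/6 of Wikipedia."""
    ten_x_print = 10 * britannica
    one_sixth = wikipedia / 6
    return ten_x_print, one_sixth


ten_x, sixth = core_size_estimates(BRITANNICA_ARTICLES, WIKIPEDIA_ARTICLES)
print(f"10x Britannica:   {ten_x:,} articles")
print(f"1/6 of Wikipedia: {sixth:,.0f} articles")
```

Under those assumed inputs the two targets land within a few percent of each other (about 650,000 versus about 667,000 articles), so they describe roughly the same cut.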

bjourne · on Dec 19, 2019

Most communities seem to go that way. In the beginning, most people spend their time contributing first-order content. Then, as the community grows, it attracts more meta-users who are more interested in moderating the content creators. They create ever more rules and policies requiring content creators to jump through more and more hoops. Eventually the experience becomes so frustrating that people give up.

Wikipedia seems to me to be in that situation. StackOverflow is on its way there. It has exactly the same kind of problem with "deletionists" that Wikipedia has. Perfectly good questions are often closed for very arbitrary reasons.

julianlam · on Dec 20, 2019

The whole concept of "notability" in Wikipedia-land is subjective as hell. Whether your article makes it in is simply a matter of rolling the dice the first time you submit the article.

I created an article for NodeBB, a piece of forum software used worldwide by companies small and large (including several triple A gaming companies). We got AfD'd, and now every time someone creates an article for NodeBB, the AfD is brought up and the entire discussion ends as soon as it has begun.

We even created an article the _suggested_ way, by submitting a draft for review. It got reviewed alright... instant rejection because they felt it looked like an ad. We made changes, but nobody ever took a second look at the article.

Of course, a number of defunct open-source (and some proprietary) forum softwares with zero sources are still allowed on Wikipedia, simply due to the fact that they made it through when nobody was looking :)

One could argue that we shouldn't be writing our own articles (and they'd be right), so we just quietly accepted our judgement and market NodeBB based on the merits of the software, instead of whether it appears in some arbitrary ranking of forum software.

That said, it'd still be nice if we were listed in the Wikipedia list of forum softwares.... _sigh_, a guy can dream.

zamadatix · on Dec 20, 2019

From your own description it doesn't sound subjective as much as understaffed.

jabvigWe · on Dec 18, 2019

Add it to the Free Software Directory!

https://directory.fsf.org/wiki/Main_Page

aaron695 · on Dec 19, 2019

View the original declined drafts here -

https://en.wikipedia.org/w/index.php?title=Draft:Apache_Arro...

Geez, if you want to use Wikipedia as an ad, put a bit of effort in. When did marketing become so lazy as to blame the platform?

Although this meta ad is possibly a far better payoff.

michelpp · on Dec 18, 2019

We're having the same issue getting the GraphBLAS API article accepted: https://en.wikipedia.org/wiki/Draft:GraphBLAS. At first it was summarily deleted overnight, now we're stuck in Draft for who knows how long.

nanoscopic · on Dec 21, 2019

This reminds me of my "war" to get an article for my parser XML::Bare.

There was a time when there was a comparison page for XML parsers, and many parsers had articles.

Parsers that still have articles on Wikipedia, and that should be removed if the editors are to stay true to their war on having useful software info in Wikipedia:

https://en.wikipedia.org/wiki/Category:XML_parsers

The original argument was that if you can find a citation in print you can have whatever it is on Wikipedia, but that ceased to be true years ago and it has become a popularity contest and power struggle with obnoxious Wikipedia editors.

scarejunba · on Dec 19, 2019

https://en.wikipedia.org/wiki/Draft:Apache_Arrow

This reads like it was written by the guy who wrote it. It can do this. It efficiently does that. It’s all promotional content. Not useful.

ggggtez · on Dec 19, 2019

I've never heard of it. Add my vote to removing the article.

Cry more, company I never heard of either.

lmeyerov · on Dec 18, 2019

For context, some other companies contributing to it are in the GPU space, so orthogonal to CPU-centric Dremio: Nvidia, Blazing SQL, and Graphistry (us). Likewise, the pydata big guns intersect a bit here: conda, pandas, ... . This effort got a BOSSIE award for GPU dataframes this year and is taking off now that it is becoming usable for more than just framework devs. The reason we all rely on it is because a standardized columnar IO streaming format is an awesome idea for compositional HPC.

It does sound like maybe Dremio's CMO wrote the original articles and it came off centered on them? (Did not have a chance to read.)

tptacek · on Dec 18, 2019

Yes: Dremio's CMO wrote the article, and the article was overtly promotional. Of course it got killed.

TallGuyShort · on Dec 19, 2019

And the same guy submitted it here 3 times. Also overtly promotional.

riboflavin, genuine suggestion: ask one of the PMC members or committers to rewrite the article from scratch from an engineer's perspective, source everything, demonstrate notability, and resubmit. If they still don't take it, move on with your life. ... but you might have generated a lot of ill-will with the Wikipedia elites here already.

est31 · on Dec 19, 2019

Hmmm this reminds me of the battle to get a Wikipedia page approved for Minetest, the biggest FLOSS voxel engine out there:

https://en.wikipedia.org/wiki/Draft:Minetest

https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...

ForHackernews · on Dec 18, 2019

I've literally never heard of this piece of software, and it's fair to say I'm much more interested in FLOSS than the average person on the internet. Why should this thing have its own article and not just appear in a list of Apache foundation projects?

m_ke · on Dec 18, 2019

Because it's actually a pretty big deal for the (python) data science ecosystem.

aabbcc1241 · on Dec 19, 2019

You're free to post anywhere, and each site's admins/helpers/whatever-title are free to do their own censorship or moderation.

It's the nature of the web.

ksec · on Dec 19, 2019

It was the same with 802.11ax aka WiFi 6.

Someone decided that all the technical information on the subject was irrelevant and deleted the entire Data Rate and Technical Improvement sections. Another reason given was that those details were not finalised.

It was a little frustrating that this useful information was gone, although one could always find it in other sources and media. But they also deleted the whole section on DensiFi [1], where all the major companies (Apple, Broadcom, Cisco, Intel, Qualcomm, Huawei, Samsung and others) behind 802.11ax decided to do the work behind closed doors. TL;DR: they were trying to push 802.11ax to market earlier despite all the unresolved issues.

So I decided to add only the DensiFi section, and it was constantly deleted within 24 hours. After a few weeks of fun the page simply went back to the original, where the Data Rate and Improvement sections are back but the DensiFi section is totally gone. So it turns out it wasn't the technical sections they were trying to get rid of.

P.S. We should be glad someone in the working group discovered this and called it out. The current WiFi 6 / 802.11ax situation and UX is much better than what we had when 802.11ac shipped, although this comes at the expense of a roughly two-year delay of the standard.

[1] https://mlexmarketinsight.com/insights-center/editors-picks/...

foota · on Dec 18, 2019

Can HN create a draft that would be accepted? :)

thebooktocome · on Dec 18, 2019

Nope, this violates Wikipedia's policy against "meat puppetry".

ErrantX · on Dec 18, 2019

Not at all! The main thrust of that policy is for discussions. Improving Wikipedia by conspiring to write a compliant article is explicitly allowed!

thebooktocome · on Dec 18, 2019

Tell your deletionist editors that.

cdeil · on Dec 19, 2019

I think the reason this is discussed now is because yesterday I tried to re-submit the Apache Arrow article. Here's what I wrote: https://en.wikipedia.org/w/index.php?title=Draft:Apache_Arro... It was rejected / reverted 10 minutes later by a Wikipedia editor. The blog post from Justin was in July 2019 (https://www.dremio.com/why-apache-arrow-wikipedia/)

There's many interesting and good points in the discussion here, thank you!

To add my 2 cents:

- Apache Arrow is notable, deserves a Wikipedia page. It might not have been when someone first tried to create a Wikipedia page for it in 2017 (see https://en.wikipedia.org/w/index.php?title=Draft:Apache_Arro...), but in the three years since it has become a major project, see e.g. https://blogs.apache.org/foundation/entry/the-apache-softwar... Notability is clearly subjective, depends on what the author and reviewer find interesting. In the variant I submitted yesterday I tried to make it clear why it's notable - Apache arrow is a standard format that connects different languages, runtimes, data systems, communities, e.g. the Python and Java data communities. See e.g. https://wesmckinney.com/blog/apache-arrow-pandas-internals/ - Apache Arrow is to my knowledge partly the brainchild of Wes McKinney, creator of pandas, it's his attempt (looking strongly like success) to resolve a major issue in data science.

- I think it's a good point Justin made at https://www.dremio.com/why-apache-arrow-wikipedia/ that it's bad that Wikipedia editors reject articles on stuff they know nothing about - if you look at their profiles, they don't seem to have any knowledge or interest about technology or software. That's not a good system.

- I haven't contributed to Wikipedia really before, and I don't understand the rules, I admit that. Probably what I did yesterday was just not following their process, and that's the reason my edit was reverted. I guess it's also true that Justin at first didn't do a great job at submitting an impartial, non-PR article. However, my understanding from looking at some drafts and the talk page is that he then took the editor comments into account, and the last variant of the page he tried to submit in July 2019 was OK.

- So overall I think the answer to the question "Why isn't there a Wikipedia page on Apache arrow?" is that it's an unfortunate case of authors and editors not doing a great job. At least I'm pretty sure I didn't do a good job yesterday, I wanted to help, but only had an hour, not a day to learn how Wikipedia ticks and to do more research to find better references. I hope someone with more experience in Wikipedia and Arrow will try to re-write and re-submit the Wikipedia article in the future.

- The rule to discourage (or forbid?) people involved with Apache Arrow from contributing to its Wikipedia page is unfortunate. I recently started to use it and learn about it, but I don't know much about it at this point. E.g. Wes McKinney has written at this point 8 high-quality blog posts about it (https://wesmckinney.com/archives.html) - those don't count as references? Even if he or the Apache Arrow team wrote a paper about it, it wouldn't count because it's a primary source, and Wikipedia only wants secondary sources to establish notability? There are ~ 100 videos on YouTube, and many blog posts and a few podcasts (e.g. https://softwareengineeringdaily.com/2016/07/17/apache-arrow...) that mention Apache Arrow. Naturally almost all of them are from Apache Arrow contributors, or from companies using Apache Arrow.

- Apache Arrow has an interesting story, and it has evolved over the past years and will keep evolving, so I think exactly for that reason a Wikipedia page would be good to have, since the current project page and old blog posts don't capture that well.

swayvil · on Dec 18, 2019

[flagged]

blattimwind · on Dec 18, 2019

Now imagine the German Wikipedia, where Exclusionists are 200 times stronger than the Inclusionists, relative to the English Wikipedia.

yorwba · on Dec 18, 2019

The German Wikipedia is the fourth largest by number of articles, with one article for 33 German speakers, whereas the English Wikipedia has one article for 84.4 speakers. https://meta.wikimedia.org/wiki/List_of_Wikipedias_by_speake...

Doesn't look like the Exclusionists are really stronger on the German Wikipedia.

kstrauser · on Dec 18, 2019

200x? I'm imagining a single page, titled "Nein."

zeveb · on Dec 18, 2019

One could perhaps be forgiven for wishing that the deletionists would … delete themselves.

Seriously, though, bytes are cheap, and an article sitting somewhere in Wikipedia doing nothing and bothering no-one is pretty damned cheap too.

SpicyLemonZest · on Dec 18, 2019

I dunno, this kind of thing seems like exactly the canonical argument for deletionism. Maybe there's no cost to a page sitting on Wikipedia describing, like, some guy's special attack from Naruto. There are reasonable arguments that allowing things like that would set a bad precedent and encourage behavior that doesn't help the project, but I admit it's pretty tenuous.

There are obvious and important costs if Wikipedia articles start being perceived as promotional material rather than encyclopedia entries.

CharlesColeman · on Dec 18, 2019

> Maybe there's no cost to a page sitting on Wikipedia describing, like, some guy's special attack from Naruto.

There is a cost, but it's measured in hours of maintenance labor not bytes of storage.

If Wikipedia wants to maintain a semblance of accuracy [1] in the face of declining participation, it needs to concentrate its labor resources rather than spread them out.

[1] which IMHO is vital given its unwise prestige as arbiter of truth

sgift · on Dec 18, 2019

Since Wikipedia's concentration of labor is itself a source of declining participation[1], it's doubtful that continuing this behavior will result in anything other than a death spiral with even fewer people ready to do the work, more concentration, even fewer .. and so on.

[1] Among other reasons, more here: https://en.wikipedia.org/wiki/Wikipedia:Why_is_Wikipedia_los...

CharlesColeman · on Dec 19, 2019

> Since Wikipedia's concentration of labor is itself a source of declining participation[1], it's doubtful that continuing this behavior will result in anything other than a death spiral

I'm not as interested in the viability of Wikipedia's culture as in the reliability of Wikipedia as a resource given its prominence. I'd take a dead Wikipedia over one that's lively and fun but full of crap and poorly-checked influence attempts.

It's never going to recapture its halcyon days, so it's going to have to evolve with the times in more ways than one.