The problem is that Google's search engine (and, oddly enough, ALL search engines) had already gotten worse before that. I noticed search engines getting worse several years before 2022. So AI further decreased the quality, but the quality was already trending downwards. There are some attempts to analyse this on YouTube (also owned by Google; Google ruins our digital world). Some explanations made sense to me, but even so I am not 100% certain why Google decided to ruin Google Search.
One key observation I made was that YouTube's search behaviour was copied onto Google's regular search, where it makes no sense. If I casually search for a video on YouTube, I may be semi-interested in unrelated videos. But if I search on Google for specific terms, I am not interested in crap such as "others also searched for xyz"; that just clutters the UI with irrelevant information. This is not the only example: Google made the search results worse and tries to confuse the user into clicking on things. Plus the placement of ads. The quality really worsened.
With them, at least the AI stuff can be turned off.
Membership is presently about 61k, and seems to be growing about 2k per month: https://kagi.com/stats
If someone can point me to a better index for that purpose, I'd love to avoid Yandex. Please inform me.
It's worth pointing out the flaws of all bad actors. The more info we have, the more effectively we can act.
Some of us would rather take a stand, imperfect as it is, than just sit and do nothing. Especially in the very clear case of someone (Kagi) doing business with a country that invaded a neighboring country for no reason, and keeps killing people there.
First, any stand is better than whataboutism and just sitting there doing nothing.
Second, this stand results from my thoughts. It is my stand. There are many like it, but this one is mine.
Third, in the history of the modern world there have been very few black-and-white situations where one side was clearly the aggressor. This is one of them.
I definitely disagree with this. There are many cases where you might take the wrong stand, especially where you do not have detailed knowledge of the issue you're taking a stand over.
It's a lot easier to live with yourself when you act according to your best understanding of the situation than when you allow fear to paralyze you into inaction at a time when you should have done something.
But that's the whole point, isn't it? What is "wrong"? You decide.
You see people in a peaceful country getting invaded and being bombed, shot, raped and tortured? You decide if this is "wrong". I'm just saying that you should decide, rather than say "but what about something else".
My point was that either we shouldn't react to things like this by blocking everything from a whole country because its government is bad, or we should do the same for other countries too.
* most of the evil stuff Russia's done, the USA's done way more of
You can't call for a boycott on a cosmetics company that experiments on dogs if you are a rival cosmetics company that experiments on ten times as many dogs.
Kagi uses both Yandex and Google btw.
/hj?
FWIW, I don't think Kagi should remove or avoid indexing content from countries that invade others, because a lot of the time websites in those countries have useful information on them. If Kagi were to enact such a block, it would no longer surface results from HN, reddit and a bunch of other communities, effectively making the search engine a lot less useful.
Feels more like a scare campaign to me - someone doesn't want you to use Kagi, and points to Yandex as a reason for that.
If you are concerned about heinous war crimes and the slaughter of civilians to the point that you don't want to use private services from countries that conduct such acts, you should avoid both already.
The post you linked was posted when the divestment was already going underway, so it is at least dishonest if not malicious.
For instance, here you can learn that Yandex NV is fully controlled by a group of Russian investors: https://www.rbc.ru/business/06/03/2024/65e7a0f29a7947609ea39...
They have a lot of hardware in e.g. Finland. I don't think they provide GPU access to Russian companies, but feel free to correct me.
https://som.yale.edu/story/2022/over-1000-companies-have-cur...
You pays your money, you takes your choice.
It's a different story if the payer truly can't afford to pay the alimony, but at that point they wouldn't have the immense power you are concerned with.
Now we need a 2nd Kagi, so we can switch to that one instead. :(
But as one counterexample: the end of the US penny was decided and announced not through public legislative discourse, nor even with an executive order, but with a brief social media post by the president.
And I don't mean that it's atrocious or anything, but I wanted to see that social media post myself. Not a report about it, or someone's interpretation of it, but -- you know -- the actual utterance from the horse's mouth.
Which should be a simple matter. After all, it's the WWW.
And I've been Googling for as long as there has been a Google to Google with. I'd like to think that I am proficient at getting results from it.
But it was like pulling teeth to get Google to eventually, kicking and screaming, produce a link to the original message on Truth Social.
If that kind of active reluctance isn't censorship on Google's part, then what might it be described as instead?
And if they're seeking to keep me away from the root of this very minor issue, then what else might they also be working to keep me from?
But Google does censor.
There certainly is a huge army of people ready to spout this sort of nonsense in response to anyone talking about doing anything.
Hard to know what percentage of these folks are trying to assuage their own guilt and what percentage are state actors. Russia and Israel are very chronically online, and it behooves us internet citizens to keep that in mind.
Should we stop using products imported from China for the cultural genocide they've perpetrated against the Uyghurs?[2]
Is Yandex Russia?
[1] https://en.wikipedia.org/wiki/Casualties_of_the_Iraq_War
[2] https://en.wikipedia.org/wiki/Persecution_of_Uyghurs_in_Chin...
However, if that's the case, how can they continue buying Chinese products when China has done the same thing, but worse, and for longer, to its own population? Because it's less convenient to stop? _That_ to me lands squarely in the "take whatever stand you want" category with the addendum of, "and don't worry if it doesn't make sense."
Is it because it's within their own borders and therefore isn't our problem?
Why are you assuming they are?
That's one way of phrasing it.
Regardless of one's position on the 'everything online is Russian propaganda, Russian bots or misinformation - invest in sickles and hammers, comrade / wtf just use basic common sense and the internet is as safe as it ever was' continuum, such universal enthusiasm for a Russian-owned, Russian-controlled search engine should generate a little more counter-argument, at the very least.
Mentions of Google, Bing, Startpage, DDG, or even Mojeek almost never pass online without somebody detailing the problems, the flaws, or why they're not as good as the alternatives. Usually at least 20% of the comments will be overtly critical, with at least one person passionately arguing that this search engine is going to destroy life as we know it / funds genocide / is an abomination unto God.
On open forums and spaces where a variety of users and tastes are represented, that minimum level of criticism usually applies to absolutely everything from movies to toothbrushing techniques to kids' TV to low-carb breakfasts. If more than 3 people care enough about something to discuss it, at least 1 of those people will hate it and feel the need to enunciate why.
Except Kagi. Kagi must enjoy the highest praise-criticism ratio of anything I've ever seen on the web, including concepts like sunshine and heaven and the eradication of polio.
Seriously. The only 'real' criticism I ever see of Kagi is like 'I personally don't like it because I don't think a search engine is worth more than $19.99' or 'unfortunately I need x feature', and it's always followed by a reply saying 'Ah, well Kagi is now available for $19.50' or 'you'll be thrilled to know that x feature can be enabled in Kagi by following these steps'.
And the occasional 'I don't use it because it seemed a bit weird and wasn't worth it' comment languishing on the outskirts of the discussion.
So yeah. I do not expect this comment to stir much discussion, mainly because it's like 24 hours after the main debate and is on a pretty low-impact thread on Hacker News from an uninspiring new-ish account. But also because Kagi-critical comments are written in sand, whatever the discussion or authority or audience.
That should make people more suspicious.
Maybe people just turn up too late and their comments generally aren't seen?
https://www.google.com/search?udm=14&q=kagi
My default browser search tool is set to Google with ?udm=14 automatically appended, i.e. it's not forced down your throat, nor mysteriously/accidentally/etc. turned back on occasionally.
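For anyone wanting to replicate this, here's a minimal sketch of building such a URL. As far as I know, udm=14 selects Google's plain "Web" results filter (no AI overview), but it isn't an officially documented parameter, so it could change. Browser custom-search settings usually take a template with %s in place of the query, e.g. https://www.google.com/search?udm=14&q=%s. The helper name below is made up.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def google_web_only_url(query: str) -> str:
    """Build a Google search URL with udm=14 (the "Web" filter) added."""
    # urlencode percent-encodes the query (spaces become '+') and joins
    # the parameters with '&', keeping the dict's insertion order.
    return "https://www.google.com/search?" + urlencode({"udm": "14", "q": query})


print(google_web_only_url("kagi"))  # same URL as the link above
# Round-trip check: the query decodes back to the original text.
print(parse_qs(urlparse(google_web_only_url("low background steel")).query)["q"])
```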
Did you mean:
worse results near me
are worse results worth it
worse results net worth
best worse results
worse results reddit
Did you mean vim?
(vice-versa)
You assume the aim here is for you to find relevant information, not to increase user retention time. (I just love the corporate speak for making people's lives worse in various ways.)
The funny thing is that it seems like when they gave up it wasn't because some new advancement in the arms race. It was well before LLMs hit the scene. The SEO spam was still incredibly obvious to a human reader. Really seems like some data-driven approach demonstrated that surrendering on this front led to increased ad revenue.
Problem is that no mainstream search engine will do it because they happen to also be in the ad business and wouldn't want to reduce their own revenue stream.
Though maybe it's a long term gain. I know many normal (i.e. non-IT) people who've noticed the poor search results, yet they continue to use Google search.
To be fair, most of what I use search for these days is "<<Programming Language | Tool | Library | or whatever>> <<keyword | function | package>>", then I navigate to the documentation, double-check that the versions align with what I'm writing software in, read... move on.
Sometimes I also search for "movie showtimes nyc" or for a specific venue or something.
So maybe my use cases are too specific to screw up, who knows. If not, maybe DDG is worth a try.
That's a separate problem. The search algorithm applied on top of the underlying content is a separate problem from the quality or origin of the underlying content, in aggregate.
Years ago, I would consider a search "failed" if the page with related information wasn't somewhere in the top 10. Now a search is "failed" if the AI answer doesn't give me exactly what I'm looking for directly.
Ask Prabhakar Raghavan. Bet he knows.
(wrote up in https://www.latent.space/i/139368545/the-concept-of-low-back... - but ironically repeating something somebody else said online is kinda what i'm willingly participating in, and it's unclear why human-origin tokens should be that much higher signal than ai-origin ones)
"...began to fall in 1963, when the Partial Nuclear Test Ban Treaty was enacted, and by 2008 it had decreased to only 0.005 mSv/yr above natural levels. This has made special low-background steel no longer necessary for most radiation-sensitive uses, as new steel now has a low enough radioactive signature."
What we're seeing now is something more like the peak of summer. If it ends up being a bubble and it bursts, the months after that will be an "AI Winter": investors won't want to keep chucking money at the problem, and it'll go back to being "in the background" research, as it was before.
Also, winter comes after September (fall).
Apparently, comparing low-background steel to pre-LLM text is a rather obvious analogy.
If you have a thought, it's likely it's not new.
i claimed swyx heard it through me - which he did
but we appreciated that, we called it "standing on the shoulders of giants"
We do not see nearly so far though.
Because these days we are standing on the shoulders of giants that have been put into a blender and ground down into a slippery pink paste and levelled out to a statistically typical 7.3mm high layer of goo.
I think all we can expect from internet information is a good description of the distribution of materials out there, not truth. This is totally within the capabilities of LLMs. For additional confidence run 3 reports on different models.
Whether or not the optimization functions align with human survival, and thus whether our whole existence is not slop, we're about to find out.
- Sir, this is an elevator.
The industrial age was built on dinosaur slop, and they were giant.
Listen, lad. I built this kingdom up from nothing. When I started here, all there was was swamp. Other kings said I was daft to build a castle on a swamp, but I built it all the same, just to show 'em. It sank into the swamp. So, I built a second one. That sank into the swamp. So, I built a third one. That burned down, fell over, then sank into the swamp, but the fourth one... stayed up! And that's what you're gonna get, lad: the strongest castle in these islands.
While this is religious: [24] “Everyone then who hears these words of mine and does them will be like a wise man who built his house on the rock. [25] And the rain fell, and the floods came, and the winds blew and beat on that house, but it did not fall, because it had been founded on the rock. [26] And everyone who hears these words of mine and does not do them will be like a foolish man who built his house on the sand. [27] And the rain fell, and the floods came, and the winds blew and beat against that house, and it fell, and great was the fall of it.”
Humans build not on each other's slop, but on each other's success.
Capitalism, freedom of expression, the marketplace of ideas, democracy: at their best these things are ways to bend the wisdom of the crowds (such as it is) to the benefit of all; and their failures are when crowds are not wise.
The "slop" of capitalism is polluted skies, soil and water; it is wage slaves and fast fashion that barely lasts one use; and it is the reason why workplace health and safety rules are written in blood. The "slop" of freedom of expression includes dishonest marketing, libel, slander, and propaganda. The "slop" of democracy is populists promising everything to everyone with no way to deliver it all. The "slop" of the marketplace of ideas is every idiot demanding that their own uninformed rambling be given the same weight as the considered opinions of experts.
None of these things contributed to our social, technological, or economic advancement; they are simply things which happened at the same time.
AI has stuff to contribute, but using it to make an endless feed of mediocrity is not it. As for the flood of low-effort GenAI stuff filling feeds and drowning signal in noise, as others have said: just give us your prompt.
Why is anybody still surprised that the AI bubble made it that big?
If Einstein came up with relativity by standing on "the religious non-sense and superstitions of the medieval ages," you'd have a point.
They have so many ways of saying "God" without saying God.
You might be missing the point of science.
It's ultimately an endeavor of finding testable descriptions of the world in the face of being fallible. It's not about the "why". It's about "how" the world is. No faith required. "Why" the world is is a philosophical question and perhaps a religious one. But that has nothing to do with testable theories.
Any scientific theory gains credibility by providing ways to test it. Each such experiment that fails to disprove the theory increases confidence in the theory's validity. There is no faith required for any of that and no god either. If you can predict that conditions A and B lead to C happening, and I can try it and see that indeed C is happening, then you have science going on, without any faith.
- (1) A lot of development can be just chores around managing scaffolding and repetitive work, and because of this, macros, autogenerated code, and other tools have been a thing at many layers for a long time; and
- (2) I remember copy-pasting from Google/StackOverflow (i.e. mostly search + pattern matching with some minimal reasoning) being criticized as a low-effort mode of development during the 2010s, before ChatGPT and AI assisted coding tools took over that part.
So yes, I'd argue a huge number of software development problems can be solved without ever actually reasoning from first principles; AI tools just made that more visible.
https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...
Like this:
"You can't believe everything you read on the internet." -- Abraham Lincoln, personal correspondence, 1863
I think there may be a way to disable this, but I don’t care enough to bother.
If people want to think my posts are AI generated, oh well.
It depends if you put the space before and after the dashes--that, to be clear, are meant to be there--or if you don't.
- vs – vs —
Compiler error while working on some ObjC. Nothing obviously wrong. Copy-pasted the line, same thing on the copy. Typed it out again, no issue with the re-typed version. Put the error version and the ok version next to each other, apparently identical.
I ended up discovering I'd accidentally leaned on the Option key while pressing "-". In Xcode's monospace font, the m-dash and the minus looked identical.
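A quick way to catch this sort of thing is to scan the file for dash look-alikes before blaming the compiler. A minimal sketch in Python; the set of characters checked is my own pick and not exhaustive, and the function name is made up:

```python
import unicodedata

# Characters that can render almost identically to ASCII '-' in monospace fonts.
LOOKALIKE_DASHES = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2212"}


def find_lookalike_dashes(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, Unicode name) for each non-ASCII dash, 1-based."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in LOOKALIKE_DASHES:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits


snippet = "int x = a - b;\nint y = a \u2014 b;\n"
for lineno, col, name in find_lookalike_dashes(snippet):
    print(f"line {lineno}, col {col}: {name}")
```

Run over a real file, you'd pass it the file's text (e.g. `open(path, encoding="utf-8").read()`).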
https://en.wikipedia.org/wiki/Whitespace_character#Hair_spac...
Typographers usually add space to the left side of the following marks:
: ; ” ’ ! ? / ) ] } * ¿ › » @ ® ™ ℓ ° ¡ ' " † + = ÷ - – —
And they usually add space to the right of these: “ ‘ / ( [ { > ≥ < ≤ £ $ ¢ € ‹ « √ μ # @ + = ÷ - – —
https://www.smashingmagazine.com/2020/05/micro-typography-sp...
1. (letterpress typography) A piece of metal type used to create the narrowest space.
2. (typography, US) The narrowest space appearing between letters and punctuation.
https://en.wiktionary.org/wiki/hair_space
Now I'd like to see what the metal type looks like, but ehm... it's difficult to google. I'd also like a whole collection of space types and what they're called in other languages.
Similarly, French puts spaces before and after ? ! while English and German only put spaces afterwards.
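A rough sketch of applying that rule (a space before ? and !) to plain text, assuming the narrow no-break space U+202F as the space character. Style guides disagree on which space to use, and this naive regex would also touch URLs and code; the function name is made up.

```python
import re

# Assumption: narrow no-break space; some styles use U+00A0 instead.
NNBSP = "\u202f"

# After a non-space character, swallow any existing spaces before a run of
# ? / ! and replace them with a single NNBSP, so "Quoi ?!" gets one space.
_BEFORE_MARKS = re.compile(r"(?<=\S)[ \u00a0\u202f]*([?!]+)")


def french_spacing(text: str) -> str:
    """Put a narrow no-break space before '?' and '!'."""
    return _BEFORE_MARKS.sub(NNBSP + r"\1", text)


print(repr(french_spacing("Vraiment ?")))
```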
[EDIT: I originally wrote that French treats . , ! ? specially. In reality, French only treats ? and ! specially.]
Didn't know! Woot, I win!
Why does AI have a preference for doing it differently?
So it’s not unambiguously a substitute for either; it is essentially its own punctuation mark, used in ASCII-only environments, with some influence from both the use of em-dashes and that of en-dashes in more formal environments.
(Or something like that: it's been a while since I played the game, and I don't remember the specific details of the story.)
It makes me wonder if a new human-only internet will need to be made at some point. It's mostly sci-fi speculation at this point, and you'd really need to hash out the details, but I am thinking of something like a meatspace-first network that continually verifies your humanity in order for you to retain access. That doesn't solve the copy-paste problem, or a thousand other ones, but I'm just thinking out loud here.
Er...digital id.
If someone can consistently produce high-quality content with AI assistance, so be it. Let them. Most don't, though.
AI slop can be produced faster than you're able to read it. This makes it incredibly costly to filter out in comparison. It just messes so much with the signal-to-noise ratio on the web.
Why is this the problem and not the reverse - using AI without adding anything original into the soup? I could paraphrase an AI response in my own words and it will be no better. But even if I used AI, if it writes my ideas, then it would not be AI slop.
[citation needed]
(I see absolutely no reason why that should be the case)
That being said, the idea of a new, freer internet is already a reality; Mastodon is a great example. I think private havens like Discord/Matrix/Telegram are an important step on the way.
https://cyberpunk.fandom.com/wiki/Blackwall_Gateway
Absolutely brutal: https://www.youtube.com/watch?v=LD5z3GmQRXQ
---
I also noticed how simple the "new web" is when interacting with it. Of course, that's a game mechanic, but also kinda makes sense.
Only if those humans don't take their leads from AI. If they read AI and write, not much benefit.
the only place I've ever had any issue with AI content is r/chess, where people love to ask ChatGPT a question and then post the answer as if they wrote it, half the time seemingly innocently, which, call me racist, but I suspect is mostly due to the influence of the large and young Indian contingent. otherwise I really don't understand where the issue lies. follow the exact same rules you do for avoiding SEO spam and you will be fine
It misidentified what the actual bug was.
But the tone was so confident, and he replied to my later messages using chat gpt itself, which insisted I was wrong.
I don't like this future.
What you're describing is not the future. It's a fireable offense.
Some of the science, energy, and technology subreddits receive a lot of ChatGPT repost comments. There are a lot of people who think they’ve made a scientific or philosophical breakthrough with ChatGPT and need to share it with the world.
Even the /r/localllama subreddit gets constant AI spam from people who think they’ve vibecoded some new AI breakthrough. There have been some recent incidents where someone posted something convincing and then others wasted a lot of time until realizing the code didn’t accomplish what the post claimed it did.
Even on HN, some of the “Show HN” posts are AI garbage from people trying to build portfolios. I wasted too much time trying to understand one of them until I realized they had (unknowingly?) duplicated some commits from the upstream project and then let the LLM vibe code a README that sounded like an amazing breakthrough. It was actually good work, but it wasn’t theirs. It was just some vibecoding tool eventually arriving at the same code as upstream and then putting the classic LLM-written, emoji-filled bullet points in the README.
Yes, it is because of the other side of the coin. If you were writing human-generated, curated content, previously you would just do it in your small patch of the Internet, and search engines (Google...) would probably pick it up anyway because it was good-quality content. You just didn't care about SEO-driven shit. Now your nice hand-written content is going to be fed into LLM training and used, whether you want it or not, in the next generation of AI slop content.
Slop did not originate from AI itself, but from the feed ranking Algorithm which sets the criteria for visibility. They "prompt" humans to write slop.
AI slop is just an extension of this process, and it started long before LLMs. Platforms optimizing for their own interest at the expense of both users and creators is the source of slop.
Also, the AI slop is covering almost every sentence or phrase you can think of to search. Before, if I used more niche search phrases and exact searches, I was pretty much guaranteed to get specific results. Now, I have to wade through pages and pages of nonsense.
* ChatGPT hallucinated an answer
* ChatGPT put it in my memory, so it persisted between conversations
* When asked for a citation, ChatGPT found 2 AI created articles to back itself up
It took a while, but I eventually found human written documentation from the organization that created the technical thingy I was investigating.
This happens A LOT for topics on the edge of knowledge easily found on the Web. Where you have to do true research, evaluate sources, and make good decisions on what you trust.
Except it's all via the chat bot and it isn't as easy to get it to move off of a broken solution.
chatgpt
vs
chatgpt before:2022-01-01
give me quite different results. In the 2nd query, most results have a date listed next to them on the results page, and that date is always prior to 2022. So the date filtering is "working". However, most of the dates are actually Google making a mistake and misinterpreting some unimportant date it found on the page as the date the page was created. At least one result is a YouTube video posted before 2022 whose title was edited after ChatGPT was released to say ChatGPT.
Disclosure: I work at Google, but not on search.
just use Kagi and block all SEO sites...
https://help.kagi.com/kagi/features/slopstop.html
That's specifically for AI generated content, but there are other indicators like how many affiliate links are on the page and how many other users have downvoted the site in their results. The other aspect is network effect, in that everyone tunes their sites to rank highly on Google. That's presumably less effective on other indices?
https://www.mojeek.com/search?q=britney+spears+before%3A2010...
This goes for you, too, website search.
All AI is doing is making it harder to know what is good information and what is slop, because it obscures the source, or people ignore the source links.
Plus, other sites that link to the content could also give away its date of creation, which is outside the control of whoever produces the AI content.
I believe I learned about it through HN, and it was this blog post: https://hallofdreams.org/posts/physicsforums/
It kind of reminds me of why some people really covet older accounts when they are trying to do a social engineering attack.
According to the article, it was the founder himself who was doing this.
None of these documents were actually published on the web by then, including a Watergate PDF bearing a date of Nov 21, 1974, almost 20 years before the PDF format was released. Of course, the WWW itself only started in 1991.
Google Search's date filter is useful for finding documents about historical topics, but unreliable for proving when information actually became publicly available online.
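A crude plausibility check along these lines is easy to sketch: a document can't have been online before the web existed, or before its file format existed. The years come from the comment above (WWW in 1991, PDF roughly 20 years after 1974, i.e. 1993); pinning them to January 1 is my simplification, and the function name is made up.

```python
from datetime import date

# Earliest plausible "online" dates; January 1 is a simplifying assumption.
WWW_START = date(1991, 1, 1)
EARLIEST_BY_FORMAT = {"pdf": date(1993, 1, 1), "html": WWW_START}


def plausible_online_date(claimed: date, file_format: str) -> bool:
    """False if the claimed date predates the web or the file format itself."""
    floor = max(WWW_START, EARLIEST_BY_FORMAT.get(file_format.lower(), WWW_START))
    return claimed >= floor


# The Watergate PDF example: a 1974 date can't be when it went online.
print(plausible_online_date(date(1974, 11, 21), "pdf"))
```

This only rules dates out; a date that passes can still be wrong.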
https://www.google.com/search?q=site%3Achatgpt.com&tbs=cdr%3...
So it looks like Google uses inferred dates over its own indexing timestamps, even for recently crawled pages from domains that didn't exist during the claimed date range.
I wonder why they do that when they could use time of first indexing instead.
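To make the difference concrete, here's a toy comparison of the two filtering strategies. The record type, field names, and example URL are all hypothetical; this is not how Google's index actually works.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class IndexedPage:
    url: str
    inferred_date: Optional[date]  # guessed from page text/metadata; easy to get wrong or fake
    first_indexed: date            # when the crawler first saw this URL


def before_by_inferred_date(page: IndexedPage, cutoff: date) -> bool:
    """Filter that trusts whatever date was inferred from the page."""
    when = page.inferred_date or page.first_indexed
    return when < cutoff


def before_by_first_indexing(page: IndexedPage, cutoff: date) -> bool:
    """Filter based only on the crawler's own timestamp."""
    return page.first_indexed < cutoff


# Hypothetical page that claims 2019 but was first crawled in 2023.
page = IndexedPage("https://chatgpt.com/example", date(2019, 5, 1), date(2023, 6, 1))
cutoff = date(2022, 1, 1)
print(before_by_inferred_date(page, cutoff), before_by_first_indexing(page, cutoff))
```

The trade-off: first-indexing can't vouch for genuinely old pages that were crawled late, so it would drop some real pre-2022 content in exchange for not letting backdated pages through.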
It also makes me wonder how future kids will see this era. Maybe it will look the same way early mechanical computers look to us. A short period where people had to be unusually inquisitive just to make things work.
Plus, the AI already read everything made before 2023, so what does it matter?
Creatives need to think a bit bigger with this particular issue.
I find it a bit annoying to navigate between hallucinations and outdated content. Too much invalid information to filter out.
How does it do that? At least Google seems to take website creation date metadata at face value.
It's a hell of a lot better than nothing, if one is using chrome or Firefox (neither of which are my primary browsers).
Actually, it came out in 2015 and was just low budget.
I use this to find old news articles for instance.
What we really need to do is build an AI tool to filter out the AI automatically. Anybody want to help me found this company?
You could even add options for later cutoffs… for example, you could use today’s AIs to detect yesterday’s AI slop.