This article offers a detailed analysis of the information sources used by artificial intelligence in 2025, based on a study of 40,000 searches. It covers the methodology, the types of sources AI prefers, how those preferences evolve along the user journey, and the trends among major players in the sector. It shows how AI sources its data in order to respond reliably and in line with current expectations.

The Fundamentals of the Study on AI Information Sources

In 2025, the question of where the data feeding AI comes from has become crucial. At a time when conversational engines and academic research tools like Semantic Scholar play a central role, it is important to understand how these systems select, filter, and highlight their sources. Researchers and companies alike are questioning the reliability and transparency of the data these AIs use to generate answers.

The study, conducted by xfunnel, uses analysis methods that are both rigorous and innovative. It examined more than 250,000 outbound links from 40,000 responses generated by three of the leading conversational engines of 2025: ChatGPT, Perplexity, and Gemini (Google’s chatbot). Each citation was categorized according to specific criteria: source type, domain authority, stage of the user journey, etc.

The results quickly reveal a priority: the majority of cited sources have high authority, particularly those with a domain score above 80. This means that a site recognized for its credibility, such as an official organization or a reputable media outlet, is more likely to be included in the responses of these AIs. Drawing on quality sources lends a certain legitimacy to the information produced.

How AIs choose and enhance their sources of information

To understand the logic behind these choices, we must analyze the methodology used by these AI assistants. The first step is to examine the tools for analyzing this data. Through a detailed classification, we see that sites with high authority clearly dominate the information landscape: nearly 32% of citations come from domains with a DA (domain authority) between 80 and 100. We find in particular:

  • Institutional sites, such as those of governments or universities.
  • Major media and recognized press companies.
  • Specialized databases recognized for their scientific rigor.

Sites with low authority, which generally offer little assurance about the reliability of their information, are rarely cited. The trend is therefore clear: beyond basic metrics, AI readily favors a selection of solid and verifiable sources.

In terms of typology, we observe a strong predilection for:

  • “Earned” media, that is, outlets that have earned their reputation through their content and credibility
  • Third-party sites and influential blogs
  • UGC spaces, with particular attention to Reddit or YouTube, which provide a large amount of user content in real time
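
The authority breakdown described above (the share of citations falling in each domain-authority band) can be sketched in a few lines of Python. This is only an illustration, not the study's actual pipeline: the `DOMAIN_AUTHORITY` scores, the `.example` domains, the bucket boundaries below 80, and the function names are all assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical domain-authority scores (assumption for illustration only);
# a real analysis would obtain these from an SEO metrics provider.
DOMAIN_AUTHORITY = {
    "agency.example": 93,
    "daily-news.example": 88,
    "niche-blog.example": 41,
    "tiny-shop.example": 12,
}

# (inclusive lower bound, label), checked from the highest band down.
# Only the 80-100 band comes from the article; the others are assumed.
BUCKETS = [(80, "80-100"), (60, "60-79"), (40, "40-59"), (0, "0-39")]


def bucket_for(score: int) -> str:
    """Return the label of the authority band a score falls into."""
    for lower, label in BUCKETS:
        if score >= lower:
            return label
    raise ValueError(f"score must be non-negative, got {score}")


def authority_shares(cited_urls: list[str]) -> dict[str, float]:
    """Fraction of citations per authority band ('unknown' if unscored)."""
    counts = Counter()
    for url in cited_urls:
        domain = urlparse(url).netloc.removeprefix("www.")
        score = DOMAIN_AUTHORITY.get(domain)
        counts["unknown" if score is None else bucket_for(score)] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Hypothetical citations extracted from AI responses.
citations = [
    "https://www.agency.example/report",
    "https://daily-news.example/tech/ai",
    "https://niche-blog.example/ai-sources",
    "https://tiny-shop.example/products",
]
shares = authority_shares(citations)
print(shares)
```

With real data, the value reported for the `"80-100"` band would correspond to the roughly 32% figure cited in the study.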

Sources according to the purchasing journey

What is striking is that the selection of sources by AI is not random. It evolves with the user’s approach:

| Stage of the journey | Preferred sources | Example sites |
| --- | --- | --- |
| Exploration | Earned media and public content | Yahoo, Ecosia, news sites, influential blogs |
| Comparison | UGC and customer reviews | G2, Trustpilot, specialized forums |
| Final research & evaluation | Proprietary sites and direct competitors | Official sites, product pages, premium comparison sites |

This strategic choice shows that AI adapts to the stage of the consumer’s or researcher’s journey in order to respond better. The platform doesn’t simply identify a source; it selects it according to the user’s approach to provide a relevant and reliable answer.

Major tech players and their influence on the sources cited

Among digital players, the differences are notable. Let’s decipher some major trends:

  • Giants like Google, with its Google Search engine, tend to favor their own sites in their responses. This dominance is often based on their authority, reinforced by their position in the ecosystem.
  • Other search engines, such as Bing or Yahoo, adopt a similar approach, leveraging their own index or drawing from accessible databases.
  • Alternative search engines such as DuckDuckGo or Ecosia, which emphasize transparency and privacy, use more diverse sources and less biased algorithms, even if their citation volume is generally lower.

This phenomenon, combined with the massive presence of these sites in the responses, raises questions about transparency and impartiality. Competition between search engines has never been fiercer, particularly against players like Naver or Baidu, which also integrate local sources specific to their market.

UGC Sources: Reddit, YouTube, and Other Community Giants in 2025

User-generated content remains a key pillar in building AI responses:

  • Reddit and YouTube are at the forefront, embodying the richness and diversity of Internet users’ opinions.
  • GitHub and Medium also contribute significantly to the knowledge base, particularly for technical or niche topics.
  • Each engine has its own preferences: Perplexity favors YouTube and PeerSpot, while Gemini leans more toward Medium and Reddit. ChatGPT most often cites LinkedIn, G2, and Gartner Peer Reviews for professional opinions.

Influencers, forums, and community spaces contribute to this content. Their role is to offer an immediate perspective, sometimes subjective, but often highly relevant to a specific need.

The Challenges, Limitations, and Prospects of Transparency in Feeding AI

What emerges from this analysis is a rich diversity of sources, but also crucial issues:

Transparency

AI engines initially prioritize third-party media during the exploration phase, then integrate more testimonials and user-generated content during the comparison phase, and turn to proprietary sites at the time of the final evaluation.
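
The stage-dependent pattern described above can be expressed as a simple lookup plus a tally over observed citations. This is a minimal sketch under stated assumptions: the `JourneyStage` enum, the `PREFERRED_SOURCES` mapping (transcribed from the article's journey table), the function name, and the sample records are all hypothetical, not part of the study's tooling.

```python
from collections import Counter, defaultdict
from enum import Enum


class JourneyStage(Enum):
    EXPLORATION = "exploration"
    COMPARISON = "comparison"
    FINAL_EVALUATION = "final research & evaluation"


# Preferred source types per stage, as reported in the article.
PREFERRED_SOURCES = {
    JourneyStage.EXPLORATION: ("earned media", "public content"),
    JourneyStage.COMPARISON: ("UGC", "customer reviews"),
    JourneyStage.FINAL_EVALUATION: ("proprietary sites", "direct competitors"),
}


def dominant_type_by_stage(records):
    """Map each stage to its most frequently cited source type.

    `records` is an iterable of (JourneyStage, source_type) pairs.
    """
    counts = defaultdict(Counter)
    for stage, source_type in records:
        counts[stage][source_type] += 1
    return {stage: c.most_common(1)[0][0] for stage, c in counts.items()}


# Hypothetical observations, one pair per citation found in a response.
observed = [
    (JourneyStage.EXPLORATION, "earned media"),
    (JourneyStage.EXPLORATION, "earned media"),
    (JourneyStage.EXPLORATION, "UGC"),
    (JourneyStage.COMPARISON, "UGC"),
    (JourneyStage.COMPARISON, "UGC"),
    (JourneyStage.COMPARISON, "earned media"),
    (JourneyStage.FINAL_EVALUATION, "proprietary sites"),
]

dominant = dominant_type_by_stage(observed)
for stage, source_type in dominant.items():
    expected = source_type in PREFERRED_SOURCES[stage]
    print(f"{stage.value}: {source_type} (matches reported pattern: {expected})")
```

Comparing the tallied result against `PREFERRED_SOURCES` is one way to check whether a given set of responses follows the pattern the study reports.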

Search engines tend to prioritize their own index or internal databases, while offering users a degree of diversity that depends on how much weight each engine gives to transparency.

Is UGC (user-generated content) reliable?

It provides an immediate and often highly subjective perspective, but its reliability must always be verified, especially when it comes from forums or social networks.

What ethical issues surround AI’s source selection?

Transparency, neutrality, and a plurality of perspectives remain at the heart of the debate, while the obsession with dominating the information landscape can distort overall understanding.

Written by Kevin Grillot

Web marketing consultant and SEO expert.