# Data Collection

In AI, data collection is a major bottleneck. Much of the work in a machine learning project goes into preparing data: collecting, cleaning, analyzing, and visualizing it, and engineering features. Of these steps, collection is the hardest, for several reasons.

First, when machine learning is applied to a new domain, there is often not enough data to train models. Established tasks such as machine translation and object recognition have large datasets accumulated over years of work, but new domains lack this advantage.

Second, the rise of deep learning has increased the demand for data. In traditional machine learning, much of the effort goes into feature engineering, which requires domain expertise to select and construct the features used for training. Deep learning reduces this work by learning features directly from the data, but the convenience comes with a trade-off: deep learning models usually need more data to perform well. As a result, finding effective and scalable ways to collect data is more critical than ever, especially for large language models (LLMs).

Fig.1 shows a high-level landscape of data collection for machine learning. Sub-topics to which the community can contribute in a decentralized way are highlighted in green text.

<figure><img src="/files/gieJGKf7SnwVV1HH7DoY" alt=""><figcaption><p>Fig.1 landscape of data collection</p></figcaption></figure>

Anyone can help the DIN network collect on-chain and off-chain data through the ecosystem's two dApps, Analytix and [xData](/din-cook-data-for-ai/how-to-join/xdata-explained.md).

The network rewards data collection nodes according to the quality of the data they contribute. The quality assessment standard is determined automatically by the network with the help of validator nodes.

Running a validator node is permissionless, so the more people participate in building the network, the more robust it becomes.
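The reward and validation mechanism is described only at a high level here. The sketch below illustrates one way quality-based rewards could work, assuming (hypothetically) that several validator nodes each score a submission in the range [0, 1], that those scores are combined with a median, and that a fixed reward pool is split in proportion to each collection node's aggregated quality. The names `Submission`, `aggregate_quality`, and `distribute_rewards`, the median rule, and the proportional split are all illustrative assumptions, not DIN's actual protocol.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Submission:
    node_id: str          # hypothetical ID of the data collection node
    scores: list[float]   # hypothetical quality scores in [0, 1], one per validator


def aggregate_quality(scores: list[float]) -> float:
    """Combine validator scores with a median (an assumed rule, not DIN's)."""
    if not scores:
        return 0.0  # unreviewed data earns nothing in this sketch
    return median(scores)


def distribute_rewards(submissions: list[Submission], reward_pool: float) -> dict[str, float]:
    """Split reward_pool across nodes in proportion to their total aggregated quality."""
    quality: dict[str, float] = {}
    for sub in submissions:
        quality[sub.node_id] = quality.get(sub.node_id, 0.0) + aggregate_quality(sub.scores)
    total = sum(quality.values())
    if total == 0:
        return {node: 0.0 for node in quality}
    return {node: reward_pool * q / total for node, q in quality.items()}


if __name__ == "__main__":
    subs = [
        Submission("node-a", [0.9, 0.8, 0.85]),
        Submission("node-b", [0.4, 0.5, 0.45]),
        Submission("node-c", [0.2, 0.95, 0.25]),  # one validator reports an outlying score
    ]
    rewards = distribute_rewards(subs, reward_pool=100.0)
    print({node: round(r, 2) for node, r in rewards.items()})
```

The median is used here because, as long as fewer than half of the validators reviewing a submission report outlying scores, the aggregate stays within the range of the other scores. This is one way that adding more independent validators could make assessment more robust; the real network may use a different scoring or aggregation rule.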


