Data Flow is a machine learning pattern that describes the sequence of data movement through the AI engineering life cycle.
First, data is processed layer by layer, as shown in Fig. 1, to prepare it for storage, training, and other uses.
Data then passes through processing layers as it is stored, refined, and prepared for use in machine learning models and applications. From a more functional perspective, the data is then consumed by different machine learning function groups, as shown below:
Details for each layer in the chart above are as follows:
Data sources include:
Company Internal Databases
Company Internal Files
Websites
Public Data
Smartphone Apps
IoT Devices
Commercial Data Aggregators
Point of Sale
Corporate Internal Processes
Social Media
Data Streams
Capture mechanisms include:
Website Scraping
Website and Smartphone Chat Dialogues
Website and Smartphone Form Submissions
IoT Device Interfaces
Commercial Data Aggregator Feeds
Corporate Internal Process Feeds
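As a concrete illustration of one capture mechanism, the sketch below extracts link targets from an HTML page using only Python's standard library. In a real scraper the page would be fetched over HTTP (for example with `urllib.request`); here a small inline sample stands in for the response, and the page content is an illustrative assumption.

```python
from html.parser import HTMLParser

# Stand-in for a fetched web page; in practice this would come from
# an HTTP response body.
SAMPLE_HTML = """
<html><body>
  <a href="/products">Products</a>
  <a href="/pricing">Pricing</a>
</body></html>
"""

class LinkExtractor(HTMLParser):
    """Collects the href attribute of every anchor tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

The same parsing pattern applies whether the source is a scraped website, a form submission payload, or a feed from a commercial aggregator: capture is the step that turns raw source output into structured records.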
Pipeline processes include:
Data Ingestion
Temporary Data Storage
Data Subscription
Data Publication
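The four pipeline processes above can be sketched as a minimal in-memory pipeline: records are ingested into a temporary buffer, subscribers register interest in topics, and publication drains the buffer to the matching subscribers. Topic names and the record shape are illustrative assumptions, not a specific product's API.

```python
from collections import defaultdict, deque

class Pipeline:
    """Toy ingest/buffer/subscribe/publish pipeline."""

    def __init__(self):
        self.buffer = deque()                 # temporary storage
        self.subscribers = defaultdict(list)  # topic -> callbacks

    def ingest(self, record):
        self.buffer.append(record)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish_all(self):
        # Drain the buffer, routing each record to its topic's subscribers.
        while self.buffer:
            record = self.buffer.popleft()
            for callback in self.subscribers[record["topic"]]:
                callback(record)

received = []
pipeline = Pipeline()
pipeline.subscribe("sales", received.append)
pipeline.ingest({"topic": "sales", "amount": 42})
pipeline.publish_all()
```

Production systems replace the in-memory deque with durable brokers or staging storage, but the ingest/buffer/publish/subscribe roles stay the same.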
Databases include:
Data Lakes
SQL Databases
Document Databases
Graph Databases
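For the SQL case, the sketch below stores captured records in Python's built-in in-memory SQLite database. The table and column names are illustrative assumptions.

```python
import sqlite3

# In-memory SQLite stands in for a production SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (source TEXT, value REAL)")

# Insert a couple of captured records (illustrative values).
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("iot_device", 21.5), ("point_of_sale", 9.99)],
)

# Read the records back in a stable order.
rows = conn.execute(
    "SELECT source, value FROM events ORDER BY source"
).fetchall()
```

Data lakes, document stores, and graph databases trade this fixed tabular schema for more flexible structures, which is why pipelines often land raw data in a lake first and load curated subsets into SQL.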
ETL functions include:
Extract Functions: pulling data from selected sources
Transform Functions: normalization, regularization, aggregation
Load Functions: saving data in formats for use in modeling processes
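The three ETL functions above can be sketched end to end: extract pulls usable rows from a source, transform normalizes a numeric field to the [0, 1] range, and load writes the result to a destination that stands in for model-ready storage. The field names and data are illustrative assumptions.

```python
def extract(source_rows):
    # Extract: keep only rows that carry the field we need.
    return [r for r in source_rows if "amount" in r]

def transform(rows):
    # Transform: min-max normalization of 'amount' to [0, 1].
    values = [r["amount"] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero on constant data
    return [{**r, "amount": (r["amount"] - lo) / span} for r in rows]

def load(rows, destination):
    # Load: append the transformed rows to the destination store.
    destination.extend(rows)
    return destination

source = [{"amount": 10.0}, {"amount": 30.0}, {"id": 3}]
model_ready = load(transform(extract(source)), [])
```

Real pipelines swap the in-memory lists for database reads and writes, but the extract/transform/load decomposition is the same.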
Model-type category examples include:
Artificial Neural Networks
Decision Trees
Probabilistic Graphical Models
Cluster Analysis
Gaussian Processes
Regression Analysis
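To make one of the model types above concrete, here is regression analysis in its simplest form: a line fit to data by ordinary least squares, in pure Python. The data points are illustrative.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance(x, y) divided by variance(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Exactly linear data generated from y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

The other model families in the list (neural networks, decision trees, clustering, and so on) follow the same fit-then-predict pattern, differing in what structure they fit to the prepared data.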
Application examples include:
Medical Diagnosis
Autonomous Vehicles
Chatbot Dialog
Image Recognition
Face Recognition
Product Recommendations
Churn Prediction
Malware Detection
Search Refinement