Data Pipeline
A data pipeline captures data inputs, retains the data for a period of time, and delivers the data to receivers.
The implementation of a data pipeline can take a number of forms:
a general form: a series of processes linked together, each passing its output to the next
a more specific form: a dedicated pipeline system such as Apache Kafka
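The general form can be sketched with plain Python generators acting as the linked processes. This is a minimal illustration, not a reference implementation; the stage names (capture, retain, deliver) mirror the capture/retain/deliver description above and are otherwise invented:

```python
# A sketch of a pipeline as a series of linked processes (Python generators).

def capture(records):
    # Capture stage: yield raw data inputs one at a time.
    for record in records:
        yield record

def retain(records, buffer):
    # Retention stage: keep a copy of each record before passing it on.
    for record in records:
        buffer.append(record)
        yield record

def deliver(records, receiver):
    # Delivery stage: hand each record to a receiver callable.
    for record in records:
        receiver(record)

buffer, delivered = [], []
deliver(retain(capture([1, 2, 3]), buffer), delivered.append)
print(buffer)     # [1, 2, 3]
print(delivered)  # [1, 2, 3]
```

Each stage only consumes what the next stage requests, so the chain processes records one at a time rather than materializing intermediate lists.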
Terminology
Depending on the specific implementation of a Data Pipeline, the terminology used can vary:
Data
Data
Records
Streams
Messages
Indexes
Topics
Consumers
Loading Functions
Loading
Ingestion
Importing
Registration
Subscribing
Publishing
Connectors
Producing/Producers
Queuing Functions
Queuing
Streaming
Logging
Storing
Messaging
Brokering/Brokers
Threading/Threads
Clustering/Clusters
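Several of the queuing terms above (queuing, messaging, brokering, threading) can be illustrated with a standard-library sketch. This is an in-memory toy broker, assuming a single producer and a single consumer, not a production system such as Kafka:

```python
import queue
import threading

# An in-memory "broker": a thread-safe queue between producer and consumer.
broker = queue.Queue()
received = []

def producer():
    # Loading side: enqueue messages onto the broker.
    for i in range(3):
        broker.put(f"message-{i}")
    broker.put(None)  # sentinel: signals no more messages

def consumer():
    # Retrieval side: dequeue messages until the sentinel arrives.
    while True:
        msg = broker.get()
        if msg is None:
            break
        received.append(msg)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(received)  # ['message-0', 'message-1', 'message-2']
```

The queue both decouples the two threads and retains messages until the consumer is ready, which is the essence of the queuing function.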
Indexing Functions
Indexing
Tracking
Topics
Catalog Functions
Cataloging
Categorizing
Retrieval Functions
Connecting/Connectors
Listening/Listeners
Subscribing/Subscribers
Exporting
Reading
Consuming/Consumers
Distributing
Producing
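The loading-side term publishing and the retrieval-side terms subscribing and topics fit together in the publish/subscribe pattern. A minimal sketch, assuming an in-memory topic registry (the topic names and helper functions are invented for illustration):

```python
# Topic registry: maps a topic name to the callbacks subscribed to it.
subscribers = {}

def subscribe(topic, callback):
    # Retrieval function: register a subscriber for a topic.
    subscribers.setdefault(topic, []).append(callback)

def publish(topic, message):
    # Loading function: deliver a message to every subscriber of the topic.
    for callback in subscribers.get(topic, []):
        callback(message)

inbox = []
subscribe("orders", inbox.append)
publish("orders", {"id": 1})
publish("payments", {"id": 2})  # no subscriber for this topic; dropped
print(inbox)  # [{'id': 1}]
```

In a real pipeline system the registry would live in the broker and messages for unsubscribed topics would typically be retained rather than dropped.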
Key Performance Factors
Key performance factors to consider and monitor include:
throughput
real-time response times
batch response times
retention period for queued data (how long data remains retrievable)
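Throughput is typically measured as records processed per unit of time. A minimal sketch, assuming a stand-in workload in place of a real pipeline stage:

```python
import time

def process(record):
    # Placeholder for real pipeline work (parse, transform, deliver, ...).
    return record * 2

records = range(100_000)
start = time.perf_counter()
count = sum(1 for r in records if process(r) is not None)
elapsed = time.perf_counter() - start
print(f"processed {count} records in {elapsed:.3f}s "
      f"({count / elapsed:.0f} records/sec)")
```

The same timing approach applies to the response-time factors: time a single record end to end for real-time latency, or a whole batch for batch latency.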