Saved searches
Use saved searches to filter your results more quickly
Cancel Create saved search
Sign up Reseting focus
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session.
Concurrent and multi-stage data ingestion and data processing with Rust+Tokio
License
Notifications You must be signed in to change notification settings
Rustixir/tokio_sky
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Go to file
Folders and files
Last commit message
Last commit date
Latest commit
History
View all files
TokioSky
Build concurrent and multi-stage data ingestion and data processing pipelines with Rust+Tokio. TokioSky allows developers to consume data efficiently from different sources, known as producers, such as Apache Kafka, Apache Pulsar and others. inspired by elixir broadway
Features
- Producer - source of data pipelines
- Processor - process message also can send to next stage by dispatcher
- BatchProcessor process group of message, that is used for latest stage, can not have next stage
- Dispatcher - dispatch message with three mode ( RoundRobin , BroadCast , Partition )
- Customizable - can use built-in Producer , Processor , BatchProcessor like Apache Kafka, Apache Pulsar or write your custom Producer , Processor , BatchProcessor
- Batching - TokioSky provides built-in batching, allowing you to group messages either by size and/or by time. This is important in systems such as Amazon SQS, where batching is the most efficient way to consume messages, both in terms of time and cost. Good Example imagine processor has to check out a database connection to insert a record for every single insert operation, That’s pretty inefficient, especially if we’re processing lots of inserts.Fortunately, with TokioSky we can use this technique, is grouping operations into batches, otherwise known as Partitioning. See Example
- Dynamic batching - TokioSky allows developers to batch messages based on custom criteria. For example, if your pipeline needs to build batches based on the user_id, email address, etc, See Example
- Ordering and Partitioning - TokioSky allows developers to partition messages across workers, guaranteeing messages within the same partition are processed in order. For example, if you want to guarantee all events tied to a given user_id are processed in order and not concurrently, you can use Dispatcher with Partition mode option. See Example.
- Data Collector - when source Producer of your app is web server and need absorb data from client request can use 'Collector' as Producer , that asynchronous absorb data, then feeds to pipelines See Example
- Graceful shutdown - first terminate Producers, wait until all processors job done, then shutdown
- Topology - create and syncing components
Examples
The complete Examples on Link.
Explain
- factory - instance factory
- concurrency - creates multiple instance (For parallelism)
- router - used by dispatcher for routing message ( RoundRobin || BroadCast || Partition )
- producer_buffer_pool - producer internally used buffer for increase throughout
- run_topology - TokioSky always have one Producer Layer and at-least have 1 processor layer and at-max 5 processor layer and 1 optional layer batcher for creating and syncing components must use run_topology_X or run_topology_X_with_batcher
Attention
- Producer.dispatcher cannot be Partition mode
- Processor if have not next stage channel must return ProcResult::Continue unless processor (skip) that message
- All Built-in processor if have next stage, must dispatcher not be partition mode
Crates.io
tokio_sky = 1.0.0