Enterprises grappling with fragmented data pipelines and costly migrations face significant barriers to effective AI adoption. Tom Sawyer Software, a leader in graph and data visualization, has announced the beta release of Tom Sawyer Data Streams, a schema-driven platform designed to extract, transform, and load (ETL) both streaming and legacy data from diverse sources into a single, governed, and query-ready knowledge graph. The platform aims to create a reliable context layer for AI pipelines, operational decisions, and analytics without requiring expensive re-platforming.
Tom Sawyer Software launches Data Streams 1.0 Beta, a schema-driven ETL platform.
It ingests data from Apache Kafka/Confluent topics fed by databases, files, and APIs.
The platform transforms and links streaming data into a persisted, governed knowledge graph.
Key use cases include AI pipeline preparation, retrieval-augmented generation (RAG), validation of AI outputs, and impact analysis.
It features a visual flow designer, real-time processing, and enterprise security.
The output integrates with Tom Sawyer's visualization tools for immediate exploration.
Data Streams addresses the core challenge of data isolation by using Apache Kafka as a central nervous system. It subscribes to Kafka topics, which can be fed from databases, change data capture (CDC) tools, files, or APIs, and applies user-defined transformations that model the incoming data as a knowledge graph in real time. This approach lets enterprises consolidate disparate structured and unstructured data into a semantically rich, linked-data model that serves as a single source of truth for downstream AI applications, including retrieval-augmented generation (RAG) and the validation of generative AI results.
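To make the ingest-and-link step more concrete, the Python sketch below imitates it in memory. It is not the Data Streams API. The record format, the topic names `orders` and `customers`, the `KnowledgeGraph` class, and the per-topic transform functions are all hypothetical, and a plain list of `(topic, value)` tuples stands in for a Kafka consumer. The key idea is that nodes are keyed by `(label, id)`. As a result, a customer referenced in the `orders` stream is merged with the same customer arriving on the `customers` stream, regardless of which record arrives first.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Hypothetical in-memory graph; nodes are keyed by (label, id)."""
    nodes: dict = field(default_factory=dict)  # (label, id) -> property dict
    edges: set = field(default_factory=set)    # (source key, relation, target key)

    def upsert_node(self, label, node_id, props):
        # Create the node if it does not exist yet, then merge in new properties
        key = (label, node_id)
        self.nodes.setdefault(key, {}).update(props)
        return key

    def add_edge(self, source, relation, target):
        self.edges.add((source, relation, target))


# Hypothetical user-defined transformations, one per topic
def transform_customer(graph, value):
    graph.upsert_node("Customer", value["customer_id"], {"name": value["name"]})


def transform_order(graph, value):
    order = graph.upsert_node("Order", value["order_id"], {"total": value["total"]})
    # Linking step: reference the customer by key, creating a stub if not seen yet
    customer = graph.upsert_node("Customer", value["customer_id"], {})
    graph.add_edge(customer, "PLACED", order)


TRANSFORMS = {"customers": transform_customer, "orders": transform_order}


def apply_stream(graph, records):
    """Apply the matching transform to each (topic, value) record; skip unmapped topics."""
    for topic, value in records:
        handler = TRANSFORMS.get(topic)
        if handler is not None:
            handler(graph, value)


# A list of (topic, value) tuples stands in for messages read from Kafka
records = [
    ("orders", {"order_id": "o1", "customer_id": "c1", "total": 42.0}),
    ("customers", {"customer_id": "c1", "name": "Ada"}),
    ("clicks", {"page": "/home"}),  # no transform registered, so it is ignored
]

graph = KnowledgeGraph()
apply_stream(graph, records)
print(graph.nodes)
print(graph.edges)
```

Because upserts merge properties into an existing node, the order record can arrive before the customer record and the graph still ends up with a single, linked `Customer` node. A production pipeline would also need to handle deletes, schema changes, and persistence, all of which this sketch omits.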
Brendan Madden, CEO of Tom Sawyer Software, explained the value proposition: "Data Streams is a breakthrough capability for enterprises struggling with isolated and legacy datasets, costly migrations, and streamlining AI pipelines. Data Streams uses Kafka, CDC, and well-defined transformations to assemble a governed knowledge graph alongside your current stack. The result is lower storage and migration spend, and a reliable context layer for operations and AI."
The platform emphasizes productivity for developers and data architects through a web-based visual designer for building and monitoring data flows. Users can extract schemas automatically from Kafka topics, then refine them in a visual editor to rename fields, convert node types, and apply filters written in Spring Expression Language (SpEL). These governance controls help ensure that the resulting knowledge graph reflects business semantics and stays consistent across all integrated sources, which is critical for accurate lineage tracking and impact analysis.
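The schema-extraction and refinement steps can be roughly approximated as follows. This is an illustrative sketch under stated assumptions and does not reflect Data Streams' actual configuration format. The field names, the `RENAMES` mapping, the `Payment` node label, and the 100 threshold are hypothetical. A plain Python predicate also stands in for the SpEL filter expression that the platform uses.

```python
from collections import defaultdict


def infer_schema(messages):
    """Map each field name to the sorted list of Python type names observed for it."""
    schema = defaultdict(set)
    for msg in messages:
        for name, value in msg.items():
            schema[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in schema.items()}


# Hypothetical refinement rules, not Data Streams' real configuration
RENAMES = {"cust_nm": "customerName", "amt": "amount"}  # rename cryptic source fields
NODE_TYPE = "Payment"  # node label assigned to records from this topic


def keep(msg):
    # Stand-in for a SpEL filter such as "amount > 100"; this is plain Python, not SpEL
    return msg["amount"] > 100


def refine(messages):
    """Rename fields, apply the filter, and label the surviving records as nodes."""
    nodes = []
    for msg in messages:
        renamed = {RENAMES.get(k, k): v for k, v in msg.items()}
        if keep(renamed):
            nodes.append({"label": NODE_TYPE, **renamed})
    return nodes


# Sample messages standing in for records sampled from a Kafka topic
samples = [
    {"id": "p1", "cust_nm": "Ada", "amt": 250},
    {"id": "p2", "cust_nm": "Alan", "amt": 40},
    {"id": "p3", "cust_nm": "Grace", "amt": 120.5},
]

schema = infer_schema(samples)
payments = refine(samples)
print(schema)
print(payments)
```

Inferring the schema from sampled messages surfaces inconsistencies, such as `amt` arriving as both integers and floats, before any rules are written. Applying renames before the filter lets the filter refer to business-friendly names rather than raw source fields.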
Built for the enterprise, Data Streams supports secure authentication (OAuth 2.0, Keycloak), Docker-based installation for cloud, on-premises, and air-gapped environments, and the option to publish results back to Kafka for consumption by other services. A key advantage is its seamless integration with Tom Sawyer's existing visualization tools, Perspectives and Explorations. This lets users move quickly from data ingestion to interactive visual exploration of the graph, where they can uncover patterns, relationships, and trends.
Tom Sawyer Data Streams represents a strategic evolution from visualization tools to a full-stack data integration and contextualization platform. By focusing on the knowledge graph as the target model and leveraging ubiquitous streaming infrastructure, it offers a pragmatic path for organizations to modernize their data estate for AI. This approach avoids "rip-and-replace" scenarios, instead layering intelligence and connectivity over existing systems to unlock operational and analytical insights that were previously obscured by data silos.
About Tom Sawyer Software
Tom Sawyer Software is the leading provider of software and services that enable organizations to build highly scalable and flexible graph and data visualization and analysis applications. These applications are used to discover hidden patterns, complex relationships, and key trends in large and diverse datasets. Tom Sawyer Software serves clients with needs in link analysis; network topology; architectures and models; schematics and maps; and dependencies, flows, and processes. We help clients federate and integrate their data from multiple sources and build the graph and data visualization applications that are critical to analyzing and gaining insight into their data.