PostgreSQL and pgvector: Now Faster than Pinecone, 75% cheaper, 100% open-source.
Introducing pgvectorscale, an open-source PostgreSQL extension that builds on pgvector, enabling greater performance and scalability. Here’s how pgvectorscale helps pgvector outperform specialized vector databases like Pinecone:
1️⃣ StreamingDiskANN: A new vector search index that overcomes limitations of in-memory indexes like HNSW by storing part of the index on disk, making it more cost-efficient to run and scale as vector workloads grow. Inspired by the DiskANN paper from Microsoft.
2️⃣ Statistical Binary Quantization (SBQ): Developed by researchers at Timescale, this technique improves on standard binary quantization by preserving more accuracy when quantization is used to reduce the space needed for vector storage.
3️⃣ Written in Rust, giving the PostgreSQL community a new way to contribute to vector support.
📈 The result? On our benchmark of 50 million Cohere embeddings (768 dimensions each), PostgreSQL with pgvector and pgvectorscale achieves 28x lower p95 latency and 16x higher query throughput compared to Pinecone for approximate nearest neighbor queries at 99% recall, all at 75% less cost when self-hosted on AWS EC2. We also tested it against Pinecone’s p2 high-performance index; see the blog post linked at the end of this post for full results (spoiler: it’s just as impressive).
Pgvectorscale is open-source under the PostgreSQL license and free for you to use on any PostgreSQL database for your AI projects. To get started, see the pgvectorscale GitHub repo: https://lnkd.in/ghXj2e-U Or try it on Timescale Cloud on any new database service.
Eager to learn more about pgvectorscale and how it works? Head over to our blog post with all the details: https://lnkd.in/gcMcxrVb
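To make the quantization point above more concrete, here is a minimal Python sketch. It contrasts standard binary quantization (one bit per dimension, cut at zero) with a variant that cuts each dimension at its mean over the dataset. The mean-threshold rule, the sample vectors, and every function name are assumptions for illustration only; this is not pgvectorscale's implementation, which is described in the linked blog post.

```python
from statistics import mean

def binary_quantize(vec, thresholds=None):
    """Map each dimension to one bit: 1 if the value exceeds its threshold."""
    if thresholds is None:
        thresholds = [0.0] * len(vec)  # standard BQ: cut every dimension at zero
    return [1 if v > t else 0 for v, t in zip(vec, thresholds)]

def per_dimension_means(vectors):
    """Hypothetical 'statistical' thresholds: the mean of each dimension."""
    return [mean(column) for column in zip(*vectors)]

def hamming(a, b):
    """Count differing bits; a cheap stand-in for distance between codes."""
    return sum(x != y for x, y in zip(a, b))

# Toy embeddings whose values are all positive (made-up data).
vectors = [
    [0.9, 0.1, 0.5],
    [0.7, 0.3, 0.9],
    [0.2, 0.8, 0.4],
]

plain = [binary_quantize(v) for v in vectors]
thresholds = per_dimension_means(vectors)
centered = [binary_quantize(v, thresholds) for v in vectors]

print(plain)     # every vector collapses to the same code
print(centered)  # mean thresholds keep the vectors distinguishable
```

With all-positive values, the zero cutoff gives every vector the same code, so Hamming distance can no longer rank neighbors. Cutting at the per-dimension mean keeps the codes distinct at the same one bit per dimension.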
Timescale
Software Development
New York, New York 10,361 followers
Timescale is the modern cloud platform built on PostgreSQL for time series, events, and analytics.
About us
Timescale is addressing one of the largest challenges (and opportunities) in databases for years to come: helping developers, businesses, and society make sense of the data that humans and their machines are generating in copious amounts. TimescaleDB is the only open-source time-series database that natively supports full SQL, combining the power, reliability, and ease of use of a relational database with the scalability typically seen in NoSQL systems. It is built on PostgreSQL and optimized for fast ingest and complex queries. TimescaleDB powers mission-critical applications, including industrial data analysis, complex monitoring systems, operational data warehousing, financial risk management, and geospatial asset tracking, across industries as varied as manufacturing, space, utilities, oil & gas, logistics, mining, ad tech, finance, telecom, and more. Timescale is backed by NEA, Benchmark, Icon Ventures, Redpoint Ventures, Two Sigma Ventures, and Tiger Global.
Documentation: https://docs.timescale.com
GitHub: https://github.com/timescale/timescaledb
Twitter: https://twitter.com/timescaledb
- Website
- https://www.timescale.com/
- Industry
- Software Development
- Company size
- 51-200 employees
- Headquarters
- New York, New York
- Type
- Privately Held
- Founded
- 2015
- Specialties
- RDBMS, OpenTelemetry, Observability, Promscale, Technology, PostgreSQL, SQL, Data Historian, Geospatial Data, Time-Series Data, Databases, IoT, Sensor Data, Metrics, Developer Community, Software Development, Open Source, Software, and Data Management
Products
Timescale Cloud
Time Series Databases (TSDB)
TimescaleDB is a time-series SQL database providing fast analytics, scalability, and automated data management on a proven storage engine.
Locations
-
Primary
335 Madison Ave.
Floor 5, Suite E
New York, New York 10017, US
Updates
-
PostgreSQL has overtaken MySQL and has become the top choice among many developers. 🌟 Explore how it’s more than just an SQL database, with powerful extensions like Timescale, PostGIS, pgvector, and pgvectorscale transforming data management. 👇 🏅 Shout out to Uma Abu for this amazing video!
-
Learn how to install TimescaleDB on AWS to ensure a scalable and high-performance database for your time-series data needs. 💪
TimescaleDB is an open-source database built on PostgreSQL for efficiently managing time-series data, events, and analytics. It supports full SQL, making it user-friendly like traditional relational databases while scaling like NoSQL databases.
This guide covers setting up TimescaleDB on AWS. Here's what it covers:
1️⃣ Getting started with AWS
2️⃣ Setting up an EC2 instance
3️⃣ Installing & Using TimescaleDB
4️⃣ Configuring PostgreSQL
...and more!
Install TimescaleDB on AWS now and learn more: https://tsdb.co/ts-aws
How to install TimescaleDB on AWS
timescale.com
-
Timescale reposted this
Founder & Host of "The Ravit Show" | LinkedIn Top Voice | Startups Advisor | Gartner Ambassador | Evangelist | Data & AI Community Builder | Influencer Marketing B2B | Marketing & Media
Top 16 Open-Source Databases you should know! Whether you're developing for web, mobile, or enterprise, this list covers the essential databases for all your needs.
Highlights:
1. MariaDB - High performance and rich features, ideal for web and enterprise applications.
2. PostgreSQL - Advanced, robust relational database known for extensibility and standards compliance.
3. MongoDB - Flexible, scalable NoSQL database using JSON-like documents.
4. CockroachDB - Distributed SQL database with strong consistency and horizontal scalability.
5. MySQL - Reliable, high-performance relational database widely used in web applications.
6. Redis - In-memory data store for database, cache, and message broker applications.
7. Neo4j - Graph database for storing and querying complex relationships.
8. Couchbase - High-performance NoSQL database combining document and key-value stores.
9. SQLite - Lightweight, serverless SQL database engine for embedded systems and mobile apps.
10. ScyllaDB - High-performance NoSQL database compatible with Cassandra.
11. PrestoDB - Distributed SQL query engine for fast analytics across various data sources.
12. Cassandra - Scalable NoSQL database for high availability and performance across many servers.
13. Apache HBase - Scalable, distributed big data store for real-time read/write access.
14. InfluxDB - Time-series database for fast, high-availability data storage and retrieval.
15. TimescaleDB - Time-series database built on PostgreSQL, optimized for fast ingest and complex queries.
16. ArangoDB - Multi-model database supporting graph, document, and key/value models.
Join our Newsletter with 130k+ subscribers: www.theravitshow.com
#data #ai #databases #opensource #theravitshow
-
Timescale reposted this
Timescale's pgvectorscale has received phenomenal feedback! 🎉 It’s heartening to see such positive responses from our community and users. Your support and feedback are invaluable to us, and they motivate us to keep innovating and improving. 🐯 Ajay Kulkarni 🐯 Michael Freedman Ramon Guiu A big thank you to everyone who has been part of this journey. We're just getting started, and there's so much more to come. Stay tuned for exciting updates and new features! #VectorSearch #GenerativeAI #DatabaseInnovation #PostgreSQL
𝟳𝟬𝟬+ 𝙨𝙩𝙖𝙧𝙨 𝙛𝙤𝙧 𝙥𝙜𝙫𝙚𝙘𝙩𝙤𝙧𝙨𝙘𝙖𝙡𝙚⭐️ 𝙏𝙝𝙧𝙞𝙡𝙡𝙚𝙙 𝙗𝙮 𝙩𝙝𝙚 𝙛𝙚𝙚𝙙𝙗𝙖𝙘𝙠 𝙬𝙚'𝙫𝙚 𝙧𝙚𝙘𝙚𝙞𝙫𝙚𝙙 𝙖𝙣𝙙 𝙚𝙭𝙘𝙞𝙩𝙚𝙙 𝙩𝙤 𝙠𝙚𝙚𝙥 𝙞𝙢𝙥𝙧𝙤𝙫𝙞𝙣𝙜. 𝙋𝙤𝙨𝙩𝙜𝙧𝙚𝙎𝙌𝙇 🚀
𝗜𝗖𝗬𝗠𝗜: Curious about pgvector and want to learn how pgvectorscale can help you? Here's a handy overview:
🐘 𝗪𝗵𝗮𝘁 𝗶𝘀 𝗽𝗴𝘃𝗲𝗰𝘁𝗼𝗿𝘀𝗰𝗮𝗹𝗲? Pgvectorscale is an open-source PostgreSQL extension that builds on pgvector, enabling greater performance and scalability. By using pgvector and pgvectorscale, developers can build more scalable AI applications, benefiting from higher-performance embedding search and cost-efficient storage.
📈 𝗛𝗼𝘄 𝗱𝗼𝗲𝘀 𝗶𝘁 𝗽𝗲𝗿𝗳𝗼𝗿𝗺? On our benchmark of 50 million Cohere embeddings (768 dimensions each), PostgreSQL with pgvector and pgvectorscale achieves 𝟮𝟴𝘅 𝗹𝗼𝘄𝗲𝗿 𝗽𝟵𝟱 𝗹𝗮𝘁𝗲𝗻𝗰𝘆 and 𝟭𝟲𝘅 𝗵𝗶𝗴𝗵𝗲𝗿 𝗾𝘂𝗲𝗿𝘆 𝘁𝗵𝗿𝗼𝘂𝗴𝗵𝗽𝘂𝘁 compared to Pinecone for approximate nearest neighbor queries at 99% recall, all at 𝟳𝟱% 𝗹𝗲𝘀𝘀 𝗰𝗼𝘀𝘁 when self-hosted on AWS EC2. We also tested it against Pinecone’s p2 high-performance index; see the blog post in the comments for full results (spoiler: it’s just as impressive).
🤔 𝗪𝗵𝘆 𝗱𝗶𝗱 𝘄𝗲 𝗯𝘂𝗶𝗹𝗱 𝗽𝗴𝘃𝗲𝗰𝘁𝗼𝗿𝘀𝗰𝗮𝗹𝗲? Our team at @timescale built pgvectorscale to make PostgreSQL a better database for AI and to challenge the notion that PostgreSQL and pgvector are not performant for vector workloads.
⚙️ 𝗛𝗼𝘄 𝗱𝗼𝗲𝘀 𝗶𝘁 𝗮𝗰𝗵𝗶𝗲𝘃𝗲 𝘀𝘂𝗰𝗵 𝗴𝗼𝗼𝗱 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲? Pgvectorscale brings specialized data structures and algorithms for large-scale vector search and storage to PostgreSQL as an extension, including: (1) StreamingDiskANN, a high-performance, cost-efficient vector search index for pgvector data inspired by research at Microsoft, and (2) Statistical Binary Quantization (SBQ), developed by Timescale’s own researchers to improve upon standard binary quantization techniques.
📚 𝗟𝗲𝗮𝗿𝗻 𝗺𝗼𝗿𝗲 I've linked the pgvectorscale GitHub and explainer blogs in the comments. It's a great first step to getting started, whether you already use pgvector or are just curious about vector databases in general.
#pgvector #vectordatabase #opensource #postgresql #devtool
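For readers unfamiliar with the metrics quoted in the post above, here is a small Python sketch of how recall and p95 latency are commonly defined. The function names, sample IDs, and latencies are made up for illustration, and the benchmark's actual methodology is the one described in the linked blog post, not this sketch.

```python
import math

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def p95(latencies_ms):
    """95th percentile using the nearest-rank method."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

exact = [11, 42, 7, 93, 5]       # ground-truth nearest neighbors (hypothetical IDs)
approx = [11, 7, 42, 5, 60]      # what an approximate index returned
latencies = list(range(1, 101))  # 1..100 ms, one value per query

print(recall_at_k(approx, exact, k=5))
print(p95(latencies))
```

Quoting latency "at 99% recall" pins the accuracy of the approximate search first, so that speed comparisons between systems are made at the same result quality.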
-
Timescale reposted this
🐘 Ever wonder what happens under the hood when your PostgreSQL queries slow down? Use EXPLAIN to decipher query plans, understand operation costs, and see a real-world example of query optimization in action in our latest article with Timescale:
Explaining PostgreSQL EXPLAIN | Timescale
timescale.com
-
PostgreSQL is perfect for real-time monitoring and historical analysis, thanks to its robust SQL support. However, writing efficient, readable queries can be challenging. 🤔
In this webinar, learn to write 5 simple, efficient queries to gain deeper insights from your data, whether monitoring real-time changes or analyzing historical patterns. You'll get practical guidance to enhance your monitoring and analytical queries. 👇
During the session, you'll:
1️⃣ Learn essential PostgreSQL functions and build queries for common monitoring scenarios, like calculating deltas and rates.
2️⃣ Use TimescaleDB SQL functions for time-series analysis, such as filling data gaps and examining metrics over time.
3️⃣ Hear best practices and customizable guidance for your projects.
...and more!
We'll focus on code and step-by-step demos using a real-world example and time-series dataset. Check it out: https://tsdb.co/pg-monitor
5 (Powerful) PostgreSQL functions for Monitoring and Analytics | Timescale
timescale.com
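As a taste of the "deltas and rates" scenario mentioned in the webinar description above, here is a minimal sketch using the standard LAG window function. To keep it runnable without a server it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the table name, column names, and sample values are hypothetical, and the webinar's own queries may look different.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here; LAG() is standard SQL
# and is also available in PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (ts INTEGER, bytes_sent INTEGER)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?)",
    [(0, 100), (10, 250), (20, 250), (30, 700)],  # (seconds, cumulative counter)
)

rows = conn.execute(
    """
    SELECT ts,
           bytes_sent - LAG(bytes_sent) OVER (ORDER BY ts) AS delta,
           (bytes_sent - LAG(bytes_sent) OVER (ORDER BY ts)) * 1.0
               / (ts - LAG(ts) OVER (ORDER BY ts)) AS rate_per_sec
    FROM metrics
    ORDER BY ts
    """
).fetchall()

for row in rows:
    print(row)
```

The first row has no predecessor, so its delta and rate come back as NULL (None in Python) rather than a misleading zero.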
-
Artificial intelligence has evolved from science fiction to reality, thanks to innovations in generative AI like ChatGPT and Microsoft’s Copilot. To better understand the future of AI, 🐯 Ajay Kulkarni examines its past and offers insights in our latest newsletter. Learn more about the history of AI: its evolution, innovations, impact, and more. 👇
A Brief History of AI
Timescale on LinkedIn
-
🌍 Geo-Spatial Analysis with PostgreSQL and PostGIS: Mapping the Future 🗺️
Discover how to leverage the power of PostgreSQL and the PostGIS extension to perform advanced geo-spatial analysis and mapping. PostGIS is a powerful extension for PostgreSQL that adds support for geographic objects, allowing you to store, query, and analyze spatial data. This combination enables you to build location-based applications, perform spatial analysis, and create interactive maps.
Code (SQL):
-- Creating a table with a point geometry column (SRID 4326)
CREATE TABLE locations (
    id SERIAL PRIMARY KEY,
    name TEXT,
    geom GEOMETRY(POINT, 4326)
);
-- Inserting a new location (WKT order is longitude, then latitude)
INSERT INTO locations (name, geom)
VALUES ('New York', ST_GeomFromText('POINT(-74.0060 40.7128)', 4326));
#PostgreSQL #PostGIS #GIS #Geospatial #Spatial
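To illustrate the kind of spatial analysis the post above refers to, here is a rough Python sketch that parses the same WKT point and computes a great-circle distance to a second location. The London coordinates, the helper names, and the spherical (haversine) formula are assumptions chosen for illustration; PostGIS has its own distance functions and this sketch does not reproduce them.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def parse_wkt_point(wkt):
    """Parse 'POINT(lon lat)' into (lon, lat); WKT puts x (longitude) first."""
    inner = wkt.strip()[len("POINT("):-1]
    lon, lat = (float(part) for part in inner.split())
    return lon, lat

def haversine_km(p1, p2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

new_york = parse_wkt_point("POINT(-74.0060 40.7128)")  # same point as the post
london = parse_wkt_point("POINT(-0.1278 51.5074)")     # illustrative second point
distance = haversine_km(new_york, london)

print(f"{distance:.0f} km")
```

Note the coordinate order: swapping longitude and latitude is a common mistake when writing WKT by hand, and it silently places the point somewhere else on the globe.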
-
Public datasets provide valuable insights, but combining them with other data often requires a series of steps to normalize the data. 🔁
💡 Data normalization is the process of organizing data to reduce redundancy and improve data integrity. It typically involves dividing a database into smaller, related tables and defining relationships between them, following rules known as normal forms to help ensure consistent and efficient data storage and retrieval.
In a recent project, we combined two public datasets, the San Francisco police incident dataset and the NOAA weather database, and encountered two challenges:
1️⃣ Different date formats (yyyy-mm-dd vs. mm/dd/yyyy) made joining data difficult.
2️⃣ Gaps in weather data complicated time-series graphs.
Read our latest blog post to learn how to solve these issues and tackle other common problems: https://lnkd.in/gqpVef4q
Data Normalization Tips: How to Weave Together Public Datasets
timescale.com
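The two challenges named in the post above can be sketched in a few lines of Python. This is only one possible approach, under stated assumptions: the sample values are invented, the helper names are hypothetical, and carrying the last observation forward is just one way to fill gaps, which may differ from what the blog post recommends.

```python
from datetime import date, datetime, timedelta

def parse_date(text):
    """Accept either yyyy-mm-dd or mm/dd/yyyy and return a date."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def fill_gaps(readings):
    """Carry the last observed value forward across missing days."""
    days = sorted(readings)
    filled, current, last = {}, days[0], None
    while current <= days[-1]:
        last = readings.get(current, last)
        filled[current] = last
        current += timedelta(days=1)
    return filled

# Hypothetical samples: incident counts keyed by ISO dates,
# temperatures keyed by US-style dates with two days missing.
incidents = {parse_date("2018-01-03"): 12}
weather = {parse_date("01/01/2018"): 55.0, parse_date("01/04/2018"): 48.0}

daily_weather = fill_gaps(weather)
joined = {day: (count, daily_weather.get(day)) for day, count in incidents.items()}

print(joined)
```

Once both sources share one date type and the weather series has a value for every day, the join becomes a plain key lookup.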