Version 0.9

Stay updated with the latest tech, engineering, and research content

Note: Blog summaries are generated by AI and may sometimes contain errors, irrelevant words or incomplete sentences

How to Get Started with the Weaviate Vector Database on Docker


Docker

AI
Security

Published: 19TH September, 2023

Author: Leonie Monigatti

Summary:
  • Weaviate is an open source vector database that enables modern search capabilities.
  • With Weaviate, you can build advanced LLM applications, next-level search systems, recommendation systems, and more.
  • Learn how to install Weaviate on Docker using Docker Compose.
  • Weaviate can be used with various modules to integrate directly with inferencing services like OpenAI, Cohere, or Hugging Face.
  • Modules can also be used to extend Weaviate’s core capabilities (e.g., question answering, generative search, etc.).
  • In the final step of the Weaviate Docker Compose configurator, select Docker Compose for your runtime.
  • You can then download and further customize your Weaviate setup.

Unexpected Tools in the Databricks Marketplace to Supercharge Manufacturing Supply Chains


Databricks

Published: 19TH September, 2023

Author: Databricks

Summary:
  • Supply chains compete, not companies.
  • No two supply chains are identical.
  • The unique combination of products, industries and geographic locat...

Introducing Project-Based Learning in Postman Academy


Postman

AI
Backend

Published: 21ST September, 2023

Author: Som Nath

Summary:
  • Project-Based Learning is a guided approach to building real-world API portfolio projects with the help of Postman.
  • Students will learn how to leverage Postman in a real-world context.

The Power of a Trusted Data Lakehouse: Go Bust or Boom


Databricks

Published: 15TH September, 2023

Author: Databricks

Summary:
  • CNN.com has partnered with Immuta, Alation, and Anomalo to create this article.
  • We are happy to share the content and technical assets from this article with our readers.

How HubSpot Upgraded a Thousand MySQL Clusters at Once


Hubspot

Security
Backend
Data Science
Network

Published: 19TH September, 2023

Author: Olga Shestopalova

Summary:
  • HubSpot runs over a thousand MySQL clusters in each environment.
  • We jumped 9 major versions of Vitess, the clustering software we use for MySQL.
  • This time around our upgrade journey took a year, but our estimate for the next upgrade is only a quarter, given all that we have learned and developed.
  • HubSpot is upgrading from Vitess 5 to 14.
  • The upgrade was gradual and could be easily reverted in case of emergency.
  • The first step of porting over our custom patches took much longer than expected.
  • Vitess 14 is a new version of the company's vtgate and vttablet software.
  • The change is being rolled out like a regular code change, with a canary deploy.
  • The problem is that you’re deploying it out to everyone, so if one cluster has an issue, you have to stop and roll it back for everyone.
  • The upgrade process is completely automated and hands-off.
  • It is meant to be entirely invisible to clients, with zero downtime.
  • It took us a year to roll out this upgrade, from the patching and building tooling phase all the way through completion of rollout to production environments.
  • The actual rollout took only 4 of those months, and we anticipate the next upgrade taking approximately a quarter.
  • Some queries were streaming queries that were inside of a transaction.
  • This led to some confusing behavior that was difficult to debug.
  • Rather than patching Vitess 14 to allow the flawed queries to execute, we removed the streaming queries.

Improve Lakehouse Security Monitoring using System Tables in Databricks Unity Catalog


Databricks

Published: 12TH September, 2023

Author: Databricks

Summary:
  • As the lakehouse becomes increasingly mission-critical to data-forward organizations, so too grows the risk that unexpected events, outages, and security incidents may derail...

Introducing Confluent’s Financial Services Resource Center: Your one-stop shop for all things data streaming in finance and banking


Confluent

AI

Published: 15TH September, 2023

Author: Jess Larkin

Summary:
  • Confluent is pleased to announce our Financial Services Data Streaming Resource Center.
  • In the resource center, you can view our on-demand virtual events hub, read white papers, and customer case studies.
  • Join us on September 28 for a webinar with Thrivent and Improving.

AWS Weekly Roundup: C7i Instances, Knowledge Base for Amazon Bedrock, and More (Sept. 18, 2023)


Amazon Web Services

Web
Database
Cloud
AI
DevOps
Backend

Published: 18TH September, 2023

Author: Danilo Poccia

Summary:
  • Last week there was the EMEA AWS Heroes Summit in Munich, an amazing day full of insights and passion.
  • While daylight is getting shorter in the Northern hemisphere, we’ve got two new EC2 instance types optimized for compute and memory.
  • AWS IAM Identity Center Session Duration Increased Up to 90 Days.
  • You now have more flexibility based on your security context and desired end-user experience.
  • You can now generate forms connected to your API.
  • AWS Global Summits, Sept. 26: The last in-person AWS Summit of the year will be held in Johannesburg.
  • CDK Day is a community-led fully virtual event with tracks in English and Spanish.

Solution Accelerator: LLMs for Manufacturing


Databricks

Published: 21ST September, 2023

Author: Databricks

Summary:
  • Large language models (LLMs) have come to dominate the field of transformers.
  • Since the publication of the seminal paper on transformers by Vaswani et al. from Google, large language models have become the dominant model.

Introducing Postman’s new and improved system proxy


Postman

Published: 15TH September, 2023

Author: Akshay Deo

Summary:
  • Try Postman now.
  • Give it a try.
  • Let us know your feedback in a comment below.

Updates to the Private API Network: new workflows for approval, discovery, and search


Postman

Backend
Algorithms

Published: 15TH September, 2023

Author: Aravind Arunachalam

Summary:
  • Postman’s Private API Network provides a centralized repository for publishing and distributing all of the APIs within your organization.
  • It also supports discovery and evaluation by enabling users to quickly search for and look up the APIs they’re interested in.

Apache Kafka Message Compression


Confluent

Performance
Hardware

Published: 18TH September, 2023

Author: Dave Troiano

Summary:
  • Kafka's approach to partitioning topics helps achieve this level of scalability.
  • Topic partitions are the main "unit of parallelism" in Kafka.
  • Producers communicate with topic partition leaders.
  • There are five compression settings available: none, gzip, snappy, lz4, and zstd.
  • none means that compression won’t occur.
  • The other settings are different compression algorithms supported by Kafka.
  • In general, lz4 is recommended for performance.
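
The compression settings listed above map to the Kafka producer property compression.type. The following is a minimal sketch, not code from the original post: the helper name producer_config and the broker address are assumptions for illustration, and the returned dictionary only mirrors the key/value shape that Kafka clients typically accept.

```python
# Valid values for the Kafka producer setting `compression.type`.
VALID_COMPRESSION_TYPES = ("none", "gzip", "snappy", "lz4", "zstd")


def producer_config(bootstrap_servers: str, compression: str = "lz4") -> dict:
    """Build a producer configuration dict with a validated compression type.

    `producer_config` is a hypothetical helper; the default of "lz4" follows
    the general recommendation quoted in the summary above.
    """
    if compression not in VALID_COMPRESSION_TYPES:
        raise ValueError(
            f"unknown compression type {compression!r}; "
            f"expected one of {VALID_COMPRESSION_TYPES}"
        )
    return {
        "bootstrap.servers": bootstrap_servers,  # placeholder broker address
        "compression.type": compression,
    }


config = producer_config("localhost:9092")
print(config)
```

Validating the value up front keeps a typo such as "lz-4" from only surfacing later, when the client is created.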

The untapped opportunity at the bottom of your customer funnel


Stripe

Algorithms

Published: 20TH September, 2023

Author: Stripe

Summary:
  • The checkout flow should be clear, concise, and quick to navigate.
  • 60% say they would abandon a website if a purchase took more than two minutes to complete.
  • Mobile orders are rising rapidly as a share of overall ecommerce sales.
  • A responsive checkout automatically resizes based on the most common device screen sizes.
  • Stripe Elements and Payment Links are prebuilt payment user interfaces.
  • These prebuilt UIs include mobile-friendly navigation, autofill, error messages, input masking, simplified compliance, mobile SDK support and more.

Bringing Scalable AI to the Edge with Databricks and Azure DevOps


Databricks

AI

Published: 13TH September, 2023

Author: Databricks

Summary:
  • The opportunity for machine learning and AI in manufacturing is immense.
  • From better alignment of production with consumer demand to improved process control.

Docker Desktop 4.23: Updates to Docker Init, New Configuration Integrity Check, Quick Search Improvements, Performance Enhancements, and More


Docker

Hardware

Published: 12TH September, 2023

Author: Nuno Coracao

Summary:
  • Docker Desktop 4.23 has added support for ASP.NET.
  • VirtioFS is now set to default to provide performance gains for Mac users.
  • Mac users will also see increased network performance using traditional network connections.
  • VirtioFS minimizes file transfer overhead by allowing containers to access files without volume mounts or network shares.
  • We measured performance improvements, decreasing file transfer time from 7:13.21s to 1:4.4s.

From Outages to On-Time Delivery: How Confluent Cloud Transformed a Delivery Company's Data Infrastructure


Confluent

Performance
Database
Cloud
Backend

Published: 19TH September, 2023

Author: Pedro Busko

Summary:
  • Grocery delivery companies need a data infrastructure which seamlessly connects disparate systems.
  • In this blog, we’ll explain how one grocery delivery company guarantees same day delivery with data streaming and Confluent as the backbone of their data infrastructure.
  • We'll outline the challenges they were facing prior to partnering with Confluent, before delving into how their current event-driven architecture is enabling them to...
  • Confluent offers a library of over 120 pre-built connectors, 70 of which are fully-managed.
  • Schema Registry is used by this company to ensure the consistency of their data.
  • Confluent supports a number of popular client libraries, including Java, Python, and REST.

Building a Real-Time Service Marketplace with Confluent Cloud


Confluent

Backend
Algorithms

Published: 13TH September, 2023

Author: Arpita Agarwal

Summary:
  • As the world’s population continues to increase, so does the need for new infrastructure, buildings and housing, automatically increasing the demand for tradespeople.
  • The need for efficient and seamless communication between customers and tradespeople has become even higher.
  • This requires innovation as well as seamless orchestration of real-time interactions to optimize ROI and efficiency.
  • The service marketplace allows any customer to post a job for any location.
  • On average, the marketplace gets nearly 50k job requests every day, globally.
  • From job postings, bid placements, quotes, and offers, to status updates and feedback loops, there are millions of events generated every day.
  • One microservice writes GPS data to Kafka, performs stream processing, and sinks to MongoDB using Confluent’s fully managed MongoDB Sink Connector.
  • Truck location is tracked in real time and customers can view this on their mobile devices.

AWS Weekly Roundup: R7iz Instances, Amazon Connect, CloudWatch Logs, and Lots More (Sept. 11, 2023)


Amazon Web Services

Performance
Cloud
DevOps
Backend
Network
Hardware

Published: 11TH September, 2023

Author: Jeff Barr

Summary:
  • R7iz instances are the fastest 4th Generation Intel Xeon Scalable-based (Sapphire Rapids) instances in the cloud.
  • They are available in eight sizes, with 2 to 128 vCPUs and 16 to 1024 GiB of memory, along with generous network and EBS bandwidth.
  • AWS and cloud blogs are a good place to start.
  • Here are a few posts from some of the other AWS blogs that I follow.

Top 5 Best Practices for Building Event-Driven Architectures Using Confluent and AWS Lambda


Confluent

Cloud
Backend

Published: 19TH September, 2023

Author: Ashish Chhabria

Summary:
  • Keep long-lived connections to Kafka (initialize outside the handler) for better efficiency.
  • If a Lambda instance is not available to respond to an invocation, a new instance must be created.
  • Confluent offers a fully managed Schema Registry for each Confluent Cloud environment.
  • Broker-Side Schema Validation enforces strict adherence to data schemas.
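
The first practice above (initialize the Kafka connection outside the handler) can be sketched as follows. This is an illustration under stated assumptions: StubProducer is a hypothetical stand-in for a real Kafka client, and the topic name "orders" is made up. The point is only where the object is created, not how a real producer is configured.

```python
import itertools

_connection_ids = itertools.count(1)


class StubProducer:
    """Hypothetical stand-in for a real Kafka producer client."""

    def __init__(self) -> None:
        # A real client would open network connections to the brokers here.
        self.connection_id = next(_connection_ids)
        self.sent = []

    def produce(self, topic: str, value: str) -> None:
        self.sent.append((topic, value))


# Created once per execution environment, at import time, and then reused
# by every invocation that lands on the same warm Lambda instance.
producer = StubProducer()


def handler(event: dict, context: object) -> dict:
    """Lambda entry point: reuses the module-level producer."""
    producer.produce("orders", str(event.get("order_id")))
    return {"connection_id": producer.connection_id, "sent": len(producer.sent)}


first = handler({"order_id": 1}, None)
second = handler({"order_id": 2}, None)
print(first, second)
```

Both calls report the same connection_id, which is the behavior the practice aims for: the connection cost is paid when a new instance is created, not on every invocation.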

Meta Quest 2: Defense through offense


Facebook

Published: 12TH September, 2023

Author: Facebook

Summary:
  • The vulnerability that we chose for exploitation never made it into a production release, but it was introduced in a code commit in 2021.
  • The vulnerability gives an attacker the ability to perform arbitrary 8-byte-long corruptions at arbitrary offsets (32-bit limit).
  • Any VR application can trigger this vulnerability by sending a specially crafted SetPerformanceIdealFeatureStateRuntime IPC message.
  • Bypassing Address Space Layout Randomization (ASLR) is a common problem in modern exploitation.
  • We looked for a function pointer that we knew could be invoked indirectly from an RPC command.
  • To hijack control flow, we first used the write primitive to corrupt the vkGetPhysicalDeviceImageFormatProperties function pointer.
  • This eventually allowed us to control the program counter.
  • We hijack the VR Runtime process of an already installed untrusted application.
  • Our approach was to get the VR runtime process to load a shared library using dlopen from our application APK.
  • When the VR runtime process loaded the library, our payload would be executed automatically as part of the loaded library’s initialization function.
  • Code fragment from the exploit (truncated in the source): 3f d6 blr x2; const uint64_t vkGetPhysicalDeviceImageFormatPropertiesOffset = Vulkan loader offset + 0x68; const uint64_t FirstGadget = ModuleMap.at("eglSubDriverAndroid.so") + 0xb3'ac; Corruptions.emplace_back(vk...

New – Amazon EC2 R7a Instances Powered By 4th Gen AMD EPYC Processors for Memory Optimized Workloads


Amazon Web Services

Cloud

Published: 11TH September, 2023

Author: Channy Yun

Summary:
  • You can send feedback to ec2-amd-customer-feedback@amazon.com, AWS re:Post for EC2, or through your usual AWS Support contacts.

DockerCon Workshops: What to expect


Docker

Web

Published: 14TH September, 2023

Author: Michael Irwin

Summary:
  • Docker for Web Development is a hands-on workshop on how to use Docker to speed up your web development stack.
  • You'll learn how to connect multiple services together and build a development environment that will require no installation and configuration (beyond Docker Desktop).

AI and Data Streaming: Essential Resources for Developers


Confluent

AI
Backend

Published: 20TH September, 2023

Author: Erin Junio

Summary:
  • GPT-4 + Streaming Data = Real-Time Generative AI (webinar and blog): In this webinar and blog post, you’ll discover how to harness ChatGPT to build real-world use cases and integrations.

Best Practices for LLM Evaluation of RAG Applications


Databricks

Published: 12TH September, 2023

Author: Databricks

Summary:
  • Chatbots are the most widely adopted use case for leveraging the powerful chat and reasoning capabilities of large language models (LLMs). The retrieval...

GitHub Availability Report: August 2023


Github

DevOps
Backend
Hardware

Published: 13TH September, 2023

Author: Jakub Oleksy

Summary:
  • In August, we experienced two incidents that resulted in degraded performance across GitHub services.
  • The delays were caused by a significant and sustained spike in webhook deliveries.

The Data + AI Trifecta: People, Process, and Platform


Databricks

Published: 19TH September, 2023

Author: Databricks

Summary:
  • Business leaders are all asking the same questions: How do we accelerate our company’s plan for data and AI? How can we take a...

New – NVMe Reservations for Amazon Elastic Block Store io2 Volumes


Amazon Web Services

Published: 19TH September, 2023

Author: Jeff Barr

Summary:
  • CNN.com will feature iReporter photos in a weekly Travel Snapshots gallery.
  • Please submit your best shots of the U.S. for next week.
  • Visit CNN.com/Travel next Wednesday for a new gallery of snapshots.

Dataflow Programming with Apache Flink and Apache Kafka


Confluent

Backend

Published: 14TH September, 2023

Author: Wade Waldron

Summary:
  • Recently, I got my hands dirty working with Apache Flink®.
  • To share what I learned, I created the Building Flink Applications in Java course.
  • I also wrote a blog post to walk through an example of how to do dataflow programming with Flink.
  • The full code for this blog post can be found in the GitHub repository.
  • The goal of the ClickStreamAnalytics application is to produce something like the following.
  • This record counts the total records, successes, and failures for each request.
  • Because it is an infinite stream of data, I won't be able to do a final count.
  • Instead, I will calculate these numbers once per minute.
  • Every Flink job starts with an entry point.
  • This is a standard Java main method so it's simple to create.
  • I'll eventually be producing records for Kafka as well.
  • I need to define a watermark strategy.
  • Kafka records include a timestamp by default, so I will default to the timestamp from Kafka.
  • I will use the forMonotonousTimestamps strategy and define a TimestampAssigner.
  • I also include a name for the source, clickstream-source, which will make it easier to identify in logs and the Flink dashboard.
  • Group By Keys and Apply Windows: Now that I have defined my function, it's time to put everything together.
  • I am using the keyBy operation to group my records according to the request.
  • Flink does a shuffle so that each of these groups can be processed as a different task.
  • After keyBy is complete, I use a TumblingEventTimeWindow of one minute to aggregate the...
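
The grouping and windowing steps described above can be illustrated without Flink. The sketch below is plain Python, not the Flink API and not the code from the post: it groups records by request, assigns each one to a one-minute tumbling window based on its event timestamp, and counts totals, successes, and failures. The record layout and the rule that a status below 400 counts as a success are assumptions.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # one-minute tumbling windows


def aggregate(records):
    """Count total, successful and failed records per (request, window).

    Each record is a (timestamp_ms, request, status) tuple; this layout is
    an assumption made for the example.
    """
    counts = defaultdict(lambda: {"total": 0, "successes": 0, "failures": 0})
    for timestamp_ms, request, status in records:
        # Tumbling windows do not overlap: each timestamp maps to one window.
        window_start = timestamp_ms - (timestamp_ms % WINDOW_MS)
        bucket = counts[(request, window_start)]  # key by request, then window
        bucket["total"] += 1
        if status < 400:  # assumed definition of "success"
            bucket["successes"] += 1
        else:
            bucket["failures"] += 1
    return dict(counts)


clicks = [
    (1_000, "GET /home", 200),
    (30_000, "GET /home", 500),
    (59_999, "GET /cart", 200),
    (60_000, "GET /home", 200),
]
for key, value in sorted(aggregate(clicks).items()):
    print(key, value)
```

Unlike this batch version, a streaming engine works on an unbounded input and emits each window's result once the watermark passes the end of that window, which is why the watermark strategy matters.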

Introducing Postman Live Insights: faster, better API debugging


Postman

Web
Cloud
Backend

Published: 15TH September, 2023

Author: Jean Yang

Summary:
  • With the rise of APIs, it’s become far easier for developers to build systems that they themselves do not understand the full inner workings of.
  • With Postman Live Insights, our vision is to help guide a developer with little previous experience—in both the system being monitored and the Postman API Platform—to find and fix issues.
  • Postman Live Insights will help developers find and fix issues within 30 minutes of signing up for an account.
  • The Live Collections Agent passively watches your API traffic to automatically populate a Postman Collection with API endpoints.

Women in Tech: 5 Key Tips for Building a Lasting Career at Salesforce


Salesforce

Published: 19TH September, 2023

Author: Scott Nyberg

Summary:
  • Women make up just one in four C-Suite positions at Salesforce.
  • The Women’s Leadership Program focuses on leadership skills, financial and business acumen, and network building.
  • The Leading Technology and Product Program provides exclusive access to executives.
  • From owning your values to building connections to leveraging relationships, these tips will help you stay true to your values.

Postman Flows updates, plus a peek into the AI-powered future


Postman

AI
Backend
Algorithms

Published: 15TH September, 2023

Author: Rodric Rabbah

Summary:
  • Flows can now run longer.
  • We are extending the 30-minute limit on Flow durations.
  • This makes it possible to complete lengthy tasks, such as data migrations.
  • We also simplified how we measure and bill for Flows usage by introducing step counts.
  • Flows is an API-first application development tool.
  • You can easily use APIs as building blocks and directly manipulate the data that flows between them.
  • Your Flows journey starts with a single API request.

Introducing MLflow 2.7 with new LLMOps capabilities


Databricks

Published: 14TH September, 2023

Author: Databricks

Summary:
  • MLflow 2.7 adds support for LLMOps.
  • The update is part of MLflow 2’s support for prompt engineering.

Announcing the Postman Secret Scanner for team workspaces


Postman

Published: 15TH September, 2023

Author: Suhas Gaikwad

Summary:
  • Postman Secret Scanner is now available for both team and public workspaces.
  • This update will help teams monitor secrets inside their team workspace.
  • Users with Admin, Super Admin, and Workspace Admin roles can access Secret Scanner findings.

McAfee Moves from Monolith to Microservices on Confluent Cloud


Confluent

Mobile
Cloud
Backend
Algorithms

Published: 22ND September, 2023

Author: Content Marketing Team

Summary:
  • McAfee has millions of endpoints that provide business insights, such as automatically alerting a customer to a breach and walking them through the corrective action.
  • As cybercrime has gotten exponentially more complex, so has McAfee’s data landscape.
  • Confluent has become a critical tool.
  • McAfee wanted to move toward a standardized, centralized nervous system that wouldn’t rely on open source Kafka.
  • Confluent was the ideal solution.
  • Using Confluent Cloud to build streaming data pipelines enabled a wide range of use cases within the organization.
  • The way that Confluent works for McAfee today is sophisticated and seamless.
  • Various systems within the architectural stack produce events.
  • These events are then consumed by downstream systems through streaming data pipelines.

Building for Inclusivity: The Technical Blueprint of Pinterest’s Multidimensional Diversification


Pinterest

AI

Published: 20TH September, 2023

Author: Pinterest Engineering

Summary:
  • Body type diversification has been rolled out on search and closeup recommendations within the United States, New Zealand, United Kingdom, Ireland, Canada, and Australia.
  • This shift towards inclusive and saveable content leads to increases in relevance, engagement, and user value.

Orchestrating Data Analytics with Databricks Workflows


Databricks

Published: 20TH September, 2023

Author: Databricks

Summary:
  • Data analysts play a crucial role in extracting insights from data and presenting it in a meaningful way.
  • Many...

Apache Spark ❤️ Apache DataSketches: New Sketch-Based Approximate Distinct Counting


Databricks

Database

Published: 21ST September, 2023

Author: Databricks

Summary:
  • Apache Spark has a set of advanced SQL functions that leverage the HyperLogLog algorithm.
  • These functions can be used to write complex queries to the Spark database.

Preview – Connect Foundation Models to Your Company Data Sources with Agents for Amazon Bedrock


Amazon Web Services

Cloud
AI

Published: 13TH September, 2023

Author: Antje Barth

Summary:
  • Amazon Bedrock is not creating a vector database on your behalf.
  • You must create a new, empty vector database from the list of supported options.
  • This vector database will need to be for exclusive use with Amazon Bedrock.
  • Once you’ve finished the agent configuration, you can test your agent and how it’s using the added knowledge base.
  • Note how the agent provides a source attribution for information pulled from knowledge bases.

How Edmunds builds a blueprint for generative AI


Databricks

Published: 20TH September, 2023

Author: Databricks

Summary:
  • Long envisioned as a key milestone in computing, we've reached a milestone in cloud computing.
  • This blog post is in collaboration with Greg Rokita, AVP of Technology at Edmunds.

Postman now supports MQTT


Postman

Security
Hardware

Published: 15TH September, 2023

Author: Jonathan Haviv

Summary:
  • MQTT (Message Queuing Telemetry Transport) is a lightweight communication protocol designed for the Internet of Things (IoT).
  • MQTT is ideal for IoT applications like home automation, industrial monitoring, weather stations, and vehicle telemetry.

How IKEA Retail Standardizes Docker Images for Efficient Machine Learning Model Deployment


Docker

Cloud
AI
DevOps
Security
Backend
Data Science

Published: 20TH September, 2023

Author: Karan Honavar

Summary:
  • Docker and Seldon-Core are the blueprint for a revolution in ML deployment.
  • This synergy is no mere technical alliance; it’s a transformative collaboration.
  • Unlock efficiency, scalability, and a strategic advantage in ML production.
  • The Docker model class is an essential architectural element that will later reside within the Docker image.
  • We must shape this class, adhering to the exacting standards proposed by Seldon.
  • The architecture is designed to standardize and streamline the process of deploying an ML model.
  • The DockerModel class encapsulates these four essential aspects: loading, predicting, monitoring and feedback.
  • Docker plays a vital role in MLOps, providing a standardized environment that streamlines the transition from development to production.
  • Docker containers encapsulate everything an ML model needs to run, easing the deployment process.
  • Using Docker, data scientists and engineers can ensure that models and their dependencies remain consistent across different stages of development.
  • Writing a Dockerfile is like sketching the architectural plan of a building.
  • It outlines the instructions for creating a Docker image to run the application in a coherent, isolated environment.
  • This design ensures that the entire model is treated as a cohesive entity.
  • Docker Desktop is recommended for this task.
  • Docker Desktop provides a graphical user interface that simplifies the process of building, running, and managing Docker containers.
  • The prediction of our model will be similar to this dictionary.
  • We will use a POST request to get the prediction of the data payload.
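
The four aspects named above (loading, predicting, monitoring and feedback) can be sketched as a single class. This is a hedged illustration, not the class from the post: the method names and signatures follow what I understand to be the conventions of Seldon Core's Python wrapper and should be treated as assumptions, and the "model" is a made-up weighted sum so that the example runs without any dependencies.

```python
class DockerModel:
    """Sketch of a model class covering load, predict, metrics and feedback."""

    def __init__(self) -> None:
        self.weights = None
        self.prediction_count = 0
        self.feedback_count = 0

    def load(self) -> None:
        # A real class would read a serialized model from disk here.
        self.weights = [0.5, 0.25]  # made-up weights for illustration

    def predict(self, X, features_names=None):
        # Load lazily so the class also works if load() was not called first.
        if self.weights is None:
            self.load()
        self.prediction_count += len(X)
        return [sum(w * x for w, x in zip(self.weights, row)) for row in X]

    def metrics(self):
        # Monitoring: expose a simple metric describing the work done so far.
        return [{"type": "GAUGE", "key": "predictions", "value": self.prediction_count}]

    def send_feedback(self, features, feature_names, reward, truth, routing=None):
        # Feedback: a real class might log the truth value or update statistics.
        self.feedback_count += 1


model = DockerModel()
print(model.predict([[2.0, 4.0], [1.0, 0.0]]))
print(model.metrics())
```

Keeping all four responsibilities in one class means the container image only needs to expose a single entry point for the serving framework to call.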

The AR Measuring Box: Etsy's answer to Big Tape Measure


Etsy

Mobile
Backend
Coding
Algorithms

Published: 18TH September, 2023

Author: Pedro Michel

Summary:
  • Etsy introduced a new feature in its iOS app that could place Etsy sellers' artwork on a user's wall using Apple's Augmented Reality (AR) tools.
  • It let them visualize how a piece would look in their space, and even gave them an idea of its size options.
  • When we launched the feature as a beta, it was only available in "wall art"-related categories.
  • Etsy's AR app is now able to show users size options for items that have them.
  • The app uses a server-side algorithm to parse out dimensions from variations in listings.
  • We sanitized free-form text input fields that might contain dimensions.
  • Users drag or tap the measuring box to move it around, and it can rotate on all axes.
  • Users get valuable information about how an item will fit in their space.
  • In addition to ensuring the box would be scaled correctly, we built in scene occlusion.
  • Taking advantage of occlusion gives our users an especially realistic AR experience.
  • The project took a 2D concept, our Wall View experience, and literally extended it into 3-dimensional space.
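
The summary mentions parsing dimensions out of listing variations and free-form text. The sketch below shows one way such parsing could look; it is not Etsy's algorithm. The regular expression, the supported units, and the default of inches when no unit is given are all assumptions made for the example.

```python
import re

# Matches strings such as '8x10 in', '8 x 10"', or '20.5 X 30 cm'.
DIMENSION_RE = re.compile(
    r'(?P<width>\d+(?:\.\d+)?)\s*[x×]\s*(?P<height>\d+(?:\.\d+)?)\s*'
    r'(?P<unit>in(?:ches)?|"|cm|mm)?',
    re.IGNORECASE,
)

UNIT_TO_CM = {"in": 2.54, "inches": 2.54, '"': 2.54, "cm": 1.0, "mm": 0.1}


def parse_dimensions(text: str, default_unit: str = "in"):
    """Return (width_cm, height_cm) found in free-form text, or None."""
    match = DIMENSION_RE.search(text)
    if match is None:
        return None
    # Fall back to the assumed default unit when the text names none.
    unit = (match.group("unit") or default_unit).lower()
    factor = UNIT_TO_CM[unit]
    return (
        float(match.group("width")) * factor,
        float(match.group("height")) * factor,
    )


print(parse_dimensions("Print 20 x 30 cm"))
print(parse_dimensions("Blue"))
```

Returning None for text without a recognizable size lets the caller skip the measuring box for that variation instead of drawing one with guessed dimensions.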

New – Add Your Swift Packages to AWS CodeArtifact


Amazon Web Services

Mobile
Cloud
Security
Backend
Coding

Published: 20TH September, 2023

Author: Sébastien Stormacq

Summary:
  • AWS CodeArtifact is a managed service that allows you to safely distribute packages to your internal teams of developers.
  • Swift packages are a popular way to package and distribute reusable Swift code elements.
  • To learn how to create your own Swift package, you can follow this tutorial.
  • In this demo, I show you how I prepare my environment, upload the package to the repository, and use this specific package build as a dependency for my project.
  • Imagine I’m working on an iOS application that uses Amazon DynamoDB as a database.
  • My application embeds the AWS SDK for Swift as a dependency.
  • To comply with my organization policies, the application must use...
  • CodeArtifact costs for Swift packages are the same as for the other package formats already supported.
  • CodeArtifact billing depends on three metrics: the storage (measured in GB per month), the number of requests, and the data transfer out.

How “Customer Engineering” Enhanced “Customer Experience” at Razorpay


Razorpay

Published: 20TH September, 2023

Author: Vikrant Saini | Technology Operations Leader

Summary:
  • The story started with customer sentiment about the service.
  • We measure the outcomes of customer engineering teams by CSAT, NPS, and age of the customer with the brand.

A Costa Rica journey with a Twist of Pura Vida


Databricks

Published: 18TH September, 2023

Author: Databricks

Summary:
  • Costa Rica is known for several things, both culturally and ecologically.
  • Among those are biodiversity, coffee, Pura Vida, and most recently a rapidly...

Reddit’s LLM text model for Ads Safety


Reddit

Performance
AI
Backend
Data Science
Hardware
Algorithms

Published: 11TH September, 2023

Author: /u/SussexPondPudding

Summary:
  • Reddit has launched its first in-house Large Language Model (LLM) in the Ads Safety space.
  • The model identifies “X” text content (sexually explicit text) and “V” content (violent text).
  • The model tags posts with these labels and helps our brand safety system know where to display ads responsibly.
  • The LLM is able to understand natural language in a way that previously could not be done.
  • Based on the surrounding context, LLMs have the ability to understand posts with X or V tags.
  • The GBT model utilized a combination of internal and external signals.
  • We believe this system can help us scale to 1 million messages per second and can continue to serve our needs as we expand coverage to all text on Reddit.
  • Reddit posts can be very long (up to ~40k characters). This length of text far exceeds the max token length of our RoBERTa-based model, which is 512 tokens.
  • We can either truncate the text (cut it off at the maximum length) or break the text into pieces and run the model on each piece.
  • Truncation allows running the model relatively fast, but we may lose a...
  • PyTorch allows for multiple CPU threads during model inference to take advantage of multiple CPU cores.
  • Tuning the number of threads to the hardware you are serving your model on can reduce the model latency due to increasing parallelism.
  • Exporting our model to ONNX format reduced our model latency by ~30% compared to the base PyTorch implementation.
  • ONNX frameworks work better without batching the inputs.
  • This is likely due to reduced tensor size during computation since padding would not be required.
  • We’d like to extend our model’s coverage to other types of text.
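
The two options for long posts described above, truncating the text or splitting it into pieces and running the model on each, can be compared with a small sketch. It is an illustration only: toy_scorer is a hypothetical stand-in for the real classifier, the token list stands in for real tokenizer output, and taking the maximum score across pieces is an assumed aggregation rule, not something the post states.

```python
MAX_TOKENS = 512  # the model's maximum input length, per the summary above


def chunk_tokens(tokens, size=MAX_TOKENS):
    """Split a token list into consecutive pieces of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def score_long_text(tokens, score_piece, truncate=False):
    """Score long text by truncating it or by taking the max over pieces."""
    if truncate:
        # Fast: one model call, but everything after MAX_TOKENS is ignored.
        return score_piece(tokens[:MAX_TOKENS])
    # Slower: one model call per piece, but no text is dropped.
    pieces = chunk_tokens(tokens)
    return max((score_piece(piece) for piece in pieces), default=0.0)


def toy_scorer(piece):
    # Hypothetical stand-in for the real classifier.
    return 1.0 if "flagged" in piece else 0.0


tokens = ["ok"] * 1000 + ["flagged"]
print(score_long_text(tokens, toy_scorer, truncate=True))
print(score_long_text(tokens, toy_scorer))
```

In this example the flagged token sits past the first 512 tokens, so the truncated run misses it while the chunked run finds it, at the cost of two extra model calls.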

Protecting Reddit Users in Real Time at Scale


Reddit

Web
Performance
Mobile
Backend
Algorithms

Published: 18TH September, 2023

Author: /u/sassyshalimar

Summary:
  • In 2016, Reddit developed a rules-engine, Rule-Executor-V1 (REV1), to curb policy-violating content on the site in real time.
  • After two years of hard work, we’ve migrated all workloads from REV1 to our new system, REV2, and have deprecated the old V1 infrastructure.
  • In REV1, users could create a rule, select various input Kafka topics for the rule to read from, and then implement the actual Lua rule logic from the browser itself. Whenever a rule was modified via the UI, a corresponding update would be sent to ZooKeeper.
  • REV2 has a Flink Stateful Functions streaming layer and a Baseplate application that executes Lua...
  • REV2’s rule configuration and logic is primarily code-based and version-controlled.
  • We no longer use ZooKeeper for rule storage and instead use GitHub and S3 (for fast rule updates, discussed later).
  • Flink and Baseplate deployments run on Kubernetes (K8s).
  • With REV2, we wanted a way for the Safety Operations team to be able to run rules against production data streams in a sandboxed environment.
  • We accomplished this by setting up a separate K8s staging deployment that triggers on updates to rules that have their “staging” flag set to “true”.
  • This deployment writes actions to special staging topics that are uncons...
  • REV2 has done well to curb policy-violating content at scale, but we are constantly striving to improve the system.
  • Some areas that we’d like to improve are simplifying our deployment process and reducing load on Flink.

Microcks joins the CNCF as a Sandbox project


Postman

Published: 18TH September, 2023

Author: Ankit

Summary:
  • Microcks, an open source API mocking and testing tool, has been accepted as a Sandbox project by the Cloud Native Computing Foundation.

Introducing Apache Spark™ 3.5


Databricks

Backend

Published: 15TH September, 2023

Author: Databricks

Summary:
  • Apache Spark™ 3.5 is available on Databricks.
  • It is available as part of Databricks Runtime 14.0.
  • We extend our s...

US Foods Builds Digital-First Ecommerce Platform with Data Streaming


Confluent

Database
DevOps
Backend

Published: 21ST September, 2023

Author: Zion Samuel

Summary:
  • US Foods is a food distribution company that supports companies across all industries and sectors.
  • The company has been using data streaming to drive its event-driven vision for digital food services and future-proof the organization's data stack.
  • Brett: US Foods is trying to transform into a “digital food company.” He says the company is challenging itself to build systems that can holistically scale on demand.
  • US Foods has two mainframe systems that handle core, backend data processing.
  • Brett: I think where Confluent really shines is being able to connect different downstream systems and help us get that event streaming architecture built out.
  • Leveraging Kafka truly as a service removes so much labor from my team and makes our integration so much easier.

What are HTTP status codes?


Postman

Published: 20TH September, 2023

Author: The Postman Team

Summary:
  • HTTP status codes are three-digit codes that indicate the outcome of an API request.
  • They are included in the API’s response to the API client, and they include important information.
  • Anyone who works with web-based technologies should have a solid understanding of these codes.
  • 500 Internal Server Error: This generic error code indicates the server encountered an unexpected condition that prevented it from fulfilling the request.
  • 502 Bad Gateway: This status code indicates that a server acting as a gateway or proxy received an invalid response from an upstream server.
  • 503 Service Unavailable: This is often seen during periods of increased traffic or when the server is undergoing maintenance.
  • 404 Not Found: This...
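
As a rough illustration of the codes listed above, the sketch below looks them up with Python's standard `http.HTTPStatus` enum and groups them by their leading digit. The `describe` helper and the category labels are assumptions made for this example and do not come from the Postman article.

```python
from http import HTTPStatus

# Category labels keyed by the first digit of the status code.
# The wording of these labels is an assumption for illustration.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe(code: int) -> str:
    """Return 'code phrase (category)' for a standard HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for codes the enum does not know
    return f"{status.value} {status.phrase} ({CATEGORIES[code // 100]})"

if __name__ == "__main__":
    for code in (404, 500, 502, 503):
        print(describe(code))
```

Grouping by the first digit is usually enough for a client to decide whether to fix the request (4xx) or retry later (5xx).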

Announcing the Harness the Weather Hackathon winner


Postman

Published: 18TH September, 2023

Author: Kyle Calica

Summary:
  • The Harness the Weather Hackathon has announced the winner.
  • The AgroSoil team won with their groundbreaking project, “Agrosecurity”.
  • The project is meant to fortify the resilience of farmers in a world increasingly besieged by climate change.

IoT Data Integration for Real-Time Processing with Confluent Cloud and MQTT


Confluent

Published: 15TH September, 2023

Author: Brennan Lill

Summary:
  • Kai Waehner’s blog about Confluent Cloud and Waterstream is a great place to start and includes many publicly referenceable companies using Kafka and IoT.
  • Visit the IoT solutions page for related use cases.

Last Mile Data Processing with Ray


Pinterest

Performance
AI
Hardware

Published: 12TH September, 2023

Author: Pinterest Engineering

Summary:
  • Machine Learning plays a crucial role in Pinterest's mission to deliver high-quality inspiration to our 460 million monthly active users.
  • Hundreds of ML engineers iteratively improve recommendation engines that power Pinterest, processing petabytes of data and training thousands of models using hundreds of GPUs.
  • We found out that in some cases, it takes several weeks for an ML engineer to train a model with a new dataset variation using workflows.
  • ML engineers have to write new jobs in Scala / PySpark and test them.
  • They have to integrate these jobs with workflow systems, test them at scale, tune them, and release into production.
  • Ray Data is a distributed data processing library built on top of Ray.
  • It supports a wide variety of data sources and common data processing operators.
  • One of the key breakthrough functionalities we saw from Ray Data is its streaming execution capability.
  • This allows us to concurrently transform data and train at the same time.
  • This post only touches on a small portion of our journey in Pinterest with Ray.
  • It marks the beginning of the “Ray @ Pinterest” blog post series.
  • Spanning multiple parts, this series will cover the different facets of utilizing Ray at Pinterest.
  • Stay tuned for our upcoming posts!
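
The idea of transforming data while training runs at the same time can be sketched without Ray. The example below is a plain-Python stand-in using a background thread and a bounded queue; it is not Ray Data's API, and `stream_batches`, `train`, and `max_buffered` are hypothetical names chosen for this illustration.

```python
import queue
import threading

def stream_batches(raw_batches, transform, max_buffered=2):
    """Yield transformed batches while a background thread transforms ahead.

    The bounded queue caps how many transformed batches sit in memory, so
    preprocessing overlaps with consumption instead of finishing first.
    """
    buffer = queue.Queue(maxsize=max_buffered)
    done = object()  # sentinel marking the end of the stream

    def producer():
        try:
            for batch in raw_batches:
                buffer.put(transform(batch))  # blocks while the buffer is full
        finally:
            buffer.put(done)  # always signal completion, even on error

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is done:
            return
        yield item

def train(batches):
    """Stand-in for a training loop: consumes batches as they arrive."""
    total = 0
    for batch in batches:
        total += sum(batch)
    return total

if __name__ == "__main__":
    raw = [[1, 2], [3, 4]]
    print(train(stream_batches(raw, lambda b: [2 * x for x in b])))
```

The training loop starts as soon as the first batch is ready, which is the property the summary attributes to streaming execution.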

New – Amazon EC2 M2 Pro Mac Instances Built on Apple Silicon M2 Pro Mac Mini Computers


Amazon Web Services

Mobile
Cloud
Backend

Published: 19TH September, 2023

Author: Channy Yun

Summary:
  • EC2 M2 Pro Mac instances deliver up to 35 percent faster performance over the existing M1 Mac instances when building and testing applications for Apple platforms.
  • EC2 Mac instances support macOS Ventura (version 13.2 or later) as AMIs.
  • Customers have reported up to 4x reduction in build times.
  • EC2 M2 Pro Mac instances support Dedicated Host tenancy with a minimum host allocation duration of 24 hours.
  • Developers can integrate the latest macOS features into their applications and test existing applications for compatibility.

Slack Behind the Scenes: Overcoming Key Challenges to Craft a Seamless Mobile App


Salesforce

Backend

Published: 12TH September, 2023

Author: Scott Nyberg

Summary:
  • Tracy Stampfli is a Principal Software Engineer for Slack at Salesforce.
  • Her team addresses key engineering challenges, drives feature development, and continuously improves the app’s performance.
  • Slack's infrastructure team plays a critical role in managing extreme cases of scale.
  • This could include supporting teams using 80,000 custom emoji or users in 20,000 channels.
  • Tracy shares why Slack’s engineering culture is unique.
  • A primary risk is the high stakes associated with making changes to an app.
  • To deal with these edge cases, we perform hotfixes.
  • Tracy works behind the scenes on Slack’s mobile infrastructure team.
  • Introducing major changes to our codebase.
  • Reducing legacy technical debt.
  • Launching a new feature architecture on iOS.
  • We mitigate the risk of making changes to the app by leveraging strategies like feature flagging or change toggling.
  • Our infrastructure team plays a critical role in managing extreme cases of scale to satisfy increasing demands of our customers.
  • The mobile engineering team has expanded to over 120 engineers, which introduced scaling challenges of managing a large codebase.
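
Feature flagging, mentioned above as a way to lower the risk of app changes, can be sketched as a deterministic percentage rollout. This is a minimal illustration under assumed names (`is_enabled`, the flag name, the user IDs); it does not describe how Slack implements its flags.

```python
import hashlib

def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Decide whether a flag is on for a user at a given rollout percentage.

    Hashing the flag name together with the user ID gives each user a stable
    bucket from 0 to 99, so the same user always gets the same answer and
    raising the percentage only ever adds users.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent

if __name__ == "__main__":
    # Hypothetical flag and users, for illustration only.
    for user in ("u1", "u2", "u3"):
        print(user, is_enabled("new_feature_architecture", user, 50))
```

Setting the percentage back to 0 turns the change off for everyone without shipping a new build, which is what makes this approach useful when a release misbehaves.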

Manage publicly exposed Postman API keys


Postman

Published: 15TH September, 2023

Author: Suhas Gaikwad

Summary:
  • Manage publicly exposed API keys in Postman.
  • Postman API keys can be exposed in public elements, such as public workspaces and documentation.
  • Super Admin and Admin users can revoke the exposed keys manually or by enabling a setting to automatically revoke them.

Crossing Bridges: Reporting on NYC taxi data with RStudio and Databricks


Databricks

Published: 13TH September, 2023

Author: Databricks

Summary:
  • Posit's RStudio Desktop and Databricks allow you to analyze data with dplyr.

Congratulations to Summer Hackathon Winners!


Databricks

Published: 22ND September, 2023

Author: Databricks

Summary:
  • Databricks launched Dolly 2.0: the world's first truly open instruction-tuned Large Language Model (LLM).

Introducing the Support of Lateral Column Alias


Databricks

Database

Published: 19TH September, 2023

Author: Databricks

Summary:
  • Lateral Column Alias (LCA) is a new feature in Apache Spark and Databricks.
  • This feature...

Meet Postbot’s new avatar


Postman

Web
AI
Backend

Published: 15TH September, 2023

Author: Abhijit Kane

Summary:
  • Postbot is Postman’s new AI assistant.
  • Postbot has helped over 50,000 users be more productive across their API workflows.
  • The latest refresh of Postbot now offers a consistent, conversational interface.

Stepping up marketing for advertisers: Scalable lookalike audience


Grab

Performance
Backend
Algorithms

Published: 22ND September, 2023

Author: Grab

Summary:
  • The advertising industry is constantly evolving, driven by advancements in technology and changes in consumer behaviour.
  • One of the key challenges in this industry is reaching the right audience.
  • This is where the concept of a lookalike audience comes into play.
  • By identifying and targeting individuals who share similar characteristics, businesses can significantly improve the effectiveness of their advertising campaigns.
  • Our solution comprises two main components: real-time lookalike audience retrieval, and embedding-based audience creation and updating.
  • To minimise costs, we leverage the passenger embeddings that are also utilised by other projects within Grab, beyond advertising.
  • Users can see advertisements targeting a new audience within 15 minutes after the advertiser creates a campaign.
  • This new system doubled the impressions and clicks, while also improving the CTR, conversion rate, and return on investment.
  • Costs for generating lookalike audiences decreased by 98%.
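
One common way to build a lookalike audience from embeddings is to rank candidates by cosine similarity to the centroid of a seed audience. The summary does not say which similarity measure or retrieval method Grab uses, so the sketch below is an assumption throughout, and `lookalike`, `centroid`, and the toy embeddings are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def lookalike(seed_ids, embeddings, k):
    """Return the k user IDs outside the seed set most similar to its centroid."""
    seed = centroid([embeddings[uid] for uid in seed_ids])
    scored = [
        (uid, cosine(vec, seed))
        for uid, vec in embeddings.items()
        if uid not in seed_ids
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [uid for uid, _ in scored[:k]]

if __name__ == "__main__":
    # Toy two-dimensional embeddings, for illustration only.
    embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [1.0, 0.05]}
    print(lookalike({"a"}, embeddings, k=2))
```

A production system would typically replace the full scan with an approximate nearest-neighbour index, since scoring every user per campaign does not scale.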
