Create a data lake of clickstream data
In this tutorial, you will learn how to build a data lake of website interaction events (clickstream data) using Pipelines.
Data lakes are a way to store large volumes of raw data in an object storage service such as R2. You can run queries over a data lake to analyze the raw events.
For this tutorial, you will build a landing page for an e-commerce website. Users can click buttons on the page to view products or add them to the cart. Each click sends an event to a pipeline from the client side (directly from the user's browser). Your pipeline will automatically batch the ingested data, build output files, and deliver them to an R2 bucket to build your data lake.
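To make the idea of batching concrete, here is a minimal conceptual sketch of grouping events and serializing each group as one output file. It is not how Pipelines is implemented: the pipeline does this work for you. The `batchEvents` function, the `maxBatchSize` parameter, the event fields, and the choice of newline-delimited JSON are all assumptions made for illustration.

```javascript
// Conceptual sketch only: Pipelines performs batching for you.
// batchEvents, maxBatchSize, and the event fields are hypothetical names.

/**
 * Split an array of events into batches of at most `maxBatchSize` events
 * and serialize each batch as newline-delimited JSON (one event per line).
 */
function batchEvents(events, maxBatchSize) {
  if (!Number.isInteger(maxBatchSize) || maxBatchSize < 1) {
    throw new RangeError("maxBatchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < events.length; i += maxBatchSize) {
    const chunk = events.slice(i, i + maxBatchSize);
    batches.push(chunk.map((event) => JSON.stringify(event)).join("\n"));
  }
  return batches;
}

// Three sample events grouped into batches of two.
const sampleEvents = [
  { action: "product_view", productId: 1 },
  { action: "add_to_cart", productId: 1 },
  { action: "product_view", productId: 3 },
];
const files = batchEvents(sampleEvents, 2);
console.log(files.length); // number of batches produced
```

Each string in `files` stands in for the contents of one output file. In the tutorial itself, batch size and file delivery are handled by the pipeline rather than by your code.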
- Install Node.js.

Node.js version manager
Use a Node version manager like Volta or nvm to avoid permission issues and change Node.js versions. Wrangler, discussed later in this guide, requires a Node version of 16.17.0 or later.
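If you want to check a version string against the minimum stated above, the comparison can be sketched as follows. This is a small illustrative helper, not part of Wrangler; the name `meetsMinimumVersion` is hypothetical, and it assumes plain `major.minor.patch` version strings with an optional leading `v`.

```javascript
// Hypothetical helper: returns true if `version` is at least `minimum`.
// Assumes plain "major.minor.patch" strings, with an optional leading "v".
function meetsMinimumVersion(version, minimum) {
  const parse = (v) =>
    v.replace(/^v/, "").split(".").map((part) => Number.parseInt(part, 10));
  const current = parse(version);
  const required = parse(minimum);
  // Compare component by component; the first difference decides the result.
  for (let i = 0; i < required.length; i++) {
    const have = current[i] ?? 0;
    if (have > required[i]) return true;
    if (have < required[i]) return false;
  }
  return true; // all compared components are equal
}

console.log(meetsMinimumVersion("16.17.0", "16.17.0")); // true
console.log(meetsMinimumVersion("16.16.9", "16.17.0")); // false
console.log(meetsMinimumVersion("v20.1.0", "16.17.0")); // true
```

When running under Node.js, you could pass `process.versions.node` as the first argument to check the installed runtime.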
You will create a new Worker project that will use Static Assets to serve the HTML file. While you can use any front-end framework, this tutorial uses plain HTML and JavaScript to keep things simple. If you are interested in learning how to build and deploy a web application on Workers with Static Assets, you can refer to the Frameworks documentation.
Create a new Worker project by running one of the following commands, depending on your package manager (npm, pnpm, or yarn):
npm create cloudflare@latest -- e-commerce-pipelines-client-side
pnpm create cloudflare@latest e-commerce-pipelines-client-side
yarn create cloudflare e-commerce-pipelines-client-side
For setup, select the following options:
- For What would you like to start with?, choose Hello World Starter.
- For Which template would you like to use?, choose SSR / full-stack app.
- For Which language do you want to use?, choose TypeScript.
- For Do you want to use git for version control?, choose Yes.
- For Do you want to deploy your application?, choose No (we will be making some changes before deploying).
Navigate to the e-commerce-pipelines-client-side directory:
cd e-commerce-pipelines-client-side
Using Static Assets, you can serve the frontend of your application from your Worker. The above step creates a new Worker project with a default public/index.html file. Update the public/index.html file with the following HTML code:
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>E-commerce Store</title>
    <script src="https://cdn.tailwindcss.com"></script>
  </head>
  <body>
    <nav class="bg-gray-800 text-white p-4">
      <div class="container mx-auto flex justify-between items-center">
        <a href="/" class="text-xl font-bold"> E-Commerce Demo </a>
        <div class="space-x-4 text-gray-800">
          <a href="#">
            <button class="border border-input bg-white h-10 px-4 py-2 rounded-md">Cart</button>
          </a>
          <a href="#">
            <button class="border border-input bg-white h-10 px-4 py-2 rounded-md">Login</button>
          </a>
          <a href="#">
            <button class="border border-input bg-white h-10 px-4 py-2 rounded-md">Signup</button>
          </a>
        </div>
      </div>
    </nav>
    <div class="container mx-auto px-4 py-8">
      <h1 class="text-3xl font-bold mb-6">Our Products</h1>
      <div class="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-6" id="products">
        <!-- This section repeats for each product -->
        <!-- End of product section -->
      </div>
    </div>
    <script>
      // demo products
      const products = [
        { id: 1, name: 'Smartphone X', desc: 'Latest model with advanced features', cost: 799 },
        { id: 2, name: 'Laptop Pro', desc: 'High-performance laptop for professionals', cost: 1299 },
        { id: 3, name: 'Wireless Earbuds', desc: 'True wireless earbuds with noise cancellation', cost: 149 },
        { id: 4, name: 'Smart Watch', desc: 'Fitness tracker and smartwatch combo', cost: 199 },
        { id: 5, name: '4K TV', desc: 'Ultra HD smart TV with HDR', cost: 599 },
        { id: 6, name: 'Gaming Console', desc: 'Next-gen gaming system', cost: 499 },
      ];

      // function to render products
      function renderProducts() {
        console.log('Rendering products...');
        const productContainer = document.getElementById('products');
        productContainer.innerHTML = ''; // Clear existing content
        products.forEach((product) => {
          const productElement = document.createElement('div');
          productElement.classList.add('rounded-lg', 'border', 'bg-card', 'text-card-foreground', 'shadow-sm');
          productElement.innerHTML = `
            <div class="flex flex-col space-y-1.5 p-6">
              <h2 class="text-2xl font-semibold leading-none tracking-tight">${product.name}</h2>
            </div>
            <div class="p-6 pt-0">
              <p>${product.desc}</p>
              <p class="font-bold mt-2">$${product.cost}</p>
            </div>
            <div class="flex items-center p-6 pt-0 flex justify-between">
              <button class="border px-4 py-2 rounded-md" onclick="handleClick('product_view', ${product.id})" name="">View Details</button>
              <button class="border px-4 py-2 rounded-md" onclick="handleClick('add_to_cart', ${product.id})">Add to Cart</button>
            </div>
          `;
          productContainer.appendChild(productElement);
        });
      }
      renderProducts();

      // function to handle click events
      async function handleClick(action, id) {
        console.log(`Clicked ${action} for product with id ${id}`);
      }
    </script>
  </body>
</html>
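At this point, handleClick only logs the click to the console. As a rough sketch of what a clickstream event for these buttons might look like, the helper below turns an action and a product ID into a plain object. The function name `buildClickEvent` and every field name in the returned object are assumptions for illustration, not a schema required by Pipelines.

```javascript
// Hypothetical helper: builds an event object for a click on a product button.
// The function name and all field names are illustrative assumptions.
function buildClickEvent(action, productId, catalog, now = new Date()) {
  const product = catalog.find((item) => item.id === productId);
  if (!product) {
    throw new Error(`Unknown product id: ${productId}`);
  }
  return {
    action, // for example 'product_view' or 'add_to_cart'
    productId: product.id,
    productName: product.name,
    cost: product.cost,
    timestamp: now.toISOString(), // ISO 8601 timestamp in UTC
  };
}

// A trimmed copy of the demo product list from the page.
const catalog = [
  { id: 1, name: "Smartphone X", cost: 799 },
  { id: 2, name: "Laptop Pro", cost: 1299 },
];

const clickEvent = buildClickEvent(
  "add_to_cart",
  2,
  catalog,
  new Date("2025-01-01T00:00:00Z"),
);
console.log(JSON.stringify(clickEvent));
```

Passing the timestamp in as a parameter keeps the helper deterministic, which makes it easier to test. In the page itself, handleClick could call such a helper with the `products` array and the default current time.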