All Databases
fifa-world-cup-historical-elo-ratings Public
2026 FIFA World Cup — Historical Elo Ratings
Size 576K
Tables 1
Columns 25
Rows 4,683
fide-world-chess-ratings-201k-players Public
FIDE World Chess Ratings 201K Players
Size 608K
Tables 1
Columns 14
Rows 23,500
Gen-Z_Social_Media_Usage Public
The Gen-Z Social Media Usage Dataset is a large-scale, synthetic yet behaviorally realistic dataset designed to model how individuals aged 13–27 interact with social media platforms in the modern digital ecosystem. The data is generated using statistically grounded distributions and incorporates realistic correlations between variables such as screen time, platform preference, and mental well-being. It is intended for research, machine learning modeling, behavioral analysis, and educational purposes.
Size 127.14M
Tables 1
Columns 14
Rows 1,000,000
Sentiment140-1.6-Million-Tweets Public
1.6 Million Tweets for Real-World Sentiment Analysis
Size 62.09M
Tables 1
Columns 9
Rows 332,764
TMDB-5000-Movie Public
This dataset contains metadata for approximately 5,000 movies collected from the TMDb API, including movie titles, cast, crew, genres, keywords, budgets, revenues, release dates, ratings, popularity scores, production companies, and plot overviews. It is widely used for movie recommendation systems, sentiment analysis, exploratory data analysis, and predictive modeling in entertainment analytics.
Size 0B
Tables 0
Columns 0
Rows 0
African-Snake-Displacement-Risk Public
This dataset contains geographic and environmental risk indicators related to snake presence and displacement across African regions. It is designed for analyzing ecological movement patterns, habitat disruption, and risk assessment using environmental and spatial features.
Size 317.72M
Tables 1
Columns 25
Rows 8,000
Food-com-Recipes-and-Interactions Public
This dataset contains a large collection of food recipes and user interaction data from Food.com, spanning approximately 18 years of activity. It includes structured recipe metadata such as recipe ID, name, ingredients, cooking steps, nutrition information, and tags, along with extensive user interaction data including ratings, reviews, and user-recipe engagement history.
The dataset is widely used for recommendation systems, natural language processing, and user behavior modeling. It supports tasks such as personalized recipe recommendation, rating prediction, ingredient-based retrieval, and text-based sentiment analysis of reviews. The data is split into multiple CSV files representing recipes and user interactions, making it suitable for both relational analysis and machine learning pipelines.
Size 233.23M
Tables 1
Columns 45
Rows 178,265
Job-Salary Public
This dataset contains approximately 250,000 synthetic job records designed for salary prediction and workforce analytics. It includes structured features such as job title, years of experience, education level, industry, company size, location, remote work status, number of skills, certifications, and other HR-related attributes. The target variable is annual salary, making it suitable for regression-based machine learning tasks.
Size 28.06M
Tables 1
Columns 12
Rows 250,000
Global-Stock-Market Public
This dataset contains structured records of user engagement and reactions to Donald Trump’s Truth Social posts related to the Iran war and geopolitical conflict discourse. It typically includes post content, timestamps, engagement metrics (likes, comments, shares/re-truths depending on extraction), and sometimes user-level reaction data or sentiment indicators derived from replies.
Size 260.25M
Tables 1
Columns 49
Rows 377,624
Jigsaw-Snapshot Public
This dataset is a reduced snapshot of the Jigsaw Unintended Bias in Toxicity Classification dataset, created for faster experimentation in NLP workflows. It contains comments from the Civil Comments platform labeled for toxicity using a simplified binary scheme derived from continuous toxicity scores. Comments with high toxicity scores are labeled as toxic, low-score comments are labeled as non-toxic, and mid-range samples are removed. The dataset is commonly used for training lightweight models in text classification, content moderation, and sentiment/toxicity detection tasks. It is designed for prototyping rather than full-scale model training.
Size 16K
Tables 1
Columns 1
Rows 0
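The binary labeling scheme described for Jigsaw-Snapshot can be sketched roughly as follows. This is only an illustration: the entry does not state the actual cut-offs, so LOW_CUTOFF and HIGH_CUTOFF are hypothetical values, and the function names and the (text, score) row layout are assumptions made for the example.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical cut-offs; the catalog entry does not give the real thresholds.
LOW_CUTOFF = 0.2
HIGH_CUTOFF = 0.8


def binarize(score: float,
             low: float = LOW_CUTOFF,
             high: float = HIGH_CUTOFF) -> Optional[int]:
    """Map a continuous toxicity score to 1 (toxic), 0 (non-toxic),
    or None for a mid-range score that should be dropped."""
    if score >= high:
        return 1
    if score <= low:
        return 0
    return None


def build_snapshot(rows: Iterable[Tuple[str, float]]) -> List[Tuple[str, int]]:
    """Keep only comments whose score falls clearly on one side."""
    labeled = []
    for text, score in rows:
        label = binarize(score)
        if label is not None:  # mid-range samples are removed
            labeled.append((text, label))
    return labeled


if __name__ == "__main__":
    sample = [("comment a", 0.9), ("comment b", 0.5), ("comment c", 0.1)]
    print(build_snapshot(sample))
```

Moving the two cut-offs closer together keeps more rows at the cost of noisier labels; moving them apart does the opposite.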
Zomato-Bangalore-Restaurants Public
This dataset contains detailed information on roughly 12,000 to more than 50,000 restaurants in Bengaluru (Bangalore), India, depending on the processed version. It includes rich restaurant attributes such as restaurant name, location, cuisine types, approximate cost for two people, online ordering availability, table booking options, ratings, votes, and detailed review data. The dataset is widely used for exploratory data analysis, recommendation systems, and business intelligence in the food and hospitality sector. It enables insights into restaurant distribution, customer preferences, pricing trends, and locality-based food patterns across the city.
Size 670.16M
Tables 1
Columns 31
Rows 51,716
House-Price Public
This dataset contains structured real estate information used for house price prediction tasks. It typically includes features such as property area, number of bedrooms and bathrooms, location attributes, furnishing status, parking availability, and other housing characteristics that influence market price. The dataset is designed for regression modeling, exploratory data analysis, and feature engineering in real estate analytics. It is commonly used for training machine learning models like linear regression, decision trees, and ensemble methods to estimate property values based on input features.
Size 119.14M
Tables 1
Columns 24
Rows 187,531
Synthetic-Gym-Membership-Churn Public
Synthetic Gym Membership Churn Dataset (1M Rows)
URL: https://www.kaggle.com/datasets/eminkarltepe/synthetic-gym-membership-churn-dataset-1m-rows
Tagline: Large-scale synthetic gym membership churn dataset for customer retention, behavioral analytics, and ML classification tasks.
Description: This dataset contains approximately 1,000,000 synthetic gym member records designed for churn prediction and customer behavior analysis in the fitness industry. It includes structured features such as member demographics (age, gender), membership type, join and last visit dates, workout behavior metrics (average workout duration, calories burned, visits per month), and engagement indicators. The dataset also includes a churn label indicating whether a member has stopped attending the gym. It is intended for machine learning applications such as classification, customer retention modeling, and behavioral analytics in subscription-based fitness businesses.
Format: CSV
Size 145.16M
Tables 1
Columns 21
Rows 1,000,000
Employee-Mental-Health Public
This dataset contains approximately 150,000 synthetic employee records designed for analyzing workplace mental health and burnout patterns. It includes features related to work conditions, lifestyle factors, stress levels, sleep patterns, job role, work environment, and social support. The dataset also provides derived mental health indicators such as stress score, anxiety score, depression score, and a target label for burnout level (Low, Moderate, High) along with indicators for seeking professional help. It is widely used for classification, predictive modeling, and HR analytics in workforce well-being studies.
Size 23.06M
Tables 1
Columns 27
Rows 150,000
Student-Placement-Prediction Public
This dataset contains student-level academic and skill-based attributes used to predict whether a student will be placed in campus recruitment. It typically includes features such as CGPA, communication skills, aptitude scores, number of internships, projects, certifications, extracurricular activities, and academic performance indicators. The target variable represents placement status (placed or not placed). It is designed for classification tasks in machine learning and is widely used for educational data mining, feature importance analysis, and predictive modeling of student employability. Based on similar Kaggle placement datasets, it is generally a medium-sized dataset suitable for ML experimentation and EDA tasks.
Size 212.22M
Tables 1
Columns 32
Rows 1,000,000
Drive-Disruptive-Innovation-China Public
This dataset contains data on Chinese A-share listed manufacturing firms (2014–2023), used to study the relationship between digital transformation, big data applications, and disruptive innovation. It integrates multiple sources, including firm annual reports, patent data (CNIPA), and financial databases such as CSMAR and Wind. The dataset includes variables related to innovation performance, firm financial indicators, governance structure, and digital technology adoption. It is commonly used in econometrics, innovation economics, and business intelligence research to measure how digital transformation influences disruptive innovation outcomes. After cleaning and filtering, the dataset represents approximately 19,000 firm-year observations.
Size 4.80M
Tables 1
Columns 20
Rows 19,034
Ember-Global-Electricity Public
This dataset is based on Ember’s global electricity data, providing monthly long-format records of electricity generation, demand, and emissions across dozens of countries covering most of global power consumption. It includes structured time-series data with variables such as country/region, fuel type (coal, gas, solar, wind, hydro, nuclear, etc.), electricity generation values, emissions estimates, and demand metrics. The dataset is widely used for climate analytics, energy transition research, forecasting renewable energy adoption, and global macro-energy modeling. It supports comparative analysis between countries and tracking long-term energy transition trends.
Size 87.11M
Tables 1
Columns 20
Rows 501,483
Online-Casino-Games Public
This dataset contains approximately 1.2 million records of online casino game outcomes collected from digital casino platforms. It includes structured game logs such as game ID, round ID, timestamps (in some versions), game results, and outcome categories that depend on the casino game type (e.g., roulette-like, card-based, or slot-style games). The dataset is designed for analyzing randomness and probability distributions, and for building predictive or reinforcement learning models that simulate or help explain casino game behavior. It is commonly used for experimentation in AI-driven game prediction, pattern recognition, and statistical analysis of gambling systems.
Size 309.33M
Tables 1
Columns 22
Rows 1,200,000
Flight Public
This dataset contains flight booking information collected from the “Ease My Trip” platform, including details such as airline name, source and destination cities, departure and arrival times, number of stops, duration, class (economy/business), and days left until departure. The target variable is flight ticket price, making it suitable for regression-based machine learning tasks. It is widely used for exploratory data analysis, feature engineering, and building predictive models for airfare pricing trends and optimization strategies.
Size 38.08M
Tables 1
Columns 14
Rows 300,153