UDT Types schema samples #58

Merged
merged 8 commits into from
Aug 13, 2025
15 changes: 15 additions & 0 deletions java/datastax-v4/udt-types/LICENSE
@@ -0,0 +1,15 @@
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

152 changes: 152 additions & 0 deletions java/datastax-v4/udt-types/README.MD
@@ -0,0 +1,152 @@
# User-defined types (UDTs) in Amazon Keyspaces

Amazon Keyspaces (for Apache Cassandra) allows the use of user-defined types (UDTs) to optimize data organization and enhance data modeling capabilities.

A user-defined type (UDT) is a grouping of fields and data types that you can use to define a single column in Amazon Keyspaces.
Valid data types for UDTs are all supported Cassandra data types, including collections and other UDTs that you've already created in the same keyspace.

For more information about supported Keyspaces data types, see [Cassandra data type support](https://docs.aws.amazon.com/keyspaces/latest/devguide/cassandra-apis.html#cassandra-data-type).


# Amazon Keyspaces Real Estate Schema Example

This example demonstrates a comprehensive Real Estate data model using Amazon Keyspaces with User Defined Types (UDTs). The schema showcases how to structure complex real estate data including property details, market analytics, and location-based queries.


## Schema Overview
The real estate schema consists of three main tables and seven UDTs that model comprehensive property information:

### User Defined Types (UDTs)

- **address_details**: Complete address information including coordinates and neighborhood data
- **property_specifications**: Physical property details (size, rooms, construction details)
- **property_amenities**: Features and amenities (pool, security, smart home features)
- **financial_details**: Pricing, taxes, HOA fees, and financing information
- **location_quality**: School districts, walkability scores, and area ratings
- **market_intelligence**: Market trends, days on market, and pricing analytics
- **listing_details**: Agent information, listing status, and marketing materials

### Tables

1. **properties**: Main property table with complete property information
2. **properties_by_location**: Optimized for geographic and price range queries
3. **market_analytics**: Time-series market data for trend analysis

## Sample Data

The schema includes sample data for the Seattle/Bellevue area:

- **Luxury Properties**: High-end homes and condos ($1M+)
- **Mid-Range Properties**: Family homes and investment properties ($500K-$1M)
- **Market Analytics**: Historical market data with trends and statistics
- **Location Data**: Properties organized by ZIP code and property type
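
The sample data files themselves are not reproduced in this README. As a rough sketch of what a row with a UDT column might look like, the following insert writes one `market_analytics` row and supplies `market_stats` as a UDT literal. All values are invented placeholders rather than data from the sample files, and fields omitted from the literal are left null.

```cql
-- Illustrative values only; not taken from the sample data files.
-- UDT literal: field names are unquoted, and unspecified fields stay null.
INSERT INTO real_estate.market_analytics (
    area_code, property_type, date,
    avg_price, median_price, total_listings,
    market_stats
) VALUES (
    '98101', 'condo', '2025-01-31',
    850000, 790000, 120,
    {
        days_on_market: 21,
        market_temperature: 'warm',
        inventory_level: 'low'
    }
);
```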

## Quick Start

### Prerequisites

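- Amazon Keyspaces access, or a local Cassandra installation (the `SingleRegionStrategy` replication class used in the manual setup is specific to Amazon Keyspaces; a local cluster needs a standard strategy such as `SimpleStrategy`)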
- cqlsh or cqlsh-expansion for Amazon Keyspaces

### Setup

1. **Execute the setup script:**
```bash
cd real_eastate_schema_sample
chmod +x execute_real_estate_schema.sh
./execute_real_estate_schema.sh
```

2. **Manual setup (alternative):**
```bash
# Create keyspace
cqlsh -e "CREATE KEYSPACE IF NOT EXISTS real_estate WITH REPLICATION = {'class': 'SingleRegionStrategy'};"

# Execute files in order
cqlsh -f 01_real_estate_udts.cql
cqlsh -f 02_properties_table.cql
cqlsh -f 03_sample_luxury_properties.cql
cqlsh -f 04_sample_midrange_properties.cql
cqlsh -f 05_sample_properties_by_location.cql
cqlsh -f 06_sample_market_analytics.cql
```
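
Amazon Keyspaces creates tables asynchronously, so it can help to confirm that the tables are active before loading data. A minimal check, assuming the `real_estate` keyspace created above and the Keyspaces-specific `system_schema_mcs` keyspace (which does not exist on a local Cassandra cluster):

```cql
-- Amazon Keyspaces only: each table should report status ACTIVE before inserts are run
SELECT table_name, status
FROM system_schema_mcs.tables
WHERE keyspace_name = 'real_estate';
```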


## Sample Queries

The schema supports various query patterns:

### Location-Based Queries
```cql
-- Properties by ZIP code and type
SELECT property_id, address.street_name, financial.listing_price
FROM properties_by_location
WHERE zip_code = '98101'
  AND property_type = 'condo';
```
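
Because `price_range` is the first clustering column of `properties_by_location`, a query can also narrow the results to one price bucket within a partition. The sketch below assumes a bucket label of `'500K-1M'`; the actual labels depend on the sample data and are not shown in this README.

```cql
-- '500K-1M' is a hypothetical price_range label; substitute a value present in the data
SELECT property_id, address.city, financial.listing_price
FROM properties_by_location
WHERE zip_code = '98101'
  AND property_type = 'condo'
  AND price_range = '500K-1M';
```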

### Market Analytics
```cql
-- Market trends over time
SELECT area_code, date, avg_price, median_price,
       market_stats.market_temperature
FROM market_analytics
WHERE area_code = '98101'
  AND property_type = 'condo';
```
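
Since `date` is the clustering column of `market_analytics` and rows are stored in descending date order, the same partition can be restricted to a time window. A sketch with a date range chosen arbitrarily for illustration:

```cql
-- Dates are arbitrary examples; rows come back newest first (CLUSTERING ORDER BY date DESC)
SELECT date, avg_price, median_price, avg_days_on_market
FROM market_analytics
WHERE area_code = '98101'
  AND property_type = 'condo'
  AND date >= '2025-01-01'
  AND date < '2025-04-01';
```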

## Schema Features

### Complex Data Modeling
- **UDT Columns**: Frozen UDTs group related fields into rich data structures for comprehensive property information
- **Collections**: Lists and maps for amenities, school ratings, and features
- **Flexible Schema**: Easy to extend with additional property attributes
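
One consequence of declaring the UDT columns as `frozen` is that each UDT value is written as a single unit: changing one field means supplying the whole value again. A sketch, using a placeholder UUID and invented prices:

```cql
-- Placeholder UUID and values, for illustration only.
-- A frozen UDT cannot be updated field by field (for example,
-- SET financial.listing_price = ...), so the full value is rewritten.
-- Any field left out of the literal becomes null.
UPDATE real_estate.properties
SET financial = {
        listing_price: 1250000,
        original_list_price: 1300000,
        monthly_hoa_fees: 450
    },
    updated_at = '2025-08-13T12:00:00Z'
WHERE property_id = 123e4567-e89b-12d3-a456-426614174000;
```

In practice an application would read the current `financial` value first and write back the complete structure with the changed field.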

### Query Optimization
- **Partition Keys**: Efficient data distribution by location and property type
- **Clustering Keys**: Ordered data for range queries and time-series analysis

### Real Estate Use Cases
- **Property Listings**: Complete MLS-style property information
- **Market Analysis**: Historical trends and comparative market analysis
- **Location Intelligence**: School districts, walkability, and neighborhood data
- **Investment Analysis**: Financial metrics and market performance

## File Structure

```
real_eastate_schema_sample/
├── 01_real_estate_udts.cql # UDT definitions
├── 02_properties_table.cql # Table schemas
├── 03_sample_luxury_properties.cql # High-end property samples
├── 04_sample_midrange_properties.cql # Mid-range property samples
├── 05_sample_properties_by_location.cql # Location-based data
├── 06_sample_market_analytics.cql # Market trend data
├── 07_sample_queries.cql # Example queries
└── execute_real_estate_schema.sh # Setup script
```

## Data Model Benefits

### Structured Data
- **Type Safety**: UDTs provide schema validation
- **Data Integrity**: Consistent structure across all properties
- **Rich Metadata**: Comprehensive property and market information

### Flexibility
- **Extensible**: Easy to add new UDT fields
- **Backward Compatible**: Schema evolution without breaking changes
- **Multi-Use**: Supports various real estate applications

---
## Amazon Keyspaces Considerations
When using this schema with Amazon Keyspaces:
- Use point-in-time recovery for data protection
- Consider on-demand billing for variable workloads
- Implement proper IAM policies for data access
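
Point-in-time recovery and on-demand capacity can both be set from CQL through Amazon Keyspaces custom properties. A sketch for the `properties` table follows; this syntax is specific to Amazon Keyspaces and is not accepted by Apache Cassandra.

```cql
-- Enable point-in-time recovery on an existing table
ALTER TABLE real_estate.properties
WITH CUSTOM_PROPERTIES = {
    'point_in_time_recovery': {'status': 'enabled'}
};

-- Switch the table to on-demand (pay-per-request) capacity
ALTER TABLE real_estate.properties
WITH CUSTOM_PROPERTIES = {
    'capacity_mode': {'throughput_mode': 'PAY_PER_REQUEST'}
};
```

The same statements would need to be repeated for `properties_by_location` and `market_analytics` if those tables should share the settings.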

## Next Steps

1. **Extend the Schema**: Add more UDTs for additional property types
2. **Implement Applications**: Build real estate applications using this schema
3. **Monitor Performance**: Use CloudWatch metrics to optimize queries
@@ -0,0 +1,148 @@
-- Real Estate UDT Definitions
-- Execute these first before inserting sample data

USE real_estate;

-- Address Information
CREATE TYPE IF NOT EXISTS address_details (
street_number TEXT,
street_name TEXT,
unit_number TEXT,
city TEXT,
state TEXT,
zip_code TEXT,
county TEXT,
country TEXT,
latitude DECIMAL,
longitude DECIMAL,
timezone TEXT,
neighborhood TEXT,
subdivision TEXT
);

-- Property Specifications
CREATE TYPE IF NOT EXISTS property_specifications (
property_type TEXT,
property_subtype TEXT,
square_feet INT,
lot_size_sqft INT,
bedrooms INT,
bathrooms DECIMAL,
half_baths INT,
stories INT,
year_built INT,
year_renovated INT,
garage_spaces INT,
parking_spaces INT,
basement_type TEXT,
foundation_type TEXT,
roof_type TEXT,
exterior_material TEXT,
heating_type TEXT,
cooling_type TEXT,
flooring_types LIST<TEXT>
);

-- Property Amenities
CREATE TYPE IF NOT EXISTS property_amenities (
pool BOOLEAN,
pool_type TEXT,
spa_hot_tub BOOLEAN,
fireplace BOOLEAN,
fireplace_count INT,
deck BOOLEAN,
patio BOOLEAN,
balcony BOOLEAN,
fence BOOLEAN,
fence_type TEXT,
security_system BOOLEAN,
alarm_system BOOLEAN,
sprinkler_system BOOLEAN,
central_vacuum BOOLEAN,
intercom_system BOOLEAN,
elevator BOOLEAN,
wheelchair_accessible BOOLEAN,
solar_panels BOOLEAN,
energy_efficient_appliances BOOLEAN,
smart_home_features LIST<TEXT>,
outdoor_features LIST<TEXT>,
interior_features LIST<TEXT>
);

-- Financial Details
CREATE TYPE IF NOT EXISTS financial_details (
listing_price DECIMAL,
original_list_price DECIMAL,
price_per_sqft DECIMAL,
last_sold_price DECIMAL,
last_sold_date DATE,
assessed_value DECIMAL,
assessment_year INT,
annual_property_taxes DECIMAL,
monthly_hoa_fees DECIMAL,
hoa_name TEXT,
special_assessments DECIMAL,
homeowners_insurance_estimate DECIMAL,
utilities_included LIST<TEXT>,
financing_options LIST<TEXT>
);

-- Location Quality
CREATE TYPE IF NOT EXISTS location_quality (
school_district TEXT,
elementary_school TEXT,
middle_school TEXT,
high_school TEXT,
school_ratings MAP<TEXT, INT>,
crime_index INT,
walkability_score INT,
transit_score INT,
bike_score INT,
noise_level TEXT,
air_quality_index INT,
flood_zone TEXT,
earthquake_zone TEXT,
hurricane_zone TEXT,
nearby_amenities MAP<TEXT, DECIMAL>,
commute_times MAP<TEXT, INT>,
walkable_destinations LIST<TEXT>
);



-- Market Intelligence
CREATE TYPE IF NOT EXISTS market_intelligence (
days_on_market INT,
listing_views INT,
showing_count INT,
offer_count INT,
market_temperature TEXT,
price_trend_30_days DECIMAL,
price_trend_90_days DECIMAL,
price_trend_1_year DECIMAL,
comparable_sales_count INT,
inventory_level TEXT,
absorption_rate DECIMAL,
median_dom_area INT,
price_per_sqft_area DECIMAL,
market_velocity TEXT
);

-- Listing Details
CREATE TYPE IF NOT EXISTS listing_details (
listing_agent_name TEXT,
listing_agent_phone TEXT,
listing_agent_email TEXT,
listing_brokerage TEXT,
listing_date DATE,
listing_status TEXT,
listing_type TEXT,
showing_instructions TEXT,
commission_rate DECIMAL,
buyer_agent_commission DECIMAL,
listing_remarks TEXT,
private_remarks TEXT,
virtual_tour_url TEXT,
video_tour_url TEXT,
floor_plan_url TEXT
);
@@ -0,0 +1,51 @@
-- Main Properties Table
USE real_estate;

CREATE TABLE IF NOT EXISTS properties (
property_id UUID PRIMARY KEY,
mls_number TEXT,
address frozen<address_details>,
specifications frozen<property_specifications>,
amenities frozen<property_amenities>,
financial frozen<financial_details>,
location frozen<location_quality>,
market frozen<market_intelligence>,
listing frozen<listing_details>,
created_at TIMESTAMP,
updated_at TIMESTAMP,
data_source TEXT,
data_quality_score INT
);

-- Properties by Location Table
CREATE TABLE IF NOT EXISTS properties_by_location (
zip_code TEXT,
property_type TEXT,
price_range TEXT,
property_id UUID,
address frozen<address_details>,
specifications frozen<property_specifications>,
financial frozen<financial_details>,
location frozen<location_quality>,
listing frozen<listing_details>,
market frozen<market_intelligence>,
PRIMARY KEY ((zip_code, property_type), price_range, property_id)
) WITH CLUSTERING ORDER BY (price_range ASC, property_id ASC);

-- Market Analytics Table
CREATE TABLE IF NOT EXISTS market_analytics (
area_code TEXT,
date DATE,
property_type TEXT,
avg_price DECIMAL,
median_price DECIMAL,
avg_price_per_sqft DECIMAL,
total_listings INT,
new_listings INT,
closed_sales INT,
pending_sales INT,
avg_days_on_market INT,
inventory_months DECIMAL,
market_stats frozen<market_intelligence>,
PRIMARY KEY ((area_code, property_type), date)
) WITH CLUSTERING ORDER BY (date DESC);