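Background
I'm a windsurfing enthusiast. The sport depends on wind and on gear that is fairly hard to transport, so many windsurfers travel to places with reliable wind year-round and rent equipment from local clubs.
While browsing windsurfing forums on Reddit, I noticed many people asking for club recommendations in a particular area. So I decided to build a windsurfing club directory that lets windsurfers filter clubs by location, services offered, and supported water sports.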
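Tech stack and tools
- Web framework: Next.js
- Database: Sanity (headless CMS)
- Deployment: Vercel
- Development: Claude Code, Cursor
- Club data collection: Exa AI Websets (a product that takes natural-language searches and returns structured data)
- Website crawling: Jina AI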
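Development workflow
- In Exa AI, ask which windsurfing clubs exist on each continent and have it return their URLs (Exa automatically deduplicates and summarizes the results and returns the URLs that best match the request).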

- For each URL, crawl the site content with Jina AI and pass it to an LLM with the prompt below (Claude Code wrote it; I edited it slightly):
````text
You are a professional windsurfing website service analysis expert. Please carefully analyze the following website content and identify the specific services provided by this website/organization.
## Website Information:
**URL**: {url}
**Website Title**: {title}
**Existing Description**: {long_description}
**Website Content**: {website_content}
## Analysis Task:
Based on the website content and existing description, please analyze the services provided by this website and return ONLY the following JSON format:
```json
{{
"provided_service": [
"equipment_rental",
"training_lessons"
],
"supported_watersports": [
"Windsurfing",
"Kitesurfing",
"SUP"
],
"confidence_score": 0.85,
"target_audience": "Target audience description",
"note": "Additional notes if needed, especially when 'other' category is used or important details need clarification"
}}
```
## Service Category Definitions (STRICT - Use EXACTLY these 9 categories):
**You MUST use only these exact category names, do not modify or create new ones:**
- equipment_rental: Equipment rental (windsurfing boards, sails, wetsuits, helmets, masts, etc.)
- training_lessons: Training courses (beginner courses, advanced techniques, private lessons, etc.)
- gear_storage: Equipment storage services (board storage, equipment keeping, etc.)
- repairs_maintenance: Repair and maintenance services (board repairs, equipment maintenance, sail repairs, etc.)
- retail_sales: Retail sales (windsurfing equipment sales, accessories sales, etc.)
- accommodation: Accommodation services (windsurfing resorts, guesthouses, hotels, etc.)
- food_beverage: Food and beverage services (beachside restaurants, bars, cafes, etc.)
- event_racing: Event organization, racing competitions, regattas, tournaments, and competitive water sports events
- other: Other services (tourism services, transportation, insurance, etc.)
## Important Rules:
1. **STRICT SERVICE CATEGORIES**: Use ONLY the 9 exact category names listed above
2. **If you use "other"**: Explain what specific services fall under "other" in the "note" field
3. **Supported Watersports**: List all water sports mentioned (Windsurfing, Kitesurfing, Sailing, SUP, etc.)
4. **Confidence Score**: 0.0-1.0 based on information clarity and completeness
5. **Target Audience**: Be specific about who the services are for
6. **Note Field**: Use for clarifications, especially when "other" category is needed
## Analysis Focus:
- Focus on water sports related services
- Identify core services: equipment, training, storage, sales
- Consider user needs: beginners, advanced, equipment owners
- Note any unique or special services that don't fit standard categories
Please return ONLY valid JSON format without any additional text.
````

Working from the website content, the LLM outputs each club's location, services, supported sports, and other details in the JSON format defined in the prompt, and the results are written to the database.
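The step above can be sketched in Python as follows. The author's pipeline code isn't shown, so this is a hypothetical sketch and every function name in it is made up for illustration. It assumes the Jina AI Reader convention of prefixing a page URL with `https://r.jina.ai/` to get an LLM-friendly text version. It also shows why the prompt writes `{{ }}`: `str.format` turns them into literal braces. Finally, it validates the model's reply against the nine allowed categories and applies a crude check that flags any sport whose name never appears in the crawled text. That check only marks a club for review; it doesn't delete anything.

```python
import json

# The nine service categories allowed by the prompt above.
ALLOWED_SERVICES = {
    "equipment_rental", "training_lessons", "gear_storage",
    "repairs_maintenance", "retail_sales", "accommodation",
    "food_beverage", "event_racing", "other",
}
FENCE = "`" * 3  # a Markdown code fence, built this way to keep the block readable


def jina_reader_url(url: str) -> str:
    """Jina AI Reader: prefixing a page URL with r.jina.ai returns an LLM-friendly text version."""
    return "https://r.jina.ai/" + url


def build_prompt(template: str, **fields: str) -> str:
    """Fill {url}, {title}, ...; str.format turns the template's {{ and }} into literal braces."""
    return template.format(**fields)


def parse_llm_output(raw: str, website_content: str) -> dict:
    """Parse the model's JSON reply and flag anything that breaks the prompt's rules."""
    text = raw.strip()
    # Models sometimes wrap the JSON in a code fence despite being told not to; strip it.
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)

    # Service names outside the nine allowed categories.
    unknown = [s for s in data.get("provided_service", []) if s not in ALLOWED_SERVICES]
    # Crude hallucination check: sports whose name never appears in the crawled page text.
    page = website_content.lower()
    unmentioned = [s for s in data.get("supported_watersports", []) if s.lower() not in page]

    data["unknown_services"] = unknown
    data["unmentioned_sports"] = unmentioned
    data["needs_review"] = bool(unknown or unmentioned)
    return data
```

The substring test is deliberately simple. Short names like "SUP" will match words such as "support", so it can miss some hallucinations and still needs a human pass. It is cheap, though, and it surfaces exactly the kind of invented "Windsurfing" tag described in the problems section.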
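- Have the AI write a script that takes a screenshot of each website, grabs its icon, and uploads both to the database.
- Call the HERE API to add latitude and longitude to each club and write them to the database.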
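The author's screenshot and icon script isn't shown. Screenshots need a headless browser, so this sketch covers only the icon lookup and uses nothing but the Python standard library. It assumes the page HTML has already been fetched. The class and function names are hypothetical. The logic finds the first `<link>` whose `rel` contains "icon", resolves it against the page URL, and falls back to `/favicon.ico` at the site root.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkParser(HTMLParser):
    """Collects href values of <link> tags whose rel attribute mentions an icon."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        # Matches rel="icon", rel="shortcut icon", rel="apple-touch-icon", ...
        if href and any("icon" in token for token in rel):
            self.hrefs.append(href)


def find_icon_url(page_url: str, html: str) -> str:
    """Return an absolute URL for the site's icon, guessing /favicon.ico if none is declared."""
    parser = IconLinkParser()
    parser.feed(html)
    if parser.hrefs:
        # Resolve relative paths such as "/img/fav.png" against the page URL.
        return urljoin(page_url, parser.hrefs[0])
    # Fallback: many sites still serve an icon at the root.
    return urljoin(page_url, "/favicon.ico")
```

`HTMLParser` routes self-closing tags like `<link ... />` through `handle_starttag` by default, so both HTML styles are handled.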
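Here is a minimal sketch of the geocoding call. It assumes the HERE Geocoding & Search API v7 `geocode` endpoint with the `q` and `apiKey` query parameters, and a response whose `items[0].position` carries `lat`/`lng`. Check those details against HERE's reference before relying on them. The helper names are hypothetical, and only `geocode` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of HERE Geocoding & Search API v7.
HERE_GEOCODE_ENDPOINT = "https://geocode.search.hereapi.com/v1/geocode"


def here_geocode_url(query: str, api_key: str) -> str:
    """Build the request URL: q is a free-text query, apiKey is the HERE key."""
    return HERE_GEOCODE_ENDPOINT + "?" + urlencode({"q": query, "apiKey": api_key})


def first_position(response: dict):
    """Return (lat, lng) of the top-ranked item, or None if nothing usable came back."""
    items = response.get("items") or []
    if not items:
        return None
    pos = items[0].get("position") or {}
    if "lat" not in pos or "lng" not in pos:
        return None
    return (pos["lat"], pos["lng"])


def geocode(query: str, api_key: str):
    """Network call; json.load accepts the bytes returned by the HTTP response."""
    with urlopen(here_geocode_url(query, api_key), timeout=10) as resp:
        return first_position(json.load(resp))
```

This takes the top-ranked item without question. When the query is little more than a club name, that top hit can be a different place entirely, so the coordinates are worth cross-checking.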
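Problems encountered
AI hallucination
This mostly affected how the AI classified each club's water sports: it added sports the club doesn't actually offer, most often Windsurfing.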
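I suspect this happened because the prompt mentions Windsurfing so many times. The problem wasn't severe, though, and I cleaned it up by hand.
Inaccurate coordinates from the HERE API
While reviewing the data, I found that for some clubs the real address and the coordinates returned by HERE weren't even on the same continent. So I had Claude Code write a script to cross-check them:
- Search Google Maps using each club's name as the query and take the coordinates Google Maps returns.
- Compare the Google coordinates with the HERE coordinates and flag every club where the two are more than 100 km apart for manual review.
- Verify the flagged clubs by hand and apply the final updates.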
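The comparison step can be sketched with the haversine great-circle distance. This is a stand-in for whatever the author's script actually computes. The 100 km threshold comes from the text, while the function names and the `(name, here_coord, google_coord)` input shape are hypothetical. A spherical Earth model is accurate enough here because its error is tiny compared with a 100 km cutoff.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) pairs given in degrees."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def review_items(clubs, threshold_km=100.0):
    """clubs: iterable of (name, here_coord, google_coord); returns (name, distance_km) to check by hand."""
    flagged = []
    for name, here, google in clubs:
        if here is None or google is None:
            # A missing coordinate on either side also needs a human look.
            flagged.append((name, None))
            continue
        distance = haversine_km(here, google)
        if distance > threshold_km:
            flagged.append((name, round(distance, 1)))
    return flagged
```

For example, a club geocoded to southern Spain by one service and to South Africa by the other is flagged, while a 1 to 2 km disagreement between the two services is not.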
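By the way, the manual check showed that Google Maps' POI data is far more accurate than HERE's. I only used HERE because I got stuck on the bank card verification step at the time and couldn't get a Google Maps API key.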
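Tips for using Claude Code
Use Git a lot
Even on a solo project, commit after finishing each feature. Otherwise, if the AI turns your code into a mess, it's hard to roll back.
Write a good CLAUDE.md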
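It serves two main purposes:
- Describing the project architecture to the model, which saves the time and tokens it would otherwise spend searching the codebase.
- Setting global guidelines so you don't have to repeat them in every new conversation, for example "write code comments in English".
If you don't know how to write one, you can have Claude Code write it for you.
Here is my project's CLAUDE.md: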
# Windsurfing Club Directory
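Use /clear to reset the chat regularly
The model's context window is limited. If it reads a lot of code, you can hit the limit within a few rounds. At that point Claude Code automatically compacts the conversation: it summarizes the earlier chat and carries that summary forward as background. The problem is that compaction drops a lot of key information, so the code written afterwards tends to be riddled with errors.
To avoid this, try to finish each feature within a few rounds of conversation, then run /clear and start a fresh one.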
