[Analysis] API Efficiency Issues and Optimization Opportunities #25
Executive Summary
Assumption: CONFIRMED. The current implementation is inefficient. The Polymarket API does NOT support server-side filtering or sorting. We must fetch all events and filter/sort client-side.
API Behavior Findings
Current Inefficiencies
Rate Limit Testing
Proposed Solutions
Questions for Discussion
Full analysis:
docs/api-efficiency-analysis.md
Parallel Page Fetches
for parallelization, how many parallel requests should we run in a single process? For example, if there are 50 pages, should we run all 50 in parallel, or fetch 4 pages at a time? How do we decide the concurrency level? Is it limited by CPU or memory?
Smarter Early Exit
Changing it to OR would give it a different purpose. AND means we want both of them; OR means we want either of them. AND is needed when the user wants both quotas filled, while OR is needed when the user doesn't care which kind they get, as long as they get, for example, 10 markets.
Caching Layer
yes, I was going to suggest caching. Can you propose and benchmark it, and also note the tradeoffs? Is there a better option than caching it for 5 minutes? New markets don't appear every second or anything. I think if a user wants fresh data without the cache, they should pass an extra flag like --refresh or something. What do you think?
Volume-Weighted Sampling
I don't quite understand the purpose of this one.
I want to keep both speed and completeness. I don't want 1-second speed with only 1% completeness. I want something that doesn't take too long but whose completeness still reaches the threshold. What do you think?
Thanks for the detailed questions. Here are my responses:
Parallel Page Fetches
It is about network I/O and avoiding overwhelming the API; it is not CPU- or memory-bound. A good starting point is min(5, total_pages) concurrent requests. With 50 pages, we would do batches of 5. This balances speed with being respectful to the API.
Smarter Early Exit
You are right. The current AND logic is semantically correct for "I want both match and non-match markets." The issue is it is too conservative. We could add an early_exit_mode parameter:
early_exit_mode="any" (stop when either quota is met) vs early_exit_mode="all" (current behavior). But given your requirements below, this may not be needed.
Caching Layer
Agreed. Proposal:
cache_ttl=300 (5 min default) with a --no-cache or --refresh flag to bypass. Also consider a stale-while-revalidate pattern: serve cached data immediately while fetching fresh data in the background.
Volume-Weighted Sampling
Given the API returns only 5 events/page and does not sort by volume, I was exploring "what if we only fetch first 10 pages (50 events) instead of all 155?" We would get ~35% of events but possibly 70%+ of high-volume ones. But honestly this is a poor tradeoff. Discarding it.
Revised Proposal (based on your requirements)
You want BOTH speed AND completeness, not a tradeoff.
This gives you speed (parallel + caching) AND completeness (full data when needed).
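To make the caching piece of this hybrid concrete, here is a minimal in-process TTL cache sketch. The names (TTLCache, fetch_events) are hypothetical, not existing code in this repo, and the real CLI wiring for a --refresh flag would differ:

```python
import time

class TTLCache:
    """Tiny in-process cache; ttl=300 matches the proposed 5-minute default."""

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        self._store = {}  # key -> (timestamp, value)

    def get(self, key, fetch, refresh: bool = False):
        now = time.monotonic()
        entry = self._store.get(key)
        # Serve the cached value only if it exists, is still fresh,
        # and the caller did not ask for a forced refresh.
        if entry is not None and not refresh and now - entry[0] < self.ttl:
            return entry[1]
        # Cache miss, expiry, or --refresh: fall through to a real fetch.
        value = fetch()
        self._store[key] = (now, value)
        return value

# Demo: count how many real fetches happen.
calls = 0

def fetch_events():
    global calls
    calls += 1
    return ["event"] * 5

cache = TTLCache(ttl=300)
cache.get("events", fetch_events)                 # miss -> real fetch
cache.get("events", fetch_events)                 # hit  -> cached
cache.get("events", fetch_events, refresh=True)   # --refresh bypasses cache
```

A stale-while-revalidate variant would additionally return the expired value immediately and schedule the fetch in the background, at the cost of sometimes serving slightly stale data.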
Shall we proceed with this hybrid approach?
Parallel Page Fetches
yes, I think going with min(5, total_pages) is good for now. We'll see later if we need further enhancement or not.
Smarter Early Exit
I think once we have caching, we don't really need this one? Maybe an optional max_total parameter? So basically, if max_matches is 10 and max_non_matches is 10 but max_total is 15, then once we reach 15 we exit early. What do you think?
Parallel Page Fetches
Agreed. min(5, total_pages) as the initial concurrency level.
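As a rough sketch of what min(5, total_pages) concurrency could look like with asyncio (fetch_page and fetch_all_pages are hypothetical names; the real HTTP client code will differ):

```python
import asyncio

# Hypothetical stand-in for one HTTP request to the events endpoint.
async def fetch_page(page: int) -> list:
    await asyncio.sleep(0.001)  # placeholder for network latency
    return [{"page": page}]

async def fetch_all_pages(total_pages: int, limit: int = 5) -> list:
    # Cap in-flight requests at min(limit, total_pages); with 50 pages
    # and limit=5, roughly 5 requests are in flight at any time.
    sem = asyncio.Semaphore(min(limit, total_pages))

    async def bounded(page: int) -> list:
        async with sem:
            return await fetch_page(page)

    # gather preserves page order even though requests overlap.
    pages = await asyncio.gather(*(bounded(p) for p in range(total_pages)))
    return [event for page_events in pages for event in page_events]

events = asyncio.run(fetch_all_pages(50))
```

The semaphore bounds concurrency at the network layer, which is the relevant constraint here; raising the limit trades API friendliness for wall-clock speed.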
Smarter Early Exit / max_total
I like the max_total idea. It is simpler and more intuitive than early_exit_mode. Semantics:
max_total=15 means "fetch at most 15 events total, then stop early".
Implementation:
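A hypothetical sketch of the combined early-exit rules (the function and parameter wiring here are illustrative; only max_matches, max_non_matches, and max_total come from the discussion):

```python
def collect_events(pages, predicate, max_matches=10, max_non_matches=10,
                   max_total=None):
    """Collect matching and non-matching events with two exit rules:
    - existing AND behavior: stop once BOTH per-bucket quotas are met;
    - new max_total: stop as soon as that many events are kept overall."""
    matches, non_matches = [], []
    for page in pages:
        for event in page:
            if predicate(event) and len(matches) < max_matches:
                matches.append(event)
            elif not predicate(event) and len(non_matches) < max_non_matches:
                non_matches.append(event)
            if max_total is not None and len(matches) + len(non_matches) >= max_total:
                return matches, non_matches  # max_total early exit
        if len(matches) >= max_matches and len(non_matches) >= max_non_matches:
            break  # existing AND early exit
    return matches, non_matches

# max_matches=10, max_non_matches=10, max_total=15: stops after 15 events.
pages = [[{"v": i} for i in range(j, j + 5)] for j in range(0, 50, 5)]
m, n = collect_events(pages, lambda e: e["v"] % 2 == 0, max_total=15)
```

Because max_total short-circuits before either per-bucket quota is filled, it never changes results for callers who do not pass it.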
This gives users a new way to get quick results without changing the existing max_matches/max_non_matches behavior.
Shall I proceed with this plan?
yes lets proceed
Great. Starting implementation now.
Plan:
Will create PR once done.