A few months ago I wrote a spider to fetch content from various RSS feeds and count the words in the posts. Since then, it's fetched over 20,000 documents, discovered usages of 50,000 words, and counted a grand total of 18 million word instances.
The 30 most common words are:
the to and of in was that her he his you it for on with she as at is had him be said me but have my up from not
"the" alone accounts for almost 5% of all words used.
The most common words of at least 10 characters are:
everything government understand completely immediately information especially expression themselves remembered
"everything" makes up only 0.04% of words in the corpus.
Full results can be found in this CSV file. I'm placing this file in the public domain; however, if you discover anything interesting while using it, I'd love to hear about it.
The spider fetched new entries from a variety of RSS feeds to create the corpus - feeds used include Newsweek, People, TechCrunch, FanFiction.net, and Sports Illustrated.
A spider fetched new entries from these sites every 4 hours. Each link was followed, converted to a rough text format, and split into sentences. Since it was hard to separate page structure from content, each sentence was counted only once - in an attempt to reduce the prevalence of text like "Click here to sign up" or "Make CNN your home page". Sentences with two or fewer words, sentences in which every word was capitalized, and sentences consisting entirely of lowercase letters were also ignored.
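The spider's code isn't part of this post, so what follows is only a minimal Python sketch of the sentence stage as described above. Several parts are assumptions made for illustration: the regex sentence splitter, the reading of "all capitalized words" as "every word starts with a capital letter", and the hypothetical names `split_sentences`, `keep_sentence`, and `unique_sentences`.

```python
import re

# Assumed splitter: break after ., ! or ? when whitespace follows.
# Real pages would need something smarter (abbreviations, quotes, etc.).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split rough page text into trimmed, non-empty sentences."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def keep_sentence(sentence: str) -> bool:
    """Apply the three filters from the post, as interpreted in this sketch."""
    words = sentence.split()
    if len(words) <= 2:
        return False  # two or fewer words
    if all(w[0].isupper() for w in words):
        return False  # every word capitalized, e.g. a nav bar or headline
    if sentence == sentence.lower():
        return False  # no capital letters at all
    return True


def unique_sentences(pages: list[str], seen: set[str]) -> list[str]:
    """Return sentences that are new to `seen` and pass keep_sentence.

    `seen` is updated in place, so repeated page chrome is only counted
    once, even across separate fetches.
    """
    kept = []
    for page in pages:
        for sentence in split_sentences(page):
            if sentence in seen:
                continue  # already counted (or already rejected) once
            seen.add(sentence)
            if keep_sentence(sentence):
                kept.append(sentence)
    return kept
```

If the same `seen` set is passed to every four-hourly fetch, "Make CNN your home page" is counted once across the whole corpus instead of once per page.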
Each sentence was then split into words, and the word list was filtered against the ENABLE2K data set. 96,000 words were found that were not valid English; these may have been names or word fragments. 125,000 words in the ENABLE2K set were not found in any document.
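The word stage can be sketched in the same hedged way: tokenize each sentence, keep only words found in a word list such as ENABLE2K, and tally them with `collections.Counter`. The tokenizer regex, the one-word-per-line file layout assumed by `load_word_list`, and all function names here are hypothetical. This is not the spider's actual code.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase letter runs, allowing internal apostrophes.
# Contractions like "don't" stay whole; if the word list has none, they
# end up in the rejected set.
WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")


def load_word_list(path: str) -> set[str]:
    """Read a one-word-per-line list such as ENABLE2K (file layout assumed)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def count_words(sentences: list[str], dictionary: set[str]) -> tuple[Counter, set[str]]:
    """Tally dictionary words; collect everything else as 'not valid English'."""
    counts: Counter = Counter()
    rejected: set[str] = set()
    for sentence in sentences:
        for word in WORD.findall(sentence.lower()):
            if word in dictionary:
                counts[word] += 1
            else:
                rejected.add(word)  # names, fragments, foreign words, ...
    return counts, rejected


def unused_words(counts: Counter, dictionary: set[str]) -> set[str]:
    """Dictionary words that never appeared in any sentence."""
    return dictionary - set(counts)


def share(counts: Counter, word: str) -> float:
    """Fraction of all counted instances that are `word` (0.0 if empty)."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


if __name__ == "__main__":
    words = {"the", "cat", "saw", "dog", "great", "sat", "zebra"}
    counts, rejected = count_words(
        ["The cat saw the dog.", "Zorblat the Great sat."], words
    )
    print(counts.most_common(1))          # [('the', 3)]
    print(f"{share(counts, 'the'):.1%}")  # 37.5%
    print(rejected, unused_words(counts, words))  # {'zorblat'} {'zebra'}
```

On a full corpus, `len(rejected)` and `len(unused_words(counts, dictionary))` would correspond to the 96,000 and 125,000 figures above. `share(counts, "the")` is the kind of ratio behind the "almost 5%" figure.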
The system is far from perfect: for instance, the word "de" is erroneously counted among the top 50 most popular words.
Recently I took part in my first podcast. Near the end of it, I somewhat accidentally made a point that's bounced around in my head for a while - and I wanted to explain it and flesh it out.
The single biggest advantage, in my opinion, that a small company has over a large one is that the user is priority 1. Small companies have to make something useful and interesting, and make it fast. Big companies have a lot of mouths to feed, and as such are prone to being risk-averse.
If a big company releases a product that sucks, or fails to meet an announced release date, or releases something with security problems - the press will be there in force, and there will be layoffs, blame, and stockholder meetings. When a stealth mode startup does something stupid, nobody notices.
This is why startups can always make the user their first priority. A big company has to make not screwing up its first priority. If the right choice for the user involves substantial technology risk, the big company usually won't go there until it's dragged there, because if that risk doesn't pay off, it's going to be embarrassed. And embarrassment is scarier than releasing a less interesting product.
As a startup, you have to beat the brand-name recognition and PR departments of your big-company competitors by releasing a product that is blindingly useful and fun to use. So the right choice for a startup is always to take the technology risk, because if it pays off, the startup has a chance to survive and take another risk.
This is why indie music is almost always more interesting than pop. Local restaurants will be more interesting than chains. And startup products will be more interesting than corporate ones. Not always better - but at least more interesting.
A few weeks ago I installed the free upgrade to Windows Vista that came with my new PC. Yesterday, I backed up my data, reformatted my hard drive, and re-installed Windows XP. Here's why I'm so happy I did:
My PC is three weeks old and has a brand-new, high-end Core Duo processor and 2GB of RAM. Yet everything in Vista takes forever. Switching programs takes 1 second. Switching tabs in Firefox takes 1 second. Opening a popup menu in Explorer once took 6 seconds. In that span of time, my processor could have performed at least 15 billion operations. These apparently were all required to show me a popup menu.
Many new features in Vista seem to be aimed at copying something in OS X. Unfortunately, every one of these copied features misses the mark big time.
One such copy seems indicative of the whole Vista experience. At boot, instead of seeing potentially confusing system messages, you now see a progress bar and the word Microsoft. This is a lot like the Mac's boot screen. Except the word Microsoft is blurry. I know, who cares? But it is a great metaphor for the Vista experience: decent (stolen) ideas, bad and lazy implementation.
For example: Windows now has a sidebar with widgets, clearly aimed at the same need as OS X's Dashboard. Except the clock widget only supports a 24-hour clock. You can only have one sticky note. You can't make the sidebar auto-hide. Did this really take 6 years?
Really, the main changes in Vista are the new look and feel. Finally, I can see a weird, slanty 3D view of all my windows! Hooray! And it only takes 3 seconds to render!
Now my window headers can look like frosted glass. Why?
Now when you close windows, they shrink to oblivion. But, unlike Apple's genie effect, which reinforces visually where my program "went", this animation just wastes time and processor cycles.
Or take FreeCell. The cards are much prettier now and when you click on one - it glows! Woooooh. Unfortunately, it's nearly impossible to see the faint yellow glow and so you can't tell which card you've selected. Sometimes prettier is shittier.
Just Plain Bad
I had a folder that Vista magically decided should be read-only. So, I opened the Properties window, unchecked Read-only, and clicked OK. But I still had trouble with the folder. So, I opened the Properties window again. Lo and behold, it was still read-only. What?
Apparently, this error-message-free ignoring of my wishes had to do with Vista's new security features, one of which involves having a ton of users and meta-users like 'Everyone', 'All Users', 'Guests', etc. To make the folder actually stop being read-only, I had to set up permissions for 3 meta-users. Why?
Or, if I went to my 'Documents' folder and then tried to click 'Pictures', I was told I did not have access. But if I clicked on the same folder elsewhere (the 'Pictures' icon underneath my name in the left pane), I could get in. This doesn't even make sense.
I Did Like One Feature
The chess game was fun.
Lessons I've Learned
As a software engineer myself I feel like I need to learn something from this mess, if only to justify all the time I wasted waiting for popup menus.
1) Don't blindly copy - Vista copies the surface-level features of OS X but fails to understand their motivation. Microsoft should have focused on actually solving users' everyday problems.
2) Performance is key - There is nothing worse than waiting for something that shouldn't take time. I'm happy to wait for CDs to burn, or Photoshop to apply a filter. I understand that these things are "hard". But there is nothing hard about switching from one window to another.
3) Usability > Design - Graphic design should make the user's experience easier and more visually pleasant. In that order.
4) Get a Mac - After 13 years of being a dedicated PC enthusiast and Mac basher, I think I'm ready to switch. I don't want to rely on a company that takes 6 years to make their product worse.
Anyone else have similar experiences?
So, you're lucky enough to have landed an interview with YCombinator. Congratulations - you're a good hacker, and now you get a free trip to Mountain View or Boston.
It's not a job interview
This is definitely not like a job interview. You won't be wearing a suit and neither will they. Nobody has arranged the chairs to make sure you feel inferior. Don't worry about whether your posture is at the right angle to indicate "interest" but not "aggression". This interview is about more important things, such as:
Passion for the idea
Passion is contagious. When someone truly cares about something and puts themselves on the line for it, people always take notice. Taylor Hicks won American Idol. QED
Make sure you have an idea you really care about. Don't try to fake it - if you're not actually passionate about your idea, then come up with a new one. Quickly. Your idea should show up in your sleep. I have actually had dreams where parts of my idea appeared as bullet points on slides. I wish I were kidding.
That said - don't mistake passion for stubbornness. You should be as excited about your idea as possible without being violently attached to it. If you're passionate about a bad idea, it's probably because there are some really good morsels in there. YC will help you pick out the good parts.
Whenever somebody says your idea has something wrong with it, they are giving you a chance to learn something. The right answer is always "why do you think that?" instead of "we'll prove you wrong".
(Re-)Re-Read the application
Every question asked in the interview is hidden somewhere in your application. Think about it: they've read 400 to 500 of these things. They don't remember yours or anybody else's all that well.
Naturally, then, they'll ask you things you've already covered. They'll challenge you to think critically about your idea, how to market it, who is going to use it, and they might even ask how it will make money. All of these things are on the application! Read over it again. Expand on your answers and re-think them.
Bring a demo
Whether or not you have a demo already done, you need one for your interview. Start building it now! We didn't have a demo when we applied, but we had one for our interview thanks to two weeks of non-stop coding (whenever we weren't at our day jobs). Plus, it's nice to be able to say: "Look what we did in two weeks!" If you already have a demo, code like crazy anyway. Show progress.
It doesn't have to be done; it just has to be as far along as possible. It gives you something to talk about, and it's nice to show that your ideas are already turning into reality.
Start your company anyway
If you don't get accepted to YC, you should start your company anyway. You need to believe, and show them, that if they don't invest in you, they will have missed their chance to be in on the next big thing. PG talks about this frequently - nothing drives acquisitions and investments quite like fear.
My mom would be disappointed if I didn't include this mantra in every part of my life. But it's actually true. I hate job interviews - they're very artificial and weird. The YC interview is a conversation that honestly is exhilarating.
Enjoy it - and good luck!
I just finished (graduated?) YCombinator's Winter program and it has been a truly fantastic 3 months. So many things in my life are so different at this point - I know what I want to do, I know how much I can do, and I know people who can help me do it. I wanted to share some advice from my experience, and I'm going to start from the beginning.
My co-founder Wayne and I decided to apply for YC way before the deadline. We spent weeks on our application, and I think this was a Good Thing. The application basically tests three things:
1. Are you a good hacker?
This is the key to getting funded by YC. And, there's really only one way to prove that you are: build something at a public URL. For me, it was Majigs and QuickRef and for Wayne it was Count to Nine. For you, it could be a demo of your product. But make sure they have something to go look at.
2. Can you think through an idea?
This is why I'm glad we spent so much time on our application. The questions on the YC app are crucial - you will have to refine your answers to these as you go further in the program. Things like "What do other people have to do now because your product doesn't exist yet?" are great places to figure out why your product is different, and more importantly, better. Even if you don't get in, your product will be better off for having thought through these questions. They are worth your time and honest appraisal.
3. Are you sane?
I used to direct an improv comedy troupe in college. Every semester we held auditions and it was always an eye-opening experience. You'll never find a more interesting group of people than when you say: "Prove to me that you're interesting". I imagine YC goes through this - except we only had to deal with 60 people auditioning whereas they get hundreds of applications.
I'm not sure how to give advice on how to be sane. I guess, if this question angers you, you probably are not sane.
Better late than never?
The SFP application deadline has passed, and thus this advice comes a bit late. Hopefully it helps the next round of applicants - or helps those of you who get interviews. In any case, good luck!