On the first question, is the definition of rank just for our information as part of the rest of the definitions, or is it possible to use it to determine the rank of a given x?
I’m not sure I understand, but I think the notion of rank is merely defined to introduce the notion of an eps-approximate median, which is the thing you need to find. Obviously if you wanted to find the rank of a given x exactly, it would take linear time.
Again on question one, does the algorithm have to return an element that is actually in the set, or can it give a value that seems likely to be an epsilon-approximate median?
On the first question, does our algorithm always need to return a value that is within the given rank bounds, or can it be only expected to return such an answer?
Brent: the algorithm must return an element of the set.
Mark: please refer to the last bullet on the cover sheet
Mike: epsilon is an input parameter that is at most 0.5, and is strictly more than zero.
you can do either. If you design an algorithm, then you’re not constrained by the usual running time bounds (polytime etc) for obvious reasons. If you show an existence proof, that’s fine as well.
Not necessarily, but you can ignore issues of word length i.e if the query is not an integer, it still fits in whatever number of bits is allocated to a query variable. You may also assume that standard operations like log, exp, floor, ceiling etc can be performed efficiently (although this can be dangerous in general!)
I’m not sure I understand the question. In both problems, we’re dealing with an input that must be ordered (otherwise medians and binary search are not even well defined)
On question 1 when you say “your algorithm running time will not depend on n”, what do you mean? Do you mean that we can’t have n at all in our running time? Or can we show that n essentially doesn’t matter at all in our running time?
aren’t both those statements equivalent ? In any case, the first interpretation is correct: n should NOT appear in the running time. It sounds strange, but it’s true !
p.s if you want to be overly particular, assume that operations on an individual element of the set (reading, writing, addressing etc) can be performed in constant time
You are confusing yourselves here, and I’m confusing you because I’m thinking about issues that you are not worried about (which is fine for this problem). Going back to basics, n should not appear in the running time, and that’s it.
Vikas: But what is OPT ? it is not obvious that *any* ordering will guarantee that greedy works: what I’m asking you to show is that there *is* some ordering that works.
On question 4, both parts , it says”show”….dow we ome up with exaples/counterexamples or do we need a formalized mathematial proof?
Also, the ordering is not speified. When we proove(either with examples or math), do we proove it for a partiular ordering?
Please read the last bullet on the first page. “Show” = “prove”. A proof may involve a counter example, but it also involves proving why the counter-example *is* one.
4a says “Show that for all graphs, there exists an ordering…”
4b says “Show that there exists a graph AND an ordering”….
and if I may get on my personal soapbox, there’s no such thing as a “proof with math” and a “proof with examples”. A proof is a proof is a proof. It’s not formalized or mathematical or anything: it is just a sequence of logical inferences, one following from another.
What people are thinking of when they think of a “non-formalized” proof is a sketchy handwave that misses key steps, but which the author hopes will work. Remember the skeptical verifier !
On question 1, I wanted to verify this even though it has been mentioned: Could my running time be dependent both on n and epsilon-or on epsilon alone?
Maybe i am missing something, But what happens in the following case? Given eps, if the data distribution is such that for any q that is provided as input, there is *atmost* one element within the (1+eps)q - (1-eps)q range (in the data set), will the bound still work?
If it does, havent we acheived some superfast search mechanism without compromising the search quality?
It’s a good point, and if the bound works, then it does seem like we’ve achieved something fast. But we’ve compromised on something: “there is at most one element in the range for any q”. not all point sets can satisfy that criterion, and so the lower bound, which applies to an ARBITRARY point set, no longer is valid
btw: epsilon can be more than 0.5 (I was thinking of a different formulation of the problem). epsilon will be less than 1. this doesn’t really matter either way though
Question 2 talks about being “willing to be sloppy …”, being “willing to tolerate …”, and “we are *allowed* to return …”.
If (for epsilon=.1) the query is q = 5, and our set S = {4.99, 5.01} can we return that 5 is not in S, or are we required to return an entry (if we have one) that is close enough?
but in this case, 4.99 is within the bounds, and so you should return it. the goal *is* to provide a binary search: that part is not optional. What’s relaxed is the accuracy of the answer returned
It just seemed that to say that 5 wasn’t in the set (rather than returning 4.99 or 5.01) would be a *more* accurate answer so I wondered if it would be “correct” to do so.
I guess another way to ask the same question would be:
Assuming that a regular (exact) binary search on the entire set were fast enough (which of course it isn’t), would it satisfy the specification, as far as correctness goes, for all values of epsilon?
(Don’t ask me why I care.
I’m an infinitely powerful Prover, with no desire to convince you, the Verifier :). I don’t provide information, but I will (occasionally) respond to queries.
In problem 2 is there any space restriction in the preprocessing step ? i.e. can we store any amount of information during the preprocessing step or do we have some bound ?
December 12th, 2008 at 5:24 pm
With the deadline now over, I was wondering how we might get the solutions for homework 6..
December 12th, 2008 at 6:01 pm
I’m going to email it later tonight
December 13th, 2008 at 12:02 am
On the first question, is the definition of rank just for our information as part of the rest of the definitions, or is it possible to use it to determine the rank of a given x?
December 13th, 2008 at 12:09 am
I’m not sure I understand, but I think the notion of rank is merely defined to introduce the notion of an eps-approximate median, which is the thing you need to find. Obviously if you wanted to find the rank of a given x exactly, it would take linear time.
December 13th, 2008 at 9:20 am
Again on question one, does the algorithm have to return an element that is actually in the set, or can it give a value that seems likely to be an epsilon-approximate median?
December 13th, 2008 at 9:22 am
On the first question, does our algorithm always need to return a value that is within the given rank bounds, or can it be only expected to return such an answer?
December 13th, 2008 at 9:56 am
Also for 1:
What are the bounds on epsilon?
December 13th, 2008 at 12:19 pm
Brent: the algorithm must return an element of the set.
Mark: please refer to the last bullet on the cover sheet
Mike: epsilon is an input parameter that is at most 0.5, and is strictly more than zero.
December 13th, 2008 at 1:09 pm
For 4.a do we need to show an algorithm for getting these sequences, or just show that there exists a sequence for any graph (just prove it)?
December 13th, 2008 at 1:10 pm
you can do either. If you design an algorithm, then you’re not constrained by the usual running time bounds (polytime etc) for obvious reasons. If you show an existence proof, that’s fine as well.
December 13th, 2008 at 1:45 pm
On question 2 will all of the queries be integers?
December 13th, 2008 at 1:47 pm
Not necessarily, but you can ignore issues of word length i.e if the query is not an integer, it still fits in whatever number of bits is allocated to a query variable. You may also assume that standard operations like log, exp, floor, ceiling etc can be performed efficiently (although this can be dangerous in general!)
December 13th, 2008 at 4:04 pm
For problem 1 and 2, is it unordered or ordered?
December 13th, 2008 at 4:09 pm
I’m not sure I understand the question. In both problems, we’re dealing with an input that must be ordered (otherwise medians and binary search are not even well defined)
December 13th, 2008 at 4:14 pm
But then why not just take the (n+1)/2 item as the median?
December 13th, 2008 at 4:21 pm
oh I see. so that was the question. NO, the items are not presented in ordered fashion.
What I meant in my answer is that there’s an inherent ordering property on the objects (which is what I inferred the earlier question to be asking).
December 13th, 2008 at 5:12 pm
On question 1 when you say “your algorithm running time will not depend on n”, what do you mean? Do you mean that we can’t have n at all in our running time? Or can we show that n essentially doesn’t matter at all in our running time?
December 13th, 2008 at 5:13 pm
aren’t both those statements equivalent ? In any case, the first interpretation is correct: n should NOT appear in the running time. It sounds strange, but it’s true !
p.s if you want to be overly particular, assume that operations on an individual element of the set (reading, writing, addressing etc) can be performed in constant time
December 13th, 2008 at 5:26 pm
For problem 2, we are dealing with a sorted input, right? Otherwise binary search doesn’t help us much.
December 13th, 2008 at 5:29 pm
ah. my mistake. So let me make one correction.
* you can use super-linear pre-processing time (i.e you can sort if you want to).
December 13th, 2008 at 5:32 pm
What do you mean by ‘addressing’?
December 13th, 2008 at 5:33 pm
addressing: “Give me the 10th item in the set”.
December 13th, 2008 at 5:36 pm
But by that definition, we could again simply request the (n+1)/2 item in the input set to get the median, correct?
December 13th, 2008 at 5:36 pm
Never mind, unsorted input set >_<
December 13th, 2008 at 5:39 pm
Can we consider that the greedy algorithm uses the same ordering as OPT in problem 4a, to show that greedy coloring uses optimal number of colors??
December 13th, 2008 at 5:40 pm
Is the ‘10th’ just position in the unsorted input, with nothing to do with the rank?
December 13th, 2008 at 5:44 pm
You are confusing yourselves here, and I’m confusing you because I’m thinking about issues that you are not worried about (which is fine for this problem). Going back to basics, n should not appear in the running time, and that’s it.
Vikas: But what is OPT ? it is not obvious that *any* ordering will guarantee that greedy works: what I’m asking you to show is that there *is* some ordering that works.
December 13th, 2008 at 5:48 pm
On question 4, both parts , it says”show”….dow we ome up with exaples/counterexamples or do we need a formalized mathematial proof?
Also, the ordering is not speified. When we proove(either with examples or math), do we proove it for a partiular ordering?
December 13th, 2008 at 5:49 pm
rather, an we assume (& state) a definite ordering and proove it from there….?
December 13th, 2008 at 5:51 pm
Please read the last bullet on the first page. “Show” = “prove”. A proof may involve a counter example, but it also involves proving why the counter-example *is* one.
4a says “Show that for all graphs, there exists an ordering…”
4b says “Show that there exists a graph AND an ordering”….
December 13th, 2008 at 5:54 pm
and if I may get on my personal soapbox, there’s no such thing as a “proof with math” and a “proof with examples”. A proof is a proof is a proof. It’s not formalized or mathematical or anything: it is just a sequence of logical inferences, one following from another.
What people are thinking of when they think of a “non-formalized” proof is a sketchy handwave that misses key steps, but which the author hopes will work. Remember the skeptical verifier !
December 13th, 2008 at 9:27 pm
I saw only 4 questions in the final yesterday. But now I see five. Do we have to answer all the questions or do we get a choice ?
December 13th, 2008 at 9:29 pm
say what ? oops. let me fix that. no there’s no 5th question
December 14th, 2008 at 2:00 pm
On question 1, I wanted to verify this even though it has been mentioned: Could my running time be dependent both on n and epsilon-or on epsilon alone?
December 14th, 2008 at 2:02 pm
No, it only depends on epsilon
December 14th, 2008 at 4:27 pm
For the Sloppy Search, is there a limit on the possible values of \eps in terms of n, or can \eps just be anything in the range (0,1)?
December 14th, 2008 at 5:05 pm
For sloppy search, am I correct in assuming that we don’t know what q or epsilon are during our preprocessing?
December 14th, 2008 at 5:20 pm
parasaran: eps can be anything.
Questioner: epsilon is given in advance, but not q.
December 14th, 2008 at 5:55 pm
Maybe i am missing something, But what happens in the following case? Given eps, if the data distribution is such that for any q that is provided as input, there is *atmost* one element within the (1+eps)q - (1-eps)q range (in the data set), will the bound still work?
If it does, havent we acheived some superfast search mechanism without compromising the search quality?
December 14th, 2008 at 6:16 pm
It’s a good point, and if the bound works, then it does seem like we’ve achieved something fast. But we’ve compromised on something: “there is at most one element in the range for any q”. not all point sets can satisfy that criterion, and so the lower bound, which applies to an ARBITRARY point set, no longer is valid
December 14th, 2008 at 7:47 pm
btw: epsilon can be more than 0.5 (I was thinking of a different formulation of the problem). epsilon will be less than 1. this doesn’t really matter either way though
December 15th, 2008 at 10:31 am
Question 2 talks about being “willing to be sloppy …”, being “willing to tolerate …”, and “we are *allowed* to return …”.
If (for epsilon=.1) the query is q = 5, and our set S = {4.99, 5.01} can we return that 5 is not in S, or are we required to return an entry (if we have one) that is close enough?
December 15th, 2008 at 10:49 am
but in this case, 4.99 is within the bounds, and so you should return it. the goal *is* to provide a binary search: that part is not optional. What’s relaxed is the accuracy of the answer returned
December 15th, 2008 at 11:02 am
It just seemed that to say that 5 wasn’t in the set (rather than returning 4.99 or 5.01) would be a *more* accurate answer so I wondered if it would be “correct” to do so.
I guess another way to ask the same question would be:
Assuming that a regular (exact) binary search on the entire set were fast enough (which of course it isn’t), would it satisfy the specification, as far as correctness goes, for all values of epsilon?
(Don’t ask me why I care.
Does the question make sense?
Thanks.
December 15th, 2008 at 11:05 am
Well it would satisfy the specification for eps = 0.
December 15th, 2008 at 12:57 pm
Any more hints you can give on question 2? I’m sure there are many of us who could appreciate it
December 15th, 2008 at 12:58 pm
I’m an infinitely powerful Prover, with no desire to convince you, the Verifier :). I don’t provide information, but I will (occasionally) respond to queries.
December 15th, 2008 at 1:12 pm
For paper submissions-dow we still slip them under your door….I dont want to messup like last time….
December 15th, 2008 at 1:13 pm
yup. under my door it is
December 15th, 2008 at 1:34 pm
In problem 2 is there any space restriction in the preprocessing step ? i.e. can we store any amount of information during the preprocessing step or do we have some bound ?